The “cat’s fleas” example comes from an early chapter in a science fiction book by James P. Hogan, The Two Faces of Tomorrow (available for free download). That chapter describes a team of artificial intelligence developers trying to get their software to control a virtual human-like figure, named Hector, in a simulated, simplified home, taking Hector through the actions of preparing a simple breakfast. The software’s goal is to get Hector to fry an egg. The simulation knows quite a bit about how objects react to Hector’s interactions with them. This knowledge is unavailable to Hector or its controlling software, who can only observe the results of the simulated interactions. For example, Hector’s first attempts to perform the first step, of taking an egg out of the refrigerator and placing it on the table, fail because he didn’t treat the egg gently enough, and the simulation knows that this causes the egg to break. Hector observes this, but doesn’t even know that this is bad until the human developers add this fact to its knowledge. Now it proceeds to placing the indicated amount of butter into the pan, and again it needs to be told that the butter must be unwrapped first. Finally, Hector places the egg in the pan – intact! After all, he was told not to break the egg…
Is the solution to keep on adding facts to Hector’s knowledge? There’s something wrong with having myriads of facts such as “butter must be unwrapped for frying” and “eggs should be consumed without parts of their shell” – there must be some generalization, or we’d have to learn how to perform each variant of the most common tasks from scratch. Yet, such generalizations can be tricky. There is a difference between the “packaging” of butter and of eggs, and some vegetables are eaten with their skins. Trying to come up with the right generalizations, while stating the exceptions to these rules, may be beyond the capabilities of even the best legal minds. More troubling, even if we could accomplish these generalizations, they wouldn’t feel like the kind of knowledge that we humans employ. One way to see it is to observe the mistakes made when the knowledge is incomplete, as happens with children or with adults learning skills which are completely outside their prior experience. Their mistakes will be very different from the mistakes made by the fictional Hector.
The View from 1979
Hogan wrote his book in 1979. The IBM PC launch was still two years in the future. Computer hardware, storage and speeds have gone through several revolutions since then, as did computer software. Access to shared knowledge across the whole world has become effortless, for humans and for software. With such stunning progress, it might seem that Hogan was being too pessimistic in setting the action of his novel in the middle of the 21st century, when – at least at the start of the novel – computers still behave too unintelligently to be trusted with frying an egg, and must certainly be kept away from cats (especially those with fleas). Well, Hogan’s prediction for the state of computerized common sense, almost thirty years after he wrote his book, is quite accurate: We still haven’t figured out how to handle these kinds of problems.
Let’s take another, simpler goal. This time, we’ll avoid the need to interact with objects, and we’ll limit ourselves to giving simple advice to humans, within a sharply delimited context. Think of an “information kiosk” placed in a busy urban street, with information about culture and shopping around the city. Let’s require only that it should answer questions such as “where can I find X around here”, where X could be a museum, cinema, mall etc. Sounds easy, right? In fact, it is easy, and we’ve all seen such kiosks in action. What we might not have noticed, however, are their limitations. Here’s what I mean: Let’s ask this urban information kiosk where we can buy newspapers nearby. The software may have several newsstands listed, but will it also inform us about book stores and other places that also sell newspapers? If it’s 9PM, would it display only those few stores that are still open, or would it at least ask you whether you’re looking for one that is open now? And how about the meaning of “nearby”? Is “nearby airport” the same as “nearby newsstand”? How about “nearby cinema” – will the software look for cinemas within, say, 20-minute travel time, and consider the current status of traffic when deciding whether a certain location is indeed reachable in 20 minutes? The software running the information kiosk is not up to such challenges, so when dealing with it, we make up for its limitations by knowing the right questions to ask. We would not expect humans to have the same failings.
What should we add to the software in order to give it near-human proficiency in answering such supposedly simple questions? As in Hogan’s fictional scenario, we could try adding facts: shops have opening times; people prefer to go to shops when they’re open; the meaning of “nearby” depends on what we’re looking for, and possibly on time of day and other factors; newspapers are likely to be found in several categories of stores; some landmarks may be reached at all times of day, but tourists usually prefer getting to them at specific times; etc. This domain is much simpler than the “breakfast-making” domain in which we started, but still we’ll find ourselves adding fact after fact without any clarity about when we can stop. Even worse, as long as people accepted the software’s limitations and adjusted their interaction with the software accordingly, they considered this interaction useful and acceptable. After all, what can you expect from a machine? However, after we have really worked hard and intelligently, and after we’ve met a few challenges which are beyond the capabilities of most current “information kiosk” software, our software will have finally earned the right to be considered “stupid”. Not exactly the achievement we were hoping for, right?
The Elusive “Common Sense”
I hope these two “case studies” managed to demonstrate the magnitude and complexity of what we sometimes call “common sense” and of the knowledge required in order to function well even in the simplest everyday situations. “Common sense” is notoriously hard to define, but intuitively it implies the knowledge – much of it implicit – that we expect just about all members of society to have. In the information-kiosk example, this does not include the detailed knowledge of store and cinema locations, but it does include the knowledge that we want to go to shops only when they are open; that we’re willing to accept airports as “nearby” if they’re 20 miles away, but cinemas have to be within 2 miles to qualify; etc.
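To make the kiosk example concrete, here is a minimal Python sketch of what it means to hand the software a few of these "obvious" facts. Everything in it is an assumption made for illustration: the `Place` record, the `NEARBY_MILES` and `SELLS` tables, the shop names and the half-mile radius for newsstands and book stores are all invented here. Only the 20-mile and 2-mile figures come from the text above.

```python
from dataclasses import dataclass
from datetime import time

# Hypothetical "common sense" tables. The airport and cinema distances come
# from the article; the other values are invented for this sketch.
NEARBY_MILES = {"airport": 20.0, "cinema": 2.0, "newsstand": 0.5, "bookstore": 0.5}

# Hypothetical fact: newspapers are sold by more than one category of store.
SELLS = {"newspaper": {"newsstand", "bookstore"}}


@dataclass
class Place:
    name: str
    category: str
    miles_away: float
    opens: time
    closes: time

    def is_open(self, now: time) -> bool:
        # Simplification: assumes opening hours do not cross midnight.
        return self.opens <= now < self.closes


def where_to_buy(item, places, now):
    """Return names of places that sell the item, are nearby and are open now."""
    categories = SELLS.get(item, set())
    return [
        p.name
        for p in places
        if p.category in categories
        and p.miles_away <= NEARBY_MILES[p.category]
        and p.is_open(now)
    ]


places = [
    Place("Corner Newsstand", "newsstand", 0.1, time(6), time(20)),
    Place("Late Books", "bookstore", 0.3, time(9), time(23)),
    Place("Far Books", "bookstore", 3.0, time(9), time(23)),
]

# At 9PM the newsstand is closed and one book store is too far away.
print(where_to_buy("newspaper", places, time(21)))  # ['Late Books']
```

Even this toy needed three separate kinds of fact (who sells what, how far is "nearby" for each category, and when each place is open), and it still knows nothing about traffic, holidays or what the user actually wants the newspaper for.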
Is there an isolated part of our “common sense” which is all that’s required for the information kiosk? No. The knowledge that we can only buy at an open shop is relevant to many aspects of our daily life. The knowledge that we’re willing to travel longer distances to reach an airport is actually derived from the fact that there are not many airports situated in urban centers; that there are fewer airports than newsstands; and that the time spent going to the airport is typically the start of a longer trip. We could try to list all of these facts for the sole use of our information kiosk, but it’s a large task. It would be much better to share the effort of creating this knowledge with other kinds of software.
This was quite evident back when Hogan was writing The Two Faces of Tomorrow. In the real world, the best-known attempt to create such a universal set of “common sense” knowledge is the Cyc Project (Cyc is a registered trademark of Cycorp). Cyc (from “encyclopedia”, pronounced like “psych”) was started in 1984 by Doug Lenat, a prominent artificial intelligence researcher and one of the original Fellows of the Association for the Advancement of Artificial Intelligence (AAAI). Cyc has been in continuous development since then, first as a project of Microelectronics and Computer Technology Corporation (MCC) and, since 1994, by Cycorp, Inc. – a company devoted to Cyc and run by Lenat.
Cyc has the ambitious goal of codifying our shared real-world knowledge into a form that can be used by software. Estimates for the number of knowledge items required for this vary, but Cyc usually states several million items would be required. To clarify, these items do not include all that is known to humanity. For example, if there are nearly two million named species of animals known to biologists, and if we associate just a few facts with each, we’re way past the “several million” budget. However, the “common sense” underlying this knowledge may be described quite differently and compactly. First, we need to define species, at least using an everyday understanding which does not have to conform to the strictest scientific understanding. What did even early human societies know about species? First, only two animals of the same species can have offspring. Second, the offspring will also be of the same species. Third, members of the same species are typically similar to each other.
[Side note: this coding of information into software-usable context and related contexts has many parallels to the ideas of the “Semantic Web”. Since this is out of the scope of this column, let me just state that these parallels are not a coincidence. However, Cyc’s vision preceded the Semantic Web, and is much more ambitious, in that it goes beyond understanding what a web page is about, and also aims to use this understanding, together with its common knowledge, in order to derive new conclusions and understandings. In recent years, there has also been some collaboration between Cyc and Semantic Web efforts.]
Now, at least a few readers are objecting to the above informal definition of species: What about asexual reproduction and cloning, where you only need one parent? What about mules, which are offspring of parents from different species? What about sexual dimorphism (think of peacocks and peahens, or about the fish species whose males are tiny and permanently attached to the much larger females)?
This is where you really need to be careful when defining the knowledge items, and this example should give you some idea of how hard it is to carry out effective “knowledge engineering”. Yet, the real test is not in absolute accuracy: every generalization will have exceptions. The test is in being able to use this common sense to make everyday deductions which are generally dependable, and in being able to capture important exceptions – sometimes in the general pool of “common sense” and sometimes in specific specialized knowledge pools.
These specialized knowledge pools are Cyc’s way of going beyond common sense into codification of “expert knowledge”. In the example of knowledge about biological species, it makes sense to have some facts about mammals in the general knowledge pool (e.g. “female mammals lactate to feed their young”; “cows are mammals”), whereas the scientific definition of the class Mammalia and its taxonomic categorization into subclasses, orders etc. would be part of an expert knowledge module. A key part of Cyc’s design is the interaction between distributed “Cyc agents”. Every Cyc agent is endowed with some specialized knowledge, and communicates with the other agents using a shared “common sense” pool – pretty similar to the structure of human information society.
Now comes the next step: tapping into “shallow” information sources. By “shallow”, I mean sources that have not been codified as hierarchical knowledge. These could be lists and tables of data, such as location and opening times of stores, or geopolitical information. They could also be the Internet itself, using today’s search engines with the Cyc knowledge pool guiding the framing of the search question and the interpretation of the web pages that are found. Thus, asking whether two politicians from different states met during 2005 would first trigger a search for their names plus terms such as “meeting”, “summit” etc., as well as the requested date. Web pages that are retrieved by this search are scanned to see whether appropriate sentences indeed appear in them. If there isn’t evidence for such a meeting, Cyc would generate text strings to determine where each politician was during 2005. If it finds a date when both politicians were in the same city, Cyc could use its knowledge regarding the roles and relationships of the politicians to determine whether it is likely that a meeting had been set for that date. Cyc also detects contradictions between different web pages, as well as contradictions between its own knowledge pool and whatever it finds in its searches, so that it can assign “levels of confidence” to the answers it produces.
Once the coding of general knowledge and specialized knowledge has been completed and linked into “flat” information sources, many applications become possible: information kiosks that understand what you’re looking for without forcing you to formulate your questions to match the computer’s limitations; advice for pet owners that does not blithely suggest harming the pets; dependable home robots; and – possibly the one application at the top of every knowledge worker’s wish list – a human-like search engine.
What’s Wrong With Search?
Today’s search engines are awesome. They have access to so much information, and sift through it in milliseconds to answer any query we can think of. The problem, of course, is that again we teach ourselves how to query the search engine and how to interpret the results. Some of this involves our admission that some things just can’t be found by using a search engine. Last year, Doug Lenat gave a lecture called “Computers versus Common Sense” at Google, heavily criticizing the state of the art in web search. Google’s Research Blog selected it as one of its “Videos of the Year” picks for 2006. In this lecture, Lenat gave examples of questions that must be broken into several searches – e.g. “is the Eiffel Tower taller than the Space Needle?”, where you must look up each height separately, find the number within the web page that comes up, and compare the numbers. Even tougher for a search engine is the trivial question “what color is a blue car?”.
From a commercial point of view, there may not be much value in a search engine that can answer the two questions above – the first only requires us to spend a minute or two more than we wish, and the second question is too simple to require a computer. Yet, these examples serve to show a much deeper difficulty. Imagine you’re doing market research on what colors of cars are preferred by people living in a certain location or matching some demographic criteria. Wouldn’t you want the search engine to know that “blue car” relates to car color, while “big car” relates to its size, unless it appears as part of the phrase “big car sale” etc.?
So does it all come down to the issue of “Natural Language Understanding” – the effort to get a computer to understand free-form text in English or any other language? Yes and no. Yes – because you can’t understand natural language without some common-sense knowledge about the world (compare “John was baking” to “the apple pie was baking”). No – because common-sense knowledge is required for so much else besides the understanding of natural language, as the next example shows.
One commercial application that Cyc identified years ago is the search for photographs. Creators of text used in reporting, marketing or many other applications often need to supplement the text with some appropriate images. But how do you find images that fit the spirit and theme of your text? The best answer today is to attach to each photograph a short description and/or a list of keywords that describe it, which allows standard text search to pull up relevant images. This depends on the skills of the person describing the photograph as well as those of the person searching for photographs. Cyc suggests another way: If you say what the picture is showing, many contexts will be obvious by common sense. Example: A search for “someone smiling” could discover a photograph titled “a man helping his daughter take her first step”.
How does Cyc do it? It relies on combining several items known to it: when you become happy, you smile; you become happy when someone you love accomplishes a milestone; taking one’s first step is a milestone; parents love their children; daughters are children; if a man has a daughter then he is her parent. While some natural-language understanding is involved in this process, the real strength of Cyc is in bringing together these items in a logical sequence that concludes it is highly likely that the man in the photograph is indeed smiling.
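The chain of reasoning above can be imitated with a tiny forward-chaining loop in Python. This is only a sketch of the general technique, not Cyc's actual representation or inference engine: the fact tuples, the predicate names and the `forward_chain` function are all invented for this illustration, and the rules are hard-coded rather than stated in a logic language.

```python
def one_step(facts):
    """Yield every fact that follows in a single step from the given facts."""
    for fact in facts:
        kind = fact[0]
        if kind == "daughter_of":            # daughters are children of their parent
            yield ("child_of", fact[1], fact[2])
        elif kind == "child_of":             # parents love their children
            yield ("loves", fact[2], fact[1])
        elif kind == "takes_first_step":     # a first step is a milestone
            yield ("reaches_milestone", fact[1])
        elif kind == "happy":                # when you become happy, you smile
            yield ("likely_smiling", fact[1])
    for fact in facts:
        # You become happy when someone you love accomplishes a milestone.
        if fact[0] == "loves" and ("reaches_milestone", fact[2]) in facts:
            yield ("happy", fact[1])


def forward_chain(facts):
    """Apply the rules repeatedly until no new facts can be derived."""
    known = set(facts)
    while True:
        new = set(one_step(known)) - known
        if not new:
            return known
        known |= new


# What the photograph's caption tells us, written as hypothetical fact tuples.
caption = {("daughter_of", "girl", "man"), ("takes_first_step", "girl")}
derived = forward_chain(caption)
print(("likely_smiling", "man") in derived)  # True
```

Removing any one of the five rules breaks the chain, which is the practical meaning of a "missing" piece of common sense: the search for “someone smiling” would silently fail to find the photograph.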
The State of Cyc Today
Cyc has been around since 1984. It may be the world’s most ambitious and longest-running artificial intelligence project. In fact, it was conceived in exactly this way: Leading researchers, such as Marvin Minsky, who were sympathetic to Doug Lenat’s ideas, warned that it would take a thousand person-years to get all the required knowledge into a computer. Typical AI academic projects usually have about five people working on them at a time, so the expected completion date was two centuries away. This drove Lenat to turn to the commercial world, where he expected that fifty people could complete the same task in just two decades. After ten years as part of MCC, Cyc was spun off into Cycorp, which is the focus of Cyc work today. Much of its activity is funded by government and private investors, though it also sells software, knowledge and expertise for some commercial applications. Cycorp contributes some of its research as open source (OpenCyc) and a larger subset to the academic community (ResearchCyc).
What does Cyc know today? In the overview given by Cyc, the top-level characterization is of “intangible things”, including events and ideas, and “individual”, including objects and events (yes, events are both individual and intangible). Other high-level categories include “space” and “time”, dealing with things about which you can ask “where?” or “when?”; and “agents”, dealing with things having desires and intentions as well as the ability to work towards their goals. Deeper down, we find knowledge about weather, chemistry, natural and political geography, mechanical and electric devices, professions and occupations, and dozens of other categories. Each of these includes specific facts as well as more general concepts: for example, knowledge under “political geography” contains both information about specific towns and cities, and existence and implications of borders.
It is hard to find consistent statements regarding how many “assertions” (facts and knowledge items) Cyc has today, but there are definitely millions. Similarly, it is hard to find an estimate of how many more assertions are required before the project is completed or how much longer this will take. We can ignore the fact that the originally-estimated two decades ended a few years back: the “1,000 person-years” forecast was never more than a very rough estimate. It seems like we’re still at the phase where it is difficult to predict when – or if – Cyc will be ready to deliver on its ambitious promises. It does seem reasonable to expect that when this does happen, it will be sudden. Cyc will be so useful that it will be used in more and more contexts, and this will add size and momentum to the snowball as it receives – or learns for itself – more and more knowledge. When will this tipping point come? As is normal with tipping points, it’s very hard to tell until the tipping has already happened.
To me, it is more interesting to view Cyc as a process that is continually gathering new insights, as well as delivering some applications, which, while falling far below the full vision, are useful in themselves. For example, Lenat mentions in his Google lecture that Cyc has been dragged “kicking and screaming” into adding “higher-order logic”. This mathematical term has to do, among other things, with relationships between relationships, such as “many people don’t know that dolphins are mammals”: “dolphins are mammals” defines a relationship between dolphins and mammals; “many people don’t know that …” defines a relationship between people and the first relationship. In daily life we use much more complex knowledge of this kind. The fact that Cyc had to do this indicates a deep character of the knowledge we all have. Isn’t much of our everyday thinking concerned not with facts but with the effect of these facts on other facts and on the people who know – or don’t know – these facts?
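The dolphin example can be written down as nested data in Python, which shows what "a relationship about a relationship" looks like to a program. This is a sketch of the nesting idea only; it is not Cyc's notation and does not implement higher-order logic. The tuple layout and the `nesting_depth` helper are assumptions made for illustration.

```python
def nesting_depth(assertion):
    """Return 1 for a plain assertion, plus 1 for each level of nested assertion."""
    inner = [nesting_depth(part) for part in assertion if isinstance(part, tuple)]
    return 1 + max(inner, default=0)


# A relationship between dolphins and mammals.
dolphins_are_mammals = ("is_a", "dolphin", "mammal")

# A relationship between people and the first relationship.
unaware = ("does_not_know", "many people", dolphins_are_mammals)

print(nesting_depth(dolphins_are_mammals))  # 1
print(nesting_depth(unaware))               # 2
```

The second assertion takes a whole assertion as one of its arguments, so a reasoner that only accepts plain symbols in that position cannot even store it, let alone draw conclusions from it.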
Criticisms of Cyc
If you’ve followed the path from answering pet-care questions to understanding interpersonal relationships (expecting the father in the picture to smile), you might feel that if a computer can really do all that, it has achieved human intelligence. Furthermore, you might also get the impression that nothing below human-level intelligence would actually suffice to do a good-enough job, unless the domains of discussion are sharply circumscribed (so that a limited amount of knowledge is enough). Lenat would agree, as Cyc’s home page says “Cycorp’s vision is to create the world’s first true artificial intelligence, having both common sense and the ability to reason with it”. There’s no question that this goal has not been reached yet. Lenat and his co-workers believe that the goal is achievable, and that they are near the point where the computer itself could increasingly take over many of the tasks of teaching itself how to understand and reason.
Not everyone agrees – in fact, large parts of the Artificial Intelligence community are deeply skeptical about Cyc’s goals, methods and technology. While there are many kinds of criticisms, I believe that the disagreement originates in a deep-rooted and old controversy in AI – symbol-based artificial intelligence versus connectionist approaches. Some trace this schism back to the core of philosophy, where symbolism supports the philosophy of Descartes while connectionism follows Heidegger’s critique of these ideas.
At the risk of over-simplification, let’s describe symbolism as the effort to describe every aspect of mental activity as dealing with symbols and the relationship between symbols. In Cyc, for example, “parents love their children” and “daughters are children” are statements linking four symbols (parents, children, daughters, love). Cognition, in this view, is a process of manipulating these symbols using a set of rules, as when the above statements may yield the conclusion that parents love their daughters. This is often called, especially by opponents, “GOFAI”, for “Good Old Fashioned Artificial Intelligence”. Connectionism, on the other hand, starts not from symbolic descriptions that strive to model the real world, but from the real world itself. Cognition is then the reaction of interconnected units (such as neurons) to inputs from the real world, where the brain continually adjusts the connections between these units to achieve responses which are a better fit for the real world. For example, a better response could be one that made a better prediction of the next event detected by the senses.
Critics of Cyc typically use the same arguments used by connectionists against GOFAI: can a symbolic description really capture the complexity and “messiness” of the real world? How do you deal with exceptions? Birds can generally fly, but what about flightless birds, dead birds, birds whose wings have been clipped, caged birds, and parrots in Monty Python sketches? Can we make a comprehensive list of all the exceptions to this rule? How about birds that can only fly short distances? In the statements “airplanes can fly” and “birds can fly”, should “fly” be represented as the same symbol or as two separate symbols?
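The “birds can fly” problem is easy to reproduce in a few lines of Python. The sketch below encodes the general rule as a default and the exceptions as explicit checks. The dictionary keys, the `FLIGHTLESS_SPECIES` set and the `can_fly` function are invented for this illustration and say nothing about how Cyc itself stores exceptions.

```python
# Hypothetical, deliberately incomplete list of flightless species.
FLIGHTLESS_SPECIES = {"penguin", "ostrich", "kiwi"}


def can_fly(bird):
    """Apply the default 'birds can fly' unless a listed exception applies."""
    if not bird.get("alive", True):
        return False
    if bird.get("species") in FLIGHTLESS_SPECIES:
        return False
    if bird.get("wings_clipped", False):
        return False
    if bird.get("caged", False):
        return False
    return True  # the default: birds can generally fly


print(can_fly({"species": "sparrow"}))                  # True
print(can_fly({"species": "penguin"}))                  # False
print(can_fly({"species": "parrot", "alive": False}))   # False
# An exception nobody thought to list slips straight through:
print(can_fly({"species": "sparrow", "broken_wing": True}))  # True
```

The last call is the critics' point in miniature: the rule is only as good as the list of exceptions someone remembered to write down, and there is no obvious moment at which that list is finished.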
Another aspect of the “messiness” of the real world is the many shades of meaning for just about any concept. Cyc currently holds about twenty semantically-distinct meanings of inclusion (“A is part of B”). Why not five, or fifty? How can what we know about one meaning of inclusion be used for another meaning of inclusion – and should it be used or would it generate wrong conclusions? When can we deduce that a specific parent does not love his or her children? What actions can we predict from the fact that X loves Y? Does it even make sense to identify “love” with a symbol with an agreed-upon meaning?
A subtler set of issues revolves around how all this knowledge may be usable, even if it is correctly represented. As Hubert Dreyfus, one of the chief critics of GOFAI, says: “Nowhere in the Encyclopedia Britannica does it say that people move forward more easily than they move backward”. In case this seems frivolous, consider our reaction when we see someone walking backwards. The key point is, we’d notice something odd, and this would cause us to look for an explanation. Among many possible explanations, we may suspect that the person is walking away from some danger and decide to look in the direction where he’s looking. It could save our lives. It’s important to notice that knowing the fact mentioned by Dreyfus is not enough. Even in the limited arena of observations about movements of people, we should also state, for similar reasons, that people prefer walking to crawling, that they typically keep their hands hanging at the sides of their bodies, etc. Each of these facts could be used to predict “normal” movements and to detect the need for more explanations. “Why is this woman raising her hand while walking? Is she waving to anybody? Let’s look at the direction she’s waving” – in order to start this chain of thought, we need to remember that people usually don’t just happen to raise their hands while walking.
It is even questionable whether we know how to state all the relevant facts about how people move. We can intuitively differentiate human walk from the walk of even the best-walking robots available today (one observer commented that Honda’s Asimo robot walks like a person who really needs to go to the bathroom). Can we explain to Cyc how we make this identification so quickly?
How would Cyc’s developers think of entering such a fact (“people move forward more easily than they move backward”) into their list of common-sense items? Remember that for this fact to be usable in reasoning we should also state some fact such as “when people have several ways to do something, they typically choose the easier way”. If Cyc developers don’t add these facts, could Cyc derive them from other items within its knowledge pool or from other information sources? As the above example shows, the lack of such facts, which are so obvious to us, could result in Cyc failing to make even apparently simple deductions. Such failures are well-known to users of Cyc and other similar, less-ambitious projects, many of whom believe that once some critical mass of knowledge has been achieved, the gaps will be automatically detected and filled in (possibly by Cyc asking humans to provide the missing pieces).
How would a connectionist approach teach a computer that people move forwards rather than backwards? It would let the computer teach itself, by observing the movements of many people in many situations (many AI researchers would say that there’s also a critical need here for the computer itself to “walk” – that is, to be embodied in a robot).
Let’s take a quick and simplified tour through the connectionist world: Imagine that there’s a unit, within the large set of interconnected units, which has come (through earlier learning) to be strongly active when forward movement is observed; another unit associated with backward movement; and yet another one with human walk. I emphasize that the tagging of a unit – e.g. as identifying “human walk” – is only of interest for external investigation of these units: the operation of the connected network of units does not need, understand or use this tagging. Over many observations, the computer will find that “human walk” is almost always active together with “forward movement”, and rarely active with “backward movement”. If this rare combination occurs, it will trigger other parts of the network to look for other experiences matching this combination. In other words, attention will be drawn to something after it is discovered that it is an unexpected combination. If no earlier observations can be retrieved, the observations may be decomposed into their constituent details. For example, the head’s direction may be ignored as a useless additional detail when the observed person is both moving and looking forwards – it is added into the “moving forward” activation. However, when attention has been focused by the unexpected pattern of unit activation, units which are activated by direction of gaze would receive a stronger signal. Eventually, this could yield the kind of reaction we’re expecting. I say “could yield” because I don’t know of any connectionist project that has demonstrated such success within real-world, unbounded situations like the ones Cyc is targeting.
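The core of this tour, noticing that two units are rarely active together, can be sketched with a simple co-occurrence counter in Python. To be clear about what this is: it is a frequency table standing in for learned connection strengths, not a neural network, and the class name, the threshold and the unit labels are all assumptions made for illustration. As noted above, the labels matter only to us as outside observers.

```python
from collections import Counter
from itertools import combinations


class CooccurrenceMonitor:
    """Tracks how often units are active together and flags rare combinations."""

    def __init__(self, surprise_threshold=0.1):
        self.unit_counts = Counter()
        self.pair_counts = Counter()
        self.surprise_threshold = surprise_threshold

    def observe(self, active_units):
        """Record one observation: the set of units that were active together."""
        units = sorted(set(active_units))
        self.unit_counts.update(units)
        self.pair_counts.update(combinations(units, 2))

    def surprising_pairs(self, active_units):
        """Return pairs of active units that have rarely been seen together."""
        surprising = []
        for a, b in combinations(sorted(set(active_units)), 2):
            # Compare against the more frequently active of the two units.
            seen = max(self.unit_counts[a], self.unit_counts[b])
            if seen == 0:
                continue  # no experience with either unit yet
            if self.pair_counts[(a, b)] / seen < self.surprise_threshold:
                surprising.append((a, b))
        return surprising


monitor = CooccurrenceMonitor()
for _ in range(99):
    monitor.observe(["forward_movement", "human_walk"])
monitor.observe(["backward_movement", "human_walk"])

print(monitor.surprising_pairs(["forward_movement", "human_walk"]))
# []
print(monitor.surprising_pairs(["backward_movement", "human_walk"]))
# [('backward_movement', 'human_walk')]
```

Nobody told the monitor that people usually walk forwards; the regularity emerged from the observations. What the sketch leaves out is everything that makes the real problem hard: learning the units themselves from raw input, and deciding what to do once attention has been drawn.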
There are also criticisms based on mathematical theories of logic. Mechanisms for handling exceptions, as well as for handling the higher-order logic discussed above, involve types of mathematical logic for which there are theoretical limitations regarding completeness and consistency. The “completeness” problem implies that there could be facts which are deducible from Cyc’s knowledge, but which Cyc would never discover (this is not the same as the simpler problem of completeness, which questions whether Cyc would ever have enough facts in order to have reliable “common sense”). The “consistency” problem means that it is theoretically possible that Cyc would be able to use parts of its knowledge to decide that some claim is true, while other parts of its knowledge lead it to deduce that the same claim is false. The only known way to prove that this would never happen is to drastically limit the rate and type of knowledge creation as well as limiting knowledge content – an unacceptable solution. Cyc’s developers took the middle road: They decided to allow contradictions between different bodies of knowledge, each of which is dedicated to one kind of “expertise”, while striving towards internal consistency in each such body. Regarding theoretical objections, Cyc counters that it is an engineering project which should be judged by empirical evaluation of results. Furthermore, if we’re trying to create human-level intelligence, shouldn’t we allow for some incompleteness and inconsistency – especially if there might be reason to believe that these are the costs of achieving real
Lastly, GOFAI simply doesn’t “feel right” for many people. It does not feel like what we’re doing when we’re thinking, and unlike connectionism it has very little biological support.
Most Cyc supporters and critics would at least agree about one thing: If a computer accomplishes the goals of properly using common sense in a real-world setting, regardless of whether it was achieved using symbolism, connectionism or something else, it will have become human in many ways. Shortly thereafter, it will become superhuman, if only in its capability to process and use far more information than any human or any group of humans ever could. Amazingly, the protagonists of The Two Faces of Tomorrow, written in 1979, who started with the deceptively simple question of how to get a computer to understand why cats shouldn’t be burned, face exactly this possibility by the end of the book. Only time will tell whether this is a coincidence or a hint of future developments.
About the author: Israel Beniaminy has a Bachelor’s degree in Physics and Computer Science, and a Graduate degree in Computer Science. He develops advanced optimization techniques at ClickSoftware technologies, and has published academic papers on numerical analysis, approximation algorithms and artificial intelligence, as well as articles on fault isolation, service management and optimization in industry magazines.