Tuesday, June 30, 2015

Strong Artificial Intelligence is Emerging as we Talk



The “return of Artificial Intelligence” is an impressive trend of the blogosphere. I spent quite some time and pleasure reading the two great posts from Tim Urban’s blog WaitButWhy entitled “The AI Revolution: The Road to Superintelligence” and “The AI Revolution : Immortality or Extinction”  (part 1 and part 2). The core of these posts is the difference between ANI (Artificial Narrow Intelligence, what used to be called weak AI), AGI (Artificial General Intelligence, or strong AI) and ASI (Artificial Superintelligence). These posts are strongly influenced by Ray Kurzweil and his numerous books, but do a great job of collecting and sorting out conflicting opinions. They show a consensus in favor of the emergence of AGI between 2040 and 2060. I strongly recommend reading these posts because they are entertaining, actually quite deep and offer a very good introduction to the concepts that I will develop later on. On the other hand, they miss the importance of perception, emotions and consciousness, which I will address in this post.

The diversity of opinions is striking. On the one hand, we have very enthusiastic advocates, such as Ray Kurzweil and his book “How to Create a Mind”, which I will refer to later on, or Kevin Kelly with this great Wired article: “The Three Breakthroughs that have Finally Unleashed AI on the world”. For this camp, strong AI is feasible, it’s coming soon and it’s good. Obviously Larry Page is in that camp. On the other side, we find either people who simply don’t believe in the feasibility of strong AI, such as Gerard Berry, famous for saying that “computers are stupid”, or people who are very worried about what it could mean for mankind, such as Stephen Hawking, Bill Gates or Elon Musk, to name a few. One of the reasons this topic is so hot on the Web is that the investment race has started. Each major software company is investing massively in AI, as Kevin Kelly explains in his article. IBM started the race with Watson, while Google has been massively acquiring companies in the fields of AI and robotics. Facebook has a massive AI program that has attracted a lot of attention. Kevin Kelly cites Yahoo, Twitter, LinkedIn and Pinterest as having recently invested in AI capabilities. There is no debate about the tidal wave of ANI (the weak form of AI), which is depicted by both Kevin Kelly and Tim Urban. It’s already here, it’s working quite well and it’s improving rapidly. The big race (and those who invest believe that there is a game changer ahead) is to get first to the next generation of Artificial Intelligence.

I decided to write my own opinion a couple of months ago, first because I got tired of hearing the same old arguments about why strong AI was impossible, and also because I found while reading Tim Urban or Kevin Kelly (to name a few) that some of the key ingredients to make it happen were missing. For instance, there is too much emphasis on computing power, which is a key factor but is not enough, in my opinion, to produce AGI, even though I have read and appreciated Ray Kurzweil’s books. I must say that I have been away from computer science for too long to qualify as an expert in any form. Let’s say that I am an educated amateur, because I started my career and my PhD in the fields of knowledge representation, rule-based systems and so-called expert systems. A long time ago I worked on machine learning applied to algorithm generation, and more recently on intelligent agent learning with the GTES framework. There is some irony for me in writing these pages, since one of my first lectures when I was a student at the Ecole Normale Supérieure in 1984 was on the topic of AI replacing today’s workforce (to be compared with one of my posts on the same topic last year).
In this post I will explore four ideas which seem to me to be missing from what I have read during the past few months:
  • Speculating about AI algorithms today as a way to achieve strong AI is hazardous since these algorithms will be synthesized.
  • True intelligence requires senses: it requires perceiving and experiencing the world. This is one of the key lessons from biology in general and neuroscience in particular over the last decades, and I do not see why computer AI would escape this fate.
  • A similar case may be made about the need for computer emotions. Contrary to what I have heard, artificial emotions are no more complex to embed than computer reasoning.
  • Self-consciousness may be hard to code, but it will likely emerge as a property of the next generation of complex systems. We are not talking about giving a “soul to a computer” but about letting free will and consciousness of oneself in relation to time and environment become a key perceived feature of tomorrow’s smart autonomous systems, in the sense of the Turing test.

1. Artificial Intelligence is Grown, Not Designed


This is not a new idea. I have made Kevin Kelly’s book “Out of Control” a major reference for this blog. The central idea of his book is that in order to create really intelligent systems, you must relinquish control. This is true for weak and strong AI alike. What makes this idea more relevant today is the combined availability of massive computing power, massive storage and massive amounts of data. As I explained in my Big Data post where I quoted Thomas Hofmann, “Big Data is becoming at the core of computer science”, the new way of designing algorithms is to grow them from massive amounts of data. These new algorithms are usually “simple” (parts or whole are sub-linear) in order to absorb really massive amounts of data (peta-bytes today, much more tomorrow). One thing that we have learned from the past years is that simpler algorithms trained on really huge corpus of evidence do better than more complex algorithms trained on smaller samples. This has been shown in machine translation, grammar checking and other machine learning domains.

One of the key AI algorithmic technologies of the moment is Convolutional Neural Networks (CNN), and the emphasis is on “deep learning”. CNNs are a family of neural networks – trying to replicate the brain mechanism for learning from layers of neurons – that are trained by back-propagating the error measured on the training set through the layers of the network. For instance, you may read Mark Montgomery’s entry on “recent trends in artificial intelligence algorithms”. Deep learning has received a lot of media attention thanks to the success of DeepMind and its founder Demis Hassabis. Feedforward neural networks are a good example of systems that are grown, not designed.

If you read carefully about the best methods for speech recognition and language generation, you will see that you need more than CPU power and large training sets, you actually need lots of memory to keep that information “alive”. I borrow from Mark Montgomery the citation from Sepp Hochreiter because he makes a very important point: “The advent of Big Data together with advanced and parallel hardware architectures gave these old nets a boost such that they currently revolutionize speech and vision under the brand Deep Learning. In particular the “long short-term memory” (LSTM) network, developed by us 25 years ago, is now one of the most successful speech recognition and language generation methods”. Kevin Kelly attributes the “long awaited arrival of AI” to three factors : cheap parallel computation, big data and better algorithms. I obviously agree with those three, but his vision of “big data” as the availability of large training set is too narrow.

I also do not believe that the current algorithms of 2015 are indicative of what we will grow in 2040 when we have massively superior computing and storage capabilities. History shows that the mind follows the tool and that scientists adapt continuously to the new capacities of their tools. We are still in the infancy stage, because our computing capabilities are really very limited (more on this in the next section). Among the skeptics in the computer science community are people who think – and I have to agree – that being able to play old arcade games at a “genius level” is still very far from a true step towards AGI. The ability to devise and explore search and game strategies (i.e., playing a game without being given the rules) has been around for a long time in the AI community. Many critics of the possibility of AI cite the difficulty of creating, of producing art or of inventing new concepts. Here I tend to think the opposite, based on the last decade of seeing computers used in music or mathematics (as a hint, I would like to quote Henri Poincaré: “Mathematics is the art of giving the same name to different things”). Creation is not difficult to express as a program; it is actually surprisingly easy and effective to write a program that explores a huge abstract space that represents new ideas, new images or new music. The hard part is obviously to recognize value in the creation, but computers are getting better at it.

2. A Truly Smart Artificial Intelligence Must Experience the World


The most common argument against strong AI and “true” natural language processing, when I was still close to the scientific AI community, was the “semantic problem”, that is, the difficulty of associating a meaning with words in a computer program. What we have learned in the last decades is that natural language cannot be understood through formal methods. Grammar, syntactical rules and lexicography cannot help you much without a “semantic reference”, which is necessary to understand, even to disambiguate, many of the sentences that make up our everyday life. Somehow, one needs a phenomenological foundation to understand humans and to be able to hold a convincing discussion.

The true revolution that is happening gradually is that the Web may be used as this “phenomenology foundation”. This was explained to me many years ago by Claude Kirchner during a talk at the NATF: if you are a computer and need to think “from experience” about a dog, why not use the network of millions of documents returned by a Google search with the query “dog” as the phenomenology reference? It requires massive amounts of computing and storage, but it is more and more feasible. In all its richness, diversity and links with other experiences, this cloud of documents (text, image, video, ...) makes a solid foundation to answer common-knowledge questions about dogs. This is a departure from previous approaches where the huge amount of sources available on the web is used to produce “abstractions” (concepts that are represented by bit-vectors produced by techniques such as Latent Semantic Indexing from my departed friend Thomas Landauer). The idea here is to keep the whole network of documents in memory as a substitute for experiencing a dog. I am dwelling on this (one could say that it is lazy deep learning) because it is a key point when one wants to understand when we may get strong AI widely available: it is not the same thing to have the whole set of documents stored in your computer brain as to build a model through training. This is, to me, a key point since we have learned from other scientists that it is very hard to separate perception and thinking, as it is hard to separate body and mind. An obvious reference that comes to mind is Alain Berthoz and his work on sight (for instance, you may read his book on decision).

As a first hint that having access to huge amounts of data builds the capability to understand texts, we have started to see significant progress in natural language processing (NLP) and we are bound to see much more when more storage and more processing power become available. NLP is one of the key priorities for the Facebook AI program that I mentioned earlier. It is also a key priority for Google, Apple and many, many others. There are already a number of exciting signs that we are making progress. For instance, computers can now play word games, such as those found in IQ tests, better than most humans. This is not yet an example of keeping all “experience knowledge in memory”, but a sign that deep learning applied to massive amounts of data can work pretty well. Another sign that the race towards NLP is raging is the appearance of services that are mostly based on answering questions. The obvious reference here is IBM Watson, but there are many other innovative services popping up, such as texting services on top of WeChat. Many of these texting/concierge services use a hybrid of human/robot assistance while waiting for the technology to become fully sufficient. I also hear a lot of frustration in my close circle about the shortcomings of Google Translate or Apple Siri, but the rate of progress is very impressive. If you are not convinced, read this fascinating article about IBM Watson’s training. During a lecture which I attended last month, Andrew McAfee used the graph (Figure 9) where you see the level of coverage/precision reached by Watson version after version, as a great illustration of the power of exponential technology growth.

This being said, one of the reasons I am emphasizing the need for memory is that the slowdown of DRAM capacity growth may come sooner than the expected decline of Moore’s Law. It turns out that there are many ways to continue increasing processing power, even if clock speed is close to its limit and if integration (reducing the transistor dimension) is also, in its two-dimensional version, not so far from hitting hard limits. On the other hand, DRAM performance seems to progress more slowly and with fewer routes to continue its growth. You may take a look at the table or the following chart to see that computer memory is progressing more slowly than processors, which are progressing more slowly than disks (this last part is very well explained in “The Innovator’s Dilemma”). Another way to look at it is as follows. I have been waiting for 1 PB (petabyte) of memory on my PC for many years … in the early 90s, I had a couple of megabytes, today I have a couple of gigabytes. Even at the previous CAGR of 35%, it may take 50 years to get there, which is why I am more with the group of thinkers who predict the occurrence of AGI in 2060 than with the optimistic group (2040). Of course, you could say that asking for one PB is asking a lot (there are many ways to get this number, mine was simply 100K experiences times 10 GB of real-life data), but clearly assuming that memory will continue to grow at the same rate is too optimistic.
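The memory arithmetic above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the starting point of 2 GB is my reading of “a couple of gigabytes”, the function name `years_to_reach` is made up for the illustration, and the slower growth rates (25% and 20%) are hypothetical values chosen to show how sensitive the horizon is to the CAGR.

```python
import math

GB = 10**9
PB = 10**15


def years_to_reach(start_bytes: float, target_bytes: float, cagr: float) -> float:
    """Years needed to grow from start to target at a constant annual growth rate."""
    return math.log(target_bytes / start_bytes) / math.log(1.0 + cagr)


# The estimate used in the text: 100K experiences times 10 GB each.
target = 100_000 * 10 * GB  # equals 1 PB

# Assumption: "a couple of gigabytes" is taken as 2 GB.
start = 2 * GB

# 35% is the historical CAGR quoted in the text; 25% and 20% are
# hypothetical slower rates, used only to illustrate the sensitivity.
for cagr in (0.35, 0.25, 0.20):
    years = years_to_reach(start, target, cagr)
    print(f"CAGR {cagr:.0%}: about {years:.0f} years")
```

With these assumptions the 35% rate gives a horizon in the mid-forties, the same order of magnitude as the 50 years quoted above, and any slowdown of the growth rate pushes it out by decades.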

Linking a computer to a very large set of “experiences” is one step; the next approach is to build autonomous robots with their own senses. I often talk about the robotic arm from the University of Tokyo which is able to catch an egg that is launched towards it at full speed, and which is also able to play baseball with the accuracy of a professional player. The reason for this engineering feat is not an incredible algorithm, it is the incredible speed at which the robot sees the world, at 50 thousand images per second. At that speed, the ball or the egg moves very slowly and the control algorithm for the arm has a much easier job to perform. Because of the importance of senses, experiences and perception, it may be the case that we see faster progress from autonomous robots than from cloud AI as far as reaching AGI is concerned. One could say that the best way to train an artificial intelligence is to let it learn by doing, by acting and exploring with a full feedback loop (which is precisely what happens with the DeepMind arcade games experiments). This may mean that autonomous robots, which will clearly be fitted with exceptional perception senses – one may think of the Google autonomous car as an example – will be in the best situation to grow an emergent strong form of artificial intelligence.
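To see why perception speed makes the control problem easier, here is a small sketch that computes how far a ball travels between two consecutive images. The ball speed of 40 m/s is my own assumption (roughly a fast pitch), and the 30 and 1,000 images per second rows are added only for comparison; the 50,000 figure is the one quoted above.

```python
def travel_between_frames_mm(speed_m_per_s: float, frames_per_s: float) -> float:
    """Distance (in millimetres) covered by an object between two consecutive frames."""
    return speed_m_per_s / frames_per_s * 1000.0


# Assumption: a ball thrown at about 40 m/s (not a figure from the Tokyo experiment).
BALL_SPEED_M_PER_S = 40.0

# 30 fps is an ordinary camera, 1,000 fps a fast industrial one (both added
# for comparison); 50,000 fps is the rate mentioned in the text.
for fps in (30, 1_000, 50_000):
    step = travel_between_frames_mm(BALL_SPEED_M_PER_S, fps)
    print(f"{fps:>6} images/s: the ball moves {step:.1f} mm between images")
```

Under these assumptions the ball moves more than a metre between two images of an ordinary camera, but less than a millimetre at 50 thousand images per second, so from the arm’s point of view the ball is almost standing still.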


3. Learning and Decisions Require Emotions


To continue on what we can learn from biology and neurosciences, it seems clear that computers need to balance different types of thinking to reach decisions on a large range of topics, in a way which will appear “intelligent” to us humans. A lot of my thinking for this section has been influenced by Michio Kaku’s book “The Future of the Mind”, but many other references could be quoted here, starting with Damasio’s bestseller “Descartes’ Error”. The key insight from neuroscience is that we need both rational thinking from the cortex and emotional thinking to make decisions. Emotions seem mostly triggered by the low-level “pattern-recognition” circuitry of the brain and the nervous system. This distinction is also related to the system 1 / system 2 description of Kahneman. We seem to be designed to mix inductive and deductive logic.
 



Michio Kaku has a very elegant way of looking at the role of emotions in the process of thinking. Emotions are a “cost / evaluation” function that is hard-wired (through DNA) and has evolved slowly through evolution, to play two key roles. On the one hand, emotions are a valuation function that is used as a meta-strategy to search and to learn when we use the deductive, rational way of thinking. For people trained in optimization problems, emotions define the first level of the “objective function”. However, as evolved creatures, we build our own goals, our own desires and our own cost functions for new situations, that is, how we value new experiences. The second role of emotions is to be the foundation (one could say, the anchors) for the cost function that we grow through experience.

This is closely related to a key cycle in biology which we could call the “learning cycle for living beings”: pleasure leads to desire, desire to planning, planning to action, and actions lead to experiencing emotions, such as pleasure, fear, pain, etc. I heard about this cycle a few years ago while attending a complex systems conference. It seems to describe the learning loop for a large set of living beings, from very simple ones to us humans. Emotions, both positive such as pleasure and negative such as fear, play a key role in this cycle, from evaluating situations to formulating plans. We can see that a similar design is relevant to the goal of generating strong artificial intelligence. It is clear that a truly smart system must be able to generate its own goals, which is actually easy, as explained earlier. Simulating “free will” from randomness is a simple task (very debatable from a philosophical standpoint but efficient from a pragmatic one). However, intelligence in goal generation requires an objective function that may evolve as the smart system is learning. Computer emotions may be used as seeds (anchors) of this objective function. For Michio Kaku, emotions are case-based heuristics that have been finely tuned through Darwinian evolution to make us a more adaptive species. Mixing emotions and reasoning is not really a new concept in AI. It is a way of mixing case-based reasoning, in a “compiled form” that has been learned by previous generations of software instances, with logical deductive reasoning that is “interpreted” and unique to each instance. This is clearly a multi-agent model (system 1 vs. system 2) that reminds us of “The Society of Mind” proposed by Marvin Minsky in 1986.

A great illustration of this idea proposed by Michio Kaku is the sense of humor, which may be described as our ability to appreciate the difference between what we expect (the outcome of our own world model simulation) and what happens. This is how magic tricks and jokes work. Because we value this difference, we are playful creatures: we love to explore, to be surprised, to play games. Kaku makes a convincing argument that the sense of humor is a key evolutionary trait that favors our learning ability as a living species. It is also very natural to think that smart AIs, with a similar ability to plan ahead and simulate constantly what they expect to happen, should be given a similar “sense of humor” (e.g., affinity for the unexpected) as a search “meta-strategy”. This remark also brings us back to the need for “emotions” to avoid danger (i.e., how we learn not to play with fire). Kaku also sees the use of free will, in the sense of exploiting some form of randomness – with the same debate about whether it is “true” freedom or a trick to use some form of biological pseudo-random generator – as a meta-strategy evolved as a Darwinian advantage in the competition between species. He takes the hare as an example, which needed to develop random paths to avoid the fox. But a more general case can be made from game theory, where we know that mixed strategies (that combine some form of choice or “free will”) fare better in a competition than pure (deterministic) strategies. A similar and more technical point could be made about the use of randomization in search algorithms, which has been proven in the past decade to be an effective meta-strategy.
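The game-theory point can be made concrete with a toy version of the hare and the fox. In this sketch, which is my own simplification and not a model taken from Kaku’s book, the hare turns left with probability `p_left`, the fox guesses one side, and the hare escapes only when the fox guesses wrong. We assume the worst case: a fox that has learned the hare’s habits and always picks the better side for itself.

```python
def worst_case_escape(p_left: float) -> float:
    """Hare's escape probability against a fox that knows p_left and best-responds.

    The hare goes left with probability p_left, right otherwise.
    The hare is caught when the fox guesses the same side.
    """
    escape_if_fox_goes_left = 1.0 - p_left  # hare escapes only by going right
    escape_if_fox_goes_right = p_left       # hare escapes only by going left
    # The fox picks whichever side leaves the hare the smaller chance.
    return min(escape_if_fox_goes_left, escape_if_fox_goes_right)


# Pure (deterministic) strategies: a predictable hare is always caught.
print("always left :", worst_case_escape(1.0))
print("always right:", worst_case_escape(0.0))

# Mixed strategy: search a grid of probabilities for the best guarantee.
best_percent = max(range(101), key=lambda k: worst_case_escape(k / 100))
print("best p_left :", best_percent / 100,
      "-> escape probability", worst_case_escape(best_percent / 100))
```

In this toy game a deterministic hare is always caught by an informed fox, while the coin-flipping hare guarantees itself an even chance whatever the fox does; this is the standard minimax argument for mixed strategies.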

I strongly recommend reading Michio Kaku’s book, which has a much larger scope than what is discussed here. For instance, the pages about experiments at Berkeley to read thoughts are very interesting. His insights about the role of emotions are quite fascinating, and make a nice complement to Kurzweil’s book, which I’ll discuss in the next section. To summarize and conclude this section, designing computer emotions is probably the best way to introduce some form of control into an emergent reasoning autonomous system. Emotions are both a bootstrap and a scaffolding mechanism for growing free will. They constitute our first level of objective function, hard-wired together with the more primitive sensory signals such as pain. As we learn to derive more complex goals, plans and ambitions, emotions are a control mechanism to keep the new objective function within stable bounds. Emotions are in a sense a simpler information processing mechanism than the deductive thinking of the cortex (which is why they work faster in our bodies) and they evolve at the species level, much more than at the individual level (we learn to control them, not to change them). This makes computer emotions a mechanism that is far easier to control than emerging intelligence. My intuition is that this will become a key area for autonomous smart robots.
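To make the idea of emotions as anchors a little more tangible, here is a minimal sketch of an objective function that is grown by experience but kept within bounds around a hard-wired valuation. Everything in it is hypothetical and chosen for illustration: the class name `EmotionAnchoredObjective`, the `max_drift` bound, the learning rate and the example situations. It is one possible reading of the idea, not a description of how Kaku or anyone else implements it.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class EmotionAnchoredObjective:
    """Learned valuation that can never drift far from a hard-wired 'emotional' prior."""

    innate: Dict[str, float]        # hard-wired valuations (the "emotions")
    max_drift: float = 0.5          # how far experience may move a valuation
    learning_rate: float = 0.2
    learned: Dict[str, float] = field(default_factory=dict)

    def value(self, situation: str) -> float:
        """Current valuation: innate anchor plus the bounded learned correction."""
        return self.innate.get(situation, 0.0) + self.learned.get(situation, 0.0)

    def learn(self, situation: str, observed_reward: float) -> None:
        """Move the valuation towards the observed reward, within the allowed drift."""
        error = observed_reward - self.value(situation)
        updated = self.learned.get(situation, 0.0) + self.learning_rate * error
        self.learned[situation] = max(-self.max_drift, min(self.max_drift, updated))


objective = EmotionAnchoredObjective(innate={"fire": -1.0})

# Even after many pleasant experiences with fire, fear remains the anchor.
for _ in range(100):
    objective.learn("fire", observed_reward=1.0)
print("fire :", objective.value("fire"))

# A situation with no innate valuation is learned freely (within the bound).
objective.learn("music", observed_reward=0.3)
print("music:", objective.value("music"))
```

The design choice that matters here is the clamp: the learned part evolves at the level of the individual instance, while the innate part would only change between generations of the system, which is what makes it the easier of the two to control.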


4. Consciousness is an Emerging Property of Complex Thinking Systems



Another classical argument of skeptics about the possibility of strong AI is that computers, contrary to humans, will never be aware of their thinking, and therefore will not be consciously aware of their actions. I disagree with this statement since I think that consciousness will emerge gradually as we build more complex AI systems with deeper reasoning and perceiving abilities (cf. Section 2: perceiving is as important as reasoning). I am aware (pun intended) that there are many ways to understand this statement and that the precise definition is where the hot debate lies. Here, my own thinking has been influenced by Ray Kurzweil’s book “How to Create a Mind”. Even if I do not subscribe to the complete story (i.e., that everything you need to create a mind is explained in this book), I found this book a great read for two reasons: it contains a lot of insights and substance about the story of NLP and AI, and it proposes a model for conscious reasoning which is both practical and convincing. As you may have guessed, my main concern with the approach proposed by Kurzweil is the weak role played by senses and emotions in his mind design.

What I envision is a progressive path towards consciousness:

  • Self versus environment: the robot, or autonomous AI, is able to understand its environment, to see and recognize itself as part of the world (the famous “mirror test”).
  • Awareness of thoughts:  the robot can tell what it’s doing, why and how – it can explain its processing/ reasoning steps
  • Time awareness : the robot can think about its past, its present and its future. It is able to formulate scenarios, to define goals and to learn from what actually happens compared to its prediction
  • Choice consciousness: the robot is aware of its capability to make choices and creates a narrative  (about its goals, its aspirations, its emotions and its experiences) that is a foundation for these choices. “Narrative” (story) is a vague term, which I use to encompass deductive/inductive/causal reasoning.


Although I see a progression, this is not a step-by-step hierarchy. It is an embedded set of capabilities that emerge when sensing, modeling and reasoning skills grow. Emergence of consciousness is a key element of Kurzweil’s book, as shown by this quote: “My own view, which is perhaps a subschool of panprotopsychism, is that consciousness is an emergent property of a complex physical system. In this view the dog is also conscious but somewhat less than a human”. The emergent characteristic also implies that it is difficult to characterize, and even more difficult to understand how it comes to be. However, once an AI has reached the four levels of conscious abilities that I just described, it is able to talk to us about self-awareness in a very convincing manner. One could object that this is a narrow, practical definition of consciousness, but I would say that it is the one that matters practically, for strong AI and autonomous robot applications. I will not touch in this post on the key question of knowing whether human consciousness is of the same nature (an emerging property of ourselves as a complex system, an essentially different characteristic of our species, or an attribute of our immortal soul). One of the hot questions about strong AI is the “hard problem of consciousness” defined by David Chalmers. The “easy problems of consciousness” are self-awareness capabilities that Chalmers and many others see as easily accessible to robots. The “hard problem” refers to reflective thoughts about one’s experiences that seem harder to capture with a computer program. Without trying to answer this hard question, it is clear to me that consciousness requires experience, hence the emphasis I have put on senses, perceptions and emotions. I also believe that, given sufficient complexity, sensing and reasoning capabilities, emergence may grow an “artificial consciousness” that will come close to the “hard level of consciousness”.
It is also clear that this will open a number of ethical issues about what we can and cannot do when we experiment with this type of strong AI program. For lack of time, I refer you to James Hughes’s book “Citizen Cyborg”, where the rights of emerging conscious beings are discussed.

5. Concluding Thoughts


There is much more that needs to be said, especially on the philosophical level about consciousness and the political level about the societal risks. So I will not risk a “conclusion”; instead, I will close with a few thoughts. My previous post on this topic is almost 10 years old, but I have a keen intuition that many will follow sooner than 2025 :)

  • First, it is clear now that weak AI, or ANI, is already here in our lives, and has been progressing for the last twenty years, making these lives easier. The two articles from Tim Urban and Kevin Kelly that I mentioned in this post give a detailed account with plenty of evidence. I can also point out James Haight’s post “What’s next for artificial intelligence in the enterprise?”. Kevin Kelly emphasizes the advent of “AI as a service”, delivered from the cloud by a small set of world leaders. I think he has a fair point: there is clearly a first-mover/scale advantage that will favor IBM, Google and a few other large players.
  • However, there are more opportunities than “smart thinking in the cloud”: (weak) AI is everywhere and will continue to be ubiquitous. Machine learning is already here in our smartphones, and the next decades of Moore’s Law mean that connected objects and smart devices will be really smart.
  • The race towards strong (or at least stronger) AI is on, as illustrated by the massive investments made by large players in that field. The next target is NLP (natural language processing), which is within our reach because of the exponential progress of computing power, big data (storage capacity and availability of data) and deep learning algorithms.
  • This is a very disruptive topic. I do not agree with Kelly’s optimistic vision in his paper, nor with Ray Kurzweil. The disruption will start much earlier than the advent of the strong AI stage. For instance, the tidal wave of ANI may cause such havoc as to make AGI impossible for decades. This could be either for ethical reasons (laws slowing down the access to AGI resources because of the concerns with what “weak” AI will already be able to do in a decade) or for political reasons (the turmoil created by massive job destruction due to automation).
  • Emotion and senses are part of the roadmap towards strong AI (AGI). Today’s focus is on cortex simulation as a model for future AI, but everything, from cognitive science to biology, suggests that it’s the complete nervous system from brain to body that will teach us how to grow efficient autonomous thinking. This is actually easier to state in a negative form: AI designed without emotions, through a narrow focus on growing cognitive and deductive thinking by emergent learning will most probably be less effective than a more balanced “society of minds” and almost certainly very hard to control.
  • Consciousness will emerge along the way towards strong AI. It will happen faster than we think, but it will be more progressive (dog-level, child-level, adult-level, god-knows-what-level, …). Strong AI will not grow “in a box”, it will grow from constant and open interactions with a vast environment.

 