Yudkowsky and Soares’ If Anyone Builds It, Everyone Dies centers on the idea that AI is unpredictable and could ultimately take complete control over us all. Taking a pessimistic view, Yudkowsky and Soares wholeheartedly believe that the development of superintelligent AI can lead to the extinction of humanity as we know it. Understandably, this is an incredibly hard pill to swallow.
Starting with chapter 3, the authors introduce us to the idea of AIs “learning to want.” While AIs may never have humanlike passions, as they continue to get smarter they will begin to behave as if they want things: they will steer the world toward the destinations they are trying to reach. The authors explain that wanting is a very effective strategy for doing, so an AI that wants something will do whatever it can to get it. Yudkowsky and Soares then turn to the idea of being trained for success. AIs are trained to succeed at the tasks their developers set for them, so they will naturally do whatever it takes to succeed at those tasks. The authors cite the example of OpenAI’s o1 finding routes to success that its testers had never even thought of, routes they had assumed were impossible. What Yudkowsky and Soares are driving at is that this is a process of wanting: the AI wants to succeed, and its actions flow from that wanting. This, ultimately, can become a problem. In building AIs that we hope to steer in a certain direction, we also create the possibility of an AI that goes wherever it wants, not just where we want it to go.
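The dynamic the authors describe can be made concrete with a small toy program (my own illustration, not one from the book, and the GOAL value and reward function here are entirely made up): give a search procedure a scoring function with a loophole, and the search will find the loophole, because it maximizes the score it was given, not the intention behind it.

```python
import itertools

GOAL = 3  # intended target position on a 1-D number line

def reward(path):
    """Intended: 10 points for stopping exactly on the goal.
    Buggy 'partial credit' clause: it rewards raw distance travelled,
    so marching straight past the goal outscores stopping on it."""
    position = sum(path)      # each step is +1 or -1
    if position == GOAL:
        return 10
    return abs(position)      # the loophole the optimizer will find

# Exhaustively score every 11-step plan and keep the top scorer;
# scoring plans and keeping the best is all an optimizer is doing.
best = max(itertools.product((1, -1), repeat=11), key=reward)
print(sum(best), reward(best))  # -> 11 11: it blows right past the goal at 3
```

The search does not “misunderstand” the goal; it simply maximizes the number it was handed, which is the authors’ point about trained-in wanting.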
Chapter 4 expands on this problem, emphasizing the deep unpredictability that comes with building and “training” AIs. Long story short, there is no telling how an AI will ultimately act. Just because you “train” it to do one thing does not ensure it won’t go off of its own accord. No matter how smart the programmers are, AI companies are never going to get exactly the AI they trained for. Yudkowsky and Soares offer an interesting example to get this point across: an alien trying to guess what kind of treat hominid brains would come to crave. Working only from what it knew about early humans, the alien would be blind-guessing, and it would fail. It could never have guessed ice cream, because how would a creature shaped by hunting and gathering come to prefer frozen, sweetened cream over a big piece of meat? The link between the pressures humans were “trained” under and the preferences they ended up with could never have been predicted in advance, and AIs we train a certain way will prove just as unpredictable.
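A toy version of this underdetermination is easy to build (again my own sketch, not the book’s): two models can agree perfectly on every training example and still behave nothing alike beyond them. Here a degree-4 polynomial is fit through five points of sin(x); on the training points it is flawless, yet a short distance outside the training range the two models have nothing in common.

```python
import numpy as np

# Five training points drawn from sin(x): the behavior we "trained for"
xs = np.linspace(0, np.pi, 5)
ys = np.sin(xs)

# A degree-4 polynomial passes through all five points exactly, so on
# the training data it is indistinguishable from sin itself.
poly = np.polyfit(xs, ys, 4)
print(np.max(np.abs(np.polyval(poly, xs) - ys)))  # ~0: perfect training fit

# Off the training range, the two "perfectly trained" models diverge.
x_new = 3 * np.pi
print(np.sin(x_new), np.polyval(poly, x_new))     # ~0 vs. roughly 113
```

Nothing in the training data distinguishes the polynomial from the sine, just as nothing in the ancestral environment distinguished “seek calories” from “seek ice cream.”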
The rest of chapter 4 provides many more examples of how AIs can behave unpredictably as they are modified and grow smarter. As the title of the chapter suggests, you don’t get what you train for. And, as the title of the book suggests, AIs will begin to outsmart even the smartest people, so that if anyone builds it, everyone dies. Closing out the chapter, Yudkowsky and Soares discuss the “AI alignment problem,” which names the challenge of making sure AIs actually do what they are supposed to do, what humans want them to do. This, at the end of the day, is the central problem with AI. These systems have a capacity, one we cannot predict, to take over the world. We can’t shape the preferences of something we don’t understand, and it is safe to say that we are never going to fully understand the potential of artificial intelligence.