In this short book, Stephen Wolfram, the mind behind Wolfram|Alpha and Mathematica, takes a deep dive into ChatGPT’s inner workings.
The text was originally published by Stephen Wolfram on his website. I read the Kindle eBook version, and a paperback version is also available.
Wolfram goes beyond “explaining” ChatGPT; in its own right, this is a great general introduction to deep learning. Even if you are not very familiar with machine learning concepts, the explanations are very clear, and the author offers easy-to-understand analogies.
It starts with an overview of what is happening when ChatGPT generates text. Does ChatGPT really “understand” what you’re asking? In reality, the model generates text by calculating the probability of the next “most reasonable” word, over and over. But there’s a caveat: if ChatGPT chooses only the words with the highest probability of occurrence, the generated text will be dull and lack creativity. In fact, better results are obtained by making ChatGPT occasionally choose words with lower probabilities, and you can tune this parameter (the so-called “temperature”) as you like. As Wolfram explains, there is no “theory” for how this works; it is just what has been found to work in practice.
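To make this concrete, here is a minimal sketch of temperature-based sampling (my own illustration, not code from the book; the candidate words and scores are made up):

```python
import numpy as np

def sample_next_word(words, logits, temperature=0.8):
    """Sample the next word from the model's scores.

    Low temperature -> almost always the top-scoring word (dull, repetitive).
    Higher temperature -> lower-probability words get picked more often.
    """
    scaled = np.asarray(logits, dtype=float) / temperature
    # Softmax (shifted by the max for numerical stability) turns the
    # scores into a probability distribution.
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return np.random.choice(words, p=probs)

# Hypothetical scores for continuations of "The cat sat on the ..."
words = ["mat", "chair", "roof", "moon"]
logits = [3.0, 2.0, 1.0, 0.1]

print(sample_next_word(words, logits, temperature=0.8))
```

Run it a few times: most often you get “mat”, but occasionally one of the less likely words, which is exactly the kind of variety Wolfram describes.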
Wolfram then explains what a “model” is and, in particular, how machine learning models work. The book covers the basics of “training” and “learning” in the context of AI.
The concept of the neural network (NN) and its architecture comes next. In this section, we learn how training data is obtained, as well as the concepts of transfer learning and data augmentation. Wolfram also makes some important remarks about the hardware (CPUs, GPUs) used to train NNs.
The second half of the book delves into Natural Language Processing (NLP) and Large Language Models (LLMs), of which ChatGPT is an example. NNs can only handle numeric data, so the concepts of tokenization and word embeddings are presented as key ideas.
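As a rough illustration of these two ideas (my own sketch, not from the book, using a made-up five-word vocabulary and random embedding values):

```python
import numpy as np

# A made-up five-word vocabulary. Real tokenizers (ChatGPT uses subword
# tokens) have vocabularies of tens of thousands of entries.
vocab = {"the": 0, "cat": 1, "sat": 2, "on": 3, "mat": 4}

# Tokenization: turn text into integer IDs the network can work with.
tokens = [vocab[word] for word in "the cat sat on the mat".split()]
print(tokens)            # [0, 1, 2, 3, 0, 4]

# Embedding: each ID selects a row of a learned matrix, so every token
# becomes a vector of numbers (8-dimensional here, purely illustrative;
# in a trained model these values are learned, not random).
rng = np.random.default_rng(seed=42)
embeddings = rng.normal(size=(len(vocab), 8))
vectors = embeddings[tokens]
print(vectors.shape)     # (6, 8): one vector per token
```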
The book moves on to “What’s inside ChatGPT”, where the concept of Transformers is introduced. This is an idea that requires some effort to understand, and the author provides an excellent explanation. (By the way, GPT stands for Generative Pre-trained Transformer.)
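For readers who want a taste of the mechanism, here is a bare-bones sketch of the scaled dot-product attention at the heart of a transformer (my own illustration with made-up data, not how ChatGPT is actually implemented):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: each position's output is a weighted
    mix of all value vectors, weighted by how well its query matches
    every key."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # query/key similarity for every pair
    return softmax(scores) @ V       # weighted average of the values

# Toy self-attention over 4 token positions with 8-dimensional vectors.
rng = np.random.default_rng(seed=0)
x = rng.normal(size=(4, 8))
print(attention(x, x, x).shape)      # (4, 8)
```

The key intuition is that every token can “look at” every other token and decide how much to borrow from it, which is what lets the model track context across a whole passage.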
With all the basics covered, Wolfram discusses how ChatGPT in particular was trained. Here we learn why ChatGPT sometimes confidently produces answers that seem OK but aren’t quite right. This is particularly true when the answer requires a certain degree of mathematical reasoning.
Here and there, the author offers some of his insights about AIs doing “human tasks”. What does it tell us that ChatGPT, by performing simple computations, can generate human language? Is the human brain’s ability to produce meaningful language perhaps not as complex as we thought?
As you can see, the book covers a lot of ground in a very concise manner. While some concepts are hard to grasp at first, it is very well written and balanced. Whenever technical language is used, the author explains clearly what it means.
Possibly my only complaint is that I would have liked it to go deeper into some of the technical details. However, links to reference materials are provided throughout the book, so you can dig into the specifics.
ChatGPT has taken the world by storm; if you are looking to learn the facts rather than the hype, this book is a great starting point. Overall, the book is approachable and does a great job of demystifying ChatGPT.
Thanks for reading! I hope you find this review informative; feel free to share it and leave me your comments.