Seven Games in Silica

The topic of AI/ML (artificial intelligence and machine learning) is getting a lot of buzz right now, which makes this book review pretty timely.

I.

Seven Games: A Human History by Oliver Roeder is a book that came out this year. It is about board and card games, the seven alluded to in the title—Checkers, Chess, Go, Backgammon, Poker, Scrabble, Bridge—along with some philosophical musings about game-playing in general. On the latter point, Roeder discusses how games provide a constrained space for creative play and exploration that can impart skills and lessons which can then be applied in the broader world. This is very relevant to one of the main themes of Seven Games: the development of computer programs that can play these games, eventually getting strong enough to challenge even top human players (at most of the games in the book, at least). These games have provided a constrained problem space for the development of AI programs (with techniques getting ported over to more general problems). The history that Roeder recounts may also provide a useful peek into the changes this technology could bring to the workplace and other facets of our lives.

Overall, this was a fun book to read. It had a nice blend of the history of the games, details about their mechanics and strategy and complexity, a run-down of the effort to conquer each one with code, and big picture thoughts about games and about the future of automation. I appreciate that the author played all of these games, some (e.g. Scrabble) at a pretty high level. He also has a PhD in game theory and a lot of experience writing; Roeder was certainly the right person to write this book.

This excerpt gives a good overview of the book as a whole, from the philosophical point in the first two sentences to the descriptions of the elements of strategy that each game adds (emphasis added):

Inventing tools to augment skill is one of Homo sapiens’ foundational behaviors. Therefore, “human versus machine” contests are also “human versus human” contests. I, too, enjoy getting better at games, learning new truths about them. So I recently spent a long time playing games against these creations. I played chess and Go and backgammon and poker and Scrabble against opponents who couldn’t talk but nonetheless destroyed me. I knew the programs were written in bloodless code, but I found that they had their own personalities. And I discovered the stories of their programmers. In the wilderness of Oregon, an astrophysicist and programmer pondered the mathematics of bridge. In an IBM office in rural New York, a small team built a supercomputer that became perhaps the world’s best chess player. On the icy campus of the University of Alberta, a professor upended his family in pursuit of a solution to checkers. And in a Google office in London, an elite squadron conquered the most beautiful, and most complex, board game on the planet. The seven games in this book belong to a rough hierarchy; each game on the list adds a strategic feature and, therefore, more closely hews to some aspect of the “real world.” The aspects of each game crystallize a specific and potent form of agency. And when taken together, these aspects form a rough menu of intelligence. Computer scientists and their algorithms have been making their way through the list, on their way, perhaps, to truly general artificial intelligence. Checkers allows you to practice basic strategy—but its canvas is limited and its moves often rote. Add different pieces with more complex movements and you produce chess, a game that for centuries has been associated with intelligence itself. Or increase the number of pieces and the size of the board—like managing not just a small tribe but a giant civilization—and you have Go, the mathematically richest game played by humans. But life is random and is always throwing you some unexpected new development; practice for that with backgammon, which relies on chance. Poker models a world of hidden knowledge and deceit. Scrabble demands that a player make intertemporal trade-offs between satisfying desires today and saving up for tomorrow. Bridge, perhaps the most “human” of all the games in this book, offers a world of flourishing language, alliances, communication, empathy—and cheating.

II.

Before getting into the "human versus machine" part of the book, I'll first spend a bit of time on what Roeder has to say about the history and strategy of some of the games he covers (checkers and chess hardly need introduction). Go is my favourite game featured in this book, so I appreciated his description of it:

Go is often touted as the most complex board game played by humans. As a matter of sheer calculation, that may be true. But in another sense, it is the simplest. The rules of Go are few, elemental, and pure. You can learn them in a minute or two. It’s the expression of those rules that generates intricacy.

Other interesting things that this book mentioned about Go were: the last game of the Japanese master Shusai in 1938, a grand event that toured around the country and took weeks and weeks to play; a poem from the Han dynasty by Ma Rong (see the end of this article for a partial translation); and an aside that,

Even Confucius begrudgingly agreed that playing Go was better than sitting idle.

Backgammon has an ancient history, with roots reaching back more than four thousand years to the Royal Game of Ur. It was popularized in North America (if only for a few generations) by a Russian aristocrat, Alexis Obolensky, who fled to the US after the Revolution in his homeland.

Regarding Poker, Roeder points out that it is not only a game of chance, but one where not all one's cards are on the table, so to speak. This aspect is an important consideration in game theory (emphasis added):

But poker adds another element to vary the rewards for the player while also injecting a deeply human quality: hidden information. In backgammon, you can see all your opponent’s checkers and every roll of her dice. In poker, your opponents’ cards are hidden and much of the game’s betting strategy involves deciphering what’s in their hands. In the parlance of game theory, it is a game of imperfect information.

Scrabble is a game that Roeder has played competitively, so the chapter on it was quite interesting, even though the game itself has never really taken my fancy. I found his perspective that each allowed word is a "rule" of Scrabble quite insightful:

The Scrabble dictionary is less a dictionary—as in a reference work containing definitions, etymologies, and usage—than it is a really, really long rule book. Each word is a rule: its appearance in the dictionary indicates its validity, and its absence indicates its invalidity. To play the game well you first have to learn tens of thousands of rules, a fact in sharp contrast to the elegant simplicity of Go.

So memorization is one necessary (but not sufficient) condition for doing well at Scrabble. Studying the short words (107 allowable two-letter combos, 1082 allowable three-letter combos, and 4218 allowable four-letter combos) is the first step to getting good at Scrabble. This lets you churn tiles you don't need while waiting for a great combo that uses the full rack. Scrabble champions need a vocabulary far beyond what anyone uses in day-to-day life (although they don't need to know the definitions). A mere 850 words is considered by some to be enough to get started in a language like English. In contrast,

The feats of top human Scrabble players are difficult to comprehend. For starters, there is the sheer scale. One linguistic study has found that just two thousand root words provide coverage for around 99 percent of spoken English-language discourse. Other researchers have found that many adults have an overall vocabulary of some thirty thousand words. There are, however, 192,111 words in the latest edition of the Scrabble dictionary used in North American play.

Another skill crucial to success in Scrabble is anagramming, which Roeder demonstrates in this chapter using well-selected examples. Memorization and anagramming are both areas where computers have an advantage, but they aren't where Scrabble strategy ends. Players also need to have a good sense of how to steward their tiles: when to save them and when to spend them. In fact, an early program called Maven influenced a change in Scrabble strategy on this point, from aiming for high turnover (trying to get rid of as many tiles as possible each turn) to holding back promising tiles for really high-scoring words.
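
To see why anagramming is trivial for a machine: index the word list by each word's letters in sorted order, and every anagram class collapses to a single dictionary key, so finding all the anagrams of a rack becomes one lookup. A minimal sketch in Python (the word list here is a tiny stand-in, not the official Scrabble dictionary):

```python
from collections import defaultdict

# Stand-in word list; a real program would load the full ~190k-word Scrabble dictionary.
WORDS = ["anestri", "antsier", "nastier", "ratines", "retains",
         "retinas", "retsina", "stainer", "stearin", "ant", "tan"]

# Index each word under its sorted letters: all anagrams share the same key.
anagrams = defaultdict(list)
for word in WORDS:
    anagrams["".join(sorted(word))].append(word)

def find_anagrams(rack: str) -> list[str]:
    """Return every listed word that uses exactly the letters in `rack`."""
    return anagrams.get("".join(sorted(rack.lower())), [])

print(find_anagrams("retinas"))  # prints all nine seven-letter anagrams in the list
```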

Scrabble, probably chief among the games in this book, is not a test of a single type of skill but is, rather, a sort of brainy heptathlon of tests of various skills: memorization, anagramming, calculation, spatialization, long-term strategizing, game-theoretic tactics, bluffing, and the ability to roll with an ever-shifting meta-game.

The chapter on Bridge was a window into a strange and vanishing subculture. I'm assuredly not alone in my generation in never having played it; the New York Times ended its Bridge column in 2015. Its rise in popularity came around the same time as Backgammon's (although promoted by a Vanderbilt rather than a Russian aristocrat). The norm in the high-level Bridge world is for teams to have a sponsor (who is usually one of the team members, it seems), and it's reportedly not unheard of for good players at major tournaments to make five figures for a few days of play. The impression one gets is that it is predominantly played by a small circle of high society.

As for aspects of the game, it incorporates randomness, hidden information, and teamwork:

Bridge reveals in stark terms the difference between the human approach to games (and to life) and that taken by computers. Humans become good at games through theory—through writing that seeks to explain important concepts and filter endless possibilities. Humans think in anecdote and narrative. Computers, on the other hand, become good at games through raw computation—through speedy search and evaluation of a game’s large decision tree. Unique among games in this book, bridge remains one of a dwindling set of pursuits where the human approach is thus far superior.

Bridge has been difficult for computers to crack partly because of the way teammates need to work together with only very limited and formalized communication—and probably also in part because its lower popularity has motivated less effort to develop a winning program.  

One of the important elements to understand about Bridge is that the bidding phase is simultaneously about competing with your opponents and passing information (according to an elaborate set of conventions about what different bids mean) to your partner:

You and your partner will have to work out what you know that he knows, and what you know that he knows that you know, in order to determine the meaning of plays.

Computers aren't there yet. But for the other games, impressively strong programs have been developed. Quite a bit of Seven Games is about the history of that development; I'll share some of its highlights in the sections that follow.

III.

The first chapter is about Checkers. The man-vs-machine match-up it discusses took place in 1992, between Marion Tinsley, a pro player who had lost only 3 times since 1950, and a program called Chinook. I'll discuss the techniques a bit more later, but basically Chinook took advantage of computers' capacity to store huge amounts of data and to search through or analyze many possibilities quickly. It had a complete database of all possible endgames once only 7 pieces were left, and much of the 8-piece endgame was also fully mapped. So if it could get to the endgame, it could play it out to the end without any mistakes (aside from errors in the database, of which there were some). In this match-up, Tinsley won 4, lost 2, and drew 33 games.

On a checkers board there are 5×10²⁰ possible arrangements. The next two games covered in Seven Games, Chess and Go, have dozens or hundreds of orders of magnitude more complexity (in terms of their possibility space):

In chess, each turn presents, on average, about thirty-five possible moves; in Go, that number is about 250. Multiply thirty-five by itself four times to calculate the number of possible four-move chess sequences, say, and you’ve got some 1.5 million. Multiply 250 by itself four times and you’ve got nearly four billion. The gulf only widens from there. Compounding the complexity, the average chess game lasts about forty moves per player; the average Go game lasts about a hundred.

These are not calculated exactly the same way as the number for Checkers, but as an approximation (considering the average number of moves available and the average length of a game), we have around 5×10⁶¹ possible unique games of Chess and 6×10²³⁹ possible unique games of Go (calculating the possible arrangements on a Go board—361 positions that can each be black, white, or empty—gives 1.7×10¹⁷²). Not all of these arrangements will be valid, of course; on the other hand, because Go pieces are placed rather than moved as in Checkers or Chess, the requirement for a consistent game history prunes less of the possibility space. The astronomical complexity of these games means a database of solved endgames like Chinook's wouldn't be as much of an edge. I'll say more about the techniques that were used later in this post.
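
These estimates are easy to reproduce with Python's arbitrary-precision integers; a quick back-of-the-envelope check of the figures above:

```python
# Rough game-count estimates: (average branching factor) ** (typical game length),
# matching the approximations discussed above.
chess_games = 35 ** 40     # compare the ~5×10^61 quoted above
go_games = 250 ** 100      # compare the ~6×10^239 quoted above
go_positions = 3 ** 361    # each of 361 points is black, white, or empty: ~1.7×10^172

for name, n in [("chess games", chess_games),
                ("Go games", go_games),
                ("Go board arrangements", go_positions)]:
    print(f"{name}: ~{float(n):.1e}")
```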

The greater complexity of these games meant that the seminal man-vs-machine match-ups occurred later than they did for Checkers. For Chess, Roeder discusses the match between IBM's Deep Blue and Grandmaster Garry Kasparov in 1997, in which Deep Blue won 2 games, lost 1, and drew 3. He also discusses some earlier history: the Mechanical Turk (which was a fraud) and El Ajedrecista (which worked, but was restricted to a simple end-game scenario). For Go, the match-up he focused on took place in 2016, between Lee Sedol, an 18-time international champion, and Google's AlphaGo. There is a documentary about this match. The program won 4 games and lost 1, but the game that Lee Sedol won is considered a masterpiece of play.

The rest of the games in Seven Games incorporate elements of chance and imperfect information, which makes them much more challenging targets for AI. Still, there has been steady progress.

For Backgammon, Roeder discusses a match between a program called TD-Gammon and a human player (Robertie) in 1991. The current reigning computer program (eXtreme Gammon) is a successor; both use an artificial neural network approach.
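
TD-Gammon's core idea was temporal-difference learning: after each move, nudge the value estimate of the previous position toward the estimate of the new position, so that predictions become self-consistent and, at the game's end, consistent with the actual result. A minimal table-based TD(0) sketch on a toy random-walk task (TD-Gammon itself used a neural network and the TD(λ) variant, so this only shows the update rule):

```python
import random

# Toy domain: a 5-state random walk. From states 1-5 you step left or right at
# random; reaching state 6 pays 1, reaching state 0 pays 0. The true value of a
# state is the probability of finishing at the right-hand end.
ALPHA = 0.1                            # learning rate
values = [0.0] + [0.5] * 5 + [0.0]     # initial guesses for states 0..6

for episode in range(10_000):
    state = 3
    while state not in (0, 6):
        next_state = state + random.choice((-1, 1))
        reward = 1.0 if next_state == 6 else 0.0
        target = reward if next_state in (0, 6) else values[next_state]
        # TD(0) update: move V(s) toward the one-step lookahead estimate.
        values[state] += ALPHA * (target - values[state])
        state = next_state

print([round(v, 2) for v in values[1:6]])  # approaches [0.17, 0.33, 0.5, 0.67, 0.83]
```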

When it comes to Poker, Roeder doesn't cover a seminal match-up, but estimates that there were programs that could be competitive against good human players shortly before 2010. He lists Polaris as an example. A more recent program, Cepheus, plays "almost perfect" heads-up limit poker as of 2015. There are many variants of Poker, however, and the ones that are no-limit or involve more than 2 players are more complex:

Poker’s mathematical complexity rivals that of chess—or exceeds it, depending on the variant—and poker adds randomness and hidden information, hewing it more closely to the “real world” that AI researchers so badly want to influence.
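
For the curious, Cepheus was built on a family of algorithms called counterfactual regret minimization (CFR). Its kernel is regret matching: play each action with probability proportional to the accumulated regret for not having played it, and the strategy converges toward the best available response. A toy sketch against a biased rock-paper-scissors opponent (real CFR applies this update at every information set during self-play; this only shows the kernel):

```python
import random

ACTIONS = ("rock", "paper", "scissors")

def payoff(mine: str, theirs: str) -> int:
    """+1 for a win, 0 for a tie, -1 for a loss."""
    beats = {"rock": "scissors", "paper": "rock", "scissors": "paper"}
    if mine == theirs:
        return 0
    return 1 if beats[mine] == theirs else -1

regrets = {a: 0.0 for a in ACTIONS}

def strategy() -> dict:
    """Regret matching: probabilities proportional to positive accumulated regret."""
    positive = {a: max(r, 0.0) for a, r in regrets.items()}
    total = sum(positive.values())
    if total == 0:
        return {a: 1 / 3 for a in ACTIONS}
    return {a: p / total for a, p in positive.items()}

for _ in range(100_000):
    probs = strategy()
    mine = random.choices(ACTIONS, weights=[probs[a] for a in ACTIONS])[0]
    theirs = random.choices(ACTIONS, weights=[0.5, 0.3, 0.2])[0]  # rock-heavy opponent
    earned = payoff(mine, theirs)
    # Regret for action a: what a would have earned minus what we actually earned.
    for a in ACTIONS:
        regrets[a] += payoff(a, theirs) - earned

print(strategy())  # piles probability onto paper, the best response to a rock-heavy opponent
```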

There are 3.2 million (i.e. 3.2×10⁶) distinct possibilities for Scrabble draws, and its complexity grows further when you consider the number of words that can be anagrammed from a set of drawn tiles plus all of the options for where to place the word on the board. In 1998, a notable Scrabble game was played at the New York Times building between a program called Maven and a human player; that was not Maven's debut, however, as it had played in tournaments as early as 1986. The top modern Scrabble program is apparently one called Quackle.

This aspect of the game, remembering and finding the words, which requires years of Scrabble-training effort to perfect, is precisely what computers are built for. But finding words in jumbles of tiles, while essential, isn’t the only important Scrabble skill. First, Scrabble, like life, is a trade-off between today and tomorrow—between spending and saving. It’s what an economist would call a dynamic programming problem. Therefore, one must have an accurate sense of the value of the tiles and of leaves. One must be adept at “grooming” one’s rack for big scores in the future, while also scoring as well as possible in the present.

Basically, from 1990 to 2020 all of these games went from having non-existent or not very good computer programs to ones that can win even against top-rated players. The one exception might be Bridge—probably due to a mix of its complexity and relatively limited popularity. Roeder calculates that there are 5.3×10²⁸ possible deals of a single Bridge hand. Of course, the bidding system explained above and the need to understand your partner very well add layers of extra strategy on top of simply playing out the cards.

IV.

Now that I've written about the games and their strategies, and the timeline of the development of competitive computer programs for them (which seems to be correlated with the complexity of each game), I want to cover what we can learn from this book about AI research. This is the aspect that makes Seven Games such a timely new book.

The history of how each game was attacked provides a decent overview of several approaches/techniques that have been applied in AI research. First of all, searching quickly through rapidly-branching possible sequences of moves (and going deeper than a human mind can manage without losing track of all the branches) is something computers excel at:

So how does a computer play a game? Imagine standing at the base of a very tall tree, looking up. The tree is all the possible futures of a game. The trunk represents your next move, a big limb some possible move after that, the smaller branches some moves later, and the countless tiny twigs and leaves way at the top a continuation of possible moves in the distant futures of the game—the endgames. Humans stare up at the tree and are reminded of trees we’ve climbed, trees we’ve seen, and trees our friends have told us about before. We have an innate, primal sense about which limbs can bear our weight and which branches will bend under pressure, and we know which twigs seem hardy. We remember the times we fell, and how we made it to the top. We write down which branches are safe and which are risky, and we share this knowledge with our fellow humans. We climb trees—that is, we play games—through our intuition, experience, community, and literature. Computers, on the other hand, have no such intuition for the tree. But they can climb all over the place, very fast, like a colony of ants. This is called search. At each point on the tree that they happen to arrive, they perform a small calculation, assessing that location’s quality and awarding it a score. This is called evaluation. Before any move in a game like checkers, a computer’s ants might climb to millions of places on the tree, collecting calculations. If one route upward returns higher scores, that’s where the computer will head. Computers climb trees—that is, they play games—by searching and evaluating, searching and evaluating, searching and evaluating.
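
In code, "search and evaluate" usually means some variant of minimax: explore the tree of moves recursively, score the positions at the bottom, and back those scores up assuming each side picks its best option. A complete toy example for tic-tac-toe, which is small enough to search to the end of the game (real checkers and chess engines stop at a fixed depth and call a heuristic evaluation function instead):

```python
# A runnable search-and-evaluate player for tic-tac-toe. The game is small enough
# to search to the very end, so the "evaluation" here is just the final result.
def winner(board: str) -> str | None:
    lines = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
             (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]
    for i, j, k in lines:
        if board[i] != "." and board[i] == board[j] == board[k]:
            return board[i]
    return None

def minimax(board: str, player: str) -> int:
    """Backed-up score from X's point of view, assuming both sides play perfectly."""
    w = winner(board)
    if w is not None:
        return 1 if w == "X" else -1
    moves = [i for i, c in enumerate(board) if c == "."]
    if not moves:
        return 0  # board full with no winner: a draw
    nxt = "O" if player == "X" else "X"
    scores = [minimax(board[:i] + player + board[i + 1:], nxt) for i in moves]
    return max(scores) if player == "X" else min(scores)

print(minimax("." * 9, "X"))  # 0: tic-tac-toe is a draw under perfect play
```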

This approach is especially good for early-game and end-game situations, where the possibility space is more limited. For example, if a computer works backwards from endgame states and builds a database, those endgames can be played from memory instead of being calculated each time; high-level human players also spend a lot of time studying endgame scenarios, but the possibilities multiply very quickly. This was a significant part of Chinook's edge at Checkers:

Much better would be to work out the correct endgame plays ahead of time and bake this knowledge into Chinook. It’s easy at first. With one piece on a checkerboard, a trivial endgame situation, there are just 120 possible positions—a checker could be on one of twenty-eight squares, a king could be on one of thirty-two squares, and the piece could be either white or black. With two pieces, though, there are seven thousand. With three pieces there are more than a quarter million. With four pieces, seven million; five pieces, one hundred and fifty million; six pieces, two and a half billion; seven pieces, thirty-five billion.
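
The technique here, tabulating endgames ahead of time by working backwards from terminal positions, can be illustrated on a much smaller game. A toy sketch for a subtraction game (take 1, 2, or 3 counters; whoever takes the last one wins), not Chinook's actual machinery:

```python
# Build a complete "endgame database" for a subtraction game by working backwards.
# Position n = counters remaining; a move removes 1-3; taking the last counter wins.
N = 30
win = [False] * (N + 1)   # win[n]: can the player to move force a win from n?
for n in range(1, N + 1):
    # n is winning iff some move leads to a position that's losing for the opponent.
    win[n] = any(not win[n - take] for take in (1, 2, 3) if take <= n)

losing = [n for n in range(1, N + 1) if not win[n]]
print(losing)  # [4, 8, 12, ...]: multiples of 4 are lost for the player to move
```

Once the table exists, playing the endgame perfectly is just a lookup, which is exactly why Chinook's pre-computed databases were such an advantage.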

As the complexity of a game increases, though, even computers have to work smarter, not harder. Claude Shannon, the famous computer/information theorist, contrasted two possible approaches:

Shannon’s paper described two strategies for programming a computer to play chess. Type A programs employed brute-force search and evaluation, analyzing every possible move, and all possible subsequent moves, no matter how unpromising they might look to a human observer. In other words, they exploited the computer’s chief strength: sheer calculation. Shannon, however, thought such a player would be “both slow and weak.” Type B programs were more selective, examining only certain small yet promising branches on chess’s enormous tree. In other words, they exploited something like human intuition. Early computer scientists thought the successful players would be Type B, but with the rapid expansion of computing power, computer scientists were lured into pursuing Type A.

Shannon's Type B would need to incorporate expert knowledge/heuristics to focus on evaluating the most promising moves. This approach, a "combination of speedy calculation within human-defined parameters," is sometimes called "good old-fashioned AI," according to Roeder (he uses the abbreviation GOFAI).
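
A Type B program grafts knowledge onto the search. One simple version: score the legal moves with a handcrafted heuristic and only search the few most promising ones (a "beam"). A sketch reusing the tic-tac-toe minimax from above (the heuristic here is a toy stand-in, and pruning this way risks discarding the objectively best move, which is exactly the Type B trade-off):

```python
def heuristic_score(board: str, move: int) -> float:
    # Stand-in for handcrafted expert knowledge; for tic-tac-toe, prefer the
    # centre, then the corners. Real engines encode far richer heuristics.
    return {4: 2.0, 0: 1.0, 2: 1.0, 6: 1.0, 8: 1.0}.get(move, 0.0)

def type_b_minimax(board: str, player: str, beam: int = 2) -> int:
    """Like minimax() above, but searches only the `beam` most promising moves."""
    w = winner(board)  # winner() from the earlier tic-tac-toe sketch
    if w is not None:
        return 1 if w == "X" else -1
    moves = [i for i, c in enumerate(board) if c == "."]
    if not moves:
        return 0
    # The Type B step: rank moves with the heuristic and discard the rest.
    moves = sorted(moves, key=lambda m: heuristic_score(board, m), reverse=True)[:beam]
    nxt = "O" if player == "X" else "X"
    scores = [type_b_minimax(board[:i] + player + board[i + 1:], nxt, beam)
              for i in moves]
    return max(scores) if player == "X" else min(scores)
```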

One of the strongest current chess programs, called Stockfish, was developed following this approach. It has the additional innovation of crowdsourced iteration:

The program, which debuted in 2008, is open-source, meaning you can easily browse through its source code yourself. There you will find things like “polynomial material imbalance parameters,” followed by lists of numbers typed in with care by a human. Anyone in the world can submit a potential improvement to the code via a system called Fishtest. That potential improvement is then tested against the old version over tens of thousands of games on volunteers’ computers. If the tweak does indeed improve Stockfish in a statistically significant way, it is officially implemented, and the program grows stronger still. Stockfish is, therefore, not only handcrafted but crowdsourced—a sort of meritocratic populist technocracy of chess.
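
As a toy illustration of that acceptance step: given the results of many new-version-versus-old games, ask whether the observed score is far enough above 50% to be statistically convincing. The real Fishtest uses a sequential test (SPRT) that stops as soon as the evidence is decisive; this fixed-sample sketch, with an invented function name and made-up results, just shows the flavour:

```python
from math import sqrt

def looks_like_an_improvement(wins: int, losses: int, draws: int,
                              z_crit: float = 2.58) -> bool:
    """Crude fixed-sample test: is the patch's score significantly above 50%?"""
    n = wins + losses + draws
    score = (wins + 0.5 * draws) / n          # each draw counts as half a point
    # Treat per-game scores as i.i.d. and estimate their variance directly.
    var = (wins * (1 - score) ** 2 + losses * (0 - score) ** 2
           + draws * (0.5 - score) ** 2) / n
    z = (score - 0.5) / sqrt(var / n)
    return z > z_crit

print(looks_like_an_improvement(wins=5200, losses=4800, draws=10000))  # True
```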

But GOFAI is being edged out by a new paradigm: machine learning (ML). In this approach, instead of encoding expert insights or databases of early- and end-game states, large neural networks are applied to pattern recognition in truly enormous datasets. Stockfish was convincingly beaten in 2017 by AlphaZero (from the same team that made AlphaGo), a program using ML. By leveraging its learned sense of which moves are most promising, "AlphaZero searches only about eighty thousand positions per second, compared to Stockfish’s seventy million." However, it has a very costly training process (in computational terms, and also electricity). AlphaZero learns by playing against itself, and in doing so it independently re-discovered a lot of textbook openings, such as the Queen's Gambit and the Sicilian Defense.
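
The AlphaZero family couples its neural network to Monte Carlo tree search: during search, the children of a node are ranked by a rule (usually called PUCT) that balances a move's backed-up value, the network's prior belief in it, and how little it has been explored. A sketch of just that selection rule, with made-up statistics:

```python
from math import sqrt

def puct_score(q: float, prior: float, parent_visits: int, child_visits: int,
               c_puct: float = 1.5) -> float:
    """Value term plus an exploration bonus that decays as a move gets visited."""
    return q + c_puct * prior * sqrt(parent_visits) / (1 + child_visits)

# Hypothetical statistics for three candidate moves at one node of the search tree.
moves = {
    "a": dict(q=0.55, prior=0.40, visits=120),  # strong and already well explored
    "b": dict(q=0.50, prior=0.35, visits=30),
    "c": dict(q=0.20, prior=0.25, visits=2),    # weak so far, but barely explored
}
parent = sum(m["visits"] for m in moves.values())
chosen = max(moves, key=lambda k: puct_score(moves[k]["q"], moves[k]["prior"],
                                             parent, moves[k]["visits"]))
print(chosen)  # "c": the bonus term keeps the barely-explored move in contention
```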

Machine learning can be thought of as a way of recognizing and applying patterns that comes at things from the opposite direction to traditional statistical approaches. To be specific, when applying something like a regression model, you need to start with some assumed structure (e.g. linear, quadratic, exponential, logistic, etc.) and try to explain the bulk of the variability in the data with as few parameters/degrees of freedom as possible; going to too high a degree of polynomial without a justified reason for doing so (i.e. knowledge of causal mechanisms) can result in overfitting (discussed in this post), where the model matches the calibration data well but extrapolation or interpolation can yield nonsensical results.

In contrast, machine learning (at least the dominant neural-network-based approach) avoids assuming an underlying structure and uses an extremely large number of parameters. Overfitting is still a potential pitfall, but given that ML is usually applied to enormous (and massively multi-dimensional) datasets, the profusion of degrees of freedom can pick up on even relatively minor patterns. ML models can also match multiple inputs to multiple outputs, versus a few inputs to a single output in a traditional regression. Something to be careful of: in a traditional regression you can usually decipher meaning from the parameter values, whereas in machine learning the interpretation of parameter weights is basically opaque—you don't get a clear understanding of how inputs map to outputs, so the model's behaviour can catch you by surprise.
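
To make the overfitting point concrete, here is a quick numpy sketch: fit the same noisy, genuinely linear data with a low-degree and a high-degree polynomial, then compare their behaviour just outside the calibration range:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 12)
y = 2 * x + 0.5 + rng.normal(0, 0.1, x.size)   # truly linear data plus noise

linear = np.polyfit(x, y, deg=1)   # matches the true structure
wiggly = np.polyfit(x, y, deg=9)   # enough degrees of freedom to chase the noise
# (numpy may warn that the degree-9 fit is poorly conditioned, which is rather the point)

x_new = 1.2                        # slight extrapolation beyond the data
print(np.polyval(linear, x_new))   # close to the true value of 2.9
print(np.polyval(wiggly, x_new))   # typically far from it
```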

Basically, ML is a 'black-box' technique: you can see the inputs and outputs, but in between it's not clear what is going on, or at least it can't easily be related to a mechanistic explanation. Other data science tools, in contrast, can provide explanatory insight into a problem (e.g. in principal component analysis it can be determined how strongly the original variables contribute to the new major axis) or take advantage of known causal mechanisms (e.g. a Kalman filter incorporates assumptions from a physical model of the system). A data scientist friend of mine that I've talked to about this subject prefers techniques that aren't black boxes, at least for systems that interact with the physical world, for reasons such as these.
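
For instance, with scikit-learn you can read off exactly how strongly each original variable contributes to each principal component, the kind of explanatory insight a large neural network doesn't offer. A small sketch on made-up data:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
# Made-up dataset: three measurements, the first two strongly correlated.
base = rng.normal(size=200)
data = np.column_stack([base + 0.1 * rng.normal(size=200),
                        base + 0.1 * rng.normal(size=200),
                        rng.normal(size=200)])

pca = PCA(n_components=2).fit(data)
# Each row of components_ gives the weight of every original variable in that axis,
# so the major axis is directly interpretable (here: "variables 1 and 2, together").
print(pca.components_)
print(pca.explained_variance_ratio_)
```

And here's what Roeder has to say: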

AlphaGo can win at Go, but it can’t explain why it played the way it played. It can do, but it can’t teach. And maybe that’s fine for a Go player. But consider a structural engineer or a medical doctor, say, on the receiving end of some advice from a machine-learning system.

In spite of these 'black-box' limitations, ML is a very powerful technique that is being applied to more and more use cases. One crucial example is AlphaFold (by the same team that made AlphaGo), which has made breakthroughs in protein folding (I plan to have more to say about this in a future post). Another area of active research for applying AI/ML is fusion energy. Needless to say, a breakthrough there would be an absolute game-changer.

One of the reasons games have been a venue for a lot of early ML successes relates to the availability of training data. Machine learning models get 'trained' by tuning their millions or billions of parameters to get inputs to match the correct outputs. This process requires a lot of computing resources (i.e. time on super-computers) and electricity. It also requires (generally—there might be exceptions in techniques used at the cutting edge) good-quality data where the inputs and outputs are correctly matched up. When it comes to games, training data can be generated by having the computer play itself a mind-boggling number of times, bootstrapping a sense of strategy by finding patterns in what types of moves led to wins or losses. Generating training data this way, while perhaps costly in terms of electricity and computer time, is still far cheaper than collecting data from the real world and validating its correctness. For this reason, I expect tasks that can be done entirely on a computer to be more susceptible to AI than tasks that interface with the physical world.
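
A minimal sketch of that data-generation loop, playing random games of tic-tac-toe (reusing the winner() helper from the minimax sketch earlier) and labelling every position with the final result, i.e. the (input, output) pairs a model could then be trained on:

```python
import random

def self_play_game() -> list[tuple[str, int]]:
    """Play one random game; label every position with the final result."""
    board, player, history = "." * 9, "X", []
    while winner(board) is None and "." in board:
        move = random.choice([i for i, c in enumerate(board) if c == "."])
        board = board[:move] + player + board[move + 1:]
        history.append(board)
        player = "O" if player == "X" else "X"
    w = winner(board)
    outcome = 0 if w is None else (1 if w == "X" else -1)
    return [(pos, outcome) for pos in history]   # training pairs: position -> result

# Generate a training set far faster (and cheaper) than collecting real-world data.
dataset = [pair for _ in range(10_000) for pair in self_play_game()]
print(len(dataset), dataset[0])
```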

Seven Games provides some quantification of just how costly these ML models are to develop (although that may decrease with further research):

The human brain, yours and mine and Lee Sedol’s, is unbelievably complex. It’s home to some hundred billion neurons that make some hundred trillion connections. The human brain is also remarkably efficient. It operates on about twenty watts of power—barely enough to power a dim lightbulb. AlphaGo, on the other hand, required somewhat more resources.
...
An applied mathematician named Aidan Rocke estimated that merely training one version of AlphaGo had a footprint of ninety-six metric tons of carbon dioxide—roughly equivalent to a thousand hours of air travel or a year’s worth of electricity usage for twenty-three American homes. An engineer named Dan Huang estimated that replicating a certain forty-day experiment by DeepMind to train AlphaGo would cost $35 million. Put another way, it took the equivalent of nearly thirteen thousand human brains running continuously.

AlphaGo is getting a lot of my attention in this post, both because I like the game it was applied to and because the team behind it has a truly impressive track record. However, the approach of getting the computer to play games against itself to generate training data has been used by other programs too:

It might console humanity to learn that Cepheus, to get as good as it did, played more training hands of poker against itself than the entire human race has played during the entire history of time, times ten.

I'll return to this topic of the need for good-quality training data near the end of this post, but for now I want to pivot to another topic Roeder covers in Seven Games: how human players have adapted to computers getting so good. He views these adaptations, I think correctly, as a bit of a preview of how the spread of AI/ML programs to more endeavours might affect various jobs and society at large.

One adaptation when competing against a program is to do something unexpected—trying to make a play that's outside of the program's expert heuristics or training data:  

But it was a meta-gambit by Kasparov, something known as anti-computer chess. The idea was that by playing something esoteric, a human could kick the computer out of its opening book—its extensive baked-in knowledge of opening theory—and thereby gain an advantage.

Another adaptive strategy is to work with the computer, such as by training against a program that is as strong or stronger than any human opponent you're likely to face:

As for machines and chess, Kasparov has famously promoted the idea of “advanced chess”: that a computer and a human sitting side by side, a cyborg playing the pieces together, is stronger than any machine on its own. A certain sort of centaur chess, a close coupling of human and machine, may also help explain a third trend visible in Campbell’s data: the sharp increase in the late 2010s in the performance of the new top human, in this case Magnus Carlsen. Kasparov was right that computers would be tools for chess players. They’re tools the way weapons in an arms race are tools. The other guy has them, so if you want to compete, you must have them. And the world’s best player has embraced a new, powerful tool.
...
In stark ways, the prevalence of superhuman chess machines in the world of professional chess is a glimpse into our own civilian future, when AI technologies will seep into our personal and professional lives, and where the only way to make a living in many fields will be to work side by side with an artificially intelligent machine. In chess, that future is here. The computers have “increased pure understanding of the game at the expense of creativity, mystery, and dynamism,” writer Yoni Wilkenfeld put it in a recent essay titled “Can Chess Survive Artificial Intelligence?” Gone is Capablanca’s swashbuckling “battle of ideas.” The computer will tell you, within seconds, if your ideas are right or wrong. The sole source of originality in chess is now the machine, and humans struggle to channel it, or at least to mimic it.

The term 'centaur' used here really stood out to me. I also liked the emphasis that these programs are still tools. Their winning records can be impressive, even daunting, but they lack agency of their own. An interesting question is whether they will be tools that everyone has access to, or if the vanguard of the arms race will be kept exclusive such that it will be very difficult to be on top of your game (vis-à-vis other human players) if you don't have access to the latest and greatest software for practice and study.

At the same time, having all the top players merely trying to replicate the moves that a program has determined are near-optimal can result in rather boring gameplay. This is perhaps more noticeable in some games than in others:

Here lies a paradox. On the one hand, professional poker needs engaging, human personalities in order to attract new players to repopulate its ecosystem. But today’s best players are often bland acolytes of the silicon solvers.  

Figuring things out for oneself can also be a lot more satisfying:

“One of my theories in games is much of the wonder and the pleasure is in figuring out for yourself—that’s why you do it,” C. Thi Nguyen, a philosopher at the University of Utah and the author of Games: Agency as Art, told me. “If I want to win, a computer has figured out the optimal strategy for me to win. I don’t understand it, but if I follow these rules, then I’m going to win. I think that takes something really important away from us.”

This quote comes back to some of the limits of black-box models, too.

I'll wrap up the book review portion of this post with a final excerpt from Seven Games that relates some of these issues back to the world beyond games:

Nguyen does worry, however, about the technological gamification of the real world. Modern AI systems require gobs of data to train, which makes easily collectible and cheap data attractive. These easy, quantified metrics can stand in for success in fields whose aims are in reality complicated and subtle: clicks for journalism, steps for exercise, box office for cinema, auction prices for paintings. Nguyen calls this “value collapse”—when rich, subtle values are replaced by simplified, quantified versions of those values. In games, this is OK—a success in a game is indeed simple to measure (win, loss, draw) and, indeed, this simplicity is why they’ve received so much attention from computer scientists. But for modern AI, serious, real-world tasks—facial recognition, self-driving—might as well be games, trained as neural networks are to maximize relentlessly some numerical reward. To a hammer, everything looks like a nail. In this sense, the impressive prowess of AI at games might be less success story and more cautionary tale. Suppose a would-be auteur developed an AI to write a good series for Netflix. They used—sensibly, it may seem—engagement hours as part of their training data. But in so doing they’ve optimized their show for addiction, not aesthetic value.  

V.

Having discussed Seven Games and its themes, especially as they relate to AI/ML research, I wanted to finish this post with some other considerations on this rapidly-unfolding field. I'm very far from an expert on this, but I've tried to stay abreast of some prominent developments and can at least offer some links to explore further.

One of my take-aways from Roeder's book regarding machine learning systems was the need for good-quality training data. With games, as discussed above, it can be generated by having the computer play against itself. The next easiest source is collecting data that has already been digitized (e.g. scraping the web). Going out and collecting data from the physical world involves the greatest time and expense. For this reason, I expect jobs that mostly revolve around digital content to be the most susceptible to disruption from AI in the near future. Conversely, a potentially expanding sector of jobs will be in producing high-quality digital data—think proof-reading or error-checking scanned/uploaded documents, or taking a variety of measurements in the real world to fill in gaps in databases. Additionally, companies will probably continue to find sneaky ways to crowd-source the validation of datasets (for the unaware, this has been one of the purposes of CAPTCHAs).

Something else to keep in mind regarding training data is the old rule "Garbage in, garbage out" (possibly mitigated somewhat by statistical techniques). For example, Large Language Models (LLMs) can generate text that sounds very natural. However, they're generally trained from what people have written all over the internet, so there's no guarantee that their written output will be good, true, or beautiful.

Machine learning tools for content generation (especially text and images) are getting a ton of buzz right now. Here is a good explainer article about the technology. It covers things at a general level, so it's useful background for understanding the various specific tools in this category. It explains how classification tasks (e.g. image recognition) and generation tasks (e.g. producing a new image) are closely related. It advises approaching these tools the way you would a search engine, crafting the right prompt/query to return the most relevant results; the difference is that a search engine returns results from actually-existing content, whereas these new ML tools search the 'latent space' of possible content. I have another how-to from the same author, Jon Stokes, shared below.

The way that ML content generation engines search through latent space reminds me of the Mirror of Galadriel, from Lord of the Rings: it "shows many things, and not all have yet come to pass". This article also uses a 'magical mirror' metaphor to discuss coming up with useful prompts for an ML model. It's important to keep in mind that this is just a metaphor: these models use powerful statistical techniques, not magic. Aside from tips on crafting useful prompts, the article has valuable cautions about the limitations of these ML models:  

Remember, the language model is less “crystal ball” and more “Magic 8 Ball.” Its function is to predict the next word, not to give accurate answers. Even if it doesn’t know the answer, or there is no sensible answer, it will keep coming up with words.  
Language models don’t have a dictionary of words and their definitions. Instead they have a high-dimensional semantic space where words that are similar to each other are close, and words that are dissimilar are farther apart. The “meaning” of any given word (or subword token) is its position in relation to all other tokens. Connotations, not definitions.  
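
That "position in a semantic space" idea is easy to demo with toy vectors: represent each word as a vector and measure closeness with cosine similarity. These three-dimensional vectors are invented for illustration; real models learn embeddings with hundreds or thousands of dimensions:

```python
import numpy as np

# Invented toy embeddings; real models learn these, in far higher dimensions.
emb = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.7, 0.2]),
    "apple": np.array([0.1, 0.2, 0.9]),
}

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """1.0 = same direction (similar meaning); near 0 = unrelated."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(emb["king"], emb["queen"]))  # high: nearby in the space
print(cosine(emb["king"], emb["apple"]))  # low: far apart
```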

Because these content engines are tools without agency of their own, the key ethical issues relate to how they are used. As mentioned above, one possible misuse would be to gain an unfair edge over people who don't have access to the latest and greatest version. Another unethical use is for deceit, as in 'deepfakes' and similar—generating images or even videos of things that "have not yet come to pass" in an effort to convince people they did happen. This article goes into detail about that danger and related ones:

To get started, a person or group training the model gathers images with metadata (such as alt tags and captions found on the web) and forms a large data set. In Stable Diffusion's case, Stability AI uses a subset of the LAION-5B image set, which is basically a huge image scrape of 5 billion publicly accessible images on the Internet. Recent analysis of the data set shows that many of the images come from sites such as Pinterest, DeviantArt, and even Getty Images. As a result, Stable Diffusion has absorbed the styles of many living artists, and some of them have spoken out forcefully against the practice.  
...  
While concerns about data set quality and bias echo strongly among some AI researchers, the Internet remains the largest source of images with metadata attached. This trove of data is freely accessible, so it will always be a tempting target for developers of ISMs. Attempting to manually write descriptive captions for millions or billions of images for a brand-new ethical data set is probably not economically feasible at the moment, so it's the heavily biased data on the Internet that is currently making this technology possible.  
Realistic image synthesis models are potentially dangerous for reasons already mentioned, such as the creation of propaganda or misinformation, tampering with history, accelerating political division, enabling character attacks and impersonation, and destroying the legal value of photo or video evidence. In the AI-powered future, how will we know if any remotely produced piece of media came from an actual camera, or if we are actually communicating with a real human? On these questions, Mostaque is broadly hopeful. "There will be new verification systems in place, and open releases like this will shift the public debate and development of these tools," he said.

In this excerpt, high quality training data is seen as a bottleneck, as discussed previously. But scraping images from the public internet can bring a lot of garbage in (and also get tangled up in some copyright concerns). The last paragraph anticipates an arms race between producing and detecting fake media. It's time to get serious about epistemology if you aren't already.

Finally, here are some other links for further reading and exploration:

  • About large language models as well as smaller specialized ones.
  • Using text as the interface makes these models very generalizable.  
  • There is a current cheating scandal/controversy in chess involving alleged assistance/advice from a program in grandmaster games.
  • Potential impacts of this technology in the geopolitical arena.  
  • A sustainability perspective on the sheer cost and energy usage to train some of these cutting-edge models.
  • Want to try using an ML content engine? Here's tips on getting started with one for text and one for images. And here's a free non-cutting-edge image generator you can use online without an account.
  • I am once again asking you not to do this.
