An engineer playing around with ChatGPT found OpenAI’s supposedly world-leading LLM getting a bit bolshy about how it would do at chess. In fact, ChatGPT itself asked Citrix engineer Robert Caruso to set it up against a basic chess program to see “how quickly” it could win, and then proceeded to get battered by an Atari 2600.
First things first: chess engines are now unquestionably superior to human players, and an off-the-shelf program like Stockfish will handily trounce the best in the world. There are also AI-based chess engines from the likes of DeepMind. And ChatGPT 4o, the latest model, may be a leader among LLMs, but it’s not a chess engine.
Still, you might expect something a bit more impressive than this. Talking to ChatGPT about the history of AI in chess “led to it volunteering to play Atari Chess,” said Caruso on LinkedIn. “It wanted to find out how quickly it could beat a game that only thinks 1-2 moves ahead on a 1.19 MHz CPU.”
And?
“ChatGPT got absolutely wrecked on the beginner level,” says Caruso. “Despite being given a baseline board layout to identify pieces, ChatGPT confused rooks for bishops, missed pawn forks, and repeatedly lost track of where pieces were, first blaming the Atari icons as too abstract to recognize, then faring no better even after switching to standard chess notation. It made enough blunders to get laughed out of a third grade chess club.”
Video Chess is as basic as chess software comes, which is simply a function of its era: the main challenge for the programmers was creating a working engine within 4KB (which was still double the standard 2KB for other VCS games). It essentially brute-forces the best move in a given position, but lacks an overall strategy and doesn’t think far ahead.
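For readers curious what a shallow brute-force search looks like, here is a minimal Python sketch of depth-limited minimax on a made-up, five-position game tree. It is not Video Chess’s actual code, and nothing here is taken from the Atari cartridge: `TREE`, `VALUES`, `legal_moves`, `play` and `evaluate` are hypothetical names invented purely for illustration.

```python
# Toy game tree (an assumption for illustration, not a real chess position).
# Each position maps move names to the position that move leads to.
TREE = {
    "root": {"grab": "a", "quiet": "b"},
    "a": {"reply": "a1"},
    "b": {"reply": "b1"},
}

# Static evaluation of each position from the searching player's point of view.
VALUES = {"root": 0, "a": 3, "b": 0, "a1": -5, "b1": 1}


def legal_moves(state):
    """Return the moves available in a position (none at a leaf)."""
    return list(TREE.get(state, {}))


def play(state, move):
    """Return the position reached by playing a move."""
    return TREE[state][move]


def evaluate(state):
    """Return the static score of a position."""
    return VALUES[state]


def minimax(state, depth, maximizing=True):
    """Search `depth` plies ahead and return (score, best_move)."""
    moves = legal_moves(state)
    if depth == 0 or not moves:
        # Out of search depth, or no moves left: fall back to the static score.
        return evaluate(state), None

    best_move = None
    best_score = float("-inf") if maximizing else float("inf")
    for move in moves:
        score, _ = minimax(play(state, move), depth - 1, not maximizing)
        better = score > best_score if maximizing else score < best_score
        if better:
            best_score, best_move = score, move
    return best_score, best_move


if __name__ == "__main__":
    print(minimax("root", depth=1))  # (3, 'grab')
    print(minimax("root", depth=2))  # (1, 'quiet')
```

Searching one ply deep, the sketch happily takes the “grab” move because it looks best right now; searching two plies deep, it sees the punishing reply and prefers “quiet.” That is the practical cost of a small search horizon, and it is why an engine that looks only a move or two ahead should be beatable by a decent player.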
A decent human player, in other words, should have a pretty easy time beating Video Chess. But for 90 minutes Caruso “had to stop [ChatGPT] from making awful moves and correct its board awareness multiple times per turn. It kept promising it would improve ‘if we just started over.’ Eventually, even ChatGPT knew it was beat, and conceded with its head hung low.”

ChatGPT itself asked for the game of chess against an Atari, “which it proclaimed it would easily win,” after a conversation about Stockfish and AlphaZero. The LLM was apparently “curious how quickly it could win” and, because Caruso had told it he was a weak player, “offered to teach me strategy along the way.”
The story isn’t entirely one-sided. Caruso says that when ChatGPT had an accurate sense of the board it offered him some “solid guidance” and at times was “genuinely impressive.” But at others, and this will be familiar to anyone who’s spent much time playing around with ChatGPT, “it made absurd suggestions… or tried to move pieces that had already been captured, even during turns when it otherwise had an accurate view of the board.”
Naturally the AI evangelists will be out in force to say this is meaningless, it’s not what LLMs are designed to do, and so on. But this does raise wider questions about the technology and particularly its understanding of context (or lack thereof). “Its inability to retain a basic board state from turn to turn was very disappointing,” says Caruso. “Is that really any different from forgetting other crucial context in a conversation?”
In a nod to Atari’s once-famous marketing slogan, Caruso signs off: “Have you played Atari today? ChatGPT wishes it hadn’t.”
