DeepMind’s new AI can perform over 600 tasks, from playing games to controlling robots

The ultimate achievement for some in the AI industry is creating a system with artificial general intelligence (AGI): the ability to understand and learn any task a human can. Long relegated to the realm of science fiction, AGI is envisioned as a system that can reason, plan, learn, represent knowledge, and communicate in natural language.

Not every expert is convinced that AGI is a realistic goal – or even possible. But one could argue that DeepMind, the Alphabet-backed research lab, made an attempt this week with the release of an AI system called Gato.

Gato is what DeepMind describes as a “general-purpose system” – a system that can be taught to perform many different types of tasks. DeepMind researchers trained Gato to complete 604 tasks, to be precise, including captioning images, engaging in dialogue, stacking blocks with a real robotic arm, and playing Atari games.

Jack Hessel, a research scientist at the Allen Institute for AI, points out that a single AI system that can solve many tasks is not new. For example, Google recently started using a system in Google Search called multitask unified model, or MUM, that can process text, images, and videos to perform tasks from finding interlingual variations in the spelling of a word to relating a search query to an image. But what is potentially newer here, Hessel says, is the diversity of the tasks being tackled and the training method.

[Image: DeepMind’s Gato architecture.]

“We’ve seen evidence before that individual models can handle surprisingly diverse sets of inputs,” Hessel said via email. “I think the key question when it comes to multitask learning is whether the tasks complement each other or not. You could imagine a duller case where the model implicitly separates the tasks before solving them, for example: ‘If I detect task A as input, I will use subnet A. If I detect task B instead, I will use another subnet B.’ For that null hypothesis, similar performance could be achieved by training A and B separately, which is disappointing. If training A and B together leads to improvements for either (or both!), then things get more exciting.”

Like all AI systems, Gato learned by example, ingesting billions of words, images from real and simulated environments, button presses, joint torques, and more in the form of tokens. These tokens served to represent data in a way that Gato could understand, allowing the system to – say – tease out the mechanics of Breakout, or which combination of words in a sentence might make grammatical sense.
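To make the idea concrete, here is a minimal sketch of how heterogeneous data can be flattened into one shared token sequence. The vocabulary sizes, offsets, and bin counts below are invented for illustration and do not reflect DeepMind’s actual encoding scheme:

```python
# Illustrative sketch: map text, image patches, and robot actions into
# disjoint integer ranges so one model can consume a single flat sequence.
# All constants here are hypothetical, not Gato's real values.

TEXT_VOCAB = 32000             # assumed text vocabulary size
IMAGE_OFFSET = TEXT_VOCAB      # image tokens occupy the next range
ACTION_OFFSET = IMAGE_OFFSET + 1024

def tokenize_text(word_ids):
    """Text tokens pass through unchanged (assumed already in [0, TEXT_VOCAB))."""
    return list(word_ids)

def tokenize_image(patch_values):
    """Quantize patch intensities in [0, 1] into 1024 bins, then offset."""
    return [IMAGE_OFFSET + min(int(v * 1024), 1023) for v in patch_values]

def tokenize_action(joint_torques, bins=256, lo=-1.0, hi=1.0):
    """Discretize continuous joint torques into `bins` buckets, then offset."""
    tokens = []
    for torque in joint_torques:
        frac = (min(max(torque, lo), hi) - lo) / (hi - lo)
        tokens.append(ACTION_OFFSET + min(int(frac * bins), bins - 1))
    return tokens

# One flat sequence, regardless of modality:
sequence = (tokenize_text([17, 532]) +
            tokenize_image([0.0, 0.5, 1.0]) +
            tokenize_action([-1.0, 0.0, 1.0]))
print(sequence)  # [17, 532, 32000, 32512, 33023, 33024, 33152, 33279]
```

Because every modality lands in its own slice of one integer space, the downstream model never needs to know which kind of data produced a given token.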

Gato doesn’t necessarily perform these tasks well. For example, when chatting with a person, the system often responds with a superficial or factually incorrect answer (e.g., “Marseille” in response to “What is the capital of France?”). When captioning photos, Gato sometimes gets people’s gender wrong. And the system stacks blocks correctly with a real robot only 60% of the time.

But on 450 of the 604 tasks listed above, DeepMind claims that Gato outperforms an expert more than half the time.

“If you think we need general [systems], which many people in the AI and machine learning field do, then [Gato is] a big deal,” Matthew Guzdial, an assistant professor of computer science at the University of Alberta, said via email. “I think people who say it’s a big step toward AGI are overhyping it a bit, because we’re still not at human intelligence and probably won’t get there anytime soon (in my opinion). I’m personally more in the camp of many small models [and systems] being more useful, but there are certainly advantages to these generalized models in terms of their performance on tasks outside of their training data.”

Oddly enough, from an architectural standpoint, Gato isn’t dramatically different from many of the AI systems in production today. It shares features with OpenAI’s GPT-3 in that it is a Transformer. Dating back to 2017, the Transformer has become the architecture of choice for complex reasoning tasks, demonstrating an aptitude for summarizing documents, generating music, classifying objects in images, and analyzing protein sequences.

[Image: Some of the various tasks Gato learned to complete.]

Perhaps more remarkably, Gato is orders of magnitude smaller than single-task systems, including GPT-3, in terms of the number of parameters. Parameters are the parts of the system learned from training data, and they essentially determine the system’s ability to handle a problem, such as generating text. Gato has only 1.2 billion, while GPT-3 has more than 175 billion.

DeepMind researchers purposely kept Gato small so that the system could control a robotic arm in real time. But they hypothesize that Gato — if scaled up — could tackle any “task, behavior, and embodiment of interest.”

Assuming this turns out to be the case, several other hurdles would have to be overcome to make Gato superior to advanced single-task systems at specific tasks, such as Gato’s inability to learn continuously. Like most Transformer-based systems, Gato’s knowledge of the world comes from its training data and remains static. If you ask Gato a date-sensitive question, such as who the current US president is, chances are it will answer incorrectly.

The Transformer – and Gato, by extension – has yet another limitation in its context window, or the amount of information the system can “remember” in the context of a given task. Even the best Transformer-based language models can’t write a long essay, let alone a book, without forgetting important details and losing track of the plot. The forgetting happens with any task, be it writing or controlling a robot, which is why some experts have called it the “Achilles’ heel” of machine learning.
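A fixed context window can be pictured as a sliding buffer: once it fills up, the oldest tokens simply fall out. This toy sketch (the window size is invented for the example) shows how a detail from early in a long sequence becomes invisible to later predictions:

```python
from collections import deque

# Toy illustration of a fixed-size context window. Real models use windows
# of hundreds or thousands of tokens, but the failure mode is the same:
# anything pushed out of the window is gone.
CONTEXT_SIZE = 4

window = deque(maxlen=CONTEXT_SIZE)  # oldest entries are evicted automatically
story = ["hero", "finds", "map", "sails", "north", "buries", "treasure"]

for token in story:
    window.append(token)

print(list(window))     # only the 4 most recent tokens survive
print("map" in window)  # the crucial early detail has been forgotten
```

By the end of the story, the model can still “see” the treasure being buried, but the map that set the plot in motion is no longer in its context at all.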

“It’s not like Gato makes new things possible,” Guzdial added, pointing to the system’s shortcomings. “[B]ut it makes it clear that we can do more with modern machine learning models than we thought.”