Are LLMs capable of non-verbal reasoning?
Processing in the "latent space" could help AI with tricky logical questions.

Kyle Orland, Senior Gaming Editor | Dec 12, 2024 4:55 pm

[Image: It's thinking, but not in words. Credit: Getty Images]

Large language models have found great success so far by using their transformer architecture to effectively predict the next words (i.e., language tokens) needed to respond to queries. When it comes to complex reasoning tasks that require abstract logic, though, some researchers have found that interpreting everything through this kind of "language space" can start to cause some problems, even for modern "reasoning" models.

Now, researchers are trying to work around these problems by crafting models that can work out potential logical solutions completely in "latent space," the hidden computational layer just before the transformer generates language. While this approach doesn't cause a sea change in an LLM's reasoning capabilities, it does show distinct improvements in accuracy for certain types of logical problems and points to some interesting directions for new research.

Wait, what space?

Modern reasoning models like ChatGPT's o1 tend to work by generating a "chain of thought." Each step of the logical process in these models is expressed as a sequence of natural language word tokens that are fed back through the model.

In a new paper, researchers at Meta's Fundamental AI Research team (FAIR) and UC San Diego identify this reliance on natural language and "word tokens" as a "fundamental constraint" for these reasoning models. That's because the successful completion of reasoning tasks often requires complex planning on specific critical tokens to figure out the right logical path from a number of options.

[Figure: The difference between standard models, which pass each step back through the transformer as text, and the COCONUT model's use of hidden, "latent" states. Credit: Training Large Language Models to Reason in a Continuous Latent Space]

In current chain-of-thought models, though, word tokens are often generated for "textual coherence" and "fluency" while "contributing little to the actual reasoning process," the researchers write. Instead, they suggest, "it would be ideal for LLMs to have the freedom to reason without any language constraints and then translate their findings into language only when necessary."

To achieve that "ideal," the researchers describe a method for "Training Large Language Models to Reason in a Continuous Latent Space," as the paper's title puts it. That "latent space" is essentially made up of the "hidden" set of intermediate token weightings that the model contains just before the transformer generates a human-readable, natural language version of that internal state.

In the researchers' COCONUT model (for Chain Of CONtinUous Thought), those kinds of hidden states are encoded as "latent thoughts" that replace the individual written steps in a logical sequence, both during training and when processing a query.
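In broad strokes, the idea is that instead of decoding each intermediate hidden state into a word token and re-encoding that token on the next pass, the model feeds the hidden state straight back in as the next input embedding. The snippet below is only a minimal conceptual sketch of that loop, assuming a generic Hugging Face causal language model; the model choice, the fixed number of latent steps, and reusing the final layer's last hidden state as the next input embedding are simplifying assumptions for illustration, not the researchers' actual training or inference setup.

```python
# Minimal sketch of "reasoning in latent space": instead of decoding a token
# at each step, the last hidden state is appended as the next input embedding.
# Illustrative only; gpt2 and num_latent_steps are placeholder choices.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # any causal LM works for the sketch
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

prompt = "Every apple is a fruit. Every fruit is food. Is an apple food?"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
embeds = model.get_input_embeddings()(input_ids)  # (1, seq_len, hidden_dim)

num_latent_steps = 4  # hypothetical: how many "latent thoughts" to take

with torch.no_grad():
    for _ in range(num_latent_steps):
        out = model(inputs_embeds=embeds, output_hidden_states=True)
        # The final layer's hidden state at the last position stands in
        # for one "latent thought" in this sketch.
        latent_thought = out.hidden_states[-1][:, -1:, :]
        # Instead of decoding it into a word token, append it directly as
        # the next input embedding and keep "thinking" in latent space.
        embeds = torch.cat([embeds, latent_thought], dim=1)

    # Only now drop back into "language space" and decode an answer token.
    logits = model(inputs_embeds=embeds).logits[:, -1, :]
    next_token = logits.argmax(dim=-1)
    print(tokenizer.decode(next_token[0]))
```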
Skipping that conversion to and from natural language at each step "frees the reasoning from being within the language space," the researchers write, leading to an optimized reasoning path that they term a "continuous thought."

Being more breadth-minded

While doing logical processing in the latent space has some benefits for model efficiency, the more important finding is that this kind of model can "encode multiple potential next steps simultaneously." Rather than having to pursue individual logical options fully and one by one (in a "greedy" sort of process), staying in the "latent space" allows for a kind of instant backtracking that the researchers compare to a breadth-first search through a graph (a toy sketch of that search pattern appears at the end of this article).

This emergent, simultaneous processing property comes through in testing even though the model isn't explicitly trained for it, the researchers write. "While the model may not initially make the correct decision, it can maintain many possible options within the continuous thoughts and progressively eliminate incorrect paths through reasoning, guided by some implicit value functions."

[Figure: Some of the ways different models can fail at certain types of logical inference. Credit: Training Large Language Models to Reason in a Continuous Latent Space]

That kind of multi-path reasoning didn't really improve COCONUT's accuracy over traditional chain-of-thought models on relatively straightforward tests of math reasoning (GSM8K) or general reasoning (ProntoQA). But the researchers found the model did comparatively well on a randomly generated set of ProntoQA-style queries involving complex and winding sets of logical conditions (e.g., "every apple is a fruit, every fruit is food," and so on).

For these tasks, standard chain-of-thought reasoning models would often get stuck down dead-end paths of inference or even hallucinate completely made-up rules when trying to resolve the logical chain. Previous research has also shown that the "verbalized" logical steps output by these chain-of-thought models "may actually utilize a different latent reasoning process" than the one being shared.

This new research joins a growing body of work that aims to understand and exploit the way large language models operate at the level of their underlying neural networks. And while that kind of research hasn't led to a huge breakthrough just yet, the researchers conclude that models pre-trained with these kinds of "continuous thoughts" from the get-go could "enable models to generalize more effectively across a wider range of reasoning scenarios."
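As promised above, here is a toy Python sketch of the breadth-first search behavior the researchers use as an analogy, applied to an explicit graph of made-up "every X is a Y" rules in the spirit of the ProntoQA-style queries. To be clear, this is a hypothetical illustration, not the paper's benchmark or code: COCONUT does not build an explicit graph, but the researchers' point is that its continuous thoughts appear to keep several partial chains alive at once, where a standard chain-of-thought model commits to one verbalized path at a time.

```python
# Toy illustration of breadth-first search over a chain of invented
# "every X is a Y" rules. Rule set and function names are made up here.
from collections import deque

# Each rule "every X is a Y" becomes an edge X -> Y.
rules = {
    "apple": ["fruit"],
    "fruit": ["food", "plant part"],   # branching: multiple candidate paths
    "plant part": ["organic matter"],
    "food": ["thing you can eat"],
    "rock": ["mineral"],               # irrelevant to the query; a dead end
}

def find_chain(start, goal):
    """Expand every open path one step at a time, keeping all partial
    chains alive instead of committing to a single one."""
    frontier = deque([[start]])
    seen = {start}
    while frontier:
        path = frontier.popleft()
        if path[-1] == goal:
            return path                      # a valid logical chain
        for nxt in rules.get(path[-1], []):  # paths with no rules die off
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(path + [nxt])
    return None

# "Every apple is a fruit, every fruit is food..." -- so is an apple food?
print(find_chain("apple", "food"))  # ['apple', 'fruit', 'food']
```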