Reasoning Model: Short Overview and Feature for Developers
towardsai.net
January 21, 2025. Last updated on January 21, 2025 by Editorial Team. Author(s): Igor Novikov. Originally published on Towards AI.

When LLMs first came out, they were kind of like children: they would say the first thing that came to mind and didn't bother much with logic. You had to tell them to think before they spoke. And, just like with children, even then it didn't mean they would think.

Many argued that, because of this, the models do not possess real intelligence and must be supplemented with either human help or some sort of external framework on top of an LLM, like Chain of Thought.

It was only a matter of time before major LLM developers like OpenAI decided to replicate this external thinking step (see picture below) inside an LLM. After all, it's pretty simple: create a dataset that contains not just question-answer pairs but the whole step-by-step logic, and train on that. The trade-off is that it requires more computation resources at inference time, as the model goes through the same step-by-step thinking process when determining the answer.

[Figure: Added thinking step. Image by OpenAI]

Reasoning models such as o1 natively break down problems into small pieces and integrate a Chain of Thought approach, error correction, and trying multiple strategies before answering.

o1 spends more time at inference (o1 is 30 times slower than GPT-4o) and, what a surprise, longer thinking time leads to better results!

[Image by OpenAI]

Reasoning tokens are not passed from one turn to the next; only the output is.

Also, the model verifies its solution by generating multiple answers and choosing the best via consensus, an approach that we used to implement manually.
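The consensus step mentioned above, in the form we used to implement by hand, can be sketched roughly as follows. This is a minimal illustration, not a description of how o1 works internally: `consensus_answer`, `normalize`, and `make_fake_sampler` are hypothetical names, and the "model" here is a stand-in that replays canned answers instead of calling a real API.

```python
from collections import Counter
from typing import Callable, List


def normalize(answer: str) -> str:
    """Canonicalize an answer so trivially different strings compare equal."""
    return answer.strip().lower().rstrip(".")


def consensus_answer(sample: Callable[[str], str], question: str, n: int = 5) -> str:
    """Sample n candidate answers and return the most common one (majority vote)."""
    candidates: List[str] = [normalize(sample(question)) for _ in range(n)]
    winner, _count = Counter(candidates).most_common(1)[0]
    return winner


def make_fake_sampler(canned: List[str]) -> Callable[[str], str]:
    """Hypothetical stand-in for a model call: replays canned answers in order."""
    answers = iter(canned)
    return lambda _question: next(answers)


if __name__ == "__main__":
    # Assumption: five independent samples of the same question.
    sampler = make_fake_sampler(["42", "41", " 42.", "42", "40"])
    print(consensus_answer(sampler, "What is 6 * 7?"))
```

In a real setup, `sample` would wrap a model call with a non-zero temperature so that the candidates actually differ. Majority voting only works when answers can be compared for equality, which is why the sketch normalizes them first.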
Here is the overall process:

[Image by OpenAI]

One important conclusion is that GPU computation requirements are going to grow. Longer thinking time (in tokens) leads to better answers, so it is possible to scale model quality just by giving the model more computing power at inference, whereas before this was mostly true of the training phase. GPU requirements for modern models are therefore going to rise significantly.

These models are thus different, and the old approaches no longer work.

How to work with reasoning models

Interestingly, it is kind of similar to working with an intelligent human:

- Be simple and direct. State your question clearly.
- No explicit Chain of Thought. The model will do that internally.
- Have a good structure: break the prompt into sections using clear markup.
- Show vs. tell: it is better to show the model an example of a good answer or behavior than to describe it in several thousand words.
- No more need for the coaxing, intimidating, or bribing-the-model nonsense.

I can even summarize this into one rule: know what you want to ask and ask it clearly.

Mini vs Full models

Since reasoning models like o3 consume a lot of tokens during inference, it is rather expensive to use them for everything, and the latency is not great. So the idea is to delegate the most difficult tasks, high-level thinking and planning, to the full model and have faster, more cost-efficient mini models execute the plan. The mini models can be used for tasks like coding, math, and science.

This is an agentic approach that allows us to combine the best of both worlds: smart but expensive models with small and fast workers.

How much better are these models?

Much better, and they are going to get even better soon. o1 is approaching expert humans in math and coding (see below):

Math

[Image by OpenAI]

Coding

[Image by OpenAI]

An Elo rating of 2727 puts o3 among the 200 best coders in the world. If you are not worried about your job security as a developer, it's time to start now.
This is exactly the kind of job that scales perfectly by adding more computing power, and the current rate of progress is not showing any signs of slowing down.

What is next

I can only speculate, but my take is that for a year or two it will be possible to dramatically improve model quality just by adding more inference computing power and improving training datasets. Adding some sort of memory outside of the context window also seems logical, although very expensive at large scale.

I think the next big step really is to implement a multiagent architecture at the LLM level, so that the model can have multiple collaborating internal dialogues that share the same memory and context. This follows the current trajectory of embedding external thinking tools into the model and also benefits from linear scaling of compute power at training and inference, so I think by the end of this year or next we will see an LMM, a Large Multiagent Model, or something similar. The sky is the limit for such a model, so I propose to call it SkyNet.
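As a closing illustration of the mini vs full split described earlier, here is a minimal sketch of the delegation pattern: one call to an expensive planner, then one cheap call per step. Everything here is an assumption made for illustration. `plan_and_execute`, `fake_planner`, and `fake_worker` are hypothetical names, and the two "models" are plain Python functions rather than real API clients.

```python
from typing import Callable, List

# A "model" is any function that maps a prompt string to a response string.
ModelFn = Callable[[str], str]


def plan_and_execute(task: str, planner: ModelFn, worker: ModelFn) -> List[str]:
    """Ask the expensive planner for steps once, then run each step on the cheap worker."""
    plan_text = planner("Break this task into short, numbered steps:\n" + task)
    # One non-empty line of the plan is treated as one step.
    steps = [line.strip() for line in plan_text.splitlines() if line.strip()]
    return [worker(f"Overall task: {task}\nCurrent step: {step}") for step in steps]


def fake_planner(prompt: str) -> str:
    """Hypothetical stand-in for a full reasoning model."""
    return "1. parse the input\n2. compute the total\n3. format the report"


def fake_worker(prompt: str) -> str:
    """Hypothetical stand-in for a mini model: echoes the step it was given."""
    return "done -> " + prompt.splitlines()[-1]


if __name__ == "__main__":
    for result in plan_and_execute("Summarize monthly sales", fake_planner, fake_worker):
        print(result)
```

The design choice is that the planner is called exactly once per task, so its cost and latency do not multiply with the number of steps. A real system would also need to pass earlier results to later steps and handle a plan the worker cannot execute.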