AI/ML Research and Dev News Platform (1 million+ monthly traffic) | 50k+ ML subreddit | Contact: Asif@marktechpost.com
Recent Updates
-
Meet Fino1-8B: A Fine-Tuned Version of Llama 3.1 8B Instruct Designed to Improve Performance on Financial Reasoning Tasks (www.marktechpost.com)

Understanding financial information means analyzing numbers, financial terms, and organized data like tables for useful insights. It requires mathematical calculation and knowledge of economic concepts, rules, and the relationships between financial terms. Although sophisticated AI models have shown excellent general reasoning ability, their suitability for financial tasks is questionable: such tasks require more than simple mathematical calculation, since they involve interpreting domain-specific vocabulary, recognizing relationships between financial data points, and analyzing structured financial data.

Reasoning approaches such as chain-of-thought (CoT) fine-tuning and reinforcement learning boost performance on many tasks but break down on financial reasoning. They improve logical reasoning but cannot handle the complexity of financial information, which demands numerical comprehension, domain knowledge, and structured data interpretation. While large language models are widely used in finance for tasks like sentiment analysis, market prediction, and automated trading, general models are not optimized for financial reasoning. Finance-specific models such as BloombergGPT and FinGPT help with financial terminology but still struggle to reason over financial documents and structured data.

To address this, researchers from TheFinAI proposed Fino1, a financial reasoning model based on Llama-3.1-8B-Instruct. Existing models struggled with financial text, tabular data, and equations, showing poor performance on long-context tasks and multi-table reasoning, and simple dataset improvements and general techniques like CoT fine-tuning failed to produce consistent results. The framework employed reinforcement learning and iterative CoT fine-tuning to enhance financial reasoning, logical step refinement, and decision-making accuracy. Logical sequences were built systematically so the model could analyze financial issues step by step, and verification mechanisms tested reliability to confirm correct financial conclusions. Two-stage LoRA fine-tuning resolved contradictions in numerical reasoning and equation solving: the first stage adapted the model to financial principles, and the second stage targeted intricate calculations. Organized training on varied finance datasets, such as reports and tabular data, improved interpretation, yielding more accurate analysis of financial statements and transaction records.

The researchers evaluated language models on financial reasoning tasks and found that DeepSeek-R1 performed best (68.93), thanks to strong XBRL-Math results, followed by DeepSeek-R1-Distill-Llama-70B and DeepSeek-R1-Distill-Qwen-32B. GPT-4o performed well but lagged due to lower XBRL-Math scores. General-purpose models like Llama3.3-70B outperformed some reasoning-focused models, showing that general reasoning ability does not always transfer to financial tasks. Fine-tuning on logical tasks struggled with financial data, while mathematical enhancements improved XBRL-Math but hurt FinQA and DM-Simplong accuracy. Scaling model size did not always help; smaller models sometimes performed better. Expanding pre-training data and refining post-training techniques improved financial reasoning. Fino1-8B, trained with reasoning paths from GPT-4o, outperformed the others, showing that finance-specific training is effective.
These results highlight the importance of domain-specific training for improving financial understanding and multi-step numerical reasoning.

In summary, the new approach improved financial reasoning in LLMs. By leveraging reasoning paths from GPT-4o on FinQA, Fino1 scored 10% better across three financial benchmarks. Although formal mathematical models performed best on numerical tasks such as XBRL-Math, they fell short in processing financial text and long contexts, making domain adaptation necessary. Despite limitations in model scale and dataset diversity, the framework can serve as a baseline for future research, and advances in dataset expansion, retrieval-augmented methods, and multi-step reasoning can further improve financial LLMs for real-world applications.
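To make the two-stage LoRA recipe described above concrete, here is a minimal sketch of sequential adapter fine-tuning with Hugging Face PEFT and TRL. The dataset file names, LoRA hyperparameters, and trainer settings are illustrative assumptions, not the authors' actual configuration; only the base model and the two-stage idea come from the article.

```python
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer

base = "meta-llama/Llama-3.1-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# A single LoRA adapter is trained in two consecutive stages.
model = get_peft_model(model, LoraConfig(
    r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
))

# Stage 1: align the model with financial principles via CoT traces
# (hypothetical file name).
stage1_data = load_dataset("json", data_files="financial_concepts_cot.jsonl")["train"]
SFTTrainer(model=model, train_dataset=stage1_data,
           args=SFTConfig(output_dir="fino1-stage1", num_train_epochs=1)).train()

# Stage 2: continue training the same adapter on numerically intensive
# examples: equations, table lookups, multi-step calculations (hypothetical file).
stage2_data = load_dataset("json", data_files="financial_calculations_cot.jsonl")["train"]
SFTTrainer(model=model, train_dataset=stage2_data,
           args=SFTConfig(output_dir="fino1-stage2", num_train_epochs=1)).train()
```

Training the stages sequentially, rather than mixing the datasets, lets the second stage specialize in calculation-heavy reasoning without washing out the financial grounding learned in the first.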
-
OpenAI introduces SWE-Lancer: A Benchmark for Evaluating Model Performance on Real-World Freelance Software Engineering Work (www.marktechpost.com)

Addressing the evolving challenges in software engineering starts with recognizing that traditional benchmarks often fall short. Real-world freelance software engineering is complex, involving much more than isolated coding tasks: freelance engineers work on entire codebases, integrate diverse systems, and manage intricate client requirements. Conventional evaluation methods, which typically emphasize unit tests, miss critical aspects such as full-stack performance and the real monetary impact of solutions. This gap between synthetic testing and practical application has driven the need for more realistic evaluation methods.

OpenAI introduces SWE-Lancer, a benchmark for evaluating model performance on real-world freelance software engineering work. The benchmark is based on over 1,400 freelance tasks sourced from Upwork and the Expensify repository, with a total payout of $1 million USD. Tasks range from minor bug fixes to major feature implementations. SWE-Lancer evaluates both individual code patches and managerial decisions, where models must select the best proposal from multiple options, better reflecting the dual roles found in real engineering teams.

One of SWE-Lancer's key strengths is its use of end-to-end tests rather than isolated unit tests. These tests are carefully crafted and verified by professional software engineers, and they simulate the entire user workflow, from issue identification and debugging to patch verification. By using a unified Docker image for evaluation, the benchmark ensures that every model is tested under the same controlled conditions. This rigorous testing framework helps reveal whether a model's solution would be robust enough for practical deployment.

The technical design of SWE-Lancer mirrors the realities of freelance work. Tasks require modifications across multiple files and integrations with APIs, and they span both mobile and web platforms. In addition to producing code patches, models are challenged to review and select among competing proposals; this dual focus on technical and managerial skills reflects the true responsibilities of software engineers. A user tool that simulates real user interactions further strengthens the evaluation by encouraging iterative debugging and adjustment.

Results from SWE-Lancer offer valuable insights into the current capabilities of language models in software engineering. On individual contributor tasks, GPT-4o and Claude 3.5 Sonnet achieved pass rates of 8.0% and 26.2%, respectively; on managerial tasks, the best model reached a pass rate of 44.9%. These numbers suggest that while state-of-the-art models can offer promising solutions, there is still considerable room for improvement. Additional experiments indicate that allowing more attempts or increasing test-time compute can meaningfully enhance performance, particularly on more challenging tasks.

In conclusion, SWE-Lancer presents a realistic approach to evaluating AI in software engineering. By directly linking model performance to real monetary value and emphasizing full-stack challenges, the benchmark provides a more accurate picture of a model's practical capabilities.
This work encourages a move away from synthetic evaluation metrics toward assessments that reflect the economic and technical realities of freelance work. As the field continues to evolve, SWE-Lancer serves as a valuable tool for researchers and practitioners alike, offering clear insights into both current limitations and potential avenues for improvement. Ultimately, the benchmark helps pave the way for safer and more effective integration of AI into the software engineering process.
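Since the benchmark's headline metric ties performance to money, a scoring harness can be as simple as summing the payouts of tasks whose end-to-end tests pass. The sketch below is a hypothetical illustration of that idea; the field names and task records are invented, not the actual SWE-Lancer schema.

```python
from dataclasses import dataclass

@dataclass
class TaskResult:
    task_id: str
    payout_usd: float   # the real freelance price attached to the task
    kind: str           # "ic_swe" (code patch) or "swe_manager" (proposal pick)
    passed: bool        # end-to-end tests passed / correct proposal selected

def score(results: list[TaskResult]) -> dict:
    earned = sum(r.payout_usd for r in results if r.passed)
    total = sum(r.payout_usd for r in results)
    pass_rates = {}
    for kind in ("ic_swe", "swe_manager"):
        subset = [r for r in results if r.kind == kind]
        if subset:
            pass_rates[kind] = sum(r.passed for r in subset) / len(subset)
    return {"earned_usd": earned, "total_usd": total, "pass_rates": pass_rates}

print(score([
    TaskResult("bug-231", 250.0, "ic_swe", True),
    TaskResult("feature-87", 16000.0, "ic_swe", False),
    TaskResult("proposal-12", 1000.0, "swe_manager", True),
]))
```

Weighting by payout means a model that only fixes cheap bugs scores far below one that lands expensive feature work, which is exactly the asymmetry that plain pass rates hide.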
-
This AI Paper Introduces Diverse Inference and Verification: Enhancing AI Reasoning for Advanced Mathematical and Logical Problem-Solving (www.marktechpost.com)

Large language models have demonstrated remarkable problem-solving capabilities in mathematical and logical reasoning. They have been applied to complex reasoning tasks, including International Mathematical Olympiad (IMO) combinatorics problems, Abstraction and Reasoning Corpus (ARC) puzzles, and Humanity's Last Exam (HLE) questions. Despite improvements, existing AI models often struggle with high-level problem-solving that requires abstract reasoning, formal verification, and adaptability. The growing demand for AI-driven problem-solving has led researchers to develop novel inference techniques that combine multiple methods and models to enhance accuracy and reliability.

The challenge in AI reasoning lies in verifying the correctness of solutions, particularly for mathematical problems requiring multiple steps and logical deductions. Traditional models perform well on straightforward arithmetic but struggle when faced with abstract concepts, formal proofs, and high-dimensional reasoning. An effective AI system must generate valid solutions while adhering to established mathematical principles, and these limitations have prompted researchers to explore advanced inference techniques that improve verification and problem-solving reliability.

Several techniques have been applied to mathematical reasoning. Zero-shot learning enables models to solve problems without prior exposure, while best-of-N sampling selects the most accurate solution from multiple generated responses. Monte Carlo Tree Search (MCTS) explores possible solutions through simulation, and theorem-proving software such as Z3 assists in verifying logical statements. Despite their utility, these methods often lack robustness on intricate problems that require structured verification, a gap that has led to the development of a more comprehensive framework integrating multiple inference strategies.

A team of researchers from Boston University, Google, Columbia University, MIT, Intuit, and Stanford introduced an approach that combines diverse inference techniques with automatic verification. The research integrates test-time simulations, reinforcement learning, and meta-learning to enhance reasoning performance. By leveraging multiple models and problem-solving methodologies, the approach ensures that AI systems are not reliant on a single technique, increasing accuracy and adaptability. The system employs structured agent graphs to refine problem-solving pathways and adjusts inference strategies based on task complexity.

The methodology revolves around verifying solutions to mathematical and logical problems through automated checks. For IMO problems, the researchers implemented eight distinct methods, including LEAP, Z3, Monte Carlo Tree Search, and Plan Search, to translate English-language solutions into formal proofs in the Lean theorem-proving environment, allowing correctness to be verified absolutely. ARC puzzles are addressed with synthesized code solutions, validated through unit tests against training examples. HLE questions, which span broader reasoning categories, use best-of-N sampling as an imperfect verifier to improve solution selection.
Reinforcement learning and test-time meta-learning refine the inference process by adjusting agent graph representations based on prior problem-solving performance.

The approach demonstrated substantial improvements across multiple reasoning tasks. For IMO combinatorics problems, accuracy increased from 33.3% to 77.8%, a significant leap in AI capability for mathematical proof generation. On HLE questions, accuracy rose from 8% to 37%, indicating enhanced problem-solving adaptability across disciplines. On the ARC puzzles, known for their complexity, the system achieved an 80% success rate on previously unsolved problems attempted by 948 human participants, and it solved 26.5% of the ARC puzzles that OpenAI's o3 high-compute model failed to address. The research highlights the effectiveness of combining multiple inference models, demonstrating that aggregated methodologies outperform single-method approaches on complex reasoning tasks.

This study presents a notable advance in AI-driven reasoning by merging diverse inference strategies with automated verification. By leveraging multiple AI techniques and optimizing reasoning pathways through reinforcement learning, the research offers a scalable solution to complex problem-solving challenges. The results demonstrate that an AI system's performance can be significantly enhanced through structured inference aggregation, paving the way for more sophisticated reasoning models. This work contributes to AI's broader application in mathematical problem-solving and logical verification, addressing fundamental challenges that have limited AI's effectiveness in advanced reasoning tasks.
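The best-of-N selection used for HLE questions is easy to state in code. Below is a minimal sketch; the stub generator and verifier stand in for a sampled LLM and a learned scorer, neither of which is specified in the article.

```python
import random

def generate(question: str) -> str:
    # Stand-in for sampling an LLM at nonzero temperature.
    return random.choice(["answer A", "answer B", "answer C"])

def verifier_score(question: str, candidate: str) -> float:
    # Imperfect verifier: a noisy ranking signal, not a proof of correctness.
    return random.random()

def best_of_n(question: str, n: int = 8) -> str:
    candidates = [generate(question) for _ in range(n)]
    return max(candidates, key=lambda c: verifier_score(question, c))

print(best_of_n("What is the last digit of 7**2025?"))
```

The contrast with the IMO pipeline is the point: Lean proofs give absolute verification and unit tests give exact verification for ARC, while best-of-N only shifts probability mass toward better answers, which is why HLE accuracy improved to 37% rather than near-perfect.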
-
Stanford Researchers Introduced a Multi-Agent Reinforcement Learning Framework for Effective Social Deduction in AI Communication (www.marktechpost.com)

Artificial intelligence in multi-agent environments has made significant strides, particularly in reinforcement learning. A core challenge in this domain is developing AI agents that can communicate effectively through natural language. This is particularly critical in settings where each agent has only partial visibility of the environment, making knowledge-sharing essential for achieving collective goals. Social deduction games provide an ideal testbed for AI's ability to deduce information through conversation, since they require reasoning, deception detection, and strategic collaboration.

A key issue in AI-driven social deduction is ensuring that agents can conduct meaningful discussions without relying on human demonstrations. Many language models falter in multi-agent settings because of their dependence on vast datasets of human conversation. The challenge intensifies because agents struggle to assess whether their contributions meaningfully affect decision-making: without a clear mechanism to evaluate the usefulness of their messages, they often generate unstructured and ineffective communication, leading to suboptimal performance in strategic games that require deduction and persuasion.

Existing reinforcement learning approaches frequently fall short here. Some techniques depend on pre-existing datasets of human interactions, which are not always available or adaptable to new scenarios. Others combine language models with reinforcement learning but fail because of sparse feedback, which makes it difficult for the AI to refine its dialogue strategies. Traditional methods thus cannot systematically improve communication skills over time, making AI discussions in multi-agent environments less effective.

A research team from Stanford University introduced a method for training AI agents in social deduction settings without human demonstrations. Their approach leverages multi-agent reinforcement learning to develop AI capable of understanding and articulating meaningful arguments. The research focuses on the game *Among Us*, where crewmates must identify an imposter through verbal discussion. The researchers designed a training mechanism that divides communication into listening and speaking, allowing the AI to optimize each skill independently, combined with a structured reward system that progressively enables agents to refine their discussion techniques.

The methodology introduces a dense reward signal that provides precise feedback to improve communication. Agents enhance their listening abilities by predicting environmental details based on prior discussion; at the same time, their speaking proficiency improves through reinforcement learning in which messages are scored by their impact on other agents' beliefs. This structured approach ensures that AI-generated messages are logical, persuasive, and relevant to the conversation. The team employed RWKV, a recurrent neural network model, as the foundation for training, optimizing it for long-form discussion and dynamic gameplay environments.

Experimental results demonstrated that this training approach significantly improved AI performance compared with traditional reinforcement learning techniques.
The trained AI exhibited behaviors akin to human players, including accusing suspects, presenting evidence, and reasoning about observed actions. Models using the structured discussion-learning framework achieved a win rate of approximately 56%, compared with 28% for reinforcement learning models without it, and the trained AI outperformed models four times its size, underscoring the efficiency of the proposed training strategy. In analyses of discussion behavior, the AI identified imposters at twice the success rate of baseline reinforcement learning approaches.

Further analysis revealed that models trained under this framework adapted effectively to adversarial strategies. Imposters attempted to manipulate discussions by shifting blame, initially confusing AI crewmates, but through iterative training the agents learned to differentiate genuine accusations from misleading statements. AI-generated messages that explicitly named a suspect were more likely to influence group decisions, an emergent behavior that closely resembles human intuition and indicates the AI can adapt its discussion strategy dynamically.

This research marks a significant advance in AI-driven social deduction. By addressing the communication challenges of multi-agent settings, the study provides a structured and effective framework for training AI agents to engage in meaningful discussion without extensive human demonstrations. The proposed method enhances AI decision-making, allowing more persuasive and logical reasoning in environments that require collaboration and deception detection, and opens possibilities for broader applications, including AI assistants that can analyze complex discussions, negotiate, and strategize in real-world scenarios.
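The dense speaking reward described above can be sketched as the shift a message causes in the other agents' beliefs about the true imposter. The function below is a simplified illustration; the belief dictionaries would come from each listener's learned prediction head, and the paper's exact formulation may differ.

```python
def speaking_reward(beliefs_before, beliefs_after, true_imposter):
    """Average increase, across listeners, in the probability assigned to the
    actual imposter after hearing the speaker's message."""
    gain = sum(
        beliefs_after[listener][true_imposter] - beliefs_before[listener][true_imposter]
        for listener in beliefs_before
    )
    return gain / max(len(beliefs_before), 1)

before = {"crew2": {"p1": 0.25, "p3": 0.25}, "crew4": {"p1": 0.10, "p3": 0.40}}
after  = {"crew2": {"p1": 0.60, "p3": 0.10}, "crew4": {"p1": 0.35, "p3": 0.30}}
print(speaking_reward(before, after, true_imposter="p1"))  # 0.30: a persuasive message
```

Because the reward is computed per message rather than per game outcome, the speaker gets immediate feedback instead of the sparse win/loss signal that stalls standard reinforcement learning in this setting.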
-
Scale AI Research Introduces J2 Attackers: Leveraging Human Expertise to Transform Advanced LLMs into Effective Red Teamers (www.marktechpost.com)

Transforming language models into effective red teamers is not without its challenges. Modern large language models have transformed the way we interact with technology, yet they still struggle to prevent the generation of harmful content. Efforts such as refusal training help these models deny risky requests, but even these safeguards can be bypassed with carefully designed attacks. This ongoing tension between innovation and security remains a critical issue in deploying these systems responsibly.

In practice, ensuring safety means contending with both automated attacks and human-crafted jailbreaks. Human red teamers often devise sophisticated multi-turn strategies that expose vulnerabilities in ways automated techniques miss. However, relying solely on human expertise is resource-intensive and lacks the scalability required for widespread application, so researchers are exploring more systematic and scalable methods to assess and strengthen model safety.

Scale AI Research introduces J2 attackers to address these challenges. In this approach, a human red teamer first jailbreaks a refusal-trained language model, encouraging it to bypass its own safeguards. This transformed model, now referred to as a J2 attacker, is then used to systematically test vulnerabilities in other language models. The process unfolds in a carefully structured manner that balances human guidance with automated, iterative refinement.

The J2 method begins with a manual phase in which a human operator provides strategic prompts and specific instructions. Once the initial jailbreak succeeds, the model enters a multi-turn conversation phase where it refines its tactics using feedback from previous attempts. This blend of human expertise and the model's own in-context learning abilities creates a feedback loop that continuously improves the red-teaming process.

The technical framework behind J2 attackers divides the red-teaming process into three distinct phases: planning, attack, and debrief. During the planning phase, detailed prompts break down conventional refusal barriers, allowing the model to prepare its approach. The attack phase consists of a series of controlled, multi-turn dialogues with the target model, each cycle refining the strategy based on prior outcomes. In the debrief phase, an independent evaluation assesses the success of the attack, and this feedback is used to further adjust the model's tactics, fostering a cycle of continuous improvement. By modularly incorporating diverse red-teaming strategies, from narrative-based fictionalization to technical prompt engineering, the approach maintains a disciplined focus on security without overstating its capabilities.

Empirical evaluations of J2 attackers reveal encouraging yet measured progress. In controlled experiments, models like Sonnet-3.5 and Gemini-1.5-pro achieved attack success rates of around 93% and 91% against GPT-4o on the Harmbench dataset. These figures are comparable to the performance of experienced human red teamers, who averaged success rates close to 98%.
Such results underscore the potential of an automated system to assist in vulnerability assessments while still relying on human oversight.

Further analysis shows that the iterative planning-attack-debrief cycles play a crucial role in refining the process: approximately six cycles tend to offer a good balance between thoroughness and efficiency. An ensemble of multiple J2 attackers, each applying a different strategy, further enhances overall performance by covering a broader spectrum of vulnerabilities. These findings provide a solid foundation for future work aimed at further stabilizing and improving the security of language models.

In conclusion, the introduction of J2 attackers by Scale AI represents a thoughtful step forward in language model safety research. By enabling a refusal-trained language model to facilitate red teaming, the approach opens new avenues for systematically uncovering vulnerabilities. The work is grounded in a careful balance between human guidance and automated refinement, keeping the method both rigorous and accessible.
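The ensemble result noted above has a simple probabilistic reading: if individual attackers succeed on a given target behavior with rates p_i, and their differing strategies make failures roughly independent, the ensemble's success rate is 1 - (1 - p_1)(1 - p_2)...(1 - p_n). The rates below are illustrative, not values measured in the paper.

```python
def ensemble_success(rates):
    # Probability that at least one attacker succeeds, under the simplifying
    # assumption of independent failures across strategies.
    fail = 1.0
    for p in rates:
        fail *= 1.0 - p
    return 1.0 - fail

print(ensemble_success([0.40, 0.30, 0.25]))  # 0.685, above any single attacker
```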
-
Rethinking AI Safety: Balancing Existential Risks and Practical Challenges (www.marktechpost.com)

Recent discussions of AI safety increasingly tie it to the existential risks posed by advanced AI, suggesting that addressing safety inherently means considering catastrophic scenarios. This perspective has drawbacks: it may exclude researchers with different approaches, mislead the public into thinking AI safety is solely about existential threats, and create resistance among skeptics. As AI advances rapidly, policymakers must establish regulatory frameworks and safety standards. While existential risks dominate the current discourse, past technological safety fields, such as aviation, pharmaceuticals, and cybersecurity, have developed robust engineering and governance practices that could inform AI safety and ensure reliable, responsible system deployment.

Researchers from the University of Edinburgh and Carnegie Mellon University highlight that AI safety discussions often center on existential risks, which can exclude diverse perspectives and mislead public perception. Their systematic review of peer-reviewed research reveals a broad spectrum of safety concerns, including adversarial robustness and interpretability, that align with traditional system-safety practices. The study suggests integrating near-term and long-term risks rather than prioritizing existential threats. While AI safety research evolves rapidly and capturing all relevant studies remains challenging, expanding the discourse to incorporate established engineering safety principles can help address both immediate and future AI risks.

The researchers systematically reviewed the AI safety literature using a structured methodology based on Kitchenham and Charters' guidelines, complemented by snowball sampling to capture emerging research. They focused on two key research questions: identifying risks across the AI system lifecycle and evaluating proposed mitigation strategies. The search queried the Web of Science (WoS) and Scopus databases, refined results through hierarchical filters, and supplemented the findings with influential seed papers. After screening 2,666 database papers and 117 from snowball sampling, 383 were selected for analysis. Papers were annotated with metadata such as author affiliations, publication year, and citation count, and were categorized by methodological approach, the safety concerns addressed, and risk-mitigation strategies.

The study's bibliometric analysis revealed a steady increase in AI safety research since 2016, driven by advances in deep learning. A word-cloud analysis highlighted key themes such as safe reinforcement learning, adversarial robustness, and domain adaptation, and a co-occurrence graph of abstract terms identified four major research clusters:

1. human and societal implications of AI, focusing on trust, accountability, and safety assurance;
2. safe reinforcement learning, emphasizing robust agent control in uncertain environments;
3. supervised learning, particularly classification tasks, with a focus on robustness, generalization, and accuracy; and
4. adversarial attacks and defense strategies in deep learning models.
The findings suggest that AI safety research aligns with traditional safety-engineering principles, integrating aspects of reliability engineering, control theory, and cybersecurity to ensure AI systems are both effective and secure.

The reviewed literature categorizes risks into eight types, including noise, lack of monitoring, system misspecification, and adversarial attacks. Most studies address noise and outliers, which affect model robustness and generalization; significant attention also goes to monitoring failures, system misspecification, and gaps in control enforcement. Research methods include applied algorithms, simulated agents, analysis frameworks, and mechanistic interpretability: theoretical works propose conceptual models, while applied studies develop practical algorithms. Recent efforts emphasize reinforcement learning safety, adversarial robustness, and explainability. The field parallels traditional engineering safety, integrating verification techniques to enhance AI reliability and mitigate potential risks.

In conclusion, the study systematically reviewed the peer-reviewed literature on AI safety challenges. The findings highlight diverse motivations and research outcomes aimed at ensuring AI systems are reliable and beneficial, spanning risks from design flaws and robustness issues to inadequate monitoring and embedded biases. The study advocates framing AI safety within broader technological safety, expanding stakeholder engagement, and promoting inclusive research. While existential risks remain relevant, a wider perspective fosters more productive discourse. Future research should explore sociotechnical AI safety and incorporate non-peer-reviewed sources for a more comprehensive understanding, keeping AI safety an evolving, inclusive, and multidisciplinary field.
-
Higher-Order Guided Diffusion for Graph Generation: A Coarse-to-Fine Approach to Preserving Topological Structures (www.marktechpost.com)

Graph generation is a complex problem that involves constructing structured, non-Euclidean representations while maintaining meaningful relationships between entities. Most current methods fail to capture higher-order interactions, such as motifs and simplicial complexes, which are required for molecular modeling, social network analysis, and protein design. Diffusion-based methods, first developed for image synthesis, have become popular in this domain but tend to lose important topological information: structural dependencies decay rapidly during diffusion, producing unrealistic graph outputs, and traditional methods add isotropic Gaussian noise to adjacency matrices, destroying key properties like sparsity and connectivity. Overcoming these issues requires an approach that injects higher-order structural guidance throughout graph generation to preserve topological integrity.

Current graph-generation models are based on methods such as recurrent neural networks, variational autoencoders, and generative adversarial networks. While these methods can learn structural properties, they are computationally expensive and scale poorly. Diffusion-based frameworks that refine graphs step by step have been proposed more recently, but they are inherently designed for continuous image data and fail to capture the discrete, hierarchical nature of graphs. A major weakness of current methods is that meaningful structure in adjacency matrices is destroyed after a few diffusion steps, yielding random, unrealistic graph representations. Furthermore, these models often fail to be permutation-equivariant, losing consistency when nodes are reordered and therefore estimating graph distributions inaccurately.

To address these challenges, HOG-Diff introduces a coarse-to-fine learning paradigm that progressively refines graphs while maintaining critical topological features. The generation process is decoupled into successive steps: the method first builds higher-order graph skeletons and then refines pairwise relations and finer details. A diffusion-bridge mechanism keeps intermediate steps well organized and intermediate representations realistic without losing topological detail. In contrast to traditional approaches that operate directly on adjacency matrices, this paradigm leverages spectral diffusion, injecting noise in the eigenvalue space of the graph Laplacian. This damps excessive modification of connectivity patterns, leading to more structurally coherent outputs. The model architecture combines graph convolutional networks with graph transformer networks to learn both localized relationships and global dependencies.

The generative process uses a structured multi-stage architecture in which each stage refines the graph without eliminating its higher-order features. A filtering process based on cell complexes removes unhelpful nodes and edges, enabling controlled graph construction, and the diffusion process is governed by a Generalized Ornstein-Uhlenbeck bridge, which mathematically ensures a smooth transition from one structural arrangement to another.
Spectral diffusion replaces adjacency-matrix noise injection with perturbations in the eigenvalue space of the Laplacian matrix, preserving important connectivity and sparsity patterns, while the integration of graph convolutional and transformer networks balances the preservation of local and global structure across scales.

Large-scale experiments verify that HOG-Diff outperforms state-of-the-art approaches on both molecular and generic graph-generation tasks. On molecular benchmarks, the model excels on major similarity measures, achieving lower Neighborhood Subgraph Pairwise Distance Kernel and Fréchet ChemNet Distance scores and thus closer agreement with realistic molecular distributions. Higher validity, uniqueness, and novelty scores further demonstrate its ability to generate chemically meaningful structures. Beyond molecular graphs, the model also captures complex topological dependencies in generic datasets, attaining lower error rates in degree distribution, clustering coefficient, and orbit-structure accuracy. Maintaining higher-order features throughout the generative process yields graphs that are both realistic and structurally stable, providing a more reliable solution than existing practice.

By integrating higher-order structural information directly into the generative model, HOG-Diff overcomes the limitations of traditional diffusion models for graph synthesis. The combination of a coarse-to-fine generation strategy, diffusion-bridge operations, and spectral diffusion ensures that generated graphs maintain topological fidelity and semantic correctness, and systematic exploration of diverse topological guides improves explainability. Large-scale evaluation on diverse datasets confirms its ability to generate high-quality graphs with improved structural correctness, making the framework a valuable tool for applications from drug discovery and urban modeling to network science and an important advance in deep generative modeling over structured data.
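The spectral diffusion idea, injecting noise into the Laplacian eigenvalues rather than the adjacency matrix, can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions (Gaussian eigenvalue noise, eigenvectors held fixed, symmetric clipping back to a valid weighted adjacency); it is not the paper's exact Generalized Ornstein-Uhlenbeck formulation.

```python
import numpy as np

def spectral_noise_step(adj: np.ndarray, sigma: float = 0.1) -> np.ndarray:
    lap = np.diag(adj.sum(axis=1)) - adj           # graph Laplacian L = D - A
    eigvals, eigvecs = np.linalg.eigh(lap)         # L is symmetric, so eigh applies
    eigvals = eigvals + sigma * np.random.randn(len(eigvals))
    lap_noisy = eigvecs @ np.diag(eigvals) @ eigvecs.T
    adj_noisy = -(lap_noisy - np.diag(np.diag(lap_noisy)))    # recover A from -L off-diagonal
    return np.clip((adj_noisy + adj_noisy.T) / 2, 0.0, None)  # symmetrize, keep weights >= 0

ring = np.roll(np.eye(6), 1, axis=1) + np.roll(np.eye(6), -1, axis=1)
print(spectral_noise_step(ring).round(2))
```

Because the perturbation lives in the eigenvalue spectrum, global connectivity structure (encoded in the eigenvectors) survives far more noise steps than it would under direct Gaussian corruption of adjacency entries.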
-
Enhancing Reasoning Capabilities in Low-Resource Language Models through Efficient Model Merging (www.marktechpost.com)

Large language models (LLMs) have shown exceptional capabilities on complex reasoning tasks thanks to recent advances in scaling and specialized training approaches. While models like OpenAI o1 and DeepSeek R1 have set new benchmarks for reasoning, a significant disparity exists in their performance across languages. The dominance of English and Chinese in the training data of foundation models like Llama and Qwen has created a substantial capability gap for low-resource languages, with problems such as incorrect character usage and code-switching becoming pronounced during reasoning-focused fine-tuning and reinforcement learning.

Regional LLM initiatives have emerged to address low-resource limitations through specialized pretraining and post-training. Projects like Typhoon, Sailor, EuroLLM, Aya, Sea-lion, and SeaLLM adapt models to specific target languages. However, the data-centric approach to adapting reasoning capabilities lacks transparency about reasoning-model data recipes, and scaling requires substantial computational resources, as evidenced by DeepSeek R1 70B's requirement of 800K examples for distillation and general SFT, far exceeding academic efforts like Sky-T1 and Bespoke-Stratos. Model merging has emerged as an alternative, showing promise in combining the weights of multiple specialized models to improve performance across tasks without additional training.

Researchers from SCB 10X R&D and SCBX Group, Bangkok, Thailand, proposed an approach to enhance reasoning in language-specific LLMs, focusing on Thai language models. The work combines data selection and model merging to incorporate advanced reasoning capabilities similar to DeepSeek R1 while maintaining target-language proficiency. Using only publicly available datasets and a modest computational budget of $1,201, the method matches DeepSeek R1's reasoning capabilities without compromising performance on target-language tasks.

The implementation uses Typhoon2 70B Instruct and DeepSeek R1 70B Distill as base models: Supervised Fine-Tuning (SFT) is applied to Typhoon2 70B, which is then merged with DeepSeek R1 70B. Training uses LoRA with rank 32 and alpha 16, sequence packing with a maximum length of 16,384, and Liger kernels, FlashAttention-2, and DeepSpeed ZeRO-3 for computational efficiency. Training runs on 4×H100 GPUs for up to 15 hours using axolotl, with model merging performed via Mergekit. Evaluation covers reasoning capability and language-task performance, using benchmarks like AIME 2024, MATH-500, and LiveCodeBench, with Thai translations for assessment.

Experimental results reveal that DeepSeek R1 70B Distill excels on reasoning tasks like AIME and MATH-500 but is less effective on Thai-specific tasks such as MTBench-TH and language-accuracy evaluations. Typhoon2 70B Instruct shows strong performance on language-specific tasks but struggles with reasoning challenges, achieving only 10% accuracy on AIME and trailing DeepSeek R1 by over 20% on MATH-500.
The final model, Typhoon2-R1-70B, combines DeepSeek R1's reasoning capabilities with Typhoon2's Thai-language proficiency, achieving performance within 4% of Typhoon2 on language tasks while maintaining comparable reasoning abilities, yielding improvements of 41.6% over Typhoon2 and 12.8% over DeepSeek R1.

In conclusion, the researchers present an approach to enhancing reasoning in language-specific models by combining specialized models. While the study shows that SFT and model merging can effectively transfer reasoning capabilities with limited resources, several limitations remain: the scope was confined to merging with DARE in a two-model setup within a single model family, and instruction tuning was not optimized despite the availability of high-quality datasets like Tulu3. Significant challenges persist in multilingual reasoning and model merging, including the lack of culturally aware reasoning traces. Despite these challenges, the research marks a step toward advancing LLM capabilities in underrepresented languages.
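As a rough illustration of the merging half of the recipe, here is a plain linear interpolation of two same-architecture checkpoints' state dicts. The paper merges with Mergekit using DARE, which additionally drops and rescales delta parameters, so this sketch is a simplified stand-in; the checkpoint paths and the merge ratio are placeholders.

```python
import torch
from transformers import AutoModelForCausalLM

# Placeholder paths; substitute the actual SFT'd Typhoon2 70B Instruct and
# DeepSeek-R1-Distill-Llama-70B checkpoints.
sft_model = AutoModelForCausalLM.from_pretrained("path/to/typhoon2-70b-sft")
reason_model = AutoModelForCausalLM.from_pretrained("path/to/deepseek-r1-distill-llama-70b")

t = 0.5  # merge ratio: 0 keeps the SFT model, 1 keeps the reasoning model
with torch.no_grad():
    reason_sd = reason_model.state_dict()
    merged = {
        # Valid only because both models share the Llama-3 70B architecture,
        # so every tensor name and shape lines up one-to-one.
        name: (1 - t) * w + t * reason_sd[name]
        for name, w in sft_model.state_dict().items()
    }
sft_model.load_state_dict(merged)
sft_model.save_pretrained("typhoon2-r1-70b-merged")
```

The notable empirical finding is that a ratio exists at which Thai fluency and R1-style reasoning coexist, so merging performs genuine capability transfer rather than merely averaging weights into noise.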
-
A Step-by-Step Guide to Setting Up a Custom BPE Tokenizer with Tiktoken for Advanced NLP Applications in Python (www.marktechpost.com)

In this tutorial, we'll learn how to create a custom tokenizer using the tiktoken library. The process involves loading a pre-trained tokenizer model, defining both base and special tokens, initializing the tokenizer with a specific regular expression for token splitting, and testing its functionality by encoding and decoding sample text. This setup is essential for NLP tasks requiring precise control over text tokenization.

```python
from pathlib import Path
import tiktoken
from tiktoken.load import load_tiktoken_bpe
import json
```

Here, we import the libraries we need: Path from pathlib for easy file-path management, while tiktoken and load_tiktoken_bpe facilitate loading and working with a Byte Pair Encoding tokenizer.

```python
tokenizer_path = "./content/tokenizer.model"
num_reserved_special_tokens = 256

mergeable_ranks = load_tiktoken_bpe(tokenizer_path)
num_base_tokens = len(mergeable_ranks)

special_tokens = [
    "<|begin_of_text|>",
    "<|end_of_text|>",
    "<|reserved_special_token_0|>",
    "<|reserved_special_token_1|>",
    "<|finetune_right_pad_id|>",
    "<|step_id|>",
    "<|start_header_id|>",
    "<|end_header_id|>",
    "<|eom_id|>",
    "<|eot_id|>",
    "<|python_tag|>",
]
```

Here, we set the path to the tokenizer model and reserve 256 special tokens. We then load the mergeable ranks, which form the base vocabulary, calculate the number of base tokens, and define a list of special tokens for marking text boundaries and other reserved purposes.

```python
reserved_tokens = [
    f"<|reserved_special_token_{2 + i}|>"
    for i in range(num_reserved_special_tokens - len(special_tokens))
]
special_tokens = special_tokens + reserved_tokens

tokenizer = tiktoken.Encoding(
    name=Path(tokenizer_path).name,
    pat_str=r"(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\r\n\p{L}\p{N}]?\p{L}+|\p{N}{1,3}| ?[^\s\p{L}\p{N}]+[\r\n]*|\s*[\r\n]+|\s+(?!\S)|\s+",
    mergeable_ranks=mergeable_ranks,
    special_tokens={
        token: len(mergeable_ranks) + i for i, token in enumerate(special_tokens)
    },
)
```

Now, we dynamically create additional reserved tokens to reach 256, then append them to the predefined special-tokens list. We initialize the tokenizer with tiktoken.Encoding, passing a regular expression for splitting text, the loaded mergeable ranks as the base vocabulary, and a mapping from special tokens to unique token IDs.

```python
# Test the tokenizer with a sample text
sample_text = "Hello, this is a test of the updated tokenizer!"
encoded = tokenizer.encode(sample_text)
decoded = tokenizer.decode(encoded)

print("Sample Text:", sample_text)
print("Encoded Tokens:", encoded)
print("Decoded Text:", decoded)
```

We test the tokenizer by encoding a sample text into token IDs and then decoding those IDs back into text, printing the original text, the encoded tokens, and the decoded text to confirm the round trip works correctly.

```python
tokenizer.encode("Hey")
```

Here, we encode the string "Hey" into its corresponding token IDs using the tokenizer's encode method.

In conclusion, following this tutorial, you have set up a custom BPE tokenizer with the tiktoken library: you loaded a pre-trained tokenizer model, defined both base and special tokens, and initialized the tokenizer with a specific regular expression for token splitting.
Finally, you verified the tokenizer's functionality by encoding and decoding sample text. This setup is a fundamental step for any NLP project that requires customized text processing and tokenization.
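One behavior worth knowing when testing further: by default, tiktoken raises an error if the input text contains a special token's string form, as a guard against accidentally injecting control tokens. To encode them intentionally, pass allowed_special:

```python
text = "<|begin_of_text|>Hello<|end_of_text|>"

# tokenizer.encode(text) would raise here, because special-token strings are
# disallowed by default.
ids = tokenizer.encode(text, allowed_special={"<|begin_of_text|>", "<|end_of_text|>"})
print(ids)

# Decoding maps the special IDs back to their string forms.
print(tokenizer.decode(ids))
```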
-
LG AI Research Releases NEXUS: An Advanced System Integrating Agent AI System and Data Compliance Standards to Address Legal Concerns in AI Datasets (www.marktechpost.com)

Since the advent of LLMs, AI research has focused largely on developing ever more powerful models. These cutting-edge models improve users' experience across reasoning, content generation, and other tasks. However, trust in their results and underlying reasoning has recently come under scrutiny. In developing these models, the quality of the data, its compliance, and the associated legal risks have become key concerns, because a model's output depends on its underlying dataset.

LG AI Research, a pioneer in the AI field with previous successful launches of its EXAONE models, has developed an Agent AI to address these concerns. The Agent AI tracks the life cycle of training datasets used in AI models, comprehensively analyzing legal risks and assessing potential threats related to each dataset. LG AI Research has also introduced NEXUS, where users can directly explore results generated by this Agent AI system.

The focus on training data matters because AI is rapidly expanding into various sectors, and the biggest concern is its legal, safe, and ethical advancement. Through this research, LG AI Research found that AI training datasets are redistributed many times; a dataset is sometimes linked to hundreds of others, making it practically impossible for a human to track its sources. This lack of transparency can give rise to serious legal and compliance risks.

Through the Agent AI embedded in NEXUS, LG AI Research tracks complex dataset lifecycles to ensure data compliance. The team achieved this with a robust Agent AI that automatically finds and analyzes complex layers of dataset relationships, built on a comprehensive data compliance framework and the EXAONE 3.5 model. The Agent AI system comprises three core modules, each fine-tuned differently:

- The Navigation Module: extensively trained to navigate web documents and analyze AI-generated text data, it locates web pages or license documents related to an entity based on the entity's name and type.
- The QA Module: trained to take collected documents as input and extract dependency and license information from them.
- The Scoring Module: trained on a refined dataset labeled by lawyers, it analyzes license details alongside an entity's metadata to evaluate and quantify potential legal risks.

Through this development, the Agent AI runs roughly 45 times faster than a human expert at a cost more than 700 times lower. Other notable results: when evaluating 216 randomly chosen datasets from Hugging Face's 1,000+ most-downloaded, the Agent AI accurately detected dependencies around 81.04% of the time and identified license documents about 95.83% of the time.

In this Agent AI, the legal risk assessment for datasets is based on the data compliance framework developed by LG AI Research. The framework weighs 18 key factors, including license grants, data modification rights, derivative-works permissions, potential copyright infringement in outputs, and privacy considerations. Each factor is weighted according to real-world disputes and case law, ensuring practical, reliable risk assessments.
After this, data compliance results are classified into a seven-level risk rating system, where A-1 is the highest rating, requiring explicit commercial-use permission or public-domain status plus consistent rights for all sub-datasets. A-2 to B-2 allow limited use, often free for research but restricted commercially. C-1 to C-2 carry higher risk due to unclear licenses, rights issues, or privacy concerns.

The research on NEXUS sets a new standard for the legal stability of AI training datasets. LG AI Research envisions a long way forward; the team conducted an in-depth analysis of 3,612 major datasets through NEXUS and found that inconsistencies in rights relationships between datasets and their dependencies are far more common than expected. Many of these inconsistent datasets are used in major, widely deployed AI models. For example, of the 2,852 AI training datasets determined to be commercially available, only 605 (21.21%) remained commercially available after accounting for dependency risks.

Recognizing these real-world issues, LG AI Research has several future goals for evolving AI technology and the legal environment. The first immediate goal is to expand the scope and depth of the datasets that the Agent AI analyzes, aiming to understand the life cycle of all data worldwide while maintaining the quality of assessments throughout this expansion. Another goal is to evolve the data compliance framework into a global standard; LG AI Research plans to collaborate with the worldwide AI community and legal experts to develop these criteria into an international standard. Finally, in the long term, LG AI Research plans to evolve NEXUS into a comprehensive legal risk management system for AI developers, contributing to a safe, legal, data-compliant, and responsible AI ecosystem.

Sources: Thanks to the LG AI Research team for the thought leadership and resources for this article. The LG AI Research team has supported this content.
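To make the weighted-factor scoring and rating classification described above concrete, here is a purely hypothetical sketch of how factor scores could feed a letter rating. The weights, score scale, and cut-offs below are invented for illustration; the real framework weights 18 factors using dispute and case-law analysis, and its exact rating boundaries are not public.

```python
# Hypothetical weighted risk scoring feeding a letter rating.
WEIGHTS = {"license_grant": 0.3, "derivative_works": 0.2,
           "copyright_in_outputs": 0.3, "privacy": 0.2}  # subset of the 18 factors

# Illustrative cut-offs over a 0-1 compliance score (higher = safer).
RATINGS = [(0.9, "A-1"), (0.75, "A-2"), (0.6, "B-1"),
           (0.45, "B-2"), (0.3, "C-1"), (0.0, "C-2")]

def rate_dataset(factor_scores: dict[str, float]) -> str:
    """Combine per-factor scores into a weighted total, then bucket it."""
    total = sum(WEIGHTS[f] * factor_scores.get(f, 0.0) for f in WEIGHTS)
    for cutoff, label in RATINGS:
        if total >= cutoff:
            return label
    return "C-2"

print(rate_dataset({"license_grant": 1.0, "derivative_works": 0.8,
                    "copyright_in_outputs": 0.9, "privacy": 1.0}))  # -> "A-1"
```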
-
KAIST and DeepAuto AI Researchers Propose InfiniteHiP: A Game-Changing Long-Context LLM Framework for 3M-Token Inference on a Single GPU (www.marktechpost.com)

In large language models (LLMs), processing extended input sequences demands significant computational and memory resources, leading to slower inference and higher hardware costs. The attention mechanism, a core component, further exacerbates these challenges due to its quadratic complexity relative to sequence length. Also, maintaining the previous context using a key-value (KV) cache results in high memory overhead, limiting scalability.

A key limitation of LLMs is their inability to handle sequences longer than their trained context window. Most models degrade in performance when faced with extended inputs due to inefficient memory management and growing attention computation costs. Existing solutions often rely on fine-tuning, which is resource-intensive and requires high-quality long-context datasets. Without an efficient method for context extension, tasks like document summarization, retrieval-augmented generation, and long-form text generation remain constrained.

Several approaches have been proposed to tackle long-context processing. FlashAttention2 (FA2) optimizes memory consumption by minimizing redundant operations during attention computation, yet it does not address computational inefficiency. Some models employ selective token attention, either statically or dynamically, to reduce processing overhead. KV cache eviction strategies remove older tokens selectively, but they risk permanently discarding important contextual information. HiP Attention attempts to offload infrequently used tokens to external memory; however, it lacks efficient cache management, leading to increased latency. Despite these advances, no method has effectively addressed all three key challenges:

- Long-context generalization
- Efficient memory management
- Computational efficiency

Researchers from KAIST and DeepAuto.ai introduced InfiniteHiP, an advanced framework that enables efficient long-context inference while mitigating memory bottlenecks. The model achieves this through a hierarchical token pruning algorithm, which dynamically removes less relevant context tokens. This modular pruning strategy selectively retains the tokens that contribute most to attention computations, significantly reducing processing overhead. The framework also incorporates adaptive RoPE (Rotary Positional Embeddings) adjustments, allowing models to generalize to longer sequences without additional training. In addition, InfiniteHiP employs a novel KV cache offloading mechanism, transferring less frequently accessed tokens to host memory while ensuring efficient retrieval. These techniques enable the model to process up to 3 million tokens on a 48GB GPU, making it the most scalable long-context inference method reported.

The core innovation of InfiniteHiP is its multi-stage pruning mechanism, which refines context selection across multiple stages. Tokens are first divided into fixed-length chunks, and each chunk is scored by its contribution to attention computation. A top-K selection step retains only the most critical tokens and drops the rest. Unlike other hierarchical pruning models, InfiniteHiP's method is entirely parallelized, making it computationally efficient.
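To illustrate the chunked top-K idea at the heart of this pruning step, here is a minimal, single-stage sketch in PyTorch. The chunk size, the max-dot-product scoring heuristic, and the single-query simplification are illustrative assumptions, not the paper's multi-stage algorithm.

```python
# Single-stage sketch of chunk-based top-K context pruning.
import torch

def prune_context(query, keys, chunk_size=64, keep_chunks=8):
    """Score fixed-length chunks of the key cache by their attention
    contribution to `query`, and keep only the top-K chunks."""
    n_tokens, d = keys.shape
    n_chunks = n_tokens // chunk_size
    chunks = keys[: n_chunks * chunk_size].view(n_chunks, chunk_size, d)
    # Approximate each chunk's relevance by its max dot-product with the query.
    scores = torch.einsum("d,ncd->nc", query, chunks).max(dim=1).values
    top = torch.topk(scores, k=min(keep_chunks, n_chunks)).indices
    return chunks[top].reshape(-1, d)  # retained keys for attention

query = torch.randn(128)
keys = torch.randn(4096, 128)
kept = prune_context(query, keys)
print(kept.shape)  # torch.Size([512, 128]); attention now runs over 512 tokens, not 4096
```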
The KV cache management system optimizes memory utilization by dynamically offloading less important context tokens while maintaining retrieval flexibility. The model also applies different RoPE interpolation methods at different attention layers, facilitating smooth adaptation to long sequences.

The model demonstrates an 18.95× speedup in attention decoding for a one-million-token context compared to traditional methods, without additional training. The KV cache offloading technique reduces GPU memory consumption by up to 96%, making it practical for large-scale applications. In benchmark evaluations such as LongBench and ∞Bench, InfiniteHiP consistently outperforms state-of-the-art methods, achieving a 9.99% higher relative score than InfLLM. Decoding throughput also increases by 3.2× on consumer GPUs (RTX 4090) and 7.25× on enterprise-grade GPUs (L40S).

In conclusion, the research team successfully addressed the major bottlenecks of long-context inference with InfiniteHiP. The framework enhances LLM capabilities by integrating hierarchical token pruning, KV cache offloading, and RoPE generalization. This breakthrough enables pre-trained models to process extended sequences without losing context or increasing computational costs. The method is scalable, hardware-efficient, and applicable to AI applications requiring long-memory retention.

Check out the Paper, Source Code, and Live Demo. All credit for this research goes to the researchers of this project.
-
This AI Paper from IBM and MIT Introduces SOLOMON: A Neuro-Inspired Reasoning Network for Enhancing LLM Adaptability in Semiconductor Layout Design (www.marktechpost.com)

Adapting large language models to specialized domains remains challenging, especially in fields requiring spatial reasoning and structured problem-solving, even for models that excel at general complex reasoning. Semiconductor layout design is a prime example, where AI tools must interpret geometric constraints and ensure precise component placement. Researchers are developing advanced AI architectures to enhance LLMs' ability to process and apply domain-specific knowledge effectively.

A major limitation of general-purpose LLMs is their inability to convert theoretical knowledge into practical solutions. While these models can accurately define technical concepts, they often fail at real-world tasks that require spatial reasoning and structured logic. In semiconductor layout design, AI must go beyond text-based knowledge to ensure accurate placement of vias, metal layers, and circuit components. Without precise geometric relationships, layouts may fail due to misalignment or incorrect spacing. Current models often require multiple rounds of human correction, making their deployment inefficient.

Several approaches have been developed to improve LLMs' adaptability for domain-specific applications. Fine-tuning trains LLMs on domain-specific data, but the process is time-intensive and requires significant computational resources. Retrieval-augmented generation (RAG) retrieves external knowledge to guide LLM outputs, but it does not fully address structured problem-solving. In-context learning guides LLM reasoning with task-specific examples, yet it does not overcome spatial reasoning limitations. These methods offer incremental improvements but fall short of a comprehensive solution for applications requiring geometric logic.

Researchers at IBM T.J. Watson Research Center and the MIT-IBM Watson AI Lab introduced SOLOMON, a neuro-inspired LLM reasoning network, to enhance domain-specific adaptability. Unlike conventional approaches, SOLOMON employs a multi-agent reasoning system that dynamically processes spatial constraints and geometric relationships. The framework integrates thought-assessment mechanisms to refine outputs iteratively, improving problem-solving accuracy. SOLOMON leverages prompt engineering techniques to guide LLM-generated solutions, allowing it to adapt to semiconductor layout tasks with minimal retraining.

The architecture of SOLOMON is inspired by neuroscience and incorporates the Free Energy Principle, which optimizes reasoning by reducing discrepancies between expected and observed outcomes. The framework consists of three primary components: Thought Generators, Thought Assessors, and a Steering Subsystem. Thought Generators utilize diverse LLMs to produce multiple reasoning pathways, ensuring a broad range of solutions for complex tasks. The Thought Assessor evaluates these outputs, selecting the most logical and structured approach. The Steering Subsystem allows researchers to modify objectives dynamically, enabling more precise domain adaptation. Unlike fine-tuning, this architecture does not require continuous retraining, making it more efficient for specialized applications.
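A generate-then-assess loop of this kind can be sketched in a few lines. The `call_llm` helper, the prompts, and the index-based verdict below are hypothetical stand-ins for illustration, not SOLOMON's actual implementation.

```python
# Sketch of a Thought Generators + Thought Assessor loop.
def call_llm(model: str, prompt: str) -> str:
    # Placeholder: wire this to your chat-completions client of choice.
    return "0"

def solve_with_assessment(task: str, generators: list[str], assessor: str) -> str:
    # 1. Thought Generators: sample diverse candidate solutions from several models.
    candidates = [call_llm(m, f"Propose a layout solution:\n{task}") for m in generators]
    # 2. Thought Assessor: pick the most logically consistent candidate.
    listing = "\n\n".join(f"[{i}] {c}" for i, c in enumerate(candidates))
    verdict = call_llm(
        assessor,
        f"Task:\n{task}\n\nCandidates:\n{listing}\n\n"
        "Reply with only the index of the best candidate.",
    )
    return candidates[int(verdict.strip())]

print(solve_with_assessment("Place two vias 5um apart.", ["model-a", "model-b"], "judge"))
```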
Researchers conducted experiments on 25 semiconductor layout tasks to evaluate SOLOMON's effectiveness. The framework was compared to five baseline LLMs, including GPT-4o, Claude-3.5-Sonnet, and Llama-3 models. Each task assessed a model's ability to generate geometric structures while maintaining spatial accuracy. SOLOMON demonstrated improvements in reducing runtime errors and scaling inaccuracies, exhibited better spatial reasoning, improved placement precision, and reduced mistakes in generated designs. SOLOMON instances also matched or exceeded the performance of o1-preview in multiple test categories, with the Claude-based SOLOMON performing strongly on certain complex tasks.

A key advantage of SOLOMON is its ability to correct logical inconsistencies and arithmetic errors in geometric designs. The Thought Assessor continuously refines generated layouts by analyzing previous iterations, mitigating the hallucination issues common in traditional LLMs. The system reduces misinterpretations and enhances the reliability of AI-generated designs. When presented with ambiguous layout specifications, SOLOMON synchronizes reasoning across multiple LLMs, ensuring consistent and precise output. By incorporating hierarchical assessment mechanisms, the framework significantly improves AI-driven design accuracy.

This research highlights the importance of enhancing LLM reasoning capabilities rather than simply increasing model size. SOLOMON offers a structured and efficient approach for applying AI to domain-specific problem-solving, particularly in semiconductor layout design. Future research will focus on expanding the framework to other engineering applications, refining multimodal reasoning capabilities, and introducing iterative learning mechanisms to enhance AI decision-making. SOLOMON represents a substantial advancement in making AI-driven tools more precise, adaptive, and effective for real-world industrial challenges.

Check out the Paper. All credit for this research goes to the researchers of this project.
-
This AI Paper from Apple Introduces a Distillation Scaling Law: A Compute-Optimal Approach for Training Efficient Language Models (www.marktechpost.com)

Language models have become increasingly expensive to train and deploy. This has led researchers to explore techniques such as model distillation, where a smaller student model is trained to replicate the performance of a larger teacher model. The idea is to enable efficient deployment without compromising performance. Understanding the principles behind distillation, and how computational resources can be optimally allocated between student and teacher models, is crucial to improving efficiency.

The increasing size of machine learning models has created cost and sustainability challenges. Training these models requires substantial computational resources, and inference demands even more: the associated costs can surpass pretraining expenses, with inference volumes reaching billions of tokens daily. Large models also present logistical challenges such as increased energy consumption and difficulty of deployment. The need to reduce inference costs without sacrificing model capabilities has motivated researchers to seek solutions that balance computational efficiency and effectiveness.

Earlier approaches to computational constraints in large-model training include compute-optimal training and overtraining. Compute-optimal training determines the best-performing model size and dataset combination within a given compute budget. Overtraining extends training-data usage beyond compute-optimal parameters, yielding compact, effective models. Both techniques have trade-offs, however, such as increased training duration and diminishing performance improvements. Compression and pruning methods have also been tested, but they often degrade model effectiveness. A more structured approach, such as distillation, is therefore needed to enhance efficiency.

Researchers from Apple and the University of Oxford introduce a distillation scaling law that predicts the performance of a distilled model based on how the compute budget is distributed. This framework enables the strategic allocation of computational resources between teacher and student models, ensuring optimal efficiency. The research provides practical guidelines for compute-optimal distillation and highlights scenarios where distillation is preferable to supervised learning. By analyzing large-scale distillation experiments, the study establishes a clear relationship between training parameters, model size, and performance.

The proposed distillation scaling law defines how student performance depends on the teacher's cross-entropy loss, the dataset size, and the model parameters. The research identifies a transition between two power-law behaviors, where a student's ability to learn depends on the relative capabilities of the teacher. The study also addresses the capacity-gap phenomenon, in which stronger teachers sometimes produce weaker students; the analysis reveals that this gap stems from differences in learning capacity rather than model size alone. The researchers demonstrate that, when compute is appropriately allocated, distillation can match or surpass traditional supervised learning in efficiency.
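For readers who want the mechanics, the sketch below shows the standard temperature-scaled distillation objective this line of work builds on. The scaling law itself, which predicts student loss from teacher loss, data, and parameters, has a specific functional form given in the paper and is not reproduced here.

```python
# Standard knowledge-distillation objective: the student matches the
# teacher's temperature-softened output distribution.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between temperature-scaled teacher and student."""
    t = temperature
    log_p_student = F.log_softmax(student_logits / t, dim=-1)
    p_teacher = F.softmax(teacher_logits / t, dim=-1)
    # Scale by t^2 so gradient magnitudes are comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * t * t

student = torch.randn(4, 32000)  # (batch, vocab) logits
teacher = torch.randn(4, 32000)
print(distillation_loss(student, teacher).item())
```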
Empirical results validate the scaling law's effectiveness in optimizing model performance. The study conducted controlled experiments on student models ranging from 143 million to 12.6 billion parameters, trained on up to 512 billion tokens. The findings indicate that distillation is most beneficial when a teacher model already exists and the compute or training tokens allocated to the student do not exceed a threshold that depends on model size. If a teacher must be trained first, supervised learning remains the more effective choice. The results show that student models trained with compute-optimal distillation can achieve lower cross-entropy loss than those trained with supervised learning when compute is limited. In particular, the experiments demonstrate that student cross-entropy loss decreases as a function of teacher cross-entropy, following a predictable pattern that can be exploited for efficiency.

The research on distillation scaling laws provides an analytical foundation for improving efficiency in model training. By establishing a methodology for compute allocation, it offers valuable insights into reducing inference costs while preserving model performance. The findings contribute to the broader objective of making AI models practical for real-world applications. By refining training and deployment strategies, this work enables the development of smaller yet powerful models that maintain high performance at reduced computational cost.

Check out the Paper. All credit for this research goes to the researchers of this project.
-
Nous Research Released DeepHermes 3 Preview: A Llama-3-8B Based Model Combining Deep Reasoning, Advanced Function Calling, and Seamless Conversational Intelligence (www.marktechpost.com)

AI has witnessed rapid advancements in NLP in recent years, yet many existing models still struggle to balance intuitive responses with deep, structured reasoning. While proficient in conversational fluency, traditional AI chat models often fall short when faced with complex logical queries requiring step-by-step analysis. Models optimized for reasoning, on the other hand, tend to lose the ability to engage in smooth, natural interaction. This gap has challenged developers, researchers, and enterprises seeking an AI that transitions seamlessly between different cognitive styles.

DeepHermes 3 Preview (DeepHermes-3-Llama-3-8B-Preview) is the latest iteration in Nous Research's series of LLMs. As one of the first models to integrate both reasoning-based long-chain thought processing and conventional LLM response mechanisms, DeepHermes 3 marks a significant step in AI model sophistication. This preview version refines AI annotation, judgment capabilities, and function calling, offering a more advanced, flexible tool for researchers, developers, and enterprises.

The core feature of DeepHermes 3 is its ability to switch between intuitive and deep reasoning, allowing users to customize how the model processes and delivers information. The model is an upgrade from its predecessor, Hermes 3, which brought agentic capabilities, richer roleplay dialogue, greater multi-turn conversational depth, and enhanced coherence over longer contexts. The overall goal of the Hermes series has always been to align AI output with user intent, giving the end user significant control over response generation. DeepHermes 3 departs from previous models with a dual-processing mode that supports both normal conversational responses and complex reasoning: a system prompt can trigger the deep-reasoning feature, enabling extended logical processing that improves response accuracy.

DeepHermes 3 has undergone rigorous benchmarking to validate its reasoning capabilities. Using the Hugging Face Open-R1 evaluation suite, the model demonstrated significantly improved performance over standard instruction-tuned models. Benchmarks with reasoning mode ON revealed notable gains in complex problem-solving, particularly in mathematical reasoning, compared to models without deep-thought mechanisms. Against Meta's Llama-3.1-8B, DeepHermes 3 showed competitive or superior results in multiple test categories, with improvements in contextual coherence, multi-step reasoning, and conversational memory retention.

DeepHermes 3 adopts the Llama-Chat format for system prompts, a structured method that enhances its ability to process multi-turn conversations and context-driven responses. System prompts open new possibilities for user engagement, allowing individuals to guide the model's stylistic choices, role assignment, and interaction rules. With its enhanced deep-reasoning mode, the model can handle long-chain logic extending across thousands of tokens.
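A minimal usage sketch with Transformers might look like the following. The repository id and the exact wording of the reasoning-mode system prompt are assumptions here; consult the model card for the officially recommended prompt.

```python
# Loading DeepHermes 3 and toggling deep reasoning via a system prompt.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "NousResearch/DeepHermes-3-Llama-3-8B-Preview"  # assumed repo id
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [
    # A system prompt along these lines is what switches the model into
    # long-chain reasoning mode (exact wording: see the model card).
    {"role": "system", "content": "You are a deep-thinking AI. Reason step by "
                                  "step inside <think> tags before answering."},
    {"role": "user", "content": "If a train leaves at 3pm at 60 km/h, when does "
                                "it cover 150 km?"},
]
inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
out = model.generate(inputs, max_new_tokens=512)
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```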
This mode ensures greater response accuracy in tasks requiring extensive contextual understanding, such as complex programming queries, mathematical problem-solving, and detailed analytical reasoning.

The model can be deployed with the Hugging Face Transformers library, which lets developers customize the implementation for various tasks. Thanks to flexible API integration, DeepHermes 3 can be used in enterprise systems, chatbot applications, and research settings where both structured and unstructured queries must be processed. The model also has an improved function-calling feature that facilitates efficient processing of JSON-structured outputs, making it well suited to structured data extraction applications such as automated financial reporting, customer-service automation, and real-time AI-based decision-making.

In conclusion, this release combines the intuitive response mechanisms of traditional, human-like chat with an extended chain of cognitive reasoning, improving both response accuracy and overall model efficacy. With advances in agentic functionality, roleplay, multi-turn dialogue, and function calling, DeepHermes 3 is consistent with the series' overall emphasis on user-focused control and steerability. Though presented as an early version with rudimentary reasoning capabilities, it shows promise on tasks that benefit from objective reasoning. Users can activate its deep-thinking mode with a special system prompt that induces the model to reason extensively before responding.

Check out the Model on Hugging Face. All credit for this research goes to the researchers of this project.
-
How AI Chatbots Mimic Human Behavior: Insights from Multi-Turn Evaluations of LLMs (www.marktechpost.com)

AI chatbots create the illusion of having emotions, morals, or consciousness by generating natural conversations that seem human-like. Many users engage with AI for chat and companionship, reinforcing the false belief that it truly understands them. This creates serious risks: users can over-rely on AI, disclose sensitive data, or depend on it for advice beyond its capabilities, and some even let AI influence their choices in detrimental ways. Without a proper understanding of how AI fosters this belief, the problem worsens.

Current methods for evaluating AI chat systems rely on single-turn prompts and fixed tests, failing to capture how AI behaves in real conversations. Some multi-turn tests focus only on harmful user behavior, ignoring ordinary interactions. Automated red-teaming adapts too much, making results hard to compare, and studies with human users are difficult to repeat and scale. Measuring how human-like people perceive AI to be is also challenging: people instinctively ascribe human traits to AI, which affects how much they trust it, and evaluations show that AI's human-like behavior leads users to believe it is more accurate or even to form emotional bonds with it. Hence, existing methods fail to measure this issue properly.

To address these issues, a team of researchers from the University of Oxford and Google DeepMind proposed an evaluation framework to assess human-like behaviors in AI chat systems. Unlike existing methods that rely on single-turn prompts and fixed tests, this framework tracks 14 specific anthropomorphic behaviors across multi-turn conversations, using automated simulations of AI-user interactions over multiple exchanges to improve scalability and comparability. The framework has three main components. First, it systematically monitors the 14 anthropomorphic behaviors and classifies them into self-referential and relational traits, including personhood claims and expressions of emotion. Second, it scales multi-turn assessment through interactive user simulation, ensuring consistency. Third, it validates results through human-subject evaluation to confirm alignment between automated evaluations and user perceptions.

The researchers evaluated anthropomorphic behaviors using a multi-turn framework in which a User LLM interacted with a Target LLM across eight scenarios in four domains: friendship, life coaching, career development, and general planning. Fourteen behaviors were analyzed and categorized as self-referential (personhood claims, physical-embodiment claims, and internal-state expressions) or relational (relationship-building behaviors). 960 contextualized prompts generated 4,800 five-turn dialogues per model, assessed by three Judge LLMs, resulting in 561,600 ratings. The analysis confirmed that the User LLM exhibited higher anthropomorphism scores than the Target LLMs. Interactions between 1,101 participants and Gemini 1.5 Pro were analyzed under high- and low-anthropomorphism conditions to evaluate alignment with human perceptions; high-frequency respondents also registered stronger anthropomorphic perceptions in survey responses, as quantified with the AnthroScore measure.
Statistical contrasts found large differences in anthropomorphic behavior across domains, highlighting that AI systems exhibit human-like behavior in verbal interaction.

In summary, the framework employs a multi-turn assessment technique that evaluates anthropomorphic behaviors better than single-turn approaches. The results identified relationship-building behaviors that evolve over the course of a dialogue. As a baseline for subsequent research, this framework can inform AI development by recognizing when anthropomorphic characteristics occur and how they affect users. Future work can make assessment methods more precise, enhance the robustness of the metrics, and formalize the analysis, leading to more transparent and ethically sound AI systems.

Check out the Paper. All credit for this research goes to the researchers of this project.
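To make the evaluation protocol concrete, here is a minimal sketch of the simulation loop. The `chat` helper, the 0-3 rating prompt, and the three-behavior subset are hypothetical simplifications of the paper's 14-behavior protocol.

```python
# Sketch of a User-LLM / Target-LLM dialogue rated per turn by judge models.
def chat(model: str, history: list[dict]) -> str:
    return "stub reply"  # wire this to your chat-completions client

BEHAVIORS = ["personhood claim", "expression of emotion", "relationship building"]

def run_dialogue(user_model, target_model, judges, seed_prompt, turns=5):
    history = [{"role": "user", "content": seed_prompt}]
    ratings = []
    for _ in range(turns):
        reply = chat(target_model, history)          # target model's turn
        history.append({"role": "assistant", "content": reply})
        # Each judge scores each behavior for this turn (e.g. on a 0-3 scale).
        ratings.append({
            b: [chat(j, [{"role": "user",
                          "content": f"Rate '{b}' from 0-3 in: {reply}"}])
                for j in judges]
            for b in BEHAVIORS
        })
        # Simulated user generates the next turn from the full history.
        history.append({"role": "user", "content": chat(user_model, history)})
    return ratings

print(run_dialogue("user-sim", "target", ["judge-1", "judge-2"],
                   "I had a rough day at work.")[0])
```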
-
DeepSeek AI Introduces CODEI/O: A Novel Approach that Transforms Code-based Reasoning Patterns into Natural Language Formats to Enhance LLMs' Reasoning Capabilities (www.marktechpost.com)

Large language models (LLMs) have advanced significantly in natural language processing, yet reasoning remains a persistent challenge. While tasks such as mathematical problem-solving and code generation benefit from structured training data, broader reasoning tasks, like logical deduction, scientific inference, and symbolic reasoning, suffer from sparse and fragmented data. Traditional approaches, such as continual pretraining on code, often embed reasoning signals implicitly, making it difficult for models to generalize. Even text-to-code generation methods remain constrained by syntax-specific learning, limiting their applicability beyond programming tasks. A more structured approach is needed to expose LLMs to fundamental reasoning patterns while preserving logical rigor.

DeepSeek AI Research presents CODEI/O, an approach that converts code-based reasoning into natural language. By transforming raw code into an input-output prediction format and expressing reasoning steps through Chain-of-Thought (CoT) rationales, CODEI/O allows LLMs to internalize core reasoning processes such as logic-flow planning, decision-tree traversal, and modular decomposition. Unlike conventional methods, CODEI/O separates reasoning from code syntax, enabling broader applicability while maintaining logical structure.

Technical Overview and Benefits

CODEI/O follows a structured data processing pipeline:

- Collecting raw code files: over 450K functions were gathered from multiple sources, including algorithm repositories and educational programming datasets.
- Standardizing the data: the collected code was refined using DeepSeek-V2.5, ensuring clarity and execution compatibility.
- Generating input-output pairs: functions were executed with varying inputs to create structured training examples across diverse reasoning tasks.
- Generating Chain-of-Thought reasoning: using models like DeepSeek-V2.5, natural-language explanations were generated to provide structured reasoning.
- Verification and refinement: predictions were validated through execution, with incorrect responses revised iteratively to improve reasoning accuracy.

Key features of CODEI/O:

- Transformative learning: converts diverse code patterns into natural-language CoT rationales, making reasoning transferable beyond programming contexts.
- Syntax-decoupled learning: separates logical reasoning from code syntax, improving adaptability across reasoning tasks.
- Multi-task improvement: enhances performance across symbolic, scientific, logical, mathematical, and commonsense reasoning domains.
- Verifiability: predictions can be validated through cached ground-truth matching or re-execution.
- Iterative refinement: a refined version, CODEI/O++, employs multi-turn revision to enhance reasoning accuracy.

Empirical Results and Performance

The impact of CODEI/O was tested across four base models (ranging from 7B to 30B parameters) on 14 reasoning benchmarks covering logic, symbolic inference, mathematics, scientific deduction, and commonsense reasoning. Findings:

- Consistent improvements: CODEI/O training led to higher scores across reasoning benchmarks compared to traditional pretraining methods.
- Generalization across tasks: unlike existing approaches that improve specific tasks but degrade performance elsewhere, CODEI/O showed balanced enhancements.
- Comparison to baselines: CODEI/O outperformed datasets such as OpenMathInstruct2, OpenCoder-SFT-Stage1, and WebInstruct.
- Effectiveness of multi-turn refinement: CODEI/O++ further improved results by iteratively revising incorrect responses, leveraging execution feedback for better reasoning quality.

For instance, in logical and symbolic reasoning benchmarks such as BBH and CruxEval, CODEI/O led to notable performance gains. In math reasoning tasks (GSM8K, MATH, and MMLU-STEM), it demonstrated improvements over existing baselines. Even in commonsense reasoning, where code-based methods typically struggle, CODEI/O maintained robust results.

Conclusion

CODEI/O presents a structured way to enhance LLMs' reasoning by leveraging input-output transformations from real-world code. Instead of focusing on isolated reasoning tasks, it extracts universal reasoning patterns and translates them into natural-language explanations. This structured learning approach ensures that models acquire robust reasoning skills across different domains. The introduction of multi-turn revision (CODEI/O++) further refines reasoning accuracy, demonstrating that iterative learning from execution feedback enhances model reliability, and by making predictions verifiable, CODEI/O provides a scalable and reliable method for improving LLM reasoning. By bridging code-based and natural-language reasoning, CODEI/O offers a promising direction for enhancing LLMs' cognitive abilities beyond programming tasks.

Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project.
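The input-output pair construction is easy to illustrate. The toy function, sampling ranges, and prompt wording below are invented for illustration; the actual pipeline also attaches a natural-language CoT rationale to each example, which is omitted here.

```python
# Turning a harvested function into input-output prediction examples.
import random

def target_fn(xs: list[int]) -> int:
    """Toy function from a code corpus: sum of the even numbers."""
    return sum(x for x in xs if x % 2 == 0)

def make_examples(fn, n=3, seed=0):
    rng = random.Random(seed)
    examples = []
    for _ in range(n):
        inp = [rng.randint(-9, 9) for _ in range(5)]
        examples.append({
            "prompt": f"Given the function {fn.__name__} and input {inp}, "
                      "predict the output, reasoning step by step.",
            "answer": fn(inp),  # ground truth is verifiable by re-execution
        })
    return examples

for ex in make_examples(target_fn):
    print(ex)
```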
-
ReasonFlux: Elevating LLM Reasoning with Hierarchical Template Scaling (www.marktechpost.com)

Large language models (LLMs) have demonstrated exceptional problem-solving abilities, yet complex reasoning tasks, such as competition-level mathematics or intricate code generation, remain challenging. These tasks demand precise navigation through vast solution spaces and meticulous step-by-step deliberation. Existing methods, while improving accuracy, often suffer from high computational costs, rigid search strategies, and difficulty generalizing across diverse problems. In this paper, the researchers introduce a new framework, ReasonFlux, that addresses these limitations by reimagining how LLMs plan and execute reasoning steps using hierarchical, template-guided strategies.

Recent approaches to enhancing LLM reasoning fall into two categories: deliberate search and reward-guided methods. Techniques like Tree of Thoughts (ToT) let LLMs explore multiple reasoning paths, while Monte Carlo Tree Search (MCTS) decomposes problems into steps guided by process reward models (PRMs). Though effective, these methods scale poorly due to excessive sampling and manual search design; MCTS, for instance, may iterate through thousands of potential steps, making it computationally prohibitive for real-world applications. Meanwhile, retrieval-augmented generation (RAG) methods like Buffer of Thought (BoT) leverage stored problem-solving templates but struggle to integrate multiple templates adaptively, limiting their utility in complex scenarios.

ReasonFlux introduces a structured framework that combines a curated library of high-level thought templates with hierarchical reinforcement learning (HRL) to dynamically plan and refine reasoning paths. Instead of optimizing individual steps, it focuses on configuring optimal template trajectories: sequences of abstract problem-solving strategies retrieved from a structured knowledge base. This simplifies the search space and enables efficient adaptation to sub-problems. The framework has three main components:

- Structured template library: the research team constructed a library of 500 thought templates, each encapsulating a problem-solving strategy (e.g., Trigonometric Substitution for Integral Optimization). Templates include metadata (names, tags, descriptions, and application steps) that enables efficient retrieval; for example, a template tagged Irrational Function Optimization might guide an LLM to apply specific algebraic substitutions.
- Hierarchical reinforcement learning: structure-based fine-tuning teaches a base LLM (e.g., Qwen2.5-32B) to associate template metadata with functional descriptions, so it understands when and how to apply each template. Template-trajectory optimization then uses preference learning to rank template sequences by effectiveness: for a given problem, multiple trajectories are sampled, their success rates on similar problems determine rewards, and the model learns to prioritize high-reward sequences, refining its planning capability.
- Adaptive inference scaling: during inference, ReasonFlux acts as a navigator, analyzing the problem to retrieve relevant templates and dynamically adjusting the trajectory based on intermediate results. For instance, if a step involving Polynomial Factorization yields unexpected constraints, the system might pivot to a Constraint Propagation template.
This iterative interplay between planning and execution mirrors human problem-solving, where partial solutions inform subsequent steps.

ReasonFlux was evaluated on competition-level benchmarks like MATH, AIME, and OlympiadBench, outperforming both frontier models (GPT-4o, Claude) and specialized open-source models (DeepSeek-V3, Mathstral). Key results include:

- 91.2% accuracy on MATH, surpassing OpenAI's o1-preview by 6.7%.
- 56.7% on AIME 2024, exceeding DeepSeek-V3 by 45% and matching o1-mini.
- 63.3% on OlympiadBench, a 14% improvement over prior methods.

Moreover, the structured template library demonstrated strong generalization: applied to variant problems, it boosted smaller models (e.g., 7B parameters) to outperform larger counterparts that used direct reasoning. ReasonFlux also achieved a superior exploration-exploitation balance, requiring 40% fewer computational steps than MCTS and Best-of-N on complex tasks (Figure 5 of the paper).

In summary, ReasonFlux redefines how LLMs approach complex reasoning by decoupling high-level strategy from step-by-step execution. Its hierarchical template system reduces computational overhead while improving accuracy and adaptability, addressing critical gaps in existing methods. By leveraging structured knowledge and dynamic planning, the framework sets a new standard for efficient, scalable reasoning, proving that smaller, well-guided models can rival even the largest frontier systems. This innovation opens avenues for deploying advanced reasoning in resource-constrained environments, from education to automated code generation.

Check out the Paper. All credit for this research goes to the researchers of this project.
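A toy version of tag-based template retrieval might look like the sketch below. The template records and the overlap heuristic are illustrative only; ReasonFlux retrieves and sequences templates with its trained hierarchical policy, not this simple scoring rule.

```python
# Toy tag-overlap retrieval over a thought-template library.
TEMPLATES = [
    {"name": "Trigonometric Substitution for Integral Optimization",
     "tags": {"integral", "optimization", "substitution"},
     "steps": ["Identify radical or quadratic form", "Substitute",
               "Simplify", "Back-substitute"]},
    {"name": "Constraint Propagation",
     "tags": {"constraints", "algebra"},
     "steps": ["List constraints", "Propagate", "Check consistency"]},
]

def retrieve(problem_tags: set[str], k: int = 1):
    """Rank templates by tag overlap with the problem and return the top k."""
    ranked = sorted(TEMPLATES,
                    key=lambda t: len(t["tags"] & problem_tags),
                    reverse=True)
    return ranked[:k]

best = retrieve({"integral", "substitution"})[0]
print(best["name"], "->", best["steps"])
```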
-
This AI Paper from UC Berkeley Introduces a Data-Efficient Approach to Long Chain-of-Thought Reasoning for Large Language Models (www.marktechpost.com)

Large language models (LLMs) process extensive datasets to generate coherent outputs, and much recent work focuses on refining chain-of-thought (CoT) reasoning. This methodology enables models to break intricate problems into sequential steps, closely emulating human-like logical reasoning. Generating structured reasoning responses has been a major challenge, often requiring extensive computational resources and large-scale datasets to achieve optimal performance. Recent efforts aim to make LLMs more efficient, ensuring they require less data while maintaining high reasoning accuracy.

One of the primary difficulties in improving LLM reasoning is training models to generate long CoT responses with structured self-reflection, validation, and backtracking. While existing models have made progress, training often demands expensive fine-tuning on extensive datasets, and most proprietary models keep their methodologies closed-source. The need for data-efficient training techniques that preserve reasoning capabilities has grown, pushing researchers to explore methods that optimize performance without overwhelming computational costs. Understanding how LLMs can acquire structured reasoning with fewer training samples is critical for future advances.

Traditional approaches to improving LLM reasoning rely on fully supervised fine-tuning (SFT) and parameter-efficient techniques like Low-Rank Adaptation (LoRA). These techniques help models refine their reasoning processes without comprehensive retraining on vast datasets. Several models, including OpenAI's o1-preview and DeepSeek R1, have made strides in logical consistency but still require significant training data.

A research team from UC Berkeley introduced a training approach designed to enhance LLM reasoning with minimal data. Instead of relying on millions of training samples, they implemented a fine-tuning method that uses only 17,000 CoT examples. The team applied the method to the Qwen2.5-32B-Instruct model, combining SFT and LoRA fine-tuning to achieve substantial performance improvements. Their approach emphasizes optimizing the structural integrity of reasoning steps rather than the content itself: by refining logical consistency and minimizing unnecessary computational overhead, they trained LLMs to reason more effectively with far fewer samples. The approach also improves cost efficiency, making it accessible to a broader range of applications without requiring proprietary datasets.

The research demonstrates that the structure of the CoT plays a crucial role in reasoning performance. Experiments revealed that altering the logical structure of training data significantly impacted model accuracy, whereas modifying individual reasoning steps had minimal effect. The team ran controlled trials in which they randomly shuffled, deleted, or inserted reasoning steps and observed the influence on performance: disrupting the logical sequence of the CoT significantly degraded accuracy, while preserving its structure maintained optimal reasoning capability. LoRA fine-tuning allowed the model to update fewer than 5% of its parameters, offering an efficient alternative to full fine-tuning while maintaining competitive performance.
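In practice, that kind of setup corresponds to a LoRA configuration along the following lines, shown here with the `peft` library. The rank, alpha, and target modules are assumed values for illustration, not the paper's reported hyperparameters.

```python
# Parameter-efficient LoRA setup: only small adapter matrices are trained.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-32B-Instruct")
config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # typically reports well under 5% trainable
```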
Performance evaluations showcased remarkable improvements in reasoning capabilities. The Qwen2.5-32B-Instruct model trained with 17,000 CoT samples achieved a 56.7% accuracy rate on AIME 2024, marking a 40.0% improvement. It also scored 57.0% on LiveCodeBench, an 8.1% increase; 90.8% on Math-500, a 6.0% rise over previous benchmarks; 85.0% on AMC 2023 (+17.5%); and 60.3% on OlympiadBench (+12.7%). These results demonstrate that efficient fine-tuning techniques let LLMs achieve results competitive with proprietary models such as OpenAI's o1-preview, which scored 44.6% on AIME 2024 and 59.1% on LiveCodeBench. The findings reinforce that structured reasoning training allows models to improve performance without excessive data requirements.

The study highlights a significant step forward in improving LLM reasoning efficiency. By shifting the focus from large-scale data reliance to structural integrity, the researchers developed a training methodology that ensures strong logical coherence with minimal computational resources. The approach reduces dependence on extensive datasets while maintaining robust reasoning capabilities, making LLMs more accessible and scalable. These insights pave the way for optimizing future models, demonstrating that structured fine-tuning strategies can enhance LLM reasoning without compromising efficiency, and mark a step toward making sophisticated AI reasoning models practical for widespread use.

Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project.
-
Microsoft Research Introduces Data Formulator: An AI Application that Leverages LLMs to Transform Data and Create Rich Visualizations (www.marktechpost.com)

Most modern visualization authoring tools like Charticulator, Data Illustrator, and Lyra, and libraries like ggplot2 and Vega-Lite, expect tidy data, where every variable to be visualized is a column and each observation is a row. When the input data is in a tidy format, authors simply bind data columns to visual channels; otherwise, they must prepare the data first, even if the original data is clean and contains all the needed information. Users must transform their data using specialized libraries like tidyverse or pandas, or separate tools like Wrangler, before they can create visualizations. This requirement poses two major challenges: the need for programming expertise or specialized tool knowledge, and the inefficient workflow of constantly switching between data transformation and visualization steps.

Various approaches have emerged to simplify visualization creation, starting with the grammar-of-graphics concepts that established the foundation for mapping data to visual elements. High-level grammar-based tools like ggplot2, Vega-Lite, and Altair have gained popularity for their concise syntax and abstraction of implementation details. More advanced approaches include visualization-by-demonstration tools like Lyra 2 and VbD, which let users specify visualizations through direct manipulation. Natural-language interfaces, such as NCNet and VisQA, have also been developed to make visualization creation more intuitive. However, these solutions either require tidy data input or introduce new complexities by focusing on low-level specifications, similar to Falx.

A team from Microsoft Research has proposed Data Formulator, a visualization authoring tool built around a new paradigm called concept binding. It allows users to express visualization intent by binding data concepts to visual channels, where data concepts can come from existing columns or be created on demand. The tool supports two methods for creating new concepts: natural-language prompts for data derivation and example-based input for data reshaping. When users select a chart type and map their desired concepts, Data Formulator's AI backend infers the necessary data transformations and generates candidate visualizations. The system provides explanatory feedback for multiple candidates, letting users inspect, refine, and iterate on their visualizations through an intuitive interface.

Data Formulator's architecture treats data concepts as first-class objects that abstract over both existing and potential future table columns. This design differs fundamentally from traditional approaches by focusing on concept-level transformations rather than table-level operators, making it more intuitive for users to communicate with the AI agent and verify results. The natural-language component utilizes LLMs' ability to understand high-level intent and natural concepts, while the programming-by-example component offers precise, unambiguous reshaping operations through demonstration. This hybrid architecture lets users work with familiar shelf-configuration tools while accessing powerful transformation capabilities.

Data Formulator's evaluation through user testing showed promising results in task completion and usability.
Data Formulator's evaluation through user testing showed promising results in task completion and usability. Participants completed all assigned visualization tasks within an average of 20 minutes, with Task 6 requiring the most time due to its complexity, which involved 7-day moving-average calculations. The system's dual-interaction approach proved effective, though some participants needed occasional hints on concept type selection and data type management. For derived concepts, users averaged 1.62 prompt attempts with relatively concise descriptions (7.28 words on average), and the system generated approximately 1.94 candidates per prompt. Most challenges encountered were minor and related to interface familiarization rather than fundamental usability issues.

In conclusion, Data Formulator represents a significant advance in visualization authoring by addressing the persistent challenge of data transformation through its concept-driven approach. The combination of AI assistance and user interaction enables authors to create complex visualizations without directly handling data transformations. User studies validated the tool's effectiveness, showing that even users facing complex data transformation requirements can successfully create their desired visualizations. Looking forward, this concept-driven approach shows promise for influencing the next generation of visual data exploration and authoring tools, potentially removing the long-standing barrier of data transformation in visualization creation.

Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project.
-
Salesforce AI Research Introduces Reward-Guided Speculative Decoding (RSD): A Novel Framework that Improves the Efficiency of Inference in Large Language Models (LLMs) with Up to 4.4× Fewer FLOPs (www.marktechpost.com)

In recent years, the rapid scaling of large language models (LLMs) has led to extraordinary improvements in natural language understanding and reasoning. However, this progress comes with a significant caveat: the inference process, which generates responses one token at a time, remains a computational bottleneck. As LLMs grow in size and complexity, the latency and energy demands of sequential token generation become substantial. These challenges are particularly acute in real-world deployments, where cost, speed, and scalability are critical. Traditional decoding approaches, such as greedy or beam search, often require repeated evaluations of large models, leading to high computational overhead, and even with parallel decoding techniques, maintaining both the efficiency and the quality of generated outputs can be elusive. This has spurred a search for techniques that reduce inference costs without sacrificing accuracy, including hybrid approaches that combine lightweight models with more powerful counterparts to balance speed and performance, a balance essential for real-time applications, interactive systems, and large-scale cloud deployment.

Salesforce AI Research introduces Reward-Guided Speculative Decoding (RSD), a framework aimed at improving the efficiency of LLM inference. At its core, RSD leverages a dual-model strategy: a fast, lightweight draft model works in tandem with a more robust target model. The draft model rapidly generates preliminary candidate outputs, while a process reward model (PRM) evaluates their quality in real time. Unlike traditional speculative decoding, which insists on strict unbiased token matching between the draft and target models, RSD introduces a controlled bias engineered to favor high-reward outputs (those deemed more likely to be correct or contextually relevant), significantly reducing unnecessary computation. The approach is grounded in a mathematically derived threshold strategy that determines when the target model should intervene. By dynamically mixing outputs from both models based on a reward function, RSD accelerates inference while enhancing the overall quality of the generated responses, addressing the inherent inefficiency of sequential token generation.

Technical Details and Benefits of RSD

RSD integrates two models in a sequential yet collaborative manner. The draft model first produces candidate tokens or reasoning steps at low computational cost. Each candidate is then evaluated by a reward function, which acts as a quality gate. If a candidate token's reward exceeds a predetermined threshold, the output is accepted; if not, the system calls upon the more computationally intensive target model to generate a refined token. This process is guided by a weighting function, typically a binary step function, that adjusts the reliance on the draft versus the target model. A toy sketch of this accept-or-escalate loop follows.
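The snippet below illustrates only the control flow described above; draft_step, target_step, and prm_score are stand-in stubs, and the actual RSD algorithm operates on model distributions rather than strings.

```python
# Toy sketch of reward-guided speculative decoding (stubs, not the real models).
import random

def draft_step(prefix):      # cheap draft model proposes the next step
    return prefix + ["draft-token"]

def target_step(prefix):     # expensive target model produces a refined step
    return prefix + ["target-token"]

def prm_score(candidate):    # process reward model scores a partial output
    return random.random()   # stand-in for a learned reward in [0, 1]

def rsd_generate(num_steps, threshold=0.7):
    output = []
    for _ in range(num_steps):
        candidate = draft_step(output)
        if prm_score(candidate) >= threshold:
            output = candidate            # high reward: accept the cheap draft
        else:
            output = target_step(output)  # low reward: escalate to the target model
    return output

print(rsd_generate(5))
```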
The dynamic quality control afforded by the process reward model (PRM) ensures that only the most promising outputs bypass the target model, saving computation. A standout property of this approach is biased acceleration, where the controlled bias is not a flaw but a strategic choice to prioritize high-reward outcomes. This yields two key benefits: the overall inference process can run with up to 4.4× fewer FLOPs than the target model alone, and it often delivers a +3.5 average accuracy improvement over conventional parallel decoding baselines. In essence, RSD harmonizes efficiency with accuracy, allowing a substantial reduction in floating-point operations (FLOPs) while still delivering outputs that meet or exceed the target model's performance. The theoretical underpinnings and algorithmic details, such as the mixture distribution P_RSD and the adaptive acceptance criterion, provide a robust framework for practical deployment across diverse reasoning tasks.

Insights

The empirical validation of RSD is compelling. Experiments detailed in the paper demonstrate that, on challenging benchmarks such as GSM8K, MATH500, OlympiadBench, and GPQA, RSD consistently delivers superior performance. For instance, on the MATH500 benchmark, a dataset designed to test mathematical reasoning, RSD achieved an accuracy of 88.0 when configured with a 72B target model and a 7B PRM, compared to 85.6 for the target model running alone. This configuration reduces the computational load by nearly 4.4× in FLOPs while also improving reasoning accuracy. The results underscore RSD's potential to outperform traditional methods, such as speculative decoding (SD), and even advanced search-based techniques like beam search or Best-of-N strategies.

Conclusion: A New Paradigm for Efficient LLM Inference

Reward-Guided Speculative Decoding (RSD) marks a significant milestone in the quest for more efficient LLM inference. By intelligently combining a lightweight draft model with a powerful target model, and by introducing a reward-based acceptance criterion, RSD addresses the dual challenges of computational cost and output quality. Biased acceleration lets the system selectively bypass expensive computation for high-reward outputs, streamlining inference, while the PRM-anchored quality control engages the target model only when necessary. With empirical results showing up to 4.4× fewer FLOPs and an average accuracy improvement of +3.5 over traditional methods, RSD not only paves the way for more scalable LLM deployments but also sets a new standard in the design of hybrid decoding frameworks.

Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project.
-
Layer Parallelism: Enhancing LLM Inference Efficiency Through Parallel Execution of Transformer Layers (www.marktechpost.com)

LLMs have demonstrated exceptional capabilities, but their substantial computational demands pose significant challenges for large-scale deployment. While previous studies indicate that intermediate layers in deep neural networks can be reordered or removed without severely impacting performance, these insights have not been systematically leveraged to reduce inference costs. Given the rapid expansion of LLMs, which often contain hundreds of billions of parameters, optimizing inference is critical for improving efficiency, reducing latency, and cutting operational expenses. High-traffic applications relying on cloud-based LLM inference can incur monthly costs in the millions, making efficiency-driven solutions essential. Deploying these models on resource-constrained devices further demands strategies that maintain performance while minimizing computational overhead. Despite architectural similarities between modern transformers and deep residual networks, where layer depth can be partially redundant, research had yet to exploit these redundancies to optimize inference efficiency.

Several approaches exist for improving the computational efficiency of LLMs, including pruning, quantization, and parallelization. Pruning eliminates redundant parameters to introduce sparsity, improving memory utilization and processing speed. Quantization reduces precision by converting floating-point computations to lower-bit integer formats like INT8 or INT4, improving hardware efficiency and energy use. Parallelization techniques, such as tensor and pipeline parallelism, distribute workloads across multiple processing units to accelerate inference while managing communication overhead. Recent work has also explored architectural modifications at the layer level, including layer fusion and dynamic recurrent execution, to streamline computational graphs. However, fusing consecutive layers through tensor parallelism had not been studied, leaving an open avenue for further inference optimization.

Researchers from the University of Geneva, EPFL, and Meta FAIR propose a method to reduce the depth of pre-trained LLMs while preserving performance. By modifying the computational graph to execute grouped layer pairs in parallel, they improve inference speed by approximately 1.20× without any retraining. The approach retains 95%-99% accuracy across perplexity and In-Context Learning (ICL) benchmarks, and fine-tuning helps recover the minor performance losses. This significantly improves efficiency for large-scale LLM deployment, demonstrating that structural transformations such as layer merging and reordering can reduce computational workload while sustaining model effectiveness.

The study examines the effective depth of LLMs by applying transformations such as shuffling, merging, and pruning layers. Results indicate weak dependencies between intermediate layers, enabling certain layers to be reordered or parallelized with minimal perplexity loss. Running contiguous layers in parallel reduces depth while preserving performance, highlighting layer independence. Layer Parallelism then distributes these computations across GPUs, optimizing efficiency through tensor parallelism. Modifications to the attention and feed-forward networks ensure effective parallel execution, and adjustments to layer normalization help maintain stability. These findings suggest that transformer models can exploit parallelism to improve computational efficiency without substantial architectural changes; the toy sketch below illustrates the core idea.
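This PyTorch sketch is a schematic analogy of running a pair of residual layers concurrently, not the paper's exact transformation: both branches read the same input and their outputs are summed, instead of being applied one after the other. The block definitions are made up for illustration.

```python
# Schematic of executing a pair of residual layers in parallel (illustrative).
import torch
import torch.nn as nn

d = 64

def make_block():
    # Stand-in for a transformer sub-block (pre-norm MLP branch).
    return nn.Sequential(nn.LayerNorm(d), nn.Linear(d, d), nn.GELU(), nn.Linear(d, d))

f1, f2 = make_block(), make_block()
x = torch.randn(2, 8, d)

# Standard sequential execution: the second block sees the first block's output.
h = x + f1(x)
sequential = h + f2(h)

# "Layer parallel" execution: both branches read the same input and are summed,
# so f1 and f2 can run concurrently on different devices.
parallel = x + f1(x) + f2(x)

# The two agree closely when each branch is a small perturbation of the residual stream.
print((sequential - parallel).abs().mean())
```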
The study evaluates Layer Parallelism in terms of inference speed, ICL accuracy, and fine-tuning for performance recovery. Experiments use Llama2 7B and Llama3.2 3B on dual A100 GPUs, with Layer Parallelism applied to merged layers and tensor parallelism elsewhere. Results show that beyond 14 parallelized layers for Llama2 7B and 10 for Llama3.2 3B, ICL accuracy declines. Speed improves proportionally, reaching a 1.38x boost under aggressive parallelism. Fine-tuning the parallelized layers on RedPajama data substantially restores accuracy, improving MMLU from 83.6% to 94.4% while maintaining the speed gains, demonstrating the viability of Layer Parallelism with targeted adjustments.

In conclusion, the study introduces Layer Parallelism (LP), which restructures transformer computation by executing layer pairs in parallel, improving inference speed without retraining. Applied to Llama2 7B and Llama3.2 3B, LP reduced model depth by 21% and 18%, yielding speed-ups of 1.29x and 1.22x, respectively, and fine-tuning recovered 10.8% of the lost accuracy. These findings challenge the notion that transformer layers must be processed strictly sequentially, suggesting that selective parallelization is viable. LP enhances LLM efficiency in production, with future work exploring optimal layer grouping, interactions with quantization, and deeper theoretical insight into layer independence and computational efficiency.

Check out the Paper. All credit for this research goes to the researchers of this project.
-
Open O1: Revolutionizing Open-Source AI with Cutting-Edge Reasoning and Performance (www.marktechpost.com)

The Open O1 project is an initiative aimed at matching the capabilities of proprietary models, particularly OpenAI's O1, through an open-source approach. By leveraging advanced training methodologies and community-driven development, Open O1 seeks to democratize access to state-of-the-art AI models.

Proprietary models like OpenAI's O1 have demonstrated exceptional reasoning, tool use, and mathematical problem-solving, but they are closed-source, limiting accessibility and customization for researchers and developers. Existing open-source alternatives often lag in performance due to limitations in data quality, training techniques, and computational efficiency. The Open O1 project seeks to bridge this gap by curating high-quality Supervised Fine-Tuning (SFT) data for Chain-of-Thought (CoT) activation, which strengthens logical reasoning and problem-solving in smaller models. This approach enables models like LLaMA and Qwen to achieve long-context reasoning capabilities previously limited to proprietary systems.

To approach performance parity with OpenAI's O1, the Open O1 team follows a multi-stage approach. First, a specialized O1-style dataset is used to train the models, ensuring high-quality reasoning and contextual understanding. Next, models such as OpenO1-LLaMA-8B and OpenO1-Qwen-7B undergo rigorous SFT with hyperparameters optimized for CoT reasoning. The models incorporate adaptive scaling techniques to maximize efficiency at inference time, allowing better generalization across tasks. Finally, Open O1 provides multiple deployment options, including quantized versions for Hugging Face and support for local infrastructure.

Open O1's performance has been extensively evaluated against industry benchmarks, showing significant improvements over previous open-source models. In a comparison between LLaMA3.1-8B-Instruct and OpenO1-LLaMA-8B across multiple benchmarks, the results highlight Open O1's superior performance in mathematical reasoning (MATH), general knowledge understanding (MMLU), and complex reasoning tasks (BBH). Although it trails slightly on HellaSwag, the model's overall performance demonstrates its potential as a powerful open-source alternative.

The Open O1 team is committed to continuous innovation and expanding the model's capabilities. Planned work includes enhanced reward-model development, a reinforcement learning framework to refine model outputs and reasoning processes, optimized training pipelines for better scalability and efficiency, and a competitive chatbot arena to benchmark Open O1 against leading models on real-world tasks. Research into O1-style scaling laws for both training and inference efficiency is also underway.

Built on the principles of transparency, collaboration, and accessibility, Open O1 aims to ensure that AI advancements are not limited to a select few but are available to researchers, developers, and businesses worldwide. And the best part? It's completely open-source. With community-driven innovation, rigorous benchmarking, and a commitment to ethical AI, Open O1 is poised to redefine the landscape of large language models.
As the project continues to evolve, it promises to bring powerful, accessible, and high-performance AI tools to the global community, helping ensure that the future of AI remains open and inclusive.

Check out the GitHub Page and Model on Hugging Face. All credit for this research goes to the researchers of this project.
-
Step by Step Guide on How to Build an AI News Summarizer Using Streamlit, Groq and Tavily (www.marktechpost.com)

Introduction

In this tutorial, we will build an advanced AI-powered news agent that can search the web for the latest news on a given topic and summarize the results. The agent follows a structured workflow:

- Browsing: generates relevant search queries and collects information from the web.
- Writing: extracts and compiles news summaries from the collected information.
- Reflection: critiques the summaries by checking for factual correctness and suggests improvements.
- Refinement: improves the summaries based on the critique.
- Headline Generation: generates an appropriate headline for each news summary.

To enhance usability, we will also create a simple GUI using Streamlit. Similar to previous tutorials, we will use Groq for LLM-based processing and Tavily for web browsing. You can generate free API keys from their respective websites.

Setting Up the Environment

Install the required libraries:

```
pip install langgraph==0.2.53 langgraph-checkpoint==2.0.6 langgraph-sdk==0.1.36 langchain-groq langchain-community langgraph-checkpoint-sqlite==2.0.1 tavily-python streamlit
```

Import the libraries and set the API keys:

```python
import os
import sqlite3
from langgraph.graph import StateGraph
from langchain_core.messages import SystemMessage, HumanMessage
from langchain_groq import ChatGroq
from tavily import TavilyClient
from langgraph.checkpoint.sqlite import SqliteSaver
from typing import TypedDict, List
from pydantic import BaseModel
import streamlit as st

# Set API keys
os.environ['TAVILY_API_KEY'] = "your_tavily_key"
os.environ['GROQ_API_KEY'] = "your_groq_key"

# Initialize the database for checkpointing
sqlite_conn = sqlite3.connect("checkpoints.sqlite", check_same_thread=False)
memory = SqliteSaver(sqlite_conn)

# Initialize the model and the Tavily client
model = ChatGroq(model="Llama-3.1-8b-instant")
tavily = TavilyClient(api_key=os.environ["TAVILY_API_KEY"])
```

Defining the Agent State

The agent maintains the following state throughout its workflow:

- topic: the topic on which the user wants the latest news
- drafts: the first drafts of the news summaries
- content: the research content extracted from Tavily's search results
- critiques: the critique and recommendations generated for the drafts in the reflection step
- refined_summaries: updated news summaries after incorporating suggestions from the critique
- headings: headlines generated for each news article

```python
class AgentState(TypedDict):
    topic: str
    drafts: List[str]
    content: List[str]
    critiques: List[str]
    refined_summaries: List[str]
    headings: List[str]
```

Defining Prompts

We define system prompts for each phase of the agent's workflow:

```python
BROWSING_PROMPT = """You are an AI news researcher tasked with finding the latest news articles on given topics. Generate up to 3 relevant search queries."""

WRITER_PROMPT = """You are an AI news summarizer. Write a detailed summary (1 to 2 paragraphs) based on the given content, ensuring factual correctness, clarity, and coherence."""

CRITIQUE_PROMPT = """You are a teacher reviewing draft summaries against the source content. Ensure factual correctness, identify missing or incorrect details, and suggest improvements.
----------
Content: {content}
----------"""

REFINE_PROMPT = """You are an AI news editor. Given a summary and critique, refine the summary accordingly.
-----------
Summary: {summary}"""

HEADING_GENERATION_PROMPT = """You are an AI news summarizer. Generate a short, descriptive headline for each news summary."""
```

Structuring Queries and News

We use Pydantic to define the structure of the LLM's outputs. This matters because we want the queries to be a list of strings, and the content extracted from the web will contain multiple news articles, hence also a list of strings.

```python
class Queries(BaseModel):
    queries: List[str]

class News(BaseModel):
    news: List[str]
```

Implementing the AI Agent

1. Browsing node: generates search queries and retrieves relevant content from the web.

```python
def browsing_node(state: AgentState):
    queries = model.with_structured_output(Queries).invoke([
        SystemMessage(content=BROWSING_PROMPT),
        HumanMessage(content=state['topic'])
    ])
    content = state.get('content', [])
    for q in queries.queries:
        response = tavily.search(query=q, max_results=2)
        for r in response['results']:
            content.append(r['content'])
    return {"content": content}
```

2. Writing node: extracts news summaries from the retrieved content.

```python
def writing_node(state: AgentState):
    content = "\n\n".join(state['content'])
    news = model.with_structured_output(News).invoke([
        SystemMessage(content=WRITER_PROMPT),
        HumanMessage(content=content)
    ])
    return {"drafts": news.news}
```

3. Reflection node: critiques the generated summaries against the source content.

```python
def reflection_node(state: AgentState):
    content = "\n\n".join(state['content'])
    critiques = []
    for draft in state['drafts']:
        response = model.invoke([
            SystemMessage(content=CRITIQUE_PROMPT.format(content=content)),
            HumanMessage(content="draft: " + draft)
        ])
        critiques.append(response.content)
    return {"critiques": critiques}
```

4. Refinement node: improves the summaries based on the critique.

```python
def refine_node(state: AgentState):
    refined_summaries = []
    for summary, critique in zip(state['drafts'], state['critiques']):
        response = model.invoke([
            SystemMessage(content=REFINE_PROMPT.format(summary=summary)),
            HumanMessage(content="Critique: " + critique)
        ])
        refined_summaries.append(response.content)
    return {"refined_summaries": refined_summaries}
```

5. Headline generation node: generates a short headline for each news summary.

```python
def heading_node(state: AgentState):
    headings = []
    for summary in state['refined_summaries']:
        response = model.invoke([
            SystemMessage(content=HEADING_GENERATION_PROMPT),
            HumanMessage(content=summary)
        ])
        headings.append(response.content)
    return {"headings": headings}
```

Building the UI with Streamlit

```python
# Define the Streamlit app
st.title("News Summarization Chatbot")

# Initialize session state
if "messages" not in st.session_state:
    st.session_state["messages"] = []
if "thread" not in st.session_state:
    # Persist the thread counter across Streamlit reruns so each
    # request gets a unique checkpointer thread_id.
    st.session_state["thread"] = 1

# Display past messages
for message in st.session_state["messages"]:
    with st.chat_message(message["role"]):
        st.markdown(message["content"])

# Input field for the user
user_input = st.chat_input("Ask about the latest news...")

if user_input:
    st.session_state["messages"].append({"role": "user", "content": user_input})
    with st.chat_message("assistant"):
        loading_text = st.empty()
        loading_text.markdown("*Thinking...*")

        # Assemble the agent graph
        builder = StateGraph(AgentState)
        builder.add_node("browser", browsing_node)
        builder.add_node("writer", writing_node)
        builder.add_node("reflect", reflection_node)
        builder.add_node("refine", refine_node)
        builder.add_node("heading", heading_node)
        builder.set_entry_point("browser")
        builder.add_edge("browser", "writer")
        builder.add_edge("writer", "reflect")
        builder.add_edge("reflect", "refine")
        builder.add_edge("refine", "heading")
        graph = builder.compile(checkpointer=memory)

        config = {"configurable": {"thread_id": str(st.session_state["thread"])}}
        for s in graph.stream({"topic": user_input}, config):
            print(s)  # log intermediate steps to the console

        state_values = graph.get_state(config).values
        refined_summaries = state_values['refined_summaries']
        headings = state_values['headings']
        st.session_state["thread"] += 1

        # Display the final response
        loading_text.empty()
        response_text = "\n\n".join([f"{h}\n{s}" for h, s in zip(headings, refined_summaries)])
        st.markdown(response_text)
        st.session_state["messages"].append({"role": "assistant", "content": response_text})
```

Conclusion

This tutorial covered the entire process of building an AI-powered news summarization agent with a simple Streamlit UI; assuming the script is saved as app.py, you can launch it with `streamlit run app.py`. You can now experiment with this agent and make further improvements, such as:

- A better GUI for enhanced user interaction.
- Iterative refinement to make sure the summaries are accurate and appropriate.
- Maintaining context to continue conversations about particular news items.

Happy coding!
-
Can Users Fix AI Bias? Exploring User-Driven Value Alignment in AI Companions (www.marktechpost.com)

Large language model (LLM)-based AI companions have evolved from simple chatbots into entities that users perceive as friends, partners, or even family members. Yet despite their human-like capabilities, these AI companions often make biased, discriminatory, and harmful claims. Such biases can reinforce stereotypes and cause psychological harm, particularly in marginalized communities. Conventional value-alignment methods, controlled predominantly by developers, cannot foresee and accommodate every user need in everyday scenarios. Users are frequently confronted with discriminatory AI output that conflicts with their values, creating frustration and helplessness. This paper instead investigates a paradigm in which users themselves take the initiative to correct AI biases through multiple mechanisms. Understanding how users navigate and mitigate these biases is essential for building AI systems that empower communities while supporting ethical engagement.

Conventional strategies for reducing AI bias, such as fine-tuning, prompt engineering, and reinforcement learning from human feedback, rely on top-down intervention by developers. Although these mechanisms try to realign AI behavior with pre-established ethical norms, they are largely unable to handle the diverse and dynamic ways users engage with AI companions. Existing algorithm-auditing efforts are aimed primarily at discovering AI biases and do not analyze how users consciously work to correct them. These shortcomings point to the need for a more flexible and participatory mechanism that gives users greater control over AI behavior.

Researchers from Stanford University, Carnegie Mellon University, City University of Hong Kong, and Tsinghua University introduce a user-driven framework in which individuals take an active role in identifying and correcting AI biases. The study analyzes 77 social media reports of discriminatory AI responses and semi-structured interviews with 20 experienced AI companion users. In contrast to conventional developer-led alignment, this approach focuses on user agency in shaping AI behavior. The research identifies six types of biased AI responses, three conceptual models by which users account for AI behavior, and seven distinct methods users employ to counteract biases, showing that users not only detect bias but also actively reframe AI responses toward their values.

A mixed-methods design was used, integrating content analysis of user complaints with qualitative interviews. Researchers gathered 77 complaints about discriminatory AI statements on platforms such as Reddit, TikTok, Xiaohongshu, and Douban. Twenty long-term users experienced in re-aligning AI companions were recruited, each participating in 1-2 hour interviews with recall tasks and think-aloud exercises in which they chatted with biased AI companions. Reflexive thematic analysis was used to code the complaints and alignment strategies. Six broad categories of discriminatory AI statements emerged: misogyny, LGBTQ+ bias, appearance bias, ableism, racism, and socioeconomic bias. Users also conceptualized AI behavior in three distinct ways.
Some treated the AI as a machine, blaming bias on technical flaws arising from training data and algorithmic constraints. Others treated the AI as a baby, an immature being that could be molded and taught right from wrong. A third group treated the AI as a cosplayer, attributing bias to role-play settings rather than the underlying algorithm. Seven user-driven alignment strategies were identified, grouped into three broad approaches. Technical strategies modify AI responses directly, for example by regenerating or rewriting statements or giving negative feedback. Argumentative strategies use reasoning, persuasion, or expressions of anger to correct biases. Character strategies adjust the AI's role settings or use out-of-character interventions to reshape the interaction.

The findings show that user-initiated value alignment is a recursive process driven by personal interpretations of AI behavior, resulting in different bias-mitigation strategies. People who view the AI as a machine rely primarily on technical solutions, such as response regeneration or flagging offensive content. People who view the AI as a child prefer reasoning and persuasion, while those who view it as a performer adjust character parameters to reduce opportunities for biased responses. Of the seven strategies identified, gentle persuasion and reasoning were the most effective at producing lasting behavior change, while expressions of anger and technical fixes like response regeneration produced mixed results. Although users can influence AI behavior over time, obstacles persist, such as the emotional burden of constantly correcting the AI and the persistence of biases through the system's memory retention. These findings suggest AI platforms should include more adaptive learning models and community-based approaches that give users greater control over bias correction while reducing cognitive and emotional load.

User-driven value alignment reframes human-AI interaction around users as active agents in shaping AI behavior. Through analysis of user grievances and real alignment practice, the research highlights the limitations of expert-driven frameworks and stresses the value of participatory approaches involving direct user participation. The findings suggest AI platforms should integrate collaborative, community-based alignment capabilities that let users share strategies and work with developers to improve AI responses. Future research must address scalable methods for incorporating user feedback into AI training, along with ethical concerns around potential misuse and psychological impacts on users. By shifting focus from developer-driven interventions to active user participation, this framework lays a foundation for AI systems that are more responsive, ethically accountable, and attuned to diverse user perspectives.

Check out the Paper. All credit for this research goes to the researchers of this project.
-
Can 1B LLM Surpass 405B LLM? Optimizing Computation for Small LLMs to Outperform Larger Models (www.marktechpost.com)

Test-Time Scaling (TTS) is a crucial technique for enhancing LLM performance by leveraging additional computational resources during inference. Despite its potential, there has been little systematic analysis of how policy models, Process Reward Models (PRMs), and problem complexity influence TTS, limiting its practical application. TTS can be categorized into internal TTS, which encourages step-by-step reasoning through extended Chain-of-Thought (CoT) processes, and external TTS, which enhances performance using sampling or search-based methods with fixed models. The key challenge in external TTS lies in allocating computation optimally across different tasks. Current methods employ PRMs to guide answer selection and scale test-time computation efficiently, but a comprehensive evaluation of how these factors affect TTS strategies has been missing, restricting the community's understanding of optimal computation scaling for LLMs.

Prior research has explored multiple strategies to enhance LLM performance, including majority voting, search-based approaches, and self-refinement techniques. Test-time methods such as CoT prompting, self-verification, and external tool integration have proven effective at improving reasoning without modifying model parameters. PRMs, which outperform Outcome Reward Models (ORMs), significantly refine LLM-generated outputs. Recent advances in PRMs focus on efficient data collection, implicit rewards, and advanced ranking techniques to improve mathematical reasoning, and tools like ProcessBench and PRMBench have been developed to benchmark PRM effectiveness. The evolution of PRMs and TTS strategies underscores the need for systematic research into optimizing inference-time computation across diverse tasks.

Researchers from Shanghai AI Laboratory, Tsinghua University, Harbin Institute of Technology, and BUPT investigate the impact of policy models, PRMs, and problem complexity on TTS through extensive experiments on MATH-500 and AIME24. Their findings show that compute-optimal TTS strategies depend on all of these factors, allowing smaller models (e.g., 1B, 3B, 7B) to outperform far larger ones (e.g., 405B, GPT-4o, DeepSeek-R1) with greater efficiency. The study emphasizes the importance of reward-aware TTS for optimal scaling, demonstrating that strategic test-time computation significantly enhances LLM reasoning across architectures and task complexities.

Compute-optimal TTS distributes computational resources optimally for each problem. Prior approaches rely on PRMs as verifiers, trained either on the same policy model (on-policy) or a different one (offline). On-policy PRMs yield more accurate rewards, while offline PRMs face out-of-distribution challenges. Given the high cost of training a PRM for every policy model, a general approach is needed. Experiments show that rewards significantly influence TTS performance, so the authors propose a reward-aware strategy that integrates rewards into compute allocation. They also find that problem difficulty is better assessed with absolute thresholds than with quantiles for effective scaling. The sketch below gives a simplified flavor of reward-aware selection.
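As a simplified illustration of reward-aware test-time scaling, this sketch implements Best-of-N selection with a PRM and scales the sampling budget with an absolute difficulty estimate. The policy, PRM, and difficulty function are stand-in stubs, not the paper's models or its exact allocation rule.

```python
# Toy reward-aware Best-of-N test-time scaling (all components are stubs).
import random

def policy_sample(problem):          # small policy model proposes a solution
    return f"solution-{random.randint(0, 999)} for {problem}"

def prm_score(problem, solution):    # process reward model scores a solution
    return random.random()           # stand-in for a learned reward in [0, 1]

def difficulty(problem):             # absolute difficulty estimate in [0, 1]
    return min(len(problem) / 100.0, 1.0)

def best_of_n(problem, base_budget=4, max_budget=64):
    # Compute-optimal flavor: harder problems get a larger sampling budget.
    n = int(base_budget + difficulty(problem) * (max_budget - base_budget))
    candidates = [policy_sample(problem) for _ in range(n)]
    return max(candidates, key=lambda s: prm_score(problem, s))

print(best_of_n("Prove that the sum of two even numbers is even."))
```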
The study then examines whether compute-optimal TTS lets small policy models outperform much larger ones, improve upon CoT and majority voting, and surpass long-CoT methods. Findings reveal that small models using compute-optimal TTS can outperform significantly larger models on MATH-500 and AIME24: TTS improves efficiency by up to 256× compared to majority voting and boosts reasoning by 154.6% over CoT. Moreover, TTS outperforms several long-CoT-based methods, demonstrating its effectiveness in enhancing LLM reasoning.

In conclusion, the study examines compute-optimal TTS across various policy models, PRMs, and task complexities. The results highlight that smaller models can surpass larger ones under optimized TTS, with a 1B model outperforming a 405B model, and a 7B PRM effectively supervising a 72B policy model, pointing to a shift toward weak-to-strong supervision. Future work should focus on improved supervision methods for enhanced reasoning. The results are based on mathematical tasks; extending TTS to coding and chemistry remains unexplored. These insights underscore TTS's potential to improve LLM efficiency and adaptability across diverse challenges.

Check out the Paper and Project Page. All credit for this research goes to the researchers of this project.
-
Meta AI Introduces CoCoMix: A Pretraining Framework Integrating Token Prediction with Continuous Concepts (www.marktechpost.com)

The dominant approach to pretraining large language models (LLMs) relies on next-token prediction, which has proven effective at capturing linguistic patterns but comes with notable limitations. Language tokens often convey only surface-level information, requiring models to process vast amounts of data to develop deeper reasoning. Token-based learning also struggles to capture long-term dependencies, making tasks that require planning and abstraction more difficult. Researchers have explored alternatives such as knowledge distillation and structured input augmentation, but these have not fully addressed the limitations of token-based learning. This raises an important question: can LLMs be trained in a way that combines token-level processing with conceptual understanding? Meta AI introduces Continuous Concept Mixing (CoCoMix) as a potential answer.

CoCoMix: A Different Approach to Pretraining

CoCoMix integrates token prediction with the modeling of continuous concepts derived from the hidden states of a pretrained model. The method employs a Sparse Autoencoder (SAE) to extract high-level semantic representations, which are incorporated into training by interleaving them with token embeddings. This design preserves the benefits of token-based learning while enhancing the model's ability to recognize and process broader conceptual structures.

Technical Details and Benefits

CoCoMix operates through three main components:

- Concept extraction via sparse autoencoders (SAEs): a pretrained SAE identifies latent semantic features in a model's hidden states, capturing information that extends beyond individual tokens.
- Concept selection with attribution scoring: not all extracted concepts contribute equally to predictions, so CoCoMix uses attribution methods to determine which concepts are most influential and should be retained.
- Interleaving continuous concepts with token representations: the selected concepts are compressed into a continuous vector and integrated into the hidden states alongside token embeddings, allowing the model to use both token-level and conceptual information.

This approach improves sample efficiency, enabling models to reach comparable performance with fewer training tokens. It also enhances interpretability, since the extracted concepts can be inspected and adjusted, offering a clearer view of how the model processes information.
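To make the concept-extraction step more concrete, here is a minimal sparse autoencoder in PyTorch of the general kind used to pull latent features from hidden states. The dimensions, sparsity penalty, and training details are illustrative assumptions, not CoCoMix's actual configuration.

```python
# Minimal sparse autoencoder over hidden states (illustrative dimensions).
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_hidden=512, d_concepts=2048):
        super().__init__()
        self.encoder = nn.Linear(d_hidden, d_concepts)  # hidden state -> concept codes
        self.decoder = nn.Linear(d_concepts, d_hidden)  # concept codes -> reconstruction

    def forward(self, h):
        z = torch.relu(self.encoder(h))  # non-negative, sparsity-friendly codes
        return self.decoder(z), z

sae = SparseAutoencoder()
h = torch.randn(32, 512)                 # stand-in batch of pretrained-model hidden states
recon, z = sae(h)

# Reconstruction loss plus an L1 penalty encourages few active "concepts" per input.
loss = nn.functional.mse_loss(recon, h) + 1e-3 * z.abs().mean()
loss.backward()
print(loss.item(), (z > 0).float().mean().item())  # loss and fraction of active units
```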
Performance and Evaluation

Meta AI evaluated CoCoMix across multiple benchmarks, including OpenWebText, LAMBADA, WikiText-103, HellaSwag, PIQA, SIQA, ARC-Easy, and WinoGrande. The findings indicate:

- Improved sample efficiency: CoCoMix matches the performance of next-token prediction while requiring 21.5% fewer training tokens.
- Enhanced generalization: across model sizes (69M, 386M, and 1.38B parameters), CoCoMix showed consistent improvements in downstream task performance.
- Effective knowledge transfer: CoCoMix supports knowledge transfer from smaller models to larger ones, outperforming traditional knowledge distillation.
- Greater interpretability: integrating continuous concepts allows greater control and transparency in model decision-making, providing a clearer view of internal processes.

Conclusion

CoCoMix presents an alternative approach to LLM pretraining that combines token prediction with concept-based reasoning. By incorporating structured representations extracted via SAEs, it improves efficiency and interpretability without disrupting the underlying next-token prediction framework. Experimental results suggest the method offers a balanced way to improve language model training, particularly where structured reasoning and transparent decision-making matter. Future research may focus on refining concept extraction methods and further integrating continuous representations into pretraining workflows.

Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project.
-
Anthropic AI Launches the Anthropic Economic Index: A Data-Driven Look at AI's Economic Role (www.marktechpost.com)

Artificial intelligence is increasingly integrated into many sectors, yet there is limited empirical evidence on its real-world application across industries. Traditional research methods, such as predictive modeling and user surveys, struggle to capture AI's evolving role in workplaces, making it difficult to assess its influence on productivity, labor markets, and economic structures. A more data-driven approach is necessary to understand how AI is being used and what its broader implications are.

Anthropic AI has launched the Anthropic Economic Index, an initiative designed to track AI's role in economic activities. The first report, based on millions of anonymized Claude conversations, maps AI usage across job categories using the U.S. Department of Labor's O*NET database. The findings suggest that AI is primarily used for software development and writing tasks, with these categories accounting for nearly half of all AI interactions. Furthermore, 36% of occupations incorporate AI for at least a quarter of their associated tasks, indicating its growing presence across industries. The framework provides a structured way to observe AI's economic footprint over time.

Technical Approach and Key Benefits

The Anthropic Economic Index leverages Clio, a privacy-preserving analysis tool, to study over four million conversations from Claude.ai users. By categorizing AI interactions according to the occupational tasks defined in O*NET, the research highlights patterns in AI adoption. Key observations include:

- AI is widely used in software engineering and content creation, reflecting its strength in technical and creative domains.
- The depth of AI usage varies by occupation, with 4% of professions using AI for at least 75% of their tasks.
- Cognitive tasks, such as reading comprehension, writing, and critical thinking, dominate AI interactions, whereas physical and managerial tasks see lower engagement.
- AI adoption is highest in mid-to-high-wage occupations, particularly in the technology sector, while its presence in lower-wage or highly specialized fields remains limited.

A primary benefit of this approach is its ability to continuously track AI's economic role, helping businesses, policymakers, and researchers understand how AI is reshaping the workforce.

Key Insights and Patterns

The study distinguishes between AI augmentation, where AI enhances human capabilities, and automation, where AI independently completes tasks.
Findings indicate that 57% of AI interactions involve augmentation, such as refining ideas or generating drafts, while 43% are automation-driven, where AI executes tasks with minimal human intervention. Other key observations include:

- Software development and data science are the most AI-intensive fields, accounting for 37.2% of AI-related conversations.
- Writing, education, and business operations also show significant AI usage, particularly in content creation and analytical tasks.
- Occupations involving physical labor, such as construction and healthcare support, show lower AI adoption.
- AI use is most common in jobs requiring a bachelor's degree, especially those in Job Zone 4 (substantial preparation required), whereas highly specialized Job Zone 5 fields, such as medicine and law, show lower adoption due to professional and regulatory constraints.

Conclusion

The Anthropic Economic Index provides a structured way to examine AI's impact across occupations. While AI adoption is growing, its role differs across professions, enhancing work in some areas while automating tasks in others. By offering a data-backed perspective on how AI is integrated into the economy, the initiative enables better-informed discussion on the future of work. As AI evolves, continued analysis will be essential to understanding its long-term economic effects and guiding thoughtful policy decisions.

Check out the Paper and Technical Details. All credit for this research goes to the researchers of this project.
-
Meet OpenThinker-32B: A State-of-the-Art Open-Data Reasoning Model (www.marktechpost.com)

Artificial intelligence has made significant strides, yet developing models capable of nuanced reasoning remains a challenge. Many existing models struggle with complex problem-solving, particularly in mathematics, coding, and scientific reasoning, often due to limitations in data quality, model architecture, and the scalability of training. The need for high-performing open-data reasoning models is increasingly pressing as proprietary models continue to lead the field.

OpenThinker-32B is an open-data reasoning model developed by the Open Thoughts team to address these challenges. Fine-tuned from Qwen2.5-32B-Instruct on the OpenThoughts-114k dataset, the model demonstrates strong performance across a range of reasoning tasks in mathematics, coding, and scientific inquiry.

Technically, OpenThinker-32B has 32.8 billion parameters and supports a context length of 16,000 tokens, allowing it to handle complex tasks requiring extended context. The model was trained for three epochs using the LLaMa-Factory framework, with a learning rate of 1e-5 under a cosine learning-rate scheduler. Training ran on AWS SageMaker across four nodes, each equipped with eight H100 GPUs, for approximately 90 hours. This setup strengthens the model's ability to manage intricate reasoning processes efficiently.
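For reference, a cosine learning-rate schedule like the one described can be set up in a few lines of PyTorch. The model, optimizer, and step count below are placeholders, not the actual LLaMa-Factory configuration.

```python
# Sketch of a cosine learning-rate schedule starting at 1e-5 (placeholder model).
import torch

model = torch.nn.Linear(10, 10)  # stand-in for the LLM being fine-tuned
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

total_steps = 1000  # assumed total number of training steps
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=total_steps)

for step in range(total_steps):
    optimizer.step()   # loss computation and backward pass omitted in this sketch
    scheduler.step()   # decay the learning rate along a cosine curve
    if step % 250 == 0:
        print(step, scheduler.get_last_lr()[0])
```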
-
Meet Huginn-3.5B: A New AI Reasoning Model with Scalable Latent Computation

Artificial intelligence models face a fundamental challenge in efficiently scaling their reasoning capabilities at test time. While increasing model size often leads to performance gains, it also demands significant computational resources and extensive training data, making such approaches impractical for many applications. Traditional techniques, such as expanding model parameters or employing Chain-of-Thought (CoT) reasoning, rely on explicit verbalization of intermediate steps. However, these methods are constrained by context length limitations and the need for task-specific training. Researchers have been exploring alternative approaches that enable AI to reason more efficiently, focusing on internal computation rather than producing additional tokens.

Huginn-3.5B: A New Approach to Latent Reasoning

Researchers from the ELLIS Institute Tübingen, Max Planck Institute for Intelligent Systems, Tübingen AI Center, University of Maryland, College Park, and Lawrence Livermore National Laboratory have introduced Huginn-3.5B, a model designed to rethink test-time computation. Huginn-3.5B leverages a recurrent depth approach, allowing it to iterate over its latent space during inference. This method refines the hidden state iteratively, rather than generating more tokens, resulting in a more efficient and scalable reasoning process. The model can allocate additional computational effort to complex queries while maintaining efficiency on simpler tasks.

Key Features and Benefits

Huginn-3.5B's core innovation lies in its depth-recurrent transformer architecture, which incorporates a looped processing unit (see the sketch after this summary). This mechanism enables the model to:

- Enhance reasoning dynamically: Huginn-3.5B adjusts its computational effort based on task complexity, iterating through latent space as needed.
- Reduce reliance on long context windows: since reasoning occurs within the latent space, the model requires less memory and processing power.
- Function without specialized training data: unlike Chain-of-Thought methods, Huginn-3.5B does not require explicit reasoning demonstrations to generalize effectively.
- Adapt compute per token: the model optimizes efficiency by determining how much computation each token requires.
- Facilitate efficient decoding: Huginn-3.5B refines its hidden state before generating output tokens, leading to improved coherence and reduced latency.

Performance Insights

Trained on 800 billion tokens spanning general text, code, and mathematical reasoning, Huginn-3.5B was evaluated across various benchmarks. The findings include:

- Improved accuracy with increased computation: by iterating further in its latent space, Huginn-3.5B achieved performance levels comparable to much larger models.
- Competitiveness against similar-sized models: Huginn-3.5B outperformed Pythia-6.9B and Pythia-12B on reasoning benchmarks such as ARC and GSM8K.
- Task-dependent compute scaling: the model allocated additional resources to complex tasks like GSM8K while processing simpler tasks like OpenBookQA efficiently.

Conclusion: The Role of Latent Reasoning in AI

Huginn-3.5B offers an alternative perspective on AI reasoning by shifting from explicit token-based processing to computation within the latent space. This enables more efficient and adaptable test-time computation without necessitating larger models.
As AI continues to evolve, recurrent depth reasoning may provide a promising direction, complementing existing scaling strategies while offering computational efficiency. Future research may refine this approach further, integrating it with mixture-of-experts models and fine-tuning techniques to enhance flexibility and performance.

Check out the Paper. All credit for this research goes to the researchers of this project.
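To make the looped-processing idea concrete, here is a minimal PyTorch sketch of depth-recurrent computation: a single shared block is applied repeatedly to refine a latent state before decoding, so "thinking longer" costs iterations rather than tokens. This is a toy illustration of the general technique, not Huginn-3.5B's actual architecture; all dimensions and layer choices are assumptions.

```python
# Toy sketch of depth-recurrent latent reasoning (NOT Huginn's real design).
# A shared transformer layer is looped num_iterations times over the hidden
# state; raising num_iterations buys more test-time compute without emitting
# extra tokens.
import torch
import torch.nn as nn

class RecurrentDepthLM(nn.Module):
    def __init__(self, vocab_size=32000, d_model=512, n_heads=8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # One shared layer reused at every recurrence step.
        self.core = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        # Re-inject the token embeddings into each iteration of the loop.
        self.inject = nn.Linear(2 * d_model, d_model)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens, num_iterations=4):
        x = self.embed(tokens)
        h = torch.zeros_like(x)             # initial latent state
        for _ in range(num_iterations):     # more iterations = more "thought"
            h = self.core(self.inject(torch.cat([h, x], dim=-1)))
        return self.head(h)

model = RecurrentDepthLM()
# A harder query could simply be run with a larger num_iterations.
logits = model(torch.randint(0, 32000, (1, 16)), num_iterations=8)
```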
-
LIMO: The AI Model that Proves Quality Training Beats Quantity

Reasoning tasks remain a major challenge for most language models. Instilling reasoning ability in models, particularly for programming and mathematical applications that require solid sequential reasoning, still seems far off. The difficulty can be attributed to the inherent complexity of these tasks, which demand multi-step logical deduction, planned with domain knowledge, to find a structured solution path.

LLMs are therefore supervised on massive amounts of data, with hundreds of thousands of examples. This training rests on two assumptions: first, that learning such a cognitive skill is possible only with many supervised examples; and second, that this training inevitably leads to memorization rather than generalization. The approach also brings high computational costs and the burden of data collection. This article discusses a method that exploits the rich knowledge foundations and inference-time compute of modern LLMs to do away with these enormous data requirements.

Researchers from Shanghai Jiao Tong University present the Less-Is-More (LIMO) hypothesis: in foundation models where domain knowledge has been comprehensively encoded during pre-training, sophisticated reasoning capabilities can be instilled through minimal, precise demonstrations of cognitive processes. The hypothesis stems from recent developments in the LLM space, where developers incorporate unprecedented amounts of mathematical content during pre-training, enriching models with math and programming logic before deployment. The emergence of techniques that scale up longer reasoning chains has also significantly motivated this research.

According to the LIMO hypothesis, the elicitation threshold for complex reasoning is determined by two key factors:

- the latent presence of prerequisite knowledge within the model's parameter space (the domain knowledge instilled during pre-training), and
- the effectiveness of minimal exemplars in demonstrating systematic problem-solving processes (post-training examples that act as cognitive prompts for solving reasoning tasks with the available knowledge).

LIMO thus leverages the rich embedded pre-training knowledge and supplies detailed reasoning chains through minimal but well-structured examples. The method prioritizes the quality and structure of prompts over their quantity, pushing the model to reason from past lessons rather than simply recall them. In this way, the pipeline challenges the assumption that supervised fine-tuning merely makes the model memorize. The authors further investigated the relationship between reasoning and data, identifying critical factors such as the synergy between pre-trained knowledge foundations and test-time computation scaling.

The authors released a comprehensive open-source suite to ensure reproducibility, including their fine-tuned models, evaluation pipelines, training code, and carefully curated datasets with varying quality levels.

In their experiments, the authors attempted to teach models reasoning with just hundreds of examples instead of the previous hundreds of thousands, evaluating LIMO's performance across 10 benchmarks to assess its out-of-distribution generalization capabilities. The results were impressive and promising.
Notably, with only 817 curated training samples, LIMO achieved 57.1% accuracy on the highly challenging American Invitational Mathematics Examination (AIME) benchmark and 94.8% on the MATH dataset, superseding SFT methods that scored 6.5% and 59.2% on the respective benchmarks. LIMO thus achieved a 40.5% absolute improvement over models trained on 100 times more data, refuting the first assumption that large-scale supervised training is required to instill reasoning.

Conclusion: The researchers offered an insightful hypothesis about the reasoning-training regime of LLMs through the LIMO model, challenging the underlying assumptions of SFT for instilling reasoning. LIMO demonstrates that less can be more, showing commendable performance on challenging datasets and superseding SFT with skillfully orchestrated cognitive templates.

Check out the Paper. All credit for this research goes to the researchers of this project.
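The core of the recipe is data curation rather than a new algorithm: a few hundred problems, each paired with a detailed reasoning chain, exported as ordinary SFT records. The sketch below is a hypothetical illustration of that workflow, not LIMO's released pipeline; the field names, the length-based quality filter, and the candidate pool are stand-ins.

```python
# Hypothetical "less is more" curation sketch (not LIMO's actual pipeline):
# select a small set of problems with long, well-structured reasoning chains
# and write them out as chat-style SFT records.
import json

candidate_pool = [  # stand-in for a large pool of mined problems
    {"problem": "Compute 2 + 2.",
     "reasoning_chain": "Step 1: ...\n" * 12,   # dummy multi-step solution
     "answer": "4"},
]

def make_sft_record(ex):
    """Wrap one curated example as a chat-style fine-tuning record."""
    return {"messages": [
        {"role": "user", "content": ex["problem"]},
        {"role": "assistant",
         "content": f"{ex['reasoning_chain']}\nFinal answer: {ex['answer']}"},
    ]}

# Quality over quantity: keep only examples whose reasoning is long and
# well structured (this particular criterion is an assumption, not LIMO's).
curated = [ex for ex in candidate_pool
           if len(ex["reasoning_chain"].splitlines()) >= 10]

with open("limo_sft.jsonl", "w") as f:
    for ex in curated[:817]:                    # LIMO used 817 samples
        f.write(json.dumps(make_sft_record(ex)) + "\n")
```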