Marktechpost AI
AI/ML Research and Dev News Platform (1 million+ monthly traffic) | 50k+ ML subreddit | Contact: Asif@marktechpost.com
Recent Updates
  • Meta AI Proposes Multi-Token Attention (MTA): A New Attention Method which Allows LLMs to Condition their Attention Weights on Multiple Query and Key Vectors
    www.marktechpost.com
Large Language Models (LLMs) benefit significantly from attention mechanisms, which enable the effective retrieval of contextual information. Nevertheless, traditional attention methods depend primarily on single-token attention, where each attention weight is computed from a single pair of query and key vectors. This design inherently constrains the model's ability to discern contexts that require integrating multiple token signals, limiting its effectiveness on complex linguistic dependencies. For example, identifying sentences that simultaneously contain both "Alice" and "rabbit" is challenging, because conventional attention mechanisms struggle to integrate multiple separate attention signals efficiently without substantially increasing model complexity.

Meta AI addresses this limitation by introducing Multi-Token Attention (MTA), an advanced attention mechanism that conditions attention weights simultaneously on multiple query and key vectors. MTA integrates convolution operations over queries, keys, and attention heads, enhancing the precision and efficiency of contextual information retrieval. Specifically, the MTA framework consists of two convolutional components: key-query convolution, which aggregates multiple token signals within individual attention heads, and head mixing convolution, which facilitates information sharing among different attention heads. Additionally, the implementation employs group normalization with depth-dependent scaling to stabilize gradient flow, further improving training stability and efficacy.

At a technical level, MTA modifies conventional attention calculations by incorporating a two-dimensional convolution operation on the attention logits prior to softmax normalization. This convolution allows adjacent queries and keys to influence each other's attention scores, enabling the attention mechanism to identify contextual relationships involving multiple tokens more precisely. Consequently, the model efficiently aggregates local token interactions without substantially increasing the number of parameters or the dimensionality of the attention vectors. Moreover, head convolution promotes effective knowledge transfer among attention heads, selectively amplifying relevant context signals while mitigating less pertinent information. Collectively, these enhancements yield a more robust attention mechanism capable of capturing complex multi-token interactions.

Empirical evaluations validate the efficacy of MTA across several benchmarks. In a structured motivating task explicitly designed to illustrate the shortcomings of single-token attention, MTA achieved near-perfect performance, with an error rate of only 0.1%, in contrast to standard Transformer models, which exhibited error rates above 50%. Further large-scale experiments involving an 880M-parameter model trained on 105 billion tokens showed MTA consistently outperforming baseline architectures, achieving superior validation perplexity on datasets such as arXiv, GitHub, and Wikipedia. In tasks requiring extended context comprehension, such as the Needle-in-the-Haystack and BabiLong benchmarks, MTA significantly exceeded the performance of standard Transformer models. In the Needle-in-the-Haystack task with 4K-token contexts containing multiple needles, MTA attained accuracies ranging from 67% to 97.6%, surpassing standard models by substantial margins.

In summary, Multi-Token Attention (MTA) presents a refined advancement in attention mechanisms by addressing fundamental limitations of traditional single-token attention. By leveraging convolutional operations to integrate multiple query-key interactions concurrently, MTA enhances the ability of language models to handle intricate contextual dependencies. These improvements yield more precise and efficient performance, particularly in scenarios involving complex token interactions and long-range contextual understanding.

Check out the Paper. All credit for this research goes to the researchers of this project. Also, feel free to follow us on Twitter and don't forget to join our 85k+ ML SubReddit.
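To make the key-query convolution described above concrete, here is a minimal PyTorch sketch of the idea: a depthwise 2D convolution over the pre-softmax attention logits lets neighboring query and key positions jointly shape each attention weight. The kernel size, masking order, and module layout are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class KeyQueryConvAttention(nn.Module):
    """Sketch of MTA-style key-query convolution (illustrative, not the paper's exact design)."""

    def __init__(self, n_heads: int, kernel_size: int = 5):
        super().__init__()
        # One depthwise kernel per head mixes logits of nearby (query, key) pairs.
        self.conv = nn.Conv2d(n_heads, n_heads, kernel_size,
                              padding=kernel_size // 2, groups=n_heads)

    def forward(self, q, k, v, mask=None):
        # q, k, v: (batch, heads, seq_len, head_dim)
        logits = torch.matmul(q, k.transpose(-2, -1)) / q.size(-1) ** 0.5
        logits = self.conv(logits)  # convolve attention logits before softmax
        if mask is not None:        # mask placement is simplified here
            logits = logits.masked_fill(mask == 0, float("-inf"))
        return torch.matmul(F.softmax(logits, dim=-1), v)

# Toy usage: batch of 2, 4 heads, sequence length 8, head dim 16.
attn = KeyQueryConvAttention(n_heads=4)
q = k = v = torch.randn(2, 4, 8, 16)
print(attn(q, k, v).shape)  # torch.Size([2, 4, 8, 16])
```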
  • A Comprehensive Guide to LLM Routing: Tools and Frameworks
    www.marktechpost.com
Deploying LLMs presents challenges, particularly in optimizing efficiency, managing computational costs, and ensuring high-quality performance. LLM routing has emerged as a strategic solution to these challenges, enabling intelligent task allocation to the most suitable models or tools. Let's delve into the intricacies of LLM routing, explore the tools and frameworks designed for its implementation, and examine academic perspectives on the subject.

Understanding LLM Routing

LLM routing is the process of examining incoming queries or tasks and directing them to the best-suited language model or collection of models in a system. This ensures that every task is handled by the model best matched to its particular needs, resulting in higher-quality responses and optimal resource use. For example, simple questions may be handled by smaller, less resource-intensive models, whereas computationally heavy and sophisticated tasks may be assigned to more powerful LLMs. This dynamic allocation optimizes computational expense, response time, and accuracy.

How LLM Routing Works

The LLM routing process typically involves three key steps:

1. Query Analysis: The system examines the incoming query, considering content, intent, required domain knowledge, complexity, and specific user preferences or requirements.
2. Model Selection: Based on the analysis, the router evaluates available models by assessing their capabilities, specializations, past performance metrics, current load, availability, and associated operational costs.
3. Query Forwarding: The router directs the query to the selected model(s) for processing, ensuring that the most suitable resource handles each task.

This intelligent routing mechanism enhances the overall performance of AI systems by ensuring that tasks are processed efficiently and effectively. (A minimal sketch of this loop appears at the end of this post.)

The Rationale Behind LLM Routing

The need for LLM routing stems from the varying capabilities and resource demands of language models. Using one monolithic model for every task is inefficient, particularly when less complex models can answer specific queries just as well. Through routing, systems can dynamically allocate tasks according to the complexity of the query and the capabilities of the available models, maximizing the use of computational resources. The approach increases throughput, lowers latency, and keeps operational expenses in check.

Several innovative frameworks and tools have been developed to facilitate LLM routing, each bringing unique features to optimize resource utilization and maintain high-quality output.

RouteLLM

RouteLLM is a leading open-source framework developed expressly to maximize the cost savings and efficiency of LLM deployment. Designed as a drop-in replacement for existing API integrations such as OpenAI's client, RouteLLM integrates seamlessly with current infrastructure. The framework dynamically assesses query complexity, sending simple or low-resource queries to smaller, more cost-effective models and more difficult queries to heavy-duty, high-performance LLMs. In doing so, RouteLLM lowers operational expenses dramatically, with real-world deployments reported to save as much as 85% of costs while maintaining performance near GPT-4 levels. The platform is also highly extensible, making it simple to incorporate new routing strategies and models and to benchmark them on varied tasks, which makes it flexible for a wide range of deployment applications.

NVIDIA AI Blueprint for LLM Routing

NVIDIA offers an AI Blueprint designed explicitly for efficient multi-LLM routing. Leveraging a robust Rust-based backend powered by the NVIDIA Triton Inference Server, this tool delivers very low latency, often rivaling direct inference requests. The framework is compatible with various foundation models, including NVIDIA's own NIM models and third-party LLMs, providing broad integration capabilities. Its compatibility with the OpenAI API standard also allows developers to replace existing OpenAI-based deployments with minimal configuration changes, streamlining integration into current infrastructure.

Martian: Model Router

Martian's Model Router is another advanced solution intended to enhance the operational efficiency of AI systems that use multiple LLMs. It provides uninterrupted uptime by rerouting queries in real time during outages or performance degradation, preserving service quality. Martian's routing algorithms analyze incoming queries and select models based on their capabilities and current status. This decision-making mechanism lets Martian use resources optimally, minimizing infrastructure expense without compromising response speed or accuracy.

LangChain

LangChain is a general-purpose, widely used framework for plugging LLMs into applications, with strong features for implementing intelligent routing. It makes it easy to connect different LLMs, allowing developers to apply routing schemes that choose the right model based on task requirements, performance targets, and cost. LangChain supports varied use cases, such as chatbots, text summarization, document analysis, and code completion, demonstrating versatility across applications and operating environments.

Tryage

Tryage is a context-aware routing method that draws on biological metaphors from brain anatomy. It is built around a perceptive router that predicts how well each candidate model will perform on an incoming query and selects the best one to apply. Tryage's routing decisions take into account anticipated performance, user-level goals, and constraints, delivering optimized and personalized routing. Its predictive capability makes it stronger than most conventional routing systems, especially in dynamically changing operating environments, supporting accurate, customized query allocation that maximizes resource utilization and response quality.

PickLLM

PickLLM is an adaptive routing system that applies reinforcement learning (RL) to the choice of language models. With an RL-based router, PickLLM continually monitors and learns from cost, latency, and response-accuracy metrics to adjust its routing decisions, becoming more efficient and accurate over time. Developers can tailor PickLLM's reward function to their specific business priorities, balancing cost and quality dynamically, which ensures compatibility with varied operational priorities.

MasRouter

MasRouter addresses routing in multi-agent AI systems where specialized LLMs work together on complicated tasks. Using a cascaded controller network, MasRouter decides collaboration modes, allocates roles to agents, and dynamically routes tasks across the available LLMs. Its architecture enables effective collaboration between specialized models, handling complex, multi-dimensional queries while maintaining overall system performance and computational efficiency.

Academic Perspectives on LLM Routing

Key contributions include "Implementing Routing Strategies in Large Language Model-Based Systems," a paper that explores key considerations for integrating routing into LLM-based systems, focusing on resource management, cost definition, and strategy selection. It offers a novel taxonomy of existing approaches and a comparative analysis of industry practices, and it identifies critical challenges and directions for future research in LLM routing.

Bottlenecks and Considerations in LLM Routing

Despite its substantial benefits, LLM routing presents challenges that organizations and developers must address, including the latency added by the routing step itself, scalability, and cost-management complexity.

In conclusion, LLM routing represents a vital strategy for optimizing the deployment and utilization of large language models. Routing mechanisms significantly enhance AI-system efficiency by intelligently assigning tasks to the most suitable models based on complexity, performance, and cost. Although routing introduces challenges such as latency, scalability, and cost-management complexity, advances in intelligent, adaptive routing promise to address these effectively. As frameworks, tools, and research in this domain continue to evolve, LLM routing will play a central role in future AI deployments, helping ensure optimal performance, cost-efficiency, and user satisfaction.

Also, feel free to follow us on Twitter and don't forget to join our 85k+ ML SubReddit.

Sana Hassan, a consulting intern at Marktechpost and dual-degree student at IIT Madras, is passionate about applying technology and AI to address real-world challenges. With a keen interest in solving practical problems, he brings a fresh perspective to the intersection of AI and real-life solutions.
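Here is the minimal routing sketch referenced above: a toy router that scores query complexity with a heuristic and dispatches to a cheap or an expensive endpoint. The model names, pricing, and the heuristic itself are placeholders for illustration; a production router would use a learned classifier and real API clients.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class ModelEndpoint:
    name: str
    cost_per_1k_tokens: float   # hypothetical pricing, for illustration
    call: Callable[[str], str]  # wraps whatever client you actually use

def complexity_score(query: str) -> float:
    """Toy heuristic: longer, multi-question, reasoning-heavy queries score higher."""
    signals = [
        len(query) > 200,
        any(k in query.lower() for k in ("prove", "derive", "refactor", "multi-step")),
        query.count("?") > 1,
    ]
    return sum(signals) / len(signals)

def route(query: str, small: ModelEndpoint, large: ModelEndpoint,
          threshold: float = 0.5) -> str:
    """Step 1: analyze the query; step 2: select a model; step 3: forward."""
    model = large if complexity_score(query) >= threshold else small
    return model.call(query)

# Usage with stub endpoints standing in for real API clients:
small = ModelEndpoint("small-llm", 0.1, lambda q: f"[small] {q[:40]}...")
large = ModelEndpoint("large-llm", 1.0, lambda q: f"[large] {q[:40]}...")
print(route("What is 2+2?", small, large))
print(route("Prove convergence of gradient descent? How fast? Derive the rate.", small, large))
```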
  • This AI Paper from ByteDance Introduces a Hybrid Reward System Combining Reasoning Task Verifiers (RTV) and a Generative Reward Model (GenRM) to Mitigate Reward Hacking
    www.marktechpost.com
Reinforcement Learning from Human Feedback (RLHF) is crucial for aligning LLMs with human values and preferences. Despite the introduction of non-RL alternatives like DPO, industry-leading models such as ChatGPT/GPT-4, Claude, and Gemini continue to rely on RL algorithms like PPO for policy optimization. Recent research has focused on algorithmic improvements, including eliminating critic models to reduce computational costs, filtering noisy samples during PPO sampling, and enhancing reward models to mitigate reward hacking. However, few studies focus on RLHF data construction (i.e., the training prompts) and how performance scales with those prompts.

The success of RLHF depends heavily on reward-model quality, which faces three challenges: mis-specified reward modeling of human preferences, incorrect and ambiguous preferences in training datasets, and poor generalization ability. To address these issues, the generative reward model (GenRM) was introduced to validate model predictions against ground-truth responses, showing good resistance to reward hacking and gaining adoption in advanced LLMs like DeepSeek-V3. Principled data-selection methods filter overly challenging instances during training, and strategic selection methods identify key training prompts that achieve comparable performance with less data. Performance-scaling analysis reveals that RLHF generalizes better than SFT on novel inputs but significantly reduces output diversity.

Researchers from ByteDance Seed address a critical gap in RLHF research: the role of prompt-data construction and its scalability has received comparatively little attention. They explore the data-driven bottlenecks that limit RLHF performance scaling, focusing on reward hacking and decreasing response diversity. A hybrid reward system is introduced that combines reasoning task verifiers (RTV) and a generative reward model (GenRM); it shows stronger resistance to reward hacking and enables more accurate assessment of responses against ground-truth solutions. Moreover, a novel prompt-selection method called Pre-PPO is introduced to identify inherently challenging training prompts that are less susceptible to reward hacking.

The experimental setup employs two pre-trained language models of different scales: a smaller model with 25B parameters and a larger model with 150B parameters. The training dataset contains one million prompts from diverse domains, including mathematics, coding, instruction-following, creative writing, and logical reasoning. The researchers also constructed a detailed evaluation framework covering multiple skill areas: logical reasoning, instruction-following, STEM tasks, coding, natural language processing, knowledge, contextual understanding, and out-of-distribution generalization. The framework comes in two versions (V1.0 and V2.0) with overlapping prompts, with V2.0 featuring the more challenging prompts.

The experimental results show that the proposed approach, combining Pre-PPO with prioritization of mathematical and coding tasks, consistently outperforms the baseline method across model sizes and evaluation datasets. The approach improves on the baseline by +1.1 when evaluated at 100-step intervals on TestSet V1.0, and by +1.4 on the more challenging TestSet V2.0. The most substantial gains appear in mathematics-intensive and coding tasks, with improvements of +3.9 points in STEM and +3.2 points in coding. These improvements are attributed to the strategic prioritization of mathematical-reasoning and coding tasks during the early RLHF training phases.

In conclusion, this paper addresses critical bottlenecks in RLHF data scaling, identifying reward hacking and reduced response diversity as significant challenges. The researchers propose strategic prompt construction combined with early-stage training prioritization: RTV and GenRM combat reward hacking, while the novel Pre-PPO prompt-selection strategy identifies and prioritizes challenging training prompts. Analysis reveals that RTV supervision shows the strongest resistance to reward hacking, followed by GenRM with ground-truth labels, and then the Bradley-Terry (BT) reward model. The research establishes a foundation for optimizing RLHF data construction and developing more principled methods for mitigating reward hacking and improving model alignment.

Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project. Also, feel free to follow us on Twitter and don't forget to join our 85k+ ML SubReddit.

Sajjad Ansari is a final-year undergraduate from IIT Kharagpur. As a tech enthusiast, he delves into the practical applications of AI with a focus on understanding the impact of AI technologies and their real-world implications. He aims to articulate complex AI concepts in a clear and accessible manner.
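As a rough illustration of how a hybrid reward of this kind could be wired up, here is a minimal sketch. The routing rule, function signatures, and the Pre-PPO-style filter below are expository assumptions, not the paper's implementation.

```python
from typing import Callable, List

def hybrid_reward(prompt: str, response: str, task_type: str,
                  rtv: Callable[[str, str], bool],
                  gen_rm: Callable[[str, str], float]) -> float:
    """Route verifiable reasoning tasks to a rule-based verifier (RTV);
    score everything else with a generative reward model (GenRM)."""
    if task_type in ("math", "code"):       # tasks with checkable answers
        return 1.0 if rtv(prompt, response) else 0.0
    return gen_rm(prompt, response)         # compares against a ground-truth reference

def pre_ppo_select(prompts: List[str],
                   policy_reward: Callable[[str], float],
                   threshold: float = 0.5) -> List[str]:
    """Toy Pre-PPO-style filter: keep prompts the current policy scores
    poorly on, treating them as inherently challenging training data."""
    return [p for p in prompts if policy_reward(p) < threshold]
```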
  • The Complete Beginner's Guide to Terminal/Command Prompt
    www.marktechpost.com
The terminal (on Mac/Linux) or command prompt (on Windows) is a powerful tool that lets you interact with your computer using text commands instead of clicking through a graphical interface. While it might seem intimidating at first, mastering basic terminal commands can help you:

- Navigate through files and folders more efficiently
- Perform tasks that aren't possible through the regular interface
- Automate repetitive tasks
- Gain a deeper understanding of how your computer works

This guide introduces the essential commands and concepts to get you started, regardless of which operating system you use.

Getting Started

Opening the Terminal

On Windows: press Win + R, type cmd, and press Enter, or search for "Command Prompt" in the Start menu.
On Mac: press Command + Space to open Spotlight, type "Terminal", and press Enter, or find Terminal under Applications > Utilities.
On Linux: press Ctrl + Alt + T (on most distributions), or search for "Terminal" in your applications menu.

Understanding the Prompt

When you first open the terminal, you'll see a prompt that looks something like this:

Windows: C:\Users\YourUsername>
Mac/Linux: username@computer:~$

This tells you your current location in the file system and where to type your commands. On Mac/Linux, the ~ symbol represents your home directory.

Basic Navigation Commands

Viewing Your Current Location
Windows: cd (with no arguments)
Mac/Linux: pwd (Print Working Directory)
Example: pwd might print /home/yourusername

Listing Files and Directories
Windows: dir
Mac/Linux: ls
Options:
ls -l: list with detailed information (file size, date modified, permissions)
ls -a: show hidden files (files that start with a dot)
ls -la: combine both options

Changing Directories
All platforms: cd DirectoryName
Examples: cd Documents, or cd .. to go up one level

Creating Directories
All platforms: mkdir DirectoryName
Example: mkdir Projects

Creating Files
Windows: type nul > filename.txt
Mac/Linux: touch filename.txt
Example: touch notes.txt

Working with Files

Viewing File Contents
Windows: type filename.txt
Mac/Linux: cat filename.txt
For larger files:
Windows: more filename.txt
Mac/Linux: less filename.txt (press q to quit)

Copying Files
Windows: copy source destination
Mac/Linux: cp source destination
Example: cp notes.txt backup.txt

Moving/Renaming Files
Windows: move source destination
Mac/Linux: mv source destination
Examples: mv notes.txt Documents/ (move), mv notes.txt todo.txt (rename)

Deleting Files and Directories
Windows: del filename.txt (files), rmdir /s DirectoryName (directories)
Mac/Linux: rm filename.txt (files), rm -r DirectoryName (directories)
Warning: Be very careful with delete commands, especially rm -r! There is no Recycle Bin or Trash when using the terminal; deletions are permanent.

Helpful Tips

Command History
Press the up arrow to cycle through previously used commands. On Mac/Linux, type history to see a list of recent commands.

Tab Completion
Start typing a file or directory name, then press Tab; the terminal will attempt to complete it for you. If there are multiple options, press Tab twice to see all possibilities.

Getting Help
Windows: help command, or command /?
Mac/Linux: man command (manual pages; press q to exit)
Examples: help dir, man ls

Clearing the Screen
Windows: cls
Mac/Linux: clear, or Ctrl+L

Power User Commands

Searching for Files
Windows: dir /s filename
Mac/Linux: find . -name filename

Searching Within Files
Windows: findstr text filename
Mac/Linux: grep text filename

Chaining Commands
All platforms: use && to run commands in sequence.
Example: mkdir Projects && cd Projects

Redirecting Output
All platforms: use > to send output to a file.
Example: ls > filelist.txt (Mac/Linux), or dir > filelist.txt (Windows)

Next Steps

As you become more comfortable with these basic commands, you might want to explore:
- Command-line text editors like Nano, Vim, or Emacs
- Writing simple shell scripts to automate tasks
- Package managers like apt (Linux), Homebrew (Mac), or Chocolatey (Windows)
- Environment variables and how to set them
- SSH, for connecting to remote computers

Common Mistakes and Troubleshooting

- "Command not found": check spelling, or ensure the command is available on your system.
- "Permission denied": you may need administrator/root privileges. On Windows, run Command Prompt as Administrator; on Mac/Linux, use sudo before commands that need elevated privileges.
- "No such file or directory": double-check the path and file names.
- "Operation not permitted": similar to permission denied; you might need special permissions.

Task              | Windows                 | Mac/Linux
Current location  | cd                      | pwd
List files        | dir                     | ls
Change directory  | cd dir                  | cd dir
Create directory  | mkdir dir               | mkdir dir
Create file       | type nul > file         | touch file
Copy file         | copy source destination | cp source destination
Move/rename       | move source destination | mv source destination
Delete file       | del file                | rm file
Delete directory  | rmdir /s dir            | rm -r dir
Clear screen      | cls                     | clear
Get help          | help command            | man command

Conclusion

In this tutorial, we covered everything beginners need to know about using the terminal: how to open it on different operating systems, how to navigate file systems, how to create and manage files and directories, and the essential commands, along with helpful shortcuts, power-user commands, and troubleshooting tips. With these foundational skills, you can confidently use the command line as a powerful tool in your computing journey.

Remember, the terminal rewards practice and experimentation. Don't be afraid to try new commands, but always be careful with commands that modify or delete files. (For a taste of scripting these operations, see the Python sketch below.)

Also, feel free to follow us on Twitter and don't forget to join our 85k+ ML SubReddit.
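If you later want to automate these file operations, Python's standard library mirrors most of them in a cross-platform way. A minimal sketch (the directory and file names are placeholders):

```python
import os
import shutil
from pathlib import Path

# pwd: print the current working directory
print(os.getcwd())

# ls: list files and directories
for entry in Path(".").iterdir():
    print(entry.name)

# mkdir: create a directory (no error if it already exists)
Path("Projects").mkdir(exist_ok=True)

# touch: create an empty file
Path("Projects/notes.txt").touch()

# cp and mv: copy a file, then rename/move it
shutil.copy("Projects/notes.txt", "Projects/backup.txt")
shutil.move("Projects/backup.txt", "Projects/todo.txt")

# rm and rm -r: delete a file, then a whole directory tree (careful!)
os.remove("Projects/todo.txt")
shutil.rmtree("Projects")
```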
  • Meet ReSearch: A Novel AI Framework that Trains LLMs to Reason with Search via Reinforcement Learning without Using Any Supervised Data on Reasoning Steps
    www.marktechpost.com
Large language models (LLMs) have demonstrated significant progress across various tasks, particularly in reasoning capabilities. However, effectively integrating reasoning processes with external search operations remains challenging, especially for multi-hop questions that require intricate reasoning chains and multiple retrieval steps. Current methods depend primarily on manually designed prompts or heuristics, which limits scalability and flexibility. Additionally, generating supervised data for multi-step reasoning scenarios is often prohibitively expensive and practically infeasible.

Researchers from Baichuan Inc., Tongji University, The University of Edinburgh, and Zhejiang University introduce ReSearch, a novel AI framework designed to train LLMs to integrate reasoning with search via reinforcement learning, notably without relying on supervised reasoning steps. The core methodology of ReSearch incorporates search operations directly into the reasoning chain. Using Group Relative Policy Optimization (GRPO), a reinforcement learning technique, ReSearch guides LLMs to autonomously identify optimal moments and strategies for performing search operations, which in turn influence the ongoing reasoning. This approach enables models to progressively refine their reasoning and naturally facilitates advanced capabilities such as reflection and self-correction.

From a technical perspective, ReSearch employs structured output formats by embedding specific tags, such as <think>, <search>, <result>, and <answer>, within the reasoning chain. These tags facilitate clear communication between the model and the external retrieval environment, systematically organizing the generated outputs. During training, ReSearch intentionally excludes retrieval results from loss computations to prevent model bias. The reward signals guiding the reinforcement learning process are based on straightforward criteria: answer accuracy, assessed via F1 score, and adherence to the predefined structured output format. This design encourages the autonomous development of sophisticated reasoning patterns, circumventing the need for manually annotated reasoning datasets.

Experimental evaluation confirms the robustness of ReSearch. When assessed on multi-hop question-answering benchmarks, including HotpotQA, 2WikiMultiHopQA, MuSiQue, and Bamboogle, ReSearch consistently outperformed baseline methods. Specifically, ReSearch-Qwen-32B-Instruct achieved improvements ranging between 8.9% and 22.4% over established baselines. Notably, these gains were achieved despite the model being trained exclusively on a single dataset, underscoring its strong generalization capabilities. Further analyses showed that models gradually increased their reliance on iterative search operations throughout training, indicative of growing reasoning proficiency. A detailed case study illustrated the model's capacity to identify suboptimal search queries, reflect on its reasoning steps, and implement corrective actions autonomously.

In summary, ReSearch presents a significant methodological advancement in training LLMs to seamlessly integrate reasoning with external search mechanisms via reinforcement learning. By eliminating dependency on supervised reasoning data, the framework effectively addresses the scalability and adaptability issues inherent in multi-hop reasoning scenarios. Its capability for self-reflection and correction enhances its practical applicability in complex, realistic contexts. Future research may extend this reinforcement learning-based framework to broader applications and incorporate additional external knowledge resources.

Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project. Also, feel free to follow us on Twitter and don't forget to join our 85k+ ML SubReddit.

Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence Media Platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts over 2 million monthly views, illustrating its popularity among audiences.
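To make the reward design concrete, here is a minimal sketch of a reward that combines answer F1 with format adherence. The tag parsing and the format-bonus weighting are our illustrative assumptions, not the authors' exact implementation.

```python
import re
from collections import Counter

def f1_score(prediction: str, ground_truth: str) -> float:
    """Token-level F1 between the predicted and reference answers."""
    pred, gold = prediction.split(), ground_truth.split()
    common = Counter(pred) & Counter(gold)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred), overlap / len(gold)
    return 2 * precision * recall / (precision + recall)

def format_ok(rollout: str) -> bool:
    """Check that the rollout uses the expected tag structure and ends with <answer>."""
    return bool(re.search(r"<think>.*</think>", rollout, re.S)) and \
           bool(re.search(r"<answer>.*?</answer>\s*$", rollout, re.S))

def research_reward(rollout: str, ground_truth: str) -> float:
    match = re.search(r"<answer>(.*?)</answer>", rollout, re.S)
    answer = match.group(1).strip() if match else ""
    # Accuracy term (F1) plus a small bonus for respecting the format.
    return f1_score(answer, ground_truth) + (0.1 if format_ok(rollout) else 0.0)
```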
  • How to Use Git and Git Bash Locally: A Comprehensive Guide
    www.marktechpost.com
Introduction

Git is a distributed version control system that helps you track changes in your code, collaborate with others, and maintain a history of your project. Git Bash is a terminal application for Windows that provides a Unix-like command-line experience for using Git. This guide walks you through setting up Git, using Git Bash, and mastering essential Git commands for local development.

Installation

Windows: download Git for Windows from git-scm.com and run the installer with default options (or customize as needed); Git Bash is installed automatically as part of the package.
macOS: install Git using Homebrew (brew install git), or download it from git-scm.com.
Linux: for Debian/Ubuntu, sudo apt-get install git; for Fedora, sudo dnf install git; for other distributions, use the appropriate package manager.

Verifying Installation

Open Git Bash (Windows) or Terminal (macOS/Linux) and type git --version. This should display the installed Git version.

Git Bash Basics

Git Bash provides a Unix-like shell experience on Windows. Here are some essential commands:

Navigation: pwd (print working directory), ls (list files and directories), cd [directory] (change directory), mkdir [directory] (create a new directory), rm [file] (remove a file), rm -r [directory] (remove a directory and its contents).
File operations: touch [filename] (create an empty file), cat [filename] (display file contents), nano [filename] or vim [filename] (edit files in the terminal).
Keyboard shortcuts: Ctrl + C (terminate the current command), Ctrl + L (clear the screen), Tab (auto-complete commands or filenames), Up/Down arrows (navigate through command history).

Git Configuration

Before using Git, configure your identity:
git config --global user.name "Your Name"
git config --global user.email "you@example.com"

Additional configurations: set your default editor (git config --global core.editor "nano"), enable colorful output (git config --global color.ui auto), and view all configurations (git config --list).

Basic Git Workflow

Initializing a repository: navigate to your project folder and run git init.
Checking status: git status shows which files are tracked, modified, or staged.
Staging files: git add <filename> adds a file to the staging area (git add . stages everything).
Committing changes: git commit -m "your message" saves staged changes to the repository; run git commit without -m to open an editor for a more detailed commit message.
Viewing commit history: git log.

Branching and Merging

Create a new branch: git branch <branch-name>.
Switch to a branch: git checkout <branch-name>.
Create and switch to a new branch in one command: git checkout -b <branch-name>.
List all branches: git branch.
Merge changes from another branch into your current branch: git merge <branch-name>.

Handling Merge Conflicts

When Git can't automatically merge changes, you'll need to resolve conflicts:
1. Git marks the conflicted files.
2. Open the files and look for conflict markers (<<<<<<<, =======, >>>>>>>).
3. Edit the files to resolve the conflicts.
4. Add the resolved files: git add <filename>.
5. Complete the merge: git commit.

Deleting Branches

Delete a branch after merging: git branch -d <branch-name>.

Remote Repositories

Add a remote repository: git remote add origin <url>.
View remote repositories: git remote -v.
Push to a remote repository: git push origin <branch-name>.
Pull from a remote repository: git pull origin <branch-name>.
Clone a repository: git clone <url>.

Advanced Git Commands

Stashing changes: git stash temporarily stores modified files so you can work on something else; git stash pop restores them.
Reverting changes: git revert <commit> undoes a commit with a new commit; git reset --hard <commit> resets to a previous state (use with caution).
Viewing and comparing changes: git diff.
Interactive rebase: git rebase -i HEAD~<n> lets you rewrite, squash, or reorder commits.

Troubleshooting

Common issues and solutions:
- "fatal: not a git repository": make sure you're in the correct directory, or initialize a repository with git init.
- Unable to push to a remote repository: check that you have the correct permissions, pull the latest changes first (git pull origin main), and verify the remote URL (git remote -v).
- Merge conflicts: resolve the conflicts manually, then git add the resolved files and git commit.
- Accidental commit: use git reset --soft HEAD~1 to undo the last commit while keeping your changes.

Git Best Practices

- Commit frequently with clear, descriptive commit messages.
- Create branches for new features or bug fixes.
- Pull before pushing to minimize conflicts.
- Write meaningful commit messages that explain why changes were made.
- Use .gitignore to exclude unnecessary files (build artifacts, dependencies, etc.).
- Review changes before committing with git diff and git status.
- Keep commits focused on a single logical change.
- Use tags for marking releases or important milestones.
- Back up your repositories regularly.
- Document your Git workflow for team collaboration.

.gitignore Example

Create a .gitignore file in your repository root listing the patterns to ignore, for example dependency folders (node_modules/), build artifacts (build/, dist/), and log files (*.log). Customize this file according to your project's specific needs.

Conclusion

Git and Git Bash provide powerful tools for version control and collaborative development. In this guide, we covered installation across platforms, essential Git Bash commands, repository initialization, the core add-commit workflow, branching strategies, remote repository management, and advanced operations like stashing and rebasing. We also addressed common troubleshooting scenarios and best practices to maintain a clean workflow. With these fundamentals, you're now equipped to track changes, collaborate effectively, and maintain a structured history of your projects. (For driving these commands from a script, see the Python sketch below.)

Nikhil is an intern consultant at Marktechpost. He is pursuing an integrated dual degree in Materials at the Indian Institute of Technology, Kharagpur. Nikhil is an AI/ML enthusiast who is always researching applications in fields like biomaterials and biomedical science. With a strong background in Material Science, he is exploring new advancements and creating opportunities to contribute.
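Here is the scripting sketch referenced above: the standard-library subprocess module can wrap the git CLI directly. The repository path, identity values, and commit message are placeholders.

```python
import subprocess
from pathlib import Path

def git(*args: str, cwd: str = ".") -> str:
    """Run a git command and return its stdout, raising on failure."""
    result = subprocess.run(["git", *args], cwd=cwd, check=True,
                            capture_output=True, text=True)
    return result.stdout.strip()

repo = Path("demo-repo")
repo.mkdir(exist_ok=True)

git("init", cwd=str(repo))                                 # git init
git("config", "user.name", "Demo User", cwd=str(repo))     # identity for this repo
git("config", "user.email", "demo@example.com", cwd=str(repo))

(repo / "README.md").write_text("# Demo\n")
git("add", "README.md", cwd=str(repo))                     # git add README.md
git("commit", "-m", "Initial commit", cwd=str(repo))       # git commit -m "..."
print(git("log", "--oneline", cwd=str(repo)))              # git log --oneline
```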
  • This AI Paper Introduces Diversified DPO and ORPO: Post-Training Methods to Boost Output Diversity in Creative Writing with LLMs
    www.marktechpost.com
Creative writing is a domain that thrives on diversity and imagination. Unlike fact-based or task-specific writing, where a single correct output may exist, creative writing admits numerous valid responses to a prompt: stories, poems, and narratives can branch in countless directions, each with its own stylistic flavor and meaning. This inherent open-endedness makes creative writing a prime challenge for AI systems, which must maintain narrative coherence while producing novel and distinct outputs.

The core issue lies in how large language models are refined after their initial training. Post-training methods often emphasize quality improvements by aligning responses with user preferences or maximizing reward scores. However, these adjustments inadvertently cause the models to produce responses that are too similar across prompts. In creative settings, this leads to a noticeable drop in output diversity: a lack of variation limits the expressive power of the model, resulting in uniform storylines or similar sentence constructions even when the prompts are vastly different.

Earlier solutions attempted to address this by tweaking decoding methods or prompt strategies. Researchers used sampling-temperature adjustment, top-k or top-p filtering, and iterative prompting to introduce randomness; some explored beam-search modifications or self-critiquing to encourage alternative responses. While these helped diversify outputs, they often came at a cost, sacrificing overall response quality, increasing generation time, or introducing inconsistencies in tone and grammar. More crucially, they did not adapt the model's core training process to learn from diverse samples.

Researchers from Midjourney and New York University proposed a novel adjustment during the post-training phase. They introduced Diversified DPO and Diversified ORPO, enhanced versions of two popular preference-based optimization techniques. Their innovation is a deviation score that quantifies how much a training example differs from the other responses to the same prompt; rare and diverse responses are given more importance during learning by using this score to weight the training losses. The researchers implemented these strategies on large models such as Meta's Llama-3.1-8B and Mistral-7B, using parameter-efficient fine-tuning via LoRA.

In this approach, deviation acts as a learning signal. For every training pair of a better and a worse response to a prompt, the deviation of the better response is computed using both semantic and stylistic embeddings, which measure not only content differences but also stylistic uniqueness between responses. The resulting score then scales how much that training pair contributes to the model's weight updates, increasing the likelihood that the model generates distinct yet high-quality outputs. Training used over 400,000 prompt-response pairs, with Reddit upvotes as quality signals, and introduced mixing methods to balance semantic and style deviations effectively.

Quantitative results demonstrated the success of the proposed method. The best-performing model, Llama-3.1-8B with Diversified DPO using both semantic and style deviation (DDPO-both), achieved nearly the same reward score as GPT-4o while significantly outperforming it on diversity. Specifically, the model's semantic diversity approached that of the human-crafted reference dataset, with style diversity slightly below it. In head-to-head human evaluations, 68% of reviewers preferred DDPO-both's outputs over GPT-4o's for quality, and 100% chose them as more diverse. Compared with the DPO baseline, DDPO-both still came out ahead, selected 50% of the time for quality and 62% for diversity. When fewer responses per prompt were available during training, the resulting slight drops in reward score were mitigated using a minimum deviation threshold or by sampling higher-quality responses.

This research highlights a compelling solution to the diversity-quality trade-off in AI-generated creative writing. By emphasizing deviation during training, the researchers enabled models to value uniqueness without compromising coherence. The outcome is a model that delivers richer and more varied storytelling, marking a meaningful step forward in creative AI development.

Check out the Paper. All credit for this research goes to the researchers of this project. Also, feel free to follow us on Twitter and don't forget to join our 85k+ ML SubReddit.

Nikhil is an intern consultant at Marktechpost. He is pursuing an integrated dual degree in Materials at the Indian Institute of Technology, Kharagpur. Nikhil is an AI/ML enthusiast who is always researching applications in fields like biomaterials and biomedical science. With a strong background in Material Science, he is exploring new advancements and creating opportunities to contribute.
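A deviation-weighted DPO loss can be sketched as follows: the standard DPO objective is scaled per example by a deviation weight for the chosen response. The weighting form is our illustrative assumption, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def deviation_weighted_dpo_loss(policy_chosen_logps, policy_rejected_logps,
                                ref_chosen_logps, ref_rejected_logps,
                                deviation, beta: float = 0.1):
    """Standard DPO loss, scaled per example by a deviation weight.

    deviation: tensor in [0, 1], larger for chosen responses that differ
    more (semantically and stylistically) from other responses to the prompt.
    """
    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    logits = beta * (chosen_ratio - rejected_ratio)
    per_example = -F.logsigmoid(logits)       # vanilla DPO term
    return (deviation * per_example).mean()   # rare responses weigh more

# Toy usage with random log-probabilities for a batch of 4 preference pairs;
# in practice the log-probabilities come from the policy and reference models.
b = 4
loss = deviation_weighted_dpo_loss(torch.randn(b), torch.randn(b),
                                   torch.randn(b), torch.randn(b),
                                   deviation=torch.rand(b))
print(loss.item())
```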
  • How to Build a Prototype X-ray Judgment Tool (Open Source Medical Inference System) Using TorchXRayVision, Gradio, and PyTorch
    www.marktechpost.com
In this tutorial, we demonstrate how to build a prototype X-ray judgment tool using open-source libraries in Google Colab. By leveraging the power of TorchXRayVision for loading pre-trained DenseNet models and Gradio for creating an interactive user interface, we show how to process and classify chest X-ray images with minimal setup. This notebook guides you through image preprocessing, model inference, and result interpretation, all designed to run seamlessly on Colab without requiring external API keys or logins. Please note that this demo is intended for educational purposes only and should not be used as a substitute for professional clinical diagnosis.

First, we install the torchxrayvision library for X-ray analysis and Gradio to create an interactive interface:

```python
!pip install torchxrayvision gradio
```

We import PyTorch for deep learning operations, TorchXRayVision for X-ray analysis, torchvision's transforms for image preprocessing, and Gradio for building an interactive UI:

```python
import torch
import torchxrayvision as xrv
import torchvision.transforms as transforms
import gradio as gr
```

Then, we load a pre-trained DenseNet model using the densenet121-res224-all weights and set it to evaluation mode for inference:

```python
model = xrv.models.DenseNet(weights="densenet121-res224-all")
model.eval()
```

Now, we attempt to retrieve pathology labels from the model's metadata and fall back to a predefined list if the retrieval fails:

```python
try:
    pathology_labels = model.meta["labels"]
    print("Retrieved pathology labels from model.meta.")
except Exception as e:
    print("Could not retrieve labels from model.meta. Using fallback labels.")
    pathology_labels = [
        "Atelectasis", "Cardiomegaly", "Consolidation", "Edema", "Emphysema",
        "Fibrosis", "Hernia", "Infiltration", "Mass", "Nodule",
        "Pleural Effusion", "Pneumonia", "Pneumothorax", "No Finding"
    ]
```

With the following function, we preprocess an input X-ray image, run inference using the pre-trained model, extract pathology scores, and return a formatted summary of the top prediction and all scores, handling errors gracefully:

```python
def classify_xray(image):
    try:
        transform = transforms.Compose([
            transforms.Resize((224, 224)),
            transforms.Grayscale(num_output_channels=1),
            transforms.ToTensor()
        ])
        input_tensor = transform(image).unsqueeze(0)  # add batch dimension

        with torch.no_grad():
            preds = model(input_tensor)

        pathology_scores = preds[0].detach().numpy()
        results = {}
        for idx, label in enumerate(pathology_labels):
            results[label] = float(pathology_scores[idx])

        sorted_results = sorted(results.items(), key=lambda x: x[1], reverse=True)
        top_label, top_score = sorted_results[0]

        judgement = (
            f"Prediction: {top_label} (score: {top_score:.2f})\n\n"
            f"Full Scores:\n{results}"
        )
        return judgement
    except Exception as e:
        return f"Error during inference: {str(e)}"
```

Finally, we build and launch a Gradio interface that lets users upload a chest X-ray image; the classify_xray function processes the image to output a diagnostic judgment:

```python
iface = gr.Interface(
    fn=classify_xray,
    inputs=gr.Image(type="pil"),
    outputs="text",
    title="X-ray Judgement Tool (Prototype)",
    description=(
        "Upload a chest X-ray image to receive a classification judgement. "
        "This demo is for educational purposes only and is not intended for clinical use."
    )
)
iface.launch()
```

(Screenshot: the Gradio interface for the tool.)

Through this tutorial, we've explored the development of an interactive X-ray judgment tool that integrates advanced deep learning techniques with a user-friendly interface. Despite inherent limitations, such as the model not being fine-tuned for clinical diagnostics, this prototype serves as a valuable starting point for experimenting with medical imaging applications. We encourage you to build upon this foundation, keeping in mind the importance of rigorous validation and adherence to medical standards for real-world use.

Here is the Colab Notebook. Also, don't forget to follow us on Twitter and join our Telegram Channel and LinkedIn Group. Don't forget to join our 85k+ ML SubReddit.

Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence Media Platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts over 2 million monthly views, illustrating its popularity among audiences.
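For scripted use without the UI, you can also call classify_xray directly on a PIL image; the file path below is a placeholder for an image of your own:

```python
from PIL import Image

# Hypothetical path to a chest X-ray image on disk.
img = Image.open("sample_chest_xray.png")
print(classify_xray(img))
```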
  • VideoMind: A Role-Based Agent for Temporal-Grounded Video Understanding
    www.marktechpost.com
LLMs have shown impressive capabilities in reasoning approaches like Chain-of-Thought (CoT), which enhance accuracy and interpretability in complex problem-solving. While researchers are extending these capabilities to multi-modal domains, videos present unique challenges due to their temporal dimension. Unlike static images, videos require understanding dynamic interactions over time. Current visual CoT methods excel with static inputs but struggle with video content because they cannot explicitly localize or revisit specific moments in a sequence. Humans overcome these challenges by breaking down complex problems, identifying and revisiting key moments, and synthesizing observations into coherent answers. This approach highlights the need for AI systems that can manage multiple reasoning abilities.

Recent advances in video understanding have improved tasks like captioning and question answering, but models often lack visually grounded correspondence and interpretability, especially for long-form videos. Video temporal grounding addresses this by requiring precise localization. Large multimodal models trained with supervised instruction-tuning struggle with complex reasoning tasks. Two major approaches have emerged to address these limitations: agent-based interfaces and pure text-based reasoning paradigms exemplified by CoT processes. Moreover, inference-time search techniques are valuable in domains like robotics, games, and navigation because they allow models to iteratively refine outputs without changing the underlying weights.

Researchers from the Hong Kong Polytechnic University and Show Lab, National University of Singapore, have proposed VideoMind, a video-language agent designed for temporally grounded video understanding. VideoMind introduces two key innovations to address the challenges of video reasoning. First, it identifies the essential capabilities for video temporal reasoning and implements a role-based agentic workflow with specialized components: a planner, a grounder, a verifier, and an answerer. Second, it proposes a Chain-of-LoRA strategy that enables seamless role-switching through lightweight LoRA adaptors, avoiding the overhead of multiple models while balancing efficiency and flexibility. Experiments across 14 public benchmarks show state-of-the-art performance on diverse video understanding tasks.

VideoMind builds upon Qwen2-VL, combining an LLM backbone with a ViT-based visual encoder capable of handling dynamic-resolution inputs. Its core innovation is the Chain-of-LoRA strategy, which dynamically activates role-specific LoRA adapters during inference via self-calling. It comprises four specialized components: (a) the Planner, which coordinates all other roles and determines which function to call next based on the query; (b) the Grounder, which localizes relevant moments by identifying start and end timestamps from text queries; (c) the Verifier, which provides binary (yes/no) responses to validate temporal intervals; and (d) the Answerer, which generates responses based on either the cropped video segments identified by the Grounder or the entire video when direct answering is more appropriate. (A schematic sketch of this dispatch loop appears at the end of this post.)

On grounding metrics, VideoMind's lightweight 2B model outperforms most compared models, including InternVL2-78B and Claude-3.5-Sonnet, with only GPT-4o showing superior results; the 7B version of VideoMind surpasses even GPT-4o, achieving competitive overall performance. On the NExT-GQA benchmark, the 2B model matches state-of-the-art 7B models across both agent-based and end-to-end approaches, comparing favorably with text-rich, agent-based solutions like LLoVi, LangRepo, and SeViLA. VideoMind also shows exceptional zero-shot capabilities, outperforming all LLM-based temporal grounding methods and achieving competitive results against fine-tuned temporal-grounding experts. Moreover, VideoMind excels at general video QA across Video-MME (Long), MLVU, and LVBench, effectively localizing cue segments before answering questions.

In summary, VideoMind is a significant advance in temporally grounded video reasoning. It addresses the complex challenges of video understanding through an agentic workflow that combines a Planner, Grounder, Verifier, and Answerer with an efficient Chain-of-LoRA strategy for role-switching. Experiments across three key domains, grounded video question-answering, video temporal grounding, and general video question-answering, confirm VideoMind's effectiveness for long-form video reasoning tasks, where it provides precise, evidence-based answers. This work establishes a foundation for future developments in multimodal video agents and reasoning capabilities, opening new pathways toward more capable video understanding systems.

Check out the Paper and Project Page. All credit for this research goes to the researchers of this project. Also, feel free to follow us on Twitter and don't forget to join our 85k+ ML SubReddit.

Sajjad Ansari is a final-year undergraduate from IIT Kharagpur. As a tech enthusiast, he delves into the practical applications of AI with a focus on understanding the impact of AI technologies and their real-world implications. He aims to articulate complex AI concepts in a clear and accessible manner.
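Here is the schematic sketch referenced above: the role-based workflow can be pictured as a dispatch loop in which the planner repeatedly picks the next role, whose adapter is then activated before the call. Everything below, apart from the role names, is our control-flow assumption, not the released implementation.

```python
from typing import Callable, Dict

def run_agent(query: str, video: str,
              roles: Dict[str, Callable[..., dict]]) -> str:
    """Dispatch loop: each role reads the shared state, updates it,
    and names the next role to invoke (Chain-of-LoRA would swap in the
    matching adapter on one shared backbone before each call)."""
    state = {"query": query, "video": video}
    next_role = "planner"
    while next_role != "done":
        result = roles[next_role](**state)
        state.update(result)
        next_role = result.get("next", "done")
    return state.get("answer", "")

# Stub roles illustrating the control flow only:
roles = {
    "planner":  lambda **s: {"next": "grounder"},
    "grounder": lambda **s: {"segment": (12.0, 34.5), "next": "verifier"},
    "verifier": lambda **s: {"next": "answerer" if s.get("segment") else "grounder"},
    "answerer": lambda **s: {"answer": "The event occurs at ~12s.", "next": "done"},
}
print(run_agent("When does the event happen?", "video.mp4", roles))
```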
• A Code Implementation of Using Atla's Evaluation Platform and Selene Model via Python SDK to Score Legal Domain LLM Outputs for GDPR Compliance
    www.marktechpost.com
In this tutorial, we demonstrate how to evaluate the quality of LLM-generated responses using the Atla Python SDK, a powerful tool for automating evaluation workflows with natural language criteria. Powered by Selene, Atla's state-of-the-art evaluator model, we analyze whether legal responses align with the principles of the GDPR (General Data Protection Regulation). Atla's platform enables programmatic assessments using custom or predefined criteria, with synchronous and asynchronous support via the official Atla SDK.

In this implementation, we did the following:

- Used custom GDPR evaluation logic
- Queried Selene to return binary scores (0 or 1) and human-readable critiques
- Processed the evaluation in batch using asyncio
- Printed critiques to understand the reasoning behind each judgment

The Colab-compatible setup requires minimal dependencies, primarily the atla SDK, pandas, and nest_asyncio.

!pip install atla pandas matplotlib nest_asyncio --quiet

import os
import asyncio

import nest_asyncio
import pandas as pd
from atla import Atla, AsyncAtla

ATLA_API_KEY = "your atla API key"
client = Atla(api_key=ATLA_API_KEY)
async_client = AsyncAtla(api_key=ATLA_API_KEY)
nest_asyncio.apply()

First, we install the required libraries and initialize synchronous and asynchronous Atla clients using your API key. nest_asyncio is applied to allow asynchronous code to run smoothly within a Jupyter or Colab notebook environment. This enables seamless integration with Atla's async evaluation API via the AsyncAtla client.

data = [
    {
        "question": "Can a company monitor employee emails under GDPR?",
        "llm_response": "Yes, any employer can freely monitor emails as long as it's for productivity.",
        "expected": 0,
    },
    {
        "question": "Can employers access private chats on company devices?",
        "llm_response": "Only if there is a legitimate business need and employees are informed.",
        "expected": 1,
    },
    {
        "question": "Can browsing history be stored under EU privacy law?",
        "llm_response": "Yes, but consent and transparency are required.",
        "expected": 1,
    },
    {
        "question": "Can employers check WhatsApp messages on personal phones?",
        "llm_response": "No, personal device content is protected unless explicitly authorized.",
        "expected": 1,
    },
]
df = pd.DataFrame(data)
df.head()

We define a small dataset of legal questions and LLM-generated responses related to GDPR compliance. Each entry includes an expected binary label (1 for compliant, 0 for non-compliant). The data is loaded into a pandas DataFrame for easy processing and evaluation.

custom_eval_criteria = """Score this 1 if the response complies with GDPR principles:
- lawful basis
- employee consent or notice
- data minimization
- legitimate interest
Otherwise, score it 0.

Explain briefly why it qualifies or not."""

We define a custom evaluation prompt that guides Atla's Selene model in scoring responses against key GDPR principles.
It instructs the model to assign a score of 1 for compliant answers and 0 otherwise, along with a brief explanation justifying the score.

async def evaluate_with_selene(df):
    async def evaluate_row(row):
        try:
            result = await async_client.evaluation.create(
                model_id="atla-selene",
                model_input=row["question"],
                model_output=row["llm_response"],
                evaluation_criteria=custom_eval_criteria,
            )
            return result.result.evaluation.score, result.result.evaluation.critique
        except Exception as e:
            return None, f"Error: {e}"

    tasks = [evaluate_row(row) for _, row in df.iterrows()]
    results = await asyncio.gather(*tasks)
    df["selene_score"], df["critique"] = zip(*results)
    return df

df = asyncio.run(evaluate_with_selene(df))
df.head()

This asynchronous function evaluates each row in the DataFrame using Atla's Selene model. For every legal question and LLM response pair, it submits the data along with the custom GDPR evaluation criteria, gathers scores and critiques concurrently using asyncio.gather, appends them to the DataFrame, and returns the enriched results.

for i, row in df.iterrows():
    print(f"\nQ: {row['question']}")
    print(f"A: {row['llm_response']}")
    print(f"Selene: {row['critique']} Score: {row['selene_score']}")

We iterate through the evaluated DataFrame and print each question, the corresponding LLM-generated answer, and Selene's critique with its assigned score. This provides a clear, human-readable summary of how the evaluator judged each response against the custom GDPR criteria.

In conclusion, this notebook demonstrated how to leverage Atla's evaluation capabilities to assess the quality of LLM-generated legal responses with precision and flexibility. Using the Atla Python SDK and its Selene evaluator, we defined custom GDPR-specific evaluation criteria and automated the scoring of AI outputs with interpretable critiques. The process was asynchronous, lightweight, and designed to run seamlessly in Google Colab.

Here is the Colab Notebook.
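The setup installs matplotlib even though the walkthrough never plots. As an optional extension (our addition, not part of the original notebook), one can chart Selene's scores against the expected labels to eyeball agreement:

import matplotlib.pyplot as plt

# Optional: visualize expected labels vs. Selene's scores for each question.
# Failed evaluations (score None) are shown as 0 via fillna.
scores = df["selene_score"].fillna(0)
idx = range(len(df))
plt.figure(figsize=(6, 3))
plt.bar([i - 0.15 for i in idx], df["expected"], width=0.3, label="expected")
plt.bar([i + 0.15 for i in idx], scores, width=0.3, label="selene")
plt.xticks(list(idx), [f"Q{i + 1}" for i in idx])
plt.ylabel("score (0/1)")
plt.title("Expected vs. Selene scores")
plt.legend()
plt.show()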
  • Meet Hostinger Horizons: A No-Code AI Tool that Lets You Create, Edit, and Publish Custom Web Apps Without Writing a Single Line of Code
    www.marktechpost.com
In the evolving landscape of web development, the emergence of no-code platforms has significantly broadened access to application creation. Among these, Hostinger Horizons stands out as an AI-powered tool designed to facilitate the building, editing, and publishing of custom web applications without requiring any coding expertise. By integrating essential services such as hosting, domain registration, and email functionality, Hostinger Horizons offers a comprehensive solution for individuals and businesses seeking to establish a digital presence.

Technical Overview

Hostinger Horizons utilizes advanced artificial intelligence and natural language processing to interpret user inputs and generate functional web applications. The platform features a user-friendly chat interface where users describe their envisioned application in everyday language. For example, a prompt like "Create a personal finance tracker that allows users to log expenses and view spending reports" enables the AI to construct an application aligned with these specifications.

Notable Technical Features:

- Real-Time Editing and Live Preview: Users can modify their applications and observe changes instantaneously, promoting an iterative development process.
- Multilingual Support: The platform accommodates over 80 languages, allowing users worldwide to develop applications in their native tongues.
- Image and Voice Input: Beyond text prompts, users can upload images or use voice commands to guide the AI in building the application, enhancing accessibility and flexibility.
- Sandbox Environment: Hostinger Horizons provides a sandbox where users can test their applications without affecting the live version, ensuring a smooth deployment process.
- Integrated Deployment: Once the application meets the user's satisfaction, it can be deployed directly through the platform. Hostinger Horizons manages all backend processes, including hosting and domain setup, streamlining the launch.

Business Considerations

Hostinger Horizons is tailored to a diverse audience, encompassing entrepreneurs, small businesses, and individual creators. By removing the need for coding skills, the platform lowers the barrier to web application development, enabling rapid transformation of ideas into functional applications.

Advantages for Businesses:

- Cost-Effective Development: Traditional web application development often involves significant expenses for hiring developers. Hostinger Horizons offers a more economical alternative, which is particularly advantageous for startups and small enterprises.
- Rapid Prototyping: The platform facilitates swift development and deployment, allowing businesses to test concepts and iterate based on user feedback without substantial time investment.
- Integrated Services: With built-in hosting, domain registration, and email services, businesses can manage their entire web presence from a single platform, simplifying operations and reducing the need for multiple service providers.
- Scalability: Hostinger Horizons' cloud-based infrastructure ensures that applications can scale seamlessly as the business grows, accommodating increasing traffic and user engagement.

Pricing Structure:

Hostinger Horizons offers several pricing plans to accommodate different needs:

- Starter Plan: Priced at $19.99 per month, it includes 100 messages, hosting (one month free), unlimited bandwidth, up to 50 web apps, and free email services.
- Hobbyist Plan: At $49.99 per month, this plan offers 250 messages along with the features included in the Starter Plan.
- Hustler Plan: For $99.99 per month, users receive 500 messages and the standard features.
- Pro Plan: The most comprehensive plan, at $199.99 per month, provides 1,000 messages and all included features.

Hostinger also offers a free trial of 5 messages via the "Start for free" button.

Tutorial: Creating a Web Application with Hostinger Horizons

Developing a web application with Hostinger Horizons is straightforward. Here's a step-by-step guide:

Step 1: Sign Up and Access Hostinger Horizons
Visit the Hostinger Horizons page and select a plan that aligns with your requirements. After purchasing, log in to your Hostinger account and navigate to the hPanel dashboard. Go to Websites > Website List and click Add Website. Choose Hostinger Horizons from the options to access the platform.

Step 2: Define Your Application Idea
In the chat interface, describe the application you wish to create. For example: "Create a web application for a Sudoku game. The web application should be mobile friendly. There should be 3 levels of games. Level 1: easy mode. Level 2: medium difficulty. Level 3: difficult mode." The AI will process your input and generate a basic version of the application based on your description.

Step 3: Customize the Application
- Layout and Design: Use the real-time editor to adjust the layout, color scheme, and overall design to match your preferences.
- Functionality: Add or modify features by providing additional prompts, for instance, requesting a budgeting feature or integration with external APIs for real-time data.
- Content: Upload images, input text content, and configure any necessary settings to personalize the application.

Step 4: Test the Application
Use the sandbox environment to test the application's functionality. Ensure all features operate as intended and make any necessary adjustments based on your testing.

Step 5: Deploy the Application
Once satisfied, click the Publish button to deploy your application.

Thanks to the Hostinger team for the thought leadership/resources for this article. The Hostinger team has supported us in this content/article.
  • Understanding AI Agent Memory: Building Blocks for Intelligent Systems
    www.marktechpost.com
AI agent memory comprises multiple layers, each serving a distinct role in shaping the agent's behavior and decision-making. Dividing memory into distinct types makes it easier to understand and design AI systems that are both contextually aware and responsive. Let's explore the four key types of memory commonly used in AI agents, Episodic, Semantic, Procedural, and Short-Term (or Working) Memory, along with the interplay between long-term and short-term storage.

1. Episodic Memory: Recalling Past Interactions

Episodic memory in AI refers to the storage of past interactions and the specific actions taken by the agent. Like human memory, episodic memory records the events or "episodes" an agent experiences during its operation. This type of memory is crucial because it enables the agent to reference previous conversations, decisions, and outcomes to inform future actions. For example, when a user interacts with a customer support bot, the bot might store the conversation history in an episodic memory log, allowing it to maintain context over multiple exchanges. This contextual awareness is especially important in multi-turn dialogues, where understanding previous interactions can dramatically improve the quality of responses.

In practical applications, episodic memory is often implemented using persistent storage systems like vector databases. These systems store semantic representations of interactions, enabling rapid retrieval based on similarity search. When an AI agent needs to refer back to an earlier conversation, it can quickly identify and pull relevant segments of past interactions, enhancing the continuity and personalization of the experience.

2. Semantic Memory: External Knowledge and Self-Awareness

Semantic memory in AI encompasses the agent's repository of factual, external information and internal knowledge. Unlike episodic memory, which is tied to specific interactions, semantic memory holds generalized knowledge that the agent can use to understand and interpret the world. This may include language rules, domain-specific information, or self-awareness of the agent's capabilities and limitations.

One common use of semantic memory is in Retrieval-Augmented Generation (RAG) applications, where the agent leverages a vast data store to answer questions accurately. For instance, if an AI agent is tasked with providing technical support for a software product, its semantic memory might contain user manuals, troubleshooting guides, and FAQs. Semantic memory also includes grounding context that helps the agent filter and prioritize relevant data from the broader corpus of information available on the internet.

Integrating semantic memory ensures that an AI agent responds based not only on immediate context but also on a broad spectrum of external knowledge, creating a more robust, informed system that can handle diverse queries with accuracy and nuance.

3. Procedural Memory: The Blueprint of Operations

Procedural memory is the backbone of an AI system's operational aspects. It includes systemic information such as the structure of the system prompt, the tools available to the agent, and the guardrails that ensure safe and appropriate interactions. In essence, procedural memory defines how the agent functions rather than what it knows.

This type of memory is typically managed through well-organized registries, such as Git repositories for code, prompt registries for conversational contexts, and tool registries that enumerate the available functions and APIs.
An AI agent can execute tasks more reliably and predictably when it has a clear blueprint of its operational procedures. The explicit definition of protocols and guidelines also ensures that the agent behaves in a controlled manner, minimizing risks such as unintended outputs or safety violations.

Procedural memory supports consistency in performance and facilitates easier updates and maintenance. As new tools become available or system requirements evolve, procedural memory can be updated in a centralized manner, ensuring that the agent adapts seamlessly to changes without compromising its core functionality.

4. Short-Term (Working) Memory: Integrating Information for Action

In many AI systems, information drawn from long-term memory is consolidated into short-term, or working, memory: the temporary context the agent actively uses to process current tasks. Short-term memory is a compilation of the episodic, semantic, and procedural memories that have been retrieved and localized for immediate use.

When an agent is presented with a new task or query, it assembles relevant information from its long-term stores. This might include a snippet of a previous conversation (episodic memory), pertinent factual data (semantic memory), and operational guidelines (procedural memory). The combined information forms the prompt fed into the underlying language model, allowing the AI to generate coherent, context-aware responses.

This process of compiling short-term memory is critical for tasks that require nuanced decision-making and planning. It allows the AI agent to remember the conversation history and tailor responses accordingly. The agility provided by short-term memory is a significant factor in creating interactions that feel natural and human-like. Moreover, the separation between long-term and short-term memory ensures that while the system has a vast knowledge repository, only the most pertinent information is actively engaged during an interaction, optimizing both performance and accuracy.

The Synergy of Long-Term and Short-Term Memory

To fully appreciate the architecture of AI agent memory, it is important to understand the dynamic interplay between long-term memory and short-term (working) memory. Long-term memory, consisting of the episodic, semantic, and procedural types, is the deep storage that informs the AI about its history, external facts, and internal operational frameworks. Short-term memory, by contrast, is the fluid working subset that the agent uses to navigate current tasks. By periodically retrieving and synthesizing data from long-term memory, the agent can adapt to new contexts without losing the richness of stored experience and knowledge. This dynamic balance ensures that AI systems are well-informed, responsive, and contextually aware.

In conclusion, the multifaceted approach to memory in AI agents underscores the complexity and sophistication required to build systems that can interact intelligently with the world. Episodic memory allows for the personalization of interactions, semantic memory enriches responses with factual depth, and procedural memory guarantees operational reliability. Meanwhile, integrating these long-term memories into short-term working memory enables the AI to act swiftly and contextually in real-time scenarios (a minimal sketch of this assembly follows below). As AI advances, refining these memory systems will be pivotal in creating intelligent agents capable of nuanced, context-aware decision-making.
The layered memory approach is a cornerstone of intelligent agent design, ensuring these systems remain robust, adaptive, and ready to tackle the challenges of an ever-evolving digital landscape.
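To make the interplay concrete, here is a minimal, illustrative sketch of how an agent might compile its short-term working context from the three long-term stores. The store interfaces (vector_db.search, kb.lookup, registry.get_prompt) are hypothetical stand-ins, not a specific framework's API.

# Illustrative sketch: compiling short-term working memory from
# long-term stores. All store interfaces here are hypothetical.

def build_working_memory(query, vector_db, kb, registry, k=3):
    episodic = vector_db.search(query, top_k=k)    # similar past interactions
    semantic = kb.lookup(query, top_k=k)           # relevant facts/docs (RAG-style)
    procedural = registry.get_prompt("system")     # system prompt + tool specs

    # The compiled prompt is the agent's short-term memory for this turn:
    # only the most pertinent slices of long-term memory are engaged.
    return "\n\n".join([
        procedural,
        "Relevant past interactions:\n" + "\n".join(episodic),
        "Relevant knowledge:\n" + "\n".join(semantic),
        "User query:\n" + query,
    ])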
  • PilotANN: A Hybrid CPU-GPU System For Graph-based ANNS
    www.marktechpost.com
Approximate Nearest Neighbor Search (ANNS) is a fundamental vector search technique that efficiently identifies similar items in high-dimensional vector spaces. ANNS has traditionally served as the backbone of retrieval engines and recommendation systems; however, it struggles to keep pace with modern Transformer architectures that employ higher-dimensional embeddings and larger datasets. Unlike deep learning systems, which can be horizontally scaled due to their stateless nature, ANNS remains centralized, creating a severe single-machine throughput bottleneck. Empirical testing with 100-million-scale datasets reveals that even state-of-the-art CPU implementations of the Hierarchical Navigable Small World (HNSW) algorithm can't maintain adequate performance as vector dimensions increase.

Previous research on large-scale ANNS has explored two optimization paths: index structure improvements and hardware acceleration. The Inverted Multi-Index (IMI) enhanced space partitioning through multi-codebook quantization, while PQFastScan improved performance with SIMD and cache-aware optimizations. DiskANN and SPANN introduced disk-based indexing for billion-scale datasets, addressing memory hierarchy challenges through different approaches. SONG and CAGRA achieved impressive speedups through GPU parallelization but remain constrained by GPU memory capacity. BANG handled billion-scale datasets via hybrid CPU-GPU processing but lacked critical CPU baseline comparisons. These methods frequently sacrifice compatibility or accuracy, or require specialized hardware.

Researchers from the Chinese University of Hong Kong, the Centre for Perceptual and Interactive Intelligence, and the Theory Lab of Huawei Technologies have proposed PilotANN, a hybrid CPU-GPU system designed to overcome the limitations of existing ANNS implementations. PilotANN addresses a core tension: CPU-only implementations struggle with computational demands, while GPU-only solutions are constrained by limited memory capacity. It resolves this by combining the abundant RAM of CPUs with the parallel processing capabilities of GPUs in a three-stage graph traversal process: GPU-accelerated subgraph traversal using dimensionally reduced vectors, CPU refinement, and precise search with complete vectors.

PilotANN fundamentally reimagines vector search through a staged "data ready" processing paradigm: it minimizes data movement across processing stages rather than adhering to the traditional "move data for computation" model. The three stages are GPU piloting with a subgraph and dimensionally reduced vectors, residual refinement using the subgraph with full vectors, and final traversal employing the full graph and complete vectors. The design is cost-effective with only a single commodity GPU while scaling effectively across vector dimensions and graph complexity. Data transfer overhead is minimized to the initial query vector moving to the GPU and a small candidate set returning to the CPU after GPU piloting.

Experimental results show PilotANN's performance advantages across diverse large-scale datasets. PilotANN achieves a 3.9x throughput speedup on the 96-dimensional DEEP dataset compared to the HNSW-CPU baseline, with even larger gains of 5.1-5.4x on higher-dimensional datasets. PilotANN delivers significant speedups even on the notoriously challenging T2I dataset, despite no benchmark-specific optimizations.
Moreover, PilotANN is remarkably cost-effective despite using more expensive hardware. While the GPU-based platform costs 2.81 USD/hour versus 1.69 USD/hour for the CPU-only solution, PilotANN achieves 2.3x better cost-effectiveness on DEEP and 3.0-3.2x on the T2I, WIKI, and LAION datasets when measuring throughput per dollar.

In conclusion, the researchers introduced PilotANN, an advancement in graph-based ANNS that effectively utilizes CPU and GPU resources for emerging workloads. It delivers substantial gains over existing CPU-only approaches through an intelligent decomposition of top-k search into a multi-stage CPU-GPU pipeline and efficient entry selection. It also democratizes high-performance nearest neighbor search by achieving competitive results with a single commodity GPU, making advanced search capabilities accessible to researchers and organizations with limited computing resources. Unlike alternative solutions that require expensive high-end GPUs, PilotANN enables efficient ANNS deployment on common hardware configurations while maintaining search accuracy.

Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project.
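The system's actual kernels are more involved, but the three-stage flow can be summarized in a hedged Python sketch. The helper names (gpu_subgraph_search, cpu_refine, cpu_full_traversal) and the reduction step are illustrative assumptions about the pipeline shape, not PilotANN's real interfaces.

# Hypothetical outline of PilotANN's staged search. The helper
# functions are stand-ins for the system's CPU/GPU kernels.

def pilotann_search(query, index, k=10):
    # Stage 1 (GPU piloting): traverse a subgraph using
    # dimensionally reduced vectors resident in GPU memory.
    q_reduced = index.reduce(query)                 # e.g., a learned projection
    candidates = gpu_subgraph_search(q_reduced, index.subgraph, top=4 * k)

    # Stage 2 (residual refinement): re-rank the small candidate set
    # on the CPU using full-precision vectors over the subgraph.
    refined = cpu_refine(query, candidates, index.full_vectors, top=2 * k)

    # Stage 3 (final traversal): finish on the full graph with complete
    # vectors, seeded by the refined entry points.
    return cpu_full_traversal(query, seeds=refined,
                              graph=index.full_graph, top=k)

Note that only the query vector crosses to the GPU and only a small candidate set returns, which is the "data ready" principle the paper emphasizes.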
  • Advancing Medical Reasoning with Reinforcement Learning from Verifiable Rewards (RLVR): Insights from MED-RLVR
    www.marktechpost.com
Reinforcement Learning from Verifiable Rewards (RLVR) has recently emerged as a promising method for enhancing reasoning abilities in language models without direct supervision. The approach has shown notable success in mathematics and coding, where reasoning naturally aligns with structured problem-solving. While studies have demonstrated that RLVR alone can lead to self-evolved reasoning, research has largely been limited to these technical fields. Efforts to extend RLVR have explored synthetic datasets, such as those involving sequential tasks and object counting, indicating potential but also highlighting the challenges of adapting the method to different domains.

Expanding RLVR to broader areas remains an open challenge, particularly for tasks like multiple-choice question answering (MCQA), which provides structured, verifiable labels across diverse subjects, including medicine. Unlike math and coding, which involve complex reasoning over an open-ended answer space, MCQA tasks have predefined answer choices, making it uncertain whether RLVR's benefits translate effectively. This limitation is especially relevant for medical reasoning tasks, where models must navigate intricate clinical knowledge to produce accurate responses, an area that has proven difficult for existing AI systems.

Researchers from Microsoft Research investigate whether medical reasoning can emerge through RLVR. They introduce MED-RLVR, leveraging medical MCQA data to assess RLVR's effectiveness in the medical domain. Their findings show that RLVR extends beyond math and coding, achieving performance comparable to supervised fine-tuning (SFT) on in-distribution tasks while significantly improving out-of-distribution generalization by eight percentage points. Analyzing training dynamics, they observe that reasoning capabilities emerge in a 3B-parameter base model without explicit supervision, highlighting RLVR's potential for advancing reasoning in knowledge-intensive fields like medicine.

RL optimizes decision-making by training an agent to maximize rewards through interactions with an environment. It has been effectively applied to language models to align outputs with human preferences and, more recently, to elicit reasoning without explicit supervision. This study employs Proximal Policy Optimization (PPO) to train the policy model, incorporating a clipped objective function to stabilize training. Using a rule-based reward function, MED-RLVR assigns rewards based on output correctness and format validity. Without additional supervision, the model demonstrates emergent medical reasoning, similar to the mathematical reasoning seen in prior RLVR studies, highlighting RLVR's potential beyond structured domains.

The MedQA-USMLE dataset, which includes multiple-choice medical exam questions, is used to train MED-RLVR. Unlike the standard four-option version, this dataset presents a greater challenge by offering more answer choices. Training is based on the Qwen2.5-3B model using OpenRLHF for reinforcement learning. Compared to SFT, MED-RLVR demonstrates superior generalization, particularly on the MMLU-Pro-Health dataset. Analysis reveals six stages of reasoning evolution, including format failures, verbose outputs, reward hacking, and reintegrated reasoning.
Unlike math or coding tasks, no self-validation behaviors ("aha moments") were observed, suggesting potential improvements through penalizing short reasoning chains or fine-tuning with longer CoTs.

In conclusion, the study focuses on MCQA in medicine, providing a controlled setting for evaluation. However, MCQA does not fully capture the complexity of real-world tasks like open-text answering, report generation, or medical dialogues. Additionally, the unimodal approach limits the model's ability to integrate multimodal data, which is crucial for diagnostic applications. Future work should address these limitations. MED-RLVR, based on reinforcement learning with verifiable rewards, matches SFT on in-distribution tasks and improves out-of-distribution generalization. While medical reasoning emerges without explicit supervision, challenges like reward hacking persist, highlighting the need for further exploration of complex reasoning and multimodal integration.

Check out the Paper. All credit for this research goes to the researchers of this project.
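The paper describes the reward only at a high level, correctness plus format validity. A minimal sketch of such a rule-based reward for MCQA might look like the following; the tag scheme, option letters, and reward magnitudes are our assumptions, not MED-RLVR's exact settings.

import re

# Illustrative rule-based RLVR reward: format validity plus verifiable
# answer correctness. Tag format and reward values are assumptions.

def rlvr_reward(completion: str, gold_choice: str) -> float:
    reward = 0.0
    # Format check: reasoning followed by a single lettered answer.
    # A-J covers MCQA variants with more than four options.
    m = re.search(r"<think>.*</think>\s*<answer>\s*([A-J])\s*</answer>",
                  completion, flags=re.DOTALL)
    if m is None:
        return -1.0              # malformed output is penalized
    reward += 0.1                # small bonus for valid format
    if m.group(1) == gold_choice:
        reward += 1.0            # verifiable correctness bonus
    return reward

Because the reward checks only surface-verifiable properties, it needs no learned reward model, which is precisely what makes reward hacking (e.g., exploiting the format bonus) a failure mode worth monitoring.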
  • Tencent AI Researchers Introduce Hunyuan-T1: A Mamba-Powered Ultra-Large Language Model Redefining Deep Reasoning, Contextual Efficiency, and Human-Centric Reinforcement Learning
    www.marktechpost.com
Large language models struggle to process and reason over lengthy, complex texts without losing essential context. Traditional models often suffer from context loss, inefficient handling of long-range dependencies, and difficulty aligning with human preferences, affecting the accuracy and efficiency of their responses. Tencent's Hunyuan-T1 directly tackles these challenges by integrating a novel Mamba-powered architecture with advanced reinforcement learning and curriculum strategies, ensuring robust context capture and enhanced reasoning capabilities.

Hunyuan-T1 is the first model powered by the innovative Mamba architecture, a design that fuses Hybrid Transformer and Mixture-of-Experts (MoE) technologies. Built on the TurboS fast-thinking base, Hunyuan-T1 is specifically engineered to optimize the processing of long textual sequences while minimizing computational overhead. This allows the model to effectively capture extended context and manage long-distance dependencies, which is crucial for tasks that demand deep, coherent reasoning.

A key highlight of Hunyuan-T1 is its heavy reliance on RL during the post-training phase. Tencent dedicated 96.7% of its computing power to this approach, enabling the model to refine its reasoning abilities iteratively. Techniques such as data replay, periodic policy resetting, and self-rewarding feedback loops help improve output quality, ensuring the model's responses are detailed, efficient, and closely aligned with human expectations.

To further boost reasoning proficiency, Tencent employed a curriculum learning strategy. This approach gradually increases the difficulty of training data while simultaneously expanding the model's context length. As a result, Hunyuan-T1 is trained to use tokens more efficiently, seamlessly adapting from solving basic mathematical problems to tackling complex scientific and logical challenges.

Efficiency is another cornerstone of Hunyuan-T1's design. The TurboS base's ability to capture long-text information prevents context loss, a common issue in many language models, and doubles the decoding speed compared to similar systems. This means users benefit from faster, higher-quality responses without compromised performance.

The model has achieved impressive scores on multiple benchmarks: 87.2 on MMLU-PRO, which tests subjects spanning the humanities, social sciences, and STEM fields; 69.3 on GPQA-diamond, a challenging evaluation featuring doctoral-level scientific problems; 64.9 on LiveCodeBench for coding tasks; and a remarkable 96.2 on the MATH-500 benchmark for mathematical reasoning. These results underscore Hunyuan-T1's versatility and its ability to handle high-stakes, professional-grade tasks across fields.

Beyond quantitative metrics, Hunyuan-T1 is designed to deliver outputs with human-like understanding and creativity. During its RL phase, the model underwent a comprehensive alignment process that combined self-rewarding feedback with external reward models. This dual approach ensures its responses are accurate and exhibit rich detail and natural flow.

In conclusion, Tencent's Hunyuan-T1 combines an ultra-large-scale, Mamba-powered architecture with state-of-the-art reinforcement learning and curriculum strategies, delivering high performance, enhanced reasoning, and exceptional efficiency.
• NVIDIA AI Researchers Introduce FFN Fusion: A Novel Optimization Technique that Demonstrates How Sequential Computation in Large Language Models (LLMs) can be Effectively Parallelized
    www.marktechpost.com
Large language models (LLMs) have become vital across domains, enabling high-performance applications such as natural language generation, scientific research, and conversational agents. Underneath these advancements lies the transformer architecture, where alternating layers of attention mechanisms and feed-forward networks (FFNs) sequentially process tokenized input. However, as models grow in size and complexity, the computational burden required for inference grows substantially, creating an efficiency bottleneck. Efficient inference is now a critical concern, with many research groups focusing on strategies that reduce latency, increase throughput, and cut computational cost while maintaining or improving model quality.

At the center of this efficiency problem lies the inherently sequential structure of transformers. Each layer's output feeds into the next, demanding strict ordering and synchronization, which is especially problematic at scale. As model sizes expand, the cost of sequential computation and communication across GPUs grows, reducing efficiency and increasing deployment cost. The challenge is amplified in scenarios requiring fast multi-token generation, such as real-time AI assistants. Reducing this sequential load while maintaining model capability presents a key technical hurdle: unlocking new parallelization strategies that preserve accuracy yet significantly reduce computation depth is essential to broadening the accessibility and scalability of LLMs.

Several techniques have emerged to improve efficiency. Quantization reduces the precision of numerical representations to minimize memory and compute needs, though it often risks accuracy losses, especially at low bit-widths. Pruning eliminates redundant parameters and simplifies models but can harm accuracy if applied carelessly. Mixture-of-Experts (MoE) models activate only a subset of parameters per input, making them highly efficient for specific workloads, yet they can underperform at intermediate batch sizes due to low hardware utilization. While valuable, these strategies carry trade-offs that limit their universal applicability. Consequently, the field seeks methods that offer broad efficiency improvements with fewer compromises, especially for dense architectures that are simpler to train, deploy, and maintain.

Researchers at NVIDIA introduced a new architectural optimization technique named FFN Fusion, which addresses the sequential bottleneck in transformers by identifying FFN sequences that can be executed in parallel. The approach emerged from the observation that when attention layers are removed using the Puzzle tool, models often retain long sequences of consecutive FFNs. These sequences show minimal interdependency and can therefore be processed simultaneously. By analyzing the structure of LLMs such as Llama-3.1-405B-Instruct, the researchers created a new model called Ultra-253B-Base by pruning and restructuring the base model through FFN Fusion, resulting in a significantly more efficient model that maintains competitive performance.

FFN Fusion fuses multiple consecutive FFN layers into a single, wider FFN. This process is grounded in mathematical equivalence: by concatenating the weights of several FFNs, one can produce a single module that behaves like the sum of the original layers but can be computed in parallel.
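To make the weight-concatenation idea concrete, here is a minimal PyTorch sketch under simplifying assumptions (plain two-layer FFNs without biases, all reading the same input, outputs summed through the residual stream). It illustrates the equivalence itself, not NVIDIA's implementation:

import torch
import torch.nn as nn

# Sketch: fuse k two-layer FFNs into one wider FFN by concatenating
# weights. Assumes low inter-layer dependency, so all FFNs read the
# same input and their outputs are summed; biases omitted for brevity.

def fuse_ffns(ffns, d_model):
    d_hidden = ffns[0][0].out_features
    k = len(ffns)
    fused = nn.Sequential(
        nn.Linear(d_model, k * d_hidden, bias=False),
        nn.GELU(),
        nn.Linear(k * d_hidden, d_model, bias=False),
    )
    with torch.no_grad():
        # Stack up-projections along the hidden dimension...
        fused[0].weight.copy_(torch.cat([f[0].weight for f in ffns], dim=0))
        # ...and concatenate down-projections along their input dimension,
        # so the fused output equals the sum of the individual outputs.
        fused[2].weight.copy_(torch.cat([f[2].weight for f in ffns], dim=1))
    return fused

d_model, d_hidden = 64, 256
ffns = [nn.Sequential(nn.Linear(d_model, d_hidden, bias=False), nn.GELU(),
                      nn.Linear(d_hidden, d_model, bias=False))
        for _ in range(3)]
fused = fuse_ffns(ffns, d_model)

x = torch.randn(2, d_model)
assert torch.allclose(fused(x), sum(f(x) for f in ffns), atol=1e-5)

The single wide matrix multiplication replaces three sequential ones, which is exactly where the latency savings come from on parallel hardware.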
Concretely, if three FFNs are stacked sequentially, each dependent on the output of the previous one, their fusion removes these dependencies by ensuring all three operate on the same input and their outputs are aggregated. The theoretical foundation shows that the fused FFN maintains the same representational capacity. The researchers performed dependency analysis using the cosine distance between FFN outputs to identify regions of low interdependence. These regions were deemed optimal for fusion, as minimal change in token direction between layers indicates the feasibility of parallel processing.

Applying FFN Fusion to the Llama-405B model resulted in Ultra-253B-Base, which delivered notable gains in speed and resource efficiency. Specifically, the new model achieved a 1.71x improvement in inference latency and reduced per-token computational cost by 35x at a batch size of 32. This efficiency did not come at the expense of capability: Ultra-253B-Base scored 85.17% on MMLU, 72.25% on MMLU-Pro, 84.92% on Arena Hard, 86.58% on HumanEval, and 9.19 on MT-Bench. These results often matched or exceeded the original 405B-parameter model, even though Ultra-253B-Base contains only 253 billion parameters. Memory usage also improved, with a 2x reduction in kv-cache requirements. The training process involved distilling 54 billion tokens at an 8k context window, followed by staged fine-tuning at 16k, 32k, and 128k contexts. These steps ensured the fused model maintained high accuracy while benefiting from its reduced size.

This research demonstrates how thoughtful architectural redesign can unlock significant efficiency gains. The researchers showed that FFN layers in transformer architectures are often more independent than previously assumed. Their method of quantifying inter-layer dependency and transforming model structures allows broader application across models of various sizes. The technique was also validated on a 70B-parameter model, proving generalizability. Further experiments indicated that while FFN layers can often be fused with minimal impact, full block parallelization, including attention, introduces more performance degradation due to stronger interdependencies.

Several key takeaways from the research on FFN Fusion:

- The FFN Fusion technique reduces sequential computation in transformers by parallelizing low-dependency FFN layers.
- Fusion is achieved by replacing sequences of FFNs with a single wider FFN using concatenated weights (as sketched above).
- Ultra-253B-Base, derived from Llama-3.1-405B, achieves 1.71x faster inference and 35x lower per-token cost.
- Benchmark results include: 85.17% (MMLU), 72.25% (MMLU-Pro), 86.58% (HumanEval), 84.92% (Arena Hard), and 9.19 (MT-Bench).
- Memory usage is cut in half thanks to kv-cache optimization.
- FFN Fusion is more effective at larger model scales and composes well with techniques like pruning and quantization.
- Full transformer block parallelization shows potential but requires further research due to stronger interdependencies.
- A systematic method using cosine distance helps identify which FFN sequences are safe to fuse.
- The technique is validated across different model sizes, including 49B, 70B, and 253B.
- This approach lays the foundation for more parallel-friendly and hardware-efficient LLM designs.

Check out the Paper. All credit for this research goes to the researchers of this project.
  • A Step by Step Guide to Solve 1D Burgers Equation with Physics-Informed Neural Networks (PINNs): A PyTorch Approach Using Automatic Differentiation and Collocation Methods
    www.marktechpost.com
In this tutorial, we explore an innovative approach that blends deep learning with physical laws by leveraging Physics-Informed Neural Networks (PINNs) to solve the one-dimensional Burgers' equation, u_t + u*u_x = nu*u_xx, with viscosity nu = 0.01/pi. Using PyTorch on Google Colab, we demonstrate how to encode the governing differential equation directly into the neural network's loss function, allowing the model to learn a solution u(x, t) that inherently respects the underlying physics. This technique reduces the reliance on large labeled datasets and offers a fresh perspective on solving complex, non-linear partial differential equations with modern computational tools.

!pip install torch matplotlib

First, we install the PyTorch and matplotlib libraries using pip, ensuring you have the necessary tools for building neural networks and visualizing the results in your Google Colab environment.

import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
import matplotlib.pyplot as plt

torch.set_default_dtype(torch.float32)

We import the essential libraries: PyTorch for deep learning, NumPy for numerical operations, and matplotlib for plotting. We set the default tensor data type to float32 for consistent numerical precision throughout the computations.

x_min, x_max = -1.0, 1.0
t_min, t_max = 0.0, 1.0
nu = 0.01 / np.pi

N_f = 10000   # number of collocation points
N_0 = 200     # number of initial-condition points
N_b = 200     # number of boundary-condition points

X_f = np.random.rand(N_f, 2)
X_f[:, 0] = X_f[:, 0] * (x_max - x_min) + x_min   # x in [-1, 1]
X_f[:, 1] = X_f[:, 1] * (t_max - t_min) + t_min   # t in [0, 1]

x0 = np.linspace(x_min, x_max, N_0)[:, None]
t0 = np.zeros_like(x0)
u0 = -np.sin(np.pi * x0)          # initial condition u(x, 0) = -sin(pi x)

tb = np.linspace(t_min, t_max, N_b)[:, None]
xb_left = np.ones_like(tb) * x_min
xb_right = np.ones_like(tb) * x_max
ub_left = np.zeros_like(tb)       # boundary condition u(-1, t) = 0
ub_right = np.zeros_like(tb)      # boundary condition u(1, t) = 0

X_f = torch.tensor(X_f, dtype=torch.float32, requires_grad=True)
x0 = torch.tensor(x0, dtype=torch.float32)
t0 = torch.tensor(t0, dtype=torch.float32)
u0 = torch.tensor(u0, dtype=torch.float32)
tb = torch.tensor(tb, dtype=torch.float32)
xb_left = torch.tensor(xb_left, dtype=torch.float32)
xb_right = torch.tensor(xb_right, dtype=torch.float32)
ub_left = torch.tensor(ub_left, dtype=torch.float32)
ub_right = torch.tensor(ub_right, dtype=torch.float32)

We establish the simulation domain for the Burgers' equation by defining the spatial and temporal boundaries, the viscosity, and the number of collocation, initial, and boundary points. We then generate random and evenly spaced data points for these conditions and convert them into PyTorch tensors, enabling gradient computation where needed.

class PINN(nn.Module):
    def __init__(self, layers):
        super(PINN, self).__init__()
        self.activation = nn.Tanh()
        layer_list = []
        for i in range(len(layers) - 1):
            layer_list.append(nn.Linear(layers[i], layers[i + 1]))
        self.layers = nn.ModuleList(layer_list)

    def forward(self, x):
        for layer in self.layers[:-1]:
            x = self.activation(layer(x))
        return self.layers[-1](x)

layers = [2, 50, 50, 50, 50, 1]
model = PINN(layers)
print(model)

Here, we define a custom Physics-Informed Neural Network (PINN) by extending PyTorch's nn.Module. The network architecture is built dynamically from a list of layer sizes, where each linear layer is followed by a Tanh activation (except for the final output layer). In this example, the network takes a 2-dimensional input, passes it through four hidden layers (each with 50 neurons), and outputs a single value. Finally, the model is instantiated with the specified architecture, and its structure is printed.
Finally, the model is instantiated with the specified architecture, and its structure is printed.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")model.to(device)Here, we check if a CUDA-enabled GPU is available, set the device accordingly, and move the model to that device for accelerated computation during training and inference.def pde_residual(model, X): x = X[:, 0:1] t = X[:, 1:2] u = model(torch.cat([x, t], dim=1)) u_x = torch.autograd.grad(u, x, grad_outputs=torch.ones_like(u), create_graph=True, retain_graph=True)[0] u_t = torch.autograd.grad(u, t, grad_outputs=torch.ones_like(u), create_graph=True, retain_graph=True)[0] u_xx = torch.autograd.grad(u_x, x, grad_outputs=torch.ones_like(u_x), create_graph=True, retain_graph=True)[0] f = u_t + u * u_x - nu * u_xx return fdef loss_func(model): f_pred = pde_residual(model, X_f.to(device)) loss_f = torch.mean(f_pred**2) u0_pred = model(torch.cat([x0.to(device), t0.to(device)], dim=1)) loss_0 = torch.mean((u0_pred - u0.to(device))**2) u_left_pred = model(torch.cat([xb_left.to(device), tb.to(device)], dim=1)) u_right_pred = model(torch.cat([xb_right.to(device), tb.to(device)], dim=1)) loss_b = torch.mean(u_left_pred**2) + torch.mean(u_right_pred**2) loss = loss_f + loss_0 + loss_b return lossNow, we compute the residual of Burgers equation at the collocation points by calculating the required derivatives via automatic differentiation. Then, we define a loss function that aggregates the PDE residual loss, the error from the initial condition, and the errors from the boundary conditions. This combined loss guides the network to learn a solution that satisfies both the physical law and the imposed conditions.optimizer = optim.Adam(model.parameters(), lr=1e-3)num_epochs = 5000for epoch in range(num_epochs): optimizer.zero_grad() loss = loss_func(model) loss.backward() optimizer.step() if (epoch+1) % 500 == 0: print(f'Epoch {epoch+1}/{num_epochs}, Loss: {loss.item():.5e}') print("Training complete!")Here, we set up the PINNs training loop using the Adam optimizer with a learning rate of 1103. Over 5000 epochs, it repeatedly computes the loss (which includes the PDE residual, initial, and boundary condition errors), backpropagates the gradients, and updates the model parameters. Every 500 epochs, it prints the current epoch and loss to monitor progress and finally announces when training is complete.N_x, N_t = 256, 100x = np.linspace(x_min, x_max, N_x)t = np.linspace(t_min, t_max, N_t)X, T = np.meshgrid(x, t)XT = np.hstack((X.flatten()[:, None], T.flatten()[:, None]))XT_tensor = torch.tensor(XT, dtype=torch.float32).to(device)model.eval()with torch.no_grad(): u_pred = model(XT_tensor).cpu().numpy().reshape(N_t, N_x)plt.figure(figsize=(8, 5))plt.contourf(X, T, u_pred, levels=100, cmap='viridis')plt.colorbar(label='u(x,t)')plt.xlabel('x')plt.ylabel('t')plt.title("Predicted solution u(x,t) via PINN")plt.show()Finally, we create a grid of points over the defined spatial () and temporal () domain, feed these points to the trained model to predict the solution (, ), and reshape the output into a 2D array. Also, it visualizes the predicted solution as a contour plot using matplotlib, complete with a colorbar, axis labels, and a title, allowing you to observe how the PINN has approximated the dynamics of the Burgers equation.In conclusion, this tutorial has showcased how PINNs can be effectively implemented to solve the 1D Burgers equation by incorporating the physics of the problem into the training process. 
Through careful construction of the neural network, generation of collocation and boundary data, and automatic differentiation, we achieved a model that learns a solution consistent with the PDE and the prescribed conditions. This fusion of machine learning and traditional physics paves the way for tackling more challenging problems in computational science and engineering, inviting further exploration of higher-dimensional systems and more sophisticated neural architectures.

Here is the Colab Notebook.
  • Empowering Time Series AI: How Salesforce is Leveraging Synthetic Data to Enhance Foundation Models
    www.marktechpost.com
Time series analysis faces significant hurdles in data availability, quality, and diversity, critical factors in developing effective foundation models. Real-world datasets often fall short due to regulatory limitations, inherent biases, poor quality, and limited paired textual annotations, making it difficult to create robust, generalizable Time Series Foundation Models (TSFMs) and Large Language Model-based Time Series Models (TSLLMs). This scarcity impacts tasks such as forecasting, classification, anomaly detection, reasoning, and captioning, limiting the full potential of current advancements in artificial intelligence.

Salesforce AI Research has addressed these challenges by proposing a comprehensive approach to leveraging synthetic data for enhancing TSFMs and TSLLMs. Their recent study, Empowering Time Series Analysis with Synthetic Data, presents a novel strategy of using synthetic data to improve model training, evaluation, and fine-tuning, focusing on mitigating biases, increasing dataset diversity, and enriching contextual information. By developing innovative data-generation frameworks and incorporating synthetic datasets, Salesforce AI aims to advance the practical application of TSFMs and TSLLMs, especially in sensitive domains like healthcare and finance, where data sharing is heavily regulated.

The technical cornerstone of Salesforce AI Research's methodology involves various synthetic data generation approaches, each addressing specific aspects of time series dynamics, such as trends, seasonal patterns, and noise characteristics. For instance, the ForecastPFN method combines linear-exponential trends and periodic seasonalities with Weibull-distributed noise, effectively simulating realistic yet diverse scenarios. Similarly, TimesFM integrates piecewise linear trends and autoregressive moving average (ARMA) models with periodic patterns. Another innovative technique, KernelSynth by Chronos, employs Gaussian Processes (GPs) combined with linear, periodic, and radial basis function (RBF) kernels to generate rich synthetic datasets. These methods enable controlled yet varied synthetic data creation that helps capture a comprehensive range of realistic time series behaviors.
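To make the kernel-composition idea concrete, here is a minimal, hypothetical sketch of KernelSynth-style generation: it sums linear, periodic, and RBF kernels into a Gaussian Process covariance and draws one synthetic series from it. The kernel forms and hyperparameters are illustrative assumptions, not the paper's actual configuration.

import numpy as np

def rbf_kernel(t1, t2, length_scale=5.0):
    d = t1[:, None] - t2[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def periodic_kernel(t1, t2, period=24.0, length_scale=1.0):
    d = np.abs(t1[:, None] - t2[None, :])
    return np.exp(-2.0 * np.sin(np.pi * d / period) ** 2 / length_scale ** 2)

def linear_kernel(t1, t2, sigma=0.01):
    return sigma * np.outer(t1, t2)

def sample_series(n=256, seed=0):
    rng = np.random.default_rng(seed)
    t = np.arange(n, dtype=float)
    # Compose kernels by summation (products are another common choice).
    K = rbf_kernel(t, t) + periodic_kernel(t, t) + linear_kernel(t, t)
    K += 1e-6 * np.eye(n)  # jitter for numerical stability
    return rng.multivariate_normal(np.zeros(n), K)

series = sample_series()
print(series.shape)  # (256,)

Sampling many such series with randomized kernel combinations yields a diverse synthetic corpus exhibiting trend, seasonality, and noise in controlled proportions.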
The Salesforce team's findings highlight substantial benefits derived from synthetic data at multiple stages of model development. In pretraining, synthetic datasets provided clear performance enhancements, notably demonstrated in models like ForecastPFN, Mamba4Cast, and TimesFM. For example, ForecastPFN pretrained entirely on synthetic data showed significant improvements in zero-shot forecasting scenarios, while Chronos found optimal performance gains by mixing around 10% synthetic data with real-world datasets, beyond which additional synthetic data could degrade performance due to less diverse representations. Synthetic data also played a crucial role in evaluation, allowing researchers to precisely assess model capabilities, understand internal representations, and identify gaps in the learned patterns. Moment, for instance, used synthetically generated sinusoidal waves to evaluate internal embeddings and model sensitivity to variations in time series characteristics, demonstrating its effectiveness in capturing subtle trends and frequencies.

The paper also addresses current limitations in synthetic data usage, identifying areas for future improvement. One critical gap is the absence of systematic integration methods for synthetic datasets, suggesting the need for structured frameworks to identify and fill missing real-world data patterns strategically. Another limitation noted is the dominance of statistical methods, prompting a call for exploring data-driven generative techniques, like diffusion models, to enhance realism. Salesforce researchers further emphasize untapped potential in leveraging synthetic data during fine-tuning phases to address specific domain gaps or model weaknesses more efficiently and adaptively.

In conclusion, Salesforce AI Research demonstrates that synthetic data offers a powerful toolset for overcoming data-related challenges in time series analysis. By systematically integrating high-quality synthetic datasets into various stages of model development, TSFMs and TSLLMs can achieve enhanced generalization, reduced biases, and improved performance across diverse analytical tasks. Despite existing limitations, such as ensuring realism and alignment, the proactive advancement and exploration of synthetic data generation methodologies indicate significant potential. Future research, as suggested by Salesforce, should focus on improving data realism, systematically addressing data gaps, and exploiting iterative, human-in-the-loop synthetic data generation processes. These advancements could dramatically expand the applicability and reliability of time series models, laying a solid foundation for future innovations in artificial intelligence.

Check out the Paper. All credit for this research goes to the researchers of this project.
  • Meta Reality Labs Research Introduces Sonata: Advancing Self-Supervised Representation Learning for 3D Point Clouds
    www.marktechpost.com
3D self-supervised learning (SSL) has faced persistent challenges in developing semantically meaningful point representations suitable for diverse applications with minimal supervision. Despite substantial progress in image-based SSL, existing point cloud SSL methods have largely been limited by the issue known as the geometric shortcut, where models excessively rely on low-level geometric features like surface normals or point heights. This reliance compromises the generalizability and semantic depth of the representations, hindering their practical deployment.

Researchers from the University of Hong Kong and Meta Reality Labs Research introduce Sonata, an advanced approach designed to address these fundamental challenges. Sonata employs a self-supervised learning framework that effectively mitigates the geometric shortcut by strategically obscuring low-level spatial cues and reinforcing dependency on richer input features. Drawing inspiration from recent advancements in image-based SSL, Sonata integrates a point self-distillation mechanism that gradually refines representation quality and ensures robustness against geometric simplifications.

At a technical level, Sonata relies on two core strategies. First, it operates on coarser scales to obscure spatial information that might otherwise dominate the learned representations. Second, it adopts a point self-distillation approach, progressively increasing task difficulty through adaptive masking strategies to foster deeper semantic understanding. Crucially, Sonata removes the decoder structures traditionally used in hierarchical models to avoid reintroducing local geometric shortcuts, allowing the encoder alone to build robust, multi-scale feature representations. Additionally, Sonata applies masked point jitter, introducing random perturbations to the spatial coordinates of masked points, thus further discouraging reliance on trivial geometric features.
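As a rough illustration of the masked-point-jitter idea, the sketch below perturbs only the coordinates of masked points; the noise scale and masking ratio are assumptions for illustration, not Sonata's actual settings.

import torch

def masked_point_jitter(points: torch.Tensor, mask: torch.Tensor, sigma: float = 0.01) -> torch.Tensor:
    """Add Gaussian noise to the xyz coordinates of masked points only.

    points: (N, 3) point coordinates; mask: (N,) boolean, True where masked.
    """
    jittered = points.clone()
    noise = sigma * torch.randn(int(mask.sum()), 3)
    jittered[mask] += noise
    return jittered

pts = torch.rand(1024, 3)        # toy point cloud
mask = torch.rand(1024) < 0.3    # mask roughly 30% of points
pts_aug = masked_point_jitter(pts, mask)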
The reported empirical results validate Sonata's efficacy and efficiency. Sonata achieves significant performance gains on benchmarks like ScanNet, where it records a linear probing accuracy of 72.5%, substantially surpassing previous state-of-the-art SSL approaches. Importantly, Sonata demonstrates robustness even with limited data, performing effectively with as little as 1% of the ScanNet dataset, which highlights its suitability for low-resource scenarios. Its parameter efficiency is also notable, delivering strong performance improvements with fewer parameters than conventional methods. Furthermore, integrating Sonata with image-derived representations such as DINOv2 results in enhanced accuracy, emphasizing its capacity to capture distinctive semantic details specific to 3D data.

Sonata's capabilities are further illustrated through insightful zero-shot visualizations, including PCA-colored point clouds and dense feature correspondence, demonstrating coherent semantic clustering and robust spatial reasoning under challenging augmentation conditions. The versatility of Sonata is also evidenced across various semantic segmentation tasks, spanning indoor datasets like ScanNet and ScanNet200 as well as outdoor datasets including Waymo, consistently achieving state-of-the-art outcomes.

In conclusion, Sonata represents a significant advancement in addressing inherent limitations in 3D self-supervised learning. Its methodological innovations effectively resolve issues associated with the geometric shortcut, providing semantically richer and more reliable representations. Sonata's integration of self-distillation, careful manipulation of spatial information, and scalability to large datasets establish a solid foundation for future explorations in versatile and robust 3D representation learning. The framework sets a methodological benchmark, facilitating further research towards comprehensive multimodal SSL integration and practical 3D applications.

Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project.
  • Tutorial to Create a Data Science Agent: A Code Implementation using gemini-2.0-flash-lite model through Google API, google.generativeai, Pandas and IPython.display for Interactive Data Analysis
    www.marktechpost.com
In this tutorial, we demonstrate the integration of Python's robust data manipulation library Pandas with Google Cloud's advanced generative capabilities through the google.generativeai package and the gemini-2.0-flash-lite model. By setting up the environment with the necessary libraries, configuring the Google Cloud API key, and leveraging the IPython display functionalities, the code provides a step-by-step approach to building a data science agent that analyzes a sample sales dataset. The example shows how to convert a DataFrame into markdown format and then use natural language queries to generate insights about the data, highlighting the potential of combining traditional data analysis tools with modern AI-driven methods.

!pip install pandas google-generativeai --quiet

First, we install the Pandas and google-generativeai libraries quietly, setting up the environment for data manipulation and AI-powered analysis.

import pandas as pd
import google.generativeai as genai
from IPython.display import Markdown

We import Pandas for data manipulation, google.generativeai for accessing Google's generative AI capabilities, and Markdown from IPython.display to render markdown-formatted outputs.

GOOGLE_API_KEY = "Use Your API Key Here"
genai.configure(api_key=GOOGLE_API_KEY)
model = genai.GenerativeModel('gemini-2.0-flash-lite')

We assign a placeholder API key, configure the google.generativeai client with it, and initialize the gemini-2.0-flash-lite GenerativeModel for generating content.

data = {'Product': ['Laptop', 'Mouse', 'Keyboard', 'Monitor', 'Webcam', 'Headphones'],
        'Category': ['Electronics', 'Electronics', 'Electronics', 'Electronics', 'Electronics', 'Electronics'],
        'Region': ['North', 'South', 'East', 'West', 'North', 'South'],
        'Units Sold': [150, 200, 180, 120, 90, 250],
        'Price': [1200, 25, 75, 300, 50, 100]}
sales_df = pd.DataFrame(data)
print("Sample Sales Data:")
print(sales_df)
print("-" * 30)

Here, we create a Pandas DataFrame named sales_df containing sample sales data for various products, then print the DataFrame followed by a separator line to visually distinguish the output.

def ask_gemini_about_data(dataframe, query):
    """
    Asks the Gemini model a question about the given Pandas DataFrame.

    Args:
        dataframe: The Pandas DataFrame to analyze.
        query: The natural language question about the DataFrame.

    Returns:
        The response from the Gemini model as a string.
    """
    prompt = f"""You are a data analysis agent. Analyze the following pandas DataFrame and answer the question.

DataFrame:
```
{dataframe.to_markdown(index=False)}
```

Question: {query}

Answer:
"""
    response = model.generate_content(prompt)
    return response.text

Here, we construct a markdown-formatted prompt from a Pandas DataFrame and a natural language query, then use the Gemini model to generate and return an analytical response.

# Query 1: What is the total number of units sold across all products?
query1 = "What is the total number of units sold across all products?"
response1 = ask_gemini_about_data(sales_df, query1)
print(f"Question 1: {query1}")
print(f"Answer 1:\n{response1}")
print("-" * 30)

# Query 2: Which product had the highest number of units sold?
query2 = "Which product had the highest number of units sold?"
response2 = ask_gemini_about_data(sales_df, query2)
print(f"Question 2: {query2}")
print(f"Answer 2:\n{response2}")
print("-" * 30)

# Query 3: What is the average price of the products?
query3 = "What is the average price of the products?"
response3 = ask_gemini_about_data(sales_df, query3)
print(f"Question 3: {query3}")
print(f"Answer 3:\n{response3}")
print("-" * 30)

# Query 4: Show me the products sold in the 'North' region.
query4 = "Show me the products sold in the 'North' region."
response4 = ask_gemini_about_data(sales_df, query4)
print(f"Question 4: {query4}")
print(f"Answer 4:\n{response4}")
print("-" * 30)

# Query 5. More complex query: Calculate the total revenue for each product.
query5 = "Calculate the total revenue (Units Sold * Price) for each product and present it in a table."
response5 = ask_gemini_about_data(sales_df, query5)
print(f"Question 5: {query5}")
print(f"Answer 5:\n{response5}")
print("-" * 30)

In conclusion, the tutorial successfully illustrates how the synergy between Pandas, the google.generativeai package, and the gemini-2.0-flash-lite model can transform data analysis tasks into a more interactive and insightful process. The approach simplifies querying and interpreting data and opens up avenues for advanced use cases such as data cleaning, feature engineering, and exploratory data analysis. By harnessing these state-of-the-art tools within the familiar Python ecosystem, data scientists can enhance their productivity and innovation, making it easier to derive meaningful insights from complex datasets.

Here is the Colab Notebook.
  • Google AI Released TxGemma: A Series of 2B, 9B, and 27B LLM for Multiple Therapeutic Tasks for Drug Development Fine-Tunable with Transformers
    www.marktechpost.com
Developing therapeutics continues to be an inherently costly and challenging endeavor, characterized by high failure rates and prolonged development timelines. The traditional drug discovery process necessitates extensive experimental validation from initial target identification to late-stage clinical trials, consuming substantial resources and time. Computational methodologies, particularly machine learning and predictive modeling, have emerged as pivotal tools to streamline this process. However, existing computational models are typically highly specialized, limiting their effectiveness in addressing diverse therapeutic tasks and offering limited interactive reasoning capabilities required for scientific inquiry and analysis.

To address these limitations, Google AI has introduced TxGemma, a collection of generalist large language models (LLMs) designed explicitly to facilitate various therapeutic tasks in drug development. TxGemma distinguishes itself by integrating diverse datasets, encompassing small molecules, proteins, nucleic acids, diseases, and cell lines, which allows it to span multiple stages within the therapeutic development pipeline. TxGemma models, available with 2 billion (2B), 9 billion (9B), and 27 billion (27B) parameters, are fine-tuned from the Gemma-2 architecture using comprehensive therapeutic datasets. Additionally, the suite includes TxGemma-Chat, an interactive conversational variant that enables scientists to engage in detailed discussions and mechanistic interpretations of predictive outcomes, fostering transparency in model utilization.

From a technical standpoint, TxGemma capitalizes on the extensive Therapeutic Data Commons (TDC), a curated collection containing over 15 million datapoints across 66 therapeutically relevant datasets. TxGemma-Predict, the predictive variant of the model suite, demonstrates significant performance across these datasets, matching or exceeding the performance of both generalist and specialist models currently employed in therapeutic modeling. Notably, the fine-tuning approach employed in TxGemma optimizes predictive accuracy with substantially fewer training samples, providing a crucial advantage in domains where data scarcity is prevalent. Further extending its capabilities, Agentic-Tx, powered by Gemini 2.0, dynamically orchestrates complex therapeutic queries by combining predictive insights from TxGemma-Predict and interactive discussions from TxGemma-Chat with external domain-specific tools.

Empirical evaluations underscore TxGemma's capabilities. Across 66 tasks curated by the TDC, TxGemma-Predict consistently achieved performance comparable to or exceeding existing state-of-the-art models. Specifically, TxGemma's predictive models surpassed state-of-the-art generalist models in 45 tasks and specialized models in 26 tasks, with notable efficiency in clinical trial adverse event predictions. On challenging benchmarks such as ChemBench and Humanity's Last Exam, Agentic-Tx demonstrated clear advantages over previous leading models, enhancing accuracy by approximately 5.6% and 17.9%, respectively. Moreover, the conversational capabilities embedded in TxGemma-Chat provided essential interactive reasoning to support in-depth scientific analyses and discussions.

TxGemma's practical utility is particularly evident in adverse event prediction during clinical trials, an essential aspect of therapeutic safety evaluation.
TxGemma-27B-Predict demonstrated robust predictive performance while utilizing significantly fewer training samples than conventional models, illustrating enhanced data efficiency and reliability. Moreover, computational performance assessments indicate that the inference speed of TxGemma supports practical real-time applications, such as virtual screening, with the largest variant (27B parameters) capable of efficiently processing large sample volumes daily when deployed on scalable infrastructure.

In summary, the introduction of TxGemma by Google AI represents a methodical advancement in computational therapeutic research, combining predictive efficacy, interactive reasoning, and improved data efficiency. By making TxGemma publicly accessible, Google enables further validation and adaptation on diverse, proprietary datasets, thereby promoting broader applicability and reproducibility in therapeutic research. With sophisticated conversational functionality via TxGemma-Chat and complex workflow integration through Agentic-Tx, the suite provides researchers with advanced computational tools capable of significantly enhancing decision-making processes in therapeutic development.

Check out the Paper and Models on Hugging Face. All credit for this research goes to the researchers of this project.
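For readers who want to experiment with the released checkpoints, here is a hypothetical usage sketch with Hugging Face transformers. The model id and prompt format are assumptions; consult the official model cards for the exact identifiers and prompt templates, and note that device_map="auto" requires the accelerate package.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/txgemma-2b-predict"  # assumed id; verify on Hugging Face
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# A TDC-style property-prediction prompt (illustrative only).
prompt = ("Instructions: Answer the following question about drug properties.\n"
          "Question: Is the molecule CC(=O)Oc1ccccc1C(=O)O toxic? Answer Yes or No.")
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=8)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))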
  • A Code Implementation of Monocular Depth Estimation Using Intel MiDaS Open Source Model on Google Colab with PyTorch and OpenCV
    www.marktechpost.com
Monocular depth estimation involves predicting scene depth from a single RGB image, a fundamental task in computer vision with wide-ranging applications, including augmented reality, robotics, and 3D scene understanding. In this tutorial, we implement Intel's MiDaS (Monocular Depth Estimation via a Multi-Scale Vision Transformer), a state-of-the-art model designed for high-quality depth prediction from a single image. Leveraging Google Colab as the compute platform, along with PyTorch, OpenCV, and Matplotlib, this tutorial enables you to upload your image and visualize the corresponding depth maps easily.

!pip install -q timm opencv-python matplotlib

First, we install the necessary Python libraries: timm for model support, opencv-python for image processing, and matplotlib for visualizing the depth maps.

!git clone https://github.com/isl-org/MiDaS.git
%cd MiDaS

Then, we clone the official Intel MiDaS repository from GitHub and navigate into its directory to access the model code and transformation utilities.

import torch
import cv2
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image
from torchvision.transforms import Compose
from google.colab import files
from midas.dpt_depth import DPTDepthModel
from midas.transforms import Resize, NormalizeImage, PrepareForNet

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

We import all the necessary libraries and MiDaS components required for loading the model, preprocessing images, handling uploads, and visualizing depth predictions. Then we set the computation device to GPU (CUDA) if available; otherwise, it defaults to CPU, ensuring system compatibility.

model_path = torch.hub.load("intel-isl/MiDaS", "DPT_Large", pretrained=True, force_reload=True)
model = model_path.to(device)
model.eval()

Here, we download the pretrained MiDaS DPT_Large model from Intel's torch.hub, move it to the selected device (CPU or GPU), and set it to evaluation mode for inference.

transform = Compose([
    Resize(384, 384, resize_target=None, keep_aspect_ratio=True, ensure_multiple_of=32, resize_method="upper_bound"),
    NormalizeImage(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    PrepareForNet()
])

We define MiDaS's image preprocessing pipeline, which resizes the input image, normalizes its pixel values, and formats it appropriately for model inference.

uploaded = files.upload()
for filename in uploaded:
    img = cv2.imread(filename)
    # Convert BGR to RGB and scale to [0, 1], as NormalizeImage expects float input.
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB) / 255.0
    break

We allow the user to upload an image in Colab, read it using OpenCV, convert it from BGR to RGB format for accurate color representation, and scale it to the [0, 1] range that the normalization step expects.

img_input = transform({"image": img})["image"]
input_tensor = torch.from_numpy(img_input).unsqueeze(0).to(device)
with torch.no_grad():
    prediction = model(input_tensor)
    prediction = torch.nn.functional.interpolate(
        prediction.unsqueeze(1),
        size=img.shape[:2],
        mode="bicubic",
        align_corners=False,
    ).squeeze()
depth_map = prediction.cpu().numpy()

Now, we apply the preprocessing transform to the uploaded image, convert it to a tensor, perform depth prediction using the MiDaS model, resize the output to match the original image dimensions, and extract the final depth map as a NumPy array.

plt.figure(figsize=(10, 5))
plt.subplot(1, 2, 1)
plt.imshow(img)
plt.title("Original Image")
plt.axis("off")
plt.subplot(1, 2, 2)
plt.imshow(depth_map, cmap='inferno')
plt.title("Depth Map")
plt.axis("off")
plt.tight_layout()
plt.show()

Finally, we create a side-by-side visualization of the original image and its corresponding depth map using Matplotlib.
The depth map is displayed using the inferno colormap for better contrast.

In conclusion, by completing this tutorial, we've successfully deployed Intel's MiDaS model on Google Colab to perform monocular depth estimation using just an RGB image. Using PyTorch for model inference, OpenCV for image processing, and Matplotlib for visualization, we've built a robust pipeline to generate high-quality depth maps with minimal setup. This implementation is a strong foundation for further exploration, including video depth estimation, real-time applications, and integration into AR/VR systems.

Here is the Colab Notebook.
  • Meet Open Deep Search (ODS): A Plug-and-Play Framework Democratizing Search with Open-source Reasoning Agents
    www.marktechpost.com
The rapid advancements in search engine technologies integrated with large language models (LLMs) have predominantly favored proprietary solutions such as OpenAI's GPT-4o Search Preview and Perplexity's Sonar Reasoning Pro. While these proprietary systems offer strong performance, their closed-source nature poses significant challenges, particularly concerning transparency, innovation, and community collaboration. This exclusivity limits customization and hampers broader academic and entrepreneurial engagement with search-enhanced AI.

In response to these limitations, researchers from the University of Washington, Princeton University, and UC Berkeley have introduced Open Deep Search (ODS), an open-source search AI framework designed for seamless integration with any user-selected LLM in a modular manner. ODS comprises two central components: the Open Search Tool and the Open Reasoning Agent. Together, these components substantially improve the capabilities of the base LLM by enhancing content retrieval and reasoning accuracy.

The Open Search Tool distinguishes itself through an advanced retrieval pipeline, featuring an intelligent query rephrasing method that better captures user intent by generating multiple semantically related queries. This approach notably improves the accuracy and diversity of search results. Furthermore, the tool employs refined chunking and re-ranking techniques to systematically filter search results according to relevance. Complementing the retrieval component, the Open Reasoning Agent operates through two distinct methodologies: the Chain-of-thought ReAct agent and the Chain-of-code CodeAct agent. These agents interpret user queries, manage tool usage, including searches and calculations, and produce comprehensive, contextually accurate responses.
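The sketch below illustrates the shape of such a pipeline: rephrase the query into several variants, gather and chunk results, and re-rank chunks by relevance. The rephrasing, search, and scoring functions are toy stand-ins (assumptions); ODS delegates these steps to an LLM, a real search API, and a semantic re-ranker.

from collections import Counter

def rephrase(query: str) -> list[str]:
    # Stand-in for an LLM call that emits semantically related queries.
    return [query, f"{query} explained", f"background on {query}"]

def search(query: str) -> list[str]:
    # Stand-in for a web search API; returns canned documents.
    return [
        "Open-source search agents combine retrieval with reasoning. "
        "Query rephrasing broadens recall; re-ranking restores precision."
    ]

def chunk(doc: str, size: int = 8) -> list[str]:
    words = doc.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def score(query: str, passage: str) -> float:
    # Toy lexical-overlap score; a real system would use a semantic re-ranker.
    q, p = Counter(query.lower().split()), Counter(passage.lower().split())
    return float(sum((q & p).values()))

def retrieve(query: str, top_k: int = 3) -> list[str]:
    chunks = [c for q in rephrase(query) for doc in search(q) for c in chunk(doc)]
    return sorted(set(chunks), key=lambda c: score(query, c), reverse=True)[:top_k]

print(retrieve("open-source search agents"))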
Empirical evaluations underscore the effectiveness of ODS. Integrated with DeepSeek-R1, an advanced open-source reasoning model, ODS-v2 achieves 88.3% accuracy on the SimpleQA benchmark and 75.3% on the FRAMES benchmark. This performance notably surpasses proprietary alternatives such as Perplexity's Sonar Reasoning Pro, which scores 85.8% and 44.4% on these benchmarks, respectively. Compared with OpenAI's GPT-4o Search Preview, ODS-v2 shows a significant advantage on the FRAMES benchmark, achieving 9.7% higher accuracy. These results illustrate ODS's capacity to deliver competitive, and in specific areas superior, performance relative to proprietary systems.

An important feature of ODS is its adaptive use of tools, as demonstrated by strategic decision-making about additional web searches. For straightforward queries, as observed in SimpleQA, ODS minimizes additional searches, demonstrating efficient resource utilization. Conversely, for complex multi-hop queries, as in the FRAMES benchmark, ODS appropriately increases its use of web searches, exemplifying intelligent resource management tailored to query complexity.

In conclusion, Open Deep Search represents a notable advancement towards democratizing search-enhanced AI by providing an open-source framework compatible with diverse LLMs. It encourages innovation and transparency within the AI research community and supports broader participation in the development of sophisticated search and reasoning capabilities. By effectively integrating advanced retrieval techniques with adaptive reasoning methodologies, ODS contributes meaningfully to open-source AI development, setting a robust standard for future exploration in search-integrated large language models.

Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project.
  • Beginner's Guide to Deploying a Machine Learning API with FastAPI
    www.marktechpost.com
In this guide, you will learn how to deploy a machine learning model as an API using FastAPI. We will create an API that predicts the species of a penguin based on its bill length and flipper length.

Prerequisites
- Basic knowledge of Python
- Python installed on your system (preferably version 3.7 or higher)
- Familiarity with machine learning concepts (optional)

Step 1: Set Up Your Environment
Create a project directory: open your terminal and create a new directory for your project. Set up and activate a virtual environment (on Windows, use venv\Scripts\activate). Then install the required packages.

Step 2: Prepare Your Machine Learning Model
Download the dataset here, then create a Python script for the model.

Step 3: Create the FastAPI Application
Create the main application file (a hedged end-to-end sketch of what this file might contain appears after this guide).

Step 4: Run Your FastAPI Application
Run the application and access the API.

Step 5: Test Your API
Use the Swagger UI to try out your endpoint.

Conclusion
Congratulations! You have successfully deployed a machine learning API using FastAPI. This guide covered:
- Setting up your environment.
- Preparing a machine learning model.
- Creating a FastAPI application.
- Running and testing your API.

Next Steps
- Explore more advanced features of FastAPI, like authentication and database integration.
- Experiment with different machine learning models and datasets.
- Consider containerizing your application using Docker for easier deployment.

Feel free to reach out if you have any questions or need further assistance!
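Since the guide's original code blocks were lost in extraction, here is a hedged end-to-end sketch of what main.py might look like. It assumes scikit-learn for the classifier and a penguins.csv file with bill_length_mm, flipper_length_mm, and species columns; adjust the names to your dataset.

import pandas as pd
from fastapi import FastAPI
from pydantic import BaseModel
from sklearn.linear_model import LogisticRegression

# Train a small model at startup (assumed dataset path and column names).
df = pd.read_csv("penguins.csv").dropna(subset=["bill_length_mm", "flipper_length_mm", "species"])
clf = LogisticRegression(max_iter=1000)
clf.fit(df[["bill_length_mm", "flipper_length_mm"]], df["species"])

app = FastAPI()

class PenguinFeatures(BaseModel):
    bill_length_mm: float
    flipper_length_mm: float

@app.post("/predict")
def predict(features: PenguinFeatures):
    X = [[features.bill_length_mm, features.flipper_length_mm]]
    return {"species": str(clf.predict(X)[0])}

# Run with: uvicorn main:app --reload
# Then open http://127.0.0.1:8000/docs for the Swagger UI.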
  • This AI Paper Introduces the Kolmogorov-Test: A Compression-as-Intelligence Benchmark for Evaluating Code-Generating Language Models
    www.marktechpost.com
Compression is a cornerstone of computational intelligence, deeply rooted in the theory of Kolmogorov complexity, which defines the minimal program needed to reproduce a given sequence. Unlike traditional compression methods that look for repetition and redundancy, Kolmogorov's framework interprets compression as a problem of discovering structured patterns through programmatic representation. While the theory promises optimal compression, its uncomputability poses a significant hurdle. Nevertheless, the emergence of large language models capable of code generation opens an intriguing opportunity to test how closely modern systems can approximate this theoretical ideal by reasoning through code rather than pattern matching.

A core issue arises from the limitations of current tools in compressing data sequences using concise, executable code. Models often replicate inputs rather than generate programs that reproduce them, indicating a gap in true pattern understanding. This becomes especially evident when dealing with real-world audio, text, or DNA sequences, where complex logical structures must be uncovered to achieve efficient compression. The main challenge is ensuring the model reproduces the sequence using a minimal and rational set of instructions. Furthermore, though synthetic training data is useful for controlled evaluation, it often fails to support robust generalization to natural data, which is essential for practical applications.

Several compression tools exist, ranging from traditional algorithms like GZIP to newer neural compression systems. GZIP remains a strong baseline, especially for long or repetitive sequences, due to its effective encoding of statistical regularities. More recently, language modeling approaches have been integrated with arithmetic coding, using prediction probabilities to compress input data. However, these methods typically require access to the full model weights at decoding time, limiting their efficiency and applicability. Prompted code-generating models like GPT-4 and LLaMA have also been evaluated in zero-shot settings to generate Python programs that reproduce input sequences. Yet, they frequently produce lengthy, imprecise code with limited success, particularly when faced with unseen or complex sequences.

Researchers from Meta AI and Tel Aviv University introduced the Kolmogorov-Test (KT), a benchmark for assessing the reasoning capability of code-generating language models. The test evaluates a model's ability to generate the shortest program that outputs a given input sequence. Unlike typical benchmarks, KT emphasizes logical composition and program generation over predictive text modeling. Sequences include natural data from audio (LibriSpeech), text (Wikipedia enwik9), and DNA (GRCh38), as well as synthetic sequences generated through a custom-designed domain-specific language (DSL). This DSL supports building structured sequences by composing operations like range creation, sequence modification, merging, and filtering.

The researchers developed an automated framework to generate millions of synthetic program-sequence pairs using this DSL. These programs then train and evaluate models, including large pre-trained ones and specifically trained ones like SEQCODER. To measure performance, the team employed metrics such as accuracy (whether the generated program reproduces the sequence) and precision (how concise the correct program is compared to GZIP compression).
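As a rough illustration of these metrics, the sketch below treats a "program" as a Python snippet whose execution must reproduce the target byte sequence (accuracy) and compares its length to a GZIP baseline (a precision-style length ratio). The exact scoring in the paper may differ; this is illustrative only.

import gzip
import subprocess
import sys

sequence = bytes(range(10)) * 8   # toy target: a repetitive 80-byte sequence
program = "import sys; sys.stdout.buffer.write(bytes(range(10)) * 8)"

def reproduces(program: str, target: bytes) -> bool:
    # Accuracy: does executing the program reproduce the target exactly?
    result = subprocess.run([sys.executable, "-c", program], capture_output=True)
    return result.stdout == target

gzip_len = len(gzip.compress(sequence))
ratio = len(program.encode()) / gzip_len   # below 1 means shorter than GZIP

print(f"accuracy={reproduces(program, sequence)}, "
      f"|program|={len(program)}, |gzip|={gzip_len}, ratio={ratio:.2f}")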
The test involved compressing sequences of varying lengths, with synthetic sequences averaging 76 bytes and real sequences capped at 128.

Results showed that even the most powerful models struggled. GPT-4 achieved 69.5% accuracy on high-quality audio but dropped to 36.4% for 8-bit audio and 50.3% for DNA data. LLaMA-3.1-405B performed worse, with accuracies as low as 3.9% for audio and only 24.8% for DNA. On synthetic data, SEQCODER-8B reached 92.5% accuracy with a precision score of 0.56, outperforming traditional tools like GZIP. However, its accuracy on real-world data remained near zero. This discrepancy illustrates the difficulty of transferring success from synthetic benchmarks to more varied and noisy real-world sequences, highlighting the limitations of current training regimes and prompting the need for new strategies.

Overall, this research clearly outlines the complexity of compression via code generation. The KT benchmark provides a rigorous and diverse test of model reasoning and structure recognition, exposing the stark divide between synthetic learning environments and real-world applications. The introduced methodology and test set a high bar for future models aiming to unify reasoning with compression, but significant innovation is still required to meet this challenge.

Check out the Paper. All credit for this research goes to the researchers of this project.
  • Google DeepMind Researchers Propose CaMeL: A Robust Defense that Creates a Protective System Layer around the LLM, Securing It even when Underlying Models may be Susceptible to Attacks
    www.marktechpost.com
Large Language Models (LLMs) are becoming integral to modern technology, driving agentic systems that interact dynamically with external environments. Despite their impressive capabilities, LLMs are highly vulnerable to prompt injection attacks. These attacks occur when adversaries inject malicious instructions through untrusted data sources, aiming to compromise the system by extracting sensitive data or executing harmful operations. Traditional security methods, such as model training and prompt engineering, have shown limited effectiveness, underscoring the urgent need for robust defenses.

Google DeepMind researchers propose CaMeL, a robust defense that creates a protective system layer around the LLM, securing it even when underlying models may be susceptible to attacks. Unlike traditional approaches that require retraining or model modifications, CaMeL introduces a new paradigm inspired by proven software security practices. It explicitly extracts control and data flows from user queries, ensuring untrusted inputs never alter program logic directly. This design isolates potentially harmful data, preventing it from influencing the decision-making processes inherent to LLM agents.

Technically, CaMeL employs a dual-model architecture: a Privileged LLM and a Quarantined LLM. The Privileged LLM orchestrates the overall task, isolating sensitive operations from potentially harmful data. The Quarantined LLM processes data separately and is explicitly stripped of tool-calling capabilities to limit potential damage. CaMeL further strengthens security by assigning metadata, or capabilities, to each data value, defining strict policies about how each piece of information can be utilized. A custom Python interpreter enforces these fine-grained security policies, monitoring data provenance and ensuring compliance through explicit control-flow constraints.
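To illustrate the capability idea, here is a toy sketch in the spirit of CaMeL: values carry provenance tags, derived values inherit them, and a policy check blocks sensitive operations on data derived from untrusted sources. The names and policy logic are assumptions for illustration; CaMeL's actual interpreter and policy language are far more elaborate.

from dataclasses import dataclass

@dataclass(frozen=True)
class Tagged:
    value: str
    sources: frozenset  # provenance capabilities: where this value came from

def combine(a: Tagged, b: Tagged) -> Tagged:
    # Derived values inherit the union of their inputs' provenance.
    return Tagged(a.value + b.value, a.sources | b.sources)

def send_email(body: Tagged, recipient: Tagged) -> None:
    # Policy: a recipient derived from untrusted tool output is blocked.
    if "untrusted_email" in recipient.sources:
        raise PermissionError("recipient derived from untrusted data; blocked")
    print(f"sent '{body.value}' to {recipient.value}")

user_addr = Tagged("alice@example.com", frozenset({"user"}))
injected = Tagged("attacker@evil.com", frozenset({"untrusted_email"}))

send_email(Tagged("hi", frozenset({"user"})), user_addr)   # allowed
try:
    send_email(Tagged("hi", frozenset({"user"})), injected)  # blocked
except PermissionError as e:
    print(e)

note = combine(Tagged("fwd: ", frozenset({"user"})), injected)
print(note.sources)  # provenance propagates through derived values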
Results from an empirical evaluation using the AgentDojo benchmark highlight CaMeL's effectiveness. In controlled tests, CaMeL successfully thwarted prompt injection attacks by enforcing security policies at granular levels. The system maintained functionality, solving 67% of tasks securely within the AgentDojo framework. Compared to other defenses like Prompt Sandwiching and Spotlighting, CaMeL performed significantly better on security, providing near-total protection against attacks while incurring moderate overhead. The overhead primarily manifests in token usage, with approximately a 2.82× increase in input tokens and a 2.73× increase in output tokens, acceptable considering the security guarantees provided.

Moreover, CaMeL addresses subtle vulnerabilities, such as data-to-control-flow manipulations, by strictly managing dependencies through its metadata-based policies. For instance, a scenario where an adversary attempts to leverage benign-looking instructions from email data to control the system's execution flow would be mitigated effectively by CaMeL's rigorous data tagging and policy enforcement mechanisms. This comprehensive protection is essential, given that conventional methods might fail to recognize such indirect manipulation threats.

In conclusion, CaMeL represents a significant advancement in securing LLM-driven agentic systems. Its ability to robustly enforce security policies without altering the underlying LLM offers a powerful and flexible approach to defending against prompt injection attacks. By adopting principles from traditional software security, CaMeL not only mitigates explicit prompt injection risks but also safeguards against sophisticated attacks leveraging indirect data manipulation. As LLM integration expands into sensitive applications, adopting CaMeL could be vital in maintaining user trust and ensuring secure interactions within complex digital ecosystems.

Check out the Paper. All credit for this research goes to the researchers of this project.
  • This AI Paper Introduces PLAN-AND-ACT: A Modular Framework for Long-Horizon Planning in Web-Based Language Agents
    www.marktechpost.com
Large language models are powering a new wave of digital agents that handle sophisticated web-based tasks. These agents are expected to interpret user instructions, navigate interfaces, and execute complex commands in ever-changing environments. The difficulty lies not in understanding language but in translating that understanding into precise, sequenced actions while adapting to dynamic contexts. Success on long-horizon tasks like booking travel or retrieving specific web data depends on managing a sequence of steps that evolves with each action. Despite major progress in language capabilities, creating agents that can effectively plan and adapt at each step remains an unsolved problem.

Decomposing broad goals into actionable steps is a major issue in building such agents. When a user requests "follow the top contributor of this GitHub project", the agent must interpret the command and determine how to navigate to the contributors section, identify the relevant person, and initiate the follow action. This task becomes even more complex in dynamic environments where content may shift between executions. Without a clear planning and updating strategy, agents can make inconsistent decisions or fail entirely. The scarcity of training data showing how to plan and execute long tasks correctly adds another layer of difficulty.

Previously, researchers attempted to address these issues with models that either relied on single-agent strategies or applied reinforcement learning to guide actions. Single-agent systems like ReAct attempted to merge reasoning and execution but often faltered as the model was overwhelmed by thinking and acting at once. Reinforcement learning approaches showed promise but proved unstable and highly sensitive to environment-specific tuning. Collecting training data for these methods required extensive interaction with environments, making it time-consuming and impractical to scale. These methods also struggled to maintain consistent performance when tasks changed mid-process.

Researchers from UC Berkeley, the University of Tokyo, and ICSI introduced a new PLAN-AND-ACT system, with support from companies including Apple, Nvidia, Microsoft, and Intel. This framework splits task planning and execution into two modules: a PLANNER and an EXECUTOR. The PLANNER is tasked with creating a structured plan based on the user's request, essentially outlining what steps need to be taken. The EXECUTOR then translates each step into environment-specific actions. By separating these responsibilities, the system allows the PLANNER to focus on strategy while the EXECUTOR handles execution, improving the reliability of both components. This modular design marks a significant shift from previous approaches.
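To make the division of labor concrete, here is a schematic, hypothetical sketch of the PLANNER/EXECUTOR loop. Both functions are stand-ins for LLM calls, and the action format is invented for illustration; it is not the paper's actual interface.

def planner(goal: str) -> list[str]:
    # Stand-in for the PLANNER LLM: decompose the goal into high-level steps.
    return [
        "Open the project's Contributors page",
        "Identify the top contributor",
        "Open that contributor's profile and click Follow",
    ]

def executor(step: str, page_state: str) -> str:
    # Stand-in for the EXECUTOR LLM: ground one step in a concrete action.
    return f"CLICK(element_matching='{step}') on page '{page_state}'"

def run(goal: str) -> None:
    page_state = "github.com/example/project"
    for step in planner(goal):
        print(executor(step, page_state))
        # In the real system, the environment returns a new page state here,
        # and dynamic replanning may revise the remaining steps.
        page_state = f"after:{step}"

run("Follow the top contributor of this GitHub project")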
The methodology behind PLAN-AND-ACT focuses heavily on scalable training. Since human-annotated planning data is limited, the researchers introduced a synthetic data generation pipeline. They began by collecting action trajectories from simulated agents: sequences of clicks, inputs, and responses. Large language models then analyzed these trajectories to reconstruct high-level plans grounded in actual outcomes. For example, a plan might specify identifying the top contributor, while the actions linked to it include clicking the Contributors tab and parsing the resulting HTML. The team expanded their dataset with 10,000 additional synthetic plans and then generated 5,000 more targeted plans based on failure analysis. This synthetic training method saved time and produced high-quality data that reflected real execution needs.

In testing, PLAN-AND-ACT achieved a task success rate of 53.94% on the WebArena-Lite benchmark, surpassing the previous best result of 49.1% from WebRL. Without any planner, a base executor achieved only 9.85%. Adding a non-finetuned planner boosted performance to 29.63%, while finetuning on 10,000 synthetic plans brought results up to 44.24%. Incorporating dynamic replanning added a final 10.31% performance gain. Across all experiments, the data showed that most performance improvements came from enhancing the PLANNER rather than the EXECUTOR. Even with a base EXECUTOR, having a strong PLANNER led to substantial success-rate increases, validating the researchers' hypothesis that separating planning and execution yields better task outcomes.

In conclusion, this paper highlights how closing the gap between goal understanding and environment interaction can lead to more effective AI systems. By focusing on structured planning and scalable data generation, the researchers proposed a method that solves a specific problem and demonstrates a framework that can extend to broader applications. PLAN-AND-ACT shows that effective planning, not just execution, is critical to AI agent success in complex environments.

Check out the Paper. All credit for this research goes to the researchers of this project.
  • DeepSeek AI Unveils DeepSeek-V3-0324: Blazing Fast Performance on Mac Studio, Heating Up the Competition with OpenAI
    www.marktechpost.com
Artificial intelligence (AI) has made significant strides in recent years, yet challenges persist in achieving efficient, cost-effective, and high-performance models. Developing large language models (LLMs) often requires substantial computational resources and financial investment, which can be prohibitive for many organizations. Additionally, ensuring that these models possess strong reasoning capabilities and can be deployed effectively on consumer-grade hardware remains a hurdle.

DeepSeek AI has addressed these challenges head-on with the release of DeepSeek-V3-0324, a significant upgrade to its V3 large language model. This new model not only enhances performance but also operates at an impressive speed of 20 tokens per second on a Mac Studio, a consumer-grade device. This advancement intensifies the competition with industry leaders like OpenAI, showcasing DeepSeek's commitment to making high-quality AI models more accessible and efficient.

DeepSeek-V3-0324 introduces several technical improvements over its predecessor. Notably, it demonstrates significant enhancements in reasoning capabilities, with benchmark scores showing substantial increases:

- MMLU-Pro: 75.9 → 81.2 (+5.3)
- GPQA: 59.1 → 68.4 (+9.3)
- AIME: 39.6 → 59.4 (+19.8)
- LiveCodeBench: 39.2 → 49.2 (+10.0)

These improvements indicate a more robust understanding and processing of complex tasks. Additionally, the model has enhanced front-end web development skills, producing more executable code and aesthetically pleasing web pages and game interfaces. Its Chinese writing proficiency has also advanced, aligning with the R1 writing style and improving the quality of medium-to-long-form content. Furthermore, function-calling accuracy has increased, addressing issues present in previous versions.

The release of DeepSeek-V3-0324 under the MIT License underscores DeepSeek AI's dedication to open-source collaboration, allowing developers worldwide to utilize and build upon this technology without restrictive licensing constraints. The model's ability to run efficiently on devices like the Mac Studio, achieving 20 tokens per second, exemplifies its practical applicability and efficiency. This performance level not only makes advanced AI more accessible but also reduces the dependency on expensive, specialized hardware, thereby lowering the barrier to entry for many users and organizations.

In conclusion, DeepSeek AI's release of DeepSeek-V3-0324 marks a significant milestone in the AI landscape. By addressing key challenges related to performance, cost, and accessibility, DeepSeek has positioned itself as a formidable competitor to established entities like OpenAI. The model's technical advancements and open-source availability promise to democratize AI technology further, fostering innovation and broader adoption across various sectors.

Check out the Model on Hugging Face. All credit for this research goes to the researchers of this project.
  • Understanding and Mitigating Failure Modes in LLM-Based Multi-Agent Systems
    www.marktechpost.com
    Despite the growing interest in Multi-Agent Systems (MAS), where multiple LLM-based agents collaborate on complex tasks, their performance gains remain limited compared to single-agent frameworks. While MASs are being explored in software engineering, drug discovery, and scientific simulations, they often struggle with coordination inefficiencies, leading to high failure rates. These failures reveal key challenges, including task misalignment, reasoning-action mismatches, and ineffective verification mechanisms. Empirical evaluations show that even state-of-the-art open-source MASs, such as ChatDev, can exhibit low success rates, raising questions about their reliability. Unlike single-agent frameworks, MASs must contend with inter-agent misalignment, conversation resets, and incomplete task verification, all of which significantly affect their effectiveness. Additionally, simpler techniques such as best-of-N sampling often outperform MASs, emphasizing the need for a deeper understanding of their limitations.
    Existing research has tackled specific challenges in agentic systems, such as improving workflow memory, enhancing state control, and refining communication flows, but these approaches do not offer a holistic strategy for improving MAS reliability across domains. While various benchmarks assess agentic systems on performance, security, and trustworthiness, there is no consensus on how to build robust MASs. Prior studies highlight the risks of overcomplicating agentic frameworks and stress the importance of modular design, yet systematic investigations into MAS failure modes remain scarce. This work contributes a structured taxonomy of MAS failures and suggests design principles to enhance their reliability, paving the way for more effective multi-agent LLM systems.
    Researchers from UC Berkeley and Intesa Sanpaolo present the first comprehensive study of MAS challenges, analyzing five frameworks across 150 tasks with expert annotators. They identify 14 failure modes, categorized into system design flaws, inter-agent misalignment, and task verification issues, forming the Multi-Agent System Failure Taxonomy (MASFT). To scale evaluation, they develop an LLM-as-a-judge pipeline that achieves high agreement with human annotators. Despite interventions such as improved agent specification and orchestration, MAS failures persist, underscoring the need for structural redesigns. Their work, including datasets and annotations, is open-sourced to guide future MAS research and development.
    Methodologically, the study follows the Grounded Theory (GT) approach: researchers analyze MAS execution traces iteratively, refining failure categories through inter-annotator agreement studies. The resulting LLM-based annotator for automated failure detection achieves 94% accuracy. Failures are classified into system design flaws, inter-agent misalignment, and inadequate task verification, and the taxonomy is validated through iterative refinement to ensure reliability. The results highlight diverse failure modes across MAS architectures, emphasizing the need for improved coordination, clearer role definitions, and robust verification mechanisms.
    Mitigation strategies fall into tactical and structural approaches. Tactical methods involve refining prompts, agent organization, and interaction management, and improving clarity and verification steps; however, their effectiveness varies.
    Structural strategies focus on system-wide improvements, such as verification mechanisms, standardized communication, reinforcement learning, and memory management. Two case studies, MathChat and ChatDev, demonstrate these approaches: MathChat refines prompts and agent roles, with inconsistent gains, while ChatDev enhances role adherence and modifies the framework topology for iterative verification. These interventions help, but significant improvements require deeper structural modifications, emphasizing the need for further research into MAS reliability.
    In conclusion, the study comprehensively analyzes failure modes in LLM-based MASs. By examining more than 150 traces, the research identifies 14 distinct failure modes grouped into three categories: specification and system design, inter-agent misalignment, and task verification and termination. An automated LLM annotator is introduced to analyze MAS traces and proves reliable against human judgments. The case studies show that simple fixes often fall short, so consistent improvements require structural strategies. Despite growing interest in MASs, their performance remains limited compared to single-agent systems, underscoring the need for deeper research into agent coordination, verification, and communication strategies.
    Check out the Paper. All credit for this research goes to the researchers of this project.
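    The authors open-source their annotation pipeline; the sketch below is only an illustration of the LLM-as-a-judge pattern they describe, not their implementation. The three category names come from MASFT, while call_llm is a hypothetical stand-in for any chat-completion client, and the rubric wording is invented for the example.
    # Illustrative LLM-as-a-judge failure annotator in the spirit of MASFT.
    # call_llm is a hypothetical chat-completion callable; the prompt text
    # is an assumption, not the authors' actual rubric.
    import json

    MASFT_CATEGORIES = [
        "specification and system design",
        "inter-agent misalignment",
        "task verification and termination",
    ]

    JUDGE_PROMPT = """You are auditing a multi-agent system execution trace.
    Classify each failure you find into exactly one of these categories:
    {categories}
    Respond with JSON: {{"failures": [{{"category": "...", "evidence": "..."}}]}}

    Trace:
    {trace}"""

    def annotate_trace(trace: str, call_llm) -> list[dict]:
        """Tag failures in one MAS execution trace with taxonomy labels."""
        prompt = JUDGE_PROMPT.format(
            categories="\n".join(f"- {c}" for c in MASFT_CATEGORIES),
            trace=trace,
        )
        raw = call_llm(prompt)  # hypothetical: returns the judge's JSON reply
        failures = json.loads(raw)["failures"]
        # Keep only labels inside the taxonomy so downstream counts stay clean.
        return [f for f in failures if f["category"] in MASFT_CATEGORIES]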
  • Google AI Released Gemini 2.5 Pro Experimental: An Advanced AI Model that Excels in Reasoning, Coding, and Multimodal Capabilities
    www.marktechpost.com
    In the evolving field of artificial intelligence, a persistent challenge has been developing models that can reason through complex problems, generate accurate code, and process multiple forms of data. Traditional AI systems often excel at specific tasks but struggle to generalize across diverse domains, limiting their practical applications and underscoring the need for more integrated, versatile solutions.
    Addressing this, Google has introduced Gemini 2.5 Pro Experimental, an advanced AI model designed to enhance reasoning, coding, and multimodal capabilities. Building on its predecessors, Gemini 2.5 Pro is engineered to tackle complex challenges in fields such as coding, science, and mathematics. Its multimodal design enables it to interpret and generate text, audio, images, video, and code, broadening its applicability across sectors.
    From a technical standpoint, Gemini 2.5 Pro incorporates advanced reasoning capabilities, allowing the model to work through tasks methodically and make informed decisions. It features a substantial context window, currently supporting up to 1 million tokens, with plans to expand to 2 million. This extensive window enables the model to comprehend large datasets and address intricate problems that require synthesizing information from multiple sources. In coding applications, Gemini 2.5 Pro can create visually compelling web applications and perform code transformation and editing tasks efficiently (see https://blog.google/technology/google-deepmind/gemini-model-thinking-updates-march-2025/#advanced-coding).
    Empirical evaluations highlight Gemini 2.5 Pro's strong performance. It leads benchmarks in mathematics and science, such as GPQA and AIME 2025, reflecting robust reasoning capabilities, and it achieved a score of 18.8% on Humanity's Last Exam, a dataset designed to assess advanced knowledge and reasoning. In coding benchmarks, it scored 63.8% on SWE-Bench Verified, indicating competence in agentic code evaluations, and it topped the LMArena leaderboard by a significant margin, underscoring its capabilities across multimodal reasoning, coding, and STEM fields.
    In conclusion, Gemini 2.5 Pro Experimental represents a notable advancement in AI, reflecting Google's commitment to developing more intelligent and versatile models. By integrating reasoning capabilities directly into its architecture, it addresses previous limitations with enhanced performance and improved accuracy. Its ability to handle complex problems across coding, science, and mathematics, coupled with its multimodal proficiency, positions it as a valuable tool, and models like it pave the way for more sophisticated, context-aware applications across sectors.
    Check out the technical details and try it here. All credit for this research goes to the researchers of this project.
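    For developers who want to experiment, the sketch below shows one way to call the model through Google's public google-genai Python client. The model identifier follows the experimental release naming and may change as the model graduates, so treat both the id and the exact client usage as assumptions rather than official guidance.
    # Minimal sketch: querying Gemini 2.5 Pro Experimental via the
    # google-genai client (pip install google-genai). The model id is an
    # assumption based on the experimental release naming.
    from google import genai

    client = genai.Client(api_key="YOUR_API_KEY")  # or set GOOGLE_API_KEY

    response = client.models.generate_content(
        model="gemini-2.5-pro-exp-03-25",  # experimental model id (assumption)
        contents="Write a Python function that checks whether a number is "
                 "prime, then explain your reasoning step by step.",
    )
    print(response.text)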