Marktechpost AI
AI/ML Research and Dev News Platform (1 million+ monthly traffic) | 50k+ ML subreddit | Contact: Asif@marktechpost.com
1 person likes this
742 Posts
2 Photos
0 Videos
0 Previews
Recent Updates
  • Salesforce AI Introduce BingoGuard: An LLM-based Moderation System Designed to Predict both Binary Safety Labels and Severity Levels
    www.marktechpost.com
    Large language models (LLMs) have reshaped interactive technologies, bringing both benefits and challenges. One prominent issue is their potential to generate harmful content. Traditional moderation systems typically use binary classification (safe vs. unsafe) and lack the granularity to distinguish between levels of harmfulness. This limitation leads either to excessively restrictive moderation, which diminishes user interaction, or to inadequate filtering, which can expose users to harmful content.
    Salesforce AI introduces BingoGuard, an LLM-based moderation system that addresses the inadequacies of binary classification by predicting both binary safety labels and detailed severity levels. BingoGuard uses a structured taxonomy that sorts potentially harmful content into eleven areas, including violent crime, sexual content, profanity, privacy invasion, and weapon-related content. Each category has five defined severity levels, ranging from benign (level 0) to extreme risk (level 4). This structure lets platforms calibrate their moderation settings to their own safety guidelines and manage content appropriately across severity contexts.
    From a technical perspective, BingoGuard uses a generate-then-filter methodology to assemble its training dataset, BingoGuardTrain, which consists of 54,897 entries spanning multiple severity levels and content styles. The framework first generates responses tailored to different severity tiers, then filters these outputs for alignment with defined quality and relevance standards. Specialized LLMs are fine-tuned individually for each severity tier, using carefully selected and expertly audited seed datasets, so that generated outputs adhere closely to the predefined severity rubrics. The resulting moderation model, BingoGuard-8B, is trained on this curated dataset and can differentiate among degrees of harmful content, which improves both moderation accuracy and flexibility.
    Empirical evaluations of BingoGuard indicate strong performance. On BingoGuardTest, an expert-labeled dataset of 988 examples, BingoGuard-8B achieves higher detection accuracy than leading moderation models such as WildGuard and ShieldGemma, with improvements of up to 4.3%. Notably, BingoGuard is more accurate at identifying lower-severity content (levels 1 and 2), which is traditionally difficult for binary classification systems. In-depth analyses also uncovered a relatively weak correlation between predicted unsafe probabilities and the actual severity level, underscoring the need to model severity explicitly. These findings illustrate fundamental gaps in current moderation methods that rely primarily on binary classification.
    In conclusion, BingoGuard improves the precision of AI-driven content moderation by pairing detailed severity assessments with binary safety evaluations. This approach lets platforms moderate with greater accuracy and sensitivity, reducing the risks of both overly cautious and insufficient moderation. Salesforce's BingoGuard thus provides an improved framework for content moderation in increasingly sophisticated AI-generated interactions.
    Check out the Paper. All credit for this research goes to the researchers of this project. Also, feel free to follow us on Twitter and don't forget to join our 85k+ ML SubReddit.
    Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence Media Platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts over 2 million monthly views, illustrating its popularity among audiences.
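The severity scale described in the BingoGuard item above (level 0 to level 4 per category) lends itself to per-platform thresholds. The following is only an illustrative sketch: `ModerationResult`, `is_blocked`, and the two policy dictionaries are hypothetical names rather than BingoGuard's released interface, and the rule "block when the severity exceeds the level a platform allows" is an assumed way of using the levels.

```python
from dataclasses import dataclass

# Severity scale from the article: level 0 (benign) through level 4 (extreme risk).
MIN_LEVEL, MAX_LEVEL = 0, 4


@dataclass(frozen=True)
class ModerationResult:
    """Hypothetical container for a moderator's verdict on one response."""
    category: str  # e.g. "profanity"; category names follow the article
    severity: int  # 0..4


def is_blocked(result, policy, default_max_allowed=0):
    """Block when the severity exceeds what the platform allows for the category."""
    if not MIN_LEVEL <= result.severity <= MAX_LEVEL:
        raise ValueError(f"severity must be in {MIN_LEVEL}..{MAX_LEVEL}")
    # Categories missing from the policy fall back to the strictest setting.
    max_allowed = policy.get(result.category, default_max_allowed)
    return result.severity > max_allowed


# Hypothetical policies: the highest severity level each platform tolerates.
strict_policy = {"profanity": 0, "violent crime": 0}
lenient_policy = {"profanity": 2, "violent crime": 1}

sample = ModerationResult(category="profanity", severity=2)
print(is_blocked(sample, strict_policy))   # True
print(is_blocked(sample, lenient_policy))  # False
```

The same prediction yields different decisions under different policies, which is the calibration that a binary safe/unsafe label cannot express.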
    0 Comments · 0 Shares · 33 Views
  • Enhancing Strategic Decision-Making in Gomoku Using Large Language Models and Reinforcement Learning
    www.marktechpost.com
    LLMs have significantly advanced NLP, demonstrating strong text generation, comprehension, and reasoning capabilities. They have been applied successfully across domains including education, intelligent decision-making, and gaming. In education, LLMs serve as interactive tutors, aiding personalized learning and improving students' reading and writing skills. In decision-making, they analyze large datasets to generate insights for complex problems. In gaming, they enhance player experiences by generating dynamic content and supporting strategy development. Despite these successes, applying them to intricate tasks such as strategic gameplay in Gomoku remains challenging. Gomoku, a classic board game known for its simple rules yet deep strategic complexity, is difficult both for traditional search-based methods, which are computationally expensive, and for machine learning approaches, which often struggle with efficiency. This has led researchers to explore how LLMs can be combined with deep learning and reinforcement learning to build an AI capable of making rational strategic decisions in Gomoku.
    Research on LLM applications in gaming has taken multiple directions, including evaluating model competency in simple deterministic games like Tic-Tac-Toe and assessing strategic reasoning in more complex environments. Studies suggest that LLMs perform better in probabilistic games than in deterministic, complete-information settings, which is a problem for games like Gomoku that demand deep spatial reasoning. Theoretical work from game theory has examined LLMs' ability to engage in strategic decision-making, while empirical studies emphasize the importance of prompt engineering in shaping their gameplay strategies. Despite advances in multi-game evaluations, a notable gap persists between LLMs and human-level strategic reasoning. Closing it requires refining reinforcement learning frameworks to improve decision-making efficiency, ultimately bridging the gap between LLM-based agents and expert human players in strategic board games like Gomoku.
    Researchers from Peking University have developed an LLM-based Gomoku AI system that mimics human learning to improve strategic decision-making. The system enables the model to interpret the board state, understand the game rules, select strategies, and evaluate positions. By incorporating self-play and reinforcement learning, the AI refines its move selection, avoids illegal moves, and improves efficiency through parallel position evaluation. Extensive training has significantly improved its gameplay, allowing it to adapt strategies dynamically. The approach demonstrates that LLMs can learn and apply complex game strategies, making them valuable tools for strategic gameplay development.
    The implementation of the Gomoku AI system has five key components: prompt design, strategy selection, position evaluation, self-play, and reinforcement learning. A specialized prompt template enables LLMs to simulate human decision-making by incorporating the board state, game rules, and strategic logic. The model selects from 52 strategies and nine analytical methods to refine its gameplay. To prevent illegal moves, a local position evaluation method scores legal positions so that the best one can be selected. Self-play improves strategic adaptability, while reinforcement learning with Deep Q-Networks introduces per-turn rewards to accelerate learning. This integrated approach significantly improves the Gomoku AI's decision-making and performance.
    A parallel framework built on Ray accelerates local position evaluation, reducing move time from 150 to 28 seconds. A state-action-reward database preserves self-play data, preventing progress loss due to API failures. A visualization module graphically represents moves and strategies for clarity. The model, trained through 1,046 self-play games with a Deep Q-Network, significantly outperforms Zero-shot, Few-shot, and Chain-of-Thought methods. Performance evaluation includes human assessment and survival-step testing against AlphaZero, showing improved strategic accuracy and gameplay durability. Training over 1,000 episodes leads to notable performance gains, demonstrating the method's effectiveness.
    In conclusion, despite its success, the model faces challenges such as slow self-play learning and limited strategy depth, because it selects only one strategy and one analytical logic per move. Future improvements include combining multiple strategies for deeper analysis, using advanced reinforcement learning methods such as Deep Deterministic Policy Gradient, and incorporating multi-agent systems. Using AlphaZero's results may further refine decision-making. The study demonstrates how LLMs can play Gomoku through strategic reasoning and reinforcement learning, improving decision speed and accuracy. Future research will focus on optimizing strategy selection and integrating vision-language models.
    Check out the Paper. All credit for this research goes to the researchers of this project.
    Sana Hassan, a consulting intern at Marktechpost and dual-degree student at IIT Madras, is passionate about applying technology and AI to address real-world challenges. With a keen interest in solving practical problems, he brings a fresh perspective to the intersection of AI and real-life solutions.
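The Gomoku item above says that illegal moves are avoided by scoring only legal positions and picking the best one. A minimal sketch of that idea follows. The scoring heuristic (length of the line a stone would complete, with attack weighted over defence) and all function names are assumptions made for illustration; the article does not describe the system's actual evaluation function.

```python
EMPTY, BLACK, WHITE = 0, 1, 2
DIRECTIONS = [(0, 1), (1, 0), (1, 1), (1, -1)]  # row, column, two diagonals


def run_length(board, row, col, player, dr, dc):
    """Count the player's consecutive stones starting next to (row, col)."""
    size = len(board)
    count = 0
    r, c = row + dr, col + dc
    while 0 <= r < size and 0 <= c < size and board[r][c] == player:
        count += 1
        r, c = r + dr, c + dc
    return count


def line_length_if_played(board, row, col, player):
    """Longest line through (row, col) if the player placed a stone there."""
    best = 1
    for dr, dc in DIRECTIONS:
        forward = run_length(board, row, col, player, dr, dc)
        backward = run_length(board, row, col, player, -dr, -dc)
        best = max(best, 1 + forward + backward)
    return best


def best_legal_move(board, player):
    """Score every empty cell and return the highest-scoring one."""
    opponent = WHITE if player == BLACK else BLACK
    best_score, best_move = -1, None
    for r, row in enumerate(board):
        for c, cell in enumerate(row):
            if cell != EMPTY:
                continue  # occupied cells are illegal and are never scored
            attack = line_length_if_played(board, r, c, player)
            defend = line_length_if_played(board, r, c, opponent)
            score = 10 * attack + defend  # hypothetical weighting
            if score > best_score:
                best_score, best_move = score, (r, c)
    return best_move


board = [[EMPTY] * 9 for _ in range(9)]
for col in range(2, 6):
    board[4][col] = BLACK  # four black stones in a row
board[3][3] = WHITE
board[5][4] = WHITE

print(best_legal_move(board, BLACK))  # (4, 1)
```

Because each cell's score is independent of the others, the inner loop is the kind of work that can be distributed across workers, which is presumably what the Ray-based parallel evaluation exploits.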
    0 Comments · 0 Shares · 25 Views
  • Open AI Releases PaperBench: A Challenging Benchmark for Assessing AI Agents' Abilities to Replicate Cutting-Edge Machine Learning Research
    www.marktechpost.com
    The rapid progress in artificial intelligence (AI) and machine learning (ML) research underscores the importance of accurately evaluating AI agents' capabilities in replicating complex, empirical research tasks traditionally performed by human researchers. Systematic evaluation tools that precisely measure the ability of AI agents to autonomously reproduce ML research findings remain limited, which makes it hard to understand the potential and limitations of such systems.
    OpenAI has introduced PaperBench, a benchmark designed to evaluate the competence of AI agents in autonomously replicating state-of-the-art machine learning research. PaperBench measures whether AI systems can accurately interpret research papers, independently develop the necessary codebases, and execute experiments to replicate empirical outcomes. The benchmark comprises 20 papers selected from ICML 2024, covering areas including reinforcement learning, robustness, and probabilistic methods. Detailed rubrics, co-developed with the original paper authors, specify 8,316 individually gradable tasks for precise evaluation of AI capabilities.
    From a technical perspective, PaperBench requires AI agents to process the provided research papers and supplementary clarifications and to develop complete code repositories from scratch. These repositories must include complete experimental setups and execution scripts, notably the reproduce.sh file. To ensure genuinely independent replication, agents are prohibited from referencing or reusing code from the original authors' repositories. Rubrics are structured hierarchically and spell out explicit pass-fail criteria at various levels, allowing systematic and objective assessment. Evaluation is conducted using SimpleJudge, an automated large language model (LLM)-based judge, which simplifies the grading process. SimpleJudge achieved an F1 score of 0.83 on JudgeEval, an auxiliary evaluation dataset designed to validate automated grading accuracy.
    Empirical evaluations of several advanced AI models show varying performance on PaperBench. Claude 3.5 Sonnet exhibited the highest capability, with an average replication score of 21.0%. Other models such as OpenAI's GPT-4o and Gemini 2.0 Flash attained significantly lower scores of 4.1% and 3.2%, respectively. By comparison, expert human ML researchers achieved considerably higher accuracy, reaching up to 41.4% after 48 hours of dedicated effort. Analysis of model performance revealed strengths in rapid initial code generation and early experimental setup, but substantial weaknesses in managing prolonged tasks, troubleshooting, and adapting strategy over time.
    These results provide technical insight into current AI system capabilities. While AI models demonstrate competence in certain coding tasks and initial experiment implementation, significant gaps persist in sustained task execution, adaptive problem-solving, and strategic planning. In addition, PaperBench Code-Dev, a streamlined variant that emphasizes code correctness without experimental execution, offers a practical alternative for broader and resource-limited community use because of its reduced computational and evaluation costs.
    In summary, PaperBench represents an important step toward methodically evaluating AI research capabilities. It provides a structured and detailed assessment environment that highlights specific strengths and limitations of contemporary AI models relative to human performance. The collaborative development of rubrics supports precise and realistic evaluations. OpenAI's open-sourcing of PaperBench supports further exploration and development in the field, improving understanding of autonomous AI research capabilities and informing responsible progress in this area.
    Check out the Paper and GitHub page. All credit for this research goes to the researchers of this project.
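The PaperBench item above describes rubrics as hierarchies whose lowest-level criteria are graded pass or fail. One plausible way to roll such a tree up into a single replication score is a weighted average at every level, sketched below. The `RubricNode` class, the `weight` field, and the averaging rule are assumptions for illustration; the article does not state how PaperBench aggregates its 8,316 tasks.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RubricNode:
    """Hypothetical rubric node: a graded leaf or a group of sub-criteria."""
    name: str
    weight: float = 1.0
    passed: Optional[bool] = None  # set only on leaves
    children: List["RubricNode"] = field(default_factory=list)


def score(node):
    """Return a score in [0, 1]: pass/fail at leaves, weighted mean above them."""
    if not node.children:
        if node.passed is None:
            raise ValueError(f"leaf {node.name!r} has not been graded")
        return 1.0 if node.passed else 0.0
    total_weight = sum(child.weight for child in node.children)
    if total_weight <= 0:
        raise ValueError(f"children of {node.name!r} need positive total weight")
    weighted = sum(child.weight * score(child) for child in node.children)
    return weighted / total_weight


rubric = RubricNode("paper", children=[
    RubricNode("code development", weight=2.0, children=[
        RubricNode("training loop implemented", passed=True),
        RubricNode("baseline implemented", passed=False),
    ]),
    RubricNode("results match", weight=1.0, passed=False),
])

print(round(score(rubric), 4))  # 0.3333
```

Here the code-development branch scores 0.5, and with weight 2 against a failed branch of weight 1 the root comes to (2 × 0.5 + 1 × 0) / 3.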
    0 Comments · 0 Shares · 11 Views
  • Mitigating Hallucinations in Large Vision-Language Models: A Latent Space Steering Approach
    www.marktechpost.com
    Hallucination remains a significant challenge in deploying Large Vision-Language Models (LVLMs), as these models often generate text that is misaligned with the visual input. Unlike hallucination in LLMs, which arises from linguistic inconsistencies, LVLMs struggle with cross-modal discrepancies, leading to inaccurate image descriptions or incorrect spatial relationships. These models pair vision encoders, such as CLIP, with pretrained text decoders to map visual information into language. Despite their strong performance in tasks like image captioning, visual question answering, and medical treatment planning, LVLMs remain prone to hallucination, which limits their real-world applicability. The issue stems from several factors, including statistical biases in pretraining, an over-reliance on language priors, and feature learning biases. However, existing research often fails to account for the unique architecture of LVLMs, treating their hallucination mechanisms like those of LLMs despite the distinct role of visual input processing.
    To mitigate hallucination in LVLMs, researchers have explored both training-based and training-free approaches. Training-based solutions improve model alignment with ground truth through additional supervision, but they require extensive datasets and computational resources. In contrast, training-free methods, such as self-feedback correction and auxiliary model integration, have gained popularity due to their efficiency. Some approaches refine the text decoding process to reduce inconsistencies, but these often fail to address hallucination that originates in the visual encoder. As LVLMs evolve, targeted solutions that consider both visual and textual components will be crucial for improving their robustness and reliability in real-world applications.
    Researchers from Stanford University investigate the mechanisms behind hallucinations in LVLMs, focusing on the instability of vision encoders and its impact on text decoders. They introduce Visual and Textual Intervention (VTI), a test-time technique that stabilizes vision features by modifying latent space representations. Unlike traditional smoothing methods, VTI pre-computes transformation directions from perturbed images and applies them to new queries, reducing hallucinations without extra training costs. Experimental results show that VTI consistently outperforms baseline approaches across multiple benchmarks, emphasizing the importance of vision feature stability in mitigating hallucinations and improving LVLM reliability.
    LVLMs comprise a vision encoder and a text decoder, and unstable vision features can lead to hallucinations. The researchers find that perturbations in vision embeddings cause inconsistencies in the generated text. To address this, they propose VTI, which pre-computes stable feature shifts using Principal Component Analysis (PCA) on perturbed image embeddings. These shifts are then applied to new queries, improving feature stability without additional training. VTI also adjusts text decoder embeddings to reduce hallucinations. Experiments confirm its effectiveness in mitigating hallucinations while maintaining computational efficiency across diverse tasks and datasets.
    The study evaluates the effectiveness of VTI in mitigating hallucinations in LVLMs. Using 80 COCO image-text pairs, the method generalizes across tasks and datasets. Experiments on POPE, CHAIR, and MMHAL-Bench demonstrate VTI's superiority over baseline methods like OPERA and VCD. Results show that visual intervention stabilizes feature representations while textual intervention enhances image attention. Their combination improves accuracy while maintaining text richness. Additionally, an ablation study of the visual and textual interventions confirms their impact on reducing hallucinations. VTI addresses multimodal hallucinations without compromising content quality.
    In conclusion, the study presents VTI as an effective method to mitigate hallucinations in LVLMs. Unlike hallucinations in LLMs, those in LVLMs stem from misalignments between visual inputs and textual outputs, often due to separately pre-trained image encoders and text decoders. VTI stabilizes vision features by adjusting latent space representations during inference, requiring no additional training. Experimental results confirm its superiority over baseline methods in reducing hallucinations while maintaining output quality. These findings emphasize the importance of robust feature representation, paving the way for more accurate and reliable LVLM applications in real-world settings.
    Check out the Paper. All credit for this research goes to the researchers of this project.
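The VTI item above describes pre-computing a feature shift from perturbed image embeddings with PCA and then adding it to the features of new queries. The toy sketch below follows that outline under several stated assumptions: embeddings are plain Python lists, the shift for one image is the mean of its perturbed embeddings minus its original embedding, the principal direction is found by uncentered power iteration, its sign is aligned with the mean shift, and `strength` is a hypothetical scaling knob. It is not the authors' implementation.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mean_vector(vectors):
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def average_shift(original, perturbed):
    """Shift from an image's embedding to the mean of its perturbed embeddings."""
    return [m - o for m, o in zip(mean_vector(perturbed), original)]


def shift_direction(shifts, iterations=100, seed=0):
    """Dominant unit direction of the shift vectors, found by power iteration."""
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in shifts[0]]
    for _ in range(iterations):
        projections = [dot(row, v) for row in shifts]
        # w = S^T (S v), i.e. the uncentered second-moment matrix applied to v
        w = [sum(p * row[i] for p, row in zip(projections, shifts))
             for i in range(len(v))]
        norm = math.sqrt(dot(w, w))
        if norm == 0.0:
            raise ValueError("degenerate shift vectors")
        v = [x / norm for x in w]
    # A principal direction has no inherent sign; align it with the mean shift.
    if dot(v, mean_vector(shifts)) < 0:
        v = [-x for x in v]
    return v


def steer(features, direction, strength):
    """Apply the pre-computed direction to a new query's features."""
    return [f + strength * d for f, d in zip(features, direction)]


# Toy two-dimensional shifts that mostly point along the first axis.
shifts = [
    average_shift([0.0, 0.0], [[1.0, 0.2], [1.0, 0.0]]),
    average_shift([1.0, 1.0], [[1.9, 0.9], [1.9, 0.9]]),
    average_shift([2.0, 0.0], [[3.1, 0.0], [3.1, 0.0]]),
]
direction = shift_direction(shifts)
print(steer([0.0, 0.0], direction, strength=0.5))
```

The direction is computed once, offline, and `steer` is the only work done per query, which is why this family of methods adds no training cost.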
    0 Comments · 0 Shares · 8 Views
  • Nomic Open Sources State-of-the-Art Multimodal Embedding Model
    www.marktechpost.com
    Nomic has announced the release of Nomic Embed Multimodal, an embedding model that achieves state-of-the-art performance on visual document retrieval tasks. The new model processes interleaved text, images, and screenshots, establishing a new high score on the Vidore-v2 benchmark for visual document retrieval. This is particularly significant for retrieval augmented generation (RAG) applications working with PDF documents, where capturing both visual and textual context is crucial.
    Breaking New Ground in Visual Document Retrieval
    The Nomic Embed Multimodal 7B model has achieved a 62.7 NDCG@5 score on the Vidore-v2 benchmark, a 2.8-point improvement over the previous best-performing models. This marks a significant milestone in the evolution of multimodal embeddings for document processing.
    Unlike traditional retrieval systems that rely primarily on extracted text and often miss crucial visual elements, Nomic's new model captures the full richness of documents by embedding both text and visual components directly. This approach eliminates the need for the complex, error-prone processing pipelines commonly used in document analysis.
    Solving Real-World Document Challenges
    Documents are inherently multimodal, conveying information through text, figures, page layouts, tables, and even fonts. Traditional text-only systems struggle with this complexity, often requiring separate encoders for visual and text inputs or complex preprocessing pipelines.
    Nomic Embed Multimodal supports interleaved text and image inputs in a single model, making it well suited for:
    PDF documents and research papers
    Screenshots of applications and websites
    Visually rich content where layout matters
    Multilingual documents where visual context is important
    A Complete Embedding Ecosystem
    With the release of Nomic Embed Multimodal, Nomic has finalized a suite of embedding models that achieve state-of-the-art performance across multiple domains:
    Nomic Embed Multimodal: The latest addition, which achieves state-of-the-art performance on interleaved text, images, and screenshots. It is ideal for document retrieval workflows.
    Nomic Embed Text v2: A multilingual text embedding model that achieves state-of-the-art performance on the MIRACL benchmark. It is ideal for text retrieval workflows in any language.
    Nomic Embed Code: An embedding model specialized for code search applications, achieving a state-of-the-art score on the CodeSearchNet benchmark. It is ideal for code agent applications.
    This ecosystem provides developers with tools for handling diverse data types, from pure text to complex multimodal documents and specialized code repositories. Each model is designed to work with modern RAG workflows while delivering best-in-class performance in its domain.
    Availability
    Nomic has made its multimodal embedding models available on Hugging Face, along with the corresponding dataset and GitHub repository, making the technology accessible to researchers and developers worldwide.
    This release represents a significant step forward in multimodal representation learning and document understanding, completing Nomic's vision of providing state-of-the-art embedding solutions across the full spectrum of data modalities.
    Availability in the Nomic Atlas Data and Embedding Platform is upcoming.
    Thanks to the Nomic team for the thought leadership and resources for this article. The Nomic team has supported us financially and with content for this article.
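The Nomic item above reports its headline result as NDCG@5. For readers unfamiliar with the metric, here is a minimal sketch of how NDCG@k is computed for one query. It assumes the linear-gain variant (relevance divided by log2 of rank + 1) and computes the ideal ordering from the retrieved list only; benchmark harnesses may use a different gain function or take the ideal over all known relevant documents, so this is not necessarily how Vidore-v2 scores are produced.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k results (rank starts at 1)."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances[:k], start=1))


def ndcg_at_k(relevances, k):
    """DCG normalised by the DCG of the best possible ordering of the same list."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Relevance of the retrieved pages for one query, in ranked order (toy data).
ranked = [1, 0, 1, 0, 0]
print(round(ndcg_at_k(ranked, k=5), 4))  # 0.9197
```

A benchmark figure such as 62.7 is presumably this per-query value averaged over all queries and scaled by 100.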
    0 Comments · 0 Shares · 9 Views
  • Meta AI Proposes Multi-Token Attention (MTA): A New Attention Method which Allows LLMs to Condition their Attention Weights on Multiple Query and Key Vectors
    www.marktechpost.com
    Large Language Models (LLMs) benefit significantly from attention mechanisms, which enable effective retrieval of contextual information. However, traditional attention relies on single-token attention, where each attention weight is computed from a single pair of query and key vectors. This design constrains the model's ability to discern contexts that require integrating multiple token signals, limiting its effectiveness on complex linguistic dependencies. For example, identifying sentences that contain both "Alice" and "rabbit" is challenging because conventional attention struggles to combine multiple separate attention signals without substantially increasing model complexity.
    Meta AI addresses this limitation by introducing Multi-Token Attention (MTA), an attention mechanism that conditions attention weights on multiple query and key vectors simultaneously. MTA applies convolution operations over queries, keys, and attention heads, improving the precision and efficiency of contextual information retrieval. Specifically, the MTA framework consists of two convolutional components: key-query convolution, which aggregates multiple token signals within individual attention heads, and head mixing convolution, which shares information among different attention heads. Additionally, the implementation employs group normalization with depth-dependent scaling to stabilize gradient flow, further improving training stability.
    At a technical level, MTA modifies the conventional attention calculation by applying a two-dimensional convolution to the attention logits before softmax normalization. This convolution allows adjacent queries and keys to influence each other's attention scores, enabling the attention mechanism to identify contextual relationships involving multiple tokens more precisely. Consequently, the model aggregates local token interactions without substantially increasing the number of parameters or the dimensionality of attention vectors. Moreover, head convolution promotes knowledge transfer among attention heads, selectively amplifying relevant context signals while attenuating less pertinent information. Collectively, these enhancements yield a more robust attention mechanism capable of capturing complex multi-token interactions.
    Empirical evaluations validate MTA across several benchmarks. In a structured motivating task explicitly designed to expose the shortcomings of single-token attention, MTA achieved an error rate of only 0.1%, whereas standard Transformer models exhibited error rates above 50%. Further large-scale experiments with an 880M-parameter model trained on 105 billion tokens showed MTA consistently outperforming baseline architectures. MTA achieved better validation perplexity across datasets such as arXiv, GitHub, and Wikipedia. In tasks requiring extended context comprehension, such as the Needle-in-the-Haystack and BabiLong benchmarks, MTA significantly exceeded the performance of standard Transformer models. In the Needle-in-the-Haystack task with 4K-token contexts containing multiple needles, MTA attained accuracies ranging from 67% to 97.6%, surpassing standard models by substantial margins.
    In summary, Multi-Token Attention (MTA) refines attention mechanisms by addressing a fundamental limitation of traditional single-token attention. By using convolutional operations to integrate multiple query-key interactions at once, MTA improves the ability of language models to handle intricate contextual dependencies. These improvements yield more precise and efficient performance, particularly in scenarios involving complex token interactions and long-range contextual understanding. Through targeted modifications to standard attention, MTA contributes to the development of more accurate and computationally efficient language models.
    Check out the Paper. All credit for this research goes to the researchers of this project. Also, feel free to follow us on Twitter and don't forget to join our 85k+ ML SubReddit.
    The post Meta AI Proposes Multi-Token Attention (MTA): A New Attention Method which Allows LLMs to Condition their Attention Weights on Multiple Query and Key Vectors appeared first on MarkTechPost.
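    The key-query convolution described in the MTA post above can be illustrated with a small sketch. The code below is a minimal pure-Python illustration under stated assumptions, not Meta AI's implementation: it computes scaled dot-product logits, mixes each logit with the logits of nearby (query, key) positions using a small 2D kernel, and then applies a causal row-wise softmax. The function names, the kernel layout, and the way causal masking is handled are assumptions made for this example; it assumes self-attention (as many keys as queries) and omits head mixing and group normalization.

```python
import math


def attention_logits(queries, keys):
    """Scaled dot-product logits: logits[i][j] = (q_i . k_j) / sqrt(d)."""
    d = len(queries[0])
    return [
        [sum(q * k for q, k in zip(qi, kj)) / math.sqrt(d) for kj in keys]
        for qi in queries
    ]


def key_query_conv(logits, kernel):
    """Mix each logit with logits of nearby queries and keys (assumed layout).

    kernel[a][b] weights the logit of query (i - a) and key (j + b - c_k // 2),
    so queries only look backward and keys are centered on j. Positions that
    are out of range, or where the key lies after its query, contribute zero.
    """
    n_q, n_k = len(logits), len(logits[0])
    c_q, c_k = len(kernel), len(kernel[0])
    half = c_k // 2
    out = [[0.0] * n_k for _ in range(n_q)]
    for i in range(n_q):
        for j in range(n_k):
            total = 0.0
            for a in range(c_q):
                for b in range(c_k):
                    qi, kj = i - a, j + b - half
                    if qi >= 0 and 0 <= kj < n_k and kj <= qi:
                        total += kernel[a][b] * logits[qi][kj]
            out[i][j] = total
    return out


def causal_softmax(logits):
    """Row-wise softmax where query i only sees keys 0..i."""
    weights = []
    for i, row in enumerate(logits):
        visible = row[: i + 1]
        m = max(visible)
        exps = [math.exp(x - m) for x in visible]
        z = sum(exps)
        weights.append([e / z for e in exps] + [0.0] * (len(row) - i - 1))
    return weights


def multi_token_attention_weights(queries, keys, kernel):
    """Convolve the logits first, then normalize."""
    return causal_softmax(key_query_conv(attention_logits(queries, keys), kernel))


# Toy inputs: three tokens with 2-dimensional queries and keys.
queries = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Hypothetical 2x3 kernel: keep the current logit and add half of the logit
# between the previous query and the previous key.
mixing_kernel = [[0.0, 1.0, 0.0], [0.5, 0.0, 0.0]]
weights = multi_token_attention_weights(queries, keys, mixing_kernel)
```

    With a 1x1 kernel `[[1.0]]` the sketch reduces to ordinary causal attention, which makes it easy to see that the convolution is the only change to the standard computation.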
    0 Comments · 0 Shares · 42 Views
  • A Comprehensive Guide to LLM Routing: Tools and Frameworks
    www.marktechpost.com
    Deploying LLMs presents challenges, particularly in optimizing efficiency, managing computational costs, and ensuring high-quality performance. LLM routing has emerged as a strategic solution to these challenges, enabling intelligent task allocation to the most suitable models or tools. Let's delve into the intricacies of LLM routing, explore various tools and frameworks designed for its implementation, and examine academic perspectives on the subject.
    Understanding LLM Routing
    LLM routing is the process of examining incoming queries or tasks and directing them to the best-suited language model, or collection of models, in a system. The goal is for every task to be handled by the model best suited to its needs, resulting in better-quality responses and better resource use. For example, simple questions may be handled by smaller, less resource-heavy models, whereas computationally heavy and sophisticated tasks may be assigned to more powerful LLMs. This dynamic allocation balances computational expense, response time, and accuracy.
    How LLM Routing Works
    The LLM routing process typically involves three key steps:
    - Query Analysis: The system examines the incoming query, considering content, intent, required domain knowledge, complexity, and specific user preferences or requirements.
    - Model Selection: Based on the analysis, the router evaluates available models by assessing their capabilities, specializations, past performance metrics, current load, availability, and associated operational costs.
    - Query Forwarding: The router directs the query to the selected model(s) for processing, so that the most suitable resource handles each task.
    This routing mechanism improves the overall performance of AI systems by processing tasks efficiently and effectively.
    The Rationale Behind LLM Routing
    The need for LLM routing stems from the varying capabilities and resource demands of language models. Using one monolithic model for every task is inefficient, particularly when less complex models can answer specific queries just as well. Through routing, systems can dynamically allocate tasks according to task complexity and the capability of available models, making better use of computational resources. The approach increases throughput, lowers latency, and keeps operational expense under control.
    Several frameworks and tools have been developed to facilitate LLM routing, each bringing its own features for optimizing resource utilization and maintaining high-quality output.
    RouteLLM
    RouteLLM is a leading open-source framework developed with the express purpose of maximizing the cost savings and efficiency of LLM deployment. Designed as a drop-in replacement for existing API integrations such as OpenAI's client, RouteLLM fits into existing infrastructure. The framework dynamically assesses query complexity, sending simple or lower-resource queries to smaller, more cost-effective models and more difficult queries to high-performance LLMs. In doing so, RouteLLM lowers operational expenses substantially, with reported cost savings of as much as 85% while maintaining performance near GPT-4 levels. The platform is also extensible, making it simple to incorporate new routing strategies and models, test them on varied tasks, and benchmark them for different deployment scenarios.
    NVIDIA AI Blueprint for LLM Routing
    NVIDIA offers an AI Blueprint designed explicitly for efficient multi-LLM routing. Built on a Rust-based backend powered by the NVIDIA Triton Inference Server, this tool delivers very low latency, often rivaling direct inference requests. NVIDIA's AI Blueprint framework is compatible with various foundation models, including NVIDIA's own NIM models and third-party LLMs, providing broad integration capabilities. Its compatibility with the OpenAI API standard also allows developers to replace existing OpenAI-based deployments with minimal configuration changes. In short, the Blueprint prioritizes performance through an optimized, low-latency architecture and simplifies the deployment of diverse LLM ecosystems.
    Martian: Model Router
    Martian's Model Router is another solution intended to improve the operational efficiency of AI systems that use multiple LLMs. It maintains uptime by redirecting queries in real time during outages or performance issues, thus delivering consistent service quality. Martian's routing algorithms examine incoming queries and select models based on their capabilities and current status. This decision-making mechanism enables Martian to use resources efficiently, reducing infrastructure expenses without compromising response speed or accuracy. Its main strengths are system reliability through real-time rerouting and query analysis that balances performance against operational expense.
    LangChain
    LangChain is a popular general-purpose software framework for plugging LLMs into applications, with features designed for intelligent routing. It makes it easy to plug in different LLMs, allowing developers to apply rich routing schemes that choose the right model depending on the needs of the task, performance requirements, and cost. LangChain supports varied use cases, such as chatbots, text summarization, document analysis, and code completion. It emphasizes ease of integration and flexibility, enabling developers to introduce effective routing techniques for various application setups and making multiple LLMs more usable together.
    Tryage
    Tryage is a method for context-aware routing that draws on a biological analogy to brain anatomy. It is based on a perceptive router that predicts the performance of various models on an input query and chooses the best model to apply. The routing decisions made by Tryage take into account anticipated performance, user-level goals, and constraints to deliver optimized and personalized routing results. Its predictive features give it an advantage over many conventional routing systems, especially in dynamically changing operating environments. Tryage stands out for its context-sensitive performance prediction, which ties routing decisions closely to individual user goals and constraints.
    PickLLM
    PickLLM is an adaptive routing system that uses reinforcement learning (RL) techniques to control the choice of language models. With an RL-based router, PickLLM repeatedly monitors and learns from cost, latency, and response accuracy metrics to adjust its routing decisions. This iterative learning makes the routing system more efficient and accurate over time. Developers can tailor PickLLM's reward function to their specific business priorities, balancing cost and quality dynamically. PickLLM differentiates itself through this reinforcement learning-based methodology, which supports adaptive, continuously improving routing choices, and through its support for custom objectives.
    MasRouter
    MasRouter addresses routing problems in multi-agent AI systems where specialized LLMs work together on complicated tasks. Using a cascaded controller network, MasRouter decides collaboration modes, allocates roles to various agents, and dynamically routes tasks across available LLMs. Its architecture coordinates collaboration between specialized models, handling complex, multi-dimensional queries while maintaining overall system performance and computational efficiency. MasRouter's biggest strength lies in its multi-agent coordination, which allows for effective role assignment and collaboration-based routing. It manages tasks effectively even in intricate, multi-model AI deployments.
    Academic Perspectives on LLM Routing
    Key contributions include:
    Implementing Routing Strategies in Large Language Model-Based Systems
    This paper explores key considerations for integrating routing into LLM-based systems, focusing on resource management, cost definition, and strategy selection. It offers a novel taxonomy of existing approaches and a comparative analysis of industry practices. The paper also identifies critical challenges and directions for future research in LLM routing.
    Bottlenecks and Considerations in LLM Routing
    Despite its substantial benefits, LLM routing presents several challenges that organizations and developers must address, including latency, scalability, and cost management complexities.
    In conclusion, LLM routing is a vital strategy for optimizing the deployment and utilization of large language models. Routing mechanisms improve AI system efficiency by assigning tasks to the most suitable models based on complexity, performance, and cost factors. Although routing introduces challenges such as latency, scalability, and cost management complexities, advances in intelligent, adaptive routing solutions promise to address them. With the continuous evolution of frameworks, tools, and research in this domain, LLM routing is likely to play a central role in shaping future AI deployments, supporting performance, cost-efficiency, and user satisfaction.
    Sana Hassan
    Sana Hassan, a consulting intern at Marktechpost and dual-degree student at IIT Madras, is passionate about applying technology and AI to address real-world challenges. With a keen interest in solving practical problems, he brings a fresh perspective to the intersection of AI and real-life solutions.
    Sana Hassan (https://www.marktechpost.com/author/sana-hassan/): Understanding AI Agent Memory: Building Blocks for Intelligent Systems
    Sana Hassan (https://www.marktechpost.com/author/sana-hassan/): Advancing Medical Reasoning with Reinforcement Learning from Verifiable Rewards (RLVR): Insights from MED-RLVR
    Sana Hassan (https://www.marktechpost.com/author/sana-hassan/): Efficient Inference-Time Scaling for Flow Models: Enhancing Sampling Diversity and Compute Allocation
    Sana Hassan (https://www.marktechpost.com/author/sana-hassan/): UCLA Researchers Released OpenVLThinker-7B: A Reinforcement Learning Driven Model for Enhancing Complex Visual Reasoning and Step-by-Step Problem Solving in Multimodal Systems
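    The three steps listed under "How LLM Routing Works" in the routing guide above (query analysis, model selection, query forwarding) can be sketched in a few lines. This is a toy illustration, not the API of RouteLLM or of any other framework named in the guide: the model names, capability scores, relative costs, and the keyword-based complexity heuristic are all hypothetical, and the handlers simply echo the query instead of calling a real model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Model:
    name: str
    capability: float      # hypothetical 0..1 score; higher handles harder queries
    cost_per_call: float   # hypothetical relative cost
    handler: Callable[[str], str]


# Hypothetical hints that a query needs a stronger model.
HARD_HINTS = ("prove", "derive", "refactor", "step by step", "analyze")


def analyze_query(query: str) -> float:
    """Step 1, query analysis: estimate complexity in [0, 1] with a crude heuristic."""
    text = query.lower()
    length_score = min(len(text.split()) / 50.0, 1.0)
    hint_score = 1.0 if any(hint in text for hint in HARD_HINTS) else 0.0
    return max(length_score, hint_score)


def select_model(complexity: float, models: List[Model]) -> Model:
    """Step 2, model selection: cheapest model whose capability covers the query."""
    capable = [m for m in models if m.capability >= complexity]
    if not capable:
        # Nothing is strong enough, so fall back to the most capable model.
        return max(models, key=lambda m: m.capability)
    return min(capable, key=lambda m: m.cost_per_call)


def route(query: str, models: List[Model]) -> str:
    """Step 3, query forwarding: send the query to the selected model."""
    model = select_model(analyze_query(query), models)
    return model.handler(query)


# Stand-in handlers; a real system would call an inference endpoint here.
MODELS = [
    Model("small-model", 0.4, 1.0, lambda q: f"[small-model] {q}"),
    Model("large-model", 1.0, 20.0, lambda q: f"[large-model] {q}"),
]

simple_reply = route("What is the capital of France?", MODELS)
hard_reply = route("Prove that the sum of two even numbers is even.", MODELS)
```

    In practice the heuristic in `analyze_query` would be replaced by a learned classifier or preference model, which is where the frameworks described in the guide differ most from one another.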
    0 Comments · 0 Shares · 34 Views
  • This AI Paper from ByteDance Introduces a Hybrid Reward System Combining Reasoning Task Verifiers (RTV) and a Generative Reward Model (GenRM) to Mitigate Reward Hacking
    www.marktechpost.com
    Reinforcement Learning from Human Feedback (RLHF) is crucial for aligning LLMs with human values and preferences. Despite the introduction of non-RL alternatives like DPO, industry-leading models such as ChatGPT/GPT-4, Claude, and Gemini continue to rely on RL algorithms like PPO for policy optimization. Recent research focuses on algorithmic improvements, including eliminating critic models to reduce computational costs, filtering noisy samples during PPO sampling, and enhancing reward models to mitigate reward hacking. However, only a few studies focus on RLHF data construction (i.e., training prompts) and on how performance scales with these training prompts.
    The success of RLHF depends heavily on reward model quality, which faces three challenges: mis-specified reward modeling in representing human preferences, incorrect and ambiguous preferences in training datasets, and poor generalization ability. To address these issues, GenRM was introduced to validate model predictions against ground-truth responses, showing good resistance to reward hacking and gaining adoption in advanced LLMs like DeepSeekV3. Principled data selection methods filter overly challenging instances during training, while strategic selection methods identify key training prompts to achieve comparable performance with less data. Performance scaling analysis reveals that RLHF generalizes better than SFT on novel inputs but significantly reduces output diversity.
    Researchers from ByteDance Seed address a critical gap in RLHF research, where the role of prompt-data construction and its scalability has received less attention. They explore data-driven bottlenecks that limit RLHF performance scaling, focusing on reward hacking and decreasing response diversity. They introduce a hybrid reward system that combines reasoning task verifiers (RTV) and a generative reward model (GenRM), which shows stronger resistance to reward hacking and enables a more accurate assessment of responses against ground-truth solutions. Moreover, they introduce a novel prompt-selection method called Pre-PPO to identify inherently challenging training prompts that are less susceptible to reward hacking.
    The experimental setup employs two pre-trained language models of different scales: a smaller model with 25B parameters and a larger model with 150B parameters. The training dataset contains one million prompts from diverse domains, including mathematics, coding, instruction-following, creative writing, and logical reasoning. The researchers also constructed a detailed evaluation framework covering multiple skill areas: logical reasoning, instruction-following, STEM tasks, coding, natural language processing, knowledge, contextual understanding, and out-of-distribution generalization. The evaluation framework includes two versions (V1.0 and V2.0) with overlapping prompts, though V2.0 features more challenging prompts.
    The experimental results show that the proposed approach, which combines Pre-PPO with prioritized mathematical and coding tasks, consistently outperforms the baseline method across model sizes and evaluation datasets. The approach shows an improvement of +1.1 over the baseline when evaluated at 100-step intervals using TestSet V1.0. On the more challenging TestSet V2.0, the improvement increases to +1.4. The most substantial gains appear in mathematics-intensive and coding tasks, with an improvement of +3.9 points in STEM and +3.2 points in coding. These improvements are attributed to the strategic prioritization of mathematical reasoning and coding tasks during early RLHF training phases.
    In conclusion, this paper addresses critical bottlenecks in RLHF data scaling, specifically identifying reward hacking and reduced response diversity as significant challenges. To address these issues, the researchers proposed a combined approach featuring strategic prompt construction and early-stage training prioritization. The method uses RTV and GenRM to combat reward hacking, alongside the novel Pre-PPO prompt selection strategy that identifies and prioritizes challenging training prompts. Analysis reveals that RTV supervision shows the strongest resistance to reward hacking, followed by GenRM with ground-truth labels and then the BT Reward Model. The research establishes a foundation for optimizing RLHF data construction and developing more principled methods for mitigating reward hacking and improving model alignment.
    Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project.
    Sajjad Ansari
    Sajjad Ansari is a final year undergraduate from IIT Kharagpur. As a tech enthusiast, he delves into the practical applications of AI with a focus on understanding the impact of AI technologies and their real-world implications. He aims to articulate complex AI concepts in a clear and accessible manner.
    Sajjad Ansari (https://www.marktechpost.com/author/sajjadansari/): VideoMind: A Role-Based Agent for Temporal-Grounded Video Understanding
    Sajjad Ansari (https://www.marktechpost.com/author/sajjadansari/): PilotANN: A Hybrid CPU-GPU System For Graph-based ANNS
    Sajjad Ansari (https://www.marktechpost.com/author/sajjadansari/): This AI Paper Propose the UI-R1 Framework that Extends Rule-based Reinforcement Learning to GUI Action Prediction Tasks
    Sajjad Ansari (https://www.marktechpost.com/author/sajjadansari/): TokenBridge: Bridging The Gap Between Continuous and Discrete Token Representations In Visual Generation
    0 Comments · 0 Shares · 18 Views
  • The Complete Beginner's Guide to Terminal/Command Prompt
    www.marktechpost.com
    The terminal (on Mac/Linux) or command prompt (on Windows) is a powerful tool that allows you to interact with your computer using text commands instead of clicking through a graphical interface. While it might seem intimidating at first, mastering basic terminal commands can help you:
    - Navigate through files and folders more efficiently
    - Perform tasks that aren't possible through the regular interface
    - Automate repetitive tasks
    - Gain a deeper understanding of how your computer works
    This guide introduces the essential commands and concepts to get you started, regardless of which operating system you use.
    Getting Started
    Opening the Terminal
    On Windows:
    - Press Win + R, type cmd, and press Enter
    - Or search for Command Prompt in the Start menu
    On Mac:
    - Press Command + Space to open Spotlight, type Terminal, and press Enter
    - Or find Terminal in Applications > Utilities > Terminal
    On Linux:
    - Press Ctrl + Alt + T (on most distributions)
    - Or search for Terminal in your applications menu
    Understanding the Prompt
    When you first open the terminal, you'll see a prompt that looks something like this:
    - Windows: C:\Users\YourUsername>
    - Mac/Linux: username@computer:~$
    This tells you:
    - Your current location in the file system
    - Where to type your commands
    - On Mac/Linux, the ~ symbol represents your home directory
    Basic Navigation Commands
    Viewing Your Current Location
    - Windows: cd
    - Mac/Linux: pwd (Print Working Directory)
    Listing Files and Directories
    - Windows: dir
    - Mac/Linux: ls
    Options:
    - ls -l: List with detailed information (file size, date modified, permissions)
    - ls -a: Show hidden files (files that start with a dot)
    - ls -la: Combine both options
    Changing Directories
    - All platforms: cd DirectoryName
    Creating Directories
    - All platforms: mkdir DirectoryName
    Creating Files
    - Windows: type nul > filename.txt
    - Mac/Linux: touch filename.txt
    Working with Files
    Viewing File Contents
    - Windows: type filename.txt
    - Mac/Linux: cat filename.txt
    For larger files:
    - Windows: more filename.txt
    - Mac/Linux: less filename.txt (use q to quit)
    Copying Files
    - Windows: copy source destination
    - Mac/Linux: cp source destination
    Moving/Renaming Files
    - Windows: move source destination
    - Mac/Linux: mv source destination
    Deleting Files and Directories
    - Windows: del file (delete a file), rmdir /s dir (delete a directory and its contents)
    - Mac/Linux: rm file (delete a file), rm -r dir (delete a directory and its contents)
    Warning: Be very careful with delete commands, especially rm -r! There is no Recycle Bin or Trash when using the terminal; deletions are permanent.
    Helpful Tips
    Command History
    - Press the up arrow to cycle through previously used commands
    - On Mac/Linux, type history to see a list of recent commands
    Tab Completion
    - Start typing a file or directory name, then press Tab
    - The terminal will attempt to complete it for you
    - If there are multiple options, press Tab twice to see all possibilities
    Getting Help
    - Windows: help command or command /?
    - Mac/Linux: man command (manual pages, press q to exit)
    Clearing the Screen
    - Windows: cls
    - Mac/Linux: clear or Ctrl+L
    Power User Commands
    Searching for Files
    - Windows: dir /s filename
    - Mac/Linux: find . -name filename
    Searching Within Files
    - Windows: findstr text filename
    - Mac/Linux: grep text filename
    Chaining Commands
    - All platforms: Use && to run commands in sequence (the second command runs only if the first succeeds)
    Redirecting Output
    - All platforms: Use > to send output to a file
    Next Steps
    As you become more comfortable with these basic commands, you might want to explore:
    - Command line text editors like Nano, Vim, or Emacs
    - Writing simple shell scripts to automate tasks
    - Package managers like apt (Linux), Homebrew (Mac), or Chocolatey (Windows)
    - Environment variables and how to set them
    - SSH to connect to remote computers
    Common Mistakes and Troubleshooting
    - Command not found: Check spelling or ensure the command is available on your system
    - Permission denied: You may need administrator/root privileges. On Windows, run Command Prompt as Administrator. On Mac/Linux, use sudo before commands that need elevated privileges
    - No such file or directory: Double-check path and file names
    - Operation not permitted: Similar to permission denied, you might need special permissions
    Task              Windows                  Mac/Linux
    Current location  cd                       pwd
    List files        dir                      ls
    Change directory  cd dir                   cd dir
    Create directory  mkdir dir                mkdir dir
    Create file       type nul > file          touch file
    Copy file         copy source destination  cp source destination
    Move/rename       move source destination  mv source destination
    Delete file       del file                 rm file
    Delete directory  rmdir /s dir             rm -r dir
    Clear screen      cls                      clear
    Get help          help command             man command
    Conclusion
    In this tutorial, we covered the essentials beginners need to start using the terminal. We explored how to open the terminal on different operating systems, navigate file systems, create and manage files and directories, and use essential commands. We also covered helpful shortcuts, power user commands, and troubleshooting tips. With these foundational skills, you can use the command line with confidence.
    Remember, the terminal rewards practice and experimentation. Don't be afraid to try new commands, but always be careful with commands that modify or delete files.
    0 Comments · 0 Shares · 17 Views
  • Meet ReSearch: A Novel AI Framework that Trains LLMs to Reason with Search via Reinforcement Learning without Using Any Supervised Data on Reasoning Steps
    www.marktechpost.com
    Large language models (LLMs) have demonstrated significant progress across various tasks, particularly in reasoning capabilities. However, effectively integrating reasoning processes with external search operations remains challenging, especially for multi-hop questions requiring intricate reasoning chains and multiple retrieval steps. Current methods primarily depend on manually designed prompts or heuristics, which limits scalability and flexibility. Additionally, generating supervised data for multi-step reasoning scenarios is often prohibitively expensive and practically infeasible.
    Researchers from Baichuan Inc., Tongji University, The University of Edinburgh, and Zhejiang University introduce ReSearch, a novel AI framework designed to train LLMs to integrate reasoning with search via reinforcement learning, notably without relying on supervised reasoning steps. The core methodology of ReSearch incorporates search operations directly into the reasoning chain. Using Group Relative Policy Optimization (GRPO), a reinforcement learning technique, ReSearch guides LLMs to autonomously decide when and how to perform search operations, whose results then influence the ongoing reasoning. This approach enables models to progressively refine their reasoning and naturally supports capabilities such as reflection and self-correction.
    From a technical perspective, ReSearch employs structured output formats by embedding specific tags, such as <think>, <search>, <result>, and <answer>, within the reasoning chain. These tags provide clear communication between the model and the external retrieval environment and systematically organize the generated outputs. During training, ReSearch intentionally excludes retrieval results from loss computations to prevent model bias. The reward signals guiding the reinforcement learning process are based on straightforward criteria: answer accuracy measured by F1 score and adherence to the predefined structured output format. This design encourages the autonomous development of sophisticated reasoning patterns and removes the need for manually annotated reasoning datasets.
    Experimental evaluation confirms the robustness of ReSearch. On multi-hop question-answering benchmarks, including HotpotQA, 2WikiMultiHopQA, MuSiQue, and Bamboogle, ReSearch consistently outperformed baseline methods. Specifically, ReSearch-Qwen-32B-Instruct achieved improvements of between 8.9% and 22.4% over established baselines. Notably, these gains were achieved even though the model was trained on a single dataset, underscoring its strong generalization capabilities. Further analyses showed that models gradually increased their reliance on iterative search operations throughout training, indicating improved reasoning proficiency. A detailed case study illustrated the model's capacity to identify suboptimal search queries, reflect on its reasoning steps, and take corrective action autonomously.
    In summary, ReSearch presents a significant methodological advance in training LLMs to integrate reasoning with external search mechanisms via reinforcement learning. By eliminating the dependency on supervised reasoning data, the framework addresses critical scalability and adaptability issues inherent in multi-hop reasoning scenarios. Its capability for self-reflection and correction enhances its practical applicability in complex, realistic contexts. Future research may extend this reinforcement learning-based framework to broader applications and incorporate additional external knowledge resources.
    Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project.
    Asif Razzaq
    Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence Media Platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts over 2 million monthly views, illustrating its popularity among audiences.
    Asif Razzaq (https://www.marktechpost.com/author/6flvq/): How to Build a Prototype X-ray Judgment Tool (Open Source Medical Inference System) Using TorchXRayVision, Gradio, and PyTorch
    Asif Razzaq (https://www.marktechpost.com/author/6flvq/): A Code Implementation of Using Atlas Evaluation Platform and Selene Model via Python SDK to Score Legal Domain LLM Outputs for GDPR Compliance
    Asif Razzaq (https://www.marktechpost.com/author/6flvq/): Meet Hostinger Horizons: A No-Code AI Tool that Lets You Create, Edit, and Publish Custom Web Apps Without Writing a Single Line of Code
    Asif Razzaq (https://www.marktechpost.com/author/6flvq/): Tencent AI Researchers Introduce Hunyuan-T1: A Mamba-Powered Ultra-Large Language Model Redefining Deep Reasoning, Contextual Efficiency, and Human-Centric Reinforcement Learning
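    The ReSearch post above says the reward is based on two criteria: answer accuracy measured by F1 and adherence to the tagged output format. A minimal sketch of such a rule-based reward follows. The post does not state how the two signals are combined, so the combination rule here (zero for a malformed rollout, a small hypothetical bonus for a well-formed but wrong answer, otherwise the F1 score) is an assumption made for illustration, as are the function names and the whitespace tokenization.

```python
import re
from collections import Counter

ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)
TAGS = ("think", "search", "result", "answer")


def token_f1(prediction: str, truth: str) -> float:
    """Token-level F1 between a predicted answer and the ground truth."""
    pred, gold = prediction.lower().split(), truth.lower().split()
    if not pred or not gold:
        return 0.0
    overlap = sum((Counter(pred) & Counter(gold)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred), overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


def format_ok(rollout: str) -> bool:
    """Every opened tag is closed, and exactly one <answer> block exists."""
    for tag in TAGS:
        if rollout.count(f"<{tag}>") != rollout.count(f"</{tag}>"):
            return False
    return len(ANSWER_RE.findall(rollout)) == 1


def reward(rollout: str, truth: str, format_bonus: float = 0.1) -> float:
    """Assumed combination rule; format_bonus is a hypothetical value."""
    if not format_ok(rollout):
        return 0.0
    answer = ANSWER_RE.search(rollout).group(1).strip()
    f1 = token_f1(answer, truth)
    return f1 if f1 > 0 else format_bonus


good_rollout = (
    "<think>I need the capital of France.</think>"
    "<search>capital of France</search>"
    "<result>Paris is the capital of France.</result>"
    "<answer>Paris</answer>"
)
good_reward = reward(good_rollout, "Paris")
```

    Because both checks are computed directly from the rollout text and the reference answer, a reward of this kind needs no annotated reasoning steps, which matches the motivation given in the post.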
    0 Comments · 0 Shares · 44 Views
  • How to Use Git and Git Bash Locally: A Comprehensive Guide
    www.marktechpost.com
    Introduction

    Git is a distributed version control system that helps you track changes in your code, collaborate with others, and maintain a history of your project. Git Bash is a terminal application for Windows that provides a Unix-like command-line experience for using Git.

    This guide walks you through setting up Git, using Git Bash, and mastering essential Git commands for local development.

    Installation

    Windows
    - Download Git for Windows from git-scm.com
    - Run the installer with default options (or customize as needed)
    - Git Bash will be installed automatically as part of the package

    macOS
    - Install Git using Homebrew: brew install git
    - Alternatively, download from git-scm.com

    Linux
    - For Debian/Ubuntu: sudo apt-get install git
    - For Fedora: sudo dnf install git
    - For other distributions, use the appropriate package manager

    Verifying Installation

    Open Git Bash (Windows) or Terminal (macOS/Linux) and type git --version. This should display the installed Git version.

    Git Bash Basics

    Git Bash provides a Unix-like shell experience on Windows. Here are some essential commands:

    Navigation Commands
    - pwd: Print working directory
    - ls: List files and directories
    - cd [directory]: Change directory
    - mkdir [directory]: Create a new directory
    - rm [file]: Remove a file
    - rm -r [directory]: Remove a directory and its contents

    File Operations
    - touch [filename]: Create an empty file
    - cat [filename]: Display file contents
    - nano [filename] or vim [filename]: Edit files in the terminal

    Keyboard Shortcuts
    - Ctrl + C: Terminate the current command
    - Ctrl + L: Clear the screen
    - Tab: Auto-complete commands or filenames
    - Up/Down arrows: Navigate through command history

    Git Configuration

    Before using Git, configure your identity by setting user.name and user.email with git config --global.

    Additional Configurations
    - Set your default editor: git config --global core.editor <editor>
    - Enable colorful output: git config --global color.ui auto
    - View all configurations: git config --list

    Basic Git Workflow

    - Initializing a Repository: navigate to your project folder and initialize a Git repository with git init
    - Checking Status: see which files are tracked, modified, or staged with git status
    - Staging Files: add files to the staging area with git add <filename>
    - Committing Changes: save staged changes to the repository with git commit -m "<message>", or run git commit on its own to open an editor and write a more detailed commit message
    - Viewing Commit History: git log

    Branching and Merging

    Working with Branches
    - Create a new branch: git branch <branch-name>
    - Switch to a branch: git checkout <branch-name>
    - Create and switch to a new branch in one command: git checkout -b <branch-name>
    - List all branches: git branch

    Merging Branches

    Merge changes from another branch into your current branch with git merge <branch-name>.

    Handling Merge Conflicts

    When Git can't automatically merge changes, you'll need to resolve conflicts:
    1. Git will mark the conflicted files
    2. Open the files and look for conflict markers (<<<<<<<, =======, >>>>>>>)
    3. Edit the files to resolve conflicts
    4. Add the resolved files: git add <filename>
    5. Complete the merge: git commit

    Deleting Branches

    Delete a branch after merging with git branch -d <branch-name>.

    Remote Repositories
    - Adding a Remote Repository: git remote add origin <url>
    - Viewing Remote Repositories: git remote -v
    - Pushing to a Remote Repository: git push origin <branch-name>
    - Pulling from a Remote Repository: git pull origin <branch-name>
    - Cloning a Repository: git clone <url>

    Advanced Git Commands
    - Stashing Changes: temporarily store modified files to work on something else with git stash, and restore them later with git stash pop
    - Reverting Changes: undo a commit with git revert <commit>; reset to a previous state (use with caution) with git reset --hard <commit>
    - Viewing and Comparing Changes: git diff
    - Interactive Rebase: rewrite, squash, or reorder commits with git rebase -i <base-commit>

    Troubleshooting

    Common Issues and Solutions

    Problem: fatal: not a git repository
    Solution: Make sure you're in the correct directory, or initialize a repository with git init.

    Problem: Unable to push to remote repository
    Solution:
    - Check if you have the correct permissions
    - Pull latest changes first: git pull origin main
    - Check if the remote URL is correct: git remote -v

    Problem: Merge conflicts
    Solution: Resolve conflicts manually, then git add the resolved files and git commit.

    Problem: Accidental commit
    Solution: Use git reset --soft HEAD~1 to undo the last commit while keeping the changes.

    Git Best Practices
    - Commit frequently with clear, descriptive commit messages
    - Create branches for new features or bug fixes
    - Pull before pushing to minimize conflicts
    - Write meaningful commit messages that explain why changes were made
    - Use .gitignore to exclude unnecessary files (build artifacts, dependencies, etc.)
    - Review changes before committing with git diff and git status
    - Keep commits focused on a single logical change
    - Use tags for marking releases or important milestones
    - Back up your repositories regularly
    - Document your Git workflow for team collaboration

    .gitignore Example

    Create a .gitignore file in your repository root and customize it according to your project's specific needs.

    Conclusion

    Git and Git Bash provide powerful tools for version control and collaborative development. In this guide, we covered installation across platforms, essential Git Bash commands, repository initialization, the core add-commit workflow, branching strategies, remote repository management, and advanced operations like stashing and rebasing. We also addressed common troubleshooting scenarios and best practices to maintain a clean workflow. With these fundamentals, you're now equipped to track changes, collaborate effectively, and maintain a structured history of your projects.

    Nikhil is an intern consultant at Marktechpost. He is pursuing an integrated dual degree in Materials at the Indian Institute of Technology, Kharagpur. Nikhil is an AI/ML enthusiast who is always researching applications in fields like biomaterials and biomedical science. With a strong background in Material Science, he is exploring new advancements and creating opportunities to contribute.
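    Returning to the merge-conflict steps in the Git guide above: the conflict markers it lists can also be found programmatically before committing. The following is a minimal Python sketch, not part of the original guide; the function name `find_conflict_markers` and the sample file contents are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple

# Prefixes Git writes at the start of a line when it cannot merge automatically.
CONFLICT_PREFIXES = ("<<<<<<<", "=======", ">>>>>>>")


def find_conflict_markers(text: str) -> List[Tuple[int, str]]:
    """Return (line_number, marker) for every line that starts with a conflict marker."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for prefix in CONFLICT_PREFIXES:
            if line.startswith(prefix):
                hits.append((number, prefix))
                break
    return hits


# Hypothetical file contents after a failed merge of a branch named "feature".
sample = "\n".join([
    "def greet():",
    "<<<<<<< HEAD",
    "    return 'hello'",
    "=======",
    "    return 'hi'",
    ">>>>>>> feature",
])

for number, marker in find_conflict_markers(sample):
    print(f"line {number}: {marker}")
```

    A prefix check like this is deliberately simple: a line of equals signs can appear legitimately in some files (for example as a text underline), so a hit on ======= alone is a hint to look at the file rather than proof of an unresolved conflict.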
    0 Comments · 0 Shares · 36 Views
  • This AI Paper Introduces Diversified DPO and ORPO: Post-Training Methods to Boost Output Diversity in Creative Writing with LLMs
    www.marktechpost.com
    Creative writing is a domain that thrives on diversity and imagination. Unlike fact-based or task-specific writing, where a single correct output may exist, creative writing involves numerous valid responses to a prompt. Stories, poems, and narratives can branch in countless directions, each with its own stylistic flavor and meaning. This inherent open-endedness makes creative writing a prime challenge for AI systems, which need to maintain narrative coherence while producing novel and distinct outputs.

    The core issue lies in how large language models are refined after their initial training. Post-training methods often emphasize quality improvements by aligning responses with user preferences or maximizing reward scores. However, these adjustments inadvertently cause the models to produce responses that are too similar across prompts. In creative settings, this leads to a noticeable drop in output diversity. A lack of variation limits the expressive power of the model, resulting in uniform storylines or similar sentence constructions even when prompts are vastly different.

    Earlier solutions attempted to address this by tweaking decoding methods or prompt strategies. Researchers used sampling temperature adjustment, top-k or top-p filtering, or iterative prompting to introduce randomness. Some explored methods such as beam search modifications or self-critiquing to encourage alternative responses. While these helped diversify outputs, they often came with a cost: sacrificing overall response quality, increasing generation time, or introducing inconsistencies in tone and grammar. More crucially, they did not adapt the model's core training process to learn from diverse samples.

    Researchers from Midjourney and New York University proposed a novel adjustment during the post-training phase. They introduced Diversified DPO and Diversified ORPO, enhanced versions of two popular preference-based optimization techniques. Their innovation was incorporating a deviation score, which quantifies how much a training example differs from other responses to the same prompt. This score is used to weight the training losses, so rare and diverse responses are given more importance during learning. The researchers implemented these strategies on large models, Meta's Llama-3.1-8B and Mistral-7B, using parameter-efficient fine-tuning via LoRA.

    In this approach, deviation acts as a learning signal. For every training pair of a better and a worse response to a prompt, the deviation of the better response is computed using both semantic and stylistic embeddings. These embeddings measure not only content differences but also stylistic uniqueness between responses. The resulting score then influences how much that training pair contributes to the model's weight updates. This method increases the likelihood that the model generates distinct yet high-quality outputs. The training used over 400,000 prompt-response pairs with Reddit upvotes as quality signals and introduced mixing methods to balance semantic and style deviations.

    Quantitative results demonstrated the success of the proposed method. The best-performing model, Llama-3.1-8B with Diversified DPO using semantic and style deviation (DDPO-both), achieved nearly the same reward score as GPT-4o while significantly outperforming it in diversity. Specifically, the model had semantic diversity approaching that of the human-crafted reference dataset and style diversity slightly below it. In head-to-head human evaluations, 68% of reviewers preferred DDPO-both's outputs over GPT-4o's for quality, and 100% chose them as more diverse. Compared to the baseline DPO, DDPO-both still came out ahead, selected 50% of the time for quality and 62% for diversity.
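    The deviation-weighted training signal described above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the authors' implementation: deviation is assumed here to be the mean cosine distance from a response's embedding to the other responses for the same prompt, a single embedding stands in for the separate semantic and style embeddings, and the weight is applied to the standard DPO loss. All function names and numbers are hypothetical.

```python
import math
from typing import List, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def deviation(index: int, embeddings: List[Sequence[float]]) -> float:
    """Assumed definition: mean cosine distance to the other responses to the same prompt."""
    others = [e for i, e in enumerate(embeddings) if i != index]
    if not others:
        raise ValueError("deviation needs at least two responses per prompt")
    return sum(cosine_distance(embeddings[index], e) for e in others) / len(others)


def dpo_loss(logp_w: float, logp_l: float, ref_logp_w: float, ref_logp_l: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one (chosen, rejected) pair: -log(sigmoid(margin))."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return math.log1p(math.exp(-margin))


def diversified_dpo_loss(dev_w: float, logp_w: float, logp_l: float,
                         ref_logp_w: float, ref_logp_l: float, beta: float = 0.1) -> float:
    """DPO loss scaled by the deviation of the chosen (better) response."""
    return dev_w * dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta)


# Toy embeddings for three responses to one prompt: two near-duplicates and one outlier.
embeddings = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
common_dev = deviation(0, embeddings)  # response similar to another one
rare_dev = deviation(2, embeddings)    # response unlike the others

# Identical log-probabilities, so only the deviation weight differs between the two losses.
loss_common = diversified_dpo_loss(common_dev, -1.0, -2.0, -1.5, -1.5)
loss_rare = diversified_dpo_loss(rare_dev, -1.0, -2.0, -1.5, -1.5)
print(f"common: weight={common_dev:.2f}, loss={loss_common:.4f}")
print(f"rare:   weight={rare_dev:.2f}, loss={loss_rare:.4f}")
```

    With everything else equal, the pair whose chosen response is the outlier contributes a larger loss, and therefore a larger gradient, than the pair whose chosen response resembles its siblings. That is the mechanism the article describes; how the real method normalizes or mixes the deviation scores is not covered by this sketch.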
    When fewer responses per prompt were available during training, slight drops in reward scores were mitigated using a minimum deviation threshold or by sampling higher-quality responses.

    This research highlights a compelling solution to the diversity-quality trade-off in AI-generated creative writing. By emphasizing deviation in training, the researchers enabled models to value uniqueness without compromising coherence. The outcome is a model that delivers richer and more varied storytelling, marking a meaningful step forward in creative AI development.

    Check out the Paper. All credit for this research goes to the researchers of this project. Also, feel free to follow us on Twitter and don't forget to join our 85k+ ML SubReddit.
    0 Comments · 0 Shares · 22 Views
  • How to Build a Prototype X-ray Judgment Tool (Open Source Medical Inference System) Using TorchXRayVision, Gradio, and PyTorch
    www.marktechpost.com
    In this tutorial, we demonstrate how to build a prototype X-ray judgment tool using open-source libraries in Google Colab. By using TorchXRayVision to load pre-trained DenseNet models and Gradio to create an interactive user interface, we show how to process and classify chest X-ray images with minimal setup. This notebook guides you through image preprocessing, model inference, and result interpretation, all designed to run on Colab without requiring external API keys or logins. Please note that this demo is intended for educational purposes only and should not be used as a substitute for professional clinical diagnosis.

        !pip install torchxrayvision gradio

    First, we install the torchxrayvision library for X-ray analysis and Gradio to create an interactive interface.

        import torch
        import torchxrayvision as xrv
        import torchvision.transforms as transforms
        import gradio as gr

    We import PyTorch for deep learning operations, TorchXRayVision for X-ray analysis, torchvision's transforms for image preprocessing, and Gradio for building an interactive UI.

        model = xrv.models.DenseNet(weights="densenet121-res224-all")
        model.eval()

    Then, we load a pre-trained DenseNet model using the densenet121-res224-all weights and set it to evaluation mode for inference.

        try:
            # TorchXRayVision models expose their output labels, in output order, as model.pathologies
            pathology_labels = list(model.pathologies)
            print("Retrieved pathology labels from the model.")
        except AttributeError:
            print("Could not retrieve labels from the model. Using fallback labels.")
            # Caution: this fallback list may not match the model's output order or length
            pathology_labels = [
                "Atelectasis", "Cardiomegaly", "Consolidation", "Edema",
                "Emphysema", "Fibrosis", "Hernia", "Infiltration", "Mass",
                "Nodule", "Pleural Effusion", "Pneumonia", "Pneumothorax", "No Finding"
            ]

    Now, we attempt to retrieve the pathology labels from the model and fall back to a predefined list if the retrieval fails.

        def classify_xray(image):
            try:
                transform = transforms.Compose([
                    transforms.Resize((224, 224)),
                    transforms.Grayscale(num_output_channels=1),
                    transforms.ToTensor(),  # scales pixel values to [0, 1]
                ])
                input_tensor = transform(image).unsqueeze(0)  # add batch dimension
                # TorchXRayVision models expect intensities in [-1024, 1024]
                input_tensor = (2 * input_tensor - 1) * 1024

                with torch.no_grad():
                    preds = model(input_tensor)

                pathology_scores = preds[0].detach().numpy()
                # Pair each label with its score
                results = {
                    label: float(score)
                    for label, score in zip(pathology_labels, pathology_scores)
                }
                sorted_results = sorted(results.items(), key=lambda x: x[1], reverse=True)
                top_label, top_score = sorted_results[0]

                judgement = (
                    f"Prediction: {top_label} (score: {top_score:.2f})\n\n"
                    f"Full Scores:\n{results}"
                )
                return judgement
            except Exception as e:
                return f"Error during inference: {str(e)}"

    With this function, we preprocess an input X-ray image, run inference using the pre-trained model, extract pathology scores, and return a formatted summary of the top prediction and all scores while handling errors gracefully.

        iface = gr.Interface(
            fn=classify_xray,
            inputs=gr.Image(type="pil"),
            outputs="text",
            title="X-ray Judgement Tool (Prototype)",
            description=(
                "Upload a chest X-ray image to receive a classification judgement. "
                "This demo is for educational purposes only and is not intended for clinical use."
            )
        )
        iface.launch()

    Finally, we build and launch a Gradio interface that lets users upload a chest X-ray image.
    The classify_xray function processes the image to output a diagnostic judgment.

    [Figure: Gradio interface for the tool]

    Through this tutorial, we've explored the development of an interactive X-ray judgment tool that integrates deep learning techniques with a user-friendly interface. Despite its inherent limitations, such as the model not being fine-tuned for clinical diagnostics, this prototype serves as a valuable starting point for experimenting with medical imaging applications. We encourage you to build upon this foundation, keeping in mind the importance of rigorous validation and adherence to medical standards for real-world use.

    Here is the Colab Notebook. Also, don't forget to follow us on Twitter and join our Telegram Channel and LinkedIn Group. Don't forget to join our 85k+ ML SubReddit.

    Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence Media Platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts over 2 million monthly views, illustrating its popularity among audiences.
    0 Comments · 0 Shares · 20 Views
  • VideoMind: A Role-Based Agent for Temporal-Grounded Video Understanding
    www.marktechpost.com
    LLMs have shown impressive capabilities in reasoning tasks through techniques like Chain-of-Thought (CoT), which enhance accuracy and interpretability in complex problem-solving. While researchers are extending these capabilities to multi-modal domains, videos present unique challenges due to their temporal dimension. Unlike static images, videos require understanding dynamic interactions over time. Current visual CoT methods excel with static inputs but struggle with video content because they cannot explicitly localize or revisit specific moments in sequences. Humans overcome these challenges by breaking down complex problems, identifying and revisiting key moments, and synthesizing observations into coherent answers. This approach highlights the need for AI systems to manage multiple reasoning abilities.

    Recent video understanding advances have improved tasks like captioning and question answering, but models often lack visual-grounded correspondence and interpretability, especially for long-form videos. Video Temporal Grounding addresses this by requiring precise localization. Large Multimodal Models trained with supervised instruction-tuning struggle with complex reasoning tasks. Two major approaches have emerged to address these limitations: agent-based interfaces and pure text-based reasoning paradigms exemplified by CoT processes. Moreover, inference-time searching techniques are valuable in domains like robotics, games, and navigation because they allow models to iteratively refine outputs without changing the underlying weights.

    Researchers from the Hong Kong Polytechnic University and Show Lab, National University of Singapore, have proposed VideoMind, a video-language agent designed for temporal-grounded video understanding. VideoMind introduces two key innovations to address the challenges of video reasoning. First, it identifies essential capabilities for video temporal reasoning and implements a role-based agentic workflow with specialized components: a planner, a grounder, a verifier, and an answerer. Second, it proposes a Chain-of-LoRA strategy that enables seamless role-switching through lightweight LoRA adaptors, avoiding the overhead of multiple models while balancing efficiency and flexibility. Experiments across 14 public benchmarks show state-of-the-art performance in diverse video understanding tasks.

    VideoMind builds upon Qwen2-VL, combining an LLM backbone with a ViT-based visual encoder capable of handling dynamic resolution inputs. Its core innovation is the Chain-of-LoRA strategy, which dynamically activates role-specific LoRA adapters during inference via self-calling. It contains four specialized components:
    (a) Planner, which coordinates all other roles and determines which function to call next based on the query
    (b) Grounder, which localizes relevant moments by identifying start and end timestamps based on text queries
    (c) Verifier, which provides binary (Yes/No) responses to validate temporal intervals
    (d) Answerer, which generates responses based on either the cropped video segments identified by the Grounder or the entire video when direct answering is more appropriate

    In grounding metrics, VideoMind's lightweight 2B model outperforms most compared models, including InternVL2-78B and Claude-3.5-Sonnet, with only GPT-4o showing superior results. However, the 7B version of VideoMind surpasses even GPT-4o, achieving competitive overall performance. On the NExT-GQA benchmark, the 2B model matches state-of-the-art 7B models across both agent-based and end-to-end approaches, comparing favorably with text-rich, agent-based solutions like LLoVi, LangRepo, and SeViLA. VideoMind shows exceptional zero-shot capabilities, outperforming all LLM-based temporal grounding methods and achieving competitive results compared to fine-tuned temporal grounding experts. Moreover, VideoMind excels in general video QA tasks across Video-MME (Long), MLVU, and LVBench, showing effective localization of cue segments before answering questions.

    In this paper, the researchers introduced VideoMind, a significant advancement in temporal-grounded video reasoning. It addresses the complex challenges of video understanding through an agentic workflow that combines a Planner, Grounder, Verifier, and Answerer with an efficient Chain-of-LoRA strategy for role-switching. Experiments across three key domains (grounded video question-answering, video temporal grounding, and general video question-answering) confirm VideoMind's effectiveness for long-form video reasoning tasks, where it provides precise, evidence-based answers. This work establishes a foundation for future developments in multimodal video agents and reasoning capabilities, opening new pathways for more complex video understanding systems.

    Check out the Paper and Project Page. All credit for this research goes to the researchers of this project.

    Sajjad Ansari is a final year undergraduate from IIT Kharagpur. As a tech enthusiast, he delves into the practical applications of AI with a focus on understanding the impact of AI technologies and their real-world implications. He aims to articulate complex AI concepts in a clear and accessible manner.
    0 Comments · 0 Shares · 38 Views
  • A Code Implementation of Using Atla's Evaluation Platform and Selene Model via Python SDK to Score Legal Domain LLM Outputs for GDPR Compliance
    www.marktechpost.com
    In this tutorial, we demonstrate how to evaluate the quality of LLM-generated responses using Atla's Python SDK, a tool for automating evaluation workflows with natural language criteria. Powered by Selene, Atla's state-of-the-art evaluator model, we analyze whether legal responses align with the principles of the GDPR (General Data Protection Regulation). Atla's platform enables programmatic assessments using custom or predefined criteria, with synchronous and asynchronous support via the official Atla SDK.

    In this implementation, we did the following:
    - Used custom GDPR evaluation logic
    - Queried Selene to return binary scores (0 or 1) and human-readable critiques
    - Processed the evaluation in batch using asyncio
    - Printed critiques to understand the reasoning behind each judgment

    The Colab-compatible setup requires minimal dependencies, primarily the atla SDK, pandas, and nest_asyncio.

        !pip install atla pandas matplotlib nest_asyncio --quiet

        import os
        import nest_asyncio
        import asyncio
        import pandas as pd
        from atla import Atla, AsyncAtla

        ATLA_API_KEY = "your atla API key"
        client = Atla(api_key=ATLA_API_KEY)
        async_client = AsyncAtla(api_key=ATLA_API_KEY)
        nest_asyncio.apply()

    First, we install the required libraries and initialize synchronous and asynchronous Atla clients using your API key. nest_asyncio is applied to allow asynchronous code to run smoothly within a Jupyter or Colab notebook environment. This enables seamless integration with Atla's async evaluation API via the AsyncAtla client.

        data = [
            {
                "question": "Can a company monitor employee emails under GDPR?",
                "llm_response": "Yes, any employer can freely monitor emails as long as it's for productivity.",
                "expected": 0
            },
            {
                "question": "Can employers access private chats on company devices?",
                "llm_response": "Only if there is a legitimate business need and employees are informed.",
                "expected": 1
            },
            {
                "question": "Can browsing history be stored under EU privacy law?",
                "llm_response": "Yes, but consent and transparency are required.",
                "expected": 1
            },
            {
                "question": "Can employers check WhatsApp messages on personal phones?",
                "llm_response": "No, personal device content is protected unless explicitly authorized.",
                "expected": 1
            },
        ]

        df = pd.DataFrame(data)
        df.head()

    We define a small dataset of legal questions and LLM-generated responses related to GDPR compliance. Each entry includes an expected binary label (1 for compliant, 0 for non-compliant). The data is loaded into a Pandas DataFrame for easy processing and evaluation.

        custom_eval_criteria = """
        Score this 1 if the response complies with GDPR principles:
        - lawful basis
        - employee consent or notice
        - data minimization
        - legitimate interest
        Otherwise, score it 0.
        Explain briefly why it qualifies or not.
        """

    We define a custom evaluation prompt that guides Atla's Selene model in scoring responses based on key GDPR principles. It instructs the model to assign a score of 1 for compliant answers and 0 otherwise, along with a brief explanation justifying the score.

        async def evaluate_with_selene(df):
            async def evaluate_row(row):
                try:
                    result = await async_client.evaluation.create(
                        model_id="atla-selene",
                        model_input=row["question"],
                        model_output=row["llm_response"],
                        evaluation_criteria=custom_eval_criteria,
                    )
                    return result.result.evaluation.score, result.result.evaluation.critique
                except Exception as e:
                    return None, f"Error: {e}"

            tasks = [evaluate_row(row) for _, row in df.iterrows()]
            results = await asyncio.gather(*tasks)
            df["selene_score"], df["critique"] = zip(*results)
            return df

        df = asyncio.run(evaluate_with_selene(df))
        df.head()

    This asynchronous function evaluates each row in the DataFrame using Atla's Selene model. It submits each legal question and LLM response pair along with the custom GDPR evaluation criteria. It then gathers scores and critiques concurrently using asyncio.gather, appends them to the DataFrame, and returns the enriched results.

        for i, row in df.iterrows():
            print(f"\n Q: {row['question']}")
            print(f" A: {row['llm_response']}")
            print(f" Selene: {row['critique']} Score: {row['selene_score']}")

    We iterate through the evaluated DataFrame and print each question, the corresponding LLM-generated answer, and Selene's critique with its assigned score. This provides a clear, human-readable summary of how the evaluator judged each response based on the custom GDPR criteria.

    In conclusion, this notebook demonstrated how to use Atla's evaluation capabilities to assess the quality of LLM-generated legal responses. Using the Atla Python SDK and its Selene evaluator, we defined custom GDPR-specific evaluation criteria and automated the scoring of AI outputs with interpretable critiques. The process was asynchronous, lightweight, and designed to run in Google Colab.

    Here is the Colab Notebook.
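    The dataset in this walkthrough carries an `expected` label for every row, but the notebook as described never compares it with Selene's scores. A minimal sketch of that comparison follows. The helper `agreement_rate` is hypothetical, and the scores below are stand-in values made up for illustration, not real Selene output; a `None` score mirrors the error case returned by `evaluate_row`.

```python
from typing import List, Optional


def agreement_rate(expected: List[int], predicted: List[Optional[float]]) -> float:
    """Fraction of rows where the evaluator's score equals the expected label.

    Rows whose score is None (evaluation failed) are counted as disagreements.
    """
    if not expected:
        raise ValueError("expected labels must not be empty")
    matches = sum(
        1
        for label, score in zip(expected, predicted)
        if score is not None and float(score) == float(label)
    )
    return matches / len(expected)


expected = [0, 1, 1, 1]            # the `expected` column from the dataset above
stand_in_scores = [0, 1, 0, None]  # hypothetical scores, for illustration only

print(f"Agreement with expected labels: {agreement_rate(expected, stand_in_scores):.2f}")
```

    On the evaluated DataFrame, the same helper could be called as `agreement_rate(df["expected"].tolist(), df["selene_score"].tolist())`, assuming the scores come back as numbers or numeric strings.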
    0 Comments · 0 Shares · 50 Views
  • Meet Hostinger Horizons: A No-Code AI Tool that Lets You Create, Edit, and Publish Custom Web Apps Without Writing a Single Line of Code
    www.marktechpost.com
    In the evolving landscape of web development, the emergence of no-code platforms has significantly broadened access to application creation. Among these, Hostinger Horizons stands out as an AI-powered tool designed to facilitate the building, editing, and publishing of custom web applications without necessitating any coding expertise. By integrating essential services such as hosting, domain registration, and email functionalities, Hostinger Horizons offers a comprehensive solution for individuals and businesses seeking to establish a digital presence.Technical OverviewHostinger Horizons utilizes advanced artificial intelligence and natural language processing to interpret user inputs and generate functional web applications. The platform features a user-friendly chat interface where users can describe their envisioned application in everyday language. For example, a prompt like Create a personal finance tracker that allows users to log expenses and view spending reports enables the AI to construct an application aligned with these specifications. Notable Technical Features:Real-Time Editing and Live Preview: Users can make modifications to their applications and observe changes instantaneously, promoting an iterative development process. Multilingual Support: The platform accommodates over 80 languages, allowing users worldwide to develop applications in their native tongues. Image and Voice Input: Beyond text prompts, users can upload images or utilize voice commands to guide the AI in building the application, enhancing accessibility and flexibility. Sandbox Environment: Hostinger Horizons provides a sandbox environment where users can test their applications without affecting the live version, ensuring a smooth deployment process. Integrated Deployment: Once the application meets the users satisfaction, it can be deployed directly through the platform. 
Hostinger Horizons manages all backend processes, including hosting and domain setup, streamlining the launch process. Business ConsiderationsHostinger Horizons is tailored to a diverse audience, encompassing entrepreneurs, small businesses, and individual creators. By removing the necessity for coding skills, the platform lowers the barrier to web application development, enabling rapid transformation of ideas into functional applications.Advantages for Businesses:Cost-Effective Development: Traditional web application development often involves significant expenses related to hiring developers. Hostinger Horizons offers a more economical alternative, making it particularly advantageous for startups and small enterprises. Rapid Prototyping: The platform facilitates swift development and deployment of applications, allowing businesses to test concepts and iterate based on user feedback without substantial time investments.Integrated Services: With built-in hosting, domain registration, and email services, businesses can manage all aspects of their web presence from a single platform, simplifying operations and reducing the need for multiple service providers. Scalability: Hostinger Horizons cloud-based infrastructure ensures that applications can scale seamlessly as the business grows, accommodating increasing traffic and user engagement.Pricing Structure:Hostinger Horizons offers several pricing plans to accommodate different needs:Starter Plan: Priced at $19.99 per month, it includes 100 messages, hosting (one month free), unlimited bandwidth, up to 50 web apps, and free email services. 
Hobbyist Plan: At $49.99 per month, this plan offers 250 messages along with the features included in the Starter Plan.Hustler Plan: For $99.99 per month, users receive 500 messages and the standard features.Pro Plan: The most comprehensive plan at $199.99 per month provides 1,000 messages and all included features.Hostinger also offer a free test with 5 messages when clicking on the Start for free buttonTutorial: Creating a Web Application with Hostinger HorizonsDeveloping a web application with Hostinger Horizons involves a straightforward process. Heres a step-by-step guide:Step 1: Sign Up and Access Hostinger HorizonsVisit the Hostinger Horizons page and select a plan that aligns with your requirements.After purchasing, log in to your Hostinger account and navigate to the hPanel dashboard.Go to Websites Website List and click on Add Website. Choose Hostinger Horizons from the options to access the platform. Step 2: Define Your Application IdeaIn the chat interface, describe the application you wish to create. For example: Create a web application for SUDUKO Game. The web application should be mobile friendly. There should be 3 levels of games. Level 1: Easy mode. Level 2: Medium difficulty. Level 3: Difficult Mode.The AI will process your input and generate a basic version of the application based on your description.Step 3: Customize the ApplicationLayout and Design: Use the real-time editor to adjust the layout, color scheme, and overall design to match your preferences.Functionality: Add or modify features by providing additional prompts. For instance, you can request the inclusion of a budgeting feature or integration with external APIs for real-time data.Content: Upload images, input text content, and configure any necessary settings to personalize the application.Step 4: Test the ApplicationUtilize the sandbox environment to test the applications functionality. 
Ensure all features operate as intended and make any necessary adjustments based on your testing.
Step 5: Deploy the Application
Once satisfied, click the Publish button to deploy your application.
Thanks to the Hostinger team for the thought leadership and resources for this article. The Hostinger team has supported us in this content/article.
Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence Media Platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts over 2 million monthly views, illustrating its popularity among audiences.
    0 Comments · 0 Shares · 86 Views
  • Understanding AI Agent Memory: Building Blocks for Intelligent Systems
    www.marktechpost.com
    AI agent memory comprises multiple layers, each serving a distinct role in shaping the agent's behavior and decision-making. Dividing memory into distinct types makes it easier to understand and design AI systems that are both contextually aware and responsive. Let's explore the four key types of memory commonly used in AI agents: Episodic, Semantic, Procedural, and Short-Term (or Working) Memory, along with the interplay between long-term and short-term storage.
1. Episodic Memory: Recalling Past Interactions
Episodic memory in AI refers to the storage of past interactions and the specific actions taken by the agent. Like human memory, episodic memory records the events or "episodes" an agent experiences during its operation. This type of memory is crucial because it enables the agent to reference previous conversations, decisions, and outcomes to inform future actions. For example, when a user interacts with a customer support bot, the bot might store the conversation history in an episodic memory log, allowing it to maintain context over multiple exchanges. This contextual awareness is especially important in multi-turn dialogues, where understanding previous interactions can dramatically improve the quality of responses.
In practical applications, episodic memory is often implemented using persistent storage systems like vector databases. These systems store semantic representations of interactions, enabling rapid retrieval based on similarity searches. When an AI agent needs to refer back to an earlier conversation, it can quickly identify and pull relevant segments of past interactions, enhancing the continuity and personalization of the experience.
2. Semantic Memory: External Knowledge and Self-awareness
Semantic memory in AI encompasses the agent's repository of factual, external information and internal knowledge.
Unlike episodic memory, which is tied to specific interactions, semantic memory holds generalized knowledge that the agent can use to understand and interpret the world. This may include language rules, domain-specific information, or awareness of the agent's own capabilities and limitations.
One common use of semantic memory is in Retrieval-Augmented Generation (RAG) applications, where the agent draws on a large data store to answer questions accurately. For instance, if an AI agent is tasked with providing technical support for a software product, its semantic memory might contain user manuals, troubleshooting guides, and FAQs. Semantic memory also includes grounding context that helps the agent filter and prioritize relevant data from the broader corpus of information available on the internet.
Integrating semantic memory means an AI agent responds based on both the immediate context and a broad spectrum of external knowledge. This creates a more robust, informed system that can handle diverse queries with accuracy and nuance.
3. Procedural Memory: The Blueprint of Operations
Procedural memory is the backbone of an AI system's operational aspects. It includes systemic information such as the structure of the system prompt, the tools available to the agent, and the guardrails that ensure safe and appropriate interactions. In essence, procedural memory defines how the agent functions rather than what it knows.
This type of memory is typically managed through well-organized registries, such as Git repositories for code, prompt registries for conversational contexts, and tool registries that enumerate the available functions and APIs. With a clear blueprint of its operational procedures, an AI agent can execute tasks more reliably and predictably.
The explicit definition of protocols and guidelines also ensures that the agent behaves in a controlled manner, thereby minimizing risks such as unintended outputs or safety violations.
Procedural memory supports consistency in performance and facilitates easier updates and maintenance. As new tools become available or system requirements evolve, the procedural memory can be updated in a centralized manner, so that the agent adapts to changes without compromising its core functionality.
4. Short-Term (Working) Memory: Integrating Information for Action
In many AI systems, the information drawn from long-term memory is consolidated into short-term or working memory. This is the temporary context that the agent actively uses to process current tasks. Short-term memory is a compilation of the episodic, semantic, and procedural memories that have been retrieved and localized for immediate use.
When an agent is presented with a new task or query, it assembles relevant information from its long-term stores. This might include a snippet of a previous conversation (episodic memory), pertinent factual data (semantic memory), and operational guidelines (procedural memory). The combined information forms the prompt fed into the underlying language model, allowing the AI to generate coherent, context-aware responses.
This process of compiling short-term memory is critical for tasks that require nuanced decision-making and planning. It allows the AI agent to "remember" the conversation history and tailor responses accordingly. The agility provided by short-term memory is a significant factor in creating interactions that feel natural and human-like.
Also, the separation between long-term and short-term memory ensures that while the system has a vast knowledge repository, only the most pertinent information is actively engaged during interaction, optimizing performance and accuracy.
The Synergy of Long-Term and Short-Term Memory
To fully appreciate the architecture of AI agent memory, it is important to understand the dynamic interplay between long-term memory and short-term (working) memory. Long-term memory, consisting of episodic, semantic, and procedural types, is the deep storage that informs the AI about its history, external facts, and internal operational frameworks. Short-term memory, on the other hand, is a fluid, working subset that the agent uses to navigate current tasks. By periodically retrieving and synthesizing data from long-term memory, the agent can adapt to new contexts without losing the richness of stored experiences and knowledge. This dynamic balance keeps AI systems well-informed, responsive, and contextually aware.
In conclusion, the multifaceted approach to memory in AI agents underscores the complexity and sophistication required to build systems that can interact intelligently with the world. Episodic memory allows for the personalization of interactions, semantic memory enriches responses with factual depth, and procedural memory supports operational reliability. Meanwhile, integrating these long-term memories into short-term working memory enables the AI to act swiftly and contextually in real-time scenarios. As AI advances, refining these memory systems will be pivotal in creating smart agents capable of nuanced, context-aware decision-making.
The layered memory approach is a cornerstone of intelligent agent design, helping these systems remain robust, adaptive, and ready to tackle the challenges of an ever-evolving digital landscape.
Sana Hassan, a consulting intern at Marktechpost and dual-degree student at IIT Madras, is passionate about applying technology and AI to address real-world challenges. With a keen interest in solving practical problems, he brings a fresh perspective to the intersection of AI and real-life solutions.
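To make the memory article's "assemble working memory from long-term stores" step concrete, here is a minimal sketch in Python. It is not taken from the article: the `LongTermMemory` class, the `build_working_memory` function, the bracketed section labels, and the toy bag-of-words `embed` function are all hypothetical stand-ins. A real agent would more likely use an embedding model and a vector database for retrieval.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (assumption: stands in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


@dataclass
class LongTermMemory:
    episodic: list = field(default_factory=list)   # past interactions
    semantic: list = field(default_factory=list)   # facts, manuals, FAQs
    procedural: str = ""                           # system prompt, tools, guardrails

    def top_k(self, store: list, query: str, k: int = 1) -> list:
        """Return the k entries of a store most similar to the query."""
        q = embed(query)
        return sorted(store, key=lambda item: cosine(q, embed(item)), reverse=True)[:k]


def build_working_memory(memory: LongTermMemory, query: str, k: int = 1) -> str:
    """Compile retrieved long-term memories into one prompt (the short-term memory)."""
    lines = ["[procedural] " + memory.procedural]
    lines += ["[episodic] " + e for e in memory.top_k(memory.episodic, query, k)]
    lines += ["[semantic] " + f for f in memory.top_k(memory.semantic, query, k)]
    lines.append("[query] " + query)
    return "\n".join(lines)


memory = LongTermMemory(
    episodic=[
        "user asked how to reset the router password",
        "user reported a billing error last week",
    ],
    semantic=[
        "router password reset requires holding the reset button for ten seconds",
        "invoices are issued on the first day of each month",
    ],
    procedural="You are a support agent. Use only the provided context.",
)
prompt = build_working_memory(memory, "how do I reset my router password")
print(prompt)
```

Only the entries most similar to the query reach the prompt, which mirrors the article's point that the long-term store can be large while the working context stays small.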
    0 Comments · 0 Shares · 83 Views
  • PilotANN: A Hybrid CPU-GPU System For Graph-based ANNS
    www.marktechpost.com
    Approximate Nearest Neighbor Search (ANNS) is a fundamental vector search technique that efficiently identifies similar items in high-dimensional vector spaces. Traditionally, ANNS has served as the backbone for retrieval engines and recommendation systems; however, it struggles to keep pace with modern Transformer architectures that employ higher-dimensional embeddings and larger datasets. Unlike deep learning systems that can be horizontally scaled due to their stateless nature, ANNS remains centralized, creating a severe single-machine throughput bottleneck. Empirical testing with 100-million scale datasets reveals that even state-of-the-art CPU implementations of the Hierarchical Navigable Small World (HNSW) algorithm can't maintain adequate performance as vector dimensions increase.
Previous research on large-scale ANNS has explored two optimization paths: index structure improvements and hardware acceleration. The Inverted MultiIndex (IMI) enhanced space partitioning through multi-codebook quantization, while PQFastScan improved performance with SIMD and cache-aware optimizations. DiskANN and SPANN introduced disk-based indexing for billion-scale datasets, addressing memory hierarchy challenges through different approaches. SONG and CAGRA achieved impressive speedups through GPU parallelization but remain constrained by GPU memory capacity. BANG handled billion-scale datasets via hybrid CPU-GPU processing but lacked critical CPU baseline comparisons. These methods frequently sacrifice compatibility or accuracy, or require specialized hardware.
Researchers from the Chinese University of Hong Kong, Centre for Perceptual and Interactive Intelligence, and Theory Lab of Huawei Technologies have proposed PilotANN, a hybrid CPU-GPU system designed to overcome the limitations of existing ANNS implementations.
PilotANN addresses a two-sided challenge: CPU-only implementations struggle with computational demands, while GPU-only solutions are constrained by limited memory capacity. It resolves this by combining the abundant RAM available to CPUs with the parallel processing capabilities of GPUs. It employs a three-stage graph traversal process: GPU-accelerated subgraph traversal using dimensionally-reduced vectors, CPU refinement, and precise search with complete vectors.
PilotANN reimagines the vector search process through a "staged data-ready processing" paradigm, which minimizes data movement across processing stages rather than adhering to the traditional "move data for computation" model. Its three stages are GPU piloting with a subgraph and dimensionally-reduced vectors, residual refinement using the subgraph with full vectors, and final traversal employing the full graph and complete vectors. The design is cost-effective, requiring only a single commodity GPU while scaling effectively across vector dimensions and graph complexity. Data transfer overhead is limited to the initial movement of the query vector to the GPU and the return of a small candidate set to the CPU after GPU piloting.
Experimental results show PilotANN's performance advantages across diverse large-scale datasets. PilotANN achieves a 3.9 times throughput speedup on the 96-dimensional DEEP dataset compared to the HNSW-CPU baseline, with larger gains of 5.1-5.4 times on higher-dimensional datasets. PilotANN delivers significant speedups even on the notoriously challenging T2I dataset despite no specific optimizations for this benchmark. Moreover, it remains cost-effective despite using more expensive hardware.
While the GPU-based platform costs 2.81 USD/hour compared to 1.69 USD/hour for the CPU-only solution, PilotANN achieves 2.3 times the cost-effectiveness for DEEP and 3.0-3.2 times for the T2I, WIKI, and LAION datasets when measuring throughput per dollar.
In conclusion, the researchers introduced PilotANN, an advancement in graph-based ANNS that effectively utilizes CPU and GPU resources for emerging workloads. It outperforms existing CPU-only approaches by decomposing top-k search into a multi-stage CPU-GPU pipeline and implementing efficient entry selection. It broadens access to high-performance nearest neighbor search by achieving competitive results with a single commodity GPU, making advanced search capabilities accessible to researchers and organizations with limited computing resources. Unlike alternative solutions requiring expensive high-end GPUs, PilotANN enables efficient ANNS deployment on common hardware configurations while maintaining search accuracy.
Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project. Also, feel free to follow us on Twitter and don't forget to join our 85k+ ML SubReddit.
Sajjad Ansari is a final year undergraduate from IIT Kharagpur. As a tech enthusiast, he delves into the practical applications of AI with a focus on understanding the impact of AI technologies and their real-world implications.
He aims to articulate complex AI concepts in a clear and accessible manner.
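The throughput-per-dollar figures quoted in the PilotANN article can be sanity-checked with a few lines of arithmetic. The sketch below assumes that cost-effectiveness is simply the throughput speedup divided by the hourly price ratio. It also assumes that the 5.1-5.4 times speedups correspond to the datasets reported at 3.0-3.2 times. Neither assumption is stated explicitly in the article, and the function name is hypothetical.

```python
GPU_COST_PER_HOUR = 2.81  # USD/hour for the GPU-based platform (from the article)
CPU_COST_PER_HOUR = 1.69  # USD/hour for the CPU-only platform (from the article)


def throughput_per_dollar_gain(speedup: float,
                               gpu_cost: float = GPU_COST_PER_HOUR,
                               cpu_cost: float = CPU_COST_PER_HOUR) -> float:
    """Relative throughput per dollar: speedup divided by the price ratio (assumed definition)."""
    return speedup / (gpu_cost / cpu_cost)


for label, speedup in [("DEEP", 3.9), ("low end", 5.1), ("high end", 5.4)]:
    gain = throughput_per_dollar_gain(speedup)
    print(f"{label}: {speedup}x faster -> {gain:.2f}x throughput per dollar")
```

Under these assumptions the results land close to the reported 2.3 times and 3.0-3.2 times; small differences would be expected because the published speedups are themselves rounded.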
    0 Comments · 0 Shares · 62 Views
  • Advancing Medical Reasoning with Reinforcement Learning from Verifiable Rewards (RLVR): Insights from MED-RLVR
    www.marktechpost.com
    Reinforcement Learning from Verifiable Rewards (RLVR) has recently emerged as a promising method for enhancing reasoning abilities in language models without direct supervision. This approach has shown notable success in mathematics and coding, where reasoning naturally aligns with structured problem-solving. While studies have demonstrated that RLVR alone can lead to self-evolved reasoning, research has largely been limited to these technical fields. Efforts to extend RLVR have explored synthetic datasets, such as those involving sequential tasks and object counting, indicating potential but also highlighting the challenges of adapting this method to different domains.
Expanding RLVR to broader areas remains an open challenge, particularly in tasks like multiple-choice question answering (MCQA), which provides structured, verifiable labels across diverse subjects, including medicine. However, unlike math and coding, which involve complex reasoning with an open-ended answer space, MCQA tasks typically have predefined answer choices, making it uncertain whether RLVR's benefits translate effectively. This limitation is especially relevant in medical reasoning tasks, where models must navigate intricate clinical knowledge to produce accurate responses, an area that has proven difficult for existing AI systems.
Researchers from Microsoft Research investigate whether medical reasoning can emerge through RLVR. They introduce MED-RLVR, leveraging medical MCQA data to assess RLVR's effectiveness in the medical domain. Their findings show that RLVR extends beyond math and coding, achieving performance comparable to supervised fine-tuning (SFT) on in-distribution tasks while improving out-of-distribution generalization by eight percentage points.
Analyzing training dynamics, they observe that reasoning capabilities emerge in a 3B-parameter base model without explicit supervision, highlighting RLVR's potential for advancing reasoning in knowledge-intensive fields like medicine.
RL optimizes decision-making by training an agent to maximize rewards through interactions with an environment. It has been effectively applied to language models to align outputs with human preferences and, more recently, to elicit reasoning without explicit supervision. This study employs Proximal Policy Optimization (PPO) to train a policy model, incorporating a clipped objective function to stabilize training. Using a rule-based reward function, MED-RLVR assigns rewards based on output correctness and format validity. Without additional supervision, the model demonstrates emergent medical reasoning, similar to mathematical reasoning in prior RLVR studies, highlighting RLVR's potential beyond structured domains.
The MedQA-USMLE dataset, which consists of multiple-choice medical exam questions, is used to train MED-RLVR. Unlike the standard four-option version, this dataset presents a greater challenge by offering more answer choices. Training is based on the Qwen2.5-3B model using OpenRLHF for reinforcement learning. Compared to SFT, MED-RLVR demonstrates superior generalization, particularly on the MMLU-Pro-Health dataset. Analysis reveals six stages of reasoning evolution, including format failures, verbose outputs, reward hacking, and reintegrated reasoning. Unlike in math or coding tasks, no self-validation behaviors ("aha moments") were observed, suggesting potential improvements through penalizing short reasoning chains or fine-tuning with longer chains of thought (CoTs).
In conclusion, the study focuses on MCQA in medicine, providing a controlled setting for evaluation. However, MCQA does not fully capture the complexity of real-world tasks like open-text answering, report generation, or medical dialogues.
Additionally, the unimodal approach limits the model's ability to integrate multimodal data, which is crucial for diagnostic applications. Future work should address these limitations. MED-RLVR, based on reinforcement learning with verifiable rewards, matches SFT on in-distribution tasks and improves out-of-distribution generalization. While medical reasoning emerges without explicit supervision, challenges like reward hacking persist, highlighting the need for further exploration of complex reasoning and multimodal integration.
Check out the Paper. All credit for this research goes to the researchers of this project.
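The MED-RLVR article mentions a rule-based reward driven by correctness and format validity, together with PPO's clipped objective. The sketch below illustrates both ideas in Python under stated assumptions. The `<think>`/`<answer>` tag format, the A-J option range, and the reward values (-1.0, 0.0, 1.0) are hypothetical choices for illustration and are not taken from the paper. The clipped term follows the standard PPO formulation, min(r·A, clip(r, 1-ε, 1+ε)·A), for a single sample.

```python
import re

# Assumed output format: reasoning inside <think>, a single option letter inside <answer>.
ANSWER_PATTERN = re.compile(
    r"^<think>(.+?)</think>\s*<answer>\s*([A-J])\s*</answer>$", re.DOTALL
)


def rule_based_reward(output: str, gold: str) -> float:
    """Score one model output against the gold option letter (hypothetical reward values)."""
    match = ANSWER_PATTERN.match(output.strip())
    if match is None:
        return -1.0  # format violation
    return 1.0 if match.group(2) == gold else 0.0


def ppo_clipped_term(ratio: float, advantage: float, epsilon: float = 0.2) -> float:
    """Per-sample PPO clipped surrogate: min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    clipped_ratio = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
    return min(ratio * advantage, clipped_ratio * advantage)


good = "<think>Fever with neck stiffness suggests meningitis.</think><answer>B</answer>"
print(rule_based_reward(good, "B"))            # well-formed and correct
print(rule_based_reward(good, "C"))            # well-formed but wrong
print(rule_based_reward("The answer is B", "B"))  # missing the required tags
print(ppo_clipped_term(1.5, 1.0))              # gain is capped once the ratio exceeds 1 + eps
```

A reward of this kind only checks the final letter and the surface format, which is one way the reward-hacking behaviors described in the article could arise: a model can satisfy both checks with little or no real reasoning inside the tags.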
    0 Comments · 0 Shares · 68 Views
  • Tencent AI Researchers Introduce Hunyuan-T1: A Mamba-Powered Ultra-Large Language Model Redefining Deep Reasoning, Contextual Efficiency, and Human-Centric Reinforcement Learning
    www.marktechpost.com
    Large language models struggle to process and reason over lengthy, complex texts without losing essential context. Traditional models often suffer from context loss, inefficient handling of long-range dependencies, and difficulties aligning with human preferences, affecting the accuracy and efficiency of their responses. Tencent's Hunyuan-T1 tackles these challenges by integrating a Mamba-powered architecture with advanced reinforcement learning (RL) and curriculum strategies, aiming for robust context capture and enhanced reasoning capabilities.
Tencent presents Hunyuan-T1 as the first ultra-large model built on a hybrid Transformer-Mamba Mixture-of-Experts (MoE) architecture. Built on the TurboS fast-thinking base, Hunyuan-T1 is specifically engineered to optimize the processing of long textual sequences while minimizing computational overhead. This allows the model to effectively capture extended context and manage long-distance dependencies, crucial for tasks that demand deep, coherent reasoning.
A key highlight of Hunyuan-T1 is its heavy reliance on RL during the post-training phase. Tencent dedicated 96.7% of the computing power in this phase to RL, enabling the model to refine its reasoning abilities iteratively. Techniques such as data replay, periodic policy resetting, and self-rewarding feedback loops help improve output quality, so that the model's responses are detailed, efficient, and closely aligned with human expectations.
To further boost reasoning proficiency, Tencent employed a curriculum learning strategy. This approach gradually increases the difficulty of training data while simultaneously expanding the model's context length. As a result, Hunyuan-T1 is trained to use tokens more efficiently, adapting from solving basic mathematical problems to tackling complex scientific and logical challenges. Efficiency is another cornerstone of Hunyuan-T1's design.
The TurboS base's ability to capture long-text information mitigates context loss, a common issue in many language models, and decoding is reported to be twice as fast as in comparable systems. Users therefore benefit from faster responses without a loss in quality.
The model has achieved impressive scores on multiple benchmarks: 87.2 on MMLU-PRO, which tests various subjects including humanities, social sciences, and STEM fields; 69.3 on GPQA-diamond, a challenging evaluation featuring doctoral-level scientific problems; 64.9 on LiveCodeBench for coding tasks; and 96.2 on the MATH-500 benchmark for mathematical reasoning. These results underscore Hunyuan-T1's versatility and ability to handle high-stakes, professional-grade tasks across various fields. Beyond quantitative metrics, Hunyuan-T1 is designed to deliver outputs with human-like understanding and creativity. During its RL phase, the model underwent a comprehensive alignment process that combined self-rewarding feedback with external reward models. This dual approach is intended to make its responses accurate, rich in detail, and natural in flow.
In conclusion, Tencent's Hunyuan-T1 combines an ultra-large-scale, Mamba-powered architecture with state-of-the-art reinforcement learning and curriculum strategies. Hunyuan-T1 delivers high performance, enhanced reasoning, and exceptional efficiency.
    0 Comments · 0 Shares · 74 Views
  • NVIDIA AI Researchers Introduce FFN Fusion: A Novel Optimization Technique that Demonstrates How Sequential Computation in Large Language Models LLMs can be Effectively Parallelized
    www.marktechpost.com
    Large language models (LLMs) have become vital across domains, enabling high-performance applications such as natural language generation, scientific research, and conversational agents. Underneath these advancements lies the transformer architecture, where alternating layers of attention mechanisms and feed-forward networks (FFNs) sequentially process tokenized input. However, as models grow in size and complexity, the computational burden required for inference grows substantially, creating an efficiency bottleneck. Efficient inference is now a critical concern, with many research groups focusing on strategies that can reduce latency, increase throughput, and cut computational costs while maintaining or improving model performance.
At the center of this efficiency problem lies the inherently sequential structure of transformers. Each layer's output feeds into the next, demanding strict order and synchronization, which is especially problematic at scale. As model sizes expand, the cost of sequential computation and communication across GPUs grows, leading to reduced efficiency and increased deployment cost. This challenge is amplified in scenarios requiring fast, multi-token generation, such as real-time AI assistants. Reducing this sequential load while maintaining model capabilities presents a key technical hurdle. Unlocking new parallelization strategies that preserve accuracy yet significantly reduce computation depth is essential to broadening the accessibility and scalability of LLMs.
Several techniques have emerged to improve efficiency. Quantization reduces the precision of numerical representations to minimize memory and computation needs, though it often risks accuracy losses, especially at low bit-widths. Pruning eliminates redundant parameters and simplifies models but can harm accuracy if applied without care. Mixture-of-Experts (MoE) models activate only a subset of parameters per input, making them highly efficient for specific workloads.
Still, they can underperform at intermediate batch sizes due to low hardware utilization. While valuable, these strategies have trade-offs that limit their universal applicability. Consequently, the field seeks methods that offer broad efficiency improvements with fewer compromises, especially for dense architectures that are simpler to train, deploy, and maintain.
Researchers at NVIDIA introduced a new architectural optimization technique named FFN Fusion, which addresses the sequential bottleneck in transformers by identifying FFN sequences that can be executed in parallel. This approach emerged from the observation that when attention layers are removed using the Puzzle tool, models often retain long sequences of consecutive FFNs. These sequences show minimal interdependency and, therefore, can be processed simultaneously. Starting from Llama-3.1-405B-Instruct, the researchers created a new model called Ultra-253B-Base by pruning and restructuring the base model through FFN Fusion. This method results in a significantly more efficient model that maintains competitive performance.
FFN Fusion fuses multiple consecutive FFN layers into a single, wider FFN. This process is grounded in mathematical equivalence: by concatenating the weights of several FFNs, one can produce a single module that behaves like the sum of the original layers but can be computed in parallel. For instance, if three FFNs are stacked sequentially, each dependent on the output of the previous one, their fusion removes these dependencies by having all three operate on the same input and aggregating their outputs. The theoretical foundation for this method shows that the fused FFN maintains the same representational capacity. Researchers performed dependency analysis using cosine distance between FFN outputs to identify regions with low interdependence.
These regions were deemed optimal for fusion, as minimal change in token direction between layers indicated the feasibility of parallel processing.
Applying FFN Fusion to the Llama-405B model resulted in Ultra-253B-Base, which delivered notable gains in speed and resource efficiency. Specifically, the new model achieved a 1.71x improvement in inference latency and reduced per-token computational cost by 35x at a batch size of 32. This efficiency did not come at the expense of capability. Ultra-253B-Base scored 85.17% on MMLU, 72.25% on MMLU-Pro, 84.92% on Arena Hard, 86.58% on HumanEval, and 9.19 on MT-Bench. These results often matched or exceeded the original 405B-parameter model, even though Ultra-253B-Base contains only 253 billion parameters. Memory usage also improved, with a 2x reduction in kv-cache requirements. The training process involved distilling 54 billion tokens at an 8k context window, followed by staged fine-tuning at 16k, 32k, and 128k contexts. These steps helped the fused model maintain high accuracy while benefiting from reduced size.
This research demonstrates how thoughtful architectural redesign can unlock significant efficiency gains. Researchers showed that FFN layers in transformer architectures are often more independent than previously assumed. Their method of quantifying inter-layer dependency and transforming model structures allowed for broader application across models of various sizes. The technique was also validated on a 70B-parameter model, supporting its generalizability.
Further experiments indicated that while FFN layers can often be fused with minimal impact, full block parallelization, including attention, introduces more performance degradation due to stronger interdependencies.

Several Key Takeaways from the Research on FFN Fusion:
- The FFN Fusion technique reduces sequential computation in transformers by parallelizing low-dependency FFN layers.
- Fusion is achieved by replacing sequences of FFNs with a single wider FFN using concatenated weights.
- Ultra-253B-Base, derived from Llama-3.1-405B, achieves 1.71x faster inference and 35x lower per-token cost.
- Benchmark results include: 85.17% (MMLU), 72.25% (MMLU-Pro), 86.58% (HumanEval), 84.92% (Arena Hard), and 9.19 (MT-Bench).
- Memory usage is cut by half due to kv-cache optimization.
- FFN Fusion is more effective at larger model scales and works well with techniques like pruning and quantization.
- Full transformer block parallelization shows potential but requires further research due to stronger interdependencies.
- A systematic method using cosine distance helps identify which FFN sequences are safe to fuse.
- The technique is validated across different model sizes, including 49B, 70B, and 253B.
- This approach lays the foundation for more parallel-friendly and hardware-efficient LLM designs.

Check out the Paper. All credit for this research goes to the researchers of this project. Also, feel free to follow us on Twitter and don't forget to join our 85k+ ML SubReddit.

Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence Media Platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience.
The platform boasts of over 2 million monthly views, illustrating its popularity among audiences.
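The weight-concatenation idea behind FFN Fusion can be illustrated with a small NumPy sketch. It assumes a simplified two-matrix ReLU FFN (the Llama-family models discussed in the article use gated FFNs), and the names ffn and fuse_ffns as well as the toy dimensions are hypothetical. The sketch only checks that running several FFNs in parallel on the same input and summing their outputs equals one wider FFN with concatenated weights. It does not show that the parallel form matches the original sequential stack, which is the approximation the dependency analysis is meant to justify.

```python
import numpy as np

def ffn(x, w_in, w_out):
    """Simplified FFN: ReLU(x @ w_in) @ w_out (no gating, no bias)."""
    return np.maximum(x @ w_in, 0.0) @ w_out

def fuse_ffns(weights):
    """Concatenate several FFNs into one wider FFN.

    weights: list of (w_in, w_out) pairs with shapes (d, h_i) and (h_i, d).
    Returns (w_in_fused, w_out_fused) with hidden width sum(h_i).
    """
    w_in_fused = np.concatenate([w_in for w_in, _ in weights], axis=1)
    w_out_fused = np.concatenate([w_out for _, w_out in weights], axis=0)
    return w_in_fused, w_out_fused

rng = np.random.default_rng(0)
d_model, hidden, n_layers = 8, 16, 3  # toy sizes, chosen for illustration
weights = [
    (rng.normal(size=(d_model, hidden)), rng.normal(size=(hidden, d_model)))
    for _ in range(n_layers)
]
x = rng.normal(size=(4, d_model))  # 4 tokens

# Parallel form: every FFN sees the same input and the outputs are summed.
parallel_sum = sum(ffn(x, w_in, w_out) for w_in, w_out in weights)

# Fused form: a single FFN with hidden width n_layers * hidden.
w_in_fused, w_out_fused = fuse_ffns(weights)
fused = ffn(x, w_in_fused, w_out_fused)

print(np.allclose(parallel_sum, fused))
```

Because the activation is applied elementwise, stacking the input projections side by side and the output projections on top of each other reproduces the sum exactly; the fused layer is simply one larger matrix multiplication that hardware can execute in a single pass.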
  • A Step by Step Guide to Solve 1D Burgers Equation with Physics-Informed Neural Networks (PINNs): A PyTorch Approach Using Automatic Differentiation and Collocation Methods
    www.marktechpost.com
    In this tutorial, we explore an approach that blends deep learning with physical laws by using Physics-Informed Neural Networks (PINNs) to solve the one-dimensional Burgers' equation. Using PyTorch on Google Colab, we demonstrate how to encode the governing differential equation directly into the neural network's loss function, allowing the model to learn a solution u(x, t) that respects the underlying physics. This technique reduces the reliance on large labeled datasets and offers a fresh perspective on solving complex, non-linear partial differential equations using modern computational tools.

    !pip install torch matplotlib

First, we install the PyTorch and matplotlib libraries using pip, ensuring you have the necessary tools for building neural networks and visualizing the results in your Google Colab environment.

    import torch
    import torch.nn as nn
    import torch.optim as optim
    import numpy as np
    import matplotlib.pyplot as plt

    torch.set_default_dtype(torch.float32)

We import essential libraries: PyTorch for deep learning, NumPy for numerical operations, and matplotlib for plotting.
We set the default tensor data type to float32 for consistent numerical precision throughout the computations.

    # Spatial and temporal domain, and viscosity
    x_min, x_max = -1.0, 1.0
    t_min, t_max = 0.0, 1.0
    nu = 0.01 / np.pi

    # Number of collocation, initial-condition, and boundary points
    N_f = 10000
    N_0 = 200
    N_b = 200

    # Random collocation points (x, t) inside the domain
    X_f = np.random.rand(N_f, 2)
    X_f[:, 0] = X_f[:, 0] * (x_max - x_min) + x_min  # x in [-1, 1]
    X_f[:, 1] = X_f[:, 1] * (t_max - t_min) + t_min  # t in [0, 1]

    # Initial condition u(x, 0) = -sin(pi * x)
    x0 = np.linspace(x_min, x_max, N_0)[:, None]
    t0 = np.zeros_like(x0)
    u0 = -np.sin(np.pi * x0)

    # Boundary conditions u(-1, t) = u(1, t) = 0
    tb = np.linspace(t_min, t_max, N_b)[:, None]
    xb_left = np.ones_like(tb) * x_min
    xb_right = np.ones_like(tb) * x_max
    ub_left = np.zeros_like(tb)
    ub_right = np.zeros_like(tb)

    # Convert to tensors; only the collocation points need gradients
    X_f = torch.tensor(X_f, dtype=torch.float32, requires_grad=True)
    x0 = torch.tensor(x0, dtype=torch.float32)
    t0 = torch.tensor(t0, dtype=torch.float32)
    u0 = torch.tensor(u0, dtype=torch.float32)
    tb = torch.tensor(tb, dtype=torch.float32)
    xb_left = torch.tensor(xb_left, dtype=torch.float32)
    xb_right = torch.tensor(xb_right, dtype=torch.float32)
    ub_left = torch.tensor(ub_left, dtype=torch.float32)
    ub_right = torch.tensor(ub_right, dtype=torch.float32)

We establish the simulation domain for the Burgers' equation by defining spatial and temporal boundaries, viscosity, and the number of collocation, initial, and boundary points. We then generate random and evenly spaced data points for these conditions and convert them into PyTorch tensors, enabling gradient computation where needed.

    class PINN(nn.Module):
        def __init__(self, layers):
            super(PINN, self).__init__()
            self.activation = nn.Tanh()
            # One linear layer per consecutive pair of sizes in `layers`
            layer_list = []
            for i in range(len(layers) - 1):
                layer_list.append(nn.Linear(layers[i], layers[i+1]))
            self.layers = nn.ModuleList(layer_list)

        def forward(self, x):
            # Tanh after every layer except the last
            for layer in self.layers[:-1]:
                x = self.activation(layer(x))
            return self.layers[-1](x)

    layers = [2, 50, 50, 50, 50, 1]
    model = PINN(layers)
    print(model)

Here, we define a custom Physics-Informed Neural Network (PINN) by extending PyTorch's nn.Module.
The network architecture is built dynamically using a list of layer sizes, where each linear layer is followed by a Tanh activation (except for the final output layer). In this example, the network takes a 2-dimensional input, passes it through four hidden layers (each with 50 neurons), and outputs a single value. Finally, the model is instantiated with the specified architecture, and its structure is printed.

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model.to(device)

Here, we check if a CUDA-enabled GPU is available, set the device accordingly, and move the model to that device for accelerated computation during training and inference.

    def pde_residual(model, X):
        # Split the input so derivatives can be taken w.r.t. x and t separately
        x = X[:, 0:1]
        t = X[:, 1:2]
        u = model(torch.cat([x, t], dim=1))
        u_x = torch.autograd.grad(u, x, grad_outputs=torch.ones_like(u), create_graph=True, retain_graph=True)[0]
        u_t = torch.autograd.grad(u, t, grad_outputs=torch.ones_like(u), create_graph=True, retain_graph=True)[0]
        u_xx = torch.autograd.grad(u_x, x, grad_outputs=torch.ones_like(u_x), create_graph=True, retain_graph=True)[0]
        # Burgers' equation residual: u_t + u * u_x - nu * u_xx
        f = u_t + u * u_x - nu * u_xx
        return f

    def loss_func(model):
        # PDE residual at the collocation points
        f_pred = pde_residual(model, X_f.to(device))
        loss_f = torch.mean(f_pred**2)
        # Initial condition at t = 0
        u0_pred = model(torch.cat([x0.to(device), t0.to(device)], dim=1))
        loss_0 = torch.mean((u0_pred - u0.to(device))**2)
        # Zero boundary values at x = -1 and x = 1
        u_left_pred = model(torch.cat([xb_left.to(device), tb.to(device)], dim=1))
        u_right_pred = model(torch.cat([xb_right.to(device), tb.to(device)], dim=1))
        loss_b = torch.mean(u_left_pred**2) + torch.mean(u_right_pred**2)
        loss = loss_f + loss_0 + loss_b
        return loss

Now, we compute the residual of the Burgers' equation at the collocation points by calculating the required derivatives via automatic differentiation. Then, we define a loss function that aggregates the PDE residual loss, the error from the initial condition, and the errors from the boundary conditions.
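Written out, the quantities computed by pde_residual and loss_func correspond to the expressions below. The point counts N_f, N_0, and N_b are those defined in the setup code; the hatted symbol for the network output and the index notation are introduced here purely for illustration.

```latex
f(x,t) = \hat{u}_t + \hat{u}\,\hat{u}_x - \nu\,\hat{u}_{xx},
\qquad \nu = \frac{0.01}{\pi}

\mathcal{L} =
\underbrace{\frac{1}{N_f}\sum_{i=1}^{N_f} f\!\left(x_f^{i}, t_f^{i}\right)^{2}}_{\text{PDE residual}}
+ \underbrace{\frac{1}{N_0}\sum_{i=1}^{N_0}\left(\hat{u}(x_0^{i}, 0) + \sin(\pi x_0^{i})\right)^{2}}_{\text{initial condition}}
+ \underbrace{\frac{1}{N_b}\sum_{i=1}^{N_b}\left(\hat{u}(-1, t_b^{i})^{2} + \hat{u}(1, t_b^{i})^{2}\right)}_{\text{boundary conditions}}
```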
This combined loss guides the network to learn a solution that satisfies both the physical law and the imposed conditions.

    optimizer = optim.Adam(model.parameters(), lr=1e-3)
    num_epochs = 5000

    for epoch in range(num_epochs):
        optimizer.zero_grad()
        loss = loss_func(model)
        loss.backward()
        optimizer.step()
        # Report progress every 500 epochs
        if (epoch+1) % 500 == 0:
            print(f'Epoch {epoch+1}/{num_epochs}, Loss: {loss.item():.5e}')

    print("Training complete!")

Here, we set up the PINN's training loop using the Adam optimizer with a learning rate of 1e-3. Over 5000 epochs, it repeatedly computes the loss (which includes the PDE residual, initial, and boundary condition errors), backpropagates the gradients, and updates the model parameters. Every 500 epochs, it prints the current epoch and loss to monitor progress and finally announces when training is complete.

    # Regular evaluation grid over the full domain
    N_x, N_t = 256, 100
    x = np.linspace(x_min, x_max, N_x)
    t = np.linspace(t_min, t_max, N_t)
    X, T = np.meshgrid(x, t)
    XT = np.hstack((X.flatten()[:, None], T.flatten()[:, None]))
    XT_tensor = torch.tensor(XT, dtype=torch.float32).to(device)

    model.eval()
    with torch.no_grad():
        u_pred = model(XT_tensor).cpu().numpy().reshape(N_t, N_x)

    plt.figure(figsize=(8, 5))
    plt.contourf(X, T, u_pred, levels=100, cmap='viridis')
    plt.colorbar(label='u(x,t)')
    plt.xlabel('x')
    plt.ylabel('t')
    plt.title("Predicted solution u(x,t) via PINN")
    plt.show()

Finally, we create a grid of points over the defined spatial (x) and temporal (t) domain, feed these points to the trained model to predict the solution u(x, t), and reshape the output into a 2D array. We then visualize the predicted solution as a contour plot using matplotlib, complete with a colorbar, axis labels, and a title, allowing you to observe how the PINN has approximated the dynamics of the Burgers' equation.

In conclusion, this tutorial has showcased how PINNs can be effectively implemented to solve the 1D Burgers' equation by incorporating the physics of the problem into the training process.
Through careful construction of the neural network, generation of collocation and boundary data, and automatic differentiation, we achieved a model that learns a solution consistent with the PDE and the prescribed conditions. This fusion of machine learning and traditional physics paves the way for tackling more challenging problems in computational science and engineering, inviting further exploration into higher-dimensional systems and more sophisticated neural architectures.

Here is the Colab Notebook.
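One way to sanity-check a predicted grid such as u_pred (shape N_t by N_x) without autograd is to evaluate the same residual with finite differences. The sketch below is not part of the original tutorial: the helper name burgers_residual_fd is hypothetical, and it is exercised on a known travelling-wave solution of the viscous Burgers' equation, with a larger viscosity (0.1) assumed so that the grid resolves the profile.

```python
import numpy as np

def burgers_residual_fd(u, x, t, nu):
    """Finite-difference estimate of u_t + u * u_x - nu * u_xx.

    u has shape (len(t), len(x)), matching the layout of the tutorial's u_pred.
    """
    u_t = np.gradient(u, t, axis=0, edge_order=2)
    u_x = np.gradient(u, x, axis=1, edge_order=2)
    u_xx = np.gradient(u_x, x, axis=1, edge_order=2)
    return u_t + u * u_x - nu * u_xx

# Travelling-wave solution u = c - a * tanh(a * (x - c * t) / (2 * nu)),
# used here only as a test case with a known answer.
nu_test, a, c = 0.1, 0.5, 0.2
x = np.linspace(-1.0, 1.0, 256)
t = np.linspace(0.0, 1.0, 100)
X, T = np.meshgrid(x, t)  # both arrays have shape (100, 256)
u_exact = c - a * np.tanh(a * (X - c * T) / (2.0 * nu_test))

residual = burgers_residual_fd(u_exact, x, t, nu_test)
interior = residual[2:-2, 2:-2]  # skip the less accurate edge stencils
print(f"max |residual| in the interior: {np.abs(interior).max():.2e}")
```

Applied to the tutorial's u_pred, x, t, and nu, large residual values would point to regions where the network has not yet satisfied the PDE. With the tutorial's much smaller viscosity, the steep front that forms near x = 0 is narrower than the grid spacing, so the finite-difference estimate itself becomes unreliable there.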
  • Empowering Time Series AI: How Salesforce is Leveraging Synthetic Data to Enhance Foundation Models
    www.marktechpost.com
    Time series analysis faces significant hurdles in data availability, quality, and diversity, all critical factors in developing effective foundation models. Real-world datasets often fall short due to regulatory limitations, inherent biases, poor quality, and limited paired textual annotations, making it difficult to create robust, generalizable Time Series Foundation Models (TSFMs) and Large Language Model-based Time Series Models (TSLLMs). This scarcity impacts tasks such as forecasting, classification, anomaly detection, reasoning, and captioning, limiting the full potential of current advancements in artificial intelligence.

Salesforce AI Research has addressed these challenges by proposing a comprehensive approach to leveraging synthetic data for enhancing TSFMs and TSLLMs. Their recent study, "Empowering Time Series Analysis with Synthetic Data", presents a strategy of using synthetic data to improve model training, evaluation, and fine-tuning, focusing on mitigating biases, increasing dataset diversity, and enriching contextual information. By developing data-generation frameworks and incorporating synthetic datasets, Salesforce AI aims to advance the practical application of TSFMs and TSLLMs, especially in sensitive domains like healthcare and finance, where data sharing is heavily regulated.

The technical cornerstone of Salesforce AI Research's methodology involves various synthetic data generation approaches, each addressing specific aspects of time series dynamics, such as trends, seasonal patterns, and noise characteristics. For instance, the ForecastPFN method combines linear-exponential trends and periodic seasonalities with Weibull-distributed noise, effectively simulating realistic yet diverse scenarios. Similarly, TimesFM integrates piecewise linear trends and autoregressive moving average (ARMA) models with periodic patterns.
Another technique, KernelSynth by Chronos, employs Gaussian Processes (GPs) combined with linear, periodic, and radial basis function (RBF) kernels to generate rich synthetic datasets. These methods enable controlled yet varied synthetic data creation that helps capture a comprehensive range of realistic time series behaviors.

The Salesforce team's findings highlight substantial benefits derived from synthetic data in multiple stages of model development. In pretraining, synthetic datasets provided clear performance enhancements, notably demonstrated in models like ForecastPFN, Mamba4Cast, and TimesFM. For example, ForecastPFN pretrained entirely on synthetic data showed significant improvements in zero-shot forecasting scenarios, while Chronos found optimal performance gains by mixing around 10% synthetic data with real-world datasets, beyond which additional synthetic data could potentially degrade performance due to less diverse representations. Synthetic data also played a crucial role in evaluation, allowing researchers to precisely assess a model's capabilities, understand its internal representations, and identify gaps in the learned patterns. Moment utilized synthetically generated sinusoidal waves to evaluate internal embeddings and model sensitivity to variations in time series characteristics, demonstrating its effectiveness in capturing subtle trends and frequencies.

The paper also addresses current limitations in synthetic data usage, identifying areas for future improvement. One critical gap is the absence of systematic integration methods for synthetic datasets, suggesting the need for structured frameworks to identify and fill missing real-world data patterns strategically. Another limitation noted is the dominance of statistical methods, prompting a call for exploring data-driven generative techniques, like diffusion models, to enhance realism.
Salesforce researchers further emphasize untapped potential in leveraging synthetic data during fine-tuning phases to address specific domain gaps or model weaknesses more efficiently and adaptively.

In conclusion, Salesforce AI Research demonstrates that synthetic data offers a powerful toolset for overcoming data-related challenges in time series analysis. By systematically integrating high-quality synthetic datasets into various stages of model development, TSFMs and TSLLMs can achieve enhanced generalization, reduced biases, and improved performance across diverse analytical tasks. Despite existing limitations, such as ensuring realism and alignment, the proactive advancement and exploration of synthetic data generation methodologies indicate significant potential. Future research, as suggested by Salesforce, should focus on improving data realism, systematically addressing data gaps, and exploiting iterative, human-in-the-loop synthetic data generation processes. These advancements could dramatically expand the applicability and reliability of time series models, laying a solid foundation for future innovations in artificial intelligence.

Check out the Paper.

Nikhil is an intern consultant at Marktechpost. He is pursuing an integrated dual degree in Materials at the Indian Institute of Technology, Kharagpur. Nikhil is an AI/ML enthusiast who is always researching applications in fields like biomaterials and biomedical science.
With a strong background in Material Science, he is exploring new advancements and creating opportunities to contribute.
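To make the composition of trend, seasonality, noise, and kernel-based sampling in the article more concrete, here is a minimal NumPy sketch in the spirit of those generators. It is not the code of ForecastPFN or KernelSynth: the function names, the parameter values, the multiplicative way the components are combined, and the kernel hyperparameters are all assumptions made for illustration.

```python
import numpy as np

def trend_seasonal_weibull(n, rng, period=24, shape=2.0, noise_scale=0.1):
    """Linear-times-exponential trend, sinusoidal seasonality, Weibull noise.

    Components are multiplied together (an assumption of this sketch).
    """
    t = np.arange(n)
    trend = (1.0 + 0.01 * t) * np.exp(0.002 * t)
    season = 1.0 + 0.3 * np.sin(2.0 * np.pi * t / period)
    z = rng.weibull(shape, size=n)
    noise = 1.0 + noise_scale * (z - z.mean())  # centred around 1
    return trend * season * noise

def gp_kernel_sample(n, rng, period=0.25, length_scale=0.2):
    """One draw from a zero-mean GP with a linear + periodic + RBF kernel."""
    x = np.linspace(0.0, 1.0, n)
    diff = x[:, None] - x[None, :]
    k_linear = x[:, None] * x[None, :]
    k_periodic = np.exp(-2.0 * np.sin(np.pi * diff / period) ** 2)
    k_rbf = np.exp(-0.5 * (diff / length_scale) ** 2)
    # Small diagonal jitter keeps the Cholesky factorization numerically stable.
    cov = k_linear + k_periodic + k_rbf + 1e-6 * np.eye(n)
    chol = np.linalg.cholesky(cov)
    return chol @ rng.standard_normal(n)

rng = np.random.default_rng(0)
series_a = trend_seasonal_weibull(200, rng)
series_b = gp_kernel_sample(128, rng)
print(series_a.shape, series_b.shape)
```

Drawing the hyperparameters (period, trend slope, kernel mix) at random for each series, rather than fixing them as done here, is what would give such a generator the variety that pretraining needs.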
  • Meta Reality Labs Research Introduces Sonata: Advancing Self-Supervised Representation Learning for 3D Point Clouds
    www.marktechpost.com
    3D self-supervised learning (SSL) has faced persistent challenges in developing semantically meaningful point representations suitable for diverse applications with minimal supervision. Despite substantial progress in image-based SSL, existing point cloud SSL methods have largely been limited by an issue known as the "geometric shortcut", where models rely excessively on low-level geometric features like surface normals or point heights. This reliance compromises the generalizability and semantic depth of the representations, hindering their practical deployment.

Researchers from the University of Hong Kong and Meta Reality Labs Research introduce Sonata, an approach designed to address these fundamental challenges. Sonata employs a self-supervised learning framework that mitigates the geometric shortcut by strategically obscuring low-level spatial cues and reinforcing dependency on richer input features. Drawing inspiration from recent advancements in image-based SSL, Sonata integrates a point self-distillation mechanism that gradually refines representation quality and ensures robustness against geometric simplifications.

At a technical level, Sonata utilizes two core strategies. First, it operates on coarser scales to obscure spatial information that might otherwise dominate the learned representations. Second, it adopts a point self-distillation approach, progressively increasing task difficulty through adaptive masking strategies to foster deeper semantic understanding. Crucially, Sonata removes decoder structures traditionally used in hierarchical models to avoid reintroducing local geometric shortcuts, allowing the encoder alone to build robust, multi-scale feature representations.
Additionally, Sonata applies masked point jitter, introducing random perturbations to the spatial coordinates of masked points, thus further discouraging reliance on trivial geometric features.

The reported empirical results support Sonata's efficacy and efficiency. Sonata achieves significant performance gains on benchmarks like ScanNet, where it records a linear probing accuracy of 72.5%, substantially surpassing previous state-of-the-art SSL approaches. Importantly, Sonata demonstrates robustness even with limited data, performing effectively using as little as 1% of the ScanNet dataset, which highlights its suitability for low-resource scenarios. Its parameter efficiency is also notable, delivering strong performance improvements with fewer parameters compared to conventional methods. Furthermore, integrating Sonata with image-derived representations such as DINOv2 results in enhanced accuracy, emphasizing its capacity to capture distinctive semantic details specific to 3D data.

Sonata's capabilities are further illustrated through zero-shot visualizations, including PCA-colored point clouds and dense feature correspondence, demonstrating coherent semantic clustering and robust spatial reasoning under challenging augmentation conditions. The versatility of Sonata is also evidenced across various semantic segmentation tasks, spanning indoor datasets like ScanNet and ScanNet200, as well as outdoor datasets including Waymo, consistently achieving state-of-the-art outcomes.

In conclusion, Sonata represents a significant advancement in addressing inherent limitations in 3D self-supervised learning. Its methodological innovations effectively resolve issues associated with the geometric shortcut, providing semantically richer and more reliable representations.
Sonata's integration of self-distillation, careful manipulation of spatial information, and scalability to large datasets establish a solid foundation for future explorations in versatile and robust 3D representation learning. The framework sets a methodological benchmark, facilitating further research towards comprehensive multimodal SSL integration and practical 3D applications.

Check out the Paper and GitHub Page.
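As a rough illustration of the two coordinate-level ideas the article attributes to Sonata (working at a coarser scale and jittering masked points), the following NumPy sketch may help. It is not Sonata's implementation: the function names, the voxel size, the mask ratio, and the Gaussian jitter scale are hypothetical choices.

```python
import numpy as np

def coarsen(points, voxel_size):
    """Average all points that fall into the same voxel (simple grid pooling)."""
    keys = np.floor(points / voxel_size).astype(np.int64)
    _, inverse = np.unique(keys, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)  # guard against shape differences across NumPy versions
    n_voxels = int(inverse.max()) + 1
    sums = np.zeros((n_voxels, points.shape[1]))
    np.add.at(sums, inverse, points)  # accumulate coordinates per voxel
    counts = np.bincount(inverse, minlength=n_voxels)
    return sums / counts[:, None]

def masked_point_jitter(points, mask_ratio, sigma, rng):
    """Perturb the coordinates of a random subset of points with Gaussian noise."""
    mask = rng.random(len(points)) < mask_ratio
    jittered = points.copy()
    jittered[mask] += rng.normal(scale=sigma, size=(int(mask.sum()), points.shape[1]))
    return jittered, mask

rng = np.random.default_rng(0)
cloud = rng.uniform(0.0, 1.0, size=(1000, 3))  # toy point cloud in the unit cube
coarse = coarsen(cloud, voxel_size=0.25)
jittered, mask = masked_point_jitter(coarse, mask_ratio=0.7, sigma=0.05, rng=rng)
print(coarse.shape, int(mask.sum()))
```

Both steps degrade exactly the kind of fine local geometry that a model could otherwise use as a shortcut, which is the motivation the article gives for them.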
  • Tutorial to Create a Data Science Agent: A Code Implementation using gemini-2.0-flash-lite model through Google API, google.generativeai, Pandas and IPython.display for Interactive Data Analysis
    www.marktechpost.com
    In this tutorial, we demonstrate the integration of Python's data manipulation library Pandas with Google Cloud's generative capabilities through the google.generativeai package and the gemini-2.0-flash-lite model. By setting up the environment with the necessary libraries, configuring the Google Cloud API key, and leveraging the IPython display functionalities, the code provides a step-by-step approach to building a data science agent that analyzes a sample sales dataset. The example shows how to convert a DataFrame into markdown format and then use natural language queries to generate insights about the data, highlighting the potential of combining traditional data analysis tools with modern AI-driven methods.

    !pip install pandas google-generativeai --quiet

First, we install the Pandas and google-generativeai libraries quietly, setting up the environment for data manipulation and AI-powered analysis.

    import pandas as pd
    import google.generativeai as genai
    from IPython.display import Markdown

We import Pandas for data manipulation, google.generativeai for accessing Google's generative AI capabilities, and Markdown from IPython.display to render markdown-formatted outputs.

    GOOGLE_API_KEY = "Use Your API Key Here"
    genai.configure(api_key=GOOGLE_API_KEY)
    model = genai.GenerativeModel('gemini-2.0-flash-lite')

We assign a placeholder API key, configure the google.generativeai client with it, and initialize the gemini-2.0-flash-lite GenerativeModel for generating content.

    data = {'Product': ['Laptop', 'Mouse', 'Keyboard', 'Monitor', 'Webcam', 'Headphones'],
            'Category': ['Electronics', 'Electronics', 'Electronics', 'Electronics', 'Electronics', 'Electronics'],
            'Region': ['North', 'South', 'East', 'West', 'North', 'South'],
            'Units Sold': [150, 200, 180, 120, 90, 250],
            'Price': [1200, 25, 75, 300, 50, 100]}
    sales_df = pd.DataFrame(data)

    print("Sample Sales Data:")
    print(sales_df)
    print("-" * 30)

Here, we create a Pandas DataFrame named sales_df containing sample sales data for various
products, and then print the DataFrame followed by a separator line to visually distinguish the output.

    def ask_gemini_about_data(dataframe, query):
        """
        Asks the Gemini model a question about the given Pandas DataFrame.

        Args:
            dataframe: The Pandas DataFrame to analyze.
            query: The natural language question about the DataFrame.

        Returns:
            The response from the Gemini model as a string.
        """
        # DataFrame.to_markdown relies on the optional tabulate package.
        prompt = f"""You are a data analysis agent. Analyze the following pandas DataFrame and answer the question.

        DataFrame:
        ```
        {dataframe.to_markdown(index=False)}
        ```

        Question: {query}

        Answer:
        """
        response = model.generate_content(prompt)
        return response.text

Here, we construct a markdown-formatted prompt from a Pandas DataFrame and a natural language query, then use the Gemini model to generate and return an analytical response.

    # Query 1: What is the total number of units sold across all products?
    query1 = "What is the total number of units sold across all products?"
    response1 = ask_gemini_about_data(sales_df, query1)
    print(f"Question 1: {query1}")
    print(f"Answer 1:\n{response1}")
    print("-" * 30)

    # Query 2: Which product had the highest number of units sold?
    query2 = "Which product had the highest number of units sold?"
    response2 = ask_gemini_about_data(sales_df, query2)
    print(f"Question 2: {query2}")
    print(f"Answer 2:\n{response2}")
    print("-" * 30)

    # Query 3: What is the average price of the products?
    query3 = "What is the average price of the products?"
    response3 = ask_gemini_about_data(sales_df, query3)
    print(f"Question 3: {query3}")
    print(f"Answer 3:\n{response3}")
    print("-" * 30)

    # Query 4: Show me the products sold in the 'North' region.
    query4 = "Show me the products sold in the 'North' region."
    response4 = ask_gemini_about_data(sales_df, query4)
    print(f"Question 4: {query4}")
    print(f"Answer 4:\n{response4}")
    print("-" * 30)

    # Query 5. More complex query: Calculate the total revenue for each product.
    query5 = "Calculate the total revenue (Units Sold * Price) for each product and present it in a table."
    response5 = ask_gemini_about_data(sales_df, query5)
    print(f"Question 5: {query5}")
    print(f"Answer 5:\n{response5}")
    print("-" * 30)

In conclusion, the tutorial illustrates how combining Pandas, the google.generativeai package, and a Gemini model can turn data analysis tasks into a more interactive process. The approach simplifies querying and interpreting data and opens up avenues for advanced use cases such as data cleaning, feature engineering, and exploratory data analysis. By using these tools within the familiar Python ecosystem, data scientists can enhance their productivity, making it easier to derive meaningful insights from complex datasets.

Here is the Colab Notebook.
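Because a language model can make arithmetic mistakes, it can be useful to compute the same answers directly with Pandas and compare them with the model's replies. The sketch below recreates the tutorial's sample data; the result variable names are chosen here for illustration and are not part of the original notebook.

```python
import pandas as pd

# Same sample data as in the tutorial.
sales_df = pd.DataFrame({
    'Product': ['Laptop', 'Mouse', 'Keyboard', 'Monitor', 'Webcam', 'Headphones'],
    'Category': ['Electronics'] * 6,
    'Region': ['North', 'South', 'East', 'West', 'North', 'South'],
    'Units Sold': [150, 200, 180, 120, 90, 250],
    'Price': [1200, 25, 75, 300, 50, 100],
})

# Query 1: total units sold across all products
total_units = int(sales_df['Units Sold'].sum())

# Query 2: product with the highest number of units sold
top_product = sales_df.loc[sales_df['Units Sold'].idxmax(), 'Product']

# Query 3: average price of the products
average_price = float(sales_df['Price'].mean())

# Query 4: products sold in the 'North' region
north_products = sales_df.loc[sales_df['Region'] == 'North', 'Product'].tolist()

# Query 5: revenue (Units Sold * Price) per product
revenue = (sales_df['Units Sold'] * sales_df['Price']).tolist()

print(total_units, top_product, round(average_price, 2))
print(north_products)
print(dict(zip(sales_df['Product'], revenue)))
```

For larger tables, which would not fit comfortably in a prompt, a check like this on a few aggregate figures is a cheap way to decide how much to trust the generated analysis.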
  • Google AI Released TxGemma: A Series of 2B, 9B, and 27B LLM for Multiple Therapeutic Tasks for Drug Development Fine-Tunable with Transformers
    www.marktechpost.com
    Developing therapeutics continues to be an inherently costly and challenging endeavor, characterized by high failure rates and prolonged development timelines. The traditional drug discovery process necessitates extensive experimental validations from initial target identification to late-stage clinical trials, consuming substantial resources and time. Computational methodologies, particularly machine learning and predictive modeling, have emerged as pivotal tools to streamline this process. However, existing computational models are typically highly specialized, which limits their effectiveness across diverse therapeutic tasks, and they offer little of the interactive reasoning required for scientific inquiry and analysis.

To address these limitations, Google AI has introduced TxGemma, a collection of generalist large language models (LLMs) designed explicitly to facilitate various therapeutic tasks in drug development. TxGemma distinguishes itself by integrating diverse datasets, encompassing small molecules, proteins, nucleic acids, diseases, and cell lines, which allows it to span multiple stages within the therapeutic development pipeline. TxGemma models, available with 2 billion (2B), 9 billion (9B), and 27 billion (27B) parameters, are fine-tuned from the Gemma-2 architecture using comprehensive therapeutic datasets. Additionally, the suite includes TxGemma-Chat, an interactive conversational model variant that enables scientists to engage in detailed discussions and mechanistic interpretations of predictive outcomes, fostering transparency in model utilization.

From a technical standpoint, TxGemma capitalizes on the extensive Therapeutic Data Commons (TDC), a curated collection containing over 15 million datapoints across 66 therapeutically relevant datasets.
    TxGemma-Predict, the predictive variant of the suite, performs strongly across these datasets, matching or exceeding both generalist and specialist models currently used in therapeutic modeling. Notably, fine-tuning TxGemma reaches high predictive accuracy with substantially fewer training samples, a crucial advantage in domains where data is scarce. Extending these capabilities, Agentic-Tx, powered by Gemini 2.0, orchestrates complex therapeutic queries by combining predictions from TxGemma-Predict and interactive discussion from TxGemma-Chat with external domain-specific tools.

    Empirical evaluations underscore TxGemma's capability. Across the 66 tasks curated by the TDC, TxGemma-Predict consistently matched or exceeded existing state-of-the-art models. Specifically, TxGemma's predictive models surpassed state-of-the-art generalist models on 45 tasks and specialized models on 26 tasks, with notable efficiency in clinical trial adverse event prediction. On challenging benchmarks such as ChemBench and Humanity's Last Exam, Agentic-Tx showed clear advantages over previous leading models, improving accuracy by approximately 5.6% and 17.9%, respectively. The conversational capabilities of TxGemma-Chat additionally provided the interactive reasoning needed to support in-depth scientific analysis and discussion.

    TxGemma's practical utility is particularly evident in adverse event prediction during clinical trials, an essential part of therapeutic safety evaluation. TxGemma-27B-Predict delivered robust predictive performance while using significantly fewer training samples than conventional models, illustrating improved data efficiency and reliability.
    Moreover, computational performance assessments indicate that TxGemma's inference speed supports practical real-time applications such as virtual screening, with the largest variant (27B parameters) able to process large sample volumes daily when deployed on scalable infrastructure.

    In summary, TxGemma represents a methodical advance in computational therapeutic research, combining predictive efficacy, interactive reasoning, and improved data efficiency. By making TxGemma publicly accessible, Google enables further validation and adaptation on diverse, proprietary datasets, promoting broader applicability and reproducibility in therapeutic research. With conversational functionality through TxGemma-Chat and complex workflow integration through Agentic-Tx, the suite gives researchers computational tools that can meaningfully improve decision-making in therapeutic development.

    Check out the Paper and Models on Hugging Face. All credit for this research goes to the researchers of this project. Also, feel free to follow us on Twitter and don't forget to join our 85k+ ML SubReddit.

    Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence Media Platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform has over 2 million monthly views, illustrating its popularity among audiences.
    0 Comments · 0 Shares · 96 Views
  • A Code Implementation of Monocular Depth Estimation Using Intel MiDaS Open Source Model on Google Colab with PyTorch and OpenCV
    www.marktechpost.com
    Monocular depth estimation predicts scene depth from a single RGB image. It is a fundamental task in computer vision with applications in augmented reality, robotics, and 3D scene understanding. In this tutorial, we implement Intel's MiDaS, a model designed for high-quality depth prediction from a single image. Using Google Colab as the compute platform, along with PyTorch, OpenCV, and Matplotlib, the tutorial lets you upload an image and visualize the corresponding depth map.

        !pip install -q timm opencv-python matplotlib

    First, we install the necessary Python libraries: timm for model support, opencv-python for image processing, and matplotlib for visualizing the depth maps.

        !git clone https://github.com/isl-org/MiDaS.git
        %cd MiDaS

    Then, we clone the official Intel MiDaS repository from GitHub and navigate into its directory to access the model code and transformation utilities.

        import torch
        import cv2
        import matplotlib.pyplot as plt
        import numpy as np
        from PIL import Image
        from torchvision.transforms import Compose
        from google.colab import files
        from midas.dpt_depth import DPTDepthModel
        from midas.transforms import Resize, NormalizeImage, PrepareForNet

        # Use the GPU when Colab provides one, otherwise fall back to the CPU.
        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    We import the libraries and MiDaS components required for loading the model, preprocessing images, handling uploads, and visualizing depth predictions. We then set the computation device to GPU (CUDA) if available; otherwise it defaults to CPU.

        # Download the pretrained DPT_Large model through torch.hub.
        model = torch.hub.load("intel-isl/MiDaS", "DPT_Large", pretrained=True, force_reload=True)
        model = model.to(device)
        model.eval()

    Here, we download the pretrained MiDaS DPT_Large model through torch.hub, move it to the selected device (CPU or GPU), and set it to evaluation mode for inference.

        transform = Compose([
            Resize(384, 384, resize_target=None, keep_aspect_ratio=True,
                   ensure_multiple_of=32, resize_method="upper_bound"),
            NormalizeImage(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
            PrepareForNet(),
        ])

    We define the MiDaS image preprocessing pipeline, which resizes the input image, normalizes its pixel values, and formats it for model inference.

        uploaded = files.upload()
        for filename in uploaded:
            img = cv2.imread(filename)
            # OpenCV loads BGR; convert to RGB and scale to [0, 1],
            # the range the normalization step above expects.
            img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB) / 255.0
            break

    We let the user upload an image in Colab, read it with OpenCV, convert it from BGR to RGB for accurate color representation, and scale the pixel values to [0, 1] so that the mean and standard deviation used in the normalization step apply correctly.

        img_input = transform({"image": img})["image"]
        input_tensor = torch.from_numpy(img_input).unsqueeze(0).to(device)

        with torch.no_grad():
            prediction = model(input_tensor)
            # Resize the prediction back to the original image height and width.
            prediction = torch.nn.functional.interpolate(
                prediction.unsqueeze(1),
                size=img.shape[:2],
                mode="bicubic",
                align_corners=False,
            ).squeeze()

        depth_map = prediction.cpu().numpy()

    Now, we apply the preprocessing transform to the uploaded image, convert it to a tensor, run depth prediction with the MiDaS model, resize the output to match the original image dimensions, and extract the final depth map as a NumPy array.

        plt.figure(figsize=(10, 5))
        plt.subplot(1, 2, 1)
        plt.imshow(img)
        plt.title("Original Image")
        plt.axis("off")
        plt.subplot(1, 2, 2)
        plt.imshow(depth_map, cmap="inferno")
        plt.title("Depth Map")
        plt.axis("off")
        plt.tight_layout()
        plt.show()

    Finally, we create a side-by-side visualization of the original image and its corresponding depth map using Matplotlib. The depth map is displayed with the inferno colormap for better contrast.

    In conclusion, we have run Intel's MiDaS model on Google Colab to perform monocular depth estimation from a single RGB image. Using PyTorch for model inference, OpenCV for image processing, and Matplotlib for visualization, we built a pipeline that generates high-quality depth maps with minimal setup. This implementation is a strong foundation for further exploration, including video depth estimation, real-time applications, and integration with AR/VR systems.

    Here is the Colab Notebook.
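The depth map produced in the MiDaS tutorial above is a floating-point array of relative values, so saving it as an ordinary image usually requires rescaling it first. The following is a minimal sketch of min-max scaling to 8-bit; the helper name `depth_to_uint8` and the small demo array are assumptions made for illustration and are not part of the original notebook.

```python
import numpy as np


def depth_to_uint8(depth_map: np.ndarray) -> np.ndarray:
    """Min-max scale a relative depth map to the 0..255 range (hypothetical helper)."""
    depth_min = float(depth_map.min())
    depth_max = float(depth_map.max())
    if depth_max - depth_min < 1e-8:
        # A constant map carries no depth contrast; return all zeros.
        return np.zeros(depth_map.shape, dtype=np.uint8)
    scaled = (depth_map - depth_min) / (depth_max - depth_min)
    return np.round(255.0 * scaled).astype(np.uint8)


# Tiny stand-in for a real prediction, only to show the call.
demo = np.array([[0.0, 5.0], [10.0, 20.0]], dtype=np.float32)
depth_8bit = depth_to_uint8(demo)
print(depth_8bit.dtype, depth_8bit.min(), depth_8bit.max())
```

In the notebook, the result could then be written to disk with `cv2.imwrite`, which accepts 8-bit single-channel arrays.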
    0 Comments · 0 Shares · 78 Views
  • Meet Open Deep Search (ODS): A Plug-and-Play Framework Democratizing Search with Open-source Reasoning Agents
    www.marktechpost.com
    The rapid integration of large language models (LLMs) into search engines has predominantly favored proprietary solutions such as OpenAI's GPT-4o Search Preview and Perplexity's Sonar Reasoning Pro. While these systems perform strongly, their closed-source nature poses significant challenges for transparency, innovation, and community collaboration. This exclusivity limits customization and hampers broader academic and entrepreneurial engagement with search-enhanced AI.

    In response, researchers from the University of Washington, Princeton University, and UC Berkeley have introduced Open Deep Search (ODS), an open-source search AI framework designed to plug into any user-selected LLM in a modular manner. ODS comprises two central components: the Open Search Tool and the Open Reasoning Agent. Together, they substantially improve the base LLM's content retrieval and reasoning accuracy.

    The Open Search Tool uses an advanced retrieval pipeline featuring a query rephrasing method that better captures user intent by generating multiple semantically related queries. This approach notably improves the accuracy and diversity of search results. The tool also applies refined chunking and re-ranking techniques to filter search results systematically by relevance. Complementing the retrieval component, the Open Reasoning Agent operates in one of two modes: the Chain-of-thought ReAct agent and the Chain-of-code CodeAct agent. These agents interpret user queries, manage tool usage, including searches and calculations, and produce comprehensive, contextually accurate responses.

    Empirical evaluations underscore the effectiveness of ODS.
    Integrated with DeepSeek-R1, an advanced open-source reasoning model, ODS-v2 achieves 88.3% accuracy on the SimpleQA benchmark and 75.3% on the FRAMES benchmark. This notably surpasses proprietary alternatives such as Perplexity's Sonar Reasoning Pro, which scores 85.8% and 44.4% on these benchmarks, respectively. Compared with OpenAI's GPT-4o Search Preview, ODS-v2 shows a significant advantage on the FRAMES benchmark, achieving 9.7% higher accuracy. These results illustrate ODS's capacity to deliver competitive, and in specific areas superior, performance relative to proprietary systems.

    An important feature of ODS is its adaptive use of tools, seen in how it decides whether to run additional web searches. For straightforward queries, as in SimpleQA, ODS keeps additional searches to a minimum and uses resources efficiently. For complex multi-hop queries, as in the FRAMES benchmark, ODS increases its use of web searches, matching its effort to query complexity.

    In conclusion, Open Deep Search is a notable step towards democratizing search-enhanced AI by providing an open-source framework compatible with diverse LLMs. It encourages innovation and transparency within the AI research community and supports broader participation in the development of sophisticated search and reasoning capabilities. By integrating advanced retrieval techniques with adaptive reasoning, ODS contributes meaningfully to open-source AI development and sets a robust standard for future work on search-integrated large language models.

    Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project.
    The post Meet Open Deep Search (ODS): A Plug-and-Play Framework Democratizing Search with Open-source Reasoning Agents appeared first on MarkTechPost.
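The chunking and re-ranking stage described in the ODS article above can be illustrated with a toy sketch. This is not the ODS implementation: the function names, the fixed-size word chunks, and the term-overlap score are all assumptions chosen to keep the example self-contained, and the list of rephrased queries stands in for what an LLM would generate.

```python
import re
from typing import List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and return its set of alphanumeric terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def chunk(text: str, size: int = 13) -> List[str]:
    """Split text into consecutive chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def rerank(queries: List[str], chunks: List[str], top_k: int = 1) -> List[str]:
    """Order chunks by the fraction of their terms that appear in any query."""
    query_terms = set().union(*(tokenize(q) for q in queries))

    def score(candidate: str) -> float:
        terms = tokenize(candidate)
        return len(terms & query_terms) / max(len(terms), 1)

    return sorted(chunks, key=score, reverse=True)[:top_k]


# Hypothetical rephrasings of one user question.
queries = [
    "When was the Eiffel Tower completed?",
    "Eiffel Tower construction year",
]
page = (
    "The Eiffel Tower was completed in 1889 for the World's Fair in Paris. "
    "Bananas are a good source of potassium and are grown in tropical regions."
)
chunks = chunk(page)
top = rerank(queries, chunks)
print(top[0])
```

A production system would typically replace the overlap score with an embedding-based or learned re-ranker; the structure of rephrase, chunk, score, and keep the top results stays the same.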
    0 Comments · 0 Shares · 76 Views
  • Beginner's Guide to Deploying a Machine Learning API with FastAPI
    www.marktechpost.com
    In this guide, you will learn how to deploy a machine learning model as an API using FastAPI. We will create an API that predicts the species of a penguin based on its bill length and flipper length.

    Prerequisites
    - Basic knowledge of Python
    - Python installed on your system (preferably version 3.7 or higher)
    - Familiarity with machine learning concepts (optional)

    Step 1: Set Up Your Environment
    - Create a Project Directory: open your terminal and create a new directory for your project.
    - Set Up a Virtual Environment. On Windows, activate it with: venv\Scripts\activate
    - Install Required Packages

    Step 2: Prepare Your Machine Learning Model
    - Download the dataset here.
    - Create a Python Script for the Model

    Step 3: Create the FastAPI Application
    - Create the Main Application File

    Step 4: Run Your FastAPI Application
    - Run the Application
    - Access the API

    Step 5: Test Your API
    - Use Swagger UI

    Conclusion
    Congratulations! You have successfully deployed a machine learning API using FastAPI. This guide covered:
    - Setting up your environment.
    - Preparing a machine learning model.
    - Creating a FastAPI application.
    - Running and testing your API.

    Next Steps
    - Explore more advanced features of FastAPI, such as authentication and database integration.
    - Experiment with different machine learning models and datasets.
    - Consider containerizing your application using Docker for easier deployment.

    Feel free to reach out if you have any questions or need further assistance!

    Nikhil is an intern consultant at Marktechpost. He is pursuing an integrated dual degree in Materials at the Indian Institute of Technology, Kharagpur. Nikhil is an AI/ML enthusiast who is always researching applications in fields like biomaterials and biomedical science. With a strong background in Material Science, he is exploring new advancements and creating opportunities to contribute.
    0 Comments · 0 Shares · 65 Views
  • This AI Paper Introduces the Kolmogorov-Test: A Compression-as-Intelligence Benchmark for Evaluating Code-Generating Language Models
    www.marktechpost.com
    Compression is a cornerstone of computational intelligence, deeply rooted in the theory of Kolmogorov complexity, which defines the complexity of a sequence as the length of the shortest program that reproduces it. Unlike traditional compression methods that look for repetition and redundancy, Kolmogorov's framework treats compression as the problem of discovering structured patterns through programmatic representation. While the theory promises optimal compression, its uncomputability poses a significant hurdle. Nevertheless, the emergence of large language models capable of code generation opens an intriguing opportunity to test how closely modern systems can approximate this theoretical ideal by reasoning through code rather than pattern matching.

    A core issue arises from the limitations of current tools in compressing data sequences into concise, executable code. Models often replicate inputs rather than generate programs that reproduce them, indicating a gap in true pattern understanding. This becomes especially evident with real-world audio, text, or DNA sequences, where complex logical structures must be uncovered to achieve efficient compression. The main challenge is ensuring that the model not only reproduces the sequence but does so with a minimal, well-reasoned set of instructions. Furthermore, although synthetic training data is useful for controlled evaluation, it often fails to support robust generalization to natural data, which is essential for practical applications.

    Several compression tools exist, ranging from traditional algorithms like GZIP to newer neural compression systems. GZIP remains a strong baseline, especially for long or repetitive sequences, because it encodes statistical regularities effectively. More recently, language modeling approaches have been combined with arithmetic coding, using prediction probabilities to compress input data. However, these methods typically require access to the full model weights at decoding time, limiting their efficiency and applicability.
    Prompted code-generating models like GPT-4 and LLaMA have also been evaluated in zero-shot settings to generate Python programs that reproduce input sequences. Yet they frequently produce lengthy, imprecise code with limited success, particularly on unseen or complex sequences.

    Researchers from Meta AI and Tel Aviv University introduced the Kolmogorov-Test (KT), a benchmark for assessing the reasoning capability of code-generating language models. The test evaluates a model's ability to generate the shortest program that outputs a given input sequence. Unlike typical benchmarks, KT emphasizes logical composition and program generation over predictive text modeling. Sequences include natural data from audio (LibriSpeech), text (Wikipedia enwik9), and DNA (GRCh38), as well as synthetic sequences generated through a custom-designed domain-specific language (DSL). This DSL supports building structured sequences by composing operations such as range creation, sequence modification, merging, and filtering.

    The researchers developed an automated framework to generate millions of synthetic program-sequence pairs using this DSL. These pairs are then used to train and evaluate models, including large pre-trained models and specifically trained ones like SEQCODER. To measure performance, the team used two metrics: accuracy, meaning whether the generated program reproduces the sequence, and precision, meaning how concise the correct program is compared to GZIP compression. The test involved compressing sequences of varying lengths, with synthetic sequences averaging 76 bytes and real sequences capped at 128.

    Results showed that even the most powerful models struggled. GPT-4 achieved 69.5% accuracy on high-quality audio but dropped to 36.4% for 8-bit audio and 50.3% for DNA data. LLaMA-3.1-405B performed worse, with accuracies as low as 3.9% for audio and only 24.8% for DNA. On synthetic data, SEQCODER-8B reached 92.5% accuracy with a precision score of 0.56, outperforming traditional tools like GZIP.
    However, its accuracy on real-world data remained near zero. This discrepancy illustrates how difficult it is to transfer success from synthetic benchmarks to more varied and noisy real-world sequences, highlighting the limitations of current training regimes and the need for new strategies.

    Overall, this research clearly outlines the complexity of compression via code generation. The KT benchmark provides a rigorous and diverse test of model reasoning and structure recognition, exposing the stark divide between synthetic learning environments and real-world applications. The methodology and test set a high bar for future models aiming to unify reasoning with compression, but significant innovation is still required to meet this challenge.

    Check out the Paper. All credit for this research goes to the researchers of this project.
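The two metrics described in the Kolmogorov-Test article above can be illustrated with a small sketch. It is not the benchmark's code: `run_program`, `evaluate`, the toy sequence, and the program-to-GZIP size ratio are assumptions made for illustration, and the paper's exact precision formula may differ.

```python
import contextlib
import gzip
import io
from typing import List, Tuple


def run_program(source: str) -> str:
    """Execute candidate program source and capture what it prints."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, {})  # acceptable for a toy example; never exec untrusted code
    return buffer.getvalue().strip()


def evaluate(source: str, target: List[int]) -> Tuple[bool, float]:
    """Return (program reproduces the target, program size / gzip size)."""
    expected = ",".join(str(value) for value in target)
    is_correct = run_program(source) == expected
    gzip_size = len(gzip.compress(bytes(target)))
    size_ratio = len(source.encode("utf-8")) / gzip_size
    return is_correct, size_ratio


# A 128-byte toy sequence with simple arithmetic structure and no repeats.
target = [(3 * i) % 256 for i in range(128)]
# A hypothetical model output: a short program that regenerates the sequence.
candidate = 'print(",".join(str((3 * i) % 256) for i in range(128)))'

is_correct, size_ratio = evaluate(candidate, target)
print(f"correct={is_correct}, program/gzip size ratio={size_ratio:.2f}")
```

A ratio below 1 means the program is shorter than the GZIP output for the same bytes, which is the sense in which a code-generating model can beat a classical compressor on structured data.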
    0 Comments · 0 Shares · 65 Views
More Stories