Marktechpost AI
AI/ML Research and Dev News Platform (1 million+ monthly traffic) | 50k+ ML subreddit | Contact: Asif@marktechpost.com
  • 1 user likes this
  • 154 Posts
  • 2 Photos
  • 0 Videos
  • 0 Previews
  • News
Recent updates
  • WWW.MARKTECHPOST.COM
    Microsoft Research Introduces Reducio-DiT: Enhancing Video Generation Efficiency with Advanced Compression
    Recent advancements in video generation models have enabled the production of high-quality, realistic video clips. However, these models face challenges in scaling for large-scale, real-world applications due to the computational demands required for training and inference. Current commercial models like Sora, Runway Gen-3, and Movie Gen demand extensive resources, including thousands of GPUs and millions of GPU hours for training, with each second of video inference taking several minutes. These high requirements make such solutions costly and impractical for many potential applications, limiting high-fidelity video generation to those with substantial computational resources.

    Reducio-DiT: A New Solution

    Microsoft researchers have introduced Reducio-DiT, a new approach designed to address this problem. The solution centers on an image-conditioned variational autoencoder (VAE) that significantly compresses the latent space used for video representation. The core idea behind Reducio-DiT is that videos contain more redundant information than static images, and this redundancy can be leveraged to achieve a 64-fold reduction in latent representation size without compromising video quality. The research team combined this VAE with diffusion models to improve the efficiency of generating 1024x1024 video clips, reducing inference time to 15.5 seconds on a single A100 GPU.

    Technical Approach

    From a technical perspective, Reducio-DiT stands out for its two-stage generation approach. First, it generates a content image using text-to-image techniques, and then it uses this image as a prior to create video frames through a diffusion process. The motion information, which constitutes a large part of a video's content, is separated from the static background and compressed efficiently in the latent space, resulting in a much smaller computational footprint. Specifically, Reducio-VAE, the autoencoder component of Reducio-DiT, leverages 3D convolutions to achieve a significant compression factor, enabling a 4096-fold down-sampled representation of the input videos. The diffusion component, Reducio-DiT, integrates this highly compressed latent representation with features extracted from both the content image and the corresponding text prompt, thereby producing smooth, high-quality video sequences with minimal overhead.

    This approach is important for several reasons. Reducio-DiT offers a cost-effective solution to an industry burdened by computational challenges, making high-resolution video generation more accessible. The model demonstrated a 16.6x speedup over existing methods like Lavie while achieving a Fréchet Video Distance (FVD) score of 318.5 on UCF-101, outperforming other models in this category. By using a multi-stage training strategy that scales up from low- to high-resolution video generation, Reducio-DiT maintains visual integrity and temporal consistency across generated frames, a challenge that many previous approaches to video generation struggled to meet. Additionally, the compact latent space not only accelerates the video generation process but also reduces hardware requirements, making it feasible for environments without extensive GPU resources.

    Conclusion

    Microsoft's Reducio-DiT represents an advance in video generation efficiency, balancing high quality with reduced computational cost. The ability to generate a 1024x1024 video clip in 15.5 seconds, combined with a significant reduction in training and inference costs, marks a notable development in generative AI for video. For further technical exploration and access to the source code, visit Microsoft's GitHub repository for Reducio-VAE. This development paves the way for wider adoption of video generation technology in applications such as content creation, advertising, and interactive entertainment, where generating engaging visual media quickly and cost-effectively is essential.

    Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project.
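    To make the reported compression concrete, here is a quick back-of-the-envelope sketch (not the official Reducio code) of what a 4096-fold down-sampling factor means for the latent a diffusion model must denoise; the 16-frame clip length is an illustrative assumption, since the article does not state it.

    frames, channels, height, width = 16, 3, 1024, 1024
    video_elements = frames * channels * height * width
    downsample_factor = 4096  # overall reduction reported for Reducio-VAE

    latent_elements = video_elements // downsample_factor
    print(f"raw video elements: {video_elements:,}")
    print(f"latent elements:    {latent_elements:,} (assuming a {downsample_factor}x reduction)")

    Working in a latent this much smaller is what makes the diffusion stage cheap enough for the reported 15.5-second inference time.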
  • WWW.MARKTECHPOST.COM
    This AI Paper Unveils TrialGPT: Revolutionizing Patient-to-Trial Matching with Precision and Speed
    Matching patients to suitable clinical trials is a pivotal but highly challenging process in modern medical research. It involves analyzing complex patient medical histories and mapping them against the considerable level of detail found in trial eligibility criteria. These criteria are complex, ambiguous, and heterogeneous, making the undertaking labor-intensive, error-prone, and inefficient, and delaying critical research progress while many patients wait for experimental treatments. The problem is exacerbated by the need to scale across large collections of trials, especially in areas like oncology and rare diseases, where precision and efficiency are highly valued.

    Traditional approaches to patient-trial matching are twofold: trial-to-patient matching for cohort-level recruitment, and patient-to-trial matching focused on individual referrals and patient-centric care. Even so, several limitations plague state-of-the-art neural embedding-based methods: reliance on large-scale annotated datasets that are difficult to obtain, low computational efficiency, and poor suitability for real-time applications. A lack of transparency in the predictions also undermines clinician confidence. These shortcomings call for innovative, explainable, and data-efficient ways to improve matching performance in clinical environments.

    To address these challenges, the researchers developed TrialGPT, a framework that leverages large language models (LLMs) to streamline patient-to-trial matching. TrialGPT consists of three major components: TrialGPT-Retrieval, which filters out most irrelevant trials using hybrid fusion retrieval and keywords generated from patient summaries; TrialGPT-Matching, which evaluates patient eligibility at the criterion level, providing natural language explanations and evidence localization; and TrialGPT-Ranking, which aggregates criterion-level results into trial-level scores used to prioritize or rule out trials. The framework integrates deep natural language understanding and generation capabilities, ensuring accuracy, explainability, and flexibility when analyzing unstructured medical data.

    The researchers assessed TrialGPT on three public datasets (SIGIR, TREC 2021, and TREC 2022) covering 183 synthetic patients and over 75,000 trial annotations. The datasets contain a wide range of eligibility criteria categorized into inclusion and exclusion labels. The retrieval component uses GPT-4 to generate context-aware keywords from patient notes, achieving more than 90% recall while reducing the search space by 94%. The matching component conducts a criterion-level analysis with high accuracy, supported by explainable eligibility predictions and evidence localization. The ranking approach combines linear and LLM-based aggregation methods to rank appropriate trials while discarding inappropriate ones, making it well suited for use at scale in real-world applications.

    TrialGPT performed robustly on all relevant benchmarks, addressing both retrieval and matching. The retrieval module narrowed down large collections of trials while maintaining high recall for relevant options. The matching module produced criterion-level predictions with accuracy comparable to human experts, together with natural language explanations and evidence at the sentence level. Its ranking component outperformed all other methods in ranking precision and exclusion effectiveness when identifying and ranking eligible trials. TrialGPT also improved patient-trial matching workflow efficiency, reducing screening time by over 42% and demonstrating its practical value for clinical trial recruitment.

    TrialGPT illustrates a novel application of LLMs to the central problems of patient-trial matching: scalability, accuracy, and transparency. Its modularity overcomes key limitations of conventional approaches, accelerating patient recruitment and streamlining clinical research while supporting better patient outcomes. By combining advanced language understanding with explainable outputs, TrialGPT sets a new bar for personalized and efficient trial matching. Future work may involve integrating multi-modal data sources and adapting open-source LLMs for real-world validation across applications.

    Check out the Paper. All credit for this research goes to the researchers of this project.
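    The retrieval-matching-ranking split described above can be sketched as a small pipeline. The sketch below is illustrative only: the function names, prompts, and data layout are assumptions, not the authors' code, and the LLM is passed in as a generic callable.

    from typing import Callable, List, Dict

    def retrieve_candidates(patient_note: str, trials: List[Dict], call_llm: Callable[[str], str]) -> List[Dict]:
        # Stage 1: generate search keywords from the note, keep trials whose criteria mention them.
        keywords = call_llm(f"List key medical terms for trial search:\n{patient_note}").lower().split(",")
        keywords = [k.strip() for k in keywords if k.strip()]
        return [t for t in trials if any(k in t["criteria"].lower() for k in keywords)]

    def match_criteria(patient_note: str, trial: Dict, call_llm: Callable[[str], str]) -> List[str]:
        # Stage 2: ask for a criterion-level eligibility judgement with a short explanation.
        return [
            call_llm(f"Patient note:\n{patient_note}\nCriterion: {c}\n"
                     "Answer 'met', 'not met', or 'unknown' and explain briefly.")
            for c in trial["criteria"].split(";")
        ]

    def rank_trials(matches: Dict[str, List[str]]) -> List[str]:
        # Stage 3: aggregate criterion-level labels into a simple trial-level score.
        score = lambda labels: sum(l.lower().startswith("met") for l in labels) / max(len(labels), 1)
        return sorted(matches, key=lambda t: score(matches[t]), reverse=True)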
  • WWW.MARKTECHPOST.COM
    DeepSeek Introduces DeepSeek-R1-Lite-Preview with Complete Reasoning Outputs Matching OpenAI o1
    Artificial intelligence (AI) models have made substantial progress over the last few years, but they continue to face critical challenges, particularly in reasoning tasks. Large language models are proficient at generating coherent text, but when it comes to complex reasoning or problem-solving, they often fall short. This inadequacy is particularly evident in areas requiring structured, step-by-step logic, such as mathematical reasoning or code-breaking. Despite their impressive generative capabilities, models tend to lack transparency in their thought processes, which limits their reliability. Users are often left guessing how a conclusion was reached, creating a trust gap between AI outputs and user expectations. To address these issues, there is a growing need for models that provide comprehensive reasoning and clearly show the steps that led to their conclusions.

    DeepSeek-R1-Lite-Preview: A New Approach to Transparent Reasoning

    DeepSeek has made progress in addressing these reasoning gaps by launching DeepSeek-R1-Lite-Preview, a model that not only improves performance but also introduces transparency in its decision-making process. The model matches OpenAI's o1-preview-level performance and is now available for testing through DeepSeek's chat interface, which is optimized for extended reasoning tasks. This release aims to tackle deficiencies in AI-driven problem-solving by offering complete reasoning outputs. DeepSeek-R1-Lite-Preview demonstrates its capabilities through benchmarks like AIME and MATH, positioning itself as a viable alternative to some of the most advanced models in the industry.

    [Image: https://x.com/deepseek_ai/status/1859200149844803724/photo/1]

    Technical Details

    DeepSeek-R1-Lite-Preview provides a significant improvement in reasoning by incorporating chain-of-thought (CoT) reasoning capabilities. This feature allows the AI to present its thought process in real time, enabling users to follow the logical steps taken to reach a solution. Such transparency is crucial for users who require detailed insight into how an AI model arrives at its conclusions, whether they are students, professionals, or researchers. The model's ability to tackle intricate prompts and display its thinking process helps clarify AI-driven results and instills confidence in their accuracy. With o1-preview-level performance on industry benchmarks like AIME (American Invitational Mathematics Examination) and MATH, DeepSeek-R1-Lite-Preview stands as a strong contender in the field of advanced AI models. Additionally, the model and its API are slated to be open-sourced, making these capabilities accessible to the broader community for experimentation and integration.

    [Image: https://x.com/deepseek_ai/status/1859200145037869485/photo/1]

    Significance and Results

    DeepSeek-R1-Lite-Preview's transparent reasoning outputs represent a significant advancement for AI applications in education, problem-solving, and research. One of the critical shortcomings of many advanced language models is their opacity; they arrive at conclusions without revealing their underlying processes. By providing a transparent, step-by-step chain of thought, DeepSeek ensures that users can see not only the final answer but also the reasoning that led to it. This is particularly beneficial for educational technology, where understanding the "why" is often just as important as the "what". In benchmark testing, the model displayed performance levels comparable to OpenAI's o1-preview, specifically on challenging tasks like those found in AIME and MATH. One test prompt involved deciphering the correct sequence of numbers based on clues, a task requiring multiple layers of reasoning to exclude incorrect options and arrive at the solution. DeepSeek-R1-Lite-Preview provided the correct answer (3841) while maintaining a transparent output that explained each step of its reasoning.

    Conclusion

    DeepSeek's introduction of DeepSeek-R1-Lite-Preview marks a noteworthy advancement in AI reasoning capabilities, addressing some of the critical shortcomings seen in current models. By matching OpenAI's o1 in benchmark performance and enhancing transparency in decision-making, DeepSeek has pushed the boundaries of AI in meaningful ways. The real-time display of the thought process and the forthcoming open-source model and API release indicate DeepSeek's commitment to making advanced AI technologies more accessible. As the field continues to evolve, models like DeepSeek-R1-Lite-Preview could bring clarity, accuracy, and accessibility to complex reasoning tasks across various domains. Users now have the opportunity to experience a reasoning model that not only provides answers but also reveals the reasoning behind them, making AI both more understandable and trustworthy.

    Check out the Official Tweet and try it here. All credit for this research goes to the researchers of this project.
  • WWW.MARKTECHPOST.COM
    Google AI Research Introduces Caravan MultiMet: A Novel Extension to Caravan for Enhancing Hydrological Forecasting with Diverse Meteorological Data
    Large-sample hydrology is a critical field that addresses pressing global challenges, such as climate change, flood prediction, and water resource management. By leveraging vast datasets of hydrological and meteorological information across diverse regions, researchers develop models to predict water-related phenomena. This enables the creation of effective tools to mitigate risks and improve decision-making in real-world scenarios. These advancements are instrumental in safeguarding communities and ecosystems from water-related challenges.

    A significant problem in hydrological research is the limited availability of datasets that support real-time forecasting and operational benchmarking. Traditional datasets like ERA5-Land, while comprehensive, are restricted to historical data, limiting their application in real-time forecasting. This restriction poses challenges for hydrological model development, as researchers cannot adequately test model performance under live conditions or evaluate how forecast uncertainty propagates through hydrological systems. These gaps hinder advances in predictive accuracy and the reliability of water management systems.

    Existing hydrological tools, such as CAMELS and ERA5-Land, provide valuable insights for model development and evaluation. CAMELS datasets, which cover regions like the United States, Australia, and Europe, standardize data for various catchments and support regional hydrological studies. ERA5-Land, with its global coverage and high-quality surface variables, is widely used in hydrology. However, these datasets rely on historical observations and lack integration with real-time forecast data. This limitation prevents researchers from fully addressing the dynamic nature of water-related phenomena and responding effectively to real-time scenarios.

    Researchers from Google Research introduced the Caravan MultiMet extension, significantly enhancing the existing Caravan dataset. The extension integrates six new meteorological products: three nowcasts (CPC, IMERG v07 Early, and CHIRPS) and three weather forecasts (ECMWF IFS HRES, GraphCast, and CHIRPS-GEFS). These additions enable comprehensive analyses of hydrological models in real-time contexts. By incorporating weather forecast data, the extension bridges the divide between hindcasting and operational forecasting, establishing Caravan as the first large-sample hydrology dataset to include such diverse forecast data.

    The Caravan MultiMet extension includes meteorological data aggregated at daily resolution for over 22,000 gauges across 48 countries. The integration of both nowcast and forecast products ensures compatibility across datasets. For example, ERA5-Land data in the extension was recalculated in UTC time zones to align with the other products, simplifying comparisons. Forecast data such as CHIRPS-GEFS offers daily lead times ranging from one to 16 days, while GraphCast, developed by DeepMind, employs graph neural networks to produce global weather forecasts with a 10-day lead time. The extension's zarr file format enhances usability, allowing researchers to efficiently query specific variables, basins, and periods without processing the entire dataset. Furthermore, the inclusion of diverse spatial resolutions, such as CHIRPS's high resolution of 0.05°, strengthens the dataset's suitability for localized studies.

    Including forecast data in Caravan has significantly improved model performance and evaluation capabilities. Tests revealed that variables such as temperature, precipitation, and wind components agreed strongly with ERA5-Land data, achieving R² scores as high as 0.99 in certain cases. For example, total precipitation data from GraphCast demonstrated an R² of 0.87 when compared to ERA5-Land, highlighting its reliability for hydrological applications. Similarly, ECMWF IFS HRES data showed compatibility with ERA5-Land variables, making it a valuable addition to the dataset. These results underscore the MultiMet extension's effectiveness in enhancing the accuracy and applicability of hydrological models.

    By introducing the Caravan MultiMet extension, researchers from Google Research addressed critical limitations in hydrological datasets. Integrating diverse meteorological products facilitates real-time forecasting, robust model benchmarking, and improved prediction accuracy. This advancement represents a significant step forward in hydrological research, enabling better decision-making in water resource management and hazard mitigation. The availability of the dataset under open licenses further ensures its accessibility and impact on the global research community.

    Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project.
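    Because the extension ships as zarr, a typical query touches only the variables, basins, and periods of interest. The snippet below is a minimal sketch of that access pattern with xarray; the store path, variable name, and dimension names are illustrative assumptions, not the dataset's documented schema.

    import xarray as xr

    ds = xr.open_zarr("caravan_multimet.zarr")            # hypothetical local copy of the extension
    subset = ds["total_precipitation"].sel(               # assumed variable name
        gauge_id="camels_01013500",                       # assumed basin identifier
        date=slice("2020-01-01", "2020-12-31"),
    )
    print(float(subset.mean()))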
  • WWW.MARKTECHPOST.COM
    LAION AI Unveils LAION-DISCO-12M: Enabling Machine Learning Research in Foundation Models with 12 Million YouTube Audio Links and Metadata
    The machine learning community faces a significant challenge in audio and music applications: the lack of a diverse, open, large-scale dataset that researchers can freely access for developing foundation models. Despite advances in image and text-based AI research, the audio domain lags behind due to the absence of comprehensive datasets comparable to those available for computer vision or natural language processing. The community has long struggled with access to high-quality, diverse datasets that capture real-world, contextually rich audio data, and this has been a bottleneck for innovation in music and audio foundation models.

    Introduction to LAION-DISCO-12M

    To address this gap, LAION AI has released LAION-DISCO-12M, a collection of 12 million links to publicly available YouTube samples, paired with metadata designed to support foundational machine learning research in audio and music. LAION-DISCO-12M draws from the publicly accessible sections of YouTube, ensuring that all the linked content complies with open access standards. By providing metadata such as timestamps, descriptions, and other semantic details, researchers can effectively explore and contextualize the rich audio content available. The aim is to bridge the gap between the scale of data available for training AI systems in vision and text and the relatively limited datasets available for audio and music, enabling a significant leap forward in developing capable foundation models in these domains.

    Technical Details and Benefits

    The LAION-DISCO-12M dataset stands out for its scale, its metadata, and the curation process that ensures content diversity and quality. With over 12 million audio samples, the dataset provides extensive coverage of different music genres, soundscapes, spoken word, and various environmental sounds. It is particularly valuable for research on large-scale transformer models for music generation, audio classification, or generic audio-to-text translation. Moreover, each sample is accompanied by detailed metadata, including title, description, keywords, and timestamp information, which can be instrumental in training models for multimodal tasks such as audio-visual learning or audio classification aligned with contextual cues.

    A key advantage of LAION-DISCO-12M is its scale and diversity. Researchers often face limitations due to the size or lack of contextual data in existing audio datasets, which can hinder model performance in real-world scenarios. LAION-DISCO-12M addresses these challenges by providing a larger dataset with enriched metadata, enhancing a model's ability to learn complex relationships in audio data. The alignment of metadata to each audio clip provides valuable contextual information, facilitating more effective learning. For instance, models can use timestamps to localize sound events within longer samples, enabling new possibilities in event detection and audio understanding. LAION-DISCO-12M supports training and fine-tuning of advanced models, such as MusicLM or Wav2Vec, on a dataset that offers both breadth and depth.

    Significance and Initial Results

    The availability of this dataset represents a meaningful advancement in foundation model research for audio. While existing datasets like Google's AudioSet have been valuable, LAION-DISCO-12M offers an important resource for open and community-driven AI research. It provides researchers worldwide with access to a comprehensive dataset, free from licensing fees or restricted access. Initial tests using subsets of LAION-DISCO-12M have shown promising improvements in the generalizability of music classification models, with preliminary results indicating up to a 15% accuracy increase compared to models trained on smaller datasets. The dataset also opens up possibilities for research into multimodal music generation and more context-aware voice assistants capable of understanding complex audio environments.

    Conclusion

    In conclusion, LAION-DISCO-12M represents an important step forward for the machine learning community, particularly for those working on audio and music research. By providing a large and diverse collection of publicly accessible YouTube audio samples, LAION AI has made foundational research in audio more accessible. The dataset aims to support advances in generative music models, contextual audio understanding, and multimodal AI research, similar to the impact of large text datasets in natural language processing. LAION-DISCO-12M serves as a valuable resource for expanding access to audio research and fostering innovation in AI-driven audio and music technologies.

    Check out the Details and Dataset on Hugging Face. All credit for this research goes to the researchers of this project.
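    Since the release is distributed as metadata on Hugging Face, a first look is straightforward with the datasets library. The repository identifier and field names below are assumptions for illustration; check the dataset card for the actual schema before relying on them.

    from datasets import load_dataset

    # Stream the metadata rather than downloading 12M rows up front.
    ds = load_dataset("laion/LAION-DISCO-12M", split="train", streaming=True)  # assumed repo id
    for row in ds.take(3):
        # Each record is expected to carry a YouTube link plus metadata such as title and artist.
        print(row)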
  • WWW.MARKTECHPOST.COM
    This AI Paper Explores AgentOps Tools: Enhancing Observability and Traceability in Foundation Model FM-Based Autonomous Agents
    Foundation models (FMs) and large language models (LLMs) are revolutionizing AI applications by enabling tasks such as text summarization, real-time translation, and software development. These technologies have powered the development of autonomous agents that can perform complex decision-making and iterative processes with minimal human intervention. However, as these systems tackle increasingly multifaceted tasks, they require robust observability, traceability, and compliance mechanisms. Ensuring their reliability has become critical, especially as demand for FM-based autonomous agents grows across academia and industry.

    A major hurdle for FM-based autonomous agents is the need for consistent traceability and observability across operational workflows. These agents rely on intricate processes, integrating various tools, memory modules, and decision-making capabilities to perform their tasks. This complexity often leads to suboptimal outputs that are difficult to debug and correct. Regulatory requirements, such as the EU AI Act, add another layer of complexity by demanding transparency and traceability in high-risk AI systems. Compliance with such frameworks is vital for gaining trust and ensuring the ethical deployment of AI systems.

    Existing tools and frameworks provide partial solutions but fall short of end-to-end observability. For instance, LangSmith and Arize offer features for monitoring agent costs and improving latency but do not address the broader life-cycle traceability required for debugging and compliance. Similarly, frameworks such as SuperAGI and CrewAI enable multi-agent collaboration and agent customization but lack robust mechanisms for monitoring decision-making pathways or tracing errors to their source. These limitations create an urgent need for tools that provide comprehensive oversight throughout the agent production life cycle.

    Researchers at CSIRO's Data61, Australia, conducted a rapid review of tools and methodologies in the AgentOps ecosystem to address these gaps. Their study examined existing AgentOps tools and identified key features for achieving observability and traceability in FM-based agents. Based on their findings, the researchers proposed a comprehensive overview of observability data and traceable artifacts that span the entire agent life cycle. Their review underscores the importance of these tools in ensuring system reliability, debugging, and compliance with regulatory frameworks such as the EU AI Act.

    The methodology involved a detailed analysis of tools supporting the AgentOps ecosystem. The researchers identified observability and traceability as core components for enhancing the reliability of FM-based agents. AgentOps tools allow developers to monitor workflows, record LLM interactions, and trace external tool usage. Memory modules were highlighted as crucial for maintaining both short-term and long-term context, enabling agents to produce coherent outputs in multi-step tasks. Another important feature is the integration of guardrails, which enforce ethical and operational constraints to guide agents toward their predefined objectives. Observability features such as artifact tracing and session-level analytics were critical for real-time monitoring and debugging.

    The study's results emphasize the effectiveness of AgentOps tools in addressing the challenges of FM-based agents. These tools support compliance with the EU AI Act's Articles 12, 26, and 79 by implementing comprehensive logging and monitoring capabilities. Developers can trace every decision made by the agent, from initial user inputs to intermediate steps and final outputs. This level of traceability not only simplifies debugging but also enhances transparency in agent operations. Observability tools within the AgentOps ecosystem also enable performance optimization through session-level analytics and actionable insights, helping developers refine workflows and improve efficiency. Although the paper does not report specific numerical improvements, the ability of these tools to streamline processes and enhance system reliability is consistently emphasized.

    The findings by CSIRO's Data61 researchers provide a systematic overview of the AgentOps landscape and its potential to transform FM-based agent development. By focusing on observability and traceability, the review offers valuable insights for developers and stakeholders looking to deploy reliable and compliant AI systems. The study underscores the importance of integrating these capabilities into AgentOps platforms, which serve as a foundation for building scalable, transparent, and trustworthy autonomous agents. As demand for FM-based agents continues to grow, the methodologies and tools outlined in this research set a benchmark for future advancements.

    Check out the Paper. All credit for this research goes to the researchers of this project.
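    To ground the idea of life-cycle traceability, here is a minimal sketch in the spirit of the tooling the review surveys: every tool invocation (and, by extension, every LLM call) is logged with inputs, output, and latency so a run can be reconstructed later. The trace schema is an illustrative assumption, not any specific AgentOps product's API.

    import json, time, functools

    TRACE = []

    def traced(step_type):
        def decorator(fn):
            @functools.wraps(fn)
            def wrapper(*args, **kwargs):
                start = time.time()
                result = fn(*args, **kwargs)
                TRACE.append({
                    "step": step_type,
                    "name": fn.__name__,
                    "inputs": {"args": [str(a) for a in args],
                               "kwargs": {k: str(v) for k, v in kwargs.items()}},
                    "output": str(result),
                    "latency_s": round(time.time() - start, 4),
                })
                return result
            return wrapper
        return decorator

    @traced("tool")
    def search_docs(query: str) -> str:
        return f"results for {query}"

    if __name__ == "__main__":
        search_docs("EU AI Act Article 12")
        print(json.dumps(TRACE, indent=2))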
  • WWW.MARKTECHPOST.COM
    Meet LLaVA-o1: The First Visual Language Model Capable of Spontaneous, Systematic Reasoning Similar to GPT-o1
    The development of vision-language models (VLMs) has faced challenges in handling complex visual question-answering tasks. Despite substantial advances in reasoning by large language models like OpenAI's GPT-o1, VLMs still struggle with systematic and structured reasoning. Current models often lack the ability to organize information and engage in logical, sequential reasoning, limiting their effectiveness for tasks that require deep cognitive processing, particularly when dealing with multimodal inputs such as images combined with text. Traditional VLMs tend to generate immediate responses without a step-by-step reasoning approach, leading to errors and inconsistencies.

    Meet LLaVA-o1

    A team of researchers from Peking University, Tsinghua University, Peng Cheng Laboratory, Alibaba DAMO Academy, and Lehigh University has introduced LLaVA-o1: a visual language model capable of systematic reasoning, similar to GPT-o1. LLaVA-o1 is an 11-billion-parameter model designed for autonomous, multistage reasoning. It builds upon the Llama-3.2-Vision-Instruct model and introduces a structured reasoning process, addressing the limitations of previous VLMs with a more methodical approach. The key innovation in LLaVA-o1 is the implementation of four distinct reasoning stages: summary, caption, reasoning, and conclusion.

    The model is fine-tuned on a dataset called LLaVA-o1-100k, derived from visual question answering (VQA) sources and structured reasoning annotations generated by GPT-4o. This enables LLaVA-o1 to perform multistage reasoning, extending capabilities similar to GPT-o1 into vision-language tasks, which have historically lagged behind text-based models.

    Technical Details and Benefits

    LLaVA-o1 employs a novel inference-time scaling technique called stage-level beam search. Unlike previous methods, such as best-of-N or sentence-level beam search, LLaVA-o1 generates multiple responses for each stage of its structured reasoning process and selects the best candidate at each step, ensuring higher-quality results. This structured approach maintains logical coherence throughout the reasoning process, leading to more accurate conclusions.

    Fine-tuned from the Llama-3.2-11B-Vision-Instruct model, LLaVA-o1 shows an 8.9% improvement on multimodal reasoning benchmarks compared to its base model, even outperforming larger or closed-source competitors like Gemini-1.5-pro, GPT-4o-mini, and Llama-3.2-90B-Vision-Instruct. It achieves this with only 100,000 training samples, making LLaVA-o1 an efficient solution in terms of both performance and scalability. By employing structured thinking through distinct stages, LLaVA-o1 systematically addresses problems, minimizing the reasoning errors common in other VLMs.

    Importance and Results

    LLaVA-o1 addresses a significant gap between textual and visual question-answering models by enabling systematic reasoning in vision-language tasks. Experimental results show that LLaVA-o1 improves performance across benchmarks like MMStar, MMBench, MMVet, MathVista, AI2D, and HallusionBench. It consistently surpasses its base model by over 6.9% across multimodal benchmarks, particularly in reasoning-intensive domains such as mathematical and scientific visual questions.

    Stage-level beam search enhances the model's reliability by generating and verifying multiple candidate responses for each stage and selecting the most appropriate one. This allows LLaVA-o1 to excel in complex visual tasks, compared to traditional inference scaling methods, which can be inefficient. LLaVA-o1 demonstrates that structured responses are crucial for achieving high-quality, consistent reasoning, setting a new standard for similarly sized models.

    Conclusion

    LLaVA-o1 is a visual language model capable of systematic reasoning, similar to GPT-o1. Its four-stage reasoning structure, combined with stage-level beam search, sets a new benchmark for multimodal AI. By training on a relatively small yet strategically constructed dataset, LLaVA-o1 demonstrates that efficient and scalable multimodal reasoning is achievable without the massive resources required by larger closed-source models. LLaVA-o1 paves the way for future research on structured reasoning within vision-language models, promising more advanced capabilities in AI-driven cognitive processing across visual and textual domains.

    Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project.
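    The stage-level beam search described above can be summarized in a few lines: at each of the four stages, several candidate continuations are sampled and only the best-scoring one is carried forward. The sketch below is a simplified illustration, not the authors' implementation; generate and score stand in for the model's sampling and judging calls, and the stage tags are assumptions.

    import random
    from typing import Callable, List

    STAGES = ["summary", "caption", "reasoning", "conclusion"]

    def stage_level_beam_search(prompt: str,
                                generate: Callable[[str, str], str],
                                score: Callable[[str], float],
                                candidates_per_stage: int = 4) -> str:
        context = prompt
        for stage in STAGES:
            options: List[str] = [generate(context, stage) for _ in range(candidates_per_stage)]
            best = max(options, key=score)        # keep only the best candidate for this stage
            context = context + f"\n<{stage}>{best}</{stage}>"
        return context

    # Toy usage with random stand-ins for the model and the scorer.
    if __name__ == "__main__":
        demo = stage_level_beam_search(
            "Q: How many apples are left in the picture?",
            generate=lambda ctx, stage: f"{stage} draft {random.randint(0, 9)}",
            score=lambda text: random.random(),
        )
        print(demo)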
  • WWW.MARKTECHPOST.COM
    DeBaTeR: A New AI Method that Leverages Time Information in Neural Graph Collaborative Filtering to Enhance both Denoising and Prediction Performance
    Recommender systems are widely used to study user preferences; however, they face significant challenges in accurately capturing those preferences, particularly in the context of neural graph collaborative filtering. While these systems use interaction histories between users and items through Graph Neural Networks (GNNs) to mine latent information and capture high-order interactions, the quality of the collected data poses a major obstacle. Moreover, malicious attacks that introduce fake interactions further deteriorate recommendation quality. The challenge becomes acute in graph neural collaborative filtering, where the message-passing mechanism of GNNs amplifies the impact of these noisy interactions, leading to misaligned recommendations that fail to reflect users' interests.

    Existing attempts to address these challenges mainly follow two approaches: denoising recommender systems and time-aware recommender systems. Denoising methods rely on strategies such as identifying and down-weighting interactions between dissimilar users and items, pruning samples with larger losses during training, and using memory-based techniques to identify clean samples. Time-aware systems are extensively used in sequential recommendation but have limited application in collaborative filtering contexts. Most temporal approaches concentrate on incorporating timestamps into sequential models or constructing item-item graphs based on temporal order, but they fail to address the complex interplay between temporal patterns and noise in user interactions.

    Researchers from the University of Illinois at Urbana-Champaign and Amazon have proposed DeBaTeR, a novel approach for denoising bipartite temporal graphs in recommender systems. The method introduces two distinct strategies: DeBaTeR-A and DeBaTeR-L. The first, DeBaTeR-A, reweights the adjacency matrix using a reliability score derived from time-aware user and item embeddings, implementing both soft and hard assignment mechanisms to handle noisy interactions. The second, DeBaTeR-L, employs a weight generator that uses time-aware embeddings to identify and down-weight potentially noisy interactions in the loss function.

    A comprehensive evaluation framework is used to assess DeBaTeR's predictive performance and denoising capabilities on vanilla and artificially noisy datasets to ensure robust testing. For the vanilla datasets, specific filtering criteria are applied to retain only high-quality interactions (ratings >= 4 for Yelp and >= 4.5 for Amazon Movies and TV) from users and items with substantial engagement (>50 reviews). The datasets are split using a 7:3 ratio for training and testing, with noisy variants created by injecting 20% random interactions into the training sets. The evaluation framework accounts for temporal aspects by using the earliest test-set timestamp as the query time for each user, with results averaged across four experimental rounds.

    The experimental results for the question "How does the proposed approach perform compared to state-of-the-art denoising and general neural graph collaborative filtering methods?" demonstrate the superior performance of both DeBaTeR variants across multiple datasets and metrics. DeBaTeR-L achieves higher NDCG scores, making it more suitable for ranking tasks, while DeBaTeR-A shows better precision and recall, indicating its effectiveness for retrieval tasks. Moreover, DeBaTeR-L demonstrates enhanced robustness on noisy datasets, outperforming DeBaTeR-A across more metrics than on the vanilla datasets. The relative improvements against seven baseline methods are significant, confirming the effectiveness of both proposed approaches.

    In this paper, the researchers introduced DeBaTeR, an innovative approach that addresses noise in recommender systems through time-aware embedding generation. The method's dual strategies, DeBaTeR-A for adjacency-matrix reweighting and DeBaTeR-L for loss-function reweighting, provide flexible solutions for different recommendation scenarios. The framework's success lies in its integration of temporal information with user/item embeddings, shown through extensive experimentation on real-world datasets. Future research directions point toward additional time-aware neural graph collaborative filtering algorithms and expanding the denoising capabilities to include user profiles and item attributes.

    Check out the Paper. All credit for this research goes to the researchers of this project.
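    The adjacency-reweighting idea behind DeBaTeR-A can be illustrated in a few lines: each observed user-item interaction receives a reliability score computed from time-aware user and item embeddings, and that score rescales the corresponding edge. The embedding construction and scoring function below are simplified assumptions for illustration, not the paper's exact formulation.

    import numpy as np

    rng = np.random.default_rng(0)
    n_users, n_items, dim = 4, 5, 8

    user_emb = rng.normal(size=(n_users, dim))
    item_emb = rng.normal(size=(n_items, dim))
    interactions = [(0, 1, 100.0), (1, 3, 105.0), (2, 2, 400.0)]   # (user, item, timestamp)

    def time_aware(vec, t):
        # Toy stand-in for a time-aware embedding: mix sinusoidal time features into the vector.
        phases = np.sin(t / (10.0 ** (np.arange(vec.size) / vec.size)))
        return vec + 0.1 * phases

    adj = np.zeros((n_users, n_items))
    for u, i, t in interactions:
        zu, zi = time_aware(user_emb[u], t), time_aware(item_emb[i], t)
        reliability = 1.0 / (1.0 + np.exp(-zu @ zi))   # sigmoid similarity as a reliability score
        adj[u, i] = reliability                        # soft assignment: unreliable edges get small weights

    print(np.round(adj, 3))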
  • WWW.MARKTECHPOST.COM
    Support Vector Machine (SVM) Algorithm
    Support Vector Machines (SVMs) are a powerful and versatile supervised machine learning algorithm primarily used for classification and regression tasks. They excel in high-dimensional spaces and are particularly effective when dealing with complex datasets. The core principle behind SVM is to identify the optimal hyperplane that separates data points into different classes while maximizing the margin between them.

    SVMs have gained significant popularity due to their ability to handle both linear and non-linear classification problems. By employing kernel functions, SVMs can map data into higher-dimensional feature spaces, capturing intricate patterns and relationships that may not be apparent in the original space.

    Why Use SVM?

    - Effective in high-dimensional spaces: SVM can handle high-dimensional data without overfitting, making it suitable for complex problems.
    - Versatile: it can be used for both linear and non-linear classification and regression tasks.
    - Robust to outliers: SVM is relatively insensitive to outliers, which can improve its performance on noisy datasets.
    - Memory efficient: SVM models are relatively compact, making them efficient in terms of storage and computational resources.

    Linear SVM

    In a linearly separable dataset, the goal is to find the hyperplane that maximizes the margin between the two classes. The margin is the distance between the hyperplane and the closest data points from each class, known as support vectors.

    The equation of a hyperplane in d-dimensional space is:

        w^T * x + b = 0

    where w is the weight vector, x is the input feature vector, and b is the bias term.

    The decision function for a new data point x is:

        f(x) = sign(w^T * x + b)

    The optimization problem for maximizing the margin can be formulated as:

        Maximize: Margin = 2 / ||w||
        Subject to: y_i * (w^T * x_i + b) >= 1, for all i

    where y_i is the class label of the i-th data point.

    Non-Linear SVM

    For non-linearly separable data, SVM employs the kernel trick. The kernel function maps the data from the original space to a higher-dimensional feature space where it becomes linearly separable. Common kernel functions include:

    - Polynomial kernel: K(x, y) = (x^T * y + c)^d
    - Radial basis function (RBF) kernel: K(x, y) = exp(-gamma * ||x - y||^2)

    Limitations of SVM

    - Sensitivity to kernel choice: the choice of kernel function significantly impacts SVM's performance.
    - Computational complexity: training an SVM can be computationally expensive, especially for large datasets.
    - Difficulty in interpreting results: SVM models can be hard to interpret, especially when using complex kernel functions.

    Understanding Where to Apply the SVM Algorithm

    Unsure where to use the Support Vector Machine (SVM) algorithm? Let's explore its ideal applications and the types of tasks and data it excels at. Key applications of SVM include text classification, image classification, bioinformatics, and financial data analysis. SVM works best with well-defined classes, clear decision boundaries, and a moderate amount of data. It is particularly effective when the number of features is comparable to or larger than the number of samples.

    Conclusion

    Support Vector Machine is a versatile and powerful algorithm for classification and regression tasks. Its ability to handle high-dimensional data, its robustness to outliers, and its ability to learn complex decision boundaries make it a valuable tool in the machine learning toolkit. However, to achieve optimal performance, careful consideration of the kernel function and computational resources is necessary.
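    As a quick worked example of the ideas above, the snippet below trains a linear SVM and an RBF-kernel SVM on a synthetic dataset with scikit-learn; the hyperparameter values are illustrative defaults rather than tuned choices.

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    # Toy binary classification problem with 20 features.
    X, y = make_classification(n_samples=500, n_features=20, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

    linear_svm = SVC(kernel="linear", C=1.0).fit(X_train, y_train)
    rbf_svm = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_train, y_train)

    print("linear kernel accuracy:", linear_svm.score(X_test, y_test))
    print("RBF kernel accuracy:   ", rbf_svm.score(X_test, y_test))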
  • WWW.MARKTECHPOST.COM
    Kinetix: An Open-Ended Universe of Physics-based Tasks for Reinforcement Learning
    Self-supervised learning on offline datasets has allowed large models to reach remarkable capabilities in both text and image domains. Analogous generalization for agents that act sequentially in decision-making problems, however, remains difficult to attain. Classical Reinforcement Learning (RL) environments are mostly narrow and homogeneous and, consequently, hard to generalize from.

    Current RL methods often train agents on fixed tasks, limiting their ability to generalize to new environments. Platforms like MuJoCo and OpenAI Gym focus on specific scenarios, restricting agent adaptability. RL is based on Markov Decision Processes (MDPs), where agents maximize cumulative rewards by interacting with environments. Unsupervised Environment Design (UED) addresses these limitations by introducing a teacher-student framework, where the teacher designs tasks that challenge the agent and promote efficient learning, and certain metrics ensure tasks are neither too easy nor impossible. Tools like JAX enable faster GPU-based RL training through parallelization, while transformers, using attention mechanisms, enhance agent performance by modeling complex relationships in sequential or unordered data.

    To address these limitations, a team of researchers from Oxford University has developed Kinetix, an open-ended space of physics-based RL environments. Kinetix can represent tasks ranging from robotic locomotion and grasping to video games and classic RL environments. Kinetix uses a novel hardware-accelerated physics engine, Jax2D, that allows the cheap simulation of billions of environment steps during training. The trained agent exhibits strong physical reasoning capabilities and can zero-shot solve unseen human-designed environments. Furthermore, fine-tuning this general agent on tasks of interest yields significantly stronger performance than training an RL agent tabula rasa. Jax2D applies discrete Euler steps to rotational and positional velocities and uses impulses and higher-order corrections to resolve instantaneous constraints, enabling efficient simulation of diverse physical tasks. Kinetix supports multi-discrete and continuous action spaces and a wide array of RL tasks.

    The researchers trained a general RL agent on tens of millions of procedurally generated 2D physics-based tasks. The agent exhibited strong physical reasoning capabilities and was able to zero-shot solve unseen human-designed environments. Fine-tuning this agent demonstrates the feasibility of large-scale, mixed-quality pre-training for online RL.

    In conclusion, Kinetix addresses the limitations of traditional RL environments by providing a diverse and open-ended space for training, leading to improved generalization and performance of RL agents. This work can serve as a foundation for future research in large-scale online pre-training of general RL agents and unsupervised environment design.

    Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project.
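    To convey the kind of hardware-accelerated parallel simulation Kinetix relies on, here is a generic JAX sketch in which jax.vmap steps thousands of toy physics states at once on one device. This is an illustration of the parallelization principle only, not the Jax2D engine's API.

    import jax
    import jax.numpy as jnp

    def step(state, dt=0.01):
        # Euler-integrate position and velocity for one toy 2D body under gravity.
        pos, vel = state
        vel = vel + jnp.array([0.0, -9.8]) * dt
        pos = pos + vel * dt
        return pos, vel

    batched_step = jax.jit(jax.vmap(step))   # one compiled call advances every environment

    n_envs = 4096
    positions = jnp.zeros((n_envs, 2))
    velocities = jnp.ones((n_envs, 2))
    positions, velocities = batched_step((positions, velocities))
    print(positions.shape)   # (4096, 2)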
  • WWW.MARKTECHPOST.COM
    Meet Memoripy: A Python Library that Brings Real Memory Capabilities to AI Applications
    Artificial intelligence systems often struggle with retaining meaningful context over extended interactions. This limitation poses challenges for applications such as chatbots and virtual assistants, where maintaining a coherent conversation thread is essential. Most traditional AI models operate in a stateless manner, focusing solely on immediate inputs without considering the continuity of prior exchanges. This lack of effective memory leads to fragmented and inconsistent interactions, hampering the ability to build truly engaging, context-sensitive AI systems.Meet Memoripy: A Python library that brings real memory capabilities to AI applications. Memoripy addresses the problem of maintaining conversational context by equipping AI systems with structured memory, allowing them to effectively store, recall, and build upon prior interactions. Memoripy provides both short-term and long-term memory storage, enabling AI systems to retain context from recent interactions while preserving important information over the long term. By structuring memory in a way that mimics human cognitionprioritizing recent events and retaining key detailsMemoripy ensures that interactions remain relevant and coherent over time.Memoripy organizes memory into short-term and long-term clusters, enabling the prioritization of recent interactions for immediate recall while retaining significant historical interactions for future use. This prevents the AI from becoming overwhelmed with excessive data while ensuring relevant information is accessible. Memoripy also implements semantic clustering, grouping similar memories together to facilitate efficient context retrieval. This capability allows AI systems to quickly identify and link related memories, thereby enhancing response quality. Furthermore, Memoripy incorporates memory decay and reinforcement mechanisms, whereby less useful memories gradually fade, and frequently accessed memories are reinforced, reflecting principles of human memory. Memoripys design emphasizes local storage, which allows developers to handle memory operations entirely on local infrastructure. 
This approach mitigates privacy concerns and provides flexibility in integrating with locally hosted language models, as well as with external services like OpenAI and Ollama.To illustrate how Memoripy can be integrated into an AI application, consider the following example:from memoripy import MemoryManager, JSONStoragedef main(): # Replace 'your-api-key' with your actual OpenAI API key api_key = "your-key" if not api_key: raise ValueError("Please set your OpenAI API key.") # Define chat and embedding models chat_model = "openai" # Choose 'openai' or 'ollama' for chat chat_model_name = "gpt-4o-mini" # Specific chat model name embedding_model = "ollama" # Choose 'openai' or 'ollama' for embeddings embedding_model_name = "mxbai-embed-large" # Specific embedding model name # Choose your storage option storage_option = JSONStorage("interaction_history.json") # Initialize the MemoryManager with the selected models and storage memory_manager = MemoryManager( api_key=api_key, chat_model=chat_model, chat_model_name=chat_model_name, embedding_model=embedding_model, embedding_model_name=embedding_model_name, storage=storage_option ) # New user prompt new_prompt = "My name is Khazar" # Load the last 5 interactions from history (for context) short_term, _ = memory_manager.load_history() last_interactions = short_term[-5:] if len(short_term) >= 5 else short_term # Retrieve relevant past interactions, excluding the last 5 relevant_interactions = memory_manager.retrieve_relevant_interactions(new_prompt, exclude_last_n=5) # Generate a response using the last interactions and retrieved interactions response = memory_manager.generate_response(new_prompt, last_interactions, relevant_interactions) # Display the response print(f"Generated response:\n{response}") # Extract concepts for the new interaction combined_text = f"{new_prompt} {response}" concepts = memory_manager.extract_concepts(combined_text) # Store this new interaction along with its embedding and concepts new_embedding = memory_manager.get_embedding(combined_text) memory_manager.add_interaction(new_prompt, response, new_embedding, concepts)if __name__ == "__main__": main()In this script, the MemoryManager Is initialized with specified chat and embedding models, along with a storage option. A new user prompt is processed, and the system retrieves relevant past interactions to generate a contextually appropriate response. The interaction is then stored with its embedding and extracted concepts for future reference.Memoripy provides an essential advancement in building AI systems that are more context-aware. The ability to retain and recall relevant information enables the development of virtual assistants, conversational agents, and customer service systems that offer more consistent and personalized interactions. For instance, a virtual assistant using Memoripy could remember user preferences or details of prior requests, thereby offering a more tailored response. Preliminary evaluations indicate that AI systems incorporating Memoripy exhibit enhanced user satisfaction, producing more coherent and contextually appropriate responses. Moreover, Memoripys emphasis on local storage is crucial for privacy-conscious applications, as it allows data to be handled securely without reliance on external servers.In conclusion, Memoripy represents a significant step towards more sophisticated AI interactions by providing real memory capabilities that enhance context retention and coherence. 
By structuring memory in a way that closely mimics human cognitive processes, Memoripy paves the way for AI systems that can adapt based on cumulative user interactions and offer more personalized, contextually aware experiences. This library provides developers with the tools needed to create AI that not only processes inputs but also learns from interactions in a meaningful way.

Check out the GitHub Repo. All credit for this research goes to the researchers of this project.

Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence Media Platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts over 2 million monthly views, illustrating its popularity among audiences.
  • WWW.MARKTECHPOST.COM
    Google AI Introduces LAuReL (Learned Augmented Residual Layer): Revolutionizing Neural Networks with Enhanced Residual Connections for Efficient Model Performance
    Model efficiency is important in the age of large language and vision models, which face significant efficiency challenges in real-world deployments. Critical metrics such as training compute requirements, inference latency, and memory footprint impact deployment costs and system responsiveness. These constraints often limit the practical implementation of high-quality models in production environments. The need for efficient deep learning methods has therefore grown, focusing on optimizing the trade-off between model quality and resource footprint. While various approaches, including algorithmic techniques, efficient hardware solutions, and best practices, have emerged, architectural improvements remain fundamental to efficiency gains.

Several approaches have emerged to address model efficiency challenges, each with distinct focuses and limitations. Existing methods like LoRA introduce low-rank adapter weights during fine-tuning while keeping other weights constant, and AltUp creates parallel lightweight transformer blocks to simulate larger model dimensions. Compression techniques such as quantization and pruning reduce model size and latency but can impact model quality. Knowledge distillation transfers knowledge from larger teacher models to smaller student models, and progressive learning approaches like Stacking and RaPTr grow networks gradually. However, these methods involve complex training or trade-offs between efficiency and performance.

Researchers from Google Research, Mountain View, CA, and Google Research, New York, NY, have proposed a novel method called Learned Augmented Residual Layer (LAUREL), which revolutionizes the traditional residual connection concept in neural networks. It serves as a direct replacement for conventional residual connections while improving both model quality and efficiency metrics. LAUREL shows remarkable versatility, with significant improvements across vision and language models. When implemented in ResNet-50 for ImageNet-1K classification, LAUREL achieves 60% of the performance gains associated with adding an entire extra layer, with only 0.003% additional parameters. This efficiency translates to matching full-layer performance with 2.6 times fewer parameters.

LAUREL's implementation is tested in both vision and language domains, focusing on the ResNet-50 model for ImageNet-1K classification and a 3B-parameter decoder-only transformer for language tasks. The architecture integrates seamlessly with existing residual connections, requiring minimal modifications to standard model architectures. For vision tasks, the implementation involves incorporating LAUREL into ResNet-50's skip connections and training on ImageNet-1K using 16 Cloud TPU v5e chips with data augmentation. In the language domain, two variants of LAUREL (LAUREL-RW and LAUREL-LR) are implemented in a 3B-parameter transformer model and trained from scratch on text tokens using 1024 Cloud TPU v5e chips over two weeks.

The results demonstrate LAUREL's superior efficiency compared to traditional scaling methods. In vision tasks, adding an extra layer to ResNet-50 enhances accuracy by 0.25% with 4.37% more parameters, but LAUREL-RW achieves a 0.15% improvement with just a 0.003% parameter increase. The LAUREL-RW+LR variant matches the performance of the extra-layer approach while using 2.6 times fewer parameters, and LAUREL-RW+LR+PA outperforms it with 1.82 times fewer parameters.
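Before turning to the language-model results, it helps to make the idea of a learned augmented residual connection concrete. The sketch below is a rough PyTorch interpretation that combines a learned weight on the layer output with a low-rank augmentation of the skip path; the class name, parameterization, and initialization are assumptions based on the description above, not Google's implementation:

import torch
import torch.nn as nn

class LearnedAugmentedResidual(nn.Module):
    """Toy residual block: y = alpha * f(x) + g(x), where g is a cheap
    learned function of x instead of the plain identity skip."""
    def __init__(self, f: nn.Module, dim: int, rank: int = 4):
        super().__init__()
        self.f = f                                # the wrapped sub-layer (attention, MLP, conv, ...)
        self.alpha = nn.Parameter(torch.ones(1))  # learned weight on the layer output
        # low-rank augmentation of the skip path: x -> x + B(A x)
        self.A = nn.Linear(dim, rank, bias=False)
        self.B = nn.Linear(rank, dim, bias=False)
        nn.init.zeros_(self.B.weight)             # start out as a plain residual connection

    def forward(self, x):
        skip = x + self.B(self.A(x))              # augmented residual stream
        return self.alpha * self.f(x) + skip

# Example: block = LearnedAugmentedResidual(nn.Linear(512, 512), dim=512)

Because the low-rank path is initialized to zero, the block behaves like a standard residual connection at the start of training and only learns to deviate where that helps, which is consistent with the tiny parameter overhead reported above.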
Moreover, in language models, LAUREL shows consistent improvements across tasks including Q&A, NLU, math, and code with only a 0.012% parameter increase. This minimal parameter addition makes LAUREL efficient for large-scale models.

In conclusion, the researchers introduced the LAUREL framework, which represents a significant advancement in neural network architecture, offering a more expressive alternative to traditional residual connections. Its three variants, LAUREL-RW, LAUREL-LR, and LAUREL-PA, can be flexibly combined to optimize performance across different applications. The framework's success in both vision and language tasks, along with its minimal parameter overhead, shows its potential as a superior alternative to conventional model scaling approaches. The versatility and efficiency of LAUREL make it a promising candidate for future applications in other architectures such as Vision Transformers (ViT).

Check out the Paper. All credit for this research goes to the researchers of this project.

Sajjad Ansari is a final-year undergraduate from IIT Kharagpur. As a tech enthusiast, he delves into the practical applications of AI with a focus on understanding the impact of AI technologies and their real-world implications. He aims to articulate complex AI concepts in a clear and accessible manner.
  • WWW.MARKTECHPOST.COM
    List of Large Mixture of Experts (MoE) Models: Architecture, Performance, and Innovations in Scalable AI Solutions
    Mixture of Experts (MoE) models represent a significant breakthrough in machine learning, offering an efficient approach to handling large-scale models. Unlike dense models, where all parameters are active during inference, MoE models activate only a fraction of their parameters. This approach balances computational efficiency with scalability, making MoE models highly attractive for various use cases. MoE models achieve efficiency by activating fewer parameters while maintaining a larger total parameter count. This design introduces unique trade-offs, including increased architectural complexity, but it provides greater flexibility for developers and researchers.

Let's explore the largest MoE models released to date, focusing on their architecture, capabilities, and relative performance. These models are all publicly available and exceed 100 billion parameters. The analysis is ordered chronologically by release date, with rankings provided where available from the LMSYS leaderboard as of November 4, 2024.

Google's Switch-C Transformer is one of the earliest models in the MoE space. Released on Hugging Face in November 2022, it boasts a staggering 1.6 trillion total parameters, supported by 2048 experts. Despite being an early innovator in this domain, Switch-C is now considered outdated, as it is not ranked on modern benchmarks like LMSYS. However, it remains noteworthy as a foundational MoE model and continues to influence subsequent innovations. Smaller variants of the Switch-C Transformer are also available, offering more accessible entry points for experimentation.

In March 2024, xAI released Grok-1, a model with 314 billion total parameters and 86 billion active during inference. In contrast to Switch-C, Grok-1 utilizes a smaller pool of experts, eight in total, with only two active per inference task. Its 8k context length is suitable for moderately long input sequences, though it is not competitive with newer models. While Grok-1 has limited adoption and is not ranked on LMSYS, its successor, Grok-2, has shown promise in preliminary benchmarks. Grok-2, yet to be publicly released, has ranked fifth overall in specific LMSYS tasks, suggesting that future iterations of this model could redefine performance benchmarks in the MoE landscape.

Shortly after Grok-1, Databricks released DBRX in late March 2024. This model features 132 billion total parameters, with 36 billion active, spread across 16 experts. Its 32k context length significantly outpaces many contemporaries, allowing it to process longer input sequences efficiently. DBRX is supported by multiple backends, including llama.cpp, exllama v2, and vLLM, making it a versatile choice for developers. Despite its strong architecture, its LMSYS rankings place it only at 90th overall and 78th for hard prompts in English, indicating room for improvement in quality and adoption.

April 2024 saw the release of Mistral AI's Mixtral 8x22B. This model stands out with its 141 billion total parameters and 39 billion active during inference. It incorporates eight experts, two of which are chosen dynamically based on the input. With a 64k context length, Mixtral is well suited for tasks requiring extensive input handling. While its LMSYS rankings, 70th overall and 66th on hard prompts, indicate middling performance, its compatibility with multiple backends ensures usability across diverse platforms.

Another April release was Snowflake's Arctic, an MoE model with 480 billion total parameters but only 17 billion active during inference.
Arctic's unique design combines sparse (7 billion) and dense (10 billion) components distributed among 128 experts. However, its performance falls short, ranking 99th overall on LMSYS and a notably low 101st for hard prompts. Its limited 4k context length further restricts its applicability, making it a less competitive option despite its innovative architecture.

Skywork joined the MoE space in June 2024 with the release of Skywork-MoE. This model features 146 billion total parameters, of which 22 billion are active, and employs 16 experts during inference. With an 8k context length, it supports moderately lengthy tasks but lacks LMSYS rankings, which suggests limited testing or adoption. The base model is the only available version, as the promised chat variant has yet to be released.

In August 2024, AI21 Labs released Jamba 1.5 Large, a hybrid model that merges MoE and Mamba-transformer architectures. With 398 billion total parameters and 98 billion active, Jamba 1.5 Large offers an exceptional 256k context length, making it ideal for tasks requiring extensive input processing. Its LMSYS rankings reflect its high performance, placing 34th overall and 28th for hard prompts. Additionally, Jamba models excel in context benchmarks, particularly the RULER context benchmark, solidifying their reputation for long-context tasks.

DeepSeek V2.5, released in September 2024, currently leads the MoE space in performance. This model incorporates 236 billion total parameters, with 21 billion active during inference. Its architecture includes 160 experts, of which six are dynamically chosen and two are shared, resulting in eight active experts per token. With a 128k context length, DeepSeek V2.5 demonstrates robust capabilities for long-context tasks. It ranks 18th overall on LMSYS and 6th for hard prompts, outperforming all available MoE models. Earlier iterations, such as DeepSeek V2, laid the groundwork for its success.

The most recent addition to the MoE family is Tencent's Hunyuan Large, released in November 2024. With 389 billion total parameters and 52 billion active, Hunyuan Large employs a unique design in which one expert is chosen dynamically and one is shared, resulting in two active experts per token during inference. Its 128k context length matches that of DeepSeek V2.5, positioning it as a strong competitor. While it is not yet ranked on LMSYS, early indications suggest it could rival or surpass DeepSeek's performance.

Among the MoE models discussed, DeepSeek V2.5 is the most robust option currently available. However, newer models such as Hunyuan Large and the anticipated Grok-2 may soon shift the rankings. Models like Jamba 1.5 Large also highlight the strengths of hybrid architectures, particularly in tasks requiring extensive context handling. The LMSYS rankings, while useful for initial comparisons, do not capture every nuance of model performance, especially for specialized tasks.

In conclusion, MoE models represent a growing frontier in AI, offering scalable and efficient solutions tailored to diverse applications. Developers and researchers are encouraged to explore these models based on specific use cases, leveraging their unique architectures to optimize performance. As the field evolves, the MoE landscape will likely witness further innovations, pushing the boundaries of what these architectures can achieve.

This article is based on this Reddit post. All credit for this research goes to the researchers of this project.
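To ground the sparse-activation pattern that all of the models above share, here is a toy top-k expert router in Python; the layer sizes, router, and naive per-expert loop are purely illustrative and do not correspond to any specific model's implementation:

import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoELayer(nn.Module):
    """Route each token to its top-k experts; only those experts run."""
    def __init__(self, dim=512, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(dim, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(n_experts)
        )

    def forward(self, x):                       # x: (tokens, dim)
        scores = self.router(x)                 # (tokens, n_experts)
        weights, idx = scores.topk(self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)    # normalize over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):              # naive loops; real systems batch tokens by expert
            for e in range(len(self.experts)):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * self.experts[e](x[mask])
        return out

Production MoE systems add load-balancing losses, shared experts, and expert-parallel execution, which is where the architectural differences between the models above come from.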
Asif Razzaq
  • WWW.MARKTECHPOST.COM
    Top Artificial Intelligence AI Books to Read in 2024
    Artificial Intelligence (AI) has been making significant strides over the past few years, with the emergence of Large Language Models (LLMs) marking a major milestone in its growth. With such widespread adoption, feeling left out of this revolution is not uncommon. One way an individual can stay updated with the latest trends is by reading books on various facets of AI. The following are the top AI books one should read in 2024.

Deep Learning (Adaptive Computation and Machine Learning series)
This book covers a wide range of deep learning topics along with their mathematical and conceptual background. It also provides information on the different deep learning techniques used in various industrial applications.

Python: Advanced Guide to Artificial Intelligence
This book helps individuals familiarize themselves with the most popular machine learning (ML) algorithms and delves into the details of deep learning, covering topics like CNNs, RNNs, etc. It provides a comprehensive understanding of advanced AI concepts while focusing on their practical implementation using Python.

Machine Learning (in Python and R) for Dummies
This book explains the fundamentals of machine learning by providing practical examples using Python and R. It is a beginner-friendly guide and a good starting point for people new to this field.

Machine Learning for Beginners
Given the pace with which machine learning systems are growing, this book provides a good base for anyone shifting to this field. The author covers machine intelligence's historical background and provides beginners with information on how advanced algorithms work.

Artificial Intelligence: A Modern Approach
This is a well-acclaimed book that covers the breadth of AI topics, including problem-solving, knowledge representation, machine learning, and natural language processing. It provides theoretical explanations along with practical examples, making it an excellent starting point for anyone looking to dive into the world of AI.

Human Compatible: Artificial Intelligence and the Problem of Control
The book discusses the inevitable conflict between humans and machines, providing important context before we advocate for AI. The author also discusses the possibility of superhuman AI and questions the concepts of human comprehension and machine learning.

The Alignment Problem: Machine Learning and Human Values
This book examines what it calls the alignment problem, where the systems we aim to teach don't perform as expected, and various ethical and existential risks emerge.

Life 3.0: Being Human in the Age of Artificial Intelligence
The author of this book asks what the future of AI will look like and whether superhuman intelligence could become our master. He also discusses how we can ensure these systems perform without malfunctioning.

The Coming Wave: Technology, Power, and the Twenty-First Century's Greatest Dilemma
This book warns about the risks that emerging technologies pose to the global order. It covers topics like robotics and large language models and examines the forces that fuel these innovations.

Artificial Intelligence Engines: A Tutorial Introduction to the Mathematics of Deep Learning
Artificial Intelligence Engines dives into the mathematical foundations of deep learning.
It provides a holistic understanding of deep learning, covering both the historical development of neural networks and modern techniques and architectures, while focusing on the underlying mathematical concepts.

Neural Networks and Deep Learning
This book covers the fundamental concepts of neural networks and deep learning. It also covers the mathematical aspects of the same, covering topics like linear algebra, probability theory, and numerical computation.

Artificial Intelligence for Humans
This book explains how AI algorithms are used, using actual numeric calculations. The book aims to target those without an extensive mathematical background, and each unit is followed by examples in different programming languages.

AI Superpowers: China, Silicon Valley, and the New World Order
The author of this book explains the unexpected consequences of AI development. The book sheds light on the competition between the USA and China over AI innovations through actual events.

Hello World: Being Human in the Age of Algorithms
The author talks about the powers and limitations of the algorithms that are widely used today. The book prepares its readers for the moral uncertainties of a world run by code.

The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World
This book discusses the concept of the Master Algorithm, a single, overarching learning algorithm capable of incorporating different approaches.

Applied Artificial Intelligence: A Handbook for Business Leaders
Applied Artificial Intelligence provides a guide for businesses on how to leverage AI to drive innovation and growth. It covers various applications of AI and also explores its ethical considerations. Additionally, it sheds light on building AI teams and talent acquisition.

Superintelligence: Paths, Dangers, Strategies
This book asks whether AI agents will save or destroy us and what happens when machines surpass humans in general intelligence. The author discusses the importance of global collaboration in developing safe AI.

We make a small profit from purchases made via referral/affiliate links attached to each book mentioned in the above list. If you want to suggest any book that we missed from this list, then please email us at asif@marktechpost.com

Shobha Kakkar is a data analyst with a proven track record of developing innovative machine-learning solutions that drive business value.
  • WWW.MARKTECHPOST.COM
    Top Generative Artificial Intelligence AI Courses in 2024
    In recent years, generative AI has surged in popularity, transforming fields like text generation, image creation, and code development. Its ability to automate and enhance creative tasks makes it a valuable skill for professionals across industries. Learning generative AI is crucial for staying competitive and leveraging the technology's potential to innovate and improve efficiency. This article lists the top generative AI courses that provide comprehensive training to help you master this technology, enhance your professional skill set, and stay ahead in the rapidly evolving job market.

Introduction to Generative AI Learning Path Specialization
This course offers a comprehensive introduction to generative AI, covering large language models (LLMs), their applications, and ethical considerations. The learning path comprises three courses: Generative AI, Large Language Models, and Responsible AI.

Generative AI for Everyone
This course provides a unique perspective on using generative AI. It covers how generative AI works, its applications, and its limitations, with hands-on exercises for practical use and effective prompt engineering. It aims to empower everyone to participate in an AI-powered future.

Introduction to Generative AI
This beginner-friendly course provides a solid foundation in generative AI, covering concepts, effective prompting, and major models. It includes hands-on examples and practical exercises and explores use cases across various domains like text, images, and code.

Generative AI with Large Language Models
This course teaches the fundamentals of generative AI with large language models (LLMs), including their lifecycle, transformer architecture, and optimization. It covers training, tuning, and deploying LLMs with practical insights from industry experts.

Generative AI Fundamentals Specialization
This specialization offers a comprehensive introduction to generative AI, covering models like GPT and DALL-E, prompt engineering, and ethical considerations. It includes five self-paced courses with hands-on labs and projects using tools like ChatGPT, Stable Diffusion, and IBM watsonx.ai.

Generative AI for Data Scientists Specialization
This specialization by IBM is designed for data professionals to learn generative AI, including prompt engineering and applying AI tools in data science. It features hands-on projects like text, image, and code generation, as well as creating prediction models.

Generative AI for Data Analysts Specialization
This specialization covers generative AI use cases, models, and tools for text, code, image, audio, and video generation. It includes prompt engineering techniques, ethical considerations, and hands-on labs using tools like IBM watsonx and GPT. Suitable for beginners, it offers practical projects to apply AI concepts in real-world scenarios.

Generative AI for Software Developers Specialization
This IBM specialization teaches software developers to leverage generative AI for writing high-quality code, enhancing productivity and efficiency. It includes three self-paced courses covering generative AI basics, prompt engineering, and tools like GitHub Copilot and ChatGPT, with hands-on projects to apply skills in real-world scenarios.

IBM: Developing Generative AI Applications with Python
This course teaches generative AI modeling through hands-on projects using Python, Flask, Gradio, and frameworks like LangChain.
You'll build applications with LLMs like GPT-3 and Llama 2 and explore retrieval-augmented generation and voice-enabled chatbots.

AI: Generative AI and LLMs on AWS
This course teaches deploying generative AI models like GPT on AWS through hands-on labs, covering architecture selection, cost optimization, monitoring, CI/CD pipelines, and compliance. It is ideal for ML engineers, data scientists, and technical leaders, providing real-world training for production-ready generative AI using Amazon Bedrock and cloud-native services.

Using GenAI to Automate Software Development Tasks
This course teaches how to streamline development workflows with generative AI, use AI pair-programming tools like CodeWhisperer, master prompt engineering, and understand the role of Rust and Python in MLOps. It includes hands-on experience with AWS services like CodeCatalyst, SageMaker, and Lightsail.

AI Prompt Engineering for Beginners
This course focuses on prompt engineering for AI language tools like ChatGPT. It offers hands-on practice and guidance to frame effective prompts.

Generative AI for Business Leaders
This course equips business leaders with essential knowledge of generative AI and its tools to adapt and implement this transformative technology. By the end, you'll understand how generative AI can revolutionize business operations and gain the skills needed for successful implementation.

We make a small profit from purchases made via referral/affiliate links attached to each course mentioned in the above list. If you want to suggest any course that we missed from this list, then please email us at asif@marktechpost.com
  • WWW.MARKTECHPOST.COM
    GaLiTe and AGaLiTe: Efficient Transformer Alternatives for Partially Observable Online Reinforcement Learning
    In real-world settings, agents often face limited visibility of the environment, complicating decision-making. For instance, a car-driving agent must recall road signs from moments earlier to adjust its speed, yet storing all observations is unscalable due to memory limits. Instead, agents must learn compressed representations of observations. This challenge is compounded in ongoing tasks, where essential past information cannot always be retained efficiently. Incremental state construction is key in partially observable online reinforcement learning (RL), where recurrent neural networks (RNNs) like LSTMs handle sequences effectively, though they are tough to train. Transformers capture long-term dependencies but come with higher computational costs.

Various approaches have extended linear transformers to address their limitations in handling sequential data. One architecture uses a scalar gating method to accumulate values over time, while others add recurrence and non-linear updates to enhance learning from sequential dependencies, although this can reduce parallelization efficiency. Additionally, some models selectively calculate sparse attention or cache previous activations, allowing them to attend to longer sequences without significant memory cost. Other recent innovations reduce the complexity of self-attention, improving transformers' ability to process long contexts efficiently. Though transformers are commonly used in offline reinforcement learning, their application in model-free settings is still emerging.

Researchers from the University of Alberta and Amii developed two new transformer architectures tailored for partially observable online reinforcement learning, addressing issues with the high inference costs and memory demands typical of traditional transformers. Their proposed models, GaLiTe and AGaLiTe, implement a gated self-attention mechanism to manage and update information efficiently, providing a context-independent inference cost and improved performance on long-range dependencies. Testing in 2D and 3D environments, such as T-Maze and Craftax, showed these models outperformed or matched the state-of-the-art GTrXL, reducing memory and computation by over 40%, with AGaLiTe achieving up to 37% better performance on complex tasks.

The Gated Linear Transformer (GaLiTe) enhances linear transformers by addressing key limitations, particularly the lack of mechanisms to remove outdated information and the reliance on the choice of kernel feature map. GaLiTe introduces a gating mechanism to control information flow, allowing selective memory retention, and a parameterized feature map to compute key and query vectors without needing specific kernel functions. For further efficiency, the Approximate Gated Linear Transformer (AGaLiTe) utilizes a low-rank approximation to reduce memory demands, storing recurrent states as vectors rather than matrices. This approach achieves significant space and time savings compared to other architectures, especially in complex reinforcement learning tasks.

The study evaluates the proposed AGaLiTe model across several partially observable RL tasks. In these environments, agents require memory to handle different levels of partial observability, such as recalling single cues in T-Maze, integrating information over time in CartPole, or navigating through complex environments like Mystery Path, Craftax, and Memory Maze.
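Before looking at the results, the core mechanism is easiest to picture as a recurrent state update in which keys and values accumulate into a fixed-size state that a gate can partially erase. The sketch below is a generic gated linear-attention step for illustration, not the paper's exact GaLiTe or AGaLiTe equations:

import numpy as np

def gated_linear_attention_step(S, z, q, k, v, g):
    """One recurrent step.
    S: (d_k, d_v) accumulated key-value state, z: (d_k,) normalizer,
    q, k: (d_k,) query/key features (assumed non-negative, e.g. after a feature map),
    v: (d_v,) value, g: (d_k,) gate in [0, 1] deciding what to forget."""
    S = g[:, None] * S + np.outer(k, v)   # forget, then write the new key-value pair
    z = g * z + k                          # keep the normalizer consistent with S
    out = (q @ S) / (q @ z + 1e-6)         # read: attention output for this step
    return S, z, out

# Per-step cost is O(d_k * d_v), independent of how long the episode has run,
# which is what gives these models their context-independent inference cost.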
AGaLiTe, equipped with a streamlined self-attention mechanism, achieves high performance, surpassing traditional models like GTrXL and GRU in effectiveness and computational efficiency. The results indicate that AGaLiTe's design significantly reduces operations and memory usage, offering advantages for RL tasks with extensive context requirements.

In conclusion, transformers are highly effective for sequential data processing but face limitations in online reinforcement learning due to high computational demands and the need to maintain all historical data for self-attention. This study introduces two efficient alternatives to transformer self-attention, GaLiTe and AGaLiTe, which are recurrence-based and designed for partially observable RL tasks. Both models perform competitively with or better than GTrXL, with over 40% lower inference costs and over 50% reduced memory usage. Future research may improve AGaLiTe with real-time learning updates and applications in model-based RL approaches like Dreamer V3.

Check out the Paper. All credit for this research goes to the researchers of this project.

Sana Hassan, a consulting intern at Marktechpost and dual-degree student at IIT Madras, is passionate about applying technology and AI to address real-world challenges. With a keen interest in solving practical problems, he brings a fresh perspective to the intersection of AI and real-life solutions.
  • WWW.MARKTECHPOST.COM
    Microsoft Released LLM2CLIP: A New AI Technique in which an LLM Acts as a Teacher for CLIP's Visual Encoder
    In today's world, CLIP is one of the most important multimodal foundation models. It combines visual and textual signals into a shared feature space using a simple contrastive learning loss on large-scale image-text pairs. As a retriever, CLIP supports many tasks, including zero-shot classification, detection, segmentation, and image-text retrieval. As a feature extractor, it has become dominant in virtually all cross-modal representation tasks, such as image understanding, video understanding, and text-to-image/video generation. Its strength mainly comes from its ability to connect images with natural language and capture human knowledge, as it is trained on large web data with detailed text descriptions, unlike plain vision encoders. As large language models (LLMs) develop rapidly, the boundaries of language comprehension and generation are continually being pushed. LLMs' strong text skills can help CLIP better handle long, complex captions, a weakness of the original CLIP. LLMs also carry broad knowledge from large text datasets, making training more effective. However, while LLMs have strong understanding skills, the way they are trained to generate text hides the discriminative ability of their representations, making their raw outputs hard to use directly.

Current developments have extended CLIP to handle other modalities, and its influence in the field is growing. New models like Llama 3 have been used to extend CLIP's caption length and improve its performance by leveraging the open-world knowledge of LLMs. However, incorporating LLMs into CLIP is not straightforward due to the limitations of its text encoder. In multiple experiments, it was found that directly integrating LLMs into CLIP leads to reduced performance. Thus, certain challenges must be overcome to realize the potential benefits of incorporating LLMs into CLIP.

Researchers from Tongji University and Microsoft conducted detailed research and proposed LLM2CLIP, an approach for enhancing visual representation learning by integrating large language models (LLMs). The method takes a bold, straightforward step: it replaces the original CLIP text encoder and enhances the CLIP visual encoder with the extensive knowledge of LLMs such as Llama. The work identifies the key obstacles associated with this idea and suggests a cost-effective fine-tuning strategy to overcome them. Initially, LLMs struggled as text encoders for CLIP because of their inability to clearly distinguish image captions. The researchers introduced a caption-contrastive fine-tuning technique to address this, greatly improving the LLM's ability to separate captions. This fine-tuning led to a substantial performance boost, surpassing existing state-of-the-art models. The LLM2CLIP framework then combines the improved LLM with the pretrained CLIP visual encoder, creating a powerful cross-modal model. The method uses large LLMs yet remains computationally efficient with minimal added cost.

The experiments mainly focused on fine-tuning models for better image-text matching using datasets like CC-3M. For LLM2CLIP fine-tuning, three dataset sizes were tested: small (CC-3M), medium (CC-3M and CC-12M), and large (CC-3M, CC-12M, YFCC-15M, and Recaption-1B). Training with augmented captions improved performance, while using an untrained language model for CLIP worsened it.
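Conceptually, the training signal remains a CLIP-style symmetric contrastive loss, with the text side now produced by the caption-contrastive fine-tuned LLM. The sketch below shows that generic loss; it is an illustration of the standard recipe, not the authors' exact implementation:

import torch
import torch.nn.functional as F

def clip_style_contrastive_loss(image_feats, text_feats, temperature=0.07):
    """Symmetric InfoNCE over a batch of matched image/text pairs.
    image_feats: (B, d) from the CLIP vision encoder,
    text_feats:  (B, d) projected from the LLM's caption embeddings."""
    image_feats = F.normalize(image_feats, dim=-1)
    text_feats = F.normalize(text_feats, dim=-1)
    logits = image_feats @ text_feats.t() / temperature   # (B, B) similarity matrix
    targets = torch.arange(len(logits), device=logits.device)
    loss_i2t = F.cross_entropy(logits, targets)            # match each image to its caption
    loss_t2i = F.cross_entropy(logits.t(), targets)        # and each caption to its image
    return (loss_i2t + loss_t2i) / 2

In LLM2CLIP the LLM gradients are frozen, so only the projection and the CLIP visual encoder need to be updated at a large batch size, which keeps this objective affordable.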
Models trained with LLM2CLIP outperformed standard CLIP and EVA in tasks like image-to-text and text-to-image retrieval, highlighting the advantage of integrating large language models with image-text models. LLM2CLIP directly boosted the performance of the previous SOTA EVA02 model by 16.5% on both long-text and short-text retrieval tasks, transforming a CLIP model trained solely on English data into a state-of-the-art cross-lingual model. After integrating multimodal training with models like LLaVA 1.5, it performed better than CLIP on almost all benchmarks, showing significant overall improvements.

In conclusion, the proposed method allows LLMs to assist in CLIP training. By adjusting factors such as data distribution, caption length, or categories, the LLM can be modified to address CLIP's limitations, letting it act as a more comprehensive teacher for various tasks. In the proposed work, the LLM gradients were frozen during fine-tuning to maintain a large batch size for CLIP training. In future work, LLM2CLIP could be trained from scratch on datasets like Laion-2B and Recaption-1B for better results and performance. This work can serve as a baseline for future research on CLIP training and its wide range of applications.

Check out the Paper, Code, and Model on Hugging Face. All credit for this research goes to the researchers of this project.

Divyesh Vitthal Jawkhede is a consulting intern at Marktechpost. He is pursuing a BTech in Agricultural and Food Engineering from the Indian Institute of Technology, Kharagpur. He is a Data Science and Machine Learning enthusiast who wants to integrate these leading technologies into the agricultural domain and solve challenges.
  • WWW.MARKTECHPOST.COM
    Meta AI Researchers Introduce Mixture-of-Transformers (MoT): A Sparse Multi-Modal Transformer Architecture that Significantly Reduces Pretraining Computational Costs
    Advancements in AI have paved the way for multi-modal foundation models that simultaneously process text, images, and speech under a unified framework. These models can potentially transform various applications, from content creation to seamless translation across media types, as they enable the generation and interpretation of complex data. However, achieving this requires immense computational resources, which creates a barrier to scaling and operational efficiency. Training these multi-modal systems is complex, as each modality, whether text, image, or audio, introduces unique challenges, requiring customized handling while maintaining cohesion within the model's framework. Balancing this level of diversity in data types has proven difficult in terms of both processing power and training efficiency.

A primary issue in multi-modal AI research is that traditional language models are optimized for text, and extending them to incorporate images and audio requires substantial computational power. Large language models (LLMs) designed specifically for text-based tasks do not naturally integrate other modalities because of the inherent differences in how each modality needs to be processed. For instance, a text model optimized on trillions of tokens can only be extended to image and speech data at the cost of conflicts in the training dynamics. Consequently, the computational load escalates, with these models requiring up to five times the data and processing power of text-only models. Researchers therefore aim to find architectures that can accommodate these requirements without a proportional increase in resources.

Various strategies currently address this need for computational efficiency in multi-modal models. One prominent approach is the use of sparse architectures, such as Mixture-of-Experts (MoE), which activates only specific parts of the model as needed. MoE operates by utilizing experts to manage different aspects of the data, reducing the workload of the model at any given moment. However, MoE has limitations, including instability caused by unbalanced expert utilization and difficulty managing training dynamics at scale. Furthermore, MoE's routing mechanism tends to focus on specific aspects of the data, often leading to an imbalance in training different modalities, thus requiring additional techniques to stabilize the process and maintain efficiency.

Researchers from FAIR at Meta and Stanford University introduced a new architecture called Mixture-of-Transformers (MoT). MoT, built as a sparse, multi-modal transformer, reduces computational demands by incorporating modality-specific parameters. Unlike traditional dense models that rely on uniform processing, MoT utilizes distinct components for each modality (text, image, and speech), allowing for modality-specific optimization without requiring additional model components. For example, MoT assigns unique feed-forward networks, attention matrices, and normalization layers to each modality while maintaining a unified attention mechanism across the entire input data sequence, enhancing processing efficiency and output accuracy.

The Mixture-of-Transformers framework leverages this sparse design by decoupling the model parameters according to modality, optimizing both the training and inference phases. For instance, MoT separates text, image, and speech parameters during a multi-modal task, applying customized processing layers for each. This process reduces the need for dense model layers to accommodate all modalities simultaneously.
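A toy rendering of this decoupling is shown below: tokens from all modalities share a single global self-attention pass, and each modality's tokens are then routed through their own feed-forward and normalization parameters. Class and variable names are assumptions for illustration, not Meta's implementation:

import torch
import torch.nn as nn

class ToyMoTBlock(nn.Module):
    """Global self-attention shared across modalities, with per-modality FFN + norm."""
    def __init__(self, dim=512, n_heads=8, modalities=("text", "image", "speech")):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.ffn = nn.ModuleDict({m: nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)) for m in modalities})
        self.norm = nn.ModuleDict({m: nn.LayerNorm(dim) for m in modalities})

    def forward(self, x, modality_of_token):      # x: (1, seq, dim); modality_of_token: list[str]
        attn_out, _ = self.attn(x, x, x)           # one attention pass over the full sequence
        h = x + attn_out
        out = h.clone()
        for m in self.ffn:                         # route each token to its modality's parameters
            mask = torch.tensor([t == m for t in modality_of_token], device=x.device)
            if mask.any():
                out[:, mask] = self.norm[m](h[:, mask] + self.ffn[m](h[:, mask]))
        return out

Because the expensive per-token feed-forward work is split by modality rather than duplicated, total parameters grow while the FLOPs per token stay close to those of a single dense block.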
As a result, MoT achieves a balance of efficiency and effectiveness that traditional dense models lack. For instance, in tests involving text and image generation within the Chameleon 7B model, MoT delivered results comparable to dense baselines with only 55.8% of the FLOPs, and with only 37.2% when integrating a third modality, such as speech. This efficiency gain translates to significant reductions in resource usage, which, in large-scale AI models, can lead to major cost savings.

Mixture-of-Transformers showed notable improvements across multiple evaluation criteria. Compared to dense transformer models, the architecture reduced pretraining times for text and image tasks by over 40%. In the Chameleon setting, where the model processes text and images using autoregressive objectives, MoT reached the dense model's final validation loss using just 55.8% of the computational power. Furthermore, MoT accelerated training by reaching the same image-quality levels in 47.2% of the time required by dense models, and the same text quality in 75.6% of the typical time. Such efficiency gains were further confirmed in the Transfusion setting, where MoT matched dense baseline image performance while using only one-third of the FLOPs, proving its adaptability and resource efficiency in handling complex multi-modal data.

The research offers several key takeaways, highlighting the potential of Mixture-of-Transformers to redefine multi-modal AI processing:

- Efficient Multi-Modal Processing: MoT matches dense model performance across text, image, and speech, achieving results with 37.2% to 55.8% of the computational resources.
- Training Acceleration: In the Chameleon model, MoT reduced training time for image tasks by 52.8% and text tasks by 24.4% while maintaining accuracy.
- Adaptive Scalability: MoT demonstrated high adaptability by effectively handling discrete and continuous tokens for multiple modalities without additional processing layers.
- Resource Reduction in Real-Time Use: Performance evaluations on NVIDIA A100 GPUs showed MoT significantly reduced wall-clock training time, making it a viable option for real-time applications.

In conclusion, Mixture-of-Transformers presents an innovative approach to multi-modal modeling by offering an efficient, scalable solution for integrating diverse data types within a single framework. Through a sparse architecture that leverages modality-specific processing, MoT significantly reduces computational load while delivering robust performance across various tasks. This breakthrough could transform the landscape of AI, enabling more accessible, resource-efficient models for advanced multi-modal applications.

Check out the Paper. All credit for this research goes to the researchers of this project.

Asif Razzaq
  • WWW.MARKTECHPOST.COM
    Fixie AI Introduces Ultravox v0.4.1: A Family of Open Speech Models Trained Specifically for Enabling Real-Time Conversation with LLMs and An Open-Weight Alternative to GPT-4o Realtime
    Interacting seamlessly with artificial intelligence in real time has always been a complex endeavor for developers and researchers. A significant challenge lies in integrating multi-modal information, such as text, images, and audio, into a cohesive conversational system. Despite advancements in large language models like GPT-4, many AI systems still encounter difficulties in achieving real-time conversational fluency, contextual awareness, and multi-modal understanding, which limits their effectiveness for practical applications. Additionally, the computational demands of these models make real-time deployment challenging without considerable infrastructure.

Introducing Fixie AI's Ultravox v0.4.1

Fixie AI introduces Ultravox v0.4.1, a family of multi-modal, open-source models trained specifically for enabling real-time conversations with AI. Designed to overcome some of the most pressing challenges in real-time AI interaction, Ultravox v0.4.1 incorporates the ability to handle multiple input formats, such as text, images, and other sensory data. This latest release aims to provide an alternative to closed-source models like GPT-4, focusing not only on language proficiency but also on enabling fluid, context-aware dialogues across different types of media. By being open source, Fixie AI also aims to democratize access to state-of-the-art conversation technologies, allowing developers and researchers worldwide to adapt and fine-tune Ultravox for diverse applications, from customer support to entertainment.

Technical Details and Key Benefits

The Ultravox v0.4.1 models are built on a transformer-based architecture optimized to process multiple types of data in parallel. Leveraging a technique called cross-modal attention, these models can integrate and interpret information from various sources simultaneously. This means users can present an image to the AI, type in a question about it, and receive an informed response in real time. The open-source models are hosted on Hugging Face under the Fixie AI organization, making it convenient for developers to access and experiment with them. Fixie AI has also provided a well-documented API to facilitate seamless integration into real-world applications. The models boast impressive latency reduction, allowing interactions to take place almost instantly, making them suitable for real-time scenarios like live customer interactions and educational assistance.

Ultravox v0.4.1 represents a notable advancement in conversational AI systems. Unlike proprietary models, which often operate as opaque black boxes, Ultravox offers an open-weight alternative with performance comparable to GPT-4 while also being highly adaptable. Analysis based on Figure 1 from recent evaluations shows that Ultravox v0.4.1 achieves significantly lower response latency, approximately 30% faster than leading commercial models, while maintaining equivalent accuracy and contextual understanding. The model's cross-modal capabilities make it effective for complex use cases, such as integrating images with text for comprehensive analysis in healthcare or delivering enriched interactive educational content. The open nature of Ultravox facilitates continuous community-driven development, enhancing flexibility and fostering transparency.
By mitigating the computational overhead associated with deploying such models, Ultravox makes advanced conversational AI more accessible to smaller entities and independent developers, bridging a gap previously imposed by resource constraints.

Conclusion

Ultravox v0.4.1 by Fixie AI marks a significant milestone for the AI community by addressing critical issues in real-time conversational AI. With its multi-modal capabilities, open-source model weights, and focus on reducing response latency, Ultravox paves the way for more engaging and accessible AI experiences. As more developers and researchers start experimenting with Ultravox, it has the potential to foster innovative applications across industries that demand real-time, context-rich, multi-modal conversations.

Asif Razzaq
  • WWW.MARKTECHPOST.COM
    FineTuneBench: Evaluating LLMs' Ability to Incorporate and Update Knowledge through Fine-Tuning
    The demand for fine-tuning LLMs to incorporate new information and refresh existing knowledge is growing. While companies like OpenAI and Google offer fine-tuning APIs that allow LLM customization, their effectiveness for knowledge updating remains to be determined. LLMs used in fields like software and medicine need current, domain-specific information: software developers need models updated with the latest code, while healthcare requires adherence to recent guidelines. Although fine-tuning services offer a way to adapt proprietary, closed-source models, they lack transparency regarding methods, and limited hyperparameter options may restrict knowledge infusion. No standardized benchmarks exist to evaluate these fine-tuning capabilities.

Current methods to alter LLM behavior include SFT, RLHF, and continued pre-training. However, the effectiveness of these approaches for knowledge infusion is still being determined. Retrieval-augmented generation (RAG) introduces knowledge in prompts, though models often ignore conflicting information, causing inaccuracies. Past research has explored knowledge injection in open-source LLMs using adapters or shallow-layer fine-tuning, but less is understood about fine-tuning larger commercial models. Prior studies have fine-tuned models for classification and summarization, yet this work uniquely focuses on knowledge infusion and compares multiple fine-tuning APIs on a shared dataset.

Stanford University researchers have developed FineTuneBench, a comprehensive framework and dataset to evaluate how effectively commercial fine-tuning APIs allow LLMs to incorporate new and updated knowledge. Testing five advanced LLMs, including GPT-4o and Gemini 1.5 Pro, in two scenarios (introducing new information, such as recent news, and updating existing knowledge, such as medical guidelines), the study found limited success across models. The models averaged only 37% accuracy for learning new information and 19% for updating knowledge. Among them, GPT-4o mini performed best, while the Gemini models showed minimal capacity for knowledge updates, underscoring limitations in current fine-tuning services for reliable knowledge adaptation.

To evaluate how well fine-tuning can enable models to learn new information, the researchers created two unique datasets: a Latest News Dataset and a Fictional People Dataset, ensuring none of the data existed in the models' training sets. The Latest News Dataset, generated from September 2024 Associated Press articles, was crafted into 277 question-answer pairs, which were further rephrased to test model robustness. The Fictional People Dataset included profile facts about fictional characters, producing direct and derived questions for knowledge testing. Models were trained on both datasets using various methods, such as masking answers in the prompt. Different configurations and epochs were explored to optimize performance.

Fine-tuning OpenAI models shows high memorization but limited generalization for new knowledge tasks. While models like GPT-4o mini excel at recalling trained QA pairs, they struggle with rephrased questions, especially in the Fictional People dataset, where responses to secondary or comparative questions remain weak. Updating knowledge is harder, notably in coding tasks, due to the challenge of altering pre-existing information. The Gemini models underperform across tasks, struggling to either memorize or generalize effectively.
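For context on what such commercial fine-tuning involves mechanically, the sketch below prepares chat-formatted QA pairs and submits a job with the OpenAI Python SDK; the file name, example content, model snapshot identifier, and default hyperparameters are placeholders rather than the exact FineTuneBench configuration:

import json
from openai import OpenAI

# One chat-formatted example per line in a JSONL training file.
examples = [
    {"messages": [
        {"role": "user", "content": "What changed in the September 2024 guideline?"},
        {"role": "assistant", "content": "<the updated fact the model should learn>"},
    ]},
]
with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

client = OpenAI()  # reads OPENAI_API_KEY from the environment
training_file = client.files.create(file=open("train.jsonl", "rb"), purpose="fine-tune")
job = client.fine_tuning.jobs.create(
    model="gpt-4o-mini-2024-07-18",   # placeholder snapshot; available models change over time
    training_file=training_file.id,
)
print(job.id, job.status)

The benchmark's finding is that even when jobs like this memorize the training pairs, the resulting models often fail to generalize to rephrased or derived questions.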
Training methods like word masking and prompt completions also fail to enhance generalization, suggesting that standard training paradigms may not adequately improve adaptability.

The study presents FineTuneBench, a dataset collection that tests fine-tuned LLMs' capacity to acquire knowledge about news, fictional people, medical guidelines, and code libraries. Despite fine-tuning, models showed limited knowledge adaptation, with GPT-4o mini outperforming the others and Gemini underperforming. Relying on LLM fine-tuning remains challenging, as the current methods and parameters exposed by OpenAI and Google are limited. RAG approaches are also suboptimal due to cost and scaling issues. Limitations of the study include testing only two LLM providers and using mostly default fine-tuning parameters. Future work will explore how question complexity impacts model generalization.

Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project.

Sana Hassan
  • WWW.MARKTECHPOST.COM
    Meet Aioli: A Unified Optimization Framework for Language Model Data Mixing
    In recent years, training large language models has faced a crucial challenge: determining the optimal data mixture. Models like GPT-4 can generate diverse content types, ranging from legal texts to conversational responses. However, their performance hinges significantly on the right balance of training data from various sources. The problem of data mixing refers to how we can optimally blend these diverse data types, such as law, code, and scientific articles, in the model's training process. Traditional approaches have involved either static proportioning of these datasets or, more recently, dynamically altering these mixtures during training. Despite these advances, current methods have proven inconsistent, with none clearly outperforming a simple stratified sampling baseline in average test performance. This inconsistency highlights a core issue: existing approaches lack a unified, systematic framework for optimizing data mixtures, leading to suboptimal performance and wasted computational resources.

Meet Aioli: A Unified Optimization Framework for Language Model Data Mixing

In response to these challenges, a team of researchers from Stanford, NYU, and Genentech have introduced Aioli, a novel online data mixing method that leverages a unified optimization framework called Linear Mixing Optimization (LMO). The LMO framework aims to streamline and improve the way data mixtures are optimized during language model training. Unlike previous methods, Aioli does not merely rely on static guesses or manual tuning. Instead, it incorporates the ongoing dynamics of the training process itself, estimating mixing parameters directly from the model's performance. This dynamic adjustment allows Aioli to estimate the ideal mixture proportions more effectively without requiring additional training runs, which are often computationally prohibitive. By implementing Aioli, the research team aims to address the inconsistent results of previous data mixing strategies and offer a more reliable, systematic approach.

Technical Details

Aioli's approach is grounded in the Linear Mixing Optimization framework, which formulates data mixing as an optimization problem with the goal of minimizing the average test loss of the language model across various data groups. Unlike traditional offline methods, which require separate training runs to determine optimal mixture ratios, Aioli uses an online adjustment mechanism based on exponentiated gradient descent. This allows the model to adjust the mixture proportions dynamically at each training step. Essentially, Aioli fits the parameters of a linear dynamic mixing law throughout training, allowing it to adapt to the specific needs of the model at that moment and minimizing discrepancies between estimated and optimal mixing parameters.

Experimentally, Aioli has shown considerable promise. On six distinct datasets, Aioli outperformed stratified sampling (a method that evenly blends all data groups) by an average improvement of 0.28 in test perplexity, indicating better model accuracy. In more constrained training settings, where proportion estimates must be learned on shorter runs, Aioli further demonstrated its ability to adjust and improve results, achieving up to 12.01 test perplexity points of improvement over previous methods.

Importance

The introduction of Aioli is a significant breakthrough for several reasons. First, the framework provides a clear understanding of why previous methods failed to consistently improve upon simple data mixing baselines.
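To make the exponentiated-gradient idea described under Technical Details concrete, the following toy sketch updates mixture proportions from per-group losses; the step size, loss-based signal, and group names are assumptions for exposition, not the released Aioli implementation:

import numpy as np

def exponentiated_gradient_step(proportions, group_losses, lr=0.1):
    """Update data-mixture proportions online.
    proportions: current sampling weights over k data groups (sums to 1),
    group_losses: recent loss per group; here, higher loss pulls more weight."""
    gradient_signal = group_losses - group_losses.mean()   # which groups deserve more weight
    new_p = proportions * np.exp(lr * gradient_signal)
    return new_p / new_p.sum()                              # re-normalize onto the simplex

p = np.full(3, 1 / 3)                                       # e.g. law, code, science
p = exponentiated_gradient_step(p, np.array([2.1, 1.4, 1.8]))
print(p)  # proportions shift toward the higher-loss group

The multiplicative update keeps every proportion positive and summing to one, which is why exponentiated gradient descent is a natural fit for adjusting a sampling distribution during training.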
Experimentally, Aioli has shown considerable promise. On six distinct datasets, Aioli outperformed stratified sampling, a method that evenly blends all data groups, by an average of 0.28 test perplexity points, indicating better model accuracy. In more constrained training settings, where proportion estimates must be learned from shorter runs, Aioli further demonstrated its ability to adjust and improve results, achieving up to 12.01 test perplexity points of improvement over previous methods.

Importance

The introduction of Aioli is a significant step forward for several reasons. First, the framework provides a clear understanding of why previous methods failed to consistently improve upon simple data mixing baselines. By using LMO, the researchers were able to unify various existing methods and identify flaws in how their mixing laws were parameterized. The core insight was that while existing parameterizations were well specified mathematically, the methods themselves often set these parameters inaccurately, leading to performance losses. Aioli corrects this by dynamically estimating the parameters throughout training, providing a more consistent and reliable improvement. Additionally, Aioli's importance lies in its efficiency: it requires no extra training runs, which saves computational resources and reduces the carbon footprint associated with training large language models. For practical applications, such as updating a conversational AI or optimizing a search engine's response mechanism, this means faster deployment and reduced cost.

Conclusion

Aioli presents a promising solution to the ongoing challenge of data mixing in language model training. By unifying the optimization process through the Linear Mixing Optimization framework, Aioli dynamically adjusts data mixture proportions in real time, offering improved accuracy without additional computational overhead. Its ability to consistently outperform both existing online and offline methods across multiple datasets makes it a valuable tool for practitioners looking to improve language model performance. With the increasing demand for powerful language models that can cater to diverse tasks and domains, Aioli's unified and optimized approach offers a significant step forward, enabling models to learn more effectively from the rich tapestry of human knowledge.

Check out the Paper and GitHub. All credit for this research goes to the researchers of this project.
  • WWW.MARKTECHPOST.COM
    Qwen Open Sources the Powerful, Diverse, and Practical Qwen2.5-Coder Series (0.5B/1.5B/3B/7B/14B/32B)
In the world of software development, there is a constant need for more intelligent, capable, and specialized coding language models. While existing models have made significant strides in automating code generation, completion, and reasoning, several issues persist. The main challenges include inefficiency in dealing with a diverse range of coding tasks, a lack of domain-specific expertise, and difficulty applying models to real-world coding scenarios. Despite the rise of many large language models (LLMs), code-specific models have often struggled to compete with their proprietary counterparts, especially in terms of versatility and applicability. The need for a model that not only performs well on standard benchmarks but also adapts to diverse environments has never been greater.

Qwen2.5-Coder: A New Era of Open CodeLLMs

Qwen has open-sourced the powerful, diverse, and practical Qwen2.5-Coder series, dedicated to continuously promoting the development of open CodeLLMs. The Qwen2.5-Coder series is built upon the Qwen2.5 architecture, leveraging its advanced design and expansive tokenizer to enhance the efficiency and accuracy of coding tasks. By open-sourcing these models, Qwen has made them accessible to developers, researchers, and industry professionals. The family offers a range of sizes from 0.5B to 32B parameters, providing flexibility for a wide variety of coding needs. The release of Qwen2.5-Coder-32B-Instruct comes at an opportune moment, presenting itself as the most capable and practical coder model of the Qwen series and highlighting Qwen's commitment to fostering innovation and advancing the field of open-source coding models.

Technical Details

Technically, the Qwen2.5-Coder models have undergone extensive pretraining on a corpus of over 5.5 trillion tokens, which includes public code repositories and large-scale web-crawled data containing code-related texts. The model architecture is shared across different model sizes, such as the 1.5B and 7B variants, featuring 28 layers with differences in hidden sizes and attention heads. Moreover, Qwen2.5-Coder has been fine-tuned using synthetic datasets generated by its predecessor, CodeQwen1.5, incorporating an executor to ensure only executable code is retained, thereby reducing hallucination risks. The models have also been designed to be versatile, supporting various pretraining objectives such as code generation, completion, reasoning, and editing.
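Because the checkpoints are published on Hugging Face, a released model can be tried with the standard transformers text-generation workflow. The sketch below is illustrative only; the exact repository id and generation settings are assumptions that should be checked against the model cards.

    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumed repository id; confirm the exact name on the Hugging Face Hub.
    model_id = "Qwen/Qwen2.5-Coder-7B-Instruct"

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    messages = [
        {"role": "user", "content": "Write a Python function that checks whether a string is a palindrome."}
    ]
    # Chat-style prompting via the tokenizer's chat template.
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)

    outputs = model.generate(inputs, max_new_tokens=256)
    print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))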
State-of-the-Art Performance

One of the reasons Qwen2.5-Coder stands out is its demonstrated performance across multiple evaluation benchmarks. It has achieved state-of-the-art (SOTA) results on more than 10 benchmarks, including HumanEval and BigCodeBench, surpassing even some larger models. Specifically, Qwen2.5-Coder-7B-Base achieved higher accuracy on the HumanEval and MBPP benchmarks than models like StarCoder2 and DeepSeek-Coder of comparable or even greater size. The Qwen2.5-Coder series also excels in multi-programming-language capabilities, demonstrating balanced proficiency across eight languages, including Python, Java, and TypeScript. Additionally, Qwen2.5-Coder's long-context capabilities are notably strong, making it suitable for repository-level code and effectively supporting inputs of up to 128k tokens.

Scalability and Accessibility

Furthermore, the availability of models in various parameter sizes (ranging from 0.5B to 32B), along with quantized formats such as GPTQ, AWQ, and GGUF, ensures that Qwen2.5-Coder can cater to a wide range of computational requirements. This scalability is crucial for developers and researchers who may not have access to high-end computational resources but still need powerful coding capabilities. Qwen2.5-Coder's support for different formats makes it more accessible for practical use, allowing broader adoption in diverse applications. Such adaptability makes the Qwen2.5-Coder family a vital tool for promoting the development of open-source coding assistants.

Conclusion

The open sourcing of the Qwen2.5-Coder series marks a significant step forward in the development of coding language models. By releasing models that are powerful, diverse, and practical, Qwen has addressed key limitations of existing code-specific models. The combination of state-of-the-art performance, scalability, and flexibility makes the Qwen2.5-Coder family a valuable asset for the global developer community. Whether you need the efficiency of a 0.5B model or the expansive power of a 32B variant, the Qwen2.5-Coder family aims to meet the needs of a diverse range of users. Now is an excellent time to explore Qwen's most capable coder model yet, Qwen2.5-Coder-32B-Instruct, along with its versatile family of smaller coders, as open-source coding language models continue to push the boundaries of innovation and accessibility.

Check out the Paper, Models on Hugging Face, Demo, and Details. All credit for this research goes to the researchers of this project.
  • WWW.MARKTECHPOST.COM
    MOS-Bench: A Comprehensive Collection of Datasets for Training and Evaluating Subjective Speech Quality Assessment (SSQA) Models
A critical challenge in Subjective Speech Quality Assessment (SSQA) is enabling models to generalize across diverse and unseen speech domains. Many SSQA models perform poorly outside their training domain because data characteristics and scoring conventions differ substantially across SSQA tasks such as text-to-speech (TTS), voice conversion (VC), and speech enhancement. Effective generalization is necessary if SSQA is to stay aligned with human perception in these fields, yet many models remain tied to the data they were trained on, limiting their real-world utility in applications such as automated speech evaluation for TTS and VC systems.

Current SSQA approaches include both reference-based and model-based methods. Reference-based models evaluate quality by comparing speech samples with a reference, while model-based methods, especially deep neural networks (DNNs), learn directly from human-annotated datasets. Model-based SSQA has strong potential for capturing human perception more precisely, but it also shows significant limitations:

- Generalization constraints: SSQA models often break down when tested on new, out-of-domain data, resulting in inconsistent performance.
- Dataset bias and corpus effects: Models may overfit to the peculiarities of a dataset, such as scoring biases or data types, making them less effective across different datasets.
- Computational complexity: Ensemble models increase the robustness of SSQA but also raise computational cost relative to a single baseline model, making them impractical for real-time assessment in low-resource settings.

These limitations collectively hinder the development of SSQA models that generalize well across datasets and application contexts.

To address these limitations, the researchers introduce MOS-Bench, a benchmark collection that includes seven training datasets and twelve test datasets spanning varied speech types, languages, and sampling frequencies. Alongside MOS-Bench, they propose SHEET, a toolkit that provides a standardized workflow for training, validating, and testing SSQA models. Together, MOS-Bench and SHEET allow SSQA models to be evaluated systematically, with a particular focus on generalization ability. MOS-Bench adopts a multi-dataset approach, combining data from different sources to expose models to varying conditions. In addition, a new performance metric, the best score difference/ratio, is introduced to provide a holistic assessment of SSQA performance across these datasets. This not only provides a framework for consistent evaluation but also improves generalization by aligning models with real-world variability, a notable contribution to SSQA.

The MOS-Bench collection covers a wide range of datasets with diverse sampling frequencies and listener labels to capture cross-domain variability in SSQA. Major datasets include:

- BVCC: an English dataset with samples from TTS and VC systems.
- SOMOS: speech quality ratings for English TTS models trained on LJSpeech.
- SingMOS: a singing-voice dataset with Chinese and Japanese samples.
- NISQA: noisy speech samples transmitted over communication networks.

The datasets span multiple languages, domains, and speech types, giving broad training coverage. MOS-Bench uses the SSL-MOS model and a modified AlignNet as backbones, relying on self-supervised learning (SSL) to obtain rich feature representations. SHEET streamlines the SSQA process with data processing, training, and evaluation workflows, and it also includes retrieval-based scoring via non-parametric kNN inference to improve the faithfulness of model predictions. Hyperparameter tuning, such as batch size and optimization strategies, is included to further improve model performance.
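To illustrate the retrieval-based scoring idea, here is a small, self-contained sketch of non-parametric kNN MOS prediction over embedding vectors. It is a generic illustration, not the SHEET implementation; the embedding dimensions and scores are placeholders.

    import numpy as np

    def knn_mos_predict(query_emb, train_embs, train_scores, k=5):
        """Predict a MOS value for one utterance embedding by averaging the
        listener scores of its k nearest training utterances (Euclidean)."""
        dists = np.linalg.norm(train_embs - query_emb, axis=1)
        nearest = np.argsort(dists)[:k]
        return float(np.mean(train_scores[nearest]))

    # Placeholder data: 1,000 training utterances with 256-dim embeddings
    # (e.g., from an SSL speech encoder) and their human MOS labels in [1, 5].
    rng = np.random.default_rng(0)
    train_embs = rng.normal(size=(1000, 256))
    train_scores = rng.uniform(1.0, 5.0, size=1000)

    query_emb = rng.normal(size=256)
    print(knn_mos_predict(query_emb, train_embs, train_scores, k=8))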
Using MOS-Bench and SHEET, the authors report substantial improvements in SSQA generalization across both synthetic and non-synthetic test sets, with models achieving strong ranking performance and faithful quality predictions even on out-of-domain data. Models trained on MOS-Bench datasets such as PSTN and NISQA proved highly robust on synthetic test sets, suggesting that synthetic-focused training data is no longer a prerequisite for generalization. Visualizations further confirmed that models trained on MOS-Bench captured a wide variety of data distributions, reflecting better adaptability and consistency. These results establish MOS-Bench as a reliable benchmark that helps SSQA models deliver accurate performance across domains, improving the effectiveness and applicability of automated speech quality assessment.

Through MOS-Bench and SHEET, this work tackles the generalization problem of SSQA with multiple datasets and a new evaluation metric. By reducing dataset-specific biases and improving cross-domain applicability, it pushes SSQA research toward models that generalize effectively across applications. The collection of cross-domain datasets and the standardized toolkit are an important advance, and the resources are now available for researchers to develop SSQA models that remain robust across a variety of speech types and real-world applications.

Check out the Paper. All credit for this research goes to the researchers of this project.
  • WWW.MARKTECHPOST.COM
    GPTKB: Large-Scale Knowledge Base Construction from Large Language Models
Knowledge bases like Wikidata, Yago, and DBpedia have served as fundamental resources for intelligent applications, but innovation in general-world knowledge base construction has been stagnant over the past decade. While Large Language Models (LLMs) have revolutionized various AI domains and shown potential as sources of structured knowledge, extracting and materializing their complete knowledge remains a significant challenge. Current approaches mainly focus on sample-based evaluations using question-answering datasets or specific domains, falling short of comprehensive knowledge extraction. Moreover, scaling knowledge base construction from LLMs through factual prompts and iterative graph expansion, while maintaining accuracy and completeness, poses technical and methodological challenges.

Existing knowledge base construction methods follow two main paradigms: volunteer-driven approaches like Wikidata and structured information harvesting from sources like Wikipedia, exemplified by Yago and DBpedia. Text-based knowledge extraction systems like NELL and ReVerb represent an alternative approach but have seen limited adoption. Current methods for evaluating LLM knowledge primarily depend on sampling specific domains or benchmarks, failing to capture the full extent of their knowledge. While some attempts have been made to extract knowledge from LLMs through prompting and iterative exploration, these efforts have been limited in scale or focused on specific domains.

Researchers from ScaDS.AI and TU Dresden, Germany, and the Max Planck Institute for Informatics, Saarbrücken, Germany, have proposed an approach to construct a large-scale knowledge base entirely from LLMs. They introduced GPTKB, built using GPT-4o-mini, demonstrating the feasibility of extracting structured knowledge at scale while addressing specific challenges in entity recognition, canonicalization, and taxonomy construction. The resulting knowledge base contains 105 million triples covering more than 2.9 million entities, achieved at a fraction of the cost of traditional KB construction methods. This approach bridges two domains: it provides insights into how LLMs represent knowledge, and it advances general-domain knowledge base construction methods.

The architecture of GPTKB follows a two-phase approach to knowledge extraction and organization. The first phase implements an iterative graph expansion process, starting from a seed subject (Vannevar Bush) and systematically extracting triples while identifying newly named entities for further exploration. This expansion process uses a multilingual named entity recognition (NER) system built on spaCy models across 10 major languages, with rule-based filters that keep the focus on relevant entities and prevent drift into linguistic or translation-related content. The second phase emphasizes consolidation, which includes entity canonicalization, relation standardization, and taxonomy construction. The system operates independently of existing knowledge bases or standardized vocabularies, relying only on the LLM's knowledge.
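A minimal sketch of the iterative graph expansion phase is shown below. The prompt, the query_llm and extract_named_entities helpers, and the triple format are hypothetical placeholders used only to illustrate the loop structure described above, not the GPTKB code.

    from collections import deque

    def expand_knowledge_graph(seed_subject, query_llm, extract_named_entities, max_subjects=1000):
        """Breadth-first expansion: ask the LLM for triples about a subject,
        then queue every newly seen entity as a future subject."""
        frontier = deque([seed_subject])
        visited = set()
        triples = []

        while frontier and len(visited) < max_subjects:
            subject = frontier.popleft()
            if subject in visited:
                continue
            visited.add(subject)

            # Hypothetical prompt; a real system would also constrain the output format.
            response = query_llm(f"List factual (subject, relation, object) triples about {subject}.")
            for s, relation, obj in response:
                triples.append((s, relation, obj))
                # Only named entities (filtered, e.g., by a spaCy NER model) are explored further.
                for entity in extract_named_entities(obj):
                    if entity not in visited:
                        frontier.append(entity)

        return triples

    # Usage: expand_knowledge_graph("Vannevar Bush", query_llm=my_llm, extract_named_entities=my_ner)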
GPTKB shows significant scale and diversity in its knowledge representation. It is rich in patent- and person-related information, with nearly 600,000 human entities, and its most common properties are patentCitation (3.15M triples) and instanceOf (2.96M), alongside person-specific properties like hasOccupation (126K), knownFor (119K), and nationality (114K). Comparative analysis with Wikidata reveals that only 24% of GPTKB subjects have exact matches in Wikidata, with 69.5% being potentially novel entities. The knowledge base also captures properties not modeled in Wikidata, such as historicalSignificance (270K triples), hobbies (30K triples), and hasArtStyle (11K triples), suggesting a significant contribution of novel knowledge.

In conclusion, the researchers introduced an approach to construct a large-scale knowledge base entirely from LLMs. The successful development of GPTKB shows the feasibility of building large-scale knowledge bases directly from LLMs, marking a significant advancement for natural language processing and the semantic web. While challenges remain in ensuring precision and in handling tasks like entity recognition and canonicalization, the approach has proven highly cost-effective, generating 105 million assertions for over 2.9 million entities at a fraction of traditional costs. The work provides valuable insights into how LLMs represent knowledge and opens a new door for open-domain knowledge base construction, changing how structured knowledge is extracted and organized from language models.

Check out the Paper. All credit for this research goes to the researchers of this project.
  • WWW.MARKTECHPOST.COM
    Salesforce AI Research Introduces Moirai-MoE: A MoE Time Series Foundation Model that Achieves Token-Level Model Specialization Autonomously
Time series forecasting has long been integral to finance, healthcare, meteorology, and supply chain management. Its main objective is to predict future data points based on historical observations, which can be challenging due to the complex and varying nature of time series data. Recent advancements in machine learning, particularly foundation models, have transformed this domain by creating generalized models capable of handling various time series without specialized, case-specific training. These foundation models mark a significant shift from traditional approaches that required multiple models tailored to specific datasets. However, the diversity of time series characteristics, such as variations in frequency, seasonality, and underlying patterns, continues to present substantial challenges for unified model training.

A key problem in time series forecasting is handling data heterogeneity effectively. Time series data from different sources vary significantly in frequency, distribution, and structure. Current forecasting models often rely on human-defined, frequency-based specialization to address this diversity. However, frequency alone is not a reliable indicator of a time series pattern: data with similar frequencies may exhibit distinct behaviors, while data with different frequencies may display similar patterns. Such an approach fails to capture the complexity and diversity inherent in real-world time series. Another challenge lies in the non-stationary nature of time series data, where the statistical properties of the data change over time, making it difficult to model accurately with frequency-based grouping.

Existing time series forecasting methods attempt to address data variability in varied ways. For instance, models such as TEMPO and UniTime incorporate language-based prompts to help the model discern different data sources, achieving limited dataset-level specialization. Other models, like TimesFM, maintain frequency-specific embedding dictionaries to help distinguish between data types based on frequency. Many models, including the widely recognized Chronos series, opt for a generalized structure without specialized modules, which increases model complexity and parameter demands. The challenge with these methods is their inability to fully capture the diverse nature of time series data, as frequency only sometimes correlates with underlying data patterns, leading to inefficiencies and compromised model accuracy.

Researchers from Salesforce AI Research, the National University of Singapore, and the Hong Kong University of Science and Technology introduced an innovative model called MOIRAI-MoE. MOIRAI-MoE integrates a sparse mixture of experts (MoE) within its Transformer architecture, allowing token-level specialization without human-defined frequency heuristics. This data-driven approach minimizes dependency on predefined frequency-based layers and uses a single input/output projection layer, enabling the model to automatically capture and represent diverse patterns. By achieving token-level specialization, MOIRAI-MoE provides a more flexible and efficient solution, better representing the unique characteristics of varied time series data without requiring distinct models for each frequency category.

MOIRAI-MoE's architecture leverages a gating function that assigns each token to an appropriate expert within the Transformer layers, based on token clustering derived from a pretrained model. The clustering is guided by the Euclidean distance to centroids, so tokens with similar patterns are processed by the same expert while specialized experts handle diverse tokens. By incorporating 32 expert networks, each focusing on unique time series characteristics, MOIRAI-MoE effectively reduces computational overhead while enhancing its ability to generalize across different data types. This enables MOIRAI-MoE to excel at representing non-stationary time series data by dynamically adapting to pattern shifts within the data.
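The following is a simplified sketch of centroid-based token routing in a sparse MoE layer, intended only to illustrate the gating idea described above; the dimensions, the number of experts, and the top-1 routing choice are assumptions rather than the paper's exact configuration.

    import numpy as np

    class ToyMoELayer:
        """Toy sparse MoE layer: route each token to the expert whose centroid
        is nearest in Euclidean distance, then apply only that expert."""
        def __init__(self, d_model=64, n_experts=32, seed=0):
            rng = np.random.default_rng(seed)
            # Centroids would come from clustering pretrained-model token embeddings.
            self.centroids = rng.normal(size=(n_experts, d_model))
            # Each expert is a simple linear map here; real experts are FFN blocks.
            self.experts = rng.normal(size=(n_experts, d_model, d_model)) / np.sqrt(d_model)

        def __call__(self, tokens):                      # tokens: (n_tokens, d_model)
            # Distance of every token to every expert centroid.
            dists = np.linalg.norm(tokens[:, None, :] - self.centroids[None, :, :], axis=-1)
            assignment = dists.argmin(axis=1)            # top-1 expert per token (sparse activation)
            out = np.empty_like(tokens)
            for e in np.unique(assignment):
                idx = assignment == e
                out[idx] = tokens[idx] @ self.experts[e]
            return out, assignment

    layer = ToyMoELayer()
    tokens = np.random.default_rng(1).normal(size=(16, 64))
    outputs, routed_to = layer(tokens)
    print(routed_to)   # which expert handled each of the 16 tokens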
Extensive testing across 39 datasets demonstrated the superior performance of MOIRAI-MoE in both in-distribution and zero-shot forecasting scenarios. For in-distribution forecasting, MOIRAI-MoE outperformed its dense counterpart by up to 17%, a significant improvement in accuracy achieved while using up to 65 times fewer activated parameters than other leading models, including TimesFM and Chronos. In zero-shot forecasting, where the model was tested on datasets not included in the training data, MOIRAI-MoE surpassed traditional models, achieving a 3-14% improvement in continuous ranked probability score (CRPS) and an 8-16% improvement in mean absolute scaled error (MASE) over prior models. These results underscore the model's robust generalization ability without requiring task-specific training.

Key takeaways from this research:

- Data-driven specialization: By achieving token-level specialization through a sparse mixture of experts, MOIRAI-MoE overcomes the limitations of human-defined frequency specialization, allowing a more nuanced representation of time series diversity.
- Computational efficiency: The model's sparse expert activation drastically reduces computational demands, achieving up to 65 times fewer activated parameters while maintaining high accuracy.
- Performance gains: Testing on diverse datasets confirmed that MOIRAI-MoE surpasses dense models and foundation models like TimesFM and Chronos, achieving a 17% improvement over dense counterparts in in-distribution tests.
- Scalability and generalization: MOIRAI-MoE demonstrates strong zero-shot performance, making it highly applicable to real-world forecasting tasks without requiring specialized training for each application, which is critical in diverse fields like finance, healthcare, and climate modeling.

In conclusion, MOIRAI-MoE represents a major advancement in time series forecasting by introducing a flexible, data-driven approach that overcomes the limitations of frequency-based specialization. With its sparse mixture-of-experts architecture, MOIRAI-MoE addresses the diverse and non-stationary nature of time series data while achieving significant computational efficiency and performance gains. This approach underscores the potential of token-level specialization, paving the way for future improvements in time series foundation models and expanding the utility of zero-shot forecasting across industries and applications.

Check out the Paper. All credit for this research goes to the researchers of this project.
  • WWW.MARKTECHPOST.COM
    SambaNova and Hugging Face Simplify AI Chatbot Integration with One-Click Deployment
The deployment of AI chatbots has long been a significant challenge for organizations, particularly those without the technical expertise or infrastructure to support advanced AI models. Developing AI chatbots requires training complex models, managing cloud resources, optimizing inference, and maintaining compatibility across platforms. As a result, many businesses find themselves either compromising on performance or outsourcing their AI projects, both of which can be costly and time-consuming. The high barrier to entry has made AI chatbot deployment particularly difficult for small and medium-sized enterprises (SMEs) and individual developers, limiting the widespread adoption of conversational AI across industries.

One-Click Integration: Simplifying Deployment

SambaNova and Hugging Face are changing the AI chatbot landscape with their new one-click integration, designed to make deployment accessible to a broader audience. The collaboration enables developers to deploy advanced AI models for chatbots with minimal configuration and setup. This one-click integration aims to streamline the process of getting an AI chatbot up and running, reducing complexity, cost, and the need for extensive technical knowledge. By combining SambaNova's expertise in hardware acceleration with Hugging Face's extensive collection of pre-trained models, the integration provides a holistic solution that addresses multiple pain points in the chatbot deployment process.

Technical Details and Benefits

At the core of this integration lies a collaboration that blends SambaNova's Reconfigurable Dataflow Architecture (RDA) with Hugging Face's open-source AI models and tools. Technically, it provides optimized hardware infrastructure through SambaNova's DataScale system, which is well suited for AI workloads, together with Hugging Face's model repository. With the click of a button, developers can deploy advanced open models such as Llama 3.1 directly to a scalable and efficient environment without worrying about the underlying infrastructure. The integration simplifies deployment and ensures that models run efficiently on high-performance systems, improving inference speeds and enhancing the user experience. Furthermore, the collaboration allows developers to leverage SambaNova's support for large-scale model training while benefiting from Hugging Face's popular transformers library, which is known for its user-friendly interface.

The Importance of One-Click Integration

This one-click integration is significant for several reasons. It allows developers, even those with limited AI experience, to quickly build and deploy sophisticated conversational agents without getting bogged down in infrastructure details. Early reports from developers using the solution indicate substantial time savings, with some citing deployment processes reduced from weeks to hours. This ease of deployment also translates into quicker iterations and improvements, enabling businesses to be more agile in their chatbot strategies. For enterprises that rely on customer interaction, the reduced complexity and increased speed of deploying chatbots can enhance customer service and drive engagement.
Additionally, the availability of pre-trained models from Hugging Face means developers can tailor chatbot behavior to their specific needs with relatively little customization, further boosting the accessibility of AI tools.

Getting Started with One-Click Integration

For developers looking to try the service, the process is simple. Start by visiting SambaNova Cloud's API website to obtain an access token. Next, use Python to execute the following three lines of code:

    import gradio as gr
    import sambanova_gradio

    gr.load("Meta-Llama-3.1-70B-Instruct-8k", src=sambanova_gradio.registry, accept_token=True).launch()

The final step is clicking "Deploy to Hugging Face" and entering the SambaNova token. In just a few seconds, a fully functional AI chatbot will be available on Hugging Face's Spaces platform, ready for use.

Conclusion

The partnership between SambaNova and Hugging Face marks a significant step forward in democratizing AI chatbot technology. The one-click integration they have introduced makes the deployment of powerful chatbots feasible for a much wider range of users, from individual developers to large enterprises. By reducing technical barriers and leveraging powerful, optimized infrastructure, SambaNova and Hugging Face are pushing the boundaries of what is possible with conversational AI, encouraging further innovation and enabling more organizations to benefit from advanced AI solutions.

Check out the Source and Details. All credit for this research goes to the researchers of this project.
  • WWW.MARKTECHPOST.COM
    Anthropic AI Introduces a New Token Counting API
Precise control over language models is crucial for developers and data scientists. Large language models like Claude from Anthropic offer remarkable opportunities, but managing tokens effectively is a key challenge. Anthropic's Token Counting API addresses this by providing detailed insight into token usage, enhancing efficiency and control over language model interactions.

Why Token Counting Matters

Tokens are the building blocks of language models: letters, punctuation marks, or words used to generate responses. Managing tokens affects:

- Cost efficiency: Tokens determine API costs, so proper management reduces unnecessary expenses.
- Quality control: Token limits affect response completeness; counting tokens helps craft optimal prompts.
- User experience: Understanding token usage ensures smoother interactions, which is crucial for chatbots and extended conversations.

Anthropic's Token Counting API simplifies measuring and managing token consumption, offering developers better control over their interactions with language models.

Supported Models

The token-counting endpoint supports the following models:

- Claude 3.5 Sonnet
- Claude 3.5 Haiku
- Claude 3 Haiku
- Claude 3 Opus

Introducing the Token Counting API

The Token Counting API allows developers to count tokens without interacting directly with Claude. It measures token counts for prompts and responses without consuming compute resources, enabling optimization during development.

How it works: developers submit text inputs, and the API calculates the token count. This preemptive estimate allows prompt adjustments before making costly API calls. The Token Counting API is compatible with various Anthropic models, ensuring consistent token monitoring across updates.

Count tokens in basic messages (Python):

    import anthropic

    client = anthropic.Anthropic()

    response = client.beta.messages.count_tokens(
        betas=["token-counting-2024-11-01"],
        model="claude-3-5-sonnet-20241022",
        system="You are a scientist",
        messages=[{
            "role": "user",
            "content": "Hello, Claude"
        }],
    )

    print(response.json())

Count tokens in basic messages (TypeScript):

    import Anthropic from '@anthropic-ai/sdk';

    const client = new Anthropic();

    const response = await client.beta.messages.countTokens({
        betas: ["token-counting-2024-11-01"],
        model: 'claude-3-5-sonnet-20241022',
        system: 'You are a scientist',
        messages: [{ role: 'user', content: 'Hello, Claude' }]
    });

    console.log(response);

Key Features and Benefits

- Accurate estimation: The API provides a precise token count for prompts, helping developers refine inputs to stay within token limits and ensuring completeness and efficiency.
- Optimized utilization: For complex use cases like retrieval-augmented generation or customer support systems, the API helps manage token usage and prevents incomplete responses, improving reliability.
- Cost-effectiveness: Understanding token usage helps optimize API calls and prompt lengths, reducing costs, which is especially beneficial for startups and cost-sensitive projects.

Real-World Use Cases

- Customer support chatbots: ensures coherent conversations without abrupt cut-offs.
- Document summarization: tailors inputs for efficient summaries despite token limits.
- Interactive learning tools: maintains efficient prompts and useful responses for educational purposes.

Key Insights

The Token Counting API solves a persistent developer challenge: estimating token usage before interacting with the model.
This preemptive approach helps developers avoid hitting token limits mid-interaction, improving workflow efficiency. The API also aligns with Anthropic's focus on user safety and transparency, giving developers greater control over their models and reinforcing the commitment to manageable AI tools.

Conclusion

The Token Counting API empowers developers by providing accurate token insights, leading to smarter model usage and more efficient application development. It supports transparent and predictable AI interactions, enabling developers to craft better prompts, reduce costs, and deliver smoother user experiences. As language models evolve, tools like Anthropic's Token Counting API will be essential for efficient AI integration, helping teams optimize projects and save time and resources.

Check out the Details. All credit for this research goes to the researchers of this project.
  • WWW.MARKTECHPOST.COM
    Cerebras Systems Revolutionizes AI Inference: 3x Faster with Llama 3.1-70B at 2,100 Tokens per Second
Artificial Intelligence (AI) continues to evolve rapidly, but with that evolution comes a host of technical challenges that must be overcome for the technology to truly flourish. One of the most pressing challenges today is inference performance. Large language models (LLMs), such as those used in GPT-based applications, demand a high volume of computational resources. The bottleneck occurs during inference, the stage where trained models generate responses or predictions. This stage often faces constraints due to the limitations of current hardware, making the process slow, energy-intensive, and cost-prohibitive. As models become larger, traditional GPU-based solutions increasingly fall short in both speed and efficiency, limiting the transformative potential of AI in real-time applications. This situation creates a need for faster, more efficient solutions to keep pace with the demands of modern AI workloads.

Cerebras Inference Gets 3x Faster: Llama 3.1-70B at 2,100 Tokens per Second

Cerebras Systems has made a significant breakthrough, claiming that its inference process is now three times faster than before. Specifically, the company has achieved 2,100 tokens per second with the Llama 3.1-70B model, which it reports is 16 times faster than the fastest available GPU solution. This kind of performance leap is akin to an entire generation upgrade in GPU technology, like moving from the NVIDIA A100 to the H100, but accomplished entirely through a software update. Moreover, it is not just larger models that benefit: Cerebras is delivering 8 times the speed of GPUs running the much smaller Llama 3.1-3B, a model 23 times smaller in scale. Such gains underscore the promise Cerebras brings to the field, making high-speed, efficient inference available at an unprecedented rate.

Technical Improvements and Benefits

The technical innovations behind Cerebras' latest leap in performance include several under-the-hood optimizations that fundamentally enhance the inference process. Critical kernels such as matrix multiplication (MatMul), reduce/broadcast, and element-wise operations have been entirely rewritten and optimized for speed. Cerebras has also implemented asynchronous wafer I/O computation, which allows data communication and computation to overlap, ensuring maximum utilization of available resources. In addition, advanced speculative decoding has been introduced, effectively reducing latency without sacrificing the quality of generated tokens. Another key aspect of this improvement is that Cerebras maintained 16-bit precision for the original model weights, ensuring that the boost in speed does not compromise model accuracy. All of these optimizations have been verified through meticulous analysis to confirm they do not degrade output quality, making the system not only faster but also trustworthy for enterprise-grade applications.
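As background on one of these techniques, the sketch below shows the basic shape of speculative decoding: a small draft model proposes several tokens and the large target model verifies them in a single pass. It is a simplified, greedy-acceptance illustration of the general idea, not Cerebras' implementation; draft_next_tokens and target_verify are hypothetical stand-ins for the two models.

    def speculative_decode(prompt_tokens, draft_next_tokens, target_verify, k=4, max_new_tokens=64):
        """Greedy speculative decoding sketch.

        draft_next_tokens(seq, k) -> list of k proposed next tokens (cheap model)
        target_verify(seq, proposals) -> the tokens the large model itself would
                                         pick at those positions (one big pass)
        """
        seq = list(prompt_tokens)
        while len(seq) - len(prompt_tokens) < max_new_tokens:
            proposals = draft_next_tokens(seq, k)           # cheap: k quick guesses
            target_choices = target_verify(seq, proposals)  # expensive: one verification pass
            # Accept the longest prefix where the draft agrees with the target.
            accepted = []
            for p, t in zip(proposals, target_choices):
                if p == t:
                    accepted.append(p)
                else:
                    accepted.append(t)   # take the target's token at the first mismatch
                    break
            seq.extend(accepted)
        return seq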
Transformative Potential and Real-World Applications

The implications of this performance boost are far-reaching, especially for practical applications of LLMs in sectors like healthcare, entertainment, and real-time communication. GSK, a pharmaceutical giant, has highlighted how Cerebras' improved inference speed is transforming its drug discovery process. According to Kim Branson, SVP of AI/ML at GSK, Cerebras' advances in AI are enabling intelligent research agents to work faster and more effectively, providing a critical edge in the competitive field of medical research. Similarly, LiveKit, the platform that powers ChatGPT's voice mode, has seen a drastic improvement in performance. Russ d'Sa, CEO of LiveKit, remarked that what used to be the slowest step in their AI pipeline has become the fastest. This transformation is enabling instantaneous voice and video processing, opening new doors for advanced reasoning and real-time intelligent applications, and allowing up to 10 times more reasoning steps without increasing latency. The reported gains are not just theoretical; they are actively reshaping workflows and reducing operational bottlenecks across industries.

Conclusion

Cerebras Systems has once again demonstrated its commitment to pushing the boundaries of AI inference technology. With a threefold increase in inference speed and the ability to process 2,100 tokens per second with the Llama 3.1-70B model, Cerebras is setting a new benchmark for what is possible in AI hardware. By focusing on both software and hardware optimizations, Cerebras is helping AI transcend previous limits, not only in speed but also in efficiency and scalability. This latest leap enables more real-time, intelligent applications, more robust AI reasoning, and a smoother, more interactive user experience. As these kinds of advancements continue, they will be critical in ensuring that AI remains a transformative force across industries. With Cerebras leading the charge, the future of AI inference looks faster, smarter, and more promising than ever.

Check out the Details. All credit for this research goes to the researchers of this project.
  • WWW.MARKTECHPOST.COM
    Assembly AI Introduces Universal-2: The Next Leap in Speech-to-Text Technology
In recent years, Automatic Speech Recognition (ASR) technology has gained significant traction, transforming industries ranging from healthcare to customer support. However, achieving accurate transcription across diverse languages, accents, and noisy environments remains challenging. Current speech-to-text models often struggle with complex accents, domain-specific terminology, and background noise. The need for a more robust, adaptable, and scalable speech-to-text solution is evident, especially as demand rises with the proliferation of AI-driven applications in day-to-day life.

Assembly AI Introduces Universal-2: A New Speech-to-Text Model with Major Improvements

In response to these challenges, Assembly AI has introduced Universal-2, a new speech-to-text model designed to offer significant improvements over its predecessor, Universal-1. The upgraded model aims to enhance transcription accuracy across a broader spectrum of languages, accents, and scenarios. Universal-2 leverages advances in deep learning and speech processing to achieve a more nuanced understanding of human speech, even in challenging conditions such as poor audio quality or heavy background noise. According to Assembly AI, the release of Universal-2 is a milestone in its journey toward the most comprehensive and accurate ASR solution in the industry.

Universal-2 builds on the previous version with substantial refinements to its architecture and training methodology. It introduces enhanced multilingual support, making it a versatile ASR solution capable of delivering high-quality results across various languages and dialects. A key differentiator is its ability to maintain consistent performance even in low-resource settings, meaning the model does not falter when transcribing under less-than-ideal conditions. This makes it well suited for applications like call centers, podcasts, and multilingual meetings, where speech quality can vary significantly. Universal-2 is also designed with scalability in mind, offering developers an easy integration experience through a range of APIs for rapid deployment.

Technical Details and Benefits of Universal-2

Universal-2 is based on an ASR decoder architecture called the Recurrent Neural Network Transducer (RNN-T). Compared to Universal-1, the model is trained on a broader dataset encompassing diverse speech patterns, multiple dialects, and varying audio qualities. This broader training data helps the model become more adaptive and precise, reducing the word error rate (WER) compared to its predecessor. Improvements in noise robustness allow Universal-2 to handle real-world audio more effectively, and the model has been optimized for faster processing, enabling near real-time transcription, a crucial feature for customer service, live broadcasting, and automated meeting transcription. These enhancements help bridge the gap between human-level understanding and machine transcription, which has long been a target for AI researchers and developers.

The Importance of Universal-2 and Its Performance Metrics

The introduction of Universal-2 is a significant step forward for the ASR industry. Enhanced accuracy and robustness mean that businesses can rely on transcription services with greater confidence, even in complex audio environments. Assembly AI reports a notable decrease in word error rate for Universal-2, a 32% reduction compared to Universal-1. This improvement translates into fewer transcription errors, better customer experiences, and higher efficiency for tasks such as subtitling videos, generating meeting notes, or powering voice-controlled applications.
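Since word error rate is the headline metric here, a short, self-contained sketch of how WER is typically computed (word-level edit distance divided by the number of reference words) may be useful; the example sentences are placeholders.

    def word_error_rate(reference, hypothesis):
        """WER = (substitutions + deletions + insertions) / reference length,
        computed with a standard word-level Levenshtein edit distance."""
        ref, hyp = reference.split(), hypothesis.split()
        # dp[i][j] = edit distance between first i reference words and first j hypothesis words
        dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            dp[i][0] = i
        for j in range(len(hyp) + 1):
            dp[0][j] = j
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                cost = 0 if ref[i - 1] == hyp[j - 1] else 1
                dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                               dp[i][j - 1] + 1,         # insertion
                               dp[i - 1][j - 1] + cost)  # substitution / match
        return dp[len(ref)][len(hyp)] / max(len(ref), 1)

    print(word_error_rate("turn the volume down please", "turn volume down please"))  # 0.2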
Another critical aspect is Universal-2's performance across different languages and accents. In an increasingly interconnected world, the ability to accurately transcribe non-English languages and handle strong regional accents opens up new opportunities for businesses and services. This broader applicability makes Universal-2 especially valuable in regions where language diversity challenges conventional ASR systems. By pushing multilingual support further, Assembly AI continues to broaden access to cutting-edge AI technology.

Conclusion

With Universal-2, Assembly AI is setting a new standard in the speech-to-text landscape. The model's enhanced accuracy, speed, and adaptability make it a robust choice for developers and businesses looking to leverage the latest ASR technology. By addressing previous challenges, such as noise handling and multilingual support, Universal-2 not only builds on the strengths of its predecessor but also introduces new capabilities that make speech recognition more accessible and effective for a wider range of applications. As industries continue to integrate AI-driven tools into their workflows, advancements like Universal-2 bring us closer to seamless human-computer communication, laying the groundwork for more intuitive and efficient interactions.

Check out the Details. All credit for this research goes to the researchers of this project.
  • WWW.MARKTECHPOST.COM
    Google DeepMind Researchers Propose RT-Affordance: A Hierarchical Method that Uses Affordances as an Intermediate Representation for Policies
In recent years, there has been significant progress on large pre-trained models for learning robot policies. The term policy representation here refers to the different ways of interfacing with a robot's decision-making mechanisms, which can potentially facilitate generalization to new tasks and environments. Vision-language-action (VLA) models are pre-trained on large-scale robot data to integrate visual perception, language understanding, and action-based decision-making, guiding robots across a variety of tasks. Built on top of vision-language models (VLMs), they promise generalization to new objects, scenes, and tasks. However, VLAs still need to become more reliable before they can be deployed outside the narrow lab settings in which they are trained. While these drawbacks can be mitigated by expanding the scope and diversity of robot datasets, doing so is highly resource-intensive and difficult to scale. In short, existing policy representations either provide too little context or over-specify it, which yields less robust policies.

Language instructions, goal images, and trajectory sketches are all widely used policy representations, and each is helpful but imperfect. Conditioning on language is one of the most common choices, yet most robot datasets are labeled with underspecified descriptions of the task, so language-based guidance says little about how to perform it. Goal-image-conditioned policies provide detailed spatial information about the final configuration of the scene, but goal images are high-dimensional, which creates learning challenges due to over-specification. Intermediate representations such as trajectory sketches or key points attempt to provide spatial plans for guiding the robot's actions; while these plans offer guidance, they still lack sufficient information about how to perform specific movements.

A team of researchers from Google DeepMind studied policy representations for robots in detail and proposed RT-Affordance, a hierarchical model that first creates an affordance plan from the task language and then conditions the policy on this plan to guide the robot's actions during manipulation. In robotics, affordance refers to the potential interactions an object enables for a robot, based on properties such as its shape and size. The RT-Affordance model can connect heterogeneous sources of supervision, including large web datasets and robot trajectories.

First, the affordance plan is predicted from the task language and the initial image of the scene. This plan is then combined with the language instruction to condition the policy for task execution, and it is projected onto the image so that the policy is conditioned on images overlaid with the affordance plan. The model is co-trained on web datasets (the largest data source), robot trajectories, and a modest number of cheap-to-collect images labeled with affordances. This approach benefits from both robot trajectory data and extensive web data, allowing the model to generalize well across new objects, scenes, and tasks.
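A minimal sketch of this two-stage, affordance-conditioned inference loop is shown below. The affordance_model, overlay, and policy helpers are hypothetical stand-ins used only to illustrate the hierarchy described above, not the RT-Affordance interfaces.

    def run_affordance_conditioned_episode(task_text, get_image, affordance_model,
                                           overlay, policy, robot, max_steps=200):
        """Stage 1: predict an affordance plan from language + the initial image.
        Stage 2: condition the low-level policy on images overlaid with that plan."""
        initial_image = get_image()
        # e.g., a set of keypoints/poses describing where and how to grasp
        affordance_plan = affordance_model(task_text, initial_image)

        for _ in range(max_steps):
            image = get_image()
            conditioned_image = overlay(image, affordance_plan)  # draw the plan onto the frame
            action = policy(conditioned_image, task_text)        # low-level motor command
            done = robot.apply(action)
            if done:
                break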
The research team ran a series of experiments focused mainly on how affordances improve robotic grasping, especially for household items with complex shapes such as kettles, dustpans, and pots. A detailed evaluation showed that RT-A remains robust across various out-of-distribution (OOD) scenarios, including novel objects, camera angles, and backgrounds. The RT-A model performed better than RT-2 and its goal-conditioned variant, achieving success rates of 68%-76% compared to RT-2's 24%-28%. In tasks beyond grasping, such as placing objects into containers, RT-A reached a 70% success rate. However, RT-A's performance dropped slightly when it faced entirely new objects.

Check out the Paper. All credit for this research goes to the researchers of this project.
  • WWW.MARKTECHPOST.COM
    Databricks Mosaic Research Examines Long-Context Retrieval-Augmented Generation: How Leading AI Models Handle Expansive Information for Improved Response Accuracy
Retrieval-augmented generation (RAG) represents a major advance in the ability of large language models (LLMs) to perform tasks accurately by incorporating relevant external information into their processing workflows. This approach, blending information retrieval techniques with generative modeling, has seen growing utility in complex applications such as machine translation, question answering, and comprehensive content generation. By embedding documents into an LLM's context, RAG enables models to access and use more extensive and nuanced data sources, effectively expanding their capacity to handle specialized queries. The technique has proven especially valuable in industries that require precise and informed responses, offering transformative potential in fields where accuracy and specificity are paramount.

A major challenge facing the development of large language models is the effective management of vast contextual information. As LLMs grow more powerful, so does the demand for them to synthesize large volumes of data without losing response quality. However, incorporating extensive external information often degrades performance, as a model may struggle to retain critical information across long contexts. The issue is compounded in retrieval scenarios, where models must pull from expansive information databases and integrate the results cohesively to generate meaningful output. Consequently, optimizing LLMs for longer context lengths is a crucial research goal, particularly as applications increasingly rely on high-volume, data-rich interactions.

Most conventional RAG approaches embed documents in vector databases to enable efficient, similarity-based retrieval. This process typically involves breaking documents into retrievable chunks that can be matched to a user's query based on relevance. While this method works well for short-to-moderate context lengths, many open-source models show declining accuracy as context size increases. Some more advanced models exhibit promising accuracy up to 32,000 tokens, but limitations remain in harnessing even greater context lengths to consistently enhance performance, suggesting a need for more sophisticated approaches.

The research team from Databricks Mosaic Research undertook a comprehensive evaluation of RAG performance across a range of open-source and commercial LLMs, including well-regarded models such as OpenAI's GPT-4, Anthropic's Claude 3.5, and Google's Gemini 1.5. The evaluation tested the impact of increasing context lengths, from 2,000 tokens up to an unprecedented 2 million tokens, to assess how well various models maintain accuracy when handling extensive contextual information. By varying context lengths across 20 prominent LLMs, the researchers aimed to identify which models perform best in long-context scenarios, making them better suited for applications requiring large-scale data synthesis.

The study employed a consistent methodology across all models, embedding document chunks with OpenAI's text-embedding-3-large model and storing them in a vector store. The tests were conducted on three specialized datasets: Databricks DocsQA, FinanceBench, and Natural Questions, each chosen for its relevance to real-world RAG applications. In the generation stage, the retrieved chunks were provided to a range of generative models, and performance was gauged by each model's ability to produce accurate answers to user queries by integrating the retrieved information from the context. This setup compared each model's capacity to handle information-rich scenarios effectively.
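For readers who want to see the shape of such a pipeline, here is a minimal retrieve-then-generate sketch using the OpenAI Python SDK, in the spirit of the study's setup. The toy corpus, the chunking, the prompt, and the choice of gpt-4o as the generator are illustrative assumptions, not the Databricks evaluation harness.

    import numpy as np
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    def embed(texts):
        resp = client.embeddings.create(model="text-embedding-3-large", input=texts)
        return np.array([d.embedding for d in resp.data])

    # Toy corpus standing in for pre-chunked documents.
    chunks = [
        "The warranty covers manufacturing defects for 24 months.",
        "Returns are accepted within 30 days with a receipt.",
        "Batteries are considered consumables and are not covered.",
    ]
    chunk_vecs = embed(chunks)

    def retrieve(query, k=2):
        q = embed([query])[0]
        sims = chunk_vecs @ q / (np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(q))
        return [chunks[i] for i in np.argsort(sims)[::-1][:k]]

    def answer(query):
        context = "\n".join(retrieve(query))
        resp = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": "Answer using only the provided context."},
                {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"},
            ],
        )
        return resp.choices[0].message.content

    print(answer("How long is the warranty?"))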
In the generation stage, these embedded chunks were then provided to a range of generative models, where performance was gauged based on the models ability to produce accurate responses to user queries by integrating retrieved information from the context. This approach compared each models capacity to handle information-rich scenarios effectively.The results showed notable variance in performance across the models. Not all benefited equally from expanded context lengths, as extending context did not consistently improve RAG accuracy. The research found that models such as OpenAIs o1-mini and o1-preview, GPT-4o, Claude 3.5 Sonnet, and Gemini 1.5 Pro showed steady improvements, sustaining high accuracy levels even up to 100,000 tokens. However, other models, particularly open-source options like Qwen 2 (70B) and Llama 3.1 (405B), displayed performance degradation beyond the 32,000-token mark. Only a few of the latest commercial models demonstrated consistent long-context capabilities, revealing that while extending context can enhance RAG performance, many models still face substantial limitations beyond certain token thresholds. Of particular interest, Googles Gemini 1.5 Pro model maintained accuracy at extremely long contexts, handling up to 2 million tokens effectively, a remarkable feat not widely observed among other tested models.Analyzing the failure patterns of models in long-context scenarios provided additional insights. Some models, such as Claude 3 Sonnet, frequently refused to respond due to concerns around copyright compliance, especially as context lengths increased. Other models, including Gemini 1.5 Pro, encountered difficulties due to overly sensitive safety filters, resulting in repeated refusals to complete certain tasks. Open-source models also exhibited unique failure patterns; Llama 3.1, for example, demonstrated consistent failures in contexts above 64k tokens, often by providing irrelevant or random content. These results underscore that long-context models fail in various ways, largely dependent on context length and task demands, and suggest specific areas for future improvement.The studys key findings reveal the potential and limitations of using long-context LLMs for RAG applications. While certain state-of-the-art models, such as OpenAIs o1 and Googles Gemini 1.5 Pro, displayed consistent improvement in accuracy across long contexts, most models only demonstrated optimal performance within shorter ranges, around 16,000 to 32,000 tokens. The research team hypothesizes that advanced models like o1 benefit from increased test-time computation, allowing them to handle complex questions and avoid confusion from less relevant retrieved documents. 
The team's findings highlight the complexities of long-context RAG applications and provide valuable insights for researchers seeking to refine these techniques. Key takeaways from the research include:
Performance Stability: Only a select group of commercial models, such as OpenAI's o1 and Google's Gemini 1.5 Pro, maintained consistent performance up to 100,000 tokens and beyond.
Performance Decline in Open-Source Models: Most open-source models, including Qwen 2 and Llama 3.1, experienced significant performance drops beyond 32,000 tokens.
Failure Patterns: Models like Claude 3 Sonnet and Gemini 1.5 Pro failed in different ways, with issues such as task refusals due to safety filters or copyright concerns.
High-Cost Challenges: Long-context RAG is cost-intensive, with processing costs ranging from $0.16 to $5 per query, depending on the model and context length.
Future Research Needs: The study suggests further research on context management, error handling, and cost mitigation in practical RAG applications.
In conclusion, while extended context lengths present exciting possibilities for LLM-based retrieval, practical limitations persist. Advanced models like OpenAI's o1 and Google's Gemini 1.5 show promise, but broader applicability across diverse models and use cases requires continued refinement and targeted improvements. This research marks an essential step toward understanding the trade-offs and challenges inherent in scaling RAG systems for real-world applications.
Check out the Paper. All credit for this research goes to the researchers of this project.
  • WWW.MARKTECHPOST.COM
    MBZUAI Researchers Release Atlas-Chat (2B, 9B, and 27B): A Family of Open Models Instruction-Tuned for Darija (Moroccan Arabic)
    Natural language processing (NLP) has made incredible strides in recent years, particularly through the use of large language models (LLMs). However, one of the primary issues with these LLMs is that they have largely focused on data-rich languages such as English, leaving behind many underrepresented languages and dialects. Moroccan Arabic, also known as Darija, is one such dialect that has received very little attention despite being the main form of daily communication for over 40 million people. Due to the lack of extensive datasets, proper grammatical standards, and suitable benchmarks, Darija has been classified as a low-resource language. As a result, it has often been neglected by developers of large language models. The challenge of incorporating Darija into LLMs is further compounded by its unique mix of Modern Standard Arabic (MSA), Amazigh, French, and Spanish, along with its emerging written form that still lacks standardization. This has led to an asymmetry where dialectal Arabic like Darija is marginalized, despite its widespread use, which has affected the ability of AI models to cater to the needs of these speakers effectively.Meet Atlas-Chat!!MBZUAI (Mohamed bin Zayed University of Artificial Intelligence) has released Atlas-Chat, a family of open, instruction-tuned models specifically designed for Darijathe colloquial Arabic of Morocco. The introduction of Atlas-Chat marks a significant step in addressing the challenges posed by low-resource languages. Atlas-Chat consists of three models with different parameter sizes2 billion, 9 billion, and 27 billionoffering a range of capabilities to users depending on their needs. The models have been instruction-tuned, enabling them to perform effectively across different tasks such as conversational interaction, translation, summarization, and content creation in Darija. Moreover, they aim to advance cultural research by better understanding Moroccos linguistic heritage. This initiative is particularly noteworthy because it aligns with the mission to make advanced AI accessible to communities that have been underrepresented in the AI landscape, thus helping bridge the gap between resource-rich and low-resource languages.Technical Details and Benefits of Atlas-ChatAtlas-Chat models are developed by consolidating existing Darija language resources and creating new datasets through both manual and synthetic means. Notably, the Darija-SFT-Mixture dataset consists of 458,000 instruction samples, which were gathered from existing resources and through synthetic generation from platforms like Wikipedia and YouTube. Additionally, high-quality English instruction datasets were translated into Darija with rigorous quality control. The models have been fine-tuned on this dataset using different base model choices like the Gemma 2 models. This careful construction has led Atlas-Chat to outperform other Arabic-specialized LLMs, such as Jais and AceGPT, by significant margins. For instance, in the newly introduced DarijaMMLU benchmarka comprehensive evaluation suite for Darija covering discriminative and generative tasksAtlas-Chat achieved a 13% performance boost over a larger 13 billion parameter model. This demonstrates its superior ability in following instructions, generating culturally relevant responses, and performing standard NLP tasks in Darija.Why Atlas-Chat MattersThe introduction of Atlas-Chat is crucial for multiple reasons. First, it addresses a long-standing gap in AI development by focusing on an underrepresented language. 
Moroccan Arabic, which has a complex cultural and linguistic makeup, is often neglected in favor of MSA or other dialects that are more data-rich. With Atlas-Chat, MBZUAI has provided a powerful tool for enhancing communication and content creation in Darija, supporting applications like conversational agents, automated summarization, and more nuanced cultural research. Second, by providing models with varying parameter sizes, Atlas-Chat ensures flexibility and accessibility, catering to a wide range of user needs, from lightweight applications requiring fewer computational resources to more sophisticated tasks. The evaluation results for Atlas-Chat highlight its effectiveness; for example, Atlas-Chat-9B scored 58.23% on the DarijaMMLU benchmark, significantly outperforming state-of-the-art models like AceGPT-13B. Such advancements indicate the potential of Atlas-Chat in delivering high-quality language understanding for Moroccan Arabic speakers.
Conclusion
Atlas-Chat represents a transformative advancement for Moroccan Arabic and other low-resource dialects. By creating a robust and open-source solution for Darija, MBZUAI is taking a major step in making advanced AI accessible to a broader audience, empowering users to interact with technology in their own language and cultural context. This work not only addresses the asymmetries seen in AI support for low-resource languages but also sets a precedent for future development in underrepresented linguistic domains. As AI continues to evolve, initiatives like Atlas-Chat are crucial in ensuring that the benefits of technology are available to all, regardless of the language they speak. With further improvements and refinements, Atlas-Chat is poised to bridge the communication gap and enhance the digital experience for millions of Darija speakers.
Check out the Paper and Models on Hugging Face. All credit for this research goes to the researchers of this project.
Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence Media Platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts over 2 million monthly views, illustrating its popularity among audiences.
  • WWW.MARKTECHPOST.COM
    LLM-KT: A Flexible Framework for Enhancing Collaborative Filtering Models with Embedded LLM-Generated Features
    Collaborative Filtering (CF) is widely used in recommender systems to match user preferences with items but often struggles with complex relationships and adapting to evolving user interactions. Recently, researchers have explored using LLMs to enhance recommendations by leveraging their reasoning abilities. LLMs have been integrated into various stages, from knowledge generation to candidate ranking. While effective, this integration can be costly, and existing methods, such as KAR and LLM-CF, only enhance context-aware CF models by adding LLM-derived textual features.Researchers from HSE University, MIPT, Ural Federal University, Sber AI Lab, AIRI, and ISP RAS developed LLM-KT, a flexible framework designed to enhance CF models by embedding LLM-generated features into intermediate model layers. Unlike previous methods that rely on directly inputting LLM-derived features, LLM-KT integrates these features within the model, allowing it to reconstruct and utilize the embeddings internally. This adaptable approach requires no architectural changes, making it suitable for various CF models. Experiments on the MovieLens and Amazon datasets show that LLM-KT significantly improves baseline models, achieving a 21% increase in NDCG@10 and performing comparably with state-of-the-art context-aware methods.The proposed method introduces a knowledge transfer approach that enhances CF models by embedding LLM-generated features within a designated internal layer. This approach allows CF models to intuitively learn user preferences without altering their architecture, creating profiles based on user-item interactions. LLMs use prompts tailored to each users interaction data to generate preference summaries, or profiles, which are then converted into embeddings with a pre-trained text model, such as text-embedding-ada-002. To optimize this integration, the CF model is trained with an auxiliary pretext task, combining the original model loss with a reconstruction loss that aligns profile embeddings with the CF models internal representations. This setup uses UMAP for dimensional alignment and RMSE for the reconstruction loss, ensuring that the model accurately represents user preferences.The LLM-KT framework, built on RecBole, supports flexible experimental configurations, allowing researchers to define detailed pipelines through a single configuration file. Key features include support for integrating LLM-generated profiles from various sources, an adaptable configuration system, and batch experiment execution with analytical tools for comparing results. The frameworks internal structure includes a Model Wrapper, which oversees essential components like the Hook Manager for accessing intermediate representations, the Weights Manager for fine-tuning control, and the Loss Manager for custom loss adjustments. This modular design streamlines knowledge transfer and fine-tuning, enabling researchers to efficiently test and refine CF models.The experimental setup evaluates the proposed knowledge transfer method for CF models in two ways: for traditional models using only user-item interaction data and for context-aware models that can utilize input features. Experiments were conducted on Amazons CD and Vinyl and MovieLens datasets, using a 70-10-20% train-validation-test split. Baseline CF models included NeuMF, SimpleX, and MultVAE, while KAR, DCN, and DeepFM were used for context-aware comparisons. 
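Before turning to the evaluation results, the auxiliary knowledge-transfer objective described above can be sketched in a few lines: a projection of a designated internal CF layer is pulled toward the pre-computed LLM profile embedding through an RMSE term added to the original recommendation loss. Layer placement, tensor shapes, and the weighting factor are illustrative assumptions, not the authors' exact implementation.

```python
# Hedged sketch of the LLM-KT-style auxiliary objective: original CF loss + RMSE alignment
# between an internal representation and a (e.g. UMAP-reduced) LLM profile embedding.
import torch
import torch.nn as nn

class ProfileAlignedCF(nn.Module):
    def __init__(self, n_users: int, n_items: int, dim: int = 64, profile_dim: int = 64):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, dim)
        self.item_emb = nn.Embedding(n_items, dim)
        self.hidden = nn.Linear(2 * dim, dim)          # the "designated internal layer"
        self.out = nn.Linear(dim, 1)
        self.to_profile = nn.Linear(dim, profile_dim)  # maps the hidden state into profile space

    def forward(self, users, items):
        h = torch.relu(self.hidden(torch.cat([self.user_emb(users), self.item_emb(items)], dim=-1)))
        return self.out(h).squeeze(-1), h

def training_loss(model, users, items, labels, profile_vecs, alpha: float = 0.1):
    # labels: float tensor of 0/1 interactions; profile_vecs: precomputed LLM profile embeddings.
    scores, h = model(users, items)
    rec_loss = nn.functional.binary_cross_entropy_with_logits(scores, labels)   # original CF loss
    # RMSE between the projected internal state and the profile embedding (reconstruction term).
    align_loss = torch.sqrt(nn.functional.mse_loss(model.to_profile(h), profile_vecs))
    return rec_loss + alpha * align_loss
```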
The method was assessed with ranking metrics (NDCG@K, Hits@K, Recall@K) and AUC-ROC for click-through-rate tasks. Results showed consistent performance improvements across models, with versatility and accuracy comparable to existing approaches like KAR.
The LLM-KT framework offers a versatile way to enhance CF models by embedding LLM-generated features within an intermediate layer, allowing models to leverage these embeddings internally. Unlike traditional methods that input LLM features directly, LLM-KT enables seamless knowledge transfer across various CF architectures without altering their structure. Built on the RecBole platform, the framework allows flexible configurations for easy integration and adaptation. Experiments on MovieLens and Amazon datasets confirm significant performance gains, showing that LLM-KT is competitive with advanced methods in context-aware models and applicable across a wider range of CF models.
Check out the Paper. All credit for this research goes to the researchers of this project.
Sana Hassan, a consulting intern at Marktechpost and dual-degree student at IIT Madras, is passionate about applying technology and AI to address real-world challenges. With a keen interest in solving practical problems, he brings a fresh perspective to the intersection of AI and real-life solutions.
  • WWW.MARKTECHPOST.COM
    MIT Researchers Developed Heterogeneous Pre-trained Transformers (HPTs): A Scalable AI Approach for Robotic Learning from Heterogeneous Data
    In todays world, building robotic policies is difficult. It often requires collecting specific data for each robot, task, and environment, and the learned policies do not generalize beyond these specific settings. Recent progress in open-source, large-scale data collection has made pre-training on large-scale, high-quality, and diverse data possible. However, in robotics, heterogeneity poses a challenge because robots differ in physical form, sensors, and operating environments. Both proprioception and vision information are important for complex, contact-rich, long-horizon behaviors in robotics. Poor learning of such information can lead to overfitting behaviors such as repeating motions for a particular scene, task, or even trajectory.The current methods in robotic learning involve collecting data from a single robot embodiment for a specific task and training the model upon it. This is an extensive approach, and the main limitation of this is that the model cannot be generalized for various tasks and robots. Methods like pre-training and transfer learning use data from various fields, such as computer vision and natural language, to help models learn and adapt to newer tasks. Recent works show that small projection layers can be used to combine the pre-trained feature spaces of the foundation models. Different from other fields, robotics has less data quantity and diversity but much more heterogeneity. Also, recent advancements combine multimodal data (images, language, audio) for better representation learning. MIT CSAIL and Meta conducted detailed research and proposed a framework named Heterogeneous Pre-trained Transformers (HPT). It is a family of architecture designed to scalably learn from data across heterogeneous embodiments. HPTs main function is to create a shared understanding or representation of tasks that can be used by different robots in various conditions. Instead of training a robot from scratch for each new task or environment, HPT allows robots to use pre-learned knowledge, making the training process faster and more efficient. This architecture combines the proprioception and vision inputs from distinct embodiments into a short sequence of tokens, which are then processed to control robots for various tasks.The architecture of HPT consists of the embodiment-specific stem, the shared trunk, and the task-specific heads. HPT is inspired by learning from multimodal data and uses embodiment-specific tokenizers, known as stem, to combine various sensor inputs such as camera views and body movements data. The trunk is a shared model and pre-trained across datasets and is transferred when adapting to new embodiments and tasks that are unknown during the pre-training times. Moreover, it uses task-specific action decoders to produce the action outputs known as heads. After tokenizing each embodiment, HPT operates on a shared space of a short sequence of latent tokens.The scaling behaviors and various designs of policy pre-training were investigated using more than 50 individual data sources and a model size of over 1 billion parameters. Many available embodied datasets in different embodiments, such as real robots, simulations, and internet human videos, were incorporated into the pre-training process. The results showed that the HPT framework works well not only with costly real-world robot operations but also with other types of embodiments. 
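To make the stem, trunk, and head decomposition described above concrete, the sketch below shows how an HPT-style model might route embodiment-specific inputs through a per-embodiment stem, a shared transformer trunk, and a per-task action head. The dimensions, pooling, and module choices are assumptions for illustration, not the released architecture.

```python
# Structural sketch (assumptions marked): embodiment stems -> shared trunk -> task heads.
import torch
import torch.nn as nn

class HPTSketch(nn.Module):
    def __init__(self, embodiments: dict[str, int], tasks: dict[str, int], d_model: int = 256):
        super().__init__()
        # One stem per embodiment: projects raw proprioception/vision features to tokens.
        self.stems = nn.ModuleDict({name: nn.Linear(in_dim, d_model) for name, in_dim in embodiments.items()})
        # Shared trunk: pre-trained across embodiments, reused when adapting to new settings.
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.trunk = nn.TransformerEncoder(layer, num_layers=4)
        # One head per task: decodes the shared representation into actions.
        self.heads = nn.ModuleDict({name: nn.Linear(d_model, act_dim) for name, act_dim in tasks.items()})

    def forward(self, x, embodiment: str, task: str):
        tokens = self.stems[embodiment](x)            # (batch, seq, d_model)
        latent = self.trunk(tokens)                   # shared latent token sequence
        return self.heads[task](latent.mean(dim=1))   # pooled -> action vector

model = HPTSketch(embodiments={"arm_rgb_proprio": 128}, tasks={"pick_place": 7})
actions = model(torch.randn(2, 16, 128), "arm_rgb_proprio", "pick_place")  # (2, 7)
```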
HPT outperforms several baselines and enhances fine-tuned policy performance by over 20% on unseen tasks across multiple simulator benchmarks and real-world settings.
Check out the Paper, Project, and MIT Blog. All credit for this research goes to the researchers of this project.
Divyesh Vitthal Jawkhede is a consulting intern at Marktechpost. He is pursuing a BTech in Agricultural and Food Engineering from the Indian Institute of Technology, Kharagpur. He is a Data Science and Machine Learning enthusiast who wants to integrate these leading technologies into the agricultural domain and solve challenges.
  • WWW.MARKTECHPOST.COM
    Meta AI Introduces AdaCache: A Training-Free Method to Accelerate Video Diffusion Transformers (DiTs)
    Video generation has rapidly become a focal point in artificial intelligence research, especially in generating temporally consistent, high-fidelity videos. This area involves creating video sequences that maintain visual coherence across frames and preserve details over time. Machine learning models, particularly diffusion transformers (DiTs), have emerged as powerful tools for these tasks, surpassing previous methods like GANs and VAEs in quality. However, as these models become complex, generating high-resolution videos computational cost and latency has become a significant challenge. Researchers are now focused on improving these models efficiency to enable faster, real-time video generation while maintaining quality standards.One pressing issue in video generation is the resource-intensive nature of current high-quality models. Generating complex, visually appealing videos requires significant processing power, especially with large models that handle longer, high-resolution video sequences. These demands slow down the inference process, which makes real-time generation challenging. Many video applications need models that can process data quickly while still delivering high fidelity across frames. A key problem is finding an optimal balance between processing speed and output quality, as faster methods typically compromise the details. In contrast, high-quality methods tend to be computationally heavy and slow.Over time, various methods have been introduced to optimize video generation models, aiming to streamline computational processes and reduce resource usage. Traditional approaches like step-distillation, latent diffusion, and caching have contributed to this goal. Step distillation, for instance, reduces the number of steps needed to achieve quality by condensing complex tasks into simpler forms. At the same time, latent diffusion techniques aim to improve the overall quality-to-latency ratio. Caching techniques store previously computed steps to avoid redundant calculations. However, these approaches have limitations, such as more flexibility to adapt to the unique characteristics of each video sequence. This often leads to inefficiencies, particularly when dealing with videos that vary greatly in complexity, motion, and texture.Researchers from Meta AI and Stony Brook University introduced an innovative solution called Adaptive Caching (AdaCache), which accelerates video diffusion transformers without additional training. AdaCache is a training-free technique that can be integrated into various video DiT models to streamline processing times by dynamically caching computations. By adapting to the unique needs of each video, this approach allows AdaCache to allocate computational resources where they are most effective. AdaCache is built to optimize latency while preserving video quality, making it a flexible, plug-and-play solution for improving performance across different video generation models.AdaCache operates by caching certain residual computations within the transformer architecture, allowing these calculations to be reused across multiple steps. This approach is particularly efficient because it avoids redundant processing steps, a common bottleneck in video generation tasks. The model uses a caching schedule tailored for each video to determine the best points for recomputing or reusing residual data. This schedule is based on a metric that assesses the data change rate across frames. 
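The caching decision itself can be sketched in a few lines: a transformer block's residual is reused across diffusion steps as long as a cheap change metric on its input stays below a threshold, and recomputed otherwise. The specific metric, threshold, and block structure below are illustrative assumptions, not Meta AI's actual schedule.

```python
# Hedged sketch of adaptive residual caching across diffusion steps.
import torch

class CachedBlock:
    def __init__(self, block: torch.nn.Module, threshold: float = 0.05):
        self.block = block
        self.threshold = threshold
        self.cached_input = None
        self.cached_residual = None

    def __call__(self, x: torch.Tensor) -> torch.Tensor:
        if self.cached_input is not None:
            # Relative L1 distance between current and cached inputs (the "change rate").
            change = (x - self.cached_input).abs().mean() / (self.cached_input.abs().mean() + 1e-8)
            if change < self.threshold:
                return x + self.cached_residual       # reuse: skip the expensive computation
        residual = self.block(x)                      # recompute and refresh the cache
        self.cached_input, self.cached_residual = x.detach(), residual.detach()
        return x + residual

block = CachedBlock(torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.GELU()))
out = block(torch.randn(1, 16, 64))   # first call computes and caches; later calls may reuse
```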
Further, the researchers incorporated a Motion Regularization (MoReg) mechanism into AdaCache, which allocates more computational resources to high-motion scenes that require finer attention to detail. By using a lightweight distance metric and a motion-based regularization factor, AdaCache balances the trade-off between speed and quality, adjusting computational focus based on the video's motion content.
The research team conducted a series of tests to evaluate AdaCache's performance. Results showed that AdaCache substantially improved processing speeds and quality retention across multiple video generation models. For example, in a test involving Open-Sora's 720p 2-second video generation, AdaCache recorded a speedup of up to 4.7 times over previous methods while maintaining comparable video quality. Furthermore, variants such as AdaCache-fast and AdaCache-slow offer options tuned toward speed or quality. With MoReg, AdaCache demonstrated enhanced quality, aligning closely with human preferences in visual assessments, and outperformed traditional caching methods. Speed benchmarks on different DiT models also confirmed AdaCache's superiority, with speedups ranging from 1.46x to 4.7x depending on the configuration and quality requirements.
In conclusion, AdaCache marks a significant advancement in video generation, providing a flexible solution to the longstanding issue of balancing latency and video quality. By employing adaptive caching and motion-based regularization, the researchers offer a method that is efficient and practical for a wide array of real-world applications in real-time and high-quality video production. AdaCache's plug-and-play nature enables it to enhance existing video generation systems without extensive retraining or customization, making it a promising tool for future video generation.
Check out the Paper, Code, and Project. All credit for this research goes to the researchers of this project.
Nikhil is an intern consultant at Marktechpost. He is pursuing an integrated dual degree in Materials at the Indian Institute of Technology, Kharagpur. Nikhil is an AI/ML enthusiast who is always researching applications in fields like biomaterials and biomedical science. With a strong background in Material Science, he is exploring new advancements and creating opportunities to contribute.
  • WWW.MARKTECHPOST.COM
    Anthropic Introduces Claude 3.5 Sonnet: The AI That Understands Text, Images, and More in PDFs
    Information overload presents significant challenges in extracting insights from documents containing both text and visuals, such as charts, graphs, and images. Despite advancements in language models, analyzing these multimodal documents remains difficult. Conventional AI models are limited to interpreting plain text, often struggling to process complex visual elements embedded in documents, which hinders effective document analysis and knowledge extraction.The new Claude 3.5 Sonnet model now supports PDF input, enabling it to understand both textual and visual content within documents. Developed by Anthropic, this enhancement marks a substantial leap forward, allowing the AI to handle a broader range of information from PDFs, including textual explanations, images, charts, and graphs, within documents that span up to 100 pages. Users can now upload entire PDF documents for detailed analysis, benefitting from an AI that understands not just the words but the complete layout and visual narrative of a document. The models ability to read tables and charts embedded within PDFs is particularly noteworthy, making it an all-encompassing tool for those seeking comprehensive content interpretation without needing to rely on multiple tools for different data types.Technically, Claude 3.5 Sonnets capabilities are driven by advancements in multimodal learning. The model has been trained not only to parse text but also to recognize and interpret visual patterns, allowing it to link textual content with related visual information effectively. This integration relies on sophisticated vision-language transformers, which enable the model to process data from different modalities simultaneously. The fusion of both textual and visual learning pathways results in an enriched understanding of contextbe it discerning insights from a pie chart or explaining the relationship between text and a related image. Moreover, Claude 3.5 Sonnets ability to process lengthy documents up to 100 pages greatly enhances its utility for use cases like auditing financial reports, conducting academic research, and summarizing legal papers. Users can experience faster, more accurate document interpretation without the need for additional manual processing or restructuring.This development is important for several reasons. First, the ability to analyze both text and visual content significantly increases efficiency for end users. Consider a researcher analyzing a scientific report: instead of manually extracting data from graphs or interpreting accompanying explanations, the researcher can simply rely on the model to summarize and correlate this information. Preliminary user tests have shown that Claude 3.5 Sonnet offers an approximately 60% reduction in the time taken to summarize and analyze documents compared to traditional text-only models. Additionally, the models deep understanding of visual data means it can describe and derive meaning from images and graphs that would otherwise require human intervention. By embedding this capability directly within the Claude model, Anthropic provides a one-stop solution for document analysisone that promises to save time and enhance productivity across sectors.The inclusion of PDF support in Claude 3.5 Sonnet is a major milestone in AI-driven document analysis. By integrating visual data comprehension along with text analysis, the model pushes the boundaries of how AI can be used to interact with complex documents. 
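In practice, sending a PDF to the model looks roughly like the request below, which follows Anthropic's published document content-block format for PDF input. Treat the beta header value and exact field names as assumptions and verify them against the current API reference before relying on them.

```python
# Hedged sketch: pass a base64-encoded PDF as a document block alongside a text prompt.
import base64
import requests

with open("report.pdf", "rb") as f:
    pdf_b64 = base64.standard_b64encode(f.read()).decode()

resp = requests.post(
    "https://api.anthropic.com/v1/messages",
    headers={
        "x-api-key": "YOUR_API_KEY",                 # placeholder
        "anthropic-version": "2023-06-01",
        "anthropic-beta": "pdfs-2024-09-25",         # assumed beta flag for PDF input
        "content-type": "application/json",
    },
    json={
        "model": "claude-3-5-sonnet-latest",
        "max_tokens": 1024,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "document",
                 "source": {"type": "base64", "media_type": "application/pdf", "data": pdf_b64}},
                {"type": "text",
                 "text": "Summarize the key findings, including any charts or tables."},
            ],
        }],
    },
)
print(resp.json())
```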
This update eliminates a major friction point for users who have had to deal with cumbersome workflows to extract meaningful insights from multimodal documents. Whether for academia, corporate research, or legal review, Claude 3.5 Sonnet offers a holistic, streamlined approach to document handling and is poised to change the way we think about data extraction and analysis.
Check out the Details here. All credit for this research goes to the researchers of this project.
Aswin AK is a consulting intern at MarkTechPost. He is pursuing his Dual Degree at the Indian Institute of Technology, Kharagpur. He is passionate about data science and machine learning, bringing a strong academic background and hands-on experience in solving real-life cross-domain challenges.
  • WWW.MARKTECHPOST.COM
    UniMTS: A Unified Pre-Training Procedure for Motion Time Series that Generalizes Across Diverse Device Latent Factors and Activities
    Recognition of human motion using time series from mobile and wearable devices is commonly used as key context information for various applications, from health condition monitoring to sports activity analysis to user habit studies. However, collecting large-scale motion time series data remains challenging due to security or privacy concerns. In the motion time series domain, the lack of datasets and an effective pre-training task makes it difficult to develop similar models that can operate with limited data. Typically, existing models perform training and testing on the same dataset, and they struggle to generalize across different datasets given three unique challenges within the motion time series problem domain: First, placing devices in different locations on the bodylike on the wrist versus the legleads to very different data, which makes it tough to use a model trained for one spot on another part. Second, since devices can be held in various orientations, its problematic because models trained with a device in one position often struggle when the device is held differently. Lastly, different datasets often focus on different types of activities, making it hard to compare or combine the data effectively.The conventional motion time series classification relies on separate classifiers for each dataset, using methods like statistical feature extraction, CNNs, RNNs, and attention models. General-purpose models like TimesNet and SHARE aim for task versatility, but they require training or testing on the same dataset; hence, they limit adaptability. Self-supervised learning helps in representation learning, though generalization across various datasets remains challenging. Pretrained models like ImageBind and IMU2CLIP consider motion and text data, but they are constrained by device-specific training. Methods that use large language models (LLMs) rely on prompts but have difficulty recognizing complex activities as they are not trained on raw motion time series and struggle with accurately recognizing complex activities.A group of researchers from UC San Diego, Amazon, and Qualcomm proposed UniMTS as the first unified pre-training procedure for motion time series that generalizes across diverse device latent factors and activities. UniMTS uses a contrastive learning framework to link motion time series data with enriched text descriptions from large language models (LLMs). This helps the model to understand the meaning behind different movements and allows it to generalize across various activities. For large-scale pre-training, UniMTS generates motion time series data based on existing detailed skeleton data, which covers various body parts. The generated data is then processed using graph networks to capture both spatial and temporal relationships across different device locations, helping the model generalize to data from different device placements.The process begins by creating motion data from skeleton movements and adjusting it according to different orientations. It also uses a graph encoder to understand how joints connect so it can work well across different devices. The text descriptions are improved using large language models. To create motion data, it calculates the velocities and accelerations of each joint while it considers their positions and orientations, adding noise to mimic real-world sensor errors. To handle inconsistencies in device orientation, UniMTS uses data augmentation to create random orientations during pre-training. 
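A minimal sketch of that orientation augmentation, assuming six motion channels (tri-axial acceleration and angular velocity), is shown below; the channel layout and rotation sampling are illustrative assumptions, not the authors' exact pipeline.

```python
# Hedged sketch: rotate each sample's 3-axis channels by a random 3D rotation so the model
# stops relying on any fixed device orientation.
import numpy as np
from scipy.spatial.transform import Rotation

def random_orientation(sample: np.ndarray) -> np.ndarray:
    """sample: (T, 6) array of [ax, ay, az, gx, gy, gz] readings over T timesteps."""
    R = Rotation.random().as_matrix()          # one random rotation per training sample
    acc, gyro = sample[:, :3], sample[:, 3:]
    return np.concatenate([acc @ R.T, gyro @ R.T], axis=1)

augmented = random_orientation(np.random.randn(128, 6).astype(np.float32))
```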
This method takes into account variations in device positions and axis setups. By aligning motion data with text descriptions, the model can adapt well to different orientations and activity types. For training, UniMTS employs rotation-invariant data augmentation to handle device positioning differences. It was tested on the HumanML3D dataset and 18 other real-world motion time series benchmark datasets, achieving performance improvements of 340% in the zero-shot setting, 16.3% in the few-shot setting, and 9.2% in the full-shot setting compared with the respective best-performing baselines. The model's performance was compared to baselines like ImageBind and IMU2CLIP, and statistical tests confirmed that UniMTS outperformed the other models significantly, most notably in zero-shot settings.
UniMTS is trained solely on physics-simulated data, yet it shows remarkable generalization across diverse real-world motion time series datasets featuring different device locations, orientations, and activities. While it improves markedly on traditional methods, UniMTS still has limitations. In a broader sense, this pre-trained motion time series classification model can serve as a foundation for future research in human motion recognition.
Check out the Paper, GitHub, and Model on Hugging Face. All credit for this research goes to the researchers of this project.
Divyesh Vitthal Jawkhede is a consulting intern at Marktechpost. He is pursuing a BTech in Agricultural and Food Engineering from the Indian Institute of Technology, Kharagpur. He is a Data Science and Machine Learning enthusiast who wants to integrate these leading technologies into the agricultural domain and solve challenges.
  • WWW.MARKTECHPOST.COM
    Optimizing Large-Scale AI Model Pre-Training for Academic Research: A Resource-Efficient Approach
    The landscape of AI research is experiencing significant challenges due to the immense computational requirements of large pre-trained language and vision models. Training even relatively modest models demand substantial resources; for instance, Pythia-1B requires 64 GPUs for three days, while RoBERTa needs 1,000 GPUs for a single day. This computational barrier affects academic laboratories, limiting their ability to conduct controlled pre-training experiments. Moreover, lacking transparency regarding pre-training costs in academia creates additional obstacles, making it difficult for researchers to plan experiments, propose realistic grant budgets, and efficiently allocate resources.Previous attempts to address computational challenges in AI research include Compute surveys that explore resource access and environmental impacts but most focused narrowly on NLP communities. Next, training optimization techniques depend on manual tuning with specialized knowledge, while systems like Deepspeed Autotune focus on batch size and Zero-based model sharding optimizations. Some researchers have developed efficient pre-training recipes for models like BERT variants, achieving faster training times on limited GPUs. Moreover, Hardware recommendation studies have provided detailed guidance on equipment selection but highlight throughput metrics rather than practical training time considerations. These approaches still need to fully address the need for model-agnostic, replication-focused solutions that maintain original architecture integrity.Researchers from Brown University have proposed a comprehensive approach to clarify pre-training capabilities in academic settings. Their methodology combines a survey of academic researchers computational resources with empirical measurements of model replication times. A novel benchmark system is developed that evaluates pre-training duration across different GPUs and identifies optimal settings for maximum training efficiency. Through extensive experimentation involving 2,000 GPU hours, there are significant improvements in resource utilization. The results highlight potential improvements for academic pre-training, showing that models like Pythia-1B can be replicated using fewer GPU days than originally required.The proposed method utilizes a dual-category optimization strategy: free-lunch methods and memory-saving methods. Free-lunch methods represent optimizations with improvements in throughput and potential memory reduction without losing performance or requiring user intervention. These include model compilation, using off-the-shelf custom kernels as drop-in replacements for PyTorch modules, and utilizing TF32 mode for matrix operations. On the other hand, Memory-saving methods reduce memory consumption, introducing some performance trade-offs consisting of three key components: activation checkpointing, model sharding, and offloading. The system evaluates up to 22 unique combinations of memory-saving methods while maintaining free-lunch optimizations as a constant baseline.The empirical results show significant improvements over initial analytical predictions, which are overly optimistic by a factor of 6 times. Initial testing shows that 9 out of 20 model-GPU configurations are not feasible, with Pythia-1B requiring 41 days on 4 A100 GPUs using naive implementation. 
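As a rough illustration, the snippet below enables two free-lunch settings (TF32 matrix multiplications and model compilation) and one memory-saving method (activation checkpointing) for a Pythia-1B-scale model in PyTorch. The specific flags and model are assumptions for illustration; the paper's benchmark sweeps many more combinations, including model sharding and offloading.

```python
# Hedged sketch of the two optimization families applied to a Hugging Face causal LM.
import torch
from transformers import AutoModelForCausalLM

torch.backends.cuda.matmul.allow_tf32 = True      # free lunch: TF32 matrix multiplications
torch.backends.cudnn.allow_tf32 = True

model = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-1b")
model.gradient_checkpointing_enable()             # memory saving: activation checkpointing
model = torch.compile(model)                      # free lunch: kernel fusion via compilation
# The wrapped model is then handed to the usual training loop or trainer.
```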
However, after implementing the optimized configuration methods, the research achieved an average 4.3 times speedup in training time, reducing Pythia-1B training to just 18 days on the same hardware setup. Moreover, the study reveals a surprising benefit: memory-saving methods, typically associated with slower training, sometimes improved training time by up to 71%, especially for GPUs with limited memory or larger models.
In conclusion, researchers from Brown University present a significant step toward bridging the growing computational divide between industry and academia in AI research. The study shows that academic institutions can train billion-parameter models despite resource limitations. The developed codebase and benchmark system provide practical tools for researchers to evaluate and optimize their hardware configurations before making substantial investments, allowing academic groups to find optimal training settings specific to their available resources and to run preliminary tests on cloud platforms. This work marks an important milestone in empowering academic researchers to engage more actively in large-scale AI model development.
Check out the Paper and GitHub. All credit for this research goes to the researchers of this project.
Sajjad Ansari is a final year undergraduate from IIT Kharagpur. As a tech enthusiast, he delves into the practical applications of AI with a focus on understanding the impact of AI technologies and their real-world implications. He aims to articulate complex AI concepts in a clear and accessible manner.
  • WWW.MARKTECHPOST.COM
    OpenAI Introduces Predicted Outputs Feature: Speeding Up GPT-4o by ~5x for Tasks like Editing Docs or Refactoring Code
    The use of large language models like GPT-4o and GPT-4o-mini has brought significant advancements in natural language processing, enabling high-quality response generation, document rewriting, and productivity enhancements across numerous applications. However, one of the biggest challenges these models face is latency. Whether its updating a blog post or refining lines of code, the lag associated with response generation can hinder seamless user experiences. This latency is particularly evident in applications requiring multiple iterations, such as document refinement or code rewriting, where users often experience frustrating delays that hamper productivity and discourage real-time use.OpenAI has introduced the Predicted Outputs feature, which dramatically decreases latency for GPT-4o and GPT-4o-mini by providing a reference string. This feature is a game-changer, especially for those who use language models to iterate over content or make repeated updates. The key innovation lies in the ability to predict probable content and use it as a starting point for the model, effectively skipping portions of the process where the outcome is already well-established. By reducing computational overhead through this speculative decoding approach, latency can be decreased by as much as fivefold, making GPT-4o far more suitable for real-time tasks like document updates, code editing, and other iterative text generation activities. This enhancement is particularly beneficial for developers, content creators, and professionals who require rapid updates and minimal downtime in their workflows.Technical Details and BenefitsThe core mechanism behind Predicted Outputs is speculative decoding, a clever approach that allows the model to skip over known or expected content. Imagine you are updating a document where only minor edits are needed. In traditional scenarios, GPT models generate text word by word, evaluating each possible token at every stage, which can be time-consuming. However, with speculative decoding, if parts of the text can be predicted based on a provided reference string, the model can skip over them and immediately jump to the sections that require computation. This skipping mechanism significantly reduces latency, making it possible to iterate quickly on prior responses. Additionally, Predicted Outputs work particularly well in contexts where rapid turnaround is essential, such as live document collaboration, fast code refactoring, or real-time article updates. The integration of this feature ensures that interactions with GPT-4o are not only more efficient but also less burdensome for the infrastructure, ultimately reducing costs.https://x.com/FactoryAI/status/1853563170448965788Why Predicted Outputs MatterThe importance of the Predicted Outputs feature cannot be overstated. One key reason is the dramatic reduction in latency it provides, as speed becomes a critical factor in the effectiveness of AI applications for real-world scenarios. For instance, an improvement in latency of up to fivefold can make a significant difference for developers who rely on AI tools to rewrite or refine code, allowing them to work faster with fewer interruptions. Similarly, content creators updating blogs or documents in real-time will find the reduced latency crucial in enhancing their productivity and keeping content up to date. 
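A typical call looks like the hedged sketch below, where the existing text is supplied through the prediction parameter so unchanged spans can be skipped during decoding. The parameter shape follows OpenAI's published example for this feature, but the model name, prompt, and file are placeholders, and the exact fields should be checked against the current API reference.

```python
# Hedged sketch: pass the current file contents as a predicted output for a light edit.
from openai import OpenAI

client = OpenAI()
existing_code = open("app.py").read()             # the file we want lightly edited

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user",
         "content": "Rename the function `run` to `main` and leave everything else unchanged:\n\n" + existing_code},
    ],
    prediction={"type": "content", "content": existing_code},  # reference string to skip over
)
print(response.choices[0].message.content)
```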
Results from OpenAI's testing show that GPT-4o's performance on latency-sensitive tasks, such as iterative document editing and code rewriting, has improved considerably, with up to 5x faster response times in common use cases. By cutting down on lag, Predicted Outputs not only save time but also make GPT-4o and GPT-4o-mini more accessible and practical for a broader range of users, from professional developers to writers and educators.
Conclusion
OpenAI's introduction of the Predicted Outputs feature for GPT-4o and GPT-4o-mini marks a major step toward addressing one of the most significant limitations of language models: latency. By incorporating speculative decoding, the feature dramatically speeds up tasks such as document editing, content iteration, and code refactoring. The reduction in response time is transformative for user experience, ensuring that GPT-4o remains at the forefront of practical AI applications. By enabling up to 5x faster processing, Predicted Outputs make these models more efficient, allowing users to focus on creativity and problem-solving rather than waiting on model computations. For anyone relying on AI to enhance their productivity, this is a welcome development that takes us closer to seamless, real-time interaction with powerful language models.
Check out the Details and Tweet. All credit for this research goes to the researchers of this project.
Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence Media Platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts over 2 million monthly views, illustrating its popularity among audiences.
  • WWW.MARKTECHPOST.COM
    How to Become a Data Analyst? Step by Step Guide
    In todays data-driven world, data analysts play a crucial role in various domains. Businesses use data extensively to inform strategy, enhance operations, and obtain a competitive edge. Professionals known as data analysts enable this by turning complicated raw data into understandable, useful insights that help in decision-making. They navigate the whole data analysis cycle, from discovering and collecting pertinent data to getting it ready for analysis, interpreting the findings, and formulating suggestions. Whether its determining the best client segments, increasing operational effectiveness, or predicting future trends, this approach assists firms in answering particular questions, finding patterns, and comprehending trends that affect important results.Data analysts are in high demand, and possibilities are opening up in industries like government, business, healthcare, and finance. As businesses depend more and more on data to make quicker, more accurate, and more intelligent decisions, the field of data analytics keeps growing. Today, many businesses view data analytics as a crucial role, creating a stable need for qualified workers and a bright future for ambitious analysts.This article has covered what it means to be a data analyst, the necessary tools and abilities, the top learning resources, and doable strategies for establishing a prosperous career in this fulfilling sector.Table of contentsWhat Do Data Analysts Do?Skills Required for SuccessOnline Courses to Become Data AnalystsKey Tools and Technologies for Data AnalysisBuilding Practical Experience in Data AnalyticsProjects and Portfolio DevelopmentKaggle Competitions and Other PlatformsNetworking and Community EngagementJob Search and Career Advancement Tips Websites for Interview PreparationConclusionUnderstanding the Role of a Data AnalystWhat Do Data Analysts Do?Depending on the organizations adoption of data-driven decision-making procedures, a data analysts position can vary greatly. But fundamentally, the job of a data analyst is to turn unprocessed data into insights that can be used to inform business choices. In order to solve particular business questions, this process usually includes developing and managing data systems, collecting and cleaning data, analyzing it statistically, and interpreting the findings.Creating and managing databases to guarantee data integrity, locating and obtaining pertinent data sources, and applying statistical tools to find significant trends and patterns are some of the major responsibilities. In order to help stakeholders make well-informed decisions based on current market or operational trends, analysts frequently create comprehensive reports and dashboards that emphasize these insights.To find chances for process enhancements, suggest system changes, and maintain data governance standards, data analysts also work with programmers, engineers, and company executives. For example, a data analyst can look at demographics associated with the campaign to find out if the target audience is being reached, analyze the campaigns effectiveness, and consider whether to invest in similar advertising campaigns in the future.Skills Required for SuccessStrong analytical skills and technological knowledge are essential for a successful data analyst. To query and manipulate data, one must be proficient in computer languages like Python, R, and SQL. 
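As a small, self-contained illustration of that Python-plus-SQL workflow (the file and column names here are made up), an analyst might load a CSV, query it with SQL, and summarize the result with pandas:

```python
# Illustrative example only: load raw data, aggregate it with SQL, summarize with pandas.
import pandas as pd
import sqlite3

orders = pd.read_csv("orders.csv")                       # hypothetical raw export
conn = sqlite3.connect(":memory:")
orders.to_sql("orders", conn, index=False)

monthly = pd.read_sql(
    "SELECT strftime('%Y-%m', order_date) AS month, SUM(revenue) AS revenue "
    "FROM orders GROUP BY month ORDER BY month",
    conn,
)
print(monthly.describe())                                # quick statistical summary
```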
For statistical analysis, spreadsheet programs like Microsoft Excel are frequently utilized, and data visualization tools like Tableau and Power BI allow analysts to visually convey their findings. Accurate data interpretation, projection, and conclusion-making require a solid mathematical and statistical foundation.Leadership abilities that enable analysts to think strategically and communicate their findings to stakeholders who are not technical, like problem-solving and good communication, are equally crucial. For example, project managers use data analysts to monitor important project indicators, identify problems, and assess the possible effects of different remedies. Data analysts succeed in this dynamic sector when they possess a combination of curiosity, attention to detail, and a dedication to remaining current with industry trends.Online Courses to Become Data AnalystsIntroduction to Data AnalyticsIBM Data Analyst Professional CertificateExcel Skills for Data Analytics and Visualization SpecializationData Analysis EssentialsData Analysis with PythonGoogle Data Analytics Professional CertificateIBM: Analyzing Data with ExcelIntroduction to Data Analytics for BusinessExcel for Everyone: Data Analysis FundamentalsFoundations: Data, Data, EverywhereData Analytics for BusinessIBM Data Analytics with Excel and R Professional CertificateData Analysis and Visualization Foundations SpecializationHigh-Dimensional Data AnalysisIntroduction to Data AnalyticsData Analytics and Visualization in Health CarePython Data AnalyticsAnalytics for Decision-MakingGetting Started with Data Analytics on AWSDatabases: Modeling and TheoryKey Tools and Technologies for Data AnalysisTableauTableau is well known for its user-friendly data visualization features, which let users make dynamic, interactive dashboards without knowing any code. Tableau helps convey complicated information in a way that stakeholders can understand, making it very helpful for analysts and non-technical users alike. Tableau is a cost-effective option for businesses concentrating on data-driven storytelling and visualization.Microsoft Azure Machine LearningData scientists can create, train, and implement models with Microsoft Azure Machine Learning, a cloud-based platform. Because of its connectivity with other Azure services, it is quite flexible for purposes involving the deployment of AI and large-scale data processing. Users can choose between using its visual interface or code-based solutions depending on user preference and technical proficiency.Microsoft Power BIFor businesses looking to integrate AI and improve their data analysis capabilities, Microsoft Power BI is a crucial tool. Its advanced text analysis features allow users to extract significant phrases and do sentiment analysis, improving the overall caliber of data insights. Power BI assists companies in gaining a thorough grasp of consumer preferences and market sentiment by converting unstructured data, such as internal papers, social media posts, and customer feedback, into structured insights.IBM Watson AnalyticsIBM AI-driven insights are used by Watson Analytics, a cloud-based data analysis and visualization tool, to assist users in understanding their data. Users can rapidly find trends, patterns, and relationships in data using its automatic data discovery tool. 
For business customers who require actionable information without requiring sophisticated analytics expertise, Watson Analytics is a great fit.Databricks Unified Data Analytics PlatformDatabricks provides a single cloud-based platform for the large-scale deployment of enterprise-grade AI and data analytics solutions. Databricks, which is well-known for having a solid foundation in Apache Spark, is perfect for companies looking to incorporate data science and machine learning into product development. It is frequently used to spur creativity and quicken the creation of data-driven applications in a variety of industries, including technology and finance.PyTorchThe deep learning framework PyTorch is well-known for its adaptability and broad support for applications like computer vision, reinforcement learning, and natural language processing. PyTorch, an open-source framework, is widely used in both commercial and academic applications, especially when neural networks are needed. Deep learning practitioners choose it because of its large community and libraries. PyTorch usage is priced according to cloud provider fees and usually starts at $0.07 per hour, making it affordable for both production and experimentation.H2O.aiAn open-source AI and machine learning platform called H2O.ai has a strong emphasis on in-memory processing, which greatly accelerates data analysis. Because of the platforms automatic machine-learning features, users can develop predictive models without requiring much technical knowledge. Businesses that require quick insights and effective management of large datasets can benefit from H2O.ai. Because it is a free tool, it appeals to businesses that wish to study some good ML models without making a financial commitment.Google Cloud Smart AnalyticsGoogle Cloud Smart Analytics delivers AI-powered tools for enterprises looking to transform data into strategic assets. Leveraging Googles expertise in data handling and AI innovation, this platform offers extensive analytics capabilities that range from marketing and business intelligence to data science. Google Cloud Smart Analytics supports organizations in building data-driven workflows and implementing AI at scale.PolymerPolymer provides an agentless platform for data security that simplifies business intelligence and improves data security by utilizing machine learning. Without knowing any code, users can construct dashboards, embed analytics in presentations, and create data visualizations with its user-friendly features. By producing visualizations based on natural language input, Polymers conversational AI assistant, PolyAI, significantly increases productivity and saves analysts time.RapidMinerRapidMiner is a potent AI platform that can be used by users of all skill levels because of its intuitive drag-and-drop interface. From data preparation to model deployment, it provides a comprehensive solution. RapidMiners versatility for a wide range of data types includes the ability to analyze text, pictures, and audio. Businesses can swiftly extract insights and automate procedures because of its interaction with machine learning and deep learning frameworks. RapidMiner is perfect for small-scale projects and accessible for novices because it is free and has minimal functionality.AkkioAkkio is a flexible tool for forecasting and business analytics. It offers a user-friendly starting point for anyone who wants to examine their data and predict results. 
After choosing a target variable and uploading a dataset, users can have Akkio build a neural network centered on that variable. Because it doesn't require coding knowledge, it is well suited to predictive analytics, particularly in marketing and sales. In Akkio's model-building process, eighty percent of the dataset is used for training and the remaining twenty percent for validation.
Alteryx
With Alteryx's new no-code AI studio, users can create custom analytics apps from their own company data and query it through a natural language interface that incorporates models such as OpenAI's GPT-4. The platform, driven by the Alteryx AiDIN engine, is notable for its approachable, intuitive design for generating insights and predictive analytics. One of its highlights is the Workflow Summary Tool, which converts intricate procedures into concise natural-language summaries. Users can also select specific report formats, such as PowerPoint or email, to reach their target audiences effectively.
Sisense
Sisense is an analytics platform that helps developers and analysts create smart data products by combining AI with easy-to-use, no-code tools. By embedding intelligence into processes and products, Sisense's AI-driven analytics help businesses make well-informed decisions and increase user engagement. More than 2,000 businesses have used Sisense Fusion as a powerful AI and analytics platform to build distinctive, high-value products.
Julius AI
Julius AI automates difficult data analysis, simplifying findings and visualizations for both entry-level and seasoned analysts. Integrating easily with existing platforms, Julius accelerates data processing with intuitive interfaces and advanced predictive analytics, making it a key asset for teams trying to turn massive datasets into meaningful insights.
Luzmo
Luzmo is a no-code, user-friendly analytics tool built specifically for SaaS platforms, enabling users to generate interactive charts and dashboards easily. It suits businesses that want to improve their analytics capabilities without extensive coding expertise. Luzmo's API compatibility with well-known AI products such as ChatGPT makes dashboard generation efficient and largely automatic, so users save time while still receiving high-quality insights. The user-friendly interface makes it easier for everyone to interact with data, promotes teamwork, and speeds up the analytics process.
KNIME
KNIME is an open-source platform that democratizes access to data science tools, facilitating advanced analytics and machine learning for users of all skill levels. By letting users construct data workflows through a drag-and-drop interface without complex programming knowledge, KNIME promotes a more inclusive data science environment. The platform's support for more than 300 data connectors provides the smooth integration with a wide range of data sources that modern analytics requires.
AnswerRocket
AnswerRocket serves as an AI-powered data analysis assistant, making it easier to glean insights from a variety of data sources. By enabling non-technical users to interact with their data through natural language queries, AnswerRocket breaks down conventional barriers to data insights.
Without requiring in-depth analytical training, the platform's AI capabilities allow it to provide proactive insights and data-driven suggestions, helping users find significant patterns and trends.
DataLab
DataLab is a powerful AI-powered data notebook that combines generative AI technology with a strong IDE to speed the conversion of data into meaningful insights. Users can engage directly with their data through an easy-to-use chat interface, writing, updating, and debugging code, analyzing datasets, and producing comprehensive reports on one platform. DataLab's AI Assistant further increases productivity by letting users chat with their data, streamlining tasks like coding, offering context-specific recommendations to improve workflow efficiency, and explaining data structures. Its collaborative capabilities enable real-time teamwork, allowing several people to work on a project at once, exchange ideas, and handle version control easily. As users dig into their data, DataLab automatically creates editable, real-time reports that are easy to share. Compatibility with a variety of data sources, such as CSV files, Google Sheets, Snowflake, and BigQuery, guarantees simple data importation and extensive analytical capabilities in a single, user-friendly environment.
Looker
Google Cloud's Looker is a powerful no-code platform for business intelligence and data analysis with a wide range of integration possibilities. It handles huge datasets effectively, allowing users to combine several data sources into a single, coherent view and generate numerous dashboards and reports. The platform benefits from Google's robust support infrastructure and provides sophisticated data modeling capabilities.
Echobase
Echobase is an AI-powered platform designed for companies looking to use data to improve productivity and cooperation. It streamlines procedures that call for analytics and creativity by using AI agents trained for jobs like content production, data analysis, and Q&A. Integration is easy: users don't need to know any code to connect cloud storage or upload files. The platform facilitates teamwork by letting users interact with AI bots for data searches and analysis, and by offering role assignments and permission management. AWS encryption protects sensitive data while giving users full control over it, underscoring the importance of data security. In addition to analysis, Echobase provides a range of AI tools to assist with operational and creative activities, such as narrative creators and email and paragraph generators.
BlazeSQL
BlazeSQL is an AI-powered tool that converts natural language queries into SQL, making it easier for people with little SQL knowledge to retrieve data. Teams can extract and visualize data from databases using the SQL queries it generates automatically. Compatibility with several SQL databases, including MySQL, PostgreSQL, Microsoft SQL Server, Snowflake, BigQuery, and Redshift, accounts for BlazeSQL's adaptability. It prioritizes data security and privacy by keeping data interactions on the user's local device and is available in both desktop and cloud versions. One of its main benefits is no-code SQL generation, which lets users quickly turn text prompts into SQL queries.
MonkeyLearn
MonkeyLearn is a no-code AI platform that simplifies data organization and visualization; the sketch below gives a rough idea of the kind of keyword-based sorting such tools automate.
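The snippet below is a minimal, generic illustration of that idea rather than MonkeyLearn's actual API: a hand-written, keyword-based classifier of the kind a no-code text analysis tool replaces. The category names and keyword lists are hypothetical examples.
```python
# Generic illustration (not MonkeyLearn's API): keyword-based triage of
# incoming support tickets, the kind of manual sorting such tools automate.
# Category names and keywords below are hypothetical examples.

TICKET_CATEGORIES = {
    "billing": ["invoice", "refund", "charge", "payment"],
    "bug_report": ["error", "crash", "broken", "exception"],
    "feature_request": ["add", "support for", "would be nice", "request"],
}

def categorize_ticket(text: str) -> str:
    """Assign a ticket to the first category whose keywords appear in the text."""
    lowered = text.lower()
    for category, keywords in TICKET_CATEGORIES.items():
        if any(keyword in lowered for keyword in keywords):
            return category
    return "uncategorized"

tickets = [
    "I was charged twice on my last invoice, please issue a refund.",
    "The export button throws an error and the app crashes.",
]
for ticket in tickets:
    print(categorize_ticket(ticket), "->", ticket)
```
In practice, a platform like MonkeyLearn swaps such hand-written rules for classifiers and extractors configured through a no-code interface, as described next.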
Its powerful text analysis features allow it to update data visualizations quickly and let customers configure classifiers and extractors that automatically categorize data by subject or purpose, or extract important product attributes and user information. It is useful for managing customer feedback and support tickets, since it can automate processes and eliminate hours of manual data processing through machine learning. A notable feature is its ability to highlight particular text for quickly sorting and categorizing incoming data, such as emails or tickets, by keyword.
Google Sheets
Google Sheets is a versatile spreadsheet application with integrated machine learning capabilities that facilitate data analysis and visualization. For individuals and small teams seeking a flexible platform for data work, it provides real-time collaboration, data visualization, and ML-powered analytical features. Personal users can use it for free, and business users can choose from pricing options under G Suite.
ThoughtSpot
ThoughtSpot is an AI-powered analytics tool that lets users ask sophisticated business questions in plain language and receive insightful, AI-generated answers. With features like interactive visualizations, data modeling support, AI-guided search, and real-time data monitoring, it is well suited to analysts and business users who need quick, clear insights. ThoughtSpot is a cloud-based solution with adjustable pricing to accommodate different requirements.
Talend
Talend is a comprehensive platform for data integration, monitoring, and administration that lets users process and analyze data from a variety of big data sources, such as Hadoop, Spark, and Hive. Its key characteristics, strong integration capabilities, data monitoring, and support for big data environments, make it an effective tool for companies that prioritize secure and compliant data management. Talend offers both commercial and open-source versions, with scalable and adaptable pricing options to suit a range of requirements.
Building Practical Experience in Data Analytics
Internships and Entry-Level Positions
Getting an internship or entry-level job is a crucial first step in developing real-world data analysis experience. Building a strong CV that emphasizes relevant education, certificates, and technical abilities such as Excel, SQL, Python, or Tableau improves one's chances, and each application can be tailored to the business and position by highlighting the particular abilities the employer is looking for.
Personal projects or case studies, whether self-initiated or completed as part of one's studies, can be included in the resume and discussed in interviews. In a competitive market, candidates differentiate themselves by showing readiness to learn and a proactive approach to solving real-world challenges.
Projects and Portfolio Development
Building a portfolio of personal work is good practice for standing out as a data analyst. The portfolio should include projects that highlight technical and analytical abilities, such as data cleansing, exploratory data analysis, statistical modeling, and data visualization; a minimal sketch of the kind of script such a project might contain appears below. One excellent option for publicly displaying work is to host it on GitHub. Beyond giving prospective employers access to code and problem-solving methodology, this shows that one can manage a project, document the work, and adhere to best practices.
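As a hedged illustration, the following Python sketch shows roughly what such a portfolio script might look like, touching on cleansing, exploration, and visualization. The file name and column names ("sales.csv", "region", "revenue") are hypothetical placeholders, not data from any particular project.
```python
# Minimal sketch of a portfolio-style exploratory analysis script.
# "sales.csv", "region", and "revenue" are hypothetical placeholders;
# substitute your own dataset and columns.
import pandas as pd
import matplotlib.pyplot as plt

# Data cleansing: load the data, drop duplicates, and fill missing values.
df = pd.read_csv("sales.csv")
df = df.drop_duplicates()
df["revenue"] = df["revenue"].fillna(df["revenue"].median())

# Exploratory data analysis: summary statistics and a group-level view.
print(df.describe())
revenue_by_region = df.groupby("region")["revenue"].sum().sort_values(ascending=False)
print(revenue_by_region)

# Data visualization: a simple chart that could accompany the write-up.
revenue_by_region.plot(kind="bar", title="Revenue by Region")
plt.tight_layout()
plt.savefig("revenue_by_region.png")
```
Committing a script like this to GitHub alongside a short README turns a one-off analysis into a browsable portfolio piece.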
Each project should come with a concise summary, an explanation of the problem, and any new insights or solutions discovered, so the portfolio is simple to browse.
Kaggle Competitions and Other Platforms
Kaggle, an online community of data scientists and machine learning professionals, hosts competitions in which data analytics expertise can be applied to practical problems. Taking part in Kaggle competitions means working with a variety of datasets, solving business-related problems, and honing modeling and data manipulation skills.
Networking and Community Engagement
Joining Data Communities and Meetups
Networking is an effective strategy for learning, growing, and connecting with other data analytics professionals. Attending meetups and joining data-focused communities can deepen knowledge, build familiarity with industry trends, and open doors to career prospects. LinkedIn groups focused on data science and analytics are an excellent place to start, because members exchange resources, hold discussions, and occasionally post job openings or joint projects. Similarly, local data science meetups are a way to network with analysts, exchange stories, and discover practical uses of data analytics in other sectors. Webinars are another easily accessible option for networking and skill development: numerous organizations host webinars with expert presenters on subjects like machine learning algorithms and data visualization strategies, and in addition to offering insightful information, these sessions let attendees interact with experts via group chats and Q&A sessions.
LinkedIn and GitHub for Professional Networking
Any data analyst needs a professional online presence, and two important sites for this are LinkedIn and GitHub. One's LinkedIn profile should be comprehensive, highlighting experience, projects, credentials, and abilities. A good headline and summary help the profile stand out, especially if they mention the particular data tools or skills relevant to the desired roles.
GitHub is a useful portfolio tool for demonstrating coding and project management abilities. Contributing to open-source projects, uploading personal projects regularly, and keeping repositories documented and organized is a good way to showcase skills. This not only shows off technical ability but also makes it simple for industry experts and recruiters to review the work. Because they offer concrete proof of skills and a commitment to lifelong learning, GitHub projects can be a great conversation starter and a useful asset at networking events and in job applications.
Job Search and Career Advancement Tips
Preparing for Interviews
Data analyst interviews frequently consist of multiple steps that evaluate a candidate's technical proficiency, critical thinking, and ability to communicate insights clearly. The interview process usually consists of:
Initial Screening: A recruiter or hiring manager speaks with the candidate over the phone or via video conference to learn more about their experience, abilities, and motivation.
This is frequently a chance to ask about the position and the business.
Technical Tests: Many data analyst positions require a technical test, which may consist of data manipulation exercises, short coding challenges, or projects involving SQL, Excel, or Python, to evaluate problem-solving abilities.
Business Scenarios and Case Studies: Case studies assess one's capacity to evaluate a business issue, interpret information, and offer practical solutions. In these tasks, the candidate might be asked to describe how they would collect, examine, and interpret data for business decision-making in a hypothetical situation.
Situational and Behavioral Interviews: These concentrate on soft skills, including problem-solving, communication, and teamwork.
Websites for Interview Preparation
DataCamp: It offers interactive courses in SQL, Python, and R, along with practice assessments to prepare for data analyst interviews.
Pramp: It facilitates mock interviews with peers and includes data analysis problem sets, which are ideal for practicing interview-style questions in real time.
AlgoExpert: It is a comprehensive platform with curated problems, video explanations, and mock interview tools for technical interview preparation.
StrataScratch: It provides real-world SQL interview questions used by top tech companies, which is ideal for honing SQL skills.
Interview Query: This website specializes in data science and data analytics interview preparation, with practice questions and mock interviews.
Glassdoor: It provides access to interview questions specific to data analyst roles, organized by company.
Codecademy: It provides an interview preparation kit dedicated to data analyst roles.
Simplilearn: It covers frequently asked data analyst interview questions, guiding candidates through the interview process.
TechCanvass: This website compiles top interview questions and answers related to data analytics, categorized for easy learning.
Exponent: This platform helps candidates practice through mock interviews.
Conclusion
The path to becoming a data analyst is full of chances to make a real impact. Gaining a firm grasp of the principles of data analytics, building real-world experience through projects or internships, improving technical abilities, and actively networking with specialists in the field can help secure a data analyst job. With dedication and a strategic approach, one can be well on the way to a rewarding career as a data analyst.
The post How to Become a Data Analyst? Step by Step Guide appeared first on MarkTechPost.