
Convergence Releases Proxy Lite: A Mini, Open-Weights Version of Proxy Assistant Performing Pretty Well on UI Navigation Tasks
www.marktechpost.com
In todays digital landscape, automating interactions with web content remains a nuanced challenge. Many existing solutions are resource-intensive and tailored for narrowly defined tasks, which limits their broader applicability. Developers often face the dual challenge of balancing computational efficiency with the need for a model that can generalize well across diverse websites. Traditional systems, heavily reliant on prompt-prediction, often lack the reflective reasoning required for the unpredictable nature of web environments. Additionally, proprietary models typically restrict access to detailed inner workings, making it difficult for researchers and practitioners in the open-source community to build on state-of-the-art methods. These persistent issues underline the importance of developing an automation tool that is both efficient and accessible.Convergence has introduced Proxy Lite: a mini, open-weights version of their well-regarded Proxy assistant. This 3B parameter Vision-Language Model is designed to extend sophisticated web automation capabilities to the open-source community. Rather than promising extraordinary feats, Proxy Lite aims to offer a balanced approach that marries efficiency with reliability. Its architecture builds on a solid foundation, allowing it to perform a variety of web-based tasks without imposing heavy computational demands.What makes Proxy Lite notable is its transparent design and open-weights approach. This encourages the community to explore, modify, and improve upon its framework. With an integrated system for Vision-Language Model (VLM) and browser interactions, Proxy Lite allows for nuanced control over browser tasks. The models configuration supports practical applications ranging from routine data extraction to more complex navigational tasks, all while keeping resource usage in check.Technical Aspects and Their BenefitsAt its core, Proxy Lite leverages a 3B parameter model built on the Qwen2.5-VL-3B-Instruct foundation. This choice reflects a commitment to balancing performance with efficiency. The model employs a three-phase process to generate responses:Observation: The model first examines the current state of the web pageconfirming, for instance, that an overlay or privacy banner has been dismissed.Thinking: It then methodically determines the next course of action, weighing the various possibilities based on the context.Tool Call: Finally, it issues a precise command to execute the selected action within the browser.This structured approach not only improves task reliability but also facilitates the models ability to generalize across different types of web interactions. By mirroring human-like reasoning processes, Proxy Lite manages to strike a balance between simplicity and sophistication. Moreover, its design supports a straightforward integration into both command-line interfaces and Streamlit applications, making deployment accessible even for those with modest technical resources.Performance Insights and Practical EvaluationsProxy Lite has been carefully evaluated using the WebVoyager benchmark, a comprehensive set of tasks designed to test web automation capabilities. The model achieved an overall score of 72.4%, a strong performance indicator given its open-weights nature. Detailed performance statistics across various websites reveal its thoughtful design:Allrecipes: Achieving an 87.8% success rate with an average of 10.3 message exchanges, it demonstrates effectiveness in content-rich environments.Amazon: A 70.0% success rate here highlights the models ability to navigate more complex, dynamic e-commerce platforms.Notable High-Profile Sites: With success rates in the low 80s on platforms such as Apple and GitHub, Proxy Lite consistently shows reliable behavior on diverse sites.Google Services: While some areas, such as Google Flights, yield lower success metrics, the overall performance remains competitive considering the models scope.These findings reflect a balanced performance, with Proxy Lite efficiently managing tasks without the overhead typically associated with larger, proprietary models. The comprehensive evaluation not only underscores its current utility but also points to potential enhancements through community-driven refinements.ConclusionProxy Lite emerges as a thoughtfully designed tool in the field of web automation. By addressing key challengessuch as resource constraints, generalization, and transparencyit offers a practical solution for automating routine online tasks. Its open-weights approach and modular design invite collaboration and ongoing development, providing a valuable resource for both academic research and commercial projects.Check outthe Technical Details andModel here.All credit for this research goes to the researchers of this project. Also,feel free to follow us onTwitterand dont forget to join our80k+ ML SubReddit. Asif RazzaqWebsite| + postsBioAsif Razzaq is the CEO of Marktechpost Media Inc.. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence Media Platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts of over 2 million monthly views, illustrating its popularity among audiences.Asif Razzaqhttps://www.marktechpost.com/author/6flvq/DeepSeek AI Releases DeepEP: An Open-Source EP Communication Library for MoE Model Training and InferenceAsif Razzaqhttps://www.marktechpost.com/author/6flvq/Building an Interactive Weather Data Scraper in Google Colab: A Code Guide to Extract, Display, and Download Live Forecast Data Using Python, BeautifulSoup, Requests, Pandas, and IpywidgetsAsif Razzaqhttps://www.marktechpost.com/author/6flvq/Building a Legal AI Chatbot: A Step-by-Step Guide Using bigscience/T0pp LLM, Open-Source NLP Models, Streamlit, PyTorch, and Hugging Face Transformers Recommended Open-Source AI Platform: IntellAgent is a An Open-Source Multi-Agent Framework to Evaluate Complex Conversational AI System' (Promoted)
0 Комментарии
·0 Поделились
·39 Просмотры