• Convergence AI Releases WebGames: A Comprehensive Benchmark Suite Designed to Evaluate General-Purpose Web-Browsing AI Agents
    www.marktechpost.com
    AI agents are becoming more advanced and capable of handling complex tasks across different platforms. Websites and desktop applications are built for human use, which demands an understanding of visual layouts, interactive components, and time-based behavior. Operating such systems requires carrying out user actions, from simple clicks to sophisticated drag-and-drop operations. These challenges remain difficult for AI, which cannot yet match human capability on web tasks. A broader evaluation system is necessary to measure and improve AI agents for web browsing.
    GPT-4o, Claude Computer-Use, Gemini-1.5-Pro, and Qwen2-VL struggle with navigation and task execution. Traditional evaluation frameworks, initially based on reinforcement learning, expanded to web tasks but remained limited to short-context scenarios, leading to quick saturation and incomplete assessments. Modern web interaction requires advanced skills like tool usage, planning, and environmental reasoning, which are not fully tested. While multi-agent interactions are gaining attention, current methods do not effectively evaluate collaboration and competition between AI systems.
    To address the limitations of current AI benchmarks in web interaction, researchers from Convergence Labs Ltd. and Clusterfudge Ltd. proposed WebGames, a framework designed to evaluate web-browsing AI agents through over 50 interactive challenges. These challenges cover basic browser usage, complex input handling, cognitive tasks, workflow automation, and interactive entertainment. Compared to prior benchmarks, WebGames aims to measure capabilities accurately by isolating individual interaction skills and giving the tested AI control. Its client-side design avoids dependencies on external resources, making tests uniform and reproducible.
    WebGames is modular in design. It specifies problems in a standardized JSONL format for easy integration with automated test frameworks and extension with additional tasks. 
All problems follow a deterministic verification structure, so task completion can be checked unambiguously. The structure examines AI performance systematically through web interactions, quantifying navigation, decision-making, and adaptability in dynamic environments.
    Researchers evaluated leading vision-language foundation models, including GPT-4o, Claude Computer-Use (Sonnet 3.5), Gemini-1.5-Pro, Qwen2-VL, and a Proxy assistant, using WebGames to assess their web interaction capabilities. Since most models were not designed for web interactions, they required scaffolding through a Chromium browser using Playwright. Except for Claude, the models lacked sufficient GUI grounding to determine exact pixel locations, so a Set-of-Marks (SoM) approach was used to highlight relevant elements. The models operated within a partially observable Markov decision process (POMDP), receiving JPEG screenshots and text-based SoM elements while executing tool-based actions through a ReAct-style prompting method. The evaluation showed that Claude scored lower than GPT-4o despite having more precise web control, likely due to Anthropic's training restrictions preventing actions resembling human behavior. Human participants from Prolific completed tasks easily, averaging 80 minutes and earning 18, with some achieving 100% scores. The findings revealed a wide gap between human and AI abilities, much like the ARC challenge. Some activities, such as Slider Symphony, demanded precise drag-and-drop control that proved difficult for models, revealing limitations in AI's ability to interact with real-world websites.
    In summary, the proposed benchmark found a significant gap between human and AI performance on web interaction tasks. The best-performing AI model, GPT-4o, achieved only 41.2% success, whereas humans achieved 95.7%. 
The results revealed that current AI systems struggle with intuitive web interaction, and constraints on models like Claude Computer-Use still impede task success. The benchmark can serve as a reference point for further research aimed at improving AI flexibility, reasoning, and web interaction efficiency.
    Divyesh Vitthal Jawkhede is a consulting intern at Marktechpost. He is pursuing a BTech in Agricultural and Food Engineering from the Indian Institute of Technology, Kharagpur. He is a data science and machine learning enthusiast who wants to integrate these leading technologies into the agricultural domain and solve challenges.
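As a rough illustration of the JSONL task format and deterministic verification described in the WebGames article above, here is a minimal Python sketch. The article does not give the actual WebGames schema or its verification mechanism, so the field names (`task_id`, `description`, `expected_token`), the sample tasks, and the token-comparison check are all assumptions made for illustration only.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """One benchmark task. All field names are hypothetical."""
    task_id: str
    description: str
    expected_token: str  # assumed: a value the agent can only obtain by solving the task


def load_tasks(jsonl_text: str) -> list[Task]:
    """Parse one JSON object per non-empty line (the JSONL convention)."""
    tasks = []
    for line in jsonl_text.splitlines():
        line = line.strip()
        if line:
            tasks.append(Task(**json.loads(line)))
    return tasks


def verify(task: Task, submitted: str) -> bool:
    """Deterministic check: the same submission always gets the same verdict."""
    return submitted.strip() == task.expected_token


def success_rate(tasks: list[Task], answers: dict[str, str]) -> float:
    """Fraction of tasks whose submitted answer passes verification."""
    if not tasks:
        return 0.0
    passed = sum(verify(t, answers.get(t.task_id, "")) for t in tasks)
    return passed / len(tasks)


# Hypothetical sample data, not taken from the real benchmark.
SAMPLE = """
{"task_id": "slider", "description": "Drag the slider to 50", "expected_token": "TOKEN_A"}
{"task_id": "button", "description": "Click the button", "expected_token": "TOKEN_B"}
"""

tasks = load_tasks(SAMPLE)
rate = success_rate(tasks, {"slider": "TOKEN_A", "button": "wrong"})
print(rate)  # 0.5
```

Keeping verification as a pure string comparison means a run can be scored without a human judge or a second model, which is one plausible reading of what "deterministic verification" buys a benchmark like this.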
  • Defiant, Inc.: Finance Coordinator
    weworkremotely.com
    Are you excited about working for a technology company that is securing the web? Are you looking for full-time, flexible hours* working remotely from anywhere in the United States? If so, this may be your dream job!
    We are hiring a Finance Coordinator who will oversee the administration and processing of semi-monthly payroll for employees located in multiple states, utilizing BambooHR to ensure compliance with all relevant regulations. This position will also maintain precise payroll records, manage tax responsibilities, and provide support to the Director of Finance for various accounting tasks.
    Salary is $60,000 to $80,000, depending on experience.
    This position requires that you be eligible to work in the United States without immigration assistance and that you currently live in the US.
    *Required core hours of 10am - 1pm Pacific Time, Monday through Friday.
    Requirements
    Payroll:
    Payroll Processing: Administer and process semi-monthly payroll for employees across multiple states using BambooHR, ensuring compliance with applicable regulations.
    Compliance and Tax Registration: Stay updated on federal, state, and local payroll regulations; register and set up state tax withholding and unemployment accounts as needed and update them in the payroll system; review payroll tax reports, W-2s, and other required documentation prepared by BambooHR to ensure accuracy. 
Research and resolve issues.
    Data Management: Maintain accurate employee payroll records, including benefits deductions, direct deposit information, tax elections, as well as PTO calculations and accruals.
    Employee Support: Respond to employee inquiries related to payroll, tax withholdings, and benefits deductions; troubleshoot and resolve payroll discrepancies proactively.
    System Management: Operate and manage the payroll system BambooHR to process payroll and generate reports.
    Taxes:
    Maintain the company's tax database.
    Prepare and assist with schedules, returns, payments, reports.
    Adhere to deadlines for city and state tax filings.
    Accounting:
    Bookkeeping of financial transactions in QBO.
    Assisting the Director of Finance (DOF) in month close activities.
    Pay vendor/subcontractor invoices and track bank account balances.
    Verify the accuracy of business accounts.
    Manage employee expense claims in Expensify.
    Prepare 1099s for Contractors.
    Help the DOF/Accountants with administrative duties and prepare taxes and reports.
    Qualifications:
    Bachelor's degree in finance, accounting, or business administration.
    Minimum of four years of experience in a finance role.
    Strong knowledge of payroll practices, labor laws, and tax regulations.
    Experience with registering state withholding and unemployment accounts is highly desirable.
    Multi-state payroll tax experience is a big plus.
    Keen data entry attention to detail for financial records and recognizing errors.
    Strong mathematics skills for accurate recordkeeping.
    High quality administrative skills for filing financial records and Payroll.
    Computer literacy, especially familiarity with spreadsheets, databases and accounting software such as Excel, Google Sheets, and QuickBooks Online.
    Multi-tasking and organizational skills to manage different financial duties, including the ability to prioritize tasks in order to meet deadlines.
    Ability to effectively work either independently or in a team setting.
    Excellent communication skills, including the ability to 
explain complex financial matters in accessible terms.
    Hiring Process
    We review all applications submitted and respond to all candidates typically within one to two weeks. All interviews are done remotely with no travel involved.
    All positions require a trial period of approximately 2-3 weeks with a minimum commitment of 10 hours per week. You will be paid for this short-term contract, and it will be used to evaluate whether both parties want to pursue an ongoing, regular employment relationship.
    All offers of employment are contingent on successful completion of a background check. The results of the background check are considered as they relate to the position and do not automatically disqualify someone from an offer of employment with the company.
    Benefits
    Full time telecommuting and flexible working hours, with a company that has been 100% remote for more than a decade.
    100% employee premium and 50% of dependent premium paid by company for premier level medical, dental, and vision insurance.
    21 days PTO per year to start.
    Approximately 12 paid company holidays including the week from December 25 to January 1.
    New Parent leave policy for up to 16 weeks of paid leave at full base salary, eligible after 1 year of employment.
    401(k) with a 4% Safe Harbor company match that is 100% vested immediately.
    Latest in laptop and workstation technology.
    Paid training and study time for work-related training and certifications.
    College tuition and Student Loan reimbursement.
    Wellness reimbursement program for health and fitness purchases.
    Mobile phone and internet reimbursement up to $100 per month.
    One-time Beverage Machine Benefit up to $200.
    Monthly non-alcoholic beverage reimbursement for coffee, tea, water, etc.
    Travel benefits including reimbursement for Clear membership; TSA PreCheck or Global Entry fees; passport application/renewal fees.
    Reimbursement for pet boarding/sitting expenses incurred while on approved company travel.
    Diversity at Defiant
    We value diversity and do not 
discriminate based on race, color, religion or creed, national origin or ancestry, sex, age, physical or mental disability, military or veteran status, gender identity or expression, marital status, sexual orientation, political ideology, economic status, parental status, or any other non-performance-related status.
  • What made this project: Gateway to Nature Centre by Oberlanders
    www.bdonline.co.uk
    Oberlanders made the shortlist at last year's AYAs, as the practice was named a finalist for One Off Small Project of the Year.
    In this series, we take a look at the team's entry project and ask the firm's partner, Catriona Hill, and lead delivery architect, Catriona Kinghorn, to break down some of the biggest specification challenges that needed to be overcome.
    Image: Corrieshalloch Gorge is a designated National Nature Reserve situated on a remote, sloped wilderness site twenty minutes from Ullapool in Scotland. (Source: Gordon Burniston and Ewan Wetherspoon)
    What were the key requirements of the client's brief? How did you meet these both through design and specification?
    The National Trust for Scotland's new Gateway to Nature at Corrieshalloch Gorge provides essential facilities for visitors who have travelled a considerable distance.
    Set in a remote and rural landscape, Corrieshalloch Gorge Reserve previously had no utility connections and no facilities to serve its visitors or the staff who worked there. The key requirements of the client's brief were to discreetly introduce new services with minimal impact on the landscape setting.
    This requirement was met through the careful positioning of the new centre and the specification of appropriate materials such as the larch used to clad the building and the landscape installations positioned along the new pathway.
    Image: Oberlanders was appointed to help the National Trust for Scotland realise their ambitions for a sensitively designed visitor centre to enhance the experience of visitors to the Gorge. (Source: Gordon Burniston and Ewan Wetherspoon)
    What were the biggest specification challenges on the project and how were these overcome?
    One of the biggest challenges was the specification of the larch cladding. 
Whilst the team was keen to use locally grown larch, the growing conditions in Scotland impacted the density of the larch, making it less suitable for use as a cladding material than European or Siberian larch.
    The preference has typically been for Siberian larch, which is denser and more durable. However, recent issues in Russia have affected the supply chain, making it impossible to procure. The contractor and architect worked hard with the supplier to source available Siberian larch and modify cladding board lengths to enable this to be used around the perimeter of the main building.
    To ensure even weathering, a SiOO:X finish was specified, which enables the larch to silver consistently regardless of whether the board is protected or exposed. The outcome is an evenly finished surface across the building, which will age and mature consistently over time.
    Image: The centre provides a place for arrival, orientation and interpretation before a visit to the gorge which runs through the NTS nature reserve. (Source: Gordon Burniston and Ewan Wetherspoon)
    What are the three biggest specification considerations for the project type? How did these specifically apply to your project?
    The key specification considerations were robustness in exposed weather conditions, ease of procurement and cost. The order varied depending on the product.
    The National Trust for Scotland was keen to ensure that durability in the exposed landscape setting was considered in the selection of each product, alongside the use of natural materials suitable for the landscape setting. This applied to both the building and the landscape specification.
    The architect and landscape architect worked closely to ensure the hard surfacing and architectural finishes were compatible. Ease of procurement, which in this case extends to ease of ongoing maintenance, was necessary due to the remote location of the project. 
This was particularly relevant in the specification of building services, such as the rainwater harvesting system, which relies on the local supply chain to undertake the necessary annual servicing.
    Image: The building features a flicked canopy roof protecting visitors from the Scottish weather and collecting rainwater to be filtered and reused. (Source: Gordon Burniston and Ewan Wetherspoon)
    Do you have a favourite product or material that was specified on the project?
    The oversailing roof, which acts as a generous umbrella in the typically wet environment, is overclad with powder-coated aluminium panels beautifully made by a local supplier. The roofline, often described as the brim of the hat, provides a defined line which caps the building and flicks up, signalling the start of the path and the beginning of the journey towards the gorge. The cladding panels have a double function as they define the roof edge and flash over the roof membrane.
    Are there any suppliers you collaborated with on the project that contributed significantly? And what was the most valuable service that they offered?
    The windows were specified from the Nordan product range and the local representative provided invaluable assistance during the specification and design development period.
    What did you think was the biggest success on the project?
    The biggest success of the project was collaboration between the client team, the contractor and the design team. 
There was a shared vision to produce a high-quality building, albeit small, in this remote and rural setting and ensure that every aspect was considered and delivered to the best outcome.
    Image: The pavilions are positioned to provide visitors with an auditory and visual experience of the nearby river. (Source: Gordon Burniston and Ewan Wetherspoon)
    Project details
    Architect: Oberlanders
    External windows and doors: Nordan
    Larch cladding and SiOO:X treatment: Russwood
    CLT roof cassettes: Binderholz
    Single ply roof covering: Alwitra ICB
    Rooflights: Lareine Engineering
    Zinc cladding and flashings: VM Zinc
    Fall restraint systems: SFS
    Commercial kitchen and servery: JSSKE
    Staff kitchen: Howdens
    Sanitary cubicles: Venesta
    Corian wash troughs: CDUK
    Airblade wash and dry taps: Dyson
    Flooring: Tarkett
    Resin-bound gravel: Abacus
    Green retaining wall: Gravitas
    Landscape seeding: Scotia Seeds
    Our "What made this project" series highlights the outstanding work of our Architect of the Year finalists.
  • Today's NYT Mini Crossword Answers for Friday, Feb. 28
    www.cnet.com
    Looking for the most recent Mini Crossword answer? Click here for today's Mini Crossword hints, as well as our daily answers and hints for The New York Times Wordle, Strands, Connections and Connections: Sports Edition puzzles.
    Today's NYT Mini Crossword throws a bunch of color clues at you. I blanked on nearly all of them, so I had to answer the other questions in order to solve the puzzle. Need some help with today's Mini Crossword? Read on. And if you could use some hints and guidance for daily solving, check out our Mini Crossword tips.
    The Mini Crossword is just one of many games in the Times' games collection. If you're looking for today's Wordle, Connections, Connections: Sports Edition and Strands answers, you can visit CNET's NYT puzzle hints page.
    Read more: Tips and Tricks for Solving The New York Times Mini Crossword
    Let's get at those Mini Crossword clues and answers.
    The completed NYT Mini Crossword puzzle for Feb. 28, 2025. NYT/Screenshot by CNET
    Mini across clues and answers
    1A clue: Italian tourist city you might be "inclined" to visit
    Answer: PISA
    5A clue: Pink/orange shade
    Answer: CORAL
    6A clue: Green/yellow shade
    Answer: OLIVE
    7A clue: Red/pink shade
    Answer: ROSE
    8A clue: Silver/gray shade
    Answer: ASH
    Mini down clues and answers
    1D clue: Many collared golf shirts
    Answer: POLO
    2D clue: From Dublin or Derry
    Answer: IRISH
    3D clue: Command+S, on a Mac
    Answer: SAVE
    4D clue: Pint at a public house
    Answer: ALE
    5D clue: ___ Crawley, "Downton Abbey" countess
    Answer: CORA
    How to play more Mini Crosswords
    The New York Times Games section offers a large number of online games, but only some of them are free for all to play. You can play the current day's Mini Crossword for free, but you'll need a subscription to the Times Games section to play older puzzles from the archives.
  • Eiyuden Chronicle: Hundred Heroes First Story Expansion DLC Out This Week
    www.nintendolife.com
    "A prequel to the epic saga of the Hundred Heroes".
    Eiyuden Chronicle: Hundred Heroes was one of the many RPGs to arrive on the Switch in 2024, and developer Rabbit & Bear Studios has now started its rollout of the story expansion DLC.
    As previously revealed, this all kicks off with "The Chapter of Marisa", which arrives on the eShop this week. It will set you back $7.99 / 5.99 (or your regional equivalent). Of course, you'll need the main game to access this new prequel story content.
    Read the full article on nintendolife.com
  • Nintendo Switch Online Announces Removal Of Super Famicom Title
    www.nintendolife.com
    Goodbye, Super Soccer.
    So normally we're used to reporting about the latest retro additions to the Switch Online library, but today is a bit different...
    Read the full article on nintendolife.com
  • SEC says meme coins are not securities
    techcrunch.com
    The Securities and Exchange Commission issued guidance on Thursday saying it does not view most meme coins, which are crypto tokens that originated from memes, as securities under United States federal law.
    As a result, the SEC says it does not believe people who purchase or hold meme coins are protected by federal securities law, and that people who participate in the offer and sale of meme coins do not need to register their transactions with the Commission.
    The SEC's new guidance comes roughly a month into U.S. President Donald Trump's second term, in which Trump issued an executive order to create the Department of Government Efficiency, an independent government advisory agency led by Elon Musk that's named after the meme coin Dogecoin. President Trump also launched a meme coin for his supporters, called $TRUMP, just days before his inauguration. Since its peak on January 19, the coin has lost $12 billion in value, The Telegraph reported on Thursday.
    Mark Uyeda, the SEC chairman appointed by Trump in January, previously signaled he would create clear regulatory lines around cryptocurrencies. On Uyeda's first day in office, he announced the formation of a cryptocurrency task force.
    Uyeda's SEC argues that meme coins are not securities because they do not generate a yield or convey rights to future income, profits, or assets of a business. Rather, the Commission says it views meme coins more like collectibles.
    Thursday's guidance on meme coins represents a stark contrast to how the SEC considered meme coins under its former chairman, Gary Gensler. Gensler repeatedly called for crypto tokens, including meme coins, to be treated as securities, and told crypto service providers to proactively register with the SEC.
    Also on Thursday, the SEC announced it dismissed its lawsuit against Coinbase, the United States' largest cryptocurrency exchange. 
"For the last several years, the Commission's views on crypto have been largely expressed through enforcement actions without engaging the general public," said Uyeda in a statement. "It's time for the Commission to rectify its approach and develop crypto policy in a more transparent manner."
  • Meta is reportedly planning a standalone AI chatbot app
    techcrunch.com
    In Brief
    Posted: 3:06 PM PST, February 27, 2025
    Image Credits: Jonathan Raa/NurPhoto / Getty Images
    Meta reportedly plans to release a standalone app for its AI assistant, Meta AI, in a bid to better compete with AI-powered chatbots like OpenAI's ChatGPT and Google's Gemini.
    According to CNBC, Meta could launch a standalone Meta AI app as soon as the company's next fiscal quarter (April-June). Meta AI is currently only available to users via a website and Meta's family of apps, including Facebook and WhatsApp. Meta also plans to test a paid subscription service for Meta AI that'll add unspecified capabilities to the assistant, per CNBC. The publication wasn't able to learn the price. Meta AI, which has over 700 million monthly active users, is a part of Meta's multi-pronged strategy to become a dominant force in the AI space. The company has also aggressively released open models like Llama, which it believes could foster an ecosystem rivaling that of OpenAI's. Meta plans to host its first-ever AI-focused developer conference, LlamaCon, in late April.
  • Showing differently between Niagara preview and Main viewport
    realtimevfx.com
    Why does my Niagara effect look different in the Niagara preview compared to the main viewport in Unreal Engine, and how can I make them match?
    Read full topic
  • The last page: out of space
    www.architectural-review.com
    The International Space Station (ISS) was built as a series of extensions; each module was launched to orbit individually and docked to a previous one. The most recent additions were made in 2021. The ISS also represents the quest to extend human habitation beyond Earth. In June 2024, two NASA astronauts arrived at the station for a week-long mission. As the AR's February 2025 issue went to print, they were still there, waiting to come home.
    Read stories from the Extensions issue
    2025-02-28, AR Editors