Everyone in AI is talking about Manus. We put it to the test.
www.technologyreview.com
Since general AI agent Manus was launched last week, it has spread online like wildfire. And not just in China, where it was developed by the Wuhan-based startup Butterfly Effect. It's made its way into the global conversation, with influential voices in tech, including Twitter co-founder Jack Dorsey and Hugging Face product lead Victor Mustar, praising its performance. Some have even dubbed it "the second DeepSeek," drawing comparisons to the earlier AI model that took the industry by surprise, both for its unexpected capabilities and for its origin.

Manus claims to be the world's first general AI agent, leveraging multiple AI models (such as Anthropic's Claude 3.5 Sonnet and fine-tuned versions of Alibaba's open-source Qwen) and various independently operating agents to act autonomously on a wide range of tasks. (This is different from AI chatbots, including DeepSeek, which are based on a single large language model family and are primarily designed for conversational interactions.)

Despite all the hype, very few people have had a chance to use it. Currently, under 1% of the users on the waitlist have received an invite code. (It's unclear how many people are on this waitlist, but for a sense of how much interest there is, Manus's Discord channel has more than 186,000 members.)

MIT Technology Review was able to obtain access to Manus, and when I gave it a test drive, I found that using it feels like collaborating with a highly intelligent and efficient intern: while it occasionally lacks understanding of what it's being asked to do, makes incorrect assumptions, or cuts corners to expedite tasks, it explains its reasoning clearly, is remarkably adaptable, and can improve substantially when given detailed instructions or feedback. Ultimately, it's promising but not perfect.

Just like its parent company's previous product, an AI assistant called Monica that was released in 2023, Manus is intended for a global audience. English is set as the default language, and its design is clean and minimalist.

To get in, a user has to enter a valid invite code. Then the system directs users to a landing page that closely resembles those of ChatGPT or DeepSeek, with historical sessions displayed in a left-hand column and a chat input box in the center. The landing page also features sample tasks curated by the company, ranging from business strategy development to interactive learning to customized audio meditation sessions.

Like other reasoning-based agentic AI tools, such as ChatGPT DeepResearch, Manus is capable of breaking tasks down into steps and autonomously navigating the web to get the information it needs to complete them. What sets it apart is the "Manus's Computer" window, which allows users not only to observe what the agent is doing but also to intervene at any point.

To put it to the test, I tasked Manus with three assignments: (1) compile a list of notable reporters covering China tech, (2) search for two-bedroom property listings in New York City, and (3) nominate potential candidates for Innovators Under 35, a list created by MIT Technology Review every year.

Here's how it did:

Task 1: The first list of reporters Manus gave me contained only five names, with five honorable mentions below them. I noticed that it listed some journalists' notable work but not others', and I asked Manus why. The reason it offered was hilariously simple: it got lazy. "It was partly due to time constraints as I tried to expedite the research process," the agent told me.
When I insisted on consistency and thoroughness, Manus responded with a comprehensive list of 30 journalists, noting each one's current outlet and notable work. (I was glad to see I made the cut, along with many of my beloved peers.) I was impressed that I was able to make top-level suggestions for changes, much as someone would with a real-life intern or assistant, and that it responded in kind. And while it initially overlooked some journalists' employer status changes, when I asked it to revisit some results, it quickly corrected them. Another nice feature: the output was downloadable as a Word or Excel file, making it easy to edit or share with others.

Manus hit a snag, though, when accessing journalists' news articles behind paywalls; it frequently ran into CAPTCHA blocks. Since I was able to follow along step by step, I could easily take over to complete these, though many media sites still blocked the tool, citing suspicious activity. I see potential for major improvements here, and it would be useful if a future version of Manus could proactively ask for help when it encounters these sorts of restrictions.

Task 2: For the apartment search, I gave Manus a complex set of criteria, including a budget and parameters for a spacious kitchen, outdoor space, access to downtown Manhattan, and a major train station within a seven-minute walk. Manus initially interpreted vague requirements like "some kind of outdoor access" too literally, completely excluding properties without a private terrace or balcony. After more guidance and clarification, however, it was able to compile a broader and more helpful list, giving recommendations in tiers and neat bullet points. The final output felt straight out of Wirecutter, with subtitles like "best overall," "best value," and "luxury option." This task (including the back-and-forth) took less than half an hour, a lot faster than compiling the list of journalists (which took a little over an hour), likely because property listings are more openly available and well structured online.

Task 3: This was the largest in scope: I asked Manus to nominate 50 people for this year's Innovators Under 35 list. Producing this list is an enormous undertaking, and we typically receive hundreds of nominations every year, so I was curious to see how well Manus could do. It broke the task into steps, including reviewing past lists to understand the selection criteria, creating a search strategy for identifying candidates, compiling names, and ensuring a diverse selection of candidates from all over the world.

Developing a search strategy was the most time-consuming part for Manus. While it didn't explicitly outline its approach, the "Manus's Computer" window revealed the agent rapidly scrolling through the websites of prestigious research universities, announcements of tech awards, and news articles. However, it again encountered obstacles when trying to access academic papers and paywalled media content.

After three hours of scouring the internet, during which Manus (understandably) asked me multiple times whether I could narrow the search, it was able to give me only three candidates with full background profiles. When I pressed it again for a complete list of 50 names, it eventually generated one, but certain academic institutions and fields were heavily overrepresented, reflecting an incomplete research process.
After I pointed out the issue and asked it to find five candidates from China, it managed to compile a solid five-name list, though the results skewed toward Chinese media darlings. Ultimately, I had to give up after the system warned that Manus's performance might decline if I kept inputting too much text.

My assessment: Overall, Manus proved a capable and adaptable assistant, promising but not perfect. Still, it's not all smooth sailing. Manus can suffer from frequent crashes and system instability, and it can struggle when asked to process large chunks of text. The message "Due to the current high service load, tasks cannot be created. Please try again in a few minutes" flashed on my screen a few times when I was starting new requests, and occasionally Manus's Computer froze on a certain page for a long period. It has a higher failure rate than ChatGPT DeepResearch, a problem the team is addressing, according to Manus's chief scientist Peak Ji. That said, the Chinese media outlet 36Kr reports that Manus's per-task cost is about $2, just one-tenth of DeepResearch's. If the Manus team strengthens its server infrastructure, I can see the tool becoming a preferred choice for individual users, particularly white-collar professionals, independent developers, and small teams.

Finally, I think it's really valuable that Manus's working process feels more transparent and collaborative. It actively asks questions along the way and retains key instructions as knowledge in its memory for future use, allowing for an easily customizable agentic experience. It's also really nice that each session is replayable and shareable.

I expect I'll keep using Manus for all sorts of tasks, in both my personal and professional life. While I'm not sure the comparisons to DeepSeek are quite right, it serves as further evidence that Chinese AI companies are not just following in the footsteps of their Western counterparts. They are not just innovating on base models but actively shaping the adoption of autonomous AI agents in their own way.