• Google’s test turns search results into an AI-generated podcast

    The option to generate an Audio Overview appears beneath the “People also ask” module.

    Google is rolling out a test that puts its AI-powered Audio Overviews on the first page of search results on mobile. The experiment, which you can enable in Labs, will let you generate an AI podcast-style discussion for certain queries.

    If you search for something like, “How do noise cancellation headphones work?”, Google will display a button beneath the “People also ask” module that says, “Generate Audio Overview.” Once you click the button, it will take up to 40 seconds to generate an Audio Overview, according to Google.

    The completed Audio Overview will appear in a small player embedded within your search results, where you can play, pause, mute, and adjust the playback speed of the clip. Similar to Audio Overviews on NotebookLM and Gemini, this one also features two AI-generated “hosts” who enthusiastically discuss the topic you want to learn more about. You’ll also find links to some of the sources used by Audio Overview directly below the playback bar in Search.

    Right now, Audio Overviews in Search is only available in English in the US. Google has started putting Audio Overviews in more places since the tool launched last year, allowing users to generate audio discussions based on notes, Gemini’s deep research, files in Google Docs, and more.
  • 9 menial tasks ChatGPT can handle in seconds, saving you hours

    ChatGPT is rapidly changing the world. The process is already happening, and it’s only going to accelerate as the technology improves, as more people gain access to it, and as more learn how to use it.
    What’s shocking is just how many tasks ChatGPT is already capable of managing for you. While the naysayers may still look down their noses at the potential of AI assistants, I’ve been using it to handle all kinds of menial tasks for me. Here are my favorite examples.

    Further reading: This tiny ChatGPT feature helps me tackle my days more productively

    Write your emails for you
    Dave Parrack / Foundry
    We’ve all been faced with the tricky task of writing an email—whether personal or professional—but not knowing quite how to word it. ChatGPT can do the heavy lifting for you, penning the (hopefully) perfect email based on whatever information you feed it.
    Let’s assume the email you need to write is of a professional nature, and wording it poorly could negatively affect your career. By directing ChatGPT to write the email with a particular structure, content, and tone of voice, you can give yourself a huge head start.
    A winning tip for this is to never accept ChatGPT’s first attempt. Always read through it and look for areas of improvement, then request tweaks to ensure you get the best possible email. You can (and should) also rewrite the email in your own voice. Learn more about how ChatGPT coached my colleague to write better emails.

    Generate itineraries and schedules
    Dave Parrack / Foundry
    If you’re going on a trip but you’re the type of person who hates planning trips, then you should utilize ChatGPT’s ability to generate trip itineraries. The results can be customized to the nth degree depending on how much detail and instruction you’re willing to provide.
    As someone who likes to get away at least once a year but also wants to make the most of every trip, leaning on ChatGPT for an itinerary is essential for me. I’ll provide the location and the kinds of things I want to see and do, then let it handle the rest. Instead of spending days researching everything myself, ChatGPT does 80 percent of it for me.
    As with all of these tasks, you don’t need to accept ChatGPT’s first effort. Use different prompts to force the AI chatbot to shape the itinerary closer to what you want. You’d be surprised at how many cool ideas you’ll encounter this way—simply nix the ones you don’t like.

    Break down difficult concepts
    Dave Parrack / Foundry
    One of the best tasks to assign to ChatGPT is the explanation of difficult concepts. Ask ChatGPT to explain any concept you can think of and it will deliver more often than not. You can tailor the level of explanation you need, and even have it include visual elements.
    Let’s say, for example, that a higher-up at work regularly lectures everyone about the importance of networking. But maybe they never go into detail about what they mean, just constantly pushing the why without explaining the what. Well, just ask ChatGPT to explain networking!
    Okay, most of us know what “networking” is and the concept isn’t very hard to grasp. But you can do this with anything. Ask ChatGPT to explain augmented reality, multi-threaded processing, blockchain, large language models, what have you. It will provide you with a clear and simple breakdown, maybe even with analogies and images.

    Analyze and make tough decisions
    Dave Parrack / Foundry
    We all face tough decisions every so often. The next time you find yourself wrestling with a particularly tough one—and you just can’t decide one way or the other—try asking ChatGPT for guidance and advice.
    It may sound strange to trust any kind of decision to artificial intelligence, let alone an important one that has you stumped, but doing so actually makes a lot of sense. While human judgment can be clouded by emotions, AI can set that aside and prioritize logic.
    It should go without saying: you don’t have to accept ChatGPT’s answers. Use the AI to weigh the pros and cons, to help you understand what’s most important to you, and to suggest a direction. Who knows? If you find yourself not liking the answer given, that in itself might clarify what you actually want—and the right answer for you. This is the kind of stuff ChatGPT can do to improve your life.

    Plan complex projects and strategies
    Dave Parrack / Foundry
    Most jobs come with some level of project planning and management. Even I, as a freelance writer, need to plan tasks to get projects completed on time. And that’s where ChatGPT can prove invaluable, breaking projects up into smaller, more manageable parts.
    ChatGPT needs to know the nature of the project, the end goal, any constraints you may have, and what you have done so far. With that information, it can then break the project up with a step-by-step plan, and break it down further into phases (if required).
    If ChatGPT doesn’t initially split your project up in a way that suits you, try again. Change up the prompts and make the AI chatbot tune in to exactly what you’re looking for. It takes a bit of back and forth, but it can shorten your planning time from hours to mere minutes.

    Compile research notes
    Dave Parrack / Foundry
    If you need to research a given topic of interest, ChatGPT can save you the hassle of compiling that research. For example, ahead of a trip to Croatia, I wanted to know more about the Croatian War of Independence, so I asked ChatGPT to provide me with a brief summary of the conflict with bullet points to help me understand how it happened.
    After absorbing all that information, I asked ChatGPT to add a timeline of the major events, further helping me to understand how the conflict played out. ChatGPT then offered to provide me with battle maps and/or summaries, plus profiles of the main players.
    You can go even deeper with ChatGPT’s Deep Research feature, which is now available to free users, up to 5 Deep Research tasks per month. With Deep Research, ChatGPT conducts multi-step research to generate comprehensive reports (with citations!) based on large amounts of information across the internet. A Deep Research task can take up to 30 minutes to complete, but it’ll save you hours or even days.

    Summarize articles, meetings, and more
    Dave Parrack / Foundry
    There are only so many hours in the day, yet so many new articles published on the web day in and day out. When you come across extra-long reads, it can be helpful to run them through ChatGPT for a quick summary. Then, if the summary is lacking in any way, you can go back and plow through the article proper.
    As an example, I ran one of my own PCWorld articles (where I compared Bluesky and Threads as alternatives to X) through ChatGPT, which provided a brief summary of my points and broke down the best X alternative based on my reasons given. Interestingly, it also pulled elements from other articles. (Hmph.) If you don’t want that, you can tell ChatGPT to limit its summary to the contents of the link.
    This is a great trick to use for other long-form, text-heavy content that you just don’t have the time to crunch through. Think transcripts for interviews, lectures, videos, and Zoom meetings. The only caveat is to never share private details with ChatGPT, like company-specific data that’s protected by NDAs and the like.

    Create Q&A flashcards for learning
    Dave Parrack / Foundry
    Flashcards can be extremely useful for drilling a lot of information into your brain, such as when studying for an exam, onboarding in a new role, prepping for an interview, etc. And with ChatGPT, you no longer have to painstakingly create those flashcards yourself. All you have to do is tell the AI the details of what you’re studying.
    You can specify the format (such as Q&A or multiple choice), as well as various other elements. You can also choose to keep things broad or target specific sub-topics or concepts you want to focus on. You can even upload your own notes for ChatGPT to reference. You can also use Google’s NotebookLM app in a similar way.

    Provide interview practice
    Dave Parrack / Foundry
    Whether you’re a first-time jobseeker or have plenty of experience under your belt, it’s always a good idea to practice for your interviews when making career moves. Years ago, you might’ve had to ask a friend or family member to act as your mock interviewer. These days, ChatGPT can do it for you—and do it more effectively.
    Inform ChatGPT of the job title, industry, and level of position you’re interviewing for, what kind of interview it’ll be (e.g., screener, technical assessment, group/panel, one-on-one with CEO), and anything else you want it to take into consideration. ChatGPT will then conduct a mock interview with you, providing feedback along the way.
    When I tried this out myself, I was shocked by how capable ChatGPT can be at pretending to be a human in this context. And the feedback it provides for each answer you give is invaluable for knocking off your rough edges and improving your chances of success when you’re interviewed by a real hiring manager.
    Further reading: Non-gimmicky AI apps I actually use every day
  • Summer blockbuster season is here

    Hi, friends! Welcome to Installer No. 84, your guide to the best and Verge-iest stuff in the world.This week, I’ve been reading about Mubi and Around The Horn and millennial tech, moving all my journals to Diarly, trying out Matt D’Avella’s workout routine, catching up on Clarkson’s Farm, wishing desperately that Philly Justice was a real show, watching a lot of Helper Cars with my toddler, testing the Sony WH-1000XM6 headphones, dusting off my Fortnite skills, and enjoying this unbelievably deep dive into the first Star Wars movie.I also have for you a new blockbuster movie, an old-new blockbuster mobile game, a new season of one of my all-time favorite shows, a cheap set-top box worth a look, and much more. Shockingly busy week! Let’s dig in.The Dropkind of can’t believe it! I fell off the Fortnite wagon pretty hard over the last year or so, but this and my Backbone Pro are going to be very good friends going forward. Zero Build only for me, though, at least on mobile.Mission: Impossible – The Final Reckoning. I am a forever fan of the M:I series, and as silly as I find the whole “AI is the bad guy” bit, I have had a good time watching every single movie in this series. I’ll be in a humungous theater for this one ASAP.Puzzmo for iOS. Puzzmo’s web app is great, so I haven’t exactly been thirsting for a better mobile experience. And, as far as I can tell, the mobile app is just exactly the same thing as the web app. But, hey, I like the icon, and I like any reason to play more Really Bad Chess.The Onn Google TV 4K Plus. “A weirdly named, super-cheap set-top box from Walmart” is not a great pitch. But for you’re not beating this thing’s combination of Google TV, Dolby Vision, and 4K. Onn stuff has been pretty good in the past, so I suspect this one will be pretty compelling.NotebookLM for mobile. The Android and iOS versions are both fine and both useful for the same reason: you can send stuff to your notebooks via the share sheet. 
If you’re a fan of the podcast-y Audio Overviews, they’re also a great thing to have on the go.

We haven’t had a new season of my favorite unhinged animated sci-fi show in a year and a half, and I am so very excited to get back to some intergalactic and cross-universe shenanigans. I’ve been debating doing a full rewatch of the whole show and might just have to do it after this season.

The Virtual Stream Deck. This is so clever: Elgato is turning its collection of smart buttons from a lineup of gadgets to a full-on platform that you can either build into other hardware or just run on a screen. I can’t recommend it enough — spend some time programming all your repetitive computer tasks into a Stream Deck system.

Monster Train 2. I love the structure of this game: a deck-building game that is endlessly repeatable but also complex enough that you never quite play the same game twice. I somehow missed the first game in the series entirely, and I’m going to have to give that a whirl, too.

Strava routes. Strava’s an Installerverse favorite, and it got a bunch of new features this week. But, for my money, the biggest upgrade is the routing system, which generates the best route between two points; I love a good “map me the run to this donut shop” feature.

In all the time I’ve been covering and paying attention to tech, there have been very few companies as bizarre and intriguing as OpenAI. The company is doing impressive, culture-shaking work, but it also seems to have an endless supply of weird internal drama and a total inability to figure out, like, what in the world it’s doing.

Karen Hao has been covering the company longer than almost anybody, and she has firsthand knowledge of a lot of OpenAI’s twists and turns. This week, she published a terrific book, called Empire of AI: Dreams and Nightmares in Sam Altman’s OpenAI, which is about the company’s history and its future. But the book is more than that, too.
It’s a really good look at what AI is doing to us as people, to our societies and our planet, and to the brains of the people building what they hope will make them rich or gods — or both.

I’ve been a fan of Karen’s work for a long time, so I asked her to share her homescreen with us. I figured she’d either have, like, 30 AI apps or none at all, and I wanted to know. Here’s her homescreen, plus some info on the apps she uses and why:

The phone: iPhone XR.

The wallpaper: It’s usually a photo of me and my husband laughing hysterically at an inside joke at our wedding. But you’ll just have to imagine it because we’re really big on privacy. Enjoy this orange gradient instead. Orange is the color of creativity, of fire, of the sunrise and sunset, of beginnings and transition.

The apps: Messages, Google Calendar, Photos, Camera, Clock, Apple Notes, Contacts, Settings, FaceTime, Calculator, Weather, Reminders, App Store, Gmail, Proton Mail, Phone, Brave.

I have a very boring homescreen! I try not to use too many apps. When I set up a phone, the first thing I do is delete as many of the default apps as possible. But probably the two notable apps to call out: a couple years ago, I switched completely to the Brave browser, which is the lion icon at the bottom right of the screen. It’s based on Chrome, so you can keep all your plug-ins, but it blocks sites from tracking you to serve you targeted ads. It’s a simple way to not give up so much of your data and preserve your privacy. Highly recommended.

The second: under my Audio folder, I have a guitar-tuning app, GuitarTuna, for the rare moments I fiddle with my guitar at home. Music was a big part of my childhood, but I haven’t made nearly enough time for it as an adult. I keep the app on my homescreen as an aspiration to pick it back up more seriously.

I also asked Karen to share a few things that she’s into right now.
Here’s what she sent back:

The Empire podcast, cohosted by historian William Dalrymple and Anita Anand.

Late-night comedy YouTube.

Crowdsourced

Here’s what the Installer community is into this week. I want to know what you’re into right now, as well! Email installer@theverge.com or message me on Signal — @davidpierce.11 — with your recommendations for anything and everything, and we’ll feature some of our favorites here every week. For even more great recommendations, check out the replies to this post on Threads and this post on Bluesky.

“YouTube has recently radicalized me to digital minimalism and decentralized tech. What started as deleting ALL social media from my iPhone and relegating the apps to my iPad is now firmly in the realm of buying old iPods from eBay and repairing them with modern parts. I have some replacement parts on the way from Elite Obsolete Electronics and with what I know now I should soon have a functional 6th gen iPod Classic that I can install RockBox on. I also picked up the ToAuto DS90 Soldering Station with the hopes of installing the USB-C mod in the near future.” — Nicholas

“I know it was in last week’s Installer but I got the Sony WH-1000XM6s and they’re incredible. The ‘background listening’ feature is such a clever spin on spatial audio, it really does sound like it’s coming from a distance!” — Jamie

“What if you could add any plain old QR Code/barcode card to your Apple Wallet? Lucky for you, the greatest minds of our time have come together to solve this inconvenience. Try IntoWallet and get as blown away as I was when it just worked (also the level of customization and the price are great!).” — Teo

“I’ve REALLY enjoyed the Revelation Space series by Alastair Reynolds. For lovers of hard sci-fi space operas this is for you. Engaging, dark, wild ideas and concepts, plenty of real and imagined science and physics all weaved into interesting stories.” — Tyler

“I’ve personally managed to seriously build my meditation practice in the last two years using both Happier and Calm.
I especially enjoy the meditations by teacher Jeff Warren, who strikes the right balance with his light and playful tone.” — Jeroen

“I’ve had the Casper Glow lamp since 2019 and it’s still going strong! Love the interaction, twisting it and flipping it to control the light, and I even helped sell two (unsponsored) to an old roommate when he moved to his own place.” — SingYu

“Post Andor I’ve been reading through Star Wars: The Rise and Fall of the Galactic Empire.” — Allen

“Setup is (90%) finished! Rocking a Teenage Engineering case, HP G4 Dock, UGREEN USB Switcher, and a $60 standing desk from Facebook Marketplace.” — Jeremy

Signing off

The big Installer-y news of the week is that Mozilla is shutting down Pocket. Which, well, sucks. Pocket was a good and popular app that did good and useful things! I heard from a bunch of you who are now looking for a place to go post-Pocket. I only really have three recommendations:

Instapaper: the OG of the read-later world and still the simplest and most straightforward app you’ll find for the purpose. Brian, the developer, is good people, and I have high hopes for the longevity of the app.

Matter: it’s only for iOS and web, but it’s the best-looking app in this space, and it’s not even close. They’re doing some nifty stuff with AI-enhanced reading, too.

Readwise Reader: the power-user tool of choice, and my favorite of the bunch. It just has so many organizational features, great highlighting, and tons of integrations. It does everything I need. It’s also way too much for most people.

I suppose I should give Wallabag an honorable mention, because you can host it yourself, but it’s a much more involved project. If I were just moving over from Pocket and just wanted a nice place to read without a long list of other feature requests, I’d start with Instapaper. But all three are solid options, and they all make it pretty painless to import your old articles. Or just delete them all, start over, and feel the rare freedom of an almost-empty reading list.
It’s pretty nice.

See you next week!
  • This AI Paper Introduces Group Think: A Token-Level Multi-Agent Reasoning Paradigm for Faster and Collaborative LLM Inference

A prominent area of exploration involves enabling large language models (LLMs) to function collaboratively. Multi-agent systems powered by LLMs are now being examined for their potential to tackle challenging problems by splitting tasks and working on them simultaneously. This direction has gained attention for its potential to increase efficiency and reduce latency in real-time applications.
A common issue in collaborative LLM systems is agents’ sequential, turn-based communication: each agent must wait for the others to complete their reasoning steps before proceeding. This slows down processing, especially in situations demanding rapid responses. Moreover, agents often duplicate effort or generate inconsistent outputs because they cannot see the evolving thoughts of their peers during generation. This latency and redundancy limit the practicality of deploying multi-agent LLMs, particularly when time and computation are constrained, such as on edge devices.
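To make the latency cost concrete, here is a toy back-of-the-envelope model (an illustration of the argument above, not a figure from the paper): with turn-based communication, wall-clock decoding steps grow linearly with the number of agents, while concurrent batched decoding keeps them flat.

```python
# Toy latency model for multi-agent LLM inference (illustrative assumption:
# one token per forward pass, identical budget per agent, no prefill cost).

def turn_based_steps(num_agents: int, tokens_per_agent: int) -> int:
    """Turn-based: each agent waits for the previous one's full reasoning turn,
    so total forward passes scale with the number of agents."""
    return num_agents * tokens_per_agent

def concurrent_steps(num_agents: int, tokens_per_agent: int) -> int:
    """Concurrent: all agents decode one token each in the same batched
    forward pass, so total passes are independent of the agent count."""
    return tokens_per_agent

if __name__ == "__main__":
    agents, tokens = 4, 200
    print(turn_based_steps(agents, tokens))   # 800 forward passes
    print(concurrent_steps(agents, tokens))   # 200 forward passes
```

Under this simplified model, four turn-based agents pay roughly a 4x wall-clock penalty over a concurrent scheme, which is the gap token-level collaboration aims to close.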

Most current solutions rely on sequential or independently parallel sampling techniques to improve reasoning. Methods like Chain-of-Thought prompting help models solve problems in a structured way but often increase inference time. Approaches such as Tree-of-Thoughts and Graph-of-Thoughts expand on this by branching reasoning paths, but they still do not allow real-time mutual adaptation among agents. Multi-agent setups have explored collaborative methods, but mostly through alternating message exchanges, which again introduces delays. Some advanced systems propose complex dynamic scheduling or role-based configurations, but these are not optimized for efficient inference.
Researchers at MediaTek Research introduced a new method called Group Think. This approach enables multiple reasoning agents within a single LLM to operate concurrently, observing each other’s partial outputs at the token level. Each reasoning thread adapts to the evolving thoughts of the others mid-generation. This mechanism reduces duplication and enables agents to shift direction if another thread is better positioned to continue a specific line of reasoning. Group Think is implemented through a token-level attention mechanism that lets each agent attend to previously generated tokens from all agents, supporting real-time collaboration.
    The method works by assigning each agent its own sequence of token indices, allowing their outputs to be interleaved in memory. These interleaved tokens are stored in a shared cache accessible to all agents during generation. This design allows efficient attention across reasoning threads without architectural changes to the transformer model. The implementation works both on personal devices and in data centers. On local devices, it effectively uses idle compute by batching multiple agent outputs, even with a batch size of one. In data centers, Group Think allows multiple requests to be processed together, interleaving tokens across agents while maintaining correct attention dynamics.
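As a sketch of how such interleaved attention could look (my reading of the description above, not the authors' implementation), the following builds a boolean attention mask over tokens from N agents interleaved in memory: position p holds the token from generation step p // N of agent p % N, and a query token may attend to every token produced at a strictly earlier step, from any agent, plus itself. Tokens emitted in parallel at the same step cannot see each other.

```python
# Hypothetical sketch of a Group Think-style attention mask over interleaved
# agent tokens. mask[q][k] is True when query token q may attend to key token k.

def group_think_mask(num_agents: int, num_steps: int) -> list[list[bool]]:
    total = num_agents * num_steps
    mask = [[False] * total for _ in range(total)]
    for q in range(total):
        q_step = q // num_agents          # generation step of the query token
        for k in range(total):
            k_step = k // num_agents      # generation step of the key token
            # Visible if generated at a strictly earlier step (any agent),
            # or if it is the query token itself.
            if k_step < q_step or k == q:
                mask[q][k] = True
    return mask

mask = group_think_mask(num_agents=2, num_steps=3)
# Agent 1's token at step 2 (position 5) sees all four step-0/1 tokens
# from both agents, plus itself, but not agent 0's same-step token:
print(mask[5])  # [True, True, True, True, False, True]
```

Because the mask only reindexes which cached tokens are visible, a design like this would need no change to the transformer architecture itself, matching the shared-cache description above.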

    Performance tests demonstrate that Group Think significantly improves latency and output quality. In enumeration tasks, such as listing 100 distinct names, it achieved near-complete results more rapidly than conventional Chain-of-Thought approaches. The acceleration was proportional to the number of thinkers; for example, four thinkers reduced latency by a factor of about four. In divide-and-conquer problems, using the Floyd–Warshall algorithm on a graph of five nodes, four thinkers reduced the completion time to half that of a single agent. Group Think solved code generation challenges in programming tasks more effectively than baseline models. With four or more thinkers, the model produced correct code segments much faster than traditional reasoning models.
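For reference, the divide-and-conquer benchmark mentioned above is built around the Floyd–Warshall all-pairs shortest-path algorithm. Here is the standard textbook, single-threaded version on a five-node graph (the edge weights are my own example; the paper's multi-agent decomposition of the work is not shown):

```python
# Minimal Floyd–Warshall: all-pairs shortest paths on a weighted graph,
# given as an n x n distance matrix (INF where no direct edge, 0 on diagonal).
INF = float("inf")

def floyd_warshall(dist):
    n = len(dist)
    d = [row[:] for row in dist]  # copy so the input matrix is untouched
    for k in range(n):            # allow node k as an intermediate hop
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d

graph = [
    [0,   3,   INF, 7,   INF],
    [3,   0,   2,   INF, INF],
    [INF, 2,   0,   1,   5],
    [7,   INF, 1,   0,   2],
    [INF, INF, 5,   2,   0],
]
print(floyd_warshall(graph)[0][4])  # 8, via the path 0 -> 1 -> 2 -> 3 -> 4
```

The triple loop over intermediate nodes is what makes the problem easy to split among thinkers: disjoint slices of the (i, j) updates can be reasoned about in parallel.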
    This research shows that existing LLMs, though not explicitly trained for collaboration, can already demonstrate emergent group reasoning behaviors under the Group Think setup. In experiments, agents naturally diversified their work to avoid redundancy, often dividing tasks by topic or focus area. These findings suggest that Group Think’s efficiency and sophistication could be enhanced further with dedicated training on collaborative data.

    Check out the Paper. All credit for this research goes to the researchers of this project.
    This AI Paper Introduces Group Think: A Token-Level Multi-Agent Reasoning Paradigm for Faster and Collaborative LLM Inference

    A prominent area of exploration involves enabling large language models (LLMs) to function collaboratively. Multi-agent systems powered by LLMs are now being examined for their potential to coordinate on challenging problems by splitting tasks and working simultaneously. This direction has gained attention due to its potential to increase efficiency and reduce latency in real-time applications.

    A common issue in collaborative LLM systems is agents’ sequential, turn-based communication. In such systems, each agent must wait for others to complete their reasoning steps before proceeding. This slows down processing, especially in situations demanding rapid responses. Moreover, agents often duplicate effort or generate inconsistent outputs, as they cannot see the evolving thoughts of their peers during generation. This latency and redundancy reduce the practicality of deploying multi-agent LLMs, particularly when time and computation are constrained, such as on edge devices.

    Most current solutions rely on sequential or independently parallel sampling techniques to improve reasoning. Methods like Chain-of-Thought prompting help models solve problems in a structured way but often increase inference time. Approaches such as Tree-of-Thoughts and Graph-of-Thoughts expand on this by branching reasoning paths, yet they still do not allow real-time mutual adaptation among agents. Multi-agent setups have explored collaborative methods, but mostly through alternating message exchanges, which again introduces delays. Some advanced systems propose complex dynamic scheduling or role-based configurations that are not optimized for efficient inference.
    WWW.MARKTECHPOST.COM
  • Researchers from the National University of Singapore Introduce ‘Thinkless,’ an Adaptive Framework that Reduces Unnecessary Reasoning by up to 90% Using DeGRPO

    The effectiveness of language models relies on their ability to simulate human-like step-by-step deduction. However, these reasoning sequences are resource-intensive and can be wasteful for simple questions that do not require elaborate computation. This lack of awareness of task complexity is one of the core challenges for these models: they often default to detailed reasoning even for queries that could be answered directly, inflating token usage, response latency, and memory consumption. As a result, there is a pressing need to equip language models with a mechanism that lets them decide autonomously whether to think deeply or respond succinctly.
    Current tools attempting to solve this issue either rely on manually set heuristics or prompt engineering to switch between short and long responses. Some methods use separate models and route questions based on complexity estimates. Still, these external routing systems often lack insight into the target model’s strengths and fail to make optimal decisions. Other techniques fine-tune models with prompt-based cues like “reasoning on/off,” but these rely on static rules rather than dynamic understanding. Despite some improvements, these approaches fail to enable fully autonomous and context-sensitive control within a single model.

    Researchers from the National University of Singapore introduced a new framework called Thinkless, which equips a language model with the ability to dynamically decide between using short or long-form reasoning. The framework is built on reinforcement learning and introduces two special control tokens—<short> for concise answers and <think> for detailed responses. By incorporating a novel algorithm called Decoupled Group Relative Policy Optimization (DeGRPO), Thinkless separates the training focus between selecting the reasoning mode and improving the accuracy of the generated response. This design prevents the model from falling into one-dimensional behavior and enables adaptive reasoning tailored to each query.
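    As an illustration of the control-token idea, mode selection at inference time reduces to comparing the model's first-step probabilities for the two control tokens. The token ids and decoding details below are hypothetical, not taken from the paper:

```python
import math

# Hypothetical vocabulary ids for the two control tokens;
# real ids would come from the model's tokenizer.
SHORT_ID, THINK_ID = 32000, 32001

def choose_mode(logits: dict) -> str:
    """Softmax restricted to the two control-token logits at the first
    decoded position, then pick the more probable reasoning mode."""
    zs = math.exp(logits[SHORT_ID])
    zt = math.exp(logits[THINK_ID])
    p_short = zs / (zs + zt)
    return "<short>" if p_short >= 0.5 else "<think>"

# Toy logits favouring the concise mode.
mode = choose_mode({SHORT_ID: 2.0, THINK_ID: 0.5})
```

    The rest of the answer is then decoded conditioned on whichever control token was emitted.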
    The methodology involves two stages: warm-up distillation and reinforcement learning. In the distillation phase, Thinkless is trained using outputs from two expert models—one specializing in short responses and the other in detailed reasoning. This stage helps the model establish a firm link between the control token and the desired reasoning format. The reinforcement learning stage then fine-tunes the model’s ability to decide which reasoning mode to use. DeGRPO decomposes the learning into two separate objectives: one for training the control token and another for refining the response tokens. This approach avoids the gradient imbalances in earlier models, where longer responses would overpower the learning signal, leading to a collapse in reasoning diversity. Thinkless ensures that both <short> and <think> tokens receive balanced updates, promoting stable learning across response types.
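    The decoupling can be sketched as a toy surrogate loss. Everything below is an illustrative simplification, not the paper's exact objective: the control token contributes its own term, while response tokens are length-normalized so that long answers cannot drown out the mode-selection gradient.

```python
def degrpo_style_loss(ctrl_logprob, resp_logprobs, advantage, alpha=1.0):
    """Decoupled policy-gradient surrogate (sketch). The control token
    gets a dedicated term weighted by `alpha`; response tokens are
    averaged over their length so a long answer contributes the same
    scale of gradient as a short one. The paper's exact normalization
    and weighting may differ."""
    ctrl_term = -advantage * ctrl_logprob
    resp_term = -advantage * sum(resp_logprobs) / max(len(resp_logprobs), 1)
    return alpha * ctrl_term + resp_term

loss = degrpo_style_loss(ctrl_logprob=-0.5, resp_logprobs=[-1.0, -1.0], advantage=1.0)
```

    In an undecoupled version, summing (rather than averaging) the response log-probabilities would let a 1,000-token answer contribute a signal hundreds of times larger than the single control token, which is the imbalance the decomposition is meant to avoid.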

    When evaluated, Thinkless significantly reduced long-form reasoning while preserving high accuracy. On the Minerva Algebra benchmark, the model used the <think> token in only 25.88% of cases while achieving 94.59% accuracy. In contrast, conventional reasoning models had to use extended chains of thought much more frequently. On the AIME 2024 dataset, Thinkless reached a 27.33% accuracy rate with 100% usage of the reasoning mode, showing that it could maintain performance when full reasoning was necessary. On the GSM8K dataset, it utilized <think> only 13.31% of the time, yet still achieved 84.18% accuracy. These results reflect the model’s ability to handle simple and complex queries with appropriate reasoning depth, cutting down on unnecessary token generation by as much as 90% in some tasks.
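    A back-of-envelope calculation shows why a low <think> rate translates into large token savings. The per-mode token counts below are assumed for illustration; the article does not report them:

```python
def expected_tokens(think_rate, think_tokens=1000, short_tokens=100):
    """Expected decoded tokens per query when a fraction `think_rate`
    of queries triggers long-form reasoning. The per-mode lengths are
    assumed, not measured values from the paper."""
    return think_rate * think_tokens + (1 - think_rate) * short_tokens

gsm8k = expected_tokens(0.1331)   # <think> used 13.31% of the time on GSM8K
always = expected_tokens(1.0)     # always-on reasoning baseline
saving = 1 - gsm8k / always
```

    Under these toy lengths the saving is roughly 78%, the same order of magnitude as the up-to-90% reduction reported for some tasks.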
    Overall, this study from the National University of Singapore researchers presents a compelling solution to the inefficiencies of uniform reasoning in large language models. By introducing a mechanism that enables models to judge task complexity and adjust their inference strategy accordingly, Thinkless optimizes both accuracy and efficiency. The method balances depth of reasoning and response precision without relying on fixed rules, offering a data-driven approach to more intelligent language model behavior.

    Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project.
  • This AI Paper Introduces MathCoder-VL and FigCodifier: Advancing Multimodal Mathematical Reasoning with Vision-to-Code Alignment

    Multimodal mathematical reasoning enables machines to solve problems involving textual information and visual components like diagrams and figures. This requires combining language understanding and visual interpretation to make sense of complex mathematical contexts. Such capabilities are vital in education, automated tutoring, and document analysis, where problems are often presented with a blend of text and images.
    A major obstacle in this area is the lack of high-quality, precise alignment between math images and their textual or symbolic representations. Most datasets used to train large multimodal models are derived from image captions in natural settings, which often miss the detailed elements essential for mathematical accuracy. This creates problems for models that rely on these data sources, making them unreliable when dealing with geometry, figures, or technical diagrams. A model’s performance in mathematical reasoning depends heavily on its ability to correctly interpret and link these visual details with mathematical expressions or instructions.

    In the past, some approaches tried to address this by either enhancing the visual encoders or using manually crafted datasets. However, these methods tend to produce low image diversity, relying on hand-coded or template-based generation, which limits their applicability. Some efforts, like Math-LLaVA and MAVIS, developed synthetic datasets and used templates or predefined categories. Still, they could not dynamically create a wide variety of math-related visuals. This shortfall restricts the learning scope of models and leaves them struggling with more complex or less structured mathematical problems.
    Researchers from the Multimedia Laboratory at The Chinese University of Hong Kong and CPII under InnoHK introduced a novel approach called MathCoder-VL. The method combines a vision-to-code model named FigCodifier with a synthetic data engine. Using a model-in-the-loop strategy, they iteratively constructed ImgCode-8.6M, the largest image-code dataset to date. They also developed MM-MathInstruct-3M, a multimodal instruction dataset enriched with newly synthesized images. The MathCoder-VL model is trained in two stages: mid-training on ImgCode-8.6M to improve visual-text alignment, followed by fine-tuning on MM-MathInstruct-3M to strengthen reasoning abilities.

    The FigCodifier model works by translating mathematical figures into code that can recreate those figures exactly. This code-image pairing ensures strict alignment and accuracy, unlike caption-based datasets. The process begins with 119K image-code pairs from DaTikZ and expands through iterative training using images collected from textbooks, K12 datasets, and arXiv papers. The final dataset includes 8.6 million code-image pairs and covers various mathematical topics. FigCodifier also supports Python-based rendering, which adds variety to image generation. The system filters low-quality data by checking code validity and removing redundant or unhelpful visuals, resulting in 4.3M high-quality TikZ and 4.3M Python-based pairs.
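The filtering step described above can be sketched in Python. This is a minimal, hypothetical illustration, not the paper's actual pipeline: the pair format and the `is_valid_code` check are assumptions, and a real implementation would also execute the code and verify the rendered figure rather than only checking that it parses.

```python
import hashlib

def is_valid_code(src: str) -> bool:
    """Cheap validity check: the rendering code must at least parse.
    (A real pipeline would also run it and inspect the output image.)"""
    try:
        compile(src, "<figure-code>", "exec")
        return True
    except SyntaxError:
        return False

def filter_pairs(pairs):
    """Keep syntactically valid, de-duplicated (code, image) pairs."""
    seen, kept = set(), []
    for code, image_path in pairs:
        digest = hashlib.sha256(code.encode()).hexdigest()
        if digest in seen or not is_valid_code(code):
            continue  # drop redundant or broken rendering code
        seen.add(digest)
        kept.append((code, image_path))
    return kept

pairs = [
    ("import math\nprint(math.pi)", "fig_001.png"),
    ("import math\nprint(math.pi)", "fig_002.png"),  # duplicate code, dropped
    ("def broken(:", "fig_003.png"),                 # invalid syntax, dropped
]
print(len(filter_pairs(pairs)))  # → 1
```

The same two gates (does the code run, and is it redundant) would apply to the TikZ half of the dataset, with a LaTeX compiler standing in for `compile()`.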
    Performance evaluations show that MathCoder-VL outperforms multiple open-source models. The 8B version achieved 73.6% accuracy on the MathVista Geometry Problem Solving subset, surpassing GPT-4o and Claude 3.5 Sonnet by 8.9% and 9.2%, respectively. It also scored 26.1% on MATH-Vision and 46.5% on MathVerse. In Chinese-language benchmarks, it achieved 51.2% on GAOKAO-MM. On the We-Math benchmark, it solved two-step problems at 58.6%, outperforming GPT-4o’s 58.1%. Its performance on three-step problems reached 52.1%, again exceeding GPT-4o’s 43.6%. Compared to its base model InternVL2-8B, it showed gains of 6.1% on MATH-Vision and 11.6% on MathVista.
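As a quick sanity check on the reported geometry-subset margins, the implied GPT-4o and Claude 3.5 Sonnet scores follow from simple subtraction (derived here for illustration; the paper reports the margins, not these baseline numbers):

```python
# MathCoder-VL-8B on the MathVista Geometry Problem Solving subset,
# minus the reported margins, gives the implied baseline scores.
mathcoder_geo = 73.6
margins = {"GPT-4o": 8.9, "Claude 3.5 Sonnet": 9.2}
implied = {name: round(mathcoder_geo - m, 1) for name, m in margins.items()}
print(implied)  # → {'GPT-4o': 64.7, 'Claude 3.5 Sonnet': 64.4}
```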

    This work clearly defines the problem of insufficient visual-textual alignment in multimodal math reasoning and provides a scalable and innovative solution. The introduction of FigCodifier and synthetic datasets allows models to learn from accurate, diverse visuals paired with exact code, significantly boosting their reasoning abilities. MathCoder-VL represents a practical advancement in this field, demonstrating how thoughtful model design and high-quality data can overcome longstanding limitations in mathematical AI.

    Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project. Also, feel free to follow us on Twitter and don’t forget to join our 95k+ ML SubReddit and Subscribe to our Newsletter.
    Source: WWW.MARKTECHPOST.COM
CGShares https://cgshares.com