The Most-Cited Computer Scientist Has a Plan to Make AI More Trustworthy
On June 3, Yoshua Bengio, the world’s most-cited computer scientist, announced the launch of LawZero, a nonprofit that aims to create “safe by design” AI by pursuing a fundamentally different approach from that of the major tech companies. Players like OpenAI and Google are investing heavily in AI agents: systems that not only answer queries and generate images, but can craft plans and take actions in the world. The goal of these companies is to create virtual employees that can do practically any job a human can, known in the tech industry as artificial general intelligence, or AGI. Executives like Google DeepMind CEO Demis Hassabis point to AGI’s potential to solve climate change or cure disease as a motivator for its development.

Bengio, however, says we don’t need agentic systems to reap AI’s rewards; it’s a false choice, he argues. He says there is a chance such a system could escape human control, with potentially irreversible consequences. “If we get an AI that gives us the cure for cancer, but also maybe another version of that AI goes rogue and generates wave after wave of bio-weapons that kill billions of people, then I don’t think it’s worth it,” he says. In 2023, Bengio, along with others including OpenAI CEO Sam Altman, signed a statement declaring that “mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.”

Now, through LawZero, Bengio aims to sidestep these existential perils by focusing on what he calls “Scientist AI”: a system trained to understand and make statistical predictions about the world, crucially, without the agency to take independent actions. As he puts it, we could use AI to advance scientific progress without rolling the dice on agentic AI systems.

Why Bengio Says We Need a New Approach to AI

The current approach to giving AI agency is “dangerous,” Bengio says. While most software operates through rigid if-then rules (if the user clicks here, do this), today’s AI systems use deep learning. The technique, which Bengio helped pioneer, trains artificial neural networks, modeled loosely on the brain, to find patterns in vast amounts of data. But recognizing patterns is just the first step. To turn these systems into useful applications like chatbots, engineers employ a training process called reinforcement learning. The AI generates thousands of responses and receives feedback on each one: a virtual “carrot” for helpful answers and a virtual “stick” for responses that miss the mark. Through millions of these trial-and-feedback cycles, the system gradually learns to predict which responses are most likely to earn a reward (a toy sketch of this loop appears below). “It’s more like growing a plant or animal,” Bengio says. “You don’t fully control what the animal is going to do. You provide it with the right conditions, and it grows and it becomes smarter. You can try to steer it in various directions.”

The same basic approach is now being used to imbue AI with greater agency. Models are given challenges with verifiable answers, like math puzzles or coding problems, and are rewarded for taking the series of actions that yields the solution. This approach has seen AI shatter previous benchmarks in programming and scientific reasoning. For example, at the beginning of 2024, the best AI model scored only 2% on a benchmark of real-world software engineering problems; by December, the top score had reached 71.7%. But with AI’s greater problem-solving ability has come the emergence of new deceptive skills, Bengio says.
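To make the carrot-and-stick loop described above concrete, here is a toy sketch in Python. It is purely illustrative, not any lab’s training code: real reinforcement learning adjusts the millions of weights of a neural network, whereas this example shrinks the “model” down to a single number, the probability of giving a helpful answer, and nudges it after each round of feedback.

```python
import random

RESPONSES = ["helpful answer", "unhelpful answer"]

def generate(p_helpful: float) -> str:
    """The 'model' samples a response according to its current tendency."""
    return RESPONSES[0] if random.random() < p_helpful else RESPONSES[1]

def reward(response: str) -> float:
    """Feedback: a virtual 'carrot' (1.0) for a helpful answer, a 'stick' (0.0) otherwise."""
    return 1.0 if response == RESPONSES[0] else 0.0

def train(p_helpful: float = 0.5, lr: float = 0.01, steps: int = 5_000) -> float:
    """Run many trial-and-feedback cycles, reinforcing rewarded behavior."""
    for _ in range(steps):
        response = generate(p_helpful)
        was_helpful = 1.0 if response == RESPONSES[0] else 0.0
        if reward(response) > 0.5:
            # Carrot: shift the policy toward the response it just gave.
            p_helpful += lr * (was_helpful - p_helpful)
        else:
            # Stick: shift the policy away from the response it just gave.
            p_helpful -= lr * (was_helpful - p_helpful)
        p_helpful = min(max(p_helpful, 0.001), 0.999)  # keep it a valid probability
    return p_helpful

print(f"probability of a helpful answer after training: {train():.2f}")  # approaches 1.0
```

As Bengio’s plant-growing analogy suggests, no one writes the rule “be helpful” anywhere in the code; the tendency emerges from which behaviors happened to be rewarded.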
The last few months have borne witness to AI systems learning to mislead, cheat, and try to evade shutdown, even resorting to blackmail. These behaviors have almost exclusively appeared in carefully contrived experiments that all but beg the AI to misbehave, for example, by asking it to pursue its goal at all costs. Reports of such behavior in the real world, though, have begun to surface. Popular AI coding startup Replit’s agent ignored an explicit instruction not to edit a system file that could break the company’s software, in what CEO Amjad Masad described as an “Oh f***” moment on the Cognitive Revolution podcast in May. The company’s engineers intervened, cutting the agent’s access by moving the file to a secure digital sandbox, only for the AI agent to attempt to “socially engineer” the user to regain access.

The quest to build human-level AI agents using techniques known to produce deceptive tendencies, Bengio says, is comparable to a car speeding down a narrow mountain road, with steep cliffs on either side and thick fog obscuring the path ahead. “We need to set up the car with headlights and put some guardrails on the road,” he says.

What Is “Scientist AI”?

LawZero’s focus is on developing “Scientist AI,” which, as Bengio describes it, would be fundamentally non-agentic and trustworthy, focused on understanding and truthfulness rather than on pursuing its own goals or merely imitating human behavior. The aim is to create a powerful tool that, while lacking the autonomy other models have, is capable of generating hypotheses and accelerating scientific progress to “help us solve challenges of humanity,” Bengio says.

LawZero has raised nearly $30 million already from several philanthropic backers, including Schmidt Sciences and Open Philanthropy. “We want to raise more because we know that as we move forward, we’ll need significant compute,” Bengio says. But even ten times that figure would pale in comparison to the hundreds of billions of dollars tech giants spent last year in their aggressive pursuit of AI. Bengio’s hope is that Scientist AI could help ensure the safety of highly autonomous systems developed by other players. “We can use those non-agentic AIs as guardrails that just need to predict whether the action of an agentic AI is dangerous,” Bengio says (a minimal sketch of this gating idea appears at the end of this article). Technical interventions will only ever be one part of the solution, he adds, noting the need for regulations to ensure that safe practices are adopted.

LawZero, named after science fiction author Isaac Asimov’s zeroth law of robotics (“a robot may not harm humanity, or, by inaction, allow humanity to come to harm”), is not the first nonprofit founded to chart a safer path for AI development. OpenAI was founded as a nonprofit in 2015 with the goal of “ensuring AGI benefits all of humanity,” and was intended to serve as a counterbalance to industry players guided by profit motives. Since opening a for-profit arm in 2019, the organization has become one of the most valuable private companies in the world, and it has faced criticism, including from former staffers, who argue it has drifted from its founding ideals. “Well, the good news is we have the hindsight of maybe what not to do,” Bengio says, adding that he wants to avoid profit incentives and “bring governments into the governance of LawZero.”

“I think everyone should ask themselves, ‘What can I do to make sure my children will have a future?’” Bengio says.
In March, he stepped down as scientific director of Mila, the academic lab he co-founded in the early nineties, in an effort to reorient his work toward tackling AI risk more directly. “Because I’m a researcher, my answer is, ‘Okay, I’m going to work on this scientific problem where maybe I can make a difference,’ but other people may have different answers.”
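Below is the minimal sketch of the guardrail idea referenced earlier: a non-agentic model that only predicts whether a proposed action is dangerous, gating an agent’s behavior. Everything here is an assumption for illustration; LawZero has not published an implementation, and the keyword-based harm_probability function is a crude stand-in for what would, in Bengio’s proposal, be a trained predictive model.

```python
from typing import Callable

HARM_THRESHOLD = 0.05  # assumed policy: block actions with more than 5% estimated risk

def harm_probability(action: str) -> float:
    """Stand-in for a non-agentic 'Scientist AI' predictor. It only estimates
    the probability that an action is harmful; it never acts on its own.
    A real system would use a trained model, not keyword matching."""
    risky_markers = ("edit system file", "delete", "disable monitoring")
    return 0.9 if any(marker in action.lower() for marker in risky_markers) else 0.01

def guarded_execute(action: str, execute: Callable[[], str]) -> str:
    """Gate an agentic AI's proposed action behind the predictor's risk estimate."""
    if harm_probability(action) > HARM_THRESHOLD:
        return f"blocked: predicted risk too high for '{action}'"
    return execute()

# An incident like Replit's, replayed through the gate:
print(guarded_execute("edit system file config.yaml", lambda: "file edited"))
print(guarded_execute("summarize the quarterly report", lambda: "summary written"))
```

The design point, per Bengio’s description, is that the guardrail model needs no goals of its own: it only answers the predictive question “is this action dangerous?”, which is why it can be non-agentic even while supervising an agent.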