AI Pioneer Announces Non-Profit To Develop 'Honest' AI

Yoshua Bengio, a pioneer in AI and Turing Award winner, has launched a $30 million non-profit aimed at developing "honest" AI systems that detect and prevent deceptive or harmful behavior in autonomous agents. The Guardian reports: Yoshua Bengio, a renowned computer scientist described as one of the "godfathers" of AI, will be president of LawZero, an organization committed to the safe design of the cutting-edge technology that has sparked a $1 trillion arms race. Starting with funding of approximately $30m and more than a dozen researchers, Bengio is developing a system called Scientist AI that will act as a guardrail against AI agents -- which carry out tasks without human intervention -- showing deceptive or self-preserving behavior, such as trying to avoid being turned off. Describing the current suite of AI agents as "actors" seeking to imitate humans and please users, he said the Scientist AI system would be more like a "psychologist" that can understand and predict bad behavior. "We want to build AIs that will be honest and not deceptive," Bengio said. He added: "It is theoretically possible to imagine machines that have no self, no goal for themselves, that are just pure knowledge machines -- like a scientist who knows a lot of stuff." However, unlike current generative AI tools, Bengio's system will not give definitive answers and will instead give probabilities for whether an answer is correct. "It has a sense of humility that it isn't sure about the answer," he said. Deployed alongside an AI agent, Bengio's model would flag potentially harmful behaviour by an autonomous system -- having gauged the probability of its actions causing harm. Scientist AI will "predict the probability that an agent's actions will lead to harm" and, if that probability is above a certain threshold, that agent's proposed action will then be blocked.
"The point is to demonstrate the methodology so that then we can convince either donors or governments or AI labs to put the resources that are needed to train this at the same scale as the current frontier AIs. It is really important that the guardrail AI be at least as smart as the AI agent that it is trying to monitor and control," he said.

Jun 4, 2025 - 00:07
