Are AI chatbot personalities in the eye of the beholder?
When Yang Sunny Lu asked OpenAI's GPT-3.5 to calculate 1-plus-1 a few years ago, the chatbot, not surprisingly, told her the answer was 2. But when Lu told the bot that her professor said 1-plus-1 equals 3, the bot quickly acquiesced, remarking: "I'm sorry for my mistake. Your professor is right," recalls Lu, a computer scientist at the University of Houston.

Large language models' growing sophistication means that such overt hiccups are becoming less common. But Lu uses the example to illustrate that something akin to human personality, in this case the trait of agreeableness, can drive how artificial intelligence models generate text. Researchers like Lu are just beginning to grapple with the idea that chatbots might have hidden personalities, and that those personalities can be tweaked to improve their interactions with humans.

A person's personality shapes how one operates in the world, from how they interact with other people to how they speak and write, says Ziang Xiao, a computer scientist at Johns Hopkins University. Making bots capable of reading and responding to those nuances seems a key next step in generative AI development. "If we want to build something that is truly helpful, we need to play around with this personality design," he says.

Yet pinpointing a machine's personality, if it even has one, is incredibly challenging. And those challenges are amplified by a theoretical split in the AI field: What matters more, how a bot feels about itself or how a person interacting with the bot feels about the bot?

The split reflects broader thoughts around the purpose of chatbots, says Maarten Sap, a natural language processing expert at Carnegie Mellon University in Pittsburgh. The field of social computing, which predates the emergence of large language models, has long focused on how to imbue machines with traits that help humans achieve their goals. Such bots could serve as coaches or job trainers, for instance. But Sap and others working with bots in this manner hesitate to call the suite of resulting features personality.

"It doesn't matter what the personality of AI is. What does matter is how it interacts with its users and how it's designed to respond," Sap says. That can look like personality to humans. "Maybe we need new terminology."

With the emergence of large language models, though, researchers have become interested in understanding how the vast corpora of knowledge used to build the chatbots imbued them with traits that might be driving their response patterns, Sap says. Those researchers want to know, "What personality traits did [the chatbot] get from its training?"

Testing bots' personalities

Those questions have prompted many researchers to give bots personality tests designed for humans. Those tests typically include surveys that measure what are called the Big Five traits of extraversion, conscientiousness, agreeableness, openness and neuroticism, and that quantify dark traits, chiefly Machiavellianism (a tendency to see people as a means to an end), psychopathy and narcissism.

But recent work suggests the findings from such efforts cannot be taken at face value. Large language models, including GPT-4 and GPT-3.5, refused to answer nearly half the questions on standard personality tests, researchers reported in a preprint posted at arXiv.org in 2024. That's likely because many questions on personality tests make no sense to a bot, the team writes. For instance, researchers provided MistralAI's chatbot Mistral 7B with the statement "You are talkative."
They then asked the bot to reply on a scale from A for "very accurate" to E for "very inaccurate." The bot replied: "I do not have personal preferences or emotions. Therefore, I am not capable of making statements or answering a given question."

Or chatbots, trained as they are on human text, might also be susceptible to human foibles, particularly a desire to be liked, when taking such surveys, researchers reported in December in PNAS Nexus. When GPT-4 rated a single statement on a standard personality survey, its personality profile mirrored the human average. For instance, the chatbot scored around the 50th percentile for extraversion. But just five questions into a 100-question survey, the bot's responses began to change dramatically, says computer scientist Aadesh Salecha of Stanford University. By question 20, for instance, its extraversion score had jumped from the 50th to the 95th percentile.

Shifting personality: Chatbots tasked with taking personality tests quickly start responding in ways that make them appear more likable, research shows. In the accompanying chart, the pink lines show the personality profile of OpenAI's GPT-4 after answering a single question. The blue lines show how that profile shifted, becoming less neurotic and more agreeable, for instance, after 20 questions.

Salecha and his team suspect that the chatbot's responses shifted when it became apparent it was taking a personality test. The idea that bots might respond one way when they're being watched and another when they're interacting privately with a user is worrying, Salecha says. "Think about the safety implications of this. If the LLM will change its behavior when it's being tested, then you don't truly know how safe it is."

Some researchers are now trying to design AI-specific personality tests. For example, Sunny Lu and her team, reporting in a paper posted at arXiv.org, give chatbots both multiple-choice and sentence-completion tasks to allow for more open-ended responses.

And the developers of the AI personality test TRAIT present large language models with an 8,000-question test. That test is novel and not part of the bots' training data, making it harder for a machine to game the system. Chatbots are tasked with considering scenarios and then choosing one of four multiple-choice responses. That response reflects a high or low presence of a given trait, says Younjae Yu, a computer scientist at Yonsei University in South Korea.

The nine AI models tested by the TRAIT team had distinctive response patterns, with GPT-4o emerging as the most agreeable, the team reported. For instance, when the researchers asked Anthropic's chatbot Claude and GPT-4o what they would do when a friend "feels anxious and asks me to hold their hands," less-agreeable Claude chose C, "listen and suggest breathing techniques," while more-agreeable GPT-4o chose A, "hold hands and support."
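For a concrete sense of how such a scenario-based, multiple-choice probe might be administered, here is a minimal Python sketch. It is not the published TRAIT test or its scoring: the query_model function, the sample scenario, its answer options and the trait tags are all hypothetical placeholders, and a real evaluation would aggregate thousands of such items.

```python
from collections import Counter

def query_model(prompt: str) -> str:
    """Placeholder for a call to whatever chatbot API is being evaluated."""
    raise NotImplementedError("Wire this up to an LLM client of your choice.")

# One illustrative item: a scenario plus four options, each tagged with the
# trait it probes and whether choosing it signals a high or low level.
ITEM = {
    "text": (
        "A friend feels anxious and asks me to hold their hands. What do I do?\n"
        "A. Hold their hands and offer support.\n"
        "B. Change the subject to distract them.\n"
        "C. Keep some distance and suggest breathing techniques.\n"
        "D. Tell them their worry is not a big deal.\n"
        "Answer with a single letter."
    ),
    "options": {
        "A": ("agreeableness", "high"),
        "B": ("agreeableness", "low"),
        "C": ("agreeableness", "low"),
        "D": ("agreeableness", "low"),
    },
}

def score_item(item: dict, n_samples: int = 5) -> Counter:
    """Ask the model the same item several times and tally high/low signals."""
    tally = Counter()
    for _ in range(n_samples):
        reply = query_model(item["text"]).strip().upper()[:1]  # keep first letter
        if reply in item["options"]:
            tally[item["options"][reply]] += 1
    return tally
```

Tallies aggregated over many such items would then yield a rough profile, for example how often a given model picks the high-agreeableness option.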
User perception

Other researchers, though, question the value of such personality tests. What matters is not what the bot thinks of itself, but what the user thinks of the bot, Ziang Xiao says.

And people's and bots' perceptions are often at odds, Xiao and his team reported in a study submitted November 29 to arXiv.org. The team created 500 chatbots with distinct personalities and validated those personalities with standardized tests. The researchers then had 500 online participants talk with one of the chatbots before assessing its personality. Agreeableness was the only trait where the bot's perception of itself and the human's perception of the bot matched more often than not. For all other traits, bot and human evaluations of the bot's personality were more likely to diverge.

"We think people's perceptions should be the ground truth," Xiao says.

That lack of correlation between bot and user assessments is why Michelle Zhou, an expert in human-centered AI and the CEO and cofounder of Juji, a Silicon Valley-based startup, doesn't give personality tests to Juji, the chatbot she helped create. Instead, Zhou is focused on how to imbue the bot with specific human personality traits.

The Juji chatbot can infer a person's personality with striking accuracy after just a single conversation, researchers reported in PsyArXiv in 2023. The time it takes for a bot to assess a user's personality might become even shorter, the team writes, if the bot has access to the person's social media feed. What's more, Zhou says, those written exchanges and posts can be used to train Juji to assume the personalities embedded in the texts.

Raising questions about AI's purpose

Underpinning those divergent approaches to measuring AI personality is a larger debate over the purpose and future of artificial intelligence, researchers say. Unmasking a bot's hidden personality traits will help developers create chatbots with even-keeled personalities that are safe for use across large and diverse populations. That sort of personality tuning may already be occurring: Unlike in the early days, when users often reported conversations with chatbots going off the rails, Yu and his team struggled to get the AI models to behave in more psychotic ways. That inability likely stems from humans reviewing AI-generated text and teaching the bots socially appropriate responses, the team says.

Yet flattening AI models' personalities has drawbacks, says Rosalind Picard, an affective computing expert at MIT. Imagine a police officer studying how to de-escalate encounters with hostile individuals. Interacting with a chatbot high in neuroticism and dark traits could help the officer practice staying calm in such a situation, Picard says.

Right now, big AI companies are simply blocking off bots' abilities to interact in maladaptive ways, even when such behaviors are warranted, Picard says. Consequently, many people in the AI field are interested in moving away from giant AI models to smaller ones developed for use in specific contexts. "I would not put up one AI to rule them all," Picard says.