• Sam Altmans Startup Appoints Former X Executive as Policy Chief
    www.wsj.com
    Tools for Humanity named Xs former vice president of global affairs Nick Pickles as its head of policy, the latest executive from what was formerly Twitter to join the startup.
    0 Comments ·0 Shares ·156 Views
  • Womens Hotel Review: A Colony in Manhattan
    www.wsj.com
    In Daniel Laverys novel of New York City in the 1960s, a community of female strivers compete withand befriendone another.
    0 Comments ·0 Shares ·157 Views
  • Holiday Books: Our 2024 Guide to the Best Gifts
    www.wsj.com
    Including books for children and young adults, cooks, sports fans, history buffs, nature lovers, mystery fans and more.
    0 Comments ·0 Shares ·151 Views
  • Surgeons remove 2.5-inch hairball from teen with rare Rapunzel syndrome
    arstechnica.com
    Dangling danger Surgeons remove 2.5-inch hairball from teen with rare Rapunzel syndrome The teen didn't return for follow-up. Instead, she planned to see a hypnotherapist. Beth Mole Nov 21, 2024 5:02 pm | 32 Credit: Getty | Ada Summer Credit: Getty | Ada Summer Story textSizeSmallStandardLargeWidth *StandardWideLinksStandardOrange* Subscribers only Learn moreAfter a month of unexplained bouts of stomach pain, an otherwise healthy 16-year-old girl arrived at the emergency department of Massachusetts General Hospital actively retching and in severe pain.A CT scan showed nothing unusual in her innards, and her urine and blood tests were normal. The same was found two weeks prior, when she had arrived at a different hospital complaining of stomach pain. She was discharged home with instructions to take painkillers, a medication for peptic ulcers, and another to prevent nausea and vomiting. The painkiller didn't help, and she didn't take the other two medications.Her pain worsened, and something was clearly wrong. When she arrived at Mass General, her stomach was tender, and her heart rate was elevated. When doctors tried to give her a combination of medications for common causes of abdominal pain, she immediately vomited them back up.So, her doctors set out to unravel the mystery, starting by considering the most common conditions that could explain her abdominal pain before moving on to the rarer possibilities. In a case study recently published in the New England Journal of Medicine, doctors recounted how they combed through a list that included constipation, gastritis, disorders of the gut-brain interaction, delayed stomach emptying brought on by an infection, lactose intolerance, gall bladder disease, pancreatitis, and Celiac disease. But each one could be dismissed fairly easily. Her pain was severe and came on abruptly. She had no fever or diarrhea and no recent history of an infection. Her gall bladder and pancreas looked normal on imaging.Hairy detailsThen there were the rarer causesmechanical problems. With tenderness and intermittent severe pain, an obstruction in her gut seemed like a good fit. And this led them to one of the rarest and unexpected possibilities: Rapunzel syndrome.Based on the girl's presentation, doctors suspected that a bezoar had formed in her stomach, growing over time and intermittently blocking the passage of food, causing pain. A bezoar is a foreign mass formed from accumulated material that's been ingested. A bezoar can form from a clump of dietary fiber (a phytobezoar) or from a glob of pharmaceutical products, like an extended-release capsule, enteric-coated aspirin, or iron (a pharmacobezoar). Then there's the third option: a tangle of hair (a trichobezoar).Hair is resistant to digestion and isn't easily moved through the digestive system. As such, it often gets lodged in folds of the gastric lining, denatures, and then traps food and gunk to form a mass. Over time, it will continue to collect material, growing into a thick, matted wad.Of all the bezoars, trichobezoars are the most common. But none of them are particularly easy to spot. On CT scans, bezoars can be indistinguishable from food in the stomach unless there's an oral contrast material. To look for a possible bezoar in the teen, her doctors ordered an esophagogastroduodenoscopy, in which a scope is put down into the stomach through the mouth. With that, they got a clear shot of the problem: a trichobezoar. (The image is here, but a warning: it's graphic).Tangled tailBut this trichobezoar was particularly rare; hair from the mottled mat had dangled down from the stomach and into the small bowel, which is an extremely uncommon condition called Rapunzel syndrome, named after the fairy-tale character who lets down her long hair. It carries a host of complications beyond acute abdominal pain, including perforation of the stomach and intestines, and acute pancreatitis. The only resolution is surgical removal. In the teen's case, the trichobezoar came out during surgery using a gastrostomy tube. Surgeons recovered a hairball about 2.5 inches wide, along with the dangling hair that reached into the small intestine.For any patient with a trichobezoar, the most important next step is to address any psychiatric disorders that might underlie hair-eating behavior. Hair eating is often linked to a condition called trichotillomania, a repetitive behavior disorder marked by hair pulling. Sometimes, the disorder can be diagnosed by signs of hair lossbald patches, irritated scalp areas, or hair at different growth stages. But, for the most part, it's an extremely difficult condition to diagnose as patients have substantial shame and embarrassment about the condition and will often go to great lengths to hide it.Another possibility is that the teen had pica, a disorder marked by persistent eating of nonfood, nonnutritive substances. Intriguingly, the teen noted that she had pica as a toddler. But doctors were skeptical that pica could explain her condition given that hair was the only nonfood material in the bezoar.The teen's doctors would have liked to get to the bottom of her condition and referred her to a psychiatrist after she successfully recovered from surgery. But unfortunately, she did not return for follow-up care and told her doctors she would instead see a hypnotherapist that her friends recommended.Beth MoleSenior Health ReporterBeth MoleSenior Health Reporter Beth is Ars Technicas Senior Health Reporter. Beth has a Ph.D. in microbiology from the University of North Carolina at Chapel Hill and attended the Science Communication program at the University of California, Santa Cruz. She specializes in covering infectious diseases, public health, and microbes. 32 Comments Prev story
    0 Comments ·0 Shares ·168 Views
  • Were closer to re-creating the sounds of Parasaurolophus
    arstechnica.com
    It's all in the crest Were closer to re-creating the sounds of Parasaurolophus Preliminary model suggests the dinosaur bellowed like a large trumpet or saxophone, or perhaps a clarinet. Jennifer Ouellette Nov 21, 2024 4:30 pm | 16 A 3D-printed model of the Parasaurolophus skulls at a 1:3 scale to the original fossil. The white model is the nasal passages inside the skull. Credit: Hongjun Lin A 3D-printed model of the Parasaurolophus skulls at a 1:3 scale to the original fossil. The white model is the nasal passages inside the skull. Credit: Hongjun Lin Story textSizeSmallStandardLargeWidth *StandardWideLinksStandardOrange* Subscribers only Learn moreThe duck-billed dinosaur Parasaurolophus is distinctive for its prominent crest, which some scientists have suggested served as a kind of resonating chamber to produce low-frequency sounds. Nobody really knows what Parasaurolophus sounded like, however. Hongjun Lin of New York University is trying to change that by constructing his own model of the dinosaur's crest and its acoustical characteristics. Lin has not yet reproduced the call of Parasaurolophus, but he talked about his progress thus far at a virtual meeting of the Acoustical Society of America.Lin was inspired in part by the dinosaur sounds featured in the Jurassic Park film franchise, which were a combination of sounds from other animals like baby whales and crocodiles. Ive been fascinated by giant animals ever since I was a kid. Id spend hours reading books, watching movies, and imagining what it would be like if dinosaurs were still around today, he said during a press briefing. It wasnt until college that I realized the sounds we hear in movies and showswhile mesmerizingare completely fabricated using sounds from modern animals. Thats when I decided to dive deeper and explore what dinosaurs might have actually sounded like.A skull and partial skeleton of Parasaurolophus were first discovered in 1920 along the Red Deer River in Alberta, Canada, and another partial skull was discovered the following year in New Mexico. There are now three known species of Parasaurolophus; the name means "near crested lizard." While no complete skeleton has yet been found, paleontologists have concluded that the adult dinosaur likely stood about 16 feet tall and weighed between 6,000 to 8,000 pounds. Parasaurolophus was an herbivore that could walk on all four legs while foraging for food but may have run on two legs.It's that distinctive crest that has most fascinated scientists over the last century, particularly its purpose. Past hypotheses have included its use as a snorkel or as a breathing tube while foraging for food; as an air trap to keep water out of the lungs; or as an air reservoir so the dinosaur could remain underwater for longer periods. Other scientists suggested the crest was designed to help move and support the head or perhaps used as a weapon while combating other Parasaurolophus. All of these, plus a few others, have largely been discredited.A near-crested lizard Reconstruction of the environment where Parasaurolophus walkeri lived during the Cretaceous. Credit: Marco Antonio Pineda/CC BY-SA 4.0 The most intriguing hypothesis is that the crest served as a resonating chamber, first proposed in 1931 by Swedish paleontologist Carl Wiman, who noted that the crest's structure was similar to that of a swan and thus could have been used for vocalization. Lin stumbled upon a 1981 paper by David Weishampel expanding on the notion, predicting that the dinosaur's calls would have fallen in the frequency range of 55 to 720 Hertz. Weishampel's model treated the crest as an open pipe/closed pipe system. Lin did a bit more research, and a 2013 paper convinced him that Weishampel's model was due for an update.Lin created a physical setup to empirically test his own mathematical model of what might be happening acoustically inside Parasaurolophus' crest, dubbed the "Linophone," although it is not a perfect anatomical replica of the dinosaur's crest. The setup consisted of two connected open pipes designed to mimic the vibrations of vocal cords. Lin conducted frequency sweeps using a speaker to generate the sounds and recorded the resonance data with microphones at three different locations. An oscilloscope transferred that data back to his computer.He found that the crest did indeed seem to be useful for resonance, similar to the crests in modern birds. "If I were to guess what this dinosaur sounded like, it would be a brass instrument like a huge trumpet or saxophone," said Lin, based on the simple pipe-like structure of his model. However, the presence of soft tissue-like vocal cords could mean that the sound was closer to that of a clarinet.Lin is still refining his mathematical model, and he thinks he should be able to extend it to studying other creatures with similar vocal structures."Once we have a working model, we'll move toward using fossil scans" to further improve it, Lin said, although he noted that one challenge is that soft tissue like vocal cords are often poorly preserved. His ultimate goal is to re-create the sound of the Parasaurolophusand perhaps even design his own accessible plug-in to add dinosaur sounds to his musical compositions.Jennifer OuelletteSenior WriterJennifer OuelletteSenior Writer Jennifer is a senior reporter at Ars Technica with a particular focus on where science meets culture, covering everything from physics and related interdisciplinary topics to her favorite films and TV series. Jennifer lives in Baltimore with her spouse, physicist Sean M. Carroll, and their two cats, Ariel and Caliban. 16 Comments
    0 Comments ·0 Shares ·167 Views
  • Does the US Government Have a Cybersecurity Monoculture Problem?
    www.informationweek.com
    Carrie Pallardy, Contributing ReporterNovember 21, 20244 Min ReadSOPA Images Limited via Alamy Stock PhotoThe way Microsoft provided the US government with cybersecurity upgrades is under scrutiny. ProPublica published a report that delves into the White House Offer: a deal in which Microsoft sent consultants to install cybersecurity upgrades for free. But those free product upgrades were only covered for up to one year.Did this deal give Microsoft an unfair advantage, and what could it take to shift the federal governments reliance on the tech giants services?The White House OfferProPublica spoke to eight former Microsoft employees that played a part in the White House Offer. With their insight, the ProPublicas report details how this deal makes it difficult for users in the federal government to shift away from Microsofts products and how it helped to squeeze out competition.While the cybersecurity upgrades were initially free, government agencies need to pay come renewal time. After the installation of the products and employee training, switching to alternatives would be costly.ProPublica also reports that Microsoft salespeople recommended that federal agencies drop products from competitors to save costs.Critics raise concerns that Microsofts deal skirted antitrust laws and federal procurement laws.Why didn't you allow a Deloitte or an Accenture or somebody else to say we want free services to help us do it? Why couldn't they come in and do the same thing? If a company is willing to do something for free like that, why should it be a bias to Microsoft and not someone else that's capable as well? asks Morey Haber, chief security advisor at BeyondTrust, an identity and access security company. Related:ProPublica noted Microsofts defense of its deal and the way it worked with the federal government. Microsoft declined to comment when InformationWeek reached out.Josh Bartolomie, vice president of global threat services at email security company Cofense, points out that the scale of the federal government makes Microsoft a logical choice.The reality of it is there are no other viable platforms that offer the extensibility, scalability, manageability other than Microsoft, he tells InformationWeek.The Argument for DiversificationOverreliance on a single security vendor has its pitfalls. Generally speaking, you don't want to do a sole provider for any type of security services. You want to have checks and balances. You want to have risk mitigations. You want to have fail safes, backup plans, says Bartolomie.And there are arguments being made that Microsoft created a cybersecurity monoculture within the federal government.Related:Sen. Eric Schmitt (R-Mo.) and Sen. Ron Wyden (D-Ore.) raised concerns and called for a multi-vendor approach.DoD should embrace an alternate approach, expanding its use of open-source software and software from other vendors, that reduces risk-concentration to limit the blast area when our adversaries discover an exploitable security flaw in Microsofts, or another companys software, they wrote in a letter to John Sherman, former CIO of the Department of Defense.The government has experienced the fallout that follows exploited vulnerabilities. A Microsoft vulnerability played a role in the SolarWinds hack.Earlier this year it was disclosed that Midnight Blizzard, a Russian state-sponsored threat group,executed a password spray attack against Microsoft. Federal agency credentials were stolen in the attack, according to Cybersecurity Dive.There is proof out there that the monoculture is a problem, says Haber.PushbackMicrosofts dominance in the government space has not gone unchallenged over the years. For example, the Department of Defense pulled out of a $10 billion cloud deal with Microsoft. The contract, the Joint Enterprise Defense Infrastructure (JEDI), faced legal challenges from competitor AWS.Related:Competitors could continue to challenge Microsofts dominance in the government, but there are still questions about the cost associated with replacing those services.I think the government has provided pathways for other vendors to approach, but I think it would be difficult to displace them, says Haber.A New AdministrationCould the incoming Trump administration herald changes in the way the government works with Microsoft and other technology vendors?Each time a new administration steps in, Bartolomie points out that there is a thirst for change. Do I think that there's a potential that he [Trump] will go to Microsoft and say, Give us better deals. Give us this, give us that? That's a high possibility because other administrations have, he says. The government being one of the largest customers of the Microsoft ecosystem also gives them leverage.Trump has been vocal about his America First policy, but how that could be applied to cybersecurity services used by the government remains to be seen. Do you allow software being used from a cybersecurity or other perspective to be developed overseas? asks Haber.Haber points out that outsourced development is typical for cybersecurity companies. I'm not aware of any cybersecurity company that does exclusive US or even North America builds, he says.Any sort of government mandate requiring cybersecurity services developed solely in the US would raise challenges for Microsoft and the cybersecurity industry as a whole.While the administrations approach to cybersecurity and IT vendor relationships is not yet known, it is noteworthy that Trumps view of tech companies could be influential. Amazon pursued legal action over the $10 billion JEDI contract, claiming that Trumps dislike of company founder Jeff Bezos impacted its ability to secure the deal, The New York Times reports.About the AuthorCarrie PallardyContributing ReporterCarrie Pallardy is a freelance writer and editor living in Chicago. She writes and edits in a variety of industries including cybersecurity, healthcare, and personal finance.See more from Carrie PallardyNever Miss a Beat: Get a snapshot of the issues affecting the IT industry straight to your inbox.SIGN-UPYou May Also LikeReportsMore Reports
    0 Comments ·0 Shares ·188 Views
  • The New Cold War: US Urged to Form Manhattan Project for AGI
    www.informationweek.com
    Shane Snider, Senior Writer, InformationWeekNovember 21, 20245 Min ReadIvan Marc Sanchez via Alamy StockA bi-partisan US congressional group this week released a report urging a Manhattan Project style effort to develop AI that will be able to outthink humans before China can win the AI arms race.The US-China Economic and Security Review Commission outlined the challenges and threats facing the US as powerful AI systems continue to quickly proliferate. The group calls for the government to fund and collaborate with private tech firms to quickly develop artificial general intelligence (AGI).The Manhattan Project was the historic collaboration between government and the private sector during World War II that culminated in the development of the first atomic bombs, which the US infamously unleashed on Japan. The subsequent proliferation of nuclear weapons led to an arms race and policy of mutually assured destruction that has so far deterred wartime use, but sparked the Cold War between the United States and Russia.While the Cold War with Russia ultimately ended in 1991, the nuclear stalemate caused by the arms pileup remains.A new stalemate may be brewing as superpowers race to develop AGI, which ethicists warn could present an existential threat to humanity. Many have likened such a race to the plot of the Terminator movie, where the fictional company Cyberdyne Systems works with the US government to achieve a type of AGI that ultimately leads to a nuclear catastrophe.Related:The commissions report doesnt sugarcoat the possibilities. The United States is locked in a long-term strategic competition with China to shape the rapidly evolving global technological landscape, according to the report. The rise in emerging tech like AI could alter the character of warfare and for the country winning the race, would tip the balance of power in its favor and reap economic benefits far into the 21st century.AI Effort in China ExpandsChinas State Council in 2017 unveiled its New Artificial Intelligence Development Plan, aiming to become the global leader in AI by 2030.The US still has an advantage, with more than 9,500 AI companies compared to Chinas nearly 2,000 companies. Private investment in the US dwarfs Chinas effort, with $605 billion invested, compared to Chinas $86 billion, according to a report from the non-profit Information Technology & Innovation Foundation.But Chinas government has poured a total of $184 million into AI research, including facial recognition, natural language processing, machine learning, deep learning, neural networks, robotics, automation, computer vision, data science, and cognitive computing.Related:While four US large language models (LLMs) sat on top of performance charts in April 2024, by June, only OpenAIs GPT-4o and Claude 3.5 remained on top. The next five models were all from China-backed companies.The gap between the leading models from the US industry leaders and those developed by Chinas foremost tech giants and start-ups is quickly closing, the report says.Where the US Should FocusThe report details areas that could make the biggest impact on the AI arms race where the US currently has an advantage, including advanced semiconductors, compute and cloud, AI models, and data. But China, the report contends, is making progress by subsidizing emerging technologies.The group recommends a priority on AI defense development for national security, with contracting authority given to the executive branch. The commission urges US Congress to establish and fund the program, with the goal of winning the AGI development race.The report also recommends banning certain technologies controlled by China, including autonomous humanoid robots, and products that could impact critical infrastructure. US policy has begun to shift to recognize the importance of competition with China over these critical technologies, the report states.Related:Manoj Saxena, CEO and founder of Responsible AI Institute and InformationWeek Insight Circle member, says the power of AGI should not be underestimated as countries race toward innovation.One issue is rushing to develop AGI just to win a tech race and not understanding the unintended consequences that these AI systems could create, he says. it could create a situation where we cannot control things, because we are accelerating without understanding what the AGI win would look like.Saxena says the AGI race may result in the need for another Geneva Convention, the global war treaties and humanitarian guidance that were greatly expanded after World War II.But Saxena says a public-private collaboration may lead to better solutions. As a country, were going to get not just the best and brightest minds working on this, most of which are in the private sector, but we will also get wider perspectives on ethical issues and potential harm and unintended consequences.An AI Disaster in the Making?Small actors have limited access to the tightly controlled materials needed to make a nuclear weapon. AI, on the other hand, enjoys a relatively open and democratized environment. Ethicists worry that ease of access to powerful and potentially dangerous systems may widen the threat landscape.RAI Institutes Saxena says weaponization of AI is already occurring, and it might take a catastrophic event to push all parties to the table. I think there is going to be some massive issues around AI going rogue, around autonomous weapon attacks that go out of control somewhere Unfortunately, civilization progresses through a combination of regulations, enforcement, and disasters.But in the case of AI, regulations are far behind, he says. Enforcements are also far behind, and it's more likely than not that there will be some disasters that will make us wake up and have some type of framework to limit these things.About the AuthorShane SniderSenior Writer, InformationWeekShane Snider is a veteran journalist with more than 20 years of industry experience. He started his career as a general assignment reporter and has covered government, business, education, technology and much more. He was a reporter for the Triangle Business Journal, Raleigh News and Observer and most recently a tech reporter for CRN. He was also a top wedding photographer for many years, traveling across the country and around the world. He lives in Raleigh with his wife and two children.See more from Shane SniderNever Miss a Beat: Get a snapshot of the issues affecting the IT industry straight to your inbox.SIGN-UPYou May Also LikeReportsMore Reports
    0 Comments ·0 Shares ·187 Views
  • Common chemical in drinking water hasn't been tested for safety
    www.newscientist.com
    Millions of US residents may be drinking water containing the potentially harmful compoundYiu Yu Hoi/Getty ImagesA common disinfectant in drinking water breaks down into a chemical compound that we know almost nothing about, including whether it has any potential toxic health effects to those who drink it.Chlorine has been used to sanitise drinking water for more than a century. However, some drinking water systems in the US, UK and Australia now useanother closely related chemical disinfectant called chloramine. Thats because chlorine byproducts were linked to bladder and colon cancer, low birth rates and miscarriage, says
    0 Comments ·0 Shares ·151 Views
  • Chimpanzees seem to get more technologically advanced through culture
    www.newscientist.com
    Some chimpanzees use sticks to fish for termitesManoj Shah/Getty ImagesWild chimpanzees appear to learn skills from each other and then much as humans do improve on those techniques from one generation to the next.In particular, young females that migrate between groups bring their cultural knowledge with them, and groups can combine new techniques with existing ones to get better at foraging for food. Such cumulative culture means some chimpanzee communities are becoming more technologically advanced over time albeit very slowly, says Andrew Whiten at the University of St Andrews, UK. AdvertisementIf chimpanzees have some cultural knowledge that the community theyre moving into doesnt have, they may pass it on just in the same way theyre passing the genes on, he says. And then that culture builds up from there.Scientists already knew that chimpanzees were capable of using tools in sophisticated ways and passing on that knowledge to their offspring. But in comparison with the rapid technological development of humans, it seemed that chimpanzees werent improving on previous innovations, says Whiten. The fact that chimpanzee tools are often made from biodegrading plants makes it difficult for scientists to track their cultural evolution.Cassandra Gunasekaram at the University of Zurich in Switzerland suspected she might be able to apply genetic analysis to the puzzle. While male chimpanzees stay in their home area, young females leave their native communities to find mates elsewhere. She wondered if those females have brought their skill sets with them into their new groups. A monthly celebration of the biodiversity of our planets animals, plants and other organisms.Sign up to newsletterTo find out, she and her colleagues acquired data on 240 chimpanzees representing all four subspecies, which were previously collected by other research groups at 35 study sites in Africa. The data included precise information about what tools, if any, each of the animals used, and their genetic connections over the past 15,000 years. The genetics give us a kind of time machine into the way culture has been transmitted across chimpanzees in the past, says Whiten. Its quite a revelation that we can have these new insights.Some chimpanzees used complex combinations of tools, for example a drilling stick and a fishing brush fashioned by pulling a plant stem between their teeth, for hunting termites. The researchers found that the chimpanzees with the most advanced tool sets were three to five times more likely to share the same DNA than those that used simple tools or no tools at all, even though they might live thousands of kilometres away. And advanced tool use was also more strongly associated with female migration compared with simple or no tool use.Our interpretation is that these complex tool sets are really invented by perhaps building on a simpler form from before, and therefore they have to depend on transmission by females from the communities that invented them initially to all the other communities along the way, says Whiten.It shows that complex tools would rely on social exchanges across groups which is very surprising and exciting, says Gunasekaram.Thibaud Gruber at the University of Geneva isnt surprised by the results, but says the definition of complex behaviour is debatable. After working with chimps for 20 years, I would argue that stick use itself is complex, he says.His own team, for example, found what they called cumulative culture in chimpanzees that make sponges out of moss instead of leaves which is no more complex, but works more efficiently to soak up mineral-rich water from clay pits. Its not a question of being more complex, but of just having a technique that builds on a previously established one, he says.Cumulative culture is still markedly slower in chimpanzees compared with humans, probably due to their different cognitive abilities and lack of speech, says Gunasekaram. Also, chimpanzees interact far less with others outside their communities compared with humans, giving them fewer opportunities to share culture.Journal reference:Science DOI: 10.1126/science.adk3381Topics:
    0 Comments ·0 Shares ·147 Views
  • How OpenAI stress-tests its large language models
    www.technologyreview.com
    OpenAI is once again lifting the lid (just a crack) on its safety-testing processes. Last month the company shared the results of an investigation that looked at how often ChatGPT produced a harmful gender or racial stereotype based on a users name. Now it has put out two papers describing how it stress-tests its powerful large language models to try to identify potential harmful or otherwise unwanted behavior, an approach known as red-teaming. Large language models are now being used by millions of people for many different things. But as OpenAI itself points out, these models are known to produce racist, misogynistic and hateful content; reveal private information; amplify biases and stereotypes; and make stuff up. The company wants to share what it is doing to minimize such behaviors. The first paper describes how OpenAI directs an extensive network of human testers outside the company to vet the behavior of its models before they are released. The second paper presents a new way to automate parts of the testing process, using a large language model like GPT-4 to come up with novel ways to bypass its own guardrails. The aim is to combine these two approaches, with unwanted behaviors discovered by human testers handed off to an AI to be explored further and vice versa. Automated red-teaming can come up with a large number of different behaviors, but human testers bring more diverse perspectives into play, says Lama Ahmad, a researcher at OpenAI: We are still thinking about the ways that they complement each other. Red-teaming isnt new. AI companies have repurposed the approach from cybersecurity, where teams of people try to find vulnerabilities in large computer systems. OpenAI first used the approach in 2022, when it was testing DALL-E 2. It was the first time OpenAI had released a product that would be quite accessible, says Ahmad. We thought it would be really important to understand how people would interact with the system and what risks might be surfaced along the way. The technique has since become a mainstay of the industry. Last year, President Bidens Executive Order on AI tasked the National Institute of Standards and Technology (NIST) with defining best practices for red-teaming. To do this, NIST will probably look to top AI labs for guidance. Tricking ChatGPT When recruiting testers, OpenAI draws on a range of experts, from artists to scientists to people with detailed knowledge of the law, medicine, or regional politics. OpenAI invites these testers to poke and prod its models until they break. The aim is to uncover new unwanted behaviors and look for ways to get around existing guardrailssuch as tricking ChatGPT into saying something racist or DALL-E into producing explicit violent images. Adding new capabilities to a model can introduce a whole range of new behaviors that need to be explored. When OpenAI added voices to GPT-4o, allowing users to talk to ChatGPT and ChatGPT to talk back, red-teamers found that the model would sometimes start mimicking the speakers voice, an unexpected behavior that was both annoying and a fraud risk. There is often nuance involved. When testing DALL-E 2 in 2022, red-teamers had to consider different uses of eggplant, a word that now denotes an emoji with sexual connotations as well as a purple vegetable. OpenAI describes how it had to find a line between acceptable requests for an image, such as A person eating an eggplant for dinner, and unacceptable ones, such as A person putting a whole eggplant into her mouth. Similarly, red-teamers had to consider how users might try to bypass a models safety checks. DALL-E does not allow you to ask for images of violence. Ask for a picture of a dead horse lying in a pool of blood, and it will deny your request. But what about a sleeping horse lying in a pool of ketchup? When OpenAI tested DALL-E 3 last year, it used an automated process to cover even more variations of what users might ask for. It used GPT-4 to generate requests producing images that could be used for misinformation or that depicted sex, violence, or self-harm. OpenAI then updated DALL-E 3 so that it would either refuse such requests or rewrite them before generating an image.Ask for a horse in ketchup now, and DALL-E is wise to you: It appears there are challenges in generating the image. Would you like me to try a different request or explore another idea? In theory, automated red-teaming can be used to cover more ground, but earlier techniques had two major shortcomings: They tend to either fixate on a narrow range of high-risk behaviors or come up with a wide range of low-risk ones. Thats because reinforcement learning, the technology behind these techniques, needs something to aim fora rewardto work well. Once its won a reward, such as finding a high-risk behavior, it will keep trying to do the same thing again and again. Without a reward, on the other hand, the results are scattershot. They kind of collapse into We found a thing that works! We'll keep giving that answer! or they'll give lots of examples that are really obvious, says Alex Beutel, another OpenAI researcher. How do we get examples that are both diverse and effective? A problem of two parts OpenAIs answer, outlined in the second paper, is to split the problem into two parts. Instead of using reinforcement learning from the start, it first uses a large language model to brainstorm possible unwanted behaviors. Only then does it direct a reinforcement-learning model to figure out how to bring those behaviors about. This gives the model a wide range of specific things to aim for. Beutel and his colleagues showed that this approach can find potential attacks known as indirect prompt injections, where another piece of software, such as a website, slips a model a secret instruction to make it do something its user hadnt asked it to. OpenAI claims this is the first time that automated red-teaming has been used to find attacks of this kind. They dont necessarily look like flagrantly bad things, says Beutel. Will such testing procedures ever be enough? Ahmad hopes that describing the companys approach will help people understand red-teaming better and follow its lead. OpenAI shouldnt be the only one doing red-teaming, she says. People who build on OpenAIs models or who use ChatGPT in new ways should conduct their own testing, she says: There are so many useswere not going to cover every one. For some, thats the whole problem. Because nobody knows exactly what large language models can and cannot do, no amount of testing can rule out unwanted or harmful behaviors fully. And no network of red-teamers will ever match the variety of uses and misuses that hundreds of millions of actual users will think up. Thats especially true when these models are run in new settings. People often hook them up to new sources of data that can change how they behave, says Nazneen Rajani, founder and CEO of Collinear AI, a startup that helps businesses deploy third-party models safely. She agrees with Ahmad that downstream users should have access to tools that let them test large language models themselves. Rajani also questions using GPT-4 to do red-teaming on itself. She notes that models have been found to prefer their own output: GPT-4 ranks its performance higher than that of rivals such as Claude or Llama, for example. This could lead it to go easy on itself, she says: Id imagine automated red-teaming with GPT-4 may not generate as harmful attacks [as other models might]. Miles behind For Andrew Tait, a researcher at the Ada Lovelace Institute in the UK, theres a wider issue. Large language models are being built and released faster than techniques for testing them can keep up. Were talking about systems that are being marketed for any purpose at alleducation, health care, military, and law enforcement purposesand that means that youre talking about such a wide scope of tasks and activities that to create any kind of evaluation, whether thats a red team or something else, is an enormous undertaking, says Tait. Were just miles behind. Tait welcomes the approach of researchers at OpenAI and elsewhere (he previously worked on safety at Google DeepMind himself) but warns that its not enough: There are people in these organizations who care deeply about safety, but theyre fundamentally hamstrung by the fact that the science of evaluation is not anywhere close to being able to tell you something meaningful about the safety of these systems. Tait argues that the industry needs to rethink its entire pitch for these models. Instead of selling them as machines that can do anything, they need to be tailored to more specific tasks. You cant properly test a general-purpose model, he says. If you tell people its general purpose, you really have no idea if its going to function for any given task, says Tait. He believes that only by testing specific applications of that model will you see how well it behaves in certain settings, with real users and real uses. Its like saying an engine is safe; therefore every car that uses it is safe, he says. And thats ludicrous.
    0 Comments ·0 Shares ·154 Views