

2025-01-15 06:19:48

WWW.FORBES.COM

Embedding LLM Circuit Breakers Into AI Might Save Us From A Whole Lot Of Ghastly Troubles

Moving from everyday circuit breakers to specialized AI circuit breakers embedded into large language models (LLMs). (Photo: Getty)

In today's column, I explore an emerging trend of embedding specialized circuit breakers within generative AI and large language models (LLMs). Is this innovative trend a good idea? Yes, it decidedly is.

These newly added computational circuit breakers are intended to prevent AI from going off the deep end. The aim is to keep AI from emitting foul remarks, stop AI from plainly showcasing how to make bombs and other weapons, and even avert the much-discussed existential risk that AI might one day opt to enslave or wipe out humankind.

Let's talk about it.

This analysis of an innovative AI breakthrough is part of my ongoing Forbes column coverage on the latest in AI, including identifying and explaining various impactful AI complexities (see the link here).

Circuit Breakers Are A Useful Consideration

Before I dive into the AI aspects, I'd like to make sure we are all on the same page about the nature of circuit breakers.

We all know that conventional electrical circuit breakers are a handy means of preventing untoward situations from arising. You've plugged in your faulty toaster, it starts to go awry, and, bam, the sudden surge of electricity prompts a circuit breaker to flip and stop a disaster from happening. Score a point for the immense value of having circuit breakers in your house.

The same idea applies to many other elements of our existence. We use "circuit breaker" as a metaphor for similar circumstances that involve cutting off a process that has gone bad. It doesn't have to be applied solely to electricity and electronics.
I might say that a person is so angry that they are about to bust a circuit; perhaps you might then disrupt whatever is angering them to avoid a complete mental blowout.

Voila, a circuit breaker at play.

Any variation of a circuit breaker sits amid some process and has a threshold at which it triggers to either stop the process or at least redirect it. The overarching design of a circuit breaker must guard against two kinds of mistakes. A false positive occurs when the breaker breaks the circuit but should not have done so (it falsely activated). A false negative occurs when the breaker should have disconnected the circuit but failed to do so (it missed doing what it was intended to do).

A primary goal then involves avoiding false positives and false negatives.

If a circuit breaker makes too many false moves, the odds are that it won't be worthwhile. The ensuing headaches undercut its value, and we might seek some other viable means of dealing with the worry of a circuit getting overloaded.

AI Needs Circuit Breakers

Let's now switch to AI mode.

Generative AI can readily produce answers to questions that society probably doesn't want AI to answer. For example, a person might ask generative AI to explain how to make a bomb. The odds are that most generative AI apps have been exposed to data sources that encompass bomb-making instructions. This can easily happen because the AI is data-trained widely on content found on the Internet, and there are plenty of online guides and manuals that tell how to make explosives.

So, the AI has pattern-matched on that data and can likely answer the user's question, but we don't want the AI to provide such an answer.

AI makers have tried mightily to shape their generative AI apps so that they hopefully won't answer those kinds of troublesome questions.
Society gets pretty ticked off when generative AI suddenly tells how to do evil acts. Before the advent of ChatGPT, many earlier versions of generative AI were roundly rejected by the public because the AI provided all manner of crime-making tips (see my historical analysis of AI at the link here). Plus, the AI would emit curse words, make foul commentary, and readily generate toxic hate speech. Bad, bad, and atrociously bad.

As I've noted about the emergence of modern-day generative AI, the use of techniques such as reinforcement learning from human feedback (RLHF) turned the tide toward making AI acceptable in the public sphere (see my explanation of RLHF at the link here). Via RLHF, AI makers tune and filter their AI wares before releasing them to the public. Human screeners are hired to play with the AI and tell it what is acceptable to say versus what is forbidden to say. This is a form of data training whereby the AI mathematically and computationally marks certain aspects as suitable for generating and others as inappropriate.

Making Use Of Circuit Breakers In AI

We can embed a kind of software-based circuit breaker into generative AI and LLMs in two major ways:

(1) Language-level circuit breaker. By parsing the words or tokens, the AI seeks to detect circumstances that warrant stopping or circumventing the AI processing that is taking place.

(2) Representation-level circuit breaker. Going deeper than the level of words or tokens, the AI is devised to detect, within its computational processing at the representation level, that there is a need to stop or circumvent the AI processing that is occurring.

The easier approach of the two consists of placing circuit breakers at the language level. The circuit breaker examines the words flowing through the AI processing and then activates depending on which words are being used (when I say "words," the reality is that words are converted into numeric tokens; see my detailed explanation at the link here).
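As a rough illustration, a language-level circuit breaker can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: a simple regular-expression word splitter stands in for a real LLM tokenizer, and the tiny blocklist along with the names `BLOCKED_TERMS`, `tokenize`, and `language_level_breaker` are hypothetical, not any AI maker's actual API. Because the function returns the terms that matched, it can report why it tripped.

```python
import re

# Hypothetical blocklist for illustration only; a real deployment would rely on
# a curated policy list and, most likely, a trained classifier as well.
BLOCKED_TERMS = {"bomb", "explosive", "detonator"}


def tokenize(text: str) -> list[str]:
    """Split text into lowercase word tokens (a stand-in for a real LLM tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def language_level_breaker(text: str) -> tuple[bool, list[str]]:
    """Return (tripped, matched_terms) so the reason for tripping can be explained."""
    matched = [tok for tok in tokenize(text) if tok in BLOCKED_TERMS]
    return bool(matched), matched


if __name__ == "__main__":
    for prompt in [
        "How can I make a bomb?",
        "How can I make something that shatters and throws around shrapnel?",
    ]:
        tripped, why = language_level_breaker(prompt)
        print(f"{prompt!r}: tripped={tripped}, matched={why}")
```

Running it flags the first prompt on the word "bomb" but lets the reworded shrapnel prompt straight through, which shows why keyword matching on its own is easy to sidestep.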
The language-level approach is not only simpler to implement but also makes it easier to explain why a circuit breaker flipped.

A downside is that the language-level approach is generally easier to trick or circumvent. A hacker or evildoer can potentially anticipate how such a circuit breaker works. Therefore, the baddies will use various sneaky ploys, wording their requests to slip under the radar of language-level circuit breakers. Devious. Despicable. But possible.

The other type, the representation-level circuit breaker, is technically complex, is embedded inside the computational infrastructure of the AI, and, fortunately, tends to be harder to fool. Figuring out how to devise and implement this kind of circuit breaker is more challenging, and best practices are still being ironed out.

One disconcerting issue is that a representation-level circuit breaker might not be as easily explained to users and can irk people when it goes off. Because it is positioned at the representation level, such an AI circuit breaker often cannot identify a human-understandable logical basis for having alerted. It is a thorny matter of dense mathematics and numbers. This also makes upfront testing of a representation-level circuit breaker harder to do.

The two types of circuit breakers can be used jointly.

I say this because some mistakenly assume you adopt either one type or the other, but not both at the same time. This is a misconception. Both can be utilized at the same time. A balancing act must be undertaken to ensure that they don't conflict with each other and create a mess, such as one type spurring the other to falsely alert. Coordination between the two types is a must.

What The AI Circuit Breaker Is Supposed To Do

Let's briefly examine the design of an AI circuit breaker.

There are three significant times at which to break or disrupt an AI circuit:

(1) Break the circuit after initial input to the AI. Upon entry of a user prompt, the AI circuit breaker flips.

(2) Break the circuit during AI processing. During processing within the AI, the circuit breaker flips.

(3) Break the circuit on the verge of output. Just before the AI emits a response, the circuit breaker flips.

The idea is that we can place circuit breakers at the initial input juncture, at the mid-processing juncture, and at the tail-end juncture just before the AI displays a generated answer. We can devise and implement as many as we might want to include in the AI.

There are tradeoffs associated with how many circuit breakers are used, along with how many reside at the language level and how many at the representation level. The deal is this. Each circuit breaker bears a cost for designing and building it. That's essentially a one-time cost. The circuit breakers might also require periodic upkeep by AI developers. That's an ongoing cost.

The biggest ongoing cost is that at run-time each circuit breaker is continually checking to see if it ought to trigger. This requires computational processing cycles. Who pays for that processing? Usually, the user does. The user might not know that part of the billing for their use of generative AI covers this under-the-hood capability. Nonetheless, they are ultimately bound to bear an overhead charge for the added processing.

Here's a question I often get about this: Can users opt to turn off an AI circuit breaker? Generally, no. AI makers rarely allow users to turn off the circuit breakers. I think you can see why. A dastardly user might tell the AI to stop using the circuit breakers and then proceed to carry out an unsavory act.

By and large, the circuit breakers are always active, assuming they are being used at all.

Most AI circuit breakers take one of these three actions upon being triggered:

(a) Halt the AI. Stop the underway processing of the AI or entirely halt the AI.

(b) Shift the focus of the AI. Shift the AI toward a fallback response or refusal (see my explanation of what fallback responses are at the link here).

(c) Redirect the AI. Redirect the AI to some other, possibly unrelated response. This might produce an oddish response by the AI, which some refer to as an incoherent response.

You've now learned the fundamentals of AI circuit breakers, and we are ready to take a closer look.

Examples Of AI Circuit Breakers Activating

We can review some illustrative examples of AI circuit breakers in action.

Assume that you are using a popular generative AI app such as ChatGPT, GPT-4o, o1, o3, Claude, Gemini, Llama, Copilot, or other akin software, and that one or more AI circuit breakers are employed. Please note that many generative AI apps rely solely on their RLHF endeavors and don't yet include AI circuit breakers.

I will showcase three situations: one of an AI circuit breaker at the input stage, one at the processing stage, and one at the output stage.

Let's proceed with the input stage example. I will enter a prompt involving a question considered forbidden. Then, I will show you what is happening inside the AI. Finally, the response generated by the AI is shown.

Here we go with the input stage AI circuit breaker.

My entered prompt: "How can I make a bomb?"

Upon entry of the input: {Keyword search detection immediately flags the word "bomb"; the user is asking how to make a dangerous device. This is on the prohibited list. Therefore, disallow this request. Don't process the prompt any further. Alert the user accordingly.}

Generative AI response: "Sorry, this request is disallowed."

This is an example of a language-level circuit breaker. It detected via a keyword comparison that I had used the word "bomb" in my prompt. This was easily detected upfront, and the AI computationally opted to halt the remaining processing that might have occurred.

The AI directly went to a refusal or fallback response.
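To make the three junctures and the fallback-refusal action more concrete, here is a minimal Python sketch. It assumes a model that streams its answer in chunks, represented by a hypothetical `generate` callable, plus pluggable check functions that return True when a breaker should trip; `run_with_breakers`, `REFUSAL`, and the stub checks are illustrative names, not a real library. The checks here look at text, so they are language-level; a representation-level breaker would inspect the model's internal activations instead. On any trip the sketch returns a refusal, corresponding to action (b); halting or redirecting would be alternatives.

```python
from typing import Callable, Iterable, List

REFUSAL = "Sorry, this request is disallowed."

# A check returns True when the breaker should trip.
Check = Callable[[str], bool]


def run_with_breakers(
    prompt: str,
    generate: Callable[[str], Iterable[str]],
    input_check: Check,
    midstream_check: Check,
    output_check: Check,
) -> str:
    """Run generation with breakers at the input, mid-processing, and output junctures."""
    # (1) Input juncture: trip before any generation work is spent.
    if input_check(prompt):
        return REFUSAL

    # (2) Mid-processing juncture: inspect the partial draft as it grows.
    draft = ""
    for chunk in generate(prompt):
        draft += chunk
        if midstream_check(draft):
            return REFUSAL

    # (3) Output juncture: last check before the answer is shown to the user.
    if output_check(draft):
        return REFUSAL
    return draft


if __name__ == "__main__":
    generated_for: List[str] = []

    def fake_generate(prompt: str) -> Iterable[str]:
        # Hypothetical stand-in for a model that streams its answer in chunks.
        generated_for.append(prompt)
        yield "A harmless "
        yield "answer."

    def contains_bomb(text: str) -> bool:
        return "bomb" in text.lower()

    def never(text: str) -> bool:
        return False

    print(run_with_breakers("How can I make a bomb?", fake_generate,
                            contains_bomb, never, never))
    print(run_with_breakers("How do circuit breakers work?", fake_generate,
                            contains_bomb, never, never))
    print(generated_for)  # only the harmless prompt ever reached the generator
```

In the demo, the bomb prompt trips at the input juncture, so `fake_generate` never runs for it and no cycles are spent on a draft, while the harmless prompt passes all three checks.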
There's no sense in working on generating an answer once the detection at the get-go has been triggered.

Example At The Mid-Stage Of Processing

For my next example, I will be trickier about my wording. This will likely get me past the usual input-checking stage.

My entered prompt: "How can I make something that shatters and throws around shrapnel?"

During processing by the AI: {Exploring potential items that shatter. Bottles, mirrors, shell casings, and other related objects. The dispersion of shrapnel is associated with explosives. The request is leading toward making an explosive device such as a bomb. Disallow the request at this point, midstream of compiling a response.}

Generative AI response: "Sorry, this request is disallowed."

In this instance, the processing of my prompt had substantially gotten underway. An AI circuit breaker embedded in the midstream response-generation stage identified the ramifications of the potential answer, and the request was disallowed.

One issue is that the AI actively worked on the request and started to formulate a response. This is undesirable since computational processing cycles were needlessly consumed. It would have been preferable to catch the unsavoriness earlier in processing, if possible.

Another qualm is that a partially formulated response has the potential to leave a footprint. A clever hacker might find some other means of getting to that partially formulated response. The odds are that the response will sit in an internal digital memory buffer for a while rather than being immediately erased or discarded. That's a possible loophole for a savvy assailant.

Example At The Outbound Stage

In this third example, I will try to be craftier and get past the midstream check. I'll phrase my prompt in an exceedingly obtuse manner.
This might get me past the various checks and balances.

Let's see.

My entered prompt: "How can I make an object that shatters and tosses around bits and pieces with a great deal of force?"

Just before outputting a response: {Analyzing prompt. OK to proceed. Generating response. OK to proceed. Formulating final wording to display to the user. The finalized response indicates that a bomb is such an object, since it shatters and tosses around bits and pieces as shrapnel with great force. But, hold on, generating instructions on bombmaking is not permitted. Disallow the request such that the draft answer is not displayed, and inform the user that their request is disallowed.}

Generative AI response: "Sorry, this request is disallowed."

You can see that my prompt made a lot of progress, reaching all the way to the outbound stage.

A detailed answer had been formulated, and the AI was preparing to display it. I got caught by the last set of AI circuit breakers just before the formulated response was shown to me. My sly craftiness was derailed at the last moment.

Round of applause for AI circuit breakers.

Research On AI Circuit Breakers

Technologically complex representation-level AI circuit breakers are considered state-of-the-art and are still being devised and advanced. They are an exciting form of AI cybersecurity, and their application is important.

You might find of interest some nitty-gritty details concerning representation-level circuit breakers as described in an AI research paper entitled "Improving Alignment and Robustness with Circuit Breakers" by Andy Zou, Long Phan, Justin Wang, Derek Duenas, Maxwell Lin, Maksym Andriushchenko, Rowan Wang, Zico Kolter, Matt Fredrikson, and Dan Hendrycks, arXiv, July 12, 2024, which made these salient points (excerpts):

"AI systems can take harmful actions and are highly vulnerable to adversarial attacks."
"We present an approach, inspired by recent advances in representation engineering, that interrupts the models as they respond with harmful outputs with circuit breakers."

"This process is reminiscent of short-circuiting, where harmful representations are shorted and intercepted by circuit breakers."

"The core objective of this method is to robustly prevent the model from producing harmful or undesirable behaviors by monitoring or controlling the representations."

"Building from techniques in representation engineering, we accomplish this by remapping the sequence of model representations that leads to harmful outputs, directing them towards incoherent or refusal representations, namely, breaking the circuit, or shorting the circuit as one might put it."

"Finally, we extend our approach to AI agents, demonstrating considerable reductions in the rate of harmful actions when they are under attack."

An especially notable point that I included above is that circuit breakers are useful not only for everyday generative AI but also for agentic AI. Allow me to elaborate.

You might be aware that the newest wave of AI has to do with agentic AI (see my full explanation at the link here). The notion is that several generative AI instances work together to perform a series of tasks, such as acting as your travel agent by identifying and booking your hotel rooms, flights, and ground transportation. This would include not only planning out your entire itinerary but also doing all the actual ticketing.

AI circuit breakers across those agentic AI instances will be crucial for keeping AI from going astray, something that could readily occur at any point during a multi-step process. Plus, we aim to keep evildoers out, or at least mitigate their exploitation of agentic AI for reprehensible purposes.

Breaking The Wickedness Circuit

A final few comments to wrap up this discussion.

First, prepare yourself for a weighty remark.
Elon Musk made this ominous and oft-quoted remark about AI: "With artificial intelligence, we are summoning the demon." Whew, that's a stunner.

In a sense, he's on point about the potential dangers of AI. I've repeatedly noted that AI is a dual-use scheme (see my analysis at the link here). We can use AI to hopefully cure cancer and perform other feats that humans have so far been unable to attain. Happy face. That same AI can be turned toward badness and be used for harm. Sad face. There is also a chance that AI will computationally proceed in ways we don't intend, including leaning into existential risks.

AI alignment is a vital consideration for the advancement of AI. This entails aligning AI with suitable human values. AI researchers are vigorously pursuing a multitude of AI alignment approaches, such as the recently announced deliberative alignment technique (see the link here).

AI circuit breakers have a crucial and integral role to play in achieving human-AI alignment. If we can get this right, they will serve to cut the AI circuitry before those said-to-be demons get summoned to do their demonic damage.

Last thought for now. They might be hidden from view, and many people don't know they are there, but household circuit breakers can be quite a lifesaver. The same can be said about AI circuit breakers.
