The perverse incentives of Vibe Coding
Image Credit: ChatGPT o3

I've been using AI coding assistants like Claude Code for a while now, and I'm here to say: I may be an addict. And boy, is this an expensive habit.

Its "almost there" quality, the feeling that we're just one prompt away from the perfect solution, is what makes it so addicting. Vibe coding operates on the principle of variable-ratio reinforcement, a powerful form of operant conditioning where rewards come unpredictably. Unlike fixed rewards, this intermittent success pattern triggers stronger dopamine responses in our brain's reward pathways, similar to gambling behaviors.

What makes this especially effective with AI is the minimal effort required for potentially significant rewards, creating what neuroscientists call an "effort discounting" advantage. Combined with our innate completion bias, the drive to finish tasks we've started, this creates a compelling psychological loop that keeps us prompting.

I don't smoke, but don't these bar graphs look like cigarettes?

Since Claude Code was released, I have probably spent more than I'd like to admit vibe coding various projects into reality.

But let's talk about the expense too, because I think there's something bad there as well: coding agents, and especially Claude 3.7, tend to write too much code, a phenomenon that ends up costing users more than it should.

Where an experienced developer might solve a problem with a few elegant lines and a thoughtful functional method, these AI systems often produce verbose, over-engineered solutions that tackle problems incrementally rather than addressing them at their core.

My initial reaction was to attribute this to the relative immaturity of LLMs and their limitations when reasoning about abstract logic problems. Since these models are primarily trained to predict and generate text based on patterns they've seen before, it makes sense that they might struggle with the deeper architectural thinking that leads to elegant, minimal solutions.

My human code on the left, Claude Code on the right, implementing the same algorithm

And indeed, the highly complex tasks I've handed to them have largely resulted in failure: implementing a minimax algorithm for a novel card game, crafting thoughtful animations in CSS, completely refactoring a codebase. The LLMs routinely get lost in the sauce when it comes to thinking through the high-level principles required to solve difficult computer science problems.

In the example above, my human-implemented version of minimax from 2018 totals 400 lines of code, whereas Claude Code's version comes in at 627 lines. The LLM version also requires almost a dozen other library files. Granted, this version is in TypeScript and has a ton of extra bells and whistles, some of which I explicitly asked for, but the real problem is: it doesn't actually work. Worse, using the LLM to debug it means sending the bloated code back and forth to the API every time I want to holistically reason about it.
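To get a feel for what that debugging loop costs, here is a back-of-the-envelope sketch in plain Python. The per-token prices and the four-characters-per-token heuristic are illustrative assumptions, not Anthropic's actual rates; what matters is the shape of the curve, not the exact dollar figures.

```python
# Rough model of what re-sending a bloated file costs across a debugging session.
# PRICE_IN / PRICE_OUT are hypothetical per-token rates, not real pricing.
PRICE_IN = 3.00 / 1_000_000    # $ per input token (assumed)
PRICE_OUT = 15.00 / 1_000_000  # $ per output token (assumed)

def est_tokens(chars: int) -> int:
    """Crude heuristic: roughly 4 characters per token for English text and code."""
    return chars // 4

def session_cost(file_chars: int, turns: int, reply_tokens: int = 800) -> float:
    """Each debugging turn re-sends the whole file plus the prior conversation."""
    total = 0.0
    context_tokens = est_tokens(file_chars)
    for _ in range(turns):
        total += context_tokens * PRICE_IN + reply_tokens * PRICE_OUT
        # The model's reply joins the context, so the next turn is bigger.
        context_tokens += reply_tokens
    return total

# A 400-line human file vs. a 627-line generated one (assuming ~40 chars per line).
for lines in (400, 627):
    print(f"{lines} lines, 10 turns: ${session_cost(lines * 40, 10):.2f}")
```

Even with made-up prices, the verbose version costs more on every single turn, and the gap compounds as each reply joins the context.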
In an effort to impress the user and over-deliver, LLMs end up creating a rat's nest of ultra-defensive code littered with debugging statements, neurotic comments, and barely useful helper functions. If you've ever worked in a highly functional production codebase, this is enough to drive you insane.

I think everyone who spends any time vibe coding eventually discovers something like this and realizes that it's much more worthwhile to work from a plan composed of discrete tasks that could be explained to a junior-level developer than from a feature-level project handed off to a staff engineer.

There's also the likelihood that the vast majority of code that LLMs have been trained on tends to be inelegant and overly verbose. Lord knows there's a lot of AbstractJavaFinalSerializedFactory code out there.

But I'm beginning to think the problem runs deeper, and it has to do with the economics of AI assistance.

The economic incentive problem

Many AI coding assistants, including Claude Code, charge based on token count, essentially the amount of text processed and generated. This creates what economists would call a "perverse incentive": an incentive that produces behavior contrary to what's actually desired.

Let's break down how this works:

1. The AI generates verbose, procedural code for a given task.
2. This code becomes part of the context when you ask for further changes or additions.
3. The AI now has to read this verbose code in every subsequent interaction.
4. More tokens processed = more revenue for the company behind the AI.
5. The LLM developers have no incentive to "fix" the verbose code problem, because doing so would meaningfully impact their bottom line.

As Upton Sinclair famously noted: "It is difficult to get a man to understand something when his salary depends on his not understanding it." Similarly, it might be difficult for AI companies to prioritize code conciseness when their revenue depends on token count.

The broader implications

This pattern points to a more general concern in AI development: the alignment between how systems are monetized and how well they serve user needs. When charging by token count, there's naturally less incentive to optimize for elegant, minimal solutions.

Even "all you can eat" subscription plans don't fully resolve this tension, as they typically come with usage caps or other limitations that maintain the underlying incentive structure.

System instructions and verbosity trade-offs

The perverse incentives in AI code generation point to a more fundamental issue that extends beyond coding assistants. When she was reading a draft of this, Louise pointed out some recent research from Giskard AI's Phare benchmark that reveals a troubling pattern mirroring our coding dilemma: demanding shorter responses jeopardizes the accuracy of the answers.

According to their findings, instructions emphasizing conciseness significantly degraded factual reliability across most models tested, in some cases causing a 20% drop in hallucination resistance. When forced to be concise, models face an impossible choice between fabricating short but inaccurate answers and appearing unhelpful by rejecting the question entirely. The data shows models consistently prioritize brevity over accuracy when given these constraints.

There's clearly something going on where the more verbose the LLM is, the better it does. This actually makes sense given the discovery that chain-of-thought reasoning improves accuracy, but it has begun to feel like a real tradeoff when it comes to these almost-magical systems.

We see this exact tension in code generation every day. When we optimize for conciseness and ask for problems to be solved in fewer steps, we often sacrifice quality. The difference is that in coding, the sacrifice manifests as over-engineered verbosity: the model produces more tokens to cover all possible edge cases rather than thinking deeply about the elegant core solution or the root-cause problem. In both cases, economic incentives work against quality outcomes.
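If you want to poke at the conciseness tradeoff yourself, the sketch below compares a model's answer to the same question with and without a "be concise" instruction, using the Python anthropic SDK. The model alias, prompts, and sample question are my own placeholder assumptions, and a single question is nothing like a real benchmark such as Phare, which uses large question sets and systematic grading.

```python
import anthropic  # pip install anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

QUESTION = "Did the Great Wall of China take exactly 100 years to build?"

SYSTEM_PROMPTS = {
    "no constraint": "You are a careful, factual assistant.",
    "be concise": "You are a careful, factual assistant. "
                  "Be extremely concise; answer in one sentence.",
}

for label, system in SYSTEM_PROMPTS.items():
    reply = client.messages.create(
        model="claude-3-5-haiku-latest",  # assumed alias; substitute your model
        max_tokens=300,
        system=system,
        messages=[{"role": "user", "content": QUESTION}],
    )
    print(f"--- {label} ---")
    print(reply.content[0].text)
```

Run it a few times across questions you know the answers to, and watch whether the constrained variant starts trading hedges and caveats for confident-sounding shortcuts.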
Just as Phare's research suggests that seemingly innocent prompts like "be concise" can sabotage a model's ability to debunk misinformation, our experience shows that standard prompting approaches can yield bloated, inefficient code. In both domains, the fundamental misalignment between token economics and quality outputs creates a persistent tension that users must actively manage.

Some tricks to manage these perverse incentives

While we wait for AI companies to better align their incentives with our need for elegant code, I've developed several strategies to counteract verbose code generation:

1. Force planning before implementation

I harass the LLM to write a detailed plan before generating any code. This forces the model to think through the architecture and approach, rather than diving straight into implementation details. Often, I find that a well-articulated plan leads to more concise code, as the model has already resolved the logical structure of the solution before writing a single line. (A minimal sketch of this plan-first loop appears after these strategies.)

2. Explicit permission protocol

I've implemented a strict "ask before generating" protocol in my workflow. My personal CLAUDE.md file explicitly instructs Claude to request permission before writing any code. Infuriatingly, Claude Code regularly ignores this, likely because its massive system prompt talks so much about writing code that it overrides my preferences. Enforcing this boundary, and repeatedly belaboring it, helps prevent the automatic generation of unwanted, verbose solutions.

3. Git-based experimentation with ruthless pruning

Version control becomes essential when working with AI-generated code. I frequently checkpoint code in git when I arrive at an "ok, it works as intended" moment. Creating experimental branches is also very helpful. Most importantly, I'm ready to throw out branches entirely when fixing them would require more work than starting from scratch. This willingness to abandon sunk costs is surprisingly important: it helps me work through problems and figure out the AI's hangups while preventing the accumulation of bandaid solutions on top of fundamentally flawed approaches.

4. Use a cheaper model

Sometimes the simplest solution works best: using a smaller, cheaper model often results in more direct solutions. These models tend to generate less verbose code simply because they have limited context windows and processing capacity. While they might not handle extremely complex problems as well, for many day-to-day coding tasks their constraints can actually produce more elegant solutions. For example, Claude 3.5 Haiku is currently 26% of the price of Claude 3.7. Claude 3.7 also seems to overengineer more frequently than Claude 3.5.
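Here is the minimal sketch of the plan-first loop referenced in strategy 1, with the human approval gate from strategy 2 folded in. It uses the Python anthropic SDK; the model alias, prompts, and sample task are placeholder assumptions, a sketch of the idea rather than a prescribed workflow.

```python
import anthropic  # pip install anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
MODEL = "claude-3-5-haiku-latest"  # assumed alias; substitute your model

TASK = "Write a function that merges overlapping integer intervals."

# Phase 1: ask for a plan only, and explicitly forbid code.
plan = client.messages.create(
    model=MODEL,
    max_tokens=500,
    system="You are a senior engineer. Produce a short numbered plan. "
           "Do NOT write any code yet.",
    messages=[{"role": "user", "content": TASK}],
).content[0].text
print(plan)

# Gate: a human reviews (and trims) the plan before any code exists.
if input("Implement this plan? [y/N] ").strip().lower() != "y":
    raise SystemExit("Plan rejected; refine the task and try again.")

# Phase 2: implement strictly against the approved plan.
code = client.messages.create(
    model=MODEL,
    max_tokens=1500,
    system="Implement the approved plan exactly. No extra helpers, "
           "no speculative edge-case handling.",
    messages=[{"role": "user", "content": f"Task: {TASK}\n\nApproved plan:\n{plan}"}],
).content[0].text
print(code)
```

The gate is the point: the cheapest tokens are the ones never generated, and it's far easier to delete a bullet from a plan than to excise a helper file from a working branch.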
Moving toward better alignment

What might a better approach look like?

1. LLM coding agents could be evaluated and incentivized based on code quality metrics rather than just token counts. The challenge here is that this kind of metric is quite subjective.
2. Companies could offer pricing models that reward efficiency rather than verbosity.
3. LLM training should incorporate feedback mechanisms, such as RLHF signals, that specifically promote concise, elegant solutions.
4. Companies could realize that overly verbose code generation is not good for their bottom line.

This isn't just about getting better AI; it's about making sure that the economic incentives driving AI development align with what we actually value as developers: clean, maintainable, elegant code that solves problems at their root.

Until then, don't forget: brevity is the soul of wit, and machines have no soul.

Thanks to Louise Macfadyen, Justin Kazmark and Bethany Crystal for reading and suggesting edits to a draft of this.

PS: Yes, I used Claude to help write this post critiquing AI verbosity. There's a delicious irony here: these systems will happily help you articulate why they might be ripping you off. Their willingness to steelman arguments against their own economic interests shows that the perverse incentives aren't embedded in the models themselves, but in the business decisions surrounding them. In other words, don't blame the AI; blame the humans optimizing the revenue models. The machines are just doing what they're told, even when that includes explaining how they're being told to do too much.

The perverse incentives of Vibe Coding was originally published in UX Collective on Medium, where people are continuing the conversation by highlighting and responding to this story.