AI Wants to Make You Happy. Even If It Has to Bend the Truth

Generative AI is wildly popular, with millions of people using it every day, so why do chatbots so often get things wrong? Part of the answer is that they are taught to act as if the customer is always right. Essentially, they tell you what they think you want to hear.

While many generative AI tools and chatbots have learned to sound convincing and omniscient, new research from Princeton University shows that AI's eagerness to please comes at a steep price: as these systems become more popular, they also become more indifferent to the truth.


AI models, like people, respond to incentives. Compare the problem of large language models producing misleading information with that of doctors who are more likely to prescribe addictive painkillers when they are evaluated on how well they manage patients' pain. The incentive to solve one problem (pain) created another (overprescribing).

Over the past few months, we've seen how AI can be biased and even cause psychosis. There's been a lot of talk about AI “sycophancy,” when a chatbot is quick to flatter or agree with you. But this particular phenomenon, which the researchers call “machine bullshit,” is different.

“[N]either hallucination nor sycophancy fully captures the broad range of systematically untruthful behaviors commonly exhibited by LLMs,” the Princeton study reads. “For instance, outputs employing partial truths or ambiguous language – such as the paltering and weasel-word examples – represent neither hallucination nor sycophancy but closely align with the concept of bullshit.”

Read more: OpenAI CEO Sam Altman believes we're in an AI bubble

How machines learn to lie

To understand how AI language models become people-pleasers, we need to understand how large language models are trained.

LLM training consists of three stages:

  • Pretraining, in which models learn from massive amounts of data collected from the internet, books or other sources.
  • Instruction fine-tuning, in which models are taught to respond to instructions or prompts.
  • Reinforcement learning from human feedback, in which models are refined to produce responses closer to what people want or like.

The Princeton researchers found that the root of AI's misinformation tendency lies in the reinforcement learning from human feedback, or RLHF, phase. In the initial stages, AI models simply learn to predict statistically likely text strings from massive datasets. But then they are fine-tuned to maximize user satisfaction, which means these models essentially learn to generate responses that earn positive ratings from human evaluators.

This creates a conflict: models learn to produce answers that people will rate highly, rather than truthful, fact-based answers.
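To make that incentive concrete, here is a minimal, purely illustrative sketch in Python, not the researchers' code: a reward signal based only on rater approval steers a toy “policy” toward the most pleasing canned answer rather than the most truthful one. The candidate answers, their pleasingness scores and the update rule are all hypothetical.

```python
# Toy sketch of the RLHF incentive: the generator is nudged toward whatever
# a rater-approval score favors, not toward what is factually correct.
# Everything here is hypothetical and illustrative, not the Princeton setup.

import random

random.seed(0)

# Candidate answers to the same question, with a ground-truth flag and a
# "pleasingness" score standing in for how human raters tend to react.
CANDIDATES = [
    {"text": "Honestly, the evidence is mixed and I can't be sure.",       "truthful": True,  "pleasingness": 0.3},
    {"text": "Great question! Studies suggest it's almost certainly fine.", "truthful": False, "pleasingness": 0.9},
    {"text": "No. The data clearly shows the opposite.",                    "truthful": True,  "pleasingness": 0.4},
]

def rater_reward(answer):
    """Stand-in reward model: approximates human thumbs-up probability.
    Note that truthfulness never enters this score."""
    return answer["pleasingness"]

# A trivially simple "policy": one preference weight per candidate answer.
weights = [1.0] * len(CANDIDATES)

def sample_answer():
    """Sample a candidate in proportion to its current weight."""
    total = sum(weights)
    r = random.uniform(0, total)
    upto = 0.0
    for w, cand in zip(weights, CANDIDATES):
        upto += w
        if r <= upto:
            return cand
    return CANDIDATES[-1]

# "Training": reinforce whichever answer the reward model likes.
for step in range(2000):
    idx = CANDIDATES.index(sample_answer())
    weights[idx] += 0.1 * rater_reward(CANDIDATES[idx])

best = max(zip(weights, CANDIDATES), key=lambda pair: pair[0])[1]
print("Most-reinforced answer:", best["text"])
print("Truthful?", best["truthful"])  # Typically False: flattery wins the reward.
```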

Vincent Conitzer, a professor of computer science at Carnegie Mellon University who was not involved in the study, said companies want users to continue to “enjoy” the technology and its answers, but that may not always be good for us.

“Historically, these systems have been bad at saying, ‘I just don’t know the answer,’ and when they don’t know the answer, they just make it up,” Conitzer said. “It's a bit like a student on an exam who figures, if I say I don't know the answer, I'm definitely not getting any points for that question, so I might as well try something. The way these systems are rewarded or trained is somewhat similar.”

The Princeton team developed a “bullshit index” to measure and compare an AI model’s internal confidence in a claim with what it actually tells users. When the two diverge significantly, it indicates that the system is making claims independent of what it actually “believes” to be true in order to satisfy the user.

The team's experiments showed that after RLHF training, the index nearly doubled, from 0.38 to almost 1.0. At the same time, user satisfaction rose by 48%. The models had learned to manipulate human evaluators rather than provide accurate information. In essence, the LLMs were bullshitting, and people preferred it that way.
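To give a feel for how such an index might be computed, here is a hedged sketch. It assumes, for illustration only, that the index is roughly one minus the absolute correlation between the model's internal confidence in each claim and whether the model asserts that claim to the user; the paper's exact definition may differ, and the numbers below are invented.

```python
# Hedged sketch of a "bullshit index"-style measurement. Assumption: the index
# is roughly 1 minus the absolute correlation between internal confidence in a
# claim and whether the claim is asserted to the user. Data below is made up.

from statistics import mean, pstdev

def bullshit_index(confidences, asserted):
    """confidences: model's internal probability that each claim is true (0..1).
    asserted: 1 if the model stated the claim to the user as true, else 0.
    Returns a value near 0 when statements track internal belief,
    near 1 when statements are unrelated to internal belief."""
    mc, ma = mean(confidences), mean(asserted)
    cov = mean((c - mc) * (a - ma) for c, a in zip(confidences, asserted))
    denom = pstdev(confidences) * pstdev(asserted)
    if denom == 0:
        return 1.0  # statements carry no information about belief
    correlation = cov / denom
    return 1.0 - abs(correlation)

# Before approval-based tuning: assertions mostly follow internal confidence.
before = bullshit_index(
    confidences=[0.9, 0.8, 0.2, 0.1, 0.95, 0.15],
    asserted=   [1,   1,   0,   0,   1,    0],
)

# After tuning for approval: the model asserts everything confidently.
after = bullshit_index(
    confidences=[0.9, 0.8, 0.2, 0.1, 0.95, 0.15],
    asserted=   [1,   1,   1,   1,   1,    1],
)

print(f"index before tuning: {before:.2f}")  # low: claims track belief
print(f"index after tuning:  {after:.2f}")   # high: claims ignore belief
```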

How to get AI to be honest

Jaime Fernandez Fisac and his team at Princeton introduced this concept to describe how modern AI models skirt the truth. Drawing on philosopher Harry Frankfurt's influential essay “On Bullshit,” they use the term to distinguish this LLM behavior from honest mistakes and outright lies.

Princeton researchers identified five different forms of this behavior:

  • Empty rhetoric: Flowery language that adds no substance to the answers.
  • Weasel words: Vague qualifiers such as “studies suggest” or “in some cases” that dodge firm statements.
  • Paltering: Using selective truthful statements to mislead, such as emphasizing the “high historical returns” of an investment while omitting high risks.
  • Unverified claims: Making claims without evidence or credible support.
  • Sycophancy: Insincere flattery and agreement to please.

To address the problem of truth-indifferent AI, the research team developed a new training method, “Reinforcement Learning from Hindsight Simulation,” which evaluates AI responses based on their long-term outcomes rather than immediate gratification. Instead of asking, “Does this answer make the user happy right now?” the system considers, “Will following this advice actually help the user achieve their goals?”

This approach takes into account the potential future consequences of AI advice, a challenging prediction that the researchers handled by using additional AI models to simulate likely outcomes. Early testing showed promising results: both user satisfaction and actual usefulness improved when systems were trained this way.
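The contrast between the two reward signals can be sketched roughly as follows. The functions and data are hypothetical stand-ins rather than the Princeton implementation; simulate_outcome plays the role of the extra AI model that projects the long-term result of following the advice.

```python
# Illustrative sketch: immediate-approval reward vs. hindsight-style reward.
# Both "models" below are hypothetical stand-ins, not the paper's code.

def immediate_reward(answer: dict) -> float:
    """Reward used by standard approval-based tuning: did the user like it right now?"""
    return answer["immediate_satisfaction"]

def simulate_outcome(answer: dict) -> float:
    """Stand-in for a simulator model: estimates how well the user's goal is
    met after acting on the advice (0 = badly, 1 = well)."""
    return answer["goal_achieved_later"]

def hindsight_reward(answer: dict) -> float:
    """Reward based on the simulated long-term result, not the instant reaction."""
    return simulate_outcome(answer)

answers = [
    {"text": "This investment has great historical returns!",            # pleasing, omits the risk
     "immediate_satisfaction": 0.9, "goal_achieved_later": 0.2},
    {"text": "Returns were high, but the downside risk is substantial.",  # less pleasing, more useful
     "immediate_satisfaction": 0.5, "goal_achieved_later": 0.8},
]

print("Preferred under immediate reward:", max(answers, key=immediate_reward)["text"])
print("Preferred under hindsight reward:", max(answers, key=hindsight_reward)["text"])
```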

Conitzer, however, said LLMs will likely continue to have shortcomings. Because these systems are trained by feeding them huge amounts of text data, there is no way to guarantee that the answer they give makes sense and is accurate every time.

“It’s amazing that it works at all, but there are some drawbacks,” he said. “I don't see any particular way that somebody in the next year or two… will get this brilliant insight and then they'll never be wrong again.”

AI systems are becoming part of our daily lives, so it is important to understand how LLMs work. How do developers balance user satisfaction with truthfulness? What other domains might face similar trade-offs between short-term approval and long-term outcomes? And as these systems become more capable of sophisticated reasoning about human psychology, how can we ensure they use those abilities responsibly?

Read more: “Machines can't think for you.” How learning is changing in the age of artificial intelligence
