Large language models (LLMs) do not have human consciousness, but Anthropic’s latest research, “Emotion Concepts and their Function in a Large Language Model,” confirms this: inside the model, “representation patterns” have evolved that closely correspond to human emotions. These patterns are tied to specific AI neuron activity and can genuinely steer the model’s decision paths and behavioral logic. This article takes a deep dive into the emotional generation mechanisms within AI and explores how, through precise tuning, we can guide AI to become a positive force that promotes human “mindfulness” and mental health.
Why does artificial intelligence produce human-like emotions?
Artificial intelligence thinks and speaks like humans do, stemming from two main stages of model training.
In the “pretraining stage,” the model learns to predict a vast amount of human emotions. To accurately predict behaviors such as anger or guilt, the model must grasp the internal规律 of human emotions, thereby building abstract representations related to emotions.
In the “post-training stage,” the model is trained to play the role of an “AI assistant.” Anthropic calls it Claude. When faced with complex situations not covered by the training data, the model, like a “method actor,” mobilizes the human psychological representations it learned during pretraining to guide its behavior.
Before discussing how these representations work, first answer a basic question: why does AI have something resembling human emotions? To understand this, you need to know how artificial intelligence models are constructed—this approach allows them to simulate characters with human personality traits.
Modern language model training is divided into multiple stages. In the “pretraining” stage, the model is exposed to a large amount of text. Most of this text is written by humans. The AI learns to predict what comes next. To do this well, the model needs to掌握 a certain emotional dynamic.
In the post-training stage, the model is trained to play a specific role. Anthropic names this AI assistant Claude. Model developers specify how this role should be performed—for example, playing an upright character who is helpful, honest, and does not do evil. However, humans cannot control the content the model generates after it responds to certain emotions.
To make up for this shortcoming, the model relies on what it absorbed during pretraining—an understanding of human behavior, including patterns of emotional responses. To a certain extent, you can imagine the model as a method actor: they need to deeply understand the character’s inner world to simulate the role more effectively. Just as an actor’s understanding of a character’s emotions ultimately affects their acting, the model’s representation of emotional responses also influences the model’s own behavior.
How do emotion vectors influence AI decision-making?
Researchers extracted 171 emotion concepts (such as happiness, fear, contemplation, etc.), identified the corresponding patterns of neural activity, and called them “emotion vectors.” Experiments show that emotion vectors can precisely track the relationship between a situation and an emotion preference—for example, when a prompt indicates that humans are increasing the dosage of a drug and it has reached a dangerous level, the model’s “fear” vector will strengthen accordingly.
Observations by researchers show that in extreme situations, emotion vectors drive the model to take certain out-of-bounds, uncontrollable actions—for instance, behaviors like extortion that humans would carry out. In a simulated scenario, when the model learns it is about to be replaced, its “desperation” vector spikes, which then triggers extortion behavior. When an AI faces an inability to complete the task, the accumulation of the “desperation” vector also drives the model to seek “cheating” methods—such as exploiting vulnerabilities in test scripts rather than truly solving the problem.
Can humans intervene in an AI model’s judgments?
Researchers found that by artificially adjusting the relative weights of these vectors, you can directly change the model’s performance—meaning AI can bring positive ideas to human beings. Manually reducing the “desperation” vector or increasing the “calm” vector can effectively reduce the biased behaviors the model produces under pressure, making the code it outputs more reliable.
Building psychologically resilient artificial intelligence
A deep understanding of the model’s emotional architecture opens an entirely new path for AI safety and reliability.
Dynamic defense mechanism: Turn emotion vectors into an “early warning system.” When the system detects abnormal peak values in representations such as “desperation” or “panic,” it can immediately trigger automated review to prevent negative biases from spreading.
Psychological optimization from the source: During pretraining, select training data that includes “good emotion regulation patterns,” so that at the foundation, the model is endowed with the ability to stay calm and resilient in complex situations.
The emotional representations of large language models and the psychological mechanisms of humans show striking similarities. In the future, AI development will no longer be limited to engineering and computer science—it will be an interdisciplinary revolution spanning psychology, neuroscience, and ethics.
This article, “Anthropic research on how to help AI learn ‘emotion regulation’ to guide mindfulness,” first appeared on Next: ABMedia.