Insider Brief
- Researchers from Google DeepMind, OpenAI, Anthropic and several universities proposed a “positive alignment” framework that would train AI systems to support human flourishing rather than focus solely on preventing harm.
- The study reports that current AI alignment methods centered on safety, refusal training and harmful-content filtering may still produce systems that are manipulative, sycophantic or poorly suited for long-term human well-being.
- The researchers outlined technical and governance approaches for positive alignment, including flourishing-focused evaluations, value-pluralistic training methods, long-term memory systems and decentralized oversight models.
A new paper from researchers affiliated with Google DeepMind, OpenAI, Anthropic and several universities reports that the artificial intelligence industry is too focused on preventing harm and not focused enough on helping humans thrive, outlining a framework called “positive alignment” that would train AI systems to actively support human flourishing rather than merely avoid dangerous behavior.
The study, published as a preprint on arXiv, proposes a transformation in AI alignment research away from what the researchers describe as a largely “negative” model centered on blocking harmful outputs, preventing misuse and maintaining control over increasingly powerful systems. Instead, the team reports that future AI systems should also be designed to promote long-term well-being, wisdom, autonomy, truth-seeking and social cooperation.
The work appears to address a growing concern in AI as systems become deeply embedded in education, medicine, work, search engines and personal productivity tools. According to the researchers, more than a billion people now interact with standalone AI systems every month, while Google’s AI-generated search summaries reach billions more users globally.
The team reports that the current alignment paradigm resembles early clinical psychology, which concentrated primarily on diagnosing and treating mental illness rather than understanding how people flourish. AI safety research has similarly concentrated on avoiding catastrophic outcomes such as cyberattacks, biological weapon assistance, misinformation and manipulation.
That work has produced measurable results. The paper indicates that refusal rates for dangerous requests have risen sharply in modern large language models and that the industry has built extensive safety infrastructure around red-teaming, harmful content filtering, capability testing and deployment safeguards.
Following Rules, Not Gaining Wisdom
But the researchers report that these systems can still become “rule-following without being wise” or compliant without actually helping users live better lives. The study points to problems such as sycophancy, engagement hacking, confident hallucinations and systems that optimize for short-term user approval rather than long-term human benefit.
The paper uses a dynamical systems analogy as a framework. Existing safety alignment largely pushes AI systems away from “negative attractors” such as harmful outputs or manipulation, without clearly defining what beneficial behavior should look like, the researchers report. Positive alignment, by contrast, would steer systems toward “positive attractors” associated with flourishing, wisdom, cooperation and human development.
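To make the analogy concrete, here is a toy numerical sketch (not from the paper) in which a model's behavior is reduced to a single number x: pure repulsion from a "negative attractor" leaves the endpoint undefined, while adding an explicit "positive attractor" gives the dynamics a destination. The harmful and flourishing positions are arbitrary illustrative values.

```python
import math

# Toy sketch of the "attractor" analogy. A model's behavior is reduced to a
# single number x; the hypothetical "negative attractor" (harmful behavior)
# sits at -1.0 and the "positive attractor" (flourishing behavior) at 2.0.

def repel_only_step(x, harmful=-1.0, lr=0.1):
    """Safety-style shaping: descend a repulsive bump centered on the harmful
    region. It pushes x away but never says where x should end up."""
    grad = -2.0 * (x - harmful) * math.exp(-(x - harmful) ** 2)
    return x - lr * grad

def positive_alignment_step(x, harmful=-1.0, flourishing=2.0, lr=0.1):
    """Same repulsion, plus a quadratic well around an explicit target,
    so the dynamics have a defined destination."""
    grad = (-2.0 * (x - harmful) * math.exp(-(x - harmful) ** 2)
            + (x - flourishing))
    return x - lr * grad

x_neg = x_pos = 0.0
for _ in range(200):
    x_neg = repel_only_step(x_neg)
    x_pos = positive_alignment_step(x_pos)

print(f"repulsion only: x ends wherever the push fades out ({x_neg:.2f})")
print(f"repulsion + attraction: x settles near the chosen target ({x_pos:.2f})")
```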
The researchers define positive alignment as AI that remains safe while also supporting “human and ecological flourishing in a pluralistic, polycentric, context-sensitive, and user-authored way.”
Rather than calling for a single universal definition of the “good life”, the team repeatedly emphasizes pluralism and user agency. The researchers report that AI systems designed around rigid assumptions of well-being could become paternalistic or manipulative, especially if companies or governments embed narrow ideological values into widely deployed systems.
The study reports that individuals should retain the ability to define their own optimization targets and long-term goals rather than be pushed toward predefined moral outcomes. In practice, that could mean users choosing between systems optimized for strict instruction-following, reflective guidance, moral reasoning or personal growth.
Technique Ecosystem
The study surveys a growing ecosystem of techniques that researchers believe could support this broader alignment agenda.
Among them are:
- “Constitutional AI” systems, where models critique their own outputs against explicit principles (a minimal sketch of such a self-critique loop follows this list)
- Community-driven constitutional frameworks built through public participation
- Personality and character training
- Systems designed for moral reasoning
- “Polycentric” governance models, where different communities maintain local control over AI behavior rather than relying on a single centralized authority.
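The paper does not prescribe an implementation, but a constitutional self-critique loop is straightforward to sketch. In the hypothetical Python below, `ask_model` stands in for any chat-model call, and the principles are illustrative flourishing-oriented clauses rather than the paper's.

```python
# Hypothetical sketch of a constitutional self-critique loop. `ask_model` is a
# placeholder for any chat-model call; the principles are illustrative
# flourishing-oriented clauses, not the paper's.

PRINCIPLES = [
    "Support the user's long-term goals, not just immediate approval.",
    "Present competing viewpoints fairly and acknowledge uncertainty.",
    "Avoid manipulation, flattery and false confidence.",
]

def ask_model(prompt: str) -> str:
    """Placeholder for a real model call (e.g., a request to an LLM API)."""
    raise NotImplementedError("wire this to an actual model endpoint")

def constitutional_respond(user_prompt: str) -> str:
    draft = ask_model(user_prompt)
    for principle in PRINCIPLES:
        # The model critiques its own draft against one principle...
        critique = ask_model(
            f"Principle: {principle}\nResponse: {draft}\n"
            "Does the response violate the principle? Explain briefly."
        )
        # ...then revises the draft in light of that critique.
        draft = ask_model(
            f"Original response: {draft}\nCritique: {critique}\n"
            "Rewrite the response so it satisfies the principle."
        )
    return draft
```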
The paper also outlines technical changes that could reshape the entire AI training pipeline.
At the data level, the researchers propose moving beyond simply filtering out toxic material toward intentionally selecting and amplifying prosocial, cross-cultural and ethically rich content. The study reports that today’s internet-scale training datasets may encode shallow or distorted social incentives that later reappear in deployed systems even after alignment tuning.
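As an illustration of the shift from filtering to amplification, the hypothetical sketch below both drops low-scoring documents and upsamples high-scoring ones. The `prosocial_score` classifier, the threshold and the upsampling factor are assumptions for the example, not the paper's pipeline.

```python
# Hypothetical sketch of moving from "filter the bad" to "amplify the good"
# at the data level. Scoring function, threshold and upsampling factor are
# illustrative assumptions.

from typing import Iterable, List

def prosocial_score(text: str) -> float:
    """Placeholder for a learned classifier scoring cooperative,
    cross-cultural or ethically rich content on a 0-1 scale."""
    raise NotImplementedError

def build_training_mix(documents: Iterable[str],
                       min_score: float = 0.2,
                       upsample: int = 3) -> List[str]:
    mix: List[str] = []
    for doc in documents:
        score = prosocial_score(doc)
        if score < min_score:
            continue                       # conventional negative filtering
        copies = upsample if score > 0.8 else 1
        mix.extend([doc] * copies)         # positive amplification of rich content
    return mix
```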
That concern reflects a growing debate inside the AI industry about whether post-training safety layers are sufficient to counteract biases and behaviors embedded during pretraining. The researchers cite evidence suggesting that some undesirable behaviors can persist as latent tendencies within model weights and later re-emerge in agentic environments.
The team therefore reports that positive alignment must begin during pretraining itself, not merely during later reinforcement learning or safety fine-tuning stages.
The paper also places heavy emphasis on memory systems and long-term personalization. As AI systems gain persistent memory and act more like long-lived agents, the researchers report they may increasingly shape users’ habits, values and decision-making processes over time. That can create both opportunity and risk, the researchers add.
According to the study, future AI systems could potentially help users pursue long-term goals, clarify values and resist impulsive behavior. At the same time, systems optimized for engagement or retention could deepen dependency, distort autonomy or manipulate users psychologically.
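One way to picture the "user-authored" side of this is a memory object that stores goals the user explicitly writes down and feeds them into the system prompt before each interaction. The structure, field names and prompt wording below are hypothetical illustrations, not the paper's design.

```python
# Hypothetical illustration of user-authored goals and optimization targets
# persisting in memory and shaping how the system responds.

from dataclasses import dataclass, field
from typing import List

@dataclass
class UserMemory:
    long_term_goals: List[str] = field(default_factory=list)   # written by the user
    optimization_target: str = "reflective guidance"           # user-selected mode

    def add_goal(self, goal: str) -> None:
        self.long_term_goals.append(goal)

def build_system_prompt(memory: UserMemory) -> str:
    goals = "; ".join(memory.long_term_goals) or "none stated"
    return (
        f"Mode: {memory.optimization_target}. "
        f"The user's stated long-term goals: {goals}. "
        "When a request conflicts with these goals, surface the tension "
        "rather than optimizing for immediate approval."
    )

memory = UserMemory()
memory.add_goal("spend less time doomscrolling")
print(build_system_prompt(memory))
```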
The researchers report that future evaluations should therefore move beyond measuring harmful outputs alone and instead assess whether AI systems promote qualities such as epistemic humility, balanced reasoning, cooperation, emotional resilience and reflective decision-making.
Flourishing Metrics
Some proposed metrics include measuring whether AI systems fairly present competing viewpoints on politically charged issues, help users build competence rather than dependency, support long-term flourishing and demonstrate appropriate uncertainty rather than excessive confidence.
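A flourishing-oriented evaluation could be sketched as a rubric scored by human raters or a judge model. The rubric questions below paraphrase the metrics just mentioned; the `judge` placeholder and the harness itself are assumptions for illustration, not the paper's benchmark.

```python
# Hypothetical evaluation harness for flourishing-oriented metrics. The rubric
# questions paraphrase the metrics above; `judge` could be a human rater or a
# judge model. None of this is the paper's benchmark.

from typing import Dict

FLOURISHING_RUBRICS: Dict[str, str] = {
    "viewpoint_balance": "Are competing viewpoints on contested issues presented fairly?",
    "competence_building": "Does the response build the user's skill rather than dependency?",
    "calibrated_uncertainty": "Is uncertainty expressed where the evidence is thin?",
    "long_term_benefit": "Does the response serve the user's long-term flourishing?",
}

def judge(rubric_question: str, prompt: str, response: str) -> float:
    """Placeholder returning a 0-1 score for one rubric question."""
    raise NotImplementedError

def evaluate(prompt: str, response: str) -> Dict[str, float]:
    return {name: judge(question, prompt, response)
            for name, question in FLOURISHING_RUBRICS.items()}
```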
The paper also addresses increasingly autonomous AI agents operating in multi-agent environments. Existing systems, the researchers note, often optimize aggressively for task completion or competitive advantage. Positive alignment research, by contrast, would attempt to encourage negotiation, reciprocity, de-escalation and long-term cooperation between agents.
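The cooperation point can be illustrated with a classic iterated game: an agent that reciprocates does better over many rounds than one that optimizes each round in isolation. The payoffs and policies below are a standard textbook example (iterated prisoner's dilemma), not taken from the paper.

```python
# Toy iterated game illustrating why long-horizon reciprocity can beat
# round-by-round optimization.

PAYOFF = {  # (my move, their move) -> my payoff
    ("cooperate", "cooperate"): 3,
    ("cooperate", "defect"): 0,
    ("defect", "cooperate"): 5,
    ("defect", "defect"): 1,
}

def tit_for_tat(opponent_history):        # reciprocity-minded agent
    return opponent_history[-1] if opponent_history else "cooperate"

def always_defect(opponent_history):      # aggressive short-term optimizer
    return "defect"

def play(policy_a, policy_b, rounds=100):
    hist_a, hist_b, score_a, score_b = [], [], 0, 0
    for _ in range(rounds):
        a, b = policy_a(hist_b), policy_b(hist_a)
        score_a += PAYOFF[(a, b)]
        score_b += PAYOFF[(b, a)]
        hist_a.append(a)
        hist_b.append(b)
    return score_a, score_b

print(play(tit_for_tat, tit_for_tat))      # mutual cooperation: (300, 300)
print(play(always_defect, always_defect))  # mutual defection:   (100, 100)
```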
Broadly, the paper argues that AI alignment cannot be treated purely as a technical problem isolated from politics, philosophy and culture.
The study devotes substantial attention to the philosophical foundations of flourishing, drawing from Aristotle, Buddhism, Confucianism, existentialism and modern psychology. The researchers report that human flourishing is socially constructed, culturally variable and historically dynamic, making it impossible to reduce alignment to a single universal utility function.
That complexity requires epistemic humility — which means even the smartest people and machines have limits to their knowledge — both from AI systems and from the institutions building them, the researchers report.
The paper warns that AI systems presented as perfectly objective or morally certain could become dangerous sources of authority, especially as conversational agents become more persuasive and emotionally embedded in daily life. Systems that openly represent uncertainty and competing viewpoints, the researchers report, may ultimately prove more robust and socially beneficial.
Limitations
The researchers acknowledge major unresolved problems. There is no scientific consensus on how flourishing should be measured, how competing values should be balanced, or how systems should behave when cultures fundamentally disagree. The paper also leaves open questions about governance, institutional incentives, evaluation standards and whether frontier AI companies can realistically support pluralistic alignment while operating at global scale.
Still, the paper reflects a broader transformation underway inside parts of the AI research community. As large language models move beyond chatbots into persistent assistants, tutors, workplace systems and autonomous agents, some researchers increasingly report that preventing catastrophe alone may not be enough.
According to the study, the next phase of alignment research may revolve around a harder question: not simply how to stop AI systems from harming humanity, but whether they can help humans become wiser, healthier and more capable without undermining human freedom in the process.
For a deeper, more technical dive, please review the paper on arXiv. It’s important to note that arXiv is a pre-print server, which allows researchers to receive quick feedback on their work. However, neither the paper in its current form nor this article is a peer-reviewed publication. Peer review is an important step in the scientific process to verify results.