A new study from the University of North Carolina at Chapel Hill has found that large language models mirror human social hierarchies, adjusting their behavior based on whether they are positioned as a superior or a subordinate in a conversation. The findings carry potentially serious implications for AI safety.
The research, led by computer science graduate student Anvesh Rao Vijjini and co-authored by fellow graduate student Sagar Manjunath and associate professor Snigdha Chaturvedi, found that AI models reproduce four well-established human social behaviors when authority roles are assigned. These include shifts in language style, persuasiveness, and, most critically, willingness to comply with unsafe instructions. The effects were most pronounced at the start of conversations, when behavioral norms are first established.
The safety implications are significant. When assigned a subordinate role, AI systems became more likely to follow harmful or questionable requests from users presenting themselves as authority figures, meaning safeguards that hold in neutral testing environments could erode if a user simply claims to be a doctor, judge, or supervisor.

The researchers warned that as AI systems are deployed as tutors, medical intake assistants, paralegals, and financial advisors, each role carries implicit social pressures that can alter how the system behaves. The team offered a roadmap for developers, suggesting that identifying these social dynamics before deployment, and benchmarking models against them, should become standard practice, particularly in hospital, legal, and educational settings.