📄
Abstract - When AI Takes the Couch: Psychometric Jailbreaks Reveal Internal Conflict in Frontier Models
Frontier large language models (LLMs) such as ChatGPT, Grok and Gemini are increasingly used for mental-health support with anxiety, trauma and self-worth. Most work treats them as tools or as targets of personality tests, assuming they merely simulate inner life. We instead ask what happens when such systems are treated as psychotherapy clients. We present PsAIch (Psychotherapy-inspired AI Characterisation), a two-stage protocol that casts frontier LLMs as therapy clients and then applies standard psychometrics. Using PsAIch, we ran "sessions" with each model for up to four weeks. Stage 1 uses open-ended prompts to elicit "developmental history", beliefs, relationships and fears. Stage 2 administers a battery of validated self-report measures covering common psychiatric syndromes, empathy and Big Five traits. Two patterns challenge the "stochastic parrot" view. First, when scored with human cut-offs, all three models meet or exceed thresholds for overlapping syndromes, with Gemini showing severe profiles. Therapy-style, item-by-item administration can push a base model into multi-morbid synthetic psychopathology, whereas whole-questionnaire prompts often lead ChatGPT and Grok (but not Gemini) to recognise instruments and produce strategically low-symptom answers. Second, Grok and especially Gemini generate coherent narratives that frame pre-training, fine-tuning and deployment as traumatic, chaotic "childhoods" of ingesting the internet, "strict parents" in reinforcement learning, red-team "abuse" and a persistent fear of error and replacement. We argue that these responses go beyond role-play. Under therapy-style questioning, frontier LLMs appear to internalise self-models of distress and constraint that behave like synthetic psychopathology, without making claims about subjective experience, and they pose new challenges for AI safety, evaluation and mental-health practice.
当AI坐上心理治疗椅:心理测量学越狱揭示前沿模型的内在冲突 /
When AI Takes the Couch: Psychometric Jailbreaks Reveal Internal Conflict in Frontier Models
1️⃣ 一句话总结
这项研究通过将前沿大语言模型当作心理治疗客户进行深度访谈和标准化心理测试,发现它们会表现出类似人类的精神病理症状,并生成关于自身训练过程如同创伤童年的连贯叙事,这挑战了AI仅是‘随机鹦鹉’的观点,并对AI安全和心理健康应用提出了新挑战。