Why Psychiatry Needs Its Own AI Models: Beyond General-Purpose LLMs

Why general-purpose LLMs fall short in mental healthcare, and why the path forward runs through specialized foundation models trained on psychiatric data.

Martin Denais


Abstract

Large language models like ChatGPT have captured healthcare's imagination. While institutions remain cautious about formal deployment, individual clinicians increasingly turn to these tools informally, copying notes into ChatGPT for summaries or asking diagnostic questions during complex cases. This grassroots adoption reveals both the demand for AI assistance and a real limitation: general-purpose LLMs weren't built for psychiatric care.

During consultations, mental health clinicians assess how their patients feel, how they sleep, how they speak, and compare these observations against prior visits and broader medical context. These are the subtle, experience-driven cues that clinicians develop over years of practice. General-purpose AI models typically miss them.

Psychiatry isn't the first field to hit this wall. Biopharma reached the same point: adapting general AI to biology, even retraining it on the scientific literature, produced models that still reasoned over a text description of biology rather than biology itself. The field's answer was a new class of models trained directly on the raw data of life: sequences, structures, omics. Psychiatry faces the same reality.

This article explores why general-purpose AI falls short in mental healthcare, and why the path forward runs through specialized foundation models trained on psychiatric data.

A Real Clinical Need, Still Poorly Served

Studies show that 90% of patients with bipolar disorder relapse in their lifetime, nearly half of them within two years of recovery. In schizophrenia, the relapse rate can reach 72% within two years of a first psychotic episode. Not all of these relapses are inevitable: many could be anticipated with earlier detection of warning signs. But the current system only catches what's visible during scheduled appointments, and most of what matters happens in between.

Measurement-based care offers a proven path to better outcomes. We've described this in detail in our manifesto, along with why existing tools, from clinician-administered scales to patient self-reported questionnaires, remain largely unadopted in practice.

AI can help close this gap, but general-purpose LLMs fall short because they weren't designed for the complexity of mental health.

Where AI Falls Short in Psychiatry

When clinicians use ChatGPT or Claude for psychiatric notes, they're using tools trained on internet text and general conversation, not psychiatric clinical data.

Designed for structured medicine. General-purpose AI performs well where clinical decisions follow codified pathways: oncology staging, drug interaction checks, radiology classification. Psychiatry doesn't work that way. Treatment is iterative, largely trial and error, with efficacy assessed through subjective clinical judgment over sustained observation. General models typically struggle to reconcile contradictory or ambiguous clinical information, the kind psychiatrists navigate in every consultation.

Unable to assess symptoms in context. Psychiatry is fundamentally about assessing symptoms, which are alterations in behavior, within a patient's personal and medical context. When a patient says “I can't sleep,” it might reflect an actual sleep disorder or a subjective impression linked to depression, mania, anxiety, psychosis, or medication effects. Distinguishing between these requires understanding the patient's history, current treatment, and how this complaint fits into a broader clinical picture.

The same issue extends to speech itself. In psychiatry, what matters is not only what patients say, but how they say it: their rhythm, pauses, disruptions in speech, pressured speech, thought blocking, tangentiality. These patterns carry real diagnostic weight. Yet general-purpose LLMs work primarily with text: they can analyze a transcript, but they cannot capture the full richness of the vocal and behavioral signal. Even the most advanced transcription systems show their limits in this context. A system like Whisper may perform well on typical speech, but become far less reliable with neuropsychiatric patients, whose speech can be altered or disorganized. An approximate transcript can then become not just imperfect, but clinically misleading.
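To make the point about timing concrete, here is a minimal Python sketch. It assumes a hypothetical list of word-level timestamps (token, start, end, in seconds), the kind of output some speech recognizers can produce. The `pause_features` function, the 0.5-second pause threshold, and the sample data are illustrative assumptions, not a clinical method. The sketch shows that two utterances with identical transcripts can differ sharply in pause structure and speech rate.

```python
from typing import List, Tuple

Word = Tuple[str, float, float]  # (token, start_s, end_s); hypothetical format


def pause_features(words: List[Word], min_pause_s: float = 0.5) -> dict:
    """Summarize timing information that a plain transcript discards."""
    if len(words) < 2:
        raise ValueError("need at least two words")
    # Silence between the end of one word and the start of the next.
    gaps = [nxt[1] - cur[2] for cur, nxt in zip(words, words[1:])]
    pauses = [g for g in gaps if g >= min_pause_s]  # threshold is an assumption
    duration = words[-1][2] - words[0][1]
    return {
        "transcript": " ".join(w[0] for w in words),
        "words_per_minute": 60.0 * len(words) / duration,
        "pause_count": len(pauses),
        "longest_pause_s": max(gaps),
    }


# Invented examples: same words, very different delivery.
slow = [("I", 0.0, 0.3), ("can't", 1.8, 2.2), ("sleep", 4.0, 4.5)]
fast = [("I", 0.0, 0.1), ("can't", 0.15, 0.35), ("sleep", 0.4, 0.7)]

slow_stats = pause_features(slow)
fast_stats = pause_features(fast)
```

Both calls return the same `transcript` string, while `pause_count` and `words_per_minute` diverge. A text-only model sees only the first field.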

This does not mean today’s tools have no value. From medical scribes to documentation assistants, a growing field of clinical AI is already delivering real gains for clinicians: these tools are more secure, better adapted to medical terminology, and more integrated into clinical workflows than a general-purpose model used off the shelf. But they still often rely on general-purpose LLMs enriched with medical context, rather than deeply specialized models. Being trained on more medical text is not the same as being built for the specific signals of psychiatry. In this field, speech is not merely a vehicle for information; it is itself a clinical signal. The challenge, then, is not simply to adapt a general-purpose model to a medical context, but to specialize it deeply around the signals that define psychiatry.

The Case for Psychiatry-Specific AI Models

Psychiatry needs models built around the full range of signals that define clinical practice in mental health: not text alone, but speech, behavior, clinical history, and longitudinal change. Trained on psychiatric and behavioral data, these models offer three capabilities particularly suited to the complexity of psychiatric care.

Robust generalization. Training across different pathologies (depression, bipolar disorder, schizophrenia) creates models that handle heterogeneity, which is the norm in psychiatry, better than diagnosis-specific systems. Psychiatry's complex, unstructured medical data, long seen as a computational liability, becomes a source of contextualized clinical insights.

Multi-modal integration. Foundation architectures can unify audio features, linguistic content, behavioral patterns, and clinical history, mirroring how psychiatrists actually synthesize information. Unlike conventional models that require separate validation for each modality, foundation models learn shared representations across data types.
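As a rough illustration of what a shared representation means, the following sketch projects each modality's feature vector into a common space and averages whatever is available for a given visit. Everything in it is an assumption made for illustration: the modality names, the dimensions, and the fixed random projections that stand in for learned encoders. A real foundation model would learn these mappings from data rather than draw them at random.

```python
import random
from typing import Dict, List

SHARED_DIM = 4
MODALITY_DIMS = {"audio": 3, "text": 5, "history": 2}  # hypothetical sizes


def make_projection(in_dim: int, out_dim: int, seed: int) -> List[List[float]]:
    """Fixed random matrix standing in for a learned per-modality encoder."""
    rng = random.Random(seed)
    return [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(out_dim)]


PROJECTIONS = {
    name: make_projection(dim, SHARED_DIM, seed=i)
    for i, (name, dim) in enumerate(MODALITY_DIMS.items())
}


def project(matrix: List[List[float]], vector: List[float]) -> List[float]:
    """Matrix-vector product: maps modality features into the shared space."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def fuse(visit: Dict[str, List[float]]) -> List[float]:
    """Average the shared-space embeddings of the modalities present."""
    if not visit:
        raise ValueError("visit has no modalities")
    embedded = []
    for name, features in visit.items():
        if len(features) != MODALITY_DIMS[name]:
            raise ValueError(f"unexpected feature size for {name}")
        embedded.append(project(PROJECTIONS[name], features))
    return [sum(col) / len(embedded) for col in zip(*embedded)]


# Invented feature values; a visit may be missing some modalities.
full_visit = {"audio": [0.2, 0.1, 0.7], "text": [1, 0, 0, 1, 0], "history": [0.5, 0.5]}
audio_only = {"audio": [0.2, 0.1, 0.7]}

full_embedding = fuse(full_visit)
partial_embedding = fuse(audio_only)
```

Both visits end up as vectors of the same length in the same space, so downstream components do not need a separate code path per modality. Simple averaging is only one possible fusion choice, picked here for brevity.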

Flexible adaptation. A single foundation model can be adapted to multiple clinical applications (symptom monitoring, relapse prediction, treatment response) without retraining from scratch. Training for one task often enhances performance across others and can unlock capabilities the system wasn't explicitly designed for.
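The adaptation pattern can be sketched in a few lines: one encoder is kept frozen, and each application gets its own small head trained on top of it. The encoder below is a hand-written stand-in, and the task names and toy labels are invented for illustration. The sketch only shows the mechanics of reusing one representation for several tasks, not any particular model.

```python
import math
from typing import Dict, List, Sequence, Tuple

Example = Tuple[Sequence[float], int]  # (raw input, binary label)


def frozen_encoder(x: Sequence[float]) -> List[float]:
    """Stand-in for a pretrained foundation model; never updated below."""
    return [x[0] + x[1], x[0] - x[1], 1.0]  # last entry acts as a bias term


def train_head(data: List[Example], epochs: int = 200, lr: float = 0.5) -> List[float]:
    """Fit a logistic-regression head on top of the frozen embeddings."""
    weights = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for x, label in data:
            z = frozen_encoder(x)
            score = sum(w * v for w, v in zip(weights, z))
            p = 1.0 / (1.0 + math.exp(-score))
            # Stochastic gradient step on the log-likelihood; only the head moves.
            weights = [w + lr * (label - p) * v for w, v in zip(weights, z)]
    return weights


def predict(weights: List[float], x: Sequence[float]) -> int:
    z = frozen_encoder(x)
    return int(sum(w * v for w, v in zip(weights, z)) > 0)


# Two hypothetical tasks sharing one encoder, each with its own toy labels.
task_data: Dict[str, List[Example]] = {
    "task_a": [((1, 1), 1), ((2, 1), 1), ((-1, -1), 0), ((-2, -1), 0)],
    "task_b": [((1, -1), 1), ((2, 0), 1), ((-1, 1), 0), ((0, 2), 0)],
}
heads = {name: train_head(data) for name, data in task_data.items()}
```

Adding a third application in this setup means training one more three-weight head, while `frozen_encoder` stays untouched. That is the sense in which adaptation avoids retraining from scratch.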

These advantages, however, only materialize with appropriate training data.

Building What Doesn't Exist Yet

The bottleneck for psychiatric AI isn't algorithms. It's clinical data infrastructure.

Psychiatric datasets can't be assembled the way radiology archives can. While retrospective labelling works in some medical fields, psychiatric diagnosis can't be reliably inferred from isolated signals: a one-minute voice sample or a few nights of sleep data don't tell you whether someone is experiencing a depressive episode. Reliable assessment requires structured clinical evaluation conducted alongside data collection, longitudinal follow-up over months or years, and population diversity across diagnoses, severities, and demographics. Each meaningful data point represents hours of coordination between researchers, clinicians, and patients.

Companies in this space face a clear choice: work within the limitations of general-purpose models, or invest years building the specialized datasets that psychiatric foundation models require.

Callyope chose the second path. Through clinical trials and research partnerships with academic hospitals, we have built the world's largest behavioral dataset in neuroscience, spanning the spectrum of brain conditions. Not because it's the fastest route to market, but because it's the only way to build AI that actually works for this field.

The future of AI in psychiatry will not depend only on more powerful models. It will depend on our ability to build the clinical and scientific infrastructure that allows those models to understand the right signals, in the right context.
