Four use cases that actually pay off.
After analysing 20 firms, four AI use cases consistently deliver measurable value. The others are mostly marketing. Start with these; they'll teach you what AI can and can't do in your firm before you bet bigger.
Inbox classification + routing
LLM classifies incoming mail (client, court, marketing, internal) and routes it accordingly. Near-zero risk, high volume, immediate payoff. Typical deployment: 4 weeks.
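A minimal Python sketch of the routing step. It assumes the LLM call is wrapped in a function that takes a prompt and returns text. The queue names and the `route_mail` helper are hypothetical. The point is that the model's answer is checked against a fixed label set, and anything unexpected goes to manual triage instead of being trusted.

```python
from typing import Callable

# Hypothetical queue names; replace with the firm's own teams or mailboxes.
ROUTES = {
    "client": "case-team",
    "court": "deadline-desk",
    "marketing": "marketing-archive",
    "internal": "office-management",
}
FALLBACK_QUEUE = "manual-triage"


def route_mail(subject: str, body: str, classify: Callable[[str], str]) -> str:
    """Classify one email and return the queue it should go to.

    `classify` stands in for the LLM call (prompt in, text out).
    Any label outside ROUTES goes to manual triage instead of being trusted.
    """
    prompt = (
        "Classify this email as exactly one of: "
        + ", ".join(ROUTES)
        + ". Answer with the label only.\n\n"
        + f"Subject: {subject}\n\n{body}"
    )
    label = classify(prompt).strip().lower()
    return ROUTES.get(label, FALLBACK_QUEUE)


# Usage with a stubbed model response:
print(route_mail("Ladung zum Termin", "...", lambda prompt: " Court\n"))  # deadline-desk
```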
Document summarisation
LLM summarises incoming documents (contracts, letters, reports) before partner review. Partner reads 200 words instead of 20 pages; drills down only when needed.
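One way to hold the model to the 200-word budget, sketched in Python. The `summarise` helper and the prompt wording are illustrative assumptions. `llm` stands in for whatever client the firm uses. The code enforces the limit itself rather than relying on the instruction alone, and it returns a flag so the reviewer knows when a summary was cut.

```python
from typing import Callable, Tuple

MAX_WORDS = 200  # the partner-facing budget from the text


def summarise(document: str, llm: Callable[[str], str],
              max_words: int = MAX_WORDS) -> Tuple[str, bool]:
    """Return (summary, was_truncated).

    `llm` stands in for the model call. The word limit is enforced here
    as well as in the prompt, because models do not always obey length
    instructions.
    """
    prompt = (
        f"Summarise the document below in at most {max_words} words. "
        "Cover parties, key obligations, deadlines and open risks. "
        "Do not state anything that is not in the document.\n\n"
        + document
    )
    words = llm(prompt).split()
    was_truncated = len(words) > max_words
    return " ".join(words[:max_words]), was_truncated
```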
Deadline extraction (with mandatory human review)
LLM extracts deadlines from beA messages and emails. A human confirms each one before anything is written to DATEV/RA-MICRO. Saves hours weekly, with zero risk as long as the review gate is enforced.
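The review gate can be made structural rather than procedural: the write path refuses anything a human has not confirmed. Below is a minimal sketch. It uses a hypothetical `write` callback in place of the actual DATEV or RA-MICRO integration, and neither product's API is shown here. All class and function names are illustrative.

```python
from dataclasses import dataclass
from datetime import date, datetime, timezone
from typing import Callable, Optional


@dataclass
class ProposedDeadline:
    matter_id: str
    due: date
    description: str
    source_id: str                        # beA message or email the LLM read
    confirmed_by: Optional[str] = None
    confirmed_at: Optional[datetime] = None

    def confirm(self, reviewer: str) -> None:
        """Called from the review UI once a human has checked the deadline."""
        self.confirmed_by = reviewer
        self.confirmed_at = datetime.now(timezone.utc)


def commit_deadline(deadline: ProposedDeadline,
                    write: Callable[[ProposedDeadline], None]) -> None:
    """Pass a deadline to the calendar integration only after confirmation.

    `write` is a placeholder for the firm's DATEV / RA-MICRO connector;
    it is never called for an unconfirmed deadline.
    """
    if deadline.confirmed_by is None:
        raise PermissionError("deadline has not been confirmed by a reviewer")
    write(deadline)
```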
Content drafting with approval
LLM drafts LinkedIn posts, blog articles, newsletters. Partner reviews against pre-defined criteria in under 5 minutes. See the content-automation guide for details.
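Pre-defined criteria work best as an explicit checklist that must be answered in full. The criteria below are hypothetical examples, not a recommended set. `review_draft` records who reviewed, when, and how each criterion was answered. It approves a draft only if every criterion passes.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict

# Hypothetical criteria; each firm fixes its own list before drafting starts.
CRITERIA = (
    "facts and dates are correct",
    "no guaranteed outcomes or comparative claims",
    "no client-identifying details",
    "tone matches the firm's voice",
)


@dataclass
class Review:
    draft_id: str
    reviewer: str
    answers: Dict[str, bool]
    approved: bool
    reviewed_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def review_draft(draft_id: str, reviewer: str, answers: Dict[str, bool]) -> Review:
    """Record a partner review; approve only if every criterion passes."""
    missing = [c for c in CRITERIA if c not in answers]
    if missing:
        raise ValueError(f"unanswered criteria: {missing}")
    approved = all(answers[c] for c in CRITERIA)
    return Review(draft_id, reviewer, dict(answers), approved)
```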
Four use cases. Zero mystery. Everything else is marketing pretending to be products.
What doesn't work (and why).
Plenty of AI-for-legal products claim capabilities they don't reliably deliver. Here's what we've tested and seen fail, at least with current-generation models. This may change, but scepticism is the right default.
Autonomous legal advice
AI giving binding legal advice without human review: still too unreliable. Hallucinations are too frequent, citations too often wrong, context too easy to misread. Human-in-the-loop is mandatory.
Contract negotiation bots
AI negotiating contract terms with the other side's AI: impressive demos, poor real-world performance. Negotiation depends on strategy, ambiguity, power dynamics — things LLMs don't handle well.
End-to-end case prediction
AI predicting case outcomes: some specialised tools work for narrow domains (patent litigation in the US), but general-purpose prediction is unreliable and not worth betting on for most firms.
AI paralegal assistants without supervision
The marketing says "fully autonomous". The reality: they need supervision like any junior staff member. Treat them as a speed-up, not a replacement.
Compliance: BRAO, GDPR, professional secrecy.
Legal professionals operate under stricter data rules than most industries. AI deployment must respect all three axes simultaneously. That is fully doable, just not by accident.
BRAO / bar rules
No advice without supervision. No comparative claims. No guaranteed outcomes. The generation prompt enforces these at output time; the review checklist verifies at sign-off time.
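Here is a sketch of the two layers, assuming a Python pipeline. `BRAO_RULES` shows how the rules might be phrased inside the generation prompt. `prescreen` is a crude keyword check that runs before the draft reaches the reviewer. The patterns are hypothetical and will produce false positives (for example, "no guarantee" is flagged). A clean result does not replace the sign-off checklist.

```python
import re
from typing import List

# The three rules from the text, phrased for the generation prompt.
BRAO_RULES = (
    "Describe the law in general terms; do not advise on a specific case.\n"
    "Do not compare the firm with other firms.\n"
    "Do not promise or imply a guaranteed outcome."
)

# Hypothetical red-flag patterns in English and German. A hit flags the draft
# for the reviewer's attention; no hit proves nothing, so sign-off stays mandatory.
RED_FLAGS = {
    "guaranteed outcome": re.compile(r"\b(guarantee\w*|garantier\w*|100\s?%)", re.IGNORECASE),
    "comparative claim": re.compile(r"\b(best|leading|beste\w*|führende\w*)\b", re.IGNORECASE),
}


def generation_prompt(topic: str) -> str:
    """Prepend the rules to every content-generation request."""
    return f"{BRAO_RULES}\n\nWrite a short article on: {topic}"


def prescreen(draft: str) -> List[str]:
    """Return the names of the rules the draft appears to break."""
    return [name for name, pattern in RED_FLAGS.items() if pattern.search(draft)]
```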
GDPR
EU-only endpoints. No-training agreements. 30-day data retention max. Personal data minimisation (only what's needed). Audit log for every AI call. Right to erasure implementable.
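Data minimisation can be enforced with a per-task field whitelist applied before any AI call. This is a minimal sketch, and the task names and fields are hypothetical. It only drops whole fields: personal data inside free text, such as an email body, still passes through and needs separate handling.

```python
from typing import Any, Dict, Set

# Hypothetical whitelists: the only fields each task may send to a model.
ALLOWED_FIELDS: Dict[str, Set[str]] = {
    "inbox_routing": {"subject", "body"},
    "deadline_extraction": {"body", "received_at"},
}


def minimise(record: Dict[str, Any], task: str) -> Dict[str, Any]:
    """Drop every field the task does not need before the AI call.

    Unknown tasks raise instead of falling back to sending everything.
    """
    allowed = ALLOWED_FIELDS.get(task)
    if allowed is None:
        raise KeyError(f"no field whitelist defined for task {task!r}")
    return {key: value for key, value in record.items() if key in allowed}
```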
Professional secrecy
Mandate-level data doesn't leave the firm's infrastructure unless absolutely necessary. When it must (for extraction), it goes via EU endpoints with clear deletion terms. Never to US-residency services without explicit agreement.
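The routing rule can be written down as an explicit check that runs before any mandate-level data is sent. This sketch rests on stated assumptions. The `Endpoint` attributes are hypothetical flags the firm would maintain per provider, based on its contracts. The code treats any non-EU residency like the US case, which is stricter than the text. "Absolutely necessary" remains a human decision that the code cannot make.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    name: str
    self_hosted: bool       # runs inside the firm's own infrastructure
    eu_resident: bool       # processing and storage contractually in the EU
    deletion_terms: bool    # clear deletion / no-training terms are signed


def may_send_mandate_data(ep: Endpoint, explicit_agreement: bool = False) -> bool:
    """Apply the professional-secrecy rule from the text to one endpoint."""
    if ep.self_hosted:
        return True                  # data never leaves the firm
    if not ep.deletion_terms:
        return False                 # external use requires clear deletion terms
    if ep.eu_resident:
        return True
    return explicit_agreement        # non-EU only with an explicit agreement
```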
Models and endpoints as of 2026.
The AI landscape shifts monthly. Our default: Claude for high-stakes work, GPT-4o for volume, Mistral for EU-native hosting, Llama for self-hosted. The choice depends on risk profile and latency tolerance, not on "which is best". The stable choices for EU-based law firms in 2026 are:
- Anthropic Claude 3.5 Sonnet via AWS Frankfurt — best instruction-following, strong German.
- Azure OpenAI GPT-4o in Frankfurt region — robust, wide ecosystem, slightly weaker on German nuance.
- Mistral Large via Mistral API (French hosting) — fully EU-native, solid quality for most tasks.
- Llama 3 self-hosted (for sensitive extraction) — runs on your server, no external API calls.
- Avoid: any US-hosted API without data-residency agreement. Avoid: on-device tools that phone home.
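Keeping these defaults in one configuration table makes the choice reviewable and easy to change when the landscape shifts. This is a minimal sketch. The profile names are assumptions, and the model strings are descriptive labels, not provider API model IDs. Latency tolerance is not modelled here. Sending sensitive extraction to the self-hosted model is treated as the default, following the list above.

```python
from enum import Enum


class Profile(Enum):
    HIGH_STAKES = "high_stakes"   # output a partner relies on
    HIGH_VOLUME = "high_volume"   # many cheap calls, e.g. inbox routing
    EU_NATIVE = "eu_native"       # EU-native provider required
    SELF_HOSTED = "self_hosted"   # no external API calls at all


# The defaults from the text; descriptive labels, not API model IDs.
DEFAULT_MODEL = {
    Profile.HIGH_STAKES: "Claude 3.5 Sonnet via AWS Frankfurt",
    Profile.HIGH_VOLUME: "GPT-4o via Azure OpenAI (Frankfurt)",
    Profile.EU_NATIVE: "Mistral Large via Mistral API",
    Profile.SELF_HOSTED: "Llama 3, self-hosted",
}


def pick_model(profile: Profile, sensitive_extraction: bool = False) -> str:
    """Sensitive extraction always goes to the self-hosted model."""
    if sensitive_extraction:
        return DEFAULT_MODEL[Profile.SELF_HOSTED]
    return DEFAULT_MODEL[profile]
```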
Governance that firms trust.
AI in a law firm must be governable — auditable, reversible, reviewable. Four mechanisms that turn "black box" into "traceable system":
Audit log per call
Every AI call is logged: prompt, input, output, who requested, timestamp. Retained in EU-hosted Postgres for 12 months. Exportable.
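Here is a minimal sketch of the audit log. It uses Python's built-in `sqlite3` as a stand-in for the EU-hosted Postgres the text describes. The schema carries over, but the connection code would change. Table and function names are hypothetical, and 12 months is approximated as 365 days.

```python
import json
import sqlite3
from datetime import datetime, timedelta, timezone
from typing import Optional

RETENTION = timedelta(days=365)  # "12 months", approximated


def _utc(ts: datetime) -> str:
    # Fixed-width UTC strings compare correctly as text.
    return ts.astimezone(timezone.utc).isoformat(timespec="seconds")


def open_log(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ai_calls ("
        " id INTEGER PRIMARY KEY,"
        " ts TEXT NOT NULL,"
        " requested_by TEXT NOT NULL,"
        " prompt TEXT NOT NULL,"
        " input_text TEXT NOT NULL,"
        " output_text TEXT NOT NULL)"
    )
    return conn


def log_call(conn: sqlite3.Connection, requested_by: str, prompt: str,
             input_text: str, output_text: str,
             ts: Optional[datetime] = None) -> None:
    """Append one record per AI call: who, when, prompt, input, output."""
    ts = ts or datetime.now(timezone.utc)
    conn.execute(
        "INSERT INTO ai_calls (ts, requested_by, prompt, input_text, output_text)"
        " VALUES (?, ?, ?, ?, ?)",
        (_utc(ts), requested_by, prompt, input_text, output_text),
    )
    conn.commit()


def purge_expired(conn: sqlite3.Connection, now: Optional[datetime] = None) -> int:
    """Delete records older than the retention window; return how many."""
    now = now or datetime.now(timezone.utc)
    cur = conn.execute("DELETE FROM ai_calls WHERE ts < ?", (_utc(now - RETENTION),))
    conn.commit()
    return cur.rowcount


def export_jsonl(conn: sqlite3.Connection) -> str:
    """Export the whole log as JSON Lines, e.g. for an auditor."""
    keys = ("ts", "requested_by", "prompt", "input_text", "output_text")
    rows = conn.execute(f"SELECT {', '.join(keys)} FROM ai_calls ORDER BY id")
    return "\n".join(json.dumps(dict(zip(keys, row)), ensure_ascii=False) for row in rows)
```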
Human-in-the-loop on binding outputs
Anything going to DATEV, beA, or a client without human review: no. Reviews are recorded — who approved, when, against what criteria.
Rollback capability
Any automated decision must be reversible. A deadline written incorrectly can be unwritten. A document tagged wrongly can be retagged. All via UI, no engineer needed.
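Reversibility is easiest when every automated write goes through one place that remembers the previous value. Below is a minimal undo-log sketch. The in-memory `store` stands in for the real document or calendar system, and all names are hypothetical. A revert is refused if someone has edited the record since the change. Each revert is itself logged, so the history stays auditable.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

MISSING = object()  # marks "field did not exist before the change"


@dataclass
class Change:
    record_id: str
    field_name: str
    old: Any
    new: Any


class UndoLog:
    """Routes every automated write through one place so it can be reverted."""

    def __init__(self, store: Dict[str, Dict[str, Any]]) -> None:
        self.store = store                  # stand-in for the real system of record
        self.history: List[Change] = []

    def apply(self, record_id: str, field_name: str, value: Any) -> Change:
        record = self.store.setdefault(record_id, {})
        change = Change(record_id, field_name, record.get(field_name, MISSING), value)
        record[field_name] = value
        self.history.append(change)
        return change

    def revert(self, change: Change) -> None:
        record = self.store[change.record_id]
        if record.get(change.field_name, MISSING) != change.new:
            raise RuntimeError("record was edited after this change; resolve manually")
        if change.old is MISSING:
            del record[change.field_name]
        else:
            record[change.field_name] = change.old
        # The revert is itself recorded, so the history stays complete.
        self.history.append(Change(change.record_id, change.field_name, change.new, change.old))
```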
Quarterly review of false positives/negatives
Every quarter, partners review a sample of AI decisions. What did it get wrong? What did it miss? Feed learnings back into prompts and review criteria.
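Here is a sketch of the quarterly review in Python. It assumes each sampled decision can be framed as one yes/no question (for example "did this mail contain a deadline?"). The names are hypothetical, and a fixed seed makes the sample reproducible for the record. Multi-class decisions such as inbox routing would need per-class counts instead.

```python
import random
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Decision:
    item_id: str
    predicted: bool   # what the AI decided, e.g. "contains a deadline"
    actual: bool      # the partner's verdict in the review


def draw_sample(decision_ids: List[str], k: int, seed: int) -> List[str]:
    """Reproducible random sample of up to k decisions for review."""
    rng = random.Random(seed)
    return rng.sample(decision_ids, min(k, len(decision_ids)))


def error_summary(reviewed: List[Decision]) -> Dict[str, float]:
    """Count misses and false alarms; NaN where a ratio is undefined."""
    tp = sum(d.predicted and d.actual for d in reviewed)
    fp = sum(d.predicted and not d.actual for d in reviewed)
    fn = sum(not d.predicted and d.actual for d in reviewed)
    return {
        "false_positives": fp,   # AI said yes, partner said no
        "false_negatives": fn,   # AI missed it
        "precision": tp / (tp + fp) if tp + fp else float("nan"),
        "recall": tp / (tp + fn) if tp + fn else float("nan"),
    }
```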
How to start — year one.
If you're just beginning, here's the year-one plan that's worked for the firms we've helped. Resist the temptation to do everything at once; each step needs time to become habit.
- Month 1–2: inbox classification + routing. Low-risk, high-volume. Teaches the firm what AI does and doesn't do.
- Month 3–4: document summarisation. Partners experience time savings firsthand. Builds trust.
- Month 5–7: deadline extraction with mandatory review. Higher stakes; justifies the review discipline.
- Month 8–12: content drafting with approval. Adds value without risk. Often the most visible AI feature to clients.
- After year 1: evaluate whether to add further cases based on actual outcomes, not demos.