Pillar Guide · 12 min read

The Klarna AI Support Playbook. 67% in 30 days — then what broke.

Klarna's AI handled 2.3M chats in month one, with an estimated annual saving of $40M. A year later, they reintroduced human agents for the complex 20%. This is the full deployment, the math, the five mistakes they fixed, and the 4-week plan to replicate it on your stack.

By Chandan Maruthi, CEO of Twig · Updated May 2026

2.3M chats handled in month one (Feb 2024 launch)

67% of chats automated (tier-1 fully resolved)

11→2 min average resolution time (pre-AI → post-AI)

$40M estimated annual saving (avoided hires, not layoffs)

TL;DR

Klarna's AI is the largest publicly documented customer-support AI deployment. The story is not “AI replaces humans.” It's “AI runs tier-1 well, humans move up the value chain, and over-automation will catch you.” Below: the architecture, the five mistakes that almost broke it, the vendor comparison, and the 4-week plan to replicate it on your own stack.

1. What Klarna actually built

Four components, integrated tightly. None of them are exotic — the engineering work was in the connections, not the parts.

GPT-4-class language model

OpenAI as the underlying model. Klarna tuned prompts and system instructions extensively for BNPL and fintech context.

Direct API integration

Read access to accounts, transactions, payment schedules. Write access to refunds and payment reschedules — without handing off to a human.

RAG over policy + past tickets

Help center, policy docs, and past resolutions form the grounding corpus. AI retrieves before it generates, which cuts hallucination.

Confidence-based escalation

Low-confidence and complex queries route to humans with full context. Public statements suggest ~30% still escalate.
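
To make the escalation component concrete, here is a minimal sketch of a confidence-based router. It is not Klarna's implementation: the 0.80 floor, the intent labels, and the `Draft` and `route` names are all assumptions chosen for illustration, and it presumes an upstream classifier and scorer already produce an intent and a confidence score.

```python
from dataclasses import dataclass

# Hypothetical values for illustration; tune both against your own review data.
CONFIDENCE_FLOOR = 0.80
HUMAN_ONLY_INTENTS = {"dispute", "hardship", "fraud"}


@dataclass
class Draft:
    intent: str        # label from an upstream intent classifier (assumed)
    answer: str        # the model's proposed reply
    confidence: float  # score in [0, 1] from the model or a separate scorer


def route(draft: Draft) -> str:
    """Return "ai" to send the draft, or "human" to escalate with context."""
    if draft.intent in HUMAN_ONLY_INTENTS:
        return "human"  # complex or regulated intents never go autonomous
    if draft.confidence < CONFIDENCE_FLOOR:
        return "human"  # escalate instead of guessing
    return "ai"


print(route(Draft("order_status", "Your order ships tomorrow.", 0.93)))
print(route(Draft("order_status", "Possibly next week?", 0.41)))
print(route(Draft("dispute", "We have reviewed your claim.", 0.97)))
```

The intent check runs before the confidence check on purpose: a high score on a dispute should still reach a human.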

2. The math behind “700 agents”

The 700-agent figure got every headline. Here's the actual calculation, and why it doesn't mean what the headlines said.

2.3M chats per month
Avg human handle time pre-AI: ~11 minutes
2.3M × 11 = 25.3M min ≈ 422K hours/month
At 160 productive agent-hours/month → ~2,635 agent-equivalents of work
At 67% automation → ~1,766 agent-equivalents handled by AI
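
The arithmetic above can be reproduced in a few lines. This is a sketch using only the inputs quoted in this section; the function name `agent_equivalents` is ours, and 160 productive hours per agent per month is the same assumption the calculation already makes.

```python
def agent_equivalents(chats_per_month: float, handle_min: float,
                      hours_per_agent: float = 160.0) -> float:
    """Convert monthly chat volume into full-time agent-equivalents of work."""
    total_hours = chats_per_month * handle_min / 60
    return total_hours / hours_per_agent


total = agent_equivalents(2_300_000, 11)
automated = total * 0.67  # share of the workload handled by AI

print(f"total work:   {total:,.0f} agent-equivalents")
print(f"handled by AI: {automated:,.0f} agent-equivalents")
```

Unrounded, this comes to about 2,635 agent-equivalents in total and about 1,766 handled by AI, close to the rounded figures above.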

The “700 agents” figure refers to the additional agents Klarna would have needed to hire during a growth phase. They avoided hiring; they did not lay anyone off. The distinction matters to investors, to customers, and to regulators.

3. What broke — the 2025 walk-back

By early 2025 Klarna quietly reintroduced human capacity for the hardest 20%. The reported causes weren't the model — they were the operating envelope.

Hallucinations on edge cases. ~5% of conversations got confident-but-wrong answers about fees, terms, or eligibility. In fintech, wrong answers about money are a compliance problem, not a CSAT problem.

CSAT dropped on complex / emotional tickets. Even when AI gave correct answers, customers wanted a human on disputes and hardship. The AI-only path scored lower.

Regulatory attention to the framing. The "AI replaced 700 agents" framing drew scrutiny. Klarna softened the narrative and tightened disclosure.

Weak escalation handoffs. Customers had to re-explain themselves on handoff to a human. That tanked CSAT on the escalated tier.

4. Five mistakes Klarna made — and the fixes

Every team copying the Klarna playbook hits these. The fixes are the playbook's actual value, more so than the headline numbers.

AI matched keywords, not intent

What broke: Rigid keyword responses. Customers re-explained themselves multiple times. Escalations went up, not down.

The fix: Trained on real conversations, added context tracking across turns, and refused to answer below a confidence threshold rather than guess.

AI took on disputes and fraud — and failed

What broke: Klarna pointed AI at complex disputes and hardship cases early. AI got stuck on multi-step interactions; CSAT dropped sharply on regulated query types.

The fix: Scope-restricted AI to repetitive, transactional queries (order status, refunds, payment reminders). Built a routing classifier that hands off complex intent on turn one.

Replies sounded robotic

What broke: Scripted, formal, no acknowledgement of frustration. Customers felt deflected. Lower satisfaction even when the answer was right.

The fix: Trained on agent-customer pairs (not just docs). Added sentiment detection that adjusts tone on detected frustration. Personalization references the customer's prior history.

AI-to-human handoffs lost context

What broke: When AI escalated, customers had to re-explain from scratch. Agents lost the AI's classification, attempted response, and confidence score.

The fix: AI now summarizes intent + retrieved context + attempted response + confidence before handoff. Agent picks up mid-conversation, not from zero.

AI did not identify itself

What broke: Customers assumed they were talking to a human, then felt misled when responses felt mechanical. Trust dropped on first contact.

The fix: AI introduces itself by name and role. A persistent 'Talk to a human' option is one tap away. AI is positioned as first-line support, not a human replacement.
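
The handoff fix (mistake four) is the easiest of these to sketch in code. The example below is an illustration under assumptions, not Klarna's schema: the `Handoff` and `agent_note` names and the note layout are hypothetical, and the four fields are simply the ones the fix lists (intent, retrieved context, attempted response, confidence).

```python
from dataclasses import dataclass


@dataclass
class Handoff:
    intent: str                   # what the AI believes the customer wants
    retrieved_context: list[str]  # policy docs or tickets the AI consulted
    attempted_response: str       # the answer the AI drafted or sent
    confidence: float             # score in [0, 1] at the time of escalation


def agent_note(h: Handoff) -> str:
    """Render the handoff as a short note shown to the human agent."""
    sources = "; ".join(h.retrieved_context) or "none"
    return (
        f"Intent: {h.intent}\n"
        f"AI confidence: {h.confidence:.2f}\n"
        f"Sources consulted: {sources}\n"
        f"AI attempted: {h.attempted_response}"
    )


note = agent_note(Handoff(
    intent="refund_dispute",
    retrieved_context=["Refund policy", "Order history"],
    attempted_response="Explained the standard refund window.",
    confidence=0.42,
))
print(note)
```

The point of the note is that the agent reads it before replying, so the customer is not asked to start over.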

5. Klarna's custom build vs four off-the-shelf platforms

Klarna built in-house with OpenAI in ~6 months. Most teams don't need to. Here's how Klarna's architecture maps to the four platforms that productize the same pattern.

Capability	Klarna (custom)	Twig	Intercom Fin	Decagon	Ada
Time to live (typical)	~6 months	2–4 weeks	4–8 weeks	8–12 weeks	6–10 weeks
Foundation model	GPT-4 (OpenAI)	GPT-4 / Claude (swappable)	GPT-4 family	Multi-model	Proprietary + GPT
RAG / knowledge grounding
Confidence-based escalation					Limited
Voice channel out of the box
Content cleanup tooling			Partial	Partial	Partial
Per-conversation audit trail
Pricing model	Custom build	$5 / resolved ticket	$0.99 / resolution	Enterprise quote	Enterprise quote
Engineering team required	Dedicated (10+)	Configure, no build	Configure, no build	1–2 eng for integration	Configure

Pricing and timelines reflect publicly published figures and typical onboarding observed across 30+ Twig deployments. Vendor details change — verify before procurement.

6. The 4-week plan to replicate Klarna's playbook

Klarna shipped in 6 months with a dedicated team. With a managed platform, 4–8 weeks is realistic. The week-by-week plan below maps the order of operations.

Week 1

Content audit + integration

Pull the top 200 ticket intents from your helpdesk. Find the policy gaps. Connect Zendesk / Intercom / Freshdesk. Wire account-data APIs for actions the AI should take.

Week 2

Human-review pilot

AI drafts every response, human reviews before send. Track override rate. Target: under 5% before going autonomous. This catches hallucinations cheaply.

Week 3

Autonomous on low-risk intents

Release AI on order status, refund queries, payment reminders. Hold complex categories (disputes, hardship, account closure) in human queue. Confidence floor stays high.

Week 4

Expand + monitor

Open more intents based on the quality of autonomous resolutions, not the automation rate. Sample 100 AI conversations weekly. Fix content gaps, not prompts, when hallucinations appear.
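
The Week 2 gate and the Week 3 release can be expressed as one check over the human-review log. This is a rough sketch under stated assumptions: the 5% target comes from the plan above, but measuring it per intent, the 50-review minimum, and the names `intents_ready` and `low_risk` are our own choices for illustration.

```python
from collections import Counter

OVERRIDE_TARGET = 0.05  # Week 2 target: under 5% of drafts overridden


def intents_ready(reviews, low_risk, min_reviews=50):
    """Return low-risk intents whose override rate is under the target.

    reviews: iterable of (intent, was_overridden) pairs from human review.
    low_risk: set of intents eligible for autonomous handling.
    min_reviews: hypothetical minimum sample size before trusting the rate.
    """
    totals, overrides = Counter(), Counter()
    for intent, was_overridden in reviews:
        totals[intent] += 1
        overrides[intent] += int(was_overridden)
    return sorted(
        intent for intent in totals
        if intent in low_risk
        and totals[intent] >= min_reviews
        and overrides[intent] / totals[intent] < OVERRIDE_TARGET
    )


reviews = (
    [("order_status", False)] * 98 + [("order_status", True)] * 2
    + [("refund_query", False)] * 90 + [("refund_query", True)] * 10
    + [("dispute", False)] * 100
)
print(intents_ready(reviews, low_risk={"order_status", "refund_query"}))
```

In this sample only `order_status` is released: `refund_query` is overridden 10% of the time, and `dispute` stays in the human queue regardless of its override rate.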

7. Run the math on your own ticket volume

Klarna's $40M number is theirs. Your number depends on three inputs: tickets per month, avg handle time, and fully-loaded agent cost. The formula:

monthly_hours_saved = tickets × handle_min × automation_rate / 60
annual_savings = monthly_hours_saved × hourly_cost × 12

Worked example. 50K tickets × 11 min × 0.67 ÷ 60 ≈ 6,142 hours saved/month. At $25/hr loaded cost: about $154K/month, or $1.84M/year in freed capacity. Mileage varies with content quality, ticket mix, and escalation discipline.
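
For teams that want to plug in their own inputs, the two formulas above translate directly into Python. The function names mirror the formula's variables; the inputs in the example are the ones from the worked example, so substitute your own.

```python
def monthly_hours_saved(tickets: float, handle_min: float,
                        automation_rate: float) -> float:
    """Agent hours freed per month by automating a share of tickets."""
    return tickets * handle_min * automation_rate / 60


def annual_savings(tickets: float, handle_min: float,
                   automation_rate: float, hourly_cost: float) -> float:
    """Yearly value of the freed capacity at a fully loaded hourly cost."""
    hours = monthly_hours_saved(tickets, handle_min, automation_rate)
    return hours * hourly_cost * 12


hours = monthly_hours_saved(50_000, 11, 0.67)
yearly = annual_savings(50_000, 11, 0.67, 25)
print(f"{hours:,.0f} hours saved per month")
print(f"${yearly:,.0f} per year in freed capacity")
```

With the worked example's inputs this gives roughly 6,140 hours a month and $1.84M a year.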

Run Klarna's playbook on your tickets

We'll train Twig on your help center, drop it into chat or voice, and show you a live resolution on your real content inside 15 minutes.