The Klarna AI Support Playbook. 67% in 30 days — then what broke.
Klarna's AI handled 2.3M chats in month one and saved an estimated $40M. A year later, they reintroduced human agents for the complex 20%. This is the full deployment, the math, the five mistakes they fixed, and the 4-week plan to replicate it on your stack.
By Chandan Maruthi, CEO of Twig · Updated May 2026
- 2.3M chats handled in month one (Feb 2024 launch)
- 67% of chats automated (tier-1 fully resolved)
- 11→2 min average resolution time (pre-AI → post-AI)
- $40M estimated annual saving (avoided hires, not layoffs)
TL;DR
Klarna's AI assistant is one of the largest publicly documented customer-support AI deployments. The story is not “AI replaces humans.” It's “AI runs tier-1 well, humans move up the value chain, and over-automation will catch you.” Below: the architecture, the five mistakes that almost broke it, the vendor comparison, and the 4-week plan to replicate it on your own stack.
1. What Klarna actually built
Four components, integrated tightly. None of them are exotic — the engineering work was in the connections, not the parts.
GPT-4-class language model
OpenAI as the underlying model. Klarna tuned prompts and system instructions extensively for BNPL and fintech context.
Direct API integration
Read access to accounts, transactions, payment schedules. Write access to refunds and payment reschedules — without handing off to a human.
RAG over policy + past tickets
Help center, policy docs, and past resolutions form the grounding corpus. AI retrieves before it generates, which cuts hallucination.
Confidence-based escalation
Low-confidence and complex queries route to humans with full context. Public statements suggest ~30% still escalate.
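How the retrieval and escalation components fit together can be sketched in a few lines. This is a minimal illustration, not Klarna's implementation: the `retrieve` function is a toy word-overlap ranker standing in for a real RAG retriever, `generate` is a placeholder for the model call, and `CONFIDENCE_FLOOR`, `Reply`, and `answer` are hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical threshold: the article does not publish Klarna's actual value.
CONFIDENCE_FLOOR = 0.8


@dataclass
class Reply:
    route: str           # "ai" or "human"
    answer: str
    confidence: float
    sources: list[str]   # ids of the documents the answer was grounded on


def retrieve(query: str, corpus: dict[str, str], top_k: int = 2) -> list[str]:
    """Toy retriever: rank documents by word overlap with the query."""
    words = set(query.lower().split())
    ranked = sorted(
        corpus,
        key=lambda doc_id: len(words & set(corpus[doc_id].lower().split())),
        reverse=True,
    )
    return ranked[:top_k]


def answer(
    query: str,
    corpus: dict[str, str],
    generate: Callable[[str, list[str]], tuple[str, float]],
) -> Reply:
    """Retrieve first, generate second, then gate the result on confidence."""
    sources = retrieve(query, corpus)
    text, confidence = generate(query, [corpus[s] for s in sources])
    route = "ai" if confidence >= CONFIDENCE_FLOOR else "human"
    return Reply(route, text, confidence, sources)


corpus = {
    "refunds": "refunds are issued within 5 days",
    "fees": "late fees apply after the due date",
}


def stub_generate(query: str, passages: list[str]) -> tuple[str, float]:
    # Stand-in for a model call; returns (answer text, confidence score).
    return passages[0], 0.9


reply = answer("when are refunds issued", corpus, stub_generate)
print(reply.route, reply.sources)
```

The escalated `Reply` keeps the retrieved sources and the draft answer, so a human agent receives the same context the AI had.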
2. The math behind “700 agents”
The 700-agent figure got every headline. Here's the actual calculation, and why it doesn't mean what the headlines said.
- 2.3M chats per month
- Avg human handle time pre-AI: ~11 minutes
- 2.3M × 11 = 25.3M min = ~421K hours/month
- At 160 productive agent-hours/month → ~2,630 agent-equivalents of work
- At 67% automation → ~1,760 agent-equivalents handled by AI
The “700 agents” figure refers to the additional agents Klarna would have needed to hire during a growth phase. Klarna avoided those hires; it did not lay off 700 people. The distinction matters to investors, to customers, and to regulators.
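The calculation above can be reproduced directly. The sketch below uses only the inputs listed in this section; the function name `agent_equivalents` is made up for the example. Unrounded, the results come out slightly above the rounded figures quoted in the list.

```python
def agent_equivalents(
    chats_per_month: int,
    handle_minutes: float,
    productive_hours_per_agent: float,
    automation_rate: float,
) -> tuple[float, float, float]:
    """Return (total hours, total agent-equivalents, agent-equivalents automated)."""
    total_hours = chats_per_month * handle_minutes / 60
    total_agents = total_hours / productive_hours_per_agent
    return total_hours, total_agents, total_agents * automation_rate


hours, agents, automated = agent_equivalents(2_300_000, 11, 160, 0.67)
print(f"{hours:,.0f} hours/month")                   # 421,667 hours/month
print(f"{agents:,.0f} agent-equivalents")            # 2,635 agent-equivalents
print(f"{automated:,.0f} handled by AI at 67%")      # 1,766 handled by AI at 67%
```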
3. What broke — the 2025 walk-back
By early 2025 Klarna quietly reintroduced human capacity for the hardest 20%. The reported causes weren't the model — they were the operating envelope.
Hallucinations on edge cases. ~5% of conversations got confident-but-wrong answers about fees, terms, or eligibility. In fintech, wrong answers about money are a compliance problem, not a CSAT problem.
CSAT dropped on complex / emotional tickets. Even when AI gave correct answers, customers wanted a human on disputes and hardship. The AI-only path scored lower.
Regulatory attention to the framing. The "AI replaced 700 agents" framing drew scrutiny. Klarna softened the narrative and tightened disclosure.
Weak escalation handoffs. Customers had to re-explain themselves on handoff to a human. That tanked CSAT on the escalated tier.
4. Five mistakes Klarna made — and the fixes
Every team copying the Klarna playbook hits these. The fixes are the playbook's actual value, more so than the headline numbers.
AI matched keywords, not intent
What broke: Rigid keyword responses. Customers re-explained themselves multiple times. Escalations went up, not down.
The fix: Trained on real conversations, added context tracking across turns, and refused to answer below a confidence threshold rather than guess.
AI took on disputes and fraud — and failed
What broke: Klarna pointed AI at complex disputes and hardship cases early. AI got stuck on multi-step interactions; CSAT dropped sharply on regulated query types.
The fix: Scope-restricted AI to repetitive, transactional queries (order status, refunds, payment reminders). Built a routing classifier that hands off complex intent on turn one.
Replies sounded robotic
What broke: Scripted, formal, no acknowledgement of frustration. Customers felt deflected. Lower satisfaction even when the answer was right.
The fix: Trained on agent-customer pairs (not just docs). Added sentiment detection that adjusts tone on detected frustration. Personalization references the customer's prior history.
AI-to-human handoffs lost context
What broke: When AI escalated, customers had to re-explain from scratch. Agents lost the AI's classification, attempted response, and confidence score.
The fix: AI now summarizes intent + retrieved context + attempted response + confidence before handoff. Agent picks up mid-conversation, not from zero.
AI did not identify itself
What broke: Customers assumed they were talking to a human, then felt misled when responses felt mechanical. Trust dropped on first-contact.
The fix: AI introduces itself by name and role. A persistent 'Talk to a human' option is one tap away. AI is positioned as first-line support, not a human replacement.
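Two of the fixes above, scope-restricted routing and context-preserving handoffs, lend themselves to a short sketch. Everything here is an assumption for illustration: the intent labels, the `CONFIDENCE_FLOOR` value, and the `HandoffPacket` fields are hypothetical and only mirror the items the article says the AI passes to the agent (intent, retrieved context, attempted response, confidence).

```python
from dataclasses import dataclass
from typing import Optional, Union

# Intents the article names as safe to automate; the string labels are hypothetical.
AUTOMATABLE_INTENTS = {"order_status", "refund", "payment_reminder"}
CONFIDENCE_FLOOR = 0.8  # hypothetical value


@dataclass
class HandoffPacket:
    """Context the human agent receives so the customer does not re-explain."""
    intent: str
    retrieved_context: list[str]
    attempted_response: Optional[str]
    confidence: float
    reason: str


def route_turn(
    intent: str,
    confidence: float,
    retrieved_context: list[str],
    draft: str,
) -> Union[str, HandoffPacket]:
    """Return the draft if the AI may send it, otherwise a handoff packet."""
    if intent not in AUTOMATABLE_INTENTS:
        # Complex intents go to a human on turn one, before the AI answers.
        return HandoffPacket(intent, retrieved_context, None, confidence, "out_of_scope")
    if confidence < CONFIDENCE_FLOOR:
        # The AI refuses to guess but still passes along what it tried.
        return HandoffPacket(intent, retrieved_context, draft, confidence, "low_confidence")
    return draft


result = route_turn("dispute", 0.95, ["dispute policy v3"], "draft text")
print(result)
```

A high confidence score does not override the scope check: a dispute is handed off even at 0.95, which matches the article's point that complex intents should never reach the autonomous path.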
5. Klarna's custom build vs four off-the-shelf platforms
Klarna built in-house with OpenAI in ~6 months. Most teams don't need to. Here's how Klarna's architecture maps to the four platforms that productize the same pattern.
| Capability | Klarna (custom) | Twig | Intercom Fin | Decagon | Ada |
|---|---|---|---|---|---|
| Time to live (typical) | ~6 months | 2–4 weeks | 4–8 weeks | 8–12 weeks | 6–10 weeks |
| Foundation model | GPT-4 (OpenAI) | GPT-4 / Claude (swappable) | GPT-4 family | Multi-model | Proprietary + GPT |
| RAG / knowledge grounding | |||||
| Confidence-based escalation | Limited | ||||
| Voice channel out of the box | |||||
| Content cleanup tooling | Partial | Partial | Partial | ||
| Per-conversation audit trail | |||||
| Pricing model | Custom build | $5 / resolved ticket | $0.99 / resolution | Enterprise quote | Enterprise quote |
| Engineering team required | Dedicated (10+) | Configure, no build | Configure, no build | 1–2 eng for integration | Configure |
Pricing and timelines reflect publicly published figures and typical onboarding observed across 30+ Twig deployments. Vendor details change — verify before procurement.
6. The 4-week plan to replicate Klarna's playbook
Klarna shipped in 6 months with a dedicated team. With a managed platform, 4–8 weeks is realistic. The week-by-week below maps the order of operations.
Week 1: Content audit + integration
Pull the top 200 ticket intents from your helpdesk. Find the policy gaps. Connect Zendesk / Intercom / Freshdesk. Wire account-data APIs for actions the AI should take.
Week 2: Human-review pilot
AI drafts every response, human reviews before send. Track override rate. Target: under 5% before going autonomous. This catches hallucinations cheaply.
Week 3: Autonomous on low-risk intents
Release AI on order status, refund queries, payment reminders. Hold complex categories (disputes, hardship, account closure) in human queue. Confidence floor stays high.
Week 4: Expand + monitor
Open more intents based on autonomy quality, not autonomy rate. Sample 100 AI conversations weekly. Fix content gaps, not prompts, when hallucinations appear.
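The override-rate gate from the human-review pilot can be tracked per intent with very little code. This is a sketch under stated assumptions: the 5% target comes from the plan above, while `min_reviews` (a minimum sample size before trusting the rate) and the function names are hypothetical additions.

```python
def override_rate(reviews: list[bool]) -> float:
    """Share of AI drafts the human reviewer changed or rejected (True = overridden)."""
    if not reviews:
        raise ValueError("no reviews recorded")
    return sum(reviews) / len(reviews)


def intents_ready(
    reviews_by_intent: dict[str, list[bool]],
    target: float = 0.05,     # "under 5%" from the plan above
    min_reviews: int = 50,    # hypothetical guard against tiny samples
) -> list[str]:
    """List intents whose override rate is under target on a large enough sample."""
    return [
        intent
        for intent, reviews in reviews_by_intent.items()
        if len(reviews) >= min_reviews and override_rate(reviews) < target
    ]


pilot = {
    "order_status": [True] * 4 + [False] * 96,   # 4% overridden
    "refund": [True] * 5 + [False] * 95,         # exactly 5%, not under target
    "dispute": [False] * 10,                     # too few reviews to judge
}
print(intents_ready(pilot))  # ['order_status']
```

Gating per intent rather than on the overall rate keeps a clean result on order status from masking a poor one on refunds.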
7. Run the math on your own ticket volume
Klarna's $40M number is theirs. Your number depends on three inputs you already know (tickets per month, avg handle time, and fully-loaded agent cost) plus the automation rate you actually reach. The formula:
annual_savings = monthly_hours_saved × hourly_cost × 12
where monthly_hours_saved = tickets_per_month × avg_handle_minutes × automation_rate ÷ 60.
Worked example. 50K tickets × 11 min × 0.67 ÷ 60 ≈ 6,142 hours saved/month. At $25/hr loaded cost: about $154K/month, or $1.84M/year in freed capacity. Mileage varies with content quality, ticket mix, and escalation discipline.
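The same formula as a small function, so you can plug in your own volume. The parameter names are illustrative; the defaults are not recommendations, and the 0.67 rate in the example is Klarna's reported figure rather than a guaranteed outcome.

```python
def annual_savings(
    tickets_per_month: int,
    handle_minutes: float,
    automation_rate: float,
    hourly_cost: float,
) -> tuple[float, float]:
    """Return (monthly hours saved, annual savings in the currency of hourly_cost)."""
    monthly_hours_saved = tickets_per_month * handle_minutes * automation_rate / 60
    return monthly_hours_saved, monthly_hours_saved * hourly_cost * 12


saved_hours, yearly = annual_savings(50_000, 11, 0.67, 25)
print(f"{saved_hours:,.0f} hours/month, ${yearly:,.0f}/year")
```

Rerunning it with a lower automation rate, such as 0.5, gives a more conservative planning number for the first months after launch.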
People also ask
How did Klarna automate 67% of customer support in 30 days?
Klarna combined a GPT-4-class model from OpenAI with direct API access to its account, transaction, and refund systems, plus RAG grounding on its help center. Low-confidence cases routed to humans. Total build: ~6 months. The 30-day window refers to volume ramp after launch, not the build itself.
How much money did Klarna actually save?
Klarna estimated $40M/year in avoided hiring costs — not layoffs. They avoided adding ~700 new agents during a growth phase. The framing was later softened after regulatory and customer-experience pushback.
Why did Klarna walk back its AI claims in 2025?
Three reasons: (1) hallucinations on edge cases damaged CSAT on complex tickets; (2) compliance concerns around AI handling disputes and account closures; (3) the 'AI replaced 700 agents' framing was misleading. Klarna reintroduced humans for the complex 20% and kept AI for tier-1.
Can a smaller team replicate Klarna's playbook?
Yes, faster. Klarna spent ~6 months custom-building with OpenAI. Platforms like Twig productize the same pattern — content cleanup, confidence-based routing, RAG, escalation. Typical time-to-live: 2–4 weeks. Typical autonomous resolution by week 8: 50–70%.
What's the biggest mistake teams make copying Klarna?
Optimizing for automation rate instead of escalation quality. A 50% automation rate with 98% accuracy beats a 75% rate with 90% accuracy in regulated contexts. Klarna's walk-back was driven by quality on the escalated tier, not by failing to automate more.
What AI technology does Klarna use today?
GPT-4-class models from OpenAI with custom prompts, internal API integrations, and a RAG-style knowledge base. Post-2025 walk-back, they added stricter confidence thresholds and a dedicated human queue for disputes, fraud, and hardship cases.
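The comparison in the answer about automation rate versus accuracy (50% automation at 98% accuracy against 75% at 90%) can be checked with simple arithmetic. The sketch assumes "accuracy" means the share of automated conversations answered correctly, which the article does not define, and the function name is made up for the example.

```python
def wrong_answers_per_1000(automation_rate: float, accuracy: float) -> float:
    """Expected wrong AI answers per 1,000 incoming tickets."""
    return 1000 * automation_rate * (1 - accuracy)


conservative = wrong_answers_per_1000(0.50, 0.98)
aggressive = wrong_answers_per_1000(0.75, 0.90)
print(round(conservative), round(aggressive))  # 10 75
```

Under that assumption, the more aggressive setup automates 1.5 times as many tickets but produces roughly 7.5 times as many wrong answers, which is the cost that matters in a regulated context.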


