RAG Scenarios and Solutions

Agent-Level Data Isolation

Multiple AI agents share the same knowledge base without proper isolation, causing agents to access data they shouldn't see.

The Problem

When every agent retrieves from the same knowledge base and nothing scopes a query to the requesting agent, any agent can surface documents that were meant for another.

Symptoms

  • ❌ Agent A sees Agent B's private data
  • ❌ Cross-agent data leakage
  • ❌ Cannot restrict knowledge by agent
  • ❌ Shared vector DB exposes all data
  • ❌ No tenant isolation

Real-World Example

Company has two agents:
→ HR Agent: Access to employee records
→ Customer Support Agent: Access to help docs

Shared vector DB with all data:
→ Customer asks Support Agent: "What's the CEO's salary?"
→ Retrieval finds HR document with salary info
→ Support Agent responds with CEO salary

Result: data isolation failure

Deep Technical Analysis

Shared Knowledge Base Risks

No Filtering Layer:

All chunks in one vector DB:
→ HR docs embedded
→ Customer docs embedded
→ No metadata distinguishing them

Any query retrieves anything:
→ Agent identity not checked
→ Data access unrestricted
→ Privacy violation

Metadata Filtering:

Solution: Tag chunks with access control:
{
  vector: [0.234, ...],
  metadata: {
    agent_id: "hr_agent",
    department: "hr",
    sensitivity: "confidential"
  }
}

Query with filter:
→ agent_id = "support_agent"
→ Only retrieve support_agent tagged chunks
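
As a rough illustration of the tagging and filtering described above, the following sketch uses an in-memory list instead of a real vector database. The Chunk class, the search function, and the toy 2-d vectors are assumptions made for this example and are not part of any particular product's API.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Chunk:
    text: str
    vector: list[float]
    metadata: dict[str, str] = field(default_factory=dict)


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(chunks, query_vector, agent_id, top_k=3):
    """Rank only the chunks tagged for agent_id; untagged chunks are excluded."""
    allowed = [c for c in chunks if c.metadata.get("agent_id") == agent_id]
    ranked = sorted(allowed, key=lambda c: cosine(c.vector, query_vector), reverse=True)
    return ranked[:top_k]


# Toy 2-d vectors stand in for real embeddings (hypothetical data).
store = [
    Chunk("CEO compensation summary", [0.9, 0.1],
          {"agent_id": "hr_agent", "department": "hr", "sensitivity": "confidential"}),
    Chunk("How to reset your password", [0.2, 0.8], {"agent_id": "support_agent"}),
    Chunk("Refund policy", [0.6, 0.4], {"agent_id": "support_agent"}),
]

# The query is closest to the HR chunk, but the filter keeps that chunk out.
results = search(store, [1.0, 0.0], agent_id="support_agent")
print([c.text for c in results])  # ['Refund policy', 'How to reset your password']
```

Chunks with no agent_id tag are dropped rather than returned, so a tagging mistake fails closed instead of leaking data.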

Multi-Tenancy Patterns

Namespace Isolation:

Pinecone namespaces / Weaviate multi-tenancy:
→ Create a separate namespace (Pinecone) or tenant (Weaviate) per agent
→ hr_agent namespace
→ support_agent namespace

Queries scoped to namespace:
→ Cannot cross namespace boundary
→ Strong isolation
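
The namespace pattern can be sketched with a toy in-memory store. NamespacedStore and its upsert and query methods are hypothetical names used only for illustration; a hosted vector database exposes the same idea through its own client, and the details differ by vendor.

```python
from collections import defaultdict


class NamespacedStore:
    """Toy vector store partitioned by namespace (in-memory, illustrative only)."""

    def __init__(self):
        self._namespaces = defaultdict(list)

    def upsert(self, namespace, text, vector):
        self._namespaces[namespace].append((text, vector))

    def query(self, namespace, vector, top_k=3):
        # Only the requested partition is scanned; other namespaces are never read.
        # .get avoids creating an empty namespace as a side effect of a query.
        candidates = self._namespaces.get(namespace, [])
        scored = sorted(
            candidates,
            key=lambda item: sum(a * b for a, b in zip(item[1], vector)),
            reverse=True,
        )
        return [text for text, _ in scored[:top_k]]


store = NamespacedStore()
store.upsert("hr_agent", "CEO compensation summary", [0.9, 0.1])
store.upsert("support_agent", "Refund policy", [0.6, 0.4])

print(store.query("support_agent", [1.0, 0.0]))  # ['Refund policy']
```

The isolation only holds if the namespace is derived server-side from the authenticated agent identity and never taken from user input or model output.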

Separate Indexes:

One index per agent:
→ hr_agent_index
→ support_agent_index

Complete separation:
+ Strongest isolation
+ Independent scaling
- Higher infrastructure cost
- More operational complexity

Row-Level Security:

PostgreSQL + pgvector:
→ Use database roles
→ Row-level security policies
→ Policy: "Show only rows where agent_id = current_user"

Database-enforced isolation
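
A minimal sketch of what such a policy might look like, assuming a table named chunks with columns id, content, agent_id, and a pgvector embedding column, and assuming each agent connects as its own database role. The table, column, and policy names are hypothetical. The statements are held in Python strings and run through any DB-API connection (for example one opened with psycopg).

```python
# Turn on row-level security. FORCE also applies the policy to the table owner,
# who would otherwise bypass it.
ENABLE_RLS = "ALTER TABLE chunks ENABLE ROW LEVEL SECURITY;"
FORCE_RLS = "ALTER TABLE chunks FORCE ROW LEVEL SECURITY;"

# Rows are visible only when agent_id matches the connected database role.
CREATE_POLICY = """
CREATE POLICY agent_isolation ON chunks
    FOR SELECT
    USING (agent_id = current_user);
"""

# Similarity search with no WHERE clause: the policy filters rows before ranking.
# <-> is pgvector's L2 distance operator; the vector is passed as text like "[0.1, 0.2]".
SEARCH = """
SELECT id, content
FROM chunks
ORDER BY embedding <-> %s::vector
LIMIT %s;
"""


def apply_isolation(conn):
    """Run the RLS setup statements on a DB-API connection."""
    cur = conn.cursor()
    for statement in (ENABLE_RLS, FORCE_RLS, CREATE_POLICY):
        cur.execute(statement)
    conn.commit()
```

Because the filter lives in the database, application code that forgets to add a WHERE clause still cannot read another agent's rows. Superusers and roles with BYPASSRLS are not subject to the policy, so agent roles should have neither.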

Access Control Logic

Pre-Retrieval Filtering:

Before vector search:
1. Identify requesting agent
2. Add metadata filter:
   WHERE metadata.agent_id = 'support_agent'
3. Execute search with filter

Ensures:
→ Only authorized chunks retrieved
→ No leakage

Post-Retrieval Filtering:

Alternative: Filter after retrieval:
1. Retrieve top-K chunks (e.g., 20)
2. Check each chunk's agent_id
3. Remove unauthorized
4. Return remaining (e.g., 12)

Problem:
→ Reduces effective K
→ May not have enough results
→ Unauthorized chunks are still fetched before being discarded

Prefer pre-retrieval filtering.
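
The difference between the two approaches above can be seen in a small comparison. This is an illustrative sketch over hypothetical in-memory data with toy 2-d vectors and dot-product scoring, not a benchmark of any real vector database.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def top_k(chunks, query, k):
    """Return the k chunks with the highest dot-product score."""
    return sorted(chunks, key=lambda c: dot(c["vector"], query), reverse=True)[:k]


def pre_filter_search(chunks, query, agent_id, k):
    # Restrict the candidate set first, then rank: always up to k authorized hits.
    allowed = [c for c in chunks if c["agent_id"] == agent_id]
    return top_k(allowed, query, k)


def post_filter_search(chunks, query, agent_id, k):
    # Rank everything first, then drop unauthorized hits: may return fewer than k.
    hits = top_k(chunks, query, k)
    return [c for c in hits if c["agent_id"] == agent_id]


chunks = [
    {"text": "Salary bands", "agent_id": "hr_agent", "vector": [0.9, 0.1]},
    {"text": "Bonus policy", "agent_id": "hr_agent", "vector": [0.8, 0.2]},
    {"text": "Refund policy", "agent_id": "support_agent", "vector": [0.6, 0.4]},
    {"text": "Shipping times", "agent_id": "support_agent", "vector": [0.5, 0.5]},
    {"text": "Password reset", "agent_id": "support_agent", "vector": [0.2, 0.8]},
]
query = [1.0, 0.0]

pre = pre_filter_search(chunks, query, "support_agent", k=3)
post = post_filter_search(chunks, query, "support_agent", k=3)
print(len(pre), len(post))  # 3 1
```

With K = 3, the two HR chunks take the top slots in the unfiltered ranking, so post-retrieval filtering leaves the Support Agent with a single result while pre-retrieval filtering still returns three.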

How to Solve

  • Tag all chunks with agent_id/tenant_id metadata
  • Implement pre-retrieval filtering (metadata.agent_id = current_agent)
  • Use namespace isolation (separate vector DB namespaces)
  • Consider separate indexes for strong isolation
  • Apply row-level security if using PostgreSQL

See Data Isolation.


Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the ask query parameter:

GET /dev/rag-scenarios-and-solutions/privacy/data-isolation.md?ask=<question>

The question should be specific, self-contained, and written in natural language. The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.

Last updated January 26, 2026