RAG Scenarios and Solutions
Dimensionality Mismatch
Embeddings from different models or versions have incompatible dimensions, which prevents direct comparison and forces an expensive re-embedding of the entire knowledge base.
Key Takeaways
- The Problem
- Deep Technical Analysis
- How to Solve
The Problem
Vectors produced by different embedding models, or by different versions of one model, often differ in length. Vectors of different lengths cannot be compared directly, so switching models means re-embedding the entire knowledge base.
Symptoms
- ❌ Cannot compute similarity (vector length mismatch)
- ❌ Must re-embed 100K docs to switch models
- ❌ "Shape error: (768,) vs (1536,)"
- ❌ Hybrid search breaks when models differ
- ❌ Expensive model migration
Real-World Example
Current setup:
→ Using Sentence-BERT (768 dimensions)
→ 50,000 documents embedded
→ Storage: 50K × 768 × 4 bytes ≈ 154 MB
Want to switch to:
→ OpenAI ada-002 (1536 dimensions)
→ Better quality
Problem:
→ Cannot mix 768-dim and 1536-dim vectors
→ Cosine similarity undefined across dimensions
→ Must re-embed all 50K documents
→ Cost: $5 + hours of processing
Deep Technical Analysis
Vector Dimension Fundamentals
Embeddings are fixed-length vectors:
Dimensionality Definition:
Model A output: [0.1, 0.2, 0.3, ..., 0.768] (768 dims)
Model B output: [0.1, 0.2, 0.3, ..., 0.1536] (1536 dims)
Cannot compute:
→ cos(A, B) → undefined (different lengths)
→ A · B → mismatched shapes
→ ||A - B|| → incompatible vectors
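To make the failure concrete, here is a minimal sketch using NumPy. The two random vectors are only stand-ins for real Model A and Model B outputs; no embedding model is called.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=768)    # stand-in for a 768-dim Model A embedding
b = rng.normal(size=1536)   # stand-in for a 1536-dim Model B embedding

try:
    np.dot(a, b)
    mismatch_error = None
except ValueError as exc:
    # NumPy refuses to take the dot product of vectors with unequal length
    mismatch_error = str(exc)

print(mismatch_error)
```

This is the same class of error a retrieval pipeline surfaces when a query embedded with one model is compared against an index built with another.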
Why Different Dimensions:
Model architecture determines output size:
→ Transformer final layer size
→ sentence-transformers/all-MiniLM: 384 dims
→ sentence-transformers/all-mpnet-base: 768 dims
→ OpenAI text-embedding-ada-002: 1536 dims
→ Cohere embed-english-v3.0: 1024 dims
Higher dimensions:
→ More information capacity
→ Can capture finer nuances, though training data and objective matter as much as vector size
→ But: More storage, compute
Similarity Computation Requirements
Vector similarity needs equal dimensions:
Cosine Similarity:
cos(A, B) = (A · B) / (||A|| × ||B||)
Dot product A · B:
→ Requires len(A) == len(B)
→ A[0]×B[0] + A[1]×B[1] + ... + A[n-1]×B[n-1], where n = len(A) = len(B)
If len(A) = 768, len(B) = 1536:
→ Dot product undefined
→ Cannot compute similarity
→ Retrieval breaks
Euclidean Distance:
d(A, B) = √(Σ(A[i] - B[i])²)
Also requires equal length:
→ If A has 768 elements, B has 1536
→ Cannot pair elements for subtraction
→ Distance undefined
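The two formulas above can be sketched with an explicit length check, so that a mismatch fails with a clear message instead of a low-level shape error. The function names are illustrative, not taken from any particular library.

```python
import numpy as np

def _check_same_shape(a: np.ndarray, b: np.ndarray) -> None:
    # Both metrics pair element i of a with element i of b
    if a.shape != b.shape:
        raise ValueError(f"dimension mismatch: {a.shape} vs {b.shape}")

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    _check_same_shape(a, b)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return float(np.dot(a, b) / denom)

def euclidean_distance(a: np.ndarray, b: np.ndarray) -> float:
    _check_same_shape(a, b)
    return float(np.sqrt(np.sum((a - b) ** 2)))
```

Calling either function with a 768-element and a 1536-element array raises the `ValueError` before any arithmetic is attempted.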
Naive Padding/Truncation Issues
Simple dimension matching fails:
Zero-Padding (Bad Idea):
768-dim vector: [0.1, 0.2, ..., 0.768]
Pad to 1536: [0.1, 0.2, ..., 0.768, 0, 0, ..., 0]
Problems:
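→ Semantic meaning only in first 768 dims
→ Last 768 dims are meaningless zeros
→ ||A|| (magnitude) is unchanged, so the padded vector looks valid
→ But the two models' coordinate axes are unrelated: dimension i of Model A does not mean what dimension i of Model B means
→ Cosine similarity is computable but no longer meaningful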
Example Calculation:
Original 768-dim vector A:
→ ||A|| = 1.0 (normalized)
Zero-padded to 1536:
→ A' = [A, zeros(768)]
→ ||A'|| = √(1.0² + 0 + ... + 0) = 1.0
Native 1536-dim vector B:
→ ||B|| = 1.0
cos(A', B) computes:
→ (A'[0]×B[0] + ... + A'[767]×B[767] + 0×B[768] + ... + 0×B[1535]) / (1.0 × 1.0)
→ Only uses first half of B
→ Ignores half of B's semantic information
→ Pairs unrelated coordinates from two different models, so the score is essentially noise
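The calculation above can be checked numerically. This is a sketch with random unit vectors standing in for real embeddings; it confirms that zero-padding leaves the norm at 1.0 and that the padded score depends only on the first 768 components of B.

```python
import numpy as np

rng = np.random.default_rng(42)

def unit(v: np.ndarray) -> np.ndarray:
    # Scale a vector to unit length
    return v / np.linalg.norm(v)

a = unit(rng.normal(size=768))     # stand-in for a normalized Model A vector
b = unit(rng.normal(size=1536))    # stand-in for a normalized Model B vector

a_padded = np.concatenate([a, np.zeros(768)])   # A' = [A, zeros(768)]

norm_after_padding = float(np.linalg.norm(a_padded))
padded_score = float(a_padded @ b)       # cos(A', B), both norms are 1.0
first_half_score = float(a @ b[:768])    # same sum without the zero terms
```

`padded_score` and `first_half_score` agree up to floating-point rounding, which shows that the second half of B never contributes to the result.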
Truncation (Also Bad):
1536-dim vector: [0.1, 0.2, ..., 0.1536]
Truncate to 768: [0.1, 0.2, ..., 0.768]
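Problems:
→ Discards last 768 dimensions
→ Loses information encoded there
→ Truncated vector is no longer unit length unless renormalized
→ Semantic meaning damaged, unless the model was trained for truncation (Matryoshka embeddings)
→ Still not comparable with another model's 768-dim vectors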
Dimensionality Reduction Techniques
Proper dimension conversion:
PCA (Principal Component Analysis):
Learn projection matrix from data:
1. Collect sample embeddings (768-dim)
2. Compute covariance matrix
3. Find principal components
4. Project to lower dimensions (e.g., 256)
Preserves maximum variance
→ Information loss minimized for a linear projection
→ But: Requires training data
→ Model-specific: the projection is fitted to one model's embeddings and does not make two models comparable
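The four PCA steps can be sketched in NumPy as follows. This version uses an SVD of the centered data, which yields the same principal components as an eigendecomposition of the covariance matrix. The random `samples` array is a placeholder for real 768-dim embeddings, and the function names are illustrative.

```python
import numpy as np

def fit_pca(samples: np.ndarray, n_components: int):
    """Learn a projection from sample embeddings (rows are vectors)."""
    mean = samples.mean(axis=0)
    centered = samples - mean
    # Rows of vt are principal directions, ordered by decreasing variance
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]

def apply_pca(vectors: np.ndarray, mean: np.ndarray, components: np.ndarray) -> np.ndarray:
    """Project vectors onto the learned components."""
    return (vectors - mean) @ components.T

rng = np.random.default_rng(0)
samples = rng.normal(size=(1000, 768))   # placeholder for real embeddings

mean, components = fit_pca(samples, n_components=256)
reduced = apply_pca(samples, mean, components)   # shape (1000, 256)
```

The same `mean` and `components` must be applied to every document and every query, and they are only valid for the model whose embeddings were used to fit them.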
Autoencoder Compression:
Train neural network:
→ Input: 1536-dim embedding
→ Bottleneck: 768-dim latent space
→ Output: Reconstruct 1536-dim
Use bottleneck as compressed representation
→ 768 dims that "best represent" 1536
→ Trainable, learnable compression
Limitations:
→ Requires training data and compute
→ Lossy compression
→ Adds inference latency
Random Projection:
Matrix multiplication with random matrix:
→ A (1536-dim) × R (1536×768) = A' (768-dim)
Johnson-Lindenstrauss lemma:
→ Preserves distances approximately
→ With high probability
Advantages:
→ No training needed
→ Fast
Disadvantages:
→ Approximate, not exact
→ Still lossy
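A minimal sketch of Gaussian random projection, assuming a matrix with entries drawn from a normal distribution and scaled by 1/√768 so that squared distances are preserved in expectation. The input vectors are random placeholders for real embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out = 1536, 768

# Random projection matrix R (1536 x 768); no training involved
R = rng.normal(size=(d_in, d_out)) / np.sqrt(d_out)

x = rng.normal(size=d_in)   # placeholder embedding
y = rng.normal(size=d_in)   # placeholder embedding

original_distance = np.linalg.norm(x - y)
projected_distance = np.linalg.norm(x @ R - y @ R)

# Close to 1.0 when the distance is approximately preserved
ratio = float(projected_distance / original_distance)
```

The matrix `R` has to be stored and reused for every document and query; drawing a fresh matrix would place vectors in an unrelated space.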
Storage and Index Implications
Different dimensions affect infrastructure:
Storage Costs:
768-dim embeddings:
→ 768 floats × 4 bytes = 3,072 bytes per vector
1536-dim embeddings:
→ 1536 floats × 4 bytes = 6,144 bytes per vector
2x storage for same document count
→ 1M documents: about 6.1 GB vs 3.1 GB of raw vectors
→ Doubles vector storage cost (index overhead comes on top)
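The storage arithmetic can be reproduced with a small helper. It assumes 4-byte float32 values and counts raw vectors only, without index structures or metadata; the function name is illustrative.

```python
def vector_storage_bytes(num_vectors: int, dims: int, bytes_per_value: int = 4) -> int:
    """Raw storage for dense vectors, excluding index overhead and metadata."""
    return num_vectors * dims * bytes_per_value

per_vector_768 = vector_storage_bytes(1, 768)            # 3,072 bytes
per_vector_1536 = vector_storage_bytes(1, 1536)          # 6,144 bytes
million_768 = vector_storage_bytes(1_000_000, 768)       # 3,072,000,000 bytes
million_1536 = vector_storage_bytes(1_000_000, 1536)     # 6,144,000,000 bytes
```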
Vector DB Index Structure:
HNSW (Hierarchical Navigable Small World):
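→ Builds graph over vectors
→ Dimension affects the cost of every distance computation along an edge
→ Higher dimensions: edge count is set by the graph parameters (such as M), but each stored vector is larger, so the index grows
ANN algorithms:
→ Higher dimensions: "Curse of dimensionality"
→ Distances between near and far neighbors become less distinct in high-dim space
→ Recall at a fixed search budget can degrade
→ May need larger search parameters or more sophisticated algorithms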
Query Performance:
768-dim similarity:
→ 768 multiplications + 768 additions
→ Fast
1536-dim similarity:
→ 1536 multiplications + 1536 additions
→ 2x compute per comparison
For 1M vectors:
→ Noticeable latency difference
Multi-Model Hybrid Systems
Supporting multiple embedding models:
Separate Vector Spaces:
Approach: Maintain separate indexes per model
Index A: sentence-BERT embeddings (768-dim)
Index B: OpenAI embeddings (1536-dim)
Query arrives:
1. Detect embedding model to use
2. Route to appropriate index
3. Search within that vector space
Pros:
→ Clean separation
→ No dimension conflicts
Cons:
→ Cannot cross-search
→ 2x infrastructure
→ Duplicate documents
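The routing steps can be sketched with a brute-force in-memory index per model. `ModelIndex`, `routed_search`, and the `"sbert"` / `"openai"` keys are hypothetical names for illustration; a real deployment would use a vector database collection per model instead.

```python
import numpy as np

class ModelIndex:
    """Brute-force cosine index that accepts vectors of one fixed dimension."""

    def __init__(self, dims: int):
        self.dims = dims
        self.ids = []
        self.vectors = []

    def _validate(self, vector) -> np.ndarray:
        vector = np.asarray(vector, dtype=np.float32)
        if vector.shape != (self.dims,):
            raise ValueError(f"expected {self.dims} dims, got shape {vector.shape}")
        return vector / np.linalg.norm(vector)

    def add(self, doc_id: str, vector) -> None:
        self.ids.append(doc_id)
        self.vectors.append(self._validate(vector))

    def search(self, query, k: int = 3):
        query = self._validate(query)
        if not self.vectors:
            return []
        scores = np.stack(self.vectors) @ query
        top = np.argsort(-scores)[:k]
        return [(self.ids[i], float(scores[i])) for i in top]

# One index per embedding model, keyed by a model name chosen by the caller
indexes = {"sbert": ModelIndex(768), "openai": ModelIndex(1536)}

def routed_search(model_name: str, query_vector, k: int = 3):
    # Route the query to the index built with the same model
    return indexes[model_name].search(query_vector, k)
```

Because each index validates its own dimension, a query embedded with the wrong model is rejected at the boundary instead of producing a shape error deep inside the search.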
Feature-Level Fusion:
Combine embeddings before indexing:
Doc embedding: [sentence-BERT (768), OpenAI (1536)]
→ Concatenate to 2304-dim vector
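Or weighted combination:
→ 0.5 × sentence-BERT + 0.5 × OpenAI
→ But: element-wise addition is undefined for 768-dim and 1536-dim vectors
→ Instead, weight the two blocks of the concatenated vector, or combine per-model similarity scores
Must normalize first:
→ L2-normalize each embedding to unit length
→ Then combine
→ Otherwise the model with larger magnitudes dominates; magnitude information is discarded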
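One way to sketch the normalize-then-combine idea, assuming unit-length (L2) normalization of each part and a weighted concatenation. Scaling each block by the square root of its weight makes the dot product of two fused vectors equal to the weighted sum of the per-model cosine similarities. The function name `fuse` and the default weight are illustrative.

```python
import numpy as np

def fuse(sbert_vec: np.ndarray, openai_vec: np.ndarray, weight_sbert: float = 0.5) -> np.ndarray:
    """Concatenate two embeddings of one document into a single unit vector."""
    if not 0.0 <= weight_sbert <= 1.0:
        raise ValueError("weight_sbert must be between 0 and 1")
    a = sbert_vec / np.linalg.norm(sbert_vec)     # unit-length 768-dim part
    b = openai_vec / np.linalg.norm(openai_vec)   # unit-length 1536-dim part
    # sqrt weights: fused_1 . fused_2 = w * cos_sbert + (1 - w) * cos_openai
    return np.concatenate([np.sqrt(weight_sbert) * a, np.sqrt(1.0 - weight_sbert) * b])
```

The fused vector has 2304 dimensions, so it carries the storage and compute cost of both models, and queries must be embedded with both models and fused the same way.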
Migration Strategies
Transitioning between models:
Phased Migration:
Week 1: New docs → Model B (1536-dim)
→ Old docs remain Model A (768-dim)
→ Two indexes
Week 2-8: Gradually re-embed old docs
→ 10K docs/week to Model B
→ Move to Index B
Week 9: Decommission Index A
→ All docs in Model B
Allows gradual migration
→ But: Split search during transition
→ Complexity in routing queries
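During the split-search period, results from the two indexes have to be merged. Raw similarity scores from different models are not on a common scale, so this sketch merges by rank using reciprocal rank fusion, a technique not named in the text above. The function name, the constant `k=60`, and the document IDs are assumptions for illustration.

```python
def reciprocal_rank_fusion(rankings, k: int = 60):
    """Merge ranked lists of document IDs without comparing raw scores."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for every document it returns
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

# Hypothetical results from Index A (Model A) and Index B (Model B)
from_index_a = ["doc-3", "doc-7"]
from_index_b = ["doc-9", "doc-1"]

merged = reciprocal_rank_fusion([from_index_a, from_index_b])
```

Each query still has to be embedded twice, once per model, which is part of the routing complexity noted above.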
Offline Bulk Migration:
Weekend downtime:
1. Take system read-only
2. Re-embed all docs with Model B
3. Build new index
4. Swap to new index
5. Resume service
Pros:
→ Clean cutover
→ No split state
Cons:
→ Downtime (hours)
→ All-or-nothing
→ Expensive (all docs at once)
The Cost-Performance Trade-off:
Smaller dimensions (384):
→ Cheaper storage
→ Faster queries
→ But: Lower quality
Larger dimensions (1536):
→ Better semantic capture
→ Higher accuracy
→ But: 4x cost, slower
Must balance based on:
→ Budget constraints
→ Latency requirements
→ Quality needs
How to Solve
Standardize on a single embedding model across the organization. Where flexible vector sizes are needed, prefer Matryoshka embeddings (models trained to support variable output dimensions). If multiple models are unavoidable, maintain a separate index per model and dimension. Plan migration windows for model changes, and use dimensionality reduction (PCA) only if absolutely necessary. See Dimension Management.
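For Matryoshka-style models, a shorter vector is obtained by keeping the leading dimensions and renormalizing. The sketch below assumes the model was trained so that prefixes of its output are usable on their own; for any other model this is plain lossy truncation. The function name is illustrative.

```python
import numpy as np

def truncate_and_renormalize(embeddings: np.ndarray, dims: int) -> np.ndarray:
    """Keep the first `dims` components of each vector and rescale to unit length."""
    if dims > embeddings.shape[-1]:
        raise ValueError(f"cannot truncate {embeddings.shape[-1]} dims to {dims}")
    shortened = embeddings[..., :dims]
    norms = np.linalg.norm(shortened, axis=-1, keepdims=True)
    return shortened / norms
```

Documents and queries must be truncated to the same length, and both must come from the same model.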
Last updated January 26, 2026


