Dimensionality Mismatch

The Problem

Embeddings from different models or model versions often have different dimensions. Vectors of different lengths cannot be compared, so switching models means an expensive re-embedding of the entire knowledge base.

Symptoms

❌ Cannot compute similarity (vector length mismatch)
❌ Must re-embed 100K docs to switch models
❌ "Shape error: (768,) vs (1536,)"
❌ Hybrid search breaks when models differ
❌ Expensive model migration

Real-World Example

Current setup:
→ Using Sentence-BERT (768 dimensions)
→ 50,000 documents embedded
→ Storage: 50K × 768 × 4 bytes ≈ 154 MB

Want to switch to:
→ OpenAI ada-002 (1536 dimensions)
→ Better quality

Problem:
→ Cannot mix 768-dim and 1536-dim vectors
→ Cosine similarity undefined across dimensions
→ Must re-embed all 50K documents
→ Cost: $5 + hours of processing

Deep Technical Analysis

Vector Dimension Fundamentals

Embeddings are fixed-length vectors:

Dimensionality Definition:

Model A output: [0.1, 0.2, 0.3, ..., 0.768] (768 dims)
Model B output: [0.1, 0.2, 0.3, ..., 0.1536] (1536 dims)

Cannot compute:
→ cos(A, B) → undefined (different lengths)
→ A · B → mismatched shapes
→ ||A - B|| → incompatible vectors
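
The failure above is easy to reproduce. The following is a minimal sketch using NumPy, with random vectors standing in for real model outputs (the vectors and the `try_dot` helper are illustrative assumptions, not part of any embedding library):

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=768)    # stand-in for a Model A embedding
b = rng.normal(size=1536)   # stand-in for a Model B embedding


def try_dot(x, y):
    """Return the dot product, or the error text if the shapes differ."""
    try:
        return float(np.dot(x, y))
    except ValueError as exc:
        return f"ValueError: {exc}"


result = try_dot(a, b)      # shapes (768,) and (1536,) do not align
print(result)
```

The same call succeeds when both arguments come from the same model, which is why the error usually appears only after a model change.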

Why Different Dimensions:

Model architecture determines output size:
→ Transformer final layer size
→ sentence-transformers/all-MiniLM: 384 dims
→ sentence-transformers/all-mpnet-base: 768 dims
→ OpenAI text-embedding-ada-002: 1536 dims
→ Cohere embed-english-v3.0: 1024 or 768 dims (configurable)

Higher dimensions:
→ More information capacity
→ Better at capturing nuances
→ But: More storage, compute

Similarity Computation Requirements

Vector similarity needs equal dimensions:

Cosine Similarity:

cos(A, B) = (A · B) / (||A|| × ||B||)

Dot product A · B:
→ Requires len(A) == len(B)
→ A[0]×B[0] + A[1]×B[1] + ... + A[n-1]×B[n-1], where n = len(A)

If len(A) = 768, len(B) = 1536:
→ Dot product undefined
→ Cannot compute similarity
→ Retrieval breaks

Euclidean Distance:

d(A, B) = √(Σ(A[i] - B[i])²)

Also requires equal length:
→ If A has 768 elements, B has 1536
→ Cannot pair elements for subtraction
→ Distance undefined
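
Both formulas above can be sketched in a few lines of NumPy. The explicit length check is an assumption added for illustration; it fails early with a readable message instead of relying on the library's own shape error:

```python
import numpy as np


def _check_same_length(a, b):
    # Both metrics pair up elements, so the shapes must match exactly.
    if a.shape != b.shape:
        raise ValueError(f"dimension mismatch: {a.shape} vs {b.shape}")


def cosine_similarity(a, b):
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    _check_same_length(a, b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def euclidean_distance(a, b):
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    _check_same_length(a, b)
    return float(np.linalg.norm(a - b))
```

Calling either function with a 768-dim and a 1536-dim vector raises `ValueError` before any arithmetic happens.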

Naive Padding/Truncation Issues

Simple dimension matching fails:

Zero-Padding (Bad Idea):

768-dim vector: [0.1, 0.2, ..., 0.768]
Pad to 1536: [0.1, 0.2, ..., 0.768, 0, 0, ..., 0]

Problems:
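→ The two models' vector spaces are unrelated: dimension i of one model does not mean the same thing as dimension i of the other
→ Semantic meaning only in first 768 dims
→ Last 768 dims are meaningless zeros
→ Similarity scores skewed
→ ||A|| (magnitude) is unchanged, but the padded dims contribute nothing to any dot product
→ Cosine similarity against native 1536-dim vectors no longer meaningful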

Example Calculation:

Original 768-dim vector A:
→ ||A|| = 1.0 (normalized)

Zero-padded to 1536:
→ A' = [A, zeros(768)]
→ ||A'|| = √(1.0² + 0 + ... + 0) = 1.0

Native 1536-dim vector B:
→ ||B|| = 1.0

cos(A', B) computes:
→ (A'[0]×B[0] + ... + A'[767]×B[767] + 0×B[768] + ... + 0×B[1535]) / (1.0 × 1.0)
→ Only uses first half of B
→ Ignores half of B's semantic information
→ Pairs coordinates from unrelated vector spaces, so the score carries no semantic signal
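
The example calculation can be checked numerically. This sketch uses random unit vectors as stand-ins for real embeddings (an assumption; real model outputs would behave the same way with respect to the arithmetic):

```python
import numpy as np

rng = np.random.default_rng(0)


def unit(v):
    """Scale a vector to length 1."""
    return v / np.linalg.norm(v)


a = unit(rng.normal(size=768))     # stand-in for a normalized 768-dim vector
b = unit(rng.normal(size=1536))    # stand-in for a normalized 1536-dim vector

a_padded = np.concatenate([a, np.zeros(768)])   # A' = [A, zeros(768)]

norm_before = np.linalg.norm(a)
norm_after = np.linalg.norm(a_padded)           # zeros add nothing to the norm

full_dot = float(a_padded @ b)                  # dot product over all 1536 dims
first_half_dot = float(a @ b[:768])             # dot product over B's first half only

print(norm_before, norm_after)
print(full_dot, first_half_dot)
```

The two dot products agree up to rounding, which confirms that the second half of B never influences the score.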

Truncation (Also Bad):

1536-dim vector: [0.1, 0.2, ..., 0.1536]
Truncate to 768: [0.1, 0.2, ..., 0.768]

Problems:
→ Discards last 768 dimensions
→ Loses information encoded there
→ Semantic meaning damaged
→ No longer represents original document

Dimensionality Reduction Techniques

Proper dimension conversion:

PCA (Principal Component Analysis):

Learn projection matrix from data:
1. Collect sample embeddings (768-dim)
2. Compute covariance matrix
3. Find principal components
4. Project to lower dimensions (e.g., 256)

Preserves maximum variance
→ Information loss minimized
→ But: Requires training data
→ Model-specific (cannot mix models)
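
The four steps above can be sketched with NumPy alone. This version takes the SVD of the centered data, which yields the same principal components as an eigendecomposition of the covariance matrix. The sample data is synthetic and the function names are illustrative assumptions:

```python
import numpy as np


def fit_pca(samples, n_components):
    """Learn a PCA projection from sample embeddings (one vector per row)."""
    mean = samples.mean(axis=0)
    centered = samples - mean
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]


def apply_pca(vectors, mean, components):
    """Project vectors onto the learned components."""
    return (vectors - mean) @ components.T


rng = np.random.default_rng(0)
samples = rng.normal(size=(1000, 768))        # synthetic stand-in for real embeddings
mean, components = fit_pca(samples, n_components=256)
reduced = apply_pca(samples, mean, components)
print(reduced.shape)
```

The learned `mean` and `components` must be stored and applied to every future document and query vector from the same model; a projection fitted on one model's embeddings says nothing about another model's space.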

Autoencoder Compression:

Train neural network:
→ Input: 1536-dim embedding
→ Bottleneck: 768-dim latent space
→ Output: Reconstruct 1536-dim

Use bottleneck as compressed representation
→ 768 dims that "best represent" 1536
→ Trainable, learnable compression

Limitations:
→ Requires training data and compute
→ Lossy compression
→ Adds inference latency

Random Projection:

Matrix multiplication with random matrix:
→ A (1536-dim) × R (1536×768) = A' (768-dim)

Johnson-Lindenstrauss lemma:
→ Preserves distances approximately
→ With high probability

Advantages:
→ No training needed
→ Fast

Disadvantages:
→ Approximate, not exact
→ Still lossy
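
A minimal sketch of Gaussian random projection, assuming synthetic input vectors. The matrix is scaled by 1/√768 so that squared lengths are preserved in expectation; the ratio printed at the end shows how much individual distances drift:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out = 1536, 768

# Random matrix R (1536×768), scaled so squared lengths are preserved on average.
R = rng.normal(size=(d_in, d_out)) / np.sqrt(d_out)

x = rng.normal(size=(200, d_in))   # synthetic stand-in for 1536-dim embeddings
x_proj = x @ R                     # A × R = A' (768-dim)


def consecutive_distances(m):
    """Euclidean distance between each row and the next one."""
    return np.linalg.norm(m[1:] - m[:-1], axis=1)


ratio = consecutive_distances(x_proj) / consecutive_distances(x)
print(ratio.min(), ratio.max())    # values near 1.0 mean distances are roughly preserved
```

As with PCA, the same matrix `R` has to be reused for every document and query; regenerating it produces an incompatible space.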

Storage and Index Implications

Different dimensions affect infrastructure:

Storage Costs:

768-dim embeddings:
→ 768 floats × 4 bytes = 3,072 bytes per vector

1536-dim embeddings:
→ 1536 floats × 4 bytes = 6,144 bytes per vector

2x storage for same document count
→ 1M documents: 6 GB vs 3 GB
→ Doubles cost
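
The arithmetic above as a small helper. It counts raw float32 vector data only; index structures and metadata add overhead that depends on the vector database and is not modeled here:

```python
def embedding_storage_bytes(num_vectors, dims, bytes_per_value=4):
    """Raw storage for the vectors alone (float32 by default)."""
    return num_vectors * dims * bytes_per_value


for dims in (768, 1536):
    size = embedding_storage_bytes(1_000_000, dims)
    print(f"{dims} dims: {size / 1e9:.2f} GB")
# 768 dims: 3.07 GB
# 1536 dims: 6.14 GB
```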

Vector DB Index Structure:

HNSW (Hierarchical Navigable Small World):
→ Builds graph over vectors
→ Dimension affects edge computations
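→ Higher dimensions: Costlier distance computations and more memory per stored vector (edge count is set by the graph's connectivity parameter, not by the dimension)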

ANN algorithms:
→ Higher dimensions: "Curse of dimensionality"
→ Distances to near and far neighbors become harder to tell apart in high-dim space
→ Retrieval accuracy degrades
→ Need more sophisticated algorithms

Query Performance:

768-dim similarity:
→ 768 multiplications + 768 additions
→ Fast

1536-dim similarity:
→ 1536 multiplications + 1536 additions
→ 2x compute per comparison

For 1M vectors:
→ Noticeable latency difference

Multi-Model Hybrid Systems

Supporting multiple embedding models:

Separate Vector Spaces:

Approach: Maintain separate indexes per model

Index A: sentence-BERT embeddings (768-dim)
Index B: OpenAI embeddings (1536-dim)

Query arrives:
1. Detect embedding model to use
2. Route to appropriate index
3. Search within that vector space

Pros:
→ Clean separation
→ No dimension conflicts

Cons:
→ Cannot cross-search
→ 2x infrastructure
→ Duplicate documents
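
The routing approach can be sketched with a toy in-memory index. `BruteForceIndex` and the `model_a` / `model_b` labels are hypothetical stand-ins for real vector database collections and model identifiers:

```python
import numpy as np


class BruteForceIndex:
    """Toy in-memory index that accepts vectors of exactly one dimension."""

    def __init__(self, dims):
        self.dims = dims
        self.ids = []
        self.vectors = []

    def add(self, doc_id, vector):
        vector = np.asarray(vector, dtype=float)
        if vector.shape != (self.dims,):
            raise ValueError(f"expected {self.dims} dims, got {vector.shape}")
        self.ids.append(doc_id)
        self.vectors.append(vector / np.linalg.norm(vector))

    def search(self, query, k=3):
        query = np.asarray(query, dtype=float)
        if query.shape != (self.dims,):
            raise ValueError(f"expected {self.dims} dims, got {query.shape}")
        if not self.vectors:
            return []
        # Stored vectors are unit length, so this dot product is cosine similarity.
        scores = np.stack(self.vectors) @ (query / np.linalg.norm(query))
        top = np.argsort(-scores)[:k]
        return [(self.ids[i], float(scores[i])) for i in top]


# One index per embedding model, keyed by a hypothetical model label.
indexes = {"model_a": BruteForceIndex(768), "model_b": BruteForceIndex(1536)}


def search(model_name, query_vector, k=3):
    """Route the query to the index that matches its embedding model."""
    return indexes[model_name].search(query_vector, k)
```

Rejecting wrong-sized vectors at both `add` and `search` keeps a misrouted query from ever reaching the similarity computation.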

Feature-Level Fusion:

Combine embeddings before indexing:

Doc embedding: [sentence-BERT (768), OpenAI (1536)]
→ Concatenate to 2304-dim vector

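Or weighted combination:
→ 0.5 × sentence-BERT + 0.5 × OpenAI
→ But: An element-wise sum is undefined for 768-dim and 1536-dim vectors, and the two models' scales differ

Must normalize first:
→ L2-normalize each embedding to unit length
→ Then apply the weights to the concatenated parts, or to the per-model similarity scores
→ Discards each vector's magnitude, keeping only its direction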
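
One way to make the weighted combination well defined is to L2-normalize each part, scale it by the square root of its weight, and concatenate. Under that assumption the dot product of two fused vectors equals the weighted sum of the per-model cosine similarities. The `fuse` helper and the random vectors below are illustrative only:

```python
import numpy as np


def unit(v):
    """Scale a vector to length 1."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)


def fuse(vec_a, vec_b, weight_a=0.5):
    """Concatenate two normalized embeddings into one weighted vector."""
    weight_b = 1.0 - weight_a
    return np.concatenate([np.sqrt(weight_a) * unit(vec_a),
                           np.sqrt(weight_b) * unit(vec_b)])


rng = np.random.default_rng(0)
doc1_a, doc1_b = rng.normal(size=768), rng.normal(size=1536)
doc2_a, doc2_b = rng.normal(size=768), rng.normal(size=1536)

fused_score = float(fuse(doc1_a, doc1_b) @ fuse(doc2_a, doc2_b))
separate_score = (0.5 * float(unit(doc1_a) @ unit(doc2_a))
                  + 0.5 * float(unit(doc1_b) @ unit(doc2_b)))
print(fused_score, separate_score)   # the two scores agree up to rounding
```

The fused vector has 2304 dimensions, so this approach trades a single index for higher storage and per-query compute, and every document and query still has to be embedded by both models.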

Migration Strategies

Transitioning between models:

Phased Migration:

Week 1: New docs → Model B (1536-dim)
→ Old docs remain Model A (768-dim)
→ Two indexes

Week 2-8: Gradually re-embed old docs
→ 10K docs/week to Model B
→ Move to Index B

Week 9: Decommission Index A
→ All docs in Model B

Allows gradual migration
→ But: Split search during transition
→ Complexity in routing queries
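
The bookkeeping behind a phased migration can be sketched as follows. `MigrationState`, the index labels, and the `reembed` callback are hypothetical; in a real system the callback would re-embed the document with Model B and write it to Index B:

```python
def plan_batches(doc_ids, batch_size):
    """Split the document ids into fixed-size migration batches."""
    for start in range(0, len(doc_ids), batch_size):
        yield doc_ids[start:start + batch_size]


class MigrationState:
    """Tracks which index currently holds each document."""

    def __init__(self, doc_ids):
        self.location = {doc_id: "index_a" for doc_id in doc_ids}

    def migrate(self, batch, reembed):
        for doc_id in batch:
            reembed(doc_id)                      # hypothetical re-embedding callback
            self.location[doc_id] = "index_b"

    def index_for(self, doc_id):
        return self.location[doc_id]

    def done(self):
        return all(loc == "index_b" for loc in self.location.values())
```

While `done()` is false, a search has to embed the query with both models and consult both indexes, which is the routing complexity noted above.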

Offline Bulk Migration:

Weekend downtime:
1. Take system read-only
2. Re-embed all docs with Model B
3. Build new index
4. Swap to new index
5. Resume service

Pros:
→ Clean cutover
→ No split state

Cons:
→ Downtime (hours)
→ All-or-nothing
→ Expensive (all docs at once)

The Cost-Performance Trade-off:

Smaller dimensions (384):
→ Cheaper storage
→ Faster queries
→ But: Lower quality

Larger dimensions (1536):
→ Better semantic capture
→ Higher accuracy
→ But: 4x storage and per-comparison compute relative to 384 dims, slower

Must balance based on:
→ Budget constraints
→ Latency requirements
→ Quality needs

How to Solve

Standardize on a single embedding model across the organization. Where possible, use matryoshka embeddings (models that support variable output dimensions). If multiple models are needed, maintain separate indexes per dimension, plan migration windows for model changes, and use dimensionality reduction (PCA) only if absolutely necessary. See Dimension Management.
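
For models trained to produce matryoshka embeddings, a shorter vector is typically obtained by keeping the leading dimensions and re-normalizing. The sketch below assumes such a model; for any other model this is plain truncation and loses information. The function name is illustrative:

```python
import numpy as np


def shorten_matryoshka(vector, dims):
    """Keep the first `dims` values and rescale to unit length.

    Only meaningful for embeddings from a model trained with a
    matryoshka-style objective (an assumption of this sketch).
    """
    vector = np.asarray(vector, dtype=float)
    if dims > vector.shape[-1]:
        raise ValueError(f"cannot shorten {vector.shape[-1]} dims to {dims}")
    head = vector[..., :dims]
    return head / np.linalg.norm(head, axis=-1, keepdims=True)


rng = np.random.default_rng(0)
full = rng.normal(size=1536)          # stand-in for a matryoshka model output
short = shorten_matryoshka(full, 256)
print(short.shape)
```

Documents and queries must be shortened to the same length before they are indexed or compared.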
