vector-search · embeddings · similarity-search · semantic-search

Vector Search: Finding Meaning Instead of Keywords

How AI finds similar content by understanding meaning, not just matching words

AI Resources Team · 9 min read

You search for "affordable shoes" and get results for "budget-friendly footwear." Traditional search engines would fail—no keyword match. But vector search understands that "affordable" and "budget-friendly" mean the same thing. That's the revolution: search by meaning, not by keywords.


The Problem Vector Search Solves

Traditional search: "Find pages with these exact words."

Query: "affordable shoes"
Results: Only pages containing "affordable" AND "shoes"
Misses: "cheap sneakers", "budget footwear", "inexpensive boots"

Vector search: "Find pages with meaning similar to my query."

Query: "affordable shoes"
Results: "cheap sneakers" ✓, "budget footwear" ✓, "discount boots" ✓
Understands: Same semantic meaning, different words

How Vector Search Works (The Core Idea)

Step 1: Convert Words to Numbers (Embeddings)

Words are converted into high-dimensional vectors (lists of numbers).

"Dog" → [0.2, 0.5, -0.1, 0.8, 0.3, ...]
"Puppy" → [0.21, 0.51, -0.09, 0.79, 0.31, ...]
"Cat" → [0.1, 0.3, -0.4, 0.6, 0.2, ...]

Similar words have similar vectors. "Dog" and "Puppy" are very close. "Dog" and "Cat" are farther apart, though still closer to each other than to unrelated words like "Pizza".

Step 2: Measure Distance Between Vectors

Calculate how similar two vectors are using cosine similarity or Euclidean distance.

Cosine Similarity between "dog" and "puppy": 0.99 (very similar, near 1)
Cosine Similarity between "dog" and "cat": 0.85 (somewhat similar)
Cosine Similarity between "dog" and "pizza": 0.1 (very different)
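These scores come from a simple formula: the dot product of the two vectors divided by the product of their lengths. A minimal numpy sketch, using made-up 3-D vectors (real embeddings have hundreds of dimensions):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1 = same direction."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

dog   = [0.20, 0.50, 0.80]   # toy vectors, invented for illustration
puppy = [0.21, 0.51, 0.79]
pizza = [0.90, -0.40, 0.10]

print(cosine_similarity(dog, puppy))  # close to 1 (very similar)
print(cosine_similarity(dog, pizza))  # close to 0 (unrelated)
```

Because cosine similarity measures the angle rather than the length, two vectors pointing the same way score near 1 even if one is longer than the other.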

Step 3: Find Closest Matches

Return items with vectors closest to the query vector.

Query: "affordable shoes"
→ Convert to vector: [query_vector]

Compare against all items:
- "cheap sneakers": distance 0.05 ✓ (very close, top result)
- "discount boots": distance 0.08 ✓ (close, second result)
- "expensive handbags": distance 0.9 ✗ (far away, irrelevant)
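The comparison step above is, at its simplest, a brute-force scan: score every item against the query and sort. A numpy sketch with invented 3-D item vectors standing in for real model outputs:

```python
import numpy as np

def top_k(query_vec, item_vecs, item_names, k=2):
    """Return the k items whose vectors are most similar to the query."""
    q = np.asarray(query_vec, dtype=float)
    items = np.asarray(item_vecs, dtype=float)
    # Normalize so a plain dot product equals cosine similarity.
    q = q / np.linalg.norm(q)
    items = items / np.linalg.norm(items, axis=1, keepdims=True)
    scores = items @ q
    order = np.argsort(-scores)[:k]
    return [(item_names[i], round(float(scores[i]), 3)) for i in order]

query_vec = [0.9, 0.1, 0.2]                  # stand-in for "affordable shoes"
names = ["cheap sneakers", "discount boots", "expensive handbags"]
vecs = [[0.88, 0.12, 0.21],                  # nearly parallel to the query
        [0.80, 0.20, 0.30],                  # close
        [-0.50, 0.70, 0.40]]                 # points elsewhere entirely

print(top_k(query_vec, vecs, names))
```

This linear scan is fine for thousands of items; the vector databases discussed later exist to make the same lookup fast over millions.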

Vectors: What They Represent

A vector is a point in high-dimensional space. The dimensions capture meaning.

Word "dog" as a vector:
[0.2, 0.5, -0.1, 0.8, ...]
  ↑   ↑    ↑    ↑
  |   |    |    └─ "Animal-ness"
  |   |    └────── "Size"
  |   └─────────── "Domestication"
  └──────────────── "Fluffy-ness"

Modern embeddings: 100-3,000 dimensions. In practice no single dimension maps cleanly onto a human concept like "fluffy-ness" (the diagram above is a simplification); meaning is distributed across many dimensions at once.

Key property: Vector arithmetic makes sense.

"King" - "Man" + "Woman" ≈ "Queen"
"Paris" - "France" + "Germany" ≈ "Berlin"
"Good" - "Bad" ≈ direction of sentiment

This is why embeddings are so powerful.
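The analogy arithmetic can be demonstrated with toy 2-D vectors chosen by hand so the relationship holds exactly; real word embeddings exhibit the same pattern only approximately:

```python
import numpy as np

# Hand-picked toy vectors: dimension 0 ≈ "royalty", dimension 1 ≈ "maleness".
vocab = {
    "king":  np.array([1.0, 1.0]),
    "queen": np.array([1.0, 0.0]),
    "man":   np.array([0.0, 1.0]),
    "woman": np.array([0.0, 0.0]),
    "pizza": np.array([5.0, 5.0]),
}

target = vocab["king"] - vocab["man"] + vocab["woman"]  # lands at [1.0, 0.0]

# Nearest word to the result, excluding the three input words.
candidates = {w: v for w, v in vocab.items() if w not in ("king", "man", "woman")}
nearest = min(candidates, key=lambda w: np.linalg.norm(candidates[w] - target))
print(nearest)  # queen
```

Subtracting "man" removes the maleness direction, adding "woman" adds nothing back in this toy space, and the remaining royalty component points straight at "queen".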


Creating Vectors (Embeddings)

Pre-Trained Models

Use existing models trained on billions of texts:

  • BERT: Converts text to 768-dimensional vectors
  • Sentence Transformers: Converts sentences to vectors (typically 384-768D)
  • GPT embeddings: OpenAI's embeddings (1536D)
  • Word2Vec: Classic embeddings (300D, older but fast)

How They're Trained

Models see billions of examples:

  • "dog" appears near "puppy" → make vectors similar
  • "dog" appears far from "pizza" → make vectors different
  • Repeat for all words/sentences

Result: Vectors automatically capture meaning through co-occurrence patterns.

Custom Embeddings

Train on your domain-specific data:

Medical texts about disease + symptoms
→ Train embeddings on this domain
→ "Fever" and "high_temperature" become similar
→ Domain-specific semantic understanding

Vector Search in Action (Real Examples)

Customer searches: "running shoes that don't hurt"

Traditional: No results (no page says "don't hurt")

Vector search:
- Converts query to vector
- Searches product descriptions, reviews
- Finds:
  * "Comfortable athletic sneakers"
  * "Cushioned running footwear"
  * "Supportive jogging shoes"
→ All semantically similar despite different wording

Customer Support

Customer: "My app keeps crashing"

Search internal knowledge base:
- "Application crashes on startup"
- "App is unstable"
- "Software fails frequently"

All match meaning, even though wording differs
→ Find right solution article

Content Recommendation

User watches video about "learning Python"

Vector search finds similar content:
- "Getting started with Python"
- "Python for beginners"
- "Introduction to programming in Python"

Without needing explicit tagging

Keyword Search vs. Vector Search

Aspect              Keyword Search       Vector Search
Match type          Exact words          Semantic meaning
Speed               Very fast            Slower (distance calc)
Accuracy            Literal              Contextual
Handles synonyms    No                   Yes
Ambiguity           Can't resolve        Contextual resolution
Setup               Simple               Requires embeddings
Use case            Structured data      Unstructured text/images

When keyword search wins:

  • Exact matches (product IDs, dates)
  • Highly structured data (databases)
  • Speed is critical

When vector search wins:

  • Semantic understanding needed
  • Fuzzy matches acceptable
  • Synonyms/paraphrases matter

Building Vector Search, Step by Step

Step 1: Choose an Embedding Model

For text:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')  # 384D, fast, accurate

document = "The cat sat on the mat"
embedding = model.encode(document)  # [0.2, -0.1, 0.5, ...]

Step 2: Embed Your Data

Convert all documents/items to vectors.

documents = [
  "The dog chased the ball",
  "A puppy runs in the park",
  "Pizza is delicious"
]

embeddings = [model.encode(doc) for doc in documents]
# Result: List of 384D vectors

Step 3: Store in Vector Database

Use specialized databases optimized for vector similarity:

  • Pinecone: Managed service, simplest
  • Weaviate: Open-source, feature-rich
  • Milvus: High performance, self-hosted
  • Qdrant: Fast, user-friendly
  • FAISS: Facebook's library, research-focused

# Pseudocode with Pinecone
vector_db = PineconeDB()
for i, embedding in enumerate(embeddings):
    vector_db.upsert({
        "id": str(i),  # Pinecone vector IDs are strings
        "values": embedding,
        "metadata": {"text": documents[i]}
    })

Step 4: Query

Convert query to vector, find nearest neighbors.

query = "A puppy playing outside"
query_embedding = model.encode(query)

results = vector_db.query(query_embedding, top_k=3)

# Results: Documents with highest cosine similarity
# 1. "A puppy runs in the park" (0.95 similarity)
# 2. "The dog chased the ball" (0.85 similarity)
# 3. "Pizza is delicious" (0.1 similarity)

Real Vector Search Stacks (2025)

Fast Startup Approach

Stack: FastAPI + Pinecone + Sentence-Transformers

Code:
1. Load model: model = SentenceTransformer(...)
2. Embed data: embeddings = model.encode(texts)
3. Upload to Pinecone: pinecone_index.upsert(embeddings)
4. API endpoint: @app.post("/search")
   - Get query text
   - Embed: query_vec = model.encode(query)
   - Search: results = pinecone_index.query(query_vec)
   - Return results

Cost: $0 startup, scales with usage (Pinecone bills by vectors stored)
Speed: Query latency <100ms

Enterprise Approach

Stack: Kubernetes + Qdrant + Fine-tuned BERT

1. Deploy Qdrant cluster on K8s
2. Fine-tune BERT on domain data
3. Batch embed all documents
4. Manage with REST API
5. Monitor with Prometheus/Grafana

Cost: $10K-100K/month for infrastructure
Speed: Sub-10ms query latency
Customization: Full control, domain-optimized embeddings

Self-Hosted Open Source

Stack: FAISS + FastAPI + SentenceTransformers

1. Load FAISS: index = faiss.IndexFlatL2(vector_dim)
2. Add vectors: index.add(embeddings_array)
3. Query: distances, ids = index.search(query_vector, k=10)

Cost: Just server hardware
Speed: Very fast (optimized C++)
Limitation: Single machine scaling

Advanced Techniques

Hybrid Search

Combine keyword and vector search:

Query: "Python course for beginners"

Keyword search: Documents with "Python" AND "course"
Vector search: Documents semantically similar

Combine results: Take top 5 from each, merge, re-rank
Result: Best of both approaches
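One common way to merge the two ranked lists is reciprocal rank fusion (RRF): items near the top of either list accumulate a high combined score. A minimal sketch with hypothetical document ids:

```python
def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge ranked result lists: items high in any list score well.
    k dampens the weight of top ranks; 60 is a conventional default."""
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical results for the query "Python course for beginners".
keyword_hits = ["python-course", "python-reference", "java-course"]
vector_hits  = ["beginner-python", "python-course", "intro-programming"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused[0])  # python-course ranks well in both lists, so it wins
```

RRF needs only ranks, not raw scores, which is convenient because keyword relevance scores and cosine similarities are not directly comparable.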

Multi-Modal Embeddings

Same embedding space for text AND images.

Text: "Running shoes"
Image: Photo of a sneaker

Both convert to same embedding space
Can find images by text description
Can find text by image similarity

Re-ranking

First pass: Fast approximate search (FAISS)

Results: 100 candidates in 10ms

Second pass: Slower, more accurate scoring (e.g., a neural cross-encoder)

Re-rank top 100 with more accurate model
Return top 10 to user
Total time: 50ms, much better results
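The two-pass pattern can be sketched in numpy: a truncated-dimension scan stands in for the fast index, and full cosine similarity stands in for the expensive re-scorer (a real system would use FAISS and a cross-encoder model; the vectors are invented):

```python
import numpy as np

# Five made-up 4-D document embeddings, unit-normalized below.
docs = np.array([
    [0.90, 0.10, 0.10, 0.10],
    [0.80, 0.20, 0.60, 0.10],   # best full match for the query
    [0.10, 0.90, 0.10, 0.10],
    [0.85, 0.15, 0.05, 0.50],
    [0.10, 0.10, 0.90, 0.10],
])
docs /= np.linalg.norm(docs, axis=1, keepdims=True)
query = np.array([0.90, 0.10, 0.60, 0.10])
query /= np.linalg.norm(query)

# First pass (fast, approximate): score on the first two dimensions only.
shortlist = np.argsort(-(docs[:, :2] @ query[:2]))[:3]

# Second pass (slow, accurate): full cosine similarity on the shortlist.
best = int(shortlist[np.argmax(docs[shortlist] @ query)])
print(best)  # doc 1 comes out on top only after the accurate second pass
```

Here the coarse pass ranks doc 1 merely third, but it survives the cut and the accurate pass promotes it, which is exactly the point of re-ranking.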

Challenges & Solutions

High Dimensional Spaces

Vectors have 384-3000 dimensions. Distance metrics become less meaningful at extreme dimensions ("curse of dimensionality").

Solution: Use approximate nearest neighbor (ANN) algorithms instead of exact search. Trade tiny accuracy loss for massive speed gain.
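One way to see why approximate search is fast: hash vectors into buckets with random hyperplanes, then scan only the query's bucket. This is a bare-bones locality-sensitive-hashing sketch; production ANN indexes use more sophisticated structures (HNSW, IVF):

```python
import numpy as np

rng = np.random.default_rng(1)
dim, n_planes = 64, 8
planes = rng.normal(size=(n_planes, dim))   # random hyperplanes define buckets

def lsh_key(vec):
    """Bucket a vector by which side of each hyperplane it falls on."""
    return tuple((planes @ vec > 0).astype(int))

docs = rng.normal(size=(5000, dim))         # stand-in document embeddings
buckets = {}                                # bucket key -> list of doc ids
for i, d in enumerate(docs):
    buckets.setdefault(lsh_key(d), []).append(i)

query = docs[7] + 1e-6 * rng.normal(size=dim)   # near-duplicate of doc 7
candidates = buckets.get(lsh_key(query), [])
# Exact scoring only within one bucket -- a small fraction of all 5,000 docs.
best = max(candidates, key=lambda i: docs[i] @ query / np.linalg.norm(docs[i]))
print(best, len(candidates))
```

Nearby vectors usually fall on the same side of every hyperplane and land in the same bucket, so the exact comparison runs over dozens of candidates instead of thousands; occasionally a true neighbor lands in a different bucket, which is the accuracy trade-off.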

Stale Embeddings

Documents change, but embeddings don't auto-update. Old data can become irrelevant.

Solution: Re-embed periodically. Or trigger re-embed on document change.
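A common pattern for the on-change trigger is to store a content hash alongside each vector and re-embed only when the hash differs; `embed` below is a placeholder for a real model call such as `model.encode(text)`:

```python
import hashlib

def embed(text):
    """Placeholder for a real embedding model call."""
    return [float(len(text))]

index = {}  # doc_id -> {"hash": ..., "vector": ...}

def upsert(doc_id, text):
    """Re-embed only when the document's content hash has changed."""
    h = hashlib.sha256(text.encode()).hexdigest()
    entry = index.get(doc_id)
    if entry and entry["hash"] == h:
        return False            # unchanged: skip the expensive embedding call
    index[doc_id] = {"hash": h, "vector": embed(text)}
    return True                 # embedded (new or edited document)

print(upsert("faq-1", "App crashes on startup"))  # True  (first insert)
print(upsert("faq-1", "App crashes on startup"))  # False (unchanged, skipped)
print(upsert("faq-1", "App crashes on launch"))   # True  (edited, re-embedded)
```

Hashing is cheap compared to embedding, so this keeps the index fresh while only paying for embeddings that actually need updating.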

Privacy Concerns

Embeddings encode semantic information. Might leak sensitive data.

Solution: Encrypt vectors. Use differential privacy. On-premise deployment.

Cost at Scale

Each document needs embedding. Embedding costs money (API calls or compute).

Solution: Batch embedding (cheaper than real-time). Use smaller, faster models. Cache embeddings.


Vector Search Use Cases

Use Case               Implementation
E-commerce search      Embed product descriptions + user query
Chatbots / RAG         Embed docs, retrieve relevant context for LLM
Recommendation         Embed items, find similar to user preference
Content discovery      Embed articles, find related content
Duplicate detection    Embed documents, find near-duplicates
Semantic clustering    Embed items, group by similarity
Cross-lingual search   Embed multiple languages in same space

Vector Databases Comparison (2025)

Database    Speed        Ease       Cost    Self-Hosted
Pinecone    Fast         Easiest    $$$$    No
Weaviate    Medium       Easy       $$$     Yes
Qdrant      Very fast    Medium     $$      Yes
Milvus      Very fast    Hard       $$      Yes
FAISS       Very fast    Hard       Free    Yes

Pick Pinecone for simplicity. Pick Qdrant for self-hosted speed. Pick FAISS for research.


FAQs

How many dimensions should vectors have? 384-512 for most cases. Larger = more information but slower. Smaller = faster but less nuanced.

Can I use pre-trained embeddings or must I fine-tune? Pre-trained works for general tasks. Fine-tune for domain-specific accuracy (legal, medical, technical).

What's cosine similarity vs. Euclidean distance? Cosine: Measures angle (normalized). Euclidean: Measures actual distance. Cosine preferred for high-dimensional spaces.

Is vector search replacing keyword search? No, they're complementary. Hybrid is best: keywords for exact matches, vectors for semantics.

How do I handle documents longer than model's max tokens? Chunk documents, embed chunks separately. Query finds most relevant chunks. Return those to user/LLM.
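A minimal word-count chunker with overlap (sizes are illustrative; production systems often chunk by tokens or sentence boundaries instead):

```python
def chunk_words(text, chunk_size=200, overlap=50):
    """Split text into overlapping word-count chunks so each one fits the
    embedding model's input limit without cutting context at hard edges."""
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks

doc = " ".join(f"word{i}" for i in range(450))
chunks = chunk_words(doc)
print(len(chunks))  # 3 overlapping chunks for a 450-word document
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, so it can't be lost to the split.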

Can embeddings be reverse-engineered? Partially. Embeddings are lossy, but embedding-inversion research has recovered substantial portions of the original text from them, so treat embeddings of sensitive data as sensitive and don't embed it with public models.


Next up: Explore Emergent Behavior in AI—the surprising capabilities that arise from scale and complexity.

