Ever notice how Google search really understands what you're asking? You search for "Napoleon" and it knows whether you mean the general or the pastry. It shows you his biography, timeline, related figures, and battles. That's not magic — that's a knowledge graph.
A knowledge graph is basically a giant database of entities (people, places, things) and the relationships between them. Instead of storing flat data, you store understanding. "Napoleon was a French general who ruled France and fought in the Napoleonic Wars." That's three entities (Napoleon, France, Napoleonic Wars) and three relationships (was a, ruled, fought in).
It sounds simple. It's actually one of the most powerful things in modern AI.
What's in a Knowledge Graph?
Let's break down the components.
Entities: The things themselves. In healthcare: patients, doctors, diseases, medications, hospitals. In music: artists, songs, albums, genres, venues. In ecommerce: products, customers, categories, suppliers, warehouses.
Attributes: Properties of entities. A patient has age, gender, blood type. A product has price, weight, color.
Relationships: Connections between entities. "Patient has Disease," "Artist performs Song," "Product is in Category." Relationships can be one-to-one, one-to-many, or many-to-many.
Type information: Is this relationship a medical diagnosis? A professional collaboration? A supply chain link? Types matter because they determine how relationships behave.
Picture it like this:
```
[Person: Napoleon] --ruled--------> [Country: France]
                   --fought-in----> [War: Napoleonic Wars]
                   --lived-during-> [Period: 1769-1821]
```
Compare that to how databases usually work (flat tables). A traditional database would have three separate tables:
- People: name, birth_year, death_year
- Countries: name, capital
- Rules: person_id, country_id, start_year, end_year
You can answer questions both ways, but the graph is structured understanding. The relationships are explicit and queryable.
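To make the difference concrete, here's a minimal sketch of the Napoleon example as an in-memory graph. Each edge is a (subject, relation, object) triple; the data and the `query` helper are hypothetical, just to show that relationships become directly queryable once they're explicit.

```python
# Each fact is a (subject, relation, object) triple.
triples = [
    ("Napoleon", "ruled", "France"),
    ("Napoleon", "fought_in", "Napoleonic Wars"),
    ("Napoleon", "lived_during", "1769-1821"),
]

def query(subject=None, relation=None, obj=None):
    """Return all triples matching the pattern (None acts as a wildcard)."""
    return [
        (s, r, o) for (s, r, o) in triples
        if (subject is None or s == subject)
        and (relation is None or r == relation)
        and (obj is None or o == obj)
    ]

# "What did Napoleon rule?"
print(query(subject="Napoleon", relation="ruled"))
# "Which entities connect to the Napoleonic Wars?"
print(query(obj="Napoleonic Wars"))
```

Real graph databases index these patterns so the wildcard lookups stay fast at scale, but the mental model is the same.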
Knowledge Graphs in the Wild
Google's Knowledge Graph
Google literally built their entire search around this. When you search for something, they match you against billions of entities and relationships.
Search: "What movies has Tom Hanks been in?" Google doesn't just find pages about Tom Hanks movies. It retrieves Tom Hanks (entity) → "appeared_in" relationships → all movies. Then it ranks them by relevance/recency.
Search: "How tall is the Eiffel Tower?" Google knows:
- Eiffel Tower (entity)
- Height (attribute): 330 meters
- Location: France
- Built: 1889
- Architect: Gustave Eiffel
All connected. All queryable. This is why Google Search is so good at understanding what you're actually asking.
Healthcare: Medical Knowledge Graphs
Imagine a knowledge graph of diseases, symptoms, treatments, side effects, and drug interactions.
Query: "What drugs can I take for migraines that won't interact with my blood pressure meds?"
The graph has:
- Diseases: Migraine, Hypertension
- Treatments: Sumatriptan, Lisinopril, Propranolol
- Relationships: "Treats," "HasSideEffect," "InteractsWith"
The system can traverse: "Given Hypertension, find treatments. For each treatment, find compatible migraine treatments."
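That traversal can be sketched in a few lines. The drug data and the `interacts_with` edges below are purely illustrative (not medical advice), and the function names are hypothetical; the point is that the answer falls out of walking the "Treats" and "InteractsWith" relationships.

```python
# Illustrative data only -- not medical advice.
treats = {
    "Migraine": {"Sumatriptan", "Propranolol"},
    "Hypertension": {"Lisinopril", "Propranolol"},
}
# Symmetric "InteractsWith" edges, stored as unordered pairs.
interacts_with = {frozenset({"Sumatriptan", "Lisinopril"})}

def safe_migraine_options(current_meds):
    """Migraine treatments with no interaction edge to anything the patient takes."""
    return {
        drug for drug in treats["Migraine"]
        if not any(frozenset({drug, med}) in interacts_with for med in current_meds)
    }

# A patient already on Lisinopril for hypertension:
print(safe_migraine_options({"Lisinopril"}))
```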
Real example: researchers, including teams at Google DeepMind, have used graph-based models to predict drug interactions and side effects that were never tested together in clinical trials. Potentially saves lives.
LinkedIn & Professional Networks
LinkedIn's entire recommendation system is built on a knowledge graph:
- People (with skills, experience, location)
- Companies (with industries, size, location)
- Jobs (with requirements, seniority level)
- Schools (with alumni)
Relationships: "WorkedAt," "HasSkill," "IsAlumniOf," "Applied," "Recommended"
This is how LinkedIn recommends "People You May Know." It's not just similarity (though that's part of it). It's graph traversal: "Find people who worked at companies similar to mine, in roles similar to mine, in my region, with complementary skills."
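That kind of traversal is easy to sketch. The toy data and ranking below are hypothetical, but they show the core move: hop from a person to their companies, then back out to other people, and rank by overlap.

```python
# Hypothetical employment edges: person -> set of companies.
worked_at = {
    "alice": {"Acme", "Globex"},
    "bob": {"Acme"},
    "carol": {"Globex", "Initech"},
    "dave": {"Initech"},
}

def people_you_may_know(person):
    """People who share at least one employer, ranked by how many they share."""
    mine = worked_at[person]
    candidates = {
        other: len(mine & companies)
        for other, companies in worked_at.items()
        if other != person and mine & companies
    }
    return sorted(candidates, key=candidates.get, reverse=True)

print(people_you_may_know("alice"))  # bob (via Acme) and carol (via Globex)
```

Production systems layer on embeddings, mutual connections, and behavioral signals, but the two-hop graph walk is the backbone.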
Knowledge Graphs vs. Vector Databases
This is a question people ask a lot in 2025. And the answer is: they're different tools.
| Aspect | Knowledge Graph | Vector Database |
|---|---|---|
| Structure | Explicit entities and relationships | Embeddings (numbers) |
| Query type | Logical: "X caused Y" | Semantic similarity: "Find similar documents" |
| Best for | Relationships, facts, logic | Natural language search, recommendations |
| Explainability | Very high (you see the path) | Low (vector values don't explain) |
| Scale | Excellent for 100M+ entities | Excellent for 1B+ embeddings |
| Schema | Requires design | Flexible |
Real difference: A vector database says "this document is similar to that one." A knowledge graph says "Document A cites Person B, who worked for Company C, which owns Technology D, which is discussed in Document E."
Vector DBs are about similarity. Knowledge graphs are about relationships.
Graph Databases: The Infrastructure
You can technically model a graph in a relational database with join tables, but multi-hop traversal queries get slow and ugly fast. For serious graph work, you want graph infrastructure.
Neo4j
The most popular. It's to graphs what PostgreSQL is to relational databases.
```cypher
// Create entities and relationships
CREATE (tom:Person {name: "Tom Hanks"})
CREATE (forrest:Movie {title: "Forrest Gump", year: 1994})
CREATE (tom)-[:ACTED_IN]->(forrest)

// Query relationships
MATCH (actor:Person)-[:ACTED_IN]->(movie:Movie)
WHERE actor.name = "Tom Hanks"
RETURN movie.title, movie.year
```
Neo4j is fast, scalable, and works great for millions of entities. Companies like Walmart, BNY Mellon, and eBay use it.
Cost: Open source (free) to enterprise ($200k+/year). Most companies use the open source version or managed cloud.
Amazon Neptune
AWS's managed graph database. Works with property graphs (Neo4j style) and RDF (semantic web). If you're already in AWS and want managed, Neptune is solid.
Cost: $1-2/hour for decent clusters.
Other Players
- TigerGraph: Distributed graph database, claims better performance than Neo4j at scale
- ArangoDB: Multi-model (graph + document + search)
- JanusGraph: Distributed graph, good for enormous datasets
- Gremlin: Query language for traversing graphs
Graph RAG: The 2025 Trend
Here's something new that's blowing up: combining knowledge graphs with retrieval-augmented generation (RAG).
Traditional RAG: You embed documents into vectors, retrieve similar ones, pass to an LLM.
Graph RAG: You extract entities and relationships from documents, build a knowledge graph, then traverse the graph to find relevant context for the LLM.
Example:
Document: "Apple CEO Tim Cook announced a partnership with Samsung on battery technology."
Traditional RAG might find this document because someone searches "Apple partnerships."
Graph RAG extracts:
- Apple (entity: Company)
- Tim Cook (entity: Person)
- Samsung (entity: Company)
- Battery technology (entity: Technology)
- Relationships: "CEO_OF," "ANNOUNCED," "PARTNERSHIP"
Now you can traverse: "Find all partnerships Apple announced in 2024" or "What companies has Tim Cook announced partnerships with?" It's orders of magnitude more intelligent.
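A stripped-down sketch of the idea: store the extracted triples, then traverse them to assemble context for the LLM. The extraction step is stubbed out here (in practice an NER model or an LLM produces the triples), and the extra OpenAI edge is invented for illustration.

```python
# Triples as they might come out of an extraction step (last edge is hypothetical).
triples = [
    ("Tim Cook", "CEO_OF", "Apple"),
    ("Apple", "PARTNERSHIP", "Samsung"),
    ("Apple", "PARTNERSHIP", "OpenAI"),
]

def neighbors(entity, relation):
    """One-hop traversal: follow a named relation out of an entity."""
    return [o for (s, r, o) in triples if s == entity and r == relation]

def graph_context(question_entity):
    """Collect one-hop facts to prepend to an LLM prompt as grounding context."""
    facts = [f"{s} {r} {o}" for (s, r, o) in triples if question_entity in (s, o)]
    return "\n".join(facts)

print(neighbors("Apple", "PARTNERSHIP"))
print(graph_context("Apple"))
```

The payoff is that "What companies has Apple partnered with?" becomes a graph lookup rather than a similarity search over raw text.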
Microsoft's open-source GraphRAG project helped kick off the trend, and a wave of vendors and startups are shipping Graph RAG tooling in 2025. This is going to be huge.
Building a Knowledge Graph
Step 1: Design Your Schema
What entities exist in your domain? What relationships matter?
Example for a healthcare startup:
Entities:
- Patient, Doctor, Hospital, Disease, Medication, Treatment, Symptom
Relationships:
- Patient "HasDisease" Disease
- Doctor "TreatsPatient" Patient
- Doctor "WorksAt" Hospital
- Medication "TreatsDisease" Disease
- Medication "CausesSymptom" Symptom (side effects)
- Treatment "RecommendedFor" Disease
Spend time here. A good schema makes everything downstream easier.
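One way to make a schema like this machine-checkable is to list the allowed (source type, relation, target type) patterns and validate new edges against them before they enter the graph. The names below mirror the healthcare example; the `SCHEMA` structure and validator are a hypothetical sketch, not a Neo4j feature.

```python
# Allowed edge patterns: (source entity type, relation, target entity type).
SCHEMA = {
    ("Patient", "HasDisease", "Disease"),
    ("Doctor", "TreatsPatient", "Patient"),
    ("Doctor", "WorksAt", "Hospital"),
    ("Medication", "TreatsDisease", "Disease"),
    ("Medication", "CausesSymptom", "Symptom"),
    ("Treatment", "RecommendedFor", "Disease"),
}

def is_valid_edge(src_type, relation, dst_type):
    """Reject edges that don't match any declared schema pattern."""
    return (src_type, relation, dst_type) in SCHEMA

print(is_valid_edge("Doctor", "WorksAt", "Hospital"))   # allowed
print(is_valid_edge("Patient", "WorksAt", "Hospital"))  # rejected
```

A check like this at ingestion time catches most extraction mistakes before they pollute the graph.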
Step 2: Populate Your Graph
Manual input: If you have a small domain (100 entities), you can build it by hand.
Automated extraction: If you have documents, use NLP.
```python
from transformers import pipeline

# Named-entity recognition with a pretrained model
ner_pipeline = pipeline(
    "ner",
    model="dbmdz/bert-large-cased-finetuned-conll03-english",
    aggregation_strategy="simple",  # merge sub-word tokens into whole entities
)

text = "John Hanks worked at Google from 2020 to 2023."
entities = ner_pipeline(text)
# Roughly:
# [{"word": "John Hanks", "entity_group": "PER", ...},
#  {"word": "Google", "entity_group": "ORG", ...}]
# (This model tags PER/ORG/LOC/MISC; extracting dates needs a different model or rules.)
# Next step: turn each (PER, ORG) pair into a Person "WorkedAt" Organization edge.
```
With LLMs, you can do even better:
```python
import json
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{
        "role": "user",
        "content": """Extract entities and relationships from this text:
"Sarah Chen, a software engineer at Meta, published a paper on efficient transformers."
Return JSON: {
  "entities": [{"name": "", "type": ""}],
  "relationships": [{"from": "", "relation": "", "to": ""}]
}""",
    }],
)
# Assumes the model returns bare JSON; in production, enforce a JSON
# response format and add retries for malformed output.
graph_data = json.loads(response.choices[0].message.content)
```
This is how you scale knowledge graph construction. Extraction accuracy typically lands somewhere in the 85-95% range, which is good enough to start, with human review catching the rest.
Step 3: Query and Refine
Start querying your graph. Find gaps. Fix errors.
```cypher
// Find all medications for a disease
MATCH (m:Medication)-[:TREATS]->(d:Disease {name: "Diabetes"})
RETURN m.name, m.sideEffects

// Find potential drug interactions
MATCH (p:Patient)-[:TAKES]->(m1:Medication),
      (p)-[:TAKES]->(m2:Medication),
      (m1)-[:INTERACTS_WITH]->(m2)
WHERE m1.id < m2.id
RETURN p.name, m1.name, m2.name
```
Step 4: Integrate with Your Application
Connect your app to the graph database.
Python + Neo4j example:
```python
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def get_recommended_treatments(disease_name):
    with driver.session() as session:
        result = session.run("""
            MATCH (t:Treatment)-[:RECOMMENDED_FOR]->(d:Disease)
            WHERE d.name = $disease
            RETURN t.name, t.efficacy, t.cost
            ORDER BY t.efficacy DESC
        """, disease=disease_name)
        return [record.data() for record in result]

treatments = get_recommended_treatments("Diabetes")
```
Real-World Impact
Recommendation Engines: Every major company (Netflix, Spotify, Amazon, Pinterest) uses graph concepts for recommendations. Amazon's product graph has billions of nodes. It's how they recommend complementary products.
Fraud Detection: Banks use graphs to detect rings of fraud. Transaction patterns, device IDs, IP addresses, account relationships. A fraudster might use different card numbers and names, but the graph reveals the pattern.
Information Retrieval: Google, Bing, and DuckDuckGo all use knowledge graphs to power richer search results and question answering.
Scientific Discovery: Researchers build knowledge graphs from papers. Which drugs have been studied together? Which genes are associated with which diseases? Graph analysis accelerates discovery.
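The fraud-ring idea above reduces to a classic graph operation: accounts that share a device or IP end up in the same connected component. Here's a sketch with hypothetical account and identifier names.

```python
from collections import defaultdict

# Hypothetical (account, shared identifier) observations.
links = [
    ("acct1", "device_A"), ("acct2", "device_A"),
    ("acct2", "ip_9"), ("acct3", "ip_9"),
    ("acct4", "device_B"),
]

# Build an undirected adjacency map between accounts and identifiers.
adj = defaultdict(set)
for acct, ident in links:
    adj[acct].add(ident)
    adj[ident].add(acct)

def ring(start):
    """All accounts reachable from `start` via shared identifiers (graph search)."""
    seen, frontier = {start}, [start]
    while frontier:
        node = frontier.pop()
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return {n for n in seen if n.startswith("acct")}

print(ring("acct1"))  # acct1-acct3 form one ring; acct4 stands alone
```

Different card numbers and names don't help the fraudster here: one shared device or IP is enough to stitch the accounts into the same component.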
Common Challenges
Maintenance: Knowledge graphs decay. Entities become outdated. Relationships change. "John Works at Google" becomes false when John moves to Meta. You need processes to keep it fresh.
Scale: Graphs can get huge. A social network graph might have billions of edges. Querying becomes slow. Solutions: sharding, caching, approximation algorithms.
Schema Evolution: Your schema needs to change over time. Neo4j makes this relatively easy, but it's still work.
Data Quality: If your entities are wrong or relationships are missing, the graph is useless. Garbage in, garbage out.
Expertise: Graph databases require different mental models than relational databases. Your team needs to learn Cypher, graph algorithms, query optimization.
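For the maintenance problem in particular, one common pattern is to give each edge a validity window instead of deleting it, so "John works at Google" becomes "John worked at Google from 2020 to 2023." The data and helper below are a hypothetical sketch of that approach.

```python
from datetime import date

# Time-scoped "WorkedAt" edges; "to" = None means the edge is still current.
edges = [
    {"person": "John", "company": "Google",
     "from": date(2020, 1, 1), "to": date(2023, 6, 1)},
    {"person": "John", "company": "Meta",
     "from": date(2023, 6, 2), "to": None},
]

def current_employer(person, today=date(2025, 1, 1)):
    """Return the employer whose validity window contains `today`."""
    for e in edges:
        if (e["person"] == person and e["from"] <= today
                and (e["to"] is None or today <= e["to"])):
            return e["company"]
    return None

print(current_employer("John"))  # the Google edge has expired, so: Meta
```

The bonus is that history stays queryable: "where did John work in 2022?" is the same lookup with a different date.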
FAQs
Q: Isn't a knowledge graph just a database? Kinda. But the emphasis is different. Relational databases optimize for storage and ACID. Knowledge graphs optimize for traversal and relationship queries.
Q: Should I build a knowledge graph or use RAG with vectors? If your domain is highly relational (healthcare, finance, social networks), graph. If you're doing semantic search or document retrieval, vectors. Many teams use both.
Q: Can I build a knowledge graph incrementally? Absolutely. Start with core entities and relationships. Expand over time. Neo4j handles schema evolution well.
Q: How long does it take to build a knowledge graph? Depends on scope. A small domain (1,000 entities): weeks. Medium (100k entities): months. Large (millions): ongoing.
Q: Can LLMs replace knowledge graphs? No. LLMs are great at reasoning, but they can't guarantee correctness or explain their logic. Knowledge graphs guarantee facts and are fully explainable. Use both.
The Bottom Line
Knowledge graphs are how AI moves from "finding patterns" to "understanding relationships." They're not new (Google introduced its Knowledge Graph back in 2012), but they're becoming more central as AI systems get more sophisticated.
Whether you're building recommendations, fraud detection, scientific discovery, or just smart search, think about whether your domain is relational. If entities and relationships matter, a knowledge graph is your answer.
The graph mindset is different. But once you get it, it's hard to go back to flat databases for relational data.
Next up: AI Ethics & Bias: Why It Actually Matters — Because smarter AI without ethics is just a smart disaster.