What’s the Point of Finding “Named Entities”?
Imagine you’re reading a news article: “Apple CEO Tim Cook announced a partnership with Samsung to develop AI chips.”
Humans instantly recognize:
- Apple = company
- Tim Cook = person
- Samsung = company
- AI = technology domain
A basic text processor sees: just words.
That’s what Named Entity Recognition (NER) solves. It’s the NLP task of spotting and labeling specific things (entities) in text: people, organizations, locations, dates, products, amounts of money, etc.
Why? Because once you extract structured data from unstructured text, you can do analytics, build knowledge graphs, automate workflows, and answer questions you couldn’t answer before.
How NER Works (The Simple Version)
Input: “Microsoft founder Bill Gates visited Singapore last month.”
NER output:
- Microsoft → Organization
- Bill Gates → Person
- Singapore → Location
- last month → Time expression
The system reads word-by-word, checks context, and decides: “What is this entity, and what category does it belong to?”
The context matters. “Amazon is a company” vs. “The Amazon rainforest”—same word, different entities, different contexts. Good NER models consider surrounding words.
Three Approaches to NER
1. Rule-Based (Old but Still Used)
Hand-coded patterns: “If a word is capitalized and followed by ‘Inc.,’ it’s a company. If it matches [date patterns], it’s a date.”
Advantages: Simple, interpretable, works for consistent patterns.
Disadvantages: Rigid, breaks on variations, requires manual rule writing. “Apple” vs. “APPLE” might need separate rules.
When to use: When patterns are highly consistent (dates, currency, email addresses).
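A rule-based extractor is often just a handful of regular expressions. A minimal sketch, assuming a few illustrative patterns (the rule names and coverage here are examples, not an exhaustive system):

```python
import re

# Hand-written rules for highly consistent patterns. Each rule maps
# an entity label to a regex; coverage is deliberately narrow.
RULES = {
    "DATE": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),                # ISO dates
    "MONEY": re.compile(r"\$\d+(?:,\d{3})*(?:\.\d{2})?"),        # $4,500.00
    "ORG": re.compile(r"\b[A-Z][a-zA-Z]+ (?:Inc|Corp|Ltd)\.?"),  # Acme Inc.
}

def rule_based_ner(text):
    """Return (entity_text, label) pairs matched by the hand-coded rules."""
    entities = []
    for label, pattern in RULES.items():
        for match in pattern.finditer(text):
            entities.append((match.group(), label))
    return entities
```

Running `rule_based_ner("Invoice from Acme Inc. for $4,500.00 due 2024-03-01")` picks up the company, the amount, and the date—and illustrates the rigidity: a date written “March 1, 2024” would be missed until someone writes another rule.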
2. Statistical Machine Learning
Train models on labeled datasets where humans have manually tagged entities. Models learn patterns statistically.
Algorithms used: Conditional Random Fields (CRF), Hidden Markov Models, Support Vector Machines.
How it works: Show the model thousands of examples of “John” labeled as Person, “Microsoft” as Organization, etc. The model learns: “Capitalized words, preceded by certain titles, are likely people.”
Advantages: More flexible than rules, handles variations.
Disadvantages: Needs good labeled training data. May miss rare entity types.
3. Deep Learning (Current Standard)
Use transformers (BERT, RoBERTa, DistilBERT) pre-trained on massive text corpora, then fine-tune on labeled NER datasets.
How it works: Pre-trained models already understand language structure and semantics. Fine-tuning teaches them entity boundaries and categories. Takes far fewer labeled examples than training from scratch.
Advantages: State-of-the-art accuracy, handles complex contexts, works across languages with transfer learning.
Disadvantages: Requires GPUs, computationally expensive, and the model is a black box.
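Whatever the model family, NER systems commonly emit one tag per token in the BIO scheme (B-X begins an entity of type X, I-X continues it, O is outside any entity), and a decoding step turns those tags into entity spans. A minimal sketch of that decoder, assuming plain BIO tags:

```python
def decode_bio(tokens, tags):
    """Group BIO-tagged tokens into (entity_text, label) spans."""
    entities, current, label = [], [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:                       # close the previous entity
                entities.append((" ".join(current), label))
            current, label = [token], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == label:
            current.append(token)             # continue the current entity
        else:                                 # O, or an inconsistent I- tag
            if current:
                entities.append((" ".join(current), label))
            current, label = [], None
    if current:                               # flush a trailing entity
        entities.append((" ".join(current), label))
    return entities
```

For example, tokens `["Tim", "Cook", "visited", "New", "York"]` with tags `["B-PER", "I-PER", "O", "B-LOC", "I-LOC"]` decode to `[("Tim Cook", "PER"), ("New York", "LOC")]`.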
The Entity Categories
Different tasks care about different entity types:
Common Categories:
- PERSON: Names of individuals
- ORGANIZATION: Companies, institutions
- LOCATION: Cities, countries, landmarks
- DATE: Dates, times, durations
- MONEY: Currency amounts
- PRODUCT: Product names
- EVENT: Named events
Specialized Categories (domain-specific):
- DRUG: Medicine names (healthcare)
- SYMPTOM: Disease indicators
- CHEMICAL: Molecule names (chemistry)
- LEGISLATION: Law names (legal)
- TICKER: Stock symbols (finance)
The more specific your domain, the more custom entity types you might need.
How NER Systems Actually Get Built
Step 1: Collect and Annotate Data
Gather text relevant to your domain. Then pay humans (or use crowdsourcing) to manually label entities.
“Apple CEO Tim Cook announced...” → Humans tag: “Apple” (ORG), “Tim Cook” (PERSON).
You typically need 5,000-50,000 labeled examples to train a good model, and more for rare entity types.
Step 2: Preprocess
Split sentences, tokenize (break into words), standardize formats.
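These preprocessing steps can be sketched with regexes alone; real pipelines use trained tokenizers, so treat this as an illustration, not production code:

```python
import re

def preprocess(text):
    """Naive sentence splitting and tokenization.

    Splits on sentence-ending punctuation followed by whitespace, then
    breaks each sentence into word and punctuation tokens.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [re.findall(r"\w+(?:[-']\w+)*|[^\w\s]", s) for s in sentences]
```

For example, `preprocess("Bill Gates visited Singapore. He left today.")` yields two token lists, one per sentence, with the periods kept as separate tokens.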
Step 3: Train or Fine-Tune
If using deep learning: Take a pre-trained model (e.g., BERT), freeze most layers, add a small NER-specific classification layer on top, and fine-tune on your labeled data.
If using CRF: Extract features (word shape, POS tag, surrounding context), train the CRF on these features and labels.
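The CRF feature-extraction step can be sketched as a function that builds a feature dictionary per token; the feature names here are illustrative (real systems add POS tags, gazetteer hits, and more):

```python
def word_features(tokens, i):
    """Build a CRF-style feature dict for the token at position i."""
    word = tokens[i]
    return {
        "word.lower": word.lower(),      # normalized surface form
        "word.istitle": word.istitle(),  # word shape: Capitalized?
        "word.isupper": word.isupper(),  # word shape: ALL CAPS?
        "word.isdigit": word.isdigit(),
        "suffix3": word[-3:],            # e.g. "-ton", "-ing"
        # surrounding context, with boundary markers at sentence edges
        "prev.word": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next.word": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }
```

The CRF then learns weights over these features jointly with label transitions, e.g. that a Capitalized word preceded by “ceo” is likely the start of a PERSON entity.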
Step 4: Evaluate
Test on held-out data. Measure precision (of the entities I found, how many were correct?) and recall (of the entities that exist, how many did I find?).
For critical domains, evaluate on different text genres and real-world edge cases.
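Entity-level precision and recall reduce to a small set computation; a minimal sketch, using strict matching (an entity counts only if span text and label both agree, the usual NER convention):

```python
def precision_recall(predicted, gold):
    """Entity-level precision and recall over (text, label) pairs."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall
```

If the model predicts `{("Apple", "ORG"), ("Tim Cook", "PERSON"), ("AI", "ORG")}` against gold `{("Apple", "ORG"), ("Tim Cook", "PERSON"), ("Samsung", "ORG")}`, both precision and recall are 2/3: one spurious prediction, one missed entity.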
Real Applications
Information Extraction from Documents
Scan contracts, resumes, invoices, research papers—extract key facts automatically.
Example: Extract date, amount, company name from invoices. Now you can automate billing or build analytics.
Real impact: HR (extracting resume data), legal (contract analysis), finance (document processing).
Chatbots and Question Answering
When someone types “What’s the weather in New York?”, NER identifies “New York” as a location. The system knows to fetch weather for that city specifically.
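In the simplest chatbots this lookup is a gazetteer match rather than a learned model. A toy sketch (the city list is made up, and a real assistant would use a trained NER model plus geocoding):

```python
# Hypothetical gazetteer of locations the bot knows about.
KNOWN_CITIES = {"new york", "london", "singapore"}

def extract_location(query):
    """Scan the query for the longest known city name, case-insensitively."""
    words = query.lower().replace("?", "").split()
    for size in (2, 1):  # try two-word names first, so "new york" wins
        for i in range(len(words) - size + 1):
            candidate = " ".join(words[i:i + size])
            if candidate in KNOWN_CITIES:
                return candidate
    return None
```

The extracted slot value ("new york") is what the system passes to a weather API, which is why identifying the entity, not just the words, matters.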
Real impact: Better customer support, more accurate voice assistants.
News and Event Tracking
Monitor news articles for mentions of companies, people, events. “Did our CEO get mentioned? Did competitors announce partnerships?”
Real impact: Brand monitoring, competitive intelligence, news aggregation.
Healthcare
Extract disease names, drug names, patient demographics from clinical notes. Automate record-keeping, enable analytics.
Real impact: Faster diagnosis support, better record-keeping, automated adverse event reporting.
Knowledge Graphs and Recommendation
Identify entities in text, link them together, build knowledge graphs.
Example: “Actor X appeared in Movie Y” → Connect actor to movie → Recommend other movies with that actor.
Real impact: Better recommendations, semantic search, AI-powered discovery.
Finance and Market Intelligence
Track mentions of companies, executives, events in financial news. Alert traders to market-moving information.
Real impact: Faster reaction to market events, better risk monitoring.
The Challenges
Ambiguity: “John Smith worked at Apple.” Is Apple the company or a fruit vendor? Context helps but isn’t always clear.
Rare entities: Models struggle with entity types they see infrequently. If your training data has 100 examples of “PERSON” but only 2 of “CHEMICAL,” the model will perform poorly on chemicals.
Domain shift: NER trained on news articles might fail on medical text. Language varies by domain.
Boundary ambiguity: Is it “New York” or “New York City”? Entity boundaries aren’t always obvious.
Languages and scripts: Most NER tooling works best for English; other languages and scripts often need their own models.
FAQs
How does NER know where entity boundaries are?
Context clues: capitalization, surrounding words, POS tags, learned patterns. Modern models use transformers that attend to all relevant context.
Can NER detect sentiment?
No. NER identifies what entities are mentioned. Sentiment analysis determines how they’re perceived. Different tasks.
Is NER language-specific?
Yes, traditionally. But modern transfer learning (pre-trained multilingual models like mBERT) works across languages with minimal fine-tuning.
How accurate is NER?
Depends on complexity. Simple entity types (dates, numbers): 95%+. Complex types (rare entities, ambiguous domains): 80-90%. Real-world performance drops due to noise, misspellings, unusual text.
Can I use pre-trained NER models?
Yes. Hugging Face has NER models pre-trained on various datasets. Start there, fine-tune if needed.
Next up: explore Sentiment Analysis to see how AI understands emotions and opinions in text.