A Short History of AI: Why LLMs Aren’t the Whole Story 

AI is having a moment. The rise of generative AI has transformed how we think about technology. Large Language Models (LLMs), the technology behind tools like ChatGPT, can summarize, draft, and classify text in ways that seemed impossible just a decade ago. 

But while LLMs feel brand new, they’re the latest chapter in a much longer story. AI has gone through multiple eras, each building on the last. And here’s the key: those older approaches? They’re still incredibly useful today—sometimes even faster, cheaper, or more reliable than the newest models. 

To make this concrete, let’s use a simple example: document classification. Imagine you’ve got thousands of documents, and you need to sort them into two piles: finance vs. manufacturing. How have computers tackled this task over the decades? Let’s take a tour. 

Era 1: Rules-Based Systems — “We wrote the rules” 

In the early days of AI (1950s–1980s), intelligence was thought of as something you could codify. If we could capture the rules humans used to make decisions, then machines could reproduce them. This led to the rise of expert systems—large databases of facts and rules built by teams of specialists. 

Applied to our classification problem, this might mean: 

  • If a document contains “balance sheet” → Finance 
  • If a document contains “assembly line” → Manufacturing 

This approach is simple and transparent. It’s also brittle. Covering every possible rule is impossible. What if a document mentions “earnings reports” instead of “balance sheet”? What if it uses industry-specific jargon? Unless a rule has been written for that, the system fails. 
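To see how brittle this gets in code, here is a minimal sketch of a rules-based classifier. The keyword lists are illustrative, not a real rule base:

```python
# A minimal rules-based classifier: hand-written keyword rules, nothing learned.
FINANCE_KEYWORDS = {"balance sheet", "earnings report", "ledger"}       # illustrative rules
MANUFACTURING_KEYWORDS = {"assembly line", "factory floor", "tooling"}  # illustrative rules

def classify(document: str) -> str:
    text = document.lower()
    if any(keyword in text for keyword in FINANCE_KEYWORDS):
        return "finance"
    if any(keyword in text for keyword in MANUFACTURING_KEYWORDS):
        return "manufacturing"
    return "unknown"  # brittle: anything the rules don't anticipate falls through

print(classify("Q3 balance sheet attached for review"))  # finance
print(classify("Updated revenue projections for Q3"))    # unknown -- no rule mentions "revenue"
```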

Takeaway: Rules-based AI was straightforward and powerful in narrow domains, but it struggled in messy, real-world language. 

Era 2: The Statistical Turn — “Let the data speak” 

By the 1990s, AI took a sharp turn. Instead of handcrafting rules, researchers began letting computers learn directly from data. If we could provide labeled examples—documents tagged as finance or manufacturing—algorithms could discover their own patterns. 

This shift was driven by two factors: 

  • The internet created a flood of digital text to learn from. 
  • Faster computers made it possible to train algorithms on large datasets. 

For document classification, this meant we no longer had to write the rules. Instead, we trained a model on thousands of examples, and it learned correlations automatically. 
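As a rough sketch of what "letting the data speak" looks like in practice, here is a classic bag-of-words pipeline built with scikit-learn. The tiny labeled set is purely illustrative; real systems train on thousands of documents:

```python
# A classic statistical approach: bag-of-words features plus a simple probabilistic classifier.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_docs = [
    "quarterly balance sheet and earnings summary",
    "audit of the annual ledger and cash flow",
    "assembly line downtime and maintenance schedule",
    "factory floor safety inspection checklist",
]
train_labels = ["finance", "finance", "manufacturing", "manufacturing"]

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_docs, train_labels)  # the model learns word-label correlations from the data

print(model.predict(["earnings call transcript for investors"]))  # likely 'finance' -- "earnings" was seen in training
print(model.predict(["revenue projections for next quarter"]))    # every word is unseen, so this is essentially a guess
```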

The upside: computers are excellent at spotting complex statistical patterns. The downside: these systems treated words as isolated tokens. If the training data connected “earnings” to finance, the model could use that. But if it never saw “revenue,” it wouldn’t know the two were related. 

Takeaway: Statistical methods scaled better than rules, but they didn’t capture deeper meaning. 

Era 3: Deep Learning — “Now the computer gets context” 

The next leap came in the 2010s. With vast amounts of text available online and GPUs enabling more powerful computation, neural networks re-emerged. Instead of treating words as disconnected labels, deep learning models began representing them as embeddings—mathematical vectors that capture context. 

Here’s what that means in practice: 

  • In older systems, “earnings” and “revenue” were unrelated unless both appeared in training data. 
  • With embeddings, the model learns that “earnings” and “revenue” live in the same neighborhood—because they appear in similar contexts. 

This was a major breakthrough. It allowed models to understand not just patterns but also semantics—the actual meaning of words in relation to each other. 
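Here is a toy illustration of that "neighborhood" idea, using made-up vectors and cosine similarity. Real embeddings have hundreds of dimensions and are learned from huge corpora:

```python
# Illustration of the embedding idea with hypothetical 4-dimensional vectors.
import numpy as np

embeddings = {
    "earnings": np.array([0.81, 0.10, 0.62, 0.05]),  # made-up values for illustration only
    "revenue":  np.array([0.78, 0.15, 0.58, 0.09]),
    "conveyor": np.array([0.04, 0.90, 0.11, 0.71]),
}

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Words that appear in similar contexts end up close together in the vector space.
print(cosine_similarity(embeddings["earnings"], embeddings["revenue"]))   # high, close to 1.0
print(cosine_similarity(embeddings["earnings"], embeddings["conveyor"]))  # much lower
```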

Takeaway: Deep learning gave AI the ability to understand words in context, not just count their appearances. 

Era 4: Transformers & LLMs — “AI that feels fluent” 

The real game-changer arrived in 2017 with the transformer architecture. Transformers allowed models to capture relationships between words across entire documents, not just within short windows of text. 

When scaled up, transformers became Large Language Models (LLMs). These models have billions—or even trillions—of parameters, trained on enormous swaths of internet text. That scale gives them their uncanny fluency. 

Much of the magic of LLMs comes from zero-shot learning. Unlike older methods, which required labeled examples for every new task, LLMs can often perform new tasks with no task-specific training at all. You can simply ask: 

“Tell me if this document is finance or manufacturing.” 

And the model can respond reasonably well, even without explicit training data. 
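As a sketch, prompting a hosted LLM for this kind of zero-shot classification might look something like the following. This assumes the OpenAI Python client and an API key; the model name is only an example:

```python
# Zero-shot classification by prompting a hosted LLM -- no labeled training data needed.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

document = "Q3 revenue grew 12%, driven by stronger net interest margins."

response = client.chat.completions.create(
    model="gpt-4o-mini",  # example model name; any capable chat model would do
    messages=[{
        "role": "user",
        "content": (
            "Tell me if this document is finance or manufacturing. "
            "Answer with a single word.\n\n" + document
        ),
    }],
)

print(response.choices[0].message.content)  # e.g. "Finance"
```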

But this power comes at a cost. Training LLMs requires enormous amounts of data and computing power. Running them at scale is also expensive. Classifying a handful of documents? No problem. Classifying millions? That can be slow and cost-prohibitive. 

Takeaway: LLMs are astonishingly flexible, but their size makes them resource-heavy. 

Why History Still Matters 

At this point, you might be tempted to think: if LLMs are so powerful, why not just use them for everything? 

The answer: there’s no free lunch in computing. 

  • Cost: LLMs are expensive to train and run. Simpler methods are far cheaper. 
  • Speed: Older approaches are faster, especially at scale. 
  • Control: Rules and traditional models can be easier to tune for specific domains. 

Consider a legal example: finding potentially privileged emails in a dataset of a million documents. Running everything through an LLM might give good results, but it could be slower and vastly more expensive than using a combination of keyword searches and heuristic methods. 
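To make the trade-off concrete, here is a rough, hypothetical sketch of that kind of tiered approach. The keyword list and the `classify_with_llm` helper are placeholders for illustration, not Lineal's actual workflow:

```python
# Hypothetical tiered privilege screen: cheap keyword heuristics run over the full collection,
# and a costlier model (if used at all) only sees the much smaller shortlist.
PRIVILEGE_HINTS = {"attorney-client", "privileged and confidential", "legal advice", "outside counsel"}  # illustrative

def keyword_screen(email_text: str) -> bool:
    text = email_text.lower()
    return any(hint in text for hint in PRIVILEGE_HINTS)

def triage(emails: list[str], classify_with_llm=None) -> list[str]:
    # Step 1: fast, inexpensive heuristic pass over all one million documents.
    candidates = [email for email in emails if keyword_screen(email)]
    # Step 2 (optional): spend expensive model calls only on the shortlist.
    # `classify_with_llm` is a hypothetical callable standing in for an LLM-based check.
    if classify_with_llm is not None:
        candidates = [email for email in candidates if classify_with_llm(email)]
    return candidates
```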

That’s why the history matters. Each generation of AI offers strengths and weaknesses. The best practitioners don’t just chase the newest model—they understand the whole toolbox and use the right tool for the job. 

What This Means for Today 

We should absolutely be excited about LLMs. They’ve unlocked possibilities that were once unimaginable. But older methods haven’t disappeared. They’re still highly effective, particularly in contexts where cost, speed, or transparency matter. 

At Lineal, we see this every day. In our Amplify suite, we combine modern techniques with tried-and-true methods. That means we can deliver results that are not only accurate but also efficient—without the heavy overhead of an LLM when it isn’t needed. 

Even industry leaders are recognizing this balance. NVIDIA recently argued that “Small Language Models” may represent the future of AI—leaner systems trained for specific tasks rather than massive general-purpose models. 

The future of AI isn’t one-size-fits-all. It’s knowing when to use the shiny new power tool, and when a simple wrench gets the job done better. 

Coming Up Next 

This post covered the big picture: the evolution from rules to statistics to deep learning to LLMs. 

In the next post, we’ll show how Lineal applies these lessons in practice—using the right mix of methods inside Amplify™ to solve real legal problems efficiently. Sometimes, smarter really does mean simpler. 

Want to go deeper on how AI, automation, and legal data strategy are transforming modern firms? Download our Digital Transformation eBook for an inside look at what the most innovative teams are doing next.


About the Author   

Matthew Heston is Lead Data Scientist at Lineal, where he leads the design and implementation of AI, machine learning, and scalable data systems that transform how legal teams work with complex information. He received a PhD in Technology and Social Behavior from Northwestern University. 


About Lineal   

Lineal is an innovative eDiscovery and legal technology solutions company that empowers law firms and corporations with modern data management and review strategies. Established in 2009, Lineal specializes in comprehensive eDiscovery services, leveraging its proprietary technology suite, Amplify™, to enhance efficiency and accuracy in handling large volumes of electronic data. With a global presence and a team of experienced professionals, Lineal is dedicated to delivering custom-tailored solutions that drive optimal legal outcomes for its clients. For more information, visit lineal.com