Why AI Stalls in Legal Workflows: The Data Readiness Problem
In more than 30 years working at the intersection of legal and technology, I’ve watched three technology waves break over the legal industry. Each one arrived with the same promise: faster review, better accuracy, lower cost. And each one hit the same wall. AI is no different, and the wall is back in the same place.
Most firms evaluating AI tools right now are seeing a familiar pattern. Pilots deliver impressive results. Production use is uneven. Some matters go smoothly. Others stall. Many start strong and fade after a few weeks once the conditions change. The common explanation is that the technology is still evolving. Models aren’t accurate enough. Outputs require too much review. Workflows don’t scale.
There’s some truth to that. But there’s a bigger pattern underneath. AI in legal performs in proportion to the data foundation underneath it. When that foundation is solid, the tools deliver. When it isn’t, they expose whatever was already broken.
What the demos don’t show
Most AI evaluation still happens on curated data. Firms compare features and review outputs on datasets that have already been cleaned, deduplicated, threaded, and tagged. The tools perform impressively under those conditions. They were built to.
The gap shows up in production. When those same tools run in a live matter, the conditions change. Data comes from multiple custodians across multiple platforms, with inconsistent metadata and broken family relationships. Collection protocols weren’t followed uniformly, and the outputs shift accordingly. Review times don’t drop as expected. False positives climb. Reviewers spend more time clearing the AI’s work than they saved by running it. Initial enthusiasm gives way to limited experimentation, then quiet abandonment.
The tools can perform. The conditions for them to perform haven’t been met.
Why legal data is different
The conditions that trip up AI in legal aren’t random. They’re structural. Legal data is fragmented by nature. Emails, chat messages, attachments, draft documents, and structured exports all coexist, often with incomplete or conflicting metadata. Modern communication patterns have made this worse. A Slack message references a hyperlinked document in SharePoint; the document is discussed further in a Teams thread and finalized in Word. That is a single conversation spread across four systems. Pull it apart and the meaning is lost.
In legal, meaning depends on context. A clause must be understood within a document. A communication must be interpreted within a conversation. A production decision must be grounded in the full family of related materials. When that context is broken, and it almost always is when data is collected without discipline, the output is less reliable, regardless of how sophisticated the model is.
A pattern we see often on complex matters: an email thread with five replies and two attachments contains the substance of a negotiation. The collection breaks the family relationships, and the AI tool summarizes each message in isolation. The summary misses the final agreement entirely. The model didn’t fail. The inputs did.
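To make that failure mode concrete, here is a minimal sketch in Python. The record shape, field names, and one-line summarizer stand-in are invented for illustration; they are not Amplify’s API or any vendor’s. The point is simply what text the model is handed.

```python
from dataclasses import dataclass

@dataclass
class Record:
    doc_id: str
    family_id: str  # ties replies and attachments back to the parent email
    text: str

def summarize(text: str) -> str:
    """Stand-in for a model call; what matters is the text it receives."""
    return text[:80]

def summarize_in_isolation(records: list[Record]) -> list[str]:
    # What a broken collection forces: each reply and attachment is
    # summarized with no knowledge of its siblings, so the final
    # agreement in the last reply never reaches the model in context.
    return [summarize(r.text) for r in records]

def summarize_by_family(records: list[Record]) -> dict[str, str]:
    # Family-aware: reassemble the full thread and its attachments
    # before any model call happens.
    families: dict[str, list[str]] = {}
    for r in records:
        families.setdefault(r.family_id, []).append(r.text)
    return {fid: summarize("\n\n".join(texts))
            for fid, texts in families.items()}
```

Both functions call the same model. Only the second gives it a chance to see the negotiation whole.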
A similar issue, this one specific to privilege work: deduplication is applied before threading, which leaves the AI evaluating documents out of sequence. Privileged content that would be obvious in context becomes ambiguous in isolation. The tool flags false positives, a reviewer spends hours clearing them, and the promised efficiency evaporates.
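The fix is a matter of ordering, which a short sketch makes visible. The field names (thread_key, sent_at) and toy grouping logic are assumptions for illustration; real threading keys on message IDs and in-reply-to headers, not a precomputed key.

```python
import hashlib

def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def build_threads(messages: list[dict]) -> list[list[dict]]:
    # Toy threading: group on a precomputed key, sort by send time.
    threads: dict[str, list[dict]] = {}
    for m in messages:
        threads.setdefault(m["thread_key"], []).append(m)
    return [sorted(t, key=lambda m: m["sent_at"]) for t in threads.values()]

def dedupe_then_thread(messages: list[dict]) -> list[list[dict]]:
    # Fragile order: duplicates are suppressed globally, before any
    # conversation exists, so a message that belongs in several threads
    # survives in only one and the rest are evaluated with gaps.
    seen: set[str] = set()
    kept = []
    for m in messages:
        h = content_hash(m["text"])
        if h not in seen:
            seen.add(h)
            kept.append(m)
    return build_threads(kept)

def thread_then_dedupe(messages: list[dict]) -> list[list[dict]]:
    # Safer order: assemble each conversation first, then suppress
    # duplicates within it, so every thread keeps its full sequence.
    deduped = []
    for thread in build_threads(messages):
        seen: set[str] = set()
        kept = []
        for m in thread:
            h = content_hash(m["text"])
            if h not in seen:
                seen.add(h)
                kept.append(m)
        deduped.append(kept)
    return deduped
```

Same deduplication, same threading; only the order changes, and with it whether privileged content is reviewed in sequence or in fragments.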
Both are data readiness problems, and both are preventable.
The layer that’s being skipped
Between raw collection and effective AI use sits a layer of work that most adoption conversations still overlook. Data has to be collected defensibly, processed into consistent formats, threaded into conversations, linked to families, and structured so relationships are preserved. That work isn’t glamorous, and it doesn’t get announced in press releases, but it determines whether AI outputs can be trusted in a real legal workflow.
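Sketched as a pipeline, and with the caveat that these stage names and record shapes are invented rather than drawn from any product, the layer looks something like this, each stage feeding the next:

```python
# A schematic of the readiness layer between collection and AI use.
# Toy implementations; the ordering is the point.

def normalize(records: list[dict]) -> list[dict]:
    # Consistent formats: trim text, standardize custodian names.
    return [{**r, "custodian": r["custodian"].strip().lower(),
             "text": r["text"].strip()} for r in records]

def thread(records: list[dict]) -> dict[str, list[dict]]:
    # Reassemble conversations (real systems key on message IDs).
    threads: dict[str, list[dict]] = {}
    for r in records:
        threads.setdefault(r["thread_key"], []).append(r)
    return threads

def link_families(threads: dict[str, list[dict]]) -> dict[str, list[dict]]:
    # Keep parents with their attachments, in send order.
    return {key: sorted(items,
                        key=lambda r: (r["sent_at"], r["is_attachment"]))
            for key, items in threads.items()}

def make_ready(raw_collection: list[dict]) -> dict[str, list[dict]]:
    # Normalize, then thread, then link families: skip or reorder a
    # stage and every downstream AI output inherits the damage.
    return link_families(thread(normalize(raw_collection)))
```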
At Lineal, our focus has been on building that foundation. Amplify™ was designed around the principle that AI outputs are only as good as the data underneath them. The suite handles chat data, images, privileged content, and structured evidence in a way that preserves context from intake through production. Our managed services teams operate this foundation continuously, so data readiness is built into the workflow rather than treated as a last-minute step.
Why this matters now
The stakes have increased in recent months. In Q1 2026 alone, U.S. courts imposed over $145,000 in sanctions on lawyers for AI-generated errors in filings: hallucinated citations, fabricated quotes, misrepresented precedent. Those failures weren’t in the models. They were in the workflows that deployed AI without adequate oversight or inputs.
In March, in Morgan v. V2X, Inc., Magistrate Judge Maritza Dominguez Braswell issued one of the most thorough rulings yet on AI in litigation. The opinion addressed how the work product doctrine applies to AI-assisted materials and set new expectations for what protective orders must say about uploading confidential information to AI tools. The direction of travel is clear. Courts will expect legal teams using AI to explain what the system did, what data it used, and why the output can be trusted. A plausible answer won’t be enough. It will have to be supported and defensible.
We’ve seen this pattern before
Legal has moved through this cycle before. Predictive coding and technology-assisted review faced identical skepticism a decade ago. Adoption didn’t come from better models alone. It came from disciplined protocols that made the results reliable, explainable, and defensible under challenge. The technology worked when the process around it was sound. It struggled when it wasn’t. For a closer look at how we think about that process in practice, see our recent piece on measuring twice and prompting once.
AI will follow the same path, and in some ways it already is. The question for legal teams now isn’t simply which tools to adopt. It’s whether the data underneath them supports their use. When it does, AI becomes a practical extension of legal workflows. When it doesn’t, even the most advanced capabilities will struggle in the places that matter most.
AI amplifies whatever data foundation it’s built on. That’s the work we’ve built our practice around.
__
About Author
Scott Cohen is an Executive Vice President at Lineal. A forward-thinking legal technologist and innovator, Scott drives the evolution of legal practice through AI, data analytics, and automation. At Lineal, he leads the strategy behind Amplify™ and the managed services teams that operate it across litigation, investigations, and regulatory matters. A sought-after advisor, writer, and speaker, Scott covers topics ranging from generative AI in legal practice to technology leadership in law firms.
__
About Lineal
Lineal is an innovative eDiscovery and legal technology solutions company that empowers law firms and corporations with modern data management and review strategies. Established in 2009, Lineal specializes in comprehensive eDiscovery services, leveraging its proprietary technology suite, Amplify™, to enhance efficiency and accuracy in handling large volumes of electronic data. With a global presence and a team of experienced professionals, Lineal is dedicated to delivering custom-tailored solutions that drive optimal legal outcomes for its clients. For more information, visit lineal.com.