Harvey, Claude, Ernie, Gemma: New Work Bestie, New Collection Considerations
There is no doubt that AI has changed the way we work forever; it now helps us generate frameworks, draft documents, create infographics, and re-word emails and reports so they read more clearly, confidently, and professionally.
AI models are built using broadly similar underlying architectures (most commonly large neural networks such as transformers), but are configured and trained differently depending on their intended use, data sources, and governance requirements.
For example, ChatGPT is a general-purpose language model trained on diverse text data and fine-tuned with human feedback to support a wide range of conversational and creative tasks.
Microsoft Copilot, while often based on similar foundational models, is configured with enterprise-grade controls and integrated into Microsoft 365 environments, meaning it can securely access organisational data (e.g., emails, documents, Teams chats) and must adhere to strict compliance, privacy, and audit requirements.
In contrast, Harvey is an AI tool designed specifically for legal professionals and is further specialised through domain-specific training and configuration, enabling it to handle legal reasoning, document review, and workflows in line with legal standards and terminology.
These differences illustrate how AI models are not just “built” once, but are shaped through fine-tuning, data access, and guardrails to align with distinct use cases, such as general productivity, enterprise operations, or highly regulated industries like law.
In many day-to-day tasks, these tools act like an always-available assistant, speeding up first drafts, summarising long materials, translating ideas into structured outlines, and offering alternative phrasing when we are stuck.
Used well, this can free up time for higher-value work such as analysis, decision-making, and quality control; used poorly, it can also introduce new risks, including over-reliance on outputs we have not verified, inconsistencies in tone or terminology, and the accidental inclusion of inaccurate or sensitive information.
When we look at these models through an eDiscovery or DFIR lens, several questions immediately spring to mind:
- What data does the model query?
- How accurate was the data before the model was run?
- How was the model trained and optimised?
- What prompts or agents were used?
Before embarking on an all-encompassing collection of prompts, training data, and agent data, it is important to establish the scope of the collection need, the AI tool in scope, the level of licensing applied to the tool, and the architecture used to host and/or access it. These factors will heavily influence how the data is extracted and then processed into a review platform such as Relativity, Nuix or Farsight.
Data can be extracted from enterprise-licensed systems such as Microsoft 365 via Microsoft Purview, which provides a defensible way to identify, preserve, and collect AI-related activity where it is stored within the tenant. In practical terms, this may include prompts and responses generated through tools like Microsoft Copilot (and other supported experiences), together with associated context that can help with interpretation, such as the user account, timestamps, conversation/session identifiers, the application used (e.g., Teams, Outlook, Word), and any relevant compliance or audit signals.
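As an illustration of why that context matters, the sketch below groups exported prompts and responses by conversation and orders them by time so a reviewer can read each exchange in sequence. It is a minimal example under stated assumptions: the `AIInteraction` class, its field names, and the sample values are hypothetical and do not reflect the actual Microsoft Purview export schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class AIInteraction:
    """One prompt or response. All field names are hypothetical."""
    user: str
    timestamp: datetime
    conversation_id: str
    app: str
    role: str  # "prompt" or "response"
    text: str


def group_by_conversation(records):
    """Return {conversation_id: [records sorted by timestamp]}."""
    threads = defaultdict(list)
    for record in records:
        threads[record.conversation_id].append(record)
    for items in threads.values():
        items.sort(key=lambda r: r.timestamp)
    return dict(threads)


# Invented sample data, deliberately out of order.
sample = [
    AIInteraction("a.user@example.com", datetime(2025, 1, 6, 9, 1),
                  "conv-1", "Word", "response", "Here is a draft..."),
    AIInteraction("a.user@example.com", datetime(2025, 1, 6, 9, 0),
                  "conv-1", "Word", "prompt", "Draft a summary"),
    AIInteraction("b.user@example.com", datetime(2025, 1, 6, 10, 0),
                  "conv-2", "Teams", "prompt", "Summarise the meeting"),
]

threads = group_by_conversation(sample)
for conversation_id, items in threads.items():
    print(conversation_id, [item.role for item in items])
```

In a real matter the field mapping would need to be confirmed against the export actually produced, and timestamps normalised to a single time zone before sorting.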
(For a related deep dive on Microsoft 365 collection complexity, see Modern Attachments in eDiscovery: What Your Collection Strategy Is Probably Missing.)
It is also possible to capture and extract certain artefacts relating to AI tools such as ChatGPT from mobile devices, although what is available will vary depending on the device type, operating system, app version, and how the user has configured the application (for example, whether chat history is enabled).
Depending on those factors, relevant material may include conversation content, projects and workspaces, locally stored drafts, attachments or images created/received within chats, cached media, and application metadata that can help to establish timelines and user activity.
In an eDiscovery or DFIR context, these mobile-derived artefacts can be particularly useful where access to the user’s primary account export is not available, where conversations were conducted on the go, or where corroboration of cloud-side exports is required. As with any mobile collection, careful scoping, documentation, and validation are essential, and practitioners should be mindful of encryption, retention behaviour, and the risk of over-collection of personal data.
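One common way to support the documentation and validation mentioned above is to record a cryptographic hash for every file in a collection, so that later copies can be checked against the original. The sketch below builds such a manifest using Python's standard library. The function names and the idea of an export folder are illustrative assumptions, not a description of any particular forensic tool's workflow.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large exports do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Return {relative path: SHA-256 hex digest} for every file under root."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }
```

Running `build_manifest` again on a working copy and comparing the two dictionaries shows whether any file has been altered, added, or lost since collection.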
If your organisation needs to preserve, extract, or process evidence relating to the use of AI tools such as ChatGPT, Microsoft Copilot, or Harvey, Lineal’s forensic services team can help. We have the experience, tooling, and methodology to handle AI-related collections defensibly across enterprise and mobile environments, and to deliver the resulting data into your chosen review platform. Get in touch to discuss your matter or to scope a collection.
__
About Author
Laura Collins is an accomplished digital forensics examiner, currently serving as Vice President of Shared Services & Forensics at Lineal. With extensive experience overseeing global forensic operations, complex investigations, and eDiscovery delivery, she has built her career across corporate, legal, and incident‑response environments. Laura’s background spans hands‑on forensic analysis, major incident response, and leading high‑performing teams to deliver innovative, defensible solutions for clients worldwide. Recognised for her operational leadership and deep technical expertise, she is committed to advancing high‑quality forensic services while driving collaboration, efficiency, and excellence across the organisation.
__
About Lineal
Lineal is an innovative eDiscovery and legal technology solutions company that empowers law firms and corporations with modern data management and review strategies. Established in 2009, Lineal specializes in comprehensive eDiscovery services, leveraging its proprietary technology suite, Amplify™, to enhance efficiency and accuracy in handling large volumes of electronic data. With a global presence and a team of experienced professionals, Lineal is dedicated to delivering custom-tailored solutions that drive optimal legal outcomes for its clients. For more information, visit lineal.com.
