Predictive Coding In The Document Review Process

Since the infamous Jackson Reforms were implemented in April 2013, keeping costs ‘proportionate’ and exercising stringent case management have been at the forefront of litigators’ minds.

Fortunately, the continued development of AI and machine learning has produced tools that make it easier to keep litigation costs proportionate, benefiting not only the client but society as a whole.

Predictive coding is one such development, and it has revolutionised the document review process.

What is predictive coding?

Predictive coding systems apply algorithms that learn from reviewers’ decisions and identify similar documents, which are then prioritised for review.  This limits the time spent reviewing irrelevant documents and allows relevant documents to be captured as efficiently as possible, improving both recall and precision.
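
Recall is the proportion of all relevant documents that the review actually finds, and precision is the proportion of the documents put forward that turn out to be relevant.  The short Python sketch below shows how the two measures are calculated.  The document IDs and figures are hypothetical and are used purely for illustration.

```python
def precision_recall(retrieved: set, relevant: set) -> tuple:
    """Return (precision, recall) for a set of retrieved document IDs."""
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical review: four documents were prioritised by the system,
# while five documents in the whole set are truly relevant.
retrieved = {"doc1", "doc2", "doc3", "doc4"}
relevant = {"doc1", "doc2", "doc3", "doc7", "doc9"}

precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
# precision=0.75 recall=0.60
```

In this example three of the four prioritised documents are relevant (precision of 75%), but two relevant documents were missed (recall of 60%).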

The theory and technology behind predictive coding are based upon commonly used statistical techniques which have been widely adopted in industries including accountancy, insurance, banking, financial services, pharmaceuticals and healthcare.  Such predictive technology is often used to analyse data, assess risk and make predictions.  Common everyday uses include credit scoring, fraud identification and risk underwriting.

One of the biggest advantages of using predictive coding in the document review process is that, unlike with human review, the cost does not increase at the same rate as the number of documents to be reviewed.

How does predictive coding in document review work?

Typically, a senior solicitor will start by ‘training’ an algorithm using a set of ‘seed’ documents.  The algorithm learns from the characteristics of the documents and from the solicitor’s decisions about them.  It then searches for similar documents and ranks them in terms of relevance.  These documents can then be prioritised for review.  The process continues until the system provides no further relevant documents for review, or until so few are being found that continuing the review would be disproportionate.

The composition of the seed set, how long the algorithm should be trained, and the breadth of quality control methods that should be employed are all matters of debate within the profession.
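
The cycle described above (train on a seed set, rank, review, retrain) can be sketched in a few lines of Python.  This is a deliberately simplified illustration and not how any particular commercial product works: the word-count scorer, the `reviewer` function standing in for the solicitor, the sample documents and the stopping rule (stop once a whole batch contains nothing relevant) are all assumptions made for the example.

```python
import math
from collections import Counter


def train(coded):
    """Learn per-word weights from (text, is_relevant) review decisions."""
    rel, irr = Counter(), Counter()
    for text, is_relevant in coded:
        (rel if is_relevant else irr).update(set(text.lower().split()))
    vocab = set(rel) | set(irr)
    # Smoothed log-ratio: positive if a word appears mostly in relevant documents.
    return {w: math.log((rel[w] + 1) / (irr[w] + 1)) for w in vocab}


def score(weights, text):
    """Sum the learned weights of the distinct words in a document."""
    return sum(weights.get(w, 0.0) for w in set(text.lower().split()))


def review_loop(documents, seed_ids, reviewer, batch_size=2):
    """Code the seed set, then review top-ranked batches until one has no relevant documents."""
    coded = {doc_id: reviewer(documents[doc_id]) for doc_id in seed_ids}
    while True:
        weights = train([(documents[i], coded[i]) for i in coded])
        unreviewed = [i for i in documents if i not in coded]
        if not unreviewed:
            break
        ranked = sorted(unreviewed, key=lambda i: score(weights, documents[i]), reverse=True)
        decisions = {i: reviewer(documents[i]) for i in ranked[:batch_size]}
        coded.update(decisions)
        if not any(decisions.values()):
            break  # assumed stopping rule: a whole batch with nothing relevant
    return coded


# Hypothetical document set, for illustration only.
documents = {
    "d1": "share purchase agreement draft",
    "d2": "lunch menu for friday",
    "d3": "revised share purchase terms",
    "d4": "agreement on share valuation",
    "d5": "office party friday",
    "d6": "parking permit renewal",
    "d7": "holiday rota",
    "d8": "lunch order",
}


def reviewer(text):
    # Hypothetical stand-in for the solicitor's relevance decision.
    return "share" in text


coded = review_loop(documents, seed_ids=["d1", "d2"], reviewer=reviewer)
print(sorted(i for i, is_relevant in coded.items() if is_relevant))
# ['d1', 'd3', 'd4']
```

Here all three relevant documents are found after six of the eight documents have been reviewed; the two lowest-ranked documents are never put before the reviewer.  In practice the stopping point would be validated by sampling and quality control rather than by a single empty batch.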


The court’s treatment of the use of predictive coding software

Predictive coding had already been approved by courts in the United States and, in 2015, in Ireland[1] before it reached the English courts.  The method, which can save hundreds of thousands of pounds in document review costs, was first approved in England in Pyrrho Investments and others v MWB Property and others [2016] EWHC 256 (Ch), in which over three million electronic documents had to be considered for disclosure.

It was proposed that, at the beginning of the process, the parties would agree a predictive coding protocol covering matters including the definition of the data set and the sample size.  Criteria then had to be decided for including documents in the process; these would include who held the documents, the date range, and perhaps whether the documents contained specific keywords.  A representative sample of the included documents was used to “train” the software: the person who would otherwise have decided relevance for the whole document set considered each document in the sample and categorised it accordingly.
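
The inclusion step lends itself to a small illustration.  The Python sketch below filters a document set by custodian, date range and optional keywords, then draws a sample for the reviewer to code.  The `Document` fields, the sample data and the use of a simple random sample are assumptions made for the example; the actual protocol agreed in the case is not reproduced here.

```python
import random
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Document:
    doc_id: str
    custodian: str  # who held the document
    sent: date
    text: str


def in_scope(doc, custodians, start, end, keywords=None):
    """Apply the agreed inclusion criteria: custodian, date range, optional keywords."""
    if doc.custodian not in custodians:
        return False
    if not (start <= doc.sent <= end):
        return False
    if keywords:
        text = doc.text.lower()
        return any(k.lower() in text for k in keywords)
    return True


def training_sample(docs, size, seed=0):
    """Draw a simple random sample of the included documents for the reviewer to code."""
    rng = random.Random(seed)
    return rng.sample(docs, min(size, len(docs)))


# Hypothetical documents, for illustration only.
corpus = [
    Document("a1", "alice", date(2014, 3, 1), "Share transfer approved"),
    Document("a2", "alice", date(2010, 1, 5), "Share transfer query"),
    Document("a3", "alice", date(2015, 2, 11), "Draft share valuation"),
    Document("b1", "bob", date(2014, 6, 9), "Canteen opening hours"),
    Document("c1", "carol", date(2014, 7, 2), "Share register update"),
]

included = [
    d for d in corpus
    if in_scope(d, {"alice", "bob"}, date(2013, 1, 1), date(2015, 12, 31), ["share"])
]
sample = training_sample(included, size=1)
print([d.doc_id for d in included])
# ['a1', 'a3']
```

Fixing the random seed makes the sample reproducible, which may help if the other side or the court later asks how the training set was chosen.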

In making his decision, Master Paul Matthews had little in the way of English authorities to refer to.  He therefore reviewed decisions from other jurisdictions, notably the US Federal Court case of Moore v Publicis Groupe, 11 Civ 1279 (ALC)(AJP) and Irish Bank Resolution Corporation Ltd v Quinn [2015] IEHC 175.  Ultimately, the use of predictive coding in the document review process was approved on the grounds that:

  • authorities from overseas jurisdictions supported the use of predictive coding in appropriate cases
  • there was no evidence that predictive coding led to less accurate disclosure being given than a manual review alone, or keyword searches and manual review combined
  • there will be greater consistency in using the computer to apply the approach of a senior lawyer towards the initial sample (as refined), to the whole document set, than in using dozens, perhaps hundreds, of lower-grade fee-earners, each seeking independently to apply the relevant criteria in relation to individual documents
  • there is nothing in the CPR or Practice Directions to prohibit the use of such software
  • the cost of a manual review would run into the millions of pounds, so a full manual review of each document would be “unreasonable” within paragraph 25 of Practice Direction B to Part 31, at least where a suitable automated alternative exists at lower cost
  • the parties had already agreed on the use of predictive coding software and only required the approval of the court

In 2016, City law firm Berwin Leighton Paisner (BLP) won the first contested application to use predictive coding as part of a substantial document review exercise.  The case, Brown v BCA Trading Ltd [2016] EWHC 1464 (Ch), concerned a disagreement between the parties over whether the Respondents should give electronic disclosure using predictive coding or a more traditional keyword approach, and whether a costs management order should be made in respect of the parties’ costs budgets, which were partly, but not wholly, agreed.

The Respondents argued that using predictive coding would be the most reasonable and proportionate method of disclosure. They stated it would cost around £132,000, while a traditional keyword search would cost at least £250,000.

Mr Registrar Jones made it clear that, although the cost difference between predictive coding and keyword searching was relevant and persuasive, this was so only to the extent that predictive coding would be effective in achieving the disclosure required.

“When the size of potential disclosure is significant both in terms of quantity of documents and the time required to be spent on the disclosure process, it is particularly important for the lawyers to identify by reference to the true issues, the anticipated categories of documents and to enter into discussions to seek to minimise the work required and therefore the costs.”

The registrar went on:

“The statements of case from both sides within this section 994 Companies Act 2006 petition present extremely broad issues of factual dispute.

“Realistically, however, experience shows that issues will narrow significantly by the time the trial is reached. This can mean that what may have appeared to be necessary disclosure based upon the statements of case at this stage, will turn out to have been unnecessary and indeed to a large degree irrelevant to the way the case will be heard at trial.”

The registrar stated that, although it may be challenging, solicitors must make a “reasonable attempt” to foresee the outcome.

“A successful outcome from the use of predictive coding must, at least to some extent, depend upon the success of the parties having been able first to narrow down the issues and therefore the categories/types of documents relevant to the disclosure process.”

Referring to the decision in Pyrrho Investments and others v MWB Property and others, the registrar noted that all of the factors cited by Master Paul Matthews in favour of predictive coding applied to this case, except the one relating to the agreement of the parties.  Despite this, an order for the use of predictive coding was made:

“There is nothing, as yet, to suggest that predictive coding will not be able to identify the documents which would otherwise be identified through, for example, keyword search and, more importantly, with the full cost of employees/agents having to carry out extensive investigations as to whether documents should be disclosed or not.”


The future of predictive coding in the UK

BLP has credited the use of predictive coding with uncovering documents which helped it win its client’s case.

Corporate risk partner Oliver Glynn-Jones told Legal IT Insider: “At the case management conference we had to make the case for predictive coding and we presented evidence to show that it is better than human review and around a third cheaper than manual review. All the safeguards the Court was looking for were in place and we persuaded the Court that it was the right way to proceed”.

He added: “This is a case where the documents were key and pivotal to the judgment – and that’s what came out of the predictive coding exercise[2]”.

UK law firms have been slower to embrace predictive coding than their counterparts in the US.  However, BLP’s success is likely to pave the way for more firms to utilise the technology.  Mr Glynn-Jones stated:

“The reality is that the courts are already aware of the technology and want to use it. If you have an example where it’s been used in a contested manner and it’s been successful, then people will point to it as a demonstration of it working in practice.”

Lineal is a global leader in providing flexible eDiscovery and litigation support.  To find out more about predictive coding, eDiscovery and our other services, please call us on +44 (0)20 7940 4799 or email

[1] Irish Bank Resolution Corporation Ltd & ors v Quinn & ors [2015] IEHC 175 sanctioned the use of predictive coding in disclosure in Irish courts.