In Re: Actos -- Does New Federal Litigation Clarify Predictive Coding In eDiscovery?

Guest Post by Matthew Nelson

In Re: Actos (Pioglitazone) Products Liability Litigation recently surfaced as the newest case to fuel the continuing debate about the use of predictive coding technology in litigation. The plaintiffs allege that Actos, a prescription drug used to treat type 2 diabetes, increases the risk of bladder cancer in patients.

On July 27, 2012, United States Magistrate Judge Hanna Doherty of the Western District of Louisiana entered a Case Management Order outlining the electronically stored information (ESI) protocol the parties must follow during discovery.

Central to the protocol is a detailed description of the predictive coding methodology the parties must utilize during discovery. Predictive coding is a type of machine-learning technology that enables a computer to help “predict” how documents should be classified based on limited training provided by its human counterparts. The technology is exciting for organizations attempting to manage skyrocketing legal budgets because predictive coding has the potential to save organizations millions in document review costs. A computer can help predict which documents within a large population of information are legally required to be produced to requesting parties during litigation, so only a fraction of the documents that would otherwise require manual review actually need to be examined by humans. Supporters also believe this form of technology-assisted review is far faster and more accurate than paying attorneys to manually drudge through thousands or even millions of documents because computers don’t get tired or daydream.
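To make the training loop concrete, here is a minimal sketch of how a small human-reviewed “seed set” can train a classifier that then scores the unreviewed documents. The toy documents, labels, and scikit-learn model are illustrative assumptions only, not a description of any vendor’s product or of the tool used in Actos.

```python
# Illustrative sketch of predictive coding: a small human-reviewed "seed set"
# trains a text classifier that then scores the unreviewed documents.
# Documents, labels, and model choice are assumptions for illustration;
# real tools differ in features, algorithms, and workflow.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Seed set: documents an attorney has already marked responsive (1) or not (0).
seed_docs = [
    "Q3 sales report for Actos including adverse event summaries",
    "Lunch menu for the cafeteria next week",
    "Email discussing bladder cancer study results for pioglitazone",
    "Holiday party planning committee notes",
]
seed_labels = [1, 0, 1, 0]

# Unreviewed documents the computer will help prioritize.
unreviewed_docs = [
    "Draft response to FDA inquiry about pioglitazone safety data",
    "Reminder: submit your parking pass renewal",
]

# Convert text to numeric features and train a simple classifier.
vectorizer = TfidfVectorizer()
X_seed = vectorizer.fit_transform(seed_docs)
model = LogisticRegression().fit(X_seed, seed_labels)

# Predicted probability that each unreviewed document is responsive;
# reviewers can focus first on the highest-scoring documents.
scores = model.predict_proba(vectorizer.transform(unreviewed_docs))[:, 1]
for doc, score in zip(unreviewed_docs, scores):
    print(f"{score:.2f}  {doc}")
```

In practice the seed set is refined over several rounds of review, but the basic division of labor is the same: humans label a sample, the computer generalizes from it.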

Predictive coding technology is beginning to gain broader acceptance within the legal community as an alternative to the drudgery of page-by-page manual document review, but some have expressed reluctance. Part of the reason more attorneys haven’t embraced predictive coding is that, although machine learning (the underlying technology behind predictive coding) has existed for decades, it is relatively new to the legal profession. Not surprisingly, early generation predictive coding tools are far more complex and difficult to use than traditional legal technology tools such as keyword search, concept search, email threading, and data clustering, to name a few. Some feel asking attorneys to add predictive coding tools to their technology tool belt is like asking them to fly jet airplanes instead of driving cars. A jet airplane might be a faster and more efficient mode of transportation in many instances, but manning the cockpit without proper education and training could backfire.

The shifting legal technology landscape explains why so many in the litigation community have turned to the courts for further guidance on the use of predictive coding technology. Less than a year ago, there were no legal cases on record specifically addressing the use of predictive coding technology. Since early 2012, however, four new cases have surfaced, including In Re Actos. Each of the first three cases (Da Silva Moore, et al. v. Publicis Groupe; Kleen Products, LLC, et al. v. Packaging Corporation of America; and Global Aerospace Inc., et al. v. Landow Aviation, L.P. dba Dulles Jet Center) presents a unique fact pattern and unique issues related to the use of predictive coding. One of the three cases also includes a judicial endorsement that predictive coding technology is “acceptable in appropriate cases.” Unfortunately, none of these cases, including Actos, clarifies one of the most critical aspects of any reliable predictive coding protocol: statistical sampling.

Although the Order in Actos outlines the predictive coding protocol in detail, it does little to illuminate the complexity of the data sampling techniques that are the backbone of predictive coding technology. A clear explanation of how a sample is drawn from a larger population of documents is important because that sample is used to estimate the prevalence of responsive documents within the larger population. If the sample is too small or is not random, estimates of the number of responsive documents contained within the population could be wildly inaccurate. Improper sampling could also undermine the accuracy of the common methods used to measure the performance of the predictive coding tool being used.
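The arithmetic behind that concern can be illustrated with a short sketch. Using the standard normal approximation for a proportion, and hypothetical figures (a one-million-document collection and an observed prevalence of 15% responsive documents in the sample), it shows how much the estimated number of responsive documents can swing when the random sample is small.

```python
# Illustrative only: normal-approximation margin of error for estimating the
# prevalence of responsive documents from a simple random sample.
# The population size, sample sizes, and observed prevalence are hypothetical.
import math

POPULATION = 1_000_000   # potentially responsive documents collected
Z_95 = 1.96              # z-score for a 95% confidence level

def margin_of_error(prevalence, sample_size):
    """Half-width of a 95% confidence interval for an estimated proportion."""
    return Z_95 * math.sqrt(prevalence * (1 - prevalence) / sample_size)

observed_prevalence = 0.15  # say 15% of the sampled documents were responsive

for n in (100, 400, 2400, 9600):
    moe = margin_of_error(observed_prevalence, n)
    low = (observed_prevalence - moe) * POPULATION
    high = (observed_prevalence + moe) * POPULATION
    print(f"sample={n:5d}  margin=±{moe:.1%}  "
          f"responsive documents: {low:,.0f} to {high:,.0f}")
```

With a 100-document sample the estimate spans roughly 80,000 to 220,000 responsive documents; with a few thousand sampled documents the range tightens to within a percentage point or two, which is why the size and randomness of the sample matter so much.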

For example, assume the parties to a case stipulate that the defendant will make a good faith effort to provide the plaintiff with at least 80% of the responsive documents contained within a collection of one million potentially responsive documents using a predictive coding tool. If the document sample drawn from the population is too small, the defendant might unwittingly produce far fewer than 80% of the responsive documents because the underlying estimates rest on flawed sampling procedures. This scenario is obviously bad for the receiving party, which expected a higher level of performance (at least 80% of all responsive documents) from the predictive coding system used by the producing party. The outcome is also bad for the producing party, which unwittingly made false representations to the court and the opposing party about the quality of its document production that could have been avoided with more diligence.
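A similar back-of-the-envelope check can be applied to the 80% commitment itself. The sketch below, again using the normal approximation and invented counts (a 500-document control set containing 75 responsive documents, 60 of which the tool found), shows that a point estimate of 80% recall drawn from a small sample still leaves room for true recall to fall well short of the stipulated level.

```python
# Illustrative only: estimating recall (the share of truly responsive documents
# the tool actually found) from a random control set, using the normal
# approximation for a proportion. All counts are hypothetical.
import math

Z_95 = 1.96

def recall_interval(found, responsive_in_control):
    """Point estimate and approximate 95% confidence interval for recall."""
    recall = found / responsive_in_control
    half_width = Z_95 * math.sqrt(recall * (1 - recall) / responsive_in_control)
    return recall, max(0.0, recall - half_width), min(1.0, recall + half_width)

# A control set of 500 randomly sampled documents turned up 75 responsive ones,
# and the predictive coding tool flagged 60 of those 75.
estimate, low, high = recall_interval(found=60, responsive_in_control=75)
print(f"estimated recall: {estimate:.0%}  (95% CI roughly {low:.0%} to {high:.0%})")
# The point estimate is 80%, but the interval runs from about 71% to 89%:
# the stipulation appears satisfied on paper while true recall could be lower.
```

A larger control set, or a more careful interval method, narrows that uncertainty; ignoring it is how a party ends up certifying a production quality it cannot actually support.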

Perhaps more important in the grand scheme of things, this type of scenario is also bad for the future of predictive coding. Understandably, vendors do not want to reveal the proprietary secrets behind their technology, nor should they be required to in most circumstances. However, the industry should demand that predictive coding providers build enough transparency into their tools that both parties can evaluate the soundness of the underlying statistical approach being applied. That transparency should include sufficient information about the sampling approaches utilized throughout the process and visibility into the methodology the computer used to help determine why some documents are responsive and others are not.

Actos provides additional insight into how parties and judges are approaching the use of predictive coding technology during discovery. Like Da Silva Moore, Actos demands a high degree of cooperation between the parties, requiring them to make joint decisions about the relevance or non-relevance of the documents used to train the predictive coding system. Whether this level of transparency between parties should be required is debatable. What is less debatable is the need for transparency related to statistical sampling protocols, so both parties understand, with a reasonable degree of confidence, what they are producing and receiving. While Actos represents another step in the evolution of predictive coding, the protocol explained in the Order does not provide enough transparency to ensure the overall integrity of the predictive coding process.

____________

Matthew Nelson is eDiscovery Counsel at Symantec and author of Predictive Coding for Dummies.  Matthew is a member of The Sedona Conference, the Electronic Discovery Reference Model (EDRM) and is licensed to practice law in California and Idaho. Follow Matthew on Twitter at @InfoGovlawyer.com.