Computer-assisted review
It can be hard to find specific evidence to support broad assertions of systematic misconduct, such as a glass ceiling for female employees. These cases often require very extensive discovery, and the inspection process can run into the millions of dollars.
Keyword searches can provide some results, but it is easy to write a document on a particular topic without using any particular keywords. Keyword searching will miss these documents. So what to do when you have millions of emails between many senior managers over many years?
One technique is “computer-assisted review”, or “predictive coding”. Experienced lawyers code a set of documents taken from the actual potential discovery set, and the software watches while they do so and attempts to predict the coding results. The rules it derives are far more complex than keywords, but their quality depends on very experienced lawyers doing the coding. After sufficient cycles of review and feedback the software becomes capable of either determining yes/no relevance or providing a relevance score. This enables the legal team to prioritise the review of the documents most likely to be relevant. Where a relevance score is used the parties may attempt to agree on a minimum threshold for manual review, thus containing costs.
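The train-then-score loop described above can be illustrated with a toy sketch. This is not how any commercial predictive coding product works; it assumes a simple naive Bayes word model, and the seed documents, the function names and the 0.5 threshold are all hypothetical choices made for illustration.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def train(coded_docs):
    """Build word counts from (text, is_relevant) pairs coded by reviewers."""
    word_counts = {True: Counter(), False: Counter()}
    doc_counts = Counter()
    for text, label in coded_docs:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts[True]) | set(word_counts[False])
    return word_counts, doc_counts, vocab

def relevance_score(model, text):
    """Estimate P(relevant | text) with naive Bayes and add-one smoothing."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    log_p = {}
    for label in (True, False):
        total_words = sum(word_counts[label].values())
        lp = math.log((doc_counts[label] + 1) / (total_docs + 2))
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                lp += math.log((word_counts[label][word] + 1)
                               / (total_words + len(vocab)))
        log_p[label] = lp
    return 1 / (1 + math.exp(log_p[False] - log_p[True]))

# Hypothetical seed set, standing in for documents coded by senior lawyers.
seed = [
    ("promotion denied despite strong review", True),
    ("she was passed over for director again", True),
    ("lunch menu for friday", False),
    ("parking garage closed next week", False),
]
model = train(seed)

unreviewed = ["passed over for promotion", "friday parking update"]
scores = {doc: relevance_score(model, doc) for doc in unreviewed}

THRESHOLD = 0.5  # hypothetical agreed minimum score for manual review
for_review = [doc for doc, score in scores.items() if score >= THRESHOLD]
```

In a real matter the lawyers would correct the software's predictions on each round and retrain, and the threshold would be a matter for negotiation rather than a constant in code.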
In Da Silva Moore et al v Publicis Groupe & MSL Group (USDC, SD of NY, 24 Feb 2012) (http://goo.gl/0pNzq) the court dealt with consent orders using computer-assisted review in relation to a glass-ceiling case. Particular processes were required, such as maintaining the sample set and a documented quality control regime to assist in dealing with arguments as to the accuracy of the process. Magistrate Judge Peck had previously said “Key words, certainly unless they are well done and tested, are not overly useful. Key words along with predictive coding and other methodology, can be very instructive.”
The defendants proposed that only the top 40,000 documents be produced, but this approach was rejected because it took no account of what the statistics showed about the results. A fixed cut-off might have excluded many relevant documents.
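The problem with a fixed cut-off can be seen in a small sketch. It assumes the documents have already been ranked by the software and that a reviewer's relevant/not relevant coding is known for each; the figures below are invented, and `recall_at_cutoff` is a hypothetical helper, not something taken from the case.

```python
def recall_at_cutoff(ranked_labels, cutoff):
    """Share of all relevant documents that sit above the cut-off.

    ranked_labels: 1 (relevant) or 0 (not relevant), highest-ranked first.
    """
    total_relevant = sum(ranked_labels)
    if total_relevant == 0:
        return 0.0
    return sum(ranked_labels[:cutoff]) / total_relevant

# Invented coding for ten ranked documents.
ranked_labels = [1, 1, 0, 1, 0, 1, 0, 0, 1, 0]

recall_top_4 = recall_at_cutoff(ranked_labels, 4)   # 3 of 5 relevant found
recall_top_9 = recall_at_cutoff(ranked_labels, 9)   # 5 of 5 relevant found
```

In this toy example producing only the top four documents would leave two of the five relevant documents behind, which is the kind of result a cut-off chosen without reference to the statistics cannot reveal.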
Since some data was in the email account of a French citizen, Peck MJ also mentioned the International Principles of Discovery, Disclosure and Data Protection, published by the Sedona Conference (a research and educational institute). This publication deals with the challenges of competing international privacy laws. This is a particular issue since the EU is drafting a replacement General Data Protection Regulation that requires strict personal data protection compliance for non-EU countries. A penalty of up to 2% of world-wide turnover may be applied for breach, and it will be compulsory to notify data protection authorities and the individuals concerned of a breach or leak within 24 hours. The rules will apply to non-EU based businesses that have subsidiaries in the EU or offer goods or services to EU-based customers.
The parties started by selecting a sample of documents with a 95% confidence level and using that to train the software. Keyword sample sets were also produced, and in the end around 7,000 documents were given to senior attorneys to create the seed set. The court made the point that these were not paralegals, in-house lawyers or junior associates. The defendants proposed seven iterative rounds of training and testing, at which the plaintiffs baulked, but the court “reminded the parties that computer-assisted review works better than most of the alternatives, if not all of the [present] alternatives. So the idea is not to make this perfect, it’s not going to be perfect. The idea is to make it significantly better than the alternatives without nearly as much cost.” Now, there’s an idea.
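For readers wondering how a 95% confidence level translates into a number of documents, the following is a minimal sketch using the standard sample-size formula for a proportion with a finite population correction. The margin of error is not stated above, so the 2% default is an assumption, as are the function name and the population figures used.

```python
import math

def sample_size(population, margin=0.02, z=1.96, p=0.5):
    """Documents to sample for a given margin of error.

    z=1.96 corresponds to a 95% confidence level; p=0.5 is the most
    conservative guess at the proportion of relevant documents.
    The margin of 0.02 is an assumed value, not taken from the case.
    """
    n0 = (z ** 2) * p * (1 - p) / margin ** 2      # infinite-population size
    n = n0 / (1 + (n0 - 1) / population)           # finite population correction
    return math.ceil(n)

# Hypothetical collection sizes.
n_large = sample_size(1_000_000)
n_small = sample_size(10_000)
```

The point the arithmetic makes is that the sample grows very slowly with the size of the collection: a few thousand documents suffice whether the collection holds one million emails or several million.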