E-Discovery In the Trenches
Saturday, April 26, 2008
  Recall and Precision
There's a great law.com article by H. Christopher Boehning and Daniel J. Toal that discusses traditional keyword and Boolean search methods versus new alternative methods. Though the authors don't mention it specifically, their article discusses the theory of "recall" and "precision". The ability to search a corpus of documents and bring back all of the relevant material in a result set is called "recall". The ability to keep false positives out of a result set is called "precision". Put another way, recall is the share of all relevant documents that your search retrieves, and precision is the share of retrieved documents that are actually relevant. Therefore, if you craft an overly broad search you may increase your recall, but lower your precision. This scenario usually results in a larger number of false positive documents to sort through in your review. If you have very few false positives in your result set, it allows you to identify relevant documents one after another with fairly high frequency, but the snapshot of material may be a very thin slice of the overall relevant material (high precision, low recall). In other words, there may be a lot more juicy stuff out there to review. The trick is--and this is the holy grail of search--how do you corral all of the good stuff without having any bad stuff mixed in?
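To make the trade-off concrete, here is a minimal Python sketch that computes both measures for two searches. The document IDs, the corpus size, and the function name precision_recall are made up for illustration; in a real matter you only learn which documents are relevant after review.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved document IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = retrieved & relevant
    # Precision: what share of the result set is actually relevant?
    precision = len(true_positives) / len(retrieved) if retrieved else 0.0
    # Recall: what share of all relevant material did the search bring back?
    recall = len(true_positives) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical corpus in which documents 1-10 are the relevant ones.
relevant = set(range(1, 11))

broad_search = set(range(1, 41))   # 40 hits, including all 10 relevant docs
narrow_search = {1, 2, 3, 4}       # 4 hits, every one of them relevant

print(precision_recall(broad_search, relevant))   # (0.25, 1.0)
print(precision_recall(narrow_search, relevant))  # (1.0, 0.4)
```

The broad search finds everything but makes you wade through three false positives for every good document. The narrow search is all good stuff, but it leaves 60% of the relevant material untouched.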

It really depends on your review goals. The fallacy with most search efforts is a desire to only get low doc counts with the most relevant material possible. In this case, the emphasis for your review is on precision (maybe because cost is your primary driving constraint). If relevant material is rampant within the corpus, however, you will want to increase your recall in order to get at the full scope of your issue. You may tolerate a good number of false positives in order to be as thorough as possible (maybe completeness is your primary driving constraint). You'll want to decide quickly whether recall or precision is the ultimate goal of your review. Of course you'll want both, but after the review has started you'll want to shift your focus to one or the other depending on the incremental results of your review. You'll know quickly (after a day or two) if your review assignments are yielding the desired level of precision. In order to test your level of recall, you'll want to draw a random sample from the documents that were excluded from review (make sure the sample is large enough to be statistically valid). Once you perform a QC review on this sample set, you'll know whether your search terms were sufficient in capturing enough relevant material.
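The sampling step can be sketched in Python as follows. This is a rough illustration, not a validated protocol: the function names, the counts, and the use of a simple normal-approximation margin of error are all my own assumptions, and the approximation gets shaky when the sample turns up only a handful of relevant documents.

```python
import math
import random


def draw_sample(excluded_ids, sample_size, seed=0):
    """Pick a random sample of excluded documents for QC review."""
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced later
    return rng.sample(sorted(excluded_ids), sample_size)


def estimate_recall(relevant_found, excluded_count, sample_size,
                    sample_relevant, z=1.96):
    """Estimate recall from a QC review of the excluded-document sample.

    relevant_found  -- relevant documents identified in the reviewed set
    excluded_count  -- documents the search terms left out of review
    sample_relevant -- documents in the sample that QC marked relevant
    z               -- 1.96 corresponds to roughly 95% confidence
    """
    miss_rate = sample_relevant / sample_size
    # Normal-approximation margin of error on the miss rate (an assumption).
    margin = z * math.sqrt(miss_rate * (1 - miss_rate) / sample_size)
    estimated_missed = miss_rate * excluded_count
    total = relevant_found + estimated_missed
    recall = relevant_found / total if total else 0.0
    return miss_rate, margin, recall


# Hypothetical numbers: 4,000 relevant docs found in review, 100,000 docs
# excluded, and 8 relevant docs turned up in a 400-document QC sample.
sample = draw_sample(range(100_000), 400)
miss_rate, margin, recall = estimate_recall(4_000, 100_000, 400, 8)
print(f"miss rate {miss_rate:.1%} +/- {margin:.1%}, estimated recall {recall:.0%}")
```

With these made-up numbers, a 2% miss rate across 100,000 excluded documents implies roughly 2,000 relevant documents left behind, so the search terms captured only about two thirds of the relevant material.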

As you all know, the iterative nature of this work is commonplace in our business. Unless you have a real sense of the percentage of relevant material to begin with, there's absolutely no way of knowing whether your search results have achieved the highest level of recall and precision until you roll up your sleeves and just dig into it. If you're trusting the artificial intelligence of a system to do this "auto-magically" for you, either by concept grouping or "learning" or some other newfangled algorithm, then you are putting quite a bit of faith into the technology. Remember that most of this new technology is a carefully guarded trade secret belonging to the software vendor. In order to prove anything to the court, however, you have to be able to lift the hood and explain the goings-on underneath. The only defensible position that one can take these days, at least until there's a technology winner that is universally accepted by the court, is to present your search terms with hit counts and corresponding review calls. Keywords and Boolean searches are still the state-of-the-art today.
 
Thursday, April 17, 2008
  Only the Company Can Know Itself
In the latest law.com article, Keeping Your Firm's E-Discovery In-House, Dale Buss recognizes that there's strong sentiment in the industry for "legal departments [to] establish as much as possible of the ESI-management function in-house as swiftly as they can [because] only the company over time truly can know itself". Robert Bjornsti, VP of AXA Equitable Life Insurance Co., echoed this sentiment earlier in the year at LegalTech NY when he delivered the day two keynote address on "Paradigm Shift -- Corporate Use of Legal Support Services". The argument here is that insourcing e-discovery work not only reduces cost, but is also more effective. A corporation can fine-tune its response to a legal hold by tapping into the company's ERP system. Leveraging the HR metadata resident in enterprise databases gives you insight into a custodian's business function, the nature of the data that they keep, and the level of privileged and/or confidential information contained therein. "That way, when you get a discovery notice, the company can be very precise, not shotgun, about where the right data is." Performing this work behind the corporate firewall also enhances security and control. It allows corporations to reuse data for concurrent and pending matters within their litigation portfolio.

This is no small undertaking. First of all, e-discovery software is mostly proprietary and is geared to reside at the technology vendor's hosting facility. A lot of these homegrown solutions were developed by the technology vendors themselves and were invented to serve as a secondary offering to their consulting services. The software platform was never designed for general, off-the-shelf deployment within a company's network. Secondly, IT departments aren't equipped to deal with the high-stakes nature of e-discovery work, and the personnel aren't suited at all to deal with attorneys and attorney requests. I used to be an IT guy and I can tell you that we are bred with a troubleshooting mindset. Everything is up for experimentation and subject to trial and error (we deal primarily with Microsoft tools, after all). This approach simply doesn't work in litigation. If the pendulum truly is swinging back from outsourcing to insourcing, it could come crashing in through corporate walls, creating more damage than originally anticipated. For the enterprise that is litigation-savvy and has a penchant for detail, it may very well be worth the effort. The corporation must understand that the effort will require an entirely new business function -- not supplanting the IT department, but working hand-in-hand with it. New (and very large) budgets will need to be allocated for hardware and people. Planning for an in-house staff of e-discovery professionals and a handful of reliable, independent consultants will go a long way in easing the transition.
 
Tuesday, April 15, 2008
  Trend towards the Proactive
Many in our industry have predicted a trend towards more proactive e-discovery solutions, and I tend to agree. In its simplest form, this argument means reducing the volume of data and overall costs. Whether this is accomplished through "early case analysis" or better software, the distinguishing feature is where & when one decides to pare the corpus of data for a particular matter. If you identify the priority custodians and send all of their material en masse to a vendor, you are taking the traditional route and being reactive. If, however, you can pare the material by priority custodian, date range, and keywords onsite, behind the firewall at the corporation, you are definitely being more proactive than most. Now, we all know keywords have limited effectiveness for identifying relevant material, but that's a topic for a whole other discussion. The point is, keyword search terms are still very commonly utilized in litigation matters, and if you can filter the data ahead of time and send only the resultant material to your vendor, it will reduce your overall cost significantly.
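As a rough picture of what paring by priority custodian, date range, and keywords looks like, here is a small Python sketch. The Doc record, its field names, the cull function, and the sample data are all hypothetical, and the keyword test is a naive case-insensitive substring match rather than anything a real processing tool would do with stemming or Boolean operators.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Doc:
    doc_id: str
    custodian: str
    sent: date
    text: str


def cull(docs, custodians, start, end, keywords):
    """Keep docs from priority custodians, in the date range, with a keyword hit."""
    wanted = {c.lower() for c in custodians}
    terms = [k.lower() for k in keywords]
    kept = []
    for doc in docs:
        if doc.custodian.lower() not in wanted:
            continue  # not a priority custodian
        if not (start <= doc.sent <= end):
            continue  # outside the agreed date range
        body = doc.text.lower()
        if any(term in body for term in terms):
            kept.append(doc)  # at least one search term hit
    return kept


# Made-up collection for illustration only.
docs = [
    Doc("D1", "Smith", date(2007, 3, 15), "Revised options grant schedule attached"),
    Doc("D2", "Smith", date(2005, 1, 10), "Old options grant memo"),
    Doc("D3", "Jones", date(2007, 6, 1), "Options grant question"),
    Doc("D4", "smith", date(2007, 7, 4), "Lunch on Friday?"),
]

kept = cull(docs, ["Smith"], date(2006, 1, 1), date(2007, 12, 31),
            ["options grant", "backdat"])
print([d.doc_id for d in kept])  # ['D1']
```

Only one of the four documents would be shipped to the vendor here. Keeping the custodian list, date range, and terms as plain parameters also makes it easy to rerun the cull when they change.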

Most attorneys will argue that it is in the client's interest to keep all the data in one location, typically at the technology vendor's data center. That way, in the event that keyword search terms change (which they will) or the priority custodian list changes (which it will), it will save time to make these changes on-the-fly in one unified location rather than in a piecemeal fashion, once at the corporation and once again at the vendor after more data has been shipped.

For my next blog entry, I will talk about the latest school of thought: let's keep all the data at the corporation and NEVER send it to a technology vendor!!
 
Tuesday, January 15, 2008
  The Offline Review
Every so often, there's an unavoidable need to export documents out of your review platform for "offline review". This can mean something as simple as printing documents out for an attorney to provide handwritten comments, or it can mean something more complicated like exporting documents to an offline format because your system's native viewer can't render documents containing illegible text, password protection, or foreign language content.

Make sure this is a necessity. Tracking these documents later can create a huge reconciliation headache.

Ensure that everything has been tried within the system to fix your problematic docs. If TIFF-on-demand, or installing language packs, or password recovery measures don't fix your documents, then tread carefully with "offline review". Remember these challenges:

How do you summarize the review markings and production status of these offline documents in your standard status reports?

How do you maintain an audit trail for the way these documents change over time during the course of the offline review?

If you or someone on your team backfills markings, annotations, and redactions into your online system on the reviewers' behalf, know that YOU will be recorded as the reviewer for that subset of documents. How does this affect the accuracy of your reviewer progress reports?


You'll discover that your pretty online reports are riddled with asterisks and footnotes, referencing ugly, confusing spreadsheets that contain specific stats for your "offline review". Also, unless you are extremely meticulous with offline tracking, your ability to confidently explain the status of your review quickly diminishes once you head down the path of "offline review".
 
Saturday, December 29, 2007
  The Media Log
It's also referred to as a tracking spreadsheet or delivery manifest, but the "media log" is one of the most important pieces of paper in your Chain of Custody. If the sending party doesn't provide a media log to accompany a piece of delivered data, don't process the data! If they push back and say, "Can you guys just fill out the log based on what's on the DVD?", don't do it. You have no way of knowing what's supposed to be on the disc. There have been many, many instances where the sending party forgot something that they "intended" to send. There's also the chance that the sending party accidentally copied material to the media that was collected for another matter altogether. The log is a tool to confirm the nature and validity of the contents of the media. Months can go by and a question could eventually arise, "Didn't you process Custodian ABC's hard drive data in batch XYZ? It was supposed to be on the DVD that we sent you", or more egregiously, "Why am I seeing Custodian ABC's data in the repository? He has absolutely nothing to do with this case!" The media log allows you to address discrepancies immediately.

You can process the material that is sent "as-is" as long as you accept all assumptions that go along with it. You can analyze the contents beforehand and can report any anomalies, but there's no way of confirming for sure the accuracy or thoroughness of the delivery. In other words:

Accuracy: Are all the custodian sources there?
Thoroughness: Was all the data copied? If you see an empty source directory for Custodian ABC's network share, should you be concerned?
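
When a media log does come with the delivery, those two questions can be checked mechanically. The Python sketch below assumes a hypothetical log format in which each custodian source is listed with an expected file count. Real media logs vary, so treat the field layout, the source names, and the reconcile function as illustrative only.

```python
def reconcile(media_log, on_media):
    """Compare the sender's media log against what is actually on the media.

    Both arguments map a custodian source name to a file count:
    media_log holds what the sender says was delivered, on_media holds
    what an inventory of the disc actually found.
    """
    logged, found = set(media_log), set(on_media)
    return {
        # Listed on the log but not on the disc: something was forgotten.
        "missing": sorted(logged - found),
        # On the disc but not on the log: possibly another matter's data.
        "unexpected": sorted(found - logged),
        # Present but empty: was the copy incomplete?
        "empty": sorted(s for s in found if on_media[s] == 0),
        # Present on both, but the counts disagree.
        "count_mismatch": sorted(
            s for s in logged & found if media_log[s] != on_media[s]
        ),
    }


# Made-up delivery for illustration only.
media_log = {"ABC/laptop": 1200, "ABC/network_share": 300, "DEF/email": 50}
on_media = {"ABC/laptop": 1200, "ABC/network_share": 0, "XYZ/email": 75}

for issue, sources in reconcile(media_log, on_media).items():
    print(issue, sources)
```

Every non-empty list is a question to send back to the sending party before processing begins, not something to resolve on your own.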

I know we've been in the business for a long time, so these risks are semi-obvious, but it doesn't hurt to reiterate at the outset of a new project. The workflow and standards that you enforce beforehand really set the stage for a successful engagement.
 
This Blog is dedicated to the men & women working directly in the trenches on EDD projects - junior attorneys, paralegals, project managers, document reviewers, data processors, and staff consultants alike, who put in countless stressful (and often thankless) hours doing what seems to be the impossible.

Name: Jerry Bui
Location: Los Angeles, California, United States

Jerry leads large scale discovery projects and investigations for government agencies and the country's top law firms. His background is in multi-tiered software architecture, security, data modeling/warehousing and business analytics. He has been involved in major front-page corporate cases, some of which involve hot-button matters such as Anti-money Laundering, Antitrust, and Options Back-dating.


Disclaimer: Opinions and claims contained herein are those of the author only and are not representative of Jerry's employer, its partners, or any of its member firms.

This blog is intended to impart general information and does not offer specific legal advice. Use of this blog does not create an attorney-client relationship. If you require legal advice, consult an attorney.
