Recall and Precision
There's a great law.com article by H. Christopher Boehning and Daniel J. Toal that discusses traditional keyword and Boolean search methods versus newer alternative methods. Though the authors don't mention it specifically, their article touches on the concepts of "recall" and "precision". Recall is the ability of a search to bring back all of the relevant material in a corpus of documents: the share of all relevant documents that ends up in your result set. Precision is the ability to keep false positives out of the result set: the share of the result set that is actually relevant. If you craft an overly broad search, you may increase your recall but lower your precision, which usually means a larger number of false positive documents to sort through in your review. If you have very few false positives in your result set, your reviewers will hit relevant documents one after another, but that result set may be a very thin slice of the overall relevant material (high precision, low recall). In other words, there may be a lot more juicy stuff out there to review. The trick, and the holy grail of search, is this: how do you corral all of the good stuff without any bad stuff mixed in?
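To put numbers on those two definitions: precision is the number of relevant documents retrieved divided by all documents retrieved, and recall is the number of relevant documents retrieved divided by all relevant documents in the corpus. Here is a minimal Python sketch, assuming you already know which documents are relevant (in practice you never fully do, which is the whole problem). The document IDs are made up for illustration.

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Return (precision, recall) for one search result set.

    precision = relevant docs retrieved / all docs retrieved
    recall    = relevant docs retrieved / all relevant docs in the corpus
    Empty sets are scored 0.0 by convention to avoid dividing by zero.
    """
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical corpus in which five documents are truly relevant.
relevant = {"D1", "D2", "D3", "D4", "D5"}

# An overly broad search: catches most of the good stuff, plus plenty of noise.
broad = {"D1", "D2", "D3", "D4", "D6", "D7", "D8", "D9"}
print(precision_recall(broad, relevant))   # (0.5, 0.8)

# A very narrow search: no false positives, but a thin slice of the relevant set.
narrow = {"D1", "D2"}
print(precision_recall(narrow, relevant))  # (1.0, 0.4)
```

The broad search is the low-precision, high-recall case; the narrow one is the high-precision, low-recall case described above.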
It really depends on your review goals. The fallacy in most search efforts is the desire to get only low document counts containing the most relevant material possible. In that case, the emphasis of your review is on precision (perhaps because cost is your primary constraint). If relevant material is rampant within the corpus, however, you will want to increase your recall in order to capture the full scope of your issue. You may tolerate a good number of false positives in order to be as thorough as possible (perhaps because completeness is your primary constraint). You'll want to decide quickly whether recall or precision is the ultimate goal of your review. Of course you'll want both, but once the review has started you'll need to shift your focus to one or the other depending on the incremental results. You'll know quickly (after a day or two) whether your review assignments are yielding the desired level of precision. To test your level of recall, sample the documents that were excluded from review (make sure the sample is large enough to be statistically significant). Once you perform a QC review of this sample set, you'll know whether your search terms were sufficient to capture enough of the relevant material.
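Here's one way to sketch that recall test in Python. It assumes Cochran's sample-size formula for a proportion (95% confidence, plus or minus 5% margin of error, with a finite population correction) and a simple random sample of the excluded documents. The share of the excluded set that turns out to be relevant (sometimes called the elusion rate) is projected onto the whole excluded population to estimate how much relevant material the search terms missed. The counts in the usage lines are hypothetical, and the result is only a point estimate; it does not report a confidence interval.

```python
import math
import random


def sample_size(population: int, z: float = 1.96, margin: float = 0.05, p: float = 0.5) -> int:
    """Cochran's sample size for estimating a proportion, with finite population correction.

    z=1.96 corresponds to roughly 95% confidence; p=0.5 is the most conservative guess.
    """
    n0 = (z ** 2) * p * (1 - p) / margin ** 2
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


def draw_sample(excluded_ids: list[str], n: int, seed: int = 0) -> list[str]:
    """Simple random sample, without replacement, of the documents excluded from review."""
    return random.Random(seed).sample(excluded_ids, min(n, len(excluded_ids)))


def estimate_recall(relevant_found: int, excluded_count: int,
                    sample_reviewed: int, sample_relevant: int) -> float:
    """Point estimate of recall from a QC review of a sample of the excluded set."""
    elusion_rate = sample_relevant / sample_reviewed      # relevant share of the excluded docs
    estimated_missed = elusion_rate * excluded_count      # projected onto the whole excluded set
    return relevant_found / (relevant_found + estimated_missed)


# Hypothetical matter: 100,000 documents were excluded by the search terms.
excluded = [f"DOC{i:06d}" for i in range(100_000)]
n = sample_size(len(excluded))   # 383
qc_set = draw_sample(excluded, n)
# Suppose QC reviewers code 4 of the sampled documents as relevant,
# and the main review found 9,000 relevant documents.
print(round(estimate_recall(9_000, len(excluded), n, 4), 3))  # 0.896
```

If the estimated recall comes back lower than the matter can tolerate, that's your signal to revisit the search terms before the review goes much further.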
As you all know, iterative work like this is commonplace in our business. Unless you have a real sense of the percentage of relevant material to begin with, there's absolutely no way of knowing whether your search results have achieved the highest level of recall and precision until you roll up your sleeves and dig in. If you're trusting a system's artificial intelligence to do this "auto-magically" for you, whether by concept grouping, "learning", or some other newfangled algorithm, then you are putting quite a bit of faith in the technology. Remember that most of this new technology is a carefully guarded trade secret of the software vendor. To prove anything to the court, however, you have to be able to lift the hood and explain what's going on underneath. The only defensible position one can take these days, at least until a technology winner emerges that is universally accepted by the courts, is to present your search terms with hit counts and the corresponding review calls. Keyword and Boolean searches are still the state of the art today.
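A search-term report with hit counts and review calls can be as simple as the following Python sketch. The whole-word, case-insensitive matching is an assumption for illustration (every review platform has its own search syntax), and the documents and coding calls are made up.

```python
import re


def term_report(docs: dict[str, str], terms: list[str],
                coded_relevant: set[str]) -> list[tuple[str, int, int]]:
    """For each search term, return (term, documents hit, hits coded relevant by reviewers)."""
    report = []
    for term in terms:
        # Whole-word, case-insensitive match; an illustrative stand-in for real search syntax.
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        hits = {doc_id for doc_id, text in docs.items() if pattern.search(text)}
        report.append((term, len(hits), len(hits & coded_relevant)))
    return report


docs = {
    "D1": "Backdated options grant approved",
    "D2": "Lunch order for the options team",
    "D3": "Board minutes re: grant date",
    "D4": "Quarterly budget",
}
coded_relevant = {"D1", "D3"}  # hypothetical review calls

for term, hits, relevant in term_report(docs, ["options", "grant date", "backdated"], coded_relevant):
    print(f"{term}: {hits} hit(s), {relevant} coded relevant")
```

Each row gives you, per term, a hit count and a per-term precision (relevant calls divided by hits) that you can actually explain to the court.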
Only the Company Can Know Itself
In the latest law.com article, "Keeping Your Firm's E-Discovery In-House," Dale Buss recognizes that there is strong sentiment in the industry for "legal departments [to] establish as much as possible of the ESI-management function in-house as swiftly as they can [because] only the company over time truly can know itself". Robert Bjornsti, VP of AXA Equitable Life Insurance Co., echoed this sentiment earlier in the year at LegalTech NY, where he delivered the day-two keynote address, "Paradigm Shift -- Corporate Use of Legal Support Services". The argument here is that insourcing e-discovery work not only reduces cost but is also more effective. A corporation can fine-tune its response to a legal hold by tapping into the company's ERP system. Leveraging the HR metadata resident in enterprise databases gives you insight into a custodian's business function, the nature of the data they keep, and the level of privileged and/or confidential information it contains. "That way, when you get a discovery notice, the company can be very precise, not shotgun, about where the right data is." Performing this work behind the corporate firewall also enhances security and control, and it allows corporations to reuse data across concurrent and pending matters within their litigation portfolio.
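As a rough illustration of scoping a legal hold from HR metadata, here's a Python sketch. The `Employee` record, its fields, and the rule that flags Legal department custodians for privilege review are all hypothetical stand-ins for whatever your enterprise databases actually expose.

```python
from dataclasses import dataclass


@dataclass
class Employee:
    """One row from a hypothetical HR extract; field names are assumptions."""
    name: str
    department: str


def legal_hold_scope(employees: list[Employee], matter_departments: set[str],
                     privileged_departments: frozenset[str] = frozenset({"Legal"})
                     ) -> list[tuple[str, bool]]:
    """Return (name, flag_for_privilege_review) for custodians in the matter's departments."""
    return [
        (e.name, e.department in privileged_departments)
        for e in employees
        if e.department in matter_departments
    ]


staff = [
    Employee("A. Chen", "Finance"),
    Employee("B. Ortiz", "Legal"),
    Employee("C. Patel", "Marketing"),
]
print(legal_hold_scope(staff, {"Finance", "Legal"}))
# [('A. Chen', False), ('B. Ortiz', True)]
```

The point is precision at the collection stage: the hold goes to the custodians whose business function ties them to the matter, rather than shotgunning the whole org chart.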
This is no small undertaking. First of all, e-discovery software is mostly proprietary and is geared to reside at the technology vendor's hosting facility. Many of these homegrown solutions were developed by the technology vendors themselves as a secondary offering to their consulting services; the platforms were never designed for general, off-the-shelf deployment within a company's network. Secondly, IT departments aren't equipped to deal with the high-stakes nature of e-discovery work, and their personnel aren't suited to handling attorneys and attorney requests. I used to be an IT guy, and I can tell you that we are bred with a troubleshooting mindset. Everything is up for experimentation and subject to trial and error (we deal primarily with Microsoft tools, after all). That approach simply doesn't work in litigation. If the pendulum truly is swinging back from outsourcing to insourcing, it could come crashing through corporate walls, creating more damage than originally anticipated. For the enterprise that is litigation-savvy and has a penchant for detail, it may very well be worth the effort. The corporation must understand that the effort will require an entirely new business function, one that does not supplant the IT department but works hand-in-hand with it. New (and very large) budgets will need to be allocated for hardware and people. Planning for an in-house staff of e-discovery professionals and a handful of reliable, independent consultants will go a long way toward easing the transition.
Trend towards the Proactive
Many in our industry have predicted a trend towards more proactive e-discovery solutions, and I tend to agree. In its simplest form, this argument is about reducing data volumes and overall costs. Whether this is accomplished through "early case analysis" or better software, the distinguishing feature is where and when one decides to pare down the corpus of data for a particular matter. If you identify the priority custodians and send all of their material en masse to a vendor, you are taking the traditional route and being reactive. If, however, you can pare down the material by priority custodian, date range, and keywords onsite, behind the corporation's firewall, you are definitely being more proactive than most. Now, we all know keywords have limited effectiveness for identifying relevant material, but that's a topic for a whole other discussion. The point is that keyword search terms are still very commonly used in litigation matters, and if you can filter the data ahead of time and send only the resulting material to your vendor, you will reduce your overall cost significantly.
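Onsite culling by priority custodian, date range, and keywords boils down to a filter like the Python sketch below. The `Doc` record, the whole-word keyword matching, and the sample collection are hypothetical; a real collection would be run through a proper processing and indexing tool rather than filtered in memory.

```python
import re
from dataclasses import dataclass
from datetime import date


@dataclass
class Doc:
    """A hypothetical document record; real collections carry far more metadata."""
    doc_id: str
    custodian: str
    sent: date
    text: str


def cull(docs: list[Doc], custodians: set[str], start: date, end: date,
         terms: list[str]) -> list[Doc]:
    """Keep documents from priority custodians, inside the date range, that hit any keyword."""
    patterns = [re.compile(r"\b" + re.escape(t) + r"\b", re.IGNORECASE) for t in terms]
    return [
        d for d in docs
        if d.custodian in custodians
        and start <= d.sent <= end
        and any(p.search(d.text) for p in patterns)
    ]


collection = [
    Doc("D1", "jsmith", date(2006, 3, 1), "Backdated option grants for Q1"),
    Doc("D2", "jsmith", date(2004, 5, 9), "Option grants, old plan"),    # outside date range
    Doc("D3", "mlee", date(2006, 7, 2), "Option pricing memo"),          # not a priority custodian
    Doc("D4", "jsmith", date(2006, 8, 15), "Holiday party RSVP"),        # no keyword hit
]
to_vendor = cull(collection, {"jsmith"}, date(2005, 1, 1), date(2007, 12, 31),
                 ["backdated", "option"])
print([d.doc_id for d in to_vendor], f"{len(to_vendor)} of {len(collection)} documents shipped")
```

Only what survives all three filters leaves the building, which is where the cost savings come from.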
Most attorneys will argue that it is in the client's interest to keep all the data in one location, typically at the technology vendor's data center. That way, when the keyword search terms change (which they will) or the priority custodian list changes (which it will), the changes can be made on the fly in one unified location rather than piecemeal: once at the corporation and again at the vendor after more data has been shipped.
For my next blog entry, I will talk about the latest school of thought: let's keep all the data at the corporation and NEVER send it to a technology vendor!!
This blog is dedicated to the men and women working directly in the trenches on EDD projects (junior attorneys, paralegals, project managers, document reviewers, data processors, and staff consultants alike) who put in countless stressful (and often thankless) hours doing what seems impossible.

- Name: Jerry Bui
- Location: Los Angeles, California, United States
Jerry leads large-scale discovery projects and investigations for government agencies and the country's top law firms. His background is in multi-tiered software architecture, security, data modeling/warehousing, and document analytics. He has been involved in major front-page corporate cases, some of which involve hot-button matters such as anti-money laundering, antitrust, and options backdating.
Disclaimer: Opinions and claims contained herein are those of the author only and are not representative of Jerry's employer, its partners, or any of its member firms.
This blog is intended to impart general information and does not offer specific legal advice. Use of this blog does not create an attorney-client relationship. If you require legal advice, consult an attorney.