Friday, November 16, 2007
  Database Mitosis
"Uh, your database is too big. We need to split it."

Your organization may have bought into an evidence and discovery management tool because it was advertised as the biggest and fastest database on Earth. Sure, the backend database behind these applications (SQL Server, Oracle, etc.) may have benchmark statistics proving such claims, but what are the practical limits of the software package itself? A severe limitation is the software code that's overlaid on top of the database technology. As a past programmer, I know full well that code can be written efficiently or inefficiently. You can have programming code that cuts to the chase (1+1=2), or code that takes unnecessary leaps of logic (((2 x -10) + 40) / 10 = 2). Strip away the pretty interface and expedience makes all the difference. Let's face it, there are basics that we all need in a tool. If an extra bell or an extra whistle slows things down for you and your review team, it may not be worth the purchase.

Ask a potential vendor the following questions:

1) How much data can we host in your software package? What's the practical upper limit given x number of users?
2) Should we consider multiple databases from the outset, based on the anticipated volume of data?
3) What effect do multiple databases have on deduplication? On multi-user access? On consolidated reporting?


Ask these questions from the get-go, and you won't be confronted with splitting a database after the fact, once the performance of your nifty software package slows to an agonizing crawl. You may experience weeks of downtime before you're up and running again, with multi-database constraints that you weren't even close to understanding beforehand.
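To make the deduplication question concrete, here is a minimal Python sketch of why a split can matter. It assumes documents are compared by a SHA-256 hash of their raw bytes and that each database is just a list of byte strings; both are assumptions for illustration, and real review tools may fingerprint documents quite differently. The `dedupe` function and its `shared_index` flag are hypothetical names, not any vendor's API.

```python
import hashlib

def dedupe(databases, shared_index):
    """Drop duplicate documents from a dict of {database_name: [bytes, ...]}.

    shared_index=True  -> one hash index spans every database.
    shared_index=False -> each database only knows about its own documents.
    """
    seen = set()
    kept = {}
    for name, docs in databases.items():
        if not shared_index:
            seen = set()  # a split database forgets what the others hold
        kept[name] = []
        for doc in docs:
            fingerprint = hashlib.sha256(doc).hexdigest()
            if fingerprint not in seen:
                seen.add(fingerprint)
                kept[name].append(doc)
    return kept
```

With `shared_index=False`, a document that lands in two databases survives in both, so it may be reviewed twice. That is the kind of constraint worth asking the vendor about up front.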
 
Thursday, November 8, 2007
  Waivering, To and Fro
Crafting a list of keywords that will retrieve a maximum number of responsive documents on your matter requires planning and knowledge. Skilled practitioners in our field understand that it requires interviews with relevant custodians (to understand organizational lingo), and a firm understanding of the specific search technology that's employed. We also know that this methodology shouldn't apply only to the keywords within a document, but also to the TO and FROM fields in email metadata. Almost everyone has, at minimum, two email accounts: one for work and one for personal communication. Some of us have more, and I've seen as many as twelve corporate email addresses for the same person at an organization. For example, "customersupport@xyz.com", "marketing@xyz.com", "helpdesk@xyz.com", "accounting@xyz.com", etc. While e-discovery typically targets work and personal email, the list will certainly grow once other types of "e-communication" accounts are brought into the fold, such as Instant Messaging and cellular text messaging accounts.

If you are required to search email communication by one or more individuals and the available custodian information won't suffice, you will need to capture all variations in the TO and FROM fields (and possibly the CC and BCC fields). The format of these fields can vary widely: just the email address (jbui@xyz.com), just the display name (Jerry Bui), or some combination of the two. You might also observe some other formatting wildness, such as the following:

CCMAIL: Jerry T Bui at XYZ_US
MS: XYZ/US/JTBUI
X400:c=US;a=CONCERT;p=XYZ;s=Bui;g=Jerry;i=T;
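As a rough illustration, the formats above can be sorted into buckets with a few regular expressions. This is a Python sketch only: the patterns are inferred from the three sample strings and the plain address and display name forms mentioned earlier, not from any mail system's specification, and `classify_address` and its bucket names are made up for this example.

```python
import re

# Illustrative patterns inferred from the sample values; not a full grammar.
SMTP_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
X400_RE = re.compile(r"^X400:(.*)$", re.IGNORECASE)
CCMAIL_RE = re.compile(r"^CCMAIL:\s*(.+?)\s+at\s+(\S+)$", re.IGNORECASE)
MS_RE = re.compile(r"^MS:\s*(\S+)$", re.IGNORECASE)

def classify_address(raw):
    """Return (kind, details) for one raw TO/FROM value."""
    value = raw.strip()
    m = X400_RE.match(value)
    if m:
        # Semicolon-separated key=value pairs, e.g. s=Bui;g=Jerry
        pairs = [p.split("=", 1) for p in m.group(1).split(";") if "=" in p]
        return "x400", {k.strip().lower(): v.strip() for k, v in pairs}
    m = CCMAIL_RE.match(value)
    if m:
        return "ccmail", {"name": m.group(1), "post_office": m.group(2)}
    m = MS_RE.match(value)
    if m:
        return "ms", {"path": m.group(1).split("/")}
    m = SMTP_RE.search(value)
    if m:
        # Lowercase so JBui@XYZ.com and jbui@xyz.com compare equal.
        return "smtp", {"address": m.group(0).lower()}
    # Nothing recognizable left: treat the whole value as a display name.
    return "display_name", {"name": value}
```

Bucketing a sample of header values this way gives a quick picture of which formats actually occur in your collection before you commit to a search term list.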

If you're looking at personal email accounts, then all bets are off. These tend to look like any of the following:

prettyflower_1963@yahoo.com
ifixmustangs@gmail.com
jb74_forensicexpert@msn.com

In this scenario, searching the TO and FROM fields for elements of the person's name just won't work. Keep in mind, too, that individuals can change their DISPLAY NAME alias numerous times over the course of owning an email account. Realize that you will need to tease this information out during custodian interviews and you will also need to sample the material yourself; look at the email headers and note the variations. You will want to include all variations of a person's name, email address, and display name alias as part of your search term list. Otherwise, any misunderstanding of what's included in the TO and FROM fields could cause you to overlook relevant communication.
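One way to check your term list against a sample is to count the header values that none of your known variations would catch. The Python sketch below assumes simple case-insensitive substring matching, which may not be how your search tool behaves; `unmatched_values` is a hypothetical helper written for this example.

```python
from collections import Counter

def unmatched_values(header_values, variations):
    """Count sampled TO/FROM values that no known variation matches.

    header_values: raw strings pulled from a sample of email headers.
    variations: known names, addresses, and display name aliases.
    """
    needles = [v.strip().lower() for v in variations if v.strip()]
    misses = Counter()
    for raw in header_values:
        haystack = raw.lower()
        if not any(needle in haystack for needle in needles):
            misses[raw.strip()] += 1
    return misses
```

Anything that shows up in the result is a candidate for a follow-up question in the custodian interview or a new entry on the search term list.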

This Blog is dedicated to the men & women working directly in the trenches on EDD projects - junior attorneys, paralegals, project managers, document reviewers, data processors, and staff consultants alike, who put in countless stressful (and often thankless) hours doing what seems to be the impossible.

Name: Jerry Bui
Location: Los Angeles, California, United States

Jerry leads large scale discovery projects and investigations for government agencies and the country's top law firms. His background is in multi-tiered software architecture, security, data modeling/warehousing and document analytics. He has been involved in major front-page corporate cases, some of which involve hot-button matters such as Anti-money Laundering, Antitrust, and Options Back-dating.


Disclaimer: Opinions and claims contained herein are those of the author only and are not representative of Jerry's employer, its partners, or any of its member firms.

This blog is intended to impart general information and does not offer specific legal advice. Use of this blog does not create an attorney-client relationship. If you require legal advice, consult an attorney.
