Database Mitosis
"Uh, your database is too big. We need to split it."
Your organization may have bought into an evidence and discovery management tool because it was advertised as the biggest and fastest database on Earth. Sure, the backend database behind these applications (SQL Server, Oracle, etc.) may have benchmark statistics proving such claims, but what are the practical limits of the software package itself? A severe limitation is the software code that's overlaid on top of the database technology. As a former programmer, I know full well that code can be written efficiently or inefficiently. You can have programming code that cuts to the chase (1+1=2), or code that takes unnecessary leaps of logic (((2 x -10) + 40) / 10 = 2). Strip away the pretty interface and expedience makes all the difference. Let's face it, there are basics that we all need in a tool. If an extra bell or an extra whistle slows things down for you and your review team, it may not be worth the purchase.
Ask a potential vendor the following questions:
1) How much data can we host in your software package? What's the practical upper limit given x number of users?
2) Should we consider multiple databases from the outset, based on the anticipated volume of data?
3) What effect do multiple databases have on deduplication? Multi-user access? Consolidated reporting?
Ask these questions from the get-go, and you won't be confronted with splitting a database after the fact, once the performance of your nifty software package slows to an agonizing crawl. You may experience weeks of downtime before you're up and running again, with multi-database constraints that you weren't even close to understanding beforehand.
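To make the deduplication question concrete, here is a minimal Python sketch of why it matters. It assumes documents are deduplicated by a SHA-256 hash of their raw bytes and that each database only deduplicates within itself; both are assumptions for illustration, since a given product may hash different fields or use a different algorithm. The database names and document IDs are hypothetical.

```python
import hashlib


def content_hash(document_bytes):
    """Fingerprint a document by its raw bytes (assumed dedup key)."""
    return hashlib.sha256(document_bytes).hexdigest()


def dedupe_within(database):
    """Keep the first document ID seen for each content hash in one database."""
    seen = {}
    for doc_id, body in database:
        seen.setdefault(content_hash(body), doc_id)
    return seen


def duplicates_across(databases):
    """Return hashes that survive per-database dedup in more than one database."""
    locations = {}
    for name, database in databases.items():
        for digest in dedupe_within(database):
            locations.setdefault(digest, []).append(name)
    return {d: names for d, names in locations.items() if len(names) > 1}


# Hypothetical split: the same report was loaded into both databases.
databases = {
    "db_a": [("A1", b"quarterly report"), ("A2", b"quarterly report"), ("A3", b"memo")],
    "db_b": [("B1", b"quarterly report"), ("B2", b"invoice")],
}

for digest, names in duplicates_across(databases).items():
    print(digest[:12], "appears in", names)
```

In this sketch each database removes its own duplicates, yet the report still reaches reviewers twice because nothing compares hashes across the split. That gap is what to ask the vendor about.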
Wavering, To and Fro
Crafting a list of keywords that will retrieve a maximum number of responsive documents on your matter requires planning and knowledge. Skilled practitioners in our field understand that it requires interviews with relevant custodians (to understand organizational lingo), and a firm understanding of the specific search technology that's employed. We also know that this methodology shouldn't apply only to the keywords within a document, but also to the TO and FROM fields in email metadata. Almost everyone has, at minimum, two email accounts: one for work and one for personal communication. Some of us have more, and I've seen as many as twelve corporate email addresses for the same person at an organization. For example, "customersupport@xyz.com", "marketing@xyz.com", "helpdesk@xyz.com", "accounting@xyz.com", etc. While e-discovery typically targets work and personal email, this scope will certainly grow once other types of "e-communication" accounts are brought into the fold, such as Instant Messaging and cellular text messaging accounts.
If you are required to search email communication by one or more individuals and the available custodian information won't suffice, you will need to capture all variations in the TO and FROM fields (and possibly the CC and BCC fields). The format of these fields can vary widely: it may include just the email address (jbui@xyz.com), the display name (Jerry Bui), or some combination of the two. You might also observe some other formatting wildness, such as the following:
CCMAIL: Jerry T Bui at XYZ_US
MS: XYZ/US/JTBUI
X400:c=US;a=CONCERT;p=XYZ;s=Bui;g=Jerry;i=T;
If you're looking at personal email accounts, then all bets are off. These tend to look like any of the following:
prettyflower_1963@yahoo.com
ifixmustangs@gmail.com
jb74_forensicexpert@msn.com
In this scenario, searching the TO and FROM fields for elements of the person's name just won't work. Keep in mind, too, that individuals can change their DISPLAY NAME alias numerous times over the course of owning an email account. You will need to tease this information out during custodian interviews, and you will also need to sample the material yourself: look at the email headers and note the variations. You will want to include all variations of a person's name, email address, and display name alias as part of your search term list. Otherwise, any misunderstanding of what's included in the TO and FROM fields could cause you to overlook relevant communication.
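As a rough illustration of the approach described above, the Python sketch below keeps one list of known variations per custodian and checks sampled TO/FROM values against it. The function names and the variant list are hypothetical, and the case-insensitive substring matching is an assumption made for simplicity; the search technology in your review tool may tokenize or match these fields quite differently.

```python
def normalize(value):
    """Lowercase and collapse whitespace so matching ignores case and spacing."""
    return " ".join(value.lower().split())


def build_variants(values):
    """Turn the raw identifiers gathered for one custodian into a searchable set."""
    return {normalize(v) for v in values if v.strip()}


def matches_custodian(header_value, variants):
    """Return True if any known variant appears in a TO/FROM/CC/BCC value."""
    haystack = normalize(header_value)
    return any(variant in haystack for variant in variants)


def unmatched_values(header_values, variants):
    """List header values that hit no variant, as candidates for manual sampling."""
    return [h for h in header_values if not matches_custodian(h, variants)]


# Hypothetical variant list for one custodian, built from interviews and sampling.
jerry = build_variants([
    "jbui@xyz.com",
    "Jerry Bui",
    "Jerry T Bui",
    "XYZ/US/JTBUI",
    "s=Bui;g=Jerry",
    "jb74_forensicexpert@msn.com",
])

sampled_headers = [
    '"Bui, Jerry" <JBUI@XYZ.COM>',
    "CCMAIL: Jerry T Bui at XYZ_US",
    "MS: XYZ/US/JTBUI",
    "X400:c=US;a=CONCERT;p=XYZ;s=Bui;g=Jerry;i=T;",
    "prettyflower_1963@yahoo.com",
]

for header in sampled_headers:
    print(matches_custodian(header, jerry), header)
```

Anything returned by `unmatched_values` is worth a second look during sampling: it either belongs to someone else or is a variation you haven't captured yet. Plain substring matching can also over-match (a short variant may appear inside an unrelated address), so treat hits as leads to verify rather than as proof.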