Friday, November 16, 2007
  Database Mitosis
"Uh, your database is too big. We need to split it."

Your organization may have bought into an evidence and discovery management tool because it was advertised as the biggest and fastest database on Earth. Sure, the backend databases behind these applications (SQL Server, Oracle, etc.) may have benchmark statistics proving such claims, but what are the practical limits of the software package itself? A severe limitation is the application code that's layered on top of the database technology. As a former programmer, I know full well that code can be written efficiently or inefficiently. You can have programming code that cuts to the chase (1 + 1 = 2), or code that takes unnecessary leaps of logic (((2 × -10) + 40) / 10 = 2). Both arrive at the same answer, but strip away the pretty interface and efficiency makes all the difference. Let's face it, there are basics that we all need in a tool. If an extra bell or an extra whistle slows things down for you and your review team, it may not be worth the purchase.

Ask a potential vendor the following questions:

1) How much data can we host in your software package? What's the practical upper limit given x number of users?
2) Should we consider multiple databases from the outset, based on the anticipated volume of data?
3) What effect do multiple databases have on deduplication? Multi-user access? Consolidated reporting?
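To make question 3 concrete, here is a minimal sketch of exact-duplicate detection by content hash. The document IDs, the texts, and the choice of MD5 are illustrative assumptions, not any vendor's method, and near-duplicate detection works differently and isn't shown. The point it demonstrates: deduplication only compares documents that sit in the same pool, so a database that has been split and deduplicated one piece at a time can miss copies that straddle the split.

```python
import hashlib

# Hypothetical documents: (document ID, extracted text). Real review platforms
# typically hash native files or normalized text; this sketch hashes plain strings.
DB_A = [("A-001", "Q3 forecast attached"), ("A-002", "Lunch on Friday?")]
DB_B = [("B-001", "Q3 forecast attached"), ("B-002", "Board minutes, draft 2")]


def content_hash(text: str) -> str:
    """Exact-duplicate fingerprint of a document's text (MD5 hex digest)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def find_duplicates(*databases):
    """Flag every document whose hash was already seen among the databases given.

    Only documents passed in the same call are compared with each other, which
    is the crux of the problem: deduplicating each half of a split separately
    never compares A-001 with B-001.
    """
    first_seen = {}  # hash -> ID of the first document carrying it
    duplicates = []  # (duplicate ID, original ID) pairs
    for db in databases:
        for doc_id, text in db:
            h = content_hash(text)
            if h in first_seen:
                duplicates.append((doc_id, first_seen[h]))
            else:
                first_seen[h] = doc_id
    return duplicates


if __name__ == "__main__":
    # Deduplicating each half of the split on its own finds nothing.
    print(find_duplicates(DB_A), find_duplicates(DB_B))  # [] []
    # Deduplicating across both halves catches the cross-database copy.
    print(find_duplicates(DB_A, DB_B))  # [('B-001', 'A-001')]
```

If the vendor's answer is that each database keeps its own hash index, expect the first result rather than the second unless they offer some consolidated, cross-database index.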


Ask these questions from the get-go, and you won't be confronted with splitting a database after the fact, once the performance of your nifty software package slows to an agonizing crawl. You may experience weeks of downtime before you're up and running again, saddled with multi-database constraints that you weren't even close to understanding beforehand.
 
Comments:
Love your blog Jerry. I've been managing large electronic reviews for about a year and it's great to read someone who also has that "in the trenches" perspective.

I've dealt with a mid-review database split before, and it wasn't pretty. While it can speed things up for users reviewing documents in each individual database, some of the benefits of a review platform can evaporate after a split.

For example, if you can't search across both databases simultaneously, it takes twice as much time to conduct every single search. Almost every advanced function will also be impacted, such as duplicate and near-duplicate identification and concept-based document grouping.

Since, as you pointed out, the slowdown is coming from the software and not the backend database, splitting the database usually means working in parallel instances of the platform. And that means whoever is running the project might suddenly find themselves with twice as much work.
 
Steve, excellent follow-up!
 
This Blog is dedicated to the men & women working directly in the trenches on EDD projects - junior attorneys, paralegals, project managers, document reviewers, data processors, and staff consultants alike, who put in countless stressful (and often thankless) hours doing what seems to be the impossible.

Name: Jerry Bui
Location: Los Angeles, California, United States

Jerry leads large-scale discovery projects and investigations for government agencies and the country's top law firms. His background is in multi-tiered software architecture, security, data modeling/warehousing, and document analytics. He has been involved in major front-page corporate cases, some of which involve hot-button matters such as anti-money laundering, antitrust, and options backdating.

Disclaimer: Opinions and claims contained herein are those of the author only and are not representative of Jerry's employer, its partners, or any of its member firms.

This blog is intended to impart general information and does not offer specific legal advice. Use of this blog does not create an attorney-client relationship. If you require legal advice, consult an attorney.