  • Hashes and how they apply to E-Discovery
I am going to provide a simple explanation of what hash values are and how they are used for de-duplication in the discovery of electronic documents. Hash values are also used in computer forensics for verification purposes, but we will speak to that at another time. A hash algorithm is a mathematical function that, when run against an electronic file, generates a short, fixed-length alphanumeric sequence (the hash value) calculated from the file's contents. The algorithm is designed so that if even a single character or bit of information is removed, added or changed within the file, the result is a completely different hash value. An MD5 hash value is 128 bits long, so the odds of two different electronic files having the same MD5 hash value by chance are a staggering 1 in 340,282,366,920,938,463,463,374,607,431,768,211,456 (2 to the power of 128). Other algorithms such as SHA-1 produce longer hash values (160 bits), which makes an accidental match even less likely. By this measure, hash values identify a file more reliably than DNA or fingerprints identify a person, and both of these are readily accepted by the courts.
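To make this concrete, here is a minimal sketch in Python using the standard hashlib module. The helper name hash_bytes and the sample text are my own choices for illustration, not part of any particular e-discovery tool. It shows that adding a single period to the text produces an entirely different MD5 value, and it prints the number of possible MD5 values.

```python
import hashlib

def hash_bytes(data: bytes, algorithm: str = "md5") -> str:
    """Return the hexadecimal hash value of the given bytes."""
    return hashlib.new(algorithm, data).hexdigest()

# Two inputs that differ by a single trailing period (illustrative text only).
original = b"The quick brown fox jumps over the lazy dog"
altered = b"The quick brown fox jumps over the lazy dog."

md5_original = hash_bytes(original)
md5_altered = hash_bytes(altered)

print("MD5 original:", md5_original)
print("MD5 altered: ", md5_altered)
print("Same value?  ", md5_original == md5_altered)

# SHA-1 produces a longer value (40 hex characters against 32 for MD5).
print("SHA-1 original:", hash_bytes(original, "sha1"))

# An MD5 value is 128 bits long, so there are 2**128 possible values.
print("Possible MD5 values:", 2 ** 128)
```

In practice the bytes would come from a file on disk rather than from a literal string, but the principle is the same.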

So how are hash values used in e-discovery? We'll break e-discovery up into two sections, the first dealing with electronic documents and the second with email repositories. As a result of email, electronic documents have become extremely easy to distribute, and in large volumes. When collecting electronic documents from custodians and backups for discovery purposes, it is very likely that you will get many duplicates. Comparing the hash values of these documents is a quick method of de-duplicating them: two files with the same hash value have identical contents, even if their file names or locations differ.
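The comparison described above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function names file_hash and deduplicate are hypothetical, MD5 is used because it is the algorithm discussed here, and the first copy encountered is treated as the one to keep.

```python
import hashlib

def file_hash(path, algorithm="md5", chunk_size=1024 * 1024):
    """Hash a file's contents, reading in chunks so large files fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def deduplicate(paths):
    """Split paths into unique files and duplicates.

    Returns (unique, duplicates), where unique lists the first file seen
    for each hash value and duplicates maps each later copy to the file
    it duplicates.
    """
    first_seen = {}   # hash value -> first path with that value
    duplicates = {}   # duplicate path -> path of the copy that was kept
    for path in paths:
        value = file_hash(path)
        if value in first_seen:
            duplicates[path] = first_seen[value]
        else:
            first_seen[value] = path
    return list(first_seen.values()), duplicates
```

Calling deduplicate on the list of collected file paths returns the files to review along with a record of which copies were set aside and why, which is useful to keep for reporting purposes.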

A hash algorithm can be run against any combination of electronic values. In the case of de-duplicating emails, you can generate a hash string from certain portions of the email or from a combination of email parts. The most common would include the Author, the Recipients, the Subject Line, the Content, and the Date and Time Sent. Creating a hash string of these combined values is a far more accurate method of de-duplicating emails than hashing the stored message files, because the Sender's copy and the Recipients' copies all carry the same values in these fields. Similar techniques are used for identifying near duplicates.
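As a rough sketch of the email approach, the Python below combines the five fields listed above into one string and hashes it. The function name email_hash, the way the fields are tidied up (trimming spaces, lower-casing addresses, sorting recipients), the separator character and the sample messages are all my own assumptions for illustration. Real processing tools make their own choices about which fields to include and how to normalise them.

```python
import hashlib

def email_hash(author, recipients, subject, body, sent):
    """Return an MD5 value computed from the key fields of an email.

    Assumed normalisation: surrounding whitespace is trimmed, addresses
    are lower-cased, and recipients are sorted so their order is ignored.
    """
    parts = [
        author.strip().lower(),
        ";".join(sorted(address.strip().lower() for address in recipients)),
        subject.strip(),
        body.strip(),
        sent.strip(),  # assumed to be a consistently formatted timestamp string
    ]
    # Join with a control character that should not occur inside the fields,
    # so that the boundaries between fields stay unambiguous.
    combined = "\x1f".join(parts)
    return hashlib.md5(combined.encode("utf-8")).hexdigest()

# The sender's copy and a recipient's copy of the same (made-up) message.
sender_copy = email_hash(
    "Alice@Example.com",
    ["bob@example.com", "carol@example.com"],
    "Q3 report",
    "Please see attached.",
    "2009-03-02T14:05:00Z",
)
recipient_copy = email_hash(
    " alice@example.com",
    ["carol@example.com", "Bob@example.com "],
    "Q3 report",
    "Please see attached.",
    "2009-03-02T14:05:00Z",
)
print(sender_copy == recipient_copy)
```

The two copies produce the same hash value even though the stored messages differ in small ways, which is exactly what allows them to be treated as duplicates.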

I trust this assists in understanding how hash values are used in e-discovery. If you would like any more information, please do not hesitate to contact us.

Girts Jansons
Litigation Support Technical Specialist
JLS inc.
girts@jls.ca
Providing Discover-E Services since 1995

Cell: 705-715-6808
Phn: 800-979-9139

If you would like to receive periodic emails from us, please click here and type "Include" in the subject line.

For More Information, please contact:
Pati Jansons
Client Services
pati@jls.ca
JLS inc.
1-800-979-9139
705-737-1832
www.jls.ca