JLS' Litigation Services Update
 
  • Hashes and how they apply to E-Discovery

I am going to provide a simple explanation of what hash values are and how they are used for de-duplication in the discovery of electronic documents.  Hash values are also used in computer forensics for verification purposes, but we will speak to that at another time.  A hash value is a short alphanumeric sequence that a mathematical algorithm, called a hash function, calculates from the contents of an electronic file.  The algorithm is designed so that if even a single character or bit of information is removed, added or changed within the file, the result is a completely different hash value.  The odds of two different electronic files sharing the same MD5 hash value by chance are a staggering 1 in 2^128, or 1 in 340,282,366,920,938,463,463,374,607,431,768,211,456.  Other algorithms such as SHA-1 produce longer hash values (160 bits versus 128 bits for MD5), which makes an accidental match even less likely.  By this measure, hash values are often described as more discriminating than DNA or fingerprints, both of which are readily accepted by the courts.
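To make this concrete, here is a minimal sketch using Python's standard hashlib module.  The sample text and variable names are my own assumptions for illustration; they are not taken from any particular e-discovery tool.  It shows that adding a single character changes the hash value entirely, and it reproduces the size of the MD5 value space.

```python
import hashlib

# Two inputs that differ by a single trailing period.
original = b"The quick brown fox jumps over the lazy dog"
altered = b"The quick brown fox jumps over the lazy dog."

md5_original = hashlib.md5(original).hexdigest()
md5_altered = hashlib.md5(altered).hexdigest()
sha1_original = hashlib.sha1(original).hexdigest()

print("MD5 original :", md5_original)
print("MD5 altered  :", md5_altered)
print("SHA-1 original:", sha1_original)

# MD5 values are 128 bits (32 hex characters); SHA-1 values are 160 bits (40).
print(len(md5_original) * 4, "bits vs", len(sha1_original) * 4, "bits")

# The number of possible MD5 values, i.e. the "1 in ..." figure.
md5_space = 2 ** 128
print(f"{md5_space:,}")
```

The two MD5 strings share no obvious resemblance even though the inputs differ by one character, which is the property that makes hash values useful for comparing files.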

So how are hash values used in e-discovery?  We'll break e-discovery up into two sections, the first dealing with electronic documents and the second with e-mail repositories.  As a result of email, electronic documents have become extremely easy to distribute, and in large volumes.  When collecting electronic documents from custodians and backups for discovery purposes, it is very likely that you will get many duplicates.  Comparing the hash values of these documents provides a quick method of de-duplicating them: files with identical content produce identical hash values, regardless of file name or location.
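The document de-duplication described above can be sketched in a few lines of Python.  This is only an illustration under my own assumptions: the function names file_md5 and group_duplicates are hypothetical, and a real collection would also need to record custodian and source information for each copy.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_md5(path, chunk_size=1 << 20):
    """Return the MD5 hash of a file's contents, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_duplicates(root):
    """Map each hash value to the files under root that share it.

    Only hash values shared by two or more files are returned.
    """
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            groups[file_md5(path)].append(path)
    return {h: paths for h, paths in groups.items() if len(paths) > 1}
```

Calling group_duplicates on a collection folder returns one entry per set of identical files, so a reviewer can keep one copy from each set and log the rest as duplicates.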

What about emails?  Well, emails are a completely different animal.  If you tried to de-duplicate complete emails based on their hash values, you would have little luck.  Here is a quick explanation.  If an email is sent to multiple parties, the copies should all be duplicates of each other, but their hash values would not match.  The reason is that the receiving email server for each recipient adds new information to the email, including the time received, which would most likely be different for each recipient.  Accordingly, when hashed, each copy will return a different hash value.
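A small Python sketch illustrates the problem.  The addresses, server names and header values below are invented for the example; the point is only that two copies of the same message, each stamped with a different Received line, hash differently when hashed in full.

```python
import hashlib

# The part of the message that is identical for every recipient.
SHARED = (
    "From: alice@example.com\n"
    "To: bob@example.com, carol@example.com\n"
    "Subject: Draft agreement\n"
    "Date: Mon, 03 Oct 2005 09:15:00 -0400\n"
    "\n"
    "Please review the attached draft.\n"
)

# Each receiving server prepends its own header with its own timestamp.
bob_copy = (
    "Received: from mail.example.com by mx.bob.example; "
    "Mon, 03 Oct 2005 09:15:04 -0400\n" + SHARED
)
carol_copy = (
    "Received: from mail.example.com by mx.carol.example; "
    "Mon, 03 Oct 2005 09:15:09 -0400\n" + SHARED
)


def md5_text(text):
    """Return the MD5 hash of a string, encoded as UTF-8."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


print(md5_text(bob_copy))
print(md5_text(carol_copy))
print(md5_text(bob_copy) == md5_text(carol_copy))
```

The final line prints False: the content a reviewer cares about is the same, but the whole-message hash values do not match.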

So how can you use hash values with emails to assist in email de-duplication?  Most email messages contain a unique identifier called the "Message ID", which is a Globally Unique Identifier (GUID).  Although comparing Message ID values is an accepted method of de-duplicating emails, the value is not mandatory, and not all email servers assign Message IDs to their emails.  As a result, you can end up with a large number of emails to dedupe that have no Message ID.  This is where using hash values becomes useful.
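As a rough sketch of the Message ID approach, the Python below uses the standard email module to read the Message-ID header from raw message text.  The function name split_by_message_id is hypothetical, and the sketch assumes each message is available as a plain string.  Messages that carry no Message ID are set aside rather than guessed at.

```python
from email import message_from_string


def split_by_message_id(raw_messages):
    """Keep one copy per Message-ID and set aside messages that lack one.

    Returns a tuple: (unique messages, messages with no Message-ID).
    """
    unique = {}
    without_id = []
    for raw in raw_messages:
        message_id = message_from_string(raw)["Message-ID"]
        if message_id is None:
            # No identifier to compare, so this needs another method.
            without_id.append(raw)
        else:
            # setdefault keeps the first copy seen for each identifier.
            unique.setdefault(message_id.strip(), raw)
    return list(unique.values()), without_id
```

The second list returned is the pile of emails that cannot be de-duplicated by Message ID and has to be handled by hash value instead.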

A hash algorithm can be run against any combination of electronic values.  In the case of de-duplicating emails, you can generate a hash string from certain portions of the email or a combination of email parts.  The most common would include the Author, the Recipients, the Subject Line, the Content, and the Date and Time Sent.  Creating a hash string of these combined values provides a far more accurate method of de-duplicating emails because the Sender's and the Recipients' copies would have all the same values matching.  Similar techniques are used for identifying near duplicates.
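Here is a minimal sketch of that idea in Python, using the standard email and hashlib modules.  The choice of header names (From, To, Cc, Subject, Date), the whitespace handling and the function name email_dedupe_hash are my own assumptions for illustration; commercial tools make their own choices about which fields to include and how to normalize them.  The sketch also assumes simple single-part text messages.

```python
import hashlib
from email import message_from_string

# Assumed fields: author, recipients, subject line, and date and time sent.
FIELDS = ("From", "To", "Cc", "Subject", "Date")


def email_dedupe_hash(raw):
    """Hash selected fields plus the body, ignoring all other headers."""
    msg = message_from_string(raw)
    if msg.is_multipart():
        raise ValueError("this sketch handles single-part messages only")
    parts = [(msg[name] or "").strip() for name in FIELDS]
    # Collapse whitespace so line-wrapping differences do not matter.
    parts.append(" ".join(msg.get_payload().split()))
    # Join with a separator character that will not occur in the fields.
    combined = "\x1f".join(parts)
    return hashlib.md5(combined.encode("utf-8")).hexdigest()
```

Because headers added in transit, such as Received, are left out of the combined value, the sender's copy and each recipient's copy of the same message produce the same hash string.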

I trust this assists in understanding how hash values are used in e-discovery.  If you would like any more information, please do not hesitate to contact us.

Girts Jansons
Litigation Support Technical Specialist
JLS inc.
girts@jls.ca
Providing Discover-E Services since 1995

Cell: 705-715-6808
Phn: 800-979-9139

If you would like to receive periodic emails from us, please click here and type "Include" in the subject line.



Interesting Articles
 
Sedona Canada E-Discovery Principles 
http://www.lexum.umontreal.ca/e-discovery/SedonaCanadaPrinciples01-08.pdf

Electronic Document Discovery / Litigation Forensics
http://www.jls.ca/press/Elec_Doc_Discovery.pdf

E-Discovery Canada
http://www.lexum.umontreal.ca/e-discovery/

Security as a marketing angle
http://www2.cio.com/research/security/edit/a04112002.html
   
Copyright © 2005 JLS Inc. All Rights Reserved. Web site Design By: ADG Canada