Don't wait for extraordinary circumstance to do good; try to use ordinary situations. Charles Richter
I am going to provide a simple explanation of what hash values are and how they
are used for de-duplication in the discovery of electronic documents. Hash values are
also used in computer forensics for verification purposes, but we will speak to
that at another time. A hash value is produced by a mathematical algorithm that, when run
against an electronic file, generates a short, fixed-length alphanumeric sequence calculated
from the contents of that file. The algorithm is designed so that if even a single character
or bit of information is removed, added or changed within the file,
it will produce a completely different hash value. The odds of two different electronic
files accidentally having the same MD5 hash value are a staggering 1 in 340,282,366,920,938,463,463,374,607,431,768,211,456
(2 to the power of 128). Other algorithms, such as SHA-1, produce longer hash strings, which
makes an accidental match even less likely. A chance match of this kind is far less likely
than one between DNA samples or fingerprints, and both of these are
readily accepted by the courts.
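
To make the paragraph above concrete, here is a minimal sketch in Python using the standard `hashlib` module. The helper names `md5_hex` and `sha1_hex` and the sample text are illustrative assumptions, not part of any e-discovery product. It shows that changing a single character produces an entirely different hash value, and where the "1 in 340 undecillion" figure comes from.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 hash of data as a 32-character hexadecimal string."""
    return hashlib.md5(data).hexdigest()


def sha1_hex(data: bytes) -> str:
    """Return the SHA-1 hash of data as a 40-character hexadecimal string."""
    return hashlib.sha1(data).hexdigest()


# Two sample "documents" that differ by a single character (dog vs. cog).
original = b"The quick brown fox jumps over the lazy dog"
altered = b"The quick brown fox jumps over the lazy cog"

print("MD5   original:", md5_hex(original))
print("MD5   altered: ", md5_hex(altered))
print("SHA-1 original:", sha1_hex(original))
print("SHA-1 altered: ", sha1_hex(altered))

# An MD5 hash is 128 bits long, so there are 2**128 possible values.
md5_possible_values = 2 ** 128
print(f"Possible MD5 values: {md5_possible_values:,}")
```

The MD5 strings are 32 hexadecimal characters (128 bits) and the SHA-1 strings are 40 (160 bits), which is why SHA-1 leaves even less room for a chance match.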
So how are hash values used in e-discovery? We'll break e-discovery up into two sections, the first dealing with electronic documents and the second with email repositories.

As a result of email, electronic documents have become extremely easy to distribute, and in large volumes. When collecting electronic documents from custodians and backups for discovery purposes, it is very likely that you will get many duplicates. Comparing the hash values of these documents provides a quick method of de-duplicating these files: documents with identical hash values have identical content.

A hash algorithm can be run against any combination of electronic values. In the case of de-duplicating emails, you can generate a hash string from certain portions of the email or from a combination of email parts. The most common would include the Author, the Recipients, the Subject Line, the Content, and the Date and Time Sent.
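
The two approaches described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the method of any particular e-discovery tool: the function names (`file_hash`, `dedupe_files`, `email_hash`), the choice of MD5, and the normalization rules (lower-casing addresses, sorting recipients, trimming whitespace) are all hypothetical choices made for the example.

```python
import hashlib


def file_hash(path, chunk_size=65536):
    """Hash a file's contents in chunks so large files need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def dedupe_files(paths):
    """Split paths into unique files and (duplicate, first_seen_copy) pairs."""
    first_seen = {}  # hash value -> first path found with that content
    unique, duplicates = [], []
    for path in paths:
        value = file_hash(path)
        if value in first_seen:
            duplicates.append((path, first_seen[value]))
        else:
            first_seen[value] = path
            unique.append(path)
    return unique, duplicates


def email_hash(author, recipients, subject, content, sent):
    """Hash the combined email fields rather than the stored message file.

    Assumed normalization: addresses are lower-cased, recipients are sorted,
    and surrounding whitespace is trimmed. `sent` is assumed to be a timestamp
    string already converted to one consistent format and time zone.
    """
    fields = [
        author.strip().lower(),
        ";".join(sorted(r.strip().lower() for r in recipients)),
        subject.strip(),
        content.strip(),
        sent.strip(),
    ]
    # A separator character keeps field boundaries unambiguous.
    combined = "\x1f".join(fields)
    return hashlib.md5(combined.encode("utf-8")).hexdigest()
```

In use, `dedupe_files` would be given the list of collected file paths, while `email_hash` would be called once per message and the resulting strings compared in the same way.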
Creating a hash string of these combined values provides a far more reliable method of de-duplicating emails, because the Sender's copy and the Recipients' copies of the same message will have matching values in all of these fields, even though the stored message files themselves may differ. Similar techniques are used for identifying near duplicates.

I trust this assists in understanding how hash values are used in e-discovery. If you would like any more information, please do not hesitate to contact us.

Girts Jansons
Litigation Support Technical Specialist
JLS inc.
girts@jls.ca
Providing Discover-E Services since 1995
Cell: 705-715-6808
Phn: 800-979-9139

If you would like to receive periodic emails from us, please click here and type "Include" in the subject line.