Hashing and Forensic Images
Data Preservation and Image Creation
In forensic investigations, preserving the original evidence (the Original) is paramount: it must remain unaltered, so we never work directly on it.
Copying and pasting files, or zipping them up, does not guarantee that nothing changes. Copying and pasting alters file metadata such as timestamps, and zipping likewise offers no guarantee that the files are left unchanged.
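The point about metadata can be illustrated with a small sketch. It assumes an ordinary file copy made with Python's shutil.copy; the file names and the backdated timestamp are hypothetical and chosen only to make the difference visible.

```python
import hashlib
import os
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's content."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def copy_and_compare(workdir: Path) -> dict:
    """Copy a file the ordinary way and report what stayed the same."""
    original = workdir / "original.txt"
    original.write_bytes(b"evidence")
    # Backdate the original (hypothetical timestamp) so the change is obvious.
    os.utime(original, (1_000_000_000, 1_000_000_000))

    duplicate = workdir / "copy.txt"
    # shutil.copy copies content and permission bits, not timestamps.
    shutil.copy(original, duplicate)

    return {
        "same_content": sha256_of(original) == sha256_of(duplicate),
        "same_mtime": int(original.stat().st_mtime) == int(duplicate.stat().st_mtime),
    }


with tempfile.TemporaryDirectory() as tmp:
    result = copy_and_compare(Path(tmp))
    print(result)  # {'same_content': True, 'same_mtime': False}
```

The content survives the copy, but the modified time does not, which is one reason an ordinary copy is not a forensically sound one.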
There are two primary options for creating a forensically sound copy of digital evidence:
1st Option: Hardware to Hardware (Cloning)
The process is called cloning.
– Clone (target drive/destination drive): A new hard drive, matching the original, that receives an exact copy of the original drive.
– You can technically boot from a clone, but this is not recommended in a forensic workflow because booting writes to the drive and contaminates the evidence.
– Cloning is generally less flexible than imaging for forensic purposes: a clone cannot be segmented, and there is no container in which to store hashes alongside the copy.
2nd Option: Hardware to File (Imaging)
This process creates a forensic image (or “image”).
– The Original is copied to a file.
– The image file has the same content as the original but can be stored anywhere, such as on a hard drive or SSD.
– You cannot boot off this image because it is just a file.
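As a rough illustration of hardware-to-file imaging, the sketch below reads a source in fixed-size chunks, writes each chunk to an image file, and hashes the data as it passes through. The function name acquire_image and its parameters are hypothetical. A real acquisition would read the device through a write blocker with a dedicated imaging tool, so this only shows the idea.

```python
import hashlib
from pathlib import Path


def acquire_image(source: Path, image: Path, chunk_size: int = 1024 * 1024) -> dict:
    """Copy `source` to `image` chunk by chunk, hashing the data on the way.

    `source` stands in for the original media; here it is just a file path.
    """
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    with open(source, "rb") as src, open(image, "wb") as dst:
        while chunk := src.read(chunk_size):
            dst.write(chunk)      # the image receives exactly what was read
            md5.update(chunk)     # both hashes see the same bytes
            sha1.update(chunk)
    return {"md5": md5.hexdigest(), "sha1": sha1.hexdigest()}
```

Hashing during the read means the recorded values describe the data as it came off the source, so the image can be checked against them later.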
Clone vs. Image
– An image is like the DNA of the original drive, while a clone is an exact physical replica of it.
– Images offer more options: we can image a volume, a partition, a specific file, a directory, or the entire drive. Cloning offers no such options because the entire drive must be cloned.
Types of Forensic Images
Logical Image
– Logical image: Selected data or specific data.
– It collects only visible (existing) data, from whichever locations are selected.
– A logical image captures all the “active” data (what you see when browsing the C drive).
– Deleted space, deleted files, and fragments will NOT be captured.
– If a 1 TB drive has 30 GB of active files, the resulting uncompressed image will be about 30 GB.
Physical Image
– Physical image: Will collect everything, including unallocated areas of the drive.
– A physical image captures all of the ones and zeroes contained on the drive.
– It will capture the deleted space (even if the drive has been recently formatted) and deleted files and file fragments.
– If a 1 TB drive is imaged, the resulting image file(s) will be 1 TB, unless compression is used.
Image File Formats
DD Files / Raw Files
– Also called raw files.
– Contains only the data, with no embedded metadata, so acquisition is quicker.
– Can be split into multiple files.
– Has no built-in compression.
– Does not have CRC checking.
– Case data, such as hashes, must be stored separately.
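The raw-format properties above can be sketched as follows: the image is split into numbered segments, and the hash is kept in a separate text file because the format has nowhere to store it. The .001/.002 segment naming and the sidecar file name are assumptions made for illustration, not a requirement of the format.

```python
import hashlib
from pathlib import Path


def split_raw(image: Path, segment_size: int) -> list[Path]:
    """Split a raw image into numbered segments (image.dd.001, image.dd.002, ...)."""
    segments = []
    with open(image, "rb") as src:
        index = 1
        while chunk := src.read(segment_size):
            part = image.with_name(f"{image.name}.{index:03d}")
            part.write_bytes(chunk)
            segments.append(part)
            index += 1
    return segments


def hash_segments(segments: list[Path]) -> str:
    """Hash the segments in order, as if they were one continuous image."""
    digest = hashlib.sha256()
    for part in segments:
        digest.update(part.read_bytes())
    return digest.hexdigest()


def write_sidecar(image: Path, hex_digest: str) -> Path:
    """Store the hash next to the image, since a raw file cannot hold it."""
    sidecar = image.with_name(image.name + ".sha256.txt")
    sidecar.write_text(f"{hex_digest}  {image.name}\n")
    return sidecar
```

Because the segments contain nothing but data, hashing them in order gives the same value as hashing the unsplit image.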
E01/Ex01 (EnCase Evidence File Format)
– The most popular image format and a de facto standard, originating in the 1990s. Most digital forensics work uses this format.
– Often referred to as an “Image file” although technically it’s more of a container or an “evidence file”.
– Primary purpose: Preserve an exact bit-for-bit copy of the target media.
– Contains the bit-for-bit copy plus other information that serves to “bag-and-tag” the evidence file to preserve the chain of custody.
– This “bag-and-tag” information is created automatically and integrated while the evidence file is created, preserving the integrity of the evidence.
Components of E01/Ex01:
1. The Header (A): Contains tombstone information like evidence name, case number, notes, date/time of acquisition, version of the tool used, and the OS. The header is subject to CRC (Cyclic Redundancy Check) and is **always compressed** (even if the data is not).
2. Data Blocks (B): Follow the header.
3. File Integrity Component (CRC and MD5/SHA-1) (C):
– Each compartment has its own integrity seal.
– The header is sealed with its own CRC.
– Each data block is verified with its own CRC.
– The entire data block section is subjected to an MD5/SHA-1 acquisition hash.
– IMPORTANT: The acquisition hash is calculated only on the data; the header and all CRCs are not included.
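The layering of integrity checks described above can be modelled with a toy container. This is not the real E01/Ex01 on-disk layout; the class and function names are hypothetical, and the sketch only mirrors the ideas in the notes: a compressed header with its own CRC, a CRC per data block, and an acquisition hash computed over the data alone.

```python
import hashlib
import zlib
from dataclasses import dataclass


@dataclass
class ToyEvidenceFile:
    header: bytes                     # compressed case information
    header_crc: int                   # CRC sealing the header
    blocks: list[tuple[bytes, int]]   # (data block, CRC of that block)
    acquisition_md5: str              # hash over the data blocks only


def build(case_info: str, data: bytes, block_size: int = 64) -> ToyEvidenceFile:
    header = zlib.compress(case_info.encode("utf-8"))
    md5 = hashlib.md5()
    blocks = []
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        blocks.append((block, zlib.crc32(block)))
        md5.update(block)  # header and CRCs never feed the acquisition hash
    return ToyEvidenceFile(header, zlib.crc32(header), blocks, md5.hexdigest())


def verify(evidence: ToyEvidenceFile) -> bool:
    """Check the header CRC, every block CRC, then the acquisition hash."""
    if zlib.crc32(evidence.header) != evidence.header_crc:
        return False
    md5 = hashlib.md5()
    for block, crc in evidence.blocks:
        if zlib.crc32(block) != crc:
            return False
        md5.update(block)
    return md5.hexdigest() == evidence.acquisition_md5
```

Since only the data feeds the acquisition hash, two toy containers built from the same data with different case information carry the same hash, and that hash equals the MD5 of the data itself.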
Hashing for Integrity and Authentication
What is Hashing?
A hash function is an algorithm used to verify data integrity. It creates a fixed-length message digest from input of any length (for example, 128 bits for MD5, 160 bits for SHA-1, and 256 bits for SHA-256; a CRC-32 checksum is 32 bits). This digest is treated as being as unique to that specific data as a fingerprint is to an individual.
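A short sketch using Python's standard hashlib module shows the fixed-length property: the digest length depends on the algorithm, not on the size of the input. The helper name digest_lengths is hypothetical.

```python
import hashlib


def digest_lengths(data: bytes) -> dict[str, int]:
    """Return the digest length in bits for three common algorithms."""
    return {
        name: hashlib.new(name, data).digest_size * 8
        for name in ("md5", "sha1", "sha256")
    }


print(digest_lengths(b""))                # {'md5': 128, 'sha1': 160, 'sha256': 256}
print(digest_lengths(b"x" * 1_000_000))   # {'md5': 128, 'sha1': 160, 'sha256': 256}

# A one-character change in the input produces a different digest.
print(hashlib.md5(b"evidence").hexdigest() == hashlib.md5(b"Evidence").hexdigest())  # False
```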
Why We Hash Forensic Images
– We hash a forensic image right after we create it to record a value that is unique to its content.
– We hash it again whenever the image is later used or retrieved and confirm that the two hashes match. A match shows that the evidence’s authenticity and integrity are intact.
– Matching hashes make it possible to rule out:
– Tampering
– Contamination
– Undetected errors
– Suspicion of third-party intervention
– This is also connected to the Canada Evidence Act, under which the best evidence rule for electronic documents is satisfied on proof of the integrity of the electronic documents system.
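The hash-now, verify-later routine described above might be sketched like this. The function names are hypothetical, and SHA-256 is an assumed default; the notes themselves mention MD5 and SHA-1 for acquisition hashes.

```python
import hashlib
from pathlib import Path


def hash_file(path: Path, algorithm: str = "sha256", chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large images do not have to fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        while chunk := handle.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()


def verify_image(path: Path, recorded_digest: str, algorithm: str = "sha256") -> bool:
    """Return True if the image still matches the hash recorded at acquisition."""
    return hash_file(path, algorithm) == recorded_digest.strip().lower()
```

The recorded digest would be taken once, right after imaging, and kept with the case notes; verify_image is then run each time the image is pulled for use.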
Hash Sets
– A Hash Set or Hash Library is a database of known hash values.
– We would use hash sets mainly for two reasons: removing known good files or quickly finding known bad files.
Removing Known Good Files Example:
– An investigator filters out known legitimate operating system and application files from a forensic image. This allows the investigator to focus on unknown or suspicious data. The NIST National Software Reference Library (NSRL) stores one of the most exhaustive hash libraries for this purpose.
Quickly Finding Known Bad Files Example:
– An investigator quickly locates known illicit or malicious files, such as child sexual abuse material (CSAM) or malware. Hash sets that identify such files are available from organizations such as Project VIC International. Forensic tools compare the file hashes in the case against the library and flag any matches.
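Both uses of a hash set come down to set membership tests. The sketch below assumes the hash sets have already been loaded into Python sets of lowercase hex strings; it does not parse the actual NSRL or Project VIC file formats, and the function name triage is hypothetical.

```python
def triage(
    file_hashes: dict[str, str],
    known_good: set[str],
    known_bad: set[str],
) -> dict[str, list[str]]:
    """Sort files into known bad, known good, and unknown by hash lookup.

    `file_hashes` maps a file name to its hex digest.
    """
    result: dict[str, list[str]] = {"known_bad": [], "known_good": [], "unknown": []}
    for name, digest in sorted(file_hashes.items()):
        digest = digest.lower()
        if digest in known_bad:
            result["known_bad"].append(name)    # flag for the investigator
        elif digest in known_good:
            result["known_good"].append(name)   # filter out of the review
        else:
            result["unknown"].append(name)      # still needs examination
    return result
```

Checking the known-bad set first is a deliberate choice in this sketch: if a hash ever appeared in both sets, the file would still be flagged rather than filtered out.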
Hash Collision
– Hash Collision: Occurs when two different input messages produce the same hash value.
– The probability of an accidental hash collision is very low. However, deliberate collisions have been demonstrated for both MD5 and SHA-1, so algorithms such as SHA-256 are considered more secure against collisions.
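To make the idea of a collision concrete, the sketch below shortens SHA-256 digests to two bytes (16 bits) so that a collision can be found by brute force in moments. The truncation is purely an assumption for demonstration: it shows why short digests collide easily, and says nothing about finding collisions in a full-length hash.

```python
import hashlib
from itertools import count


def truncated_digest(data: bytes, n_bytes: int) -> bytes:
    """Return only the first n_bytes of the SHA-256 digest."""
    return hashlib.sha256(data).digest()[:n_bytes]


def find_collision(n_bytes: int = 2) -> tuple[bytes, bytes]:
    """Find two different messages whose truncated digests are equal."""
    seen: dict[bytes, bytes] = {}
    for i in count():
        message = str(i).encode()
        short = truncated_digest(message, n_bytes)
        if short in seen:
            return seen[short], message   # different inputs, same short digest
        seen[short] = message


first, second = find_collision()
print(first, second, truncated_digest(first, 2).hex())
```

With only 65,536 possible 16-bit values a repeat is guaranteed within 65,537 messages, while the full 256-bit digests of the same two messages still differ.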
Forensic Tools and Processes
Forensic Hardware
Hardware Imagers
– Hardware imaging appliances are usually faster and have better error handling abilities and more functionality than software-based imaging.
– They support many operations, such as cloning, physical imaging, logical imaging, hashing drives, and even wiping drives.
– On a typical unit, the source drive plugs into the left side and the target drive into the right.
– Examples: Ditto DX, Tableau.
Advanced Disk Imagers
– These tools require advanced training.
– They deal with failed firmware and bad sectors and are more expensive than regular hardware imagers.
– Example: DeepSpar.
HD Write Blockers
– HD Write Blockers prevent any writes from reaching the drive. Without a write blocker, you could contaminate the drive.
– Testing and Maintenance: Write blockers need to be tested regularly.
– Test them before every activity and at least once every quarter to ensure they work.
– Testing is done with two methods: 1. Hash a drive with a known hash value and 2. Software testing.
– You should regularly check for firmware updates for your forensic hardware.
– Examples: Tableau, WiebeTech.
Dongles
– Forensic software often uses a dongle to control access; the software will not work without the dongle in the PC.
– Some labs use a dongle server to connect remotely to a physical dongle as if it were local, or for network licensing.
Media Sanitization
Media sanitization is necessary to:
– Prevent sensitive material from being recovered from media.
– Prevent the possibility of any preexisting data from contaminating evidence placed on the host media (known as Cross Contamination).
Process and Rules
– All data remains intact until it is overwritten or physically destroyed. Normal disk sanitization involves writing a known character (e.g., 00h) over the entire disk.
– For overwriting in Canada, the legacy standard was 3 passes, and in the United States, the legacy standard was 7 passes.
– NOTE: Modern standards, such as NIST SP 800-88r1, recommend one pass for magnetic media sanitization. Multiple passes are no longer required for simple overwrite.
– You can use tools like WinHex.
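The overwrite-and-verify idea can be sketched on an ordinary file. This is only an illustration of a single pass with a known character: overwriting a file through the file system does not sanitize the underlying media, and a real wipe is done against the whole device with a dedicated tool. The function names are hypothetical.

```python
import os
from pathlib import Path


def wipe_file(path: Path, fill: int = 0x00, chunk_size: int = 1024 * 1024) -> None:
    """Overwrite every byte of the file once with the fill value."""
    remaining = path.stat().st_size
    pattern = bytes([fill]) * chunk_size
    with open(path, "r+b") as handle:
        while remaining > 0:
            step = min(chunk_size, remaining)
            handle.write(pattern[:step])
            remaining -= step
        handle.flush()
        os.fsync(handle.fileno())  # ask the OS to push the writes to storage


def is_wiped(path: Path, fill: int = 0x00, chunk_size: int = 1024 * 1024) -> bool:
    """Verify that the file contains nothing but the fill value."""
    with open(path, "rb") as handle:
        while chunk := handle.read(chunk_size):
            if chunk.count(fill) != len(chunk):
                return False
    return True
```

The verification step matters as much as the overwrite: reading the media back confirms that no preexisting data is left to cross-contaminate evidence later stored on it.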
Post-Sanitization
– Once sanitized, the media must be clearly labeled with:
– “sanitized” or “wiped”.
– The date and time of sanitization.
– The initials and employee number of the person performing the sanitization.
– Sanitized media shall be stored in a sealed anti-static bag, clearly identified as sanitized.
Removable Media
– Removable media used by investigators in digital cameras shall be reformatted prior to each search.
Encryption
– Encryption enhances the security of the data or file by scrambling the content. It is the same concept as sending secret code messages.
– You need to know the “key” to be able to decrypt the message.
– People encrypt their data to protect:
– Corporate information
– Personal and private content
– Intellectual property from theft
– VeraCrypt is a very popular tool for encryption.
– You can use the Encrypted Disk Detector (EDD) to check a system for encrypted volumes.