| Cyclic Redundancy Code FileCheck | Lab Report |
| Compute CRC-32s for Files, a Directory, or a Volume (Also see CRC Calculator Lab Report for how to compute CRC-16/CRC-32 of a character string) |
Purpose
The purpose of this project is to show how to compute
CRC-32s for one or more files and to form a "metaCRC" based on an ordered
sequence of files for a directory or volume. A "scan" operation saves a file
containing the observed CRC values, so the integrity of the scanned files can
be checked at any later time in a "verify" operation.
Background
For QA/QC purposes, knowing that a file, a directory or
even a volume is exactly the same as another is very useful. After
"burning" several CD-Rs I discovered that some of the files were not being
written reliably. I wanted to find a way to verify whether the CD "burn"
had exactly the same contents as the original disk copy.
For example, I discovered a CD-R with 646 directories, 13,421 files and 509,314,783 bytes had 12 bad files! As long as I could identify which files were "bad," and verify the "bad" files could be safely ignored, I could then "accept" the CD as a valid backup copy even with bad files. This Lab Report is about a FileCheck utility that can be used to automate this verification of CD-Rs or other media.
Materials and Equipment
Software Requirements
Windows 95/98
Delphi 3/4/5 (to recompile)
Hardware Requirements
VGA display
Procedure
Discussion
"Scan" and "Verify" are the two main functions of this program.
A volume, directory or file can be scanned with resulting data written to a File List
(for viewing by a human much like a DOS DIR list) and a Verify File.
An example "Log" for a successful volume scan appears as follows:
| Volume e:\ Directories = 696, Files = 9,225, Bytes = 249,435,921, Meta CRC32 = C42592A7 Scan time = 375.8 sec (7 Sep 1999 21:51:21) |
A File List disk file created during a scan is an expanded version of the information shown by the DOS "DIR" command and is intended to be human readable.
| Sample FileList disk file |
| FileCheck: e:\ 09/07/1999 21:45
Label = EFG-DELL-E
VolSer = D7D4B00A

Date Time Attrib Bytes CRC-32 Filename
---------- -------- ------ ----------- -------- --------
e:
10/03/1997 15:44:30 -D---- bc5
08/28/1999 13:34:12 -D---- bde32sdk
10/03/1997 15:54:20 -D---- bp
10/11/1997 10:14:06 -D---- Comm
10/12/1997 22:56:08 -D---- radiation
07/27/1997 16:35:42 -D-SH- RECYCLED
------------ --------
00000000 0 files

e:\bc5
07/26/1997 19:13:46 A----- 122,337 F0B7FB58 BC5RMV.LOG
10/03/1997 15:44:32 -D---- BGI
10/03/1997 15:44:32 -D---- BIN
03/25/1997 05:02:00 A----- 32,768 A364DCED cgclean.exe
03/25/1997 05:02:00 A----- 9,057 C835AFF0 CGREADME.TXT
03/25/1997 05:02:00 A----- 49,152 E01A87CD cleanini.exe
10/03/1997 15:45:04 -D---- DOC
10/03/1997 15:45:18 -D---- EXAMPLES
10/03/1997 15:48:14 -D---- EXPERT
10/03/1997 15:48:18 -D---- HELP
10/03/1997 15:49:02 -D---- INCLUDE
...

e:\radiation\WIN-BUG
01/11/1992 09:20:10 A----- 9,006 3A88EC5C PEEKPOKE.EXE
01/11/1992 09:20:10 A----- 91 A01D7133 READTHIS
------------ --------
9,097 454BDB4E 2 files

e:\RECYCLED
09/02/1999 17:55:32 A---H- 65 74221298 desktop.ini
------------ --------
65 4A7D57B9 1 files

Summary of e:\
Directories = 696
Files = 9,225
Bytes = 249,435,921
Meta CRC-32 = C42592A7 |
A Verify disk file created during a scan is an ASCII text file. This file is intended for processing by the "Verify" operation, or other computer programs.
| Sample Verify disk file |
| V Label = EFG-DELL-E VolSer = D7D4B00A
P e:
D 0 00000000 e: 0
F 122337 F0B7FB58 e:\bc5\BC5RMV.LOG
F 32768 A364DCED e:\bc5\cgclean.exe
F 9057 C835AFF0 e:\bc5\CGREADME.TXT
F 49152 E01A87CD e:\bc5\cleanini.exe
F 21950 B7A958C4 e:\bc5\INSTALL.TXT
F 9982 E14F0501 e:\bc5\README.TXT
F 57101 F918A9C9 e:\bc5\regsrvr.exe
F 11014 9333C2EF e:\bc5\uninst.ini
F 49152 96AEC4BF e:\bc5\UNPAQ.EXE
F 114688 F6576E76 e:\bc5\unreg.exe
F 21792 0FD64C87 e:\bc5\unreg.ini
D 498993 12F585E9 e:\bc5 11
F 6332 4F4D7A95 e:\bc5\BGI\att.bgi
F 49630 F413139D e:\bc5\BGI\bgidemo.c
F 23016 62949AEE e:\bc5\BGI\bgidemo.ide
F 12208 6D728AAF e:\bc5\BGI\bgiobj.exe
...
F 9006 3A88EC5C e:\radiation\WIN-BUG\PEEKPOKE.EXE
F 91 A01D7133 e:\radiation\WIN-BUG\READTHIS
D 9097 454BDB4E e:\radiation\WIN-BUG 2
F 65 74221298 e:\RECYCLED\desktop.ini
D 65 4A7D57B9 e:\RECYCLED 1
S 249435921 C42592A7 9225 696 |
Editing this file may cause the Verify operation to report erroneous results, but sometimes editing this file is the quickest way to compare files one-by-one that are moved to a new location. Consider a directory scan. The first line of the Verify file, which is normally named Verify.CRC, is a Path:
| P c:\data
F 2975 F0B7FB58 c:\data\set1.dat
... |
If this directory was moved to e:\Monthly\Backup\data, simply edit the Verify.CRC file to replace every occurrence of "c:\data" with the new location "e:\Monthly\Backup\data".
| P e:\Monthly\Backup\data
F 2975 F0B7FB58 e:\Monthly\Backup\data\set1.dat
... |
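The manual edit described above can also be scripted. The following is a minimal sketch in Python rather than the Delphi of FileCheck itself; the function name `relocate_verify_lines` is hypothetical, and it assumes that P, F and D records each hold their path on the same line as the record letter, as the samples suggest.

```python
def relocate_verify_lines(lines, old_root, new_root):
    """Return Verify.CRC lines with old_root replaced by new_root.

    Assumption: only P, F and D records carry a path, and old_root
    cannot occur in the size or CRC fields because it contains a
    drive colon and backslashes.
    """
    relocated = []
    for line in lines:
        if line[:1] in ("P", "F", "D"):
            # Replace only the first occurrence, i.e. the path prefix.
            line = line.replace(old_root, new_root, 1)
        relocated.append(line)
    return relocated


original = [
    "P c:\\data",
    "F 2975 F0B7FB58 c:\\data\\set1.dat",
]
for line in relocate_verify_lines(original, "c:\\data",
                                  "e:\\Monthly\\Backup\\data"):
    print(line)
```

The match is exact and case sensitive, so the old root must be typed as it appears in the file.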
[Future: A possible future Verify option would be to specify a path that would be used instead of the one specified on the "P" line in the Verify.CRC file.]
Look at this page for various I/O errors that can occur while running FileCheck.
At a later time, the information stored in the Verify File can be verified to see that all CRCs match the original values. A Print button allows printing the Scan or Verify operations for documentation purposes.
A volume "scan" is much like the scan of the root directory of a volume, except that the volume label and volume serial number are stored as part of the information about a volume. A volume "scan" always implies that all subdirectories should be scanned. The Subdirs Checkbox allows one to specify whether subdirectories should be scanned in a Directory "scan."
If multiple instances of FileCheck are run, be sure that unique File List and Verify files are specified. If you blank either of the fields for these files, the corresponding file is not created.
The BitBtnScanClick method is called for a "click" on any of the Scan buttons. The Tag value of each button is used to determine whether the scan is for the volume in the TDriveComboBox, the directory in the TDirectoryListBox, or the file in the TFileListBox. A further helper routine, ScanDirectoryTarget, is called for processing a volume or directory scan.
The BitBtnVerifyFileClick method is called for the "verify" operation. Many of the variables used for scanning are replicated within this routine so that (in theory) a scan and a verify could run simultaneously without interfering with each other.
See the CRC Calculator Lab Report for how to compute the CRC-16/CRC-32 of a character string, including source code for a CalcFileCRC32 procedure from the CRC32.PAS unit.
Two versions of CalcFileCRC32 are available. The StreamIO conditional compilation variable selects between I/O using Streams and I/O using the older BlockRead routine. Since I have observed that BlockRead is still faster than Stream.LoadFromFile, the default setting is NoStreamIO.
Here are the two possible ways the CRC32 of a file is computed using the CalcCRC32 procedure:
CalcFileCRC32 using a TMemoryStream |
| // The CRC-32 value calculated here matches the one from the PKZIP
// program.  Use MemoryStream to read file in binary mode.
PROCEDURE CalcFileCRC32 (FromName: STRING; VAR CRCvalue: DWORD;
                         VAR TotalBytes: TInteger8;
                         VAR error: WORD);
  VAR
    Stream: TMemoryStream;
BEGIN
  error := 0;
  CRCValue := $FFFFFFFF;
  Stream := TMemoryStream.Create;
  TRY
    TRY
      Stream.LoadFromFile(FromName);
      IF   Stream.Size > 0
      THEN CalcCRC32 (Stream.Memory, Stream.Size, CRCvalue)
    EXCEPT
      ON E: EReadError DO
        error := 1   // arbitrarily set this for now
    END;

    CRCvalue := NOT CRCvalue;
    TotalBytes := Stream.Size
  FINALLY
    Stream.Free
  END
END {CalcFileCRC32}; |
An error code of 1 is returned from this procedure when an EReadError exception is encountered, since the exception message string did not have any additional useful information. (See the IOResult values below for the BlockRead version.)
CalcFileCRC32 using BlockRead |
| // The CRC-32 value calculated here matches the one from the PKZIP program.
// Use BlockRead to read file in binary mode.
PROCEDURE CalcFileCRC32 (FromName: STRING; VAR CRCvalue: DWORD;
                         VAR TotalBytes: TInteger8;
                         VAR error: WORD);
  CONST
    BufferSize = 32768;
  TYPE
    BufferIndex = 0..BufferSize-1;
    TBuffer     = ARRAY[BufferIndex] OF BYTE;
    pBuffer     = ^TBuffer;
  VAR
    BytesRead: INTEGER;
    FromFile : FILE;
    IOBuffer : pBuffer;
BEGIN
  New(IOBuffer);
  TRY
    FileMode := 0;  {Turbo default is 2 for R/W; 0 is for R/O}
    CRCValue := $FFFFFFFF;
    ASSIGN (FromFile,FromName);
    {$I-} RESET (FromFile,1); {$I+}
    error := IOResult;
    IF   error = 0
    THEN BEGIN
      TotalBytes := 0;
      REPEAT
        {$I-}
        BlockRead (FromFile, IOBuffer^, BufferSize, BytesRead);
        {$I+}
        error := IOResult;
        IF   (error = 0) AND (BytesRead > 0)
        THEN BEGIN
          CalcCRC32 (IOBuffer, BytesRead, CRCvalue);
          TotalBytes := TotalBytes + BytesRead;  // can't use INC with COMP
        END
      UNTIL (BytesRead = 0) OR (error > 0);
      CLOSE (FromFile)
    END;
    CRCvalue := NOT CRCvalue
  FINALLY
    Dispose(IOBuffer)
  END
END {CalcFileCRC32}; |
The most likely error values returned by this routine are as follows:
| Error | Brief Description |
| 30 | ERROR_READ_FAULT occurs when the system cannot read from the specified device. |
| 31 | ERROR_GEN_FAILURE occurs when a device attached to the system is not functioning. |
| 32 | ERROR_SHARING_VIOLATION. The process cannot access the file because it is being used by another process. This is likely to happen if you try to scan the Windows Swap file, e.g., Error Code 32 reading file c:\WINDOWS\WIN386.SWP |
Whenever a read error occurs, an error message is displayed in the log and the CRC is assigned a value of $00000000.
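For cross-checking a file CRC outside of Delphi, the block-by-block approach can be sketched in Python. This is an illustration and not FileCheck code: `calc_file_crc32` is a hypothetical name, and `zlib.crc32` is assumed to be the same PKZIP-compatible CRC-32 that the routines above compute. The error value returned here is the operating system's errno, which need not equal the IOResult codes in the table above.

```python
import zlib

BUFFER_SIZE = 32768  # same block size as the BlockRead version


def calc_file_crc32(path):
    """Return (crc, total_bytes, error) for the file at path.

    error is 0 on success; on failure it is the OSError errno (or 1
    if none is available) and the CRC is reported as 0, following
    the convention that a read error yields $00000000.
    """
    crc = 0
    total_bytes = 0
    try:
        with open(path, "rb") as f:
            while True:
                block = f.read(BUFFER_SIZE)
                if not block:
                    break
                # zlib.crc32 applies the initial value and the final
                # inversion itself; passing the running value chains blocks.
                crc = zlib.crc32(block, crc)
                total_bytes += len(block)
    except OSError as exc:
        return 0, total_bytes, exc.errno or 1
    return crc & 0xFFFFFFFF, total_bytes, 0
```

A call such as `calc_file_crc32("set1.dat")` returns the triple directly; formatting the first element with `"%08X"` gives the hex form used in the File List.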
A CRC-32 value can be computed for each file in a directory. The CRC of an ordered list of files in a directory could be directly computed, but maintaining the information about the computation is somewhat of a pain. So instead of a "true" directory CRC, a "MetaCRC" is computed for a well-ordered list of files in a directory. This MetaCRC is simply a CRC of the file CRCs.
A Directory MetaCRC is a CRC of the file CRCs in a directory, which are processed in alphabetical order. Each of the file CRCs is converted to an 8-byte hex string for computing the Directory MetaCRC. (This facilitates a similar computation on machines of a different endianness. That is, CRCing the list of file hex CRCs will give the same result on either a PC with little-endian words or a UNIX workstation with big-endian words.)
The Volume MetaCRC is a CRC of the Directory MetaCRCs taken in alphabetical order.
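The MetaCRC idea can be sketched in a few lines of Python (a stand-in for the Delphi implementation, with the hypothetical name `meta_crc`). Two details are assumptions not stated in the text: that the hex strings use uppercase digits, and that they are fed through one running CRC, which is equivalent to CRCing their concatenation.

```python
import zlib


def meta_crc(crcs):
    """CRC-32 over the 8-character hex strings of the given CRC values.

    Assumptions: uppercase hex digits, and one running CRC across all
    strings. The caller supplies the CRCs already in sorted order.
    """
    running = 0
    for crc in crcs:
        hex_text = f"{crc:08X}".encode("ascii")  # byte-order independent
        running = zlib.crc32(hex_text, running)
    return running


# File CRCs of e:\radiation\WIN-BUG from the sample File List,
# in alphabetical order by filename.
file_crcs = [0x3A88EC5C, 0xA01D7133]
print(f"{meta_crc(file_crcs):08X}")
```

Because the input to the CRC is ASCII text rather than a 4-byte integer, the result does not depend on the byte order of the machine. The sample listings date from 1999, so the printed value should not be expected to reproduce the MetaCRC shown there.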
[Erratum: In the original version of FileCheck the Directory and Volume MetaCRCs were computed using a statement like this:
CalcCRC32 (@CRCValueHex[1], SizeOf(CRCValueHex), CRCTemp);
Unfortunately, the SizeOf function should have been the string Length function -- SizeOf returned "4" as the length of the string pointer, while Length returned "8", which was the correct number of bytes in a hex character string of a 4-byte integer value. The correction was made in the April 2001 version, labeled Version 1.01. Thanks to Miroslav Vancl for bringing this error to my attention. efg, 1 April 2001.]
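The practical effect of the erratum can be illustrated with a short Python sketch (not FileCheck code). It assumes, as the erratum describes, that the original version passed only the first 4 of the 8 hex characters to the CRC; the second CRC value below is made up for the demonstration.

```python
import zlib


def crc_of_prefix(hex_crc, byte_count):
    """CRC-32 of the first byte_count characters of a hex CRC string."""
    return zlib.crc32(hex_crc[:byte_count].encode("ascii"))


a = "F0B7FB58"  # taken from the sample listing
b = "F0B70000"  # hypothetical value sharing the first four digits

# Original behaviour: only 4 bytes are used, so a and b look identical.
print(crc_of_prefix(a, 4) == crc_of_prefix(b, 4))  # True
# Version 1.01 behaviour: all 8 bytes are used, so they differ.
print(crc_of_prefix(a, 8) == crc_of_prefix(b, 8))  # False
```

In other words, before the correction a MetaCRC could not detect a change confined to the low-order half of a file CRC.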
The FileListLibrary.PAS unit provides a ScanDirectory procedure for a generic way to process a hierarchy of directories and files. Two callback routines are parameters to ScanDirectory to process each file, and to process the beginning and end of a directory. The routines ProcessDirectory and ProcessFile in ScreenFileCheck.PAS are the routines used as parameters to ScanDirectory.
To define a well-ordered list of files in a directory, a third parameter is a routine that is used to compare file names within a directory. The OrderByFilename function in ScreenFileCheck.PAS uses StrIComp to compare filenames in a case insensitive way.
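The callback-driven traversal can be sketched as follows in Python. This is not a translation of FileListLibrary.PAS: the names `scan_directory`, `on_file` and `on_directory` are hypothetical, a sort key stands in for the comparison routine, and `str.lower` only approximates StrIComp. The choice to finish a directory's files before descending into its subdirectories is an assumption based on the record order in the sample Verify file.

```python
import os


def scan_directory(root, on_file, on_directory, sort_key=str.lower):
    """Walk root, visiting entries in a fixed, case-insensitive order.

    on_file(path) is called for each file. on_directory(path, entering)
    is called with entering=True before a directory's files and with
    entering=False after them. Subdirectories are scanned afterwards.
    """
    on_directory(root, True)
    subdirs = []
    for name in sorted(os.listdir(root), key=sort_key):
        path = os.path.join(root, name)
        if os.path.isdir(path):
            subdirs.append(path)
        else:
            on_file(path)
    on_directory(root, False)
    for path in subdirs:
        scan_directory(path, on_file, on_directory, sort_key)


if __name__ == "__main__":
    scan_directory(
        ".",
        on_file=lambda p: print("F", p),
        on_directory=lambda p, entering: print("D", p, "begin" if entering else "end"),
    )
```

Passing the ordering in as a parameter keeps the traversal generic while guaranteeing that a scan and a later verify visit files in the same sequence.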
A global variable in the FileListLibrary unit, ContinueScan, allows an external routine to stop the processing of directories and files (intended to be set by a "Cancel" button).
The Dbt_h.PAS file is a partial translation of DBT.H, which was adapted from "Notification of CD-ROM insertion and removal," http://www.undu.com/Articles/980221b.htm. The WmDeviceChange message is used to detect a change in CD-ROMs so that the drive and directory controls can be refreshed. (Setting a Debug compilation conditional enables additional log comments when this message is received.)
The Refresh button on the Scan TabSheet forces an update of the TDriveComboBox, which may be necessary on some devices that do not generate a WmDeviceChange, such as ZIP drives. Calling the BuildList methods of both the TDriveComboBox and the TDirectoryListBox updates these controls.
Unfortunately, the BuildList methods of both the TDriveComboBox and the TDirectoryListBox are protected methods. Creating new controls derived from these classes is somewhat of a pain just to call the protected BuildList method. To get around this limitation, derived classes were defined:
| type
  // Trick to call protected method of TDriveCombobox
  TMyDriveComboBox = CLASS(TDriveComboBox)
  END;

  // Trick to call protected method of TDirectoryListbox
  TMyDirectoryListBox = CLASS(TDirectoryListbox)
  END; |
These new derived classes were only used to typecast the original values and call the "protected" methods in the WmDeviceChange routine and the following:
| procedure TFormFileList.SpeedButtonRefreshClick(Sender: TObject);
  VAR
    SaveDrive: CHAR;
begin
  SaveDrive := DriveComboBox.Drive;
  TMyDriveComboBox(DriveComboBox).BuildList;
  DriveComboBox.Drive := SaveDrive;
  TMyDirectoryListBox(DirectoryListBox).BuildList;
end; |
Any change in a file will most likely result in a different CRC value. Keeping both the number of bytes and the CRC value the same is an even stricter requirement. The "verify" operation for each file checks that the file's size and CRC-32 are the same. The "verify" operation for a directory checks that the directory has the same number of files, the same number of bytes and the same MetaCRC value. Likewise, a volume match looks for the same number of directories, files and bytes and the same MetaCRC value.
A ScanDetails Radiobox is partially implemented but is hidden in the current implementation. This allows the CRC file to only contain directory information instead of file-by-file details. (The "Scan" functionality of this feature works, but the "Verify" functionality doesn't work correctly when "Directories" is chosen instead of "Files.")
The Verify operation reads a Verify.CRC file created in the Scan phase. The number of lines in this file is used as the measure of progress in the progress bar. A TTokens class is used to parse the tokens in the Verify.CRC file.
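The per-file check can be sketched in Python as follows. This does not reproduce the TTokens class; `parse_file_record` and `file_matches` are hypothetical names, and the record layout ("F", size in decimal, CRC in hex, then the path) is assumed from the sample Verify file.

```python
def parse_file_record(line):
    """Parse an 'F <bytes> <crc> <path>' record.

    Returns (size, crc, path), or None for any other record type.
    Splitting at most three times keeps spaces inside the path intact.
    """
    fields = line.strip().split(None, 3)
    if len(fields) != 4 or fields[0] != "F":
        return None
    return int(fields[1]), int(fields[2], 16), fields[3]


def file_matches(record, observed_size, observed_crc):
    """A file verifies only if both its size and its CRC-32 are unchanged."""
    expected_size, expected_crc, _path = record
    return observed_size == expected_size and observed_crc == expected_crc


record = parse_file_record(
    "F 9006 3A88EC5C e:\\radiation\\WIN-BUG\\PEEKPOKE.EXE")
print(record[2], file_matches(record, 9006, 0x3A88EC5C))
```

Directory (D) and summary (S) records carry extra counts after the path and would need their own parsing, which this sketch leaves out.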
So far, the process of simply attempting to read each file on a CD-R has identified the "bad" files -- files that cannot be opened and read. CRC mismatches have not yet been observed on the same CD-R over time.
One side effect of the process of verifying every byte on a CD was to identify a virus (using McAfee VirusScan) that was stored on several of my CD backups.
Conclusions
FileCheck is a handy utility for verifying a copy of a
file, directory or even a volume (within acceptable probabilities).
Keywords
cyclic redundancy check, CRC-32, Lookup Table, MetaCRC, CalcCRC32, CalcFileCrc32,
Stream I/O, TMemoryStream, BlockRead, WmDeviceChange message, DBT.H,
FindFirst/FindNext/FindClose, TSearchRec, TStringList, Sort, StrIComp, Int64, Comp,
IntToHex, FormatFloat, FormatDateTime, Format, GetVolumeInformation, Volume Serial Number,
Volume Label, TTabSheet, TDriveComboBox, TDirectoryListBox, TFileListBox, procedure
variables, calling protected methods, tokens
Files (only for noncommercial use)
Delphi 3/4/5 Source Code and EXE (195 KB): FileCheck.ZIP
Updated 15 Dec 2002
since 6 Sep 1999