We recently ran into a situation where a customer thought that they were seeing file corruption when they were transferring files from a Windows 7 client to their IIS 6.0 server using WebDAV. More specifically, the file sizes were increasing for several specific file types, and for obvious reasons the checksums for these files would not match for verification. Needless to say this situation caused a great deal of alarm on the WebDAV team when we heard about it - file corruption issues are simply unacceptable.
To alleviate any fears, I should tell you right up front that no corruption was actually taking place, and the increase in file size was easily explained once we discovered what was really going on. All of that being said, I thought that a detailed explanation of the scenario would make a great blog entry in case anyone else runs into the situation.
The Customer Scenario
First of all, the customer was copying installation files using a batch file over WebDAV; more specifically the batch file was copying a collection of MSI and MST files. After the batch file copied the files to the destination server it would call the command-line comp utility to compare the files. Each MSI and MST file that was copied would increase by a small number of bytes so the comparison would fail. The customer computed checksums for the files to troubleshoot the issue and found that the checksums for the files on the source and destination did not match. Armed with this knowledge the customer contacted Microsoft for support, and eventually I got involved and I explained what the situation was.
The Architecture Explanation
Windows has a type of file format called a Compound Document, and many Windows applications make use of this file format. For example, several Microsoft Office file formats prior to Office 2007 used a compound document format to store information.
A compound document file is somewhat analogous to a file-based database, or in some situations like a mini file system that is hosted inside another file system. In the case of an MST or MSI file these are both true: MST and MSI files store information in various database-style tables with rows and columns, and they also store files for installation.
With that in mind, here's a behind-the-scenes view of WebDAV in IIS 6.0:
The WebDAV protocol extension allows you to store information in "properties", and copying files over the WebDAV redirector stores several properties about a file when it sends the file to the server. If you were to examine a protocol trace for the WebDAV traffic between a Windows 7 client and an IIS server, you will see the PUT command for the document followed by several PROPPATCH commands for the properties.
IIS needs a way to store the properties for a file in a way where they will remain associated with the file in question, so the big question is - where do you store properties?
In IIS 7 we have a simple property provider that stores the properties in a file named "properties.dav," but for IIS 5.0 and IIS 6.0 WebDAV code we chose to write the properties in the compound document file format because there are lots of APIs for doing so. Here's the way that it works in IIS 5 and IIS 6.0:
- If the file is already in the compound document file format, IIS simply adds the WebDAV properties to the existing file. This data will not be used by the application that created the file - it will only be used by WebDAV. This is exactly what the customer was seeing - the file size was increasing because the WebDAV properties were being added to the compound document.
- For other files, WebDAV stores a compound document in an NTFS alternate data stream that is attached to the file. You will never see this additional data from any directory listing, and the file size doesn't change because it's in an alternate data stream.
So believe it or not, no harm is done by modifying a compound document file to store the WebDAV properties. Each application that wants to pull information from a compound document file simply asks for the data that it wants, so adding additional data to a compound document file in this scenario was essentially expected behavior. I know that this may seem counter-intuitive, but it's actually by design. ;-]
The Resolution
Once I was able to explain what was actually taking place, the customer was able to verify that their MST and MSI files still worked exactly as expected. Once again, no harm was done by adding the WebDAV properties to the compound document files.
You needn't take my word for this, you can easily verify this yourself. Here's a simple test: Word 2003 documents (*.DOC not *.DOCX) are in the compound document file format. So if you were to create a Word 2003 document and then copy that document to an IIS 6.0 server over WebDAV, you'll notice that the file size increases by several bytes. That being said, if you open the document in Word, you will see no corruption - the file contains the same data that you had originally entered.
I hope this helps. ;-]