Bad Characters to Use in Web-based Filenames

My good friend Wade Hilmo recently posted an excellent blog titled "How IIS blocks characters in URLs" that discusses some of the internal workings for how IIS deals with several characters in file names that do not work well in URLs. Wade’s blog does a great job explaining all of the internal IIS URL parsing logic in detail, and his post reminded me of some related notes that I had published on an internal website at Microsoft. As a complement to Wade’s outstanding blog post, I’m reposting my notes in this blog.

Recently a Microsoft Technical Account Manager (TAM) mentioned that he was working on an issue with a customer that was using SharePoint 2007 on IIS 7. The customer's problem was this: his company had several Word documents that were stored in SharePoint that had the plus sign (+) in the filenames, and HTTP requests for these documents were failing. The TAM configured IIS 7's request filtering feature to allow doubly-escaped characters by setting the allowDoubleEscaping attribute to true. This seemed to alleviate the problem, but I had to point out that this probably wasn't the right thing to do. As a general rule, I don't like changing many of the default configuration options for the IIS 7 request filtering feature, because they are designed to keep my servers safe. But in this specific scenario, modifying those settings is simply looking in the wrong place.

Let me explain:

There are several characters that are perfectly valid in a Windows filename that are really bad when you post those files on websites, and either the server or the client could wreak havoc with them. In most scenarios the HTTP requests will receive an HTTP 404 File Not Found error, but in some cases that might cause an HTTP 400 Bad Request error. As such, even though you might find a way to work around the problem, it's a really bad idea to use those characters when you are posting files to a website.

RFC 2396 is the governing document for "Uniform Resource Identifiers (URI): Generic Syntax." This RFC defines what can and can't be used in a URI, as well as what shouldn't be used.

First, section "2.2. Reserved Characters" contains the following list of reserved characters:

reserved = ";" | "/" | "?" | ":" | "@" |
           "&" | "=" | "+" | "$" | ","

Second, section "2.4.3. Excluded US-ASCII Characters" contains the following lists of delimiter and unwise characters:

delims = "<" | ">" | "#" | "%" | <">

unwise = "{" | "}" | "|" | "\" | "^" | "[" | "]" | "`"

Several of the characters in those lists cannot be used in Windows filenames, but the remaining characters should not be used for filenames if you intend to upload those files to a website.

Here are my explanations for why some of those characters will cause problems if you attempt to use them in filenames that you upload to a website:

  • Plus Sign (+) - this character is often translated as a URI-encoded space, so the URI "http://localhost/foo+bar.doc" could be misinterpreted as the URI "http://localhost/foo bar.doc".
  • Percent Sign (%) - this character is used for URI escaping, and I've seen this cause a lot of problems because the two characters that follow the percent sign are assumed to be hex digits for an escaped ASCII code, so the URI "http://localhost/foo%bar.doc" could be misinterpreted as the URI "http://localhost/fooºr.doc".
  • Number/Pound Sign (#) - this character is used to delimit a URI from a fragment identifier (aka bookmarks), so the URI "http://localhost/foo#bar.doc" could be misinterpreted as the URI "http://localhost/foo" with a bookmark of "bar.doc".

So once again, just because you might be able to get this to work on your server doesn't mean that you should be using a character in a web-based filename that's reserved for something else. It's like building an atomic bomb - just because you can doesn't mean that you should. Your best answer in this scenario is to rename your files to something else.

;-]


Note: This blog was originally posted at http://blogs.msdn.com/robert_mcmurray/