I had another interesting situation present itself recently that I thought would make a good blog post: how to use Classic ASP with the IIS URL Rewrite module to dynamically generate Robots.txt and Sitemap.xml files.
Overview
Here's the situation: I host a website for one of my family members, and like everyone else on the Internet, he wanted better SEO rankings. We discussed a few things that he could do to improve his visibility with search engines, and one of my suggestions was to keep his Robots.txt and Sitemap.xml files up-to-date. But there was a complication: he uses two separate DNS names for the same website, and that presents a problem for absolute URLs in either of those files, because a single static file would hard-code one of the two host names. Before anyone points out that it's usually not a good idea to host multiple DNS names on the same content, there are times when this is acceptable; for example, if you are trying to decide which of several DNS names is the best to use, you might want to bind each name to the same IP address and parse your logs to find out which name is getting the most traffic.
In any event, the syntax for both Robots.txt and Sitemap.xml files is pretty easy, so I wrote a couple of simple Classic ASP pages, Robots.asp and Sitemap.asp, that output the correct syntax and DNS-specific URLs for each domain name. I then wrote some simple URL Rewrite rules that rewrite inbound requests for Robots.txt and Sitemap.xml to the ASP pages, while blocking direct access to the Classic ASP pages themselves.
All of that being said, there are a couple of quick things that I would like to mention before I get to the code:
- First of all, I chose Classic ASP for the files because it allows the code to run without having to load any additional framework; I could have used ASP.NET or PHP just as easily, but either of those would add overhead that isn't really needed for something this small.
- Second, the website for which I wrote these examples consists entirely of static content that is updated a few times a month, so I wrote the example to parse the physical directory structure for the website's URLs and specified a weekly interval for search engines to revisit the website. All of these options can easily be changed; for example, I reused this code a little while later for a website where all of the content was created dynamically from a database, and I updated the code in the Sitemap.asp file to create the URLs from the dynamically-generated content. (That's really easy to do, but outside the scope of this blog.)
That being said, let's move on to the actual code.
Creating the Required Files
There are three files that you will need to create for this example:
- A Robots.asp file to which URL Rewrite will send requests for Robots.txt
- A Sitemap.asp file to which URL Rewrite will send requests for Sitemap.xml
- A Web.config file that contains the URL Rewrite rules
Step 1 - Creating the Robots.asp File
You need to save the following code sample as Robots.asp in the root of your website; this page will be executed whenever someone requests the Robots.txt file for your website. This example is very simple: it checks for the requested hostname and uses that to dynamically create the absolute URL for the website's Sitemap.xml file.
<%
Option Explicit
On Error Resume Next
Dim strUrlRoot
Dim strHttpHost
Dim strUserAgent
Response.Clear
Response.Buffer = True
Response.ContentType = "text/plain"
Response.CacheControl = "public"
Response.Write "# Robots.txt" & vbCrLf
Response.Write "# For more information on this file see:" & vbCrLf
Response.Write "# http://www.robotstxt.org/" & vbCrLf & vbCrLf
strHttpHost = LCase(Request.ServerVariables("HTTP_HOST"))
strUserAgent = LCase(Request.ServerVariables("HTTP_USER_AGENT"))
strUrlRoot = "http://" & strHttpHost
Response.Write "# Define the sitemap path" & vbCrLf
Response.Write "Sitemap: " & strUrlRoot & "/sitemap.xml" & vbCrLf & vbCrLf
Response.Write "# Make changes for all web spiders" & vbCrLf
Response.Write "User-agent: *" & vbCrLf
Response.Write "Allow: /" & vbCrLf
Response.Write "Disallow: " & vbCrLf
Response.End
%>
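To make the output easier to picture, here is a minimal sketch in Python (not part of the ASP solution itself) that mirrors the string-building logic of the page above. The function name build_robots_txt and the two host names are hypothetical; the only point is to show that each DNS name gets back a Sitemap line that points at itself.

```python
def build_robots_txt(http_host: str) -> str:
    """Mirror the Robots.asp logic: build the file body for one host name."""
    # Same normalization as the ASP page: lower-case the Host header value.
    url_root = "http://" + http_host.lower()
    lines = [
        "# Robots.txt",
        "# For more information on this file see:",
        "# http://www.robotstxt.org/",
        "",
        "# Define the sitemap path",
        "Sitemap: " + url_root + "/sitemap.xml",
        "",
        "# Make changes for all web spiders",
        "User-agent: *",
        "Allow: /",
        "Disallow: ",
    ]
    # The ASP page terminates every line with CRLF.
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    # Two hypothetical DNS names bound to the same site.
    for host in ("www.example.com", "WWW.Example.NET"):
        print(build_robots_txt(host))
```

If the site were served over HTTPS, the hard-coded "http://" prefix would need to change in both the sketch and the ASP page.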
Step 2 - Creating the Sitemap.asp File
The following example file is also pretty simple, and you would save this code as Sitemap.asp in the root of your website. There is a section in the code where it loops through the file system looking for files with the *.html file extension and only creates URLs for those files. If you want other files included in your results, or you want to change the code from static to dynamic content, this is where you would need to update the file accordingly.
<%
Option Explicit
On Error Resume Next
Response.Clear
Response.Buffer = True
Response.AddHeader "Connection", "Keep-Alive"
Response.CacheControl = "public"
Dim strFolderArray, lngFolderArray
Dim strUrlRoot, strPhysicalRoot, strFormat
Dim strUrlRelative, strExt
Dim objFSO, objFolder, objFile
strPhysicalRoot = Server.MapPath("/")
Set objFSO = Server.CreateObject("Scripting.Filesystemobject")
strUrlRoot = "http://" & Request.ServerVariables("HTTP_HOST")
' Check for XML or TXT format.
If UCase(Trim(Request("format")))="XML" Then
strFormat = "XML"
Response.ContentType = "text/xml"
Else
strFormat = "TXT"
Response.ContentType = "text/plain"
End If
' Add the UTF-8 Byte Order Mark.
Response.Write Chr(CByte("&hEF"))
Response.Write Chr(CByte("&hBB"))
Response.Write Chr(CByte("&hBF"))
If strFormat = "XML" Then
Response.Write "<?xml version=""1.0"" encoding=""UTF-8""?>" & vbCrLf
Response.Write "<urlset xmlns=""http://www.sitemaps.org/schemas/sitemap/0.9"">" & vbCrLf
End if
' Always output the root of the website.
Call WriteUrl(strUrlRoot,Now,"weekly",strFormat)
' --------------------------------------------------
' The following section contains the logic to parse
' the directory tree and return URLs based on the
' static *.html files that it locates. This is where
' you would change the code for dynamic content.
' --------------------------------------------------
strFolderArray = GetFolderTree(strPhysicalRoot)
For lngFolderArray = 1 to UBound(strFolderArray)
strUrlRelative = Replace(Mid(strFolderArray(lngFolderArray),Len(strPhysicalRoot)+1),"\","/")
Set objFolder = objFSO.GetFolder(Server.MapPath("." & strUrlRelative))
For Each objFile in objFolder.Files
strExt = objFSO.GetExtensionName(objFile.Name)
If StrComp(strExt,"html",vbTextCompare)=0 Then
If StrComp(Left(objFile.Name,6),"google",vbTextCompare)<>0 Then
Call WriteUrl(strUrlRoot & strUrlRelative & "/" & objFile.Name, objFile.DateLastModified, "weekly", strFormat)
End If
End If
Next
Next
' --------------------------------------------------
' End of file system loop.
' --------------------------------------------------
If strFormat = "XML" Then
Response.Write "</urlset>"
End If
Response.End
' ======================================================================
'
' Outputs a sitemap URL to the client in XML or TXT format.
'
' tmpStrFreq = always|hourly|daily|weekly|monthly|yearly|never
' tmpStrFormat = TXT|XML
'
' ======================================================================
Sub WriteUrl(tmpStrUrl,tmpLastModified,tmpStrFreq,tmpStrFormat)
On Error Resume Next
Dim tmpDate : tmpDate = CDate(tmpLastModified)
' Check if the request is for XML or TXT and return the appropriate syntax.
If tmpStrFormat = "XML" Then
Response.Write " <url>" & vbCrLf
Response.Write " <loc>" & Server.HtmlEncode(tmpStrUrl) & "</loc>" & vbCrLf
Response.Write " <lastmod>" & Year(tmpDate) & "-" & Right("0" & Month(tmpDate),2) & "-" & Right("0" & Day(tmpDate),2) & "</lastmod>" & vbCrLf
Response.Write " <changefreq>" & tmpStrFreq & "</changefreq>" & vbCrLf
Response.Write " </url>" & vbCrLf
Else
Response.Write tmpStrUrl & vbCrLf
End If
End Sub
' ======================================================================
'
' Returns a string array of folders under a root path
'
' ======================================================================
Function GetFolderTree(strBaseFolder)
Dim tmpFolderCount,tmpBaseCount
Dim tmpFolders()
Dim tmpFSO,tmpFolder,tmpSubFolder
' Define the initial values for the folder counters.
tmpFolderCount = 1
tmpBaseCount = 0
' Dimension an array to hold the folder names.
ReDim tmpFolders(1)
' Store the root folder in the array.
tmpFolders(tmpFolderCount) = strBaseFolder
' Create file system object.
Set tmpFSO = Server.CreateObject("Scripting.Filesystemobject")
' Loop while we still have folders to process.
While tmpFolderCount <> tmpBaseCount
' Set up a folder object to a base folder.
Set tmpFolder = tmpFSO.GetFolder(tmpFolders(tmpBaseCount+1))
' Loop through the collection of subfolders for the base folder.
For Each tmpSubFolder In tmpFolder.SubFolders
' Increment the folder count.
tmpFolderCount = tmpFolderCount + 1
' Increase the array size
ReDim Preserve tmpFolders(tmpFolderCount)
' Store the folder name in the array.
tmpFolders(tmpFolderCount) = tmpSubFolder.Path
Next
' Increment the base folder counter.
tmpBaseCount = tmpBaseCount + 1
Wend
GetFolderTree = tmpFolders
End Function
%>
Note: There are two helper routines in the preceding example that I should call out:
- The GetFolderTree() function returns a string array of all the folders that are located under a root folder; you could remove that function if you were generating all of your URLs dynamically.
- The WriteUrl() subroutine outputs an entry for the sitemap file in either XML or TXT format, depending on the format that was requested. It also allows you to specify how frequently the URL is expected to change (always, hourly, daily, weekly, monthly, yearly, or never), which search engines treat as a hint for how often to revisit it.
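For readers who don't read VBScript every day, the two helpers can be sketched in Python as follows. This is an illustration only, not a replacement for the ASP code; the names get_folder_tree and format_url are hypothetical stand-ins for GetFolderTree() and WriteUrl(), and the sketch returns strings instead of writing to the response.

```python
import os
import tempfile
from datetime import datetime
from html import escape


def get_folder_tree(base_folder):
    """Breadth-first list of base_folder and every folder beneath it."""
    folders = [base_folder]
    processed = 0  # index of the next folder whose subfolders are unread
    while processed < len(folders):
        with os.scandir(folders[processed]) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    folders.append(entry.path)
        processed += 1
    return folders


def format_url(url, last_modified, freq, fmt):
    """Return one sitemap entry in XML or TXT form."""
    if fmt == "XML":
        return (
            " <url>\n"
            f" <loc>{escape(url)}</loc>\n"
            f" <lastmod>{last_modified:%Y-%m-%d}</lastmod>\n"
            f" <changefreq>{freq}</changefreq>\n"
            " </url>\n"
        )
    # The text format is just one URL per line.
    return url + "\n"


if __name__ == "__main__":
    # Build a small throwaway tree so the example runs anywhere.
    root = tempfile.mkdtemp()
    os.makedirs(os.path.join(root, "photos", "2013"))
    for folder in get_folder_tree(root):
        print(folder)
    print(format_url("http://www.example.com/", datetime.now(), "weekly", "XML"))
```

Like the ASP version, the folder walk keeps one growing list and a counter of how many entries have been processed, so no recursion is needed.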
Step 3 - Creating the Web.config File
The last step is to add the URL Rewrite rules to the Web.config file in the root of your website. The following example is a complete Web.config file, but you could merge the rules into your existing Web.config file if you have already created one for your website. These rules are pretty simple: they rewrite all inbound requests for Robots.txt to Robots.asp, and they rewrite all requests for Sitemap.xml to Sitemap.asp?format=XML and requests for Sitemap.txt to Sitemap.asp?format=TXT. This allows requests for both the XML-based and text-based sitemaps to work, even though the Robots.txt file contains only the path to the XML file. The last part of the URL Rewrite syntax returns HTTP 404 errors if anyone tries to send direct requests for either the Robots.asp or Sitemap.asp files; this isn't absolutely necessary, but I like to mask what I'm doing from prying eyes. (I'm kind of geeky that way.)
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
<system.webServer>
<rewrite>
<rewriteMaps>
<clear />
<rewriteMap name="Static URL Rewrites">
<add key="/robots.txt" value="/robots.asp" />
<add key="/sitemap.xml" value="/sitemap.asp?format=XML" />
<add key="/sitemap.txt" value="/sitemap.asp?format=TXT" />
</rewriteMap>
<rewriteMap name="Static URL Failures">
<add key="/robots.asp" value="/" />
<add key="/sitemap.asp" value="/" />
</rewriteMap>
</rewriteMaps>
<rules>
<clear />
<rule name="Static URL Rewrites" patternSyntax="ECMAScript" stopProcessing="true">
<match url=".*" ignoreCase="true" negate="false" />
<conditions>
<add input="{Static URL Rewrites:{REQUEST_URI}}" pattern="(.+)" />
</conditions>
<action type="Rewrite" url="{C:1}" appendQueryString="false" redirectType="Temporary" />
</rule>
<rule name="Static URL Failures" patternSyntax="ECMAScript" stopProcessing="true">
<match url=".*" ignoreCase="true" negate="false" />
<conditions>
<add input="{Static URL Failures:{REQUEST_URI}}" pattern="(.+)" />
</conditions>
<action type="CustomResponse" statusCode="404" subStatusCode="0" />
</rule>
<rule name="Prevent rewriting for static files" patternSyntax="Wildcard" stopProcessing="true">
<match url="*" />
<conditions>
<add input="{REQUEST_FILENAME}" matchType="IsFile" />
</conditions>
<action type="None" />
</rule>
</rules>
</rewrite>
</system.webServer>
</configuration>
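The decision that these rules encode can be modeled in a few lines of Python. This is a simplified sketch, not a description of how the URL Rewrite module is implemented: the route function and its return values are hypothetical, it does plain exact-match lookups (the module's own handling of letter case and query strings may differ), and it folds the final no-op rule for static files into the "pass" branch.

```python
# The two rewrite maps from the Web.config file above.
REWRITES = {
    "/robots.txt": "/robots.asp",
    "/sitemap.xml": "/sitemap.asp?format=XML",
    "/sitemap.txt": "/sitemap.asp?format=TXT",
}
FAILURES = {"/robots.asp", "/sitemap.asp"}


def route(request_uri):
    """Return (action, target) for one inbound request URI."""
    # Rule 1: a hit in the first map rewrites to the mapped ASP page.
    if request_uri in REWRITES:
        return ("rewrite", REWRITES[request_uri])
    # Rule 2: a hit in the second map returns HTTP 404.
    if request_uri in FAILURES:
        return ("404", None)
    # Anything else is served normally.
    return ("pass", None)


if __name__ == "__main__":
    for uri in ("/robots.txt", "/sitemap.xml", "/robots.asp", "/index.html"):
        print(uri, "->", route(uri))
```

The sketch evaluates each request once, with the first matching rule ending the evaluation, which is how I read the stopProcessing="true" setting on the rules above.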
Summary
That sums it up for this blog; I hope that you get some good ideas from it.
For more information about the syntax in Robots.txt and Sitemap.xml files, see the following URLs:
- http://www.robotstxt.org/
- http://www.sitemaps.org/
Note: This blog was originally posted at http://blogs.msdn.com/robert_mcmurray/