What is the sitemap protocol? It's the brainchild of Google, and it has caught on with MSN and Yahoo as a way to make it easier for search engines to spider your site. The idea is obvious and simple: the search engine reads an XML file on your server to learn all the URLs of your website. It goes further than a robots.txt file, which can only tell crawlers what to stay away from, because it lets you tell the search engine which pages are more important than the rest of the site and attach other metadata.
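To make that concrete, here is a rough sketch of what a small sitemap file looks like. The element names come from the protocol at sitemaps.org; the www.example.com URLs, the date, and the priority values are made up for illustration.

```xml
<?xml version="1.0" encoding="utf-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- One <url> entry per page; only <loc> is required -->
  <url>
    <loc>http://www.example.com/</loc>
    <changefreq>daily</changefreq>
    <priority>0.5</priority>
  </url>
  <url>
    <loc>http://www.example.com/Post/1.aspx</loc>
    <!-- lastmod is one of the optional extra pieces of metadata -->
    <lastmod>2007-01-15</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1.0</priority>
  </url>
</urlset>
```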
So, you have your ASP.NET site all ready and have used SiteMapProviders to make sure all pages, even the dynamic pages, are in the SiteMap? Of course you do, you're not the type of developer to reinvent the wheel! Well, my sharp ninja coder friend, here is where the payoff comes in. If all your pages are in the SiteMap, supporting the sitemap protocol is simple.
The first thing we need is a generic handler, so add a new file in Visual Studio, select Generic Handler, and save it in the root of your site as SiteMap.ashx. Visual Studio will create the stub file below:
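For reference, the Generic Handler template typically generates something close to the following. This is a sketch of the usual default template, assuming the file was named SiteMap.ashx as described; the exact contents can vary between Visual Studio versions.

```csharp
<%@ WebHandler Language="C#" Class="SiteMap" %>

using System;
using System.Web;

public class SiteMap : IHttpHandler {

    // Called once per request; the template just writes a plain-text greeting.
    public void ProcessRequest (HttpContext context) {
        context.Response.ContentType = "text/plain";
        context.Response.Write("Hello World");
    }

    // The template defaults to false, so a new instance is created per request.
    public bool IsReusable {
        get {
            return false;
        }
    }

}
```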
What we are looking at here is a class that implements IHttpHandler. This is a very basic interface with one exposed method and one property. ProcessRequest is where we build the response, and the IsReusable property tells the framework whether one instance of the handler can safely be reused across requests (in other words, whether we are thread-safe). If you've created any web services in .NET, the idea is the same; the only difference is that we will be formatting the messages ourselves instead of relying on the framework. This may sound like a pain (after all, XML-generating code can get messy), but XmlTextWriter is going to do the heavy lifting for us. Below is a completed sitemap protocol handler (the one running on ViNull.com):
Notice I've changed the name of the class (lines 1 and 8) from SiteMap to SiteMapBuilder. This is because I don't want to have a naming conflict between my class and System.Web.SiteMap. I've also brought in System.Xml and System.Collections.Generic. So here is what's going on:
1: <%@ WebHandler Language="C#" Class="SiteMapBuilder" %>
2:
3: using System;
4: using System.Web;
5: using System.Xml;
6: using System.Collections.Generic;
7:
8: public class SiteMapBuilder : IHttpHandler {
9:
10:     public void ProcessRequest (HttpContext context) {
11:         context.Response.Clear();
12:         context.Response.ContentType = "text/xml";
13:         XmlTextWriter xmlSiteMap = new XmlTextWriter(context.Response.OutputStream, System.Text.Encoding.UTF8);
14:         xmlSiteMap.WriteStartDocument();
15:         xmlSiteMap.WriteStartElement("urlset");
16:         xmlSiteMap.WriteAttributeString("xmlns:xsi", "http://www.w3.org/2001/XMLSchema-instance");
17:         xmlSiteMap.WriteAttributeString("xsi:schemaLocation", "http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd");
18:         xmlSiteMap.WriteAttributeString("xmlns", "http://www.sitemaps.org/schemas/sitemap/0.9");
19:
20:         String[] homepages = { "/", "/Default.aspx" };
21:         foreach(string url in homepages) {
22:             xmlSiteMap.WriteStartElement("url");
23:             xmlSiteMap.WriteElementString("loc", "http://" + context.Request.ServerVariables["SERVER_NAME"] + url);
24:             xmlSiteMap.WriteElementString("priority", "0.5");
25:             xmlSiteMap.WriteElementString("changefreq", "daily");
26:             xmlSiteMap.WriteEndElement();
27:         }
28:
29:         List<string> seen = new List<string>();
30:         foreach (SiteMapNode node in SiteMap.Provider.FindSiteMapNode("~/Default.aspx").GetAllNodes()) {
31:             if (!seen.Contains(node.Url)) {
32:                 xmlSiteMap.WriteStartElement("url");
33:                 xmlSiteMap.WriteElementString("loc", "http://" + context.Request.ServerVariables["SERVER_NAME"] + node.Url);
34:                 if(node.Url.Contains("/Post/")) {
35:                     xmlSiteMap.WriteElementString("priority", "1.0");
36:                     xmlSiteMap.WriteElementString("changefreq", "weekly");
37:                 }
38:                 else {
39:                     xmlSiteMap.WriteElementString("priority", "0.5");
40:                     xmlSiteMap.WriteElementString("changefreq", "monthly");
41:                 }
42:                 xmlSiteMap.WriteEndElement();
43:                 seen.Add(node.Url);
44:             }
45:         }
46:
47:         xmlSiteMap.WriteEndElement();
48:         xmlSiteMap.WriteEndDocument();
49:         xmlSiteMap.Flush();
50:         xmlSiteMap.Close();
51:         context.Response.End();
52:     }
53:
54:     public bool IsReusable {
55:         get {
56:             return true;
57:         }
58:     }
59:
60: }
11-12: We clear anything that may be in the response buffer and set our content type.
13-18: We create an XmlTextWriter and tie its output to our response object. According to the spec, the output should be UTF-8, so we set our XmlTextWriter to do the converting for us. We begin the XML document by setting some schema locations and we're good to go.
20-27: Line 30 is going to get all pages below the root, but not the root itself, which can be reached in two ways. So we'll manually add those two pages.
29-45: Here is the fun stuff. In ViNull Siding, a post can be in more than one category and will appear more than once in the sitemap. Since a search engine doesn't care about this, and may even think I was trying to fool it, I use a List<> to track the URLs I've already added. GetAllNodes() returns a flat list of all nodes below the current node, so we don't need to "walk" the sitemap to get pages at all levels. If the page is a post, I'm going to tell the search engine it's more important than the other pages. What this means is that if Google has results for a search on both the post page and the homepage, it is more likely to direct users to the post page as the preferred landing page. Note that priority is only a hint and has nothing to do with other sites, just pages within my site ranked against each other.
47-51: Wrap everything up, then flush and close the buffers.
54-58: Since we didn't do anything with any data outside the scope of ProcessRequest, we are thread-safe and can let the server load our class once to handle multiple requests. Hooray, scalability!
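If you want to experiment with the XML-writing and duplicate-skipping logic outside of a web request, something like the following console sketch may help. It is not the handler above: the class name SitemapSketch, the Build method, the www.example.com base URL, and the sample paths are all made up for illustration. It also swaps the List<> for a HashSet<string>, which assumes .NET 3.5 or later and makes the "have I seen this URL?" check cheaper on large sites.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;

public static class SitemapSketch
{
    private const string SitemapNs = "http://www.sitemaps.org/schemas/sitemap/0.9";

    // Builds a sitemap document for the given paths, skipping duplicates.
    // Paths containing "/Post/" get the higher priority, as in the handler.
    public static string Build(string baseUrl, IEnumerable<string> paths)
    {
        MemoryStream buffer = new MemoryStream();
        // UTF8Encoding(false) writes UTF-8 without a byte order mark.
        using (XmlTextWriter xml = new XmlTextWriter(buffer, new UTF8Encoding(false)))
        {
            xml.Formatting = Formatting.Indented;
            xml.WriteStartDocument();
            xml.WriteStartElement("urlset", SitemapNs);

            // HashSet.Add returns false when the item is already present.
            HashSet<string> seen = new HashSet<string>();
            foreach (string path in paths)
            {
                if (!seen.Add(path))
                {
                    continue; // already written, skip the duplicate
                }

                bool isPost = path.Contains("/Post/");
                xml.WriteStartElement("url", SitemapNs);
                xml.WriteElementString("loc", SitemapNs, baseUrl + path);
                xml.WriteElementString("priority", SitemapNs, isPost ? "1.0" : "0.5");
                xml.WriteElementString("changefreq", SitemapNs, isPost ? "weekly" : "monthly");
                xml.WriteEndElement();
            }

            xml.WriteEndElement();
            xml.WriteEndDocument();
        }
        // ToArray still works after the writer has closed the stream.
        return Encoding.UTF8.GetString(buffer.ToArray());
    }

    public static void Main()
    {
        // The post appears twice on purpose, as if it were in two categories.
        string[] paths = { "/Post/1.aspx", "/About.aspx", "/Post/1.aspx" };
        Console.WriteLine(Build("http://www.example.com", paths));
    }
}
```

Running it should print a urlset with two url entries, since the repeated post path is skipped the second time around.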
That's all there is to it. You can see the results of this code at http://www.vinull.com/SiteMap.ashx. After you have your file, you need to tell the search engines it's there, and this is currently search engine specific. Details are at sitemaps.org, and you should probably look over everything on the site before deploying your sitemap to make sure you've done everything correctly (there are also a few more pieces of metadata you can add). Yahoo and Google are already accepting sitemaps; MSN is in closed testing and will add support soon. With the "big 3" already on board, it won't be long before every search engine uses sitemaps, so now is the time to prepare your site.