What is the sitemap protocol? It's the brainchild of Google that has caught on with MSN and Yahoo to make it easier for search engines to spider your site. It's obvious and simple: an XML file on your server is read by the search engine to learn all the URLs of your website. It goes further than a robots.txt file, which only tells crawlers what to stay out of, because it lets you tell the search engine which pages are more important compared to the rest of the site and add other metadata, such as how often a page changes.
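To make the format concrete, here is a minimal sketch (not the handler built below) that writes a one-entry sitemap using the newer XmlWriter.Create API. The example.com URL and the metadata values are placeholders. The urlset/url/loc/changefreq/priority elements and the namespace are the ones the protocol defines.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Sketch: build a sitemap containing a single <url> entry so the shape of
// the format is visible. All values passed in are placeholders.
string BuildMinimalSitemap(string loc, string changefreq, string priority)
{
    const string SitemapNs = "http://www.sitemaps.org/schemas/sitemap/0.9";

    var settings = new XmlWriterSettings
    {
        Encoding = new UTF8Encoding(false), // the protocol requires UTF-8
        Indent = true
    };
    using var buffer = new MemoryStream();
    using (var writer = XmlWriter.Create(buffer, settings))
    {
        writer.WriteStartDocument();
        writer.WriteStartElement("urlset", SitemapNs);       // root + default namespace
        writer.WriteStartElement("url", SitemapNs);
        writer.WriteElementString("loc", SitemapNs, loc);     // required
        writer.WriteElementString("changefreq", SitemapNs, changefreq); // optional
        writer.WriteElementString("priority", SitemapNs, priority);     // optional, 0.0-1.0
        writer.WriteEndElement(); // </url>
        writer.WriteEndElement(); // </urlset>
        writer.WriteEndDocument();
    }
    return Encoding.UTF8.GetString(buffer.ToArray());
}

Console.WriteLine(BuildMinimalSitemap("http://www.example.com/", "daily", "0.5"));
```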
So, you have your ASP.NET site all ready and have used SiteMapProviders to make sure all pages, even the dynamic ones, are in the SiteMap? Of course you have; you're not the type of developer to reinvent the wheel! Well, my sharp ninja coder friend, here is where the payoff comes in. If all your pages are in the SiteMap, supporting the sitemap protocol is simple.
The first thing we need is a generic handler. Add a new file in Visual Studio, select Generic Handler, and save it in the root of your site as SiteMap.ashx. Visual Studio will create a stub file for you.
The stub is a class that implements IHttpHandler, a very basic interface with one method and one property. ProcessRequest is where we build the response. The IsReusable property tells the framework whether one instance of our class can serve multiple requests, which means the class must be thread-safe. If you've created any web services in .NET the idea is the same. The only difference is that we format the messages ourselves instead of relying on the framework. This may sound like a pain (after all, XML-generating code can get messy), but XmlTextWriter is going to do the heavy lifting for us. Below is a completed sitemap protocol handler (the one that runs ViNull.com):
1: <%@ WebHandler Language="C#" Class="SiteMapBuilder" %>
2:
3: using System;
4: using System.Web;
5: using System.Xml;
6: using System.Collections.Generic;
7:
8: public class SiteMapBuilder : IHttpHandler {
9:
10:     public void ProcessRequest (HttpContext context) {
11:         context.Response.Clear();
12:         context.Response.ContentType = "text/xml";
13:         XmlTextWriter xmlSiteMap = new XmlTextWriter(context.Response.OutputStream, System.Text.Encoding.UTF8); // sitemaps must be UTF-8
14:         xmlSiteMap.WriteStartDocument();
15:         xmlSiteMap.WriteStartElement("urlset");
16:         xmlSiteMap.WriteAttributeString("xmlns:xsi", "http://www.w3.org/2001/XMLSchema-instance");
17:         xmlSiteMap.WriteAttributeString("xsi:schemaLocation", "http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd");
18:         xmlSiteMap.WriteAttributeString("xmlns", "http://www.sitemaps.org/schemas/sitemap/0.9");
19:
20:         String[] homepages = { "/", "/Default.aspx" }; // the root page has two addresses
21:         foreach(string url in homepages) {
22:             xmlSiteMap.WriteStartElement("url");
23:             xmlSiteMap.WriteElementString("loc", "http://" + context.Request.ServerVariables["SERVER_NAME"] + url);
24:             xmlSiteMap.WriteElementString("changefreq", "daily"); // schema order: loc, changefreq, priority
25:             xmlSiteMap.WriteElementString("priority", "0.5");
26:             xmlSiteMap.WriteEndElement();
27:         }
28:
29:         List<string> seen = new List<string>(); // URLs already written
30:         foreach (SiteMapNode node in SiteMap.Provider.FindSiteMapNode("~/Default.aspx").GetAllNodes()) {
31:             if (!seen.Contains(node.Url)) {
32:                 xmlSiteMap.WriteStartElement("url");
33:                 xmlSiteMap.WriteElementString("loc", "http://" + context.Request.ServerVariables["SERVER_NAME"] + node.Url);
34:                 if(node.Url.Contains("/Post/")) { // posts rank above other pages
35:                     xmlSiteMap.WriteElementString("changefreq", "weekly");
36:                     xmlSiteMap.WriteElementString("priority", "1.0");
37:                 }
38:                 else {
39:                     xmlSiteMap.WriteElementString("changefreq", "monthly");
40:                     xmlSiteMap.WriteElementString("priority", "0.5");
41:                 }
42:                 xmlSiteMap.WriteEndElement();
43:                 seen.Add(node.Url);
44:             }
45:         }
46:
47:         xmlSiteMap.WriteEndElement(); // </urlset>
48:         xmlSiteMap.WriteEndDocument();
49:         xmlSiteMap.Flush();
50:         xmlSiteMap.Close();
51:         context.Response.End();
52:     }
53:
54:     public bool IsReusable {
55:         get {
56:             return true; // no per-request state is kept in fields
57:         }
58:     }
59:
60: }
Notice I've changed the name of the class (lines 1 and 8) from SiteMap to SiteMapBuilder. This is because I don't want to have a naming conflict between my class and System.Web.SiteMap. I've also brought in System.Xml and System.Collections.Generic. So here is what's going on:
11-12: We clear anything that may be in the response buffer, and set our content type.
13-18: We create an XmlTextWriter and tie its output to our response stream. The spec requires the output to be UTF-8, so we set our XmlTextWriter to do the encoding for us. We then start the XML document and open the urlset root element with its namespace and schema location declarations, and we're good to go.
20-27: The loop at line 30 gets all pages below the root, but not the root itself, which can be reached at two addresses ("/" and "/Default.aspx"). So we add those two pages manually.
29-45: Here is the fun stuff. In ViNull Siding, a post can be in more than one category, so it would otherwise appear more than once in the sitemap. A search engine doesn't care about this and might even think I was trying to fool it, so I use a List<> to track the URLs I've already added. GetAllNodes() returns a flat list of all nodes below the current node, so we don't need to "walk" the sitemap to get pages at all levels. If the page is a post, I tell the search engine it's more important than the other pages. The idea is that if Google has results for a search on both a post page and the homepage, it can treat the post page as the preferred landing page. Priority is only a hint, though, not a guarantee. Note that priority has nothing to do with other sites; it only ranks pages within my site against each other.
47-51: Wrap everything up: close the urlset element, end the document, flush and close the writer, and end the response.
54-58: ProcessRequest keeps all of its data in local variables and touches no shared state, so the handler is thread-safe. The server can therefore load our class once and reuse it for multiple requests. Hooray, scalability!
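Lines 15-18 write the namespace declarations as ordinary attributes, which XmlTextWriter accepts. The sketch below takes a more namespace-aware route, using XmlWriter.Create instead of the XmlTextWriter the handler uses. It passes the sitemap namespace to WriteStartElement and declares the xsi prefix explicitly, so the writer tracks both namespaces itself. The element and schema URLs are the ones from the listing. Treat this as an alternative to try, not as a drop-in replacement that has been tested in the handler.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Sketch: the same <urlset> root as lines 15-18, with the namespaces
// managed by the writer instead of written as hand-built "xmlns" attributes.
string WriteUrlsetRoot()
{
    const string SitemapNs = "http://www.sitemaps.org/schemas/sitemap/0.9";
    const string XsiNs = "http://www.w3.org/2001/XMLSchema-instance";

    var settings = new XmlWriterSettings { Encoding = new UTF8Encoding(false) };
    using var buffer = new MemoryStream();
    using (var writer = XmlWriter.Create(buffer, settings))
    {
        writer.WriteStartDocument();
        // Passing the namespace here makes the writer emit xmlns="..." itself.
        writer.WriteStartElement("urlset", SitemapNs);
        // Declare the xsi prefix, then use it for the schemaLocation attribute.
        writer.WriteAttributeString("xmlns", "xsi", null, XsiNs);
        writer.WriteAttributeString("xsi", "schemaLocation", XsiNs,
            SitemapNs + " " + SitemapNs + "/sitemap.xsd");
        writer.WriteEndElement();
        writer.WriteEndDocument();
    }
    return Encoding.UTF8.GetString(buffer.ToArray());
}

Console.WriteLine(WriteUrlsetRoot());
```

Because the writer knows which namespace every element belongs to, adding elements from another vocabulary later (with their own prefix) cannot silently produce a mismatched declaration.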
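Lines 23 and 33 build each loc by concatenating "http://" with SERVER_NAME, which drops https and any non-default port. A hedged alternative is to resolve the site-relative path against a base URI with System.Uri. Inside the handler that base could plausibly be context.Request.Url, but that is an untested assumption. The ToAbsoluteLoc helper and the example.com root below are hypothetical.

```csharp
using System;

// Hypothetical helper: resolve a site-relative path against the site root.
// The scheme and any non-default port are taken from the base URI.
string ToAbsoluteLoc(Uri siteRoot, string sitePath) =>
    new Uri(siteRoot, sitePath).AbsoluteUri;

// Placeholder root; in the handler this might be context.Request.Url.
var exampleRoot = new Uri("http://www.example.com/");

// The root page is reachable at two addresses, so both are listed.
string[] homepages = { "/", "/Default.aspx" };
foreach (string path in homepages)
{
    Console.WriteLine(ToAbsoluteLoc(exampleRoot, path));
}
```

This prints http://www.example.com/ and http://www.example.com/Default.aspx. With a base such as https://www.example.com:8443/, the scheme and port carry over to every entry.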
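The duplicate check on lines 29-45 can also be written with a HashSet<string>. Its Add method returns false when the URL is already present, so a single call both tests and records the URL, instead of List.Contains scanning the whole list each time. The sketch below runs outside ASP.NET, so the node URLs are passed in as plain strings (an assumption; the handler reads SiteMapNode.Url). It also skips empty URLs, such as menu placeholders with no page behind them. The "/Post/" rule and the priority and changefreq values mirror the handler. The BuildEntries name and the sample URLs are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: decide which URLs go into the sitemap and with what metadata.
List<(string Loc, string Priority, string ChangeFreq)> BuildEntries(IEnumerable<string> nodeUrls)
{
    var seen = new HashSet<string>();
    var entries = new List<(string Loc, string Priority, string ChangeFreq)>();
    foreach (string url in nodeUrls)
    {
        // Skip placeholder nodes with no URL, and posts already listed
        // under another category (Add returns false for a repeat).
        if (string.IsNullOrEmpty(url) || !seen.Add(url))
            continue;
        // Same rule as the handler: posts rank above the other pages.
        entries.Add(url.Contains("/Post/")
            ? (url, "1.0", "weekly")
            : (url, "0.5", "monthly"));
    }
    return entries;
}

// Hypothetical input: one post listed under two categories, plus an empty placeholder.
string[] sampleUrls = { "/Post/Sitemaps.aspx", "/About.aspx", "/Post/Sitemaps.aspx", "" };
foreach (var entry in BuildEntries(sampleUrls))
{
    Console.WriteLine($"{entry.Loc} priority={entry.Priority} changefreq={entry.ChangeFreq}");
}
```

This prints two entries: the post at priority 1.0 (weekly) and /About.aspx at 0.5 (monthly). The repeated post and the empty placeholder are dropped.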
That's all there is to it. You can see the results of this code at http://www.vinull.com/SiteMap.ashx. Once you have your sitemap, you need to tell the search engines it's there, and how you do that is currently specific to each search engine. Details are at sitemaps.org. You should probably look over everything at the site before deploying your sitemap to make sure you've done everything correctly (there are also a few more pieces of metadata you can add). Yahoo and Google are already accepting sitemaps, and MSN is in closed testing and will add support soon. With the "big 3" already on board it won't be long before every search engine uses sitemaps, so now is the time to prepare your site.
Posted By Mike On Sunday, December 03, 2006
Filed under asp.net seo httphandler sitemaps
Comments (8)
Shalan - Monday, November 03, 2008, 11:35:24 PM
Great article! You mentioned that if we as developers have already incorporated a SiteMapProvider in our app, then all the spadework is done. I haven't tried this out yet, but will this also apply to the SqlSiteMapProvider that Jeff Prosise wrote in Feb 2006? In my web.config under <siteMap> (under <system.web>) I have already named AspNetSqlSiteMapProvider as the default provider, so it should still work?
Sorry, I am just trying to understand this correctly!
cheers!
Mike - Tuesday, November 04, 2008, 12:44:12 AM
Yep, you're set to walk the site map and turn it into the xml needed for the sitemap protocol!
Shalan - Wednesday, November 05, 2008, 12:29:46 AM
Hi Mike,
Sorry, but I just wanted to know a little bit more about the search engine sitemap protocol... when rendered, does the XML sitemap define a hierarchy of links (much like a navigation menu), or just list all of them?
I ask this because your solution is great but unfortunately doesn't work in my scenario: I have menu items with empty URLs (these act as placeholders for sub-menu items in my menubar). Using the principle behind your solution, I thought I could replace your second "foreach" statement with an SQL query that just writes out the links (if not null) where (roles = '*'). If you need me to explain my situation in detail, please email me.
regards and thanks once again! -cheers
Mike - Wednesday, November 05, 2008, 1:17:21 PM
sitemaps.org explains the protocol in detail. A sitemap is simply a flat list of pages you want search engines to index, which makes it easier for sites with dynamic content to expose deep content.
The code here is just an example. The idea is that since you already have a list of pages in the SiteMap provider, there is no need for additional logic to get the list, i.e. you don't have SQL in two places doing the same thing. You also do not need to worry about security, since the SiteMap provider already takes care of filtering out links that are not public (when security trimming is enabled).
The best way to generate a sitemap, though, depends on your particular site!
Ian - Saturday, March 28, 2009, 5:18:59 PM
Nice one, just what I was looking for :)
marty - Thursday, February 18, 2010, 3:51:06 PM
Can you help with this? (ASP.NET, VB)
A News Sitemap uses the Sitemap protocol, with additional News-specific tags as defined below. Here is an example of a News Sitemap entry using News-specific tags:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:n="http://www.google.com/schemas/sitemap-news/0.9">
  <url>
    <loc>http://www.example.org/business/article55.html</loc>
    <n:news>
      <n:publication>
        <n:name>The Example Times</n:name>
        <n:language>en</n:language>
      </n:publication>
      <n:access>subscription</n:access>
      <n:genres>pressrelease, blog</n:genres>
      <n:publication_date>2008-12-23</n:publication_date>
      <n:title>Companies A, B in Merger Talks</n:title>
      <n:keywords>business, merger, acquisition, A, B</n:keywords>
      <n:stock_tickers>NASDAQ:A, NASDAQ:B</n:stock_tickers>
    </n:news>
  </url>
</urlset>
(mentioned here - http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=74288 )
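The sketch below shows how an entry shaped like the sample above might be produced, in C# (the handler's language) rather than VB. It declares the n prefix once on the root and then writes the news elements with that prefix. Only a subset of the n: tags is written, reusing values from the sample. The tag set should be checked against Google's current News Sitemap documentation. The BuildNewsEntry helper is hypothetical.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper: one News Sitemap <url> entry with a prefixed namespace.
string BuildNewsEntry(string loc, string publication, string language,
                      string publicationDate, string title)
{
    const string SitemapNs = "http://www.sitemaps.org/schemas/sitemap/0.9";
    const string NewsNs = "http://www.google.com/schemas/sitemap-news/0.9";

    var settings = new XmlWriterSettings { Encoding = new UTF8Encoding(false), Indent = true };
    using var buffer = new MemoryStream();
    using (var writer = XmlWriter.Create(buffer, settings))
    {
        writer.WriteStartDocument();
        writer.WriteStartElement("urlset", SitemapNs);
        // Declare the "n" prefix on the root, as in the sample.
        writer.WriteAttributeString("xmlns", "n", null, NewsNs);
        writer.WriteStartElement("url", SitemapNs);
        writer.WriteElementString("loc", SitemapNs, loc);
        writer.WriteStartElement("n", "news", NewsNs);
        writer.WriteStartElement("n", "publication", NewsNs);
        writer.WriteElementString("n", "name", NewsNs, publication);
        writer.WriteElementString("n", "language", NewsNs, language);
        writer.WriteEndElement(); // </n:publication>
        writer.WriteElementString("n", "publication_date", NewsNs, publicationDate);
        writer.WriteElementString("n", "title", NewsNs, title);
        writer.WriteEndElement(); // </n:news>
        writer.WriteEndElement(); // </url>
        writer.WriteEndElement(); // </urlset>
        writer.WriteEndDocument();
    }
    return Encoding.UTF8.GetString(buffer.ToArray());
}

Console.WriteLine(BuildNewsEntry("http://www.example.org/business/article55.html",
    "The Example Times", "en", "2008-12-23", "Companies A, B in Merger Talks"));
```

Inside the handler, the same calls would go inside the node loop, with the values taken from wherever the site stores its article data.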
Mike - Friday, February 19, 2010, 1:20:26 PM
Possibly, what is the question?