While I hesitate to call this “ultimate”, I can’t help but feel like this is an achievement after a number of routes were followed and ultimately fell short in one way or another. The culmination of those lessons learned is here, in two parts, with some code snippets to download.
The first thing to note is that this was built for Sitecore 6.2. I’m not sure whether later versions include any goodies that make this easier. We’ll be getting our hands dirty with 6.4 shortly.
Our requirements:
- Handle 301 redirects from old site URLs to new ones for a site migration. This must cover static files (e.g. .htm) as well as folders and aspx pages
- Handle 301 redirects for aliases (i.e. vanity URLs)
- Manage these 301s via Sitecore
- Handle 404 pages properly, i.e. by returning a 404 status and our custom page without redirecting
On the surface this might seem straightforward, until you run into some of the issues thrown up by IIS7’s 404 handling, Sitecore’s built-in 404 redirect and alias handling, and the IIS URL Rewrite module, all of which we investigated before abandoning them.
Let me expand:
- IIS7 doesn’t allow you to use a .NET page to handle a 404 properly, or at least it’ll do a 302 first and not give you a decent view of what URL you had initially entered. We tried several configurations without success.
- The IIS URL Rewrite module could be set up to handle files as redirects, but we’d have to manage the whole list of redirects in IIS, and we hit Sitecore exceptions when we tried this. Alternatively, we could rewrite all URLs to a .NET page for processing, but this is pretty inefficient.
- Sitecore’s built-in 404 page (ItemNotFoundUrl) performs a 302, then delivers the page without a 404 status, which is bad for SEO. It also means the original URL is lost, as the user is redirected to /notfound.aspx.
- Sitecore’s built-in aliases are global, so if you have a multi-site instance, an alias for “/jobs” will run for both domain1.com/jobs as well as domain2.com/jobs
So, in Part 1 of this article I’ll show how we handle 301 redirects, aliases and 404 pages. In Part 2 I’ll look at how to make them manageable via Sitecore.
Some settings
There are a million ways to store config settings in Sitecore. My current favourite is a global settings class, which either reads values from Sitecore items or web.config settings, or holds static values, depending on requirements. The settings are also nice and concise to reference in code. For this project, it looks like this:
public class Settings
{
    // Global settings
    public static string ClientName = "Dog";
    public static string GlobalPath = "/sitecore/content/DogDigital/Global";
    public static string NotFoundID = "{4532999D-7CE8-4A09-9C9F-F433B3C2D025}"; // ID of CMS-editable 404 page

    // Redirects section
    public static string IDRedirects = "{C609C37A-A9E8-484E-B9C8-E70E3F83531D}"; // ID of redirects root item
    public static string IDRedirectsTpl = "{392B10AE-6676-4CC3-AEAB-91A39CE871B7}"; // ID of redirect item template
    public static string RedirectsXMLPath = "/app_data/redirects.xml"; // Path to redirect XML file
    public static string RedirectsStaticFiles = "htm,html"; // Extensions of the static files we want to handle in 301 redirects
}
While I’ve added some settings to web.config, we’re going to create a custom config file called DogRedirect.config and add this to /App_Config/Include. This is where we’ll put the Sitecore-specific config changes.
The redirect XML
We’re going to start with a simple XML file containing redirects. This can be aliases as well as old-to-new migration or dead link SEO redirects. It looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<redirects xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <redirect>
    <from>/jobs</from>
    <to>/about-us/work-with-us.aspx</to>
  </redirect>
  <redirect>
    <from>/default.htm</from>
    <to>/</to>
  </redirect>
</redirects>
We’re storing this in /App_Data/redirects.xml, which is already writable by the web server (more on that in Part 2).
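To make the format concrete, here is a minimal sketch of reading redirects.xml into a lookup table. It is not the code from the download: RedirectMap, Parse and Lookup are hypothetical names, and it parses an XML string so it can run outside a web application. On the site itself you would presumably load the file with something like XDocument.Load(HttpContext.Current.Server.MapPath(Settings.RedirectsXMLPath)). Case-insensitive matching is an assumption, not something the original code is known to do.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

public static class RedirectMap
{
    // Hypothetical helper: maps each <from> path to its <to> target.
    // Keys are compared case-insensitively (an assumption about how URLs should match).
    public static Dictionary<string, string> Parse(string xml)
    {
        XDocument doc = XDocument.Parse(xml);
        var map = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (XElement redirect in doc.Descendants("redirect"))
        {
            // A missing element converts to null, so fall back to an empty string.
            string from = ((string)redirect.Element("from") ?? "").Trim();
            string to = ((string)redirect.Element("to") ?? "").Trim();
            if (from.Length == 0 || to.Length == 0)
                continue; // skip incomplete entries rather than failing the whole file
            map[from] = to; // if a path is listed twice, the later entry wins
        }
        return map;
    }

    // Returns the redirect target for a path, or null when there is no match.
    public static string Lookup(Dictionary<string, string> map, string path)
    {
        string target;
        return map.TryGetValue(path, out target) ? target : null;
    }
}
```

A null result means "no redirect", which leaves the caller free to fall through to the 404 handling.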
HTML file handling
In order to have HTML files handled in the same way as aspx files (and, for that matter, dynamic folders), we need to get .NET to pick them up. The website’s AppPool in IIS must be running in Integrated Mode, and we need to add an HTML handler to our config. Add the following to the system.webServer/handlers section:
<add name="htm-to-aspx-isapi" path="*.htm*" verb="*" modules="IsapiModule"
     scriptProcessor="%windir%\Microsoft.NET\Framework\v2.0.50727\aspnet_isapi.dll"
     resourceType="Unspecified" preCondition="classicMode,runtimeVersionv2.0,bitness32" />
<add name="htm-to-aspx" path="*.htm*" verb="*" type="System.Web.UI.PageHandlerFactory"
     resourceType="Unspecified" preCondition="integratedMode" />
In addition, we need to tell .NET to compile these extensions by adding build providers to the compilation element in the system.web section:
<compilation>
  <buildProviders>
    <add extension=".htm" type="System.Web.Compilation.PageBuildProvider" />
    <add extension=".html" type="System.Web.Compilation.PageBuildProvider" />
  </buildProviders>
</compilation>
Now that all the files we care about are being handled by .NET, we can start routing them to our code for processing.
httpRequestBegin Pipelines
Sitecore’s pipelines are extremely useful once you get your head round which one does what. After a bit of wrangling, I managed to pinpoint where in the HTTP pipeline Sitecore decides that a static file doesn’t exist, which is at the very start. The first processor in the httpRequestBegin pipeline is CheckIgnoreFlag, and this is where Sitecore realises the HTML file doesn’t exist and redirects to its 404 page. To process an HTML URL ourselves, we need to add a processor before this one. Unfortunately, no context has been set up at that point, and our dynamic handler for aliases and aspx pages needs both the site context and the item context. So we need two pipeline processors: one for static files and one for dynamic URLs.
The static handler goes right at the start of the httpRequestBegin pipeline. Add the following to your custom config file:
<processor type="Dog.Code.Dog.StaticRedirectHandlerPipeline, Dog" patch:before="processor[@type='Sitecore.Pipelines.PreprocessRequest.CheckIgnoreFlag, Sitecore.Kernel']" />
The dynamic handler goes after the item has been resolved, like this:
<processor type="Dog.Code.Dog.DynamicRedirectHandlerPipeline, Dog" patch:after="processor[@type='Sitecore.Pipelines.HttpRequest.ItemResolver, Sitecore.Kernel']" />
The Static File Handler
Now it’s time to download the code if you haven’t already. It’s in a zip at the bottom of the page.
The first class, StaticRedirectHandlerPipeline, checks whether the URL has one of the static file extensions declared in Settings.RedirectsStaticFiles. If so, it first checks whether it’s a real file, then whether it exists in our XML file. If a match is found, it passes it to the Utils.handle301 method, which sends the 301 headers and redirects. Simple.
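The decision order in that paragraph can be sketched as plain C# with the Sitecore and System.Web parts stubbed out. This is a hypothetical StaticRedirectSketch, not the StaticRedirectHandlerPipeline from the download: fileExists stands in for a physical file check (for example File.Exists on a MapPath result) and redirects stands in for the parsed XML file. It only returns an outcome; sending the 301 headers or the 404 content is left to the caller.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Possible results of the static-file check (hypothetical names).
public enum StaticOutcome { NotStatic, ServeFile, Redirect301, NotFound404 }

public static class StaticRedirectSketch
{
    // Parses a comma-separated list such as Settings.RedirectsStaticFiles ("htm,html").
    public static HashSet<string> ParseExtensions(string csv)
    {
        return new HashSet<string>(
            csv.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries)
               .Select(e => e.Trim().TrimStart('.'))
               .Where(e => e.Length > 0),
            StringComparer.OrdinalIgnoreCase);
    }

    // Decision order from the article: static extension? real file? XML match? otherwise 404.
    public static StaticOutcome Decide(
        string path,
        HashSet<string> staticExtensions,
        Func<string, bool> fileExists,
        IDictionary<string, string> redirects,
        out string target)
    {
        target = null;
        string ext = Path.GetExtension(path).TrimStart('.');
        if (ext.Length == 0 || !staticExtensions.Contains(ext))
            return StaticOutcome.NotStatic;   // not our concern: let the pipeline carry on
        if (fileExists(path))
            return StaticOutcome.ServeFile;   // a real file on disk: serve it normally
        if (redirects.TryGetValue(path, out target))
            return StaticOutcome.Redirect301; // caller sends the 301 to target
        return StaticOutcome.NotFound404;     // caller writes the 404 page with a 404 status
    }
}
```

Keeping the decision separate from the code that writes the response makes it possible to test without a running IIS.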
If, however, the file does not exist and there’s no matching redirect, we need to send the 404 status and return our custom 404 page. This isn’t as easy as it sounds: if we let the pipeline continue, Sitecore will 302 redirect the user to the 404 page, which doesn’t itself send a 404 status.
This might sound nuts, but the solution was to read the contents of the 404 page into a string! We can then set proper 404 headers, write the string to the browser and end the response. It’s admittedly a bit of a classic ASP approach, but it works, and nothing else was going to do the trick.
The Dynamic File Handler
If the above code doesn’t do anything (i.e. the file isn’t static), the request falls through to the point where we have a Sitecore context, and our next handler kicks in.
This one was originally based on Sitecore’s Shared Source NotFoundRedirector class, but it no longer bears much resemblance to it. First we check whether there’s a Sitecore context item, and if there is, we do nothing. Similarly, if there’s a physical aspx file or folder, we also do nothing and return.
Next, we grab the URL and any querystring and check it against the XML file, sending a 301 if we get a match. This will handle old-to-new redirects as well as aliases, as they’re all in the XML file. I’ll go over how they’re managed in Part 2.
Finally, if all else has failed, we set the Sitecore context item to the 404 page item (the one identified by Settings.NotFoundID). This keeps the original URL but still delivers the 404 content and status.
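The dynamic handler’s checks can be sketched the same way. Again, this is a hypothetical DynamicRedirectSketch rather than the downloadable DynamicRedirectHandlerPipeline: hasContextItem stands in for Sitecore.Context.Item != null, physicalExists stands in for the aspx file or folder check, and trying the URL with its querystring before the bare path is an assumption about how the querystring match might work.

```csharp
using System;
using System.Collections.Generic;

// Possible results of the dynamic check (hypothetical names).
public enum DynamicOutcome { Continue, Redirect301, Serve404Item }

public static class DynamicRedirectSketch
{
    public static DynamicOutcome Decide(
        bool hasContextItem,   // stand-in for Sitecore.Context.Item != null
        bool physicalExists,   // stand-in for a physical aspx file or folder check
        string path,           // URL path, e.g. "/jobs"
        string query,          // querystring without the leading '?', may be null or empty
        IDictionary<string, string> redirects,
        out string target)
    {
        target = null;
        if (hasContextItem || physicalExists)
            return DynamicOutcome.Continue; // a real page: leave the request alone

        // Assumption: an entry that includes the querystring takes priority over the bare path.
        if (!string.IsNullOrEmpty(query) && redirects.TryGetValue(path + "?" + query, out target))
            return DynamicOutcome.Redirect301;
        if (redirects.TryGetValue(path, out target))
            return DynamicOutcome.Redirect301;

        // No match: the caller would set Sitecore.Context.Item to the 404 item
        // (Settings.NotFoundID), so the URL is kept and the 404 layout sets the status.
        return DynamicOutcome.Serve404Item;
    }
}
```

Because aliases and old-to-new redirects share one lookup, the same call handles both.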
Sending 404 status from your 404 page
This took a bit of digging. In the end it turned out that you can’t send 404 headers from the page before redirecting (before we settled on our final approach, we were still working with redirects). Sending the 404 status from a user control (sublayout) within the 404 page didn’t work either; the status needed to be set in the layout. We therefore needed a code block like this in the layout used by our 404 page:
// In the code-behind of the layout used by the 404 page (e.g. in Page_Load)
Sitecore.Data.Items.Item itm = Sitecore.Context.Item;
if (itm != null && itm.ID.ToString() == Settings.NotFoundID)
{
    Response.TrySkipIisCustomErrors = true; // stop IIS swapping in its own error page
    Response.StatusCode = 404;
    Response.StatusDescription = "Page not found";
}
Note the Response.TrySkipIisCustomErrors = true line. This little-known property, available from .NET 2 SP1 onward, stops IIS7 from replacing our response with its own custom error page, which otherwise gets us into ugly territory with infinite redirects between our 404 page and IIS’s own.
Summary
So, now you should be able to hit a URL and have it either redirect with a 301 to your target page, or bail out with a 404 status. Get Firebug on the case and see how it goes.
Download Files
Here are the necessary files, with the exception of the web.config settings for handling HTML files noted above. Please note that these have been extracted from a working solution and tweaked in places to make them more generic, without being tested again. There may be bugs; if you find any, please let me know and I’ll update them. It likely won’t run without some changes, and as always this code is provided “as is” and I am not responsible for any problems arising from its use.
Add RedirectHanderPipeline.cs, RedirectHandlerUtils.cs, Settings.cs and Utils.cs to your Sitecore web application
Add DogRedirector.config to your /App_Config/Include folder
Add redirect.xml to your /App_Data/ folder
Update the Settings.cs file with some correct values.
Now head on to Part 2 to find out how to manage all this via Sitecore!
Hi, this is an interesting read. There is a lot of spam out there which relates to SEO and to be completely honest it often goes way over my head! However this succinctly outlines and expands on exactly what is involved in relation to this subject. Very educational indeed.
It’s a crazy workaround but glad you figured it out. Thanks!
A site that I am working on recently re-launched their new site and in doing so, created a lot of 404 pages (321 pages). I have a list of these 404 pages that I compiled through Google Webmaster Tools. If I want to simply redirect these exact pages from the old page to the new page, could I just create an XML file like above and save it here /App_Data/redirects.xml and will this correct the issue? Is there something else I need to do to tell the webserver to look for this file before processing the page? I am new to Sitecore so this is all foreign to me. Thanks for any help.
Yes that’s certainly possible and one of the uses for the script. You need to take all the steps though and create the pipeline or nothing will process the XML file.
This is an awesome solution! I have begun implementing it in my client’s site. The 404 portion is working great. My issue is with the 301 redirects. We are migrating from Teamsite to Sitecore so all of the pages we are migrating from are with a .jsp extension. IIS is not allowing the requests to be passed through. How did you get IIS to allow your static .html files to pass through?
Thanks Nick. I’ve not worked on JSPs with this, but I’d just replicate the configs under the HTML File Handling section above for your jsp extensions. That should create HTTP handlers for that extension and force .NET to process them. Good luck!