Keep in mind, however, that how you ultimately handle URL rewriting depends largely on where you are relocating to, and how you plan to get there. For example, if you've decided to make the switch to an Apache-based hosting solution, your options will be very different from those for an IIS-based one.
Because of some factors not directly related to the blog, I went down the shared-hosting-IIS road, which presented a set of unique challenges. In my case, I had some pretty particular requirements:
- Rewrite old URLs in the most search engine friendly way possible.
- Not break any existing external links.
- Eliminate any lasting dependency on the Community Server database.
- Implement truly "pretty" URLs, so that I don't ever have to go through this process again.
- Continue to use shared hosting.
- Avoid having to inspect every incoming request regardless of whether it needed to be rewritten or not.
- Retain a single point of management for application hosting, regardless of the underlying technologies.
The next problem I ran into was the availability of a ready-made, pattern-based URL rewriting tool. Apache's mod_rewrite module wasn't an option. IIS7, when finally released in RTM form, will include a native URL rewriter much like it, but because this is presently a CTP add-on, most hosting providers haven't installed it yet. ISAPI-based solutions like ISAPI Rewrite or the freeware IIRF component are also options, but it's rare to find these in a shared hosting environment, and IIRF isn't IIS7-ready anyhow.
I finally realized that I was going to have to write some code to get what I wanted. Taking some cues from Dave Bost's post on the subject, here's the high-level approach I followed:
First, I installed the Remove index.php from Permalinks in IIS plug-in to get rid of the "index.php" in my WordPress URLs. You may need to regenerate your permalinks in WordPress using the admin console before you get much further, since you'll need them to be production ready for the mapping process to work.
Next I saved the output of the Google sitemap that is dynamically generated by Community Server as an XML document. In my case, this represented almost all of the URLs I had to worry about. If you're skilled in regex and/or SQL-XML, this file will come in very handy.
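As a rough illustration of why the saved sitemap is so useful, here is a minimal sketch that pulls the site-relative paths out of a sitemap document. It is written in Python purely for illustration and is not the code from the download; the function name `sitemap_paths` and the sample document are hypothetical, and it matches `loc` elements by local name because the exact namespace Community Server emits is an assumption I'd rather not bake in.

```python
import xml.etree.ElementTree as ET

def sitemap_paths(xml_text, root_url):
    """Return the path (relative to root_url) of every <loc> entry."""
    root = ET.fromstring(xml_text)
    paths = []
    for element in root.iter():
        # Compare on the local tag name so any sitemap namespace is accepted.
        if element.tag.rsplit("}", 1)[-1] != "loc":
            continue
        url = (element.text or "").strip()
        if url.startswith(root_url):
            paths.append(url[len(root_url):])
    return paths

# Hypothetical sample in the sitemaps.org style, not real site data.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://oldsite/archive/2008/06/29/some-post.aspx</loc></url>
  <url><loc>http://oldsite/1234.aspx</loc></url>
</urlset>"""

if __name__ == "__main__":
    for path in sitemap_paths(SAMPLE, "http://oldsite/"):
        print(path)
```

The resulting list of old paths is the raw material for the mapping work described further down.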
At this point, I started poking around in the Community Server database, and realized something I hadn't paid much attention to before. Because I had upgraded through the years from DotText, the format of my links was inconsistent. Some used the pseudo-pretty Community Server format, while some were still in the ugly DotText format.
After a bit of quick testing, I also discovered the method used to escape special characters in a URL is different in Community Server than it is in WordPress, and that Community Server allows multiple URL formats for the same content. Even though the latter probably wouldn't have a huge impact on SEO (since the alternate formats didn't appear in the sitemap output), I didn't want to risk breaking an external link somewhere. In the end, I decided to handle the following Community Server link formats, which meant three possible inbound matches for each WordPress post:
<!-- CS 2007 format -->
http://oldsite/archive/YYYY/mm/dd/some-interesting-post-title.aspx
<!-- some old CS format I think -->
http://oldsite/archive/YYYY/mm/dd/1446.aspx
<!-- legacy DotText format -->
http://oldsite/1234.aspx
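To make the three formats concrete, here is a hedged sketch of how they could be told apart with regular expressions. The patterns, the labels, and the `classify` helper are my own assumptions for illustration (in Python, not the language of the actual rewriter); the real URLs may contain variations these patterns don't cover.

```python
import re

# Hypothetical patterns, matched against the path with the site root removed.
# The numeric-ID pattern is checked before the slug pattern, because a bare
# number would otherwise also satisfy the more permissive slug pattern.
DATED_ID = re.compile(r"^archive/\d{4}/\d{2}/\d{2}/\d+\.aspx$", re.IGNORECASE)
DATED_SLUG = re.compile(r"^archive/\d{4}/\d{2}/\d{2}/[^/]+\.aspx$", re.IGNORECASE)
DOTTEXT_ID = re.compile(r"^\d+\.aspx$", re.IGNORECASE)

def classify(path):
    """Return a label for the old link format, or None if nothing matches."""
    path = path.lstrip("/")
    for label, pattern in (
        ("dated-id", DATED_ID),
        ("dated-slug", DATED_SLUG),
        ("dottext-id", DOTTEXT_ID),
    ):
        if pattern.match(path):
            return label
    return None

if __name__ == "__main__":
    print(classify("archive/2008/06/29/atlanta-scrum-user-group-formed.aspx"))
    print(classify("archive/2008/06/29/1446.aspx"))
    print(classify("1234.aspx"))
```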
Once I had determined the link formats I was going to support, I grabbed the output from my WordPress Google sitemap. Again, you may need to regenerate this manually, since your WordPress permalinks probably changed after installing the permalink plugin.
After a few hours of some creative SQL-XML and regex work on the Community Server database, the Community Server sitemap, and the WordPress sitemap, I had an XML document that I could use to build an in-memory dictionary to map the old URLs. The schema looks something like this:
<urls oldUrlRoot="http://kriscargile.com/" newUrlRoot="http://www.kristophercargile.com/">
  <url sourceType="post">
    <new>2008/06/29/atlanta-scrum-users-group-formed/</new>
    <old>archive/2008/06/29/atlanta-scrum-user-group-formed.aspx</old>
  </url>
</urls>
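For readers who want to see the schema in action, here is a minimal sketch of reading a file in this shape into a dictionary keyed by old path. It is Python for illustration only, not the code in the download. The `load_mapping` name is hypothetical, and two behaviors are assumptions on my part: allowing several `<old>` elements per `<url>`, and lower-casing the keys so lookups are case-insensitive.

```python
import xml.etree.ElementTree as ET

# Same shape as the schema shown above.
MAPPING_XML = """<urls oldUrlRoot="http://kriscargile.com/" newUrlRoot="http://www.kristophercargile.com/">
  <url sourceType="post">
    <new>2008/06/29/atlanta-scrum-users-group-formed/</new>
    <old>archive/2008/06/29/atlanta-scrum-user-group-formed.aspx</old>
  </url>
</urls>"""

def load_mapping(xml_text):
    """Return (old_root, new_root, {old_path: new_path})."""
    root = ET.fromstring(xml_text)
    mapping = {}
    for url in root.findall("url"):
        new_path = (url.findtext("new") or "").strip()
        # Assumption: a <url> may carry more than one <old> element.
        for old in url.findall("old"):
            old_path = (old.text or "").strip().lower()
            if old_path:
                mapping[old_path] = new_path
    return root.get("oldUrlRoot"), root.get("newUrlRoot"), mapping

if __name__ == "__main__":
    old_root, new_root, mapping = load_mapping(MAPPING_XML)
    for old_path, new_path in mapping.items():
        print(old_root + old_path, "->", new_root + new_path)
```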
I decided to represent the old and new root paths as attributes, since this greatly simplified local testing, and allowed me to use the newUrlRoot value as a fallback for an invalid or deleted URL.
In the Application_Start event handler in Global.asax, the XML file is parsed into a cached Hashtable. A cache dependency on the XML file allows for easy tweaks without the need to bounce IIS. Each inbound request then passes through Application_BeginRequest, which fires at the start of the ASP.NET pipeline for every client request. The handler checks whether the old URL exists in the Hashtable and, if so, where the subsequent 301 should point.
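The per-request lookup boils down to something like the following sketch. Again, this is Python for illustration rather than the Global.asax code itself; `redirect_target` is a hypothetical name, and the key normalization (stripping the leading slash and query string, lower-casing) is an assumption. The returned value is what would go in the Location header of the 301 response.

```python
def redirect_target(path, mapping, new_root):
    """Return the absolute URL an old-site request should be redirected to."""
    # Assumed normalization: drop any query string and leading slash,
    # and compare case-insensitively.
    key = path.split("?", 1)[0].lstrip("/").lower()
    new_path = mapping.get(key)
    if new_path is None:
        # Invalid or deleted URL: fall back to the new site root.
        return new_root
    return new_root + new_path

if __name__ == "__main__":
    mapping = {
        "archive/2008/06/29/atlanta-scrum-user-group-formed.aspx":
            "2008/06/29/atlanta-scrum-users-group-formed/",
    }
    new_root = "http://www.kristophercargile.com/"
    print(redirect_target("/archive/2008/06/29/atlanta-scrum-user-group-formed.aspx", mapping, new_root))
    print(redirect_target("/no-such-page.aspx", mapping, new_root))
```

Keeping the lookup a plain dictionary hit means the cost per request stays small even though every request passes through the handler.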
Finally, I replaced the old Community Server website with the new Global.asax file. Once everything was tested and verified, I backed up my old Community Server database and dropped it.
The source code for the rewriter is below. You'll need to generate your own content for the XML mapping file, though a stub is included.
If you decide to use or extend it, I'd appreciate your feedback.
klc;
cs2wp-url-rewriter.zip
5 comments:
Use this flag to prevent the currently rewritten URL from being rewritten farther by following rules. Content Rewriter
Good site I "Stumbledupon" it today and gave it a stumble for you.. looking forward to seeing what else you have..later
And there is what some alternative? ;)
Think twice before you go down this road: http://www.kriscargile.com/2008/08/wordpress-on-iis7-revisited.html