Dynamic Pages and Search Engine Rankings
It's not necessarily the CMS itself that causes the problems, but the way it forms URLs in the browser, which tends to negatively impact positioning.
In most cases, URLs are formed something like:
http://www.somesite.com/default.asp?sessionid=sdf43fase33r&variable=123&productid=4323&pagename=44&template=22
As you can see, this CMS relies on a series of variables to display the page. This is where problems tend to develop.
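To make the problem concrete, here is a minimal sketch that parses the example URL above and enumerates its query-string variables, exactly as a crawler would have to. The URL and its parameter names are the illustrative ones from this article, not a real site:

```python
from urllib.parse import urlparse, parse_qs

# The hypothetical CMS URL from the example above (values are illustrative only)
url = ("http://www.somesite.com/default.asp"
       "?sessionid=sdf43fase33r&variable=123&productid=4323"
       "&pagename=44&template=22")

# parse_qs maps each variable name to its list of values
params = parse_qs(urlparse(url).query)

print(sorted(params))  # ['pagename', 'productid', 'sessionid', 'template', 'variable']
print(len(params))     # 5 query-string variables the page depends on
```

Five variables, two of them ending in "id" — both properties, as discussed below, work against the page.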
We are always being asked the effect on search engine rankings when using a CMS of this type. When we say that they tend to perform poorly, the first response is almost always "but I always see pages like this ranking."
While this may be true in some cases, we have noticed that in very competitive keyword markets, static pages consistently outrank dynamic ones. It is not that the spiders can't crawl the pages; it's just that they don't perform as well in the search engines.
But before we get into the rankings, let's take a look at what potential barriers there are to indexing the URL listed above.
The first major roadblock that jumps out at me is the sessionid as the first variable. While for most sites the sessionid is critical to tracking users, in many cases a spider will not index such a page. The reason is that spiders understand that simply changing the sessionid may or may not change the page. Because of the dynamic and unique nature of a sessionid, a single page URL can effectively have many variations, even if the body text doesn't change. The danger is that a spider can get caught within the site (also known as a spider trap), attempting to index it but continually being served pages with different sessionids.
What happens is that, even on a site with only a few pages, the spider sees unique URLs because the sessionid changes value. While the value won't change within a single visit, spiders typically take multiple visits to index an entire site. Each visit therefore generates a new sessionid, effectively making every page look different to the crawler. A site with a few pages can then present many hundreds of thousands of URLs to a crawler. This creates the potential for the spider to get caught in a never-ending loop: its attempts to fully index the site never finish, and it gets stuck on the site.
This is why crawlers don't like sites with any type of "id" in the URL.
But as you can see, simply moving or removing the sessionid doesn't solve the problem, because there is also a productid. The site risks being skipped entirely, because the spider will see an "id" in the query string and may not request the page.
The next problem I see is the number of variables. In general, most crawlers will only effectively index a site that uses two or three variables. With the sessionid variable, this site has five, which means there is a good chance the pagename and template variables will not be requested when indexing happens. Since the site appears to rely on the template variable to display the page, the crawler would likely receive a 404 error page because the page wouldn't render properly.
So essentially, you have a site which is not indexable by a search engine spider and therefore won't rank for anything. So how do you fix it?
Assuming that you can play with the URL string without breaking something, there are solutions.
The first would be to implement some sort of IP recognition software, which checks the user's IP address before serving pages. You could then define rules that say "when an IP belonging to a spider visits, serve the page without the sessionid." This not only eliminates one variable, it also helps get spiders into the site, because they never see the sessionid at all.
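The rule described above can be sketched in a few lines. This is a toy illustration under stated assumptions: the spider IPs are invented placeholder values, and `build_url` is a hypothetical helper, not part of any real IP-recognition product:

```python
# Hedged sketch of the IP-recognition idea: before attaching a sessionid,
# check the visitor's IP against a (hypothetical) list of known spider IPs.
KNOWN_SPIDER_IPS = {"66.249.66.1", "157.55.39.1"}  # illustrative values only

def build_url(client_ip, sessionid="sdf43fase33r"):
    base = ("http://www.somesite.com/default.asp"
            "?variable=123&productid=4323&pagename=44&template=22")
    if client_ip in KNOWN_SPIDER_IPS:
        return base  # spiders are served the URL without any sessionid
    # ordinary visitors keep their sessionid as the first variable, as before
    return base.replace("?", f"?sessionid={sessionid}&")

print(build_url("66.249.66.1"))   # no sessionid variable for the spider
print(build_url("203.0.113.7"))   # sessionid first, as before, for a normal user
```

In practice the lookup would be maintained against published crawler IP ranges rather than a hard-coded set, but the branching logic is the whole idea.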
The next solution would be to use a URL rewriter to change the construction of the URL on the fly before sending it to the browser. Depending on how many variables you have, and how flexible the URL rewriter is, this can have a dramatic impact on search engine rankings.
Looking at the same URL: http://www.somesite.com/default.asp?sessionid=sdf43fase33r&variable=123&productid=4323&pagename=44&template=22
If we were able to remove the sessionid using IP recognition, the URL would now look like:
http://www.somesite.com/default.asp?variable=123&productid=4323&pagename=44&template=22
Now, if we can implement a URL rewriter, we can look at what variables are needed. Perhaps the template variable and the pagename variable are universal, therefore we can effectively combine them.
Finally, if we replace "=" with "-", for example, we can rewrite the URL to be:
http://www.somesite.com/site-variable-123-product-4323/default.asp
As you can see above, "site" replaces the pagename and template variables, and "-" replaces the "=" and "&" symbols. Instead of a page with five dynamic variables that the crawler would have problems with, we now have a page that appears static, and we have moved it to within one folder of the site root, giving it much more authority. This should in turn translate into much higher rankings, making the site more competitive, especially in those high-traffic keyword markets.
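The transformation above can be sketched as a small rewriting function. This is a hedged illustration of the mapping described in this article, not a real rewriter: the "site" segment name and the decision to fold pagename and template into it are the assumptions stated above:

```python
from urllib.parse import urlparse, parse_qs

# Hedged sketch of the rewrite described above: collapse the pagename and
# template variables into a single "site" segment, and turn the remaining
# variables into a hyphenated folder one level below the root.
def rewrite(url):
    parsed = urlparse(url)
    q = {k: v[0] for k, v in parse_qs(parsed.query).items()}
    # pagename and template are assumed universal, so "site" stands in for both
    folder = f"site-variable-{q['variable']}-product-{q['productid']}"
    return f"{parsed.scheme}://{parsed.netloc}/{folder}{parsed.path}"

dynamic = ("http://www.somesite.com/default.asp"
           "?variable=123&productid=4323&pagename=44&template=22")
print(rewrite(dynamic))
# http://www.somesite.com/site-variable-123-product-4323/default.asp
```

In production this mapping would live in the web server (for example as rewrite rules) rather than application code, and it must be reversible so the server can recover the original variables from the folder name.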
Of course, there are other considerations when implementing a URL rewriter: the site navigation and any other hyperlinks need to be updated to reflect the new URLs, and product pages will need to follow the same URL structure in order to display properly.
But proper planning can ensure that the URL rewriter (and optionally, IP recognition software) can be implemented with little impact.
About the author:
Find more articles at www.searchengineoptimizationworld.com