SEO - Web Page File Names are Important. Really!
URL Filenames Count
This simple lesson will work wonders when it comes to fine-tuning your search engine optimization strategy.
We have already covered how choosing the right title and headings for your web page will give the search engine robots a clear idea about what your page is about. One commonly overlooked, yet very powerful method for giving your keywords more coverage is the use of your web page URL.
What Web Page File Name Did Widget? (groan...)
Say for instance that you have a product web page where you sell Round Blue Widgets. You have selected the title "Round Blue Widgets - Widgets Online Inc.", and have placed the target keyword, "Round Blue Widgets" in two levels of headings, as well as a couple of paragraphs. Perfect. Now it's time to select a filename using the same keywords you are targeting for the rest of the page, say something like:
roundbluewidgets.html
Good, we have the keywords in there. The next step is to separate the keywords to improve the chances of a search engine finding them in the name. The best character to use for this in my experience has been a simple dash. So now your filename will look like this:
round-blue-widgets.html
A fringe benefit of naming your files this way is that it makes the filenames more descriptive and easier to read, which is in turn a good indicator to visitors as to what the page is about.
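As a rough illustration, the naming rule above can be sketched in a few lines of Python. The function name keyword_filename is hypothetical, and lower-casing the words is an assumption of this sketch rather than a requirement stated above.

```python
def keyword_filename(keyword_phrase, extension=".html"):
    """Turn a keyword phrase into a dash-separated, lower-case filename."""
    # split() with no arguments also collapses repeated spaces
    words = keyword_phrase.lower().split()
    return "-".join(words) + extension


print(keyword_filename("Round Blue Widgets"))  # round-blue-widgets.html
```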
About Dynamic Filename Extensions
One of the biggest problems search engines have to face is filtering spam from their search results. Because search engines provide an excellent venue for generating traffic, there are a lot of parties interested in discovering formulas that get their sites to the top. Many of these sites exist only for search engines, and lots of them generate dummy content at a head-spinning rate.
With a simple script, a web developer can easily generate tens of thousands of web pages almost instantly. Using this technology, an infinite number of pages can also be generated "dynamically", meaning that the page does not actually exist until a visitor or robot visits it.
Search engines have learned how to identify files that are likely to have been generated dynamically by spammers by their filename extensions. For instance, a .php or .asp filename extension implies that the page was created dynamically. A .html or .htm filename extension implies a "static" page, one that was created by hand rather than by a program.
Spammers Ruined Dynamic Filenames for Legitimate Web Developers
Unfortunately, the tools that spammers use to generate this dynamic spam content are the same tools that legitimate web developers use to manage any significant amount of information on their sites. If your site relies on a dynamic technology such as PHP, ASP, ColdFusion or ASP.NET to generate web pages, then your filenames will, by default, end in a dynamic filename extension, such as .php or .asp.
There are tools available - depending on the technology you use - that will allow you to use a different file extension for your scripts, namely .html or .htm. By doing so, you "fool" the search engine robots into thinking that your page is not generated dynamically. This takes a bit of technical knowledge, but it is better to start out with the right extension than to change it after your pages are established with the search engines. I discuss these tools later in this chapter under Using URL Rewrite Tools.
Don't worry, the Search Engine Spiders don't mind
While I use the term "fool" in this case, search engines do not consider this to be a negative technique. In other words, you will not be penalized for using static filename extensions on dynamic pages.
About Query Strings in URLs
A query string is typically used to pass information from one page to another through the URL. Here is a practical example, which displays the content of an article:
article.html?id=3
This example tells the page that the value of a variable named "id" is 3. The page can then use this information for processing. For dynamic pages - pages that are created by a script "dynamically" when the visitor visits them - it gives the script essential information that will tell it what to display, and often how to display it.
Query strings are typically used for database web design
For instance, the example above likely pulls an article from a database in which each article has a corresponding number. The URL above will pull article number 3 from the database.
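To make this concrete, here is a minimal sketch of how a script might read the "id" variable from that URL and use it to fetch an article. Python is used only for illustration; the ARTICLES dictionary is a made-up stand-in for a real database, and the function name is hypothetical.

```python
from urllib.parse import urlsplit, parse_qs

# Stand-in for an article database, keyed by article number (assumption).
ARTICLES = {3: "Sample article number three"}


def article_id_from_url(url):
    """Read the 'id' query string variable and return it as a number."""
    query = urlsplit(url).query      # "id=3"
    params = parse_qs(query)         # {"id": ["3"]}
    return int(params["id"][0])


article_id = article_id_from_url("article.html?id=3")
print(article_id, "->", ARTICLES[article_id])
```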
Search engines can index web pages with query strings, but typically with limitations. Query strings can become very long when multiple variables are passed. Take the following URL, which generates a list of articles according to certain variables in the query string:
article-list.html?page=2&sort_by=article_name&order=ascending&category=newest
The more variables a query string has (the one above has four), the less likely it is that the page will be indexed by a search engine, and the less value the search engine will place on it. A query string indicates that a page is generated dynamically, which means that the site could feasibly have hundreds of thousands of pages. Some of those could be spam, and many could be dummy pages without any useful content.
For example, the URL above uses the variable "sort_by" to tell the page how to order the list on the page. This example sorts it by "article_name". This indicates that there are other ways to sort the articles, such as by "article_date" or by "author". The "page" variable indicates that there are multiple pages. With the four variables in this example, you could generate a hundred or more URLs that all had the same content, each simply organizing the information slightly differently.
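The arithmetic is easy to check with a short sketch. The values below are assumptions made up for illustration; only "article_name", "article_date", "author", "ascending" and "newest" come from the example above.

```python
from itertools import product
from urllib.parse import urlencode

# Assumed possible values for each query string variable.
options = {
    "page": [1, 2, 3, 4, 5],
    "sort_by": ["article_name", "article_date", "author"],
    "order": ["ascending", "descending"],
    "category": ["newest", "popular", "featured", "archive"],
}


def all_urls(base, options):
    """Build every URL that the combinations of variables can produce."""
    names = list(options)
    return [
        base + "?" + urlencode(dict(zip(names, combo)))
        for combo in product(*options.values())
    ]


urls = all_urls("article-list.html", options)
print(len(urls))  # 5 * 3 * 2 * 4 = 120
print(urls[0])
```

Even with these modest assumptions, one list page turns into 120 distinct URLs that show the same articles in a different order or position.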
If query strings are so useful, why don't search engines like them?
Perhaps because of this possibility, search engines are reluctant to index pages with long query strings, and the engines don't rank the pages as highly as they would without the query strings.
The solution to this problem takes some technical knowledge, but it is knowledge easily gained with a little surfing on the web. The goal is to get your pages looking like they are not generated dynamically, but still pass the necessary information required to display the page.
The Solution: Use session variables when you can
Session variables are variables stored on a server - the computer that stores and runs the web site - that allow a visitor to keep certain settings throughout their visit. For instance, with the URL example above, session variables can be used to store the "sort_by", "order", and "category" variables, leaving only one variable that has to be passed through the query string. The server keeps track of these variables, allowing you to pare down your query string significantly.
The exact details of implementing session variables vary depending on the technology you are using, but a quick search on the web for "session variable" alongside the scripting technology you choose should lead you to the information you need.
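The idea can be sketched without any particular web technology. In the example below, a plain dictionary stands in for the server's session storage; the names SESSIONS, DEFAULTS and list_settings are hypothetical, and a real site would use the session facility of its own scripting technology instead.

```python
# In-memory stand-in for server-side session storage (assumption).
SESSIONS = {}

# Assumed default settings for a visitor who has not chosen anything yet.
DEFAULTS = {"sort_by": "article_name", "order": "ascending", "category": "newest"}


def list_settings(session_id, page, **changes):
    """Return the full settings for a request that only carries 'page'."""
    session = SESSIONS.setdefault(session_id, dict(DEFAULTS))
    session.update(changes)  # remember any setting the visitor changed
    return {"page": page, **session}


# The visitor picks a sort order once...
list_settings("visitor-1", page=1, sort_by="article_date")
# ...and later requests only need article-list.html?page=2
print(list_settings("visitor-1", page=2))
```

The second call passes only the page number, yet the script still knows how to sort the list because the server remembered the earlier choice.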
Using URL Rewrite Tools
Now the only thing left is to move that one remaining variable out of the query string and into your filename. This can be accomplished using a tool called mod_rewrite, which is available for the Apache web server (commonly used for PHP pages), or one of several rewrite tools available for Microsoft's IIS server, which runs ASP and ASP.NET pages.
What mod_rewrite and the others do is allow you to display one URL to the client while using a different one for the server. This tool can be used to incorporate variables into filenames on the client, but still allow the server access to them just as it would if the variables were in a query string.
A Practical URL Rewrite Example
Let's use the article list example. Having moved three of the query string variables over to session variables, we have one left that we need to move in order to eliminate the query string altogether: the "page" variable.
Right now the URL looks like this:
article-list.html?page=2
But with a rewrite tool, we can change the URL to look something like this:
2p-article-list.html
How to actually rewrite the URL
The rewrite tool uses a standard called "regular expressions" to tell the web server how to re-interpret the URL. For instance, the following is the configuration that tells an Apache web server how to interpret the search engine friendly link above and change it into a form it can use directly:
RewriteRule ^([0-9]+)p-article-list\.html$ article-list.php?page=$1
It may look complex at first, but in fact the line above is very simple. This example tells the mod_rewrite tool to look for a URL that starts with one or more digits (0 through 9), followed by "p-article-list.html". The backslash before the dot makes it match a literal dot rather than any character. If mod_rewrite finds a match, it tells the Apache web server that what it is really looking for is the URL "article-list.php?page=" followed by that number.
When you write the script that deals with the page variable, it can then use the "page" variable exactly as it would if it were in the query string.
Note that this example also illustrates how you can convert a dynamic filename extension (.php) to a static one (.html) using mod_rewrite.
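To see what the pattern does without touching a server, the same regular expression can be tried out in Python. This only imitates the rule for testing purposes; it is not how mod_rewrite itself is implemented, and rewrite is a hypothetical helper name. The dot is escaped here so that it matches only a literal dot.

```python
import re

# Same pattern as the rewrite rule: digits, then "p-article-list.html".
PATTERN = re.compile(r"^([0-9]+)p-article-list\.html$")


def rewrite(path):
    """Return the internal URL for a friendly path, or None if no match."""
    match = PATTERN.match(path)
    if match is None:
        return None
    return "article-list.php?page=" + match.group(1)


print(rewrite("2p-article-list.html"))  # article-list.php?page=2
print(rewrite("about.html"))            # None
```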
A simple Apache URL rewrite example
While URL rewriting is a fairly technical search engine optimization technique, you don't need to know everything about the technology in order to use it. In fact, if you are using PHP and the Apache web server, you should be able to use the example above for most of your scripts. Just replace the filenames with the ones you want to use.
If you need to pass two variables in your filename, say the "page" and the "category" (both assumed here to be numbers), you can use the following example (the entire code is a single line):
RewriteRule ^([0-9]+)p-([0-9]+)c-article-list\.html$ article-list.php?page=$1&category=$2
A Practical URL Rewrite Example for a News-Driven Web Site
Note: In the following example, I assume that you have some background knowledge about how a database-driven web page works.
A website that manages news content will likely be database-driven, making it a prime candidate for keyword-rich URL rewriting.
For this example, I will be building two web pages. The first, article-list.php, will generate a list of search engine optimized hyperlinks to articles. The second page, article.php, will grab the necessary information from an article database and display it.
Rewriting URLs in Three Parts
The final URL of the article pages will need three pieces of information in it. First, it will need a unique identifier, which will tell the article.php page which article to fetch from a database. Typically, this will be the "primary key" in the article database. A primary key is a column in a database that ensures there is only one "key" - typically a number - for each article. That way, you can ask for article "243" without running the risk of there being two articles labeled "243".
The second part will be a unique set of characters that will identify the URL as belonging to the article.php page. Since the URL won't be named article.php anymore, you need something unique in the URL to tell the server that it is really asking for article.php. In this case, the characters "a-" will do the trick.
The final part is going to be the keywords that will give your page the SERP boost. You will need to decide where the keywords in the URL will come from. The most accessible source is going to be the title of the article, since it likely contains the most essential keywords of the article. You can also use a more complex system based on subject areas and create a spider of your own to determine what the most important keywords in an article are, but using the title is often going to be the easiest and most effective source.
Step-By-Step URL Rewriting
So now that we know where all the information in the URL is going to come from, it's time to put it all together.
The script for article-list.php will need to accomplish the following tasks:
- It will grab the primary key and the title of every article in the database.
Next, it will go through each article's information and do the following:
- Convert all of the characters to lower case (this just makes the URLs consistent and recognizable)
- Remove all non-alphanumeric (letters and numbers) characters except dashes and spaces from the article title.
- Then, it will replace all of the spaces with dashes.
- Next, it will tie together the primary key, the page identifier and the formatted keywords.
- Finally, it will add the URL to the list of articles in the article-list.php page.
Say for instance we have an article with a primary key of "34" and the title "The Total Search Engine Optimization Solution". The script should generate a URL that looks like the following:
34a-the-total-search-engine-optimization-solution.html
The full hyperlink will look something like this:
<a href="34a-the-total-search-engine-optimization-solution.html">The Total Search Engine Optimization Solution</a>
Look at all those keywords in the URL. Not bad!
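The steps above can be sketched as follows. This is a minimal illustration in Python rather than the article-list.php script itself; the function names are hypothetical, and collapsing runs of spaces or dashes into a single dash is a small addition to the steps listed.

```python
import html
import re


def article_url(primary_key, title):
    """Build a keyword-rich URL from an article's primary key and title."""
    slug = title.lower()                     # consistent lower case
    slug = re.sub(r"[^a-z0-9 -]", "", slug)  # keep letters, digits, spaces, dashes
    slug = re.sub(r" +", "-", slug.strip())  # spaces become dashes
    slug = re.sub(r"-+", "-", slug)          # avoid doubled dashes
    return f"{primary_key}a-{slug}.html"     # key + "a-" identifier + keywords


def article_link(primary_key, title):
    """Return the full hyperlink for the article list page."""
    url = article_url(primary_key, title)
    return f'<a href="{url}">{html.escape(title)}</a>'


print(article_url(34, "The Total Search Engine Optimization Solution"))
print(article_link(34, "The Total Search Engine Optimization Solution"))
```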
Getting the Unique ID out of the URL
Now, the final thing we need to do is create a URL rewrite rule that will take the information from the URL, and send the right bits to the article.php page. Here is an example for the Apache web server:
RewriteRule ^([0-9]+)a-(.*)\.html$ article.php?id=$1
What the rule above says essentially is: "Look for a number before the characters 'a-', followed by anything and ending in '.html'. Take the number part, and send it to article.php as a variable called 'id'."
Then, all we need to do to get the information we need for the article page is to look at what the variable "id" is, and grab the corresponding information from the database.
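As a quick check of that logic, the sketch below applies the same pattern in Python and looks the result up in a dictionary that stands in for the article database. The names ARTICLE_RULE, ARTICLES, article_id and fetch_title are hypothetical, and the dot before "html" is escaped so that it matches only a literal dot.

```python
import re

# Same pattern as the rewrite rule: number, "a-", anything, ".html".
ARTICLE_RULE = re.compile(r"^([0-9]+)a-(.*)\.html$")

# Stand-in for the article database, keyed by primary key (assumption).
ARTICLES = {34: "The Total Search Engine Optimization Solution"}


def article_id(path):
    """Extract the primary key from a friendly URL, or None if no match."""
    match = ARTICLE_RULE.match(path)
    return int(match.group(1)) if match else None


def fetch_title(path):
    """Look up the article that the friendly URL points to."""
    return ARTICLES.get(article_id(path))


print(fetch_title("34a-the-total-search-engine-optimization-solution.html"))
```

Note that only the number matters for the lookup; the keywords after "a-" are there for search engines and visitors, so "34a-anything.html" would fetch the same article.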
Just remember that every time you create a hyperlink to the article pages, the link should either be hard-coded to match the generated URL, or it should use the same process to generate the URL dynamically.
Lesson Summary - URL Rewriting
In this lesson, we delved into the nitty-gritty of optimizing URLs. We covered techniques such as using static filename extensions instead of dynamic ones, using session variables instead of query strings, and finally moving query strings into the filename using URL rewrite tools.
When you employ all of the methods above, a search engine will never overlook a page because its query string is too long. And by condensing a URL so that it consists mostly of keywords, you make the URL a much more powerful indicator to search engines of what the page is about.










