SEO - Duplicate Content Waters Down Web Pages

Issues With Duplicate Content

Duplicate Content

Duplicating content on the web is easy to do, both on purpose and inadvertently. Unless you keep a close watch on techniques that can be interpreted as duplicate content, you risk running afoul of search engines and, at the same time, reducing the power of your pages on the SERPs. There are three major reasons to make sure that you are avoiding duplicate content.

Reason #1: It can be interpreted as a violation of a search engine's terms of service

Because most of the content on the web is plain text, it is fairly simple to copy the content of one site and add it to another. Just copy, paste and post. This has led to an incredible amount of duplicate content on the web, created both by spammers and by legitimate web site owners.

Because SERPs wouldn't be very useful if a searcher had to wade through pages upon pages of results displaying the same content, search engines have had to come up with a system to take their best guess as to where the content was first generated.

What happens to the duplicate content?

Simply put, the page with the highest reputation (PageRank) is considered the primary source, and the duplicates are dropped from the search engine's index.

Even though they have systems in place to manage duplicate content, search engines still don't like it. Often, duplicate content implies "content hijacking", where one site plagiarizes the content of another site and claims to be the original owner. It can also mean that a web developer is putting the same content on multiple sites, which search engines consider a form of spamming, since it provides no additional useful information to searchers and makes their robots work harder without additional benefit.

Reason #2: Lost control over the direction of visitors

Because the duplicate content check applies even within a web site, it means that if you have two pages with duplicate content, only one will show up in search engines. Since you have no control over the page that shows up in the SERPs, you have no control over where your visitors end up.

Choosing one page to show to search engines, even if you must duplicate content for usability reasons, ensures that you can control what the visitor sees and what options they have to further explore your site from there.

Reason #3: PageRank dilution

Probably the most important reason to be concerned about duplicate content within your site is the dilution of PageRank.

Remember that a link from one page to another counts as a vote of reputation. If you have a page of content that is duplicated 5 times - for example the page is dynamic and there are several different URLs used for the same page - then reputation is being poured into each of these 5 pages.

The search engines, however, will only recognize one of these and will drop the rest from their SERPs. This means that the reputation is divided 5 ways and only 1/5 of it actually counts. That same page would receive a much more powerful reputation if all of the links to it went to a single URL. Instead of 1/5 of the reputation, it would get the whole enchilada, making it much more of a force to be reckoned with.

Techniques to avoid duplicate content

Duplicating content is usually not done on purpose and mostly happens through the use of query strings. For instance, take the following example:

Article-list.html?page=2&sort_by=article_name&order=ascending&category=newest

For this dynamic web page, you would probably have original content for each different page, and also for each category. However, the two variables "sort_by" and "order" only rearrange the same content to give visitors a different view of it. If there are 6 sort_by options, and "ascending" and "descending" for the order option, there are 6 x 2 = 12 different URLs for each original page of content.

If the reputation is divided equally among these 12 URLs, each one holds only 1/12 of the total. Whichever URL the search engine decides to include in its SERPs, it is only going to have 1/12 of the reputation it could have.
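The arithmetic above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the six sort field names are hypothetical (the example URL names only "article_name"), and the helper canonical_url is an invented name, not part of any library. It drops the two presentation-only variables so that all 12 variants collapse to one URL.

```python
from itertools import product
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: these variables only re-order the listing and never change
# which articles appear on the page.
PRESENTATION_PARAMS = {"sort_by", "order"}


def canonical_url(url: str) -> str:
    """Drop presentation-only variables; keep the others in their original order."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in PRESENTATION_PARAMS
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


# Hypothetical sort options: 6 fields x 2 directions = 12 URL variants.
sort_fields = ["article_name", "author", "date", "rating", "views", "length"]
orders = ["ascending", "descending"]
variants = [
    f"Article-list.html?page=2&sort_by={field}&order={order}&category=newest"
    for field, order in product(sort_fields, orders)
]

canonical = {canonical_url(url) for url in variants}
print(len(variants), "variants collapse to", len(canonical), "URL:", canonical)
```

The variables "page" and "category" are kept because, in this example, they select genuinely different content.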

By employing techniques to rewrite your URLs, you will, as a side effect, cut out most of your duplicate content problems, because you will be removing option-type variables from the query string and moving them into session variables and filenames.

Check your links for duplicates

You can ask for any web page using several URLs. For instance, you can just add a query string at the end of a URL - even if the page doesn't use the variables - and it will be considered a different page by search engines. However, the only way a search engine knows if a page exists is if there is a hyperlink to it. If there are no hyperlinks to a page, it won't be indexed. Also, if there are no hyperlinks to a page, no PageRank will be transferred over to it.

Knowing this, you can see that it is very important that hyperlinks to a page remain consistent. Using only one URL in your hyperlinks for every page you want to be indexed by search engines will ensure that the URL is the recipient of all the PageRank conferred to it by links.
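One way to check a page for inconsistent links is to collect every hyperlink and group the URLs that point at the same page. The sketch below uses only the Python standard library. It assumes, for illustration, that the path alone identifies a page and that the query string is unused (the case described above); the names LinkCollector, page_key and inconsistent_links are hypothetical, as is the sample HTML.

```python
from collections import defaultdict
from html.parser import HTMLParser
from urllib.parse import urlsplit


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def page_key(href: str) -> str:
    # Assumption: the path alone identifies the page, so the query string
    # is ignored. Adjust this if some variables select different content.
    return urlsplit(href).path or "/"


def inconsistent_links(html: str) -> dict:
    """Return pages that are linked to with more than one URL spelling."""
    collector = LinkCollector()
    collector.feed(html)
    spellings = defaultdict(set)
    for href in collector.hrefs:
        spellings[page_key(href)].add(href)
    return {key: sorted(urls) for key, urls in spellings.items() if len(urls) > 1}


sample_html = """
<a href="/articles.html">Articles</a>
<a href="/articles.html?ref=footer">Articles</a>
<a href="/contact.html">Contact</a>
"""
print(inconsistent_links(sample_html))
```

Any page reported here is receiving links under more than one URL, so its PageRank is being split.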

On Your Home Page, Cutting Duplicate Content is a Must

Applying this concept is especially important for your home page. Most external links will link to your home page, and ironically your home page has a wider variety of naming options than any of the other pages on your site, making it easy to slip and link to several of the options throughout a site.

For instance, the following URLs can all return the same home page, yet search engines treat each one as a separate URL:

http://www.bluewidgets.com/
http://www.bluewidgets.com
https://www.bluewidgets.com/
http://bluewidgets.com/
https://bluewidgets.com/
http://www.bluewidgets.com/index.html
http://bluewidgets.com/index.html

There are an infinite number of others, but the examples above illustrate most of the common mistakes developers make when linking to the home page.
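A small helper can rewrite every home page variant to the one URL you have chosen before a link is written into a page. The following is a minimal sketch, not a definitive implementation: the preferred scheme and host, the set of known hosts, and the default document name are all assumptions you would replace with your own site's values, and preferred_url is a hypothetical name.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumptions for illustration: pick whichever scheme and host your site
# actually serves, and list the default documents your server uses.
PREFERRED_SCHEME = "http"
PREFERRED_HOST = "www.bluewidgets.com"
KNOWN_HOSTS = {"bluewidgets.com", "www.bluewidgets.com"}
DEFAULT_DOCUMENTS = {"index.html"}


def preferred_url(url: str) -> str:
    """Rewrite any variant of one of our own URLs to its preferred form."""
    parts = urlsplit(url)
    if parts.netloc.lower() not in KNOWN_HOSTS:
        return url  # external link: leave it untouched
    path = parts.path or "/"  # "http://host" and "http://host/" are the same page
    directory, _, last_segment = path.rpartition("/")
    if last_segment in DEFAULT_DOCUMENTS:
        path = directory + "/"  # "/index.html" becomes "/"
    return urlunsplit(
        (PREFERRED_SCHEME, PREFERRED_HOST, path, parts.query, parts.fragment)
    )


home_variants = [
    "http://www.bluewidgets.com/",
    "http://www.bluewidgets.com",
    "https://www.bluewidgets.com/",
    "http://bluewidgets.com/",
    "https://bluewidgets.com/",
    "http://www.bluewidgets.com/index.html",
    "http://bluewidgets.com/index.html",
]
print({preferred_url(url) for url in home_variants})
```

All seven variants from the list above collapse to a single URL, so every internal link to the home page passes its reputation to the same place.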

Using rel="nofollow" to avoid duplicate content

Adding the rel="nofollow" attribute to a hyperlink tells search engines not to follow the link, so no PageRank is conferred on its target. In addition to using this attribute for outbound links, you can also use it for internal links.

Employing the rel="nofollow" attribute in hyperlinks to duplicate content should knock out the majority of the remaining PageRank leaks. Once you choose a single URL for the standard hyperlinks, you can then add the rel="nofollow" attribute to any link that goes to a different, duplicate URL.
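If your pages are generated by a script, this rule can be applied where the links are written. The sketch below assumes you already know the single preferred URL for the page being linked to; render_link is a hypothetical helper, and the URLs reuse the article list example from earlier in this lesson.

```python
from html import escape


def render_link(href: str, text: str, preferred_href: str) -> str:
    """Build an <a> tag, adding rel="nofollow" when href is a duplicate URL."""
    # Only the preferred URL gets a plain link; every other spelling of the
    # same page is marked so that it is not followed.
    rel = "" if href == preferred_href else ' rel="nofollow"'
    return f'<a href="{escape(href, quote=True)}"{rel}>{escape(text)}</a>'


preferred = "Article-list.html?page=2&category=newest"
sorted_view = "Article-list.html?page=2&sort_by=article_name&order=ascending&category=newest"

print(render_link(preferred, "Newest articles", preferred))
print(render_link(sorted_view, "Sort by name", preferred))
```

The first link is written normally, while the second, which shows the same content in a different order, carries rel="nofollow".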

Lesson Summary - Duplicate Content

In this lesson, we covered how to avoid duplicate content. Most of the time duplicate content is not generated on purpose, so you have to take care to ensure that you are not accidentally creating multiple pages where you should have only one.

Duplicating content can violate a search engine's terms of service, cause you to lose control of your visitors, and also has the unfortunate effect of diluting PageRank. By using search engine friendly URLs and making sure all of your hyperlinks point to a single URL for each page of content, you can avoid most duplicate content traps.

Avoiding duplicate content is one of your best tools for really honing the power of each one of your pages.

