First of all, for those who are just getting their feet wet: what is duplicate content? Duplicate content is just that: text that is the same, word for word, on different pages. It is mostly found across different websites (it wouldn't make much sense to duplicate your own pages on purpose). We distinguish between external duplicate content (the same content on other websites) and internal duplicate content (the same content on more than one page of a single website). The bottom line is that you are not penalized for duplicate content unless your entire website was built around copied content. This comes straight from the horse's mouth: Adam Lasnik, Search Evangelist/Webmaster Communications expert at Google. Why? Read on to find out.
External duplicate content raises more flags than internal duplicate content does. Because it appears on two or more different websites, the assumption is that the original was copied by a different person or entity. With internal pages, duplicate content is usually unintentional, and the assumption is that the author of every copy is one and the same.
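As a rough illustration of the word-for-word definition and the internal/external distinction, the sketch below groups pages whose normalized text is identical, then labels each matching pair by whether the two URLs share a host. It is a simplified, hypothetical example, not how Google or any other search engine actually detects duplicates (real systems also catch near-duplicates), and the URLs and page text are made up.

```python
import hashlib
import re
from collections import defaultdict
from itertools import combinations
from urllib.parse import urlparse


def fingerprint(text):
    """Hash page text after lowercasing it and keeping only its words,
    so pages that differ just in case, spacing or punctuation match."""
    words = re.findall(r"\w+", text.lower())
    return hashlib.sha256(" ".join(words).encode("utf-8")).hexdigest()


def duplicate_pairs(pages):
    """Yield (url_a, url_b, kind) for every pair of pages with identical text.

    kind is "internal" when both URLs are on the same host, else "external".
    """
    groups = defaultdict(list)
    for url, text in pages.items():
        groups[fingerprint(text)].append(url)
    for urls in groups.values():
        for a, b in combinations(sorted(urls), 2):
            same_host = urlparse(a).netloc == urlparse(b).netloc
            yield a, b, "internal" if same_host else "external"


# Hypothetical pages: two copies on one site, one copy on another site.
pages = {
    "https://example.com/node/42": "Duplicate content is just that.",
    "https://example.com/duplicate-content": "Duplicate  content is just that.",
    "https://other-site.example/copied": "duplicate content is JUST that",
    "https://example.com/about": "About us.",
}
for pair in duplicate_pairs(pages):
    print(pair)
```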
It's not uncommon for pages on a CMS (Content Management System) to be duplicated inadvertently. Drupal, the system this website was built with, for example, names pages /node/# by default, where # is the node's ID number. Because this is not very search-engine friendly (we recommend using keywords in your page path; check out our article on SEO 101 and read the section entitled "SEF (Search Engine Friendly) URL's/ Filenames"), we use Drupal's clean URLs together with automatically generated URL aliases, which build each page's path from keywords in its title.
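To illustrate what building a path from title keywords involves, here is a minimal, hypothetical `slugify` helper. It is not Drupal's own implementation (Drupal's alias patterns are configurable); it simply lowercases the title, strips accents and punctuation, joins the remaining words with hyphens, and trims overly long results at a word boundary. The `max_length` default of 60 is an arbitrary assumption.

```python
import re
import unicodedata


def slugify(title, max_length=60):
    """Turn a page title into a lowercase, hyphen-separated path segment."""
    # Fold accented characters to plain ASCII where possible ("é" -> "e").
    ascii_title = (
        unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    )
    # Replace every run of non-alphanumeric characters with a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_title.lower()).strip("-")
    if len(slug) > max_length:
        # Cut at the last hyphen that keeps the slug within max_length;
        # fall back to a hard cut if the slug has no hyphen to cut at.
        cut = slug.rfind("-", 0, max_length + 1)
        slug = slug[:cut] if cut > 0 else slug[:max_length]
    return slug


print(slugify("Duplicate Content: What Is It, and Will It Hurt Your Rankings?"))
# -> duplicate-content-what-is-it-and-will-it-hurt-your-rankings
```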
The problem is that at this point, two versions of each page exist: the /node/# version and the /friendly-page-title version. By default, search engines will index both and treat them as different pages. However, once they recognize that one page has exactly the same content as the other, they are likely to disregard one of them, and it may not be the version you want to rank. For this reason, we block all of the /node pages with the following statement in a robots.txt file located in the top-level public directory (i.e., public_html or www):
User-agent: *
Disallow: /node/
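Before deploying a rule like this, it is worth checking which paths it actually blocks. The sketch below uses Python's standard `urllib.robotparser`, which matches rules by plain path prefix, against a few hypothetical paths. It shows why the trailing slash matters: `Disallow: /node` also blocks any friendly URL that merely starts with "node", while `Disallow: /node/` blocks only the raw node paths.

```python
from urllib.robotparser import RobotFileParser


def blocked_paths(robots_lines, paths, agent="*"):
    """Return the paths that the given robots.txt lines disallow for agent."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # parse() also marks the rules as loaded
    return [p for p in paths if not parser.can_fetch(agent, p)]


# Hypothetical paths: a raw Drupal node path, its friendly alias,
# and an unrelated alias that happens to start with "node".
paths = ["/node/42", "/friendly-page-title", "/node-js-hosting"]

prefix_rule = ["User-agent: *", "Disallow: /node"]
slash_rule = ["User-agent: *", "Disallow: /node/"]

print(blocked_paths(prefix_rule, paths))  # ['/node/42', '/node-js-hosting']
print(blocked_paths(slash_rule, paths))   # ['/node/42']
```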
In most cases, duplicate content is not penalized. Why? Because of content syndication. It can be useful for important content to be found in more than one place, and in more than one variation. Just as with newspapers and magazines, the more sources carry an item, the easier it is for readers to find.
How does a search engine determine whether your content is the original, or first, version, and why does that matter? Typically, the first indexed version of an article is treated as the original and receives the most ranking credit. That's why, when you publish content, it's important to notify the search engines (in particular the big three: Google, Yahoo, and MSN) before anyone else publishes a copy. You can do this by submitting a sitemap through each engine's webmaster tools. Read more about sitemaps in our article "Sitemaps: Roadmaps for Visitors and Search Engines".
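The sitemaps protocol (sitemaps.org) defines the XML file these webmaster tools accept. Below is a minimal sketch that builds one with Python's standard library, showing only the required `loc` element and the optional `lastmod` date. The URL and date are hypothetical placeholders, and `build_sitemap` is an illustrative helper, not part of any search engine's tooling.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Namespace defined by the sitemaps.org protocol, version 0.9.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML for an iterable of (url, last_modified_date) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as "&" in the URL for us.
        ET.SubElement(entry, "loc").text = loc
        # A plain YYYY-MM-DD date is a valid W3C Datetime value.
        ET.SubElement(entry, "lastmod").text = lastmod.isoformat()
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical page on your own site, listed with its publication date.
xml_text = build_sitemap(
    [("https://www.example.com/friendly-page-title", date(2012, 3, 1))]
)
print(xml_text)
```

You would then save the output (for example as sitemap.xml) in your site's public directory and submit its URL through each engine's webmaster tools.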
The only time you would receive a duplicate content penalty is if you were trying to pass off someone else's work as your own. While search engines typically won't catch an isolated incident, if your entire website serves this purpose, you'll most likely be discovered and penalized. Remember that content is king: it's vital that your website contain primarily original, fresh content that is regularly updated.
All Content © 2012 Contract Web Development, Inc. All Rights Reserved.