May 07 2008

Should I remove stop words from my URLs?



What is a "stop word"?

A stop word is a word that search engines supposedly ignore.  Google used to display a notice such as "the following words have been omitted from your search: a, and, the" at the top of your search results.  That notice no longer appears, and if you search Google for "Internet" and then for "The Internet" you'll get a different set of results.  So, contrary to popular SEO belief, search engines evidently do take stop words into account.  The question is how they use them, and whether you should keep them in your SEF (Search Engine Friendly) URLs.

Should I use stop words in my URLs?

A list of stop words commonly removed from SEF URLs by search aficionados includes:

a, an, and, as, at, before, but, by, for, from, is, in, into, like, of, off, on, onto, per, since, than, the, this, that, to, up, via, with

Some people remove an even larger list of words.  What happens to your URL without stop words?  If you have a page entitled "Some of this and some of that", the SEF URLs with and without stop words look as follows:

  • with stop words:  /some-of-this-and-some-of-that
  • without stop words: /some-some
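The two URLs above can be reproduced with a minimal Python sketch. The `slugify` function is a hypothetical helper written for illustration, not the code any particular CMS uses; the stop word list is the one given earlier in this article.

```python
import re

# Stop word list copied from the article above.
STOP_WORDS = {
    "a", "an", "and", "as", "at", "before", "but", "by", "for", "from",
    "is", "in", "into", "like", "of", "off", "on", "onto", "per", "since",
    "than", "the", "this", "that", "to", "up", "via", "with",
}

def slugify(title, stop_words=frozenset()):
    """Lowercase the title, drop any stop words, join the rest with hyphens."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    kept = [w for w in words if w not in stop_words]
    return "/" + "-".join(kept)

title = "Some of this and some of that"
print(slugify(title))              # /some-of-this-and-some-of-that
print(slugify(title, STOP_WORDS))  # /some-some
```

Passing an empty set keeps every word, which is roughly what happens when a CMS has its stop word list cleared.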

Cater to your visitors first, search engines second

As we've mentioned several times, always remember that you're catering to users and visitors.  That is a lesson most people forget when they try to over-optimize their websites for search.  Remember that, in the end, search engines are catering to your visitors as well.  They are trying to make it as easy as possible for people to find what they're searching for.  Therefore, your SEF URL should look good to the visitor as well as to the search engine.

Does your URL reflect your page's meaning?

The question to ask when removing stop words from a URL is:  does the URL still reflect the meaning of the page title once the stop words are gone?  For example, does "some-some" still let the visitor know the page is about "Some of this and some of that"?  Not really.  While you don't need the entire "some-of-this-and-some-of-that" URL, "some-this-some-that" will probably suffice.
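One way to arrive at "some-this-some-that" automatically is to trim the stop word list so that words carrying meaning for your titles are kept. The sketch below is only an illustration: `slug_without` and `TRIMMED_STOP_WORDS` are hypothetical names, and which words belong on the list is a judgment call for your own site.

```python
import re

# Hypothetical trimmed list: "this" and "that" are deliberately left off
# because they carry the meaning of this particular title.
TRIMMED_STOP_WORDS = {"a", "an", "and", "of", "the", "to"}

def slug_without(title, stop_words):
    """Build a hyphenated slug from the title, skipping the given stop words."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(w for w in words if w not in stop_words)

print(slug_without("Some of this and some of that", TRIMMED_STOP_WORDS))
# some-this-some-that
```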

Automated URLs created by content management systems

Keep this in mind when using the automatic URL generators provided by many content management systems (CMSs) such as WordPress and Drupal.  Drupal, for one, lets you modify the list of stop words, or even use no stop words at all.  And most systems let you turn off the automated URL function entirely, so you can set your own URL for each page.

Think of your URL as your street address

The extra time you take to think up a good, short, semantically meaningful URL will pay dividends in the long run.  Remember, once a search engine indexes your pages, they're indexed for good.  While you can redirect them later, you want to minimize changes to the URL once the page has been indexed.  Think of the URL as a street address, and of all the people you need to notify (post office, friends, family) every time you move.  The same concept applies to search.


Drupal Pathauto Stop Words?

I'm using the Drupal CMS as well, and I've noticed that the Pathauto module takes out stop words by default but lets you configure which ones you want to keep.

As a rule of thumb - do you use the pathauto stop words, or do you simply write the URL yourself?

Define your own stop words

Great question - we used to use Pathauto, and will often install it for clients for the sake of simplicity. But when creating pages myself, I don't use Pathauto - I write the URL myself based on what I think makes sense to the user.
