10 Traffic-Stealing Weeds That Suck the Life Out of Your Google Garden and How to Yank Them
By Kurt Krejny | February 18, 2008
Regular evaluation and maintenance of your website’s Google indexing, or “Google Garden,” is a practice easily overlooked by webmasters and SEOs alike. It is typically not a top priority for optimization when other pressing issues are at hand, and it is tempting to assume that “the more pages you have indexed in Google, the more search traffic you may receive.”
However, robust websites with a wide array of files, frame technology usage, and secure pages that get regularly crawled and indexed by Google can easily succumb to a growth in “weed-like” indexing results. These weeds in the garden typically offer little to no value to your search audience, and can suck away the nutrients from growing your main crop, which is your website’s core pages – the ones you really want your search audience to feed on!
Below are examples of the weeds that can suck nutrients away from your Google Garden and prevent it from flourishing:
Unoptimized PDF files and Microsoft Office Documents (doc, xls, ppt)
How many times have you been served up a PDF search result that is "Untitled" and doesn’t have a proper title or description?
PS files (PostScript) and EPS files (Encapsulated PostScript)
These file types are rarely found on websites, but they do get indexed by Google.
Flash files
How often do you search for Flash files?
Frames and iFrames
Clicks on these indexed results can lead people to pages without navigation.
Secure https:// website pages
Google can index pages of your secure site instead of your main public site.
Which version do you want your visitors to view?
Pages with parameters for link tracking
Google can index these pages as duplicate content.
Example: http://www.domain.com/page.htm?link=contact can be indexed and it has the same content as http://www.domain.com/page.htm
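To find these duplicates in your own link lists, one option is to strip the tracking parameter and compare what is left. The sketch below is only an illustration: the `TRACKING_PARAMS` set and the `strip_tracking` helper are hypothetical names, and `link` is assumed to be the only tracking parameter, as in the example above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: "link" is the only parameter used purely for link tracking.
TRACKING_PARAMS = {"link"}


def strip_tracking(url: str) -> str:
    """Return the URL with tracking-only query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in TRACKING_PARAMS
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


print(strip_tracking("http://www.domain.com/page.htm?link=contact"))
# prints http://www.domain.com/page.htm
```

Two URLs that come out identical after this step serve the same content and are candidates for cleanup.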
Low-value / Low-traffic pages
Copyright, Disclaimer, and Privacy Policy pages can fall under this category.
Does your audience truly benefit from visiting these pages from a search engine?
Blog pages
Monthly archives, category, tag/label, and search pages get frequently indexed by Google.
Do you really want them saturating your overall index and competing with your main blog postings?
Is your garden already overgrown with weeds? Do you want to remove a few sprouting dandelions before they get out of control? Here are some basic tips to keep those pesky weeds from overtaking your Google Garden:
Perform a site:www.domain.com query on Google
- Evaluate the presentation of the results in terms of the keywords, branding, and call-to-action
- Make sure you have addressed the basic SEO elements on ALL pages (Page Titles and Meta Descriptions)
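If you have saved the HTML of the pages that show up in your site: query, a small script can flag the ones missing these basics. This is a rough sketch using Python's standard `html.parser`; the `HeadAudit` class and `missing_seo_elements` function are hypothetical names, and treating "Untitled" as a missing title is an assumption made for illustration.

```python
from html.parser import HTMLParser


class HeadAudit(HTMLParser):
    """Collects the <title> text and the meta description of a page."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""
        self.description = ""

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attr.get("name") or "").lower() == "description":
            self.description = (attr.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def missing_seo_elements(html: str) -> list:
    """Return the names of the basic SEO elements the page lacks."""
    audit = HeadAudit()
    audit.feed(html)
    missing = []
    title = audit.title.strip()
    # Assumption: an "Untitled" title is as bad as no title at all.
    if not title or title.lower() == "untitled":
        missing.append("title")
    if not audit.description:
        missing.append("meta description")
    return missing


page = "<html><head><title>Untitled</title></head><body></body></html>"
print(missing_seo_elements(page))
```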
Analyze your webstats
- Determine pages with low keyword traffic and overall low search value to remove from the Google Index
Clean up your sitewide internal linking
- Example: Change http://www.domain.com/directory/index.htm to http://www.domain.com/directory/
- If the directory name contains keywords that can be searched, it’ll stand out more if it’s not followed by /index.htm
- This also cuts down on page source code and makes your links look cleaner in the search result listings
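The link change above is easy to script across a template. The following is a minimal sketch, assuming your default documents are named index.htm or index.html and that the links carry no query string; `clean_directory_link` is a hypothetical helper name.

```python
import re

# Assumption: default documents are named index.htm or index.html.
_INDEX_SUFFIX = re.compile(r"/index\.html?$")


def clean_directory_link(url: str) -> str:
    """Rewrite .../directory/index.htm to .../directory/ and leave other URLs alone."""
    return _INDEX_SUFFIX.sub("/", url)


print(clean_directory_link("http://www.domain.com/directory/index.htm"))
# prints http://www.domain.com/directory/
```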
Sculpt your site’s template links with nofollow tags
- Identify low-value pages on your site that you do not want indexed, and add a rel="nofollow" attribute to links pointing at those pages (Wikipedia says that this is ‘what nofollow is not for’, but it is another technique, alongside robots.txt, to discourage these pages from being crawled and indexed; on its own it does not guarantee they stay out of the index)
- This will improve your internal link structure and give extra weight to your main pages
- For more information read SEOmoz’s post on sculpting with nofollow tags
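Before sculpting, it helps to see which template links already carry nofollow. The sketch below audits a snippet of template HTML with Python's standard `html.parser`; the `LinkAudit` class, the `audit_links` function, and the sample template are all hypothetical and only meant to illustrate the idea.

```python
from html.parser import HTMLParser


class LinkAudit(HTMLParser):
    """Records each link's href and whether its rel attribute includes nofollow."""

    def __init__(self):
        super().__init__()
        self.links = []  # list of (href, has_nofollow) tuples

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if href is None:
            return
        rel_tokens = (attr.get("rel") or "").lower().split()
        self.links.append((href, "nofollow" in rel_tokens))


def audit_links(html: str) -> list:
    parser = LinkAudit()
    parser.feed(html)
    return parser.links


# Hypothetical template fragment.
template = """
<a href="/services/">Services</a>
<a href="/privacy.htm" rel="nofollow">Privacy Policy</a>
"""

for href, nofollow in audit_links(template):
    print(href, "nofollow" if nofollow else "followed")
```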
Optimize Titles for all PDFs and Microsoft Office Docs
- Don’t overlook this simple step as these files rank for keyword searches and can receive quality traffic if optimized properly
Create an XML sitemap file and keep it updated
- If you don’t want a page or file on your site to be indexed, remove it from this file (however this does not guarantee that page or file won’t be indexed)
- Visit sitemaps.org to view proper protocol
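Keeping the sitemap updated is easier if you generate it from a list of the pages you want indexed. Here is a minimal sketch using Python's standard `xml.etree.ElementTree`; it writes only the required `loc` element for each URL, `build_sitemap` is a hypothetical helper name, and the URLs are placeholders.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls) -> str:
    """Return a sitemap XML string with one <url><loc> entry per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Placeholder list of pages you want indexed; weeds are simply left out.
pages = [
    "http://www.domain.com/",
    "http://www.domain.com/directory/",
]
print(build_sitemap(pages))
```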
Utilize Google Webmaster Tools
- Enough said. Keep on the lookout for new tools
The robots.txt file is your friend
- Visit robotstxt.org for information on proper formatting
- Create a separate robots file for https:// site and disallow duplicate pages that also reside on the http:// site (to help ensure these secure pages do not get indexed instead of the http:// pages)
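As an illustration of the https:// tip, the sketch below defines a made-up robots.txt for the secure host and checks it with Python's standard `urllib.robotparser`. The disallowed paths, the `/account/` URL, and the `is_allowed` helper are assumptions for the example; your own duplicate pages will differ.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt served only on the https:// host. It blocks the
# pages that duplicate content already available on the http:// site.
HTTPS_ROBOTS = """\
User-agent: *
Disallow: /page.htm
Disallow: /directory/
"""


def is_allowed(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt lets the agent fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


for url in (
    "https://www.domain.com/page.htm",
    "https://www.domain.com/directory/index.htm",
    "https://www.domain.com/account/login.htm",
):
    print(url, "allowed" if is_allowed(HTTPS_ROBOTS, url) else "blocked")
```

Running a check like this before uploading the file is a cheap way to confirm you have not blocked secure pages you still want crawled.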
Clean up your index!
- Upload a clean XML sitemap with pages you want indexed
- Upload a robots.txt excluding pages you do not want indexed (see the robots.txt tip above)
- Add a meta noindex tag to pages you want removed
- Submit a URL removal request in Google Webmaster Tools
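Before submitting removal requests, you may want to confirm that the pages in question actually carry the noindex tag. The sketch below checks a page's HTML for a robots meta tag containing noindex, using Python's standard `html.parser`; `RobotsMetaParser` and `has_noindex` are hypothetical names, and the sample markup is invented for the example.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            for token in content.split(","):
                token = token.strip().lower()
                if token:
                    self.directives.add(token)


def has_noindex(html: str) -> bool:
    """Return True if the page asks search engines not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Invented sample page head.
sample = '<head><meta name="robots" content="noindex, follow"></head>'
print(has_noindex(sample))
```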
Keeping these tools readily available in your garden shed and using them when necessary will help keep your Google Garden free of weeds and allow your cash crop to grow big and bountiful for your search audience to feed on. Remember to keep planting new seeds (valuable content) each season to expand your garden!
Want to know more about search? Visit our search engine marketing forum.
Want to know more about online videos?
Stop by our Internet video marketing forum.



February 18th, 2008 at 3:05 pm
Great Post! Lots of useful information. Time to go weed my Google garden.
February 18th, 2008 at 3:22 pm
WOW! Great tips. Some of these I never really thought of.
February 18th, 2008 at 3:46 pm
Great set of tips Kurt. Thanks for the info.
February 20th, 2008 at 1:34 am
Thanks, this is a great list.
February 20th, 2008 at 5:15 am
Great article. I wonder about sculpting with nofollow tags: is it really worth the saved juice to try to manipulate PageRank like that?
February 20th, 2008 at 6:34 am
Nice list, but I can’t find out why you called it “10 Traffic-Stealing Weeds”, instead of “8 Traffic-Stealing Weeds” :-)
February 20th, 2008 at 7:45 am
Nice article. I like the no-follow sculpting tips.
February 20th, 2008 at 12:51 pm
Hi Geld, I counted the first 2 items as 4 (PDF, Office Docs, EPS, PS) and 10 sounded like a better number :) - good catch… I knew someone would notice that!
February 20th, 2008 at 1:32 pm
Kurt, I especially appreciate the part about keeping low-value, low-traffic pages out of the index, because this seems to be an unknown good way to manage a site’s link equity. Asking the question, “Is this page useful to search engine visitors?” should be a constant determiner for onsite SEO work. Another reason to weed those pages out, pointed out by Aaron Wall in a recent blog post, is that large websites’ bandwidth can be taxed by the volume of spiders crawling them.
February 20th, 2008 at 3:34 pm
Thanks for posting these tips on how to yank traffic stealing weeds..^^ and by the way..I love the comparison..^^ great work Kurt!
February 21st, 2008 at 12:20 am
Paul, bandwidth drainage on large websites is definitely a negative effect from weed-like pages, and the rate at which they are crawled/indexed. Thanks for adding this!
February 25th, 2008 at 1:26 pm
These are great tips! I think often we overlook the “weeds” because we are focused on providing more content. It’s important to remove those components with little or no value that could affect the indexing of other pages on your website.
February 28th, 2008 at 8:54 am
You really keyed in on critical site vulnerabilities. Nice job.
February 29th, 2008 at 10:14 am
great tips!! Thanks for sharing.
March 1st, 2008 at 1:05 am
Excellent post, Kurt.
We’ve been struggling with where to appropriately position nofollow within larger sites.
Thinking in terms of “what pages do you want to see in SERPs” is WAY more customer-centric than simply managing Page Rank. That’s something our clients can clearly understand. Brilliant post.
Mark Alan Effinger
RichContent.com