A Search Engine Optimization blog whose primary focus is to provide information about search engines and about how to design search engine friendly pages. By Joann Cabading
Thursday, November 23, 2006
PinoyCallcenter Login Trouble
Monday, November 20, 2006
1. Rate Websites. Get involved in a community and start rating other people's websites.
2. Add friends to your network. This will increase the number of users who are likely to give your pages a thumbs up.
3. Submit new pages.
4. Carefully select topic titles. This is important to make sure your webpage is displayed to relevant users.
5. Add multiple tags.
That's all for StumbleUpon. Gotta get back to work.
Sunday, November 19, 2006
Finally, I now see StumbleUpon's contribution to my traffic. It's now in the top 5 of my most referring sites!
2006 PubCon in Las Vegas
1. "Google doesn't give a lot of weight on directory links". - by Matt Cutts
My comment. it says "a lot of weight" , that means they still do give importance to it, just not that much.
2. Don't use link services, link exchanges or buy links. Rely on your good content to get ONE-WAY inbound links.
3. Keep your page titles short and simple (I usually limit my titles to 100 characters). Google might look at long titles as if you're trying to stuff in some extra keywords.
4. Use unique titles on your pages.
5. Avoid templating pages. Don't create tons of pages that have very similar content with the exception of a paragraph or two. Most of it will get flagged as duplicate content, which will likely get most of your pages de-indexed.
6. Try to keep most of your JavaScript in include files (see the snippet after this list). This keeps the source cleaner from a crawler's perspective, since it doesn't have to look at the script.
7. Register your site with Google Local and Yahoo Yellow Pages. Doing so helps boost your rank in local search. A good site description when registering plays an important role in your ranking, so pay attention to it as well.
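To illustrate tip 6: instead of leaving a big script block inline in every page, like this (the menu code and file name are only examples),

<script type="text/javascript">
// hundreds of lines of menu code here...
</script>

move the code into a file such as menu.js and pull it in with a one-line include:

<script type="text/javascript" src="menu.js"></script>

The crawler then only sees the single include line instead of all the script.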
Tuesday, November 14, 2006
Google PubCon2006 in Las Vegas
2006". An event where bloggers, SEOs, webmasters, programmers and anyone
who has the passion to know more about Google, gathered together. It's
just too bad, i wasn't able to attend but i wish that soon, i could. I
know some filipinos went to this event and i'm hoping that when they come
back to the philippines, they will be organizing a conference here in
manila and share what they have learned during this PubCon. But i'm
keeping my fingers crossed as well that they won't charge too high (of
course to cover up the expenses they have spent during their travel!)
Here's the list of Google events at PubCon:
Tuesday 14
10:15-11:30: SEO and Big Search (Adam Lasnik, Search Evangelist)
1:30-2:45: PPC Search Advertising Programs (Frederick Vallaeys, Senior Product Specialist, AdWords)
2:45-4:00: PPC Tracking and Reconciliation (Brett Crosby, Senior Manager, Google Analytics)
Wednesday 15
10:15-11:30: Contextual Advertising Optimization (Tom Pickett, Online Sales and Operations)
11:35-12:50: Site Structure for Crawlability (Vanessa Fox, Product Manager, Google Webmaster Central)
1:30-3:10: Duplicate Content Issues (Vanessa Fox, Product Manager, Google Webmaster Central)
5:30-7:30: Safe Bets From Google (cocktail party!)
Thursday 16
11:35-12:50: Spider and DOS Defense (Vanessa Fox, Product Manager, Google Webmaster Central)
1:30-3:10: Interactive Site Reviews (Matt Cutts, Software Engineer)
3:30-5:00: Super Session (Matt Cutts, Software Engineer)
OK, so that's it! I will keep myself updated on these events and will update this blog too for others to see. By the way, I missed one day, since I'm writing this post on 15 Nov. Sorry!
Monday, August 21, 2006
What's Cloaking?
Cloaking is a way of serving the search engine spiders a different, optimized page from the one the website visitor sees. So if the Google spider/bot comes along to index your page, it will be served a page specially designed for Google. If the AltaVista bot comes along, it will get an AltaVista-optimized page, and so on.
Cloaking software serves the spider bot a 'content-oriented' page that is optimized for that specific engine. It is not the same page served to the surfer.
One advantage of cloaking is that people cannot "steal" your code. If you spend time optimizing your pages, it is simple for someone to copy that page and change it slightly if they are competing with you. With cloaking, there is no way they can see the code that got you a high ranking.
However, many search engines regard it as "cheating", so if you are going to use it, you have been warned. Some sites register another domain name that redirects to the main website.
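Just to make the mechanism concrete, here's a minimal sketch in Python of the decision cloaking software makes. The bot names and file names are only illustrative, and as said above, using this can get your site banned.

def choose_page(user_agent):
    """Pick which HTML file to serve, based on the requesting User-Agent."""
    bot_pages = {
        "googlebot": "google_optimized.html",   # page tuned for Google's spider
        "scooter": "altavista_optimized.html",  # Scooter was AltaVista's crawler
    }
    ua = (user_agent or "").lower()
    for bot_name, page in bot_pages.items():
        if bot_name in ua:
            return page
    return "visitor_page.html"                  # what a normal surfer sees

if __name__ == "__main__":
    print(choose_page("Mozilla/5.0 (compatible; Googlebot/2.1)"))  # google_optimized.html
    print(choose_page("Mozilla/4.0 (Windows NT 5.1)"))             # visitor_page.html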
Tuesday, July 18, 2006
List of SEO Tools
Originally posted by Wayne from World Famous Gift Baskets.
1. Reporting Spam to Google - http://www.google.com/contact/spamreport.html
2. Use Google to search your website - http://www.google.com/services/free.html
3. Submit your website to Google - http://www.google.com/addurl.html
4. Monitor Keyword Phrases - http://google.com/webalerts (This is neat to check out; however, it does not help that much)
5. Google's Guidelines for Webmasters - http://www.google.com/webmasters/guidelines.html (A must-read for new people)
6. Facts for Webmasters - http://www.google.com/webmasters/facts.html
7. Having Trouble? Contact Google Directly - http://www.google.com/ads/offices.html
8. Google's Top 3 Asked Questions - http://www.google.com/contact/search.html
Froogle
1. Get your products into Froogle - http://services.google.com/froogle/merchant_email
Advertising
1. PPC with Espotting - http://www.espotting.com/advertisers...ferralTypeID=1
Website Design & Tools
1. Free Forms for your website TFMail - http://nms-cgi.sourceforge.net/
2. Validate Your HTML - http://validator.w3.org/
3. HTTP Error Code Meanings - http://www.searchengineworld.com/val...errorcodes.htm
4. Keyword Tracking - http://www.digitalpoint.com/tools/keywords/
5. Link Checker - http://dev.w3.org/cvsweb/~checkout~/...0charset=utf-8
6. Search Engine Relationship Chart - http://www.bruceclay.com/searchengin...nshipchart.htm
Bruce Clay does an excellent job of keeping this updated.
7. Link Popularity Checker (Uptime Bot) - http://www.uptimebot.com/
8. Character Counting - http://a1portal.com/freetools/charcount.htm (This is great when optimizing your title or meta tags)
9. Character Encoding - http://www.itnews.org.uk/w_qrefs/w_i...p_charsets.cfm (Ever wonder what those iso-8859-4 or utf-8 were or how to use them?)
10. Converting Hex to Dec or Vice Versa - http://www.hypersolutions.org/pages/hex.html#DectoHex
11. Ascii-Dec-Hex Conversion Code Chart - http://www.sonofsofaman.com/misc/ascii/default.asp
12. Ascii-HTML View Conversion Chart - http://a1portal.com/freetools/asciicodes.htm (This is an excellent resource when placing ascii code on your website. Remember to use the correct character encoding)
13. Ascii Chart in .GIF Format - http://www.jimprice.com/ascii-0-127.gif
14. Customer Focus Tool - http://www.futurenowinc.com/wewe.htm (Tells you whether your website is focused on your customers or not)
15. Dead Link Checker - http://www.dead-links.com/ (Doesn't crawl links within Frames or JavaScript)
16. Adsense Simulator - http://www.digitalpoint.com/tools/adsense-sandbox/ (This will give you an idea of what ads will be displayed on your website before you place them)
17. Google Page Rank Calculator - http://www.webworkshop.net/pagerank_...ator.php3?pgs=
18. Page Rank Finder - http://www.seo-guy.com/seo-tools/google-pr.php (This is a great tool to find quality websites with the PR that you are looking for to exchange links with. This tool only looks at the home page, not the link pages. It looks at 10 pages, or 100 results)
19. Future Google PR - http://www.searchengineforums.com/ap...e::eek:rphans/ - This is an article that tells you which datacenter your Google PR is updated on first.
20. Keyword Analysis Tool - http://www.mcdar.net/ - This tool is a must. It's quick and easy to use
21. Keyword Density Analyzer - http://www.webjectives.com/keyword.htm
22. Keyword Difficulty Checker - http://www.searchguild.com/cgi-bin/difficulty.pl (You will need a Google API for this one)
23. Free Google API - http://www.google.com/api
24. Rocket Rank - http://www.rocketrank.com/ - This will only check the top 20 of the following SE's:
(All The Web DMOZ AltaVista Overture Excite Web Crawler HotBot Lycos What U Seek Yahoo)
Keyword Suggestion Tools:
25. WordTracker & Overture Suggestions http://www.digitalpoint.com/tools/suggestion/ - This is the best one of the three
26. Adwords Suggestion - https://adwords.google.com/select/ma...KeywordSandbox
27. Overture Suggestion - http://inventory.overture.com/d/sear...ry/suggestion/
28. Link Analyzer - http://www.scribbling.net/analyze-web-page-links - Analyze the ratio of internal links vs. external links. This is a good tool when determining PageRank leakage.
29. Link Appeal - http://www.webmaster-toolkit.com/link-appeal.shtml (Want to know whether or not you actually want your link on that page?)
30. Link City - http://showcase.netins.net/web/phdss/linkcity/ (This place has EVERY tool under the sun for everything you could ever possibly want)
31. Link Reputation - http://198.68.180.60/cgi-bin/link-reputation-tool.cgi (Reveals backlinks pointing to the target URL along with a link survey for each backlink.)
32. Google PR Tools - http://www.thinkbling.com/tools.php (This guy has tons of fantastic tools. He is not as popular as some of the rest but the tools are great)
33. Protect Your e-mail address - http://www.fingerlakesbmw.org/main/flobfuscate.php (Obfuscates your e-mail so spambots don't pick it up from the Internet)
34. Digital Point's Ad Network - http://www.digitalpoint.com/tools/ad-network/?s=2197 - After using all of the tools on this page and more, this has helped out the rankings faster than anything else.
35. Sandbox Detection Tool - http://www.socengine.com/seo/tools/sandbox-tool.php - Is your website being sandboxed?
36. Spider Simulation - http://www.submitexpress.com/analyzer/ - See what the spider sees on your website
37. SEO-Toys - http://seo-toys.com/ - These are some things that I had in my favorites. Some of them are okay.
38. Multiple SEO Tools - http://www.free-seo-tools.com/ - This website has a variety of misc. tools on it that you can use to better your search engine rankings.
39. Bot Spotter - http://sourceforge.net/projects/botspotter - This is a phenomenal script that will track what bots hit your website at what times. (Runs on PHP enabled websites)
40. Net Mechanic - http://www.netmechanic.com/toolbox/power_user.htm - This will break your website down and tell you any errors that you may be unaware of.
41. Statcounter - http://www.statcounter.com/ - This will track your clients throughout the dynamically created pages of your website. This is a free service. (of course I don't have to mention this to you guys)
42. Dr. HTML - http://www.fixingyourwebsite.com/drhtml.html - This will test your website for any errors that you may be unaware of and tell you how to fix them.
43. Page Rank Calculation - http://www.sitepronews.com/pagerank.html
Webmaster Forums
1. Web Pro World - http://www.webproworld.com/forum.php
2. Webmaster World - www.webmasterworld.com
3. Digital Point - http://forums.digitalpoint.com
4. Search Engine World - www.searchengineworld.com
(There are 10,000,000 others but those are some good ones)
Newsletters & Articles
1. Site Pro News - www.sitepronews.com (This guy has some great articles; however, he tells you up front he knows nothing of SEO)
2. In Stat - http://www.instat.com/ (This has some decent insight)
3. Page Rank Explained - http://www.webworkshop.net/pagerank....olbar_pagerank
4. Search Engine Ratings and Reviews - http://searchenginewatch.com/reports/
5. Database of Robots - http://www.robotstxt.org/wc/active/html/index.html
(Ever wondered anything about the spiders that are out there?)
6. Guide to designing a website - http://www.webstyleguide.com/index.html?/contents.html - This is an online book that tells you the basics of website design.
Webmaster Information
1. Want to know where all of the Internet traffic is at? - http://www.internettrafficreport.com/main.htm
ISAPI Rewrites
1. URL Replacer - (Free) - http://www.motobit.com/help/url-repl...od-rewrite.asp
2. Mod Rewrite2 - ($39.90US) - http://www.iismods.com/url-rewrite/index.htm
3. URL Rewrite - (23.00EUR) - http://www.smalig.com/url_rewrite-en.htm
Link Exchanging
1. Links Manager ($20.00US/mo) - http://linksmanager.com/cgi-bin/cook/control_panel.cgi (This is great for the beginner; however, you will find that you need to adjust your pages manually in order to spread PageRank throughout them, otherwise you end up with 20 pages with no PR and 1 page with PR.)
2. Page Rank Finder - http://www.seo-guy.com/seo-tools/google-pr.php (This is a great tool to find quality websites with the PR that you are looking for to exchange links with. This tool only looks at the home page, not the link pages. It looks at 10 pages, or 100 results)
3. Link Appeal - http://www.webmaster-toolkit.com/link-appeal.shtml (Want to know whether or not you actually want your link on that page?)
Search Engine Submissions
1. Submit Express - http://www.submitexpress.com/newsletters/dec_15_00.html (A lot of people utilize this service. I don't utilize it)
2. Alexa - http://pages.alexa.com/help/webmaste...tml#crawl_site
3. AOL - http://search.aol.com/aolcom/add.jsp
4. DMOZ Dummies Guide - http://www.dummies-guide-to-dmoz.org...not_google.htm
5. DMOZ Instructions - http://dmoz.org/add.html
6. DMOZ Resource Forum - http://resource-zone.com/forum/showthread.php?t=396 (This is where you go when your website doesn't show up in DMOZ after you have submitted it. READ THEIR RULES FOR ASKING)
7. ExactSeek - http://www.exactseek.com/freemember.html
8. Google - http://www.google.com/addurl.html
9. Yahoo http://submit.search.yahoo.com/free/request (You must have an account)
10. Yahoo Directory Help - http://docs.yahoo.com/info/suggest/appropriate.html
11. Yahoo Express Submit TOS - https://ecom.yahoo.com/dir/express/terms (After reading the TOS for Yahoo, I said I would never submit my website to Yahoo and pay the $300.00 to do so. Everyone, I broke down and was forced to pay the $300.00. The website would not get past position 30 for months, and about 2 weeks after we paid it, we are now number 10.)
12. Yahoo Submit Help - http://help.yahoo.com/help/us/dir/su...uggest-01.html
13. MSN - http://beta.search.msn.com/docs/submit.aspx?
If you have any tools that are not listed here and would like to add them, please do. This is only a partial list of the tools that I utilize; it would take way too long to provide everything. If you are looking for a specific tool, let me know; chances are that I probably have it.
Wayne
Monday, July 17, 2006
Google is Cloaking!
- By: Nick W [privmsg - website] On 7th Mar 2005 In Rumours & Scandal
- Source: Google Caught Cloaking - Keyword Stuffing Titles
A short while ago, Threadwatch member Adam_C discovered what to all appearances seems to be Google pulling dirty SEO tactics on its own pages, and thus going against its own guidelines, in an effort to rank highly within its own results.
Cloaking
Cloaking is covered in Google's guidelines as something strictly not to do:
- Don't employ cloaking or sneaky redirects.
Although there is some debate within the SEO industry as to what exactly cloaking is, in its simplest form it is showing one page to search engines and a different page to users - much of the debate hinges on intent.
Here's how Google defines it in the Google Webmaster FAQ:
The term "cloaking" is used to describe a website that returns altered webpages to search engines crawling the site. In other words, the webserver is programmed to return different content to Google than it returns to regular users, usually in an attempt to distort search engine rankings. This can mislead users about what they'll find when they click on a search result. To preserve the accuracy and quality of our search results, Google may permanently ban from our index any sites or site authors that engage in cloaking to distort their search rankings.
Keyword stuffing
Keyword stuffing is, as you might expect, the practice of stuffing a page with the keywords you wish to rank for. Without off-page optimization it's worse than useless, but combined with incoming links, and cloaked to appear normal to visitors (they see a nicely worded page, search robots see the keyword-stuffed page), it can be highly effective.
So where do Google come into this?
If you look at this AdWords page on Google, you'll see this title at the top of your browser:
Google AdWords Support: How do I use the Traffic Estimator?
That's what normal visitors like you and me will see when visiting the page.
Now have a look at Google's cache of the same page - Notice the change in the title? It now reads:
traffic estimator, traffic estimates, traffic tool, estimate traffic Google AdWords Support ...
You think they want to rank for traffic estimates? I'd say they did...
Update: In the comments of this post, fishyking points out that the keyword stuffing has been done globally...
If true, what are the implications?
There is much debate around the way Google handles cloaking; in fact, many webmasters and SEOs feel there is a need for a change in Google's official policy, but that's probably a discussion for another day.
For now, the implications are simple - if Google can do this on its own pages, why can ordinary webmasters not? Google's keyword-stuffed, cloaked title would be hard to describe as anything other than an SEO tactic that is not so much frowned upon as full-on hated by the search giant itself.
Unless they can pull something out of the bag regarding an explanation, I'd say they've just been caught red-handed doing one of the very things they ban websites for, and consistently tell webmasters on forums and blogs not to do.
Friday, July 07, 2006
Mr know-it-all
Wednesday, July 05, 2006
Google Toolbar
The Google Toolbar sends information about the sites you visit back to Google when it is installed on your computer, which lets Google know more about website traffic in general. Though it may do no harm, some SEOs prefer not to use this tool. If you enable the PageRank display, it will start sending anonymous information to Google.
Friday, June 30, 2006
Traffic Estimator
Traffic estimator:
http://www.seomoz.org/tools.php
http://www.webfooted.net/keywordstats.php
http://www.seochat.com/seo-tools/keyword-difficulty/
http://inventory.overture.com/d/searchinventory/suggestion/
http://wordtracker.com
Interview with Google Sitemap Team
This is another compilation about Google Sitemaps. I hope you guys pick up something from this:
Google's Sitemaps Team, interviewed in January 2006, provides great insights and information about the Sitemaps program, crawling and indexing in general, handling of vanished pages (404 vs. 410) and the URL removal tool, and valuable assistance on many frequently asked questions. Matt Cutts chimed in and stated '...It's definitely a good idea to join Sitemaps so that you can be on the ground floor and watch as Sitemaps improves'. This interview is a must-read for Webmasters.
Google Sitemaps was launched in June 2005 to enhance Google's Web crawling process in cooperation with Webmasters and site owners. Collaborative crawling brings Webmasters on board to some degree, and both sides have learned a lot from each other over the last months. Google's Sitemaps Team does listen to Joe Webmaster's needs, questions, and suggestions. They have implemented a lot of very useful features based on suggestions in the Google Sitemaps Group, an open forum where members of the Sitemaps team communicate with their users, handing out technical advice even on weekends. The nickname Google Employee used by the Sitemaps team regularly makes it onto the list of This month's top posters.
The Sitemaps community, producing an average of 1,500 posts monthly, suffered from repetitive topics diluting the archives. When the idea of a Google Sitemaps Knowledge Base was born in the group, I discussed it with the Sitemaps team. Vanessa Fox, who is blogging for the Sitemaps team from Kirkland, Washington, suggested doing "an e-mail interview to answer some of the more frequently asked questions", so here we are.
Vanessa, thank you for taking the time to support the group's knowledge base project. Before we discuss geeky topics which usually are dull as dust, would you mind introducing the Sitemaps team? I understand that you're an international team, with members working in Kirkland, Mountain View, and Zürich on different components of the Google Sitemaps program. Can you tell us who is who on your team?
Vanessa: You're right. Our team is located in offices around the globe, which means someone on the team is working on Sitemaps nearly around the clock. A few team members were able to take some time to answer your questions, including Shiva Shivakumar, engineering director who started the Google Sitemaps project (and whose interview with Danny Sullivan you may have seen when we initially launched), Grace and Patrik from our Zurich office, Michael and Andrey from our Kirkland office, and Shal from our Mountain View office. We also got Matt Cutts to chime in.
My amateurish attempt to summarize the Google Sitemaps program is "Aimed crawling makes Google's search results fresh, and Webmasters happy". How would you outline your service, its goals, intentions, and benefits?
Our goal is two-way communication between Google and webmasters. Google Sitemaps is a free tool designed so webmasters can let us know about all the pages on their sites and so we can provide them with detailed reports on how we see their sites (such as their top Google search queries and URLs we had trouble crawling).
We can reach many pages that our discovery crawl cannot find and Sitemaps convey some very important metadata about the sites and pages which we could not infer otherwise, like the page's priority and refresh cycle. In particular, the refresh cycle should allow us to download pages only when they change and thus reduce needless downloads, saving bandwidth.
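For readers who haven't seen one, here is a minimal sketch of such a Sitemap file. The URLs and values are made up, and the namespace shown is the Google Sitemaps 0.84 schema of that era; check the current protocol documentation for the version you should actually use.

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2006-01-15</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>
  <url>
    <loc>http://www.example.com/archive/old-post.html</loc>
    <changefreq>yearly</changefreq>
    <priority>0.3</priority>
  </url>
</urlset>

The changefreq and priority elements carry the "refresh cycle" and "priority" metadata mentioned above.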
You've announced collaborative crawling as "an experiment called Google Sitemaps that will either fail miserably, or succeed beyond our wildest dreams, in making the web better for webmasters and users alike". Eight months later, I think it's safe to say that your experiment has grown into a great success. How much has your project contributed to the growth, freshness and improved quality of Google's search index?
We've had a huge response from webmasters, who have submitted a great deal of high-quality pages. Many pages would never have been found through our usual crawl process, such as URLs with content found behind forms, locked up in databases, or behind content management systems.
Also we have many clients supporting Sitemaps natively (many are listed at http://code.google.com/sm_thirdparty.html), in addition to our initial Open Source python client. We are working with more such clients to automatically support Sitemaps natively, and larger websites to compute these automatically as well.
Also, Michael points out "As some of you may have noticed, Sitemaps has also served as an...uhm...impromptu stress test of different parts of the Google infrastructure, and we're working hard to fix those parts."
A major source of misunderstandings and irritations is the common lack of knowledge on how large IT systems -- especially search engines -- work. This leads to unrealistic expectations and rants like "Google is broke because it had fetched my sitemap, the download status shows an OK, but none of my new pages appear in search results for their deserved keywords".
Matt's recent article How does Google collect and rank results? sheds some light on the three independent processes crawling, indexing, and ranking in response to user queries, and I've published some speculations in my Sitemaps FAQ. Can you provide us with a spot on description of the process starting with a Sitemap download, its validation and passing of its URLs to the crawling engine, which sends out the Googlebots to fetch the files and hand them over to the indexer? I'm sure such an anatomic insight would help Sitemaps users to think in realistic time tables.
Your description is pretty close. Sitemaps are downloaded periodically and then scanned to extract links and metadata. The valid URLs are passed along to the rest of our crawling pipeline -- the pipeline takes input from 'discovery crawl' and from Sitemaps. The pipeline then sends out the Googlebots to fetch the URLs, downloads the pages and submits them to be considered for our different indices.
Obviously you can't reveal details about the scores applied to Web sites (besides PageRank) which control priorities and frequency of your (sitemap based as well as regular) crawling and indexing, and -- due to your efforts to ensure a high quality of search results and other factors as well -- you cannot guarantee crawling and indexing of all URLs submitted via Sitemaps. However, for a quite popular and reputable site which meets your quality guidelines, what would you expect as the best/average throughput, or time to index, for both new URLs and updated content as well?
You're right. Groups users report being indexed in a matter of days or weeks. Crawling is a funny business when you are crawling several billions of pages -- we need to crawl lots of new pages and refresh a large subset of previously crawled pages periodically as well, with finite capacity. So we're always working on decreasing the time it takes to index new information and to process updated information, while focusing on end user search results quality. Currently Sitemaps feeds the URL and metadata into the existing crawling pipeline like our discovered URLs.
Matt adds "it's useful to remember that our crawling strategies change and improve over time. As Sitemaps gains more and more functionality, I wouldn't be surprised to see this data become more important. It's definitely a good idea to join Sitemaps so that you can be on the 'ground floor' and watch as Sitemaps improves."
My experiments have shown that often a regular crawl spots and fetches fresh content before you can process the updated Sitemap, and that some URLs harvested from Sitemaps are even crawled when the page in question is unlinked and its URL cannot be found elsewhere. Also, many archived pages lastly crawled in the stone age get revisited all of a sudden. These findings lead to two questions.
First, to what degree can a Google Sitemap help to direct Googlebot to updates and new URLs on well structured sites, when the regular crawling is that sophisticated? Second, how much of the formerly 'hidden Web' -- especially unhandily linked contents -- did you discover with the help of Sitemaps, and what do you do with unlinked orphan pages?
Sitemaps offer search engines more precise information than can be found through discovery crawling. All sites can potentially benefit from Sitemaps in this way, particularly as the metadata is used in more and more ways.
Grace says: "As for the 'hidden Web', lots of high quality pages have indeed been hidden. In many high quality sites that have submitted Sitemaps, we now see 10-20 times as many pages to consider for crawling."
Some Webmasters fear that a Google Sitemap submission might harm their positioning, and in the discussion group we can often read postings asserting that Google has removed complete Web sites from the search index shortly after a Sitemap submission. My standard answer is "don't blame the Sitemap when a site gets tanked", and looking at most of the posted URLs the reasons causing invisibility on the SERPs become obvious at first glance: the usual quality issues. However, in very few of the reported cases it seems possible that a Sitemaps submission could result in a complete wipe-out or move to the supplemental index, followed by heavy crawling and fresh indexing in pretty good shape after a while.
Machine-readable mass submissions would allow a few holistic quality checks before the URLs are passed to the crawling engine. Do you handle URLs harvested from XML Sitemaps differently from URLs found on the Web or submitted via the Add URL page? Do mass submissions of complete sites speed up the process of algorithmic quality judgements and spam filtering?
Sitemap URLs are currently handled in the same way as discovery URLs in terms of penalties. If a site is penalized for violating the webmaster guidelines, that penalty would apply whether Googlebot followed links from a Sitemap, as part of the regular discovery crawl, or from the Add URL page.
Many users expect Google Sitemaps to work like Infoseek submissions in the last century, that is instant indexing and ranking of each and every URL submission. Although unverified instant listings are technically possible, they would dilute the quality of any search index, because savvy spammers could flood the index with loads of crap in no time.
Experienced Webmasters and SEOs do read and understand your guidelines, play by the rules, and get their contents indexed. Blogs and other easy to use content management systems (CMS) brought millions of publishers to the Web, who usually can't be bothered with all the technical stuff involved.
Those technically challenged publishers deserve search engine traffic, but most probably they would never visit a search engine's Webmasters section. Natural publishing and networking leads to indexing eventually, but there are lots of pitfalls, for example the lack of search engine friendly CMS software and so many Web servers which are misconfigured by default.
What's your best advice for the novice Publisher not willing -- or not able -- to wear a Webmaster hat? What are your plans to reach those who don't get your message yet, and how do you think you can help to propagate a realistic management of expectations?
Our FAQ pages are a good starting place for all webmasters. For those who are using hosting or CMS systems they don't have a lot of experience with, Sitemaps can help alert them to issues they may not know about, such as problems Googlebot has had crawling their pages.
Michael adds that "our webmaster guidelines are intended to be readable and usable by non-experts. Create lots of good and unique content, don't try to be sneaky or underhanded, and be a bit patient. The Web is a very, very big place, but there's still a lot of room for new contributions. If you want to put up a new website, it may be helpful to think about how your website will be an improvement over whatever is already out there. If you can't think of any reasons why your site is special or different, then it's likely that search engines won't either, and that may be frustrating."
Many Web sites use eCommerce systems and CMS software producing cluttered non-standard HTML code, badly structured SE-unfriendly navigation, and huge amounts of duplicated textual content available from various URLs. In conjunction with architectural improvements like SE-friendly cloaking to enhance such a site's crawl-ability, a Google Sitemap will help to increase the number of crawled and indexed pages. In some cases this may be a double-edged sword, because on formerly rarely indexed sites a full crawl may reveal unintended content duplication, which leads to suppressed search results caused by your newer filters.
What is your advice for (large) dynamic sites suffering from session IDs and UI-specific query string parameters, case issues in URLs, navigation structures which create multiple URLs pointing to the same content, excessively repeated text snippets, or thin product pages without unique content except for the SKU and a picture linked to the shopping cart? Will or do you use the (crawling) priority attribute to determine whether you index a URL from the Sitemap, or a variant -- with similar or near-duplicated content -- found by a regular crawl?
Everything that applies to regular discovery crawling of sites applies to pages listed in Sitemaps. Our webmaster guidelines provide many tips about these issues. Take a look at your site in a text-only browser. What content is visible?
As for dynamic pages that cause duplicate content listings in the Sitemap, make sure that a version of each page exists that doesn't include things like a session ID in the URL and then list that version of the page in your Sitemap.
Make sure that the Sitemap doesn't include multiple versions of the same page that differ only in session ID, for instance.
If your site uses a content management system or database, you probably want to review your generated Sitemap before submitting to make sure that each page of your site is only listed once. After all, you only want each page listed once in the search results, not listed multiple times with different variations of the URL.
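A minimal sketch of that clean-up step in Python; the session parameter names and URLs below are only placeholders for whatever your own system actually emits.

from urllib.parse import urlparse, parse_qsl, urlencode, urlunparse

# Query parameters that only identify a visitor session, not unique content.
SESSION_PARAMS = {"sessionid", "sid", "phpsessid"}

def canonicalize(url):
    """Drop session-style parameters so each page gets exactly one URL in the Sitemap."""
    parts = urlparse(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k.lower() not in SESSION_PARAMS]
    return urlunparse(parts._replace(query=urlencode(kept)))

raw_urls = [
    "http://www.example.com/product.php?id=42&PHPSESSID=abc123",
    "http://www.example.com/product.php?id=42&PHPSESSID=zzz999",
]
# Deduplicate after canonicalizing, keeping the original order.
unique_urls = list(dict.fromkeys(canonicalize(u) for u in raw_urls))
print(unique_urls)  # ['http://www.example.com/product.php?id=42']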
Google is the first major search engine informing Webmasters -- not only Sitemaps users -- about the crawlability of their Web sites. Besides problem reports your statistics even show top search queries, the most clicked search terms, PageRank distribution, overviews on content types, encoding and more. You roll out new reports quite frequently, what is the major source of your inspiration?
We look at what users are asking for in the discussion groups and we work very closely with the crawling and indexing teams within Google.
Andrey adds, "if we find a particular statistic that is useful to webmasters and doesn't expose confidential details of the Google algorithms, we queue that up for release."
On the error pages you show a list of invalid URLs, for example 404 responses and other errors. If the source is "Web", that is the URL was found on the site or anywhere on the Web during the regular crawling process and not harvested from a Sitemap, it's hard to localize the page(s) carrying the dead link. A Web search does not always lead to the source, because you don't index every page you've crawled. Thus in many cases it's impossible to ask the other Webmaster for a correction.
Since Google knows every link out there, or at least should know the location where an invalid URL was found in the first place, can you report the sources of dead links on the error page? Also, do you have plans to show more errors?
We have a long list of things we'd love to add, but we can only work so fast. :) Meanwhile, we do read every post in the group to look for common requests and suggestions, so please keep them coming!
HTTP/1.1 introduced the 410-Gone response code, which is supported by the newer Mozilla-compatible Googlebot, but not by the older crawler which still does HTTP/1.0 requests. The 404-Not Found response indicates that the requested resource may reappear, so the right thing to do is to respond with a 410 error if a resource has been removed permanently.
Would you consider it safe with Google to make use of the 410 error code, or do you prefer a 404 response and want Webmasters to manually remove outdated pages with the URL console, which scares the hell out of most Webmasters who fear getting their complete site suspended for 180 days?
Webmasters can use either a 404 or 410. When Googlebot receives either response when trying to crawl a page, that page doesn't get included in the refresh of the index. So, over time, as the Googlebot recrawls your site, pages that no longer exist should fall out of our index naturally.
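To make the status-code choice concrete, here's a minimal sketch using Python's built-in HTTP server. The "gone" paths are placeholders, and a real site would of course serve its actual pages instead of this stub.

from http.server import BaseHTTPRequestHandler, HTTPServer

# Pages removed for good; any other unknown path simply gets a 404.
GONE_FOR_GOOD = {"/old-page.html", "/discontinued-product.html"}

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/":
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.end_headers()
            self.wfile.write(b"<html><body>Home page</body></html>")
        elif self.path in GONE_FOR_GOOD:
            self.send_error(410, "Gone")       # removed permanently
        else:
            self.send_error(404, "Not Found")  # missing; might exist some day

if __name__ == "__main__":
    HTTPServer(("localhost", 8000), Handler).serve_forever()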
Webmasters shouldn't fear our automated URL removal tool. We realize that the description says if you use the tool, your site will be removed for 180 days and we are working to get this text updated. What actually happens is that we check to make sure the page you want removed doesn't exist or that you've blocked it using a robots.txt file or META tags (depending on which removal option you choose). If everything checks out OK, we remove the page from our index. If later on, you decide you do want the page indexed and you add it back to your site or unblock it, we won't add it back to our index for at least 180 days. That's what the warning is all about. You need to be sure you really don't want this page indexed.
But I would suggest that webmasters not bother with this tool for pages that no longer exist. They should really only use it for pages that they meant to block from the index with a robots.txt file or META tags originally. For instance, if a hypothetical webmaster has a page on a site that lists sensitive customer information and that hypothetical webmaster is also somewhat forgetful and doesn't remember to add that page to the robots.txt file, the sensitive page will likely get indexed. That hypothetical forgetful webmaster would then probably want to add the page to the site's robots.txt file and then use the URL removal tool to remove that page from the index.
As you mentioned earlier, webmasters might also be concerned about pages listed in the Site Errors tab (under HTTP errors). These are pages that Googlebot tried to crawl but couldn't. These pages are not necessarily listed in the index. In fact, if the page is listed in the Errors tab, it's quite possible that the page isn't in the index or will not be included in the next refresh (because we couldn't crawl it). Even if you were to remove this using the URL removal tool, you still might see it show up in the Errors tab if other sites continue to link to that page since Googlebot tries to crawl every link it finds.
We list these pages just to let you know we followed a link to your site and couldn't crawl the page and in some cases that's informational data only. For these pages, check to see if they are pages you thought existed. In this case, maybe you named the page on your site incorrectly or you listed it in your Sitemap or linked to it incorrectly. If you know these pages don't exist on your site, make sure you don't list them in your Sitemap or link to them on your site. (Remember that if we tried to crawl a page from your Sitemap, we indicate that next to the URL.) If external sites are linking to these non-existent pages, you may not be able to do anything about the links, but don't worry that the index will be cluttered up with these non-existent pages. If we can't crawl the page, we won't add it to the index.
Although the data appearing in the current reports aren't that sensitive (for now), you've a simple and effective procedure in place to make sure that only the site owner can view the statistics. To get access to the stats one must upload a file with a unique name to the Web server's root level, and you check for its existence. To ensure this lookup can't be fooled by a redirect, you also request a file which should not exist. The verification is considered successful when the verification URL responds with a 200-Ok code, and the server responds to the probe request with a 404-Not found error. To enable verification of Yahoo stores and sites on other hosts with case restrictions you've recently changed the verification file names to all lower case.
Besides a couple of quite exotic configurations, this procedure does not work well with large sites like AOL or eBay, which generate a useful page on the fly even if the requested URI does not exist, and it keeps out all sites on sub-domains where the Webmaster or publisher can't access the root level, for example free hosts, hosting services like ATT, and your very own Blogger Web logs.
Can you think of an alternative verification procedure to satisfy the smallest as well as the largest Web sites out there?
We are actively working on improving the verification process. We know that there are some sites that have had issues with our older verification process, and we updated it to help (allowing lowercase verification filenames), but we know there are still users out there who have problems. We'd love to hear suggestions from users as to what would or wouldn't work for them!
In some cases verification requests seem to get stuck in the queue, and every once in a while a verified site falls back into pending status. You've posted that this is an inconvenience you're working on; can you estimate when delayed verifications will be yesterday's news?
We've been making improvements in this area, but we know that some webmasters are still having trouble and it's very frustrating. We are working on a complete resolution as quickly as we can. We have sped up the time from "pending" to "verified" and you shouldn't see issues with verified sites falling back into pending.
If your site goes from pending to not verified, then we likely weren't able to successfully process the verification request. We are working on adding error details so you can see exactly why the request wasn't successful, but until we have that ready, if your verification status goes from "pending" to "not verified", check the following, as these are the most common problems we encounter:
* that your verification file exists in the correct location and is named correctly
* that your webserver is up and responding to requests when we attempt the verification
* that your robots.txt file doesn't block our access to the file
* that your server doesn't return a status of 200 in the header of 404 pages
Once you've checked these things and made any needed changes, click Verify again.
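If you want to check the last two points yourself before clicking Verify, here's a rough sketch of such a probe in Python. The site and file names are placeholders; use the verification file name Sitemaps assigns to you.

import urllib.request
import urllib.error

def status_of(url):
    """Return the HTTP status code a simple GET request sees."""
    try:
        return urllib.request.urlopen(url).getcode()
    except urllib.error.HTTPError as err:
        return err.code

site = "http://www.example.com"
print(status_of(site + "/GOOGLE123456.html"))      # verification file: should be 200
print(status_of(site + "/no-such-page-xyz.html"))  # random missing page: should be 404, not 200

Note that urlopen follows redirects, so a server that redirects missing pages to a "helpful" page will show 200 here, which is exactly the soft-404 problem the checklist warns about.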
The Google Sitemaps Protocol is sort of a "robots inclusion protocol" respecting the gatekeeper robots.txt, standardized in the "robots exclusion protocol". Some Sitemaps users are pretty much confused by those mutually exclusive standards with regard to Web crawler support. Some of their suggestions like restricting crawling to URIs in the Sitemap make no sense, but adding a way to remove URIs from a search engine's index for example is a sound idea.
Others have suggested adding attributes like title, abstract, and parent or level. Those attributes would allow Webmasters to integrate the XML Sitemaps (formatted by XSLT stylesheets) into the user interface. I'm sure you've gathered many more suggestions; do you have plans to change your protocol?
Because the protocol is an open standard, we don't want to make too many changes. We don't want to disrupt those who have adopted it. However, over time, we may augment the protocol in cases where we find that to be useful. The protocol is designed to be extensible, so if people find extended uses for it, they should go for it.
You've launched the Google Sitemaps Protocol under the terms of the Attribution-ShareAlike Creative Commons License, and your Sitemaps generator as open source software as well. The majority of the 3rd party Sitemaps tools are free too.
Eight months after the launch of Google Sitemaps, MSN search silently accepts submissions of XML sitemaps, but does not officially support the Sitemaps Protocol. Shortly after your announcement Yahoo started to accept mass submissions in plain text format, and has added support of RSS and ATOM feeds a few months later. In the meantime each and every content management system produces XML Sitemaps, there are tons of great sitemap generators out there, and many sites have integrated Google Sitemaps individually.
You've created a huge user base, are you aware of other search engines planning to implement the Sitemaps Protocol?
We hope that all search engines adopt the standard so that webmasters have one easy way to tell search engines comprehensive information about their sites. We are actively helping more content management systems and websites to support Sitemaps. We believe the open protocol will help the web become cleaner with regard to crawlers, and are looking forward to widespread adoption.
Is there anything else you'd like to tell your users, and the Webmasters not yet using your service as well?
We greatly appreciate the enthusiastic participation of webmasters around the world. The input continues to help us learn what webmasters would most like and how we can deliver that to them.
Shiva says that "eight months back we released Sitemaps as an experiment, wondering if webmasters and client developers would just ignore us or if they'd see the value in a more open dialogue between webmasters and search engines. We're not wondering that anymore. With the current rate of adoption, we are now looking at how to work better with partners and supporting their needs so that more and more webmasters with all kinds of hosting environments can take advantage of what Sitemaps can offer."
And Michael wants to remind everyone that "We really do read every post on the Sitemaps newsgroup and we really want people to keep posting there."
Thanks, Sebastian, for giving us this opportunity to talk to webmasters. That's what the Sitemaps project is all about.
Using a robots.txt File
The robots.txt file is a good way to prevent a page from getting indexed. However, not every site can use it. The only robots.txt file that the spiders will read is the one at the top HTML directory of your server. This means you can only use it if you run your own domain. The spiders will look for the file in a location similar to these:
http://www.pageresource.com/robots.txt
http://www.javascriptcity.com/robots.txt
http://www.mysite.com/robots.txt
Any other location of the robots.txt file will not be read by a search engine spider, so the file locations below will not be worthwhile:
http://www.pageresource.com/html/robots.txt
http://members.someplace.com/you/robots.txt
http://someisp.net/~you/robots.txt
Now, if you have your own domain, you can see where to place the file. So let's take a look at exactly what needs to go into the robots.txt file to tell the spiders what you want done.
If you want to exclude all the search engine spiders from your entire domain, you would write just the following into the robots.txt file:
User-agent: *
Disallow: /
If you want to exclude all the spiders from a certain directory within your site, you would write the following:
User-agent: *
Disallow: /aboutme/
If you want to do this for multiple directories, you add on more Disallow lines:
User-agent: *
Disallow: /aboutme/
Disallow: /stats/
If you want to exclude certain files, then type in the rest of the path to the files you want to exclude:
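(The file names below are only placeholders for whatever pages you want to keep out.)
User-agent: *
Disallow: /aboutme/me.html
Disallow: /stats/hits.html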