Thursday, April 3, 2008

The Invisible Web

I think you might be wondering: what the hell is this "Invisible Web"?

Many untrained users naively expect that they can locate anything on the World Wide Web by using Google, Yahoo, or Ask.com. They can't. As powerful as these search engines are, they do not index everything on the World Wide Web. In fact, search engines index less than 10% of the entire web! The remaining 90% is called the "Invisible Web", also known as "The Cloaked Web" or "The Deep Web". This is the massive body of content that is publicly available but hidden from regular search engines.

Indeed, this is a tough concept to grasp: billions of web pages cannot be found by Google. But it's true. Billions of pages are beyond the cataloging abilities of search engines. The robot "spiders" that scan and catalog the World Wide Web are limited... they cannot see or index everything.
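To make the spider's limitation concrete, here is a minimal sketch of a link-following crawler run over a made-up site. The page paths, the `LINKS` map, and the `crawl` function are all hypothetical names invented for illustration, and real search engine spiders are far more elaborate. The only point is that a page no link points to, such as a quote generated after a form is submitted, never enters the index.

```python
from collections import deque

# Hypothetical toy site: each page maps to the pages it links to.
LINKS = {
    "/": ["/models", "/contact"],
    "/models": ["/quote-form"],
    "/contact": [],
    "/quote-form": [],  # a form; nothing links to the result it produces
}

# Hypothetical page that exists only after a visitor submits the form.
GENERATED_ON_REQUEST = {"/quote?customer=shelly"}


def crawl(start, links):
    """Breadth-first crawl that discovers pages only by following links."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


indexed = crawl("/", LINKS)
invisible = GENERATED_ON_REQUEST - indexed

print("Indexed:  ", sorted(indexed))
print("Invisible:", sorted(invisible))
```

The spider reaches the form page itself, but it has no link to follow to the generated quote, so that page stays invisible.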

To better visualize this concept, let's start with some size estimates from Google.com, Yahoo.com, Cyberatlas, and MIT. These stats are current as of Fall 2007:
  • Google.com indexes 12.5 billion public web pages.
  • 71 billion static web pages are publicly available. These pages can easily be found by Google and other search engines. (e.g. www.honda.com, www.australia.gov.au)
  • 6.5 billion static pages are hidden from the public. As private intranet content, these are the corporate pages that are only open to employees of specific companies. (e.g. employees.honda.com, secure.australia.gov.au)
  • 220+ billion database-driven pages are completely invisible to Google. These invisible pages are not the regular web pages you and I can make. Rather, these are dynamic database reports that exist only when called from large databases. (e.g. custom online car quote for Shelly, Australian government discussion on aboriginal taxation)
Google, considered the best search database today, can only catalog a fraction of this monstrous content. Even with electronic spiders cataloging millions of web pages each week, Google currently indexes only 12.5 billion of the 220+ billion pages out there... less than 6% of all available internet content.
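The percentage can be checked with a few lines of arithmetic. This is only a sketch using the figures quoted above; the variable names are made up for illustration, and adding the three page categories together assumes they do not overlap, which the estimates themselves do not state.

```python
# Figures quoted in the post (Fall 2007 estimates), in billions of pages.
GOOGLE_INDEXED = 12.5
PUBLIC_STATIC = 71.0
PRIVATE_STATIC = 6.5
DATABASE_DRIVEN = 220.0  # the post says "220+", so treat this as a lower bound


def share_percent(indexed, total):
    """Return `indexed` as a percentage of `total`."""
    return 100.0 * indexed / total


# The post's comparison: Google's index against the database-driven pages.
share_of_dynamic = share_percent(GOOGLE_INDEXED, DATABASE_DRIVEN)

# Assumption: the three categories do not overlap, so they can be summed.
all_pages = PUBLIC_STATIC + PRIVATE_STATIC + DATABASE_DRIVEN
share_of_all = share_percent(GOOGLE_INDEXED, all_pages)

print(f"Share of database-driven pages: {share_of_dynamic:.1f}%")
print(f"Share of all three categories:  {share_of_all:.1f}%")
```

Either way the result stays under 6%, and because "220+" is a lower bound, the real share could only be smaller.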

So if Google catalogs only 6% of the World Wide Web, and other search engines catalog even less, then where is the remaining web content, more than 90% of it, hidden?
Invisible Web
