Learning About Search Engines

Whether you’re selling a product or service online or just looking for a venue to share your ideas and writings, as a webmaster, one of your goals has got to be reaching as large an audience as possible. But with the millions (or billions) of sites populating the World Wide Web, getting yourself noticed can be a challenge.

Conversely, if you happen to be one of those surfing the Internet for products/services or information, then sifting through all those gazillions of sites can also be quite a daunting task.

That’s where search engines come in.

Simply put, search engines (or SEs) are searchable indexes of web sites (and their pages). SEs don’t really search the World Wide Web directly, but rather the databases of the full text of web pages residing on servers. When you use an SE to ‘search’ the web, what you actually get is a stale copy of the real web page. It is only when you click on the links provided in an SE’s search results page that the current version of the page is retrieved from the server.
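To make the distinction concrete, here is a minimal sketch (in Python, with hypothetical URLs and page text) of what 'searching the web' really is: a lookup against the engine's stored copies of pages, not the live sites themselves.

```python
# Hypothetical database of stored (cached) page text, keyed by URL.
# A real search engine's database is vastly larger and more structured.
cached_pages = {
    "http://example.com/widgets": "We sell widgets and gadgets online.",
    "http://example.com/blog": "Ideas and writings about the web.",
}

def search(query):
    """Return URLs whose stored copy contains the query term."""
    term = query.lower()
    return [url for url, text in cached_pages.items() if term in text.lower()]

# The match is made against the stored copy, which may be stale;
# only clicking the result link would fetch the current page.
print(search("widgets"))
```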

There are two types of SEs, depending on the way they gather the information needed to index the sites.

Human-Indexed Search Engines are more commonly referred to as Search Directories which, as the name suggests, gather information manually. This means that actual people examine the sites submitted to them and position them in the index according to predefined criteria. Examples include LookSmart and dmoz (the Open Directory Project).

Spider-Indexed Search Engines, on the other hand, are the more accurate claimants to the term SE. As the name suggests, they build their databases and index web sites automatically using computer robot programs called "spiders."


Although people say spiders 'crawl' the web, they actually stay in one place and find pages for potential inclusion by following the links found in pages that are already in their database. So if your web pages aren't linked to from other pages, the spiders cannot find them. In that case, the only way to get into a search engine's database is to submit your URL to it directly.
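The link-following step described above can be sketched with Python's standard-library HTML parser. The markup below is hypothetical; a real spider would also fetch each discovered URL, respect robots.txt, and track pages it has already seen.

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collects href values from anchor tags: the way a spider
    discovers new pages from ones already in its database."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

# A page already in the spider's database (hypothetical markup):
page = ('<p>See our <a href="/products.html">products</a> '
        'and <a href="/about.html">about</a> pages.</p>')
parser = LinkExtractor()
parser.feed(page)
print(parser.links)  # ['/products.html', '/about.html']
```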

After being 'spidered', your web pages are then passed on to another computer program for 'indexing.' All text, links and other content on your page are identified and stored in the database, and users are then able to locate your page when they search this database and your content matches their search criteria (through keywords).
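A common data structure behind this indexing step is an inverted index, which maps each word to the set of pages containing it. Here is a minimal sketch with hypothetical pages; real indexers also handle stemming, stop words, and ranking signals.

```python
# Hypothetical pages, keyed by URL:
pages = {
    "http://example.com/": "cheap web hosting and domain names",
    "http://example.com/seo": "web page optimization for search engines",
}

# Build an inverted index: each word maps to the set of pages
# whose text contains it.
index = {}
for url, text in pages.items():
    for word in text.lower().split():
        index.setdefault(word, set()).add(url)

# A keyword query then becomes a simple dictionary lookup:
print(sorted(index["web"]))
```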

Different search engines have their own spiders and rank sites (pages) based on a variety of factors (which may or may not be the same across SEs). The following is a comparison of some of the more popular ones:

Teoma (spider: Teoma_agent1)
- uses the Subject-Specific Popularity approach
- ranks pages based on the number of same-subject pages that reference them
- offers Advanced Search tools that let searchers refine searches using specific criteria

AltaVista (spider: Scooter)
- spiders the entire site
- indexes both the keywords and description meta tag fields as keywords
- gives higher priority to keywords that appear near the top of the page and closer to each other in the page text
- adds up occurrences of keywords in the page for higher scoring

Google (spider: Googlebot)
- uses a PageRank system as the central basis of its indexer
- gives higher priority to site linkage
- combines PageRank with text matching

MSN Search (spider: MSNBot)
- main listings on the results page (under the "Web Pages" heading) come from pages found by Yahoo's web crawling and from Yahoo's content acquisition program
- a change was planned by the end of 2004

Yahoo (spider: Yahoo! Slurp)
- follows HREF links (not SRC)
- ranks results according to their relevance to a particular query
- analyzes web page text, title and description accuracy, source, associated links, and other unique document characteristics
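The HREF-versus-SRC distinction noted for Yahoo! Slurp above is easy to illustrate. In the sketch below (hypothetical markup), a crawler collects only `href` targets for following, so an image referenced via `src` is never queued as a page to visit.

```python
from html.parser import HTMLParser

class HrefOnlyExtractor(HTMLParser):
    """Collects only HREF link targets, ignoring SRC attributes
    (images, embedded objects), as noted for Yahoo! Slurp."""
    def __init__(self):
        super().__init__()
        self.followed = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "href" and value:
                self.followed.append(value)

page = '<a href="/links.html">links</a> <img src="/logo.gif">'
p = HrefOnlyExtractor()
p.feed(page)
print(p.followed)  # only the HREF target; /logo.gif is not followed
```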

What the above essentially means is that getting your site listed in one does not necessarily mean you're also listed in another. You need to submit your pages to each one individually.

Also, not all types of pages/links can be included in an SE's database. Some are excluded because they cannot be 'spidered.' The pages that you don't see in search engine results are collectively termed the "Invisible Web."

Next time, we will discuss some of the ways that webmasters can optimize their search engine ranking.
