How Search Engine Robots Work
Search engines do not actually search the internet each time somebody types in a search query; that would take far too long. Instead, they search through databases of websites they have already indexed. The search engine robots find pages that are linked to from pages they already know about, or pages that are submitted to them. When a web page is submitted to a search engine, the URL is added to the robot's queue of websites to visit. Even if you don't directly submit a website, or the web pages within it, most robots will find your content if other websites link to it. This is part of the process referred to as building reciprocal links, and it is one of the reasons why it is crucial to build link popularity for a website and to get links from other relevant sites back to yours. It should be part of any website marketing strategy you adopt.
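As a rough illustration of this discovery process, the sketch below (a minimal example, not how any real search engine is implemented) keeps a queue of known or submitted URLs, fetches each page, and adds any newly linked URLs to the queue. The seed URL, page limit and politeness delay are illustrative assumptions.

```python
from collections import deque
from urllib.parse import urljoin
from urllib.request import urlopen
from html.parser import HTMLParser
import time

class LinkExtractor(HTMLParser):
    """Collects href values from anchor tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(seed_urls, max_pages=50, delay=1.0):
    """Breadth-first discovery: fetch known pages, queue newly found links."""
    queue = deque(seed_urls)       # pages submitted or already known
    seen = set(seed_urls)          # the crawler's "database" of known URLs
    while queue and len(seen) <= max_pages:
        url = queue.popleft()
        try:
            html = urlopen(url, timeout=10).read().decode("utf-8", "ignore")
        except Exception:
            continue               # unreachable pages are skipped for now
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)   # discovered via a link, not a submission
        time.sleep(delay)          # be polite to the servers being crawled
    return seen

# Example: discovered = crawl(["https://example.com/"])
```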
Search engine databases update at varying times. Once a website is in a search engine's database, the robots will keep visiting it regularly to pick up any changes made to the website's pages and to ensure they have the most current data. How often a website is visited depends on how the search engine schedules its visits, which varies from engine to engine. However, the more active a website is, the more often it will be visited: if a website changes frequently, or is extremely popular or heavily trafficked, the search engine will send its robots more often.
Sometimes robots are unable to access the website they are visiting. If a website is down, the robot may not be able to reach it. When this happens, the website may not be re-indexed, and if it happens repeatedly, the website may drop in the rankings.
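One common technique a robot can use to pick up changes without re-downloading every page is a conditional HTTP request: if the server supports the Last-Modified and If-Modified-Since headers, an unchanged page comes back as 304 Not Modified. The sketch below is a minimal illustration of that idea, assuming the server honours these headers; it is not a description of how any particular search engine schedules its visits.

```python
from urllib.request import Request, urlopen
from urllib.error import HTTPError

def needs_reindex(url, last_modified):
    """Return True if the page appears to have changed since last_modified.

    last_modified is the HTTP date string saved from the previous visit,
    e.g. 'Wed, 01 Jan 2020 00:00:00 GMT'. A 304 response means the stored
    copy is still current and the page does not need re-indexing.
    """
    request = Request(url, headers={"If-Modified-Since": last_modified})
    try:
        response = urlopen(request, timeout=10)
        return response.status == 200      # fresh content was returned
    except HTTPError as err:
        if err.code == 304:
            return False                   # unchanged since last visit
        raise                              # site down or other error
```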
Types of Search Engines
There are basically two types of search engines, which gather their listings in different ways.
Crawler Based Search Engines - Crawler based search engines, such as Google, create their listings automatically. Google's robot crawls, or spiders, the web, and people then search through what it has found.
Human Powered Search Engines - A human powered directory, such
as the Open Directory, depends on humans for its listings. You
submit a short description to the directory for your entire
site, or editors write one for sites they review. A search looks
for matches only in the descriptions submitted.
How do you identify a Search Engine Robot?
Search engines send out what are called spiders, crawlers or robots to visit your site and gather web pages. These search engine robots leave traces behind in your access logs, just as an ordinary visitor does. If you know what to look for, you can tell when a spider has come to call, so you need not worry about whether your site has been visited, and you can see exactly what a robot has recorded or failed to record. You can also spot robots that are making a large number of requests, which can inflate your page impression statistics or even burden your server.
The simplest way of spotting spiders is to look for their agent names, or what some people call browser names. Spiders and search engine robots have their own names, just like browsers. For example, the Netscape browser identifies itself as Mozilla, AltaVista's spider calls itself Scooter, and Yahoo's spider is named Slurp.
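As a rough illustration of looking for agent names in an access log, the sketch below scans an Apache-style combined log, where the user agent is the last quoted field on each line, and tallies requests from a few robot names. The log path and the set of agent signatures are illustrative assumptions; real agent strings vary and should be checked against each search engine's own documentation.

```python
import re
from collections import Counter

# Agent substrings to look for; Googlebot, Yahoo's Slurp and AltaVista's
# Scooter are used here as examples and are not an exhaustive list.
BOT_SIGNATURES = ("Googlebot", "Slurp", "Scooter")

# Combined log format: the user agent is the last quoted field on the line.
AGENT_PATTERN = re.compile(r'"([^"]*)"\s*$')

def count_bot_hits(logfile_path):
    """Tally requests per known robot signature in an Apache-style access log."""
    hits = Counter()
    with open(logfile_path, encoding="utf-8", errors="ignore") as logfile:
        for line in logfile:
            match = AGENT_PATTERN.search(line)
            if not match:
                continue
            agent = match.group(1)
            for signature in BOT_SIGNATURES:
                if signature in agent:
                    hits[signature] += 1
    return hits

# Example (path is hypothetical): print(count_bot_hits("access.log"))
```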
Search Engine Robots Crawling Problems
Search engine robots follow standard links with slashes, but dynamic pages, generated from databases or content management systems, have dynamic URLs containing question marks (?) and other command punctuation such as &, %, + and $. Search engine robots can find such dynamic sites difficult to crawl because these parameters in the URLs can block them. The simplest search engine optimisation solution is to generate static pages from your dynamic data, store them in the file system, and link to them using simple URLs. Site visitors and robots can access these files easily. This also removes load from your back-end database, as it does not have to gather content every time someone wants to view a page.
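How the static pages are generated depends entirely on your own back end, but the idea can be as simple as a script that runs periodically, reads each record from the database and writes a plain HTML file with a clean filename that visitors and robots can fetch directly. The sketch below assumes a hypothetical SQLite table of articles; the table name, columns and output directory are made up for illustration.

```python
import sqlite3
from html import escape
from pathlib import Path

def publish_static_pages(db_path="site.db", output_dir="static"):
    """Render each article row to a static HTML file with a simple URL.

    Assumes a hypothetical table: articles(slug TEXT, title TEXT, body TEXT).
    Instead of /article.php?id=42, robots can crawl /static/some-slug.html.
    """
    out = Path(output_dir)
    out.mkdir(exist_ok=True)
    connection = sqlite3.connect(db_path)
    try:
        rows = connection.execute("SELECT slug, title, body FROM articles")
        for slug, title, body in rows:
            page = (
                "<!DOCTYPE html>\n"
                f"<html><head><title>{escape(title)}</title></head>\n"
                f"<body><h1>{escape(title)}</h1><p>{escape(body)}</p></body></html>\n"
            )
            (out / f"{slug}.html").write_text(page, encoding="utf-8")
    finally:
        connection.close()
```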
Search engine robots sometimes have problems finding pages on the web. Spidering issues can be caused by Macromedia Flash sites, which present their content visually rather than as text that search engine robots can read. Search engine robots can also have difficulty penetrating JavaScript navigation menus.
Conclusion
Search engine robots are your friends. They ensure that your site is seen by potential customers. The quality of its search results decides the fate of a search engine, and different search engines try to cater to different users. Search engine technology is evolving every day, and new research is being carried out to handle more conceptual and descriptive search queries. However, the same principle applies: the search engine that provides the most relevant results will rule.
About the author:
S Prema is Search Engine Optimisation Executive for UK-based
internet marketing company, Star Internet Ltd. Clients of Star
Internet benefit from a range of services designed to maximise
ROI from internet marketing activities. To find out more, visit
http://www.affordable-seo.co.uk