Indexing and Working Process of Search Engines


Presentation by: ADMEC MULTIMEDIA INSTITUTE www.admecindia.co.in


The first basic truth you need to understand in SEO is that search engines are not human.

While this might be obvious to everybody, the differences between how humans and search engines view web pages are not. Search engines are primarily text-driven, even where they also handle voice and image input. Although technology is advancing rapidly, search engines are far from intelligent creatures that can feel the beauty of a cool design or enjoy the sounds and movement in movies. Instead, search engines crawl web pages, looking at particular site content (mainly text) to get an idea of what a site is about.



First, search engines crawl the website to see what is on it. This task is performed by software called a crawler or a spider. Spiders visit a website, follow links from one page to another, and index everything they find on their way. With more than 20 billion pages on the web, it is impossible for a spider to visit every site daily just to see whether a new page has been added or an existing page has been modified. As a result, crawlers may not visit your site for a month or two.

Crawling

Crawling is the process by which search engines discover publicly available web pages. Google uses software called "web crawlers" for crawling. The crawl process begins with a list of web addresses from past crawls and sitemaps provided by website owners.
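The crawl process described above can be sketched in a few lines. This is only an illustration under stated assumptions: the `FAKE_WEB` dictionary stands in for the real web (no network requests are made), the example.com URLs and the `crawl` function are hypothetical names, and a breadth-first visiting order is one simple choice, not a description of how Googlebot actually schedules its work.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to the URLs it links to.
FAKE_WEB = {
    "http://example.com/": ["http://example.com/a", "http://example.com/b"],
    "http://example.com/a": ["http://example.com/b", "http://example.com/c"],
    "http://example.com/b": ["http://example.com/"],
    "http://example.com/c": [],
    "http://example.com/orphan": [],  # nothing links here, so it is never found
}

def crawl(seeds, max_pages=100):
    """Visit the seed URLs first, then follow links from page to page."""
    frontier = deque(seeds)   # URLs waiting to be visited
    seen = set(seeds)         # URLs already queued, to avoid revisiting
    visited = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)
        for link in FAKE_WEB.get(url, []):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited

# The seed list plays the role of "past crawls and sitemaps".
pages = crawl(["http://example.com/"])
```

In this sketch the orphan page is never reached because no crawled page links to it. Adding it to the seed list, which is roughly what a sitemap does, would make it discoverable.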


What you can do is check what a crawler sees on your site. As mentioned above, crawlers are not humans, and they do not see images, Flash movies, JavaScript, frames, or password-protected pages and directories. If you have any of these on your site, run a spider simulator to see whether they are viewable by the spider. If they are not viewable, they will not be spidered, indexed, or processed. In a word, they will be non-existent for search engines.
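A very small spider simulator can be written with Python's standard `html.parser` module. The sketch below assumes a text-only crawler that skips script and style content and follows `<a href>` links. Treating an image's `alt` text as visible is an assumption made for illustration, and `SpiderView`, `spider_view`, and `SAMPLE` are hypothetical names, not part of any real tool.

```python
from html.parser import HTMLParser

class SpiderView(HTMLParser):
    """Collects the text and links a simple text-only crawler would see."""

    def __init__(self):
        super().__init__()
        self.text = []
        self.links = []
        self._skip = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag == "img":
            # Assumption: the image itself is invisible, only alt text counts.
            alt = dict(attrs).get("alt")
            if alt:
                self.text.append(alt)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.text.append(data.strip())

def spider_view(html):
    """Return (visible_text, links) for one HTML page."""
    parser = SpiderView()
    parser.feed(html)
    parser.close()
    return " ".join(parser.text), parser.links

SAMPLE = """<html><body>
<h1>Welcome</h1>
<img src="logo.png" alt="Company logo">
<img src="banner.png">
<script>document.write("Hidden menu");</script>
<a href="/courses">Our courses</a>
</body></html>"""

text, links = spider_view(SAMPLE)
```

Here the banner image and the menu written by JavaScript contribute nothing to `text`, which is the situation the paragraph above warns about.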

Spider

A spider is a program (a set of instructions) that automatically fetches web pages. Spiders are used to feed pages to search engines. It is called a spider because it crawls over the Web. Another term for these programs is web crawler. Examples: Google's spider is named "Googlebot", Bing's spider is named "Bingbot", and AltaVista's spider is named "Scooter".


a) Once a page has been crawled, the next step is to index all of its content.
b) The indexed page is stored in a giant database, from which it can be retrieved later as required.
c) Essentially, indexing is the process of identifying the words that best describe a page and assigning the page to the particular keywords that people search for on the web.
d) A human could hardly process such amounts of information, but search engines generally manage this task in a short time.
e) Sometimes a search engine does not get the meaning of a page right. If you help it by optimizing the page, it is easier for the search engine to classify your pages correctly and for you to get higher rankings and better results.
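The indexing step above is commonly illustrated with an inverted index, a mapping from each word to the pages that contain it. The following is a minimal sketch under that assumption. The page names, the sample texts, and the helper names `tokenize` and `build_index` are hypothetical, and real search engines store far more than this (word positions, weights, and so on).

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map every word to the set of pages that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

# Hypothetical crawled pages: name -> extracted text.
PAGES = {
    "page1": "Search engines crawl web pages",
    "page2": "Spiders crawl and index pages",
}

index = build_index(PAGES)
```

Looking up `index["crawl"]` returns both pages, while `index["spiders"]` returns only `page2`, so the lookup never has to rescan the page texts.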



When someone enters a query in a search engine, the search engine processes it: it compares the keywords or string in the search request with the indexed pages stored in its database. Since more than one page (in practice, millions of pages) is likely to contain the search string or keyword, the search engine calculates the relevancy of each page in its index to the searched keywords or string and returns the best results according to that relevancy.
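Relevancy calculation can be illustrated with a deliberately naive score: count how often the query words occur on each page and sort by that count. This is an assumption chosen for simplicity, not the formula any real search engine uses, and `rank`, `tokenize`, and the sample pages are hypothetical.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def rank(query, pages):
    """Return page names ordered by how often they contain the query words."""
    terms = tokenize(query)
    scores = {}
    for url, text in pages.items():
        counts = Counter(tokenize(text))
        score = sum(counts[term] for term in terms)
        if score > 0:          # pages with no matching word are left out
            scores[url] = score
    # Highest score first; ties are broken alphabetically by page name.
    return sorted(scores, key=lambda url: (-scores[url], url))

# Hypothetical indexed pages: name -> text.
PAGES = {
    "a": "seo tips and seo tools for seo beginners",
    "b": "seo basics",
    "c": "cooking recipes",
}

results = rank("seo tips", PAGES)
```

Page `a` scores 4 (three occurrences of "seo" plus one of "tips"), page `b` scores 1, and page `c` is dropped because it matches nothing.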

1. The web server sends the query to the index servers. The content inside the index servers is similar to the index at the back of a book: it tells which pages contain the words that match the query.
2. The query travels to the doc servers, which actually retrieve the stored documents. Snippets are generated to describe each search result.
3. The search results are returned to the user in a fraction of a second.
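The three steps above can be mimicked in a single process. In the sketch below, `index_server` and `doc_server` are ordinary functions standing in for what would be separate machines, and `DOCS`, `INDEX`, and the snippet width are hypothetical values chosen for illustration.

```python
# Hypothetical stored documents and a prebuilt word -> document-id index.
DOCS = {
    1: "Crawlers follow links from one page to another across the web.",
    2: "An index maps each word to the pages that contain it.",
}
INDEX = {"crawlers": [1], "links": [1], "page": [1], "index": [2], "pages": [2]}

def index_server(query):
    """Step 1: return ids of documents that contain every query word."""
    words = query.lower().split()
    if not words:
        return []
    ids = set(INDEX.get(words[0], []))
    for word in words[1:]:
        ids &= set(INDEX.get(word, []))
    return sorted(ids)

def doc_server(doc_id, query, width=30):
    """Step 2: fetch the stored document and cut a snippet around the first query word."""
    text = DOCS[doc_id]
    pos = text.lower().find(query.lower().split()[0])
    start = max(pos - width, 0)
    end = min(pos + width, len(text))
    return text[start:end]

def search(query):
    """Step 3: return (document id, snippet) pairs to the user."""
    return [(doc_id, doc_server(doc_id, query)) for doc_id in index_server(query)]

results = search("index")
```

Splitting lookup from retrieval means the index only has to hold words and document ids, while the full text is fetched only for the few documents that match.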


ADMEC MULTIMEDIA INSTITUTE

For more information, visit http://www.admecindia.co.in or email info@admecindia.co.in

Contact Us:
ADMEC MULTIMEDIA INSTITUTE
C-7/114, IInd Floor, Sector- 7, Rohini, Delhi- 85
Landmark: Near Rohini East Metro Station
Helpline 1: +91 9811 818 122
Helpline 2: +91 9911 782 350

