Crawlers

A crawler is a program that search engines use to collect data from the internet. When a crawler visits a website, it reads the website's entire content (i.e. the text) and stores it in a database. It also stores all the internal and external links found on the website. The crawler visits these stored links at a later point in time, which is how it moves from one website to the next. Through this process, the crawler captures and indexes every website that can be reached through links from the sites it has already visited.
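The visit, store, and follow-links loop described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular search engine works. It assumes the page-fetching step is supplied as a function (the hypothetical `fetch` parameter, which returns HTML or `None`), that stored links are visited in breadth-first order, and that a `max_pages` limit stops the crawl. Text extraction is deliberately naive.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Dict, List, Optional
from urllib.parse import urldefrag, urljoin


class LinkAndTextParser(HTMLParser):
    """Collects the href of every <a> tag plus all text content of a page.

    Simplification: text inside <script>/<style> is not filtered out.
    """

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []
        self.text_parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        if data.strip():
            self.text_parts.append(data.strip())


def crawl(start_url: str,
          fetch: Callable[[str], Optional[str]],
          max_pages: int = 100) -> Dict[str, str]:
    """Breadth-first crawl: store each page's text and queue its links for later."""
    database: Dict[str, str] = {}    # url -> extracted text (the stored content)
    frontier = deque([start_url])    # stored links waiting to be visited
    seen = {start_url}               # avoids queueing the same URL twice
    while frontier and len(database) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:             # page could not be retrieved; skip it
            continue
        parser = LinkAndTextParser()
        parser.feed(html)
        parser.close()               # flush any buffered trailing text
        database[url] = " ".join(parser.text_parts)
        for href in parser.links:
            # Resolve relative links and drop #fragments before queueing.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute not in seen:
                seen.add(absolute)
                frontier.append(absolute)
    return database
```

For a quick offline test, a plain dictionary can stand in for the web: `crawl("https://a.example/", pages.get)` where `pages` maps URLs to HTML strings. The `seen` set is what keeps the crawler from looping forever when two sites link to each other.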

There are hundreds of web crawlers and bots scouring the internet; below is a list of 10 popular ones.

1. Googlebot
Googlebot is one of the most popular web crawlers on the internet today, as it indexes content for Google's search engine. One great thing about Google's web crawler is that Google gives site owners a lot of tools and control over the crawling process.

2. Bingbot
Bingbot is a web crawler deployed by Microsoft in 2010 to supply information to its Bing search engine. It replaced the earlier MSN bot.

3. Slurp Bot
Yahoo Search results come from both Yahoo's own web crawler, Slurp, and Bing's web crawler, since much of Yahoo Search is now powered by Bing. Sites should allow Yahoo Slurp access in order to appear in Yahoo Mobile Search results.

4. DuckDuckBot
DuckDuckBot is the web crawler for DuckDuckGo, a search engine that has become quite popular because it emphasizes privacy and does not track its users. It now handles over 12 million queries per day. DuckDuckGo gets its results from over four hundred sources, including hundreds of vertical sources that deliver niche Instant Answers, DuckDuckBot (its own crawler) and crowd-sourced sites such as Wikipedia. It also shows more traditional links in its search results, which it sources from Yahoo!, Yandex and Bing.

5. Baiduspider
Baiduspider is the official name of the web crawling spider for Baidu, the Chinese search engine. It crawls web pages and returns updates to the Baidu index. Baidu is the leading Chinese search engine, with an 80% share of the search engine market in mainland China.

6. YandexBot
YandexBot is the web crawler for Yandex, one of the largest Russian search engines. According to LiveInternet, for the three months ended December 31, 2015, Yandex generated 57.3% of all search traffic in Russia.

7. Sogou Spider
Sogou Spider is the web crawler for Sogou.com, a leading Chinese search engine launched in 2004. As of April 2016, it had a rank of 103 in Alexa's internet rankings. Note: the Sogou web spider does not respect the robots.txt standard and, because of its excessive crawling, is banned from many websites.
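For contrast, a well-behaved crawler checks a site's robots.txt before fetching a page. The sketch below uses Python's standard `urllib.robotparser` on a hypothetical robots.txt that shuts out one crawler entirely and keeps every other crawler out of a `/private/` directory. The user-agent token `Sogou web spider` is an assumption based on the name Sogou's crawler is commonly reported to send. As noted above, Sogou is said to ignore robots.txt, so rules like these only restrain crawlers that actually honor them.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block the Sogou spider from the whole site and keep
# all other crawlers out of /private/. Groups are separated by a blank line.
ROBOTS_TXT = """\
User-agent: Sogou web spider
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if a crawler that honors robots.txt may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())   # parse rules from text, no network access
    return parser.can_fetch(user_agent, url)
```

With these rules, `is_allowed("Googlebot/2.1", "https://example.com/index.html")` is allowed, while the same URL is refused for `"Sogou web spider/4.0"`, and any crawler is refused `https://example.com/private/...`.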

8. Exabot
Exabot is the web crawler for Exalead, a search engine based in France. Exalead was founded in 2000 and now has more than 16 billion pages indexed.

9. Facebook External Hit
Facebook allows its users to send links to interesting web content to other Facebook users. Part of how this works involves temporarily displaying certain images or details related to the web content, such as the title of the webpage or the embed tag of a video. Facebook's crawler, facebookexternalhit, fetches this information from the linked page, and only after a user provides a link.
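To illustrate the kind of data such a link-preview crawler pulls from a shared page, here is a minimal sketch using Python's standard `html.parser`. It is not Facebook's implementation. It assumes the page exposes a `<title>` element and, optionally, Open Graph `<meta property="og:...">` tags, which Facebook's link previews are known to read. The helper names `PreviewParser` and `build_preview` are hypothetical.

```python
from html.parser import HTMLParser
from typing import Dict


class PreviewParser(HTMLParser):
    """Collects the <title> text and any Open Graph <meta property="og:..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.og: Dict[str, str] = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            prop = attributes.get("property") or ""
            if prop.startswith("og:") and attributes.get("content"):
                self.og[prop] = attributes["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def build_preview(html: str) -> Dict[str, str]:
    """Return fields a link preview might show, preferring og:title over <title>."""
    parser = PreviewParser()
    parser.feed(html)
    parser.close()
    return {
        "title": parser.og.get("og:title", parser.title.strip()),
        "video": parser.og.get("og:video", ""),
    }
```

Preferring `og:title` over `<title>` reflects the fact that sites often write a shorter, share-friendly title specifically for previews. The plain `<title>` is the fallback when no Open Graph tags are present.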

10. Alexa Crawler
ia_archiver is the web crawler for Amazon's Alexa internet rankings. Alexa uses the information it collects to rank both local and international sites.
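Site owners often want to know which of these crawlers are visiting. A simple approach is to look for each crawler's token in the HTTP `User-Agent` header. The sketch below assumes the substrings in `CRAWLER_TOKENS`, which are based on the user-agent strings these crawlers commonly send. Check each vendor's documentation before relying on them. User-agent headers are easy to spoof, so a match only suggests a crawler. Confirming one requires extra checks, such as the reverse-DNS verification Google documents for Googlebot.

```python
from typing import Optional

# Assumed substrings of each crawler's User-Agent header (lowercase), mapped to
# the names used in the list above.
CRAWLER_TOKENS = {
    "googlebot": "Googlebot",
    "bingbot": "Bingbot",
    "slurp": "Slurp Bot",
    "duckduckbot": "DuckDuckBot",
    "baiduspider": "Baiduspider",
    "yandexbot": "YandexBot",
    "sogou web spider": "Sogou Spider",
    "exabot": "Exabot",
    "facebookexternalhit": "Facebook External Hit",
    "ia_archiver": "Alexa Crawler",
}


def identify_crawler(user_agent: str) -> Optional[str]:
    """Return the name of the first crawler whose token appears in the User-Agent, else None."""
    ua = user_agent.lower()
    for token, name in CRAWLER_TOKENS.items():
        if token in ua:
            return name
    return None
```

For example, `identify_crawler("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)")` returns `"Googlebot"`, while an ordinary desktop browser string returns `None`.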
