Web Crawler | Web Spider | Web Robot

Imalka Prasadini
Jun 6, 2024


A web crawler, also known as a web spider or web bot, is the program that performs web crawling. Crawlers are used as tools for data collection and web scraping. Simply put, crawlers search and download web pages automatically.


What is a Crawler?

A crawler is a computer program or script that browses the World Wide Web methodically and automatically; this process is called crawling. Search engines use crawlers to search through the internet and build their indexes.

How Does It Work?

Let’s see how it works briefly.

The starting set of URLs is called the seeds. As the first step, these are added to the frontier, the list of requested URLs that still need to be downloaded. The frontier can be organized as a standard FIFO queue; alternatively, it can be a priority queue in which the most important URLs move to the front and are downloaded earlier.

In the middle of crawling, if the crawler finds a new URL that was not visited earlier, it is added to the frontier and will be visited based on its importance. This process is repeated, according to the crawler's policies, until the queue is empty.
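As a rough illustration, here is a minimal crawl loop in Python, assuming only the standard library; the seed list, the max_pages cap, and the plain FIFO frontier are illustrative assumptions rather than a production design:

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects href values from anchor tags on a downloaded page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seeds, max_pages=10):
    frontier = deque(seeds)  # the frontier: URLs waiting to be downloaded
    visited = set()          # URLs already fetched

    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        if url in visited:
            continue
        visited.add(url)
        try:
            html = urlopen(url, timeout=5).read().decode("utf-8", "replace")
        except OSError:
            continue  # skip pages that fail to download
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            absolute = urljoin(url, link)  # resolve relative links
            if absolute not in visited:
                frontier.append(absolute)  # new, unvisited URL joins the frontier
    return visited


print(crawl(["https://example.com/"]))
```

Replacing the deque with a priority queue ordered by page importance would give the "most important URLs first" behavior described above.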

It mainly uses two strategies to crawl (a short sketch contrasting them follows the list):

1. Breadth First — start crawling from the seeds and visit all URLs at the current level before moving deeper.

2. Depth First — start crawling from the root and traverse down through child nodes before backtracking.
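To make the contrast concrete, here is a small sketch over a hypothetical in-memory link graph (the URLs and the graph are made up for illustration). The only difference between the two strategies is which end of the frontier the next URL is taken from:

```python
from collections import deque

# A tiny in-memory link graph standing in for real pages (hypothetical URLs).
GRAPH = {
    "/": ["/a", "/b"],
    "/a": ["/a1", "/a2"],
    "/b": ["/b1"],
    "/a1": [], "/a2": [], "/b1": [],
}


def traverse(seed, depth_first=False):
    frontier = deque([seed])
    visited, order = set(), []
    while frontier:
        # pop() takes the newest URL (a stack, so depth-first);
        # popleft() takes the oldest (a queue, so breadth-first).
        url = frontier.pop() if depth_first else frontier.popleft()
        if url in visited:
            continue
        visited.add(url)
        order.append(url)
        frontier.extend(GRAPH[url])
    return order


print(traverse("/"))                    # ['/', '/a', '/b', '/a1', '/a2', '/b1']
print(traverse("/", depth_first=True))  # ['/', '/b', '/b1', '/a', '/a2', '/a1']
```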

These strategies are used to understand the topology of the web while crawling.

Use Cases

Crawlers are used in many areas such as sentiment analysis, market research, consumer monitoring, price comparison, affiliate marketing, stock markets, AI/ML, and more. Googlebot, Bingbot, and DuckDuckBot are examples of crawlers.

Conclusion

A crawler is similar to a librarian: it looks through the web, assigns data to certain categories, and then indexes or catalogs it as required, so that the crawled information can be retrieved and evaluated.

