
Introduction

This application scrapes websites with multiple web browsers running in parallel. After scraping tasks complete, the browser pages stay open, so new scraping tasks can run without opening new pages, which saves time and proxy traffic.

The following diagram illustrates how the application works:

[Diagram: Architecture]

Users run scraping tasks by sending requests to the application's HTTP gateway.

info

See the API Reference for more details.

When a scraping task is submitted, the HTTP gateway puts it into a pending queue (one queue per country) and waits for its completion by listening to the result queues.
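The gateway side of this flow can be sketched roughly as follows. This is a minimal in-memory illustration, not the application's actual code: the names `Gateway`, `ScrapingTask`, `run_task`, `listen_results`, and `fake_scraper` are hypothetical, `asyncio.Queue` stands in for whatever queue backend the application really uses, and matching results to callers by task id is an assumption.

```python
import asyncio
import itertools
from dataclasses import dataclass


@dataclass
class ScrapingTask:
    task_id: int
    country: str
    url: str


class Gateway:
    """Hypothetical stand-in for the HTTP gateway's queueing logic."""

    def __init__(self, countries):
        # One pending queue and one result queue per country.
        self.pending = {c: asyncio.Queue() for c in countries}
        self.results = {c: asyncio.Queue() for c in countries}
        self._waiters = {}  # task_id -> future awaited by the caller
        self._ids = itertools.count(1)

    async def run_task(self, country, url):
        """Enqueue a task, then wait until its result is published."""
        task = ScrapingTask(next(self._ids), country, url)
        future = asyncio.get_running_loop().create_future()
        self._waiters[task.task_id] = future
        await self.pending[country].put(task)
        return await future

    async def listen_results(self, country):
        """Forward results from a result queue to the waiting callers."""
        while True:
            task_id, result = await self.results[country].get()
            future = self._waiters.pop(task_id, None)
            if future is not None and not future.done():
                future.set_result(result)


async def fake_scraper(gateway, country):
    # Placeholder for the real scraper: echoes the URL back as the result.
    while True:
        task = await gateway.pending[country].get()
        await gateway.results[country].put((task.task_id, f"scraped {task.url}"))


async def demo():
    gateway = Gateway(["us", "fr"])
    background = [asyncio.create_task(gateway.listen_results(c)) for c in gateway.pending]
    background += [asyncio.create_task(fake_scraper(gateway, c)) for c in gateway.pending]
    try:
        return await asyncio.gather(
            gateway.run_task("us", "https://example.com/a"),
            gateway.run_task("fr", "https://example.com/b"),
        )
    finally:
        for worker in background:
            worker.cancel()


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this sketch each caller simply awaits a future, so several requests for the same country can be in flight at once without reading each other's results.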

The scraper manages multiple web browser pages via page slots. It listens to the pending queues and automatically executes scraping tasks. When a task completes (either successfully or after being blocked more than 3 times), its result is put into the result queues.

info

See the Page Slot Management page for more details.
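To make the scraper loop more concrete, here is a minimal sketch of page slots consuming a pending queue. It is an illustration under several assumptions: `PageSlot`, `TaskResult`, `BlockedError`, and `fake_fetch` are hypothetical names, a single pending queue is used instead of one per country, and `fetch` stands in for navigating a real, already-open browser page. The retry rule follows one reading of the description above: a task is retried until it succeeds or has been blocked more than 3 times.

```python
import asyncio
from dataclasses import dataclass

MAX_BLOCKS = 3  # a task gives up once it has been blocked more than 3 times


class BlockedError(Exception):
    """Hypothetical signal that the target site blocked the request."""


@dataclass
class TaskResult:
    url: str
    status: str  # "success" or "blocked"
    blocked_count: int
    content: str = ""


class PageSlot:
    """Hypothetical page slot: owns one page that stays open between tasks."""

    def __init__(self, slot_id, fetch):
        self.slot_id = slot_id
        self.fetch = fetch  # stand-in for navigating the open page

    async def run(self, pending, results):
        # Listen to the pending queue forever, reusing the same page.
        while True:
            url = await pending.get()
            results.put_nowait(await self.execute(url))
            pending.task_done()

    async def execute(self, url):
        blocked = 0
        while True:
            try:
                content = await self.fetch(url)
                return TaskResult(url, "success", blocked, content)
            except BlockedError:
                blocked += 1
                if blocked > MAX_BLOCKS:
                    return TaskResult(url, "blocked", blocked)


async def demo():
    attempts = {}

    async def fake_fetch(url):
        # Simulated site: "flaky" blocks twice, "always-blocked" never succeeds.
        attempts[url] = attempts.get(url, 0) + 1
        await asyncio.sleep(0)
        if "always-blocked" in url or ("flaky" in url and attempts[url] < 3):
            raise BlockedError(url)
        return f"<html>{url}</html>"

    pending, results = asyncio.Queue(), asyncio.Queue()
    slots = [PageSlot(i, fake_fetch) for i in range(2)]
    workers = [asyncio.create_task(s.run(pending, results)) for s in slots]
    urls = [
        "https://example.com/ok",
        "https://example.com/flaky",
        "https://example.com/always-blocked",
    ]
    for url in urls:
        pending.put_nowait(url)
    await pending.join()  # wait until every task has produced a result
    for worker in workers:
        worker.cancel()
    return {r.url: r for r in (results.get_nowait() for _ in urls)}


if __name__ == "__main__":
    for result in asyncio.run(demo()).values():
        print(result.url, result.status, result.blocked_count)
```

Because each slot loops over the queue rather than exiting after one task, the number of slots caps concurrency, and no page has to be reopened between tasks.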