Introduction

The goal of Charted Sea is to scrape websites with multiple web browsers running in parallel. After scraping tasks complete, the web browsers stay open, so new scraping tasks can run without re-opening pages, which saves time and proxy traffic.

The following diagram illustrates how the service works:

Architecture

Users interact with the application through an HTTP gateway, which lets them run scraping tasks.

Info: See the API Reference for more details.

When a scraping task is submitted, the HTTP gateway puts it into a pending queue (one queue per country) and waits for its completion by listening to the result queues.
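The gateway flow described above can be sketched roughly as follows. This is a minimal illustration, not Charted Sea's actual implementation: it assumes in-memory `asyncio.Queue` objects in place of the real queues, a single result queue for simplicity, and hypothetical names (`Gateway`, `run_task`, `listen_results`) and message fields (`id`, `url`).

```python
import asyncio
import uuid
from collections import defaultdict


class Gateway:
    """Hypothetical gateway: enqueue a task, then wait for its result."""

    def __init__(self):
        # Assumption: in-memory queues stand in for the real queue backend.
        self.pending = defaultdict(asyncio.Queue)  # country -> pending queue
        self.results = asyncio.Queue()             # single result queue (simplified)
        self._waiters = {}                         # task id -> Future awaiting the result

    async def listen_results(self):
        """Forward each incoming result to the request that is waiting for it."""
        while True:
            result = await self.results.get()
            waiter = self._waiters.pop(result["id"], None)
            if waiter is not None and not waiter.done():
                waiter.set_result(result)

    async def run_task(self, url, country):
        """Put a task on its country's pending queue and wait for completion."""
        task_id = uuid.uuid4().hex
        waiter = asyncio.get_running_loop().create_future()
        self._waiters[task_id] = waiter
        await self.pending[country].put({"id": task_id, "url": url})
        return await waiter
```

In this sketch, one `listen_results()` task runs in the background for the lifetime of the gateway, and each HTTP request handler would call `run_task()` and return its result to the user.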

The scrapers manage multiple web browsers distributed across multiple servers. They listen to the pending queues and execute scraping tasks automatically. A task is complete when it either succeeds or has been blocked more than 3 times; its result is then put into the result queues.
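As a rough sketch of the completion rule on the scraper side, the loop below retries a task until it succeeds or has been blocked more than 3 times, then publishes the outcome. The names `scrape_with_retries`, `scraper_worker`, and `fetch`, the `status` values, and the use of `asyncio.Queue` are assumptions made for illustration; the real scrapers drive web browsers and may track blocks differently.

```python
import asyncio

MAX_BLOCKS = 3  # from the text: a task ends once it is blocked more than 3 times


async def scrape_with_retries(task, fetch):
    """Call `fetch` until it succeeds or has been blocked more than MAX_BLOCKS times.

    `fetch` is a hypothetical coroutine returning a dict with a "status" key.
    """
    blocked = 0
    while True:
        outcome = await fetch(task["url"])
        if outcome["status"] != "blocked":
            return {"id": task["id"], "status": "success", "data": outcome.get("data")}
        blocked += 1
        if blocked > MAX_BLOCKS:
            return {"id": task["id"], "status": "blocked", "attempts": blocked}


async def scraper_worker(pending, results, fetch):
    """Consume one pending queue forever and publish each outcome to the result queue."""
    while True:
        task = await pending.get()
        await results.put(await scrape_with_retries(task, fetch))
```

With this reading of the rule, a task that is always blocked is attempted four times before a `blocked` result is reported.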