
API Reference

info

Contact Marc Plouhinec to get an authentication token.

Run Scraping Tasks

HTTP Method and URL

POST https://continuous-scraper.common.chartedapi.com/scraping-tasks/${scraper}/run?autoCancelAfterSec=1200&includeAllFields=false&waitForCompletion=true

Path Parameters

  • scraper: shopee or lazada.

Query Parameters

  • autoCancelAfterSec (default = 600): scraping execution timeout (only enabled if waitForCompletion=true).
  • includeAllFields (default = false): if false, only the most important fields are returned.
  • waitForCompletion (default = true): if true, wait for the scraping tasks to terminate. If false, return immediately; use the Read Scraping Tasks API with the uuids parameter to get the results.
warning

waitForCompletion=true requires maintaining an open HTTP connection to receive results after the scraping tasks complete. If this connection is closed prematurely, the ongoing scraping tasks will be cancelled.
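
Because the connection must stay open until the tasks finish, the client's own timeout has to be longer than autoCancelAfterSec. The following is a minimal Python sketch of that idea, not an official client: the helper name build_run_request and the 30-second safety margin are assumptions made for illustration. It only builds the request and does not send it.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://continuous-scraper.common.chartedapi.com"


def build_run_request(scraper, token, urls, auto_cancel_after_sec=600):
    """Build (but do not send) a blocking 'run' request for the given URLs."""
    query = urllib.parse.urlencode({
        "autoCancelAfterSec": auto_cancel_after_sec,
        "includeAllFields": "false",
        "waitForCompletion": "true",
    })
    body = json.dumps({"requests": [{"url": u} for u in urls]}).encode("utf-8")
    request = urllib.request.Request(
        f"{BASE_URL}/scraping-tasks/{scraper}/run?{query}",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )
    # Assumption: keep the socket open slightly longer than the server-side
    # timeout so the client does not hang up first and cancel the tasks.
    client_timeout = auto_cancel_after_sec + 30
    return request, client_timeout
```

The request could then be sent with urllib.request.urlopen(request, timeout=client_timeout).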

Request Headers

  • Authorization: Bearer ${token}

Request Body Example

{
"requests": [
{
"url": "https://shopee.co.id/adidas-Runfalcon-3.0-Men's-Running-Shoes---Core-Black-i.234490784.25462132870?sp_atk=67836f0a-c0c8-40d4-a0a7-bd1998943cc9&xptdk=67836f0a-c0c8-40d4-a0a7-bd1998943cc9"
},
{
"url": "https://shopee.co.id/SEPATU-SNEAKERS-CASUAL-ADIDS-SPEZIAL-HANDBALL-BNIB-i.197850298.20595626884?sp_atk=ff613749-9cf1-4b40-ae8e-f76dd2279ba8&xptdk=ff613749-9cf1-4b40-ae8e-f76dd2279ba8"
},
{
"url": "https://shopee.co.th/api/v4/item/get_list",
"method": "POST",
"payload": {
"shop_item_ids": [
{
"item_id": 5289970960,
"shop_id": 237980476
},
{
"item_id": 23556386969,
"shop_id": 644345618
}
],
"source": "microsite_individual_product"
}
}
]
}

Request body fields

  • requests: An array of objects, each with the following fields:
    • url: The API or page URL of the resource to scrape.
    • referrer (optional): The referrer to pass as a request header.
    • method (optional): The request method (GET or POST).
    • payload (optional): The request body (if method is POST).
  • cacheMaxAgeInSec (default = 0): Maximum age, in seconds, of a cached result. If a successful scraping task result for the same URL exists and is fresh enough, it is returned.

In addition, you can also add scraper-specific parameters, for example:

{
"requests": [
{
"url": "https://shopee.co.id/api/v4/search/search_items?keyword=tshirt"
}
],
"productSearch_enrichUrlQuery_pageSize": 60,
"productSearch_crawlNextPages": true,
"productSearch_crawlNextPages_maxPages": 3,
"productSearch_crawlProductDetails": true
}
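
The request body described above can be assembled programmatically. This is a small Python sketch under stated assumptions: the function name build_batch_body is hypothetical, and cacheMaxAgeInSec and the scraper-specific parameters are placed at the top level of the body, as the field list and the example suggest.

```python
def build_batch_body(requests, cache_max_age_in_sec=0, **scraper_params):
    """Assemble a 'Run Scraping Tasks' body from plain dictionaries."""
    body = {"requests": []}
    for item in requests:
        entry = {"url": item["url"]}
        # Copy the optional fields only when the caller provided them.
        for key in ("referrer", "method", "payload"):
            if item.get(key) is not None:
                entry[key] = item[key]
        if "payload" in entry and entry.get("method") != "POST":
            raise ValueError("payload is only documented for method POST")
        body["requests"].append(entry)
    if cache_max_age_in_sec:
        body["cacheMaxAgeInSec"] = cache_max_age_in_sec
    # Scraper-specific parameters (e.g. productSearch_*) are passed through as-is.
    body.update(scraper_params)
    return body
```

For example, build_batch_body([{"url": "https://shopee.co.id/api/v4/search/search_items?keyword=tshirt"}], productSearch_crawlNextPages=True, productSearch_crawlNextPages_maxPages=3) produces a body similar to the one above.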

Response Body Example

[
{
"url": "https://shopee.co.id/SEPATU-SNEAKERS-CASUAL-ADIDS-SPEZIAL-HANDBALL-BNIB-i.197850298.20595626884?sp_atk=ff613749-9cf1-4b40-ae8e-f76dd2279ba8&xptdk=ff613749-9cf1-4b40-ae8e-f76dd2279ba8",
"status": "SUCCESS",
"responseBody": "{}"
},
{
"url": "https://shopee.co.id/adidas-Runfalcon-3.0-Men's-Running-Shoes---Core-Black-i.234490784.25462132870?sp_atk=67836f0a-c0c8-40d4-a0a7-bd1998943cc9&xptdk=67836f0a-c0c8-40d4-a0a7-bd1998943cc9",
"status": "ERROR",
"responseBody": null,
"errorMessage": "... explanation ..."
}
]
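
A response like the one above mixes successful and failed tasks, so callers usually need to separate them. The sketch below is illustrative only. It assumes that responseBody is a JSON-encoded string, as in the example, and falls back to the raw string when it is not valid JSON. It also treats every status other than SUCCESS as not succeeded. The function names are hypothetical.

```python
import json


def _decode(raw):
    """Decode a responseBody string as JSON, or return it unchanged."""
    if not isinstance(raw, str):
        return raw
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return raw


def split_results(results):
    """Return (succeeded, failed) dictionaries keyed by task URL."""
    succeeded, failed = {}, {}
    for task in results:
        if task.get("status") == "SUCCESS":
            succeeded[task["url"]] = _decode(task.get("responseBody"))
        else:
            # Keep the status and the error message for later inspection.
            failed[task["url"]] = (task.get("status"), task.get("errorMessage"))
    return succeeded, failed
```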

CURL Command Example

curl --header "Content-Type: application/json" \
--header "Authorization: Bearer $token" \
--request POST \
--data '{"requests":[{"url": "https://shopee.co.id/adidas-Runfalcon-3.0-Mens-Running-Shoes---Core-Black-i.234490784.25462132870?sp_atk=67836f0a-c0c8-40d4-a0a7-bd1998943cc9&xptdk=67836f0a-c0c8-40d4-a0a7-bd1998943cc9"}, {"url": "https://shopee.co.id/SEPATU-SNEAKERS-CASUAL-ADIDS-SPEZIAL-HANDBALL-BNIB-i.197850298.20595626884?sp_atk=ff613749-9cf1-4b40-ae8e-f76dd2279ba8&xptdk=ff613749-9cf1-4b40-ae8e-f76dd2279ba8"}]}' \
"https://continuous-scraper.common.chartedapi.com/scraping-tasks/shopee/run?autoCancelAfterSec=1200&includeAllFields=false"

Run A Single Scraping Task

HTTP Method and URL

POST https://continuous-scraper.common.chartedapi.com/scraping-tasks/${scraper}/run-single?autoCancelAfterSec=1200&includeAllFields=false

Path Parameters

  • scraper: shopee or lazada.

Query Parameters

  • autoCancelAfterSec (default = 600): scraping execution timeout (only enabled if waitForCompletion=true).
  • includeAllFields (default = false): if false, only the most important fields are returned.
  • waitForCompletion (default = true): if true, wait for the scraping task to terminate. If false, return immediately; use the Read Scraping Tasks API with the uuids parameter to get the result.
warning

waitForCompletion=true requires maintaining an open HTTP connection to receive results after the scraping tasks complete. If this connection is closed prematurely, the ongoing scraping tasks will be cancelled.

Request Headers

  • Authorization: Bearer ${token}

Request Body Example

{
"url": "https://shopee.co.id/adidas-Runfalcon-3.0-Men's-Running-Shoes---Core-Black-i.234490784.25462132870?sp_atk=67836f0a-c0c8-40d4-a0a7-bd1998943cc9&xptdk=67836f0a-c0c8-40d4-a0a7-bd1998943cc9"
}

Request body fields

  • url: The API or page URL of the resource to scrape.
  • referrer (optional): The referrer to pass as a request header.
  • method (optional): The request method (GET or POST).
  • payload (optional): The request body (if method is POST).
  • cacheMaxAgeInSec (default = 0): Maximum age, in seconds, of a cached result. If a successful scraping task result for the same URL exists and is fresh enough, it is returned.

In addition, you can also add scraper-specific parameters, for example:

{
"url": "https://shopee.co.id/api/v4/search/search_items?keyword=tshirt",
"productSearch_enrichUrlQuery_pageSize": 60,
"productSearch_crawlNextPages": true,
"productSearch_crawlNextPages_maxPages": 3,
"productSearch_crawlProductDetails": true
}

Response Body Example

{
"url": "https://shopee.co.id/SEPATU-SNEAKERS-CASUAL-ADIDS-SPEZIAL-HANDBALL-BNIB-i.197850298.20595626884?sp_atk=ff613749-9cf1-4b40-ae8e-f76dd2279ba8&xptdk=ff613749-9cf1-4b40-ae8e-f76dd2279ba8",
"status": "SUCCESS",
"responseBody": "{}"
}

CURL Command Example

curl --header "Content-Type: application/json" \
--header "Authorization: Bearer $token" \
--request POST \
--data '{"url": "https://shopee.co.id/adidas-Runfalcon-3.0-Mens-Running-Shoes---Core-Black-i.234490784.25462132870?sp_atk=67836f0a-c0c8-40d4-a0a7-bd1998943cc9&xptdk=67836f0a-c0c8-40d4-a0a7-bd1998943cc9"}' \
"https://continuous-scraper.common.chartedapi.com/scraping-tasks/shopee/run-single?autoCancelAfterSec=1200&includeAllFields=false"

Read Scraping Tasks

HTTP Method and URL

GET https://continuous-scraper.common.chartedapi.com/scraping-tasks/${scraper}?limit=20&offset=0&uuids=398a3b4d-241c-4297-9e81-5c4c6ecce340,785ebeaf-ae2c-4453-908b-c7dac008fbf4

Path Parameters

  • scraper: shopee or lazada.

Query Parameters

  • limit (optional, default = 20): Maximum number of scraping tasks to return.
  • offset (default = 0): Number of scraping tasks to skip.
  • uuids (optional): Comma-separated list of scraping task UUIDs. Filter by the given scraping task UUIDs. Example: uuids=398a3b4d-241c-4297-9e81-5c4c6ecce340,785ebeaf-ae2c-4453-908b-c7dac008fbf4
  • urls (optional): Comma-separated list of scraping task URLs. Filter by the given scraping task URLs. Example: urls=https://shopee.co.id/api/v4/search/search_items?keyword=nike,https://shopee.co.id/api/v4/search/search_items?keyword=huawei
  • minPendingAt (optional): Only return scraping tasks that have been submitted after this date (ISO 8601 format).
  • maxPendingAt (optional): Only return scraping tasks that have been submitted before this date (ISO 8601 format).
  • includeAllFields (default = false): if false, only the most important fields are returned.
  • includeFields (optional): Comma-separated list of scraping task fields to return. Note that it overrides includeAllFields. Example: includeFields=uuid,url,status
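
The query parameters above can be combined into a URL as follows. This Python sketch is an illustration rather than an official client: the helper name build_read_url is hypothetical, and the timestamp layout (UTC with a trailing Z) is one ISO 8601 form chosen to resemble the dates shown in this document's examples.

```python
import urllib.parse
from datetime import datetime, timezone

BASE_URL = "https://continuous-scraper.common.chartedapi.com"


def build_read_url(scraper, uuids=None, include_fields=None,
                   min_pending_at=None, limit=20, offset=0):
    """Build the 'Read Scraping Tasks' URL from optional filters."""
    params = {"limit": limit, "offset": offset}
    if uuids:
        params["uuids"] = ",".join(uuids)
    if include_fields:
        params["includeFields"] = ",".join(include_fields)
    if min_pending_at is not None:
        # Expects a timezone-aware datetime; it is converted to UTC first.
        utc = min_pending_at.astimezone(timezone.utc)
        params["minPendingAt"] = utc.strftime("%Y-%m-%dT%H:%M:%SZ")
    # Keep commas literal so the lists stay readable in the URL.
    query = urllib.parse.urlencode(params, safe=",")
    return f"{BASE_URL}/scraping-tasks/{scraper}?{query}"
```

After submitting tasks with waitForCompletion=false, a client could call this URL periodically with the relevant uuids until every returned task has a final status.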

Request Headers

  • Authorization: Bearer ${token}

Response Body Example

[
{
"uuid": "398a3b4d-241c-4297-9e81-5c4c6ecce340",
"url": "https://shopee.co.id/SEPATU-SNEAKERS-CASUAL-ADIDS-SPEZIAL-HANDBALL-BNIB-i.197850298.20595626884?sp_atk=ff613749-9cf1-4b40-ae8e-f76dd2279ba8&xptdk=ff613749-9cf1-4b40-ae8e-f76dd2279ba8",
"status": "SUCCESS",
"responseBody": "{}"
},
{
"uuid": "785ebeaf-ae2c-4453-908b-c7dac008fbf4",
"url": "https://shopee.co.id/adidas-Runfalcon-3.0-Men's-Running-Shoes---Core-Black-i.234490784.25462132870?sp_atk=67836f0a-c0c8-40d4-a0a7-bd1998943cc9&xptdk=67836f0a-c0c8-40d4-a0a7-bd1998943cc9",
"status": "ERROR",
"responseBody": null,
"errorMessage": "... explanation ..."
}
]

CURL Command Example

curl "https://continuous-scraper.common.chartedapi.com/scraping-tasks/shopee?limit=20&offset=0" \
--header "Authorization: Bearer $token"

Cancel Scraping Tasks

HTTP Method and URL

POST https://continuous-scraper.common.chartedapi.com/scraping-tasks/${scraper}/cancel

Path Parameters

  • scraper: shopee or lazada.

Request Headers

  • Authorization: Bearer ${token}

Request Body Example

{
"uuids": [
"2199a83d-3ebe-442c-aa82-afd3a9903967",
"1ce16c7b-26f1-4851-a281-e09f79ce2de9",
"34d3449d-b5b9-46a5-a4e0-d524d2ab3222"
],
"countries": ["ID", "MY", "PH", "SG", "TH", "VN"]
}

Request body fields

  • uuids: Array of UUIDs of the scraping tasks to cancel.
  • countries: Array of country codes whose scraping tasks should be cancelled.
info

Sub-scraping tasks are cancelled as well.

Response Body Example

{
"uuids": [
"2199a83d-3ebe-442c-aa82-afd3a9903967",
"1ce16c7b-26f1-4851-a281-e09f79ce2de9",
"34d3449d-b5b9-46a5-a4e0-d524d2ab3222",
"37744b6d-35fb-47c0-9c73-e9b7feb9f089",
"29d6b918-745b-4029-bdba-5f6489c43146"
],
"countries": ["ID"]
}

Response body fields

  • uuids: Array of cancelled scraping tasks (and sub-scraping tasks) UUIDs.
  • countries: Array of the countries related to cancelled scraping tasks.
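
Since the response also lists sub-scraping tasks, it can contain more UUIDs than were requested. The following Python sketch shows one way to compare the two lists. The function name is hypothetical, and the reading that a requested UUID missing from the response was not cancelled is an assumption, not documented behaviour.

```python
def compare_cancellation(requested_uuids, response):
    """Return (sub_task_uuids, missing_uuids) for a cancel response."""
    requested = set(requested_uuids)
    returned = response.get("uuids", [])
    # UUIDs the server added on its own: sub-scraping tasks.
    sub_tasks = [u for u in returned if u not in requested]
    # Assumption: requested UUIDs absent from the response were not cancelled.
    returned_set = set(returned)
    missing = [u for u in requested_uuids if u not in returned_set]
    return sub_tasks, missing
```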

CURL Command Example

curl --header "Content-Type: application/json" \
--header "Authorization: Bearer $token" \
--request POST \
--data '{"uuids":["2199a83d-3ebe-442c-aa82-afd3a9903967", "1ce16c7b-26f1-4851-a281-e09f79ce2de9"]}' \
https://continuous-scraper.common.chartedapi.com/scraping-tasks/shopee/cancel

Update Proxy Configuration

info

These operations are reserved for customers with a dedicated environment. If you're interested in a customized solution for large-scale operations, please contact us.

This section covers how to replace, add, or delete proxy configurations, optionally using dynamic placeholders in proxy URLs. These placeholders allow multiple URLs to be generated automatically from specified criteria.

Replace Proxy Configuration

Replace the entire proxy configuration for a specified country with a new set of proxy URLs.

HTTP Method and URL

PUT https://continuous-scraper.common.chartedapi.com/proxy-groups/${country}

Path Parameters

  • ${country}: ID, TW, VN, TH, PH, MY, SG, BR, MX, CO, CL

Request Headers

  • Authorization: Bearer ${token}

Request Body Example

{
"urls": [
"http://xxx:yyy@gw.dataimpulse.com:10000",
"http://xxx:yyy@gw.dataimpulse.com:10001"
]
}

Using Placeholders

You can use placeholders to dynamically generate proxy URLs. For example:

{
"urls": [
// Generate 9999 DataImpulse URLs
"http://uuu__cr.my:ppp@gw.dataimpulse.com:${rand(range=[10000,19999])}",

// Generate many LunaProxy URLs
"http://user-uuu-region-my-sessid-my${rand(len=16,use=[lowerCaseChar,digit])}-sesstime-30:ppp@as.e4f4xh3i.lunaproxy.net:12233",

// Generate many IPRoyal URLs
"http://uuu:ppp_country-my_session-${rand(len=8,use=[lowerCaseChar,upperCaseChar,digit])}_lifetime-30m@geo.iproyal.com:12321",

// Generate many BrightData URLs
"http://brd-customer-hl_uuu-zone-zzz-country-my-session-${rand(range=[1,70000000])}:ppp@brd.superproxy.io:22225",

// Generate many Oxylabs URLs
"http://customer-uuu-cc-MY-sessid-${rand(len=10,use=[lowerCaseChar,digit])}-sesstime-30:ppp@pr.oxylabs.io:7777",

// Generate many Massive URLs (note how the characters @, ?, & and = are URI encoded)
"http://john.doe%40example.com%3Fcountry%3DMY%26session%3D${rand(len=10,use=[lowerCaseChar,upperCaseChar,digit])}:ppp@network.joinmassive.com:65534",

// Generate many Nstproxy URLs
"http://uuu-residential-country_MY-r_10m-s_${rand(len=10,use=[lowerCaseChar,upperCaseChar,digit])}:ppp@gate.nstproxy.io:24125",

// Generate many Mango Proxy URLs
"http://uuu-zone-custom-region-my-session-${rand(len=9,use=[lowerCaseChar,digit])}-sessTime-30:ppp@p1.mangoproxy.com:2333",

// Generate many Netnut URLs
"http://uuu-res-MY-sid-${rand(len=9,use=[digit])}:ppp@gw.ntnt.io:5959",

// Generate many Zenrows URLs
"http://uuu:ppp_country-my_ttl-30m_session-${rand(len=12,use=[lowerCaseChar,upperCaseChar,digit])}@superproxy.zenrows.com:1337",

// Generate many Soax Proxy URLs
"http://package-123456-country-my-sessionid-${rand(len=10,use=[lowerCaseChar,digit])}-sessionlength-10:ppp@proxy.soax.com:5000"
]
}
info

Refer to the Placeholder syntax section for detailed usage.

Response Body Example

{
"uuid": "13306ec0-1a7b-471a-9c85-bcc0363d0614",
"createdAt": "2024-02-08T15:02:35.399Z",
"updatedAt": "2024-02-23T08:42:49.064Z",
"organizationSlug": "org-name",
"country": "MY",
"urls": [
"http://xxx:yyy@gw.dataimpulse.com:10000",
"http://xxx:yyy@gw.dataimpulse.com:10001"
]
}

CURL Command Example

curl --header "Content-Type: application/json" \
--header "Authorization: Bearer $token" \
--request PUT \
--data '{"urls":["http://xxx:yyy@gw.dataimpulse.com:10000", "http://xxx:yyy@gw.dataimpulse.com:10001", "http://xxx:yyy@gw.dataimpulse.com:10002", "http://xxx:yyy@gw.dataimpulse.com:10003"]}' \
https://continuous-scraper.common.chartedapi.com/proxy-groups/MY

Add Proxy Configuration

Add new proxy URLs to an existing configuration. Existing URLs are neither removed nor altered, and duplicates are ignored.

POST https://continuous-scraper.common.chartedapi.com/proxy-groups/${country}/add
info

Refer to the Replace Proxy Configuration section for the parameter and body documentation.

Delete Proxy Configuration

Remove specified proxy URLs from an existing configuration.

POST https://continuous-scraper.common.chartedapi.com/proxy-groups/${country}/delete
info

Refer to the Replace Proxy Configuration section for the parameter and body documentation.

Placeholder syntax

Placeholders in proxy URLs are enclosed in ${}. They allow dynamic generation of parts of the URL based on specified functions and arguments.

Supported Functions:

  • ${rand(...)}: Generates a random string based on provided arguments:
    • len: Length of the string to generate. Mandatory unless range is provided.
    • use: Array specifying types of characters to include. Allowed values:
      • lowerCaseChar: abcdefghijklmnopqrstuvwxyz
      • upperCaseChar: ABCDEFGHIJKLMNOPQRSTUVWXYZ
      • digit: 0123456789
      • symbol: [],$&+:;=?@#|"'><.^*()%! -
    • subset: A string of allowed characters. This is an alternative to use. The characters $, " and \ must be escaped with \ (i.e. \$, \" and \\).
    • range: Specifies a range [min, max] to generate a random number within.
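
To preview what a placeholder might expand to before uploading a configuration, the syntax can be simulated locally. The Python sketch below is not the service's implementation. It handles only the len/use and range arguments (not subset or symbol), and it assumes that both bounds of range are inclusive. All names in it are hypothetical.

```python
import random
import re
import string

CHARSETS = {
    "lowerCaseChar": string.ascii_lowercase,
    "upperCaseChar": string.ascii_uppercase,
    "digit": string.digits,
}

PLACEHOLDER = re.compile(r"\$\{rand\((.*?)\)\}")


def _expand(args, rng):
    """Expand the argument string of a single ${rand(...)} placeholder."""
    range_match = re.search(r"range=\[\s*(\d+)\s*,\s*(\d+)\s*\]", args)
    if range_match:
        low, high = int(range_match.group(1)), int(range_match.group(2))
        return str(rng.randint(low, high))  # assumption: inclusive bounds
    len_match = re.search(r"len=(\d+)", args)
    use_match = re.search(r"use=\[([^\]]*)\]", args)
    if not len_match or not use_match:
        raise ValueError(f"unsupported placeholder arguments: {args}")
    names = [name.strip() for name in use_match.group(1).split(",")]
    alphabet = "".join(CHARSETS[name] for name in names)
    return "".join(rng.choice(alphabet) for _ in range(int(len_match.group(1))))


def expand_placeholders(url, rng=None):
    """Replace every ${rand(...)} placeholder in url with a random value."""
    if rng is None:
        rng = random.Random()
    return PLACEHOLDER.sub(lambda match: _expand(match.group(1), rng), url)
```

For instance, expand_placeholders("http://uuu-res-MY-sid-${rand(len=9,use=[digit])}:ppp@gw.ntnt.io:5959") returns the same URL with nine random digits in place of the placeholder.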

Read Proxy Configuration

info

These operations are reserved for customers with a dedicated environment. If you're interested in a customized solution for large-scale operations, please contact us.

HTTP Method and URL

GET https://continuous-scraper.common.chartedapi.com/proxy-groups

Request Headers

  • Authorization: Bearer ${token}

Response Body Example

[
{
"uuid": "6a63e145-c43c-425c-9326-e8b0341dc122",
"createdAt": "2024-02-15T14:06:03.023Z",
"updatedAt": "2024-02-15T14:06:03.023Z",
"organizationSlug": "org-name",
"country": "ID",
"urls": [
"http://xxx:yyy@gw.dataimpulse.com:10000",
"http://xxx:yyy@gw.dataimpulse.com:10001",
"http://xxx:yyy@gw.dataimpulse.com:10002",
"http://xxx:yyy@gw.dataimpulse.com:10003"
]
},
{
"uuid": "13306ec0-1a7b-471a-9c85-bcc0363d0614",
"createdAt": "2024-02-08T15:02:35.399Z",
"updatedAt": "2024-02-23T08:42:49.064Z",
"organizationSlug": "org-name",
"country": "MY",
"urls": [
"http://xxx:yyy@gw.dataimpulse.com:10000",
"http://xxx:yyy@gw.dataimpulse.com:10001",
"http://xxx:yyy@gw.dataimpulse.com:10002",
"http://xxx:yyy@gw.dataimpulse.com:10003"
]
}
]

CURL Command Example

curl https://continuous-scraper.common.chartedapi.com/proxy-groups \
--header "Authorization: Bearer $token"

System Configuration

info

These operations are reserved for customers with a dedicated environment. If you're interested in a customized solution for large-scale operations, please contact us.

This section covers how to update and read user-modifiable system configuration.

Read System Configuration

HTTP Method and URL

GET https://continuous-scraper.common.chartedapi.com/configurations/${config_key}

Path Parameters

  • config_key: So far, only page_slot_allocation is supported.

Request Headers

  • Authorization: Bearer ${token}
info

You can only invoke this API on your own environment.

Response Body Example

{
"configKey": "PAGE_SLOT_ALLOCATION",
"createdAt": "2024-06-04T08:39:57.003Z",
"updatedAt": "2024-06-04T08:40:13.135Z",
"configValue": {
"algorithm": "AUTOMATIC_RECENT_TASKS_FIRST"
}
}

The configValue depends on the config_key you passed in the URL. Read the Replace System Configuration section for more details.

CURL Command Example

curl https://continuous-scraper.common.chartedapi.com/configurations/page_slot_allocation \
--header "Authorization: Bearer $token"

Replace System Configuration

HTTP Method and URL

PUT https://continuous-scraper.common.chartedapi.com/configurations/${config_key}

Path Parameters

  • config_key: So far, only page_slot_allocation is supported.

Request Headers

  • Authorization: Bearer ${token}
info

You can only invoke this API on your own environment.

Request Body Example for page_slot_allocation

The request body depends on the config_key passed in the URL. This section describes the request body for the page_slot_allocation config key, which is useful if you want to replace the default page slot allocation algorithm. See the Page Slot Management page to learn more about it.

{
"algorithm": "MANUAL_WEIGHTS",

// Only if algorithm == MANUAL_WEIGHTS
"manualWeights": [
{ "weight": 70, "country": "ID" },
{ "weight": 30, "country": "PH", "scraper": "SHOPEE", "device": "DESKTOP_WEB" }
]
}

Request body fields for page_slot_allocation

  • algorithm: AUTOMATIC_RECENT_TASKS_FIRST (default) or MANUAL_WEIGHTS.
  • manualWeights: Manually define the page slot allocation. It is ignored if algorithm is different from MANUAL_WEIGHTS.
    • weight: How much weight this configuration will have in the page slot distribution. For example, if the sum of all weights is 100, then 70 means that 70% of page slots will have this configuration.
    • country: Page slot country. Supported values: BR, CL, CO, ID, MX, MY, PH, SG, TH, TW, VN.
    • scraper (optional): Page slot scraper, normally set by default. Supported values: SHOPEE, LAZADA.
    • device (optional): Page slot device (default is DESKTOP_WEB). MOBILE_WEB is used when emulateMobileDevice is set to true in the Lazada scraper. Supported values: DESKTOP_WEB, MOBILE_WEB.
warning

A bad configuration can substantially deteriorate scraping performance. Make sure you know what you are doing when you set algorithm to a non-default value.
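
Before submitting a MANUAL_WEIGHTS configuration, it can help to check what share of page slots each entry would receive. The Python sketch below assumes that each share is simply the entry's weight divided by the sum of all weights, which generalizes the 70-out-of-100 example above. The function name is hypothetical, and the actual allocation is decided by the service.

```python
def page_slot_shares(manual_weights):
    """Return a copy of each entry with its expected fraction of page slots."""
    total = sum(entry["weight"] for entry in manual_weights)
    if total <= 0:
        raise ValueError("the sum of weights must be positive")
    # Assumption: the share is proportional to the weight.
    return [{**entry, "share": entry["weight"] / total} for entry in manual_weights]
```

With the example body above, the ID entry would get a share of 0.7 and the PH entry a share of 0.3.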

Response Body Example

{
"configKey": "PAGE_SLOT_ALLOCATION",
"createdAt": "2024-06-04T08:39:57.003Z",
"updatedAt": "2024-06-04T09:09:26.074Z",
"configValue": {
"algorithm": "MANUAL_WEIGHTS",
"manualWeights": [
{
"weight": 70,
"country": "ID"
},
{
"device": "DESKTOP_WEB",
"weight": 30,
"country": "PH",
"scraper": "SHOPEE"
}
]
}
}

CURL Command Example

curl --header "Content-Type: application/json" \
--header "Authorization: Bearer $token" \
--request PUT \
--data '{"algorithm":"AUTOMATIC_RECENT_TASKS_FIRST"}' \
https://continuous-scraper.common.chartedapi.com/configurations/page_slot_allocation