API Reference
Contact Marc Plouhinec to get an authentication token.
Table of Contents
- Run Scraping Tasks
- Run A Single Scraping Task
- Read Scraping Tasks
- Cancel Scraping Tasks
- Update Proxy Configuration
- Read Proxy Configuration
- System Configuration
Run Scraping Tasks
HTTP Method and URL
POST https://continuous-scraper.common.chartedapi.com/scraping-tasks/${scraper}/run?autoCancelAfterSec=1200&includeAllFields=false&waitForCompletion=true
Path Parameters
- scraper: shopee or lazada.
Query Parameters
- autoCancelAfterSec (default = 600): Scraping execution timeout in seconds (only enabled if waitForCompletion=true).
- includeAllFields (default = false): If false, only the most important fields are returned.
- waitForCompletion (default = true): If true, wait for the scraping tasks to terminate. If false, return immediately (use the Read Scraping Tasks API with the uuids parameter to get the results).
waitForCompletion=true requires keeping the HTTP connection open to receive the results once the scraping tasks complete. If the connection is closed prematurely, the ongoing scraping tasks are cancelled.
Request Headers
Authorization: Bearer ${token}
Request Body Example
{
"requests": [
{
"url": "https://shopee.co.id/adidas-Runfalcon-3.0-Men's-Running-Shoes---Core-Black-i.234490784.25462132870?sp_atk=67836f0a-c0c8-40d4-a0a7-bd1998943cc9&xptdk=67836f0a-c0c8-40d4-a0a7-bd1998943cc9"
},
{
"url": "https://shopee.co.id/SEPATU-SNEAKERS-CASUAL-ADIDS-SPEZIAL-HANDBALL-BNIB-i.197850298.20595626884?sp_atk=ff613749-9cf1-4b40-ae8e-f76dd2279ba8&xptdk=ff613749-9cf1-4b40-ae8e-f76dd2279ba8"
},
{
"url": "https://shopee.co.th/api/v4/item/get_list",
"method": "POST",
"payload": {
"shop_item_ids": [
{
"item_id": 5289970960,
"shop_id": 237980476
},
{
"item_id": 23556386969,
"shop_id": 644345618
}
],
"source": "microsite_individual_product"
}
}
]
}
Request body fields
- requests: An array of objects with the following fields:
  - url: The API or page URL of the resource to scrape.
  - referrer (optional): The referrer to pass as a request header.
  - method (optional): The request method (GET or POST).
  - payload (optional): The request body (if method is POST).
- cacheMaxAgeInSec (default = 0): If a successful scraping task result for the same URL exists and is no older than this many seconds, return it instead of scraping again.
In addition, you can also add scraper-specific parameters, for example:
{
"requests": [
{
"url": "https://shopee.co.id/api/v4/search/search_items?keyword=tshirt"
}
],
"productSearch_enrichUrlQuery_pageSize": 60,
"productSearch_crawlNextPages": true,
"productSearch_crawlNextPages_maxPages": 3,
"productSearch_crawlProductDetails": true
}
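As a rough sketch of how such a body can be assembled programmatically, the Python helper below wraps each URL in a `requests` entry and copies any extra keyword arguments to the top level as scraper-specific parameters. The helper name `make_run_body` is hypothetical and not part of the API; only the field names come from the example above.

```python
import json


def make_run_body(urls, **scraper_params):
    """Build a JSON-serializable body for the Run Scraping Tasks endpoint.

    Each URL becomes one entry in `requests`; extra keyword arguments are
    copied to the top level as scraper-specific parameters.
    """
    body = {"requests": [{"url": url} for url in urls]}
    body.update(scraper_params)
    return body


body = make_run_body(
    ["https://shopee.co.id/api/v4/search/search_items?keyword=tshirt"],
    productSearch_enrichUrlQuery_pageSize=60,
    productSearch_crawlNextPages=True,
    productSearch_crawlNextPages_maxPages=3,
    productSearch_crawlProductDetails=True,
)
print(json.dumps(body, indent=2))
```

Serializing with `json.dumps` turns the Python booleans into the lowercase `true`/`false` used in the JSON example.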
Response Body Example
[
{
"url": "https://shopee.co.id/SEPATU-SNEAKERS-CASUAL-ADIDS-SPEZIAL-HANDBALL-BNIB-i.197850298.20595626884?sp_atk=ff613749-9cf1-4b40-ae8e-f76dd2279ba8&xptdk=ff613749-9cf1-4b40-ae8e-f76dd2279ba8",
"status": "SUCCESS",
"responseBody": "{}"
},
{
"url": "https://shopee.co.id/adidas-Runfalcon-3.0-Men's-Running-Shoes---Core-Black-i.234490784.25462132870?sp_atk=67836f0a-c0c8-40d4-a0a7-bd1998943cc9&xptdk=67836f0a-c0c8-40d4-a0a7-bd1998943cc9",
"status": "ERROR",
"responseBody": null,
"errorMessage": "... explanation ..."
}
]
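A response like the one above mixes successful and failed tasks, so a client usually needs to separate them. The sketch below is one way to do that in Python. It assumes that `responseBody` is a JSON document encoded as a string (as the `"{}"` in the example suggests) and treats every status other than SUCCESS as a failure; `split_results` is a hypothetical helper name.

```python
import json


def split_results(results):
    """Separate successful results from failed ones.

    Returns (parsed, errors): `parsed` maps each URL to its decoded
    responseBody, `errors` maps each URL to its errorMessage.
    """
    parsed, errors = {}, {}
    for item in results:
        if item["status"] == "SUCCESS":
            # Assumption: responseBody is a JSON document encoded as a string.
            parsed[item["url"]] = json.loads(item["responseBody"])
        else:
            errors[item["url"]] = item.get("errorMessage")
    return parsed, errors


# Shortened placeholder URLs stand in for the real product URLs.
sample = [
    {"url": "https://example.com/a", "status": "SUCCESS", "responseBody": "{}"},
    {"url": "https://example.com/b", "status": "ERROR", "responseBody": None,
     "errorMessage": "... explanation ..."},
]
parsed, errors = split_results(sample)
print(parsed)
print(errors)
```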
CURL Command Example
curl --header "Content-Type: application/json" \
--header "Authorization: Bearer $token" \
--request POST \
--data '{"requests":[{"url": "https://shopee.co.id/adidas-Runfalcon-3.0-Mens-Running-Shoes---Core-Black-i.234490784.25462132870?sp_atk=67836f0a-c0c8-40d4-a0a7-bd1998943cc9&xptdk=67836f0a-c0c8-40d4-a0a7-bd1998943cc9"}, {"url": "https://shopee.co.id/SEPATU-SNEAKERS-CASUAL-ADIDS-SPEZIAL-HANDBALL-BNIB-i.197850298.20595626884?sp_atk=ff613749-9cf1-4b40-ae8e-f76dd2279ba8&xptdk=ff613749-9cf1-4b40-ae8e-f76dd2279ba8"}]}' \
"https://continuous-scraper.common.chartedapi.com/scraping-tasks/shopee/run?autoCancelAfterSec=1200&includeAllFields=false"
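For clients written in Python, the same call can be prepared with the standard library. Because waitForCompletion=true keeps the connection open until the tasks finish, the sketch below also picks a client-side socket timeout slightly longer than autoCancelAfterSec, so that the client does not hang up first and cancel the tasks. The function name `build_run_request` and the 30-second margin are assumptions made for illustration; the endpoint path is the one given under "HTTP Method and URL".

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://continuous-scraper.common.chartedapi.com"


def build_run_request(token, scraper, body, auto_cancel_after_sec=600):
    """Return (request, timeout) for a blocking Run Scraping Tasks call."""
    query = urllib.parse.urlencode({
        "autoCancelAfterSec": auto_cancel_after_sec,
        "includeAllFields": "false",
        "waitForCompletion": "true",
    })
    request = urllib.request.Request(
        f"{BASE_URL}/scraping-tasks/{scraper}/run?{query}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
    # Hypothetical safety margin so the client does not close the connection first.
    timeout = auto_cancel_after_sec + 30
    return request, timeout


request, timeout = build_run_request(
    "my-token",
    "shopee",
    {"requests": [{"url": "https://shopee.co.id/api/v4/search/search_items?keyword=tshirt"}]},
    auto_cancel_after_sec=1200,
)
print(request.get_method(), request.full_url, timeout)
# To actually send it: urllib.request.urlopen(request, timeout=timeout)
```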
Run A Single Scraping Task
HTTP Method and URL
POST https://continuous-scraper.common.chartedapi.com/scraping-tasks/${scraper}/run-single?autoCancelAfterSec=1200&includeAllFields=false
Path Parameters
- scraper: shopee or lazada.
Query Parameters
- autoCancelAfterSec (default = 600): Scraping execution timeout in seconds (only enabled if waitForCompletion=true).
- includeAllFields (default = false): If false, only the most important fields are returned.
- waitForCompletion (default = true): If true, wait for the scraping task to terminate. If false, return immediately (use the Read Scraping Tasks API with the uuids parameter to get the results).
waitForCompletion=true requires keeping the HTTP connection open to receive the results once the scraping task completes. If the connection is closed prematurely, the ongoing scraping task is cancelled.
Request Headers
Authorization: Bearer ${token}
Request Body Example
{
"url": "https://shopee.co.id/adidas-Runfalcon-3.0-Men's-Running-Shoes---Core-Black-i.234490784.25462132870?sp_atk=67836f0a-c0c8-40d4-a0a7-bd1998943cc9&xptdk=67836f0a-c0c8-40d4-a0a7-bd1998943cc9"
}
Request body fields
- url: The API or page URL of the resource to scrape.
- referrer (optional): The referrer to pass as a request header.
- method (optional): The request method (GET or POST).
- payload (optional): The request body (if method is POST).
- cacheMaxAgeInSec (default = 0): If a successful scraping task result for the same URL exists and is no older than this many seconds, return it instead of scraping again.
In addition, you can also add scraper-specific parameters, for example:
{
"url": "https://shopee.co.id/api/v4/search/search_items?keyword=tshirt",
"productSearch_enrichUrlQuery_pageSize": 60,
"productSearch_crawlNextPages": true,
"productSearch_crawlNextPages_maxPages": 3,
"productSearch_crawlProductDetails": true
}
Response Body Example
{
"url": "https://shopee.co.id/SEPATU-SNEAKERS-CASUAL-ADIDS-SPEZIAL-HANDBALL-BNIB-i.197850298.20595626884?sp_atk=ff613749-9cf1-4b40-ae8e-f76dd2279ba8&xptdk=ff613749-9cf1-4b40-ae8e-f76dd2279ba8",
"status": "SUCCESS",
"responseBody": "{}"
}
CURL Command Example
curl --header "Content-Type: application/json" \
--header "Authorization: Bearer $token" \
--request POST \
--data '{"url": "https://shopee.co.id/adidas-Runfalcon-3.0-Mens-Running-Shoes---Core-Black-i.234490784.25462132870?sp_atk=67836f0a-c0c8-40d4-a0a7-bd1998943cc9&xptdk=67836f0a-c0c8-40d4-a0a7-bd1998943cc9"}' \
"https://continuous-scraper.common.chartedapi.com/scraping-tasks/shopee/run-single?autoCancelAfterSec=1200&includeAllFields=false"
Read Scraping Tasks
HTTP Method and URL
GET https://continuous-scraper.common.chartedapi.com/scraping-tasks/${scraper}?limit=20&offset=0&uuids=398a3b4d-241c-4297-9e81-5c4c6ecce340,785ebeaf-ae2c-4453-908b-c7dac008fbf4
Path Parameters
- scraper: shopee or lazada.
Query Parameters
- limit (optional, default = 20): Maximum number of scraping tasks to return.
- offset (default = 0): Number of scraping tasks to skip.
- uuids (optional): Comma-separated list of scraping task UUIDs to filter by. Example: uuids=398a3b4d-241c-4297-9e81-5c4c6ecce340,785ebeaf-ae2c-4453-908b-c7dac008fbf4
- urls (optional): Comma-separated list of scraping task URLs to filter by. Example: urls=https://shopee.co.id/api/v4/search/search_items?keyword=nike,https://shopee.co.id/api/v4/search/search_items?keyword=huawei
- minPendingAt (optional): Only return scraping tasks that have been submitted after this date (ISO 8601 format).
- maxPendingAt (optional): Only return scraping tasks that have been submitted before this date (ISO 8601 format).
- includeAllFields (default = false): If false, only the most important fields are returned.
- includeFields (optional): Comma-separated list of scraping task fields to return. Note that it overrides includeAllFields. Example: includeFields=uuid,url,status
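The filters above combine into a single query string. The Python sketch below shows one way to build it: list values are joined with commas, and `urlencode` percent-encodes characters such as `:`, `+`, `?` and `=` inside values (important for ISO 8601 dates with a `+00:00` offset and for URLs passed through `urls`). The function name `build_read_url` is hypothetical, and it is an assumption that the server decodes percent-encoded values in the usual way.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

BASE_URL = "https://continuous-scraper.common.chartedapi.com"


def build_read_url(scraper, uuids=(), urls=(), min_pending_at=None,
                   include_fields=(), limit=20, offset=0):
    """Build a Read Scraping Tasks URL from optional filters."""
    params = {"limit": limit, "offset": offset}
    if uuids:
        params["uuids"] = ",".join(uuids)
    if urls:
        params["urls"] = ",".join(urls)
    if min_pending_at is not None:
        params["minPendingAt"] = min_pending_at.isoformat()
    if include_fields:
        params["includeFields"] = ",".join(include_fields)
    # safe="," keeps the list separators literal; everything else is encoded.
    return f"{BASE_URL}/scraping-tasks/{scraper}?{urlencode(params, safe=',')}"


url = build_read_url(
    "shopee",
    uuids=["398a3b4d-241c-4297-9e81-5c4c6ecce340",
           "785ebeaf-ae2c-4453-908b-c7dac008fbf4"],
    min_pending_at=datetime(2024, 6, 4, 8, 0, tzinfo=timezone.utc),
    include_fields=["uuid", "url", "status"],
)
print(url)
```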
Request Headers
Authorization: Bearer ${token}
Response Body Example
[
{
"uuid": "398a3b4d-241c-4297-9e81-5c4c6ecce340",
"url": "https://shopee.co.id/SEPATU-SNEAKERS-CASUAL-ADIDS-SPEZIAL-HANDBALL-BNIB-i.197850298.20595626884?sp_atk=ff613749-9cf1-4b40-ae8e-f76dd2279ba8&xptdk=ff613749-9cf1-4b40-ae8e-f76dd2279ba8",
"status": "SUCCESS",
"responseBody": "{}"
},
{
"uuid": "785ebeaf-ae2c-4453-908b-c7dac008fbf4",
"url": "https://shopee.co.id/adidas-Runfalcon-3.0-Men's-Running-Shoes---Core-Black-i.234490784.25462132870?sp_atk=67836f0a-c0c8-40d4-a0a7-bd1998943cc9&xptdk=67836f0a-c0c8-40d4-a0a7-bd1998943cc9",
"status": "ERROR",
"responseBody": null,
"errorMessage": "... explanation ..."
}
]
CURL Command Example
curl "https://continuous-scraper.common.chartedapi.com/scraping-tasks/shopee?limit=20&offset=0" \
--header "Authorization: Bearer $token"
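When tasks are started with waitForCompletion=false, this endpoint can be polled until they finish. The Python sketch below illustrates such a loop under stated assumptions: SUCCESS and ERROR are treated as the only terminal statuses (the examples in this document show no others), and `fetch_tasks` is a caller-supplied function that performs the GET request with the uuids filter and returns the decoded JSON list. The status name RUNNING in the demo is made up for illustration.

```python
import time

# Assumption: SUCCESS and ERROR are terminal; any other status means "not done yet".
TERMINAL_STATUSES = {"SUCCESS", "ERROR"}


def wait_for_tasks(fetch_tasks, uuids, interval_sec=5.0, max_attempts=60):
    """Poll until every requested task is terminal, then return the task list.

    `fetch_tasks(uuids)` must return the decoded JSON list for those UUIDs
    (with a `limit` large enough to cover all of them).
    """
    for attempt in range(max_attempts):
        tasks = fetch_tasks(uuids)
        done = [t for t in tasks if t["status"] in TERMINAL_STATUSES]
        if len(done) == len(uuids):
            return tasks
        if attempt < max_attempts - 1:
            time.sleep(interval_sec)
    raise TimeoutError(f"tasks still running after {max_attempts} attempts")


# Stand-in for a real HTTP call: the task finishes on the second poll.
_responses = iter([
    [{"uuid": "398a3b4d-241c-4297-9e81-5c4c6ecce340", "status": "RUNNING"}],
    [{"uuid": "398a3b4d-241c-4297-9e81-5c4c6ecce340", "status": "SUCCESS"}],
])
result = wait_for_tasks(lambda uuids: next(_responses),
                        ["398a3b4d-241c-4297-9e81-5c4c6ecce340"],
                        interval_sec=0)
print(result[0]["status"])
```

Passing the fetch function in as a parameter keeps the loop independent of any particular HTTP client and makes it easy to exercise without network access.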
Cancel Scraping Tasks
HTTP Method and URL
POST https://continuous-scraper.common.chartedapi.com/scraping-tasks/${scraper}/cancel
Path Parameters
- scraper: shopee or lazada.
Request Headers
Authorization: Bearer ${token}
Request Body Example
{
"uuids": [
"2199a83d-3ebe-442c-aa82-afd3a9903967",
"1ce16c7b-26f1-4851-a281-e09f79ce2de9",
"34d3449d-b5b9-46a5-a4e0-d524d2ab3222"
],
"countries": ["ID", "MY", "PH", "SG", "TH", "VN"]
}
Request body fields
- uuids: Array of UUIDs of the scraping tasks to cancel.
- countries: Array of country codes whose scraping tasks should be cancelled.
Sub-scraping tasks are cancelled as well.
Response Body Example
{
"uuids": [
"2199a83d-3ebe-442c-aa82-afd3a9903967",
"1ce16c7b-26f1-4851-a281-e09f79ce2de9",
"34d3449d-b5b9-46a5-a4e0-d524d2ab3222",
"37744b6d-35fb-47c0-9c73-e9b7feb9f089",
"29d6b918-745b-4029-bdba-5f6489c43146"
],
"countries": ["ID"]
}
Response body fields
- uuids: Array of UUIDs of the cancelled scraping tasks (and sub-scraping tasks).
- countries: Array of the countries related to the cancelled scraping tasks.
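Since the response can list more UUIDs than the request did, it can be useful to compare the two. The Python sketch below does this with set operations on the example values above; `summarize_cancellation` is a hypothetical helper. The "additional" UUIDs are presumably sub-scraping tasks or tasks matched through `countries`, though the response alone does not say which.

```python
def summarize_cancellation(requested_uuids, response):
    """Compare requested UUIDs with the UUIDs the API reports as cancelled."""
    requested = set(requested_uuids)
    cancelled = set(response["uuids"])
    return {
        # Requested and reported as cancelled.
        "confirmed": sorted(requested & cancelled),
        # Reported as cancelled without being listed in the request.
        "additional": sorted(cancelled - requested),
        # Requested but absent from the response.
        "not_cancelled": sorted(requested - cancelled),
    }


requested = [
    "2199a83d-3ebe-442c-aa82-afd3a9903967",
    "1ce16c7b-26f1-4851-a281-e09f79ce2de9",
    "34d3449d-b5b9-46a5-a4e0-d524d2ab3222",
]
response = {
    "uuids": requested + [
        "37744b6d-35fb-47c0-9c73-e9b7feb9f089",
        "29d6b918-745b-4029-bdba-5f6489c43146",
    ],
    "countries": ["ID"],
}
summary = summarize_cancellation(requested, response)
print(summary["additional"])
```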
CURL Command Example
curl --header "Content-Type: application/json" \
--header "Authorization: Bearer $token" \
--request POST \
--data '{"uuids":["2199a83d-3ebe-442c-aa82-afd3a9903967", "1ce16c7b-26f1-4851-a281-e09f79ce2de9"]}' \
https://continuous-scraper.common.chartedapi.com/scraping-tasks/shopee/cancel
Update Proxy Configuration
These operations are reserved for customers with a dedicated environment. If you're interested in a customized solution for large-scale operations, please contact us.
This section covers how to replace, add, or delete proxy configurations by optionally using dynamic placeholders in proxy URLs. These placeholders allow for automated generation of multiple URLs based on specified criteria.
Replace Proxy Configuration
Replace the entire proxy configuration for a specified country with a new set of proxy URLs.
HTTP Method and URL
PUT https://continuous-scraper.common.chartedapi.com/proxy-groups/${country}
Path Parameters
- country: One of ID, TW, VN, TH, PH, MY, SG, BR, MX, CO, CL.
Request Headers
Authorization: Bearer ${token}
Request Body Example
{
"urls": [
"http://xxx:yyy@gw.dataimpulse.com:10000",
"http://xxx:yyy@gw.dataimpulse.com:10001"
]
}
Using Placeholders
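You can use placeholders to dynamically generate proxy URLs. The // comments in the example below are explanatory only; strict JSON does not allow comments, so remove them from the actual request body. For example: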
{
"urls": [
// Generate 9999 DataImpulse URLs
"http://uuu__cr.my:ppp@gw.dataimpulse.com:${rand(range=[10000,19999])}",
// Generate many LunaProxy URLs
"http://user-uuu-region-my-sessid-my${rand(len=16,use=[lowerCaseChar,digit])}-sesstime-30:ppp@as.e4f4xh3i.lunaproxy.net:12233",
// Generate many IPRoyal URLs
"http://uuu:ppp_country-my_session-${rand(len=8,use=[lowerCaseChar,upperCaseChar,digit])}_lifetime-30m@geo.iproyal.com:12321",
// Generate many BrightData URLs
"http://brd-customer-hl_uuu-zone-zzz-country-my-session-${rand(range=[1,70000000])}:ppp@brd.superproxy.io:22225",
// Generate many Oxylabs URLs
"http://customer-uuu-cc-MY-sessid-${rand(len=10,use=[lowerCaseChar,digit])}-sesstime-30:ppp@pr.oxylabs.io:7777",
// Generate many Massive URLs (note how the characters @, ?, & and = are URI encoded)
"http://john.doe%40example.com%3Fcountry%3DMY%26session%3D${rand(len=10,use=[lowerCaseChar,upperCaseChar,digit])}:ppp@network.joinmassive.com:65534",
// Generate many Nstproxy URLs
"http://uuu-residential-country_MY-r_10m-s_${rand(len=10,use=[lowerCaseChar,upperCaseChar,digit])}:ppp@gate.nstproxy.io:24125",
// Generate many Mango Proxy URLs
"http://uuu-zone-custom-region-my-session-${rand(len=9,use=[lowerCaseChar,digit])}-sessTime-30:ppp@p1.mangoproxy.com:2333",
// Generate many Netnut URLs
"http://uuu-res-MY-sid-${rand(len=9,use=[digit])}:ppp@gw.ntnt.io:5959",
// Generate many Zenrows URLs
"http://uuu:ppp_country-my_ttl-30m_session-${rand(len=12,use=[lowerCaseChar,upperCaseChar,digit])}@superproxy.zenrows.com:1337",
// Generate many Soax Proxy URLs
"http://package-123456-country-my-sessionid-${rand(len=10,use=[lowerCaseChar,digit])}-sessionlength-10:ppp@proxy.soax.com:5000"
]
}
Refer to the Placeholder syntax section for detailed usage.
Response Body Example
{
"uuid": "13306ec0-1a7b-471a-9c85-bcc0363d0614",
"createdAt": "2024-02-08T15:02:35.399Z",
"updatedAt": "2024-02-23T08:42:49.064Z",
"organizationSlug": "org-name",
"country": "MY",
"urls": [
"http://xxx:yyy@gw.dataimpulse.com:10000",
"http://xxx:yyy@gw.dataimpulse.com:10001"
]
}
CURL Command Example
curl --header "Content-Type: application/json" \
--header "Authorization: Bearer $token" \
--request PUT \
--data '{"urls":["http://xxx:yyy@gw.dataimpulse.com:10000", "http://xxx:yyy@gw.dataimpulse.com:10001", "http://xxx:yyy@gw.dataimpulse.com:10002", "http://xxx:yyy@gw.dataimpulse.com:10003"]}' \
https://continuous-scraper.common.chartedapi.com/proxy-groups/MY
Add Proxy Configuration
Add new proxy URLs to an existing configuration. This method does not remove or alter existing URLs; URLs that are already present are ignored.
POST https://continuous-scraper.common.chartedapi.com/proxy-groups/${country}/add
Refer to the Replace Proxy Configuration section for the parameter and body documentation.
Delete Proxy Configuration
Remove specified proxy URLs from an existing configuration.
POST https://continuous-scraper.common.chartedapi.com/proxy-groups/${country}/delete
Refer to the Replace Proxy Configuration section for the parameter and body documentation.
Placeholder syntax
Placeholders in proxy URLs are enclosed in ${}. They allow dynamic generation of parts of the URL based on specified functions and arguments.
Supported Functions:
- ${rand(...)}: Generates a random string based on the provided arguments:
  - len: Length of the string to generate. Mandatory unless range is provided.
  - use: Array specifying the types of characters to include. Allowed values:
    - lowerCaseChar: abcdefghijklmnopqrstuvwxyz
    - upperCaseChar: ABCDEFGHIJKLMNOPQRSTUVWXYZ
    - digit: 0123456789
    - symbol: [],$&+:;=?@#|"'><.^*()%! -
  - subset: A string of allowed characters. This is an alternative to use. The characters $, " and \ must be escaped with \ (i.e. \$, \" and \\).
  - range: Specifies a range [min, max] to generate a random number within.
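To preview what a template might expand to before uploading it, the placeholder syntax can be approximated locally. The Python sketch below is not the service's implementation: it only handles `len` with `use` (for lowerCaseChar, upperCaseChar and digit) and `range`, it omits `symbol` and `subset`, and it assumes that both bounds of `range` are inclusive. The names `preview` and `_expand` are hypothetical.

```python
import random
import re
import string

CHAR_CLASSES = {
    "lowerCaseChar": string.ascii_lowercase,
    "upperCaseChar": string.ascii_uppercase,
    "digit": string.digits,
}
PLACEHOLDER = re.compile(r"\$\{rand\((.*?)\)\}")


def _expand(match, rng):
    """Produce one random value for a single ${rand(...)} placeholder."""
    args = match.group(1)
    range_arg = re.search(r"range=\[\s*(\d+)\s*,\s*(\d+)\s*\]", args)
    if range_arg:
        low, high = int(range_arg.group(1)), int(range_arg.group(2))
        # Assumption: both bounds are inclusive.
        return str(rng.randint(low, high))
    length = int(re.search(r"len=(\d+)", args).group(1))
    classes = re.search(r"use=\[([^\]]*)\]", args).group(1).split(",")
    alphabet = "".join(CHAR_CLASSES[name.strip()] for name in classes)
    return "".join(rng.choice(alphabet) for _ in range(length))


def preview(url_template, rng=None):
    """Return one possible URL generated from a template."""
    rng = rng or random.Random()
    return PLACEHOLDER.sub(lambda m: _expand(m, rng), url_template)


print(preview("http://uuu__cr.my:ppp@gw.dataimpulse.com:${rand(range=[10000,19999])}"))
print(preview("http://uuu-res-MY-sid-${rand(len=9,use=[digit])}:ppp@gw.ntnt.io:5959"))
```

Each call returns a different URL, which gives a quick visual check that the session identifier or port lands where the proxy provider expects it.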
Read Proxy Configuration
These operations are reserved for customers with a dedicated environment. If you're interested in a customized solution for large-scale operations, please contact us.
HTTP Method and URL
GET https://continuous-scraper.common.chartedapi.com/proxy-groups
Request Headers
Authorization: Bearer ${token}
Response Body Example
[
{
"uuid": "6a63e145-c43c-425c-9326-e8b0341dc122",
"createdAt": "2024-02-15T14:06:03.023Z",
"updatedAt": "2024-02-15T14:06:03.023Z",
"organizationSlug": "org-name",
"country": "ID",
"urls": [
"http://xxx:yyy@gw.dataimpulse.com:10000",
"http://xxx:yyy@gw.dataimpulse.com:10001",
"http://xxx:yyy@gw.dataimpulse.com:10002",
"http://xxx:yyy@gw.dataimpulse.com:10003"
]
},
{
"uuid": "13306ec0-1a7b-471a-9c85-bcc0363d0614",
"createdAt": "2024-02-08T15:02:35.399Z",
"updatedAt": "2024-02-23T08:42:49.064Z",
"organizationSlug": "org-name",
"country": "MY",
"urls": [
"http://xxx:yyy@gw.dataimpulse.com:10000",
"http://xxx:yyy@gw.dataimpulse.com:10001",
"http://xxx:yyy@gw.dataimpulse.com:10002",
"http://xxx:yyy@gw.dataimpulse.com:10003"
]
}
]
CURL Command Example
curl https://continuous-scraper.common.chartedapi.com/proxy-groups \
--header "Authorization: Bearer $token"
System Configuration
These operations are reserved for customers with a dedicated environment. If you're interested in a customized solution for large-scale operations, please contact us.
This section covers how to update and read user-modifiable system configuration.
Read System Configuration
HTTP Method and URL
GET https://continuous-scraper.common.chartedapi.com/configurations/${config_key}
Path Parameters
- config_key: So far, only page_slot_allocation is supported.
Request Headers
Authorization: Bearer ${token}
You can only invoke this API on your own environment.
Response Body Example
{
"configKey": "PAGE_SLOT_ALLOCATION",
"createdAt": "2024-06-04T08:39:57.003Z",
"updatedAt": "2024-06-04T08:40:13.135Z",
"configValue": {
"algorithm": "AUTOMATIC_RECENT_TASKS_FIRST"
}
}
The configValue depends on the config_key you passed in the URL. Read the Replace System Configuration section for more details.
CURL Command Example
curl https://continuous-scraper.common.chartedapi.com/configurations/page_slot_allocation \
--header "Authorization: Bearer $token"
Replace System Configuration
HTTP Method and URL
PUT https://continuous-scraper.common.chartedapi.com/configurations/${config_key}
Path Parameters
- config_key: So far, only page_slot_allocation is supported.
Request Headers
Authorization: Bearer ${token}
You can only invoke this API on your own environment.
Request Body Example for page_slot_allocation
The request body depends on the config_key passed in the URL. The following section describes the request body for the page_slot_allocation config key, which is useful if you want to replace the default page slot allocation algorithm. See Page Slot Management to learn more about it.
{
"algorithm": "MANUAL_WEIGHTS",
// Only if algorithm == MANUAL_WEIGHTS
"manualWeights": [
{ "weight": 70, "country": "ID" },
{ "weight": 30, "country": "PH", "scraper": "SHOPEE", "device": "DESKTOP_WEB" }
]
}
Request body fields for page_slot_allocation
- algorithm: AUTOMATIC_RECENT_TASKS_FIRST (default) or MANUAL_WEIGHTS.
- manualWeights: Manually defines the page slot allocation. It is ignored if algorithm is different from MANUAL_WEIGHTS.
  - weight: How much weight this configuration has in the page slot distribution. For example, if the sum of all weights is 100, then 70 means that 70% of page slots will have this configuration.
  - country: Page slot country. Supported values: BR, CL, CO, ID, MX, MY, PH, SG, TH, TW, VN.
  - scraper (optional): Page slot scraper, normally set by default. Supported values: SHOPEE, LAZADA.
  - device (optional): Page slot device (default is DESKTOP_WEB). MOBILE_WEB is used when emulateMobileDevice is set to true in the Lazada scraper. Supported values: DESKTOP_WEB, MOBILE_WEB.
A bad configuration can substantially degrade scraping performance. Make sure you know what you are doing when you set algorithm to a non-default value.
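Before submitting a MANUAL_WEIGHTS configuration, it can help to check what share of page slots each entry would receive. The Python sketch below computes each entry's share as its weight divided by the sum of all weights. That matches the documented case where the weights sum to 100; treating other sums as proportional is an assumption, and `slot_shares` is a hypothetical helper.

```python
def slot_shares(manual_weights):
    """Return each entry with its fraction of page slots, proportional to weight."""
    total = sum(entry["weight"] for entry in manual_weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return [
        {**entry, "share": entry["weight"] / total}
        for entry in manual_weights
    ]


weights = [
    {"weight": 70, "country": "ID"},
    {"weight": 30, "country": "PH", "scraper": "SHOPEE", "device": "DESKTOP_WEB"},
]
for entry in slot_shares(weights):
    print(entry["country"], f"{entry['share']:.0%}")
# prints "ID 70%" then "PH 30%"
```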
Response Body Example
{
"configKey": "PAGE_SLOT_ALLOCATION",
"createdAt": "2024-06-04T08:39:57.003Z",
"updatedAt": "2024-06-04T09:09:26.074Z",
"configValue": {
"algorithm": "MANUAL_WEIGHTS",
"manualWeights": [
{
"weight": 70,
"country": "ID"
},
{
"device": "DESKTOP_WEB",
"weight": 30,
"country": "PH",
"scraper": "SHOPEE"
}
]
}
}
CURL Command Example
curl --header "Content-Type: application/json" \
--header "Authorization: Bearer $token" \
--request PUT \
--data '{"algorithm":"AUTOMATIC_RECENT_TASKS_FIRST"}' \
https://continuous-scraper.common.chartedapi.com/configurations/page_slot_allocation