Lazada Scraper
What is Lazada Scraper and how does it work?
Lazada Scraper allows you to scrape Lazada, a major ecommerce marketplace in South East Asia.
It takes care of all the "low-level" details, such as setting the right HTTP headers, so that you can request data easily and efficiently.
Quick Start
Run the following scraping tasks:
{
"requests": [
{ "url": "https://www.lazada.sg/shop-desktop-computer/" },
{ "url": "https://www.lazada.sg/catalog/?q=tshirt" },
{ "url": "https://www.lazada.sg/products/raspberry-pi-4-heat-sinks-1-copper-2-aluminium-3-piece-cooling-heatsink-heat-sink-accessories-i453778483.html" },
{ "url": "https://www.lazada.sg/shop/loreal-paris" }
]
}
This performs four distinct requests:
Request | URL |
---|---|
Search products in the category "Computers & Laptops > Desktops Computers" | https://www.lazada.sg/shop-desktop-computer/ |
Search for products with the keyword "tshirt" | https://www.lazada.sg/catalog/?q=tshirt |
Retrieve detailed product information | https://www.lazada.sg/products/raspberry-pi-4-heat-sinks-1-copper-2-aluminium-3-piece-cooling-heatsink-heat-sink-accessories-i453778483.html |
Retrieve detailed information about a shop | https://www.lazada.sg/shop/loreal-paris |
After a few seconds, Lazada Scraper returns the following results:
- Product listing for the category "Computers & Laptops > Desktops Computers"
- Product listing for the "tshirt" keyword
- Detailed information for the product "Raspberry Pi 4 Heat Sinks"
- Detailed information for the shop "L'Oréal Paris Official Store"
For more examples and detailed explanations of how to use Lazada Scraper, see the How to scrape Lazada? section below.
What data can you extract?
Lazada Scraper seamlessly interacts with Lazada's unofficial API to extract a wide range of data, including:
- Product searches by keyword and category.
- Detailed product information (description, price, stock levels, sales data, and variants).
- Buyer ratings with comments and media.
- Hierarchical category trees.
- Lists of all sellers on the platform.
- Comprehensive seller details (description, opening date, product counts, and performance metrics).
- Seller products.
- Seller hot keywords, which are the top 10 keywords driving sales for a seller over the past week.
Note: By aggregating hot keywords from all sellers, you can identify and rank significant keywords based on their sales volume. Regular scraping, such as on a weekly basis, enables you to track keyword trends and identify those that are gaining popularity the quickest, providing valuable insights for strategic planning.
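The aggregation described in the note above could be sketched as follows. This is only an illustration: the record layout (a `hotKeywords` list whose entries carry `keyword` and `sales`) is a hypothetical shape, not the scraper's documented output, so adapt the field names to what the Seller Details response actually contains.

```python
from collections import Counter


def aggregate_hot_keywords(sellers):
    """Sum sales per keyword across all sellers (field names are assumptions)."""
    totals = Counter()
    for seller in sellers:
        for entry in seller["hotKeywords"]:
            # Normalize case and whitespace so "Raspberry Pi" and "raspberry pi" merge.
            totals[entry["keyword"].strip().lower()] += entry["sales"]
    return totals


def fastest_growing(previous, current, top_n=3):
    """Rank keywords by the increase in total sales between two weekly snapshots."""
    growth = {kw: current[kw] - previous.get(kw, 0) for kw in current}
    # Sort by growth (descending), then alphabetically for a stable order.
    return sorted(growth.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


# Hypothetical snapshots taken one week apart.
last_week = [
    {"hotKeywords": [{"keyword": "raspberry pi", "sales": 40},
                     {"keyword": "heatsink", "sales": 10}]},
    {"hotKeywords": [{"keyword": "Raspberry Pi", "sales": 20}]},
]
this_week = [
    {"hotKeywords": [{"keyword": "raspberry pi", "sales": 50},
                     {"keyword": "heatsink", "sales": 35}]},
    {"hotKeywords": [{"keyword": "usb hub", "sales": 15}]},
]

previous_totals = aggregate_hot_keywords(last_week)
current_totals = aggregate_hot_keywords(this_week)
print(current_totals.most_common(3))
print(fastest_growing(previous_totals, current_totals))
```

Keeping each weekly snapshot as a plain mapping of keyword to total makes it straightforward to compare any two weeks later on.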
Consider scraping both Lazada and Shopee to strengthen your data acquisition strategy. This makes the pipeline more reliable, since both platforms are unlikely to introduce breaking changes at the same time. It also yields complementary insights: keyword trends from Lazada can guide product searches on Shopee, which has a larger market share.
Please visit the API reference section below for more details about each API.
Please contact me if you encounter issues or need additional features.
Why scrape Lazada data?
Discover what you can achieve with the Lazada Scraper through these example use cases:
- Market and Competitive Analysis: Track category performance and identify emerging trends by analyzing sales data, product reviews, and competitor strategies. This can reveal insights into market demands and areas for strategic advantage.
- Opportunity Discovery: Use aggregated product search results to uncover popular yet underserved keywords and niches in the market. This can highlight potential areas for new product development or market entry.
- Product Review Analysis: Employ AI, such as ChatGPT, to deeply analyze product reviews. Summarize buyer feedback to gain insights into consumer satisfaction, preferences, and areas for product improvement.
- Competitor Watch: Regularly scrape competitor data to monitor for critical changes like products going out of stock, pricing adjustments, or new product launches. This enables real-time alerts and swift response strategies.
How to scrape Lazada?
Getting started
Get started with Lazada Scraper by following these simple steps:
- Prepare Requests: Formulate the request URLs you wish to scrape. Refer to the API reference section for guidance. Example URLs include:
- Search for popular products by keyword:
https://www.lazada.com.my/catalog/?q=raspberry%20pi%205
- Retrieve top categories:
https://acs-m.lazada.com.my/h5/mtop.lazada.guided.shopping.categories.categorieslpcommon/1.0/
- List products by category:
https://www.lazada.com.my/shop-computers-laptops/
- List all sellers:
https://www.lazada.com.my/sitemap-sellers.xml
- Access seller details (including their hot keywords):
https://www.lazada.com.my/shop/citemalaysia/
- List products by seller:
https://www.lazada.com.my/nike/?q=All-Products&from=wangpu&langFlag=en&pageTypeId=2
- Access product detail:
https://www.lazada.com.my/products/120ml-skintific-all-day-light-sunscreen-mist-spf50-pa-sunblock-spray-anti-uv-face-body-spray-120ml-i3525761808-s22573296116.html
- Access product reviews:
https://my.lazada.com.my/pdp/review/getReviewList?itemId=2932861112
- Execution: Submit your scraping tasks and wait for the results. Follow the execution via the Grafana dashboard.
Example input:
{
// Search products with the "raspberry pi 5" keyword
"requests": [
{ "url": "https://www.lazada.com.my/catalog/?q=raspberry%20pi%205" }
]
}
Example JSON output from a scraping session:
{
"url": "https://www.lazada.com.my/catalog/?q=raspberry%20pi%205",
"enrichedUrl": "https://www.lazada.com.my/catalog/?q=raspberry%20pi%205",
"referrer": "https://lazada.com.my/",
"knownApi": "KEYWORD_PRODUCTS",
"responseStatus": 200,
"responseBody": {
"productTotal": 1309,
"products": [
{
"itemId": 4084035635,
"pageUrl": "https://www.lazada.com.my/products/raspberry-pi-5-bundle-8gbram32gb-noobsredwhite-case-with-fan-i4084035635.html",
"name": "Raspberry Pi 5 Bundle (8GBRAM/32GB NOOBS/Red/White Case with Fan)",
"imageUrl": "https://sg-test-11.slatic.net/p/c545e50e4305a33c6291a7021a1cc60a.png",
"categoryId": 10000493,
"location": "Wp Kuala Lumpur",
"originalPrice": 699,
"price": 618.9,
"isInStock": true,
"isAd": false,
"isSponsored": false,
"isLazMall": false,
"isLazGlobal": false,
"hasFreeShipping": false,
"reviewCount": 0,
"brandId": 32076,
"brandName": "Raspberry Pi",
"defaultSkuId": 23134383025,
"supplierId": 3335,
"shopId": 19295
}
// ... 59 other products
]
}
}
This output includes fields like the original URL (url) and the HTTP response (responseStatus and responseBody), among others.
The responseBody above is a cleaned version of the real Lazada response. If you want the raw response, set cleanResponseBody to false. Check the input parameters for more details.
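As a rough illustration of how the cleaned output might be consumed, the sketch below reads one result item and derives a discount percentage per product. It relies only on fields visible in the sample output above (responseStatus, responseBody.products, itemId, name, price, originalPrice, isInStock); the function name and the returned row layout are assumptions made for this example.

```python
# One result item, trimmed to the fields used below (values from the sample output).
SAMPLE_ITEM = {
    "url": "https://www.lazada.com.my/catalog/?q=raspberry%20pi%205",
    "knownApi": "KEYWORD_PRODUCTS",
    "responseStatus": 200,
    "responseBody": {
        "productTotal": 1309,
        "products": [
            {
                "itemId": 4084035635,
                "name": "Raspberry Pi 5 Bundle (8GBRAM/32GB NOOBS/Red/White Case with Fan)",
                "originalPrice": 699,
                "price": 618.9,
                "isInStock": True,
            }
        ],
    },
}


def summarize_products(item):
    """Return one small row per product, or an empty list for failed requests."""
    if item.get("responseStatus") != 200:
        return []
    rows = []
    for product in item.get("responseBody", {}).get("products", []):
        price = product["price"]
        # Fall back to the current price when no original price is present.
        original = product.get("originalPrice") or price
        discount = round(100 * (original - price) / original, 1) if original else 0.0
        rows.append({
            "itemId": product["itemId"],
            "name": product["name"],
            "price": price,
            "discountPercent": discount,
            "isInStock": product.get("isInStock", False),
        })
    return rows


for row in summarize_products(SAMPLE_ITEM):
    print(row["itemId"], row["price"], f'{row["discountPercent"]}% off')
```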
Input parameters
The Lazada Scraper offers a range of input parameters, allowing you to tailor your scraping process according to your needs. These parameters can be broadly categorized into common parameters applicable to all APIs and specific parameters unique to certain APIs (detailed in the API reference section).
Key common parameters include:
- requests: This is where you specify the list of URLs you intend to scrape.
- maxRequestsPerCrawl (optional): Set the maximum number of requests the crawler will execute. The process stops once this limit is reached.
- cleanResponseBody (default = true): Indicates whether the response should be cleaned (true) or returned as received (false).
- emulateMobileDevice (default = false): Simulates a mobile browser when set to true. Some APIs may return different results in this mode.
- language (optional, default = "en" or "id"): Language set in the "hng" cookie and the "x-i18n-language" header. Acceptable values: en (English), id (Indonesian), ms (Malay), th (Thai), vi (Vietnamese).
For any queries or assistance, feel free to reach out.
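The common parameters above could be assembled programmatically, for instance as in this sketch. The helper names (`keyword_search_url`, `build_input`) are hypothetical; only the parameter names and the accepted language codes come from the list above. How the resulting JSON is submitted depends on how you run the scraper and is not covered here.

```python
import json
from urllib.parse import quote

# Language codes listed in the input parameters above.
ALLOWED_LANGUAGES = {"en", "id", "ms", "th", "vi"}


def keyword_search_url(tld, keyword):
    """Build a keyword search URL, percent-encoding the keyword."""
    return f"https://www.lazada.{tld}/catalog/?q={quote(keyword)}"


def build_input(urls, max_requests=None, clean=True, mobile=False, language=None):
    """Assemble the scraper input as a dict ready for JSON serialization."""
    if language is not None and language not in ALLOWED_LANGUAGES:
        raise ValueError(f"Unsupported language: {language!r}")
    payload = {
        "requests": [{"url": url} for url in urls],
        "cleanResponseBody": clean,
        "emulateMobileDevice": mobile,
    }
    # Optional parameters are only included when explicitly set.
    if max_requests is not None:
        payload["maxRequestsPerCrawl"] = max_requests
    if language is not None:
        payload["language"] = language
    return payload


scraper_input = build_input(
    [keyword_search_url("com.my", "raspberry pi 5")],
    max_requests=10,
    language="en",
)
print(json.dumps(scraper_input, indent=2))
```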
Output fields
The data returned from the Lazada Scraper includes several important fields. Below is a description of each:
- url: This is the original URL you submitted for scraping.
- referrer: The Referer HTTP header sent to Lazada. It can be set manually as well (see input parameters for more).
- knownApi: Enumerated value corresponding to the requested API. This helps you identify which API endpoint was used:
URL | KnownApi |
---|---|
https://www.lazada.xxx/sitemap-sellers.xml | SELLER_LISTING |
https://www.lazada.xxx/shop/seller-slug/ | SELLER_DETAIL |
https://www.lazada.xxx/seller-slug/?q=All-Products&from=wangpu&langFlag=en&pageTypeId=2 | SELLER_PRODUCTS |
https://www.lazada.xxx/shop/site/api/shop/campaignTppProducts/query?shopId=111&sellerId=222&itemId=333 | SELLER_CAMPAIGN_TPP_PRODUCTS |
https://acs-m.lazada.xxx/h5/mtop.lazada.guided.shopping.categories.categorieslpcommon/1.0/ | CATEGORY_TREE |
https://www.lazada.xxx/category-slug/ | CATEGORY_PRODUCTS |
https://www.lazada.xxx/tag-order-last-90days-morethan0.xml | KEYWORD_LISTING |
https://www.lazada.xxx/catalog/?q=keyword | KEYWORD_PRODUCTS |
https://www.lazada.xxx/products/product-slug-i111-s222.html | PRODUCT_DETAIL |
https://my.lazada.com.my/pdp/review/getReviewList?itemId=111 | PRODUCT_REVIEWS |
- responseStatus: Indicates the HTTP response status received from Lazada.
- responseBody: Contains the HTTP response body from Lazada in JSON format. The specific content varies depending on the requested API.
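To check in advance which knownApi value a URL is likely to map to, the URL patterns listed for knownApi can be approximated with a few path checks. The function below is a guess at such a mapping written purely for illustration; it is not the scraper's actual detection logic, and the name `guess_known_api` is made up.

```python
import re
from urllib.parse import parse_qs, urlparse


def guess_known_api(url):
    """Approximate the knownApi value from the URL shape (illustrative only)."""
    parsed = urlparse(url)
    host, path = parsed.netloc, parsed.path
    query = parse_qs(parsed.query)

    if host.startswith("acs-m.") and "mtop.lazada.guided.shopping.categories" in path:
        return "CATEGORY_TREE"
    if path.endswith("/pdp/review/getReviewList"):
        return "PRODUCT_REVIEWS"
    if path == "/sitemap-sellers.xml":
        return "SELLER_LISTING"
    if re.fullmatch(r"/tag-order-last-\d+days-morethan0\.xml", path):
        return "KEYWORD_LISTING"
    if path.startswith("/shop/site/api/shop/campaignTppProducts/query"):
        return "SELLER_CAMPAIGN_TPP_PRODUCTS"
    if re.fullmatch(r"/shop/[^/]+/?", path):
        return "SELLER_DETAIL"
    if path.rstrip("/") == "/catalog" and "q" in query:
        return "KEYWORD_PRODUCTS"
    if re.fullmatch(r"/products/.+-i\d+(-s\d+)?\.html", path):
        return "PRODUCT_DETAIL"
    if re.fullmatch(r"/[^/]+/?", path):
        # Seller product listings and category listings share the same path
        # shape; the "from=wangpu" query parameter tells them apart.
        if query.get("from") == ["wangpu"]:
            return "SELLER_PRODUCTS"
        return "CATEGORY_PRODUCTS"
    return None


for candidate in [
    "https://www.lazada.sg/shop-desktop-computer/",
    "https://www.lazada.sg/catalog/?q=tshirt",
    "https://www.lazada.sg/shop/loreal-paris",
]:
    print(guess_known_api(candidate), candidate)
```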
API reference
Product Search by Keyword, Category, or Seller
- URL Path:
https://www.lazada.${tld}/catalog/?q=${keyword}
https://www.lazada.${tld}/${categorySlug}/
https://www.lazada.${tld}/${sellerSlug}/?q=All-Products&from=wangpu&langFlag=en&pageTypeId=2
Avoid specifying the page parameter in the URL due to inefficiencies related to Lazada's pagination limitations. Sequential navigation from the first page is required, which can consume significant proxy traffic. For multi-page crawling, consider using the productListing_crawlNextPages parameter.
- URL Parameters:
  - keyword: Search keyword, e.g., "tshirt".
  - categorySlug: Category slug, e.g. "shop-computers-laptops".
  - sellerSlug: Seller slug, e.g. "nike".
- Scraper Input Parameters:
  - cleanResponseBody (default = true): Indicates whether the response should be cleaned (true) or returned as received (false).
  - emulateMobileDevice (default = false): Simulates a mobile browser when set to true. This is particularly useful for this API, as it only provides consistent ordering in mobile mode.
  - productListing_crawlNextPages (default = false): Enables automatic crawling of subsequent pages.
  - productListing_crawlNextPages_maxPages (optional): Sets a cap on the number of pages to crawl.
  - productListing_crawlNextPages_maxUniqueProducts (optional): Limits the number of unique products to scrape. Products that appear multiple times are counted once.
  - productListing_crawlNextPages_stopWhenNewPageOnlyContainsDuplicates (default = true): Prevents infinite loops in pagination bugs by stopping when a page only contains previously listed products.
If productListing_crawlNextPages is set to true, all pages are scraped sequentially within a single scraping task. This is because Lazada requires products to be listed page by page within the same web browser session, starting from page 1. Direct access to a non-consecutive page (e.g., jumping straight to page 5) would cause Lazada to display the results for page 1.
In practice, this means that a single scraping task may take more than 10 minutes to process, as it may involve more than 100 HTTP requests (note that a random delay is added between each scrape to reduce bot detection).
Sequential crawling makes the scraping task more susceptible to being blocked, since successful completion depends on numerous uninterrupted requests. Captchas can occasionally interrupt the session, but they can be resolved; however, if the session is disrupted without a captcha being presented, the process needs to restart from the first page.
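The stopping rules behind the productListing_crawlNextPages options can be pictured with a small simulation. The sketch below is not the scraper's implementation: `crawl_pages` and `fake_fetch` are hypothetical, pages are served from an in-memory dict, and the random delay mentioned above is left out. It only shows how a page cap, a unique-product cap, and a duplicates-only check could interact during a sequential crawl.

```python
def crawl_pages(fetch_page, max_pages=None, max_unique_products=None,
                stop_on_duplicates_only=True):
    """Walk pages 1, 2, 3, ... and collect unique products by itemId."""
    seen = {}
    page = 1
    while max_pages is None or page <= max_pages:
        products = fetch_page(page)
        if not products:
            break  # No more results.
        new_products = [p for p in products if p["itemId"] not in seen]
        if not new_products and stop_on_duplicates_only:
            break  # The page repeated earlier products only.
        for product in new_products:
            seen[product["itemId"]] = product
            if max_unique_products is not None and len(seen) >= max_unique_products:
                return list(seen.values())
        page += 1
    return list(seen.values())


# Simulated listing: page 3 repeats page 1, mimicking the fallback to page 1.
FAKE_PAGES = {1: [1, 2, 3], 2: [3, 4], 3: [1, 2, 3]}


def fake_fetch(page):
    """Stand-in for a real page request; unknown pages fall back to page 1."""
    return [{"itemId": item_id} for item_id in FAKE_PAGES.get(page, FAKE_PAGES[1])]


print([p["itemId"] for p in crawl_pages(fake_fetch)])
print([p["itemId"] for p in crawl_pages(fake_fetch, max_unique_products=2)])
```

Without the duplicates-only check and without a page cap, the simulated crawl above would never terminate, which is the situation the stopWhenNewPageOnlyContainsDuplicates option is described as guarding against.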
- Example Input:
{
"requests": [
// Search products with the "raspberry pi 5" keyword
{ "url": "https://www.lazada.com.my/catalog/?q=raspberry%20pi%205" },
// Search products in the "Computers / Laptops" category
{ "url": "https://www.lazada.com.my/shop-computers-laptops/" },
// Search products from Nike store
{ "url": "https://www.lazada.com.my/nike/?q=All-Products&from=wangpu&langFlag=en&pageTypeId=2" }
]
}
- Example Responses:
Product Details
- URL Path:
https://www.lazada.${tld}/products/${productSlug}-i${productId}-s${sellerId}.html
- URL Parameters:
  - productSlug: Product slug, e.g., "120ml-skintific-all-day-light-sunscreen-mist-spf50-pa-sunblock-spray-anti-uv-face-body-spray-120ml".
  - productId: Product ID (a.k.a. itemId), e.g. 3525761808.
  - sellerId: Seller ID, e.g. 22573296116.
- Scraper Input Parameters:
  - cleanResponseBody (default = true): Indicates whether the response should be cleaned (true) or returned as received (false).
  - emulateMobileDevice (default = false): Simulates a mobile browser when set to true. Some APIs may return different results in this mode.
- Example Input:
{
"requests": [
{ "url": "https://www.lazada.com.my/products/120ml-skintific-all-day-light-sunscreen-mist-spf50-pa-sunblock-spray-anti-uv-face-body-spray-120ml-i3525761808-s22573296116.html" }
]
}
- Example Response:
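Returning to the product URL format described in this section: the identifiers embedded in a product URL can be pulled out with a regular expression and reused, for example to derive the matching reviews URL. The sketch below assumes the URL layout shown under URL Path; the helper names are hypothetical, and the "-s" part is treated as optional because some product URLs in this document omit it.

```python
import re

# Mirrors https://www.lazada.${tld}/products/${productSlug}-i${productId}-s${sellerId}.html
PRODUCT_URL_RE = re.compile(
    r"^https://www\.lazada\.(?P<tld>[a-z.]+)/products/"
    r"(?P<slug>.+)-i(?P<product_id>\d+)(?:-s(?P<seller_id>\d+))?\.html$"
)


def parse_product_url(url):
    """Split a product URL into its documented parts, or return None."""
    match = PRODUCT_URL_RE.match(url)
    if match is None:
        return None
    seller_id = match.group("seller_id")
    return {
        "tld": match.group("tld"),
        "productSlug": match.group("slug"),
        "productId": int(match.group("product_id")),
        "sellerId": int(seller_id) if seller_id is not None else None,
    }


def reviews_url(tld, product_id):
    """Build the Product Reviews URL for a product ID."""
    return f"https://my.lazada.{tld}/pdp/review/getReviewList?itemId={product_id}"


parts = parse_product_url(
    "https://www.lazada.com.my/products/120ml-skintific-all-day-light-sunscreen-mist-"
    "spf50-pa-sunblock-spray-anti-uv-face-body-spray-120ml-i3525761808-s22573296116.html"
)
print(parts["productId"], parts["sellerId"])
print(reviews_url(parts["tld"], parts["productId"]))
```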
Product Reviews
- URL Path:
https://my.lazada.${tld}/pdp/review/getReviewList?itemId=${productId}
- URL Parameters:
  - productId: Product ID (a.k.a. itemId), e.g. 3525761808.
- Scraper Input Parameters:
  - cleanResponseBody (default = true): Indicates whether the response should be cleaned (true) or returned as received (false).
  - emulateMobileDevice (default = false): Simulates a mobile browser when set to true. Some APIs may return different results in this mode.
- Example Input:
{
"requests": [
{ "url": "https://my.lazada.com.my/pdp/review/getReviewList?itemId=2932861112" }
]
}
- Example Response:
Category Tree
- URL Path:
https://acs-m.lazada.${tld}/h5/mtop.lazada.guided.shopping.categories.categorieslpcommon/1.0/
- Scraper Input Parameters:
  - emulateMobileDevice (default = false): Simulates a mobile browser when set to true. Some APIs may return different results in this mode.
- Example Input:
{
"requests": [
{ "url": "https://acs-m.lazada.com.my/h5/mtop.lazada.guided.shopping.categories.categorieslpcommon/1.0/" }
]
}
- Example Response:
Seller Listing
- URL Path:
https://www.lazada.${tld}/sitemap-sellers.xml?limit=${limit}&offset=${offset}
- URL Parameters:
  - limit: Number of sub-sitemaps to load (preferably 30 or fewer).
  - offset: Offset for results, typically a multiple of limit.
- Example Input:
{
"requests": [
{ "url": "https://www.lazada.com.my/sitemap-sellers.xml" }
]
}
- Example Response:
Seller Details
- URL Path:
https://www.lazada.${tld}/shop/${sellerSlug}/
- URL Parameters:
  - sellerSlug: Seller slug, e.g., "citemalaysia".
- Scraper Input Parameters:
  - cleanResponseBody (default = true): Indicates whether the response should be cleaned (true) or returned as received (false).
  - emulateMobileDevice (default = false): Simulates a mobile browser when set to true. Some APIs may return different results in this mode.
- Example Input:
{
"requests": [
{ "url": "https://www.lazada.com.my/shop/citemalaysia/" }
]
}
- Example Response:
Seller Promoted Products
- URL Path:
https://www.lazada.${tld}/shop/site/api/shop/campaignTppProducts/query?shopId=${shopId}&sellerId=${sellerId}&itemId=${productId}
- URL Parameters:
  - shopId: Shop ID, can be obtained from the Seller Details.
  - sellerId: Seller ID, can be obtained from the Seller Details.
  - productId: Any product ID from the seller.
- Scraper Input Parameters:
  - emulateMobileDevice (default = false): Simulates a mobile browser when set to true. Some APIs may return different results in this mode.
- Example Input:
{
"requests": [
{ "url": "https://www.lazada.co.id/shop/site/api/shop/campaignTppProducts/query?shopId=3258813&sellerId=400611231032&itemId=7991896339" }
]
}
- Example Response:
Keyword Listing
- URL Path:
  - For ID, VN:
https://www.lazada.${tld}/tag-order-last-30days-morethan0.xml?limit=${limit}&offset=${offset}
  - For PH, TH:
https://www.lazada.${tld}/tag-order-last-60days-morethan0.xml?limit=${limit}&offset=${offset}
  - For MY, SG:
https://www.lazada.${tld}/tag-order-last-90days-morethan0.xml?limit=${limit}&offset=${offset}
- URL Parameters:
  - limit: Number of sub-sitemaps to load (preferably 30 or fewer).
  - offset: Offset for results, typically a multiple of limit.
- Example Input:
{
"requests": [
{ "url": "https://www.lazada.com.my/tag-order-last-90days-morethan0.xml?limit=30&offset=0" }
]
}
- Example Response:
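Coming back to the limit and offset parameters of this API: a batch of Keyword Listing requests can be generated by stepping the offset in multiples of limit. The sketch below uses the per-country URL variants listed under URL Path; the function name is hypothetical, and since this document does not say how many batches exist, the number of batches is simply passed in by the caller.

```python
# Order window (in days) used in the sitemap file name, per country code.
WINDOW_DAYS = {"ID": 30, "VN": 30, "PH": 60, "TH": 60, "MY": 90, "SG": 90}


def keyword_listing_requests(tld, country, batches, limit=30):
    """Build request entries whose offsets are consecutive multiples of limit."""
    days = WINDOW_DAYS[country.upper()]
    base = f"https://www.lazada.{tld}/tag-order-last-{days}days-morethan0.xml"
    return [
        {"url": f"{base}?limit={limit}&offset={index * limit}"}
        for index in range(batches)
    ]


for request in keyword_listing_requests("com.my", "MY", batches=3):
    print(request["url"])
```

The same stepping applies to the Seller Listing API, which takes the same limit and offset parameters.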
Is it legal to scrape Lazada?
It is legal to scrape publicly available data such as product descriptions, prices, or ratings. Read Apify's blog post on the legality of web scraping to learn more.
Your feedback
Please don't hesitate to share your feedback (improvement ideas, bug reports, etc.). You can reach me on Discord (username "marcplouhinec").
Thanks
Lazada Scraper is built with Crawlee, a great JavaScript framework to accelerate scraper development.
It also uses ungoogled-chromium via the Chrome DevTools Protocol. These powerful technologies have been instrumental in "opening" the website despite anti-bot protections.
Finally, a lot of knowledge that enabled the development of Lazada Scraper comes from The Web Scraping Club and its Discord Community.