Lazada Scraper

What is Lazada Scraper and how does it work?

Lazada Scraper allows you to scrape Lazada, a major ecommerce marketplace in Southeast Asia.

It takes care of the low-level details, such as setting the right HTTP headers, so that you can request data easily and efficiently.

Quick Start

Run the following scraping tasks:

{
  "requests": [
    { "url": "https://www.lazada.sg/shop-desktop-computer/" },
    { "url": "https://www.lazada.sg/catalog/?q=tshirt" },
    { "url": "https://www.lazada.sg/products/raspberry-pi-4-heat-sinks-1-copper-2-aluminium-3-piece-cooling-heatsink-heat-sink-accessories-i453778483.html" },
    { "url": "https://www.lazada.sg/shop/loreal-paris" }
  ]
}

This performs four distinct requests:

  • Search products in the category "Computers & Laptops > Desktops Computers": https://www.lazada.sg/shop-desktop-computer/
  • Search for products with the keyword "tshirt": https://www.lazada.sg/catalog/?q=tshirt
  • Retrieve detailed product information: https://www.lazada.sg/products/raspberry-pi-4-heat-sinks-1-copper-2-aluminium-3-piece-cooling-heatsink-heat-sink-accessories-i453778483.html
  • Retrieve detailed information about a shop: https://www.lazada.sg/shop/loreal-paris

After a few seconds, the Lazada Scraper returns the results.

For a comprehensive understanding of how to utilize the Lazada Scraper effectively, including more examples and detailed explanations, please consult the How to Scrape Lazada? section below.

What data can you extract?

Lazada Scraper seamlessly interacts with Lazada's unofficial API to extract a wide range of data, including:

  • Product searches by keyword and category.
  • Detailed product information (description, price, stock levels, sales data, and variants).
  • Buyer ratings with comments and media.
  • Hierarchical category trees.
  • Lists of all sellers on the platform.
  • Comprehensive seller details (description, opening date, product counts, and performance metrics).
  • Seller products.
  • Seller hot keywords, which are the top 10 keywords driving sales for a seller over the past week.

    Note: By aggregating hot keywords from all sellers, you can identify and rank significant keywords based on their sales volume. Regular scraping, such as on a weekly basis, enables you to track keyword trends and identify those that are gaining popularity the quickest, providing valuable insights for strategic planning.
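The aggregation described in the note above could be sketched as follows. The input shape (a mapping from seller slug to a list of keywords), the function names, and the sample data are assumptions made for illustration; the scraper's actual hot-keyword output format is not shown here. Since no per-keyword sales figure is assumed, this sketch ranks keywords by how many sellers list them rather than by sales volume.

```python
from collections import Counter


def count_keywords(hot_keywords_by_seller):
    """Count how many sellers list each hot keyword (hypothetical input shape)."""
    counts = Counter()
    for keywords in hot_keywords_by_seller.values():
        # Normalize and de-duplicate so one seller counts at most once per keyword.
        counts.update({kw.strip().lower() for kw in keywords})
    return counts


def fastest_growing(previous, current, top=3):
    """Rank keywords by the increase in seller count between two snapshots."""
    growth = {kw: current[kw] - previous.get(kw, 0) for kw in current}
    # Largest growth first; ties are broken alphabetically for stable output.
    return sorted(growth.items(), key=lambda item: (-item[1], item[0]))[:top]


# Made-up data for two weekly snapshots.
last_week = count_keywords({
    "seller-a": ["raspberry pi", "heatsink"],
    "seller-b": ["raspberry pi"],
})
this_week = count_keywords({
    "seller-a": ["raspberry pi", "pi 5 case"],
    "seller-b": ["pi 5 case", "raspberry pi"],
    "seller-c": ["pi 5 case"],
})
trending = fastest_growing(last_week, this_week)
print(trending)
```

If the scraper output does expose a usable sales or ranking signal per keyword, it could replace the simple seller count used here.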

Tip: Consider scraping both Lazada and Shopee to strengthen your data acquisition strategy. Because the two platforms are unlikely to introduce breaking changes at the same time, this gives you a more reliable pipeline. The data is also complementary: keyword trends from Lazada can inform product searches on Shopee, which has a larger market share.

Info: Please visit the API reference section below for more details about each API.

Please contact me if you encounter issues or need additional features.

Why scrape Lazada data?

Discover what you can achieve with the Lazada Scraper through these example use cases:

  • Market and Competitive Analysis: Track category performance and identify emerging trends by analyzing sales data, product reviews, and competitor strategies. This can reveal insights into market demands and areas for strategic advantage.
  • Opportunity Discovery: Use aggregated product search results to uncover popular yet underserved keywords and niches in the market. This can highlight potential areas for new product development or market entry.
  • Product Review Analysis: Employ AI, such as ChatGPT, to deeply analyze product reviews. Summarize buyer feedback to gain insights into consumer satisfaction, preferences, and areas for product improvement.
  • Competitor Watch: Regularly scrape competitor data to monitor for critical changes like products going out of stock, pricing adjustments, or new product launches. This enables real-time alerts and swift response strategies.

How to scrape Lazada?

Getting started

Get started with Lazada Scraper by following these simple steps:

  1. Prepare Requests: Formulate the request URLs you wish to scrape. Refer to the API reference section for guidance. Example URLs include:
  • Search for popular products by keyword: https://www.lazada.com.my/catalog/?q=raspberry%20pi%205
  • Retrieve top categories: https://acs-m.lazada.com.my/h5/mtop.lazada.guided.shopping.categories.categorieslpcommon/1.0/
  • List products by category: https://www.lazada.com.my/shop-computers-laptops/
  • List all sellers: https://www.lazada.com.my/sitemap-sellers.xml
  • Access seller details (including their hot keywords): https://www.lazada.com.my/shop/citemalaysia/
  • List products by seller: https://www.lazada.com.my/nike/?q=All-Products&from=wangpu&langFlag=en&pageTypeId=2
  • Access product detail: https://www.lazada.com.my/products/120ml-skintific-all-day-light-sunscreen-mist-spf50-pa-sunblock-spray-anti-uv-face-body-spray-120ml-i3525761808-s22573296116.html
  • Access product reviews: https://my.lazada.com.my/pdp/review/getReviewList?itemId=2932861112
  2. Execution: Submit your scraping tasks and wait for the results. Follow the execution via the Grafana dashboard.

Example input:

{
  // Search products with the "raspberry pi 5" keyword
  "requests": [
    { "url": "https://www.lazada.com.my/catalog/?q=raspberry%20pi%205" }
  ]
}
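As a minimal sketch, the input above could also be generated from a plain keyword instead of being written by hand. The helper names (keyword_search_url, build_input) and the tld parameter are hypothetical and used only for illustration; the URL pattern and the "requests" structure come from the examples in this document.

```python
import json
from urllib.parse import quote


def keyword_search_url(keyword, tld="com.my"):
    """Build a keyword search URL following the pattern shown above."""
    # quote() percent-encodes spaces as %20, matching the example URL.
    return f"https://www.lazada.{tld}/catalog/?q={quote(keyword)}"


def build_input(urls):
    """Wrap URLs in the {"requests": [{"url": ...}]} structure."""
    return {"requests": [{"url": url} for url in urls]}


scraper_input = build_input([keyword_search_url("raspberry pi 5")])
print(json.dumps(scraper_input, indent=2))
```

Generating the input this way avoids manual encoding mistakes when keywords contain spaces or other reserved characters.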

Example JSON output from a scraping session:

{
  "url": "https://www.lazada.com.my/catalog/?q=raspberry%20pi%205",
  "enrichedUrl": "https://www.lazada.com.my/catalog/?q=raspberry%20pi%205",
  "referrer": "https://lazada.com.my/",
  "knownApi": "KEYWORD_PRODUCTS",
  "responseStatus": 200,
  "responseBody": {
    "productTotal": 1309,
    "products": [
      {
        "itemId": 4084035635,
        "pageUrl": "https://www.lazada.com.my/products/raspberry-pi-5-bundle-8gbram32gb-noobsredwhite-case-with-fan-i4084035635.html",
        "name": "Raspberry Pi 5 Bundle (8GBRAM/32GB NOOBS/Red/White Case with Fan)",
        "imageUrl": "https://sg-test-11.slatic.net/p/c545e50e4305a33c6291a7021a1cc60a.png",
        "categoryId": 10000493,
        "location": "Wp Kuala Lumpur",
        "originalPrice": 699,
        "price": 618.9,
        "isInStock": true,
        "isAd": false,
        "isSponsored": false,
        "isLazMall": false,
        "isLazGlobal": false,
        "hasFreeShipping": false,
        "reviewCount": 0,
        "brandId": 32076,
        "brandName": "Raspberry Pi",
        "defaultSkuId": 23134383025,
        "supplierId": 3335,
        "shopId": 19295
      }
      // ... 59 other products
    ]
  }
}

This output includes fields like the original URL (url) and the HTTP response (responseStatus and responseBody), among others.
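To illustrate how such a record might be consumed, here is a small sketch that reads a trimmed copy of the example output and computes the discount of each in-stock product. It assumes the cleaned response format (cleanResponseBody set to true); the function name discounted_products is hypothetical, and the inline comment from the example is omitted because strict JSON parsers do not accept comments.

```python
import json

# A trimmed copy of the example record above (one product, fewer fields).
record = json.loads("""
{
  "url": "https://www.lazada.com.my/catalog/?q=raspberry%20pi%205",
  "knownApi": "KEYWORD_PRODUCTS",
  "responseStatus": 200,
  "responseBody": {
    "productTotal": 1309,
    "products": [
      {"itemId": 4084035635, "originalPrice": 699, "price": 618.9, "isInStock": true}
    ]
  }
}
""")


def discounted_products(record):
    """Return (itemId, discount in percent) for in-stock products of a successful response."""
    if record.get("responseStatus") != 200:
        return []
    rows = []
    for product in record["responseBody"].get("products", []):
        original, price = product.get("originalPrice"), product.get("price")
        # Skip products that are out of stock or lack usable price data.
        if not product.get("isInStock") or not original or price is None:
            continue
        rows.append((product["itemId"], round(100 * (original - price) / original, 1)))
    return rows


discounts = discounted_products(record)
print(discounts)
```

Checking responseStatus first keeps failed requests from being mistaken for empty result pages.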

Warning: The responseBody above is a cleaned version of the real Lazada response. If you want the raw response, set cleanResponseBody to false. Check the input parameters for more details.

Input parameters

The Lazada Scraper offers a range of input parameters, allowing you to tailor your scraping process according to your needs. These parameters can be broadly categorized into common parameters applicable to all APIs and specific parameters unique to certain APIs (detailed in the API reference section).

Key common parameters include:

  1. requests: This is where you specify the list of URLs you intend to scrape.
  2. maxRequestsPerCrawl (optional): Set the maximum number of requests the crawler will execute. The process stops once this limit is reached.
  3. cleanResponseBody (default = true): Indicates whether the response should be cleaned (true) or returned as received (false).
  4. emulateMobileDevice (default = false): Simulates a mobile browser when set to true. Some APIs may return different results in this mode.
  5. language (optional, default = "en" or "id"): Language set in the "hng" cookie and the "x-i18n-language" header. Acceptable values: en (English), id (Indonesian), ms (Malay), th (Thai), vi (Vietnamese).
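The common parameters above could be assembled as in the following sketch. It assumes that these parameters sit at the top level of the input object, next to "requests"; the helper name build_scraper_input and its argument names are hypothetical.

```python
ALLOWED_LANGUAGES = {"en", "id", "ms", "th", "vi"}


def build_scraper_input(urls, max_requests=None, clean=True, mobile=False, language=None):
    """Assemble an input object from the common parameters listed above."""
    if language is not None and language not in ALLOWED_LANGUAGES:
        raise ValueError(f"unsupported language: {language!r}")
    scraper_input = {
        "requests": [{"url": url} for url in urls],
        "cleanResponseBody": clean,
        "emulateMobileDevice": mobile,
    }
    # Optional parameters are omitted when unset so the scraper's defaults apply.
    if max_requests is not None:
        scraper_input["maxRequestsPerCrawl"] = max_requests
    if language is not None:
        scraper_input["language"] = language
    return scraper_input


example = build_scraper_input(
    ["https://www.lazada.com.my/shop-computers-laptops/"],
    max_requests=10,
    language="ms",
)
print(example)
```

Validating the language code before submitting a task catches typos early, instead of discovering them in the scraped results.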

For any queries or assistance, feel free to reach out.

Output fields

The data returned from the Lazada Scraper includes several important fields. Below is a description of each:

API reference

Product Search by Keyword, Category, or Seller

  • URL Path:
    • https://www.lazada.${tld}/catalog/?q=${keyword}
    • https://www.lazada.${tld}/${categorySlug}/
    • https://www.lazada.${tld}/${sellerSlug}/?q=All-Products&from=wangpu&langFlag=en&pageTypeId=2
Warning: Avoid specifying the page parameter in the URL. Lazada's pagination requires sequential navigation from the first page, which is inefficient and can consume significant proxy traffic. For multi-page crawling, consider using the productListing_crawlNextPages parameter.

  • URL Parameters:

    • keyword: Search keyword, e.g., "tshirt".
    • categorySlug: Category slug, e.g. "shop-computers-laptops".
    • sellerSlug: Seller slug, e.g. "nike".
  • Scraper Input Parameters

    • cleanResponseBody (default = true): Indicates whether the response should be cleaned (true) or returned as received (false).
    • emulateMobileDevice (default = false): Simulates a mobile browser when set to true. This is particularly useful for this API, as it only provides consistent ordering in mobile mode.
    • productListing_crawlNextPages (default = false): Enables automatic crawling of subsequent pages.
    • productListing_crawlNextPages_maxPages (optional): Sets a cap on the number of pages to crawl.
    • productListing_crawlNextPages_maxUniqueProducts (optional): Limits the number of unique products to scrape. Products that appear multiple times are counted once.
    • productListing_crawlNextPages_stopWhenNewPageOnlyContainsDuplicates (default = true): Prevents infinite loops in pagination bugs by stopping when a page only contains previously listed products.
Info: If productListing_crawlNextPages is set to true, all pages are scraped sequentially within a single scraping task. This is because Lazada requires products to be listed page by page within the same web browser session, starting from page 1. Direct access to a non-consecutive page (e.g., jumping straight to page 5) causes Lazada to display the results for page 1.

In practice, this means that a single scraping task may take more than 10 minutes to process, as it may involve more than 100 HTTP requests (note that a random delay is added between requests to reduce bot detection).

Sequential crawling makes the scraping task more susceptible to being blocked, because successful completion depends on many uninterrupted requests. Captchas can occasionally interrupt the session, but they can be resolved; however, if the session is disrupted without a captcha being presented, the process needs to restart from the first page.

  • Example Input:
{
  "requests": [
    // Search products with the "raspberry pi 5" keyword
    { "url": "https://www.lazada.com.my/catalog/?q=raspberry%20pi%205" },

    // Search products in the "Computers / Laptops" category
    { "url": "https://www.lazada.com.my/shop-computers-laptops/" },

    // Search products from Nike store
    { "url": "https://www.lazada.com.my/nike/?q=All-Products&from=wangpu&langFlag=en&pageTypeId=2" }
  ]
}
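The stopping rules behind the productListing_crawlNextPages_* parameters can be illustrated with a small simulation. This is not the scraper's implementation: fetch_page is a hypothetical callback standing in for one page request, the argument names only mirror the parameters described above, and the fake pages are made up (page 3 repeats page 1, as Lazada does when pagination breaks).

```python
def crawl_pages(fetch_page, max_pages=None, max_unique_products=None,
                stop_when_only_duplicates=True):
    """Collect unique products page by page, always starting from page 1."""
    seen_ids = set()
    products = []
    page = 1
    while max_pages is None or page <= max_pages:
        items = fetch_page(page)
        if not items:
            break  # no more results
        new_items = [item for item in items if item["itemId"] not in seen_ids]
        if stop_when_only_duplicates and not new_items:
            break  # the page only repeated products that were already listed
        for item in new_items:
            if item["itemId"] in seen_ids:
                continue  # duplicate within the same page
            seen_ids.add(item["itemId"])
            products.append(item)
            if max_unique_products is not None and len(products) >= max_unique_products:
                return products
        page += 1
    return products


# Made-up pages keyed by page number.
FAKE_PAGES = {1: [1, 2, 3], 2: [3, 4], 3: [1, 2, 3], 4: [5]}


def fetch_page(page):
    return [{"itemId": item_id} for item_id in FAKE_PAGES.get(page, [])]


print([p["itemId"] for p in crawl_pages(fetch_page)])
print([p["itemId"] for p in crawl_pages(fetch_page, max_unique_products=2)])
```

In this simulation, disabling the duplicate check lets the crawl continue past the repeated page, which is only safe when a page limit or an empty page eventually ends the loop.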

Product Details

  • URL Path:

    • https://www.lazada.${tld}/products/${productSlug}-i${productId}-s${sellerId}.html
  • URL Parameters:

    • productSlug: Product slug, e.g., "120ml-skintific-all-day-light-sunscreen-mist-spf50-pa-sunblock-spray-anti-uv-face-body-spray-120ml".
    • productId: Product ID (a.k.a. itemId), e.g. 3525761808.
    • sellerId: Seller ID, e.g. 22573296116.
  • Scraper Input Parameters

    • cleanResponseBody (default = true): Indicates whether the response should be cleaned (true) or returned as received (false).
    • emulateMobileDevice (default = false): Simulates a mobile browser when set to true. Some APIs may return different results in this mode.
  • Example Input:

{
  "requests": [
    { "url": "https://www.lazada.com.my/products/120ml-skintific-all-day-light-sunscreen-mist-spf50-pa-sunblock-spray-anti-uv-face-body-spray-120ml-i3525761808-s22573296116.html" }
  ]
}
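Going the other way, the URL parameters can be recovered from an existing product URL. The sketch below assumes the URL pattern described above and treats the "-s" segment as optional, since some product URLs in this document omit it. The function names are hypothetical, and the review URL follows the pattern given in the Product Reviews section of this document.

```python
import re

# Matches "...-i<productId>.html" with an optional "-s<sellerId>" segment.
PRODUCT_URL_RE = re.compile(
    r"^https://www\.lazada\.(?P<tld>[a-z.]+)/products/"
    r"(?P<slug>.+)-i(?P<product_id>\d+)(?:-s(?P<seller_id>\d+))?\.html$"
)


def parse_product_url(url):
    """Split a product URL into tld, productSlug, productId, and sellerId."""
    match = PRODUCT_URL_RE.match(url)
    if match is None:
        raise ValueError(f"not a recognized product URL: {url}")
    seller_id = match.group("seller_id")
    return {
        "tld": match.group("tld"),
        "productSlug": match.group("slug"),
        "productId": int(match.group("product_id")),
        "sellerId": int(seller_id) if seller_id else None,
    }


def review_url(tld, product_id):
    """Build the review listing URL for a product ID."""
    return f"https://my.lazada.{tld}/pdp/review/getReviewList?itemId={product_id}"


parsed = parse_product_url(
    "https://www.lazada.com.my/products/120ml-skintific-all-day-light-sunscreen-mist-"
    "spf50-pa-sunblock-spray-anti-uv-face-body-spray-120ml-i3525761808-s22573296116.html"
)
print(parsed["productId"], parsed["sellerId"])
print(review_url(parsed["tld"], parsed["productId"]))
```

This makes it possible to chain requests, for example by turning product URLs from a search result into review requests.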

Product Reviews

  • URL Path:

    • https://my.lazada.${tld}/pdp/review/getReviewList?itemId=${productId}
  • URL Parameters:

    • productId: Product ID (a.k.a. itemId), e.g. 3525761808.
  • Scraper Input Parameters

    • cleanResponseBody (default = true): Indicates whether the response should be cleaned (true) or returned as received (false).
    • emulateMobileDevice (default = false): Simulates a mobile browser when set to true. Some APIs may return different results in this mode.
  • Example Input:

{
  "requests": [
    { "url": "https://my.lazada.com.my/pdp/review/getReviewList?itemId=2932861112" }
  ]
}

Category Tree

  • URL Path:

    • https://acs-m.lazada.${tld}/h5/mtop.lazada.guided.shopping.categories.categorieslpcommon/1.0/
  • Scraper Input Parameters

    • emulateMobileDevice (default = false): Simulates a mobile browser when set to true. Some APIs may return different results in this mode.
  • Example Input:

{
  "requests": [
    { "url": "https://acs-m.lazada.com.my/h5/mtop.lazada.guided.shopping.categories.categorieslpcommon/1.0/" }
  ]
}

Seller Listing

  • URL Path:

    • https://www.lazada.${tld}/sitemap-sellers.xml?limit=${limit}&offset=${offset}
  • URL Parameters:

    • limit: Number of sub-sitemaps to load (preferably 30 or fewer).
    • offset: Offset for results, typically a multiple of limit.
  • Example Input:

{
  "requests": [
    { "url": "https://www.lazada.com.my/sitemap-sellers.xml" }
  ]
}
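To walk through the seller sitemap with the limit and offset parameters, the request URLs could be generated as in this sketch. The total number of sub-sitemaps is a hypothetical value that you would need to determine yourself, the function name is illustrative, and the check on limit simply enforces the recommendation above as a guard.

```python
def seller_sitemap_urls(tld, total_sub_sitemaps, limit=30):
    """Generate sitemap URLs whose offsets are multiples of limit."""
    if not 1 <= limit <= 30:
        raise ValueError("limit should be between 1 and 30")
    base = f"https://www.lazada.{tld}/sitemap-sellers.xml"
    return [
        f"{base}?limit={limit}&offset={offset}"
        for offset in range(0, total_sub_sitemaps, limit)
    ]


# 70 is a made-up total, used only to show the resulting offsets.
urls = seller_sitemap_urls("com.my", total_sub_sitemaps=70)
scraper_input = {"requests": [{"url": url} for url in urls]}
for url in urls:
    print(url)
```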

Seller Details

  • URL Path:

    • https://www.lazada.${tld}/shop/${sellerSlug}/
  • URL Parameters:

    • sellerSlug: Seller slug, e.g., "citemalaysia".
  • Scraper Input Parameters

    • cleanResponseBody (default = true): Indicates whether the response should be cleaned (true) or returned as received (false).
    • emulateMobileDevice (default = false): Simulates a mobile browser when set to true. Some APIs may return different results in this mode.
  • Example Input:

{
  "requests": [
    { "url": "https://www.lazada.com.my/shop/citemalaysia/" }
  ]
}

Seller Promoted Products

  • URL Path:

    • https://www.lazada.${tld}/shop/site/api/shop/campaignTppProducts/query?shopId=${shopId}&sellerId=${sellerId}&itemId=${productId}
  • URL Parameters:

    • shopId: Shop ID, can be obtained from the Seller Details.
    • sellerId: Seller ID, can be obtained from the Seller Details.
    • productId: Any product ID from the seller.
  • Scraper Input Parameters

    • emulateMobileDevice (default = false): Simulates a mobile browser when set to true. Some APIs may return different results in this mode.
  • Example Input:

{
  "requests": [
    { "url": "https://www.lazada.co.id/shop/site/api/shop/campaignTppProducts/query?shopId=3258813&sellerId=400611231032&itemId=7991896339" }
  ]
}

Keyword Listing

  • URL Path:

    • For ID, VN: https://www.lazada.${tld}/tag-order-last-30days-morethan0.xml?limit=${limit}&offset=${offset}
    • For PH, TH: https://www.lazada.${tld}/tag-order-last-60days-morethan0.xml?limit=${limit}&offset=${offset}
    • For MY, SG: https://www.lazada.${tld}/tag-order-last-90days-morethan0.xml?limit=${limit}&offset=${offset}
  • URL Parameters:

    • limit: Number of sub-sitemaps to load (preferably 30 or fewer).
    • offset: Offset for results, typically a multiple of limit.
  • Example Input:

{
  "requests": [
    { "url": "https://www.lazada.com.my/tag-order-last-90days-morethan0.xml?limit=30&offset=0" }
  ]
}
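Because the keyword sitemap name depends on the country, a small lookup can pick the right URL. In this sketch, the look-back windows come from the URL patterns above, while the country-to-TLD mapping is an assumption added for illustration (only com.my, sg, and co.id appear in this document). The function name is hypothetical.

```python
# Look-back window in days per country, taken from the URL patterns above.
WINDOW_DAYS = {"ID": 30, "VN": 30, "PH": 60, "TH": 60, "MY": 90, "SG": 90}

# Assumed country-to-TLD mapping; verify it before relying on it.
TLDS = {"ID": "co.id", "VN": "vn", "PH": "com.ph", "TH": "co.th", "MY": "com.my", "SG": "sg"}


def keyword_sitemap_url(country, limit=30, offset=0):
    """Build the keyword listing URL for a two-letter country code."""
    code = country.upper()
    if code not in WINDOW_DAYS:
        raise ValueError(f"unsupported country: {country!r}")
    return (
        f"https://www.lazada.{TLDS[code]}/tag-order-last-{WINDOW_DAYS[code]}days-morethan0.xml"
        f"?limit={limit}&offset={offset}"
    )


print(keyword_sitemap_url("MY"))
print(keyword_sitemap_url("ID", offset=30))
```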

It is legal to scrape publicly available data such as product descriptions, prices, or ratings. Read Apify's blog post on the legality of web scraping to learn more.

Your feedback

Please don't hesitate to share your feedback (improvement ideas, bug reports, etc.). You can reach me on Discord (username "marcplouhinec").

Thanks

Lazada Scraper is built with Crawlee, a great JavaScript framework to accelerate scraper development.

It also uses ungoogled-chromium via the Chrome DevTools Protocol. These powerful technologies have been instrumental in "opening" the website despite anti-bot protections.

Finally, a lot of knowledge that enabled the development of Lazada Scraper comes from The Web Scraping Club and its Discord Community.