# Tokopaedi - Python Library for Tokopedia E-Commerce Data Extraction
**Extract product data, reviews, and search results from Tokopedia with ease.**
Tokopaedi is a Python library for scraping e-commerce data from Tokopedia, including product searches, detailed product information, and customer reviews. It is aimed at developers, data analysts, and businesses that want to analyze Tokopedia's marketplace.
---
## Key Features
- **Product Search**: Search Tokopedia products by keyword with customizable filters (price, rating, condition, etc.).
- **Detailed Product Data**: Retrieve rich product details, including variants, pricing, stock, and media.
- **Customer Reviews**: Scrape product reviews with ratings, timestamps, and more.
- **Serializable Results**: Dataclass-based results with `.json()` for easy export to JSON or pandas DataFrames.
- **SearchResults Container**: Iterable and serializable container for streamlined data handling.
---
## Installation
Install Tokopaedi via pip:
```bash
pip install tokopaedi
```
## Quick Start
```python
from tokopaedi import search, SearchFilters, get_product, get_reviews, combine_data
import json

# Restrict the search by price range (IDR), minimum rating, and extra free shipping
filters = SearchFilters(
    bebas_ongkir_extra=True,
    pmin=15000000,
    pmax=30000000,
    rt=4.5
)

# Pass the filters to search(); without this argument they are never applied
results = search("Zenbook 14 32GB", max_result=100, filters=filters, debug=False)

# Attach product details and reviews to each search result
for result in results:
    combine_data(
        result,
        get_product(product_id=result.product_id, debug=True),
        get_reviews(product_id=result.product_id, max_count=20, debug=True)
    )

# Serialize once, then save to disk and print
output = json.dumps(results.json(), indent=4)
with open('log.json', 'w') as f:
    f.write(output)
print(output)
```
## 📘 API Overview
### 🔍 `search(keyword: str, max_result: int = 100, filters: Optional[SearchFilters] = None, debug: bool = False) -> SearchResults`
Search for products from Tokopedia.
**Parameters:**
- `keyword`: Search keyword (e.g., `"logitech mouse"`).
- `max_result`: Target number of results to return (default: 100).
- `filters`: Optional `SearchFilters` instance to narrow search results.
- `debug`: Show debug messages if `True`.
**Returns:**
- A `SearchResults` instance (list-like object of `ProductSearchResult`), supporting `.json()` for easy export.
----------
### 📦 `get_product(product_id: Optional[Union[int, str]] = None, url: Optional[str] = None, debug: bool = False) -> ProductData`
Fetch detailed information for a given Tokopedia product.
**Parameters:**
- `product_id`: (Optional) The product ID returned from `search()`. If provided, this will take precedence over `url`.
- `url`: (Optional) The full product URL. Used only if `product_id` is not provided.
- `debug`: If `True`, prints debug output for troubleshooting.
> ⚠️ Either `product_id` or `url` must be provided. If both are given, `product_id` is used and `url` is ignored.
**Returns:**
- A `ProductData` instance containing detailed information such as product name, pricing, variants, media, stock, rating, etc.
- Supports `.json()` for easy serialization (e.g., to use with `pandas` or export as `.json`).
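
The precedence rule described above can be sketched in a few lines. This is only an illustration of the documented behaviour, not the library's internal code: `resolve_target` is a hypothetical helper, and the URL is a placeholder.

```python
from typing import Optional, Tuple, Union


def resolve_target(
    product_id: Optional[Union[int, str]] = None,
    url: Optional[str] = None,
) -> Tuple[str, str]:
    """Hypothetical helper: pick the lookup key following the documented rule."""
    # product_id wins whenever it is given, even if url is also set.
    if product_id is not None:
        return ("product_id", str(product_id))
    if url is not None:
        return ("url", url)
    # Neither argument supplied: the documented contract requires one of them.
    raise ValueError("Either product_id or url must be provided")


# Both given: the product ID is used and the URL is ignored.
print(resolve_target(product_id=123, url="https://example.com/p"))
# Only the URL given: the URL is used.
print(resolve_target(url="https://example.com/p"))
```

How the library reacts when neither argument is passed is not stated in this README; raising `ValueError` here is an assumption made for the sketch.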
----------
### 🗣️ `get_reviews(product_id: Optional[Union[int, str]] = None, url: Optional[str] = None, max_count: int = 20, debug: bool = False) -> List[ProductReview]`
Scrape customer reviews for a given product.
**Parameters:**
- `product_id`: (Optional) The product ID to fetch reviews for. Takes precedence over `url` if both are provided.
- `url`: (Optional) Full product URL. Used only if `product_id` is not provided.
- `max_count`: Maximum number of reviews to fetch (default: 20).
- `debug`: Show debug messages if `True`.
> ⚠️ Either `product_id` or `url` must be provided.
**Returns:**
- A list of `ProductReview` objects.
- Each object supports `.json()` for serialization (e.g., for use with `pandas` or JSON export).
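
Because each review exposes `.json()`, a list of reviews can be turned into plain records for JSON export or a DataFrame. The sketch below runs without network access by using `StandInReview`, a hypothetical stand-in class whose field names are invented for illustration; it also assumes that `.json()` returns a JSON-serializable dict, which this README does not spell out for `ProductReview`.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict, Iterable, List


@dataclass
class StandInReview:
    """Hypothetical stand-in for ProductReview; field names are illustrative only."""
    rating: int
    message: str

    def json(self) -> Dict[str, Any]:
        # Assumption: the real objects return a dict-like structure as well.
        return asdict(self)


def reviews_to_records(reviews: Iterable[Any]) -> List[Dict[str, Any]]:
    """Call .json() on every review and collect the results in a list."""
    return [review.json() for review in reviews]


# Made-up sample data, used only so the example is runnable offline.
sample = [StandInReview(5, "Fast delivery"), StandInReview(4, "Works as described")]
records = reviews_to_records(sample)
print(json.dumps(records, indent=2))
```

With the library installed, the list returned by `get_reviews()` would take the place of `sample`.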
----------
### 🔗 `combine_data(search_results, products=None, reviews=None) -> SearchResults`
Attach product detail and/or reviews to the search results.
**Parameters:**
- `search_results`: The `SearchResults` from `search()`.
- `products`: List of `ProductData` from `get_product()` (optional).
- `reviews`: List of `ProductReview` from `get_reviews()` (optional).
**Returns:**
- A new `SearchResults` object with `.product_detail` and `.product_reviews` fields filled in (if data was provided).
----------
## `SearchFilters` – Optional Search Filters
Use `SearchFilters` to refine your search results. All fields are optional. Pass it into the `search()` function via the `filters` argument.
#### Example:
```python
from tokopaedi import SearchFilters, search
filters = SearchFilters(
pmin=100000,
pmax=1000000,
condition=1, # 1 = New
is_discount=True,
bebas_ongkir_extra=True,
rt=4.5, # Minimum rating 4.5
latest_product=30 # Products listed in the last 30 days
)
results = search("logitech mouse", filters=filters)
```
#### Available Fields:
| Field | Type | Description | Accepted Values |
|----------------------|----------|---------------------------------------------------|----------------------------------|
| `pmin` | `int` | Minimum price (in IDR) | e.g., `100000` |
| `pmax` | `int` | Maximum price (in IDR) | e.g., `1000000` |
| `condition` | `int` | Product condition | `1` = New, `2` = Used |
| `shop_tier` | `int` | Type of shop | `2` = Mall, `3` = Power Shop |
| `rt` | `float` | Minimum rating | e.g., `4.5` |
| `latest_product` | `int` | Product recency filter | `7`, `30`, `90` |
| `bebas_ongkir_extra` | `bool` | Filter for extra free shipping | `True` / `False` |
| `is_discount` | `bool` | Only show discounted products | `True` / `False` |
| `is_fulfillment` | `bool` | Only Fulfilled by Tokopedia | `True` / `False` |
| `is_plus` | `bool` | Only Tokopedia PLUS sellers | `True` / `False` |
| `cod` | `bool` | Cash on delivery available | `True` / `False` |
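
Some fields in the table accept only a few fixed values, so it can help to check them before building a `SearchFilters` object. The following is a minimal sketch under that assumption: `check_filters` and `ALLOWED_VALUES` are hypothetical names, not part of Tokopaedi, and the `pmin`/`pmax` ordering check is an extra sanity check that the table does not mention.

```python
from typing import Any, Dict, List

# Accepted values copied from the table above.
ALLOWED_VALUES = {
    "condition": {1, 2},            # 1 = New, 2 = Used
    "shop_tier": {2, 3},            # 2 = Mall, 3 = Power Shop
    "latest_product": {7, 30, 90},  # recency filter values
}


def check_filters(values: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []
    for field, allowed in ALLOWED_VALUES.items():
        if field in values and values[field] not in allowed:
            problems.append(f"{field}={values[field]!r} is not one of {sorted(allowed)}")
    # Extra check (an assumption, not from the table): the price range should not be inverted.
    pmin, pmax = values.get("pmin"), values.get("pmax")
    if pmin is not None and pmax is not None and pmin > pmax:
        problems.append(f"pmin ({pmin}) is greater than pmax ({pmax})")
    return problems


candidate = {"pmin": 100000, "pmax": 1000000, "condition": 1, "latest_product": 30}
problems = check_filters(candidate)
if not problems:
    # With the library installed, the checked values could be passed on:
    # filters = SearchFilters(**candidate)
    print("filters look consistent with the table")
else:
    print("\n".join(problems))
```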
---
## Example: Enrich with product details & reviews, then convert to a pandas DataFrame in a Jupyter Notebook
```python
from tokopaedi import search, SearchFilters, get_product, get_reviews, combine_data
from pandas import json_normalize

filters = SearchFilters(
    bebas_ongkir_extra=True,
    pmax=100000,
    rt=4.5
)

# Fetch search results with the filters applied
results = search("logitech g304", max_result=10, filters=filters, debug=False)

# Enrich each result with product details and reviews
for result in results:
    combine_data(
        result,
        get_product(product_id=result.product_id, debug=False),
        get_reviews(product_id=result.product_id, max_count=1, debug=False)
    )

# Convert to DataFrame and preview important fields
df = json_normalize(results.json())
print(df[[
    "product_id",
    "category",
    "real_price",
    "original_price",
    "product_detail.product_name",
    "rating",
    "shop.name"
]].head())
```
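
Column names such as `shop.name` and `product_detail.product_name` in the example above come from the way `json_normalize` flattens nested dictionaries, joining keys with `.` by default. The sketch below imitates that flattening in plain Python so the naming is easy to see. It is a simplified stand-in, not pandas' implementation, and the sample record is made up for illustration.

```python
from typing import Any, Dict


def flatten(record: Dict[str, Any], prefix: str = "", sep: str = ".") -> Dict[str, Any]:
    """Flatten nested dicts into one level, joining keys with `sep`."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested dicts, carrying the dotted prefix along.
            flat.update(flatten(value, prefix=name, sep=sep))
        else:
            # Scalars and lists are kept as-is under the dotted name.
            flat[name] = value
    return flat


# Hypothetical record shaped like one enriched search result.
sample = {
    "product_id": 1,
    "shop": {"name": "Example Shop"},
    "product_detail": {"product_name": "Example Mouse"},
}
print(sorted(flatten(sample)))
```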
## Author
Tokopaedi was created by **Hilmi Azizi**. For inquiries, feedback, or collaboration, contact me at [root@hilmiazizi.com](mailto:root@hilmiazizi.com). You can also reach out via [GitHub Issues](https://github.com/hilmiazizi/tokopaedi/issues) for bug reports or feature suggestions.
## 📄 License
This project is licensed under the MIT License.
You are free to use, modify, and distribute this project with attribution. See the [LICENSE](https://github.com/hilmiazizi/tokopaedi/blob/main/LICENSE) file for more details.
Raw data
{
"_id": null,
"home_page": "https://github.com/hilmiazizi/tokopaedi",
"name": "tokopaedi",
"maintainer": null,
"docs_url": null,
"requires_python": "<4.0,>=3.8",
"maintainer_email": null,
"keywords": "tokopedia, scraper, ecommerce, product, data, web-scraping, python, data-analysis, tokopedia-reviews",
"author": "Hilmi Azizi",
"author_email": "root@hilmiazizi.com",
"download_url": "https://files.pythonhosted.org/packages/08/de/8cb423f0134f28b8f91e337a44844d73a481418ee9e3bf8f70aeca571007/tokopaedi-0.1.2.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "A Python scraper for Tokopedia that supports filtered product search, detailed product information, and customer reviews with accurate mobile pricing and Jupyter Notebook compatibility.",
"version": "0.1.2",
"project_urls": {
"Documentation": "https://github.com/hilmiazizi/tokopaedi",
"Homepage": "https://github.com/hilmiazizi/tokopaedi",
"Repository": "https://github.com/hilmiazizi/tokopaedi"
},
"split_keywords": [
"tokopedia",
" scraper",
" ecommerce",
" product",
" data",
" web-scraping",
" python",
" data-analysis",
" tokopedia-reviews"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "102a0752b30af638cd99a6721a3aa8c0db3001c396350031d21190ddef5efb74",
"md5": "13c79e70bab29a4aa2ef10fd7deefdf0",
"sha256": "2c650397ccce4f6d96ce5bc1be30af5ba448720bf2497e70f7139738fbdd2579"
},
"downloads": -1,
"filename": "tokopaedi-0.1.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "13c79e70bab29a4aa2ef10fd7deefdf0",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<4.0,>=3.8",
"size": 18723,
"upload_time": "2025-07-17T02:13:56",
"upload_time_iso_8601": "2025-07-17T02:13:56.013188Z",
"url": "https://files.pythonhosted.org/packages/10/2a/0752b30af638cd99a6721a3aa8c0db3001c396350031d21190ddef5efb74/tokopaedi-0.1.2-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "08de8cb423f0134f28b8f91e337a44844d73a481418ee9e3bf8f70aeca571007",
"md5": "47439ca83ec7db670d6a4a1da01e3f11",
"sha256": "8707ebc145cbe2bf880f9e69bc1b28273c97124ee5d83a48efba5911b675f8ba"
},
"downloads": -1,
"filename": "tokopaedi-0.1.2.tar.gz",
"has_sig": false,
"md5_digest": "47439ca83ec7db670d6a4a1da01e3f11",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<4.0,>=3.8",
"size": 18290,
"upload_time": "2025-07-17T02:13:57",
"upload_time_iso_8601": "2025-07-17T02:13:57.666792Z",
"url": "https://files.pythonhosted.org/packages/08/de/8cb423f0134f28b8f91e337a44844d73a481418ee9e3bf8f70aeca571007/tokopaedi-0.1.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-07-17 02:13:57",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "hilmiazizi",
"github_project": "tokopaedi",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "tokopaedi"
}