tokopaedi


| Field | Value |
|-------|-------|
| Name | tokopaedi |
| Version | 0.1.2 |
| Home page | https://github.com/hilmiazizi/tokopaedi |
| Summary | A Python scraper for Tokopedia that supports filtered product search, detailed product information, and customer reviews with accurate mobile pricing and Jupyter Notebook compatibility. |
| Upload time | 2025-07-17 02:13:57 |
| Maintainer | None |
| Docs URL | None |
| Author | Hilmi Azizi |
| Requires Python | <4.0,>=3.8 |
| License | MIT |
| Keywords | tokopedia, scraper, ecommerce, product, data, web-scraping, python, data-analysis, tokopedia-reviews |
| Requirements | No requirements were recorded. |

# Tokopaedi - Python Library for Tokopedia E-Commerce Data Extraction

**Extract product data, reviews, and search results from Tokopedia with ease.**

Tokopaedi is a powerful Python library designed for scraping e-commerce data from Tokopedia, including product searches, detailed product information, and customer reviews. Ideal for developers, data analysts, and businesses looking to analyze Tokopedia's marketplace.

![PyPI](https://img.shields.io/pypi/v/tokopaedi) [![PyPI Downloads](https://static.pepy.tech/badge/tokopaedi)](https://pepy.tech/projects/tokopaedi) ![GitHub Repo stars](https://img.shields.io/github/stars/hilmiazizi/tokopaedi?style=social) [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://github.com/hilmiazizi/tokopaedi/blob/main/LICENSE) ![GitHub forks](https://img.shields.io/github/forks/hilmiazizi/tokopaedi?style=social)

---

## Key Features

- **Product Search**: Search Tokopedia products by keyword with customizable filters (price, rating, condition, etc.).
- **Detailed Product Data**: Retrieve rich product details, including variants, pricing, stock, and media.
- **Customer Reviews**: Scrape product reviews with ratings, timestamps, and more.
- **Serializable Results**: Dataclass-based results with `.json()` for easy export to JSON or pandas DataFrames.
- **SearchResults Container**: Iterable and serializable container for streamlined data handling.
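
The dataclass-plus-`.json()` pattern mentioned above can be illustrated with a minimal sketch. The classes `ShopSketch` and `ProductSketch` and their field names are hypothetical stand-ins invented for this example; they are not Tokopaedi's real types.

```python
from dataclasses import dataclass, asdict
import json


@dataclass
class ShopSketch:
    # Hypothetical stand-in; not a Tokopaedi class.
    name: str
    city: str


@dataclass
class ProductSketch:
    # Hypothetical stand-in; the field names are assumptions for illustration.
    product_id: int
    name: str
    price: int
    shop: ShopSketch

    def json(self) -> dict:
        # asdict() recursively converts nested dataclasses into plain dicts.
        return asdict(self)


item = ProductSketch(1, "Wireless mouse", 250000, ShopSketch("Example Shop", "Jakarta"))
print(json.dumps(item.json(), indent=2))
```

Because the result is a plain dict, it can be handed directly to `json.dumps` or to a DataFrame constructor.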

---

## Installation

Install Tokopaedi via pip:

```bash
pip install tokopaedi
```

## Quick Start

```python
from tokopaedi import search, SearchFilters, get_product, get_reviews, combine_data
import json

# Extra-free-shipping listings priced 15-30 million IDR, rated 4.5 or higher
filters = SearchFilters(
    bebas_ongkir_extra=True,
    pmin=15000000,
    pmax=30000000,
    rt=4.5
)

# Pass the filters to search(); otherwise they have no effect
results = search("Zenbook 14 32GB", max_result=100, filters=filters, debug=False)

# Enrich each search result with product details and reviews
for result in results:
    combine_data(
        result,
        get_product(product_id=result.product_id, debug=True),
        get_reviews(product_id=result.product_id, max_result=20, debug=True)
    )

# Save the combined data to a file and print it
output = json.dumps(results.json(), indent=4)
with open('log.json', 'w') as f:
    f.write(output)
print(output)
```


## 📘 API Overview

### 🔍 `search(keyword: str, max_result: int = 100, filters: Optional[SearchFilters] = None, debug: bool = False) -> SearchResults`

Search for products from Tokopedia.

**Parameters:**

-   `keyword`: string keyword (e.g., `"logitech mouse"`).
    
-   `max_result`: Maximum number of results to return.
    
-   `filters`: Optional `SearchFilters` instance to narrow search results.
    
-   `debug`: If `True`, prints debug messages.
    

**Returns:**

-   A `SearchResults` instance (list-like object of `ProductSearchResult`), supporting `.json()` for easy export.
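
To show what "list-like with `.json()`" can look like in practice, here is a minimal sketch. `ResultSketch` and `ResultsSketch` are hypothetical names made up for this example, and subclassing `list` is an assumption about one possible design, not a description of how `SearchResults` is actually implemented.

```python
from dataclasses import dataclass, asdict
from typing import List


@dataclass
class ResultSketch:
    # Hypothetical stand-in for a single search result.
    product_id: int
    name: str

    def json(self) -> dict:
        return asdict(self)


class ResultsSketch(list):
    """Hypothetical list-like container that can export all items at once."""

    def json(self) -> List[dict]:
        # Serialize every contained item to a plain dict.
        return [item.json() for item in self]


results = ResultsSketch([ResultSketch(1, "Mouse A"), ResultSketch(2, "Mouse B")])
ids = [r.product_id for r in results]   # iteration works as with any list
exported = results.json()               # list of plain dicts
print(ids, exported)
```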
    

----------

### 📦 `get_product(product_id: Optional[Union[int, str]] = None, url: Optional[str] = None, debug: bool = False) -> ProductData`

Fetch detailed information for a given Tokopedia product.

**Parameters:**

- `product_id`: (Optional) The product ID returned from `search()`. If provided, this will take precedence over `url`.
- `url`: (Optional) The full product URL. Used only if `product_id` is not provided.
- `debug`: If `True`, prints debug output for troubleshooting.

> ⚠️ Either `product_id` or `url` must be provided. If both are given, `product_id` is used and `url` is ignored.

**Returns:**

- A `ProductData` instance containing detailed information such as product name, pricing, variants, media, stock, rating, etc.
- Supports `.json()` for easy serialization (e.g., to use with `pandas` or export as `.json`).
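
The precedence rule described above (`product_id` wins over `url`, and at least one is required) can be sketched as a small helper. `resolve_target` is a hypothetical function written for illustration; it is not part of Tokopaedi, and the error type is an assumption.

```python
from typing import Optional, Tuple, Union


def resolve_target(
    product_id: Optional[Union[int, str]] = None,
    url: Optional[str] = None,
) -> Tuple[str, str]:
    """Hypothetical helper: pick which identifier a lookup should use."""
    if product_id is not None:
        # product_id takes precedence; url is ignored when both are given.
        return ("product_id", str(product_id))
    if url:
        return ("url", url)
    # Assumed behaviour: reject calls that supply neither argument.
    raise ValueError("Either product_id or url must be provided")


print(resolve_target(product_id=123, url="https://example.com/some-product"))
print(resolve_target(url="https://example.com/some-product"))
```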

----------

### 🗣️ `get_reviews(product_id: Optional[Union[int, str]] = None, url: Optional[str] = None, max_count: int = 20, debug: bool = False) -> List[ProductReview]`

Scrape customer reviews for a given product.

**Parameters:**

- `product_id`: (Optional) The product ID to fetch reviews for. Takes precedence over `url` if both are provided.
- `url`: (Optional) Full product URL. Used only if `product_id` is not provided.
- `max_count`: Maximum number of reviews to fetch (default: 20).
- `debug`: Show debug messages if `True`.

> ⚠️ Either `product_id` or `url` must be provided.

**Returns:**

- A list of `ProductReview` objects.
- Each object supports `.json()` for serialization (e.g., for use with `pandas` or JSON export).
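
How a cap such as `max_count` is enforced internally is not documented here. One common approach, shown as a sketch under that assumption, is to fetch reviews page by page and stop once enough have been collected. `collect_reviews`, `fake_fetch_page`, and the review fields are all hypothetical.

```python
from typing import Callable, List


def collect_reviews(
    fetch_page: Callable[[int], List[dict]],
    max_count: int = 20,
) -> List[dict]:
    """Hypothetical sketch: gather paged reviews until max_count is reached."""
    collected: List[dict] = []
    page = 1
    while len(collected) < max_count:
        batch = fetch_page(page)
        if not batch:
            break  # no more pages available
        collected.extend(batch)
        page += 1
    # Trim any overshoot from the last page.
    return collected[:max_count]


def fake_fetch_page(page: int) -> List[dict]:
    # Stand-in data source: three pages of ten reviews each, then nothing.
    if page > 3:
        return []
    start = (page - 1) * 10
    return [{"review_id": start + i, "rating": 5} for i in range(10)]


reviews = collect_reviews(fake_fetch_page, max_count=25)
print(len(reviews))
```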

----------

### 🔗 `combine_data(search_results, products=None, reviews=None) -> SearchResults`

Attach product detail and/or reviews to the search results.

**Parameters:**

-   `search_results`: The `SearchResults` from `search()`.
    
-   `products`: List of `ProductData` from `get_product()` (optional).
    
-   `reviews`: List of `ProductReview` from `get_reviews()` (optional).
    

**Returns:**

-   A new `SearchResults` object with `.product_detail` and `.product_reviews` fields filled in (if data was provided).
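
The idea of filling in `.product_detail` and `.product_reviews` can be sketched as follows. `SearchItemSketch` and `attach` are hypothetical names for this example, and representing details and reviews as plain dicts is an assumption made to keep the sketch self-contained.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SearchItemSketch:
    # Hypothetical stand-in for one search result with optional enrichment.
    product_id: int
    name: str
    product_detail: Optional[dict] = None
    product_reviews: List[dict] = field(default_factory=list)


def attach(
    item: SearchItemSketch,
    detail: Optional[dict] = None,
    reviews: Optional[List[dict]] = None,
) -> SearchItemSketch:
    """Hypothetical helper: fill in only the pieces that were provided."""
    if detail is not None:
        item.product_detail = detail
    if reviews is not None:
        item.product_reviews = list(reviews)
    return item


item = attach(SearchItemSketch(1, "Mouse"), detail={"stock": 5})
print(item)
```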
    

----------
## `SearchFilters` – Optional Search Filters

Use `SearchFilters` to refine your search results. All fields are optional. Pass it into the `search()` function via the `filters` argument.

#### Example:
```python
from tokopaedi import SearchFilters, search

filters = SearchFilters(
    pmin=100000,
    pmax=1000000,
    condition=1,              # 1 = New
    is_discount=True,
    bebas_ongkir_extra=True,
    rt=4.5,                   # Minimum rating 4.5
    latest_product=30         # Products listed in the last 30 days
)

results = search("logitech mouse", filters=filters)
```

#### Available Fields:

| Field                 | Type     | Description                                       | Accepted Values                  |
|----------------------|----------|---------------------------------------------------|----------------------------------|
| `pmin`               | `int`    | Minimum price (in IDR)                            | e.g., `100000`                   |
| `pmax`               | `int`    | Maximum price (in IDR)                            | e.g., `1000000`                  |
| `condition`          | `int`    | Product condition                                 | `1` = New, `2` = Used            |
| `shop_tier`          | `int`    | Type of shop                                      | `2` = Mall, `3` = Power Shop     |
| `rt`                 | `float`  | Minimum rating                                    | e.g., `4.5`                      |
| `latest_product`     | `int`    | Only products listed in the last N days           | `7`, `30`, `90`                  |
| `bebas_ongkir_extra` | `bool`   | Filter for extra free shipping                   | `True` / `False`                 |
| `is_discount`        | `bool`   | Only show discounted products                    | `True` / `False`                 |
| `is_fulfillment`     | `bool`   | Only Fulfilled by Tokopedia                      | `True` / `False`                 |
| `is_plus`            | `bool`   | Only Tokopedia PLUS sellers                      | `True` / `False`                 |
| `cod`                | `bool`   | Cash on delivery available                        | `True` / `False`                 |


---

## Example: Enrich with product details & reviews, then convert to a pandas DataFrame in a Jupyter Notebook

```python
from tokopaedi import search, SearchFilters, get_product, get_reviews, combine_data
import pandas as pd

filters = SearchFilters(
    bebas_ongkir_extra=True,
    pmax=100000,
    rt=4.5
)

# Fetch search results, applying the filters defined above
results = search("logitech g304", max_result=10, filters=filters, debug=False)

# Enrich each result with product details and reviews
for result in results:
    combine_data(
        result,
        get_product(product_id=result.product_id, debug=False),
        get_reviews(product_id=result.product_id, max_result=1, debug=False)
    )

# Flatten the nested JSON into a DataFrame and preview important fields
df = pd.json_normalize(results.json())
print(df[[
    "product_id",
    "category",
    "real_price",
    "original_price",
    "product_detail.product_name",
    "rating",
    "shop.name"
]].head())
```
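
Dotted column names such as `shop.name` and `product_detail.product_name` arise because `json_normalize` flattens nested dictionaries and joins the keys with a dot. The pure-Python sketch below mimics that flattening for nested dicts only; `flatten` is a hypothetical helper and the sample record is invented data.

```python
from typing import Any, Dict


def flatten(record: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Hypothetical helper: flatten nested dicts into dotted keys."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into nested dicts, extending the dotted prefix.
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


# Invented sample record; real results will contain different fields.
sample = {
    "product_id": 1,
    "rating": 4.9,
    "shop": {"name": "Example Shop"},
    "product_detail": {"product_name": "Example Mouse"},
}
flat = flatten(sample)
print(flat)
```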

## Author

Tokopaedi was created by **Hilmi Azizi**. For inquiries, feedback, or collaboration, contact me at [root@hilmiazizi.com](mailto:root@hilmiazizi.com). You can also reach out via [GitHub Issues](https://github.com/hilmiazizi/tokopaedi/issues) for bug reports or feature suggestions.

## 📄 License

This project is licensed under the MIT License.

You are free to use, modify, and distribute this project with attribution. See the [LICENSE](https://github.com/hilmiazizi/tokopaedi/blob/main/LICENSE) file for more details.

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/hilmiazizi/tokopaedi",
    "name": "tokopaedi",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<4.0,>=3.8",
    "maintainer_email": null,
    "keywords": "tokopedia, scraper, ecommerce, product, data, web-scraping, python, data-analysis, tokopedia-reviews",
    "author": "Hilmi Azizi",
    "author_email": "root@hilmiazizi.com",
    "download_url": "https://files.pythonhosted.org/packages/08/de/8cb423f0134f28b8f91e337a44844d73a481418ee9e3bf8f70aeca571007/tokopaedi-0.1.2.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "A Python scraper for Tokopedia that supports filtered product search, detailed product information, and customer reviews with accurate mobile pricing and Jupyter Notebook compatibility.",
    "version": "0.1.2",
    "project_urls": {
        "Documentation": "https://github.com/hilmiazizi/tokopaedi",
        "Homepage": "https://github.com/hilmiazizi/tokopaedi",
        "Repository": "https://github.com/hilmiazizi/tokopaedi"
    },
    "split_keywords": [
        "tokopedia",
        " scraper",
        " ecommerce",
        " product",
        " data",
        " web-scraping",
        " python",
        " data-analysis",
        " tokopedia-reviews"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "102a0752b30af638cd99a6721a3aa8c0db3001c396350031d21190ddef5efb74",
                "md5": "13c79e70bab29a4aa2ef10fd7deefdf0",
                "sha256": "2c650397ccce4f6d96ce5bc1be30af5ba448720bf2497e70f7139738fbdd2579"
            },
            "downloads": -1,
            "filename": "tokopaedi-0.1.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "13c79e70bab29a4aa2ef10fd7deefdf0",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<4.0,>=3.8",
            "size": 18723,
            "upload_time": "2025-07-17T02:13:56",
            "upload_time_iso_8601": "2025-07-17T02:13:56.013188Z",
            "url": "https://files.pythonhosted.org/packages/10/2a/0752b30af638cd99a6721a3aa8c0db3001c396350031d21190ddef5efb74/tokopaedi-0.1.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "08de8cb423f0134f28b8f91e337a44844d73a481418ee9e3bf8f70aeca571007",
                "md5": "47439ca83ec7db670d6a4a1da01e3f11",
                "sha256": "8707ebc145cbe2bf880f9e69bc1b28273c97124ee5d83a48efba5911b675f8ba"
            },
            "downloads": -1,
            "filename": "tokopaedi-0.1.2.tar.gz",
            "has_sig": false,
            "md5_digest": "47439ca83ec7db670d6a4a1da01e3f11",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<4.0,>=3.8",
            "size": 18290,
            "upload_time": "2025-07-17T02:13:57",
            "upload_time_iso_8601": "2025-07-17T02:13:57.666792Z",
            "url": "https://files.pythonhosted.org/packages/08/de/8cb423f0134f28b8f91e337a44844d73a481418ee9e3bf8f70aeca571007/tokopaedi-0.1.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-07-17 02:13:57",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "hilmiazizi",
    "github_project": "tokopaedi",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "tokopaedi"
}
        