amazon-product-details-scraper


Name: amazon-product-details-scraper
Version: 1.0.4
home_page: https://github.com/ranjan-mohanty/amazon-product-details-scraper/blob/main/README.md
Summary: Scrapes product details from Amazon product pages and also downloads the images
upload_time: 2024-03-22 22:07:09
maintainer: None
docs_url: None
author: Ranjan Mohanty
requires_python: None
license: None
keywords: amazon, scraper
requirements: No requirements were recorded.
Travis-CI: No Travis.
coveralls test coverage: No coveralls.

## Amazon Product Details Scraper

[![GitHub License](https://img.shields.io/github/license/ranjan-mohanty/amazon-product-details-scraper)](https://github.com/ranjan-mohanty/amazon-product-details-scraper/blob/main/LICENSE)
[![GitHub Release](https://img.shields.io/github/v/release/ranjan-mohanty/amazon-product-details-scraper)](https://github.com/ranjan-mohanty/amazon-product-details-scraper/releases)
[![PyPI - Version](https://img.shields.io/pypi/v/amazon-product-details-scraper)](https://pypi.org/project/amazon-product-details-scraper/)
[![Downloads](https://static.pepy.tech/badge/amazon-product-details-scraper)](https://pepy.tech/project/amazon-product-details-scraper)
[![GitHub forks](https://img.shields.io/github/forks/ranjan-mohanty/amazon-product-details-scraper)](https://github.com/ranjan-mohanty/amazon-product-details-scraper/forks)
[![GitHub Repo stars](https://img.shields.io/github/stars/ranjan-mohanty/amazon-product-details-scraper)](https://github.com/ranjan-mohanty/amazon-product-details-scraper/stargazers)

[![GitHub Actions Workflow Status](https://img.shields.io/github/actions/workflow/status/ranjan-mohanty/amazon-product-details-scraper/build.yml)](https://github.com/ranjan-mohanty/amazon-product-details-scraper/actions/workflows/build.yml)
[![OpenSSF Scorecard](https://api.securityscorecards.dev/projects/github.com/ranjan-mohanty/amazon-product-details-scraper/badge)](https://securityscorecards.dev/viewer/?uri=github.com/ranjan-mohanty/amazon-product-details-scraper)
[![GitHub Issues or Pull Requests](https://img.shields.io/github/issues/ranjan-mohanty/amazon-product-details-scraper)](https://github.com/ranjan-mohanty/amazon-product-details-scraper/issues)
![Libraries.io dependency status for GitHub repo](https://img.shields.io/librariesio/github/ranjan-mohanty/amazon-product-details-scraper)

This script helps you scrape product details from Amazon product pages. It extracts information like title, description, and image URLs, saving them to JSON files.

### Features

- Fetches product details from a single Amazon product URL or a list of URLs in a file.
- Writes extracted data to JSON files for easy storage and processing.
- Optionally downloads product images along with details.

### Installation

**Requirements:**

- Python 3 (tested with 3.7+)
- Libraries:
  - requests
  - beautifulsoup4
  - urllib3

**Instructions:**

1. Make sure you have Python 3 installed. You can check by running `python3 --version` in your terminal.
2. **Create a virtual environment (recommended):**

   - Virtual environments help isolate project dependencies and avoid conflicts with other Python installations on your system.
   - Here's how to create a virtual environment using `venv`:

     ```bash
     python3 -m venv my_env  # Replace "my_env" with your desired environment name
     ```

   - Activate the virtual environment:

     ```bash
     source my_env/bin/activate
     ```

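3. **Install:**

   ```bash
   # From PyPI:
   python3 -m pip install amazon-product-details-scraper

   # Or, from a checkout of the source repository:
   python3 -m pip install .
   ```

   Either command installs the package, along with the libraries it declares as dependencies, into the activated virtual environment.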

### Usage

**Basic Usage:**

```bash
amazon-scraper --url https://www.amazon.com/product-1  # Replace with your product URL
```

This scrapes the details from the given Amazon product URL and writes them to a JSON file in the "output" directory, which is the default location.

**Using a URL List:**

1. Create a text file containing a list of Amazon product URLs (one per line).
2. Run the script with the `--url-list` option and provide the file path:

```bash
amazon-scraper --url-list product_urls.txt
```

This will process each URL in the file and save the scraped details for each product in separate directories within "output".
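The one-URL-per-line format described above is simple enough to parse in a few lines of Python. The following is a minimal sketch, not the package's own code: the helper name `parse_url_lines` is hypothetical, and skipping blank lines is an assumption about how such a file would reasonably be handled.

```python
def parse_url_lines(lines):
    """Return the non-empty entries from an iterable of text lines.

    Hypothetical helper for illustration; not part of the package.
    Surrounding whitespace (including the trailing newline) is removed,
    and lines that are empty after stripping are skipped.
    """
    urls = []
    for line in lines:
        url = line.strip()
        if url:
            urls.append(url)
    return urls


sample = [
    "https://www.amazon.com/product-1\n",
    "\n",
    "  https://www.amazon.com/product-2  \n",
]
print(parse_url_lines(sample))
```

Because an open text file iterates line by line, the same function can be passed a file object for `product_urls.txt` directly.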

**Optional: Downloading Images**

```bash
amazon-scraper --url https://www.amazon.com/product-1 --download-image
```

The `--download-image` flag enables downloading product images along with other details.

**Getting Help:**

The script has a built-in help message that summarizes the available options. To see it, run the script with the `--help` option:

```bash
amazon-scraper --help
```
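For readers who want to see how the three options used in this README fit together, here is a sketch of an equivalent `argparse` definition. It is illustrative only: the option names come from the commands above, but the help strings, defaults, and the `build_parser` function are assumptions, and the real command may define its options differently.

```python
import argparse


def build_parser():
    """Build a parser with the three options shown in this README.

    Illustrative only; help texts and defaults are assumptions.
    """
    parser = argparse.ArgumentParser(prog="amazon-scraper")
    parser.add_argument("--url", help="a single Amazon product URL")
    parser.add_argument("--url-list", help="path to a file with one URL per line")
    parser.add_argument(
        "--download-image",
        action="store_true",
        help="also download product images",
    )
    return parser


# Parse the command line from the Example section of this README.
args = build_parser().parse_args(["--url-list", "products.txt", "--download-image"])
print(args.url, args.url_list, args.download_image)  # None products.txt True
```

Note that `argparse` converts the dashes in `--url-list` and `--download-image` to underscores when it names the resulting attributes.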

### Configuration

**Logging:**

- The script uses basic logging for information and error messages.
- You can change the logging level by editing `DEFAULT_LOG_LEVEL` in `config.py` (refer to the Python documentation for logging configuration).
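As a rough illustration of how a setting like `DEFAULT_LOG_LEVEL` is typically consumed, the sketch below maps a level name to the standard `logging` module's constant and passes it to `logging.basicConfig`. The assumption that the value is a level name such as `"INFO"`, and the `resolve_level` helper itself, are hypothetical; check `config.py` for the form the package actually uses.

```python
import logging

# Assumption: config.py stores the level as a name such as "INFO".
DEFAULT_LOG_LEVEL = "INFO"


def resolve_level(name):
    """Map a level name ("DEBUG", "INFO", ...) to the logging module's integer."""
    level = getattr(logging, name.upper(), None)
    if not isinstance(level, int):
        raise ValueError(f"unknown log level: {name!r}")
    return level


logging.basicConfig(level=resolve_level(DEFAULT_LOG_LEVEL))
log = logging.getLogger("example")
log.info("shown when the level is INFO or lower")
log.debug("hidden unless the level is DEBUG")
```

Setting the value to `"DEBUG"` would make both messages appear, while `"WARNING"` would suppress both.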

### Example

**Scenario:**

Scrape details for two products from a file named "products.txt" and download images:

1. Create a file named "products.txt" with the following content:

```
https://www.amazon.com/product-1
https://www.amazon.com/product-2
```

2. Run the script with the following command:

```bash
amazon-scraper --url-list products.txt --download-image
```

This will process both URLs in the file, scrape details, create separate output directories for each product, and download images.
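Once a run like this has finished, the JSON files can be loaded back for further processing. The sketch below is an assumption-laden example rather than part of the package: the names of the per-product directories and JSON files are not documented here, so the hypothetical `load_product_files` helper simply searches the output directory recursively instead of relying on specific names.

```python
import json
from pathlib import Path


def load_product_files(output_dir="output"):
    """Parse every *.json file below output_dir, keyed by relative path.

    Hypothetical helper for illustration. The directory layout and file
    names are not assumed; the search is recursive.
    """
    root = Path(output_dir)
    results = {}
    for path in sorted(root.rglob("*.json")):
        with path.open(encoding="utf-8") as handle:
            results[path.relative_to(root).as_posix()] = json.load(handle)
    return results


for name, data in load_product_files("output").items():
    print(name, type(data).__name__)
```

Each key is a path relative to the output directory, so products stay distinguishable even if their JSON files share the same file name.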

### Disclaimer

This script is for educational purposes only. Please be respectful of Amazon's terms of service when using it. Consider using official APIs provided by Amazon for extensive data collection.

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/ranjan-mohanty/amazon-product-details-scraper/blob/main/README.md",
    "name": "amazon-product-details-scraper",
    "maintainer": null,
    "docs_url": null,
    "requires_python": null,
    "maintainer_email": null,
    "keywords": "amazon, scraper",
    "author": "Ranjan Mohanty",
    "author_email": "ranjan@duck.com",
    "download_url": "https://files.pythonhosted.org/packages/b5/b2/682e91b3b7c3aec25ae8a8f0e26f09c0f8eb5caaa32a38cee9013ec90d62/amazon-product-details-scraper-1.0.4.tar.gz",
    "platform": null,
    "description": "## Amazon Product Details Scraper\n\n[![GitHub License](https://img.shields.io/github/license/ranjan-mohanty/amazon-product-details-scraper)](https://github.com/ranjan-mohanty/amazon-product-details-scraper/blob/main/LICENSE)\n[![GitHub Release](https://img.shields.io/github/v/release/ranjan-mohanty/amazon-product-details-scraper)](https://github.com/ranjan-mohanty/amazon-product-details-scraper/releases)\n[![PyPI - Version](https://img.shields.io/pypi/v/amazon-product-details-scraper)](https://pypi.org/project/amazon-product-details-scraper/)\n[![Downloads](https://static.pepy.tech/badge/amazon-product-details-scraper)](https://pepy.tech/project/amazon-product-details-scraper)\n[![GitHub forks](https://img.shields.io/github/forks/ranjan-mohanty/amazon-product-details-scraper)](https://github.com/ranjan-mohanty/amazon-product-details-scraper/forks)\n[![GitHub Repo stars](https://img.shields.io/github/stars/ranjan-mohanty/amazon-product-details-scraper)](https://github.com/ranjan-mohanty/amazon-product-details-scraper/stargazers)\n\n[![GitHub Actions Workflow Status](https://img.shields.io/github/actions/workflow/status/ranjan-mohanty/amazon-product-details-scraper/build.yml)](https://github.com/ranjan-mohanty/amazon-product-details-scraper/actions/workflows/build.yml)\n[![OpenSSF Scorecard](https://api.securityscorecards.dev/projects/github.com/ranjan-mohanty/amazon-product-details-scraper/badge)](https://securityscorecards.dev/viewer/?uri=github.com/ranjan-mohanty/amazon-product-details-scraper)\n[![GitHub Issues or Pull Requests](https://img.shields.io/github/issues/ranjan-mohanty/amazon-product-details-scraper)](https://github.com/ranjan-mohanty/amazon-product-details-scraper/issues)\n![Libraries.io dependency status for GitHub repo](https://img.shields.io/librariesio/github/ranjan-mohanty/amazon-product-details-scraper)\n\nThis script helps you scrape product details from Amazon product pages. 
It extracts information like title, description, and image URLs, saving them to JSON files.\n\n### Features\n\n- Fetches product details from a single Amazon product URL or a list of URLs in a file.\n- Writes extracted data to JSON files for easy storage and processing.\n- Optionally downloads product images along with details.\n\n### Installation\n\n**Requirements:**\n\n- Python 3 (tested with 3.7+)\n- Libraries:\n  - requests\n  - beautifulsoup4\n  - urllib3\n\n**Instructions:**\n\n1. Make sure you have Python 3 installed. You can check by running `python3 --version` in your terminal.\n2. **Create a virtual environment (recommended):**\n\n   - Virtual environments help isolate project dependencies and avoid conflicts with other Python installations on your system.\n   - Here's how to create a virtual environment using `venv`:\n\n     ```bash\n     python3 -m venv my_env  # Replace \"my_env\" with your desired environment name\n     ```\n\n   - Activate the virtual environment:\n\n     ```bash\n     source my_env/bin/activate\n     ```\n\n3. **Install:**\n\n   ```bash\n   python3 setup.py install\n   ```\n\n   This will automatically download and install the necessary libraries based on the specifications within the activated virtual environment.\n\n### Usage\n\n**Basic Usage:**\n\n```bash\namazon-scraper --url https://www.amazon.com/product-1  # Replace with your product URL\n```\n\nThis will scrape details from the provided Amazon product URL and write them to a JSON file in the \"output\" directory (default).\n\n**Using a URL List:**\n\n1. Create a text file containing a list of Amazon product URLs (one per line).\n2. 
Run the script with the `--url-list` option and provide the file path:\n\n```bash\namazon-scraper --url-list product_urls.txt\n```\n\nThis will process each URL in the file and save the scraped details for each product in separate directories within \"output\".\n\n**Optional: Downloading Images**\n\n```bash\namazon-scraper --url https://www.amazon.com/product-1 --download-image\n```\n\nThe `--download-image` flag enables downloading product images along with other details.\n\n**Getting Help:**\n\nThe script offers a built-in help message that provides a quick overview of available options and usage instructions. To access the help, run the script with the `--help` option:\n\n```bash\namazon_scraper --help\n```\n\n### Configuration\n\n**Logging:**\n\n- The script uses basic logging for information and error messages.\n- You can modify the logging level by editing the `DEFAULT_LOG_LEVEL` in `config.py` line in the code (refer to the Python documentation for logging configuration).\n\n### Example\n\n**Scenario:**\n\nScrape details for two products from a file named \"products.txt\" and download images:\n\n1. Create a file named \"products.txt\" with the following content:\n\n```\nhttps://www.amazon.com/product-1\nhttps://www.amazon.com/product-2\n```\n\n2. Run the script with the following command:\n\n```bash\namazon-scraper --url-list products.txt --download-image\n```\n\nThis will process both URLs in the file, scrape details, create separate output directories for each product, and download images.\n\n### Disclaimer\n\nThis script is for educational purposes only. Please be respectful of Amazon's terms of service when using it. Consider using official APIs provided by Amazon for extensive data collection.\n",
    "bugtrack_url": null,
    "license": null,
    "summary": "Scrapes product details from Amazon product pages and also downloads the images",
    "version": "1.0.4",
    "project_urls": {
        "Homepage": "https://github.com/ranjan-mohanty/amazon-product-details-scraper/blob/main/README.md",
        "Source": "https://github.com/ranjan-mohanty/amazon-product-details-scraper"
    },
    "split_keywords": [
        "amazon",
        " scraper"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "72edbb57c48ecea2ef958851b1899a3abc5e7f8d6dc9cf549d61a7c5db10ae24",
                "md5": "ab5eed620ce5e9304dc9eb54519a32ec",
                "sha256": "b9521a9200d375f6d46969cf804980d1bd561068d7481c1f36ac7ef82ab2d5d1"
            },
            "downloads": -1,
            "filename": "amazon_product_details_scraper-1.0.4-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "ab5eed620ce5e9304dc9eb54519a32ec",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 9936,
            "upload_time": "2024-03-22T22:07:06",
            "upload_time_iso_8601": "2024-03-22T22:07:06.573109Z",
            "url": "https://files.pythonhosted.org/packages/72/ed/bb57c48ecea2ef958851b1899a3abc5e7f8d6dc9cf549d61a7c5db10ae24/amazon_product_details_scraper-1.0.4-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "b5b2682e91b3b7c3aec25ae8a8f0e26f09c0f8eb5caaa32a38cee9013ec90d62",
                "md5": "e11a61407514ba6a1b7f60a21bb1fb7c",
                "sha256": "aa6858842b035fe0775df9bd575382b3b6cbb0ec8250a76c5f5b5690b3745eac"
            },
            "downloads": -1,
            "filename": "amazon-product-details-scraper-1.0.4.tar.gz",
            "has_sig": false,
            "md5_digest": "e11a61407514ba6a1b7f60a21bb1fb7c",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 9359,
            "upload_time": "2024-03-22T22:07:09",
            "upload_time_iso_8601": "2024-03-22T22:07:09.933395Z",
            "url": "https://files.pythonhosted.org/packages/b5/b2/682e91b3b7c3aec25ae8a8f0e26f09c0f8eb5caaa32a38cee9013ec90d62/amazon-product-details-scraper-1.0.4.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-03-22 22:07:09",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "ranjan-mohanty",
    "github_project": "amazon-product-details-scraper",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "amazon-product-details-scraper"
}
        