reelscraper


- Name: reelscraper
- Version: 2.2.2
- Summary: A convenient way to harvest Reels data without breaking a sweat, or Instagram's TOS
- Upload time: 2025-01-15 01:11:52
- Requires Python: >=3.9
- License: LICENSE.txt
- Keywords: instagram, scraper, reels
- Requirements: certifi, charset-normalizer, et_xmlfile, fake-useragent, idna, openpyxl, PyYAML, requests, urllib3, SQLAlchemy

[![PyPI version](https://img.shields.io/pypi/v/reelscraper.svg)](https://pypi.org/project/reelscraper/)
[![Build](https://github.com/andreaaazo/reelscraper/actions/workflows/tests.yml/badge.svg?branch=master)](https://github.com/andreaaazo/reelscraper/actions/workflows/tests.yml)
[![Code Tests Coverage](https://codecov.io/gh/andreaaazo/reelscraper/branch/master/graph/badge.svg)](https://codecov.io/gh/andreaaazo/reelscraper)

<h1 align="center">
  ReelScraper
  <br>
</h1>

<h4 align="center">
Scrape Instagram Reels data with ease—be it a single account or many in parallel—using Python, threading, robust logging, and optional database support.
</h4>

<p align="center">
  <a href="#-installation">Installation</a> •
  <a href="#-usage">Usage</a> •
  <a href="#-classes">Classes</a> •
  <a href="#-documentation">Documentation</a> •
  <a href="#-contributing">Contributing</a> •
  <a href="#-license">License</a> •
  <a href="#-acknowledgments">Acknowledgments</a> •
  <a href="#-disclaimer">Disclaimer</a>
</p>

---

## 💻 Installation

Requires **Python 3.9+**. Install directly from PyPI:

```bash
pip install reelscraper
```

Or clone from GitHub:

```bash
git clone https://github.com/andreaaazo/reelscraper.git
cd reelscraper
python -m pip install .
```
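
To confirm that either install route worked, the installed version can be read from package metadata. This is a minimal sketch using only the standard library; the helper name `installed_version` is made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version("reelscraper")
print(found if found else "reelscraper is not installed")
```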

---

## 🚀 Usage

ReelScraper supports detailed logging and optional persistence via a database. You can either scrape a single Instagram account or handle multiple accounts concurrently.

### 1. Single-Account Scraping

Use **`ReelScraper`** to fetch Reels for a single account. Optionally pass a `LoggerManager` for retry logs and progress tracking.

```python
from reelscraper import ReelScraper
from reelscraper.utils import LoggerManager

# Optional logger setup
logger = LoggerManager()

# Initialize scraper with a 30-second timeout, no proxy, and logging
scraper = ReelScraper(timeout=30, proxy=None, logger_manager=logger)

# Fetch up to 10 reels for "someaccount"
reels_data = scraper.get_user_reels("someaccount", max_posts=10)
for reel in reels_data:
    print(reel)
```
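
The snippet above only prints each reel. To keep the results, one option is to write them to a JSON file. The sketch below assumes each reel is a mapping with JSON-friendly values, which this README does not specify, so it runs on placeholder data. The helper `save_reels` and the field names in `sample_reels` are hypothetical; `default=str` is a fallback for values that `json` cannot serialize natively.

```python
import json
from pathlib import Path
from typing import Any, Iterable, Mapping


def save_reels(reels: Iterable[Mapping[str, Any]], path: str) -> int:
    """Write reels to `path` as a JSON array and return how many were written."""
    items = [dict(reel) for reel in reels]
    Path(path).write_text(
        json.dumps(items, indent=2, ensure_ascii=False, default=str),
        encoding="utf-8",
    )
    return len(items)


# Placeholder records standing in for the output of get_user_reels();
# the field names here are invented for illustration.
sample_reels = [
    {"shortcode": "abc123", "likes": 10},
    {"shortcode": "def456", "likes": 25},
]
```

With real data, the call would look like `save_reels(reels_data, "someaccount.json")`.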

### 2. Multi-Account Concurrency & Database Storage

Use **`ReelMultiScraper`** to process many accounts concurrently. Configure logging (`LoggerManager`) and database persistence (`DBManager`) if desired.

```python
from reelscraper import ReelScraper, ReelMultiScraper
from reelscraper.utils import LoggerManager
from reelscraper.utils.database import DBManager

# Configure logger and optional DB manager
logger = LoggerManager()
db_manager = DBManager(db_url="sqlite:///myreels.db")

# Create a single scraper instance
single_scraper = ReelScraper(timeout=30, proxy=None, logger_manager=logger)

# MultiScraper for concurrency, database integration, and auto-logging
multi_scraper = ReelMultiScraper(
    single_scraper,
    max_workers=5,
    db_manager=db_manager,
)

# File contains one username per line, e.g.:
#   user1
#   user2
accounts_file_path = "accounts.txt"

# Scrape accounts concurrently
# If DBManager is provided, results are stored in DB, and this method returns None
all_reels = multi_scraper.scrape_accounts(
    accounts_file=accounts_file_path,
    max_posts_per_profile=20,
    max_retires_per_profile=10
)

if all_reels is not None:
    print(f"Total reels scraped: {len(all_reels)}")
else:
    print("All reels have been stored in the database.")
```

> **Note:** If `DBManager` is set, scraped reels are saved to the database instead of being returned.
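
Because results go to the database in that case, they have to be read back from it. This README does not document the schema `DBManager` creates, so the sketch below assumes nothing about table names: it lists the tables in a SQLite file with their row counts, using the standard `sqlite3` module. The helper name `table_row_counts` is hypothetical.

```python
import sqlite3
from typing import Dict


def table_row_counts(db_path: str) -> Dict[str, int]:
    """Map each user table in a SQLite file to its row count."""
    conn = sqlite3.connect(db_path)
    try:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
            )
        ]
        counts: Dict[str, int] = {}
        for name in names:
            # Quote the identifier, since table names come from the file itself.
            quoted = '"' + name.replace('"', '""') + '"'
            counts[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
        return counts
    finally:
        conn.close()
```

For the example above this would be called as `table_row_counts("myreels.db")`. Note that `sqlite3.connect` creates an empty file if the path does not exist, so check the path first when that matters.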

---

## 🏗 Classes

### `ReelScraper`
- **Purpose:**  
  Fetches Instagram Reels for a single account.
- **Key Components:**  
  - `InstagramAPI`: Manages HTTP requests and proxy usage.  
  - `Extractor`: Structures raw reel data.  
  - `LoggerManager` (optional): Logs retries and status events.
- **Key Method:**  
  - `get_user_reels(username, max_posts=50, max_retries=10)`: Retrieves reels, handling pagination and retries.
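
How the library implements pagination and retries internally is not shown in this README. As a rough illustration of the general pattern only, the sketch below pages through a stubbed fetch function and retries failed page requests up to a limit. Everything in it (`collect_items`, `make_flaky_fetcher`, the cursor format) is hypothetical and is not ReelScraper's actual code.

```python
from typing import Callable, List, Optional, Tuple

# A page fetcher takes a cursor (None for the first page) and returns
# (items, next_cursor); next_cursor is None when there are no more pages.
PageFetcher = Callable[[Optional[str]], Tuple[List[dict], Optional[str]]]


def collect_items(
    fetch_page: PageFetcher, max_posts: int = 50, max_retries: int = 10
) -> List[dict]:
    """Collect up to max_posts items, retrying failed page fetches."""
    items: List[dict] = []
    cursor: Optional[str] = None
    retries = 0  # counts failures across the whole run, not per page
    while len(items) < max_posts:
        try:
            page, next_cursor = fetch_page(cursor)
        except Exception:
            retries += 1
            if retries > max_retries:
                raise
            continue  # retry the same cursor
        items.extend(page)
        cursor = next_cursor
        if cursor is None:
            break
    return items[:max_posts]


def make_flaky_fetcher(total: int, page_size: int, fail_every: int) -> PageFetcher:
    """Build a stub fetcher that raises on every `fail_every`-th call."""
    calls = {"n": 0}

    def fetch(cursor: Optional[str]) -> Tuple[List[dict], Optional[str]]:
        calls["n"] += 1
        if calls["n"] % fail_every == 0:
            raise ConnectionError("simulated network failure")
        start = int(cursor) if cursor else 0
        end = min(start + page_size, total)
        page = [{"id": i} for i in range(start, end)]
        return page, (str(end) if end < total else None)

    return fetch
```

The sketch retries immediately; a real scraper would normally wait between attempts to avoid hammering the server.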

### `ReelMultiScraper`
- **Purpose:**  
  Scrapes multiple accounts in parallel, powered by a single `ReelScraper` instance.
- **Key Components:**  
  - `ThreadPoolExecutor`: Enables concurrent scraping.  
  - `AccountManager`: Reads accounts from a local file.  
  - `LoggerManager` (optional): Captures multi-account events.  
  - `DBManager` (optional): Saves aggregated results to a database.
- **Key Method:**  
  - `scrape_accounts(accounts_file, max_posts_per_profile, max_retires_per_profile)`: Concurrently processes all accounts found in the file, optionally storing results in a DB.
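
The combination of a one-username-per-line accounts file and a thread pool can be sketched with the standard library alone. This illustrates the pattern and is not the library's implementation: `read_accounts`, `scrape_all`, and the stub `fake_scrape` are hypothetical names, and the stub stands in for a real per-account call such as `get_user_reels`.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path
from typing import Callable, Dict, List


def read_accounts(path: str) -> List[str]:
    """Read usernames from a file, one per line, skipping blank lines."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def scrape_all(
    accounts: List[str],
    scrape_one: Callable[[str], List[dict]],
    max_workers: int = 5,
) -> Dict[str, List[dict]]:
    """Run scrape_one for each account in a thread pool; skip accounts that fail."""
    results: Dict[str, List[dict]] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(scrape_one, name): name for name in accounts}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception as exc:
                print(f"{name}: failed ({exc})")
    return results


def fake_scrape(username: str) -> List[dict]:
    """Stub standing in for a real per-account scrape."""
    if username == "broken":
        raise RuntimeError("simulated failure")
    return [{"user": username, "index": i} for i in range(3)]
```

Catching the exception per future means one failing account does not abort the whole batch.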

---

## 📄 Documentation

Find full usage details in the [DOCS.md](https://github.com/andreaaazo/reelscraper/blob/master/DOCS.md) file.

---

## 🤝 Contributing

We welcome PRs that enhance features, fix bugs, or improve docs.

1. **Fork** the repo.
2. **Create** a new branch.
3. **Commit** code changes (add tests where possible).
4. **Open** a pull request.

Your contributions are appreciated—happy coding!

---

## 📄 License

Licensed under the [MIT License](https://github.com/andreaaazo/reelscraper/blob/master/LICENSE.txt). Feel free to modify and distribute, but please be mindful of best practices and ethical scraping.

---

## 🙏 Acknowledgments

- **Python Community**: For making concurrency and requests straightforward to implement.  
- **Instagram**: For providing reel content that inspires creativity.  
- **Beverages**: For fueling late-night debugging and coding sessions.

---

## ⚠ Disclaimer

This software is for **personal and educational** purposes only. Use it in accordance with Instagram’s Terms of Service. We do not promote or condone large-scale commercial scraping or any violation of privacy/IP rights.

---

Enjoy scraping, and may your concurrency be swift!

            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "reelscraper",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": null,
    "keywords": "instagram, scraper, reels",
    "author": null,
    "author_email": "Andrea Zorzi <zorzi.andrea@outlook.com>",
    "download_url": "https://files.pythonhosted.org/packages/63/ad/b1d6bc0a7c96218f33a7c89d74471ce72e4dcf5d64b08880e11dc42ecd29/reelscraper-2.2.2.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "LICENSE.txt",
    "summary": "A convenient way to harvest Reels data without breaking a sweat\u2014or Instagram's TOS",
    "version": "2.2.2",
    "project_urls": {
        "Bug Tracker": "https://github.com/andreaaazo/reelscraper/issues",
        "Documentation": "https://github.com/andreaaazo/reelscraper/blob/master/DOCS.md",
        "Home": "https://github.com/andreaaazo/reelscraper"
    },
    "split_keywords": [
        "instagram",
        " scraper",
        " reels"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "b7ef9f17d164c183da2e5f593ebe44ead05ed23a1a36f49f770f46f3299c4dd0",
                "md5": "0fb02bd1eb6b4973834436c7e8588bd9",
                "sha256": "27655d843a89549caff089e83bba3b6808dd8579d55fc2c8f984ee9c96cbd0d0"
            },
            "downloads": -1,
            "filename": "reelscraper-2.2.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "0fb02bd1eb6b4973834436c7e8588bd9",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9",
            "size": 19091,
            "upload_time": "2025-01-15T01:11:49",
            "upload_time_iso_8601": "2025-01-15T01:11:49.370851Z",
            "url": "https://files.pythonhosted.org/packages/b7/ef/9f17d164c183da2e5f593ebe44ead05ed23a1a36f49f770f46f3299c4dd0/reelscraper-2.2.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "63adb1d6bc0a7c96218f33a7c89d74471ce72e4dcf5d64b08880e11dc42ecd29",
                "md5": "dbc0776ab2f5592b1c7a642d3962e0a0",
                "sha256": "15e9410a88908e753bc7e832919c90f81ac975285476fc51155ba855bce04783"
            },
            "downloads": -1,
            "filename": "reelscraper-2.2.2.tar.gz",
            "has_sig": false,
            "md5_digest": "dbc0776ab2f5592b1c7a642d3962e0a0",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9",
            "size": 34240,
            "upload_time": "2025-01-15T01:11:52",
            "upload_time_iso_8601": "2025-01-15T01:11:52.101977Z",
            "url": "https://files.pythonhosted.org/packages/63/ad/b1d6bc0a7c96218f33a7c89d74471ce72e4dcf5d64b08880e11dc42ecd29/reelscraper-2.2.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-01-15 01:11:52",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "andreaaazo",
    "github_project": "reelscraper",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [
        {
            "name": "certifi",
            "specs": [
                [
                    "==",
                    "2024.12.14"
                ]
            ]
        },
        {
            "name": "charset-normalizer",
            "specs": [
                [
                    "==",
                    "3.4.1"
                ]
            ]
        },
        {
            "name": "et_xmlfile",
            "specs": [
                [
                    "==",
                    "2.0.0"
                ]
            ]
        },
        {
            "name": "fake-useragent",
            "specs": [
                [
                    "==",
                    "2.0.3"
                ]
            ]
        },
        {
            "name": "idna",
            "specs": [
                [
                    "==",
                    "3.10"
                ]
            ]
        },
        {
            "name": "openpyxl",
            "specs": [
                [
                    "==",
                    "3.1.5"
                ]
            ]
        },
        {
            "name": "PyYAML",
            "specs": [
                [
                    "==",
                    "6.0.2"
                ]
            ]
        },
        {
            "name": "requests",
            "specs": [
                [
                    "==",
                    "2.32.3"
                ]
            ]
        },
        {
            "name": "urllib3",
            "specs": [
                [
                    "==",
                    "2.3.0"
                ]
            ]
        },
        {
            "name": "SQLAlchemy",
            "specs": [
                [
                    "==",
                    "2.0.37"
                ]
            ]
        }
    ],
    "lcname": "reelscraper"
}
        