[![PyPI version](https://img.shields.io/pypi/v/reelscraper.svg)](https://pypi.org/project/reelscraper/)
[![Build](https://github.com/andreaaazo/reelscraper/actions/workflows/tests.yml/badge.svg?branch=master)](https://github.com/andreaaazo/reelscraper/actions/workflows/tests.yml)
[![Code Tests Coverage](https://codecov.io/gh/andreaaazo/reelscraper/branch/master/graph/badge.svg)](https://codecov.io/gh/andreaaazo/reelscraper)
<h1 align="center">
ReelScraper
<br>
</h1>
<h4 align="center">
Scrape Instagram Reels data with ease—be it a single account or many in parallel—using Python, threading, robust logging, and optional database support.
</h4>
<p align="center">
<a href="#-installation">Installation</a> •
<a href="#-usage">Usage</a> •
<a href="#-classes">Classes</a> •
<a href="#-documentation">Documentation</a> •
<a href="#-contributing">Contributing</a> •
<a href="#-license">License</a> •
<a href="#-acknowledgments">Acknowledgments</a> •
<a href="#-disclaimer">Disclaimer</a>
</p>
---
## 💻 Installation
Requires **Python 3.9+**. Install directly from PyPI:
```bash
pip install reelscraper
```
Or clone from GitHub:
```bash
git clone https://github.com/andreaaazo/reelscraper.git
cd reelscraper
python -m pip install .
```
---
## 🚀 Usage
ReelScraper supports detailed logging and optional persistence via a database. You can either scrape a single Instagram account or handle multiple accounts concurrently.
### 1. Single-Account Scraping
Use **`ReelScraper`** to fetch Reels for a single account. Optionally pass a `LoggerManager` for retry logs and progress tracking.
```python
from reelscraper import ReelScraper
from reelscraper.utils import LoggerManager
# Optional logger setup
logger = LoggerManager()
# Initialize scraper with a 30-second timeout, no proxy, and logging
scraper = ReelScraper(timeout=30, proxy=None, logger_manager=logger)
# Fetch up to 10 reels for "someaccount"
reels_data = scraper.get_user_reels("someaccount", max_posts=10)
for reel in reels_data:
    print(reel)
```
### 2. Multi-Account Concurrency & Database Storage
Use **`ReelMultiScraper`** to process many accounts concurrently. Configure logging (`LoggerManager`) and database persistence (`DBManager`) if desired.
```python
from reelscraper import ReelScraper, ReelMultiScraper
from reelscraper.utils import LoggerManager
from reelscraper.utils.database import DBManager
# Configure logger and optional DB manager
logger = LoggerManager()
db_manager = DBManager(db_url="sqlite:///myreels.db")
# Create a single scraper instance
single_scraper = ReelScraper(timeout=30, proxy=None, logger_manager=logger)
# MultiScraper for concurrency, database integration, and auto-logging
multi_scraper = ReelMultiScraper(
    single_scraper,
    max_workers=5,
    db_manager=db_manager,
)
# File contains one username per line, e.g.:
# user1
# user2
accounts_file_path = "accounts.txt"
# Scrape accounts concurrently
# If DBManager is provided, results are stored in DB, and this method returns None
all_reels = multi_scraper.scrape_accounts(
    accounts_file=accounts_file_path,
    max_posts_per_profile=20,
    max_retires_per_profile=10
)
if all_reels is not None:
    print(f"Total reels scraped: {len(all_reels)}")
else:
    print("All reels have been stored in the database.")
```
> **Note:** If `DBManager` is set, scraped reels are saved to the database instead of being returned.
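The accounts file described above (one username per line) can be prepared with a few lines of standard-library Python. This is a minimal sketch: the helpers `write_accounts_file` and `read_accounts_file` and the sample usernames are hypothetical and not part of ReelScraper, and skipping blank lines is an assumption about how the file is read.

```python
from pathlib import Path
from typing import Iterable, List


def write_accounts_file(path: str, usernames: Iterable[str]) -> None:
    """Write one username per line (hypothetical helper, not part of ReelScraper)."""
    cleaned = [name.strip() for name in usernames if name.strip()]
    Path(path).write_text("\n".join(cleaned) + "\n", encoding="utf-8")


def read_accounts_file(path: str) -> List[str]:
    """Read usernames back, skipping blank lines (an assumption about the format)."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


if __name__ == "__main__":
    write_accounts_file("accounts.txt", ["user1", "user2"])
    print(read_accounts_file("accounts.txt"))
```

The resulting path can then be passed as `accounts_file` to `scrape_accounts`.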
---
## 🏗 Classes
### `ReelScraper`
- **Purpose:**
  Fetches Instagram Reels for a single user session.
- **Key Components:**
  - `InstagramAPI`: Manages HTTP requests and proxy usage.
  - `Extractor`: Structures raw reel data.
  - `LoggerManager` (optional): Logs retries and status events.
- **Key Method:**
  - `get_user_reels(username, max_posts=50, max_retries=10)`: Retrieves reels, handling pagination and retries.
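
The retry handling mentioned for `get_user_reels` can be pictured as a generic retry loop. The sketch below is not ReelScraper's actual implementation: `fetch_with_retries`, the `fetch_page` callable, the fixed delay, and the choice to treat `None` as a failed attempt are all assumptions made for illustration.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def fetch_with_retries(
    fetch_page: Callable[[], Optional[T]],
    max_retries: int = 10,
    delay_seconds: float = 0.0,
) -> Optional[T]:
    """Call fetch_page until it returns a non-None result or attempts run out.

    Hypothetical helper: treating None as a failed attempt is an assumption.
    """
    for attempt in range(max_retries):
        result = fetch_page()
        if result is not None:
            return result
        if attempt < max_retries - 1:
            time.sleep(delay_seconds)  # pause before the next attempt
    return None
```

A caller would wrap a single page request in `fetch_page` and give up on the account once `None` comes back.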
### `ReelMultiScraper`
- **Purpose:**
  Scrapes multiple accounts in parallel, powered by a single `ReelScraper` instance.
- **Key Components:**
  - `ThreadPoolExecutor`: Enables concurrent scraping.
  - `AccountManager`: Reads accounts from a local file.
  - `LoggerManager` (optional): Captures multi-account events.
  - `DBManager` (optional): Saves aggregated results to a database.
- **Key Method:**
  - `scrape_accounts(accounts_file, max_posts_per_profile, max_retires_per_profile)`: Concurrently processes all accounts found in the file, optionally storing results in a DB.
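
To illustrate the concurrency pattern named above, here is a small sketch of fanning usernames out over a `ThreadPoolExecutor`. It is not the library's code: `scrape_many` and the stand-in `fake_scrape` function are hypothetical, the returned data is made up, and mapping a failed account to an empty list is a design assumption.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List


def scrape_many(
    usernames: List[str],
    scrape_one: Callable[[str], List[dict]],
    max_workers: int = 5,
) -> Dict[str, List[dict]]:
    """Run scrape_one for each username in a thread pool (hypothetical helper)."""
    results: Dict[str, List[dict]] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the username it was submitted for.
        futures = {pool.submit(scrape_one, name): name for name in usernames}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception:
                # A failed account yields an empty list instead of aborting the batch.
                results[name] = []
    return results


def fake_scrape(username: str) -> List[dict]:
    """Stand-in for a real scraper call; returns made-up placeholder data."""
    return [{"username": username, "index": i} for i in range(2)]


if __name__ == "__main__":
    print(scrape_many(["user1", "user2"], fake_scrape))
```

Because scraping is dominated by network waits, threads are a reasonable fit here even with Python's global interpreter lock.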
---
## 📄 Documentation
Find full usage details in the [DOCS.md](https://github.com/andreaaazo/reelscraper/blob/master/DOCS.md) file.
---
## 🤝 Contributing
We welcome PRs that enhance features, fix bugs, or improve docs.
1. **Fork** the repo.
2. **Create** a new branch.
3. **Commit** code changes (add tests where possible).
4. **Open** a pull request.
Your contributions are appreciated—happy coding!
---
## 📄 License
Licensed under the [MIT License](https://github.com/andreaaazo/reelscraper/blob/master/LICENSE.txt). Feel free to modify and distribute, but please be mindful of best practices and ethical scraping.
---
## 🙏 Acknowledgments
- **Python Community**: For making concurrency and requests straightforward to implement.
- **Instagram**: For providing reel content that inspires creativity.
- **Beverages**: For fueling late-night debugging and coding sessions.
---
## ⚠ Disclaimer
This software is for **personal and educational** purposes only. Use it in accordance with Instagram’s Terms of Service. We do not promote or condone large-scale commercial scraping or any violation of privacy/IP rights.
---
Enjoy scraping, and may your concurrency be swift!
Raw data
{
"_id": null,
"home_page": null,
"name": "reelscraper",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.9",
"maintainer_email": null,
"keywords": "instagram, scraper, reels",
"author": null,
"author_email": "Andrea Zorzi <zorzi.andrea@outlook.com>",
"download_url": "https://files.pythonhosted.org/packages/63/ad/b1d6bc0a7c96218f33a7c89d74471ce72e4dcf5d64b08880e11dc42ecd29/reelscraper-2.2.2.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "LICENSE.txt",
"summary": "A convenient way to harvest Reels data without breaking a sweat\u2014or Instagram's TOS",
"version": "2.2.2",
"project_urls": {
"Bug Tracker": "https://github.com/andreaaazo/reelscraper/issues",
"Documentation": "https://github.com/andreaaazo/reelscraper/blob/master/DOCS.md",
"Home": "https://github.com/andreaaazo/reelscraper"
},
"split_keywords": [
"instagram",
" scraper",
" reels"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "b7ef9f17d164c183da2e5f593ebe44ead05ed23a1a36f49f770f46f3299c4dd0",
"md5": "0fb02bd1eb6b4973834436c7e8588bd9",
"sha256": "27655d843a89549caff089e83bba3b6808dd8579d55fc2c8f984ee9c96cbd0d0"
},
"downloads": -1,
"filename": "reelscraper-2.2.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "0fb02bd1eb6b4973834436c7e8588bd9",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.9",
"size": 19091,
"upload_time": "2025-01-15T01:11:49",
"upload_time_iso_8601": "2025-01-15T01:11:49.370851Z",
"url": "https://files.pythonhosted.org/packages/b7/ef/9f17d164c183da2e5f593ebe44ead05ed23a1a36f49f770f46f3299c4dd0/reelscraper-2.2.2-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "63adb1d6bc0a7c96218f33a7c89d74471ce72e4dcf5d64b08880e11dc42ecd29",
"md5": "dbc0776ab2f5592b1c7a642d3962e0a0",
"sha256": "15e9410a88908e753bc7e832919c90f81ac975285476fc51155ba855bce04783"
},
"downloads": -1,
"filename": "reelscraper-2.2.2.tar.gz",
"has_sig": false,
"md5_digest": "dbc0776ab2f5592b1c7a642d3962e0a0",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9",
"size": 34240,
"upload_time": "2025-01-15T01:11:52",
"upload_time_iso_8601": "2025-01-15T01:11:52.101977Z",
"url": "https://files.pythonhosted.org/packages/63/ad/b1d6bc0a7c96218f33a7c89d74471ce72e4dcf5d64b08880e11dc42ecd29/reelscraper-2.2.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-01-15 01:11:52",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "andreaaazo",
"github_project": "reelscraper",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [
{
"name": "certifi",
"specs": [
[
"==",
"2024.12.14"
]
]
},
{
"name": "charset-normalizer",
"specs": [
[
"==",
"3.4.1"
]
]
},
{
"name": "et_xmlfile",
"specs": [
[
"==",
"2.0.0"
]
]
},
{
"name": "fake-useragent",
"specs": [
[
"==",
"2.0.3"
]
]
},
{
"name": "idna",
"specs": [
[
"==",
"3.10"
]
]
},
{
"name": "openpyxl",
"specs": [
[
"==",
"3.1.5"
]
]
},
{
"name": "PyYAML",
"specs": [
[
"==",
"6.0.2"
]
]
},
{
"name": "requests",
"specs": [
[
"==",
"2.32.3"
]
]
},
{
"name": "urllib3",
"specs": [
[
"==",
"2.3.0"
]
]
},
{
"name": "SQLAlchemy",
"specs": [
[
"==",
"2.0.37"
]
]
}
],
"lcname": "reelscraper"
}