# IMDB Scraper
This repo is for:
1) Scraping content from the IMDb website
2) A REST API for IMDb content
   - Static data: hosted on MongoDB
   - Dynamic data: scraped from IMDb on request
### Link for API and documentation: https://imdb-rest-api.herokuapp.com/
# IMDb Data Extraction Tools
Comprehensive IMDb scraping and data extraction toolkit for:
1) **Data Extraction**: Scraping content from IMDb website using modern GraphQL APIs
2) **REST API**: Legacy API for IMDb content
   - Static data: hosted on MongoDB
   - Dynamic data: scraped from IMDb on request
## IMDb Data Extraction Tools (ImdbDataExtraction)
Modern IMDb data extraction toolkit using GraphQL APIs for fast and reliable scraping.
## Features
### 🎬 Movie & TV Data
- **Pages Downloader**: Bulk movie/TV show scraping with pagination
- **Movie Info**: Detailed metadata extraction by ID
- **Search by String**: Find movies and people by search terms
- **Trending Movies**: Get currently trending content
### 👥 People Data
- **People Downloader**: Bulk celebrity/crew data extraction
- **Search by ID**: Get detailed person information
### 🎥 Media Content
- **Video Downloader**: Extract and download video trailers/clips
- **Video Gallery**: Get all videos from movie/show pages
- **Images Downloader**: High-quality poster and still downloads
- **Reviews Downloader**: Complete review extraction
## Usage
### Movie & TV Data
```bash
# Get bulk movie data (20 movies per page)
cd ImdbDataExtraction/pages_dowloader/
python3 scrape_all_movie_list.py --max-pages 5

# Look up a specific movie by IMDb ID
cd ../search_by_id/
python3 search_movie.py tt0944947  # Game of Thrones

# Search by text
cd ../search_by_string/
python3 search_by_string.py "batman" --limit 10

# Get trending movies
cd ../trending_downloader/
python3 trending_movies.py --count 10
```
### People Data
```bash
# Get bulk people data (run from ImdbDataExtraction/)
cd people_downloader/
python3 scrape_all_people.py --max-pages 3
```
### Video Content
```bash
# Extract all videos from a movie (run from ImdbDataExtraction/)
cd videos_downloader/
python3 extract_video_ids_from_gallery.py

# Download a specific video
python3 download_video_from_id.py
```
## Key Improvements
- ✅ **GraphQL APIs**: Direct API access (no HTML parsing)
- ⚡ **Pagination**: Handle large datasets efficiently
- 🛡️ **Rate Limiting**: Built-in delays to avoid blocking
- 📊 **Comprehensive Data**: Movies, people, videos, images, reviews
- 🔍 **Search Capabilities**: Text search, ID lookup, trending content
- 📝 **JSON Output**: Structured data format
- 🎯 **Multiple Endpoints**: GraphQL + Suggestions API
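
The pagination and rate-limiting points above can be sketched generically. This is not the repository's code: `fetch_page` is a hypothetical callable standing in for whatever request a script makes, and both the cursor-based paging and the fixed one-second default delay are assumptions chosen for illustration.

```python
import time
from typing import Callable, Iterator, Optional, Tuple

# Hypothetical signature: takes a cursor (None for the first page) and
# returns (items, next_cursor); next_cursor is None on the last page.
FetchPage = Callable[[Optional[str]], Tuple[list, Optional[str]]]


def iter_pages(
    fetch_page: FetchPage,
    max_pages: int,
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[list]:
    """Yield up to max_pages pages, pausing between consecutive requests."""
    cursor = None
    for page_number in range(max_pages):
        if page_number > 0:
            sleep(delay_seconds)  # fixed delay between requests (assumed policy)
        items, cursor = fetch_page(cursor)
        yield items
        if cursor is None:  # no further pages
            break
```

Passing `sleep` as a parameter keeps the delay policy swappable, for example to skip real waiting in tests.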
## Project Structure
```
ImdbDataExtraction/
├── pages_dowloader/       # Movie/TV bulk scraping
├── search_by_id/          # Individual lookups
├── search_by_string/      # Text-based search
├── people_downloader/     # Celebrity/crew data
├── videos_downloader/     # Video content
├── images_dowloader/      # Image content
├── review_downloader/     # Review extraction
└── trending_downloader/   # Trending content
```
## Installation
```bash
# Install dependencies
pip install requests
# Optional: For video downloads
brew install ffmpeg # macOS
```
## API Endpoints
- **GraphQL**: `https://caching.graphql.imdb.com/`
- **Suggestions**: `https://v3.sg.media-imdb.com/suggestion`
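
This README does not document the request format for either endpoint. As a hedged illustration only, the sketch below builds a Suggestions URL. The path layout (`/<first character>/<query>.json`) is an assumption about how this endpoint is commonly called and may not match what the scripts in this repo send; `suggestion_url` is a hypothetical helper name.

```python
from urllib.parse import quote

SUGGESTION_BASE = "https://v3.sg.media-imdb.com/suggestion"  # from the list above


def suggestion_url(query: str) -> str:
    """Build a Suggestions URL; the path layout is an assumption, not confirmed here."""
    term = query.strip().lower()
    if not term:
        raise ValueError("query must not be empty")
    # Assumed layout: <base>/<first character>/<url-encoded query>.json
    return f"{SUGGESTION_BASE}/{quote(term[0], safe='')}/{quote(term, safe='')}.json"


print(suggestion_url("batman"))
```

The resulting URL could then be fetched with `requests.get(url, timeout=10)`; the shape of the response is not described in this README, so inspect it before relying on any field.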
## Data Output Examples
### Movie Data
```json
{
  "id": "tt0944947",
  "title": "Game of Thrones",
  "year": 2011,
  "rating": 9.2,
  "genres": ["Action", "Adventure", "Drama"],
  "cast": [...],
  "videos": [...]
}
```
### Person Data
```json
{
  "id": "nm0001191",
  "name": "Adam Sandler",
  "professions": ["Actor", "Producer"],
  "knownFor": [...],
  "birthDate": "1966-09-09"
}
```
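
A minimal sketch of reading such records in Python, assuming a script writes one JSON object shaped like the movie example above. The `Movie` dataclass and `parse_movie` helper are hypothetical names, and treating every key except `id` and `title` as optional is an assumption, since this README does not say which keys are always present.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Movie:
    id: str
    title: str
    year: Optional[int] = None
    rating: Optional[float] = None
    genres: List[str] = field(default_factory=list)


def parse_movie(raw: str) -> Movie:
    """Parse one movie record; keys not modeled here (cast, videos) are ignored."""
    data = json.loads(raw)
    return Movie(
        id=data["id"],
        title=data["title"],
        year=data.get("year"),
        rating=data.get("rating"),
        genres=list(data.get("genres", [])),
    )


sample = (
    '{"id": "tt0944947", "title": "Game of Thrones", "year": 2011, '
    '"rating": 9.2, "genres": ["Action", "Adventure", "Drama"]}'
)
movie = parse_movie(sample)
print(movie.title, movie.year)
```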
## Legacy Movie Data API
Path parameters:
- `id`: IMDb ID, for example `tt4154796`
- `lan`: one of `telugu`, `tamil`, `upcoming`
```
Endpoint                     Methods  Rule
---------------------------  -------  --------------------------------------
home                         GET      /
ScrapMovieNow                GET      /api/livescraper/movie/<id>
SearchById                   GET      /api/imdbid/<id>
SearchImagesById             GET      /api/images/<id>
genre                        GET      /api/genre/<genre>
movie                        GET      /api/movie/<movie>
scrapeReviewsNow             GET      /api/livescraper/reviews/<id>
scrapeReviewsNowAndDownload  GET      /api/livescraper/download/reviews/<id>
scrapeSearchByTitle          GET      /api/livescraper/title/<title>
scrapeTvshow                 GET      /api/livescraper/tv/<id>
scrapeTvshowAndDownload      GET      /api/livescraper/download/tv/<id>
trendingIndia                GET      /api/livescraper/trendingIndia/<lan>
```
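
As a rough sketch, full request URLs for some of the rules above can be assembled like this. The base URL is the API link given at the top of this README; whether that host is still reachable is not confirmed here. The short keys in `RULES` and the `legacy_url` helper are hypothetical names introduced for illustration.

```python
from urllib.parse import quote

BASE_URL = "https://imdb-rest-api.herokuapp.com"  # API link from this README

# Rule templates copied from the table above; the short keys are hypothetical.
RULES = {
    "live_movie": "/api/livescraper/movie/{value}",
    "by_imdb_id": "/api/imdbid/{value}",
    "images": "/api/images/{value}",
    "genre": "/api/genre/{value}",
    "live_reviews": "/api/livescraper/reviews/{value}",
    "live_title": "/api/livescraper/title/{value}",
    "trending_india": "/api/livescraper/trendingIndia/{value}",
}


def legacy_url(rule: str, value: str) -> str:
    """Return the full GET URL for one rule, URL-encoding the path parameter."""
    return BASE_URL + RULES[rule].format(value=quote(value, safe=""))


print(legacy_url("by_imdb_id", "tt4154796"))
print(legacy_url("trending_india", "telugu"))
```

All listed rules accept only GET, so the URL is the whole request. The response format is not described in this README.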
## Legal Notice
This tool is for educational and research purposes. Respect IMDb's terms of service and rate limits.
Raw data
{
"_id": null,
"home_page": "https://github.com/pavan412kalyan/imdb-movie-scraper",
"name": "real-imdb",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.7",
"maintainer_email": null,
"keywords": "imdb, scraper, movies, graphql, api",
"author": "Pavan Kalyan",
"author_email": "Pavan Kalyan <your.email@example.com>",
"download_url": "https://files.pythonhosted.org/packages/0e/c2/0632bee59bf0f1922b37d23a89a54906aad5763d004f642606d08770efa5/real_imdb-1.0.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "A comprehensive IMDb scraper using GraphQL API",
"version": "1.0.0",
"project_urls": {
"Homepage": "https://github.com/pavan412kalyan/imdb-movie-scraper",
"Issues": "https://github.com/pavan412kalyan/imdb-movie-scraper/issues",
"Repository": "https://github.com/pavan412kalyan/imdb-movie-scraper"
},
"split_keywords": [
"imdb",
" scraper",
" movies",
" graphql",
" api"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "096f6bbc8024761642c4959ea02e6efe772bef5877a1b18bbfe74e0f53622fe8",
"md5": "8cf05ed6e6b6c8455981c9043c4e1196",
"sha256": "2dfd2329f75a23758e1ec020ef5b9acf3c51634322906ba7f5d2f67ba893761d"
},
"downloads": -1,
"filename": "real_imdb-1.0.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "8cf05ed6e6b6c8455981c9043c4e1196",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.7",
"size": 3458,
"upload_time": "2025-08-23T05:49:36",
"upload_time_iso_8601": "2025-08-23T05:49:36.594583Z",
"url": "https://files.pythonhosted.org/packages/09/6f/6bbc8024761642c4959ea02e6efe772bef5877a1b18bbfe74e0f53622fe8/real_imdb-1.0.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "0ec20632bee59bf0f1922b37d23a89a54906aad5763d004f642606d08770efa5",
"md5": "c379132001e262b117c0971c3b1ed8b2",
"sha256": "9df6458436d6113834b9575fda067c31437bd26500c968069fd20bbddd2db24a"
},
"downloads": -1,
"filename": "real_imdb-1.0.0.tar.gz",
"has_sig": false,
"md5_digest": "c379132001e262b117c0971c3b1ed8b2",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.7",
"size": 6708497,
"upload_time": "2025-08-23T05:49:39",
"upload_time_iso_8601": "2025-08-23T05:49:39.492050Z",
"url": "https://files.pythonhosted.org/packages/0e/c2/0632bee59bf0f1922b37d23a89a54906aad5763d004f642606d08770efa5/real_imdb-1.0.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-08-23 05:49:39",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "pavan412kalyan",
"github_project": "imdb-movie-scraper",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [
{
"name": "requests",
"specs": [
[
"==",
"2.22.0"
]
]
},
{
"name": "Flask",
"specs": [
[
"==",
"1.1.1"
]
]
},
{
"name": "pandas",
"specs": [
[
"==",
"1.3.1"
]
]
},
{
"name": "pymongo",
"specs": [
[
"==",
"3.11.3"
]
]
},
{
"name": "beautifulsoup4",
"specs": [
[
"==",
"4.9.3"
]
]
},
{
"name": "dnspython",
"specs": [
[
"==",
"1.16.0"
]
]
}
],
"lcname": "real-imdb"
}