real-imdb


- **Name**: real-imdb
- **Version**: 1.0.0
- **Home page**: https://github.com/pavan412kalyan/imdb-movie-scraper
- **Summary**: A comprehensive IMDb scraper using GraphQL API
- **Upload time**: 2025-08-23 05:49:39
- **Author**: Pavan Kalyan
- **Requires Python**: >=3.7
- **License**: MIT
- **Keywords**: imdb, scraper, movies, graphql, api
- **Requirements**: requests, Flask, pandas, pymongo, beautifulsoup4, dnspython

# IMDb Scraper

A comprehensive IMDb scraping and data extraction toolkit. This repo provides:
1) **Data Extraction**: Scraping content from the IMDb website using modern GraphQL APIs
2) **REST API**: Legacy API for IMDb content
   - Static data - hosted on MongoDB
   - Dynamic data - scraped from IMDb on request

### Link for API and documentation: https://imdb-rest-api.herokuapp.com/

## IMDb Data Extraction Tools (ImdbDataExtraction)

Modern IMDb data extraction toolkit using GraphQL APIs for fast and reliable scraping.

## Features

### 🎬 Movie & TV Data
- **Pages Downloader**: Bulk movie/TV show scraping with pagination
- **Movie Info**: Detailed metadata extraction by ID
- **Search by String**: Find movies and people by search terms
- **Trending Movies**: Get currently trending content

### 👥 People Data
- **People Downloader**: Bulk celebrity/crew data extraction
- **Search by ID**: Get detailed person information

### 🎥 Media Content
- **Video Downloader**: Extract and download video trailers/clips
- **Video Gallery**: Get all videos from movie/show pages
- **Images Downloader**: High-quality poster and still downloads
- **Reviews Downloader**: Complete review extraction

## Usage

### Movie & TV Data
```bash
# Get bulk movie data (20 movies per page)
cd ImdbDataExtraction/pages_dowloader/
python3 scrape_all_movie_list.py --max-pages 5

# Search for specific movie
cd ../search_by_id/
python3 search_movie.py tt0944947  # Game of Thrones

# Search by text
cd ../search_by_string/
python3 search_by_string.py "batman" --limit 10

# Get trending movies
cd ../trending_downloader/
python3 trending_movies.py --count 10
```
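
To look up several titles in one go, the `search_movie.py` command above can be driven from Python. This is a minimal sketch, assuming the script is run from `ImdbDataExtraction/search_by_id/` and takes a single title ID as its only argument, as in the example above; `build_lookup_command`, `lookup_many`, and the one-second pause are hypothetical and not part of the repo.

```python
import subprocess
import sys
import time
from pathlib import Path
from typing import Iterable, List

# Assumed location of the lookup script, taken from the commands above.
SCRIPT_DIR = Path("ImdbDataExtraction/search_by_id")


def build_lookup_command(imdb_id: str) -> List[str]:
    """Return the argv list for one search_movie.py lookup."""
    if not imdb_id.startswith("tt"):
        raise ValueError(f"expected a title ID starting with 'tt', got {imdb_id!r}")
    return [sys.executable, "search_movie.py", imdb_id]


def lookup_many(imdb_ids: Iterable[str], delay_seconds: float = 1.0) -> None:
    """Run the lookup script once per ID, pausing between runs."""
    for imdb_id in imdb_ids:
        subprocess.run(build_lookup_command(imdb_id), cwd=SCRIPT_DIR, check=True)
        time.sleep(delay_seconds)


if __name__ == "__main__":
    lookup_many(["tt0944947"])
```

Passing the arguments as a list avoids shell quoting issues, and `check=True` stops the batch at the first failing lookup.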

### People Data
```bash
# Get bulk people data (run from the repository root)
cd ImdbDataExtraction/people_downloader/
python3 scrape_all_people.py --max-pages 3
```

### Video Content
```bash
# Extract all videos from a movie (run from the repository root)
cd ImdbDataExtraction/videos_downloader/
python3 extract_video_ids_from_gallery.py

# Download specific video
python3 download_video_from_id.py
```

## Key Improvements

- ✅ **GraphQL APIs**: Direct API access (no HTML parsing)
- ⚡ **Pagination**: Handle large datasets efficiently
- 🛡️ **Rate Limiting**: Built-in delays to avoid blocking
- 📊 **Comprehensive Data**: Movies, people, videos, images, reviews
- 🔍 **Search Capabilities**: Text search, ID lookup, trending content
- 📝 **JSON Output**: Structured data format
- 🎯 **Multiple Endpoints**: GraphQL + Suggestions API
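
The pagination and rate-limiting points above can be illustrated with a small generator. This is only a sketch of the general pattern, not the repo's implementation: `iter_pages`, the `fetch_page` callable, and the cursor convention (a function that returns the items plus the next cursor, or `None` when there are no more pages) are all assumptions made for the example.

```python
import time
from typing import Callable, Dict, Iterator, List, Optional, Tuple

# (items on this page, cursor for the next page or None when exhausted)
Page = Tuple[List[Dict], Optional[str]]


def iter_pages(
    fetch_page: Callable[[Optional[str]], Page],
    max_pages: int,
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[List[Dict]]:
    """Yield up to max_pages pages, sleeping between consecutive requests."""
    cursor: Optional[str] = None
    for page_number in range(max_pages):
        items, cursor = fetch_page(cursor)
        yield items
        if cursor is None:
            break  # the source has no further pages
        if page_number + 1 < max_pages:
            sleep(delay_seconds)  # fixed delay before the next request


def fake_fetch(cursor: Optional[str]) -> Page:
    """Stand-in for a real request so the sketch runs offline."""
    data = {
        None: ([{"id": "tt1"}], "a"),
        "a": ([{"id": "tt2"}], "b"),
        "b": ([{"id": "tt3"}], None),
    }
    return data[cursor]


if __name__ == "__main__":
    for page in iter_pages(fake_fetch, max_pages=5, delay_seconds=0.1):
        print(page)
```

Injecting `sleep` keeps the delay testable, and a real `fetch_page` would wrap whatever request the scraper makes for a single page.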

## Project Structure

```
ImdbDataExtraction/
├── pages_dowloader/           # Movie/TV bulk scraping
├── search_by_id/              # Individual lookups
├── search_by_string/          # Text-based search
├── people_downloader/         # Celebrity/crew data
├── videos_downloader/         # Video content
├── images_dowloader/          # Image content
├── review_downloader/         # Review extraction
└── trending_downloader/       # Trending content
```

## Installation

```bash
# Install dependencies
pip install requests

# Optional: For video downloads
brew install ffmpeg  # macOS
```



## API Endpoints

- **GraphQL**: `https://caching.graphql.imdb.com/`
- **Suggestions**: `https://v3.sg.media-imdb.com/suggestion`
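
As a rough illustration of how a request to the GraphQL endpoint could be assembled, the sketch below builds a standard GraphQL-over-HTTP POST (a JSON body with `query` and `variables`) without sending it. The query text and its field names are placeholders assumed for the example and are not a confirmed IMDb schema; `build_graphql_request` is a hypothetical helper, not a function from this package.

```python
import json
import urllib.request

GRAPHQL_URL = "https://caching.graphql.imdb.com/"  # endpoint listed above

# Hypothetical query: the field names are placeholders, not a confirmed schema.
TITLE_QUERY = """
query Title($id: ID!) {
  title(id: $id) {
    titleText { text }
  }
}
"""


def build_graphql_request(query: str, variables: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request carrying a GraphQL query."""
    body = json.dumps({"query": query, "variables": variables}).encode("utf-8")
    return urllib.request.Request(
        GRAPHQL_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_graphql_request(TITLE_QUERY, {"id": "tt0944947"})
```

Sending it would be a matter of passing `request` to `urllib.request.urlopen`. Whether the endpoint accepts this particular query, or requires extra headers, is not something the README states.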

## Data Output Examples

### Movie Data
```json
{
  "id": "tt0944947",
  "title": "Game of Thrones",
  "year": 2011,
  "rating": 9.2,
  "genres": ["Action", "Adventure", "Drama"],
  "cast": [...],
  "videos": [...]
}
```
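
A record shaped like the one above can be loaded into a typed structure for downstream use. This is a minimal sketch, assuming the scalar fields shown in the example are present; the `Movie` dataclass and `parse_movie` are hypothetical names, and the elided `cast` and `videos` lists are left out because their structure is not shown.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class Movie:
    id: str
    title: str
    year: int
    rating: float
    genres: List[str] = field(default_factory=list)


def parse_movie(raw: str) -> Movie:
    """Parse one movie record from its JSON text."""
    data = json.loads(raw)
    return Movie(
        id=data["id"],
        title=data["title"],
        year=int(data["year"]),
        rating=float(data["rating"]),
        genres=list(data.get("genres", [])),
    )


sample = (
    '{"id": "tt0944947", "title": "Game of Thrones", "year": 2011, '
    '"rating": 9.2, "genres": ["Action", "Adventure", "Drama"]}'
)
movie = parse_movie(sample)
```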

### Person Data
```json
{
  "id": "nm0001191",
  "name": "Adam Sandler",
  "professions": ["Actor", "Producer"],
  "knownFor": [...],
  "birthDate": "1966-09-09"
}
```

## Legacy Movie Data API

Path parameters:
- `id`: an IMDb ID, for example `tt4154796`
- `lan`: `telugu`, `tamil`, or `upcoming`

```
Endpoint                     Methods  Rule
---------------------------  -------  --------------------------------------
home                         GET      /
ScrapMovieNow                GET      /api/livescraper/movie/<id>
SearchById                   GET      /api/imdbid/<id>
SearchImagesById             GET      /api/images/<id>
genre                        GET      /api/genre/<genre>
movie                        GET      /api/movie/<movie>
scrapeReviewsNow             GET      /api/livescraper/reviews/<id>
scrapeReviewsNowAndDownload  GET      /api/livescraper/download/reviews/<id>
scrapeSearchByTitle          GET      /api/livescraper/title/<title>
scrapeTvshow                 GET      /api/livescraper/tv/<id>
scrapeTvshowAndDownload      GET      /api/livescraper/download/tv/<id>
trendingIndia                GET      /api/livescraper/trendingIndia/<lan>
```
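
The route table above maps directly to URLs on the hosted API. The sketch below builds such URLs without sending any request. It assumes the base URL is the documentation link given earlier in this README; `ROUTES` copies only a few of the rules from the table, and `build_url` is a hypothetical helper.

```python
from urllib.parse import quote, urljoin

# Assumed base URL, taken from the documentation link in this README.
BASE_URL = "https://imdb-rest-api.herokuapp.com/"

# A subset of the rules from the table above, with <name> rewritten as {name}.
ROUTES = {
    "SearchById": "/api/imdbid/{id}",
    "ScrapMovieNow": "/api/livescraper/movie/{id}",
    "scrapeSearchByTitle": "/api/livescraper/title/{title}",
    "trendingIndia": "/api/livescraper/trendingIndia/{lan}",
}


def build_url(endpoint: str, **params: str) -> str:
    """Fill a route rule with percent-encoded path parameters."""
    rule = ROUTES[endpoint]
    encoded = {name: quote(value, safe="") for name, value in params.items()}
    return urljoin(BASE_URL, rule.format(**encoded))


if __name__ == "__main__":
    print(build_url("SearchById", id="tt4154796"))
    print(build_url("scrapeSearchByTitle", title="the dark knight"))
    print(build_url("trendingIndia", lan="telugu"))
```

Percent-encoding each parameter keeps titles with spaces or slashes from breaking the path.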

## Legal Notice

This tool is for educational and research purposes. Respect IMDb's terms of service and rate limits.





   

            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/pavan412kalyan/imdb-movie-scraper",
    "name": "real-imdb",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.7",
    "maintainer_email": null,
    "keywords": "imdb, scraper, movies, graphql, api",
    "author": "Pavan Kalyan",
    "author_email": "Pavan Kalyan <your.email@example.com>",
    "download_url": "https://files.pythonhosted.org/packages/0e/c2/0632bee59bf0f1922b37d23a89a54906aad5763d004f642606d08770efa5/real_imdb-1.0.0.tar.gz",
    "platform": null,
    "description": "# IMDB Scraper\n\nThis repo is for:\n1) Scraping content on IMDB website\n2) REST API for content of IMDB\n   - Static data - hosted on MongoDB\n   - Dynamic data - scraping from IMDB on Request\n\n### Link for API and documentation: https://imdb-rest-api.herokuapp.com/\n# IMDb Data Extraction Tools\n\nComprehensive IMDb scraping and data extraction toolkit for:\n1) **Data Extraction**: Scraping content from IMDb website using modern GraphQL APIs\n2) **REST API**: Legacy API for IMDb content\n   - Static data - hosted on MongoDB\n   - Dynamic data - scraping from IMDb on Request\n\n### Link for API and documentation: https://imdb-rest-api.herokuapp.com/\n\n## IMDb Data Extraction Tools (ImdbDataExtraction)\n\nModern IMDb data extraction toolkit using GraphQL APIs for fast and reliable scraping.\n\n## Features\n\n### \ud83c\udfac Movie & TV Data\n- **Pages Downloader**: Bulk movie/TV show scraping with pagination\n- **Movie Info**: Detailed metadata extraction by ID\n- **Search by String**: Find movies and people by search terms\n- **Trending Movies**: Get currently trending content\n\n### \ud83d\udc65 People Data\n- **People Downloader**: Bulk celebrity/crew data extraction\n- **Search by ID**: Get detailed person information\n\n### \ud83c\udfa5 Media Content\n- **Video Downloader**: Extract and download video trailers/clips\n- **Video Gallery**: Get all videos from movie/show pages\n- **Images Downloader**: High-quality poster and still downloads\n- **Reviews Downloader**: Complete review extraction\n\n## Usage\n\n### Movie & TV Data\n```bash\n# Get bulk movie data (20 movies per page)\ncd ImdbDataExtraction/pages_dowloader/\npython3 scrape_all_movie_list.py --max-pages 5\n\n# Search for specific movie\ncd ../search_by_id/\npython3 search_movie.py tt0944947  # Game of Thrones\n\n# Search by text\ncd ../search_by_string/\npython3 search_by_string.py \"batman\" --limit 10\n\n# Get trending movies\ncd ../trending_downloader/\npython3 
trending_movies.py --count 10\n```\n\n### People Data\n```bash\n# Get bulk people data\ncd people_downloader/\npython3 scrape_all_people.py --max-pages 3\n```\n\n### Video Content\n```bash\n# Extract all videos from a movie\ncd videos_downloader/\npython3 extract_video_ids_from_gallery.py\n\n# Download specific video\npython3 download_video_from_id.py\n```\n\n## Key Improvements\n\n- \u2705 **GraphQL APIs**: Direct API access (no HTML parsing)\n- \u26a1 **Pagination**: Handle large datasets efficiently\n- \ud83d\udee1\ufe0f **Rate Limiting**: Built-in delays to avoid blocking\n- \ud83d\udcca **Comprehensive Data**: Movies, people, videos, images, reviews\n- \ud83d\udd0d **Search Capabilities**: Text search, ID lookup, trending content\n- \ud83d\udcdd **JSON Output**: Structured data format\n- \ud83c\udfaf **Multiple Endpoints**: GraphQL + Suggestions API\n\n## Project Structure\n\n```\nImdbDataExtraction/\n\u251c\u2500\u2500 pages_dowloader/           # Movie/TV bulk scraping\n\u251c\u2500\u2500 search_by_id/              # Individual lookups\n\u251c\u2500\u2500 search_by_string/          # Text-based search\n\u251c\u2500\u2500 people_downloader/         # Celebrity/crew data\n\u251c\u2500\u2500 videos_downloader/         # Video content\n\u251c\u2500\u2500 images_dowloader/          # Image content\n\u251c\u2500\u2500 review_downloader/         # Review extraction\n\u2514\u2500\u2500 trending_downloader/       # Trending content\n```\n\n## Installation\n\n```bash\n# Install dependencies\npip install requests\n\n# Optional: For video downloads\nbrew install ffmpeg  # macOS\n```\n\n\n\n## API Endpoints\n\n- **GraphQL**: `https://caching.graphql.imdb.com/`\n- **Suggestions**: `https://v3.sg.media-imdb.com/suggestion`\n\n## Data Output Examples\n\n### Movie Data\n```json\n{\n  \"id\": \"tt0944947\",\n  \"title\": \"Game of Thrones\",\n  \"year\": 2011,\n  \"rating\": 9.2,\n  \"genres\": [\"Action\", \"Adventure\", \"Drama\"],\n  \"cast\": [...],\n  \"videos\": 
[...]\n}\n```\n\n### Person Data\n```json\n{\n  \"id\": \"nm0001191\",\n  \"name\": \"Adam Sandler\",\n  \"professions\": [\"Actor\", \"Producer\"],\n  \"knownFor\": [...],\n  \"birthDate\": \"1966-09-09\"\n}\n```\n\n## Legacy Movie Data API\nid -->  ImdbId Example -  tt4154796\nlan --> telugu,tamil,upcoming\n```\nEndpoint                     Methods  Rule\n---------------------------  -------  --------------------------------------\nhome                         GET      /\nScrapMovieNow                GET      /api/livescraper/movie/<id>\nSearchById                   GET      /api/imdbid/<id>\nSearchImagesById             GET      /api/images/<id>\ngenre                        GET      /api/genre/<genre>\nmovie                        GET      /api/movie/<movie>\nscrapeReviewsNow             GET      /api/livescraper/reviews/<id>\nscrapeReviewsNowAndDownload  GET      /api/livescraper/download/reviews/<id>\nscrapeSearchByTitle          GET      /api/livescraper/title/<title>\nscrapeTvshow                 GET      /api/livescraper/tv/<id>\nscrapeTvshowAndDownload      GET      /api/livescraper/download/tv/<id>\ntrendingIndia                GET      /api/livescraper/trendingIndia/<lan>\n```\n\n## Legal Notice\n\nThis tool is for educational and research purposes. Respect IMDb's terms of service and rate limits.\n\n\n\n\n\n   \n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "A comprehensive IMDb scraper using GraphQL API",
    "version": "1.0.0",
    "project_urls": {
        "Homepage": "https://github.com/pavan412kalyan/imdb-movie-scraper",
        "Issues": "https://github.com/pavan412kalyan/imdb-movie-scraper/issues",
        "Repository": "https://github.com/pavan412kalyan/imdb-movie-scraper"
    },
    "split_keywords": [
        "imdb",
        " scraper",
        " movies",
        " graphql",
        " api"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "096f6bbc8024761642c4959ea02e6efe772bef5877a1b18bbfe74e0f53622fe8",
                "md5": "8cf05ed6e6b6c8455981c9043c4e1196",
                "sha256": "2dfd2329f75a23758e1ec020ef5b9acf3c51634322906ba7f5d2f67ba893761d"
            },
            "downloads": -1,
            "filename": "real_imdb-1.0.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "8cf05ed6e6b6c8455981c9043c4e1196",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.7",
            "size": 3458,
            "upload_time": "2025-08-23T05:49:36",
            "upload_time_iso_8601": "2025-08-23T05:49:36.594583Z",
            "url": "https://files.pythonhosted.org/packages/09/6f/6bbc8024761642c4959ea02e6efe772bef5877a1b18bbfe74e0f53622fe8/real_imdb-1.0.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "0ec20632bee59bf0f1922b37d23a89a54906aad5763d004f642606d08770efa5",
                "md5": "c379132001e262b117c0971c3b1ed8b2",
                "sha256": "9df6458436d6113834b9575fda067c31437bd26500c968069fd20bbddd2db24a"
            },
            "downloads": -1,
            "filename": "real_imdb-1.0.0.tar.gz",
            "has_sig": false,
            "md5_digest": "c379132001e262b117c0971c3b1ed8b2",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.7",
            "size": 6708497,
            "upload_time": "2025-08-23T05:49:39",
            "upload_time_iso_8601": "2025-08-23T05:49:39.492050Z",
            "url": "https://files.pythonhosted.org/packages/0e/c2/0632bee59bf0f1922b37d23a89a54906aad5763d004f642606d08770efa5/real_imdb-1.0.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-08-23 05:49:39",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "pavan412kalyan",
    "github_project": "imdb-movie-scraper",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [
        {
            "name": "requests",
            "specs": [
                [
                    "==",
                    "2.22.0"
                ]
            ]
        },
        {
            "name": "Flask",
            "specs": [
                [
                    "==",
                    "1.1.1"
                ]
            ]
        },
        {
            "name": "pandas",
            "specs": [
                [
                    "==",
                    "1.3.1"
                ]
            ]
        },
        {
            "name": "pymongo",
            "specs": [
                [
                    "==",
                    "3.11.3"
                ]
            ]
        },
        {
            "name": "beautifulsoup4",
            "specs": [
                [
                    "==",
                    "4.9.3"
                ]
            ]
        },
        {
            "name": "dnspython",
            "specs": [
                [
                    "==",
                    "1.16.0"
                ]
            ]
        }
    ],
    "lcname": "real-imdb"
}
        