scrapxd


* **Name:** scrapxd
* **Version:** 0.1.1
* **Summary:** A Python library for scraping, analysing and exporting Letterboxd data.
* **Upload time:** 2025-10-21 14:52:42
* **Requires Python:** >=3.10
* **License:** MIT License. Copyright (c) 2025 Cauã Santos. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
* **Keywords:** letterboxd, scraper, web scraping, api, movies
* **Requirements:** annotated-types, beautifulsoup4, bs4, certifi, charset-normalizer, coverage, et_xmlfile, fake-useragent, idna, iniconfig, lxml, numpy, openpyxl, packaging, pluggy, pydantic, pydantic_core, Pygments, pytest, pytest-cov, pytest-dependency, pytest-mock, requests, scipy, setuptools, six, soupsieve, style, tenacity, typing-inspection, typing_extensions, update, urllib3
            
# scrapxd: The Library for Letterboxd Data

[![PyPI Version](https://img.shields.io/pypi/v/scrapxd.svg)](https://pypi.org/project/scrapxd/)
[![Python Versions](https://img.shields.io/pypi/pyversions/scrapxd.svg)](https://pypi.org/project/scrapxd/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Build Status](https://img.shields.io/badge/build-passing-brightgreen.svg)](https://github.com/cauafsantosdev/scrapxd)

`scrapxd` is a Python library for scraping, analyzing, and exporting data from [Letterboxd](https://letterboxd.com/), the social network for cinephiles. Its strictly typed API, built on Pydantic, makes it easy to access user profiles, film lists, diaries, and much more.

---

## Key Features

* **Scrape Whatever You Need:** Extract detailed data from user profiles, including watched films, ratings, diary entries, lists, followers, and more.
* **Film Search:** Search for films on Letterboxd based on various filters.
* **Pydantic Data Models:** All returned data is validated and structured into Pydantic models, ensuring consistency and ease of use in your code.
* **Analytics Module:** Perform statistical analysis on the collected data, such as correlations and trends (requires the `[analytics]` extra).
* **Exporting Module:** Export collected data to popular formats like CSV, JSON, and Excel (`.xlsx`) (requires the `[export]` extra).
* **Retry Logic:** Uses `tenacity` to retry automatically on network failures, making the scraping process more reliable.
* **Simple and Intuitive:** Designed with a clean and easy-to-use API, as demonstrated in the examples.
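
The retry behaviour mentioned in the list above can be pictured with a small standard-library sketch. This is not how `scrapxd` implements it (the library relies on `tenacity`, and its retry settings are not described in this README); `retry_call`, `flaky_fetch`, and the attempt and delay values are hypothetical names and numbers chosen only for illustration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_call(
    func: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.0,
    retry_on: tuple = (ConnectionError, TimeoutError),
) -> T:
    """Call func, retrying on the given exceptions with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            # Wait base_delay, 2*base_delay, 4*base_delay, ... between tries.
            time.sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")


# Hypothetical flaky operation: fails twice, then succeeds.
calls = {"count": 0}


def flaky_fetch() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated network failure")
    return "<html>ok</html>"


result = retry_call(flaky_fetch, attempts=3)
print(result, "after", calls["count"], "calls")
```

In real use, a non-zero `base_delay` would space out the attempts so that a struggling server is not hit again immediately.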

---

## Installation

You can install the library directly from PyPI.

**Standard Installation:**

```bash
pip install scrapxd
```

The library has optional dependencies for extra features. You can install them as needed:

**For Data Analytics:**

```bash
pip install "scrapxd[analytics]"
```

**For File Exporting:**

```bash
pip install "scrapxd[export]"
```

**To Install Everything (including testing dependencies):**

```bash
pip install "scrapxd[all]"
```
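
Before calling the analytics or export helpers, it can be handy to check whether the optional dependencies are importable. The sketch below uses only `importlib`; the mapping from extras to modules (`numpy` and `scipy` for analytics, `openpyxl` for export) is an assumption inferred from the package's requirements list, not something this README states.

```python
from importlib.util import find_spec

# Assumed mapping of extras to the modules they pull in (not confirmed by the README).
ASSUMED_EXTRA_MODULES = {
    "analytics": ["numpy", "scipy"],
    "export": ["openpyxl"],
}


def missing_modules(extra: str) -> list[str]:
    """Return the assumed modules for an extra that cannot be imported."""
    return [name for name in ASSUMED_EXTRA_MODULES[extra] if find_spec(name) is None]


for extra in ASSUMED_EXTRA_MODULES:
    missing = missing_modules(extra)
    if missing:
        print(f'{extra}: missing {missing}; try: pip install "scrapxd[{extra}]"')
    else:
        print(f"{extra}: ready")
```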

---

## Quickstart

The example below fetches a user's watched films, prints the first five, and optionally exports them to an Excel file:

```python
from scrapxd import Scrapxd

# 1. Create a client instance
client = Scrapxd()

# 2. Get data for a Letterboxd user
# The client handles searching and pagination automatically
user = client.get_user("your_username_here")
user_films = user.logs

# 3. Access the data
print(f"Total films watched by '{user_films.username}': {user_films.number_of_entries}")

# Each entry is a Pydantic object with structured data
for entry in user_films.entries[:5]: # Displaying the first 5
    print(f"- {entry.film.title} ({entry.film.year}) - Rating: {entry.rating}")

# 4. (Optional) Export the data to an Excel file
try:
    user_films.to_xlsx(f"{user_films.username}_films")
    print(f"\nData exported to {user_films.username}_films.xlsx")
except ImportError:
    print("\nTo export data, please install with: pip install \"scrapxd[export]\"")
```
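
If the `[export]` extra is not installed, entries can still be written out with the standard library. The sketch below uses stand-in dataclasses that mirror only the attributes shown in the Quickstart (`film.title`, `film.year`, `rating`); `FilmStub`, `EntryStub`, and `entries_to_csv` are hypothetical names, and the sample rows are made-up values rather than scraped data.

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class FilmStub:
    """Stand-in for a film model; only the fields used in the Quickstart."""
    title: str
    year: int


@dataclass
class EntryStub:
    """Stand-in for a log entry; rating is assumed to be None when unrated."""
    film: FilmStub
    rating: Optional[float]


def entries_to_csv(entries: Iterable[EntryStub]) -> str:
    """Serialize entries to CSV text with a title/year/rating header."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["title", "year", "rating"])
    for entry in entries:
        rating = "" if entry.rating is None else entry.rating
        writer.writerow([entry.film.title, entry.film.year, rating])
    return buffer.getvalue()


sample = [
    EntryStub(FilmStub("Example Film A", 1999), 4.5),
    EntryStub(FilmStub("Example Film B", 2010), None),
]
csv_text = entries_to_csv(sample)
print(csv_text)
```

Objects from `user_films.entries` could presumably be passed in place of the stubs, provided they expose the same attributes.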



---

## Detailed Examples

For in-depth guides covering features such as profile analysis, comparisons, and advanced use cases, see the Jupyter notebooks in the `/examples` folder:

* **[1. Quickstart Guide](./examples/1_quickstart_guide.ipynb)**
* **[2. Deep Dive Analysis](./examples/2_deep_dive_analysis.ipynb)**
* **[3. Comparing Profiles](./examples/3_comparing_profiles.ipynb)**
* **[4. Advanced Guide](./examples/4_advanced_guide.ipynb)**

---

## Contributing

Contributions are very welcome! If you have an idea for a new feature, find a bug, or want to improve the documentation, please open an [Issue](https://github.com/cauafsantosdev/scrapxd/issues) or submit a [Pull Request](https://github.com/cauafsantosdev/scrapxd/pulls).

---

## License

This project is licensed under the MIT License. See the [LICENSE](./LICENSE) file for more details.

---

## Contact

Cauã Santos - [My LinkedIn Profile](https://www.linkedin.com/in/cauafsantosdev/) - cauafsantosdev@gmail.com

GitHub URL: [https://github.com/cauafsantosdev/scrapxd](https://github.com/cauafsantosdev/scrapxd)

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "scrapxd",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.10",
    "maintainer_email": null,
    "keywords": "letterboxd, scraper, web scraping, api, movies",
    "author": null,
    "author_email": "Cau\u00e3 Santos <cauafsantosdev@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/b2/93/3d42c916fc7812834eeebab14380ad6b480171fc8d16c65baf5c7cde640c/scrapxd-0.1.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "summary": "A Python library for scraping, analysing and exporting Letterboxd data.",
    "version": "0.1.1",
    "project_urls": {
        "Bug Tracker": "https://github.com/cauafsantosdev/scrapxd/issues",
        "Homepage": "https://github.com/cauafsantosdev/scrapxd",
        "Repository": "https://github.com/cauafsantosdev/scrapxd"
    },
    "split_keywords": [
        "letterboxd",
        " scraper",
        " web scraping",
        " api",
        " movies"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "8eef83fffa942e9696020eb106c00d6c29513768c6cf884387de9d43813498c5",
                "md5": "5984dcc872002490af2b3177e99f944a",
                "sha256": "a3f714ebb144383ac31ba4188fd6fb704e4562336bb54c42989bbdc83373fe27"
            },
            "downloads": -1,
            "filename": "scrapxd-0.1.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "5984dcc872002490af2b3177e99f944a",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.10",
            "size": 37859,
            "upload_time": "2025-10-21T14:52:40",
            "upload_time_iso_8601": "2025-10-21T14:52:40.880437Z",
            "url": "https://files.pythonhosted.org/packages/8e/ef/83fffa942e9696020eb106c00d6c29513768c6cf884387de9d43813498c5/scrapxd-0.1.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "b2933d42c916fc7812834eeebab14380ad6b480171fc8d16c65baf5c7cde640c",
                "md5": "bfec41825a2f79fc0bc0cf63e5501b04",
                "sha256": "e12ca916a9652b88440d2f308e8fcec299e124fb35e2cadf4739f5de506e4c2f"
            },
            "downloads": -1,
            "filename": "scrapxd-0.1.1.tar.gz",
            "has_sig": false,
            "md5_digest": "bfec41825a2f79fc0bc0cf63e5501b04",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.10",
            "size": 44438,
            "upload_time": "2025-10-21T14:52:42",
            "upload_time_iso_8601": "2025-10-21T14:52:42.569154Z",
            "url": "https://files.pythonhosted.org/packages/b2/93/3d42c916fc7812834eeebab14380ad6b480171fc8d16c65baf5c7cde640c/scrapxd-0.1.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-10-21 14:52:42",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "cauafsantosdev",
    "github_project": "scrapxd",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [
        {
            "name": "annotated-types",
            "specs": [
                [
                    "==",
                    "0.7.0"
                ]
            ]
        },
        {
            "name": "beautifulsoup4",
            "specs": [
                [
                    "==",
                    "4.13.4"
                ]
            ]
        },
        {
            "name": "bs4",
            "specs": [
                [
                    "==",
                    "0.0.2"
                ]
            ]
        },
        {
            "name": "certifi",
            "specs": [
                [
                    "==",
                    "2025.4.26"
                ]
            ]
        },
        {
            "name": "charset-normalizer",
            "specs": [
                [
                    "==",
                    "3.4.2"
                ]
            ]
        },
        {
            "name": "coverage",
            "specs": [
                [
                    "==",
                    "7.10.7"
                ]
            ]
        },
        {
            "name": "et_xmlfile",
            "specs": [
                [
                    "==",
                    "2.0.0"
                ]
            ]
        },
        {
            "name": "fake-useragent",
            "specs": [
                [
                    "==",
                    "2.2.0"
                ]
            ]
        },
        {
            "name": "idna",
            "specs": [
                [
                    "==",
                    "3.10"
                ]
            ]
        },
        {
            "name": "iniconfig",
            "specs": [
                [
                    "==",
                    "2.1.0"
                ]
            ]
        },
        {
            "name": "lxml",
            "specs": [
                [
                    "==",
                    "5.4.0"
                ]
            ]
        },
        {
            "name": "numpy",
            "specs": [
                [
                    "==",
                    "2.3.2"
                ]
            ]
        },
        {
            "name": "openpyxl",
            "specs": [
                [
                    "==",
                    "3.1.5"
                ]
            ]
        },
        {
            "name": "packaging",
            "specs": [
                [
                    "==",
                    "25.0"
                ]
            ]
        },
        {
            "name": "pluggy",
            "specs": [
                [
                    "==",
                    "1.6.0"
                ]
            ]
        },
        {
            "name": "pydantic",
            "specs": [
                [
                    "==",
                    "2.11.5"
                ]
            ]
        },
        {
            "name": "pydantic_core",
            "specs": [
                [
                    "==",
                    "2.33.2"
                ]
            ]
        },
        {
            "name": "Pygments",
            "specs": [
                [
                    "==",
                    "2.19.1"
                ]
            ]
        },
        {
            "name": "pytest",
            "specs": [
                [
                    "==",
                    "8.4.0"
                ]
            ]
        },
        {
            "name": "pytest-cov",
            "specs": [
                [
                    "==",
                    "7.0.0"
                ]
            ]
        },
        {
            "name": "pytest-dependency",
            "specs": [
                [
                    "==",
                    "0.6.0"
                ]
            ]
        },
        {
            "name": "pytest-mock",
            "specs": [
                [
                    "==",
                    "3.15.1"
                ]
            ]
        },
        {
            "name": "requests",
            "specs": [
                [
                    "==",
                    "2.32.4"
                ]
            ]
        },
        {
            "name": "scipy",
            "specs": [
                [
                    "==",
                    "1.16.1"
                ]
            ]
        },
        {
            "name": "setuptools",
            "specs": [
                [
                    "==",
                    "80.9.0"
                ]
            ]
        },
        {
            "name": "six",
            "specs": [
                [
                    "==",
                    "1.17.0"
                ]
            ]
        },
        {
            "name": "soupsieve",
            "specs": [
                [
                    "==",
                    "2.7"
                ]
            ]
        },
        {
            "name": "style",
            "specs": [
                [
                    "==",
                    "1.1.0"
                ]
            ]
        },
        {
            "name": "tenacity",
            "specs": [
                [
                    "==",
                    "9.1.2"
                ]
            ]
        },
        {
            "name": "typing-inspection",
            "specs": [
                [
                    "==",
                    "0.4.1"
                ]
            ]
        },
        {
            "name": "typing_extensions",
            "specs": [
                [
                    "==",
                    "4.14.0"
                ]
            ]
        },
        {
            "name": "update",
            "specs": [
                [
                    "==",
                    "0.0.1"
                ]
            ]
        },
        {
            "name": "urllib3",
            "specs": [
                [
                    "==",
                    "2.4.0"
                ]
            ]
        }
    ],
    "lcname": "scrapxd"
}
        