artvee-scraper


- Name: artvee-scraper
- Version: 4.0.4
- Home page: https://github.com/zduclos/artvee-scraper
- Summary: Fetch public domain artwork from Artvee (https://www.artvee.com)
- Upload time: 2024-10-28 01:12:18
- Author: Zach Duclos
- Requires Python: >=3.10
- License: MIT
- Keywords: artvee, artwork, webscraper
# artvee-scraper
![PyPI Version](https://img.shields.io/pypi/v/artvee-scraper.svg)

**artvee-scraper** is an easy-to-use library for fetching public domain artwork from [Artvee](https://www.artvee.com).

- [Artvee Web-scraper](#artvee-scraper)
  - [Overview](#overview)
  - [Installation](#installation)
  - [Getting Started](#getting-started)
  - [Examples](#examples)

## Overview
Artvee-scraper is a web scraper that concurrently extracts artwork from Artvee. Registered callbacks are notified asynchronously for each
scraped artwork, so user-defined actions can be taken. Typically these actions store the artwork for later use in display,
machine learning, or other applications.

> If you are seeking a command line utility, please note that it has been relocated to a separate project - [artvee-scraper-cli](https://github.com/zduclos/artvee-scraper-cli). Alternatively, you may still use [artvee-scraper 3.0.1](https://pypi.org/project/artvee-scraper/3.0.1/).

## Installation

Using PyPI
```console
$ python -m pip install artvee-scraper
```
Python 3.10+ is officially supported.
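
The 3.10 floor matters in practice: the callback examples in this README annotate parameters with `Exception | None`, which fails at function definition time on older interpreters unless annotations are deferred. A small, optional guard such as the following sketch can make the requirement explicit at startup. `is_supported` is a hypothetical helper, not part of the library.

```python
import sys


def is_supported(version: tuple = sys.version_info[:3]) -> bool:
    """Return True when the given (major, minor, micro) version is 3.10 or newer."""
    return tuple(version[:2]) >= (3, 10)


# Warn early instead of failing later with a confusing TypeError.
if not is_supported():
    print("artvee-scraper requires Python 3.10 or newer", file=sys.stderr)
```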

## Getting Started
1. Create callbacks (lambda, function, method).
    ```python
    import json
    import logging

    from artvee_scraper.artwork import Artwork

    logger = logging.getLogger(__name__)

    # Use a lambda to log the event
    log_event = lambda artwork, thrown: logger.info(
        "Processing '%s' by %s", artwork.title, artwork.artist
    )

    # Write the artwork to a file in JSON format
    def on_artwork_received(artwork: Artwork, thrown: Exception | None = None) -> None:
        if thrown is None:
            with open(f"/tmp/{artwork.resource}.json", "w", encoding="UTF-8") as fout:
                json.dump(artwork.to_dict(), fout, ensure_ascii=False)
    ```
2. Initialize the scraper.
    ```python
    from artvee_scraper.scraper import ArtveeScraper

    scraper = ArtveeScraper()  # scrapes all categories by default
    ```
3. Register callbacks. The callbacks will be notified asynchronously for each event in the order that they are registered.
    ```python
    scraper.register_listener(log_event).register_listener(on_artwork_received)
    ```
4. Start scraping. Use either the context manager or `join()` to block until done.<br>
    `Example 1 - using context manager`
    ```python
    with scraper as s:
        s.start() # blocks until done
    ```
    `Example 2 - using join()`
    ```python
    scraper.start()
    # ... do other things
    scraper.join() # blocks until done
    ```

## Examples
**Create** `app.py`
```python
import logging
import os

from artvee_scraper.artvee_client import CategoryType
from artvee_scraper.artwork import Artwork
from artvee_scraper.scraper import ArtveeScraper

# Set up logging configuration
logging.basicConfig(
    level=logging.DEBUG,
    format="%(asctime)s.%(msecs)03d %(levelname)s [%(threadName)s] %(module)s.%(funcName)s(%(lineno)d) | %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S"
)
logger = logging.getLogger(__name__)


def handle_event(artwork: Artwork, thrown: Exception | None = None) -> None:
    """A callback for handling the result of an artwork processing event."""

    if thrown is not None:
        # An error occurred; the artwork is partially populated (missing artwork.image.raw)
        logger.error("Failed to process artist=%s, title=%s, url=%s; %s", artwork.artist, artwork.title, artwork.url, thrown)
    else:
        file_path = os.path.expanduser(f"~/Downloads/{artwork.resource}.jpg") # create a unique filename
        logger.info("Writing %s to %s", artwork.title, file_path)

        # Write the raw image bytes to a file. 
        with open(file_path, "wb") as fout:
            fout.write(artwork.image.raw)


def main():
    # Choose which categories to scrape. Using `list(CategoryType)` creates a list of all categories.
    categories = [CategoryType.ABSTRACT, CategoryType.DRAWINGS]

    # Initialize the scraper
    scraper = ArtveeScraper(categories=categories)

    # Register listener functions
    scraper.register_listener(handle_event)

    # Start scraping
    with scraper as s:
        s.start() # blocks until done


if __name__ == "__main__":
    main()
```
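
The callback in `app.py` writes to `~/Downloads`, which is assumed to exist. A minimal sketch of a helper that creates the target directory first is shown below. `build_output_path` and the `artvee` subdirectory are hypothetical and not part of the library.

```python
import tempfile
from pathlib import Path


def build_output_path(directory: Path, resource: str, suffix: str = ".jpg") -> Path:
    """Return directory/<resource><suffix>, creating the directory if needed."""
    directory.mkdir(parents=True, exist_ok=True)
    return directory / f"{resource}{suffix}"


# Demonstration against a temporary directory; in handle_event the directory
# could instead be Path.home() / "Downloads" and the resource artwork.resource.
with tempfile.TemporaryDirectory() as tmp:
    target = build_output_path(Path(tmp) / "artvee", "example-resource")
    target.write_bytes(b"raw image bytes would go here")
    created = target.exists()
    print(target.name, created)
```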

**Run** `app.py`
```shell
me@linux-desktop:~$ python app.py
2038-01-19 19:36:36.839 DEBUG [MainThread] scraper.start(125) | Starting
2038-01-19 19:36:36.839 DEBUG [Thread-1 (_exec)] scraper._exec(152) | Executing scraper for categories [<CategoryType.ABSTRACT: 'abstract'>, <CategoryType.DRAWINGS: 'drawings'>]
2038-01-19 19:36:36.839 DEBUG [Thread-1 (_exec)] artvee_client.get_page_count(113) | Retrieving page count; category=abstract
2038-01-19 19:36:36.854 DEBUG [Thread-1 (_exec)] connectionpool._new_conn(1051) | Starting new HTTPS connection (1): artvee.com:443
2038-01-19 19:36:37.737 DEBUG [Thread-1 (_exec)] connectionpool._make_request(546) | https://artvee.com:443 "GET /c/abstract/page/1/?per_page=70 HTTP/11" 301 0
2038-01-19 19:36:37.827 DEBUG [Thread-1 (_exec)] connectionpool._make_request(546) | https://artvee.com:443 "GET /c/abstract/?per_page=70 HTTP/11" 200 19573
2038-01-19 19:36:37.955 DEBUG [Thread-1 (_exec)] scraper._exec(160) | Category abstract has 108 page(s)
2038-01-19 19:36:37.955 DEBUG [Thread-1 (_exec)] scraper._exec(166) | Processing category abstract, page (1/108)
2038-01-19 19:36:37.955 DEBUG [Thread-1 (_exec)] artvee_client.get_metadata(152) | Retrieving metadata; category=abstract, page=1
    ...
```
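
Several of the DEBUG lines above come from `connectionpool`, which appears to be urllib3's connection pool module logging under the `urllib3` logger hierarchy. If only the scraper's own messages are of interest, one option is to raise the level of that logger after calling `logging.basicConfig`. This is a sketch using the standard `logging` module, not a feature of artvee-scraper.

```python
import logging

logging.basicConfig(level=logging.DEBUG)

# Child loggers such as "urllib3.connectionpool" inherit this level,
# so their DEBUG and INFO records are no longer emitted.
logging.getLogger("urllib3").setLevel(logging.WARNING)

pool_logger = logging.getLogger("urllib3.connectionpool")
print(pool_logger.isEnabledFor(logging.DEBUG))
```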

            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/zduclos/artvee-scraper",
    "name": "artvee-scraper",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.10",
    "maintainer_email": null,
    "keywords": "artvee, artwork, webscraper",
    "author": "Zach Duclos",
    "author_email": "zduclos.github@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/33/e0/ed8f546561f8c3788f9222b186139fc1a2492f6b864645779d7da7e12ed1/artvee-scraper-4.0.4.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Fetch public domain artwork from Artvee (https://www.artvee.com)",
    "version": "4.0.4",
    "project_urls": {
        "Bug Reports": "https://github.com/zduclos/artvee-scraper/issues",
        "Homepage": "https://github.com/zduclos/artvee-scraper",
        "Source": "https://github.com/zduclos/artvee-scraper"
    },
    "split_keywords": [
        "artvee",
        " artwork",
        " webscraper"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "8c3ebe517f050609de65034819ede873f6837cbb1dfb5e8f863cf8a62ca270e9",
                "md5": "621bc3cfc4850e428b4f9d25f68535c7",
                "sha256": "2e1cd2bef747b6581c93aaf273f3e876ab2d4560a6eae93221e3afff349a1979"
            },
            "downloads": -1,
            "filename": "artvee_scraper-4.0.4-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "621bc3cfc4850e428b4f9d25f68535c7",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.10",
            "size": 12937,
            "upload_time": "2024-10-28T01:12:16",
            "upload_time_iso_8601": "2024-10-28T01:12:16.811266Z",
            "url": "https://files.pythonhosted.org/packages/8c/3e/be517f050609de65034819ede873f6837cbb1dfb5e8f863cf8a62ca270e9/artvee_scraper-4.0.4-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "33e0ed8f546561f8c3788f9222b186139fc1a2492f6b864645779d7da7e12ed1",
                "md5": "ebc3c6ed1a4e5b4a4ba9a4a9772b3e3d",
                "sha256": "f51e23184120984f27c123216ffa5dab163cc8406ca0e9947211283e59bf848e"
            },
            "downloads": -1,
            "filename": "artvee-scraper-4.0.4.tar.gz",
            "has_sig": false,
            "md5_digest": "ebc3c6ed1a4e5b4a4ba9a4a9772b3e3d",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.10",
            "size": 13485,
            "upload_time": "2024-10-28T01:12:18",
            "upload_time_iso_8601": "2024-10-28T01:12:18.367897Z",
            "url": "https://files.pythonhosted.org/packages/33/e0/ed8f546561f8c3788f9222b186139fc1a2492f6b864645779d7da7e12ed1/artvee-scraper-4.0.4.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-10-28 01:12:18",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "zduclos",
    "github_project": "artvee-scraper",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [],
    "lcname": "artvee-scraper"
}
        