# artvee-scraper

**artvee-scraper** is an easy-to-use library for fetching public domain artwork from [Artvee](https://www.artvee.com).

- [Artvee Web-scraper](#artvee-scraper)
  - [Overview](#overview)
  - [Installation](#installation)
  - [Getting Started](#getting-started)
  - [Examples](#examples)

## Overview
Artvee-scraper is a web scraper that concurrently extracts artwork from Artvee. Registered callbacks are notified asynchronously for each
scraped artwork, so you can take whatever action you need. Typically, that action stores the artwork so it can later be used for display,
machine learning, or other applications.
> If you are seeking a command line utility, please note that it has been relocated to a separate project - [artvee-scraper-cli](https://github.com/zduclos/artvee-scraper-cli). Alternatively, you may still use [artvee-scraper 3.0.1](https://pypi.org/project/artvee-scraper/3.0.1/).
## Installation
Using PyPI
```console
$ python -m pip install artvee-scraper
```
Python 3.10+ is officially supported.
## Getting Started
1. Create one or more callbacks (a lambda, function, or method). Each callback receives the artwork and the exception raised while processing it, which is `None` on success.
   ```python
   import json
   import logging

   from artvee_scraper.artwork import Artwork

   logger = logging.getLogger(__name__)

   # Use a lambda to log the event
   log_event = lambda artwork, thrown: logger.info(
       "Processing '%s' by %s", artwork.title, artwork.artist
   )

   # Write the artwork to a file in JSON format
   def on_artwork_received(artwork: Artwork, thrown: Exception | None = None) -> None:
       if thrown is None:
           with open(f"/tmp/{artwork.resource}.json", "w", encoding="UTF-8") as fout:
               json.dump(artwork.to_dict(), fout, ensure_ascii=False)
   ```
2. Initialize the scraper.
   ```python
   from artvee_scraper.scraper import ArtveeScraper

   scraper = ArtveeScraper()  # scrapes all categories by default
   ```
3. Register callbacks. Callbacks are notified asynchronously for each event, in the order in which they were registered. `register_listener()` returns the scraper, so calls can be chained.
   ```python
   scraper.register_listener(log_event).register_listener(on_artwork_received)
   ```
4. Start scraping. Either use the scraper as a context manager, or call `join()` to block until scraping is done.<br>
   `Example 1 - using context manager`
   ```python
   with scraper as s:
       s.start()  # scraping finishes before the with-block exits
   ```
   `Example 2 - using join()`
   ```python
   scraper.start()
   # ... do other things
   scraper.join()  # blocks until done
   ```
## Examples
**Create** `app.py`
```python
import logging
import os

from artvee_scraper.artvee_client import CategoryType
from artvee_scraper.artwork import Artwork
from artvee_scraper.scraper import ArtveeScraper

# Set up logging configuration
logging.basicConfig(
    level=logging.DEBUG,
    format="%(asctime)s.%(msecs)03d %(levelname)s [%(threadName)s] %(module)s.%(funcName)s(%(lineno)d) | %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S"
)
logger = logging.getLogger(__name__)


def handle_event(artwork: Artwork, thrown: Exception | None = None) -> None:
    """A callback for handling the result of an artwork processing event."""

    if thrown is not None:
        # An error occurred; the artwork is partially populated (missing artwork.image.raw)
        logger.error("Failed to process artist=%s, title=%s, url=%s; %s", artwork.artist, artwork.title, artwork.url, thrown)
    else:
        file_path = os.path.expanduser(f"~/Downloads/{artwork.resource}.jpg")  # create a unique filename
        logger.info("Writing %s to %s", artwork.title, file_path)

        # Write the raw image bytes to a file.
        with open(file_path, "wb") as fout:
            fout.write(artwork.image.raw)


def main():
    # Choose which categories to scrape. Using `list(CategoryType)` creates a list of all categories.
    categories = [CategoryType.ABSTRACT, CategoryType.DRAWINGS]

    # Initialize the scraper
    scraper = ArtveeScraper(categories=categories)

    # Register listener functions
    scraper.register_listener(handle_event)

    # Start scraping
    with scraper as s:
        s.start()  # scraping finishes before the with-block exits


if __name__ == "__main__":
    main()
```
**Run** `app.py`
```shell
me@linux-desktop:~$ python app.py
2038-01-19 19:36:36.839 DEBUG [MainThread] scraper.start(125) | Starting
2038-01-19 19:36:36.839 DEBUG [Thread-1 (_exec)] scraper._exec(152) | Executing scraper for categories [<CategoryType.ABSTRACT: 'abstract'>, <CategoryType.DRAWINGS: 'drawings'>]
2038-01-19 19:36:36.839 DEBUG [Thread-1 (_exec)] artvee_client.get_page_count(113) | Retrieving page count; category=abstract
2038-01-19 19:36:36.854 DEBUG [Thread-1 (_exec)] connectionpool._new_conn(1051) | Starting new HTTPS connection (1): artvee.com:443
2038-01-19 19:36:37.737 DEBUG [Thread-1 (_exec)] connectionpool._make_request(546) | https://artvee.com:443 "GET /c/abstract/page/1/?per_page=70 HTTP/11" 301 0
2038-01-19 19:36:37.827 DEBUG [Thread-1 (_exec)] connectionpool._make_request(546) | https://artvee.com:443 "GET /c/abstract/?per_page=70 HTTP/11" 200 19573
2038-01-19 19:36:37.955 DEBUG [Thread-1 (_exec)] scraper._exec(160) | Category abstract has 108 page(s)
2038-01-19 19:36:37.955 DEBUG [Thread-1 (_exec)] scraper._exec(166) | Processing category abstract, page (1/108)
2038-01-19 19:36:37.955 DEBUG [Thread-1 (_exec)] artvee_client.get_metadata(152) | Retrieving metadata; category=abstract, page=1
...
```
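
Because `app.py` sets the root logger to `DEBUG`, the `connectionpool` lines in the output above come from urllib3's `urllib3.connectionpool` logger. If they are too noisy, one option is to raise the level of the `urllib3` logger only, which leaves DEBUG output from other loggers untouched. This is plain `logging` configuration, shown as a sketch to place after the `basicConfig` call.

```python
import logging

# Root logger at DEBUG, as in app.py.
logging.basicConfig(level=logging.DEBUG)

# Only WARNING and above from urllib3; child loggers such as
# "urllib3.connectionpool" inherit this level unless they set their own.
logging.getLogger("urllib3").setLevel(logging.WARNING)
```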
Raw data
{
"_id": null,
"home_page": "https://github.com/zduclos/artvee-scraper",
"name": "artvee-scraper",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.10",
"maintainer_email": null,
"keywords": "artvee, artwork, webscraper",
"author": "Zach Duclos",
"author_email": "zduclos.github@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/33/e0/ed8f546561f8c3788f9222b186139fc1a2492f6b864645779d7da7e12ed1/artvee-scraper-4.0.4.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "Fetch public domain artwork from Artvee (https://www.artvee.com)",
"version": "4.0.4",
"project_urls": {
"Bug Reports": "https://github.com/zduclos/artvee-scraper/issues",
"Homepage": "https://github.com/zduclos/artvee-scraper",
"Source": "https://github.com/zduclos/artvee-scraper"
},
"split_keywords": [
"artvee",
" artwork",
" webscraper"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "8c3ebe517f050609de65034819ede873f6837cbb1dfb5e8f863cf8a62ca270e9",
"md5": "621bc3cfc4850e428b4f9d25f68535c7",
"sha256": "2e1cd2bef747b6581c93aaf273f3e876ab2d4560a6eae93221e3afff349a1979"
},
"downloads": -1,
"filename": "artvee_scraper-4.0.4-py3-none-any.whl",
"has_sig": false,
"md5_digest": "621bc3cfc4850e428b4f9d25f68535c7",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.10",
"size": 12937,
"upload_time": "2024-10-28T01:12:16",
"upload_time_iso_8601": "2024-10-28T01:12:16.811266Z",
"url": "https://files.pythonhosted.org/packages/8c/3e/be517f050609de65034819ede873f6837cbb1dfb5e8f863cf8a62ca270e9/artvee_scraper-4.0.4-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "33e0ed8f546561f8c3788f9222b186139fc1a2492f6b864645779d7da7e12ed1",
"md5": "ebc3c6ed1a4e5b4a4ba9a4a9772b3e3d",
"sha256": "f51e23184120984f27c123216ffa5dab163cc8406ca0e9947211283e59bf848e"
},
"downloads": -1,
"filename": "artvee-scraper-4.0.4.tar.gz",
"has_sig": false,
"md5_digest": "ebc3c6ed1a4e5b4a4ba9a4a9772b3e3d",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.10",
"size": 13485,
"upload_time": "2024-10-28T01:12:18",
"upload_time_iso_8601": "2024-10-28T01:12:18.367897Z",
"url": "https://files.pythonhosted.org/packages/33/e0/ed8f546561f8c3788f9222b186139fc1a2492f6b864645779d7da7e12ed1/artvee-scraper-4.0.4.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-10-28 01:12:18",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "zduclos",
"github_project": "artvee-scraper",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [],
"lcname": "artvee-scraper"
}