trove-newspaper-images


Name: trove-newspaper-images
Version: 0.3.1
Home page: https://github.com/wragge/trove_newspaper_images
Summary: Tool to download Trove newspaper articles as images.
Upload time: 2024-04-16 04:04:33
Author: Tim Sherratt
Requires Python: >=3.8
License: MIT License
Keywords: nbdev, jupyter, notebook, python

# trove-newspaper-images



## Background and alternatives

There’s no reliable way of downloading an image of a Trove newspaper
article from the web interface. The image download option produces an
HTML page with embedded images, and the article is often sliced into
pieces to fit the page.

This package includes tools to download articles as complete JPEG
images. If an article is printed across multiple newspaper pages,
multiple images will be downloaded – one for each page. It’s intended
for integration into other tools and processing workflows, or for people
who like working on the command line.

If you just want to quickly download an article as an image without
installing anything, you can [use this web
app](https://glam-workbench.net/trove-newspapers/#save-a-trove-newspaper-article-as-an-image)
in the GLAM Workbench. To download images of all articles returned by a
search in Trove, you can also use the [Trove Newspaper and Gazette
Harvester](https://glam-workbench.net/trove-harvester/).

See the
[documentation](https://wragge.github.io/trove_newspaper_images/) for
more information.

## Install

`pip install trove-newspaper-images`

## Download articles as images

### Use as a library

``` python
from trove_newspaper_images.articles import download_images

images = download_images('107024751')
images
```

    ['nla.news-article107024751-11565831.jpg']
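
Because an article can span several pages, the returned list may contain more than one filename. The following is a minimal sketch for working with such a list without touching the network. It assumes that filenames follow the pattern seen in the output above (`nla.news-article`, the article identifier, a hyphen, a second number, `.jpg`) and that the second number identifies the newspaper page; both are inferences from the single example rather than documented guarantees. The helper names `parse_image_name` and `group_by_article` are hypothetical and not part of the package.

``` python
import re
from collections import defaultdict

# Assumed filename pattern, inferred from the example output above.
FILENAME_PATTERN = re.compile(r"nla\.news-article(\d+)-(\d+)\.jpg$")


def parse_image_name(filename):
    """Return (article_id, page_id) as strings, or None if the name doesn't match."""
    match = FILENAME_PATTERN.search(filename)
    if match is None:
        return None
    return match.group(1), match.group(2)


def group_by_article(filenames):
    """Group image filenames by the article identifier embedded in each name."""
    grouped = defaultdict(list)
    for name in filenames:
        parsed = parse_image_name(name)
        if parsed is not None:
            grouped[parsed[0]].append(name)
    return dict(grouped)


# The list below is copied from the example output, not freshly downloaded.
images = ['nla.news-article107024751-11565831.jpg']
print(parse_image_name(images[0]))
print(group_by_article(images))
```

Grouping by article identifier is only useful when the results of several `download_images` calls are collected into one list; for a single call every filename shares the same identifier.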

### Use from the command line

Just call `trove_newspaper_images.download` from the command line with
an article identifier. You can use the `--output_dir` parameter to
specify a directory for the downloaded images. For example:

``` shell
trove_newspaper_images.download 107024751 --output_dir images
```

Add the `--masked` parameter to try to remove content from neighbouring
articles.

``` shell
trove_newspaper_images.download 107024751 --masked
```
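
To download several articles in one go, the command can be driven from a short script. This is a rough sketch rather than a feature of the package: `build_command` and `download_many` are hypothetical helpers, and the sketch assumes that `--output_dir` and `--masked` can be combined in a single call, which the examples above do not show explicitly.

``` python
import subprocess


def build_command(article_id, output_dir=None, masked=False):
    """Build the argument list for one trove_newspaper_images.download call."""
    command = ["trove_newspaper_images.download", str(article_id)]
    if output_dir:
        command += ["--output_dir", output_dir]
    if masked:
        # Assumption: --masked can be used together with --output_dir.
        command.append("--masked")
    return command


def download_many(article_ids, output_dir=None, masked=False):
    """Run the command once per article, stopping at the first failure."""
    for article_id in article_ids:
        subprocess.run(build_command(article_id, output_dir, masked), check=True)


# Only prints the command; call download_many([...]) to actually run it.
print(build_command("107024751", output_dir="images", masked=True))
```

Passing the arguments as a list avoids shell quoting problems, and `check=True` raises `subprocess.CalledProcessError` if a download exits with a non-zero status.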

------------------------------------------------------------------------

Created by [Tim Sherratt](https://timsherratt.org)
([@wragge](https://twitter.com/wragge)) for the [GLAM
Workbench](https://glam-workbench.net/).

            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/wragge/trove_newspaper_images",
    "name": "trove-newspaper-images",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": null,
    "keywords": "nbdev jupyter notebook python",
    "author": "Tim Sherratt",
    "author_email": "tim@timsherratt.org",
    "download_url": "https://files.pythonhosted.org/packages/1d/7e/032036c1d3b1098ded5cdbcf4f52ad79af75f8143a92b071d2007abd3f39/trove_newspaper_images-0.3.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT License",
    "summary": "Tool to download Trove newspaper articles as images.",
    "version": "0.3.1",
    "project_urls": {
        "Homepage": "https://github.com/wragge/trove_newspaper_images"
    },
    "split_keywords": [
        "nbdev",
        "jupyter",
        "notebook",
        "python"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "d401aa8459d71fd87bdfe3b624d777a7593934bb19a2f930f97984044eb60076",
                "md5": "e45bf00593b8b097065e9445b091ba7f",
                "sha256": "2eda61a7f7dd9a18f31156464d693d978b401ba9e6fe81f7f5b6006405da9e1a"
            },
            "downloads": -1,
            "filename": "trove_newspaper_images-0.3.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "e45bf00593b8b097065e9445b091ba7f",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 7206,
            "upload_time": "2024-04-16T04:04:31",
            "upload_time_iso_8601": "2024-04-16T04:04:31.953215Z",
            "url": "https://files.pythonhosted.org/packages/d4/01/aa8459d71fd87bdfe3b624d777a7593934bb19a2f930f97984044eb60076/trove_newspaper_images-0.3.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "1d7e032036c1d3b1098ded5cdbcf4f52ad79af75f8143a92b071d2007abd3f39",
                "md5": "022e6f0b77142e3a88b96900604c9c3a",
                "sha256": "be34e2819edad672d1a904fa63169b08b9333b5f41482a6e34bb1a7b8ee68554"
            },
            "downloads": -1,
            "filename": "trove_newspaper_images-0.3.1.tar.gz",
            "has_sig": false,
            "md5_digest": "022e6f0b77142e3a88b96900604c9c3a",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 8134,
            "upload_time": "2024-04-16T04:04:33",
            "upload_time_iso_8601": "2024-04-16T04:04:33.793376Z",
            "url": "https://files.pythonhosted.org/packages/1d/7e/032036c1d3b1098ded5cdbcf4f52ad79af75f8143a92b071d2007abd3f39/trove_newspaper_images-0.3.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-04-16 04:04:33",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "wragge",
    "github_project": "trove_newspaper_images",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "trove-newspaper-images"
}
        