trove-newspaper-harvester


- Name: trove-newspaper-harvester
- Version: 0.7.2
- Home page: https://github.com/wragge/trove-newspaper-harvester
- Summary: Tool for bulk harvests of digitised newspaper articles from Trove
- Upload time: 2023-10-23 04:47:08
- Author: Tim Sherratt
- Requires Python: >=3.8
- License: MIT License
- Keywords: nbdev, jupyter, notebook, python

trove-newspaper-harvester
================

<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->

<div>

[![](https://zenodo.org/badge/DOI/10.5281/zenodo.7103174.svg)](https://doi.org/10.5281/zenodo.7103174)

</div>

[View the full
documentation](https://wragge.github.io/trove-newspaper-harvester/)

The Trove Newspaper (& Gazette) Harvester makes it easy to download
large quantities of digitised articles from [Trove’s newspapers and
gazettes](https://trove.nla.gov.au/newspaper/). Just give it the URL of
a search from the Trove web interface, and the harvester will save the
metadata of all the matching articles in a CSV (spreadsheet) file for
further analysis. You can also save the full text of every article, as
well as copies of the articles as JPG images or PDFs. While the web
interface will only show you the first 2,000 results matching your
search, the Newspaper Harvester will get everything.
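
As a rough illustration of the "further analysis" step, the sketch below
counts articles per year using only the Python standard library. The
column names (`title`, `date`) and the sample rows are assumptions made
for this example, so check the header row of your own CSV file before
relying on them.

``` python
import csv
import io
from collections import Counter

# Stand-in for a harvested CSV file. The columns and rows are
# hypothetical and only illustrate the shape of the analysis.
SAMPLE_CSV = """title,date
Weather report,1902-01-15
Shipping news,1902-07-03
Cyclone warning,1903-02-11
"""


def articles_per_year(csv_file):
    """Count rows per year, assuming a `date` column formatted YYYY-MM-DD."""
    reader = csv.DictReader(csv_file)
    return Counter(row["date"][:4] for row in reader if row.get("date"))


counts = articles_per_year(io.StringIO(SAMPLE_CSV))
for year, total in sorted(counts.items()):
    print(year, total)
```

To run this against a real harvest, pass an open file handle for your
CSV file instead of the `io.StringIO` sample.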

## No installation required!

If you want to use the harvester without installing anything, just head
over to the [Trove Newspaper
Harvester](https://glam-workbench.github.io/trove-harvester/) section in
my GLAM Workbench.

## Installation

``` sh
pip install trove-newspaper-harvester
```

Before you do any harvesting you need to get yourself a [Trove API
key](https://trove.nla.gov.au/about/create-something/using-api).
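
One way to keep the key out of your scripts is to read it from an
environment variable. This is a minimal sketch: the variable name
`TROVE_API_KEY` and the helper `get_api_key` are hypothetical
conventions for this example, not something the harvester looks for by
itself.

``` python
import os


def get_api_key(env_var="TROVE_API_KEY"):
    """Return the API key stored in an environment variable.

    The variable name is an assumption for this example; use whatever
    name suits your own setup.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key
```

The returned string can then be passed wherever an API key is expected.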

## Use as a library

``` python
from trove_newspaper_harvester.core import prepare_query, Harvester
```

Generate a set of query parameters using
[`prepare_query`](https://wragge.github.io/trove-newspaper-harvester/core.html#prepare_query).

``` python
my_query = "https://trove.nla.gov.au/search/category/newspapers?keyword=wragge"
my_api_key = "mYSecREtkEy"

my_query_params = prepare_query(query=my_query)
```
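
To see what information a search URL carries, you can decode it with the
standard library. This sketch is not how `prepare_query` is implemented
and does not reproduce its output; it only shows the search parameters
embedded in the example URL.

``` python
from urllib.parse import parse_qs, urlsplit

my_query = "https://trove.nla.gov.au/search/category/newspapers?keyword=wragge"

# Split the URL into its components and decode the query string
# into a dict that maps each parameter name to a list of values.
parts = urlsplit(my_query)
search_params = parse_qs(parts.query)

print(parts.path)     # /search/category/newspapers
print(search_params)  # {'keyword': ['wragge']}
```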

Initialise the
[`Harvester`](https://wragge.github.io/trove-newspaper-harvester/core.html#harvester)
with your query parameters and API key.

``` python
harvester = Harvester(query_params=my_query_params, key=my_api_key)
```

Start the harvest!

``` python
harvester.harvest()
```

If the harvest fails, just run
[`Harvester.harvest`](https://wragge.github.io/trove-newspaper-harvester/core.html#harvester.harvest)
again.
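
If you would rather not re-run the call by hand, a small retry wrapper
can do it for you. The helper below is a generic sketch and not part of
the harvester. Its name, the number of attempts, and the delay are all
assumptions you should adjust.

``` python
import time


def run_with_retries(task, attempts=3, delay=5.0):
    """Call `task()` until it succeeds or `attempts` calls have failed."""
    for attempt in range(1, attempts + 1):
        try:
            return task()
        except Exception as error:
            if attempt == attempts:
                raise  # Give up and surface the last error.
            print(f"Attempt {attempt} failed ({error}); retrying in {delay}s")
            time.sleep(delay)


# Usage with a harvester object like the one created above (not run here):
# run_with_retries(harvester.harvest)
```

Catching every `Exception` keeps the sketch short; in practice you may
want to retry only on network errors.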

[See the core module
documentation](https://wragge.github.io/trove-newspaper-harvester/core.html)
for more options and examples.

## Use as a command-line tool

There are three basic commands:

- **start** – start a new harvest
- **restart** – restart a stalled harvest
- **report** – view harvest details

### Start a harvest

To start a new harvest, run:

``` sh
troveharvester start "[Trove query]" [Trove API key]
```

The Trove query can either be a URL copied and pasted from a search in
the [Trove web interface](http://trove.nla.gov.au/newspaper/), or a
Trove API query URL constructed using something like the [Trove API
Console](https://troveconsole.herokuapp.com/). Enclose the URL in double
quotes.
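
The quotes matter because search URLs usually contain characters such as
`?` and `&` that a shell would otherwise interpret. The sketch below
builds the same `start` command from Python and prints a shell-safe
version of it. The URL, including its `dateFrom` parameter, and the key
are illustrative placeholders.

``` python
import shlex

# Placeholder values for illustration only.
query_url = (
    "https://trove.nla.gov.au/search/category/newspapers"
    "?keyword=wragge&dateFrom=1900-01-01"
)
api_key = "mYSecREtkEy"

# Each argument is a separate list item, so no manual quoting is needed.
command = ["troveharvester", "start", query_url, api_key]

# shlex.join shows how the command would be quoted for a POSIX shell.
print(shlex.join(command))

# To run it without involving a shell at all (not executed here):
# import subprocess
# subprocess.run(command, check=True)
```

`shlex.join` uses single quotes, which work as well as double quotes
for this purpose in a POSIX shell.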

[See the CLI module
documentation](https://wragge.github.io/trove-newspaper-harvester/cli.html)
for more details.

------------------------------------------------------------------------

Created by [Tim Sherratt](https://timsherratt.org/) for the [GLAM
Workbench](https://glam-workbench.net/). Support this project by
becoming a [GitHub sponsor](https://github.com/sponsors/wragge?o=esb).



            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/wragge/trove-newspaper-harvester",
    "name": "trove-newspaper-harvester",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": "",
    "keywords": "nbdev jupyter notebook python",
    "author": "Tim Sherratt",
    "author_email": "tim@timsherratt.org",
    "download_url": "https://files.pythonhosted.org/packages/9d/98/973eef4f8ae16318b9d4ea8f31be9a8fb017fdc5645f70538f106f5fd566/trove-newspaper-harvester-0.7.2.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT License",
    "summary": "Tool for bulk harvests of digitised newspaper articles from Trove",
    "version": "0.7.2",
    "project_urls": {
        "Homepage": "https://github.com/wragge/trove-newspaper-harvester"
    },
    "split_keywords": [
        "nbdev",
        "jupyter",
        "notebook",
        "python"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "d2b1d95353398994f097e299d4001e39d0d7f0210a5b4260e4ed54bf5657b730",
                "md5": "ffc46bba732d5171f37a6be3cd314191",
                "sha256": "e6eafdb8ec732de84bb6c9d0e0e1c65c91f4fa44f6888c11ec4491f345298f78"
            },
            "downloads": -1,
            "filename": "trove_newspaper_harvester-0.7.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "ffc46bba732d5171f37a6be3cd314191",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 14534,
            "upload_time": "2023-10-23T04:47:06",
            "upload_time_iso_8601": "2023-10-23T04:47:06.775805Z",
            "url": "https://files.pythonhosted.org/packages/d2/b1/d95353398994f097e299d4001e39d0d7f0210a5b4260e4ed54bf5657b730/trove_newspaper_harvester-0.7.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "9d98973eef4f8ae16318b9d4ea8f31be9a8fb017fdc5645f70538f106f5fd566",
                "md5": "b6fd200d4c0a286f6a18e69e612853d4",
                "sha256": "403463d21d1b611fd1a38619bf505ac93771d7ed92bdf83617b9678b4d1feb23"
            },
            "downloads": -1,
            "filename": "trove-newspaper-harvester-0.7.2.tar.gz",
            "has_sig": false,
            "md5_digest": "b6fd200d4c0a286f6a18e69e612853d4",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 15642,
            "upload_time": "2023-10-23T04:47:08",
            "upload_time_iso_8601": "2023-10-23T04:47:08.928288Z",
            "url": "https://files.pythonhosted.org/packages/9d/98/973eef4f8ae16318b9d4ea8f31be9a8fb017fdc5645f70538f106f5fd566/trove-newspaper-harvester-0.7.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-10-23 04:47:08",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "wragge",
    "github_project": "trove-newspaper-harvester",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "trove-newspaper-harvester"
}
        