zensols.cnndmdb


Name: zensols.cnndmdb
Version: 0.0.1
Home page: https://github.com/plandes/cnndmdb
Summary: Creates a SQLite database of the CNN and DailyMail summarization dataset.
Upload time: 2023-11-29 01:47:46
Author: Paul Landes
Keywords: tooling
Requirements: none recorded
# CNN/DailyMail Dataset as SQLite

[![PyPI][pypi-badge]][pypi-link]
[![Python 3.9][python39-badge]][python39-link]
[![Python 3.10][python310-badge]][python310-link]
[![Build Status][build-badge]][build-link]

Creates a SQLite database of the CNN and DailyMail summarization dataset.


## Documentation

See the [full documentation](https://plandes.github.io/cnndmdb/index.html).
The [API reference](https://plandes.github.io/cnndmdb/api.html) is also
available.


## Obtaining

The easiest way to install the command line program is via the `pip` installer:
```bash
pip3 install zensols.cnndmdb
```

Binaries are also available on [pypi].


## Usage

First create the SQLite database file with `cnndmdb load`, then check that
the file `data/cnn.sqlite3` was created.  This takes a while since the
entire corpus is first downloaded and then inserted into the SQLite file.


### Command Line

The SQLite database keys can be listed with:
```bash
cnndmdb keys
```

The command line can also print individual articles:
```bash
cnndmdb show -t org 3b07f5102c69e3e609d73b2ccb0dc5549d4fbaf6
```
The `-t org` option tells the command to interpret the key as an original
corpus key.  The same option also supports SQLite `rowid` keys and selecting
the Kth smallest article.
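
Because the result is an ordinary SQLite file, it can also be inspected
directly.  The sketch below only lists the table names, so it assumes
nothing about the database schema beyond the `data/cnn.sqlite3` path from
the usage section above:

```python
import sqlite3

# open the database created by `cnndmdb load`
conn = sqlite3.connect('data/cnn.sqlite3')
try:
    # list table names without assuming anything about the schema
    for (name,) in conn.execute(
            "select name from sqlite_master where type = 'table'"):
        print(name)
finally:
    conn.close()
```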


### API

The corpus objects are accessible as mapped Python objects.  For example:

```python
# imports assumed to be available from the zensols.cnndmdb package
from zensols.cnndmdb import ApplicationFactory, Corpus, Article

# create the corpus facade and read the first article from the stash
corpus: Corpus = ApplicationFactory.get_corpus()
art: Article = next(iter(corpus.stash.values()))
print(art.text)
```
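
As a further sketch, the stash can be treated as a read-only mapping from
corpus keys to `Article` instances.  The imports and the assumption that the
stash supports `keys()` and item access are not confirmed above, so treat
this as illustrative only:

```python
from zensols.cnndmdb import ApplicationFactory, Corpus, Article

corpus: Corpus = ApplicationFactory.get_corpus()
# look at the first few corpus keys and fetch each article by key
for i, key in enumerate(corpus.stash.keys()):
    if i >= 3:
        break
    art: Article = corpus.stash[key]
    print(f'{key}: {len(art.text)} characters')
```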


## Data Source

The data is sourced from a [Tensorflow dataset], which in turn uses the
[Abigail See GitHub] repository.

```bibtex
@article{DBLP:journals/corr/SeeLM17,
  author    = {Abigail See and
               Peter J. Liu and
               Christopher D. Manning},
  title     = {Get To The Point: Summarization with Pointer-Generator Networks},
  journal   = {CoRR},
  volume    = {abs/1704.04368},
  year      = {2017},
  url       = {http://arxiv.org/abs/1704.04368},
  archivePrefix = {arXiv},
  eprint    = {1704.04368},
  timestamp = {Mon, 13 Aug 2018 16:46:08 +0200},
  biburl    = {https://dblp.org/rec/bib/journals/corr/SeeLM17},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
```

```bibtex
@inproceedings{hermann2015teaching,
  title={Teaching machines to read and comprehend},
  author={Hermann, Karl Moritz and Kocisky, Tomas and Grefenstette, Edward and Espeholt, Lasse and Kay, Will and Suleyman, Mustafa and Blunsom, Phil},
  booktitle={Advances in neural information processing systems},
  pages={1693--1701},
  year={2015}
}
```


## Changelog

An extensive changelog is available [here](CHANGELOG.md).


## License

[MIT License](LICENSE.md)

Copyright (c) 2023 Paul Landes


<!-- links -->
[pypi]: https://pypi.org/project/zensols.cnndmdb/
[pypi-link]: https://pypi.python.org/pypi/zensols.cnndmdb
[pypi-badge]: https://img.shields.io/pypi/v/zensols.cnndmdb.svg
[python39-badge]: https://img.shields.io/badge/python-3.9-blue.svg
[python39-link]: https://www.python.org/downloads/release/python-390
[python310-badge]: https://img.shields.io/badge/python-3.10-blue.svg
[python310-link]: https://www.python.org/downloads/release/python-310
[build-badge]: https://github.com/plandes/cnndmdb/workflows/CI/badge.svg
[build-link]: https://github.com/plandes/cnndmdb/actions

[Tensorflow dataset]: https://www.tensorflow.org/datasets/catalog/cnn_dailymail
[Abigail See GitHub]: https://github.com/abisee/cnn-dailymail

            
