# CNN/DailyMail Dataset as SQLite
[![PyPI][pypi-badge]][pypi-link]
[![Python 3.9][python39-badge]][python39-link]
[![Python 3.10][python310-badge]][python310-link]
[![Build Status][build-badge]][build-link]
Creates a SQLite database of the CNN and DailyMail summarization dataset.
## Documentation
See the [full documentation](https://plandes.github.io/cnndmdb/index.html).
The [API reference](https://plandes.github.io/cnndmdb/api.html) is also
available.
## Obtaining
The easiest way to install the command line program is via the `pip` installer:
```bash
pip3 install zensols.cnndmdb
```
Binaries are also available on [pypi].
## Usage
First create the SQLite database file with `cnndmdb load`, then check that
the file `data/cnn.sqlite3` was created. This takes a while since the entire
corpus is first downloaded and then inserted into the SQLite file.
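
Once the load has finished, the result can be sanity-checked with Python's standard `sqlite3` module. The following is a minimal sketch, not part of this package: `list_tables` is a hypothetical helper name, and since the table names created by `cnndmdb load` are not documented here, it only lists whatever tables the file contains.

```python
import sqlite3
from pathlib import Path
from typing import List


def list_tables(db_path: Path) -> List[str]:
    """Return the sorted names of the tables in the SQLite file at ``db_path``.

    Hypothetical helper for illustration; it is not part of zensols.cnndmdb.
    """
    # Open read-only so a mistyped path raises an error instead of
    # silently creating a new, empty database file.
    uri = db_path.resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
        return [name for (name,) in rows]
    finally:
        conn.close()
```

Calling `list_tables(Path("data/cnn.sqlite3"))` from the directory where `cnndmdb load` was run should return a non-empty list if the load succeeded.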
### Command Line
The SQLite database keys can be listed:
```bash
cnndmdb keys
```
The command line can also be used to print articles:
```bash
cnndmdb show -t org 3b07f5102c69e3e609d73b2ccb0dc5549d4fbaf6
```
The `-t org` option tells it to use the original corpus keys. The `-t` option
also allows selecting articles by SQLite `rowid` key or as the Kth smallest
article.
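
For readers unfamiliar with SQLite's `rowid`, the sketch below shows how it relates to an application-level key. It uses an in-memory database and a hypothetical `article` table with made-up columns and values; the actual schema created by `cnndmdb load` may differ.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical table and columns, for illustration only.
conn.execute("CREATE TABLE article (org_key TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO article (org_key, body) VALUES (?, ?)",
    [("aaa", "first article"), ("bbb", "second article")],
)

# SQLite assigns every row of an ordinary table an integer rowid,
# starting at 1 for an empty table, independent of the org_key column.
rows = conn.execute(
    "SELECT rowid, org_key FROM article ORDER BY rowid"
).fetchall()
print(rows)  # [(1, 'aaa'), (2, 'bbb')]
conn.close()
```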
### API
The corpus objects are accessible as mapped Python objects. For example:
```python
corpus: Corpus = ApplicationFactory.get_corpus()
art: Article = next(iter(corpus.stash.values()))
print(art.text)
```
## Data Source
The data is sourced from a [Tensorflow dataset], which in turn uses the
[Abigail See GitHub] repository.
```bibtex
@article{DBLP:journals/corr/SeeLM17,
author = {Abigail See and
Peter J. Liu and
Christopher D. Manning},
title = {Get To The Point: Summarization with Pointer-Generator Networks},
journal = {CoRR},
volume = {abs/1704.04368},
year = {2017},
url = {http://arxiv.org/abs/1704.04368},
archivePrefix = {arXiv},
eprint = {1704.04368},
timestamp = {Mon, 13 Aug 2018 16:46:08 +0200},
biburl = {https://dblp.org/rec/bib/journals/corr/SeeLM17},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
```
```bibtex
@inproceedings{hermann2015teaching,
title={Teaching machines to read and comprehend},
author={Hermann, Karl Moritz and Kocisky, Tomas and Grefenstette, Edward and Espeholt, Lasse and Kay, Will and Suleyman, Mustafa and Blunsom, Phil},
booktitle={Advances in neural information processing systems},
pages={1693--1701},
year={2015}
}
```
## Changelog
An extensive changelog is available [here](CHANGELOG.md).
## License
[MIT License](LICENSE.md)
Copyright (c) 2023 Paul Landes
<!-- links -->
[pypi]: https://pypi.org/project/zensols.cnndmdb/
[pypi-link]: https://pypi.python.org/pypi/zensols.cnndmdb
[pypi-badge]: https://img.shields.io/pypi/v/zensols.cnndmdb.svg
[python39-badge]: https://img.shields.io/badge/python-3.9-blue.svg
[python39-link]: https://www.python.org/downloads/release/python-390
[python310-badge]: https://img.shields.io/badge/python-3.10-blue.svg
[python310-link]: https://www.python.org/downloads/release/python-310
[build-badge]: https://github.com/plandes/cnndmdb/workflows/CI/badge.svg
[build-link]: https://github.com/plandes/cnndmdb/actions
[Tensorflow dataset]: https://www.tensorflow.org/datasets/catalog/cnn_dailymail
[Abigail See GitHub]: https://github.com/abisee/cnn-dailymail