# GoodWiki
GoodWiki is a Python package that carefully converts Wikipedia pages into GitHub-flavored Markdown. Converted pages preserve layout features like lists, code blocks, math, and block quotes.
This package is used to generate the [GoodWiki Dataset](https://github.com/euirim/goodwiki).
## Installation
This package supports Python 3.11+.
1. Install via pip.
```bash
pip install goodwiki
```
2. Install pandoc v2.19.2 by following the [official installation instructions](https://pandoc.org/installing.html).
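To confirm that the expected pandoc version is on your `PATH`, a small check like the one below may help. This is only a sketch and not part of GoodWiki: the helper names `parse_pandoc_version` and `installed_pandoc_version` are made up for illustration, and it assumes `pandoc --version` prints the version number on its first line.

```python
from __future__ import annotations

import re
import shutil
import subprocess

# Version named in the installation step above.
REQUIRED_PANDOC = (2, 19, 2)


def parse_pandoc_version(version_output: str) -> tuple[int, ...]:
    """Extract a version tuple from the first line of `pandoc --version` output."""
    first_line = version_output.splitlines()[0] if version_output else ""
    match = re.search(r"pandoc(?:\.exe)?\s+(\d+(?:\.\d+)*)", first_line)
    if match is None:
        raise ValueError(f"Unrecognized pandoc version output: {first_line!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def installed_pandoc_version() -> tuple[int, ...] | None:
    """Return the installed pandoc version, or None if pandoc is not on PATH."""
    if shutil.which("pandoc") is None:
        return None
    result = subprocess.run(
        ["pandoc", "--version"], capture_output=True, text=True, check=True
    )
    return parse_pandoc_version(result.stdout)


if __name__ == "__main__":
    version = installed_pandoc_version()
    if version is None:
        print("pandoc was not found on PATH")
    elif version != REQUIRED_PANDOC:
        print(f"Found pandoc {version}, but this README names {REQUIRED_PANDOC}")
    else:
        print("pandoc version matches")
```

Whether other pandoc versions also work is not stated here, so the check only reports a mismatch instead of failing.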
## Usage
### Initializing Client
```python
import asyncio
from goodwiki import GoodwikiClient
client = GoodwikiClient()
```
You can optionally provide your own user agent (the default is `goodwiki/1.0 (https://euirim.org)`):
```python
client = GoodwikiClient("goodwiki/1.0 (bob@gmail.com)")
```
### Getting Single Page
```python
page = asyncio.run(client.get_page("Usain Bolt"))
```
You can optionally include styling syntax, such as bold text, in the final Markdown:
```python
page = asyncio.run(client.get_page("Usain Bolt", with_styling=True))
```
You can access the resulting data via properties. For example:
```python
print(page.markdown)
```
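Because `get_page` is awaited through `asyncio.run`, several pages can in principle be fetched in one event loop. The following is a minimal sketch under assumptions, not an API provided by GoodWiki: `fetch_pages` and `max_concurrency` are hypothetical names, and it assumes a single client instance can safely serve concurrent `get_page` calls, which this README does not confirm.

```python
import asyncio


async def fetch_pages(client, titles, max_concurrency=5):
    """Fetch several pages concurrently.

    Returns a dict mapping each title to its page, or to the exception
    raised while fetching it, so one failure does not discard the rest.
    """
    titles = list(titles)
    # Cap the number of in-flight requests to stay polite to Wikipedia.
    semaphore = asyncio.Semaphore(max_concurrency)

    async def fetch_one(title):
        async with semaphore:
            return await client.get_page(title)

    results = await asyncio.gather(
        *(fetch_one(title) for title in titles), return_exceptions=True
    )
    return dict(zip(titles, results))


# Usage (assumes goodwiki is installed and `client` is a GoodwikiClient):
# pages = asyncio.run(fetch_pages(client, ["Usain Bolt", "Pandoc"]))
# for title, page in pages.items():
#     if not isinstance(page, Exception):
#         print(title, len(page.markdown))
```

Keeping `max_concurrency` low is a conservative choice. The appropriate limit depends on Wikipedia's API etiquette and on how the client issues requests.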
### Getting Category Pages
To get a list of page titles associated with a Wikipedia category, run the following:
```python
client.get_category_pages("Category:Good_articles")
```
### Converting Existing Raw Wikitext
If you've already downloaded raw wikitext from Wikipedia, you can convert it to Markdown by running:
```python
client.get_page_from_wikitext(
    raw_wikitext="RAW_WIKITEXT",
    # The rest of the fields are meant for populating the final WikiPage object
    title="Usain Bolt",
    pageid=123,
    revid=123,
)
```
## Methodology
Full details are available in this package's [GitHub repo README](https://github.com/euirim/goodwiki).
## External Links
* [Changelog](https://github.com/euirim/goodwiki/releases)
* [GitHub](https://github.com/euirim/goodwiki)
* [Dataset](https://huggingface.co/datasets/euirim/goodwiki)