goodwiki


* Name: goodwiki
* Version: 1.0.1
* Home page: https://github.com/euirim/goodwiki
* Summary: Utility that converts Wikipedia pages into GitHub-flavored Markdown.
* Upload time: 2023-09-11 04:45:25
* Author: Euirim Choi
* Requires Python: `>=3.11,<4.0`
* License: MIT
* Keywords: wikipedia, markdown, dataset, wikitext, wikicode
* Requirements: No requirements were recorded.

# GoodWiki

GoodWiki is a Python package that carefully converts Wikipedia pages into GitHub-flavored Markdown. Converted pages preserve layout features like lists, code blocks, math, and block quotes.

This package is used to generate the [GoodWiki Dataset](https://github.com/euirim/goodwiki).

## Installation

This package supports Python 3.11+.

1. Install via pip.

```bash
pip install goodwiki
```

2. Install pandoc v2.19.2 by following the [official installation instructions](https://pandoc.org/installing.html).

## Usage

### Initializing Client

```python
import asyncio
from goodwiki import GoodwikiClient

client = GoodwikiClient()
```

You can optionally provide your own user agent (the default is `goodwiki/1.0 (https://euirim.org)`):

```python
client = GoodwikiClient("goodwiki/1.0 (bob@gmail.com)")
```
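
Wikimedia's User-Agent policy asks clients to identify themselves and include contact information, which is the shape of the example string above. As a small sketch, a helper like the following could assemble such a string; `build_user_agent` is a hypothetical name and is not provided by GoodWiki.

```python
def build_user_agent(tool: str, version: str, contact: str) -> str:
    """Return a user agent of the form `tool/version (contact)`."""
    contact = contact.strip()
    if not contact:
        raise ValueError("contact must be a non-empty URL or email address")
    return f"{tool}/{version} ({contact})"


user_agent = build_user_agent("goodwiki", "1.0", "bob@gmail.com")
print(user_agent)  # goodwiki/1.0 (bob@gmail.com)
```

The resulting string can then be passed to `GoodwikiClient` as in the example above.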

### Getting Single Page

```python
page = asyncio.run(client.get_page("Usain Bolt"))
```

You can optionally include styling syntax, such as bold text, in the final Markdown:

```python
page = asyncio.run(client.get_page("Usain Bolt", with_styling=True))
```

You can access the resulting data via properties. For example:

```python
print(page.markdown)
```
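
Because `get_page` is awaitable, several pages can be requested concurrently and their Markdown written to disk. The sketch below assumes only what is shown above (an awaitable `get_page` and a `markdown` property). The helpers `fetch_pages` and `save_markdown`, the file-naming scheme, and the concurrency limit are illustrative assumptions, and `StubClient` is an offline stand-in so the example runs without network access.

```python
import asyncio
import tempfile
from pathlib import Path
from types import SimpleNamespace


async def fetch_pages(client, titles, max_concurrency=5):
    """Fetch several pages concurrently and return a dict of title -> page."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def fetch_one(title):
        async with semaphore:  # cap the number of in-flight requests
            return title, await client.get_page(title)

    results = await asyncio.gather(*(fetch_one(title) for title in titles))
    return dict(results)


def save_markdown(pages, directory):
    """Write each page's `markdown` property to a .md file in `directory`."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    written = []
    for title, page in pages.items():
        # Hypothetical naming scheme: replace slashes and spaces in the title.
        filename = title.replace("/", "_").replace(" ", "_") + ".md"
        path = directory / filename
        path.write_text(page.markdown, encoding="utf-8")
        written.append(path)
    return written


class StubClient:
    """Offline stand-in for GoodwikiClient, used only for this demonstration."""

    async def get_page(self, title, with_styling=False):
        return SimpleNamespace(markdown=f"# {title}\n")


pages = asyncio.run(fetch_pages(StubClient(), ["Usain Bolt", "Pandoc"]))
with tempfile.TemporaryDirectory() as tmp:
    paths = save_markdown(pages, tmp)
    names = sorted(path.name for path in paths)
print(names)
```

To use this against Wikipedia, pass a real `GoodwikiClient()` in place of `StubClient()`. A modest `max_concurrency` keeps the request rate polite.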

### Getting Category Pages

To get a list of page titles associated with a Wikipedia category, run the following:

```python
client.get_category_pages("Category:Good_articles")
```

### Converting Existing Raw Wikitext

If you've already downloaded raw wikitext from Wikipedia, you can convert it to Markdown by running:

```python
client.get_page_from_wikitext(
    raw_wikitext="RAW_WIKITEXT",
    # The rest of the fields are meant for populating the final WikiPage object
    title="Usain Bolt",
    pageid=123,
    revid=123,
)
```
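
If the wikitext lives in a file, a thin wrapper can read it and pass it along. This is a sketch under stated assumptions: `convert_wikitext_file` is a hypothetical helper, and since the text above does not say whether `get_page_from_wikitext` returns its result directly or as an awaitable, the wrapper handles both cases. `StubClient` is an offline stand-in used only to make the example runnable.

```python
import asyncio
import inspect
import tempfile
from pathlib import Path


def convert_wikitext_file(client, path, *, title, pageid, revid):
    """Read raw wikitext from `path` and convert it with the given client."""
    raw_wikitext = Path(path).read_text(encoding="utf-8")
    result = client.get_page_from_wikitext(
        raw_wikitext=raw_wikitext,
        title=title,
        pageid=pageid,
        revid=revid,
    )
    # Assumption: the method may or may not be a coroutine, so support both.
    if inspect.isawaitable(result):

        async def _resolve():
            return await result

        result = asyncio.run(_resolve())
    return result


class StubClient:
    """Offline stand-in that records what it was given."""

    def get_page_from_wikitext(self, raw_wikitext, title, pageid, revid):
        return {"title": title, "length": len(raw_wikitext)}


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "bolt.wiki"
    source.write_text("'''Usain Bolt''' is a sprinter.", encoding="utf-8")
    page = convert_wikitext_file(
        StubClient(), source, title="Usain Bolt", pageid=123, revid=123
    )
print(page["title"])
```

Note that the wrapper calls `asyncio.run`, so it should be called from synchronous code rather than from inside an already running event loop.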

## Methodology

Full details are available in this package's [GitHub repo README](https://github.com/euirim/goodwiki).

## External Links

* [Changelog](https://github.com/euirim/goodwiki/releases)
* [GitHub](https://github.com/euirim/goodwiki)
* [Dataset](https://huggingface.co/datasets/euirim/goodwiki)

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/euirim/goodwiki",
    "name": "goodwiki",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.11,<4.0",
    "maintainer_email": "",
    "keywords": "wikipedia,markdown,dataset,wikitext,wikicode",
    "author": "Euirim Choi",
    "author_email": "euirim@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/e5/7f/001314e0ecb375c9f493d379e181767ed4339e16237126b8b945d0f733c1/goodwiki-1.0.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Utility that converts Wikipedia pages into GitHub-flavored Markdown.",
    "version": "1.0.1",
    "project_urls": {
        "Homepage": "https://github.com/euirim/goodwiki",
        "Repository": "https://github.com/euirim/goodwiki"
    },
    "split_keywords": [
        "wikipedia",
        "markdown",
        "dataset",
        "wikitext",
        "wikicode"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "5a3896ba0f3f2f9c062e2bae674f250eb511a4e8c936bf3716d59097440a615c",
                "md5": "8efdc6277f15fbaf1683e545486e3154",
                "sha256": "41d4152361bb7a652ab46b421605c324d8239c241f4893a302f958154bbcefd3"
            },
            "downloads": -1,
            "filename": "goodwiki-1.0.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "8efdc6277f15fbaf1683e545486e3154",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.11,<4.0",
            "size": 15304,
            "upload_time": "2023-09-11T04:45:24",
            "upload_time_iso_8601": "2023-09-11T04:45:24.309292Z",
            "url": "https://files.pythonhosted.org/packages/5a/38/96ba0f3f2f9c062e2bae674f250eb511a4e8c936bf3716d59097440a615c/goodwiki-1.0.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "e57f001314e0ecb375c9f493d379e181767ed4339e16237126b8b945d0f733c1",
                "md5": "081d8c1a5ff71c1adc17f977ebacab89",
                "sha256": "ba44a79803dfab5e37e2cded4c649e6bf9c3466114b77650853c038507fb295a"
            },
            "downloads": -1,
            "filename": "goodwiki-1.0.1.tar.gz",
            "has_sig": false,
            "md5_digest": "081d8c1a5ff71c1adc17f977ebacab89",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.11,<4.0",
            "size": 31523,
            "upload_time": "2023-09-11T04:45:25",
            "upload_time_iso_8601": "2023-09-11T04:45:25.489789Z",
            "url": "https://files.pythonhosted.org/packages/e5/7f/001314e0ecb375c9f493d379e181767ed4339e16237126b8b945d0f733c1/goodwiki-1.0.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-09-11 04:45:25",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "euirim",
    "github_project": "goodwiki",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "goodwiki"
}
        