# kgdata ![PyPI](https://img.shields.io/pypi/v/kgdata) ![Documentation](https://readthedocs.org/projects/kgdata/badge/?version=latest&style=flat)
KGData is a library for processing Wikipedia and Wikidata dumps. It can:
- Clean up the dumps so the data is consistent (resolve redirects, remove dangling references).
- Create embedded key-value databases to access entities from the dumps.
- Extract Wikidata ontology.
- Extract Wikipedia tables and convert the hyperlinks to Wikidata entities.
- Create Pyserini indices to search Wikidata entities.
- And more.
For full documentation, see [the website](https://kgdata.readthedocs.io/).
## Installation
From PyPI (using pre-built binaries):
```bash
pip install "kgdata[spark]"  # drop the [spark] extra to install your own Spark version when your cluster runs a different one
```
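After installing, it can help to confirm which versions actually landed in the environment, especially when Spark is managed separately from `kgdata`. The following is a minimal sketch that uses only the standard library. The helper name `installed_version` is hypothetical, and it assumes the Spark dependency is distributed under the name `pyspark`; adjust the names if your setup differs.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed.

    Hypothetical helper for illustration; it is not part of kgdata.
    """
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "pyspark" is an assumed distribution name for the Spark dependency.
    for name in ("kgdata", "pyspark"):
        found = installed_version(name)
        print(f"{name}: {found if found else 'not installed'}")
```

If `pyspark` is reported as not installed, you probably installed `kgdata` without the `spark` extra and need to provide a Spark version that matches your cluster.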
Raw data
{
"_id": null,
"home_page": "https://github.com/binh-vu/kgdata",
"name": "kgdata",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.10",
"maintainer_email": null,
"keywords": "knowledge-graph, wikidata, wikipedia, dbpedia",
"author": null,
"author_email": "Binh Vu <binh@toan2.com>",
"download_url": "https://files.pythonhosted.org/packages/80/c8/b64411a2bc1bd4b7cb6801badd40e0d1f2fdc461c786a3fb0480cb34ef2a/kgdata-7.0.4.tar.gz",
"platform": null,
"description": "# kgdata ![PyPI](https://img.shields.io/pypi/v/kgdata) ![Documentation](https://readthedocs.org/projects/kgdata/badge/?version=latest&style=flat)\n\nKGData is a library to process dumps of Wikipedia, Wikidata. What it can do:\n\n- Clean up the dumps to ensure the data is consistent (resolve redirect, remove dangling references)\n- Create embedded key-value databases to access entities from the dumps.\n- Extract Wikidata ontology.\n- Extract Wikipedia tables and convert the hyperlinks to Wikidata entities.\n- Create Pyserini indices to search Wikidata\u2019s entities.\n- and more\n\nFor a full documentation, please see [the website](https://kgdata.readthedocs.io/).\n\n## Installation\n\nFrom PyPI (using pre-built binaries):\n\n```bash\npip install kgdata[spark] # omit spark to manually specify its version if your cluster has different version\n```\n\n",
"bugtrack_url": null,
"license": null,
"summary": "Library to process dumps of knowledge graphs (Wikipedia, DBpedia, Wikidata)",
"version": "7.0.4",
"project_urls": {
"Homepage": "https://github.com/binh-vu/kgdata",
"repository": "https://github.com/binh-vu/kgdata"
},
"split_keywords": [
"knowledge-graph",
" wikidata",
" wikipedia",
" dbpedia"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "c139e53cc4f5c43879a7513fe110c5383d5abd597d3d446f1f16789e17dd6fb6",
"md5": "3f4ef5ac0d23f13ea253d44cb0678e9e",
"sha256": "26155cd09d9fb89459cfbc68d0ebe61689ae58dcaf37f708b55465b1af4dc223"
},
"downloads": -1,
"filename": "kgdata-7.0.4-cp310-cp310-macosx_10_14_x86_64.macosx_11_0_arm64.macosx_10_14_universal2.whl",
"has_sig": false,
"md5_digest": "3f4ef5ac0d23f13ea253d44cb0678e9e",
"packagetype": "bdist_wheel",
"python_version": "cp310",
"requires_python": ">=3.10",
"size": 5467979,
"upload_time": "2024-05-11T07:37:57",
"upload_time_iso_8601": "2024-05-11T07:37:57.072501Z",
"url": "https://files.pythonhosted.org/packages/c1/39/e53cc4f5c43879a7513fe110c5383d5abd597d3d446f1f16789e17dd6fb6/kgdata-7.0.4-cp310-cp310-macosx_10_14_x86_64.macosx_11_0_arm64.macosx_10_14_universal2.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "592e0385f2cfb33dcc4d718d0334fbec17a560c752e08f91b33a462e48c7ec6b",
"md5": "348cde176b575f8d95839810740ba59d",
"sha256": "d346320f23670b7cb18503076124a68d07be360042579ff36abaffe622fe8c9f"
},
"downloads": -1,
"filename": "kgdata-7.0.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"has_sig": false,
"md5_digest": "348cde176b575f8d95839810740ba59d",
"packagetype": "bdist_wheel",
"python_version": "cp310",
"requires_python": ">=3.10",
"size": 4011136,
"upload_time": "2024-05-11T07:37:59",
"upload_time_iso_8601": "2024-05-11T07:37:59.687262Z",
"url": "https://files.pythonhosted.org/packages/59/2e/0385f2cfb33dcc4d718d0334fbec17a560c752e08f91b33a462e48c7ec6b/kgdata-7.0.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "de28acd3d0bf61f44e8697456198a862ec1f90cd0687d18cba6f0ba8a1436255",
"md5": "c4ee051784dbcec70b93ccfc23ec4d1f",
"sha256": "a3240b8b08d8dd9c89c2ea9f670ddbe3e5b9ee43e4fb5aba41e8682aadbf6141"
},
"downloads": -1,
"filename": "kgdata-7.0.4-cp310-cp310-manylinux_2_35_x86_64.whl",
"has_sig": false,
"md5_digest": "c4ee051784dbcec70b93ccfc23ec4d1f",
"packagetype": "bdist_wheel",
"python_version": "cp310",
"requires_python": ">=3.10",
"size": 3280756,
"upload_time": "2024-05-11T07:38:02",
"upload_time_iso_8601": "2024-05-11T07:38:02.256735Z",
"url": "https://files.pythonhosted.org/packages/de/28/acd3d0bf61f44e8697456198a862ec1f90cd0687d18cba6f0ba8a1436255/kgdata-7.0.4-cp310-cp310-manylinux_2_35_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "c829304cf7fa3249da664627fb6c75ae7375c49302c44ca8de5757108dc8ac07",
"md5": "3a0c76e04b6bb46ff2caa3c2484128ea",
"sha256": "0d6dc8ffec6d4197141e25ed81ad32849a09ceb1450206848f264baac940cf5d"
},
"downloads": -1,
"filename": "kgdata-7.0.4-cp310-none-win_amd64.whl",
"has_sig": false,
"md5_digest": "3a0c76e04b6bb46ff2caa3c2484128ea",
"packagetype": "bdist_wheel",
"python_version": "cp310",
"requires_python": ">=3.10",
"size": 2250409,
"upload_time": "2024-05-11T07:38:04",
"upload_time_iso_8601": "2024-05-11T07:38:04.402144Z",
"url": "https://files.pythonhosted.org/packages/c8/29/304cf7fa3249da664627fb6c75ae7375c49302c44ca8de5757108dc8ac07/kgdata-7.0.4-cp310-none-win_amd64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "4a9cb4ac35bc95f5e4e0431476a0c2eaf63ebf6a481e7245ddedbb746de7f1fd",
"md5": "df6e95cd54376179dfc067463e573e45",
"sha256": "7ca4b6ac228b2ed1a3f0e51a8fde91bf7741377f81a9f446158d7a27f85fee49"
},
"downloads": -1,
"filename": "kgdata-7.0.4-cp311-cp311-macosx_10_14_x86_64.macosx_11_0_arm64.macosx_10_14_universal2.whl",
"has_sig": false,
"md5_digest": "df6e95cd54376179dfc067463e573e45",
"packagetype": "bdist_wheel",
"python_version": "cp311",
"requires_python": ">=3.10",
"size": 5467971,
"upload_time": "2024-05-11T07:38:06",
"upload_time_iso_8601": "2024-05-11T07:38:06.346931Z",
"url": "https://files.pythonhosted.org/packages/4a/9c/b4ac35bc95f5e4e0431476a0c2eaf63ebf6a481e7245ddedbb746de7f1fd/kgdata-7.0.4-cp311-cp311-macosx_10_14_x86_64.macosx_11_0_arm64.macosx_10_14_universal2.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "170afbf91adba83fe4bf0385260d1ac4cac8edf3ad7e3cac4779be3743b25ed1",
"md5": "5b4475028cda9d97017e1d212158e376",
"sha256": "35aa4ce4eaf02c7ecb4a88166f2b197026d99ac88bd2380f5373f6452dbaa74d"
},
"downloads": -1,
"filename": "kgdata-7.0.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"has_sig": false,
"md5_digest": "5b4475028cda9d97017e1d212158e376",
"packagetype": "bdist_wheel",
"python_version": "cp311",
"requires_python": ">=3.10",
"size": 4011086,
"upload_time": "2024-05-11T07:38:08",
"upload_time_iso_8601": "2024-05-11T07:38:08.673941Z",
"url": "https://files.pythonhosted.org/packages/17/0a/fbf91adba83fe4bf0385260d1ac4cac8edf3ad7e3cac4779be3743b25ed1/kgdata-7.0.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "c3e1bf70d9fa5f45232c6df98da4606dc6bd825b5319acbc628b2ef0769ac688",
"md5": "9bced9b9582d357bc7c8e9f2433d1372",
"sha256": "733e184e6d4d2bca07923dea0c1ebe7a4f5bf661fd7a98588236b41730ac41e6"
},
"downloads": -1,
"filename": "kgdata-7.0.4-cp311-cp311-manylinux_2_35_x86_64.whl",
"has_sig": false,
"md5_digest": "9bced9b9582d357bc7c8e9f2433d1372",
"packagetype": "bdist_wheel",
"python_version": "cp311",
"requires_python": ">=3.10",
"size": 3280695,
"upload_time": "2024-05-11T07:38:10",
"upload_time_iso_8601": "2024-05-11T07:38:10.889833Z",
"url": "https://files.pythonhosted.org/packages/c3/e1/bf70d9fa5f45232c6df98da4606dc6bd825b5319acbc628b2ef0769ac688/kgdata-7.0.4-cp311-cp311-manylinux_2_35_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "ecddd8d65e5a6646775a34e4b82c7767a88c77d131d36de766b6f6bc0ddb3953",
"md5": "27c2b4f7f215bf696619398898382712",
"sha256": "29981188670e8d964f4a58b22ac5c25647401c227bdb30fff0ef1b3120e14ce2"
},
"downloads": -1,
"filename": "kgdata-7.0.4-cp311-none-win_amd64.whl",
"has_sig": false,
"md5_digest": "27c2b4f7f215bf696619398898382712",
"packagetype": "bdist_wheel",
"python_version": "cp311",
"requires_python": ">=3.10",
"size": 2250404,
"upload_time": "2024-05-11T07:38:13",
"upload_time_iso_8601": "2024-05-11T07:38:13.192910Z",
"url": "https://files.pythonhosted.org/packages/ec/dd/d8d65e5a6646775a34e4b82c7767a88c77d131d36de766b6f6bc0ddb3953/kgdata-7.0.4-cp311-none-win_amd64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "eeefc525b57214eaf35f84ebbb537c45aac0fddabe92b0c1e9a155ba4cfec51d",
"md5": "f9def8e89ce99352b01858618bb76185",
"sha256": "53929186d9420d2637805ded26271461232e1ad663d285fad2ce4a8ad0fb91a5"
},
"downloads": -1,
"filename": "kgdata-7.0.4-cp312-cp312-macosx_10_14_x86_64.macosx_11_0_arm64.macosx_10_14_universal2.whl",
"has_sig": false,
"md5_digest": "f9def8e89ce99352b01858618bb76185",
"packagetype": "bdist_wheel",
"python_version": "cp312",
"requires_python": ">=3.10",
"size": 5467905,
"upload_time": "2024-05-11T07:38:15",
"upload_time_iso_8601": "2024-05-11T07:38:15.424284Z",
"url": "https://files.pythonhosted.org/packages/ee/ef/c525b57214eaf35f84ebbb537c45aac0fddabe92b0c1e9a155ba4cfec51d/kgdata-7.0.4-cp312-cp312-macosx_10_14_x86_64.macosx_11_0_arm64.macosx_10_14_universal2.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "a92de4d3b7b92e9501402ba7f5adfa0003d250fac434e9c67bed12a29b731896",
"md5": "9bc506b408c940d3c0bcd21dc6584ba5",
"sha256": "a0e5d194721779c4f34f9c32fbf5d03693b0ca0feacae8ee327160086b997dd3"
},
"downloads": -1,
"filename": "kgdata-7.0.4-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"has_sig": false,
"md5_digest": "9bc506b408c940d3c0bcd21dc6584ba5",
"packagetype": "bdist_wheel",
"python_version": "cp312",
"requires_python": ">=3.10",
"size": 4005424,
"upload_time": "2024-05-11T07:38:17",
"upload_time_iso_8601": "2024-05-11T07:38:17.803612Z",
"url": "https://files.pythonhosted.org/packages/a9/2d/e4d3b7b92e9501402ba7f5adfa0003d250fac434e9c67bed12a29b731896/kgdata-7.0.4-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "26fac4c3a0adbbc3f967578d4aceffedde9509cb8806b655f4d6ea17eaafaf7c",
"md5": "a53867ad53efee9f07f1305feb6370a0",
"sha256": "341bcb519e71060ef694dfa65841f6821e9577ce0f50319c79e5cfa3ee2ca762"
},
"downloads": -1,
"filename": "kgdata-7.0.4-cp312-cp312-manylinux_2_35_x86_64.whl",
"has_sig": false,
"md5_digest": "a53867ad53efee9f07f1305feb6370a0",
"packagetype": "bdist_wheel",
"python_version": "cp312",
"requires_python": ">=3.10",
"size": 3277147,
"upload_time": "2024-05-11T07:38:19",
"upload_time_iso_8601": "2024-05-11T07:38:19.973034Z",
"url": "https://files.pythonhosted.org/packages/26/fa/c4c3a0adbbc3f967578d4aceffedde9509cb8806b655f4d6ea17eaafaf7c/kgdata-7.0.4-cp312-cp312-manylinux_2_35_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "2e9306c174b74a827e6e2e6f05b031219557b128ed0c07a91db1f9ff22d435cb",
"md5": "55e723cb8711c6331bee0c62fc1d1a0f",
"sha256": "7a4641bcb36c0003456bf459b3ace8259b489fa3f63e083ae8dd0c2d762d3972"
},
"downloads": -1,
"filename": "kgdata-7.0.4-cp312-none-win_amd64.whl",
"has_sig": false,
"md5_digest": "55e723cb8711c6331bee0c62fc1d1a0f",
"packagetype": "bdist_wheel",
"python_version": "cp312",
"requires_python": ">=3.10",
"size": 2248918,
"upload_time": "2024-05-11T07:38:22",
"upload_time_iso_8601": "2024-05-11T07:38:22.245364Z",
"url": "https://files.pythonhosted.org/packages/2e/93/06c174b74a827e6e2e6f05b031219557b128ed0c07a91db1f9ff22d435cb/kgdata-7.0.4-cp312-none-win_amd64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "ae53dcfa1b0cb8c64d86432083215319392e470aa7316c473ff45ec2d659f912",
"md5": "75d5f2bd5d7c0c27f313b86cc1cbc3ec",
"sha256": "96d5929fb7669580efd490e79235297c72a764d8f2b4e407c6e860b84644ccad"
},
"downloads": -1,
"filename": "kgdata-7.0.4-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"has_sig": false,
"md5_digest": "75d5f2bd5d7c0c27f313b86cc1cbc3ec",
"packagetype": "bdist_wheel",
"python_version": "cp313",
"requires_python": ">=3.10",
"size": 4005422,
"upload_time": "2024-05-11T07:38:24",
"upload_time_iso_8601": "2024-05-11T07:38:24.424842Z",
"url": "https://files.pythonhosted.org/packages/ae/53/dcfa1b0cb8c64d86432083215319392e470aa7316c473ff45ec2d659f912/kgdata-7.0.4-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "fa27468d61eed6561e0cc942b28a55f258a576ab47ba50bdd2d69e61962afb48",
"md5": "38dc2e13deaa8a13f115212e30b001d7",
"sha256": "9ddc3f342f7da134a608b2ac49a642844f530acf1f2935331926e9f709821423"
},
"downloads": -1,
"filename": "kgdata-7.0.4-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"has_sig": false,
"md5_digest": "38dc2e13deaa8a13f115212e30b001d7",
"packagetype": "bdist_wheel",
"python_version": "cp39",
"requires_python": ">=3.10",
"size": 4011038,
"upload_time": "2024-05-11T07:38:26",
"upload_time_iso_8601": "2024-05-11T07:38:26.989804Z",
"url": "https://files.pythonhosted.org/packages/fa/27/468d61eed6561e0cc942b28a55f258a576ab47ba50bdd2d69e61962afb48/kgdata-7.0.4-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "2f7b4841cc2edd6a1ec3226de74e6c0238603ee799652239c0fee658d20e1453",
"md5": "2d6f69b32957ed5c55ad12d033f6b831",
"sha256": "867abaa9f2f0db21bb40c7b98e7567de5bd855f9fbb55141ef884f769fa6de16"
},
"downloads": -1,
"filename": "kgdata-7.0.4-pp310-pypy310_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"has_sig": false,
"md5_digest": "2d6f69b32957ed5c55ad12d033f6b831",
"packagetype": "bdist_wheel",
"python_version": "pp310",
"requires_python": ">=3.10",
"size": 4010277,
"upload_time": "2024-05-11T07:38:29",
"upload_time_iso_8601": "2024-05-11T07:38:29.439606Z",
"url": "https://files.pythonhosted.org/packages/2f/7b/4841cc2edd6a1ec3226de74e6c0238603ee799652239c0fee658d20e1453/kgdata-7.0.4-pp310-pypy310_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "373b9fedb7962c550992b672ab230d320f45253c0ab048be8317fb388b662e04",
"md5": "731e7bb18a91c69ed82b40e585ed7ee7",
"sha256": "a982abc19eb9dc1aaf50c961173c18519d2af61b5d503805dd14473d7f8dd90f"
},
"downloads": -1,
"filename": "kgdata-7.0.4-pp39-pypy39_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"has_sig": false,
"md5_digest": "731e7bb18a91c69ed82b40e585ed7ee7",
"packagetype": "bdist_wheel",
"python_version": "pp39",
"requires_python": ">=3.10",
"size": 4010579,
"upload_time": "2024-05-11T07:38:31",
"upload_time_iso_8601": "2024-05-11T07:38:31.814819Z",
"url": "https://files.pythonhosted.org/packages/37/3b/9fedb7962c550992b672ab230d320f45253c0ab048be8317fb388b662e04/kgdata-7.0.4-pp39-pypy39_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "80c8b64411a2bc1bd4b7cb6801badd40e0d1f2fdc461c786a3fb0480cb34ef2a",
"md5": "646ed05148d6d0c8589deeed03400592",
"sha256": "13eb9ec6b781c201dd6607d19940b37f739f568f65c4654aa373a383d7f45219"
},
"downloads": -1,
"filename": "kgdata-7.0.4.tar.gz",
"has_sig": false,
"md5_digest": "646ed05148d6d0c8589deeed03400592",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.10",
"size": 150324,
"upload_time": "2024-05-11T07:38:33",
"upload_time_iso_8601": "2024-05-11T07:38:33.523823Z",
"url": "https://files.pythonhosted.org/packages/80/c8/b64411a2bc1bd4b7cb6801badd40e0d1f2fdc461c786a3fb0480cb34ef2a/kgdata-7.0.4.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-05-11 07:38:33",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "binh-vu",
"github_project": "kgdata",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "kgdata"
}
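Each entry under `urls` above carries a `sha256` digest, which can be used to check a manually downloaded file before installing it. The sketch below shows one way to do that with the standard library. The function names `sha256_of` and `matches_metadata` are hypothetical, and it assumes the file has already been downloaded into the current directory.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_metadata(path: Path, expected_sha256: str) -> bool:
    """Compare a file's digest with the `sha256` value from the metadata."""
    return sha256_of(path) == expected_sha256.strip().lower()


if __name__ == "__main__":
    # File name and digest copied from the sdist entry in the metadata above.
    sdist = Path("kgdata-7.0.4.tar.gz")
    expected = "13eb9ec6b781c201dd6607d19940b37f739f568f65c4654aa373a383d7f45219"
    if sdist.exists():
        print("digest ok" if matches_metadata(sdist, expected) else "digest MISMATCH")
    else:
        print(f"{sdist} not found; download it first")
```

Reading in chunks keeps memory use flat even for the larger wheels. `pip` can also enforce hashes itself through its hash-checking mode when requirements are pinned with `--hash`.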