| Name | refextract |
| Version | 1.1.6 |
| home_page | None |
| Summary | Small library for extracting references used in scholarly communication. |
| upload_time | 2025-10-21 09:48:19 |
| maintainer | None |
| docs_url | https://pythonhosted.org/refextract/ |
| author | CERN |
| requires_python | <4,>=3.11 |
| license | GPL-2.0-or-later |
| keywords | |
| VCS | |
| bugtrack_url | |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |
# refextract
## About
A library for extracting references used in scholarly communication.
## Getting Started
Note: because this library relies on `mmap` resize functionality, it cannot be installed locally on macOS.
### Docker Setup:
Before first use, and whenever a library or dependency changes, build a new Docker image with:
```shell
docker build --target refextract-tests -t refextract .
```
After that, spin up a `refextract` service with:
```shell
docker run -it --name refextract -v ./tests:/refextract/tests -v ./refextract:/refextract/refextract refextract
```
### Running tests
Open a shell in the running container with:
```shell
docker exec -it refextract /bin/bash
```
Then run the tests:
```shell
pytest .
```
## Usage
To get structured information from a publication reference:
``` python
>>> from refextract import extract_journal_reference
>>> reference = extract_journal_reference('J.Phys.,A39,13445')
>>> print(reference)
{
'extra_ibids': [],
'is_ibid': False,
'misc_txt': '',
'page': '13445',
'title': 'J. Phys.',
'type': 'JOURNAL',
'volume': 'A39',
'year': '',
}
```
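As a small illustration of consuming this result, the sketch below turns such a dictionary back into a one-line citation. It assumes the keys shown in the output above (`title`, `volume`, `page`, `year`). `format_journal_reference` is a hypothetical helper, not part of `refextract`, and the sample dictionary is copied from the output above so the snippet runs without the library installed.

```python
def format_journal_reference(reference):
    """Build a compact citation string from a parsed journal reference.

    Hypothetical helper: assumes the dictionary layout shown above.
    """
    parts = [reference.get("title", ""), reference.get("volume", "")]
    year = reference.get("year", "")
    if year:
        parts.append(f"({year})")
    parts.append(reference.get("page", ""))
    # Drop empty fields (for example a missing year) before joining.
    return " ".join(part for part in parts if part)


# Sample copied from the output above.
sample = {
    "extra_ibids": [],
    "is_ibid": False,
    "misc_txt": "",
    "page": "13445",
    "title": "J. Phys.",
    "type": "JOURNAL",
    "volume": "A39",
    "year": "",
}

print(format_journal_reference(sample))
```

Empty fields are skipped rather than printed as blanks, so a reference without a year still yields a clean string.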
To extract references from a PDF:
``` python
>>> from refextract import extract_references_from_file
>>> references = extract_references_from_file('1503.07589.pdf')
>>> print(references[0])
{
'author': ['F. Englert and R. Brout'],
'doi': ['doi:10.1103/PhysRevLett.13.321'],
'journal_page': ['321'],
'journal_reference': ['Phys. Rev. Lett. 13 (1964) 321'],
'journal_title': ['Phys. Rev. Lett.'],
'journal_volume': ['13'],
'journal_year': ['1964'],
'linemarker': ['1'],
'raw_ref': ['[1] F. Englert and R. Brout, \u201cBroken symmetry and the mass of gauge vector mesons\u201d, Phys. Rev. Lett. 13 (1964) 321, doi:10.1103/PhysRevLett.13.321.'],
'texkey': ['Englert:1964et'],
'year': ['1964'],
}
```
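Each extracted reference stores its fields as lists of strings, so a common next step is to flatten one field across the whole result. The sketch below collects the DOIs from a list of references, assuming the layout shown above (a `doi` key holding strings prefixed with `doi:`). `collect_dois` is a hypothetical helper, not part of `refextract`, and the second sample entry is invented to show a reference without a DOI.

```python
def collect_dois(references):
    """Return the unique DOIs in extraction order, without the 'doi:' prefix.

    Hypothetical helper: assumes each reference is a dict of lists, as above.
    """
    dois = []
    for reference in references:
        # References without a DOI simply have no 'doi' key.
        for doi in reference.get("doi", []):
            cleaned = doi.removeprefix("doi:")
            if cleaned not in dois:
                dois.append(cleaned)
    return dois


references = [
    # Abridged from the output above.
    {
        "author": ["F. Englert and R. Brout"],
        "doi": ["doi:10.1103/PhysRevLett.13.321"],
        "journal_reference": ["Phys. Rev. Lett. 13 (1964) 321"],
        "year": ["1964"],
    },
    # Hypothetical entry with no DOI, for illustration only.
    {"raw_ref": ["[2] (hypothetical entry without a DOI)"]},
]

print(collect_dois(references))
```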
To extract directly from a URL:
``` python
>>> from refextract import extract_references_from_url
>>> references = extract_references_from_url('https://arxiv.org/pdf/1503.07589.pdf')
>>> print(references[0])
{
'author': ['F. Englert and R. Brout'],
'doi': ['doi:10.1103/PhysRevLett.13.321'],
'journal_page': ['321'],
'journal_reference': ['Phys. Rev. Lett. 13 (1964) 321'],
'journal_title': ['Phys. Rev. Lett.'],
'journal_volume': ['13'],
'journal_year': ['1964'],
'linemarker': ['1'],
'raw_ref': ['[1] F. Englert and R. Brout, \u201cBroken symmetry and the mass of gauge vector mesons\u201d, Phys. Rev. Lett. 13 (1964) 321, doi:10.1103/PhysRevLett.13.321.'],
'texkey': ['Englert:1964et'],
'year': ['1964'],
}
```
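When the input may be either a local path or a URL, one option is a thin wrapper that picks between the two functions shown above. This is only a sketch: `is_remote` and `extract_references` are hypothetical names, and treating only `http` and `https` schemes as remote is an assumption. `refextract` is imported inside the wrapper so the URL check can be used on its own.

```python
from urllib.parse import urlparse


def is_remote(source):
    """Return True if the source looks like an HTTP(S) URL."""
    return urlparse(source).scheme in ("http", "https")


def extract_references(source):
    """Dispatch to the URL or file extractor depending on the source.

    Hypothetical wrapper around the two functions shown above.
    """
    # Imported lazily so that is_remote() works without refextract installed.
    from refextract import (
        extract_references_from_file,
        extract_references_from_url,
    )

    if is_remote(source):
        return extract_references_from_url(source)
    return extract_references_from_file(source)


print(is_remote("https://arxiv.org/pdf/1503.07589.pdf"))
print(is_remote("1503.07589.pdf"))
```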
## Notes
`refextract` depends on [pdftotext](http://linux.die.net/man/1/pdftotext).
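A quick way to check for this dependency before extracting from PDFs is to look for the executable. The sketch below assumes `pdftotext` is expected to be on `PATH`, which the text above does not state explicitly. `find_pdftotext` is a hypothetical helper.

```python
import shutil


def find_pdftotext():
    """Return the path of the pdftotext executable, or None if it is not on PATH."""
    return shutil.which("pdftotext")


path = find_pdftotext()
if path is None:
    print("pdftotext not found on PATH; install it before extracting from PDFs")
else:
    print(f"pdftotext found at {path}")
```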
## Acknowledgments
`refextract` is based on code and ideas from the following people, who
contributed to the `docextract` module in Invenio:
- Alessio Deiana
- Federico Poli
- Gerrit Rindermann
- Graham R. Armstrong
- Grzegorz Szpura
- Jan Aage Lavik
- Javier Martin Montull
- Micha Moskovic
- Samuele Kaplun
- Thorsten Schwander
- Tibor Simko
## License
GPLv2
Raw data
{
"_id": null,
"home_page": null,
"name": "refextract",
"maintainer": null,
"docs_url": "https://pythonhosted.org/refextract/",
"requires_python": "<4,>=3.11",
"maintainer_email": null,
"keywords": null,
"author": "CERN",
"author_email": "admin@inspirehep.net",
"download_url": "https://files.pythonhosted.org/packages/f2/5d/ec25190dd00f7121eebcde4656402c59ee565f88adcee40e1c8f8e602c00/refextract-1.1.6.tar.gz",
"platform": null,
"description": "\n# refextract\n\n## About\n\nA library for extracting references used in scholarly communication.\n\n## Getting Started\n\nNote: due to the usage of `mmap` resize functionality this library cannot be locally installed on a mac\n\n### Docker Setup:\n\nBefore the first usage, or anytime a new library/dependency is changed a new docker image must be created using:\n```shell\ndocker build --target refextract-tests -t refextract .\n```\n\nAfter that, spin up a `refextract` service with:\n```shell\ndocker run -it -v ./tests:/refextract/tests -v ./refextract:/refextract/refextract refextract\n```\n\n### Running tests\n\nExec into the container via\n```shell\ndocker exec -it refextract /bin/bash\n```\nThen simply run\n```shell\npytest .\n```\n\n## Usage\n\nTo get structured information from a publication reference:\n\n\n``` python\n>>> from refextract import extract_journal_reference\n>>> reference = extract_journal_reference('J.Phys.,A39,13445')\n>>> print(reference)\n{\n'extra_ibids': [],\n'is_ibid': False,\n'misc_txt': '',\n'page': '13445',\n'title': 'J. Phys.',\n'type': 'JOURNAL',\n'volume': 'A39',\n'year': '',\n\n}\n```\n\nTo extract references from a PDF:\n``` python\n>>> from refextract import extract_references_from_file\n>>> references = extract_references_from_file('1503.07589.pdf')\n>>> print(references[0])\n{\n'author': ['F. Englert and R. Brout'],\n'doi': ['doi:10.1103/PhysRevLett.13.321'],\n'journal_page': ['321'],\n'journal_reference': ['Phys. Rev. Lett. 13 (1964) 321'],\n'journal_title': ['Phys. Rev. Lett.'],\n'journal_volume': ['13'],\n'journal_year': ['1964'],\n'linemarker': ['1'],\n'raw_ref': ['[1] F. Englert and R. Brout, \\u201cBroken symmetry and the mass of gauge vector mesons\\u201d, Phys. Rev. Lett. 
13 (1964) 321, doi:10.1103/PhysRevLett.13.321.'],\n'texkey': ['Englert:1964et'],\n'year': ['1964'],\n}\n```\n\nTo extract directly from a URL:\n``` python\n>>> from refextract import extract_references_from_url\n>>> references = extract_references_from_url('https://arxiv.org/pdf/1503.07589.pdf')\n>>> print(references[0])\n{\n'author': ['F. Englert and R. Brout'],\n'doi': ['doi:10.1103/PhysRevLett.13.321'],\n'journal_page': ['321'],\n'journal_reference': ['Phys. Rev. Lett. 13 (1964) 321'],\n'journal_title': ['Phys. Rev. Lett.'],\n'journal_volume': ['13'],\n'journal_year': ['1964'],\n'linemarker': ['1'],\n'raw_ref': ['[1] F. Englert and R. Brout, \\u201cBroken symmetry and the mass of gauge vector mesons\\u201d, Phys. Rev. Lett. 13 (1964) 321, doi:10.1103/PhysRevLett.13.321.'],\n'texkey': ['Englert:1964et'],\n'year': ['1964'],\n\n}\n\n```\n\n## Notes\n`refextract` depends on\n\n[pdftotext](http://linux.die.net/man/1/pdftotext).\n\n## Acknowledgments\n\n`refextract` is based on code and ideas from the following people, who\n\ncontributed to the `docextract` module in Invenio:\n- Alessio Deiana\n- Federico Poli\n- Gerrit Rindermann\n- Graham R. Armstrong\n- Grzegorz Szpura\n- Jan Aage Lavik\n- Javier Martin Montull\n- Micha Moskovic\n- Samuele Kaplun\n- Thorsten Schwander\n- Tibor Simko\n\n## License\nGPLv2\n\n",
"bugtrack_url": null,
"license": "GPL-2.0-or-later",
"summary": "Small library for extracting references used in scholarly communication.",
"version": "1.1.6",
"project_urls": {
"Homepage": "https://github.com/inspirehep/refextract"
},
"split_keywords": [],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "bc39f00089a804db6b1516568a7479a816dd413f2d12c526d65e746574634f97",
"md5": "ec803f8993c3e2ec0220679ed4fac2a8",
"sha256": "8fab1374a91e264dc23fac81f3b7ab31fcd4bd970756b9d4417974640fa03e77"
},
"downloads": -1,
"filename": "refextract-1.1.6-py3-none-any.whl",
"has_sig": false,
"md5_digest": "ec803f8993c3e2ec0220679ed4fac2a8",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<4,>=3.11",
"size": 276145,
"upload_time": "2025-10-21T09:48:18",
"upload_time_iso_8601": "2025-10-21T09:48:18.077303Z",
"url": "https://files.pythonhosted.org/packages/bc/39/f00089a804db6b1516568a7479a816dd413f2d12c526d65e746574634f97/refextract-1.1.6-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "f25dec25190dd00f7121eebcde4656402c59ee565f88adcee40e1c8f8e602c00",
"md5": "bee3ba760883bd8dce08ad1f9caaa216",
"sha256": "d1cfd235286f1e77af9992c493a3fab83bd3c6d69e91962f0c8c97dae45dc226"
},
"downloads": -1,
"filename": "refextract-1.1.6.tar.gz",
"has_sig": false,
"md5_digest": "bee3ba760883bd8dce08ad1f9caaa216",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<4,>=3.11",
"size": 259512,
"upload_time": "2025-10-21T09:48:19",
"upload_time_iso_8601": "2025-10-21T09:48:19.717465Z",
"url": "https://files.pythonhosted.org/packages/f2/5d/ec25190dd00f7121eebcde4656402c59ee565f88adcee40e1c8f8e602c00/refextract-1.1.6.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-10-21 09:48:19",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "inspirehep",
"github_project": "refextract",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "refextract"
}