cathodedataextractor


- Name: cathodedataextractor
- Version: 0.0.4
- Home page: https://github.com/GGNoWayBack/cathodedataextractor
- Summary: A document-level information extraction pipeline for layered cathode materials for sodium-ion batteries.
- Upload time: 2024-03-17 15:06:22
- Author: Yuxiao Gou
- License: MIT
- Keywords: text-mining, information-extraction, nlp, battery-information

# CathodeDataExtractor

------------

[![Supported Python versions](https://img.shields.io/badge/python-3.6%20%7C%203.7-blue.svg)](https://www.python.org/downloads/) [![GitHub LICENSE](https://img.shields.io/github/license/GGNoWayBack/cathodedataextractor.svg)](https://github.com/GGNoWayBack/cathodedataextractor/blob/main/LICENSE)  [![PyPI version](https://badge.fury.io/py/cathodedataextractor.svg)](https://badge.fury.io/py/cathodedataextractor)  
`CathodeDataExtractor` is a lightweight, document-level information extraction pipeline. It automatically extracts
synthesis parameters and cycling and rate performance data for cathode materials from the literature on layered
cathode materials for sodium-ion batteries.

## Installation

------------

`pip install cathodedataextractor`

## Features

------------
- Built on the open-source libraries [pymatgen], [text2chem], and [ChemDataExtractor v2], with some modifications.
- A [BatterySciBERT-uncased Multi-Label text classification] model for filtering documents.
- An automated, comprehensive data extraction pipeline for cathode materials.
- Paragraph multi-class classification algorithms for HTML/XML documents from [RSC] and [Elsevier].
- A normalised entity handling process.
- An effective chemical abbreviation detection module.
- A heuristic multi-level relation extraction algorithm for electrochemical properties.

The pipeline can also extract information directly from plain text strings.

## Quick start

------------
#### Extract from documents

```python
from glob import iglob
from cathodedataextractor.information_extraction_pipe import Pipeline

pipeline = Pipeline()
# '*ml' matches both .html and .xml files in the current directory
for document in iglob('*ml'):
    extraction_results = pipeline.extract(document)
```
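
The `'*ml'` pattern above also matches any other file whose name ends in `ml` (for example `.yaml` or `.toml`). As a minimal sketch, assuming you only want HTML/XML inputs, the files can be selected by suffix instead. The helper name `iter_markup_files` and the suffix set are illustrative and are not part of the package.

```python
from pathlib import Path
from typing import Iterator

# Assumption: only HTML/XML documents should be passed to the pipeline,
# as described in the feature list. This set is illustrative.
MARKUP_SUFFIXES = {".html", ".xml"}


def iter_markup_files(directory: str = ".") -> Iterator[Path]:
    """Yield HTML/XML files found directly in `directory`, sorted by path."""
    for path in sorted(Path(directory).iterdir()):
        # Compare suffixes case-insensitively so 'paper.XML' is kept too.
        if path.is_file() and path.suffix.lower() in MARKUP_SUFFIXES:
            yield path
```

Each yielded path can then be converted with `str(path)` and passed to the pipeline's `extract` call in place of the `iglob` result. Whether `extract` also accepts `Path` objects is not stated here, so converting to a string is the conservative choice.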

#### Extract from string

```python
from cathodedataextractor.information_extraction_pipe import Pipeline

extraction_results = Pipeline.from_string(
    'Apart from the conventional cationic redox of transition metals, '
    'both Na-deficit and Na-excess materials have showcased the ability '
    'to exploit oxygen redox activity as O2–/O2n– for a charge '
    'compensation mechanism. To realize cathodes with enhanced energy '
    'density, a technique like the incorporation of alkali metal ions '
    'into transition metal layers has been adopted. Recent work by Boisse '
    '(13) et al. displayed the impact of honeycomb cation ordering of '
    'a highly stabilized intermediate phase for a Na2RuO3 cathode material '
    'in instigating the anionic redox activity and providing a capacity '
    'of 180 mAh g–1 at 0.2C with a capacity retention of 89% for over '
    '50 cycles. More devoted efforts to realize the utmost potential '
    'of anionic redox ought to be carried out in the future.')
```
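
The structure of `extraction_results` is not documented in this README. As a rough, library-independent illustration of the kind of cycling data the example string contains, the sketch below pulls the capacity, rate, retention, and cycle count out of a sentence with plain regular expressions. The patterns, the function name `find_cycling_values`, and the output keys are assumptions made for this example; they are not the rules or schema used by `cathodedataextractor`.

```python
import re

# Illustrative patterns only; NOT the heuristics used by cathodedataextractor.
DASH = "[-\u2013\u2212]"  # hyphen, en dash, or minus sign before the exponent
NUMBER = r"(\d+(?:\.\d+)?)"

PATTERNS = {
    "capacity_mAh_per_g": re.compile(rf"{NUMBER}\s*mAh\s*g\s*{DASH}\s*1"),
    "rate_C": re.compile(rf"{NUMBER}\s*C\b"),
    "retention_percent": re.compile(rf"retention of\s*{NUMBER}\s*%"),
    "cycles": re.compile(rf"{NUMBER}\s*cycles"),
}


def find_cycling_values(sentence: str) -> dict:
    """Return the first match of each pattern in `sentence` as a float."""
    values = {}
    for key, pattern in PATTERNS.items():
        match = pattern.search(sentence)
        if match:
            values[key] = float(match.group(1))
    return values


sentence = (
    "providing a capacity of 180 mAh g\u20131 at 0.2C with a capacity "
    "retention of 89% for over 50 cycles."
)
print(find_cycling_values(sentence))
```

A sketch like this handles one sentence in isolation. Linking such values to the right material (here `Na2RuO3`) across sentences is the relation extraction problem the pipeline is designed to address.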

## Issues?

------------
You can either report an issue on GitHub or contact me directly at
[gouyx@mail2.sysu.edu.cn](mailto:gouyx@mail2.sysu.edu.cn).











[pymatgen]: https://pymatgen.org

[text2chem]: https://github.com/CederGroupHub/text2chem

[ChemDataExtractor v2]: https://github.com/CambridgeMolecularEngineering/chemdataextractor2

[RSC]: https://pubs.rsc.org/

[Elsevier]: https://www.elsevier.com/

[BatterySciBERT-uncased Multi-Label text classification]: https://huggingface.co/NoWayBack/batteryscibert-uncased-abstract-mtc
            
