patent-parsing-tools


| Field | Value |
| --- | --- |
| Name | patent-parsing-tools |
| Version | 0.9.5 |
| home_page | https://github.com/pprzetacznik/patent-parsing-tools |
| Summary | patent-parsing-tools is a library providing tools for generating training and test sets from Google's USPTO data, useful for testing machine learning algorithms |
| upload_time | 2024-12-29 21:11:17 |
| maintainer | None |
| docs_url | None |
| author | Michal Dul, Piotr Przetacznik, Krzysztof Strojny |
| requires_python | None |
| license | MIT |
| keywords | deeplearning dbn rbm rsm backpropagation precission recall |
| requirements | lxml nltk stemming Sphinx sphinx_rtd_theme requests numpy Theano mypy pytest pytest-cov |
| Travis-CI | No |
| coveralls test coverage | No |


patent-parsing-tools
====================
USPTO patents dataset generator.

[![Documentation Status](https://readthedocs.org/projects/patent-parsing-tools/badge/?version=latest)](https://patent-parsing-tools.readthedocs.io/en/latest/?badge=latest)
[![patent-parsing-tools CI](https://github.com/pprzetacznik/patent-parsing-tools/workflows/patent-parsing-tools%20CI/badge.svg)](https://github.com/pprzetacznik/patent-parsing-tools/actions?query=workflow%3A"patent-parsing-tools+CI")
[![PyPI version](https://badge.fury.io/py/patent-parsing-tools.svg)](https://pypi.org/project/patent-parsing-tools/)
[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/patent-parsing-tools)](https://pypi.org/project/patent-parsing-tools/)

## Documentation

[Read the docs](https://patent-parsing-tools.readthedocs.io/en/latest/)

## System requirements

```Bash
sudo yum install python-devel libxslt-devel libxml2-devel
```

## Installation

```Bash
pip install patent-parsing-tools
```
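
A quick way to confirm the installation is to check that the top-level module can be found. The sketch below assumes only the module name `patent_parsing_tools`, taken from the `python -m` commands in the Examples section; the helper name `is_installed` is made up for illustration.

```python
import importlib.util


def is_installed(module_name: str = "patent_parsing_tools") -> bool:
    """Return True if the top-level module can be located on sys.path."""
    # find_spec returns None when a top-level module cannot be found.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    print("patent_parsing_tools importable:", is_installed())
```

This only locates the module; it does not import it or check that its dependencies load.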

## Examples

Downloading the dataset:
```Bash
python -m patent_parsing_tools.downloader \
  --directory dataset \
  --year-from 2010 \
  --year-to 2010
```
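
The same download step can be driven from Python, for example inside a larger script. This is a minimal sketch that only wraps the command line shown above with `subprocess`; the helper names `downloader_command` and `run_downloader` are hypothetical, and the check that `year_from` does not exceed `year_to` is an assumption about sensible input, not documented behaviour of the tool.

```python
import subprocess
import sys
from typing import List


def downloader_command(directory: str, year_from: int, year_to: int) -> List[str]:
    """Build the argv list for the downloader invocation shown above."""
    if year_from > year_to:
        # Assumed sanity check; the tool itself may handle this differently.
        raise ValueError("year_from must not be greater than year_to")
    return [
        sys.executable, "-m", "patent_parsing_tools.downloader",
        "--directory", directory,
        "--year-from", str(year_from),
        "--year-to", str(year_to),
    ]


def run_downloader(directory: str, year_from: int, year_to: int) -> None:
    """Run the downloader; raises CalledProcessError on a non-zero exit."""
    subprocess.run(downloader_command(directory, year_from, year_to), check=True)
```

Calling `run_downloader("dataset", 2010, 2010)` would issue the same command as the shell example, using the interpreter that runs the script.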

Collecting and serializing data:
```Bash
python -m patent_parsing_tools.supervisor \
  --working-directory patents/working_directory \
  --train-destination patents/train_destination \
  --test-destination patents/test_destination \
  --year-from 2014 \
  --year-to 2015
```

Generating a dictionary from the train set:
```Bash
python -m patent_parsing_tools.bow.dictionary_maker \
  --train-directory patents/train_destination \
  --max-patents 1000000000 \
  --dictionary dictionary.txt \
  --dict-max-size 4096
```
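
To illustrate what a size-capped dictionary means in general, the sketch below keeps the most frequent word tokens up to a limit, analogous to `--dict-max-size`. It is a conceptual example only and not the library's implementation: the tokenizer is a simple regular expression, and no stemming or stop-word filtering is applied, although the package's requirements (`nltk`, `stemming`) suggest the real tool may do more. The function name `build_dictionary` is hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, List


def build_dictionary(documents: Iterable[str], max_size: int) -> List[str]:
    """Return up to max_size of the most frequent lowercase word tokens."""
    counts = Counter()
    for text in documents:
        # Naive tokenizer: runs of ASCII letters, lowercased.
        counts.update(re.findall(r"[a-z]+", text.lower()))
    # most_common sorts by descending count.
    return [word for word, _ in counts.most_common(max_size)]


if __name__ == "__main__":
    docs = ["a method and a device", "a method"]
    print(build_dictionary(docs, 2))
```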

Generating bags of words for the train set and the test set:
```Bash
python -m patent_parsing_tools.bow.bag_of_words \
  --serialized-patents patents/train_destination \
  --destination-directory patents/final_dataset_train \
  --dictionary dictionary.txt \
  --batch-size 1048576
python -m patent_parsing_tools.bow.bag_of_words \
  --serialized-patents patents/test_destination \
  --destination-directory patents/final_dataset_test \
  --dictionary dictionary.txt \
  --batch-size 1048576
```
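
As a conceptual illustration of the bag-of-words step, the sketch below turns a text into a vector of word counts over a fixed dictionary and splits a list of items into batches. It is not the library's code: the tokenizer, the function names `bag_of_words` and `batches`, and the reading of `--batch-size` as "items per output chunk" are all assumptions made for the example.

```python
import re
from typing import Dict, Iterator, List, Sequence


def bag_of_words(text: str, index: Dict[str, int]) -> List[int]:
    """Count occurrences of dictionary words; other words are ignored."""
    vector = [0] * len(index)
    for token in re.findall(r"[a-z]+", text.lower()):
        position = index.get(token)
        if position is not None:
            vector[position] += 1
    return vector


def batches(items: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


if __name__ == "__main__":
    dictionary = ["a", "method", "device"]
    index = {word: i for i, word in enumerate(dictionary)}
    print(bag_of_words("A method for a device and a method", index))
```

Using the same dictionary for the train and test sets, as the two commands above do, keeps the vector positions comparable between both sets.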

## Testing

```Bash
pytest
```

## Contributing and development

```Bash
$ mkvirtualenv ppt
$ workon ppt
(ppt) $ pip install -r requirements.txt
```

## Publishing a new release

```Bash
$ git tag v1.0
$ git push origin v1.0
```

## Building documentation

```Bash
(ppt) $ sphinx-build -M html docs docs_build
```

## References

Usage:
* Elton, *Using natural language processing techniques to extract information on the properties and functionalities of energetic materials from large text corpora*, 2019, online: [https://arxiv.org/abs/1903.00415](https://arxiv.org/abs/1903.00415).
* Lee, *Natural Language Processing Techniques for Advancing Materials Discovery: A Short Review*, 2023, online: [https://doi.org/10.1007/s40684-023-00523-6](https://doi.org/10.1007/s40684-023-00523-6).

## License

The MIT License (MIT). Copyright (c) 2014 Michał Dul, Piotr Przetacznik, Krzysztof Strojny. See the [LICENSE](LICENSE) file for more information.




            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/pprzetacznik/patent-parsing-tools",
    "name": "patent-parsing-tools",
    "maintainer": null,
    "docs_url": null,
    "requires_python": null,
    "maintainer_email": null,
    "keywords": "deeplearning dbn rbm rsm backpropagation precission recall",
    "author": "Michal Dul, Piotr Przetacznik, Krzysztof Strojny",
    "author_email": "piotr.przetacznik@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/f0/18/0b8a5cbd4e2fb669e2c34b0c10839bf089ded88e02ff94e4f80c1b749896/patent-parsing-tools-0.9.5.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "patent-parsing-tools is a library providing tools for generating training and test set from Google's USPTO data helpful with for testing machine learning algorithms",
    "version": "0.9.5",
    "project_urls": {
        "Homepage": "https://github.com/pprzetacznik/patent-parsing-tools"
    },
    "split_keywords": [
        "deeplearning",
        "dbn",
        "rbm",
        "rsm",
        "backpropagation",
        "precission",
        "recall"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "65f78b2f6a6f49f85107f3660af525809cb6714a8b61fb9673b09dfc8c0f8b40",
                "md5": "07c93439d0b248945ae9db1e24e62b20",
                "sha256": "7bb52a2deaaaec6faa49ac3d78f59f959189d4e4215a15776e5cadbb40dd3802"
            },
            "downloads": -1,
            "filename": "patent_parsing_tools-0.9.5-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "07c93439d0b248945ae9db1e24e62b20",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 1568951,
            "upload_time": "2024-12-29T21:11:15",
            "upload_time_iso_8601": "2024-12-29T21:11:15.367646Z",
            "url": "https://files.pythonhosted.org/packages/65/f7/8b2f6a6f49f85107f3660af525809cb6714a8b61fb9673b09dfc8c0f8b40/patent_parsing_tools-0.9.5-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "f0180b8a5cbd4e2fb669e2c34b0c10839bf089ded88e02ff94e4f80c1b749896",
                "md5": "4fe1f2bf42c6a2f3fb84b245ef36f67f",
                "sha256": "8a4c2da98468fde1c87ca20d01cc1988b077e9a5493588b2e192f22e9c7883ef"
            },
            "downloads": -1,
            "filename": "patent-parsing-tools-0.9.5.tar.gz",
            "has_sig": false,
            "md5_digest": "4fe1f2bf42c6a2f3fb84b245ef36f67f",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 1524932,
            "upload_time": "2024-12-29T21:11:17",
            "upload_time_iso_8601": "2024-12-29T21:11:17.948824Z",
            "url": "https://files.pythonhosted.org/packages/f0/18/0b8a5cbd4e2fb669e2c34b0c10839bf089ded88e02ff94e4f80c1b749896/patent-parsing-tools-0.9.5.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-12-29 21:11:17",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "pprzetacznik",
    "github_project": "patent-parsing-tools",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [
        {
            "name": "lxml",
            "specs": []
        },
        {
            "name": "nltk",
            "specs": []
        },
        {
            "name": "stemming",
            "specs": []
        },
        {
            "name": "Sphinx",
            "specs": []
        },
        {
            "name": "sphinx_rtd_theme",
            "specs": []
        },
        {
            "name": "requests",
            "specs": []
        },
        {
            "name": "numpy",
            "specs": [
                [
                    "==",
                    "1.22.0"
                ]
            ]
        },
        {
            "name": "Theano",
            "specs": [
                [
                    "==",
                    "0.9.0"
                ]
            ]
        },
        {
            "name": "mypy",
            "specs": []
        },
        {
            "name": "pytest",
            "specs": []
        },
        {
            "name": "pytest-cov",
            "specs": []
        }
    ],
    "lcname": "patent-parsing-tools"
}
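
The `requirements` entries in the raw data pair a package name with a list of `[operator, version]` specs. As a small sketch, they can be rendered back into pip-style requirement lines; the function name `requirement_strings` is hypothetical, and the sample below copies only three entries from the data above.

```python
from typing import Dict, List


def requirement_strings(requirements: List[Dict]) -> List[str]:
    """Render {"name": ..., "specs": [[op, version], ...]} as pip-style lines."""
    lines = []
    for req in requirements:
        # Join multiple specs with commas, e.g. ">=1.0,<2.0".
        spec = ",".join(op + version for op, version in req["specs"])
        lines.append(req["name"] + spec)
    return lines


if __name__ == "__main__":
    sample = [
        {"name": "lxml", "specs": []},
        {"name": "numpy", "specs": [["==", "1.22.0"]]},
        {"name": "Theano", "specs": [["==", "0.9.0"]]},
    ]
    print("\n".join(requirement_strings(sample)))
```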
        