pynativeextractor

Name	pynativeextractor
Version	10.0.12
home_page	https://github.com/SpongeData-cz/pynativeextractor
Summary	Python binding for nativeextractor
upload_time	2022-07-13 11:32:51
maintainer
docs_url	None
author	SpongeData s.r.o.
requires_python	>=2.7
license
keywords
VCS
bugtrack_url
requirements	No requirements were recorded.

# NativeExtractor module for Python
This is the official Python binding for the [NativeExtractor](https://github.com/SpongeData-cz/nativeextractor) project.

<p align="center"><img src="https://raw.githubusercontent.com/SpongeData-cz/nativeextractor/main/logo.svg" width="400" /></p>
<p align="center"><img src="logo_python.png" width="400" /></p>

# Installation
## Requirements
* Python >= 2.7 (Python 3 is strongly recommended)
* `pip`
* `build-essential` (gcc, make)
* `libglib2.0`, `libglib2.0-dev`, `libpythonX-dev` (where `X` is your Python version)
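
Before building, it can help to confirm that the command-line tools from the list above are on `PATH`. The following is a minimal sketch for Python 3; `missing_tools` is a hypothetical helper, not part of this package, and it only looks for executables. It does not verify that the glib or Python development headers are installed.

```python
import shutil


def missing_tools(tools=("gcc", "make", "pip")):
    """Return the names from `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing build tools: " + ", ".join(missing))
    else:
        print("All listed build tools were found.")
```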

We recommend using a virtual environment.
```bash
virtualenv myproject
source myproject/bin/activate
```
or
```bash
python -m venv myproject
source myproject/bin/activate
```
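
On Python 3, the same environment can also be created from a script with the standard-library `venv` module. This is a minimal sketch; the helper name `create_env` is chosen here for illustration only, and the environment still has to be activated from the shell as shown above.

```python
import os
import venv


def create_env(path, with_pip=True):
    """Create a virtual environment at `path` and return its absolute path."""
    venv.create(path, with_pip=with_pip)
    return os.path.abspath(path)


if __name__ == "__main__":
    # Creates ./myproject, matching the shell commands above.
    print(create_env("myproject"))
```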

## Instant PyPI solution
```bash
pip install pynativeextractor
```

## Manual
* Clone the repo
`git clone --recurse-submodules https://github.com/SpongeData-cz/pynativeextractor.git`

* Install via `pip` or `pip3`
    ```bash
    pip install -e ./pynativeextractor/
    ```

# Typical usage

```python
import os
from pynativeextractor.extractor import BufferStream, Extractor, DEFAULT_MINERS_PATH

# Construct a new Extractor instance
ex = Extractor()
# Add the miner named match_url from web_entities.so; it matches URLs
ex.add_miner_so(os.path.join(DEFAULT_MINERS_PATH, 'web_entities.so'), 'match_url')
text = "https://spongedata.cz"

# Create a stream from the text buffer (for files, use FileStream, which uses mmap internally)
with BufferStream(text) as bf:
    # Start with an empty list of occurrences
    occurrences = []
    # Attach the stream to the extractor
    with ex.set_stream(bf):
        # Mine until the end of the stream is reached
        while not ex.eof():
            # Collect the occurrences found in this step
            occurrences += ex.next()

print(occurrences)  # Prints [{'label': 'URL', 'value': 'https://spongedata.cz', 'pos': 0, 'len': 21, 'prob': 1.0}]
```
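
The mining loop above can be wrapped in small reusable helpers. The following is a minimal sketch, not part of the package: `mine_all` and `group_by_label` are hypothetical names, and the code relies only on the `set_stream`, `eof` and `next` calls and on the `'label'` and `'value'` keys shown in the example. It is duck-typed, so it accepts any object offering those methods.

```python
from collections import defaultdict


def mine_all(extractor, stream):
    """Collect every occurrence the extractor reports for the stream.

    Assumes, as in the example above, that set_stream() returns a context
    manager and that next() returns a list of occurrence dicts.
    """
    occurrences = []
    with extractor.set_stream(stream):
        while not extractor.eof():
            occurrences.extend(extractor.next())
    return occurrences


def group_by_label(occurrences):
    """Map each 'label' to the list of matched 'value' strings."""
    grouped = defaultdict(list)
    for occurrence in occurrences:
        grouped[occurrence["label"]].append(occurrence["value"])
    return dict(grouped)
```

With the objects from the example, `group_by_label(mine_all(ex, bf))` would be called inside the `with BufferStream(text) as bf:` block, while the stream is still open.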

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/SpongeData-cz/pynativeextractor",
    "name": "pynativeextractor",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=2.7",
    "maintainer_email": "",
    "keywords": "",
    "author": "SpongeData s.r.o.",
    "author_email": "info@spongedata.cz",
    "download_url": "https://files.pythonhosted.org/packages/ae/61/7be4bb317ee6434504f3b34d823f7339e525e5fc6a74bdee0514ceb0182f/pynativeextractor-10.0.12.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "",
    "summary": "Python binding for nativeextractor",
    "version": "10.0.12",
    "project_urls": {
        "Bug Tracker": "https://github.com/SpongeData-cz/pynativeextractor/issues",
        "Homepage": "https://github.com/SpongeData-cz/pynativeextractor"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "ae617be4bb317ee6434504f3b34d823f7339e525e5fc6a74bdee0514ceb0182f",
                "md5": "a40a10cb26e4df22fe3a6c91d63dfcc9",
                "sha256": "eb6d9bc85bd74d46bf2c0393d1f2ddbf94b41f3c94548b5f40df8bedb65789fd"
            },
            "downloads": -1,
            "filename": "pynativeextractor-10.0.12.tar.gz",
            "has_sig": false,
            "md5_digest": "a40a10cb26e4df22fe3a6c91d63dfcc9",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=2.7",
            "size": 41443,
            "upload_time": "2022-07-13T11:32:51",
            "upload_time_iso_8601": "2022-07-13T11:32:51.734370Z",
            "url": "https://files.pythonhosted.org/packages/ae/61/7be4bb317ee6434504f3b34d823f7339e525e5fc6a74bdee0514ceb0182f/pynativeextractor-10.0.12.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2022-07-13 11:32:51",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "SpongeData-cz",
    "github_project": "pynativeextractor",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "pynativeextractor"
}

SpongeData s.r.o.