pynativeextractor


* Name: pynativeextractor
* Version: 1.0.1
* Home page: https://github.com/SpongeData-cz/pynativeextractor
* Summary: Python binding for nativeextractor
* Upload time: 2021-06-11 07:01:00
* Author: SpongeData s.r.o.
* Requires Python: >=2.7

# NativeExtractor module for Python
This is the official Python binding for the [NativeExtractor](https://github.com/SpongeData-cz/nativeextractor) project.

<p align="center"><img src="https://raw.githubusercontent.com/SpongeData-cz/nativeextractor/main/logo.svg" width="400" /></p>
<p align="center"><img src="logo_python.png" width="400" /></p>

# Installation
## Requirements
* Python >= 2.7 (Python 3 is highly recommended)
* `pip`
* `build-essential` (gcc, make)
* `libglib2.0`, `libglib2.0-dev`, `libpythonX-dev`
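
Before building, it can help to confirm that the compiler toolchain is reachable. The following is a minimal sketch, assuming Python 3 (for `shutil.which`) and assuming that `gcc` and `make` are the executables `build-essential` provides on your system. The helper name `missing_build_tools` is hypothetical and not part of this package, and the check does not cover the glib or Python development headers.

```python
import shutil
import sys

# Assumption: these are the executables that `build-essential` provides
# on Debian-like systems; adjust the names for your platform.
BUILD_TOOLS = ("gcc", "make")


def missing_build_tools(tools=BUILD_TOOLS):
    """Return the names from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    print("Python version:", sys.version.split()[0])
    missing = missing_build_tools()
    if missing:
        print("Missing build tools:", ", ".join(missing))
    else:
        print("All build tools found.")
```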

We recommend using a virtual environment.
```bash
virtualenv myproject
source myproject/bin/activate
```
or
```bash
python -m venv myproject
source myproject/bin/activate
```
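
To confirm that the environment is actually active before installing, a short check from Python can be used. This is a sketch only: the helper name `in_virtualenv` is hypothetical, and it relies on `sys.base_prefix` (set by `venv` on Python 3) and `sys.real_prefix` (set by older `virtualenv` releases).

```python
import sys


def in_virtualenv():
    """Return True when the interpreter appears to run inside a virtual environment."""
    # Older virtualenv releases set sys.real_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    # venv (and newer virtualenv) make sys.prefix differ from sys.base_prefix.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return sys.prefix != base_prefix


print("Inside a virtual environment:", in_virtualenv())
```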

## Instant PyPI solution
**TODO:**
```bash
pip install pynativeextractor
```

## Manual
* Clone the repository together with its submodules:
`git clone --recurse-submodules https://github.com/SpongeData-cz/pynativeextractor.git`

* Install via `pip` or `pip3`
    ```bash
    pip install -e ./pynativeextractor/
    ```

# Typical usage

```python
import os
from pynativeextractor.extractor import BufferStream, Extractor, DEFAULT_MINERS_PATH

# Construct new Extractor instance
ex = Extractor()
# Add a fictional miner named match_url from web_entities.so that matches all URLs
ex.add_miner_so(os.path.join(DEFAULT_MINERS_PATH, 'web_entities.so'), 'match_url')
text = '{}'.format("https://spongedata.cz")

# Make a stream from the text buffer (for files, use FileStream instead; it uses mmap internally)
with BufferStream(text) as bf:
    # Initialize occurrences list as empty list
    occurrences = []
    # Set the stream to the extractor
    with ex.set_stream(bf):
        # Mine all occurrences of URLs
        while not ex.eof():
            # Append the newly found occurrences
            occurrences += ex.next()

print(occurrences) # Prints [{'label': 'URL', 'value': 'https://spongedata.cz', 'pos': 0, 'len': 13, 'prob': 1.0}]
```
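
The mining loop above can be wrapped in a small reusable helper, and the resulting list of dictionaries can be grouped by label. The sketch below is not part of the package: `mine_all` and `group_by_label` are hypothetical names, `mine_all` assumes only the `set_stream`, `eof` and `next` methods shown in the example, and `group_by_label` assumes the occurrence shape shown in the printed output.

```python
from collections import defaultdict


def mine_all(extractor, stream):
    """Collect every occurrence the extractor reports for the given stream.

    Assumption: `extractor` offers set_stream(), eof() and next() as used
    in the example above.
    """
    occurrences = []
    with extractor.set_stream(stream):
        while not extractor.eof():
            occurrences += extractor.next()
    return occurrences


def group_by_label(occurrences):
    """Map each label to the list of values found under that label."""
    grouped = defaultdict(list)
    for occurrence in occurrences:
        grouped[occurrence["label"]].append(occurrence["value"])
    return dict(grouped)


# Sample copied from the printed output in the example above.
sample = [{'label': 'URL', 'value': 'https://spongedata.cz', 'pos': 0, 'len': 13, 'prob': 1.0}]
print(group_by_label(sample))  # {'URL': ['https://spongedata.cz']}
```

With the real classes, the call would be `mine_all(ex, bf)` inside the `with BufferStream(text) as bf:` block.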
            
