IUExtract


Name IUExtract
Version 1.0.8
home_page https://github.com/TT-CL/iuextract
Summary Rule-based Idea Unit segmentation algorithm for the English language.
upload_time 2024-12-17 13:41:22
maintainer None
docs_url None
author Gecchele Marcello
requires_python >=3.8
license The Clear BSD License

Copyright (c) 2022 Marcello Gecchele, Tokunaga Laboratory of Computational Linguistics, Tokyo Institute of Technology
All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted (subject to the limitations in the disclaimer below) provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
* Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

NO EXPRESS OR IMPLIED LICENSES TO ANY PARTY'S PATENT RIGHTS ARE GRANTED BY THIS LICENSE. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
keywords idea unit, textual segmentation, segmentation, linguistics
# IUExtract
Rule-based Idea Unit segmentation algorithm for the English language.

## Example Segmentation
```
My dog, Chippy, just won its first grooming competition.
```
This sentence is segmented into the following Idea Units:
```
D1|My dog,
2|Chippy,
D1|just won its first grooming competition.
```
Each line denotes a segment and begins with an Idea Unit index. Units are indexed in sequential order. Discontinuous Units are prefixed with the character "D": their index appears on more than one line, and the complete Idea Unit is obtained by joining the lines that share the same index.
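
Joining the lines that share an index can be sketched in a few lines of Python. This is a minimal illustration and not part of IUExtract: the helper name `join_idea_units` is hypothetical, and it assumes the default `|` separator and the optional `D` prefix shown in the example output.

```python
def join_idea_units(segmentation, sep="|"):
    """Group segment lines by Idea Unit index and join their text.

    Hypothetical helper, not part of the iuextract package.
    """
    units = {}
    for line in segmentation.splitlines():
        if not line.strip():
            continue  # skip empty lines
        label, _, text = line.partition(sep)
        # A leading "D" marks a discontinuous unit; the rest is the index.
        index = int(label.lstrip("D"))
        units.setdefault(index, []).append(text.strip())
    # Join the pieces of each unit in the order they appeared.
    return {index: " ".join(parts) for index, parts in units.items()}


output = """D1|My dog,
2|Chippy,
D1|just won its first grooming competition."""

print(join_idea_units(output))
# {1: 'My dog, just won its first grooming competition.', 2: 'Chippy,'}
```

Stripping the `D` prefix before grouping means that a discontinuous unit and its continuation end up under the same integer key.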
## Installation
### Installation as a standalone executable via pipx
To install the package as a command-line tool, first install pipx. Installation instructions for your operating system are available in the [pipx documentation](https://pipx.pypa.io/latest/installation/).
If you already have Python installed, you can install pipx with the following commands:
```
python3 -m pip install -U pipx
python3 -m pipx ensurepath
```
After pipx is installed, you can install IUExtract with the following command:
```
pipx install iuextract
```
If the installation fails, try pinning a specific Python version with the following command:
```
pipx install iuextract --python 3.9
```
**Note:** On first run, the program downloads the spaCy model `en_core_web_lg`, which can take some time. A custom spaCy model can be selected if you install iuextract as a Python module.

### Installation as a Python module
To use IUExtract in your Python projects, install it as a regular Python module.
First, install the dependencies:
```
pip install spacy
python -m spacy download en_core_web_lg
```
You can then install IUExtract.
```
pip install iuextract
```
## Command Line Interface (CLI) Usage
Once installed via `pipx`, you can run iuextract directly from the CLI.

Example:
```
iuextract My dog, Chippy, just won its first grooming competition.
```
will output
```
D1|My dog,
2|Chippy,
D1|just won its first grooming competition.
```
If you installed iuextract as a Python module, you can still run the program from the command line with the following command:
```
python -m iuextract My dog, Chippy, just won its first grooming competition.
```

**Note:** When running from the CLI, all positional arguments are joined into a single string and parsed as the input text. If you need named arguments, put them before the input text, or use the `-i` argument to read the input from a file.
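
The grouping of positional arguments described in the note can be illustrated with `argparse`. This is only a sketch of the general technique and not iuextract's actual parser: the `-i` and `-o` options come from this README, while the program name, destination names, and the `parse_cli` helper are assumptions made for illustration.

```python
import argparse

# Hypothetical parser: only the -i and -o options are taken from this README;
# the destination names and help strings are assumptions.
parser = argparse.ArgumentParser(prog="iuextract-sketch")
parser.add_argument("-i", dest="input_file", help="read the input text from a file")
parser.add_argument("-o", dest="output_file", help="write the segmentation to a file")
# nargs="*" collects every positional argument into a list of words.
parser.add_argument("words", nargs="*", help="input text, split into words by the shell")


def parse_cli(argv):
    args = parser.parse_args(argv)
    # Join the shell-split words back into a single input string.
    text = " ".join(args.words)
    return args.input_file, args.output_file, text


# Named arguments come first, followed by the free text.
print(parse_cli(["-o", "out.txt", "My", "dog,", "Chippy,", "just", "won."]))
# (None, 'out.txt', 'My dog, Chippy, just won.')
```

Because the shell splits the sentence into separate words before the program sees it, the words have to be joined back together, which is why named arguments are easiest to handle when they come before the text.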
### Input text from file
You can run iuextract with the `-i` argument to parse a file.
For example
```
iuextract -i input_file.txt
```
will read `input_file.txt` from the working directory and output the segmentation to the console.

### Output file
You can specify an output file with the `-o` parameter.
```
iuextract -i input_file.txt -o output_file.txt
```
This command segments `input_file.txt` and writes the resulting segmentation to `output_file.txt`.
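
If you want to drive the CLI from a Python script, for example to process files in a batch, the same command can be assembled and run with `subprocess`. The sketch below uses only the `-i` and `-o` arguments described above. The `build_command` helper is hypothetical, and the command is only run when an `iuextract` executable is found on `PATH` and the input file exists.

```python
import os
import shutil
import subprocess


def build_command(input_path, output_path=None):
    """Build the argument list for the iuextract CLI (hypothetical helper)."""
    command = ["iuextract", "-i", input_path]
    if output_path is not None:
        command += ["-o", output_path]
    return command


command = build_command("input_file.txt", "output_file.txt")
print(command)
# ['iuextract', '-i', 'input_file.txt', '-o', 'output_file.txt']

# Run the command only if the executable is installed and the input exists.
if shutil.which("iuextract") and os.path.exists("input_file.txt"):
    subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.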

### Additional arguments
For additional arguments, such as the separator between the index and the text of each IU, run iuextract with the help argument to list all available options.
```
iuextract -h
```

## Usage as a module

Simple text segmentation:
```
from iuextract.extract import segment_ius

text = "My dog, Chippy, just won its first grooming competition."
print(segment_ius(text, mode='str'))
```
```
D1|My dog, 
2|Chippy, 
D1|just won its first grooming competition.
```

For more examples, see `example.ipynb`.

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/TT-CL/iuextract",
    "name": "IUExtract",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": null,
    "keywords": "Idea Unit, textual segmentation, segmentation, linguistics",
    "author": "Gecchele Marcello",
    "author_email": "Marcello Gecchele <linked.uno@pm.me>",
    "download_url": "https://files.pythonhosted.org/packages/27/b6/07b621d1517b36330be6a2eef01e0a41db8eb3f8076988d37a9e36089681/iuextract-1.0.8.tar.gz",
    "platform": null,
    "description": "# IUExtract\nRule-based Idea Unit segmentation algorithm for the English language.\n\n## Example Segmentation\n```\nMy dog, Chippy, just won its first grooming competition.\n```\nWill be segmented into the following Idea Units:\n```\nD1|My dog,\n2|Chippy,\nD1|just\n```\nEach line denotes a segment. At the beginning of each line there is an Idea Unit index. Each Unit is assigned an index in sequential order. Discontinuous Units are prefixed by the character \"D\". Naturally, these indexes can be found on multiple lines and the complete Idea Unit can be obtained by joining the lines with the same index.\n## Installation\n### Installation as standalone executable via pipx\nTo install the package as a command line tool first install pipx. Specific instructions for your operative system can be found [here](https://pipx.pypa.io/latest/installation/).\nIf you have python installed, you can install pipx with the following commands:\n```\npython3 -m pip install -U pipx\npython3 -m pipx ensurepath\n```\nAfter pipx is installed, you can install IUExtract with the following command:\n```\npipx install iuextract\n```\nIf the install fails you might want to try to pin a specific python version with the following command:\n```\npipx install iuextract --python 3.9\n```\n**Note:** on first run, the program will download the Spacy model `en_core_web_lg`. This could take some time. 
A custom Spacy model can be selected if you install iuextract as a python module.\n\n### Installation as a python module\nIf you want to use IUExtract in your python projects you will need to install it as a regular python module.\nFirst of all, you need to install the dependencies:\n```\npip install spacy\npython -m spacy download en_core_web_lg\n```\nYou can then install IUExtract.\n```\npip install iuextract\n```\n## Command Line Interface (CLI) Usage\nOnce installed via `pipx`, you can run iuextract directly from the CLI.\n\nExample:\n```\niuextract My dog, Chippy, just won its first grooming competition.\n```\nwill output\n```\nD1|My dog,\n2|Chippy,\nD1|just won its first grooming competition.\n```\nIf you installed iuextract as a python module, you can still run the program via CLI with the following command:\n```\npython -m iuextract My dog, Chippy, just won its first grooming competition.\n```\n\n**Note:** When running from CLI, all positional arguments are grouped into a single string and parsed as input text. 
If you need to use named arguments put them before the input text or use the `-i` argument to parse a file as input.\n### Input text from file\nYou can run iuextract with the `-i` argument to parse a file.\nFor example\n```\niuextract -i input_file.txt\n```\nwill read `input_file.txt` from the working directory and output the segmentation to the console.\n\n### Output file\nYou can specify an output file with the `-o` parameter.\n```\niuextract -i input_file.txt -o output_file.txt\n```\nThis command will segment `input_file.txt` and put the resulting segmentation into `output_file.txt`.\n\n### Additional arguments\nFor additional arguments, such as specifying the separator between the IUs and the index, you can call iuextract with the help argument and get a list of possible arguments.\n```\niuextract -h\n```\n\n## Usage as module\n\nSimple text segmentation:\n```\nfrom iuextract.extract import segment_ius\n\ntext = \"My dog, Chippy, just won its first grooming competition.\"\nprint(segment_ius(text, mode='str'))\n```\n```\nD1|My dog, \n2|Chippy, \nD1|just won its first grooming competition.\n```\n\nFor more examples check `example.ipynb`\n",
    "bugtrack_url": null,
    "license": "The Clear BSD License  Copyright (c) 2022 Marcello Gecchele, Tokunaga Laboratory of Computational Linguistics, Tokyo Institute of Technology All rights reserved.  Redistribution and use in source and binary forms, with or without modification, are permitted (subject to the limitations in the disclaimer below) provided that the following conditions are met:  * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.  * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.  * Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.  NO EXPRESS OR IMPLIED LICENSES TO ANY PARTY'S PATENT RIGHTS ARE GRANTED BY THIS LICENSE. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS \"AS IS\" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.",
    "summary": "Rule-based Idea Unit segmentation algorithm for the English language.",
    "version": "1.0.8",
    "project_urls": {
        "Documentation": "https://github.com/TT-CL/iuextract",
        "Homepage": "https://tt-cl.github.io/iu-resources/",
        "Issues": "https://github.com/TT-CL/iuextract/issues",
        "Repository": "https://github.com/TT-CL/iuextract.git"
    },
    "split_keywords": [
        "idea unit",
        " textual segmentation",
        " segmentation",
        " linguistics"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "d63adc713b54fb263e32b147b8c7319654b918cb9a9ebbad9b6919759a929453",
                "md5": "c8c5b90b5716b98369acacf58a46a836",
                "sha256": "fc78e82932afe27741890865f5dcd3fa53dc6a7aa4f65e50e28373441fa61832"
            },
            "downloads": -1,
            "filename": "IUExtract-1.0.8-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "c8c5b90b5716b98369acacf58a46a836",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 19952,
            "upload_time": "2024-12-17T13:41:13",
            "upload_time_iso_8601": "2024-12-17T13:41:13.823485Z",
            "url": "https://files.pythonhosted.org/packages/d6/3a/dc713b54fb263e32b147b8c7319654b918cb9a9ebbad9b6919759a929453/IUExtract-1.0.8-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "27b607b621d1517b36330be6a2eef01e0a41db8eb3f8076988d37a9e36089681",
                "md5": "b66e2ea7a74cef83786d207336e70801",
                "sha256": "b4a34eb931cf2c2eafb3471db0e36cd1d1b1ebf9491badbfcf01e5e5fb0a0b51"
            },
            "downloads": -1,
            "filename": "iuextract-1.0.8.tar.gz",
            "has_sig": false,
            "md5_digest": "b66e2ea7a74cef83786d207336e70801",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 17442,
            "upload_time": "2024-12-17T13:41:22",
            "upload_time_iso_8601": "2024-12-17T13:41:22.354715Z",
            "url": "https://files.pythonhosted.org/packages/27/b6/07b621d1517b36330be6a2eef01e0a41db8eb3f8076988d37a9e36089681/iuextract-1.0.8.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-12-17 13:41:22",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "TT-CL",
    "github_project": "iuextract",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "iuextract"
}
        