conll-transform


Name: conll-transform
Version: 0.1.0
Summary: Various functions to manipulate CONLL files
Upload time: 2024-01-16 22:38:45
Requires Python: >=3.8
Requirements: black, flake8, isort, pytest, mypy, coverage, pandas

# CONLL Transform - functions to manipulate CONLL data

This package contains several functions to manipulate conll data:

- `read_files`: Read one or several conll files and return a dictionary of documents.
- `read_file`: Read a conll file and return a dictionary of documents.
- `write_file`: Write a conll file.
- `compute_mentions`: Compute mentions from the raw last column of the conll file.
- `compute_chains`: Compute and return the chains from the conll data.
- `sentpos2textpos`: Transform mentions `[SENT, START, STOP]` to `[TEXT_START, TEXT_STOP]`.
- `textpos2sentpos`: Transform mentions `[TEXT_START, TEXT_STOP]` to `[SENT, START, STOP]`.
- `write_chains`: Convert a list of chains to a conll coreference column.
- `replace_coref_col`: Replace the last column of `tar_docs` with the last column of `src_docs`.
- `remove_singletons`: Remove the singletons from the conll file `infpath` and write the version without singletons to the conll file `outfpath`.
- `filter_pos`: Filter out mentions whose POS is in `unwanted_pos` and return a new mention list.
- `check_no_duplicate_mentions`: Return `True` if there are no duplicate mentions.
- `merge_boundaries`: Add the mentions of `boundary_docs` to `coref_docs` as singletons, if they don't already exist.
- `remove_col`: Remove columns from all tokens in docs.
- `write_mentions`: Inverse of `compute_mentions()`: write the last column in `sent`.
- `compare_coref_cols`: Build a conll file that merges the coreference columns of several other files.
- `to_corefcol`: Write the conll file `outfpath` with just the last column (coref) of the conll file `infpath`.
- `get_conll_2012_key_pattern`: Return a compiled pattern object that matches the conll2012 key format.
- `merge_amalgams`: Add amalgams back into the documents from which they have been removed.
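
Several of the functions above (`compute_mentions`, `compute_chains`, `remove_singletons`, `sentpos2textpos`, `textpos2sentpos`) revolve around the coreference column and mention positions. In CoNLL-2012 files, the last column holds `-` or `|`-separated items such as `(12`, `12)` or `(12)`, which open, close, or fully enclose a mention of chain 12. The following is a minimal sketch of those ideas under stated assumptions, not the package's code: the function names are hypothetical, mentions are assumed to be `(start, stop, chain_id)` tuples with an inclusive `stop`, `-` and `_` are assumed to mark empty cells, and sentence lengths are passed in explicitly for the position conversions.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical mention layout: (start, stop, chain_id) with an inclusive stop.
Mention = Tuple[int, int, int]
Chains = Dict[int, List[Tuple[int, int]]]

# One item of a coreference cell: "(12" opens, "12)" closes, "(12)" is a one-token mention.
ITEM = re.compile(r"^(\()?(\d+)(\))?$")


def parse_coref_column(column: List[str]) -> List[Mention]:
    """Turn the coreference (last) column of one sentence into mentions."""
    open_starts: Dict[int, List[int]] = defaultdict(list)  # chain id -> stack of starts
    mentions: List[Mention] = []
    for i, cell in enumerate(column):
        if cell in ("-", "_"):
            continue  # assumed "empty" markers: no mention boundary on this token
        for item in cell.split("|"):
            match = ITEM.match(item)
            if match is None or not (match.group(1) or match.group(3)):
                raise ValueError(f"unexpected item {item!r} at token {i}")
            opens, chain, closes = match.group(1), int(match.group(2)), match.group(3)
            if opens and closes:
                mentions.append((i, i, chain))  # single-token mention
            elif opens:
                open_starts[chain].append(i)
            elif not open_starts[chain]:
                raise ValueError(f"chain {chain} closed at token {i} but never opened")
            else:
                # Close the most recently opened mention of this chain (handles nesting).
                mentions.append((open_starts[chain].pop(), i, chain))
    if any(open_starts.values()):
        raise ValueError("some mentions are opened but never closed")
    return sorted(mentions)


def group_chains(mentions: List[Mention]) -> Chains:
    """Group mentions by chain id, keeping their (start, stop) spans."""
    chains: Chains = defaultdict(list)
    for start, stop, chain in mentions:
        chains[chain].append((start, stop))
    return dict(chains)


def drop_singletons(chains: Chains) -> Chains:
    """Keep only chains with at least two mentions."""
    return {chain: spans for chain, spans in chains.items() if len(spans) > 1}


def sentpos_to_textpos(
    sent: int, start: int, stop: int, sent_lengths: List[int]
) -> Tuple[int, int]:
    """Map a sentence-relative span to text-level token positions."""
    offset = sum(sent_lengths[:sent])  # number of tokens in all preceding sentences
    return offset + start, offset + stop


def textpos_to_sentpos(
    text_start: int, text_stop: int, sent_lengths: List[int]
) -> Tuple[int, int, int]:
    """Inverse of sentpos_to_textpos; the span must stay inside one sentence."""
    offset = 0
    for sent, length in enumerate(sent_lengths):
        if text_start < offset + length:
            if text_stop >= offset + length:
                raise ValueError("span crosses a sentence boundary")
            return sent, text_start - offset, text_stop - offset
        offset += length
    raise ValueError("span starts beyond the end of the text")


if __name__ == "__main__":
    mentions = parse_coref_column(["(0", "0)", "-", "(1)", "(0)|(2", "2)"])
    print(drop_singletons(group_chains(mentions)))  # → {0: [(0, 1), (4, 4)]}
    print(sentpos_to_textpos(1, 0, 2, [3, 4, 2]))  # → (3, 5)
```

Keeping a stack of start positions per chain id lets nested mentions of the same chain close in the right order, and an unmatched bracket is reported as an error instead of being silently dropped.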

To use a function, import it from `conll_transform`. For example:

```python
from conll_transform import read_files

# Read one or several conll files into a single dictionary of documents.
documents = read_files("myfile.conll", "myfile2.conll")
print(documents)
```
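
`read_files` returns a dictionary of documents, but its exact layout is not described here. As a hedged illustration only, the sketch below builds such a dictionary from CoNLL-2012-style text, assuming that `#begin document` and `#end document` lines delimit documents, blank lines separate sentences, and each token line holds whitespace-separated columns. `parse_conll` and the nested-list layout are hypothetical, and its keys (the text after `#begin document`) may not match the keys the package produces.

```python
import re
from typing import Dict, List, Optional

# Hypothetical layout: document key -> sentences -> tokens -> column strings.
Sentence = List[List[str]]
Documents = Dict[str, List[Sentence]]

# CoNLL-2012 documents open with e.g. "#begin document (bc/cctv/00/cctv_0001); part 000".
BEGIN = re.compile(r"^#begin document (.+)$")


def parse_conll(text: str) -> Documents:
    """Split conll text into documents, sentences, tokens and columns."""
    docs: Documents = {}
    sentences: Optional[List[Sentence]] = None  # sentences of the open document
    sentence: Sentence = []

    def flush() -> None:
        # Store the sentence being built, if any, in the open document.
        nonlocal sentence
        if sentence and sentences is not None:
            sentences.append(sentence)
        sentence = []

    for raw in text.splitlines():
        line = raw.strip()
        match = BEGIN.match(line)
        if match:
            flush()
            sentences = docs.setdefault(match.group(1), [])
        elif line.startswith("#end document"):
            flush()
            sentences = None
        elif line.startswith("#"):
            continue  # other comment lines are ignored
        elif not line:
            flush()  # a blank line ends the current sentence
        elif sentences is not None:
            sentence.append(line.split())
    flush()
    return docs
```

Keeping each token as a list of raw column strings means the coreference column of a token is simply `token[-1]`.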


The source can be found at [GitHub](https://github.com/boberle/corefconversion/blob/master/conll_transform.py).

Raw data

{
    "_id": null,
    "home_page": "",
    "name": "conll-transform",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": "",
    "keywords": "",
    "author": "",
    "author_email": "Bruno Oberle <bruno@boberle.com>",
    "download_url": "https://files.pythonhosted.org/packages/c5/11/41a298bbce94cc05f607766666ffbaccaec6ba5f8ff4973afb3f191a359e/conll_transform-0.1.0.tar.gz",
    "platform": null,
    "description": "# CONLL Transform - functions to manipulate CONLL data\n\nThis package constains several functions to manipulate conll data:\n\n- `read_files`: Read one or several conll files and return a dictionary of documents.\n- `read_file`: Read a conll file and return dictionary of documents.\n- `write_file`: Write a conll file.\n- `compute_mentions`: Compute mentions from the raw last column of the conll file.\n- `compute_chains`: Compute and return the chains from the conll data.\n- `sentpos2textpos`: Transform mentions `[SENT, START, STOP]` to `[TEXT_START, TEXT_STOP]`.\n- `textpos2sentpos`: Transform mentions `[TEXT_START, TEXT_STOP]` to `[SENT, START, STOP]`.\n- `write_chains`: Convert a list of chains to a conll coreference column.\n- `replace_coref_col`: Replace the last column of `tar_docs` by the last column of `src_docs`.\n- `remove_singletons`: Remove the singletons of the conll file `infpath`, and write the version without singleton in the conll file `outfpath`.\n- `filter_pos`: Filter mentions that have POS in unwanted_pos, return a new mention list.\n- `check_no_duplicate_mentions`: Return True if there is no duplicate mentions.\n- `merge_boundaries`: Add the mentions of `boundary_docs` to `coref_docs` if they don't already exist, as singletons.\n- `remove_col`: Remove columns from all tokens in docs.\n- `write_mentions`: Opposite for `compute_mentions()`.  Write the last column in `sent`.\n- `compare_coref_cols`: Build a conll file that merge the corefcols of several other files.\n- `to_corefcol`: Write the conll file `outfpath` with just the last column (coref) of the conll file `infpath`.\n- `get_conll_2012_key_pattern`: Return a compiled pattern object to match conll2012 key format.\n- `merge_amalgams`: Add amalgams in documents from where they have been removed.\n\nTo use it, just import the function from `conll_transform`, for example:\n\n```python\nfrom conll_transform import read_files\n\ndocuments = read_files(\"myfile.conll\", \"myfile2.conll\")\nprint(documents)\n```\n\n\nThe source can be found at [GitHub](https://github.com/boberle/corefconversion/blob/master/conll_transform.py).\n\n",
    "bugtrack_url": null,
    "license": "",
    "summary": "Various functions to manipulate CONLL files",
    "version": "0.1.0",
    "project_urls": {
        "Homepage": "https://github.com/boberle/corefconversion/",
        "Issues": "https://github.com/boberle/corefconversion/issues"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "28670fcd538ffc5622029dc9c42060d9fba2656b2c74d6a10b376e54bee3bae0",
                "md5": "1c9fea99b60512d8a251e23b2a6d7724",
                "sha256": "c06c97b3cb25673d40b66ed25c91ecb91adbb4aae4c229fecda0bfe22ce5d324"
            },
            "downloads": -1,
            "filename": "conll_transform-0.1.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "1c9fea99b60512d8a251e23b2a6d7724",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 8648,
            "upload_time": "2024-01-16T22:38:43",
            "upload_time_iso_8601": "2024-01-16T22:38:43.855647Z",
            "url": "https://files.pythonhosted.org/packages/28/67/0fcd538ffc5622029dc9c42060d9fba2656b2c74d6a10b376e54bee3bae0/conll_transform-0.1.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "c51141a298bbce94cc05f607766666ffbaccaec6ba5f8ff4973afb3f191a359e",
                "md5": "118cf0a8c19b1f1f7c21403c484f3c0c",
                "sha256": "26ed32f55f20aef06b39a4af40360ea30948491c6f9daf206e85d8a995e0c395"
            },
            "downloads": -1,
            "filename": "conll_transform-0.1.0.tar.gz",
            "has_sig": false,
            "md5_digest": "118cf0a8c19b1f1f7c21403c484f3c0c",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 8381,
            "upload_time": "2024-01-16T22:38:45",
            "upload_time_iso_8601": "2024-01-16T22:38:45.690177Z",
            "url": "https://files.pythonhosted.org/packages/c5/11/41a298bbce94cc05f607766666ffbaccaec6ba5f8ff4973afb3f191a359e/conll_transform-0.1.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-01-16 22:38:45",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "boberle",
    "github_project": "corefconversion",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [
        {
            "name": "black",
            "specs": [
                [
                    "==",
                    "23.3.0"
                ]
            ]
        },
        {
            "name": "flake8",
            "specs": [
                [
                    "==",
                    "6.0.0"
                ]
            ]
        },
        {
            "name": "isort",
            "specs": [
                [
                    "==",
                    "5.12.0"
                ]
            ]
        },
        {
            "name": "pytest",
            "specs": [
                [
                    "==",
                    "7.2.2"
                ]
            ]
        },
        {
            "name": "mypy",
            "specs": [
                [
                    "==",
                    "1.1.1"
                ]
            ]
        },
        {
            "name": "coverage",
            "specs": [
                [
                    "==",
                    "7.2.3"
                ]
            ]
        },
        {
            "name": "pandas",
            "specs": [
                [
                    "==",
                    "2.0.0"
                ]
            ]
        }
    ],
    "lcname": "conll-transform"
}