# CONLL Transform - functions to manipulate CONLL data
This package contains several functions to manipulate conll data:
- `read_files`: Read one or several conll files and return a dictionary of documents.
- `read_file`: Read a conll file and return a dictionary of documents.
- `write_file`: Write a conll file.
- `compute_mentions`: Compute mentions from the raw last column of the conll file.
- `compute_chains`: Compute and return the chains from the conll data.
- `sentpos2textpos`: Transform mentions `[SENT, START, STOP]` to `[TEXT_START, TEXT_STOP]`.
- `textpos2sentpos`: Transform mentions `[TEXT_START, TEXT_STOP]` to `[SENT, START, STOP]`.
- `write_chains`: Convert a list of chains to a conll coreference column.
- `replace_coref_col`: Replace the last column of `tar_docs` by the last column of `src_docs`.
- `remove_singletons`: Remove the singletons from the conll file `infpath`, and write the version without singletons to the conll file `outfpath`.
- `filter_pos`: Filter mentions that have a POS in `unwanted_pos`; return a new mention list.
- `check_no_duplicate_mentions`: Return True if there are no duplicate mentions.
- `merge_boundaries`: Add the mentions of `boundary_docs` to `coref_docs` if they don't already exist, as singletons.
- `remove_col`: Remove columns from all tokens in docs.
- `write_mentions`: Opposite of `compute_mentions()`. Write the last column in `sent`.
- `compare_coref_cols`: Build a conll file that merges the coreference columns of several other files.
- `to_corefcol`: Write the conll file `outfpath` with just the last column (coref) of the conll file `infpath`.
- `get_conll_2012_key_pattern`: Return a compiled pattern object to match conll2012 key format.
- `merge_amalgams`: Add amalgams back into the documents from which they have been removed.
To use the package, import the function you need from `conll_transform`, for example:
```python
from conll_transform import read_files
documents = read_files("myfile.conll", "myfile2.conll")
print(documents)
```
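The two position formats handled by `sentpos2textpos` and `textpos2sentpos` can be illustrated with a small standalone sketch. It does not call the package: the function names, the `sent_lengths` argument, and the assumption that a mention never crosses a sentence boundary are all hypothetical, chosen only to show the idea of shifting sentence-relative token positions by the number of tokens in the preceding sentences.

```python
from bisect import bisect_right
from itertools import accumulate


def sent_to_text_positions(mentions, sent_lengths):
    """Map (sent, start, stop) triples to (text_start, text_stop) pairs.

    Hypothetical helper, not part of conll_transform.
    """
    # offsets[i] is the number of tokens that precede sentence i.
    offsets = [0, *accumulate(sent_lengths)]
    return [(offsets[s] + start, offsets[s] + stop) for s, start, stop in mentions]


def text_to_sent_positions(mentions, sent_lengths):
    """Map (text_start, text_stop) pairs back to (sent, start, stop) triples.

    Hypothetical helper; assumes each mention lies within one sentence.
    """
    offsets = [0, *accumulate(sent_lengths)]
    result = []
    for text_start, text_stop in mentions:
        # The sentence is the last one whose offset is <= text_start.
        sent = bisect_right(offsets, text_start) - 1
        result.append((sent, text_start - offsets[sent], text_stop - offsets[sent]))
    return result


# Two sentences of 3 and 4 tokens.
sent_lengths = [3, 4]
sent_mentions = [(0, 1, 3), (1, 0, 2)]
text_mentions = sent_to_text_positions(sent_mentions, sent_lengths)
```

Because the same offset is added to both `start` and `stop`, the sketch works whether `stop` is read as inclusive or exclusive; which convention the package itself uses should be checked in its source.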
The source code of the package is available on [GitHub](https://github.com/boberle/corefconversion/blob/master/conll_transform.py).
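For readers unfamiliar with the format, the following standalone sketch shows what "computing mentions from the raw last column" and "computing chains" can look like for a CoNLL-2012 style coreference column, where `(0` opens a mention of chain 0, `0)` closes it, `(1)` is a single-token mention, labels are joined with `|`, and `-` means no label. It is an illustration under those assumptions, not the package's implementation: the function names and the `(start, stop, chain_id)` layout with an inclusive `stop` are hypothetical.

```python
import re
from collections import defaultdict


def mentions_from_coref_column(column):
    """Return sorted (start, stop, chain_id) triples; stop is inclusive.

    Hypothetical helper, not part of conll_transform.
    """
    open_starts = defaultdict(list)  # chain id -> stack of open start positions
    mentions = []
    for position, cell in enumerate(column):
        if cell in ("-", "_", ""):
            continue  # token carries no coreference label
        for part in cell.split("|"):
            match = re.fullmatch(r"(\()?(\d+)(\))?", part)
            if match is None or not (match.group(1) or match.group(3)):
                raise ValueError(f"unexpected coreference label: {part!r}")
            chain_id = int(match.group(2))
            if match.group(1):  # opening parenthesis: a mention starts here
                open_starts[chain_id].append(position)
            if match.group(3):  # closing parenthesis: a mention ends here
                if not open_starts[chain_id]:
                    raise ValueError(f"chain {chain_id} closed but never opened")
                mentions.append((open_starts[chain_id].pop(), position, chain_id))
    if any(open_starts.values()):
        raise ValueError("some mentions were opened but never closed")
    return sorted(mentions)


def chains_from_mentions(mentions):
    """Group (start, stop) spans by chain id."""
    chains = defaultdict(list)
    for start, stop, chain_id in mentions:
        chains[chain_id].append((start, stop))
    return dict(chains)


column = ["(0", "0)", "-", "(1)", "(0)|(2", "2)"]
mentions = mentions_from_coref_column(column)
chains = chains_from_mentions(mentions)
# chains == {0: [(0, 1), (4, 4)], 1: [(3, 3)], 2: [(4, 5)]}
```

In this example chain 1 has a single mention, which is what the function list calls a singleton.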
Raw data
{
"_id": null,
"home_page": "",
"name": "conll-transform",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": "",
"keywords": "",
"author": "",
"author_email": "Bruno Oberle <bruno@boberle.com>",
"download_url": "https://files.pythonhosted.org/packages/c5/11/41a298bbce94cc05f607766666ffbaccaec6ba5f8ff4973afb3f191a359e/conll_transform-0.1.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "",
"summary": "Various functions to manipulate CONLL files",
"version": "0.1.0",
"project_urls": {
"Homepage": "https://github.com/boberle/corefconversion/",
"Issues": "https://github.com/boberle/corefconversion/issues"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "28670fcd538ffc5622029dc9c42060d9fba2656b2c74d6a10b376e54bee3bae0",
"md5": "1c9fea99b60512d8a251e23b2a6d7724",
"sha256": "c06c97b3cb25673d40b66ed25c91ecb91adbb4aae4c229fecda0bfe22ce5d324"
},
"downloads": -1,
"filename": "conll_transform-0.1.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "1c9fea99b60512d8a251e23b2a6d7724",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 8648,
"upload_time": "2024-01-16T22:38:43",
"upload_time_iso_8601": "2024-01-16T22:38:43.855647Z",
"url": "https://files.pythonhosted.org/packages/28/67/0fcd538ffc5622029dc9c42060d9fba2656b2c74d6a10b376e54bee3bae0/conll_transform-0.1.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "c51141a298bbce94cc05f607766666ffbaccaec6ba5f8ff4973afb3f191a359e",
"md5": "118cf0a8c19b1f1f7c21403c484f3c0c",
"sha256": "26ed32f55f20aef06b39a4af40360ea30948491c6f9daf206e85d8a995e0c395"
},
"downloads": -1,
"filename": "conll_transform-0.1.0.tar.gz",
"has_sig": false,
"md5_digest": "118cf0a8c19b1f1f7c21403c484f3c0c",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 8381,
"upload_time": "2024-01-16T22:38:45",
"upload_time_iso_8601": "2024-01-16T22:38:45.690177Z",
"url": "https://files.pythonhosted.org/packages/c5/11/41a298bbce94cc05f607766666ffbaccaec6ba5f8ff4973afb3f191a359e/conll_transform-0.1.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-01-16 22:38:45",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "boberle",
"github_project": "corefconversion",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [
{
"name": "black",
"specs": [
[
"==",
"23.3.0"
]
]
},
{
"name": "flake8",
"specs": [
[
"==",
"6.0.0"
]
]
},
{
"name": "isort",
"specs": [
[
"==",
"5.12.0"
]
]
},
{
"name": "pytest",
"specs": [
[
"==",
"7.2.2"
]
]
},
{
"name": "mypy",
"specs": [
[
"==",
"1.1.1"
]
]
},
{
"name": "coverage",
"specs": [
[
"==",
"7.2.3"
]
]
},
{
"name": "pandas",
"specs": [
[
"==",
"2.0.0"
]
]
}
],
"lcname": "conll-transform"
}