====================================
PELICAN_nlp
====================================

pelican_nlp stands for "Preprocessing and Extraction of Linguistic Information for Computational Analysis - Natural Language Processing". The package enables the creation of standardized and reproducible language processing pipelines that extract linguistic features from tasks such as discourse, verbal fluency, and image descriptions.

.. image:: https://img.shields.io/pypi/v/pelican-nlp.svg
   :target: https://pypi.org/project/pelican_nlp/
   :alt: PyPI version

.. image:: https://img.shields.io/github/license/ypauli/pelican_nlp.svg
   :target: https://github.com/ypauli/pelican_nlp/blob/main/LICENSE
   :alt: License

.. image:: https://img.shields.io/pypi/pyversions/pelican-nlp.svg
   :target: https://pypi.org/project/pelican_nlp/
   :alt: Supported Python Versions

Installation
============

Install the package from PyPI using pip:

.. code-block:: bash

   pip install pelican_nlp

For the latest development version, install directly from GitHub:

.. code-block:: bash

   pip install git+https://github.com/ypauli/pelican_nlp.git

Usage
=====

To use the pelican_nlp package:

1. Adapt the configuration file to your needs. ALWAYS change the specified project folder location.
2. Save the configuration file to your main project directory.
3. Run pelican_nlp from the command line or from a Python script.

Run from the command line:

Navigate to the main project directory in the command line (the folder must contain your subjects folder and your ``configuration.yml`` file) and enter:

.. code-block:: bash

   pelican-run
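
The project directory referenced above might look like this. This layout is an illustrative sketch: the per-subject structure is an assumption, and only the ``configuration.yml`` file and the subjects folder are required by the description above; filenames follow the LPDS convention described below.

.. code-block:: text

   project_folder/
   ├── configuration.yml
   └── subjects/
       ├── sub-01/
       │   └── sub-01_interview_schizophrenia.rtf
       └── sub-02/
           └── sub-02_interview_schizophrenia.rtf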

Run from a Python script:

Create a Python file with the IDE of your choice (e.g., Visual Studio Code, PyCharm) and copy the following code into it:

.. code-block:: python

   from pelican_nlp.main import Pelican

   configuration_file = "/path/to/your/config/file.yml"
   pelican = Pelican(configuration_file)
   pelican.run()

Replace ``/path/to/your/config/file.yml`` with the path to the configuration file located in your main project folder.

For reliable operation, data must be stored in the *Language Processing Data Structure (LPDS)* format, inspired by the Brain Imaging Data Structure (BIDS) conventions.

Text and audio files should follow this naming convention:

``[subjectID]_[sessionID]_[task]_[task-supplement]_[corpus].[extension]``

- subjectID: ID of the subject (e.g., sub-01); mandatory
- sessionID: ID of the session (e.g., ses-01); if available
- task: task used for file creation; mandatory
- task-supplement: additional information about the task; if available
- corpus: label marking files that belong to the same group (e.g., healthy-control / patient); mandatory
- extension: file extension (e.g., txt / pdf / docx / rtf); mandatory

Example filenames:

- sub-01_interview_schizophrenia.rtf
- sub-03_ses-02_fluency_semantic_animals.docx
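
The naming convention above can be checked mechanically. The helper below is an illustrative sketch (it is not part of pelican_nlp's API) that splits a filename into its LPDS fields, treating ``sessionID`` and ``task-supplement`` as the optional fields:

```python
from pathlib import Path

def parse_lpds_name(filename):
    """Split an LPDS filename into its fields.

    Returns a dict with keys subject, session, task, task_supplement,
    corpus, and extension; optional fields are None when absent.
    """
    path = Path(filename)
    parts = path.stem.split("_")
    subject = parts.pop(0)                                   # mandatory, e.g. sub-01
    session = parts.pop(0) if parts and parts[0].startswith("ses-") else None
    corpus = parts.pop()                                     # mandatory, last field
    task = parts.pop(0)                                      # mandatory
    supplement = "_".join(parts) if parts else None          # optional middle field
    return {"subject": subject, "session": session, "task": task,
            "task_supplement": supplement, "corpus": corpus,
            "extension": path.suffix.lstrip(".")}

print(parse_lpds_name("sub-01_interview_schizophrenia.rtf"))
print(parse_lpds_name("sub-03_ses-02_fluency_semantic_animals.docx"))
```

Applied to the example filenames above, the first parses to subject ``sub-01``, task ``interview``, corpus ``schizophrenia`` with no session or supplement; the second additionally yields session ``ses-02`` and supplement ``semantic``.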

To optimize performance, close other programs and limit other GPU usage during language processing.

Features
========

- **Feature 1: Cleaning text files**

  - Handles whitespace, timestamps, punctuation, special characters, and case sensitivity.

- **Feature 2: Linguistic feature extraction**

  - Extracts semantic embeddings, logits, distance from optimality, and semantic similarity.
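
For intuition on the semantic-similarity feature: similarity between two embedding vectors is commonly measured as the cosine of the angle between them. The sketch below uses toy 3-dimensional vectors and NumPy; pelican_nlp's actual implementation and embedding models may differ.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (1.0 = identical direction)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy "embeddings"; real ones come from a language model.
cat = [1.0, 0.9, 0.1]
dog = [0.9, 1.0, 0.2]
car = [0.1, 0.2, 1.0]
print(cosine_similarity(cat, dog))  # close to 1: semantically related
print(cosine_similarity(cat, car))  # much lower: unrelated
```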

Examples
========

You can find example setups in the `examples/ <https://github.com/ypauli/pelican_nlp/tree/main/examples>`_ folder.
ALWAYS change the project folder path specified in the configuration file to your specific project location.

Contributing
============

Contributions are welcome! Please check out the `contributing guide <https://github.com/ypauli/pelican_nlp/blob/main/CONTRIBUTING.md>`_.

License
=======

This project is licensed under the Creative Commons Attribution-NonCommercial 4.0 International license. See the `LICENSE <https://github.com/ypauli/pelican_nlp/blob/main/LICENSE>`_ file for details.