ideadensity

Name: ideadensity
Version: 0.4.1
Home page: https://github.com/jrrobison1/PyCPIDR
Summary: Python library for computing propositional idea density
Upload time: 2025-03-11 02:15:31
Maintainer: None
Docs URL: None
Author: Jason Robison
Requires Python: <3.13,>=3.10
License: GNU GPLv2
Keywords: nlp, idea density, linguistics
# ideadensity
[![PyPI - Version](https://img.shields.io/pypi/v/ideadensity?link=https%3A%2F%2Fpypi.org%2Fproject%2Fideadensity%2F)](https://pypi.org/project/ideadensity/) [![Unit Tests](https://github.com/jrrobison1/pycpidr/actions/workflows/unit_tests.yml/badge.svg)](https://github.com/jrrobison1/pycpidr/actions/workflows/unit_tests.yml) [![Downloads](https://static.pepy.tech/badge/pycpidr)](https://pepy.tech/project/pycpidr)

Python library for computing propositional idea density.

## Table of Contents
- [Introduction](#introduction)
- [What is Idea Density?](#what-is-idea-density)
- [Installation](#installation)
- [Usage](#usage)
  - [CPIDR](#cpidr)
  - [DEPID](#depid)
  - [Command Line Interface](#command-line-interface)
- [Requirements](#requirements)
- [Development Setup](#development-setup)
- [Running Tests](#running-tests)
- [CPIDR Parity with CPIDR 3.2](#cpidr-parity-with-cpidr-32)
- [References](#references)
- [Citing](#citing)
- [Contributing](#contributing)
- [License](#license)

## Introduction

ideadensity is a Python library that automatically computes the propositional idea density of an English text. This project aims to make this functionality more accessible to Python developers and researchers. ideadensity provides two ways of computing idea density:
- CPIDR. The CPIDR implementation in ideadensity is a direct port of the Computerized Propositional Idea Density Rater (CPIDR) 3.2 (Brown et al., 2008) [1].
- DEPID. This library implements the DEPID algorithm described by Sirts et al. (2017) [2].

Here's a quick example of how to use ideadensity:
```python
from ideadensity import cpidr, depid

text = "The quick brown fox jumps over the lazy dog."
cpidr_word_count, proposition_count, cpidr_density, word_list = cpidr(text)
depid_density, depid_word_count, dependencies = depid(text)

print(f"CPIDR density: {cpidr_density:.3f}")
print(f"DEPID density: {depid_density:.3f}")
```

## What is Idea Density?

Idea density, also known as propositional density, is a measure of the amount of information conveyed relative to the number of words used. It's calculated by dividing the number of expressed propositions by the number of words. This metric has applications in various fields, including linguistics, cognitive science, and healthcare research.
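
For example, the sentence analyzed in the TXT export example later in this README ("This is a test sentence.") is scored as 5 words and 2 propositions, so its idea density is 2 / 5 = 0.400. A minimal sketch of that relationship using `cpidr` (assuming the returned density is simply this ratio):

```python
from ideadensity import cpidr

# "This is a test sentence." is scored as 5 words and 2 propositions
# in the TXT export example shown later in this README.
word_count, proposition_count, density, _ = cpidr("This is a test sentence.")

print(f"{proposition_count} propositions / {word_count} words = {proposition_count / word_count:.3f}")
print(f"Reported density: {density:.3f}")  # expected to match the ratio above
```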

## Installation

### Using pip
1. Install the package:
```bash
pip install ideadensity
```

2. Download the required spaCy model:
```bash
python -m spacy download en_core_web_sm
```

### Using poetry

```bash
poetry add ideadensity
python -m spacy download en_core_web_sm
```

**Note**: This package currently supports Python 3.10-3.12 due to dependency constraints with spaCy and its dependencies. If you're using Python 3.13, you'll need to create a virtual environment with a compatible Python version.


## Usage
### CPIDR
Here's a simple example of how to use CPIDR:

```python
from ideadensity import cpidr

text = "The quick brown fox jumps over the lazy dog."
word_count, proposition_count, density, word_list = cpidr(text)

print(f"Word count: {word_count}")
print(f"Proposition count: {proposition_count}")
print(f"Idea density: {density:.3f}")

# Analyzing speech
speech_text = "Um, you know, I think that, like, the weather is nice today."
word_count, proposition_count, density, word_list = cpidr(speech_text, speech_mode=True)

print(f"Speech mode - Idea density: {density:.3f}")

# Detailed word analysis
for word in word_list.items:
    if word.is_word:
        print(f"Token: {word.token}, Tag: {word.tag}, Is proposition: {word.is_proposition}")
```

#### Speech Mode

ideadensity's CPIDR mode supports a speech mode that handles common speech patterns and fillers differently from written text. When analyzing transcripts or spoken language, pass `speech_mode=True` for more accurate results.

### DEPID
Here's an example of how to use the DEPID functionality:
```python
from ideadensity import depid

text = "The quick brown fox jumps over the lazy dog."
density, word_count, dependencies = depid(text)
print(f"Word count: {word_count}")
print(f"Idea density: {density:.3f}")
print("Dependencies:")
for dep in dependencies:
    print(f"Token: {dep[0]}, Dependency: {dep[1]}, Head: {dep[2]}")
```

#### DEPID-R
DEPID-R counts _distinct_ dependencies.

```python
from ideadensity import depid

text = "This is a test of DEPID-R. This is a test of DEPID-R"
density, word_count, dependencies = depid(text, is_depid_r=True)

print(f"DEPID-R idea density: {density:.3f}")
```

#### Using custom filters
ideadensity's DEPID mode supports custom filtering of sentences and tokens. By default, ideadensity uses the filters described by Sirts et al. (2017); an illustrative sketch of these defaults follows the list:
- Sentence filter:
    - Filter out sentences with "I" or "You" as the subject (i.e. the "I" or "You" token has an "nsubj" dependency and its head is the root).
    - Note: Sirts et al. (2017) also filter out vague sentences using SpeciTeller; ideadensity does not yet implement that filter.
- Token filters:
    - Filter out "det" dependencies if the token is "a", "an" or "the".
    - Filter out "nsubj" dependencies if the token is "it" or "this".
    - Filter out all "cc" dependencies.
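
For reference, the default behaviour described above could be written as filter functions along the following lines. This is only an illustrative sketch against spaCy's token attributes (`text`, `dep_`, `head`), not the library's actual implementation:

```python
def default_sentence_filter(sent):
    """Keep a sentence unless its subject is "I" or "You" (an "nsubj" token whose head is the root)."""
    for token in sent:
        if (
            token.dep_ == "nsubj"
            and token.text.lower() in ("i", "you")
            and token.head.dep_ == "ROOT"
        ):
            return False
    return True


def default_token_filter(token):
    """Keep a token unless it matches one of the default exclusions described above."""
    word = token.text.lower()
    if token.dep_ == "det" and word in ("a", "an", "the"):
        return False
    if token.dep_ == "nsubj" and word in ("it", "this"):
        return False
    if token.dep_ == "cc":
        return False
    return True
```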

The example below demonstrates how to apply your own custom filters to modify the analysis. The `sentence_filters` and `token_filters` parameters allow you to customize the DEPID algorithm to suit your specific needs.
```python
def custom_sentence_filter(sent):
    return len(sent) > 3


def custom_token_filter(token):
    return token.pos_ != "DET"


text_with_filters = "I run. The quick brown fox jumps over the lazy dog."
density, word_count, dependencies = depid(
    text_with_filters,
    sentence_filters=[custom_sentence_filter],
    token_filters=[custom_token_filter],
)
print(f"\nWith custom filters - Idea density: {density:.3f}")
```

### Command Line Interface
The package includes a command line interface for quick analysis of text.

Command line options:
- `--text TEXT`: Directly provide text for analysis (can include multiple words)
- `--file FILE`: Path to a file containing text to analyze
- `--speech-mode`: Enable speech mode for analyzing transcripts (filters common fillers)
- `--csv CSV`: Export token details to a CSV file at the specified path
- `--txt TXT`: Export results to a TXT file in CPIDR format at the specified path

Note: You must provide either `--text` or `--file` when using the command line interface.

```bash
# Analyze text directly from command line
python main.py --text "The quick brown fox jumps over the lazy dog."

# Analyze text from a file
python main.py --file sample.txt

# Use speech mode with text from a file
python main.py --file transcript.txt --speech-mode

# Export token details to a CSV file
python main.py --text "This is a test sentence." --csv output.csv

# Export results in CPIDR-compatible format to a TXT file
python main.py --text "This is a test sentence." --txt output.txt

# Export in both formats
python main.py --file sample.txt --csv output.csv --txt output.txt
```

### Graphical User Interface
Use one of the provided downloads for your operating system, or clone this repository and run:
```bash
python main.py
```



#### Export Formats

**CSV Export**: The CSV export includes detailed information about each token, with the following columns (a short Python sketch producing a similar table appears after the list):
- Token: The actual word or token
- Tag: The part-of-speech tag
- Is Word: Whether the token is considered a word (True/False)
- Is Proposition: Whether the token is considered a proposition (True/False)
- Rule Number: The rule number that identified the token as a proposition (if applicable)
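
The `--csv` flag produces this file for you; if you want a similar table from Python, a minimal sketch using the `word_list` attributes shown earlier might look like this (the Rule Number column is omitted because its attribute name is not documented above):

```python
import csv

from ideadensity import cpidr

_, _, _, word_list = cpidr("This is a test sentence.")

# Write one row per token using the attributes documented in the CPIDR example above.
with open("tokens.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Token", "Tag", "Is Word", "Is Proposition"])
    for word in word_list.items:
        writer.writerow([word.token, word.tag, word.is_word, word.is_proposition])
```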

**TXT Export**: The TXT export produces a file in a format compatible with the original CPIDR tool:
```
ideadensity 0.2.11

"This is a test sentence...."
 054 PRP  W   This
 200 VBZ  W P is
 201 DT   W   a
 200 JJ   W P test
     NN   W   sentence
     .        .

     2 propositions
     5 words
 0.400 density
```
Each line in the token section includes:
- Rule number (if available)
- Part-of-speech tag
- Word marker (W if the token is a word)
- Proposition marker (P if the token is a proposition)
- The token text


## Requirements

- Python 3.10+
- spaCy 3.7.5+

## Development Setup

To set up the development environment:

1. Clone the repository
2. Install Poetry if you haven't already: `pip install poetry`
3. Install project dependencies: `poetry install`
4. Install the required spaCy model: `poetry run python -m spacy download en_core_web_sm`
5. Activate the virtual environment: `poetry shell`

## Running Tests

To run the tests, use pytest:

```bash
pytest tests/
```

## CPIDR Parity with CPIDR 3.2
Because this port uses spaCy as its part-of-speech tagger instead of the original program's MontyLingua, there is a very slight difference in the reported idea density. On the 847 words of text covered by this port's unit tests:
- ideadensity: 434 propositions, 0.512 idea density
- CPIDR 3.2: 436 propositions, 0.515 idea density

For more information about the original CPIDR 3.2, please visit [CASPR's official page](http://ai1.ai.uga.edu/caspr/).

## References
[1] Brown, C., Snodgrass, T., Kemper, S. J., Herman, R., & Covington, M. A. (2008). Automatic measurement of propositional idea density from part-of-speech tagging. Behavior research methods, 40(2), 540-545.

[2] Sirts, K., Piguet, O., & Johnson, M. (2017). Idea density for predicting Alzheimer's disease from transcribed speech. arXiv preprint arXiv:1706.04473.

## Citing
If you use this project in your research, you may cite it as: 

Jason Robison. (2024). *ideadensity* (0.4.1) [Source code]. GitHub. https://github.com/jrrobison1/ideadensity


## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

1. Fork the repository
2. Create your feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request

Please ensure that your code passes all tests and follows the project's coding style.

## License
This project is licensed under the GNU General Public License v2.0. See the [LICENSE](LICENSE) file for details.

ideadensity's CPIDR implementation is a port of the original CPIDR 3.2, which was released under GPL v2. This project maintains the same license to comply with the terms of the original software.


            
