CTApy


* Name: CTApy
* Version: 0.1.4
* Home page: https://github.com/twekhof/CTA
* Summary: Python package for the Conditional Topic Allocation (CTA)
* Upload time: 2024-08-22 10:06:56
* Author: Tobias Wekhof
* Requires Python: >=3.9
* License: MIT

# `CTApy`

Python package for the "Conditional Topic Allocation" (CTA): a text-analysis method that identifies topics that correlate with numerical outcomes.


* Corresponding research paper: [Conditional Topic Allocations for Open-Ended Survey Responses (2024)](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4190308).


## How does CTA work?


CTA finds topics by conditioning on observables. For example, do Republicans write differently about politics than Democrats?
It consists of three steps:

1. Predict the outcome variable with text.

   * Uses DistilBERT to predict the outcome.

2. Select words with high predictive power (positive or negative).

   * Calculates SHAP values for each word and selects words with a statistically significant SHAP value.

3. Group words by semantic similarity.

   * Returns topics with either positive or negative correlation with the outcome.

CTA supports all languages.
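
To make steps 2 and 3 more concrete, here is a minimal, standard-library-only sketch of the idea. It is not the `CTApy` implementation and does not use its API: the per-word scores are made-up numbers standing in for SHAP values, the two-dimensional embeddings are toy vectors, and the names `select_words`, `group_words`, `topics_by_sign`, `t_threshold` and `min_similarity` are hypothetical. The significance check is a rough one-sample t-statistic cutoff, and the grouping is a simple greedy rule; the package may use different methods for both. Step 1 (the DistilBERT prediction) is left out entirely.

```python
import math
from statistics import mean, stdev


def select_words(scores_by_word, t_threshold=2.0, min_count=3):
    """Keep words whose mean score differs clearly from zero.

    scores_by_word maps a word to the attribution scores it received
    across documents (toy stand-ins for SHAP values).
    Returns {word: +1 or -1}, the sign of the word's mean score.
    """
    selected = {}
    for word, scores in scores_by_word.items():
        if len(scores) < min_count:
            continue  # too few observations for a meaningful statistic
        std_error = stdev(scores) / math.sqrt(len(scores))
        if std_error == 0:
            continue  # constant scores: this sketch skips the undefined case
        t_stat = mean(scores) / std_error
        if abs(t_stat) >= t_threshold:
            selected[word] = 1 if t_stat > 0 else -1
    return selected


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms


def group_words(words, embeddings, min_similarity=0.8):
    """Greedy grouping: join the first group whose seed word is similar enough."""
    groups = []
    for word in words:
        for group in groups:
            if cosine(embeddings[word], embeddings[group[0]]) >= min_similarity:
                group.append(word)
                break
        else:
            groups.append([word])  # no similar group found: start a new one
    return groups


def topics_by_sign(scores_by_word, embeddings):
    """Select words, then group positive and negative words separately."""
    selected = select_words(scores_by_word)
    positive = [w for w, sign in selected.items() if sign > 0]
    negative = [w for w, sign in selected.items() if sign < 0]
    return {
        "positive": group_words(positive, embeddings),
        "negative": group_words(negative, embeddings),
    }


# Toy inputs, invented purely for illustration.
toy_scores = {
    "taxes": [0.30, 0.25, 0.35, 0.28],
    "tax": [0.22, 0.31, 0.27, 0.24],
    "climate": [-0.40, -0.35, -0.45, -0.38],
    "the": [0.02, -0.03, 0.01, -0.01],
}
toy_embeddings = {
    "taxes": [0.9, 0.1],
    "tax": [0.85, 0.15],
    "climate": [0.1, 0.95],
    "the": [0.5, 0.5],
}

topics = topics_by_sign(toy_scores, toy_embeddings)
print(topics)
```

In this toy run, "the" is dropped because its scores hover around zero, while the remaining words are split by the sign of their mean score before being grouped.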

## Installation

CTApy requires Python 3.9 or later and pip.  
It is highly recommended to use a virtual environment (or conda environment) for the installation.

```bash
# upgrade pip, wheel and setuptools
python -m pip install -U pip wheel setuptools

# install the package
python -m pip install -U CTApy
```

If you want to use Jupyter, make sure you have it installed in the current environment.
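
After installing, a quick way to confirm that the active environment meets the requirements is a small check like the one below. It is only a sketch using the standard library: `check_environment` is a hypothetical helper, not part of `CTApy`, and it only reports the interpreter version and whether a distribution named `CTApy` is visible to `importlib.metadata`.

```python
import sys
from importlib import metadata


def check_environment(dist_name="CTApy", min_python=(3, 9)):
    """Return (python_ok, installed_version), with None if not installed."""
    python_ok = sys.version_info >= min_python
    try:
        installed_version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        installed_version = None  # distribution not found in this environment
    return python_ok, installed_version


python_ok, installed_version = check_environment()
print("Python version OK:", python_ok)
print("CTApy version:", installed_version or "not installed")
```

Running the same check inside a Jupyter kernel shows whether the notebook uses the environment in which the package was installed.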

## Quickstart 

Please see the hands-on tutorials, which replicate the research paper: [https://github.com/twekhof/CTA/tree/main/tutorials](https://github.com/twekhof/CTA/tree/main/tutorials).


## Author

`CTApy` was developed by

[Tobias Wekhof](https://tobiaswekhof.com), ETH Zurich


## Disclaimer

This Python package is a research tool currently under development. The authors take no responsibility for the accuracy or reliability of the results produced by it.

            
