[![PyPi](https://img.shields.io/pypi/v/ttta.svg)](https://pypi.org/project/ttta/)
[![Poster](https://badgen.net/badge/Poster/CPSS@Konvens24/red?icon=github)](https://github.com/K-RLange/ttta/blob/main/docs/poster.pdf)
# ttta: Tools for temporal text analysis
ttta (pronounced "triple t a") is a collection of algorithms for handling diachronic texts in an efficient and unbiased manner.
Code accompanying temporal text analysis papers is mostly scattered across many different repositories and varies heavily in both code quality and interface. ttta addresses this by providing a collection of such methods with a consistent interface and good code quality.
**This package is currently a work in progress and in its beta stage, so there may be bugs and inconsistencies. If you encounter any, please report them in the issue tracker.**
The package is maintained by [Kai-Robin Lange](https://lwus.statistik.tu-dortmund.de/en/chair/team/lange/).
## Contributing
If you have implemented temporal text analysis methods in Python, we would be happy to include them in this package. Your contribution will, of course, be acknowledged in this repository and in all further publications. If you are interested in sharing your code, feel free to contact me at [kalange\@statistik.tu-dortmund.de](mailto:kalange@statistik.tu-dortmund.de?subject=ttta%20contribution).
## Features
- **Pipeline**: An object that lets the user apply the package's methods through a consistent interface. The pipeline preprocesses the data, splits it into time chunks, trains the chosen model on each chunk, and evaluates the results, and it is intended to cover all methods in the package. This feature was implemented by Kai-Robin Lange and is currently still work in progress and not yet usable.
- **Preprocessing**: Tokenization, lemmatization, stopword removal, and more. This feature was implemented by Kai-Robin Lange.
- **LDAPrototype**: A method for more consistent LDA results, obtained by training multiple LDAs and selecting the best one - the prototype. See the [respective paper by Rieger et al. here](https://doi.org/10.21203/rs.3.rs-1486359/v1). This feature was implemented by Kai-Robin Lange.
- **RollingLDA**: A method for training an LDA model on a time series of texts; the model is updated with each new time chunk. See the [respective paper by Rieger et al. here](http://dx.doi.org/10.18653/v1/2021.findings-emnlp.201). This feature was implemented by Niklas Benner and Kai-Robin Lange.
- **TopicalChanges**: A method to detect changes in the word-topic distributions over time by combining RollingLDA and LDAPrototype with a time-varying bootstrap control chart. See the [respective paper by Rieger et al. here](http://ceur-ws.org/Vol-3117/paper1.pdf). This feature was implemented by Kai-Robin Lange.
- **Poisson Reduced Rank Model**: A method for training the Poisson Reduced Rank Model - a document scaling technique for temporal text data based on a time series of term frequencies. See the [respective paper by Jentsch et al. here](https://doi.org/10.1093/biomet/asaa063). This feature was implemented by Lars Grönberg.
- **BERT-based sense disambiguation**: A method to track the frequency of a word sense over time using BERT's contextualized embeddings. This method was inspired by the [respective paper by Hu et al. here](https://aclanthology.org/P19-1379/). This feature was implemented by Aymane Hachcham.
- **Word2Vec-based semantic change detection**: A method that aligns Word2Vec vector spaces trained on different time chunks to detect changes in word meaning by comparing the embeddings (a minimal sketch of the alignment idea follows this list). This method was inspired by [this paper by Hamilton et al.](https://aclanthology.org/P16-1141.pdf). This feature was implemented by Imene Kolli.
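To illustrate the kind of computation behind the last item, here is a minimal sketch of the orthogonal-Procrustes alignment idea from Hamilton et al., written against plain gensim and numpy rather than the ttta API; all function and variable names below are illustrative placeholders, not part of this package.

```python
# Illustrative sketch only -- not the ttta API. It shows the vector-space
# alignment idea behind Word2Vec-based semantic change detection
# (Hamilton et al., 2016) using gensim and numpy directly.
import numpy as np
from gensim.models import Word2Vec


def align(base: np.ndarray, other: np.ndarray) -> np.ndarray:
    """Rotate the rows of `other` onto `base` via an orthogonal Procrustes fit."""
    u, _, vt = np.linalg.svd(other.T @ base)
    return other @ (u @ vt)


def semantic_change(corpus_t0, corpus_t1, word, dim=100, min_count=5):
    """Cosine similarity of `word` between two time chunks after alignment.

    `corpus_t0` and `corpus_t1` are user-provided lists of tokenized documents,
    one per time chunk. A low similarity suggests a change in meaning.
    """
    m0 = Word2Vec(corpus_t0, vector_size=dim, min_count=min_count)
    m1 = Word2Vec(corpus_t1, vector_size=dim, min_count=min_count)

    # The alignment is only defined on the vocabulary shared by both chunks.
    shared = [w for w in m0.wv.index_to_key if w in m1.wv]
    v0 = np.stack([m0.wv[w] for w in shared])
    v1 = align(v0, np.stack([m1.wv[w] for w in shared]))

    i = shared.index(word)
    return float(v0[i] @ v1[i] / (np.linalg.norm(v0[i]) * np.linalg.norm(v1[i])))
```

ttta's implementation is meant to handle the time-chunk splitting, training, and comparison behind the consistent interface described above; see the examples folder linked below for its actual usage.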
## Upcoming features
- **Hierarchical Sense Modeling**
- **Graphon-Network-based word sense modeling**
- **Spatiotemporal topic modeling**
- **Hopefully many more**
## Installation
You can install the package by cloning the GitHub repository or by using pip.
### Cloning the repository
```bash
git clone https://github.com/K-RLange/ttta.git
cd ttta
pip install .
```
### Using pip
```bash
pip install git+https://github.com/K-RLange/ttta.git
```
or
```bash
pip install ttta
```
## Getting started
You can find a tutorial on how to use each feature of the package in the [examples folder](https://github.com/K-RLange/ttta/tree/main/examples).
## Citing ttta
If you use ttta in your research, please cite the package as follows:
```
@software{ttta,
 author = {Kai-Robin Lange and Lars Grönberg and Niklas Benner and Imene Kolli and Aymane Hachcham and Jonas Rieger and Carsten Jentsch},
title = {ttta: Tools for temporal text analysis},
url = {https://github.com/K-RLange/ttta},
 version = {0.9.5},
}
```