|topmost-logo| TopMost
=================================
.. |topmost-logo| image:: docs/source/_static/topmost-logo.png
:width: 38
.. image:: https://img.shields.io/github/stars/bobxwu/topmost?logo=github
:target: https://github.com/bobxwu/topmost/stargazers
:alt: Github Stars
.. image:: https://static.pepy.tech/badge/topmost
:target: https://pepy.tech/project/topmost
:alt: Downloads
.. image:: https://img.shields.io/pypi/v/topmost
:target: https://pypi.org/project/topmost
:alt: PyPi
.. image:: https://readthedocs.org/projects/topmost/badge/?version=latest
:target: https://topmost.readthedocs.io/en/latest/?badge=latest
:alt: Documentation Status
.. image:: https://img.shields.io/github/license/bobxwu/topmost
:target: https://www.apache.org/licenses/LICENSE-2.0/
:alt: License
.. image:: https://img.shields.io/github/contributors/bobxwu/topmost
:target: https://github.com/bobxwu/topmost/graphs/contributors/
:alt: Contributors
.. image:: https://img.shields.io/badge/arXiv-2309.06908-<COLOR>.svg
:target: https://arxiv.org/pdf/2309.06908.pdf
:alt: arXiv
TopMost covers the full lifecycle of topic modeling: datasets, preprocessing, models, training, and evaluation. It supports the most popular topic modeling scenarios, including basic, dynamic, hierarchical, and cross-lingual topic modeling.
| Check our **ACL 2024 demo paper**: `Towards the TopMost: A Topic Modeling System Toolkit <https://arxiv.org/pdf/2309.06908.pdf>`_.
| Check our survey paper on neural topic models accepted to **Artificial Intelligence Review**: `A Survey on Neural Topic Models: Methods, Applications, and Challenges <https://arxiv.org/pdf/2401.15351.pdf>`_.
|
| If you use TopMost, please cite:
::
@inproceedings{wu2023topmost,
title = "Towards the {T}op{M}ost: A Topic Modeling System Toolkit",
author = "Wu, Xiaobao and Pan, Fengjun and Luu, Anh Tuan",
editor = "Cao, Yixin and Feng, Yang and Xiong, Deyi",
booktitle = "Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)",
month = aug,
year = "2024",
address = "Bangkok, Thailand",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.acl-demos.4",
pages = "31--41"
}
@article{wu2023survey,
title={A Survey on Neural Topic Models: Methods, Applications, and Challenges},
author={Wu, Xiaobao and Nguyen, Thong and Luu, Anh Tuan},
journal={Artificial Intelligence Review},
url={https://doi.org/10.1007/s10462-023-10661-7},
year={2024},
publisher={Springer}
}
==================
.. contents:: **Table of Contents**
:depth: 2
============
Overview
============
TopMost supports the following topic modeling scenarios, each with its own models, evaluation metrics, and datasets:
.. image:: https://github.com/BobXWu/TopMost/raw/main/docs/source/_static/architecture.svg
:width: 390
:align: center
+------------------------------+---------------+--------------------------------------------+-----------------+
|            Scenario          |     Model     |               Evaluation Metric            |  Datasets       |
+==============================+===============+============================================+=================+
|                              | | LDA_        |                                            |                 |
|                              | | NMF_        |                                            | | 20NG          |
|                              | | ProdLDA_    | | TC                                       | | IMDB          |
|                              | | DecTM_      | | TD                                       | | NeurIPS       |
| | Basic Topic Modeling       | | ETM_        | | Clustering                               | | ACL           |
|                              | | NSTM_       | | Classification                           | | NYT           |
|                              | | TSCTM_      |                                            | | Wikitext-103  |
|                              | | BERTopic_   |                                            |                 |
|                              | | ECRTM_      |                                            |                 |
|                              | | FASTopic_   |                                            |                 |
+------------------------------+---------------+--------------------------------------------+-----------------+
|                              |               |                                            | | 20NG          |
|                              | | HDP_        | | TC over levels                           | | IMDB          |
| | Hierarchical               | | SawETM_     | | TD over levels                           | | NeurIPS       |
| | Topic Modeling             | | HyperMiner_ | | Clustering over levels                   | | ACL           |
|                              | | ProGBN_     | | Classification over levels               | | NYT           |
|                              | | TraCo_      |                                            | | Wikitext-103  |
|                              |               |                                            |                 |
+------------------------------+---------------+--------------------------------------------+-----------------+
|                              |               | | TC over time slices                      |                 |
| | Dynamic                    | | DTM_        | | TD over time slices                      | | NeurIPS       |
| | Topic Modeling             | | DETM_       | | Clustering                               | | ACL           |
|                              | | CFDTM_      | | Classification                           | | NYT           |
+------------------------------+---------------+--------------------------------------------+-----------------+
|                              |               | | TC (CNPMI)                               | | ECNews        |
| | Cross-lingual              | | NMTM_       | | TD over languages                        | | Amazon        |
| | Topic Modeling             | | InfoCTM_    | | Classification (Intra and Cross-lingual) | | Review Rakuten|
|                              |               |                                            |                 |
+------------------------------+---------------+--------------------------------------------+-----------------+
.. _LDA: https://www.jmlr.org/papers/volume3/blei03a/blei03a.pdf
.. _NMF: https://papers.nips.cc/paper_files/paper/2000/hash/f9d1152547c0bde01830b7e8bd60024c-Abstract.html
.. _ProdLDA: https://arxiv.org/pdf/1703.01488.pdf
.. _DecTM: https://aclanthology.org/2021.findings-acl.15.pdf
.. _ETM: https://aclanthology.org/2020.tacl-1.29.pdf
.. _NSTM: https://arxiv.org/abs/2008.13537
.. _BERTopic: https://arxiv.org/pdf/2203.05794.pdf
.. _CTM: https://aclanthology.org/2021.eacl-main.143/
.. _TSCTM: https://aclanthology.org/2022.emnlp-main.176/
.. _ECRTM: https://arxiv.org/pdf/2306.04217.pdf
.. _FASTopic: https://arxiv.org/pdf/2405.17978
.. _HDP: https://people.eecs.berkeley.edu/~jordan/papers/hdp.pdf
.. _SawETM: http://proceedings.mlr.press/v139/duan21b/duan21b.pdf
.. _HyperMiner: https://arxiv.org/pdf/2210.10625.pdf
.. _ProGBN: https://proceedings.mlr.press/v202/duan23c/duan23c.pdf
.. _TraCo: https://arxiv.org/pdf/2401.14113.pdf
.. _DTM: https://mimno.infosci.cornell.edu/info6150/readings/dynamic_topic_models.pdf
.. _DETM: https://arxiv.org/abs/1907.05545
.. _CFDTM: https://arxiv.org/pdf/2405.17957
.. _NMTM: https://bobxwu.github.io/files/pub/NLPCC2020_Neural_Multilingual_Topic_Model.pdf
.. _InfoCTM: https://arxiv.org/abs/2304.03544
============
Quick Start
============
Install TopMost
-----------------
Install TopMost with ``pip``:

.. code-block:: console

    $ pip install topmost

-------------------------------------------
We use FASTopic_ to get the top words of the discovered topics, ``top_words``, and the topic distributions of documents, ``doc_topic_dist``.
The preprocessing steps are configurable; see the documentation for details.
.. code-block:: python

    from topmost import RawDataset, Preprocess, FASTopicTrainer
    from sklearn.datasets import fetch_20newsgroups

    # load raw documents and preprocess them with a vocabulary size of 10000
    docs = fetch_20newsgroups(subset='all', remove=('headers', 'footers', 'quotes'))['data']
    preprocess = Preprocess(vocab_size=10000)

    dataset = RawDataset(docs, preprocess, device="cuda")  # or "cpu"

    # train FASTopic on the dataset
    trainer = FASTopicTrainer(dataset, verbose=True)
    top_words, doc_topic_dist = trainer.train()

    # infer the topic distributions of new documents
    new_docs = [
        "This is a document about space, including words like space, satellite, launch, orbit.",
        "This is a document about Microsoft Windows, including words like windows, files, dos."
    ]

    new_theta = trainer.test(new_docs)
    print(new_theta.argmax(1))  # index of the highest-weight topic per document

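
The last line of the example above selects, for each new document, the index of its highest-weight topic. As a minimal sketch of that step on plain Python lists (the numbers are made up for illustration and are not TopMost output; ``dominant_topics`` is a hypothetical helper, not part of the library):

.. code-block:: python

    # Hypothetical doc-topic distributions: one row per document, one column per topic.
    toy_theta = [
        [0.10, 0.70, 0.20],
        [0.55, 0.05, 0.40],
    ]

    def dominant_topics(theta):
        """Return the index of the largest entry in each row."""
        return [max(range(len(row)), key=row.__getitem__) for row in theta]

    print(dominant_topics(toy_theta))

The returned indices can then be matched against the list of top words to see which topic each document is mostly about.
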
============
Usage
============
Download a preprocessed dataset
-----------------------------------
.. code-block:: python
import topmost
topmost.download_dataset('20NG', cache_path='./datasets')
Train a model
-----------------------------------
.. code-block:: python

    import topmost

    device = "cuda"  # or "cpu"

    # load a preprocessed dataset
    dataset = topmost.BasicDataset("./datasets/20NG", device=device, read_labels=True)

    # create a model
    model = topmost.ProdLDA(dataset.vocab_size)
    model = model.to(device)

    # create a trainer
    trainer = topmost.BasicTrainer(model, dataset)

    # train the model
    top_words, train_theta = trainer.train()

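
The training snippet hard-codes ``device = "cuda"``. Below is a small sketch of falling back to the CPU when no GPU is visible, assuming PyTorch is the backend in use; ``pick_device`` is a hypothetical helper and not part of TopMost:

.. code-block:: python

    def pick_device(cuda_available: bool) -> str:
        """Map a CUDA-availability flag to a device string."""
        return "cuda" if cuda_available else "cpu"

    try:
        import torch
        device = pick_device(torch.cuda.is_available())
    except ImportError:
        # PyTorch is not installed here, so only the CPU string makes sense.
        device = pick_device(False)

    print(device)

The resulting ``device`` string can be passed wherever the examples in this section use ``device``.
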
Evaluate
-----------------------------------
.. code-block:: python

    from topmost import eva

    # topic diversity (TD) and topic coherence (TC)
    TD = eva._diversity(top_words)
    TC = eva._coherence(dataset.train_texts, dataset.vocab, top_words)

    # get doc-topic distributions of testing samples
    test_theta = trainer.test(dataset.test_data)

    # clustering
    clustering_results = eva._clustering(test_theta, dataset.test_labels)

    # classification
    cls_results = eva._cls(train_theta, test_theta, dataset.train_labels, dataset.test_labels)

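
For intuition, topic diversity (TD) is commonly defined as the fraction of unique words among the top words of all topics. The sketch below implements that definition under the assumption that each topic is given as a space-separated string of its top words; ``topic_diversity`` is a hypothetical helper and may differ from what ``eva._diversity`` computes internally:

.. code-block:: python

    def topic_diversity(top_words):
        """Fraction of unique words among all topics' top words."""
        words = [word for topic in top_words for word in topic.split()]
        if not words:
            return 0.0
        return len(set(words)) / len(words)

    # Toy topics for illustration only; "space" appears in two of them.
    toy_top_words = [
        "space orbit launch",
        "windows files dos",
        "space nasa moon",
    ]
    print(topic_diversity(toy_top_words))

A value of 1.0 means no word is shared between topics, while lower values indicate overlapping topics.
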
Test new documents
-----------------------------------
.. code-block:: python

    import torch
    from topmost import Preprocess

    new_docs = [
        "This is a new document about space, including words like space, satellite, launch, orbit.",
        "This is a new document about Microsoft Windows, including words like windows, files, dos."
    ]

    # parse the new documents with the vocabulary of the training dataset
    preprocess = Preprocess()
    new_parsed_docs, new_bow = preprocess.parse(new_docs, vocab=dataset.vocab)

    # convert the bag-of-words matrix to a dense float tensor and infer doc-topic distributions
    new_theta = trainer.test(torch.as_tensor(new_bow.toarray(), device=device).float())

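
Conceptually, new documents are mapped onto the training vocabulary so that their bag-of-words vectors line up with what the model was trained on. A rough sketch of that idea in plain Python, assuming simple lowercase whitespace tokenization (the actual ``Preprocess.parse`` pipeline is more involved, and ``bow_vector`` is a hypothetical helper):

.. code-block:: python

    from collections import Counter

    def bow_vector(doc, vocab):
        """Count how often each vocabulary word occurs in the document."""
        counts = Counter(doc.lower().split())
        # Words outside the vocabulary are ignored; missing words count as 0.
        return [counts[word] for word in vocab]

    toy_vocab = ["space", "orbit", "windows", "files"]
    print(bow_vector("Space probes orbit in space", toy_vocab))

Because the vector follows the order of ``toy_vocab``, two documents encoded this way are directly comparable position by position.
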
============
Installation
============
Stable release
--------------
To install TopMost, run this command in the terminal:
.. code-block:: console
$ pip install topmost
This is the preferred method to install TopMost, as it will always install the most recent stable release.
From sources
------------
To install the latest version directly from the GitHub repository, run:

.. code-block:: console

    $ pip install git+https://github.com/bobxwu/TopMost.git

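
After either installation method, it can be useful to confirm which version ended up in the environment. A minimal sketch using only the Python standard library (``installed_version`` is a hypothetical helper, not part of TopMost):

.. code-block:: python

    from importlib.metadata import PackageNotFoundError, version

    def installed_version(package):
        """Return the installed version string, or None if the package is missing."""
        try:
            return version(package)
        except PackageNotFoundError:
            return None

    # Prints the installed TopMost version, or None if it is not installed.
    print(installed_version("topmost"))
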
============
Tutorials
============
.. |github0| image:: https://img.shields.io/badge/Open%20in%20Github-%20?logo=github&color=grey
:target: https://github.com/BobXWu/TopMost/blob/master/tutorials/tutorial_quickstart.ipynb
:alt: Open In GitHub
.. |github1| image:: https://img.shields.io/badge/Open%20in%20Github-%20?logo=github&color=grey
:target: https://github.com/BobXWu/TopMost/blob/master/tutorials/tutorial_preprocessing_datasets.ipynb
:alt: Open In GitHub
.. |github2| image:: https://img.shields.io/badge/Open%20in%20Github-%20?logo=github&color=grey
:target: https://github.com/BobXWu/TopMost/blob/master/tutorials/tutorial_basic_topic_models.ipynb
:alt: Open In GitHub
.. |github3| image:: https://img.shields.io/badge/Open%20in%20Github-%20?logo=github&color=grey
:target: https://github.com/BobXWu/TopMost/blob/master/tutorials/tutorial_hierarchical_topic_models.ipynb
:alt: Open In GitHub
.. |github4| image:: https://img.shields.io/badge/Open%20in%20Github-%20?logo=github&color=grey
:target: https://github.com/BobXWu/TopMost/blob/master/tutorials/tutorial_dynamic_topic_models.ipynb
:alt: Open In GitHub
.. |github5| image:: https://img.shields.io/badge/Open%20in%20Github-%20?logo=github&color=grey
:target: https://github.com/BobXWu/TopMost/blob/master/tutorials/tutorial_crosslingual_topic_models.ipynb
:alt: Open In GitHub
We provide tutorials for different use cases:
+-------------------------------------------------------+-----------+
| Name                                                  | Link      |
+=======================================================+===========+
| Quickstart                                            | |github0| |
+-------------------------------------------------------+-----------+
| How to preprocess datasets                            | |github1| |
+-------------------------------------------------------+-----------+
| How to train and evaluate a basic topic model         | |github2| |
+-------------------------------------------------------+-----------+
| How to train and evaluate a hierarchical topic model  | |github3| |
+-------------------------------------------------------+-----------+
| How to train and evaluate a dynamic topic model       | |github4| |
+-------------------------------------------------------+-----------+
| How to train and evaluate a cross-lingual topic model | |github5| |
+-------------------------------------------------------+-----------+
============
Disclaimer
============
This library includes some datasets for demonstration. If you are a dataset owner who wants to exclude your dataset from this library, please contact `Xiaobao Wu <xiaobao002@e.ntu.edu.sg>`_.
============
Authors
============
+-------------------------------------------------------+
| |xiaobao-figure|                                      |
| `Xiaobao Wu <https://bobxwu.github.io>`__             |
+-------------------------------------------------------+
| |fengjun-figure|                                      |
| `Fengjun Pan <https://github.com/panFJCharlotte98>`__ |
+-------------------------------------------------------+
.. |xiaobao-figure| image:: https://bobxwu.github.io/assets/img/figure-1400.webp
:target: https://bobxwu.github.io
:width: 50
.. |fengjun-figure| image:: https://avatars.githubusercontent.com/u/126648078?v=4
:target: https://github.com/panFJCharlotte98
:width: 50
==============
Contributors
==============
.. image:: https://contrib.rocks/image?repo=bobxwu/topmost
:alt: Contributors
=================
Contact
=================
- We welcome your contributions to this project. Please feel free to submit pull requests.
- If you encounter any problems, please contact `Xiaobao Wu <xiaobao002@e.ntu.edu.sg>`_ directly or open an issue in the GitHub repository.