:Name: topmost
:Version: 0.0.2
:Home page: https://github.com/bobxwu/topmost
:Summary: Towards the TopMost: A Topic Modeling System Toolkit
:Upload time: 2024-03-08 09:42:12
:Author: Xiaobao Wu
:License: Apache 2.0 License
:Keywords: toolkit, topic model, neural topic model

|topmost-logo| TopMost
=================================

.. |topmost-logo| image:: docs/source/_static/topmost-logo.png
    :width: 38

.. image:: https://img.shields.io/github/stars/bobxwu/topmost?logo=github
        :target: https://github.com/bobxwu/topmost/stargazers
        :alt: Github Stars

.. image:: https://static.pepy.tech/badge/topmost
        :target: https://pepy.tech/project/topmost
        :alt: Downloads

.. image:: https://img.shields.io/pypi/v/topmost
        :target: https://pypi.org/project/topmost
        :alt: PyPi

.. image:: https://readthedocs.org/projects/topmost/badge/?version=latest
    :target: https://topmost.readthedocs.io/en/latest/?badge=latest
    :alt: Documentation Status

.. image:: https://img.shields.io/github/license/bobxwu/topmost
        :target: https://www.apache.org/licenses/LICENSE-2.0/
        :alt: License

.. image:: https://img.shields.io/github/contributors/bobxwu/topmost
        :target: https://github.com/bobxwu/topmost/graphs/contributors/
        :alt: Contributors

.. image:: https://img.shields.io/badge/arXiv-2309.06908-<COLOR>.svg
        :target: https://arxiv.org/pdf/2309.06908.pdf
        :alt: arXiv


TopMost covers the complete topic modeling lifecycle, including datasets, preprocessing, models, training, and evaluation. It supports the most popular topic modeling scenarios: basic, dynamic, hierarchical, and cross-lingual topic modeling.


| This is our demo paper `Towards the TopMost: A Topic Modeling System Toolkit <https://arxiv.org/pdf/2309.06908.pdf>`_.
| This is our survey paper on neural topic models: `A Survey on Neural Topic Models: Methods, Applications, and Challenges <https://arxiv.org/pdf/2401.15351.pdf>`_.

==================

.. contents:: **Table of Contents**
   :depth: 2



============
Overview
============

TopMost offers the following topic modeling scenarios with models, evaluation metrics, and datasets:

.. image:: docs/source/_static/architecture.svg
    :width: 390
    :align: center

+------------------------------+---------------+--------------------------------------------+-----------------+
|            Scenario          |     Model     |               Evaluation Metric            |  Datasets       |
+==============================+===============+============================================+=================+
|                              | | LDA_        |                                            |                 |
|                              | | NMF_        |                                            | | 20NG          |
|                              | | ProdLDA_    | | TC                                       | | IMDB          |
|                              | | DecTM_      | | TD                                       | | NeurIPS       |
| | Basic Topic Modeling       | | ETM_        | | Clustering                               | | ACL           |
|                              | | NSTM_       | | Classification                           | | NYT           |
|                              | | TSCTM_      |                                            | | Wikitext-103  |
|                              | | ECRTM_      |                                            |                 |
|                              | |             |                                            |                 |
+------------------------------+---------------+--------------------------------------------+-----------------+
|                              |               |                                            | | 20NG          |
|                              | | HDP_        | | TC over levels                           | | IMDB          |
| | Hierarchical               | | SawETM_     | | TD over levels                           | | NeurIPS       |
| | Topic Modeling             | | HyperMiner_ | | Clustering over levels                   | | ACL           |
|                              | | ProGBN_     | | Classification over levels               | | NYT           |
|                              | | TraCo_      |                                            | | Wikitext-103  |
|                              |               |                                            |                 |
+------------------------------+---------------+--------------------------------------------+-----------------+
|                              |               | | TC over time slices                      |                 |
| | Dynamic                    | | DTM_        | | TD over time slices                      | | NeurIPS       |
| | Topic Modeling             | | DETM_       | | Clustering                               | | ACL           |
|                              |               | | Classification                           | | NYT           |
+------------------------------+---------------+--------------------------------------------+-----------------+
|                              |               | | TC (CNPMI)                               | | ECNews        |
| | Cross-lingual              | | NMTM_       | | TD over languages                        | | Amazon Review |
| | Topic Modeling             | | InfoCTM_    | | Classification (Intra and Cross-lingual) | | Rakuten       |
|                              |               | |                                          | |               |
+------------------------------+---------------+--------------------------------------------+-----------------+

.. _LDA: https://www.jmlr.org/papers/volume3/blei03a/blei03a.pdf
.. _NMF: https://papers.nips.cc/paper_files/paper/2000/hash/f9d1152547c0bde01830b7e8bd60024c-Abstract.html
.. _ProdLDA: https://arxiv.org/pdf/1703.01488.pdf
.. _DecTM: https://aclanthology.org/2021.findings-acl.15.pdf
.. _ETM: https://aclanthology.org/2020.tacl-1.29.pdf
.. _NSTM: https://arxiv.org/abs/2008.13537
.. _CTM: https://aclanthology.org/2021.eacl-main.143/
.. _TSCTM: https://aclanthology.org/2022.emnlp-main.176/
.. _ECRTM: https://arxiv.org/pdf/2306.04217.pdf

.. _HDP: https://people.eecs.berkeley.edu/~jordan/papers/hdp.pdf
.. _SawETM: http://proceedings.mlr.press/v139/duan21b/duan21b.pdf
.. _HyperMiner: https://arxiv.org/pdf/2210.10625.pdf
.. _ProGBN: https://proceedings.mlr.press/v202/duan23c/duan23c.pdf
.. _TraCo: https://arxiv.org/pdf/2401.14113.pdf

.. _DTM: https://mimno.infosci.cornell.edu/info6150/readings/dynamic_topic_models.pdf
.. _DETM: https://arxiv.org/abs/1907.05545

.. _NMTM: https://bobxwu.github.io/files/pub/NLPCC2020_Neural_Multilingual_Topic_Model.pdf
.. _InfoCTM: https://arxiv.org/abs/2304.03544



============
Quick Start
============

Install TopMost
-----------------

Install TopMost with ``pip``:

.. code-block:: console

    $ pip install topmost


Discover topics from your own datasets
-------------------------------------------

We can get the top words of the discovered topics, ``topic_top_words``, and the topic distributions of the documents, ``doc_topic_dist``.
The preprocessing steps are configurable; see the documentation for details.

.. code-block:: python

    import topmost
    from topmost.preprocessing import Preprocessing

    # Your own documents
    docs = [
        "This is a document about space, including words like space, satellite, launch, orbit.",
        "This is a document about Microsoft Windows, including words like windows, files, dos.",
        # more documents...
    ]

    device = 'cuda' # or 'cpu'
    preprocessing = Preprocessing()
    dataset = topmost.data.RawDatasetHandler(docs, preprocessing, device=device, as_tensor=True)

    model = topmost.models.ProdLDA(dataset.vocab_size, num_topics=2)
    model = model.to(device)

    trainer = topmost.trainers.BasicTrainer(model)

    topic_top_words, doc_topic_dist = trainer.fit_transform(dataset, num_top_words=15, verbose=False)
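
Once training finishes, the two return values can be inspected directly. The sketch below assumes that ``topic_top_words`` is a list with one space-separated string of words per topic and that ``doc_topic_dist`` holds one row of topic proportions per document. Both are assumptions about the output format, ``dominant_topics`` is a hypothetical helper, and the values shown are illustrative stand-ins rather than real model output.

.. code-block:: python

    # Illustrative stand-ins; real values would come from trainer.fit_transform(...).
    topic_top_words = [
        "space satellite launch orbit",
        "windows files dos microsoft",
    ]
    doc_topic_dist = [
        [0.9, 0.1],
        [0.2, 0.8],
    ]

    def dominant_topics(doc_topic_dist):
        """Return the index of the most probable topic for each document."""
        return [max(range(len(row)), key=lambda k: row[k]) for row in doc_topic_dist]

    # Print each document next to the top words of its most probable topic.
    for doc_id, topic_id in enumerate(dominant_topics(doc_topic_dist)):
        print(f"doc {doc_id} -> topic {topic_id}: {topic_top_words[topic_id]}")

The helper only relies on indexing and ``len``, so it should work on nested lists as well as on array-like rows.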




============
Usage
============

Download a preprocessed dataset
-----------------------------------

.. code-block:: python

    import topmost
    from topmost.data import download_dataset

    download_dataset('20NG', cache_path='./datasets')


Train a model
-----------------------------------

.. code-block:: python

    device = "cuda" # or "cpu"

    # load a preprocessed dataset
    dataset = topmost.data.BasicDatasetHandler("./datasets/20NG", device=device, read_labels=True, as_tensor=True)
    # create a model
    model = topmost.models.ProdLDA(dataset.vocab_size)
    model = model.to(device)

    # create a trainer
    trainer = topmost.trainers.BasicTrainer(model)

    # train the model
    trainer.train(dataset)
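
The hard-coded ``device = "cuda"`` above will fail on machines without a GPU. A small helper can fall back to the CPU instead. This is a minimal sketch that assumes PyTorch is the backend, as the ``model.to(device)`` call suggests; ``pick_device`` is a hypothetical name and not part of TopMost.

.. code-block:: python

    def pick_device():
        """Return "cuda" when a CUDA-capable PyTorch is available, else "cpu"."""
        try:
            import torch
        except ImportError:
            # PyTorch is not installed, so only the CPU string makes sense.
            return "cpu"
        return "cuda" if torch.cuda.is_available() else "cpu"

    device = pick_device()
    print(device)

The resulting string can then be passed wherever the examples use ``device``.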


Evaluate
-----------------------------------

.. code-block:: python

    # get theta (doc-topic distributions)
    train_theta, test_theta = trainer.export_theta(dataset)
    # get top words of topics
    topic_top_words = trainer.export_top_words(dataset.vocab)

    # evaluate topic diversity
    TD = topmost.evaluations.compute_topic_diversity(topic_top_words)

    # evaluate clustering
    clustering_results = topmost.evaluations.evaluate_clustering(test_theta, dataset.test_labels)

    # evaluate classification
    classification_results = topmost.evaluations.evaluate_classification(train_theta, test_theta, dataset.train_labels, dataset.test_labels)
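
For intuition, topic diversity (TD) is often defined as the fraction of unique words among the top words of all topics, so a value of 1.0 means that no word is shared between topics. The function below is a minimal sketch of that common definition, assuming each topic is a space-separated string of top words. ``unique_word_ratio`` is a hypothetical helper, and ``compute_topic_diversity`` may use a different formula.

.. code-block:: python

    def unique_word_ratio(topic_top_words):
        """Fraction of unique words among the top words of all topics."""
        words = [w for topic in topic_top_words for w in topic.split()]
        if not words:
            raise ValueError("no top words given")
        return len(set(words)) / len(words)

    # "space" appears in both topics, so 5 of the 6 top words are unique.
    topics = ["space orbit launch", "space windows files"]
    print(unique_word_ratio(topics))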



Test new documents
-----------------------------------

.. code-block:: python

    import torch
    from topmost.preprocessing import Preprocessing

    new_docs = [
        "This is a new document about space, including words like space, satellite, launch, orbit.",
        "This is a new document about Microsoft Windows, including words like windows, files, dos."
    ]

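    # create a Preprocessing object (or reuse the one that built the training data)
    preprocessing = Preprocessing()
    # parse the new documents against the training vocabulary
    parsed_new_docs, new_bow = preprocessing.parse(new_docs, vocab=dataset.vocab)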
    new_doc_topic_dist = trainer.test(torch.as_tensor(new_bow, device=device).float())
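
Conceptually, passing ``vocab=dataset.vocab`` means that the new documents are counted against the vocabulary from training, so words outside that vocabulary are presumably ignored. The following is a simplified sketch of that idea using whitespace tokenization. ``bow_with_vocab`` is a hypothetical helper, not the library's ``parse`` implementation, which applies its own preprocessing.

.. code-block:: python

    from collections import Counter

    def bow_with_vocab(docs, vocab):
        """Count occurrences of each vocabulary word in each document."""
        known = set(vocab)
        rows = []
        for doc in docs:
            # Keep only tokens that belong to the fixed vocabulary.
            counts = Counter(tok for tok in doc.lower().split() if tok in known)
            rows.append([counts[word] for word in vocab])
        return rows

    vocab = ["space", "orbit", "windows"]
    docs = ["space orbit space rocket", "windows files"]
    print(bow_with_vocab(docs, vocab))

Each row has one count per vocabulary word, in vocabulary order, which is the shape a trained model expects as input.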



============
Installation
============


Stable release
--------------

To install TopMost, run this command in your terminal:

.. code-block:: console

    $ pip install topmost

This is the preferred method to install TopMost, as it will always install the most recent stable release.

From sources
------------

The source code for TopMost can be downloaded from the GitHub repository.
Clone the public repository with:

.. code-block:: console

    $ git clone https://github.com/BobXWu/TopMost.git

Then install TopMost from the root of the cloned repository:

.. code-block:: console

    $ python setup.py install
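
After either installation route, the installed version can be checked from Python. This sketch uses only the standard library's ``importlib.metadata``; ``installed_version`` is a hypothetical helper name.

.. code-block:: python

    from importlib.metadata import PackageNotFoundError, version

    def installed_version(package):
        """Return the installed version string, or None if the package is absent."""
        try:
            return version(package)
        except PackageNotFoundError:
            return None

    print(installed_version("topmost"))

A result of ``None`` indicates that the package is not visible to the current Python environment.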





============
Tutorials
============

.. |github0| image:: https://img.shields.io/badge/Open%20in%20Github-%20?logo=github&color=grey
    :target: https://github.com/BobXWu/TopMost/blob/master/tutorials/tutorial_quickstart.ipynb
    :alt: Open In GitHub

.. |github1| image:: https://img.shields.io/badge/Open%20in%20Github-%20?logo=github&color=grey
    :target: https://github.com/BobXWu/TopMost/blob/master/tutorials/tutorial_preprocessing_datasets.ipynb
    :alt: Open In GitHub

.. |github2| image:: https://img.shields.io/badge/Open%20in%20Github-%20?logo=github&color=grey
    :target: https://github.com/BobXWu/TopMost/blob/master/tutorials/tutorial_basic_topic_models.ipynb
    :alt: Open In GitHub

.. |github3| image:: https://img.shields.io/badge/Open%20in%20Github-%20?logo=github&color=grey
    :target: https://github.com/BobXWu/TopMost/blob/master/tutorials/tutorial_hierarchical_topic_models.ipynb
    :alt: Open In GitHub

.. |github4| image:: https://img.shields.io/badge/Open%20in%20Github-%20?logo=github&color=grey
    :target: https://github.com/BobXWu/TopMost/blob/master/tutorials/tutorial_dynamic_topic_models.ipynb
    :alt: Open In GitHub

.. |github5| image:: https://img.shields.io/badge/Open%20in%20Github-%20?logo=github&color=grey
    :target: https://github.com/BobXWu/TopMost/blob/master/tutorials/tutorial_crosslingual_topic_models.ipynb
    :alt: Open In GitHub



We provide tutorials for different use cases:

+--------------------------------------------------------------------------------+-------------------+
| Name                                                                           | Link              |
+================================================================================+===================+
| Quickstart                                                                     | |github0|         |
+--------------------------------------------------------------------------------+-------------------+
| How to preprocess datasets                                                     | |github1|         |
+--------------------------------------------------------------------------------+-------------------+
| How to train and evaluate a basic topic model                                  | |github2|         |
+--------------------------------------------------------------------------------+-------------------+
| How to train and evaluate a hierarchical topic model                           | |github3|         |
+--------------------------------------------------------------------------------+-------------------+
| How to train and evaluate a dynamic topic model                                | |github4|         |
+--------------------------------------------------------------------------------+-------------------+
| How to train and evaluate a cross-lingual topic model                          | |github5|         |
+--------------------------------------------------------------------------------+-------------------+




============
Notice
============

Differences from original implementations
-------------------------------------------

1. Original implementations may use different optimizer settings. For simplicity and brevity, our package uses the same settings for all models by default.



============
Disclaimer
============

This library includes some datasets for demonstration. If you are a dataset owner who wants to exclude your dataset from this library, please contact `Xiaobao Wu <xiaobao002@e.ntu.edu.sg>`_.



============
Authors
============

+----------------------------------------------------------+
| |xiaobao-figure|                                         |
| `Xiaobao Wu <https://bobxwu.github.io>`__                |
+----------------------------------------------------------+
| |fengjun-figure|                                         |
| `Fengjun Pan <https://github.com/panFJCharlotte98>`__    |
+----------------------------------------------------------+

.. |xiaobao-figure| image:: https://bobxwu.github.io/img/figure.jpg 
   :target: https://bobxwu.github.io
   :width: 50

.. |fengjun-figure| image:: https://avatars.githubusercontent.com/u/126648078?v=4
    :target: https://github.com/panFJCharlotte98
    :width: 50


==============
Contributors
==============


.. image:: https://contrib.rocks/image?repo=bobxwu/topmost
        :alt: Contributors


======================
How to cite our work
======================

If you use our toolkit, please cite:

::

    @article{wu2023topmost,
        title={Towards the TopMost: A Topic Modeling System Toolkit},
        author={Wu, Xiaobao and Pan, Fengjun and Luu, Anh Tuan},
        journal={arXiv preprint arXiv:2309.06908},
        year={2023}
    }

    @article{wu2023survey,
        title={A Survey on Neural Topic Models: Methods, Applications, and Challenges},
        author={Wu, Xiaobao and Nguyen, Thong and Luu, Anh Tuan},
        journal={Artificial Intelligence Review},
        url={https://doi.org/10.1007/s10462-023-10661-7},
        year={2024},
        publisher={Springer}
    }


=================
Acknowledgments
=================

- We welcome pull requests that add new models to this package.
- If you encounter any problems, please contact `Xiaobao Wu <xiaobao002@e.ntu.edu.sg>`_ directly or open an issue in the GitHub repository.
- Icon by `Flat-icons-com <https://www.freepik.com/icon/top_671169>`_.


            
