<h1 align="center">TopicNet</h1>
<img align="right" height="15%" width="15%" src="https://avatars3.githubusercontent.com/u/49844788?s=200&v=4" style="max-width:100%;">
<div align="center">
<a href="https://pypi.org/project/topicnet">
<img alt="PyPI Version" src="https://img.shields.io/pypi/v/topicnet?color=blue">
</a>
<a href="https://www.python.org/downloads/">
<img alt="Python Version" src="https://img.shields.io/pypi/pyversions/TopicNet">
</a>
<a href="https://app.travis-ci.com/machine-intelligence-laboratory/TopicNet">
<img alt="Travis Build Status" src="https://api.travis-ci.com/machine-intelligence-laboratory/TopicNet.svg?branch=master">
</a>
<a href="https://codecov.io/gh/machine-intelligence-laboratory/TopicNet">
<img alt="Code Coverage" src="https://codecov.io/gh/machine-intelligence-laboratory/TopicNet/branch/master/graph/badge.svg">
</a>
<a href="https://github.com/machine-intelligence-laboratory/TopicNet/blob/master/LICENSE.txt">
<img alt="License" src="https://img.shields.io/pypi/l/TopicNet?color=Black">
</a>
</div>
<div align="center">
A high-level interface developed by <a href="http://machine-intelligence.ru/en">Machine Intelligence Laboratory</a> for <a href="https://github.com/bigartm/bigartm">BigARTM</a> library.
</div>
## What is TopicNet
The `TopicNet` library was created to assist in building topic models.
It aims to automate the model training routine, freeing more time for the artistic process of constructing a target functional for the task at hand.
Consider using TopicNet if:
* you want to explore BigARTM functionality without writing boilerplate code;
* you need help with rapid solution prototyping;
* you want to build a good topic model quickly (out-of-box, with default parameters);
* you have an ARTM model at hand and you want to explore its topics.
`TopicNet` provides an infrastructure for your prototyping through the `Experiment` class and helps you observe the results of your actions via the [`viewers`](topicnet/viewers) module.
<p>
<div align="center">
<img src="./docs/readme_images/training_scheme_example.png" width="50%" alt/>
</div>
<em>
Example of the two-stage experiment scheme.
At the first stage, a regularizer with parameter <img src="./docs/readme_images/tau.svg"> taking values in some range <img src="./docs/readme_images/tau1-tau2-tau3.svg"> is applied.
The best models after the first stage are <em>Model 1</em> and <em>Model 2</em>, so <em>Model 3</em> takes no further part in the training process.
The second stage applies another regularizer with parameter <img src="./docs/readme_images/xi.svg"> taking values in range <img src="./docs/readme_images/xi1-xi2.svg">.
As a result of this stage, two descendant models of <em>Model 1</em> and two descendant models of <em>Model 2</em> are obtained.
</em>
</p>
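The selection logic of the scheme above can be sketched in plain Python. This is only an illustration of the idea, not TopicNet code: `run_stage`, `toy_score`, and the parameter values are hypothetical, and a "model" is reduced to a dictionary of parameters.

```python
def run_stage(parents, param_name, values, score_fn, keep):
    """Create one descendant per (parent, value) pair and keep the `keep` best."""
    children = []
    for parent in parents:
        for value in values:
            child = {**parent, param_name: value}
            child['score'] = score_fn(child)
            children.append(child)
    # Lower score is treated as better, as with perplexity.
    return sorted(children, key=lambda m: m['score'])[:keep]


def toy_score(model):
    # Hypothetical stand-in for a real quality metric.
    return abs(model.get('tau', 0) - 0.2) + abs(model.get('xi', 0) - 1.0)


root = {}
# Stage 1: three candidates, the two best survive.
stage_one = run_stage([root], 'tau', [0.1, 0.2, 0.5], toy_score, keep=2)
# Stage 2: each survivor gets two descendants.
stage_two = run_stage(stage_one, 'xi', [0.5, 1.0], toy_score, keep=4)

print([m['tau'] for m in stage_one])
print(len(stage_two))
```

In the real library the stages are cubes and the models are trained rather than scored by a toy function.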
And here is sample code of the TopicNet baseline experiment:
```python
from topicnet.cooking_machine.config_parser import build_experiment_environment_from_yaml_config
from topicnet.cooking_machine.recipes import ARTM_baseline as config_string


config_string = config_string.format(
    dataset_path='/data/datasets/NIPS/dataset.csv',
    modality_list=['@word'],
    main_modality='@word',
    specific_topics=[f'spc_topic_{i}' for i in range(19)],
    background_topics=[f'bcg_topic_{i}' for i in range(1)],
)
experiment, dataset = (
    build_experiment_environment_from_yaml_config(
        yaml_string=config_string,
        experiment_id='sample_config',
        save_path='sample_save_folder_path',
    )
)

experiment.run(dataset)

best_model = experiment.select('PerplexityScore@all -> min')[0]
```
## How to Start
Define a `TopicModel` from an ARTM model at hand or with the help of the `model_constructor` module, where you can set the model's main parameters. Then create an `Experiment`, assigning this model the root position and providing a path to store your experiment. After that, you can define a set of training stages using the functionality provided by the `cooking_machine.cubes` module.
Further documentation is available [here](https://machine-intelligence-laboratory.github.io/TopicNet/).
If you want to get familiar with BigARTM (which is not necessary, but generally useful), we recommend the [video tutorial](https://youtu.be/AIN00vWOJGw) by [Murat Apishev](https://github.com/MelLain).
The tutorial is in Russian, but it comes with a [Colab Notebook](https://colab.research.google.com/drive/13oUI1yxZHdQWUfmMpFY4KVlkyWzAkoky).
## Installation
**Core library functionality is based on the BigARTM library**.
So BigARTM must also be installed on the machine.
Fortunately, the installation process is no longer difficult.
Detailed explanations follow.
### Via Pip
The easiest way to install everything is via `pip` (currently this works reliably only on Linux):
```bash
pip install topicnet
```
This command installs the BigARTM library as well as TopicNet.
However, the [BigARTM Command Line Utility](https://bigartm.readthedocs.io/en/stable/tutorials/bigartm_cli.html) will not be built.
A pip installation exposes BigARTM only through its Python interface.
If you work on Windows or Mac, install BigARTM yourself first; after that, `pip install topicnet` will work just fine.
We hope to bring all-in-`pip` installation support to these systems.
For now, you may find the following guide useful.
### BigARTM for Non-Linux Users
To avoid installing BigARTM, you can use [docker images](https://hub.docker.com/r/xtonev/bigartm/tags) with different versions of the BigARTM library preinstalled:
```bash
docker pull xtonev/bigartm:v0.10.0
docker run -t -i xtonev/bigartm:v0.10.0
```
To check that everything was installed successfully:
```bash
$ python
>>> import artm
>>> artm.version()
```
Alternatively, you can follow the [BigARTM installation manual](https://bigartm.readthedocs.io/en/stable/installation/index.html).
There are also a few tips which may provide additional help for Windows users:
1. Go to the [installation page for Windows](http://docs.bigartm.org/en/stable/installation/windows.html) and download the 7z archive in the Downloads section.
2. Use Anaconda's `conda install` to install all the Python packages that BigARTM requires.
3. Path variables must be set through the GUI window of system variables. If the variable `PYTHONPATH` is missing, add it to the **system-wide** variables. Close the GUI window.
After setting up the environment, you can fork this repository or use `pip install topicnet` to install the library.
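The interactive check shown earlier can also be scripted, which is handy before running `pip install topicnet` on a machine where BigARTM was installed manually. This is a minimal sketch: the helper name `bigartm_version` is made up for illustration, and only `artm.version()` comes from the check above.

```python
import importlib.util


def bigartm_version():
    """Return the BigARTM version, or None if the `artm` package is not importable."""
    if importlib.util.find_spec('artm') is None:
        return None
    import artm
    return artm.version()


version = bigartm_version()
print('BigARTM not found' if version is None else f'BigARTM {version}')
```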
### From Source
One can also install the library from GitHub, which may give more flexibility in development (for example, making one's own viewers or regularizers a part of the module as .py files):
```bash
git clone https://github.com/machine-intelligence-laboratory/TopicNet.git
cd TopicNet
pip install .
```
### Google Colab & Kaggle Notebooks
Since installation on Linux can be done solely with `pip`, TopicNet can be used in online services such as
[Google Colab](https://colab.research.google.com) and
[Kaggle Notebooks](https://www.kaggle.com/kernels).
All you need is to run the following command in a notebook cell:
```bash
! pip install topicnet
```
There is also a [notebook in Google Colab](https://colab.research.google.com/drive/1Tr1ZO03iPufj11HtIH3JjaWWU1Wyxkzv) made by [Nikolay Gerasimenko](https://github.com/Nikolay-Gerasimenko), where BigARTM is built from source.
This may be useful, for example, if you plan to use the BigARTM Command Line Utility.
# Usage
Let's say you have a handful of raw texts mined from some source and you want to perform topic modelling on them.
Where should you start?
## Data Preparation
Every ML problem starts with a data preprocessing step.
TopicNet does not perform data preprocessing itself.
Instead, it requires the data to be prepared by the user and loaded via the [Dataset](topicnet/cooking_machine/dataset.py) class.
Here is a basic example of how one can achieve that: [rtl_wiki_preprocessing](topicnet/demos/RTL-Wiki-Preprocessing.ipynb).
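To give a rough idea of what "prepared" data may look like, the sketch below builds a Vowpal Wabbit-style line with `@`-prefixed modalities and writes it to CSV. The helper `to_vw_line` is hypothetical, and the column names `id`, `raw_text`, and `vw_text` are an assumption here; check them against the `Dataset` documentation and the preprocessing notebook before relying on them.

```python
import csv
import io
from collections import Counter


def to_vw_line(doc_id, tokens_by_modality):
    """Build 'doc_id |@modality token:count ...' from tokenized text."""
    parts = [doc_id]
    for modality, tokens in tokens_by_modality.items():
        counts = Counter(tokens)
        # A count is written only when a token occurs more than once.
        body = ' '.join(f'{tok}:{n}' if n > 1 else tok for tok, n in counts.items())
        parts.append(f'|{modality} {body}')
    return ' '.join(parts)


vw_line = to_vw_line(
    'doc_1',
    {'@lemmatized': ['topic', 'model', 'topic'], '@bigram': ['topic_model']},
)
print(vw_line)

# Assumed column layout: id, raw_text, vw_text.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=['id', 'raw_text', 'vw_text'])
writer.writeheader()
writer.writerow({'id': 'doc_1', 'raw_text': 'Topic models, topics.', 'vw_text': vw_line})
csv_text = buffer.getvalue()
```

Tokens containing spaces, `:` or `|` would break this format, so they need to be cleaned or replaced during preprocessing.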
For the convenience of everyone who wants to use TopicNet, and of everyone interested in topic modeling in general, we provide a couple of already preprocessed datasets (see the [DemoDataset.ipynb](topicnet/dataset_manager/DemoDataset.ipynb) notebook for more information).
These datasets can be downloaded from code.
For example:
```python
from topicnet.dataset_manager import api
dataset = api.load_dataset('postnauka')
```
Or, if the API is unavailable, you can go to [TopicNet's page on Hugging Face](https://huggingface.co/TopicNet) and get the needed .csv files there.
## Training a Topic Model
Here we can finally get to the main part: making your own, best of them all, manually crafted topic model.
### Get Your Data
We need to load our previously prepared data with Dataset:
```python
from topicnet.cooking_machine.dataset import Dataset

DATASET_PATH = '/Wiki_raw_set/wiki_data.csv'
dataset = Dataset(DATASET_PATH)
```
### Make an Initial Model
In case you want to start from a fresh model we suggest you use this code:
```python
from topicnet.cooking_machine.model_constructor import init_simple_default_model
artm_model = init_simple_default_model(
    dataset=dataset,
    modalities_to_use={'@lemmatized': 1.0, '@bigram': 0.5},
    main_modality='@lemmatized',
    specific_topics=14,
    background_topics=1,
)
```
Note that here the model has two modalities: `'@lemmatized'` and `'@bigram'`.
Further, if needed, one can define a custom score to be calculated during model training.
```python
import numpy as np

from topicnet.cooking_machine.models.base_score import BaseScore


class CustomScore(BaseScore):
    def __init__(self):
        super().__init__()

    def call(self, model, eps=1e-5, n_specific_topics=14):
        # Keep the first n_specific_topics columns of Phi (columns correspond to topics)
        phi = model.get_phi().values[:, :n_specific_topics]
        # Share of near-zero entries among all entries below 1
        specific_sparsity = np.sum(phi < eps) / np.sum(phi < 1)

        return specific_sparsity
```
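To make the ratio in `CustomScore` concrete, here is a dependency-free restatement of the same formula on a tiny, made-up Phi matrix (rows are tokens, columns are topics). The function `specific_sparsity` and the numbers are for illustration only.

```python
def specific_sparsity(phi, eps=1e-5, n_specific_topics=2):
    """Count entries below eps and divide by the count of entries below 1."""
    # Keep only the first n_specific_topics columns of every row.
    values = [v for row in phi for v in row[:n_specific_topics]]
    near_zero = sum(v < eps for v in values)
    below_one = sum(v < 1 for v in values)
    return near_zero / below_one


# Toy matrix: 3 tokens, 2 specific topics and 1 background topic.
phi = [
    [0.7, 0.0, 0.3],
    [0.3, 0.0, 0.3],
    [0.0, 1.0, 0.4],
]

sparsity = specific_sparsity(phi)
print(sparsity)  # 3 near-zero entries out of 5 entries below 1 -> 0.6
```

Note that the denominator skips entries equal to 1, and the ratio is undefined if every selected entry equals 1.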
Now a `TopicModel` with the custom score can be defined:
```python
from topicnet.cooking_machine.models.topic_model import TopicModel
custom_scores = {'SpecificSparsity': CustomScore()}
topic_model = TopicModel(artm_model, model_id='Groot', custom_scores=custom_scores)
```
### Define an Experiment
An `Experiment` is required for further model training and tuning:
```python
from topicnet.cooking_machine.experiment import Experiment
experiment = Experiment(
    experiment_id="simple_experiment",
    save_path="experiments",
    topic_model=topic_model,
)
```
### Toy with the Cubes
Define the next stage of model training, which selects the decorrelator parameter:
```python
import artm

from topicnet.cooking_machine.cubes import RegularizersModifierCube


my_first_cube = RegularizersModifierCube(
    num_iter=5,
    tracked_score_function='PerplexityScore@lemmatized',
    regularizer_parameters={
        'regularizer': artm.DecorrelatorPhiRegularizer(name='decorrelation_phi', tau=1),
        'tau_grid': [0, 1, 2, 3, 4, 5],
    },
    reg_search='grid',
    verbose=True,
)

# Apply the cube to the model: the tau values from the grid are tried in turn
my_first_cube(topic_model, dataset)
```
Select the model with the best perplexity score:
```python
perplexity_criterion = 'PerplexityScore@lemmatized -> min COLLECT 1'
# select() returns a list of models, so take the first (and only) one
best_model = experiment.select(perplexity_criterion)[0]
```
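The criterion string reads as "rank by this score, in this direction, and collect this many models". The toy function below mimics that reading for the simple form used here. It is not TopicNet's query parser: `select_models`, the model dictionaries, and the default of one collected model are assumptions made for the illustration.

```python
import re


def select_models(models, criterion):
    """Rank toy models by a '<score> -> min|max [COLLECT n]' criterion."""
    match = re.fullmatch(
        r'\s*(\S+)\s*->\s*(min|max)(?:\s+COLLECT\s+(\d+))?\s*', criterion
    )
    if match is None:
        raise ValueError(f'Unsupported criterion: {criterion!r}')
    score_name, direction, limit = match.groups()
    ranked = sorted(
        models,
        key=lambda m: m['scores'][score_name],
        reverse=(direction == 'max'),
    )
    # Assumption: without COLLECT, a single best model is returned.
    return ranked[:int(limit)] if limit else ranked[:1]


toy_models = [
    {'id': 'a', 'scores': {'PerplexityScore@lemmatized': 310.0}},
    {'id': 'b', 'scores': {'PerplexityScore@lemmatized': 295.5}},
    {'id': 'c', 'scores': {'PerplexityScore@lemmatized': 402.1}},
]

best = select_models(toy_models, 'PerplexityScore@lemmatized -> min COLLECT 1')
top_two = select_models(toy_models, 'PerplexityScore@lemmatized -> min COLLECT 2')
print([m['id'] for m in best], [m['id'] for m in top_two])
```

The real `experiment.select` works on trained models stored in the experiment and may support richer queries than this sketch handles.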
### Alternatively: Use Recipes
If you need a topic model now, you can use one of the code snippets we call *recipes*.
```python
from topicnet.cooking_machine.recipes import BaselineRecipe


EXPERIMENT_PATH = '/home/user/experiment/'

training_pipeline = BaselineRecipe()
training_pipeline.format_recipe(dataset_path=DATASET_PATH)
experiment, dataset = training_pipeline.build_experiment_environment(
    save_path=EXPERIMENT_PATH
)
```
After that, you can expect the following result:
![run_result](./docs/readme_images/experiment_train.gif)
### View the Results
Browsing the model is easy: create a viewer and call its `view()` method (or `view_from_jupyter()`, which is recommended when working in a Jupyter Notebook):
```python
from topicnet.viewers import TopTokensViewer
toptok_viewer = TopTokensViewer(best_model, num_top_tokens=10, method='phi')
toptok_viewer.view_from_jupyter()
```
More info about different viewers is available here: [`viewers`](topicnet/viewers).
# FAQ
### In the examples, modalities are written as **@modality**. Is that part of the Vowpal Wabbit format?
Marking modalities with the `@` sign is a convention that TopicNet adopted from BigARTM.
### CubeCreator helps to perform a grid search over initial model parameters. How can I do it with modalities?
The modality search space can be defined using the standard library logic, like this:
```python
from topicnet.cooking_machine.cubes import CubeCreator


class_ids_cube = CubeCreator(
    num_iter=5,
    parameters=[
        {
            'name': 'class_ids',
            'values': {
                '@text': [1, 2, 3],
                '@ngrams': [4, 5, 6],
            },
        },
    ],
    reg_search='grid',
    verbose=True,
)
```
However, for modalities a couple of slightly more convenient forms are available:
```python
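# Option 1: one entry per modality
parameters = [
    {
        'name': 'class_ids@text',
        'values': [1, 2, 3],
    },
    {
        'name': 'class_ids@ngrams',
        'values': [4, 5, 6],
    },
]

# Option 2: a single dictionary mapping each modality to its values
parameters = [
    {
        'class_ids@text': [1, 2, 3],
        'class_ids@ngrams': [4, 5, 6],
    },
]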
```
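As a rough picture of the size of such a search space, the sketch below enumerates the combinations of modality weights. It assumes that `reg_search='grid'` means the full Cartesian product of the listed values, which should be checked against the `CubeCreator` documentation; the code itself uses only the standard library.

```python
from itertools import product

search_space = {
    'class_ids@text': [1, 2, 3],
    'class_ids@ngrams': [4, 5, 6],
}

names = list(search_space)
# One dictionary per point of the grid: every text weight with every ngrams weight.
grid_points = [
    dict(zip(names, combo)) for combo in product(*search_space.values())
]

print(len(grid_points))
print(grid_points[0])
```

With three values per modality this gives nine candidate models, so the grid grows quickly as modalities or values are added.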
# Contribution
If you find a bug, or if you would like the library to have some new features, you are welcome to contact us or create an issue or a pull request!
It is also worth noting that the TopicNet library is always open to improvements in several areas:
* New custom regularizers.
* New topic model scores.
* New topic models or recipes to train topic models for a particular task/with some special properties.
* New datasets (so as to make them available for everyone to download and conduct experiments with topic models).
# Citing TopicNet
When citing `topicnet` in academic papers and theses, please use this BibTeX entry:
```
@InProceedings{bulatov-EtAl:2020:LREC,
author = {Bulatov, Victor and Alekseev, Vasiliy and Vorontsov, Konstantin and Polyudova, Darya and Veselova, Eugenia and Goncharov, Alexey and Egorov, Evgeny},
title = {TopicNet: Making Additive Regularisation for Topic Modelling Accessible},
booktitle = {Proceedings of The 12th Language Resources and Evaluation Conference},
month = {May},
year = {2020},
address = {Marseille, France},
publisher = {European Language Resources Association},
pages = {6747--6754},
url = {https://www.aclweb.org/anthology/2020.lrec-1.833}
}
```
Raw data
{
"_id": null,
"home_page": "https://github.com/machine-intelligence-laboratory/TopicNet",
"name": "topicnet",
"maintainer": null,
"docs_url": null,
"requires_python": null,
"maintainer_email": null,
"keywords": "ARTM, topic modeling, regularization, multimodal learning, document vector representation",
"author": "Machine Intelligence Laboratory",
"author_email": "alex.goncharov@phystech.edu",
"download_url": "https://files.pythonhosted.org/packages/f5/09/0e9a62fca12ddd3d9d97877d47747f623da96ca49f36d2c8826abcd05884/topicnet-0.9.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "TopicNet is a module for topic modelling using ARTM algorithm",
"version": "0.9.0",
"project_urls": {
"Download": "https://github.com/machine-intelligence-laboratory/TopicNet/archive/v0.9.0.tar.gz",
"Homepage": "https://github.com/machine-intelligence-laboratory/TopicNet"
},
"split_keywords": [
"artm",
" topic modeling",
" regularization",
" multimodal learning",
" document vector representation"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "99b80b9207b63f7207468cbb22b5f19ee0d43e30e44ede3ce3e77efc6f67c2f5",
"md5": "3c8d8f1b597e3d0fdecb1b9c33154848",
"sha256": "0b0bb348592deed6120c0f2e72d867aeb488b5d4845cafaae874c204a09078ef"
},
"downloads": -1,
"filename": "topicnet-0.9.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "3c8d8f1b597e3d0fdecb1b9c33154848",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 141990,
"upload_time": "2024-07-28T16:00:11",
"upload_time_iso_8601": "2024-07-28T16:00:11.058043Z",
"url": "https://files.pythonhosted.org/packages/99/b8/0b9207b63f7207468cbb22b5f19ee0d43e30e44ede3ce3e77efc6f67c2f5/topicnet-0.9.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "f5090e9a62fca12ddd3d9d97877d47747f623da96ca49f36d2c8826abcd05884",
"md5": "4f516d0e434652b2faf4487ab30a22b2",
"sha256": "353280069575fd82ca06b86588b70d19f9305cea2d71bcf2d1f8e9eda885d736"
},
"downloads": -1,
"filename": "topicnet-0.9.0.tar.gz",
"has_sig": false,
"md5_digest": "4f516d0e434652b2faf4487ab30a22b2",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 121369,
"upload_time": "2024-07-28T16:00:13",
"upload_time_iso_8601": "2024-07-28T16:00:13.253873Z",
"url": "https://files.pythonhosted.org/packages/f5/09/0e9a62fca12ddd3d9d97877d47747f623da96ca49f36d2c8826abcd05884/topicnet-0.9.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-07-28 16:00:13",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "machine-intelligence-laboratory",
"github_project": "TopicNet",
"travis_ci": true,
"coveralls": false,
"github_actions": false,
"requirements": [
{
"name": "bigartm",
"specs": [
[
">=",
"0.9.2"
]
]
},
{
"name": "codecov",
"specs": []
},
{
"name": "colorlover",
"specs": [
[
"==",
"0.3.0"
]
]
},
{
"name": "coverage",
"specs": []
},
{
"name": "dask",
"specs": [
[
"==",
"2023.5.0"
]
]
},
{
"name": "dill",
"specs": [
[
"==",
"0.3.8"
]
]
},
{
"name": "ipython",
"specs": [
[
"==",
"8.12.3"
]
]
},
{
"name": "jinja2",
"specs": [
[
"==",
"3.1.4"
]
]
},
{
"name": "numba",
"specs": [
[
"==",
"0.58.1"
]
]
},
{
"name": "numexpr",
"specs": [
[
"==",
"2.8.6"
]
]
},
{
"name": "numpy",
"specs": [
[
"==",
"1.24.4"
]
]
},
{
"name": "pandas",
"specs": [
[
"==",
"2.0.3"
]
]
},
{
"name": "plotly",
"specs": [
[
"==",
"5.20.0"
]
]
},
{
"name": "protobuf",
"specs": [
[
"==",
"3.20.3"
]
]
},
{
"name": "pytest",
"specs": [
[
"==",
"8.1.1"
]
]
},
{
"name": "pytest-cov",
"specs": [
[
"==",
"5.0.0"
]
]
},
{
"name": "pytest-rerunfailures",
"specs": [
[
"==",
"14.0"
]
]
},
{
"name": "pytest-timeout",
"specs": [
[
"==",
"2.3.1"
]
]
},
{
"name": "scikit-learn",
"specs": [
[
"==",
"1.3.2"
]
]
},
{
"name": "scipy",
"specs": [
[
"==",
"1.10.1"
]
]
},
{
"name": "six",
"specs": [
[
"==",
"1.16.0"
]
]
},
{
"name": "strictyaml",
"specs": [
[
"==",
"1.7.3"
]
]
},
{
"name": "toolz",
"specs": [
[
"==",
"0.12.1"
]
]
},
{
"name": "tqdm",
"specs": [
[
"==",
"4.66.3"
]
]
}
],
"lcname": "topicnet"
}