diarizationlm

Name	diarizationlm
Version	0.1.4
home_page	https://github.com/google/speaker-id/tree/master/DiarizationLM
Summary	DiarizationLM
upload_time	2024-10-25 00:22:55
author	Quan Wang

# DiarizationLM

[![Python application](https://github.com/google/speaker-id/actions/workflows/python-app-diarizationlm.yml/badge.svg)](https://github.com/google/speaker-id/actions/workflows/python-app-diarizationlm.yml)
[![PyPI Version](https://img.shields.io/pypi/v/diarizationlm.svg)](https://pypi.python.org/pypi/diarizationlm)
[![Python Versions](https://img.shields.io/pypi/pyversions/diarizationlm.svg)](https://pypi.org/project/diarizationlm)
[![Downloads](https://static.pepy.tech/badge/diarizationlm)](https://www.pepy.tech/projects/diarizationlm)
[![codecov](https://codecov.io/gh/google/speaker-id/branch/master/graph/badge.svg)](https://codecov.io/gh/google/speaker-id)
[![Documentation](https://img.shields.io/badge/arXiv-preprint-blue.svg)](https://arxiv.org/abs/2401.03506)
[![HuggingFace](https://img.shields.io/badge/Hugging-Face-blue.svg)](https://huggingface.co/google/DiarizationLM-8b-Fisher-v2)
[![HuggingFace Space](https://img.shields.io/badge/Online-Demo-blue.svg)](https://huggingface.co/spaces/diarizers-community/DiarizationLM-GGUF)


## Table of contents

* [Overview](#Overview)
* [Instructions](#Instructions)
  * [Install the package](#Install-the-package)
  * [Data format](#Data-format)
  * [Conversion between representations](#Conversion-between-representations)
  * [Transcript-preserving speaker transfer (TPST)](#Transcript-preserving-speaker-transfer-TPST)
  * [Training data preparation](#Training-data-preparation)
  * [LLM finetuning and inference (OpenAI)](#LLM-finetuning-and-inference-OpenAI)
  * [LLM finetuning and inference (Llama)](#LLM-finetuning-and-inference-Llama)
  * [Completion parser](#Completion-parser)
  * [Metrics](#Metrics)
* [Citation](#Citation)

## Overview

Here we open source some functions and tools used in the [DiarizationLM paper](https://arxiv.org/abs/2401.03506).

We also have open source models on Hugging Face: https://huggingface.co/google/DiarizationLM-8b-Fisher-v2

Play with our demo: https://huggingface.co/spaces/diarizers-community/DiarizationLM-GGUF

<img src="resources/DiarizationLM_Bard_demo.gif" alt="demo" width="512"/>

## Disclaimer

**This is NOT an official Google product.**

## Instructions

![img](resources/diagram.png)

### Install the package

You can install the package with:

```
pip install diarizationlm
```

Once installed, you can directly use many of the existing functions from the package. For example:

```python
import diarizationlm

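# Source sequence: its speaker labels are the ones to transfer.
src_text = "hello good morning hi how are you pretty good"
src_spk = "1 1 1 2 2 2 2 1 1"
# Target sequence: its words are preserved; only its speakers are updated.
tgt_text = "hello morning hi hey are you be good"
tgt_spk = "1 2 2 2 1 1 2 1"
# Returns the transferred speaker labels for the target words, as a string.
transferred_spk = diarizationlm.transcript_preserving_speaker_transfer(
    src_text, src_spk, tgt_text, tgt_spk)
print(transferred_spk)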
```

### Data format

We assume all internal data are stored in JSON files. An example is `testdata/example_data.json`. The field `"utterances"` stores a list of utterances, and in each utterance we have these string fields:

| Field | Description |
| ----- | ----------- |
| `"utterance_id"` | The utterance ID.|
| `"hyp_text"` | The sequence of hypothesis words, joined by spaces.|
| `"hyp_spk"` | The sequence of hypothesis speakers, joined by spaces.|
| `"hyp_diarized_text"` | The text representation of the hypothesis words and speakers. It can be used for debugging and for building the prompts sent to the LLM.|
| `"ref_*"` | Same as the `"hyp_*"` fields, but holding the ground-truth reference rather than the hypothesis.|
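
As an illustration of this layout, here is a minimal sketch that builds one record in memory and checks that each text field has exactly one speaker label per word. The utterance values and the `check_utterance` helper are made up for this example and are not part of the package.

```python
import json

# Hypothetical record that follows the field names described above.
example = {
    "utterances": [
        {
            "utterance_id": "utt1",
            "hyp_text": "good morning how are you",
            "hyp_spk": "1 1 1 2 2",
            "hyp_diarized_text": "<spk:1> good morning how <spk:2> are you",
            "ref_text": "good morning how are you",
            "ref_spk": "1 1 2 2 2",
            "ref_diarized_text": "<spk:1> good morning <spk:2> how are you",
        }
    ]
}


def check_utterance(utt: dict) -> None:
    """Raise ValueError if a text field and its speaker field differ in length."""
    for prefix in ("hyp", "ref"):
        n_words = len(utt[f"{prefix}_text"].split())
        n_speakers = len(utt[f"{prefix}_spk"].split())
        if n_words != n_speakers:
            raise ValueError(
                f"{utt['utterance_id']}: {prefix} has {n_words} words "
                f"but {n_speakers} speaker labels"
            )


for utt in example["utterances"]:
    check_utterance(utt)

# Round-trip through JSON, which is how the data would be stored on disk.
restored = json.loads(json.dumps(example))
```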

### Conversion between representations

In the paper, we mentioned two representations:

1. The word sequence and speaker sequence representation.
2. The pure text representation.

Example:

```
Word sequence:         ["good", "morning", "how", "are", "you"]
Speaker sequence:      [1, 1, 2, 2, 2]
Text representation:   "<spk:1> good morning <spk:2> how are you"
```

We provide the functions in `diarizationlm/utils.py` to convert between these two representations:

* `create_diarized_text()` converts the word and speaker sequences to the pure text representation.
* `extract_text_and_spk()` converts the pure text representation to the word and speaker sequences.
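
To make the two directions concrete, the sketch below re-implements the conversion from scratch for the example above. It is only an illustration of the idea: the function names `to_diarized_text` and `from_diarized_text` are hypothetical, and the package's own functions may take different arguments and handle more edge cases.

```python
import re

# Matches a speaker tag such as "<spk:1>".
_SPK_TAG = re.compile(r"<spk:(\d+)>")


def to_diarized_text(words, speakers):
    """Join words into one string, emitting a tag whenever the speaker changes."""
    if len(words) != len(speakers):
        raise ValueError("words and speakers must have the same length")
    parts = []
    previous = None
    for word, speaker in zip(words, speakers):
        if speaker != previous:
            parts.append(f"<spk:{speaker}>")
            previous = speaker
        parts.append(word)
    return " ".join(parts)


def from_diarized_text(text):
    """Split a tagged string back into a word list and a speaker list."""
    words, speakers = [], []
    current = None
    for token in text.split():
        match = _SPK_TAG.fullmatch(token)
        if match:
            current = int(match.group(1))
        elif current is not None:
            # Words that appear before any speaker tag are dropped in this sketch.
            words.append(token)
            speakers.append(current)
    return words, speakers


text = to_diarized_text(["good", "morning", "how", "are", "you"], [1, 1, 2, 2, 2])
words, speakers = from_diarized_text(text)
```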

### Transcript-preserving speaker transfer (TPST)

TPST is a critical data processing algorithm used in multiple places in our paper.

A Python implementation is available in `diarizationlm/utils.py`, defined as:

```python
def transcript_preserving_speaker_transfer(
    src_text: str, src_spk: str, tgt_text: str, tgt_spk: str
) -> str:
    ...  # Signature only; see diarizationlm/utils.py for the implementation.
```

![img](resources/TPST_algorithm.png)
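
For readers who want a feel for the idea without opening the figure, here is a rough, self-contained sketch under our own simplifying assumptions: align the two word sequences, pick the mapping between source and target speaker labels that agrees on the most aligned words, then copy the mapped source speaker onto each aligned target word. This is not the package's implementation. It swaps in `difflib.SequenceMatcher` for the word alignment and a brute-force search over permutations for the speaker mapping, and all function names are hypothetical.

```python
import difflib
import itertools
from collections import Counter


def align_words(src_words, tgt_words):
    """Return (src_index, tgt_index) pairs for words that line up."""
    matcher = difflib.SequenceMatcher(a=src_words, b=tgt_words, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("equal", "replace"):
            # zip() pairs positions one-to-one and ignores any leftover words.
            pairs.extend(zip(range(i1, i2), range(j1, j2)))
    return pairs


def best_speaker_mapping(pairs, src_labels, tgt_labels):
    """Map source speakers to target speakers, maximizing agreement on aligned words."""
    counts = Counter((src_labels[i], tgt_labels[j]) for i, j in pairs)
    src_ids = sorted(set(src_labels))
    tgt_ids = sorted(set(tgt_labels))
    if len(src_ids) <= len(tgt_ids):
        candidates = (
            dict(zip(src_ids, perm))
            for perm in itertools.permutations(tgt_ids, len(src_ids))
        )
    else:
        candidates = (
            dict(zip(perm, tgt_ids))
            for perm in itertools.permutations(src_ids, len(tgt_ids))
        )
    return max(candidates, key=lambda m: sum(counts[(s, t)] for s, t in m.items()))


def transfer_speakers_sketch(src_text, src_spk, tgt_text, tgt_spk):
    """Keep the target words; replace target speakers on aligned words."""
    src_words, tgt_words = src_text.split(), tgt_text.split()
    src_labels, tgt_labels = src_spk.split(), tgt_spk.split()
    pairs = align_words(src_words, tgt_words)
    mapping = best_speaker_mapping(pairs, src_labels, tgt_labels)
    result = list(tgt_labels)  # Unaligned target words keep their own label.
    for i, j in pairs:
        if src_labels[i] in mapping:
            result[j] = mapping[src_labels[i]]
    return " ".join(result)


same_labels = transfer_speakers_sketch("a b c d", "1 1 2 2", "a b c d", "1 2 2 2")
swapped_labels = transfer_speakers_sketch("a b c d", "2 2 1 1", "a b c d", "1 1 2 1")
```

In both calls the target transcript is untouched and the output uses the target's speaker label space, even when the source uses the labels in swapped order. The brute-force search is only practical for a handful of speakers.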

### Training data preparation

We provide a Python script `train_data_prep.py` for preparing the dataset used to finetune LLMs (i.e. the prompt builder module described in the paper). The tool does the following:

1. Segment the prompts and completions based on the input and output length limit.
2. Optionally apply prefix and suffix to prompts and completions.
3. Store prompt-completion pairs in different file formats.

The segmentation length, prefix, and suffix are passed in as flags to `train_data_prep.py`. In Python code, they are configured as `PromptOptions` defined in `utils.py`.

We support 4 different output file formats:

| Format | Description |
| ------ | ----------- |
| `tfrecord` | The [TFRecord format](https://www.tensorflow.org/tutorials/load_data/tfrecord) can be used by various machine learning libraries.|
| `json` | This format is more human readable and can be used for debugging. It's also useful for finetuning PaLM models via the [Google Cloud API](https://cloud.google.com/vertex-ai/docs/generative-ai/models/tune-text-models-supervised#text).|
| `csv` | This format can be used by many existing tools. OpenAI also provides a tool to convert csv files to jsonl files.|
| `jsonl` | This format can be directly used by the [OpenAI API](https://platform.openai.com/docs/api-reference/) for finetuning GPT models.|

Example command:

```bash
python3 train_data_prep.py \
--input="testdata/example_data.json" \
--output="/tmp/example_data.jsonl" \
--output_type=jsonl \
--emit_input_length=1000 \
--emit_target_length=1000 \
--prompt_suffix=" --> " \
--completion_suffix=" [eod]" \
--input_feature_key="prompt" \
--output_feature_key="completion"
```
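
The three steps listed above can be pictured with the following minimal sketch. It is not the logic of `train_data_prep.py`: the helper names are hypothetical, the length limit is assumed to be measured in characters of the prompt text, and the target speakers are assumed to be already aligned word-for-word with the hypothesis words. The suffixes and feature keys mirror the example command.

```python
import json


def diarized(words, speakers):
    """Build the text representation, tagging each speaker change."""
    parts, previous = [], None
    for word, speaker in zip(words, speakers):
        if speaker != previous:
            parts.append(f"<spk:{speaker}>")
            previous = speaker
        parts.append(word)
    return " ".join(parts)


def build_pairs(words, hyp_spk, target_spk, max_prompt_chars,
                prompt_suffix=" --> ", completion_suffix=" [eod]"):
    """Segment one utterance and return a list of prompt-completion dicts."""
    pairs = []
    start = 0
    while start < len(words):
        end = start + 1  # Every segment holds at least one word.
        # Grow the segment while the prompt text stays within the limit.
        while end < len(words) and len(
            diarized(words[start:end + 1], hyp_spk[start:end + 1])
        ) <= max_prompt_chars:
            end += 1
        pairs.append({
            "prompt": diarized(words[start:end], hyp_spk[start:end]) + prompt_suffix,
            "completion": diarized(words[start:end], target_spk[start:end])
            + completion_suffix,
        })
        start = end
    return pairs


pairs = build_pairs(
    "good morning how are you".split(),
    hyp_spk=[1, 1, 1, 2, 2],
    target_spk=[1, 1, 2, 2, 2],
    max_prompt_chars=25,
)
# One JSON object per line, i.e. the jsonl layout.
jsonl = "\n".join(json.dumps(pair) for pair in pairs)
```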

### LLM finetuning and inference (OpenAI)

> **Warning: This step is very costly! Proceed with caution at your own risk. Also GPT models are very different from PaLM models. Reproducibility is not guaranteed!**

In our paper, we used Google's internal tools to finetune PaLM 2 models and to run the model inference. Google's policy does not allow us to disclose any details about the tools and the PaLM 2 models.

However, if you are interested in reproducing some of our experiments, one option is to use alternative LLMs, such as OpenAI's GPT models.

Using the `train_data_prep.py` tool mentioned above, you can create `csv` files, and use OpenAI libraries to convert to the jsonl format. Example command:

```
openai tools fine_tunes.prepare_data -f train_data.csv
```

Once you have the training data in jsonl format, you can finetune GPT models with the data, either via the API or using OpenAI's web UI. For example:

```
openai api fine_tunes.create -t "train_data.jsonl"
```

After you have finetuned a model, we provide a Python script `run_finetuned_gpt.py` to run the GPT model inference on testing data. You need to provide your `--api_key` and `--engine` to the script.

### LLM finetuning and inference (Llama)

We open sourced Llama 2 & 3 based models on Hugging Face:

* Llama 2: https://huggingface.co/google/DiarizationLM-13b-Fisher-v1
* Llama 3: https://huggingface.co/google/DiarizationLM-8b-Fisher-v2

The scripts to finetune these models are available in the `unsloth` folder.

### Completion parser

During inference, the prompts are sent to the LLM, and the LLM generates the completions. We provide a `postprocess_completions.py` script that serves as the completion parser module described in the paper. It will:

1. Truncate the completion suffix, and any text generated after this suffix.
2. Concatenate the completions of all segments from the same utterance.
3. Transfer the speakers to the original hypothesis ASR transcript.
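
The first two steps can be sketched as below. This is only an illustration, not the code of `postprocess_completions.py`: the function names are hypothetical and the suffix `" [eod]"` is taken from the example command in the training data preparation section. The third step is the TPST algorithm described earlier.

```python
def truncate_completion(completion, suffix=" [eod]"):
    """Drop the completion suffix and anything generated after it."""
    index = completion.find(suffix)
    return completion if index < 0 else completion[:index]


def merge_segments(completions, suffix=" [eod]"):
    """Concatenate the truncated completions of all segments of one utterance."""
    cleaned = (truncate_completion(c, suffix).strip() for c in completions)
    return " ".join(c for c in cleaned if c)


merged = merge_segments([
    "<spk:1> good morning [eod] extra text",
    "<spk:2> how are you [eod]",
])
```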

### Metrics

We report three metrics in our paper:

* [Word Error Rate (WER)](https://en.wikipedia.org/wiki/Word_error_rate)
* [Word Diarization Error Rate (WDER)](https://arxiv.org/pdf/1907.05337)
* [Concatenated minimum-permutation Word Error Rate (cpWER)](https://arxiv.org/pdf/2004.09249)

Note that all three metrics reported in our
paper are **micro** metrics, i.e. both numerators and denominators are
aggregated over the entire dataset.
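
To show why the distinction matters, here is a small sketch with made-up per-utterance counts. The function names and the numbers are hypothetical; the point is only that a micro metric sums the counts before dividing, while a macro metric averages per-utterance rates.

```python
def micro_rate(counts):
    """Sum errors and reference words over the dataset, then divide once."""
    total_errors = sum(errors for errors, _ in counts)
    total_words = sum(n_words for _, n_words in counts)
    return total_errors / total_words


def macro_rate(counts):
    """Average the per-utterance error rates."""
    return sum(errors / n_words for errors, n_words in counts) / len(counts)


# (errors, reference words) per utterance; values are made up.
counts = [(1, 2), (1, 8)]
micro = micro_rate(counts)  # 2 errors / 10 words
macro = macro_rate(counts)  # mean of 1/2 and 1/8
```

With these counts the micro rate is 0.2 while the macro rate is 0.3125, because the macro average gives the short utterance the same weight as the long one.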

We provide an implementation of WER, WDER and cpWER in `metrics.py`. If you use
our json-based data format, you can call the `compute_metrics_on_json_dict()` function
as below:

```python
import diarizationlm

json_dict = {
  "utterances": [
      {
          "utterance_id": "utt1",
          "hyp_text": "hello good morning how are you",
          "hyp_spk": "1 1 1 2 2 2",
          "ref_text": "Hello. Good morning, how are you?",
          "ref_spk": "2 2 2 2 1 1",
      },
      {
          "utterance_id": "utt2",
          "hyp_text": "a b c d e f g h",
          "hyp_spk": "1 1 1 2 2 2 3 2",
          "ref_text": "a bb c e f gg g h ii",
          "ref_spk": "2 2 2 2 3 3 4 3 2",
      },
  ]
}
result = diarizationlm.compute_metrics_on_json_dict(json_dict)
print("WER =", result["WER"])
print("WDER =", result["WDER"])
print("cpWER =", result["cpWER"])
```

Or you can use our script to produce metrics as below:

```
python3 compute_metrics_on_json.py \
--input=testdata/example_data.json \
--output=/tmp/example_metrics.json
```

If you use our `postprocess_completions.py` script to process the LLM results,
you need to specify `--hyp_spk_field="hyp_spk_llm"` when running
`compute_metrics_on_json.py`.

Also please note that this implementation is different from Google's internal implementation that we used in the paper, but is a best-effort attempt to
replicate the results. The biggest differences are from text normalization,
such as de-punctuation.

## Citation

Our paper is cited as:

```
@inproceedings{wang24h_interspeech,
  title     = {{DiarizationLM: Speaker Diarization Post-Processing with Large Language Models}},
  author    = {Quan Wang and Yiling Huang and Guanlong Zhao and Evan Clark and Wei Xia and Hank Liao},
  year      = {2024},
  booktitle = {Interspeech 2024},
  pages     = {3754--3758},
  doi       = {10.21437/Interspeech.2024-209},
}
```
