# tf-seq2seq-losses
TensorFlow implementations of
[Connectionist Temporal Classification](https://www.cs.toronto.edu/~graves/icml_2006.pdf)
(CTC) loss functions that are fast and support second-order derivatives.
## Installation
```bash
$ pip install tf-seq2seq-losses
```
## Why Use This Package?
### 1. Faster Performance
The official CTC loss implementation,
[`tf.nn.ctc_loss`](https://www.tensorflow.org/api_docs/python/tf/nn/ctc_loss),
is significantly slower.
Our implementation is approximately 30 times faster, as shown by the following benchmark results:

| Name | Forward Time (ms) | Gradient Calculation Time (ms) |
|:------------------:|:-----------------:|:------------------------------:|
| `tf.nn.ctc_loss` | 13.2 ± 0.02 | 10.4 ± 3 |
| `classic_ctc_loss` | 0.138 ± 0.006 | 0.28 ± 0.01 |
| `simple_ctc_loss` | 0.0531 ± 0.003 | 0.119 ± 0.004 |

Tested on a single GPU (GeForce GTX 970, driver version 460.91.03, CUDA version 11.2). For the experimental setup, see
[`benchmark.py`](tests/performance_test.py).
To reproduce this benchmark, run the following command from the project root directory
(install `pytest` and `pandas` if needed):
```bash
$ pytest -o log_cli=true --log-level=INFO tests/benchmark.py
```
Here, `classic_ctc_loss` is the standard version of CTC loss with token collapsing, e.g., `a_bb_ccc_c -> abcc`.
The `simple_ctc_loss` is a simplified version that removes blanks trivially, e.g., `a_bb_ccc_c -> abbcccc`.
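To make the distinction concrete, here is a minimal plain-Python sketch of the two collapse rules (illustrative only; the actual loss functions operate on logit tensors, not strings, and `_` below stands for the blank token):
```python
def classic_collapse(tokens: str) -> str:
    """Classic CTC rule: merge repeated tokens, then drop blanks."""
    merged = [t for i, t in enumerate(tokens) if i == 0 or t != tokens[i - 1]]
    return "".join(t for t in merged if t != "_")


def simple_collapse(tokens: str) -> str:
    """Simplified rule: only drop blanks, keep repetitions."""
    return tokens.replace("_", "")


assert classic_collapse("a_bb_ccc_c") == "abcc"
assert simple_collapse("a_bb_ccc_c") == "abbcccc"
```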
### 2. Supports Second-Order Derivatives
This implementation supports second-order derivatives without using TensorFlow's automatic differentiation.
Instead, it uses a custom approach similar to the one described
[here](https://www.tensorflow.org/api_docs/python/tf/nn/ctc_loss),
with complexity $O(l^4)$, where $l$ is the sequence length.
The gradient complexity is $O(l^2)$.
Example usage:
```python
import tensorflow as tf
from tf_seq2seq_losses import classic_ctc_loss

batch_size = 2
num_tokens = 3
logit_length = 5

labels = tf.constant([[1, 2, 2, 1], [1, 2, 1, 0]], dtype=tf.int32)
label_length = tf.constant([4, 3], dtype=tf.int32)
logits = tf.zeros(shape=[batch_size, logit_length, num_tokens], dtype=tf.float32)
logit_length = tf.constant([5, 4], dtype=tf.int32)

with tf.GradientTape(persistent=True) as tape1:
    tape1.watch([logits])
    with tf.GradientTape() as tape2:
        tape2.watch([logits])
        loss = tf.reduce_sum(classic_ctc_loss(
            labels=labels,
            logits=logits,
            label_length=label_length,
            logit_length=logit_length,
            blank_index=0,
        ))
    # First-order derivative with respect to the logits.
    gradient = tape2.gradient(loss, sources=logits)

# Second-order derivative (batched Hessian), shape [2, 5, 3, 5, 3].
hessian = tape1.batch_jacobian(gradient, source=logits, experimental_use_pfor=False)
```
### 3. Numerical Stability
1. The proposed implementation is more numerically stable,
producing reasonable outputs even for logits on the order of `1e+10` and for `-tf.inf`.
2. If the logit length is too short to produce the target label sequence,
the loss is `tf.inf` for that sample (see the sketch below), unlike `tf.nn.ctc_loss`, which might output a finite value such as `707.13184`.
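For example, a minimal sketch of the second point (using the same call signature as in the usage example below):
```python
import tensorflow as tf
from tf_seq2seq_losses import classic_ctc_loss

# Two logit frames cannot emit the four-token label [1, 2, 2, 1]
# (classic CTC also needs a blank between the repeated 2s),
# so the loss for this sample is expected to be infinite.
loss = classic_ctc_loss(
    labels=tf.constant([[1, 2, 2, 1]], dtype=tf.int32),
    logits=tf.zeros(shape=[1, 2, 3], dtype=tf.float32),  # batch 1, 2 frames, 3 tokens
    label_length=tf.constant([4], dtype=tf.int32),
    logit_length=tf.constant([2], dtype=tf.int32),
    blank_index=0,
)
print(loss)  # expected: [inf]
```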
### 4. Pure Python Implementation
This is a pure Python/TensorFlow implementation, eliminating the need to build or compile any C++/CUDA components.
## Usage
The interface is identical to that of `tf.nn.ctc_loss` with `logits_time_major=False`.
Example:
```python
import tensorflow as tf
from tf_seq2seq_losses import classic_ctc_loss

batch_size = 1
num_tokens = 3  # = 2 tokens + 1 blank token
logit_length = 5

loss = classic_ctc_loss(
    labels=tf.constant([[1, 2, 2, 1]], dtype=tf.int32),
    logits=tf.zeros(shape=[batch_size, logit_length, num_tokens], dtype=tf.float32),
    label_length=tf.constant([4], dtype=tf.int32),
    logit_length=tf.constant([logit_length], dtype=tf.int32),
    blank_index=0,
)
```
## Under the Hood
The implementation uses TensorFlow operations such as `tf.while_loop` and `tf.TensorArray`.
The main computational bottleneck is the iteration over the logit length needed to calculate the forward and backward variables α and β
(as described in the original
[CTC paper](https://www.cs.toronto.edu/~graves/icml_2006.pdf)).
The expected GPU time for the gradient calculation grows linearly with the logit length.
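As a rough illustration of this pattern (a schematic sketch only, not the package's actual internals; the state layout and transition handling in `tf_seq2seq_losses` may differ), a forward-variable recursion driven by `tf.while_loop` and `tf.TensorArray` looks like this:
```python
import tensorflow as tf

# Schematic sketch: iterate over logit frames, writing each step's forward
# variable alpha into a TensorArray; all quantities are in log space.
def forward_pass_sketch(log_probs: tf.Tensor, transition: tf.Tensor) -> tf.Tensor:
    """log_probs: [time, states] per-frame log-probabilities of lattice states.
    transition: [states, states] log-transition matrix of the label lattice.
    Returns alpha: [time, states] forward log-probabilities."""
    time_steps = tf.shape(log_probs)[0]
    alphas = tf.TensorArray(dtype=tf.float32, size=time_steps)

    def body(t, alpha_prev, alphas):
        # alpha_t[j] = logsumexp_i(alpha_{t-1}[i] + transition[i, j]) + log_probs[t, j]
        alpha_t = tf.reduce_logsumexp(
            alpha_prev[:, tf.newaxis] + transition, axis=0
        ) + log_probs[t]
        return t + 1, alpha_t, alphas.write(t, alpha_t)

    init_alpha = log_probs[0]
    _, _, alphas = tf.while_loop(
        cond=lambda t, *_: t < time_steps,
        body=body,
        loop_vars=(tf.constant(1), init_alpha, alphas.write(0, init_alpha)),
    )
    return alphas.stack()
```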
## Known Issues
### 1. Warning:
> AutoGraph could not transform <function classic_ctc_loss at ...> and will run it as-is.
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10
(on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Observed with TensorFlow version 2.4.1.
This warning does not affect performance; it is caused by the use of `Union` in type annotations.
### 2. UnimplementedError:
Using `tf.GradientTape.jacobian` or `tf.GradientTape.batch_jacobian` for the second derivative of `classic_ctc_loss` with
`experimental_use_pfor=False` may cause an unexpected `UnimplementedError`
in TensorFlow version 2.4.1 or later.
This can be avoided by setting `experimental_use_pfor=True`
or by using `ClassicCtcLossData.hessian` directly, without `tf.GradientTape`.
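For instance, the Hessian from the second-order example above can be requested through the parallel-for path instead (assuming `tape1`, `gradient`, and `logits` are defined as in that example):
```python
# Workaround sketch: use the parallel-for implementation of batch_jacobian
# rather than the while_loop-based one that may raise UnimplementedError.
hessian = tape1.batch_jacobian(gradient, source=logits, experimental_use_pfor=True)
```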
Feel free to reach out if you have any questions or need further clarification.