# torch-cif
A fast, parallel, pure PyTorch implementation of *"CIF: Continuous Integrate-and-Fire for End-to-End Speech Recognition"* (https://arxiv.org/abs/1905.11235).
## Installation
### PyPI
```bash
pip install torch-cif
```
### Locally
```bash
git clone https://github.com/George0828Zhang/torch_cif
cd torch_cif
pip install .
```
## Usage
```python
def cif_function(
    inputs: Tensor,
    alpha: Tensor,
    beta: float = 1.0,
    tail_thres: float = 0.5,
    padding_mask: Optional[Tensor] = None,
    target_lengths: Optional[Tensor] = None,
    eps: float = 1e-4,
    unbound_alpha: bool = False
) -> Dict[str, List[Tensor]]:
    r"""A fast parallel implementation of continuous integrate-and-fire (CIF)
    https://arxiv.org/abs/1905.11235

    Shapes:
        N: batch size
        S: source (encoder) sequence length
        C: source feature dimension
        T: target sequence length

    Args:
        inputs (Tensor): (N, S, C) Input features to be integrated.
        alpha (Tensor): (N, S) Weights corresponding to each element in the
            inputs. Expected to be the output of a sigmoid function.
        beta (float): The threshold used to determine firing.
        tail_thres (float): The firing threshold used for tail handling.
        padding_mask (Tensor, optional): (N, S) A binary mask marking
            padded elements in the inputs. 1 is padding, 0 is not.
        target_lengths (Tensor, optional): (N,) Desired length of the targets
            for each sample in the minibatch.
        eps (float, optional): Epsilon to prevent underflow in divisions.
            Default: 1e-4
        unbound_alpha (bool, optional): Whether to skip checking that
            0 <= alpha <= 1.

    Returns -> Dict[str, List[Tensor]]: Key/values described below.
        cif_out: (N, T, C) The output integrated from the source.
        cif_lengths: (N,) The output length for each element in the batch.
        alpha_sum: (N,) The sum of alpha for each element in the batch.
            Can be used to compute the quantity loss.
        delays: (N, T) The expected delay (in terms of source tokens) for
            each target token in the batch.
        tail_weights: (N,) During inference, the leftover tail weight.
        scaled_alpha: (N, S) alpha after weight scaling.
        cumsum_alpha: (N, S) cumsum of alpha after scaling.
        right_indices: (N, S) right scatter indices, i.e. floor(cumsum(alpha)).
        right_weights: (N, S) right scatter weights.
        left_indices: (N, S) left scatter indices.
        left_weights: (N, S) left scatter weights.
    """
```
## Note
- This implementation uses `cumsum` and `floor` to determine the firing positions, and uses `scatter` to merge the weighted source features. The figure below demonstrates this concept using the *scaled* weight sequence `(0.4, 1.8, 1.2, 1.2, 1.4)`.
<img src="concept.png" alt="drawing" width="300"/>
- Running the tests requires `pip install hypothesis expecttest`.
- If `beta != 1`, our implementation differs slightly from Algorithm 1 in the paper [[1]](#references):
  - When a boundary is located, the original algorithm adds the last feature to the current integration with weight `1 - accumulation` (line 11 in Algorithm 1), which causes a negative weight in the next integration when `alpha < 1 - accumulation`.
  - We use `beta - accumulation` instead, so the weight carried into the next integration, `alpha - (beta - accumulation)`, is always non-negative.
- Feel free to contact me if there are bugs in the code.
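The `beta - accumulation` rule above can be illustrated with a sequential, pure-Python reference (a hypothetical helper for illustration only, assuming `alpha <= beta` so each step fires at most once; the actual package computes this in parallel via `cumsum`/`scatter`):

```python
def cif_sequential(inputs, alpha, beta=1.0, tail_thres=0.5):
    """Sequential sketch of CIF firing with the `beta - accumulation` rule.

    inputs: list of feature vectors (lists of floats), one per source step.
    alpha:  list of weights, assumed 0 <= alpha <= beta.
    """
    dim = len(inputs[0])
    outputs = []
    acc_w = 0.0                 # accumulated weight of the open integration
    acc_feat = [0.0] * dim      # accumulated (weighted) features
    for feat, a in zip(inputs, alpha):
        if acc_w + a >= beta:
            # Boundary located: close the integration with weight `beta - acc_w`.
            w_close = beta - acc_w
            outputs.append([f + w_close * x for f, x in zip(acc_feat, feat)])
            # Leftover weight `a - w_close` is always non-negative here.
            acc_w = a - w_close
            acc_feat = [acc_w * x for x in feat]
        else:
            acc_w += a
            acc_feat = [f + a * x for f, x in zip(acc_feat, feat)]
    # Tail handling: keep the remainder only if it exceeds tail_thres.
    if acc_w > tail_thres:
        outputs.append(acc_feat)
    return outputs

print(cif_sequential([[1.0], [2.0], [3.0], [4.0], [5.0]],
                     [0.4, 0.8, 0.6, 0.6, 0.7]))
# → roughly [[1.6], [3.0], [4.6]]; the 0.1 tail weight is below tail_thres.
```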
## References
1. [CIF: Continuous Integrate-and-Fire for End-to-End Speech Recognition](https://arxiv.org/abs/1905.11235)
2. [Exploring Continuous Integrate-and-Fire for Adaptive Simultaneous Speech Translation](https://www.isca-archive.org/interspeech_2022/chang22f_interspeech.html)