# torch-cif
A fast, parallel, pure PyTorch implementation of *"CIF: Continuous Integrate-and-Fire for End-to-End Speech Recognition"* (https://arxiv.org/abs/1905.11235).
## Installation
### PyPI
```bash
pip install torch-cif
```
### Locally
```bash
git clone https://github.com/George0828Zhang/torch_cif
cd torch_cif
pip install .
```
## Usage
```python
def cif_function(
    inputs: Tensor,
    alpha: Tensor,
    beta: float = 1.0,
    tail_thres: float = 0.5,
    padding_mask: Optional[Tensor] = None,
    target_lengths: Optional[Tensor] = None,
    eps: float = 1e-4,
    unbound_alpha: bool = False
) -> Dict[str, List[Tensor]]:
    r"""A fast parallel implementation of continuous integrate-and-fire (CIF)
    https://arxiv.org/abs/1905.11235

    Shapes:
        N: batch size
        S: source (encoder) sequence length
        C: source feature dimension
        T: target sequence length

    Args:
        inputs (Tensor): (N, S, C) Input features to be integrated.
        alpha (Tensor): (N, S) Weights corresponding to each element in the
            inputs. Expected to be the output of a sigmoid function.
        beta (float): The threshold used to determine firing.
        tail_thres (float): The firing threshold used for tail handling.
        padding_mask (Tensor, optional): (N, S) A binary mask marking
            padded elements in the inputs. 1 is padding, 0 is not.
        target_lengths (Tensor, optional): (N,) Desired length of the targets
            for each sample in the minibatch.
        eps (float, optional): Epsilon to prevent underflow in divisions.
            Default: 1e-4
        unbound_alpha (bool, optional): Whether to skip checking that
            0 <= alpha <= 1.

    Returns -> Dict[str, List[Tensor]]: Key/values described below.
        cif_out: (N, T, C) The output integrated from the source.
        cif_lengths: (N,) The output length for each element in the batch.
        alpha_sum: (N,) The sum of alpha for each element in the batch.
            Can be used to compute the quantity loss.
        delays: (N, T) The expected delay (in terms of source tokens) for
            each target token in the batch.
        tail_weights: (N,) During inference, the leftover tail weight.
        scaled_alpha: (N, S) alpha after weight scaling.
        cumsum_alpha: (N, S) cumsum of alpha after scaling.
        right_indices: (N, S) right scatter indices, i.e. floor(cumsum(alpha)).
        right_weights: (N, S) right scatter weights.
        left_indices: (N, S) left scatter indices.
        left_weights: (N, S) left scatter weights.
    """
```
## Note
- This implementation uses `cumsum` and `floor` to determine the firing positions, and uses `scatter` to merge the weighted source features. The figure below demonstrates this concept using the *scaled* weight sequence `(0.4, 1.8, 1.2, 1.2, 1.4)`.
<img src="concept.png" alt="drawing" width="300"/>
- Running the tests requires `pip install hypothesis expecttest`.
- If `beta != 1`, our implementation differs slightly from Algorithm 1 in the paper [[1]](#references):
  - When a boundary is located, the original algorithm adds the last feature to the current integration with weight `1 - accumulation` (line 11 in Algorithm 1), which causes a negative weight in the next integration when `alpha < 1 - accumulation`.
  - We use `beta - accumulation` instead, so the weight carried into the next integration, `alpha - (beta - accumulation)`, is always non-negative.
- Feel free to contact me if there are bugs in the code.
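The `beta - accumulation` rule above can be illustrated with a sequential, pure-Python reference (a hypothetical helper for illustration only, assuming `alpha <= beta` so each step fires at most once; the actual package computes this in parallel via `cumsum`/`scatter`):

```python
def cif_sequential(inputs, alpha, beta=1.0, tail_thres=0.5):
    """Sequential sketch of CIF firing with the `beta - accumulation` rule.

    inputs: list of feature vectors (lists of floats), one per source step.
    alpha:  list of weights, assumed 0 <= alpha <= beta.
    """
    dim = len(inputs[0])
    outputs = []
    acc_w = 0.0                 # accumulated weight of the open integration
    acc_feat = [0.0] * dim      # accumulated (weighted) features
    for feat, a in zip(inputs, alpha):
        if acc_w + a >= beta:
            # Boundary located: close the integration with weight `beta - acc_w`.
            w_close = beta - acc_w
            outputs.append([f + w_close * x for f, x in zip(acc_feat, feat)])
            # Leftover weight `a - w_close` is always non-negative here.
            acc_w = a - w_close
            acc_feat = [acc_w * x for x in feat]
        else:
            acc_w += a
            acc_feat = [f + a * x for f, x in zip(acc_feat, feat)]
    # Tail handling: keep the remainder only if it exceeds tail_thres.
    if acc_w > tail_thres:
        outputs.append(acc_feat)
    return outputs

print(cif_sequential([[1.0], [2.0], [3.0], [4.0], [5.0]],
                     [0.4, 0.8, 0.6, 0.6, 0.7]))
# → roughly [[1.6], [3.0], [4.6]]; the 0.1 tail weight is below tail_thres.
```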
## References
1. [CIF: Continuous Integrate-and-Fire for End-to-End Speech Recognition](https://arxiv.org/abs/1905.11235)
2. [Exploring Continuous Integrate-and-Fire for Adaptive Simultaneous Speech Translation](https://www.isca-archive.org/interspeech_2022/chang22f_interspeech.html)