fad_pytorch
================
<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->
[Original FAD paper (PDF)](https://arxiv.org/pdf/1812.08466.pdf)
## Install
``` sh
pip install fad_pytorch
```
## Features:
- runs in parallel on multiple processors and multiple GPUs (via
`accelerate`)
- supports multiple embedding methods:
- VGGish and PANN, both mono @ 16kHz
- OpenL3 and (LAION-)CLAP, stereo @ 48kHz
- uses publicly available pretrained checkpoints trained on music (and
  other sources) for those models. (If you want speech, submit a PR or
  an Issue; I don’t do speech.)
- favors ops in PyTorch rather than NumPy (or TensorFlow)
- `fad_gen` supports local data read or WebDataset (audio data stored in
S3 buckets)
- runs on CPU, CUDA, or MPS
## Instructions:
This is designed to be run as three command-line scripts in succession.
The latter two (`fad_embed` and `fad_score`) are probably what most
people will want:
1. `fad_gen`: produces directories of real & fake audio (given real
data). See `fad_gen`
[documentation](https://drscotthawley.github.io/fad_pytorch/fad_gen.html)
for calling sequence.
2. `fad_embed [options] <real_audio_dir> <fake_audio_dir>`: produces
directories of *embeddings* of real & fake audio
3. `fad_score [options] <real_emb_dir> <fake_emb_dir>`: reads the
embeddings & generates FAD score, for real (“$r$”) and fake (“$f$”):
$$ \mathrm{FAD} = \| \mu_r - \mu_f \|^2 + \mathrm{tr}\left(\Sigma_r + \Sigma_f - 2 \sqrt{\Sigma_r \Sigma_f}\right) $$
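For reference, the formula above can be sketched in plain NumPy. This is an illustrative reimplementation, not the repo’s actual code (`fad_score` favors PyTorch ops); the function name is mine. It uses the identity that $\mathrm{tr}(\sqrt{\Sigma_r \Sigma_f})$ equals the sum of the square roots of the eigenvalues of $\Sigma_r \Sigma_f$:

``` python
import numpy as np

def frechet_audio_distance(emb_r, emb_f):
    """FAD between two (n_samples, dim) arrays of embeddings.

    Illustrative sketch only; real implementations must also handle
    numerical issues for large, ill-conditioned covariance matrices.
    """
    mu_r, mu_f = emb_r.mean(axis=0), emb_f.mean(axis=0)
    cov_r = np.cov(emb_r, rowvar=False)
    cov_f = np.cov(emb_f, rowvar=False)
    # tr(sqrt(cov_r @ cov_f)) = sum of sqrt of eigenvalues of cov_r @ cov_f;
    # eigenvalues of a product of PSD matrices are real & nonnegative,
    # so we drop tiny imaginary/negative parts from floating-point error.
    eigvals = np.linalg.eigvals(cov_r @ cov_f)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_r - mu_f
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_f) - 2.0 * tr_sqrt)
```

Note that scoring a dataset against itself gives (numerically) zero, which is the sanity check mentioned in the FAQ below.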
## Documentation
See the [Documentation
Website](https://drscotthawley.github.io/fad_pytorch/).
## Comments / FAQ / Troubleshooting
- “`RuntimeError: CUDA error: invalid device ordinal`”: This happens
when you have a “bad node” on an AWS cluster. [Haven’t yet figured out
what causes it or how to fix
it](https://discuss.huggingface.co/t/solved-accelerate-accelerator-cuda-error-invalid-device-ordinal/21509/1).
Workaround: Just add the current node to your SLURM `--exclude` list,
exit and retry. Note: it may take as many as 5 to 7 retries before you
get a “good node”.
- “FAD scores obtained from different embedding methods are *wildly*
different!” …Yea. It’s not obvious that scores from different
embedding methods should be comparable. Rather, compare different
groups of audio files using the same embedding method, and/or check
that FAD scores go *down* as similarity improves.
- “FAD score for the same dataset repeated (twice) is not exactly zero!”
…Yea. There seems to be an uncertainty of around +/- 0.008. I’d say,
don’t quote any numbers past the first decimal point.
## Contributing
This repo is still fairly “bare bones” and will benefit from more
documentation and features as time goes on. Note that it is written
using [nbdev](https://nbdev.fast.ai/), so the things to do are:
1. Fork this repo
2. Clone your fork to your (local) machine
3. Install nbdev: `python3 -m pip install -U nbdev`
4. Make changes by editing the notebooks in `nbs/`, not the `.py` files
in `fad_pytorch/`.
5. Run `nbdev_export` to export notebook changes to `.py` files
6. For good measure, run `nbdev_install_hooks` and `nbdev_clean` -
especially if you’ve *added* any notebooks.
7. Do a `git status` to see all the `.ipynb` and `.py` files that need
to be added & committed
8. `git add` those files and then `git commit`, and then `git push`
9. Take a look in your fork’s GitHub Actions tab, and see if the “test”
and “deploy” CI runs finish properly (green light) or fail (red
light)
10. Once you get green lights, send in a Pull Request!
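The steps above can be sketched as a single shell session (the fork URL and commit message are placeholders; adjust to your own fork):

``` sh
# one-time setup
git clone https://github.com/<your-username>/fad_pytorch.git
cd fad_pytorch
python3 -m pip install -U nbdev
nbdev_install_hooks              # git hooks that keep notebooks clean

# ...edit notebooks in nbs/, not the .py files...

nbdev_export                     # sync notebook changes into fad_pytorch/*.py
nbdev_clean                      # strip notebook metadata before committing
git status                       # check which .ipynb and .py files changed
git add nbs/ fad_pytorch/
git commit -m "<describe your change>"
git push
```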
*Feel free to ask me for tips with nbdev, it has quite a learning curve.
You can also ask on [fast.ai forums](https://forums.fast.ai/) and/or
[fast.ai
Discord](https://discord.com/channels/689892369998676007/887694559952400424)*
## Citations / Blame / Disclaimer
This repo is 2 weeks old. I’m not ready for this to be cited in your
papers. I’d hate for there to be some mistake I haven’t found yet.
Perhaps a later version will have citation info. For now, instead,
there’s:
**Disclaimer:** Results from this repo are still a work in progress.
While every effort has been made to test model outputs, the author takes
no responsibility for mistakes. If you want to double-check via another
source, see “Related Repos” below.
## Related Repos
There are several others, but this one is mine. These repos didn’t
have all the features I wanted, but I used them for inspiration:
- https://github.com/gudgud96/frechet-audio-distance
- https://github.com/google-research/google-research/tree/master/frechet_audio_distance:
Goes with [Original FAD paper](https://arxiv.org/pdf/1812.08466.pdf)
- https://github.com/AndreevP/speech_distances