audiofeat

| Field | Value |
|---|---|
| Name | audiofeat |
| Version | 1.0.0 |
| home_page | https://github.com/ankitshah009/audiofeat |
| Summary | A comprehensive PyTorch-based audio feature extraction library for machine learning, research, and audio analysis |
| upload_time | 2025-08-04 04:54:45 |
| maintainer | None |
| docs_url | None |
| author | Ankit Shah |
| requires_python | >=3.8 |
| license | MIT License, Copyright (c) 2025 Ankit Parag Shah (full text follows this table) |
| keywords | audio, feature-extraction, signal-processing, pytorch, machine-learning, speech, voice, spectrogram, mfcc, spectral-features, temporal-features, pitch-detection, audio-analysis, music-information-retrieval, mir, dsp, audio-processing |
| VCS | |
| bugtrack_url | |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |

MIT License

Copyright (c) 2025 Ankit Parag Shah

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

# audiofeat: A Comprehensive Audio Feature Extraction Library

`audiofeat` aims to be the most comprehensive publicly available Python library for audio feature extraction. It provides temporal, spectral, cepstral, pitch, voice, tonal, and rhythm features, along with several spectrogram representations and statistical functionals, all implemented using `torch` for efficient computation.

## Features

### Temporal Features
- **RMS (Root Mean Square):** Measures the loudness or power of an audio signal.
- **Short-Time Energy (STE):** The sum of squared signal values in a frame.
- **Zero-Crossing Rate (ZCR):** Indicates the rate at which the signal changes its sign.
- **Amplitude Modulation Depth:** Measures the depth of amplitude modulation over a sliding window.
- **Breath Group Duration:** Estimates the duration of breath groups from the audio envelope.
- **Speech Rate:** Estimates speech rate in syllables per second.
- **Log Attack Time:** Measures the time for a signal's envelope to rise to its peak.
- **Temporal Centroid:** The "center of gravity" of the signal's amplitude envelope.
- **Entropy of Energy:** Measures abrupt changes in energy within a frame.
- **Decay Time:** Measures the time for a signal's envelope to decay from its peak.
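
To make the first three definitions concrete, here is a minimal sketch of frame-wise short-time energy, RMS, and zero-crossing rate. It is written in plain Python (standard library only) for readability and is not `audiofeat`'s `torch` implementation; the helper names and the no-padding framing are assumptions made for illustration.

```python
import math

def frame_signal(x, frame_length, hop_length):
    """Split a list of samples into overlapping frames (no padding)."""
    if len(x) < frame_length:
        return []
    n_frames = 1 + (len(x) - frame_length) // hop_length
    return [x[i * hop_length : i * hop_length + frame_length] for i in range(n_frames)]

def short_time_energy(frame):
    """Sum of squared sample values in one frame."""
    return sum(s * s for s in frame)

def rms(frame):
    """Square root of the mean squared sample value in one frame."""
    return math.sqrt(short_time_energy(frame) / len(frame))

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

# One second of a 100 Hz sine sampled at 8 kHz
sr = 8000
x = [math.sin(2 * math.pi * 100 * n / sr) for n in range(sr)]
frames = frame_signal(x, frame_length=800, hop_length=400)

print(len(frames))                    # number of frames
print(rms(frames[0]))                 # close to 1/sqrt(2) for a unit-amplitude sine
print(zero_crossing_rate(frames[0]))  # about two crossings per 80-sample period
```

Each frame here spans exactly ten periods of the sine, which is why the RMS lands on the textbook value for a sinusoid.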

### Spectral Features
- **Spectral Centroid:** Represents the "center of mass" of the spectrum, indicating dominant frequencies.
- **Spectral Rolloff:** The frequency below which a certain percentage of the total spectral energy is concentrated. Configurable `rolloff_percent` (e.g., 0.85, 0.90, 0.95).
- **Spectral Flux:** Measures the rate of change of the power spectrum.
- **Spectral Flatness:** Quantifies how noise-like a sound is, using a `torch`-native geometric mean.
- **Spectral Entropy:** Measures the randomness or unpredictability of the spectrum.
- **Spectral Skewness:** Describes the asymmetry of the spectral distribution.
- **Spectral Spread (Bandwidth):** Measures the bandwidth of the spectrum, or how "spread out" it is around the centroid.
- **Spectral Slope:** The slope of a linear regression fitted to the spectrum.
- **Spectral Crest Factor:** Ratio of the max spectral magnitude to the sum of magnitudes; measures "peakiness".
- **Spectral Contrast:** Measures the amplitude difference between spectral peaks and valleys across several frequency sub-bands, calculated as `(peak - valley) / (peak + valley)`.
- **Harmonic-to-Noise Ratio (HNR):** Ratio of energy in harmonic components to noise components.
- **Spectral Deviation:** Quantifies the "jaggedness" of the local spectrum.
- **Low-High Energy Ratio:** Ratio of energy below 1 kHz to that above 3 kHz.
- **LPC (Linear Prediction Coefficients):** Coefficients representing the spectral envelope of a signal.
- **LSP (Line Spectral Pairs):** Robust and compact representation of the LPC filter.
- **MFCCs (Mel-Frequency Cepstral Coefficients):** Compact representation of the spectral envelope, based on the Mel scale.
- **Linear Spectrogram (STFT):** Visual representation of the spectrum of frequencies over time.
- **Mel Spectrogram:** Spectrogram with a Mel-scaled frequency axis, mimicking human auditory perception.
- **CQT Spectrogram (Constant-Q Transform):** Spectrogram with logarithmically spaced frequency bins. (Note: This is a simplified `torch`-native implementation and not a full, optimized CQT).
- **Chroma Features:** Represents the intensity of the 12 different pitch classes of the Western musical scale.
- **Spectral Sharpness (Zwicker Model):** Measures the perceived sharpness of a sound based on the Zwicker model.
- **Spectral Tonality:** Quantifies the tonal characteristics of a sound using the spectral crest factor.
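
As an illustration of several definitions above, the following sketch computes centroid, spread, rolloff, and flatness for a single magnitude spectrum. It uses plain Python rather than `torch`, and the weighting choices (magnitude weights for centroid and spread, power for rolloff and flatness) are assumptions; conventions vary between libraries and `audiofeat`'s own choices are not confirmed here.

```python
import math

def spectral_centroid(mags, freqs):
    """Magnitude-weighted mean frequency."""
    return sum(f * m for f, m in zip(freqs, mags)) / sum(mags)

def spectral_spread(mags, freqs):
    """Magnitude-weighted standard deviation around the centroid."""
    c = spectral_centroid(mags, freqs)
    return math.sqrt(sum((f - c) ** 2 * m for f, m in zip(freqs, mags)) / sum(mags))

def spectral_rolloff(mags, freqs, rolloff_percent=0.85):
    """Lowest frequency at which cumulative energy reaches the given fraction."""
    energy = [m * m for m in mags]
    threshold = rolloff_percent * sum(energy)
    cumulative = 0.0
    for f, e in zip(freqs, energy):
        cumulative += e
        if cumulative >= threshold:
            return f
    return freqs[-1]

def spectral_flatness(mags, eps=1e-10):
    """Geometric mean of the power spectrum divided by its arithmetic mean."""
    power = [m * m + eps for m in mags]
    log_mean = sum(math.log(p) for p in power) / len(power)
    return math.exp(log_mean) / (sum(power) / len(power))

freqs = [0.0, 1000.0, 2000.0, 3000.0, 4000.0]  # bin centre frequencies in Hz
mags = [0.0, 1.0, 2.0, 1.0, 0.0]               # a symmetric peak at 2 kHz

print(spectral_centroid(mags, freqs))       # 2000.0
print(spectral_spread(mags, freqs))         # sqrt(500000), about 707.1
print(spectral_rolloff(mags, freqs, 0.85))  # 3000.0
print(spectral_flatness([1.0, 1.0, 1.0, 1.0]))  # close to 1 for a flat spectrum
print(spectral_flatness([1.0, 0.0, 0.0, 0.0]))  # close to 0 for a single peak
```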

### Cepstral Features
- **LPCC (Linear Predictive Cepstral Coefficients):** Cepstral coefficients derived from Linear Predictive Coding (LPC) analysis.
- **GTCC (Gammatone Cepstral Coefficients):** Cepstral coefficients derived from a Gammatone filterbank.
- **Delta Coefficients:** First-order derivative of a feature contour over time.
- **Delta-Delta Coefficients:** Second-order derivative of a feature contour over time.
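
Delta coefficients are commonly computed with a short regression over neighbouring frames rather than a plain difference. The sketch below shows that widely used formula on a one-dimensional contour, with edge frames repeated as padding. The function name `delta`, the `width` parameter, and the padding rule are assumptions for illustration and may differ from what `audiofeat` does.

```python
def delta(contour, width=2):
    """Regression-based derivative:
    d[t] = sum_n n * (c[t+n] - c[t-n]) / (2 * sum_n n^2), n = 1..width,
    with out-of-range indices clamped to the first or last frame."""
    denom = 2 * sum(n * n for n in range(1, width + 1))
    last = len(contour) - 1
    out = []
    for t in range(len(contour)):
        acc = 0.0
        for n in range(1, width + 1):
            ahead = contour[min(t + n, last)]
            behind = contour[max(t - n, 0)]
            acc += n * (ahead - behind)
        out.append(acc / denom)
    return out

ramp = list(range(10))       # a feature that rises by 1 per frame
d = delta(ramp)              # first-order (delta)
dd = delta(d)                # second-order (delta-delta): delta applied twice

print(d[4])   # 1.0, the slope of the ramp away from the edges
print(dd[4])  # 0.0, a straight line has no curvature
```

Near the edges the clamped padding biases the estimate toward zero, which is a known side effect of this padding choice.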

### Pitch Features
- **Fundamental Frequency (F0) Autocorrelation:** Estimates F0 via autocorrelation.
- **Fundamental Frequency (F0) YIN:** Estimates F0 using the YIN algorithm.
- **Semitone Standard Deviation:** Standard deviation of F0 in semitones.
- **Pitch Strength:** Measures the strength of periodicity in a signal.
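
The autocorrelation approach can be sketched in a few lines: search the lags that correspond to a plausible F0 range and pick the one where the frame best matches a shifted copy of itself. This is a simplified plain-Python illustration, not `audiofeat`'s implementation. The function name, the `fmin`/`fmax` defaults, and the use of the normalised peak as a stand-in for pitch strength are all assumptions.

```python
import math

def f0_autocorrelation(frame, sample_rate, fmin=50.0, fmax=500.0):
    """Return (f0_hz, strength) from the autocorrelation peak of one frame.
    strength is the peak divided by the zero-lag energy."""
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_value = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        value = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    if best_lag == 0:
        return 0.0, 0.0  # no positive peak found: treat as unvoiced
    energy = sum(s * s for s in frame)
    return sample_rate / best_lag, best_value / energy

# 100 ms of a 200 Hz sine sampled at 8 kHz
sr = 8000
frame = [math.sin(2 * math.pi * 200 * n / sr) for n in range(800)]
f0, strength = f0_autocorrelation(frame, sr)
print(f0)        # 200.0
print(strength)  # 0.95 for this clean, periodic input
```

Because the overlap shrinks as the lag grows, this unnormalised version favours the shortest matching period, which helps avoid octave-down errors on clean signals.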

### Voice Features
- **Jitter:** Cycle-to-cycle F0 variation.
- **Shimmer:** Cycle-to-cycle amplitude variation.
- **Subharmonic to Harmonic Ratio:** Ratio of subharmonic power to harmonic power.
- **Normalized Amplitude Quotient (NAQ):** Computed from peak glottal flow, MFDR, and period.
- **Closed Quotient:** Derived from EGG timings per cycle.
- **Glottal Closure Time:** Average relative glottal closure time.
- **Soft Phonation Index:** Derived from low/high band energies.
- **Speed Quotient:** From glottal flow opening and closing times.
- **Vocal Fry Index:** Ratio of fry frames to voiced frames.
- **Voice Onset Time (VOT):** Simplified estimation of voice onset time.
- **Glottal to Noise Excitation (GNE):** Approximate GNE using band cross-correlations.
- **Maximum Flow Declination Rate (MFDR):** Approximate MFDR from differentiated glottal flow.
- **Nasality Index:** Computed from nasal and oral microphone signals.
- **Vocal Tract Length:** Estimated from the first two formants.
- **Alpha Ratio:** Ratio of low-frequency energy (50 Hz-1 kHz) to high-frequency energy (1-5 kHz).
- **Hammarberg Index:** Ratio of max energy in the 0-2 kHz band to max energy in the 2-5 kHz band.
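- **Harmonic Differences (e.g., H1-H2, H1-A3):** Amplitude differences between specific spectral components, such as the first and second harmonics (H1-H2) or the first harmonic and the strongest harmonic near the third formant (H1-A3).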
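
Jitter and shimmer share the same arithmetic: the mean absolute difference between consecutive cycles, divided by the mean value. The sketch below applies that "local" definition to hand-written period and peak-amplitude sequences. The helper name and the input values are made up for illustration, and `audiofeat` may use a different variant of these measures.

```python
def relative_perturbation(values):
    """Mean absolute difference of consecutive values, relative to their mean."""
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return (sum(diffs) / len(diffs)) / (sum(values) / len(values))

# Hypothetical per-cycle measurements from a sustained vowel
periods = [0.0100, 0.0102, 0.0099, 0.0101]  # cycle durations in seconds
amplitudes = [0.80, 0.82, 0.79, 0.81]       # peak amplitude of each cycle

jitter = relative_perturbation(periods)     # cycle-to-cycle period variation
shimmer = relative_perturbation(amplitudes) # cycle-to-cycle amplitude variation

print(f"jitter:  {100 * jitter:.2f} %")
print(f"shimmer: {100 * shimmer:.2f} %")
```

In practice the hard part is extracting reliable cycle boundaries from the waveform; the ratio itself is the easy step.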

### Tonal and Musical Features
- **Tonnetz (Tonal Centroid Features):** A 6-dimensional representation of tonal space based on music theory.
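
A common way to obtain the six dimensions is the tonal centroid transform, which projects a 12-bin chroma vector onto three circles (fifths, minor thirds, major thirds) and keeps the sine and cosine coordinate on each. The sketch below follows that formulation as it is usually described; the L1 normalisation and the output ordering are assumptions, and `audiofeat`'s exact variant is not confirmed here.

```python
import math

def tonnetz(chroma):
    """Project a 12-bin chroma vector onto three circles and return 6 values:
    (sin, cos) on the circle of fifths, minor thirds, and major thirds."""
    total = sum(chroma)
    if total == 0:
        return [0.0] * 6
    norm = [c / total for c in chroma]  # L1-normalise so loudness cancels out
    # (angle step per pitch class, radius) for each circle
    circles = [
        (7 * math.pi / 6, 1.0),  # fifths
        (3 * math.pi / 2, 1.0),  # minor thirds
        (2 * math.pi / 3, 0.5),  # major thirds
    ]
    out = []
    for step, radius in circles:
        out.append(sum(radius * math.sin(step * k) * c for k, c in enumerate(norm)))
        out.append(sum(radius * math.cos(step * k) * c for k, c in enumerate(norm)))
    return out

only_c = [1.0] + [0.0] * 11        # all energy in pitch class C
only_g = [0.0] * 7 + [1.0] + [0.0] * 4  # all energy in pitch class G

print(tonnetz(only_c))  # [0.0, 1.0, 0.0, 1.0, 0.0, 0.5]
print(tonnetz(only_g))  # C and G sit 30 degrees apart on the circle of fifths
```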

### Rhythm Features
- **Tempo:** Estimates the tempo (BPM) of an audio signal.
- **Beat Tracking:** Performs simple beat tracking on an audio signal.
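
One very simple way to think about tempo and beats, shown here only as an illustration, is to take the median interval between detected onsets as the beat period and lay a regular grid from the first onset. This is a deliberately naive stand-in, not the algorithm `audiofeat` uses; the function names and the onset times below are hypothetical.

```python
import statistics

def tempo_from_onsets(onset_times):
    """Tempo in BPM from the median inter-onset interval (seconds)."""
    intervals = [b - a for a, b in zip(onset_times, onset_times[1:])]
    return 60.0 / statistics.median(intervals)

def beat_grid(onset_times, duration):
    """Evenly spaced beat times starting at the first onset."""
    period = 60.0 / tempo_from_onsets(onset_times)
    beats, t = [], onset_times[0]
    while t < duration:
        beats.append(t)
        t += period
    return beats

# Hypothetical onset times in seconds, with slight timing variation
onsets = [0.0, 0.5, 1.0, 1.52, 2.0]

print(tempo_from_onsets(onsets))  # 120.0
print(beat_grid(onsets, 2.0))     # [0.0, 0.5, 1.0, 1.5]
```

The median makes the estimate robust to a few early or late onsets, but the approach breaks down when onsets fall on subdivisions of the beat.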

### Statistical Functionals
- **Mean:** Average value of a feature over time.
- **Standard Deviation:** Variability of a feature over time.
- **Min:** Minimum value of a feature over time.
- **Max:** Maximum value of a feature over time.
- **Skewness:** Asymmetry of the feature distribution over time.
- **Kurtosis:** Tailedness (often described as peakiness) of the feature distribution over time.
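
The functionals above reduce a frame-level feature contour to a fixed-size summary. A minimal plain-Python sketch follows; it assumes population (divide-by-N) moments and non-excess kurtosis, which are illustrative choices and may not match `audiofeat`'s conventions.

```python
import math

def functionals(values):
    """Summarise a feature contour with six statistics."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n  # population variance
    std = math.sqrt(var)
    if std > 0:
        skew = sum((v - mean) ** 3 for v in values) / n / std ** 3
        kurt = sum((v - mean) ** 4 for v in values) / n / std ** 4
    else:
        skew, kurt = 0.0, 0.0  # a constant contour has no defined shape
    return {
        "mean": mean,
        "std": std,
        "min": min(values),
        "max": max(values),
        "skewness": skew,
        "kurtosis": kurt,
    }

stats = functionals([1.0, 2.0, 3.0, 4.0, 5.0])
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```

For this symmetric input the skewness is 0 and the (non-excess) kurtosis is 1.7, below the value of 3 that a normal distribution would give.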

## Installation

### Install from PyPI (Recommended)

```bash
pip install audiofeat
```

### Install from Source

To install `audiofeat` from source, clone the repository and install it in editable mode:

```bash
git clone https://github.com/ankitshah009/audiofeat.git
cd audiofeat
pip install -e .
```

### Optional Dependencies

For development and examples:

```bash
# For development (quotes keep shells such as zsh from expanding the brackets)
pip install "audiofeat[dev]"

# For running examples
pip install "audiofeat[examples]"
```

## Usage

Here's a basic example of how to use `audiofeat` to extract various features:

```python
import torch
import audiofeat

# Create a dummy audio signal
sample_rate = 22050
duration = 5
audio_data = torch.randn(sample_rate * duration)

# Compute features
rms = audiofeat.rms(audio_data, frame_length=2048, hop_length=512)
zcr = audiofeat.zero_crossing_rate(audio_data)
spectral_centroid = audiofeat.spectral_centroid(audio_data)
mel_spec = audiofeat.mel_spectrogram(audio_data, sample_rate)
mfccs = audiofeat.mfcc(audio_data, sample_rate)

print(f"RMS: {rms.shape}")
print(f"ZCR: {zcr.shape}")
print(f"Spectral Centroid: {spectral_centroid.shape}")
print(f"Mel Spectrogram: {mel_spec.shape}")
print(f"MFCCs: {mfccs.shape}")
```

For more detailed examples, refer to the `examples/compute_features.py` file.
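
When reading the printed shapes, it helps to know how many frames to expect for a given `frame_length` and `hop_length`. The sketch below does that arithmetic for the 5-second, 22,050 Hz signal used above. Whether `audiofeat` pads the signal before framing is not stated in this README, so both the unpadded and the centre-padded counts are shown; `num_frames` is a hypothetical helper, not part of the library.

```python
def num_frames(num_samples, frame_length, hop_length, center=False):
    """Number of full frames that fit in a signal.
    With center=True the signal is assumed to be padded by
    frame_length // 2 samples on each side before framing."""
    if center:
        num_samples += 2 * (frame_length // 2)
    if num_samples < frame_length:
        return 0
    return 1 + (num_samples - frame_length) // hop_length

num_samples = 22050 * 5  # 110,250 samples

print(num_frames(num_samples, 2048, 512))               # 212
print(num_frames(num_samples, 2048, 512, center=True))  # 216
```

If a feature's time dimension differs from both numbers, the function is likely using a different default frame or hop size.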

## Contributing

We welcome contributions to `audiofeat`! If you have new features to add, bug fixes, or improvements, please feel free to open a pull request.

## Citation

If you use `audiofeat` in your research, please cite the following Ph.D. thesis:

```bibtex
@phdthesis{shah2024computational,
  title={Computational Audition with Imprecise Labels},
  author={Shah, Ankit Parag},
  year={2024},
  school={Carnegie Mellon University Pittsburgh, PA}
}
```

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/ankitshah009/audiofeat",
    "name": "audiofeat",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": null,
    "keywords": "audio, feature-extraction, signal-processing, pytorch, machine-learning, speech, voice, spectrogram, mfcc, spectral-features, temporal-features, pitch-detection, audio-analysis, music-information-retrieval, mir, dsp, audio-processing",
    "author": "Ankit Shah",
    "author_email": "Ankit Shah <ankit.tronix@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/c1/bd/9a1a0a4f3c974d2014fe492d8b8812577c877fe3d1db5eba65514f0eb560/audiofeat-1.0.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT License\n        \n        Copyright (c) 2025 Ankit Parag Shah\n        \n        Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the \"Software\"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:\n        \n        The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.\n        \n        THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.\n        ",
    "summary": "A comprehensive PyTorch-based audio feature extraction library for machine learning, research, and audio analysis",
    "version": "1.0.0",
    "project_urls": {
        "Bug Reports": "https://github.com/ankitshah009/audiofeat/issues",
        "Documentation": "https://github.com/ankitshah009/audiofeat#readme",
        "Homepage": "https://github.com/ankitshah009/audiofeat",
        "Repository": "https://github.com/ankitshah009/audiofeat"
    },
    "split_keywords": [
        "audio",
        " feature-extraction",
        " signal-processing",
        " pytorch",
        " machine-learning",
        " speech",
        " voice",
        " spectrogram",
        " mfcc",
        " spectral-features",
        " temporal-features",
        " pitch-detection",
        " audio-analysis",
        " music-information-retrieval",
        " mir",
        " dsp",
        " audio-processing"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "ffe80bd82dd955d2c15aef0105c5c0386bff0b38011ca7026e97559a4c33c96e",
                "md5": "5c1fffbe4184df9b158375ea86d89c57",
                "sha256": "272f6a195fb7fbc953ef989b00e0e54553272f5eeca39ef57f7bc01129189655"
            },
            "downloads": -1,
            "filename": "audiofeat-1.0.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "5c1fffbe4184df9b158375ea86d89c57",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 72071,
            "upload_time": "2025-08-04T04:54:43",
            "upload_time_iso_8601": "2025-08-04T04:54:43.682538Z",
            "url": "https://files.pythonhosted.org/packages/ff/e8/0bd82dd955d2c15aef0105c5c0386bff0b38011ca7026e97559a4c33c96e/audiofeat-1.0.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "c1bd9a1a0a4f3c974d2014fe492d8b8812577c877fe3d1db5eba65514f0eb560",
                "md5": "f4f8508772bd063e498f3645ca3ac777",
                "sha256": "6d466db00e20fbe77f6b2498b9e0966bc7f6d4ab650a8baa5cba731ab1f5fd4f"
            },
            "downloads": -1,
            "filename": "audiofeat-1.0.0.tar.gz",
            "has_sig": false,
            "md5_digest": "f4f8508772bd063e498f3645ca3ac777",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 52738,
            "upload_time": "2025-08-04T04:54:45",
            "upload_time_iso_8601": "2025-08-04T04:54:45.158781Z",
            "url": "https://files.pythonhosted.org/packages/c1/bd/9a1a0a4f3c974d2014fe492d8b8812577c877fe3d1db5eba65514f0eb560/audiofeat-1.0.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-08-04 04:54:45",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "ankitshah009",
    "github_project": "audiofeat",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "audiofeat"
}
        