# Otozu (音図) - Audio Spectrogram Conversion Library
Otozu (音図), combining the Japanese words for "sound" (音) and "diagram" (図), is a Python library that simplifies the conversion between audio files and spectrograms. Whether you're working on audio analysis, machine learning with sound data, or just curious about sound visualization, Otozu provides an intuitive interface for bidirectional conversion between audio files and their spectrogram representations.
## Why?
Ever needed to convert audio files into spectrograms? Maybe you're training a neural network for audio classification and want to understand what your model is actually learning. Otozu makes it easy to convert your audio dataset into spectrograms for training, and it also works in the other direction: you can turn spectrogram-shaped data, including learned representations, back into audio. Want to know what features your CNN's first layer is detecting? Convert its activation patterns back into sound. Curious about what "maximum meowing" sounds like to your cat detector? Otozu can help you translate those neural patterns back into audio that humans can understand.
## Features
- Convert audio files to mel spectrograms (saved as PNG or NPY files)
- Reconstruct audio from spectrograms using the Griffin-Lim algorithm
- Support for multiple audio formats (WAV, MP3, FLAC, OGG)
- Batch processing of entire directories
- Command-line interface for easy usage
- Python API for integration into your projects
- Configurable spectrogram parameters
- Preserves the metadata (original dB range and sample rate) needed to rescale spectrograms during audio reconstruction
## Installation
```bash
pip install otozu
```
## Quick Start
### Command Line Usage
Convert an audio file to a spectrogram:
```bash
otozu audio-to-spec input.wav output_directory/
```
Convert all audio files in a directory:
```bash
otozu audio-to-spec input_directory/ output_directory/ --label dataset1
```
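Batch mode has to decide which files in the directory count as audio. A minimal sketch of that discovery step using only `pathlib`, assuming files are matched case-insensitively by the extensions listed under Features. The `find_audio_files` helper and `AUDIO_EXTENSIONS` set are hypothetical, not Otozu's code, and since this README doesn't say whether the CLI descends into subdirectories, recursion is an opt-in flag here:

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical: extensions taken from the formats listed under Features.
AUDIO_EXTENSIONS = {".wav", ".mp3", ".flac", ".ogg"}


def find_audio_files(directory: str | Path, recursive: bool = False) -> list[Path]:
    """Return audio files in `directory`, matched case-insensitively by extension."""
    root = Path(directory)
    pattern = "**/*" if recursive else "*"  # "**/*" also walks subdirectories
    return sorted(
        path
        for path in root.glob(pattern)
        if path.is_file() and path.suffix.lower() in AUDIO_EXTENSIONS
    )
```

Lower-casing the suffix means `CLIP.WAV` and `clip.wav` are treated the same, which is usually what you want on case-sensitive filesystems.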
Reconstruct audio from a spectrogram:
```bash
otozu spec-to-audio spectrogram.png output_directory/
```
### Python API Usage
```python
from otozu import AudioSpectrogramConverter
from otozu.config import SpectrogramConfig
# Create a converter with custom settings
config = SpectrogramConfig(
    n_fft=2048,
    hop_length=512,
    n_mels=128
)
converter = AudioSpectrogramConverter(config)
# Convert audio to spectrogram
spec_path, metadata = converter.audio_to_spectrogram(
    "input.wav",
    "output_directory"
)
# Reconstruct audio from spectrogram
audio_path = converter.spectrogram_to_audio(
    "spectrogram.png",
    "reconstructed.wav"
)
```
## Advanced Configuration
The `SpectrogramConfig` class allows you to customize various aspects of the spectrogram generation:
```python
from otozu.config import SpectrogramConfig
config = SpectrogramConfig(
    n_fft=2048,                   # FFT window size
    hop_length=512,               # Number of samples between successive frames
    n_mels=128,                   # Number of mel bands
    sample_rate=44100,            # Target sample rate (None for original)
    center=False,                 # Whether to pad signal at edges
    normalized_range=(0, 65535),  # Output range for normalization
)
```
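To see how `n_fft`, `hop_length`, and `center` determine the width of the spectrogram, here is the frame-count arithmetic under librosa's STFT framing convention. This assumes Otozu passes these settings straight through to librosa, which the Technical Details section implies but doesn't spell out; the `n_frames` helper is hypothetical:

```python
def n_frames(n_samples: int, n_fft: int, hop_length: int, center: bool) -> int:
    """Count STFT frames using librosa's framing convention."""
    if center:
        # center=True pads n_fft // 2 samples on each side so that
        # frame t is centered on sample t * hop_length.
        n_samples += 2 * (n_fft // 2)
    if n_samples < n_fft:
        raise ValueError("signal is shorter than one FFT window")
    # One frame at offset 0, plus one more for every full hop that still fits.
    return 1 + (n_samples - n_fft) // hop_length


sample_rate = 44100
one_second = sample_rate  # number of samples in one second of audio
print(n_frames(one_second, n_fft=2048, hop_length=512, center=False))  # 83
print(n_frames(one_second, n_fft=2048, hop_length=512, center=True))   # 87
print(f"{512 / sample_rate * 1000:.1f} ms per hop")                    # 11.6 ms per hop
```

Under that assumption, one second of audio with the settings above becomes a mel spectrogram of 128 mel bands by 83 frames; setting `center=True` adds a few frames because of the edge padding.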
## Technical Details
### Spectrogram Generation Process
1. Audio Loading: Files are loaded with librosa at a configurable sample rate
2. Mel Spectrogram Creation: Converts the audio to a mel-scale spectrogram
3. Power to dB: Converts the power spectrogram to decibel units
4. Normalization: Scales values to the 16-bit range for PNG storage
5. Metadata Preservation: Stores the original dB range and sample rate for reconstruction
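Steps 3 to 5 can be sketched in NumPy as follows. This is not Otozu's code: the dB conversion uses the same formula as `librosa.power_to_db(S, ref=np.max)` with its defaults `amin=1e-10` and `top_db=80`, while the min-max scaling and the metadata keys (`db_min`, `db_max`, `out_range`) are assumptions about how the stored range might be laid out:

```python
import numpy as np


def power_to_db(power: np.ndarray, amin: float = 1e-10, top_db: float = 80.0) -> np.ndarray:
    """Power -> dB relative to the loudest bin, floored at -top_db."""
    db = 10.0 * np.log10(np.maximum(power, amin))
    db -= db.max()  # loudest bin becomes 0 dB
    return np.maximum(db, -top_db)  # clamp very quiet bins


def normalize_to_uint16(db: np.ndarray, out_range=(0, 65535)):
    """Min-max scale dB values into out_range and keep what is needed to undo it."""
    lo, hi = out_range
    db_min, db_max = float(db.min()), float(db.max())
    span = (db_max - db_min) or 1.0  # avoid dividing by zero on constant input
    scaled = (db - db_min) / span * (hi - lo) + lo
    image = np.clip(np.round(scaled), lo, hi).astype(np.uint16)
    # Hypothetical metadata layout: the values needed to map pixels back to dB.
    metadata = {"db_min": db_min, "db_max": db_max, "out_range": (lo, hi)}
    return image, metadata
```

Keeping `db_min` and `db_max` next to the image is what makes exact rescaling possible later: the 16-bit pixels on their own only record relative levels, with a quantization step of `(db_max - db_min) / 65535` dB.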
### Audio Reconstruction Process
1. Spectrogram Loading: Loads normalized spectrogram data
2. Denormalization: Restores original decibel scale
3. Mel to Linear: Approximates a linear-frequency spectrogram from the mel-scale one (the mel filterbank cannot be inverted exactly)
4. Griffin-Lim Algorithm: Estimates phase information
5. Waveform Generation: Produces final audio output
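Steps 1 and 2 amount to undoing the normalization, after which the dB values still need to go back to power before the mel-to-linear step. A NumPy sketch, assuming a min-max scheme with hypothetical metadata keys `db_min`, `db_max`, and `out_range` (Otozu's actual field names may differ). `db_to_power` follows the same formula as `librosa.db_to_power`; for steps 3 and 4, librosa offers `librosa.feature.inverse.mel_to_stft` and `librosa.griffinlim`, though whether Otozu calls them directly is an assumption:

```python
import numpy as np


def denormalize(image: np.ndarray, metadata: dict) -> np.ndarray:
    """Map stored integers back to dB using the saved range (hypothetical keys)."""
    lo, hi = metadata["out_range"]
    span = metadata["db_max"] - metadata["db_min"]
    return (image.astype(np.float64) - lo) / (hi - lo) * span + metadata["db_min"]


def db_to_power(db: np.ndarray, ref: float = 1.0) -> np.ndarray:
    """Inverse of 10 * log10(power / ref)."""
    return ref * np.power(10.0, db / 10.0)


# Example: a three-pixel "spectrogram" saved with an 80 dB range.
image = np.array([[0, 32768, 65535]], dtype=np.uint16)
metadata = {"db_min": -80.0, "db_max": 0.0, "out_range": (0, 65535)}
power = db_to_power(denormalize(image, metadata))
```

If the dB values were originally referenced to the loudest bin, the recovered power is only relative to that peak, which is fine for resynthesis since the output can be rescaled afterwards.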
## A Quick Note About Audio Reconstruction
When you convert a spectrogram back to audio, it won't sound exactly like the original. Creating a spectrogram throws away part of the signal: the phase information is discarded entirely, and mel binning and 16-bit quantization remove further detail. We do our best to guess the missing phase with the Griffin-Lim algorithm, but the guess isn't perfect. The result is usually good enough for most purposes, but expect it to be noticeably lossy.
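To make the phase problem concrete, here is a bare-bones Griffin-Lim loop in NumPy that works on a linear-frequency magnitude spectrogram. It's a teaching sketch, not Otozu's implementation (which presumably relies on `librosa.griffinlim`, an assumption): the Hann window, the lack of edge padding, and the default of 32 iterations are illustrative choices. Each iteration resynthesizes a waveform from the known magnitudes plus the current phase guess, re-analyzes it, and keeps only the new phase:

```python
import numpy as np


def stft(x: np.ndarray, n_fft: int, hop: int) -> np.ndarray:
    """Hann-windowed STFT without edge padding; shape (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1).T


def istft(spec: np.ndarray, n_fft: int, hop: int) -> np.ndarray:
    """Least-squares inverse STFT: windowed overlap-add divided by the summed squared window."""
    window = np.hanning(n_fft)
    frames = np.fft.irfft(spec.T, n=n_fft, axis=1)
    length = n_fft + hop * (spec.shape[1] - 1)
    out = np.zeros(length)
    norm = np.zeros(length)
    for i, frame in enumerate(frames):
        start = i * hop
        out[start : start + n_fft] += frame * window
        norm[start : start + n_fft] += window**2
    covered = norm > 1e-8  # leave near-zero-weight edge samples undivided
    out[covered] /= norm[covered]
    return out


def griffin_lim(magnitude: np.ndarray, n_fft: int, hop: int, n_iter: int = 32, seed: int = 0) -> np.ndarray:
    """Estimate a waveform whose STFT magnitude approximates `magnitude`."""
    rng = np.random.default_rng(seed)
    phase = np.exp(2j * np.pi * rng.random(magnitude.shape))  # random starting phase
    for _ in range(n_iter):
        signal = istft(magnitude * phase, n_fft, hop)  # impose the known magnitudes
        rebuilt = stft(signal, n_fft, hop)  # see what phase that signal implies
        phase = np.exp(1j * np.angle(rebuilt))  # keep the phase, drop its magnitude
    return istft(magnitude * phase, n_fft, hop)


# Example: throw away the phase of one second of a 440 Hz tone, then estimate it.
sr = 8000
tone = np.sin(2 * np.pi * 440.0 * np.arange(sr) / sr)
target = np.abs(stft(tone, n_fft=512, hop=128))
estimate = griffin_lim(target, n_fft=512, hop=128)
```

Each iteration can only keep or reduce the mismatch between the rebuilt magnitudes and the target, but the loop has no way to recover the original phase exactly, so some smearing of transients usually remains.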
## Contributing
Found a bug? Have an idea for a feature? PRs are always welcome! If you're thinking of adding something big, maybe open an issue first so we can chat about it.
## License
This project is licensed under the MIT License - see the LICENSE file for details.
## Acknowledgments
Otozu is basically just a friendly wrapper around several excellent libraries:
- librosa for audio processing
- numpy for numerical operations
- Pillow for image handling
- soundfile for audio file I/O
- typer for the command-line interface
## Need Help?
If something's not working right or you're not sure how to do something, open an issue on GitHub. I'll do my best to help out when I can.
## Citation
If you use Otozu in your research, please cite:
```bibtex
@software{otozu2024,
  title = {Otozu: Audio Spectrogram Conversion Library},
  author = {Yamashiro, John},
  year = {2024},
  url = {https://github.com/jyaaan/otozu}
}
```
---
Built with 🎵 by an engineer who got tired of writing the same spectrogram conversion code over and over again. Most of the code was taken from my sneeze detector, Kushami.