**Package metadata**

- Name: swift-f0
- Version: 0.1.2
- Summary: Fast and accurate fundamental frequency (F0) detector using convolutional neural networks
- Author: Lars Nieradzik <l.nieradzik@gmail.com>
- Homepage: https://github.com/lars76/swift-f0
- Uploaded: 2025-07-24 22:50:50
- Requires Python: >=3.8
- Keywords: pitch detection, fundamental frequency, audio analysis, speech processing, F0
- Requirements: onnxruntime>=1.12.0, numpy>=1.21.0

# SwiftF0

[![PyPI version](https://img.shields.io/pypi/v/swift-f0.svg)](https://pypi.org/project/swift-f0/)
[![License](https://img.shields.io/github/license/lars76/swift_f0.svg)](https://github.com/lars76/swift_f0/blob/main/LICENSE)
[![Demo](https://img.shields.io/badge/demo-online-blue.svg)](https://swift-f0.github.io/)
[![Pitch Benchmark](https://img.shields.io/badge/benchmark-pitch--benchmark-green.svg)](https://github.com/lars76/pitch-benchmark/)

**SwiftF0** is a fast and accurate F0 detector that works by first converting audio into a spectrogram using an STFT, then applying a 2D convolutional neural network to estimate pitch. It’s optimized for:

* ⚡ Real-time analysis (132 ms for 5 seconds of audio on CPU)
* 🎵 Music Information Retrieval
* 🗣️ Speech Analysis

In the [Pitch Detection Benchmark](https://github.com/lars76/pitch-benchmark/), SwiftF0 outperforms algorithms like CREPE in both speed and accuracy. It supports frequencies between **46.875 Hz and 2093.75 Hz** (G1 to C7).

## 🧪 Live Demo

The demo runs entirely client-side using WebAssembly and ONNX.js, so your audio stays private.

👉 [**swift-f0.github.io**](https://swift-f0.github.io/)

## 🚀 Installation

```bash
pip install swift-f0
```

**Optional dependencies**:

```bash
pip install librosa     # audio loading & resampling
pip install matplotlib  # plotting utilities
pip install mido        # MIDI export functionality
```

## ⚡ Quick Start

```python
from swift_f0 import (
    SwiftF0,
    plot_pitch,
    export_to_csv,
    segment_notes,
    plot_notes,
    plot_pitch_and_notes,
    export_to_midi,
)

# Initialize the detector
# For speech analysis, consider setting fmin=65 and fmax=400
detector = SwiftF0(fmin=46.875, fmax=2093.75, confidence_threshold=0.9)

# Run pitch detection from an audio file
result = detector.detect_from_file("audio.wav")

# For raw audio arrays (e.g., loaded via librosa or scipy)
# result = detector.detect_from_array(audio_data, sample_rate)

# Visualize and export results
plot_pitch(result, show=False, output_path="pitch.jpg")
export_to_csv(result, "pitch_data.csv")

# Segment pitch contour into musical notes
notes = segment_notes(
    result,
    split_semitone_threshold=0.8,
    min_note_duration=0.05
)
plot_notes(notes, output_path="note_segments.jpg")
plot_pitch_and_notes(result, notes, output_path="combined_analysis.jpg")
export_to_midi(notes, "notes.mid")
```

## 📖 API Reference

### Core

#### `SwiftF0(...)`
```python
SwiftF0(
    confidence_threshold: Optional[float] = 0.9,
    fmin: Optional[float] = 46.875,
    fmax: Optional[float] = 2093.75,
)
```
Initialize the pitch detector. Audio is processed at 16 kHz with a 256-sample hop size. The model always detects pitch across its full range (46.875–2093.75 Hz); these parameters only control which detections are marked as "voiced" in the results.
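Under the stated 16 kHz / 256-sample-hop settings, consecutive frames are 16 ms apart. A quick arithmetic sketch (the exact frame count depends on the library's STFT padding, which is not documented here):

```python
# Frame spacing implied by the 16 kHz sample rate and 256-sample hop.
SAMPLE_RATE = 16_000
HOP_SIZE = 256

hop_seconds = HOP_SIZE / SAMPLE_RATE  # 0.016 s between consecutive frames

def approx_num_frames(duration_s: float) -> int:
    """Rough frame count for a clip; assumes one frame per hop plus one."""
    return int(duration_s * SAMPLE_RATE) // HOP_SIZE + 1

print(hop_seconds)             # 0.016
print(approx_num_frames(5.0))  # 313 frames for 5 s of audio (approximate)
```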

#### `SwiftF0.detect_from_array(...)`
```python
detect_from_array(
    audio_array: np.ndarray,
    sample_rate: int
) -> PitchResult
```
Detect pitch from a NumPy array. Resamples to 16 kHz automatically when needed (requires librosa) and converts multi-channel audio to mono by averaging.
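The mono conversion described above is a plain channel average; if you prefer to pre-process audio yourself before calling `detect_from_array`, an equivalent NumPy sketch (the axis heuristic here is an illustrative assumption, not the library's code):

```python
import numpy as np

def to_mono(audio: np.ndarray) -> np.ndarray:
    """Average a multi-channel array to mono.

    Assumes the shorter axis is the channel axis (an illustrative
    heuristic; pass mono or (channels, samples) data to be safe).
    """
    if audio.ndim == 1:
        return audio
    channel_axis = 0 if audio.shape[0] < audio.shape[1] else 1
    return audio.mean(axis=channel_axis)

stereo = np.vstack([np.ones(8), np.zeros(8)])  # 2 channels, 8 samples
mono = to_mono(stereo)
print(mono.shape)  # (8,) -- every sample is the channel average, 0.5
```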

#### `SwiftF0.detect_from_file(...)`
```python
detect_from_file(
    audio_path: str
) -> PitchResult
```
Detect pitch from audio file. Requires librosa for file loading. Supports any audio format that librosa can read (WAV, MP3, FLAC, etc.).

#### `class PitchResult`
```python
@dataclass
class PitchResult:
    pitch_hz: np.ndarray      # F0 estimates (Hz) for each frame
    confidence: np.ndarray    # Model confidence [0.0–1.0] for each frame
    timestamps: np.ndarray    # Frame centers in seconds for each frame
    voicing: np.ndarray       # Boolean voicing decisions for each frame
```
Container for pitch detection results. All arrays have the same length. Timestamps are calculated accounting for STFT windowing for accurate frame positioning.
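A common way to consume a `PitchResult` is to mask the arrays by the voicing decisions. Sketched here with stand-in arrays of the same shapes, since real ones come from the detector:

```python
import numpy as np

# Stand-in arrays with the shapes/dtypes a PitchResult would hold.
pitch_hz   = np.array([0.0, 220.0, 221.5, 0.0])
confidence = np.array([0.10, 0.95, 0.93, 0.20])
voicing    = np.array([False, True, True, False])

# Keep only frames the detector marked as voiced.
voiced_pitch = pitch_hz[voicing]
print(voiced_pitch)         # [220.  221.5]
print(voiced_pitch.mean())  # 220.75
```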

#### `export_to_csv(...)`
```python
export_to_csv(
    result: PitchResult,
    output_path: str
) -> None
```
Export pitch detection results to a CSV file with columns `timestamp`, `pitch_hz`, `confidence`, and `voiced`. Timestamps are formatted to 4 decimal places, pitch to 2, and confidence to 4.
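For reference, the stated column layout and precision can be reproduced with Python's `csv` module; this is an illustrative sketch, not the library's implementation (details such as line endings may differ):

```python
import csv
import io

def write_pitch_csv(timestamps, pitch_hz, confidence, voiced, fileobj):
    # Column layout and precision as described above:
    # timestamps to 4 decimal places, pitch to 2, confidence to 4.
    writer = csv.writer(fileobj)
    writer.writerow(["timestamp", "pitch_hz", "confidence", "voiced"])
    for t, p, c, v in zip(timestamps, pitch_hz, confidence, voiced):
        writer.writerow([f"{t:.4f}", f"{p:.2f}", f"{c:.4f}", v])

buf = io.StringIO()
write_pitch_csv([0.016], [440.0], [0.95], [True], buf)
print(buf.getvalue())
# timestamp,pitch_hz,confidence,voiced
# 0.0160,440.00,0.9500,True
```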

### Musical Note Analysis

#### `segment_notes(...)`
```python
segment_notes(
    result: PitchResult,
    split_semitone_threshold: float = 0.8,
    min_note_duration: float = 0.05,
    unvoiced_grace_period: float = 0.02,
) -> List[NoteSegment]
```
Segments a pitch contour into discrete musical notes. Groups consecutive frames into note segments, splitting when pitch deviates significantly or during extended unvoiced periods. The `split_semitone_threshold` controls pitch sensitivity (higher values create longer notes), while `min_note_duration` filters out brief segments. The `unvoiced_grace_period` allows brief gaps without splitting notes. Returns a list of NoteSegment objects with timing, pitch, and MIDI information, automatically merging adjacent segments with identical MIDI pitch.
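The threshold is measured in semitones, where the interval between two frequencies is 12·log2(f2/f1). A standard-formula sketch showing what the 0.8 default tolerates (not the library's code):

```python
import math

def semitone_distance(f1: float, f2: float) -> float:
    """Interval between two frequencies, in semitones."""
    return abs(12 * math.log2(f2 / f1))

# A full semitone above A4 (440 Hz) is ~466.16 Hz -> distance ~1.0,
# past the 0.8 default, so such a jump would split the note.
print(round(semitone_distance(440.0, 466.16), 2))  # 1.0
# 450 Hz is only ~0.39 semitones from 440 Hz -> stays in the same note.
print(round(semitone_distance(440.0, 450.0), 2))   # 0.39
```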

#### `class NoteSegment`
```python
@dataclass
class NoteSegment:
    start: float         # Start time in seconds
    end: float           # End time in seconds  
    pitch_median: float  # Median pitch frequency in Hz
    pitch_midi: int      # Quantized MIDI note number (0-127)
```
Represents a musical note segment with timing and pitch information.
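`pitch_midi` presumably follows the standard MIDI quantization (A4 = 440 Hz = note 69); a sketch of that conversion:

```python
import math

def hz_to_midi(freq_hz: float) -> int:
    """Quantize a frequency to the nearest MIDI note (A4 = 440 Hz = 69)."""
    return round(69 + 12 * math.log2(freq_hz / 440.0))

print(hz_to_midi(440.0))   # 69 (A4)
print(hz_to_midi(261.63))  # 60 (C4, middle C)
```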

#### `export_to_midi(...)`
```python
export_to_midi(
    notes: List[NoteSegment],
    output_path: str,
    tempo: int = 120,
    velocity: int = 80,
    track_name: str = "SwiftF0 Notes",
) -> None
```
Export note segments to a MIDI file. The `tempo` parameter sets playback speed in beats per minute (120 = moderate), `velocity` sets how loud each note sounds (0 = silent, 127 = maximum, 80 = comfortably loud), and `track_name` labels the MIDI track. Requires the `mido` package.
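MIDI files time events in ticks rather than seconds, so note durations must be scaled by the tempo and the file's ticks-per-beat resolution. A sketch of that conversion (480 ticks per beat is a common default, assumed here; `export_to_midi`'s actual resolution is not documented):

```python
def seconds_to_ticks(seconds: float, tempo_bpm: int = 120,
                     ticks_per_beat: int = 480) -> int:
    """Convert wall-clock seconds to MIDI ticks.

    At 120 BPM a beat lasts 0.5 s, so 1 s = 2 beats = 960 ticks
    with a 480 ticks-per-beat resolution.
    """
    beats = seconds * tempo_bpm / 60.0
    return round(beats * ticks_per_beat)

print(seconds_to_ticks(1.0))   # 960
print(seconds_to_ticks(0.05))  # 48 (the default min_note_duration)
```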

### Visualization

#### `plot_pitch(...)`
```python
plot_pitch(
    result: PitchResult,
    output_path: Optional[str] = None,
    show: bool = True,
    dpi: int = 300,
    figsize: Tuple[float, float] = (12, 4),
    style: str = "seaborn-v0_8",
) -> None
```
Plot pitch detection results with voicing information. Voiced regions are shown in blue, unvoiced in light gray. Automatically scales y-axis based on detected pitch range. Requires matplotlib.

#### `plot_notes(...)`
```python
plot_notes(
    notes: List[NoteSegment],
    output_path: Optional[str] = None,
    show: bool = True,
    dpi: int = 300,
    figsize: Tuple[float, float] = (12, 6),
    style: str = "seaborn-v0_8",
) -> None
```
Plot note segments as a piano roll visualization. Each note is displayed as a colored rectangle with MIDI note number labels. Colors are mapped to pitch height for visual clarity.

#### `plot_pitch_and_notes(...)`
```python
plot_pitch_and_notes(
    result: PitchResult,
    segments: List[NoteSegment],
    output_path: Optional[str] = None,
    show: bool = True,
    dpi: int = 300,
    figsize: Tuple[float, float] = (12, 4),
    style: str = "seaborn-v0_8",
) -> None
```
Plot pitch contour with overlaid note segments. Displays continuous pitch contour with shaded regions showing segmented notes. Each segment is labeled with its MIDI note number. Ideal for analyzing segmentation quality.

## 🔄 Changelog

See [CHANGELOG.md](CHANGELOG.md) for detailed version history and updates.

## 📄 Citation

If you use SwiftF0 in your research, please cite:

```bibtex
@software{swiftf0,
    title={SwiftF0: Fast and Accurate Fundamental Frequency Detection},
    author={Lars Nieradzik},
    url={https://github.com/lars76/swift-f0},
    year={2025}
}
```

            
