# Cascade - High-Performance Asynchronous Parallel VAD Processing Library
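Cascade is a high-performance, low-latency audio stream processing library built for voice activity detection (VAD). It builds on the [Silero VAD](https://github.com/snakers4/silero-vad) model and combines a 1:1:1 binding architecture with asynchronous streaming to reduce VAD processing latency while preserving detection accuracy.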
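## ✨ Key Features
- **🚀 High-performance processing**: 17.75x real-time processing speed with an optimized concurrent architecture
- **🔄 Asynchronous streaming**: non-blocking audio stream processing built on asyncio
- **🎯 Simple API**: an intuitive interface that follows open-source best practices
- **🧵 1:1:1 binding**: each instance maps to one thread, one buffer, and one VAD model
- **📊 Smart state machine**: speech segment detection and collection based on [Silero VAD](https://github.com/snakers4/silero-vad)
- **🔧 Flexible configuration**: supports multiple audio formats and processing parameters
- **📈 Performance monitoring**: built-in statistics and performance analysis
- **🛡️ Error recovery**: thorough error handling and recovery mechanisms
- **🎯 Enterprise-grade VAD**: integrates the Silero team's pre-trained enterprise-grade voice activity detection model
## 🏗️ Architecture
Cascade uses a 1:1:1 binding architecture, in which every processing instance owns its own thread, buffer, and VAD model, to keep performance and resource usage predictable: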
```mermaid
graph TD
    Client[Client] --> StreamProcessor[Stream processor]

    subgraph "Processing instance pool"
        StreamProcessor --> Instance1[Cascade instance 1]
        StreamProcessor --> Instance2[Cascade instance 2]
        StreamProcessor --> InstanceN[Cascade instance N]
    end

    subgraph "1:1:1 binding architecture"
        Instance1 --> Thread1[Dedicated thread 1]
        Thread1 --> Buffer1[Ring buffer 1]
        Thread1 --> VAD1[Silero VAD 1]
    end

    subgraph "VAD state machine"
        VAD1 --> StateMachine[State machine]
        StateMachine --> |None| SingleFrame[Single-frame output]
        StateMachine --> |start| Collecting[Start collecting]
        StateMachine --> |end| SpeechSegment[Speech segment output]
    end
```
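To make the 1:1:1 binding concrete, the sketch below pairs one worker thread, one bounded buffer, and one model object inside a single instance. It is an illustration only, not Cascade's implementation: `BoundInstance`, its methods, and the byte-threshold stand-in model are all hypothetical, and a `deque` stands in for the ring buffer.

```python
import threading
from collections import deque
from concurrent.futures import ThreadPoolExecutor


class BoundInstance:
    """Hypothetical illustration: one instance owns one thread, one buffer, one model."""

    def __init__(self, name: str, buffer_frames: int = 64):
        # One dedicated worker thread: every model call for this instance runs on it.
        self._executor = ThreadPoolExecutor(max_workers=1, thread_name_prefix=name)
        # One bounded buffer: the oldest frame is dropped once the buffer is full.
        self._buffer = deque(maxlen=buffer_frames)
        # One model object: a stand-in callable here, NOT the real Silero VAD.
        self._model = lambda frame: max(frame, default=0) > 127

    def submit(self, frame: bytes):
        """Buffer a frame and run the stand-in model on the dedicated thread."""
        self._buffer.append(frame)
        return self._executor.submit(self._run, frame)

    def _run(self, frame: bytes):
        # Return the worker thread's name alongside the stand-in decision.
        return threading.current_thread().name, self._model(frame)

    def close(self):
        self._executor.shutdown(wait=True)
```

Because the executor has exactly one worker, every call for a given instance runs on the same thread, so the model object is never shared across threads.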
## 🚀 Quick Start
### Installation
```bash
# Install from PyPI (recommended)
pip install cascade-vad
# Or install from source
git clone https://github.com/xucailiang/cascade.git
cd cascade
pip install -e .
```
### Basic Usage
```python
import asyncio

import cascade


async def basic_example():
    """Basic usage example."""
    # Option 1: the simplest way, processing a file
    results = await cascade.process_audio_file("audio.wav")
    speech_segments = [r for r in results if r.is_speech_segment]
    print(f"Detected {len(speech_segments)} speech segments")

    # Option 2: streaming
    # `audio_stream` is not defined here: supply your own async iterator of audio bytes.
    async with cascade.StreamProcessor() as processor:
        async for result in processor.process_stream(audio_stream):
            if result.is_speech_segment:
                segment = result.segment
                print(f"🎤 Speech segment: {segment.start_timestamp_ms:.0f}ms - {segment.end_timestamp_ms:.0f}ms")
            else:
                frame = result.frame
                print(f"🔇 Single frame: {frame.timestamp_ms:.0f}ms")


asyncio.run(basic_example())
```
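The streaming example expects an `audio_stream` that yields audio bytes. Below is a minimal sketch of such a source, assuming 16-bit mono PCM so that one 512-sample frame is 1024 bytes. The sample format is an assumption, not something this README states, and `pcm_chunks` and `collect` are hypothetical helper names.

```python
import asyncio
from typing import AsyncIterator, List

# Assumption: 16-bit mono PCM, so 512 samples x 2 bytes per sample.
FRAME_BYTES = 512 * 2


async def pcm_chunks(pcm: bytes, chunk_bytes: int = FRAME_BYTES) -> AsyncIterator[bytes]:
    """Yield fixed-size chunks from an in-memory PCM buffer."""
    for offset in range(0, len(pcm), chunk_bytes):
        yield pcm[offset:offset + chunk_bytes]
        # Hand control back to the event loop between chunks.
        await asyncio.sleep(0)


async def collect(stream: AsyncIterator[bytes]) -> List[bytes]:
    """Drain a stream into a list, only to inspect the chunking."""
    return [chunk async for chunk in stream]
```

`pcm_chunks(data)` could then be passed where the example uses `audio_stream`. Whether `process_stream` accepts a short trailing chunk is not documented here, so padding or dropping it may be necessary.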
### Advanced Configuration
```python
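import asyncio

from cascade.stream import StreamProcessor, create_default_config


async def advanced_example():
    """Advanced configuration example."""
    # Custom configuration
    config = create_default_config(
        vad_threshold=0.7,       # higher detection threshold
        max_instances=3,         # at most 3 concurrent instances
        buffer_size_frames=128,  # larger buffer
    )

    # Use the custom configuration
    # `audio_stream` is not defined here: supply your own async iterator of audio bytes.
    async with StreamProcessor(config) as processor:
        # Process the audio stream
        async for result in processor.process_stream(audio_stream, "my-stream"):
            # Handle the result...
            pass

        # Fetch performance statistics
        stats = processor.get_stats()
        print(f"Processing stats: {stats.summary()}")
        print(f"Throughput: {stats.throughput_chunks_per_second:.1f} chunks/s")


asyncio.run(advanced_example())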
```
## 📁 Project Structure
```
cascade/
├── __init__.py              # Main API entry point
├── stream/                  # Core streaming modules
│   ├── __init__.py          # Unified API exports
│   ├── processor.py         # StreamProcessor, the main processor
│   ├── instance.py          # CascadeInstance, a processing instance
│   ├── state_machine.py     # VAD state machine
│   ├── collector.py         # Speech frame collector
│   └── types.py             # Streaming type definitions
├── backends/                # VAD backend implementations
│   ├── __init__.py
│   ├── base.py              # Backend base class
│   ├── silero.py            # Silero VAD backend
│   └── onnx.py              # ONNX backend
├── buffer/                  # Buffer management
│   ├── __init__.py
│   ├── base.py              # Buffer base class
│   └── ring_buffer.py       # Ring buffer implementation
├── types/                   # Type system
│   ├── __init__.py          # Core type exports
│   ├── errors.py            # Error types
│   ├── performance.py       # Performance monitoring types
│   └── version.py           # Version information
└── _internal/               # Internal utilities
    ├── __init__.py
    ├── atomic.py            # Atomic operations
    ├── thread_pool.py       # Thread pool management
    └── utils.py             # Utility functions
```
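## 🎯 Core Concepts
### VAD State Machine
Cascade drives a state machine from the output of Silero VAD:
- **None**: a non-speech frame, emitted directly as a single-frame result
- **{'start': timestamp}**: speech begins, so the state machine starts collecting frames
- **{'end': timestamp}**: speech ends, so the complete speech segment is emitted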
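
The three cases above can be sketched as a small collector. This is a hypothetical illustration, not the code in `state_machine.py` or `collector.py`: `SegmentCollector` and `feed` are invented names, and the sketch assumes that frames arriving between `start` and `end` are collected into the segment rather than emitted one by one, which this README does not spell out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class SegmentCollector:
    """Hypothetical collector mirroring the three VAD outputs described above."""

    collecting: bool = False
    frames: List[bytes] = field(default_factory=list)

    def feed(
        self, frame: bytes, vad_result: Optional[Dict[str, float]]
    ) -> Optional[Tuple[str, bytes]]:
        if vad_result and "start" in vad_result:
            # Speech starts: begin a new collection with this frame.
            self.collecting = True
            self.frames = [frame]
            return None
        if vad_result and "end" in vad_result and self.collecting:
            # Speech ends: emit everything collected as one segment.
            self.frames.append(frame)
            segment = b"".join(self.frames)
            self.collecting = False
            self.frames = []
            return ("segment", segment)
        if self.collecting:
            # Assumption: a None result during speech means "keep collecting".
            self.frames.append(frame)
            return None
        # Outside speech: pass the frame through on its own.
        return ("frame", frame)
```

Feeding it `None`, `{'start': ...}`, `None`, `{'end': ...}` in sequence yields one single-frame result followed by one merged segment, which matches the `frame` and `segment` result types listed under the data types.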
### Data Types
```python
from typing import Any, Dict, Literal, Optional


# A single audio frame (512 samples)
class AudioFrame:
    frame_id: int
    audio_data: bytes
    timestamp_ms: float
    vad_result: Optional[Dict[str, Any]]


# A complete speech segment (from start to end)
class SpeechSegment:
    segment_id: int
    audio_data: bytes  # merged audio data
    start_timestamp_ms: float
    end_timestamp_ms: float
    frame_count: int
    duration_ms: float


# Unified output result
class CascadeResult:
    result_type: Literal["frame", "segment"]
    frame: Optional[AudioFrame]
    segment: Optional[SpeechSegment]
```
### Configuration
```python
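from cascade.stream import create_default_config

# Create a configuration
config = create_default_config(
    # VAD settings
    vad_threshold=0.5,        # VAD detection threshold (0.0-1.0)

    # Performance settings
    max_instances=5,          # maximum number of concurrent instances
    buffer_size_frames=64,    # buffer size, in frames

    # Audio settings (fixed values required by Silero VAD)
    sample_rate=16000,        # sample rate (fixed at 16 kHz)
    frame_size=512,           # frame size (fixed at 512 samples)
    frame_duration_ms=32.0,   # frame duration (fixed at 32 ms)
)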
```
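The three fixed audio values are tied together: 512 samples at 16 kHz last exactly 32 ms, and the buffer size in frames translates directly into buffered audio time. A small sketch of that arithmetic follows; the helper names `frame_duration_ms` and `buffer_span_ms` are hypothetical and not part of Cascade.

```python
SAMPLE_RATE = 16000  # Hz, fixed by the configuration above
FRAME_SIZE = 512     # samples per frame, fixed by the configuration above


def frame_duration_ms(frame_size: int = FRAME_SIZE, sample_rate: int = SAMPLE_RATE) -> float:
    """Duration of one frame in milliseconds."""
    return frame_size * 1000 / sample_rate


def buffer_span_ms(buffer_size_frames: int) -> float:
    """Audio time covered by a buffer holding the given number of frames."""
    return buffer_size_frames * frame_duration_ms()


print(frame_duration_ms())  # 32.0
print(buffer_span_ms(64))   # 2048.0
print(buffer_span_ms(128))  # 4096.0
```

So the default `buffer_size_frames=64` holds about 2 seconds of audio, and the `128` used in the advanced example holds about 4 seconds.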
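## 🧪 Testing
The project ships with a test suite that exercises the core functionality:
```bash
# Run the comprehensive tests
python test_comprehensive_core.py
# Run the real-audio streaming tests
python test_stream_real_audio.py
```
Test coverage:
- ✅ Basic API usage
- ✅ Streaming
- ✅ File processing
- ✅ Advanced configuration
- ✅ Concurrent processing
- ✅ Error handling and recovery
- ✅ Performance benchmarks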
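## 📊 Performance
Performance figures from the project's test results:
- **Processing speed**: 17.75x real-time
- **Latency**: 1 ms minimum latency (smart mode)
- **Concurrency**: supports concurrent processing across multiple instances
- **Memory efficiency**: managed buffers keep memory usage low
- **Accuracy**: detection accuracy comes from the underlying Silero VAD model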
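
To put the 17.75x figure in concrete terms, the sketch below converts a real-time factor into wall-clock processing time. It assumes the factor means audio duration divided by processing time, which is the usual reading but is not defined in this README; `processing_time_ms` is a hypothetical helper.

```python
def processing_time_ms(audio_ms: float, realtime_factor: float = 17.75) -> float:
    """Wall-clock time needed to process `audio_ms` of audio at the given speed-up."""
    return audio_ms / realtime_factor


print(round(processing_time_ms(1000), 1))  # 56.3
print(round(processing_time_ms(32), 2))    # 1.8
```

Under that assumption, one second of audio takes roughly 56 ms to process, and a single 32 ms frame takes roughly 1.8 ms.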
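## 🔧 Requirements
### Core Dependencies
- **Python**: 3.12+
- **pydantic**: 2.4.0+ (data validation)
- **numpy**: 1.24.0+ (numerical computing)
- **scipy**: 1.11.0+ (signal processing)
- **silero-vad**: 5.1.2+ (VAD model)
- **onnxruntime**: 1.22.1+ (ONNX inference)
- **torchaudio**: 2.7.1+ (audio processing)
### Development Dependencies
- **pytest**: test framework
- **black**: code formatting
- **ruff**: linting
- **mypy**: type checking
- **pre-commit**: Git hooks
## 📖 API Reference
### StreamProcessor
The core stream processor, which provides a unified interface for audio processing: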
```python
class StreamProcessor:
    async def __aenter__(self) -> "StreamProcessor": ...
    async def __aexit__(self, exc_type, exc_val, exc_tb) -> None: ...

    async def process_chunk(self, audio_data: bytes) -> List[CascadeResult]: ...
    async def process_stream(
        self, audio_stream: AsyncIterator[bytes], stream_id: Optional[str] = None
    ) -> AsyncIterator[CascadeResult]: ...

    def get_stats(self) -> ProcessorStats: ...

    @property
    def is_running(self) -> bool: ...
```
### Convenience Functions
```python
# Process an audio stream
async def process_audio_stream(audio_stream, config=None, stream_id=None): ...

# Process a single audio chunk
async def process_audio_chunk(audio_data: bytes, config=None): ...

# Create a default configuration
def create_default_config(**kwargs) -> Config: ...

# Create a stream processor
def create_stream_processor(config=None) -> StreamProcessor: ...
```
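## 🤝 Contributing
Community contributions are welcome! Please follow these steps:
1. **Fork the project** and create a feature branch
2. **Install the development dependencies**: `pip install -e ".[dev]"`
3. **Run the tests**: `pytest`
4. **Lint the code**: `ruff check . && black --check .`
5. **Type-check**: `mypy cascade`
6. **Open a PR** and describe your changes
### Development Environment Setup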
```bash
# Clone the project
git clone https://github.com/xucailiang/cascade.git
cd cascade
# Create a virtual environment
python -m venv venv
source venv/bin/activate # Linux/Mac
# or: venv\Scripts\activate # Windows
# Install the development dependencies
pip install -e ".[dev]"
# Install the pre-commit hooks
pre-commit install
# Run the tests
python test_comprehensive_core.py
```
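## 📄 License
This project is released under the MIT License. See the [LICENSE](LICENSE) file for details.
## 🙏 Acknowledgements
- **Silero Team**: for the excellent VAD model
- **PyTorch Team**: for the deep learning framework
- **Pydantic Team**: for the type validation system
- **Python community**: for its rich ecosystem
## 📞 Contact
- **Author**: Xucailiang
- **Email**: xucailiang.ai@gmail.com
- **Project home**: https://github.com/xucailiang/cascade
- **Issue tracker**: https://github.com/xucailiang/cascade/issues
- **Documentation**: https://cascade-vad.readthedocs.io/
## 🗺️ Roadmap
### v0.2.0 (planned)
- [ ] Support for more audio formats (MP3, FLAC)
- [ ] Real-time microphone input
- [ ] WebSocket API
- [ ] Performance optimization and reduced memory usage
### v0.3.0 (planned)
- [ ] Multilingual VAD model support
- [ ] Speech separation and enhancement
- [ ] Cloud deployment support
- [ ] Visual monitoring dashboard
---
**⭐ If this project helps you, please give us a star!**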