# DiscoSeqSampler
[CI](https://github.com/lifeiteng/DiscoSeqSampler/actions/workflows/ci.yml)
[codecov](https://codecov.io/gh/lifeiteng/DiscoSeqSampler)
[PyPI version](https://badge.fury.io/py/discoss)
[Python versions](https://pypi.org/project/discoss/)
[License: MIT](https://opensource.org/licenses/MIT)

Distributed Coordinated Sequence Sampler: an efficient distributed sequence sampling framework.
## Background
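In today's AI landscape, Transformer architectures are used almost everywhere, from audio/speech models to image/video models. Their compute cost is closely tied to sequence length (self-attention alone grows quadratically with it), and the sequence lengths in large-scale datasets are typically spread over a very wide range. Efficient multi-GPU training therefore requires precise, fine-grained control over the sequence lengths of the training data: in synchronous data-parallel training, every GPU waits for the slowest one at each step, so poorly balanced batches waste compute.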
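To make the cost argument concrete, the following sketch measures how many tokens are wasted on padding when fixed-size batches are drawn at random versus from length-sorted data. It is an illustration only, not part of DiscoSeqSampler; the log-normal length distribution, the batch size of 32, and the helper names are arbitrary assumptions.

```python
import random


def padding_waste(batches):
    """Fraction of tokens that are padding when every batch is padded to its longest item."""
    real = sum(sum(batch) for batch in batches)
    padded = sum(max(batch) * len(batch) for batch in batches)
    return 1 - real / padded


def fixed_size_batches(lengths, batch_size):
    """Split a list of sequence lengths into consecutive batches of batch_size items."""
    return [lengths[i:i + batch_size] for i in range(0, len(lengths), batch_size)]


# Hypothetical long-tailed length distribution (e.g. frame counts of utterances).
rng = random.Random(0)
lengths = [int(rng.lognormvariate(5.0, 0.8)) + 1 for _ in range(10_000)]

# Random batching: shuffle first, then cut into batches of 32.
shuffled = lengths[:]
rng.shuffle(shuffled)
random_batches = fixed_size_batches(shuffled, 32)

# Length-sorted batching: neighbours have similar lengths, so little padding.
sorted_batches = fixed_size_batches(sorted(lengths), 32)

print(f"random batching: {padding_waste(random_batches):.1%} padding")
print(f"sorted batching: {padding_waste(sorted_batches):.1%} padding")
```

Sorting removes most of the padding, but fixed-size batches of long sequences then cost far more than batches of short ones. That cross-batch imbalance is what a length-aware sampler still has to handle.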
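DiscoSeqSampler is a distributed sequence sampling framework designed to solve exactly this problem. It coordinates and manages sequences of different lengths across workers to keep training efficient and stable.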
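One common way to manage variable lengths is dynamic batching under a token budget: sort by length, then keep adding sequences to a batch until its padded size would exceed a fixed budget. The sketch below illustrates that generic technique. It is not DiscoSeqSampler's API; the function name `token_budget_batches` and its `max_tokens` and `seed` parameters are hypothetical.

```python
import random
from typing import List


def token_budget_batches(lengths: List[int], max_tokens: int, seed: int = 0) -> List[List[int]]:
    """Group sequence indices so each batch's padded size (len(batch) * longest item) <= max_tokens.

    A sequence longer than max_tokens on its own gets a batch to itself.
    """
    # Sort indices by length so that sequences in the same batch have similar lengths.
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    batches: List[List[int]] = []
    current: List[int] = []
    current_max = 0
    for idx in order:
        new_max = max(current_max, lengths[idx])
        # Adding idx would push the padded size over budget: close the current batch.
        if current and new_max * (len(current) + 1) > max_tokens:
            batches.append(current)
            current, new_max = [], lengths[idx]
        current.append(idx)
        current_max = new_max
    if current:
        batches.append(current)
    # Shuffle the batch order so training does not see lengths in increasing order.
    random.Random(seed).shuffle(batches)
    return batches


lengths = [120, 30, 45, 300, 80, 60, 25, 500, 90, 40]
for batch in token_budget_batches(lengths, max_tokens=600):
    padded = len(batch) * max(lengths[i] for i in batch)
    print(batch, padded)
```

The budget caps the padded token count, and therefore the memory and linear-layer compute, of every batch. Attention-heavy models may need a budget that also accounts for the quadratic attention term, since a batch holding one very long sequence can still cost more than a batch of many short ones.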
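## Features
- 🚀 **High performance**: optimized distributed sampling algorithms
- 🔄 **Coordination**: sequence coordination and synchronization across workers
- 📊 **Scalable**: supports large-scale distributed deployments
- 🛠️ **Easy to use**: a concise API
- 🔧 **Configurable**: flexible configuration options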
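Cross-worker coordination can be illustrated with a common pattern, similar in spirit to PyTorch's `DistributedSampler`: every rank derives the same shuffled batch order from a shared seed plus the epoch number, pads the list so it divides evenly across ranks, and keeps every `world_size`-th batch. This is a hedged sketch of that generic pattern, not DiscoSeqSampler's implementation; `shard_batches` and its parameters are hypothetical.

```python
import random
from typing import List, Sequence


def shard_batches(
    batches: Sequence[List[int]], rank: int, world_size: int, epoch: int, seed: int = 0
) -> List[List[int]]:
    """Return the batches that `rank` should process in `epoch`.

    Every rank calls this with identical `batches`, `epoch`, and `seed`, so all ranks
    derive the same global order without communicating.
    """
    if not 0 <= rank < world_size:
        raise ValueError("rank must be in [0, world_size)")
    order = list(range(len(batches)))
    # Same seed + epoch on every rank -> same permutation on every rank.
    random.Random(seed + epoch).shuffle(order)
    # Pad with wrapped-around batches so the count divides evenly: every rank then
    # runs the same number of steps, and no rank is left waiting in a collective op.
    pad = (-len(order)) % world_size
    order += [order[i % len(order)] for i in range(pad)]
    # Interleaved slice: rank r takes positions r, r + world_size, r + 2 * world_size, ...
    return [batches[i] for i in order[rank::world_size]]


batches = [[i] for i in range(10)]
for rank in range(4):
    print(rank, shard_batches(batches, rank, world_size=4, epoch=1))
```

`batches` can be any list of index lists, for example the output of a length-aware batcher. Passing a new `epoch` each epoch reshuffles the order consistently on all ranks.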
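## Installation
* **The project is still under development and its functionality has not been fully validated.**
* Requires Python >= 3.10.
### Install from PyPI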
```bash
pip install discoss
```
### Install from source
```bash
git clone https://github.com/lifeiteng/DiscoSeqSampler.git
cd DiscoSeqSampler
pip install -e .
```
## Quick Start
```python
import discoss
# TODO: add usage examples
```
## Development
See [DEVELOPMENT.md](DEVELOPMENT.md) for a detailed development guide.
### Quick Setup
```bash
# Clone the repository
git clone https://github.com/lifeiteng/DiscoSeqSampler.git
cd DiscoSeqSampler
# Install development dependencies (quoted so shells such as zsh don't expand the brackets)
pip install -e ".[dev]"
# Set up pre-commit hooks
make setup-dev
```
### Run Tests
```bash
make test
```
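## Contributing
Contributions are welcome! See [DEVELOPMENT.md](DEVELOPMENT.md) to learn how to set up a development environment.
## License
This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.
## Citation
If you use DiscoSeqSampler in your research, please cite: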
```bibtex
@software{discoss2025,
title={DiscoSeqSampler: Distributed Coordinated Sequence Sampler},
author={Feiteng Li},
year={2025},
url={https://github.com/lifeiteng/DiscoSeqSampler}
}
```