discoss

- Name: discoss
- Version: 0.1.1
- Summary: Distributed Coordinated Sequence Sampler
- Home page: https://github.com/lifeiteng/DiscoSeqSampler
- Author: Feiteng Li
- Maintainer: Feiteng Li
- Upload time: 2025-08-03 15:36:42
- Requires Python: >=3.10
- License: MIT
- Keywords: distributed, sampling, sequence, coordination
- Requirements: none recorded

# DiscoSeqSampler

[![CI](https://github.com/lifeiteng/DiscoSeqSampler/actions/workflows/ci.yml/badge.svg)](https://github.com/lifeiteng/DiscoSeqSampler/actions/workflows/ci.yml)
[![codecov](https://codecov.io/gh/lifeiteng/DiscoSeqSampler/branch/main/graph/badge.svg)](https://codecov.io/gh/lifeiteng/DiscoSeqSampler)
[![PyPI version](https://badge.fury.io/py/discoss.svg)](https://badge.fury.io/py/discoss)
[![Python version](https://img.shields.io/pypi/pyversions/discoss.svg)](https://pypi.org/project/discoss/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

Distributed Coordinated Sequence Sampler: an efficient framework for distributed sequence sampling.

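## Background

In today's AI landscape, models for audio and speech as well as for images and video widely use the Transformer architecture. The compute cost of these models depends strongly on sequence length, and in large-scale datasets the lengths of individual samples typically span a very wide range. Efficient multi-GPU training therefore requires precise, fine-grained management of the sequence lengths in the training data.

DiscoSeqSampler is a distributed sequence sampling framework designed to address this problem. It coordinates and manages sequences of different lengths so that training stays efficient and stable.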
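
To make the motivation concrete, the following is a minimal sketch in plain Python of why sequence length matters for batching. It does not use the discoss API; the helper names (`fixed_size_batches`, `token_budget_batches`, `padding_ratio`), the synthetic length distribution, and the `max_tokens` budget are all assumptions chosen for illustration. It compares batches of a fixed number of sequences against batches that are filled from length-sorted data up to a padded-token budget.

```python
import random


def padded_cost(batch):
    """Tokens processed when every sequence is padded to the batch maximum."""
    return max(batch) * len(batch)


def fixed_size_batches(lengths, batch_size):
    """Group sequences in arrival order, ignoring their lengths."""
    return [lengths[i:i + batch_size] for i in range(0, len(lengths), batch_size)]


def token_budget_batches(lengths, max_tokens):
    """Sort by length, then fill each batch until its padded cost would exceed max_tokens."""
    batches, current = [], []
    for n in sorted(lengths):
        # Lengths arrive in ascending order, so n is the maximum of the candidate batch.
        if current and n * (len(current) + 1) > max_tokens:
            batches.append(current)
            current = []
        current.append(n)
    if current:
        batches.append(current)
    return batches


def padding_ratio(batches):
    """Fraction of processed tokens that are padding rather than real data."""
    real = sum(sum(b) for b in batches)
    total = sum(padded_cost(b) for b in batches)
    return 1 - real / total


# Synthetic, widely spread sequence lengths (hypothetical data).
rng = random.Random(0)
lengths = [rng.randint(10, 2000) for _ in range(1024)]

naive = fixed_size_batches(lengths, batch_size=16)
bucketed = token_budget_batches(lengths, max_tokens=16000)

print(f"fixed-size padding ratio:   {padding_ratio(naive):.2%}")
print(f"token-budget padding ratio: {padding_ratio(bucketed):.2%}")
```

With a wide length distribution, the fixed-size batches spend a large share of their compute on padding, while the length-sorted, budget-limited batches keep the padded cost of every batch under the same bound. A real sampler would also need shuffling and multi-GPU coordination, which this sketch leaves out.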

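## Features

- 🚀 **High performance**: optimized distributed sampling algorithms
- 🔄 **Coordination**: intelligent sequence coordination and synchronization
- 📊 **Scalable**: supports large-scale distributed deployment
- 🛠️ **Ease of use**: a concise API design
- 🔧 **Configurable**: flexible configuration options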
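
As an illustration of what coordination across workers can mean in practice, here is a hedged sketch of one common approach, not necessarily the one DiscoSeqSampler implements. Every rank derives the same shuffled batch order from a shared seed and epoch, then takes a strided slice, so the ranks see disjoint batches and the same number of steps. The function `batches_for_rank` and its parameters are hypothetical names used only for this example.

```python
import random


def batches_for_rank(batches, rank, world_size, seed, epoch):
    """Return the batches assigned to `rank` for one epoch."""
    order = list(range(len(batches)))
    # The same seed and epoch on every rank give the same permutation everywhere,
    # so no communication is needed to agree on the order.
    random.Random(seed + epoch).shuffle(order)
    # Drop the tail so that every rank receives the same number of batches.
    usable = len(order) - len(order) % world_size
    return [batches[i] for i in order[rank:usable:world_size]]


world_size = 4
batches = [[i] for i in range(10)]  # stand-ins for real batches of sample ids
per_rank = [
    batches_for_rank(batches, r, world_size, seed=0, epoch=0)
    for r in range(world_size)
]
for r, assigned in enumerate(per_rank):
    print(f"rank {r}: {assigned}")
```

Giving every rank an equal number of batches avoids a rank waiting in a collective operation for peers that have already run out of data. Dropping the tail is the simplest way to get there; padding the tail with repeated batches is an alternative with a different trade-off.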

## Installation
* **The project is still under development, and its functionality has not yet been fully verified.**

### Install from PyPI

```bash
pip install discoss
```

### Install from source

```bash
git clone https://github.com/lifeiteng/DiscoSeqSampler.git
cd DiscoSeqSampler
pip install -e .
```

## Quick Start

```python
import discoss

# TODO: add usage examples
```

## Development

See [DEVELOPMENT.md](DEVELOPMENT.md) for the detailed development guide.

### Quick setup

```bash
# Clone the repository
git clone https://github.com/lifeiteng/DiscoSeqSampler.git
cd DiscoSeqSampler

# Install development dependencies (quoted so shells such as zsh do not expand the brackets)
pip install -e ".[dev]"

# Set up the pre-commit hooks
make setup-dev
```

### Running tests

```bash
make test
```

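## Contributing

Contributions are welcome! See [DEVELOPMENT.md](DEVELOPMENT.md) to learn how to set up a development environment.

## License

This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.

## Citation

If you use DiscoSeqSampler in your research, please cite: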

```bibtex
@software{discoss2024,
  title={DiscoSeqSampler: Distributed Coordinated Sequence Sampler},
  author={Feiteng Li},
  year={2025},
  url={https://github.com/lifeiteng/DiscoSeqSampler}
}
```

            
