aihpi


- **Name**: aihpi
- **Version**: 0.1.5
- **Home page**: https://github.com/username/aihpi
- **Summary**: AI High Performance Infrastructure - Distributed job submission for SLURM clusters
- **Upload time**: 2025-09-10 19:38:08
- **Maintainer**: None
- **Docs URL**: None
- **Author**: Felix Boelter
- **Requires Python**: >=3.8
- **License**: None
- **Keywords**: slurm, distributed, training, ai, ml, pytorch, llamafactory
- **Requirements**: No requirements were recorded.

<div align="center">
<img src="https://raw.githubusercontent.com/aihpi/aihpi-cluster/main/00_aisc/img/logo_aisc_bmftr.jpg" alt="AI Service Centre Logo" width="400">
<h1>aihpi - AI High Performance Infrastructure</h1>
</div>

aihpi is a Python package that simplifies distributed job submission on SLURM clusters, with container support. It is built on top of submitit and adds features designed specifically for AI/ML workloads.

## Installation

```bash
# Basic installation
pip install aihpi

# With experiment tracking support
# (the quotes keep shells such as zsh from globbing the brackets)
pip install "aihpi[tracking]"

# With all optional dependencies
pip install "aihpi[all]"
```

## Quick Start

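```python
from aihpi import SlurmJobExecutor, JobConfig


def my_training_function():
    """Placeholder entry point; replace with your own training code."""
    print("Training started")


# Describe the resources the job needs.
config = JobConfig(
    job_name="my-training",
    num_nodes=1,
    gpus_per_node=2,
    walltime="01:00:00",
    partition="gpu",
    login_node="10.130.0.6",  # Your SLURM login node IP
)

# Submit the function to the cluster.
executor = SlurmJobExecutor(config)
job = executor.submit_function(my_training_function)
```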
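
The `walltime` above is written as `HH:MM:SS`, whereas submitit, which aihpi builds on, takes its time limit in minutes (`timeout_min`). How aihpi converts between the two is not documented here. The helper below is a hypothetical sketch of such a conversion (the name `walltime_to_minutes` is an assumption, not part of the aihpi API), which can also be handy for sanity-checking a walltime before submitting.

```python
def walltime_to_minutes(walltime: str) -> int:
    """Convert an "HH:MM:SS" string to whole minutes, rounding up.

    Hypothetical helper for illustration; not part of aihpi.
    """
    parts = walltime.split(":")
    if len(parts) != 3:
        raise ValueError(f"expected HH:MM:SS, got {walltime!r}")

    hours, minutes, seconds = (int(part) for part in parts)
    if hours < 0 or not (0 <= minutes < 60) or not (0 <= seconds < 60):
        raise ValueError(f"invalid walltime: {walltime!r}")

    total_seconds = hours * 3600 + minutes * 60 + seconds
    # Ceiling division, so a partial minute still counts as a full one.
    return -(-total_seconds // 60)
```

Rounding up rather than down means the requested limit is never shorter than the walltime that was written.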

## Features

- **Simple API**: Configure and submit jobs with minimal code
- **Command Line Interface**: `aihpi` CLI for easy job submission and management
- **Distributed Training**: Automatic setup for multi-node distributed training
- **Container Support**: First-class support for Pyxis/Enroot containers
- **Container Submission**: Submit jobs from within containers via SSH to login nodes
- **LlamaFactory Integration**: Built-in support for LlamaFactory training
- **Job Monitoring**: Real-time job status tracking and log streaming
- **Experiment Tracking**: Integration with Weights & Biases, MLflow, and local tracking

## Command Line Usage

```bash
# Submit a Python job
aihpi run train.py --config config.py

# Submit with monitoring
aihpi run train.py --config config.py --monitor

# Submit a distributed job
aihpi run train.py --config distributed_config.py

# Monitor a running job
aihpi monitor 12345 --follow
```
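
How `aihpi monitor` obtains job status is not described here. As a rough illustration of what status polling on a SLURM cluster involves, the sketch below queries SLURM's own `sacct` command and parses its `JobID|State` output. All function names are hypothetical, the list of terminal states is a subset rather than exhaustive, and `sacct` only works where SLURM accounting is enabled.

```python
import subprocess
from typing import Dict

# States after which a job will not run any further (a subset, not exhaustive).
TERMINAL_STATES = {
    "COMPLETED", "FAILED", "CANCELLED", "TIMEOUT", "OUT_OF_MEMORY", "NODE_FAIL",
}


def parse_sacct(output: str) -> Dict[str, str]:
    """Parse `JobID|State` lines into a {job_id: state} mapping."""
    states: Dict[str, str] = {}
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        job_id, _, state = line.partition("|")
        # sacct can report e.g. "CANCELLED by 1000"; keep only the keyword.
        fields = state.split()
        states[job_id] = fields[0] if fields else ""
    return states


def query_job_states(job_id: str) -> Dict[str, str]:
    """Ask sacct for the state of a job and its steps."""
    result = subprocess.run(
        ["sacct", "-j", job_id, "--format=JobID,State",
         "--noheader", "--parsable2"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_sacct(result.stdout)


def is_finished(state: str) -> bool:
    """Return True once a job has reached one of the terminal states."""
    return state in TERMINAL_STATES
```

A follow-style monitor would call `query_job_states("12345")` in a loop with a sleep between polls and stop once `is_finished` returns true for the main job entry.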

## Documentation & Examples

For detailed documentation, examples, and setup instructions, visit:
- **GitHub Repository**: [aihpi/aihpi-cluster](https://github.com/aihpi/aihpi-cluster)
- **Full Documentation**: [README.md](https://github.com/aihpi/aihpi-cluster#readme)

## Requirements

- Python ≥ 3.8
- Access to a SLURM cluster
- submitit ≥ 1.4.0

## License

MIT License

---

## Acknowledgements
<div align="center">
<img src="https://raw.githubusercontent.com/aihpi/aihpi-cluster/main/00_aisc/img/logo_bmftr_de.png" alt="BMBF Logo" width="170"/>
</div>

The [AI Service Centre Berlin Brandenburg](http://hpi.de/kisz) is funded by the [Federal Ministry of Research, Technology and Space](https://www.bmbf.de/) under the funding code 01IS22092.

            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/username/aihpi",
    "name": "aihpi",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": null,
    "keywords": "slurm, distributed, training, ai, ml, pytorch, llamafactory",
    "author": "Felix Boelter",
    "author_email": "Felix Boelter <felix.boelter@hpi.de>",
    "download_url": "https://files.pythonhosted.org/packages/bd/4f/f77cbd80ed7a51fdbe951c700dca11a95a1f944d69eb3284db8924cd620f/aihpi-0.1.5.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "AI High Performance Infrastructure - Distributed job submission for SLURM clusters",
    "version": "0.1.5",
    "project_urls": {
        "Bug Reports": "https://github.com/aihpi/aihpi-cluster/issues",
        "Documentation": "https://github.com/aihpi/aihpi-cluster#readme",
        "Homepage": "https://github.com/aihpi/aihpi-cluster",
        "Source": "https://github.com/aihpi/aihpi-cluster"
    },
    "split_keywords": [
        "slurm",
        " distributed",
        " training",
        " ai",
        " ml",
        " pytorch",
        " llamafactory"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "6448414e390ea6b09c1f4b7c54f22a30f0bae8441bc8b17a25efe333fafad40b",
                "md5": "a9366cc3e639fc391edf476641b56ce7",
                "sha256": "ec86651377ce24af3b4a326e1352e2595a2132840682bccdc1e558c5c38fe760"
            },
            "downloads": -1,
            "filename": "aihpi-0.1.5-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "a9366cc3e639fc391edf476641b56ce7",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 28627,
            "upload_time": "2025-09-10T19:38:07",
            "upload_time_iso_8601": "2025-09-10T19:38:07.519529Z",
            "url": "https://files.pythonhosted.org/packages/64/48/414e390ea6b09c1f4b7c54f22a30f0bae8441bc8b17a25efe333fafad40b/aihpi-0.1.5-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "bd4ff77cbd80ed7a51fdbe951c700dca11a95a1f944d69eb3284db8924cd620f",
                "md5": "aa3d30440fd4e0cb9e7973ad372233eb",
                "sha256": "26ae7f177ddac2ff887292e54409bec93b5d5b548209e0ce6f26d40d03b15a65"
            },
            "downloads": -1,
            "filename": "aihpi-0.1.5.tar.gz",
            "has_sig": false,
            "md5_digest": "aa3d30440fd4e0cb9e7973ad372233eb",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 43730,
            "upload_time": "2025-09-10T19:38:08",
            "upload_time_iso_8601": "2025-09-10T19:38:08.911082Z",
            "url": "https://files.pythonhosted.org/packages/bd/4f/f77cbd80ed7a51fdbe951c700dca11a95a1f944d69eb3284db8924cd620f/aihpi-0.1.5.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-09-10 19:38:08",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "username",
    "github_project": "aihpi",
    "github_not_found": true,
    "lcname": "aihpi"
}
        