# Nedo Vision Training Service
A distributed AI model training service for the Nedo Vision platform. It runs training workflows for computer vision models built on the RF-DETR architecture, monitors their progress and resource usage, and manages the training lifecycle.
## Features
- **Configurable Training Service**: Automated training with customizable intervals and parameters
- **gRPC Communication**: Reliable communication with the vision manager and other services
- **Distributed Training**: Support for multi-GPU and distributed training scenarios
- **Real-time Monitoring**: System resource monitoring and training progress tracking
- **Cloud Integration**: AWS S3 integration for model storage and dataset management
- **Message Queue Support**: RabbitMQ integration for task queue management
## Installation
Install the package from PyPI:
```bash
pip install nedo-vision-training
```
For GPU support with CUDA 12.1:
```bash
pip install "nedo-vision-training[gpu]" --extra-index-url https://download.pytorch.org/whl/cu121
```
For development with all tools:
```bash
pip install "nedo-vision-training[dev]"
```
## Quick Start
### Using the CLI
After installation, you can use the training service CLI:
```bash
# Show CLI help
nedo-trainer --help
# Start training service with authentication token
nedo-trainer --token YOUR_TOKEN
# Start with custom server configuration
nedo-trainer --token YOUR_TOKEN --server-host custom.server.com --server-port 60000
# Start with custom system usage reporting interval (in seconds)
nedo-trainer --token YOUR_TOKEN --system-usage-interval 60
# Start with custom latency monitoring interval (in seconds)
nedo-trainer --token YOUR_TOKEN --latency-check-interval 15
```
### Configuration Options
The `nedo-trainer` CLI accepts the following options:
- `--token`: Authentication token for secure communication
- `--server-host`: gRPC server host (default: localhost)
- `--server-port`: gRPC server port (default: 50051)
- `--system-usage-interval`: System usage reporting interval in seconds (default: 30)
- `--latency-check-interval`: Latency monitoring interval in seconds (default: 10)
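
To make the defaults above concrete, here is a minimal sketch of how these flags could be modeled with Python's `argparse`. It is not the package's actual parser: the `build_parser` function is hypothetical, and treating `--token` as required and the intervals as integers are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(prog="nedo-trainer")
    # Assumption: the token is mandatory; the README does not state this explicitly.
    parser.add_argument("--token", required=True,
                        help="Authentication token for secure communication")
    parser.add_argument("--server-host", default="localhost",
                        help="gRPC server host")
    parser.add_argument("--server-port", type=int, default=50051,
                        help="gRPC server port")
    # Assumption: intervals are whole seconds.
    parser.add_argument("--system-usage-interval", type=int, default=30,
                        help="System usage reporting interval in seconds")
    parser.add_argument("--latency-check-interval", type=int, default=10,
                        help="Latency monitoring interval in seconds")
    return parser


# Only the port is overridden here; every other option keeps its default.
args = build_parser().parse_args(["--token", "YOUR_TOKEN", "--server-port", "60000"])
print(args.server_host, args.server_port,
      args.system_usage_interval, args.latency_check_interval)
# prints: localhost 60000 30 10
```

Run `nedo-trainer --help` to see the authoritative list of options and their defaults.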
## Architecture
### Core Components
- **TrainingService**: Main service orchestrator for training workflows
- **RFDETRTrainer**: RF-DETR algorithm implementation with PyTorch backend
- **TrainerLogger**: Real-time training progress logging via gRPC
- **ResourceMonitor**: System resource monitoring (GPU, CPU, memory)
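
The interval-based reporting that a component like `ResourceMonitor` performs can be pictured as a background thread that samples usage and hands each sample to a callback. The sketch below is an illustration under stated assumptions, not the package's implementation: `PeriodicReporter` and `sample_usage` are hypothetical names, and the sample uses only standard-library values as a stand-in for real GPU, CPU, and memory metrics.

```python
import os
import threading
import time
from typing import Callable, Dict


def sample_usage() -> Dict[str, float]:
    """Collect a small stdlib-only snapshot (stand-in for GPU/CPU/memory metrics)."""
    snapshot = {"timestamp": time.time(), "cpu_count": float(os.cpu_count() or 1)}
    try:
        # 1-minute load average; os.getloadavg is unavailable on Windows.
        snapshot["load_1m"] = os.getloadavg()[0]
    except (AttributeError, OSError):
        pass
    return snapshot


class PeriodicReporter:
    """Hypothetical helper: calls `report` with a fresh sample every `interval_s` seconds."""

    def __init__(self, interval_s: float, report: Callable[[Dict[str, float]], None]):
        self._interval_s = interval_s
        self._report = report
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()

    def _run(self) -> None:
        # Event.wait acts as an interruptible sleep: it returns True once stop() is called.
        while not self._stop.wait(self._interval_s):
            self._report(sample_usage())


if __name__ == "__main__":
    reporter = PeriodicReporter(0.05, print)  # short interval for demonstration only
    reporter.start()
    time.sleep(0.2)
    reporter.stop()
```

In the real service the callback would presumably send the sample over gRPC, and the interval would come from `--system-usage-interval`; both of those connections are assumptions here.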
### Dependencies
The service relies on several key technologies:
- **PyTorch**: Deep learning framework with CUDA support
- **RF-DETR**: Roboflow's Real-time Detection Transformer
- **gRPC**: High-performance RPC framework
- **RabbitMQ**: Message queue for distributed task management
- **AWS SDK**: Cloud storage integration
- **NVIDIA ML**: GPU monitoring and management
## Development Setup
## Troubleshooting
### Common Issues
1. **gRPC Connection Timeouts**: Ensure the server host and port are correctly configured
2. **CUDA Out of Memory**: Reduce batch size or use gradient accumulation
3. **Missing Dependencies**: Reinstall with `pip install --upgrade nedo-vision-training`
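
When diagnosing connection timeouts, it can help to first confirm that the configured host and port accept TCP connections at all. The following is a minimal sketch using only the standard library; the `tcp_connect_latency` helper is hypothetical and not part of the package. It checks TCP reachability only and says nothing about gRPC health or token validity.

```python
import socket
import time
from typing import Optional


def tcp_connect_latency(host: str, port: int, timeout_s: float = 3.0) -> Optional[float]:
    """Return the TCP connect time in seconds, or None if the connection fails."""
    start = time.perf_counter()
    try:
        # create_connection resolves the host and opens a TCP connection.
        with socket.create_connection((host, port), timeout=timeout_s):
            return time.perf_counter() - start
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return None


if __name__ == "__main__":
    # The documented defaults; substitute your own --server-host and --server-port.
    latency = tcp_connect_latency("localhost", 50051)
    if latency is None:
        print("unreachable")
    else:
        print(f"reachable in {latency * 1000:.1f} ms")
```

If the port is unreachable, check firewalls and the values passed to `--server-host` and `--server-port` before investigating the service itself.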
### Support
For issues and questions:
- Check the logs for detailed error information
- Ensure your token is valid and not expired
- Verify network connectivity to the training manager
## License
This project is part of the Nedo Vision platform. Please refer to the main project license for usage terms.
Raw data
{
"_id": null,
"home_page": null,
"name": "nedo-vision-training",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.10",
"maintainer_email": "Willy Achmat Fauzi <willy.achmat@gmail.com>",
"keywords": "computer-vision, machine-learning, ai, training, deep-learning, object-detection, neural-networks, pytorch",
"author": null,
"author_email": "Willy Achmat Fauzi <willy.achmat@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/27/e0/d9e0e99c27492a6df12f0269434dce1c98ad26803c2831baf2ac7d86546f/nedo_vision_training-1.0.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "A comprehensive training service library for AI models in the Nedo Vision platform",
"version": "1.0.0",
"project_urls": {
"Bug Reports": "https://gitlab.com/sindika/research/nedo-vision/nedo-vision-training-service/-/issues",
"Documentation": "https://gitlab.com/sindika/research/nedo-vision/nedo-vision-training-service/-/blob/main/README.md",
"Homepage": "https://gitlab.com/sindika/research/nedo-vision/nedo-vision-training-service",
"Repository": "https://gitlab.com/sindika/research/nedo-vision/nedo-vision-training-service"
},
"split_keywords": [
"computer-vision",
" machine-learning",
" ai",
" training",
" deep-learning",
" object-detection",
" neural-networks",
" pytorch"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "a57909e573d98ff2f2e5256631103295a9f2debe33ebd6411f9ae22f651179c8",
"md5": "03d5ff99fc11939d8007a7c15020da26",
"sha256": "d62d2008b4480e050ddd9dc68c2a96cdc16af2bd53d962d7707ad784632e708a"
},
"downloads": -1,
"filename": "nedo_vision_training-1.0.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "03d5ff99fc11939d8007a7c15020da26",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.10",
"size": 62601,
"upload_time": "2025-08-04T04:14:20",
"upload_time_iso_8601": "2025-08-04T04:14:20.507090Z",
"url": "https://files.pythonhosted.org/packages/a5/79/09e573d98ff2f2e5256631103295a9f2debe33ebd6411f9ae22f651179c8/nedo_vision_training-1.0.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "27e0d9e0e99c27492a6df12f0269434dce1c98ad26803c2831baf2ac7d86546f",
"md5": "ba3098b7253ffc9302c273f3cacff411",
"sha256": "d7d2f6158fb2023aa3bd0b70fd22984972ab1e7cd13f330db4c907a19c087568"
},
"downloads": -1,
"filename": "nedo_vision_training-1.0.0.tar.gz",
"has_sig": false,
"md5_digest": "ba3098b7253ffc9302c273f3cacff411",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.10",
"size": 45783,
"upload_time": "2025-08-04T04:14:21",
"upload_time_iso_8601": "2025-08-04T04:14:21.835729Z",
"url": "https://files.pythonhosted.org/packages/27/e0/d9e0e99c27492a6df12f0269434dce1c98ad26803c2831baf2ac7d86546f/nedo_vision_training-1.0.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-08-04 04:14:21",
"github": false,
"gitlab": true,
"bitbucket": false,
"codeberg": false,
"gitlab_user": "sindika",
"gitlab_project": "research",
"lcname": "nedo-vision-training"
}