# SAIS Prism
SAIS Prism is a unified interface for machine learning workflows. It acts as a virtual access layer for data and as a lifecycle management system built on MLflow. Just as a prism separates light into its spectrum, this framework separates and manages the different aspects of ML development while keeping the workflow cohesive.
## Features
- Virtual Data Access Layer with unified interface
- Dynamic Configuration Management through YAML files
- Automated Metric Tracking for training and system metrics
- One-line MLflow Integration with `@tracing` decorator
- Flexible Parameter Access with dot notation
- Strict Configuration Validation
- Pre-configured Training Parameters and Metrics
## Installation
```bash
# Install from PyPI
pip install sais-prism
# Install from source
git clone http://gitlab-paas.internal.sais.com.cn/data_intelligence_platform/sais-prism.git
cd sais-prism
pip install -e .
```
## Quick Start
1. **Create Configuration File**
Create `tracing.yaml` in your project root:
```yaml
tracing:
  generic:
    experiment_name: "my_experiment"
    system_tracing: true

  model_repo:
    registered: true
    name: "my_model"
    tag:
      framework: "pytorch"
      task_type: "classification"
    version: "1.0.0"

  metric:
    training:
      - loss
      - accuracy
      - learning_rate

  train_parameters:
    num_train_epochs: 3
    learning_rate: 2.0e-4
    model:
      model_name: "bert-base-uncased"
      quantization:
        load_in_4bit: true
        bnb_4bit_compute_dtype: "float16"
```
2. **Enable Tracking**
```python
from sais_prism.decorators import tracing

@tracing
def train_model():
    model = ...  # placeholder: build or load your model here
    # Your training code, e.g. model.train()

    # Metrics are automatically logged to MLflow
    return model
```
## Usage Guide
### Configuration Management
#### Configuration File Structure
The `tracing.yaml` configuration file contains the following sections:
- **generic**: Basic configuration
  - `experiment_name`: MLflow experiment name
  - `system_tracing`: Enable/disable system metric tracking
- **model_repo**: Model repository settings
  - `registered`: Model registration flag
  - `name`: Model name
  - `tag`: Model tags
  - `version`: Model version
- **metric**: Metrics to track
  - `training`: List of training metrics
- **artifacts**: Artifacts to save
  ```yaml
  artifacts:
    - name: "checkpoints"
      path: "./checkpoints"
    - name: "plots"
      path: "./plots"
  ```
- **train_parameters**: Training configuration
  - Supports nested parameter structures
  - Common parameters (learning rate, batch size)
  - Model-specific parameters (quantization)
  - LoRA parameters (if applicable)
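SAIS Prism lists strict configuration validation as a feature and ships a `config_validator.py` module, but this README does not document its rules. The following is a minimal, stdlib-only sketch of what such a check could look like, assuming the YAML has already been parsed into a nested dict and that `generic`, `model_repo`, `metric` and `train_parameters` are required sections. The function name `validate_tracing_config` and the specific rules are hypothetical.

```python
from typing import Any, Dict, List

# Hypothetical rules: required sections and a few expected field types.
REQUIRED_SECTIONS = ("generic", "model_repo", "metric", "train_parameters")
EXPECTED_TYPES = {
    ("generic", "experiment_name"): str,
    ("generic", "system_tracing"): bool,
    ("model_repo", "registered"): bool,
    ("metric", "training"): list,
}


def validate_tracing_config(raw: Dict[str, Any]) -> List[str]:
    """Return a list of problems found; an empty list means the config passed."""
    tracing = raw.get("tracing")
    if not isinstance(tracing, dict):
        return ["missing top-level 'tracing' mapping"]

    errors: List[str] = []
    for section in REQUIRED_SECTIONS:
        if not isinstance(tracing.get(section), dict):
            errors.append(f"missing or invalid section: tracing.{section}")

    for (section, key), expected in EXPECTED_TYPES.items():
        block = tracing.get(section)
        # Only type-check keys that are actually present.
        if isinstance(block, dict) and key in block and not isinstance(block[key], expected):
            errors.append(
                f"tracing.{section}.{key} should be {expected.__name__}, "
                f"got {type(block[key]).__name__}"
            )
    return errors


if __name__ == "__main__":
    config = {
        "tracing": {
            "generic": {"experiment_name": "my_experiment", "system_tracing": "yes"},
            "model_repo": {"registered": True, "name": "my_model"},
            "metric": {"training": ["loss", "accuracy"]},
        }
    }
    for problem in validate_tracing_config(config):
        print(problem)
```

Collecting every problem into a list, rather than raising on the first one, lets a user fix a broken `tracing.yaml` in a single pass.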
#### Accessing Configuration
Two access methods are available:
1. **Dot Notation Access (Recommended)**:
```python
from sais_prism.config_manager import config_manager
config = config_manager.params
learning_rate = config.train_parameters.learning_rate
model_name = config.train_parameters.model.model_name
```
2. **Dictionary Access**:
```python
from sais_prism.config_manager import config_manager

training_params = config_manager.get_training_params()
model_info = config_manager.get_model_info()
metrics_config = config_manager.get_metrics_config()
```
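Both access styles can be served from the same parsed YAML. Below is a minimal sketch of the idea behind dot-notation access, assuming a recursive wrapper around nested dicts. The class name `ConfigNode` and its `to_dict` method are hypothetical; they are not necessarily how `config_objects.py` implements it.

```python
from typing import Any, Dict


class ConfigNode:
    """Hypothetical wrapper exposing nested dict keys as attributes."""

    def __init__(self, data: Dict[str, Any]) -> None:
        self._data = data

    def __getattr__(self, name: str) -> Any:
        # Called only when normal lookup fails, i.e. for config keys.
        if name.startswith("_"):
            raise AttributeError(name)  # avoid recursion on private names
        try:
            value = self._data[name]
        except KeyError:
            raise AttributeError(f"no config key named {name!r}") from None
        # Wrap nested mappings so chained access like a.b.c keeps working.
        return ConfigNode(value) if isinstance(value, dict) else value

    def to_dict(self) -> Dict[str, Any]:
        """Return the underlying data for dictionary-style access."""
        return self._data


params = ConfigNode({
    "train_parameters": {
        "num_train_epochs": 3,
        "learning_rate": 2.0e-4,
        "model": {"model_name": "bert-base-uncased"},
    }
})

print(params.train_parameters.learning_rate)     # 0.0002
print(params.train_parameters.model.model_name)  # bert-base-uncased
print(params.to_dict()["train_parameters"]["num_train_epochs"])  # 3
```

Raising `AttributeError` for unknown keys keeps `hasattr()` and `getattr(obj, name, default)` working as Python users expect.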
### Metric Tracking
#### Automatic Tracking
The `@tracing` decorator automatically:
- Creates the MLflow experiment if it does not exist
- Starts a new run
- Logs all configured parameters
- Enables system metric tracking if configured
- Ends the run and saves the results
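```python
from sais_prism.decorators import tracing
from sais_prism.metrics_tracker import metrics_tracker

@tracing
def train(num_epochs: int = 3):
    for epoch in range(num_epochs):
        loss = train_one_epoch()  # hypothetical helper: your training step, returning a tensor
        metrics_tracker.log_metric("loss", loss.item(), step=epoch)
```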
#### Manual Tracking
```python
from sais_prism.metrics_tracker import metrics_tracker

# Log a single metric
metrics_tracker.log_metric("loss", 0.5, step=1)

# Log multiple metrics at the same step
metrics_tracker.log_metrics({
    "loss": 0.5,
    "accuracy": 0.95,
}, step=1)
```
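Both calls record values against a step index. The following is a minimal in-memory sketch of that bookkeeping, assuming `log_metrics` behaves like repeated `log_metric` calls at the same step. The `InMemoryTracker` class is hypothetical and only stands in for the MLflow-backed `metrics_tracker`; its default step of 0 mirrors MLflow's documented default for `mlflow.log_metric`.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class InMemoryTracker:
    """Hypothetical tracker storing (step, value) pairs per metric name."""

    def __init__(self) -> None:
        self.history: Dict[str, List[Tuple[int, float]]] = defaultdict(list)

    def log_metric(self, key: str, value: float, step: int = 0) -> None:
        # Metrics must be numeric; float() rejects anything else.
        self.history[key].append((step, float(value)))

    def log_metrics(self, metrics: Dict[str, float], step: int = 0) -> None:
        # A batch is just several single-metric calls sharing one step.
        for key, value in metrics.items():
            self.log_metric(key, value, step=step)

    def latest(self, key: str) -> float:
        """Return the most recently logged value for a metric."""
        return self.history[key][-1][1]


tracker = InMemoryTracker()
tracker.log_metric("loss", 0.5, step=1)
tracker.log_metrics({"loss": 0.4, "accuracy": 0.95}, step=2)
print(dict(tracker.history))
# {'loss': [(1, 0.5), (2, 0.4)], 'accuracy': [(2, 0.95)]}
```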
### System Metrics
When `system_tracing` is enabled, SAIS Prism tracks:
- CPU utilization
- Memory usage
- GPU utilization (if available)
- GPU memory (if available)
- Network latency
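System metrics of this kind are typically collected by sampling at a fixed interval in a background thread while training runs. The sketch below shows that pattern with only the standard library. The `probe` function is a hypothetical placeholder (a real one might read CPU, memory and GPU statistics through a library such as `psutil` or NVIDIA's NVML bindings), and `SystemSampler` is not part of SAIS Prism's API.

```python
import os
import threading
import time
from typing import Callable, Dict, List


def probe() -> Dict[str, float]:
    """Hypothetical probe; replace with real CPU/GPU/memory readings."""
    return {"timestamp": time.time(), "cpu_count": float(os.cpu_count() or 1)}


class SystemSampler:
    """Collect probe_fn() readings every `interval` seconds until stopped."""

    def __init__(self, probe_fn: Callable[[], Dict[str, float]], interval: float = 1.0) -> None:
        self.probe_fn = probe_fn
        self.interval = interval
        self.samples: List[Dict[str, float]] = []
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self) -> None:
        while True:
            self.samples.append(self.probe_fn())
            # wait() returns True as soon as the stop event is set.
            if self._stop.wait(self.interval):
                break

    def __enter__(self) -> "SystemSampler":
        self._thread.start()
        return self

    def __exit__(self, *exc: object) -> None:
        self._stop.set()
        self._thread.join()


with SystemSampler(probe, interval=0.05) as sampler:
    time.sleep(0.2)  # stand-in for a training loop

print(len(sampler.samples) >= 1)  # True
```

Waiting on a `threading.Event` instead of calling `time.sleep` lets the sampler stop immediately when training ends rather than after a full interval.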
### Complete Example
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from sais_prism.decorators import tracing
from sais_prism.config_manager import config_manager

@tracing
def train_model():
    # Get configuration
    config = config_manager.params
    train_params = config.train_parameters

    # Load model and tokenizer (device_map requires the `accelerate` package)
    tokenizer = AutoTokenizer.from_pretrained(train_params.model.model_name)
    model = AutoModelForCausalLM.from_pretrained(
        train_params.model.model_name,
        device_map={"": 0},  # place the whole model on GPU 0
    )

    # Training loop
    for epoch in range(train_params.num_train_epochs):
        # Training code
        pass

    return model

if __name__ == "__main__":
    model = train_model()
```
## Development
### Running Tests
```bash
# Install test dependencies
pip install -e ".[test]"
# Run tests
./run_tests.sh
```
### Project Structure
```
sais_prism/
├── sais_prism/
│   ├── config_manager.py     # Configuration management
│   ├── config_objects.py     # Dynamic configuration objects
│   ├── config_validator.py   # Configuration validation
│   ├── decorators.py         # MLflow decorators
│   └── metrics_tracker.py    # Metric tracking
├── examples/
│   ├── example_usage.py      # Usage examples
│   └── tracing.yaml          # Example configuration
└── tests/                    # Test suite
```
## FAQ
1. **Q: How do I change the MLflow tracking server?**

   A: Add `tracking_url` to `tracing.yaml`:
   ```yaml
   tracing:
     tracking_url: "http://my-mlflow-server:5000"
   ```

2. **Q: How do I disable system metrics?**

   A: Set `system_tracing: false` in the `generic` section of `tracing.yaml`.

3. **Q: What metrics are supported?**

   A: SAIS Prism supports any numeric metric, including:
   - Training losses
   - Evaluation metrics
   - Custom metrics
   - System metrics (CPU, memory, GPU)
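For the tracking-server question, a common pattern is to let an explicit setting take precedence over the `MLFLOW_TRACKING_URI` environment variable that MLflow itself reads. The precedence shown here (config file first, then environment) is an assumption for illustration, not documented SAIS Prism behavior. `resolve_tracking_uri` is a hypothetical helper; in a real setup its result would be passed to `mlflow.set_tracking_uri()`.

```python
import os
from typing import Any, Dict, Mapping, Optional


def resolve_tracking_uri(
    config: Dict[str, Any], environ: Mapping[str, str] = os.environ
) -> Optional[str]:
    """Return the tracking URI to use, or None to keep MLflow's default."""
    # 1. Explicit value from tracing.yaml (assumed key: tracing.tracking_url).
    url = config.get("tracing", {}).get("tracking_url")
    if url:
        return url
    # 2. Fall back to the environment variable MLflow already honors.
    return environ.get("MLFLOW_TRACKING_URI")


cfg = {"tracing": {"tracking_url": "http://my-mlflow-server:5000"}}
print(resolve_tracking_uri(cfg, environ={}))
# http://my-mlflow-server:5000
print(resolve_tracking_uri({}, environ={"MLFLOW_TRACKING_URI": "http://other:5000"}))
# http://other:5000
```

Passing `environ` as a parameter keeps the function easy to test without mutating the real process environment.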
## Contributing
1. Fork the repository
2. Create a feature branch
3. Commit changes
4. Push to the branch
5. Create a pull request
## License
MIT License
## Author
- Shepard (zhaoxun@sais.com.cn)
## Changelog
### v0.1.0
- Initial release
- Virtual data access layer
- Dynamic configuration management
- Automated metric tracking
- MLflow integration
- System metric tracking