Name | rag-classic-ml |
Version | 0.1.4 |
home_page | None |
Summary | Classical Machine Learning methods for Retrieval Augmented Generation |
upload_time | 2024-09-20 21:53:36 |
maintainer | None |
docs_url | None |
author | VatsalPatel18 |
requires_python | <4.0,>=3.9 |
license | None |
# RAG-Classic-ML
**RAG-Classic-ML** is a versatile Python package designed to provide out-of-the-box machine learning pipelines for both basic and advanced tasks. It simplifies the process of building, training, and evaluating models for tasks like classification, regression, autoencoder-based feature extraction, and survival clustering. The package is designed for ease of use, offering pre-built pipelines and customizable parameters for a variety of machine learning algorithms.
## Table of Contents
- [Features](#features)
- [Installation](#installation)
- [Usage](#usage)
  - [Basic Machine Learning Pipelines](#basic-machine-learning-pipelines)
    - [Classification](#classification)
    - Regression
    - [Benchmark Machine Learning Models](#benchmark-machine-learning-models)
  - Advanced Pipelines
    - [Train Autoencoder Model](#train-autoencoder-model)
    - [Survival Clustering Analysis](#survival-clustering-analysis)
- [Command-Line Arguments](#command-line-arguments)
- [Dependencies](#dependencies)
- [License](#license)
- [Author](#author)
## Features
- **Basic Machine Learning Pipelines**: Ready-to-use pipelines for common supervised learning tasks, including classification and regression, with a variety of machine learning models (e.g., Logistic Regression, SVC, Random Forest).
- **Advanced Pipelines**
  - **Autoencoder**: Dimensionality reduction and feature extraction using deep learning autoencoders.
  - **Survival Clustering Analysis**: Performs clustering on patient features and integrates clinical data to generate Kaplan-Meier survival plots and log-rank tests.
- **Customizable Models and Parameters**: Easily define and customize machine learning models and hyperparameters.
- **Prediction and Metrics Generation**: Generates and saves predictions, feature importance scores, and various performance metrics for each model and run.
- **Aggregation of Results**: Aggregates results across runs and models for comprehensive analysis, facilitating comparison and evaluation.
- **Visualization Tools**: Generates plots including AUC curves, AUC box plots, feature importance charts, radar charts for model performance comparison, and survival analysis plots.
## Installation
You can install the package directly from PyPI:
```bash
pip install rag-classic-ml
```
Alternatively, install from source:
```bash
git clone https://github.com/yourusername/classic-ml.git
cd classic-ml
pip install .
```
## Usage
The classic-ml package provides a command-line interface (CLI) for ease of use. Below are examples of how to use the various components.
## Basic Machine Learning Pipelines
### Classification
Train and evaluate a classification model using the classic-ml CLI. You can specify different models and hyperparameters.
### Example 1: Support Vector Classifier (SVC)
```bash
classic-ml classification \
--data ./Raisin_Dataset.data \
--target 'label' \
--output ./results/svc_rbf/ \
--model SVC \
--model_params '{"C": 1.0, "kernel": "rbf", "gamma": "scale", "probability": true}' \
--test_size 0.2 \
--seed 42
```
### Example 2: Logistic Regression
```bash
classic-ml classification \
--data ./Raisin_Dataset.data \
--target 'label' \
--output ./results/logistic_regression/ \
--model LogisticRegression \
--model_params '{"C": 0.5, "penalty": "l1", "solver": "saga", "max_iter": 1000, "class_weight": "balanced"}' \
--test_size 0.2 \
--seed 42
```
### Example 3: Random Forest Classifier
```bash
classic-ml classification \
--data ./Raisin_Dataset.data \
--target 'label' \
--output ./results/random_forest/ \
--model RandomForestClassifier \
--model_params '{"n_estimators": 100, "max_depth": 10}' \
--test_size 0.2 \
--seed 42
```
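The three calls above differ only in the model name, its parameters, and the output directory. When scripting several of them, it can help to assemble the argument list in Python so the JSON is always valid and correctly quoted. This is a minimal sketch: `build_classification_command` is a hypothetical helper, not part of the package, and it only reuses the flags shown in the examples above.

```python
import json
import shlex


def build_classification_command(data, target, output, model, model_params,
                                 test_size=0.2, seed=42):
    """Return the argv list for a `classic-ml classification` call.

    Hypothetical helper: it mirrors the flags used in the README examples.
    """
    return [
        "classic-ml", "classification",
        "--data", data,
        "--target", target,
        "--output", output,
        "--model", model,
        # json.dumps always emits valid JSON (double quotes, true/false).
        "--model_params", json.dumps(model_params),
        "--test_size", str(test_size),
        "--seed", str(seed),
    ]


cmd = build_classification_command(
    data="./Raisin_Dataset.data",
    target="label",
    output="./results/random_forest/",
    model="RandomForestClassifier",
    model_params={"n_estimators": 100, "max_depth": 10},
)

# shlex.join quotes each argument so the line can be pasted into a shell.
print(shlex.join(cmd))
```

To actually run the command, the list can be passed to `subprocess.run(cmd, check=True)`, which avoids shell quoting issues altogether.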
### Benchmark Machine Learning Models
Run the benchmark ML pipeline to evaluate model stability across multiple runs.
```bash
benchmark-adv-ml benchmark \
--data ./your_dataset.csv \
--output ./final_results \
--prelim_output ./prelim_results \
--n_runs 10 \
--seed 42
```
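Evaluating stability across runs usually comes down to looking at the spread of a metric over repeated runs. The sketch below shows one way such a summary could be computed. `summarize_runs` is a hypothetical helper, and the AUC values are made-up numbers for illustration, not output of the package.

```python
import statistics


def summarize_runs(scores):
    """Summarize one metric collected over repeated runs."""
    return {
        "n_runs": len(scores),
        "mean": statistics.mean(scores),
        "stdev": statistics.stdev(scores),  # sample standard deviation
        "min": min(scores),
        "max": max(scores),
    }


# Illustrative values only; real scores would come from the saved results.
example_auc = [0.80, 0.82, 0.78, 0.84, 0.76]
summary = summarize_runs(example_auc)
print(summary)
```

A small standard deviation relative to the mean suggests the model's performance does not depend strongly on the particular train/test split.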
### Train Autoencoder Model
Train and evaluate an autoencoder model for feature extraction.
```bash
classic-ml autoencoder \
--data ./your_dataset.csv \
--sampleID 'PatientID' \
--output_dir ./final_results \
--prelim_output ./prelim_results \
--latent_dim 10 \
--epochs 50 \
--batch_size 32 \
--validation_split 0.1 \
--test_size 0.2 \
--seed 42
```
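### Survival Clustering Analysis
Cluster patient features and combine the clusters with clinical data to produce Kaplan-Meier survival plots and log-rank tests.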
```bash
classic-ml survival_clustering \
--data_path ./latent_features.csv \
--clinical_df_path ./clinical_data.csv \
--save_dir ./final_results
```
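For readers unfamiliar with the Kaplan-Meier curves mentioned in the Features section, the following is a minimal standard-library sketch of the product-limit estimator for a single group. It is an illustration of the technique only and makes no claim about how the package computes its curves; the package lists `lifelines` as a dependency. The function name and the sample data are assumptions.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each observed event time.

    durations: follow-up time per patient
    events: 1 if the event was observed, 0 if the patient was censored
    """
    deaths = Counter(t for t, e in zip(durations, events) if e)
    exits = Counter(durations)  # every patient leaves the risk set at their time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Illustrative data: six patients, two of them censored (event = 0).
durations = [5, 8, 8, 12, 15, 20]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(durations, events)
for time, prob in curve:
    print(f"t={time}: S(t)={prob:.4f}")
```

In a clustering setting, one such curve would be computed per cluster, and a log-rank test would then compare the curves.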
## Command-Line Arguments
### Common Arguments
- `--data`: Path to the existing CSV file containing the dataset.
- `--output`: Directory to save the final results and plots.
- `--prelim_output`: Directory to save the preliminary results (predictions).
- `--seed`: Seed for the random state (default: 42).
- `--test_size`: Fraction of data to use for testing (default: 0.2).
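The interaction between `--seed` and `--test_size` can be illustrated with a small standard-library sketch: a fixed seed makes the shuffle, and therefore the split, repeatable. `train_test_indices` is a hypothetical function, and the rounding of the test-set size is an assumption; the package's actual splitting logic may differ.

```python
import random


def train_test_indices(n_samples, test_size=0.2, seed=42):
    """Shuffle row indices with a fixed seed and split off a test fraction."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded, so the order is repeatable
    n_test = round(n_samples * test_size)  # rounding rule is an assumption
    return indices[n_test:], indices[:n_test]


train_a, test_a = train_test_indices(10)
train_b, test_b = train_test_indices(10)

# The same seed yields the same split on every call.
print(test_a == test_b)
print(len(train_a), len(test_a))
```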
### Classification/Regression Command Arguments
- `--target`: Target column name in the dataset (e.g., 'label' for classification or 'price' for regression).
- `--model`: Specify the machine learning model to use (e.g., SVC, LogisticRegression, RandomForestClassifier, LinearRegression).
- `--model_params`: Hyperparameters for the specified model in JSON format (e.g., `{"C": 1.0, "kernel": "rbf"}`).
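Because `--model_params` is a JSON string, its values follow JSON syntax rather than Python syntax (for example `true`, not `True`). The sketch below shows how such a string could be turned into keyword arguments. `parse_model_params` is a hypothetical helper, and how the package itself parses the flag is not documented here.

```python
import json


def parse_model_params(raw):
    """Parse a --model_params style JSON string into keyword arguments."""
    params = json.loads(raw)
    if not isinstance(params, dict):
        raise ValueError("model parameters must be a JSON object")
    return params


params = parse_model_params('{"C": 1.0, "kernel": "rbf", "probability": true}')
print(params)
# With scikit-learn installed, these could be passed on as SVC(**params).
```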
### Autoencoder Command Arguments
- `--sampleID`: Column name representing the sample or patient ID (default: 'sampleID').
- `--latent_dim`: Dimensionality of the latent space (default: input_dim // 8).
- `--epochs`: Number of training epochs (default: 50).
- `--batch_size`: Training batch size (default: 32).
- `--validation_split`: Proportion of training data to use as validation set (default: 0.1).
- `--test_size`: Proportion of data to use as test set (default: 0.2).
- `--early_stopping`: Enable early stopping (use flag to activate).
- `--patience`: Patience for early stopping (default: 5).
- `--checkpoint`: Enable model checkpointing (use flag to activate).
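To see how the autoencoder defaults combine, the following sketch works through the arithmetic for a dataset of a given size. It assumes the test set is split off first and the validation set is then taken from the remaining training data, as the argument descriptions suggest. `partition_sizes` and the rounding rule are assumptions, not the package's code.

```python
def partition_sizes(n_samples, input_dim, test_size=0.2, validation_split=0.1):
    """Estimate split sizes and the default latent dimension."""
    n_test = round(n_samples * test_size)
    n_train_full = n_samples - n_test
    # validation_split is a proportion of the training data, not of all data.
    n_val = round(n_train_full * validation_split)
    return {
        "train": n_train_full - n_val,
        "validation": n_val,
        "test": n_test,
        "default_latent_dim": input_dim // 8,  # documented default
    }


sizes = partition_sizes(n_samples=1000, input_dim=80)
print(sizes)
# {'train': 720, 'validation': 80, 'test': 200, 'default_latent_dim': 10}
```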
### Benchmark Command Arguments
- `--target`: Target column name in the dataset (default: 'label').
- `--n_runs`: Number of runs for model stability evaluation (default: 20).
### Survival Clustering Command Arguments
- `--data_path`: Path to the CSV file containing patient features.
- `--clinical_df_path`: Path to the CSV file containing clinical data.
- `--save_dir`: Directory to save the results.
## Dependencies
- Python 3.9+ (the package metadata declares `requires_python` as `<4.0,>=3.9`)
- numpy
- pandas
- scikit-learn
- matplotlib
- seaborn
- tensorflow
- lifelines
- yellowbrick
## License
This project is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. See the LICENSE file for details.
## Author
Vatsal Patel - VatsalPatel18
Raw data
{
"_id": null,
"home_page": null,
"name": "rag-classic-ml",
"maintainer": null,
"docs_url": null,
"requires_python": "<4.0,>=3.9",
"maintainer_email": null,
"keywords": null,
"author": "VatsalPatel18",
"author_email": "vatsal1804@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/39/31/91ece7b2d386aba561cf668fdbc44b0d99f30d9189de007e5f17ee9689d0/rag_classic_ml-0.1.4.tar.gz",
"platform": null,
"description": "# RAG-Classic-ML\n\n**RAG-Classic-ML** is a versatile Python package designed to provide out-of-the-box machine learning pipelines for both basic and advanced tasks. It simplifies the process of building, training, and evaluating models for tasks like classification, regression, autoencoder-based feature extraction, and survival clustering. The package is designed for ease of use, offering pre-built pipelines and customizable parameters for a variety of machine learning algorithms.\n\n## Table of Contents \n\n- [Features](#features)\n- [Installation](#installation)\n- [Usage](#usage)\n - [Basic Machine Learning Pipelines](#classic_ml)\n - [Classification]\n - [Regression]\n - [Advanced Pipelines](#classic_ml)\n - [Autoencoder with Feature Extraction](#train_autoencoder)\n - [Clustering and Survival Analysis](#Cluster)\n - [Train Autoencoder Model](#train-autoencoder-model)\n - [Survival Clustering Analysis](#survival-clustering-analysis)\n- [Command-Line Arguments](#command-line-arguments)\n- [Dependencies](#dependencies)\n- [License](#license)\n- [Author](#author)\n\n## Features\n\n- **Basic Machine Learning Pipelines**: Ready-to-use pipelines for common supervised learning tasks, including classification and regression, with a variety of machine learning models (e.g., Logistic Regression, SVC, Random Forest).\n- **Advanced Pipelines**\n - **Autoencoder** : Dimensionality reduction and feature extraction using deep learning autoencoders.\n - **Survival Clustering Analysis**: Performs clustering on patient features and integrates clinical data to generate Kaplan-Meier survival plots and log-rank tests.\n- **Customizable Models and Parameters**: Easily define and customize machine learning models and hyperparameters.\n- **Prediction and Metrics Generation**: Generates and saves predictions, feature importance scores, and various performance metrics for each model and run.\n- **Aggregation of Results**: Aggregates results across runs and models for 
comprehensive analysis, facilitating comparison and evaluation.\n- **Visualization Tools**: Generates plots including AUC curves, AUC box plots, feature importance charts, radar charts for model performance comparison, and survival analysis plots.\n\n## Installation\n\nYou can install the package directly from PyPI:\n\n```bash\npip install classic-ml\n```\nAlternatively, install from source:\n\n```bash\ngit clone https://github.com/yourusername/classic-ml.git\ncd benchmark-adv-ml\npip install .\n```\n\n## Useage\nThe classic-ml package provides a command-line interface (CLI) for ease of use. Below are examples of how to use the various components.\n\n## Basic Machine Learning Pipelines\n\n### Classification\nTrain and evaluate a classification model using the classic-ml CLI. You can specify different models and hyperparameters.\n\n### Example 1: Support Vector Classifier (SVC)\n\n```bash\nclassic-ml classification \\\n --data ./Raisin_Dataset.data \\\n --target 'label' \\\n --output ./results/svc_rbf/ \\\n --model SVC \\\n --model_params '{\"C\": 1.0, \"kernel\": \"rbf\", \"gamma\": \"scale\", \"probability\": true}' \\\n --test_size 0.2 \\\n --seed 42\n```\n\n### Example 2: Logistic Regression\n\n```bash\nclassic-ml classification \\\n --data ./Raisin_Dataset.data \\\n --target 'label' \\\n --output ./results/logistic_regression/ \\\n --model LogisticRegression \\\n --model_params '{\"C\": 0.5, \"penalty\": \"l1\", \"solver\": \"saga\", \"max_iter\": 1000, \"class_weight\": \"balanced\"}' \\\n --test_size 0.2 \\\n --seed 42\n\n```\n\n### Example 3: Random Forest Classifier\n\n```bash\nclassic-ml classification \\\n --data ./Raisin_Dataset.data \\\n --target 'label' \\\n --output ./results/random_forest/ \\\n --model RandomForestClassifier \\\n --model_params '{\"n_estimators\": 100, \"max_depth\": 10}' \\\n --test_size 0.2 \\\n --seed 42\n\n```\n\n### Benchmark Machine Learning Models\nRun the benchmark ML pipeline to evaluate model stability across multiple 
runs.\n\n```bash\nbenchmark-adv-ml benchmark --data ./your_dataset.csv --output ./final_results --prelim_output ./prelim_results --n_runs 10 --seed 42\n```\n### Train Autoencoder Model\nTrain and evaluate an autoencoder model for feature extraction.\n\n```bash\nclassic-ml autoencoder \\\n --data ./your_dataset.csv \\\n --sampleID 'PatientID' \\\n --output_dir ./final_results \\\n --prelim_output ./prelim_results \\\n --latent_dim 10 \\\n --epochs 50 \\\n --batch_size 32 \\\n --validation_split 0.1 \\\n --test_size 0.2 \\\n --seed 42\n\n```\n\n### Survival Clustering Analysis\n```bash\nclassic-ml survival_clustering \\\n --data_path ./latent_features.csv \\\n --clinical_df_path ./clinical_data.csv \\\n --save_dir ./final_results\n\n```\n\n## Command-Line Arguments\n\n### Common Arguments\n- `--data`: Path to the existing CSV file containing the dataset.\n- `--output`: Directory to save the final results and plots.\n- `--prelim_output`: Directory to save the preliminary results (predictions).\n- `--seed`: Seed for random state (default is 42).\n- `--test_size`: Fraction of data to use for testing (default: 0.2).\n\n### Classification/Regression Command Arguments\n\n- `--target`: Target column name in the dataset (e.g., 'label' for classification or 'price' for regression).\n- `--model`: Specify the machine learning model to use (e.g., SVC, LogisticRegression, RandomForestClassifier, LinearRegression).\n- `--model_params`: Hyperparameters for the specified model in JSON format (e.g., {\"C\": 1.0, \"kernel\": \"rbf\"}).\n\n### Autoencoder Command Arguments\n\n- `--sampleID`: Column name representing the sample or patient ID (default: 'sampleID').\n- `--latent_dim`: Dimensionality of the latent space (default: input_dim // 8).\n- `--epochs`: Number of training epochs (default: 50).\n- `--batch_size`: Training batch size (default: 32).\n- `--validation_split`: Proportion of training data to use as validation set (default: 0.1).\n- `--test_size`: Proportion of data to use 
as test set (default: 0.2).\n- `--early_stopping`: Enable early stopping (use flag to activate).\n- `--patience`: Patience for early stopping (default: 5).\n- `--checkpoint`: Enable model checkpointing (use flag to activate).\n\n\n### Benchmark Command Arguments\n\n- `--target`: Target column name in the dataset (default: 'label').\n- `--n_runs`: Number of runs for model stability evaluation (default: 20).\n\n### Survival Clustering Command Arguments\n\n- `--data_path`: Path to the CSV file containing patient features.\n- `--clinical_df_path`: Path to the CSV file containing clinical data.\n- `--save_dir`: Directory to save the results.\n\n## Dependencies\n\n- Python 3.11+\n- numpy\n- pandas\n- scikit-learn\n- matplotlib\n- seaborn\n- tensorflow\n- lifelines\n- yellowbrick\n\n## License \nThis project is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. See the LICENSE file for details.\n\n## Author\nVatsal Patel - VatsalPatel18",
"bugtrack_url": null,
"license": null,
"summary": "Classical Machine Learning methods for Reterival Augmented Generation",
"version": "0.1.4",
"project_urls": null,
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "9d3cb1d42881e3bfe477925111c9048303332ba148cdfd42567a11e6ddaa96d6",
"md5": "69ef666e265a83b28a16b3690d5a6088",
"sha256": "6b1a6192ca3b0d5ecb7361120d39931e3da8a08fc1f1381868fe121a03f1e31f"
},
"downloads": -1,
"filename": "rag_classic_ml-0.1.4-py3-none-any.whl",
"has_sig": false,
"md5_digest": "69ef666e265a83b28a16b3690d5a6088",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<4.0,>=3.9",
"size": 93412,
"upload_time": "2024-09-20T21:53:33",
"upload_time_iso_8601": "2024-09-20T21:53:33.268124Z",
"url": "https://files.pythonhosted.org/packages/9d/3c/b1d42881e3bfe477925111c9048303332ba148cdfd42567a11e6ddaa96d6/rag_classic_ml-0.1.4-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "393191ece7b2d386aba561cf668fdbc44b0d99f30d9189de007e5f17ee9689d0",
"md5": "43e308ce93f747bb7b1f31106eaf841b",
"sha256": "642a813ab9f4e8f396963ba34076b0f95fae7f622e015d8e5c1636be7221d474"
},
"downloads": -1,
"filename": "rag_classic_ml-0.1.4.tar.gz",
"has_sig": false,
"md5_digest": "43e308ce93f747bb7b1f31106eaf841b",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<4.0,>=3.9",
"size": 82198,
"upload_time": "2024-09-20T21:53:36",
"upload_time_iso_8601": "2024-09-20T21:53:36.961065Z",
"url": "https://files.pythonhosted.org/packages/39/31/91ece7b2d386aba561cf668fdbc44b0d99f30d9189de007e5f17ee9689d0/rag_classic_ml-0.1.4.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-09-20 21:53:36",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "rag-classic-ml"
}