rag-classic-ml


Name: rag-classic-ml
Version: 0.1.4
Summary: Classical Machine Learning methods for Retrieval Augmented Generation
Upload time: 2024-09-20 21:53:36
Author: VatsalPatel18
Requires Python: <4.0,>=3.9
Requirements: No requirements were recorded.

# RAG-Classic-ML

**RAG-Classic-ML** is a Python package that provides ready-made machine learning pipelines for both basic and advanced tasks. It simplifies building, training, and evaluating models for classification, regression, autoencoder-based feature extraction, and survival clustering, with pre-built pipelines and customizable parameters for a range of machine learning algorithms.

## Table of Contents

- [Features](#features)
- [Installation](#installation)
- [Usage](#usage)
  - [Basic Machine Learning Pipelines](#basic-machine-learning-pipelines)
    - [Classification](#classification)
    - [Benchmark Machine Learning Models](#benchmark-machine-learning-models)
  - [Train Autoencoder Model](#train-autoencoder-model)
  - [Survival Clustering Analysis](#survival-clustering-analysis)
- [Command-Line Arguments](#command-line-arguments)
- [Dependencies](#dependencies)
- [License](#license)
- [Author](#author)

## Features

- **Basic Machine Learning Pipelines**: Ready-to-use pipelines for common supervised learning tasks, including classification and regression, with a variety of machine learning models (e.g., Logistic Regression, SVC, Random Forest).
- **Advanced Pipelines**:
  - **Autoencoder**: Dimensionality reduction and feature extraction using deep learning autoencoders.
  - **Survival Clustering Analysis**: Performs clustering on patient features and integrates clinical data to generate Kaplan-Meier survival plots and log-rank tests.
- **Customizable Models and Parameters**: Easily define and customize machine learning models and hyperparameters.
- **Prediction and Metrics Generation**: Generates and saves predictions, feature importance scores, and various performance metrics for each model and run.
- **Aggregation of Results**: Aggregates results across runs and models for comprehensive analysis, facilitating comparison and evaluation.
- **Visualization Tools**: Generates plots including AUC curves, AUC box plots, feature importance charts, radar charts for model performance comparison, and survival analysis plots.

## Installation

You can install the package directly from PyPI:

```bash
pip install rag-classic-ml
```

Alternatively, install from source:

```bash
git clone https://github.com/yourusername/classic-ml.git
cd classic-ml
pip install .
```

## Usage
The package provides a `classic-ml` command-line interface (CLI). The examples below show how to use its components.

## Basic Machine Learning Pipelines

### Classification
Train and evaluate a classification model using the classic-ml CLI. You can specify different models and hyperparameters.

### Example 1: Support Vector Classifier (SVC)

```bash
classic-ml classification \
    --data ./Raisin_Dataset.data \
    --target 'label' \
    --output ./results/svc_rbf/ \
    --model SVC \
    --model_params '{"C": 1.0, "kernel": "rbf", "gamma": "scale", "probability": true}' \
    --test_size 0.2 \
    --seed 42
```

### Example 2: Logistic Regression

```bash
classic-ml classification \
    --data ./Raisin_Dataset.data \
    --target 'label' \
    --output ./results/logistic_regression/ \
    --model LogisticRegression \
    --model_params '{"C": 0.5, "penalty": "l1", "solver": "saga", "max_iter": 1000, "class_weight": "balanced"}' \
    --test_size 0.2 \
    --seed 42
```

### Example 3: Random Forest Classifier

```bash
classic-ml classification \
    --data ./Raisin_Dataset.data \
    --target 'label' \
    --output ./results/random_forest/ \
    --model RandomForestClassifier \
    --model_params '{"n_estimators": 100, "max_depth": 10}' \
    --test_size 0.2 \
    --seed 42
```
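
For readers who want to see what such a run amounts to in plain scikit-learn, here is a rough equivalent of Example 3. It is a sketch under stated assumptions, not the package's implementation: the synthetic data from `make_classification` stands in for the Raisin dataset, and the helper name `run_classification` is hypothetical. It only illustrates how `--model_params`, `--test_size`, and `--seed` would typically map onto scikit-learn arguments.

```python
import json

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def run_classification(model_params_json, test_size=0.2, seed=42):
    """Hypothetical helper: train a random forest and return test accuracy."""
    # Synthetic stand-in for a real dataset (assumption, not the Raisin data).
    X, y = make_classification(n_samples=300, n_features=7, random_state=seed)

    # Hold out a fraction of the rows for testing, reproducibly.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y
    )

    # The JSON string plays the role of --model_params.
    params = json.loads(model_params_json)
    model = RandomForestClassifier(random_state=seed, **params)
    model.fit(X_train, y_train)

    return accuracy_score(y_test, model.predict(X_test))


accuracy = run_classification('{"n_estimators": 100, "max_depth": 10}')
print(f"test accuracy: {accuracy:.3f}")
```

Fixing both the split and the model's `random_state` to the same seed makes repeated runs return the same score.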

### Benchmark Machine Learning Models
Run the benchmark ML pipeline to evaluate model stability across multiple runs.

```bash
classic-ml benchmark \
    --data ./your_dataset.csv \
    --output ./final_results \
    --prelim_output ./prelim_results \
    --n_runs 10 \
    --seed 42
```
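
The idea behind a multi-run benchmark can be sketched in a few lines of plain Python. This is an illustration only: `run_benchmark` and `toy_evaluate` are hypothetical names, and deriving one seed per run from the base seed is an assumption about how `--n_runs` and `--seed` might interact, not a description of the package's internals. The toy evaluation fits a one-feature threshold rule on a random split so that the example needs no third-party libraries.

```python
import random
import statistics


def toy_evaluate(seed, n_samples=200, test_size=0.2):
    """Stand-in for one train/evaluate run; returns test accuracy."""
    rng = random.Random(seed)
    # Synthetic one-feature data: class 1 is shifted upwards by 1.5.
    data = []
    for _ in range(n_samples):
        label = rng.randint(0, 1)
        data.append((rng.gauss(1.5 * label, 1.0), label))
    n_test = int(n_samples * test_size)
    test, train = data[:n_test], data[n_test:]
    # "Training": place the threshold midway between the two class means.
    mean0 = statistics.mean(x for x, y in train if y == 0)
    mean1 = statistics.mean(x for x, y in train if y == 1)
    threshold = (mean0 + mean1) / 2
    correct = sum((x > threshold) == bool(y) for x, y in test)
    return correct / n_test


def run_benchmark(evaluate, n_runs=10, seed=42):
    """Repeat `evaluate` with a different seed per run and aggregate."""
    scores = [evaluate(seed + run) for run in range(n_runs)]
    return {
        "scores": scores,
        "mean": statistics.mean(scores),
        "std": statistics.stdev(scores),
    }


summary = run_benchmark(toy_evaluate, n_runs=10, seed=42)
print(f"accuracy: {summary['mean']:.3f} +/- {summary['std']:.3f}")
```

A small standard deviation across runs suggests the model's score does not depend much on the particular split.
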
### Train Autoencoder Model
Train and evaluate an autoencoder model for feature extraction.

```bash
classic-ml autoencoder \
    --data ./your_dataset.csv \
    --sampleID 'PatientID' \
    --output_dir ./final_results \
    --prelim_output ./prelim_results \
    --latent_dim 10 \
    --epochs 50 \
    --batch_size 32 \
    --validation_split 0.1 \
    --test_size 0.2 \
    --seed 42
```

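### Survival Clustering Analysis
Cluster patients on their features and combine the clusters with clinical data to produce Kaplan-Meier survival plots and log-rank tests.
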
```bash
classic-ml survival_clustering \
    --data_path ./latent_features.csv \
    --clinical_df_path ./clinical_data.csv \
    --save_dir ./final_results
```
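
For reference, the Kaplan-Meier estimate behind such survival plots is the running product, over event times, of `1 - d / n`, where `d` is the number of events at that time and `n` is the number of patients still at risk. The sketch below computes it in plain Python on made-up durations. It illustrates the estimator itself and is not the package's code (the dependency list suggests `lifelines` handles the real analysis); the function name `kaplan_meier` and the toy cohort are assumptions for illustration.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each observed event time.

    durations: follow-up time per patient
    events: 1 if the event (e.g. death) was observed, 0 if censored
    """
    # Distinct times at which at least one event occurred, in order.
    event_times = sorted({t for t, e in zip(durations, events) if e})
    survival = 1.0
    curve = []
    for t in event_times:
        # Patients whose follow-up reaches time t are still at risk.
        at_risk = sum(1 for d in durations if d >= t)
        deaths = sum(1 for d, e in zip(durations, events) if d == t and e)
        survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve


# Toy cohort: five patients, two of them censored (event = 0).
durations = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]
for time, prob in kaplan_meier(durations, events):
    print(f"t={time}: S(t)={prob:.3f}")
# t=2: S(t)=0.800
# t=3: S(t)=0.600
# t=5: S(t)=0.300
```

Censored patients never trigger a drop in the curve, but they do count in the at-risk denominator until their follow-up ends.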

## Command-Line Arguments

### Common Arguments
- `--data`: Path to the existing CSV file containing the dataset.
- `--output`: Directory to save the final results and plots.
- `--prelim_output`: Directory to save the preliminary results (predictions).
- `--seed`: Seed for random state (default: 42).
- `--test_size`: Fraction of data to use for testing (default: 0.2).

### Classification/Regression Command Arguments

- `--target`: Target column name in the dataset (e.g., 'label' for classification or 'price' for regression).
- `--model`: Machine learning model to use (e.g., SVC, LogisticRegression, RandomForestClassifier, LinearRegression).
- `--model_params`: Hyperparameters for the specified model in JSON format (e.g., `{"C": 1.0, "kernel": "rbf"}`).
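
Because `--model_params` is JSON rather than Python syntax, booleans and nulls must be written as `true`, `false`, and `null`. Here is a minimal sketch of how such a string could be turned into keyword arguments, using only the standard library. `parse_model_params` is a hypothetical helper, not a function exported by the package.

```python
import json


def parse_model_params(raw):
    """Parse a --model_params style JSON string into a dict of kwargs."""
    try:
        params = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"--model_params is not valid JSON: {exc}") from exc
    if not isinstance(params, dict):
        raise ValueError("--model_params must be a JSON object")
    return params


params = parse_model_params('{"C": 1.0, "kernel": "rbf", "probability": true}')
print(params)  # {'C': 1.0, 'kernel': 'rbf', 'probability': True}
```

Python-style literals such as `True` or single-quoted keys are rejected by the JSON parser, which is why the examples wrap the whole JSON string in single quotes for the shell and use double quotes inside it.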

### Autoencoder Command Arguments

- `--sampleID`: Column name representing the sample or patient ID (default: 'sampleID').
- `--latent_dim`: Dimensionality of the latent space (default: input_dim // 8).
- `--epochs`: Number of training epochs (default: 50).
- `--batch_size`: Training batch size (default: 32).
- `--validation_split`: Proportion of training data to use as validation set (default: 0.1).
- `--test_size`: Proportion of data to use as test set (default: 0.2).
- `--early_stopping`: Enable early stopping (use flag to activate).
- `--patience`: Patience for early stopping (default: 5).
- `--checkpoint`: Enable model checkpointing (use flag to activate).
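
To make the defaults concrete, the arithmetic below works out the latent size and split sizes for a hypothetical dataset of 1,000 samples with 80 features. The `input_dim // 8` default comes from the list above. The rounding conventions are assumptions about the implementation: a ceiling for the test set, as in scikit-learn's `train_test_split`, and validation rows taken from the tail of the training set, as with Keras's `validation_split`. `autoencoder_sizes` is a hypothetical helper.

```python
import math


def autoencoder_sizes(n_samples, input_dim, latent_dim=None,
                      test_size=0.2, validation_split=0.1):
    """Work out latent size and split sizes under the assumed conventions."""
    if latent_dim is None:
        latent_dim = input_dim // 8  # documented default
    # Assumed: scikit-learn style rounding for the held-out test set.
    n_test = math.ceil(n_samples * test_size)
    n_train_total = n_samples - n_test
    # Assumed: Keras style, validation is the last fraction of training data.
    n_fit = int(n_train_total * (1 - validation_split))
    n_val = n_train_total - n_fit
    return {"latent_dim": latent_dim, "fit": n_fit, "val": n_val, "test": n_test}


print(autoencoder_sizes(n_samples=1000, input_dim=80))
# {'latent_dim': 10, 'fit': 720, 'val': 80, 'test': 200}
```

Note that `--validation_split` applies to the training portion only, so a value of 0.1 here means 8% of the full dataset, not 10%.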


### Benchmark Command Arguments

- `--target`: Target column name in the dataset (default: 'label').
- `--n_runs`: Number of runs for model stability evaluation (default: 20).

### Survival Clustering Command Arguments

- `--data_path`: Path to the CSV file containing patient features.
- `--clinical_df_path`: Path to the CSV file containing clinical data.
- `--save_dir`: Directory to save the results.

## Dependencies

- Python 3.9+ (the package metadata declares `requires_python` as `<4.0,>=3.9`)
- numpy
- pandas
- scikit-learn
- matplotlib
- seaborn
- tensorflow
- lifelines
- yellowbrick
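
A quick way to confirm that the environment is complete is to check that each dependency is importable. The sketch below uses only the standard library. The mapping from distribution names to import names is the usual one for these projects (notably, `scikit-learn` imports as `sklearn`), and `missing_modules` is a hypothetical helper rather than part of the package.

```python
import importlib.util
import sys

# Distribution name -> import name for the dependencies listed above.
DEPENDENCIES = {
    "numpy": "numpy",
    "pandas": "pandas",
    "scikit-learn": "sklearn",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "tensorflow": "tensorflow",
    "lifelines": "lifelines",
    "yellowbrick": "yellowbrick",
}


def missing_modules(dependencies):
    """Return the distribution names whose import name cannot be found."""
    return [dist for dist, module in dependencies.items()
            if importlib.util.find_spec(module) is None]


print("Python", ".".join(map(str, sys.version_info[:3])))
missing = missing_modules(DEPENDENCIES)
print("missing:", ", ".join(missing) if missing else "none")
```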

## License 
This project is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. See the LICENSE file for details.

## Author
Vatsal Patel - VatsalPatel18
            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "rag-classic-ml",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<4.0,>=3.9",
    "maintainer_email": null,
    "keywords": null,
    "author": "VatsalPatel18",
    "author_email": "vatsal1804@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/39/31/91ece7b2d386aba561cf668fdbc44b0d99f30d9189de007e5f17ee9689d0/rag_classic_ml-0.1.4.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "Classical Machine Learning methods for Reterival Augmented Generation",
    "version": "0.1.4",
    "project_urls": null,
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "9d3cb1d42881e3bfe477925111c9048303332ba148cdfd42567a11e6ddaa96d6",
                "md5": "69ef666e265a83b28a16b3690d5a6088",
                "sha256": "6b1a6192ca3b0d5ecb7361120d39931e3da8a08fc1f1381868fe121a03f1e31f"
            },
            "downloads": -1,
            "filename": "rag_classic_ml-0.1.4-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "69ef666e265a83b28a16b3690d5a6088",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<4.0,>=3.9",
            "size": 93412,
            "upload_time": "2024-09-20T21:53:33",
            "upload_time_iso_8601": "2024-09-20T21:53:33.268124Z",
            "url": "https://files.pythonhosted.org/packages/9d/3c/b1d42881e3bfe477925111c9048303332ba148cdfd42567a11e6ddaa96d6/rag_classic_ml-0.1.4-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "393191ece7b2d386aba561cf668fdbc44b0d99f30d9189de007e5f17ee9689d0",
                "md5": "43e308ce93f747bb7b1f31106eaf841b",
                "sha256": "642a813ab9f4e8f396963ba34076b0f95fae7f622e015d8e5c1636be7221d474"
            },
            "downloads": -1,
            "filename": "rag_classic_ml-0.1.4.tar.gz",
            "has_sig": false,
            "md5_digest": "43e308ce93f747bb7b1f31106eaf841b",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<4.0,>=3.9",
            "size": 82198,
            "upload_time": "2024-09-20T21:53:36",
            "upload_time_iso_8601": "2024-09-20T21:53:36.961065Z",
            "url": "https://files.pythonhosted.org/packages/39/31/91ece7b2d386aba561cf668fdbc44b0d99f30d9189de007e5f17ee9689d0/rag_classic_ml-0.1.4.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-09-20 21:53:36",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "rag-classic-ml"
}
        