| Field | Value |
| --- | --- |
| Name | vtacML |
| Version | 0.1.20 |
| home_page | https://github.com/jerbeario/VTAC_ML |
| Summary | A machine learning pipeline to classify objects in VTAC dataset as GRB or not. |
| upload_time | 2024-09-02 14:44:38 |
| maintainer | None |
| docs_url | None |
| author | Jeremy Palmerio |
| requires_python | >=3.10 |
| license | MIT |
| requirements | No requirements were recorded. |
# vtacML
vtacML is a machine learning package designed for the analysis of data from the Visible Telescope (VT) on the SVOM mission. This package uses machine learning models to analyze a dataframe of features from VT observations and identify potential gamma-ray burst (GRB) candidates. The primary goal of vtacML is to integrate into the SVOM data analysis pipeline and add a feature to each observation indicating the probability that it is a GRB candidate.
## Table of Contents
- [Overview](#overview)
- [Installation](#installation)
- [Usage](#usage)
  - [Quick Start](#quick-start)
  - [Grid Search and Model Training](#grid-search-and-model-training)
  - [Loading and Using the Best Model](#loading-and-using-the-best-model)
  - [Using Pre-trained Model for Immediate Prediction](#using-pre-trained-model-for-immediate-prediction)
  - [Config File](#config-file)
- [Documentation](#documentation)
- [License](#license)
- [Contact](#contact)
## Overview
The SVOM mission, a collaboration between the China National Space Administration (CNSA) and the French space agency CNES, aims to study gamma-ray bursts (GRBs), the most energetic explosions in the universe. The Visible Telescope (VT) on SVOM plays a critical role in observing these events in the optical wavelength range.
vtacML leverages machine learning to analyze VT data, providing a probability score for each observation to indicate its likelihood of being a GRB candidate. The package includes tools for data preprocessing, model training, evaluation, and visualization.
## Installation
To install vtacML, you can use `pip`:
```sh
pip install vtacML
```
Alternatively, you can clone the repository and install the package locally:
```sh
git clone https://github.com/jerbeario/vtacML.git
cd vtacML
pip install .
```
## Usage
### Quick Start
Here’s a quick example to get you started with vtacML:
```python
from vtacML.pipeline import VTACMLPipe
# Initialize the pipeline
pipeline = VTACMLPipe()
# Load configuration
pipeline.load_config('path/to/config.yaml')
# Train the model
pipeline.train()
# Evaluate the model
pipeline.evaluate('evaluation_name', plot=True)
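# Predict GRB candidates; `observation_dataframe` is a dataframe of VT features,
# one row per observation (it is not defined in this snippet)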
predictions = pipeline.predict(observation_dataframe, prob=True)
print(predictions)
```
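The snippets in this README pass an `observation_dataframe` to `predict` without defining it. A small pre-flight check like the one below can catch missing feature columns before prediction. This is only a sketch: the column list is copied from the default config file shown in the Config File section, and `missing_columns` is a hypothetical helper, not part of vtacML.

```python
# Feature columns copied from the default config file in this README.
REQUIRED_COLUMNS = [
    "MAGCAL_R0", "MAGCAL_B0", "MAGERR_R0", "MAGERR_B0",
    "MAGCAL_R1", "MAGCAL_B1", "MAGERR_R1", "MAGERR_B1",
    "MAGVAR_R1", "MAGVAR_B1",
    "EFLAG_R0", "EFLAG_R1", "EFLAG_B0", "EFLAG_B1",
    "NEW_SRC", "DMAG_CAT",
]


def missing_columns(available, required=REQUIRED_COLUMNS):
    """Return the required column names absent from `available`, in config order.

    Hypothetical helper (not part of vtacML). `available` can be any iterable
    of column names, for example the `.columns` attribute of a dataframe.
    """
    present = set(available)
    return [name for name in required if name not in present]


# Example with a plain list of names standing in for a dataframe's columns.
example_columns = ["MAGCAL_R0", "MAGCAL_B0", "NEW_SRC"]
print(missing_columns(example_columns))
```

An empty list means every feature named in the default config is present. A custom config with a different `columns` entry would need its own list.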
### Grid Search and Model Training
vtacML can perform grid search on a large array of models and parameters specified in the configuration file. Initialize the `VTACMLPipe` class with a specified config file (or use the default) and train it. Then, you can save the best model for future use.
```python
from vtacML.pipeline import VTACMLPipe
# Initialize the pipeline with a configuration file
pipeline = VTACMLPipe(config_file='path/to/config.yaml')
# Train the model with grid search
pipeline.train()
# Save the best model
pipeline.save_best_model('path/to/save/best_model.pkl')
```
### Loading and Using the Best Model
After training and saving the best model, you can create a new instance of the `VTACMLPipe` class and load the best model for further use.
```python
from vtacML.pipeline import VTACMLPipe
# Initialize a new pipeline instance
pipeline = VTACMLPipe()
# Load the best model
pipeline.load_best_model('path/to/save/best_model.pkl')
# Predict GRB candidates
predictions = pipeline.predict(observation_dataframe, prob=True)
print(predictions)
```
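With `prob=True`, the predictions are described as probabilities, so a downstream step usually has to turn them into a yes/no candidate flag. The sketch below assumes the output can be read as one GRB probability per observation; the exact return format of `predict` is not specified here. Both `flag_candidates` and the 0.5 default threshold are illustrative assumptions, not part of vtacML.

```python
def flag_candidates(probabilities, threshold=0.5):
    """Return the positions of observations whose GRB probability is >= threshold.

    Hypothetical helper: assumes `probabilities` is a flat sequence with one
    GRB probability per observation.
    """
    return [i for i, p in enumerate(probabilities) if p >= threshold]


# Made-up scores, for illustration only.
scores = [0.02, 0.91, 0.47, 0.66]
print(flag_candidates(scores, threshold=0.6))  # prints [1, 3]
```

A higher threshold trades missed candidates for fewer false alarms, so the right value depends on how follow-up observations are prioritized.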
### Using Pre-trained Model for Immediate Prediction
If you already have a trained model, you can use the quick wrapper function `predict_from_best_pipeline` to predict data immediately. A pre-trained model is available by default.
```python
from vtacML.pipeline import predict_from_best_pipeline
# Predict GRB candidates using the pre-trained model
predictions = predict_from_best_pipeline(observation_dataframe, model_path='path/to/pretrained_model.pkl')
print(predictions)
```
### Config File
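The config file controls the model search: it specifies the input data and feature columns, the candidate models with their hyperparameter grids, and the output directories.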
```yaml
# Default config file, used to search for best model using only first two sequences (X0, X1) from the VT pipeline
Inputs:
  file: 'combined_qpo_vt_all_cases_with_GRB_with_flags.parquet' # Data file used for training. Located in /data/
#  path: 'combined_qpo_vt_with_GRB.parquet'
#  path: 'combined_qpo_vt_faint_case_with_GRB_with_flags.parquet'
  columns: [
    "MAGCAL_R0",
    "MAGCAL_B0",
    "MAGERR_R0",
    "MAGERR_B0",
    "MAGCAL_R1",
    "MAGCAL_B1",
    "MAGERR_R1",
    "MAGERR_B1",
    "MAGVAR_R1",
    "MAGVAR_B1",
    'EFLAG_R0',
    'EFLAG_R1',
    'EFLAG_B0',
    'EFLAG_B1',
    "NEW_SRC",
    "DMAG_CAT"
  ] # features used for training
  target_column: 'IS_GRB' # feature column that holds the class information to be predicted

# Set of models and parameters to perform GridSearchCV over
Models:
  rfc:
    class: RandomForestClassifier()
    param_grid:
      'rfc__n_estimators': [100, 200, 300] # Number of trees in the forest
      'rfc__max_depth': [4, 6, 8] # Maximum depth of the tree
      'rfc__min_samples_split': [2, 5, 10] # Minimum number of samples required to split an internal node
      'rfc__min_samples_leaf': [1, 2, 4] # Minimum number of samples required to be at a leaf node
      'rfc__bootstrap': [True, False] # Whether bootstrap samples are used when building trees
  ada:
    class: AdaBoostClassifier()
    param_grid:
      'ada__n_estimators': [50, 100, 200] # Number of weak learners
      'ada__learning_rate': [0.01, 0.1, 1] # Learning rate
      'ada__algorithm': ['SAMME'] # Algorithm for boosting
  svc:
    class: SVC()
    param_grid:
      'svc__C': [0.1, 1, 10, 100] # Regularization parameter
      'svc__kernel': ['poly', 'rbf', 'sigmoid'] # Kernel type to be used in the algorithm
      'svc__gamma': ['scale', 'auto'] # Kernel coefficient
      'svc__degree': [3, 4, 5] # Degree of the polynomial kernel function (if `kernel` is 'poly')
  knn:
    class: KNeighborsClassifier()
    param_grid:
      'knn__n_neighbors': [3, 5, 7, 9] # Number of neighbors to use
      'knn__weights': ['uniform', 'distance'] # Weight function used in prediction
      'knn__algorithm': ['ball_tree', 'kd_tree', 'brute'] # Algorithm used to compute the nearest neighbors
      'knn__p': [1, 2] # Power parameter for the Minkowski metric
  lr:
    class: LogisticRegression()
    param_grid:
      'lr__penalty': ['l1', 'l2', 'elasticnet'] # Specify the norm of the penalty
      'lr__C': [0.01, 0.1, 1, 10] # Inverse of regularization strength
      'lr__solver': ['newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga'] # Algorithm to use in the optimization problem
      'lr__max_iter': [100, 200, 300] # Maximum number of iterations taken for the solvers to converge
  dt:
    class: DecisionTreeClassifier()
    param_grid:
      'dt__criterion': ['gini', 'entropy'] # The function to measure the quality of a split
      'dt__splitter': ['best', 'random'] # The strategy used to choose the split at each node
      'dt__max_depth': [4, 6, 8, 10] # Maximum depth of the tree
      'dt__min_samples_split': [2, 5, 10] # Minimum number of samples required to split an internal node
      'dt__min_samples_leaf': [1, 2, 4] # Minimum number of samples required to be at a leaf node

# Output directories
Outputs:
  model_path: '/output/models'
  viz_path: '/output/visualizations/'
  plot_correlation:
    flag: True
    path: 'output/corr_plots/'
```
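To get a feel for how large the default search is, the grids above can be expanded by hand. The sketch below copies the `param_grid` values from the default config and counts the combinations per model. The `rfc__`-style prefixes match scikit-learn's `<step>__<parameter>` naming for pipeline steps, which suggests (as an assumption about vtacML internals) that each model runs as a named pipeline step. `grid_size` and `iter_candidates` are hypothetical helpers, not vtacML functions.

```python
from itertools import product
from math import prod

# Hyperparameter grids copied from the default config above.
PARAM_GRIDS = {
    "rfc": {
        "rfc__n_estimators": [100, 200, 300],
        "rfc__max_depth": [4, 6, 8],
        "rfc__min_samples_split": [2, 5, 10],
        "rfc__min_samples_leaf": [1, 2, 4],
        "rfc__bootstrap": [True, False],
    },
    "ada": {
        "ada__n_estimators": [50, 100, 200],
        "ada__learning_rate": [0.01, 0.1, 1],
        "ada__algorithm": ["SAMME"],
    },
    "svc": {
        "svc__C": [0.1, 1, 10, 100],
        "svc__kernel": ["poly", "rbf", "sigmoid"],
        "svc__gamma": ["scale", "auto"],
        "svc__degree": [3, 4, 5],
    },
    "knn": {
        "knn__n_neighbors": [3, 5, 7, 9],
        "knn__weights": ["uniform", "distance"],
        "knn__algorithm": ["ball_tree", "kd_tree", "brute"],
        "knn__p": [1, 2],
    },
    "lr": {
        "lr__penalty": ["l1", "l2", "elasticnet"],
        "lr__C": [0.01, 0.1, 1, 10],
        "lr__solver": ["newton-cg", "lbfgs", "liblinear", "sag", "saga"],
        "lr__max_iter": [100, 200, 300],
    },
    "dt": {
        "dt__criterion": ["gini", "entropy"],
        "dt__splitter": ["best", "random"],
        "dt__max_depth": [4, 6, 8, 10],
        "dt__min_samples_split": [2, 5, 10],
        "dt__min_samples_leaf": [1, 2, 4],
    },
}


def grid_size(param_grid):
    """Number of hyperparameter combinations in one grid."""
    return prod(len(values) for values in param_grid.values())


def iter_candidates(param_grid):
    """Yield every combination in the grid as a {parameter: value} dict."""
    names = list(param_grid)
    for combo in product(*(param_grid[name] for name in names)):
        yield dict(zip(names, combo))


for model_name, grid in PARAM_GRIDS.items():
    print(f"{model_name}: {grid_size(grid)} combinations")
print("total:", sum(grid_size(grid) for grid in PARAM_GRIDS.values()))
```

With the default grids this comes to 615 combinations in total, led by `lr` (180) and `rfc` (162). A cross-validated search fits each combination once per fold, so trimming the largest grids is the quickest way to shorten training.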
## Documentation
See documentation at
### Setting Up Development Environment
To set up a development environment, you can use the provided `requirements-dev.txt`:
```sh
conda create --name vtacML-dev python=3.10
conda activate vtacML-dev
pip install -r requirements-dev.txt
```
### Running Tests
To run tests, use the following command:
```sh
pytest
```
## License
This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for more details.
## Contact
For questions or support, please contact:
- Jeremy Palmerio - [palmerio.jeremy@gmail.com](mailto:palmerio.jeremy@gmail.com)
- Project Link: [https://github.com/jerbeario/VTAC_ML](https://github.com/jerbeario/VTAC_ML)
Raw data
{
"_id": null,
"home_page": "https://github.com/jerbeario/VTAC_ML",
"name": "vtacML",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.10",
"maintainer_email": null,
"keywords": null,
"author": "Jeremy Palmerio",
"author_email": "jeremypalmerio05@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/fd/f2/11e502537c78d5c01a7c41cd9faa884291ae9bda4ef063dc3d7f87db9813/vtacml-0.1.20.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "A machine learning pipeline to classify objects in VTAC dataset as GRB or not.",
"version": "0.1.20",
"project_urls": {
"Homepage": "https://github.com/jerbeario/VTAC_ML"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "1e778f487632c6d815da20832da849767c5d7c47f4882f85a8899b90fbdb0adc",
"md5": "c22d85d60cc4c3c93603b95870104edd",
"sha256": "4899f2ce5f92a5ee0e6dfdb748ee89a0a2a901069ac536f1c6ef846036765d09"
},
"downloads": -1,
"filename": "vtacML-0.1.20-py3-none-any.whl",
"has_sig": false,
"md5_digest": "c22d85d60cc4c3c93603b95870104edd",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.10",
"size": 32286668,
"upload_time": "2024-09-02T14:44:33",
"upload_time_iso_8601": "2024-09-02T14:44:33.523438Z",
"url": "https://files.pythonhosted.org/packages/1e/77/8f487632c6d815da20832da849767c5d7c47f4882f85a8899b90fbdb0adc/vtacML-0.1.20-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "fdf211e502537c78d5c01a7c41cd9faa884291ae9bda4ef063dc3d7f87db9813",
"md5": "903823dcaf754079f3b3a208f3b37464",
"sha256": "2b840d1a6d786cfa6686fea6cc67b3bf73d06487da93fc2dbaf784d93433305c"
},
"downloads": -1,
"filename": "vtacml-0.1.20.tar.gz",
"has_sig": false,
"md5_digest": "903823dcaf754079f3b3a208f3b37464",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.10",
"size": 31871473,
"upload_time": "2024-09-02T14:44:38",
"upload_time_iso_8601": "2024-09-02T14:44:38.771620Z",
"url": "https://files.pythonhosted.org/packages/fd/f2/11e502537c78d5c01a7c41cd9faa884291ae9bda4ef063dc3d7f87db9813/vtacml-0.1.20.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-09-02 14:44:38",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "jerbeario",
"github_project": "VTAC_ML",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "vtacml"
}