# lmur

- **Name**: lmur
- **Version**: 2.1.1
- **Summary**: Neural Network Dataset
- **Requires Python**: >=3.10
- **License**: MIT License, Copyright (c) 2024- ABrain One and contributors
- **Author email**: ABrain One and contributors (AI@ABrain.one)
- **Homepage**: https://ABrain.one
- **Repository**: https://github.com/ABrain-One/nn-dataset
- **Upload time**: 2025-10-27 09:53:42
- **Keywords**: neural network, dataset, experiment archive, hyperparameters, autoML, LLM integration, model validation, benchmarking, python library, AI research
            ## <img src='https://abrain.one/img/lemur-nn-icon-64x64.png' width='32px'/> Neural Network Dataset 
<sub><a href='https://pypi.python.org/pypi/nn-dataset'><img src='https://img.shields.io/pypi/v/nn-dataset.svg'/></a> <a href="https://pepy.tech/project/nn-dataset"><img alt="GitHub release" src="https://static.pepy.tech/badge/nn-dataset"></a><br/>
short alias  <a href='https://pypi.python.org/pypi/lmur'>lmur</a></sub>
   
LEMUR - Learning, Evaluation, and Modeling for Unified Research

<img src='https://abrain.one/img/lemur-nn-whit.jpg' width='25%'/>

The original version of the <a href='https://github.com/ABrain-One/nn-dataset'>LEMUR dataset</a> was created by <strong>Arash Torabi Goodarzi, Roman Kochnev</strong> and <strong>Zofia Antonina Bentyn</strong> at the Computer Vision Laboratory, University of Würzburg, Germany.

## Contents

1. [πŸ“– Overview](#-overview)
2. [Installation](#installation-or-update-of-the-nn-dataset-with-pip)
   - [Create and Activate a Virtual Environment](#create-and-activate-a-virtual-environment-recommended)
   - [Installation or Update of the NN Dataset with pip](#installation-or-update-of-the-nn-dataset-with-pip)
3. [Usage](#usage)
   - [Standard Use Cases](#standard-use-cases)
   - [Reproducing Results with Fixed Training Parameters](#reproducing-results-with-fixed-training-parameters)
   - [View Supported Flags](#to-view-supported-flags)
4. [πŸ’» API: Programmatic Access](#-api-programmatic-access)
   - [Why the API is Important](#why-the-api-is-important)
   - [Data Extraction and Mechanism](#data-extraction-and-mechanism)
   - [The `data()` Function for Data Retrieval](#1-the-data-function-for-data-retrieval)
   - [The `check_nn()` Function for NN Validation](#2-the-check_nn-function-for-nn-validation)
   - [πŸš€ Get Started: Build Smarter, Train Less](#-get-started-build-smarter-train-less)
5. [🐳 Docker](#-docker)
   - [Example of Training LEMUR Neural Network within AI Linux Container](#example-of-training-lemur-neural-network-within-ai-linux-container-linux-host)
6. [Environment for NN Dataset Contributors](#environment-for-nn-dataset-contributors)
   - [Pip Package Manager](#pip-package-manager)
7. [Contribution](#contribution)
   - [Adding a New Neural Network Model](#adding-a-new-neural-network-model)
8. [Available Modules](#available-modules)
9. [Citation](#citation)
10. [Licenses](#licenses)
    
## 📖 Overview

The NN Dataset project provides flexibility for dynamically combining various deep learning tasks, datasets, metrics, and neural network models. It is designed to facilitate the verification of neural network performance under various combinations of training hyperparameters and data transformation algorithms by automatically generating performance statistics. Developed to support the <a href='https://github.com/ABrain-One/nn-gpt'>NNGPT</a> project, this dataset contains neural network models modified or generated by NNGPT's large language models, with names featuring alphanumeric postfixes (e.g., C10C-ResNetTransformer-e2b49b871c8b9a9014277a51b2348999).

## Create and Activate a Virtual Environment (recommended)
For Linux/Mac:
   ```bash
   python3 -m venv .venv
   source .venv/bin/activate
   python -m pip install --upgrade pip
   ```
For Windows:
   ```bash
   python -m venv .venv
   .venv\Scripts\activate
   python -m pip install --upgrade pip
   ```

It is assumed that CUDA 12.6 is installed; otherwise, consider replacing 'cu126' with the appropriate version. Some neural network training tasks require GPUs with at least 24 GB of memory.

## Installation or Update of the NN Dataset with pip
Remove an old version of the LEMUR Dataset and its database:
```bash
pip uninstall nn-dataset -y
rm -rf db
```
Installing the stable version:
```bash
pip install --no-cache-dir nn-dataset --upgrade --extra-index-url https://download.pytorch.org/whl/cu126
```
Installing from GitHub to get the most recent code and statistics updates:
```bash
pip install git+https://github.com/ABrain-One/nn-dataset --upgrade --force --extra-index-url https://download.pytorch.org/whl/cu126
```
Adding functionality to export data to Excel files and generate plots for <a href='https://github.com/ABrain-One/nn-stat'>analyzing neural network performance</a>:
```bash
pip install nn-dataset[stat] --upgrade --extra-index-url https://download.pytorch.org/whl/cu126
```
and export/generate:
```bash
python -m ab.stat.export
```

## Usage

Standard use cases:

Run the automated training process for a specific model (e.g., the ComplexNet image classification pipeline on CIFAR-10):
```bash
python -m ab.nn.train -c img-classification_cifar-10_acc_ComplexNet
```
or for all image segmentation models, using a fixed range of training parameters and a specific transformer:
```bash
. train.sh -c img-segmentation -f echo --min_learning_rate 1e-4 -l 1e-2 --min_momentum 0.8 -m 0.99 --min_batch_binary_power 2 -b 6
```
`train.sh` internally calls `ab.nn.train`, offering a shorter way to run the program. Both scripts accept the same input flags and can be used interchangeably.

##### Reproducing Results with Fixed Training Parameters

To reproduce previously obtained results, provide fixed values for the training parameters in JSON format. The parameter names should match those returned by the <strong>supported_hyperparameters()</strong> function of the NN model.

Example command:

```bash
. train.sh -c img-classification_cifar-10_acc_ComplexNet -f complex -p '{"lr": 0.017, "momentum": 0.022, "batch": 32}'
```

where:

- `-c` specifies the training pipeline,
- `-f` selects the preprocessing algorithm,
- `-p` sets the hyperparameters explicitly (e.g., learning rate, momentum, batch size) as a JSON string.


##### To view supported flags:
```bash
. train.sh -h
```

**Add your new neural network model to the `ab/nn/nn` directory and proceed with your experiments (see [Contribution](#contribution) for details).**


## πŸ’» API: Programmatic Access

The **LEMUR NN Dataset API** (`ab.nn.api`) is the dedicated programmatic interface for both querying validated deep learning experiment data and submitting new neural network configurations for automatic training and archival. It is the essential layer supporting modern AutoML systems, including the NNGPT framework.

### Why the API is Important

The API solves the problem of **costly and time-consuming model validation**. By providing two distinct and powerful functions, it transforms the bottleneck of "waiting for results" into two key steps: instant query and automated validation.

1.  **Enables Predictive Models:** Access to the full historical data allows researchers to train **performance prediction models** that can estimate a model's final accuracy *before* any training begins, saving massive amounts of compute time.
2.  **Facilitates LLM Feedback:** The API acts as the crucial feedback mechanism for LLMs (like NNGPT). Generated architectures are validated via `check_nn`, and the results are immediately fed back into the dataset via `data()`, enabling the LLM to iteratively improve its quality based on its own outputs.

### Data Extraction and Mechanism

The core value of the API is the ability to retrieve complete, validated experimental records and submit new code for verification.

#### 1. The `data()` Function for Data Retrieval

```python
def data(...) -> pandas.DataFrame
```

| Data Type Extracted | DataFrame Column Name | Description |
| :--- | :--- | :--- |
| **Model Python Code** | `'nn_code'` | The **exact Python code (as a string)** defining the neural network's architecture. |
| **Hyperparameters** | `'prm'` | The **exact dictionary of hyperparameters** (e.g., `{'lr': 0.01, 'momentum': 0.9}`) used for this specific run. |
| **Performance Metric** | `'accuracy'` | The **metric value** (e.g., accuracy) achieved in the experiment, recorded at the `'epoch'` specified. |
| **Execution Time** | `'duration'` | The wall-clock time required for the training run, in nanoseconds (ns). |

**Mechanism:** Users filter the database using optional arguments (`task`, `dataset`, `nn`, etc.). The returned DataFrame allows external programs (such as statistical models or benchmark scripts) to easily consume the structured data for large-scale analysis. The optional `only_best_accuracy=True` ensures efficiency by returning only the best-performing trial for each unique configuration.
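
A hedged usage sketch (the keyword names `task`, `dataset`, and `only_best_accuracy` follow the description above; the remaining parameters of `data()` are elided in the signature and left at their defaults here):

```python
from ab.nn.api import data

# Best-performing trial per configuration for CIFAR-10 image classification.
df = data(task='img-classification', dataset='cifar-10',
          only_best_accuracy=True)

# Each row carries the exact model code, hyperparameters, and result.
top = df.sort_values('accuracy', ascending=False).head(5)
for _, row in top.iterrows():
    print(f"{row['accuracy']:.4f}  {row['prm']}")
```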

#### 2. The `check_nn()` Function for NN Validation

```python
def check_nn(nn_code: str, task: str, dataset: str, metric: str, prm: dict, ...) -> tuple[str, float, float, float]
```

This function is the **submission endpoint** for new models.

1.  **Input:** An external program (e.g., an LLM agent) provides the new model's `nn_code` (as a string), the `prm` dictionary, and the context (`task`, `dataset`, `metric`).
2.  **Process:** The function automatically initiates the full training pipeline, running the code under standardized conditions for a set duration (`epoch_limit_minutes`).
3.  **Output:** It returns a tuple containing the key validated metrics, ready for consumption by an LLM or an external optimization loop:
    * **NN Model Name (`str`):** An automatically generated unique ID for the archived model.
    * **Accuracy (`float`):** The measured final performance.
    * **Accuracy to Time Metric (`float`):** A single metric balancing performance against compute efficiency.
    * **Quality of the Code Metric (`float`):** A score assessing the structural integrity of the submitted code.
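
A hedged invocation sketch (argument names follow the signature above; `MyNet.py` is a hypothetical file holding the candidate model's code, and the trailing parameters elided as `...` in the signature are left at their defaults):

```python
from ab.nn.api import check_nn

# Hypothetical file containing the model's Python code as a string.
nn_code = open('MyNet.py').read()
prm = {'lr': 0.01, 'momentum': 0.9, 'batch': 32}

name, accuracy, accuracy_to_time, code_quality = check_nn(
    nn_code,
    task='img-classification',
    dataset='cifar-10',
    metric='acc',
    prm=prm,
)
print(f'Archived as {name}: accuracy={accuracy:.4f}, '
      f'code quality={code_quality:.2f}')
```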

#### πŸš€ Get Started: Build Smarter, Train Less

The LEMUR API is designed for artificial agents as well as for students and scientists. Using `data()` provides immediate access to **validated performance data derived from extensive computations**. Instead of dedicating weeks of expensive hardware time to replicating known results or blindly testing configurations, you can now:
1.  **Data Access at Scale:** Instantly retrieve performance benchmarks validated across a **large number** of diverse architectural and hyperparameter configurations.
2.  **Focus on Generation:** Use `check_nn()` to automate the validation of your new, unique architectures.
3.  **Computational Efficiency:** Prioritize allocation of high-cost computational resources (GPU/TPU) exclusively toward training novel architectures.
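
Putting both functions together, here is a minimal sketch of the generate-validate-refresh loop described in this section; `generate_candidate` is a hypothetical placeholder for an LLM-based generator and is not part of `ab.nn.api`:

```python
from ab.nn.api import data, check_nn

def generate_candidate(history):
    """Hypothetical placeholder: an LLM would produce new model code
    (a string) and a hyperparameter dict conditioned on past results."""
    raise NotImplementedError

# Start from the validated history of best trials.
history = data(task='img-classification', dataset='cifar-10',
               only_best_accuracy=True)

for _ in range(10):
    nn_code, prm = generate_candidate(history)
    # Validate the candidate under standardized training conditions.
    name, accuracy, acc_to_time, code_quality = check_nn(
        nn_code, task='img-classification', dataset='cifar-10',
        metric='acc', prm=prm)
    # The result is archived in the dataset; refresh the history so the
    # next generation step can learn from its own outcomes.
    history = data(task='img-classification', dataset='cifar-10',
                   only_best_accuracy=True)
```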

## 🐳 Docker

All versions of this project are compatible with <a href='https://hub.docker.com/r/abrainone/ai-linux' target='_blank'>AI Linux</a> and can be seamlessly executed within the AI Linux Docker container.

<h4>Example of training LEMUR neural network within the AI Linux container (Linux host):</h4>

Installing the latest version of the project from GitHub:
```bash
docker run --rm -u $(id -u):ab -v $(pwd):/a/mm abrainone/ai-linux:cv bash -c "[ -d nn-dataset ] && git -C nn-dataset pull || git -c advice.detachedHead=false clone --depth 1 https://github.com/ABrain-One/nn-dataset"
```

Running a quick training script:
```bash
docker run --rm -u $(id -u):ab --shm-size=16G -v $(pwd)/nn-dataset:/a/mm abrainone/ai-linux:cv bash -c ". train.sh -c img-classification_cifar-10_acc_ComplexNet -f complex -l 0.017 --min_learning_rate 0.013 -m 0.025 --min_momentum 0.022 -b 7 --min_batch_binary_power 8 --max_batch_binary_power 9"
```

If recently added dependencies are missing in the <a href='https://hub.docker.com/r/abrainone/ai-linux' target='_blank'>AI Linux</a>, you can create a container from the Docker image `abrainone/ai-linux:cv`, install the missing packages (preferably using `pip install <package name>`), and then create a new image from the container using `docker commit <container name> <new image name>`. You can use this new image locally or push it to the registry for deployment on the computer cluster.


## Environment for NN Dataset Contributors
### Pip package manager
Create a virtual environment, activate it, and run the following command to install all the project dependencies:
```bash
python -m pip install --upgrade pip
pip install -r requirements.txt --extra-index-url https://download.pytorch.org/whl/cu126
```

## Contribution

To contribute a new neural network (NN) model to the NN Dataset, please ensure the following criteria are met:

1. The code for each model is provided in a respective ".py" file within the <strong>/ab/nn/nn</strong> directory, and the file is named after the model's structure.
2. The main class for each model is named <strong>Net</strong>.
3. The constructor of the <strong>Net</strong> class takes the following parameters:
   - <strong>in_shape</strong> (tuple): The shape of the first tensor from the dataset iterator. For images it is structured as `(batch, channel, height, width)`.
   - <strong>out_shape</strong> (tuple): Provided by the dataset loader, it describes the shape of the output tensor. For a classification task, this could be `(number of classes,)`.
   - <strong>prm</strong> (dict): A dictionary of hyperparameters, e.g., `{'lr': 0.24, 'momentum': 0.93, 'dropout': 0.51}`.
   - <strong>device</strong> (torch.device): PyTorch device used for training the model.
4. All external information required for the correct building and training of the NN model for a specific dataset/transformer, as well as the list of hyperparameters, is extracted from <strong>in_shape</strong>, <strong>out_shape</strong>, or <strong>prm</strong>, e.g.: <br/>`batch = in_shape[0]` <br/>`channel_number = in_shape[1]` <br/>`image_size = in_shape[2]` <br/>`class_number = out_shape[0]` <br/>`learning_rate = prm['lr']` <br/>`momentum = prm['momentum']` <br/>`dropout = prm['dropout']`.
5. Every model script has a function returning the set of supported hyperparameters, e.g.: <br/>`def supported_hyperparameters(): return {'lr', 'momentum', 'dropout'}`<br/> The value of each hyperparameter lies within the range 0.0 to 1.0.
6. Every <strong>Net</strong> class implements two functions: <br/>`train_setup(self, prm)`<br/> and <br/>`learn(self, train_data)`<br/> The first function initializes the `criteria` and `optimizer`, while the second implements the training pipeline. See a simple implementation in the <a href="https://github.com/ABrain-One/nn-dataset/blob/main/ab/nn/nn/AlexNet.py">AlexNet model</a>.
7. For each pull request involving a new NN model, please generate and submit training statistics for 100 Optuna trials (or at least 3 trials for very large models) in the <strong>ab/nn/stat</strong> directory. The trials should cover 5 epochs of training. Ensure that these statistics are included along with the model in your pull request. For example, the statistics for the ComplexNet model are stored in files <strong>&#x003C;epoch number&#x003E;.json</strong> inside the folder <strong>img-classification_cifar-10_acc_ComplexNet</strong> and can be generated by:<br/>
```bash
python run.py -c img-classification_cifar-10_acc_ComplexNet -t 100 -e 5
```
<p>See more examples of models in <code>/ab/nn/nn</code> and generated statistics in <code>/ab/nn/stat</code>.</p>
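
For orientation, below is a minimal skeleton satisfying criteria 1-6 above. The layer choices are purely illustrative, and subclassing `torch.nn.Module` follows the linked AlexNet example rather than a stated requirement:

```python
import torch
import torch.nn as nn


def supported_hyperparameters():
    # Criterion 5: each hyperparameter value is expected in the range 0.0-1.0.
    return {'lr', 'momentum', 'dropout'}


class Net(nn.Module):
    def __init__(self, in_shape, out_shape, prm, device):
        super().__init__()
        self.device = device
        # Criterion 4: everything needed is derived from the arguments.
        channels, height, width = in_shape[1], in_shape[2], in_shape[3]
        classes = out_shape[0]
        self.model = nn.Sequential(
            nn.Flatten(),
            nn.Linear(channels * height * width, 128),
            nn.ReLU(),
            nn.Dropout(prm['dropout']),
            nn.Linear(128, classes),
        ).to(device)

    def forward(self, x):
        return self.model(x)

    def train_setup(self, prm):
        # Criterion 6: initialize the criteria and optimizer.
        self.criteria = nn.CrossEntropyLoss().to(self.device)
        self.optimizer = torch.optim.SGD(
            self.parameters(), lr=prm['lr'], momentum=prm['momentum'])

    def learn(self, train_data):
        # Criterion 6: one pass over the training data.
        self.train()
        for inputs, labels in train_data:
            inputs, labels = inputs.to(self.device), labels.to(self.device)
            self.optimizer.zero_grad()
            loss = self.criteria(self(inputs), labels)
            loss.backward()
            self.optimizer.step()
```

Saved as, e.g., `ab/nn/nn/MySimpleNet.py` (a hypothetical name), such a model can then be trained through the pipeline described in [Usage](#usage).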

## Available Modules

The `NN Dataset` includes the following key modules within the **<a href='https://github.com/ABrain-One/nn-dataset/tree/main/ab/nn'>ab.nn</a>** package:
- **<a href='https://github.com/ABrain-One/nn-dataset/tree/main/ab/nn/nn'>nn</a>**: Predefined neural network architectures, including models like `AlexNet`, `ResNet`, `VGG`, and more.
- **<a href='https://github.com/ABrain-One/nn-dataset/tree/main/ab/nn/loader'>loader</a>**: Data loading utilities for popular datasets such as CIFAR-10, COCO, and others.
- **<a href='https://github.com/ABrain-One/nn-dataset/tree/main/ab/nn/metric'>metric</a>**: Evaluation metrics supported for model assessment, such as accuracy, Intersection over Union (IoU), and others.
- **<a href='https://github.com/ABrain-One/nn-dataset/tree/main/ab/nn/transform'>transform</a>**: A collection of data transformation algorithms for dataset preprocessing and augmentation.
- **<a href='https://github.com/ABrain-One/nn-dataset/tree/main/ab/nn/stat'>stat</a>**: Statistics for different neural network model training pipelines.
- **<a href='https://github.com/ABrain-One/nn-dataset/tree/main/ab/nn/util'>util</a>**: Utility functions designed to assist with training, model evaluation, and statistical analysis.

## Citation

If you find the LEMUR Neural Network Dataset to be useful for your research, please consider citing our <a target='_blank' href='https://arxiv.org/pdf/2504.10552'>article</a>:
```bibtex
@article{ABrain.NN-Dataset,
  title={LEMUR Neural Network Dataset: Towards Seamless AutoML},
  author={Goodarzi, Arash Torabi and Kochnev, Roman and Khalid, Waleed and Qin, Furui and Uzun, Tolgay Atinc and Dhameliya, Yashkumar Sanjaybhai and Kathiriya, Yash Kanubhai and Bentyn, Zofia Antonina and Ignatov, Dmitry and Timofte, Radu},
  journal={arXiv preprint arXiv:2504.10552},
  year={2025}
}
```

## Licenses

This project is distributed under the following licensing terms:
<ul><li>for neural network models adopted from other projects
  <ul>
    <li> Python code under the legacy <a href="https://github.com/ABrain-One/nn-dataset/blob/main/Doc/Licenses/LICENSE-MIT-NNs">MIT</a> or <a href="https://github.com/ABrain-One/nn-dataset/blob/main/Doc/Licenses/LICENSE-BSD-NNs">BSD 3-Clause</a> license</li>
    <li> models with pretrained weights under the legacy <a href="https://github.com/ABrain-One/nn-dataset/blob/main/Doc/Licenses/LICENSE-DEEPSEEK-LLM-V2">DeepSeek LLM V2</a> license</li>
  </ul></li>
<li> all neural network models and their weights not covered by the above licenses, as well as all other files and assets in this project, are subject to the <a href="https://github.com/ABrain-One/nn-dataset/blob/main/LICENSE">MIT license</a></li> 
</ul>

#### The idea and leadership of Dr. Ignatov

            
