sim-datasets


| Field | Value |
| --- | --- |
| Name | sim-datasets |
| Version | 0.1.0 |
| home_page | None |
| Summary | A unified platform solution for symbolic regression, providing comprehensive support for Scientific-Intelligent-Modeling toolkits. Seamlessly integrates with ModelScope and Hugging Face for efficient dataset access. |
| upload_time | 2025-07-13 21:48:41 |
| maintainer | None |
| docs_url | None |
| author | None |
| requires_python | >=3.8 |
| license | GPL-3.0 |
| keywords | datasets, machine learning, scientific modeling, symbolic regression |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |

# SIM-Datasets

[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
[![License: GPL-3.0](https://img.shields.io/badge/License-GPL%203.0-green.svg)](https://opensource.org/licenses/GPL-3.0)
[![PyPI version](https://badge.fury.io/py/sim-datasets.svg)](https://badge.fury.io/py/sim-datasets)

SIM-Datasets provides unified dataset access for symbolic regression and supports the Scientific-Intelligent-Modeling toolkits. It integrates with ModelScope and Hugging Face to download datasets.

## 🌟 Key Features

- 🔄 **Multi-Source Support**: Works with both Hugging Face and ModelScope
- ⚡ **Smart Source Selection**: Automatically selects the fastest download source
- 🚀 **Concurrent Downloads**: Runs asynchronous downloads with up to 20 concurrent tasks
- 📊 **Real-time Progress**: Displays detailed download progress and status
- 📁 **Smart Caching**: Caches downloaded data so repeated requests skip the download
- 🛠️ **Command Line Tools**: Provides a command-line interface
- 🔧 **Proxy Support**: Supports proxy configuration
- 📋 **Dataset Management**: Unified dataset list and configuration management
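
The "Smart Source Selection" feature above can be pictured with a small timing sketch. This is not the library's implementation, only one plausible approach: time a cheap probe against each source and keep the quickest. The function name `pick_fastest_source` and the probe callables are hypothetical and exist only for this illustration.

```python
import time
from typing import Callable, Dict


def pick_fastest_source(probes: Dict[str, Callable[[], None]]) -> str:
    """Return the name of the source whose probe completes in the least time.

    `probes` maps a source name to a zero-argument callable that performs a
    small request against that source (hypothetical; supply your own).
    A probe that raises marks its source as unreachable.
    """
    timings = {}
    for name, probe in probes.items():
        start = time.perf_counter()
        try:
            probe()
        except Exception:
            continue  # unreachable source: leave it out of the ranking
        timings[name] = time.perf_counter() - start
    if not timings:
        raise RuntimeError("no download source is reachable")
    return min(timings, key=timings.get)


if __name__ == "__main__":
    # Stand-in probes that only sleep; a real probe would issue a small HTTP request.
    demo_probes = {
        "huggingface": lambda: time.sleep(0.05),
        "modelscope": lambda: time.sleep(0.01),
    }
    print(pick_fastest_source(demo_probes))
```

The probes run one after another here to keep the sketch short; a real selector would likely probe in parallel and apply a timeout.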

## 📦 Installation

### Install from Source

```bash
# Clone repository
git clone https://github.com/scientific-intelligent-modelling/scientific-intelligent-modelling.git
cd scientific-intelligent-modelling

# Install the package in editable mode (also installs its dependencies)
pip install -e .
```

### Install from PyPI

```bash
pip install sim-datasets
```

## 🚀 Quick Start

### Basic Usage

```python
from sim_datasets import get_datasets_list, download_single_dataset, download_dataset

# Get dataset list
datasets = get_datasets_list('llm-srbench')
print(f"Found {len(datasets)} datasets")

# Download single dataset
result = download_single_dataset('llm-srbench/bio_pop_growth/BPG0')
print(f"Dataset downloaded: {result['cache_path']}")

# Download entire dataset collection
result = download_dataset('llm-srbench')
print(f"Downloaded {len(result['downloaded'])} datasets")
```
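
The `cache_path` entry in the result above suggests that each dataset lands in a local cache directory. As a rough illustration of how such a cache can avoid repeated downloads (a sketch under assumptions, not the package's actual code), the helper below skips the fetch when a completion marker already exists. `fetch_with_cache`, the `.complete` marker file, and the `from_cache` key are all hypothetical names.

```python
from pathlib import Path
from typing import Callable


def fetch_with_cache(identifier: str, fetch: Callable[[Path], None], cache_root: Path) -> dict:
    """Download `identifier` into `cache_root` unless a finished copy exists.

    `fetch` is a caller-supplied routine (hypothetical) that writes the
    dataset files into the directory it receives.
    """
    # "llm-srbench/bio_pop_growth/BPG0" -> <cache_root>/llm-srbench/bio_pop_growth/BPG0
    target = cache_root.joinpath(*identifier.split("/"))
    marker = target / ".complete"
    if marker.exists():
        return {"cache_path": str(target), "from_cache": True}
    target.mkdir(parents=True, exist_ok=True)
    fetch(target)
    marker.touch()  # written last, so an interrupted download is never reused
    return {"cache_path": str(target), "from_cache": False}
```

Writing the marker only after the fetch returns means a crash mid-download leaves no marker, so the next call fetches again instead of trusting a partial directory.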

### Advanced Usage

```python
from sim_datasets import download_dataset_parallel

# Concurrent download (recommended for large datasets)
result = download_dataset_parallel(
    'llm-srbench',
    source='huggingface',  # or 'modelscope'
    max_workers=10,        # number of concurrent workers
    proxy='http://proxy:8080'  # optional proxy
)

print(f"Successfully downloaded: {len(result['downloaded'])}")
print(f"Failed: {len(result['failed'])}")
```
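
Since the result reports both `downloaded` and `failed` items, a caller may want to retry on the other source when something fails. The wrapper below is a minimal sketch of that idea. It assumes only what the example above shows: the download function accepts a `source` keyword and returns a dict with `downloaded` and `failed` lists. The name `download_with_fallback` is hypothetical, and the downloader is passed in as a parameter so the sketch does not depend on the package being installed.

```python
from typing import Callable, Sequence, Tuple


def download_with_fallback(
    download: Callable[..., dict],
    collection: str,
    sources: Sequence[str] = ("huggingface", "modelscope"),
    **kwargs,
) -> Tuple[str, dict]:
    """Try each source in order; stop at the first run with no failed items.

    Returns (source, result). If every source leaves failures, the attempt
    with the fewest failed items is returned.
    """
    attempts = []
    for source in sources:
        try:
            result = download(collection, source=source, **kwargs)
        except Exception as exc:
            print(f"{source}: download raised {exc!r}; trying the next source")
            continue
        attempts.append((source, result))
        if not result["failed"]:
            return source, result
    if not attempts:
        raise RuntimeError(f"every source failed for {collection!r}")
    return min(attempts, key=lambda item: len(item[1]["failed"]))


# Possible usage, assuming the package is installed:
# from sim_datasets import download_dataset_parallel
# source, result = download_with_fallback(
#     download_dataset_parallel, "llm-srbench", max_workers=10
# )
```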

## 📋 Supported Datasets

### LLM-SRBench Datasets
- **Biological Population Growth** (`bio_pop_growth`): Biological population dynamics modeling data
- **Chemical Reactions** (`chem_react`): Chemical reaction kinetics data
- **LSR Transform** (`lsrtransform`): Linear symbolic regression transform data
- **Materials Science** (`matsci`): Materials science related data
- **Physical Oscillations** (`phys_osc`): Physical oscillation system data

### SRBench 1.0 Datasets
- **Feynman Equations** (`feynman`): Feynman physics equation data
- **Strogatz Systems** (`strogatz`): Strogatz nonlinear system data
- **Black Box Functions** (`blackbox`): Black box function data

### SRSD Datasets
- **Feynman Easy** (`srsd-feynman_easy`): Simple Feynman equations
- **Feynman Medium** (`srsd-feynman_medium`): Medium difficulty Feynman equations
- **Feynman Hard** (`srsd-feynman_hard`): Hard Feynman equations
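
The Quick Start refers to a single dataset as `llm-srbench/bio_pop_growth/BPG0`, which reads as collection, subset key, and problem name. The helper below is a small sketch for splitting such an identifier. It assumes that three-part form; whether the SRBench and SRSD collections follow the same layout is not stated here. `DatasetId` and `parse_dataset_id` are hypothetical names, not part of the package.

```python
from typing import NamedTuple


class DatasetId(NamedTuple):
    collection: str  # e.g. "llm-srbench"
    subset: str      # e.g. "bio_pop_growth"
    problem: str     # e.g. "BPG0"


def parse_dataset_id(identifier: str) -> DatasetId:
    """Split 'collection/subset/problem' into its three parts."""
    parts = identifier.strip("/").split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(
            f"expected 'collection/subset/problem', got {identifier!r}"
        )
    return DatasetId(*parts)


if __name__ == "__main__":
    print(parse_dataset_id("llm-srbench/bio_pop_growth/BPG0"))
```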

## 📄 License

This project is licensed under the [GPL-3.0](https://opensource.org/licenses/GPL-3.0) License.

## 👥 Authors

- **Ziwen Zhang** - *Lead Developer* - [244824379@qq.com](mailto:244824379@qq.com)
- **Kai Li** - *Contributor* - [kai.li@ia.ac.cn](mailto:kai.li@ia.ac.cn)

## 🙏 Acknowledgments

Thanks to the following open source projects:

- [Hugging Face](https://huggingface.co/) - Providing dataset hosting services
- [ModelScope](https://modelscope.cn/) - Providing model and dataset platform
- [datasets](https://github.com/huggingface/datasets) - Dataset processing library

## 📞 Contact Us

- Email: [244824379@qq.com](mailto:244824379@qq.com)
- Project Homepage: [https://github.com/scientific-intelligent-modelling/scientific-intelligent-modelling](https://github.com/scientific-intelligent-modelling/scientific-intelligent-modelling)
- Issue Reports: [GitHub Issues](https://github.com/scientific-intelligent-modelling/scientific-intelligent-modelling/issues)

---

⭐ If this project helps you, please give us a star! 
            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "sim-datasets",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": "Ziwen Zhang <244824379@qq.com>",
    "keywords": "datasets, machine learning, scientific modeling, symbolic regression",
    "author": null,
    "author_email": "Ziwen Zhang <244824379@qq.com>, Kai Li <kai.li@ia.ac.cn>",
    "download_url": "https://files.pythonhosted.org/packages/48/0c/8d6fad81be6d273d98365f4caced949677ef7d6ccd72a374599af248cd28/sim_datasets-0.1.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "GPL-3.0",
    "summary": "A unified platform solution for symbolic regression, providing comprehensive support for Scientific-Intelligent-Modeling toolkits. Seamlessly integrates with ModelScope and Hugging Face for efficient dataset access.",
    "version": "0.1.0",
    "project_urls": {
        "Documentation": "https://github.com/scientific-intelligent-modelling/scientific-intelligent-modelling",
        "Homepage": "https://github.com/scientific-intelligent-modelling/scientific-intelligent-modelling",
        "Issues": "https://github.com/scientific-intelligent-modelling/scientific-intelligent-modelling/issues",
        "Repository": "https://github.com/scientific-intelligent-modelling/scientific-intelligent-modelling"
    },
    "split_keywords": [
        "datasets",
        " machine learning",
        " scientific modeling",
        " symbolic regression"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "2a62862c692815308b3fcde5eae1556356e7987377640b9333a53431d097fe78",
                "md5": "6e7b38efd6971a5d212911c19d83726a",
                "sha256": "7cf3a8a97cc344c6d55305f9c55816a8131877447444b3c86a9e312f1accee21"
            },
            "downloads": -1,
            "filename": "sim_datasets-0.1.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "6e7b38efd6971a5d212911c19d83726a",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 24931,
            "upload_time": "2025-07-13T21:48:40",
            "upload_time_iso_8601": "2025-07-13T21:48:40.421688Z",
            "url": "https://files.pythonhosted.org/packages/2a/62/862c692815308b3fcde5eae1556356e7987377640b9333a53431d097fe78/sim_datasets-0.1.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "480c8d6fad81be6d273d98365f4caced949677ef7d6ccd72a374599af248cd28",
                "md5": "3bee68ca185c2a97f4d8bfc8bdaaa207",
                "sha256": "c9e1dce185d2b844889ea58efa20d5fc4577fa6faa5c78ba0da0e0ba7d22501e"
            },
            "downloads": -1,
            "filename": "sim_datasets-0.1.0.tar.gz",
            "has_sig": false,
            "md5_digest": "3bee68ca185c2a97f4d8bfc8bdaaa207",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 15383,
            "upload_time": "2025-07-13T21:48:41",
            "upload_time_iso_8601": "2025-07-13T21:48:41.906920Z",
            "url": "https://files.pythonhosted.org/packages/48/0c/8d6fad81be6d273d98365f4caced949677ef7d6ccd72a374599af248cd28/sim_datasets-0.1.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-07-13 21:48:41",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "scientific-intelligent-modelling",
    "github_project": "scientific-intelligent-modelling",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "sim-datasets"
}
        