Name | sim-datasets |
Version | 0.1.0 |
home_page | None |
Summary | A unified platform solution for symbolic regression, providing comprehensive support for Scientific-Intelligent-Modeling toolkits. Seamlessly integrates with ModelScope and Hugging Face for efficient dataset access. |
upload_time | 2025-07-13 21:48:41 |
maintainer | None |
docs_url | None |
author | None |
requires_python | >=3.8 |
license | GPL-3.0 |
keywords | datasets, machine learning, scientific modeling, symbolic regression |
VCS | |
bugtrack_url | |
requirements | No requirements were recorded. |
Travis-CI | No Travis. |
coveralls test coverage | No coveralls. |
# SIM-Datasets
A unified platform solution for symbolic regression, providing comprehensive support for Scientific-Intelligent-Modeling toolkits. Seamlessly integrates with ModelScope and Hugging Face for efficient dataset access.
## 🌟 Key Features
- 🔄 **Multi-Source Support**: Supports both the Hugging Face and ModelScope platforms
- ⚡ **Smart Source Selection**: Automatically selects the fastest download source
- 🚀 **Concurrent Downloads**: Supports asynchronous concurrent downloads with up to 20 concurrent tasks
- 📊 **Real-time Progress**: Displays detailed download progress and status
- 📁 **Smart Caching**: Automatically caches download results to avoid repeated downloads
- 🛠️ **Command Line Tools**: Provides convenient command-line interface
- 🔧 **Proxy Support**: Complete proxy configuration support
- 📋 **Dataset Management**: Unified dataset list and configuration management
## 📦 Installation
### Install from Source
```bash
# Clone repository
git clone https://github.com/scientific-intelligent-modelling/scientific-intelligent-modelling.git
cd scientific-intelligent-modelling
# Install the package in editable mode (this also installs its dependencies)
pip install -e .
```
### Install from PyPI
```bash
pip install sim-datasets
```
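To confirm that the installation worked, one option is to query the installed distribution's version through the standard library. The helper name `installed_version` is an assumption made for this sketch; `importlib.metadata` itself is part of Python 3.8 and later, which matches the package's `requires_python`.

```python
from importlib import metadata  # standard library since Python 3.8
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


# Prints the version if sim-datasets is installed, otherwise None.
print(installed_version("sim-datasets"))
```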
## 🚀 Quick Start
### Basic Usage
```python
from sim_datasets import get_datasets_list, download_dataset, download_single_dataset
# Get dataset list
datasets = get_datasets_list('llm-srbench')
print(f"Found {len(datasets)} datasets")
# Download single dataset
result = download_single_dataset('llm-srbench/bio_pop_growth/BPG0')
print(f"Dataset downloaded: {result['cache_path']}")
# Download entire dataset collection
result = download_dataset('llm-srbench')
print(f"Downloaded {len(result['downloaded'])} datasets")
```
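The single-dataset example passes an identifier with three slash-separated parts (`'llm-srbench/bio_pop_growth/BPG0'`), while the collection example passes only the first part. Assuming that layout is `collection/subset/name` (an inference from these two calls, not something the package documents here), a hypothetical helper for splitting such identifiers might look like this:

```python
from typing import NamedTuple, Optional


class DatasetId(NamedTuple):
    """Assumed layout: collection[/subset[/name]]."""
    collection: str
    subset: Optional[str] = None
    name: Optional[str] = None


def parse_dataset_id(identifier: str) -> DatasetId:
    """Split an identifier into up to three parts; missing parts become None."""
    parts = [p for p in identifier.strip("/").split("/") if p]
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"expected 1 to 3 path parts, got {identifier!r}")
    padded = parts + [None] * (3 - len(parts))
    return DatasetId(*padded)


print(parse_dataset_id("llm-srbench/bio_pop_growth/BPG0"))
```

Neither `DatasetId` nor `parse_dataset_id` is part of `sim_datasets`; they only illustrate how the identifiers in the examples appear to be structured.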
### Advanced Usage
```python
from sim_datasets import download_dataset_parallel
# Concurrent download (recommended for large datasets)
result = download_dataset_parallel(
    'llm-srbench',
    source='huggingface',       # or 'modelscope'
    max_workers=10,             # number of concurrent workers
    proxy='http://proxy:8080'   # optional proxy
)
print(f"Successfully downloaded: {len(result['downloaded'])}")
print(f"Failed: {len(result['failed'])}")
```
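The result in the example above exposes `downloaded` and `failed` entries that support `len()`. Assuming both are plain sized containers (their element types are not specified in this README), a small hypothetical helper can turn the result into a summary for logging:

```python
def summarize(result):
    """Summarize a result mapping with 'downloaded' and 'failed' entries."""
    downloaded = result.get("downloaded", [])
    failed = result.get("failed", [])
    total = len(downloaded) + len(failed)
    return {
        "total": total,
        "ok": len(downloaded),
        "failed": len(failed),
        # Guard against division by zero when nothing was attempted.
        "success_rate": len(downloaded) / total if total else 0.0,
    }


# Mock result with made-up entries, shaped like the one in the example above.
mock = {"downloaded": ["a", "b", "c"], "failed": ["d"]}
print(summarize(mock))
```

`summarize` is not part of `sim_datasets`; it only relies on the two keys shown in the example.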
## 📋 Supported Datasets
### LLM-SRBench Datasets
- **Biological Population Growth** (`bio_pop_growth`): Biological population dynamics modeling data
- **Chemical Reactions** (`chem_react`): Chemical reaction kinetics data
- **LSR Transform** (`lsrtransform`): Feynman equations transformed into alternative, less common mathematical forms
- **Materials Science** (`matsci`): Materials science related data
- **Physical Oscillations** (`phys_osc`): Physical oscillation system data
### SRBench 1.0 Datasets
- **Feynman Equations** (`feynman`): Feynman physics equation data
- **Strogatz Systems** (`strogatz`): Strogatz nonlinear system data
- **Black Box Functions** (`blackbox`): Black box function data
### SRSD Datasets
- **Feynman Easy** (`srsd-feynman_easy`): Simple Feynman equations
- **Feynman Medium** (`srsd-feynman_medium`): Medium difficulty Feynman equations
- **Feynman Hard** (`srsd-feynman_hard`): Hard Feynman equations
## 📄 License
This project is licensed under the [GPL-3.0](https://opensource.org/licenses/GPL-3.0) License.
## 👥 Authors
- **Ziwen Zhang** - *Lead Developer* - [244824379@qq.com](mailto:244824379@qq.com)
- **Kai Li** - *Contributor* - [kai.li@ia.ac.cn](mailto:kai.li@ia.ac.cn)
## 🙏 Acknowledgments
Thanks to the following open source projects:
- [Hugging Face](https://huggingface.co/) - Providing dataset hosting services
- [ModelScope](https://modelscope.cn/) - Providing model and dataset platform
- [datasets](https://github.com/huggingface/datasets) - Dataset processing library
## 📞 Contact Us
- Email: [244824379@qq.com](mailto:244824379@qq.com)
- Project Homepage: [https://github.com/scientific-intelligent-modelling/scientific-intelligent-modelling](https://github.com/scientific-intelligent-modelling/scientific-intelligent-modelling)
- Issue Reports: [GitHub Issues](https://github.com/scientific-intelligent-modelling/scientific-intelligent-modelling/issues)
---
⭐ If this project helps you, please give us a star!
Raw data
{
"_id": null,
"home_page": null,
"name": "sim-datasets",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": "Ziwen Zhang <244824379@qq.com>",
"keywords": "datasets, machine learning, scientific modeling, symbolic regression",
"author": null,
"author_email": "Ziwen Zhang <244824379@qq.com>, Kai Li <kai.li@ia.ac.cn>",
"download_url": "https://files.pythonhosted.org/packages/48/0c/8d6fad81be6d273d98365f4caced949677ef7d6ccd72a374599af248cd28/sim_datasets-0.1.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "GPL-3.0",
"summary": "A unified platform solution for symbolic regression, providing comprehensive support for Scientific-Intelligent-Modeling toolkits. Seamlessly integrates with ModelScope and Hugging Face for efficient dataset access.",
"version": "0.1.0",
"project_urls": {
"Documentation": "https://github.com/scientific-intelligent-modelling/scientific-intelligent-modelling",
"Homepage": "https://github.com/scientific-intelligent-modelling/scientific-intelligent-modelling",
"Issues": "https://github.com/scientific-intelligent-modelling/scientific-intelligent-modelling/issues",
"Repository": "https://github.com/scientific-intelligent-modelling/scientific-intelligent-modelling"
},
"split_keywords": [
"datasets",
" machine learning",
" scientific modeling",
" symbolic regression"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "2a62862c692815308b3fcde5eae1556356e7987377640b9333a53431d097fe78",
"md5": "6e7b38efd6971a5d212911c19d83726a",
"sha256": "7cf3a8a97cc344c6d55305f9c55816a8131877447444b3c86a9e312f1accee21"
},
"downloads": -1,
"filename": "sim_datasets-0.1.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "6e7b38efd6971a5d212911c19d83726a",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 24931,
"upload_time": "2025-07-13T21:48:40",
"upload_time_iso_8601": "2025-07-13T21:48:40.421688Z",
"url": "https://files.pythonhosted.org/packages/2a/62/862c692815308b3fcde5eae1556356e7987377640b9333a53431d097fe78/sim_datasets-0.1.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "480c8d6fad81be6d273d98365f4caced949677ef7d6ccd72a374599af248cd28",
"md5": "3bee68ca185c2a97f4d8bfc8bdaaa207",
"sha256": "c9e1dce185d2b844889ea58efa20d5fc4577fa6faa5c78ba0da0e0ba7d22501e"
},
"downloads": -1,
"filename": "sim_datasets-0.1.0.tar.gz",
"has_sig": false,
"md5_digest": "3bee68ca185c2a97f4d8bfc8bdaaa207",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 15383,
"upload_time": "2025-07-13T21:48:41",
"upload_time_iso_8601": "2025-07-13T21:48:41.906920Z",
"url": "https://files.pythonhosted.org/packages/48/0c/8d6fad81be6d273d98365f4caced949677ef7d6ccd72a374599af248cd28/sim_datasets-0.1.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-07-13 21:48:41",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "scientific-intelligent-modelling",
"github_project": "scientific-intelligent-modelling",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "sim-datasets"
}