# Dataset with Logits
A PyTorch package for loading computer vision datasets paired with pre-computed model logits. It is intended for knowledge distillation, model analysis, and other research workflows that can reuse stored model outputs instead of recomputing them.
## 🚀 Quick Start
```bash
pip install dataset-with-logits
```
```python
import torchvision.transforms as transforms
from torch.utils.data import DataLoader

from dataset_with_logits import ImageNet

# Define transforms
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])

# Create dataset (auto-downloads predictions)
dataset = ImageNet(
    root='/path/to/imagenet/val',
    model='resnet18',
    transform=transform,
    auto_download=True,
)

# Use with DataLoader
loader = DataLoader(dataset, batch_size=32, shuffle=True)

for images, labels, logits in loader:
    # images: [batch_size, 3, 224, 224]
    # labels: [batch_size] - ground truth
    # logits: [batch_size, 1000] - model predictions
    break
```
## 📊 Available Models
### ImageNet-1K
- `resnet18` - ResNet-18 (11.7M parameters)
- `resnet50` - ResNet-50 (25.6M parameters)
- `resnet152` - ResNet-152 (60.2M parameters)
- `vit_l_16` - Vision Transformer Large (304M parameters)
- `mobilenet_v3_small` - MobileNet V3 Small (2.5M parameters)
- `mobilenet_v3_large` - MobileNet V3 Large (5.5M parameters)
More models and datasets coming soon!
## 🎯 Use Cases
### Knowledge Distillation
```python
import torch.nn.functional as F

def knowledge_distillation_loss(student_logits, teacher_logits, labels, temperature=3.0):
    # Soften both distributions with the same temperature.
    student_soft = F.log_softmax(student_logits / temperature, dim=1)
    teacher_soft = F.softmax(teacher_logits / temperature, dim=1)
    # F.kl_div expects log-probabilities as input and probabilities as target.
    # `labels` is accepted but not used by this soft-target-only loss.
    return F.kl_div(student_soft, teacher_soft, reduction='batchmean')

# In your training loop (student_model and dataloader are defined elsewhere)
for images, labels, teacher_logits in dataloader:
    student_logits = student_model(images)
    loss = knowledge_distillation_loss(student_logits, teacher_logits, labels)
```
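The effect of `temperature` in the loss above can be checked by hand. The following is a minimal pure-Python sketch (no PyTorch; the helper names `softmax` and `kl_divergence` and the toy logits are illustrative, not part of this package). It shows that a higher temperature flattens the teacher distribution and that the KL term is positive while student and teacher disagree.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of floats."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with q > 0 everywhere."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Toy 3-class logits, made up for illustration.
teacher_logits = [4.0, 1.0, 0.5]
student_logits = [2.0, 1.5, 0.2]

sharp = softmax(teacher_logits, temperature=1.0)
soft = softmax(teacher_logits, temperature=3.0)
print(max(sharp), max(soft))  # the softened distribution has a smaller peak

student_soft = softmax(student_logits, temperature=3.0)
print(kl_divergence(soft, student_soft))  # positive while the two disagree
```

Many distillation setups also multiply the KL term by `temperature ** 2` to keep gradient magnitudes comparable across temperatures, and add a cross-entropy term on `labels`. The function above does neither, so treat it as the soft-target part only.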
### Model Analysis
```python
from dataset_with_logits import ImageNet

imagenet_path = '/path/to/imagenet/val'  # adjust to your local copy

# Compare different models
models = ['resnet18', 'resnet152', 'vit_l_16']
datasets = {}

for model in models:
    datasets[model] = ImageNet(root=imagenet_path, model=model)

# Analyze prediction differences, calibration, etc.
```
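Once logits from several models are available, simple comparisons need no forward pass. The sketch below works on toy Python lists and assumes the logits of both models are aligned to the same image order; the helper names `argmax`, `top1_accuracy`, and `agreement` are illustrative and not part of this package.

```python
def argmax(values):
    """Index of the largest value in a list."""
    return max(range(len(values)), key=values.__getitem__)


def top1_accuracy(logits_per_image, labels):
    """Fraction of images whose top-scoring class equals the label."""
    hits = sum(argmax(l) == y for l, y in zip(logits_per_image, labels))
    return hits / len(labels)


def agreement(logits_a, logits_b):
    """Fraction of images on which two models predict the same class."""
    same = sum(argmax(a) == argmax(b) for a, b in zip(logits_a, logits_b))
    return same / len(logits_a)


# Toy data: 4 images, 3 classes (real ImageNet-1K logits have 1000 entries).
labels = [0, 2, 1, 1]
model_a = [[2.0, 0.1, 0.3], [0.2, 0.1, 1.5], [0.9, 0.8, 0.1], [0.0, 3.0, 1.0]]
model_b = [[1.0, 0.2, 0.1], [0.3, 2.0, 0.5], [0.1, 0.7, 0.2], [0.2, 1.1, 0.4]]

print(top1_accuracy(model_a, labels))  # 0.75
print(top1_accuracy(model_b, labels))  # 0.75
print(agreement(model_a, model_b))     # 0.5
```

Two models can reach the same accuracy while agreeing on only half of the images, which is the kind of difference that per-image logits make visible.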
## 🔧 Advanced Usage
### List Available Models
```python
from dataset_with_logits import list_available_models
models = list_available_models()
print(models)
# {'imagenet1k': {'resnet18': 'ResNet-18 (11.7M parameters)', ...}}
```
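The nested dictionary shown in the comment above can be used to validate a model name before constructing a dataset. This is a minimal sketch that uses a hand-written stand-in for the returned dictionary, limited to entries listed in this README; `datasets_offering` and `require_model` are hypothetical helpers, not functions of this package.

```python
# Stand-in for the dict returned by list_available_models(), limited to
# entries documented in this README.
catalog = {
    "imagenet1k": {
        "resnet18": "ResNet-18 (11.7M parameters)",
        "resnet50": "ResNet-50 (25.6M parameters)",
    }
}


def datasets_offering(catalog, model_name):
    """Names of the datasets that list `model_name`."""
    return [name for name, models in catalog.items() if model_name in models]


def require_model(catalog, dataset_name, model_name):
    """Raise a readable error before any download is attempted."""
    available = sorted(catalog.get(dataset_name, {}))
    if model_name not in available:
        raise ValueError(
            f"{model_name!r} not available for {dataset_name!r}; choose from {available}"
        )
    return catalog[dataset_name][model_name]


print(datasets_offering(catalog, "resnet50"))
print(require_model(catalog, "imagenet1k", "resnet18"))
```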
### Custom Cache Directory
```python
dataset = ImageNet(
    root='/path/to/imagenet',
    model='resnet18',
    cache_dir='/custom/cache/dir',
    auto_download=True,
)
```
### Version Control
```python
dataset = ImageNet(
    root='/path/to/imagenet',
    model='resnet18',
    version='v0.1.0',  # Specific version
    auto_download=True,
)
```
## 📁 File Format
Prediction files are CSV files with the following columns:
- `id`: Image filename (no extension)
- `label`: Ground truth class index
- `logits`: Semicolon-separated model outputs
Example:
```csv
id,label,logits
ILSVRC2012_val_00000001,65,-2.3;1.7;0.2;...;0.8
ILSVRC2012_val_00000002,970,0.1;-1.2;3.4;...;-0.5
```
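A file in this layout can be read with the standard library alone. The sketch below parses two made-up rows, shortened to four logits each; `parse_predictions` is an illustrative helper and not the loader this package uses internally.

```python
import csv
import io

# Two made-up rows in the documented layout, shortened to 4 logits each
# (real files carry one value per class, e.g. 1000 for ImageNet-1K).
sample = """id,label,logits
ILSVRC2012_val_00000001,65,-2.3;1.7;0.2;0.8
ILSVRC2012_val_00000002,970,0.1;-1.2;3.4;-0.5
"""


def parse_predictions(text):
    """Return {image_id: (label, [logit, ...])} from CSV text."""
    records = {}
    for row in csv.DictReader(io.StringIO(text)):
        # The logits column packs all class scores into one semicolon-separated field.
        logits = [float(v) for v in row["logits"].split(";")]
        records[row["id"]] = (int(row["label"]), logits)
    return records


records = parse_predictions(sample)
label, logits = records["ILSVRC2012_val_00000001"]
print(label, logits)
```

Because the logits sit in a single field, the comma-separated structure stays at three columns regardless of the number of classes.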
## 🌐 Data Source
Prediction files are automatically downloaded from **Hugging Face Hub** (primary) with GitHub fallback. Files are cached locally after first download.
**Hosting Infrastructure:**
- 🤗 **Primary**: [Hugging Face Datasets](https://huggingface.co/datasets/ViGeng/prediction-datasets) - Fast, reliable, academic-friendly
- 🐙 **Fallback**: GitHub Releases - For redundancy
- 📦 **Multi-backend**: Automatic fallback ensures high availability
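
The cache-then-fallback behaviour described above can be sketched in a few lines. This is an illustration of the general pattern, not this package's actual download code: `fetch_with_fallback` and the two simulated sources are hypothetical, and the fetchers are plain callables so the example runs without network access.

```python
from pathlib import Path
import tempfile


def fetch_with_fallback(cache_file, fetchers):
    """Return cached bytes, or try each fetcher in order and cache the first success.

    `fetchers` is a list of (name, zero-argument callable returning bytes).
    The second element of the result names where the data came from.
    """
    cache_file = Path(cache_file)
    if cache_file.exists():
        return cache_file.read_bytes(), "cache"

    errors = []
    for name, fetch in fetchers:
        try:
            data = fetch()
        except OSError as exc:  # connection errors derive from OSError
            errors.append(f"{name}: {exc}")
            continue
        cache_file.parent.mkdir(parents=True, exist_ok=True)
        cache_file.write_bytes(data)
        return data, name
    raise RuntimeError("all sources failed: " + "; ".join(errors))


def failing_primary():
    raise ConnectionError("primary unreachable")  # simulated outage


def working_fallback():
    return b"id,label,logits\n"


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "resnet18.csv"
    sources = [("primary", failing_primary), ("fallback", working_fallback)]
    first = fetch_with_fallback(target, sources)   # served by the fallback
    second = fetch_with_fallback(target, sources)  # served from the local cache
    print(first[1], second[1])
```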
## 🔍 Examples
See the `examples/` directory for:
- Basic usage
- Knowledge distillation
- Model comparison
- Advanced workflows
## 📦 Installation
### From PyPI (Recommended)
```bash
pip install dataset-with-logits
```
### From Source
```bash
git clone https://github.com/ViGeng/predictions-on-datasets.git
cd predictions-on-datasets/dataset_with_logits
pip install -e .
```
## 🤝 Contributing
Contributions are welcome! See the main repository for contribution guidelines.
## 📄 License
MIT License - see LICENSE file for details.
Raw data
{
"_id": null,
"home_page": null,
"name": "dataset-with-logits",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": null,
"keywords": "pytorch, imagenet, cifar, dataset, computer-vision, machine-learning, knowledge-distillation, deep-learning, logits, pretrained-models",
"author": null,
"author_email": "ViGeng <your.email@example.com>",
"download_url": "https://files.pythonhosted.org/packages/a7/37/598d02a3a33796e362c6c68b54d72be1345de5344043ebd42a60ee204a49/dataset_with_logits-0.2.9.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "PyTorch datasets with pre-computed model logits for efficient research",
"version": "0.2.9",
"project_urls": {
"Bug Tracker": "https://github.com/ViGeng/predictions-on-datasets/issues",
"Documentation": "https://github.com/ViGeng/predictions-on-datasets#readme",
"Homepage": "https://github.com/ViGeng/predictions-on-datasets",
"Repository": "https://github.com/ViGeng/predictions-on-datasets"
},
"split_keywords": [
"pytorch",
" imagenet",
" cifar",
" dataset",
" computer-vision",
" machine-learning",
" knowledge-distillation",
" deep-learning",
" logits",
" pretrained-models"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "2de12a3b21e6c0f0e753aca34b65dcfb6d85df518abe6843011d8f4041d947b1",
"md5": "583823b932e96f525b37bb989f3748a0",
"sha256": "f27724d2e22bc7e9f90fc5007b36335155eaae0bf246d527d32b6132d01a4942"
},
"downloads": -1,
"filename": "dataset_with_logits-0.2.9-py3-none-any.whl",
"has_sig": false,
"md5_digest": "583823b932e96f525b37bb989f3748a0",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 13805,
"upload_time": "2025-08-08T01:24:20",
"upload_time_iso_8601": "2025-08-08T01:24:20.519948Z",
"url": "https://files.pythonhosted.org/packages/2d/e1/2a3b21e6c0f0e753aca34b65dcfb6d85df518abe6843011d8f4041d947b1/dataset_with_logits-0.2.9-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "a737598d02a3a33796e362c6c68b54d72be1345de5344043ebd42a60ee204a49",
"md5": "74b72bf52c17d4fde61f8c671dffe04a",
"sha256": "84e0d10735133d14c19e9f2f7c4a0acc8306765e8275163c4a8741b90c6e69ab"
},
"downloads": -1,
"filename": "dataset_with_logits-0.2.9.tar.gz",
"has_sig": false,
"md5_digest": "74b72bf52c17d4fde61f8c671dffe04a",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 15579,
"upload_time": "2025-08-08T01:24:21",
"upload_time_iso_8601": "2025-08-08T01:24:21.803082Z",
"url": "https://files.pythonhosted.org/packages/a7/37/598d02a3a33796e362c6c68b54d72be1345de5344043ebd42a60ee204a49/dataset_with_logits-0.2.9.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-08-08 01:24:21",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "ViGeng",
"github_project": "predictions-on-datasets",
"github_not_found": true,
"lcname": "dataset-with-logits"
}