hfdol

Name	hfdol
Version	0.1.16
home_page	https://github.com/thorwhalen/hfdol
Summary	Simple Mapping interface to HuggingFace
upload_time	2025-10-28 00:21:43
maintainer	None
docs_url	None
author	Thor Whalen
requires_python	None
license	mit
keywords	datasets data science artificial intelligence ai
VCS
bugtrack_url
requirements	No requirements were recorded.
Travis-CI	No Travis.
coveralls test coverage	No coveralls.

            # hfdol

Simple Mapping interface to HuggingFace.

(Note: this package was previously published as [hf](https://pypi.org/project/hf/0.0.14/), but that name was released to Hugging Face itself for their tool.)

To install:	```pip install hfdol```

You'll also need a Hugging Face token. See [more about this here](https://huggingface.co/docs/huggingface_hub/en/quick-start).


## Motivation

The Python packages [`datasets`](https://github.com/huggingface/datasets) and [`huggingface_hub`](https://github.com/huggingface/huggingface_hub) provide a remarkably clean, well-documented, and comprehensive API for accessing datasets, models, spaces, and papers hosted on [Hugging Face](https://huggingface.co).  
Yet, as elegant as these APIs are, they remain *their own language*. Every library—no matter how intuitive—inevitably carries its own conventions, abstractions, and domain-specific semantics. When working with one or two APIs, this diversity is harmless, even stimulating. But when juggling dozens or hundreds of them, the cognitive overhead accumulates.

Despite their differences, most APIs share a small set of universal primitives — *retrieve something by key, list what's available, check existence, store, update, delete*.  
In Python, these operations are embodied by the `Mapping` interface, the conceptual model behind dictionaries. It's a minimal, ubiquitous, and instantly recognizable abstraction.  

This package offers such a `Mapping`-based façade to Hugging Face datasets and models, allowing you to browse, query, and access them as if they were simple Python dictionaries. The goal isn't to replace the original API, but to provide a thin, ergonomic layer for the most common operations — so you can spend less time remembering syntax, and more time working with data.
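
To make the `Mapping` idea concrete, here is a minimal sketch using only the standard library. The `ReadOnlyStore` class and its keys are hypothetical and are not part of `hfdol`; the point is that implementing three methods is enough to get membership tests, `.get`, `.keys()`, and `.items()` for free.

```python
from collections.abc import Mapping


class ReadOnlyStore(Mapping):
    """Toy read-only mapping over an in-memory dict (a stand-in for a remote service)."""

    def __init__(self, backend):
        self._backend = dict(backend)

    def __getitem__(self, key):
        # Retrieve something by key
        return self._backend[key]

    def __iter__(self):
        # List what's available
        return iter(self._backend)

    def __len__(self):
        return len(self._backend)


store = ReadOnlyStore({'a/first': 1, 'b/second': 2})
print(list(store))           # keys, as with a dict
print('a/first' in store)    # existence check, provided by the Mapping mixin
print(store.get('missing'))  # .get is also provided by the mixin
```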

## Examples

This package provides four ready-to-use singleton instances, each offering a dictionary-like interface to different types of HuggingFace resources:

```python
import hfdol
```

### Working with Datasets

The `hfdol.datasets` singleton provides a `Mapping` (i.e. read-only-dictionary-like) interface to HuggingFace datasets:

#### List Local Datasets

Like a dictionary, `hfdol.datasets` is iterable, and iterating over it yields its keys. 
The keys are the repository ids of the datasets you've already downloaded. 
To see which datasets are cached locally:

```python
list(hfdol.datasets)  # Lists locally cached datasets
# ['stingning/ultrachat', 'allenai/WildChat-1M', 'google-research-datasets/go_emotions']
```

#### Access Local Datasets

The values of `hfdol.datasets` are `DatasetDict` instances 
(from Hugging Face's `datasets` package) that give you access to the dataset.
If the dataset is already downloaded, it is loaded from the local cache. 
Otherwise it is downloaded first, then returned (and cached locally 
for the next time you access it). 

```python
data = hfdol.datasets['stingning/ultrachat']  # Loads the dataset
print(data)  # Shows dataset information and structure
```
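
The "load from cache, otherwise download and cache" behavior described above can be sketched with a toy mapping. This is an illustration under stated assumptions, not how `hfdol` is implemented: `CachingLoader` and `fake_download` are hypothetical names, and the in-memory dict stands in for the on-disk cache.

```python
from collections.abc import Mapping


class CachingLoader(Mapping):
    """Toy mapping: values are fetched on first access, then served from a cache."""

    def __init__(self, fetch):
        self._fetch = fetch  # hypothetical callable that "downloads" the value for a key
        self._cache = {}     # stands in for the local on-disk cache

    def __getitem__(self, key):
        if key not in self._cache:
            self._cache[key] = self._fetch(key)  # "download" only when missing
        return self._cache[key]

    def __contains__(self, key):
        # Checks the local cache only (a choice made for this sketch)
        return key in self._cache

    def __iter__(self):
        return iter(self._cache)  # only locally cached keys are listed

    def __len__(self):
        return len(self._cache)


calls = []


def fake_download(key):
    calls.append(key)  # record each simulated download
    return f"contents of {key}"


store = CachingLoader(fake_download)
first = store['org/name']   # triggers the simulated download
second = store['org/name']  # served from the cache
print(calls)
```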

#### Search for Remote Datasets

`hfdol.datasets` also offers a `search` method, so you can query remote 
repositories on the Hub:

```python
# Search for music-related datasets
search_results = hfdol.datasets.search('music', gated=False)
print(f"search_results is a {type(search_results).__name__}")  # It's a generator

# Get the first result (a `DatasetInfo` instance containing information on the dataset)
result = next(search_results)
print(f"Dataset ID: {result.id}")
print(f"Description: {result.description[:80]}...")

# Download and use it directly
data = hfdol.datasets[result]  # You can pass the DatasetInfo object directly
```

Note that `gated=False` restricts results to non-gated datasets, that is, ones you can access without requesting permission. 
For more search options, see the [HuggingFace Hub documentation](https://huggingface.co/docs/huggingface_hub/package_reference/hf_api#huggingface_hub.HfApi.list_datasets).
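
Accepting either a repository id string or a search-result object as a key amounts to normalizing the key before use. The sketch below shows one way this could be done; `to_repo_id` and `FakeInfo` are hypothetical names used for illustration, and the actual logic in `hfdol` may differ.

```python
from dataclasses import dataclass


@dataclass
class FakeInfo:
    """Stand-in for a search result object that carries an ``id`` attribute."""

    id: str


def to_repo_id(key):
    """Return a repository id from either a string or an object with an ``id``."""
    if isinstance(key, str):
        return key
    repo_id = getattr(key, 'id', None)
    if isinstance(repo_id, str):
        return repo_id
    raise TypeError(f"Cannot interpret {key!r} as a repository id")


print(to_repo_id('org/name'))
print(to_repo_id(FakeInfo(id='org/name')))
```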

#### A useful recipe: Get a table of result infos

The helper below builds a dataframe from the next `n` items of a results iterable (the first `n` if the iterable has not been consumed yet):

```py
def table_of_results(results, n=10):
    import itertools, operator, pandas as pd

    results_table = pd.DataFrame(  # make a table with
        map(
            operator.attrgetter('__dict__'),  # the attributes dicts
            itertools.islice(results, n),  # ... of the next n search results
        )
    )
    return results_table
```

Example:

```py
results_table = table_of_results(search_results)
results_table
```

                              id            author                                       sha ...
    0   Genius-Society/hoyoMusic    Genius-Society  4f7e5120c0e8e26213d4bb3b52bcce76e69dfce4 ...
    1      Genius-Society/emo163    Genius-Society  6b8c3526b66940ddaedf15602d01083d24eb370c ...
    2  ccmusic-database/acapella  ccmusic-database  4cb8a4d4cb58cc55f30cb8c7a180fee1b5576dc5 ...
    3    ccmusic-database/pianos  ccmusic-database  db2b3f74c4c989b4fbda4b309e6bc925bfd8f5d1 ...
    ...
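
Because search results come as a generator, each call to such a helper consumes items, so a second call continues where the first one stopped. Here is a small standard-library sketch of that behavior, with `SimpleNamespace` objects standing in for real result objects; `rows_of_results` is a hypothetical variant that returns plain dicts instead of a dataframe.

```python
import itertools
from types import SimpleNamespace


def rows_of_results(results, n=10):
    """Return the attribute dicts of the next ``n`` items of ``results``."""
    return [vars(r) for r in itertools.islice(results, n)]


# A generator of fake results (stand-ins for search result objects)
fake_results = (SimpleNamespace(id=f"org/dataset_{i}", likes=i) for i in range(5))

first_batch = rows_of_results(fake_results, n=2)   # items 0 and 1
second_batch = rows_of_results(fake_results, n=2)  # items 2 and 3
print([row['id'] for row in second_batch])
```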


### Working with Models

The `hfdol.models` singleton provides the same dictionary-like interface for models:

#### Search for Models

Find models by keywords:

```python
model_search_results = hfdol.models.search('embeddings', gated=False)
model_result = next(model_search_results)
print(f"Model: {model_result.id}")
```

#### Download Models

Get the local path to a model (downloads if not cached):

```python
model_path = hfdol.models[model_result]
print(f"Model downloaded to: {model_path}")
```

#### List Local Models

See what models you have cached:

```python
list(hfdol.models)  # Lists all locally cached models
```

### Working with Spaces

The `hfdol.spaces` singleton provides access to HuggingFace Spaces (interactive ML demos and applications):

#### Search for Spaces

Find interesting Spaces by keywords:

```python
space_search_results = hfdol.spaces.search('gradio', limit=5)
space_result = next(space_search_results)
print(f"Space: {space_result.id}")
```

#### Access Space Information

Get detailed information about a Space:

```python
space_info = hfdol.spaces[space_result]
print(f"Space info: {space_info}")
```

#### List Local Spaces

See what spaces you have cached locally:

```python
list(hfdol.spaces)  # Lists all locally cached spaces
```

### Working with Papers

The `hfdol.papers` singleton provides access to research papers hosted on HuggingFace:

#### Search for Papers

Find research papers by topic:

```python
paper_search_results = hfdol.papers.search('transformer', limit=5)
paper_result = next(paper_search_results)
print(f"Paper: {paper_result.id}")
```

#### Access Paper Information

Get detailed information about a paper:

```python
paper_info = hfdol.papers[paper_result]
print(f"Paper title: {paper_info.title}")
print(f"Abstract: {paper_info.summary[:100]}...")
```

Note: Papers are metadata objects only—they contain information about research papers but don't have downloadable files like datasets or models.

### Getting Repository Sizes

You can check the size of any repository before downloading using the `get_size` function. The `repo_type` parameter is required to avoid ambiguity when repositories exist as multiple types:

```python
from hfdol import get_size

# Get size of a dataset (specify repo_type explicitly)
dataset_size = get_size('ccmusic-database/music_genre', repo_type='dataset')
print(f"Dataset size: {dataset_size:.2f} GiB")

# Get size of a model 
model_size = get_size('ccmusic-database/music_genre', repo_type='model')
print(f"Model size: {model_size:.2f} GiB")

# Using RepoType enum for type safety
from hfdol.base import RepoType
size_with_enum = get_size('some-repo', repo_type=RepoType.DATASET)

# Get size in different units (e.g., bytes)
size_in_bytes = get_size('some-repo', repo_type='dataset', unit_bytes=1)
```

**Pro tip**: Use the singleton instances to have `repo_type` handled automatically:
```python
# These automatically know their repo_type
dataset_size = hfdol.datasets.get_size('ccmusic-database/music_genre')
model_size = hfdol.models.get_size('ccmusic-database/music_genre')
```
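
For intuition about the `unit_bytes` parameter, here is a sketch of the unit arithmetic. It assumes that `unit_bytes` acts as a divisor applied to a size in bytes and that the default unit is GiB, as the examples above suggest. `convert_size` is a hypothetical helper, not the package's function.

```python
GIB = 1024 ** 3  # number of bytes in one gibibyte


def convert_size(size_in_bytes, unit_bytes=GIB):
    """Express a size given in bytes in units of ``unit_bytes`` bytes."""
    return size_in_bytes / unit_bytes


print(convert_size(3 * GIB))                  # in GiB
print(convert_size(2048, unit_bytes=1))       # in bytes
print(convert_size(2048, unit_bytes=1024))    # in KiB
```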

### Unified Interface

The beauty of this approach is that whether you're working with datasets, models, spaces, or papers, the interface remains familiar and consistent—just like working with Python dictionaries. All four singleton instances support the same core operations:

- **Dictionary-style access**: `resource = hfdol.datasets[key]`, `model_path = hfdol.models[key]`
- **Local listing**: `list(hfdol.datasets)`, `list(hfdol.models)` 
- **Remote searching**: `hfdol.datasets.search(query)`, `hfdol.models.search(query)`
- **Existence checking**: `key in hfdol.datasets`, `key in hfdol.models`

This unified interface means you can switch between different types of HuggingFace resources without learning new APIs—it's all just dictionaries! And since they're singleton instances, they're always ready to use without any setup.


## Design & Architecture

### Design Philosophy

This package is designed as a **thin façade** over the excellent [`huggingface_hub`](https://github.com/huggingface/huggingface_hub) and [`datasets`](https://github.com/huggingface/datasets) libraries. Rather than reinventing functionality, it provides a unified `Mapping` interface that wraps the most common operations, making them feel like native Python dictionary operations.

The design balances two sometimes-competing goals:
1. **Simplicity**: Keep the codebase small, readable, and maintainable
2. **Single Source of Truth (SSOT)**: Minimize hardcoded knowledge about the underlying APIs

Ideally, this interface would be *entirely* auto-generated through static analysis of the wrapped packages. While we achieve this partially, practical constraints require some manual intervention—but we've minimized it as much as possible.

### Key Architectural Patterns

#### 1. Configuration-Driven Design (SSOT)

The `repo_type_helpers` dictionary serves as the **single source of truth** for all repo-type-specific behavior:

```python
repo_type_helpers = dict(
    dataset=dict(
        loader_func=load_dataset,
        search_func=list_datasets,
    ),
    model=dict(
        loader_func=snapshot_download,
        search_func=list_models,
    ),
    # ... etc
)
```

This declarative approach means:
- Adding a new repo type requires only updating this configuration
- No duplication of logic across different repo types
- Clear visibility of how each type differs
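
As a rough illustration of how one generic class can be driven by such a configuration, consider the sketch below. All names are hypothetical (`fake_helpers`, `ConfiguredMapping`, and the stand-in loader and search functions), and the real `HfMapping` may be structured differently.

```python
def load_fake_dataset(repo_id):
    return {'kind': 'dataset', 'id': repo_id}  # stand-in for a dataset loader


def download_fake_model(repo_id):
    return f"path/to/{repo_id}"  # stand-in for a function returning a local path


def make_search(catalog):
    def search(filter):
        return (name for name in catalog if filter in name)

    return search


# The configuration: everything that differs between repo types lives here
fake_helpers = dict(
    dataset=dict(
        loader_func=load_fake_dataset,
        search_func=make_search(['org/music', 'org/speech']),
    ),
    model=dict(
        loader_func=download_fake_model,
        search_func=make_search(['org/embedder']),
    ),
)


class ConfiguredMapping:
    """Generic class whose behavior is looked up in the configuration."""

    def __init__(self, repo_type):
        helper = fake_helpers[repo_type]
        self.loader_func = helper['loader_func']
        self.search_func = helper['search_func']

    def __getitem__(self, key):
        return self.loader_func(key)

    def search(self, filter):
        return self.search_func(filter)


print(ConfiguredMapping('dataset')['org/music'])
print(ConfiguredMapping('model')['org/embedder'])
```

In this sketch, supporting another repo type means adding one more entry to `fake_helpers`; the class itself stays unchanged.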

#### 2. Dynamic Signature Injection

Rather than manually replicating the signatures of wrapped functions (which would violate SSOT), we use **signature extraction and injection** via the `sign_kwargs_with` decorator:

```python
@sign_kwargs_with(search_func)
def search(self, filter, **kwargs):
    return self.search_func(filter=filter, **kwargs)
```

This means:
- Each `.search()` method automatically inherits the correct signature from its underlying function
- IDEs and type checkers see the actual parameters available
- When HuggingFace updates their APIs, our signatures update automatically
- Documentation stays accurate without manual synchronization
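
The general technique can be sketched with `inspect` from the standard library. The decorator below, `sign_kwargs_like`, is a hypothetical illustration written for this README and is not the actual `sign_kwargs_with` implementation: it replaces the `**kwargs` of the decorated function with keyword-only copies of the source function's parameters.

```python
import functools
import inspect


def sign_kwargs_like(source_func):
    """Replace ``**kwargs`` in the decorated function's signature with
    keyword-only versions of ``source_func``'s parameters."""

    def decorator(func):
        params = inspect.signature(func).parameters.values()
        # Keep the explicitly declared parameters, drop the **kwargs catch-all
        explicit = [p for p in params if p.kind is not inspect.Parameter.VAR_KEYWORD]
        explicit_names = {p.name for p in explicit}
        variadic = (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD)
        borrowed = [
            p.replace(kind=inspect.Parameter.KEYWORD_ONLY)
            for p in inspect.signature(source_func).parameters.values()
            if p.name not in explicit_names and p.kind not in variadic
        ]

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            return func(*args, **kwargs)

        # inspect.signature (and tools built on it) will report this signature
        wrapper.__signature__ = inspect.Signature(explicit + borrowed)
        return wrapper

    return decorator


def fake_list_items(filter=None, author=None, limit=None):
    """Stand-in for a wrapped search function."""
    return dict(filter=filter, author=author, limit=limit)


@sign_kwargs_like(fake_list_items)
def search(filter, **kwargs):
    return fake_list_items(filter=filter, **kwargs)


print(inspect.signature(search))  # (filter, *, author=None, limit=None)
```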

**Note**: The `list_papers` function required special handling (`_list_papers` wrapper) because it uses `query` instead of `filter` as its parameter name. This is the type of pragmatic compromise we make—we normalize the interface rather than exposing the inconsistency.
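
A normalizing wrapper of this kind can be very small. The sketch below uses a fake search function and a hypothetical `list_papers_normalized` wrapper; it only illustrates the renaming of `query` to `filter` and is not the package's `_list_papers`.

```python
FAKE_PAPERS = ['attention is all you need', 'transformer circuits', 'graph networks']


def fake_list_papers(query=None, limit=None):
    """Stand-in for a search function whose search term is called ``query``."""
    matches = [title for title in FAKE_PAPERS if query is None or query in title]
    return matches[:limit]


def list_papers_normalized(filter=None, **kwargs):
    """Expose the search term as ``filter``, like the other search functions."""
    return fake_list_papers(query=filter, **kwargs)


print(list_papers_normalized('transformer'))
```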

#### 3. Separation of Concerns

The architecture cleanly separates:

- **Configuration** (`repo_type_helpers`): What differs between types
- **Base functionality** (`HfMapping`): Shared behavior for all types
- **Type-specific classes** (`HfDatasets`, `HfModels`, etc.): Minimal subclasses that mainly provide:
  - Clear, discoverable class names
  - Type-specific documentation
  - Future extensibility points
- **Convenience layer** (module-level singletons): Zero-setup access for users

#### 4. Module-Level Singletons

The pre-instantiated `datasets`, `models`, `spaces`, and `papers` instances follow Python's **convenience instance pattern** (seen in `sys.stdout`, `np.random`, etc.):

```python
# Ready to use immediately
datasets = HfDatasets()
models = HfModels()
```

This works because these instances:
- Have no mutable state
- Require no configuration for basic use
- Represent logical singletons ("the datasets mapping")

#### 5. Progressive Disclosure

The API supports multiple levels of sophistication:

```python
# Simplest: Use pre-configured singletons
data = hfdol.datasets['some/dataset']

# Advanced: Create custom instances with configuration
my_datasets = HfDatasets()

# Power user: Parameterized mapping for dynamic repo types
custom = HfMapping(RepoType.DATASET)
```

### Design Compromises

Several compromises were made for pragmatism:

1. **Manual wrappers**: `_list_papers` normalizes the papers API to match others
2. **Enum + string hybrid**: `RepoType(str, Enum)` allows both type safety and string convenience
3. **Explicit repo_type in get_size**: Required parameter to avoid ambiguity when repos exist as multiple types
4. **Signature injection limitations**: Works well for keyword arguments but can't handle complex overloads
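
The enum and string hybrid from item 2 relies on standard Python behavior, which this sketch demonstrates. The `Kind` class is a hypothetical stand-in with assumed member values; the members of the real `RepoType` may differ.

```python
from enum import Enum


class Kind(str, Enum):
    """Members are real ``str`` instances, so they compare equal to plain strings."""

    DATASET = 'dataset'
    MODEL = 'model'


def normalize_kind(kind):
    """Accept either a plain string or a ``Kind`` member; reject unknown values."""
    return Kind(kind)  # raises ValueError for an unknown string


print(Kind.DATASET == 'dataset')            # string convenience
print(normalize_kind('model') is Kind.MODEL)  # type safety after normalization
```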

### Contributing Guidelines

When contributing to this package, please maintain these principles:

**✅ DO:**
- Add configuration to `repo_type_helpers` rather than creating new methods
- Use signature extraction (`sign_kwargs_with`) when wrapping functions with many parameters
- Keep `HfMapping` generic and push specialization to configuration
- Document *why* special cases exist (like `_list_papers`)
- Test against actual HuggingFace APIs to catch signature drift

**❌ AVOID:**
- Duplicating knowledge about wrapped APIs
- Hardcoding parameter lists or types that could be extracted
- Adding stateful behavior to mapping instances
- Creating wrapper methods that simply pass through to underlying functions

**When in doubt:**
- Ask "Could this be driven by configuration?"
- Prefer declarative patterns over imperative logic
- Keep the codebase small and the configuration visible

The goal is a package where 80% of the code is just wiring and configuration, and the HuggingFace packages do the actual work. This maximizes maintainability and minimizes drift as those packages evolve.
