graze

Name	graze
Version	0.1.38
home_page	https://github.com/thorwhalen/graze
Summary	Cache (a tiny part of) the internet
upload_time	2025-10-29 17:27:44
maintainer	None
docs_url	None
author	Thor Whalen
requires_python	None
license	mit
keywords
VCS
bugtrack_url
requirements	No requirements were recorded.
Travis-CI	No Travis.
coveralls test coverage	No coveralls.

# graze

Cache (a tiny part of) the internet.

(For the technically inclined, `graze` is meant to ease the separation of the concerns of getting and caching/persisting data from the internet.)

## install

```pip install graze```


# Quick example

```python
from graze import Graze
import os
rootdir = os.path.expanduser('~/graze')
g = Graze(rootdir)
list(g)
```

If this is your first time, you got nothing:

```
[]
```

So get something. For no particular reason let's be self-referential and get myself:

```python
url = 'https://raw.githubusercontent.com/thorwhalen/graze/master/README.md'
content = g[url]
type(content), len(content)
```

Before I grew up, I had only 46 petty bytes (I have a lot more now):

```
(bytes, 46)
```

These were:

```python
print(content.decode())
```

```
# graze

Cache (a tiny part of) the internet.
```

But now, here's the deal. List your ``g`` keys now. Go ahead, don't be shy!

```python
list(g)
```
```
['https://raw.githubusercontent.com/thorwhalen/graze/master/README.md']
```

What does that mean? 

It means you have a local copy of these contents. 

The file path isn't really ``https://...``, it's `rootdir/https/...`, but you 
only have to care about that if you need to get the file with
something other than graze. Because graze will give it to you.

How? Same way you got it in the first place:

```python
content_2 = g[url]
assert content_2 == content
```

But this time, it didn't ask the internet. It just got its local copy.

And if you want a fresh copy? 

No problem, just delete your local one. You guessed it! 
The same way you would delete a key from a dict:

```python
del g[url]
```


# Understanding graze: Function and Class

Now that you've seen `graze` in action, let's dive deeper into how it works and what options you have to tailor it to your needs.

## The `graze()` function: Your core workhorse

At the heart of the package is the `graze()` function. It's simple: give it a URL, and it gives you back the contents as bytes. But here's the clever bit: it caches those bytes locally, so the next time you ask for the same URL, you get them back without hitting the network again.

```python
from graze import graze

# First call downloads and caches
content = graze('https://example.com/data.json')

# Second call uses cached version - blazing fast!
content_again = graze('https://example.com/data.json')
```

### Where does it cache?

By default, `graze()` stores files in `~/graze`, but you have full control over this through the `cache` parameter:

```python
# Cache to a specific folder
content = graze(url, cache='~/my_project/cache')

# Or use a specific filepath (cache defaults to None automatically)
content = graze(url, cache_key='~/data/specific_file.json')

# Or even use a dict for in-memory caching!
my_cache = {}
content = graze(url, cache=my_cache, cache_key='data.json')
```

The `cache` parameter accepts:
- `None` (default): Uses `~/graze` as the cache folder
- A string path: Any folder where you want files cached
- A `MutableMapping` (like dict or `dol.Files`): Custom storage backend
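
To make the three accepted forms concrete, here is a minimal sketch of how a `cache` argument could be normalized into a single dict-like object. This is not `graze`'s actual code: `FolderStore` and `resolve_cache` are hypothetical names made up for illustration, and the store only handles flat file names.

```python
import os
import tempfile
from collections.abc import MutableMapping


class FolderStore(MutableMapping):
    """Hypothetical dict-like view of a folder: keys are file names, values are bytes."""

    def __init__(self, rootdir):
        self.rootdir = os.path.expanduser(rootdir)
        os.makedirs(self.rootdir, exist_ok=True)

    def _path(self, key):
        return os.path.join(self.rootdir, key)

    def __getitem__(self, key):
        try:
            with open(self._path(key), "rb") as f:
                return f.read()
        except FileNotFoundError:
            raise KeyError(key) from None

    def __setitem__(self, key, value):
        with open(self._path(key), "wb") as f:
            f.write(value)

    def __delitem__(self, key):
        try:
            os.remove(self._path(key))
        except FileNotFoundError:
            raise KeyError(key) from None

    def __iter__(self):
        return iter(os.listdir(self.rootdir))

    def __len__(self):
        return len(os.listdir(self.rootdir))


def resolve_cache(cache=None, default_folder="~/graze"):
    """Turn None, a folder path, or a mapping into a MutableMapping."""
    if cache is None:
        cache = default_folder
    if isinstance(cache, str):
        return FolderStore(cache)
    if isinstance(cache, MutableMapping):
        return cache
    raise TypeError(f"Unsupported cache type: {type(cache).__name__}")


# Demo in a throwaway folder, so nothing is written to your home directory
with tempfile.TemporaryDirectory() as tmp:
    store = resolve_cache(tmp)
    store["hello.bin"] = b"hi"
    roundtrip = store["hello.bin"]
    listed = list(store)
```

Once everything looks like a `MutableMapping`, the rest of the caching logic doesn't need to know whether it is talking to a folder or to a plain dict.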

### Controlling the cache key

The `cache_key` parameter determines what key is used in your cache. By default, URLs are converted to safe filesystem paths, but you can customize this:

```python
# Auto-generated key (default)
content = graze('https://example.com/data.json')

# Explicit cache key
content = graze('https://example.com/data.json', cache_key='my_data.json')

# Use a function to generate keys
def url_to_key(url):
    return url.split('/')[-1]  # Just use filename
content = graze('https://example.com/data/file.json', cache_key=url_to_key)

# Or provide a full filepath (makes cache default to None)
content = graze('https://example.com/data.json', cache_key='~/my_data/important.json')
```
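
One thing to watch with a filename-only key function like `url_to_key` above: two different URLs can end in the same file name and would then share a cache entry. The sketch below shows the collision and one possible way around it. `filename_key` and `hashed_key` are hypothetical helpers written for this example, not part of `graze`.

```python
import hashlib
from urllib.parse import urlsplit


def filename_key(url):
    """Use only the last path segment; short, but different URLs can collide."""
    return urlsplit(url).path.rsplit("/", 1)[-1]


def hashed_key(url):
    """Prefix the file name with a short hash of the full URL to avoid collisions."""
    digest = hashlib.sha256(url.encode()).hexdigest()[:12]
    return f"{digest}_{filename_key(url)}"


a = "https://example.com/v1/data.json"
b = "https://example.com/v2/data.json"
same_name = filename_key(a) == filename_key(b)  # both reduce to the same file name
distinct = hashed_key(a) != hashed_key(b)       # the hash prefix tells them apart
```

Either function could be passed as `cache_key`, assuming (as the example above suggests) that a callable `cache_key` receives the URL and returns the key.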

### Keeping data fresh

What if the data at your URL changes? `graze` offers two powerful refresh strategies:

**Time-based refresh with `max_age`:**

```python
# Re-download if cached data is older than 1 hour (3600 seconds)
content = graze(url, max_age=3600)

# Or for a whole day
content = graze(url, max_age=86400)
```

**Custom refresh logic with `refresh`:**

```python
# Always re-download
content = graze(url, refresh=True)

# Or use a function for complex logic
def should_refresh(cache_key, url):
    # Your custom logic here
    return some_condition

content = graze(url, refresh=should_refresh)
```
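
If you wonder what a concrete `should_refresh` might look like, here is a small sketch based on file modification times. It follows the `(cache_key, url)` signature shown above, but it assumes that `cache_key` is the path of the cached file, which may not hold for every cache backend. `is_older_than` is a hypothetical helper.

```python
import os
import tempfile
import time


def is_older_than(filepath, max_age, now=None):
    """True if the file is missing or was last modified more than max_age seconds ago."""
    if not os.path.exists(filepath):
        return True
    now = time.time() if now is None else now
    return now - os.path.getmtime(filepath) > max_age


def should_refresh(cache_key, url):
    """Refresh when the cached file is over an hour old (cache_key assumed to be a path)."""
    return is_older_than(cache_key, max_age=3600)


# Demo on a temporary file: fresh at first, stale once we backdate it
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.json")
    with open(path, "wb") as f:
        f.write(b"{}")
    fresh = should_refresh(path, "https://example.com/data.json")
    two_hours_ago = time.time() - 7200
    os.utime(path, (two_hours_ago, two_hours_ago))  # pretend it was written 2h ago
    stale = should_refresh(path, "https://example.com/data.json")
```

This is the same idea as `max_age`, written out by hand, which is handy when the rule is more complicated than a single number of seconds.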

### Custom data sources

By default, `graze` uses `requests` to fetch URLs, but you can plug in any data source:

```python
from graze import graze, Internet

# Use a custom fetcher function
def my_fetcher(url):
    # Any logic you like, as long as it returns bytes
    from urllib.request import urlopen
    with urlopen(url) as response:
        return response.read()

content = graze(url, source=my_fetcher)

# Or use an object with __getitem__
content = graze(url, source=Internet(timeout=30))
```
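
The two kinds of source shown above (a function, or an object with `__getitem__`) are easy to convert into each other. Here is a minimal sketch of a wrapper that exposes a function through `__getitem__` and checks that it really returns bytes. `CallableSource` and `fake_fetcher` are hypothetical names used only for this example.

```python
class CallableSource:
    """Expose a url -> bytes function through __getitem__, checking the return type."""

    def __init__(self, url_to_bytes):
        self.url_to_bytes = url_to_bytes

    def __getitem__(self, url):
        contents = self.url_to_bytes(url)
        if not isinstance(contents, bytes):
            raise TypeError(f"Expected bytes for {url}, got {type(contents).__name__}")
        return contents


def fake_fetcher(url):
    # Stand-in for a real download, so the example runs offline
    return f"contents of {url}".encode()


source = CallableSource(fake_fetcher)
payload = source["https://example.com/data.json"]
```

Failing early on a non-bytes return value makes it easier to spot a fetcher that accidentally hands back text.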

### Getting notified of downloads

Want to know when `graze` is actually hitting the network?

```python
# Simple notification
content = graze(url, key_ingress=lambda k: print(f"Downloading {k}..."))

# Or get fancy with logging
import logging
logger = logging.getLogger(__name__)
content = graze(url, key_ingress=lambda k: logger.info(f"Fetching fresh data from {k}"))
```
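
A `key_ingress` function doesn't have to print anything. As a sketch, the hypothetical function below counts how often each URL is about to be fetched. It hands the key back unchanged as a precaution, since this README doesn't say whether the return value is used.

```python
from collections import Counter

download_counts = Counter()


def counting_key_ingress(key):
    """Record that `key` is about to be fetched, then hand the key back unchanged."""
    download_counts[key] += 1
    return key


# Called by hand here to show the behavior; no network involved
counting_key_ingress("https://example.com/a.json")
counting_key_ingress("https://example.com/a.json")
counting_key_ingress("https://example.com/b.json")
```

You would pass it as `key_ingress=counting_key_ingress` and inspect `download_counts` afterwards to see which URLs were actually downloaded.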

### Other useful parameters

```python
# Get the cache key/filepath instead of contents
filepath = graze(url, return_key=True)
```

## The `Graze` class: Your dict-like cache interface

While the `graze()` function is great for one-off fetches, the `Graze` class gives you a convenient dict-like interface to browse and manage your cached data.

```python
from graze import Graze

# Create your cache interface
g = Graze('~/my_cache')

# It's a mapping - use it like a dict!
urls = list(g)  # See what you've cached
content = g[url]  # Get contents (downloads if not cached)
url in g  # Check if cached
len(g)  # Count cached items
del g[url]  # Remove from cache
```

The beauty of `Graze` is that it makes your cache feel like a dictionary where the keys are URLs and the values are the byte contents. Under the hood, it's using the `graze()` function for all the heavy lifting.
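
If you're curious how a "dictionary that downloads on a miss" can work at all, here is a bare-bones sketch of the idea using Python's `__missing__` hook. It is an illustration of the pattern only, not how `Graze` is implemented; `CachingFetcher` and `fake_source` are made-up names, and the cache here is just an in-memory dict.

```python
class CachingFetcher(dict):
    """A dict that fills itself from `source` the first time a key is requested."""

    def __init__(self, source):
        super().__init__()
        self.source = source  # any callable: url -> bytes

    def __missing__(self, url):
        contents = self.source(url)  # only reached on a cache miss
        self[url] = contents
        return contents


calls = []


def fake_source(url):
    # Stand-in for the internet: remembers every time it was asked
    calls.append(url)
    return b"payload for " + url.encode()


g = CachingFetcher(fake_source)
first = g["https://example.com/data.json"]   # miss: calls fake_source
second = g["https://example.com/data.json"]  # hit: served from the dict
del g["https://example.com/data.json"]       # forget it, as with del g[url]
third = g["https://example.com/data.json"]   # miss again: calls fake_source
```

The source is asked twice in total: once for the first access, and once more after the `del`.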

### Configuring your Graze instance

`Graze` accepts similar parameters to `graze()`, but they apply to all operations:

```python
from graze import Graze, Internet

g = Graze(
    rootdir='~/my_cache',  # Where to cache
    source=Internet(timeout=30),  # Custom source
    key_ingress=lambda k: print(f"Fetching {k}"),  # Download notifications
)

# Now all operations use these settings
content = g['https://example.com/data.json']
```

### Working with filepaths

Sometimes you need the actual filepath where data is cached:

```python
# Get filepaths instead of contents
g = Graze('~/cache', return_filepaths=True)
filepath = g[url]  # Returns path string instead of bytes

# Or get filepath on demand
g = Graze('~/cache')
filepath = g.filepath_of(url)
content = g[url]  # Still gets contents normally
```

### When you need TTL (time-to-live) caching

For data that changes periodically, use `GrazeWithDataRefresh`:

```python
from graze import GrazeWithDataRefresh

# Re-fetch if data is older than 1 hour
g = GrazeWithDataRefresh(
    rootdir='~/cache',
    time_to_live=3600,  # seconds
    on_error='ignore'  # Return stale data if refresh fails
)

content = g[url]  # Fresh data (or cached if recent enough)
```

The `on_error` parameter controls what happens when refresh fails:
- `'ignore'`: Silently return stale cached data
- `'warn'`: Warn but return stale data
- `'raise'`: Raise the error
- `'warn_and_return_local'`: Warn and return stale data
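
To make the policies above more tangible, here is a small sketch of what "try to refresh, fall back to the stale copy" can look like. It only mirrors the behavior described in the list and is not `GrazeWithDataRefresh`'s code; `fetch_with_fallback` and `failing_fetch` are hypothetical.

```python
import warnings


def fetch_with_fallback(fetch, stale, on_error="ignore"):
    """Try fetch(); on failure apply an on_error policy using the stale value."""
    try:
        return fetch()
    except Exception as error:
        if on_error == "raise":
            raise
        if on_error in ("warn", "warn_and_return_local"):
            warnings.warn(f"Refresh failed, using cached copy: {error}")
        elif on_error != "ignore":
            raise ValueError(f"Unknown on_error policy: {on_error!r}")
        return stale


def failing_fetch():
    # Simulates a refresh attempt while the network is unavailable
    raise ConnectionError("network is down")


ignored = fetch_with_fallback(failing_fetch, b"old", on_error="ignore")
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warned = fetch_with_fallback(failing_fetch, b"old", on_error="warn")
```

With `'ignore'` and `'warn'` you still get `b"old"` back; with `'raise'` the `ConnectionError` propagates to the caller.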

### Advanced cache backends

Want to cache to something other than files? Use any `MutableMapping`:

```python
from dol import Files

# Files gives you a dict-like interface to a filesystem
cache = Files('~/cache')
g = Graze(cache)  # Now using Files instead of plain folder

# Or use an in-memory dict for temporary caching
cache = {}
g = Graze(cache)
```

## Choosing between `graze()` and `Graze`

Use the **`graze()` function** when:
- You're fetching a single URL
- You want different settings per fetch
- You prefer a functional style

Use the **`Graze` class** when:
- You want a dict-like interface to your cache
- You're working with multiple URLs with consistent settings
- You need to browse, count, or manage cached items
- You want to check what's cached before fetching


# Q&A


## The pages I need to slurp need to be rendered, can I use selenium or other such engines?

Sure!

We understand that sometimes you might have special slurping needs -- such 
as needing to let the JS render the page fully, and/or extract something 
specific, in a specific way, from the page.

Selenium is a popular choice for these needs.

`graze` doesn't install selenium for you, but if you've done that, you just 
need to specify a different `Internet` object for `Graze` to source from. 
To make an `Internet` object, you just need to specify a 
`url_to_contents` function that does exactly what its name says. 

Note that the contents need to be returned in bytes for `Graze` to work.

If you want to use some of the default `selenium` `url_to_contents` functions 
to make an `Internet` (we've got Chrome, Firefox, Safari, and Opera), 
go ahead! Here's an example using the default Chrome driver
(again, you need to have the driver installed already for this to work; 
see https://selenium-python.readthedocs.io/):

```python
from graze import Graze, url_to_contents, Internet

g = Graze(source=Internet(url_to_contents=url_to_contents.selenium_chrome))
```

And if you'll be using it often, just do:

```python
from graze import Graze, url_to_contents, Internet
from functools import partial
my_graze = partial(
    Graze,
    rootdir='a_specific_root_dir_for_your_project',
    source=Internet(url_to_contents=url_to_contents.selenium_chrome)
)

# and then you can just do
g = my_graze()
# and get on with the fun...
```
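
If you write your own `url_to_contents` function around a browser engine, remember the bytes requirement mentioned above: rendered page source usually comes back as text (Selenium's `driver.page_source` is a `str`, for instance). A minimal sketch of a wrapper that takes care of the encoding follows. `returning_bytes` and `fake_rendered_page` are hypothetical names, and the fake function stands in for a real browser call so the example runs offline.

```python
from functools import wraps


def returning_bytes(url_to_text, encoding="utf-8"):
    """Wrap a url -> str (or bytes) function so it always returns bytes."""

    @wraps(url_to_text)
    def url_to_contents(url):
        contents = url_to_text(url)
        if isinstance(contents, str):
            contents = contents.encode(encoding)
        return contents

    return url_to_contents


def fake_rendered_page(url):
    # Stand-in for a browser-driven fetch that yields the rendered HTML as text
    return f"<html><body>{url}</body></html>"


fetch_bytes = returning_bytes(fake_rendered_page)
html = fetch_bytes("https://example.com")
```

The wrapped function could then be handed to `Internet(url_to_contents=...)` in the same way as the selenium helpers above.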


## What if I want a fresh copy of the data?

Classic caching problem. 
You like the convenience of having a local copy, but then how do you keep in sync with the data source if it changes?

See the "Keeping data fresh" section above for comprehensive coverage of refresh strategies. In brief:

If you KNOW the source data changed and want to sync, it's easy. You delete the local copy 
(like deleting a key from a dict: `del g[url]`)
and you try to access it again. 
Since you don't have a local copy, it will get one from the `url` source. 

For automatic refresh, you have several options:

**Time-based (TTL) refresh:**
```python
from graze import graze

# Re-download if cached data is older than an hour
content_bytes = graze(url, max_age=3600)
```

**Or use `GrazeWithDataRefresh` for dict-like TTL caching:**
```python
from graze import GrazeWithDataRefresh

g = GrazeWithDataRefresh(time_to_live=3600, on_error='ignore')
content = g[url]
```

**Custom refresh logic:**
```python
# Always refresh
content = graze(url, refresh=True)

# Or use a custom function
def should_refresh(cache_key, url):
    return your_logic_here

content = graze(url, refresh=should_refresh)
```

## Can I make graze notify me when it gets a new copy of the data?

Sure! Just specify a `key_ingress` function when you make your `Graze` object, or 
call `graze`. This function will be called on the key (the url) just before the contents 
are downloaded from the internet. The typical function would be:

```python
key_ingress = lambda key: print(f"Getting {key} from the internet")
```

## Does graze work for dropbox links?

Yes it does, but you need to be aware that Dropbox systematically sends the data as a zip, **even if there's only one file in it**.

Here's some code that can help.

```python
def zip_store_of_dropbox_url(dropbox_url: str):
    """Get a key-value perspective of the (folder) contents 
    of the zip a dropbox url gets you"""
    from graze import graze
    from dol import FilesOfZip
    return FilesOfZip(graze(dropbox_url))
    
def filebytes_of_dropbox_url(dropbox_url: str, assert_only_one_file=True):
    """Get the bytes of the first file in a zip that a dropbox url gives you"""
    zip_store = zip_store_of_dropbox_url(dropbox_url)
    zip_filepaths = iter(zip_store)
    first_filepath = next(zip_filepaths)
    if assert_only_one_file:
        assert next(zip_filepaths, None) is None, f"More than one file in {dropbox_url}"
    return zip_store[first_filepath]
```
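
If you'd rather not depend on `dol` for this, the same idea can be sketched with the standard library only. The function below takes the zip bytes you already have (for example, what `graze(dropbox_url)` returned) and gives you the first file in it. `first_file_of_zip_bytes` is a hypothetical helper, and the demo builds a tiny zip in memory instead of downloading one.

```python
import io
import zipfile


def first_file_of_zip_bytes(zip_bytes, assert_only_one_file=True):
    """Return (name, contents) of the first file in an in-memory zip archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        names = [n for n in archive.namelist() if not n.endswith("/")]  # skip folders
        if not names:
            raise ValueError("The zip archive contains no files")
        if assert_only_one_file and len(names) > 1:
            raise ValueError(f"Expected one file, found {len(names)}")
        return names[0], archive.read(names[0])


# Build a small zip in memory to stand in for what a download would return
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("report.csv", "a,b\n1,2\n")

name, contents = first_file_of_zip_bytes(buffer.getvalue())
```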

## How do I use tiny_url?

`tiny_url` is a convenience utility that shortens long URLs, making them easier to work with in demos, notebooks, and testing. It's especially useful when you're working with GitHub raw content URLs or other lengthy URLs.

**Basic usage:**

```python
from graze import tiny_url

url = 'https://raw.githubusercontent.com/thorwhalen/graze/refs/heads/master/README.md'
short_url = tiny_url(url)
print(short_url)  # Much shorter!
```

**Encoding and decoding:**

`tiny_url` works like a codec with `encode` and `decode` methods:

```python
# Encoding (shortening)
encoded_url = tiny_url.encode(url)  # Same as tiny_url(url)

# Decoding (getting original URL back)
original_url = tiny_url.decode(encoded_url)
assert original_url == url
```

This is particularly useful when:
- Working in Jupyter notebooks with long URLs
- Creating cleaner demos and examples
- Testing with URLs that would clutter your code
- Sharing code snippets where URL readability matters


# Further Notes

## New url-to-path mapping 

`graze` used to have a more straightforward url-to-local_filepath mapping, 
but it ended up being problematic. In a nutshell: 
if you slurp `abc.com` and it goes to a file of that name, 
where is `abc.com/data.zip` supposed to go? (`abc.com` would need to be a folder 
in that case.)  
See [issue](https://github.com/thorwhalen/graze/issues/1).

It's with a heavy heart that I changed the mapping to one that is still 
straightforward, but has the disadvantage of mapping all files to the 
same file name, without extension. 

Hopefully a better solution will show up soon.

If you already have graze files from the old way, you can 
use the `change_files_to_new_url_to_filepath_format` function to change these 
to the new format.
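
The file-versus-folder clash described above is easy to reproduce with plain strings. The sketch below compares a naive "one file per URL" mapping with a "one folder per URL, fixed file name inside" mapping. Both functions are hypothetical illustrations of the problem, and the fixed name `contents` is made up; it is not necessarily the name `graze` uses.

```python
from urllib.parse import urlsplit


def naive_path(rootdir, url):
    """One file per URL, named after the URL itself."""
    parts = urlsplit(url)
    return "/".join([rootdir, parts.scheme, parts.netloc + parts.path]).rstrip("/")


def folder_per_url_path(rootdir, url, filename="contents"):
    """One folder per URL, holding a file with a fixed (hypothetical) name."""
    return naive_path(rootdir, url) + "/" + filename


site = naive_path("rootdir", "https://abc.com")
archive = naive_path("rootdir", "https://abc.com/data.zip")
# In the naive scheme, `site` must be both a file and the parent folder of `archive`
conflict = archive.startswith(site + "/")

site_new = folder_per_url_path("rootdir", "https://abc.com")
archive_new = folder_per_url_path("rootdir", "https://abc.com/data.zip")
no_conflict = not archive_new.startswith(site_new + "/")
```

Note that a URL whose last path segment equals the fixed file name would still clash in this sketch, so a real implementation would want a name that is unlikely to appear in URLs.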

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/thorwhalen/graze",
    "name": "graze",
    "maintainer": null,
    "docs_url": null,
    "requires_python": null,
    "maintainer_email": null,
    "keywords": null,
    "author": "Thor Whalen",
    "author_email": null,
    "download_url": "https://files.pythonhosted.org/packages/b0/62/c7d569c433c99ef94c3d0bc91e7d067237563576ffe77698e8d824ea6012/graze-0.1.38.tar.gz",
    "platform": "any",
    "bugtrack_url": null,
    "license": "mit",
    "summary": "Cache (a tiny part of) the internet",
    "version": "0.1.38",
    "project_urls": {
        "Homepage": "https://github.com/thorwhalen/graze"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "0ee917daa307041bf9cfb7789ea30e6007c219223b4f20df612b97bd86ccae5c",
                "md5": "411e8083da4d5fb75334b3d44d373793",
                "sha256": "33f7192821180de1fe5deb0eca81cb4339f3e76d02b90db292b8ef0f0bda7210"
            },
            "downloads": -1,
            "filename": "graze-0.1.38-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "411e8083da4d5fb75334b3d44d373793",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 30140,
            "upload_time": "2025-10-29T17:27:42",
            "upload_time_iso_8601": "2025-10-29T17:27:42.688143Z",
            "url": "https://files.pythonhosted.org/packages/0e/e9/17daa307041bf9cfb7789ea30e6007c219223b4f20df612b97bd86ccae5c/graze-0.1.38-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "b062c7d569c433c99ef94c3d0bc91e7d067237563576ffe77698e8d824ea6012",
                "md5": "35558f7e9a01015be1917a72841d128d",
                "sha256": "e9ac24c82eb71a518279ba2903bf756cf6f615938968aa2bee296d433bace009"
            },
            "downloads": -1,
            "filename": "graze-0.1.38.tar.gz",
            "has_sig": false,
            "md5_digest": "35558f7e9a01015be1917a72841d128d",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 32969,
            "upload_time": "2025-10-29T17:27:44",
            "upload_time_iso_8601": "2025-10-29T17:27:44.484278Z",
            "url": "https://files.pythonhosted.org/packages/b0/62/c7d569c433c99ef94c3d0bc91e7d067237563576ffe77698e8d824ea6012/graze-0.1.38.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-10-29 17:27:44",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "thorwhalen",
    "github_project": "graze",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "graze"
}
