==========================
charmonium.freeze
==========================
.. image:: https://img.shields.io/pypi/v/charmonium.freeze
:alt: PyPI Package
:target: https://pypi.org/project/charmonium.freeze
.. image:: https://img.shields.io/pypi/dm/charmonium.freeze
:alt: PyPI Downloads
:target: https://pypi.org/project/charmonium.freeze
.. image:: https://img.shields.io/pypi/l/charmonium.freeze
:alt: License
:target: https://github.com/charmoniumQ/charmonium.freeze/blob/main/LICENSE
.. image:: https://img.shields.io/pypi/pyversions/charmonium.freeze
:alt: Python Versions
:target: https://pypi.org/project/charmonium.freeze
.. image:: https://img.shields.io/librariesio/sourcerank/pypi/charmonium.freeze
:alt: libraries.io sourcerank
:target: https://libraries.io/pypi/charmonium.freeze
.. image:: https://img.shields.io/github/stars/charmoniumQ/charmonium.freeze?style=social
:alt: GitHub stars
:target: https://github.com/charmoniumQ/charmonium.freeze
.. image:: https://github.com/charmoniumQ/charmonium.freeze/actions/workflows/main.yaml/badge.svg
:alt: CI status
:target: https://github.com/charmoniumQ/charmonium.freeze/actions/workflows/main.yaml
.. image:: https://codecov.io/gh/charmoniumQ/charmonium.freeze/branch/main/graph/badge.svg?token=56A97FFTGZ
:alt: Code Coverage
:target: https://codecov.io/gh/charmoniumQ/charmonium.freeze
.. image:: https://img.shields.io/github/last-commit/charmoniumQ/charmonium.freeze
:alt: GitHub last commit
:target: https://github.com/charmoniumQ/charmonium.freeze/commits
.. image:: http://www.mypy-lang.org/static/mypy_badge.svg
:target: https://mypy.readthedocs.io/en/stable/
:alt: Checked with Mypy
.. image:: https://img.shields.io/badge/code%20style-black-000000.svg
:target: https://github.com/psf/black
:alt: Code style: black
Injectively, deterministically maps arbitrary objects to hashable, immutable values.
----------
Quickstart
----------
If you don't have ``pip`` installed, see the `pip install guide`_.
.. _`pip install guide`: https://pip.pypa.io/en/latest/installing/
.. code-block:: console
$ pip install charmonium.freeze
For a related project, |charmonium.cache|_, I needed a function that
deterministically, injectively maps objects to hashable objects.
- "Injectively" means ``freeze(a) == freeze(b)`` implies ``a == b``
(with the precondition that ``a`` and ``b`` are of the same type).
- "Deterministically" means it should return the same value **across
subsequent process invocations** (with the same interpreter major
and minor version), unlike Python's |hash|_ function, which is not
deterministic between processes.
- "Hashable" means one can call ``hash(...)`` on it. All hashable
values are immutable.
.. |hash| replace:: ``hash``
.. _`hash`: https://docs.python.org/3.8/reference/datamodel.html#object.__hash__
.. |charmonium.cache| replace:: ``charmonium.cache``
.. _`charmonium.cache`: https://github.com/charmoniumQ/charmonium.cache
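The difference between the builtin ``hash`` and a deterministic digest can be observed with the standard library alone. The sketch below does not use ``charmonium.freeze``; it launches two fresh interpreters with different ``PYTHONHASHSEED`` values and compares the builtin hash of a string against a SHA-256 digest. The helper name ``run_with_seed`` is made up for this illustration.

```python
import os
import subprocess
import sys

# Code executed in each child interpreter: print the builtin hash of a
# string and a content-based digest of the same bytes.
SNIPPET = (
    "import hashlib;"
    "print(hash('hello'));"
    "print(hashlib.sha256(b'hello').hexdigest())"
)


def run_with_seed(seed):
    """Run SNIPPET in a fresh interpreter with the given PYTHONHASHSEED."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    output = subprocess.run(
        [sys.executable, "-c", SNIPPET],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    ).stdout.split()
    return int(output[0]), output[1]


builtin_a, digest_a = run_with_seed(1)
builtin_b, digest_b = run_with_seed(2)

print(builtin_a == builtin_b)  # the builtin hash depends on the seed
print(digest_a == digest_b)    # the digest depends only on the content
```

A value that is meant to serve as a cache key across runs has to behave like the digest, not like the builtin hash.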
Have you ever felt like you wanted to "freeze" a list of arbitrary
data into a hashable value? Now you can.
>>> obj = [1, 2, 3, {4, 5, 6}, object()]
>>> hash(obj)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'
>>> from charmonium.freeze import freeze
>>> freeze(obj)
9561766455304166758
-------------
Configuration
-------------
By changing the configuration, we can see the exact data that gets hashed.
We can change the configuration in a few ways:
- Object-oriented (preferred)
>>> from charmonium.freeze import Config
>>> freeze(obj, Config(use_hash=False))
(1, 2, 3, frozenset({4, 5, 6}), ((('builtins', 'object'),), b'copyreg.__newobj__'))
- Global variable, but in this case, we must also clear the cache when we mutate
the config.
>>> from charmonium.freeze import global_config
>>> global_config.use_hash = False
>>> global_config.memo.clear()
>>> freeze(obj)
(1, 2, 3, frozenset({4, 5, 6}), ((('builtins', 'object'),), b'copyreg.__newobj__'))
``use_hash=True`` will be faster and produce less data, but I will demonstrate
it with ``use_hash=False`` so you can see what data gets included in the state.
See the source code ``charmonium/freeze/config.py`` for other configuration
options.
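To see why a memo has to be cleared after the configuration is mutated, consider the toy model below. It is not the library's implementation: ``ToyConfig`` and ``toy_freeze`` are hypothetical names, and the memo is keyed by ``id(obj)`` purely for illustration.

```python
class ToyConfig:
    """Hypothetical stand-in for a config object that also owns a memo."""

    def __init__(self, use_hash=True):
        self.use_hash = use_hash
        self.memo = {}


def toy_freeze(obj, config):
    """Freeze a flat list, remembering the result in config.memo."""
    key = id(obj)
    if key not in config.memo:
        frozen = tuple(obj)
        config.memo[key] = hash(frozen) if config.use_hash else frozen
    return config.memo[key]


config = ToyConfig(use_hash=True)
data = [1, 2, 3]

first = toy_freeze(data, config)   # computed with use_hash=True
config.use_hash = False
stale = toy_freeze(data, config)   # served from the memo, ignores the change
config.memo.clear()
fresh = toy_freeze(data, config)   # recomputed with use_hash=False
```

Passing a fresh config object per call sidesteps the problem, because each object starts with an empty memo.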
------------------
Freezing Functions
------------------
``freeze`` on a function returns its bytecode, constants, and closure variables.
The remarkable thing is that this value stays the same across subsequent process
invocations. If the user edits the script and changes the function, then its
``freeze`` will change too. This tells you whether it is safe to reuse a cached
result of the function.
::
(freeze(f) == freeze(g)) implies (for all x, f(x) == g(x))
>>> from pprint import pprint
>>> i = 456
>>> func = lambda x: x + i + 123
>>> pprint(freeze(func))
(('<lambda>', None, 123, b'|\x00t\x00\x17\x00d\x01\x17\x00S\x00'),
(('i', 456),))
As promised, the frozen value includes the bytecode (``b'|\x00t...'``), the
constants (123), and the closure variables (``i``, bound to 456). When we change
``i``, we get a different frozen value, indicating that ``func`` might not be
computationally equivalent to what it was before.
>>> i = 789
>>> pprint(freeze(func))
(('<lambda>', None, 123, b'|\x00t\x00\x17\x00d\x01\x17\x00S\x00'),
(('i', 789),))
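The ingredients named above are all reachable through ordinary function attributes. The following is a rough sketch using only the standard library, not the code ``freeze`` actually runs; ``describe_function`` and ``make_adder`` are hypothetical names, and the global-variable lookup is a deliberate approximation.

```python
def make_adder(offset):
    """Return a closure that captures `offset` from the enclosing scope."""
    def add_offset(x):
        return x + offset + 1000
    return add_offset


def describe_function(func):
    """Hypothetical helper: gather what determines a function's behavior."""
    code = func.__code__
    # Values captured from enclosing scopes (empty for plain functions).
    closure = tuple(cell.cell_contents for cell in (func.__closure__ or ()))
    # Global names the bytecode mentions, paired with their current values.
    # co_names also lists attribute names, so this is only an approximation.
    used_globals = tuple(
        (name, func.__globals__[name])
        for name in code.co_names
        if name in func.__globals__
    )
    return (code.co_name, code.co_code, code.co_consts, closure, used_globals)


add_456 = describe_function(make_adder(456))
add_789 = describe_function(make_adder(789))

print(add_456[1] == add_789[1])  # same bytecode
print(add_456 == add_789)        # different captured value
```

The bytecode is identical for both closures, so only the captured value tells them apart. That is why the captured variables have to be part of the frozen value.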
``freeze`` works for objects that use functions as data.
>>> import functools
>>> pprint(freeze(functools.partial(print, 123)))
(('print',),
('print', (123,), (), None),
(frozenset({'partial',
(...,
('args', (b'member_descriptor', b'args')),
('func', (b'member_descriptor', b'func')),
('keywords', (b'member_descriptor', b'keywords')))}),
('builtins', 'object')))
``freeze`` works for methods.
>>> class Greeter:
... def __init__(self, greeting):
... self.greeting = greeting
... def greet(self, name):
... print(self.greeting + " " + name)
...
>>> pprint(freeze(Greeter.greet))
(('greet',
None,
' ',
b't\x00|\x00j\x01d\x01\x17\x00|\x01\x17\x00\x83\x01\x01\x00d\x00S\x00'),)
----------------
Freezing Objects
----------------
``freeze`` works on objects by freezing their state and freezing their
methods. The state is found through the `pickle protocol`_, which Python
implements by default for all classes. To get an idea of what this returns, call
``obj.__reduce_ex__(4)``. Because we reuse an existing protocol, ``freeze`` works
correctly on most user-defined types.
.. _`pickle protocol`: https://docs.python.org/3/library/pickle.html#pickling-class-instances
>>> s = Greeter("hello")
>>> pprint(s.__reduce_ex__(4))
(<function __newobj__ at 0x...>,
(<class '__main__.Greeter'>,),
{'greeting': 'hello'},
None,
None)
>>> pprint(freeze(s))
(((frozenset({'Greeter',
(('__init__',
(('__init__', None, b'|\x01|\x00_\x00d\x00S\x00'),)),
('greet',
(('greet',
None,
' ',
b't\x00|\x00j\x01d\x01\x17\x00|\x01\x17\x00\x83\x01'
b'\x01\x00d\x00S\x00'),)))}),
('builtins', 'object')),),
(('greeting', 'hello'),),
b'copyreg.__newobj__')
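As a minimal illustration of reading state through the pickle protocol, the sketch below pulls the constructor, its arguments, and the attribute dictionary out of ``__reduce_ex__(4)``. ``Point`` and ``pickle_state`` are hypothetical names, and the helper assumes a plain instance without ``__slots__`` whose attribute values are hashable.

```python
import copyreg


class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y


def pickle_state(obj):
    """Hypothetical helper: summarize what the pickle protocol reports."""
    reduced = obj.__reduce_ex__(4)
    constructor, args = reduced[0], reduced[1]
    # The third element, when present, is the instance state (a dict here).
    state = reduced[2] if len(reduced) > 2 else None
    # A tuple of pairs is hashable as long as the values are.
    state_items = tuple((state or {}).items())
    return (constructor, args, state_items)


constructor, args, state_items = pickle_state(Point(1, 2))
print(state_items)
```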
However, there can still be special cases: ``pickle`` may incorporate
non-deterministic values. In this case, there are three remedies:
- If you can tweak the definition of the class, add a method called
``__getfrozenstate__`` which returns a deterministic snapshot of the
state. This takes precedence over the Pickle protocol, if it is defined.
>>> class Greeter:
... def __init__(self, greeting):
... self.greeting = greeting
... def greet(self, name):
... print(self.greeting + " " + name)
... def __getfrozenstate__(self):
... return self.greeting
...
>>> pprint(freeze(Greeter("hello")))
((frozenset({'Greeter',
(('__getfrozenstate__',
(('__getfrozenstate__', None, b'|\x00j\x00S\x00'),)),
('__init__', (('__init__', None, b'|\x01|\x00_\x00d\x00S\x00'),)),
('greet',
(('greet',
None,
' ',
b't\x00|\x00j\x01d\x01\x17\x00|\x01\x17\x00\x83\x01'
b'\x01\x00d\x00S\x00'),)))}),
('builtins', 'object')),
'hello')
- Otherwise, you can ignore certain attributes by changing the
configuration. See the source code of ``charmonium/freeze/config.py`` for more
details.
>>> class Greeter:
... def __init__(self, greeting):
... self.greeting = greeting
... def greet(self, name):
... print(self.greeting + " " + name)
...
>>> config = Config(use_hash=False)
>>> config.ignore_attributes.add(("__main__", "Greeter", "greeting"))
>>> pprint(freeze(Greeter("hello"), config))
(((frozenset({'Greeter',
(('__init__',
(('__init__', None, b'|\x01|\x00_\x00d\x00S\x00'),)),
('greet',
(('greet',
None,
' ',
b't\x00|\x00j\x01d\x01\x17\x00|\x01\x17\x00\x83\x01'
b'\x01\x00d\x00S\x00'),)))}),
('builtins', 'object')),),
(),
b'copyreg.__newobj__')
Note that ``'hello'`` is not present in the frozen object any more.
- If you cannot tweak the definition of the class or monkeypatch a
``__getfrozenstate__`` method, you can still register a `single dispatch
handler`_ for that type:
.. _`single dispatch handler`: https://docs.python.org/3/library/functools.html#functools.singledispatch
>>> from typing import Hashable, Optional, Dict, Tuple
>>> from charmonium.freeze import _freeze_dispatch, _freeze
>>> @_freeze_dispatch.register(Greeter)
... def _(
... obj: Greeter,
... config: Config,
... tabu: Dict[int, Tuple[int, int]],
... level: int,
... index: int,
... ) -> Tuple[Hashable, bool, Optional[int]]:
... # Type annotations are optional.
... # I have included them here for clarity.
...
... # `tabu` is for object cycle detection. It is handled for you.
... # `level` is for logging and recursion limits. It is incremented for you.
... # `index` is the "birth order" of the children.
... frozen_greeting = _freeze(obj.greeting, config, tabu, level, 0)
...
... return (
... frozen_greeting[0],
... # Remember that _freeze returns a triple;
... # we are only interested in the first element here.
...
... False,
... # Whether the obj is immutable
... # If the obj is immutable, its frozen value need not be recomputed every time.
... # This is handled for you.
...
... None,
... # The depth of references contained here or None
... # Currently, this doesn't do anything.
... )
...
>>> freeze(Greeter("Hello"))
'Hello'
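The precedence described in the first remedy can be pictured as a simple lookup order: use the custom hook if the object defines one, otherwise fall back to the pickle protocol. The sketch below is an assumption about the shape of that logic, not the library's code; ``get_state``, ``Connection``, and ``Plain`` are hypothetical names.

```python
import time


class Connection:
    def __init__(self, host):
        self.host = host
        self.opened_at = time.time()  # differs on every run

    def __getfrozenstate__(self):
        # Only the host matters for equivalence, so skip the timestamp.
        return self.host


class Plain:
    def __init__(self):
        self.value = 1


def get_state(obj):
    """Hypothetical lookup order: custom hook first, then pickle state."""
    hook = getattr(obj, "__getfrozenstate__", None)
    if hook is not None:
        return hook()
    reduced = obj.__reduce_ex__(4)
    return reduced[2] if len(reduced) > 2 else None


first = get_state(Connection("example.org"))
second = get_state(Connection("example.org"))
fallback = get_state(Plain())
```

Both ``Connection`` objects yield the same state even though their timestamps differ, while ``Plain`` falls through to the attribute dictionary.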
----------------
Dictionary order
----------------
As of Python 3.7, dictionaries "remember" their insertion order. As such,
``freeze`` distinguishes dictionaries that differ only in insertion order:
>>> freeze({"a": 1, "b": 2})
(('a', 1), ('b', 2))
>>> freeze({"b": 2, "a": 1})
(('b', 2), ('a', 1))
This behavior is controllable by ``Config.ignore_dict_order``. When it is set, ``freeze`` emits a ``frozenset`` of pairs, so insertion order no longer matters.
>>> config = Config(ignore_dict_order=True)
>>> freeze({"b": 2, "a": 1}, config) == freeze({"a": 1, "b": 2}, config)
True
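The two representations can be compared in plain Python, without the library. A tuple of pairs preserves insertion order, while a ``frozenset`` of pairs discards it:

```python
d1 = {"a": 1, "b": 2}
d2 = {"b": 2, "a": 1}

# Tuples of pairs keep insertion order, so these differ.
ordered_1 = tuple(d1.items())
ordered_2 = tuple(d2.items())

# Frozensets of pairs have no order, so these are equal.
unordered_1 = frozenset(d1.items())
unordered_2 = frozenset(d2.items())

print(d1 == d2)                    # dict equality ignores order
print(ordered_1 == ordered_2)      # order-sensitive comparison
print(unordered_1 == unordered_2)  # order-insensitive comparison
```

Since ``d1 == d2`` holds in Python, ignoring dictionary order makes the frozen values agree with ordinary dictionary equality.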
--------------
Summarize diff
--------------
Freezing also enables a pretty neat utility for comparing two arbitrary Python objects.
>>> from charmonium.freeze import summarize_diffs
>>> obj0 = [0, 1, 2, {3, 4}, {"a": 5, "b": 6, "c": 7}, 8]
>>> obj1 = [0, 8, 2, {3, 5}, {"a": 5, "b": 7, "d": 8}]
>>> print(summarize_diffs(obj0, obj1))
let obj0_sub = obj0
let obj1_sub = obj1
obj0_sub.__len__() == 6
obj1_sub.__len__() == 5
obj0_sub[1] == 1
obj1_sub[1] == 8
obj0_sub[3].has() == 4
obj1_sub[3].has() == no such element
obj0_sub[3].has() == no such element
obj1_sub[3].has() == 5
obj0_sub[4].keys().has() == c
obj1_sub[4].keys().has() == no such element
obj0_sub[4].keys().has() == no such element
obj1_sub[4].keys().has() == d
obj0_sub[4]['b'] == 6
obj1_sub[4]['b'] == 7
And if you don't like my printing style, you can get programmatic
access to this information.
>>> from charmonium.freeze import iterate_diffs
>>> for o1, o2 in iterate_diffs(obj0, obj1):
... print(o1, o2, sep="\n")
ObjectLocation(labels=('obj0', '.__len__()'), objects=(..., 6))
ObjectLocation(labels=('obj1', '.__len__()'), objects=(..., 5))
ObjectLocation(labels=('obj0', '[1]'), objects=(..., 1))
ObjectLocation(labels=('obj1', '[1]'), objects=(..., 8))
ObjectLocation(labels=('obj0', '[3]', '.has()'), objects=(...), 4))
ObjectLocation(labels=('obj1', '[3]', '.has()'), objects=(..., 'no such element'))
ObjectLocation(labels=('obj0', '[3]', '.has()'), objects=(...), 'no such element'))
ObjectLocation(labels=('obj1', '[3]', '.has()'), objects=(..., 5))
ObjectLocation(labels=('obj0', '[4]', '.keys()', '.has()'), objects=(..., 'c'))
ObjectLocation(labels=('obj1', '[4]', '.keys()', '.has()'), objects=(..., 'no such element'))
ObjectLocation(labels=('obj0', '[4]', '.keys()', '.has()'), objects=(..., 'no such element'))
ObjectLocation(labels=('obj1', '[4]', '.keys()', '.has()'), objects=(..., 'd'))
ObjectLocation(labels=('obj0', '[4]', "['b']"), objects=(..., 6))
ObjectLocation(labels=('obj1', '[4]', "['b']"), objects=(..., 7))
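For intuition, a structural diff of this kind can be written as a short recursive generator. The sketch below is a simplified, hypothetical ``iter_diffs`` that handles only lists, tuples, sets, and dicts of plain values; it is not how ``iterate_diffs`` is implemented, and its labels merely imitate the output shown above.

```python
MISSING = "no such element"


def iter_diffs(a, b, path="obj"):
    """Yield (path, left, right) for each place where a and b differ."""
    if type(a) is not type(b):
        yield (path, a, b)
    elif isinstance(a, (list, tuple)):
        if len(a) != len(b):
            yield (f"{path}.__len__()", len(a), len(b))
        # zip stops at the shorter sequence; the length diff covers the rest.
        for index, (x, y) in enumerate(zip(a, b)):
            yield from iter_diffs(x, y, f"{path}[{index}]")
    elif isinstance(a, (set, frozenset)):
        for element in sorted(a - b, key=repr):
            yield (f"{path}.has()", element, MISSING)
        for element in sorted(b - a, key=repr):
            yield (f"{path}.has()", MISSING, element)
    elif isinstance(a, dict):
        for key in sorted(a.keys() - b.keys(), key=repr):
            yield (f"{path}.keys().has()", key, MISSING)
        for key in sorted(b.keys() - a.keys(), key=repr):
            yield (f"{path}.keys().has()", MISSING, key)
        # Recurse into values stored under keys that both dicts share.
        for key in a:
            if key in b:
                yield from iter_diffs(a[key], b[key], f"{path}[{key!r}]")
    elif a != b:
        yield (path, a, b)


obj0 = [0, 1, 2, {3, 4}, {"a": 5, "b": 6, "c": 7}, 8]
obj1 = [0, 8, 2, {3, 5}, {"a": 5, "b": 7, "d": 8}]
diffs = list(iter_diffs(obj0, obj1))
for path, left, right in diffs:
    print(path, left, right)
```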
---------
Debugging
---------
Use the following lines to see how ``freeze`` decomposes an object into
primitive values.
.. code:: python
import logging, os
logger = logging.getLogger("charmonium.freeze")
logger.setLevel(logging.DEBUG)
fh = logging.FileHandler("freeze.log")
fh.setLevel(logging.DEBUG)
fh.setFormatter(logging.Formatter("%(message)s"))
logger.addHandler(fh)
logger.debug("Program %d", os.getpid())
i = 0
def square_plus_i(x):
# Value of global variable will be included in the function's frozen state.
return x**2 + i
from charmonium.freeze import freeze
freeze(square_plus_i)
This writes a log to ``freeze.log`` that looks like this:
::
freeze begin <function square_plus_i at 0x7f9228bff550>
function <function square_plus_i at 0x7f9228bff550>
tuple (('code', <code object square_plus_i at 0x7f9228c6cf50, file "/tmp/ipython_edit_303agyiz/ipython_edit_rez33yf_.py", line 2>), 'closure globals', {'i': 0})
tuple ('code', <code object square_plus_i at 0x7f9228c6cf50, file "/tmp/ipython_edit_303agyiz/ipython_edit_rez33yf_.py", line 2>)
'code'
code <code object square_plus_i at 0x7f9228c6cf50, file "/tmp/ipython_edit_303agyiz/ipython_edit_rez33yf_.py", line 2>
tuple (None, 2)
None
2
b'|\x00d\x01\x13\x00t\x00\x17\x00S\x00'
'closure globals'
dict {'i': 0}
'i'
0
freeze end
I do this to find the differences between subsequent runs:
.. code:: shell
$ python code.py
$ mv freeze.log freeze.0.log
$ python code.py
$ mv freeze.log freeze.1.log
$ sed -i 's/at 0x[0-9a-f]*//g' freeze.*.log
# This removes pointer values that appear in the `repr(...)`.
$ meld freeze.0.log freeze.1.log
# Alternatively, use `icdiff` or `diff -u1`.
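The same comparison can be done from Python with ``re`` and ``difflib`` when ``sed`` or ``meld`` are not available. This is a sketch under assumptions: the log lines are abbreviated samples, the second address is made up, and the pattern also removes the space before ``at``, which the ``sed`` command above leaves in place.

```python
import difflib
import re

# Matches the memory address that repr(...) embeds, e.g. " at 0x7f9228bff550".
ADDRESS = re.compile(r" at 0x[0-9a-f]+")


def normalize(lines):
    """Strip pointer values so that only meaningful differences remain."""
    return [ADDRESS.sub("", line) for line in lines]


# Abbreviated sample logs from two runs; the second address is invented.
run_0 = [
    "function <function square_plus_i at 0x7f9228bff550>",
    "dict {'i': 0}",
]
run_1 = [
    "function <function square_plus_i at 0x7f11aa3c1d30>",
    "dict {'i': 1}",
]

diff = list(
    difflib.unified_diff(
        normalize(run_0),
        normalize(run_1),
        "freeze.0.log",
        "freeze.1.log",
        lineterm="",
    )
)
print("\n".join(diff))
```

After normalization, only the line that records the changed global value shows up in the diff.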
If ``freeze(obj)`` is taking a long time, try calling ``freeze(obj,
Config(recursion_limit=20))`` instead. This raises an exception if ``freeze``
recurses deeper than the given limit. If you hit this exception, consider adding
ignored classes, functions, attributes, or objects in ``Config``.
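The idea behind a recursion limit can be sketched with a toy traversal that tracks its own depth and fails fast once the limit is passed. ``walk`` and ``DepthExceeded`` are hypothetical names; the exception type and message that ``freeze`` really uses may differ.

```python
class DepthExceeded(Exception):
    """Hypothetical error raised when traversal goes deeper than allowed."""


def walk(obj, recursion_limit, level=0):
    """Convert nested lists and dicts to tuples, failing past the limit."""
    if level > recursion_limit:
        raise DepthExceeded(
            f"exceeded {recursion_limit} levels at a {type(obj).__name__}"
        )
    if isinstance(obj, (list, tuple)):
        return tuple(walk(item, recursion_limit, level + 1) for item in obj)
    if isinstance(obj, dict):
        return tuple(
            (key, walk(value, recursion_limit, level + 1))
            for key, value in obj.items()
        )
    return obj


nested = [1, [2, [3, [4]]]]

shallow_ok = walk(nested, recursion_limit=10)

try:
    walk(nested, recursion_limit=2)
    hit_limit = False
except DepthExceeded:
    hit_limit = True
```

The error message names the type that was being visited when the limit was hit, which hints at what to ignore.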
----------
Developing
----------
See `CONTRIBUTING.md`_ for instructions on setting up a development environment.
.. _`CONTRIBUTING.md`: https://github.com/charmoniumQ/charmonium.freeze/tree/main/CONTRIBUTING.md
----
TODO
----
- ☐ Correctness
- ☑ Test hashing sets with different orders. Assert tests fail.
- ☑ Test hashing dicts with different orders. Assert tests fail.
- ☑ Don't include properties in hash.
- ☑ Test that freeze of an object includes freeze of its instance methods.
- ☑ Test functions with minor changes.
- ☑ Test set/dict with diff hash.
- ☑ Test obj with slots.
- ☑ Test hash for objects and classes more carefully.
- ☑ Improve test coverage.
- ☑ Investigate when modules are assumed constant.
- ☐ Detect if a module/package has a version. If present, use that. Else, use each attribute.
- ☐ Support closures which include ``import x`` and ``from x import y``
- ☑ API
- ☑ Use user-customizable multidispatch.
- ☑ Bring hash into separate package.
- ☑ Make it easier to register a freeze method for a type.
- ☑ Encapsulate global config into object.
- ☑ Make freeze object-oriented with a module-level instance, like ``random.random`` and ``random.Random``.
- This makes it easier for different callers to have their own configuration options.
- ☑ Add an option which returns a single 128-bit int instead of a structured object after a certain depth. This is what ``charmonium.determ_hash`` does. Use this configuration in ``charmonium.cache``.
- ☐ Move "get call graph" into its own package.
- ☐ Document configuration options.
- ☑ Document ``summarize_diffs`` and ``iterate_diffs``.
- ☐ Have an API for ignoring modules in ``requirements.txt`` or ``pyproject.toml``, and just tracking them by version.
- ☑ Config object should cascade with ``with config.set(a=b)``
- ☑ Make ``freeze`` handle more types:
- ☑ Module: freeze by name.
- ☑ Objects: include the source-code of methods.
- ☑ C extensions. freeze by name, like module
- ☑ Methods
- ☑ fastpath for numpy arrays
- ☑ ``tqdm``
- ☑ ``numpy.int64(1234)``
- ☑ Pandas dataframe
- ☑ Catch Pickle TypeError
- ☑ Catch Pickle ImportError
- ☐ Performance
- ☑ Memoize the hash of immutable data:
- If function contains no locals or globals except other immutables, it is immutable.
- If a collection is immutable and contains only immutables, it is immutable.
- ☑ Make performance benchmarks.
Raw data
{
"_id": null,
"home_page": "https://github.com/charmoniumQ/charmonium.freeze",
"name": "charmonium-freeze",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.8,<4.0",
"maintainer_email": "",
"keywords": "freeze,hash",
"author": "Samuel Grayson",
"author_email": "sam+dev@samgrayson.me",
"download_url": "https://files.pythonhosted.org/packages/33/e3/5e28d994c54326d0a9f3c2e2c58c604ad94343e046d0152bfebe10b44952/charmonium_freeze-0.8.3.tar.gz",
"platform": null,
"description": "==========================\ncharmonium.freeze\n==========================\n\n.. image:: https://img.shields.io/pypi/v/charmonium.freeze\n :alt: PyPI Package\n :target: https://pypi.org/project/charmonium.freeze\n.. image:: https://img.shields.io/pypi/dm/charmonium.freeze\n :alt: PyPI Downloads\n :target: https://pypi.org/project/charmonium.freeze\n.. image:: https://img.shields.io/pypi/l/charmonium.freeze\n :alt: License\n :target: https://github.com/charmoniumQ/charmonium.freeze/blob/main/LICENSE\n.. image:: https://img.shields.io/pypi/pyversions/charmonium.freeze\n :alt: Python Versions\n :target: https://pypi.org/project/charmonium.freeze\n.. image:: https://img.shields.io/librariesio/sourcerank/pypi/charmonium.freeze\n :alt: libraries.io sourcerank\n :target: https://libraries.io/pypi/charmonium.freeze\n.. image:: https://img.shields.io/github/stars/charmoniumQ/charmonium.freeze?style=social\n :alt: GitHub stars\n :target: https://github.com/charmoniumQ/charmonium.freeze\n.. image:: https://github.com/charmoniumQ/charmonium.freeze/actions/workflows/main.yaml/badge.svg\n :alt: CI status\n :target: https://github.com/charmoniumQ/charmonium.freeze/actions/workflows/main.yaml\n.. image:: https://codecov.io/gh/charmoniumQ/charmonium.freeze/branch/main/graph/badge.svg?token=56A97FFTGZ\n :alt: Code Coverage\n :target: https://codecov.io/gh/charmoniumQ/charmonium.freeze\n.. image:: https://img.shields.io/github/last-commit/charmoniumQ/charmonium.cache\n :alt: GitHub last commit\n :target: https://github.com/charmoniumQ/charmonium.freeze/commits\n.. image:: http://www.mypy-lang.org/static/mypy_badge.svg\n :target: https://mypy.readthedocs.io/en/stable/\n :alt: Checked with Mypy\n.. 
image:: https://img.shields.io/badge/code%20style-black-000000.svg\n :target: https://github.com/psf/black\n :alt: Code style: black\n\nInjectively, deterministically maps arbitrary objects to hashable, immutable values\n\n\n----------\nQuickstart\n----------\n\nIf you don't have ``pip`` installed, see the `pip install guide`_.\n\n.. _`pip install guide`: https://pip.pypa.io/en/latest/installing/\n\n.. code-block:: console\n\n $ pip install charmonium.freeze\n\nFor a related project, |charmonium.cache|_, I needed a function that\ndeterministically, injectively maps objects to hashable objects.\n\n- \"Injectively\" means ``freeze(a) == freeze(b)`` implies ``a == b``\n (with the precondition that ``a`` and ``b`` are of the same type).\n\n- \"Deterministically\" means it should return the same value **across\n subsequent process invocations** (with the same interpreter major\n and minor version), unlike Python's |hash|_ function, which is not\n deterministic between processes.\n\n- \"Hashable\" means one can call ``hash(...)`` on it. All hashable\n values are immutable.\n\n.. |hash| replace:: ``hash``\n.. _`hash`: https://docs.python.org/3.8/reference/datamodel.html#object.__hash__\n.. |charmonium.cache| replace:: ``charmonium.cache``\n.. _`charmonium.cache`: https://github.com/charmoniumQ/charmonium.cache\n\nHave you ever felt like you wanted to \"freeze\" a list of arbitrary\ndata into a hashable value? 
Now you can.\n\n>>> obj = [1, 2, 3, {4, 5, 6}, object()]\n>>> hash(obj)\nTraceback (most recent call last):\n File \"<stdin>\", line 1, in <module>\nTypeError: unhashable type: 'list'\n\n>>> from charmonium.freeze import freeze\n>>> freeze(obj)\n9561766455304166758\n\n-------------\nConfiguration\n-------------\n\nBy changing the configuration, we can see the exact data that gets hashed.\n\nWe can change the configuration in a few ways:\n\n- Object-oriented (preferred)\n\n >>> from charmonium.freeze import Config\n >>> freeze(obj, Config(use_hash=False))\n (1, 2, 3, frozenset({4, 5, 6}), ((('builtins', 'object'),), b'copyreg.__newobj__'))\n\n- Global variable, but in this case, we must also clear the cache when we mutate\n the config.\n\n >>> from charmonium.freeze import global_config\n >>> global_config.use_hash = False\n >>> global_config.memo.clear()\n >>> freeze(obj)\n (1, 2, 3, frozenset({4, 5, 6}), ((('builtins', 'object'),), b'copyreg.__newobj__'))\n\n``use_hash=True`` will be faster and produce less data, but I will demonstrate\nit with ``use_hash=False`` so you can see what data gets included in the state.\n\nSee the source code ``charmonium/freeze/config.py`` for other configuration\noptions.\n\n------------------\nFreezing Functions\n------------------\n\n``freeze`` on functions returns their bytecode, constants, and closure-vars. The\nremarkable thing is that this is true across subsequent invocations of the same\nprocess. If the user edits the script and changes the function, then it's\n``freeze`` will change too. 
This tells you if it is safe to use the cached value\nof the function.\n\n ::\n\n (freeze(f) == freeze(g)) implies (for all x, f(x) == g(x))\n\n>>> from pprint import pprint\n>>> i = 456\n>>> func = lambda x: x + i + 123\n>>> pprint(freeze(func))\n(('<lambda>', None, 123, b'|\\x00t\\x00\\x17\\x00d\\x01\\x17\\x00S\\x00'),\n (('i', 456),))\n\nAs promised, the frozen value includes the bytecode (``b'|x00t...``), the\nconstants (123), and the closure variables (456). When we change ``i``, we get a\ndifferent frozen value, indicating that the ``func`` might not be\ncomputationally equivalent to what it was before.\n\n>>> i = 789\n>>> pprint(freeze(func))\n(('<lambda>', None, 123, b'|\\x00t\\x00\\x17\\x00d\\x01\\x17\\x00S\\x00'),\n (('i', 789),))\n\n``freeze`` works for objects that use function as data.\n\n>>> import functools\n>>> pprint(freeze(functools.partial(print, 123)))\n(('print',),\n ('print', (123,), (), None),\n (frozenset({'partial',\n (...,\n ('args', (b'member_descriptor', b'args')),\n ('func', (b'member_descriptor', b'func')),\n ('keywords', (b'member_descriptor', b'keywords')))}),\n ('builtins', 'object')))\n\n``freeze`` works for methods.\n\n>>> class Greeter:\n... def __init__(self, greeting):\n... self.greeting = greeting\n... def greet(self, name):\n... print(self.greeting + \" \" + name)\n... \n>>> pprint(freeze(Greeter.greet))\n(('greet',\n None,\n ' ',\n b't\\x00|\\x00j\\x01d\\x01\\x17\\x00|\\x01\\x17\\x00\\x83\\x01\\x01\\x00d\\x00S\\x00'),)\n\n----------------\nFreezing Objects\n----------------\n\n``freeze`` works on objects by freezing their state and freezing their\nmethods. The state is found by the `pickle protocol`_, which the Python language\nimplements by default for all classes. To get an idea of what this returns, call\n``obj.__reduce_ex__(4)``. Because we reuse an existing protocol, ``freeze`` work\ncorrectly on most user-defined types.\n\n.. 
_`pickle protocol`: https://docs.python.org/3/library/pickle.html#pickling-class-instances\n\n>>> s = Greeter(\"hello\")\n>>> pprint(s.__reduce_ex__(4))\n(<function __newobj__ at 0x...>,\n (<class '__main__.Greeter'>,),\n {'greeting': 'hello'},\n None,\n None)\n>>> pprint(freeze(s))\n(((frozenset({'Greeter',\n (('__init__',\n (('__init__', None, b'|\\x01|\\x00_\\x00d\\x00S\\x00'),)),\n ('greet',\n (('greet',\n None,\n ' ',\n b't\\x00|\\x00j\\x01d\\x01\\x17\\x00|\\x01\\x17\\x00\\x83\\x01'\n b'\\x01\\x00d\\x00S\\x00'),)))}),\n ('builtins', 'object')),),\n (('greeting', 'hello'),),\n b'copyreg.__newobj__')\n\nHowever, there can still be special cases: ``pickle`` may incorporate\nnon-deterministic values. In this case, there are three remedies:\n\n- If you can tweak the definition of the class, add a method called\n ``__getfrozenstate__`` which returns a deterministic snapshot of the\n state. This takes precedence over the Pickle protocol, if it is defined.\n\n >>> class Greeter:\n ... def __init__(self, greeting):\n ... self.greeting = greeting\n ... def greet(self, name):\n ... print(self.greeting + \" \" + name)\n ... def __getfrozenstate__(self):\n ... return self.greeting\n ... \n >>> pprint(freeze(Greeter(\"hello\")))\n ((frozenset({'Greeter',\n (('__getfrozenstate__',\n (('__getfrozenstate__', None, b'|\\x00j\\x00S\\x00'),)),\n ('__init__', (('__init__', None, b'|\\x01|\\x00_\\x00d\\x00S\\x00'),)),\n ('greet',\n (('greet',\n None,\n ' ',\n b't\\x00|\\x00j\\x01d\\x01\\x17\\x00|\\x01\\x17\\x00\\x83\\x01'\n b'\\x01\\x00d\\x00S\\x00'),)))}),\n ('builtins', 'object')),\n 'hello')\n\n- Otherwise, you can ignore certain attributes by changing the\n configuration. See the source code of ``charmonium/freeze/config.py`` for more\n details.\n\n >>> class Greeter:\n ... def __init__(self, greeting):\n ... self.greeting = greeting\n ... def greet(self, name):\n ... print(self.greeting + \" \" + name)\n ... 
\n >>> config = Config(use_hash=False)\n >>> config.ignore_attributes.add((\"__main__\", \"Greeter\", \"greeting\"))\n >>> pprint(freeze(Greeter(\"hello\"), config))\n (((frozenset({'Greeter',\n (('__init__',\n (('__init__', None, b'|\\x01|\\x00_\\x00d\\x00S\\x00'),)),\n ('greet',\n (('greet',\n None,\n ' ',\n b't\\x00|\\x00j\\x01d\\x01\\x17\\x00|\\x01\\x17\\x00\\x83\\x01'\n b'\\x01\\x00d\\x00S\\x00'),)))}),\n ('builtins', 'object')),),\n (),\n b'copyreg.__newobj__')\n\n Note that ``'hello'`` is not present in the frozen object any more.\n\n- If you cannot tweak the definition of the class or monkeypatch a\n ``__getfrozenstate__`` method, you can still register `single dispatch\n handler`_ for that type:\n\n .. _`single dispatch handler`: https://docs.python.org/3/library/functools.html#functools.singledispatch\n\n >>> from typing import Hashable, Optional, Dict, Tuple\n >>> from charmonium.freeze import _freeze_dispatch, _freeze\n >>> @_freeze_dispatch.register(Greeter)\n ... def _(\n ... obj: Greeter,\n ... config: Config,\n ... tabu: Dict[int, Tuple[int, int]],\n ... level: int,\n ... index: int,\n ... ) -> Tuple[Hashable, bool, Optional[int]]:\n ... # Type annotations are optional.\n ... # I have included them here for clarity.\n ... \n ... # `tabu` is for object cycle detection. It is handled for you.\n ... # `level` is for logging and recursion limits. It is incremented for you.\n ... # `index` is the \"birth order\" of the children.\n ... frozen_greeting = _freeze(obj.greeting, config, tabu, level, 0)\n ... \n ... return (\n ... frozen_greeting[0],\n ... # Remember that _freeze returns a triple;\n ... # we are only interested in the first element here.\n ... \n ... False,\n ... # Whether the obj is immutable\n ... # If the obj is immutable, it's frozen value need not be recomputed every time.\n ... # This is handled for you.\n ... \n ... None,\n ... # The depth of references contained here or None\n ... # Currently, this doesn't do anything.\n ... )\n ... 
\n >>> freeze(Greeter(\"Hello\"))\n 'Hello'\n\n----------------\nDictionary order\n----------------\n\nAs of Python 3.7, dictionaries \"remember\" their insertion order. As such,\n\n>>> freeze({\"a\": 1, \"b\": 2})\n(('a', 1), ('b', 2))\n>>> freeze({\"b\": 2, \"a\": 1})\n(('b', 2), ('a', 1))\n\nThis behavior is controllable by ``Config.ignore_dict_order``, which emits a ``frozenset`` of pairs.\n\n>>> config = Config(ignore_dict_order=True)\n>>> freeze({\"b\": 2, \"a\": 1}, config) == freeze({\"a\": 1, \"b\": 2}, config)\nTrue\n\n--------------\nSummarize diff\n--------------\n\nThis enables a pretty neat utility to compare two arbitrary Python objects.\n\n>>> from charmonium.freeze import summarize_diffs\n>>> obj0 = [0, 1, 2, {3, 4}, {\"a\": 5, \"b\": 6, \"c\": 7}, 8]\n>>> obj1 = [0, 8, 2, {3, 5}, {\"a\": 5, \"b\": 7, \"d\": 8}]\n>>> print(summarize_diffs(obj0, obj1))\nlet obj0_sub = obj0\nlet obj1_sub = obj1\nobj0_sub.__len__() == 6\nobj1_sub.__len__() == 5\nobj0_sub[1] == 1\nobj1_sub[1] == 8\nobj0_sub[3].has() == 4\nobj1_sub[3].has() == no such element\nobj0_sub[3].has() == no such element\nobj1_sub[3].has() == 5\nobj0_sub[4].keys().has() == c\nobj1_sub[4].keys().has() == no such element\nobj0_sub[4].keys().has() == no such element\nobj1_sub[4].keys().has() == d\nobj0_sub[4]['b'] == 6\nobj1_sub[4]['b'] == 7\n\nAnd if you don't like my printing style, you can get a programatic\naccess to this information.\n\n>>> from charmonium.freeze import iterate_diffs\n>>> for o1, o2 in iterate_diffs(obj0, obj1):\n... 
print(o1, o2, sep=\"\\n\")\nObjectLocation(labels=('obj0', '.__len__()'), objects=(..., 6))\nObjectLocation(labels=('obj1', '.__len__()'), objects=(..., 5))\nObjectLocation(labels=('obj0', '[1]'), objects=(..., 1))\nObjectLocation(labels=('obj1', '[1]'), objects=(..., 8))\nObjectLocation(labels=('obj0', '[3]', '.has()'), objects=(...), 4))\nObjectLocation(labels=('obj1', '[3]', '.has()'), objects=(..., 'no such element'))\nObjectLocation(labels=('obj0', '[3]', '.has()'), objects=(...), 'no such element'))\nObjectLocation(labels=('obj1', '[3]', '.has()'), objects=(..., 5))\nObjectLocation(labels=('obj0', '[4]', '.keys()', '.has()'), objects=(..., 'c'))\nObjectLocation(labels=('obj1', '[4]', '.keys()', '.has()'), objects=(..., 'no such element'))\nObjectLocation(labels=('obj0', '[4]', '.keys()', '.has()'), objects=(..., 'no such element'))\nObjectLocation(labels=('obj1', '[4]', '.keys()', '.has()'), objects=(..., 'd'))\nObjectLocation(labels=('obj0', '[4]', \"['b']\"), objects=(..., 6))\nObjectLocation(labels=('obj1', '[4]', \"['b']\"), objects=(..., 7))\n\n\n---------\nDebugging\n---------\n\nUse the following lines to see how ``freeze`` decomposes an object into\nprimitive values.\n\n.. 
code:: python\n\n import logging, os\n logger = logging.getLogger(\"charmonium.freeze\")\n logger.setLevel(logging.DEBUG)\n fh = logging.FileHandler(\"freeze.log\")\n fh.setLevel(logging.DEBUG)\n fh.setFormatter(logging.Formatter(\"%(message)s\"))\n logger.addHandler(fh)\n logger.debug(\"Program %d\", os.getpid())\n\n i = 0\n def square_plus_i(x):\n # Value of global variable will be included in the function's frozen state.\n return x**2 + i\n\n from charmonium.freeze import freeze\n freeze(square_plus_i)\n\n\nThis produces a log such as in ``freeze.log``:\n\n::\n\n freeze begin <function square_plus_i at 0x7f9228bff550>\n function <function square_plus_i at 0x7f9228bff550>\n tuple (('code', <code object square_plus_i at 0x7f9228c6cf50, file \"/tmp/ipython_edit_303agyiz/ipython_edit_rez33yf_.py\", line 2>), 'closure globals', {'i': 0})\n tuple ('code', <code object square_plus_i at 0x7f9228c6cf50, file \"/tmp/ipython_edit_303agyiz/ipython_edit_rez33yf_.py\", line 2>)\n 'code'\n code <code object square_plus_i at 0x7f9228c6cf50, file \"/tmp/ipython_edit_303agyiz/ipython_edit_rez33yf_.py\", line 2>\n tuple (None, 2)\n None\n 2\n b'|\\x00d\\x01\\x13\\x00t\\x00\\x17\\x00S\\x00'\n 'closure globals'\n dict {'i': 0}\n 'i'\n 0\n freeze end\n\nI do this to find the differences between subsequent runs:\n\n.. code:: shell\n\n $ python code.py\n $ mv freeze.log freeze.0.log\n\n $ python code.py\n $ mv freeze.log freeze.1.log\n\n $ sed -i 's/at 0x[0-9a-f]*//g' freeze.*.log\n # This removes pointer values that appear in the `repr(...)`.\n\n $ meld freeze.0.log freeze.1.log\n # Alternatively, use `icdiff` or `diff -u1`.\n\nIf ``freeze(obj)`` is taking a long time, try adding ``freeze(obj,\nConfig(recursion_limit=20))``. This causes an exception if ``freeze`` recurses\nmore than a certain number of times. 
If you hit this exception, consider adding\nignored classes, functions, attributes, or objects in ``Config``.\n\n----------\nDeveloping\n----------\n\nSee `CONTRIBUTING.md`_ for instructions on setting up a development environment.\n\n.. _`CONTRIBUTING.md`: https://github.com/charmoniumQ/charmonium.freeze/tree/main/CONTRIBUTING.md\n\n\n----\nTODO\n----\n\n- \u2610 Correctness\n\n - \u2611 Test hashing sets with different orders. Assert tests fail.\n - \u2611 Test hashing dicts with different orders. Assert tests fail.\n - \u2611 Don't include properties in hash.\n - \u2611 Test that freeze of an object includes freeze of its instance methods.\n - \u2611 Test functions with minor changes.\n - \u2611 Test set/dict with diff hash.\n - \u2611 Test obj with slots.\n - \u2611 Test hash for objects and classes more carefully.\n - \u2611 Improve test coverage.\n - \u2611 Investigate when modules are assumed constant.\n - \u2610 Detect if a module/package has a version. If present, use that. Else, use each attribute.\n - \u2610 Support closures which include ``import x`` and ``from x import y``\n\n- \u2611 API\n\n - \u2611 Use user-customizable multidispatch.\n - \u2611 Bring hash into separate package.\n - \u2611 Make it easier to register a freeze method for a type.\n - \u2611 Encapsulate global config into object.\n - \u2611 Make freeze object-oriented with a module-level instance, like ``random.random`` and ``random.Random``.\n - This makes it easier for different callers to have their own configuration options.\n - \u2611 Add an option which returns a single 128-bit int instead of a structured object after a certain depth. This is what ``charmonium.determ_hash`` does. 
Use this configuration in ``charmonium.cache``.\n - \u2610 Move \"get call graph\" into its own package.\n - \u2610 Document configuration options.\n - \u2611 Document ``summarize_diff`` and ``iterate_diffs``.\n - \u2610 Have an API for ignoring modules in ``requirements.txt`` or ``pyproject.toml``, and just tracking them by version.\n - \u2611 Config object should cascade with ``with config.set(a=b)``\n\n- \u2611 Make ``freeze`` handle more types:\n\n - \u2611 Module: freeze by name.\n - \u2611 Objects: include the source-code of methods.\n - \u2611 C extensions. freeze by name, like module\n - \u2611 Methods\n - \u2611 fastpath for numpy arrays\n - \u2611 ``tqdm``\n - \u2611 ``numpy.int64(1234)``\n - \u2611 Pandas dataframe\n - \u2611 Catch Pickle TypeError\n - \u2611 Catch Pickle ImportError\n\n- \u2610 Performance\n\n - \u2611 Memoize the hash of immutable data:\n - If function contains no locals or globals except other immutables, it is immutable.\n - If a collection is immutable and contains only immutables, it is immutable.\n - \u2611 Make performance benchmarks.\n",
"bugtrack_url": null,
"license": "MIT",
"summary": "Injectively, deterministically maps arbitrary objects to hashable values",
"version": "0.8.3",
"split_keywords": [
"freeze",
"hash"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "af5360d615fb2ae9af1636edda510990a597a56c4e10d427576736c17656caee",
"md5": "68f6b90c401110d07a835e45558bb5a8",
"sha256": "50f2fa8f584a2fac3c2941343fe775a3385794d65ec06400f3712f0f0a081fd6"
},
"downloads": -1,
"filename": "charmonium_freeze-0.8.3-py3-none-any.whl",
"has_sig": false,
"md5_digest": "68f6b90c401110d07a835e45558bb5a8",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8,<4.0",
"size": 937875,
"upload_time": "2023-03-19T00:42:33",
"upload_time_iso_8601": "2023-03-19T00:42:33.448649Z",
"url": "https://files.pythonhosted.org/packages/af/53/60d615fb2ae9af1636edda510990a597a56c4e10d427576736c17656caee/charmonium_freeze-0.8.3-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "33e35e28d994c54326d0a9f3c2e2c58c604ad94343e046d0152bfebe10b44952",
"md5": "754a7a35a1b2488594c8ba6e61197e08",
"sha256": "acbc9794786fbb002b36ed6fb75973b8e1a7c2bd0d2c3c8f0cfaf02443b4c2f0"
},
"downloads": -1,
"filename": "charmonium_freeze-0.8.3.tar.gz",
"has_sig": false,
"md5_digest": "754a7a35a1b2488594c8ba6e61197e08",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8,<4.0",
"size": 735672,
"upload_time": "2023-03-19T00:42:35",
"upload_time_iso_8601": "2023-03-19T00:42:35.530131Z",
"url": "https://files.pythonhosted.org/packages/33/e3/5e28d994c54326d0a9f3c2e2c58c604ad94343e046d0152bfebe10b44952/charmonium_freeze-0.8.3.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-03-19 00:42:35",
"github": true,
"gitlab": false,
"bitbucket": false,
"github_user": "charmoniumQ",
"github_project": "charmonium.freeze",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "charmonium-freeze"
}