charmonium-freeze

:Name: charmonium-freeze
:Version: 0.8.3
:Home page: https://github.com/charmoniumQ/charmonium.freeze
:Summary: Injectively, deterministically maps arbitrary objects to hashable values
:Uploaded: 2023-03-19 00:42:35
:Author: Samuel Grayson
:Requires Python: >=3.8,<4.0
:License: MIT
:Keywords: freeze, hash

==========================
charmonium.freeze
==========================

.. image:: https://img.shields.io/pypi/v/charmonium.freeze
   :alt: PyPI Package
   :target: https://pypi.org/project/charmonium.freeze
.. image:: https://img.shields.io/pypi/dm/charmonium.freeze
   :alt: PyPI Downloads
   :target: https://pypi.org/project/charmonium.freeze
.. image:: https://img.shields.io/pypi/l/charmonium.freeze
   :alt: License
   :target: https://github.com/charmoniumQ/charmonium.freeze/blob/main/LICENSE
.. image:: https://img.shields.io/pypi/pyversions/charmonium.freeze
   :alt: Python Versions
   :target: https://pypi.org/project/charmonium.freeze
.. image:: https://img.shields.io/librariesio/sourcerank/pypi/charmonium.freeze
   :alt: libraries.io sourcerank
   :target: https://libraries.io/pypi/charmonium.freeze
.. image:: https://img.shields.io/github/stars/charmoniumQ/charmonium.freeze?style=social
   :alt: GitHub stars
   :target: https://github.com/charmoniumQ/charmonium.freeze
.. image:: https://github.com/charmoniumQ/charmonium.freeze/actions/workflows/main.yaml/badge.svg
   :alt: CI status
   :target: https://github.com/charmoniumQ/charmonium.freeze/actions/workflows/main.yaml
.. image:: https://codecov.io/gh/charmoniumQ/charmonium.freeze/branch/main/graph/badge.svg?token=56A97FFTGZ
   :alt: Code Coverage
   :target: https://codecov.io/gh/charmoniumQ/charmonium.freeze
.. image:: https://img.shields.io/github/last-commit/charmoniumQ/charmonium.freeze
   :alt: GitHub last commit
   :target: https://github.com/charmoniumQ/charmonium.freeze/commits
.. image:: http://www.mypy-lang.org/static/mypy_badge.svg
   :target: https://mypy.readthedocs.io/en/stable/
   :alt: Checked with Mypy
.. image:: https://img.shields.io/badge/code%20style-black-000000.svg
   :target: https://github.com/psf/black
   :alt: Code style: black

Injectively, deterministically maps arbitrary objects to hashable, immutable values


----------
Quickstart
----------

If you don't have ``pip`` installed, see the `pip install guide`_.

.. _`pip install guide`: https://pip.pypa.io/en/latest/installing/

.. code-block:: console

    $ pip install charmonium.freeze

For a related project, |charmonium.cache|_, I needed a function that
deterministically, injectively maps objects to hashable objects.

- "Injectively" means ``freeze(a) == freeze(b)`` implies ``a == b``
  (with the precondition that ``a`` and ``b`` are of the same type).

- "Deterministically" means it should return the same value **across
  subsequent process invocations** (with the same interpreter major
  and minor version), unlike Python's |hash|_ function, which is not
  deterministic between processes.

- "Hashable" means one can call ``hash(...)`` on it. The frozen values are
  also immutable, so their hash cannot change after the fact.

.. |hash| replace:: ``hash``
.. _`hash`: https://docs.python.org/3.8/reference/datamodel.html#object.__hash__
.. |charmonium.cache| replace:: ``charmonium.cache``
.. _`charmonium.cache`: https://github.com/charmoniumQ/charmonium.cache
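
The non-determinism of the built-in ``hash`` is easy to observe with the standard library alone. The sketch below (plain Python, not part of ``charmonium.freeze``; ``hash_in_fresh_process`` is a hypothetical helper) runs ``hash`` on the same string in two fresh interpreters with different ``PYTHONHASHSEED`` values. Because ``str`` hashes are salted per process, the two results will almost certainly differ, while reusing a seed reproduces the value.

.. code:: python

    import os
    import subprocess
    import sys


    def hash_in_fresh_process(value: str, seed: str) -> int:
        """Compute hash(value) in a new interpreter started with PYTHONHASHSEED=seed."""
        env = dict(os.environ, PYTHONHASHSEED=seed)
        completed = subprocess.run(
            [sys.executable, "-c", f"print(hash({value!r}))"],
            env=env,
            capture_output=True,
            text=True,
            check=True,
        )
        return int(completed.stdout)


    h1 = hash_in_fresh_process("charmonium", "1")
    h2 = hash_in_fresh_process("charmonium", "2")
    print(h1 == h2)  # almost certainly False: str hashes are salted per process
    print(hash_in_fresh_process("charmonium", "1") == h1)  # True: same seed, same hash
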

Have you ever felt like you wanted to "freeze" a list of arbitrary
data into a hashable value? Now you can.

>>> obj = [1, 2, 3, {4, 5, 6}, object()]
>>> hash(obj)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'

>>> from charmonium.freeze import freeze
>>> freeze(obj)
9561766455304166758
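
For intuition only, here is a much-simplified, hypothetical ``toy_freeze`` that handles nothing but built-in containers: lists and tuples become tuples, sets become frozensets, and dicts become tuples of key-value pairs. It is a sketch under those assumptions, not the library's implementation, which also handles functions, classes, and arbitrary objects as described below.

.. code:: python

    from typing import Any, Hashable


    def toy_freeze(obj: Any) -> Hashable:
        """Hypothetical, much-simplified freeze for built-in containers only."""
        if isinstance(obj, (list, tuple)):
            return tuple(toy_freeze(item) for item in obj)
        if isinstance(obj, (set, frozenset)):
            return frozenset(toy_freeze(item) for item in obj)
        if isinstance(obj, dict):
            # Keep insertion order, as freeze does by default.
            return tuple((toy_freeze(k), toy_freeze(v)) for k, v in obj.items())
        hash(obj)  # raises TypeError for unhashable leaves this sketch does not handle
        return obj


    data = [1, 2, 3, {4, 5, 6}, {"a": [7, 8]}]
    frozen = toy_freeze(data)
    print(frozen)  # (1, 2, 3, frozenset({4, 5, 6}), (('a', (7, 8)),))
    cache = {frozen: "result"}  # usable as a dict key, unlike `data`
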

-------------
Configuration
-------------

By changing the configuration, we can see the exact data that gets hashed.

We can change the configuration in a few ways:

- Object-oriented (preferred)

  >>> from charmonium.freeze import Config
  >>> freeze(obj, Config(use_hash=False))
  (1, 2, 3, frozenset({4, 5, 6}), ((('builtins', 'object'),), b'copyreg.__newobj__'))

- Global variable, but in this case, we must also clear the cache when we mutate
  the config.

  >>> from charmonium.freeze import global_config
  >>> global_config.use_hash = False
  >>> global_config.memo.clear()
  >>> freeze(obj)
  (1, 2, 3, frozenset({4, 5, 6}), ((('builtins', 'object'),), b'copyreg.__newobj__'))

``use_hash=True`` is faster and produces less data, but the examples below use
``use_hash=False`` so you can see what data gets included in the state.

See the source code ``charmonium/freeze/config.py`` for other configuration
options.
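
Why the global variant needs ``memo.clear()`` can be sketched with a hypothetical ``ToyConfig`` that caches results by object id. This assumes a memo keyed on the input object, which may not match how ``charmonium.freeze`` actually keys its cache: once a result is memoized, flipping ``use_hash`` on a shared config has no effect until the memo is cleared, whereas a fresh config object starts empty.

.. code:: python

    from dataclasses import dataclass, field
    from typing import Any, Dict, Hashable


    @dataclass
    class ToyConfig:
        """Hypothetical stand-in for a freeze configuration that owns a memo table."""
        use_hash: bool = True
        # Results cached by id(obj); the caller must keep obj alive for this to be sound.
        memo: Dict[int, Hashable] = field(default_factory=dict)


    def memo_freeze(obj: Any, config: ToyConfig) -> Hashable:
        if id(obj) in config.memo:
            return config.memo[id(obj)]  # cached result, computed under the old settings
        frozen = tuple(obj) if isinstance(obj, list) else obj
        result = hash(frozen) if config.use_hash else frozen
        config.memo[id(obj)] = result
        return result


    shared_config = ToyConfig()
    data = [1, 2, 3]
    print(memo_freeze(data, shared_config))  # an int, since use_hash=True

    shared_config.use_hash = False
    print(memo_freeze(data, shared_config))  # still the stale int from the memo

    shared_config.memo.clear()
    print(memo_freeze(data, shared_config))  # (1, 2, 3)

    # A fresh config object starts with an empty memo, so nothing needs clearing.
    print(memo_freeze(data, ToyConfig(use_hash=False)))  # (1, 2, 3)
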

------------------
Freezing Functions
------------------

``freeze`` on functions returns their bytecode, constants, and closure variables. The
remarkable thing is that this is stable across subsequent invocations of the same
program. If the user edits the script and changes the function, then its
``freeze`` will change too. This tells you whether it is safe to reuse a cached
result of the function.

  ::

    (freeze(f) == freeze(g)) implies (for all x, f(x) == g(x))

>>> from pprint import pprint
>>> i = 456
>>> func = lambda x: x + i + 123
>>> pprint(freeze(func))
(('<lambda>', None, 123, b'|\x00t\x00\x17\x00d\x01\x17\x00S\x00'),
 (('i', 456),))

As promised, the frozen value includes the bytecode (``b'|\x00t...'``), the
constant (``123``), and the closure variable (``i``, currently ``456``). When we
change ``i``, we get a different frozen value, indicating that ``func`` might no
longer be computationally equivalent to what it was before.

>>> i = 789
>>> pprint(freeze(func))
(('<lambda>', None, 123, b'|\x00t\x00\x17\x00d\x01\x17\x00S\x00'),
 (('i', 789),))
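
The standard library exposes the same ingredients. The hypothetical ``sketch_freeze_function`` below reads the code object's name, constants, and raw bytecode, and uses ``inspect.getclosurevars`` to find the nonlocal and global variables the function refers to. It illustrates the idea only; it is not the library's actual output format.

.. code:: python

    import inspect
    from typing import Any, Callable, Hashable


    def sketch_freeze_function(func: Callable[..., Any]) -> Hashable:
        """Hypothetical: summarize a function by its code and the variables it reads."""
        code = func.__code__
        # getclosurevars resolves the free names the code refers to:
        # `nonlocals` from the closure cells, `globals` from func.__globals__.
        variables = inspect.getclosurevars(func)
        return (
            code.co_name,
            code.co_consts,
            code.co_code,  # raw bytecode; differs between interpreter versions
            tuple(sorted(variables.nonlocals.items())),
            tuple(sorted(variables.globals.items())),
        )


    i = 456
    func = lambda x: x + i + 123
    before = sketch_freeze_function(func)

    i = 789
    after = sketch_freeze_function(func)

    print(before[:3] == after[:3])  # True: same name, constants, and bytecode
    print(before == after)  # False: the global `i` changed
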

``freeze`` also works for objects that hold functions as data, such as ``functools.partial``.

>>> import functools
>>> pprint(freeze(functools.partial(print, 123)))
(('print',),
 ('print', (123,), (), None),
 (frozenset({'partial',
             (...,
              ('args', (b'member_descriptor', b'args')),
              ('func', (b'member_descriptor', b'func')),
              ('keywords', (b'member_descriptor', b'keywords')))}),
  ('builtins', 'object')))
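
``functools.partial`` objects document their wrapped callable and pre-bound arguments as the ``func``, ``args``, and ``keywords`` attributes, which appear to be the members listed in the output above. A hypothetical ``sketch_freeze_partial`` can build a hashable summary from them; this is an illustration, not how the library formats partials.

.. code:: python

    import functools
    from typing import Hashable


    def sketch_freeze_partial(p: functools.partial) -> Hashable:
        """Hypothetical: summarize a partial by what it wraps and what it pre-binds."""
        return (
            getattr(p.func, "__qualname__", repr(p.func)),
            p.args,
            tuple(sorted(p.keywords.items())),  # keywords is a dict; sort for a stable order
        )


    print_123 = functools.partial(print, 123, sep=", ")
    print(sketch_freeze_partial(print_123))  # ('print', (123,), (('sep', ', '),))
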

``freeze`` works for methods.

>>> class Greeter:
...     def __init__(self, greeting):
...         self.greeting = greeting
...     def greet(self, name):
...         print(self.greeting + " " + name)
... 
>>> pprint(freeze(Greeter.greet))
(('greet',
  None,
  ' ',
  b't\x00|\x00j\x01d\x01\x17\x00|\x01\x17\x00\x83\x01\x01\x00d\x00S\x00'),)
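
In Python 3, looking up ``greet`` on the class returns a plain function, so freezing ``Greeter.greet`` is the same as freezing any other function. Looking it up on an instance returns a bound method that wraps that function (``__func__``) together with the instance (``__self__``); freezing a bound method would plausibly need both parts. This is standard Python behaviour, shown here with a redefined ``Greeter``:

.. code:: python

    class Greeter:
        def __init__(self, greeting):
            self.greeting = greeting

        def greet(self, name):
            print(self.greeting + " " + name)


    hello = Greeter("hello")
    bound = hello.greet

    print(type(Greeter.greet).__name__)  # function
    print(type(bound).__name__)  # method
    print(bound.__func__ is Greeter.greet)  # True
    print(bound.__self__ is hello)  # True
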

----------------
Freezing Objects
----------------

``freeze`` works on objects by freezing their state and freezing their
methods. The state is found by the `pickle protocol`_, which the Python language
implements by default for all classes. To get an idea of what this returns, call
``obj.__reduce_ex__(4)``. Because it reuses an existing protocol, ``freeze`` works
correctly on most user-defined types.

.. _`pickle protocol`: https://docs.python.org/3/library/pickle.html#pickling-class-instances

>>> s = Greeter("hello")
>>> pprint(s.__reduce_ex__(4))
(<function __newobj__ at 0x...>,
 (<class '__main__.Greeter'>,),
 {'greeting': 'hello'},
 None,
 None)
>>> pprint(freeze(s))
(((frozenset({'Greeter',
              (('__init__',
                (('__init__', None, b'|\x01|\x00_\x00d\x00S\x00'),)),
               ('greet',
                (('greet',
                  None,
                  ' ',
                  b't\x00|\x00j\x01d\x01\x17\x00|\x01\x17\x00\x83\x01'
                  b'\x01\x00d\x00S\x00'),)))}),
   ('builtins', 'object')),),
 (('greeting', 'hello'),),
 b'copyreg.__newobj__')
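
The pieces of ``__reduce_ex__(4)`` can be unpacked directly. For an ordinary class, protocol 4 returns a 5-tuple whose first element is ``copyreg.__newobj__`` and whose third element is the instance's state dict. The hypothetical ``sketch_freeze_via_reduce`` below keeps only the class name and a sorted view of that state; unlike the output above, it ignores methods.

.. code:: python

    import copyreg


    class Greeter:
        def __init__(self, greeting):
            self.greeting = greeting


    s = Greeter("hello")
    # Plain class, protocol 2+: (callable, args, state, list items, dict items).
    constructor, args, state, list_items, dict_items = s.__reduce_ex__(4)
    print(constructor is copyreg.__newobj__)  # True
    print(args == (Greeter,))  # True
    print(state)  # {'greeting': 'hello'}


    def sketch_freeze_via_reduce(obj):
        """Hypothetical: class name plus a sorted, hashable view of the pickled state."""
        obj_state = obj.__reduce_ex__(4)[2]
        if isinstance(obj_state, dict):
            obj_state = tuple(sorted(obj_state.items()))
        return (type(obj).__qualname__, obj_state)


    print(sketch_freeze_via_reduce(s))  # ('Greeter', (('greeting', 'hello'),))
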

However, there can still be special cases: ``pickle`` may incorporate
non-deterministic values. In this case, there are three remedies:

- If you can tweak the definition of the class, add a method called
  ``__getfrozenstate__`` which returns a deterministic snapshot of the
  state. This takes precedence over the Pickle protocol, if it is defined.

  >>> class Greeter:
  ...     def __init__(self, greeting):
  ...         self.greeting = greeting
  ...     def greet(self, name):
  ...         print(self.greeting + " " + name)
  ...     def __getfrozenstate__(self):
  ...         return self.greeting
  ... 
  >>> pprint(freeze(Greeter("hello")))
  ((frozenset({'Greeter',
               (('__getfrozenstate__',
                 (('__getfrozenstate__', None, b'|\x00j\x00S\x00'),)),
                ('__init__', (('__init__', None, b'|\x01|\x00_\x00d\x00S\x00'),)),
                ('greet',
                 (('greet',
                   None,
                   ' ',
                   b't\x00|\x00j\x01d\x01\x17\x00|\x01\x17\x00\x83\x01'
                   b'\x01\x00d\x00S\x00'),)))}),
    ('builtins', 'object')),
   'hello')

- Otherwise, you can ignore certain attributes by changing the
  configuration. See the source code of ``charmonium/freeze/config.py`` for more
  details.

  >>> class Greeter:
  ...     def __init__(self, greeting):
  ...         self.greeting = greeting
  ...     def greet(self, name):
  ...         print(self.greeting + " " + name)
  ... 
  >>> config = Config(use_hash=False)
  >>> config.ignore_attributes.add(("__main__", "Greeter", "greeting"))
  >>> pprint(freeze(Greeter("hello"), config))
  (((frozenset({'Greeter',
                (('__init__',
                  (('__init__', None, b'|\x01|\x00_\x00d\x00S\x00'),)),
                 ('greet',
                  (('greet',
                    None,
                    ' ',
                    b't\x00|\x00j\x01d\x01\x17\x00|\x01\x17\x00\x83\x01'
                    b'\x01\x00d\x00S\x00'),)))}),
     ('builtins', 'object')),),
   (),
   b'copyreg.__newobj__')

  Note that ``'hello'`` is not present in the frozen object any more.

- If you cannot tweak the definition of the class or monkeypatch a
  ``__getfrozenstate__`` method, you can still register a `single dispatch
  handler`_ for that type:

  .. _`single dispatch handler`: https://docs.python.org/3/library/functools.html#functools.singledispatch

  >>> from typing import Hashable, Optional, Dict, Tuple
  >>> from charmonium.freeze import _freeze_dispatch, _freeze
  >>> @_freeze_dispatch.register(Greeter)
  ... def _(
  ...         obj: Greeter,
  ...         config: Config,
  ...         tabu: Dict[int, Tuple[int, int]],
  ...         level: int,
  ...         index: int,
  ...     ) -> Tuple[Hashable, bool, Optional[int]]:
  ...     # Type annotations are optional.
  ...     # I have included them here for clarity.
  ... 
  ...     # `tabu` is for object cycle detection. It is handled for you.
  ...     # `level` is for logging and recursion limits. It is incremented for you.
  ...     # `index` is the "birth order" of the children.
  ...     frozen_greeting = _freeze(obj.greeting, config, tabu, level, 0)
  ... 
  ...     return (
  ...         frozen_greeting[0],
  ...         # Remember that _freeze returns a triple;
  ...         # we are only interested in the first element here.
  ... 
  ...         False,
  ...         # Whether the obj is immutable
  ...         # If the obj is immutable, its frozen value need not be recomputed every time.
  ...         # This is handled for you.
  ... 
  ...         None,
  ...         # The depth of references contained here or None
  ...         # Currently, this doesn't do anything.
  ...     )
  ... 
  >>> freeze(Greeter("Hello"))
  'Hello'
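
The first and third remedies can be combined into a small sketch with the standard ``functools.singledispatch``. The function ``sketch_freeze`` and its precedence order (registered handler, then ``__getfrozenstate__``, then the pickle state) are assumptions made for illustration; this is not ``charmonium.freeze``'s private ``_freeze_dispatch`` API.

.. code:: python

    import functools
    from typing import Any, Hashable


    @functools.singledispatch
    def sketch_freeze(obj: Any) -> Hashable:
        """Hypothetical generic fallback for objects without a registered handler."""
        # A class-provided deterministic snapshot wins over pickle.
        getter = getattr(obj, "__getfrozenstate__", None)
        if getter is not None:
            return sketch_freeze(getter())
        # Otherwise use the state exposed by the pickle protocol.
        state = obj.__reduce_ex__(4)[2]
        if isinstance(state, dict):
            items = tuple(sorted((key, sketch_freeze(val)) for key, val in state.items()))
            return (type(obj).__qualname__, items)
        return (type(obj).__qualname__,)


    @sketch_freeze.register(int)
    @sketch_freeze.register(str)
    @sketch_freeze.register(type(None))
    def _(obj: Any) -> Hashable:
        # These primitives are already immutable and hashable; return them as-is.
        return obj


    class Snapshot:
        def __init__(self):
            self.greeting = "hi"
            self.pid = 12345  # stands in for a non-deterministic value

        def __getfrozenstate__(self):
            return self.greeting


    class Greeter:
        def __init__(self, greeting):
            self.greeting = greeting


    print(sketch_freeze(Snapshot()))  # hi
    print(sketch_freeze(Greeter("hello")))  # ('Greeter', (('greeting', 'hello'),))


    # A handler registered for the type takes priority over the generic fallback.
    @sketch_freeze.register(Greeter)
    def _(obj: Greeter) -> Hashable:
        return sketch_freeze(obj.greeting)


    print(sketch_freeze(Greeter("hello")))  # hello
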

----------------
Dictionary order
----------------

As of Python 3.7, dictionaries "remember" their insertion order, and by default ``freeze`` respects it:

>>> freeze({"a": 1, "b": 2})
(('a', 1), ('b', 2))
>>> freeze({"b": 2, "a": 1})
(('b', 2), ('a', 1))

This behavior is controllable by ``Config.ignore_dict_order``, which emits a ``frozenset`` of pairs.

>>> config = Config(ignore_dict_order=True)
>>> freeze({"b": 2, "a": 1}, config) == freeze({"a": 1, "b": 2}, config)
True
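
In plain Python terms, the two encodings behave as follows: a tuple of items keeps insertion order, while a ``frozenset`` of items discards it. This assumes the values are already hashable, which freezing them first would guarantee.

.. code:: python

    first = {"a": 1, "b": 2}
    second = {"b": 2, "a": 1}

    print(first == second)  # True: dict equality already ignores order
    # Ordered encoding: a tuple of (key, value) pairs keeps insertion order.
    print(tuple(first.items()) == tuple(second.items()))  # False
    # Order-insensitive encoding: a frozenset of pairs discards it.
    print(frozenset(first.items()) == frozenset(second.items()))  # True
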

--------------
Summarize diff
--------------

This enables a pretty neat utility to compare two arbitrary Python objects.

>>> from charmonium.freeze import summarize_diffs
>>> obj0 = [0, 1, 2, {3, 4}, {"a": 5, "b": 6, "c": 7}, 8]
>>> obj1 = [0, 8, 2, {3, 5}, {"a": 5, "b": 7, "d": 8}]
>>> print(summarize_diffs(obj0, obj1))
let obj0_sub = obj0
let obj1_sub = obj1
obj0_sub.__len__() == 6
obj1_sub.__len__() == 5
obj0_sub[1] == 1
obj1_sub[1] == 8
obj0_sub[3].has() == 4
obj1_sub[3].has() == no such element
obj0_sub[3].has() == no such element
obj1_sub[3].has() == 5
obj0_sub[4].keys().has() == c
obj1_sub[4].keys().has() == no such element
obj0_sub[4].keys().has() == no such element
obj1_sub[4].keys().has() == d
obj0_sub[4]['b'] == 6
obj1_sub[4]['b'] == 7

And if you don't like my printing style, you can access this information
programmatically.

>>> from charmonium.freeze import iterate_diffs
>>> for o1, o2 in iterate_diffs(obj0, obj1):
...    print(o1, o2, sep="\n")
ObjectLocation(labels=('obj0', '.__len__()'), objects=(..., 6))
ObjectLocation(labels=('obj1', '.__len__()'), objects=(..., 5))
ObjectLocation(labels=('obj0', '[1]'), objects=(..., 1))
ObjectLocation(labels=('obj1', '[1]'), objects=(..., 8))
ObjectLocation(labels=('obj0', '[3]', '.has()'), objects=(..., 4))
ObjectLocation(labels=('obj1', '[3]', '.has()'), objects=(..., 'no such element'))
ObjectLocation(labels=('obj0', '[3]', '.has()'), objects=(..., 'no such element'))
ObjectLocation(labels=('obj1', '[3]', '.has()'), objects=(..., 5))
ObjectLocation(labels=('obj0', '[4]', '.keys()', '.has()'), objects=(..., 'c'))
ObjectLocation(labels=('obj1', '[4]', '.keys()', '.has()'), objects=(..., 'no such element'))
ObjectLocation(labels=('obj0', '[4]', '.keys()', '.has()'), objects=(..., 'no such element'))
ObjectLocation(labels=('obj1', '[4]', '.keys()', '.has()'), objects=(..., 'd'))
ObjectLocation(labels=('obj0', '[4]', "['b']"), objects=(..., 6))
ObjectLocation(labels=('obj1', '[4]', "['b']"), objects=(..., 7))
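
To give a feel for how such a diff can be computed, here is a hypothetical, much-simplified ``sketch_iterate_diffs`` that walks lists, tuples, sets, and dicts recursively and yields ``(labels, left, right)`` triples in the same label style as above. It is not the library's implementation and does not freeze the objects first.

.. code:: python

    from typing import Any, Iterator, Tuple

    Labels = Tuple[str, ...]
    MISSING = "no such element"


    def sketch_iterate_diffs(a: Any, b: Any, labels: Labels = ()) -> Iterator[Tuple[Labels, Any, Any]]:
        """Hypothetical recursive diff over lists, tuples, sets, and dicts."""
        if isinstance(a, (list, tuple)) and isinstance(b, (list, tuple)):
            if len(a) != len(b):
                yield labels + (".__len__()",), len(a), len(b)
            # Compare the overlapping prefix element by element.
            for index, (x, y) in enumerate(zip(a, b)):
                yield from sketch_iterate_diffs(x, y, labels + (f"[{index}]",))
        elif isinstance(a, (set, frozenset)) and isinstance(b, (set, frozenset)):
            # Sets have no positions, so only membership differences are reported.
            for elem in a - b:
                yield labels + (".has()",), elem, MISSING
            for elem in b - a:
                yield labels + (".has()",), MISSING, elem
        elif isinstance(a, dict) and isinstance(b, dict):
            for key in a:
                if key not in b:
                    yield labels + (".keys()", ".has()"), key, MISSING
            for key in b:
                if key not in a:
                    yield labels + (".keys()", ".has()"), MISSING, key
            for key in a:
                if key in b:
                    yield from sketch_iterate_diffs(a[key], b[key], labels + (f"[{key!r}]",))
        elif a != b:
            yield labels, a, b


    obj0 = [0, 1, 2, {3, 4}, {"a": 5, "b": 6, "c": 7}, 8]
    obj1 = [0, 8, 2, {3, 5}, {"a": 5, "b": 7, "d": 8}]
    for path, left, right in sketch_iterate_diffs(obj0, obj1):
        print("".join(path), left, right, sep=" | ")
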


---------
Debugging
---------

Use the following lines to see how ``freeze`` decomposes an object into
primitive values.

.. code:: python

    import logging, os
    logger = logging.getLogger("charmonium.freeze")
    logger.setLevel(logging.DEBUG)
    fh = logging.FileHandler("freeze.log")
    fh.setLevel(logging.DEBUG)
    fh.setFormatter(logging.Formatter("%(message)s"))
    logger.addHandler(fh)
    logger.debug("Program %d", os.getpid())

    i = 0
    def square_plus_i(x):
        # Value of global variable will be included in the function's frozen state.
        return x**2 + i

    from charmonium.freeze import freeze
    freeze(square_plus_i)


This writes a log like the following to ``freeze.log``:

::

    freeze begin <function square_plus_i at 0x7f9228bff550>
     function <function square_plus_i at 0x7f9228bff550>
      tuple (('code', <code object square_plus_i at 0x7f9228c6cf50, file "/tmp/ipython_edit_303agyiz/ipython_edit_rez33yf_.py", line 2>), 'closure globals', {'i': 0})
       tuple ('code', <code object square_plus_i at 0x7f9228c6cf50, file "/tmp/ipython_edit_303agyiz/ipython_edit_rez33yf_.py", line 2>)
        'code'
        code <code object square_plus_i at 0x7f9228c6cf50, file "/tmp/ipython_edit_303agyiz/ipython_edit_rez33yf_.py", line 2>
         tuple (None, 2)
          None
          2
         b'|\x00d\x01\x13\x00t\x00\x17\x00S\x00'
       'closure globals'
       dict {'i': 0}
        'i'
        0
    freeze end

I do this to find the differences between subsequent runs:

.. code:: shell

    $ python code.py
    $ mv freeze.log freeze.0.log

    $ python code.py
    $ mv freeze.log freeze.1.log

    $ sed -i 's/at 0x[0-9a-f]*//g' freeze.*.log
    # This removes pointer values that appear in the `repr(...)`.

    $ meld freeze.0.log freeze.1.log
    # Alternatively, use `icdiff` or `diff -u1`.
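
If ``sed`` and ``meld`` are not available, the same normalize-then-diff step can be sketched with the standard library's ``re`` and ``difflib``. The helper names are hypothetical, and the pattern assumes pointers appear as `` at 0x...`` in the log, as in the example above.

.. code:: python

    import difflib
    import re

    # Matches the " at 0x7f9228bff550" part of reprs like "<function f at 0x7f9228bff550>".
    POINTER = re.compile(r" at 0x[0-9a-f]+")


    def normalize(log_text: str) -> list:
        """Drop memory addresses, which change from run to run."""
        return [POINTER.sub("", line) for line in log_text.splitlines()]


    def diff_freeze_logs(old_log: str, new_log: str) -> str:
        """Unified diff of two freeze logs with one line of context (like `diff -u1`)."""
        return "\n".join(
            difflib.unified_diff(
                normalize(old_log),
                normalize(new_log),
                fromfile="freeze.0.log",
                tofile="freeze.1.log",
                n=1,
                lineterm="",
            )
        )

    # Usage: print(diff_freeze_logs(open("freeze.0.log").read(), open("freeze.1.log").read()))
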

If ``freeze(obj)`` is taking a long time, try calling ``freeze(obj,
Config(recursion_limit=20))`` instead. This raises an exception if ``freeze``
recurses more than that many times. If you hit this exception, consider telling
``Config`` to ignore the offending classes, functions, attributes, or objects.
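
The idea behind a recursion limit can be sketched as a depth counter that is incremented for each child and checked on entry. ``RecursionLimitError`` and ``sketch_freeze`` are hypothetical names, only lists and tuples are traversed, and the library's own check may count levels differently.

.. code:: python

    from typing import Any, Hashable


    class RecursionLimitError(Exception):
        """Hypothetical error raised when the sketch recurses too deeply."""


    def sketch_freeze(obj: Any, recursion_limit: int = 20, level: int = 0) -> Hashable:
        if level > recursion_limit:
            raise RecursionLimitError(
                f"recursed past {recursion_limit} levels at a {type(obj).__name__}"
            )
        if isinstance(obj, (list, tuple)):
            # Each child is one level deeper than its parent.
            return tuple(sketch_freeze(item, recursion_limit, level + 1) for item in obj)
        return obj


    def nested(depth: int) -> list:
        """Build [[[...]]] with `depth` levels of wrapping around an empty list."""
        value: list = []
        for _ in range(depth):
            value = [value]
        return value


    sketch_freeze(nested(5))  # fine: well under the limit
    try:
        sketch_freeze(nested(30))
    except RecursionLimitError as error:
        print(error)
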

----------
Developing
----------

See `CONTRIBUTING.md`_ for instructions on setting up a development environment.

.. _`CONTRIBUTING.md`: https://github.com/charmoniumQ/charmonium.freeze/tree/main/CONTRIBUTING.md


----
TODO
----

- ☐ Correctness

  - ☑ Test hashing sets with different orders. Assert tests fail.
  - ☑ Test hashing dicts with different orders. Assert tests fail.
  - ☑ Don't include properties in hash.
  - ☑ Test that freeze of an object includes freeze of its instance methods.
  - ☑ Test functions with minor changes.
  - ☑ Test set/dict with diff hash.
  - ☑ Test obj with slots.
  - ☑ Test hash for objects and classes more carefully.
  - ☑ Improve test coverage.
  - ☑ Investigate when modules are assumed constant.
  - ☐ Detect if a module/package has a version. If present, use that. Else, use each attribute.
  - ☐ Support closures which include ``import x`` and ``from x import y``

- ☑ API

  - ☑ Use user-customizable multidispatch.
  - ☑ Bring hash into separate package.
  - ☑ Make it easier to register a freeze method for a type.
  - ☑ Encapsulate global config into object.
  - ☑ Make freeze object-oriented with a module-level instance, like ``random.random`` and ``random.Random``.

    - This makes it easier for different callers to have their own configuration options.

  - ☑ Add an option which returns a single 128-bit int instead of a structured object after a certain depth. This is what ``charmonium.determ_hash`` does. Use this configuration in ``charmonium.cache``.
  - ☐ Move "get call graph" into its own package.
  - ☐ Document configuration options.
  - ☑ Document ``summarize_diffs`` and ``iterate_diffs``.
  - ☐ Have an API for ignoring modules in ``requirements.txt`` or ``pyproject.toml``, and just tracking them by version.
  - ☑ Config object should cascade with ``with config.set(a=b)``

- ☑ Make ``freeze`` handle more types:

  - ☑ Module: freeze by name.
  - ☑ Objects: include the source-code of methods.
  - ☑ C extensions: freeze by name, like modules.
  - ☑ Methods
  - ☑ fastpath for numpy arrays
  - ☑ ``tqdm``
  - ☑ ``numpy.int64(1234)``
  - ☑ Pandas dataframe
  - ☑ Catch Pickle TypeError
  - ☑ Catch Pickle ImportError

- ☐ Performance

  - ☑ Memoize the hash of immutable data:

    - If a function contains no locals or globals except other immutables, it is immutable.
    - If a collection is immutable and contains only immutables, it is immutable.

  - ☑ Make performance benchmarks.

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/charmoniumQ/charmonium.freeze",
    "name": "charmonium-freeze",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.8,<4.0",
    "maintainer_email": "",
    "keywords": "freeze,hash",
    "author": "Samuel Grayson",
    "author_email": "sam+dev@samgrayson.me",
    "download_url": "https://files.pythonhosted.org/packages/33/e3/5e28d994c54326d0a9f3c2e2c58c604ad94343e046d0152bfebe10b44952/charmonium_freeze-0.8.3.tar.gz",
    "platform": null,
    "description": "==========================\ncharmonium.freeze\n==========================\n\n.. image:: https://img.shields.io/pypi/v/charmonium.freeze\n   :alt: PyPI Package\n   :target: https://pypi.org/project/charmonium.freeze\n.. image:: https://img.shields.io/pypi/dm/charmonium.freeze\n   :alt: PyPI Downloads\n   :target: https://pypi.org/project/charmonium.freeze\n.. image:: https://img.shields.io/pypi/l/charmonium.freeze\n   :alt: License\n   :target: https://github.com/charmoniumQ/charmonium.freeze/blob/main/LICENSE\n.. image:: https://img.shields.io/pypi/pyversions/charmonium.freeze\n   :alt: Python Versions\n   :target: https://pypi.org/project/charmonium.freeze\n.. image:: https://img.shields.io/librariesio/sourcerank/pypi/charmonium.freeze\n   :alt: libraries.io sourcerank\n   :target: https://libraries.io/pypi/charmonium.freeze\n.. image:: https://img.shields.io/github/stars/charmoniumQ/charmonium.freeze?style=social\n   :alt: GitHub stars\n   :target: https://github.com/charmoniumQ/charmonium.freeze\n.. image:: https://github.com/charmoniumQ/charmonium.freeze/actions/workflows/main.yaml/badge.svg\n   :alt: CI status\n   :target: https://github.com/charmoniumQ/charmonium.freeze/actions/workflows/main.yaml\n.. image:: https://codecov.io/gh/charmoniumQ/charmonium.freeze/branch/main/graph/badge.svg?token=56A97FFTGZ\n   :alt: Code Coverage\n   :target: https://codecov.io/gh/charmoniumQ/charmonium.freeze\n.. image:: https://img.shields.io/github/last-commit/charmoniumQ/charmonium.cache\n   :alt: GitHub last commit\n   :target: https://github.com/charmoniumQ/charmonium.freeze/commits\n.. image:: http://www.mypy-lang.org/static/mypy_badge.svg\n   :target: https://mypy.readthedocs.io/en/stable/\n   :alt: Checked with Mypy\n.. 
image:: https://img.shields.io/badge/code%20style-black-000000.svg\n   :target: https://github.com/psf/black\n   :alt: Code style: black\n\nInjectively, deterministically maps arbitrary objects to hashable, immutable values\n\n\n----------\nQuickstart\n----------\n\nIf you don't have ``pip`` installed, see the `pip install guide`_.\n\n.. _`pip install guide`: https://pip.pypa.io/en/latest/installing/\n\n.. code-block:: console\n\n    $ pip install charmonium.freeze\n\nFor a related project, |charmonium.cache|_, I needed a function that\ndeterministically, injectively maps objects to hashable objects.\n\n- \"Injectively\" means ``freeze(a) == freeze(b)`` implies ``a == b``\n  (with the precondition that ``a`` and ``b`` are of the same type).\n\n- \"Deterministically\" means it should return the same value **across\n  subsequent process invocations** (with the same interpreter major\n  and minor version), unlike Python's |hash|_ function, which is not\n  deterministic between processes.\n\n- \"Hashable\" means one can call ``hash(...)`` on it. All hashable\n  values are immutable.\n\n.. |hash| replace:: ``hash``\n.. _`hash`: https://docs.python.org/3.8/reference/datamodel.html#object.__hash__\n.. |charmonium.cache| replace:: ``charmonium.cache``\n.. _`charmonium.cache`: https://github.com/charmoniumQ/charmonium.cache\n\nHave you ever felt like you wanted to \"freeze\" a list of arbitrary\ndata into a hashable value? 
Now you can.\n\n>>> obj = [1, 2, 3, {4, 5, 6}, object()]\n>>> hash(obj)\nTraceback (most recent call last):\n  File \"<stdin>\", line 1, in <module>\nTypeError: unhashable type: 'list'\n\n>>> from charmonium.freeze import freeze\n>>> freeze(obj)\n9561766455304166758\n\n-------------\nConfiguration\n-------------\n\nBy changing the configuration, we can see the exact data that gets hashed.\n\nWe can change the configuration in a few ways:\n\n- Object-oriented (preferred)\n\n  >>> from charmonium.freeze import Config\n  >>> freeze(obj, Config(use_hash=False))\n  (1, 2, 3, frozenset({4, 5, 6}), ((('builtins', 'object'),), b'copyreg.__newobj__'))\n\n- Global variable, but in this case, we must also clear the cache when we mutate\n  the config.\n\n  >>> from charmonium.freeze import global_config\n  >>> global_config.use_hash = False\n  >>> global_config.memo.clear()\n  >>> freeze(obj)\n  (1, 2, 3, frozenset({4, 5, 6}), ((('builtins', 'object'),), b'copyreg.__newobj__'))\n\n``use_hash=True`` will be faster and produce less data, but I will demonstrate\nit with ``use_hash=False`` so you can see what data gets included in the state.\n\nSee the source code ``charmonium/freeze/config.py`` for other configuration\noptions.\n\n------------------\nFreezing Functions\n------------------\n\n``freeze`` on functions returns their bytecode, constants, and closure-vars. The\nremarkable thing is that this is true across subsequent invocations of the same\nprocess. If the user edits the script and changes the function, then it's\n``freeze`` will change too. 
This tells you if it is safe to use the cached value\nof the function.\n\n  ::\n\n    (freeze(f) == freeze(g)) implies (for all x, f(x) == g(x))\n\n>>> from pprint import pprint\n>>> i = 456\n>>> func = lambda x: x + i + 123\n>>> pprint(freeze(func))\n(('<lambda>', None, 123, b'|\\x00t\\x00\\x17\\x00d\\x01\\x17\\x00S\\x00'),\n (('i', 456),))\n\nAs promised, the frozen value includes the bytecode (``b'|x00t...``), the\nconstants (123), and the closure variables (456). When we change ``i``, we get a\ndifferent frozen value, indicating that the ``func`` might not be\ncomputationally equivalent to what it was before.\n\n>>> i = 789\n>>> pprint(freeze(func))\n(('<lambda>', None, 123, b'|\\x00t\\x00\\x17\\x00d\\x01\\x17\\x00S\\x00'),\n (('i', 789),))\n\n``freeze`` works for objects that use function as data.\n\n>>> import functools\n>>> pprint(freeze(functools.partial(print, 123)))\n(('print',),\n ('print', (123,), (), None),\n (frozenset({'partial',\n             (...,\n              ('args', (b'member_descriptor', b'args')),\n              ('func', (b'member_descriptor', b'func')),\n              ('keywords', (b'member_descriptor', b'keywords')))}),\n  ('builtins', 'object')))\n\n``freeze`` works for methods.\n\n>>> class Greeter:\n...     def __init__(self, greeting):\n...         self.greeting = greeting\n...     def greet(self, name):\n...         print(self.greeting + \" \" + name)\n... \n>>> pprint(freeze(Greeter.greet))\n(('greet',\n  None,\n  ' ',\n  b't\\x00|\\x00j\\x01d\\x01\\x17\\x00|\\x01\\x17\\x00\\x83\\x01\\x01\\x00d\\x00S\\x00'),)\n\n----------------\nFreezing Objects\n----------------\n\n``freeze`` works on objects by freezing their state and freezing their\nmethods. The state is found by the `pickle protocol`_, which the Python language\nimplements by default for all classes. To get an idea of what this returns, call\n``obj.__reduce_ex__(4)``. Because we reuse an existing protocol, ``freeze`` work\ncorrectly on most user-defined types.\n\n.. 
_`pickle protocol`: https://docs.python.org/3/library/pickle.html#pickling-class-instances\n\n>>> s = Greeter(\"hello\")\n>>> pprint(s.__reduce_ex__(4))\n(<function __newobj__ at 0x...>,\n (<class '__main__.Greeter'>,),\n {'greeting': 'hello'},\n None,\n None)\n>>> pprint(freeze(s))\n(((frozenset({'Greeter',\n              (('__init__',\n                (('__init__', None, b'|\\x01|\\x00_\\x00d\\x00S\\x00'),)),\n               ('greet',\n                (('greet',\n                  None,\n                  ' ',\n                  b't\\x00|\\x00j\\x01d\\x01\\x17\\x00|\\x01\\x17\\x00\\x83\\x01'\n                  b'\\x01\\x00d\\x00S\\x00'),)))}),\n   ('builtins', 'object')),),\n (('greeting', 'hello'),),\n b'copyreg.__newobj__')\n\nHowever, there can still be special cases: ``pickle`` may incorporate\nnon-deterministic values. In this case, there are three remedies:\n\n- If you can tweak the definition of the class, add a method called\n  ``__getfrozenstate__`` which returns a deterministic snapshot of the\n  state. This takes precedence over the Pickle protocol, if it is defined.\n\n  >>> class Greeter:\n  ...     def __init__(self, greeting):\n  ...         self.greeting = greeting\n  ...     def greet(self, name):\n  ...         print(self.greeting + \" \" + name)\n  ...     def __getfrozenstate__(self):\n  ...         return self.greeting\n  ... 
\n  >>> pprint(freeze(Greeter(\"hello\")))\n  ((frozenset({'Greeter',\n               (('__getfrozenstate__',\n                 (('__getfrozenstate__', None, b'|\\x00j\\x00S\\x00'),)),\n                ('__init__', (('__init__', None, b'|\\x01|\\x00_\\x00d\\x00S\\x00'),)),\n                ('greet',\n                 (('greet',\n                   None,\n                   ' ',\n                   b't\\x00|\\x00j\\x01d\\x01\\x17\\x00|\\x01\\x17\\x00\\x83\\x01'\n                   b'\\x01\\x00d\\x00S\\x00'),)))}),\n    ('builtins', 'object')),\n   'hello')\n\n- Otherwise, you can ignore certain attributes by changing the\n  configuration. See the source code of ``charmonium/freeze/config.py`` for more\n  details.\n\n  >>> class Greeter:\n  ...     def __init__(self, greeting):\n  ...         self.greeting = greeting\n  ...     def greet(self, name):\n  ...         print(self.greeting + \" \" + name)\n  ... \n  >>> config = Config(use_hash=False)\n  >>> config.ignore_attributes.add((\"__main__\", \"Greeter\", \"greeting\"))\n  >>> pprint(freeze(Greeter(\"hello\"), config))\n  (((frozenset({'Greeter',\n                (('__init__',\n                  (('__init__', None, b'|\\x01|\\x00_\\x00d\\x00S\\x00'),)),\n                 ('greet',\n                  (('greet',\n                    None,\n                    ' ',\n                    b't\\x00|\\x00j\\x01d\\x01\\x17\\x00|\\x01\\x17\\x00\\x83\\x01'\n                    b'\\x01\\x00d\\x00S\\x00'),)))}),\n     ('builtins', 'object')),),\n   (),\n   b'copyreg.__newobj__')\n\n  Note that ``'hello'`` is not present in the frozen object any more.\n\n- If you cannot tweak the definition of the class or monkeypatch a\n  ``__getfrozenstate__`` method, you can still register `single dispatch\n  handler`_ for that type:\n\n  .. 
_`single dispatch handler`: https://docs.python.org/3/library/functools.html#functools.singledispatch\n\n  >>> from typing import Hashable, Optional, Dict, Tuple\n  >>> from charmonium.freeze import _freeze_dispatch, _freeze\n  >>> @_freeze_dispatch.register(Greeter)\n  ... def _(\n  ...         obj: Greeter,\n  ...         config: Config,\n  ...         tabu: Dict[int, Tuple[int, int]],\n  ...         level: int,\n  ...         index: int,\n  ...     ) -> Tuple[Hashable, bool, Optional[int]]:\n  ...     # Type annotations are optional.\n  ...     # I have included them here for clarity.\n  ... \n  ...     # `tabu` is for object cycle detection. It is handled for you.\n  ...     # `level` is for logging and recursion limits. It is incremented for you.\n  ...     # `index` is the \"birth order\" of the children.\n  ...     frozen_greeting = _freeze(obj.greeting, config, tabu, level, 0)\n  ... \n  ...     return (\n  ...         frozen_greeting[0],\n  ...         # Remember that _freeze returns a triple;\n  ...         # we are only interested in the first element here.\n  ... \n  ...         False,\n  ...         # Whether the obj is immutable\n  ...         # If the obj is immutable, it's frozen value need not be recomputed every time.\n  ...         # This is handled for you.\n  ... \n  ...         None,\n  ...         # The depth of references contained here or None\n  ...         # Currently, this doesn't do anything.\n  ...     )\n  ... \n  >>> freeze(Greeter(\"Hello\"))\n  'Hello'\n\n----------------\nDictionary order\n----------------\n\nAs of Python 3.7, dictionaries \"remember\" their insertion order. 
As such,\n\n>>> freeze({\"a\": 1, \"b\": 2})\n(('a', 1), ('b', 2))\n>>> freeze({\"b\": 2, \"a\": 1})\n(('b', 2), ('a', 1))\n\nThis behavior is controllable by ``Config.ignore_dict_order``, which emits a ``frozenset`` of pairs.\n\n>>> config = Config(ignore_dict_order=True)\n>>> freeze({\"b\": 2, \"a\": 1}, config) == freeze({\"a\": 1, \"b\": 2}, config)\nTrue\n\n--------------\nSummarize diff\n--------------\n\nThis enables a pretty neat utility to compare two arbitrary Python objects.\n\n>>> from charmonium.freeze import summarize_diffs\n>>> obj0 = [0, 1, 2, {3, 4}, {\"a\": 5, \"b\": 6, \"c\": 7}, 8]\n>>> obj1 = [0, 8, 2, {3, 5}, {\"a\": 5, \"b\": 7, \"d\": 8}]\n>>> print(summarize_diffs(obj0, obj1))\nlet obj0_sub = obj0\nlet obj1_sub = obj1\nobj0_sub.__len__() == 6\nobj1_sub.__len__() == 5\nobj0_sub[1] == 1\nobj1_sub[1] == 8\nobj0_sub[3].has() == 4\nobj1_sub[3].has() == no such element\nobj0_sub[3].has() == no such element\nobj1_sub[3].has() == 5\nobj0_sub[4].keys().has() == c\nobj1_sub[4].keys().has() == no such element\nobj0_sub[4].keys().has() == no such element\nobj1_sub[4].keys().has() == d\nobj0_sub[4]['b'] == 6\nobj1_sub[4]['b'] == 7\n\nAnd if you don't like my printing style, you can get a programatic\naccess to this information.\n\n>>> from charmonium.freeze import iterate_diffs\n>>> for o1, o2 in iterate_diffs(obj0, obj1):\n...    
print(o1, o2, sep=\"\\n\")\nObjectLocation(labels=('obj0', '.__len__()'), objects=(..., 6))\nObjectLocation(labels=('obj1', '.__len__()'), objects=(..., 5))\nObjectLocation(labels=('obj0', '[1]'), objects=(..., 1))\nObjectLocation(labels=('obj1', '[1]'), objects=(..., 8))\nObjectLocation(labels=('obj0', '[3]', '.has()'), objects=(..., 4))\nObjectLocation(labels=('obj1', '[3]', '.has()'), objects=(..., 'no such element'))\nObjectLocation(labels=('obj0', '[3]', '.has()'), objects=(..., 'no such element'))\nObjectLocation(labels=('obj1', '[3]', '.has()'), objects=(..., 5))\nObjectLocation(labels=('obj0', '[4]', '.keys()', '.has()'), objects=(..., 'c'))\nObjectLocation(labels=('obj1', '[4]', '.keys()', '.has()'), objects=(..., 'no such element'))\nObjectLocation(labels=('obj0', '[4]', '.keys()', '.has()'), objects=(..., 'no such element'))\nObjectLocation(labels=('obj1', '[4]', '.keys()', '.has()'), objects=(..., 'd'))\nObjectLocation(labels=('obj0', '[4]', \"['b']\"), objects=(..., 6))\nObjectLocation(labels=('obj1', '[4]', \"['b']\"), objects=(..., 7))\n\n\n---------\nDebugging\n---------\n\nUse the following lines to see how ``freeze`` decomposes an object into\nprimitive values.\n\n.. 
code:: python\n\n    import logging, os\n    logger = logging.getLogger(\"charmonium.freeze\")\n    logger.setLevel(logging.DEBUG)\n    fh = logging.FileHandler(\"freeze.log\")\n    fh.setLevel(logging.DEBUG)\n    fh.setFormatter(logging.Formatter(\"%(message)s\"))\n    logger.addHandler(fh)\n    logger.debug(\"Program %d\", os.getpid())\n\n    i = 0\n    def square_plus_i(x):\n        # The value of the global variable is included in the function's frozen state.\n        return x**2 + i\n\n    from charmonium.freeze import freeze\n    freeze(square_plus_i)\n\n\nThis produces a log like the following in ``freeze.log``:\n\n::\n\n    freeze begin <function square_plus_i at 0x7f9228bff550>\n     function <function square_plus_i at 0x7f9228bff550>\n      tuple (('code', <code object square_plus_i at 0x7f9228c6cf50, file \"/tmp/ipython_edit_303agyiz/ipython_edit_rez33yf_.py\", line 2>), 'closure globals', {'i': 0})\n       tuple ('code', <code object square_plus_i at 0x7f9228c6cf50, file \"/tmp/ipython_edit_303agyiz/ipython_edit_rez33yf_.py\", line 2>)\n        'code'\n        code <code object square_plus_i at 0x7f9228c6cf50, file \"/tmp/ipython_edit_303agyiz/ipython_edit_rez33yf_.py\", line 2>\n         tuple (None, 2)\n          None\n          2\n         b'|\\x00d\\x01\\x13\\x00t\\x00\\x17\\x00S\\x00'\n       'closure globals'\n       dict {'i': 0}\n        'i'\n        0\n    freeze end\n\nTo find the differences between two subsequent runs, I do the following:\n\n.. code:: shell\n\n    $ python code.py\n    $ mv freeze.log freeze.0.log\n\n    $ python code.py\n    $ mv freeze.log freeze.1.log\n\n    $ sed -i 's/at 0x[0-9a-f]*//g' freeze.*.log\n    # This removes pointer values that appear in the `repr(...)`.\n\n    $ meld freeze.0.log freeze.1.log\n    # Alternatively, use `icdiff` or `diff -u1`.\n\nIf ``freeze(obj)`` is taking a long time, try calling ``freeze(obj,\nConfig(recursion_limit=20))`` instead. This raises an exception if ``freeze`` recurses\npast the given limit.
If you hit this exception, consider adding\nignored classes, functions, attributes, or objects to ``Config``.\n\n----------\nDeveloping\n----------\n\nSee `CONTRIBUTING.md`_ for instructions on setting up a development environment.\n\n.. _`CONTRIBUTING.md`: https://github.com/charmoniumQ/charmonium.freeze/tree/main/CONTRIBUTING.md\n\n\n----\nTODO\n----\n\n- \u2610 Correctness\n\n  - \u2611 Test hashing sets with different orders. Assert tests fail.\n  - \u2611 Test hashing dicts with different orders. Assert tests fail.\n  - \u2611 Don't include properties in hash.\n  - \u2611 Test that freeze of an object includes freeze of its instance methods.\n  - \u2611 Test functions with minor changes.\n  - \u2611 Test set/dict with diff hash.\n  - \u2611 Test obj with slots.\n  - \u2611 Test hash for objects and classes more carefully.\n  - \u2611 Improve test coverage.\n  - \u2611 Investigate when modules are assumed constant.\n  - \u2610 Detect if a module/package has a version. If present, use that. Else, use each attribute.\n  - \u2610 Support closures which include ``import x`` and ``from x import y``\n\n- \u2611 API\n\n  - \u2611 Use user-customizable multidispatch.\n  - \u2611 Bring hash into separate package.\n  - \u2611 Make it easier to register a freeze method for a type.\n  - \u2611 Encapsulate global config into object.\n  - \u2611 Make freeze object-oriented with a module-level instance, like ``random.random`` and ``random.Random``.\n    - This makes it easier for different callers to have their own configuration options.\n  - \u2611 Add an option which returns a single 128-bit int instead of a structured object after a certain depth. This is what ``charmonium.determ_hash`` does. 
Use this configuration in ``charmonium.cache``.\n  - \u2610 Move \"get call graph\" into its own package.\n  - \u2610 Document configuration options.\n  - \u2611 Document ``summarize_diffs`` and ``iterate_diffs``.\n  - \u2610 Have an API for ignoring modules in ``requirements.txt`` or ``pyproject.toml``, and just tracking them by version.\n  - \u2611 Config object should cascade with ``with config.set(a=b)``\n\n- \u2611 Make ``freeze`` handle more types:\n\n  - \u2611 Module: freeze by name.\n  - \u2611 Objects: include the source code of methods.\n  - \u2611 C extensions: freeze by name, like modules.\n  - \u2611 Methods\n  - \u2611 Fast path for NumPy arrays\n  - \u2611 ``tqdm``\n  - \u2611 ``numpy.int64(1234)``\n  - \u2611 Pandas DataFrame\n  - \u2611 Catch Pickle TypeError\n  - \u2611 Catch Pickle ImportError\n\n- \u2610 Performance\n\n  - \u2611 Memoize the hash of immutable data:\n    - If a function contains no locals or globals except other immutables, it is immutable.\n    - If a collection is immutable and contains only immutables, it is immutable.\n  - \u2611 Make performance benchmarks.\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Injectively, deterministically maps arbitrary objects to hashable values",
    "version": "0.8.3",
    "split_keywords": [
        "freeze",
        "hash"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "af5360d615fb2ae9af1636edda510990a597a56c4e10d427576736c17656caee",
                "md5": "68f6b90c401110d07a835e45558bb5a8",
                "sha256": "50f2fa8f584a2fac3c2941343fe775a3385794d65ec06400f3712f0f0a081fd6"
            },
            "downloads": -1,
            "filename": "charmonium_freeze-0.8.3-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "68f6b90c401110d07a835e45558bb5a8",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8,<4.0",
            "size": 937875,
            "upload_time": "2023-03-19T00:42:33",
            "upload_time_iso_8601": "2023-03-19T00:42:33.448649Z",
            "url": "https://files.pythonhosted.org/packages/af/53/60d615fb2ae9af1636edda510990a597a56c4e10d427576736c17656caee/charmonium_freeze-0.8.3-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "33e35e28d994c54326d0a9f3c2e2c58c604ad94343e046d0152bfebe10b44952",
                "md5": "754a7a35a1b2488594c8ba6e61197e08",
                "sha256": "acbc9794786fbb002b36ed6fb75973b8e1a7c2bd0d2c3c8f0cfaf02443b4c2f0"
            },
            "downloads": -1,
            "filename": "charmonium_freeze-0.8.3.tar.gz",
            "has_sig": false,
            "md5_digest": "754a7a35a1b2488594c8ba6e61197e08",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8,<4.0",
            "size": 735672,
            "upload_time": "2023-03-19T00:42:35",
            "upload_time_iso_8601": "2023-03-19T00:42:35.530131Z",
            "url": "https://files.pythonhosted.org/packages/33/e3/5e28d994c54326d0a9f3c2e2c58c604ad94343e046d0152bfebe10b44952/charmonium_freeze-0.8.3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-03-19 00:42:35",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "github_user": "charmoniumQ",
    "github_project": "charmonium.freeze",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "charmonium-freeze"
}
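
The handler-registration example in the README (``_freeze_dispatch.register(Greeter)``) and the ``ignore_dict_order`` option can be modeled with the standard library's ``functools.singledispatch``. The following is a hypothetical toy sketch, not charmonium.freeze's implementation. ``toy_freeze``, its ``ignore_dict_order`` keyword, and this ``Greeter`` class are illustrative names only. The real handlers also take ``config``, ``tabu``, ``level``, and ``index`` and return a triple; this sketch omits all of that, including cycle detection and memoization.

```python
from functools import singledispatch
from typing import Hashable

# Hypothetical toy model of dispatch-based freezing; NOT charmonium.freeze's code.


@singledispatch
def toy_freeze(obj: object, ignore_dict_order: bool = False) -> Hashable:
    # Fallback: accept objects that are already hashable (int, str, None, ...).
    hash(obj)  # raises TypeError for unhashable types that lack a handler
    return obj


@toy_freeze.register(list)
@toy_freeze.register(tuple)
def _(obj, ignore_dict_order=False):
    # Sequences become tuples of frozen elements; element order is kept.
    return tuple(toy_freeze(x, ignore_dict_order) for x in obj)


@toy_freeze.register(set)
@toy_freeze.register(frozenset)
def _(obj, ignore_dict_order=False):
    # Sets are unordered, so a frozenset is the natural frozen form.
    return frozenset(toy_freeze(x, ignore_dict_order) for x in obj)


@toy_freeze.register(dict)
def _(obj, ignore_dict_order=False):
    pairs = (
        (toy_freeze(k, ignore_dict_order), toy_freeze(v, ignore_dict_order))
        for k, v in obj.items()
    )
    # Insertion order is significant unless the caller opts out.
    return frozenset(pairs) if ignore_dict_order else tuple(pairs)


class Greeter:
    # Hypothetical user class, mirroring the README's example.
    def __init__(self, greeting: str) -> None:
        self.greeting = greeting


@toy_freeze.register(Greeter)
def _(obj, ignore_dict_order=False):
    # User-supplied handler: only the greeting contributes to the frozen value.
    return toy_freeze(obj.greeting, ignore_dict_order)
```

For example, ``toy_freeze({"a": 1, "b": 2})`` gives ``(('a', 1), ('b', 2))``, while passing ``ignore_dict_order=True`` makes the two insertion orders compare equal. Dispatching on ``type(obj)`` lets users add handlers for their own classes without editing the core function.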
        
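
The README's debugging workflow strips pointer addresses from two ``freeze.log`` files with ``sed`` and then compares them with ``meld``, ``icdiff``, or ``diff -u1``. The same steps can be sketched in pure Python with ``re`` and ``difflib``, assuming both logs fit in memory. The helpers ``normalize``, ``diff_lines``, and ``diff_log_files`` are hypothetical names, not part of charmonium.freeze.

```python
import difflib
import re
from pathlib import Path
from typing import Iterable, Iterator, List

# Same pattern as `sed -i 's/at 0x[0-9a-f]*//g'`: drop the "at 0x..." pointer
# values that repr() embeds, since they change from run to run.
POINTER_RE = re.compile(r"at 0x[0-9a-f]*")


def normalize(lines: Iterable[str]) -> List[str]:
    """Remove pointer addresses so two runs can be compared line by line."""
    return [POINTER_RE.sub("", line) for line in lines]


def diff_lines(
    old: Iterable[str],
    new: Iterable[str],
    old_name: str = "freeze.0.log",
    new_name: str = "freeze.1.log",
    context: int = 1,
) -> Iterator[str]:
    """Yield a unified diff (one line of context, like `diff -u1`)."""
    return difflib.unified_diff(
        normalize(old),
        normalize(new),
        fromfile=old_name,
        tofile=new_name,
        n=context,
        lineterm="",
    )


def diff_log_files(path0: str, path1: str) -> str:
    """Compare two log files produced by the logging setup in the README."""
    old = Path(path0).read_text().splitlines()
    new = Path(path1).read_text().splitlines()
    return "\n".join(diff_lines(old, new, path0, path1))
```

After two runs, ``print(diff_log_files("freeze.0.log", "freeze.1.log"))`` prints only the lines whose frozen values changed, such as a different value for a global variable.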