# pymerkle
**Merkle-tree in Python**
[![Build Status](https://gitlab.com/fmerg/pymerkle/badges/master/pipeline.svg)](https://gitlab.com/fmerg/pymerkle/commits/master)
[![Docs Status](https://readthedocs.org/projects/pymerkle/badge/?version=latest)](http://pymerkle.readthedocs.org)
[![PyPI version](https://badge.fury.io/py/pymerkle.svg)](https://pypi.org/project/pymerkle/)
![Python >= 3.7](https://img.shields.io/badge/python-%3E%3D%203.7-blue.svg)
Documentation at **[pymerkle.readthedocs.org](http://pymerkle.readthedocs.org/)**.
Storage agnostic implementation capable of generating inclusion and consistency proofs.
## Install
```bash
pip3 install pymerkle
```
This will also install [`cachetools`](https://github.com/tkem/cachetools)
as a dependency.
## Basic API
Let ``MerkleTree`` be any class implementing the ``BaseMerkleTree``
interface; e.g.,
```python
from pymerkle import InmemoryTree as MerkleTree
tree = MerkleTree(algorithm='sha256')
```
Append data into the tree and retrieve the corresponding hash value:
```python
index = tree.append_entry(b'foo') # leaf index
value = tree.get_leaf(index) # leaf hash
```
Current tree size:
```python
size = tree.get_size() # number of leaves
```
Current and intermediate states:
```python
state = tree.get_state() # current root-hash
state = tree.get_state(5) # root-hash of size 5 subtree
```
### Inclusion proof
Prove inclusion of the 3-rd leaf hash in the subtree of size 5:
```python
proof = tree.prove_inclusion(3, 5)
```
Verify the proof against the base hash and the subtree root:
```python
from pymerkle import verify_inclusion
base = tree.get_leaf(3)
root = tree.get_state(5)
verify_inclusion(base, root, proof)
```
### Consistency proof
Prove consistency between the states with size 3 and 5:
```python
proof = tree.prove_consistency(3, 5)
```
Verify the proof against the respective root-hashes:
```python
from pymerkle import verify_consistency
state1 = tree.get_state(3)
state2 = tree.get_state(5)
verify_consistency(state1, state2, proof)
```
## Supported hash functions
`sha224`, `sha256`, `sha384`, `sha512`, `sha3_224`, `sha3_256`, `sha3_384`, `sha3_512`
### Support for Keccak beyond SHA3
Installing [`pysha3==1.0.2`](https://pypi.org/project/pysha3/) makes available
the following hash functions:
`keccak_224`, `keccak_256`, `keccak_384`, `keccak_512`
## Security
*This library requires security review.*
### Resistance against second-preimage attack
This consists in the following standard technique:
- Upon computing the hash of a leaf node, prepend `0x00` to the payload
- Upon computing the hash of an interior node, prepend `0x01` to the payload
**Note**: For, say, testing purposes, you can disable this feature by passing
`disable_security=True` when initializing the `BaseMerkleTree` superclass.
Refer [here](./tests/test_defense.py) to see how to perform second-preimage
attack against the present implementation.
### Resistance against CVE-2012-2459 DOS
Contrary to the [bitcoin](https://en.bitcoin.it/wiki/Protocol_documentation#Merkle_Trees)
specification, lonely leaves are not duplicated while the tree is growing.
Instead, a bifurcation node is created at the rightmost branch (see next section).
As a consequence, the present implementation should be invulnerable to the
[CVE-2012-2459](https://nvd.nist.gov/vuln/detail/CVE-2012-2459) DOS attack (see also
[here](https://github.com/bitcoin/bitcoin/blob/bccb4d29a8080bf1ecda1fc235415a11d903a680/src/consensus/merkle.cpp)
for insight).
## Topology
Interior nodes are not assumed to be stored anywhere and no concrete links are
created between them. The tree structure is determined by the recursive
function which computes intermediate states on the fly and is essentially the same as
[RFC 9162](https://datatracker.ietf.org/doc/html/rfc9162) (Section 2).
It turns out to be that of a binary
[Sakura tree](https://keccak.team/files/Sakura.pdf) (Section 5.4).
## Optimizations
The performance of a Merkle-tree depends on how efficiently it computes the root-hash
for arbitrary leaf ranges on the fly. The recursive version of this operation
(e.g., [RFC 9162](https://datatracker.ietf.org/doc/html/rfc9162), Section 2)
is slow.
A key remark is that the above operation can be made iterative by combining the root-hashes
for ranges whose size is a power of two ("subroots") and can as such be computed
efficiently. Subroot computation has significant impact on performance
(>500% speedup) while keeping peak memory usage reasonably low
(e.g., 200 MiB for a tree with several tens of millions of entries) and
linear with respect to tree size.
**Note**: For, say, comparison purposes, you can disable this feature by passing
`disable_optimizations=True` when initializing the `BaseMerkleTree` superclass.
### Caching
In view of the above technique, subroot computation is the only massively repeated
and relatively costly operation. It thus makes sense to apply memoization
for ranges whose size exceeds a certain threshold (128 leaves by default).
For example, after sufficiently many cache hits (e.g. 2MiB cache memory), proof generation
becomes 5 times faster for a tree with several tens of million of entries.
Practically, a pretty big tree with sufficiently long uptime will respond instantly
with negligible penalty in memory usage.
**Note**: For, say, comparison purposes, you can disable this feature by passing
`disable_cache=True` when initializing the `BaseMerkleTree` superclass.
## Storage
This library is unopinionated on how leaves are appended to the tree, i.e., how
data is stored in concrete. Cryptographic functionality is encapsulated in the
`BaseMerkleTree` abstract class, which admits pluggable storage backends
through subclassing. It is the the developer's choice to decide how to
store data by implementing the interior storage interface of this class.
Any contiguously indexed dataset should do the job. Conversely, given any such
dataset, we should be able to trivially implement a Merkle-tree that is
operable with it.
### Example
This is a simple non-persistent implementation utilizing a list as storage. It
expects entries to be strings, which it encodes in utf-8 before hashing.
```python
from pymerkle import BaseMerkleTree
class MerkleTree(BaseMerkleTree):
def __init__(self, algorithm='sha256'):
"""
Storage setup and superclass initialization
"""
self.hashes = []
super().__init__(algorithm)
def _encode_entry(self, data):
"""
Prepares data entry for hashing
"""
return data.encode('utf-8')
def _store_leaf(self, data, digest):
"""
Stores data hash in a new leaf and returns index
"""
self.hashes += [digest]
return len(self.hashes)
def _get_leaf(self, index):
"""
Returns the hash stored by the leaf specified
"""
value = self.hashes[index - 1]
return value
def _get_leaves(self, offset, width):
"""
Returns hashes corresponding to the specified leaf range
"""
values = self.hashes[offset: offset + width]
return values
def _get_size(self):
"""
Returns the current number of leaves
"""
return len(self.hashes)
```
## Development
In what follows, you need to have locally installed dev requirements:
```commandline
pip3 install -r requirements-dev.txt
```
### Tests
```commandline
./test.sh [--help]
```
### Performance
In order to capture the effect of I/O operations, performance measurements are
run against a SQLite database as leaf storage. Create it using the following script:
```commandline
python benchmarks/init_db.py [--help]
```
#### Benchmarks
```commandline
./benchmark.sh [--help]
```
#### Profiling
Assuming [`valgrind`](https://valgrind.org/) and
[`massif-visualizer`](https://apps.kde.org/massif-visualizer/) are installed, use
```commandline
./profile.sh [--help]
```
to do memory profiling. Pass `--time` to profile execution times
instead of memory allocations.
## Documentation
**[pymerkle.readthedocs.org](http://pymerkle.readthedocs.org/)**.
### Build locally
Documentation is built with
[`sphinx`](https://www.sphinx-doc.org/en/master/index.html).
Assuming dev requirements have been installed, build the docs with
```commandline
./build-docs.sh [--help]
```
and browse at
```
docs/target/build/html/index.html
```
to view them.
Raw data
{
"_id": null,
"home_page": "https://github.com/fmerg/pymerkle",
"name": "pymerkle",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.6",
"maintainer_email": "",
"keywords": "merkle,proof,inclusion,consistency",
"author": "fmerg",
"author_email": "fmerg@protonmail.com",
"download_url": "https://files.pythonhosted.org/packages/61/90/b96b3af828426da138107270e6942b5aac9525c6b1eddec73c9ecd63134f/pymerkle-6.1.0.tar.gz",
"platform": null,
"description": "# pymerkle\n\n**Merkle-tree in Python**\n\n[![Build Status](https://gitlab.com/fmerg/pymerkle/badges/master/pipeline.svg)](https://gitlab.com/fmerg/pymerkle/commits/master)\n[![Docs Status](https://readthedocs.org/projects/pymerkle/badge/?version=latest)](http://pymerkle.readthedocs.org)\n[![PyPI version](https://badge.fury.io/py/pymerkle.svg)](https://pypi.org/project/pymerkle/)\n![Python >= 3.7](https://img.shields.io/badge/python-%3E%3D%203.7-blue.svg)\n\nDocumentation at **[pymerkle.readthedocs.org](http://pymerkle.readthedocs.org/)**.\n\nStorage agnostic implementation capable of generating inclusion and consistency proofs.\n\n\n## Install\n\n```bash\npip3 install pymerkle\n```\n\nThis will also install [`cachetools`](https://github.com/tkem/cachetools)\nas a dependency.\n\n\n## Basic API\n\nLet ``MerkleTree`` be any class implementing the ``BaseMerkleTree``\ninterface; e.g.,\n\n\n```python\nfrom pymerkle import InmemoryTree as MerkleTree\n\ntree = MerkleTree(algorithm='sha256')\n```\n\nAppend data into the tree and retrieve the corresponding hash value:\n\n```python\nindex = tree.append_entry(b'foo') # leaf index\n\nvalue = tree.get_leaf(index) # leaf hash\n```\n\n\nCurrent tree size:\n\n```python\nsize = tree.get_size() # number of leaves\n```\n\n\nCurrent and intermediate states:\n\n```python\nstate = tree.get_state() # current root-hash\n\nstate = tree.get_state(5) # root-hash of size 5 subtree\n```\n\n\n### Inclusion proof\n\nProve inclusion of the 3-rd leaf hash in the subtree of size 5:\n\n```python\nproof = tree.prove_inclusion(3, 5)\n```\n\nVerify the proof against the base hash and the subtree root:\n\n```python\nfrom pymerkle import verify_inclusion\n\nbase = tree.get_leaf(3)\nroot = tree.get_state(5)\n\nverify_inclusion(base, root, proof)\n```\n\n### Consistency proof\n\nProve consistency between the states with size 3 and 5:\n\n```python\nproof = tree.prove_consistency(3, 5)\n```\n\nVerify the proof against the respective root-hashes:\n\n```python\nfrom pymerkle import verify_consistency\n\nstate1 = tree.get_state(3)\nstate2 = tree.get_state(5)\n\nverify_consistency(state1, state2, proof)\n```\n\n## Supported hash functions\n\n`sha224`, `sha256`, `sha384`, `sha512`, `sha3_224`, `sha3_256`, `sha3_384`, `sha3_512`\n\n### Support for Keccak beyond SHA3\n\nInstalling [`pysha3==1.0.2`](https://pypi.org/project/pysha3/) makes available\nthe following hash functions:\n\n`keccak_224`, `keccak_256`, `keccak_384`, `keccak_512`\n\n\n## Security\n\n*This library requires security review.*\n\n### Resistance against second-preimage attack\n\nThis consists in the following standard technique:\n\n- Upon computing the hash of a leaf node, prepend `0x00` to the payload\n- Upon computing the hash of an interior node, prepend `0x01` to the payload\n\n**Note**: For, say, testing purposes, you can disable this feature by passing\n`disable_security=True` when initializing the `BaseMerkleTree` superclass.\nRefer [here](./tests/test_defense.py) to see how to perform second-preimage\nattack against the present implementation.\n\n\n### Resistance against CVE-2012-2459 DOS\n\nContrary to the [bitcoin](https://en.bitcoin.it/wiki/Protocol_documentation#Merkle_Trees)\nspecification, lonely leaves are not duplicated while the tree is growing.\nInstead, a bifurcation node is created at the rightmost branch (see next section).\nAs a consequence, the present implementation should be invulnerable to the\n[CVE-2012-2459](https://nvd.nist.gov/vuln/detail/CVE-2012-2459) DOS attack (see also\n[here](https://github.com/bitcoin/bitcoin/blob/bccb4d29a8080bf1ecda1fc235415a11d903a680/src/consensus/merkle.cpp)\nfor insight).\n\n\n## Topology\n\nInterior nodes are not assumed to be stored anywhere and no concrete links are\ncreated between them. The tree structure is determined by the recursive\nfunction which computes intermediate states on the fly and is essentially the same as\n[RFC 9162](https://datatracker.ietf.org/doc/html/rfc9162) (Section 2).\nIt turns out to be that of a binary\n[Sakura tree](https://keccak.team/files/Sakura.pdf) (Section 5.4).\n\n\n## Optimizations\n\nThe performance of a Merkle-tree depends on how efficiently it computes the root-hash\nfor arbitrary leaf ranges on the fly. The recursive version of this operation\n(e.g., [RFC 9162](https://datatracker.ietf.org/doc/html/rfc9162), Section 2)\nis slow.\n\nA key remark is that the above operation can be made iterative by combining the root-hashes\nfor ranges whose size is a power of two (\"subroots\") and can as such be computed\nefficiently. Subroot computation has significant impact on performance\n(>500% speedup) while keeping peak memory usage reasonably low\n(e.g., 200 MiB for a tree with several tens of millions of entries) and\nlinear with respect to tree size.\n\n**Note**: For, say, comparison purposes, you can disable this feature by passing\n`disable_optimizations=True` when initializing the `BaseMerkleTree` superclass.\n\n\n### Caching\n\nIn view of the above technique, subroot computation is the only massively repeated\nand relatively costly operation. It thus makes sense to apply memoization\nfor ranges whose size exceeds a certain threshold (128 leaves by default).\nFor example, after sufficiently many cache hits (e.g. 2MiB cache memory), proof generation\nbecomes 5 times faster for a tree with several tens of million of entries.\nPractically, a pretty big tree with sufficiently long uptime will respond instantly\nwith negligible penalty in memory usage.\n\n\n**Note**: For, say, comparison purposes, you can disable this feature by passing\n`disable_cache=True` when initializing the `BaseMerkleTree` superclass.\n\n\n## Storage\n\nThis library is unopinionated on how leaves are appended to the tree, i.e., how\ndata is stored in concrete. Cryptographic functionality is encapsulated in the\n`BaseMerkleTree` abstract class, which admits pluggable storage backends\nthrough subclassing. It is the the developer's choice to decide how to\nstore data by implementing the interior storage interface of this class.\nAny contiguously indexed dataset should do the job. Conversely, given any such\ndataset, we should be able to trivially implement a Merkle-tree that is\noperable with it.\n\n\n### Example\n\nThis is a simple non-persistent implementation utilizing a list as storage. It\nexpects entries to be strings, which it encodes in utf-8 before hashing.\n\n\n```python\nfrom pymerkle import BaseMerkleTree\n\n\nclass MerkleTree(BaseMerkleTree):\n\n def __init__(self, algorithm='sha256'):\n \"\"\"\n Storage setup and superclass initialization\n \"\"\"\n self.hashes = []\n\n super().__init__(algorithm)\n\n\n def _encode_entry(self, data):\n \"\"\"\n Prepares data entry for hashing\n \"\"\"\n return data.encode('utf-8')\n\n\n def _store_leaf(self, data, digest):\n \"\"\"\n Stores data hash in a new leaf and returns index\n \"\"\"\n self.hashes += [digest]\n\n return len(self.hashes)\n\n\n def _get_leaf(self, index):\n \"\"\"\n Returns the hash stored by the leaf specified\n \"\"\"\n value = self.hashes[index - 1]\n\n return value\n\n\n def _get_leaves(self, offset, width):\n \"\"\"\n Returns hashes corresponding to the specified leaf range\n \"\"\"\n values = self.hashes[offset: offset + width]\n\n return values\n\n\n def _get_size(self):\n \"\"\"\n Returns the current number of leaves\n \"\"\"\n return len(self.hashes)\n```\n\n## Development\n\nIn what follows, you need to have locally installed dev requirements:\n\n```commandline\npip3 install -r requirements-dev.txt\n```\n\n### Tests\n\n```commandline\n./test.sh [--help]\n```\n\n### Performance\n\nIn order to capture the effect of I/O operations, performance measurements are\nrun against a SQLite database as leaf storage. Create it using the following script:\n\n```commandline\npython benchmarks/init_db.py [--help]\n```\n\n#### Benchmarks\n\n```commandline\n./benchmark.sh [--help]\n```\n\n#### Profiling\n\nAssuming [`valgrind`](https://valgrind.org/) and\n[`massif-visualizer`](https://apps.kde.org/massif-visualizer/) are installed, use\n\n```commandline\n./profile.sh [--help]\n```\n\nto do memory profiling. Pass `--time` to profile execution times\ninstead of memory allocations.\n\n\n## Documentation\n\n**[pymerkle.readthedocs.org](http://pymerkle.readthedocs.org/)**.\n\n### Build locally\n\nDocumentation is built with\n[`sphinx`](https://www.sphinx-doc.org/en/master/index.html).\n\nAssuming dev requirements have been installed, build the docs with\n\n```commandline\n./build-docs.sh [--help]\n```\n\nand browse at\n\n```\ndocs/target/build/html/index.html\n```\n\nto view them.\n",
"bugtrack_url": null,
"license": "",
"summary": "Merkle-tree cryptography in python",
"version": "6.1.0",
"project_urls": {
"Homepage": "https://github.com/fmerg/pymerkle",
"docs": "https://6.1.0.readthedocs.io/en/latest/",
"github": "https://github.com/fmerg/pymerkle",
"source": "https://github.com/fmerg/pymerkle/tree/master/pymerkle"
},
"split_keywords": [
"merkle",
"proof",
"inclusion",
"consistency"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "1c460b5c80ee9912fcd174d112c00f95d12f19e88dec00177c6b3a51b1330d4f",
"md5": "387345f53311fa75b597c0e005961101",
"sha256": "9ba32e71fc9bd53a65c2815414aca9e3c43b9d0376b3c73d41cb7b3814267a0a"
},
"downloads": -1,
"filename": "pymerkle-6.1.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "387345f53311fa75b597c0e005961101",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.6",
"size": 34289,
"upload_time": "2023-08-30T14:42:22",
"upload_time_iso_8601": "2023-08-30T14:42:22.460341Z",
"url": "https://files.pythonhosted.org/packages/1c/46/0b5c80ee9912fcd174d112c00f95d12f19e88dec00177c6b3a51b1330d4f/pymerkle-6.1.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "6190b96b3af828426da138107270e6942b5aac9525c6b1eddec73c9ecd63134f",
"md5": "b6cf7d1d5362b2e96e988667828ef1d3",
"sha256": "bf43addba1c49da4f7fd905a6836f58f9e99294b2b34c6460b24f53156ed285a"
},
"downloads": -1,
"filename": "pymerkle-6.1.0.tar.gz",
"has_sig": false,
"md5_digest": "b6cf7d1d5362b2e96e988667828ef1d3",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.6",
"size": 36178,
"upload_time": "2023-08-30T14:42:25",
"upload_time_iso_8601": "2023-08-30T14:42:25.048060Z",
"url": "https://files.pythonhosted.org/packages/61/90/b96b3af828426da138107270e6942b5aac9525c6b1eddec73c9ecd63134f/pymerkle-6.1.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-08-30 14:42:25",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "fmerg",
"github_project": "pymerkle",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [],
"lcname": "pymerkle"
}