ddsketch


Name: ddsketch
Version: 3.0.1
home_page: http://github.com/datadog/sketches-py
Summary: Distributed quantile sketches
upload_time: 2024-04-01 13:11:39
maintainer: None
docs_url: None
author: Jee Rim, Charles-Philippe Masson, Homin Lee
requires_python: >=3.7
license: None
keywords: ddsketch, quantile, sketch
requirements: No requirements were recorded.
Travis-CI: No Travis.
coveralls test coverage: No coveralls.

# ddsketch

This repo contains the Python implementation of the distributed quantile sketch
algorithm DDSketch [1]. DDSketch has relative-error guarantees for any quantile
q in [0, 1]: if the true value of the q-quantile is `x`, then DDSketch
returns a value `y` such that `|x-y| / x < e`, where `e` is the relative error
parameter (0.01 by default in this package). DDSketch is also fully mergeable,
meaning that multiple sketches from distributed systems can be combined in a
central node.
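
To illustrate where a guarantee of this kind can come from, here is a minimal sketch of the logarithmic bucketing described in [1], for positive values only. It is not this package's code: `bucket_index`, `bucket_value`, and `gamma` are illustrative names, and the mapping shown is the one from the paper rather than the library's internal implementation.

```python
import math

def bucket_index(x, gamma):
    """Index i of the bucket (gamma**(i-1), gamma**i] that contains x > 0."""
    return math.ceil(math.log(x) / math.log(gamma))

def bucket_value(i, gamma):
    """Representative value of bucket i, equally far (relatively) from both edges."""
    return 2 * gamma**i / (gamma + 1)

alpha = 0.01                       # relative error parameter `e` from the text
gamma = (1 + alpha) / (1 - alpha)  # ratio between consecutive bucket edges

samples = [0.003, 1.0, 42.0, 1e6]
errors = []
for x in samples:
    y = bucket_value(bucket_index(x, gamma), gamma)
    errors.append(abs(x - y) / x)

# Every sample is reproduced to within alpha (plus floating-point slack).
assert all(err <= alpha + 1e-9 for err in errors)
```

Because each value is replaced by the representative of its bucket, a quantile read from the bucket counts inherits the same relative error bound.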

Our default implementation, `DDSketch`, is guaranteed [1] to not grow too large
in size for any data that can be described by a distribution whose tails are
sub-exponential.

We also provide implementations (`LogCollapsingLowestDenseDDSketch` and
`LogCollapsingHighestDenseDDSketch`) where the q-quantile is accurate up to
the specified relative error for q that is not too small (when the lowest bins
are collapsed) or not too large (when the highest bins are collapsed). Concretely,
the q-quantile is accurate up to the specified relative error as long as it
belongs to one of the `m` bins kept by the sketch. If the data is time in
seconds, the default of `m = 2048` covers 80 microseconds to 1 year.
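
The idea of collapsing the lowest bins can be pictured with a small, simplified example. This is an assumption-laden illustration, not the library's store code: `collapse_lowest` is a hypothetical helper, bins are a plain `dict` from bucket index to count, and the limit is applied to the number of non-empty bins.

```python
def collapse_lowest(bins, max_bins):
    """Return a copy of `bins` with at most `max_bins` entries.

    Counts from the lowest buckets are folded into the lowest bucket
    that is kept, so the total count is preserved but small values
    lose their individual resolution.
    """
    keys = sorted(bins)
    if len(keys) <= max_bins:
        return dict(bins)
    cutoff = keys[len(keys) - max_bins]  # lowest index that survives
    collapsed = {k: c for k, c in bins.items() if k > cutoff}
    collapsed[cutoff] = sum(c for k, c in bins.items() if k <= cutoff)
    return collapsed

bins = {-3: 2, 0: 3, 5: 1, 9: 4}
small = collapse_lowest(bins, max_bins=2)
```

In this toy run the counts for indices -3 and 0 end up in bucket 5, which is why quantiles that fall in the collapsed range are no longer covered by the relative-error guarantee.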

## Installation

To install this package, run `pip install ddsketch`, or clone the repo and run
`python setup.py install`. This package depends on `numpy` and `protobuf`. (The
protobuf dependency can be removed if it's not applicable.)

## Usage
```python
from ddsketch import DDSketch

sketch = DDSketch()
```
Add values to the sketch:
```python
import numpy as np

values = np.random.normal(size=500)
for v in values:
    sketch.add(v)
```
Find the quantiles of `values` to within the relative error:
```python
quantiles = [sketch.get_quantile_value(q) for q in [0.5, 0.75, 0.9, 1]]
```
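
For intuition about what a quantile lookup involves, the following is a rough, self-contained sketch over logarithmic buckets. It is an assumption, not the package's implementation: `build_bins` and `quantile` are hypothetical names, only positive values are handled, and the rank convention `q * (n - 1)` is chosen for illustration.

```python
import math

ALPHA = 0.01
GAMMA = (1 + ALPHA) / (1 - ALPHA)

def build_bins(values):
    """Count positive values per logarithmic bucket index."""
    bins = {}
    for v in values:
        i = math.ceil(math.log(v) / math.log(GAMMA))
        bins[i] = bins.get(i, 0) + 1
    return bins

def quantile(bins, q):
    """Walk buckets in increasing order until the target rank is passed."""
    total = sum(bins.values())
    rank = q * (total - 1)
    seen = 0
    for i in sorted(bins):
        seen += bins[i]
        if seen > rank:
            return 2 * GAMMA**i / (GAMMA + 1)  # representative of bucket i
    raise ValueError("empty sketch")

data = [float(v) for v in range(1, 101)]
bins = build_bins(data)
median = quantile(bins, 0.5)

# The estimate stays within ALPHA of the 50th smallest value, 50.0.
assert abs(median - 50.0) / 50.0 <= ALPHA + 1e-9
```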
Merge another `DDSketch` into `sketch`:
```python
another_sketch = DDSketch()
other_values = np.random.normal(size=500)
for v in other_values:
    another_sketch.add(v)
sketch.merge(another_sketch)
```
The quantiles of `values` concatenated with `other_values` are still accurate to within the relative error.
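
One way to see why merging loses no accuracy: if two sketches use the same bucket boundaries, merging amounts to adding their per-bucket counts. The snippet below is a simplified illustration under that assumption (positive values only, `to_bins` is a hypothetical helper, and a `Counter` stands in for the library's stores).

```python
import math
from collections import Counter

GAMMA = 1.01 / 0.99  # bucket ratio for a relative error of 0.01

def to_bins(values):
    """Map each positive value to its logarithmic bucket and count them."""
    return Counter(math.ceil(math.log(v) / math.log(GAMMA)) for v in values)

node_a = [0.5, 1.2, 3.4, 3.4]
node_b = [0.5, 7.0, 120.0]

# Adding the counters bucket by bucket plays the role of a merge.
merged = to_bins(node_a) + to_bins(node_b)

# The result is identical to sketching the concatenated data directly.
assert merged == to_bins(node_a + node_b)
```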

## Development

To work on ddsketch, a Python interpreter must be installed. We recommend using the provided development
container (requires [docker](https://www.docker.com/)), which includes all the required Python interpreters.

    docker-compose run dev

If you are developing outside of docker, we recommend using a virtual environment:

    pip install virtualenv
    virtualenv --python=3 .venv
    source .venv/bin/activate


### Testing

To run the tests, install `riot`:

    pip install riot

Replace the Python version with the interpreter(s) available.

    # Run tests with Python 3.9
    riot run -p3.9 test

### Release notes

New features, bug fixes, deprecations and other breaking changes must have
release notes included.

To generate a release note for the change:

    riot run reno new <short-description-of-change-no-spaces>

Edit the generated file to include notes on the changes made in the commit/PR
and commit it.


### Formatting

Format the code with:

    riot run fmt


### Type-checking

Type checking is done with [mypy](http://mypy-lang.org/):

    riot run mypy


### Linting

Lint the code with [flake8](https://flake8.pycqa.org/en/latest/):

    riot run flake8


### Protobuf

The protobuf definition is stored in the Go repository: https://github.com/DataDog/sketches-go/blob/master/ddsketch/pb/ddsketch.proto

Install the minimum required protoc and generate the Python code:

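```sh
# Start a throwaway container with the repository mounted at /code.
docker run -v "$PWD":/code -it ubuntu:18.04 /bin/bash
# Inside the container, work from the mounted repository root.
cd /code
apt update && apt install protobuf-compiler  # default is 3.0.0
protoc --proto_path=ddsketch/pb/ --python_out=ddsketch/pb/ ddsketch/pb/ddsketch.proto
```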


### Releasing

1. Generate the release notes and use [`pandoc`](https://pandoc.org/) to format
   them for GitHub:
   ```bash
   git checkout master && git pull
   riot run -s reno report --no-show-source | pandoc -f rst -t gfm --wrap=none
   ```
   Copy the output into a new release: https://github.com/DataDog/sketches-py/releases/new.

2. Enter a tag for the release (following [`semver`](https://semver.org)), e.g. `v1.1.3`, `v1.0.3`, or `v1.2.0`.
3. Use the tag without the `v` as the title.
4. Save the release as a draft and pass the link to someone else for a quick review.
5. If all looks good, hit publish.


## References
[1] Charles Masson, Jee E. Rim, and Homin K. Lee. DDSketch: A fast and fully-mergeable quantile sketch with relative-error guarantees. PVLDB, 12(12): 2195-2205, 2019. (The code referenced in the paper, including our implementation of the Greenwald-Khanna (GK) algorithm, can be found at: https://github.com/DataDog/sketches-py/releases/tag/v0.1 )

            
