pycudahll

- Name: pycudahll
- Version: 0.1.0
- Home page: https://github.com/gabemgem/PyCudaHLL
- Summary: A GPU implementation of HyperLogLog
- Upload time: 2023-05-04 01:39:26
- Author: Gabe Maayan
- Requires Python: >3.10,<3.12
- Keywords: cupy, gpu, hll, hyperloglog

# PyCudaHLL

This is a GPU-accelerated implementation of HyperLogLog built on the CuPy library. It was created for the class "Algorithmic Techniques for Taming Big Data" at the department of Computing and Data Science at Boston University.

## Using the Code

To use this code, you can either get the library from PyPI or build it from source.

### Get from PyPI (Recommended)

- Install using pip: `pip install pycudahll`
- In your code, import the library: `from pycudahll.CudaHLL import CudaHLL`

### Building from Source

- Clone the repository
- Install dependencies: `poetry install`
- In your code, import the library: `from pycudahll.CudaHLL import CudaHLL`
- See `test.py` for examples. (Note: `test.py` is most likely in a broken state, but should give you an idea of how to use the library.)

## API

The main class of the library is `CudaHLL`. Import it in your code with:
```python
from pycudahll.CudaHLL import CudaHLL
```

The same module also provides a helper function, `hashDataGPUHLL`, which hashes data for use with the main class:
```python
from pycudahll.CudaHLL import hashDataGPUHLL
```

A short example of how to use the library is as follows:
```python
from pycudahll.CudaHLL import CudaHLL, hashDataGPUHLL

# Read the comma-separated items; the file is only needed for this step
with open('data.csv', 'r') as file:
    data = file.read().split(',')

# Hash the items before adding them to the sketch
hashedData = hashDataGPUHLL(data)

threads = 64
p = 14
cudaDevice = 0 # optional
roundThreads = True # optional
hll = CudaHLL(p, threads, cudaDevice, roundThreads)

hll.add(hashedData)
print(hll.card()) # print unrounded cardinality estimate
print(len(hll)) # print rounded cardinality estimate
```
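
For intuition about what a HyperLogLog estimate involves, the following is a minimal CPU-only sketch of the textbook algorithm in plain Python. It is not the library's implementation: the function name `hll_estimate`, the use of SHA-1 as the hash, and the assumption that `p` means "use 2^p registers" (the usual HyperLogLog convention) are all illustrative choices made here, not details taken from PyCudaHLL.

```python
import hashlib
import math


def hll_estimate(items, p=14):
    """Estimate the number of distinct items with a plain-Python HyperLogLog.

    Hypothetical helper for illustration only; not part of pycudahll.
    """
    m = 1 << p                      # number of registers (assumes p = precision)
    registers = [0] * m
    for item in items:
        # 64-bit hash taken from SHA-1 (an illustrative choice of hash)
        digest = hashlib.sha1(str(item).encode()).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - p)                  # first p bits select a register
        rest = h & ((1 << (64 - p)) - 1)     # remaining 64 - p bits
        # rank = position of the leftmost 1-bit within the remaining bits
        rank = (64 - p) - rest.bit_length() + 1
        registers[idx] = max(registers[idx], rank)

    alpha = 0.7213 / (1 + 1.079 / m)         # bias constant, valid for m >= 128
    estimate = alpha * m * m / sum(2.0 ** -r for r in registers)

    # Small-range correction (linear counting) while registers are still empty
    zeros = registers.count(0)
    if estimate <= 2.5 * m and zeros:
        estimate = m * math.log(m / zeros)
    return estimate
```

Calling `hll_estimate(data, p=14)` on the same list of strings passed to `hashDataGPUHLL` gives a CPU estimate to sanity-check against, although the two will generally differ slightly because the hash functions are not the same.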


## Test Data

The test data is the text of Shakespeare's plays, obtained from https://ocw.mit.edu/ans7870/6/6.006/s08/lecturenotes/files/t8.shakespeare.txt. The original text is in `t8.shakespeare.txt`, and the modified text is in `shakespeare.csv`.

- Total number of items: 899300
- Exact cardinality: 34065
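
Figures like the ones above can be reproduced with a plain Python `set`, which also gives a baseline for judging an estimate. The sketch below assumes the file holds comma-separated items, as in the API example; the helper names `exact_cardinality` and `relative_error` are hypothetical and not part of the library, and whether the published counts were computed exactly this way is an assumption.

```python
def exact_cardinality(path):
    """Return (total items, distinct items) for a comma-separated text file."""
    with open(path, "r") as file:
        items = file.read().split(",")
    return len(items), len(set(items))


def relative_error(estimate, exact):
    """Relative error of a cardinality estimate against the exact count."""
    return abs(estimate - exact) / exact
```

For example, `relative_error(hll.card(), 34065)` reports how far an estimate is from the exact count. If `p` follows the usual HyperLogLog convention of 2^p registers, the expected standard error is roughly 1.04 / sqrt(2^p), or about 0.8% for `p = 14`.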
            
