sparksampling


- Name: sparksampling
- Version: 0.4.2
- Summary: pyspark-sampling
- Upload time: 2023-08-01 13:46:34
- Requires Python: >=3.7
- License: Apache License 2.0
- Keywords: pyspark-sampling, sparksampling
- Requirements: No requirements were recorded.

![](https://img.shields.io/github/license/wh1isper/pyspark-sampling)
![](https://img.shields.io/docker/image-size/wh1isper/pysparksampling)
![](https://img.shields.io/pypi/pyversions/sparksampling)
![](https://img.shields.io/pypi/dm/sparksampling)

# pyspark-sampling

``sparksampling`` is a PySpark-based gRPC service for sampling and data quality assessment. It supports containerized
deployments and Spark on K8s.

## Features

- Common sampling methods: Random, Stratified, Simple
- Relationship sampling based on a DAG and topological sorting
- Cloud-native deployment and Spark on K8s support
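
To make the idea of relationship sampling over a DAG more concrete, here is a minimal pure-Python sketch. It is not the implementation used by ``sparksampling``, and it does not use Spark. The toy tables, the `relations` mapping, and the `relationship_sample` function are all hypothetical names chosen for illustration. It assumes that each child table references one parent through a foreign-key column, and it relies on the standard-library `graphlib` module, which needs Python 3.9 or later.

```python
import random
from graphlib import TopologicalSorter

# Hypothetical toy tables: each row is a dict. Names are illustrative only.
tables = {
    "customers": [{"id": i} for i in range(10)],
    "orders": [{"id": i, "customer_id": i % 10} for i in range(30)],
    "items": [{"id": i, "order_id": i % 30} for i in range(60)],
}

# Assumed relation format: child table -> (parent table, foreign-key column in the child).
relations = {
    "orders": ("customers", "customer_id"),
    "items": ("orders", "order_id"),
}


def relationship_sample(tables, relations, fraction, seed=0):
    """Randomly sample root tables, then keep only child rows whose parent row survived."""
    rng = random.Random(seed)

    # TopologicalSorter takes node -> predecessors, so parents are yielded before children.
    graph = {name: set() for name in tables}
    for child, (parent, _) in relations.items():
        graph[child].add(parent)

    sampled = {}
    for name in TopologicalSorter(graph).static_order():
        rows = tables[name]
        if name not in relations:
            # Root table: simple random sample of the requested fraction.
            k = max(1, round(len(rows) * fraction))
            sampled[name] = rng.sample(rows, k)
        else:
            # Child table: follow the foreign key into the already-sampled parent.
            parent, fk = relations[name]
            parent_ids = {row["id"] for row in sampled[parent]}
            sampled[name] = [row for row in rows if row[fk] in parent_ids]
    return sampled


result = relationship_sample(tables, relations, fraction=0.3)
```

Processing tables in topological order means a parent has always been sampled before any table that references it, so the sampled child rows never point at a missing parent row.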

# Quick Start

## Installation

To try it out, install the package directly from PyPI:

``pip install sparksampling``

Then start the service with:

``sparksampling``

The service will start and listen on port 8530.
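
One quick way to confirm that something is listening on that port is a plain TCP connection check. The sketch below uses only the Python standard library and does not speak gRPC, so it only shows that the port accepts connections, not that the service is healthy. The helper name `is_port_open` and the host `localhost` are assumptions for illustration.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # 8530 is the port named above; "localhost" assumes the service runs on this machine.
    print(is_port_open("localhost", 8530))
```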

## Docker

``docker run -p 8530:8530 wh1isper/pysparksampling:latest``


# Development

Install the package in editable mode with the test extras, then set up the pre-commit hooks:

```shell
pip install -e ".[test]"
pre-commit install
```

Run the tests:

```shell
pytest -v
```

            

Raw data

            {
    "_id": null,
    "home_page": "",
    "name": "sparksampling",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.7",
    "maintainer_email": "",
    "keywords": "pyspark-sampling,sparksampling",
    "author": "",
    "author_email": "Wh1isper <9573586@qq.com>",
    "download_url": "https://files.pythonhosted.org/packages/ed/46/9982610865b02c3a958e7e9152c506b556b7f023ea42c86916c63660bd2e/sparksampling-0.4.2.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "Apache License 2.0",
    "summary": "pyspark-sampling",
    "version": "0.4.2",
    "project_urls": {
        "Source": "https://github.com/Wh1isper/pyspark-sampling"
    },
    "split_keywords": [
        "pyspark-sampling",
        "sparksampling"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "8cc2fb0e04e7361a3421514900d095a9dff23eafa510e94218e4d5cc79f61090",
                "md5": "6bd476eaf6ddda70aa617c3cae30544a",
                "sha256": "743062f9f2a73b2cdd4957c10526c830e1277e0a7b7d3cf9b5ef0d01f5cfada2"
            },
            "downloads": -1,
            "filename": "sparksampling-0.4.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "6bd476eaf6ddda70aa617c3cae30544a",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.7",
            "size": 33273,
            "upload_time": "2023-08-01T13:46:30",
            "upload_time_iso_8601": "2023-08-01T13:46:30.345813Z",
            "url": "https://files.pythonhosted.org/packages/8c/c2/fb0e04e7361a3421514900d095a9dff23eafa510e94218e4d5cc79f61090/sparksampling-0.4.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "ed469982610865b02c3a958e7e9152c506b556b7f023ea42c86916c63660bd2e",
                "md5": "fb515b51905e37d0108fbecb4251c7af",
                "sha256": "010ac9c109ff3cd6a2d4a0a2531ac265d52191a6398586d51fb254540e6e32f5"
            },
            "downloads": -1,
            "filename": "sparksampling-0.4.2.tar.gz",
            "has_sig": false,
            "md5_digest": "fb515b51905e37d0108fbecb4251c7af",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.7",
            "size": 1921627,
            "upload_time": "2023-08-01T13:46:34",
            "upload_time_iso_8601": "2023-08-01T13:46:34.632170Z",
            "url": "https://files.pythonhosted.org/packages/ed/46/9982610865b02c3a958e7e9152c506b556b7f023ea42c86916c63660bd2e/sparksampling-0.4.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-08-01 13:46:34",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "Wh1isper",
    "github_project": "pyspark-sampling",
    "travis_ci": true,
    "coveralls": false,
    "github_actions": false,
    "lcname": "sparksampling"
}