![](https://img.shields.io/github/license/wh1isper/pyspark-sampling)
![](https://img.shields.io/docker/image-size/wh1isper/pysparksampling)
![](https://img.shields.io/pypi/pyversions/sparksampling)
![](https://img.shields.io/pypi/dm/sparksampling)
# pyspark-sampling
``sparksampling`` is a PySpark-based gRPC service for sampling and data quality assessment. It supports containerized
deployments and Spark on K8S.
## Features
- Common sampling methods: Random, Stratified, Simple
- Relationship sampling based on a DAG and topological sorting
- Cloud Native and Spark on K8S support
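To illustrate the idea behind relationship sampling, here is a minimal pure-Python sketch. It is not the project's implementation and does not use the ``sparksampling`` API: the function name ``relationship_sample``, the table layout, and the single-parent foreign-key mapping are all assumptions made for this example. Tables are ordered topologically so that each parent is sampled before its children, and child rows are kept only when they reference a sampled parent row. It relies on ``graphlib``, which needs Python 3.9 or newer.

```python
import random
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def relationship_sample(tables, foreign_keys, fraction, seed=0):
    """Sample related tables while keeping references consistent.

    tables:       {table_name: [row_dict, ...]}
    foreign_keys: {child_table: (child_column, parent_table, parent_column)}
                  (hypothetical layout: at most one parent per child)
    fraction:     share of rows to keep in tables that have no parent
    """
    rng = random.Random(seed)

    # Build the dependency DAG: every child table depends on its parent.
    graph = {name: set() for name in tables}
    for child, (_, parent, _) in foreign_keys.items():
        graph[child].add(parent)

    sampled = {}
    # static_order() yields parents before the tables that depend on them.
    for name in TopologicalSorter(graph).static_order():
        rows = tables[name]
        if name in foreign_keys:
            child_col, parent, parent_col = foreign_keys[name]
            kept_keys = {row[parent_col] for row in sampled[parent]}
            # Keep only child rows that point at a sampled parent row.
            sampled[name] = [row for row in rows if row[child_col] in kept_keys]
        else:
            # Root table: plain random sampling without replacement.
            k = min(len(rows), max(1, round(len(rows) * fraction)))
            sampled[name] = rng.sample(rows, k)
    return sampled


if __name__ == "__main__":
    users = [{"id": i} for i in range(10)]
    orders = [{"order_id": i, "user_id": i % 10} for i in range(30)]
    result = relationship_sample(
        {"users": users, "orders": orders},
        {"orders": ("user_id", "users", "id")},
        fraction=0.3,
    )
    print(len(result["users"]), len(result["orders"]))
```

In a Spark setting the same ordering idea would apply to DataFrames, with the filter step expressed as a join against the sampled parent.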
# Quick Start
## Installation
To try it out, install the package directly from PyPI:
``pip install sparksampling``
Then start the service:
``sparksampling``
The service will start and listen on port 8530.
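A quick way to confirm the service is up is to check that something accepts TCP connections on that port. The sketch below uses only the standard library; the helper name ``is_port_open`` and the default host ``127.0.0.1`` are assumptions for this example. It checks reachability only and does not speak gRPC.

```python
import socket


def is_port_open(host="127.0.0.1", port=8530, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    state = "reachable" if is_port_open() else "not reachable"
    print(f"sparksampling on port 8530 is {state}")
```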
## Docker
``docker run -p 8530:8530 wh1isper/pysparksampling:latest``
# Development
Install the package in editable mode with the test extras, then set up the pre-commit hooks:
```shell
# Quote the extras so shells such as zsh do not treat the brackets as a glob
pip install -e ".[test]"
pre-commit install
```
Run the tests:
```shell
pytest -v
```
Raw data
{
"_id": null,
"home_page": "",
"name": "sparksampling",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.7",
"maintainer_email": "",
"keywords": "pyspark-sampling,sparksampling",
"author": "",
"author_email": "Wh1isper <9573586@qq.com>",
"download_url": "https://files.pythonhosted.org/packages/ed/46/9982610865b02c3a958e7e9152c506b556b7f023ea42c86916c63660bd2e/sparksampling-0.4.2.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "Apache License 2.0",
"summary": "pyspark-sampling",
"version": "0.4.2",
"project_urls": {
"Source": "https://github.com/Wh1isper/pyspark-sampling"
},
"split_keywords": [
"pyspark-sampling",
"sparksampling"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "8cc2fb0e04e7361a3421514900d095a9dff23eafa510e94218e4d5cc79f61090",
"md5": "6bd476eaf6ddda70aa617c3cae30544a",
"sha256": "743062f9f2a73b2cdd4957c10526c830e1277e0a7b7d3cf9b5ef0d01f5cfada2"
},
"downloads": -1,
"filename": "sparksampling-0.4.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "6bd476eaf6ddda70aa617c3cae30544a",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.7",
"size": 33273,
"upload_time": "2023-08-01T13:46:30",
"upload_time_iso_8601": "2023-08-01T13:46:30.345813Z",
"url": "https://files.pythonhosted.org/packages/8c/c2/fb0e04e7361a3421514900d095a9dff23eafa510e94218e4d5cc79f61090/sparksampling-0.4.2-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "ed469982610865b02c3a958e7e9152c506b556b7f023ea42c86916c63660bd2e",
"md5": "fb515b51905e37d0108fbecb4251c7af",
"sha256": "010ac9c109ff3cd6a2d4a0a2531ac265d52191a6398586d51fb254540e6e32f5"
},
"downloads": -1,
"filename": "sparksampling-0.4.2.tar.gz",
"has_sig": false,
"md5_digest": "fb515b51905e37d0108fbecb4251c7af",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.7",
"size": 1921627,
"upload_time": "2023-08-01T13:46:34",
"upload_time_iso_8601": "2023-08-01T13:46:34.632170Z",
"url": "https://files.pythonhosted.org/packages/ed/46/9982610865b02c3a958e7e9152c506b556b7f023ea42c86916c63660bd2e/sparksampling-0.4.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-08-01 13:46:34",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "Wh1isper",
"github_project": "pyspark-sampling",
"travis_ci": true,
"coveralls": false,
"github_actions": false,
"lcname": "sparksampling"
}