SparkStream


Name: SparkStream
Version: 1.3.0
Home page: https://github.com/HassanRady/SparkStream
Summary: A simple spark streaming handler.
Upload time: 2022-07-25 15:58:52
Author: Hassan Rady
Requires Python: >=3.6.0
License: MIT license
Keywords: tweets
Requirements: No requirements were recorded.
# Spark Streaming Package
Package: <a href="https://pypi.org/project/SparkStream/#description">SparkStream-pypi</a>

## What is it?
It is a handler that processes streaming text data from a Kafka topic and writes the results to Cassandra and Redis.

## How does it work?
The stream is processed in four steps:
1. Read data from a Kafka topic.
2. Parse the data into a Spark DataFrame with a schema.
3. Clean the data: remove unwanted characters, fix abbreviations, remove stop words, and drop empty fields.
4. Save the data to Cassandra and Redis.
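
The parsing and cleaning steps (2 and 3) can be sketched in plain Python, without Spark, to show the intended transformations. This is a minimal illustration and not the package's actual code. The `text` field name, the `ABBREVIATIONS` and `STOP_WORDS` tables, and all function names are assumptions made for the example. "Fix abbreviations" is read here as expanding them to full words.

```python
import json
import re
from typing import Optional

# Hypothetical lookup tables; the package's real lists are not documented here.
ABBREVIATIONS = {"u": "you", "r": "are", "thx": "thanks"}
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and"}


def parse_message(raw: bytes) -> Optional[dict]:
    """Step 2: decode a message value into a record with a string 'text' field."""
    try:
        record = json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return None
    if not isinstance(record, dict) or not isinstance(record.get("text"), str):
        return None
    return {"text": record["text"]}


def clean_text(text: str) -> str:
    """Step 3: strip unwanted characters, expand abbreviations, drop stop words."""
    text = re.sub(r"https?://\S+", " ", text.lower())  # remove URLs
    text = re.sub(r"[^a-z\s]", " ", text)              # keep letters and whitespace
    words = [ABBREVIATIONS.get(word, word) for word in text.split()]
    return " ".join(word for word in words if word not in STOP_WORDS)


def process(messages):
    """Run steps 2 and 3, dropping records that fail to parse or end up empty."""
    cleaned = []
    for raw in messages:
        record = parse_message(raw)
        if record is None:
            continue
        text = clean_text(record["text"])
        if text:
            cleaned.append({"text": text})
    return cleaned
```

In a real deployment the same logic would run inside Spark on a streaming DataFrame; the plain functions above only make the order of operations concrete. Note that abbreviations are expanded before stop words are removed, so an expansion that is itself a stop word is dropped.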

## How do I use it?
Use it through its API: <a href="https://github.com/HassanRady/Spark-Stream-Api">SparkStream-API github</a>

## Dependency
The package requires the following dependency:
- spark-redis_2.12-3.1.0-jar-with-dependencies.jar (<a href="https://mvnrepository.com/artifact/com.redislabs/spark-redis_2.12/3.1.0">mvn Repository</a>)

This JAR is required to write data to Redis.
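
One common way to make such a JAR available to a Spark job is to pass it to `spark-submit` with `--jars`. The sketch below only builds the command line and does not launch Spark. The `jars/` directory, the `stream_job.py` script name, and the Redis host and port values are assumptions for illustration; `spark.redis.host` and `spark.redis.port` are the connection settings read by the spark-redis connector.

```python
import shlex

# JAR named in the dependency list above; the "jars/" directory is an assumption.
REDIS_JAR = "jars/spark-redis_2.12-3.1.0-jar-with-dependencies.jar"


def spark_submit_command(app, jar=REDIS_JAR, redis_host="localhost", redis_port=6379):
    """Build a spark-submit argument list that ships the spark-redis JAR with the job."""
    return [
        "spark-submit",
        "--jars", jar,
        "--conf", f"spark.redis.host={redis_host}",
        "--conf", f"spark.redis.port={redis_port}",
        app,
    ]


if __name__ == "__main__":
    # "stream_job.py" is a hypothetical driver script.
    command = spark_submit_command("stream_job.py")
    print(" ".join(shlex.quote(part) for part in command))
```

The argument list can be handed to `subprocess.run` as-is, which avoids shell quoting issues if the JAR path contains spaces.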

            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/HassanRady/SparkStream",
    "name": "SparkStream",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.6.0",
    "maintainer_email": "",
    "keywords": "Tweets",
    "author": "Hassan Rady",
    "author_email": "hassan.khaled.rady@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/b0/0d/85c90186cfa7cb1e011e4c33ad4db6ae90903b7dbcca9670b8b2bfe77d1c/SparkStream-1.3.0.tar.gz",
    "platform": null,
    "description": "# Spark Streaming Package\nPackage: <a href=\"https://pypi.org/project/SparkStream/#description\">SparkStream-pypi</a>\n\n## What is it?\nIt is a handler for processing streaming text data from a kafka topic into cassandra and redis.\n\n## How it works?\nThe stream processing is done by the following steps:\n1. Read data from kafka topic \n2. Parse the data into a spark dataframe with a schema\n3. Clean the data: remove unwanted chars, fix abbreviations, remove stop-words, and remove empty fields\n4. Save the data into cassandra and redis\n\n## How to use it?\nUse its API: <a href=\"https://github.com/HassanRady/Spark-Stream-Api\">SparkStream-API github</a>\n\n## Dependency\nThe package requires the following dependency:\n- spark-redis_2.12-3.1.0-jar-with-dependencies.jar (<a href=\"https://mvnrepository.com/artifact/com.redislabs/spark-redis_2.12/3.1.0\">mvn Repository</a>)\n\nIts so to be able to write data into redis.\n",
    "bugtrack_url": null,
    "license": "MIT license",
    "summary": "A simple spark streaming handler.",
    "version": "1.3.0",
    "split_keywords": [
        "tweets"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "md5": "82ef110612c62b8072b90079d7962c14",
                "sha256": "2ec79c6457dfb826534a17d3e5ecf511a8fdfe97b8b02614c95dc15b7e9de573"
            },
            "downloads": -1,
            "filename": "SparkStream-1.3.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "82ef110612c62b8072b90079d7962c14",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.6.0",
            "size": 7955,
            "upload_time": "2022-07-25T15:58:29",
            "upload_time_iso_8601": "2022-07-25T15:58:29.629561Z",
            "url": "https://files.pythonhosted.org/packages/82/4d/87d7466c45922d4122d8227dc3829d65ba35b60ecfb4e65b2d4696ddb557/SparkStream-1.3.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "md5": "681887fa137c0c73078ed5e2aaf5c225",
                "sha256": "21f469564de160ada453493268819a8eb7657703cc8b5db4f767a4f10cc3f509"
            },
            "downloads": -1,
            "filename": "SparkStream-1.3.0.tar.gz",
            "has_sig": false,
            "md5_digest": "681887fa137c0c73078ed5e2aaf5c225",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.6.0",
            "size": 1238610,
            "upload_time": "2022-07-25T15:58:52",
            "upload_time_iso_8601": "2022-07-25T15:58:52.751148Z",
            "url": "https://files.pythonhosted.org/packages/b0/0d/85c90186cfa7cb1e011e4c33ad4db6ae90903b7dbcca9670b8b2bfe77d1c/SparkStream-1.3.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2022-07-25 15:58:52",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "github_user": "HassanRady",
    "github_project": "SparkStream",
    "lcname": "sparkstream"
}
        