qurix-kafka-dataframes

Name: qurix-kafka-dataframes
Version: 1.3.1
Summary: dataframe packages
Home page: https://github.com/qurixtechnology/qurix-kafka-dataframes.git
Author: qurix Technology
Requires Python: >=3.10, <4
Keywords: python
Upload time: 2023-10-23 13:37:28
# KafkaDataFrames

# What is it?

qurix-kafka-dataframes is a Python package that provides classes for sending pandas DataFrames to Confluent Kafka in a chosen message format (e.g. JSON or Avro). It simplifies the process of splitting DataFrames into batches and sending them as individual messages to Kafka topics. This README provides an overview of the package and instructions for usage.

# Main Features

Key features of the package include:

- Producing and consuming messages via the Confluent Kafka platform
- DataFrame to JSON/Avro conversion: a `convert()` method converts pandas DataFrames into JSON or Avro format.
- Batch sending: DataFrame data is sent to Kafka in batches, dividing the JSON or Avro payload into smaller chunks for efficient processing (see the sketch after this list).
- Enums: message formats are chosen via enums, which can be used by importing the `message_formats` module.
- Kafka metadata headers: each batch sent to Kafka carries headers such as `dataframe_id`, `dataframe_name`, `batch_num`, and `total_batches`, providing additional context to consumers.
- Logging: logging is set up with the Python `logging` module to facilitate monitoring and error handling during execution.
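
The batching and header mechanism can be pictured roughly as follows. This is a minimal sketch of the general idea using the plain `confluent_kafka` `Producer`, not the package's internal implementation; the chunk size, header values, and the helper function name are illustrative assumptions.

```
import uuid

import pandas as pd
from confluent_kafka import Producer


def send_dataframe_in_batches(df: pd.DataFrame, topic: str, conf: dict,
                              batch_size: int = 1000) -> None:
    """Illustrative helper: split a DataFrame into row chunks and produce one message per chunk."""
    producer = Producer(conf)
    dataframe_id = str(uuid.uuid4())
    batches = [df.iloc[i:i + batch_size] for i in range(0, len(df), batch_size)]
    for batch_num, batch in enumerate(batches):
        payload = batch.to_json(orient='records')  # JSON variant; Avro would go through fastavro instead
        headers = {
            'dataframe_id': dataframe_id,
            'dataframe_name': 'my_dataframe',
            'batch_num': str(batch_num),
            'total_batches': str(len(batches)),
        }
        producer.produce(topic, value=payload.encode('utf-8'), headers=headers)
    producer.flush()  # block until all queued messages are delivered
```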

# Requirements

- `pandas`
- `confluent-kafka`
- `fastavro`

You can install these dependencies manually or use the `requirements.txt` file provided in the repository.
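
For example, to install them manually with `pip`:

`pip install pandas confluent-kafka fastavro`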


# Installation

## Create a new virtual environment
`python -m virtualenv .venv --python="python3.11"`

## Activate
`source .venv/bin/activate`

## Install
To install the `qurix-kafka-dataframes` package, you can use `pip`:

`pip install qurix-kafka-dataframes`

# Usage

Import the `DataFrameProducer` class from the package:
`from qurix.kafka.dataframes.dataframe_producer import DataFrameProducer`

## Example to use Producer
```
from qurix.kafka.dataframes.dataframe_producer import DataFrameProducer
from qurix.kafka.dataframes.message_formats import MessageFormat
import pandas as pd

# Create a DataFrame
data = {'Name': ['John', 'Jane', 'Mike'], 'Age': [25, 30, 35]}
df = pd.DataFrame(data)

# Kafka topic to send the data to
kafka_topic = 'my_topic'

# Configuration for the Kafka producer (replace with your broker address)
kafka_conf = {'bootstrap.servers': 'localhost:9092', 'client.id': 'my_name'}

# Create an instance of DataFrameProducer with the desired message format
producer = DataFrameProducer(kafka_conf, MessageFormat.JSON)

# Send the DataFrame to Kafka
producer.send_dataframe(df, kafka_topic)
```
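
Because each batch carries the metadata headers listed under Main Features, they can also be inspected with a plain `confluent_kafka` consumer. The snippet below is a minimal sketch, assuming a broker at `localhost:9092` and the topic from the producer example; it is not part of the package's API.

```
from confluent_kafka import Consumer

conf = {'bootstrap.servers': 'localhost:9092',
        'group.id': 'header_inspector',
        'auto.offset.reset': 'earliest'}

consumer = Consumer(conf)
consumer.subscribe(['my_topic'])

msg = consumer.poll(timeout=10.0)  # wait up to 10 s for a single message
if msg is not None and msg.error() is None:
    # Headers arrive as a list of (key, bytes) tuples
    for key, value in msg.headers() or []:
        print(key, value.decode('utf-8'))
consumer.close()
```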

## Example to use Consumer

```
from qurix.kafka.dataframes.dataframe_consumer import DataFrameConsumer

# Configuration for the Kafka consumer
conf = {'bootstrap.servers': 'localhost:9092',
        'group.id': 'my_consumer_group',
        'auto.offset.reset': 'earliest'}

# Kafka topic to consume from
kafka_topic = 'my_topic'

# Create a consumer instance
consumer = DataFrameConsumer(conf)

# Consume messages from the topic
result = consumer.consume_dataframes(kafka_topic)

# Get the result: the header DataFrame and the reassembled data DataFrame
header_dataframe = result[0]
dataframe = result[1]
print(dataframe)
```
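
When the producer and consumer examples are run against the same topic in one session, the round trip can be sanity-checked by comparing the consumed DataFrame with the original `df`. This is only a sketch; `check_dtype=False` is used because dtypes may change through serialization.

```
import pandas as pd

# `df` is the DataFrame from the producer example, `dataframe` the consumed result
pd.testing.assert_frame_equal(
    df.reset_index(drop=True),
    dataframe.reset_index(drop=True),
    check_dtype=False,  # serialization may widen or narrow dtypes
)
```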

# Contact
For any inquiries or questions, feel free to reach out.


            
