godspeedio

- Name: godspeedio
- Version: 0.1.2
- Home page: https://github.com/apburton84/godspeed
- Summary: memory efficient, fast, and easy to use stream processing library
- Upload time: 2023-09-06 18:46:29
- Author: Anthony Burton
- License: MIT
- Keywords: stream, processing, memory, efficient, fast, easy, use, library
- Requirements: No requirements were recorded.

# ⚡ Godspeed IO

## Memory Efficient Stream Processor

Welcome to the Godspeed project! 

This project provides a versatile and memory-efficient solution for processing and transforming text streams in Python. 
Whether you're dealing with large text files, real-time data streams, or any scenario where memory is a concern, this tool aims to meet your needs.

## Features

- **Memory Efficiency**: This project prioritizes memory efficiency, making it suitable for processing large text data without consuming excessive memory resources.
- **Stream Processing**: The core functionality revolves around processing text streams. You can read text data line by line or in chunks, avoiding loading the entire content into memory.
- **Flexible Transformation**: The project enables you to define custom transformation functions to process the text data as it streams through the system.
- **Easy-to-Use**: The provided API is designed to be user-friendly, making it accessible for developers of various skill levels.
- **Integration**: The project can seamlessly integrate into various data processing pipelines, ETL workflows, and text analysis applications.

## ⚒️  Installation

You can install the package from the Python Package Index (PyPI) using `pip`:

```bash
pip install godspeedio
```

## 🪧 Usage

- **Custom Transformation**: Define your custom transformation function that takes a line of text as input and returns the transformed line. This function can perform any operation you need, such as text manipulation, data extraction, or filtering.
- **Process Stream**: Use the `godspeed()` context manager to process the text stream. Pass it the input file object; functions decorated with `@processor` are then applied to each line as you iterate.
- **Efficient Processing**: The library processes the text stream line by line, minimizing memory usage. It's suitable for situations where loading the entire text data into memory is not feasible.
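
The line-by-line idea behind these points can be sketched in plain Python with a generator. This is only an illustration of the streaming pattern, not godspeedio's implementation: `stream_lines` and `strip_and_upper` are hypothetical names, and `io.StringIO` stands in for a large file.

```python
import io


def stream_lines(file_obj, transform):
    """Yield transformed lines one at a time (hypothetical helper, not part of godspeedio)."""
    for line in file_obj:  # file objects yield one line per iteration
        yield transform(line)


def strip_and_upper(line):
    """Example transformation: trim surrounding whitespace and upper-case the text."""
    return line.strip().upper() + "\n"


source = io.StringIO("alpha\n  beta  \ngamma\n")  # stands in for a large file
# list() is used only to show the result; a real pipeline would keep iterating lazily.
result = list(stream_lines(source, strip_and_upper))
print(result)
```

Because `stream_lines` is a generator, only the current line is held in memory while the caller iterates.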

## 📣 Example

To illustrate the usage of this library, here's a simple example that reads a text file, ensures each row has an equal number of columns, and makes each row available for further processing:

```python
from godspeedio import godspeed, processor


@processor(order=1)
def ensure_equal_columns(chunk, width=10, sep=","):
    """Ensure that all rows have the same number of columns"""
    chunk = chunk.rstrip("\n")
    missing = width - chunk.count(sep)
    if missing > 0:
        chunk += sep * missing  # pad short rows with extra separators
    return chunk + "\n"  # always restore the newline stripped above


file = open("large_file.csv")
with godspeed(file) as f:
    for chunk in f:
        pass  # Do something with the line (post processing)
```

The main goal of the code is to ensure that all rows in the CSV file have the same number of columns by padding the rows with separators if necessary.

Let's break down the code step by step and explain its functionality:

1. Import Statements:

```python
from godspeedio import godspeed, processor
```

- This line imports two components from the "godspeedio" library: the `godspeed` function and the `processor` decorator.

2. The `@processor` Decorator:

```python
@processor(order=1)
def ensure_equal_columns(chunk, width=10, sep=","):
    """Ensure that all rows have the same number of columns"""
    chunk = chunk.rstrip("\n")
    missing = width - chunk.count(sep)
    if missing > 0:
        chunk += sep * missing  # pad short rows with extra separators
    return chunk + "\n"  # always restore the newline stripped above
```

- We define a transformation function `ensure_equal_columns` and decorate it with `@processor(order=1)`.
- The `order=1` argument sets the position at which this processor is applied relative to other processors; the default is `0`.
- The function takes three parameters:
  - `chunk`: A single line (chunk) read from the CSV file.
  - `width`: The desired width of each row, which the function measures as a count of separators.
  - `sep`: The separator used in the CSV file (default is a comma `,`).
- The function's purpose is to ensure that each line (row) in the CSV file has the same number of columns. It does this by counting the occurrences of the separator in the current chunk. If the count is less than `width`, it pads the chunk with additional separators to make up the difference. Finally, it returns the modified chunk.
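
As a worked example of the padding logic, the sketch below repeats the same steps as a plain function, without the `@processor` decorator, so it runs without the library. The name `pad_row` and the sample rows are made up for illustration.

```python
def pad_row(chunk, width=10, sep=","):
    """Pad a row with separators until it contains `width` of them (illustrative stand-in)."""
    chunk = chunk.rstrip("\n")
    missing = width - chunk.count(sep)
    if missing > 0:
        chunk += sep * missing
    return chunk + "\n"


short_row = pad_row("a,b\n", width=4)       # one separator, three are missing
full_row = pad_row("a,b,c,d,e\n", width=4)  # already has four separators
print(repr(short_row))  # 'a,b,,,\n'
print(repr(full_row))   # 'a,b,c,d,e\n'
```

Note that a row with `width` separators has `width + 1` fields, so the value to pass depends on whether you count separators or columns.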

3. File Handling and Processing:

```python
file = open("large_file.csv")
with godspeed(file) as f:
    for chunk in f:
        pass  # Do something with the line (post processing)
```

- This part of the code demonstrates how to use the `godspeedio` library to process a large CSV file.
- It opens the file named "large_file.csv".
- The `godspeed` function is used as a context manager by passing the file object `file` to it.
- Inside the context, a loop iterates over the chunks (lines) of the file.
- The transformations are applied sequentially to each line.
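
One way to picture sequential application is a list of functions sorted by an order value and applied in turn to each line. The sketch below is an assumption about the general pattern, not the library's internals: `apply_in_order`, the `(order, function)` tuples, and the choice of ascending order are all hypothetical.

```python
import io


def add_suffix(line):
    """Append a marker to the end of the line."""
    return line.rstrip("\n") + "-x\n"


def wrap_in_brackets(line):
    """Wrap the line's content in square brackets."""
    return "[" + line.rstrip("\n") + "]\n"


def apply_in_order(file_obj, steps):
    """Apply (order, function) steps to each line, lowest order first (assumed)."""
    ordered = [func for _, func in sorted(steps, key=lambda step: step[0])]
    for line in file_obj:
        for func in ordered:
            line = func(line)
        yield line


steps = [(1, wrap_in_brackets), (0, add_suffix)]  # deliberately listed out of order
processed = list(apply_in_order(io.StringIO("a\nb\n"), steps))
print(processed)
```

Here the suffix is added before the brackets, so each line ends up as `[a-x]` rather than `[a]-x`, which shows why the order value matters.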

## 📣 Example - state management

This code sample demonstrates how to use the state management functionality provided by the `godspeedio` library. It processes a large CSV file while maintaining and updating state information as each line is read. Let's break down the code and focus on the state management aspect:

```python
from godspeedio import godspeed, processor


@processor(state=True)
def add_relationship(chunk, state):
    # strip the newline first so it never ends up inside the stored id
    chunk = chunk.rstrip("\n")
    # this will be true for the first row
    if chunk[0:2] == "01":
        state.set("parent_id", chunk.split("*")[1])
    return chunk + "*" + state.get("parent_id") + "\n"


file = open("large_file.csv")
with godspeed(file) as f:
    for chunk in f:
        pass  # Do something with the line (post processing)
```

1. Importing Dependencies:

```python
from godspeedio import godspeed, processor
```

- The code imports two names from the `godspeedio` library: the `godspeed` function, used to read the file, and the `processor` decorator, used to define custom processing functions.

2. Defining a Custom Processor Function with State:

```python
@processor(state=True)
def add_relationship(chunk, state):
    # strip the newline first so it never ends up inside the stored id
    chunk = chunk.rstrip("\n")
    # this will be true for the first row
    if chunk[0:2] == "01":
        state.set("parent_id", chunk.split("*")[1])
    return chunk + "*" + state.get("parent_id") + "\n"
```

- The `@processor(state=True)` decorator is used to create a custom processing function called `add_relationship`. The `state=True` argument indicates that this function will use a state object to store and share data between processing iterations.
- Inside this function:
  - It checks if the first two characters of the `chunk` are equal to "01". If this condition is met, it takes the second `*`-separated field of the chunk and stores it in the state object under the key `parent_id` using `state.set()`.
  - It modifies the `chunk` by removing the trailing newline and appending a `*` followed by the `parent_id` value read back with `state.get()`.
  - Finally, it returns the modified `chunk`.
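
The effect on the data can be traced with a small stand-in for the state object. The sketch below does not use godspeedio: `DictState` is a hypothetical class that only mimics the `set()` and `get()` calls seen above, and the `*`-separated sample records are invented.

```python
class DictState:
    """Hypothetical stand-in exposing set()/get() like the state object used above."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


def add_relationship(chunk, state):
    """Same logic as the processor above, written as a plain function."""
    chunk = chunk.rstrip("\n")
    if chunk[0:2] == "01":  # a parent record: remember its id
        state.set("parent_id", chunk.split("*")[1])
    return chunk + "*" + state.get("parent_id") + "\n"


state = DictState()
rows = ["01*P100*header\n", "02*item-a\n", "02*item-b\n"]
linked = [add_relationship(row, state) for row in rows]
print(linked)
```

Every row after the `01` record carries the parent's id, because the value stored on the first row is still in the state object when later rows are processed.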

3. Opening and Reading the CSV File:

```python
file = open("large_file.csv")
with godspeed(file) as f:
    for chunk in f:
        pass  # Do something with the line (post processing)
```

- The script opens a CSV file named "large_file.csv" for reading and assigns it to the `file` variable.
- It then uses a `godspeed` context manager (`with godspeed(file) as f`) to read the file line by line. The `with` statement ensures that the file is properly closed after processing.
- Inside the loop (`for chunk in f:`), each line (or chunk) from the CSV file is processed. However, the loop currently contains a placeholder (`pass`), indicating that the actual post-processing logic needs to be implemented here.
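
As one possible shape for the post-processing step, the loop body could write each processed line to an output stream. In this sketch `processed_lines` is a stand-in for the iterator `f`, and `io.StringIO` stands in for an output file; both are assumptions made so the snippet runs without the library.

```python
import io

# Stand-in for the iterator `f` yielded by the context manager above.
processed_lines = iter(["01*P100*P100\n", "02*item-a*P100\n"])
output = io.StringIO()  # stand-in for a file opened with open(..., "w")

count = 0
for chunk in processed_lines:
    output.write(chunk)  # only the current line is held at a time
    count += 1

written = output.getvalue()
print(count, repr(written))
```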

In summary, this example showcases the `godspeedio` library's state management functionality, which lets you maintain and update shared data (in this case, "parent_id") while processing a large CSV file. The actual post-processing logic should be implemented inside the loop, where each line already carries the state information added by the `add_relationship` function.

## 🙏 Contributions

Contributions to this project are welcome! If you have suggestions, bug reports, or want to add new features, feel free to open issues and pull requests on the GitHub repository.


## ⚖️ License

This project is licensed under the MIT License.



            
