tukuy


- **Name**: tukuy
- **Version**: 0.0.4
- **Home page**: https://github.com/jhd3197/tukuy
- **Summary**: A flexible data transformation library with a plugin system
- **Upload time**: 2025-09-08 21:38:18
- **Author**: Juan Denis
- **Requires Python**: >=3.7
- **Requirements**: beautifulsoup4 (>=4.9.3), pytest (>=7.0.0), python-slugify (>=5.0.2), html5lib

# 🌀 Tukuy

A flexible data transformation library with a plugin system for Python.

## 🚀 Overview

Tukuy (meaning "to transform" or "to become" in Quechua) is a powerful and extensible data transformation library that makes it easy to manipulate, validate, and extract data from various formats. With its plugin architecture, Tukuy provides a unified interface for working with text, HTML, JSON, dates, numbers, and more.

## ✨ Features

- 🧩 **Plugin System**: Easily extend functionality with custom plugins
- 🔄 **Chainable Transformers**: Compose multiple transformations in sequence
- 🧪 **Type-safe Transformations**: With built-in validation
- 🔍 **Pattern-based Data Extraction**: Extract structured data from HTML and JSON
- 🛡️ **Error Handling**: Comprehensive error handling with detailed messages

## 📦 Installation

```bash
pip install tukuy
```

## 🛠️ Basic Usage

```python
from tukuy import TukuyTransformer

# Create transformer
TUKUY = TukuyTransformer()

# Basic text transformation
text = " Hello World! "
result = TUKUY.transform(text, [
    "strip",
    "lowercase",
    {"function": "truncate", "length": 5}
])
print(result)  # "hello..."

# HTML transformation
html = "<div>Hello <b>World</b>!</div>"
result = TUKUY.transform(html, [
    "strip_html_tags",
    "lowercase"
])
print(result)  # "hello world!"

# Date transformation
date_str = "2023-01-01"
age = TUKUY.transform(date_str, [
    {"function": "age_calc"}
])
print(age)  # e.g. 1 when run during 2024; the result depends on the current date

# Validation
email = "test@example.com"
valid = TUKUY.transform(email, ["email_validator"])
print(valid)  # "test@example.com" or None if invalid
```
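
To make the shape of the transformation list concrete, here is a minimal, library-free sketch of how a list mixing plain names and `{"function": ..., ...}` dicts could be dispatched in order. The `REGISTRY` dict, the `apply_chain` helper, and the truncation rule (cut to `length` characters, then append `...`) are assumptions made for illustration, not Tukuy's actual implementation.

```python
from typing import Any, Callable, Dict, List, Union

# Hypothetical registry of stand-in functions (not Tukuy's built-in transformers).
REGISTRY: Dict[str, Callable[..., Any]] = {
    "strip": lambda value: value.strip(),
    "lowercase": lambda value: value.lower(),
    # Assumed behaviour: keep `length` characters and append a suffix when cutting.
    "truncate": lambda value, length, suffix="...": (
        value if len(value) <= length else value[:length] + suffix
    ),
}

Step = Union[str, Dict[str, Any]]


def apply_chain(value: Any, steps: List[Step]) -> Any:
    """Apply each step in order, feeding every result into the next step."""
    for step in steps:
        if isinstance(step, str):
            name, params = step, {}
        else:
            params = dict(step)  # copy so the caller's dict is left untouched
            name = params.pop("function")
        value = REGISTRY[name](value, **params)
    return value


result = apply_chain(
    " Hello World! ",
    ["strip", "lowercase", {"function": "truncate", "length": 5}],
)
print(result)  # hello...
```

A string step is treated as a function with no parameters, while a dict step supplies keyword arguments alongside its `function` name.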

## 🔍 Pattern-based Extraction

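Tukuy can extract structured data from HTML and JSON using declarative patterns. Each property in a pattern has a name and a selector, and may also list transformations to apply to the selected value.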

### 🌐 HTML Extraction

```python
pattern = {
    "properties": [
        {
            "name": "title",
            "selector": "h1",
            "transform": ["strip", "lowercase"]
        },
        {
            "name": "links",
            "selector": "a",
            "attribute": "href",
            "type": "array"
        }
    ]
}

data = TUKUY.extract_html_with_pattern(html, pattern)
```
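
As a rough illustration of what the pattern above describes, the following standard-library sketch collects the text of the first `<h1>` (stripped and lowercased) and the `href` of every `<a>`. It does not use Tukuy. The class name, the sample HTML, the "first match unless `type` is `array`" reading, and the dict-shaped result are all assumptions.

```python
from html.parser import HTMLParser
from typing import Dict, List


class TitleAndLinks(HTMLParser):
    """Collect the first <h1> text and every <a href> value."""

    def __init__(self) -> None:
        super().__init__()
        self.title_parts: List[str] = []
        self.links: List[str] = []
        self._in_h1 = False
        self._h1_done = False

    def handle_starttag(self, tag, attrs):
        if tag == "h1" and not self._h1_done:
            self._in_h1 = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "h1" and self._in_h1:
            self._in_h1 = False
            self._h1_done = True  # ignore any later <h1> elements

    def handle_data(self, data):
        if self._in_h1:
            self.title_parts.append(data)


def extract_title_and_links(html: str) -> Dict[str, object]:
    parser = TitleAndLinks()
    parser.feed(html)
    parser.close()
    # Mirror the "strip" and "lowercase" transforms from the pattern.
    title = "".join(parser.title_parts).strip().lower()
    return {"title": title, "links": parser.links}


sample = '<h1> My Page </h1><p><a href="/a">A</a> <a href="https://example.com/b">B</a></p>'
data = extract_title_and_links(sample)
print(data)  # {'title': 'my page', 'links': ['/a', 'https://example.com/b']}
```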

### 📋 JSON Extraction

```python
pattern = {
    "properties": [
        {
            "name": "user",
            "selector": "data.user",
            "properties": [
                {
                    "name": "name",
                    "selector": "fullName",
                    "transform": ["strip"]
                }
            ]
        }
    ]
}

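# json_str must hold a JSON document as a string; this sample value is illustrative only
json_str = '{"data": {"user": {"fullName": " Ada Lovelace "}}}'
data = TUKUY.extract_json_with_pattern(json_str, pattern)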
```
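
The nested pattern can be read as "follow a dotted path, then resolve child properties relative to the selected node". The sketch below implements that reading in plain Python. The helpers `resolve` and `extract_with_pattern`, the sample document, and the handling of only the `strip` transform are assumptions for illustration, not Tukuy's own behaviour.

```python
import json
from typing import Any, Dict, List


def resolve(node: Any, selector: str) -> Any:
    """Follow a dotted selector such as "data.user" through nested dicts."""
    for key in selector.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def extract_with_pattern(node: Any, properties: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Build a dict keyed by property name, recursing into nested properties."""
    out: Dict[str, Any] = {}
    for prop in properties:
        value = resolve(node, prop["selector"])
        if "properties" in prop:
            # Child selectors are resolved relative to the selected node.
            value = extract_with_pattern(value, prop["properties"])
        elif isinstance(value, str) and "strip" in prop.get("transform", []):
            value = value.strip()  # only "strip" is handled in this sketch
        out[prop["name"]] = value
    return out


sample_json = '{"data": {"user": {"fullName": "  Ada Lovelace "}}}'
sample_pattern = {
    "properties": [
        {
            "name": "user",
            "selector": "data.user",
            "properties": [
                {"name": "name", "selector": "fullName", "transform": ["strip"]}
            ],
        }
    ]
}

extracted = extract_with_pattern(json.loads(sample_json), sample_pattern["properties"])
print(extracted)  # {'user': {'name': 'Ada Lovelace'}}
```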

## 🚀 Use Cases

Tukuy is designed to handle a wide range of data transformation scenarios:

- 🌐 **Web Scraping**: Extract structured data from HTML pages
- 📊 **Data Cleaning**: Normalize and validate data from various sources
- 🔄 **Format Conversion**: Transform data between different formats
- 📝 **Text Processing**: Apply complex text transformations
- 🔍 **Data Extraction**: Extract specific information from complex structures
- ✅ **Validation**: Ensure data meets specific criteria

## ⚡ Performance Tips

- 🔗 **Chain Transformations**: Use chained transformations to avoid intermediate objects
- 🧩 **Use Built-in Transformers**: Built-in transformers are optimized for performance
- 🔍 **Be Specific with Selectors**: More specific selectors are faster to process
- 🛠️ **Custom Transformers**: For performance-critical operations, create custom transformers
- 📦 **Batch Processing**: Process data in batches for better performance
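
One way to read the batch-processing tip is to group inputs into fixed-size chunks and transform each chunk in turn. The sketch below shows that pattern with the standard library only. `batched` and `transform_in_batches` are hypothetical helpers, the lambda stands in for a call such as `TUKUY.transform`, and any speed-up depends on the workload.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def transform_in_batches(
    items: Iterable[T], transform: Callable[[T], R], size: int = 100
) -> Iterator[List[R]]:
    """Apply `transform` to every item, one batch at a time."""
    for batch in batched(items, size):
        yield [transform(item) for item in batch]


cleaned = list(
    transform_in_batches([" A ", " B ", " C "], lambda s: s.strip().lower(), size=2)
)
print(cleaned)  # [['a', 'b'], ['c']]
```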

## 🛡️ Error Handling

Tukuy provides comprehensive error handling with detailed error messages:

```python
from tukuy.exceptions import ValidationError, TransformationError, ParseError

try:
    result = TUKUY.transform(data, transformations)
except ValidationError as e:
    print(f"Validation failed: {e}")
except ParseError as e:
    print(f"Parsing failed: {e}")
except TransformationError as e:
    print(f"Transformation failed: {e}")
```
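
The order of the `except` clauses matters whenever the specific exceptions derive from a more general one. This README does not say how Tukuy's exception classes are related, so the sketch below defines hypothetical stand-in classes with the same names, purely to show why the general handler belongs last.

```python
# Hypothetical stand-ins; the real classes live in tukuy.exceptions and
# their inheritance is not documented in this README.
class TransformationError(Exception):
    pass


class ValidationError(TransformationError):
    pass


class ParseError(TransformationError):
    pass


def classify(exc: Exception) -> str:
    """Return which handler catches `exc` when specific handlers come first."""
    try:
        raise exc
    except ValidationError:
        return "validation"
    except ParseError:
        return "parse"
    except TransformationError:
        return "transformation"


print(classify(ValidationError("bad email")))      # validation
print(classify(ParseError("bad JSON")))            # parse
print(classify(TransformationError("generic")))    # transformation
```

If the general handler came first in this hierarchy, it would catch every case and the specific handlers would never run.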

## 🤝 Contributing

Contributions are welcome! Here's how you can help:

1. 🍴 Fork the repository
2. 🌿 Create a feature branch (`git checkout -b feature/amazing-feature`)
3. 💻 Make your changes
4. ✅ Run tests with `pytest`
5. 📝 Update documentation if needed
6. 🔄 Commit your changes (`git commit -m 'Add amazing feature'`)
7. 🚀 Push to the branch (`git push origin feature/amazing-feature`)
8. 🔍 Open a Pull Request

## 🧩 Plugin System Documentation

Tukuy's plugin system is the core of its extensibility. Below is a comprehensive list of all available plugins and their features.

### 📚 Built-in Plugins

#### 📝 Text Plugin (`text`)
- **Description**: Handles text manipulation and string operations
- **Key Transformers**:
  - `strip`: Remove leading/trailing whitespace
  - `lowercase`: Convert text to lowercase
  - `uppercase`: Convert text to uppercase
  - `truncate`: Truncate text to specified length
  - `replace`: Replace text patterns
  - `regex_replace`: Replace using regular expressions
  - `split`: Split text into array
  - `join`: Join array into text
  - `normalize`: Normalize text (remove diacritics)
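
For a sense of what "remove diacritics" can mean in practice, here is a common standard-library approach: decompose each character, then drop the combining marks. Whether Tukuy's `normalize` works exactly this way is not stated here, and `remove_diacritics` is a hypothetical helper name.

```python
import unicodedata


def remove_diacritics(text: str) -> str:
    """Decompose characters (NFD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(remove_diacritics("Crème brûlée"))  # Creme brulee
```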

#### 🌐 HTML Plugin (`html`)
- **Description**: Process and extract data from HTML content
- **Key Transformers**:
  - `strip_html_tags`: Remove HTML tags
  - `extract_text`: Extract text content
  - `select`: Extract content using CSS selectors
  - `extract_links`: Get all links from HTML
  - `extract_tables`: Extract tables to structured data
  - `clean_html`: Sanitize HTML content

#### 📅 Date Plugin (`date`)
- **Description**: Handle date parsing, formatting, and calculations
- **Key Transformers**:
  - `parse_date`: Convert string to date object
  - `format_date`: Format date to string
  - `age_calc`: Calculate age from date
  - `add_days`: Add days to date
  - `diff_days`: Calculate days between dates
  - `is_weekend`: Check if date is weekend
  - `to_timezone`: Convert between timezones
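
An age calculation depends on a reference date, which is why its result changes over time. The sketch below shows one conventional way to compute an age in whole years. It is an illustration under that assumption, not Tukuy's `age_calc` implementation, and `age_in_years` is a hypothetical helper.

```python
from datetime import date
from typing import Optional


def age_in_years(born: date, today: Optional[date] = None) -> int:
    """Whole years elapsed between `born` and `today` (defaults to the current date)."""
    today = today or date.today()
    had_birthday = (today.month, today.day) >= (born.month, born.day)
    return today.year - born.year - (0 if had_birthday else 1)


print(age_in_years(date(2023, 1, 1), today=date(2024, 6, 1)))  # 1
```

Passing `today` explicitly keeps the result reproducible in tests.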

#### 🔢 Numerical Plugin (`numerical`)
- **Description**: Mathematical operations and number formatting
- **Key Transformers**:
  - `round`: Round number to decimals
  - `format_number`: Format with thousand separators
  - `to_currency`: Format as currency
  - `percentage`: Convert to percentage
  - `math_eval`: Evaluate mathematical expressions
  - `scale`: Scale number to range
  - `statistics`: Calculate basic statistics

#### ✅ Validation Plugin (`validation`)
- **Description**: Data validation and verification
- **Key Transformers**:
  - `email_validator`: Validate email addresses
  - `url_validator`: Validate URLs
  - `phone_validator`: Validate phone numbers
  - `length_validator`: Validate string length
  - `range_validator`: Validate number ranges
  - `regex_validator`: Validate against regex pattern
  - `type_validator`: Validate data types

#### 📋 JSON Plugin (`json`)
- **Description**: JSON manipulation and extraction
- **Key Transformers**:
  - `parse_json`: Parse JSON string
  - `stringify`: Convert to JSON string
  - `extract`: Extract values using JSON path
  - `flatten`: Flatten nested JSON
  - `merge`: Merge multiple JSON objects
  - `validate_schema`: Validate against JSON schema
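
Flattening nested JSON usually means turning nested keys into a single level of path-like keys. The following sketch shows one such scheme using dot-separated keys. The separator, the treatment of lists and empty dicts as leaf values, and the helper name `flatten_dict` are assumptions, and Tukuy's `flatten` may behave differently.

```python
from typing import Any, Dict


def flatten_dict(obj: Dict[str, Any], parent: str = "", sep: str = ".") -> Dict[str, Any]:
    """Flatten nested dicts into one level, joining keys with `sep`."""
    flat: Dict[str, Any] = {}
    for key, value in obj.items():
        path = f"{parent}{sep}{key}" if parent else str(key)
        if isinstance(value, dict) and value:
            flat.update(flatten_dict(value, path, sep))
        else:
            flat[path] = value  # leaves, lists, and empty dicts are kept as-is
    return flat


nested = {"user": {"name": "Ada", "address": {"city": "London"}}, "active": True}
print(flatten_dict(nested))
# {'user.name': 'Ada', 'user.address.city': 'London', 'active': True}
```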

### 🔌 Creating Custom Plugins

You can create custom plugins by extending the `TransformerPlugin` class:

```python
from tukuy import TukuyTransformer
from tukuy.plugins import TransformerPlugin
from tukuy.base import ChainableTransformer

class ReverseTransformer(ChainableTransformer[str, str]):
    def validate(self, value: str) -> bool:
        return isinstance(value, str)
    
    def _transform(self, value: str, context=None) -> str:
        return value[::-1]

class MyPlugin(TransformerPlugin):
    def __init__(self):
        super().__init__("my_plugin")
    
    @property
    def transformers(self):
        return {
            'reverse': lambda _: ReverseTransformer('reverse')
        }

# Usage
TUKUY = TukuyTransformer()
TUKUY.register_plugin(MyPlugin())

result = TUKUY.transform("hello", ["reverse"])  # "olleh"
```
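
The `transformers` property above maps each name to a factory that returns a transformer. The library-free sketch below shows how a registry could merge such mappings from plugins and call the factories at transform time. `SketchPlugin`, `SketchRegistry`, and the convention of passing a parameter dict to each factory are hypothetical and are not Tukuy's internals.

```python
from typing import Any, Callable, Dict, List

Transformer = Callable[[Any], Any]
Factory = Callable[[Dict[str, Any]], Transformer]


class SketchPlugin:
    """Hypothetical stand-in for a plugin: a name plus a mapping of factories."""

    def __init__(self, name: str, transformers: Dict[str, Factory]) -> None:
        self.name = name
        self.transformers = transformers


class SketchRegistry:
    """Hypothetical registry that merges factories from registered plugins."""

    def __init__(self) -> None:
        self._factories: Dict[str, Factory] = {}

    def register_plugin(self, plugin: SketchPlugin) -> None:
        for name, factory in plugin.transformers.items():
            if name in self._factories:
                raise ValueError(f"transformer {name!r} is already registered")
            self._factories[name] = factory

    def transform(self, value: Any, names: List[str]) -> Any:
        for name in names:
            transformer = self._factories[name]({})  # build with empty params
            value = transformer(value)
        return value


registry = SketchRegistry()
registry.register_plugin(
    SketchPlugin("my_plugin", {"reverse": lambda _params: (lambda v: v[::-1])})
)
print(registry.transform("hello", ["reverse"]))  # olleh
```

Rejecting duplicate names is a design choice in this sketch. A real registry might instead let later plugins override earlier ones.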

### 🔄 Plugin Lifecycle

Plugins can implement `initialize()` and `cleanup()` methods for setup and teardown:

```python
from tukuy.plugins import TransformerPlugin

class MyPlugin(TransformerPlugin):
    def initialize(self) -> None:
        super().initialize()
        # Load resources, connect to databases, etc.
    
    def cleanup(self) -> None:
        super().cleanup()
        # Close connections, free resources, etc.
```
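
This README does not say when these hooks are called. If you ever need to drive them yourself, a context manager is one way to guarantee that `cleanup()` runs even when the work in between raises. The sketch below uses a hypothetical `DemoPlugin` stand-in and a hypothetical `plugin_session` helper rather than Tukuy's classes.

```python
from contextlib import contextmanager
from typing import Iterator, List


class DemoPlugin:
    """Hypothetical stand-in that records lifecycle calls."""

    def __init__(self) -> None:
        self.events: List[str] = []

    def initialize(self) -> None:
        self.events.append("initialize")

    def cleanup(self) -> None:
        self.events.append("cleanup")


@contextmanager
def plugin_session(plugin: DemoPlugin) -> Iterator[DemoPlugin]:
    """Initialize the plugin, then always clean it up on exit."""
    plugin.initialize()
    try:
        yield plugin
    finally:
        plugin.cleanup()


plugin = DemoPlugin()
with plugin_session(plugin):
    plugin.events.append("work")
print(plugin.events)  # ['initialize', 'work', 'cleanup']
```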

            
