# 🌀 Tukuy
A flexible data transformation library with a plugin system for Python.
## 🚀 Overview
Tukuy (Quechua for "to transform" or "to become") is an extensible data transformation library for manipulating, validating, and extracting data in various formats. Its plugin architecture gives you a single interface for working with text, HTML, JSON, dates, numbers, and more.
## ✨ Features
- 🧩 **Plugin System**: Easily extend functionality with custom plugins
- 🔄 **Chainable Transformers**: Compose multiple transformations in sequence
- 🧪 **Type-safe Transformations**: Transformers can validate their input before transforming it
- 🔍 **Pattern-based Data Extraction**: Extract structured data from HTML and JSON
- 🛡️ **Error Handling**: Comprehensive error handling with detailed messages
## 📦 Installation
Tukuy requires Python 3.7 or later.
```bash
pip install tukuy
```
## 🛠️ Basic Usage
```python
from tukuy import TukuyTransformer

# Create a transformer
TUKUY = TukuyTransformer()

# Basic text transformation
text = " Hello World! "
result = TUKUY.transform(text, [
    "strip",
    "lowercase",
    {"function": "truncate", "length": 5}
])
print(result)  # "hello..."

# HTML transformation
html = "<div>Hello <b>World</b>!</div>"
result = TUKUY.transform(html, [
    "strip_html_tags",
    "lowercase"
])
print(result)  # "hello world!"

# Date transformation
date_str = "2023-01-01"
age = TUKUY.transform(date_str, [
    {"function": "age_calc"}
])
print(age)  # age in years; the value depends on today's date

# Validation
email = "test@example.com"
valid = TUKUY.transform(email, ["email_validator"])
print(valid)  # "test@example.com", or None if invalid
```
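Each chain above is a list of steps applied in order: a step is either a transformer name or a dict whose `"function"` key names the transformer and whose remaining keys are its parameters. The plain-Python sketch below illustrates that convention with a hypothetical `REGISTRY` and `run_chain`; it is not Tukuy's implementation, and the `truncate` behavior (appending `"..."`) is an assumption inferred from the expected output in the basic example.

```python
from typing import Any, Callable, Dict, List, Union

# Hypothetical registry: step name -> function(value, **params) -> value.
# These lambdas are stand-ins, not Tukuy's built-in transformers.
REGISTRY: Dict[str, Callable[..., Any]] = {
    "strip": lambda v: v.strip(),
    "lowercase": lambda v: v.lower(),
    "truncate": lambda v, length: v if len(v) <= length else v[:length] + "...",
}

Step = Union[str, Dict[str, Any]]


def run_chain(value: Any, steps: List[Step]) -> Any:
    """Apply each step in order, feeding each result into the next step."""
    for step in steps:
        if isinstance(step, str):
            # Bare name: call the function without extra parameters.
            name, params = step, {}
        else:
            # Dict form: "function" names the step, other keys are parameters.
            params = dict(step)  # copy so the caller's dict is not mutated
            name = params.pop("function")
        value = REGISTRY[name](value, **params)
    return value


print(run_chain(" Hello World! ", ["strip", "lowercase", {"function": "truncate", "length": 5}]))
# hello...
```

Because every step takes the previous result as its input, the order of the list matters: truncating before lowercasing would produce `"Hello..."` instead.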
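## 🔍 Pattern-based Extraction
Tukuy can extract structured data from HTML and JSON using declarative patterns: each property names an output field, gives a selector that locates the data, and can list transformations to apply to it.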
### 🌐 HTML Extraction
```python
html = """
<h1>  Welcome  </h1>
<a href="/docs">Docs</a>
<a href="/about">About</a>
"""

pattern = {
    "properties": [
        {
            "name": "title",
            "selector": "h1",
            "transform": ["strip", "lowercase"]
        },
        {
            "name": "links",
            "selector": "a",
            "attribute": "href",
            "type": "array"
        }
    ]
}

data = TUKUY.extract_html_with_pattern(html, pattern)
```
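As a rough illustration of what this pattern asks for, here is a standard-library sketch that evaluates a simplified version of it. It assumes selectors are bare tag names (no CSS combinators), that `"type": "array"` collects every match while other properties keep the first match, and that `"transform"` names come from a tiny hypothetical `STEPS` table. Tukuy itself depends on BeautifulSoup and is not limited to these rules.

```python
from html.parser import HTMLParser
from typing import Any, Dict, List, Optional, Tuple

# Elements that never have a closing tag, so they must not stay "open".
VOID = {"br", "hr", "img", "input", "meta", "link"}
# Hypothetical transform table standing in for Tukuy's named transformers.
STEPS = {"strip": str.strip, "lowercase": str.lower}

Element = Tuple[str, Dict[str, Optional[str]], List[str]]


class _Collector(HTMLParser):
    """Records (tag, attributes, text chunks) for every element in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.elements: List[Element] = []
        self._open: List[int] = []  # indices of elements not yet closed

    def handle_starttag(self, tag, attrs):
        self.elements.append((tag, dict(attrs), []))
        if tag not in VOID:
            self._open.append(len(self.elements) - 1)

    def handle_endtag(self, tag):
        # Close the innermost open element with this name (and anything nested in it).
        for i in range(len(self._open) - 1, -1, -1):
            if self.elements[self._open[i]][0] == tag:
                del self._open[i:]
                break

    def handle_data(self, data):
        # Text counts toward every element that is currently open.
        for idx in self._open:
            self.elements[idx][2].append(data)


def extract(html: str, pattern: Dict[str, Any]) -> Dict[str, Any]:
    parser = _Collector()
    parser.feed(html)
    parser.close()
    result: Dict[str, Any] = {}
    for prop in pattern["properties"]:
        matches = []
        for tag, attrs, text in parser.elements:
            if tag != prop["selector"]:
                continue
            value = attrs.get(prop["attribute"]) if "attribute" in prop else "".join(text)
            for step in prop.get("transform", []):
                value = STEPS[step](value)
            matches.append(value)
        if prop.get("type") == "array":
            result[prop["name"]] = matches
        else:
            result[prop["name"]] = matches[0] if matches else None
    return result


sample_html = '<h1>  Welcome Home </h1><p>See <a href="/a">A</a> and <a href="/b">B</a></p>'
pattern = {
    "properties": [
        {"name": "title", "selector": "h1", "transform": ["strip", "lowercase"]},
        {"name": "links", "selector": "a", "attribute": "href", "type": "array"},
    ]
}
print(extract(sample_html, pattern))  # {'title': 'welcome home', 'links': ['/a', '/b']}
```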
### 📋 JSON Extraction
```python
json_str = '{"data": {"user": {"fullName": "  Ada Lovelace  "}}}'

pattern = {
    "properties": [
        {
            "name": "user",
            "selector": "data.user",
            "properties": [
                {
                    "name": "name",
                    "selector": "fullName",
                    "transform": ["strip"]
                }
            ]
        }
    ]
}

data = TUKUY.extract_json_with_pattern(json_str, pattern)
```
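This pattern nests: the outer `"data.user"` selector is a dotted path, and the inner `"fullName"` selector is resolved relative to whatever the outer one matched. A minimal plain-Python sketch of that resolution, assuming dotted keys only (no array indexing or wildcards) and a hypothetical `STEPS` table for transforms; `select` and `apply_pattern` are illustrative names, not Tukuy functions.

```python
import json
from typing import Any, Dict

# Hypothetical transform table standing in for Tukuy's named transformers.
STEPS = {"strip": str.strip, "lowercase": str.lower}


def select(data: Any, path: str) -> Any:
    """Follow a dotted path such as "data.user"; return None if any key is missing."""
    for key in path.split("."):
        if not isinstance(data, dict) or key not in data:
            return None
        data = data[key]
    return data


def apply_pattern(data: Any, pattern: Dict[str, Any]) -> Dict[str, Any]:
    out: Dict[str, Any] = {}
    for prop in pattern["properties"]:
        value = select(data, prop["selector"])
        if value is not None and "properties" in prop:
            # Nested pattern: inner selectors are relative to the matched value.
            value = apply_pattern(value, prop)
        elif value is not None:
            for step in prop.get("transform", []):
                value = STEPS[step](value)
        out[prop["name"]] = value
    return out


json_str = '{"data": {"user": {"fullName": "  Ada Lovelace  "}}}'
pattern = {
    "properties": [
        {
            "name": "user",
            "selector": "data.user",
            "properties": [{"name": "name", "selector": "fullName", "transform": ["strip"]}],
        }
    ]
}
print(apply_pattern(json.loads(json_str), pattern))  # {'user': {'name': 'Ada Lovelace'}}
```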
## 🚀 Use Cases
Tukuy is designed to handle a wide range of data transformation scenarios:
- 🌐 **Web Scraping**: Extract structured data from HTML pages
- 📊 **Data Cleaning**: Normalize and validate data from various sources
- 🔄 **Format Conversion**: Transform data between different formats
- 📝 **Text Processing**: Apply complex text transformations
- 🔍 **Data Extraction**: Extract specific information from complex structures
- ✅ **Validation**: Ensure data meets specific criteria
## ⚡ Performance Tips
- 🔗 **Chain Transformations**: Use chained transformations to avoid intermediate objects
- 🧩 **Use Built-in Transformers**: Built-in transformers are optimized for performance
- 🔍 **Be Specific with Selectors**: More specific selectors are faster to process
- 🛠️ **Custom Transformers**: For performance-critical operations, create custom transformers
- 📦 **Batch Processing**: Process data in batches for better performance
## 🛡️ Error Handling
Tukuy raises dedicated exception types, so you can handle validation, parsing, and transformation failures separately:
```python
from tukuy import TukuyTransformer
from tukuy.exceptions import ValidationError, TransformationError, ParseError

TUKUY = TukuyTransformer()
data = " Hello World! "
transformations = ["strip", "lowercase"]

try:
    result = TUKUY.transform(data, transformations)
except ValidationError as e:
    print(f"Validation failed: {e}")
except ParseError as e:
    print(f"Parsing failed: {e}")
except TransformationError as e:
    print(f"Transformation failed: {e}")
```
## 🤝 Contributing
Contributions are welcome! Here's how you can help:
1. 🍴 Fork the repository
2. 🌿 Create a feature branch (`git checkout -b feature/amazing-feature`)
3. 💻 Make your changes
4. ✅ Run tests with `pytest`
5. 📝 Update documentation if needed
6. 🔄 Commit your changes (`git commit -m 'Add amazing feature'`)
7. 🚀 Push to the branch (`git push origin feature/amazing-feature`)
8. 🔍 Open a Pull Request
## 🧩 Plugin System Documentation
Tukuy's plugin system is the core of its extensibility. Below is an overview of the built-in plugins and their key transformers.
### 📚 Built-in Plugins
#### 📝 Text Plugin (`text`)
- **Description**: Handles text manipulation and string operations
- **Key Transformers**:
- `strip`: Remove leading/trailing whitespace
- `lowercase`: Convert text to lowercase
- `uppercase`: Convert text to uppercase
- `truncate`: Truncate text to specified length
- `replace`: Replace text patterns
- `regex_replace`: Replace using regular expressions
- `split`: Split text into array
- `join`: Join array into text
- `normalize`: Normalize text (remove diacritics)
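`normalize` is described as removing diacritics. One common way to do that with the standard library is Unicode NFKD decomposition followed by dropping combining marks; the sketch below shows that technique with a hypothetical `remove_diacritics` helper and is not necessarily how Tukuy's `normalize` is written.

```python
import unicodedata


def remove_diacritics(text: str) -> str:
    # NFKD splits characters such as "é" into a base letter plus a combining accent.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks and keep the base letters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(remove_diacritics("Crème brûlée, São Paulo"))  # Creme brulee, Sao Paulo
```

NFKD also applies compatibility mappings (for example, the "ﬁ" ligature becomes "fi"); use NFD instead if those characters should be left alone.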
#### 🌐 HTML Plugin (`html`)
- **Description**: Process and extract data from HTML content
- **Key Transformers**:
- `strip_html_tags`: Remove HTML tags
- `extract_text`: Extract text content
- `select`: Extract content using CSS selectors
- `extract_links`: Get all links from HTML
- `extract_tables`: Extract tables to structured data
- `clean_html`: Sanitize HTML content
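For intuition about what `strip_html_tags` does, here is a standard-library sketch built on `html.parser` that keeps only text nodes. The `strip_tags` helper is hypothetical, not Tukuy's implementation; note that it also keeps the contents of `<script>` and `<style>` elements, which a production version would likely skip.

```python
from html.parser import HTMLParser
from typing import List


class _TextOnly(HTMLParser):
    """Collects text nodes and ignores all tags."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: List[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def strip_tags(html: str) -> str:
    parser = _TextOnly()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.parts)


print(strip_tags("<div>Hello <b>World</b>!</div>"))  # Hello World!
```

With `convert_charrefs` left at its default, entities such as `&amp;` are decoded before `handle_data` receives the text.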
#### 📅 Date Plugin (`date`)
- **Description**: Handle date parsing, formatting, and calculations
- **Key Transformers**:
- `parse_date`: Convert string to date object
- `format_date`: Format date to string
- `age_calc`: Calculate age from date
- `add_days`: Add days to date
- `diff_days`: Calculate days between dates
- `is_weekend`: Check if date is weekend
- `to_timezone`: Convert between timezones
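The basic-usage example suggests `age_calc` returns whole years as an integer. Below is a sketch of the usual whole-years calculation; `age_in_years` and its `today` parameter are hypothetical, and `today` is passed explicitly so the result is reproducible rather than tied to the current date.

```python
from datetime import date
from typing import Optional


def age_in_years(born: date, today: Optional[date] = None) -> int:
    today = today or date.today()
    # Subtract one year if this year's anniversary has not happened yet.
    before_anniversary = (today.month, today.day) < (born.month, born.day)
    return today.year - born.year - int(before_anniversary)


print(age_in_years(date.fromisoformat("2023-01-01"), today=date(2024, 6, 1)))  # 1
```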
#### 🔢 Numerical Plugin (`numerical`)
- **Description**: Mathematical operations and number formatting
- **Key Transformers**:
- `round`: Round number to decimals
- `format_number`: Format with thousand separators
- `to_currency`: Format as currency
- `percentage`: Convert to percentage
- `math_eval`: Evaluate mathematical expressions
- `scale`: Scale number to range
- `statistics`: Calculate basic statistics
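Two of these operations have simple standard-library equivalents. The sketch below assumes `format_number` means comma thousand separators with a fixed number of decimals and `scale` means a linear (min-max) mapping from one range onto another; both functions and their parameters are hypothetical stand-ins, not Tukuy's API.

```python
def format_number(x: float, decimals: int = 2) -> str:
    # The "," format option inserts thousand separators.
    return f"{x:,.{decimals}f}"


def scale(x: float, src_min: float, src_max: float,
          dst_min: float = 0.0, dst_max: float = 1.0) -> float:
    # Linear (min-max) mapping from [src_min, src_max] onto [dst_min, dst_max].
    if src_max == src_min:
        raise ValueError("source range must not be empty")
    ratio = (x - src_min) / (src_max - src_min)
    return dst_min + ratio * (dst_max - dst_min)


print(format_number(1234567.891))  # 1,234,567.89
print(scale(75, 50, 100))          # 0.5
```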
#### ✅ Validation Plugin (`validation`)
- **Description**: Data validation and verification
- **Key Transformers**:
- `email_validator`: Validate email addresses
- `url_validator`: Validate URLs
- `phone_validator`: Validate phone numbers
- `length_validator`: Validate string length
- `range_validator`: Validate number ranges
- `regex_validator`: Validate against regex pattern
- `type_validator`: Validate data types
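The basic-usage example shows validators returning the value when it is valid and `None` otherwise. Here is a sketch of that convention for email, using a hypothetical `email_or_none` helper and a deliberately loose regular expression; Tukuy's actual validation rules are not documented here.

```python
import re
from typing import Optional

# Deliberately simple: one "@", no whitespace, and a dot in the domain part.
# Full RFC 5322 address syntax is far more permissive and complex.
_EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def email_or_none(value: str) -> Optional[str]:
    # Return the input unchanged if it looks like an email, otherwise None.
    return value if _EMAIL_RE.fullmatch(value) else None


print(email_or_none("test@example.com"))  # test@example.com
print(email_or_none("not-an-email"))      # None
```

Returning the original value on success lets a validator sit in the middle of a chain and pass its input through to the next step.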
#### 📋 JSON Plugin (`json`)
- **Description**: JSON manipulation and extraction
- **Key Transformers**:
- `parse_json`: Parse JSON string
- `stringify`: Convert to JSON string
- `extract`: Extract values using JSON path
- `flatten`: Flatten nested JSON
- `merge`: Merge multiple JSON objects
- `validate_schema`: Validate against JSON schema
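`flatten` turns nested JSON into a single-level mapping. A common convention, assumed in this sketch, joins keys with dots and uses list indices as keys; the `flatten` function below is a hypothetical illustration, and Tukuy's separator and list handling may differ.

```python
from typing import Any, Dict


def flatten(obj: Any, prefix: str = "", sep: str = ".") -> Dict[str, Any]:
    """Recursively flatten dicts and lists into {"a.b.0": value} form."""
    if isinstance(obj, dict):
        items = obj.items()
    elif isinstance(obj, list):
        items = ((str(i), v) for i, v in enumerate(obj))
    else:
        # Leaf value: record it under the path built so far.
        return {prefix: obj}
    flat: Dict[str, Any] = {}
    for key, value in items:
        path = f"{prefix}{sep}{key}" if prefix else str(key)
        flat.update(flatten(value, path, sep))
    return flat


print(flatten({"user": {"name": "Ada", "tags": ["math", "code"]}}))
# {'user.name': 'Ada', 'user.tags.0': 'math', 'user.tags.1': 'code'}
```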
### 🔌 Creating Custom Plugins
You can create custom plugins by extending the `TransformerPlugin` class:
```python
from tukuy import TukuyTransformer
from tukuy.plugins import TransformerPlugin
from tukuy.base import ChainableTransformer


class ReverseTransformer(ChainableTransformer[str, str]):
    """Reverses a string."""

    def validate(self, value: str) -> bool:
        # Only strings are accepted as input
        return isinstance(value, str)

    def _transform(self, value: str, context=None) -> str:
        return value[::-1]


class MyPlugin(TransformerPlugin):
    def __init__(self):
        super().__init__("my_plugin")

    @property
    def transformers(self):
        # Maps each transformer name to a factory that builds the transformer
        return {
            'reverse': lambda _: ReverseTransformer('reverse')
        }


# Usage
TUKUY = TukuyTransformer()
TUKUY.register_plugin(MyPlugin())
result = TUKUY.transform("hello", ["reverse"])  # "olleh"
```
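Conceptually, registering a plugin makes its transformer names available to `transform`. The stand-in classes below (`SketchPlugin` and `SketchRegistry`, both hypothetical) show one way such a name-to-factory lookup can work without Tukuy installed. Rejecting duplicate names is a design choice of this sketch; what Tukuy does when two plugins use the same name is not stated here.

```python
from typing import Any, Callable, Dict, List

# A factory receives the step's parameters and returns a str -> str callable.
Factory = Callable[[Any], Callable[[str], str]]


class SketchPlugin:
    """Hypothetical stand-in for a plugin: a name plus named transformer factories."""

    def __init__(self, name: str, transformers: Dict[str, Factory]) -> None:
        self.name = name
        self.transformers = transformers


class SketchRegistry:
    """Hypothetical stand-in for the object that plugins are registered with."""

    def __init__(self) -> None:
        self._factories: Dict[str, Factory] = {}

    def register_plugin(self, plugin: SketchPlugin) -> None:
        for name, factory in plugin.transformers.items():
            if name in self._factories:
                raise ValueError(f"transformer {name!r} is already registered")
            self._factories[name] = factory

    def transform(self, value: str, steps: List[str]) -> str:
        for name in steps:
            # Build the transformer from its factory (no parameters here), then apply it.
            value = self._factories[name](None)(value)
        return value


registry = SketchRegistry()
registry.register_plugin(SketchPlugin("my_plugin", {"reverse": lambda _: lambda s: s[::-1]}))
print(registry.transform("hello", ["reverse"]))  # olleh
```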
### 🔄 Plugin Lifecycle
Plugins can implement `initialize()` and `cleanup()` methods for setup and teardown:
```python
from tukuy.plugins import TransformerPlugin


class MyPlugin(TransformerPlugin):
    def initialize(self) -> None:
        super().initialize()
        # Load resources, connect to databases, etc.

    def cleanup(self) -> None:
        super().cleanup()
        # Close connections, free resources, etc.
```
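If your own code decides when a plugin's resources are acquired and released, pairing `initialize()` with `cleanup()` in `try`/`finally` guarantees teardown even when the work in between raises. This sketch uses a hypothetical `DemoPlugin` that records calls instead of opening real resources, plus a hypothetical `plugin_session` helper; whether and when Tukuy itself calls these methods is not described here.

```python
from contextlib import contextmanager
from typing import Iterator, List


class DemoPlugin:
    """Hypothetical plugin that records lifecycle calls instead of opening resources."""

    def __init__(self) -> None:
        self.events: List[str] = []

    def initialize(self) -> None:
        self.events.append("initialize")

    def cleanup(self) -> None:
        self.events.append("cleanup")


@contextmanager
def plugin_session(plugin: DemoPlugin) -> Iterator[DemoPlugin]:
    plugin.initialize()
    try:
        yield plugin
    finally:
        # Runs even if the body raises, so resources are always released.
        plugin.cleanup()


plugin = DemoPlugin()
with plugin_session(plugin):
    plugin.events.append("work")
print(plugin.events)  # ['initialize', 'work', 'cleanup']
```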