# awk_plus_plus
> A language designed for data orchestration.
## Features
* Fuzzy regex engine and semantic search for retrieving information from an in-process database.
* End-user programming.
* Orthogonal persistence based on DuckDB.
* Transparent references with Jsonnet. Executing this feature with Dask is planned.
* URL interpreter to manage data sources.
## Installation from pip
Install the package with:
```bash
pip install awk_plus_plus
```
# CLI Usage
Use the `cti` command to output your data as JSON.
## Web service
The `run-webservice` command starts a web service built with Gradio, so you can execute your expressions through a user-friendly interface or by making HTTP requests.
```bash
cti run-webservice
```
## Jsonnet support
### Hello world
```bash
cti i "Hello world" -p -v 4
```
### Hidden fields and array comprehensions
```bash
cti i '{"keys":: ["AWK", "SED", "SHELL"], "languages": [std.asciiLower(x) for x in self.keys]}'
```
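For readers less familiar with Jsonnet: in the expression above, `keys` is declared with `::`, which marks it as a hidden field. It can be referenced through `self`, but it is omitted from the JSON output. The following is a rough Python equivalent, offered only to illustrate the expected result; it does not use awk_plus_plus or Jsonnet.

```python
import json

# Counterpart of the hidden Jsonnet field ("keys"::): usable while building
# the result, but not part of the emitted JSON.
keys = ["AWK", "SED", "SHELL"]

# Visible field, built with a comprehension like the Jsonnet array comprehension.
result = {"languages": [key.lower() for key in keys]}

print(json.dumps(result))
```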
## URL interpreter
Going a step further, the URL interpreter lets you manage different data sources with a single, uniform syntax across a set of plugins.
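To make the idea concrete, here is a minimal sketch of how a scheme-based dispatcher could work. It is an assumption for illustration, not the actual plugin mechanism of awk_plus_plus: the `HANDLERS` registry, the `register` decorator, and the handler functions are all hypothetical names.

```python
from urllib.parse import urlsplit

# Hypothetical registry mapping a URL scheme to a handler function.
HANDLERS = {}

def register(scheme):
    """Decorator that registers a handler for one URL scheme."""
    def decorator(func):
        HANDLERS[scheme] = func
        return func
    return decorator

@register("sql")
def handle_sql(parts):
    # For "sql:SELECT ..." the statement ends up in the path component.
    return {"plugin": "sql", "statement": parts.path}

@register("stream")
def handle_stream(parts):
    # For "stream://stdin?strip=true" the stream name is the network location.
    return {"plugin": "stream", "name": parts.netloc, "query": parts.query}

@register("")
def handle_file(parts):
    # No scheme: treat the whole string as a file path or glob pattern.
    return {"plugin": "file", "pattern": parts.path}

def interpret(url):
    """Pick a handler based on the scheme of `url`."""
    parts = urlsplit(url)
    handler = HANDLERS.get(parts.scheme)
    if handler is None:
        raise ValueError(f"no plugin registered for scheme {parts.scheme!r}")
    return handler(parts)

print(interpret("sql:SELECT * FROM email"))
print(interpret("stream://stdin?strip=true"))
print(interpret("**/*.csv"))
```

Keeping the dispatch keyed on the scheme means a new data source only needs one more registered handler; the calling expression stays the same.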
## STDIN, STDOUT, STDERR
```bash
cti i '{"lines": interpret("stream://stdin?strip=true")}'
```
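As a rough illustration of what reading a stream into a list of lines might look like, the sketch below assumes that `strip=true` removes surrounding whitespace from each line. That reading of the parameter is an assumption, and `read_lines` is a hypothetical helper, not part of awk_plus_plus.

```python
import io

def read_lines(stream, strip=True):
    """Return the lines of `stream`, optionally stripping surrounding whitespace."""
    lines = []
    for line in stream:
        lines.append(line.strip() if strip else line)
    return lines

# io.StringIO stands in for sys.stdin so the example is reproducible.
sample = io.StringIO("  first line \nsecond line\n")
result = {"lines": read_lines(sample)}
print(result)
```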
## IMAP
```bash
cti i '{"emails": interpret("imap://USER:PASSWORD@HOST:993/INBOX")}'
```
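The URL above packs the connection settings into one string. The sketch below shows how such a URL can be split into its parts with the Python standard library; `parse_imap_url` is a hypothetical helper, and the defaults it applies (port 993, mailbox `INBOX`) are assumptions rather than documented behavior of awk_plus_plus.

```python
from urllib.parse import urlsplit, unquote

def parse_imap_url(url):
    """Split an imap:// URL into connection settings."""
    parts = urlsplit(url)
    if parts.scheme != "imap":
        raise ValueError("expected an imap:// URL")
    return {
        "host": parts.hostname,  # note: urlsplit lowercases the host name
        "port": parts.port or 993,
        # Special characters in credentials must be percent-encoded in the URL.
        "user": unquote(parts.username or ""),
        "password": unquote(parts.password or ""),
        "mailbox": parts.path.lstrip("/") or "INBOX",
    }

settings = parse_imap_url("imap://USER:PASSWORD@HOST:993/INBOX")
print(settings["host"], settings["port"], settings["mailbox"])

# With these values, the standard library could open the mailbox like this
# (not executed here because it needs a real server and credentials):
#   import imaplib
#   with imaplib.IMAP4_SSL(settings["host"], settings["port"]) as conn:
#       conn.login(settings["user"], settings["password"])
#       conn.select(settings["mailbox"])
```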
## Keyring
```bash
cti i '{"email":: interpret("keyring://backend/awk_plus_plus/email"), "emails": interpret($.email)}'
```
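The expression above uses two steps: the hidden `email` field resolves a secret from the keyring, and that secret is itself a URL that is interpreted again. The sketch below imitates this indirection with a plain dictionary in place of a real keyring. `FAKE_KEYRING` and this toy `interpret` are hypothetical; in a real setup, the `keyring` package's `get_password(service, username)` would typically play the lookup role.

```python
# Hypothetical stand-in for the system keyring: maps (service, username)
# to a stored secret. Here the secret is itself a data-source URL.
FAKE_KEYRING = {
    ("awk_plus_plus", "email"): "imap://USER:PASSWORD@HOST:993/INBOX",
}

def interpret(url):
    """Toy interpreter handling only the two schemes used in this example."""
    if url.startswith("keyring://"):
        # Layout: keyring://backend/{service}/{username}
        _backend, service, username = url[len("keyring://"):].split("/", 2)
        return FAKE_KEYRING[(service, username)]
    if url.startswith("imap://"):
        # A real plugin would fetch messages; this returns a placeholder.
        return [f"(messages from {url.rsplit('/', 1)[-1]})"]
    raise ValueError(f"unsupported URL: {url}")

email_url = interpret("keyring://backend/awk_plus_plus/email")  # hidden field
result = {"emails": interpret(email_url)}                        # visible field
print(result)
```

Storing the full URL in the keyring keeps credentials out of the expression and out of shell history.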
## Files
```bash
cti i 'interpret("**/*.csv")'
```
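The `**/*.csv` pattern is a recursive glob: `**` matches zero or more directory levels. The sketch below shows the matching behavior with Python's `glob` module on a temporary directory tree. It only illustrates which paths such a pattern selects; how awk_plus_plus loads the matched files is not shown here, and `find_files` is a hypothetical helper.

```python
import glob
import os
import tempfile

def find_files(pattern, root):
    """Return paths under `root` matching `pattern`, relative to `root`, sorted."""
    matches = glob.glob(os.path.join(root, pattern), recursive=True)
    return sorted(os.path.relpath(path, root) for path in matches)

with tempfile.TemporaryDirectory() as root:
    # Build a small tree: three CSV files at different depths and one text file.
    os.makedirs(os.path.join(root, "data", "2024"))
    for name in ("a.csv",
                 os.path.join("data", "b.csv"),
                 os.path.join("data", "2024", "c.csv"),
                 os.path.join("data", "notes.txt")):
        open(os.path.join(root, name), "w").close()
    found = find_files("**/*.csv", root)

print(found)
```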
## SQL
```bash
cti i 'interpret("sql:SELECT * FROM email")'
```
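The `sql:` form carries the statement directly after the scheme. The sketch below shows the idea of stripping the prefix and running the statement against an in-process database. It uses the standard library's `sqlite3` purely as a stand-in so the example is self-contained; the project itself is based on DuckDB, and `run_sql_url` and the sample `email` table are hypothetical.

```python
import sqlite3

def run_sql_url(connection, url):
    """Execute the statement in a `sql:{expression}` URL and return rows as dicts."""
    prefix = "sql:"
    if not url.startswith(prefix):
        raise ValueError("expected a URL of the form sql:{expression}")
    cursor = connection.execute(url[len(prefix):])
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]

# In-memory database with a sample table (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE email (subject TEXT, sender TEXT)")
conn.execute("INSERT INTO email VALUES ('Hello', 'a@example.com')")
rows = run_sql_url(conn, "sql:SELECT * FROM email")
print(rows)
conn.close()
```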
## Leverage the Power of Reference with Jsonnet
Unlike other programming languages that require multiple steps to reference data, Jsonnet requires only one step, thanks to its reference mechanism.
This is particularly useful for data engineers who want to connect different services in a topological order. The code below represents this scenario in Python:
```python
import requests


def fetch_character(id):
    # Fetch one character record from the public API.
    url = f"https://rickandmortyapi.com/api/character/{id}"
    response = requests.get(url)
    return response.json()


def process_character(character):
    # Append a download hint to the existing 'image' URL
    character['image'] += f"?awk_download=data/{character['name'].replace(' ', '_').lower()}.jpeg"

    # Replace each episode URL with the data fetched from it
    character['episode'] = [requests.get(episode).json() for episode in character['episode']]

    return character


print([process_character(fetch_character(id)) for id in [1, 2, 3, 4, 5, 6]])
```
In contrast to the Python code above, Jsonnet lets you take advantage of referential transparency. The equivalent Jsonnet expression is:
```jsonnet
[
  i("https://rickandmortyapi.com/api/character/%s" % id) +
  {image: i(super.image+"?awk_download=data/"+std.strReplace(std.asciiLower(super.name), " ", "_")+".jpeg")} +
  {episode: [i(episode) for episode in super.episode]}
  for id in [1,2,3,4,5,6]
]
```
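In the Jsonnet expression, each `+` layers an object on top of the previous result, and `super` refers to the fields of the left-hand side. The sketch below imitates that layering in Python without any network access. The `i` function here is a stub returning canned data from placeholder `example.org` URLs, and `extend` is a hypothetical helper; neither reflects how awk_plus_plus is implemented, and what the real `i` does with `?awk_download=` is not modeled.

```python
def i(url):
    """Stub for the interpreter: returns canned data instead of fetching."""
    if url.endswith("/character/1"):
        return {"name": "Rick Sanchez",
                "image": "https://example.org/1.jpeg",
                "episode": ["https://example.org/episode/1"]}
    if "/episode/" in url:
        return {"url": url}
    return url  # anything else is passed through unchanged

def extend(base, **overrides):
    """Mimic Jsonnet's `base + {field: f(super)}`: each override sees `base`."""
    layer = {name: compute(base) for name, compute in overrides.items()}
    return {**base, **layer}

character = i("https://example.org/character/1")
character = extend(character, image=lambda sup: i(
    sup["image"] + "?awk_download=data/"
    + sup["name"].lower().replace(" ", "_") + ".jpeg"))
character = extend(character, episode=lambda sup: [i(e) for e in sup["episode"]])
print(character["image"])
```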
## Connect and call different data sources in one expression
```jsonnet
{
  "emails": i("sql:SELECT subject FROM `%s`" % self.email),
  // This expression saves the unseen emails from your inbox, as defined in your keyring, using IMAP query criteria.
  // It then returns the netloc hash, which refers to the table.
  "email": i(i("keyring://backend/awk_plus_plus/primary_email")+"?q=UNSEEN")
}
```
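Note that `emails` is declared before `email` even though it depends on it. Jsonnet evaluates fields lazily, so the dependency order is resolved on demand rather than by declaration order. The sketch below imitates that behavior in Python with `functools.cached_property`; the `Pipeline` class and the `inbox_table` value are hypothetical and stand in for the real IMAP and SQL steps.

```python
from functools import cached_property

class Pipeline:
    """Fields are computed on first access, so declaration order does not matter."""

    def __init__(self):
        self.calls = []  # records the order in which fields were evaluated

    @cached_property
    def emails(self):
        # Declared first, but depends on `email`, which is evaluated on demand.
        table = self.email
        self.calls.append("emails")
        return f"SELECT subject FROM `{table}`"

    @cached_property
    def email(self):
        self.calls.append("email")
        return "inbox_table"  # hypothetical table name returned by the IMAP step

pipeline = Pipeline()
query = pipeline.emails
print(pipeline.calls)
```

Accessing `emails` first still evaluates `email` before it, which is the topological ordering the expression relies on.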
# Protocols and Plugins
* pop3://
* imap://
* keyring://backend/{service}/{username}
* sql:{expression}
* https://
* file:/
## Note
This project has been set up using [PyScaffold] 4.5 and the [dsproject extension] 0.0.post167+g4386552.
[conda]: https://docs.conda.io/
[pre-commit]: https://pre-commit.com/
[Jupyter]: https://jupyter.org/
[nbstripout]: https://github.com/kynan/nbstripout
[Google style]: http://google.github.io/styleguide/pyguide.html#38-comments-and-docstrings
[PyScaffold]: https://pyscaffold.org/
[dsproject extension]: https://github.com/pyscaffold/pyscaffoldext-dsproject