# `hela`: write your data catalog as code
![Unit Tests](https://github.com/erikmunkby/hela/actions/workflows/unit_tests.yaml/badge.svg)
![Spark](https://github.com/erikmunkby/hela/actions/workflows/test_spark.yaml/badge.svg)
![BigQuery](https://github.com/erikmunkby/hela/actions/workflows/test_bigquery.yaml/badge.svg)
![AWS Glue](https://github.com/erikmunkby/hela/actions/workflows/test_aws_glue.yaml/badge.svg)

You probably already have your data job scripts under version control, but what about your data catalog?
The answer: **write your data catalog as code!** Storing your data catalog and data documentation as code makes your catalog searchable, referenceable, reliable, and platform-agnostic, and it sets you up for easy collaboration.
This library is built to fit both small and large data landscapes, but is happiest when included from the start.

`Hela` (or Hel) is the collector of souls in Norse mythology, and also the Swedish word for "whole" or "all of it". `Hela`
is designed to give everyone a chance to build a data catalog, with a low entry barrier: pure Python code.

Links:
* [docs](https://erikmunkby.github.io/hela/)
* [pypi](https://pypi.org/project/hela/)
* [showcase catalog](https://erikmunkby.github.io/hela-showcase/)
## Installing
Using pip:

`pip install hela`

Using poetry:

`poetry add hela`

## Roadmap
These are upcoming features, in no particular order. Contributions towards these milestones are highly appreciated! To read more about contributing, check out `CONTRIBUTING.md`.

* Search functionality in web app
* More integrations (Snowflake, Redshift)
* More feature-rich dataset classes
* Data lineage functionality (both visualized in notebooks and web app)
* Prettier docs page
## (Mega) Quick start
If you want to read more, check out the [docs page](https://erikmunkby.github.io/hela/). If you do not have the patience for that, the following is all you need to get started.

First, build your own dataset class by inheriting from the `BaseDataset` class. This class will hold most of your project-specific functionality, such as read/write and authentication.

```python
# Import paths are assumed here; check the docs if they differ in your version.
from hela import BaseDataset, Col
from hela.data_types import String


class MyDatasetClass(BaseDataset):
    def __init__(
        self,
        name: str,  # Required
        description: str,  # Optional but recommended
        columns: list,  # Optional but recommended
        rich_description_path: str = None,  # Optional, used for web app
        partition_cols: list = None,  # Optional but recommended
        # folder: str = None,  # Only do one of either folder or database
        database: str = None,  # Optional, can also be enriched via Catalog
    ) -> None:
        super().__init__(
            name,
            data_type='bigquery',
            folder=None,
            database=database,
            description=description,
            rich_description_path=rich_description_path,
            partition_cols=partition_cols,
            columns=columns
        )
        # Do more of your own init stuff

    def my_func(self) -> None:
        # Your own dataset function
        pass


# Now instantiate your dataset class with one example column
my_dataset = MyDatasetClass('my_dataset', 'An example dataset.', [
    Col('my_column', String(), 'An example column.')
])
```

Now that you have a dataset class and have instantiated your first dataset, you can start populating your
data catalog.

```python
from hela import Catalog


class MyCatalog(Catalog):
    my_dataset = my_dataset
```

That's it! You now have a small catalog to keep building on. To view it as a web page, add
the following code to a Python script; later on, you can run the same script from whichever CI/CD tool you use.
This will generate an `index.html` file that you can view in your browser or host on, for example, GitHub Pages.

```python
from hela import generate_webpage
generate_webpage(MyCatalog, output_folder='.')
```
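
If you would rather preview the generated page over HTTP than open the file directly, the sketch below serves a folder with Python's standard library. It is not part of `hela`: the helper name `serve_catalog` and its parameters are made up for illustration, and it only assumes that `index.html` was written to the folder you pass in.

```python
import functools
import http.server
import threading
from pathlib import Path


def serve_catalog(folder: str = ".", port: int = 8000) -> http.server.ThreadingHTTPServer:
    """Serve `folder` on localhost in a background thread and return the server.

    Hypothetical helper, not a hela function. Pass port=0 to let the OS
    pick a free port (read it back from server.server_address[1]).
    """
    root = Path(folder).resolve()
    if not (root / "index.html").is_file():
        raise FileNotFoundError(f"No index.html found in {root}")

    # SimpleHTTPRequestHandler accepts a `directory` keyword argument,
    # so the current working directory does not need to change.
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=str(root)
    )
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)

    # Run the request loop in a daemon thread so the caller keeps control.
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

After running `generate_webpage`, calling `server = serve_catalog('.')` should make the catalog available at `http://127.0.0.1:8000`; call `server.shutdown()` followed by `server.server_close()` when you are done.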

To see what a bigger data catalog can look like, check out the [showcase catalog](https://erikmunkby.github.io/hela-showcase/).