# Apache Gravitino Python client
Apache Gravitino is a high-performance, geo-distributed, and federated metadata lake.
It manages metadata directly in different sources, types, and regions, and provides users
with unified metadata access for data and AI assets.
The Gravitino Python client helps data scientists manage metadata easily using Python.
![gravitino-python-client-introduction](https://github.com/apache/gravitino/blob/main/docs/assets/gravitino-python-client-introduction.png?raw=true)
## Use Guidance
You can use the Gravitino Python client library with Spark, PyTorch, TensorFlow, Ray, and plain Python environments.
First, you need a running Gravitino server. Refer to
[How to install Gravitino](https://datastrato.ai/docs/latest/how-to-install/) to build the Gravitino server from source code and
install it locally.
### Apache Gravitino Python client API
```shell
pip install apache-gravitino
```
1. [Manage metalake using Gravitino Python API](https://datastrato.ai/docs/latest/manage-metalake-using-gravitino/?language=python)
2. [Manage fileset metadata using Gravitino Python API](https://datastrato.ai/docs/latest/manage-fileset-metadata-using-gravitino/?language=python)
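
After running `pip install apache-gravitino`, it can be useful to confirm from Python that the distribution is actually visible in the current environment. The following is a minimal sketch that uses only the standard library; the helper name `installed_version` is hypothetical and not part of the Gravitino client.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str = "apache-gravitino") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("apache-gravitino is not installed; run: pip install apache-gravitino")
    else:
        print(f"apache-gravitino {version} is installed")
```
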
### Apache Gravitino Fileset Example
We offer a playground environment to help you quickly understand how to use the Gravitino Python
client to manage non-tabular data on HDFS via Fileset in Gravitino. Refer to the
document [How to use the playground](https://datastrato.ai/docs/latest/how-to-use-the-playground/)
to launch a Gravitino server, HDFS, and a Jupyter notebook environment in your local Docker environment.
Once the playground Docker environment has started, you can open
`http://localhost:18888/lab/tree/gravitino-fileset-example.ipynb` in the browser and run the example.
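
Because the playground can take a while to start, a small polling helper may save some manual refreshing. The sketch below is an assumption-laden illustration using only the standard library: the function names, the retry count, and the interval are hypothetical, and it only checks that the notebook URL answers with a 2xx status, not that Gravitino or HDFS are fully ready.

```python
import http.client
import time
import urllib.error
import urllib.request
from typing import Callable

# URL taken from the playground instructions above.
NOTEBOOK_URL = "http://localhost:18888/lab/tree/gravitino-fileset-example.ipynb"


def http_ok(url: str, timeout_s: float = 3.0) -> bool:
    """Return True if a GET on the URL completes with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeouts, and HTTP errors all mean "not ready yet".
        return False


def wait_until_ready(
    url: str,
    attempts: int = 30,
    interval_s: float = 2.0,
    probe: Callable[[str], bool] = http_ok,
) -> bool:
    """Poll the URL until the probe succeeds or the attempts run out."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            time.sleep(interval_s)
    return False
```

Calling `wait_until_ready(NOTEBOOK_URL)` returns `True` as soon as the notebook page responds, or `False` after all attempts fail. The `probe` parameter exists only so the polling logic can be exercised without a running server.
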
The [gravitino-fileset-example](https://github.com/apache/gravitino-playground/blob/main/init/jupyter/gravitino-fileset-example.ipynb)
contains the following code snippets:
1. Install the HDFS Python client.
2. Create an HDFS client to connect to HDFS and run some test operations.
3. Install the Gravitino Python client.
4. Initialize the Gravitino admin client and create a Gravitino metalake.
5. Initialize the Gravitino client and list metalakes.
6. Create a Gravitino `Catalog` whose `type` is `Catalog.Type.FILESET` and whose `provider` is
   [hadoop](https://datastrato.ai/docs/latest/hadoop-catalog/).
7. Create a Gravitino `Schema` with its `location` pointing to an HDFS path, and use the `hdfs client` to
   check that the schema location was created in HDFS.
8. Create a `Fileset` whose `type` is [Fileset.Type.MANAGED](https://datastrato.ai/docs/latest/manage-fileset-metadata-using-gravitino/#fileset-operations),
   and use the `hdfs client` to check that the fileset location was created in HDFS.
9. Drop this `Fileset.Type.MANAGED` fileset and check that the fileset location was
   deleted in HDFS.
10. Create a `Fileset` whose `type` is [Fileset.Type.EXTERNAL](https://datastrato.ai/docs/latest/manage-fileset-metadata-using-gravitino/#fileset-operations)
    and whose `location` points to an existing HDFS path.
11. Drop this `Fileset.Type.EXTERNAL` fileset and check that the fileset location was
    not deleted in HDFS.
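
The difference between dropping a managed and an external fileset (steps 8 to 11) can be illustrated with a toy model. The sketch below does not use the Gravitino API or HDFS: `ToyFileset`, `create_fileset`, and `drop_fileset` are hypothetical names, and a temporary local directory stands in for HDFS. It only mirrors the behavior the notebook checks, namely that a managed location is created and later deleted, while an external location is left in place.

```python
import shutil
import tempfile
from dataclasses import dataclass
from enum import Enum
from pathlib import Path


class FilesetType(Enum):
    MANAGED = "managed"
    EXTERNAL = "external"


@dataclass
class ToyFileset:
    name: str
    type: FilesetType
    location: Path


def create_fileset(name: str, fileset_type: FilesetType, location: Path) -> ToyFileset:
    # Managed: the storage location is created together with the fileset.
    # External: the location is expected to exist already and is not touched.
    if fileset_type is FilesetType.MANAGED:
        location.mkdir(parents=True, exist_ok=True)
    return ToyFileset(name, fileset_type, location)


def drop_fileset(fileset: ToyFileset) -> None:
    # Only a managed fileset takes its storage location with it.
    if fileset.type is FilesetType.MANAGED:
        shutil.rmtree(fileset.location, ignore_errors=True)


def demo() -> tuple:
    """Return (managed_created, managed_deleted, external_kept)."""
    with tempfile.TemporaryDirectory() as root:
        base = Path(root)

        managed = create_fileset("managed_fs", FilesetType.MANAGED, base / "managed_fs")
        managed_created = managed.location.exists()
        drop_fileset(managed)
        managed_deleted = not managed.location.exists()

        existing = base / "existing_data"
        existing.mkdir()
        external = create_fileset("external_fs", FilesetType.EXTERNAL, existing)
        drop_fileset(external)
        external_kept = existing.exists()

        return managed_created, managed_deleted, external_kept


if __name__ == "__main__":
    print(demo())
```
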
## How to develop the Apache Gravitino Python client
You can use any IDE to develop the Gravitino Python client. Open the client-python module project directly in the IDE.
### Prerequisites
+ Python 3.8+
+ Refer to [How to build Gravitino](https://datastrato.ai/docs/latest/how-to-build/#prerequisites) to prepare the
  necessary build environment.
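
A quick way to check the Python requirement before building is sketched below. The helper `meets_minimum` is a hypothetical name, and the check covers only the interpreter version, not the rest of the build environment.

```python
import sys
from typing import Sequence

# Minimum interpreter version stated in the prerequisites.
MINIMUM_PYTHON = (3, 8)


def meets_minimum(
    version: Sequence[int] = sys.version_info,
    minimum: Sequence[int] = MINIMUM_PYTHON,
) -> bool:
    """Compare the (major, minor) part of a version against the minimum."""
    return tuple(version[:2]) >= tuple(minimum)


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if meets_minimum():
        print(f"Python {current} satisfies the 3.8+ requirement")
    else:
        print(f"Python {current} is too old; 3.8 or newer is required")
```
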
### Build and testing
1. Clone the Gravitino project.
```shell
git clone git@github.com:apache/gravitino.git
```
2. Build the Gravitino Python client module
```shell
./gradlew :clients:client-python:build
```
3. Run unit tests
```shell
./gradlew :clients:client-python:test -PskipITs
```
4. Run integration tests
Because the Python client connects to a Gravitino server to run integration tests,
the build automatically runs `./gradlew compileDistribution -x test` to compile the
Gravitino project into the `distribution` directory. When you run integration tests via the Gradle
command or an IDE, the Gravitino integration test framework (`integration_test_env.py`)
starts and stops the Gravitino server automatically.
```shell
./gradlew :clients:client-python:test
```
5. Distribute the Gravitino Python client module
```shell
./gradlew :clients:client-python:distribution
```
6. Deploy the Gravitino Python client to https://pypi.org/project/apache-gravitino/
```shell
./gradlew :clients:client-python:deploy
```
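
If you prefer to drive the Gradle tasks above from a Python script, for example in a local helper or CI wrapper, the commands can be assembled as shown in this sketch. The function names `gradle_command` and `run_gradle` are hypothetical; only the task names and the `-PskipITs` flag come from the steps above, and the script assumes it runs on a POSIX system from a checkout that contains `./gradlew`.

```python
import subprocess
from typing import List

# Gradle project path of the Python client, as used in the steps above.
CLIENT_PROJECT = ":clients:client-python"


def gradle_command(task: str, skip_integration_tests: bool = False) -> List[str]:
    """Build the argument list for one client-python Gradle task."""
    command = ["./gradlew", f"{CLIENT_PROJECT}:{task}"]
    if skip_integration_tests:
        # Same flag as in the "Run unit tests" step.
        command.append("-PskipITs")
    return command


def run_gradle(task: str, skip_integration_tests: bool = False, repo_root: str = ".") -> int:
    """Run the task from the repository root and return Gradle's exit code."""
    completed = subprocess.run(
        gradle_command(task, skip_integration_tests),
        cwd=repo_root,
        check=False,
    )
    return completed.returncode


if __name__ == "__main__":
    # Print the commands instead of running them, so this is safe anywhere.
    print(" ".join(gradle_command("build")))
    print(" ".join(gradle_command("test", skip_integration_tests=True)))
    print(" ".join(gradle_command("test")))
```
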
## Resources
+ Official website: https://gravitino.apache.org/
+ Project home on GitHub: https://github.com/apache/gravitino/
+ Playground with Docker: https://github.com/apache/gravitino-playground
+ User documentation: https://gravitino.apache.org/docs/
+ Slack Community: [https://the-asf.slack.com#gravitino](https://the-asf.slack.com/archives/C078RESTT19)
## License
Gravitino is licensed under the Apache License, Version 2.0. See the [LICENSE](https://github.com/apache/gravitino/blob/main/LICENSE) for details.
## ASF Incubator disclaimer
Apache Gravitino is an effort undergoing incubation at The Apache Software Foundation (ASF), sponsored by the Apache Incubator.
Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications,
and decision making process have stabilized in a manner consistent with other successful ASF projects.
While incubation status is not necessarily a reflection of the completeness or stability of the code,
it does indicate that the project has yet to be fully endorsed by the ASF.
Raw data
{
"_id": null,
"home_page": "https://github.com/apache/gravitino",
"name": "apache-gravitino",
"maintainer": "Apache Gravitino Community",
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": "dev@gravitino.apache.org",
"keywords": "Data, AI, metadata, catalog",
"author": "Apache Software Foundation",
"author_email": "dev@gravitino.apache.org",
"download_url": "https://files.pythonhosted.org/packages/c3/07/02b256c525ffc6741d258e0de14d9d3c4d28ae308623031c3d831803cec6/apache_gravitino-0.7.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "Apache-2.0",
"summary": "Python lib/client for Apache Gravitino",
"version": "0.7.0",
"project_urls": {
"Bug Tracker": "https://github.com/apache/gravitino/issues",
"Documentation": "https://gravitino.apache.org/docs/overview",
"Homepage": "https://gravitino.apache.org/",
"Slack Chat": "https://the-asf.slack.com/archives/C078RESTT19",
"Source Code": "https://github.com/apache/gravitino"
},
"split_keywords": [
"data",
" ai",
" metadata",
" catalog"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "c30702b256c525ffc6741d258e0de14d9d3c4d28ae308623031c3d831803cec6",
"md5": "858d2f5620ba2d01d5ab0368a50545b4",
"sha256": "49c25667c5394930fc6ae4c4a43a8be887065671fcce227b2fda4cd041d2f695"
},
"downloads": -1,
"filename": "apache_gravitino-0.7.0.tar.gz",
"has_sig": false,
"md5_digest": "858d2f5620ba2d01d5ab0368a50545b4",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 88879,
"upload_time": "2024-11-14T07:37:57",
"upload_time_iso_8601": "2024-11-14T07:37:57.560843Z",
"url": "https://files.pythonhosted.org/packages/c3/07/02b256c525ffc6741d258e0de14d9d3c4d28ae308623031c3d831803cec6/apache_gravitino-0.7.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-11-14 07:37:57",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "apache",
"github_project": "gravitino",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "apache-gravitino"
}
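
The `sha256` digest listed in the metadata above can be used to verify a downloaded copy of `apache_gravitino-0.7.0.tar.gz`. The following is a minimal sketch using only the standard library; the helper names `sha256_of` and `matches_expected` are hypothetical, and the local file path is whatever you saved the archive as.

```python
import hashlib
from pathlib import Path

# sha256 digest of apache_gravitino-0.7.0.tar.gz, copied from the metadata above.
EXPECTED_SHA256 = "49c25667c5394930fc6ae4c4a43a8be887065671fcce227b2fda4cd041d2f695"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """Return True if the file's sha256 digest equals the expected hex string."""
    return sha256_of(path) == expected.lower()
```

For example, `matches_expected(Path("apache_gravitino-0.7.0.tar.gz"))` should be `True` only for an unmodified copy of the 0.7.0 source distribution.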