apache-gravitino


+ Name: apache-gravitino
+ Version: 0.7.0
+ Home page: https://github.com/apache/gravitino
+ Summary: Python lib/client for Apache Gravitino
+ Upload time: 2024-11-14 07:37:57
+ Maintainer: Apache Gravitino Community (dev@gravitino.apache.org)
+ Author: Apache Software Foundation (dev@gravitino.apache.org)
+ Requires Python: >=3.8
+ License: Apache-2.0
+ Keywords: data, ai, metadata, catalog

# Apache Gravitino Python client

Apache Gravitino is a high-performance, geo-distributed, and federated metadata lake.
It manages metadata directly across different sources, types, and regions, and provides users
with unified metadata access for data and AI assets.

The Gravitino Python client helps data scientists easily manage metadata using Python.

![gravitino-python-client-introduction](https://github.com/apache/gravitino/blob/main/docs/assets/gravitino-python-client-introduction.png?raw=true)

## Usage Guidance

You can use the Gravitino Python client library with Spark, PyTorch, TensorFlow, Ray, and plain Python environments.

First, you must have a Gravitino server set up and running. Refer to
[How to install Gravitino](https://datastrato.ai/docs/latest/how-to-install/) to build the Gravitino server from source code and
install it locally.
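Before going further, you may want to confirm that the server is reachable. The sketch below uses only the Python standard library and assumes the server listens on its default address, `http://localhost:8090`, and exposes a `GET /api/version` REST endpoint; adjust both if your deployment differs. The helpers `version_url` and `server_version` are hypothetical names for illustration, not part of the Gravitino client.

```python
import json
import urllib.request
from typing import Optional

# Assumption: default server address and a GET /api/version endpoint.
DEFAULT_URI = "http://localhost:8090"


def version_url(uri: str = DEFAULT_URI) -> str:
    """Build the (assumed) version endpoint URL from a server base URI."""
    return uri.rstrip("/") + "/api/version"


def server_version(uri: str = DEFAULT_URI, timeout: float = 2.0) -> Optional[dict]:
    """Return the decoded JSON from the version endpoint, or None if unreachable."""
    try:
        with urllib.request.urlopen(version_url(uri), timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except (OSError, ValueError):
        # URLError/HTTPError/timeouts are OSError subclasses;
        # a non-JSON or badly encoded body raises a ValueError subclass.
        return None


if __name__ == "__main__":
    info = server_version()
    if info is None:
        print("Gravitino server is not reachable")
    else:
        print("Gravitino server responded:", info)
```

Returning `None` instead of raising keeps the check usable as a simple guard at the top of a notebook.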

### Apache Gravitino Python client API

```shell
pip install apache-gravitino
```

1. [Manage metalakes using the Gravitino Python API](https://datastrato.ai/docs/latest/manage-metalake-using-gravitino/?language=python)
2. [Manage fileset metadata using the Gravitino Python API](https://datastrato.ai/docs/latest/manage-fileset-metadata-using-gravitino/?language=python)
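To confirm which version of the client `pip` installed, a minimal sketch can use the standard-library `importlib.metadata` module (available since Python 3.8, the client's minimum supported version) to look up the distribution by its PyPI name, `apache-gravitino`. The `installed_version` helper is a hypothetical name for illustration only.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "apache-gravitino") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(f"apache-gravitino {found}" if found else "apache-gravitino is not installed")
```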

### Apache Gravitino Fileset Example

We offer a playground environment to help you quickly learn how to use the Gravitino Python
client to manage non-tabular data on HDFS through Gravitino filesets. Refer to the
document [How to use the playground](https://datastrato.ai/docs/latest/how-to-use-the-playground/)
to launch a Gravitino server, HDFS, and a Jupyter notebook environment in your local Docker environment.

Once the playground Docker environment has started, open
`http://localhost:18888/lab/tree/gravitino-fileset-example.ipynb` in your browser and run the example.

The [gravitino-fileset-example](https://github.com/apache/gravitino-playground/blob/main/init/jupyter/gravitino-fileset-example.ipynb)
contains the following code snippets:

1. Install the HDFS Python client.
2. Create an HDFS client to connect to HDFS and run some test operations.
3. Install the Gravitino Python client.
4. Initialize the Gravitino admin client and create a Gravitino metalake.
5. Initialize the Gravitino client and list metalakes.
6. Create a Gravitino `Catalog` with `type` set to `Catalog.Type.FILESET` and `provider` set to
   [hadoop](https://datastrato.ai/docs/latest/hadoop-catalog/).
7. Create a Gravitino `Schema` with `location` pointing to an HDFS path, and use the HDFS client to
   check that the schema location was created in HDFS.
8. Create a `Fileset` of type [Fileset.Type.MANAGED](https://datastrato.ai/docs/latest/manage-fileset-metadata-using-gravitino/#fileset-operations),
   and use the HDFS client to check that the fileset location was created in HDFS.
9. Drop this `Fileset.Type.MANAGED` fileset and check that the fileset location was
   deleted from HDFS.
10. Create a `Fileset` of type [Fileset.Type.EXTERNAL](https://datastrato.ai/docs/latest/manage-fileset-metadata-using-gravitino/#fileset-operations)
    with `location` pointing to an existing HDFS path.
11. Drop this `Fileset.Type.EXTERNAL` fileset and check that the fileset location was
    not deleted from HDFS.
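Steps 8 to 11 hinge on the difference between the two fileset types: dropping a `MANAGED` fileset also deletes its storage location, while dropping an `EXTERNAL` fileset leaves the existing data in place. The toy model below illustrates only those lifecycle semantics, with an in-memory set of paths standing in for HDFS. `ToyFilesetCatalog` and `FilesetType` are hypothetical names, not the Gravitino Python API; the notebook and the linked fileset guide show the real calls.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Set, Tuple


class FilesetType(Enum):
    """Hypothetical stand-in for the two fileset types; not the Gravitino API."""

    MANAGED = "managed"
    EXTERNAL = "external"


@dataclass
class ToyFilesetCatalog:
    """Toy model of fileset lifecycles over an in-memory set of 'HDFS' paths."""

    storage: Set[str] = field(default_factory=set)
    filesets: Dict[str, Tuple[FilesetType, str]] = field(default_factory=dict)

    def create_fileset(self, name: str, ftype: FilesetType, location: str) -> None:
        if ftype is FilesetType.MANAGED:
            # Managed: creating the fileset also creates its storage location.
            self.storage.add(location)
        # External: the fileset just points at a path that already exists.
        self.filesets[name] = (ftype, location)

    def drop_fileset(self, name: str) -> None:
        ftype, location = self.filesets.pop(name)
        if ftype is FilesetType.MANAGED:
            # Managed: dropping the fileset also deletes the underlying data.
            self.storage.discard(location)
        # External: only the metadata entry is removed; the data stays.


if __name__ == "__main__":
    catalog = ToyFilesetCatalog(storage={"/data/existing"})

    catalog.create_fileset("managed_fs", FilesetType.MANAGED, "/schema/managed_fs")
    catalog.drop_fileset("managed_fs")
    print("/schema/managed_fs" in catalog.storage)  # False: location deleted

    catalog.create_fileset("external_fs", FilesetType.EXTERNAL, "/data/existing")
    catalog.drop_fileset("external_fs")
    print("/data/existing" in catalog.storage)  # True: location kept
```

The model keeps metadata (`filesets`) and data (`storage`) separate on purpose, since that separation is exactly what the two fileset types treat differently.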

## How to develop the Apache Gravitino Python Client

You can use any IDE to develop the Gravitino Python client. Open the `clients/client-python` module directly in the IDE.

### Prerequisites

+ Python 3.8+
+ Refer to [How to build Gravitino](https://datastrato.ai/docs/latest/how-to-build/#prerequisites) to prepare the
  necessary build environment.

### Build and test

1. Clone the Gravitino project.

    ```shell
    git clone git@github.com:apache/gravitino.git
    ```

2. Build the Gravitino Python client module

    ```shell
    ./gradlew :clients:client-python:build
    ```

3. Run unit tests

    ```shell
    ./gradlew :clients:client-python:test -PskipITs
    ```

4. Run integration tests

   Because the Python client's integration tests connect to a Gravitino server, the build
   automatically runs `./gradlew compileDistribution -x test` to compile the Gravitino project
   into the `distribution` directory. When you run integration tests via Gradle or an IDE,
   the Gravitino integration test framework (`integration_test_env.py`) starts and stops
   the Gravitino server automatically.

    ```shell
    ./gradlew :clients:client-python:test
    ```

5. Build a distribution package of the Gravitino Python client module

    ```shell
    ./gradlew :clients:client-python:distribution
    ```

6. Deploy the Gravitino Python client to https://pypi.org/project/apache-gravitino/

    ```shell
    ./gradlew :clients:client-python:deploy
    ```

## Resources

+ Official website: https://gravitino.apache.org/
+ Project home on GitHub: https://github.com/apache/gravitino/
+ Playground with Docker: https://github.com/apache/gravitino-playground
+ User documentation: https://gravitino.apache.org/docs/
+ Slack Community: [https://the-asf.slack.com#gravitino](https://the-asf.slack.com/archives/C078RESTT19)

## License

Gravitino is licensed under the Apache License, Version 2.0. See the [LICENSE](https://github.com/apache/gravitino/blob/main/LICENSE) for details.

## ASF Incubator disclaimer

Apache Gravitino is an effort undergoing incubation at The Apache Software Foundation (ASF), sponsored by the Apache Incubator. 
Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, 
and decision making process have stabilized in a manner consistent with other successful ASF projects. 
While incubation status is not necessarily a reflection of the completeness or stability of the code, 
it does indicate that the project has yet to be fully endorsed by the ASF.

            
