recap-core


Name: recap-core
Version: 0.12.0
home_page: https://recap.build
Summary: Recap reads and writes schemas from web services, databases, and schema registries in a standard format
upload_time: 2024-02-29 23:25:57
requires_python: <=3.12,>=3.10
license: MIT
keywords: avro data data catalog data discovery data engineering data governance data infrastructure data integration data pipelines data quality devops devtools etl infrastructure json schema metadata protobuf

<div align="center">
  <img src="https://github.com/recap-build/recap/blob/main/static/recap-logo.png?raw=true" alt="recap">
</div>

## What is Recap?

Recap reads and writes schemas from web services, databases, and schema registries in a standard format.

⭐️ _If you like this project, please give it a star! It helps the project get more visibility._

## Table of Contents

* [What is Recap?](#what-is-recap)
* [Supported Formats](#supported-formats)
* [Install](#install)
* [Usage](#usage)
   * [CLI](#cli)
   * [Gateway](#gateway)
   * [Registry](#registry)
   * [API](#api)
   * [Docker](#docker)
* [Schema](#schema)
* [Documentation](#documentation)

## Supported Formats

| Format      | Read | Write |
| :---------- | :-: | :-: |
| [Avro](https://recap.build/docs/integrations/avro/) | ✅ | ✅ |
| [BigQuery](https://recap.build/docs/integrations/bigquery/) | ✅ |  |
| [Confluent Schema Registry](https://recap.build/docs/integrations/confluent-schema-registry/) | ✅ |  |
| [Hive Metastore](https://recap.build/docs/integrations/hive-metastore/) | ✅ |  |
| [JSON Schema](https://recap.build/docs/integrations/json-schema/) | ✅ | ✅ |
| [MySQL](https://recap.build/docs/integrations/mysql/) | ✅ |  |
| [PostgreSQL](https://recap.build/docs/integrations/postgresql/) | ✅ |  |
| [Protobuf](https://recap.build/docs/integrations/protobuf/) | ✅ | ✅ |
| [Snowflake](https://recap.build/docs/integrations/snowflake/) | ✅ |  |
| [SQLite](https://recap.build/docs/integrations/sqlite/) | ✅ |  |

## Install

Install Recap and all of its optional dependencies:

```bash
pip install 'recap-core[all]'
```

You can also select specific dependencies:

```bash
pip install 'recap-core[avro,kafka]'
```

See `pyproject.toml` for a list of optional dependencies.

## Usage

### CLI

Recap comes with a command-line interface that can list and read schemas from external systems.

List the children of a URL:

```bash
recap ls postgresql://user:pass@host:port/testdb
```

```json
[
  "pg_toast",
  "pg_catalog",
  "public",
  "information_schema"
]
```

Keep drilling down:

```bash
recap ls postgresql://user:pass@host:port/testdb/public
```

```json
[
  "test_types"
]
```

Read the schema for the `test_types` table as a Recap struct:

```bash
recap schema postgresql://user:pass@host:port/testdb/public/test_types
```

```json
{
  "type": "struct",
  "fields": [
    {
      "type": "int64",
      "name": "test_bigint",
      "optional": true
    }
  ]
}
```
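
The JSON printed by `recap schema` can be consumed by any JSON parser. The following is a minimal sketch that uses only the Python standard library; the `describe_fields` helper is hypothetical (it is not part of Recap) and assumes the struct has the shape shown above.

```python
import json

# Output of `recap schema ...`, copied from the example above.
raw = """
{
  "type": "struct",
  "fields": [
    {"type": "int64", "name": "test_bigint", "optional": true}
  ]
}
"""


def describe_fields(struct: dict) -> list[str]:
    """Return one 'name: type' string per field; optional fields get a '?' suffix."""
    lines = []
    for field in struct.get("fields", []):
        suffix = "?" if field.get("optional", False) else ""
        lines.append(f"{field['name']}: {field['type']}{suffix}")
    return lines


print(describe_fields(json.loads(raw)))  # ['test_bigint: int64?']
```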

### Gateway

Recap comes with a stateless HTTP/JSON gateway that can list and read schemas from data catalogs and databases.

Start the server at [http://localhost:8000](http://localhost:8000):

```bash
recap serve
```

List the schemas in a PostgreSQL database:

```bash
curl http://localhost:8000/gateway/ls/postgresql://user:pass@host:port/testdb
```

```json
["pg_toast","pg_catalog","public","information_schema"]
```

And read a schema:

```bash
curl http://localhost:8000/gateway/schema/postgresql://user:pass@host:port/testdb/public/test_types
```

```json
{"type":"struct","fields":[{"type":"int64","name":"test_bigint","optional":true}]}
```

The gateway fetches schemas from external systems in real time and returns them as Recap schemas. Because the gateway is stateless, every request queries the external system again.

An OpenAPI schema is available at [http://localhost:8000/docs](http://localhost:8000/docs).
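
The same gateway calls can be made from Python. This is a rough sketch using only the standard library; `gateway_url` and `fetch` are hypothetical helpers, and the URL layout is inferred from the `curl` examples above rather than taken from the OpenAPI schema.

```python
import json
from urllib.request import urlopen

# Assumes `recap serve` is running locally on the default port.
GATEWAY = "http://localhost:8000/gateway"


def gateway_url(action: str, system_url: str, base: str = GATEWAY) -> str:
    """Build a gateway URL by appending the system URL to /gateway/<action>/."""
    if action not in ("ls", "schema"):
        raise ValueError(f"unsupported action: {action}")
    return f"{base}/{action}/{system_url}"


def fetch(action: str, system_url: str):
    """Call the gateway and decode the JSON body (needs a running server)."""
    with urlopen(gateway_url(action, system_url)) as resp:
        return json.load(resp)


print(gateway_url("ls", "postgresql://user:pass@host:port/testdb"))
```

Calling `fetch` only works against a live gateway and a real connection URL; `host:port` in the examples is a placeholder.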

### Registry

You can store schemas in Recap's schema registry.

Start the server at [http://localhost:8000](http://localhost:8000):

```bash
recap serve
```

Put a schema in the registry:

```bash
curl -X POST \
    -H "Content-Type: application/x-recap+json" \
    -d '{"type":"struct","fields":[{"type":"int64","name":"test_bigint","optional":true}]}' \
    http://localhost:8000/registry/some_schema
```

Get the schema (and version) from the registry:

```bash
curl http://localhost:8000/registry/some_schema
```

```json
[{"type":"struct","fields":[{"type":"int64","name":"test_bigint","optional":true}]},1]
```

Put a new version of the schema in the registry:

```bash
curl -X POST \
    -H "Content-Type: application/x-recap+json" \
    -d '{"type":"struct","fields":[{"type":"int32","name":"test_int","optional":true}]}' \
    http://localhost:8000/registry/some_schema
```

List schema versions:

```bash
curl http://localhost:8000/registry/some_schema/versions
```

```json
[1,2]
```

Get a specific version of the schema:

```bash
curl http://localhost:8000/registry/some_schema/versions/1
```

```json
[{"type":"struct","fields":[{"type":"int64","name":"test_bigint","optional":true}]},1]
```

The registry uses [fsspec](https://filesystem-spec.readthedocs.io/en/latest/) to store schemas in a variety of filesystems like S3, GCS, ABS, and the local filesystem. See the [registry](https://recap.build/docs/registry/) docs for more details.

An OpenAPI schema is available at [http://localhost:8000/docs](http://localhost:8000/docs).
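
In the examples above, a registry read returns a two-element JSON array holding the schema and its version. Here is a small sketch of decoding those replies in Python; `parse_registry_reply` is a hypothetical helper, and it assumes every reply has the `[schema, version]` shape shown in the examples.

```python
import json


def parse_registry_reply(body: str) -> tuple[dict, int]:
    """Split a registry reply of the form [schema, version] into its two parts."""
    schema, version = json.loads(body)
    return schema, version


# Response bodies copied from the curl examples above.
reply = '[{"type":"struct","fields":[{"type":"int64","name":"test_bigint","optional":true}]},1]'
versions_reply = "[1,2]"

schema, version = parse_registry_reply(reply)
latest = max(json.loads(versions_reply))

print(version, schema["fields"][0]["name"], latest)  # 1 test_bigint 2
```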

### API

Recap has `recap.converters` and `recap.clients` packages.

- Converters convert schemas to and from Recap schemas.
- Clients read schemas from external systems (databases, schema registries, and so on) and use converters to return Recap schemas.

Read a schema from PostgreSQL:

```python
from recap.clients import create_client

with create_client("postgresql://user:pass@host:port/testdb") as c:
    # Keep the returned Recap struct so it can be passed to a converter.
    struct = c.schema("testdb", "public", "test_types")
```

Convert the schema to Avro, Protobuf, and JSON schemas:

```python
from recap.converters.avro import AvroConverter
from recap.converters.protobuf import ProtobufConverter
from recap.converters.json_schema import JSONSchemaConverter

# `struct` is a Recap struct, such as the one returned by a client's `schema()` call.
avro_schema = AvroConverter().from_recap(struct)
protobuf_schema = ProtobufConverter().from_recap(struct)
json_schema = JSONSchemaConverter().from_recap(struct)
```

Transpile schemas from one format to another:

```python
from recap.converters.json_schema import JSONSchemaConverter
from recap.converters.avro import AvroConverter

json_schema = """
{
    "type": "object",
    "$id": "https://recap.build/person.schema.json",
    "properties": {
        "name": {"type": "string"}
    }
}
"""

# Use Recap as an intermediate format to convert JSON schema to Avro
struct = JSONSchemaConverter().to_recap(json_schema)
avro_schema = AvroConverter().from_recap(struct)
```
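
To illustrate why an intermediate format helps, here is a toy hub-and-spoke conversion written in plain Python. It is not Recap's converter code: the type tables, function names, and output shapes are simplified assumptions made for this sketch. The point is that each format needs only one mapping to the hub and one from it, instead of a separate converter for every pair of formats.

```python
# Toy hub-and-spoke conversion. NOT Recap's converters, only the idea behind them.
# Both lookup tables are illustrative assumptions covering a few primitive types.
JSON_TO_HUB = {"string": "string", "integer": "int64", "boolean": "bool"}
HUB_TO_AVRO = {"string": "string", "int64": "long", "bool": "boolean"}


def json_schema_to_hub(schema: dict) -> dict:
    """Map a flat JSON Schema object to a struct-like intermediate dict."""
    required = set(schema.get("required", []))
    return {
        "type": "struct",
        "fields": [
            {
                "name": name,
                "type": JSON_TO_HUB[prop["type"]],
                "optional": name not in required,
            }
            for name, prop in schema.get("properties", {}).items()
        ],
    }


def hub_to_avro(struct: dict, record_name: str) -> dict:
    """Map the intermediate dict to an Avro record; optional fields become null unions."""
    fields = []
    for f in struct["fields"]:
        avro_type = HUB_TO_AVRO[f["type"]]
        if f["optional"]:
            fields.append({"name": f["name"], "type": ["null", avro_type], "default": None})
        else:
            fields.append({"name": f["name"], "type": avro_type})
    return {"type": "record", "name": record_name, "fields": fields}


person = {"type": "object", "properties": {"name": {"type": "string"}}}
avro = hub_to_avro(json_schema_to_hub(person), "Person")
print(avro)
```

The schemas that Recap's own converters produce may differ from this toy output.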

Store schemas in Recap's schema registry:

```python
from recap.storage.registry import RegistryStorage
from recap.types import StructType, IntType

storage = RegistryStorage("file:///tmp/recap-registry-storage")
version = storage.put(
    "postgresql://localhost:5432/testdb/public/test_table",
    StructType(fields=[IntType(32)])
)
storage.get("postgresql://localhost:5432/testdb/public/test_table")

# Get all versions of a schema
versions = storage.versions("postgresql://localhost:5432/testdb/public/test_table")

# List all schemas in the registry
schemas = storage.ls()
```

### Docker

Recap's gateway and registry are also available as a Docker image:

```bash
docker run \
    -p 8000:8000 \
    -e RECAP_URLS='["postgresql://user:pass@localhost:5432/testdb"]' \
    ghcr.io/recap-build/recap:latest
```

See [Recap's Docker documentation](https://recap.build/docs/gateway/docker) for more details.
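
The `RECAP_URLS` value contains brackets and double quotes, so it has to be quoted for the shell. The sketch below builds a shell-safe assignment with the Python standard library; `recap_urls_env` is a hypothetical helper, and it assumes `RECAP_URLS` takes a JSON array of URLs, as the example above suggests.

```python
import json
import shlex


def recap_urls_env(urls: list[str]) -> str:
    """Return a shell-safe `RECAP_URLS=...` assignment holding a JSON array."""
    value = json.dumps(urls)  # JSON array text; contains [ ] and " characters
    return "RECAP_URLS=" + shlex.quote(value)


print(recap_urls_env(["postgresql://user:pass@localhost:5432/testdb"]))
# RECAP_URLS='["postgresql://user:pass@localhost:5432/testdb"]'
```

The printed string can be passed after `-e` in a `docker run` command.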

## Schema

See [Recap's type spec](https://recap.build/specs/type) for details on Recap's type system.

## Documentation

Recap's documentation is available at [recap.build](https://recap.build).

            

Raw data

            {
    "_id": null,
    "home_page": "https://recap.build",
    "name": "recap-core",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "<=3.12,>=3.10",
    "maintainer_email": "",
    "keywords": "avro data data catalog data discovery data engineering data governance data infrastructure data integration data pipelines data quality devops devtools etl infrastructure json schema metadata protobuf",
    "author": "",
    "author_email": "Chris Riccomini <criccomini@apache.org>",
    "download_url": "https://files.pythonhosted.org/packages/d8/5a/da78ccabcb29cf2ffcec9043de06e8009b4fd33621b3b5c8d5f73a9c9ece/recap_core-0.12.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Recap reads and writes schemas from web services, databases, and schema registries in a standard format",
    "version": "0.12.0",
    "project_urls": {
        "Documentation": "https://recap.build",
        "Homepage": "https://recap.build",
        "Repository": "https://github.com/recap-cloud/recap"
    },
    "split_keywords": [
        "avro",
        "data",
        "data",
        "catalog",
        "data",
        "discovery",
        "data",
        "engineering",
        "data",
        "governance",
        "data",
        "infrastructure",
        "data",
        "integration",
        "data",
        "pipelines",
        "data",
        "quality",
        "devops",
        "devtools",
        "etl",
        "infrastructure",
        "json",
        "schema",
        "metadata",
        "protobuf"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "4ac7a5b98da5e52a8473bd534f6e7088e80298e502b824fa568ab37477dea9a1",
                "md5": "6f6e8fb35f90ec3da94354c969aed838",
                "sha256": "04692f9eb655e924e7f891cf0744489244f55fe12bc9b40278b3792a4cc05ebe"
            },
            "downloads": -1,
            "filename": "recap_core-0.12.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "6f6e8fb35f90ec3da94354c969aed838",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<=3.12,>=3.10",
            "size": 50842,
            "upload_time": "2024-02-29T23:25:55",
            "upload_time_iso_8601": "2024-02-29T23:25:55.424936Z",
            "url": "https://files.pythonhosted.org/packages/4a/c7/a5b98da5e52a8473bd534f6e7088e80298e502b824fa568ab37477dea9a1/recap_core-0.12.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "d85ada78ccabcb29cf2ffcec9043de06e8009b4fd33621b3b5c8d5f73a9c9ece",
                "md5": "19fb21b76d30901fd2bb1921dc810844",
                "sha256": "85b1a3793e2e09598ee3e716d49d88be5531a8787905d3aec52b204347e0e5cf"
            },
            "downloads": -1,
            "filename": "recap_core-0.12.0.tar.gz",
            "has_sig": false,
            "md5_digest": "19fb21b76d30901fd2bb1921dc810844",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<=3.12,>=3.10",
            "size": 79867,
            "upload_time": "2024-02-29T23:25:57",
            "upload_time_iso_8601": "2024-02-29T23:25:57.798825Z",
            "url": "https://files.pythonhosted.org/packages/d8/5a/da78ccabcb29cf2ffcec9043de06e8009b4fd33621b3b5c8d5f73a9c9ece/recap_core-0.12.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-02-29 23:25:57",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "recap-cloud",
    "github_project": "recap",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "recap-core"
}
        