[![](https://img.shields.io/pypi/v/lark-dbml.svg)](https://pypi.org/project/lark-dbml/)
[![](https://img.shields.io/github/v/tag/daihuynh/lark-dbml.svg?label=GitHub)](https://github.com/daihuynh/lark-dbml)
[![codecov](https://codecov.io/gh/daihuynh/lark-dbml/graph/badge.svg?token=YZPWVIS3QA)](https://codecov.io/gh/daihuynh/lark-dbml)
[![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)
[![PyPI Downloads](https://static.pepy.tech/badge/lark-dbml)](https://pepy.tech/projects/lark-dbml)


# Lark-DBML

* [Features](#features)
* [Milestones](#milestones)
* [Installation](#installation)
* [Usage](#usage)
  * [Output Structure](#output-structure)
  * [Parser](#parser)
    * [Load](#load-dbml)
    * [Dump](#dump-dbml)
  * [Converters](#converters)
    * [SQL](#sql)
    * [Data Contract](#data-contract)
* [Development](#development)
* [License](#license)

A Python parser for [Database Markup Language (DBML)](https://dbml.dbdiagram.io) built with the powerful [Lark](https://github.com/lark-parser/lark) parsing toolkit, supporting both the Earley and LALR(1) algorithms for robust and flexible parsing.

## Features

* **High Performance:** `lark-dbml` supports both the Earley and LALR(1) algorithms. According to [Lark's own comparisons](https://github.com/lark-parser/lark), its LALR(1) parser outperforms Parsimonious, PyParsing, and ANTLR.
* **Standalone Mode:** the package does not require the `lark` package by default; the entire parser is packed into a single Python file generated from the EBNF grammar.
* **Full Spec Coverage:** fully supports the [latest DBML specification (April 2025)](https://docs.dbdiagram.io/release-notes/).
* **Pydantic Validation:** Ensures the parsed DBML data conforms to a well-defined structure using Pydantic 2.11, providing reliable data integrity.
* **Structured Output:** Generates Python objects representing your DBML diagram, making it easy to programmatically access and manipulate your database schema.
* **Future-Proof:** the parser accepts any properties or settings that are not defined in the DBML spec.
* **Powerful Conversion & Tooling**:
  * **DBML Round-Trip**: The package supports full round-trip conversion, allowing you to parse DBML, manipulate the Pydantic models programmatically, and then generate the DBML back out.
  * **SQL**: convert the output Pydantic model to SQL with [sqlglot](https://github.com/tobymao/sqlglot).
  * **Data Contract**: Transform your DBML models into [data contract specification](https://datacontract.com).

## Milestones

- [x] DBML Parser - Earley
- [x] SQL Converter
- [x] DBML Converter
- [x] Data Contract Converter
- [x] Optimised DBML Parser - LALR(1)
- [ ] CLI - TBD
- [ ] Generate DBML from a database connection string - TBD

## Installation

You can install lark-dbml using pip:

```bash
pip install lark-dbml
```

To use `lark` mode (when `standalone_mode` is set to `False` in the `load` function), install the `lark` package alongside:
```bash
pip install lark-dbml lark
```

To use the SQL converter:
```bash
pip install "lark-dbml[sql]"
```

## Usage

### Output Structure

`Diagram`, a Pydantic model, defines the expected structure of the parsed DBML content, ensuring consistency and type safety.

```python
class Diagram(BaseModel):
    project: Project
    enums: list[Enum] | None = []
    table_groups: list[TableGroup] | None = []
    sticky_notes: list[Note] | None = []
    references: list[Reference] | None = []
    tables: list[Table] | None = []
    table_partials : list[TablePartial] | None = []
```
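
Once parsed, a diagram can be traversed like any other Pydantic model. A minimal sketch (the file name is illustrative; attribute names follow the schema above and the Dump example below):

```python
from lark_dbml import load

diagram = load("example.dbml")  # illustrative path

# Walk the parsed structure
print(diagram.project.name)
for table in diagram.tables:
    print(table.name, [column.name for column in table.columns])
```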

### Parser

lark-dbml follows the familiar `load`/`loads` API style used by other Python parsing and serialization packages. The default option is `standalone` mode with the LALR(1) algorithm. Besides the default parameters, `load` and `loads` accept any option supported by the Lark parser; the full list is documented in the [Lark constructor](https://github.com/lark-parser/lark/blob/d1a456dd365603bbcb4b5b4ec2c29e6096b82f59/lark/lark.py#L47).

#### Load DBML

```python
from lark_dbml import load, loads

# 1. Read from a string
dbml = """
Project "My Database" {
  database_type: 'PostgreSQL'
  Note: "This is a sample database"
}

Table "users" {
  id int [pk, increment]
  username varchar [unique, not null]
  email varchar [unique]
  created_at timestamp [default: `now()`]
}

Table "posts" {
  id int [pk, increment]
  title varchar
  content text
  user_id int
}

Ref: posts.user_id > users.id
"""

# Default option
diagram = loads(dbml)
# Change to Lark mode
diagram = loads(dbml, standalone_mode=False)
# Switch to Earley algorithm
diagram = loads(dbml, parser="earley")

# 2. Read from a file
diagram = load('example.dbml')
```

The parser can read any settings or properties in DBML objects even if the spec doesn't define them.

```python
diagram = loads("""
Table myTable [newkey: 'random_value'] {
    id int [pk]
}
""")
```

```
>>> diagram.tables[0].settings
TableSettings(note=None, header_color=None, newkey='random_value')
```

#### Dump DBML


```python
from lark_dbml import dump, dumps

from lark_dbml.converter.dbml import DBMLConverterSettings
from lark_dbml.schema import (
    Column,
    ColumnSettings,
    DataType,
    Diagram,
    Table,
    TableSettings
)

diagram = Diagram(
    tables=[
        Table(
            name="body",
            alias="full_table",
            note="Incorporated with header and footer",
            settings=TableSettings(
                headercolor="#3498DB",
                note="header note",
                partitioned_by="id"
            ),
            columns=[
                Column(
                    name="id",
                    data_type=DataType(sql_type="int"),
                    settings=ColumnSettings(
                        is_primary_key=True, note="why is id behind name?"
                    ),
                ),
                Column(
                    name="audit_date",
                    data_type=DataType(sql_type="timestamp"),
                    settings=ColumnSettings(default="`getdate()`"),
                ),
            ],
        )
    ]
)

# This converts the diagram to DBML,
# but partitioned_by will not be included
dumps(diagram)

# This includes partitioned_by in the output
dumps(diagram,
      settings=DBMLConverterSettings(
          allow_extra=True
      )
)

# Write the DBML to file
dump(diagram, 'diagram.dbml')
```
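
Putting `load` and `dumps` together gives the round trip mentioned in the features: parse a DBML file, tweak the Pydantic models, and emit DBML again. A minimal sketch with an illustrative file name:

```python
from lark_dbml import load, dumps

# Parse an existing DBML file
diagram = load("example.dbml")

# Rename the first table in place, then regenerate the DBML text
diagram.tables[0].name = "renamed_table"
print(dumps(diagram))
```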

### Converters

#### SQL

SQL conversion is backed by the **sqlglot** package. The underlying code converts the output Pydantic model into **sqlglot** AST expressions, which makes transpiling to any SQL dialect straightforward.

**Note:** the output SQL is not guaranteed to be perfect or fully functional due to differences between dialects. If you run into a problem, please open an issue on GitHub :)

```python
from lark_dbml import load
from lark_dbml.converter import to_sql
from sqlglot import Dialects

# Load DBML diagram
diagram = load("diagram.dbml")

# Convert to SQL for PostgreSQL
sql = to_sql(diagram, Dialects.POSTGRES)
```
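
If you need another dialect afterwards, **sqlglot** can transpile the generated statements directly. A sketch assuming `to_sql` returns a plain SQL string:

```python
import sqlglot

# Transpile the PostgreSQL DDL produced above to DuckDB
duckdb_statements = sqlglot.transpile(sql, read="postgres", write="duckdb")
print(";\n".join(duckdb_statements))
```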

#### Data Contract

Convert a DBML diagram to a [Data Contract](https://datacontract.com) spec file. Basic usage simply converts tables and columns to the "model" and "definition" sections. However, `lark-dbml` supports settings to extract more information from a DBML file; expand Advanced Usage below for details.

<details>
<summary>Basic example</summary>

```python
from lark_dbml import load
from lark_dbml.converter import to_data_contract

# Load DBML diagram
diagram = load("diagram.dbml")

# Convert to a Data Contract
data_contract = to_data_contract(diagram)
```
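
The returned contract can then be written out as YAML for use with other data-contract tooling. A sketch that assumes `to_data_contract` returns a plain dict (adjust if it returns a Pydantic model in your version):

```python
import yaml  # PyYAML

# Assumption: data_contract is a plain dict
with open("datacontract.yaml", "w") as f:
    yaml.safe_dump(data_contract, f, sort_keys=False)
```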

</details>

<details>
<summary>Advanced Usage</summary>

You can leverage Sticky Notes in DBML to store information about "terms" and "servers" in JSON or YAML format. Then, set `note_as_fields` in the settings to parse that information and include it in the generated contract. Here is an example:

```python
import json
from lark_dbml import load
from lark_dbml.converter import to_data_contract
from lark_dbml.converter.datacontract import DataContractConverterSettings

# complex_datacontract.dbml is inside the examples folder in this repo
diagram = load('examples/complex_datacontract.dbml')

# project_as_info: properties in Project are put into "info"
# note_as_description: note in Table Settings is treated as model description.
# note_as_fields: inline note in Table is parsed and extends the corresponding model's properties.
# deserialization_func: required once note_as_fields is set; the function used to parse the content of an inline note (here, json.loads)

data_contract = to_data_contract(diagram=diagram,
                       settings=DataContractConverterSettings(
                        project_as_info=True,
                        note_as_description=True,
                        note_as_fields=True,
                        deserialization_func=json.loads
                       ))
```
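
For orientation, a sticky note carrying such a payload might look like the following illustrative snippet (this is not the actual contents of `examples/complex_datacontract.dbml`, and the JSON keys are placeholders):

```python
sticky_note_dbml = """
Note server_info {
  '{"servers": {"production": {"type": "postgres", "host": "db.example.com"}}}'
}
"""
```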

</details>

## Development

Contributions are welcome! Please feel free to open issues or submit pull requests.

## License

This project is licensed under the MIT License - see the LICENSE file for details.

            
