dbml-to-fides

Name	dbml-to-fides
Version	1.0.0b1
home_page
Summary	Interoperability for DBML and Fides dataset manifests
upload_time	2023-06-27 13:11:33
maintainer
docs_url	None
author
requires_python	<4,>=3.8
license	Copyright 2023 Ee Durbin Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
keywords	fides dbml
VCS
bugtrack_url
requirements	No requirements were recorded.

# dbml-to-fides

This tool converts [DBML](https://dbml.dbdiagram.io/docs/#project-definition)
to [Fides dataset manifests](https://ethyca.github.io/fideslang/resources/dataset/).

It can optionally merge the result from DBML into an existing
Fides dataset manifest.

Combined, these features can be used in continuous integration to keep
datasets up to date with the latest schema changes.

## Usage

### Basic

Given a sample DBML in `sample.dbml`:

```dbml
Table users {
  id integer [primary key]
  username varchar
  role varchar
  created_at timestamp
}

Table posts {
  id integer [primary key]
  title varchar
  body text [note: 'Content of the post']
  user_id integer
  status post_status
  created_at timestamp
}

Enum post_status {
  draft
  published
  private [note: 'visible via URL only']
}

Ref: posts.user_id > users.id // many-to-one
```

`dbml-to-fides` will output what it can infer from the DBML file as a Fides
dataset:

```sh
$ dbml-to-fides sample.dbml
dataset:
- name: public
  collections:
  - name: users
    description: Users
    fields:
    - name: id
      fides_meta:
        primary_key: true
    - name: username
    - name: role
    - name: created_at
  - name: posts
    description: All the content you crave
    fields:
    - name: id
      fides_meta:
        primary_key: true
    - name: title
    - name: body
      description: Content of the post
    - name: user_id
      fides_meta:
        references:
        - dataset: public
          field: users.id
          direction: to
    - name: status
    - name: created_at
```
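The mapping visible in the output above (columns become fields, primary keys and references land under `fides_meta`, column notes become descriptions) can be sketched in a few lines of Python. This is only an illustration of the shape of the conversion, not the tool's actual implementation: the `TABLES` structure, its key names, and both function names are hypothetical stand-ins for whatever a DBML parser would produce.

```python
from typing import Any, Dict, List

# Hypothetical, simplified stand-in for a parsed DBML schema.
# The real tool's internal data model is not documented here and may differ.
TABLES: List[Dict[str, Any]] = [
    {
        "name": "users",
        "columns": [
            {"name": "id", "primary_key": True},
            {"name": "username"},
        ],
    },
    {
        "name": "posts",
        "columns": [
            {"name": "id", "primary_key": True},
            {"name": "body", "note": "Content of the post"},
            {"name": "user_id", "references": "users.id"},
        ],
    },
]


def column_to_field(column: Dict[str, Any], dataset_name: str) -> Dict[str, Any]:
    """Build one Fides field entry from one (hypothetical) column description."""
    field: Dict[str, Any] = {"name": column["name"]}
    meta: Dict[str, Any] = {}
    if column.get("primary_key"):
        meta["primary_key"] = True
    if "references" in column:
        # A many-to-one Ref becomes an outgoing ("to") reference.
        meta["references"] = [
            {"dataset": dataset_name, "field": column["references"], "direction": "to"}
        ]
    if meta:
        field["fides_meta"] = meta
    if "note" in column:
        field["description"] = column["note"]
    return field


def tables_to_dataset(
    tables: List[Dict[str, Any]], dataset_name: str = "public"
) -> Dict[str, Any]:
    """Wrap all tables as collections of a single dataset named `dataset_name`."""
    collections = [
        {
            "name": table["name"],
            "fields": [column_to_field(c, dataset_name) for c in table["columns"]],
        }
        for table in tables
    ]
    return {"dataset": [{"name": dataset_name, "collections": collections}]}
```

Calling `tables_to_dataset(TABLES)` returns a plain dictionary with the same nesting as the YAML shown above, which could then be serialized with any YAML library.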

### Merging with existing Fides dataset

If you have an existing Fides dataset in `.fides/sample_dataset.yml`:

```yaml
dataset:
- fides_key: sample_dataset
  organization_fides_key: default_organization
  name: public
  description: Sample dataset for my system
  meta: null
  data_categories: []
  data_qualifier: aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified
  retention: 30 days after account deletion
  collections:
  - name: users
    description: User information
    fields:
    - name: id
      fides_meta:
        primary_key: true
      description: User's unique ID
      data_categories:
      - user.unique_id
      data_qualifier: aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified
    - name: username
      description: User's username
      data_categories:
      - user.name
      data_qualifier: aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified
      retention: Account termination
    - name: role
      description: User's system level role/privilege
      data_categories:
      - system.operations
      data_qualifier: aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified
    - name: created_at
      description: User's creation timestamp
      data_categories:
      - system.operations
      data_qualifier: aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified
  - name: posts
    description: Post information
    fields:
    - name: id
      fides_meta:
        primary_key: true
      description: Post's unique ID
      data_categories:
      - system.operations
      data_qualifier: aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified
    - name: title
      description: Post's title
      data_categories:
      - system.operations
      data_qualifier: aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified
    - name: body
      description: Post's body
      data_categories:
      - system.operations
      data_qualifier: aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified
    - name: user_id
      fides_meta:
        references:
        - dataset: public
          field: users.id
          direction: to
      description: Post creator's unique User ID
      data_categories:
      - user.unique_id
      data_qualifier: aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified
    - name: status
      description: Post's status
      data_categories:
      - system.operations
      data_qualifier: aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified
    - name: created_at
      description: Post's creation timestamp
      data_categories:
      - system.operations
      data_qualifier: aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified

```

Pass the `--base-dataset` option to `dbml-to-fides` to merge the result
from DBML into the existing dataset.
In this case the two already agree, so there are no differences:

```sh
$ diff -u .fides/sample_dataset.yml <(dbml-to-fides sample.dbml --base-dataset .fides/sample_dataset.yml)
$
```

If we introduce a change to the DBML:

```diff
@@ -3,6 +3,7 @@ Table users {
   username varchar
   role varchar
   created_at timestamp
+  social_security_number varchar
 }
 
 Table posts {
```

Then running our diff again shows the new field added to the merged Fides dataset:

```shell
$ diff -u .fides/sample_dataset.yml <(dbml-to-fides sample.dbml --base-dataset .fides/sample_dataset.yml)
--- .fides/sample_dataset.yml	2023-05-22 15:39:24
+++ /dev/fd/63	2023-05-22 15:40:07
@@ -34,6 +34,7 @@
       data_categories:
       - system.operations
       data_qualifier: aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified
+    - name: social_security_number
   - name: posts
     description: Post information
     fields:
```
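The behaviour seen in this diff (existing annotations are left alone and the new field is appended to its collection) can be approximated with a name-based merge. The sketch below is an assumption about the general approach, not the tool's code: the function names are hypothetical, and it does not say anything about how `dbml-to-fides` treats fields that were removed from the DBML or values that conflict, since the examples here do not show those cases.

```python
import copy
from typing import Any, Dict, List

Entry = Dict[str, Any]


def merge_fields(base_fields: List[Entry], generated_fields: List[Entry]) -> List[Entry]:
    """Keep every base field as-is; append generated fields whose name is new."""
    known = {field["name"] for field in base_fields}
    merged = copy.deepcopy(base_fields)
    merged.extend(
        copy.deepcopy(field) for field in generated_fields if field["name"] not in known
    )
    return merged


def merge_collections(base: List[Entry], generated: List[Entry]) -> List[Entry]:
    """Merge collections by name, merging their fields with merge_fields."""
    merged = copy.deepcopy(base)
    by_name = {collection["name"]: collection for collection in merged}
    for gen in generated:
        existing = by_name.get(gen["name"])
        if existing is None:
            # A table that is new in the DBML becomes a new collection.
            merged.append(copy.deepcopy(gen))
        else:
            existing["fields"] = merge_fields(
                existing.get("fields", []), gen.get("fields", [])
            )
    return merged


# Example mirroring the diff above: one annotated field, one new column.
BASE = [
    {
        "name": "users",
        "description": "User information",
        "fields": [
            {"name": "created_at", "data_categories": ["system.operations"]},
        ],
    }
]
GENERATED = [
    {
        "name": "users",
        "fields": [{"name": "created_at"}, {"name": "social_security_number"}],
    }
]
MERGED = merge_collections(BASE, GENERATED)
```

In this sketch the base always wins for anything it already describes, which matches the empty diff shown earlier when the DBML and the dataset agree.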

### File output

To write the output to a file,
add the `--output-file` flag:

```shell
$ dbml-to-fides sample.dbml --base-dataset .fides/sample_dataset.yml --output-file .fides/sample_dataset.yml
$ git diff
diff --git a/.fides/sample_dataset.yml b/.fides/sample_dataset.yml
index 594cee4..edc3141 100644
--- a/.fides/sample_dataset.yml
+++ b/.fides/sample_dataset.yml
@@ -34,6 +34,7 @@ dataset:
       data_categories:
       - system.operations
       data_qualifier: aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified
+    - name: social_security_number
   - name: posts
     description: Post information
     fields:
```
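For continuous integration, the same comparison can be turned into a check that fails when the committed manifest has drifted from the schema. The following is a minimal sketch, assuming `dbml-to-fides` is installed on the runner and prints the merged dataset to standard output when `--output-file` is omitted (as in the `diff` examples above). The function names `dataset_diff` and `check` are hypothetical.

```python
import difflib
import subprocess
import sys
from pathlib import Path


def dataset_diff(committed: str, merged: str, path: str = ".fides/sample_dataset.yml") -> str:
    """Return a unified diff between the committed manifest and the merged output."""
    lines = difflib.unified_diff(
        committed.splitlines(keepends=True),
        merged.splitlines(keepends=True),
        fromfile=path,
        tofile=path + " (merged)",
    )
    return "".join(lines)


def check(dbml_path: str, dataset_path: str) -> int:
    """Return 0 when the manifest is current, 1 (after printing a diff) otherwise."""
    # Only flags shown in this README are used; the merged result is read from stdout.
    merged = subprocess.run(
        ["dbml-to-fides", dbml_path, "--base-dataset", dataset_path],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    diff = dataset_diff(Path(dataset_path).read_text(), merged, dataset_path)
    if diff:
        sys.stdout.write(diff)
        return 1
    return 0
```

A CI step could then call `sys.exit(check("sample.dbml", ".fides/sample_dataset.yml"))`, so that a pull request changing the DBML without updating the dataset fails with the missing fields printed in the log.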

### Initial generation

If you do not have an existing Fides dataset, the `--include-fides-keys` flag
creates a more complete skeleton of a
[Fides dataset](https://ethyca.github.io/fideslang/resources/dataset/)
that includes every key, left empty or null where nothing can be inferred.
See the [docs](https://ethyca.github.io/fideslang/resources/dataset/)
for what each field can or should be populated with.

```shell
$ dbml-to-fides sample.dbml --include-fides-keys
dataset:
- fides_key: null
  name: public
  description: null
  organization_fides_key: null
  meta: {}
  third_country_transfers: []
  joint_controller: []
  retention: null
  data_categories: []
  data_qualifiers: []
  collections:
  - name: users
    description: Users
    data_categories: []
    data_qualifiers: []
    retention: null
    fields:
    - name: id
      description: null
      data_categories: []
      data_qualifier: null
      retention: null
      fides_meta:
        primary_key: true
    - name: username
      description: null
      data_categories: []
      data_qualifier: null
      retention: null
    - name: role
      description: null
      data_categories: []
      data_qualifier: null
      retention: null
    - name: created_at
      description: null
      data_categories: []
      data_qualifier: null
      retention: null
    - name: social_security_number
      description: null
      data_categories: []
      data_qualifier: null
      retention: null
  - name: posts
    description: All the content you crave
    data_categories: []
    data_qualifiers: []
    retention: null
    fields:
    - name: id
      description: null
      data_categories: []
      data_qualifier: null
      retention: null
      fides_meta:
        primary_key: true
    - name: title
      description: null
      data_categories: []
      data_qualifier: null
      retention: null
    - name: body
      description: Content of the post
      data_categories: []
      data_qualifier: null
      retention: null
    - name: user_id
      description: null
      data_categories: []
      data_qualifier: null
      retention: null
      fides_meta:
        references:
        - dataset: public
          field: users.id
          direction: to
    - name: status
      description: null
      data_categories: []
      data_qualifier: null
      retention: null
    - name: created_at
      description: null
      data_categories: []
      data_qualifier: null
      retention: null
```
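Because the generated skeleton leaves `data_categories` empty, it can help to list which fields still need annotating. The sketch below is an illustration only: it assumes the manifest has already been loaded into plain dictionaries and lists (for example by a YAML parser), the function name `unannotated_fields` is hypothetical, and nested fields are not handled.

```python
from typing import Any, Dict, List


def unannotated_fields(manifest: Dict[str, Any]) -> List[str]:
    """Return "collection.field" names whose data_categories are missing or empty."""
    missing: List[str] = []
    for dataset in manifest.get("dataset", []):
        for collection in dataset.get("collections", []):
            for field in collection.get("fields", []):
                if not field.get("data_categories"):
                    missing.append(f'{collection["name"]}.{field["name"]}')
    return missing


# Small in-memory example shaped like the manifests in this README.
MANIFEST = {
    "dataset": [
        {
            "name": "public",
            "collections": [
                {
                    "name": "users",
                    "fields": [
                        {"name": "created_at", "data_categories": ["system.operations"]},
                        {"name": "social_security_number", "data_categories": []},
                    ],
                }
            ],
        }
    ]
}
TODO = unannotated_fields(MANIFEST)
```

Here `TODO` contains only `users.social_security_number`, the one field without a data category.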

Raw data

            {
    "_id": null,
    "home_page": "",
    "name": "dbml-to-fides",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "<4,>=3.8",
    "maintainer_email": "",
    "keywords": "fides,dbml",
    "author": "",
    "author_email": "Ee Durbin <ee.opensource@pyfound.org>",
    "download_url": "https://files.pythonhosted.org/packages/3e/eb/314007dcd2fb85e5e86a79e11de46d78cb141135e4d43869d02963872d4d/dbml-to-fides-1.0.0b1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "Copyright 2023 Ee Durbin  Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the \u201cSoftware\u201d), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:  The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.  THE SOFTWARE IS PROVIDED \u201cAS IS\u201d, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. ",
    "summary": "Interoperatbility for DBML and Fides dataset manifests",
    "version": "1.0.0b1",
    "project_urls": {
        "Homepage": "https://github.com/ewdurbin/dbml-to-fides",
        "Source": "https://github.com/ewdurbin/dbml-to-fides"
    },
    "split_keywords": [
        "fides",
        "dbml"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "1f1f2d6e198117cb0480a6dc5aff28b177b6ff6dc09c20935506b310fb807816",
                "md5": "317cb5e9793da976928d6e440a32b139",
                "sha256": "33b32b02d683610f0c3bdd736a9a93a7b12542d01119886fef9ec66641d61ede"
            },
            "downloads": -1,
            "filename": "dbml_to_fides-1.0.0b1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "317cb5e9793da976928d6e440a32b139",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<4,>=3.8",
            "size": 7079,
            "upload_time": "2023-06-27T13:11:32",
            "upload_time_iso_8601": "2023-06-27T13:11:32.247251Z",
            "url": "https://files.pythonhosted.org/packages/1f/1f/2d6e198117cb0480a6dc5aff28b177b6ff6dc09c20935506b310fb807816/dbml_to_fides-1.0.0b1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "3eeb314007dcd2fb85e5e86a79e11de46d78cb141135e4d43869d02963872d4d",
                "md5": "727fff5b80ccbcdd4632a3a74653ff26",
                "sha256": "96763925e957606299a8c0493017e27531524cda28e590d69972467af1210bca"
            },
            "downloads": -1,
            "filename": "dbml-to-fides-1.0.0b1.tar.gz",
            "has_sig": false,
            "md5_digest": "727fff5b80ccbcdd4632a3a74653ff26",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<4,>=3.8",
            "size": 17271,
            "upload_time": "2023-06-27T13:11:33",
            "upload_time_iso_8601": "2023-06-27T13:11:33.605841Z",
            "url": "https://files.pythonhosted.org/packages/3e/eb/314007dcd2fb85e5e86a79e11de46d78cb141135e4d43869d02963872d4d/dbml-to-fides-1.0.0b1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-06-27 13:11:33",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "ewdurbin",
    "github_project": "dbml-to-fides",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "tox": true,
    "lcname": "dbml-to-fides"
}