#####
Skeem
#####

|

.. start-badges

|ci-tests| |ci-coverage| |license| |pypi-downloads|
|python-versions| |status| |pypi-version|

.. |ci-tests| image:: https://github.com/daq-tools/skeem/actions/workflows/tests.yml/badge.svg
    :target: https://github.com/daq-tools/skeem/actions/workflows/tests.yml

.. |ci-coverage| image:: https://codecov.io/gh/daq-tools/skeem/branch/main/graph/badge.svg
    :target: https://codecov.io/gh/daq-tools/skeem
    :alt: Test suite code coverage

.. |python-versions| image:: https://img.shields.io/pypi/pyversions/skeem.svg
    :target: https://pypi.org/project/skeem/

.. |status| image:: https://img.shields.io/pypi/status/skeem.svg
    :target: https://pypi.org/project/skeem/

.. |pypi-version| image:: https://img.shields.io/pypi/v/skeem.svg
    :target: https://pypi.org/project/skeem/

.. |pypi-downloads| image:: https://static.pepy.tech/badge/skeem/month
    :target: https://pypi.org/project/skeem/

.. |license| image:: https://img.shields.io/pypi/l/skeem.svg
    :target: https://github.com/daq-tools/skeem/blob/main/LICENSE

.. end-badges


*****
About
*****

Skeem infers SQL DDL statements from tabular data.

Skeem is based on, amongst others, the excellent `ddlgenerator`_,
`frictionless`_, `fsspec`_, `pandas`_, `SciPy`_, `SQLAlchemy`_,
and `xarray`_ packages, and can be used both as a standalone program
and as a library.

Supported input data:

- `Apache Parquet`_
- `CSV`_
- `Google Sheets`_
- `GRIB`_
- `InfluxDB line protocol`_
- `JSON`_
- `NetCDF`_
- `NDJSON`_ (formerly LDJSON), a.k.a. `JSON Lines`_; see also `JSON streaming`_
- `Office Open XML Workbook`_ (`Microsoft Excel`_)
- `OpenDocument Spreadsheet`_ (`LibreOffice`_)

Supported input sources:

- `Amazon S3`_
- `File system`_
- `GitHub`_
- `Google Cloud Storage`_
- `HTTP`_

Please note that Skeem is beta-quality software and a work in progress.
Contributions of all kinds are very welcome, in order to make it more solid.
Expect breaking changes until a 1.0 release, so version pinning is
recommended, especially when you use Skeem as a library.
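
For example, pinning to the current minor release series with pip (the version
range shown here is illustrative) could look like:

.. code-block:: sh

    # Stay on the 0.1.x series until 1.0 lands.
    pip install 'skeem>=0.1,<0.2'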


********
Synopsis
********

.. code-block:: sh

    skeem infer-ddl --dialect=postgresql data.ndjson

.. code-block:: sql

    CREATE TABLE "data" (
        "id" SERIAL NOT NULL,
        "name" TEXT NOT NULL,
        "date" TIMESTAMP WITHOUT TIME ZONE,
        "fruits" TEXT NOT NULL,
        "price" DECIMAL(2, 2) NOT NULL,
        PRIMARY KEY ("id")
    );
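
For illustration, an input file along the lines of the following (the same
records as in the library example further below) yields a schema similar to
the one shown above. Creating it with a shell heredoc is just one way to try
it out:

.. code-block:: sh

    # Write two example NDJSON records to a file.
    cat > data.ndjson <<'EOF'
    {"id":1,"name":"foo","date":"2014-10-31 09:22:56","fruits":"apple,banana","price":0.42}
    {"id":2,"name":"bar","date":null,"fruits":"pear","price":0.84}
    EOF

    # Infer a PostgreSQL schema from it.
    skeem infer-ddl --dialect=postgresql data.ndjson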


**********
Quickstart
**********

If you are in a hurry and want to run Skeem without any installation, just use
the OCI image on Podman or Docker.

.. code-block:: sh

    docker run --rm ghcr.io/daq-tools/skeem-standard \
        skeem infer-ddl --dialect=postgresql \
        https://github.com/daq-tools/skeem/raw/main/tests/testdata/basic.ndjson


*****
Setup
*****

Install Skeem from PyPI.

.. code-block:: sh

    pip install skeem

Install Skeem with support for additional data formats like NetCDF.

.. code-block:: sh

    pip install 'skeem[scientific]'


*****
Usage
*****

This section outlines example invocations of Skeem, both on the command line
and as a library. Besides the resources available on the web, test data can be
acquired from the repository's `testdata`_ folder.
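
For example, one way to fetch a test file and run Skeem on it locally, using
plain ``curl``, is:

.. code-block:: sh

    # Download a sample NDJSON file from the repository, keeping its file name.
    curl --location --remote-name \
        https://github.com/daq-tools/skeem/raw/main/tests/testdata/basic.ndjson
    skeem infer-ddl --dialect=postgresql basic.ndjson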

Command line use
================

Help
----

.. code-block:: sh

    skeem info
    skeem --help
    skeem infer-ddl --help

Read from files
---------------

.. code-block:: sh

    # NDJSON, Parquet, and InfluxDB line protocol (ILP) formats.
    skeem infer-ddl --dialect=postgresql data.ndjson
    skeem infer-ddl --dialect=postgresql data.parquet
    skeem infer-ddl --dialect=postgresql data.lp

    # CSV, JSON, ODS, and XLSX formats.
    skeem infer-ddl --dialect=postgresql data.csv
    skeem infer-ddl --dialect=postgresql data.json
    skeem infer-ddl --dialect=postgresql data.ods
    skeem infer-ddl --dialect=postgresql data.xlsx
    skeem infer-ddl --dialect=postgresql data.xlsx --address="Sheet2"

Read from URLs
--------------

.. code-block:: sh

    # CSV, NDJSON, XLSX
    skeem infer-ddl --dialect=postgresql https://github.com/daq-tools/skeem/raw/main/tests/testdata/basic.csv
    skeem infer-ddl --dialect=postgresql https://github.com/daq-tools/skeem/raw/main/tests/testdata/basic.ndjson
    skeem infer-ddl --dialect=postgresql https://github.com/daq-tools/skeem/raw/main/tests/testdata/basic.xlsx --address="Sheet2"

    # Google Sheets: Address first sheet, and specific sheet of workbook.
    skeem infer-ddl --dialect=postgresql --table-name=foo https://docs.google.com/spreadsheets/d/1ExyrawjlyksbC6DOM6nLolJDbU8qiRrrhxSuxf5ScB0/view
    skeem infer-ddl --dialect=postgresql --table-name=foo https://docs.google.com/spreadsheets/d/1ExyrawjlyksbC6DOM6nLolJDbU8qiRrrhxSuxf5ScB0/view#gid=883324548

    # InfluxDB line protocol (ILP)
    skeem infer-ddl --dialect=postgresql https://github.com/influxdata/influxdb2-sample-data/raw/master/air-sensor-data/air-sensor-data.lp

    # Compressed files in gzip format
    skeem --verbose infer-ddl --dialect=crate --content-type=ndjson https://s3.amazonaws.com/crate.sampledata/nyc.yellowcab/yc.2019.07.gz

    # CSV on S3
    skeem --verbose infer-ddl --dialect=postgresql s3://noaa-ghcn-pds/csv/by_year/2022.csv

    # CSV on Google Cloud Storage
    skeem --verbose infer-ddl --dialect=postgresql gs://tinybird-assets/datasets/nations.csv
    skeem --verbose infer-ddl --dialect=postgresql gs://tinybird-assets/datasets/medals1.csv

    # CSV on GitHub
    skeem --verbose infer-ddl --dialect=postgresql github://daq-tools:skeem@/tests/testdata/basic.csv

    # GRIB2, NetCDF
    skeem infer-ddl --dialect=postgresql https://github.com/earthobservations/testdata/raw/main/opendata.dwd.de/weather/nwp/icon/grib/18/t/icon-global_regular-lat-lon_air-temperature_level-90.grib2
    skeem infer-ddl --dialect=postgresql https://www.unidata.ucar.edu/software/netcdf/examples/sresa1b_ncar_ccsm3-example.nc
    skeem infer-ddl --dialect=postgresql https://www.unidata.ucar.edu/software/netcdf/examples/WMI_Lear.nc

OCI
---

OCI images are available on the GitHub Container Registry (GHCR). In order to
run them on Podman or Docker, invoke:

.. code-block:: sh

    docker run --rm ghcr.io/daq-tools/skeem-standard \
        skeem infer-ddl --dialect=postgresql \
        https://github.com/daq-tools/skeem/raw/main/tests/testdata/basic.csv

If you want to work with files on your filesystem, you will need to either
mount the working directory into the container using the ``--volume`` option,
or use the ``--interactive`` option to consume STDIN, like:

.. code-block:: sh

    docker run --rm --volume=$(pwd):/data ghcr.io/daq-tools/skeem-standard \
        skeem infer-ddl --dialect=postgresql /data/basic.ndjson

    docker run --rm --interactive ghcr.io/daq-tools/skeem-standard \
        skeem infer-ddl --dialect=postgresql --content-type=ndjson - < basic.ndjson

To always run the latest ``nightly`` development version, and to save a few
keystrokes on subsequent invocations, define an alias for ``skeem`` and store
the input URL in a variable:

.. code-block:: sh

    docker pull ghcr.io/daq-tools/skeem-standard:nightly
    alias skeem="docker run --rm --interactive ghcr.io/daq-tools/skeem-standard:nightly skeem"
    URL=https://github.com/daq-tools/skeem/raw/main/tests/testdata/basic.ndjson

    skeem infer-ddl --dialect=postgresql $URL


More
----

Use a different backend (default: ``ddlgen``)::

    skeem infer-ddl --dialect=postgresql --backend=frictionless data.ndjson

When reading data from STDIN, the table name and content type need to be provided explicitly::

    skeem infer-ddl --dialect=crate --table-name=foo --content-type=ndjson - < data.ndjson

Reading data from STDIN also works with pipes, if you prefer them::

    cat data.ndjson | skeem infer-ddl --dialect=crate --table-name=foo --content-type=ndjson -


Library use
===========

.. code-block:: python

    import io
    from skeem.core import SchemaGenerator
    from skeem.model import Resource, SqlTarget

    INDATA = io.StringIO(
        """
        {"id":1,"name":"foo","date":"2014-10-31 09:22:56","fruits":"apple,banana","price":0.42}
        {"id":2,"name":"bar","date":null,"fruits":"pear","price":0.84}
        """
    )

    sg = SchemaGenerator(
        resource=Resource(data=INDATA, content_type="ndjson"),
        target=SqlTarget(dialect="crate", table_name="testdrive"),
    )

    print(sg.to_sql_ddl().pretty)

.. code-block:: sql

    CREATE TABLE "testdrive" (
        "id" INT NOT NULL,
        "name" STRING NOT NULL,
        "date" TIMESTAMP,
        "fruits" STRING NOT NULL,
        "price" DOUBLE NOT NULL,
        PRIMARY KEY ("id")
    );


***********
Development
***********

For installing the project from source, please follow the `development`_
documentation.
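
As a rough sketch only, assuming standard Python packaging conventions (the
`development`_ guide is authoritative about extras and tooling), an editable
install from source might look like:

.. code-block:: sh

    # Clone the repository and install it into a fresh virtualenv.
    git clone https://github.com/daq-tools/skeem
    cd skeem
    python3 -m venv .venv
    source .venv/bin/activate
    pip install --editable .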


*******************
Project information
*******************

Credits
=======
- `Catherine Devlin`_ for `ddlgenerator`_ and `data_dispenser`_.
- `Mike Bayer`_ for `SQLAlchemy`_.
- `Paul Walsh`_ and `Evgeny Karev`_ for `frictionless`_.
- `Wes McKinney`_ for `pandas`_.
- All other countless contributors and authors of excellent Python
  packages, Python itself, and turtles all the way down.

Prior art
=========
We are maintaining a `list of other projects`_ with the same or similar goals
as Skeem.

Etymology
=========
The program was originally going to be called *Eskema*, but it turned out that
another `Eskema`_ already exists. So, it has been renamed to *Skeem*, which is
Estonian and means "schema", "outline", or "(to) plan".



.. _Amazon S3: https://en.wikipedia.org/wiki/Amazon_S3
.. _Apache Parquet: https://en.wikipedia.org/wiki/Apache_Parquet
.. _Catherine Devlin: https://github.com/catherinedevlin
.. _CSV: https://en.wikipedia.org/wiki/Comma-separated_values
.. _data_dispenser: https://pypi.org/project/data_dispenser/
.. _ddlgenerator: https://pypi.org/project/ddlgenerator/
.. _development: doc/development.rst
.. _Eskema: https://github.com/nombrekeff/eskema
.. _Evgeny Karev: https://github.com/roll
.. _file system: https://en.wikipedia.org/wiki/File_system
.. _frictionless: https://github.com/frictionlessdata/framework
.. _fsspec: https://pypi.org/project/fsspec/
.. _GitHub: https://github.com/
.. _Google Cloud Storage: https://en.wikipedia.org/wiki/Google_Cloud_Storage
.. _Google Sheets: https://en.wikipedia.org/wiki/Google_Sheets
.. _GRIB: https://en.wikipedia.org/wiki/GRIB
.. _HTTP: https://en.wikipedia.org/wiki/HTTP
.. _InfluxDB line protocol: https://docs.influxdata.com/influxdb/latest/reference/syntax/line-protocol/
.. _JSON: https://www.json.org/
.. _JSON Lines: https://jsonlines.org/
.. _JSON streaming: https://en.wikipedia.org/wiki/JSON_streaming
.. _LibreOffice: https://en.wikipedia.org/wiki/LibreOffice
.. _list of other projects: doc/prior-art.rst
.. _Microsoft Excel: https://en.wikipedia.org/wiki/Microsoft_Excel
.. _Mike Bayer: https://github.com/zzzeek
.. _NDJSON: http://ndjson.org/
.. _NetCDF: https://en.wikipedia.org/wiki/NetCDF
.. _Office Open XML Workbook: https://en.wikipedia.org/wiki/Office_Open_XML
.. _OpenDocument Spreadsheet: https://en.wikipedia.org/wiki/OpenDocument
.. _pandas: https://pandas.pydata.org/
.. _Paul Walsh: https://github.com/pwalsh
.. _SciPy: https://scipy.org/
.. _SQLAlchemy: https://pypi.org/project/SQLAlchemy/
.. _testdata: https://github.com/daq-tools/skeem/tree/main/tests/testdata
.. _Wes McKinney: https://github.com/wesm
.. _xarray: https://xarray.dev/

            
