# TenSQL

- Name: tensql
- Version: 1.0.0
- Summary: TenSQL
- Author and maintainer: Jon Roose
- Upload time: 2023-10-07 12:12:28
- Requires Python: >=3.9, <3.10
- License: BSD 3-Clause
- Keywords: sparse linear algebra tensor sql

Relational Database Management Systems (RDBMS) have been the most prominent
form of database in the world for several decades. While relational databases
are often applied within high-frequency/low-volume transactional applications
such as website backends, the poor performance of relational databases on
low-frequency/high-volume queries often precludes their application to big data
analysis fields like graph analytics. This work explores the construction of an
RDBMS solution that uses the GraphBLAS API to execute Structured Query Language
(SQL) in an effort to improve performance on high-volume queries. Tables are
redefined to be collections of sparse scalars, vectors, matrices, and more
generally sparse tensors. The explicit values (nonzeros) in these sparse
tensors define the rows and NULL values within the tables. A prototype database
called TenSQL was constructed and evaluated against several SQL implementations
including PostgreSQL. Preliminary results on queries common in graph analysis
applications show performance improvements as high as 1,400x over PostgreSQL
for moderately sized datasets when returning results in a columnar format.
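
The table model described above can be illustrated with a minimal sketch. It
does not use TenSQL or GraphBLAS: plain Python dicts stand in for sparse
matrices, and the table name, column names, and the rule that a row exists
when any column stores a value at its key are all assumptions made for
illustration.

```python
# Hypothetical illustration only: plain dicts stand in for GraphBLAS sparse
# matrices, and none of these names come from the TenSQL code base.

# An "edges" table keyed by (src, dst). Each non-key column is its own sparse
# matrix that maps a key to an explicitly stored value.
edges = {
    "weight": {(0, 1): 2.5, (1, 2): 1.0, (2, 0): 4.0},
    "label": {(0, 1): "a", (2, 0): "c"},  # nothing stored at (1, 2)
}


def row_keys(table):
    """Assumed rule: a row exists if any column stores a value at its key."""
    keys = set()
    for column in table.values():
        keys.update(column)
    return sorted(keys)


def select_all(table):
    """Materialize rows; a column with no stored value reads as None (NULL)."""
    names = sorted(table)  # column order: label, weight
    return [
        key + tuple(table[name].get(key) for name in names)
        for key in row_keys(table)
    ]


rows = select_all(edges)
for row in rows:
    print(row)
```

In this sketch the row keyed by `(1, 2)` exists because `weight` stores a value
there, while its `label` reads as NULL because that column has no explicit
entry at the same key.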

## Authors
TenSQL was created by Sandia National Laboratories, with assistance provided by
the University of Utah.  

## Installation
TenSQL has only been tested with Python 3.9. Python 3.10 is too new for the
version of NumPy supported by pygraphblas.
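
Because the package metadata restricts Python to `>=3.9,<3.10`, it can help to
check the interpreter before installing. The helper below is a small sketch
that is not part of TenSQL; the function name `is_supported` is made up for
this example.

```python
import sys


def is_supported(version_info=sys.version_info):
    """Return True when the interpreter satisfies '>=3.9,<3.10'."""
    return (3, 9) <= tuple(version_info[:2]) < (3, 10)


if not is_supported():
    # Report the running interpreter's major.minor version.
    print("TenSQL 1.0.0 declares Python 3.9 only; found %d.%d"
          % tuple(sys.version_info[:2]))
```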

To install from PyPI:
```
pip install tensql
```

To install from source:
```
git clone 'https://github.com/sandialabs/TenSQL.git'
cd TenSQL
pip install .
```

## Testing
To run the tests, you must first clone the source code from GitHub, and then
build the extensions and install the testing dependencies.
```
git clone 'https://github.com/sandialabs/TenSQL.git'
cd TenSQL
python setup.py build_ext --inplace
pip install -e ".[test]"
```

The tests can then be run either with the `run_tests.py` script, which outputs
code coverage information:
```
python3 run_tests.py
```

Or via Python's built-in `unittest` module:
```
python3 -m unittest -v tensql.test
```

Specific tests can be run via the unittest module:
```
python3 -m unittest -v tensql.test.test_queries.xAy.TestQuery_xAy
```

Note: Certain tests for memory leaks can take about a minute to execute.

## Running Benchmarks
To run the benchmarks, you must first clone the source code from GitHub, and then
build the extensions and install the testing and benchmark dependencies.
```
git clone 'https://github.com/sandialabs/TenSQL.git'
cd TenSQL
python setup.py build_ext --inplace
pip install -e ".[test,benchmark]"
```

You must also install PostgreSQL 15 to run the PostgreSQL tests.

Once installed, you can run the benchmarks via Slurm with:
```
bash download_benchmark_data.sh
bash benchmark_twohop.sh
bash benchmark_ingest_and_named_edges.sh
```

Alternatively, you can run single benchmarks (without Slurm) like this:
```
bash download_benchmark_data.sh
bash single_twohop.sh "`pwd`/tmp" "`pwd`/results" all
bash single_ingest_and_named_edges.sh "`pwd`/tmp" "`pwd`/results" all
```

Note: You will likely need to tune the settings in `postgresql.conf` if your
system has less memory than our benchmarking system.

Note: Running some benchmarks requires a very large amount of memory (hundreds
of gigabytes).  
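
The benchmark queries themselves are not reproduced in this README. As a rough
sketch of the kind of query the `twohop` script names suggest, a two-hop path
count can be written as a self-join in SQL or as a sparse matrix product. The
code below is an assumption-laden illustration in plain Python: the `edges`
schema, the SQL text in the comment, and the function name `two_hop` are
hypothetical, and TenSQL itself relies on GraphBLAS rather than dicts.

```python
from collections import defaultdict

# Assumed SQL form of a two-hop query (not taken from the benchmark scripts):
#   SELECT a.src, b.dst, COUNT(*)
#   FROM edges a JOIN edges b ON a.dst = b.src
#   GROUP BY a.src, b.dst;


def two_hop(adj):
    """Sparse product adj x adj, with adj stored as {(src, dst): value}."""
    # Index outgoing edges by source so each middle vertex is looked up once.
    by_src = defaultdict(list)
    for (src, dst), value in adj.items():
        by_src[src].append((dst, value))

    result = defaultdict(int)
    for (src, mid), left in adj.items():
        for dst, right in by_src.get(mid, ()):
            # Multiply along the path and add into the output entry.
            result[(src, dst)] += left * right
    return dict(result)


# Each stored 1 marks an edge; absent keys are implicit (no edge).
adj = {(0, 1): 1, (0, 2): 1, (1, 3): 1, (2, 3): 1, (3, 0): 1}
paths = two_hop(adj)
print(paths[(0, 3)])  # two distinct two-hop paths: 0->1->3 and 0->2->3
```

Only pairs connected by at least one two-step path appear in the output, which
mirrors how the join returns no row for unconnected pairs.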

## Citing TenSQL
TenSQL was described in the paper "An SQL Database Built on GraphBLAS", which
was accepted by the IEEE High Performance Extreme Computing Virtual Conference
in September 2023. It has not yet been published in IEEE Xplore.
```
Roose, J. P., Vaidya, M., Sadayappan, P., & Rajamanickam, S. (2023). TenSQL: An SQL Database Built on GraphBLAS. 
IEEE High Performance Extreme Computing Virtual Conference, forthcoming.
```

## License
BSD 3-Clause License

Copyright 2023 National Technology & Engineering Solutions of Sandia, LLC 
(NTESS). Under the terms of Contract DE-NA0003525 with NTESS, the U.S. 
Government retains certain rights in this software.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this
   list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice,
   this list of conditions and the following disclaimer in the documentation
   and/or other materials provided with the distribution.

3. Neither the name of the copyright holder nor the names of its
   contributors may be used to endorse or promote products derived from
   this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

            
