qvd-utils


- Name: qvd-utils
- Version: 0.1.15
- Summary: A library for reading Qlik Sense .qvd file format from Python, written in Rust.
- Upload time: 2024-02-28 19:22:44
- Author: Hugo Tallys <hgtllys@gmail.com>
- License: Apache-2.0
- Keywords: python, rust, qlik, sense, qvd
- Source code: https://github.com/hugotallys/qvd-utils
- Requirements: No requirements were recorded.

# Read Qlik Sense .qvd files 🛠
[![CI pipeline](https://github.com/SBentley/qvd-utils/actions/workflows/CI.yml/badge.svg)](https://github.com/SBentley/qvd-utils/actions/workflows/CI.yml)

A Python library for reading the Qlik Sense .qvd file format, written in Rust.
Files can be read into a DataFrame or a dictionary. Large files can be read in parts.

## Install

Install from PyPI:

```sh
pip install qvd_utils
```

## Usage

```python
from qvd_utils import qvd_reader

df = qvd_reader.read('test.qvd')
print(df)
```

For large files, call `read_in_chunks` with a `chunk_size` parameter to get a generator of dicts:

```python
import pandas as pd
from qvd_utils import qvd_reader

chunks = qvd_reader.read_in_chunks('test.qvd', chunk_size=1000)

for chunk in chunks:
    df = pd.DataFrame.from_dict(chunk)
    print(df)
```
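
Because the chunks arrive one at a time, a file can be summarised without ever holding all of it in memory. The sketch below is only an illustration: `summarize_chunks` is a hypothetical helper, not part of `qvd_utils`, and it assumes each chunk is a dict mapping column names to equal-length lists of values (which is what `pd.DataFrame.from_dict(chunk)` above suggests). A hand-written list stands in for the real generator so the example runs without a QVD file.

```python
from typing import Dict, Iterable, List, Tuple


def summarize_chunks(chunks: Iterable[Dict[str, list]]) -> Tuple[int, List[str]]:
    """Count rows and collect column names, one chunk at a time."""
    total_rows = 0
    columns: List[str] = []
    for chunk in chunks:
        if not columns:
            columns = list(chunk)
        # Assumption: all column lists within a chunk have the same length,
        # so the first column is enough to know the chunk's row count.
        first_column = next(iter(chunk.values()), [])
        total_rows += len(first_column)
    return total_rows, columns


# Stand-in for qvd_reader.read_in_chunks('test.qvd', chunk_size=2)
fake_chunks = iter([
    {"id": [1, 2], "name": ["a", "b"]},
    {"id": [3], "name": ["c"]},
])

total_rows, columns = summarize_chunks(fake_chunks)
print(total_rows, columns)
```

With a real file, the same function could be passed the generator returned by `read_in_chunks` directly.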

### Developing

Create a [virtual environment](https://docs.python-guide.org/dev/virtualenvs/) and activate it:

```sh
python3 -m venv venv
source venv/bin/activate  # on Windows: venv\Scripts\activate
```

Then install dev dependencies:

```sh
pip install pandas maturin
```

Afterwards, run

```sh
maturin develop --release
```

to install the generated Python library into the virtual environment.

## Test

To run the tests, you can use these commands:

```sh
cargo test  # runs all Rust unit tests
pytest test_qvd_reader.py  # runs all Python tests
```

## QVD File Structure

A QVD file is split into 3 parts: the XML metadata, the symbol table and the
bit-stuffed binary indexes.

### XML Metadata

This section is at the top of the file and is in human-readable XML. It
contains metadata about the file in general, such as table name, number of
records and size of records, as well as data about individual fields, including
field name, length and offset in the symbol table.
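
As a rough illustration, the header can be read with Python's standard XML parser. The element names used here (`QvdTableHeader`, `TableName`, `QvdFieldHeader`, `FieldName`, `Offset`, `Length`, `RecordByteSize`, `NoOfRecords`) are assumptions about how the header is laid out, not something this README specifies, and the sample document is made up. `parse_header` is a hypothetical helper, not part of `qvd_utils`.

```python
import xml.etree.ElementTree as ET

# Made-up header; the element names are assumptions, see the note above.
SAMPLE_HEADER = """
<QvdTableHeader>
  <TableName>Orders</TableName>
  <Fields>
    <QvdFieldHeader>
      <FieldName>OrderID</FieldName>
      <Offset>0</Offset>
      <Length>40</Length>
    </QvdFieldHeader>
    <QvdFieldHeader>
      <FieldName>Customer</FieldName>
      <Offset>40</Offset>
      <Length>120</Length>
    </QvdFieldHeader>
  </Fields>
  <RecordByteSize>2</RecordByteSize>
  <NoOfRecords>100</NoOfRecords>
</QvdTableHeader>
"""


def parse_header(xml_text: str) -> dict:
    """Extract table-level and per-field metadata from the XML header."""
    root = ET.fromstring(xml_text.strip())
    fields = [
        {
            "name": field.findtext("FieldName"),
            # Byte offset and length of this field's symbols,
            # relative to the start of the symbol table.
            "offset": int(field.findtext("Offset")),
            "length": int(field.findtext("Length")),
        }
        for field in root.iter("QvdFieldHeader")
    ]
    return {
        "table_name": root.findtext("TableName"),
        "record_count": int(root.findtext("NoOfRecords")),
        "record_size": int(root.findtext("RecordByteSize")),
        "fields": fields,
    }


header = parse_header(SAMPLE_HEADER)
print(header["table_name"], header["record_count"], [f["name"] for f in header["fields"]])
```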

### Symbol table

Directly after the XML section is the symbol table. It stores every unique
value contained within each column, with the columns in the order described in
the metadata fields section. The metadata gives the byte offset of each column
from the start of the symbols section. Symbol types cannot be determined from
the metadata; they are instead given by a flag byte preceding each symbol.
These types are:

* 1 - 4 byte signed int (i32) - little endian
* 2 - 8 byte float (f64) - little endian
* 4 - null terminated string
* 5 - 4 bytes of junk followed by a null terminated string representing an integer
* 6 - 8 bytes of junk followed by a null terminated string representing a float
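
The flag bytes listed above can be decoded with a small loop. This is a minimal Python sketch of the idea, not the library's Rust implementation: `parse_symbols` is a hypothetical helper, strings are assumed to be UTF-8, and the strings for types 5 and 6 are returned as-is rather than converted to numbers.

```python
import struct
from typing import List, Union

Symbol = Union[int, float, str]

# Number of bytes to skip before the null terminated string, per flag.
STRING_PREFIX_LEN = {4: 0, 5: 4, 6: 8}


def parse_symbols(buf: bytes) -> List[Symbol]:
    """Decode one column's slice of the symbol table into Python values."""
    symbols: List[Symbol] = []
    i = 0
    while i < len(buf):
        flag = buf[i]
        i += 1
        if flag == 1:  # 4 byte signed int, little endian
            symbols.append(struct.unpack_from("<i", buf, i)[0])
            i += 4
        elif flag == 2:  # 8 byte float, little endian
            symbols.append(struct.unpack_from("<d", buf, i)[0])
            i += 8
        elif flag in STRING_PREFIX_LEN:  # string, optionally after junk bytes
            i += STRING_PREFIX_LEN[flag]
            end = buf.index(b"\x00", i)
            symbols.append(buf[i:end].decode("utf-8"))  # assumption: UTF-8
            i = end + 1
        else:
            raise ValueError(f"unknown symbol flag {flag} at byte {i - 1}")
    return symbols


# Hand-built buffer with one symbol of each type.
sample = (
    b"\x01" + struct.pack("<i", -7)
    + b"\x02" + struct.pack("<d", 1.5)
    + b"\x04" + b"abc\x00"
    + b"\x05" + b"\x00" * 4 + b"42\x00"
    + b"\x06" + b"\x00" * 8 + b"3.14\x00"
)
print(parse_symbols(sample))
```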

### Binary Indexes

After the symbol table come the binary indexes, which map each row to its
symbols. They are bit-stuffed, reversed binary numbers that give, for each
field, the index of the symbol in the symbol table.
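
One way to picture the unpacking is shown below. It is a sketch under assumptions rather than the library's actual code: "reversed" is read here as little-endian byte order, and each field is assumed to occupy a known bit offset and bit width within the row's record. Both `decode_record` and the `layout` values are hypothetical.

```python
from typing import List, Tuple


def decode_record(record: bytes, layout: List[Tuple[int, int]]) -> List[int]:
    """Return the symbol index of each field packed into one row's record.

    layout holds one (bit_offset, bit_width) pair per field.
    """
    # Assumption: reading the bytes as a little-endian integer undoes the
    # "reversed" ordering, so bit 0 is the lowest bit of the first byte.
    value = int.from_bytes(record, "little")
    return [
        (value >> bit_offset) & ((1 << bit_width) - 1)
        for bit_offset, bit_width in layout
    ]


# Two fields packed into a 2 byte record: 3 bits for the first, 5 for the second.
layout = [(0, 3), (3, 5)]
record = (5 | (17 << 3)).to_bytes(2, "little")
print(decode_record(record, layout))
```

Each resulting index would then be looked up in the corresponding column's list of symbols to recover the row's values.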

