tabled


- Name: tabled
- Version: 0.1.17
- Home page: https://github.com/i2mint/tabled
- Summary: A (key-value) data-object-layer to get (pandas) tables from a variety of sources with ease
- Upload time: 2024-12-17 13:15:30
- Author: Thor Whalen
- License: apache-2.0
- Requirements: none recorded

# tabled

A (key-value) data-object-layer to get (pandas) tables from a variety of sources with ease

To install: `pip install tabled`






# DfFiles

This notebook demonstrates how to use `DfFiles` to store and retrieve pandas DataFrames using various file formats.

## Setup

First, let's import required packages and define our test data:


```python
import os
import shutil
import tempfile

import pandas as pd
from tabled import DfFiles

# Test data dictionary
misc_small_dicts = {
    "fantasy_tavern_menu": {
        "item": ["Dragon Ale", "Elf Bread", "Goblin Stew"],
        "price": [7.5, 3.0, 5.5],
        "is_alcoholic": [True, False, False],
        "servings_left": [12, 25, 8],
    },
    "alien_abduction_log": {
        "abductee_name": ["Bob", "Alice", "Zork"],
        "location": ["Kansas City", "Roswell", "Jupiter"],
        "duration_minutes": [15, 120, 30],
        "was_returned": [True, False, True],
    }
}
```

## Creating Test Directory

We'll create a temporary directory for our files:


```python
def create_test_directory():
    """Create an empty directory for the test files, replacing any previous one."""
    rootdir = os.path.join(tempfile.gettempdir(), 'tabled_df_files_test')
    if os.path.exists(rootdir):
        shutil.rmtree(rootdir)
    os.makedirs(rootdir)
    print(f"Created directory at: {rootdir}")
    return rootdir

rootdir = create_test_directory()
```

    Created directory at: /var/folders/mc/c070wfh51kxd9lft8dl74q1r0000gn/T/tabled_df_files_test
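
If automatic cleanup is preferred over a fixed, reusable path, the standard library's `tempfile.TemporaryDirectory` is one alternative. The following is a minimal sketch and not part of the original notebook; the file name `placeholder.txt` is made up for illustration.

```python
import os
import tempfile

# TemporaryDirectory creates a fresh directory and deletes it (with its
# contents) when the `with` block exits.
with tempfile.TemporaryDirectory(prefix="tabled_df_files_test_") as tmp_rootdir:
    # Hypothetical file, only to show that the directory is usable.
    placeholder_path = os.path.join(tmp_rootdir, "placeholder.txt")
    with open(placeholder_path, "w") as f:
        f.write("hello")
    existed_inside = os.path.exists(placeholder_path)

# After the block, the directory and everything in it are gone.
exists_after = os.path.exists(tmp_rootdir)
print(existed_inside, exists_after)  # True False
```

The fixed-path approach used in this notebook is handy when you want to inspect the files after the run; the context manager suits tests that should leave nothing behind.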


## Initialize DfFiles

Create a new DfFiles instance pointing to our directory:


```python
df_files = DfFiles(rootdir)
```

Let's verify it starts empty:


```python
list(df_files)
```




    []



## Creating and Saving DataFrames

Let's create DataFrames from our test data:


```python
fantasy_tavern_menu_df = pd.DataFrame(misc_small_dicts['fantasy_tavern_menu'])
alien_abduction_log_df = pd.DataFrame(misc_small_dicts['alien_abduction_log'])

print("Fantasy Tavern Menu:")
display(fantasy_tavern_menu_df)
print("\nAlien Abduction Log:")
display(alien_abduction_log_df)
```

    Fantasy Tavern Menu:



<div>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>item</th>
      <th>price</th>
      <th>is_alcoholic</th>
      <th>servings_left</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>Dragon Ale</td>
      <td>7.5</td>
      <td>True</td>
      <td>12</td>
    </tr>
    <tr>
      <th>1</th>
      <td>Elf Bread</td>
      <td>3.0</td>
      <td>False</td>
      <td>25</td>
    </tr>
    <tr>
      <th>2</th>
      <td>Goblin Stew</td>
      <td>5.5</td>
      <td>False</td>
      <td>8</td>
    </tr>
  </tbody>
</table>
</div>


    
    Alien Abduction Log:



<div>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>abductee_name</th>
      <th>location</th>
      <th>duration_minutes</th>
      <th>was_returned</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>Bob</td>
      <td>Kansas City</td>
      <td>15</td>
      <td>True</td>
    </tr>
    <tr>
      <th>1</th>
      <td>Alice</td>
      <td>Roswell</td>
      <td>120</td>
      <td>False</td>
    </tr>
    <tr>
      <th>2</th>
      <td>Zork</td>
      <td>Jupiter</td>
      <td>30</td>
      <td>True</td>
    </tr>
  </tbody>
</table>
</div>


Now let's save these DataFrames using different formats. The file extension in the key determines the format:


```python
df_files['fantasy_tavern_menu.csv'] = fantasy_tavern_menu_df
df_files['alien_abduction_log.json'] = alien_abduction_log_df
```

## Reading Data Back

Let's verify we can read the data back correctly:


```python
saved_df = df_files['fantasy_tavern_menu.csv']
saved_df
```


<div>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>item</th>
      <th>price</th>
      <th>is_alcoholic</th>
      <th>servings_left</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>Dragon Ale</td>
      <td>7.5</td>
      <td>True</td>
      <td>12</td>
    </tr>
    <tr>
      <th>1</th>
      <td>Elf Bread</td>
      <td>3.0</td>
      <td>False</td>
      <td>25</td>
    </tr>
    <tr>
      <th>2</th>
      <td>Goblin Stew</td>
      <td>5.5</td>
      <td>False</td>
      <td>8</td>
    </tr>
  </tbody>
</table>
</div>



## MutableMapping Interface

DfFiles implements the MutableMapping interface, making it behave like a dictionary.

Let's see how many files we have:


```python
len(df_files)
```




    2



List all available files:


```python
list(df_files)
```




    ['fantasy_tavern_menu.csv', 'alien_abduction_log.json']



Check if a file exists:


```python
'fantasy_tavern_menu.csv' in df_files
```




    True
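
To make the "behaves like a dictionary" claim concrete, here is a minimal sketch of how a file-backed store can get the full dictionary interface from `collections.abc.MutableMapping` by defining only five methods. `TextFiles` is a hypothetical class written for illustration with the standard library alone; it is not how `DfFiles` is implemented.

```python
import os
import tempfile
from collections.abc import MutableMapping


class TextFiles(MutableMapping):
    """Hypothetical store: keys are file names, values are text contents."""

    def __init__(self, rootdir):
        self.rootdir = rootdir

    def _path(self, key):
        return os.path.join(self.rootdir, key)

    def __getitem__(self, key):
        try:
            with open(self._path(key), encoding="utf-8") as f:
                return f.read()
        except FileNotFoundError:
            raise KeyError(key) from None

    def __setitem__(self, key, value):
        with open(self._path(key), "w", encoding="utf-8") as f:
            f.write(value)

    def __delitem__(self, key):
        try:
            os.remove(self._path(key))
        except FileNotFoundError:
            raise KeyError(key) from None

    def __iter__(self):
        return iter(sorted(os.listdir(self.rootdir)))

    def __len__(self):
        return len(os.listdir(self.rootdir))


with tempfile.TemporaryDirectory() as d:
    store = TextFiles(d)
    store["a.txt"] = "alpha"
    store["b.txt"] = "beta"
    # The following methods are inherited from MutableMapping:
    keys = list(store.keys())
    has_a = "a.txt" in store
    items = dict(store.items())
    popped = store.pop("a.txt")  # reads the value, then deletes the file
    remaining = list(store)

print(keys, has_a, popped, remaining)
```

Since `DfFiles` is a `MutableMapping`, the same inherited methods (`keys`, `items`, `pop`, `update`, `del store[key]`, and so on) should be available there too, with DataFrames as values.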



## Supported File Extensions

Let's see what file formats DfFiles supports out of the box.

**Note:** some of these formats require installing extra packages. An `ImportError` when reading or writing a file signals a missing dependency.


```python
print("Encoder supported extensions:")
list_of_encoder_supported_extensions = list(df_files.extension_encoder_mapping)
print(*list_of_encoder_supported_extensions, sep=', ')
```

    Encoder supported extensions:
    csv, txt, tsv, json, html, p, pickle, pkl, npy, parquet, zip, feather, h5, hdf5, stata, dta, sql, sqlite, gbq, xls, xlsx, xml, orc



```python
print("Decoder supported extensions:")
list_of_decoder_supported_extensions = list(df_files.extension_decoder_mapping)
print(*list_of_decoder_supported_extensions, sep=', ')
```

    Decoder supported extensions:
    csv, txt, tsv, parquet, json, html, p, pickle, pkl, xml, sql, sqlite, feather, stata, dta, sas, h5, hdf5, xls, xlsx, orc, sav
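
The two mappings printed above suggest a dispatch-by-extension design: the part of the key after the last dot selects the function that serializes or deserializes the value. Below is a rough sketch of that idea using only the standard library, with rows represented as lists of dicts instead of DataFrames. The names `encoders`, `decoders`, `encode`, and `decode` are hypothetical and do not mirror `tabled` internals.

```python
import csv
import io
import json
import os


def _rows_to_csv(rows):
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue().encode("utf-8")


def _csv_to_rows(data):
    # CSV carries no type information, so every value comes back as a string.
    return list(csv.DictReader(io.StringIO(data.decode("utf-8"))))


# Hypothetical extension -> function mappings.
encoders = {"json": lambda rows: json.dumps(rows).encode("utf-8"), "csv": _rows_to_csv}
decoders = {"json": lambda data: json.loads(data.decode("utf-8")), "csv": _csv_to_rows}


def _extension(key):
    return os.path.splitext(key)[1].lstrip(".").lower()


def encode(key, rows):
    ext = _extension(key)
    if ext not in encoders:
        raise ValueError(f"No encoder registered for extension {ext!r}")
    return encoders[ext](rows)


def decode(key, data):
    ext = _extension(key)
    if ext not in decoders:
        raise ValueError(f"No decoder registered for extension {ext!r}")
    return decoders[ext](data)


rows = [{"item": "Dragon Ale", "price": 7.5}, {"item": "Elf Bread", "price": 3.0}]
json_back = decode("menu.json", encode("menu.json", rows))
csv_back = decode("menu.csv", encode("menu.csv", rows))
print(json_back == rows)     # True
print(csv_back[0]["price"])  # 7.5 (as the string '7.5')
```

Keeping separate encoder and decoder mappings is one way to explain why the two extension lists differ: a format can be writable without being readable, or the reverse.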


## Testing Different Extensions

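Let's try saving and loading our test DataFrame in different formats. Only extensions supported by both the encoder and the decoder can make the round trip, so we compute those first: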


```python
extensions_supported_by_encoder_and_decoder = (
    set(list_of_encoder_supported_extensions) & set(list_of_decoder_supported_extensions)
)
sorted(extensions_supported_by_encoder_and_decoder)
```


    ['csv',
     'dta',
     'feather',
     'h5',
     'hdf5',
     'html',
     'json',
     'orc',
     'p',
     'parquet',
     'pickle',
     'pkl',
     'sql',
     'sqlite',
     'stata',
     'tsv',
     'txt',
     'xls',
     'xlsx',
     'xml']
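
The same set arithmetic also shows which extensions are one-way only. The lists below are copied from the outputs printed earlier, so this sketch runs without `tabled` installed; the variable names are new.

```python
encoder_extensions = (
    "csv, txt, tsv, json, html, p, pickle, pkl, npy, parquet, zip, feather, "
    "h5, hdf5, stata, dta, sql, sqlite, gbq, xls, xlsx, xml, orc"
).split(", ")
decoder_extensions = (
    "csv, txt, tsv, parquet, json, html, p, pickle, pkl, xml, sql, sqlite, "
    "feather, stata, dta, sas, h5, hdf5, xls, xlsx, orc, sav"
).split(", ")

# Extensions that can be written but not read back, and vice versa.
write_only = sorted(set(encoder_extensions) - set(decoder_extensions))
read_only = sorted(set(decoder_extensions) - set(encoder_extensions))

print(write_only)  # ['gbq', 'npy', 'zip']
print(read_only)   # ['sas', 'sav']
```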






```python
def test_extension(ext):
    """Round-trip test: save the DataFrame under `ext`, load it back, and compare."""
    filename = f'test_file.{ext}'
    try:
        df_files[filename] = fantasy_tavern_menu_df
        df_loaded = df_files[filename]
        # Check that the decoded DataFrame equals the one that was saved.
        # The index is dropped because not every codec saves it by default.
        pd.testing.assert_frame_equal(
            fantasy_tavern_menu_df.reset_index(drop=True),
            df_loaded.reset_index(drop=True),
        )
        return True
    except Exception:
        return False


test_extensions = [
    'csv',
    'feather',
    'json',
    'orc',
    'parquet',
    'pkl',
    'tsv',
    # The extensions below currently fail the round trip (see the output).
    'dta',  # TODO: fix
    'h5',  # TODO: fix
    'html',  # TODO: fix
    'sql',  # TODO: fix
    'xml',  # TODO: fix
]

for ext in test_extensions:
    print("Testing extension:", ext)
    success = test_extension(ext)
    if success:
        print(f"\tExtension {ext}: ✓")
    else:
        # ANSI escape codes print the failure line in red.
        print('\033[91m' + f"\tFix extension {ext}: ✗" + '\033[0m')
```

    Testing extension: csv
    	Extension csv: ✓
    Testing extension: feather
    	Extension feather: ✓
    Testing extension: json
    	Extension json: ✓
    Testing extension: orc
    	Extension orc: ✓
    Testing extension: parquet
    	Extension parquet: ✓
    Testing extension: pkl
    	Extension pkl: ✓
    Testing extension: tsv
    	Extension tsv: ✓
    Testing extension: dta
    	Fix extension dta: ✗
    Testing extension: h5
    	Fix extension h5: ✗
    Testing extension: html
    	Fix extension html: ✗
    Testing extension: sql
    	Fix extension sql: ✗
    Testing extension: xml
    	Fix extension xml: ✗
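
Because `test_extension` returns only `True` or `False`, the reason for each failure is lost. One possible variation, sketched below under the assumption that the store behaves like any mutable mapping, records the exception for each extension instead. `round_trip_report` and `PickyStore` are hypothetical names; a plain `dict` subclass stands in for `DfFiles` so the example runs with the standard library alone.

```python
def round_trip_report(store, value, extensions, equal=lambda a, b: a == b):
    """Write `value` under one key per extension, read it back, and return
    {extension: None on success, or the error message on failure}."""
    report = {}
    for ext in extensions:
        key = f"test_file.{ext}"
        try:
            store[key] = value
            if not equal(value, store[key]):
                raise AssertionError("loaded value differs from saved value")
            report[ext] = None
        except Exception as e:
            report[ext] = f"{type(e).__name__}: {e}"
    return report


class PickyStore(dict):
    """Stand-in store that refuses one extension, to simulate a failing codec."""

    def __setitem__(self, key, value):
        if key.endswith(".xml"):
            raise ImportError("pretend an optional dependency is missing")
        super().__setitem__(key, value)


report = round_trip_report(PickyStore(), [1, 2, 3], ["csv", "json", "xml"])
for ext, error in report.items():
    print(ext, "ok" if error is None else error)
```

With DataFrames, the `equal` argument could be a small function that calls `pd.testing.assert_frame_equal` on the two frames (after `reset_index(drop=True)`, as in `test_extension`) and returns `True` when no assertion is raised.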



            
