| Field | Value |
| --- | --- |
| Name | tabled |
| Version | 0.1.17 |
| home_page | https://github.com/i2mint/tabled |
| Summary | A (key-value) data-object-layer to get (pandas) tables from a variety of sources with ease |
| upload_time | 2024-12-17 13:15:30 |
| maintainer | None |
| docs_url | None |
| author | Thor Whalen |
| requires_python | None |
| license | apache-2.0 |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |
# tabled
A (key-value) data-object-layer to get (pandas) tables from a variety of sources with ease
To install: ```pip install tabled```
# DfFiles
This notebook demonstrates how to use `DfFiles` to store and retrieve pandas DataFrames using various file formats.
## Setup
First, let's import required packages and define our test data:
```python
import os
import shutil
import tempfile
import pandas as pd
from tabled import DfFiles
# Test data dictionary
misc_small_dicts = {
    "fantasy_tavern_menu": {
        "item": ["Dragon Ale", "Elf Bread", "Goblin Stew"],
        "price": [7.5, 3.0, 5.5],
        "is_alcoholic": [True, False, False],
        "servings_left": [12, 25, 8],
    },
    "alien_abduction_log": {
        "abductee_name": ["Bob", "Alice", "Zork"],
        "location": ["Kansas City", "Roswell", "Jupiter"],
        "duration_minutes": [15, 120, 30],
        "was_returned": [True, False, True],
    },
}
```
## Creating Test Directory
We'll create a temporary directory for our files:
```python
def create_test_directory():
    # Create a fresh directory for the test files
    rootdir = os.path.join(tempfile.gettempdir(), 'tabled_df_files_test')
    if os.path.exists(rootdir):
        shutil.rmtree(rootdir)
    os.makedirs(rootdir)
    return rootdir

rootdir = create_test_directory()
print(f"Created directory at: {rootdir}")
```
Created directory at: /var/folders/mc/c070wfh51kxd9lft8dl74q1r0000gn/T/tabled_df_files_test
## Initialize DfFiles
Create a new DfFiles instance pointing to our directory:
```python
df_files = DfFiles(rootdir)
```
Let's verify it starts empty:
```python
list(df_files)
```
[]
## Creating and Saving DataFrames
Let's create DataFrames from our test data:
```python
fantasy_tavern_menu_df = pd.DataFrame(misc_small_dicts['fantasy_tavern_menu'])
alien_abduction_log_df = pd.DataFrame(misc_small_dicts['alien_abduction_log'])
print("Fantasy Tavern Menu:")
display(fantasy_tavern_menu_df)
print("\nAlien Abduction Log:")
display(alien_abduction_log_df)
```
Fantasy Tavern Menu:
<div>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>item</th>
<th>price</th>
<th>is_alcoholic</th>
<th>servings_left</th>
</tr>
</thead>
<tbody>
<tr>
<th>0</th>
<td>Dragon Ale</td>
<td>7.5</td>
<td>True</td>
<td>12</td>
</tr>
<tr>
<th>1</th>
<td>Elf Bread</td>
<td>3.0</td>
<td>False</td>
<td>25</td>
</tr>
<tr>
<th>2</th>
<td>Goblin Stew</td>
<td>5.5</td>
<td>False</td>
<td>8</td>
</tr>
</tbody>
</table>
</div>
Alien Abduction Log:
<div>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>abductee_name</th>
<th>location</th>
<th>duration_minutes</th>
<th>was_returned</th>
</tr>
</thead>
<tbody>
<tr>
<th>0</th>
<td>Bob</td>
<td>Kansas City</td>
<td>15</td>
<td>True</td>
</tr>
<tr>
<th>1</th>
<td>Alice</td>
<td>Roswell</td>
<td>120</td>
<td>False</td>
</tr>
<tr>
<th>2</th>
<td>Zork</td>
<td>Jupiter</td>
<td>30</td>
<td>True</td>
</tr>
</tbody>
</table>
</div>
Now let's save these DataFrames using different formats:
```python
df_files['fantasy_tavern_menu.csv'] = fantasy_tavern_menu_df
df_files['alien_abduction_log.json'] = alien_abduction_log_df
```
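Notice that the key's extension picks the format: `.csv` for one frame, `.json` for the other. Conceptually, this kind of extension-based dispatch can be sketched as follows (a simplified illustration of the idea, not tabled's actual internals; `encoders`, `extension_of`, and `save` are hypothetical names):

```python
from pathlib import Path

# Hypothetical sketch: a mapping from file extension to a writer function.
encoders = {
    'csv': lambda df, path: df.to_csv(path, index=False),
    'json': lambda df, path: df.to_json(path, orient='records'),
}

def extension_of(filename: str) -> str:
    # 'fantasy_tavern_menu.csv' -> 'csv'
    return Path(filename).suffix.lstrip('.')

def save(df, filename):
    # Look up the codec from the key's extension, then delegate to it.
    ext = extension_of(filename)
    if ext not in encoders:
        raise KeyError(f"No encoder registered for extension: {ext!r}")
    encoders[ext](df, filename)
```

The payoff of this design is that adding support for a new format is just one more entry in the mapping.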
## Reading Data Back
Let's verify we can read the data back correctly:
```python
saved_df = df_files['fantasy_tavern_menu.csv']
saved_df
```
<div>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>item</th>
<th>price</th>
<th>is_alcoholic</th>
<th>servings_left</th>
</tr>
</thead>
<tbody>
<tr>
<th>0</th>
<td>Dragon Ale</td>
<td>7.5</td>
<td>True</td>
<td>12</td>
</tr>
<tr>
<th>1</th>
<td>Elf Bread</td>
<td>3.0</td>
<td>False</td>
<td>25</td>
</tr>
<tr>
<th>2</th>
<td>Goblin Stew</td>
<td>5.5</td>
<td>False</td>
<td>8</td>
</tr>
</tbody>
</table>
</div>
## MutableMapping Interface
DfFiles implements the MutableMapping interface, making it behave like a dictionary.
Let's see how many files we have:
```python
len(df_files)
```
2
List all available files:
```python
list(df_files)
```
['fantasy_tavern_menu.csv', 'alien_abduction_log.json']
Check if a file exists:
```python
'fantasy_tavern_menu.csv' in df_files
```
True
## Supported File Extensions
Let's see what file formats DfFiles supports out of the box.
(**Note:** some of these formats require extra packages; a missing one will surface as an `ImportError`.)
```python
print("Encoder supported extensions:")
list_of_encoder_supported_extensions = list(df_files.extension_encoder_mapping)
print(*list_of_encoder_supported_extensions, sep=', ')
```
Encoder supported extensions:
csv, txt, tsv, json, html, p, pickle, pkl, npy, parquet, zip, feather, h5, hdf5, stata, dta, sql, sqlite, gbq, xls, xlsx, xml, orc
```python
print("Decoder supported extensions:")
list_of_decoder_supported_extensions = list(df_files.extension_decoder_mapping)
print(*list_of_decoder_supported_extensions, sep=', ')
```
Decoder supported extensions:
csv, txt, tsv, parquet, json, html, p, pickle, pkl, xml, sql, sqlite, feather, stata, dta, sas, h5, hdf5, xls, xlsx, orc, sav
## Testing Different Extensions
Let's try saving and loading our test DataFrame in different formats:
```python
extensions_supported_by_encoder_and_decoder = (
set(list_of_encoder_supported_extensions) & set(list_of_decoder_supported_extensions)
)
sorted(extensions_supported_by_encoder_and_decoder)
```
['csv',
'dta',
'feather',
'h5',
'hdf5',
'html',
'json',
'orc',
'p',
'parquet',
'pickle',
'pkl',
'sql',
'sqlite',
'stata',
'tsv',
'txt',
'xls',
'xlsx',
'xml']
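The two lists are not identical: a few formats go one way only. Taking set differences of the extension lists printed above makes the asymmetry explicit:

```python
# Extension lists copied from the output printed above.
encoder_exts = {'csv', 'txt', 'tsv', 'json', 'html', 'p', 'pickle', 'pkl',
                'npy', 'parquet', 'zip', 'feather', 'h5', 'hdf5', 'stata',
                'dta', 'sql', 'sqlite', 'gbq', 'xls', 'xlsx', 'xml', 'orc'}
decoder_exts = {'csv', 'txt', 'tsv', 'parquet', 'json', 'html', 'p', 'pickle',
                'pkl', 'xml', 'sql', 'sqlite', 'feather', 'stata', 'dta',
                'sas', 'h5', 'hdf5', 'xls', 'xlsx', 'orc', 'sav'}

encode_only = encoder_exts - decoder_exts  # can write, but not read back
decode_only = decoder_exts - encoder_exts  # can read, but not write

print(sorted(encode_only))  # ['gbq', 'npy', 'zip']
print(sorted(decode_only))  # ['sas', 'sav']
```

So SAS and SPSS files (`sas`, `sav`) can only be read, while `npy`, `zip`, and `gbq` can only be written.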
```python
def test_extension(ext):
    filename = f'test_file.{ext}'
    try:
        df_files[filename] = fantasy_tavern_menu_df
        df_loaded = df_files[filename]
        # Round-trip test: the decoded df should match the one that was saved.
        # We drop the index, since not all codecs save the index by default.
        pd.testing.assert_frame_equal(
            fantasy_tavern_menu_df.reset_index(drop=True),
            df_loaded.reset_index(drop=True),
        )
        return True
    except Exception:
        return False

test_extensions = [
    'csv',
    'feather',
    'json',
    'orc',
    'parquet',
    'pkl',
    'tsv',
    'dta',   # TODO: fix
    'h5',    # TODO: fix
    'html',  # TODO: fix
    'sql',   # TODO: fix
    'xml',   # TODO: fix
]

for ext in test_extensions:
    print("Testing extension:", ext)
    success = test_extension(ext)
    if success:
        print(f"\tExtension {ext}: ✓")
    else:
        print('\033[91m' + f"\tFix extension {ext}: ✗" + '\033[0m')
```
Testing extension: csv
Extension csv: ✓
Testing extension: feather
Extension feather: ✓
Testing extension: json
Extension json: ✓
Testing extension: orc
Extension orc: ✓
Testing extension: parquet
Extension parquet: ✓
Testing extension: pkl
Extension pkl: ✓
Testing extension: tsv
Extension tsv: ✓
Testing extension: dta
	Fix extension dta: ✗
Testing extension: h5
	Fix extension h5: ✗
Testing extension: html
	Fix extension html: ✗
Testing extension: sql
	Fix extension sql: ✗
Testing extension: xml
	Fix extension xml: ✗
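The failures above are a reminder that round-trips are not free: a codec may drop the index, coerce dtypes, or need extra options or packages. The same round-trip check used in `test_extension` can be run on an in-memory CSV with pandas alone, independent of tabled:

```python
import io
import pandas as pd

df = pd.DataFrame({
    'item': ['Dragon Ale', 'Elf Bread'],
    'price': [7.5, 3.0],
    'is_alcoholic': [True, False],
})

# Write to CSV without the index, read it back, and compare.
buf = io.StringIO()
df.to_csv(buf, index=False)
buf.seek(0)
df_loaded = pd.read_csv(buf)

# This frame survives the trip: pandas re-infers float and bool dtypes.
pd.testing.assert_frame_equal(df, df_loaded)
```

Frames with less CSV-friendly content (datetimes, categoricals, a meaningful index) are exactly where the per-codec differences show up.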