.. contents:: **pytablereader**
    :backlinks: top
    :depth: 2

Summary
=========
`pytablereader <https://github.com/thombashi/pytablereader>`__ is a Python library for loading structured table data from files, strings, and URLs in various data formats: CSV / Excel / Google-Sheets / HTML / JSON / LDJSON / LTSV / Markdown / SQLite / TSV.

.. image:: https://badge.fury.io/py/pytablereader.svg
    :target: https://badge.fury.io/py/pytablereader
    :alt: PyPI package version

.. image:: https://img.shields.io/pypi/pyversions/pytablereader.svg
    :target: https://pypi.org/project/pytablereader
    :alt: Supported Python versions

.. image:: https://img.shields.io/pypi/implementation/pytablereader.svg
    :target: https://pypi.org/project/pytablereader
    :alt: Supported Python implementations

.. image:: https://github.com/thombashi/pytablereader/actions/workflows/lint_and_test.yml/badge.svg
    :target: https://github.com/thombashi/pytablereader/actions/workflows/lint_and_test.yml
    :alt: CI status of Linux/macOS/Windows

.. image:: https://coveralls.io/repos/github/thombashi/pytablereader/badge.svg?branch=master
    :target: https://coveralls.io/github/thombashi/pytablereader?branch=master
    :alt: Test coverage

.. image:: https://github.com/thombashi/pytablereader/actions/workflows/github-code-scanning/codeql/badge.svg
    :target: https://github.com/thombashi/pytablereader/actions/workflows/github-code-scanning/codeql
    :alt: CodeQL

Features
--------
- Extract structured tabular data from various data formats:
    - CSV / Tab-separated values (TSV) / Space-separated values (SSV)
    - Microsoft Excel :superscript:`TM` file
    - `Google Sheets <https://www.google.com/intl/en_us/sheets/about/>`_
    - HTML (``table`` tags)
    - JSON
    - `Labeled Tab-separated Values (LTSV) <http://ltsv.org/>`__
    - `Line-delimited JSON (LDJSON) <https://en.wikipedia.org/wiki/JSON_streaming#Line-delimited_JSON>`__ / NDJSON / JSON Lines
    - Markdown
    - MediaWiki
    - SQLite database file
- Supported data sources are:
    - Files on a local file system
    - Accessible URLs
    - ``str`` instances
- Loaded table data can be used as:
    - `pandas.DataFrame <https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html>`__ instance
    - ``dict`` instance

Examples
==========
Load a CSV table
------------------
:Sample Code:
    .. code-block:: python

        import pytablereader as ptr
        import pytablewriter as ptw


        # prepare data ---
        file_path = "sample_data.csv"
        csv_text = "\n".join([
            '"attr_a","attr_b","attr_c"',
            '1,4,"a"',
            '2,2.1,"bb"',
            '3,120.9,"ccc"',
        ])

        with open(file_path, "w") as f:
            f.write(csv_text)

        # load from a csv file ---
        loader = ptr.CsvTableFileLoader(file_path)
        for table_data in loader.load():
            print("\n".join([
                "load from file",
                "==============",
                "{:s}".format(ptw.dumps_tabledata(table_data)),
            ]))

        # load from a csv text ---
        loader = ptr.CsvTableTextLoader(csv_text)
        for table_data in loader.load():
            print("\n".join([
                "load from text",
                "==============",
                "{:s}".format(ptw.dumps_tabledata(table_data)),
            ]))

:Output:
    .. code-block::

        load from file
        ==============
        .. table:: sample_data

            ======  ======  ======
            attr_a  attr_b  attr_c
            ======  ======  ======
                 1     4.0  a
                 2     2.1  bb
                 3   120.9  ccc
            ======  ======  ======

        load from text
        ==============
        .. table:: csv2

            ======  ======  ======
            attr_a  attr_b  attr_c
            ======  ======  ======
                 1     4.0  a
                 2     2.1  bb
                 3   120.9  ccc
            ======  ======  ======

Get loaded table data as pandas.DataFrame instance
----------------------------------------------------
:Sample Code:
    .. code-block:: python

        import pytablereader as ptr

        loader = ptr.CsvTableTextLoader(
            "\n".join([
                "a,b",
                "1,2",
                "3.3,4.4",
            ]))
        for table_data in loader.load():
            print(table_data.as_dataframe())

:Output:
    .. code-block::

             a    b
        0    1    2
        1  3.3  4.4

For more information
----------------------
More examples are available at
https://pytablereader.rtfd.io/en/latest/pages/examples/index.html
Installation
============
Install from PyPI
------------------------------
::

    pip install pytablereader

Some formats require additional dependency packages, which you can install as follows:

- Excel
    - ``pip install pytablereader[excel]``
- Google Sheets
    - ``pip install pytablereader[gs]``
- Markdown
    - ``pip install pytablereader[md]``
- MediaWiki
    - ``pip install pytablereader[mediawiki]``
- SQLite
    - ``pip install pytablereader[sqlite]``
- Load from URLs
    - ``pip install pytablereader[url]``
- All of the extra dependencies
    - ``pip install pytablereader[all]``

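
Several extras can be combined in one requirement by separating them with commas inside the brackets, which is standard ``pip`` syntax. The following is a minimal sketch and not part of pytablereader: ``build_install_command`` and ``KNOWN_EXTRAS`` are hypothetical names introduced for illustration, and the extras names are the ones listed above.

:Sample Code:
    .. code-block:: python

        import sys

        # Extras names taken from the list above.
        KNOWN_EXTRAS = {"excel", "gs", "md", "mediawiki", "sqlite", "url", "all"}


        def build_install_command(extras):
            """Return a pip argument list that installs pytablereader with the given extras."""
            unknown = sorted(set(extras) - KNOWN_EXTRAS)
            if unknown:
                raise ValueError(f"unknown extras: {', '.join(unknown)}")

            requirement = "pytablereader"
            if extras:
                # Deduplicate and sort so the result is stable, e.g. "pytablereader[excel,url]".
                requirement += "[{}]".format(",".join(sorted(set(extras))))

            # "python -m pip" runs pip for the interpreter that executes this script.
            return [sys.executable, "-m", "pip", "install", requirement]


        # Print only the requirement string; the full list can be passed to subprocess.run().
        print(build_install_command(["excel", "url"])[-1])

:Output:
    .. code-block::

        pytablereader[excel,url]
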
Install from PPA (for Ubuntu)
------------------------------
::

    sudo add-apt-repository ppa:thombashi/ppa
    sudo apt update
    sudo apt install python3-pytablereader

Dependencies
============
- Python 3.7+
- `Python package dependencies (automatically installed) <https://github.com/thombashi/pytablereader/network/dependencies>`__
Optional Python packages
------------------------------------------------
- ``logging`` extras
    - `loguru <https://github.com/Delgan/loguru>`__: used for logging if the package is installed
- ``excel`` extras
    - `excelrd <https://github.com/thombashi/excelrd>`__
- ``md`` extras
    - `Markdown <https://github.com/Python-Markdown/markdown>`__
- ``mediawiki`` extras
    - `pypandoc <https://github.com/bebraw/pypandoc>`__
- ``sqlite`` extras
    - `SimpleSQLite <https://github.com/thombashi/SimpleSQLite>`__
- ``url`` extras
    - `retryrequests <https://github.com/thombashi/retryrequests>`__
- `pandas <https://pandas.pydata.org/>`__
    - required to get table data as a pandas data frame
- `lxml <https://lxml.de/installation.html>`__

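
To see which of these optional packages are present in the current environment, a check along the following lines may help. It is a sketch that uses only the standard library; ``OPTIONAL_MODULES`` and ``find_available_modules`` are hypothetical names, and the import names in the mapping are assumptions derived from the package names listed above.

:Sample Code:
    .. code-block:: python

        import importlib.util

        # Label -> import name of the optional package.
        # The import names are assumptions based on the packages listed above.
        OPTIONAL_MODULES = {
            "logging": "loguru",
            "excel": "excelrd",
            "md": "markdown",
            "mediawiki": "pypandoc",
            "sqlite": "simplesqlite",
            "url": "retryrequests",
            "pandas": "pandas",
            "lxml": "lxml",
        }


        def find_available_modules(modules=OPTIONAL_MODULES):
            """Map each label to True if its module can be found without importing it."""
            return {
                label: importlib.util.find_spec(name) is not None
                for label, name in modules.items()
            }


        for label, available in find_available_modules().items():
            print(f"{label:10s} {'installed' if available else 'missing'}")

``importlib.util.find_spec`` only locates a top-level module and does not import it, so the check stays cheap even for large packages such as pandas.
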
Optional packages (other than Python packages)
------------------------------------------------
- ``libxml2`` (faster HTML conversion)
- `pandoc <https://pandoc.org/>`__ (required when loading MediaWiki files)
Documentation
===============
https://pytablereader.rtfd.io/
Related Project
=================
- `pytablewriter <https://github.com/thombashi/pytablewriter>`__
    - Tabular data loaded by ``pytablereader`` can be written to another tabular data format with ``pytablewriter``.
Sponsors
====================================
.. image:: https://avatars.githubusercontent.com/u/44389260?s=48&u=6da7176e51ae2654bcfd22564772ef8a3bb22318&v=4
    :target: https://github.com/chasbecker
    :alt: Charles Becker (chasbecker)
.. image:: https://avatars.githubusercontent.com/u/46711571?s=48&u=57687c0e02d5d6e8eeaf9177f7b7af4c9f275eb5&v=4
    :target: https://github.com/Arturi0
    :alt: onetime: Arturi0
.. image:: https://avatars.githubusercontent.com/u/3658062?s=48&v=4
    :target: https://github.com/b4tman
    :alt: onetime: Dmitry Belyaev (b4tman)

`Become a sponsor <https://github.com/sponsors/thombashi>`__
Raw data
{
"_id": null,
"home_page": "https://github.com/thombashi/pytablereader",
"name": "pytablereader",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.7",
"maintainer_email": "",
"keywords": "table,reader,pandas,CSV,Excel,HTML,JSON,LTSV,Markdown,MediaWiki,TSV,SQLite",
"author": "Tsuyoshi Hombashi",
"author_email": "tsuyoshi.hombashi@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/0a/44/e42c24df7b6f1c880b5bf614112e2009ac088fee79b6bc4d1fa43789c460/pytablereader-0.31.4.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT License",
"summary": "pytablereader is a Python library to load structured table data from files/strings/URL with various data format: CSV / Excel / Google-Sheets / HTML / JSON / LDJSON / LTSV / Markdown / SQLite / TSV.",
"version": "0.31.4",
"project_urls": {
"Changlog": "https://github.com/thombashi/pytablereader/releases",
"Documentation": "https://pytablereader.rtfd.io/",
"Homepage": "https://github.com/thombashi/pytablereader",
"Source": "https://github.com/thombashi/pytablereader",
"Tracker": "https://github.com/thombashi/pytablereader/issues"
},
"split_keywords": [
"table",
"reader",
"pandas",
"csv",
"excel",
"html",
"json",
"ltsv",
"markdown",
"mediawiki",
"tsv",
"sqlite"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "41e9eeffa7b8ce57ecfa711f1f173012705bb8b082cb547c2d68a951845ad289",
"md5": "d638021f5b68225f087ac5c029670ca1",
"sha256": "2ce0e81b1035ba6b345cc1edbf5734780ed089fdead05c1fd12869a09cc0c3ce"
},
"downloads": -1,
"filename": "pytablereader-0.31.4-py3-none-any.whl",
"has_sig": false,
"md5_digest": "d638021f5b68225f087ac5c029670ca1",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.7",
"size": 48446,
"upload_time": "2023-06-25T04:15:42",
"upload_time_iso_8601": "2023-06-25T04:15:42.758519Z",
"url": "https://files.pythonhosted.org/packages/41/e9/eeffa7b8ce57ecfa711f1f173012705bb8b082cb547c2d68a951845ad289/pytablereader-0.31.4-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "0a44e42c24df7b6f1c880b5bf614112e2009ac088fee79b6bc4d1fa43789c460",
"md5": "d92cbcb2716ecea0eee58649a591edc0",
"sha256": "ad97308308525cafe0eaa4b6a80a02499e0b4c6c979efb17452d302ad78bd5b1"
},
"downloads": -1,
"filename": "pytablereader-0.31.4.tar.gz",
"has_sig": false,
"md5_digest": "d92cbcb2716ecea0eee58649a591edc0",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.7",
"size": 72143,
"upload_time": "2023-06-25T04:15:45",
"upload_time_iso_8601": "2023-06-25T04:15:45.468929Z",
"url": "https://files.pythonhosted.org/packages/0a/44/e42c24df7b6f1c880b5bf614112e2009ac088fee79b6bc4d1fa43789c460/pytablereader-0.31.4.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-06-25 04:15:45",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "thombashi",
"github_project": "pytablereader",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"tox": true,
"lcname": "pytablereader"
}