:Name: pytablereader
:Version: 0.31.4
:Home page: https://github.com/thombashi/pytablereader
:Upload time: 2023-06-25 04:15:45
:Author: Tsuyoshi Hombashi
:Requires Python: >=3.7
:License: MIT License
:Keywords: table, reader, pandas, csv, excel, html, json, ltsv, markdown, mediawiki, tsv, sqlite

.. contents:: **pytablereader**
   :backlinks: top
   :depth: 2

Summary
=========
`pytablereader <https://github.com/thombashi/pytablereader>`__ is a Python library to load structured table data from files, strings, or URLs in various data formats: CSV / Excel / Google-Sheets / HTML / JSON / LDJSON / LTSV / Markdown / SQLite / TSV.

.. image:: https://badge.fury.io/py/pytablereader.svg
    :target: https://badge.fury.io/py/pytablereader
    :alt: PyPI package version

.. image:: https://img.shields.io/pypi/pyversions/pytablereader.svg
    :target: https://pypi.org/project/pytablereader
    :alt: Supported Python versions

.. image:: https://img.shields.io/pypi/implementation/pytablereader.svg
    :target: https://pypi.org/project/pytablereader
    :alt: Supported Python implementations

.. image:: https://github.com/thombashi/pytablereader/actions/workflows/lint_and_test.yml/badge.svg
    :target: https://github.com/thombashi/pytablereader/actions/workflows/lint_and_test.yml
    :alt: CI status of Linux/macOS/Windows

.. image:: https://coveralls.io/repos/github/thombashi/pytablereader/badge.svg?branch=master
    :target: https://coveralls.io/github/thombashi/pytablereader?branch=master
    :alt: Test coverage

.. image:: https://github.com/thombashi/pytablereader/actions/workflows/github-code-scanning/codeql/badge.svg
    :target: https://github.com/thombashi/pytablereader/actions/workflows/github-code-scanning/codeql
    :alt: CodeQL

Features
--------
- Extract structured tabular data from various data formats:
    - CSV / Tab separated values (TSV) / Space separated values (SSV)
    - Microsoft Excel :superscript:`TM` file
    - `Google Sheets <https://www.google.com/intl/en_us/sheets/about/>`_
    - HTML (``table`` tags)
    - JSON
    - `Labeled Tab-separated Values (LTSV) <http://ltsv.org/>`__
    - `Line-delimited JSON (LDJSON) <https://en.wikipedia.org/wiki/JSON_streaming#Line-delimited_JSON>`__ / NDJSON / JSON Lines
    - Markdown
    - MediaWiki
    - SQLite database file
- Supported data sources are:
    - Files on a local file system
    - Accessible URLs
    - ``str`` instances
- Loaded table data can be used as:
    - `pandas.DataFrame <https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html>`__ instance
    - ``dict`` instance

Examples
==========
Load a CSV table
------------------
:Sample Code:
    .. code-block:: python

        import pytablereader as ptr
        import pytablewriter as ptw


        # prepare data ---
        file_path = "sample_data.csv"
        csv_text = "\n".join([
            '"attr_a","attr_b","attr_c"',
            '1,4,"a"',
            '2,2.1,"bb"',
            '3,120.9,"ccc"',
        ])

        with open(file_path, "w") as f:
            f.write(csv_text)

        # load from a csv file ---
        loader = ptr.CsvTableFileLoader(file_path)
        for table_data in loader.load():
            print("\n".join([
                "load from file",
                "==============",
                "{:s}".format(ptw.dumps_tabledata(table_data)),
            ]))

        # load from a csv text ---
        loader = ptr.CsvTableTextLoader(csv_text)
        for table_data in loader.load():
            print("\n".join([
                "load from text",
                "==============",
                "{:s}".format(ptw.dumps_tabledata(table_data)),
            ]))


:Output:
    .. code-block::

        load from file
        ==============
        .. table:: sample_data

            ======  ======  ======
            attr_a  attr_b  attr_c
            ======  ======  ======
                 1     4.0  a
                 2     2.1  bb
                 3   120.9  ccc
            ======  ======  ======

        load from text
        ==============
        .. table:: csv2

            ======  ======  ======
            attr_a  attr_b  attr_c
            ======  ======  ======
                 1     4.0  a
                 2     2.1  bb
                 3   120.9  ccc
            ======  ======  ======

Get loaded table data as a pandas.DataFrame instance
------------------------------------------------------

:Sample Code:
    .. code-block:: python

        import pytablereader as ptr

        loader = ptr.CsvTableTextLoader(
            "\n".join([
                "a,b",
                "1,2",
                "3.3,4.4",
            ]))
        for table_data in loader.load():
            print(table_data.as_dataframe())

:Output:
    .. code-block::

             a    b
        0    1    2
        1  3.3  4.4
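
The loaded table data can also be converted to a ``dict``. The following is a minimal sketch rather than an excerpt from the project documentation: ``collect_tables_as_dict`` is a hypothetical helper, and it assumes that each loaded table object provides an ``as_dict()`` method that returns a mapping keyed by table name. The ``pytablereader`` import is guarded so that the helper itself does not depend on the package being installed.

.. code-block:: python

    def collect_tables_as_dict(tables):
        """Merge the dict form of every table into a single dict.

        ``tables`` is any iterable of objects with an ``as_dict()`` method
        (assumed here to return ``{table_name: [row, ...]}``).
        Later tables overwrite earlier ones that share a table name.
        """
        merged = {}
        for table in tables:
            merged.update(table.as_dict())
        return merged


    if __name__ == "__main__":
        try:
            import pytablereader as ptr
        except ImportError:
            ptr = None  # pytablereader is not installed; skip the demo

        if ptr is not None:
            loader = ptr.CsvTableTextLoader("a,b\n1,2\n3.3,4.4")
            print(collect_tables_as_dict(loader.load()))

Merging with ``dict.update`` keeps the sketch short; if two tables may share a name, collecting the per-table dicts in a list would avoid overwriting.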

For more information
----------------------
More examples are available at 
https://pytablereader.rtfd.io/en/latest/pages/examples/index.html

Installation
============

Install from PyPI
------------------------------
::

    pip install pytablereader

Some formats require additional dependency packages, which you can install as follows:

- Excel
    - ``pip install pytablereader[excel]``
- Google Sheets
    - ``pip install pytablereader[gs]``
- Markdown
    - ``pip install pytablereader[md]``
- MediaWiki
    - ``pip install pytablereader[mediawiki]``
- SQLite
    - ``pip install pytablereader[sqlite]``
- Load from URLs
    - ``pip install pytablereader[url]``
- All of the extra dependencies
    - ``pip install pytablereader[all]``
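
Several extras can be requested in one install by separating their names with commas inside the brackets, which is standard ``pip`` syntax. The sketch below builds such a command from Python without running it. ``pip_install_command`` and ``KNOWN_EXTRAS`` are hypothetical names introduced for illustration, and the set of extras is copied from the list above, so it is an assumption that may not match every release.

.. code-block:: python

    import sys

    # Extras named in the list above; an assumption that may not hold
    # for every release of the package.
    KNOWN_EXTRAS = ("excel", "gs", "md", "mediawiki", "sqlite", "url", "all")


    def pip_install_command(extras=()):
        """Build a pip command that installs pytablereader with the given extras."""
        extras = list(extras)
        unknown = sorted(set(extras) - set(KNOWN_EXTRAS))
        if unknown:
            raise ValueError("unknown extras: " + ", ".join(unknown))

        requirement = "pytablereader"
        if extras:
            requirement += "[" + ",".join(extras) + "]"

        # Use the running interpreter so the package is installed into
        # the same environment.
        return [sys.executable, "-m", "pip", "install", requirement]


    if __name__ == "__main__":
        # Print the command instead of executing it.
        print(" ".join(pip_install_command(["excel", "url"])))

The returned list can be passed to ``subprocess.run(..., check=True)``. Because it is an argument list rather than a shell string, the square brackets reach ``pip`` without being interpreted by the shell.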

Install from PPA (for Ubuntu)
------------------------------
::

    sudo add-apt-repository ppa:thombashi/ppa
    sudo apt update
    sudo apt install python3-pytablereader


Dependencies
============
- Python 3.7+
- `Python package dependencies (automatically installed) <https://github.com/thombashi/pytablereader/network/dependencies>`__


Optional Python packages
------------------------------------------------
- ``logging`` extras
    - `loguru <https://github.com/Delgan/loguru>`__: used for logging if the package is installed
- ``excel`` extras
    - `excelrd <https://github.com/thombashi/excelrd>`__
- ``md`` extras
    - `Markdown <https://github.com/Python-Markdown/markdown>`__
- ``mediawiki`` extras
    - `pypandoc <https://github.com/bebraw/pypandoc>`__
- ``sqlite`` extras
    - `SimpleSQLite <https://github.com/thombashi/SimpleSQLite>`__
- ``url`` extras
    - `retryrequests <https://github.com/thombashi/retryrequests>`__
- `pandas <https://pandas.pydata.org/>`__
    - required to get table data as a pandas data frame
- `lxml <https://lxml.de/installation.html>`__
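
To see which of these optional packages are available in the current environment, a small check such as the following may help. It is a sketch that uses only the standard library; ``missing_optional_modules`` is a hypothetical helper, and the import names in ``OPTIONAL_MODULES`` are assumptions derived from the package names above (for example, ``Markdown`` is imported as ``markdown`` and ``SimpleSQLite`` as ``simplesqlite``).

.. code-block:: python

    from importlib.util import find_spec

    # Group name -> importable module names (assumed from the list above).
    OPTIONAL_MODULES = {
        "logging": ["loguru"],
        "excel": ["excelrd"],
        "md": ["markdown"],
        "mediawiki": ["pypandoc"],
        "sqlite": ["simplesqlite"],
        "url": ["retryrequests"],
        "pandas": ["pandas"],
        "lxml": ["lxml"],
    }


    def missing_optional_modules(modules=OPTIONAL_MODULES, finder=find_spec):
        """Return {group: [module names]} for modules that cannot be found."""
        missing = {}
        for group, names in modules.items():
            # find_spec returns None when a top-level module is not installed.
            absent = [name for name in names if finder(name) is None]
            if absent:
                missing[group] = absent
        return missing


    if __name__ == "__main__":
        for group, names in missing_optional_modules().items():
            print(f"{group}: missing {', '.join(names)}")

The check only looks modules up and does not import them, so it stays fast and has no side effects.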

Optional packages (other than Python packages)
------------------------------------------------
- ``libxml2`` (faster HTML conversion)
- `pandoc <https://pandoc.org/>`__ (required when loading MediaWiki files)

Documentation
===============
https://pytablereader.rtfd.io/

Related Project
=================
- `pytablewriter <https://github.com/thombashi/pytablewriter>`__
    - Tabular data loaded by ``pytablereader`` can be written to another tabular data format with ``pytablewriter``.

Sponsors
====================================
.. image:: https://avatars.githubusercontent.com/u/44389260?s=48&u=6da7176e51ae2654bcfd22564772ef8a3bb22318&v=4
   :target: https://github.com/chasbecker
   :alt: Charles Becker (chasbecker)
.. image:: https://avatars.githubusercontent.com/u/46711571?s=48&u=57687c0e02d5d6e8eeaf9177f7b7af4c9f275eb5&v=4
   :target: https://github.com/Arturi0
   :alt: onetime: Arturi0
.. image:: https://avatars.githubusercontent.com/u/3658062?s=48&v=4
   :target: https://github.com/b4tman
   :alt: onetime: Dmitry Belyaev (b4tman)

`Become a sponsor <https://github.com/sponsors/thombashi>`__


            
