:Name: eparse
:Version: 0.7.3
:Home page: https://github.com/ChrisPappalardo/eparse
:Summary: Excel spreadsheet crawler and table parser for data discovery, extraction, and querying
:Author: Chris Pappalardo
:Requires Python: >=3.8
:License: MIT license
:Keywords: eparse
:Upload time: 2023-08-23 22:11:07
======
eparse
======


.. image:: https://img.shields.io/pypi/v/eparse.svg
        :target: https://pypi.python.org/pypi/eparse

.. image:: https://img.shields.io/badge/License-MIT-blue.svg
        :target: https://opensource.org/licenses/MIT
        :alt: License: MIT


Description
===========
Excel spreadsheet crawler and table parser for data extraction
and querying.


Features
========
* Command-line interface
* Recursive Excel file discovery
* Sub-tabular data extraction (logical tables)
* SQLite and PostgreSQL database interfaces
* CLI query tool
* Summary data metrics


Installation
============
To install eparse, you can use pip and the latest version on PyPI:

.. code-block:: bash

   $ pip install eparse

Or you can clone this repo and install from source:

.. code-block:: bash

   $ git clone https://github.com/ChrisPappalardo/eparse.git
   $ cd eparse
   $ pip install .

If you plan to use the postgres interface, you also need to install
the postgres package ``psycopg2``. Instructions can be found
`here <https://www.psycopg.org/docs/install.html#quick-install>`_.
This package is optional, and you can use the other interfaces
such as the ``SQLite3`` interface without having to install
``psycopg2``.

The easiest way to install the ``psycopg2`` package for your
particular environment may be to install the pre-compiled
binary driver as follows:

.. code-block:: bash

   $ pip install psycopg2-binary

If you see an error while using a postgres endpoint such as
``postgres://user:pass@host:port/my_db`` saying the postgres driver
is missing, then ``psycopg2`` has not been properly installed
(and compiled).


Usage
=====
eparse can be used as a python library or from the command line.
You can view supported CLI commands and usage with ``--help`` as follows:

.. code-block:: bash

    $ eparse --help
    Usage: eparse [OPTIONS] COMMAND [ARGS]...

    excel parser

    Options:
    -i, --input TEXT   input source
    -o, --output TEXT  output destination
    -f, --file TEXT    file(s) or dir(s) to target
    -d, --debug        use debug mode
    -l, --loose        find tables loosely
    -r, --recursive    find files recursively
    -t, --truncate     truncate dataframe output
    -v, --verbose      increase output verbosity
    --help             Show this message and exit.

    Commands:
    migrate  migrate eparse table
    parse    parse table(s) found in sheet for target(s)
    query    query eparse output
    scan     scan for excel files in target

You can also use eparse from python like so:

.. code-block:: python

    from eparse.core import get_df_from_file

    print([table for table in get_df_from_file('myfile.xlsx')])
    102   Date  Principal Repayment   Date  Principal Repayment
    103  44834        700757.679004  44926        430013.148303
    104  44926         71957.776108  45016        100576.127808
    105  45016         147578.19262  45107        898008.340095
    106  45107         32801.363072  45199         841656.13896
    ...


Scan
----
To scan one or more directories for Excel files with descriptive
information, you can use the ``scan`` command like so:

.. code-block:: bash

    $ eparse -v -f <path_to_files> scan

Increase the verbosity with additional flags, such as ``-vvv``, for
more descriptive information about the file(s), including sheet names.


Parse
-----
Excel files can be parsed as follows:

.. code-block:: bash

    $ eparse -v -f <path_to_files> parse

This mode lists each table found in each Excel file on the command line
and is useful for initial discovery of parseable data.

eparse uses a simple algorithm to identify tables.  Table "corners"
are identified as non-empty cells with empty cells above and to the
right (or sheet boundaries).  For data to be extracted in relation to
that cell, a densely or sparsely populated 2x2 table must follow.
eparse automatically adjusts for rowspan labels and empty table
corners, and the dense vs. sparse criterion can be controlled with
the ``--loose`` flag.
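
The corner rule above can be sketched as follows.  This is an
illustration only, not eparse's actual implementation; the sheet
representation (a list of rows with ``None`` for empty cells), the
orientation of the 2x2 check, and the "sparse" threshold are all
assumptions:

.. code-block:: python

    # Illustrative sketch of the corner rule -- NOT eparse's implementation.
    # A sheet is a list of rows; None marks an empty cell.

    def is_corner(sheet, r, c):
        """A candidate corner is a non-empty cell whose neighbors above
        and to the right are empty (or off-sheet)."""
        if sheet[r][c] is None:
            return False
        above_empty = r == 0 or sheet[r - 1][c] is None
        right_empty = c == len(sheet[r]) - 1 or sheet[r][c + 1] is None
        return above_empty and right_empty

    def has_2x2(sheet, r, c, loose=False):
        """Check for a populated 2x2 block below and to the left of the
        corner (orientation and the sparse threshold are assumptions)."""
        cells = [
            sheet[i][j]
            for i in (r, r + 1)
            for j in (c - 1, c)
            if 0 <= i < len(sheet) and 0 <= j < len(sheet[i])
        ]
        populated = [v for v in cells if v is not None]
        return len(populated) >= (2 if loose else 4)

    sheet = [
        [None, None, None],
        ["Date", "Amount", None],
        [44834, 700757.68, None],
    ]
    corners = [
        (r, c)
        for r in range(len(sheet))
        for c in range(len(sheet[r]))
        if is_corner(sheet, r, c) and has_2x2(sheet, r, c)
    ]
    print(corners)  # -> [(1, 1)]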

eparse was written to accommodate various types of output formats and
endpoints, including ``null:///``, ``stdout:///``, ``sqlite3:///db_name``,
and ``postgres://user:password@host:port/db_name``.

null
^^^^
This mode is useful for validating files and generating descriptive
info, and is the default.  The command above with `-v` is an example
of this mode, which lists out the tables found.

stdout
^^^^^^
This mode is good for viewing data extracted from Excel files in the
console.  For example, you could view all tables found in `Sheet1`
with the following command:

.. code-block:: bash

    $ eparse -f <path_to_files> -o stdout:/// parse -s "Sheet1"

eparse uses `pandas <https://github.com/pandas-dev/pandas>`_
to handle table data.  You can view larger tables without truncation
using the ``-t`` flag as follows:

.. code-block:: bash

    $ eparse -t -f <path_to_files> -o stdout:/// parse -s "Sheet1"

Data in table format is useful for human viewing, but a serialized
form is better for data interfacing.  Serialize your output with
the ``-z`` flag as follows:

.. code-block:: bash

    $ eparse -t -f <path_to_files> -o stdout:/// parse -z

Each cell of extracted table data is serialized as follows:

* `row` - 0-indexed table row number
* `column` - 0-indexed table column number
* `value` - the value of the cell as a ``str``
* `type` - the implied python ``type`` of the data found
* `c_header` - the column header
* `r_header` - the row header
* `excel_RC` - the RC reference from the spreadsheet (e.g. B10)
* `sheet` - the name of the sheet
* `f_name` - the name of the file
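
For example, a single serialized cell might look like the following
(the field values here are illustrative, not real output):

.. code-block:: python

    # Illustrative serialized cell -- field names follow the list above,
    # values are made up for illustration.
    cell = {
        "row": 0,
        "column": 1,
        "value": "700757.679004",   # always serialized as a str
        "type": "<class 'float'>",  # the implied python type
        "c_header": "Principal Repayment",
        "r_header": "44834",
        "excel_RC": "B10",
        "sheet": "Sheet1",
        "f_name": "myfile.xlsx",
    }

    # the recorded type can guide conversion back to a native value
    value = float(cell["value"]) if "float" in cell["type"] else cell["value"]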

sqlite3
^^^^^^^
eparse uses the `peewee <https://github.com/coleifer/peewee>`_
package for ORM and database integration.  The
`interfaces <eparse/interfaces.py>`_ module contains an
``ExcelParse`` model that provides data persistence and a common
interface.

To create a `SQLite3 <https://github.com/sqlite/sqlite>`_ database
with your parsed Excel data, use the following command:

.. code-block:: bash

    $ mkdir .files
    $ eparse -f <path_to_files> -o sqlite3:/// parse -z

This command will automatically generate a unique database filename
using the ``uuid`` python package in the ``.files/`` sub-directory
of the working directory.  You may need to create this directory
before running this command, as shown.
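
The naming idea can be sketched with the standard library; the exact
scheme eparse uses (uuid variant and file extension) is an assumption
here:

.. code-block:: python

    import uuid
    from pathlib import Path

    # sketch of uuid-based database naming in .files/ -- the .db
    # extension is assumed for illustration
    db_path = Path(".files") / f"{uuid.uuid4()}.db"
    print(db_path)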

You can also specify a path and filename of your choosing, like so:

.. code-block:: bash

    $ mkdir .files
    $ eparse -f <path_to_files> -o sqlite3:///path/filename.db parse -z

postgres
^^^^^^^^
eparse also supports `postgresql` integrations. As mentioned above,
you will need ``psycopg2`` installed for `postgresql` integrations
to work. The eparse ``BaseDatabaseInterface`` abstracts the
implementation details, so you would use this interface the same
way you use the others, with the exception of the endpoint.

To use a ``postgresql`` database as the source and/or destination
of your data, you would supply an ``--input`` and/or ``--output``
endpoint to the tool as follows:

.. code-block:: bash

    $ eparse -o postgres://user:password@host:port/db_name ...

Details like ``user``, ``host``, and ``port`` are provided by your db
administrator.  eparse will create the necessary table(s) and indexes
for you when inserting data into the database.


Query
-----
Once you have stored parsed data, you can begin to query it using the
``peewee`` ORM.  This can be done with the tool or directly with
the database.

For example, query distinct column header names from a generated
``SQLite3`` database as follows:

.. code-block:: bash

    $ eparse -i sqlite3:///.files/<db_file> -o stdout:/// query -m get_c_header
                   c_header  Total Rows  Data Types  Distinct Values
      0             ABC-col         150           2               76
      1             DEF-col        3981           3               15
      2             GHI-col          20           1                2
      ..                ...         ...         ...              ...

This command gives descriptive information for each distinct ``c_header``
found, including total rows, unique data types, and distinct values.
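
Conceptually, these metrics are a group-by over the serialized cell
records.  A minimal sketch with plain dicts (the sample records are
made up; eparse computes this from the database):

.. code-block:: python

    from collections import defaultdict

    # per-c_header metrics: total rows, distinct data types, distinct values
    records = [
        {"c_header": "ABC-col", "type": "<class 'str'>", "value": "x"},
        {"c_header": "ABC-col", "type": "<class 'int'>", "value": "1"},
        {"c_header": "DEF-col", "type": "<class 'int'>", "value": "2"},
    ]

    groups = defaultdict(lambda: {"rows": 0, "types": set(), "values": set()})
    for rec in records:
        g = groups[rec["c_header"]]
        g["rows"] += 1
        g["types"].add(rec["type"])
        g["values"].add(rec["value"])

    for header, g in sorted(groups.items()):
        print(header, g["rows"], len(g["types"]), len(g["values"]))
    # ABC-col 2 2 2
    # DEF-col 1 1 1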

You can also get raw un-truncated data as follows:

.. code-block:: bash

    $ eparse -t -i sqlite3:///.files/<db_file> -o stdout:/// query

Filtering data on content is easy.  Use the ``--filter`` option as
follows:

.. code-block:: bash

    $ eparse -i sqlite3:///.files/<db_file> -o stdout:/// query --filter f_name "somefile.xlsx"

The above command returns all rows extracted from an Excel file named
`somefile.xlsx`. You can use any of the following ``django``-style
filters:

* ``__eq`` equals X
* ``__lt`` less than X
* ``__lte`` less than or equal to X
* ``__gt`` greater than X
* ``__gte`` greater than or equal to X
* ``__ne`` not equal to X
* ``__in`` field value is in X
* ``__is`` field value is X
* ``__like`` like expression, such as ``%somestr%``, case sensitive
* ``__ilike`` like expression, such as ``%somestr%``, case insensitive
* ``__regexp`` regular expression matching such as ``^.*?foo.*?$``

Filters are applied to the ORM fields like so:

* ``--filter row__gte 4`` all data with ``row`` >= 4 (the fifth table row onward, since rows are 0-indexed)
* ``--filter f_name__ilike "%foo%"`` all data from filenames with `foo`
* ``--filter value__ne 100`` all data with values other than `100`
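
To make the suffix semantics concrete, here is a rough Python
translation of a few of them applied to in-memory records.  eparse
itself maps these filters onto ``peewee`` ORM expressions; this sketch
only illustrates the intended meaning:

.. code-block:: python

    import re

    # rough equivalents of a few django-style filter suffixes
    OPS = {
        "eq": lambda field, x: field == x,
        "gte": lambda field, x: field >= x,
        "ne": lambda field, x: field != x,
        "ilike": lambda field, x: re.fullmatch(
            x.replace("%", ".*"), str(field), re.I) is not None,
    }

    def matches(record, filter_key, x):
        name, _, op = filter_key.partition("__")
        return OPS[op or "eq"](record[name], x)

    rows = [{"f_name": "foo.xlsx", "row": 5}, {"f_name": "bar.xlsx", "row": 1}]
    print([r["f_name"] for r in rows if matches(r, "f_name__ilike", "%foo%")])
    # -> ['foo.xlsx']
    print([r["f_name"] for r in rows if matches(r, "row__gte", 4)])
    # -> ['foo.xlsx']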

Queried data can even be stored into a new database for creating
curated data subsets, as follows:

.. code-block:: bash

    $ eparse -i sqlite3:///.files/<db_file> \
             -o sqlite3:///.files/<subq_db_file> \
             query --filter f_name "somefile.xlsx"

Since the database files the tool generates with `sqlite3:///` are
native ``SQLite`` files, you can also use `SQLite` database client
tools and execute raw SQL like so:

.. code-block:: bash

    $ sudo apt-get install -y sqlite3-tools
    $ sqlite3 .files/<db_file>
    SQLite version 3.37.2 2022-01-06 13:25:41
    Enter ".help" for usage hints.
    sqlite> .schema
    CREATE TABLE IF NOT EXISTS "excelparse" ("id" INTEGER NOT NULL PRIMARY KEY, "row" INTEGER NOT NULL, "column" INTEGER NOT NULL, "value" VARCHAR(255) NOT NULL, "type" VARCHAR(255) NOT NULL, "c_header" VARCHAR(255) NOT NULL, "r_header" VARCHAR(255) NOT NULL, "excel_RC" VARCHAR(255) NOT NULL, "name" VARCHAR(255) NOT NULL, "sheet" VARCHAR(255) NOT NULL, "f_name" VARCHAR(255) NOT NULL);
    sqlite> .header on
    sqlite> SELECT * FROM excelparse LIMIT 1;
    id|row|column|value|type|c_header|r_header|excel_RC|name|sheet|f_name
    1|0|0|ABC|<class 'str'>|SomeCol|SomeRow|B2|MyTable|Sheet1|myfile.xlsm
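
The same data can be queried programmatically with Python's standard
``sqlite3`` module.  The sketch below creates an in-memory database
with the schema from the ``.schema`` output above and inserts the
sample row shown; with a real database you would connect to your
``.files/<db_file>`` path instead:

.. code-block:: python

    import sqlite3

    conn = sqlite3.connect(":memory:")  # or your .files/<db_file> path
    # schema copied from the .schema output above
    conn.execute(
        'CREATE TABLE IF NOT EXISTS "excelparse" ('
        '"id" INTEGER NOT NULL PRIMARY KEY, "row" INTEGER NOT NULL, '
        '"column" INTEGER NOT NULL, "value" VARCHAR(255) NOT NULL, '
        '"type" VARCHAR(255) NOT NULL, "c_header" VARCHAR(255) NOT NULL, '
        '"r_header" VARCHAR(255) NOT NULL, "excel_RC" VARCHAR(255) NOT NULL, '
        '"name" VARCHAR(255) NOT NULL, "sheet" VARCHAR(255) NOT NULL, '
        '"f_name" VARCHAR(255) NOT NULL)'
    )
    # sample row from the SELECT output above
    row = (1, 0, 0, "ABC", "<class 'str'>", "SomeCol", "SomeRow",
           "B2", "MyTable", "Sheet1", "myfile.xlsm")
    conn.execute("INSERT INTO excelparse VALUES (?,?,?,?,?,?,?,?,?,?,?)", row)
    rec = conn.execute(
        "SELECT f_name, sheet, c_header, value FROM excelparse LIMIT 1"
    ).fetchone()
    print(rec)  # -> ('myfile.xlsm', 'Sheet1', 'SomeCol', 'ABC')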


Migrate
-------
eparse wouldn't be a solid tool without the ability to migrate your
eparse databases for future code changes.  You can apply migrations
that ship with future versions of eparse as follows:

.. code-block:: bash

    $ eparse -i sqlite3:///.files/<db_file> migrate -m <migration>
    applied <migration>

It is up to you to determine the migrations you need based on the
eparse versions you are upgrading from and to. Migrations can be
found in `eparse/migrations.py <eparse/migrations.py>`_.


Unstructured
============
If you would like to use eparse to partition xls[x] files alongside unstructured, you can do so with our contributed `partition` and `partition_xlsx` modules. Simply import the `partition` function from `eparse.contrib.unstructured.partition` and use it instead of `partition` from `unstructured.partition.auto` like so:

.. code-block:: python

    from eparse.contrib.unstructured.partition import partition

    elements = partition(filename='some_file.xlsx', eparse_mode='...')

Valid `eparse_mode` settings are available in `eparse.contrib.unstructured.xlsx._eparse_modes`.


Contributing
============
As an open-source project, contributions are always welcome. Please see `Contributing <CONTRIBUTING.rst>`_ for more information.


License
=======
eparse is licensed under the `MIT License <https://opensource.org/licenses/MIT>`_. See the `LICENSE <LICENSE>`_ file for more details.


Contact
=======
Thanks for your support of eparse. Feel free to contact me at `cpappala@gmail.com <mailto:cpappala@gmail.com>`_ or connect with me on `Github <https://www.linkedin.com/in/chris-a-pappalardo/>`_.


=======
History
=======

0.7.3 (2023-08-23)
==================

* Updated partition and partition_xlsx for latest unstructured

0.7.2 (2023-08-22)
==================

* Relaxed unstructured dependency requirement for compatibility

0.7.1 (2023-08-20)
==================

* Added unstructured contrib to core package
* Updated README with instructions

0.7.0 (2023-08-18)
==================

* Added html conversion functions and an html data interface
* Added get_table_digest to core

0.6.3 (2023-06-30)
==================

* Updated get_df_from_file to accept filename or io object

0.6.1 (2023-06-22)
==================

* Added get_df_from_file helper function
* Added contrib with unstructured partition handler

0.5.0 (2023-06-14)
==================

* Optimized database interfaces

0.4.0 (2023-06-14)
==================

* Added postgresql interface

0.3.0 (2023-06-14)
==================

* Added interface classes
* Added migrations

0.2.0 (2023-06-12)
==================

* Added migrate command
* Added 0.1.2 to 0.2.0 migration

0.1.2 (2023-06-07)
==================

* Updated README

0.1.1 (2023-06-06)
==================

* Updated requirements
* Updated README

0.1.0 (2023-06-06)
==================

* First release on PyPI.

            
