rowgenerators


:Name: rowgenerators
:Version: 0.9.26
:Home page: https://github.com/Metatab/rowgenerator.git
:Summary: Generate row data from a variety of file formats
:Upload time: 2023-10-19 20:07:39
:Author: Eric Busboom
:License: MIT

==============
Row Generators
==============


Application Urls
================


.. image:: https://travis-ci.org/Metatab/appurl.svg?branch=master
    :target: https://travis-ci.org/Metatab/appurl

Application Urls provide structure and operations on URLs where the file the
URL refers to can't, in general, simply be downloaded. For instance, you may
want to refer to a CSV file inside a ZIP archive, or a worksheet in an Excel
file. In conjunction with `Row Generators
<https://github.com/CivicKnowledge/rowgenerators>`_, Application Urls are often
used to refer to tabular data stored on data repositories. For instance:

-  Stored on the web: ``http://example.com/file.csv``
-  Inside a zip file on the web: ``http://example.com/archive.zip#file.csv``
-  A worksheet in an Excel file: ``http://example.com/excel.xls#worksheet``
-  A worksheet in an Excel file in a ZIP Archive:
   ``http://example.com/archive.zip#excel.xls;worksheet``
-  An API: ``socrata+http://chhs.data.ca.gov/api/views/tthg-z4mf``


Install
*******


.. code-block:: bash

    $ pip install appurl

Documentation
*************

See the documentation at http://row-generators.readthedocs.io/

Development Notes
*****************

Running tests
+++++++++++++

Run ``python setup.py test`` to run the normal development tests. You can also run ``tox``, which will
try to run the tests with Python 3.4, 3.5 and 3.6, ignoring non-existent interpreters.


Development Testing with Docker
+++++++++++++++++++++++++++++++

Testing during development for other versions of Python is a bit of a pain, since you have
to install the alternate version, and Tox will run all of the tests, not just the one you want.

One way to deal with this is to install Docker locally, then run the Docker test container
on the source directory. This is done automatically from the Makefile in appurl/tests.


.. code-block:: bash

    $ cd ./docker
    $ make build # to create the container image
    $ make shell # to run bash in the container

You now have a Docker container where the ``/code`` directory is the appurl source directory.

Now, run tox to build the tox virtual environments, then enter the specific version you want to
run tests for and activate the virtual environment.

.. code-block:: bash

    # tox
    # cd .tox/py34
    # source bin/activate # Activate the python 3.4 virtual env
    # cd ../../
    # python setup.py test # Cause test deps to get installed
    #
    # python -munittest appurl.test.test_basic.BasicTests.test_url_classes  # Run one test




=================
Row Data Pipeline
=================

The Rowpipe library manages row-oriented data transformers. Clients can create a ``RowProcessor()`` that has a schema, composed of tables and columns, where each column can have a "transform" that describes how to alter the data in the column.

.. code-block:: python

    from rowpipe.table import Table
    from rowpipe.processor import RowProcessor

    def doubleit(v):
        return int(v) * 2

    env = {
        'doubleit': doubleit
    }

    t = Table('foobar')
    t.add_column('id', datatype='int')
    t.add_column('other_id', datatype='int', transform='^row.a')
    t.add_column('i1', datatype='int', transform='^row.a;doubleit')
    t.add_column('f1', datatype='float', transform='^row.b;doubleit')
    t.add_column('i2', datatype='int', transform='^row.a')
    t.add_column('f2', datatype='float', transform='^row.b')


In this table definition, the ``other_id`` and ``i2`` columns are initialized to the value of the ``a`` column in the input row.
The ``i1`` column is initialized to the input row's ``a`` column, then the ``doubleit`` function is called on the value. In the last step, all of the values are cast to the type given by each column's ``datatype``.
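
To make the transform strings concrete, here is a hand-written plain-Python
equivalent for a single input row. It is only a sketch of the behaviour
described above, not the code that the RowProcessor generates. The ``id``
column is left out because it has no transform, and the ``process`` function
and the sample values are made up for the example.

.. code-block:: python

    def doubleit(v):
        return int(v) * 2


    def process(row):
        """Compute the transformed columns for one input row (a dict with 'a' and 'b')."""
        return {
            'other_id': int(row['a']),        # '^row.a', then cast to int
            'i1': int(doubleit(row['a'])),    # '^row.a;doubleit', then cast to int
            'f1': float(doubleit(row['b'])),  # '^row.b;doubleit', then cast to float
            'i2': int(row['a']),              # '^row.a', then cast to int
            'f2': float(row['b']),            # '^row.b', then cast to float
        }


    print(process({'a': 3, 'b': 6}))
    # {'other_id': 3, 'i1': 6, 'f1': 12.0, 'i2': 3, 'f2': 6.0}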

The RowProcessor is then run using this table definition, and an input generator:

.. code-block:: python

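    N = 10  # number of input rows to generate (arbitrary example value)

    class Source(object):
        """Input generator: exposes ``headers`` and yields one tuple per row."""

        headers = 'a b'.split()

        def __iter__(self):
            for i in range(N):
                yield i, 2*i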

    rp = RowProcessor(Source(), t, env=env)



Then, ``rp`` is a generator that yields ``RowProxy`` objects, which can be indexed by column name or by integer position:


.. code-block:: python

    for row in rp:
        v1 = row['f1']
        v2 = row[3]
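
The dual indexing can be pictured with a small stand-in class.
``SimpleRowProxy`` below is hypothetical and far simpler than the library's
``RowProxy``; it only shows how one object can answer lookups both by name and
by position. The sample values are invented.

.. code-block:: python

    class SimpleRowProxy:
        """Wrap a row so values can be read by header name or by position."""

        def __init__(self, headers, values):
            self._index = {h: i for i, h in enumerate(headers)}
            self._values = list(values)

        def __getitem__(self, key):
            if isinstance(key, int):
                return self._values[key]           # positional access
            return self._values[self._index[key]]  # access by column name


    headers = ['id', 'other_id', 'i1', 'f1', 'i2', 'f2']
    row = SimpleRowProxy(headers, [None, 3, 6, 12.0, 3, 6.0])

    print(row['f1'], row[3])  # both refer to the fourth column: 12.0 12.0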

The RowProcessor creates Python code files and executes them.

Transforms can have several steps, separated by ';'. The first, prefixed with a '^', initializes the value for the rest of the transforms. A transform that is prefixed with a '!' is executed on exceptions. Transform functions can have a variable signature; the transform processor matches argument names. Valid argument names are:

- ``row``. A ``RowProxy`` object for the input row. Allows access to any input row value.
- ``row_n``. Row number.
- ``scratch``. A dict for temporary storage.
- ``errors``. A ``defaultdict(set)`` for storing error reports for columns. Keys are column names.
- ``accumulator``. A dict for accumulating values, such as sums.
- ``pipe``. Unused.
- ``bundle``. Unused.
- ``source``. Reference to the input generator that is generating rows.
- ``v``. The input row value.
- ``header_s``. The header for the column in the input row.
- ``i_s``. The index of the column in the input row.
- ``header_d``. The header for the column in the output row.
- ``i_d``. The index of the column in the output row.
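
One way to picture the step syntax and the argument matching is the sketch
below. It is an illustration built on ``inspect.signature``, not the library's
implementation: ``parse_transform``, ``call_with_matching_args``, the
``handle_error`` step name and the sample ``context`` are all hypothetical.

.. code-block:: python

    import inspect


    def parse_transform(spec):
        """Split a transform string into (init, steps, exception_handlers)."""
        init, steps, handlers = None, [], []
        for part in (p.strip() for p in spec.split(';')):
            if not part:
                continue
            if part.startswith('^'):
                init = part[1:]            # initializes the value
            elif part.startswith('!'):
                handlers.append(part[1:])  # run when a step raises
            else:
                steps.append(part)
        return init, steps, handlers


    def call_with_matching_args(func, context):
        """Call func with only the context entries that its signature names."""
        names = inspect.signature(func).parameters
        return func(**{name: context[name] for name in names})


    def doubleit(v):
        return int(v) * 2


    def describe(v, row_n, header_d):
        return '{}[{}]={}'.format(header_d, row_n, v)


    context = {'v': 3, 'row_n': 0, 'header_d': 'i1', 'scratch': {}}

    print(parse_transform('^row.a;doubleit;!handle_error'))
    print(call_with_matching_args(doubleit, context))   # 6
    print(call_with_matching_args(describe, context))   # i1[0]=3

Matching on parameter names lets a one-argument function such as ``doubleit``
and a richer function such as ``describe`` share the same calling convention.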

... and there is a whole lot more. This documentation is woefully incomplete ...

Notes
=====

This repo still contains old code for Row Pipelines, which is in the ``pipeline.py`` file. These components can be combined to perform defined operations on rows, such as skipping rows based on a predicate, altering the number of rows, returning only the head or tail, etc. The code is not currently used or tested.
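
Operations of this kind compose naturally as generators. The sketch below does
not use ``pipeline.py`` or its classes; ``skip_if`` and ``head`` are
hypothetical helpers written with the standard library to show the idea.

.. code-block:: python

    from itertools import islice


    def skip_if(rows, predicate):
        """Yield only the rows for which predicate(row) is false."""
        return (row for row in rows if not predicate(row))


    def head(rows, n):
        """Yield at most the first n rows."""
        return islice(rows, n)


    source = ((i, 2 * i) for i in range(10))

    # Drop rows whose first value is odd, then keep the first three.
    result = list(head(skip_if(source, lambda row: row[0] % 2), 3))
    print(result)  # [(0, 0), (2, 4), (4, 8)]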




            
