tabbed

Name	tabbed
Version	1.0.1
home_page	None
Summary	An iterative reader of irregular text files
upload_time	2025-07-08 15:45:42
maintainer	None
docs_url	None
author	None
requires_python	>=3.12
license	None
keywords	text delimited csv tsv
requirements	No requirements were recorded.
Travis-CI	No Travis.
coveralls test coverage	No coveralls.

<h1 align="center">
    <img src="https://github.com/mscaudill/tabbed/raw/master/docs/imgs/namedlogo.png"
    style="width:600px;height:auto;"/>
</h1>

## A Python package for reading variably structured text files at scale

![PyPI - License](https://img.shields.io/pypi/l/openseize?color=purple)
[![pytest](https://github.com/mscaudill/tabbed/actions/workflows/testing.yml/badge.svg)](
https://github.com/mscaudill/tabbed/actions/workflows/testing.yml)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](
https://github.com/psf/black)

**Tabbed** is a Python library for reading variably structured text files. It
automatically deduces data start locations and data types, and performs
iterative and value-based conditional reading of data rows.

[**Key Features**](#key-features)
| [**Usage**](#usage)
| [**Documentation**](#documentation)
| [**Dependencies**](#dependencies)
| [**Installation**](#installation)
| [**Contributing**](#contributing)
| [**Acknowledgements**](#acknowledgements)

-----------------

## Key Features

- **Structural Inference:**  
A common variant of the
[standard](https://datatracker.ietf.org/doc/html/rfc4180) text file is one that
contains *metadata* prior to a header or data section. Tabbed can locate the
metadata, header and data sections of a file.

- **Type Inference:**  
Tabbed can parse `int`, `float`, `complex`, `time`, `date` and `datetime`
instances at high speed via a polling strategy.

- **Conditional Reading:**  
Tabbed can filter rows during reading with equality, membership, rich
comparison, regular expression matching and custom callables via simple keyword
arguments.

- **Partial and Iterative Reading:**  
Tabbed reads large text files iteratively, consuming only as much memory as
you choose.
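
The polling idea behind type inference can be sketched with the standard library alone. The snippet below is an illustration under stated assumptions, not Tabbed's implementation: `convert` and `poll_type` are hypothetical helpers, and only `int`, `float`, `complex` and a single `datetime` format are tried. Each sampled string votes for the first type that parses it, and the most common vote wins, so a stray value such as `n/a` does not change the inferred column type.

```python
from collections import Counter
from datetime import datetime

# Hypothetical format list; a real reader would try many more formats.
DATETIME_FORMATS = ['%m/%d/%y %H:%M:%S.%f']


def convert(text):
    """Return text converted by the first parser that accepts it, else text."""
    for caster in (int, float, complex):
        try:
            return caster(text)
        except ValueError:
            pass
    for fmt in DATETIME_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            pass
    return text


def poll_type(samples):
    """Return the most common type among the converted samples."""
    votes = Counter(type(convert(sample)) for sample in samples)
    return votes.most_common(1)[0][0]


# Three floats and one unparseable string: the majority vote is float.
column = ['0.0000', '1161.0520', 'n/a', '1169.8360']
print(poll_type(column))
```

Polling a small sample rather than converting every value keeps inference cheap on large files, at the cost of occasionally missing a rare value of a different type.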


## Usage

Below is a sample file with a *Metadata* section and *Header* using the tab
character as the delimiter.

**annotations.txt**
```AsciiDoc
Experiment ID Experiment
Animal ID Animal
Researcher Test
Directory path 

Number Start Time End Time Time From Start Channel Annotation
0 02/09/22 09:17:38.948 02/09/22 09:17:38.948 0.0000 ALL Started Recording
1 02/09/22 09:37:00.000 02/09/22 09:37:00.000 1161.0520 ALL start
2 02/09/22 09:37:00.000 02/09/22 09:37:08.784 1161.0520 ALL exploring
3 02/09/22 09:37:08.784 02/09/22 09:37:13.897 1169.8360 ALL grooming
4 02/09/22 09:37:13.897 02/09/22 09:38:01.262 1174.9490 ALL exploring
5 02/09/22 09:38:01.262 02/09/22 09:38:07.909 1222.3140 ALL grooming
6 02/09/22 09:38:07.909 02/09/22 09:38:20.258 1228.9610 ALL exploring
7 02/09/22 09:38:20.258 02/09/22 09:38:25.435 1241.3100 ALL grooming
8 02/09/22 09:38:25.435 02/09/22 09:40:07.055 1246.4870 ALL exploring
9 02/09/22 09:40:07.055 02/09/22 09:40:22.334 1348.1070 ALL grooming
10 02/09/22 09:40:22.334 02/09/22 09:41:36.664 1363.3860 ALL exploring
```
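
The listing above may not display tab characters faithfully. To try the examples that follow, the sketch below rebuilds the same rows with explicit `\t` delimiters. The helper name `sample_text` is hypothetical, and the single blank line between the metadata and the header is an assumption taken from the listing.

```python
# Values copied from the listing above; fields are joined with explicit tabs.
METADATA = [
    ('Experiment ID', 'Experiment'),
    ('Animal ID', 'Animal'),
    ('Researcher', 'Test'),
    ('Directory path', ''),
]
HEADER = ('Number', 'Start Time', 'End Time', 'Time From Start', 'Channel',
          'Annotation')
# (start time, end time, time from start, annotation) for rows 0 through 10
ROWS = [
    ('09:17:38.948', '09:17:38.948', '0.0000', 'Started Recording'),
    ('09:37:00.000', '09:37:00.000', '1161.0520', 'start'),
    ('09:37:00.000', '09:37:08.784', '1161.0520', 'exploring'),
    ('09:37:08.784', '09:37:13.897', '1169.8360', 'grooming'),
    ('09:37:13.897', '09:38:01.262', '1174.9490', 'exploring'),
    ('09:38:01.262', '09:38:07.909', '1222.3140', 'grooming'),
    ('09:38:07.909', '09:38:20.258', '1228.9610', 'exploring'),
    ('09:38:20.258', '09:38:25.435', '1241.3100', 'grooming'),
    ('09:38:25.435', '09:40:07.055', '1246.4870', 'exploring'),
    ('09:40:07.055', '09:40:22.334', '1348.1070', 'grooming'),
    ('09:40:22.334', '09:41:36.664', '1363.3860', 'exploring'),
]


def sample_text():
    """Return the sample file contents with tab-delimited fields."""
    lines = ['\t'.join(pair) for pair in METADATA]
    lines.append('')  # assumed single blank line before the header
    lines.append('\t'.join(HEADER))
    for number, (start, end, elapsed, note) in enumerate(ROWS):
        fields = (str(number), f'02/09/22 {start}', f'02/09/22 {end}',
                  elapsed, 'ALL', note)
        lines.append('\t'.join(fields))
    return '\n'.join(lines) + '\n'


print(repr(sample_text().splitlines()[5]))
```

Writing the result with `pathlib.Path('annotations.txt').write_text(sample_text())` produces a file the examples below can open.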

**Dialect and Type Inference**

Tabbed can detect the dialect via [clevercsv](
https://clevercsv.readthedocs.io/en/latest/) and infer the data types.

```python
from tabbed.reading import Reader

infile = open('annotations.txt', 'r')
reader = Reader(infile)
dialect = reader.dialect
types, _ = reader.sniffer.types()
    
print(dialect) # a clevercsv SimpleDialect
print('---')
print(types)
```

*Output*
```
SimpleDialect('\t', '"', None)
---
[<class 'int'>, <class 'datetime.datetime'>, <class 'datetime.datetime'>, <class 'float'>, <class 'str'>, <class 'str'>]
```

**Metadata and Header detection**

Tabbed can automatically locate the metadata, header and data rows.

```python
print(reader.header)
print('---')
print(reader.metadata)
```

*Output*
```
Header(line=6,
       names=['Number', 'Start_Time', 'End_Time', 'Time_From_Start', 'Channel', 'Annotation'],
       string='Number\tStart Time\tEnd Time\tTime From Start\tChannel\tAnnotation')
---
MetaData(lines=(0, 6),
         string='Experiment ID\tExperiment\nAnimal ID\tAnimal\nResearcher\tTest\nDirectory path\t\n\n')
```
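
One simple way to think about header location is to count fields per line: data rows share a consistent field count, and the header is the first line of the final run of lines with that count. The stdlib sketch below illustrates this heuristic only. It is an assumption for illustration, not the algorithm Tabbed uses, and `find_header` is a hypothetical helper that returns a 0-based index into the list it is given, which need not match Tabbed's own line numbering.

```python
def find_header(lines, delimiter='\t'):
    """Return the index of the first line in the trailing run of lines whose
    field count equals that of the last non-blank line.

    Assumes the file has a header; without one, the first data row is returned.
    """
    counts = [len(line.split(delimiter)) if line.strip() else 0
              for line in lines]
    # skip any trailing blank lines
    end = len(counts)
    while end > 0 and counts[end - 1] == 0:
        end -= 1
    if end == 0:
        raise ValueError('no non-blank lines to inspect')
    width = counts[end - 1]
    # walk backwards while the field count stays the same
    start = end - 1
    while start > 0 and counts[start - 1] == width:
        start -= 1
    return start


lines = [
    'Experiment ID\tExperiment',
    'Animal ID\tAnimal',
    'Researcher\tTest',
    'Directory path\t',
    '',
    'Number\tStart Time\tEnd Time\tTime From Start\tChannel\tAnnotation',
    '0\t02/09/22 09:17:38.948\t02/09/22 09:17:38.948\t0.0000\tALL\tStarted Recording',
    '1\t02/09/22 09:37:00.000\t02/09/22 09:37:00.000\t1161.0520\tALL\tstart',
]
header_index = find_header(lines)
print(lines[header_index].split('\t'))
```

Everything before the returned index can then be treated as metadata. A field-count heuristic breaks down when metadata lines happen to have as many fields as the data, which is one reason a real reader would also compare inferred types between rows.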

**Filtered Reading with Tabs**

Tabbed supports row and column filtering with equality, membership, rich
comparison and regular expression matching. It's also fully iterative, allowing
users to choose the amount of memory to consume during file reading.

```python
from itertools import chain

# tab rows whose Start_Time is between 9:38 and 9:40 and set reader to read
# only the Number and Start_Time columns
reader.tab(
    Start_Time='>= 2/09/2022 9:38:00 and <2/09/2022 9:40:00',
    columns=['Number', 'Start_Time'],
)

# read the data to an iterator reading only 2 rows at a time
gen = reader.read(chunksize=2)

# flatten the 2-row chunks into an in-memory list and print one row per line
data = list(chain.from_iterable(gen))
for row in data:
    print(row)

# close the reader when done or open under context-management
reader.close()
```

*Output*
```
{'Number': 5, 'Start_Time': datetime.datetime(2022, 2, 9, 9, 38, 1, 262000)}
{'Number': 6, 'Start_Time': datetime.datetime(2022, 2, 9, 9, 38, 7, 909000)}
{'Number': 7, 'Start_Time': datetime.datetime(2022, 2, 9, 9, 38, 20, 258000)}
{'Number': 8, 'Start_Time': datetime.datetime(2022, 2, 9, 9, 38, 25, 435000)}
```
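
The memory behaviour of chunked reading can be illustrated with plain generators. The following is a stdlib sketch of the general pattern, not Tabbed's internals; `read_chunks` and its `keep` predicate are hypothetical names. Rows are filtered lazily and yielded in lists of at most `chunksize`, so only one chunk needs to be held in memory at a time.

```python
from itertools import chain, islice


def read_chunks(rows, keep, chunksize):
    """Yield lists of at most chunksize rows for which keep(row) is true."""
    filtered = (row for row in rows if keep(row))
    while True:
        chunk = list(islice(filtered, chunksize))
        if not chunk:
            return
        yield chunk


# Annotation column of the sample file, indexed by row Number.
annotations = ['Started Recording', 'start', 'exploring', 'grooming',
               'exploring', 'grooming', 'exploring', 'grooming',
               'exploring', 'grooming', 'exploring']
rows = ({'Number': n, 'Annotation': a} for n, a in enumerate(annotations))

gen = read_chunks(rows, keep=lambda row: row['Annotation'] == 'grooming',
                  chunksize=2)

# Materialised here only to inspect the result; iterate gen directly to
# keep memory bounded by the chunk size.
chunks = list(gen)
print([len(chunk) for chunk in chunks])
print([row['Number'] for row in chain.from_iterable(chunks)])
```

Because both `rows` and `filtered` are generators, no row is read until a chunk is requested, which is what lets the caller trade chunk size against memory use.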

## Documentation
The official documentation is hosted on [github.io](https://mscaudill.github.io/tabbed/).


## Dependencies
Tabbed depends on the excellent [clevercsv](
https://clevercsv.readthedocs.io/en/latest/) package for dialect detection. The
rest is pure Python >= 3.12.


## Installation

Tabbed is hosted on PyPI and can be installed with pip into a virtual
environment.

```bash
pip install tabbed
```

To get a development version of `Tabbed` from source, start by cloning the
repository:

```bash
git clone git@github.com:mscaudill/tabbed.git
```

Go to the directory you just cloned and create an *editable install* with pip.

```bash
pip install -e ".[dev]"
```

## Contributing

We're excited you want to contribute! Please check out our
[Contribution](
https://github.com/mscaudill/tabbed/blob/master/.github/CONTRIBUTING.md) guide.


## Acknowledgements

------

**We are grateful to the Ting Tsung and Wei Fong Chao Foundation and the Jan
and Dan Duncan Neurological Research Institute at Texas Children's for their
generous support of Tabbed.**

------

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "tabbed",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.12",
    "maintainer_email": null,
    "keywords": "text, delimited, csv, tsv",
    "author": null,
    "author_email": "Matthew Caudill <mscaudill@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/64/95/d1d067b22e3153513830185c385fcc4e83c9c52fb93778e0d4d88096b60e/tabbed-1.0.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "An iterative reader of irregular text files",
    "version": "1.0.1",
    "project_urls": {
        "Homepage": "https://github.com/mscaudill/tabbed"
    },
    "split_keywords": [
        "text",
        " delimited",
        " csv",
        " tsv"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "e4be7a5c0d440a08e5c6203949af694c1edac2da35356cff971de0b3f326e6d6",
                "md5": "b3814e29dff65ba777d5c9cac8db82dc",
                "sha256": "b2f2fed41552c0e44120ae5366ef801b0fb40b7a3006b75b860d294958322a21"
            },
            "downloads": -1,
            "filename": "tabbed-1.0.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "b3814e29dff65ba777d5c9cac8db82dc",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.12",
            "size": 29034,
            "upload_time": "2025-07-08T15:45:40",
            "upload_time_iso_8601": "2025-07-08T15:45:40.829945Z",
            "url": "https://files.pythonhosted.org/packages/e4/be/7a5c0d440a08e5c6203949af694c1edac2da35356cff971de0b3f326e6d6/tabbed-1.0.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "6495d1d067b22e3153513830185c385fcc4e83c9c52fb93778e0d4d88096b60e",
                "md5": "bb707dc683eef21f9980736a15e0ed60",
                "sha256": "4b28b273490c2f812b4ef67cdc96fa530b4449b2194471ad95fbb3714f175648"
            },
            "downloads": -1,
            "filename": "tabbed-1.0.1.tar.gz",
            "has_sig": false,
            "md5_digest": "bb707dc683eef21f9980736a15e0ed60",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.12",
            "size": 40916,
            "upload_time": "2025-07-08T15:45:42",
            "upload_time_iso_8601": "2025-07-08T15:45:42.770010Z",
            "url": "https://files.pythonhosted.org/packages/64/95/d1d067b22e3153513830185c385fcc4e83c9c52fb93778e0d4d88096b60e/tabbed-1.0.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-07-08 15:45:42",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "mscaudill",
    "github_project": "tabbed",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "tabbed"
}
