pyfsdb


- Name: pyfsdb
- Version: 2.3.6
- Summary: A python implementation of the flat-file streaming database
- Upload time: 2024-08-23 15:38:41
- Requires Python: >=3.6
- License: MIT License. Copyright (c) 2019-2024 University of Southern California, Information Sciences Institute. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

# Objective

The [FSDB] "flat-file streaming database" is a structured data file
format that includes column names, formatting specifications (e.g. tab vs
space vs comma), and the history of commands that generated each file.
PyFSDB is a python implementation of the functionality that was
originally implemented in perl.  Both the perl and python versions come
with a long list of [command line tools] that can be used to quickly
process datasets using traditional unix pipeline processing.  There is
also a [C implementation] and a Go implementation (ref needed) of FSDB.

Getting started documentation is below, but also see the [full
documentation] over on readthedocs.

[FSDB]: https://www.isi.edu/~johnh/SOFTWARE/FSDB/
[C implementation]: https://github.com/hardaker/fsdb-clib
[full documentation]: https://fsdb.readthedocs.io/en/latest/
[command line tools]: https://fsdb.readthedocs.io/en/latest/tools/index.html

# Installation

Using pip (or pipx):

```
pip3 install pyfsdb
```

Or manually:

```
git clone git@github.com:gawseed/pyfsdb.git
cd pyfsdb
python3 setup.py build
python3 setup.py install
```

# Example Usage

The FSDB file format contains headers and footers that supplement the
data within a file.  Columns are most commonly tab-separated, but the
format can also wrap CSVs and other layouts (see the FSDB documentation
for full details).  The footers trace all the piped commands that were
used to create a file, thus documenting the history of its creation
within the file's own metadata.

## Example pyfsdb code for reading a file

Reading in row by row:

```
import pyfsdb
db = pyfsdb.Fsdb("myfile.fsdb")
print(db.column_names)
for row in db:
    print(row)
```

## Example FSDB file

```
#fsdb -F t col1 two andthree
1	key1	42.0
2	key2	123.0
```
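
To make the layout concrete, here is a minimal sketch that picks such a
file apart using only the standard library.  It is not how pyfsdb parses
files: `parse_fsdb` is a hypothetical helper, it only understands the
`-F t` (tab) separator, and the sample adds a trailing command-history
line of the kind shown in the pipeline output later in this README.

```python
import io


def parse_fsdb(stream):
    """Split a tab-separated FSDB stream into (columns, rows, trailer).

    Hypothetical helper for illustration; only `-F t` is understood.
    """
    tokens = stream.readline().split()
    if not tokens or tokens[0] != "#fsdb":
        raise ValueError("missing #fsdb header line")
    fields = tokens[1:]
    # Drop the "-F t" option pair; what remains are the column names.
    if fields[:2] == ["-F", "t"]:
        fields = fields[2:]
    columns = fields

    rows, trailer = [], []
    for line in stream:
        line = line.rstrip("\n")
        if line.startswith("#"):
            trailer.append(line)  # command-history comment lines
        elif line:
            rows.append(dict(zip(columns, line.split("\t"))))
    return columns, rows, trailer


sample = (
    "#fsdb -F t col1 two andthree\n"
    "1\tkey1\t42.0\n"
    "2\tkey2\t123.0\n"
    "#   | ./mydemo.py\n"
)
columns, rows, trailer = parse_fsdb(io.StringIO(sample))
print(columns)
print(rows[0])
print(trailer)
```

For real work use `pyfsdb.Fsdb`, which handles the other separators and
options that this sketch ignores.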

## Example pyfsdb code for writing a file

```
import pyfsdb
db = pyfsdb.Fsdb(out_file="myfile.fsdb")
db.out_column_names=('one', 'two')
db.append([4, 'hello world'])
db.close()
```
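
As a rough picture of what ends up on disk, the sketch below writes the
same row by hand in the tab-separated layout.  It is an approximation
for illustration only: `write_fsdb` and the command name `my_script.py`
are hypothetical, and the exact header and trailer text that pyfsdb
emits may differ.

```python
import io


def write_fsdb(stream, columns, rows, command):
    """Write rows as tab-separated FSDB text (hypothetical helper)."""
    # Header: format marker, separator option, then the column names.
    stream.write("#fsdb -F t " + " ".join(columns) + "\n")
    for row in rows:
        stream.write("\t".join(str(value) for value in row) + "\n")
    # Footer: one comment line recording the command that produced the data.
    stream.write("#   | " + command + "\n")


out = io.StringIO()
write_fsdb(out, ("one", "two"), [[4, "hello world"]], "my_script.py")
print(out.getvalue(), end="")
```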

Read below for further usage details.

# Additional Usage Details

The real power of FSDB comes from the suites of tools that all
exchange FSDB formatted files.  This allows chaining multiple
commands together to achieve a goal.  Though the original base set of
tools is written in perl, you don't need to know perl to use most of
them.

## Let's create a ./mydemo.py script:

``` python
#!/usr/bin/env python3
import sys
import pyfsdb

db = pyfsdb.Fsdb(file_handle=sys.stdin, out_file_handle=sys.stdout)
value_column = db.get_column_number('value')

for row in db:     # reads a row from the input stream
    row[value_column] = float(row[value_column]) * 2
    db.append(row) # sends the row to the output stream

db.close()
```

And then feed it this file, saved as `test.fsdb`:

```
#fsdb -F t col1 value
1	42.0
2	123.0
```

We can run it like this:

``` sh
# cat test.fsdb | ./mydemo.py
#fsdb -F t col1 value
1	84.0
2	246.0
#   | ./mydemo.py
```

Or chain it together with multiple FSDB commands:

```
# cat test.fsdb | ./mydemo.py | dbcolstats value | dbcol mean stddev sum min max | dbfilealter -R C
#fsdb -R C mean stddev sum min max
mean: 165
stddev: 114.55
sum: 330
min: 84
max: 246
#   | ./mydemo.py
#   | dbcolstats value
#   | dbcol mean stddev sum min max
#   | dbfilealter -R C
```
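
The figures in that output can be checked by hand.  The sketch below
recomputes them with Python's `statistics` module from the two doubled
values; it assumes `dbcolstats` reports the sample standard deviation
(dividing by n - 1), which is an assumption but matches the 114.55
shown above.

```python
import statistics

values = [84.0, 246.0]  # the doubled 'value' column from the example

summary = {
    "mean": statistics.mean(values),
    "stddev": round(statistics.stdev(values), 2),  # sample std dev (n - 1)
    "sum": sum(values),
    "min": min(values),
    "max": max(values),
}
for name, value in summary.items():
    print(f"{name}: {value:g}")
```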

# Command line tools included

All the command line utilities that come with `pyfsdb` start with `p`
by convention so as not to conflict with the utilities from the perl
package.  The leading `p` also signals that the CLI arguments differ
(e.g. the python versions allow file names to be specified on the
command line, and most keys must be passed with a `-k` flag).

## Data processing tools

- pdbrow: select rows based on logic criteria
- pdbroweval: modify rows based on python code
- pdbtopn: given a key and a value column, print the top N rows with
  unique keys and the highest values.
- pdbaugment: a fast way to merge two fsdb files, where one is stored
  entirely in memory for speed.  To stay fast, it does not sort the
  data, unlike other tools.
- pdbcoluniq: find all unique values of a key column, optionally with
  counting.  Requires no sorting (unlike dbrowuniq) at the cost of
  greater memory usage.
- pdbzerofill: fills a column with zeros if the value is otherwise blank
- pdbkeyedsort: sorts a potentially large file that is already
  "mostly" sorted by performing a double-pass on reading it.  This
  will be less and less efficient the more random the rows are in
  order.
- pdbfullpivot: description TBD
- pdbreescape: regex-escapes the data in a column so it can be used
  safely in a pattern
- pdbensure:
- pdbcdf: performs cdf analysis on a column
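
The trade-off described for `pdbcoluniq` (no sorting, more memory) can
be illustrated with a hash-based counter.  This is a sketch of the
general technique, not the tool's actual implementation; `count_unique`
and the sample rows are made up for the example.

```python
from collections import Counter


def count_unique(rows, key_index):
    """Count occurrences of each key in one pass, without sorting.

    Every distinct key is held in memory, which is the cost of
    skipping the sort.
    """
    counts = Counter()
    for row in rows:
        counts[row[key_index]] += 1
    return counts


rows = [("a", "1"), ("b", "2"), ("a", "3"), ("c", "4"), ("b", "5")]
counts = count_unique(rows, 0)
for key, n in counts.items():
    print(f"{key}\t{n}")
```

A sort-based approach only needs to remember the previous row, but it
requires the input to be ordered by key first.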

## Conversion tools

- bro2fsdb: converts a [zeek/bro](https://zeek.org) log into an fsdb
- json2fsdb: converts a json file to fsdb
- fsdb2json: converts an fsdb file to json
- pdb2tex: converts an fsdb file to a latex table
- pdbformat: generically formats each row according to a python column
  specifier
- pdbsplitter: splits a FSDB file into multiple sub-files based on a
  column set
- pdbdatetoepoch: converts columns from a date string to an integer
  epoch column
- pdbepochtodate: formats a unix epoch seconds date to human readable
- pdbnormalize: normalizes a column to a limited range
- pdbsum: tbd
- pdbj2: formats results based on a jinja2 template
- pdb2sql: converts an fsdb file into an sqlite3 database
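
As an illustration of the kind of conversion `pdbdatetoepoch` and
`pdbepochtodate` perform, here is a small round trip between a date
string and integer epoch seconds.  The helper names, the date format,
and the choice of UTC are all assumptions made for this sketch; the
tools' own formats and options may differ.

```python
from datetime import datetime, timezone

DATE_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed format, for illustration only


def date_to_epoch(text, fmt=DATE_FORMAT):
    """Interpret a date string as UTC and return integer epoch seconds."""
    parsed = datetime.strptime(text, fmt).replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())


def epoch_to_date(seconds, fmt=DATE_FORMAT):
    """Format integer epoch seconds as a UTC date string."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime(fmt)


epoch = date_to_epoch("2024-08-23 15:38:41")
print(epoch)
print(epoch_to_date(epoch))
```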

## Graphical utilities

- pdbheatmap: creates a heat map based on incoming data columns
- pdbroc: creates a ROC graph for incoming fsdb data


# Author

Wes Hardaker @ USC/ISI

# See also

The FSDB website and manual page for the original perl module:

https://www.isi.edu/~johnh/SOFTWARE/FSDB/

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "pyfsdb",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.6",
    "maintainer_email": null,
    "keywords": null,
    "author": null,
    "author_email": "Wes Hardaker <opensource@hardakers.net>",
    "download_url": "https://files.pythonhosted.org/packages/ef/3d/cf6d3679e6094c3e29d5bad21171c6b086294f9c64105d1d1a67edcce8e0/pyfsdb-2.3.6.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT License\n        \n        Copyright (c) 2019-2024 University of Southern California, Information Sciences Institute\n        \n        Permission is hereby granted, free of charge, to any person obtaining a copy\n        of this software and associated documentation files (the \"Software\"), to deal\n        in the Software without restriction, including without limitation the rights\n        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell\n        copies of the Software, and to permit persons to whom the Software is\n        furnished to do so, subject to the following conditions:\n        \n        The above copyright notice and this permission notice shall be included in all\n        copies or substantial portions of the Software.\n        \n        THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\n        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\n        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\n        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\n        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\n        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\n        SOFTWARE.",
    "summary": "A python implementation of the flat-file streaming database",
    "version": "2.3.6",
    "project_urls": {
        "Homepage": "https://github.com/gawseed/pyfsdb"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "5bc8d26d204015f934c4050493ba12a4e0d5f42d2fe0af360b7a863449c112a6",
                "md5": "2bebcc1c4530a153160278a8ab920679",
                "sha256": "4a3e8159349f8686d10d841711f993591e5283131ee13988aff93328209ca15b"
            },
            "downloads": -1,
            "filename": "pyfsdb-2.3.6-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "2bebcc1c4530a153160278a8ab920679",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.6",
            "size": 79961,
            "upload_time": "2024-08-23T15:38:38",
            "upload_time_iso_8601": "2024-08-23T15:38:38.814894Z",
            "url": "https://files.pythonhosted.org/packages/5b/c8/d26d204015f934c4050493ba12a4e0d5f42d2fe0af360b7a863449c112a6/pyfsdb-2.3.6-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "ef3dcf6d3679e6094c3e29d5bad21171c6b086294f9c64105d1d1a67edcce8e0",
                "md5": "436b4b38a227693f281abe77ca925d78",
                "sha256": "da8c2d13d018906d7ad778e90aefdfb787fa5f3d54d443c7f966f896e0c78e71"
            },
            "downloads": -1,
            "filename": "pyfsdb-2.3.6.tar.gz",
            "has_sig": false,
            "md5_digest": "436b4b38a227693f281abe77ca925d78",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.6",
            "size": 46884,
            "upload_time": "2024-08-23T15:38:41",
            "upload_time_iso_8601": "2024-08-23T15:38:41.741456Z",
            "url": "https://files.pythonhosted.org/packages/ef/3d/cf6d3679e6094c3e29d5bad21171c6b086294f9c64105d1d1a67edcce8e0/pyfsdb-2.3.6.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-08-23 15:38:41",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "gawseed",
    "github_project": "pyfsdb",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "pyfsdb"
}
        