| Name | lico |
| Version | 0.1.3 |
| home_page | |
| Summary | List processing with csv files |
| upload_time | 2023-04-13 12:59:53 |
| maintainer | |
| docs_url | None |
| author | sjoerdk |
| requires_python | >=3.9,<4.0 |
| license | MIT |
| keywords | |
| VCS | |
| bugtrack_url | |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |
# lico
List comb. For quick-and-dirty operations on each row of a csv file.
Handles boilerplate code for IO, error handling, and printing progress.
Optimized for single-use operations on smaller (< millions) csv files in noisy environments.
## Features
* Free software: MIT license
* Read and write CSV files
* Run custom operations for each row
* Handles errors and existing results
## Installation
```
pip install lico
```
## Usage
### Basic example
```
from lico.io import Task
from lico.operations import Concatenate
# concatenate column 1 and 2 in input.csv, write to output
Task(input='input.csv',
     operation=Concatenate(['col1', 'col2']),
     output='output.csv').run()
```
### Defining operations
```
from lico.core import Operation


# first of all, subclass lico.core.Operation
class MyOperation(Operation):
    def apply(self, row):
        """This method gets called on each row"""
        old_value = row['column1']  # access values like dict
        new_value = any_function(old_value)  # placeholder for your own logic
        return {'new_column': new_value}  # new value(s)
        # 'new_column' is appended to existing columns in output
```
### Skipping rows
There are two ways to tell lico to skip a row: overriding `Operation.has_previous_result()` and raising `RowProcessError`.
```
from lico.core import Operation
from lico.exceptions import RowProcessError


class MyOperation(Operation):
    def apply(self, row):
        if row['col1'] == '0':
            raise RowProcessError  # lico will skip the current row
        return {'result': 'a_result'}

    def has_previous_result(self, row):
        """If the column 'result' contains anything, skip this row"""
        return bool(row.get('result'))
```
## Built-in error handling
Beyond skipping rows with previous results or a `RowProcessError`, there are other ways in which lico
makes processing more robust:
* Trying to access a non-existent column in `Operation.apply()` raises an error and automatically skips that row
* Output of `Task.run()` will always have the same number of rows as the input. If an unhandled exception occurs during `Task.run()`, lico will stop processing but still write all results obtained
so far. The unprocessed rows will be in the output unmodified.
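The guarantees listed above can be pictured with a plain-Python loop. This is a minimal sketch of the described behaviour and not lico's actual implementation: `run_rows`, `SkipRow` and `add_length` are hypothetical names standing in for `Task.run()`, `RowProcessError` and an `Operation`, so the snippet runs without lico installed.

```python
class SkipRow(Exception):
    """Hypothetical stand-in for lico's RowProcessError."""


def run_rows(rows, apply, has_previous_result=lambda row: False):
    """Apply `apply` to each row dict; always return len(rows) rows."""
    output = []
    for index, row in enumerate(rows):
        if has_previous_result(row):
            output.append(dict(row))  # already done: keep as-is
            continue
        try:
            result = apply(row)
        except (SkipRow, KeyError):
            # Deliberate skip or missing column: keep the row, move on
            output.append(dict(row))
            continue
        except Exception:
            # Unhandled error: stop processing, pass the rest through unmodified
            output.extend(dict(r) for r in rows[index:])
            break
        output.append({**row, **result})  # new columns are appended
    return output


def add_length(row):
    """Made-up operation: add the length of 'name' as text."""
    if row["name"] == "":
        raise SkipRow
    if row["name"] == "boom":
        raise RuntimeError("simulated server failure")
    return {"length": str(len(row["name"]))}


rows = [
    {"name": "alice"},
    {"name": ""},        # skipped via SkipRow
    {"other": "x"},      # skipped: no 'name' column
    {"name": "boom"},    # unhandled error stops processing here
    {"name": "carol"},   # never processed, written unmodified
]
processed = run_rows(rows, add_length)
```

Whatever happens inside `add_length`, `processed` has as many rows as `rows`, which is what makes a failed run safe to inspect and re-run.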
## Logging
Lico logs under the top-level logger name `lico`. To print log messages, put this in your code:
```
import logging
logging.basicConfig(level=logging.DEBUG)
```
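`basicConfig(level=logging.DEBUG)` also turns on debug output for every other library. Assuming lico's loggers all live under the name `lico` as stated above, a sketch that leaves the rest of the application at `WARNING` and only raises the level for lico could look like this. The child logger name `lico.io` is an assumption used for illustration.

```python
import logging

# Attach a handler to the root logger, but keep other libraries quiet
logging.basicConfig(level=logging.WARNING)

# Only loggers named 'lico' or 'lico.<something>' emit DEBUG messages
logging.getLogger("lico").setLevel(logging.DEBUG)

# Assumed child logger name; it inherits DEBUG from 'lico'
lico_child = logging.getLogger("lico.io")
```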
## CSV structure
The idea is to keep CSVs as simple and unambiguous as possible. Therefore:
* All csv values are text. No interpreting things as ints. Too many operations
have been messed up by truncating leading zeros etc.
* A csv header row is required, and column headers are treated as unique keys
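The leading-zero problem is easy to reproduce with the standard library. The sketch below is independent of lico and uses made-up data: `csv.DictReader` returns every value as `str`, which preserves an id such as `007`, while a round trip through `int` silently changes it.

```python
import csv
import io

# In-memory csv with a header row; the ids carry leading zeros
raw = "legacy_id,name\n007,bond\n042,adams\n"

rows = list(csv.DictReader(io.StringIO(raw)))
text_ids = [row["legacy_id"] for row in rows]  # values stay text

# Interpreting the ids as ints and writing them back loses information
int_round_trip = [str(int(value)) for value in text_ids]

print(text_ids)        # ['007', '042']
print(int_round_trip)  # ['7', '42']
```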
## Why?
Situations in which lico might speed up your work:
* Here is a csv file of ~1000 rows including `legacy id`
* Can we find `new id` for each of these legacy ids and also add `datapoint` based on `new id`?
* We don't know whether `legacy id` is valid in all cases. Or at all.
* This whole procedure is just to 'get an idea'. Just for exploration
There are many ways to approach this. Mine is usually to get rid of Excel by parsing the data into a flat
csv file and then using a combination of a text editor and bash magic for merging and sorting. Intermediate
steps are saved for auditing.
However, for certain operations such as interacting with servers this is not enough. I then tend to use python.
This is more powerful but also creates overhead. Many of these tasks are single-use. Each time I have to slightly
modify the same code: read in csv, do something, handle errors, write output.
lico tries to get rid of that boilerplate code as much as possible.
Raw data
{
"_id": null,
"home_page": "",
"name": "lico",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.9,<4.0",
"maintainer_email": "",
"keywords": "",
"author": "sjoerdk",
"author_email": "sjoerd.kerkstra@radboudumc.nl",
"download_url": "https://files.pythonhosted.org/packages/ba/f3/cf8d003978b5ecb9f7b07f8064d2ec240faef5245e382e298cba985e6adf/lico-0.1.3.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "List processing with csv files",
"version": "0.1.3",
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "518544b72d89531252083b64dd4b4e0ffa7bff1023c698105279dd06df2b9ff1",
"md5": "70960b687d7fb12ed18e3d83281e3df8",
"sha256": "7b7092cf40d73a82d27e9d929876caaaa7c7e35368d516a65d7fc0c461e4c3e4"
},
"downloads": -1,
"filename": "lico-0.1.3-py3-none-any.whl",
"has_sig": false,
"md5_digest": "70960b687d7fb12ed18e3d83281e3df8",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.9,<4.0",
"size": 9356,
"upload_time": "2023-04-13T12:59:51",
"upload_time_iso_8601": "2023-04-13T12:59:51.637918Z",
"url": "https://files.pythonhosted.org/packages/51/85/44b72d89531252083b64dd4b4e0ffa7bff1023c698105279dd06df2b9ff1/lico-0.1.3-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "baf3cf8d003978b5ecb9f7b07f8064d2ec240faef5245e382e298cba985e6adf",
"md5": "6a1cee18d1198941e81cfcb35c5e36e8",
"sha256": "181022e0080ee0e86f6731deff358f8f4f975948b66a064c7e28b3130e2354fa"
},
"downloads": -1,
"filename": "lico-0.1.3.tar.gz",
"has_sig": false,
"md5_digest": "6a1cee18d1198941e81cfcb35c5e36e8",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9,<4.0",
"size": 9913,
"upload_time": "2023-04-13T12:59:53",
"upload_time_iso_8601": "2023-04-13T12:59:53.726757Z",
"url": "https://files.pythonhosted.org/packages/ba/f3/cf8d003978b5ecb9f7b07f8064d2ec240faef5245e382e298cba985e6adf/lico-0.1.3.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-04-13 12:59:53",
"github": false,
"gitlab": false,
"bitbucket": false,
"lcname": "lico"
}