# clean your CSVs!
This command line tool cleans CSV files by:

1. detecting the encoding and converting it to UTF-8
2. detecting the delimiter and safely converting it to a comma
3. casting all values to their JSON form, i.e. integers, floats, booleans, strings or null
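
To make step 3 concrete, here is a minimal sketch of how a raw cell could be cast to its JSON form. It is not the tool's actual implementation: `to_json_literal` and `clean_row` are hypothetical names, and the rules for empty cells and for booleans are assumptions.

```python
import json
import math


def to_json_literal(cell: str) -> str:
    """Render one raw CSV cell as a JSON literal (hypothetical helper)."""
    text = cell.strip()
    if text == "":
        return "null"  # assumption: empty cells become null
    if text.lower() in ("true", "false"):
        return text.lower()  # assumption: booleans match case-insensitively
    try:
        return json.dumps(int(text))
    except ValueError:
        pass
    try:
        number = float(text)
    except ValueError:
        return json.dumps(text)  # anything else stays a quoted string
    # NaN and infinity are not valid JSON numbers, so keep them as strings
    return json.dumps(number if math.isfinite(number) else text)


def clean_row(cells):
    """Join the JSON literals of one row with commas."""
    return ",".join(to_json_literal(cell) for cell in cells)


print(clean_row(["1", "2.5", "TRUE", "hello, world", ""]))
```

Because strings are quoted and escaped by `json.dumps`, a comma inside a cell can no longer be mistaken for a delimiter.
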
* install with `pip install csv-bleach`
* and run like `python -m run csv_bleach my-data.csv`

The only option is the output file name; by default it is your original file name with the `.scsv` extension.
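
As an illustration of the default naming rule, the output name can be derived from the input name like this. `default_output_name` is a hypothetical helper written for this example, not part of the tool.

```python
from pathlib import Path


def default_output_name(input_name: str) -> str:
    """Swap the input file's extension for .scsv (hypothetical helper)."""
    return str(Path(input_name).with_suffix(".scsv"))


print(default_output_name("my-data.csv"))
```
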
You will now be able to parse your CSV safely with a simple script like:
```python
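import json


def parse_row(text):
    # Each line is a comma-separated list of JSON values, so wrapping it
    # in brackets turns it into a valid JSON array.
    return json.loads(f"[{text}]")


def parse_file(file):
    rows = map(parse_row, file)
    header = next(rows)  # the first row holds the column names
    for row in rows:
        yield dict(zip(header, row))


with open("my-data.scsv", encoding="utf-8") as f:
    for item in parse_file(f):
        print(item)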
```
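
To see why this is safe, the same parser can be run on an in-memory sample. The sample content is made up for illustration and assumes that header cells are written as quoted JSON strings; the point is that a comma inside a quoted value stays inside one field and each value comes back with its JSON type.

```python
import io
import json


def parse_row(text):
    return json.loads(f"[{text}]")


def parse_file(file):
    rows = map(parse_row, file)
    header = next(rows)
    for row in rows:
        yield dict(zip(header, row))


# Hypothetical cleaned content: one header line and one data line.
sample = io.StringIO(
    '"id","name","score","active","note"\n'
    '1,"Smith, Jane",9.5,true,null\n'
)

records = list(parse_file(sample))
print(records)
# [{'id': 1, 'name': 'Smith, Jane', 'score': 9.5, 'active': True, 'note': None}]
```
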
Raw data
{
"_id": null,
"home_page": "https://github.com/gecBurton/csv-bleach",
"name": "csv-bleach",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.8.1",
"maintainer_email": "",
"keywords": "csv",
"author": "George Burton",
"author_email": "g.e.c.burton@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/2d/47/0f9e2af34700c2048509dfa4e35706f37de565a456bb2211072eb40a2d03/csv_bleach-0.2.3.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "clean CSVs",
"version": "0.2.3",
"split_keywords": [
"csv"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "29981af97c4e20e22367334a0d0a352a6d2f7035f3df578dcefeecf71ad45f50",
"md5": "9163f975188407aea5b9e52e497d2237",
"sha256": "2e021b2c8142d0b083dbabb7b9b77203e5f4d0e3b10f5cfe11d712a67ecb6de4"
},
"downloads": -1,
"filename": "csv_bleach-0.2.3-py3-none-any.whl",
"has_sig": false,
"md5_digest": "9163f975188407aea5b9e52e497d2237",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8.1",
"size": 6687,
"upload_time": "2023-01-23T14:22:20",
"upload_time_iso_8601": "2023-01-23T14:22:20.464553Z",
"url": "https://files.pythonhosted.org/packages/29/98/1af97c4e20e22367334a0d0a352a6d2f7035f3df578dcefeecf71ad45f50/csv_bleach-0.2.3-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "2d470f9e2af34700c2048509dfa4e35706f37de565a456bb2211072eb40a2d03",
"md5": "976bc39fb0007b317137cc29800a7ef6",
"sha256": "1b14cdf085c252240ccf3fab8d7a396952d492cb3181eb8cdf58ce357c7f778b"
},
"downloads": -1,
"filename": "csv_bleach-0.2.3.tar.gz",
"has_sig": false,
"md5_digest": "976bc39fb0007b317137cc29800a7ef6",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8.1",
"size": 5104,
"upload_time": "2023-01-23T14:22:22",
"upload_time_iso_8601": "2023-01-23T14:22:22.044422Z",
"url": "https://files.pythonhosted.org/packages/2d/47/0f9e2af34700c2048509dfa4e35706f37de565a456bb2211072eb40a2d03/csv_bleach-0.2.3.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-01-23 14:22:22",
"github": true,
"gitlab": false,
"bitbucket": false,
"github_user": "gecBurton",
"github_project": "csv-bleach",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "csv-bleach"
}