# Installation
Depending on whether you want only this tool, the full set of PNU tools, or PNU plus a selection of additional third-party tools, use one of these commands:
pip install [pnu-adsv](https://pypi.org/project/pnu-adsv/)
<br>
pip install [PNU](https://pypi.org/project/PNU/)
<br>
pip install [pytnix](https://pypi.org/project/pytnix/)
# ADSV(1)
## NAME
adsv - Analyze delimiter-separated values files
## SYNOPSIS
**adsv**
\[-d|--delimiter CHAR\]
\[-e|--encoding STRING\]
\[-f|--fields LIST\]
\[-F|--flatten\]
\[-h|--hide INT\]
\[-m|--min INT\]
\[-M|--max INT\]
\[-t|--top INT\]
\[--debug\]
\[--help|-?\]
\[--version\]
\[--\]
filename
\[...\]
## DESCRIPTION
The **adsv** utility analyzes [delimiter-separated values](https://en.wikipedia.org/wiki/Delimiter-separated_values) files, such as [Comma-Separated Values .csv](https://en.wikipedia.org/wiki/Comma-separated_values) or [Tab-Separated Values .tsv](https://en.wikipedia.org/wiki/Tab-separated_values) files, and either prints information about their structure and the data in each of their fields, or prints a selection of fields in the order requested.
The information gathered is:
* for the file:
    * the character set encoding
    * the [CSV dialect](https://specs.frictionlessdata.io/csv-dialect/) (the characters used for delimiting, quoting, escaping and line termination, plus whether double quoting is used)
    * whether a header line is present
    * the number of lines and fields
* for each field:
    * its number and header
    * the number of distinct values
    * the value type (string, integer, floating-point number, complex number, or date and time, whatever the format)
    * the values by descending count
    * the range of values in ascending order, using the detected type (useful for numbers and dates)
When analyzing a DSV dataset, this gives a quick, automated overview of the contents and helps you spot any oddities.
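
To make the kind of information listed above more concrete, here is a minimal sketch built only on Python's standard `csv.Sniffer` and `collections.Counter`. It is not taken from the **adsv** source: the `guess_type` and `analyze` helpers and the sample data are hypothetical, and date detection as well as encoding detection are left out.

```python
import csv
import io
from collections import Counter


def guess_type(values):
    """Name the narrowest numeric type that parses every non-empty value."""
    filled = [value for value in values if value != ""]
    if not filled:
        return "string"
    for name, parse in (("integer", int), ("float", float), ("complex", complex)):
        try:
            for value in filled:
                parse(value)
        except ValueError:
            continue
        return name
    return "string"


def analyze(text):
    """Return file-level and per-field information for a DSV text sample."""
    sniffer = csv.Sniffer()
    dialect = sniffer.sniff(text)          # guesses delimiter, quoting, etc.
    has_header = sniffer.has_header(text)  # heuristic, may be wrong
    rows = list(csv.reader(io.StringIO(text), dialect))
    header, data = (rows[0], rows[1:]) if has_header else (None, rows)
    fields = []
    for index in range(len(rows[0])):
        values = [row[index] for row in data]
        counts = Counter(values)
        fields.append({
            "number": index + 1,           # fields are numbered from 1
            "header": header[index] if header else None,
            "distinct": len(counts),
            "type": guess_type(values),
            "by_count": counts.most_common(),  # descending count
        })
    return {"delimiter": dialect.delimiter, "has_header": has_header,
            "lines": len(rows), "fields": fields}


sample = "name;age;city\nAlice;30;Paris\nBob;25;Lyon\nCarol;41;Paris\n"
report = analyze(sample)
for field in report["fields"]:
    print(field["number"], field["header"], field["type"], field["by_count"])
```

The sniffer works from heuristics, so on small or irregular samples its guesses can be wrong, which is one reason to have options that override the detection.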
There are options:
* to control and limit what is printed (*-h|--hide*, *-m|--min*, *-M|--max* and *-t|--top*),
* to skip (or correct) the detection of the character set encoding and delimiter (*-d|--delimiter*, *-e|--encoding*):
    * character set detection can take a long time with big files, so if you know that the file is in "Windows-1252" or "utf-8" encoding, it is quicker to say so.
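
If you are unsure which of a few likely encodings a file uses, a cheap manual check can tell you what to pass to *-e|--encoding*. The sketch below is an assumption about how one might do that in Python. It is not how **adsv** detects encodings, and `decode_with_fallback` is a hypothetical helper.

```python
def decode_with_fallback(data, encodings=("utf-8", "cp1252")):
    """Try each candidate encoding in order and return (text, encoding name)."""
    for name in encodings:
        try:
            return data.decode(name), name
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings fits the data")


# "cp1252" is Python's codec name for Windows-1252.
text, encoding = decode_with_fallback("café;1\n".encode("cp1252"))
print(encoding)
```

Trying "utf-8" first is deliberate: bytes that are not valid UTF-8 fail quickly, whereas Windows-1252 accepts almost any byte sequence and would mask the difference.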
If you use the *-f|--fields* option, the file analysis is skipped and the selected fields are printed instead, in the order requested, using the detected delimiting, quoting and escaping characters.
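
The field list syntax (for example `6,2-4,1`) can be read as a sequence of 1-based field numbers and inclusive ranges. The following sketch shows one plausible interpretation. The helper names are hypothetical, and how **adsv** itself treats reversed ranges or out-of-range numbers is not covered here.

```python
def parse_field_list(spec):
    """Expand a list such as "6,2-4,1" into [6, 2, 3, 4, 1]."""
    fields = []
    for part in spec.split(","):
        if "-" in part:
            first, last = part.split("-", 1)
            fields.extend(range(int(first), int(last) + 1))  # inclusive range
        else:
            fields.append(int(part))
    return fields


def select_fields(row, fields):
    """Pick the requested fields from a row, numbering fields from 1."""
    return [row[number - 1] for number in fields]


row = ["a", "b", "c", "d", "e", "f"]
print(select_fields(row, parse_field_list("6,2-4,1")))
```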
If you encounter multi-line fields and want to "flatten" them to single lines, use the *-F|--flatten* option.
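
As an illustration of what flattening means, the sketch below reads quoted multi-line fields with the standard `csv` module and joins their lines with a single space. The choice of a space as the replacement is an assumption, as is the `flatten_rows` helper. The exact output of *-F|--flatten* may differ.

```python
import csv
import io


def flatten_rows(text, delimiter=","):
    """Yield rows whose fields have their embedded line breaks replaced by spaces."""
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    for row in reader:
        yield [" ".join(field.splitlines()) for field in row]


sample = 'id,comment\n1,"first line\nsecond line"\n2,single\n'
for row in flatten_rows(sample):
    print(row)
```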
### OPTIONS
Options | Use
------- | ---
-d\|--delimiter CHAR|Specify delimiter to be CHAR
-e\|--encoding STRING|Specify charset encoding to be STRING (because detecting encoding can take a long time!)
-f\|--fields LIST|Extract the values of the LISTed fields in the given order (e.g. 6,2-4,1, with fields numbered from 1)
-F\|--flatten|Flatten multi-line fields into single lines
-h\|--hide INT|Hide the display of distinct values above INT % (default is 20%)
-m\|--min INT|Only display distinct values whose count >= INT (default is to display all distinct values)
-M\|--max INT|Only display INT lines of distinct values (default is to display all distinct values, within the hide limit)
-t\|--top INT|Only display the top/bottom INT lines of values (default is to display the 5 bottom and top lines)
--debug|Enable debug mode
--help\|-?|Print usage and a short help message and exit
--version|Print version and exit
--|Options processing terminator
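
The limiting options can be pictured as simple filters over a list of (value, count) pairs. The sketch below is only one reading of the *-m|--min*, *-M|--max* and *-t|--top* descriptions in the table. The function names are hypothetical, and *-h|--hide* is deliberately left out because its exact rule is not spelled out here.

```python
from collections import Counter


def limit_counts(counts, min_count=1, max_lines=None):
    """Keep values seen at least min_count times, then at most max_lines of them."""
    kept = [(value, count) for value, count in counts.most_common()
            if count >= min_count]
    return kept if max_lines is None else kept[:max_lines]


def top_and_bottom(sorted_values, top=5):
    """Return the first and last `top` items of an already sorted list."""
    if len(sorted_values) <= 2 * top:
        return sorted_values, []  # short enough to show in full
    return sorted_values[:top], sorted_values[-top:]


counts = Counter("aaabbc")
print(limit_counts(counts, min_count=2))
print(top_and_bottom(list(range(1, 21)), top=3))
```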
## ENVIRONMENT
The ADSV_DEBUG environment variable can also be set to any value to enable debug mode.
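
Since any value enables debug mode, the check presumably depends on the variable being present rather than on its content. A minimal sketch of that logic, assuming a presence test (the `debug_enabled` helper is hypothetical and not taken from the **adsv** source):

```python
import os


def debug_enabled(debug_flag=False, environ=None):
    """Debug mode is on if --debug was given or ADSV_DEBUG is set at all."""
    if environ is None:
        environ = os.environ
    return debug_flag or "ADSV_DEBUG" in environ


print(debug_enabled(environ={"ADSV_DEBUG": "1"}))
```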
## EXIT STATUS
The **adsv** utility exits 0 on success, and >0 if an error occurs.
## SEE ALSO
[cut(1)](https://www.freebsd.org/cgi/man.cgi?query=cut),
[file(1)](https://www.freebsd.org/cgi/man.cgi?query=file)
## STANDARDS
The **adsv** utility is not a standard UNIX command.
This implementation tries to follow the [PEP 8](https://www.python.org/dev/peps/pep-0008/) style guide for [Python](https://www.python.org/) code.
The DSV dialects that can be handled are those compatible with [RFC 4180: Common Format and MIME Type for Comma-Separated Values (CSV) Files](https://www.rfc-editor.org/rfc/rfc4180).
## PORTABILITY
Tested OK under Windows.
## HISTORY
This implementation was made for the [PNU project](https://github.com/HubTou/PNU).
I do this kind of analysis with each dataset I have to work with.
The last time I did it, I decided that it was about time to fully automate the process, especially as I was working with fields containing multi-line values.
## LICENSE
It is available under the [3-clause BSD license](https://opensource.org/licenses/BSD-3-Clause).
## AUTHORS
[Hubert Tournier](https://github.com/HubTou)
## CAVEATS
Using "Sep=X" as a first line in order to set the X character as a delimiter is not supported.
Comment lines inside the data (as found, for example, in */etc/passwd* files under Unix) are not supported either, but they are not part of any recognized DSV dialect anyway.
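
For files that do start with a "Sep=X" line, one possible workaround is to strip that line beforehand and pass the character to *-d|--delimiter*. The sketch below shows the idea on a list of lines. The `strip_sep_line` helper is hypothetical and not part of **adsv**.

```python
import re

# Matches a first line such as "sep=;" or "Sep=," with an optional line ending.
SEP_LINE = re.compile(r"^sep=(.)\r?\n?$", re.IGNORECASE)


def strip_sep_line(lines):
    """Return (delimiter or None, lines without a leading "Sep=X" line)."""
    if lines:
        match = SEP_LINE.match(lines[0])
        if match:
            return match.group(1), lines[1:]
    return None, lines


delimiter, data = strip_sep_line(["Sep=;\n", "name;age\n", "Alice;30\n"])
print(delimiter, data)
```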
Raw data
{
"_id": null,
"home_page": "https://github.com/HubTou/adsv/",
"name": "pnu-adsv",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.6",
"maintainer_email": "",
"keywords": "pnu-project",
"author": "Hubert Tournier",
"author_email": "hubert.tournier@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/44/b4/eda89a8a53bec9fa352f5aa63385948db7043b195c6b919bb1a12eae84c6/pnu_adsv-1.0.1.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "BSD 3-Clause License",
"summary": "Analyze delimiter-separated values files",
"version": "1.0.1",
"split_keywords": [
"pnu-project"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "512da216ed1dbda69a56cb250e0486598a81a21a49a50b1d3e66e29b7283d446",
"md5": "62314d061ccd73145e5f95a79c7cf87c",
"sha256": "94882759d04de5d2971cdfede3a172d4b73d74bc02ad7d662dcbca9614fcc520"
},
"downloads": -1,
"filename": "pnu_adsv-1.0.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "62314d061ccd73145e5f95a79c7cf87c",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.6",
"size": 14414,
"upload_time": "2023-01-23T21:22:05",
"upload_time_iso_8601": "2023-01-23T21:22:05.657670Z",
"url": "https://files.pythonhosted.org/packages/51/2d/a216ed1dbda69a56cb250e0486598a81a21a49a50b1d3e66e29b7283d446/pnu_adsv-1.0.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "44b4eda89a8a53bec9fa352f5aa63385948db7043b195c6b919bb1a12eae84c6",
"md5": "763bc0eb26d9d329afa6490a8e4a4018",
"sha256": "d520b21008cfe22fb57f8e5c1e532a3bac65aeaae2429d3629993888c2517823"
},
"downloads": -1,
"filename": "pnu_adsv-1.0.1.tar.gz",
"has_sig": false,
"md5_digest": "763bc0eb26d9d329afa6490a8e4a4018",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.6",
"size": 15750,
"upload_time": "2023-01-23T21:22:07",
"upload_time_iso_8601": "2023-01-23T21:22:07.426927Z",
"url": "https://files.pythonhosted.org/packages/44/b4/eda89a8a53bec9fa352f5aa63385948db7043b195c6b919bb1a12eae84c6/pnu_adsv-1.0.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-01-23 21:22:07",
"github": true,
"gitlab": false,
"bitbucket": false,
"github_user": "HubTou",
"github_project": "adsv",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "pnu-adsv"
}