pnu-adsv


Field | Value
----- | -----
Name | pnu-adsv
Version | 1.0.1
Home page | https://github.com/HubTou/adsv/
Summary | Analyze delimiter-separated values files
Upload time | 2023-01-23 21:22:07
Author | Hubert Tournier
Requires Python | >=3.6
License | BSD 3-Clause License
Keywords | pnu-project
Requirements | No requirements were recorded

# Installation
Depending on whether you want only this tool, the full set of PNU tools, or PNU plus a selection of additional third-party tools, use one of these commands:

pip install [pnu-adsv](https://pypi.org/project/pnu-adsv/)
<br>
pip install [PNU](https://pypi.org/project/PNU/)
<br>
pip install [pytnix](https://pypi.org/project/pytnix/)

# ADSV(1)

## NAME
adsv - Analyze delimiter-separated values files

## SYNOPSIS
**adsv**
\[-d|--delimiter CHAR\]
\[-e|--encoding STRING\]
\[-f|--fields LIST\]
\[-F|--flatten\]
\[-h|--hide INT\]
\[-m|--min INT\]
\[-M|--max INT\]
\[-t|--top INT\]
\[--debug\]
\[--help|-?\]
\[--version\]
\[--\]
filename
\[...\]

## DESCRIPTION
The **adsv** utility analyzes [delimiter-separated values](https://en.wikipedia.org/wiki/Delimiter-separated_values) files, such as  [Comma-Separated Values .csv](https://en.wikipedia.org/wiki/Comma-separated_values) or [Tab-Separated Values .tsv](https://en.wikipedia.org/wiki/Tab-separated_values) files, and either prints information about their structure and the data in each of their fields, or prints a selection of fields in the order requested.

The information gathered is:
* for the file:
  * the character set encoding
  * the [CSV dialect](https://specs.frictionlessdata.io/csv-dialect/) (the characters used for delimiting, quoting, escaping and terminating lines, plus whether double quoting is used)
  * whether a header line is present
  * the number of lines and fields
* for each field:
  * its number and header
  * the number of distinct values
  * the type of the values (strings, integers, floating-point numbers, complex numbers, or dates and times, whatever their format)
  * the values by descending count
  * the range of values in ascending order using the detected type (useful for numbers and dates)
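
As an illustration of the file-level items above, here is a minimal sketch using the `csv.Sniffer` class from the Python standard library. It is not necessarily how **adsv** performs its detection; the `sniff_dsv` helper name and the list of candidate delimiters are assumptions made for this example.

```python
import csv

def sniff_dsv(sample):
    """Guess the dialect and header presence of a DSV text sample.

    Hypothetical helper: adsv may use a different detection method.
    """
    sniffer = csv.Sniffer()
    # Restrict the search to a few common delimiters (an assumption).
    dialect = sniffer.sniff(sample, delimiters=",;\t|")
    return {
        "delimiter": dialect.delimiter,
        "quotechar": dialect.quotechar,
        "escapechar": dialect.escapechar,
        "doublequote": dialect.doublequote,
        "has_header": sniffer.has_header(sample),
    }

sample = "name;age;city\nAlice;30;Paris\nBob;25;Lyon\nCarol;41;Nice\n"
info = sniff_dsv(sample)
print(info["delimiter"], info["has_header"])
```

Sniffing is heuristic, so a small or irregular sample can lead to a wrong guess, which is one reason to allow the delimiter to be given explicitly.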

When analyzing a DSV dataset, this provides a quick, automated way to get an overview of the contents and to explore any oddities.
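
The per-field information can be approximated in a few lines of Python. The sketch below is only an illustration under stated assumptions: the `guess_type` and `summarize_field` names are hypothetical, date and time detection is left out, and the input is assumed to be a non-empty list of strings.

```python
from collections import Counter

def guess_type(values):
    """Return the narrowest of int, float, complex or str that fits all values."""
    for caster in (int, float, complex):
        try:
            for value in values:
                caster(value)
        except ValueError:
            continue  # at least one value does not fit, try a wider type
        return caster
    return str

def summarize_field(values):
    """Summarize one field (hypothetical helper, expects a non-empty list)."""
    counts = Counter(values)
    kind = guess_type(counts)
    # Complex numbers have no natural order, so they are sorted as text.
    key = str if kind is complex else kind
    ordered = sorted(counts, key=key)
    return {
        "distinct": len(counts),
        "type": kind.__name__,
        "by_count": counts.most_common(),   # descending count
        "range": (ordered[0], ordered[-1]),  # lowest and highest value
    }

ages = ["30", "25", "41", "25", "9"]
summary = summarize_field(ages)
print(summary["distinct"], summary["type"], summary["range"])
```

Sorting with the detected type matters: as plain text, "9" would sort after "41".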

There are options:
* to control and limit what is printed (*-h|--hide*, *-m|--min*, *-M|--max* and *-t|--top*),
* to bypass (or correct) the detection of the character set encoding and delimiter (*-d|--delimiter*, *-e|--encoding*):
  * character set detection can take a long time with big files, so if you know that the file is in "Windows-1252" or "utf-8" encoding, it is quicker to specify it.
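
To see why a known encoding is cheaper than a detected one, consider this rough sketch. It assumes a naive trial-and-error strategy over a hypothetical candidate list; **adsv** may well detect encodings differently. Every failed attempt has to scan the data before the next candidate is tried, whereas a known encoding needs a single pass.

```python
def guess_encoding(raw, candidates=("ascii", "utf-8", "cp1252")):
    """Return the first candidate encoding that decodes raw without error.

    Hypothetical helper: the candidate list and its order are assumptions.
    """
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue  # wasted pass over the data, try the next candidate
        return name
    return None

windows_bytes = "café;1\n".encode("cp1252")
utf8_bytes = "café;1\n".encode("utf-8")
print(guess_encoding(windows_bytes), guess_encoding(utf8_bytes))

# With a known encoding, a single decode call is enough.
text = windows_bytes.decode("cp1252")
```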

If you use the *-f|--fields* option, the file analysis is not printed; instead, the selected fields are printed in the order requested, using the detected delimiting, quoting and escaping characters.
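
The field list syntax (for example `6,2-4,1`, with fields numbered from 1) can be sketched as follows. The `parse_field_list` and `select_fields` helpers are hypothetical names written for this illustration, not the tool's actual code, and they do no error handling for malformed lists or out-of-range field numbers.

```python
import csv
import io

def parse_field_list(spec):
    """Expand a list such as '6,2-4,1' into [6, 2, 3, 4, 1]."""
    fields = []
    for part in spec.split(","):
        if "-" in part:
            first, last = part.split("-", 1)
            fields.extend(range(int(first), int(last) + 1))
        else:
            fields.append(int(part))
    return fields

def select_fields(text, spec, delimiter=","):
    """Return text with only the requested fields, in the requested order."""
    wanted = parse_field_list(spec)
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    for row in csv.reader(io.StringIO(text), delimiter=delimiter):
        # Field numbers start at 1, Python indexes start at 0.
        writer.writerow([row[number - 1] for number in wanted])
    return out.getvalue()

print(parse_field_list("6,2-4,1"))
print(select_fields("a,b,c\n1,2,3\n", "3,1-2"), end="")
```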

If you encounter multi-line fields and want to "flatten" them to single lines, use the *-F|--flatten* option.
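
Flattening can be pictured with the standard `csv` module, which already understands quoted fields that span several lines. In this sketch the `flatten_fields` name and the choice of a single space as the replacement for line breaks are assumptions; the text above does not say what **adsv** substitutes.

```python
import csv
import io

def flatten_fields(text, delimiter=",", replacement=" "):
    """Rewrite DSV text so that no field contains a line break."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    # newline="" keeps line breaks inside quoted fields untouched for the reader.
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    for row in reader:
        writer.writerow([replacement.join(field.splitlines()) for field in row])
    return out.getvalue()

multi_line = 'id,note\n1,"first line\nsecond line"\n'
print(flatten_fields(multi_line), end="")
```

After flattening, each record occupies exactly one physical line, which makes the data easier to process with line-oriented tools.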

### OPTIONS
Options | Use
------- | ---
-d\|--delimiter CHAR|Specify delimiter to be CHAR
-e\|--encoding STRING|Specify charset encoding to be STRING (because detecting encoding can take a long time!)
-f\|--fields LIST|Extract the values of the LISTed fields in the given order (e.g. 6,2-4,1, with fields numbered from 1)
-F\|--flatten|Make multi-line fields single-line
-h\|--hide INT|Hide the display of distinct values above INT % (default is 20%)
-m\|--min INT|Only display distinct values whose count >= INT (default is to display all distinct values)
-M\|--max INT|Only display INT lines of distinct values (default is to display all distinct values, within the hide limit)
-t\|--top INT|Only display the top/bottom INT lines of values (default is to display the 5 bottom and top lines)
--debug|Enable debug mode
--help\|-?|Print usage and a short help message and exit
--version|Print version and exit
--|Options processing terminator
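
One possible reading of the *-m|--min* and *-M|--max* rows above is sketched below with `collections.Counter`. The `displayed_values` helper and its parameter names are hypothetical, and the exact way **adsv** combines these limits with *-h|--hide* and *-t|--top* is not shown here.

```python
from collections import Counter

def displayed_values(values, min_count=1, max_lines=None):
    """List (value, count) pairs by descending count, applying two limits.

    min_count mimics -m|--min, max_lines mimics -M|--max (assumed semantics).
    """
    ranked = [
        (value, count)
        for value, count in Counter(values).most_common()
        if count >= min_count
    ]
    return ranked if max_lines is None else ranked[:max_lines]

colors = ["red", "blue", "red", "green", "blue", "red"]
print(displayed_values(colors))
print(displayed_values(colors, min_count=2))
print(displayed_values(colors, max_lines=1))
```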

## ENVIRONMENT
The ADSV_DEBUG environment variable can also be set to any value to enable debug mode.
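
A presence test on the environment is enough to express "set to any value". The following sketch assumes that even an empty value counts as set; the `debug_enabled` function and its `flag` parameter are hypothetical and only illustrate how the variable and the *--debug* option could be combined.

```python
import os

def debug_enabled(environ=os.environ, flag=False):
    """Return True if --debug was given or ADSV_DEBUG exists in the environment."""
    # Only the presence of the variable is tested, not its value (an assumption).
    return flag or "ADSV_DEBUG" in environ

print(debug_enabled({"ADSV_DEBUG": "1"}), debug_enabled({}))
```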

## EXIT STATUS
The **adsv** utility exits 0 on success, and >0 if an error occurs.

## SEE ALSO
[cut(1)](https://www.freebsd.org/cgi/man.cgi?query=cut),
[file(1)](https://www.freebsd.org/cgi/man.cgi?query=file)

## STANDARDS
The **adsv** utility is not a standard UNIX command.

This implementation tries to follow the [PEP 8](https://www.python.org/dev/peps/pep-0008/) style guide for [Python](https://www.python.org/) code.

The DSV dialects that can be handled are those compatible with [RFC 4180: Common Format and MIME Type for Comma-Separated Values (CSV) Files](https://www.rfc-editor.org/rfc/rfc4180).

## PORTABILITY
Tested OK under Windows.

## HISTORY
This implementation was made for the [PNU project](https://github.com/HubTou/PNU).

I do this kind of analysis with each dataset I have to work with.
Last time I did that, I decided that it was about time to fully automate the process, especially as I was working with fields containing multi-line values.

## LICENSE
It is available under the [3-clause BSD license](https://opensource.org/licenses/BSD-3-Clause).

## AUTHORS
[Hubert Tournier](https://github.com/HubTou)

## CAVEATS
Using "Sep=X" as a first line in order to set the X character as a delimiter is not supported.
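
A possible workaround, sketched here as an assumption rather than a documented feature, is to remove such a first line before the analysis and pass its character with the *-d|--delimiter* option. The `split_sep_line` helper is a hypothetical name.

```python
def split_sep_line(text):
    """Split off a leading 'sep=X' line.

    Returns (delimiter, remaining_text), or (None, text) when the first
    line is not a one-character 'sep=' declaration.
    """
    first, _, rest = text.partition("\n")
    line = first.rstrip("\r")  # tolerate Windows line endings
    if len(line) == 5 and line[:4].lower() == "sep=":
        return line[4], rest
    return None, text

delimiter, body = split_sep_line("sep=;\na;b\n1;2\n")
print(repr(delimiter), repr(body))
```

The remaining text can then be saved to a file and analyzed with the returned character given to *-d*.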

Comment lines inside the data (as found, for example, in */etc/passwd* files under Unix) are not supported either, but they are not part of any recognized DSV dialect anyway.
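
If a file does contain comment lines, they can be filtered out beforehand. This is a minimal sketch assuming that comments start with `#`, possibly after leading whitespace; the `drop_comment_lines` helper is hypothetical. Note that it works on physical lines, so it would also drop a continuation line of a quoted multi-line field that happens to start with the marker.

```python
def drop_comment_lines(lines, marker="#"):
    """Return the lines that do not start with the comment marker."""
    return [line for line in lines if not line.lstrip().startswith(marker)]

passwd_like = [
    "# user database\n",
    "root:x:0:0\n",
    "  # indented comment\n",
    "daemon:x:1:1\n",
]
print("".join(drop_comment_lines(passwd_like)), end="")
```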

            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/HubTou/adsv/",
    "name": "pnu-adsv",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.6",
    "maintainer_email": "",
    "keywords": "pnu-project",
    "author": "Hubert Tournier",
    "author_email": "hubert.tournier@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/44/b4/eda89a8a53bec9fa352f5aa63385948db7043b195c6b919bb1a12eae84c6/pnu_adsv-1.0.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "BSD 3-Clause License",
    "summary": "Analyze delimiter-separated values files",
    "version": "1.0.1",
    "split_keywords": [
        "pnu-project"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "512da216ed1dbda69a56cb250e0486598a81a21a49a50b1d3e66e29b7283d446",
                "md5": "62314d061ccd73145e5f95a79c7cf87c",
                "sha256": "94882759d04de5d2971cdfede3a172d4b73d74bc02ad7d662dcbca9614fcc520"
            },
            "downloads": -1,
            "filename": "pnu_adsv-1.0.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "62314d061ccd73145e5f95a79c7cf87c",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.6",
            "size": 14414,
            "upload_time": "2023-01-23T21:22:05",
            "upload_time_iso_8601": "2023-01-23T21:22:05.657670Z",
            "url": "https://files.pythonhosted.org/packages/51/2d/a216ed1dbda69a56cb250e0486598a81a21a49a50b1d3e66e29b7283d446/pnu_adsv-1.0.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "44b4eda89a8a53bec9fa352f5aa63385948db7043b195c6b919bb1a12eae84c6",
                "md5": "763bc0eb26d9d329afa6490a8e4a4018",
                "sha256": "d520b21008cfe22fb57f8e5c1e532a3bac65aeaae2429d3629993888c2517823"
            },
            "downloads": -1,
            "filename": "pnu_adsv-1.0.1.tar.gz",
            "has_sig": false,
            "md5_digest": "763bc0eb26d9d329afa6490a8e4a4018",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.6",
            "size": 15750,
            "upload_time": "2023-01-23T21:22:07",
            "upload_time_iso_8601": "2023-01-23T21:22:07.426927Z",
            "url": "https://files.pythonhosted.org/packages/44/b4/eda89a8a53bec9fa352f5aa63385948db7043b195c6b919bb1a12eae84c6/pnu_adsv-1.0.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-01-23 21:22:07",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "github_user": "HubTou",
    "github_project": "adsv",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "pnu-adsv"
}
        