Vladiate
========
.. image:: https://github.com/di/vladiate/actions/workflows/test.yml/badge.svg?query=branch%3Amaster+event%3Apush
:target: https://github.com/di/vladiate/actions/workflows/test.yml?query=branch%3Amaster+event%3Apush
.. image:: https://coveralls.io/repos/di/vladiate/badge.svg?branch=master
:target: https://coveralls.io/github/di/vladiate
Description
-----------
Vladiate helps you write explicit assertions for every field of your CSV
file.
Features
--------
**Write validation schemas in plain-old Python**
No UI, no XML, no JSON, just code.
**Write your own validators**
Vladiate comes with a few by default, but there's no reason you can't write
your own.
**Validate multiple files at once**
Either with the same schema, or different ones.
Documentation
-------------
Installation
~~~~~~~~~~~~
Installing:
::
$ pip install vladiate
Quickstart
~~~~~~~~~~
Below is an example of a ``vladfile.py``:
.. code:: python

    from vladiate import Vlad
    from vladiate.validators import UniqueValidator, SetValidator
    from vladiate.inputs import LocalFile

    class YourFirstValidator(Vlad):
        source = LocalFile('vampires.csv')
        validators = {
            'Column A': [
                UniqueValidator()
            ],
            'Column B': [
                SetValidator(['Vampire', 'Not A Vampire'])
            ]
        }

Here we define a number of validators for a local file ``vampires.csv``,
which would look like this:
::
Column A,Column B
Vlad the Impaler,Not A Vampire
Dracula,Vampire
Count Chocula,Vampire
We then run ``vladiate`` in the same directory as your ``.csv`` file:
::
$ vladiate
And get the following output:
::
Validating YourFirstValidator(source=LocalFile('vampires.csv'))
Passed! :)
Handling Changes
^^^^^^^^^^^^^^^^
Let's imagine that you've gotten a new CSV file,
``potential_vampires.csv``, that looks like this:
::
Column A,Column B
Vlad the Impaler,Not A Vampire
Dracula,Vampire
Count Chocula,Vampire
Ronald Reagan,Maybe A Vampire
If we were to update our first validator to use this file as follows:
::

    - class YourFirstValidator(Vlad):
    -     source = LocalFile('vampires.csv')
    + class YourFirstFailingValidator(Vlad):
    +     source = LocalFile('potential_vampires.csv')

we would get the following error:
::
Validating YourFirstFailingValidator(source=LocalFile('potential_vampires.csv'))
Failed :(
SetValidator failed 1 time(s) (25.0%) on field: 'Column B'
Invalid fields: ['Maybe A Vampire']
And we would know that we'd either need to sanitize this field, or add
it to the ``SetValidator``.
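
To make the numbers in that report concrete, the following standard-library
sketch reproduces the arithmetic behind it. It is not Vladiate's
implementation: the helper ``check_allowed_values`` is hypothetical, and it
only assumes that every row's value is tested against the allowed set.

.. code:: python

    import csv
    import io

    POTENTIAL_VAMPIRES = """\
    Column A,Column B
    Vlad the Impaler,Not A Vampire
    Dracula,Vampire
    Count Chocula,Vampire
    Ronald Reagan,Maybe A Vampire
    """


    def check_allowed_values(csv_text, field, allowed):
        """Return (failure count, failure percentage, sorted invalid values)."""
        rows = list(csv.DictReader(io.StringIO(csv_text)))
        invalid = [row[field] for row in rows if row[field] not in allowed]
        percent = 100.0 * len(invalid) / len(rows) if rows else 0.0
        return len(invalid), percent, sorted(set(invalid))


    failures, percent, invalid = check_allowed_values(
        POTENTIAL_VAMPIRES, "Column B", {"Vampire", "Not A Vampire"}
    )
    print(f"failed {failures} time(s) ({percent}%): {invalid}")

One of the four data rows falls outside the allowed set, which is where the
25.0% comes from.
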
Starting from scratch
^^^^^^^^^^^^^^^^^^^^^
To make writing a new ``vladfile.py`` easy, Vladiate will give
meaningful error messages.
Given the following as ``real_vampires.csv``:
::
Column A,Column B,Column C
Vlad the Impaler,Not A Vampire,Real
Dracula,Vampire,Not Real
Count Chocula,Vampire,Not Real
Ronald Reagan,Maybe A Vampire,Real
We could write a bare-bones validator as follows:
.. code:: python

    from vladiate import Vlad
    from vladiate.inputs import LocalFile

    class YourFirstEmptyValidator(Vlad):
        source = LocalFile('real_vampires.csv')
        validators = {}

Running this with ``vladiate`` would give the following error:
::
Validating YourFirstEmptyValidator(source=LocalFile('real_vampires.csv'))
Missing...
Missing validators for:
'Column A': [],
'Column B': [],
'Column C': [],
Vladiate expects something to be specified for every column, *even if it
is an empty list* (more on this later). We can easily copy and paste
from the error into our ``vladfile.py`` to make it:
.. code:: python

    from vladiate import Vlad
    from vladiate.inputs import LocalFile

    class YourFirstEmptyValidator(Vlad):
        source = LocalFile('real_vampires.csv')
        validators = {
            'Column A': [],
            'Column B': [],
            'Column C': [],
        }

When we run *this* with ``vladiate``, we get:
::
Validating YourFirstEmptyValidator(source=LocalFile('real_vampires.csv'))
Failed :(
EmptyValidator failed 4 time(s) (100.0%) on field: 'Column A'
Invalid fields: ['Dracula', 'Vlad the Impaler', 'Count Chocula', 'Ronald Reagan']
EmptyValidator failed 4 time(s) (100.0%) on field: 'Column B'
Invalid fields: ['Maybe A Vampire', 'Not A Vampire', 'Vampire']
EmptyValidator failed 4 time(s) (100.0%) on field: 'Column C'
Invalid fields: ['Real', 'Not Real']
This is because Vladiate interprets an empty list of validators for a
field as an ``EmptyValidator``, which expects an empty string in every
field. This helps us make meaningful decisions when adding validators to
our ``vladfile.py``. It also ensures that we are not forgetting about a
column or field which is not empty.
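
The two checks described in this section can be sketched with the standard
library alone. This illustrates the idea and is not Vladiate's code:
``missing_validators`` and ``non_empty_values`` are hypothetical helpers, and
the sample rows are made up for the example.

.. code:: python

    import csv
    import io

    SAMPLE = """\
    Column A,Column B,Column C
    Dracula,Vampire,Not Real
    Ronald Reagan,Maybe A Vampire,Real
    """


    def missing_validators(csv_text, validators):
        """List header fields that have no entry in the validators mapping."""
        fieldnames = csv.DictReader(io.StringIO(csv_text)).fieldnames or []
        return [name for name in fieldnames if name not in validators]


    def non_empty_values(csv_text, field):
        """Collect values that would violate an 'always empty' expectation."""
        reader = csv.DictReader(io.StringIO(csv_text))
        return [row[field] for row in reader if row[field] != ""]


    # A mapping that only covers one column leaves the other two unaccounted for.
    print(missing_validators(SAMPLE, {"Column A": []}))
    # Every populated value in a column counts against an empty-only rule.
    print(non_empty_values(SAMPLE, "Column C"))
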
Built-in Validators
^^^^^^^^^^^^^^^^^^^
Vladiate comes with a few common validators built-in:
*class* ``Validator``
Generic validator. Should be subclassed by any custom validators. Not to
be used directly.
*class* ``CastValidator``
Generic "can-be-cast-to-x" validator. Should be subclassed by any
cast-test validator. Not to be used directly.
*class* ``IntValidator``
Validates whether a field can be cast to an ``int`` type or not.
:``empty_ok=False``:
Specify whether a field which is an empty string should be ignored.
*class* ``FloatValidator``
Validates whether a field can be cast to a ``float`` type or not.
:``empty_ok=False``:
Specify whether a field which is an empty string should be ignored.
*class* ``SetValidator``
Validates whether a field is in the specified set of possible fields.
:``valid_set=[]``:
List of valid possible fields.
:``empty_ok=False``:
Implicitly adds the empty string to the specified set.
:``ignore_case=False``:
Ignore case when comparing values in the column with the valid set.
*class* ``UniqueValidator``
Ensures that a given field is not repeated in any other row. Can
optionally determine "uniqueness" with other fields in the row as well via
``unique_with``.
:``unique_with=[]``:
List of field names to make the primary field unique with.
:``empty_ok=False``:
Specify whether a field which is an empty string should be ignored.
*class* ``RegexValidator``
Validates whether a field matches the given regex using ``re.match()``.
:``pattern=r'di^'``:
The regex pattern. Fails for all fields by default.
:``full=False``:
Specify whether to use ``re.fullmatch()`` instead of ``re.match()``.
:``empty_ok=False``:
Specify whether a field which is an empty string should be ignored.
*class* ``RangeValidator``
Validates whether a field falls within a given range (inclusive). Can handle
integers or floats.
:``low``:
The low value of the range.
:``high``:
The high value of the range.
:``empty_ok=False``:
Specify whether a field which is an empty string should be ignored.
*class* ``EmptyValidator``
Ensure that a field is always empty. Essentially the same as an empty
``SetValidator``. This is used by default when a field has no
validators.
*class* ``NotEmptyValidator``
The opposite of an ``EmptyValidator``. Ensure that a field is never empty.
*class* ``Ignore``
Always passes validation. Used to explicitly ignore a given column.
*class* ``RowValidator``
Generic row validator. Should be subclassed by any custom validators. Not
to be used directly.
*class* ``RowLengthValidator``
Validates that each row has the expected number of fields. The expected
number of fields is inferred from the CSV header row read by
``csv.DictReader``.
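
Two of the options above are easier to see in code. The sketch below uses
only the standard library and does not call Vladiate. It shows how
``re.match()`` and ``re.fullmatch()`` differ, which is the distinction behind
``full``, and one plausible reading of ``unique_with`` as a compound key. The
helper ``find_duplicates`` and the sample rows are hypothetical.

.. code:: python

    import re

    pattern = r"\d{4}"

    # re.match() anchors only at the start, so trailing text is accepted.
    print(bool(re.match(pattern, "2023-08")))
    # re.fullmatch() requires the whole field to match the pattern.
    print(bool(re.fullmatch(pattern, "2023-08")))


    def find_duplicates(rows, field, unique_with=()):
        """Return compound keys (field plus unique_with fields) seen before."""
        seen = set()
        duplicates = []
        for row in rows:
            key = tuple(row[name] for name in (field, *unique_with))
            if key in seen:
                duplicates.append(key)
            seen.add(key)
        return duplicates


    rows = [
        {"name": "Dracula", "country": "Romania"},
        {"name": "Dracula", "country": "England"},
        {"name": "Dracula", "country": "Romania"},
    ]
    # On its own, the name repeats in the second and third rows.
    print(find_duplicates(rows, "name"))
    # Paired with the country, only the third row repeats an earlier key.
    print(find_duplicates(rows, "name", unique_with=["country"]))
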
Built-in Input Types
^^^^^^^^^^^^^^^^^^^^
Vladiate comes with the following input types:
*class* ``VladInput``
Generic input. Should be subclassed by any custom inputs. Not to be used
directly.
*class* ``LocalFile``
Read from a file local to the filesystem.
:``filename``:
Path to a local CSV file.
*class* ``S3File``
Read from a file in S3. Optionally can specify either a full path, or a
bucket/key pair.
Requires the `boto <https://github.com/boto/boto>`_ library, which should be
installed via ``pip install vladiate[s3]``.
:``path=None``:
A full S3 filepath (e.g., ``s3://foo.bar/path/to/file.csv``)
:``bucket=None``:
S3 bucket. Must be specified with a ``key``.
:``key=None``:
S3 key. Must be specified with a ``bucket``.
*class* ``String``
Read CSV from a string. Can take either a ``str`` or a ``StringIO``.
:``string_input=None``:
Regular Python string input.
:``string_io=None``:
``StringIO`` input.
Running Vlads Programmatically
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
*class* ``Vlad``
Initialize a Vlad programmatically.
:``source``:
Required. Any ``VladInput``.
:``validators={}``:
Dictionary mapping field names to lists of validators. Optional, defaults to
the class variable ``validators`` if set, otherwise uses ``EmptyValidator``
for all fields.
:``delimiter=','``:
The delimiter used within your CSV source. Optional, defaults to ``,``.
:``ignore_missing_validators=False``:
Whether to ignore fields in the file for which the ``Vlad`` does not have
validators, rather than failing validation. Optional, defaults to ``False``.
:``quiet=False``:
Whether to disable log output generated by validations.
Optional, defaults to ``False``.
For example:
.. code:: python

    from vladiate import Vlad
    from vladiate.inputs import LocalFile

    Vlad(source=LocalFile('path/to/local/file.csv')).validate()

Testing
~~~~~~~
To run the tests:
::
make test
To run the linter:
::
make lint
Command Line Arguments
~~~~~~~~~~~~~~~~~~~~~~
.. code:: bash

    Usage: vladiate [options] [VladClass [VladClass2 ... ]]

    Options:
      -h, --help            show this help message and exit
      -f VLADFILE, --vladfile=VLADFILE
                            Python module file to import, e.g. '../other.py'.
                            Default: vladfile
      -l, --list            Show list of possible vladiate classes and exit
      -V, --version         show version number and exit
      -p PROCESSES, --processes=PROCESSES
                            attempt to use this number of processes, Default: 1
      -q, --quiet           disable console log output generated by validations

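
As a sketch of validating several files with one schema, a single
``vladfile.py`` could declare one class per file. The class names and the
``vampire_schema`` helper are hypothetical. Building a fresh set of validators
for each class is a cautious choice, made on the assumption that validator
instances may keep state from one file to the next.

.. code:: python

    from vladiate import Vlad
    from vladiate.inputs import LocalFile
    from vladiate.validators import SetValidator, UniqueValidator


    def vampire_schema():
        """Build new validator instances so classes do not share them."""
        return {
            'Column A': [UniqueValidator()],
            'Column B': [SetValidator(['Vampire', 'Not A Vampire'])],
        }


    class KnownVampires(Vlad):
        source = LocalFile('vampires.csv')
        validators = vampire_schema()


    class PotentialVampires(Vlad):
        source = LocalFile('potential_vampires.csv')
        validators = vampire_schema()

Going by the usage line above, running ``vladiate KnownVampires`` should
restrict the run to that class, while a higher ``-p`` value asks Vladiate to
spread the classes over more processes.
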
Contributors
------------
- `Dustin Ingram <https://github.com/di>`__
- `Clara Bennett <https://github.com/csojinb>`__
- `Aditya Natraj <https://github.com/adityanatra>`__
- `Sterling Petersen <https://github.com/sterlingpetersen>`__
- `Aleix <https://github.com/maleix>`__
- `Bob Lannon <https://github.com/boblannon>`__
- `Santi <https://github.com/santilytics>`__
- `David Park <https://github.com/dp247>`__
- `Jon Bonafato <https://github.com/jonafato>`__
License
-------
Open source MIT license.