hi-dateinfer
===========
Python library that infers a date format from examples. This is an actively
maintained fork of the original [dateinfer](https://github.com/jeffreystarr/dateinfer)
library by Jeffrey Starr. It maintains Python 2/3 compatibility and
is released on PyPI as hi-dateinfer. Pull requests and issues are welcome.
Table of Contents
-----------------
* [Problem Statement](#problem-statement)
* [Installation](#installation)
* [Usage](#usage)
* [Development](#development)
<a name="problem-statement"></a>Problem Statement
-------------------------------------------------
Imagine that you are given a large collection of documents and, as part of the extraction process, you must extract date information and store it in a normalized format.
If the documents follow a single schema, the ideal approach is to craft a date parsing string for that schema.
However, if the documents follow different schemas or if the contents are noisy (e.g. date fields were hand-populated), crafting and maintaining those parsing strings by hand can become onerous.
This library makes a "best guess" at the proper date parsing string (a `datetime.strptime` format string) based on
example date strings taken from the data.
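To make the normalization step concrete, here is a minimal sketch that uses only the standard library. The helper name `normalize_dates` and the sample data are hypothetical, and the format string is hard-coded for illustration; in practice it would be the guess produced by this library.

```python
from datetime import datetime


def normalize_dates(examples, fmt):
    """Parse each example with fmt; return ISO 8601 strings and the rejects."""
    parsed, rejected = [], []
    for text in examples:
        try:
            parsed.append(datetime.strptime(text, fmt).isoformat())
        except ValueError:
            # Noisy or differently formatted entries are set aside for review.
            rejected.append(text)
    return parsed, rejected


# Assumed format string, hard-coded so the sketch runs without dateinfer.
fmt = "%d/%m/%Y %H:%M:%S"
examples = ["13/01/2014 09:52:52", "21/01/2014 15:30:00", "not a date"]
parsed, rejected = normalize_dates(examples, fmt)
```

Keeping the rejected strings, rather than dropping them silently, makes it easier to spot a second schema hiding in the same collection.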
<a name="installation"></a>Installation
---------------------------------------
````sh
git clone https://github.com/hi-primus/hi-dateinfer.git
cd hi-dateinfer
pip install .
````
or install the released package from PyPI:
````sh
pip install hi-dateinfer
````
<a name="usage"></a>Usage
-------------------------
````Python
>>> import dateinfer
>>> dateinfer.infer(['Mon Jan 13 09:52:52 MST 2014', 'Tue Jan 21 15:30:00 EST 2014'])
'%a %b %d %H:%M:%S %Z %Y'
>>>
````
Pass `dateinfer.infer` a list of example date strings. It returns its "best guess" at a
`datetime.strftime`/`strptime`-compliant format string that correctly parses the majority of the examples.
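Because the result is a best guess, it can be worth checking how many of your examples the returned format actually parses. The following is a minimal sketch using only the standard library; the helper name `coverage` and the sample data are hypothetical and not part of the dateinfer API, and the format string is hard-coded where the return value of `dateinfer.infer(examples)` would normally go.

```python
from datetime import datetime


def coverage(examples, fmt):
    """Return the fraction of examples that datetime.strptime accepts under fmt."""
    if not examples:
        return 0.0
    hits = 0
    for text in examples:
        try:
            datetime.strptime(text, fmt)
            hits += 1
        except ValueError:
            pass  # this example does not match the format
    return float(hits) / len(examples)


# Assumed format string, hard-coded so the sketch runs without dateinfer.
examples = ["13/01/2014", "21/01/2014", "2014-01-05", "02/03/2014"]
fmt = "%d/%m/%Y"
share = coverage(examples, fmt)
```

A low share suggests that the examples mix several schemas and may need to be split before inference. Note also that `strptime` only recognizes a limited, machine-dependent set of time zone names for `%Z`, so a format containing `%Z` may not parse every example on every machine.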
<a name="development"></a>Development
---------------------------------------
Use the following to install the package locally for development purposes:
````sh
# create empty virtual environment
virtualenv venv --python=python3.7
source venv/bin/activate
# install dateinfer in editable mode
pip install -e .
# install development dependencies
pip install -r requirements.txt
````
You can run unit tests as follows:
````sh
python -m unittest dateinfer/tests.py
````
Raw data
{
"_id": null,
"home_page": "https://github.com/hi-primus/hi-dateinfer",
"name": "hi-dateinfer",
"maintainer": "",
"docs_url": null,
"requires_python": "!=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*",
"maintainer_email": "",
"keywords": "dateinfer",
"author": "Jeffrey Starr",
"author_email": "will@pedalwrencher.com",
"download_url": "https://files.pythonhosted.org/packages/3e/73/447b4f9019909e9d94c67a9892b75d52df29850887cefa06d194ce6615ea/hi-dateinfer-0.4.6.tar.gz",
"platform": "",
"description": "hi-dateinfer\n===========\n\nPython library to infer date format from examples. This is an actively\n maintained fork of the original [dateinfer](https://github.com/jeffreystarr/dateinfer)\n library by Jeffery Starr. It maintains python 2/3 compatibility and\n will be released as hi-dateinfer. Pull requests and issues welcome.\n\nTable of Contents\n-----------------\n\n* [Problem Statement](#problem-statement)\n* [Installation](#installation)\n* [Usage](#usage)\n\n<a name=\"problem-statement\"></a>Problem Statement\n-------------------------------------------------\n\nImagine that you are given a large collection of documents and, as part of the extraction process, extract date information and store it in a normalized format.\nIf the documents follow a single schema, the ideal approach is to craft a date parsing string for the schema.\nHowever, if the documents follow different schemas or if the contents are noisy (e.g. date fields were hand-populated), the development can become onerous.\n\nThis library makes a \"best guess\" on the proper date parsing string (`datetime.strptime`) based on examples in\nthe file.\n\n<a name=\"installation\"></a>Installation\n---------------------------------------\n\n````sh\ngit clone https://github.com/hi-primus/hi-dateinfer.git\ncd dateinfer\npip install .\n````\n\nor\n\n````sh\npip install hi-dateinfer\n````\n\n<a name=\"usage\"></a>Usage\n-------------------------\n\n````Python\n>>> import dateinfer\n>>> dateinfer.infer(['Mon Jan 13 09:52:52 MST 2014', 'Tue Jan 21 15:30:00 EST 2014'])\n'%a %b %d %H:%M:%S %Z %Y'\n>>>\n````\n\nGive `dateinfer.infer` a list of example date strings. 
`infer` returns a `datetime.strftime`/`strptime`-compliant\ndate format string for its \"best guess\" of a format string that will correctly parse the majority of the examples.\n\n\n<a name=\"development\"></a>Development\n---------------------------------------\n\nUse the following to install the package locally for development purposes:\n\n````sh\n# create empty virtual environment\nvirtualenv venv --python=python3.7\nsource venv/bin/activate\n# install dateinfer in editable mode\npip install -e .\n# install development dependencies\npip install -r requirements.txt\n````\n\nYou can run unit tests as follows:\n\n```sh\npython -m unittest dateinfer/tests.py\n```\n\n\n",
"bugtrack_url": null,
"license": "Apache Software License 2.0",
"summary": "Infers date format from examples, by using a series of pattern matching and rewriting rules to compute a 'best guess' datetime.strptime format string give a list of example date strings.",
"version": "0.4.6",
"project_urls": {
"Homepage": "https://github.com/hi-primus/hi-dateinfer"
},
"split_keywords": [
"dateinfer"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "13c7868b174e83cf03ffa8ad4cb70d121278f5691abed1ade1b19485de869e3e",
"md5": "f93cbbc21a5bdc9579f1b7d4500e1807",
"sha256": "46d27ccc875890315eec9d71d76c2306e84388c4a88106d2f768a921debb49e8"
},
"downloads": -1,
"filename": "hi_dateinfer-0.4.6-py3-none-any.whl",
"has_sig": false,
"md5_digest": "f93cbbc21a5bdc9579f1b7d4500e1807",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "!=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*",
"size": 17709,
"upload_time": "2021-09-16T14:13:45",
"upload_time_iso_8601": "2021-09-16T14:13:45.392517Z",
"url": "https://files.pythonhosted.org/packages/13/c7/868b174e83cf03ffa8ad4cb70d121278f5691abed1ade1b19485de869e3e/hi_dateinfer-0.4.6-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "3e73447b4f9019909e9d94c67a9892b75d52df29850887cefa06d194ce6615ea",
"md5": "9a1f410044e9fd5f014f76fae4d182cb",
"sha256": "e2ded27aeb15ee6fcda0a30c2a341acbdaaa5b95c1c4db4c4680a4dbcda35c54"
},
"downloads": -1,
"filename": "hi-dateinfer-0.4.6.tar.gz",
"has_sig": false,
"md5_digest": "9a1f410044e9fd5f014f76fae4d182cb",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "!=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*",
"size": 18451,
"upload_time": "2021-09-16T14:13:46",
"upload_time_iso_8601": "2021-09-16T14:13:46.659789Z",
"url": "https://files.pythonhosted.org/packages/3e/73/447b4f9019909e9d94c67a9892b75d52df29850887cefa06d194ce6615ea/hi-dateinfer-0.4.6.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2021-09-16 14:13:46",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "hi-primus",
"github_project": "hi-dateinfer",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [],
"lcname": "hi-dateinfer"
}