datespanlib


Name: datespanlib
Version: 0.1.8
home_page: https://github.com/Zeutschler/datespanlib
Summary: A library for handling date spans.
upload_time: 2024-09-21 09:01:10
maintainer: None
docs_url: None
author: Thomas Zeutschler
requires_python: >=3.10
license: MIT License Copyright (c) 2024 Thomas Zeutschler Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
keywords: python, datetime, timespan, pandas, numpy, spark, data analysis, sql, dataframe
# DateSpanLib
![GitHub license](https://img.shields.io/github/license/Zeutschler/datespanlib?color=A1C547)
![PyPI version](https://img.shields.io/pypi/v/datespanlib?logo=pypi&logoColor=979DA4&color=A1C547)
![Python versions](https://img.shields.io/badge/dynamic/toml?url=https%3A%2F%2Fraw.githubusercontent.com%2FZeutschler%2Fdatespanlib%2Fmaster%2Fpyproject.toml&query=%24%5B'project'%5D%5B'requires-python'%5D&color=A1C547)
![PyPI Downloads](https://img.shields.io/pypi/dm/datespanlib.svg?logo=pypi&logoColor=979DA4&label=PyPI%20downloads&color=A1C547)
![GitHub last commit](https://img.shields.io/github/last-commit/Zeutschler/datespanlib?logo=github&logoColor=979DA4&color=A1C547)
![unit tests](https://img.shields.io/github/actions/workflow/status/zeutschler/datespanlib/python-package.yml?logo=GitHub&logoColor=979DA4&label=unit%20tests&color=A1C547)
![build](https://img.shields.io/github/actions/workflow/status/zeutschler/datespanlib/python-package.yml?logo=GitHub&logoColor=979DA4&color=A1C547)
![documentation](https://img.shields.io/github/actions/workflow/status/zeutschler/datespanlib/static-site-upload.yml?logo=GitHub&logoColor=979DA4&label=docs&color=A1C547&link=https%3A%2F%2Fzeutschler.github.io%2Fcubedpandas%2F)
![codecov](https://codecov.io/github/Zeutschler/datespanlib/graph/badge.svg?token=B12O0B6F10)

**UNDER CONSTRUCTION** - The DateSpanLib library is under active development and in a pre-alpha state. It is not
yet suitable for production use or even for testing. A first alpha version is expected to be released
within the next few weeks.

-----------------
A Python library for handling and using date and time spans.

```python
from datespanlib import DateSpan

ds = DateSpan("January to March 2024")
print("2024-04-15" in (ds + "1 month"))  # prints True
```

The DateSpanLib library is designed to be used for data analysis and data processing, 
where date and time spans are often used to filter, aggregate or join data. But it 
should also be valuable in any other context where date and time spans are used.

It provides dependency-free integrations with Pandas, NumPy, Spark and others, can 
generate Python code artefacts, either as source text or as precompiled (lambda) 
functions, and can also generate SQL fragments for filtering in SQL WHERE clauses.
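
To make the idea of generated artefacts more concrete, here is a minimal sketch of how a list of spans could be turned into an SQL condition and into a callable predicate. It uses only the standard library. The names `Span`, `spans_to_sql` and `spans_to_predicate` are hypothetical and are not part of the DateSpanLib API; the library's own output may look different.

```python
from datetime import datetime

# Hypothetical names for illustration only, not the DateSpanLib API.
Span = tuple[datetime, datetime]  # inclusive start, inclusive end


def spans_to_sql(column: str, spans: list[Span]) -> str:
    """Build an SQL condition that is true when `column` falls in any span."""
    parts = [
        f"({column} BETWEEN '{start.isoformat(sep=' ')}' AND '{end.isoformat(sep=' ')}')"
        for start, end in spans
    ]
    # An empty list of spans matches nothing.
    return " OR ".join(parts) if parts else "1 = 0"


def spans_to_predicate(spans: list[Span]):
    """Return a callable that tests whether a datetime falls in any span."""
    return lambda value: any(start <= value <= end for start, end in spans)


q1 = [(datetime(2024, 1, 1), datetime(2024, 3, 31, 23, 59, 59))]
print(spans_to_sql("order_date", q1))
# (order_date BETWEEN '2024-01-01 00:00:00' AND '2024-03-31 23:59:59')

in_q1 = spans_to_predicate(q1)
print(in_q1(datetime(2024, 2, 14)), in_q1(datetime(2024, 4, 15)))  # True False
```

The sketch interpolates values directly into the SQL text to keep it short; in real code, bound query parameters are usually the safer choice.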

#### Background
The DateSpanLib library has been carved out from the 
[CubedPandas](https://github.com/Zeutschler/cubedpandas) project - a library for 
intuitive data analysis with Pandas dataframes - as it serves a broader purpose and 
can be used independently of CubedPandas. 

For internal datetime parsing and manipulation, 
the great [dateutil](https://github.com/dateutil/dateutil) library is used. The
DateSpanLib library has no other dependencies (like Pandas, NumPy, Spark etc.), 
so it is lightweight and easy to install.

## Installation
The library can be installed via pip and is also available as a download on [PyPI](https://pypi.org/project/datespanlib/).
```bash
pip install datespanlib
```

## Usage

The library provides the following methods and classes:

### Method parse() 
The `parse` function converts a date or date span expression given as a string into a `DateSpanSet` object.
The string can be a simple date like '2021-01-01' or a complex date span expression like
'Mondays to Wednesday last month'.

### Class DateSpan
`DateSpan` objects represent a single span of time, typically defined by a `start` and an `end` datetime.
The `DateSpan` object provides methods to compare, merge, split, shift, expand and intersect it with other
`DateSpan` or Python datetime objects.
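
As a rough illustration of what span operations such as intersecting and shifting involve, the following sketch defines a small stand-in class on top of the standard library. `SimpleSpan` and its methods are assumptions made for this example; it is not the library's `DateSpan` class and makes no claim about its actual method names or signatures.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class SimpleSpan:
    """Illustrative stand-in for a date span; not the library's DateSpan."""
    start: datetime
    end: datetime  # inclusive

    def contains(self, moment: datetime) -> bool:
        return self.start <= moment <= self.end

    def overlaps(self, other: "SimpleSpan") -> bool:
        return self.start <= other.end and other.start <= self.end

    def intersect(self, other: "SimpleSpan") -> "SimpleSpan | None":
        # The intersection runs from the later start to the earlier end.
        if not self.overlaps(other):
            return None
        return SimpleSpan(max(self.start, other.start), min(self.end, other.end))

    def shift(self, delta: timedelta) -> "SimpleSpan":
        # Move both boundaries by the same amount.
        return SimpleSpan(self.start + delta, self.end + delta)


jan = SimpleSpan(datetime(2024, 1, 1), datetime(2024, 1, 31, 23, 59, 59))
mid = SimpleSpan(datetime(2024, 1, 15), datetime(2024, 2, 15))

common = jan.intersect(mid)
print(common.start, common.end)  # 2024-01-15 00:00:00 2024-01-31 23:59:59
print(jan.shift(timedelta(days=7)).start)  # 2024-01-08 00:00:00
```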

`DateSpan` objects are 'expansive' in the sense that they resolve the widest possible time span
for the given input, e.g. if a `DateSpan` object is created with a start date of '2021-01-01' and
an end date of '2021-01-31', ...

### Class DateSpanSet
`DateSpanSet` is an ordered and redundancy-free collection of `DateSpan` objects. If, for example, two `DateSpan` 
objects in the set overlap or are contiguous, they are merged into one `DateSpan` object. Aside from
set-related operations, the `DateSpanSet` comes with two special capabilities worth mentioning:

* A built-in **interpreter for arbitrary date, time and date span strings**, ranging from simple dates
  like '2021-01-01' up to complex date span expressions like 'Mondays to Wednesday last month'.

* Methods that create **artefacts and callables for data processing** with Python, SQL, Pandas,
  NumPy, Spark and other compatible libraries.
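
The merging of overlapping or contiguous spans described above can be sketched with the classic sort-and-sweep approach. This is an illustration under stated assumptions, not the library's implementation: `merge_spans` is a hypothetical helper, spans are plain tuples with inclusive boundaries, and two spans count as contiguous when they are at most one second apart.

```python
from datetime import datetime, timedelta

Span = tuple[datetime, datetime]  # inclusive start, inclusive end


def merge_spans(spans: list[Span], gap: timedelta = timedelta(seconds=1)) -> list[Span]:
    """Sort spans and merge those that overlap or are at most `gap` apart."""
    merged: list[Span] = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1] + gap:
            # Overlapping or contiguous: extend the previous span.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


jan = (datetime(2024, 1, 1), datetime(2024, 1, 31, 23, 59, 59))
feb = (datetime(2024, 2, 1), datetime(2024, 2, 29, 23, 59, 59))
apr = (datetime(2024, 4, 1), datetime(2024, 4, 30, 23, 59, 59))

result = merge_spans([apr, feb, jan])
print(len(result))  # 2: January and February merge, April stays separate
```

Sorting first means each span only has to be compared with the last merged span, so the sweep itself is a single pass.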




## Basic Usage
```python
from datespanlib import parse, DateSpanSet, DateSpan

# Create a DateSpan object
jan = DateSpan(start='2024-01-01', end='2024-01-31')
feb = DateSpan("February 2024")

jan_feb = DateSpanSet([jan, feb])  # Create a DateSpanSet object
assert len(jan_feb) == 1  # consecutive or overlapping DateSpan objects get merged into one

assert jan_feb == parse("January, February 2024")  # Compare DateSpanSet objects

# Set operations
jan_feb_mar = jan_feb + "1 month"
assert jan_feb_mar == parse("first 3 month of 2024")
jan_mar = jan_feb_mar - "February 2024"  # removes February, leaving January and March
assert len(jan_mar) == 2  # the single DateSpan gets split into two DateSpans
assert jan_mar.contains("2024-01-15")

# Use DateSpanSet to filter Pandas DataFrame
import pandas as pd
df = pd.DataFrame({"date": pd.date_range("2024-01-01", "2024-12-31")})
result = df[df["date"].apply(jan_mar.contains)]  # don't use this, slow
result = jan_mar.filter(df, "date")  # fast vectorized operation

# Use DateSpanSet to filter Spark DataFrame
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(pd.DataFrame({"date": pd.date_range("2024-01-01", "2024-12-31")}))
result = jan_mar.filter(df, "date")  # fast vectorized/distributed operation

# Use DateSpanSet to filter Numpy array
import numpy as np
arr = np.arange(np.datetime64("2024-01-01"), np.datetime64("2024-12-31"))
result = jan_mar.filter(arr)  # fast vectorized operation

# Use DateSpanSet to create an SQL WHERE statement
sql = f"SELECT * FROM table WHERE {jan_mar.to_sql('date')}"
```
