Name: cmdata
Version: 0.0.3
Summary: A package to process datasets provided by CargoMetrics Technologies Inc.
Homepage: https://github.com/cargometrics/cmdata
Upload time: 2024-04-24 12:28:59
Requires Python: >=3.9
License: MIT License, Copyright (c) 2023 CargoMetrics Technologies Inc.
Keywords: cmdata, cargometrics, point in time
# cmdata

> A Python package to get started with CargoMetrics data products

The main goal of this Python package is to get subscribers of the CargoMetrics data products up and running with our 
data as fast as possible. With that in mind, we provide functions to quickly access the various datasets and perform 
the first few common transformations. After that, the universe is yours...

# Getting started

The cmdata Python package from CargoMetrics provides tools to get started with the _Advanced_ datasets, which are 
datasets that contain point-in-time data.

## Installation
cmdata is available on PyPI and can be installed into your Python environment with pip:

```bash
> pip install cmdata
```

cmdata requires Python >= 3.9 and pandas >= 1.0.
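A minimal sanity check of these requirements, assuming pandas is already installed:

```python
import sys

import pandas as pd

# cmdata declares requires-python >= 3.9
assert sys.version_info >= (3, 9), "cmdata requires Python 3.9 or newer"

# major version of the installed pandas must be at least 1
pandas_major = int(pd.__version__.split(".")[0])
assert pandas_major >= 1, "cmdata requires pandas 1.0 or newer"
```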

## Generating a view
The point-in-time Advanced datasets contain multiple time dimensions for each datapoint within the dataset 
(see [“Point-in-time deep dive”](#point-in-time-deep-dive)). To get started inspecting and assessing the data, 
the cmdata package provides a couple of views that reduce the dual time dimensionality to a single time-series. 
For a more in-depth example see [“Point-in-time deep dive”](#point-in-time-deep-dive).

To generate a view:

```python
from cmdata.commodities import point_in_time as pit

PATH = '...'

pit_df = pit.read(PATH)
view = pit.standard_view(pit_df, asof='2023-01-01')
```

This view can be explored or transformed like any one-dimensional tabular dataset.

For example, to generate a plot:

```python
# plot Australian exports
ax = view[('export', 'AUS')].plot(figsize=(10, 3))
ax.set_xlabel('Date')
ax.set_ylabel('AUS exports of Iron Ore in mt / day')
```
![Australia Iron Ore exports - standard view](docs/images/standard-aus-iron-oir.png)

# Point-in-time deep dive

<blockquote>

**Terminology**

Throughout this document the following terms are used:

* **dataset**: a collection of (daily) increments
* **increment**: a collection of observations, published on day T, covering activity dates T-3 through T-90 
  (i.e., a single CSV file)
* **observation**: for the advanced commodity products an observation is the amount (in metric tons) of a 
  particular cargo that is imported or exported by a country on a given day
* **activity_date**: the date associated with an observation, i.e., the date the cargo is imported into or 
  exported out of a country
* **publication_timestamp**: the time an increment was published, i.e., the time an increment is available 
  to the customer
* **lag**: the number of days between the _publication timestamp_ and the _activity date_
</blockquote>
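To make the _lag_ definition concrete, here is a small pandas example computing the lag for a few observations from an increment published on 2024-01-01. The column names are illustrative, not the package's actual schema:

```python
import pandas as pd

# three observations from the increment published 2024-01-01
obs = pd.DataFrame({
    "publication_timestamp": pd.to_datetime(["2024-01-01"] * 3),
    "activity_date": pd.to_datetime(["2023-12-29", "2023-11-15", "2023-10-03"]),
})

# lag = days between publication timestamp and activity date
obs["lag"] = (obs["publication_timestamp"] - obs["activity_date"]).dt.days
print(obs["lag"].tolist())  # [3, 47, 90]
```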

The CargoMetrics _Advanced products_ are point-in-time datasets. This means that:

* Each day, the CargoMetrics system uses the input datasets, such as AIS, port, and vessel information, 
  available at that time to produce an estimate of global maritime trade covering the last 90 days 
  (this is referred to as the _increment_)
* Each observation in an increment has two associated times:
  * The _publication timestamp_ (see box above)
  * The _activity date_ (see box above)
* The collection of increments forms the point-in-time _dataset_, which provides the full history back 
  to 2013 and enables customers to train models without look-ahead bias and perform honest backtests.

The following section provides a step-by-step overview of how the _Advanced products_ are constructed.

## One increment

Each day (T), a single increment is added to the point-in-time dataset. This increment contains estimates 
of global maritime trade for activity dates T-3 through T-90. For example: the increment published on 2024-01-01 
contains activity dates ranging from 2023-10-03 (i.e., lag 90) through 2023-12-29 (i.e., lag 3).
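This range arithmetic is simple enough to sketch with the standard library (the helper name here is made up for illustration):

```python
from datetime import date, timedelta

def increment_activity_range(publication_date, min_lag=3, max_lag=90):
    """First and last activity date covered by an increment."""
    return (publication_date - timedelta(days=max_lag),
            publication_date - timedelta(days=min_lag))

first, last = increment_activity_range(date(2024, 1, 1))
print(first, last)  # 2023-10-03 2023-12-29
```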

The plot below shows a graphical representation of this increment in a two-dimensional time plot where:

* Each square represents an observation
* _Publication timestamp_ is along the vertical axis
* _Activity date_ is along the horizontal axis

![point-in-time visualization: one increment](docs/images/pit-legos-01.png)

## Multiple increments

The increment published 2024-01-02, i.e., the day after the increment depicted above, contains _activity dates_ 
ranging from 2023-10-04 through 2023-12-30. This means that 87 _activity dates_ are present in both increments. 
The graphical representation looks like:

![point-in-time visualization: two increment](docs/images/pit-legos-02.png)

And three increments look like

![point-in-time visualization: three increment](docs/images/pit-legos-03.png)

Each increment, compared to the previous, adds one new day at the frontier of time (along the _activity date_ axis) 
and removes one day at the trailing end. 
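The same arithmetic confirms that two consecutive increments share 87 _activity dates_; a standard-library sketch (helper name made up for illustration):

```python
from datetime import date, timedelta

def activity_dates(publication_date, min_lag=3, max_lag=90):
    """Set of activity dates covered by the increment published on a given day."""
    return {publication_date - timedelta(days=lag)
            for lag in range(min_lag, max_lag + 1)}

jan1 = activity_dates(date(2024, 1, 1))
jan2 = activity_dates(date(2024, 1, 2))
print(len(jan1), len(jan1 & jan2))  # 88 87
```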

A couple of things to note about this organization:

1. In the full dataset, the same _activity date_ shows up in multiple increments; in other words, there are multiple 
   observations for each _activity date_. 

![point-in-time visualization: multiple activity dates](docs/images/pit-legos-04.png) 

2. Each observation can be uniquely identified by its _activity date_ and _publication timestamp_, or by its 
   _activity date_ and lag. 

![point-in-time visualization: uniqueness](docs/images/pit-legos-05.png)
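This uniqueness property suggests indexing the data by either key pair. A pandas sketch with illustrative column names (not the package's actual schema):

```python
import pandas as pd

# a toy point-in-time table: two publications of 2023-12-29, one of 2023-12-28
df = pd.DataFrame({
    "activity_date": pd.to_datetime(["2023-12-29", "2023-12-29", "2023-12-28"]),
    "publication_timestamp": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-01"]),
    "tons": [100.0, 104.0, 98.0],
})
df["lag"] = (df["publication_timestamp"] - df["activity_date"]).dt.days

# either key pair identifies each observation uniquely
assert not df.duplicated(["activity_date", "publication_timestamp"]).any()
assert not df.duplicated(["activity_date", "lag"]).any()
```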

## Lags 3 through 90

A note on why the _Advanced_ products contain only lags between 3 and 90 days in each increment:

* The upper limit of 90 days is set by the longest processes that occur in maritime shipping; 90 days 
  covers the longest voyages. For example, 90 days at 12 knots (a typical speed for tankers and dry bulk vessels)
  covers more distance than the circumference of the earth.
* The lower limit of 3 days is set by the update characteristics of the input data feeding the system. 
  Delays of up to two days between when characteristics change (such as the draft of a vessel) and when that 
  change is available in the input data are common.
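The 90-day claim is easy to verify with back-of-the-envelope arithmetic (1 knot = 1 nautical mile per hour, and 1 nautical mile = 1 minute of arc along a great circle):

```python
# distance sailed in 90 days at 12 knots
distance_nm = 90 * 24 * 12         # 25,920 nautical miles

# Earth's circumference: 360 degrees x 60 nautical miles per degree of arc
earth_circumference_nm = 360 * 60  # 21,600 nautical miles

print(distance_nm > earth_circumference_nm)  # True
```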

# Building views

The two-dimensional point-in-time data provides crucial features that are important for training models on past data 
and for running honest backtests on those models. The main characteristic that facilitates this is the ability to 
select only the observations that were available at a particular time in the past.

To work with the point-in-time data, either to visualize it or to use it as input for training a model, the dataset 
needs to be reduced to a single time dimension, hereafter referred to as a _view_. Typically, what is required is a 
time series in terms of _activity dates_.

A view is defined as a set of rules that selects exactly one observation for each _activity date_. The rules that 
define a view depend on the user's needs. A few examples of views are depicted below. The `cmdata` package implements 
some of these views, which can be used as templates for other use cases.

> **Note**: New information is added with each increment, and the system modeling maritime trade becomes more accurate 
> the more information it has available. Long story short: the data matures over time, from increment to increment.

## A fixed lag view

The fixed lag view is defined by a single lag and selects, for each _activity date_, only the observation with that 
lag. This view suits users interested in capturing the same level of maturation of the data for every activity date. 
A graphical representation of this view, selecting only the observations marked in black, looks like:

![point-in-time visualization: fixed lag view](docs/images/pit-legos-06.png)

To create this view from a point-in-time dataset use:

```python
from cmdata.commodities import point_in_time as pit

PATH = '...'

pit_df = pit.read(PATH)

# generate a fixed lag view at a 7-day lag, including data
#   available on or before 2023-01-01
#
view = pit.fixed_lag_view(pit_df, lag=7, asof='2023-01-01')
```
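Conceptually, the fixed lag view amounts to a simple filter on the two time dimensions. The sketch below illustrates the selection rule in plain pandas; it is not the package's implementation, and the column names are assumed for illustration:

```python
import pandas as pd

def fixed_lag_view_sketch(df, lag, asof):
    """Keep, per activity date, only the observation at the given lag,
    restricted to increments published on or before `asof`."""
    available = df[df["publication_timestamp"] <= pd.Timestamp(asof)]
    return available[available["lag"] == lag].set_index("activity_date")

# toy data: two publications of 2022-12-20 and one of 2022-12-21
df = pd.DataFrame({
    "activity_date": pd.to_datetime(["2022-12-20", "2022-12-20", "2022-12-21"]),
    "publication_timestamp": pd.to_datetime(["2022-12-27", "2022-12-28", "2022-12-28"]),
    "tons": [100.0, 101.0, 99.0],
})
df["lag"] = (df["publication_timestamp"] - df["activity_date"]).dt.days

# exactly one observation (the 7-day lag one) survives per activity date
view = fixed_lag_view_sketch(df, lag=7, asof="2023-01-01")
```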

## A maturing view

The maturing view, which is the view provided in the Standard CargoMetrics Commodity products, selects for each 
_activity date_ the observation with the most up-to-date information from the available increments. This translates 
into selecting the observations with lags 3 through 90 from the most recent increment and, from each remaining 
increment, the observation with lag 90 only. The graphical representation is as follows, selecting the dark 
observations only:

![point-in-time visualization: standard view](docs/images/pit-legos-07.png)

To create this view from a point-in-time dataset use:

```python
from cmdata.commodities import point_in_time as pit

PATH = '...'

pit_df = pit.read(PATH)

# generate a standard, aka maturing, view, including data
#   available on or before 2023-01-01
#
view = pit.standard_view(pit_df, asof='2023-01-01')
```
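The maturing-view rule, i.e. the most recently published observation per _activity date_, reduces in plain pandas to a sort followed by a group-by-last. Again an illustrative sketch with assumed column names, not the package internals:

```python
import pandas as pd

def maturing_view_sketch(df, asof):
    """For each activity date, keep the most recently published observation
    among increments available on or before `asof`."""
    available = df[df["publication_timestamp"] <= pd.Timestamp(asof)]
    return (available.sort_values("publication_timestamp")
                     .groupby("activity_date")
                     .last())

# three publications of the same activity date; the last one is after `asof`
df = pd.DataFrame({
    "activity_date": pd.to_datetime(["2022-12-20"] * 3),
    "publication_timestamp": pd.to_datetime(["2022-12-23", "2022-12-24", "2023-01-05"]),
    "tons": [95.0, 97.0, 99.0],
})

# the 2022-12-24 publication wins: most recent among those available as of 2023-01-01
view = maturing_view_sketch(df, asof="2023-01-01")
```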

            
