| Field | Value |
| --- | --- |
| Name | nfelodcm |
| Version | 0.1.14 |
| home_page | None |
| Summary | Python package for loading and caching CSVs hosted on github into pandas dataframes |
| upload_time | 2024-10-14 03:52:47 |
| maintainer | None |
| docs_url | None |
| author | None |
| requires_python | >=3.11 |
| license | MIT |
| keywords | nfl, nflfastr, nfelo |
| VCS | |
| bugtrack_url | |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |
# nfelo DCM
nfelo DCM is an abstraction layer for loading and saving NFL-related CSVs stored on the web. DCM stands for Dataframe-CSV Mapping. Its goal is to load fresh data into pandas dataframes in a way that balances simplicity, efficiency, and performance.
```python
import nfelodcm
import pandas as pd
## Load 2 dataframes
db = nfelodcm.load(['pbp', 'games'])
## access the PBP dataframe ##
db['pbp']
```
## Maps
Maps are config files that tell the DCM where data CSVs are located, how they should be retrieved, and which fields to pull. Each CSV has its own config, where parameters can be set for things like freshness SLAs, CSV parsing engines, and assignments (aka mutations).
An important characteristic of these maps, and of the framework overall, is that every field must be 1) specified in the map and 2) typed. Fields not listed in the map will not be loaded. Fields listed without a type will throw an error.
Here is a sample config:
```json
{
"name": "games",
"description": "nflgamedata games",
"last_local_update": "2023-12-16T22:42:41.040569",
"download_url": "https://raw.githubusercontent.com/nflverse/nfldata/master/data/games.csv",
"compression": null,
"engine": "c",
"freshness": {
"type": "gh_commit",
"gh_api_endpoint": "https://api.github.com/repos/nflverse/nfldata/commits",
"gh_release_tag": null,
"sla_seconds": null
},
"iter": {
"type": null,
"start": null
},
"assignments": [
"game_id_repl"
],
"map": {
"game_id": "object",
"season": "int32",
"game_type": "object",
"week": "int32",
"gameday": "object",
"weekday": "object",
"gametime": "object",
"away_team": "object",
"away_score": "float32",
"home_team": "object",
"home_score": "float32",
"location": "object",
"result": "float32",
"total": "float32",
"overtime": "float32",
"old_game_id": "float32",
"gsis": "float32",
"nfl_detail_id": "object",
"pfr": "object",
"pff": "float32",
"espn": "int32",
"ftn": "float32",
"away_rest": "int32",
"home_rest": "int32",
"away_moneyline": "float32",
"home_moneyline": "float32",
"spread_line": "float32",
"away_spread_odds": "float32",
"home_spread_odds": "float32",
"total_line": "float32",
"under_odds": "float32",
"over_odds": "float32",
"div_game": "int32",
"roof": "object",
"surface": "object",
"temp": "float32",
"wind": "float32",
"away_qb_id": "object",
"home_qb_id": "object",
"away_qb_name": "object",
"home_qb_name": "object",
"away_coach": "object",
"home_coach": "object",
"referee": "object",
"stadium_id": "object",
"stadium": "object"
}
}
```
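The "listed and typed" rule can be illustrated with plain pandas. This is a minimal sketch, not the package's own loader: `read_mapped_csv`, `FIELD_MAP`, and the inline CSV text are hypothetical names and data introduced for illustration, and the map is trimmed to three fields.

```python
import io

import pandas as pd

# Hypothetical, trimmed-down version of a config's "map": column name -> dtype.
FIELD_MAP = {"game_id": "object", "season": "int32", "home_score": "float32"}


def read_mapped_csv(source, field_map, engine="c"):
    """Read only the mapped columns and cast each one to its declared dtype."""
    # Assumed behavior for this sketch: a field listed without a type is an error.
    untyped = [col for col, dtype in field_map.items() if not dtype]
    if untyped:
        raise ValueError(f"fields missing a type: {untyped}")
    # usecols drops every column that is not in the map; dtype applies the types.
    return pd.read_csv(source, usecols=list(field_map), dtype=field_map, engine=engine)


# Stand-in for a downloaded CSV; "stadium" is not in the map, so it is not loaded.
csv_text = "game_id,season,home_score,stadium\n2023_01_DET_KC,2023,20,Arrowhead\n"
df = read_mapped_csv(io.StringIO(csv_text), FIELD_MAP)
print(df.dtypes)
```

Passing the same dictionary to both `usecols` and `dtype` keeps the list of loaded columns and their types in one place, which mirrors the single `map` block in the config.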
## Data
When a CSV is translated into a dataframe, a copy of the data is stored locally for cached retrieval based on SLAs and freshness. For data stored on GitHub, freshness is determined by either the last release or the last commit.
Presently, data is stored locally as CSVs.
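A commit-based freshness check boils down to comparing two timestamps: the `last_local_update` value from the map and the date of the most recent remote commit. The sketch below assumes that comparison and nothing more. `parse_iso` and `needs_refresh` are hypothetical helpers, the remote timestamp is passed in rather than fetched, and treating a timestamp without a timezone as UTC is an assumption of this example.

```python
from datetime import datetime, timezone


def parse_iso(ts):
    """Parse an ISO 8601 timestamp; naive values are treated as UTC (an assumption)."""
    parsed = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)


def needs_refresh(last_local_update, last_remote_commit):
    """Return True if no cached copy exists or the remote commit is newer than it."""
    if last_local_update is None:
        return True
    return parse_iso(last_remote_commit) > parse_iso(last_local_update)


# The local timestamp is the one from the sample config; the remote ones are made up.
print(needs_refresh("2023-12-16T22:42:41.040569", "2023-12-17T01:00:00Z"))
print(needs_refresh("2023-12-16T22:42:41.040569", "2023-12-01T00:00:00Z"))
```

When the check returns False, the locally cached CSV can be read instead of downloading the file again.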
## Assignments
Assignment is the pandas vernacular for mutate. In the DCM, "assignments" are functions that take a dataframe as input and return a mutated/assigned dataframe. Assignments can be added to the assignments folder and referenced by name in config files.
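As an illustration of the dataframe-in, dataframe-out contract, here is a minimal sketch of an assignment and a name-based lookup. `add_home_margin`, `ASSIGNMENTS`, and `apply_assignments` are hypothetical and are not part of the package; the way the package actually resolves names from the assignments folder may differ.

```python
import pandas as pd


def add_home_margin(df):
    """Hypothetical assignment: return a new dataframe with a derived column."""
    return df.assign(home_margin=df["home_score"] - df["away_score"])


# Hypothetical registry: names as they would appear in a config's "assignments" list.
ASSIGNMENTS = {"add_home_margin": add_home_margin}


def apply_assignments(df, names):
    """Apply each named assignment in order, feeding the result into the next one."""
    for name in names:
        df = ASSIGNMENTS[name](df)
    return df


games = pd.DataFrame({"home_score": [27.0, 17.0], "away_score": [20.0, 24.0]})
result = apply_assignments(games, ["add_home_margin"])
print(result)
```

Because `DataFrame.assign` returns a new object, the input dataframe is left unchanged, which makes assignments safe to chain.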
## Retrieval
To load data, pass a list of table names to the .load() function. The name passed for each table should match the name of its map file (i.e., passing 'pbp' retrieves whatever data is specified in 'pbp.json').
When this function is called, all freshness checks, caching, downloading, field typing, and mutations are handled automatically behind the scenes.
Raw data
{
"_id": null,
"home_page": null,
"name": "nfelodcm",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.11",
"maintainer_email": "Robert Greer <nfl@robbygreer.com>",
"keywords": "nfl, nflfastR, nfelo",
"author": null,
"author_email": "Robert Greer <nfl@robbygreer.com>",
"download_url": "https://files.pythonhosted.org/packages/2c/be/43b86c5fabafd10132fec3a71363efd8c11ca15cb86f9f44e045c05bb58b/nfelodcm-0.1.14.tar.gz",
"platform": null,
"description": "# nfelo DCM\n\nnfelo DCM is an abstraction layer for loading and saving NFL related CSVs stored on the web. DCM stands for Dataframe-CSV Mapping. The goal of the DCM is to get pandas dataframes of fresh data loaded in a way that balances simplicity, efficiency, and performance.\n\n```python\nimport nfelodcm\nimport pandas as pd\n\n## Load 2 dataframes\ndb = nfelodcm.load(['pbp', 'games'])\n## access the PBP dataframe ##\ndb['pbp']\n\n```\n\n## Maps\nMaps are config files that tell the dcm, where data CSVs are located, how they should be retrieved, and what fields to pull. Each CSV has its own config, where parameters can be set for things like freshness SLAs, CSV parsing engines, assignments (aka mutations).\n\nAn important characteristic of these maps, and overall framework, is that all fields must be 1) specified in the map and 2) typed. Fields not listed in the map will not be loaded. Fields untyped will throw an error.\n\nHere is a sample config:\n\n```javascript\n{\n \"name\": \"games\",\n \"description\": \"nflgamedata games\",\n \"last_local_update\": \"2023-12-16T22:42:41.040569\",\n \"download_url\": \"https://raw.githubusercontent.com/nflverse/nfldata/master/data/games.csv\",\n \"compression\": null,\n \"engine\": \"c\",\n \"freshness\": {\n \"type\": \"gh_commit\",\n \"gh_api_endpoint\": \"https://api.github.com/repos/nflverse/nfldata/commits\",\n \"gh_release_tag\": null,\n \"sla_seconds\": null\n },\n \"iter\": {\n \"type\": null,\n \"start\": null\n },\n \"assignments\": [\n \"game_id_repl\"\n ],\n \"map\": {\n \"game_id\": \"object\",\n \"season\": \"int32\",\n \"game_type\": \"object\",\n \"week\": \"int32\",\n \"gameday\": \"object\",\n \"weekday\": \"object\",\n \"gametime\": \"object\",\n \"away_team\": \"object\",\n \"away_score\": \"float32\",\n \"home_team\": \"object\",\n \"home_score\": \"float32\",\n \"location\": \"object\",\n \"result\": \"float32\",\n \"total\": \"float32\",\n \"overtime\": \"float32\",\n \"old_game_id\": 
\"float32\",\n \"gsis\": \"float32\",\n \"nfl_detail_id\": \"object\",\n \"pfr\": \"object\",\n \"pff\": \"float32\",\n \"espn\": \"int32\",\n \"ftn\": \"float32\",\n \"away_rest\": \"int32\",\n \"home_rest\": \"int32\",\n \"away_moneyline\": \"float32\",\n \"home_moneyline\": \"float32\",\n \"spread_line\": \"float32\",\n \"away_spread_odds\": \"float32\",\n \"home_spread_odds\": \"float32\",\n \"total_line\": \"float32\",\n \"under_odds\": \"float32\",\n \"over_odds\": \"float32\",\n \"div_game\": \"int32\",\n \"roof\": \"object\",\n \"surface\": \"object\",\n \"temp\": \"float32\",\n \"wind\": \"float32\",\n \"away_qb_id\": \"object\",\n \"home_qb_id\": \"object\",\n \"away_qb_name\": \"object\",\n \"home_qb_name\": \"object\",\n \"away_coach\": \"object\",\n \"home_coach\": \"object\",\n \"referee\": \"object\",\n \"stadium_id\": \"object\",\n \"stadium\": \"object\"\n }\n}\n```\n\n## Data\nWhen a CSV is translated into a Dataframe, a copy of the data is stored locally for cached retrieval based on SLAs and freshness. For data stored in github, freshness is determined by either the last release or last commit.\nPresently, data is stored locally as CSVs\n\n## Assignments\nAssignment is the pandas vernacular for mutate. In the DCM, \"Assignments\" reference functions that take a dataframe as an input and returns a mutated/assigned dataframe as its response. Assignments can be added to the assignments folder and referenced by name in config files.\n\n## Retrieval\nTo load data, pass an array of table names to the .load() function. The name passed for each table should match the name of the map file (ie passing 'pbp' would retrieve whatever data was specified in the 'pbp.json')\nWhen this function is called, all freshness checks, caching, downloading, field typing, and mutations are handled automatically behind the scenes.\n",
"bugtrack_url": null,
"license": "MIT",
"summary": "Python package for loading and caching CSVs hosted on github into pandas dataframes",
"version": "0.1.14",
"project_urls": {
"homepage": "https://github.com/greerreNFL/nfelodcm",
"repository": "https://github.com/greerreNFL/nfelodcm.git"
},
"split_keywords": [
"nfl",
" nflfastr",
" nfelo"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "c750bb56abde77d458e4708b7643b64b97b8a22afbedacb5342a8fd1dc01c14d",
"md5": "af0a2c50be07cecfd30cc10e2549c4d2",
"sha256": "b1b55107f1a10b16c5981c176a5bbe51086850e187ffcca96642a70783b49826"
},
"downloads": -1,
"filename": "nfelodcm-0.1.14-py3-none-any.whl",
"has_sig": false,
"md5_digest": "af0a2c50be07cecfd30cc10e2549c4d2",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.11",
"size": 34597,
"upload_time": "2024-10-14T03:52:45",
"upload_time_iso_8601": "2024-10-14T03:52:45.714566Z",
"url": "https://files.pythonhosted.org/packages/c7/50/bb56abde77d458e4708b7643b64b97b8a22afbedacb5342a8fd1dc01c14d/nfelodcm-0.1.14-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "2cbe43b86c5fabafd10132fec3a71363efd8c11ca15cb86f9f44e045c05bb58b",
"md5": "a1ad6e6fb0e1324dbde8066db7c76e43",
"sha256": "9d6c7a45e95a746a968d32c86c867abf9ef00c5e30494dac1f989ecb31235f79"
},
"downloads": -1,
"filename": "nfelodcm-0.1.14.tar.gz",
"has_sig": false,
"md5_digest": "a1ad6e6fb0e1324dbde8066db7c76e43",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.11",
"size": 22299,
"upload_time": "2024-10-14T03:52:47",
"upload_time_iso_8601": "2024-10-14T03:52:47.339448Z",
"url": "https://files.pythonhosted.org/packages/2c/be/43b86c5fabafd10132fec3a71363efd8c11ca15cb86f9f44e045c05bb58b/nfelodcm-0.1.14.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-10-14 03:52:47",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "greerreNFL",
"github_project": "nfelodcm",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "nfelodcm"
}