# Nyctibius - Streamlining sociodemographic data harmonization <img src="docs/img/ny_logo.png" align="right" width="240" />
<!-- badges: start -->
[![en](https://img.shields.io/badge/lang-en-red.svg)](https://github.com/biomac-lab/harmonize/blob/main/README.md)
[![es](https://img.shields.io/badge/lang-es-yellow.svg)](https://github.com/biomac-lab/harmonize/blob/main/README.es.md)
[![License:
MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/license/mit/)
[![lifecycle-concept](https://raw.githubusercontent.com/reconverse/reconverse.github.io/master/images/badge-concept.svg)](https://www.reconverse.org/lifecycle.html#concept)
<!-- badges: end -->
Nyctibius is a Python package that streamlines the task of gathering sociodemographic data from various sources and consolidating it into a single relational database. It lets users combine custom datasets from diverse sociodemographic sources, so they can work with up-to-date, comprehensive information, and it simplifies building a harmonized data repository for data management and analysis across domains.
## Features
- **Extraction:**
  - Retrieve data from online sources through web scraping, as well as from local files.
  - Read several data formats, including `.csv`, `.xlsx`, `.xls`, `.txt`, `.sav`, and compressed files.
- **Transformation:**
  - Consolidate extracted data into pandas DataFrames.
  - Optimize the transformation of large files.
  - Use parallel processing for large files.
  - Use efficient data structures to reduce the memory footprint.
  - Manage data inconsistencies and discrepancies to improve accuracy.
  - Apply anomaly detection algorithms.
- **Load:**
  - Consolidate transformed data into a single relational database.
- **Query:**
  - Run precise queries and apply transformations to select data that meet specific criteria.
- **AI Query & Visualization:**
  - Query data using natural language input (answers range from single values to data subsets).
  - Create simple visualizations of data from natural language input.
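The extract, transform, load, and query stages listed above can be pictured with plain pandas and the standard `sqlite3` module. This is not Nyctibius's implementation; it is a minimal sketch of the same flow, assuming CSV inputs and a SQLite target. The `extract`, `transform`, and `load` helpers, the table names, and the column names are all hypothetical.

```python
import io
import sqlite3

import pandas as pd

# Two small CSV "sources" standing in for extracted files.
# Table and column names here are hypothetical, for illustration only.
SOURCE_A = "Municipio,Poblacion\nBogota,100\nCali,50\n"
SOURCE_B = "Municipio,Hogares\nBogota,40\nCali,20\n"


def extract(raw_csv: str) -> pd.DataFrame:
    """Extract: read one CSV source into a DataFrame."""
    return pd.read_csv(io.StringIO(raw_csv))


def transform(frame: pd.DataFrame) -> pd.DataFrame:
    """Transform: lower-case column names and drop exact duplicate rows."""
    return frame.rename(columns=str.lower).drop_duplicates()


def load(frames: dict, conn: sqlite3.Connection) -> None:
    """Load: write each DataFrame to its own table in the database."""
    for table_name, frame in frames.items():
        frame.to_sql(table_name, conn, index=False, if_exists="replace")


conn = sqlite3.connect(":memory:")
load(
    {
        "poblacion": transform(extract(SOURCE_A)),
        "hogares": transform(extract(SOURCE_B)),
    },
    conn,
)

# Query: join the two harmonized tables on their shared key.
result = pd.read_sql_query(
    "SELECT p.municipio, p.poblacion, h.hogares "
    "FROM poblacion AS p JOIN hogares AS h USING (municipio) "
    "ORDER BY p.municipio",
    conn,
)
print(result)
```

Keeping each source in its own table and joining at query time mirrors the idea of a relational repository: sources stay separate on disk, and the harmonized view is produced by the query.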
## Who should use Nyctibius?
Nyctibius is aimed at data analysts, scientists, and researchers who handle large volumes of data from varied sources and need a streamlined way to consolidate, query, and visualize it. It is also useful for developers whose projects require integrating disparate datasets into a single, manageable format. Business intelligence professionals and decision-makers can use its natural language queries and visualizations to make complex data more accessible and actionable. In short, Nyctibius suits anyone who wants to simplify a data workflow from extraction to visualization and use AI for natural language querying.
## Installation
For full documentation, please refer to the [Nyctibius documentation](https://drive.google.com/file/d/1f2im1gzYpxrvfmiPllAvYWC21-ZzYLNg/view?usp=sharing).
You can install Nyctibius with pip. The package requires Python 3.7 or later (and below 4.0).
```shell
pip install nyctibius
```
## Usage
To use the Nyctibius package, follow these steps:
1. Import the package in your Python script:
```python
from nyctibius import Harmonizer
```
2. Create an instance of the `Harmonizer` class:
```python
harmonizer = Harmonizer()
```
3. Extract data from online sources and create a list of data information:
```python
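url = 'https://www.example.com'  # placeholder: page that links to the data files
depth = 0                        # link depth to crawl from `url` (assumed: 0 = only this page)
ext = 'csv'                      # file extension of the files to collect
list_datainfo = harmonizer.extract(url=url, depth=depth, ext=ext)
harmonizer = Harmonizer(list_datainfo)  # new instance built from the extracted data information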
```
4. Load the data from the list of data information and merge it into a relational database:
```python
results = harmonizer.load()
```
5. Import the modifier module and create an instance of the `Modifier` class:
```python
from nyctibius.db.modifier import Modifier
modifier = Modifier(db_path='data/output/nyctibius.db')  # point db_path at the database created in step 4
```
6. Perform modifications. For example, list the tables in the database:
```python
tables = modifier.get_tables()
print(tables)
```
7. Import the querier module and create an instance of the `Querier` class:
```python
from nyctibius.db.querier import Querier
querier = Querier(db_path='data/output/nyctibius.db')
```
8. Perform queries:
```python
df = querier.select(table="Estructura CHC_2017").execute()
print(df)
```
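Steps 6 and 8 inspect the database through `Modifier` and `Querier`. If the generated `nyctibius.db` is a SQLite file (an assumption based on the `.db` extension; this README does not name the database engine), it can also be inspected with Python's standard `sqlite3` module. The sketch below lists tables and reads one whose name contains a space, which requires quoting the identifier. The helpers `list_tables` and `read_table` and the columns of the demo table are hypothetical; the table name is the one used in step 8.

```python
import sqlite3

import pandas as pd


def list_tables(conn: sqlite3.Connection) -> list:
    """Return the names of the tables stored in a SQLite database."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


def read_table(conn: sqlite3.Connection, table: str) -> pd.DataFrame:
    """Read a whole table, double-quoting its name so spaces are allowed."""
    quoted = '"' + table.replace('"', '""') + '"'
    return pd.read_sql_query(f"SELECT * FROM {quoted}", conn)


# Demo against an in-memory database with hypothetical columns. To inspect
# the real file, use sqlite3.connect("data/output/nyctibius.db") instead
# (path assumed from the usage steps above).
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "Estructura CHC_2017" (id INTEGER, valor TEXT)')
conn.executemany(
    'INSERT INTO "Estructura CHC_2017" VALUES (?, ?)', [(1, "a"), (2, "b")]
)

print(list_tables(conn))
print(read_table(conn, "Estructura CHC_2017"))
```

Quoting the name (and doubling any embedded `"`) is how SQL identifiers with spaces are written; passing the name as a `?` parameter would not work, because placeholders only bind values, not table names.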
## Supported Data Sources
The package supports the following sources:
- Colombian microdata links from the National Administrative Department of Statistics (DANE)
- Local files
- Open data sources
Please note that accessing data from some of these sources may require authentication or specific credentials. Make sure you have the necessary permissions before using the library.
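For the local-files source, many downloaded microdata packages arrive as compressed archives. The sketch below is not Nyctibius's loader; it only shows, with the standard `zipfile` module and pandas, how the CSV members of a `.zip` archive can be read into DataFrames. It assumes a zip archive with comma-separated files; other formats the package lists (for example `.sav` or `.7z`) would need different readers. The helper `read_csvs_from_zip` and the member names are hypothetical.

```python
import io
import zipfile

import pandas as pd


def read_csvs_from_zip(archive) -> dict:
    """Return {member name: DataFrame} for every .csv member of a zip archive.

    `archive` may be a file path or a binary file-like object.
    """
    frames = {}
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            if name.lower().endswith(".csv"):
                with zf.open(name) as member:
                    frames[name] = pd.read_csv(member)
    return frames


# Build a small archive in memory so the example runs without any files.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("personas.csv", "id,edad\n1,34\n2,27\n")
    zf.writestr("LEEME.txt", "not a data file")
buffer.seek(0)

frames = read_csvs_from_zip(buffer)
print(list(frames))  # only the CSV member is read
print(frames["personas.csv"])
```

Reading members straight from the archive avoids extracting everything to disk first, which matters when an archive bundles many large files.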
## License
The Nyctibius package is open-source and released under the [MIT License](https://opensource.org/licenses/MIT). Feel free to use, modify, and distribute this library in accordance with the terms of the license.
## Acknowledgements
We would like to thank the following entities for providing data and financial support for the development of this package:
- National Administrative Department of Statistics (DANE)
- Barcelona Supercomputing Center (BSC)
- Universidad de los Andes
## Contact
For questions, suggestions, or feedback about the package, please contact:
- Erick Lozano: es.lozano@uniandes.edu.co
- Diego Irreño: dirreno@unal.edu.co
## Disclaimer
This library is not officially affiliated with or endorsed by any of the organizations mentioned above. Data retrieved through this library comes from publicly available sources and may not always be current or accurate. Please verify the information with the respective official sources for critical use cases.
Raw data
{
"_id": null,
"home_page": "https://github.com/Ersebreck/Nyctibius",
"name": "nyctibius",
"maintainer": null,
"docs_url": null,
"requires_python": "<4,>=3.7",
"maintainer_email": null,
"keywords": "extract transform load etl scraping relational census",
"author": "Erick Lozano, Diego Irre\u00f1o y Cristian Amaya",
"author_email": null,
"download_url": "https://files.pythonhosted.org/packages/3f/bd/eed2da5b1df6f879db547e42e83cf726ebde5949337a831f8198c6f4fe2a/nyctibius-0.0.13.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "Nyctibius is a Python package for gathering and consolidating socio-demographic data.",
"version": "0.0.13",
"project_urls": {
"Bug Reports": "https://github.com/Ersebreck/Nyctibius/issues",
"Homepage": "https://github.com/Ersebreck/Nyctibius",
"Source": "https://github.com/Ersebreck/Nyctibius/"
},
"split_keywords": [
"extract",
"transform",
"load",
"etl",
"scraping",
"relational",
"census"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "98a6af19808cd071f56bc7c9e28d1a0f0d19efbdbc291a395c0fc2412f30b47b",
"md5": "567288ee94d0e62bfeb088a03d4be787",
"sha256": "919d6552e9f983989d932e92c2cc8b91a1607a83c32fff196f7bef54eea8a9bc"
},
"downloads": -1,
"filename": "nyctibius-0.0.13-py3-none-any.whl",
"has_sig": false,
"md5_digest": "567288ee94d0e62bfeb088a03d4be787",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<4,>=3.7",
"size": 28461,
"upload_time": "2024-04-18T02:37:38",
"upload_time_iso_8601": "2024-04-18T02:37:38.249404Z",
"url": "https://files.pythonhosted.org/packages/98/a6/af19808cd071f56bc7c9e28d1a0f0d19efbdbc291a395c0fc2412f30b47b/nyctibius-0.0.13-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "3fbdeed2da5b1df6f879db547e42e83cf726ebde5949337a831f8198c6f4fe2a",
"md5": "38e29663ef3901eb3e31e77c7dbfd6a6",
"sha256": "c1c64d977a4333d74a717e21ce5ecd04eb1cb7a3622575bba168c09f24510d43"
},
"downloads": -1,
"filename": "nyctibius-0.0.13.tar.gz",
"has_sig": false,
"md5_digest": "38e29663ef3901eb3e31e77c7dbfd6a6",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<4,>=3.7",
"size": 29550,
"upload_time": "2024-04-18T02:37:40",
"upload_time_iso_8601": "2024-04-18T02:37:40.303309Z",
"url": "https://files.pythonhosted.org/packages/3f/bd/eed2da5b1df6f879db547e42e83cf726ebde5949337a831f8198c6f4fe2a/nyctibius-0.0.13.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-04-18 02:37:40",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "Ersebreck",
"github_project": "Nyctibius",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [
{
"name": "pandas",
"specs": [
[
"~=",
"2.0.3"
]
]
},
{
"name": "requests",
"specs": [
[
"~=",
"2.31.0"
]
]
},
{
"name": "Scrapy",
"specs": [
[
"~=",
"2.11.1"
]
]
},
{
"name": "tqdm",
"specs": [
[
"~=",
"4.66.1"
]
]
},
{
"name": "pyreadstat",
"specs": [
[
"~=",
"1.2.6"
]
]
},
{
"name": "py7zr",
"specs": [
[
"~=",
"0.20.8"
]
]
},
{
"name": "pandasai",
"specs": [
[
"~=",
"2.0.30"
]
]
},
{
"name": "openpyxl",
"specs": [
[
"~=",
"3.1.2"
]
]
},
{
"name": "matplotlib",
"specs": []
},
{
"name": "numpy",
"specs": []
}
],
"lcname": "nyctibius"
}