# socio4health <a href='https://www.harmonize-tools.org/'><img src='https://harmonize-tools.github.io/harmonize-logo.png' align="right" height="139" /></a>
<!-- badges: start -->
[![Lifecycle:
experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental)
[![MIT
license](https://img.shields.io/badge/License-MIT-blue.svg)](https://github.com/harmonize-tools/socio4health/blob/main/LICENSE.md/)
[![GitHub
contributors](https://img.shields.io/github/contributors/harmonize-tools/socio4health)](https://github.com/harmonize-tools/socio4health/graphs/contributors)
![commits](https://badgen.net/github/commits/harmonize-tools/socio4health/main)
<!-- badges: end -->
## Overview
<p style="font-family: Arial, sans-serif; font-size: 14px;">
socio4health is an extraction, transformation, and loading (ETL) and AI-assisted query and visualization (AI QV) tool. It simplifies collecting sociodemographic and census data from multiple sources, with a focus on Colombia, Brazil, and Peru, merging it into a unified relational database, and querying or visualizing the result using natural language.
</p>
- Retrieve data from online sources through web scraping, as well as from local files.
- Read a variety of data formats, including .csv, .xlsx, .xls, .txt, .sav, and compressed files.
- Consolidate extracted data into pandas DataFrames.
- Consolidate transformed data into a cohesive relational database.
- Run precise queries and apply transformations to meet specific criteria.
- Query data using natural language input, with answers ranging from single values to subsets.
- Create simple visualizations of data using natural language input.
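
The scraping feature boils down to finding links to data files on a web page. socio4health relies on Scrapy for this; purely as a stdlib-only illustration, the sketch below collects the links in an HTML string whose URL path ends with a given extension. It does not download pages or follow links, and `find_data_links` is a hypothetical helper, not part of the package API.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class _LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def find_data_links(html, base_url, ext):
    """Return absolute URLs of links whose path ends with `.ext` (hypothetical helper)."""
    parser = _LinkCollector()
    parser.feed(html)
    suffix = "." + ext.lower().lstrip(".")
    links = []
    for href in parser.hrefs:
        absolute = urljoin(base_url, href)
        # Check the URL path so a query string does not hide the extension
        if urlparse(absolute).path.lower().endswith(suffix):
            links.append(absolute)
    return links


page = ('<a href="/data/census_2018.csv">CSV</a> '
        '<a href="about.html">About</a> '
        '<a href="files/survey.ZIP">ZIP</a>')
print(find_data_links(page, "https://www.example.com/", "csv"))
# ['https://www.example.com/data/census_2018.csv']
```

Matching is case-insensitive, so `survey.ZIP` would be returned for `ext="zip"`.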
## Dependencies
<table>
<tr>
<td align="center">
<a href="https://pandas.pydata.org/" target="_blank">
<img src="https://avatars.githubusercontent.com/u/21206976?s=280&v=4" height="50" alt="pandas logo">
</a>
</td>
<td align="left">
<strong>Pandas</strong><br>
Pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool.<br>
</td>
</tr>
<tr>
<td align="center">
<a href="https://numpy.org/" target="_blank">
<img src="https://avatars.githubusercontent.com/u/288276?s=48&v=4" height="50" alt="numpy logo">
</a>
</td>
<td align="left">
<strong>NumPy</strong><br>
The fundamental package for scientific computing with Python.<br>
</td>
</tr>
<tr>
<td align="center">
<a href="https://scrapy.org/" target="_blank">
<img src="https://avatars.githubusercontent.com/u/733635?s=48&v=4" height="50" alt="scrapy logo">
</a>
</td>
<td align="left">
<strong>Scrapy</strong><br>
Framework for extracting the data you need from websites.<br>
</td>
</tr>
<tr>
<td align="center">
<a href="https://pandas-ai.com/" target="_blank">
<img src="https://avatars.githubusercontent.com/u/154438448?s=48&v=4" height="50" alt="pandasai logo">
</a>
</td>
<td align="left">
<strong>PandasAI</strong><br>
Integrates generative artificial intelligence capabilities into pandas, making dataframes conversational.<br>
</td>
</tr>
</table>
- <a href="https://openpyxl.readthedocs.io/en/stable/">openpyxl</a>
- <a href="https://py7zr.readthedocs.io/en/latest/">py7zr</a>
- <a href="https://pypi.org/project/pyreadstat/">pyreadstat</a>
- <a href="https://tqdm.github.io/">tqdm</a>
- <a href="https://requests.readthedocs.io/en/latest/">requests</a>
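
Of the libraries above, py7zr handles `.7z` archives. For the `.zip` case, a minimal sketch of the compressed-file step using only Python's standard `zipfile` module might look like this; `extract_matching` is a hypothetical helper, and socio4health's own extraction logic may work differently.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_matching(archive_path, dest_dir, extensions):
    """Extract members of a .zip whose names end with one of `extensions`.

    Returns the paths of the extracted files (hypothetical helper).
    """
    wanted = tuple("." + e.lower().lstrip(".") for e in extensions)
    dest = Path(dest_dir)
    extracted = []
    with zipfile.ZipFile(archive_path) as zf:
        for member in zf.namelist():
            # Skip directory entries and files with other extensions
            if member.endswith("/") or not member.lower().endswith(wanted):
                continue
            # ZipFile.extract strips absolute paths and ".." components
            extracted.append(Path(zf.extract(member, dest)))
    return extracted


# Demo: build a small archive in a temporary directory, then extract only CSVs
with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "survey.zip"
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("data/persons.csv", "id,age\n1,34\n")
        zf.writestr("README.txt", "notes")
    files = extract_matching(archive, Path(tmp) / "out", ["csv"])
    print([f.name for f in files])  # ['persons.csv']
```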
## Installation
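socio4health is published on PyPI (version 0.1.1 at the time of writing) and requires Python 3.7 or later, below 4.0. Install it with pip, or install the latest development version directly from GitHub:
```bash
# Install the released version from PyPI
pip install socio4health

# Or install the latest development version from GitHub
pip install git+https://github.com/harmonize-tools/socio4health.git
```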
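
After installing, you can confirm that socio4health and the dependencies listed above are importable with the stdlib-only check below. It assumes each project's import name matches the names used here (for example `pandasai` for PandasAI) and only checks that the module can be found, not that versions are compatible; `check_modules` is a hypothetical helper.

```python
import importlib.util

# Assumed top-level import names of socio4health and its dependencies
MODULES = ["socio4health", "pandas", "numpy", "scrapy", "pandasai",
           "openpyxl", "py7zr", "pyreadstat", "tqdm", "requests"]


def check_modules(names):
    """Map each module name to True if it can be found on the import path."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


for name, found in check_modules(MODULES).items():
    print(f"{name:12} {'ok' if found else 'MISSING'}")
```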
## How to use it
To use the socio4health package, follow these steps:
1. Import the package in your Python script:
```python
from socio4health import Harmonizer
```
2. Create an instance of the `Harmonizer` class:
```python
harmonizer = Harmonizer()
```
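3. Extract data from an online source to obtain a list of data-information objects, then pass that list to a new `Harmonizer`:
```python
url = 'https://www.example.com'  # page to scrape for data files
depth = 0                        # crawl depth (assumed: levels of links to follow from `url`)
ext = 'csv'                      # file extension to look for
list_datainfo = harmonizer.extract(url=url, depth=depth, ext=ext)
# Re-create the Harmonizer with the extracted data information so it can be loaded
harmonizer = Harmonizer(list_datainfo)
```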
4. Load the extracted data and merge it into a relational database:
```python
results = harmonizer.load()
```
5. Import the modifier module and create an instance of the `Modifier` class:
```python
from socio4health.db.modifier import Modifier
modifier = Modifier(db_path='data/output/socio4health.db')  # adjust to where your database was written
```
6. Inspect or modify the database, for example by listing its tables:
```python
tables = modifier.get_tables()
print(tables)
```
7. Import the querier module and create an instance of the `Querier` class:
```python
from socio4health.db.querier import Querier
querier = Querier(db_path='data/output/socio4health.db')
```
8. Perform queries:
```python
df = querier.select(table="Estructura CHC_2017").execute()
print(df)
```
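
The `.db` paths above suggest the database written by `harmonizer.load()` is a SQLite file; assuming that holds, you can also inspect it with Python's built-in `sqlite3` module, independently of `Querier`. Table names such as `Estructura CHC_2017` contain spaces, so they must be quoted as SQL identifiers. The helpers below are hypothetical and not part of socio4health.

```python
import sqlite3


def quote_identifier(name):
    """Quote a SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def list_tables(conn):
    """Return the names of all user tables in a SQLite database."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    return [row[0] for row in rows]


def preview_table(conn, table, limit=5):
    """Return up to `limit` rows from `table`; the table name is quoted, LIMIT is bound."""
    sql = f"SELECT * FROM {quote_identifier(table)} LIMIT ?"
    return conn.execute(sql, (limit,)).fetchall()


# Demo on an in-memory database; for a real file use, for example,
# sqlite3.connect("data/output/socio4health.db")
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "Estructura CHC_2017" (id INTEGER, region TEXT)')
conn.executemany('INSERT INTO "Estructura CHC_2017" VALUES (?, ?)',
                 [(1, "Bogotá"), (2, "Lima")])
print(list_tables(conn))  # ['Estructura CHC_2017']
print(preview_table(conn, "Estructura CHC_2017", limit=1))
conn.close()
```

Identifiers cannot be bound as query parameters in SQLite, which is why the table name is quoted while the `LIMIT` value is passed as a parameter.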
## Resources
<details>
<summary>
Package Website
</summary>
The [socio4health website](https://ersebreck.github.io/Nyctibius/) includes a function reference, a model outline, and case studies using the package. The site mainly covers the release version, but you can also find documentation for the latest development version.
</details>
<details>
<summary>
Organisation Website
</summary>
[Harmonize](https://www.harmonize-tools.org/) is an international project that develops cost-effective and reproducible digital tools for stakeholders in hotspots affected by a changing climate in Latin America and the Caribbean (LAC), including cities, small islands, highlands, and the Amazon rainforest.
The project consists of resources and [tools](https://harmonize-tools.github.io/) developed in conjunction with teams from Brazil, Colombia, the Dominican Republic, Peru, and Spain.
</details>
## Organizations
<table>
<tr>
<td align="center">
<a href="https://www.bsc.es/" target="_blank">
<img src="https://imgs.search.brave.com/t_FUOTCQZmDh3ddbVSX1LgHYq4mzCxvVA8U_YHywMTc/rs:fit:500:0:0/g:ce/aHR0cHM6Ly9zb21t/YS5lcy93cC1jb250/ZW50L3VwbG9hZHMv/MjAyMi8wNC9CU0Mt/Ymx1ZS1zbWFsbC5q/cGc" height="64" alt="bsc logo">
</a>
</td>
<td align="center">
<a href="https://uniandes.edu.co/" target="_blank">
<img src="https://uniandes.edu.co/sites/default/files/logo-uniandes.png" height="64" alt="uniandes logo">
</a>
</td>
</tr>
</table>
## Authors / Contact information
socio4health is developed by the authors listed below. For questions or feedback, please open an issue in the [GitHub repository](https://github.com/harmonize-tools/socio4health/issues).
<br>
<br>
<a href="https://github.com/dirreno">
<img src="https://avatars.githubusercontent.com/u/39099417?v=4" style="width: 50px; height: auto;" />
</a>
<span style="display: flex; align-items: center; margin-left: 10px;">
<strong>Diego Irreño</strong> (developer)
</span>
<br>
<a href="https://github.com/Ersebreck">
<img src="https://avatars.githubusercontent.com/u/81669194?v=4" style="width: 50px; height: auto;" />
</a>
<span style="display: flex; align-items: center; margin-left: 10px;">
<strong>Erick Lozano</strong> (developer)
</span>