socio4health

Name	socio4health
Version	0.1.3
home_page	https://github.com/harmonize-tools/socio4health
Summary	Socio4health is a Python package for gathering and consolidating socio-demographic data.
upload_time	2025-07-14 17:46:04
maintainer	None
docs_url	None
author	Erick Lozano, Diego Irreño, Juan Montenegro, Ingrid Mora
requires_python	<4,>=3.10
license	None
keywords	extract transform load etl scraping relational census
requirements	requests Scrapy tqdm pyreadstat py7zr pandas openpyxl matplotlib numpy dask appdirs pyarrow deep_translator transformers torch pytest

# socio4health <a href='https://www.harmonize-tools.org/'><img src='https://harmonize-tools.github.io/harmonize-logo.png' align="right" height="139" /></a>

<!-- badges: start -->

[![Lifecycle:
experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental)
[![MIT
license](https://img.shields.io/badge/License-MIT-blue.svg)](https://github.com/harmonize-tools/socio4health/blob/main/LICENSE.md/)
[![GitHub
contributors](https://img.shields.io/github/contributors/harmonize-tools/socio4health)](https://github.com/harmonize-tools/socio4health/graphs/contributors)
![commits](https://badgen.net/github/commits/harmonize-tools/socio4health/main)
<!-- badges: end -->

## Overview
<p style="font-family: Arial, sans-serif; font-size: 14px;">
  socio4health is an extraction, transformation, and loading (ETL) and AI-assisted query and visualization (AI QV) tool. It simplifies collecting and merging data from multiple sources, with a focus on sociodemographic and census datasets from Colombia, Brazil, and Peru, into a unified relational database structure that can then be visualized or queried using natural language.
</p>

- Retrieve data from online sources through web scraping, as well as from local files.
- Read a range of data formats, including .csv, .xlsx, .xls, .txt, .sav, and compressed files.
- Consolidate extracted data into pandas DataFrames.
- Consolidate transformed data into a single relational database.
- Run queries and apply transformations to select data that meets specific criteria.
- Query data using natural language input, with answers ranging from single values to subsets.
- Create simple visualizations of data using natural language input.
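
As a rough illustration of the multi-format support listed above, the following sketch maps a file extension to the pandas reader that could load it. The `PANDAS_READERS` table and the `reader_name` and `read_table` helpers are hypothetical names introduced for this example; they are not part of the socio4health API and do not describe how the package is implemented internally.

```python
from pathlib import Path

# Hypothetical suffix-to-reader table; not taken from the socio4health source.
PANDAS_READERS = {
    ".csv": "read_csv",
    ".txt": "read_csv",    # assumes delimited text; pass sep=... if needed
    ".xlsx": "read_excel",
    ".xls": "read_excel",
    ".sav": "read_spss",   # pandas relies on pyreadstat for SPSS files
}


def reader_name(path):
    """Return the name of the pandas reader suited to the file's extension."""
    suffix = Path(path).suffix.lower()
    try:
        return PANDAS_READERS[suffix]
    except KeyError:
        raise ValueError(f"unsupported file type: {suffix or path}") from None


def read_table(path, **kwargs):
    """Load a supported file into a pandas DataFrame."""
    import pandas as pd  # imported lazily so reader_name works without pandas

    return getattr(pd, reader_name(path))(path, **kwargs)


print(reader_name("censo_2018.SAV"))
```

Keeping the dispatch in a plain dictionary makes it easy to see which formats are covered and to add a new one without touching the loading logic.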


## Dependencies

<table>
  <tr>
    <td align="center">
      <a href="https://pandas.pydata.org/" target="_blank">
        <img src="https://avatars.githubusercontent.com/u/21206976?s=280&v=4" height="50" alt="pandas logo">
      </a>
    </td>
    <td align="left">
      <strong>Pandas</strong><br>
      Pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool.<br>
    </td>
  </tr>
  <tr>
    <td align="center">
      <a href="https://numpy.org/" target="_blank">
        <img src="https://avatars.githubusercontent.com/u/288276?s=48&v=4" height="50" alt="numpy logo">
      </a>
    </td>
    <td align="left">
      <strong>Numpy</strong><br>
      The fundamental package for scientific computing with Python.<br>
    </td>
  </tr>
  <tr>
    <td align="center">
      <a href="https://scrapy.org/" target="_blank">
        <img src="https://avatars.githubusercontent.com/u/733635?s=48&v=4" height="50" alt="scrapy logo">
      </a>
    </td>
    <td align="left">
      <strong>Scrapy</strong><br>
      Framework for extracting the data you need from websites.<br>
    </td>
  </tr>
  <tr>
    <td align="center">
      <a href="https://pandas-ai.com/" target="_blank">
        <img src="https://avatars.githubusercontent.com/u/154438448?s=48&v=4" height="50" alt="pandasai logo">
      </a>
    </td>
    <td align="left">
      <strong>Pandasai</strong><br>
      Integrates generative artificial intelligence capabilities into pandas, making dataframes conversational.<br>
    </td>
  </tr>
</table>

- <a href="https://openpyxl.readthedocs.io/en/stable/">openpyxl</a>
- <a href="https://py7zr.readthedocs.io/en/latest/">py7zr</a>
- <a href="https://pypi.org/project/pyreadstat/">pyreadstat</a>
- <a href="https://tqdm.github.io/">tqdm</a>
- <a href="https://requests.readthedocs.io/en/latest/">requests</a>

## Installation

You can install the latest release of the package from PyPI using `pip`:

```bash
# Install using pip
pip install socio4health
```
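
To confirm that the installation succeeded, one option is to ask Python for the installed distribution's version. This is a minimal sketch using only the standard library; `installed_version` is a helper name chosen for this example, and `socio4health` is assumed to be the distribution name, as listed in the package metadata.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version("socio4health")
print(found if found else "socio4health is not installed in this environment")
```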

## How to Use it

To use the socio4health package, follow these steps:

1. Import the package in your Python script:

   ```python
   from socio4health import Harmonizer
   ```

2. Create an instance of the `Harmonizer` class:

   ```python
   harmonizer = Harmonizer()
   ```

3. Extract data from online sources and create a list of data information:

   ```python
   url = 'https://www.example.com'
   depth = 0
   ext = 'csv'
   list_datainfo = harmonizer.extract(url=url, depth=depth, ext=ext)
   harmonizer = Harmonizer(list_datainfo)
   ```

4. Load the data from the list of data information and merge it into a relational database:

   ```python
   results = harmonizer.load()
   ```

5. Import the modifier module and create an instance of the `Modifier` class:

   ```python
   from socio4health.db.modifier import Modifier
   modifier = Modifier(db_path='data/output/socio4health.db')
   ```
   
6. Perform modifications, starting by listing the tables in the database:

   ```python
   tables = modifier.get_tables()
   print(tables)
   ```
   
7. Import the querier module and create an instance of the `Querier` class:

   ```python
   from socio4health.db.querier import Querier
   querier = Querier(db_path='data/output/socio4health.db')
   ```

8. Perform queries:

   ```python
   df = querier.select(table="Estructura CHC_2017").execute()
   print(df)
   ```
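
Steps 6 and 8 list tables and select from a table whose name contains a space. The sketch below shows the equivalent operations with the standard-library `sqlite3` module, under the assumption that the `.db` file is a SQLite database; the README does not confirm this. The in-memory database, its columns, and the helpers `list_tables` and `select_all` are made up for illustration and are not part of socio4health.

```python
import sqlite3


def list_tables(conn):
    """Return the names of all tables in a SQLite database."""
    cur = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [row[0] for row in cur.fetchall()]


def select_all(conn, table):
    """Return every row of `table`, quoting the name so spaces are allowed."""
    quoted = '"' + table.replace('"', '""') + '"'
    return conn.execute(f"SELECT * FROM {quoted}").fetchall()


# Stand-in in-memory database; the schema and row are invented for this example.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "Estructura CHC_2017" (id INTEGER, nombre TEXT)')
conn.execute("INSERT INTO \"Estructura CHC_2017\" VALUES (1, 'ejemplo')")

tables = list_tables(conn)
rows = select_all(conn, "Estructura CHC_2017")
print(tables)
print(rows)
```

Quoting the identifier matters here because an unquoted `Estructura CHC_2017` would be parsed as two separate tokens in SQL.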

## Resources

<details>
<summary>
Package Website
</summary>

The [socio4health website](https://ersebreck.github.io/Nyctibius/) includes a function reference, a model outline, and case studies using the package. The site mainly concerns the release version, but you can also find documentation for the latest development version.

</details>
<details>
<summary>
Organisation Website
</summary>

[Harmonize](https://www.harmonize-tools.org/) is an international project that develops cost-effective and reproducible digital tools for stakeholders in hotspots affected by a changing climate in Latin America & the Caribbean (LAC), including cities, small islands, highlands, and the Amazon rainforest.

The project consists of resources and [tools](https://harmonize-tools.github.io/) developed in conjunction with different teams from Brazil, Colombia, Dominican Republic, Peru and Spain.

</details>

## Organizations

<table>
  <tr>
    <td align="center">
      <a href="https://www.bsc.es/" target="_blank">
        <img src="https://imgs.search.brave.com/t_FUOTCQZmDh3ddbVSX1LgHYq4mzCxvVA8U_YHywMTc/rs:fit:500:0:0/g:ce/aHR0cHM6Ly9zb21t/YS5lcy93cC1jb250/ZW50L3VwbG9hZHMv/MjAyMi8wNC9CU0Mt/Ymx1ZS1zbWFsbC5q/cGc" height="64" alt="bsc logo">
      </a>
    </td>
    <td align="center">
      <a href="https://uniandes.edu.co/" target="_blank">
        <img src="https://uniandes.edu.co/sites/default/files/logo-uniandes.png" height="64" alt="uniandes logo">
      </a>
    </td>
  </tr>
</table>


## Authors / Contact information

For questions or feedback, contact the developers through their GitHub profiles.
<br>
<br>
<a href="https://github.com/dirreno">
  <img src="https://avatars.githubusercontent.com/u/39099417?v=4" style="width: 50px; height: auto;" />
</a>
<span style="display: flex; align-items: center; margin-left: 10px;">
  <strong>Diego Irreño</strong> (developer)
</span>
<br>
<a href="https://github.com/Ersebreck">
  <img src="https://avatars.githubusercontent.com/u/81669194?v=4" style="width: 50px; height: auto;" />
</a>
<span style="display: flex; align-items: center; margin-left: 10px;">
  <strong>Erick Lozano</strong> (developer)
</span>

Raw data

{
    "_id": null,
    "home_page": "https://github.com/harmonize-tools/socio4health",
    "name": "socio4health",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<4,>=3.10",
    "maintainer_email": null,
    "keywords": "extract transform load etl scraping relational census",
    "author": "Erick Lozano, Diego Irre\u00f1o, Juan Montenegro, Ingrid Mora",
    "author_email": null,
    "download_url": "https://files.pythonhosted.org/packages/2d/32/9ef766a0b062793be917d33ed8b3b21d927cdb76500a569ad59ef7e3bbfc/socio4health-0.1.3.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "Socio4health is a Python package for gathering and consolidating socio-demographic data.",
    "version": "0.1.3",
    "project_urls": {
        "Bug Reports": "https://github.com/harmonize-tools/socio4health/issues",
        "Homepage": "https://github.com/harmonize-tools/socio4health",
        "Source": "https://github.com/harmonize-tools/socio4health/"
    },
    "split_keywords": [
        "extract",
        "transform",
        "load",
        "etl",
        "scraping",
        "relational",
        "census"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "97021d775684249a014ea075174975075a8cab9305043528b31a4583159ecd05",
                "md5": "2349b9da1ce64ab691667ccfe4e7d8a0",
                "sha256": "2294f63ffe6c84361a1b10796187adf9bedeb1683533b3a95a185478eddfc2e2"
            },
            "downloads": -1,
            "filename": "socio4health-0.1.3-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "2349b9da1ce64ab691667ccfe4e7d8a0",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<4,>=3.10",
            "size": 26695,
            "upload_time": "2025-07-14T17:46:01",
            "upload_time_iso_8601": "2025-07-14T17:46:01.679694Z",
            "url": "https://files.pythonhosted.org/packages/97/02/1d775684249a014ea075174975075a8cab9305043528b31a4583159ecd05/socio4health-0.1.3-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "2d329ef766a0b062793be917d33ed8b3b21d927cdb76500a569ad59ef7e3bbfc",
                "md5": "feebbb1024bfbb713f51ad9825cadc26",
                "sha256": "d42a10e30cff2171fd1e8d4e4a8d7af43d9f9e979d9e8893bac36f783c17941e"
            },
            "downloads": -1,
            "filename": "socio4health-0.1.3.tar.gz",
            "has_sig": false,
            "md5_digest": "feebbb1024bfbb713f51ad9825cadc26",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<4,>=3.10",
            "size": 31224,
            "upload_time": "2025-07-14T17:46:04",
            "upload_time_iso_8601": "2025-07-14T17:46:04.113441Z",
            "url": "https://files.pythonhosted.org/packages/2d/32/9ef766a0b062793be917d33ed8b3b21d927cdb76500a569ad59ef7e3bbfc/socio4health-0.1.3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-07-14 17:46:04",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "harmonize-tools",
    "github_project": "socio4health",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [
        {
            "name": "requests",
            "specs": [
                [
                    "~=",
                    "2.31.0"
                ]
            ]
        },
        {
            "name": "Scrapy",
            "specs": [
                [
                    "~=",
                    "2.11.1"
                ]
            ]
        },
        {
            "name": "tqdm",
            "specs": [
                [
                    "~=",
                    "4.66.1"
                ]
            ]
        },
        {
            "name": "pyreadstat",
            "specs": [
                [
                    "~=",
                    "1.2.6"
                ]
            ]
        },
        {
            "name": "py7zr",
            "specs": [
                [
                    "~=",
                    "0.20.8"
                ]
            ]
        },
        {
            "name": "pandas",
            "specs": []
        },
        {
            "name": "openpyxl",
            "specs": [
                [
                    "~=",
                    "3.1.2"
                ]
            ]
        },
        {
            "name": "matplotlib",
            "specs": []
        },
        {
            "name": "numpy",
            "specs": []
        },
        {
            "name": "dask",
            "specs": []
        },
        {
            "name": "appdirs",
            "specs": []
        },
        {
            "name": "pyarrow",
            "specs": []
        },
        {
            "name": "deep_translator",
            "specs": []
        },
        {
            "name": "transformers",
            "specs": []
        },
        {
            "name": "torch",
            "specs": []
        },
        {
            "name": "pytest",
            "specs": []
        }
    ],
    "lcname": "socio4health"
}
