# HuDa — Humanitarian Data Library
HuDa is a practical Python library for humanitarian data workflows. It provides simple, consistent functions to open, clean, transform, validate, analyze, map, visualize, automate, and share humanitarian datasets.
- Focused on survey, 5W, monitoring, and geo-enabled data
- Consistent API patterns across modules
- Returns lightweight specs for rendering/exports where appropriate
## Features
- **Opening**: CSV/Excel/JSON/SQL/API connectors
- **Cleaning**: normalize numbers/dates/text, translate categories, deduplicate, geocode
- **Transformation**: reshape, aggregate, indexes, ratios, growth, standardization
- **Validation & Quality**: ranges, missing/mandatory, country codes, dates, profiling
- **Geospatial**: folium maps, choropleths, overlays, heatmaps, clusters
- **Analysis**: correlation, time series, regression, PCA, coverage gaps (selected utilities)
- **Visualization**: chart specs for bar/line/pie/hist/box/heatmap, dashboards
- **Automation**: reports, snapshots, change detection (specs)
- **Interoperability**: export specs (CSV/Excel/JSON/Parquet/SQL/Stata/SPSS/GIS/HDX/HTML/API)
## Installation
HuDa is published on PyPI as `huda`.
```bash
pip install huda
```
Minimum Python version: 3.8
Some modules rely on optional libraries (e.g., folium, geopandas, scikit-learn). See Requirements below if you plan to use those features.
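Optional features can be guarded at import time before calling into them. A minimal stand-alone sketch (not part of HuDa's public API) using only the standard library to check whether an optional dependency is importable:

```python
from importlib.util import find_spec

def has_optional(module_name: str) -> bool:
    """Return True if an optional dependency is importable in this environment."""
    return find_spec(module_name) is not None

# Example: decide up front whether mapping features can be used.
mapping_available = has_optional("folium") and has_optional("geopandas")
```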
## Quickstart
```python
import polars as pl
from huda.cleaning import translate_categories
from huda.transformation import percentage_calculation
from huda.Interoperability import export_csv
# Example data
df = pl.DataFrame({
    "province": ["Kabul", "Herat"],
    "cluster": ["wash", "wash"],
    "reached": [1200, 900],
    "target": [2000, 1100],
})
# Cleaning
df2 = translate_categories(df, columns={"cluster": {"wash": "WASH"}})
# Transformation
df3 = percentage_calculation(df2, numerator_col="reached", denominator_col="target", output_col="coverage_pct")
# Interoperability (returns intent spec; does not write files)
spec = export_csv(df3, path="/tmp/coverage.csv")
print(spec)
```
## Module Highlights
### Opening
```python
from huda.opening import open_csv, open_excel, open_json
df = open_csv("/path/data.csv")
```
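For context, a connector of the same shape can be approximated with the standard library alone. This is a hypothetical stand-in for illustration, not HuDa's implementation (HuDa returns polars DataFrames):

```python
import csv
from pathlib import Path
from typing import Dict, List

def open_csv_fallback(path: str) -> List[Dict[str, str]]:
    """Read a CSV into a list of row dicts (a stand-in for a DataFrame)."""
    with Path(path).open(newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))
```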
### Cleaning
```python
from huda.cleaning import numbers_standardization, dates_standardization, duplicate
df = numbers_standardization(df, columns=["reached"]) # normalize numeric fields
df = dates_standardization(df, column="report_date", style="iso")
df = duplicate(df, columns=["id"], keep="first")
```
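The `keep="first"` behaviour above can be illustrated in plain Python. This sketch assumes the HuDa function keeps the first occurrence of each key combination, as the keyword suggests:

```python
from typing import Dict, Iterable, List

def drop_duplicates_first(rows: Iterable[dict], key_columns: List[str]) -> List[dict]:
    """Keep only the first row seen for each combination of key columns."""
    seen = set()
    kept: List[dict] = []
    for row in rows:
        key = tuple(row[c] for c in key_columns)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept
```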
### Transformation
```python
from huda.transformation import pivot_unpivot, severity_index_calculation
df_wide = pivot_unpivot(df, mode="pivot", index=["province"], columns="cluster", values="reached")
df_idx = severity_index_calculation(df, components=["fcs","rcsi"], weights={"fcs":0.6,"rcsi":0.4})
```
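A weighted severity index like the one above is typically a weighted sum normalized by total weight. A sketch of that arithmetic in plain Python, using the component names and weights from the example (the exact HuDa formula may differ):

```python
from typing import Dict

def severity_index(row: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of component scores, normalized by the total weight."""
    total_weight = sum(weights.values())
    return sum(row[name] * w for name, w in weights.items()) / total_weight

severity_index({"fcs": 3.0, "rcsi": 2.0}, {"fcs": 0.6, "rcsi": 0.4})  # → 2.6
```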
### Validation & Quality
```python
from huda.validation_and_quality import country_code_validation, automatic_data_profiling_report
report = automatic_data_profiling_report(df)
valid = country_code_validation(df, data_col="country")
```
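Country code validation usually means checking values against the ISO 3166 register. A minimal illustration with a hard-coded alpha-3 subset (HuDa's own check likely consults a full reference list, e.g. via `pycountry`):

```python
# Tiny ISO 3166-1 alpha-3 subset, for illustration only.
KNOWN_ALPHA3 = {"AFG", "IRN", "PAK", "TJK", "TKM", "UZB"}

def invalid_country_codes(values):
    """Return the values that are not recognized alpha-3 codes."""
    return [v for v in values if v.upper() not in KNOWN_ALPHA3]

invalid_country_codes(["AFG", "XYZ"])  # → ["XYZ"]
```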
### Geospatial
```python
from huda.geospatial import choropleth_maps_by_region
html_map = choropleth_maps_by_region(df, region_col="province", value_col="reached", geojson_path="/path/afg_provinces.geojson")
with open("map.html", "w", encoding="utf-8") as f:
    f.write(html_map)
```
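Under the hood, a choropleth builder joins tabular values onto GeoJSON features by region name before styling them. That join step can be sketched without any mapping library (illustrative only; HuDa renders via folium, and the `name` property key is an assumption):

```python
def attach_values(geojson: dict, values: dict, region_property: str = "name") -> dict:
    """Copy a value onto each feature's properties, keyed by region name."""
    for feature in geojson["features"]:
        region = feature["properties"].get(region_property)
        feature["properties"]["value"] = values.get(region)
    return geojson
```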
### Visualization (specs)
```python
from huda.visualize import bar_chart, interactive_dashboard
chart = bar_chart(df, category_col="province", value_col="reached")
dashboard = interactive_dashboard(charts=[chart])
```
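Chart "specs" here are plain data describing a chart rather than a rendered image. A hypothetical sketch of what such a spec could look like (the field names are assumptions, not HuDa's actual schema):

```python
def bar_chart_spec(rows, category_col, value_col):
    """Build a renderer-agnostic description of a bar chart."""
    return {
        "type": "bar",
        "categories": [r[category_col] for r in rows],
        "values": [r[value_col] for r in rows],
    }
```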
### Interoperability (specs)
These functions return intent specs you can pass to renderers/uploaders.
```python
from huda.Interoperability import (
    export_csv, export_excel, export_json, export_parquet,
    export_sql_database, export_stata, export_spss,
    export_shapefile, export_geojson, export_hdx_dataset,
    share_dashboard_html, api_integration_output,
)
spec_csv = export_csv(df, path="/tmp/data.csv")
spec_sql = export_sql_database(df, connection_uri="postgresql://user:pass@host:5432/db", table_name="huda_export")
spec_geo = export_geojson(df, path="/tmp/data.geojson", geometry_col="geom")
spec_dash = share_dashboard_html(dashboard, path="/tmp/dashboard.html", embed_assets=True)
```
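An "intent spec" defers the side effect: the export function only records what should happen, and a separate executor performs it. A minimal sketch of that pattern (the names and spec fields are illustrative, not HuDa's schema):

```python
import csv

def export_csv_spec(rows, path):
    """Describe a CSV export without writing anything to disk."""
    return {"action": "export_csv", "path": path, "rows": rows}

def execute(spec):
    """An executor that actually performs a deferred export."""
    if spec["action"] == "export_csv":
        with open(spec["path"], "w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=spec["rows"][0].keys())
            writer.writeheader()
            writer.writerows(spec["rows"])
```

Keeping the spec as plain data makes exports easy to log, diff, and test before any file is written or any upload happens.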
## Requirements
Core requirements and optional dependencies are specified in `requirements.txt`.
If you plan to use geospatial and mapping utilities, you’ll need packages like `folium` and `geopandas` (which may require system libraries on some platforms). For ML utilities (e.g., outlier isolation), you’ll need `scikit-learn`.
## Development
```bash
python -m venv .venv
. .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
```
Run a quick sanity check:
```bash
python -c "import huda, polars as pl; print('HuDa OK')"
```
## Building & Publishing (maintainers)
HuDa uses PEP 517/518 builds via Hatchling (configured in `pyproject.toml`).
```bash
python -m pip install --upgrade build twine
python -m build
# TestPyPI upload
twine upload --repository testpypi dist/*
# PyPI upload
twine upload dist/*
```
## Contributing
Contributions are welcome. Please open an issue to discuss improvements or new utilities aligned with humanitarian workflows.
## License
MIT License. The license is declared in `pyproject.toml`; a `LICENSE` file with the full text should be added to the repository.
## Links
- **Repository**: https://github.com/fiafghan/HuDa
- **Issues**: https://github.com/fiafghan/HuDa/issues
- **Training website**: in `huda_website/` (React + Tailwind; run with Vite)