huda

- **Version**: 0.1.4
- **Summary**: HuDa — Humanitarian Data Library utilities for opening, cleaning, transforming, validating, geospatial, analysis, visualization, automation, and interoperability.
- **Author email**: Fardin Ibrahimi <fiafghan@gmail.com>
- **Uploaded**: 2025-10-26 09:06:05
- **Requires Python**: >=3.8
- **License**: MIT
- **Keywords**: analysis, data, etl, geospatial, gis, humanitarian, pandas, polars
- **Requirements**: pandas>=1.5, polars>=0.20, numpy>=1.23, scikit-learn>=1.1, folium>=0.14, geopy>=2.3, pycountry>=22.3.5, requests>=2.31, SQLAlchemy>=1.4, psycopg2-binary>=2.9, geopandas>=0.12
# HuDa — Humanitarian Data Library

HuDa is a practical Python library for humanitarian data workflows. It provides simple, consistent functions to open, clean, transform, validate, analyze, map, visualize, automate, and share humanitarian datasets.

- Focused on survey, 5W, monitoring, and geo-enabled data
- Consistent API patterns across modules
- Returns lightweight specs for rendering/exports where appropriate

## Features
- **Opening**: CSV/Excel/JSON/SQL/API connectors
- **Cleaning**: normalize numbers/dates/text, translate categories, deduplicate, geocode
- **Transformation**: reshape, aggregate, indexes, ratios, growth, standardization
- **Validation & Quality**: ranges, missing/mandatory, country codes, dates, profiling
- **Geospatial**: folium maps, choropleths, overlays, heatmaps, clusters
- **Analysis**: correlation, time series, regression, PCA, coverage gaps (selected utilities)
- **Visualization**: chart specs for bar/line/pie/hist/box/heatmap, dashboards
- **Automation**: reports, snapshots, change detection (specs)
- **Interoperability**: export specs (CSV/Excel/JSON/Parquet/SQL/Stata/SPSS/GIS/HDX/HTML/API)

## Installation
HuDa is published on PyPI as `huda`.

```bash
pip install huda
```

Minimum Python version: 3.8

Some modules rely on optional libraries (e.g., folium, geopandas, scikit-learn). See Requirements below if you plan to use those features.
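One way to check up front whether those optional libraries are importable is a small standard-library sketch like the one below. The feature groupings are illustrative assumptions, not part of HuDa's API; the package names come from `requirements.txt` (note that `scikit-learn` is imported as `sklearn`):

```python
from importlib.util import find_spec
from typing import Dict, List

# Illustrative optional-dependency groups (names from requirements.txt);
# scikit-learn is distributed under that name but imported as `sklearn`.
OPTIONAL_DEPS: Dict[str, List[str]] = {
    "geospatial": ["folium", "geopandas"],
    "ml": ["sklearn"],
}

def missing_optional(feature: str) -> List[str]:
    """Return the optional modules for `feature` that cannot be imported."""
    return [mod for mod in OPTIONAL_DEPS.get(feature, []) if find_spec(mod) is None]

for feature in OPTIONAL_DEPS:
    missing = missing_optional(feature)
    if missing:
        print(f"{feature!r} features need: {', '.join(missing)}")
```

Running this before using the geospatial or ML modules gives a clearer message than a mid-call `ImportError`.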

## Quickstart
```python
import polars as pl
from huda.cleaning import translate_categories
from huda.transformation import percentage_calculation
from huda.Interoperability import export_csv

# Example data
df = pl.DataFrame({
    "province": ["Kabul", "Herat"],
    "cluster": ["wash", "wash"],
    "reached": [1200, 900],
    "target": [2000, 1100],
})

# Cleaning
df2 = translate_categories(df, columns={"cluster": {"wash": "WASH"}})

# Transformation
df3 = percentage_calculation(df2, numerator_col="reached", denominator_col="target", output_col="coverage_pct")

# Interoperability (returns intent spec; does not write files)
spec = export_csv(df3, path="/tmp/coverage.csv")
print(spec)
```

## Module Highlights

### Opening
```python
from huda.opening import open_csv, open_excel, open_json
df = open_csv("/path/data.csv")
```

### Cleaning
```python
from huda.cleaning import numbers_standardization, dates_standardization, duplicate
df = numbers_standardization(df, columns=["reached"])  # normalize numeric fields
df = dates_standardization(df, column="report_date", style="iso")
df = duplicate(df, columns=["id"], keep="first")
```

### Transformation
```python
from huda.transformation import pivot_unpivot, severity_index_calculation
df_wide = pivot_unpivot(df, mode="pivot", index=["province"], columns="cluster", values="reached")
df_idx = severity_index_calculation(df, components=["fcs","rcsi"], weights={"fcs":0.6,"rcsi":0.4})
```
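Assuming the severity index is a weighted sum of the component scores (a guess at the semantics from the `weights` argument above; HuDa's exact formula may normalize or scale differently), the arithmetic for a single record looks like:

```python
# Hypothetical weighted-sum severity index for one record
components = {"fcs": 0.8, "rcsi": 0.5}  # example component scores
weights = {"fcs": 0.6, "rcsi": 0.4}     # same weights as the call above

index = sum(components[k] * weights[k] for k in weights)
print(index)  # 0.8*0.6 + 0.5*0.4 = 0.68
```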

### Validation & Quality
```python
from huda.validation_and_quality import country_code_validation, automatic_data_profiling_report
report = automatic_data_profiling_report(df)
valid = country_code_validation(df, data_col="country")
```

### Geospatial
```python
from huda.geospatial import choropleth_maps_by_region
html_map = choropleth_maps_by_region(df, region_col="province", value_col="reached", geojson_path="/path/afg_provinces.geojson")
with open("map.html", "w", encoding="utf-8") as f:
    f.write(html_map)
```

### Visualization (specs)
```python
from huda.visualize import bar_chart, interactive_dashboard
chart = bar_chart(df, category_col="province", value_col="reached")
dashboard = interactive_dashboard(charts=[chart])
```

### Interoperability (specs)
These functions return intent specs you can pass to renderers/uploaders.

```python
from huda.Interoperability import (
    export_csv, export_excel, export_json, export_parquet,
    export_sql_database, export_stata, export_spss,
    export_shapefile, export_geojson, export_hdx_dataset,
    share_dashboard_html, api_integration_output,
)

spec_csv = export_csv(df, path="/tmp/data.csv")
spec_sql = export_sql_database(df, connection_uri="postgresql://user:pass@host:5432/db", table_name="huda_export")
spec_geo = export_geojson(df, path="/tmp/data.geojson", geometry_col="geom")
spec_dash = share_dashboard_html(dashboard, path="/tmp/dashboard.html", embed_assets=True)
```
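As an illustration of how such an intent spec might be consumed by a renderer, here is a minimal sketch. The spec shape (a plain dict with `format`, `path`, and `rows` keys) is a hypothetical assumption for demonstration only, not HuDa's actual spec format:

```python
import csv
import os
import tempfile
from typing import Any, Dict

def render_spec(spec: Dict[str, Any]) -> str:
    """Write the data described by a (hypothetical) export spec to disk.

    Assumes spec = {"format": "csv", "path": str, "rows": list of dicts}.
    """
    if spec.get("format") != "csv":
        raise ValueError(f"unsupported format: {spec.get('format')!r}")
    rows = spec["rows"]
    with open(spec["path"], "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
    return spec["path"]

# Hypothetical spec mirroring the quickstart data
demo_spec = {
    "format": "csv",
    "path": os.path.join(tempfile.gettempdir(), "huda_coverage.csv"),
    "rows": [
        {"province": "Kabul", "reached": 1200, "target": 2000},
        {"province": "Herat", "reached": 900, "target": 1100},
    ],
}
out = render_spec(demo_spec)
```

Keeping the file write in a separate consumer like this is what lets the export functions stay side-effect free.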

## Requirements
Core requirements and optional dependencies are specified in `requirements.txt`.

If you plan to use geospatial and mapping utilities, you’ll need packages like `folium` and `geopandas` (which may require system libraries on some platforms). For ML utilities (e.g., outlier isolation), you’ll need `scikit-learn`.

## Development
```bash
python -m venv .venv
. .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
```

Run a quick sanity check:
```bash
python -c "import huda, polars as pl; print('HuDa OK')"
```

## Building & Publishing (maintainers)
HuDa uses PEP 517/518 builds via Hatchling (configured in `pyproject.toml`).

```bash
python -m pip install --upgrade build twine
python -m build
# TestPyPI upload
twine upload --repository testpypi dist/*
# PyPI upload
twine upload dist/*
```

## Contributing
Contributions are welcome. Please open an issue to discuss improvements or new utilities aligned with humanitarian workflows.

## License
MIT License, as declared in `pyproject.toml`. A `LICENSE` file with the full text should be added to the repository.

## Links
- **Repository**: https://github.com/fiafghan/HuDa
- **Issues**: https://github.com/fiafghan/HuDa/issues
- **Training website**: in `huda_website/` (React + Tailwind; run with Vite)


            
