modm-data


- Name: modm-data
- Version: 0.0.1
- Summary: Embedded Hardware Description Processor
- Upload time: 2023-10-15 21:22:46
- Requires Python: >=3.11
- License: MPL-2.0
- Keywords: embedded, hardware, pdf, parser

# modm-data: Embedded Hardware Description

This project is a collection of data processing pipelines that convert and
combine multiple sources of hardware description data into the most accurate
common representation without manual supervision.

There are many different supported input sources per hardware vendor:

- PDF technical documentation, especially datasheets and reference manuals.
- Source code and CMSIS-SVD files describing peripheral registers.
- Vendor libraries for helping with naming things canonically.
- Proprietary databases extracted from vendor tooling.

Each input source is made accessible via a deterministic data pipeline before
all of them are finally merged together. This approach has the best chance of
compensating for the weaknesses of each individual input source while also
arbitrating conflicts. The outputs are knowledge graphs with a shared ontology.
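
To make the idea of compensating and arbitrating concrete, here is a minimal
sketch that merges property values reported by several sources using a simple
majority vote. This is NOT the project's actual merging logic, which is not
described here. The function name `merge_sources`, the source names, and the
property keys are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def merge_sources(sources):
    """Merge per-source property dicts by majority vote (hypothetical scheme).

    `sources` maps a source name to a dict of hashable property values.
    Returns (merged, conflicts), where `conflicts` records every key on
    which the sources disagreed, together with what each source reported.
    """
    merged = {}
    conflicts = {}
    keys = sorted({key for props in sources.values() for key in props})
    for key in keys:
        # Collect the value each source reports for this key, if any.
        reported = {name: props[key] for name, props in sources.items() if key in props}
        counts = Counter(reported.values())
        # Ties are resolved in favor of the value seen first in `sources`.
        value, _ = counts.most_common(1)[0]
        merged[key] = value
        if len(counts) > 1:
            conflicts[key] = reported
    return merged, conflicts


# Hypothetical example data: three sources describing the same device.
example = {
    "pdf": {"flash_kib": 64, "uart_count": 3},
    "svd": {"flash_kib": 64, "uart_count": 2},
    "cubemx": {"uart_count": 3},
}
merged, conflicts = merge_sources(example)
print(merged)     # {'flash_kib': 64, 'uart_count': 3}
print(conflicts)  # {'uart_count': {'pdf': 3, 'svd': 2, 'cubemx': 3}}
```

A source that omits a property simply does not vote, which is one way a gap in
one source can be covered by the others.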

The resulting knowledge graphs represent a normalized and complete semantic
description of the hardware and are NOT intended to be consumed as-is. Rather,
you should extract the data you require and convert it into a format that is
useful for your specific use case and device scope. This repository contains
only the data pipeline code. If you are interested only in the hardware
description data, work from the resulting knowledge graphs rather than from
these pipelines.

> **Warning**  
> The project is still in beta and not fully functional or documented.
> Improving the documentation and flexibility of the `modm_data.pdf2html`
> submodule is the main focus of development right now.
> No output data other than HTML is currently supported.


## Installation

You can install this Python ≥3.11 project via PyPI:

```sh
pip install modm-data
```

You also need `g++` and `patch` installed and available on your `PATH`.
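
If you want to verify these prerequisites before installing, a small check
along the following lines may help. It is only a sketch: the helper name
`missing_prerequisites` is made up for this example and is not part of
`modm-data`. It relies solely on the standard library.

```python
import shutil
import sys


def missing_prerequisites(tools=("g++", "patch"), min_python=(3, 11)):
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    if sys.version_info < min_python:
        problems.append(f"Python >= {min_python[0]}.{min_python[1]} required")
    for tool in tools:
        # shutil.which() returns None when the executable is not on PATH.
        if shutil.which(tool) is None:
            problems.append(f"'{tool}' not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print(problem)
```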


## Input Sources

You can download all input sources via `make input-sources`. Please note that it
may take a while to download ~10GB of data, mostly PDF technical documentation.

This project uses only publicly available data sources which we have aggregated
in several GitHub repositories. However, since the copyright of some sources
prohibits republication, these sources are downloaded from the vendor websites
directly:

- STMicro CubeMX database.
- STMicro PDF technical documentation.


## Pipelines

The data pipelines are implemented as Python modules inside the `modm_data`
folder and have the following structure:

```mermaid
flowchart LR
    A(PDF) -->|pdf2html| B
    B -->|html2svd| D
    B(HTML) -->|html| C
    %% C --> K
    C(Python) -->|owl| E
    D(CMSIS-SVD) -->|cmsis-svd| C
    E[OWL]
    F(CMSIS\nHeader) -->|header2svd| D
    G(CubeMX) -->|cubemx| C
    H(CubeHAL) -->|cubehal| C
    J -->|dl| A
    J -->|dl| F
    J -->|dl| G
    J -->|dl| H
    J[Vendor] -->|dl| D
    %% K[Evaluation]
```

Each pipeline has its own command-line interface. Please refer to the API
documentation for advanced usage.
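
The flowchart can also be read as a small directed graph. The sketch below
transcribes its edges into plain Python and enumerates every chain of pipeline
steps between two data formats. The names `EDGES` and `routes` are invented for
this illustration; the labels are copied from the diagram and are not
guaranteed to match module or command names one-to-one. Keep in mind that,
per the warning above, only HTML output is currently supported.

```python
# Edges transcribed from the flowchart: (source, edge label, target).
EDGES = [
    ("Vendor", "dl", "PDF"),
    ("Vendor", "dl", "CMSIS Header"),
    ("Vendor", "dl", "CubeMX"),
    ("Vendor", "dl", "CubeHAL"),
    ("Vendor", "dl", "CMSIS-SVD"),
    ("PDF", "pdf2html", "HTML"),
    ("HTML", "html2svd", "CMSIS-SVD"),
    ("HTML", "html", "Python"),
    ("CMSIS Header", "header2svd", "CMSIS-SVD"),
    ("CMSIS-SVD", "cmsis-svd", "Python"),
    ("CubeMX", "cubemx", "Python"),
    ("CubeHAL", "cubehal", "Python"),
    ("Python", "owl", "OWL"),
]


def routes(start, goal, edges=EDGES):
    """Return every chain of edge labels leading from `start` to `goal`.

    Uses a depth-first walk; this terminates because the graph is acyclic.
    """
    found = []

    def walk(node, labels):
        if node == goal:
            found.append(labels)
            return
        for src, label, dst in edges:
            if src == node:
                walk(dst, labels + [label])

    walk(start, [])
    return found


for chain in routes("PDF", "OWL"):
    print(" -> ".join(chain))
# pdf2html -> html2svd -> cmsis-svd -> owl
# pdf2html -> html -> owl
```

The two routes show that data extracted from a PDF can reach the Python
representation either directly from HTML or by way of CMSIS-SVD.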


## Development

For development you can install the package locally:

```sh
pip install -e ".[all]"
```

To browse the API documentation locally:

```sh
pdoc modm_data
```


## Citation

This project is a further development of a [peer-reviewed paper published in
the Journal of Systems Research (JSys)](todo).
Please cite this paper when referring to this project:

```bib
@article{hauser2023automatically,
  title={{Automatically Extracting Hardware Descriptions from PDF Technical Documentation}},
  author={Hauser, Niklas and Pennekamp, Jan},
  journal={Journal of Systems Research (JSys)},
  volume={3},
  issue={2},
  year={2023},
  doi={10.5070/tbd}
}
```

The paper itself is based on a [master's thesis](https://salkinium.com/master.pdf).

            

Raw data

{
    "_id": null,
    "home_page": "",
    "name": "modm-data",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.11",
    "maintainer_email": "",
    "keywords": "embedded,hardware,pdf,parser",
    "author": "",
    "author_email": "Niklas Hauser <niklas@salkinium.com>",
    "download_url": "https://files.pythonhosted.org/packages/79/ef/129cc3325d82cfb880a16c3e4595c3a0777aa27cd01bb2c5658e43df5c6c/modm-data-0.0.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MPL-2.0",
    "summary": "Embedded Hardware Description Processor",
    "version": "0.0.1",
    "project_urls": {
        "Changelog": "https://github.com/modm-io/modm-data/blob/main/CHANGELOG.md",
        "GitHub": "https://github.com/modm-io/modm-data"
    },
    "split_keywords": [
        "embedded",
        "hardware",
        "pdf",
        "parser"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "d4d0c719f7d460116b90faed6eee6eb624a1916e6f3521e04e073ef16f9151da",
                "md5": "cd69fed971469eb5cf79bdab976906ff",
                "sha256": "9033ba5d1760dd9a4221a406add1e04756d09a471f6d9e66332806c58a147791"
            },
            "downloads": -1,
            "filename": "modm_data-0.0.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "cd69fed971469eb5cf79bdab976906ff",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.11",
            "size": 169007,
            "upload_time": "2023-10-15T21:22:44",
            "upload_time_iso_8601": "2023-10-15T21:22:44.466294Z",
            "url": "https://files.pythonhosted.org/packages/d4/d0/c719f7d460116b90faed6eee6eb624a1916e6f3521e04e073ef16f9151da/modm_data-0.0.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "79ef129cc3325d82cfb880a16c3e4595c3a0777aa27cd01bb2c5658e43df5c6c",
                "md5": "750dbd4e755480a7df2e8150a964ff3b",
                "sha256": "889f30588e1d89fafc3c6d304f96fa8fa9fe8506f87cf0d5f22f1ea0af7a9d01"
            },
            "downloads": -1,
            "filename": "modm-data-0.0.1.tar.gz",
            "has_sig": false,
            "md5_digest": "750dbd4e755480a7df2e8150a964ff3b",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.11",
            "size": 126723,
            "upload_time": "2023-10-15T21:22:46",
            "upload_time_iso_8601": "2023-10-15T21:22:46.127556Z",
            "url": "https://files.pythonhosted.org/packages/79/ef/129cc3325d82cfb880a16c3e4595c3a0777aa27cd01bb2c5658e43df5c6c/modm-data-0.0.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-10-15 21:22:46",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "modm-io",
    "github_project": "modm-data",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "modm-data"
}
        