pdbpy

Name	pdbpy
Version	0.0.1
home_page	None
Summary	Pure python implementation of parsing PDB debug information files
upload_time	2024-04-29 23:13:00
maintainer	None
docs_url	None
author	None
requires_python	>=3.12
license	MIT License Copyright (c) 2022 Pierre LeMoine Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
keywords	pdb debug
VCS
bugtrack_url
requirements	dtypes memorywrapper

# pdbpy
A pure Python implementation of Program Database (PDB) file parsing.

## Motivation

I want to be able to parse PDB files using Python. I want to understand their structure.

There are other libraries and implementations (see below). This one differs in that it works
 out of the box and lazily loads only what is requested. Working with a file larger than 1 GB
 and only want to know how to parse a single structure? Only want to know the address of a
 symbol? No problem, only the minimum will be loaded!

The PDB is memory-mapped. The underlying MSF format makes data possibly non-contiguous,
 but using the `memorywrapper` library that becomes (mostly) a non-issue, as it can
 provide a memoryview-like self-sliceable non-copying front for the data. Bytes
 are only copied when the view is accessed as a buffer. Unfortunately the data needs to
 be copied at that point, as the buffer protocol does not support wildly discontiguous
 memory areas.
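
To make the idea concrete, here is a minimal sketch of such a non-copying view. `ScatteredView` is a hypothetical stand-in written for this illustration, not the actual `memorywrapper` API, and the page size and page list are made-up values. A real PDB would be backed by an `mmap` object instead of a `bytes` object.

```python
from __future__ import annotations


class ScatteredView:
    """Hypothetical sketch: a sliceable view over non-contiguous chunks."""

    def __init__(self, chunks: list[memoryview]) -> None:
        self.chunks = chunks

    def __len__(self) -> int:
        return sum(len(chunk) for chunk in self.chunks)

    def __getitem__(self, key: slice) -> ScatteredView:
        # Slicing only re-slices the underlying memoryviews; no bytes are copied.
        start, stop, step = key.indices(len(self))
        if step != 1:
            raise ValueError("this sketch only supports contiguous slices")
        selected: list[memoryview] = []
        position = 0  # logical offset of the current chunk within the stream
        for chunk in self.chunks:
            lo = max(start - position, 0)
            hi = min(stop - position, len(chunk))
            if lo < hi:
                selected.append(chunk[lo:hi])
            position += len(chunk)
        return ScatteredView(selected)

    def tobytes(self) -> bytes:
        # The one place where the data is copied into a contiguous buffer.
        return b"".join(self.chunks)


# Pretend file with 4-byte pages; the stream lives on pages 2, 0 and 3.
PAGE_SIZE = 4
file_data = memoryview(bytes(range(16)))
stream_pages = [2, 0, 3]
stream = ScatteredView(
    [file_data[page * PAGE_SIZE:(page + 1) * PAGE_SIZE] for page in stream_pages]
)

part = stream[2:6]        # spans the boundary between page 2 and page 0
print(list(part.tobytes()))
```

The slice `stream[2:6]` crosses a page boundary, so it ends up holding two small views, and bytes are only joined when `tobytes()` is called.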

### Features!
-------------

| Feature                                                   | pdbpy                     |
| :---                                                      | :---:                     |
| Can open PDB                                              |  ✅                      |
| Can find a given type by name                             |  ✅                      |
| Uses the _Hash Stream_ to accelerate type lookup by name? |  ✅                      |
| Can look up symbols given name?                           |  ✅ (from global table)  |
| Can look up symbols given addresses?                      |  ❌                      |

## Installation

`pip install pdbpy`


### Getting started
-------------------

From `test_symbol_address` in [test_windows_pdb.py](tests/test_windows_pdb.py)
```py
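# Open the PDB; it is memory-mapped and parsed lazily.
pdb = PDB("example_pdbs/addr.pdb")
# Look up the address of a symbol by name (from the global symbol table).
addr = pdb.find_symbol_address("global_variable")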
```

#### Explain the type hash stream
---------------------------------

The hash stream consists of two parts:
an ordered list of _truncated hashes_, and a list of {TI, byte offset} pairs to accelerate lookup (TI being a type index).

The _truncated hashes_ are the hashes of the TI records, taken modulo the _number of buckets_.
The number of buckets can be found in the header of the type stream.
The list can be loaded into a `hash = Dict[TruncatedHash, List[TI]]`.
Given the hash of a TI record, we can then find a list of potential TIs.
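
A minimal sketch of that dictionary follows. It assumes that the i-th truncated hash belongs to the i-th TI counted from the first TI of the stream. The names `build_buckets`, `candidate_tis` and `first_ti` are hypothetical and not taken from pdbpy.

```python
from collections import defaultdict


def build_buckets(truncated_hashes: list[int], first_ti: int) -> dict[int, list[int]]:
    """Group TIs by their truncated hash (assumes one hash per TI, in TI order)."""
    buckets: dict[int, list[int]] = defaultdict(list)
    for position, truncated in enumerate(truncated_hashes):
        buckets[truncated].append(first_ti + position)
    return dict(buckets)


def candidate_tis(buckets: dict[int, list[int]], full_hash: int, bucket_count: int) -> list[int]:
    """Return every TI whose truncated hash matches the given full hash."""
    return buckets.get(full_hash % bucket_count, [])


# Toy data: four records, four buckets, first TI assumed to be 0x1000.
buckets = build_buckets([3, 1, 3, 0], first_ti=0x1000)
print([hex(ti) for ti in candidate_tis(buckets, full_hash=11, bucket_count=4)])
```

Several TIs can share a bucket, which is why the result is only a list of candidates.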

The second part of the hash stream accelerates locating the record for a given TI. It contains a list of
 `Tuple[TI, ByteOffset]` pairs sorted by increasing TI. If we have a TI, we can find the offset of the closest
 preceding TI and parse the TI records from there until we find the exact one we want.
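
As a sketch, the lookup can be done with a binary search followed by a short walk. It assumes that each record starts with a 2-byte little-endian length that does not count the length field itself, and that offsets are relative to the start of the buffer holding the records. The function names are hypothetical.

```python
import struct
from bisect import bisect_right


def closest_preceding(index_offsets: list[tuple[int, int]], ti: int) -> tuple[int, int]:
    """Return the last (TI, offset) pair whose TI is <= the requested TI."""
    tis = [entry_ti for entry_ti, _ in index_offsets]
    position = bisect_right(tis, ti) - 1
    if position < 0:
        raise KeyError(f"TI {ti:#x} precedes the first indexed TI")
    return index_offsets[position]


def record_offset(records: bytes, index_offsets: list[tuple[int, int]], ti: int) -> int:
    """Walk forward from the closest preceding entry to the wanted record."""
    start_ti, offset = closest_preceding(index_offsets, ti)
    for _ in range(ti - start_ti):
        (length,) = struct.unpack_from("<H", records, offset)
        offset += 2 + length  # skip the length field and the record body
    return offset


# Three toy records with bodies of 2, 4 and 2 bytes (TIs 0x1000 to 0x1002).
records = b"\x02\x00AA" + b"\x04\x00BBBB" + b"\x02\x00CC"
index_offsets = [(0x1000, 0), (0x1002, 10)]
print(record_offset(records, index_offsets, 0x1001))
```

Only the records between the indexed entry and the wanted TI are touched, so the cost of a lookup depends on how densely the pairs are spaced.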

Combining the two functionalities offered by the hash stream, we thus find a list of potential TIs given
 a hash, and then use the second part to accelerate the lookup of the actual records, which we need
 to examine in order to determine if we found the TI matching the non-truncated hash.
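
Put together, the lookup could look roughly like the sketch below. `load_record` and `hash_record` are hypothetical callbacks standing in for the record parsing and the real hash functions, and the toy hash (a byte sum) is only there to make the example runnable.

```python
from typing import Callable, Optional


def find_type(
    full_hash: int,
    bucket_count: int,
    buckets: dict[int, list[int]],
    load_record: Callable[[int], bytes],
    hash_record: Callable[[bytes], int],
) -> Optional[int]:
    """Return the TI whose record matches the full hash, or None."""
    for ti in buckets.get(full_hash % bucket_count, []):
        record = load_record(ti)  # would use the (TI, offset) pairs in practice
        if hash_record(record) == full_hash:
            return ti
    return None


def toy_hash(data: bytes) -> int:
    # Toy stand-in, NOT one of the hash functions used in real PDB files.
    return sum(data)


toy_records = {0x1000: b"Foo", 0x1001: b"Bar", 0x1002: b"Baz"}
BUCKET_COUNT = 4

# Build the bucket dictionary from the toy records.
toy_buckets: dict[int, list[int]] = {}
for ti, record in toy_records.items():
    toy_buckets.setdefault(toy_hash(record) % BUCKET_COUNT, []).append(ti)

found = find_type(toy_hash(b"Baz"), BUCKET_COUNT, toy_buckets, toy_records.__getitem__, toy_hash)
print(hex(found) if found is not None else None)
```

Here `Bar` and `Baz` land in the same bucket, so the first candidate is loaded, rejected, and the second one is returned.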

The hash of a TI record is often the hash of its unique name (if there is one).
If there isn't any unique name, it's a hash of the bytes of the entire record.
The functions used to compute the hashes are different for the unique name strings
 and for the bytes of the records.

##### Sources and references
* [volatility3](https://github.com/volatilityfoundation/volatility3) has some PDB-related code under `/volatility3/framework/symbols/windows/[pdb.json|pdbconv.py]`
    * At least in commit `8e420dec41861993f0cd2837721af2d3e7a6d07a`
* [radare2](https://github.com/radareorg/radare2)
* [microsoft-pdb](https://github.com/microsoft/microsoft-pdb/blob/805655a28bd8198004be2ac27e6e0290121a5e89/PDB/), although it's not really well explained
* [moyix/pdbparse](https://github.com/moyix/pdbparse) - Iconic Python implementation
* [willglynn/pdb](https://github.com/willglynn/pdb) - Rust implementation of a reader

* [Air14/SymbolicAccess](https://github.com/Air14/SymbolicAccess)
    * [Oxygen1a1/oxgenPdb](https://github.com/Oxygen1a1/oxgenPdb) - derivative

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "pdbpy",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.12",
    "maintainer_email": null,
    "keywords": "PDB, debug",
    "author": null,
    "author_email": "Pierre LeMoine <pypi@luben.se>",
    "download_url": "https://files.pythonhosted.org/packages/3a/01/1fb135bcc9ec267a0cf2385df6e065d986edcef56d596045465f066515ce/pdbpy-0.0.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT License  Copyright (c) 2022 Pierre LeMoine  Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the \"Software\"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:  The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.  THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. ",
    "summary": "Pure python implementation of parsing PDB debug information files",
    "version": "0.0.1",
    "project_urls": {
        "Homepage": "https://github.com/DrInfiniteExplorer/pdbpy",
        "Repository": "https://github.com/DrInfiniteExplorer/pdbpy"
    },
    "split_keywords": [
        "pdb",
        " debug"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "6a0b03be2fd67fe3a9cff6e8a805a7275c76a5811e6d6ceb74cf5a9164ca9115",
                "md5": "2cab7916ba6723c854e39b865dbbc8c5",
                "sha256": "bd5f6b98eb4b4b2e31167a795850c1694373946e9f234bfd7686090e0eee68b5"
            },
            "downloads": -1,
            "filename": "pdbpy-0.0.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "2cab7916ba6723c854e39b865dbbc8c5",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.12",
            "size": 53647,
            "upload_time": "2024-04-29T23:12:58",
            "upload_time_iso_8601": "2024-04-29T23:12:58.666405Z",
            "url": "https://files.pythonhosted.org/packages/6a/0b/03be2fd67fe3a9cff6e8a805a7275c76a5811e6d6ceb74cf5a9164ca9115/pdbpy-0.0.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "3a011fb135bcc9ec267a0cf2385df6e065d986edcef56d596045465f066515ce",
                "md5": "55f398804e29f9b9077966674b287458",
                "sha256": "800ca5452250b080e6c68856f23fb28a753a6f191f41278571208c7d79f7a3f6"
            },
            "downloads": -1,
            "filename": "pdbpy-0.0.1.tar.gz",
            "has_sig": false,
            "md5_digest": "55f398804e29f9b9077966674b287458",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.12",
            "size": 1672868,
            "upload_time": "2024-04-29T23:13:00",
            "upload_time_iso_8601": "2024-04-29T23:13:00.820183Z",
            "url": "https://files.pythonhosted.org/packages/3a/01/1fb135bcc9ec267a0cf2385df6e065d986edcef56d596045465f066515ce/pdbpy-0.0.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-04-29 23:13:00",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "DrInfiniteExplorer",
    "github_project": "pdbpy",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [
        {
            "name": "dtypes",
            "specs": []
        },
        {
            "name": "memorywrapper",
            "specs": []
        }
    ],
    "lcname": "pdbpy"
}
