llama-index-packs-code-hierarchy

Name	llama-index-packs-code-hierarchy
Version	0.1.5
home_page	None
Summary	A node parser which can create a hierarchy of all code scopes in a directory.
upload_time	2024-04-22 02:41:32
maintainer	ryanpeach
docs_url	None
author	Ryan Peach
requires_python	<3.12,>=3.8.1
license	MIT
keywords	c code cpp hierarchy html javascript python repo typescript

# CodeHierarchyAgentPack

```bash
# install
pip install llama-index-packs-code-hierarchy

# download source code
llamaindex-cli download-llamapack CodeHierarchyAgentPack -d ./code_hierarchy_pack
```

The `CodeHierarchyAgentPack` splits long code files into manageable chunks and creates an agent on top to navigate the code. It builds a hierarchy in which each chunk is kept short by replacing the body of every nested scope with a brief comment that tells the LLM which node to look up if it wants to read that body.

Nodes in this hierarchy will be split based on scope, like function, class, or method scope, and will have links to their children and parents so the LLM can traverse the tree.
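
To make the idea concrete, here is a minimal sketch of scope-based splitting that uses only Python's standard-library `ast` module. It is not the pack's implementation: the `build_chunks` helper, the dotted node ids, and the exact wording of the placeholder comment are assumptions made for illustration, and the sketch assumes each scope body starts on the line after its signature.

```python
import ast

SCOPES = (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)

SOURCE = '''class Greeter:
    def hello(self, name):
        message = "Hello, " + name
        return message
'''


def build_chunks(source):
    """Return {node_id: {"text", "parent", "children"}} with child bodies stubbed out."""
    lines = source.splitlines()
    chunks = {}

    def direct_child_scopes(node):
        # Nearest nested scopes, looking through non-scope nodes such as `if`.
        found = []
        for child in ast.iter_child_nodes(node):
            if isinstance(child, SCOPES):
                found.append(child)
            else:
                found.extend(direct_child_scopes(child))
        return found

    def visit(node, node_id, start, end, parent_id):
        out, child_ids, cursor = [], [], start
        for child in sorted(direct_child_scopes(node), key=lambda c: c.lineno):
            # Hypothetical id scheme: dotted path of scope names.
            child_id = child.name if parent_id is None else f"{node_id}.{child.name}"
            body_start = child.body[0].lineno
            # Keep everything up to and including the child's signature.
            out.extend(lines[cursor - 1 : body_start - 1])
            body_line = lines[body_start - 1]
            indent = body_line[: len(body_line) - len(body_line.lstrip())]
            # Replace the child's body with a pointer to its own chunk.
            out.append(f"{indent}# Code replaced for brevity. See node_id {child_id}")
            cursor = child.end_lineno + 1
            child_ids.append(child_id)
            visit(child, child_id, child.lineno, child.end_lineno, node_id)
        out.extend(lines[cursor - 1 : end])
        chunks[node_id] = {
            "text": "\n".join(out),
            "parent": parent_id,
            "children": child_ids,
        }

    visit(ast.parse(source), "<module>", 1, len(lines), None)
    return chunks


chunks = build_chunks(SOURCE)
print(chunks["Greeter"]["text"])
```

Each chunk keeps the signatures of its nested scopes but not their bodies, so a reader (or an LLM) can follow the `children` and `parent` links to fetch only the parts it needs.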

```python
from pathlib import Path

from llama_index.core import SimpleDirectoryReader
from llama_index.core.text_splitter import CodeSplitter
from llama_index.llms.openai import OpenAI
from llama_index.packs.code_hierarchy import (
    CodeHierarchyAgentPack,
    CodeHierarchyNodeParser,
)

llm = OpenAI(model="gpt-4", temperature=0.2)

documents = SimpleDirectoryReader(
    input_files=[
        Path("../llama_index/packs/code_hierarchy/code_hierarchy.py")
    ],
    file_metadata=lambda x: {"filepath": x},
).load_data()

split_nodes = CodeHierarchyNodeParser(
    language="python",
    # You can further parameterize the CodeSplitter to split the code
    # into "chunks" that match your context window size using the
    # chunk_lines and max_chars parameters; the values below are examples
    code_splitter=CodeSplitter(
        language="python", max_chars=1000, chunk_lines=10
    ),
).get_nodes_from_documents(documents)

pack = CodeHierarchyAgentPack(split_nodes=split_nodes, llm=llm)

pack.run(
    "How does the get_code_hierarchy_from_nodes function from the code hierarchy node parser work? Provide specific implementation details."
)
```

A full example can be found [in this notebook](https://github.com/run-llama/llama_index/blob/main/llama-index-packs/llama-index-packs-code-hierarchy/examples/CodeHierarchyNodeParserUsage.ipynb).

## Repo Maps

The pack contains a `CodeHierarchyKeywordQueryEngine` that uses the nodes produced by a `CodeHierarchyNodeParser` to generate a map of a repository's structure and contents. The map helps the LLM understand how a codebase is laid out and refer to specific files or directories.

For example:

- code_hierarchy
  - \_SignatureCaptureType
  - \_SignatureCaptureOptions
  - \_ScopeMethod
  - \_CommentOptions
  - \_ScopeItem
  - \_ChunkNodeOutput
  - CodeHierarchyNodeParser
    - class_name
    - \_\_init\_\_
    - \_get_node_name
      - recur
    - \_get_node_signature
      - find_start
      - find_end
    - \_chunk_node
    - get_code_hierarchy_from_nodes
      - get_subdict
      - recur_inclusive_scope
      - dict_to_markdown
    - \_parse_nodes
    - \_get_indentation
    - \_get_comment_text
    - \_create_comment_line
    - \_get_replacement_text
    - \_skeletonize
    - \_skeletonize_list
      - recur
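
A map like the one above can be approximated for Python source with the standard library alone. The sketch below is an illustration under stated assumptions, not the pack's code: `scope_tree` and `render_markdown` are hypothetical helpers, and `example_module` is a made-up module name.

```python
import ast

SCOPES = (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)

SOURCE = '''
class Parser:
    def __init__(self):
        pass

    def parse(self, text):
        def recur(node):
            return node
        return recur(text)


def main():
    pass
'''


def scope_tree(source):
    """Return a nested dict {scope_name: {child_scope_name: {...}}}."""

    def children_of(node):
        result = {}
        for child in ast.iter_child_nodes(node):
            if isinstance(child, SCOPES):
                result[child.name] = children_of(child)
            else:
                # Look through non-scope nodes (if/for/with blocks, etc.).
                result.update(children_of(child))
        return result

    return children_of(ast.parse(source))


def render_markdown(tree, depth=0):
    """Render the nested dict as the lines of an indented Markdown list."""
    lines = []
    for name, subtree in tree.items():
        lines.append("  " * depth + "- " + name)
        lines.extend(render_markdown(subtree, depth + 1))
    return lines


repo_map = {"example_module": scope_tree(SOURCE)}
print("\n".join(render_markdown(repo_map)))
```

The nested dict is the useful intermediate form here: it can be truncated to a maximum depth before rendering if the full map is too long for the prompt.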

## Usage as a Tool with an Agent

You can create a tool for any agent using the nodes from the node parser:

```python
from llama_index.agent.openai import OpenAIAgent
from llama_index.core.tools import QueryEngineTool
from llama_index.packs.code_hierarchy import CodeHierarchyKeywordQueryEngine

query_engine = CodeHierarchyKeywordQueryEngine(
    nodes=split_nodes,
)

tool = QueryEngineTool.from_defaults(
    query_engine=query_engine,
    name="code_lookup",
    description="Useful for looking up information about the code hierarchy codebase.",
)

agent = OpenAIAgent.from_tools(
    [tool], system_prompt=query_engine.get_tool_instructions(), verbose=True
)
```

## Adding new languages

To add a new language you need to edit `_DEFAULT_SIGNATURE_IDENTIFIERS` in `code_hierarchy.py`.

The docstrings there explain how to do this and the nuances involved. The approach should work for most languages.

Please **test your new language** by adding a new file to `tests/file/code/` and testing all your edge cases.

People often ask, "How do I find the node types I need for a new language?" The best way is to use breakpoints.
I have added the comment `TIP: This is a wonderful place to put a debug breakpoint` in `code_hierarchy.py`. Put a breakpoint there, feed in some code in the desired language, and step through it to find the name
of the node you want to capture.
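
The same kind of inspection can be practiced without a debugger. The sketch below uses Python's standard-library `ast` module purely as a stand-in: it only shows the general technique of walking a syntax tree and listing node type names. The type names the pack expects come from its own parser and will differ from the `ast` names printed here, and `list_node_types` is a hypothetical helper.

```python
import ast

SNIPPET = "class A:\n    def f(self):\n        return 1\n"


def list_node_types(source):
    """Return (depth, node type name, line number) for each statement node."""
    found = []

    def walk(node, depth):
        for child in ast.iter_child_nodes(node):
            if isinstance(child, ast.stmt):
                found.append((depth, type(child).__name__, child.lineno))
                walk(child, depth + 1)
            else:
                # Expressions and other helpers do not add a nesting level.
                walk(child, depth)

    walk(ast.parse(source), 0)
    return found


for depth, type_name, lineno in list_node_types(SNIPPET):
    print("  " * depth + f"{type_name} (line {lineno})")
```

Feeding in a small snippet that contains exactly the construct you care about keeps the listing short and makes the relevant node type easy to spot.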

The code as it is should handle any language which:

1. expects you to indent deeper scopes
2. has a way to comment, either full line or between delimiters

## Future

I'm considering adding all the languages from [aider](https://github.com/paul-gauthier/aider/tree/main/aider/queries)
by incorporating `.scm` files instead of `_SignatureCaptureType`, `_SignatureCaptureOptions`, and `_DEFAULT_SIGNATURE_IDENTIFIERS`.

## Contributing

You will need to set `OPENAI_API_KEY` in your environment to run the notebook or test the pack.

You can run tests with `pytest tests` in the root directory of this pack.

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "llama-index-packs-code-hierarchy",
    "maintainer": "ryanpeach",
    "docs_url": null,
    "requires_python": "<3.12,>=3.8.1",
    "maintainer_email": null,
    "keywords": "c, code, cpp, hierarchy, html, javascript, python, repo, typescript",
    "author": "Ryan Peach",
    "author_email": "rgpeach10@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/a7/ac/3f1c4a48f1e186fa6b956c22e688e3b7a578434d3f298f8080e8cd9fec3d/llama_index_packs_code_hierarchy-0.1.5.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "A node parser which can create a hierarchy of all code scopes in a directory.",
    "version": "0.1.5",
    "project_urls": null,
    "split_keywords": [
        "c",
        " code",
        " cpp",
        " hierarchy",
        " html",
        " javascript",
        " python",
        " repo",
        " typescript"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "6c5318a2ece37b331a3f29ef3c81539b475667f7b7598e06735e238aaefbafea",
                "md5": "9d0ece5d476ae2fd85620b3bc755dbfc",
                "sha256": "e08f436190ac780d9f0c45b3d98830abc37d031c8765876c994e822f054df414"
            },
            "downloads": -1,
            "filename": "llama_index_packs_code_hierarchy-0.1.5-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "9d0ece5d476ae2fd85620b3bc755dbfc",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<3.12,>=3.8.1",
            "size": 14308,
            "upload_time": "2024-04-22T02:41:30",
            "upload_time_iso_8601": "2024-04-22T02:41:30.847137Z",
            "url": "https://files.pythonhosted.org/packages/6c/53/18a2ece37b331a3f29ef3c81539b475667f7b7598e06735e238aaefbafea/llama_index_packs_code_hierarchy-0.1.5-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "a7ac3f1c4a48f1e186fa6b956c22e688e3b7a578434d3f298f8080e8cd9fec3d",
                "md5": "c505e32fe8607acf07fbbebc053357af",
                "sha256": "7d038d856606cace2dab5e4ce5eee12d0a85f5297499015c706da0595318b15e"
            },
            "downloads": -1,
            "filename": "llama_index_packs_code_hierarchy-0.1.5.tar.gz",
            "has_sig": false,
            "md5_digest": "c505e32fe8607acf07fbbebc053357af",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<3.12,>=3.8.1",
            "size": 14313,
            "upload_time": "2024-04-22T02:41:32",
            "upload_time_iso_8601": "2024-04-22T02:41:32.442075Z",
            "url": "https://files.pythonhosted.org/packages/a7/ac/3f1c4a48f1e186fa6b956c22e688e3b7a578434d3f298f8080e8cd9fec3d/llama_index_packs_code_hierarchy-0.1.5.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-04-22 02:41:32",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "llama-index-packs-code-hierarchy"
}
