llama-index-readers-docstring-walker

Name: llama-index-readers-docstring-walker
Version: 0.1.3
Summary: llama-index readers docstring_walker integration
Upload time: 2024-02-21 19:45:09
Maintainer: Filip Wojcik
Author: Your Name
Requires Python: >=3.8.1,<4.0
License: MIT
Keywords: code, docstring, python, source code
# Intro

Very often you have a large code base with rich docstrings and comments that you would like to use to produce documentation. In fact, many open-source libraries, such as Scikit-learn or PyTorch, have docstrings so rich that they contain LaTeX equations or detailed examples.

At the same time, LLMs are sometimes used to read the full code of a repository, which can cost a lot of tokens, time, and computational power.

DocstringWalker tries to find a sweet spot between these two approaches. You can use it to:

1. Parse all docstrings from modules, classes, and functions in your local code directory.
2. Convert them to Llama Documents.
3. Feed them into an LLM of your choice to produce a code-buddy chatbot or to generate documentation.

DocstringWalker uses only the standard-library `ast` module to process the code.

**With this tool, you can analyze only the docstrings extracted from the code, without spending tokens on the code itself.**
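
For illustration only (this is not the reader's internal code), here is how the standard-library `ast` module can pull docstrings out of source text without importing or executing it:

```python
# Illustration only: extracting docstrings with the standard-library ast module,
# without importing or executing the source.
import ast

source = '''
def greet(name):
    """Return a friendly greeting."""
    return f"Hello, {name}!"
'''

tree = ast.parse(source)
for node in ast.walk(tree):
    if isinstance(node, ast.FunctionDef):
        print(node.name, "->", ast.get_docstring(node))
# greet -> Return a friendly greeting.
```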

# Usage

Simply create a DocstringWalker and point it to the directory with the code. It takes the following parameters (a rough usage sketch follows this list):

1. Ignore `__init__.py` files - should `__init__.py` files be skipped? In some projects they are not used at all, while in others they contain valuable information.
2. Fail on error - `ast` raises a `SyntaxError` when parsing a malformed file. Should this abort the whole process, or should the offending file simply be skipped?
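
A rough sketch of what this looks like. The import path and the keyword names `skip_initpy` and `fail_on_malformed_files` are assumptions here; check the signature of your installed version.

```python
# Rough sketch - import path and keyword names are assumptions,
# check your installed llama-index version.
from llama_index.readers.docstring_walker import DocstringWalker

walker = DocstringWalker()
documents = walker.load_data(
    "path/to/your/package",         # directory to walk recursively
    skip_initpy=True,               # skip __init__.py files
    fail_on_malformed_files=False,  # ignore files that ast cannot parse
)
```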

# Examples

Below you can find examples of using DocstringWalker.

## Example 1 - check Docstring Walker itself

Let's start by using it.... on itself :) We will see what information gets extracted from the module.
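
The snippet below assumes some setup that is not part of the reader itself: `docstring_walker_dir` is just a path to the reader's source directory, and `service_context` bundles the LLM and embedding model. A hedged sketch of that setup (import paths and the `ServiceContext` API vary between llama-index releases):

```python
# Assumed setup for the example below - not part of DocstringWalker itself.
# Import paths and the ServiceContext API differ between llama-index releases;
# newer versions replace ServiceContext with the global Settings object.
from llama_index.core import ServiceContext, VectorStoreIndex
from llama_index.llms.openai import OpenAI
from llama_index.readers.docstring_walker import DocstringWalker

service_context = ServiceContext.from_defaults(llm=OpenAI(model="gpt-3.5-turbo"))

# Hypothetical path to the directory holding the DocstringWalker source files
docstring_walker_dir = "path/to/llama_index/readers/docstring_walker"
```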

```python
# Step 1 - create docstring walker
walker = DocstringWalker()

# Step 2 - provide a path to... this directory :)
example1_docs = walker.load_data(docstring_walker_dir)

# Let's check docs content
print(example1_docs)

"""
[Document(id_=..., embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, hash=..., text="Module name: base \n Docstring: None...") ]
"""

# We can print the text of the first document
print(example1_docs[0].text[:500])

"""
Module name: base
Docstring: None
Class name: DocstringWalker
Docstring: A loader for docstring extraction and building structured documents from them.
Recursively walks a directory and extracts docstrings from each Python module - starting from the module
itself, then classes, then functions. Builds a graph of dependencies between the extracted docstrings.

Function name: load_data, In: DocstringWalker
Docstring: Load data from the specified code directory.
Additionally, after loading t
 """

# Step 3: Feed documents into Llama Index
example1_index = VectorStoreIndex.from_documents(
    example1_docs, service_context=service_context
)

# Step 4: Query the index
example1_qe = example1_index.as_query_engine(service_context=service_context)


# Step 5: And start querying the index
print(
    example1_qe.query(
        "What are the main functions used by DocstringWalker? Describe each one in points."
    ).response
)

"""
1. load_data: This function loads data from a specified code directory and builds a dependency graph between the loaded documents. The graph is stored as an attribute of the class.

2. process_directory: This function processes a directory and extracts information from Python files. It returns a tuple containing a list of Document objects and a networkx Graph object. The Document objects represent the extracted information from Python files, and the Graph object represents the dependency graph between the extracted documents.

3. read_module_text: This function reads the text of a Python module given its path and returns the text of the module.

4. parse_module: This function parses a single Python module and returns a Document object with extracted information from the module.

5. process_class: This function processes a class node in the AST and adds relevant information to the graph. It returns a string representation of the processed class node and its sub-elements.

6. process_function: This function processes a function node in the AST and adds it to the graph. It returns a string representation of the processed function node with its sub-elements.

7. process_elem: This is a generic function that processes an element in the abstract syntax tree (AST) and delegates the execution to more specific functions based on the type of the element. It returns the result of processing the element.
"""
```

## Example 2 - check some arbitrarily selected module

Now let's see how to apply DocstringWalker to files under an arbitrary directory. We will use the code from the PyTorch Geometric KGE (Knowledge Graph Embedding) directory.
You can find its original documentation and classes here: https://pytorch-geometric.readthedocs.io/en/latest/modules/nn.html#kge-models

We import the module and use its file path directly.

```python
import os

from torch_geometric.nn import kge

# `walker` and `service_context` are reused from Example 1.
# Assumed import path; it may differ across llama-index versions.
from llama_index.core import SummaryIndex

# Step 1 - get path to module
module_path = os.path.dirname(kge.__file__)

# Step 2 - get the docs
example2_docs = walker.load_data(module_path)

# Step 3 - feed into Llama Index
example2_index = SummaryIndex.from_documents(
    example2_docs, service_context=service_context
)
example2_qe = example2_index.as_query_engine()

# Step 4 - query docstrings
print(
    example2_qe.query(
        "What classes are available and what is their main purpose? Use nested numbered list to describe: the class name, short summary of purpose, papers or literature review for each one of them."
    ).response
)


"""
1. DistMult
   - Purpose: Models relations as diagonal matrices, simplifying the bi-linear interaction between head and tail entities.
   - Paper: "Embedding Entities and Relations for Learning and Inference in Knowledge Bases" (https://arxiv.org/abs/1412.6575)

2. RotatE
   - Purpose: Models relations as a rotation in complex space from head to tail entities.
   - Paper: "RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space" (https://arxiv.org/abs/1902.10197)

3. TransE
   - Purpose: Models relations as a translation from head to tail entities.
   - Paper: "Translating Embeddings for Modeling Multi-Relational Data" (https://proceedings.neurips.cc/paper/2013/file/1cecc7a77928ca8133fa24680a88d2f9-Paper.pdf)

4. KGEModel
   - Purpose: An abstract base class for implementing custom KGE models.

5. ComplEx
   - Purpose: Models relations as complex-valued bilinear mappings between head and tail entities using the Hermitian dot product.
   - Paper: "Complex Embeddings for Simple Link Prediction" (https://arxiv.org/abs/1606.06357)
"""
```
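
The Documents themselves are plain text, so beyond querying you can also dump them to files as raw material for generated documentation. A small sketch (the output layout is arbitrary, not something the reader prescribes):

```python
# Small sketch: write each extracted Document to a Markdown file.
# The output layout here is arbitrary, not part of DocstringWalker.
from pathlib import Path

out_dir = Path("docstring_docs")
out_dir.mkdir(exist_ok=True)
for i, doc in enumerate(example2_docs):
    (out_dir / f"module_{i:02d}.md").write_text(doc.text, encoding="utf-8")
```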

            
