pyragify

- **Name**: pyragify
- **Version**: 0.1.0
- **Summary**: A tool for processing code repositories into semantic chunks for analysis with LLMs, especially NotebookLM.
- **Upload time**: 2024-12-14 18:43:35
- **Author**: ThomasBury
- **Requires Python**: >=3.9
- **License**: The Unlicense
- **Keywords**: chunking, code-processing, notebooklm, repository-analysis

# pyragify

**pyragify** is a Python tool that processes Python code repositories and extracts their content into semantic chunks for analysis. It handles Python files, Markdown files, and other common file types, and saves the extracted content as plain text for compatibility with tools like `NotebookLM`.

---

## Features

- **Semantic Chunking**: Extracts functions, classes, and inline comments from Python files, as well as headers and sections from Markdown files.
- **Output Format**: Writes plain `.txt` files for compatibility with NotebookLM.
- **Flexible Configuration**: Configure processing options via a YAML file or command-line arguments.
- **File Skipping**: Respects `.gitignore` and `.dockerignore` patterns and allows custom skip patterns.
- **Word Limit**: Automatically splits the output across multiple files so that each file respects a configurable word limit.
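The Markdown side of semantic chunking (splitting a document into sections at its headers) can be illustrated with a minimal sketch. This is not pyragify's own implementation: `split_markdown_sections` is a hypothetical name, and the sketch only recognizes ATX-style (`#`) headers, ignoring `#` lines inside fenced code blocks.

```python
import re
from typing import List, Tuple

# ATX headers: one to six "#" characters, whitespace, then the title text.
HEADER_RE = re.compile(r"^(#{1,6})\s+(.*)$")


def split_markdown_sections(text: str) -> List[Tuple[str, str]]:
    """Split Markdown into (header, body) pairs, one per ATX header.

    Text before the first header is returned under an empty header, and
    "#" lines inside fenced code blocks are not treated as headers.
    """
    sections: List[Tuple[str, str]] = []
    header = ""
    body: List[str] = []
    in_fence = False

    def flush() -> None:
        # Record the current section unless it is completely empty.
        if header or any(line.strip() for line in body):
            sections.append((header, "\n".join(body).strip()))

    for line in text.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence  # entering or leaving a code fence
        match = None if in_fence else HEADER_RE.match(line)
        if match:
            flush()
            header = match.group(2).strip()
            body = []
        else:
            body.append(line)
    flush()
    return sections
```

Each `(header, body)` pair is one self-contained chunk, so a question about "Usage" can be answered from the "Usage" section alone.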

---

## Installation

Install pyragify with `pip`:

```bash
pip install pyragify
```

Or, if you are using `uv`:

```bash
uv pip install pyragify
```

---

## Usage

### Best Practice: Run with `uv`

Using `uv` helps keep dependency management consistent and runs reproducible. First, make sure `uv` is installed:

```bash
pip install uv
```

Then, run pyragify using `uv`:

```bash
uv run python -m pyragify --config-file config.yaml
```

This helps keep your environment isolated and consistent.

---

### Chat With Your Codebase

Open NotebookLM, create a new notebook, and upload the text file found under `[...]/output/remaining/chunk_0.txt` as a source.

You can now ask questions about your code and get answers with precise citations. You can even generate a podcast.

![code_chat](chat_code_base.png "Chat with your code base")

### Command-Line Interface (CLI)

If you prefer to run pyragify directly without `uv`, use the following command:

```bash
python -m pyragify.cli process-repo
```

### Arguments and Options

- **`--config-file`** (default: `config.yaml`): Path to the YAML configuration file.
- **`--repo-path`**: Override the path to the repository to process.
- **`--output-dir`**: Override the directory where output files will be saved.
- **`--max-words`**: Override the maximum number of words per output file.
- **`--max-file-size`**: Override the maximum size (in bytes) of files to process.
- **`--skip-patterns`**: Override the list of file patterns to skip.
- **`--skip-dirs`**: Override the list of directories to skip.
- **`--verbose`**: Enable verbose logging for debugging purposes.
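To illustrate what a per-file word limit such as `--max-words` implies, here is a minimal greedy packing sketch. It assumes words are counted by splitting on whitespace and that a chunk is never split across two files; pyragify's actual counting and splitting rules may differ, and `pack_by_word_limit` is a hypothetical name.

```python
from typing import List


def pack_by_word_limit(chunks: List[str], max_words: int) -> List[List[str]]:
    """Group chunks into output files holding at most max_words words each.

    A single chunk longer than max_words gets a file of its own instead of
    being cut in the middle.
    """
    files: List[List[str]] = []
    current: List[str] = []
    current_words = 0
    for chunk in chunks:
        n_words = len(chunk.split())
        # Start a new file if adding this chunk would exceed the limit.
        if current and current_words + n_words > max_words:
            files.append(current)
            current, current_words = [], 0
        current.append(chunk)
        current_words += n_words
    if current:
        files.append(current)
    return files


print(pack_by_word_limit(["a b c", "d e", "f g h i", "j"], max_words=5))
# → [['a b c', 'd e'], ['f g h i', 'j']]
```

Keeping chunks whole trades a slightly uneven file size for the guarantee that a function or section is never split across two files.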

---

## Configuration

The tool can be configured using a YAML file (default: `config.yaml`). Here is an example configuration:

```yaml
repo_path: /path/to/repository
output_dir: /path/to/output
max_words: 200000
max_file_size: 10485760  # 10 MB
skip_patterns:
  - "*.log"
  - "*.tmp"
skip_dirs:
  - "__pycache__"
  - "node_modules"
verbose: false
```

Command-line arguments override the settings in the YAML file.
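A minimal sketch of that precedence rule follows. It assumes the YAML file has already been parsed into a dict (for example with PyYAML's `yaml.safe_load`) and that options not passed on the command line arrive as `None`; `merge_config` is a hypothetical helper, not part of pyragify's API.

```python
from typing import Any, Dict, Optional


def merge_config(
    file_config: Dict[str, Any], cli_overrides: Dict[str, Optional[Any]]
) -> Dict[str, Any]:
    """Combine YAML settings with command-line overrides.

    Values from the YAML file form the base; a CLI option replaces the file
    value only when it was actually given, i.e. is not None.
    """
    merged = dict(file_config)  # copy so the caller's dict is not mutated
    for key, value in cli_overrides.items():
        if value is not None:
            merged[key] = value
    return merged


# Settings as they might look after parsing config.yaml.
file_config = {"repo_path": "/path/to/repository", "max_words": 200000, "verbose": False}
# Only --max-words was passed on the command line; omitted options are None.
cli_overrides = {"repo_path": None, "max_words": 100000, "verbose": None}

effective = merge_config(file_config, cli_overrides)
print(effective)  # → {'repo_path': '/path/to/repository', 'max_words': 100000, 'verbose': False}
```

One consequence of this design is that a boolean flag such as `--verbose` needs a `None` default to distinguish "not given" from "explicitly off"; how pyragify's CLI handles that case is not described here.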

---

## Example Workflow

### 1. Prepare Your Repository

Ensure your repository contains the code you want to process. Add any files or directories you want to exclude to `.gitignore` or `.dockerignore`.

### 2. Configure pyragify

Create a `config.yaml` file with your desired settings or use the default settings.

### 3. Process the Repository

The recommended way is to run it with `uv`:

```bash
uv run python -m pyragify --config-file config.yaml
```

Alternatively, use the CLI directly:

```bash
python -m pyragify.cli process-repo --repo-path /path/to/repository --output-dir /path/to/output
```

### 4. Check the Output

The processed content will be saved in the specified output directory, organized into subdirectories like `python` and `markdown`.
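The type-based layout of the output can be sketched as a suffix lookup. This assumes files are classified by extension alone (`.py` → `python`, `.md` → `markdown`, everything else → `other`, following the File Outputs section); the real tool may recognize more extensions, and both function names are hypothetical.

```python
from collections import defaultdict
from pathlib import Path
from typing import Dict, Iterable, List

# Assumed suffix-to-subdirectory mapping; anything unlisted goes to "other".
SUBDIR_BY_SUFFIX: Dict[str, str] = {".py": "python", ".md": "markdown"}


def output_subdir(path: Path) -> str:
    """Return the output subdirectory name for a source file."""
    return SUBDIR_BY_SUFFIX.get(path.suffix.lower(), "other")


def group_by_subdir(paths: Iterable[Path]) -> Dict[str, List[Path]]:
    """Group source files by the subdirectory their chunks would be written to."""
    groups: Dict[str, List[Path]] = defaultdict(list)
    for path in paths:
        groups[output_subdir(path)].append(path)
    return dict(groups)


files = [Path("pkg/core.py"), Path("README.md"), Path("config.yaml")]
print(group_by_subdir(files))
```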

---

## Examples

### Process a Repository with Default Settings

```bash
uv run python -m pyragify --config-file config.yaml
```

### Process a Specific Repository with Custom Settings

```bash
uv run python -m pyragify.cli process-repo \
  --repo-path /my/repo \
  --output-dir /my/output \
  --max-words 100000 \
  --max-file-size 5242880 \
  --skip-patterns "*.log,*.tmp" \
  --skip-dirs "__pycache__,node_modules" \
  --verbose
```

---

## File Outputs

The processed content is saved in `.txt` format and categorized into subdirectories based on file type:

- **`python/`**: Contains chunks of Python functions and classes with their code.
- **`markdown/`**: Contains sections of Markdown files, split by headers.
- **`other/`**: Contains plain text versions of unsupported file types.
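For Python files, function- and class-level chunks can be approximated with the standard library's `ast` module. The sketch below is an illustration, not pyragify's code: it only looks at top-level definitions (methods stay inside their class's chunk), it omits decorators because `ast.get_source_segment` starts at the `def`/`class` line, and `extract_python_chunks` is a hypothetical name.

```python
import ast
from typing import List, Tuple


def extract_python_chunks(source: str) -> List[Tuple[str, str, str]]:
    """Return (kind, name, code) for each top-level function and class."""
    tree = ast.parse(source)
    chunks: List[Tuple[str, str, str]] = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            kind = "function"
        elif isinstance(node, ast.ClassDef):
            kind = "class"
        else:
            continue  # imports, assignments, etc. are not chunked here
        # get_source_segment (Python 3.8+) returns the node's exact source text,
        # including comments inside the body but not decorators above it.
        code = ast.get_source_segment(source, node)
        chunks.append((kind, node.name, code or ""))
    return chunks
```

Calling `extract_python_chunks(path.read_text())` on a module yields one chunk per function or class, which could then be labeled with its name and file path before being written out.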

---

## Advanced Features

### Respecting `.gitignore` and `.dockerignore`

pyragify automatically skips files and directories listed in `.gitignore` and `.dockerignore` if they are present in the repository.
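A simplified sketch of this kind of skip logic, using `fnmatch` from the standard library, is shown below. It is an approximation under stated assumptions, not pyragify's matcher: real `.gitignore` semantics (negation with `!`, patterns anchored with a leading `/`, `**`) are richer than `fnmatch`, so negated patterns are dropped and anchors are ignored here. `parse_ignore_patterns`, `load_ignore_patterns` and `should_skip` are hypothetical names.

```python
import fnmatch
from pathlib import Path, PurePosixPath
from typing import Iterable, List


def parse_ignore_patterns(text: str) -> List[str]:
    """Collect usable patterns from .gitignore/.dockerignore-style text.

    Blank lines and comments are dropped; negated ("!") patterns are dropped
    too, because this simplified matcher cannot re-include files.
    """
    patterns: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if line and not line.startswith(("#", "!")):
            # "build/" and "/build" are both reduced to "build" in this sketch.
            pattern = line.strip("/")
            if pattern:
                patterns.append(pattern)
    return patterns


def load_ignore_patterns(repo: Path) -> List[str]:
    """Read patterns from .gitignore and .dockerignore if they exist."""
    patterns: List[str] = []
    for name in (".gitignore", ".dockerignore"):
        ignore_file = repo / name
        if ignore_file.is_file():
            patterns.extend(parse_ignore_patterns(ignore_file.read_text(encoding="utf-8")))
    return patterns


def should_skip(rel_path: str, patterns: Iterable[str], skip_dirs: Iterable[str]) -> bool:
    """Return True if a repo-relative POSIX path should not be processed."""
    parts = PurePosixPath(rel_path).parts
    dirs = set(skip_dirs)
    if any(part in dirs for part in parts[:-1]):
        return True
    for pattern in patterns:
        # Match the whole relative path or any single component (dir or file name).
        if fnmatch.fnmatchcase(rel_path, pattern) or any(
            fnmatch.fnmatchcase(part, pattern) for part in parts
        ):
            return True
    return False
```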

### Incremental Processing

pyragify uses MD5 hashes to skip unchanged files during subsequent runs.
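A minimal sketch of hash-based incremental processing with `hashlib.md5` follows. How pyragify stores its hashes between runs is not documented here, so the in-memory `cache` dict (which would have to be persisted, for instance as JSON) and the helper names `md5_of_file` and `select_changed` are assumptions. MD5 is adequate for detecting changed content, though not for any security purpose.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, Iterable, List


def md5_of_file(path: Path, block_size: int = 65536) -> str:
    """Hash a file in blocks so large files are not read into memory at once."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def select_changed(paths: Iterable[Path], cache: Dict[str, str]) -> List[Path]:
    """Return files whose hash differs from the cached one, updating the cache."""
    changed: List[Path] = []
    for path in paths:
        current = md5_of_file(path)
        if cache.get(str(path)) != current:
            changed.append(path)
            cache[str(path)] = current
    return changed


# Demo: on the second run only the modified file is selected again.
with tempfile.TemporaryDirectory() as tmp:
    a, b = Path(tmp, "a.py"), Path(tmp, "b.py")
    a.write_text("x = 1\n")
    b.write_text("y = 2\n")
    cache: Dict[str, str] = {}  # would be saved between runs in practice
    first_run = select_changed([a, b], cache)
    b.write_text("y = 3\n")
    second_run = select_changed([a, b], cache)
    print([p.name for p in first_run], [p.name for p in second_run])
    # → ['a.py', 'b.py'] ['b.py']
```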

---

## Development

To contribute to pyragify:

1. Clone the repository:
   ```bash
   git clone https://github.com/ThomasBury/pyragify.git
   cd pyragify
   ```

2. Install dependencies:
   ```bash
   pip install -r requirements.txt
   ```

3. Run tests (the test suite has not been written yet 😅):
   ```bash
   pytest
   ```

---

## License

This project is released under The Unlicense. See the [LICENSE](LICENSE) file for details.

---

## Support

For issues or feature requests, please create a GitHub issue in the repository or contact the maintainers.

Raw data

```json
{
    "_id": null,
    "home_page": null,
    "name": "pyragify",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": null,
    "keywords": "chunking, code-processing, notebookLM, repository-analysis",
    "author": "ThomasBury",
    "author_email": null,
    "download_url": "https://files.pythonhosted.org/packages/d3/d5/09ee446a2da7aac96e0eb75cf830cee09259781f7944cb631d5d57c6009a/pyragify-0.1.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "The Unlicense",
    "summary": "A tool for processing code repositories into semantic chunks for analysis with LLMs, especially NotebookLM.",
    "version": "0.1.0",
    "project_urls": {
        "homepage": "https://github.com/ThomasBury/pyragify",
        "repository": "https://github.com/ThomasBury/pyragify"
    },
    "split_keywords": [
        "chunking",
        " code-processing",
        " notebooklm",
        " repository-analysis"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "bdd76a53f2d79c19de37eeb63f4c6dd09fbeb3a9de2a439f853a32b079fae26a",
                "md5": "35bff133bdc0f75c9eaa75d1176ef0bc",
                "sha256": "3e291d6c8fdf4a953b9b930c9c08e9f158cc7655299bb13ab004d28e2ed75015"
            },
            "downloads": -1,
            "filename": "pyragify-0.1.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "35bff133bdc0f75c9eaa75d1176ef0bc",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9",
            "size": 13742,
            "upload_time": "2024-12-14T18:43:32",
            "upload_time_iso_8601": "2024-12-14T18:43:32.324248Z",
            "url": "https://files.pythonhosted.org/packages/bd/d7/6a53f2d79c19de37eeb63f4c6dd09fbeb3a9de2a439f853a32b079fae26a/pyragify-0.1.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "d3d509ee446a2da7aac96e0eb75cf830cee09259781f7944cb631d5d57c6009a",
                "md5": "4c1c048ca13e96e93087b013b12ff537",
                "sha256": "2beab912b78610486e0e87fc2a7b6437c49fb10f6706fa5c378856b3970ff6f8"
            },
            "downloads": -1,
            "filename": "pyragify-0.1.0.tar.gz",
            "has_sig": false,
            "md5_digest": "4c1c048ca13e96e93087b013b12ff537",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9",
            "size": 157103,
            "upload_time": "2024-12-14T18:43:35",
            "upload_time_iso_8601": "2024-12-14T18:43:35.282187Z",
            "url": "https://files.pythonhosted.org/packages/d3/d5/09ee446a2da7aac96e0eb75cf830cee09259781f7944cb631d5d57c6009a/pyragify-0.1.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-12-14 18:43:35",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "ThomasBury",
    "github_project": "pyragify",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "pyragify"
}
```
