# pyragify
**pyragify** is a Python-based tool that processes Python code repositories and extracts their content into semantic chunks for analysis. It supports Python files, Markdown files, and other common file types. The extracted content is saved in plain text format for compatibility with tools like `NotebookLM`.
---
## Features
- **Semantic Chunking**: Extracts functions, classes, and inline comments from Python files, as well as headers and sections from Markdown files.
- **Supported Formats**: Outputs `.txt` files for compatibility with NotebookLM.
- **Flexible Configuration**: Configure processing options via a YAML file or command-line arguments.
- **File Skipping**: Respects `.gitignore` and `.dockerignore` patterns and allows custom skip patterns.
- **Word Limit**: Automatically chunks output files based on a configurable word limit.
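To illustrate the word-limit idea, the sketch below greedily packs text pieces into groups so that each group stays within a word budget. The function name `group_by_word_limit` and the greedy strategy are assumptions made for illustration; pyragify's actual chunking logic may differ.

```python
from typing import Iterable, List


def group_by_word_limit(pieces: Iterable[str], max_words: int) -> List[List[str]]:
    """Greedily pack text pieces into groups of at most max_words words.

    A single piece longer than max_words is kept whole in its own group.
    """
    groups: List[List[str]] = []
    current: List[str] = []
    current_words = 0
    for piece in pieces:
        n_words = len(piece.split())
        # Start a new group when adding this piece would exceed the budget.
        if current and current_words + n_words > max_words:
            groups.append(current)
            current, current_words = [], 0
        current.append(piece)
        current_words += n_words
    if current:
        groups.append(current)
    return groups


if __name__ == "__main__":
    # Hypothetical pieces standing in for extracted functions and classes.
    pieces = ["def a(): pass", "class B: pass", "x = 1"]
    for i, group in enumerate(group_by_word_limit(pieces, max_words=6)):
        print(f"chunk_{i}.txt:", group)
```

Each group would then correspond to one output file. Keeping oversized pieces whole, rather than cutting them mid-function, is a design choice of this sketch.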
---
## Installation
If you are using `uv`:
```bash
uv pip install pyragify
```
Otherwise, install pyragify with `pip`:
```bash
pip install pyragify
```
---
## Usage
### Best Practice: Run with `uv`
Using `uv` ensures consistent dependency management and reproducibility. First, make sure you have `uv` installed:
```bash
pip install uv
```
Then, run pyragify using `uv`:
```bash
uv run python -m pyragify --config-file config.yaml
```
This ensures your environment is properly isolated and consistent.
---
### Chat With Your Code-Base
Head over to NotebookLM and drop the text file you will find under `[...]/output/remaining/chunk_0.txt` into a new notebook.
You can now ask questions and get answers with precise citations. You can even generate a podcast.

### Command-Line Interface (CLI)
If you prefer to run pyragify directly without `uv`, use the following command:
```bash
python -m pyragify.cli process-repo
```
### Arguments and Options
- **`--config-file`** (default: `config.yaml`): Path to the YAML configuration file.
- **`--repo-path`**: Override the path to the repository to process.
- **`--output-dir`**: Override the directory where output files will be saved.
- **`--max-words`**: Override the maximum number of words per output file.
- **`--max-file-size`**: Override the maximum size (in bytes) of files to process.
- **`--skip-patterns`**: Override the list of file patterns to skip.
- **`--skip-dirs`**: Override the list of directories to skip.
- **`--verbose`**: Enable verbose logging for debugging purposes.
---
## Configuration
The tool can be configured using a YAML file (default: `config.yaml`). Here is an example configuration:
```yaml
repo_path: /path/to/repository
output_dir: /path/to/output
max_words: 200000
max_file_size: 10485760 # 10 MB
skip_patterns:
- "*.log"
- "*.tmp"
skip_dirs:
- "__pycache__"
- "node_modules"
verbose: false
```
Command-line arguments override the settings in the YAML file.
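The override rule can be pictured as a simple dictionary merge. The sketch below is a hypothetical illustration, not pyragify's own code: `merge_settings` is an invented name, and it assumes that a command-line option the user did not pass is represented as `None`.

```python
from typing import Any, Dict, Optional


def merge_settings(
    file_settings: Dict[str, Any],
    cli_overrides: Dict[str, Optional[Any]],
) -> Dict[str, Any]:
    """Return the file settings updated with every CLI option that was given.

    Assumption: an option that was not passed on the command line is None.
    """
    merged = dict(file_settings)  # copy, so the input is left untouched
    for key, value in cli_overrides.items():
        if value is not None:
            merged[key] = value
    return merged


if __name__ == "__main__":
    from_yaml = {"repo_path": "/path/to/repository", "max_words": 200000, "verbose": False}
    from_cli = {"max_words": 100000, "output_dir": None, "verbose": None}
    # max_words comes from the CLI; everything else keeps its YAML value.
    print(merge_settings(from_yaml, from_cli))
```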
---
## Example Workflow
### 1. Prepare Your Repository
Ensure your repository contains the code you want to process. Add any files or directories you want to exclude to `.gitignore` or `.dockerignore`.
### 2. Configure pyragify
Create a `config.yaml` file with your desired settings or use the default settings.
### 3. Process the Repository
Run the following command with `uv` (recommended):
```bash
uv run python -m pyragify --config-file config.yaml
```
Alternatively, use the CLI directly:
```bash
python -m pyragify.cli process-repo --repo-path /path/to/repository --output-dir /path/to/output
```
### 4. Check the Output
The processed content will be saved in the specified output directory, organized into subdirectories like `python` and `markdown`.
---
## Examples
### Process a Repository with Default Settings
```bash
uv run python -m pyragify --config-file config.yaml
```
### Process a Specific Repository with Custom Settings
```bash
uv run python -m pyragify.cli process-repo \
--repo-path /my/repo \
--output-dir /my/output \
--max-words 100000 \
--max-file-size 5242880 \
--skip-patterns "*.log,*.tmp" \
--skip-dirs "__pycache__,node_modules" \
--verbose
```
---
## File Outputs
The processed content is saved in `.txt` format and categorized into subdirectories based on file type:
- **`python/`**: Contains chunks of Python functions and classes with their code.
- **`markdown/`**: Contains sections of Markdown files, split by headers.
- **`other/`**: Contains plain text versions of unsupported file types.
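The kind of chunking behind the `python/` and `markdown/` outputs can be sketched with the standard library alone. This is an assumption-laden illustration rather than pyragify's implementation: `extract_python_chunks` and `split_markdown_sections` are hypothetical names, only top-level definitions are collected, and the Markdown splitter does not special-case `#` lines inside fenced code blocks.

```python
import ast
import re
from typing import List, Tuple

# ATX-style Markdown header: one to six '#' followed by a space or tab.
HEADER_RE = re.compile(r"^#{1,6}[ \t]+.*$", re.MULTILINE)


def extract_python_chunks(source: str) -> List[Tuple[str, str, str]]:
    """Return (kind, name, code) for each top-level function and class."""
    chunks = []
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            kind = "function"
        elif isinstance(node, ast.ClassDef):
            kind = "class"
        else:
            continue
        # Recover the node's source text (decorator lines are not included).
        code = ast.get_source_segment(source, node) or ""
        chunks.append((kind, node.name, code))
    return chunks


def split_markdown_sections(text: str) -> List[str]:
    """Split Markdown text into sections, each starting at a header line."""
    starts = [match.start() for match in HEADER_RE.finditer(text)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)  # keep any text that precedes the first header
    ends = starts[1:] + [len(text)]
    sections = [text[start:end].strip() for start, end in zip(starts, ends)]
    return [section for section in sections if section]


if __name__ == "__main__":
    sample_py = "import os\n\ndef greet(name):\n    return 'hi ' + name\n\nclass Box:\n    size = 1\n"
    for kind, name, _code in extract_python_chunks(sample_py):
        print(kind, name)
    print(split_markdown_sections("# A\ntext\n## B\nmore\n"))
```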
---
## Advanced Features
### Respecting `.gitignore` and `.dockerignore`
pyragify automatically skips files and directories listed in `.gitignore` and `.dockerignore` if they are present in the repository.
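A simplified picture of pattern-based skipping is shown below. It is a sketch only: it uses `fnmatch` from the standard library, which covers plain globs such as `*.log` but not the full `.gitignore` syntax (negation with `!`, anchored paths, `**`). The helper names `parse_ignore_lines` and `is_skipped` are hypothetical and do not come from pyragify.

```python
import fnmatch
from pathlib import PurePosixPath
from typing import Iterable, List


def parse_ignore_lines(lines: Iterable[str]) -> List[str]:
    """Keep the non-empty, non-comment patterns from an ignore file's lines."""
    patterns = []
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#"):
            # Drop a trailing slash so "node_modules/" matches the directory name.
            patterns.append(line.rstrip("/"))
    return patterns


def is_skipped(relative_path: str, patterns: Iterable[str]) -> bool:
    """Return True if the whole path or any path component matches a pattern."""
    parts = PurePosixPath(relative_path).parts
    for pattern in patterns:
        if fnmatch.fnmatch(relative_path, pattern):
            return True
        if any(fnmatch.fnmatch(part, pattern) for part in parts):
            return True
    return False


if __name__ == "__main__":
    patterns = parse_ignore_lines(["# build output", "", "*.log", "node_modules/"])
    for path in ["src/app.log", "node_modules/pkg/index.js", "src/main.py"]:
        print(path, "skipped" if is_skipped(path, patterns) else "kept")
```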
### Incremental Processing
pyragify uses MD5 hashes to skip unchanged files during subsequent runs.
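As a minimal sketch of hash-based change detection, the code below computes a file's MD5 digest with `hashlib` and compares it with the digest recorded on a previous run. The names `file_md5` and `needs_processing`, and the in-memory dictionary used as the hash store, are assumptions for illustration; how pyragify persists its hashes is not described here.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict


def file_md5(path: Path) -> str:
    """Return the hex MD5 digest of a file, read in blocks to bound memory use."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(65536), b""):
            digest.update(block)
    return digest.hexdigest()


def needs_processing(path: Path, seen_hashes: Dict[str, str]) -> bool:
    """Return True if the file is new or changed, and record its current hash."""
    current = file_md5(path)
    key = str(path)
    if seen_hashes.get(key) == current:
        return False  # unchanged since the last run
    seen_hashes[key] = current
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "module.py"
        target.write_bytes(b"x = 1\n")
        seen: Dict[str, str] = {}
        print(needs_processing(target, seen))  # first run: file is new
        print(needs_processing(target, seen))  # second run: file is unchanged
```

MD5 is used here only to detect changes, not for security, so its known collision weaknesses are not a practical concern in this role.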
---
## Development
To contribute to pyragify:
1. Clone the repository:
   ```bash
   git clone https://github.com/ThomasBury/pyragify.git
   cd pyragify
   ```
2. Install dependencies:
   ```bash
   pip install -r requirements.txt
   ```
3. Run tests (TODO: write test suite 😅):
   ```bash
   pytest
   ```
---
## License
This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.
---
## Support
For issues or feature requests, please create a GitHub issue in the repository or contact the maintainers.
Raw data
{
"_id": null,
"home_page": null,
"name": "pyragify",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.9",
"maintainer_email": null,
"keywords": "chunking, code-processing, notebookLM, repository-analysis",
"author": "ThomasBury",
"author_email": null,
"download_url": "https://files.pythonhosted.org/packages/d3/d5/09ee446a2da7aac96e0eb75cf830cee09259781f7944cb631d5d57c6009a/pyragify-0.1.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "The Unlicense",
"summary": "A tool for processing code repositories into semantic chunks for analysis with LLMs, especiallyNotebookLM.",
"version": "0.1.0",
"project_urls": {
"homepage": "https://github.com/ThomasBury/pyragify",
"repository": "https://github.com/ThomasBury/pyragify"
},
"split_keywords": [
"chunking",
" code-processing",
" notebooklm",
" repository-analysis"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "bdd76a53f2d79c19de37eeb63f4c6dd09fbeb3a9de2a439f853a32b079fae26a",
"md5": "35bff133bdc0f75c9eaa75d1176ef0bc",
"sha256": "3e291d6c8fdf4a953b9b930c9c08e9f158cc7655299bb13ab004d28e2ed75015"
},
"downloads": -1,
"filename": "pyragify-0.1.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "35bff133bdc0f75c9eaa75d1176ef0bc",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.9",
"size": 13742,
"upload_time": "2024-12-14T18:43:32",
"upload_time_iso_8601": "2024-12-14T18:43:32.324248Z",
"url": "https://files.pythonhosted.org/packages/bd/d7/6a53f2d79c19de37eeb63f4c6dd09fbeb3a9de2a439f853a32b079fae26a/pyragify-0.1.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "d3d509ee446a2da7aac96e0eb75cf830cee09259781f7944cb631d5d57c6009a",
"md5": "4c1c048ca13e96e93087b013b12ff537",
"sha256": "2beab912b78610486e0e87fc2a7b6437c49fb10f6706fa5c378856b3970ff6f8"
},
"downloads": -1,
"filename": "pyragify-0.1.0.tar.gz",
"has_sig": false,
"md5_digest": "4c1c048ca13e96e93087b013b12ff537",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9",
"size": 157103,
"upload_time": "2024-12-14T18:43:35",
"upload_time_iso_8601": "2024-12-14T18:43:35.282187Z",
"url": "https://files.pythonhosted.org/packages/d3/d5/09ee446a2da7aac96e0eb75cf830cee09259781f7944cb631d5d57c6009a/pyragify-0.1.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-12-14 18:43:35",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "ThomasBury",
"github_project": "pyragify",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "pyragify"
}