codebase-to-text

Name: codebase-to-text
Version: 1.2
Home page: https://github.com/QaisarRajput/codebase_to_text
Summary: A Python package to convert codebase to text
Upload time: 2025-07-20 20:26:41
Author: Qaisar Tanvir
License: MIT
Keywords: codebase, code conversion, text conversion, folder structure, file contents, text extraction, document conversion, Python package, GitHub repository, command-line tool, code analysis, file parsing, code documentation, formatting preservation, readability
Requirements: python-docx, gitpython
# Codebase to Text Converter

A Python tool that converts a codebase (a folder structure and its files) into a single text file or Microsoft Word document (.docx), preserving both the folder structure and the file contents. Useful for AI/LLM processing, documentation generation, and code analysis.

## ✨ Features

- **Multi-source input**: Local directories and GitHub repositories
- **Flexible output**: Text files (.txt) and Microsoft Word documents (.docx)
- **Smart exclusions**: Advanced pattern matching for files and directories
- **Performance optimized**: Efficient traversal of large codebases
- **Comprehensive logging**: Detailed verbose mode for transparency
- **Encoding support**: Handles various file encodings gracefully
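The README does not describe how encodings are detected. As an illustration only (the helper name `read_text_with_fallback` and the encoding order are assumptions, not the package's code), a converter could try a few encodings in turn and fall back to a lossy decode so that one unusual file does not abort the whole run:

```python
from pathlib import Path


def read_text_with_fallback(path, encodings=("utf-8-sig", "cp1252")):
    """Decode a file by trying each encoding in order (hypothetical helper).

    "utf-8-sig" accepts plain UTF-8 and also strips a leading BOM;
    "cp1252" covers many legacy Windows text files.
    """
    data = Path(path).read_bytes()
    for encoding in encodings:
        try:
            return data.decode(encoding)
        except UnicodeDecodeError:
            continue  # try the next candidate encoding
    # Last resort: never fail, replace undecodable bytes with U+FFFD
    return data.decode("utf-8", errors="replace")
```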

## 🚀 Installation

```bash
pip install codebase-to-text
```

## 📖 Usage

### Command Line Interface (CLI)

#### Basic Usage

```bash
codebase-to-text --input "path_or_github_url" --output "output_path" --output_type "txt"
```

#### Advanced Usage with Exclusions

```bash
# Exclude specific patterns
codebase-to-text --input "./my_project" --output "output.txt" --output_type "txt" --exclude "*.log,temp/,**/__pycache__/**"

# Multiple exclude arguments
codebase-to-text --input "./my_project" --output "output.txt" --output_type "txt" --exclude "*.pyc" --exclude "build/" --exclude "venv/"

# Exclude hidden files
codebase-to-text --input "./my_project" --output "output.txt" --output_type "txt" --exclude_hidden

# Verbose mode for detailed logging
codebase-to-text --input "./my_project" --output "output.txt" --output_type "txt" --verbose
```

### Python API

```python
from codebase_to_text import CodebaseToText

# Basic usage
converter = CodebaseToText(
    input_path="path_or_github_url",
    output_path="output_path",
    output_type="txt"
)
converter.get_file()

# Advanced usage with exclusions
converter = CodebaseToText(
    input_path="./my_project",
    output_path="./output.txt",
    output_type="txt",
    exclude=["*.log", "temp/", "**/__pycache__/**"],
    exclude_hidden=True,
    verbose=True
)
converter.get_file()

# Get text content without saving to file
text_content = converter.get_text()
print(text_content)
```

## 🎯 Exclusion Patterns

Exclusion patterns let you filter out unwanted files and directories:

### Pattern Types

1. **Exact filename**: `README.md`, `config.yaml`
2. **Wildcard patterns**: `*.log`, `*.tmp`, `test_*`
3. **Directory patterns**: `__pycache__/`, `.git/`, `node_modules/`
4. **Recursive patterns**: `**/__pycache__/**`, `**/node_modules/**`
5. **Path-based patterns**: `src/temp/`, `docs/build/`
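To make the five pattern types concrete, here is a small matcher built on the standard-library `fnmatch` module. It is a hypothetical `is_excluded` helper, not the package's implementation; it assumes file paths are POSIX-style and relative to the project root, and the tool's actual rules (for example, how `**` is interpreted) may differ.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def is_excluded(rel_path, patterns):
    """Return True if a relative file path matches any exclusion pattern (sketch)."""
    path = PurePosixPath(rel_path)
    dirs = path.parts[:-1]  # parent directory components only
    for raw in patterns:
        pattern = raw.strip()
        if not pattern:
            continue
        # Recursive form "**/name/**": treat it as "a directory called name, at any depth"
        if pattern.startswith("**/") and pattern.endswith("/**"):
            pattern = pattern[3:-3] + "/"
        if pattern.endswith("/"):
            dir_pattern = pattern.rstrip("/")
            if "/" in dir_pattern:
                # Path-based directory pattern such as "src/temp/": anchored at the root
                prefix = PurePosixPath(dir_pattern).parts
                if dirs[:len(prefix)] == prefix:
                    return True
            elif any(fnmatchcase(part, dir_pattern) for part in dirs):
                # Directory pattern such as "__pycache__/": matches at any depth
                return True
        elif "/" in pattern:
            # Path-based file pattern such as "config/secrets.yaml"
            if fnmatchcase(str(path), pattern):
                return True
        elif fnmatchcase(path.name, pattern):
            # Exact filename ("README.md") or wildcard ("*.log", "test_*")
            return True
    return False
```

Note that `fnmatch`'s `*` also matches `/`, so a pattern such as `src/*.py` would match files in nested subfolders of `src/` as well.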

### Exclusion Sources

1. **CLI Arguments**: Use `--exclude` flag (can be used multiple times)
2. **`.exclude` file**: Place in your project root (see example below)
3. **Default patterns**: Common files/folders are excluded automatically

### Default Exclusions

The tool automatically excludes common development files:

- `.git/`, `__pycache__/`, `*.pyc`, `*.pyo`
- `node_modules/`, `.venv/`, `venv/`, `env/`
- `*.log`, `*.tmp`, `.DS_Store`
- `.pytest_cache/`, `build/`, `dist/`
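The three exclusion sources can be combined by concatenating them and dropping duplicates. The sketch below hardcodes the default list above as `DEFAULT_EXCLUDES` and assumes the order defaults, then `.exclude` file, then CLI; both the constant and `combine_patterns` are hypothetical names, and the package may merge its sources differently.

```python
# Hypothetical constant copied from the default list above; the package's real defaults may differ.
DEFAULT_EXCLUDES = [
    ".git/", "__pycache__/", "*.pyc", "*.pyo",
    "node_modules/", ".venv/", "venv/", "env/",
    "*.log", "*.tmp", ".DS_Store",
    ".pytest_cache/", "build/", "dist/",
]


def combine_patterns(cli_patterns=(), file_patterns=()):
    """Merge defaults, .exclude-file patterns, and CLI patterns (hypothetical helper).

    Duplicates are dropped while first-seen order is kept, so the result is
    deterministic and easy to print in verbose logs.
    """
    merged, seen = [], set()
    for source in (DEFAULT_EXCLUDES, file_patterns, cli_patterns):
        for pattern in source:
            pattern = pattern.strip()
            if pattern and pattern not in seen:
                seen.add(pattern)
                merged.append(pattern)
    return merged
```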

## 📝 .exclude File Example

Create a `.exclude` file in your project root:

```text
# .exclude file - Patterns for files/folders to exclude

# Version control
.git/
.gitignore

# Python
__pycache__/
*.pyc
venv/
.pytest_cache/

# Node.js
node_modules/
*.log

# IDE files
.vscode/
.idea/

# Project specific
config/secrets.yaml
data/large_files/
```
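Reading a `.exclude` file like this one takes only a few lines. This is a hypothetical `load_exclude_file` helper that assumes UTF-8 and skips blank lines and `#` comments, as in the example; it is not taken from the package source.

```python
from pathlib import Path


def load_exclude_file(project_root):
    """Return the patterns listed in <project_root>/.exclude (hypothetical helper)."""
    exclude_path = Path(project_root) / ".exclude"
    if not exclude_path.is_file():
        return []  # no .exclude file: nothing extra to exclude
    patterns = []
    for line in exclude_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        # Skip blank lines and comment lines
        if line and not line.startswith("#"):
            patterns.append(line)
    return patterns
```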

## 🔧 CLI Parameters

| Parameter | Description | Example |
|-----------|-------------|---------|
| `--input` | Input path (local folder or GitHub URL) | `./my_project` or `https://github.com/user/repo` |
| `--output` | Output file path | `./output.txt` |
| `--output_type` | Output format (`txt` or `docx`) | `txt` |
| `--exclude` | Exclusion patterns (repeatable) | `--exclude "*.log" --exclude "temp/"` |
| `--exclude_hidden` | Exclude hidden files/folders | Flag (no value) |
| `--verbose` | Enable detailed logging | Flag (no value) |
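The parameter table maps naturally onto `argparse`. The parser below is a sketch that mirrors the documented flags, not the package's actual CLI code; treating `--input`, `--output`, and `--output_type` as required, and splitting comma-separated `--exclude` values in a helper named `split_excludes`, are assumptions.

```python
import argparse


def build_parser():
    """An argparse sketch mirroring the parameter table (not the package's own parser)."""
    parser = argparse.ArgumentParser(prog="codebase-to-text")
    parser.add_argument("--input", required=True, help="Local folder or GitHub URL")
    parser.add_argument("--output", required=True, help="Output file path")
    parser.add_argument("--output_type", choices=["txt", "docx"], required=True)
    # action="append" makes --exclude repeatable; each value may also be comma-separated
    parser.add_argument("--exclude", action="append", default=[])
    parser.add_argument("--exclude_hidden", action="store_true")
    parser.add_argument("--verbose", action="store_true")
    return parser


def split_excludes(values):
    """Flatten ['*.log,temp/', 'build/'] into ['*.log', 'temp/', 'build/']."""
    return [p.strip() for value in values for p in value.split(",") if p.strip()]
```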

## 💡 Examples

### Convert Local Project

```bash
# Basic conversion (~ is left unquoted so the shell expands it)
codebase-to-text --input ~/projects/my_app --output "my_app_code.txt" --output_type "txt"

# With custom exclusions
codebase-to-text --input ~/projects/my_app --output "my_app_code.txt" --output_type "txt" --exclude "*.log,build/,dist/" --verbose
```

### Convert GitHub Repository

```bash
# Public repository
codebase-to-text --input "https://github.com/username/repo" --output "repo_analysis.docx" --output_type "docx"

# With exclusions for cleaner output
codebase-to-text --input "https://github.com/username/repo" --output "repo_clean.txt" --output_type "txt" --exclude "*.md,docs/,examples/"
```
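Supporting both local paths and GitHub URLs means telling them apart and, for URLs, fetching the repository first. Since GitPython is a listed requirement, one plausible approach is sketched below; the regular expression, the helper names `is_github_url` and `clone_to_temp`, and the shallow clone are assumptions rather than the package's documented behavior. GitPython's `Repo.clone_from` forwards keyword arguments to `git clone`, so `depth=1` becomes `--depth=1`.

```python
import re
import tempfile

# Matches https://github.com/<owner>/<repo> with an optional ".git" suffix or trailing slash.
GITHUB_URL = re.compile(r"^https://github\.com/([\w.-]+)/([\w.-]+?)(?:\.git)?/?$")


def is_github_url(value):
    """Return True if value looks like a GitHub repository URL rather than a local path."""
    return GITHUB_URL.match(value) is not None


def clone_to_temp(url):
    """Shallow-clone a repository into a new temporary directory (hypothetical helper)."""
    from git import Repo  # imported lazily so is_github_url works without GitPython

    target = tempfile.mkdtemp(prefix="codebase_to_text_")
    Repo.clone_from(url, target, depth=1)  # only the latest commit is fetched
    return target
```

The caller is responsible for deleting the temporary directory (for example with `shutil.rmtree`) once the conversion finishes.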

### Python Integration

```python
# Analyze a codebase programmatically
from codebase_to_text import CodebaseToText

def analyze_codebase(project_path):
    converter = CodebaseToText(
        input_path=project_path,
        output_path="analysis.txt",
        output_type="txt",
        exclude=["*.log", "test/", "**/__pycache__/**"],
        verbose=True
    )
    
    # Get the content
    content = converter.get_text()
    
    # Process with your preferred LLM/AI tool
    # analysis_result = your_ai_tool.analyze(content)
    
    return content

# Usage
code_content = analyze_codebase("./my_project")
```

## 🎯 Use Cases

- **AI/LLM Training**: Prepare codebases for language model training
- **Code Review**: Generate comprehensive code overviews for review
- **Documentation**: Create single-file documentation from projects
- **Analysis**: Feed entire codebases to AI tools for analysis
- **Migration**: Document legacy codebases before migration
- **Learning**: Study open-source projects more effectively

## 🔄 Output Format

The generated output includes:

1. **Folder Structure**: Tree-like representation of the directory structure
2. **File Contents**: Full content of each file with metadata
3. **Clear Separators**: Distinct sections for easy navigation
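A simplified version of this layout can be produced with `os.walk`. The headings (`Folder Structure`, `File Contents`), the `File:` labels, and the `=` separators below are assumptions chosen for illustration, and `render_codebase` is a hypothetical name; the tool's actual output format may differ. The `exclude_hidden` argument mimics the `--exclude_hidden` flag by skipping names that start with a dot.

```python
import os
from pathlib import Path

SEPARATOR = "=" * 60


def render_codebase(root, exclude_hidden=False):
    """Return an indented folder tree followed by every file's contents (sketch)."""
    root = Path(root).resolve()
    tree_lines, sections = [], []
    for dirpath, dirnames, filenames in os.walk(root):
        if exclude_hidden:
            # Pruning dirnames in place stops os.walk from descending into hidden folders
            dirnames[:] = [d for d in dirnames if not d.startswith(".")]
            filenames = [f for f in filenames if not f.startswith(".")]
        dirnames.sort()  # in-place sort gives a deterministic traversal order
        rel_dir = Path(dirpath).relative_to(root)
        depth = len(rel_dir.parts)
        tree_lines.append("    " * depth + (rel_dir.name or root.name) + "/")
        for name in sorted(filenames):
            tree_lines.append("    " * (depth + 1) + name)
            file_path = Path(dirpath) / name
            text = file_path.read_text(encoding="utf-8", errors="replace")
            rel_file = file_path.relative_to(root).as_posix()
            sections.append(f"{SEPARATOR}\nFile: {rel_file}\n{SEPARATOR}\n{text}")
    return ("Folder Structure\n" + "\n".join(tree_lines)
            + "\n\nFile Contents\n" + "\n\n".join(sections))
```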

## ✒️ License

This project is licensed under the MIT License. See the LICENSE file for details.
