txtify

- Name: txtify
- Version: 0.1.2
- Summary: A versatile Python tool to convert documents (PPTX, DOCX, PDF, XLSX) to plain text, ideal for providing context to AI code assistants like GitHub Copilot and Amazon CodeWhisperer.
- Upload time: 2025-07-17 03:24:47
- Requires Python: >=3.8
- Keywords: ai, code generation, context, converter, document, documentation, excel, github copilot, llm, pdf, powerpoint, text, word

# 📄 txtify

<p align="center">
  <a href="https://pypi.org/project/txtify/">
    <img src="https://img.shields.io/pypi/v/txtify.svg" alt="PyPI Version">
  </a>
    <a href="https://github.com/ray-rada/txtify/blob/main/LICENSE.txt">
    <img src="https://img.shields.io/pypi/l/txtify.svg" alt="PyPI - License">
  </a>
</p>

**txtify** is a command-line tool and Python library that converts PowerPoint, Word, PDF, and Excel documents into plain text, Markdown, or JSON files.  
It is suited to extracting content for analysis, archiving, or giving AI assistants such as GitHub Copilot and Amazon CodeWhisperer context about your project's domain knowledge, requirements, and existing documentation.

---

## 📚 Table of Contents

- [✨ Features](#-features)
- [🤖 Providing Context to AI Code Assistants](#-providing-context-to-ai-code-assistants)
- [🚀 Installation](#-installation)
- [💡 Usage (Command Line Interface)](#-usage-command-line-interface)
  - [Convert a Single File](#convert-a-single-file)
  - [Convert an Entire Directory](#convert-an-entire-directory)
  - [Convert Multiple Files and Directories](#convert-multiple-files-and-directories)
  - [Specify an Output Directory](#specify-an-output-directory)
  - [Specify an Output Format](#specify-an-output-format)
- [📂 Supported File Formats](#-supported-file-formats)
- [🗄️ Output](#-output)
- [📜 License](#-license)

---

## ✨ Features

✅ **Multi-Format Support**: Converts `.pptx` (PowerPoint), `.docx` (Word), `.pdf` (Portable Document Format), and `.xlsx` (Excel) files.  
✅ **Multiple Output Formats**: Export content as plain text (`.txt`), Markdown (`.md`), or JSON (`.json`).  
✅ **Batch Processing**: Convert multiple files or entire directories at once.  
✅ **Clean Text Output**: Extracts core textual content, making documents easily searchable and readable for both humans and AI.  
✅ **Intuitive CLI**: Simple command-line interface for quick and easy conversions.  
✅ **Preserves Structure**: When converting directories, the original folder structure is replicated in the output.

---

## 🤖 Providing Context to AI Code Assistants

One of the most powerful use cases for **txtify** is to prepare your project's non-code documentation (e.g., design documents, requirement specifications, meeting notes, data dictionaries) for consumption by AI code generation tools like GitHub Copilot, Amazon CodeWhisperer, or similar LLM-based assistants.

### Why this is useful

- **Expand AI's Knowledge Base**: Let the AI "read" and understand domain-specific terminology, project goals, architectural decisions, and detailed requirements that might otherwise be locked away in binary formats.
- **Improve Code Relevance**: The AI can generate more relevant and accurate code suggestions, function names, and comments by leveraging the textual context.
- **Reduce Hallucinations**: With more accurate information, the AI is less likely to "hallucinate" or generate incorrect assumptions.
- **Seamless Integration**: Place the converted `.txt` files in a directory your IDE can access; AI assistants can often index and use them automatically.

### Example Workflow

1. Convert your documentation:
   ```bash
   txtify ./docs_and_requirements/ -o ./ai_context/
   ```

2. Integrate with your project: Place the `ai_context/` folder directly within your main project repository.
3. Let your AI assistant use the context: Your assistant can now draw on the information in these plain text files, enabling more context-aware code suggestions.
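
The conversion step can also be scripted, for example to refresh `ai_context/` whenever the source documents change. The sketch below is one possible approach under stated assumptions: it shells out to the `txtify` command and uses only the options documented in this README (`-o` and `--output-format`). The helper names `build_txtify_command` and `refresh_context` are hypothetical and are not part of txtify.

```python
import shutil
import subprocess
from typing import List, Optional, Sequence


def build_txtify_command(
    inputs: Sequence[str],
    output_dir: Optional[str] = None,
    output_format: Optional[str] = None,
) -> List[str]:
    """Assemble a txtify command line from the options described in this README."""
    command = ["txtify", *inputs]
    if output_dir is not None:
        command += ["-o", output_dir]
    if output_format is not None:
        command += ["--output-format", output_format]
    return command


def refresh_context(inputs: Sequence[str], output_dir: str) -> bool:
    """Run txtify if it is on PATH; return False when the command is missing."""
    if shutil.which("txtify") is None:
        return False
    # check=True raises CalledProcessError if txtify exits with an error status.
    subprocess.run(build_txtify_command(inputs, output_dir), check=True)
    return True


if __name__ == "__main__":
    # Only prints the command; call refresh_context(...) to actually run it.
    print(" ".join(build_txtify_command(["./docs_and_requirements/"], "./ai_context/")))
```

Building the argument list separately from running it keeps the command easy to inspect or log before anything is converted.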

---

## 🚀 Installation

You can install **txtify** directly from PyPI using pip:

```bash
pip install txtify
```

---

## 💡 Usage (Command Line Interface)

**txtify** can be used directly from your terminal.

### Convert a Single File

Pass the path to your document as an argument:

```bash
txtify my_project_spec.docx
```

This will create a plain text file named `my_project_spec.txt` inside a new `output/` directory by default.

---

### Convert an Entire Directory

Provide the path to a directory, and **txtify** will scan it (and its subdirectories) for all supported document types:

```bash
txtify project_documentation/
```

All convertible files will be processed. The original directory structure will be mirrored in the `output/` folder.
For example:

```
project_documentation/meetings/q1_notes.pptx
```

becomes:

```
output/project_documentation/meetings/q1_notes.txt
```
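
The mapping above can be written out in a few lines of Python. This is only a sketch of the documented behavior for relative input paths, not txtify's actual implementation; the function name `mirrored_output_path` and its parameters are hypothetical.

```python
from pathlib import PurePosixPath


def mirrored_output_path(
    source_file: str,
    output_dir: str = "output",
    extension: str = ".txt",
) -> PurePosixPath:
    """Return where a converted copy of a relative source path would be written."""
    source = PurePosixPath(source_file)
    # Keep the directory layout and swap only the file extension.
    return PurePosixPath(output_dir) / source.with_suffix(extension)


print(mirrored_output_path("project_documentation/meetings/q1_notes.pptx"))
# prints: output/project_documentation/meetings/q1_notes.txt
```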

---

### Convert Multiple Files and Directories

You can process any combination of files and directories in a single command. `txtify` will scan all specified paths and convert every supported document it finds.

```bash
txtify my_spec.docx docs_folder/ old_project/ requirements.pdf
```

This converts the listed files, along with every supported document found in the listed directories, to `.txt` files in the `output/` directory.

---

### Specify an Output Directory

Use the `-o` or `--output` option to choose a different location for your converted files:

```bash
txtify legacy_reports/ -o contextual_data/
```

This saves all converted text files into the `contextual_data/` directory.

---

### Specify an Output Format

By default, all documents are converted to plain text (`.txt`), so you can omit this flag when plain text is what you want. To convert to Markdown or JSON, use the `--output-format` option.

```bash
txtify my_document.pdf --output-format markdown
```

---

## 📂 Supported File Formats

**txtify** currently supports conversion for the following file types:

* PowerPoint Presentations: `.pptx`
* Word Documents: `.docx`
* PDF Documents: `.pdf`
* Excel Workbooks: `.xlsx`
  *(converted to a CSV-like plain text format, useful for data extraction)*
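
To preview which files a run would pick up, you can scan for these extensions yourself before calling txtify. The sketch below is an illustration rather than txtify's own discovery logic: the function name `find_supported_documents` is hypothetical, and matching extensions case-insensitively is an assumption.

```python
from pathlib import Path
from typing import Iterable, List

# Extensions listed in this README; case-insensitive matching is an assumption.
SUPPORTED_EXTENSIONS = {".pptx", ".docx", ".pdf", ".xlsx"}


def find_supported_documents(paths: Iterable[Path]) -> List[Path]:
    """Collect supported documents from a mix of files and directories."""
    found = set()
    for path in paths:
        path = Path(path)
        # A directory is searched recursively; a plain file is checked directly.
        candidates = path.rglob("*") if path.is_dir() else [path]
        for candidate in candidates:
            if candidate.is_file() and candidate.suffix.lower() in SUPPORTED_EXTENSIONS:
                found.add(candidate)
    return sorted(found)


if __name__ == "__main__":
    for document in find_supported_documents([Path(".")]):
        print(document)
```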

---

## 🗄️ Output

Converted files will have a `.txt`, `.md`, or `.json` extension, depending on the chosen format.
By default, they are saved to a directory named `output/` in your current working directory.
You can customize this using the `-o` or `--output` option.

If converting an entire directory, the relative path from the input directory is preserved in the output.

---

## 📜 License

**txtify** is distributed under the terms of the MIT License.

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "txtify",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": null,
    "keywords": "ai, code generation, context, converter, document, documentation, excel, github copilot, llm, pdf, powerpoint, text, word",
    "author": null,
    "author_email": "Ray Rada <rayrada1@gmail.com>, Anthony Furst <anthony.furst@afs.com>",
    "download_url": "https://files.pythonhosted.org/packages/61/9f/7ac25c744ea185a38d7b0ef1a23d9cc117ffae7c1966841d6100cd846046/txtify-0.1.2.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "A versatile Python tool to convert documents (PPTX, DOCX, PDF, XLSX) to plain text, ideal for providing context to AI code assistants like GitHub Copilot and Amazon CodeWhisperer.",
    "version": "0.1.2",
    "project_urls": null,
    "split_keywords": [
        "ai",
        " code generation",
        " context",
        " converter",
        " document",
        " documentation",
        " excel",
        " github copilot",
        " llm",
        " pdf",
        " powerpoint",
        " text",
        " word"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "f79928943ad25cf48b3f53e51918184cca78393c19a9bf1057ed620ab53ed38c",
                "md5": "604d6c4f393383c873a03ce7bf20655a",
                "sha256": "cd8e22ef98c1baa73719ab446fd4d3feb98b8970145cdb4eb8073167965691d7"
            },
            "downloads": -1,
            "filename": "txtify-0.1.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "604d6c4f393383c873a03ce7bf20655a",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 8344,
            "upload_time": "2025-07-17T03:24:46",
            "upload_time_iso_8601": "2025-07-17T03:24:46.051898Z",
            "url": "https://files.pythonhosted.org/packages/f7/99/28943ad25cf48b3f53e51918184cca78393c19a9bf1057ed620ab53ed38c/txtify-0.1.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "619f7ac25c744ea185a38d7b0ef1a23d9cc117ffae7c1966841d6100cd846046",
                "md5": "9d28d1123d8ff0ef155f8eb6f5622d2f",
                "sha256": "cb45e6aaa8d8eeee85cf0317922f4d32245d7a3b8998cdae8ca76585de1a9595"
            },
            "downloads": -1,
            "filename": "txtify-0.1.2.tar.gz",
            "has_sig": false,
            "md5_digest": "9d28d1123d8ff0ef155f8eb6f5622d2f",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 6799,
            "upload_time": "2025-07-17T03:24:47",
            "upload_time_iso_8601": "2025-07-17T03:24:47.111874Z",
            "url": "https://files.pythonhosted.org/packages/61/9f/7ac25c744ea185a38d7b0ef1a23d9cc117ffae7c1966841d6100cd846046/txtify-0.1.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-07-17 03:24:47",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "txtify"
}
        