docuvert

- Name: docuvert
- Version: 1.1.2
- Home page: https://github.com/avanomme/docuvert
- Summary: Universal document converter with 200+ format combinations. Features PowerPoint to Obsidian Markdown with image extraction, format preservation, and navigation links. Supports PDF, DOCX, PPTX, MD, TEX, CSV, XLSX, TXT, HEIC, JPG, PNG, HTML, RTF, ODT.
- Upload time: 2025-09-09 18:33:08
- Author: Vanox Tech <dev@vanoxtech.com>
- Requires Python: >=3.8
- License: MIT
- Keywords: document converter, pdf, docx, powerpoint, obsidian, markdown, latex, csv, xlsx, file conversion, pandoc, pptx, image extraction
# Docuvert

Docuvert is a command-line tool for converting documents between formats, including PDF, DOCX, PPTX, Markdown, LaTeX, CSV, XLSX, and plain text.

## Installation

### Option 1: Install from PyPI (Recommended)

```bash
pip install docuvert
```

After installation, the `docuvert` command is available on your PATH, provided pip's scripts directory is on it:

```bash
docuvert --version
docuvert input.pdf output.docx
```

### Option 2: Development Setup

1.  **Clone the repository:**

    ```bash
    git clone https://github.com/avanomme/docuvert.git
    cd docuvert
    ```

2.  **Install in development mode:**

    ```bash
    pip install -e .
    ```

    Or use the setup script for local development:

    ```bash
    ./setup.sh
    ```

## Usage

Docuvert infers the input and output formats from the file extensions. The basic syntax is:

```bash
docuvert <input_file_path> <output_file_path>
```
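The converter names listed under Supported Conversions (`pdf2docx`, `md2pptx`, and so on) follow an `<input>2<output>` pattern built from the two file extensions. The sketch below shows how such a name could be derived in the shell. `converter_for` is a hypothetical helper written for illustration, not part of docuvert, and docuvert's real dispatch logic may differ (for example in how it treats upper-case extensions).

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of docuvert): derive an "<input>2<output>"
# converter name from an input path and an output path.
converter_for() {
  # "${path##*.}" keeps only the text after the last dot, i.e. the extension
  local in_ext="${1##*.}" out_ext="${2##*.}"
  # Lower-case both extensions with tr (works in older bash versions too)
  in_ext="$(printf '%s' "$in_ext" | tr '[:upper:]' '[:lower:]')"
  out_ext="$(printf '%s' "$out_ext" | tr '[:upper:]' '[:lower:]')"
  printf '%s2%s\n' "$in_ext" "$out_ext"
}
```

For example, `converter_for report.PDF report.docx` prints `pdf2docx`.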

**Basic Commands:**

```bash
# Convert files
docuvert input.pdf output.docx

# Check version
docuvert --version

# Show detailed info (formats, examples, installation)
docuvert --info

# Show help
docuvert --help
```

**Examples:**

-   **Convert PDF to DOCX:**

    ```bash
    docuvert document.pdf document.docx
    ```

-   **Convert Markdown to PDF:**

    ```bash
    docuvert notes.md notes.pdf
    ```

-   **Convert PowerPoint to Obsidian Markdown (NEW!):**

    ```bash
    docuvert presentation.pptx notes.md
    ```

-   **Convert legacy PowerPoint (`.ppt`) files (converted automatically via LibreOffice):**

    ```bash
    docuvert lecture.ppt lecture.md
    ```

-   **Convert DOCX to Markdown:**

    ```bash
    docuvert report.docx report.md
    ```
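Each docuvert call takes exactly one input path and one output path, as in the examples above, so converting a whole folder only needs a plain shell loop. The sketch below relies on nothing beyond that two-argument syntax. `convert_all` is a hypothetical helper, and it assumes docuvert exits with a non-zero status when a conversion fails, so it can report the failure and move on to the next file.

```bash
#!/usr/bin/env bash
# Hypothetical helper: convert every *.<from> file in <dir> to a sibling *.<to> file.
# Assumes docuvert exits with a non-zero status when a conversion fails.
convert_all() {
  local dir="$1" from="$2" to="$3" src
  for src in "$dir"/*."$from"; do
    [ -e "$src" ] || continue   # the glob matched nothing
    # "${src%.*}" strips the final extension: lectures/week1.pptx -> lectures/week1
    docuvert "$src" "${src%.*}.$to" || echo "failed: $src" >&2
  done
}
```

With this, `convert_all ./lectures pptx md` runs `docuvert` once for each `.pptx` file in `./lectures`, writing a Markdown file with the same base name next to it.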

## Supported Conversions

Docuvert supports 200+ format combinations, routing each conversion to a converter based on the input and output formats. Key converters include (converter names in backticks):

### 🎯 **PowerPoint Conversions (NEW!)**
-   **PPTX/PPT to Obsidian Markdown** (`pptx2md`, `ppt2md`) - **Featured Converter**
    - ✅ Automatic image extraction and embedding
    - ✅ Format preservation (bold, italic, colors)
    - ✅ Obsidian-specific features (YAML frontmatter, internal links, callouts)
    - ✅ Slide navigation with Previous/Next links
    - ✅ Table of contents generation
    - ✅ Legacy .ppt support via LibreOffice conversion
-   PPTX to PDF (`pptx2pdf`)
-   PPTX to HTML (`pptx2html`)
-   PPTX to Plain Text (`pptx2txt`)
-   Markdown to PPTX (`md2pptx`)

### 📄 **Document Conversions**

-   PDF to DOCX (`pdf2docx`)
-   PDF to Markdown (`pdf2md`)
-   PDF to LaTeX (`pdf2tex`)
-   PDF to Plain Text (`pdf2txt`)
-   PDF to CSV (`pdf2csv`)
-   PDF to XLSX (`pdf2xlsx`)
-   DOCX to PDF (`docx2pdf`)
-   DOCX to Markdown (`docx2md`)
-   DOCX to LaTeX (`docx2tex`)
-   DOCX to Plain Text (`docx2txt`)
-   DOCX to CSV (`docx2csv`)
-   DOCX to XLSX (`docx2xlsx`)
-   Markdown to PDF (`md2pdf`)
-   Markdown to DOCX (`md2docx`)
-   Markdown to LaTeX (`md2tex`)
-   Markdown to Plain Text (`md2txt`)
-   Markdown to CSV (`md2csv`)
-   Markdown to XLSX (`md2xlsx`)
-   LaTeX to PDF (`tex2pdf`)
-   LaTeX to DOCX (`tex2docx`)
-   LaTeX to Markdown (`tex2md`)
-   LaTeX to Plain Text (`tex2txt`)
-   LaTeX to CSV (`tex2csv`)
-   LaTeX to XLSX (`tex2xlsx`)
-   Plain Text to PDF (`txt2pdf`)
-   Plain Text to DOCX (`txt2docx`)
-   Plain Text to Markdown (`txt2md`)
-   Plain Text to LaTeX (`txt2tex`)
-   Plain Text to CSV (`txt2csv`)
-   Plain Text to XLSX (`txt2xlsx`)
-   CSV to PDF (`csv2pdf`)
-   CSV to DOCX (`csv2docx`)
-   CSV to Markdown (`csv2md`)
-   CSV to LaTeX (`csv2tex`)
-   CSV to Plain Text (`csv2txt`)
-   CSV to XLSX (`csv2xlsx`)
-   XLSX to PDF (`xlsx2pdf`)
-   XLSX to DOCX (`xlsx2docx`)
-   XLSX to Markdown (`xlsx2md`)
-   XLSX to LaTeX (`xlsx2tex`)
-   XLSX to Plain Text (`xlsx2txt`)
-   XLSX to CSV (`xlsx2csv`)
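Read as a whole, the Document Conversions list is a complete grid: each of the seven formats (PDF, DOCX, Markdown, LaTeX, plain text, CSV, XLSX) has a listed converter to each of the other six, 42 pairs in all. That makes it possible to fan one file out to every other document format in a loop. The sketch below relies only on that list and the two-argument command; `export_all` is a hypothetical helper, and it assumes the input has a lower-case extension.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write <file> in each of the other six formats from the
# "Document Conversions" list. Assumes a lower-case input extension.
export_all() {
  local src="$1" stem="${1%.*}" in_ext="${1##*.}" ext
  for ext in pdf docx md tex txt csv xlsx; do
    [ "$ext" = "$in_ext" ] && continue   # skip the input's own format
    docuvert "$src" "$stem.$ext"
  done
}
```

For example, `export_all data.csv` calls docuvert six times, producing `data.pdf`, `data.docx`, `data.md`, `data.tex`, `data.txt`, and `data.xlsx`.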

## 🔄 Legacy Format Support

Docuvert automatically handles legacy Microsoft Office formats:

### 📝 Legacy Word (.doc) Support
- **Automatic conversion**: `.doc` files are automatically converted to `.docx` format before processing
- **All format combinations supported**: Any conversion available for `.docx` input also accepts `.doc` input
- **Examples:**
  ```bash
  docuvert old-document.doc new-document.pdf
  docuvert report.doc report.md
  docuvert legacy.doc modern.docx
  ```

### 📊 Legacy Excel (.xls) Support
- **Automatic conversion**: `.xls` files are automatically converted to `.xlsx` format before processing
- **All format combinations supported**: Any conversion available for `.xlsx` input also accepts `.xls` input
- **Examples:**
  ```bash
  docuvert old-spreadsheet.xls new-spreadsheet.pdf
  docuvert data.xls data.csv
  docuvert legacy.xls modern.xlsx
  ```

### 📋 Requirements for Legacy Formats
- **LibreOffice**: Recommended for best conversion quality
  - Install: [https://www.libreoffice.org/download/](https://www.libreoffice.org/download/)
  - Supports both `.doc` and `.xls` formats
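- **Pandoc**: Alternative for `.doc` conversion (Pandoc's own readers accept `.docx` but not the legacy binary `.doc` format, so LibreOffice is the more dependable choice for `.doc` input)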
  - Install: [https://pandoc.org/installing.html](https://pandoc.org/installing.html)
- **xlrd**: Python library for `.xls` reading (automatically installed)
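Before converting legacy files, it can help to confirm which of these helpers are actually installed. The sketch below uses only standard shell lookups. `check_legacy_deps` is a hypothetical helper; it assumes LibreOffice's command-line binary is named `soffice` or `libreoffice` (the usual names, although some installs, notably on macOS, do not put it on PATH), and it checks `xlrd` with whichever `python3` is first on PATH, which may not be the interpreter docuvert was installed into.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report which legacy-format helpers are available.
check_legacy_deps() {
  # LibreOffice's CLI is usually "soffice"; some Linux packages also ship "libreoffice"
  if command -v soffice >/dev/null 2>&1 || command -v libreoffice >/dev/null 2>&1; then
    echo "LibreOffice: found"
  else
    echo "LibreOffice: missing"
  fi
  if command -v pandoc >/dev/null 2>&1; then echo "Pandoc: found"; else echo "Pandoc: missing"; fi
  # Checks the first python3 on PATH, not necessarily docuvert's own interpreter
  if python3 -c 'import xlrd' >/dev/null 2>&1; then echo "xlrd: found"; else echo "xlrd: missing"; fi
}
```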

### 🔧 Conversion Process
1. Docuvert detects legacy format (`.doc` or `.xls`)
2. Creates temporary modern format file (`.docx` or `.xlsx`)
3. Converts the temporary file with the existing converter for that format
4. Cleans up temporary files automatically
5. Returns final converted output

No additional configuration is needed: pass legacy files exactly as you would their modern equivalents.
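Docuvert performs these steps for you. If you want to inspect the intermediate `.docx` or `.xlsx` file, the same flow can be reproduced by hand, as in the sketch below. It assumes LibreOffice's `soffice` binary is on PATH and uses its `--headless --convert-to <ext> --outdir <dir>` options; docuvert's internal implementation may use a different tool (the requirements above also mention Pandoc and xlrd). `legacy_convert` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: manual version of the five steps above, assuming "soffice" is on PATH.
legacy_convert() {
  local input="$1" output="$2" modern tmpdir base rc
  # Step 1: detect the legacy format from the extension
  case "${input##*.}" in
    doc|DOC) modern=docx ;;
    xls|XLS) modern=xlsx ;;
    *) docuvert "$input" "$output"; return ;;   # not legacy: convert directly
  esac
  # Step 2: create the temporary modern-format file in a scratch directory
  tmpdir="$(mktemp -d)" || return 1
  soffice --headless --convert-to "$modern" --outdir "$tmpdir" "$input" >/dev/null \
    || { rm -rf "$tmpdir"; return 1; }
  base="$(basename "${input%.*}")"   # LibreOffice keeps the base name: report.doc -> report.docx
  # Step 3: run the regular converter on the modern file
  docuvert "$tmpdir/$base.$modern" "$output"
  rc=$?
  # Step 4: clean up the temporary files
  rm -rf "$tmpdir"
  # Step 5: pass back the status of the final conversion
  return "$rc"
}
```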

## Contributing

See `instructions.md` for details on project organization and how to add new converters.