exparso


Name: exparso
Version: 0.0.3
Home page: None
Summary: Analyzing and parsing documents
Upload time: 2025-07-22 09:43:16
Maintainer: None
Docs URL: None
Author: None
Requires Python: >=3.10
License: None
Keywords: langchain, openai, openpyxl, pdf, pdfplumber, pillow
Requirements: No requirements were recorded.
# 📑 Exparso

![python](https://img.shields.io/badge/python-%20%203.10%20|%203.11%20|%203.12-blue)

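This library parses documents that contain images.
By emitting their content as text, it aims to make such documents usable with conventional vector search and full-text search.
[](<For more details, see [the documentation](https://congenial-waddle-5krzvq6.pages.github.io/).>)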

## 📥 Installation

### LibreOffice

Install LibreOffice, which is used to convert Office files to text.

```bash
# Ubuntu
sudo apt install libreoffice

# Mac
brew install --cask libreoffice
```
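
After installing, it can be useful to confirm that a LibreOffice executable is reachable. The following is a minimal sketch, assuming the executable is exposed on `PATH` as `soffice` or `libreoffice`; these names, and the helper `find_libreoffice`, are illustrative assumptions, and this README does not state how exparso itself locates LibreOffice.

```python
import shutil
from typing import Optional, Sequence


def find_libreoffice(
    candidates: Sequence[str] = ("soffice", "libreoffice"),
) -> Optional[str]:
    """Return the path of the first candidate executable found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


if __name__ == "__main__":
    location = find_libreoffice()
    print(location or "LibreOffice executable not found on PATH")
```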

### Installing the library

```bash
pip install exparso
```
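
To check which version ended up in the current environment, the standard library's `importlib.metadata` can be queried. This is a small sketch; the helper name `installed_version` is hypothetical and not part of exparso.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print(installed_version("exparso"))
```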

## 💡 Usage

Use the `parse_document` function to parse a document.

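```python
from exparso import parse_document
from langchain_openai import AzureChatOpenAI

# Azure OpenAI credentials and endpoint are expected to be configured
# separately (for example via environment variables).
llm_model = AzureChatOpenAI(model="gpt-4o")

# Parse the document at the given path using the chat model.
text = parse_document(path="path/to/document.pdf", model=llm_model)
```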
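
When several files need to be processed, one failing document should not necessarily abort the whole run. The sketch below shows one way to wrap a single-file parser in a loop that collects results and errors separately. `parse_many` is a hypothetical helper, not part of exparso, and it makes no assumption about the type `parse_document` returns.

```python
from typing import Any, Callable, Dict, Iterable, Tuple


def parse_many(
    paths: Iterable[str],
    parse: Callable[[str], Any],
) -> Tuple[Dict[str, Any], Dict[str, Exception]]:
    """Apply `parse` to each path; return (successful results, failures)."""
    parsed: Dict[str, Any] = {}
    failed: Dict[str, Exception] = {}
    for path in paths:
        try:
            parsed[path] = parse(path)
        except Exception as exc:  # keep going; record the error for later review
            failed[path] = exc
    return parsed, failed


# Hypothetical wiring with exparso (requires the library and a configured model):
# from exparso import parse_document
# parsed, failed = parse_many(
#     ["a.pdf", "b.pptx"],
#     lambda p: parse_document(path=p, model=llm_model),
# )
```

Passing the parser in as a callable keeps the loop independent of any particular model configuration.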

## 📑 Supported files

| Content type        | File format          |
| ------------------- | -------------------- |
| **📑 Documents**    | PDF, PowerPoint      |
| **🖼️ Images**       | JPEG, PNG, BMP       |
| **📝 Text data**    | Plain text, Markdown |
| **📊 Tabular data** | Excel, CSV           |
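
A caller may want to filter files before handing them to the parser. The following sketch maps file suffixes to the content types in the table above. The table names formats rather than extensions, so the specific suffixes (`.pptx`, `.xlsx`, `.txt`, and so on) are assumptions, as is the helper name `content_type_for`; exparso may accept more or fewer suffixes than listed here.

```python
from pathlib import Path
from typing import Optional

# Assumed suffixes for the formats named in the table; not taken from exparso itself.
SUFFIX_TO_CONTENT_TYPE = {
    ".pdf": "document",
    ".pptx": "document",
    ".jpg": "image",
    ".jpeg": "image",
    ".png": "image",
    ".bmp": "image",
    ".txt": "text",
    ".md": "text",
    ".xlsx": "table",
    ".csv": "table",
}


def content_type_for(path: str) -> Optional[str]:
    """Return the assumed content type for a path, or None if the suffix is unknown."""
    return SUFFIX_TO_CONTENT_TYPE.get(Path(path).suffix.lower())
```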

## 🔥 LLM

| Cloud vendor | Model                                                                                                               |
| ------------ | ------------------------------------------------------------------------------------------------------------------- |
| Azure        | ChatGPT(`gpt-4o`, `gpt-4o-mini`)                                                                                    |
| Google Cloud | Claude(`claude-3.7-sonnet`,`claude-3.5-sonnet`), Gemini(`gemini-2.0-flash`,`gemini-1.5-flash-*`,`gemini-2.0-pro-*`) |
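
The model names in the table can be checked programmatically before a model is constructed. This sketch treats the trailing `*` in entries such as `gemini-1.5-flash-*` as a shell-style wildcard, which is an interpretation rather than something the README states. `SUPPORTED_MODELS` and `vendor_for` are hypothetical names, not part of exparso.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Model name patterns copied from the table; "*" is assumed to be a wildcard.
SUPPORTED_MODELS = {
    "Azure": ["gpt-4o", "gpt-4o-mini"],
    "Google Cloud": [
        "claude-3.7-sonnet",
        "claude-3.5-sonnet",
        "gemini-2.0-flash",
        "gemini-1.5-flash-*",
        "gemini-2.0-pro-*",
    ],
}


def vendor_for(model_name: str) -> Optional[str]:
    """Return the cloud vendor whose patterns match the model name, or None."""
    for vendor, patterns in SUPPORTED_MODELS.items():
        if any(fnmatchcase(model_name, pattern) for pattern in patterns):
            return vendor
    return None
```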

            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "exparso",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.10",
    "maintainer_email": null,
    "keywords": "langchain, openai, openpyxl, pdf, pdfplumber, pillow",
    "author": null,
    "author_email": null,
    "download_url": "https://files.pythonhosted.org/packages/15/68/84a5bb8c3f544f1494436f22e5d98d826cd6a22e845d3c968d2459666b6b/exparso-0.0.3.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "Analyzing and parsing documents",
    "version": "0.0.3",
    "project_urls": null,
    "split_keywords": [
        "langchain",
        " openai",
        " openpyxl",
        " pdf",
        " pdfplumber",
        " pillow"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "7a1eed7a00a5593bff7a7667d3ca34c058d25fb7a20d349471ca78a3c0525630",
                "md5": "d641d4e9bb671cd629b5826601cdeb72",
                "sha256": "39778da11956f407de52479e3e561aa4aaff7b06bed05989c218b6df5bca2f24"
            },
            "downloads": -1,
            "filename": "exparso-0.0.3-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "d641d4e9bb671cd629b5826601cdeb72",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.10",
            "size": 24276,
            "upload_time": "2025-07-22T09:43:15",
            "upload_time_iso_8601": "2025-07-22T09:43:15.082960Z",
            "url": "https://files.pythonhosted.org/packages/7a/1e/ed7a00a5593bff7a7667d3ca34c058d25fb7a20d349471ca78a3c0525630/exparso-0.0.3-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "156884a5bb8c3f544f1494436f22e5d98d826cd6a22e845d3c968d2459666b6b",
                "md5": "e18b5982e8609de707b10e37042493e9",
                "sha256": "9a779e8af7d5afe82e822521938b5801fbfd47001cd6cea9c56efd8e8020464a"
            },
            "downloads": -1,
            "filename": "exparso-0.0.3.tar.gz",
            "has_sig": false,
            "md5_digest": "e18b5982e8609de707b10e37042493e9",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.10",
            "size": 16118,
            "upload_time": "2025-07-22T09:43:16",
            "upload_time_iso_8601": "2025-07-22T09:43:16.142474Z",
            "url": "https://files.pythonhosted.org/packages/15/68/84a5bb8c3f544f1494436f22e5d98d826cd6a22e845d3c968d2459666b6b/exparso-0.0.3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-07-22 09:43:16",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "exparso"
}
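
The `sha256` digests in the raw data can be used to check a manually downloaded file. The sketch below hashes a local file in chunks and compares the result with the wheel's digest listed above. The helper names `sha256_of` and `verify_download` and the local file path are illustrative assumptions.

```python
import hashlib

# sha256 digest of exparso-0.0.3-py3-none-any.whl, as listed in the raw data above.
EXPECTED_WHEEL_SHA256 = (
    "39778da11956f407de52479e3e561aa4aaff7b06bed05989c218b6df5bca2f24"
)


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex sha256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected_sha256: str) -> bool:
    """Return True if the file's sha256 digest equals the expected hex digest."""
    return sha256_of(path) == expected_sha256.lower()


# Hypothetical usage, assuming the wheel was saved to the current directory:
# verify_download("exparso-0.0.3-py3-none-any.whl", EXPECTED_WHEEL_SHA256)
```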
        