wisup_e2m

- **Name**: wisup_e2m
- **Version**: 0.1.61
- **Home page**: https://github.com/wisupai/e2m
- **Summary**: Everything to Markdown.
- **Upload time**: 2024-08-30 04:25:13
- **Author**: Wisup Team
- **Requires Python**: <3.13,>=3.10
- **License**: MIT

<p align="center">
  <img src="https://github.com/wisupai/e2m/blob/main/docs/images/wisup_e2m_banner.jpg?raw=true" width="800px" alt="wisup_e2m Logo">
</p>

<p align="center">
    <a href="https://github.com/wisupai/e2m/blob/main/LICENSE">
        <img src="https://img.shields.io/badge/license-MIT-green" alt="License">
    </a>
    <a href="https://github.com/wisupai/e2m">
        <img src="https://img.shields.io/badge/e2m-repo-blue" alt="E2M Repo">
    </a>
    <a href="https://github.com/Jing-yilin/E2M/tags/0.1.61">
        <img src="https://img.shields.io/badge/version-0.1.61-blue" alt="E2M Version">
    </a>
    <a href="https://www.python.org/downloads/">
        <img src="https://img.shields.io/badge/python-3.10%20%7C%203.11-blue" alt="Python Version">
    </a>
    <a href="https://pypi.org/project/wisup_e2m/">
        <img src="https://img.shields.io/badge/pypi-wisup__e2m-blue" alt="PyPI">
    </a>
    <a href="https://github.com/wisupai/e2m/blob/main/README-zh.md">
        <img src="https://img.shields.io/badge/docs-中文文档-red" alt="中文文档">
    </a>
</p>

# 🚀 E2M: Everything to Markdown

**Everything to Markdown**

E2M is a Python library that parses and converts various file types into Markdown. Built on a parser-converter architecture, it supports doc, docx, epub, html, htm, url, pdf, ppt, pptx, mp3, and m4a inputs.

✨ The ultimate goal of the E2M project is to provide high-quality data for Retrieval-Augmented Generation (RAG) and for model training or fine-tuning.

**Core Architecture of the Project:**

- **Parser**: Responsible for parsing various file types into text or image data.
- **Converter**: Responsible for converting text or image data into Markdown format.

Generally, for any type of file, the parser is run first to extract internal data such as text and images. Then, the converter is used to transform this data into Markdown format.


<p align="center">
  <img src="https://github.com/wisupai/e2m/blob/main/docs/images/e2m_pipeline.jpg?raw=true" width="400px" alt="E2M pipeline diagram">
</p>
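
The two-stage flow described above can be sketched with plain Python protocols. This is only an illustration, not E2M's actual implementation: `ParsedData`, `Parser`, `Converter`, `StubParser`, `StubConverter`, and `to_markdown` are hypothetical names. The only assumptions borrowed from the quick-start examples in this README are that a parser exposes `parse(...)` returning an object with a `.text` attribute and that a converter exposes `convert(...)`.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class ParsedData:
    """Hypothetical container for what a parser extracts."""
    text: str
    images: list[str] = field(default_factory=list)


class Parser(Protocol):
    def parse(self, source: str) -> ParsedData: ...


class Converter(Protocol):
    def convert(self, text: str) -> str: ...


class StubParser:
    """Stand-in parser: treats the source string itself as the file's text."""
    def parse(self, source: str) -> ParsedData:
        return ParsedData(text=source.strip())


class StubConverter:
    """Stand-in converter: turns the first line into a Markdown heading."""
    def convert(self, text: str) -> str:
        first, _, rest = text.partition("\n")
        return f"# {first}\n\n{rest}".rstrip() + "\n"


def to_markdown(parser: Parser, converter: Converter, source: str) -> str:
    """Stage 1: parse into raw data. Stage 2: convert that data to Markdown."""
    data = parser.parse(source)
    return converter.convert(data.text)


print(to_markdown(StubParser(), StubConverter(), "Title\nBody text."))
```

Because `to_markdown` depends only on the two method names, any object with a compatible `parse` or `convert` method could be passed in place of the stubs.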

## 📹 Video Introduction

<div align="center">
  <a href="https://www.bilibili.com/video/BV1HvWeenEYQ">
    <img src="./docs/images/video_banner.png" alt="Watch the video" width="400px">
  </a>
</div>

## 📂 All Converters and Parsers

<table>
  <thead>
    <tr>
      <th colspan="3" style="text-align:center;">Parser</th>
    </tr>
    <tr>
      <th>Parser Type</th>
      <th>Engine</th>
      <th>Supported File Type</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>PdfParser</td>
      <td>surya_layout, marker, unstructured</td>
      <td>pdf</td>
    </tr>
    <tr>
      <td>DocParser</td>
      <td>pandoc, xml</td>
      <td>doc</td>
    </tr>
    <tr>
      <td>DocxParser</td>
      <td>pandoc, xml</td>
      <td>docx</td>
    </tr>
    <tr>
      <td>PptParser</td>
      <td>unstructured</td>
      <td>ppt</td>
    </tr>
    <tr>
      <td>PptxParser</td>
      <td>unstructured</td>
      <td>pptx</td>
    </tr>
    <tr>
      <td>UrlParser</td>
      <td>unstructured, jina, firecrawl</td>
      <td>url</td>
    </tr>
    <tr>
      <td>EpubParser</td>
      <td>unstructured</td>
      <td>epub</td>
    </tr>
    <tr>
      <td>HtmlParser</td>
      <td>unstructured</td>
      <td>html, htm</td>
    </tr>
    <tr>
      <td>VoiceParser</td>
      <td>openai_whisper_api, openai_whisper_local, SpeechRecognition</td>
      <td>mp3, m4a</td>
    </tr>
  </tbody>
</table>


<table>
  <thead>
    <tr>
      <th colspan="3" style="text-align:center;">Converter</th>
    </tr>
    <tr>
      <th>Converter Type</th>
      <th>Engine</th>
      <th>Strategy</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>ImageConverter</td>
      <td>litellm, zhipuai (weak at image recognition; not recommended)</td>
      <td>default</td>
    </tr>
    <tr>
      <td>TextConverter</td>
      <td>litellm, zhipuai</td>
      <td>default</td>
    </tr>
  </tbody>
</table>
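
To pick a parser programmatically, the Parser table above can be transcribed into a lookup. The sketch below is a hypothetical helper, not part of `wisup_e2m`: `PARSER_BY_EXTENSION` and `parser_name_for` are illustrative names, and treating anything that starts with `http://` or `https://` as a URL is an assumption.

```python
from pathlib import Path

# Extension -> parser class name, transcribed from the Parser table above.
PARSER_BY_EXTENSION = {
    ".pdf": "PdfParser",
    ".doc": "DocParser",
    ".docx": "DocxParser",
    ".ppt": "PptParser",
    ".pptx": "PptxParser",
    ".epub": "EpubParser",
    ".html": "HtmlParser",
    ".htm": "HtmlParser",
    ".mp3": "VoiceParser",
    ".m4a": "VoiceParser",
}


def parser_name_for(source: str) -> str:
    """Return the parser class name the table pairs with this input."""
    # Assumption: web addresses are recognized by their scheme prefix.
    if source.startswith(("http://", "https://")):
        return "UrlParser"
    suffix = Path(source).suffix.lower()
    try:
        return PARSER_BY_EXTENSION[suffix]
    except KeyError:
        raise ValueError(f"no parser listed for {suffix or source!r}") from None


print(parser_name_for("./slides.pptx"))
print(parser_name_for("https://www.example.com"))
```

The helper returns class names as strings so that it runs without the library installed; mapping them to the imported classes is left to the caller.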

## 📦 Installation

Create Environment:
```bash
conda create -n e2m python=3.10
conda activate e2m
```

Update pip:
```bash
pip install --upgrade pip
```

Install E2M using pip:

```bash
# Option 1: Install via git, most recommended
pip install git+https://github.com/wisupai/e2m.git --index-url https://pypi.org/simple
# Option 2: Install via pip
pip install wisup_e2m
# Option 3: Manual installation
git clone https://github.com/wisupai/e2m.git
cd e2m
pip install poetry
poetry build
pip install dist/wisup_e2m-0.1.61-py3-none-any.whl
```
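
The package metadata declares `requires_python` as `<3.13,>=3.10`, so an unsupported interpreter will cause the install to fail. A minimal pre-flight check might look like the following; `is_supported` and the two constants are hypothetical names used only for this sketch.

```python
import sys

MIN_VERSION = (3, 10)  # inclusive lower bound from the package metadata
MAX_VERSION = (3, 13)  # exclusive upper bound from the package metadata


def is_supported(version: tuple[int, int] = sys.version_info[:2]) -> bool:
    """True if `version` satisfies >=3.10,<3.13."""
    return MIN_VERSION <= version < MAX_VERSION


if not is_supported():
    print(
        f"Python {sys.version_info.major}.{sys.version_info.minor} "
        "is outside the range declared by wisup_e2m 0.1.61"
    )
```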

## ⚡️ Parser Quick Start

Here are simple examples demonstrating how to use E2M Parsers:

### 📄 Pdf Parser

```python
from wisup_e2m import PdfParser

pdf_path = "./test.pdf"
parser = PdfParser(engine="marker") # pdf engines: marker, unstructured, surya_layout
pdf_data = parser.parse(pdf_path)
print(pdf_data.text)
```

### 📝 Doc Parser

```python
from wisup_e2m import DocParser

doc_path = "./test.doc"
parser = DocParser(engine="pandoc") # doc engines: pandoc, xml
doc_data = parser.parse(doc_path)
print(doc_data.text)
```

### 📜 Docx Parser

```python
from wisup_e2m import DocxParser

docx_path = "./test.docx"
parser = DocxParser(engine="pandoc") # docx engines: pandoc, xml
docx_data = parser.parse(docx_path)
print(docx_data.text)
```

### 📚 Epub Parser

```python
from wisup_e2m import EpubParser

epub_path = "./test.epub"
parser = EpubParser(engine="unstructured") # epub engines: unstructured
epub_data = parser.parse(epub_path)
print(epub_data.text)
```

### 🌐 Html Parser

```python
from wisup_e2m import HtmlParser

html_path = "./test.html"
parser = HtmlParser(engine="unstructured") # html engines: unstructured
html_data = parser.parse(html_path)
print(html_data.text)
```

### 🔗 Url Parser

```python
from wisup_e2m import UrlParser

url = "https://www.example.com"
parser = UrlParser(engine="jina") # url engines: jina, firecrawl, unstructured
url_data = parser.parse(url)
print(url_data.text)
```

### 🖼️ Ppt Parser

```python
from wisup_e2m import PptParser

ppt_path = "./test.ppt"
parser = PptParser(engine="unstructured") # ppt engines: unstructured
ppt_data = parser.parse(ppt_path)
print(ppt_data.text)
```

### 🖼️ Pptx Parser

```python
from wisup_e2m import PptxParser

pptx_path = "./test.pptx"
parser = PptxParser(engine="unstructured") # pptx engines: unstructured
pptx_data = parser.parse(pptx_path)
print(pptx_data.text)
```

### 🎤 Voice Parser

```python
from wisup_e2m import VoiceParser

voice_path = "./test.mp3"
parser = VoiceParser(
  engine="openai_whisper_local", # voice engines: openai_whisper_api, openai_whisper_local
  model="large" # available models: https://github.com/openai/whisper#available-models-and-languages
  )

voice_data = parser.parse(voice_path)
print(voice_data.text)
```
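
All of the parser examples above follow the same shape: build a parser, call `parse(...)`, read `.text`. Assuming that shape holds, a small helper can run one parser over several inputs and keep going when a file fails. `parse_many` is a hypothetical helper written for this README, not a function provided by `wisup_e2m`.

```python
from typing import Any


def parse_many(
    parser: Any, paths: list[str]
) -> tuple[dict[str, str], dict[str, Exception]]:
    """Parse each path with one parser; collect text and failures separately."""
    texts: dict[str, str] = {}
    errors: dict[str, Exception] = {}
    for path in paths:
        try:
            # Assumes parse() returns an object with a .text attribute,
            # as in the quick-start examples.
            texts[path] = parser.parse(path).text
        except Exception as exc:  # one bad file should not stop the batch
            errors[path] = exc
    return texts, errors


# Example usage (requires wisup_e2m to be installed):
# from wisup_e2m import PdfParser
# texts, errors = parse_many(PdfParser(engine="marker"), ["a.pdf", "b.pdf"])
```

Returning failures alongside results, rather than raising on the first error, makes it easier to retry only the problem files with a different engine.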

## 🔄 Converter Quick Start

Here are simple examples demonstrating how to use E2M Converters:

### 📝 Text Converter

```python
from wisup_e2m import TextConverter

text = "Parsed text data from any parser"
converter = TextConverter(
  engine="litellm", # text engines: litellm
  model="deepseek/deepseek-chat",
  api_key="your api key",
  base_url="your base url"
  )
text_data = converter.convert(text)
print(text_data)
```

### 🖼️ Image Converter

```python
from wisup_e2m import ImageConverter

images = ["./test1.png", "./test2.png"]
converter = ImageConverter(
  engine="litellm", # image engines: litellm
  model="gpt-4o",
  api_key="your api key",
  base_url="your base url"
  )
image_data = converter.convert(images)
print(image_data)
```

## 🆙 Next Level

### 🛠️ E2MParser

`E2MParser` is an integrated parser that supports multiple file types. It can be used to parse a wide range of file types into Markdown format.

```python
from wisup_e2m import E2MParser

# Initialize the parser with your configuration file
ep = E2MParser.from_config("config.yaml")

# Parse the desired file
data = ep.parse(file_name="/path/to/file.pdf")

# Print the parsed data as a dictionary
print(data.to_dict())
```

### 🛠️ E2MConverter

`E2MConverter` is an integrated converter that supports text and image conversion. It can be used to convert text and images into Markdown format.

```python
from wisup_e2m import E2MConverter

ec = E2MConverter.from_config("./config.yaml")

text = "Parsed text data from any parser"

ec.convert(text=text)

images = ["test.jpg", "test.png"]
ec.convert(images=images)
```

You can use a `config.yaml` file to specify the parsers and converters you want to use. Here is an example of a `config.yaml` file:


```yaml
parsers:
    doc_parser:
        engine: "pandoc"
        langs: ["en", "zh"]
    docx_parser:
        engine: "pandoc"
        langs: ["en", "zh"]
    epub_parser:
        engine: "unstructured"
        langs: ["en", "zh"]
    html_parser:
        engine: "unstructured"
        langs: ["en", "zh"]
    url_parser:
        engine: "jina"
        langs: ["en", "zh"]
    pdf_parser:
        engine: "marker"
        langs: ["en", "zh"]
    pptx_parser:
        engine: "unstructured"
        langs: ["en", "zh"]
    voice_parser:
        # option 1: use openai whisper api
        # engine: "openai_whisper_api"
        # api_base: "https://api.openai.com/v1"
        # api_key: "your_api_key"
        # model: "whisper"

        # option 2: use local whisper model
        engine: "openai_whisper_local"
        model: "large" # available models: https://github.com/openai/whisper#available-models-and-languages

converters:
    text_converter:
        engine: "litellm"
        model: "deepseek/deepseek-chat"
        api_key: "your_api_key"
        # base_url: ""
    image_converter:
        engine: "litellm"
        model: "gpt-4o-mini"
        api_key: "your_api_key"
        # base_url: ""
```
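
Before handing a configuration to E2M, it can help to check each `engine` value against the engines listed in the tables earlier in this README. The sketch below is a hypothetical validator, not part of `wisup_e2m`. It assumes the YAML has already been loaded into a plain dictionary (for example with PyYAML's `yaml.safe_load`), and it only knows the section names that appear in the example config above, so an unfamiliar section is reported rather than rejected outright.

```python
# Engines per config section, transcribed from the tables in this README.
SUPPORTED_ENGINES = {
    "pdf_parser": {"surya_layout", "marker", "unstructured"},
    "doc_parser": {"pandoc", "xml"},
    "docx_parser": {"pandoc", "xml"},
    "pptx_parser": {"unstructured"},
    "url_parser": {"unstructured", "jina", "firecrawl"},
    "epub_parser": {"unstructured"},
    "html_parser": {"unstructured"},
    "voice_parser": {"openai_whisper_api", "openai_whisper_local", "SpeechRecognition"},
    "text_converter": {"litellm", "zhipuai"},
    "image_converter": {"litellm", "zhipuai"},
}


def check_config(config: dict) -> list[str]:
    """Return human-readable problems found in a parsed config mapping."""
    problems = []
    for group in ("parsers", "converters"):
        for name, options in (config.get(group) or {}).items():
            engine = (options or {}).get("engine")
            allowed = SUPPORTED_ENGINES.get(name)
            if allowed is None:
                problems.append(f"{group}.{name}: section not in the engine table")
            elif engine not in allowed:
                problems.append(
                    f"{group}.{name}: engine {engine!r} not in {sorted(allowed)}"
                )
    return problems


example = {
    "parsers": {"pdf_parser": {"engine": "marker"}, "url_parser": {"engine": "curl"}},
    "converters": {"text_converter": {"engine": "litellm"}},
}
print(check_config(example))
```

An empty list means every section named an engine from the tables; it does not guarantee that the engine's own dependencies or API keys are set up.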

## ❓ Q&A

[FAQ Document](./docs/faq/FAQ-en.md)

## 📜 License

This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.

## 📧 Contact

You can scan the QR code below to join our WeChat group:

<p align="center">
  <img src="docs/images/wechat_QR.png" width="200px" alt="WeChat group QR code">
</p>

For any questions or inquiries, please open an issue on [GitHub](https://github.com/wisupai/e2m) or contact us at [team@wisup.ai](mailto:team@wisup.ai).

Contact for business cooperation: [team@wisup.ai](mailto:team@wisup.ai)

## 💼 Join Us

<p align="center">
  <img src="./docs/images/wisup_logo.png" width="400px" alt="Wisup logo">
</p>

- Wisup is an AI startup with a strong focus on data and algorithms. We specialize in providing high-quality data and algorithm services for enterprises. We embrace a remote working model and welcome talented individuals from around the world to join us.

- Our philosophy: From information to data, from data to knowledge, from knowledge to value.

- Our vision: To make the world a better place through data.

- We are looking for: Like-minded Co-Founders
  - No restrictions on education, age, location, race, or gender
  - Keen interest in AI and familiarity with AI and related vertical industries
  - Passionate about AI and data, with a strong sense of purpose
  - Possess unique strengths, responsibility, and a team-oriented mindset

- To apply, send your resume to: [team@wisup.ai](mailto:team@wisup.ai)

- You also need to answer three questions in your email:
  - What makes you irreplaceable?
  - What is the most challenging situation you have faced, and how did you resolve it?
  - How do you view the future development of AI?

## 🌟 Contributing

<a href="https://github.com/wisupai/e2m/graphs/contributors">
  <img src="https://contrib.rocks/image?repo=wisupai/e2m" />
</a>

            
