# pdftotext
[![PyPI Status](https://img.shields.io/pypi/v/pdftotext.svg)](https://pypi.python.org/pypi/pdftotext)
[![Azure Status](https://dev.azure.com/jalanpalmer/jalanpalmer/_apis/build/status/jalan.pdftotext?branchName=master)](https://dev.azure.com/jalanpalmer/jalanpalmer/_build/latest?definitionId=1&branchName=master)
[![AppVeyor status](https://ci.appveyor.com/api/projects/status/uwcjxgu31kirkiuj/branch/master?svg=true)](https://ci.appveyor.com/project/jalan/pdftotext/branch/master)
[![Coverage Status](https://coveralls.io/repos/github/jalan/pdftotext/badge.svg?branch=master)](https://coveralls.io/github/jalan/pdftotext?branch=master)
[![Downloads](https://img.shields.io/pypi/dm/pdftotext.svg)](https://pypistats.org/packages/pdftotext)
Simple PDF text extraction
```python
import pdftotext

# Load your PDF
with open("lorem_ipsum.pdf", "rb") as f:
    pdf = pdftotext.PDF(f)

# If it's password-protected
with open("secure.pdf", "rb") as f:
    pdf = pdftotext.PDF(f, "secret")

# How many pages?
print(len(pdf))

# Iterate over all the pages
for page in pdf:
    print(page)

# Read some individual pages
print(pdf[0])
print(pdf[1])

# Read all the text into one string
print("\n\n".join(pdf))
```
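Because iterating over a `PDF` object yields one string per page, ordinary Python helpers can work on it directly. The sketch below is only an illustration: `pages_containing` is a hypothetical helper, not part of pdftotext, and a plain list of strings stands in for a loaded PDF so the snippet runs without a file on disk.

```python
def pages_containing(pages, term):
    """Return 1-based page numbers whose text contains term, ignoring case.

    `pages` can be any iterable of strings. This helper is hypothetical
    and is not provided by pdftotext itself.
    """
    needle = term.casefold()
    return [
        number
        for number, text in enumerate(pages, start=1)
        if needle in text.casefold()
    ]


# A list of strings stands in for a pdftotext.PDF object here
sample_pages = ["Lorem ipsum dolor", "sit amet", "LOREM again"]
print(pages_containing(sample_pages, "lorem"))
```

With a real document, the `pdf` object from the example above would be passed in place of `sample_pages`.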
## OS Dependencies
These instructions assume you're using Python 3 on a recent OS. Package names
may differ for Python 2 or for an older OS.
### Debian, Ubuntu, and friends
```
sudo apt install build-essential libpoppler-cpp-dev pkg-config python3-dev
```
### Fedora, Red Hat, and friends
```
sudo yum install gcc-c++ pkgconfig poppler-cpp-devel python3-devel
```
### macOS
```
brew install pkg-config poppler python
```
### Windows
Currently tested only when using conda:
- Install the Microsoft Visual C++ Build Tools
- Install poppler through conda:
  ```
  conda install -c conda-forge poppler
  ```
## Install
```
pip install pdftotext
```
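To confirm that the install succeeded, one option is to ask Python whether it can locate the module. This is a minimal sketch using only the standard library; `is_installed` is a hypothetical helper name, not something pdftotext ships.

```python
import importlib.util


def is_installed(module_name="pdftotext"):
    """Return True if Python can locate the named top-level module."""
    return importlib.util.find_spec(module_name) is not None


if is_installed():
    print("pdftotext is available")
else:
    print("pdftotext was not found; check the OS dependencies and rerun pip")
```

A missing module usually means the build step failed, in which case the pip output should point to the missing poppler or compiler package.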
Raw data
{
"_id": null,
"home_page": "https://github.com/jalan/pdftotext",
"name": "pdftotext",
"maintainer": "",
"docs_url": null,
"requires_python": "",
"maintainer_email": "",
"keywords": "",
"author": "Jason Alan Palmer",
"author_email": "jalanpalmer@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/e0/e3/79a2ad7ca71160fb6442772155389881672c98bd44c6022303ce242cbfb9/pdftotext-2.2.2.tar.gz",
"platform": "",
"bugtrack_url": null,
"license": "MIT",
"summary": "Simple PDF text extraction",
"version": "2.2.2",
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"md5": "8814a3bdc5c9ad6bc6c3189914b597af",
"sha256": "2a9aa89bc62022408781b39d188fabf5a3ad1103b6630f32c4e27e395f7966ee"
},
"downloads": -1,
"filename": "pdftotext-2.2.2.tar.gz",
"has_sig": false,
"md5_digest": "8814a3bdc5c9ad6bc6c3189914b597af",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 113899,
"upload_time": "2021-11-23T19:29:39",
"upload_time_iso_8601": "2021-11-23T19:29:39.358389Z",
"url": "https://files.pythonhosted.org/packages/e0/e3/79a2ad7ca71160fb6442772155389881672c98bd44c6022303ce242cbfb9/pdftotext-2.2.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2021-11-23 19:29:39",
"github": true,
"gitlab": false,
"bitbucket": false,
"github_user": "jalan",
"github_project": "pdftotext",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "pdftotext"
}