pdftotext

Name	pdftotext
Version	2.2.2
home_page	https://github.com/jalan/pdftotext
Summary	Simple PDF text extraction
upload_time	2021-11-23 19:29:39
author	Jason Alan Palmer
license	MIT

# pdftotext

[![PyPI Status](https://img.shields.io/pypi/v/pdftotext.svg)](https://pypi.python.org/pypi/pdftotext)
[![Azure Status](https://dev.azure.com/jalanpalmer/jalanpalmer/_apis/build/status/jalan.pdftotext?branchName=master)](https://dev.azure.com/jalanpalmer/jalanpalmer/_build/latest?definitionId=1&branchName=master)
[![AppVeyor status](https://ci.appveyor.com/api/projects/status/uwcjxgu31kirkiuj/branch/master?svg=true)](https://ci.appveyor.com/project/jalan/pdftotext/branch/master)
[![Coverage Status](https://coveralls.io/repos/github/jalan/pdftotext/badge.svg?branch=master)](https://coveralls.io/github/jalan/pdftotext?branch=master)
[![Downloads](https://img.shields.io/pypi/dm/pdftotext.svg)](https://pypistats.org/packages/pdftotext)

Simple PDF text extraction

```python
import pdftotext

# Load your PDF
with open("lorem_ipsum.pdf", "rb") as f:
    pdf = pdftotext.PDF(f)

# If it's password-protected
with open("secure.pdf", "rb") as f:
    pdf = pdftotext.PDF(f, "secret")

# How many pages?
print(len(pdf))

# Iterate over all the pages
for page in pdf:
    print(page)

# Read some individual pages
print(pdf[0])
print(pdf[1])

# Read all the text into one string
print("\n\n".join(pdf))
```
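
Because a `PDF` object supports `len()`, iteration, and indexing, as the example above shows, ordinary Python helpers that expect a sequence of strings should work on it unchanged. The following is a minimal sketch, not part of the library: `pages_containing` is a hypothetical helper name, and a plain list of strings stands in for a loaded PDF so that the snippet runs without a file.

```python
def pages_containing(pages, needle):
    """Return 1-based numbers of the pages whose text contains needle.

    `pages` is any iterable of page strings, such as a pdftotext.PDF object.
    This helper is illustrative and is not part of pdftotext.
    """
    needle = needle.lower()
    return [
        number
        for number, text in enumerate(pages, start=1)
        if needle in text.lower()
    ]


# A list of strings stands in for a loaded PDF here.
sample_pages = ["Lorem ipsum dolor", "sit amet", "Ipsum again"]
print(pages_containing(sample_pages, "ipsum"))  # [1, 3]
```

With a real document, pass the `pdf` object from the example above instead of the list, for instance `pages_containing(pdf, "ipsum")`.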


## OS Dependencies

These instructions assume you're using Python 3 on a recent OS. Package names
may differ for Python 2 or for an older OS.

### Debian, Ubuntu, and friends

```
sudo apt install build-essential libpoppler-cpp-dev pkg-config python3-dev
```

### Fedora, Red Hat, and friends

```
sudo yum install gcc-c++ pkgconfig poppler-cpp-devel python3-devel
```

### macOS

```
brew install pkg-config poppler python
```

### Windows

Currently tested only when using conda:

 - Install the Microsoft Visual C++ Build Tools
 - Install poppler through conda:
   ```
   conda install -c conda-forge poppler
   ```
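
The Linux and macOS package lists above include `pkg-config` and Poppler's C++ development files. Before running `pip`, it can help to confirm that both are visible. The sketch below is one way to do that from Python. It assumes the pkg-config module is registered under the name `poppler-cpp`, and `poppler_cpp_visible` is a hypothetical helper name. In the conda setup on Windows, `pkg-config` may not be present at all, so a `False` result there does not necessarily mean Poppler is missing.

```python
import shutil
import subprocess


def poppler_cpp_visible(module="poppler-cpp"):
    """Return True if pkg-config is on PATH and reports the module as present."""
    pkg_config = shutil.which("pkg-config")
    if pkg_config is None:
        return False
    # `pkg-config --exists` exits with status 0 when the module is found.
    result = subprocess.run([pkg_config, "--exists", module])
    return result.returncode == 0


print("poppler-cpp found:", poppler_cpp_visible())
```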


## Install

```
pip install pdftotext
```
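
After installing, a quick check that Python can locate the package may save some confusion when several interpreters or environments are in use. This is a minimal sketch using only the standard library; `is_importable` is a hypothetical helper name.

```python
import importlib.util


def is_importable(module_name):
    """Return True if Python can locate the named top-level module."""
    return importlib.util.find_spec(module_name) is not None


print("pdftotext importable:", is_importable("pdftotext"))
```

This only confirms that the module can be found. It does not load the compiled extension, so running `import pdftotext` is still the real test.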

Raw data

{
    "_id": null,
    "home_page": "https://github.com/jalan/pdftotext",
    "name": "pdftotext",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "",
    "author": "Jason Alan Palmer",
    "author_email": "jalanpalmer@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/e0/e3/79a2ad7ca71160fb6442772155389881672c98bd44c6022303ce242cbfb9/pdftotext-2.2.2.tar.gz",
    "platform": "",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Simple PDF text extraction",
    "version": "2.2.2",
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "md5": "8814a3bdc5c9ad6bc6c3189914b597af",
                "sha256": "2a9aa89bc62022408781b39d188fabf5a3ad1103b6630f32c4e27e395f7966ee"
            },
            "downloads": -1,
            "filename": "pdftotext-2.2.2.tar.gz",
            "has_sig": false,
            "md5_digest": "8814a3bdc5c9ad6bc6c3189914b597af",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 113899,
            "upload_time": "2021-11-23T19:29:39",
            "upload_time_iso_8601": "2021-11-23T19:29:39.358389Z",
            "url": "https://files.pythonhosted.org/packages/e0/e3/79a2ad7ca71160fb6442772155389881672c98bd44c6022303ce242cbfb9/pdftotext-2.2.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2021-11-23 19:29:39",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "github_user": "jalan",
    "github_project": "pdftotext",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "pdftotext"
}
