<!-- SPDX-FileCopyrightText: 2023 geisserml <geisserml@gmail.com> -->
<!-- SPDX-License-Identifier: CC-BY-4.0 -->
# pypdfium2
[pypdfium2](https://github.com/pypdfium2-team/pypdfium2) is an ABI-level Python 3 binding to [PDFium](https://pdfium.googlesource.com/pdfium/+/refs/heads/main), a powerful and liberal-licensed library for PDF rendering, inspection, manipulation and creation.
The project is built using [ctypesgen](https://github.com/ctypesgen/ctypesgen) and external [PDFium binaries](https://github.com/bblanchon/pdfium-binaries/).
Its custom setup infrastructure provides a seamless packaging and installation process. A wide range of platforms is supported with wheel packages.
pypdfium2 includes helpers to simplify common use cases, while the raw PDFium/ctypes API remains accessible as well.
## Installation
* Installing the latest PyPI release (recommended)
```bash
python3 -m pip install -U pypdfium2
```
This will use a pre-built wheel package, the easiest way of installing pypdfium2.
* Installing from source
* With an external PDFium binary
```bash
# In the directory containing the source code of pypdfium2
python3 -m pip install .
```
* With a locally built PDFium binary
```bash
python3 setupsrc/pypdfium2_setup/build_pdfium.py # call with --help to list options
PDFIUM_PLATFORM="sourcebuild" python3 -m pip install .
```
Building PDFium may take a long time because it comes with its own toolchain and bundled dependencies, rather than using system-provided components.[^pdfium_buildsystem]
The host system needs to provide `git` and a C pre-processor (`gcc` or `clang`).
Setup code also depends on the Python packages `ctypesgen`, `wheel`, and `setuptools`, which will usually get installed automatically.
When installing from source, some additional options of the `pip` package manager may be relevant:
* `-v`: Request more detailed logging output. Useful for debugging.
* `-e`: Install in editable mode, so that the installation will point to the source tree. This way, changes directly take effect without needing to re-install. Recommended for development.
* `--no-build-isolation`: Do not isolate the installation in a virtual environment and use system packages instead. In this case, dependencies specified in `pyproject.toml` (PEP 518) will not take effect and should be pre-installed by the caller. This option is indispensable if you want to run the installation with custom versions of setup dependencies.[^no_build_isolation]
[^pdfium_buildsystem]: Replacing PDFium's toolchain with a lean build system that is designed to run on an arbitrary host platform is a long-standing task. This would be required to enable local source build capabilities on installation of an `sdist`. If you have the time and expertise to set up such a build system, please start a repository and inform us about it.
[^no_build_isolation]: Possible scenarios include using a locally modified version of a dependency, or supplying a dependency built from a certain commit.
* Installing an unofficial distribution
To the authors' knowledge, there currently are no other distributions of pypdfium2 apart from the official wheel releases on PyPI and GitHub.
There is no conda package yet.
So far, pypdfium2 has not been included in any operating system repositories. While we are interested in cooperation with external package maintainers to make this possible, the authors of this project have no control over and are not responsible for third-party distributions of pypdfium2.
### Setup magic
As pypdfium2 uses external binaries, there are some special setup aspects to consider.
* Binaries are stored in platform-specific sub-directories of `data/`, along with bindings and version information.
* The environment variable `$PDFIUM_PLATFORM` controls which binary to include on setup.
* If unset or `auto`, the host platform is detected and a corresponding binary will be selected.
By default, the latest pdfium-binaries release is used. Alternatively, `$PDFIUM_VERSION` may be set to request a specific one.
Moreover, `$PDFIUM_USE_V8=1` may be set to use the V8 (JavaScript) enabled binaries.
(If matching platform files already exist in the `data/` cache, they will be reused as-is.)
* If set to a certain platform identifier, binaries for the requested platform will be used.[^platform_ids]
In this case, platform files will not be downloaded/generated automatically, but need to be supplied beforehand using the `update_pdfium.py` script.
* If set to `sourcebuild`, binaries will be taken from the location where the build script places its artefacts, assuming a prior run of `build_pdfium.py`.
* If set to `none`, no platform-dependent files will be injected, so as to create a source distribution.
[^platform_ids]: This is mainly of internal interest for packaging, so that wheels can be crafted for any platform without access to a native host.
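
The dispatch described above can be pictured with the following minimal sketch. It is not pypdfium2's actual setup code: `resolve_platform_mode()` is a hypothetical helper, the `"latest"` placeholder and the example identifier `linux_x64` are assumptions made for illustration, and only the environment variable names are taken from the text.

```python
import os

def resolve_platform_mode(environ=None):
    """Classify $PDFIUM_PLATFORM (hypothetical helper, not part of pypdfium2)."""
    env = os.environ if environ is None else environ
    value = env.get("PDFIUM_PLATFORM") or "auto"
    if value == "auto":
        # Host platform detection; version and V8 usage are optional refinements
        return {
            "mode": "auto",
            "version": env.get("PDFIUM_VERSION", "latest"),  # "latest" is a placeholder
            "use_v8": env.get("PDFIUM_USE_V8") == "1",
        }
    if value in ("sourcebuild", "none"):
        return {"mode": value}
    # Anything else is treated as an explicit platform identifier
    return {"mode": "platform", "platform": value}

default_mode = resolve_platform_mode({})
explicit_mode = resolve_platform_mode({"PDFIUM_PLATFORM": "linux_x64"})  # example identifier
```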
### Runtime Dependencies
pypdfium2 does not have any mandatory runtime dependencies apart from Python and its standard library.
However, some optional support model features require additional packages:
* [`Pillow`](https://pillow.readthedocs.io/en/stable/) (module name `PIL`) is a highly popular imaging library for Python.
pypdfium2 provides convenience methods to directly take or return PIL image objects when dealing with raster graphics.
* [`NumPy`](https://numpy.org/doc/stable/index.html) is a library for scientific computing. Similar to `Pillow`, pypdfium2 provides helpers to get raster graphics in the form of multidimensional numpy arrays.
## Usage
### [Support model](https://pypdfium2.readthedocs.io/en/stable/python_api.html)
<!-- TODO demonstrate more APIs (e. g. XObject placement, transform matrices, image extraction, ...) -->
Here are some examples of using the support model API.
* Import the library
```python
import pypdfium2 as pdfium
```
* Open a PDF using the helper class `PdfDocument` (supports file path strings, bytes, and byte buffers)
```python
pdf = pdfium.PdfDocument("./path/to/document.pdf")
version = pdf.get_version() # get the PDF standard version
n_pages = len(pdf) # get the number of pages in the document
```
* Render multiple pages concurrently
```python
page_indices = list(range(n_pages))  # all pages
n_digits = len(str(n_pages))  # zero-padding width for the output file names
renderer = pdf.render(
    pdfium.PdfBitmap.to_pil,
    page_indices = page_indices,
    scale = 300/72,  # 300dpi resolution
)
for i, image in zip(page_indices, renderer):
    image.save("out_%0*d.jpg" % (n_digits, i))
```
* Read the table of contents
```python
for item in pdf.get_toc():
    if item.n_kids == 0:
        state = "*"
    elif item.is_closed:
        state = "-"
    else:
        state = "+"
    if item.page_index is None:
        target = "?"
    else:
        target = item.page_index + 1
    print(
        " " * item.level +
        "[%s] %s -> %s # %s %s" % (
            state, item.title, target, item.view_mode,
            [round(c, 2) for c in item.view_pos],  # round coordinates to 2 decimal places
        )
    )
```
* Load a page to work with
```python
page = pdf[0]
# Get page dimensions in PDF canvas units (1pt->1/72in by default)
width, height = page.get_size()
# Set the absolute page rotation to 90° clockwise
page.set_rotation(90)
# Locate objects on the page
for obj in page.get_objects():
    print(obj.level, obj.type, obj.get_pos())
```
* Render a single page
```python
bitmap = page.render(
    scale = 1,  # 72dpi resolution
    rotation = 0,  # no additional rotation
    # ... further rendering options
)
pil_image = bitmap.to_pil()
pil_image.show()
```
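The relation between `scale` and resolution used in the rendering examples (a scale of 1 corresponds to 72dpi, 300/72 to 300dpi) can be worked out as follows. This is a small sketch with hypothetical helper names, and it assumes that fractional pixel sizes are rounded up.

```python
import math

def scale_for_dpi(dpi):
    # PDF canvas units are 1/72 inch by default, so 72dpi maps to a scale of 1
    return dpi / 72

def pixel_size(width_pt, height_pt, dpi):
    # Assumption: fractional pixel sizes are rounded up
    scale = scale_for_dpi(dpi)
    return math.ceil(width_pt * scale), math.ceil(height_pt * scale)

a4_at_72dpi = pixel_size(595, 842, 72)
a4_at_300dpi = pixel_size(595, 842, 300)
print(a4_at_72dpi, a4_at_300dpi)
```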
* Extract and search text
```python
# Load a text page helper
textpage = page.get_textpage()
# Extract text from the whole page
text_all = textpage.get_text_range()
# Extract text from a specific rectangular area
text_part = textpage.get_text_bounded(left=50, bottom=100, right=width-50, top=height-100)
# Locate text on the page
searcher = textpage.search("something", match_case=False, match_whole_word=False)
# This will be a list of bounding boxes of the form (left, bottom, right, top)
first_occurrence = searcher.get_next()
```
* Create a new PDF with an empty A4 sized page
```python
pdf = pdfium.PdfDocument.new()
width, height = (595, 842)
page_a = pdf.new_page(width, height)
```
* Include a JPEG image in a PDF
```python
pdf = pdfium.PdfDocument.new()
image = pdfium.PdfImage.new(pdf)
image.load_jpeg("./tests/resources/mona_lisa.jpg")
metadata = image.get_metadata()
matrix = pdfium.PdfMatrix().scale(metadata.width, metadata.height)
image.set_matrix(matrix)
page = pdf.new_page(metadata.width, metadata.height)
page.insert_obj(image)
page.gen_content()
```
* Save the document
```python
# PDF 1.7 standard
pdf.save("output.pdf", version=17)
```
### Raw PDFium API
While helper classes conveniently wrap the raw PDFium API, it may still be accessed directly and is available in the submodule `pypdfium2.raw`.
Since PDFium is a large library, many components are not covered by helpers yet. You may seamlessly interact with the raw API while still using helpers where available. When used as a ctypes function parameter, helper objects automatically resolve to the underlying raw object (but you may still access it explicitly if desired):
```python
permission_flags = pdfium.raw.FPDF_GetDocPermission(pdf.raw) # explicit
permission_flags = pdfium.raw.FPDF_GetDocPermission(pdf) # implicit
```
For PDFium documentation, please look at the comments in its [public header files](https://pdfium.googlesource.com/pdfium/+/refs/heads/main/public/).[^pdfium_docs]
A large variety of examples on how to interface with the raw API using [`ctypes`](https://docs.python.org/3/library/ctypes.html) is already provided with [support model source code](src/pypdfium2/_helpers).
Nonetheless, the following guide may be helpful to get started with the raw API, especially for developers who are not familiar with `ctypes` yet.
[^pdfium_docs]: Unfortunately, no recent HTML-rendered documentation is available for PDFium at the moment.
<!-- TODO write something about weakref.finalize(); add example on creating a C page array -->
* In general, PDFium functions can be called just like normal Python functions.
However, parameters may only be passed positionally, i. e. it is not possible to use keyword arguments.
There are no defaults, so you always need to provide a value for each argument.
```python
# arguments: filepath (bytes), password (bytes|None)
# null-terminate filepath and encode as UTF-8
pdf = pdfium.FPDF_LoadDocument((filepath + "\x00").encode("utf-8"), None)
```
This is the underlying bindings declaration,[^bindings_decl] which loads the function from the binary and
contains the information required to convert Python types to their C equivalents.
```python
if _libs["pdfium"].has("FPDF_LoadDocument", "cdecl"):
    FPDF_LoadDocument = _libs["pdfium"].get("FPDF_LoadDocument", "cdecl")
    FPDF_LoadDocument.argtypes = [FPDF_STRING, FPDF_BYTESTRING]
    FPDF_LoadDocument.restype = FPDF_DOCUMENT
```
Python `bytes` are converted to `FPDF_STRING` by ctypes autoconversion.
When passing a string to a C function, it must always be null-terminated, as the function merely receives a pointer to the first item and then continues to read memory until it finds a null terminator.
[^bindings_decl]: From the auto-generated bindings file, which is not part of the repository. It is built into wheels, or created on installation. If you have an editable install, the bindings file may be found at `src/raw.py`.
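
To illustrate why the terminator matters, here is a minimal sketch that needs only `ctypes`, not PDFium. `to_c_string()` is a hypothetical helper. The `value` attribute of a `char` array reads up to the first null byte, which mimics how a C function would interpret the memory.

```python
import ctypes

def to_c_string(text, encoding="utf-8"):
    # Append the terminator before encoding, so it has the right width
    # for the encoding (1 byte for UTF-8, 2 bytes for UTF-16LE)
    return (text + "\x00").encode(encoding)

# Simulate memory in which other data follows directly after the string
raw = to_c_string("doc.pdf") + b"unrelated data"
c_buffer = (ctypes.c_char * len(raw)).from_buffer_copy(raw)

# Reading stops at the first null byte
as_seen_by_c = c_buffer.value
print(as_seen_by_c)
```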
* While some functions are quite easy to use, things soon get more complex.
First of all, function parameters are not only used for input, but also for output:
```python
# Initialise an integer object (defaults to 0)
c_version = ctypes.c_int()
# Let the function assign a value to the c_int object, and capture its return code (True for success, False for failure)
success = pdfium.FPDF_GetFileVersion(pdf, c_version)
if success:
    # If successful, get the Python int by accessing the `value` attribute of the c_int object
    version = c_version.value
else:
    # Otherwise, set the variable to None (in other cases, it may be desired to raise an exception instead)
    version = None
```
* If an array is required as output parameter, you can initialise one like this (conceived in general terms):
```python
# long form
array_type = (c_type * array_length)
array_object = array_type()
# short form
array_object = (c_type * array_length)()
```
Example: Getting view mode and target position from a destination object returned by some other function.
```python
# (Assuming `dest` is an FPDF_DEST)
n_params = ctypes.c_ulong()
# Create a C array to store up to four coordinates
view_pos = (pdfium.FS_FLOAT * 4)()
view_mode = pdfium.FPDFDest_GetView(dest, n_params, view_pos)
# Convert the C array to a Python list and cut it down to the actual number of coordinates
view_pos = list(view_pos)[:n_params.value]
```
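The output parameter pattern can be tried out without PDFium. In the sketch below, `fake_get_view()` is a Python function wrapped in a `CFUNCTYPE` prototype, standing in for a C function that fills a counter and an array. The name, the coordinates and the returned constant are made up for illustration.

```python
import ctypes

# Stand-in for a C function of the form
# `unsigned long get_view(unsigned long *n_params, float *params)`
prototype = ctypes.CFUNCTYPE(
    ctypes.c_ulong, ctypes.POINTER(ctypes.c_ulong), ctypes.POINTER(ctypes.c_float)
)

@prototype
def fake_get_view(n_params_ptr, params_ptr):
    coords = (10.0, 20.5)  # made-up coordinates
    for i, value in enumerate(coords):
        params_ptr[i] = value
    n_params_ptr[0] = len(coords)
    return 1  # made-up view mode constant

n_params = ctypes.c_ulong()
view_pos = (ctypes.c_float * 4)()  # room for up to four coordinates
view_mode = fake_get_view(ctypes.byref(n_params), view_pos)

# Cut the array down to the number of values that were actually written
coords = list(view_pos)[:n_params.value]
print(view_mode, coords)
```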
* For string output parameters, callers need to provide a sufficiently long, pre-allocated buffer.
This may work differently depending on what type the function requires, which encoding is used, whether the number of bytes or characters is returned, and whether space for a null terminator is included or not. Carefully review the documentation for the function in question to fulfill its requirements.
Example A: Getting the title string of a bookmark.
```python
# (Assuming `bookmark` is an FPDF_BOOKMARK)
# First call to get the required number of bytes (not characters!), including space for a null terminator
n_bytes = pdfium.FPDFBookmark_GetTitle(bookmark, None, 0)
# Initialise the output buffer
buffer = ctypes.create_string_buffer(n_bytes)
# Second call with the actual buffer
pdfium.FPDFBookmark_GetTitle(bookmark, buffer, n_bytes)
# Decode to string, cutting off the null terminator
# Encoding: UTF-16LE (2 bytes per character)
title = buffer.raw[:n_bytes-2].decode('utf-16-le')
```
Example B: Extracting text in given boundaries.
```python
# (Assuming `textpage` is an FPDF_TEXTPAGE and the boundary variables are set)
# Store common arguments for the two calls
args = (textpage, left, top, right, bottom)
# First call to get the required number of characters (not bytes!) - a possible null terminator is not included
n_chars = pdfium.FPDFText_GetBoundedText(*args, None, 0)
# If no characters were found, return an empty string
if n_chars <= 0:
    return ""  # (assuming this snippet is part of a function body)
# Calculate the required number of bytes (UTF-16LE encoding again)
n_bytes = 2 * n_chars
# Initialise the output buffer - this function can work without null terminator, so skip it
buffer = ctypes.create_string_buffer(n_bytes)
# Re-interpret the type from char to unsigned short as required by the function
buffer_ptr = ctypes.cast(buffer, ctypes.POINTER(ctypes.c_ushort))
# Second call with the actual buffer
pdfium.FPDFText_GetBoundedText(*args, buffer_ptr, n_chars)
# Decode to string (You may want to pass `errors="ignore"` to skip possible errors in the PDF's encoding)
text = buffer.raw.decode("utf-16-le")
```
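The two-call pattern from Example A can be wrapped in a small helper. The following sketch does not call PDFium: `fake_get_title()` is a stand-in that follows the same convention (it returns the required number of bytes including the terminator, and only writes if the buffer is large enough), and `read_utf16_string()` is a hypothetical helper name.

```python
import ctypes

def fake_get_title(buffer, buflen):
    # Stand-in for a C function returning a UTF-16LE string
    data = "Chapter 1".encode("utf-16-le") + b"\x00\x00"
    if buffer is not None and buflen >= len(data):
        ctypes.memmove(buffer, data, len(data))
    return len(data)

def read_utf16_string(getter):
    # First call: query the required number of bytes (terminator included)
    n_bytes = getter(None, 0)
    if n_bytes <= 2:
        return ""
    # Second call: let the function fill a buffer of that size
    buffer = ctypes.create_string_buffer(n_bytes)
    getter(buffer, n_bytes)
    # Cut off the 2-byte terminator and decode
    return buffer.raw[:n_bytes-2].decode("utf-16-le")

title = read_utf16_string(fake_get_title)
print(title)
```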
* String output is not the only area where handling depends on the requirements of the function in question.
String input, too, can work differently depending on encoding and type.
We have already discussed `FPDF_LoadDocument()`, which takes a UTF-8 encoded string as `char *`.
A different example is `FPDFText_FindStart()`, which needs a UTF-16LE encoded string, given as `unsigned short *`:
```python
# (Assuming `text` is a str and `textpage` an FPDF_TEXTPAGE)
# Add the null terminator and encode as UTF-16LE
enc_text = (text + "\x00").encode("utf-16-le")
# cast `enc_text` to a c_ushort pointer
text_ptr = ctypes.cast(enc_text, ctypes.POINTER(ctypes.c_ushort))
search = pdfium.FPDFText_FindStart(textpage, text_ptr, 0, 0)
```
* Leaving strings, let's suppose you have a C memory buffer allocated by PDFium and wish to read its data.
PDFium will provide you with a pointer to the first item of the byte array.
To access the data, you'll want to re-interpret the pointer using `ctypes.cast()` to encompass the whole array:
```python
# (Assuming `bitmap` is an FPDF_BITMAP and `size` is the expected number of bytes in the buffer)
first_item = pdfium.FPDFBitmap_GetBuffer(bitmap)
buffer = ctypes.cast(first_item, ctypes.POINTER(ctypes.c_ubyte * size))
# Buffer as ctypes array (referencing the original buffer, will be unavailable as soon as the bitmap is destroyed)
c_array = buffer.contents
# Buffer as Python bytes (independent copy)
data = bytes(c_array)
```
* Writing data from Python into a C buffer works in a similar fashion:
```python
# (Assuming `first_item` is a pointer to the first item of a C buffer to write into,
# `size` the number of bytes it can store, and `py_buffer` a Python byte buffer)
c_buffer = ctypes.cast(first_item, ctypes.POINTER(ctypes.c_char * size))
# Read from the Python buffer, starting at its current position, directly into the C buffer
# (until the target is full or the end of the source is reached)
n_bytes = py_buffer.readinto(c_buffer.contents) # returns the number of bytes read
```
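Both directions can be tested in isolation. In this sketch, a `ctypes` array plays the role of memory owned by a C library, and `first_item` is the pointer to its first item that such a library would hand out. All names are chosen for illustration.

```python
import ctypes
import io

size = 8
# Stand-in for a buffer allocated on the C side
c_memory = (ctypes.c_ubyte * size)(*range(size))
first_item = ctypes.cast(c_memory, ctypes.POINTER(ctypes.c_ubyte))

# Re-interpret the pointer to encompass the whole array
whole = ctypes.cast(first_item, ctypes.POINTER(ctypes.c_ubyte * size)).contents

# Reading: independent copy as Python bytes
data = bytes(whole)

# Writing: read from a Python buffer directly into the C memory
py_buffer = io.BytesIO(b"abc")
n_written = py_buffer.readinto(whole)

print(data, n_written, bytes(c_memory))
```

Note that `whole` references the original memory, so the write is visible through `c_memory`, whereas `data` keeps the state from before the write.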
* If you wish to check whether two objects returned by PDFium are the same, the `is` operator won't help you because `ctypes` does not have original object return (OOR),
i. e. new, equivalent Python objects are created each time, although they might represent one and the same C object.[^ctypes_no_oor] That's why you'll want to use `ctypes.addressof()` to get the memory address of the underlying C object.
For instance, this is used to avoid infinite loops on circular bookmark references when iterating through the document outline:
```python
# (Assuming `pdf` is an FPDF_DOCUMENT)
seen = set()
bookmark = pdfium.FPDFBookmark_GetFirstChild(pdf, None)
while bookmark:
    # bookmark is a pointer, so we need to use its `contents` attribute to get the object the pointer refers to
    # (otherwise we'd only get the memory address of the pointer itself, which would result in random behaviour)
    address = ctypes.addressof(bookmark.contents)
    if address in seen:
        break  # circular reference detected
    else:
        seen.add(address)
    bookmark = pdfium.FPDFBookmark_GetNextSibling(pdf, bookmark)
```
[^ctypes_no_oor]: Confer the [ctypes documentation on Pointers](https://docs.python.org/3/library/ctypes.html#pointers).
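
The effect can be reproduced with plain `ctypes`. The sketch below uses a made-up `Node` structure with a circular `next` link in place of PDFium bookmarks, and applies the same address-based cycle check.

```python
import ctypes

class Node(ctypes.Structure):
    pass

Node._fields_ = [("value", ctypes.c_int), ("next", ctypes.POINTER(Node))]

# Two nodes that reference each other (circular)
node_a, node_b = Node(1), Node(2)
node_a.next = ctypes.pointer(node_b)
node_b.next = ctypes.pointer(node_a)

# Each access creates a new Python object for the same C object
ptr = ctypes.pointer(node_a)
same_python_object = ptr.contents is ptr.contents
same_c_object = ctypes.addressof(ptr.contents) == ctypes.addressof(node_a)

def walk(first_ptr):
    # Collect values, stopping as soon as an address is encountered twice
    seen, values = set(), []
    current = first_ptr
    while current:
        address = ctypes.addressof(current.contents)
        if address in seen:
            break  # circular reference detected
        seen.add(address)
        values.append(current.contents.value)
        current = current.contents.next
    return values

values = walk(ctypes.pointer(node_a))
print(same_python_object, same_c_object, values)
```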
* In many situations, callback functions come in handy.[^callback_usecases] Thanks to `ctypes`, it is seamlessly possible to use callbacks across Python/C language boundaries.
[^callback_usecases]: e. g. incremental reading/writing, progress bars, pausing of progressive tasks, ...
Example: Loading a document from a Python buffer. This way, file access can be controlled in Python while the whole data does not need to be in memory at once.
```python
# Factory class to create callable objects holding a reference to a Python buffer
class _reader_class:
    def __init__(self, py_buffer):
        self.py_buffer = py_buffer
    def __call__(self, _, position, p_buf, size):
        # Write data from Python buffer into C buffer, as explained before
        c_buffer = ctypes.cast(p_buf, ctypes.POINTER(ctypes.c_char * size))
        self.py_buffer.seek(position)
        self.py_buffer.readinto(c_buffer.contents)
        return 1  # non-zero return code for success
# (Assuming py_buffer is a Python file buffer, e. g. io.BufferedReader)
# Get the length of the buffer
py_buffer.seek(0, 2)
file_len = py_buffer.tell()
py_buffer.seek(0)
# Set up an interface structure for custom file access
fileaccess = pdfium.FPDF_FILEACCESS()
fileaccess.m_FileLen = file_len
# CFUNCTYPE declaration copied from the bindings file (unfortunately, this is not applied automatically)
functype = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.POINTER(None), ctypes.c_ulong, ctypes.POINTER(ctypes.c_ubyte), ctypes.c_ulong)
# Instantiate a callable object, wrapped with the CFUNCTYPE declaration
fileaccess.m_GetBlock = functype( _reader_class(py_buffer) )
# Finally, load the document
pdf = pdfium.FPDF_LoadCustomDocument(fileaccess, None)
```
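A callback of this kind can be exercised without PDFium by calling the wrapped object directly. The following sketch is a variant of the reader above that additionally reports a short read as failure by returning zero. The class name `BlockReader` and the sample data are hypothetical, and `ctypes.c_void_p` is used for the first parameter.

```python
import ctypes
import io

functype = ctypes.CFUNCTYPE(
    ctypes.c_int, ctypes.c_void_p, ctypes.c_ulong,
    ctypes.POINTER(ctypes.c_ubyte), ctypes.c_ulong,
)

class BlockReader:
    def __init__(self, py_buffer):
        self.py_buffer = py_buffer
    def __call__(self, _, position, p_buf, size):
        c_buffer = ctypes.cast(p_buf, ctypes.POINTER(ctypes.c_char * size))
        self.py_buffer.seek(position)
        n_read = self.py_buffer.readinto(c_buffer.contents)
        # Non-zero for success, zero if the block could not be read completely
        return 1 if n_read == size else 0

callback = functype(BlockReader(io.BytesIO(b"%PDF-1.7 sample data")))

# Play the role of the C caller: request 4 bytes starting at offset 1
target = (ctypes.c_ubyte * 4)()
ok = callback(None, 1, target, 4)
out_of_range = callback(None, 1000, target, 4)
print(ok, bytes(target), out_of_range)
```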
* When using the raw API, special care needs to be taken regarding object lifetime, considering that Python may garbage collect objects as soon as their reference count reaches zero. However, the interpreter has no way of magically knowing how long the underlying resources of a Python object might still be needed on the C side, so measures need to be taken to keep such objects referenced until PDFium does not depend on them anymore.
If resources need to remain valid after the time of a function call, PDFium documentation usually indicates this clearly. Ignoring requirements on object lifetime will lead to memory corruption (commonly resulting in a segmentation fault).
For instance, the documentation on `FPDF_LoadCustomDocument()` states that
> The application must keep the file resources |pFileAccess| points to valid until the returned FPDF_DOCUMENT is closed. |pFileAccess| itself does not need to outlive the FPDF_DOCUMENT.
This means that the callback function and the Python buffer need to be kept alive as long as the `FPDF_DOCUMENT` is used.
This can be achieved by referencing these objects in an accompanying class, e. g.
```python
class PdfDataHolder:
    def __init__(self, buffer, function):
        self.buffer = buffer
        self.function = function
    def close(self):
        # Make sure both objects remain available until this function is called
        # No-op id() call to denote that the object needs to stay in memory up to this point
        id(self.function)
        self.buffer.close()
# ... set up an FPDF_FILEACCESS structure
# (Assuming `py_buffer` is the buffer and `fileaccess` the FPDF_FILEACCESS interface)
data_holder = PdfDataHolder(py_buffer, fileaccess.m_GetBlock)
pdf = pdfium.FPDF_LoadCustomDocument(fileaccess, None)
# ... work with the pdf
# Close the PDF to free resources
pdfium.FPDF_CloseDocument(pdf)
# Close the data holder, to keep the object itself and thereby the objects it
# references alive up to this point, as well as to release the buffer
data_holder.close()
```
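An alternative way to tie dependencies to a handle is `weakref.finalize()`, which holds strong references to the arguments it is given until the finalizer has run. The sketch below shows the idea with a hypothetical `ManagedHandle` class and a dummy close function. It is not taken from pypdfium2's helpers.

```python
import io
import weakref

class ManagedHandle:
    def __init__(self, handle, close_func, *keep_alive):
        self.handle = handle
        # The finalizer references `keep_alive`, so these objects cannot be
        # garbage collected before the handle has been closed
        self._finalizer = weakref.finalize(
            self, self._close, handle, close_func, keep_alive
        )
    @staticmethod
    def _close(handle, close_func, keep_alive):
        # Static, so that the finalizer does not reference the instance itself
        close_func(handle)
    def close(self):
        # Runs the finalizer at most once, further calls do nothing
        self._finalizer()

log = []
py_buffer = io.BytesIO(b"data")
doc = ManagedHandle("doc-1", lambda handle: log.append(("closed", handle)), py_buffer)
doc.close()
doc.close()
print(log)
```

If `close()` is never called explicitly, the finalizer runs when the `ManagedHandle` object is garbage collected, or at interpreter exit.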
* Finally, let's finish this guide with an example on how to render the first page of a document to a `PIL` image in `RGBA` color format.
```python
import math
import ctypes
import os.path
import PIL.Image
import pypdfium2 as pdfium
# Load the document
filepath = os.path.abspath("tests/resources/render.pdf")
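# (null-terminate the file path and encode it as UTF-8, as explained before)
pdf = pdfium.FPDF_LoadDocument((filepath + "\x00").encode("utf-8"), None)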
# Check page count to make sure it was loaded correctly
page_count = pdfium.FPDF_GetPageCount(pdf)
assert page_count >= 1
# Load the first page and get its dimensions
page = pdfium.FPDF_LoadPage(pdf, 0)
width = math.ceil(pdfium.FPDF_GetPageWidthF(page))
height = math.ceil(pdfium.FPDF_GetPageHeightF(page))
# Create a bitmap
use_alpha = False # We don't render with transparent background
bitmap = pdfium.FPDFBitmap_Create(width, height, int(use_alpha))
# Fill the whole bitmap with a white background
# The color is given as a 32-bit integer in ARGB format (8 bits per channel)
pdfium.FPDFBitmap_FillRect(bitmap, 0, 0, width, height, 0xFFFFFFFF)
# Store common rendering arguments
render_args = (
    bitmap,  # the bitmap
    page,    # the page
    # positions and sizes are to be given in pixels and may exceed the bitmap
    0,       # left start position
    0,       # top start position
    width,   # horizontal size
    height,  # vertical size
    0,       # rotation (as constant, not in degrees!)
    pdfium.FPDF_LCD_TEXT | pdfium.FPDF_ANNOT,  # rendering flags, combined with binary or
)
# Render the page
pdfium.FPDF_RenderPageBitmap(*render_args)
# Get a pointer to the first item of the buffer
first_item = pdfium.FPDFBitmap_GetBuffer(bitmap)
# Re-interpret the pointer to encompass the whole buffer
buffer = ctypes.cast(first_item, ctypes.POINTER(ctypes.c_ubyte * (width * height * 4)))
# Create a PIL image from the buffer contents
img = PIL.Image.frombuffer("RGBA", (width, height), buffer.contents, "raw", "BGRA", 0, 1)
# Save it as file
img.save("out.png")
# Free resources
pdfium.FPDFBitmap_Destroy(bitmap)
pdfium.FPDF_ClosePage(page)
pdfium.FPDF_CloseDocument(pdf)
```
### [Command-line Interface](https://pypdfium2.readthedocs.io/en/stable/shell_api.html)
pypdfium2 also ships with a simple command-line interface, providing access to key features of the support model in a shell environment (e. g. rendering, content extraction, document inspection, page rearranging, ...).
The primary motivation behind this is to have a nice testing interface, but it may be helpful in a variety of other situations as well.
Usage should be largely self-explanatory, assuming a minimum of familiarity with the command-line.
## Licensing
PDFium and pypdfium2 are available under the terms and conditions of either [`Apache-2.0`](LICENSES/Apache-2.0.txt) or [`BSD-3-Clause`](LICENSES/BSD-3-Clause.txt), at your choice.
Various other open-source licenses apply to dependencies bundled with PDFium. Verbatim copies of their respective licenses are contained in the file [`LicenseRef-PdfiumThirdParty.txt`](LICENSES/LicenseRef-PdfiumThirdParty.txt), which also has to be shipped with binary redistributions.
Documentation and examples of pypdfium2 are licensed under [`CC-BY-4.0`](LICENSES/CC-BY-4.0.txt).
pypdfium2 complies with the [reuse standard](https://reuse.software/spec/) by including [SPDX](https://spdx.org/licenses/) headers in source files, and license information for data files in [`.reuse/dep5`](.reuse/dep5).
To the author's knowledge, pypdfium2 is one of the rare Python libraries that are capable of PDF rendering while not being covered by copyleft licenses (such as the `GPL`).[^liberal_pdf_renderlibs]
As of early 2023, a single developer is author and rightsholder of the code base (apart from a few minor [code contributions](https://github.com/pypdfium2-team/pypdfium2/graphs/contributors)).
[^liberal_pdf_renderlibs]: The only other liberal-licensed PDF rendering libraries known to the authors are [`pdf.js`](https://github.com/mozilla/pdf.js/) (JavaScript) and [`Apache PDFBox`](https://github.com/apache/pdfbox) (Java). `pdf.js` is limited to a web environment. Creating Python bindings to `PDFBox` might be possible but there is no serious solution yet (apart from amateurish wrappers around its command-line API).
## Issues
While using pypdfium2, you might encounter bugs or missing features.
In this case, please file an issue report. Remember to include applicable details such as tracebacks, operating system and CPU architecture, as well as the versions of pypdfium2 and used dependencies.
In case your issue could be tracked down to a third-party dependency, we will accompany or conduct subsequent measures.
Here is a roadmap of relevant places:
* pypdfium2
- [Issues panel](https://github.com/pypdfium2-team/pypdfium2/issues): Initial reports of specific issues.
May need to be transferred to other projects if not caused by or fixable in pypdfium2 code alone.
- [Discussions page](https://github.com/pypdfium2-team/pypdfium2/discussions): General questions and suggestions.
- In case you do not want to publicly disclose the issue or your code, you may also contact the maintainers privately via e-mail.
* PDFium
- [Bug tracker](https://bugs.chromium.org/p/pdfium/issues/list): Defects in PDFium.
Beware: The bridge between Python and C increases the probability of integration issues or API misuse.
The symptoms can often make it look like a PDFium bug while it is not. In some cases, this may be quite difficult to distinguish.
- [Mailing list](https://groups.google.com/g/pdfium/): Questions regarding PDFium usage.
* [pdfium-binaries](https://github.com/bblanchon/pdfium-binaries/issues): Binary builder.
* [ctypesgen](https://github.com/ctypesgen/ctypesgen/issues): Bindings generator.
### Known limitations
pypdfium2 also has some drawbacks, which are described below.
#### Incompatibility with CPython 3.7.6 and 3.8.1
pypdfium2 built with mainstream ctypesgen cannot be used with releases 3.7.6 and 3.8.1 of the CPython interpreter due to a [regression](https://github.com/python/cpython/pull/16799#issuecomment-612353119) that [broke](https://github.com/ctypesgen/ctypesgen/issues/77) ctypesgen-created string handling code.
However, we are currently [making efforts](https://github.com/ctypesgen/ctypesgen/pull/162) to remove ctypesgen's wonky string code.
Starting with version 4, pypdfium2 releases will be built with a patched variant of ctypesgen.
#### Risk of unknown object lifetime violations
As outlined in the raw API section, it is essential that Python-managed resources remain available as long as they are needed by PDFium.
The problem is that the Python interpreter may garbage collect objects with reference count zero at any time. Thus, an unreferenced but still required object may by chance stay around long enough in some runs, and be removed too early in others. Such dangling objects are likely to cause non-deterministic segmentation faults.
If the timeframe between reaching reference count zero and removal is sufficiently large and roughly consistent across different runs, it is even possible that mistakes regarding object lifetime remain unnoticed for a long time.
Although great care has been taken while developing the support model, it cannot be fully excluded that unknown object lifetime violations are still lurking around somewhere, especially if unexpected requirements were not documented by the time the code was written.
#### No direct access to raw PDF data structure
<!-- https://crbug.com/pdfium/1694 -->
PDFium does not currently provide direct access to the raw PDF data structure. It does not publicly expose APIs to read/write PDF dictionaries, name trees, etc. Instead, it merely offers a variety of higher-level functions to modify PDFs. While these are certainly useful to abstract some of the format's complexity and avoid the creation of invalid PDFs, the lack of public instruments for raw access considerably limits the library's potential. If PDFium's capabilities are not sufficient for your use case, or you just wish to work with the raw PDF structure on your own, you may want to consider other products such as [`pikepdf`](https://github.com/pikepdf/pikepdf) to use instead of, or in conjunction with, pypdfium2.
## Development
This section contains some key information relevant for project maintainers.
<!-- TODO wheel tags, maintainer access, GitHub peculiarities -->
### Documentation
pypdfium2 provides API documentation using [Sphinx](https://github.com/sphinx-doc/sphinx/). It can be rendered to various formats, including HTML:
```bash
sphinx-build -b html ./docs/source ./docs/build/html/
```
Built documentation is primarily hosted on [`readthedocs.org`](https://readthedocs.org/projects/pypdfium2/).
It may be configured using a [`.readthedocs.yaml`](.readthedocs.yaml) file (see [instructions](https://docs.readthedocs.io/en/stable/config-file/v2.html)), and the administration page on the web interface.
RTD supports hosting multiple versions, so we currently have one linked to the `main` branch and another to `stable`.
New builds are automatically triggered by a webhook whenever you push to a linked branch.
Additionally, one documentation build can also be hosted on [GitHub Pages](https://pypdfium2-team.github.io/pypdfium2/index.html).
It is implemented with a CI workflow, which is currently linked to `main` and triggered on push as well.
This provides us with full control over the build environment and the used commands, whereas RTD is kind of limited in this regard.
### Testing
pypdfium2 contains a small test suite to verify the library's functionality. It is written with [pytest](https://github.com/pytest-dev/pytest/):
```bash
python3 -m pytest tests/ tests_old/
```
Note that ...
* you can pass `-sv` to get more detailed output.
* `$DEBUG_AUTOCLOSE=1` may be set to get debugging information on automatic object finalization.
To get code coverage statistics, you can run
```bash
make coverage
```
Sometimes, it can also be helpful to test code on many PDFs.[^testing_corpora]
In this case, the command-line interface and `find` come in handy:
```bash
# Example A: Analyse PDF images (in the current working directory)
find . -name '*.pdf' -exec bash -c "echo \"{}\" && pypdfium2 pageobjects \"{}\" --types image" \;
# Example B: Parse PDF table of contents
find . -name '*.pdf' -exec bash -c "echo \"{}\" && pypdfium2 toc \"{}\"" \;
```
[^testing_corpora]: For instance, one could use the testing corpora of open-source PDF libraries (pdfium, pikepdf/ocrmypdf, mupdf/ghostscript, tika/pdfbox, pdfjs, ...)
### Release workflow
The release process is fully automated using Python scripts and a CI setup for GitHub Actions.
A new release is triggered every Tuesday, one day after `pdfium-binaries`.
You may also trigger the workflow manually using the GitHub Actions panel or the [`gh`](https://cli.github.com/) command-line tool.
Python release scripts are located in the folder `setupsrc/pypdfium2_setup`, along with custom setup code:
* `update_pdfium.py` downloads binaries and generates the bindings.
* `craft_packages.py` builds platform-specific wheel packages and a source distribution suitable for PyPI upload.
* `autorelease.py` takes care of versioning, changelog, release note generation and VCS checkin.
The autorelease script has some peculiarities maintainers should know about:
* The changelog for the next release shall be written into `docs/devel/changelog_staging.md`.
On release, it will be moved into the main changelog under `docs/source/changelog.md`, annotated with the PDFium version update.
It will also be shown on the GitHub release page.
* pypdfium2 versioning uses the pattern `major.minor.patch`, optionally with an appended beta mark (e. g. `2.7.1`, `2.11.0`, `3.0.0b1`, ...).
Version changes are based on the following logic:
* If PDFium was updated, the minor version is incremented.
* If only pypdfium2 code was updated, the patch version is incremented instead.
* Major updates and beta marks are controlled via empty files in the `autorelease/` directory.
If `update_major.txt` exists, the major version is incremented.
If `update_beta.txt` exists, a new beta tag is set, or an existing one is incremented.
These files are removed automatically once the release is finished.
* If switching from a beta release to a non-beta release, only the beta mark is removed while minor and patch versions remain unchanged.
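The versioning rules above can be sketched in Python as follows. This is a minimal illustration and not the code of `autorelease.py`: the names `Version`, `parse_version` and `next_version` are hypothetical. The sketch also makes two assumptions that are not stated above: lower version components reset to zero when a higher one is incremented, and the base version stays untouched while a beta mark is present.

```python
import re
from typing import NamedTuple, Optional


class Version(NamedTuple):
    major: int
    minor: int
    patch: int
    beta: Optional[int] = None

    def __str__(self):
        base = f"{self.major}.{self.minor}.{self.patch}"
        return base if self.beta is None else f"{base}b{self.beta}"


def parse_version(text):
    """Parse strings such as '2.7.1' or '3.0.0b1'."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)(?:b(\d+))?", text)
    if match is None:
        raise ValueError(f"Not a valid version: {text!r}")
    major, minor, patch, beta = match.groups()
    return Version(int(major), int(minor), int(patch),
                   None if beta is None else int(beta))


def next_version(current, pdfium_updated, update_major=False, update_beta=False):
    """Compute the next version; the two flags stand in for the marker files."""
    major, minor, patch, beta = current
    if beta is None:
        if update_major:
            # Assumption: minor and patch reset to zero on a major update
            major, minor, patch = major + 1, 0, 0
        elif pdfium_updated:
            # Assumption: patch resets to zero on a minor update
            minor, patch = minor + 1, 0
        else:
            patch += 1
        new_beta = 1 if update_beta else None
    else:
        # Assumption: while in beta, the base version is kept (update_major is ignored);
        # the beta mark is either incremented or dropped
        new_beta = beta + 1 if update_beta else None
    return Version(major, minor, patch, new_beta)
```

For example, under these assumptions `next_version(parse_version("2.7.1"), pdfium_updated=True)` yields `2.8.0`, and leaving `3.0.0b2` without a beta request yields `3.0.0`.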
If necessary, you may also forgo autorelease/CI and do the release manually, which roughly works like this (though ideally it should never be needed):
* Commit changes to the version file
```bash
git add src/pypdfium2/version.py
git commit -m "increment version"
git push
```
* Create a new tag that matches the version file
```bash
# substitute $VERSION accordingly
git tag -a $VERSION
git push --tags
```
* Build the packages
```bash
python3 setupsrc/pypdfium2_setup/update_pdfium.py
python3 setupsrc/pypdfium2_setup/craft_packages.py
```
* Upload to PyPI
```bash
# make sure the packages are valid
twine check dist/*
# upload to PyPI (this will interactively ask for your username/password)
twine upload dist/*
```
* Update the `stable` branch to trigger a documentation rebuild
```bash
git checkout stable
git rebase origin/main # alternatively: git reset --hard main
git checkout main
```
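In a manual release, it is easy to upload packages left over from a previous build. As a hedged sketch, the following check could be run before the upload step to confirm that every file in `dist/` carries the version that was just tagged. The function name `find_version_mismatches` is hypothetical, and the check assumes the usual file naming, i. e. `pypdfium2-<version>.tar.gz` for the source distribution and `pypdfium2-<version>-<tags>.whl` for wheels.

```python
from pathlib import Path


def find_version_mismatches(dist_dir, version, project="pypdfium2"):
    """Return the names of files in `dist_dir` that do not belong to `version`."""
    prefix = f"{project}-{version}"
    mismatches = []
    for path in sorted(Path(dist_dir).iterdir()):
        if not path.is_file():
            continue
        name = path.name
        # Accept '<prefix>.tar.gz' (sdist) and '<prefix>-<tags>.whl' (wheel) only
        is_sdist = name == prefix + ".tar.gz"
        is_wheel = name.startswith(prefix + "-") and name.endswith(".whl")
        if not (is_sdist or is_wheel):
            mismatches.append(name)
    return mismatches


# Usage (assumption: called from the repository root after building the packages):
# assert not find_version_mismatches("dist", "4.3.0")
```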
If something went wrong with the commit or tag, you can still revert the changes:
```bash
# perform an interactive rebase to change history (substitute $N_COMMITS with the number of commits to drop or modify)
git rebase -i HEAD~$N_COMMITS
git push --force
# delete local tag (substitute $TAGNAME accordingly)
git tag -d $TAGNAME
# delete remote tag
git push --delete origin $TAGNAME
```
Faulty PyPI releases may be yanked using the web interface.
## Thanks to[^thanks_to]
<!-- order: alphabetical by surname -->
* [Benoît Blanchon](https://github.com/bblanchon): Author of [PDFium binaries](https://github.com/bblanchon/pdfium-binaries/) and [patches](sourcebuild/patches/).
* [Anderson Bravalheri](https://github.com/abravalheri): Help with PEP 517/518 compliance. Hint to use an environment variable rather than separate setup files.
* [Bastian Germann](https://github.com/bgermann): Help with inclusion of licenses for third-party components of PDFium.
* [Tim Head](https://github.com/betatim): Original idea for Python bindings to PDFium with ctypesgen in `wowpng`.
* [Yinlin Hu](https://github.com/YinlinHu): `pypdfium` prototype and `kuafu` PDF viewer.
* [Adam Huganir](https://github.com/adam-huganir): Help with maintenance and development decisions since the beginning of the project.
* [kobaltcore](https://github.com/kobaltcore): Bug fix for `PdfDocument.save()`.
* [Mike Kroutikov](https://github.com/mkroutikov): Examples on how to use PDFium with ctypes in `redstork` and `pdfbrain`.
* [Peter Saalbrink](https://github.com/petersaalbrink): Code style improvements to the multipage renderer.
... and further [code contributors](https://github.com/pypdfium2-team/pypdfium2/graphs/contributors) (GitHub stats).
*If you have somehow contributed to this project but we forgot to mention you here, please let us know.*
[^thanks_to]: People listed in this section may not necessarily have contributed any copyrightable code to the repository. Some have rather helped with ideas, or contributions to dependencies of pypdfium2.
## History
pypdfium2 is the successor of *pypdfium* and *pypdfium-reboot*.
Inspired by *wowpng*, the first known proof-of-concept Python binding to PDFium using ctypesgen, the initial *pypdfium* package was created. It had to be updated manually, which did not happen frequently. There were no platform-specific wheels, but only a single wheel that contained binaries for 64-bit Linux, Windows and macOS.
*pypdfium-reboot* then added a script to automate binary deployment and bindings generation to simplify regular updates. However, it was still not platform-specific.
pypdfium2 is a full rewrite of *pypdfium-reboot* to build platform-specific wheels and consolidate the setup scripts. Further additions include ...
* A CI workflow to automatically release new wheels every Tuesday
* Support models that conveniently wrap the raw PDFium/ctypes API
* Test code
* A script to build PDFium from source
Raw data
{
"_id": null,
"home_page": "",
"name": "pypdfium2",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.6",
"maintainer_email": "",
"keywords": "pdf,pdfium",
"author": "pypdfium2-team",
"author_email": "geisserml <geisserml@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/29/50/79112f1b16e7f4539a58d69619a5d6254f4919b87d7c0ea357426b00d81b/pypdfium2-4.3.0.tar.gz",
"platform": null,
"description": "<!-- SPDX-FileCopyrightText: 2023 geisserml <geisserml@gmail.com> -->\n<!-- SPDX-License-Identifier: CC-BY-4.0 -->\n\n# pypdfium2\n\n[](https://pepy.tech/project/pypdfium2)\n\n[pypdfium2](https://github.com/pypdfium2-team/pypdfium2) is an ABI-level Python 3 binding to [PDFium](https://pdfium.googlesource.com/pdfium/+/refs/heads/main), a powerful and liberal-licensed library for PDF rendering, inspection, manipulation and creation.\n\nThe project is built using [ctypesgen](https://github.com/ctypesgen/ctypesgen) and external [PDFium binaries](https://github.com/bblanchon/pdfium-binaries/).\nIts custom setup infrastructure provides a seamless packaging and installation process. A wide range of platforms is supported with wheel packages.\n\npypdfium2 includes helpers to simplify common use cases, while the raw PDFium/ctypes API remains accessible as well.\n\n\n## Installation\n\n* Installing the latest PyPI release (recommended)\n ```bash\n python3 -m pip install -U pypdfium2\n ```\n This will use a pre-built wheel package, the easiest way of installing pypdfium2.\n\n* Installing from source\n \n * With an external PDFium binary\n ```bash\n # In the directory containing the source code of pypdfium2\n python3 -m pip install .\n ```\n \n * With a locally built PDFium binary\n ```bash\n python3 setupsrc/pypdfium2_setup/build_pdfium.py # call with --help to list options\n PDFIUM_PLATFORM=\"sourcebuild\" python3 -m pip install .\n ```\n Building PDFium may take a long time because it comes with its own toolchain and bundled dependencies, rather than using system-provided components.[^pdfium_buildsystem]\n \n The host system needs to provide `git` and a C pre-processor (`gcc` or `clang`).\n Setup code also depends on the Python packages `ctypesgen`, `wheel`, and `setuptools`, which will usually get installed automatically.\n \n When installing from source, some additional options of the `pip` package manager may be relevant:\n * `-v`: Request more detailed 
logging output. Useful for debugging.\n * `-e`: Install in editable mode, so that the installation will point to the source tree. This way, changes directly take effect without needing to re-install. Recommended for development.\n * `--no-build-isolation`: Do not isolate the installation in a virtual environment and use system packages instead. In this case, dependencies specified in `pyproject.toml` (PEP 518) will not take effect and should be pre-installed by the caller. This is an indispensable option if wanting to run the installation with custom versions of setup dependencies.[^no_build_isolation]\n \n [^pdfium_buildsystem]: Replacing PDFium's toolchain with a lean build system that is designed to run on an arbitrary host platform is a long-standing task. This would be required to enable local source build capabilities on installation of an `sdist`. If you have the time and expertise to set up such a build system, please start a repository and inform us about it.\n \n [^no_build_isolation]: Possible scenarios include using a locally modified version of a dependency, or supplying a dependency built from a certain commit.\n \n* Installing an unofficial distribution\n \n To the authors' knowledge, there currently are no other distributions of pypdfium2 apart from the official wheel releases on PyPI and GitHub.\n There is no conda package yet.\n So far, pypdfium2 has not been included in any operating system repositories. 
While we are interested in cooperation with external package maintainers to make this possible, the authors of this project have no control over and are not responsible for third-party distributions of pypdfium2.\n\n### Setup magic\n\nAs pypdfium2 uses external binaries, there are some special setup aspects to consider.\n\n* Binaries are stored in platform-specific sub-directories of `data/`, along with bindings and version information.\n* The environment variable `$PDFIUM_PLATFORM` controls which binary to include on setup.\n * If unset or `auto`, the host platform is detected and a corresponding binary will be selected.\n By default, the latest pdfium-binaries release is used, otherwise `$PDFIUM_VERSION` may be set to request a specific one.\n Moreover, `$PDFIUM_USE_V8=1` may be set to use the V8 (JavaScript) enabled binaries.\n (If matching platform files already exist in the `data/` cache, they will be reused as-is.)\n * If set to a certain platform identifier, binaries for the requested platform will be used.[^platform_ids]\n In this case, platform files will not be downloaded/generated automatically, but need to be supplied beforehand using the `update_pdfium.py` script.\n * If set to `sourcebuild`, binaries will be taken from the location where the build script places its artefacts, assuming a prior run of `build_pdfium.py`.\n * If set to `none`, no platform-dependent files will be injected, so as to create a source distribution.\n\n[^platform_ids]: This is mainly of internal interest for packaging, so that wheels can be crafted for any platform without access to a native host.\n\n### Runtime Dependencies\n\npypdfium2 does not have any mandatory runtime dependencies apart from Python and its standard library.\n\nHowever, some optional support model features require additional packages:\n* [`Pillow`](https://pillow.readthedocs.io/en/stable/) (module name `PIL`) is a highly pouplar imaging library for Python.\n pypdfium2 provides convenience methods to 
directly take or return PIL image objects when dealing with raster graphics.\n* [`NumPy`](https://numpy.org/doc/stable/index.html) is a library for scientific computing. Similar to `Pillow`, pypdfium2 provides helpers to get raster graphics in the form of multidimensional numpy arrays.\n\n\n## Usage\n\n### [Support model](https://pypdfium2.readthedocs.io/en/stable/python_api.html)\n\n<!-- TODO demonstrate more APIs (e. g. XObject placement, transform matrices, image extraction, ...) -->\n\nHere are some examples of using the support model API.\n\n* Import the library\n ```python\n import pypdfium2 as pdfium\n ```\n\n* Open a PDF using the helper class `PdfDocument` (supports file path strings, bytes, and byte buffers)\n ```python\n pdf = pdfium.PdfDocument(\"./path/to/document.pdf\")\n version = pdf.get_version() # get the PDF standard version\n n_pages = len(pdf) # get the number of pages in the document\n ```\n\n* Render multiple pages concurrently\n ```python\n page_indices = [i for i in range(n_pages)] # all pages\n renderer = pdf.render(\n pdfium.PdfBitmap.to_pil,\n page_indices = page_indices,\n scale = 300/72, # 300dpi resolution\n )\n for i, image in zip(page_indices, renderer):\n image.save(\"out_%0*d.jpg\" % (n_digits, i))\n ```\n\n* Read the table of contents\n ```python\n for item in pdf.get_toc():\n \n if item.n_kids == 0:\n state = \"*\"\n elif item.is_closed:\n state = \"-\"\n else:\n state = \"+\"\n \n if item.page_index is None:\n target = \"?\"\n else:\n target = item.page_index + 1\n \n print(\n \" \" * item.level +\n \"[%s] %s -> %s # %s %s\" % (\n state, item.title, target, item.view_mode,\n [round(c, n_digits) for c in item.view_pos],\n )\n )\n ```\n\n* Load a page to work with\n ```python\n page = pdf[0]\n \n # Get page dimensions in PDF canvas units (1pt->1/72in by default)\n width, height = page.get_size()\n # Set the absolute page rotation to 90\u00b0 clockwise\n page.set_rotation(90)\n \n # Locate objects on the page\n for obj in 
page.get_objects():\n print(obj.level, obj.type, obj.get_pos())\n ```\n\n* Render a single page\n ```python\n bitmap = page.render(\n scale = 1, # 72dpi resolution\n rotation = 0, # no additional rotation\n # ... further rendering options\n )\n pil_image = bitmap.to_pil()\n pil_image.show()\n ```\n\n* Extract and search text\n ```python\n # Load a text page helper\n textpage = page.get_textpage()\n \n # Extract text from the whole page\n text_all = textpage.get_text_range()\n # Extract text from a specific rectangular area\n text_part = textpage.get_text_bounded(left=50, bottom=100, right=width-50, top=height-100)\n \n # Locate text on the page\n searcher = textpage.search(\"something\", match_case=False, match_whole_word=False)\n # This will be a list of bounding boxes of the form (left, bottom, right, top)\n first_occurrence = searcher.get_next()\n ```\n\n* Create a new PDF with an empty A4 sized page\n ```python\n pdf = pdfium.PdfDocument.new()\n width, height = (595, 842)\n page_a = pdf.new_page(width, height)\n ```\n\n* Include a JPEG image in a PDF\n ```python\n pdf = pdfium.PdfDocument.new()\n \n image = pdfium.PdfImage.new(pdf)\n image.load_jpeg(\"./tests/resources/mona_lisa.jpg\")\n metadata = image.get_metadata()\n \n matrix = pdfium.PdfMatrix().scale(metadata.width, metadata.height)\n image.set_matrix(matrix)\n \n page = pdf.new_page(metadata.width, metadata.height)\n page.insert_obj(image)\n page.gen_content()\n ```\n\n* Save the document\n ```python\n # PDF 1.7 standard\n pdf.save(\"output.pdf\", version=17)\n ```\n\n### Raw PDFium API\n\nWhile helper classes conveniently wrap the raw PDFium API, it may still be accessed directly and is available in the submodule `pypdfium2.raw`.\n\nSince PDFium is a large library, many components are not covered by helpers yet. You may seamlessly interact with the raw API while still using helpers where available. 
When used as ctypes function parameter, helper objects automatically resolve to the underlying raw object (but you may still access it explicitly if desired):\n```python\npermission_flags = pdfium.raw.FPDF_GetDocPermission(pdf.raw) # explicit\npermission_flags = pdfium.raw.FPDF_GetDocPermission(pdf) # implicit\n```\n\nFor PDFium documentation, please look at the comments in its [public header files](https://pdfium.googlesource.com/pdfium/+/refs/heads/main/public/).[^pdfium_docs]\nA large variety of examples on how to interface with the raw API using [`ctypes`](https://docs.python.org/3/library/ctypes.html) is already provided with [support model source code](src/pypdfium2/_helpers).\nNonetheless, the following guide may be helpful to get started with the raw API, especially for developers who are not familiar with `ctypes` yet.\n\n[^pdfium_docs]: Unfortunately, no recent HTML-rendered documentation is available for PDFium at the moment.\n\n<!-- TODO write something about weakref.finalize(); add example on creating a C page array -->\n\n* In general, PDFium functions can be called just like normal Python functions.\n However, parameters may only be passed positionally, i. e. 
it is not possible to use keyword arguments.\n There are no defaults, so you always need to provide a value for each argument.\n ```python\n # arguments: filepath (bytes), password (bytes|None)\n # null-terminate filepath and encode as UTF-8\n pdf = pdfium.FPDF_LoadDocument((filepath + \"\\x00\").encode(\"utf-8\"), None)\n ```\n This is the underlying bindings declaration,[^bindings_decl] which loads the function from the binary and\n contains the information required to convert Python types to their C equivalents.\n ```python\n if _libs[\"pdfium\"].has(\"FPDF_LoadDocument\", \"cdecl\"):\n FPDF_LoadDocument = _libs[\"pdfium\"].get(\"FPDF_LoadDocument\", \"cdecl\")\n FPDF_LoadDocument.argtypes = [FPDF_STRING, FPDF_BYTESTRING]\n FPDF_LoadDocument.restype = FPDF_DOCUMENT\n ```\n Python `bytes` are converted to `FPDF_STRING` by ctypes autoconversion.\n When passing a string to a C function, it must always be null-terminated, as the function merely receives a pointer to the first item and then continues to read memory until it finds a null terminator.\n \n[^bindings_decl]: From the auto-generated bindings file, which is not part of the repository. It is built into wheels, or created on installation. 
If you have an editable install, the bindings file may be found at `src/raw.py`.\n\n* While some functions are quite easy to use, things soon get more complex.\n First of all, function parameters are not only used for input, but also for output:\n ```python\n # Initialise an integer object (defaults to 0)\n c_version = ctypes.c_int()\n # Let the function assign a value to the c_int object, and capture its return code (True for success, False for failure)\n success = pdfium.FPDF_GetFileVersion(pdf, c_version)\n if success:\n # If successful, get the Python int by accessing the `value` attribute of the c_int object\n version = c_version.value\n else:\n # Otherwise, set the variable to None (in other cases, it may be desired to raise an exception instead)\n version = None\n ```\n\n* If an array is required as output parameter, you can initialise one like this (conceived in general terms):\n ```python\n # long form\n array_type = (c_type * array_length)\n array_object = array_type()\n # short form\n array_object = (c_type * array_length)()\n ```\n Example: Getting view mode and target position from a destination object returned by some other function.\n ```python\n # (Assuming `dest` is an FPDF_DEST)\n n_params = ctypes.c_ulong()\n # Create a C array to store up to four coordinates\n view_pos = (pdfium.FS_FLOAT * 4)()\n view_mode = pdfium.FPDFDest_GetView(dest, n_params, view_pos)\n # Convert the C array to a Python list and cut it down to the actual number of coordinates\n view_pos = list(view_pos)[:n_params.value]\n ```\n\n* For string output parameters, callers needs to provide a sufficiently long, pre-allocated buffer.\n This may work differently depending on what type the function requires, which encoding is used, whether the number of bytes or characters is returned, and whether space for a null terminator is included or not. 
Carefully review the documentation for the function in question to fulfill its requirements.\n \n Example A: Getting the title string of a bookmark.\n ```python\n # (Assuming `bookmark` is an FPDF_BOOKMARK)\n # First call to get the required number of bytes (not characters!), including space for a null terminator\n n_bytes = pdfium.FPDFBookmark_GetTitle(bookmark, None, 0)\n # Initialise the output buffer\n buffer = ctypes.create_string_buffer(n_bytes)\n # Second call with the actual buffer\n pdfium.FPDFBookmark_GetTitle(bookmark, buffer, n_bytes)\n # Decode to string, cutting off the null terminator\n # Encoding: UTF-16LE (2 bytes per character)\n title = buffer.raw[:n_bytes-2].decode('utf-16-le')\n ```\n \n Example B: Extracting text in given boundaries.\n ```python\n # (Assuming `textpage` is an FPDF_TEXTPAGE and the boundary variables are set)\n # Store common arguments for the two calls\n args = (textpage, left, top, right, bottom)\n # First call to get the required number of characters (not bytes!) 
- a possible null terminator is not included\n n_chars = pdfium.FPDFText_GetBoundedText(*args, None, 0)\n # If no characters were found, return an empty string\n if n_chars <= 0:\n return \"\"\n # Calculate the required number of bytes (UTF-16LE encoding again)\n n_bytes = 2 * n_chars\n # Initialise the output buffer - this function can work without null terminator, so skip it\n buffer = ctypes.create_string_buffer(n_bytes)\n # Re-interpret the type from char to unsigned short as required by the function\n buffer_ptr = ctypes.cast(buffer, ctypes.POINTER(ctypes.c_ushort))\n # Second call with the actual buffer\n pdfium.FPDFText_GetBoundedText(*args, buffer_ptr, n_chars)\n # Decode to string (You may want to pass `errors=\"ignore\"` to skip possible errors in the PDF's encoding)\n text = buffer.raw.decode(\"utf-16-le\")\n ```\n\n* Not only are there different ways of string output that need to be handled according to the requirements of the function in question.\n String input, too, can work differently depending on encoding and type.\n We have already discussed `FPDF_LoadDocument()`, which takes a UTF-8 encoded string as `char *`.\n A different examples is `FPDFText_FindStart()`, which needs a UTF-16LE encoded string, given as `unsigned short *`:\n ```python\n # (Assuming `text` is a str and `textpage` an FPDF_TEXTPAGE)\n # Add the null terminator and encode as UTF-16LE\n enc_text = (text + \"\\x00\").encode(\"utf-16-le\")\n # cast `enc_text` to a c_ushort pointer\n text_ptr = ctypes.cast(enc_text, ctypes.POINTER(ctypes.c_ushort))\n search = pdfium.FPDFText_FindStart(textpage, text_ptr, 0, 0)\n ```\n\n* Leaving strings, let's suppose you have a C memory buffer allocated by PDFium and wish to read its data.\n PDFium will provide you with a pointer to the first item of the byte array.\n To access the data, you'll want to re-interpret the pointer using `ctypes.cast()` to encompass the whole array:\n ```python\n # (Assuming `bitmap` is an FPDF_BITMAP and `size` is the 
expected number of bytes in the buffer)\n first_item = pdfium.FPDFBitmap_GetBuffer(bitmap)\n buffer = ctypes.cast(first_item, ctypes.POINTER(ctypes.c_ubyte * size))\n # Buffer as ctypes array (referencing the original buffer, will be unavailable as soon as the bitmap is destroyed)\n c_array = buffer.contents\n # Buffer as Python bytes (independent copy)\n data = bytes(c_array)\n ```\n\n* Writing data from Python into a C buffer works in a similar fashion:\n ```python\n # (Assuming `first_item` is a pointer to the first item of a C buffer to write into,\n # `size` the number of bytes it can store, and `py_buffer` a Python byte buffer)\n c_buffer = ctypes.cast(first_item, ctypes.POINTER(ctypes.c_char * size))\n # Read from the Python buffer, starting at its current position, directly into the C buffer\n # (until the target is full or the end of the source is reached)\n n_bytes = py_buffer.readinto(c_buffer.contents) # returns the number of bytes read\n ```\n\n* If you wish to check whether two objects returned by PDFium are the same, the `is` operator won't help you because `ctypes` does not have original object return (OOR),\n i. e. 
new, equivalent Python objects are created each time, although they might represent one and the same C object.[^ctypes_no_oor] That's why you'll want to use `ctypes.addressof()` to get the memory addresses of the underlying C object.\n For instance, this is used to avoid infinite loops on circular bookmark references when iterating through the document outline:\n ```python\n # (Assuming `pdf` is an FPDF_DOCUMENT)\n seen = set()\n bookmark = pdfium.FPDFBookmark_GetFirstChild(pdf, None)\n while bookmark:\n # bookmark is a pointer, so we need to use its `contents` attribute to get the object the pointer refers to\n # (otherwise we'd only get the memory address of the pointer itself, which would result in random behaviour)\n address = ctypes.addressof(bookmark.contents)\n if address in seen:\n break # circular reference detected\n else:\n seen.add(address)\n bookmark = pdfium.FPDFBookmark_GetNextSibling(pdf, bookmark)\n ```\n \n [^ctypes_no_oor]: Confer the [ctypes documentation on Pointers](https://docs.python.org/3/library/ctypes.html#pointers).\n\n* In many situations, callback functions come in handy.[^callback_usecases] Thanks to `ctypes`, it is seamlessly possible to use callbacks across Python/C language boundaries.\n \n [^callback_usecases]: e. g. incremental reading/writing, progress bars, pausing of progressive tasks, ...\n \n Example: Loading a document from a Python buffer. 
This way, file access can be controlled in Python while the whole data does not need to be in memory at once.\n ```python\n # Factory class to create callable objects holding a reference to a Python buffer\n class _reader_class:\n \n def __init__(self, py_buffer):\n self.py_buffer = py_buffer\n \n def __call__(self, _, position, p_buf, size):\n # Write data from Python buffer into C buffer, as explained before\n c_buffer = ctypes.cast(p_buf, ctypes.POINTER(ctypes.c_char * size))\n self.py_buffer.seek(position)\n self.py_buffer.readinto(c_buffer.contents)\n return 1 # non-zero return code for success\n \n # (Assuming py_buffer is a Python file buffer, e. g. io.BufferedReader)\n # Get the length of the buffer\n py_buffer.seek(0, 2)\n file_len = py_buffer.tell()\n py_buffer.seek(0)\n \n # Set up an interface structure for custom file access\n fileaccess = pdfium.FPDF_FILEACCESS()\n fileaccess.m_FileLen = file_len\n # CFUNCTYPE declaration copied from the bindings file (unfortunately, this is not applied automatically)\n functype = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.POINTER(None), ctypes.c_ulong, ctypes.POINTER(ctypes.c_ubyte), ctypes.c_ulong)\n # Instantiate a callable object, wrapped with the CFUNCTYPE declaration\n fileaccess.m_GetBlock = functype( _reader_class(py_buffer) )\n # Finally, load the document\n pdf = pdfium.FPDF_LoadCustomDocument(fileaccess, None)\n ```\n\n* When using the raw API, special care needs to be taken regarding object lifetime, considering that Python may garbage collect objects as soon as their reference count reaches zero. However, the interpreter has no way of magically knowing how long the underlying resources of a Python object might still be needed on the C side, so measures need to be taken to keep such objects referenced until PDFium does not depend on them anymore.\n \n If resources need to remain valid after the time of a function call, PDFium documentation usually indicates this clearly. 
Ignoring requirements on object lifetime will lead to memory corruption (commonly resulting in a segmentation fault).\n \n For instance, the documentation on `FPDF_LoadCustomDocument()` states that\n > The application must keep the file resources |pFileAccess| points to valid until the returned FPDF_DOCUMENT is closed. |pFileAccess| itself does not need to outlive the FPDF_DOCUMENT.\n \n This means that the callback function and the Python buffer need to be kept alive as long as the `FPDF_DOCUMENT` is used.\n This can be achieved by referencing these objects in an accompanying class, e. g.\n \n ```python\n class PdfDataHolder:\n \n def __init__(self, buffer, function):\n self.buffer = buffer\n self.function = function\n \n def close(self):\n # Make sure both objects remain available until this function is called\n # No-op id() call to denote that the object needs to stay in memory up to this point\n id(self.function)\n self.buffer.close()\n \n # ... set up an FPDF_FILEACCESS structure\n \n # (Assuming `py_buffer` is the buffer and `fileaccess` the FPDF_FILEACCESS interface)\n data_holder = PdfDataHolder(py_buffer, fileaccess.m_GetBlock)\n pdf = pdfium.FPDF_LoadCustomDocument(fileaccess, None)\n \n # ... 
work with the pdf\n \n # Close the PDF to free resources\n pdfium.FPDF_CloseDocument(pdf)\n # Close the data holder, to keep the object itself and thereby the objects it\n # references alive up to this point, as well as to release the buffer\n data_holder.close()\n ```\n\n* Finally, let's finish this guide with an example on how to render the first page of a document to a `PIL` image in `RGBA` color format.\n ```python\n import math\n import ctypes\n import os.path\n import PIL.Image\n import pypdfium2 as pdfium\n \n # Load the document\n filepath = os.path.abspath(\"tests/resources/render.pdf\")\n pdf = pdfium.FPDF_LoadDocument(filepath, None)\n \n # Check page count to make sure it was loaded correctly\n page_count = pdfium.FPDF_GetPageCount(pdf)\n assert page_count >= 1\n \n # Load the first page and get its dimensions\n page = pdfium.FPDF_LoadPage(pdf, 0)\n width = math.ceil(pdfium.FPDF_GetPageWidthF(page))\n height = math.ceil(pdfium.FPDF_GetPageHeightF(page))\n \n # Create a bitmap\n use_alpha = False # We don't render with transparent background\n bitmap = pdfium.FPDFBitmap_Create(width, height, int(use_alpha))\n # Fill the whole bitmap with a white background\n # The color is given as a 32-bit integer in ARGB format (8 bits per channel)\n pdfium.FPDFBitmap_FillRect(bitmap, 0, 0, width, height, 0xFFFFFFFF)\n \n # Store common rendering arguments\n render_args = (\n bitmap, # the bitmap\n page, # the page\n # positions and sizes are to be given in pixels and may exceed the bitmap\n 0, # left start position\n 0, # top start position\n width, # horizontal size\n height, # vertical size\n 0, # rotation (as constant, not in degrees!)\n pdfium.FPDF_LCD_TEXT | pdfium.FPDF_ANNOT, # rendering flags, combined with binary or\n )\n \n # Render the page\n pdfium.FPDF_RenderPageBitmap(*render_args)\n \n # Get a pointer to the first item of the buffer\n first_item = pdfium.FPDFBitmap_GetBuffer(bitmap)\n # Re-interpret the pointer to encompass the whole buffer\n buffer = 
ctypes.cast(first_item, ctypes.POINTER(ctypes.c_ubyte * (width * height * 4)))\n \n # Create a PIL image from the buffer contents\n img = PIL.Image.frombuffer(\"RGBA\", (width, height), buffer.contents, \"raw\", \"BGRA\", 0, 1)\n # Save it as file\n img.save(\"out.png\")\n \n # Free resources\n pdfium.FPDFBitmap_Destroy(bitmap)\n pdfium.FPDF_ClosePage(page)\n pdfium.FPDF_CloseDocument(pdf)\n ```\n\n### [Command-line Interface](https://pypdfium2.readthedocs.io/en/stable/shell_api.html)\n\npypdfium2 also ships with a simple command-line interface, providing access to key features of the support model in a shell environment (e. g. rendering, content extraction, document inspection, page rearranging, ...).\n\nThe primary motivation behind this is to have a nice testing interface, but it may be helpful in a variety of other situations as well.\nUsage should be largely self-explanatory, assuming a minimum of familiarity with the command-line.\n\n\n## Licensing\n\nPDFium and pypdfium2 are available by the terms and conditions of either [`Apache-2.0`](LICENSES/Apache-2.0.txt) or [`BSD-3-Clause`](LICENSES/BSD-3-Clause.txt), at your choice.\nVarious other open-source licenses apply to dependencies bundled with PDFium. 
Verbatim copies of their respective licenses are contained in the file [`LicenseRef-PdfiumThirdParty.txt`](LICENSES/LicenseRef-PdfiumThirdParty.txt), which also has to be shipped with binary redistributions.\nDocumentation and examples of pypdfium2 are licensed under [`CC-BY-4.0`](LICENSES/CC-BY-4.0.txt).\n\npypdfium2 complies with the [reuse standard](https://reuse.software/spec/) by including [SPDX](https://spdx.org/licenses/) headers in source files, and license information for data files in [`.reuse/dep5`](.reuse/dep5).\n\nTo the author's knowledge, pypdfium2 is one of the rare Python libraries that are capable of PDF rendering while not being covered by copyleft licenses (such as the `GPL`).[^liberal_pdf_renderlibs]\n\nAs of early 2023, a single developer is author and rightsholder of the code base (apart from a few minor [code contributions](https://github.com/pypdfium2-team/pypdfium2/graphs/contributors)).\n\n[^liberal_pdf_renderlibs]: The only other liberal-licensed PDF rendering libraries known to the authors are [`pdf.js`](https://github.com/mozilla/pdf.js/) (JavaScript) and [`Apache PDFBox`](https://github.com/apache/pdfbox) (Java). `pdf.js` is limited to a web environment. Creating Python bindings to `PDFBox` might be possible but there is no serious solution yet (apart from amateurish wrappers around its command-line API).\n\n\n## Issues\n\nWhile using pypdfium2, you might encounter bugs or missing features.\nIn this case, please file an issue report. 
Remember to include applicable details such as tracebacks, operating system and CPU architecture, as well as the versions of pypdfium2 and used dependencies.\n\nIn case your issue could be tracked down to a third-party dependency, we will accompany or conduct subsequent measures.\n\nHere is a roadmap of relevant places:\n* pypdfium2\n - [Issues panel](https://github.com/pypdfium2-team/pypdfium2/issues): Initial reports of specific issues.\n May need to be transferred to other projects if not caused by or fixable in pypdfium2 code alone.\n - [Discussions page](https://github.com/pypdfium2-team/pypdfium2/discussions): General questions and suggestions.\n - In case you do not want to publicly disclose the issue or your code, you may also contact the maintainers privately via e-mail.\n* PDFium\n - [Bug tracker](https://bugs.chromium.org/p/pdfium/issues/list): Defects in PDFium.\n Beware: The bridge between Python and C increases the probability of integration issues or API misuse.\n The symptoms can often make it look like a PDFium bug while it is not. 
In some cases, this may be quite difficult to distinguish.\n - [Mailing list](https://groups.google.com/g/pdfium/): Questions regarding PDFium usage.\n* [pdfium-binaries](https://github.com/bblanchon/pdfium-binaries/issues): Binary builder.\n* [ctypesgen](https://github.com/ctypesgen/ctypesgen/issues): Bindings generator.\n\n### Known limitations\n\npypdfium2 also has some drawbacks, of which you will be informed below.\n\n#### Incompatibility with CPython 3.7.6 and 3.8.1\n\npypdfium2 built with mainstream ctypesgen cannot be used with releases 3.7.6 and 3.8.1 of the CPython interpreter due to a [regression](https://github.com/python/cpython/pull/16799#issuecomment-612353119) that [broke](https://github.com/ctypesgen/ctypesgen/issues/77) ctypesgen-created string handling code.\n\nHowever, we are currently [making efforts](https://github.com/ctypesgen/ctypesgen/pull/162) to remove ctypesgen's wonky string code.\nSince version 4, pypdfium2 releases will be built with a patched variant of ctypesgen.\n\n#### Risk of unknown object lifetime violations\n\nAs outlined in the raw API section, it is essential that Python-managed resources remain available as long as they are needed by PDFium.\n\nThe problem is that the Python interpreter may garbage collect objects with reference count zero at any time. Thus, it can happen that an unreferenced but still required object by chance stays around long enough before it is garbage collected. 
Such dangling objects are likely to cause non-deterministic segmentation faults.\nIf the timeframe between reaching reference count zero and removal is sufficiently large and roughly consistent across different runs, it is even possible that mistakes regarding object lifetime remain unnoticed for a long time.\n\nAlthough great care has been taken while developing the support model, it cannot be fully excluded that unknown object lifetime violations are still lurking around somewhere, especially if unexpected requirements were not documented by the time the code was written.\n\n#### No direct access to raw PDF data structure\n\n<!-- https://crbug.com/pdfium/1694 -->\n\nPDFium does not currently provide direct access to the raw PDF data structure. It does not publicly expose APIs to read/write PDF dictionaries, name trees, etc. Instead, it merely offers a variety of higher-level functions to modify PDFs. While these are certainly useful to abstract some of the format's complexity and avoid the creation of invalid PDFs, the lack of public instruments for raw access considerably limits the library's potential. If PDFium's capabilities are not sufficient for your use case, or you just wish to work with the raw PDF structure on your own, you may want to consider other products such as [`pikepdf`](https://github.com/pikepdf/pikepdf) to use instead of, or in conjunction with, pypdfium2.\n\n\n## Development\n\nThis section contains some key information relevant for project maintainers.\n\n<!-- TODO wheel tags, maintainer access, GitHub peculiarities -->\n\n### Documentation\n\npypdfium2 provides API documentation using [Sphinx](https://github.com/sphinx-doc/sphinx/). 
It can be rendered to various formats, including HTML:\n```bash\nsphinx-build -b html ./docs/source ./docs/build/html/\n```\n\nBuilt documentation is primarily hosted on [`readthedocs.org`](https://readthedocs.org/projects/pypdfium2/).\nIt may be configured using a [`.readthedocs.yaml`](.readthedocs.yaml) file (see [instructions](https://docs.readthedocs.io/en/stable/config-file/v2.html)), and the administration page on the web interface.\nRTD supports hosting multiple versions, so we currently have one linked to the `main` branch and another to `stable`.\nNew builds are automatically triggered by a webhook whenever you push to a linked branch.\n\nAdditionally, one documentation build can also be hosted on [GitHub Pages](https://pypdfium2-team.github.io/pypdfium2/index.html).\nIt is implemented with a CI workflow, which is currently linked to `main` and triggered on push as well.\nThis provides us with full control over the build environment and the used commands, whereas RTD is kind of limited in this regard.\n\n\n### Testing\n\npypdfium2 contains a small test suite to verify the library's functionality. It is written with [pytest](https://github.com/pytest-dev/pytest/):\n```bash\npython3 -m pytest tests/ tests_old/\n```\n\nNote that ...\n* you can pass `-sv` to get more detailed output.\n* `$DEBUG_AUTOCLOSE=1` may be set to get debugging information on automatic object finalization.\n\nTo get code coverage statistics, you can run\n```bash\nmake coverage\n```\n\nSometimes, it can also be helpful to test code on many PDFs.[^testing_corpora]\nIn this case, the command-line interface and `find` come in handy:\n```bash\n# Example A: Analyse PDF images (in the current working directory)\nfind . -name '*.pdf' -exec bash -c \"echo \\\"{}\\\" && pypdfium2 pageobjects \\\"{}\\\" --types image\" \\;\n# Example B: Parse PDF table of contents\nfind . 
-name '*.pdf' -exec bash -c \"echo \\\"{}\\\" && pypdfium2 toc \\\"{}\\\"\" \\;\n```\n\n[^testing_corpora]: For instance, one could use the testing corpora of open-source PDF libraries (pdfium, pikepdf/ocrmypdf, mupdf/ghostscript, tika/pdfbox, pdfjs, ...)\n\n### Release workflow\n\nThe release process is fully automated using Python scripts and a CI setup for GitHub Actions.\nA new release is triggered every Tuesday, one day after `pdfium-binaries`.\nYou may also trigger the workflow manually using the GitHub Actions panel or the [`gh`](https://cli.github.com/) command-line tool.\n\nPython release scripts are located in the folder `setupsrc/pypdfium2_setup`, along with custom setup code:\n* `update_pdfium.py` downloads binaries and generates the bindings.\n* `craft_packages.py` builds platform-specific wheel packages and a source distribution suitable for PyPI upload.\n* `autorelease.py` takes care of versioning, changelog, release note generation and VCS checkin.\n\nThe autorelease script has some peculiarities maintainers should know about:\n* The changelog for the next release shall be written into `docs/devel/changelog_staging.md`.\n On release, it will be moved into the main changelog under `docs/source/changelog.md`, annotated with the PDFium version update.\n It will also be shown on the GitHub release page.\n* pypdfium2 versioning uses the pattern `major.minor.patch`, optionally with an appended beta mark (e. g. 
`2.7.1`, `2.11.0`, `3.0.0b1`, ...).\n Version changes are based on the following logic:\n * If PDFium was updated, the minor version is incremented.\n * If only pypdfium2 code was updated, the patch version is incremented instead.\n * Major updates and beta marks are controlled via empty files in the `autorelease/` directory.\n If `update_major.txt` exists, the major version is incremented.\n If `update_beta.txt` exists, a new beta tag is set, or an existing one is incremented.\n These files are removed automatically once the release is finished.\n * If switching from a beta release to a non-beta release, only the beta mark is removed while minor and patch versions remain unchanged.\n\nIn case of necessity, you may also forego autorelease/CI and do the release manually, which will roughly work like this (though ideally it should never be needed):\n* Commit changes to the version file\n ```bash\n git add src/pypdfium2/version.py\n git commit -m \"increment version\"\n git push\n ```\n* Create a new tag that matches the version file\n ```bash\n # substitute $VERSION accordingly\n git tag -a $VERSION\n git push --tags\n ```\n* Build the packages\n ```bash\n python3 setupsrc/pypdfium2_setup/update_pdfium.py\n python3 setupsrc/pypdfium2_setup/craft_packages.py\n ```\n* Upload to PyPI\n ```bash\n # make sure the packages are valid\n twine check dist/*\n # upload to PyPI (this will interactively ask for your username/password)\n twine upload dist/*\n ```\n* Update the `stable` branch to trigger a documentation rebuild\n ```bash\n git checkout stable\n git rebase origin/main # alternatively: git reset --hard main\n git checkout main\n ```\n\nIf something went wrong with commit or tag, you can still revert the changes:\n```bash\n# perform an interactive rebase to change history (substitute $N_COMMITS with the number of commits to drop or modify)\ngit rebase -i HEAD~$N_COMMITS\ngit push --force\n# delete local tag (substitute $TAGNAME accordingly)\ngit tag -d $TAGNAME\n# 
delete remote tag\ngit push --delete origin $TAGNAME\n```\nFaulty PyPI releases may be yanked using the web interface.\n\n\n## Thanks to[^thanks_to]\n\n<!-- order: alphabetical by surname -->\n\n* [Beno\u00eet Blanchon](https://github.com/bblanchon): Author of [PDFium binaries](https://github.com/bblanchon/pdfium-binaries/) and [patches](sourcebuild/patches/).\n* [Anderson Bravalheri](https://github.com/abravalheri): Help with PEP 517/518 compliance. Hint to use an environment variable rather than separate setup files.\n* [Bastian Germann](https://github.com/bgermann): Help with inclusion of licenses for third-party components of PDFium.\n* [Tim Head](https://github.com/betatim): Original idea for Python bindings to PDFium with ctypesgen in `wowpng`.\n* [Yinlin Hu](https://github.com/YinlinHu): `pypdfium` prototype and `kuafu` PDF viewer.\n* [Adam Huganir](https://github.com/adam-huganir): Help with maintenance and development decisions since the beginning of the project.\n* [kobaltcore](https://github.com/kobaltcore): Bug fix for `PdfDocument.save()`.\n* [Mike Kroutikov](https://github.com/mkroutikov): Examples on how to use PDFium with ctypes in `redstork` and `pdfbrain`.\n* [Peter Saalbrink](https://github.com/petersaalbrink): Code style improvements to the multipage renderer.\n\n... and further [code contributors](https://github.com/pypdfium2-team/pypdfium2/graphs/contributors) (GitHub stats).\n\n*If you have somehow contributed to this project but we forgot to mention you here, please let us know.*\n\n[^thanks_to]: People listed in this section may not necessarily have contributed any copyrightable code to the repository. Some have rather helped with ideas, or contributions to dependencies of pypdfium2.\n\n\n## History\n\npypdfium2 is the successor of *pypdfium* and *pypdfium-reboot*.\n\nInspired by *wowpng*, the first known proof of concept Python binding to PDFium using ctypesgen, the initial *pypdfium* package was created. 
It had to be updated manually, which did not happen frequently. There were no platform-specific wheels, only a single wheel that contained binaries for 64-bit Linux, Windows and macOS.\n\n*pypdfium-reboot* then added a script to automate binary deployment and bindings generation to simplify regular updates. However, it was still not platform-specific.\n\npypdfium2 is a full rewrite of *pypdfium-reboot* to build platform-specific wheels and consolidate the setup scripts. Further additions include ...\n* A CI workflow to automatically release new wheels every Tuesday\n* Support models that conveniently wrap the raw PDFium/ctypes API\n* Test code\n* A script to build PDFium from source\n",
"bugtrack_url": null,
"license": "Apache-2.0 or BSD-3-Clause",
"summary": "Python bindings to PDFium",
"version": "4.3.0",
"split_keywords": [
"pdf",
"pdfium"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "bedaf4efcc3c0a1ed5d45e1f37c301ddd6701f5cf1c240f9f0a5e2d0923bc920",
"md5": "ba6f7235a504cc8645a5ff3bf0fdbc7a",
"sha256": "89284955d4d60e1ef22a74be6fdadb13e9b6f70f461fa8b5786d16830a14004e"
},
"downloads": -1,
"filename": "pypdfium2-4.3.0-py3-none-macosx_10_13_x86_64.whl",
"has_sig": false,
"md5_digest": "ba6f7235a504cc8645a5ff3bf0fdbc7a",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.6",
"size": 2855047,
"upload_time": "2023-03-21T04:05:45",
"upload_time_iso_8601": "2023-03-21T04:05:45.668712Z",
"url": "https://files.pythonhosted.org/packages/be/da/f4efcc3c0a1ed5d45e1f37c301ddd6701f5cf1c240f9f0a5e2d0923bc920/pypdfium2-4.3.0-py3-none-macosx_10_13_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "63eb93770678c0afdf606f05043c4a99a5c9646a59d6da20257095777fe96cfb",
"md5": "ced4fa25709f494e89617c011d6d3151",
"sha256": "80008e4af8ff486f8d92177842eecc75542ae3f10db59dab68737051075254bb"
},
"downloads": -1,
"filename": "pypdfium2-4.3.0-py3-none-macosx_11_0_arm64.whl",
"has_sig": false,
"md5_digest": "ced4fa25709f494e89617c011d6d3151",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.6",
"size": 2743588,
"upload_time": "2023-03-21T04:05:48",
"upload_time_iso_8601": "2023-03-21T04:05:48.501582Z",
"url": "https://files.pythonhosted.org/packages/63/eb/93770678c0afdf606f05043c4a99a5c9646a59d6da20257095777fe96cfb/pypdfium2-4.3.0-py3-none-macosx_11_0_arm64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "d5ad5dcfd258048ddc1a430e54905e9dddf0f622462c6749c10835679a98417d",
"md5": "5a3910a30c101d24b94ea8085cc7572e",
"sha256": "360ff481685f2d9c4d3b92a5f7df8ea7fc7f584d655b675479950d8f0034f65d"
},
"downloads": -1,
"filename": "pypdfium2-4.3.0-py3-none-manylinux_2_26_aarch64.whl",
"has_sig": false,
"md5_digest": "5a3910a30c101d24b94ea8085cc7572e",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.6",
"size": 2839801,
"upload_time": "2023-03-21T04:05:50",
"upload_time_iso_8601": "2023-03-21T04:05:50.248546Z",
"url": "https://files.pythonhosted.org/packages/d5/ad/5dcfd258048ddc1a430e54905e9dddf0f622462c6749c10835679a98417d/pypdfium2-4.3.0-py3-none-manylinux_2_26_aarch64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "76cac4c4ad028b5904a30c1c7102a4de1a176df245984ac1408d484f19197016",
"md5": "ec7ad11a045b62eb2df43e4d1351b9d0",
"sha256": "71d208a42267a7ff3c9451ade2af215bcb6bc4c0c21d83d580f976efae233222"
},
"downloads": -1,
"filename": "pypdfium2-4.3.0-py3-none-manylinux_2_26_armv7l.whl",
"has_sig": false,
"md5_digest": "ec7ad11a045b62eb2df43e4d1351b9d0",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.6",
"size": 2606067,
"upload_time": "2023-03-21T04:05:53",
"upload_time_iso_8601": "2023-03-21T04:05:53.058023Z",
"url": "https://files.pythonhosted.org/packages/76/ca/c4c4ad028b5904a30c1c7102a4de1a176df245984ac1408d484f19197016/pypdfium2-4.3.0-py3-none-manylinux_2_26_armv7l.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "4a53ee4e1c83f203f5cab8997aa9c4211fe3a7f2ab919afa6f545d6abd66d86c",
"md5": "c491eeb80a788792a554baee0a3d881b",
"sha256": "71d9019baa659f67f01f78d130aa9558397f1009a65ea26a0e5b0acf447d2d49"
},
"downloads": -1,
"filename": "pypdfium2-4.3.0-py3-none-manylinux_2_26_i686.whl",
"has_sig": false,
"md5_digest": "c491eeb80a788792a554baee0a3d881b",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.6",
"size": 2888177,
"upload_time": "2023-03-21T04:05:55",
"upload_time_iso_8601": "2023-03-21T04:05:55.332471Z",
"url": "https://files.pythonhosted.org/packages/4a/53/ee4e1c83f203f5cab8997aa9c4211fe3a7f2ab919afa6f545d6abd66d86c/pypdfium2-4.3.0-py3-none-manylinux_2_26_i686.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "ddb1c7c4a30f86d9fbcdec59e339eccbcb94ef9d4fdae29aba252a77b8f574bc",
"md5": "25a87b7f3407acf4d48d226260e6c145",
"sha256": "45d99532e15503f7c8e515cbc001b8fae2aae6af0a09c3ee02ca20b50564b833"
},
"downloads": -1,
"filename": "pypdfium2-4.3.0-py3-none-manylinux_2_26_x86_64.whl",
"has_sig": false,
"md5_digest": "25a87b7f3407acf4d48d226260e6c145",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.6",
"size": 2845213,
"upload_time": "2023-03-21T04:05:57",
"upload_time_iso_8601": "2023-03-21T04:05:57.505127Z",
"url": "https://files.pythonhosted.org/packages/dd/b1/c7c4a30f86d9fbcdec59e339eccbcb94ef9d4fdae29aba252a77b8f574bc/pypdfium2-4.3.0-py3-none-manylinux_2_26_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "534d60f698559b829cda75e35480888a7d886a1f463ebac050026d7425ac1428",
"md5": "7b5f7fb0769bb7a0fceb00ca05ed0c9e",
"sha256": "36aa87b60349d5e1de45e53967a756383e64f0a3608e14c3f2739df86dc546ea"
},
"downloads": -1,
"filename": "pypdfium2-4.3.0-py3-none-musllinux_1_2_i686.whl",
"has_sig": false,
"md5_digest": "7b5f7fb0769bb7a0fceb00ca05ed0c9e",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.6",
"size": 2890625,
"upload_time": "2023-03-21T04:05:59",
"upload_time_iso_8601": "2023-03-21T04:05:59.672761Z",
"url": "https://files.pythonhosted.org/packages/53/4d/60f698559b829cda75e35480888a7d886a1f463ebac050026d7425ac1428/pypdfium2-4.3.0-py3-none-musllinux_1_2_i686.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "c004c8f835b57b74d426ca3b885f55d8975edc3d0a28869e1dee7252bdde1ff4",
"md5": "cdae381585ef83eb13e6ad1057171b30",
"sha256": "aef9c0e24cdaca62523557fc5323bccdff6b19157dd91f99fa5e1ce457a10a6b"
},
"downloads": -1,
"filename": "pypdfium2-4.3.0-py3-none-musllinux_1_2_x86_64.whl",
"has_sig": false,
"md5_digest": "cdae381585ef83eb13e6ad1057171b30",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.6",
"size": 2860960,
"upload_time": "2023-03-21T04:06:02",
"upload_time_iso_8601": "2023-03-21T04:06:02.776050Z",
"url": "https://files.pythonhosted.org/packages/c0/04/c8f835b57b74d426ca3b885f55d8975edc3d0a28869e1dee7252bdde1ff4/pypdfium2-4.3.0-py3-none-musllinux_1_2_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "68a8d99acff156abc716a6a831984c9192c479bc14c12678a8f67b409c589764",
"md5": "1684a6395a4cb3caadcc37bfd5e1b452",
"sha256": "a0d73b4e1c32b0b39373fc1788c776ad4b87db82742558dae9da1050d9a8c17d"
},
"downloads": -1,
"filename": "pypdfium2-4.3.0-py3-none-win32.whl",
"has_sig": false,
"md5_digest": "1684a6395a4cb3caadcc37bfd5e1b452",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.6",
"size": 2601773,
"upload_time": "2023-03-21T04:06:04",
"upload_time_iso_8601": "2023-03-21T04:06:04.767441Z",
"url": "https://files.pythonhosted.org/packages/68/a8/d99acff156abc716a6a831984c9192c479bc14c12678a8f67b409c589764/pypdfium2-4.3.0-py3-none-win32.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "4a7aeed7cc98a61b2a2f4d075a036a9b1fefa971dc794140ed3c99c53fa69b93",
"md5": "aa10ef75b12e00f8153de096fa650ef3",
"sha256": "a3d0237840efc91053a3c0c45b2e943629c46e4ed558a9068acf3e975780f4ee"
},
"downloads": -1,
"filename": "pypdfium2-4.3.0-py3-none-win_amd64.whl",
"has_sig": false,
"md5_digest": "aa10ef75b12e00f8153de096fa650ef3",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.6",
"size": 2714872,
"upload_time": "2023-03-21T04:06:06",
"upload_time_iso_8601": "2023-03-21T04:06:06.401025Z",
"url": "https://files.pythonhosted.org/packages/4a/7a/eed7cc98a61b2a2f4d075a036a9b1fefa971dc794140ed3c99c53fa69b93/pypdfium2-4.3.0-py3-none-win_amd64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "e0f2804e1e752513642efdac06604c59bc09e6e5daf6d80f0811b77ea1bb3002",
"md5": "acc1eee46c781a74c9bde8b7f327f8fc",
"sha256": "c09d021398e35f0ee77ec3c630e58fc6740d7580e8e10c4c9c477e8d3f94dc71"
},
"downloads": -1,
"filename": "pypdfium2-4.3.0-py3-none-win_arm64.whl",
"has_sig": false,
"md5_digest": "acc1eee46c781a74c9bde8b7f327f8fc",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.6",
"size": 2584211,
"upload_time": "2023-03-21T04:06:08",
"upload_time_iso_8601": "2023-03-21T04:06:08.607619Z",
"url": "https://files.pythonhosted.org/packages/e0/f2/804e1e752513642efdac06604c59bc09e6e5daf6d80f0811b77ea1bb3002/pypdfium2-4.3.0-py3-none-win_arm64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "295079112f1b16e7f4539a58d69619a5d6254f4919b87d7c0ea357426b00d81b",
"md5": "8beef85a6f000fab489a20c9bbf55e8e",
"sha256": "81f7b6319f141a30edbba89953be83623c90974cc9bc819d910d525ad9df7d0d"
},
"downloads": -1,
"filename": "pypdfium2-4.3.0.tar.gz",
"has_sig": false,
"md5_digest": "8beef85a6f000fab489a20c9bbf55e8e",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.6",
"size": 115949,
"upload_time": "2023-03-21T04:06:10",
"upload_time_iso_8601": "2023-03-21T04:06:10.623373Z",
"url": "https://files.pythonhosted.org/packages/29/50/79112f1b16e7f4539a58d69619a5d6254f4919b87d7c0ea357426b00d81b/pypdfium2-4.3.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-03-21 04:06:10",
"github": false,
"gitlab": false,
"bitbucket": false,
"lcname": "pypdfium2"
}