Working with interactive forms

pikepdf provides two interfaces for working with interactive forms. There is a low-level interface, pikepdf.AcroForm, which is exposed as the pikepdf.Pdf.acroform property. There is also a higher-level interface available in the pikepdf.form module, which provides several abstractions to make usage easier.

Extracting Form Data

It is relatively easy to extract basic form data from a PDF.

>>> from pikepdf.form import Form

>>> form = Form(pdf)

>>> data = {}

>>> for field_name, field in form.items():
...    if field.is_text or field.is_choice or field.is_radio_button:
...        data[field_name] = field.value
...    elif field.is_checkbox:
...        data[field_name] = field.checked
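
The loop above can be wrapped in a reusable helper. The following is a minimal sketch rather than part of pikepdf: extract_form_data and to_json are hypothetical names, and the helper relies only on the field properties used in the loop. Passing default=str to json.dumps is an assumption, made so that values JSON cannot encode directly (such as pikepdf.Name objects) are written in their string form.

```python
import json


def extract_form_data(form):
    """Collect values from any object that offers items() like pikepdf.form.Form."""
    data = {}
    for field_name, field in form.items():
        if field.is_text or field.is_choice or field.is_radio_button:
            data[field_name] = field.value
        elif field.is_checkbox:
            data[field_name] = field.checked
        # Any other field type is skipped, exactly as in the loop above.
    return data


def to_json(data):
    # Assumption: values json cannot encode natively are written via str().
    return json.dumps(data, default=str, sort_keys=True)
```

With a real document, the call would look like to_json(extract_form_data(Form(pdf))).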

Inspecting the Form

The form allows retrieving specific named fields via dict-like access. There are several useful properties common to all fields. The most useful of these are:

  • alternate_name, which is a human-readable label for the field.

  • fully_qualified_name, which is the machine-readable key that identifies this field.

  • is_required

  • is_text

  • is_checkbox

  • is_radio_button

  • is_pushbutton

  • is_choice

>>> field = form['MyField']

>>> field.fully_qualified_name
"MyField"

>>> field.alternate_name
"Applicant's first given name"

>>> field.is_text
True

>>> field.is_required
False
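
The common properties can be combined into a short summary of a field, which is handy when exploring an unfamiliar form. This is a sketch under the assumption that the field exposes the properties listed above; field_kind and describe_field are hypothetical helper names, not pikepdf functions.

```python
FIELD_KINDS = ("text", "checkbox", "radio_button", "pushbutton", "choice")


def field_kind(field):
    """Return the first kind whose is_<kind> property is true, else "other"."""
    for kind in FIELD_KINDS:
        if getattr(field, f"is_{kind}", False):
            return kind
    return "other"


def describe_field(field):
    """Summarize a field using only the common properties listed above."""
    return {
        "name": field.fully_qualified_name,
        "label": field.alternate_name,
        "kind": field_kind(field),
        "required": field.is_required,
    }
```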

Fields with duplicate names are supported. Accessing them by name returns a list of fields instead of a single field. Accessing attributes directly on this list (e.g. field.value) will proxy to the first field in the list.

Filling Form Data

Before filling a form, you will need to determine how you will deal with appearance streams. In addition to merely holding values, PDF form fields must explicitly declare how the filled-in value should look. This is known as the appearance stream. There are several options available.

First, you may choose not to generate appearance streams at all. Most full-fat PDF readers are capable of generating these appearance streams themselves, so depending on your use case it may be acceptable to leave appearance stream generation to the end-user application. This is the default behavior of the pikepdf.form.Form class.

If you do need or want to generate appearance streams, you must provide the class you wish to use to accomplish this task. There are two possible implementations provided with pikepdf: pikepdf.form.DefaultAppearanceStreamGenerator and pikepdf.form.ExtendedAppearanceStreamGenerator. To use either of these, simply pass the class as the second argument to the constructor:

>>> from pikepdf.form import Form, DefaultAppearanceStreamGenerator

>>> form = Form(pdf, DefaultAppearanceStreamGenerator)

The differences between these two options are explained in the documentation for each class.

Lastly, you may implement your own class for generating appearance streams that better fits your specific use case. It must implement the interface provided by pikepdf.form.AppearanceStreamGenerator.

After filling a form, you may also wish to flatten it. This converts the interactive form fields into normal, un-editable text. This can be done as follows:

>>> pdf.flatten_annotations()

Generating appearance streams is required if you wish to flatten the form.
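
Putting these steps together, a fill-then-flatten workflow might look like the sketch below. fill_and_flatten is a hypothetical helper, and the file names in the usage comment are placeholders. The helper assumes the form was created with an appearance stream generator, since flattening depends on appearance streams being present.

```python
def fill_and_flatten(pdf, form, text_values):
    """Set text field values by name, then flatten the document's annotations.

    `form` is expected to have been built with an appearance stream generator,
    for example Form(pdf, DefaultAppearanceStreamGenerator).
    """
    for name, text in text_values.items():
        form[name].value = text
    # Flatten only after every value has been set.
    pdf.flatten_annotations()


# Possible usage with pikepdf (file names are placeholders):
#
#   from pikepdf import Pdf
#   from pikepdf.form import Form, DefaultAppearanceStreamGenerator
#
#   with Pdf.open("form.pdf") as pdf:
#       form = Form(pdf, DefaultAppearanceStreamGenerator)
#       fill_and_flatten(pdf, form, {"MyTextField": "Hello World!"})
#       pdf.save("filled.pdf")
```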

Text Fields

Text fields can resemble an HTML text input, an HTML textarea, a password field, a file upload, or a rich text input. pikepdf supports only the first two, which can be told apart using the is_multiline property.

The underlying value of the text field is stored in the value property. The field may also have a default_value which should be used when resetting the form.

>>> text_field = form['MyTextField']

>>> text_field.is_multiline
False

>>> text_field.default_value
''

>>> text_field.value
''

>>> text_field.max_length
75

>>> text_field.value = "Hello World!"
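
When values come from untrusted input, it can be useful to check them against max_length before assigning. The sketch below uses a hypothetical helper, set_text. It assumes max_length is None when the field declares no limit, and it makes no claim about whether pikepdf performs a similar check itself.

```python
def set_text(field, text):
    """Assign text to a text field, refusing values longer than max_length."""
    # Assumption: a missing or None max_length means "no limit".
    limit = getattr(field, "max_length", None)
    if limit is not None and len(text) > limit:
        raise ValueError(
            f"{len(text)} characters exceeds the field limit of {limit}"
        )
    field.value = text
```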

Checkbox Fields

Checkbox fields behave much like HTML checkboxes accessed from JavaScript. The checked property tells you whether the box is checked. If you need the underlying value, use the value property.

Unlike HTML checkboxes, however, a PDF checkbox has a value for both its on and off states, so value returns a different result depending on whether the box is checked. The value for the off state is a pikepdf.Name with the value “/Off”. The value for the on state varies from field to field and can be retrieved from the on_value property.

>>> checkbox = form['MyCheckbox']

>>> checkbox.checked
False

>>> checkbox.value
pikepdf.Name("/Off")

>>> checkbox.on_value
pikepdf.Name("/Yes")

>>> checkbox.states
(pikepdf.Name("/Yes"), pikepdf.Name("/Off"))

>>> checkbox.checked = True

>>> checkbox.value
pikepdf.Name("/Yes")
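
Because the on value differs from field to field, setting checked is usually simpler than assigning value directly. The sketch below applies a mapping of booleans to several checkboxes at once. set_checkboxes is a hypothetical helper that uses only the is_checkbox and checked properties described in this document.

```python
def set_checkboxes(form, wanted):
    """Set each named checkbox to a boolean and return the names that changed."""
    changed = []
    for name, checked in wanted.items():
        field = form[name]
        if not field.is_checkbox:
            raise TypeError(f"{name!r} is not a checkbox")
        if field.checked != bool(checked):
            field.checked = bool(checked)
            changed.append(name)
    return changed
```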

Radio Button Groups

A radio button group is constrained to a finite list of allowed values, which are all pikepdf.Name objects. The list of allowed values can be obtained via the states property.

>>> radio_group = form['MyRadioButtonGroup']

>>> radio_group.states
(pikepdf.Name("/1"), pikepdf.Name("/2"), pikepdf.Name("/3"))

>>> print(radio_group.value)
None

>>> radio_group.value = pikepdf.Name("/1")

Radio buttons are returned as a group rather than as individual buttons, though representations of the individual buttons can be obtained by way of the options property. You can set the selected option via the group’s selected property, or via the button’s select method.

>>> radio_group.options[0].checked
True

>>> radio_group.options[1].on_value
pikepdf.Name("/2")

>>> radio_group.options[1].states
(pikepdf.Name("/2"), pikepdf.Name("/Off"))

>>> radio_group.selected = radio_group.options[1]

>>> radio_group.value
pikepdf.Name("/2")

>>> radio_group.options[2].select()

>>> radio_group.value
pikepdf.Name("/3")
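
To select a button by its value while guarding against typos, you can search the group's options first. This is a sketch with a hypothetical helper, select_radio. It assumes wanted has the same type as the buttons' on_value (normally a pikepdf.Name), and it does not rely on names comparing equal to plain strings.

```python
def select_radio(group, wanted):
    """Select the button whose on_value equals wanted, or raise ValueError."""
    for option in group.options:
        if option.on_value == wanted:
            option.select()
            return option
    raise ValueError(f"{wanted!r} is not one of {tuple(group.states)!r}")
```

For example, select_radio(radio_group, pikepdf.Name("/2")) would have the same effect as radio_group.options[1].select() in the session above.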

Choice Fields

Choice fields may be either list boxes or comboboxes, as determined by the is_combobox property. If the field is a combobox, it may optionally have an editable text box attached to it, as shown by the allows_edit property. Editable choice fields may store arbitrary values, but otherwise choice fields are limited to those options which are returned via the options property.

>>> field = form['MyChoiceField']

>>> field.is_combobox
True

>>> field.allows_edit
False

>>> field.options[0].display_name
"Pike"

>>> field.options[2].select()

>>> field.value
"Trout"

>>> field.value = "Pike"
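
Selecting by position is brittle when the option order is unknown, so it can help to look an option up by its display name. In the sketch below, select_by_display_name is a hypothetical helper. It falls back to assigning the text directly only when allows_edit is true, since only editable choice fields may store arbitrary values.

```python
def select_by_display_name(field, display_name):
    """Select the option with the given display name.

    Returns the matching option, or None if the text was stored as a
    free-form value on an editable field.
    """
    for option in field.options:
        if option.display_name == display_name:
            option.select()
            return option
    if getattr(field, "allows_edit", False):
        # Editable choice fields may store values outside the option list.
        field.value = display_name
        return None
    raise ValueError(f"no option is displayed as {display_name!r}")
```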

Signature Fields

pikepdf does not support signature fields, but does include a utility function to stamp an image over the top of the field’s bounding box. The stamped image must be a PDF.

>>> from pikepdf import Pdf

>>> form_pdf = Pdf.open(...)

>>> sig_pdf = Pdf.open(...)

>>> form = Form(form_pdf)

>>> form['MySigField'].stamp_overlay(sig_pdf.pages[0])

To stamp an image that is not already a PDF, you will need to use an image processing library, such as Pillow, to convert it:

>>> from io import BytesIO

>>> from PIL import Image

>>> img = Image.open(img_path).convert('RGB')  # img_path: path to the source image

>>> img_as_pdf = BytesIO()

>>> img.save(img_as_pdf, 'pdf')

>>> img_as_pdf.seek(0)

>>> sig_pdf = Pdf.open(img_as_pdf)