PDF, PS and DjVu

This article covers software to view, edit and convert PDF, PostScript (PS), DjVu (déjà vu) and XPS files.

Engines

DjVuLibre — Suite to create, manipulate and view DjVu documents.

https://djvu.sourceforge.net/ || djvulibre

Ghostscript — Interpreter for PostScript and PDF. Provides the gs(1) command-line interface, documented in /usr/share/doc/ghostscript/*/Use.htm (also online), along with many wrapper scripts such as ps2pdf and pdf2ps.

https://ghostscript.com/ || ghostscript

libgxps — GObject based library for handling and rendering XPS documents.

https://wiki.gnome.org/Projects/libgxps || libgxps

libspectre — Small library for rendering PostScript documents.

https://www.freedesktop.org/wiki/Software/libspectre || libspectre

MuPDF — Lightweight PDF, XPS, and EPUB viewer, consisting of a software library, command line tools, and viewers.

https://mupdf.com/ || libmupdf

Poppler — PDF rendering library based on Xpdf. For CJK (Chinese, Japanese, Korean) support with Poppler, install poppler-data.

https://poppler.freedesktop.org/ || poppler

Viewers

Framebuffer

fbgs — Poor man's PostScript/PDF viewer for the Linux framebuffer console.

https://www.kraxel.org/blog/linux/fbida/ || fbida

fbpdf — Small framebuffer PDF and DjVu viewer based on MuPDF, with Vim keybindings, written in C.

https://repo.or.cz/w/fbpdf.git || fbpdf-git^AUR

jfbview — Framebuffer PDF and image viewer. Features include Vim-like controls, zoom-to-fit, a TOC (outline) view and fast multi-threaded rendering.

https://github.com/jichu4n/jfbview || jfbview^AUR

Graphical

Note: Some web browsers can display PDF files, for example with PDF.js.

apvlv — Lightweight document viewer with Vim keybindings using GTK libraries. Supports PDF, DjVu, EPUB, HTML and TXT.

https://naihe2010.github.io/apvlv/ || apvlv^AUR

Atril — Simple multi-page document viewer for MATE. Supports DjVu, DVI, EPS, EPUB, PDF, PostScript, TIFF, XPS and Comicbook.

https://github.com/mate-desktop/atril || atril

CorePDF — Simple lightweight PDF viewer based on Qt and poppler. Part of C-Suite.

https://cubocore.gitlab.io/ || corepdf^AUR

Deepin Document Viewer — A simple PDF and DjVu reader, supporting bookmarks, highlights and annotations.

https://github.com/linuxdeepin/deepin-reader || deepin-reader

DjView — Viewer for DjVu documents.

https://djvu.sourceforge.net/djview4.html || djview

Emacs — See also pdf-tools for improved PDF support (emacs-pdf-tools-git^AUR) and the djvu package for DjVu support.

https://www.gnu.org/software/emacs/ || emacs

ePDFView — Lightweight PDF document viewer using the Poppler and GTK libraries. Development stopped.

http://freecode.com/projects/epdfview || epdfview-git^AUR

Foxit Reader — Small, fast (compared to Acrobat) proprietary PDF viewer. Linux releases, apart from security updates, were discontinued in November 2020.

https://www.foxitsoftware.com/pdf-reader/ || foxitreader^AUR

GNOME Document Viewer — Document viewer for GNOME using GTK. Supports DjVu, DVI, EPS, PDF, PostScript, TIFF, XPS and Comicbook. Part of gnome.

https://apps.gnome.org/Evince/ || evince

gv — Graphical user interface for the Ghostscript interpreter that allows viewing and navigating through PostScript and PDF documents.

https://www.gnu.org/software/gv/ || gv^AUR

llpp — Very fast PDF reader based on MuPDF that supports continuous page scrolling, bookmarking, and text search through the whole document.

https://repo.or.cz/w/llpp.git || llpp^AUR

MuPDF — Very fast EPUB, FictionBook, PDF, XPS and Comicbook viewer written in portable C. Features CJK font support and vim-like bindings.

https://mupdf.com/ || mupdf

Okular — Universal document viewer for KDE. Supports CHM, Comicbook, DjVu, DVI, EPUB, FictionBook, Mobipocket, ODT, PDF, Plucker, PostScript, TIFF and XPS. Part of kde-graphics.

https://okular.kde.org/ || okular

Papers — Document viewer for GNOME using GTK. Supports DjVu, EPS, PDF, PostScript, TIFF, XPS and Comicbook.

https://apps.gnome.org/Papers/ || papers^AUR

pdfpc — Presenter console with multi-monitor support for PDF files.

https://pdfpc.github.io/ || pdfpc

qpdfview — Tabbed document viewer. It uses Poppler for PDF support, libspectre for PS support, DjVuLibre for DjVu support, CUPS for printing support and the Qt toolkit for its interface.

https://launchpad.net/qpdfview || qpdfview^AUR

Sioyek — Lightweight PDF viewer based on MuPDF with features designed for viewing research papers and technical books, e.g., marking, bookmarking, highlighting, searchable command palette, jumping to references, and more.

https://sioyek.info/ || sioyek^AUR

Xpdf — Viewer that can decode LZW and read encrypted PDFs.

https://www.xpdfreader.com/ || xpdf

Xreader — Document viewer that is part of the X-Apps Project. Supports DjVu, DVI, EPUB, PDF, PostScript, TIFF, XPS and Comicbook.

https://github.com/linuxmint/xreader/ || xreader

Zathura — Highly customizable and functional document viewer (plugin based). Supports PDF, DjVu, PostScript and Comicbook.

https://pwmt.org/projects/zathura/ || zathura

Comparison

The factual accuracy of this article or section is disputed.

Reason: Filling out PDF forms seems to be broken in MuPDF and llpp. (Discuss in Talk:PDF, PS and DjVu)

Name	PDF	PostScript	DjVu	XPS	PDF forms	PDF Annotation	Non-rectangle selection	License
Adobe Reader	Custom	–	–	–	Yes	–	Yes	proprietary
apvlv	Poppler	–	DjVuLibre	–	No	–	No (not by default, at least)	GPLv2
Atril	Poppler	libspectre	DjVuLibre	libgxps	Yes	–	–	GPLv2
DjView	–	–	DjVuLibre	–	–	–	–	GPLv2
Emacs	Ghostscript¹	Ghostscript¹	DjVuLibre¹	–	No	Yes	Yes	GPLv3
Emacs pdf-tools	Poppler	–	–	–	–	Yes	Yes	GPLv3
ePDFView	Poppler	–	–	–	No	–	–	GPLv2
Foxit Reader	Custom	–	–	–	Yes	Yes	Yes	proprietary
GNOME Document Viewer	Poppler	libspectre	DjVuLibre	libgxps	Yes	Yes	Yes	GPLv2
gv	Ghostscript	Ghostscript	–	–	No	–	–	GPLv3
llpp	libmupdf	–	–	libmupdf	Yes	–	–	GPLv3
MuPDF	Custom	–	–	Custom	Yes (mupdf-gl)	Yes (mupdf-gl)	Yes (mupdf-gl)	AGPLv3
Okular	Poppler	libspectre	DjVuLibre	Custom	Yes	Yes	Yes	GPL, LGPL
PDF4QT	Custom	–	–	–	No	Yes	Yes	LGPLv3
pdfpc	Poppler	–	–	–	No	–	–	GPLv2
qpdfview	Poppler	libspectre¹	DjVuLibre¹	–	Yes	Yes	–	GPLv2
Xpdf	Custom	–	–	–	No	–	–	GPLv3
Xreader	Poppler	libspectre¹	DjVuLibre¹	libgxps¹	Yes	Yes	Yes	GPLv2
Zathura	libmupdf¹ / Poppler¹	libspectre¹	DjVuLibre¹	libmupdf¹	No	No	Yes	zlib

¹ Optional dependency needs to be installed.

PDF forms

The PDF forms column in the above table refers to AcroForms support. If you do not need your input to be directly extractable from the PDF, you can also use the applications in #Graphical PDF editing to put text on top of a PDF. PDF forms can be created with LibreOffice Writer (View > Toolbars > Form Controls) and the advanced PDF editors.

The proprietary and deprecated XFA format for forms is not fully supported by Poppler[1][2]; it is only supported by Adobe Reader and Master PDF Editor.

Alternatively, web browsers such as Firefox or Chromium feature a built-in PDF viewer capable of filling out forms.

Graphical PDF editing

Editors that can import PDF files

Scribus can import and export PDF; text is imported as polygons.[3]
LibreOffice Draw can import and export PDF; text is imported as text; embedded fonts are substituted.[4][5]
Inkscape can import and export PDF; text is imported as cloned glyphs or text; with the latter embedded fonts are substituted.
Graphics editors like GIMP and Krita can also import and export PDFs at the cost of rasterization.

Basic editors

flpsed — A PostScript and PDF annotator, only supports text boxes.

https://flpsed.org/flpsed.html || flpsed^AUR

HandyOutliner for DjVu / PDF — Makes creating bookmarks for DjVu and PDF documents easier and faster.

https://handyoutlinerfo.sourceforge.net || handyoutliner-bin^AUR

jPDF Tweak — Java Swing application that can combine, split, rotate, reorder, watermark, encrypt, sign, and otherwise tweak PDF files.

https://jpdftweak.sourceforge.net/ || jpdftweak^AUR

Paper Clip — PDF document metadata editor to edit the title, author, keywords and more details.

https://apps.gnome.org/PdfMetadataEditor/ || paper-clip

PDF Arranger — Helps merge or split PDF documents and rotate, crop and rearrange pages. It is a maintained fork of PDF-Shuffler.

https://github.com/jeromerobert/pdfarranger || pdfarranger

PDF Chain — GTK front-end for PDFtk, written in C++, supporting concatenation, burst, watermarks, attaching files and more.

https://pdfchain.sourceforge.net/ || pdfchain^AUR

PdfJumbler — Simple tool to rearrange, merge, delete and rotate pages in PDF files.

https://github.com/mgropp/pdfjumbler || pdfjumbler^AUR

PDF Mix Tool — Qt front-end for PoDoFo, written in C++, supports splitting, merging, rotating and mixing PDF files.

https://scarpetta.eu/pdfmixtool/ || pdfmixtool

PDFsam — Open source application, written in Java, supports merging, splitting and rotating.

https://pdfsam.org/ || pdfsam^AUR

PDF Slicer — Simple application to extract, merge, rotate and reorder pages of PDF documents.

https://junrrein.github.io/pdfslicer/ || pdfslicer

PDF Tricks — Simple, efficient application for small manipulations in PDF files using Ghostscript.

https://github.com/muriloventuroso/pdftricks || pdftricks

Cropping tools

briss — Java GUI to crop pages of PDF documents to one or more regions selected.

https://sourceforge.net/projects/briss/ || briss^AUR

krop — Simple graphical tool to crop the pages of PDF files.

https://arminstraub.com/software/krop || krop^AUR

pdfCropMargins — Automatically crops the margins of PDF files.

https://github.com/abarker/pdfCropMargins || pdfcropmargins^AUR

PdfHandoutCrop — Tool to crop PDF handouts with multiple pages per sheet.

https://cges30901.github.io/pdfhandoutcrop/ || pdfhandoutcrop^AUR

Advanced editors

Master PDF Editor — Functional proprietary PDF editor. Latest version free for non-commercial use. The -free package is outdated but lacks a watermark.

https://code-industry.net/free-pdf-editor/ || masterpdfeditor^AUR, masterpdfeditor-free^AUR

PDF Studio — All-in-one proprietary PDF editor similar to Adobe Acrobat.

https://www.qoppa.com/pdfstudio/ || pdfstudio-bin^AUR

PDF4QT — Open source PDF editor.

https://jakubmelka.github.io/ || pdf4qt^AUR

Comparison of advanced editors

Name	Cost (USD, lifetime)	Page Labels	Form Designer	Content Editing (Text and Images)	Optimize PDFs	Digitally Sign PDFs	License
Master PDF Editor	85.34	No	Yes	Yes	Yes	Yes	proprietary
Qoppa PDF Studio Standard	99	Yes	No	No	No	No	proprietary
Qoppa PDF Studio Pro	139	Yes	Yes	Yes	Yes	Yes	proprietary

PDF tools

Command snippets

Create a PDF from images

With GraphicsMagick:

$ gm convert 1.jpg 2.jpg 3.jpg out.pdf

With ImageMagick:

$ magick convert 1.jpg 2.jpg 3.jpg out.pdf

Note that ImageMagick's output is lossy. For lossless PDF creation from JPEG, use img2pdf.

Concatenate PDFs

With Ghostscript:

$ gs -dNOPAUSE -sDEVICE=pdfwrite -sOutputFile=out.pdf -dBATCH 1.pdf 2.pdf 3.pdf

With PDFtk:

$ pdftk 1.pdf 2.pdf 3.pdf cat output out.pdf

With Poppler:

$ pdfunite 1.pdf 2.pdf 3.pdf out.pdf

With QPDF:

$ qpdf --empty --pages 1.pdf 2.pdf 3.pdf -- out.pdf

Extract text from PDF

With Poppler and maintaining the layout:

$ pdftotext -layout in.pdf out.txt

Decrypt a PDF

This section lists commands to decrypt a PDF to an unencrypted file. Note that most PDF viewers also support encrypted PDFs.

With PDFtk:

$ pdftk in.pdf input_pw password output out.pdf

With Poppler to PostScript:

$ pdftops -upw password in.pdf out.ps

With QPDF:

$ qpdf --decrypt --password=password in.pdf out.pdf

Tip: Forgotten passwords might be recovered with pdfcrack, see pdfcrack(1).

Encrypt a PDF

The user password is used for encryption; the owner password restricts operations once the document is decrypted. For more information, see Wikipedia:PDF#Encryption and signatures.

With PDFtk:

$ pdftk in.pdf output out.pdf user_pw password

With PoDoFo:

$ podofoencrypt -u user_password -o owner_password in.pdf out.pdf

With QPDF:

$ qpdf --encrypt user_password owner_password key_length -- in.pdf out.pdf

where key_length can be 40, 128 or 256.

Extract images from a PDF

With Poppler, saving images as JPEG:

$ pdfimages infile.pdf -j outfileroot

Extract page range from PDF, split multipage PDF document

With Ghostscript as a single file:[6]

$ gs -sDEVICE=pdfwrite -dNOPAUSE -dBATCH -dSAFER -dFirstPage=first -dLastPage=last -sOutputFile=outfile.pdf infile.pdf

With PDFtk as a single file:

$ pdftk infile.pdf cat first-last output outfile.pdf

With Poppler as separate files:

$ pdfseparate -f first -l last infile.pdf outfileroot-%d.pdf

With QPDF as a single file:

$ qpdf --empty --pages infile.pdf first-last -- outfile.pdf

With mutool as a single file:

$ mutool clean -g infile.pdf outfile.pdf first-last

Impose a PDF (nup)

PDF imposition is the process by which multiple input pages are combined into one output page, laid out in a rows x columns grid.

It can be done with pdfjam (note that wrapper scripts such as pdfnup and pdfbook are deprecated):

$ pdfjam --nup rowsxcolumns input.pdf --outfile output.pdf

or with pdfsak:

$ pdfsak --input-file input.pdf --output output.pdf --nup rows columns
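
To estimate how long the imposed document will be, the grid arithmetic can be sketched in shell. This is only an illustration: nup_sheets is a hypothetical helper, not part of pdfjam or pdfsak, and it assumes every output page is filled completely before the next one is started.

```shell
#!/bin/sh
# nup_sheets (hypothetical helper): print how many output pages an n-up
# imposition produces.
#   $1 = number of input pages
#   $2 = rows per output page
#   $3 = columns per output page
nup_sheets() {
	pages=$1 rows=$2 columns=$3
	per_sheet=$((rows * columns))
	# Integer ceiling division: a partially filled last page still counts.
	echo $(( (pages + per_sheet - 1) / per_sheet ))
}
```

For example, nup_sheets 10 2 2 reports the number of output pages for a 10-page input on a 2x2 grid. The input page count can be read from the Pages: line of pdfinfo.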

Inspect metadata

With ExifTool:

$ exiftool -All file.pdf

With Poppler:

$ pdfinfo file.pdf

Remove metadata

Using ExifTool

With ExifTool:

$ exiftool -All= -overwrite_original input.pdf
$ mv input.pdf /tmp/temp.pdf
$ qpdf --linearize /tmp/temp.pdf input.pdf

The linearize step is needed to prevent recovery of deleted metadata. See this SuperUser question and the related ExifTool forum thread.

Using pdftk

Many PDFs store document metadata using both an Info dictionary (old school) and an XMP stream (new school). The pdftk command below removes the document-level XMP stream from the PDF altogether. It does not remove the Info dictionary.

Note that objects inside the PDF might have their own, separate XMP metadata streams, and that this command does not remove those. It only removes the PDF’s document‐level XMP stream.

$ pdftk input.pdf drop_xmp output output.pdf

Reduce size of a PDF

PDF size can be reduced by setting an appropriate optimization or compression level.

With Ghostscript, use one of:

$ ps2pdf -dPDFSETTINGS=/screen in.pdf out.pdf

or

$ gs -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/printer -sOutputFile=out.pdf in.pdf

For different settings see the documentation.

There is also shrinkpdf^AUR, a script wrapping gs.

Rasterize a PDF

These commands will convert your PDF into images.

With GraphicsMagick to convert a specific page into an image file:

$ gm convert -density dpi infile.pdf[page] outfile.jpg

With ImageMagick to convert a specific page into an image file:

$ magick convert -density dpi infile.pdf[page] outfile.jpg

With ImageMagick to convert all pages into another PDF file composed of one image per page:

$ magick convert -density dpi infile.pdf outfile.pdf

Warning: This will increase the file size of your PDF substantially. Use it for example if your printer is not able to print your PDF correctly.

With Poppler to convert all pages into one image file per page:

$ pdftoppm -jpeg -r dpi infile.pdf outfileroot

With Poppler to convert a specific page into an image file:

$ pdftoppm -jpeg -r dpi -f page -singlefile infile.pdf outfileroot

Split PDF pages

With mupdf-tools to split every page vertically into two pages:

$ mutool poster -y 2 in.pdf out.pdf

This can be used to undo simple imposition.

Add an image

Adding an image at any location in a PDF can be done:

with ImageMagick (convert), xv^AUR and pdftk. (Wrapper script)
with xournal^AUR
with LibreOffice

Details on these and other solutions can be found on StackExchange.

Add digital signature to PDF

jsignpdf^AUR can digitally sign PDF files with X.509 certificates in GUI and CLI.

Readers such as Okular and MuPDF can sign PDFs with digital signatures. This requires a PFX certificate, which can be created with an OpenSSL command:

$ openssl req -x509 -days 365 -newkey rsa:2048 -keyout cert.pem -out cert.pem
$ openssl pkcs12 -export -in cert.pem -out cert.pfx

MuPDF users can then sign PDFs with the cert.pfx using the graphical interface, or its mutool-sign tool.

Okular users must import cert.pfx into a certificate store such as the one in the default Firefox profile.[7] With Firefox this is done through Settings > Privacy & Security > View Certificates > Your Certificates > Import and selecting cert.pfx. Afterwards, Okular will offer this certificate when signing PDFs.

LibreOffice can also sign PDFs.[8]

Removing annotations from a PDF

With pdftk [9]:

$ pdftk in.pdf output - uncompress | sed '/^\/Annots/d' | pdftk - output out.pdf compress

With perl-cam-pdf^AUR:

$ rewritepdf.pl -C in.pdf out.pdf

See https://superuser.com/a/1051543 for more information.

Add page numbers

With pdfsak:

$ pdfsak --input-file input.pdf --output output.pdf --text "\large \$page/\$pages" br 0.99 0.99 --latex-engine xelatex --font "Noto Regular"

Add page labels

Page labels are logical page numbers shown in the navigation bar of your PDF reader. They are useful, for example, if the first pages of the PDF are indices numbered with Roman numerals (I, II, etc.), while the page numbered "1" corresponds to a PDF page greater than 1, and you want the page number shown in the navigation bar to correspond to the page number printed on the page.

This should not be confused with adding page numbers to the pages themselves. See section 12.4.2 of the PDF reference to better understand page labels.

Using pagelabels-py, suppose we have a PDF named my_document.pdf that has 12 pages, where:

- Pages 1 to 4 should be labelled Intro I to Intro IV.
- Pages 5 to 9 should be labelled 2 to 6.
- Pages 10 to 12 should be labelled Appendix A to Appendix C.

We can issue the following commands:
```
$ python3 -m pagelabels --delete "my_document.pdf"
$ python3 -m pagelabels --startpage 1 --prefix "Intro " --type "roman uppercase" "my_document.pdf"
$ python3 -m pagelabels --startpage 5 --firstpagenum 2 "my_document.pdf"
$ python3 -m pagelabels --startpage 10 --prefix "Appendix " --type "letters uppercase" "my_document.pdf"
```
Note: pagelabels-py will convert your file to the PDF 1.3 specification.

Using pdftk, create a metadata.txt file with labels:
```
PageLabelBegin
PageLabelNewIndex: 1
PageLabelStart: 1
PageLabelPrefix: Cover
PageLabelNumStyle: NoNumber
PageLabelBegin
PageLabelNewIndex: 2
PageLabelStart: 1
PageLabelPrefix: Back Cover
PageLabelNumStyle: NoNumber
PageLabelBegin
PageLabelNewIndex: 3
PageLabelStart: 1
PageLabelNumStyle: LowercaseRomanNumerals
PageLabelBegin
PageLabelNewIndex: 27
PageLabelStart: 1
PageLabelNumStyle: DecimalArabicNumerals 
```

Where:

PageLabelBegin: signals that a new page label definition follows
PageLabelNewIndex: the PDF page index from which the numbering style applies, counting from one. The numbering style continues until the next page label or, if there are no more page labels, until the end of the document.
PageLabelStart: the starting number. For example, if you specify 5 here, the pages will be numbered 5, 6, 7, ...
PageLabelPrefix: text to put before the number in page labels
PageLabelNumStyle: one of DecimalArabicNumerals, UppercaseRomanNumerals, LowercaseRomanNumerals, UppercaseLetters, LowercaseLetters or NoNumber

Then use:
```
$ pdftk book.pdf update_info_utf8 metadata.txt output book-with-metadata.pdf
```

See this SuperUser question for more details.
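
When a document needs many label ranges, the metadata file can be generated instead of typed. The following is a minimal sketch: page_label is a hypothetical helper, not a pdftk feature, and it simply prints the fields in the same order as the pdftk example in this section.

```shell
#!/bin/sh
# page_label (hypothetical helper): print one pdftk page label definition.
#   $1 = PDF page index where the label range begins (counting from one)
#   $2 = number shown on the first page of the range
#   $3 = numbering style, e.g. LowercaseRomanNumerals
#   $4 = optional prefix
page_label() {
	printf 'PageLabelBegin\n'
	printf 'PageLabelNewIndex: %s\n' "$1"
	printf 'PageLabelStart: %s\n' "$2"
	# The prefix line is only emitted when a prefix was given.
	if [ -n "${4:-}" ]; then
		printf 'PageLabelPrefix: %s\n' "$4"
	fi
	printf 'PageLabelNumStyle: %s\n' "$3"
}
```

A metadata.txt with two ranges could then be written with:

$ { page_label 1 1 LowercaseRomanNumerals; page_label 27 1 DecimalArabicNumerals; } > metadata.txt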

Extract bookmarks

With pdftk:

$ pdftk file.pdf dump_data_utf8 | grep '^Bookmark'

With qpdf:

$ qpdf --json --json-key=outlines file.pdf

See https://unix.stackexchange.com/questions/143886/how-to-extract-bookmarks-from-a-pdf-file for more information.
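
The raw pdftk records can be turned into a more readable outline with a small awk filter. This is a sketch under assumptions: bookmarks_to_outline is a hypothetical helper, and it relies on each record listing BookmarkTitle, BookmarkLevel and BookmarkPageNumber in that order, with the page number last.

```shell
#!/bin/sh
# bookmarks_to_outline (hypothetical helper): read the output of
#   pdftk file.pdf dump_data_utf8
# on stdin and print one line per bookmark, indented by two spaces per
# nesting level and followed by the target page number.
bookmarks_to_outline() {
	awk '
		# "BookmarkTitle: " is 15 characters long; keep the rest of the line.
		/^BookmarkTitle: /      { title = substr($0, 16) }
		/^BookmarkLevel: /      { level = $2 }
		# The page number is assumed to close the record, so print here.
		/^BookmarkPageNumber: / {
			indent = ""
			for (i = 1; i < level; i++) indent = indent "  "
			print indent title " (p. " $2 ")"
		}
	'
}
```

It is meant to be used as a filter:

$ pdftk file.pdf dump_data_utf8 | bookmarks_to_outline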

Add bookmarks

With pdftk

Create a text file bookmark_definitions.txt with bookmark definitions in the following format:

BookmarkBegin
BookmarkTitle: Chapter 1
BookmarkLevel: 1
BookmarkPageNumber: 1
BookmarkBegin
BookmarkTitle: Chapter 1.1
BookmarkLevel: 2
BookmarkPageNumber: 2
BookmarkBegin
BookmarkTitle: Chapter 1.2
BookmarkLevel: 2
BookmarkPageNumber: 3
BookmarkBegin
BookmarkTitle: Chapter 1.3
BookmarkLevel: 2
BookmarkPageNumber: 4
BookmarkBegin
BookmarkTitle: Chapter 1.3.1
BookmarkLevel: 3
BookmarkPageNumber: 5
BookmarkBegin
BookmarkTitle: Chapter 2
BookmarkLevel: 1
BookmarkPageNumber: 6

Where:

BookmarkBegin: signals a new bookmark definition
BookmarkTitle: the title of the bookmark
BookmarkLevel: the level of the bookmark in the hierarchy
BookmarkPageNumber: the page number the bookmark redirects to

In this example, the above file will create the following bookmark structure:

Chapter 1
- Chapter 1.1
- Chapter 1.2
- Chapter 1.3
  - Chapter 1.3.1
Chapter 2

Apply the bookmarks with the following command:

$ pdftk input.pdf update_info_utf8 bookmark_definitions.txt output output.pdf
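
For longer outlines, the definition file can be generated from a compact list. The following is a minimal sketch, not part of pdftk: outline_to_bookmarks and its tab-separated input format (level, page number, title) are assumptions made for this example.

```shell
#!/bin/sh
# outline_to_bookmarks (hypothetical helper): read tab-separated lines of
#   level <TAB> page number <TAB> title
# on stdin and print pdftk bookmark definitions on stdout.
outline_to_bookmarks() {
	awk -F '\t' '
		# Lines with fewer than three fields (e.g. blank lines) are skipped.
		NF >= 3 {
			print "BookmarkBegin"
			print "BookmarkTitle: " $3
			print "BookmarkLevel: " $1
			print "BookmarkPageNumber: " $2
		}
	'
}
```

Given an outline.tsv in that format, the definitions file is produced with:

$ outline_to_bookmarks < outline.tsv > bookmark_definitions.txt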

Extract pages contained within a bookmark

To extract the pages contained within a bookmark, you can use pdf_extbook-git^AUR.

Running pdf_extbook file prompts you for the bookmark whose pages you want to extract and where to save them. To extract all bookmarks of a given hierarchical level:

$ pdf_extbook file -a level output_file_stem

Remove blank pages

One can use the following script to remove blank pages from a PDF file (credit: SuperUser post):

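#!/bin/sh
# Remove blank pages from a PDF; writes <name>_noblanks.pdf to the current directory.

IN="$1"
filename=$(basename "${IN}")
filename="${filename%.*}"
PAGES=$(pdfinfo "$IN" | grep ^Pages: | tr -dc '0-9')

# Print the numbers of all pages whose summed ink coverage exceeds the threshold.
non_blank() {
	for i in $(seq 1 "$PAGES"); do
		# The ink_cov device reports C, M, Y and K coverage for the page; add them up.
		PERCENT=$(gs -o - -dFirstPage=${i} -dLastPage=${i} -sDEVICE=ink_cov "$IN" | grep CMYK | nawk 'BEGIN { sum=0; } {sum += $1 + $2 + $3 + $4;} END { printf "%.5f\n", sum } ')
		if [ "$(echo "$PERCENT > 0.001" | bc)" -eq 1 ]; then
			echo "$i"
		fi
		# Progress indicator on stderr.
		printf . 1>&2
	done | tee "$filename.tmp"   # the page list is also saved to <name>.tmp
	echo 1>&2
}

# Keep only the non-blank pages.
pdftk "${IN}" cat $(non_blank) output "${filename}_noblanks.pdf"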

Use it like pdf_remove_blank_pages input.pdf.

The script needs pdftk, nawk, bc, ghostscript and poppler (for pdfinfo).
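
The coverage check itself can also be done in a single awk pass over one Ghostscript run, instead of starting gs once per page. This is a sketch under assumptions, not a tested replacement: non_blank_pages is a hypothetical helper, and it assumes that ink_cov prints exactly one line containing CMYK per page, in page order.

```shell
#!/bin/sh
# non_blank_pages (hypothetical helper): read ink_cov output on stdin and
# print the numbers of pages whose summed C+M+Y+K coverage exceeds the
# threshold given as the first argument (default: 0.001).
non_blank_pages() {
	awk -v threshold="${1:-0.001}" '
		# Every coverage line is counted as one page, in order of appearance.
		/CMYK/ {
			page++
			if ($1 + $2 + $3 + $4 > threshold + 0) print page
		}
	'
}
```

It could then be combined with pdftk in the same way as the script above:

$ pdftk input.pdf cat $(gs -q -o - -sDEVICE=ink_cov input.pdf | non_blank_pages) output input_noblanks.pdf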

Find fonts used in a PDF

The pdffonts(1) command (from poppler) can be used to find which fonts a PDF uses and whether they have been embedded in it:

$ pdffonts file.pdf

name                                 type              encoding         emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
Times-Roman                          Type 1            Custom           no  no  no       8  0
Times-Italic                         Type 1            Standard         no  no  no       9  0
Times-Bold                           Type 1            Standard         no  no  no       7  0
Helvetica                            Type 1            Standard         no  no  no      34  0
Helvetica-Bold                       Type 1            Standard         no  no  no      35  0

This is useful when the text in a PDF is not displayed properly, to determine whether missing fonts or their metric-compatible equivalents need to be installed.

Repair broken PDF file

With ghostscript:

$ gs -o repaired.pdf -sDEVICE=pdfwrite -dPDFSETTINGS=/prepress corrupted.pdf

With poppler:

$ pdftocairo -pdf corrupted.pdf repaired.pdf

With mupdf-tools:

$ mutool clean corrupted.pdf repaired.pdf

Reference: https://superuser.com/q/278562

Convert PDF to PDF/A standard

With ghostscript:

$ gs -dPDFA -dBATCH -dNOPAUSE -sColorConversionStrategy=UseDeviceIndependentColor -sDEVICE=pdfwrite -dPDFACompatibilityPolicy=2 -sOutputFile=document_pdfa.pdf document.pdf

Reference: https://stackoverflow.com/a/56459053

Validate PDF/A compliance

Using verapdf^AUR you can validate the compliance of your PDF to different flavours of the PDF/A standard:

$ verapdf --flavour 1a --format text document.pdf

DjVu tools

DjVuLibre provides many command-line tools, such as ddjvu(1).

img2djvu — Single-pass DjVu encoder based on DjVuLibre and ImageMagick.

https://github.com/ashipunov/img2djvu || img2djvu-git^AUR

pdf2djvu — Creates DjVu files from PDF files.

https://jwilk.net/software/pdf2djvu || pdf2djvu^AUR

Convert DjVu to images

Break a DjVu file into separate pages:

$ djvmcvt -i input.djvu /path/to/out/dir output-index.djvu

Convert DjVu pages into images:

$ ddjvu --format=tiff page.djvu page.tiff

Convert DjVu pages into PDF:

$ ddjvu --format=pdf inputfile.djvu outputfile.pdf

You can also use --page to export specific pages:

$ ddjvu --format=tiff --page=1-10 input.djvu output.tiff

This converts pages 1 to 10 into one TIFF file.

Processing images

You can use scantailor-advanced to:

fix orientation
split pages
deskew
crop
adjust margins

Make DjVu from images

The img2djvu-git^AUR script can be used for this:

$ img2djvu -c1 -d600 -v1 ./out

This creates a 600 DPI out.djvu from all files in the ./out directory.

Alternatively, you can try didjvu^AUR, which seems to create smaller files, especially for images with a well-defined background.

PostScript tools

pstotext — Converts PostScript files to text.

https://www.cs.wisc.edu/~ghost/doc/pstotext.htm || pstotext^AUR

Ghostscript

ps2pdf

ps2pdf is a wrapper around Ghostscript to convert PostScript to PDF:

$ ps2pdf -sPAPERSIZE=a4 -dOptimize=true -dEmbedAllFonts=true YourPSFile.ps

Explanation:

-sPAPERSIZE=something defines the paper size. For valid PAPERSIZE values, see [10].
-dOptimize=true optimises the created PDF for loading.
-dEmbedAllFonts=true embeds all fonts, so that the text looks the same wherever the PDF is opened.

Note: You cannot choose the paper orientation in ps2pdf. If your input PS file is well-formed, it already contains the orientation information. If you are trying to use an Encapsulated PS file that does not fit in the -sPAPERSIZE you specified, you will have problems, because EPS files usually do not contain paper orientation information. A workaround is to define a new paper size in the Ghostscript settings (call it e.g. "slide") and use it as -sPAPERSIZE=slide.

Libraries

C/C++

libharu — C library for generating PDF documents.

https://github.com/libharu/libharu || libharu, Lua binding: lua-hpdf^AUR

PoDoFo — A C++ library to work with the PDF file format.

https://podofo.sourceforge.net || podofo

Python

borb — Library for reading, creating and manipulating PDF files in Python.

https://borbpdf.com/, https://github.com/jorisschellekens/borb || not packaged? search in AUR

pdfrw — A pure Python library that reads and writes PDFs.

https://github.com/pmaupin/pdfrw || python-pdfrw

PyPDF — A pure-Python library built as a PDF toolkit.

https://github.com/py-pdf/pypdf || python-pypdf

PyX — Python library for the creation of PostScript and PDF files.

https://pyx.sourceforge.net || python-pyx

ReportLab — A proven industry-strength PDF generating solution.

https://www.reportlab.com/ || python-reportlab

Java

iText Core — Programmable PDF library whose functionality can be embedded within your own software.

https://itextpdf.com/products/itext-core || itext-rups-bin^AUR

OpenPDF — Free Java library for creating and editing PDF files, licensed under the LGPL and MPL. Based on a fork of iText.

https://github.com/LibrePDF/OpenPDF || not packaged? search in AUR
