Getting rid of Type 3 fonts from gnuplot outputs

I recently had a small odyssey with PDF and fonts when preparing camera-ready versions of my two most recent papers. It started with the following error message from the HotCRP format checker: “Bad font: unnamed Type3 fonts first referenced on page X”.

I could not even comprehend the error message at first sight: what is a Type 3 font? After some googling, I learned that the “types” come from PostScript, and that a Type 3 font defines its glyphs with arbitrary drawing procedures, very often bitmaps, instead of the hinted outline glyphs that you and I would expect. Bitmap Type 3 fonts are rendered for a specific device (say, a 300-DPI printer) and naturally look ugly on monitors, even more so when zoomed in or out. So, HotCRP and most publishers (rightfully) ban such fonts from camera-ready papers.

I was still slightly confused, though. I had always thought that PDFs are basically vector pictures. Why would they need to care about fonts? It turns out that the text is not lost when typeset into a PDF file. The text is encoded (with a standard or nonstandard encoding) and stored in the PDF, accompanied by a hint on which font to use to display it. It is the PDF viewer’s job to draw the glyphs of the text on the screen using a proper font. Note that the font itself may or may not be embedded in the PDF file. If it is, the embedded font will always be used. If not, the viewer will choose a locally available font as a substitute. This font-embedding business is another source of trouble by itself. Publishers (again, rightfully) want final papers to look the same on all devices regardless of what fonts are locally available, so they generally require all fonts to be embedded, which may or may not be the default for LaTeX and gnuplot. But let’s get back to Type 3 fonts for now.

So, the error message basically says “I found ugly Type 3 fonts used in the PDF”. The first question was how they got in there. The page number in the error message and some tweaking led me to believe the problem came from figures generated by gnuplot. Another round of googling led me to a bunch of other users sharing the same problem.

Solving the problem

Usually, the diagnosis on Stack Overflow would be: gnuplot on macOS does strange stuff and uses Type 3 fonts for spaces in captions and labels. It does not show this behavior on Linux. Someone did file a ticket for this bug, but unfortunately, the developer pointed out that gnuplot does not handle fonts by itself, and suggested looking at its dependencies. This looked like a rabbit hole to me. The suggested solution is usually to re-generate the plots on Linux.

As a side note, if you would like to see for yourself that there are indeed Type 3 fonts in the document, you may install the pdffonts tool from the poppler package (available in Homebrew). Then, run pdffonts your-figure.pdf and you should see something like this:

name                                 type              encoding         emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
[none]                               Type 3            Custom           yes no  yes      6  0
FDCBUM+Helvetica                     TrueType          WinAnsi          yes yes yes      7  0
CQDHZB+Helvetica                     TrueType          WinAnsi          yes yes yes      8  0
[none]                               Type 3            Custom           yes no  yes      9  0

If you embed this figure into a paper, the PDF output of the paper will contain Type 3 fonts.
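
If you have many figures, reading the table by eye gets tedious. Here is a minimal sketch of a helper that counts the Type 3 rows. The function name count_type3 is my own invention, and it assumes the pdffonts output layout shown above (a header line, a dashed rule, then one row per font).

```shell
# count_type3: read `pdffonts` output on stdin and print how many rows
# report a Type 3 font. The first two lines (header and dashed rule)
# are skipped. Assumes the column layout shown above.
count_type3() {
  awk 'NR > 2 && /Type 3/ { n++ } END { print n + 0 }'
}

# Usage (assumes pdffonts is installed and your-figure.pdf exists):
#   pdffonts your-figure.pdf | count_type3
```

A result of 0 means pdffonts did not report any Type 3 fonts in that file.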

As mentioned before, the optimal solution is to re-generate the figures and try to not let Type 3 fonts sneak in. However, sometimes that is difficult or impossible. For example, I do not even know why gnuplot on macOS would use Type 3 fonts. Or, I may have figures whose original scripts I have lost. Or, I may not want to take the risk that the re-generated plot looks different from the original, peer-reviewed version. It would be great if there were a way to fix the PDF plots I’ve already got.

The solution is actually simple. I just remove all fonts from the figure. Recall that fonts are there to help show text, and there is little text in a figure. So, we can simply draw the plot as a pure vector image, with text rendered into vector shapes. This process is called “outlining”. Some quick googling led me to the one-liner:

gs -o output.pdf -dNoOutputFonts -sDEVICE=pdfwrite input.pdf

If you run pdffonts on the output, you will notice that all fonts are gone. HotCRP should stop complaining if you embed the output into the paper.

Note that you should definitely not run this on the full paper. It will convert your paper into a huge vector picture. Although the text will look the same, the PDF viewer will just treat it as vector shapes, and functions like search or highlight will break.

Updated June 12, 2024.

Okay, so I needed to work on another camera-ready paper. This time, I did not wish to outline all glyphs in the figures, but just the ones using Type 3 fonts. (In comparison, the -dNoOutputFonts option will outline all glyphs in a file.) There are many reasons. For example, outlined glyphs are just vector shapes and applications handle them differently from actual text, which in my opinion makes them appear uglier. After some digging, I finally figured out a solution using only ghostscript. I believe this method has not been documented elsewhere.

The key is to use the AlwaysOutline option of ghostscript, which is described in the ghostscript documentation. It tells ghostscript to outline only a specified list of fonts. A small issue is that the Type 3 fonts in gnuplot outputs do not have a name, so I was a bit unsure about how to refer to them. But after some trial and error, I found that the following command worked.

gs -sDEVICE=pdfwrite -o out.pdf -c "<< /AlwaysOutline [/] >> setdistillerparams" -f input.pdf

To make sure it worked, we use pdffonts again.

% pdffonts input.pdf
name                                 type              encoding         emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
[none]                               Type 3            Custom           yes no  yes      7  0
WMFXZX+Helvetica                     TrueType          WinAnsi          yes yes yes      8  0
ZNDOST+Helvetica                     TrueType          WinAnsi          yes yes yes      9  0
[none]                               Type 3            Custom           yes no  yes     10  0

% pdffonts out.pdf
name                                 type              encoding         emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
JGONUT+Helvetica                     TrueType          WinAnsi          yes yes yes     10  0
LLMJYR+Helvetica                     TrueType          WinAnsi          yes yes yes     14  0

Notice that the Type 3 fonts with no name are gone from the output!
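
For a whole directory of figures, the same command can be wrapped in a loop. The following is only a sketch under my own assumptions: the function names fix_type3 and fix_all and the -fixed.pdf suffix are hypothetical, and it relies on pdffonts printing “Type 3” in the type column as shown above. Files without Type 3 fonts are left alone.

```shell
# fix_type3: outline only the unnamed Type 3 fonts in one PDF, using the
# AlwaysOutline command from above. Arguments: input path, output path.
fix_type3() {
  gs -sDEVICE=pdfwrite -o "$2" \
     -c "<< /AlwaysOutline [/] >> setdistillerparams" \
     -f "$1"
}

# fix_all: go through every PDF in the given directory and write a
# "-fixed.pdf" copy (hypothetical naming) for each file in which
# pdffonts reports a Type 3 font. Other files are skipped.
fix_all() {
  dir=$1
  for f in "$dir"/*.pdf; do
    [ -e "$f" ] || continue          # the glob matched nothing
    if pdffonts "$f" | grep -q 'Type 3'; then
      fix_type3 "$f" "${f%.pdf}-fixed.pdf"
      echo "fixed: $f"
    else
      echo "clean: $f"
    fi
  done
}

# Usage (assumes gs and pdffonts are installed):
#   fix_all figures
```

It is still worth running pdffonts on each output file afterwards, since I have only tried the command on my own gnuplot figures.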