GSoC :: Coding Period – Phase Three (July 8th to August 6th): Create and parse DA string in poppler-core and font family implementation

Hi everyone,

Coding period phase three is now complete. After the second evaluation, Poppler’s maintainer Albert Astals Cid commented on my bug report https://bugs.freedesktop.org/show_bug.cgi?id=107151#c3. He pointed out that the DA string is currently created and parsed by the Qt5 frontend, and that the API should be changed so that creating and parsing the DA string and the font color is the concern of poppler-core.

So I had to postpone my plans for the font family implementation in Poppler, and I began to work on correcting the creation and parsing of DA. Both the Qt5 frontend and the core were modified, and after a series of reviews I finally submitted the patch, co-authored by my mentor Tobias Deiminger: https://bugs.freedesktop.org/attachment.cgi?id=140963

Meanwhile, my mentor was sending me emails and pushing his research results on the font family implementation in Poppler into the scratch repo. In the last days, once I had completed the font color and DA creation and parsing patch, I began to hack on the font family. I have written a full post about my own understanding and experiments here: FreeText Annotation :: Font family implementation in Poppler. My experimental patch can be found here.

After incorporating D13203 and both the patches, this is how it finally works:

  1. It adds the typewriter tool to the annotation toolbar
  2. You can select a font color for the annotation’s text
  3. You can choose a Base-14 font and it is displayed correctly

As you can see in the above image, you can use different Base-14 fonts for the text of different typewriter annotations within the same document while it is loaded. However, the patch neither writes the font dictionary into the PDF document nor gives you the same fonts on document reload. The text falls back to the default font “Helvetica”.

Still, this experimental patch is a promising starting point for an accurate solution and for supporting embedded fonts.

What’s Left

The experimental font family implementation for the Base-14 fonts is quite buggy and is not an accurate solution. It is also not nicely structured. A proper implementation has to write the streams and embed fonts into the PDF document, and we need to create an in-memory appearance with an in-memory resource dictionary whose /Font entry points to the on-disk font subdictionary inside AcroForm->DR.
It would actually be good to save the in-memory appearance as /AP into the document (PDF 2.0 requires this, and a number of existing Okular issues arise from not saving AP). But that’s unrelated to the font family issue.

I want to thank my mentor for constantly motivating and guiding me and for the enormous effort he put in. It was really nice working with him :)

 

FreeText Annotation :: Font family implementation in Poppler

Before getting started with Poppler, let’s first understand the PDF structure and different terminologies used.

Annotations are PDF objects that enable user-clickable actions and can carry content such as text, graphics, or other media. The following is a snippet of a FreeText annotation created on a PDF page in Acrobat Reader. It is shown in decompressed form:

87 0 obj % The FreeText Annot dictionary
<<
/AP <<
/N 119 0 R % Indirect reference to the normal state XObject
>>
/DA (0 G 0 g 0 Tc 0 Tw 100 Tz 24.45 TL 0 Ts 0 Tr /Cour 18 Tf)
/Subtype /FreeText
/Type /Annot
>>
endobj

Let’s go over the important terms regarding the font family and appearance.

  • Appearance Stream: It is a form XObject. A form XObject is a reusable graphics object. It has a content stream which defines certain graphics. In the case of an appearance stream, it is a self-contained content stream that shall be rendered inside the annotation rectangle. In the above snippet, /AP is the appearance dictionary, and its /N entry references the appearance stream for the normal state.
  • Default Appearance: The default appearance string (DA) shall be used in formatting the text. It contains any graphics state or text state operators needed to establish the graphics state parameters, such as text size and color, for displaying the field’s variable text.
    Only operators that are allowed within text objects shall occur in this string. At a minimum, the string shall include a Tf (text font) operator along with its two operands, font and size. The specified font value shall match a resource name in the Font entry of the default resource dictionary (the DR entry of the interactive form dictionary). A zero value for size means that the font shall be auto-sized: its size shall be computed as a function of the height of the annotation rectangle.
  • Tf operator: In the above snippet, the DA string /DA (0 G 0 g 0 Tc 0 Tw 100 Tz 24.45 TL 0 Ts 0 Tr /Cour 18 Tf) gives us the font color (G and g), the font value, and the font size. Tc, Tw, Tz, TL, Ts, and Tr are additional text state operators that fine-tune the FreeText annotation text.

/TheFont 18 Tf
|           |
|           “text font operator”
|
“font value”, must match a “resource name in the Font entry”
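
To make the reading of a DA string concrete, here is a minimal Python sketch (not Poppler code) that pulls the font tag, the font size, and the fill color out of a DA string. The function names are made up for illustration, and the whitespace tokenizer is a simplifying assumption: it ignores PDF strings, comments, and every operator other than Tf, g, and rg.

```python
def is_operand(token):
    """Operands in a DA string are names (/Cour) or numbers (18, 24.45)."""
    if token.startswith("/"):
        return True
    try:
        float(token)
        return True
    except ValueError:
        return False


def parse_default_appearance(da):
    """Return (font_tag, font_size, fill_rgb) found in a DA string.

    Hypothetical helper: only Tf, g and rg are interpreted; the operands
    of any other operator are simply dropped.
    """
    font_tag, font_size, fill_rgb = None, None, None
    operands = []
    for token in da.split():
        if token == "Tf" and len(operands) >= 2:
            font_tag = operands[-2].lstrip("/")
            font_size = float(operands[-1])
            operands = []
        elif token == "rg" and len(operands) >= 3:
            fill_rgb = tuple(float(value) for value in operands[-3:])
            operands = []
        elif token == "g" and len(operands) >= 1:
            gray = float(operands[-1])
            fill_rgb = (gray, gray, gray)  # gray level expressed as RGB
            operands = []
        elif is_operand(token):
            operands.append(token)
        else:
            operands = []  # some other operator consumed its operands
    return font_tag, font_size, fill_rgb


da = "0 G 0 g 0 Tc 0 Tw 100 Tz 24.45 TL 0 Ts 0 Tr /Cour 18 Tf"
print(parse_default_appearance(da))
```

For the DA string of object 87 this yields the tag Cour, size 18, and a black fill color. A size of 0 would come back as 0.0, which per the rule quoted above means the font is auto-sized.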

 

So in the above PDF snippet, the appearance (AP) is provided inside the annotation dictionary as /AP <</N 119 0 R>>. It points to another object, 119 0, which is shown below:

119 0 obj % The normal state XObjects stream dictionary
<<
/Resources << % Optional but strongly recommended; PDF 1.2
/Font << % The font dictionary
/Helv 556 0 R % The font definition! It's Helvetica Type1, see object 556 below. The tag /Helv can now be used by Tf operator in FreeText.
>>
/ProcSet [
/PDF
/Text
]
>>
>>
stream
...

Here you can find the /Resources key which contains the font dictionary with the /Helv font tag pointing to the object 556 0 below. Let’s look at the object 556 0:

556 0 obj
<<
/BaseFont /Helvetica
/Encoding 1037 0 R
/Name /Helv
/Subtype /Type1
/Type /Font
>>
endobj

You can see the 4 essential key-value pairs in the above font dictionary:

  1. /Type tells that it is a Font.
  2. /Subtype tells about the font type. It’s one of Type0, Type1, MMType1, Type3, TrueType, CIDFontType0, CIDFontType2.
  3. /BaseFont tells the PostScript name, a platform-independent identifier for a font. Type1 fonts have the PostScript name as the FontName field in the .afm file, and as /FontName in the *.pfb file. TrueType fonts (*.ttf) have the PostScript name as Id 6 in the “name table”.
  4. /Encoding tells about the encoding that the font uses. With WinAnsiEncoding (Windows code page 1252), for example, it covers the Western European Latin characters.
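
As a small illustration of these four keys, the following Python sketch models object 556 as a plain dict and checks whether it describes one of the 14 standard fonts. The dict layout and the describe_font helper are assumptions made for this example, not Poppler’s data structures.

```python
# PostScript names of the 14 standard (Base-14) fonts.
STANDARD_14 = {
    "Times-Roman", "Times-Bold", "Times-Italic", "Times-BoldItalic",
    "Helvetica", "Helvetica-Bold", "Helvetica-Oblique", "Helvetica-BoldOblique",
    "Courier", "Courier-Bold", "Courier-Oblique", "Courier-BoldOblique",
    "Symbol", "ZapfDingbats",
}

# Font types listed for the /Subtype key above.
FONT_SUBTYPES = {
    "Type0", "Type1", "MMType1", "Type3",
    "TrueType", "CIDFontType0", "CIDFontType2",
}


def describe_font(font):
    """Summarize a font dictionary given as a plain dict (hypothetical model)."""
    if font.get("Type") != "Font":
        raise ValueError("not a font dictionary")
    subtype = font.get("Subtype")
    if subtype not in FONT_SUBTYPES:
        raise ValueError(f"unknown font subtype: {subtype!r}")
    base_font = font.get("BaseFont")
    return {
        "postscript_name": base_font,
        "subtype": subtype,
        "is_standard_14": subtype == "Type1" and base_font in STANDARD_14,
    }


# Object 556 from above; the /Encoding reference is left out of the model.
helv = {"Type": "Font", "Subtype": "Type1", "BaseFont": "Helvetica", "Name": "Helv"}
print(describe_font(helv))
```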

So, in a nutshell, /AP takes precedence over /DA and points to the appearance stream, whose resource dictionary contains the font dictionary describing the font. In the above example, the annotation content is finally rendered using Helvetica glyphs.

But what if the AP entry is missing? In this case, DA tells about the font family. Let’s begin with the interactive form dictionary, aka AcroForm, to understand how the DA string tells about the font.

AcroForm (Interactive form dictionary): It is a collection of fields for gathering information interactively from the user. AcroForm contains the default resource (DR) key, and its value (of type dictionary) needs to include a Font key. The value of Font maps each resource name to the font to be used as a default font for displaying text in fields.
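
The lookup order described so far can be sketched in a few lines of Python. This is a toy model under stated assumptions: PDF dictionaries are plain dicts, indirect references are already resolved, the appearance stream’s content is kept as a string under a made-up "content" key, and the helper names are hypothetical.

```python
def font_tag_from(operators):
    """Return the font tag used by the first Tf operator, or None."""
    tokens = operators.split()
    if "Tf" not in tokens:
        return None
    index = tokens.index("Tf")
    if index < 2:
        return None
    return tokens[index - 2].lstrip("/")  # the operand before the size


def font_for_annotation(annot, acroform):
    """Pick the font dictionary for a FreeText annotation (toy model)."""
    if "AP" in annot:
        # /AP wins: the tag is resolved in the appearance stream's /Resources.
        stream = annot["AP"]["N"]
        tag = font_tag_from(stream["content"])
        return stream.get("Resources", {}).get("Font", {}).get(tag)
    # No /AP: the tag from /DA is resolved in AcroForm's /DR.
    tag = font_tag_from(annot["DA"])
    return acroform.get("DR", {}).get("Font", {}).get(tag)


helv = {"Type": "Font", "Subtype": "Type1", "BaseFont": "Helvetica"}
cour = {"Type": "Font", "Subtype": "Type1", "BaseFont": "Courier"}
acroform = {"DR": {"Font": {"Cour": cour}}}

with_ap = {
    "DA": "/Cour 18 Tf",
    "AP": {"N": {"Resources": {"Font": {"Helv": helv}},
                 "content": "BT /Helv 18 Tf ET"}},
}
without_ap = {"DA": "0 g /Cour 18 Tf"}

print(font_for_annotation(with_ap, acroform)["BaseFont"])
print(font_for_annotation(without_ap, acroform)["BaseFont"])
```

With the appearance present the annotation resolves to Helvetica, as in the Acrobat example above. Without it, the /Cour tag from DA is looked up in DR and resolves to Courier. A tag missing from DR gives None, which is the situation a viewer has to handle with a fallback font.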

This is the case with Poppler: it creates a FreeText annotation in which the AP entry is missing, and DA tells about the font family. The following diagram explains it in a better way:

In the above example, instead of using a Base-14 (standard) font, we used an embedded font. These fonts are TrueType fonts and are embedded into the PDF document. We will discuss the font configuration in Poppler a bit, but before that, let’s look at the UML diagram of the font object graph prepared by my mentor Tobias Deiminger:

The diagram explains everything that we have discussed so far and is almost accurate.

Poppler has different font configurations, viz. libfontconfig, win32, and generic, for Linux, Windows, and Android respectively. We have to take care of all of them, but as I’m working on Kubuntu, my prime concern is libfontconfig.

Enough about annotations and fonts in PDF. Let me tell you how I explored the domain and the existing code in Poppler.

Poppler’s codebase is quite complex, and as I was working on the Okular project, my domain was restricted to the Poppler Qt5 frontend and the class AnnotFreeText in the Poppler core. After I sent the patch for creating and parsing the DA string in poppler-core, the focus shifted to Annot.cc. After going through the codebase, I realized that Poppler produces the in-memory appearance stream and sets the same font program for all the FreeText annotations in the current PDF document. The generateFreeTextAppearance and createAnnotDrawFont functions are responsible for this. What we require instead is to create the DA string with a meaningful font tag and, in the case of Base-14 fonts, to write the DR dictionary of AcroForm and the font dictionary into the document so that the annotation font can be shown correctly by another document viewer.

Poppler needs to show the correct font when the AP exists, and there should be a mechanism that calls the C API of libfontconfig to get the font file for a non-standard font and embeds it into the document. All of this requires a new API plan.

I created a workaround patch https://bugs.freedesktop.org/attachment.cgi?id=140969 which is the result of my experiment to create different typewriter annotations with different Base-14 fonts on the same page in Okular. It neither writes the font dictionary and DR dictionary to the document nor supports embedded fonts. But it gives an idea of how to carry the work forward towards writing the streams into the document and displaying the fonts correctly. A complete solution should also use the C API of libfontconfig on Linux, and the win32 and generic backends for the other font configurations, to get the stream of the font file.

I think the hardest part is handling the different font configurations that Poppler uses and embedding the font file. Writing the correct stream to the document is another big challenge.

Sources:

[0] Developing with PDF – SafariBooksOnline

[1] PDF ISO-32000 Standard

[2] okular-gsoc2018-typewriter

 

I’m going to Akademy 2018 – Vienna, Austria

As an active open source contributor to the KDE Community and a GSoC 2018 student in the same organization, I’m going to attend Akademy 2018 in Vienna, Austria from August 10 to August 17.

Akademy is KDE’s annual conference and brings together hundreds of attendees from the global community. The venue is Technische Universität Wien (TU Wien), and I’m glad that I’m being sponsored by KDE e.V.

I have a short presentation on my ongoing GSoC project, where I will spread the word about Okular, in the Student Presentations slot on day 1 at 16:30. Besides that, I’m planning to attend the Public Speaking Training and Practicals by Marta Rybczynska and various BoFs over the next 6 days. I will be lucky to witness the awesome speakers there.

See you in Wien!

 

 

GSoC :: Coding Period – Phase Two (June 13th to July 7th): Font color implementation in Poppler and Okular

Hi everyone,

Coding period phase two is now complete, and I’m done with the font color implementation in Poppler’s Qt5 frontend and in Okular’s typewriter annotation tool. I have updated the Phabricator revision D13203, filed a bug in freedesktop’s Bugzilla, and attached the patch there: https://bugs.freedesktop.org/show_bug.cgi?id=107151

As per the agreed timeline, I have patched poppler-qt5 to support the font color by introducing the ‘rg’ operator, which sets the font color in the RGB color model, in the GooString. In Okular, a font color chooser is introduced in the typewriter annotation settings dialog. It sets both the text annotation’s color and the engine color, and hence colorizes the typewriter icon accordingly. The generator side and the doctype XML metadata for saving the text color are also adapted, and the color is well supported in PagePainter too. The review comments (if any) from my mentor, Tobias Deiminger, are yet to come.
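
To show what the ‘rg’ operator adds to a DA string, here is a minimal Python sketch. It is not the poppler-qt5 code: the function names, the order of the Tf and rg operators, and the assumption that the color arrives as 0-255 integer components are all made up for illustration.

```python
def format_number(value):
    """Format a number compactly: 18.0 -> '18', 0.50196 -> '0.502'."""
    text = f"{value:.3f}".rstrip("0").rstrip(".")
    return text or "0"


def build_default_appearance(font_tag, font_size, rgb):
    """Build a DA string with a font selection and an RGB fill color.

    rgb is a tuple of 0-255 integers (an assumption of this sketch);
    the rg operator itself expects components between 0 and 1.
    """
    components = " ".join(format_number(channel / 255) for channel in rgb)
    return f"/{font_tag} {format_number(font_size)} Tf {components} rg"


print(build_default_appearance("Helv", 18, (255, 0, 0)))
```

For a red 18-point annotation with the /Helv tag this produces /Helv 18 Tf 1 0 0 rg. The three numbers in front of rg are the red, green, and blue components of the fill color.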

This is how it works:

Following is my plan for the next phase:

  • Respect font family in Poppler

You can track my commits at https://cgit.kde.org/okular.git/log/?h=gsoc2018_typewriter

Feedback and suggestions are always welcome :)

 

GSoC :: Coding Period – Phase One (May 14th to June 12th): Initial implementation of typewriter annotation tool in Okular

Hi everyone,

Phase one of the coding period is now complete, and I’m done with the initial implementation of the typewriter annotation tool in Okular, along with writing the integration tests for it. I have created the revision on Phabricator and it is currently under review. Some review comments from my mentor are still to come.

As per the agreed timeline, I have implemented a fully functional typewriter tool that creates the annotation with a transparent background in all the supported document formats. The text input UI in the current implementation is a popup QInputDialog window, in line with how the inline note works.

This is how it works:

Thanks to Tobias Deiminger, my mentor, and the other Okular developers who helped me whenever I was stuck.

The typewriter tool icon is inspired by Adobe Reader’s. Currently, we are missing a number of vital features in the other annotations as well as in the typewriter annotation, which we plan to complete in the next phase. I need to do some fixes before proceeding to the other goals of this project.

The first 15 days of phase one were quite busy for me as I had my college exams, so I could only devote 15 hours a week. The last 10 days were spent figuring out how to write the tests and writing a few of them. Following is our next plan:

  • Font color implementation in Poppler
  • Font color chooser in typewriter annotation’s settings dialog
  • Respect font family in Poppler
  • Writing integration tests

You can track my commits at https://cgit.kde.org/okular.git/log/?h=gsoc2018_typewriter

Feedback and suggestions are always welcome :)

Wait for the next post…