GSoC :: Coding Period – Phase Three (July 8th to August 6th): Create and parse DA string in poppler-core and font family implementation

Hi everyone,

Phase three of the coding period is now complete. After the second evaluation, Poppler’s maintainer Albert Astals Cid commented on my bug report https://bugs.freedesktop.org/show_bug.cgi?id=107151#c3 that the DA string was currently created and parsed in the Qt5 frontend. He said the API should be changed so that creating and parsing the DA string and the font color become the concern of poppler-core.

So I had to shift my plans for the font family implementation in Poppler and began reworking the creation and parsing of the DA string. Both the Qt5 frontend and the core were modified, and after a series of reviews I finally submitted the patch, co-authored by my mentor Tobias Deiminger: https://bugs.freedesktop.org/attachment.cgi?id=140963.

Meanwhile, my mentor was emailing me and pushing his research results on the font family implementation in Poppler into the scratch repo. In the final days, once I had finished the font color and DA creation and parsing patch, I began hacking on font family support. I have written a full post about my own understanding and experiments here: FreeText Annotation :: Font family implementation in Poppler. My experimental patch can be found here.

After applying D13203 and both patches, this is how it finally works:

  1. It adds the typewriter tool to the annotation toolbar.
  2. You can select a font color for the annotation’s text.
  3. You can choose a base-14 font and it is displayed correctly.

As you can see in the above image, you can use different base-14 fonts for different typewriter annotations within the same loaded document. However, the patch neither writes the font dictionary into the PDF document nor preserves the fonts on reload: they fall back to the default font, Helvetica.

Still, this experimental patch is very promising: it gives intuition for an accurate solution and for supporting embedded fonts.

What’s Left

The experimental font family implementation for the base-14 fonts is quite buggy and is not an accurate solution. It is also not well structured. A proper implementation needs to write the streams and embed fonts into the PDF document, and we need to create an in-memory appearance with an in-memory resource dictionary whose /Font entry points to the on-disk font subdictionary inside AcroForm->DR.
It would actually be good to save the in-memory appearance as /AP into the document (PDF 2.0 requires this, and a number of existing Okular issues arise from not saving AP). But that’s unrelated to the font family issue.
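As a rough illustration of the in-memory appearance described above, the following C++ sketch builds a form XObject whose /Resources /Font entry is an indirect reference (for example 12 0 R) to a font object assumed to also be listed in AcroForm->DR. It is not Poppler code: the helper names, the object number and the 2 pt text inset are hypothetical, and it only produces PDF text, without line wrapping or font metrics.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Escapes the characters that must be backslash-escaped inside a PDF literal string.
std::string escapePdfString(const std::string &s) {
    std::string out;
    for (char ch : s) {
        if (ch == '(' || ch == ')' || ch == '\\')
            out += '\\';
        out += ch;
    }
    return out;
}

// Content stream for a single-line FreeText appearance (hypothetical helper).
// fontTag must match a key in the /Font resource dictionary written below.
std::string buildAppearanceContent(const std::string &fontTag, double fontSize,
                                   const std::string &text) {
    std::ostringstream os;
    os << "BT\n"
       << '/' << fontTag << ' ' << fontSize << " Tf\n"
       << "2 2 Td\n"  // arbitrary 2 pt inset from the lower-left corner of the BBox
       << '(' << escapePdfString(text) << ") Tj\n"
       << "ET";
    return os.str();
}

// Form XObject whose /Resources /Font entry is an indirect reference to the
// font object that is (by assumption) also listed in AcroForm -> DR -> Font.
std::string buildAppearanceXObject(const std::string &fontTag, int fontObjNum, int fontGen,
                                   double width, double height, const std::string &content) {
    std::ostringstream os;
    os << "<< /Type /XObject /Subtype /Form"
       << " /BBox [0 0 " << width << ' ' << height << ']'
       << " /Resources << /Font << /" << fontTag << ' ' << fontObjNum << ' ' << fontGen
       << " R >> >>"
       << " /Length " << content.size() << " >>\n"
       << "stream\n" << content << "\nendstream";
    return os.str();
}
```

Writing such a string as a new indirect object and pointing the annotation’s /AP /N entry at it would be the step that turns the in-memory appearance into a saved one.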

I want to thank my mentor for constantly motivating and guiding me and for the enormous effort he put in. It was really nice working with him :)

 

FreeText Annotation :: Font family implementation in Poppler

Before getting started with Poppler, let’s first understand the PDF structure and different terminologies used.

Annotations are PDF objects that can enable user-clickable actions and can contain text or other graphics and media. The following is a snippet of a FreeText annotation created on a PDF page in Acrobat Reader, shown in decompressed form:

87 0 obj % The FreeText Annot dictionary
<<
/AP <<
/N 119 0 R % Indirect reference to the normal state XObject
>>
/DA (0 G 0 g 0 Tc 0 Tw 100 Tz 24.45 TL 0 Ts 0 Tr /Cour 18 Tf)
/Subtype /FreeText
/Type /Annot
>>
endobj

Let’s go over the important terms related to font family and appearance.

  • Appearance Stream: It is a form XObject, i.e. a reusable graphics object with a content stream that defines certain graphics. An appearance stream is a self-contained content stream that shall be rendered inside the annotation rectangle. In the above snippet, /AP is the appearance dictionary, and its /N (normal appearance) entry refers to the appearance stream.
  • Default Appearance: The default appearance string (DA) shall be used in formatting the text. It contains any graphics state or text state operators needed to establish the graphics state parameters, such as text size and color, for displaying the field’s variable text.
    Only operators that are allowed within text objects shall occur in this string. At a minimum, the string shall include a Tf (text font) operator along with its two operands, font and size. The specified font value shall match a resource name in the Font entry of the default resource dictionary (the DR entry of the interactive form dictionary). A zero value for size means that the font shall be auto-sized: its size shall be computed as a function of the height of the annotation rectangle.
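  • Tf operator: In the above snippet, the DA string /DA (0 G 0 g 0 Tc 0 Tw 100 Tz 24.45 TL 0 Ts 0 Tr /Cour 18 Tf) gives us the font color as well as the font value and size. 0 G and 0 g set the stroking and non-stroking colors to gray level 0 (black), and Tc, Tw, Tz, TL, Ts and Tr are additional text state operators (character spacing, word spacing, horizontal scaling, leading, rise and rendering mode) that fine-tune the FreeText annotation text.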

/TheFont 18 Tf
|        |  |
|        |  +-- “text font operator”
|        +----- font size
+-------------- “font value”, must match a “resource name in the Font entry”
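To make the Tf rule concrete, here is a minimal C++ sketch of parsing a DA string like the one above. It is not Poppler’s implementation (the DefaultAppearance struct and parseDA function are hypothetical). It assumes whitespace-separated tokens without literal strings or arrays, keeps only the last fill color (g, rg or k) and the Tf operands, and ignores all other operators.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical result type for this sketch.
struct DefaultAppearance {
    std::string fontTag;            // resource name without '/', e.g. "Cour"
    double fontSize = 0;            // 0 means "auto-size"
    std::vector<double> fillColor;  // operands of the last g (1), rg (3) or k (4)
};

// True if the whole token is a number (PDF integers and reals).
static bool parseNumber(const std::string &tok, double &out) {
    char *end = nullptr;
    out = std::strtod(tok.c_str(), &end);
    return end != tok.c_str() && *end == '\0';
}

// Returns std::nullopt when the mandatory Tf operator is missing or malformed.
std::optional<DefaultAppearance> parseDA(const std::string &da) {
    std::istringstream in(da);
    std::vector<std::string> operands;  // operands seen since the last operator
    DefaultAppearance result;
    bool sawTf = false;
    std::string tok;
    while (in >> tok) {
        double num;
        if (tok[0] == '/' || parseNumber(tok, num)) {
            operands.push_back(tok);  // names and numbers are operands
            continue;
        }
        if (tok == "Tf") {
            double size;
            if (operands.size() < 2 || operands[operands.size() - 2][0] != '/' ||
                !parseNumber(operands.back(), size))
                return std::nullopt;
            result.fontTag = operands[operands.size() - 2].substr(1);
            result.fontSize = size;
            sawTf = true;
        } else if (tok == "g" || tok == "rg" || tok == "k") {
            const std::size_t n = tok == "g" ? 1 : (tok == "rg" ? 3 : 4);
            if (operands.size() < n)
                return std::nullopt;
            result.fillColor.clear();
            for (std::size_t i = operands.size() - n; i < operands.size(); ++i) {
                double c;
                if (!parseNumber(operands[i], c))
                    return std::nullopt;
                result.fillColor.push_back(c);
            }
        }
        // Every operator (including G, Tc, Tw, Tz, TL, Ts, Tr) consumes its operands.
        operands.clear();
    }
    if (!sawTf)
        return std::nullopt;
    return result;
}
```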

 

So in the above PDF snippet, the appearance dictionary (/AP) is provided inside the annotation dictionary: /AP <</N 119 0 R>>. Its /N entry points to the indirect object 119 0, shown below:

119 0 obj % The normal state XObjects stream dictionary
<<
/Resources << % Optional but strongly recommended; PDF 1.2
/Font << % The font dictionary
/Helv 556 0 R % The font definition! It's Helvetica Type1, see object 556 below. The tag /Helv can now be used by Tf operator in FreeText.
>>
/ProcSet [
/PDF
/Text
]
>>
>>
stream
...

Here you can find the /Resources key, which contains the font dictionary with the /Helv font tag pointing to object 556 0. Let’s look at object 556 0:

556 0 obj
<<
/BaseFont /Helvetica
/Encoding 1037 0 R
/Name /Helv
/Subtype /Type1
/Type /Font
>>
endobj

You can see the 4 essential key-value pairs in the above font dictionary:

  1. /Type tells that it is a Font.
  2. /Subtype tells about the font type. It’s one of Type0, Type1, MMType1, Type3, TrueType, CIDFontType0, CIDFontType2.
  3. /BaseFont tells the PostScript name, a platform-independent identifier for a font. Type1 fonts store the PostScript name as the FontName field in the *.afm file and as /FontName in the *.pfb file. TrueType fonts (*.ttf) store it as name ID 6 in the “name” table.
  4. /Encoding tells about the encoding that the font uses. With WinAnsiEncoding, the font covers the Western European Latin character set (Windows code page 1252).
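For base-14 fonts, a font dictionary like object 556 can be written without embedding any font program. The C++ sketch below checks a /BaseFont name against the standard 14 fonts and serializes a matching Type1 font dictionary. The function names are hypothetical, and the sketch uses the /WinAnsiEncoding name directly instead of an indirect encoding dictionary like 1037 0 R.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <optional>
#include <string>

// The 14 standard Type 1 fonts that every conforming PDF reader provides.
constexpr std::array<const char *, 14> kBase14Fonts = {
    "Times-Roman", "Times-Bold", "Times-Italic", "Times-BoldItalic",
    "Helvetica", "Helvetica-Bold", "Helvetica-Oblique", "Helvetica-BoldOblique",
    "Courier", "Courier-Bold", "Courier-Oblique", "Courier-BoldOblique",
    "Symbol", "ZapfDingbats"};

bool isBase14(const std::string &baseFont) {
    return std::find(kBase14Fonts.begin(), kBase14Fonts.end(), baseFont) != kBase14Fonts.end();
}

// Builds a font dictionary like object 556 (without the obsolescent /Name key).
// Base-14 fonts need no embedded font program, so other names are rejected here.
std::optional<std::string> makeBase14FontDict(const std::string &baseFont) {
    if (!isBase14(baseFont))
        return std::nullopt;
    std::string dict = "<< /Type /Font /Subtype /Type1 /BaseFont /" + baseFont;
    // Symbol and ZapfDingbats are symbolic fonts with built-in encodings.
    if (baseFont != "Symbol" && baseFont != "ZapfDingbats")
        dict += " /Encoding /WinAnsiEncoding";
    return dict + " >>";
}
```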

So, in a nutshell, /AP takes precedence over /DA. It points to the appearance stream, whose resource dictionary contains the font dictionary describing the font. In the above example, the annotation content is finally rendered using Helvetica glyphs.

But what if /AP is missing? In that case, DA determines the font family. Let’s begin with the interactive form dictionary, aka AcroForm, to understand how the DA string specifies the font.

AcroForm (Interactive form dictionary): It is a collection of fields for gathering information interactively from the user. AcroForm contains the default resources (DR) key, whose value is a dictionary that needs to include a Font key. The value of Font is a dictionary mapping resource names to the fonts used as default fonts for displaying text in fields.

This is the case with Poppler, which creates FreeText annotations without an /AP entry, so the DA string determines the font family. The following diagram explains it better:

In the above example, instead of a base-14 (standard) font, an embedded font is used. These fonts are TrueType fonts embedded into the PDF document. We will discuss font configuration in Poppler shortly, but first let’s look at the UML diagram of the font object graph prepared by my mentor Tobias Deiminger:

The diagram explains everything that we have discussed so far and is almost accurate.
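The lookup order described above (/AP first, then DA plus AcroForm->DR) can be sketched in C++ with a deliberately simplified model in which each /Font resource dictionary is just a map from resource tag to /BaseFont name. FreeTextAnnot and resolveBaseFont are hypothetical names for this illustration; Poppler’s real object model is considerably richer.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Toy model: a /Font resource dictionary reduced to "resource tag -> BaseFont name".
using FontResources = std::map<std::string, std::string>;

struct FreeTextAnnot {
    std::optional<FontResources> apFonts;  // /AP /N -> /Resources /Font, if /AP exists
    std::string apFontTag;                 // tag used by Tf inside the AP content stream
    std::string daFontTag;                 // tag used by Tf inside the /DA string
};

// /AP takes precedence; without it, the DA tag is resolved in AcroForm /DR /Font.
std::optional<std::string> resolveBaseFont(const FreeTextAnnot &annot,
                                           const FontResources &acroFormDRFonts) {
    const bool hasAP = annot.apFonts.has_value();
    const FontResources &fonts = hasAP ? *annot.apFonts : acroFormDRFonts;
    const std::string &tag = hasAP ? annot.apFontTag : annot.daFontTag;
    auto it = fonts.find(tag);
    if (it == fonts.end())
        return std::nullopt;  // unknown tag: a viewer would fall back to a default font
    return it->second;
}
```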

Poppler has different font configuration backends, viz. libfontconfig, win32 and generic, for Linux, Windows and Android respectively. We have to take care of all of them, but as I’m working on Kubuntu, my prime concern is libfontconfig.

Enough about annotations and fonts in PDF. Let me tell you how I explored the domain and the existing code in Poppler.

Poppler’s codebase is quite complex, and as I was working on the Okular project, my domain was restricted to the Poppler Qt5 frontend and the AnnotFreeText class in Poppler core. After I sent the patch for creating and parsing the DA string in poppler-core, the focus shifted to Annot.cc. Going through the codebase, I realized that Poppler produces an in-memory appearance stream and uses the same font program for all FreeText annotations in the current PDF document. The generateFreeTextAppearance and createAnnotDrawFont functions are responsible for this. What we require instead is to create the DA string with a meaningful font tag and, in the case of base-14 fonts, to write the DR entry of AcroForm and the font dictionary into the document, so that the annotation font is shown correctly by other document viewers.
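One way to get a meaningful font tag for the DA string is to derive it from the /BaseFont name and avoid clashes with tags already present in AcroForm->DR->Font. The C++ sketch below assumes a hypothetical naming scheme (the alphanumeric characters of the base font name plus a numeric suffix); it is not what Poppler currently does.

```cpp
#include <cassert>
#include <cctype>
#include <set>
#include <string>

// Hypothetical naming scheme: keep the alphanumeric characters of the BaseFont
// name and append "_1", "_2", ... until the tag no longer clashes with DR.
std::string makeUniqueFontTag(const std::string &baseFont,
                              const std::set<std::string> &existingTags) {
    std::string stem;
    for (unsigned char c : baseFont)
        if (std::isalnum(c))
            stem += static_cast<char>(c);
    if (stem.empty())
        stem = "F";  // fallback for names without any alphanumeric characters
    std::string tag = stem;
    for (int i = 1; existingTags.count(tag) != 0; ++i)
        tag = stem + "_" + std::to_string(i);
    return tag;
}
```

A fuller version would first check whether DR already maps some tag to the same font and reuse that tag instead of adding a duplicate entry.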

Poppler needs to show the correct font when /AP exists, and there should be a mechanism to call the libfontconfig C API to get the font file for a non-standard font and embed it into the document. All of this requires a new API plan.

I created a workaround patch https://bugs.freedesktop.org/attachment.cgi?id=140969, the result of my experiment to create typewriter annotations with different base-14 fonts on the same page in Okular. It neither writes the font dictionary and DR entry to the document nor supports embedded fonts, but it gives the idea and intuition for carrying the work forward towards writing the streams into the document and displaying the fonts correctly. A complete solution should also obtain the font file’s stream through the matching font configuration backend: the libfontconfig C API on Linux, win32 on Windows, and generic for the other platforms.

I think the hardest part is handling the different font configurations used in Poppler and embedding the font file. Writing the correct streams to the document is another big challenge.

Sources:

[0] Developing with PDF – SafariBooksOnline

[1] PDF ISO-32000 Standard

[2] okular-gsoc2018-typewriter

 

I’m going to Akademy 2018 – Vienna, Austria

Being an active open source contributor to the KDE Community and a GSoC 2018 student in the same organization, I’m going to attend Akademy 2018 in Vienna, Austria, from August 10 to August 17.

Akademy is KDE’s annual conference and brings together hundreds of attendees from the global community. The venue is Technische Universität Wien (TU Wien), and I’m glad to be sponsored by KDE e.V.

I will give a short presentation on my ongoing GSoC project in the Student Presentations slot on day 1 at 16:30, where I will spread the word about Okular. Besides that, I’m planning to attend the Public Speaking Training and Practicals by Marta Rybczynska and various BoFs over the next 6 days. I feel lucky to get to see the awesome speakers there.

See you in Wien!

 

 

My encounter with academic fiasco and depression

In the end, the institute and the place don’t matter. It’s the person who only matters.

I wasted a year regretting my poor performance in JEE and the boards due to typhoid.

After all, the meeting point is now the same; both of them would be looking for a Ph.D. ahead!

And now that makes no difference at all!

A child of four wanted to be a loco pilot in the Indian Railways,

A schoolboy of twelve wanted to be a fighter pilot of Mig-21,

Excelling in academics,

This lad with a 10 CGPA in 10th std came to know about JEE in the summer of 2013,

Being a self-learner, he decided to prepare for JEE all alone at his home,

Cramming nothing but observing the movements of the birds using vectors on the terrace,

A boy from the small municipality area used the 20 MB internet pack as the only source to clear the doubts,

Without coaching and tuitions and without any fellow companion,

He topped the Allen Distance Test Series twice,

So he was offered admission to the Kota batch but declined,

His passion for computer science and programming grew while following the school curriculum,

A KV student without any knowledge of KVPY, International Olympiads,

Went on preparing for JEE and learned programming & development at leisure.

But at the beginning of 2015, the crucial year for every 12th std student,

He developed some strange kind of fever,

The local clinic treated him for viral fever,

It was late February when he was diagnosed with stage II typhoid,

While in the phase of near death,

He successfully flunked his 12th boards and the JEE Mains on April 4,

where he was confident enough to acquire a CSE seat at some IIT!

And, without hope but with some luck, got admitted to a Regional Engg College,

But the depressed boy of 18 was unable to cope with college and hostel life,

Cried all day and night, criticized his college and crushed his past examination times,

Being on anti-depressants, he finally dropped out of the college.

Took a six-month break in which he rebuilt himself,

Rose above the herd mentality,

Followed his heart, believed in himself and joined a local Engg college,

Continued his degree and his passion for computers,

After all, he can’t get a Ph.D. without a formal degree.

So now being a successful Google Summer of Code student,

And an aspiring machine learning research enthusiast,

Do you think life is always unfair?

 

GSoC :: Coding Period – Phase Two (June 13th to July 7th): Font color implementation in Poppler and Okular

Hi everyone,

Phase two of the coding period is now complete, and I’m done with the font color implementation in Poppler’s Qt5 frontend and in Okular’s typewriter annotation tool. I have updated the Phabricator revision D13203, filed a bug, and attached the patch in freedesktop’s Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107151.

As per the agreed timeline, I have patched poppler-qt5 with font color support by introducing the ‘rg’ operator into the DA GooString, which sets the font color in the RGB color model. In Okular, a font color chooser is introduced in the typewriter annotation settings dialog; it sets both the text annotation’s color and the engine color, and hence colorizes the typewriter icon accordingly. The generator side and the doctype XML metadata for saving the text color are also adapted, and it is well supported in PagePainter too. The review comments (if any) from my mentor, Tobias Deiminger, are yet to come.
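For illustration, a DA string carrying the font color via rg could be assembled as in the C++ sketch below. makeDefaultAppearance is a hypothetical helper, not the poppler-qt5 code; it assumes the color components are already in the PDF range [0, 1], which is what, for example, QColor::redF() returns.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: r, g, b must already be in the PDF range [0, 1].
std::string makeDefaultAppearance(const std::string &fontTag, double fontSize,
                                  double r, double g, double b) {
    std::ostringstream os;
    os << r << ' ' << g << ' ' << b << " rg "          // non-stroking (fill) RGB color of the text
       << '/' << fontTag << ' ' << fontSize << " Tf";  // font resource tag and size
    return os.str();
}
```

Placing rg before Tf is a free choice here; the order of these two operators inside a DA string does not matter as long as Tf is present.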

This is how it works:

Following is my plan for the next phase:

  • Respect font family in Poppler

You can track my commits at https://cgit.kde.org/okular.git/log/?h=gsoc2018_typewriter

Feedback and suggestions are always welcome :)