Unicode Conference 36: Day 1

These are live blog notes from the 2012 Unicode Conference in Santa Clara, California, 22nd October 2012.

Usual disclaimer for live blogging: These are informal notes taken by Dave Crossland at the event, and may or may not be similar to what was said by the people who spoke on these topics. Probably if something here is incorrect it is because Dave mistyped it or misunderstood, and if anyone wants corrections, they should email him immediately (dave@understandingfonts.com) – or post a comment.

Developing an OpenType font for complex scripts using FontForge
===============================================================

Pravin Dinkar Satpute, Senior Software Engineer, Red Hat

Pravin Satpute: I have been working at Red Hat (RHAT) for 7 years in i18n, on Indic scripts. I developed 6 Unicode characters for Devanagari, for the Kashmiri language. I’m project lead for the Lohit fonts, which support the 9 major scripts and are the default fonts in Fedora, Debian and Ubuntu; some are also used in ChromeOS and Android. Wikipedia loads some

I am also leading the Liberation Fonts project, the default fonts in LibreOffice.

I want all the world’s scripts in Unicode, so that everyone on earth can have their language on their terminal.

I’m going to show the production of OpenType fonts and why they are important; then FontForge, with a 5 minute demo for Latin; then what’s involved in OpenType fonts, for 20 minutes; and then a demonstration of making a Devanagari font with FontForge.

Imagine a developer has made an excellent application, but the first screen in that app has broken fonts. What will you feel if you see a square with 4 digits in it, or a dotted circle?

A friend working on embedded systems faced a problem: he was getting U+0916 (DEVANAGARI LETTER KHA) instead of U+0915 (DEVANAGARI LETTER KA). It took 2 days to realise that it was the font that was buggy! Understanding fonts is important 🙂

Why this session?

1 Many scripts have been added to Unicode since its early versions, for example Kharoshthi (4.1) and Lepcha (5.1). If ordinary people want to use these languages they can’t, because fonts either don’t exist or are VERY expensive. So we should have libre fonts for these languages available to all developers, so they can support these users in their applications.

2 People in those communities WANT to do font development. I tell them Unicode is a good thing. We don’t have people from those communities here today, but I hope they will learn the things we are covering today.

The number of scripts in Unicode keeps increasing.

Several knowledge domains are involved in making an OpenType font: linguistics, knowing the language and its writing system; art, the visual drawing of letters; and technology, since the letters are shaped by the operating system, so you need technical knowledge to write the layout tables.

In the RHAT office in Pune, there are people from all the language communities I can consult with.

Why FontForge?

It’s the only libre font editor available today.

I started using FontForge in 2005; it was very complex to write OpenType features then. Today it’s very easy!

Continuous improvement: there are active users and developers around this tool, and you’ll typically get a reply to your query within 1 or 2 days. I found a problem with the grid fitting tables, posted about it on the mailing list, and got a fix later that day!

It runs on GNU+Linux, Windows and MacOS X.

What is a complex script?

[audience ideas]

In Devanagari, characters are reordered, so what we type and what we see are different: once the characters are ligated, the result looks totally different from the typed sequence.
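To make the "what we type vs. what we see" point concrete, here is a minimal Python sketch of one Devanagari reordering case: the vowel sign I (U+093F) is typed after its consonant but drawn before it. It is a simplification that assumes single-consonant syllables; real shapers handle conjuncts, reph and much more, and `visual_order` is a hypothetical helper, not part of any shaping library.

```python
# Simplified sketch (assumption: every syllable is a single consonant).
# Moves the pre-base vowel sign I (U+093F) in front of the character it
# follows; real Indic shapers handle conjuncts, reph, nukta and more.
VOWEL_SIGN_I = "\u093F"


def visual_order(text: str) -> list[str]:
    """Return the characters of `text` in (simplified) visual display order."""
    out: list[str] = []
    for ch in text:
        if ch == VOWEL_SIGN_I and out:
            # Insert the matra before the preceding consonant.
            out.insert(len(out) - 1, ch)
        else:
            out.append(ch)
    return out


typed = "\u0915\u093F"  # KA + VOWEL SIGN I, in typed (logical) order
print([f"U+{ord(c):04X}" for c in typed])                # ['U+0915', 'U+093F']
print([f"U+{ord(c):04X}" for c in visual_order(typed)])  # ['U+093F', 'U+0915']
```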

Can we call CJK complex? I don’t think so, not at the rendering level. The character sets are huge, around 6,000 in Japanese and more in Chinese, but the complexity is at the input level. Personally I feel that Indic scripts could go that way: every syllable in an Indic language would have a key code and the rest would be automatic. I hope so one day 🙂

The complexity can be at the OS level or at the font level, and it’s complicated. Windows, Mac and GNU+Linux have different shaping engines for the OpenType standard.

Indic and Arabic have re-ordering all the time.

I will do a small demo of OpenType in FontForge.

[The GNOME 3 accessibility panel has a zoom tool]

I open the latest FontForge from git. [Default theme]

I open LiberationSerif-Regular.sfd and copy the 4 glyphs a, b, c and d. The em sizes don’t match, so I scale them by 50% from the glyph origin.
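The 50% scale approximates the ratio between the two em sizes. A minimal Python sketch of that arithmetic, assuming a source em of 2048 units (typical for TrueType fonts such as Liberation) and a destination em of 1000 units (a common default for new fonts); neither value was stated in the demo, and the helper names are hypothetical:

```python
# Sketch: scale outline points about the glyph origin (0, 0).
# Assumed em sizes: 2048 (source) and 1000 (destination), for illustration only.
from fractions import Fraction


def scale_factor(src_em: int, dst_em: int) -> Fraction:
    """Exact ratio between destination and source em sizes."""
    return Fraction(dst_em, src_em)


def scale_points(points: list[tuple[int, int]], factor: Fraction) -> list[tuple[int, int]]:
    """Scale each (x, y) point about the origin, rounding to whole font units."""
    return [(round(x * factor), round(y * factor)) for x, y in points]


factor = scale_factor(2048, 1000)
print(float(factor))  # 0.48828125, i.e. roughly the 50% used in the demo
print(scale_points([(0, 0), (1024, 1400), (2048, -400)], factor))
# [(0, 0), (500, 684), (1000, -195)]
```

Using `Fraction` keeps the ratio exact, so rounding happens only once per coordinate.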

Then I go to the MS Typography site, find the OpenType specification’s Features page, and read the documentation for the liga feature.

Then I go to Element, Font Info, Lookups, Add Lookup, choose Ligature Substitution, and add a liga feature for the Latin (default) script.

I add a subtable and fill in the Ligature Glyph Name column first, which is the name of the final shape, and then in the Source Glyph Names column I type

a b c

And a mouse hover shows a preview.

Q: Is it glyphs or characters?

A: You can see that the glyph named a is associated with the Unicode value U+0061; FontForge does this automatically.

Now I generate a TTF and install the Test1 font; then in gedit I pick the Test1 font, and if I type abc, I see d!

It’s the font doing the magic here 🙂
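A toy Python model of what the Test1 font does: characters are mapped to glyph names (the job of the cmap table), then the liga lookup replaces the glyph sequence a b c with d. The hard-coded tables and function names are hypothetical; a real shaper reads these tables from the font and supports many more lookup types.

```python
# Toy model of the demo font. Both tables are hypothetical stand-ins for the
# font's cmap table and the liga lookup built in FontForge ("a b c" -> "d").
CMAP = {"a": "a", "b": "b", "c": "c", "d": "d"}  # character -> glyph name
LIGATURES = {("a", "b", "c"): "d"}  # source glyph names -> ligature glyph


def apply_ligatures(glyphs: list[str]) -> list[str]:
    """Greedy left-to-right ligature substitution, longest match first."""
    out: list[str] = []
    max_len = max(len(k) for k in LIGATURES)
    i = 0
    while i < len(glyphs):
        for n in range(max_len, 1, -1):
            key = tuple(glyphs[i:i + n])
            if key in LIGATURES:
                out.append(LIGATURES[key])
                i += n
                break
        else:  # no ligature starts here: copy the glyph through
            out.append(glyphs[i])
            i += 1
    return out


def shape(text: str) -> list[str]:
    """Map characters to glyphs, then apply the ligature lookup."""
    return apply_ligatures([CMAP[ch] for ch in text])


print(shape("abc"))   # ['d']
print(shape("abcd"))  # ['d', 'd']
print(shape("ab"))    # ['a', 'b']
```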

Q: How does Japanese ruby typesetting work? Where you have kana written above kanji?

Jungshik: It’s done by browsers with CSS; the hiragana glyphs are placed entirely by the layout engine.

Q: How is the order of substitution rules defined?

A: It’s defined by the rules; you can see in microsoft.com/typography/…/otfntdev/features.html that it’s ccmp, liga, clig, …

OpenType is a cross-platform standard from Adobe and Microsoft that extends Apple’s TrueType format. It has international character support, large glyph sets, and many advanced typographic features.

How do they work?

Unicode characters are input into an OpenType Layout Shaper (OTLS), which also takes an OpenType font as input, and which outputs glyph IDs in that font.

The OTLS reviews the sequence of Unicode characters and asks what kind of character each one is; based on the OpenType spec, it applies a number of features to the input characters. The shaper looks for these features in the font, processes them, and finally outputs the glyph IDs.

In FontForge we can see these Glyph IDs in the Font View as the first number on the toolbar.

The whole magic behind OpenType is the OTLS. We can develop an OpenType font but we can’t b

Jungshik: Glyph IDs alone aren’t sufficient; the OTLS also emits x and y positions.

Q: And with the glyph IDs and positions, any dumb renderer can render text?

A: Yes, this is one part of the overall rendering stack.
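To illustrate Jungshik’s point, here is a Python sketch of the "dumb renderer" end of the stack: it receives glyph IDs plus advances and offsets from a shaper and only has to turn them into drawing positions. The field names loosely follow the advance/offset values that shapers such as HarfBuzz report, but the `ShapedGlyph` class and the sample numbers are made up for illustration, and only horizontal text is handled.

```python
# Sketch of a renderer consuming shaper output. ShapedGlyph and the sample
# values are hypothetical; horizontal text only, so y_advance is omitted.
from dataclasses import dataclass


@dataclass
class ShapedGlyph:
    glyph_id: int
    x_advance: int     # how far the pen moves after this glyph
    x_offset: int = 0  # shift of this glyph relative to the pen position
    y_offset: int = 0


def absolute_positions(run: list[ShapedGlyph],
                       origin: tuple[int, int] = (0, 0)) -> list[tuple[int, int, int]]:
    """Return (glyph_id, x, y) drawing positions for a shaped run."""
    pen_x, pen_y = origin
    placed = []
    for g in run:
        placed.append((g.glyph_id, pen_x + g.x_offset, pen_y + g.y_offset))
        pen_x += g.x_advance
    return placed


# A base glyph, then a zero-advance mark pulled back over it, then another base.
run = [ShapedGlyph(42, 600),
       ShapedGlyph(97, 0, x_offset=-300, y_offset=50),
       ShapedGlyph(43, 550)]
print(absolute_positions(run))  # [(42, 0, 0), (97, 300, 50), (43, 600, 0)]
```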

Here I type ‘pravin’ in gedit, which is 3 syllables in Devanagari.

The OTLS does this:

1. Analyse the text

2. Reorder characters as the script requires

3. Shape the glyph sequence with GSUB, then GPOS
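The "analyse the text" step includes finding syllable boundaries. A simplified Python sketch, assuming that ‘pravin’ is spelled प्रवीण and using a much-reduced rule: a consonant starts a new syllable unless it follows a virama, and combining marks stay with the current syllable. Real shapers use a fuller syllable grammar, and `syllables` is a hypothetical helper.

```python
# Simplified syllable clustering for Devanagari (assumed, reduced rules):
# - combining marks (categories Mn/Mc: vowel signs, virama, nukta...) join
#   the current cluster;
# - a consonant joins the current cluster only if that cluster ends in a virama.
import unicodedata

VIRAMA = "\u094D"


def is_consonant(ch: str) -> bool:
    # Basic consonants only: LETTER KA (U+0915) to LETTER HA (U+0939).
    return "\u0915" <= ch <= "\u0939"


def syllables(text: str) -> list[str]:
    clusters: list[str] = []
    for ch in text:
        is_mark = unicodedata.category(ch) in ("Mn", "Mc")
        joins_previous = bool(clusters) and (
            is_mark or (is_consonant(ch) and clusters[-1].endswith(VIRAMA))
        )
        if joins_previous:
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


pravin = "\u092A\u094D\u0930\u0935\u0940\u0923"  # प्रवीण (assumed spelling)
print(len(syllables(pravin)))  # 3
```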

Examples of OTLS implementations are Uniscribe on Windows, ICU in LibreOffice, and HarfBuzz-NG on many free systems. A few years ago Pango, Qt and ICU all had different OTLS implementations, and HarfBuzz was started to unify them. HarfBuzz is meant to be fully compatible with Uniscribe, so that a single font works the same everywhere. This makes font development easier.

Jungshik: What is Apple using on Mac OS?

A: They use AAT.

Jungshik: But they support OpenType too: in addition to AAT, they support OpenType in CoreText. I wonder what is behind that.

A: I wonder if they use HarfBuzz?

Jungshik: Perhaps ICU, but I don’t know.

A: I have used AAT and I like that approach.

Jungshik: I like it too.

Q: Split vowels?

A: No, not in Devanagari, but in Malayalam, for example.

Demonstration

[Brief talk about Unicode encodings]

Designers draw shapes on paper and scan them. Drawing on screen is different, and they are very used to drawing on paper. So they draw, scan, and place the scan in the background. Once it’s in the background, they use Element, Autotrace.

FontForge has traced the image into vector outlines. Now we remove the background image. The problem is that there are SO many points. We can remove them by selecting the points and merging them with Ctrl+M. We can also draw over the background image by hand instead of using Autotrace. I am no artist, and it took me a whole day to draw the Indian rupee sign.

Q: Do you have a different rupee shape for each script?

A: We do a little bit
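FontForge’s merge works on points you select by hand. As a swapped-in illustration of reducing autotraced points automatically, here is a Python sketch of the Ramer-Douglas-Peucker algorithm. It treats the outline as an open polyline of straight segments, whereas real glyph contours are closed and usually contain curves, so it shows the general idea rather than what FontForge does.

```python
# Sketch only: Ramer-Douglas-Peucker point reduction on an open polyline.
# This is NOT FontForge's Merge; it drops points that change the shape by
# less than `tolerance` (in font units).
import math

Point = tuple[float, float]


def _distance_to_line(p: Point, a: Point, b: Point) -> float:
    """Perpendicular distance from p to the infinite line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0:  # a and b coincide: fall back to point-to-point distance
        return math.hypot(px - ax, py - ay)
    return abs(dy * (px - ax) - dx * (py - ay)) / length


def simplify(points: list[Point], tolerance: float) -> list[Point]:
    """Keep the endpoints; recursively keep the farthest point while it exceeds tolerance."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, dmax = 0, -1.0
    for i in range(1, len(points) - 1):
        d = _distance_to_line(points[i], first, last)
        if d > dmax:
            index, dmax = i, d
    if dmax <= tolerance:
        return [first, last]
    # Split at the farthest point and simplify both halves.
    left = simplify(points[: index + 1], tolerance)
    right = simplify(points[index:], tolerance)
    return left[:-1] + right  # drop the split point duplicated in both halves


traced = [(0, 0), (1, 0.05), (2, -0.05), (3, 0), (3, 10)]
print(simplify(traced, 0.5))  # [(0, 0), (3, 0), (3, 10)]
```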

So, let’s copy some glyphs from Lohit Devanagari into a new font.

You can see that font development is a time-consuming process, and I think that’s why there is not much community contribution to libre fonts.

The ligatures don’t have Unicode code points, so the glyphs belong in the Private Use Area.

Q: Are there rules for assigning PUA code points?

Roozbeh: We are not meant to point the cmap table at those glyphs at all.

We can set the glyph name to ‘khsa’, set its Unicode value to -1 (unencoded), and paste the base glyphs into their correct slots.
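Roozbeh’s advice can be sketched in Python: build the character map only from encoded glyphs, skip glyphs whose Unicode value is -1, and flag anything mapped into the BMP Private Use Area (U+E000 to U+F8FF). The glyph list and `build_cmap` are hypothetical; KA + VIRAMA + SSA (U+0915 U+094D U+0937) is presumably the sequence behind the ‘khsa’ ligature in the demo.

```python
# Sketch: build a character map (like the cmap table) that leaves ligature
# glyphs unencoded. The glyph list below is hypothetical.
PUA_START, PUA_END = 0xE000, 0xF8FF  # BMP Private Use Area

glyphs = [
    ("ka", 0x0915),
    ("virama", 0x094D),
    ("ssa", 0x0937),
    ("khsa", -1),  # ligature: reachable only through a GSUB lookup
]


def build_cmap(glyph_list):
    """Return (cmap, warnings) for a list of (glyph_name, code_point) pairs."""
    cmap, warnings = {}, []
    for name, code in glyph_list:
        if code == -1:
            continue  # unencoded glyphs must not appear in the cmap
        if PUA_START <= code <= PUA_END:
            warnings.append(f"{name} is mapped into the Private Use Area (U+{code:04X})")
        cmap[code] = name
    return cmap, warnings


cmap, warnings = build_cmap(glyphs)
print({f"U+{c:04X}": n for c, n in cmap.items()})
# {'U+0915': 'ka', 'U+094D': 'virama', 'U+0937': 'ssa'}
```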

Let’s set up the GDEF table. Go to Encoding, Compact, so we only see the glyphs that are IN the font. Now we go to the base glyphs in the Font View, right click, Glyph Info, and set their OT Glyph Class to Base. Then we set the mark glyph to Mark.

www.microsoft.com/typography/otfntdev/devanot/features.aspx

Now we set up the GSUB table: Font Info, Lookups, GSUB, Add Lookup, Ligature Substitution, feature akhn, for the deva {dflt} script, and we can see in the tooltip preview that this is set up correctly.

Now in the metrics window we can see this working live.

For GPOS: Font Info, Lookups, GPOS tab, Add Lookup; we want a Mark to Base Position lookup, feature abvs, for the deva {dflt} script.

For testing purposes, we can see what happens if we add the wrong feature.

Now we add an Above Base Mark GPOS lookup and a subtable, and name the anchor class ANCHOR. Then we go to the base glyph, Point, Add Anchor, and place it.

We do the same for the mark glyph: Point, Add Anchor, place it, and we can see in the Metrics View that it works.

You can see that moving the anchor point in the glyph view updates the Metrics View in real time.

In GPOS, the anchor point is a key idea.
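The anchor idea reduces to simple arithmetic: the mark is moved so that its anchor lands exactly on the base glyph’s anchor. A Python sketch with hypothetical anchor coordinates in font units (the `Anchor` type and `mark_origin` helper are illustrative, not FontForge APIs):

```python
# Sketch of mark-to-base attachment: place the mark's origin so that the
# mark's anchor coincides with the base's anchor. Coordinates are hypothetical.
from typing import NamedTuple


class Anchor(NamedTuple):
    x: int
    y: int


def mark_origin(base_origin: Anchor, base_anchor: Anchor, mark_anchor: Anchor) -> Anchor:
    """Where to draw the mark glyph's origin so the two anchors coincide."""
    return Anchor(base_origin.x + base_anchor.x - mark_anchor.x,
                  base_origin.y + base_anchor.y - mark_anchor.y)


base_anchor = Anchor(300, 700)  # the ANCHOR class point on the base glyph
mark_anchor = Anchor(-50, 0)    # the ANCHOR class point on the mark glyph
pos = mark_origin(Anchor(0, 0), base_anchor, mark_anchor)
print(pos)  # Anchor(x=350, y=700)
```

Moving either anchor changes this sum directly, which fits with the Metrics View updating as soon as an anchor is dragged.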

How do we debug problems? There can be many issues, especially with ‘cyclic’ features, where a b c becomes d and b c d becomes e.

I’m lucky! Are you aware of Nastaleeq? It’s one of the most complex scripts; I’ve never seen a more complex one. It needs thousands of lookups, and debugging it is very hard!

Be patient. That’s the only option we have.

Q: Do we have automated tools to detect substitutions that can become infinite and lock up?

A: Yes, it’s possible. In FontForge, if we want to test a sequence, we pass it to HarfBuzz. If we could test directly from FontForge it would be easier. Same problem with MS.
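One kind of automated check is possible without running a shaper at all: contextual lookups can call other lookups, so a chain of calls that leads back to itself may recurse without end. A Python sketch that finds such a cycle with a depth-first search; the dictionary format and lookup names are hypothetical, and a real tool would read the call graph from the font’s GSUB table.

```python
# Sketch of a sanity check: find a cycle in a (hypothetical) lookup-call graph,
# where calls[name] lists the lookups that contextual lookup `name` invokes.
from typing import Optional


def find_cycle(calls: dict[str, list[str]]) -> Optional[list[str]]:
    """Return one cycle of lookup names (first name repeated at the end), or None."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    colour = {name: WHITE for name in calls}
    path: list[str] = []

    def visit(node: str) -> Optional[list[str]]:
        colour[node] = GREY
        path.append(node)
        for nxt in calls.get(node, []):
            state = colour.get(nxt, WHITE)
            if state == GREY:  # back edge: nxt is already on the current path
                return path[path.index(nxt):] + [nxt]
            if state == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        colour[node] = BLACK
        return None

    for name in calls:
        if colour[name] == WHITE:
            found = visit(name)
            if found:
                return found
    return None


calls = {"akhn_ctx": ["akhn_lig"], "akhn_lig": [],
         "ctx_a": ["ctx_b"], "ctx_b": ["ctx_a"]}
print(find_cycle(calls))  # ['ctx_a', 'ctx_b', 'ctx_a']
```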

Q: Who defines the tags?

A: MS and Adobe. It’s an ISO standard; OpenType and the Open Font Format are the same.

Roozbeh: Another good libre tool is TTX. It dumps the font as XML, and you can edit tables without any side effects. It’s nice for making precision edits. Play with existing fonts: dump them and see how they are arranged.

Jungshik: There is no automated sanity checking.

A: I’d like to see HarfBuzz integrated into FontForge.

Q: Is OpenType sufficient for everything?

Jungshik: Nastaleeq is hard. It pushes the limits of OpenType. You cannot fully implement it as Urdu speakers need it; you must make some compromises.

Roozbeh: You want geometric calculations, moving dots based on the actual text, so fonts will always be a compromise, even with a more advanced engine than OT. www.flickr.com/photos/pimu/4671362490/ You can see 2 baselines: every word has its own baseline, and there is the line baseline. The tools aren’t ready for this.

A: In OT fonts, the font developer is totally dependent on the OT spec. AAT gives total control to the font developer, so I think it is the best. The OT shaper has so many implementations that differ; we should have a single shaper to make life easier. In the long term, a single code base can solve all these problems and move forward.

Q: When creating GSUB rules, how can you verify they are correct?

A: That’s where the linguist’s role comes in. You test by installing the font, typing text that uses those rules, and checking that it is correct with a native reader’s knowledge.

Finally, here is a testing tool:

http://utrrs-testing.rhcloud.com/languages/hi/gsub

It’s libre software, released under the MIT license in 2010.

