Modern LaTeX

From Exterior Memory

Many LaTeX users do not employ the full capabilities of their typesetting system, even though modern LaTeX distributions such as TeX Live have extensive support for Unicode and OpenType. This document shows you how to harness that power.

TeX through the Ages

In the beginning, Donald Knuth created TeX, at that time a modern typesetting system.

Despite claims to the contrary, TeX never really distinguished between mark-up and layout. LaTeX was a great improvement in this respect, but it never achieved the separation that XHTML and CSS brought to HTML.

pdfTeX and pdfLaTeX are showing their age, as the underlying processing engine is still largely unmodified. For example, it still uses an old font format and has very poor internationalization support.

XeTeX is one attempt to fix this, and has worked reliably since about 2007. LuaTeX is another variant, which aims to take the best of pdfLaTeX, XeLaTeX, ConTeXt, and Ω (Omega). The community hopes that it will replace them all once it is finished; at the time of writing, LuaTeX is still in development.


ConTeXt is the other major macro package built on top of TeX, alongside LaTeX. (pdfTeX, by contrast, is an engine rather than a macro package.)

ConTeXt gives you much more control over the final layout of a page, while LaTeX has a good-looking default layout that is hard to change. Since ConTeXt has no default layout, you do need to specify the details. LaTeX relies on packages for more advanced typesetting options; the downside is that LaTeX has many package conflicts. ConTeXt provides much larger base functionality than LaTeX and therefore has no packages, and thus no package conflicts (think of it as a monolithic system instead of the modular design of LaTeX). The downside, of course, is that if you want to achieve something non-standard in ConTeXt, there is no package to quickly provide that function for you.

ConTeXt can be run on top of several engines, including pdfTeX, XeTeX, and LuaTeX. For example, to run ConTeXt with XeTeX, use:

texexec --xetex file.tex

Output Format

Originally, LaTeX produced DVI files. DVI was a novelty when it was designed in 1979, but is seriously outdated by now. Today pdflatex is the de facto standard for producing PDF files (although, admittedly, XeTeX still uses extended DVI, XDV, as its intermediate format). Like DVI, PDF is a device-independent file format, but it has the advantage that it can contain embedded images and fonts. (DVI cannot contain fonts; XDV can, as far as I understand.)

Text Encoding

LaTeX originally employed an archaic system to support non-ASCII characters. For example, \'e becomes é and ``Hello'' becomes “Hello”. This is no longer necessary. A modern author simply writes é and “Hello” to yield the desired result. Since é and “” are non-ASCII characters, you need to declare the text encoding of your file. UTF-8 is the de facto standard.

Two packages take care of this: inputenc and xunicode. Both support (among others) UTF-8 input encoding. xunicode is used in conjunction with the OpenType (or TrueType) fonts supported by xelatex, while inputenc is used in conjunction with the older TFM fonts (older, but still often-used LaTeX fonts). See the next section.
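
As a minimal sketch of the inputenc case, a complete pdflatex document might look as follows. The exact lines are an assumption, not a listing taken from this page. On LaTeX releases from 2018 onward, UTF-8 is already the default input encoding under pdflatex, so the inputenc line is then optional but harmless.

```latex
% Compile with pdflatex; the file itself must be saved as UTF-8.
\documentclass{article}
\usepackage[utf8]{inputenc} % declare that the source file is UTF-8 encoded
\begin{document}
% Accented letters and curly quotes are typed directly,
% instead of the older \'e and ``...'' notation.
Café, “Hello”.
\end{document}
```

Under xelatex the inputenc line is dropped, because XeTeX reads UTF-8 natively; the font-related header lines for that case are shown in the next section.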

Whilst ancient formats such as ASCII and Latin-1 (ISO 8859-1) have long been left behind by civilized society, there still remain small differences in character encoding. Nitpickers will further demand line feeds for line endings, no byte order mark, and Normalization Form C for all files. Of course, they are right, but since nearly every decent program gets this right, you don't have to bother about it.

Font Encoding

Main article: Fonts in LaTeX

OpenType fonts address their glyphs through Unicode, also known as the Universal Character Set (UCS). Traditional LaTeX fonts instead use small, fixed font encodings.

Most LaTeX authors use a relatively old font selection system, based on glyphs stored in METAFONT (MF) files and metrics stored in TFM (TeX Font Metric) files. In addition, it is possible to use PostScript Type 1 fonts and TrueType fonts stored in the texmf directory.

LaTeX by default uses the (very old) OT1 encoding to access the glyphs. OT1 only allows 128 glyphs to be accessible at the same time. For that reason, most authors explicitly choose the T1 font encoding, which allows 256 glyphs to be accessible at the same time (if LaTeX displays the <, > and | characters as ¡, ¿ and –, you have not set the encoding to T1). Despite the similar name, the T1 encoding is unrelated to the PostScript Type 1 font format.
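
A minimal sketch of selecting the 256-glyph encoding, assuming the standard fontenc package with its T1 option (the usual way to do this; the command is not shown in the text above):

```latex
% Compile with pdflatex.
\documentclass{article}
\usepackage[T1]{fontenc} % switch from the 128-glyph OT1 to the 256-glyph T1 encoding
\begin{document}
% Under the default OT1 encoding, these three characters
% would come out as an inverted exclamation mark,
% an inverted question mark and a dash in running text.
a < b > c | d
\end{document}
```

Commenting out the fontenc line and recompiling is a quick way to see the symptom described above.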


Note that if you want to use a writing system with more than 256 glyphs (such as Chinese, Japanese or Korean), you can change the encoding vector to access the other glyphs. The CJK and CJKutf8 packages allow you to automatically switch between encoding vectors as required.
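
As a rough illustration of the CJKutf8 package, the following sketch typesets a short Japanese greeting with pdflatex. The font family name min is an assumption: it has to be installed separately, and Chinese or Korean text needs a different family.

```latex
% Compile with pdflatex; save the file as UTF-8.
\documentclass{article}
\usepackage{CJKutf8}
\begin{document}
% The CJK environment switches encoding vectors behind the scenes.
% Arguments: input encoding (UTF8) and font family (min, assumed installed).
\begin{CJK}{UTF8}{min}
こんにちは
\end{CJK}
\end{document}
```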

While the above system is still in widespread use, you no longer need to use it. XeTeX (and XeLaTeX) is an alternative to pdfTeX (and pdfLaTeX) that supports the OpenType and TrueType fonts installed in the operating system. All modern LaTeX distributions, such as TeX Live, contain xelatex support.

The header lines in a TeX file for using TFM fonts typically look like this:

%!TEX TS-program = pdflatex
\usepackage[T1]{fontenc} % Use the 256-glyph T1 encoding instead of OT1
\usepackage{lmodern} % Use Latin Modern instead of Computer Modern

The header lines in a TeX file for using OpenType fonts typically look like this:

%!TEX TS-program = xelatex
\usepackage{fontspec}  % Provides \defaultfontfeatures, \setmainfont, \setsansfont and \setmonofont
\defaultfontfeatures{Mapping=tex-text}  % For archaic input (e.g. convert -- to en-dash)
\setmainfont[SmallCapsFont={* Caps}]{Latin Modern Roman}
\setsansfont{Latin Modern Sans}
\setmonofont[SmallCapsFont={Latin Modern Mono Caps}]{Latin Modern Mono Light}

Furthermore, instead of typesetting your LaTeX file with pdflatex:

pdflatex mydoc.tex

you need to typeset with xelatex:

xelatex mydoc.tex
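
To check that xelatex and the fonts are set up correctly, a complete test document along the following lines may help. It is only a sketch: the file name mydoc.tex is reused from above, and it assumes that the Latin Modern Roman font is visible to XeTeX under that name.

```latex
%!TEX TS-program = xelatex
\documentclass{article}
\usepackage{fontspec}                  % OpenType/TrueType font selection
\defaultfontfeatures{Mapping=tex-text} % keep -- and `` '' working as in classic LaTeX
\setmainfont{Latin Modern Roman}       % assumed to be installed and found by name
\begin{document}
Café, “Hello”, naïve -- all typed directly as Unicode.
\end{document}
```

If xelatex reports that the font cannot be found, the font is most likely not installed where XeTeX looks for system fonts.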

Note: I highly recommend using version 0.999.7 or higher of XeTeX if you want to use OpenType fonts. (The older xdv2pdf driver, which was used up to version 0.999.6, gave too many errors.)

For further information on fonts and LaTeX, see the short and excellent tutorials at ∃xistential Type.

Recommended Fonts

Latin Modern, Computer Modern Unicode (CMU) and DejaVu are good font choices. These fonts all have serif, sans-serif and monospaced variants. If you want more variation, have a look at the TeX Gyre collection of modern OpenType fonts.
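
As one possible starting point, the sketch below selects three TeX Gyre families with fontspec under xelatex. The particular pairing is only an illustration, and it assumes the fonts are installed and can be found by name.

```latex
% Compile with xelatex.
\documentclass{article}
\usepackage{fontspec}
\setmainfont{TeX Gyre Pagella} % serif, Palatino-like
\setsansfont{TeX Gyre Heros}   % sans-serif, Helvetica-like
\setmonofont{TeX Gyre Cursor}  % monospaced, Courier-like
\begin{document}
Serif text, \textsf{sans-serif text} and \texttt{monospaced text}.
\end{document}
```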