IT Blog & Photo Gallery

Introducing an advanced LaTeX toolchain

September 15, 2017

Creating large documents with different kinds of graphics, like UML diagrams and various plots, can be quite a daunting task. Building on the LaTeX toolchain, this article introduces ways to include PlantUML and GNUPlot graphics in a comfortable, easy-to-use manner. The resulting toolchain streamlines the writing process and eliminates unnecessary side tasks.

At some point most, if not all, developers have to write some kind of large text document, be it documentation or a bachelor’s, master’s or other thesis. Since you’re reading this right now, I suspect you might be just starting to write one yourself. Choosing the toolchain for such a document wisely makes an enormous difference, especially if you’re working on the document with multiple people. I learned this the hard way while writing a 40-page documentation with 6 people using different versions of Word on different operating systems. But if I only wanted to praise the advantages of LaTeX, I wouldn’t need to write all this, so let’s get to the point.

The Point of This Toolchain

Using this toolchain should result in nice-looking, functional documents that don’t take much unnecessary effort to create. To give you a quick overview of its benefits, I tried to boil them down to a list of bullet points. In practice, the plots and diagrams created with this toolchain are:

  • Almost completely vectorized
    avoiding nasty pixel soup in printed documents
  • Made of real text
    that can be copied, searched for, or used in any other way you use your document text
  • Printed in the document font
    so everything looks nice and clean
  • Created in text files
    so you can enjoy all the benefits of a version control system, e.g. diffing the changes made to any diagram
  • Compiled automatically with the document itself
    so you don’t have to keep track of changed sources yourself and can concentrate on the important stuff

So basically, you can work like in a software project: check out new changes from version control, write something yourself and compile everything. In the end, you always have a document where everything is up to date, without worrying about graphics that weren’t updated correctly.

Some of those points might not sound important, but I’ve come to value all of them while using this toolchain in combination with Git to write my master’s thesis on two alternating computers. Of course I wanted it to look nice, as it might be something to look back on after some years. Additionally, I didn’t want to check whether all my generated plots and diagrams were still up to date every time I switched between my desktop at home and my laptop at the office. That’s enough about the why; let’s see how to actually do it.

Integrating PlantUML

As creating LaTeX documents with embedded UML diagrams is quite common, there are a lot of tools for the job. I’ve stumbled across a question on StackExchange about this multiple times, and its answers list a lot of possible ways to integrate nice UML diagrams into LaTeX documents. I used the proposed MetaUML (an extension to MetaPost) for my bachelor’s thesis and got some decent diagrams from it. Its main advantage is that you can define pixel-perfect layouts for every last bit of your diagrams, but as a result, you must define pixel-perfect layouts for every last bit. That was the main reason I didn’t use it again, as it typically took a couple of hours until a moderately complex diagram was done. Instead, I wanted something more comfortable to use that kept a similar quality level in the results.

I tried to set up a couple of other tools described there, but most of them required additional LaTeX packages that weren’t available through the Ubuntu repositories. As I had tried to install LaTeX packages manually without success in the past, I stopped at that point. The last option found in the answers there is PlantUML. It’s a nice tool that generates UML diagrams from textual descriptions, which can also be embedded in other markup. Besides the large number of supported diagram types, PlantUML also supports a couple of output formats. In addition to PNG, SVG, and ASCII art, there is a native LaTeX output that uses TikZ. As that feature is still in beta and didn’t work well in my experience, I opted for another solution: I use PlantUML to generate plain SVG files and convert them to a combined LaTeX+PDF format using Inkscape in its headless mode, which can easily be used in shell scripts. Although it seems a bit odd, this appears to be a common way of including SVG files in LaTeX.

To automate this whole process, I’ve written a small shell script that looks through a specified directory for text files (containing the PlantUML descriptions). For each one, it checks whether the matching SVG file is missing or older than the text file. If either is the case, it runs PlantUML to create a new SVG file. The same is then done for the SVG files, comparing them to the PDF files and using Inkscape for the conversion.

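#!/bin/bash
# Skip a loop entirely if its glob matches no files
shopt -s nullglob

# Step 1: PlantUML text -> SVG, only for new or changed sources
for file in assets/uml/*.txt; do
    base="${file%.txt}"
    if [ ! -e "${base}.svg" ] || [ "${file}" -nt "${base}.svg" ]; then
        echo "Generating diagram: ${file}"
        plantuml -tsvg "${file}" > /dev/null &
    fi
done
wait  # all SVGs must exist before the PDF step starts

# Step 2: SVG -> PDF plus LaTeX overlay (.pdf_tex) via Inkscape
# (Inkscape 0.92 syntax; Inkscape 1.x replaced -z, -f and -A)
for file in assets/uml/*.svg; do
    base="${file%.svg}"
    if [ ! -e "${base}.pdf" ] || [ "${file}" -nt "${base}.pdf" ]; then
        echo "Creating PDF for: ${file}"
        inkscape -z -C -f "${file}" -A "${base}.pdf" \
            --export-latex -d 300 > /dev/null 2>&1 &
    fi
done
wait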
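Both loops in this script apply the same rule: rebuild a target when it is missing or older than its source. As a minimal sketch, that rule could be pulled into one shell function shared by both scripts. The name is_stale and the dry-run loop are my own illustration, not part of PlantUML or Inkscape:

```shell
#!/bin/bash
# Hypothetical helper (not part of any tool): succeeds when TARGET has to be
# rebuilt from SOURCE, i.e. TARGET is missing or SOURCE was modified after it.
is_stale() {
    local source="$1" target="$2"
    [ ! -e "$target" ] || [ "$source" -nt "$target" ]
}

# Dry run of the first loop: only report which SVG files would be
# regenerated, without calling PlantUML.
shopt -s nullglob
for file in assets/uml/*.txt; do
    if is_stale "$file" "${file%.txt}.svg"; then
        echo "would regenerate: ${file%.txt}.svg"
    fi
done
```

In bash, -nt is also true when the second file does not exist, so the explicit -e test mainly serves readability.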

Using such conditional compilation speeds up subsequent builds drastically, as most diagrams usually stay the same over time. To make use of multi-core CPUs when creating a lot of diagrams, both conversion steps also process all their respective files in parallel. As Inkscape converts some diagram parts to raster graphics, -d 300 sets the resolution to 300 DPI, avoiding blurred lines in printed documents. The conversion from SVG to PDF+LaTeX is, in theory, done the same way by the LaTeX package svg, which we’ll use to include our diagrams in the main LaTeX document.

% include and configure svg package for including
% generated pdf+tex files from Inkscape
\usepackage{svg}
\setsvg{svgpath=./assets/uml/}

% Omit file ending and path prefix set above
% when including the diagrams
\includesvg{diagram}

At this point we’re able to create PlantUML diagrams in separate text files in a designated directory and include them in our LaTeX document without much work. As the first part of the toolchain is now working (at least on its own), let’s move on to the next part: plots and diagrams generated with GNUPlot. After that, we’ll put everything together and automate it.

Integrating GNUPlot

Integrating GNUPlot diagrams works in a similar way as PlantUML, but without the need for an additional conversion step in the middle. We’ll directly use GNUPlot’s ability to export to TeX files, which works a lot better than PlantUML’s. We’ll also use another small shell script to implement the conditional compilation, which can later be embedded in any kind of automation or build system. To do this properly, we have to impose one small constraint on GNUPlot: only one diagram output per file. We also have to make sure that the output is named the same as the source file, so we’ll leave the output setting out of the plot definition and supply it as a parameter to GNUPlot instead.
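Because the output name is supplied from outside, a plot definition that still sets its own output would quietly override it and break the naming convention. A rough sketch of a check for that, assuming the .dem files live in assets/plot (the function name check_plot_sources is hypothetical, and this is a plain text search rather than a GNUPlot parser, so it only catches a literal set output at the start of a line):

```shell
#!/bin/bash
# Hypothetical check: list plot definitions that contain their own
# "set output" line, which would override the name passed via "gnuplot -e".
# Returns 1 if any offending file was found, 0 otherwise.
check_plot_sources() {
    local dir="$1" status=0 file
    for file in "$dir"/*.dem; do
        [ -e "$file" ] || continue   # glob matched nothing
        if grep -Eq '^[[:space:]]*set[[:space:]]+output' "$file"; then
            echo "sets its own output: $file"
            status=1
        fi
    done
    return "$status"
}
```

Running check_plot_sources assets/plot before the plot script, e.g. as an extra build step, would flag such files early.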

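#!/bin/bash
# Skip the loop entirely if no .dem files exist
shopt -s nullglob
workingDir=$(pwd)
cd assets/plot/ || exit 1

for file in *.dem; do
    base="${file%.dem}"
    if [ ! -e "${base}.tex" ] || [ "${file}" -nt "${base}.tex" ]; then
        echo "Generating diagram: ${file}"
        # The output name is derived from the source name and passed in
        # here, so the .dem file itself must not set it
        gnuplot -e "set output '${base}.tex'" "${file}" &
    fi
done
wait

cd "$workingDir" || exit 1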

If you feel the need to remove this constraint, you can of course give up the conditional compilation, which might cost you a couple of seconds per compilation at most. To include the TeX files output by GNUPlot, you can simply use \input{path/to/plots/text.tex}, but you’ll need to set your graphics path correctly using \graphicspath{{path/to/plots/}}, because GNUPlot creates additional vector graphics files, similar to Inkscape’s output. That’s in fact everything you need to include GNUPlot graphics as vector graphics with all labels and legends set in your document’s font.

Creating a Makefile

At this point, both PlantUML and GNUPlot graphics are nicely generated and can be included in the document, but it’s still up to you to run the shell scripts manually every time anything changes and to compile the document afterwards. One of the best-known tools for automating such a build process is make, which we’ll set up now. The Makefile needs three main parts, one each for the UML diagrams, the plots and LaTeX itself.

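latex = pdflatex -shell-escape -interaction=nonstopmode > /dev/null
file = master-thesis

# These targets don't produce files of the same name, so always run them
.PHONY: pdf uml plot clean

pdf: uml plot
	latexmk -pdf -pdflatex="$(latex)" -use-make $(file).tex

uml:
	bin/makeuml

plot:
	bin/makeplot

# Delete all untracked files that Git ignores (see .gitignore)
clean:
	git clean -fX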

As you can see, the targets for UML diagrams and plots just delegate to the shell scripts we’ve created. To keep the working directory nice and clean, I’ve put them in a separate directory (bin/) together with any other scripts one might need. For the LaTeX part, I’ve used latexmk, which takes care of running LaTeX as often as needed to get everything up to date. The bibliography and the table of contents may need up to three compilation runs, as they rely on intermediate files from earlier runs.

So if you run make pdf in your working directory, the file specified in the Makefile, in this case ‘master-thesis.tex’, will be built with all plots and diagrams up to date. While this might be sufficient for use in a continuous integration environment (given that you have LaTeX, GNUPlot, PlantUML and Inkscape installed on your build server), it’s still one command-line call away from being something that works silently in the background and that you don’t even think about after a while.
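On such a build server, a small pre-flight check can make a missing tool obvious before LaTeX fails with a less helpful error. A minimal sketch; the function name check_tools is my own, and the tool list in the usage note assumes the commands called by the scripts and Makefile above:

```shell
#!/bin/bash
# Hypothetical pre-flight check: report every command from the argument
# list that cannot be found in PATH. Returns 1 if anything is missing.
check_tools() {
    local missing=0 tool
    for tool in "$@"; do
        if ! command -v "$tool" > /dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

A CI job could then run check_tools latexmk pdflatex gnuplot plantuml inkscape || exit 1 before calling make pdf.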

IDE Integration

To eliminate the last step that requires you to start the toolchain yourself, we’re going to integrate it into our (or at least my) LaTeX IDE, TeXStudio. If you know TeXStudio, the first thing that might come to mind is adding custom compile steps via the settings dialog. You’ll regret that decision the moment some other document fails to build because the shell scripts can’t be found. Instead, we’ll use an even cooler feature of TeXStudio: “magic comments”, which let you specify custom settings per file.

So to integrate our new toolchain nicely, we’ll override the compile program. Prepending the shell scripts directly, instead of replacing everything with make pdf, leaves the SyncTeX configuration intact. That lets us jump between the source code and the PDF file, a feature I wouldn’t want to miss.

% !TeX TXS-Program:compile = ./bin/makeuml | ./bin/makeplot | txs:///pdflatex

After you hit F5 (compile and preview) for the first time with this comment in place, TeXStudio asks you to confirm altering the build configuration for this document. Confirming adds another magic comment to your document, giving it a unique identity.

TL;DR - The Template

For those of you who didn’t have time to read this long article, take a look at my master-template repository on GitHub. It’s not named that way because I believe it to be the one template to rule them all, find them, bring them all, and in the darkness bind them. It’s just the template I extracted from my master’s thesis, and I couldn’t think of a more creative name. Of course it contains the toolchain introduced here, and aside from that it looks quite nice and has a well-thought-out file structure. You can find a preview document in the repository.