{"id":3251,"date":"2026-05-03T17:48:11","date_gmt":"2026-05-03T17:48:11","guid":{"rendered":"https:\/\/timallanwheeler.com\/blog\/?p=3251"},"modified":"2026-05-03T17:48:11","modified_gmt":"2026-05-03T17:48:11","slug":"alg4opt-exporting-alt-text","status":"publish","type":"post","link":"https:\/\/timallanwheeler.com\/blog\/2026\/05\/03\/alg4opt-exporting-alt-text\/","title":{"rendered":"Alg4Opt &#8211; Exporting Alt Text"},"content":{"rendered":"\n<p><a href=\"https:\/\/mykel.kochenderfer.com\/\">Mykel Kochenderfer<\/a> and I are finishing up the second edition of <em>Algorithms for Optimization<\/em> with MIT Press. The first edition took about five years, and the second edition has <em>only<\/em> taken&#8230; another five years. You can <a href=\"https:\/\/algorithmsbook.com\/optimization\/#download\">read it here<\/a>.<\/p>\n\n\n\n<p>What started as a simple extension with three new chapters became much more extensive. We overhauled many existing chapters, added new sections to others, and generally touched up the entire book. It has earned the title of second edition.<\/p>\n\n\n\n<p>Modern technical textbooks include <em>alt text<\/em>: short descriptions attached to images that screen readers read aloud to visually impaired readers. MIT Press authors provide alt text for every graphic element by submitting a spreadsheet in addition to the manuscript. Having alt text definitions live in a different place than our LaTeX source immediately felt like a bad idea. We don&#8217;t want to rely on memory to keep the alt text document up to date every time we change a figure or introduce a new one.<\/p>\n\n\n\n<p>I decided very early on that I wanted the alt text descriptions to be alongside each graphical element in the LaTeX source. 
I defined a macro, <code>\\alttext{...}<\/code>, and use that in every graphical element:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code class=\"\">\\begin{marginfigure}<br>    \\centering<br>    \\begin{tikzpicture}[<br>    \t->, >=stealth',<br>    \tlevel\/.style={sibling distance = 1.5cm\/#1, level distance = 1cm},<br>    \tmynode\/.style={circle, minimum size=7mm, align=center},<br>    \tterminal\/.style={mynode, draw=black, fill=none},<br>    \tnonterminal\/.style={mynode, draw=pastelBlue, fill=pastelBlue},<br>    \t]<br>    \t\\node [terminal] {$+$}<br>    \tchild{ node[terminal] {$x$} }<br>    \tchild{ node[terminal] {$\\ln$}<br>    \t\tchild{ node[terminal] {$2$}<br>    \t\t}<br>    \t};<br>    \\end{tikzpicture}<br>    \\alttext{A directed acyclic graph with a root plus node, a left-side x node, and a right-side natural log node with a two-node child.}<br>\t\\caption{<br>        \\label{fig:expr-ln}<br>        The expression $x + \\ln 2$ represented as a tree.<br>    }<br>\\end{marginfigure}<\/code><\/pre>\n\n\n\n<p><\/p>\n\n\n\n<p>These annotations let us update the alt text any time the figure is changed. The macro evaluates into nothing (<code>\\newcommand{\\alttext}[1]{}<\/code>), so it doesn&#8217;t affect document compilation. Great!<\/p>\n\n\n\n<p>Unfortunately, MIT Press still needed a spreadsheet containing the figure numbers, captions, and their corresponding alt text. I needed an automated way to export the alt text annotations from the LaTeX source.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">The First Attempt: Per-Line Scanning<\/h2>\n\n\n\n<p>My initial approach was to write a <a href=\"https:\/\/julialang.org\/\">Julia<\/a> script that scanned each LaTeX source file line-by-line and used regex matching to identify graphical elements and the <code>\\alttext<\/code> macros. This script served a few purposes. First, it could identify which graphical elements still lacked alt text entries. 
Second, it allowed us to automatically check that our alt text entries adhered to MIT Press standards, such as avoiding certain characters and staying below a length limit. Finally, it packaged them up and exported the data to the required spreadsheet.<\/p>\n\n\n\n<p>The first challenge was figure identification. The script looked for environments like <code>\\begin{tikzpicture}<\/code> and commands like <code>\\plot<\/code>. Unfortunately, there was more nuance here than one would have initially thought. TikzPicture environments can be nested, and in a few cases we define a new command containing a TikzPicture environment, only to invoke it later with a <code>\\protect<\/code> to inject it safely into a caption. We also have <code>\\begin{ignore}<\/code> environments, and have to ignore graphical elements in those.<\/p>\n\n\n\n<p>After graphical elements were identified, I searched within them for an alt text entry. This was done with a simple forward scan over the element&#8217;s lines, which only matches when the whole macro sits on one line and its argument contains no nested braces. Not ideal, but it worked fairly well.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code class=\"\"># Return the argument of the first \\alttext{...} found on the given lines, or \"\" if none.<br>function find_alttext(lines, line_index_lo::Int, line_index_hi::Int)::String<br>    for line in lines[line_index_lo:line_index_hi]<br>        # [^}]+ stops at the first closing brace, so nested braces are not supported<br>        m = match(r\"\\\\alttext\\{([^}]+)\\}\", line)<br>        if m !== nothing<br>            return strip(m[1])<br>        end<br>    end<br>    return \"\"<br>end<\/code><\/pre>\n\n\n\n<p><\/p>\n\n\n\n<p>MIT Press wanted the figure numbers for each alt text entry. That is, if a figure has a caption with, e.g., &#8220;Figure 5.7&#8221;, we wanted 5.7 exported in the spreadsheet. LaTeX does all of the numbering at compile time, which is extremely convenient, but that makes it harder to infer from the source. The script had to keep track of figure numbers, incrementing them as it went, but only for graphical elements that had such captions. 
The script would try to figure out whether a caption existed, but would only assign a new figure number if the graphical element was actually a figure or margin figure.<\/p>\n\n\n\n<p>This script was used for the first version of the spreadsheet that we sent to MIT Press. It worked well for identifying figures that didn&#8217;t have alt text and for verifying the content of the entries. Unfortunately, it did have some issues with figure number identification, and I ended up having to manually scan through all figures in the book and manually update the spreadsheet &#8211; exactly what I had wanted to avoid doing.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">A Better Way: Lexing and Parsing<\/h2>\n\n\n\n<p>LaTeX source code is, well, code. And code is fundamentally hierarchical. Thinking of source code as a list of lines ignores that hierarchy: environments nest, and command arguments can span several lines. Instead, to extract the data accurately, the script needed to understand the structure of the document more like the LaTeX compiler does. I started over, writing a proper lexer and parser that produce a simple abstract syntax tree (AST).<\/p>\n\n\n\n<p>A lexer&#8217;s job is to convert the source code &#8211; a long byte array &#8211; into a sequence of manageable <em>tokens<\/em>. 
Reasoning about a character sequence like <code>\"\\alttext{this is a figure}\"<\/code> is a lot harder than reasoning about <code>&lt;command>&lt;brace_open>&lt;text>&lt;brace_close><\/code>.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"746\" height=\"219\" src=\"https:\/\/timallanwheeler.com\/blog\/wp-content\/uploads\/2026\/05\/2026-05-02_15-07.png\" alt=\"\" class=\"wp-image-3254\" srcset=\"https:\/\/timallanwheeler.com\/blog\/wp-content\/uploads\/2026\/05\/2026-05-02_15-07.png 746w, https:\/\/timallanwheeler.com\/blog\/wp-content\/uploads\/2026\/05\/2026-05-02_15-07-300x88.png 300w\" sizes=\"auto, (max-width: 746px) 100vw, 746px\" \/><\/figure>\n<\/div>\n\n\n<p><\/p>\n\n\n\n<p>Each <code>Token<\/code> is effectively an enum value paired with the range of bytes it covers:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code class=\"\">@enum TokenKind begin<br>    TOKEN_TEXT        # Raw text content<br>    TOKEN_COMMAND     # A LaTeX command starting with \\, e.g. \\alttext<br>    TOKEN_BRACE_OPEN  # An opening curly brace {<br>    TOKEN_BRACE_CLOSE # A closing curly brace }<br>end<br><br>struct Token<br>    kind::TokenKind<br>    i_byte_lo::Int<br>    i_byte_hi::Int<br>end<\/code><\/pre>\n\n\n\n<p>The next step, parsing, does the work of relating tokens to one another to form a tree. 
For example, in the running example, the text is an input into the <code>\\alttext<\/code> command, so it becomes a child:<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"661\" height=\"250\" src=\"https:\/\/timallanwheeler.com\/blog\/wp-content\/uploads\/2026\/05\/2026-05-02_15-11.png\" alt=\"\" class=\"wp-image-3255\" style=\"width:500px\" srcset=\"https:\/\/timallanwheeler.com\/blog\/wp-content\/uploads\/2026\/05\/2026-05-02_15-11.png 661w, https:\/\/timallanwheeler.com\/blog\/wp-content\/uploads\/2026\/05\/2026-05-02_15-11-300x113.png 300w\" sizes=\"auto, (max-width: 661px) 100vw, 661px\" \/><\/figure>\n<\/div>\n\n\n<p>This tree lets us represent larger structures more efficiently. For example, the marginfigure code sample at the top ends up being represented as a tree:<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"905\" height=\"475\" src=\"https:\/\/timallanwheeler.com\/blog\/wp-content\/uploads\/2026\/05\/2026-05-03_10-17.png\" alt=\"\" class=\"wp-image-3256\" style=\"aspect-ratio:1.90530195633683;width:750px\" srcset=\"https:\/\/timallanwheeler.com\/blog\/wp-content\/uploads\/2026\/05\/2026-05-03_10-17.png 905w, https:\/\/timallanwheeler.com\/blog\/wp-content\/uploads\/2026\/05\/2026-05-03_10-17-300x157.png 300w, https:\/\/timallanwheeler.com\/blog\/wp-content\/uploads\/2026\/05\/2026-05-03_10-17-768x403.png 768w\" sizes=\"auto, (max-width: 905px) 100vw, 905px\" \/><\/figure>\n<\/div>\n\n\n<p><\/p>\n\n\n\n<p>This tree structure makes it much easier to tell whether a caption is defined for the margin figure, or whether there is an alt text command, or whether the marginfigure itself is inside an <code>ignore<\/code> environment.<\/p>\n\n\n\n<p>Nodes are more complicated than Tokens, but not by much:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code class=\"\">@enum NodeKind 
begin<br>    NODE_ROOT    # The root node of the abstract syntax tree (AST)<br>    NODE_TEXT    # A text node<br>    NODE_GROUP   # A group of nodes enclosed in curly braces<br>    NODE_COMMAND # A command node, optionally with arguments as children<br>    NODE_ENV     # An environment like \\begin{...} ... \\end{...}<br>end<br><br>struct Node<br>    kind::NodeKind<br>    i_byte_lo::Int      # index into the original byte array of the first byte corresponding to this node<br>    i_byte_hi::Int      # index into the original byte array of the last byte corresponding to this node<br>    i_token_lo::Int     # index into the tokens array of the first token corresponding to this node<br>    i_token_hi::Int     # index into the tokens array of the last token corresponding to this node<br>    i_name_lo::Int      # index into the original byte array of the first byte of the command name<br>    i_name_hi::Int      # index into the original byte array of the last byte of the command name<br>    i_parent::Int       # parent node index, or zero if none<br>    i_first_child::Int  # first child node index, or zero if none<br>    i_sibling_next::Int # next sibling node index - circular list<br>    i_sibling_prev::Int # previous sibling node index - circular list<br>end<\/code><\/pre>\n\n\n\n<p>(This struct definition is very similar to <a href=\"https:\/\/timallanwheeler.com\/blog\/2026\/03\/31\/into-depth-large-array-of-things\/\">the Large Array of Things struct<\/a> from last month.)<\/p>\n\n\n\n<p>The parser supports five kinds of nodes. A true LaTeX compiler would of course do more, but we only need these. The root node is the root of the abstract syntax tree for the file being processed. Text nodes are basic text. Group nodes are formed from nodes enclosed by curly braces. Command nodes are commands like <code>\\alttext<\/code>. 
Environment nodes represent a <code>\\begin{the_env}...\\end{the_env}<\/code> pair.<\/p>\n\n\n\n<p>Using an AST made the alt text export logic much cleaner and more robust. This helped with the numbering issue, as it was easier to detect when a caption was defined, and whether a graphical element was inside an example or table.<\/p>\n\n\n\n<p>We were able to detect when a graphical element was defined inside a <code>\\newcommand<\/code> call, and added some basic tracking to find invocations of the defined command (typically after <code>\\protect<\/code>). These deferred macros were really hard to handle with the previous line-based script.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Conclusion<\/h2>\n\n\n\n<p>Yay! We built a mini-compiler to avoid filling out a spreadsheet. That might sound wasteful, but for a five-year project, spending a few days to ensure long-term maintainability is an easy decision.<\/p>\n\n\n\n<p>What started as a simple string-matching script evolved into a LaTeX AST parser. Using the right data structure for the problem at hand made the exporting task a lot simpler. This process overall gives us the best of both worlds &#8211; our alt text lives next to the LaTeX for the graphical elements and it can be automatically exported into the spreadsheet that our publisher needs.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Mykel Kochenderfer and I are finishing up the second edition of Algorithms for Optimization with MIT Press. The first edition took about five years, and the second edition has *only* taken&#8230; another five years. You can read it here. What started as a simple extension with three new chapters became much more extensive. 
We overhauled [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[15],"class_list":["post-3251","post","type-post","status-publish","format-standard","hentry","category-uncategorized","tag-textbook"],"_links":{"self":[{"href":"https:\/\/timallanwheeler.com\/blog\/wp-json\/wp\/v2\/posts\/3251","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/timallanwheeler.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/timallanwheeler.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/timallanwheeler.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/timallanwheeler.com\/blog\/wp-json\/wp\/v2\/comments?post=3251"}],"version-history":[{"count":9,"href":"https:\/\/timallanwheeler.com\/blog\/wp-json\/wp\/v2\/posts\/3251\/revisions"}],"predecessor-version":[{"id":3263,"href":"https:\/\/timallanwheeler.com\/blog\/wp-json\/wp\/v2\/posts\/3251\/revisions\/3263"}],"wp:attachment":[{"href":"https:\/\/timallanwheeler.com\/blog\/wp-json\/wp\/v2\/media?parent=3251"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/timallanwheeler.com\/blog\/wp-json\/wp\/v2\/categories?post=3251"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/timallanwheeler.com\/blog\/wp-json\/wp\/v2\/tags?post=3251"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}