Last night I pushed another performance enhancement for Elfeed, this
time reducing the time spent parsing feeds. It’s accomplished by
compiling, during macro expansion, a jQuery-like domain-specific
language within Elfeed.
Given the nature of the domain — an under-specified standard
and a lack of robust adherence — feed parsing is much more heuristic
than strict. Sure, everyone’s feed XML is strictly conforming since
virtually no feed reader tolerates invalid XML (thank you, XML
libraries), but, for the schema, the situation resembles the de
facto looseness of HTML. Sometimes important or required information
is missing, or is only available in a different namespace.
Sometimes, especially in the case of timestamps, it’s in the wrong
format, or encoded incorrectly, or ambiguous. It’s real world data.
To get a particular piece of information, Elfeed looks in a number of
different places within the feed, starting with the preferred source
and stopping when the information is found. For example, to find the
date of an Atom entry, Elfeed first searches for elements in this
Failing to find any of these elements, or if no parsable date is
found, it settles on the current time. Only the
updated element is
published usually has the desired information, so it
goes first. The last three are only valid for another namespace, but
are useful fallbacks.
Before Elfeed even starts this search, the XML text is parsed into an
xml-parse-region — a pure Elisp XML parser
included in Emacs. The search is made over the resulting s-expression.
For example, here’s a sample from the Atom specification.
<?xml version="1.0" encoding="utf-8"?>
<title>Atom-Powered Robots Run Amok</title>
<link rel="alternate" href="http://example.org/2003/12/13/atom03"/>
Which is parsed to into this s-expression.
((feed ((xmlns . "http://www.w3.org/2005/Atom"))
(title () "Example Feed")
(link ((href . "http://example.org/")))
(updated () "2003-12-13T18:30:02Z")
(author () (name () "John Doe"))
(id () "urn:uuid:60a76c80-d399-11d9-b93C-0003939e0af6")
(title () "Atom-Powered Robots Run Amok")
(link ((rel . "alternate")
(href . "http://example.org/2003/12/13/atom03")))
(id () "urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a")
(updated () "2003-12-13T18:30:02Z")
(summary () "Some text."))))
Each XML element is converted to a list. The first item is a symbol
that is the element’s name. The second item is an alist of attributes
— cons pairs of symbols and strings. And the rest are its children,
both string nodes and other elements. I’ve trimmed the extraneous
string nodes from the sample s-expression.
A subtle detail is that
xml-parse-region doesn’t just return the
root element. It returns a list of elements, which always happens to
be a single element list, which is the root element. I don’t know why
this is, but I’ve built everything to assume this structure as input.
For Elfeed, all namespaces stripped from both elements and attributes
to make parsing simpler. As I said, it’s heuristic rather than strict,
so namespaces are treated as noise.
A domain-specific language
Coding up Elfeed’s s-expression searches in straight Emacs Lisp would
be tedious, error-prone, and difficult to understand. It’s a lot of
assoc, etc. So instead I invented a jQuery-like, CSS
selector-like, domain-specific language (DSL) to express these
searches concisely and clearly.
For example, all of the entry links are “selected” using this
(feed entry link [rel "alternate"] :href)
Reading right-to-left, this matches every
href attribute under every
link element with the
rel="alternate" attribute, under every
entry element, under the
feed root element. Symbols match element
names, two-element vectors match elements with a particular attribute
pair, and keywords (which must come last) narrow the selection to a
specific attribute value.
Imagine hand-writing the code to navigate all these conditions for
each piece of information that Elfeed requires. The RSS parser makes
up to 16 such queries, and the Atom parser makes as many as 24. That
would add up to a lot of tedious code.
The package (included with Elfeed) that executes this query is called
“xml-query.” It comes in two flavors:
The former returns just the first match, and the latter returns all
matches. The naming parallels the
(let ((xml (elfeed-xml-parse-region)))
(xml-query-all '(feed entry link [rel "alternate"] :href) xml))
;; => ("http://example.org/2003/12/13/atom03")
That date search I mentioned before looks roughly like this. The
matches text nodes within the selected element. It must come last just
like the keyword matcher.
(or (xml-query '(feed entry published *))
(xml-query '(feed entry updated *))
(xml-query '(feed entry date *))
(xml-query '(feed entry modified *))
(xml-query '(feed entry issued *))
Over the past three years, Elfeed has gained more and more of these
selectors as it collects more and more information from feeds. Most
recently, Elfeed collects author and category information provided by
feeds. Each new query slows feed parsing a little bit, and it’s a
perfect example of a program slowing down as it gains more features
But I don’t want Elfeed to slow down. I want it to get faster!
Optimizing the domain-specific language
Just like the primary jQuery function (
xml-query-all are functions. The xml-query engine processes the
selector from scratch on each invocation. It examines the first
element, dispatches on its type/value to apply it to the input, and
then recurses on the rest of selector with the narrowed input,
stopping when it hits the end of the list. That’s the way it’s worked
from the start.
However, every selector argument in Elfeed is a static, quoted list.
Unlike user-supplied filters, I know exactly what I want to
execute ahead of time. It would be much better if the engine didn’t
have to waste time reparsing the DSL for each query.
This is the classic split between interpreters and compilers. An
interpreter reads input and immediately executes it, doing what the
input tells it to do. A compiler reads input and, rather than execute
it, produces output, usually in a simpler language, that, when
evaluated, has the same effect as executing the input.
Rather than interpret the selector, it would be better to compile it
into Elisp code, compile that into byte-code, and then have the
Emacs byte-code virtual machine (VM) execute the query each time it’s
needed. The extra work of parsing the DSL is performed ahead of time,
the dispatch is entirely static, and the selector ultimately executes
on a much faster engine (byte-code VM). This should be a lot faster!
So I wrote a function that accepts a selector expression and emits
Elisp source that implements that selector: a compiler for my DSL.
Having a readily-available syntax tree is one of the big advantages
of homoiconicity, and this sort of function makes perfect sense
in a lisp. For the external interface, this compiler function is
called by a new pair of macros,
These macros consume a static selector and expand into the compiled
Elisp form of the selector.
To demonstrate, remember that link query from before? Here’s the macro
version of that selection, but only returning the first match. Notice
the selector is no longer quoted. This is because it’s consumed by the
macro, not evaluated.
(xml-query* (feed entry title [rel "alternate"] :href) xml)
This will expand into the following code.
(dolist (v xml)
(when (and (consp v) (eq (car v) 'feed))
(dolist (v (cddr v))
(when (and (consp v) (eq (car v) 'entry))
(dolist (v (cddr v))
(when (and (consp v) (eq (car v) 'title))
(let ((value (cdr (assq 'rel (cadr v)))))
(when (equal value "alternate")
(let ((v (cdr (assq 'href (cadr v)))))
(throw 'done v))))))))))))
As soon as it finds a match, it’s thrown to the top level and
returned. Without the DSL, the expansion is essentially what would
have to be written by hand. This is exactly the sort of leverage
you should be getting from a compiler. It compiles to around 130
xml-query-all* form is nearly the same, but instead of a
throw, it pushes the result into the return list. Only the prologue
(the outermost part) and the epilogue (the innermost part) are
Parsing feeds is a hot spot for Elfeed, so I wanted the compiler’s
output to be as efficient as possible. I had three goals for this:
No extraneous code. It’s easy for the compiler to emit
unnecessary code. The byte-code compiler might be able to eliminate
some of it, but I don’t want to rely on that. Except for the
identifiers, it should basically look like a human wrote it.
Avoid function calls. I don’t want to pay function call
overhead, and, with some care, it’s easy to avoid. In the
xml-query* expansion, the only function call is
throw, which is
xml-query-all* version makes no function calls
whatsoever. Notice that I used
assq rather than
assoc. First, it
only needs to match symbols, so it should be faster. Second,
has its own byte-code instruction (158) and
assoc does not.
No unnecessary memory allocations. The
makes no allocations. The
xml-query-all* version only conses
once per output, which is the minimum possible.
The end result is at least as optimal as hand-written code, but
without the chance of human error (typos, fat fingering) and sourced
from an easy-to-read DSL.
In my tests, the xml-query macros are a full order of magnitude
faster than the functions. Yes, ten times faster! It’s an even
bigger gain than I expected.
In the full picture, xml-query is only one part of parsing a feed.
Measuring the time starting from raw XML text (as delivered by
cURL) to a list of database entry objects, I’m seeing an
overall 25% speedup with the macros. The remaining time is
xml-parse-region, which is mostly out of my control.
With xml-query so computationally cheap, I don’t need to worry about
using it more often. Compared to parsing XML text, it’s virtually
When it came time to validate my DSL compiler, I was really happy
that Elfeed had a test suite. I essentially rewrote a core component
from scratch, and passing all of the unit tests was a strong sign that
it was correct. Many times that test suite has provided confidence in
changes made both by me and by others.
I’ll end by describing another possible application: Apply this
technique to regular expressions, such that static strings containing
regular expressions are compiled into Elisp/byte-code via macro
expansion. I wonder if situationally this would be faster than Emacs’
own regular expression engine.