Cranelift's Instruction Selector DSL, ISLE: Term-Rewriting Made Practical

Today I’m going to be writing about ISLE, or the “instruction selection/lowering expressions” domain-specific language (DSL), which over the past year we have designed, improved, and fully adopted in the Cranelift compiler project. ISLE is now used to express both our instruction-lowering patterns for each of four target architectures, and also machine-independent optimizing rewrites. It allows us to develop these parts of the compiler in an extremely productive way: we can write the key idea – that one opcode or instruction should map to another – in a concise way, while maintaining type-safety with an expressive type system, and allowing us to use the declarative patterns for many different purposes.

The goal of this blog post is to illustrate the requirements and the design-space that led to ISLE’s key ideas, and especially its departures from other “term-rewriting systems” and blending of ideas from backtracking languages like Prolog. In particular, we’ll talk about how ISLE has a strong type system, with terms of distinct types (as opposed to one “value” type); how it matches on abstract “extractors” provided by an embedding environment, which effectively define a virtual “input term” without ever reifying it; and how it was explicitly designed to have a simple “FFI” with Rust for easy-to-understand, no-magic interactions with the rest of the compiler. All of these properties allowed us to chart an incremental path from a fully handwritten compiler backend to one with all lowering logic in the DSL (in 27k lines of DSL code), with a relatively low defect rate over the year-long migration and with significant correctness improvements along the way.

The ISLE project was also a really interesting moment in my career personally: it was very much a “research project”, in that it required synthesis of existing approaches and careful thought about the domain and requirements, and invention of slightly new takes on old ideas in order to make the whole thing practical. At the same time, there was quite a lot of work put into the incrementalist and pragmatic aspect of the design (something I also talk about in an earlier post on regalloc2), and a lot of care and feeding to a 12-month migration effort, curation of our understanding of the language and how to use it, and nurturing of ongoing ideas for improvement. I feel pretty fortunate to have (i) been given the space to wrestle the problem space down to its essential kernel and find a working design, and (ii) a cohort of really great coworkers who ran with it and made it real. (Especially thanks to my brilliant colleague Nick Fitzgerald (@fitzgen) who completed the Cranelift integration of ISLE and who pioneered the idea of rewrite DSLs in Cranelift with his Peepmatic project, which inspired many parts of ISLE and primed the project for this effort.) I wrote more about the benefits we’ve seen so far from ISLE, and anticipate to see, here; suffice it to say that we’re happy we’ve gone this way and we look forward to additional results that this work has enabled.

This post covers material and repeats some arguments I made early in ISLE’s pre-RFC and RFC; I recommend reading those documents as well for further background, if desired. The ISLE language reference is the canonical definition of the DSL.

Let’s first go through some background on Cranelift, on compiler DSLs in general, and motivate the case for a DSL in Cranelift; then we’ll get into the details of ISLE proper.

Context: New Cranelift Backends and Handwritten Code

In the Cranelift project, starting in 2020, we developed a new framework for the machine backends – the part of the compiler that takes the final optimized machine-independent IR (intermediate representation) and converts it to instructions for the target instruction set, allocates registers, lowers control flow, and emits machine code. (I described more details in an earlier post series: first, second, third.)

Prior to introducing ISLE, we build three backends in this framework: AArch64, x86-64, and s390x (IBM Z). The general experience was quite positive – the simplicity that we aimed to achieve by focusing on a “straightforward handwritten lowering pass” design allowed us to quickly implement quite complete support for WebAssembly (core and SIMD) and support other users of Cranelift (like cg_clif). In March 2021, we made the new x86-64 backend the default, and at the end of September 2021, we were able to remove the old x86 backend with its legacy, complex, and generally slower framework.

The Need for a DSL

However, as with many engineering problems, there is a tradeoff point in the design space. The simplicity of the “just write the match on the opcode and emit the instructions” approach eventually becames a downside: we had an increasingly-deeply-nested lowering function with many conditions matching on types, sub-expressions, and special cases, and keeping track of it all became more and more difficult. In total, we found we were running into three major problems:

It became very tedious to write what was essentially a longhand form of a list of lowering patterns. If adding a special lowering for a combination of IR operators requires understanding control in handwritten code, and writing out the checks for special conditions or searches for other combining operators by hand, then we are much less likely to improve the compiler backends: the incentives instead point toward keeping the code as simple and as minimalistic as possible and discouraging change.
It became difficult to refactor the code at all: the compiler “lowering API” was ossifying as more and more handwritten backend code came to depend on its subtle details, and refactors became very hard or impossible. This became especially apparent with the regalloc2 work.
It became more and more difficult to maintain correctness, and ensure that the backends were generating the code we expected. While writing code against the lowering API, one had to keep in mind the subtle correctness invariants for when one can “combine” instructions, when one can “sink” an operation, and so on; and the rules for how to use registers and temporaries properly. Even when generating some kind of correct code, it was easy to miss a corner of the state space and, say, omit a lowering for a particular combination of input types, or skip an intended optimized lowering and use a general one instead, in an accidental and hard-to-reason-about way due to complex control flow. As we’ll see below, a DSL allows us to solve both of these problems with (i) principled strongly-typed abstractions in the DSL, and (ii) “overlap checks”, respectively.

It became clear that generating lowering patterns from some meta-description would lead to overall clearer and more maintainable compiler source, and would give us more flexibility if we wanted to change any details of or optimize the translation, as well. Hence, our realization that we probably needed a domain-specific language (DSL) to generate this part of Cranelift.

DSLs in Compilers and Term-Rewriting

There is a long history of DSLs to specify compilers – the “metacompiler” or “compiler-compiler” concept – going back to at least the 1970s. The general idea of a DSL-based instruction selection stage is to declaratively describe a list of patterns – combinations of operators in the program – and for each pattern, when it matches, a series of instructions that can implement that pattern. This makes it easier to reason about what the compiler is doing, to modify and improve it, and to apply systematic optimizations across the backend by changing how the DSL is used to generate the compiler backend itself.

The “pattern rewriting” approach to a compiler backend can be seen as a kind of term-rewriting system: that is, a formal framework in which rules operate on a data representation (in this case, the program to be compiled). Term-rewriting is an old idea: the lambda calculus, one of the original mathematical models of computation, operates via term-rewriting. It is general enough to be useful yet also concise and expressive enough to be very productive when the goal is to transform structured data.

One reason why term-rewriting is such a good fit for a compiler in particular is that “terms”, or nodes in an AST, represent values in the program being compiled; given this, a rewrite rule is an expression of an equivalence. An “integer addition” operator in a compiler IR is equivalent to (or produces an equivalent result to) the integer addition instruction in a given CPU’s instruction set; so we can replace one with the other. One might write this rule as something like (add x y) => (x86_add x y), for example. Likewise, many compiler optimizations can be expressed as rules, in a way that is familiar to any student of algebra: for example, x + 0 == x, or in an AST notation, (add x 0) => x.

Examples of such pattern-rules abound in production compilers. For example, in the Go compiler, a set of rules define how the IR’s operators are converted into x86-64 instructions:

// Lowering arithmetic
(Add(64|32|16|8) ...) => (ADD(Q|L|L|L) ...)

// combine add/shift into LEAQ/LEAL
(ADD(L|Q) x (SHL(L|Q)const [3] y)) => (LEA(L|Q)8 x y)

// Merge load and op
((ADD|SUB|AND|OR|XOR)Q x l:(MOVQload [off] {sym} ptr mem)) &&
    canMergeLoadClobber(v, l, x) &&
    clobber(l) =>
((ADD|SUB|AND|OR|XOR)Qload x [off] {sym} ptr mem)

Similar kinds of descriptions exist in the LLVM x86 backend and the GCC x86 backend, using the TableGen language (LLVM) and the Machine Description DSL (gcc), respectively.

There is a large and well-explored design-space for auto-generated compiler backends from such rules. A classical design is the BURS (bottom-up rewrite system) technique. I won’t attempt a deeper introduction here; further descriptions can be found in e.g. the Dragon Book¹ or Muchnick’s textbook². For this post, it suffices to know that these systems find a “covering” of tree patterns such as the above over an input expression tree.

Term Rewriting for Lowering… Maybe?

Given the above precedent – several mainstream compilers adopting a pattern-matching-based scheme, with clear benefits – it would seem that our path ahead is well-defined. Why, then, is there so much of this post remaining? What more could be said?

It turns out that three main problems arise when considering how to adopt a typical term-rewriting scheme. First, there is a basic question: do we actually reify the input tree as a tree? For example, if we have a pattern

(add x y) => (x86_add x y)

meaning that an add operator on two operands is lowered to an x86 ADD instruction, does that imply that our IR literally contains a node for the add?

That may seem like a silly question to raise, as CLIF, Cranelift’s IR, does indeed have an operator for add (actually iadd for “integer add”) that takes two arguments.

But directly matching on a “tree of operators” implies several properties of the IR that have deep impact. One of these properties is that the “value” or “result” of the operator is its unique identifier. In CLIF, this isn’t the case: each instruction has its own identifier (Inst) and each Inst can have any number of Value result. Another is that it implies that the tree that the matching process should see is exactly what is in the IR. However, quite a few considerations are involved when a backend wants to “merge” the handling of operators by looking deeper into the tree. None of these impedance mismatches are fatal to the approach, but they do imply extra work to build the tree in memory “as matched”.

Second, there is the problem of how to incorporate additional information beyond the raw tree of operators. One could see this as a question of “side-tables” or “auxiliary information”, or of supporting various “queries” on the input tree.

For example, we might want to represent the ability to encode an integer immediate in certain ISA-specific forms as a term. AArch64 has several such forms: a “regular” 12-bit immediate, and a rather clever “logical immediate” format designed to efficiently encode common kinds of bitmasks in only 13 bits. We might represent these with terms, and have rules that translate (iconst K) into (aarch64_imm12 bits) or (aarch64_logicalimm bits) and subsequent rules that match on these terms to encode immediate-using instruction forms. The problem then comes: how do we know which of these intermediate rewrites to do before we attempt to match any instruction forms? Do we do both, and represent both forms?

The net effect of this requirement is that the matching pattern for a rewrite rule starts to look less like a tree of terms and more like a sequence of custom queries. The Go compiler’s rules add predicates to rewrite rules to handle these cases, but this is awkward and makes the language harder to reason about. It would be better if we could represent the ISA concepts at the term-rewrite level as well.

Third, there is the question of how to interact with the rest of the compiler as we make these queries on the input representation. In the most straightforward implementation, a rewrite system has knowledge of the “tree nodes” that terms in the pattern match and that terms in the rewrite expression produce. But building the glue between the rewrite system and the IR data structures may be nontrivial, especially if custom queries (as above) are also involved.

All of this raises the question: is there a better way to think about the execution semantics of the rewrite rules? What if the DSL were not involved in ASTs or rewrites in a direct way at all?

Sequential Execution Semantics and “External Extractors/Constructors”

At this point, I hit upon ISLE’s first key idea: what if all interactions with the rest of the compiler were via “virtual” terms, implemented by Rust functions? In other words, rather than build a system that matches on literal AST data structures and rewrites or produces new output ASTs, all of the pattern matching would “bottom out” in a sort of FFI that would invoke the rest of the compiler, in handwritten Rust. The DSL itself knows nothing about the rest of the compiler, or “ASTs”, or any other IR-specific concept. (This is ISLE’s main secret: it is not actually an instruction-selection DSL, but actually (or at least aspirationally) a more general language.)

Sequential Semantics for Matching

One could imagine a rule like (iadd (imul a b) c) => (aarch64_madd a b c) to “compile” to a series of “match operations” like the following invented operations for some matching engine:

$t0, $t1 := match_op $root, iadd
$t2, $t3 := match_op $t0, imul
$t4 = create_op aarch64_madd, $t2, $t3, $t1
return $t4

In this “matching VM”, we execute match_op operations by trying to unpack a tree node into its children (arguments), given the expected operator. Any step in this sequence of match operators might “fail”, which causes us to try the next rewrite rule instead. If we can match the iadd from the input tree root, and the imul from its first argument, then the compiled form of this rule builds the aarch64_madd (“multiply-add”) term.

Programmable Matching

Rather than a fixed set of operators like match_op, what if we allowed for environment-defined operators? What if operators like the aarch64_logicalimm above were “match operators” as well, such that they “matched” if the given u64 could be encoded in the desired form and failed to match otherwise?

This is the essence of the “external extractor” idea (and the dual to it, the “external constructor”) in ISLE. Once we allow user-defined operators for the left-hand side (“matching pattern”) of a rule, we actually no longer need any built-in notion of AST matching at all; this becomes just another thing we can define in the “standard library” or “prelude” of our DSL!³

The basic idea is to introduce the ability to define a term like iadd or imul and associate an external Rust function with it. When appearing on the left-hand side of a rewrite rule, terms match “outside in”: that is, (iadd (imul a b) c) takes the root of the input, tries to use iadd to destructure it to two arguments, and tries to use imul to destructure it further. (This outside-in, reversed order is the opposite of what one might expect if this were a tree of function calls, because we are destructuring (extracting) rather than constructing. We’ll explore the analogy to functions more below.)

So, skipping straight to the real ISLE syntax now, we can define the term like

(decl (iadd Value Value) Value)

and then associate a Rust function with it like

(extern extractor iadd my_iadd_impl)

and this implies the existence of a Rust function that the generated matching code will invoke:

fn my_iadd_impl(&mut self, input: Value) -> Option<(Value, Value)>;

Likewise, we can define a term to be used on the right-hand side and associate an implementation like

(decl (aarch64_madd Value Value Value) Value)
(extern constructor aarch64_madd my_madd_impl)

with the Rust function

fn my_madd_impl(&mut self, a: Value, b: Value, c: Value) -> Value;

and then use it like

(rule (iadd (imul a b) c)
      (aarch64_madd a b c))

and the generated code will invoke the external extractor my_iadd_impl; if it returns Some (matches), will invoke whatever external extractor is associated with imul; if it also returns Some, then invoke aarch64_madd to “construct” the result. These Rust functions can do whatever they like: we have abstracted away the need to actually query a reified AST and mutate it.

Extractors and the Execution-Driven View

Another important consequence of this design is that we can define arbitrary extractors and constructors, and they can have arbitrary types. (ISLE is strongly-typed, with sum types that lower 1:1 to Rust enums.) This neatly addresses the “metadata or side-table” question above: we don’t need to generate auxiliary nodes in a real AST to represent information about a value, and we don’t need to know when to compute them; such computations can be driven by demand on the matching side, and don’t need to reify any actual node.

Let’s take a step back and understand what we have done here. We have taken a rule that expresses a tree rewrite – like (iadd (imul a b) c) => (madd a b c) – and allowed for the terms in the left-hand side and right-hand side to compile to Rust function calls, with well-defined semantics. The DSL itself deals only with pattern-matching; ASTs and compiler IRs are wholly outside of its understanding. Nevertheless, if we ascribe formal semantics to iadd, imul and madd, we can still reason about the rewrite at the term-rewriting system level, and this is essential to formal verification efforts for our compiler backends (see below!). We have thus allowed for integration with existing, handwritten Rust code while raising the abstraction level to allow for more declarative reasoning.⁴

It’s worth dwelling for a moment on the shift from an explicit AST data structure, traversed and built by a rule-matching engine, to calls to external extractors and constructors. As a consequence of this shift, the term nodes corresponding to extractor matches need not ever actually exist in memory. The ISLE rewrite flow thus works something like the “visitor pattern”: it introduces a level of abstraction that decouples the consumption and production of data from its representation, allowing more flexibility.

The execution-driven view of term rewriting gives us our rewrite procedure for free, as well: rather than an engine that takes some specification of patterns and rewrites, we compile rules to sequential code that invokes extractors and constructors. “Rewriting” a top-level term is equivalent to invoking a Rust function. If we define a term, lower, corresponding to instruction lowering, then (lower (iadd ...)) is a term that will be rewritten to whatever ISA-specific terms the rules specify. This rewriting is done by a Rust function that implements lower; we invoke it with the term’s arguments, and the body will match on extractors as needed, then invoke constructors to build the rewritten expression.

Explicit Types (and Implicit Conversions) in ISLE

The second key differentiator in ISLE as compared to most other term-rewriting systems is its strong type system. It might not be too surprising that a DSL that compiles to Rust would incorporate a type system that mirrors Rust’s type system to some degree, e.g. with sum types (enum variants). But this is actually a bit of a departure from conventional compiler-backend rule systems, and allows significant expressivity and safe-encapsulation wins, as we’ll see below.

Why Types?

A conventional rewrite system operates on an AST whose nodes all represent values in the program. In other words, every term has the same type (at the DSL level): we can replace iadd with x86_add because both have type Value.

While this works fine for simple substitutions, it quickly breaks down when various ISA complexities are considered. For example, how do we model an addressing mode? We might wish to have a node x86_add that accepts a “register or memory” operand, as x86 allows; and if a memory operand, the memory address can have one of several different forms (“addressing modes”): [reg], [reg + offset], [reg + reg + offset], [reg + scale*reg + offset].

We could impose some ad-hoc structure on the AST in order to model this: for example, an x86_add with an x86_load in its second argument (or alternately, a separate opcode x86_add_from_memory) could represent this case. Then we would need to have rules for the address expression: if another x86_add node (but only with register arguments!), we could absorb that into the instruction’s addressing mode.

Ad-hoc structure like this is fragile, though, especially when transformed by optimization passes. As a general guideline, as well, the more we can put program invariants into the type system (at any level), the more likely we are to be able to maintain the structure across refactors or unexpected interactions.

Types in ISLE

ISLE thus allows terms to have types that resemble function types. One can define a term

(decl lower_amode (Value) AMode)

that takes one argument, a Value, and produces an AMode. This type can then be defined as

(type AMode (enum (Reg (reg Reg))
                  (RegReg (ra Reg)
                          (rb Reg))
                  (RegOffset (base Reg)
                             (offset u32))))

and so on. The AMode compiles to an enum in Rust with the specified enum variants, making interop with Rust code (in external extractors and constructors) via these rich types straightforward. In our machine backends, machine instructions are defined as enums and constructed directly in the ISLE.

Typed Terms as Functions

Now that we have seen how to give a “signature” to a term, it’s worth discussing how one can see terms – both extractors and constructors – as functions, albeit in opposite directions. In particular, with a term

(decl F (A B C) R)

that has arguments (or AST child nodes) of types A, B, C, with a type R for the term itself, one can see:

F as a function from A, B, C to R, when used as a constructor; or
F as a function from R to A, B, C, when used as an extractor.

In other words, given a tree of terms in a pattern (left-hand side), where terms are interpreted as extractors, we can see each term as a function invocation from the “outer” type (the thing being destructured) to the “inner” types (the pieces that are the result of the destructuring). Conversely, given a tree of terms in an expression (right-hand side), we can see each term as a function from the “inner” types (the arguments of the new thing being constructed) to the “outer” type (the return value). This is another way of seeing the “execution-driven” view of ISLE semantics described above.

Note that the extractor form of F above is ordinarily a partial function – that is, it is allowed to have no mapping for a particular value. This is how we formally think about the “doesn’t match” case when searching for a particular kind of node in an AST, or any other matcher on the left-hand side of a pattern. In contrast, the constructor is normally total – cannot fail – unless explicitly declared to be partial. (Partial constructors are useful for if-let clauses.)

Types for Invariants

Support for arbitrary types lets us much more richly capture invariants as well, by encapsulating values (on the input side) or machine instructions (on the output side) so that they can only be used or combined in legal ways. For example:

Many instruction-set architectures have a “flags” register that is set by certain operations with bits that correspond to conditions (result was zero, result was negative, etc.) and used by conditional branches and conditional moves. This is “global” or “ambient” state and one has to be careful to use the flags after computing them, without overwriting them in the meantime. To ensure exact correspondence of a particular flag-producer and flag-consumer, certain instruction constructors create ProducesFlags and ConsumesFlags values rather than raw instructions. These can then be emitted together, with no clobber in the middle, with the with_flags constructor.
There is a distinction between an IR-level Value and a machine-level value in a register, which we denote with Reg. When a lowering rule requires an input to be in a register, it can use the put_in_reg constructor, which takes a Value and produces (rewrites to) a Reg. This provides a way for us to do bookkeeping (note that the value was used, and ensure that we codegen its producer), but also allows us to distinguish how to place the value in the register: one may wish to sign- or zero-extend the values.
There is a distinction between an IR-level Value and the instruction that produces it. Not every Value is defined by an instruction; some are block parameters. Furthermore, at a given point in the lowering process, we may not be allowed to see the producer of a value, if we cannot “sink” its effect (merge it into an instruction generated at the current point). Thus, instructions have Values as operands, but one goes from a Value to an Inst (instruction ID) with def_inst, which may or may not match depending on whether there is an Inst and we can see/merge it.

Implicit Conversions

Type-safe abstractions allow for well-defined and safe interfaces, but can lead to verbose code. After several months of experience with ISLE, we were finding that we wrote rules like:

(rule (lower (iadd (def_inst (imul x y)) z))
  (madd
   (put_in_reg x)
   (put_in_reg y)
   (put_in_reg z)))

when we would prefer to write more natural rules like

(rule (lower (iadd (imul x y) z))
      (madd x y z))

as in our original term-rewriting examples above. At first it seemed we had to choose one or the other: the shorter form would require abandoning some of the type distinctions we were making. But in actuality, there is some redundancy. Consider: when typechecking the left-hand side pattern, we know that iadd’s arguments (which the extractor will produce, if it can destructure the Inst type) have type Value, but the inner imul expects an Inst. In our prelude we have one canonical term that converts from one to the other: def_inst. Likewise, in the right-hand side, bindings x, y and z have type Value, but the madd constructor that builds a machine instruction requires Reg types (which are virtual registers in pre-regalloc machine code). We likewise have one term, put_in_reg, that can do this conversion. If there is only one canonical way, or usual way, to make the conversion, and if the types on both sides of the conversion are already known, why can’t we rectify the types by inserting necessary conversions automatically?

ISLE thus has one final trick up its sleeve to improve ease-of-use: implicit conversions. By specifying converter terms for pairs of types like

(convert Inst Value def_inst)
(convert Value Reg put_in_reg)

the typechecking pass can expand the pattern and rewrite expression ASTs as necessary. This makes writing lowering rules much more natural with less boilerplate, and we have a fairly rich set of implicit conversions defined in our prelude to facilitate this.

Putting it All Together: AST Patterns in ISLE

Now that we’ve taken a tour of the various features of ISLE, let’s review what we have built. We have started with a desire to express high-level rewrite rules that equate AST nodes – such as (iadd (imul x y) z) and (madd x y z) – and to have a system that performs such rewrites, in a way that interoperates with the existing compiler infrastructure and has predictable and comprehensible behavior.

ISLE allows high-level patterns to be compiled to straightforward Rust pattern-matching code. A strong type system with sum types (enums) ensures that terms in patterns and rewrites match the expected schema, and allows for expressing high-level invariants. Implicit conversions leverage these types to remove redundancy in the patterns, allowing for more natural high-level forms while retaining the useful type-level distinctions. The ability to arbitrarily define “extractors” to use in patterns allows us to build up a rich pattern-language in our prelude, matching trees of operators, values, and pieces of the input program with various properties in a programmable way. And the well-defined mapping to Rust and an execution scheme that maps terms directly to function calls, rather than an incrementally-rewritten AST allows for code as efficient as what one would write by hand.

We have thus gone from (iadd (imul x y) z) => (madd x y z) to something like:

Match the input Inst against the iadd enum variant, getting argument Values if so;
Get the Inst that produced the first Value, if we’re allowed to merge it, via def_inst;
Match that Inst against the imul enum variant, getting its argument Values if so;
Put all three remaining Values in registers with calls to put_in_reg; and
Emit an madd instruction with these registers

all via bindings defined in ISLE itself and without any knowledge of “instruction lowering” in the ISLE DSL compiler. As concrete evidence that the last point is valuable, we have been able to use ISLE for CLIF-to-CLIF rewrite rules in our new mid-end optimization framework as well, simply by defining a new prelude (see below!).

Ongoing and Future Benefits of Declarative Rules

The next most exciting thing about writing lowering rules declaratively – after the expressivity and productivity win while developing the compiler itself – is that being able to reason about the rules as data lets us analyze them in various ways and check that they satisfy desirable properties.

Correctness and Formal Verification

As an example, during the development of our rulesets, we found that it was sometimes unintuitive which rule would fire first. We have a priority mechanism to allow this to be controlled in an arbitrary way, but the default heuristics (roughly, “more specific rule first”) were sometimes counter-intuitive.

We thus invented the idea of an overlap checker, initially implemented by my brilliant colleague Trevor Elliott and subsequently redesigned with a new internal representation and algorithm by my other brilliant colleague Jamey Sharp. The key idea is to define “rule overlap” such that two rules overlap if a given input to the pattern-matching could cause either rule to fire. In these (and only these) cases, priority and/or default ordering heuristics determine which rule actually does fire. Then we decided that in such cases, we would require the ISLE author to use the priority mechanism to explicitly choose one of the rules. In other words, no two overlapping rules can have the same priority. Through a series of PRs to fix up our existing rules, we were able to actually find several cases where rules were fully “shadowed”, or unreachable because some other more general rule was always firing first. We turned on enforcing mode for non-overlap among same-priority rules and non-shadowing of rules by higher-priority rules after fixing several cases, and as a result, we now have more confidence in the correctness of our rulesets.

On a broader scale, writing rules as equivalences from one AST to another lets us verify that the two sides are, well, actually equivalent! There is an ongoing collaboration with some formal-verification researchers who are adding annotations to external extractors and constructors that describe their semantics (mostly in terms of an SMT checker’s theory-of-bitvectors primitives). Given these “specs”, one could lower each ISLE rule to SMT clauses rather than executable Rust code, and search for cases where it is incorrect. I won’t steal any thunder here – the work is really exciting (also still in progress) and the researchers will present it in due time – but it’s an example of what declarative DSLs allow.

The ISLE-to-Rust compiler (metacompiler) is also a fairly complex tool in its own right, and has had bugs in the past. What makes us confident that we are generating code that correctly implements the rules – even if the rules themselves are verified? To answer that question, we have considered how to verify the translation of the ISLE rules. Our current plan is to modify the ISLE compiler to generate both the production backend, with intelligently-scheduled matching operations, and a “naive” version that runs through rules sequentially in priority order. If the latter picks the same rule as the former, then we know we have a faithful implementation of the left-hand-side matching (and the translation of the right-hand-side expression to constructor calls is straightforward in comparison, so we trust it already). Then we can trust that our verified-correct rules are being applied as written.

Optimizing the Instruction Selector

Next, my colleague Jamey is working on a new ISLE metacompiler backend that lowers ISLE rules to a planned sequence of matching ops more efficiently. The ability to change the way that compiler-backend code matches the IR was a theoretical benefit of a DSL-based approach, recognized and evaluated as we weighed the pros and cons of ISLE, but admittedly was a bit speculative – no one knew if we would actually find a better way to generate code from the rules than the initial ISLE compiler and its “trie”-based approach. I am very excited that Jamey actually did manage to do this (and we should look forward to a hopeful future blog post in which he can describe this in more detail!).

Mid-end Optimizations

Finally, as mentioned above, we were able to find a second use for ISLE as part of the egraph-based mid-end optimizer work, which we just enabled by default. (I hope to write a blog post about this soon too!) This was excellent and very satisfying validation to me personally that ISLE is more general than just Cranelift backends: we were able to write a new prelude (and actually share a bunch of extractors too) so that rules could specify IR-to-IR rewrites, in addition to IR-to-machine-instruction lowerings. This will allow us to iterate on and improve the compiler’s suite of optimizations more easily in the future, and it will also have follow-on benefits in terms of shared infrastructure: verification tools that we build for backend lowering rules can also be adapted to verify mid-end rules. In addition, there are potentially other ways that putting all of the compiler’s core program-transform logic into the same DSL will allow us to blur the lines, combine stages, or move logic around in ways we can’t yet anticipate today; but it seems like a worthwhile investment. In any case, ISLE proved to be a quite useful tool in developing pattern-matching Rust code with less boilerplate and tedium than before!

Conclusion

ISLE has been great fun to design, build, and use; while we have learned a lot and made several language adjustments and extensions over the past year, I think that there is general consensus that it has made the compiler backends easier to work on. I’m excited to see how the ongoing work (verification, new ISLE codegen strategy) turns out, and how the language evolves in general. And as noted above, ISLE’s secret is that it is actually more general than instruction selection, or Cranelift: if you find another way to use it, I’d be very interested in hearing about it!

Thanks to Jamey Sharp and Nick Fitzgerald for reading and providing very helpful feedback on a draft of this blog post, and to bjorn3 and Adrian Sampson for feedback and typo fixes after publication.

A Aho, R Sethi, J Ullman, M S Lam. Compilers: Principles, Techniques, and Tools. 2006. ↩
S Muchnick. Advanced Compiler Design and Implementation. 1997. ↩
ISLE does have special knowledge about Rust enums, and the ability to match on them efficiently with match expressions in the generated Rust code, because to miss this optimization would be very costly. But in principle it could have been built without this, involving only Rust function calls and control flow around them. ↩
There is a parallel to the Prolog language here, in that it also allows for high-level, declarative expression of rule-matching with backtracking while also having a well-defined sequential execution semantics with FFI to an imperative world. In fact Prolog was a central inspiration for ISLE’s design. The key difference(s) are that (i) ISLE does not have full backtracking – once a left-hand side matches, we cannot backtrack, as the right-hand sides are infallible – and (ii) there is no unification, and all dataflow in a term is unidirectional, from input (value to be destructured) to output (arguments). We used to have “argument polarity”, which was closer to unification in that it allowed a configurable (but fixed) input/output direction for each argument to an extractor. We discarded this feature, however, in favor of a more general if-let clause. ↩