C3 Blog

The defer statement is going mainstream. Go has it's own special defer which only fires on function end, otherwise defer has consistent "execute at scope end" semantics. Swift, Zig, Jai, Nim and Odin all use defer in this manner.

The problems with implementing defer is similar to implementing destructors for stack allocated objects in C++, although the presence of virtual functions complicates things.

I couldn't find anyone describing how defer is done in other compilers so when working on a version of it for C2 I had to make it up as I went along.

For posterity's sake I thought it might be interesting to do a writeup on how defer was implemented.

Setting the rules

First up there are lots of different possible rules to adapt defer to. The original draft of this article would handle goto across defers. C2 retains goto, and for a long time, so did C3 – so this was important to make the article complete. However goto adds much complexity to defer which made the article both much longer and harder to follow.

For that reason we'll limit ourselves to return, continue, break plus labelled versions of the latter. If there is interest I can go into details on how to add defer for goto in another article.

Handling early exit

The first issue in defer is the early exit:

void test()
{
  defer printf("A");
  if (rand() % 2 == 0) return;
  printf("B");
}

Every return needs to inline the defer at the end, so this is lowered to:

void test()
{
  if (rand() % 2 == 0) 
  {
    printf("A");
    return;
  }
  printf("B");
  printf("A");
}

For break and continue this is handled similar to return but only part of the defers may be inlined at the point of the break:

void test()
{
  defer printf("A");
  while (true)
  {
    defer printf("B");
    {
      defer printf("C");
      if (rand() % 2 == 0) break;
    }
  }
}

And the inlined version:

void test()
{
  while (true)
  {
    {
      if (rand() % 2 == 0) 
      {
        printf("C");
        printf("B");
        break;
      }
      printf("C");
    }
    printf("B");
  }
  printf("A");
}

We also have the labelled version of break. (I'll stick to the java-style labelled break syntax here, even though C3 has a different variant)

void test()
{
  defer printf("A");
  FOO: while (true)
  {
    defer printf("B");
    while (true)
    {
      defer printf("C");
      if (rand() % 2 == 0) break FOO;
    }
  }
}

This is again lowered to:

void test()
{
  FOO: while (true)
  {
    while (true)
    {
      if (rand() % 2 == 0) 
      {
        printf("C");
        printf("B");
        break FOO;
      }
      printf("C");
    }
    printf("B");
  }
  printf("A");
}

So as we see it's sufficient to keep a list of the defers and then inline the defer statements in reverse order where we encounter a break, continue or return.

Putting it all together

So now we've listed all the things we need to solve. How do we put it together? Here's the algorithm I used:

For each defer, keep a pointer back to the previously active defer.
For each dynamic scope, keep track of the current defer.
On break/continue/return AST nodes, store 2 AST nodes.
On each scoping AST node (while, for, compound statement etc) store 2 AST nodes.

Algorithm

Set current_scope->current_defer = NULL
Traverse the AST-tree.

When pushing a new dynamic scope

1	current_scope->current_defer = prev_scope->current_defer;

When encountering a defer:

1 2	defer->prev_defer = current_scope->current_defer; current_scope->current_defer = defer;

When encountering a scoped statement:

scoped_stmt->start_defer = current_scope->current_defer;
push_new_current_scope();
... recursively process nodes ...
scoped_stmt->end_defer = current_scope->current_defer;
pop_current_scope();

When encountering a break or continue:

1
2
3

Ast* target_ast = find_target(break_stmt);
break_stmt->end_defer = current_scope->current_defer;
break_stmt->start_defer = target_ast->start_defer;

When encountering a return:

1	return_stmt->defer = current_scope->current_defer;

This results in us being able to use each defer as the top of a linked list:

current_defer
↓
current_defer->prev_defer
↓
current_defer->prev_defer->prev_defer
↓
NULL

Codegen is now easy.

We introduce a helper function to inline defers:

void codegen_defers(Defer *current, Defer *last)
{
  while (current_defer != last)
  {
    codegen(current_defer);
    current_defer = current_defer->prev_defer;
  }
}

When doing codegen for a scoped statement:

1 2	codegen(scoped_stmt->inner_stmt); codegen_defers(scoped_stmt->end_defer, scoped_stmt->start_defer);

When doing codegen for break or continue:

1 2	codegen_defers(break_stmt->end_defer, break_stmt->start_defer); codegen_break(break_stmt);

Codegen for return

1 2	codegen_defers(return_stmt->defer, NULL); codegen_return(return_stmt);

Going further

Ok, so now we're done? Not quite, if we want to go beyond C syntax. We can imagine something looking a bit like this:

1	if (File *f = getFile(), defer close(f)) { ... }

In this case we actually have two scopes: one inner scope (between {}) and the outer one that starts in the conditional.

The principle is the same so we can reuse the same solution as above, but it's worth taking note of this case.

Defer after if

We have other questions to answer as well. What does this code do:

1	if (x == 0) defer printf("x was 0\n");

Some people have suggested that this should be treated as:

defer
{
  if (x == 0) printf("x was 0\n");
}

I am strongly against that idea, as it would mean that compound statements suddenly have a different meaning than regular statements.

Defer as part of the function

Another interesting thing one can do with defer is the idea that a function may contain an implicit `defer` that is added to the scope which invokes it. Odin has that feature using "deferred attributes" (see further down from this link). This is simple to tie into the defer machinery.

Defers & goto

Handling goto with defers is a bit more complicated as one need to conditionally invoke defers:

void test(int x)
{
  if (x > 0) goto FOO;
  // When is this called?
  defer printf("A");
  FOO:
  printf("B");
}

The lowered code needs to look like this:

void test(int x)
{
  bool _defer_1 = false;
  if (x > 0) goto FOO;
  _defer_1 = true;
  FOO:
  printf("B");
  if (_defer_1) 
  {
    printf("A");
  }
}

Since B can jump into scopes as well as out of scopes, this adds another dimension to the analysis. The solution is not hard but definitely not as straightforward as the structured jumps of break and continue

Non-local jumps of setjmp are not possible to handle at all.

Go style defers

Go has a different style of defer. Go's defers actually store the defer code like a closure that is queued and invoked at function end rather than at scope end. This means a defer actually needs to allocate memory for itself. A loop like this:

for() {
  ...
  defer ...
  ...
}

Would queue up all the defers generated in the loop in a long list and release them at function end. If the `defer` is releasing something limited like db connections then this is a bad idea. For various "gotchas" in Go due to this style of defer, see this.

While Go defers work nicely with exceptions and `goto`, it has quite a bit of quirks as well as the need to reserve memory to store the defers.

Defers and errors

Sometimes one would prefer for defers to only occur on error:

File *getAndCheckFile()
{
   File *f = getFile();
   if (!f) return NULL;
   // We want to close if we return with error.
   defer close(f);
   if (!fileIsValid(f)) return NuLL;
   if (readHeader(f) != 0xdeadbeef) return NULL;
   // oops, we will be closing f!
   return f;
}

For this reason Zig introduces errdefer, and C3 has defer catch / defer try statements.

Being able to cancel defers

As an alternative (and complement) to special forms of defer is being able to cancel defers. So far I've only seen this functionality on defer implemented as RAII. Theoretically it could look something like:

File *getAndCheckFile()
{
   File *f = getFile();
   if (!f) return NULL;
   FOO: defer close(f);
   if (!fileIsValid(f)) return NuLL;
   if (readHeader(f) != 0xdeadbeef) return NULL;
   undefer FOO;
   return f;
}

Summary

Defer is useful functionality for languages that lack both finally and RAII. With structured jumps it is straightforward to implement with zero overhead.

It is generally understood that overflowing an add or a multiplication is usually a bad thing. This seems to imply that the solution is to detect and trap (quit the program or throw an exception) on such errors. But as we will see, the answer isn't that clear cut.

Commutative and associative addition?

Typically when we work with integers, we prefer that the ordering of the operands doesn't matter. For example, if we see a + b + c we'd prefer that (a + b) + c is the same as a + (b + c).

Unfortunately, if we trap for overflow this does not hold. Here is an example:

int a = INT_MAX;
int b = 20;
int c = -20;
int d = a + b + c;

If we trap on overflow then (a + b) + c would trap, but a + (b + c) would not. Ooops.

In C the former is even undefined behaviour. Let's pick an unsigned example:

unsigned a = UINT_MAX;
unsigned b = 20;
unsigned c = 20;
unsigned d = a + b - c;

Because overflow is wrapping in C, this is well defined and gives the expected result. In languages where even unsigned overflow is trapped, such as Swift, this will similarly trap or not trap depending on evaluation order. For unsigned we can also design an common overflow (sometimes called an underflow) by having a possible negative intermediate value:

unsigned a = 20;
unsigned b = 0;
unsigned c = 20;
unsigned d = a + b - c;

In this case the overflow will happens if we evaluate a + (b - c). Again this is not a problem in C, but will be a problem if the language traps the overflow.

Trapping overflow fixes actual bugs

So is trapping overflow bad? If they create subtle problems (in particularly in C where it's undefined behaviour), shouldn't we always wrap?

Again, the problem is not as simple as that. With trapping overflow we catch this exploit:

1
2
3

char *ptr = malloc(a * b);
...
ptr[a * i + j + offset] = some_value;

We can see here that if there is no trap on overflow, then we can pick a and b so that the allocated size overflows into a small value, and then some_value actually will be written out of bounds.

(This is from an actual exploited overflow in the wilds.)

Some people will point out that with proper bounds checking then the exploit cannot occur either. But that relies on proper bounds being known. It's probably possible to rewrite the code in the example to use slices (pointer + length) with bounds checking, but in general we can't rely on that to fix all the various overflows.

A detour: mixed signedness

A quick question: "What should the type of an unsigned int added to a signed int be?"

For C the answer is "an unsigned int", in C# the answer is "a long".

The C answer seems bad, surely we want something signed to be safe? But there's a reason to this apparent madness:

unsigned a = INT_MAX + 10U;
int b = -10;
...
int d = a + b;

In the example above, is the result well defined? – Well, yes it is! All the operands are converted into unsigned and then wrapping addition is performed, followed by an implicit cast back to an integer value.

What had happened if C instead had did a cast on the unsigned to a signed integer? Well INT_MAX + 10 results in a large negative number, we then subtract 10 from that value, which results in a value less than INT_MIN, which in C is undefined behaviour due to the overflow.

Because signed ints have undefined behaviour for overflow, it's safer to cast to unsigned in C. Implicit conversion between signed and unsigned representations means that the code looks very simple and is mostly right, even though it does clever things.

So what about languages with trapping overflow?

The usual case is that such languages require explicit casts. Let's try that:

unsigned a = INT_MAX + 10U;
int b = -10;
...
int d = (int)a + b; // BOOM!

Ok, that didn't work. What about this:

unsigned a = INT_MAX + 10U;
int b = -10;
...
int d = (int)(a + (unsigned)b); // Also BOOM!

In the second case the unsigned version of b becomes a large positive number, causing the unsigned add to overflow as well.

If you think about it this is quite natural: unsigned maths uses 2s complement numbers and wrapping overflow to represent negative numbers, so without wrapping behaviour adding negative numbers can't work.

Let's look at another common example, now with unsigned:

1
2
3

unsigned x = 10;
int change = -1;
x = x + change;

Again this "just works" since C gladly converts change to an unsigned value even if it's negative and it just works. With overflow trap on unsigned again it won't work.

Let's look at how we currently could solve this in Zig:

1
2
3

var x : u32 = 10;
var change : i32 = -1;
x = x +% @bitCast(u32, change);

Let's break it down: @bitCast is needed to convert the i32 in the same way (unsigned)change would do in C.

Now once we have this 2s complement unsigned, we request wrapping addition using the +% operator (Zig has wrapping operator counterparts for all arithmetics operations).

Problem solved!

... Or wait? What happened to our overflow check? For example we could set x = 0 and change = -1 and this would happily just work without batting an eye.

The real solution that works and traps overflow looks like this:

1	x = @intCast(u32, @as(i64, x) + @as(i64, change));

So we first go to the larger type (i64) perform the add, then try to narrow that number to an u32, catching any negative numbers or numbers that exceed the maximum of u32.

Now we've come full circle:

Introduce trapping "by default" so it's easy to do the right thing.
Realize that some cases need to circumvent the trapping to be correct.
Find a "correct" solution that is very complicated that people are supposed to follow to do the right thing.

Enter the left hand side

Unfortunately we're not done yet listing the problems, what about this:

1 2	int a = INT_MAX; long long b = a + a / 2;

Is that UB assuming long long is larger than int? In C, sure it is. What the person writing this probably wanted was the following:

1 2	int a = INT_MAX; long long b = (long long)a + (long long)a / 2;

Mostly people don't run into this due to C's promotion rules. For example:

1 2	short a = 0x7FFF; int b = a + a / 2;

- will do "the right thing" on all platforms where the int is at least 32 bits. This is because C implicitly converts all arguments to an int before performing any arithmetic operation.

But as demonstrated by the above examples that only gets you so far. There is a recent language trend to perform arithmetics with the native bit width, leading to new unexpected results (this example is Zig):

1
2
3

var num1 : i8 = 16;
var num2 : i32 = 3;
var res : i32 = num2 + num1 * num1; // BOOM!

Here the multiplication is performed using signed 8 bits, which will overflow on 16 * 16, even it later would have been promoted to i32 for the addition.

These are hard to spot errors that will yield runtime errors but look fine to the compiler.

It would be reasonable that the left hand side guides the type used on the right hand side in the assignment, but that adds complexity that few languages want to take on.

Summary

Overflow traps cause unexpected and hard-to-spot non commutative and associative behaviour in expressions that otherwise would have been fine.
Wrapping behaviour enables buffer overflows and similar exploits.
Overflow traps on unsigned integers makes the mixed signedness case very hard to get right.
In most languages it doesn't matter if the left hand sign can contain the number, what matters are the types on the right hand side, which isn't always intuitive or desired.

In a following post I'll investigate ways to try to improve on the status quo.