On arithmetics and overflow

Christoffer Lernö
It is generally understood that overflowing an addition or a multiplication is usually a bad thing. This seems to imply that the solution is to detect such errors and trap (quit the program or throw an exception) when they happen. But as we will see, the answer isn't that clear-cut.

Commutative and associative addition?


Typically when we work with integers, we prefer that the grouping and ordering of the operands doesn't matter. For example, if we see a + b + c we'd prefer that (a + b) + c is the same as a + (b + c).

Unfortunately, if we trap for overflow this does not hold. Here is an example:

int a = INT_MAX;
int b = 20;
int c = -20;
int d = a + b + c;


If we trap on overflow then (a + b) + c would trap, but a + (b + c) would not. Oops.
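
To make the asymmetry concrete, here is a minimal C sketch that uses GCC/Clang's checked-arithmetic builtins to stand in for a trapping language (an illustration only, not how any particular language implements its traps):

#include <limits.h>
#include <stdio.h>

int main(void) {
    int a = INT_MAX, b = 20, c = -20, tmp, d;

    /* (a + b) + c: the intermediate a + b already overflows. */
    if (__builtin_add_overflow(a, b, &tmp) ||
        __builtin_add_overflow(tmp, c, &d))
        puts("(a + b) + c would trap");

    /* a + (b + c): the intermediate b + c is 0, so nothing overflows. */
    if (!(__builtin_add_overflow(b, c, &tmp) ||
          __builtin_add_overflow(a, tmp, &d)))
        printf("a + (b + c) = %d\n", d);   /* prints INT_MAX */
    return 0;
}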

In C the former is even undefined behaviour. Let's pick an unsigned example:

unsigned a = UINT_MAX;
unsigned b = 20;
unsigned c = 20;
unsigned d = a + b - c;


Because unsigned overflow wraps in C, this is well defined and gives the expected result. In languages where even unsigned overflow is trapped, such as Swift, this will similarly trap or not depending on evaluation order. For unsigned we can also construct a common kind of overflow (sometimes called an underflow) by having a possibly negative intermediate value:

unsigned a = 20;
unsigned b = 0;
unsigned c = 20;
unsigned d = a + b - c;


In this case the overflow happens if we evaluate a + (b - c). Again this is not a problem in C, but it will be a problem if the language traps on overflow.
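
The same kind of sketch, again with the GCC/Clang builtins playing the role of a trapping language, shows that only one evaluation order of a + b - c reports a problem:

#include <stdio.h>

int main(void) {
    unsigned a = 20, b = 0, c = 20, tmp, d;

    /* (a + b) - c: intermediate 20, result 0, no overflow anywhere. */
    if (!(__builtin_add_overflow(a, b, &tmp) ||
          __builtin_sub_overflow(tmp, c, &d)))
        printf("(a + b) - c = %u\n", d);

    /* a + (b - c): the intermediate b - c goes below zero. */
    if (__builtin_sub_overflow(b, c, &tmp))
        puts("a + (b - c) would trap on the intermediate b - c");
    return 0;
}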

Trapping overflow fixes actual bugs



So is trapping overflow bad? If traps create subtle problems (particularly in C, where signed overflow is undefined behaviour), shouldn't we always wrap?

Again, the problem is not as simple as that. With trapping overflow we catch this exploit:

char *ptr = malloc(a * b);
...
ptr[a * i + j + offset] = some_value;


We can see here that if there is no trap on overflow, then we can pick a and b so that the allocated size overflows into a small value, and some_value will then actually be written out of bounds.

(This is from an actual exploited overflow in the wild.)

Some people will point out that with proper bounds checking the exploit cannot occur either. But that relies on the proper bounds being known. It's probably possible to rewrite the code in the example to use slices (pointer + length) with bounds checking, but in general we can't rely on that to fix all the various overflows.
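
For comparison, here is a minimal sketch of the manual check that side-steps the exploit in C, using GCC/Clang's __builtin_mul_overflow (checked_alloc is just an illustrative helper name; a trapping-by-default language would give you this check for free):

#include <stdlib.h>

/* Allocate a two-dimensional buffer of a * b bytes, refusing to continue
 * if the size computation overflows size_t. */
void *checked_alloc(size_t a, size_t b) {
    size_t size;
    if (__builtin_mul_overflow(a, b, &size))
        return NULL;   /* the multiplication would overflow: refuse */
    return malloc(size);
}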

A detour: mixed signedness



A quick question: "What should the type of an unsigned int added to a signed int be?"

In C the answer is "an unsigned int"; in C# the answer is "a long".

The C answer seems bad: surely we want something signed to be safe? But there's a reason for this apparent madness:

unsigned a = INT_MAX + 10U;
int b = -10;
...
int d = a + b;


In the example above, is the result well defined? – Well, yes it is! Both operands are converted to unsigned, a wrapping addition is performed, and the result is then implicitly converted back to a signed int by the assignment.

What would have happened if C had instead converted the unsigned operand to a signed integer? Converting INT_MAX + 10 to signed yields a large negative number; adding -10 to that pushes the result below INT_MIN, which in C is undefined behaviour due to signed overflow.

Because signed ints have undefined behaviour on overflow, it's safer to convert to unsigned in C. Implicit conversion between signed and unsigned representations means that the code looks very simple and is mostly right, even though it does clever things.
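
To see the mechanics, here is the same snippet with the intermediate values written out as comments (a sketch assuming 32-bit int and unsigned, which the INT_MAX + 10U example already presumes):

unsigned a = INT_MAX + 10U;  /* 2147483647 + 10 = 2147483657 */
int b = -10;                 /* converted to unsigned: 4294967286 */
int d = a + b;               /* 2147483657 + 4294967286 = 6442450943,
                                which wraps mod 2^32 to 2147483647 = INT_MAX */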

So what about languages with trapping overflow?

The usual case is that such languages require explicit casts. Let's try that:

unsigned a = INT_MAX + 10U;
int b = -10;
...
int d = (int)a + b; // BOOM!


Ok, that didn't work. What about this:

unsigned a = INT_MAX + 10U;
int b = -10;
...
int d = (int)(a + (unsigned)b); // Also BOOM!


In the second case the unsigned version of b becomes a large positive number, causing the unsigned add to overflow as well.

If you think about it this is quite natural: unsigned maths relies on two's complement representation and wrapping overflow to treat large unsigned values as negative numbers, so without wrapping behaviour, adding negative numbers this way can't work.

Let's look at another common example, now with unsigned:

unsigned x = 10;
int change = -1;
x = x + change;


Again this "just works" since C gladly converts change to an unsigned value even if it's negative and it just works. With overflow trap on unsigned again it won't work.

Let's look at how we currently could solve this in Zig:

var x : u32 = 10;
var change : i32 = -1;
x = x +% @bitCast(u32, change);


Let's break it down: @bitCast is needed to convert the i32 to a u32 in the same way (unsigned)change would in C.

Now that we have this two's complement unsigned value, we request wrapping addition using the +% operator (Zig has wrapping counterparts for all arithmetic operators).

Problem solved!

... Or wait? What happened to our overflow check? For example, we could set x = 0 and change = -1 and this would happily wrap around to a huge value without batting an eye.

The real solution that works and traps overflow looks like this:

x = @intCast(u32, @as(i64, x) + @as(i64, change));


So we first widen to the larger type (i64), perform the add, then try to narrow the result to a u32, catching any negative numbers or numbers that exceed the maximum of u32.
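
The same widen-add-narrow-and-check pattern can be sketched in C as well (checked_add_delta is just an illustrative name, and the assert stands in for whatever trap or error handling the language provides):

#include <stdint.h>
#include <assert.h>

/* Add a possibly negative delta to an unsigned counter, trapping (here:
 * asserting) if the mathematical result doesn't fit in a uint32_t. */
uint32_t checked_add_delta(uint32_t x, int32_t change) {
    int64_t wide = (int64_t)x + (int64_t)change;  /* cannot overflow an i64 */
    assert(wide >= 0 && wide <= UINT32_MAX);      /* the narrowing check */
    return (uint32_t)wide;
}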

Now we've come full circle:

  1. Introduce trapping "by default" so it's easy to do the right thing.
  2. Realize that some cases need to circumvent the trapping to be correct.
  3. End up with a "correct" solution so complicated that few people can be expected to follow it.


Enter the left hand side


Unfortunately we're not done listing problems yet. What about this:

int a = INT_MAX;
long long b = a + a / 2;


Is that UB assuming long long is larger than int? In C, sure it is. What the person writing this probably wanted was the following:

int a = INT_MAX;
long long b = (long long)a + (long long)a / 2;


Mostly people don't run into this due to C's promotion rules. For example:

short a = 0x7FFF;
int b = a + a / 2;


- will do "the right thing" on all platforms where int is at least 32 bits. This is because C implicitly promotes operands smaller than int to int before performing any arithmetic operation.

But as demonstrated by the above examples, that only gets you so far. There is a recent trend for languages to perform arithmetic at the operands' native bit width, leading to new unexpected results (this example is Zig):

var num1 : i8 = 16;
var num2 : i32 = 3;
var res : i32 = num2 + num1 * num1; // BOOM!


Here the multiplication is performed using signed 8 bits, which will overflow on 16 * 16, even though the result would later have been promoted to i32 for the addition.

These are hard-to-spot mistakes that cause runtime errors but look fine to the compiler.

It would be reasonable for the left-hand side to guide the type used on the right-hand side of the assignment, but that adds complexity that few languages want to take on.

Summary



  1. Overflow traps cause unexpected and hard-to-spot non-commutative and non-associative behaviour in expressions that would otherwise have been fine.
  2. Wrapping behaviour enables buffer overflows and similar exploits.
  3. Overflow traps on unsigned integers make the mixed-signedness case very hard to get right.
  4. In most languages it doesn't matter whether the left-hand side can contain the number; what matters are the types on the right-hand side, which isn't always intuitive or desired.


In a following post I'll investigate ways to try to improve on the status quo.
I would like to see a language that uses the following rules:
* there are no signed or unsigned int types, just int
* all basic operators convert the arguments to CPU-word-sized ints, do signed arithmetic on them, and trap on overflow
* assignment implicitly narrows, and also traps on overflow
* wrapping, unsigned, and lower- or higher-width operations are specified with alternative operators or intrinsics, rather than through the type system

These rules appear to solve all the problems you mentioned, without sacrificing convenience or readability.
notnullnotvoid

> there are no signed or unsigned int types, just int


In some cases you'd actually want unsigned ints, especially when dealing with external systems (e.g. file formats) that can contain very big unsigned numbers.

notnullnotvoid

> assignment implicitly narrows, and also traps on overflow


I think that somewhat breaks static type safety (or rather postpones it until runtime, when the bug may or may not happen, and you may or may not see it). Since you already mentioned that all arithmetic will be upconverted to register size, why not make the user write an explicit cast once. Just once. It doesn't make the code too cluttered.

notnullnotvoid

> wrapping, unsigned, and lower- or higher-width operations are specified with alternative operators or intrinsics, rather than through the type system


I like that, I believe Zig has the +% operator for that.

I plan on making a similar system for integer types for my language :D

I don't see how having an int type with no sign, which you can do either signed or unsigned operations on, would make dealing with unsigned ints in file formats more difficult.

How would it "break static type safety"? As far as I can see, it doesn't push any errors to runtime that aren't already at runtime, and it doesn't prevent catching any of the errors that you want to catch. The only downside that I can see is that you sometimes will trap overflow on assignment of an expression instead of earlier in the expression.

I can see how requiring an explicit cast on assignment can catch some mistakes at compile time where you narrow when you don't want to, but I struggle to think of a time when I ever would have benefited from that, so I don't think it would be worth the boilerplate. Especially since, if people have to write the same boilerplate over and over all the time, they eventually stop thinking about it and will likely make the same mistakes on autopilot that the cast was supposed to save them from anyway.

I like Zig's % operators as well, but I think they would be better if they were extended to also communicate signedness/size, and were used alongside the widening/narrowing rules that I described.
> I don't see how having an int type with no sign, which you can do either signed or unsigned operations on, would make dealing with unsigned ints in file formats more difficult.

I don't get it. The bits can be encoded as one or the other, not both, right?
notnullnotvoid

> there are no signed or unsigned int types, just int


So like Java? There are obviously advantages to this, but it also means that some arithmetic must be done on signed values even though they are actually unsigned. This is why Java has two right-shift operators (>> and >>>) – so that you can emulate unsigned types.


> all basic operators convert the arguments to CPU-word-sized ints, do signed arithmetic on them, and trap on overflow


There are often two useful sizes on a 64-bit CPU: 32 and 64 bits. Using 32 bits we gain double the throughput compared to 64 on vectorization. Which one would you pick?

If there is no trapping overflow, or the top bit is never set, then we can pretend unsigned and signed are interchangeable. If neither is true, then we run the risk of triggering overflow on add, so we can't use signed values as if they were unsigned. For this reason Java specifies wrapping behaviour for its integers. So basically if we have trapping overflow we cannot get by with only signed. Either remove the trap or add an unsigned type.
ionut
> I don't get it. The bits can be encoded as one or the other, not both, right?


They actually can be encoded as both, because in two's complement encoding (which is what every CPU you will ever ship code on uses), signed and unsigned numbers are interchangeable. This is why casting between signed and unsigned integers is known to be "free" in C++, in the sense that it doesn't actually generate any machine instructions. The CPU doesn't care whether you're storing a signed or unsigned int, the bits are all the same, it's only the edge case behavior of certain math operations that changes. That's what makes the idea of an int which is neither signed nor unsigned make sense all the way down to the machine code level.
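
As a small C illustration of that interchangeability (assuming the usual 32-bit int):

#include <stdio.h>

int main(void) {
    int s = -1;
    unsigned u = (unsigned)s;       /* same bits: 0xFFFFFFFF */
    printf("%d %u\n", s, u);        /* prints: -1 4294967295 */
    printf("%d\n", (int)(u + 1u));  /* wrapping add, reinterpreted as signed: 0 */
    return 0;
}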

lerno
> So basically if we have trapping overflow we cannot get by with only signed. Either remove the trap or add an unsigned type.


Nope. The whole point is that you actually don't need signed and unsigned types, only signed and unsigned operations. If you read my original post, you will see that I am not suggesting having signed ints!

lerno
> Using 32 bits we gain double the throughput compared to 64 on vectorization. Which one would you pick?


Also using 16 bits gets you double the throughput of 32, and using 8 gets double the throughput of 16. When vectorizing code, I think it makes sense to specify the size of operations explicitly.

In a sense, I'm suggesting explicitly specifying the width of all scalar int operations as well; it's just that full-width operations are privileged with operators while lesser widths presumably get stuck with intrinsics.
notnullnotvoid
> Nope. The whole point is that you actually don't need signed and unsigned types, only signed and unsigned operations. If you read my original post, you will see that I am not suggesting having signed ints!


If you do not have signedness, how can you determine trapping for addition, subtraction and multiplication? The equivalence between signed and unsigned in two's complement is limited to non-trapping operations. Also, signed and unsigned division work differently, so for division you explicitly need to know the type. Or do you propose that the programmer would have two division operators? That seems a bit onerous.
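
As a small illustration of the division difference (a C sketch, assuming 32-bit int), the same bit pattern divides to very different results depending on signedness:

#include <stdio.h>

int main(void) {
    int a = -7;
    printf("%d\n", a / 2);            /* signed division:   -3 */
    printf("%u\n", (unsigned)a / 2);  /* unsigned division: 2147483644 */
    return 0;
}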

Maybe you need to provide some examples, because I have some difficulty understanding how your proposal could work.

Regarding vectors, I'm talking about auto-vectorization done by optimizing compilers. It is guided by the semantics the backend can infer, and if it is hard to determine such a width, it won't happen. That is what I'm pointing at.
lerno
> If you do not have signedness, how can you determine trapping for addition, subtraction and multiplication?


Different operators, as I've already said multiple times. Like what Zig provides for wrapping (i.e. non-trapping) arithmetic, but extended to cover signedness as well.

I never rely on auto-vectorization because existing compilers almost never do it in practice, and when they do it's usually low-quality. So it's not really something I would be concerned with, unless I was designing a language specifically to generate vector code, in which case I would want to do a lot of things differently in general.
Do you propose something like a +signed b? Maybe you could show an example.