Christoffer Lernö

Last time we were looking at this example:

macro int testmacro(int x)
{
  int z = 2;
  for (int i = 0; i < x; i++)
  {
    z *= 2;
  }
  return z;
}     

fn int test(int y)
{
  int z = getValue(y);
  int x = @testmacro(z);
  return z + x;
}  

In our previous solution we kept the variables in an array, where each scope tracked its first and last local. Before entering testmacro this list is [int y, int z, int x], but after entering the macro we would get [int y, int z, int x, int x, int z], which would mean shadowing.

A naive solution would be name mangling. Let's say the macro's locals are prefixed with something: [int y, int z, int x, int _testmacro$x, int _testmacro$z]. Our lookup must then:

  1. Look up the name with the macro prefix.
  2. If that is not found, look up without the macro prefix, but in that case only accept globals.

Aside from not actually solving later problems, it's complex for no real benefit, because we can essentially insert a sentinel in the list: [int y, int z, int x, SENTINEL, int x, int z].

Now when we scan back we always stop at the sentinel value. This means that entering the macro scope we simply push the sentinel value on the stack of locals (this is not the only way to introduce the same effect, but it's the simplest version to explain). When looking up locals in the array we can now stop as soon as we reach either the first element OR the sentinel value.
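As a rough C sketch, building on the locals and next_local arrays from the previous post (SENTINEL, push_macro_scope and pop_macro_scope are made-up names for illustration, not the actual compiler API):

// A marker value that can never be a real Decl pointer.
#define SENTINEL ((Decl *)1)

void push_macro_scope(void)
{
  // Entering a macro body: everything pushed before the sentinel
  // becomes invisible to lookups inside the macro.
  // TODO check we didn't reach max
  *next_local = SENTINEL;
  next_local++;
}

void pop_macro_scope(void)
{
  // Leaving the macro body: drop its locals, including the sentinel.
  do
  {
    next_local--;
  } while (*next_local != SENTINEL);
}

// The lookup from the previous post, extended to stop at the sentinel.
Decl *find_decl(const char *interned_name)
{
  Decl **current = next_local;
  while (current != &locals[0])
  {
    current--;
    if (current[0] == SENTINEL) break; // stop at the macro boundary
    if (current[0]->name == interned_name) return current[0];
  }
  return NULL; // fall through to file/module/import lookup
}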

Problem solved?

Resolution without hierarchies

If your macros only take values, then this solution is sufficient. However, we often want macros to accept an expression that is only conditionally evaluated. In C3 we use # in front of the parameter name to indicate an unresolved expression.

macro foo(#expr)
{
  return #expr * #expr;
}
fn int test(int z)
{
  return @foo(call(z));
  // => return call(z) * call(z);
}

Now we're running into problems. Both z and call should be resolved in the scope of the test function, not in the macro scope behind the sentinel. Oops.

What happens if we tag the #expr with the scope it was written in? This seems like it could work, but in C3, as with GCC statement expressions, the argument can introduce new variables.
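As a hypothetical sketch of the tagging idea, on top of the same locals array, each unresolved argument could carry a snapshot of the locals visible at the call site (UnresolvedArg, capture_arg and Expr are made-up names here, not the actual compiler API):

typedef struct
{
  Expr *expr;          // the unresolved #expr tree
  Decl **visible_end;  // snapshot of next_local at the call site
} UnresolvedArg;

UnresolvedArg capture_arg(Expr *expr)
{
  // Remember which locals were visible where the argument was written,
  // so its identifiers can later be resolved against that range only.
  return (UnresolvedArg) { .expr = expr, .visible_end = next_local };
}

Here is where introducing new variables breaks that picture: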

macro int two_times(#expr)
{
  int w = 1;
  #expr;
  #expr;
  return w;
}
  
fn void test2(int z)
{
  @two_times({| int y = z; call(y); |});
}  

So we enter two_times with [int z], push the sentinel and then w, giving [int z, SENTINEL, int w]. Now when we evaluate #expr inside two_times, we would like something like this: [int z, int y, SENTINEL, int w]. That is, we need to slip a new scope into the function scope, not into the macro scope we pushed.

Trying a hack

What we might realize here is that if we evaluate #expr just enough before entering the macro that all of its declarations are resolved, we might just get the behaviour we want. So something like this (a sketch of the flow follows the list):

  1. Enter test2 scope
  2. Push z
  3. Start evaluating the macro call.
  4. Take the macro call argument and only check the declarations.
  5. Enter expr scope
  6. Push y
  7. Resolve z
  8. Resolve y
  9. Pop expr scope
  10. Pass in this pre-checked expression into the macro.
  11. Enter the two_times scope
  12. Push w
  13. Copy #expr and insert it.
  14. Evaluate #expr - which will not need a lookup
  15. Copy #expr and insert it.
  16. Evaluate #expr - which will not need a lookup
  17. Lookup w
  18. Pop the macro scope
  19. Pop the test2 scope
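
A hypothetical sketch of this flow, reusing the scope functions from earlier (MacroCall, check_expr_declarations, check_macro_body, copy_expr and the rest are made-up names that simply mirror the numbered steps):

bool check_macro_call(MacroCall *call)
{
  // Steps 4-10: pre-check the argument's declarations in the scope of
  // the call site, so 'int y = z' resolves z there and registers y.
  push_scope();
  bool ok = check_expr_declarations(call->arg);
  pop_scope();
  if (!ok) return false;

  // Steps 11-18: enter the hygienic macro scope (pushing the sentinel)
  // and check the macro body, which pushes 'int w' as usual.
  push_macro_scope();
  ok = check_macro_body(call->macro, call);
  pop_macro_scope();
  return ok;
}

// Steps 13-16: every use of #expr in the body inserts a fresh copy of the
// pre-checked argument; evaluating the copy needs no further symbol lookups.
Expr *expand_expr_argument(MacroCall *call)
{
  return copy_expr(call->arg);
}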

This scheme looks like it would work, but there are questions: what if the declarations inside don't resolve the same way both times? What if the expression instead looks like this:

@two_times({|
  $if (@some_compile_time_macro(...)):
    int y = 0;
  $else:
    int z = 0;
  $endif;
  $if ($defined(y)):
    y = 1;
  $endif;
|});  

Here it's not clear that two invocations of the same expr will even lower to the same declarations! So we can't do the lookup ahead of time.

The alternative is to completely evaluate expr, not just the declarations. It's a possible solution, but the corner cases with this approach are hard to foresee.

Summary

If our macros only take values, then we can retain a simple model for symbol lookup using a single stack. However, if we can pass in expressions or even statements, then these need to not only resolve symbols in the original scope but possibly also introduce new ones there. Pre-checking expressions does not work well with compile time evaluation, since the same expression may resolve differently on every evaluation.

But maybe there is some way to salvage the model? We'll look at that next.

Christoffer Lernö

The problem: implementing lexical scope in a programming language.

Here is an example.

fn void test(int y)
{
  int x = 123;
  for (int i = 1; i < y; i++)
  {
    int z = foo(i + x); // The first z
    if (z > 100) x++;    
  }
  // The second z
  int z = x + 1; 
  // z += i; <- Error since i is not in scope 
}    

Formulated like this, the problem is fairly simple. One solution looks like this:

typedef struct
{
  Decl **local_start;
  Decl **local_end;
} Scope;

Decl *locals[MAX_LOCALS];
Scope scopes[MAX_SCOPES];
Scope *next_scope = &scopes[0];
Decl **next_local = &locals[0];

void push_scope()
{
  *next_scope = (Scope) { .local_start = next_local, .local_end = next_local };
  // TODO Check we didn't reach max
  next_scope++;
}

void pop_scope()
{
  // Drop the popped scope's locals.
  next_scope--;
  next_local = next_scope->local_start;
}

void push_local(Decl *decl)
{
  // TODO check we didn't reach max
  *next_local = decl;
  next_local++;
  next_scope[-1].local_end = next_local;
}

This solution isn't the simplest we could do, but knowing both the beginning and the end of each scope's locals has some advantages. For a lookup, we do a linear search backwards from the last local:

Decl *find_decl(const char *interned_name)
{
  Decl **current = next_local;
  while (current != &locals[0])
  {
    current--;
    if (current[0]->name == interned_name) return current[0];
  }
  return NULL;
}

This might not seem super fast, but variables are often used near their definition, and the number of locals is usually fairly limited anyway.

Roughly, this is what happens as the algorithm goes through the code above:

1. Push function scope

scope: [{ 0, 0 }]
decl: []

2. The param is added

scope: [{ 0, 1 }]
decl: [int y]

3. Push int x

scope: [{ 0, 2 }]
decl: [int y, int x]

4. Push "for" scope

scope: [{ 0, 2 }, { 2, 2 }]
decl: [int y, int x]

5. Push int i

scope: [{ 0, 2 }, { 2, 3 }]
decl: [int y, int x, int i]

6. Push "for body" scope

scope: [{ 0, 2 }, { 2, 3 }, { 3, 3 }]
decl: [int y, int x, int i]

7. Push int z

scope: [{ 0, 2 }, { 2, 3 }, { 3, 4 }]
decl: [int y, int x, int i, int z]

8. Pop "for body" scope

scope: [{ 0, 2 }, { 2, 3 }]
decl: [int y, int x, int i]

9. Pop "for" scope

scope: [{ 0, 2 }]
decl: [int y, int x]

10. Push int z

scope: [{ 0, 3 }]
decl: [int y, int x, int z]

11. Pop function scope

scope: []
decl: []

It's fairly straightforward to see how, after defining z in step 7, we can still look up x and i.
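For example, assuming interned_x and interned_i hold the interned names (hypothetical variables for illustration), the lookups at step 7 scan backwards from next_local:

// State at step 7:
//   scope: [{ 0, 2 }, { 2, 3 }, { 3, 4 }]
//   decl:  [int y, int x, int i, int z]
Decl *x = find_decl(interned_x); // skips z and i, finds the x pushed in step 3
Decl *i = find_decl(interned_i); // skips z, finds the i pushed in step 5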

This model is simple and easy to understand – but we skipped something, didn't we? Yes, the lookup of the function foo. In this scheme, anything not found among the locals is looked up first in the current file, then in the module and finally in any imports. (This is not how C works, but that is because in C everything ends up as a single flat file, so we just "look through" the file.)
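As a rough sketch of that fallback order (CompilationUnit, find_in_file, find_in_module and find_in_imports are hypothetical helpers, not the actual compiler API):

Decl *find_symbol(CompilationUnit *unit, const char *interned_name)
{
  // Locals first, scanning from the innermost scope outwards...
  Decl *decl = find_decl(interned_name);
  if (decl) return decl;
  // ...then file-private symbols, the module, and finally any imports.
  decl = find_in_file(unit->file, interned_name);
  if (decl) return decl;
  decl = find_in_module(unit->module, interned_name);
  if (decl) return decl;
  return find_in_imports(unit->imports, interned_name);
}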

Macros

So far so good, and if this was all we had, the story would end here. But there's an added complexity. What if we have macros?

Here is a simple example:

macro int testmacro(int x)
{
  int z = 2;
  for (int i = 0; i < x; i++)
  {
    z *= 2;
  }
  return z;
}     

fn int test(int y)
{
  int z = getValue(y);
  int x = @testmacro(z);
  return z + x;
}  

Assuming we want hygienic macros, we cannot simply copy the macro body into the function, names and all:

// Broken version
fn int test(int y)
{
  int z = getValue(y);
  int x = z;
  int z = 2;
  for (int i = 0; i < x; i++)
  {
    z *= 2;
  }
  int _macro_result = z;
  int x = _macro_result;
  return z + x;
}  

In C3 there is the expression block, which works similarly to a statement expression in GCC C. If this were used together with allowing shadowing, we'd get something that halfway works:

// Works but...
fn int test(int y)
{
  int z = getValue(y);
  int x = {|
    int x = z;
    int z = 2;
    for (int i = 0; i < x; i++)
    {
      z *= 2;
    }
    return z;
  |};
  return z + x;
}  

This seems to work, but what if we wrote this incorrect macro:

macro testmacro(int x)
{
  return y * x;
}  

If we fold that:

fn int test(int y)
{
  int z = getValue(y);
  int x = {|
    int x = z;
    return y * x;
  |};
  return z + x;
}

So now y is leaking into the macro, which is implicitly capturing y. This is not what we want (and as we'll see later this is just the tip of the iceberg).

Next I'll show a way to extend our simple scope system to make sure that macros are indeed hygienic, and look at other things that break.

Christoffer Lernö

Skejeton has started doing the Advent of Code 2021 challenges in C3.

Have a look at the repo here.

Christoffer Lernö

I wrapped up the bitstruct code last week, and together with that removed the virtual type. Happily, this makes the language nearly feature complete for 1.0.

This also means I finally have the means to write a fairly solid "what's different from C" list. It's not written in stone, and some things may change, but since development from now on is going to be about fleshing out existing features rather than adding new ones, any major new features are unlikely.

So without further ado, here's the list of changes from C:

  • Module system
  • Optional contracts
  • Semantic macros
  • Templates through generic modules
  • Subarrays (slices)
  • Foreach
  • Distinct types (similar to typedef but the type is distinct)
  • Compile time evaluation
  • Compile time reflection of types
  • Defer
  • Arrays as values
  • Struct sub-typing (similar to embedded structs in Go)
  • Built-in SIMD vectors
  • Overloadable foreach, allowing types to define custom foreach.
  • Less permissive implicit type conversions and safer widenings
  • Subarray assign/set (e.g. foo[1..3] = 3)
  • Language support for error values
  • Type methods (dot-syntax invocation)
  • Implicit deref on . (removes ->)
  • Bitstructs (well-defined bit packing)
  • Expression blocks (similar to GCC statement expressions)
  • Enum associated values
  • Opt-in structural typing
  • Integer types have well defined bit width
  • 2cc, 4cc, 8cc literals
  • Base64 and hex literals
  • Signed overflow is wrapping
  • Most C UB moved to "implementation defined behaviour"
  • Signed integers are 2s complement
  • Typesafe varargs
  • "Any" type
  • Fewer operator precedence levels
  • Build system included

And yes, with some exceptions you can play with these features today.

Christoffer Lernö

Recently I was thinking about Java and reflection, and how it actually ended up causing the proliferation of "enterprise-y" frameworks written in the language.

Java reflection & serialization

Java had built-in serialization early on. It was very generic and could serialize basically anything in the object graph, but the output was also very verbose.

This functionality sort of set the bar for what was to come. So when more libraries appeared that wanted to do serialization and deserialization in Java, everyone said "oh yeah, this is cool, it's better than what's built in!".

And then someone said "hey, now that we have this easy serialization and deserialization - guess what, we can do it from config files, that way we get more flexibility!" – And people were at this point already accepting that they were serializing and deserializing from some generic format that wasn't really tailored for their code. So what could go wrong with a generic configuration that wasn't tailored for their code?

Next you got the proliferation of configs everywhere, where you have to mess around with config files in Java because no one builds APIs to actually programmatically configure things.

– And this just comes from the easy accessibility of reflection and serialization in early Java.

Contrast this to C, where you don't have all those tools and have to build your config readers by hand. – So you'll again start shopping for libraries, but since there is no vertically integrated stack that does everything from reading your config to assembling your objects, odds are that you'll at least limit yourself to what you need. Heck, you might even build it yourself.

The tragedy of Objective-C

A similar thing occurred with ObjC in the iPhone gold rush. A lot of Java/C++ programmers arrived on the platform, and in Java/C++, OO is usually objects all the way down. – Because hey, there's no problem doing that, and in fact that's how things are built with OO in those languages: just split things into increasingly fine-grained classes and objects.

ObjC was intended to be used in a different way though. ObjC OO is C with an OO layer for interop. You're supposed to write 95+% C, with ObjC as the high level "script-like" glue; you don't have to be afraid of C structs or enums.

Also, writing ObjC classes takes more time than doing the same with Java/C++, and again that wasn't a problem because you weren't supposed to use ObjC classes in the same way.

So then all the C++/Java folks showed up and boy did they moan about how hard ObjC was to use... well, because they tried to use 5% C and 95% OO. The result was both slow to write and slow to execute. Basically they were trying to write Java in Objective-C.

Now what did Apple do?

  1. Automated refcounting - don't let the poor Java developers think about memory management of their gazillion objects. Where memory management had been a very small concern in canonical ObjC projects (objects were rarely allocated/deallocated, since they were large-scale structures in sane ObjC programs), it became a huge problem with Java-style OO.
  2. Properties - if you use ObjC classes instead of normal C structs where you should be using C structs, then writing accessors is a pain. So add a feature to automate that, making it easier to write bad, Java-style ObjC code!
  3. Since these devs continued to complain, introduce Swift, which was advertised as "ObjC without the C" but was really "C++ without the C". Swift allowed devs to work in the C++/Java way they were used to. Swift was also a hugely complex language that was slower than ObjC and lacked a lot of good stuff that ObjC had, but hey, you can't have everything, right?

Where did it go wrong?

It seems that as soon as you create an alternative that is easier than doing it some other way, people will flock to that alternative, even if it isn't very good practice in the long run (like the built-in Java serialization). And in fact, trying to "fix" that by making the wrong choice less problematic just legitimizes the wrong choice - like Apple did when they made it easier to write Java-style Objective-C.

As a language designer, one often runs into "that's easy to add" sorts of features. But it seems to me that one has to be careful not to make things easy that people aren't supposed to use.

If I add objects and classes to C3, but say "I only include this for doing things that really work well with OO, like GUI toolkits", should I really blame people coming from an OO background for building everything with objects, if I have made it just as easy as building with the preferred procedural style?

Conclusion

I don't really like opinionated languages, but I also realize that one does have some responsibility to make the things that should be used easy, and the things that shouldn't be used (often) hard.

As an example, let's say there are two ways to solve a problem in a language: A and B. If A is the best way (maintainability, fit with the other language mechanisms, etc.), then it should always be easier to do, even if that means deliberately making B harder than it needs to be. "Making B easier because it has a very low cost for me to implement" is not a neutral act of design, but an implicit endorsement.

The user will look at the programming language and assume that the easiest thing to do is the best thing to do, and violating that assumption is really letting the users down.