A look at modules (in general + in the context of C3)

Despite being a general concept, modules often differ greatly from language to language. One major reason is that the overall language semantics put many constraints on how modules may work. Even within these constraints, though, a lot of specific design work is required.

I'm going to look at modules in general and also talk a little about how C3 modules work.

An initial observation

When making a module system one first has to decide whether a module is a separate concept or not, because if the language has static variables and functions attached to a type, there is already a sort of module system present.

Here is a short snippet written in the C2 language to illustrate this:

// File bar.c2
module bar;
// Plain function
func int get_one() {
    return 1;
}  

// File foo.c2
module foo;
import bar;
type Bar struct {
  int x;
}  
// Static function
func int Bar.get_one() {
    return 1;
}

func void test() {
    int a = Bar.get_one();
    int b = bar.get_one();
}

The type here acts as a namespace in itself. If we extend the type with static variables we can similarly emulate namespaced global variables.

Most languages with methods on their types gladly accept this ambiguity, but one can draw the conclusion that modules are not needed and only structs are necessary. This is the approach taken by Zig. The downside is that it also leads to counter-intuitive things such as "a file is a struct" and having to explicitly arrange sub-modules in a hierarchy.

The other way to resolve the ambiguity is to have type methods, but abolish static methods and type-level globals. This is the approach of C3. The downside is that some things that are naturally static, such as a method Foo.new_instance() or a constant Foo.MAX_VALUE, can't be expressed.

We can also note that Java, while having "packages", uses classes as the primary namespacing mechanism for free functions and constants. This is a bit more relaxed than Zig's approach, since the hierarchy is external.
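As a rough illustration of the Java arrangement, here is a minimal sketch in which a class exists purely as a namespace for a constant and a free function. The names MathUtil, MAX_VALUE and clamp are invented for this example and are not taken from any library.

```java
// Hypothetical example: MathUtil, MAX_VALUE and clamp are made-up names.
final class MathUtil {
    // A "namespaced global constant", reached as MathUtil.MAX_VALUE.
    static final int MAX_VALUE = 100;

    // Private constructor: the class is only a namespace and is never instantiated.
    private MathUtil() {}

    // A "free function" that has to live inside a class.
    static int clamp(int value) {
        return Math.min(value, MAX_VALUE);
    }
}

class NamespaceDemo {
    public static void main(String[] args) {
        // Callers always go through the class name, much like a module prefix.
        System.out.println(MathUtil.clamp(250));
        System.out.println(MathUtil.clamp(7));
    }
}
```

The class name plays the role that a module prefix would play in a language with separate modules.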

Sub-modules and paths

Flat vs hierarchal

The module namespace can be flat with a single module name or hierarchal, where modules have sub-modules. While flat modules are nice to work with and easy to implement, there is much more contention for unique names. This can mean that module names need to be longer to ensure uniqueness, e.g. mylib_io for the flat module and mylib::io for the hierarchal. But hierarchal modules in general have an even worse problem with length: e.g. std.debug.print("Hello, world!\n", .{}); (with apologies to Zig).

Aliasing and import

The obvious solutions to long names are aliasing and namespace imports. Here is again a C2 example:

import networking as net; // Aliasing
import filesystem local; // Namespace import


// Equivalent:
doSomething(); // Namespace import
filesystem.doSomething();

// Equivalent:
net.connect(); // Aliased
networking.connect();

The downside of aliasing is that aliases may differ between authors and implementations. So while someone might alias networking to net, someone else uses nw. This together with the difficulty of naming aliases makes it a less attractive solution. Full namespace import avoids naming issues, but makes it much less clear what are local functions and what is implemented elsewhere.

C3 path shortening

C3 has a hierarchal module system but employs path shortening. This means that the leading part of a module path may be elided: std::net::sockets::new_from_url(url) can be used as sockets::new_from_url(url) as long as it is not ambiguous.

Requiring at least the sub-module name in the path is a design decision to avoid the readability problems mentioned with namespace imports. In the example "new_from_url(url)" on its own lacks the context that the "sockets::" prefix gives.
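To make the "as long as it is not ambiguous" condition concrete, here is a small Java sketch of how a shortened path could be matched against known modules. It is not the C3 compiler's algorithm. It assumes that any trailing run of whole path segments is accepted, and the module list (including mylib::io) is made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PathShortening {
    // Hypothetical set of modules known to the compiler.
    static List<String> sampleModules() {
        return Arrays.asList("std::net::sockets", "std::io", "mylib::io");
    }

    // Every module whose full path ends with the shortened path,
    // matching whole "::"-separated segments only.
    static List<String> candidates(List<String> modules, String shortPath) {
        List<String> matches = new ArrayList<>();
        for (String module : modules) {
            if (module.equals(shortPath) || module.endsWith("::" + shortPath)) {
                matches.add(module);
            }
        }
        return matches;
    }

    // Resolves to the single matching module, or fails if there is none or more than one.
    static String resolve(List<String> modules, String shortPath) {
        List<String> matches = candidates(modules, shortPath);
        if (matches.size() != 1) {
            throw new IllegalArgumentException(
                "'" + shortPath + "' matches " + matches.size() + " modules: " + matches);
        }
        return matches.get(0);
    }

    public static void main(String[] args) {
        List<String> modules = sampleModules();
        System.out.println(resolve(modules, "sockets"));   // unique, so it resolves
        System.out.println(candidates(modules, "io"));     // two candidates, so "io" alone is ambiguous
        System.out.println(resolve(modules, "std::io"));   // a longer path disambiguates
    }
}
```

In this sketch an ambiguous short path is simply rejected, which pushes the author towards writing one more path segment.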

Surveying other languages, it's clear that types usually contain sufficient context in their names. For this reason they are exempt from the prefix requirement in C3.

Note how something similar happens in Java in practice: java.math.BigInteger is the import, you then use BigInteger, but call static "functions" namespaced: BigInteger prime = BigInteger.probablePrime(128, rnd);

In the Java case this comes from import java.math.BigInteger being an actual namespace import, but then the classes themselves provide a second layer of namespacing.
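The BigInteger case can be written out as a small runnable sketch. The class BigIntegerDemo, the helper makePrime and the seed value are made up for the example; BigInteger.probablePrime and bitLength are the standard library calls.

```java
import java.math.BigInteger; // folds the java.math namespace for this one class
import java.util.Random;

class BigIntegerDemo {
    // Hypothetical helper: the type name is used bare, but the static
    // "function" is still namespaced by the class.
    static BigInteger makePrime(int bits, long seed) {
        Random rnd = new Random(seed);
        return BigInteger.probablePrime(bits, rnd);
    }

    public static void main(String[] args) {
        BigInteger prime = makePrime(128, 42L);
        // Without the import, the same call needs the full path.
        BigInteger other = java.math.BigInteger.probablePrime(128, new Random(42L));
        System.out.println(prime.bitLength());
        System.out.println(other.bitLength());
    }
}
```

The type gets by on its own name, while the function keeps one level of context, which is close to what path shortening gives in C3.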

Visibility

The other major component to modules is visibility between modules. Note that nothing is saying that explicit imports are necessary: with full paths the correct types, functions and variables may be found anyway.

With "import" statements the most common scheme is this:

Modules not imported: no visibility.
Module imported: public declarations are visible.

Hierarchal visibility

As a complement to the above in hierarchal module systems, a module may see non-public declarations in sub modules and/or parent modules.

The desire for this feature arises from wanting to separate the visible "api layer" module from the internal "implementation layer" modules, which contain implementation details that may change over time.

The downside of this method for modules to peek into other modules is the need to build this into the hierarchy.

"Friend" visibility

An alternative to the hierarchal visibility above is to declare "friend" modules that may access the module. This has fewer constraints than trying to fit modules neatly into the right sort of hierarchy just to get the correct visibility between modules.

There is still the drawback that in order to "friend" another module, the module needs to know of that other module.

Becoming a "friend"

Often the concept of visibility is conflated with some idea of "internal safety": "I make this private to make it safe from other modules". This is stretching the metaphor too far. Visibility and access modifiers are there to help the user of the types to use / override functionality in the correct way. "Public" communicates that this function is made for general consumption, while "private" means it is for internal consumption and not part of the surface API of the functionality.

However, if one knows what one is doing then circumventing these protections can be useful. For example:

There may be a bug that can be circumvented by calling private methods.
One may want to exploit the particular functionality of a specific version of a library.
One may want to modify behaviour for some other reason that the author did not foresee.

Often languages have convoluted ways of circumventing visibility in these cases, e.g. calling functions using reflection in Java, just because the need does arise.
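For reference, this is roughly what the Java reflection detour looks like. The classes Library and ReflectionDemo and the method internalAnswer are invented for the example; getDeclaredMethod, setAccessible and invoke are the standard java.lang.reflect calls. The sketch assumes both classes live in the same unnamed module, where setAccessible is permitted.

```java
import java.lang.reflect.Method;

// Hypothetical library class with a helper that is not part of its surface API.
class Library {
    private static int internalAnswer() {
        return 42;
    }
}

class ReflectionDemo {
    // Calls the private method by name, bypassing the normal visibility check.
    static int callPrivate() {
        try {
            Method m = Library.class.getDeclaredMethod("internalAnswer");
            m.setAccessible(true);      // override the access check
            Integer result = (Integer) m.invoke(null); // null receiver: the method is static
            return result;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("reflection call failed", e);
        }
    }

    public static void main(String[] args) {
        // Library.internalAnswer() would not compile here; the reflective call works.
        System.out.println(callPrivate());
    }
}
```

Three lines of ceremony, a string-typed method name and a cast are needed to do what is conceptually a single call.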

The obvious way is then for a module to be able to declare itself the friend of a module. A C3 example:

module test;
fn void fn_private() @private {}

module foo;
import test @public; // Override visibility

fn void main()
{
    // This is not an error due to the "@public" import.
    test::fn_private();
}

We can note that C3 has public by default. It is possible to set a different default:

module test2 @private;

fn void fn_private() {}
fn void fn_public() @public {} // Explicitly needs @public!

Visibility levels

To talk about visibility at all we need at least two levels to differentiate between. Usually these are public and private, where public means visible outside of the module and private means visible only inside of the module.

In fact, we could stop here because this will in most cases be all we need. For this reason there is a possibility to not encode this in a keyword, but in the name itself: Go's "uppercase means public" and Dart's "leading underscore means private" (note: I considered the latter for C3).

Between "private" and "public"

If we want hierarchal visibility, then we need another level above private but below public, indicating that something is available to other modules (below or above) in the hierarchy.

Similarly, for the "friend" module visibility we need a visibility level for this behaviour. As an example Rust has pub(in path) and pub(crate) (although note that both of those are somewhat constrained).

Below "private"

If modules may span multiple source files, there is the possibility of another visibility level, where visibility is restricted to the file with the declaration. This is C's static, Swift's fileprivate and C3's @local (Note: while C3 could have used static for globals and functions, it's a poor name for type visibility. This is why @local was chosen instead).

This is not exhaustive: depending on language features more visibility levels might be possible. For C3, where an import can override visibility, having "public", "private" and "local" seems to cover most use cases.

Imports

While imports are usually a good way to determine dependencies, this is not guaranteed. As an example: while most Java programmers may think of Java's import as importing classes, all it actually does is to fold namespaces.

The point here is that while import may roughly correspond to the dependency graph, it's not guaranteed to exactly do so. This means that imports are usually simply a way to limit the pollution of the current namespace.

This is very valuable though. In fact, it is a variant of the public / private division: importing picks a set of modules that can be accessed (= are public to the current module).

Narrow imports

In the Java world, wildcard imports (e.g. import java.util.*) are by tradition considered bad. Instead, Java source files often contain a litany of single class imports. This is such a problem that most IDEs offer to both hide the list of imports and manage it for you.

In the Java case the tangible benefit claimed is that if you do something like this:

import java.util.*;
import java.sql.*;

You have a problem if you try to use Date since it's now ambiguous.

Having written a lot of Java code that works with the DB I can confidently say that the problem here is not the imports, but the reuse of Date in both Java packages. If the java.sql class had a reasonable name like SqlDate this import would not have been a problem AND there would be no confusion when trying to use a java.util.Date and java.sql.Date in the same code, which happens quite often.

So the fact that the above is touted as a reason just shows how weak the arguments are for narrow imports in Java.
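The Date clash is easy to reproduce. In the sketch below, DateDemo and its two helpers are made-up names, while java.util.Date and java.sql.Date.valueOf are the real classes and methods. Note that spelling out the full path is needed whenever both types appear in the same file, whatever the import style.

```java
import java.util.*;
import java.sql.*;

class DateDemo {
    // Date today = new Date(0L);  // would not compile: the reference to Date is ambiguous

    // With both wildcards in scope, the full path is the way to say which Date is meant.
    static java.util.Date utilEpoch() {
        return new java.util.Date(0L);
    }

    static java.sql.Date sqlDay() {
        return java.sql.Date.valueOf("2024-01-15");
    }

    public static void main(String[] args) {
        System.out.println(utilEpoch().getTime());
        System.out.println(sqlDay());
    }
}
```

A single-class import would pick one of the two, but the other one still has to be written out in full, so the naming problem remains.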

HOWEVER if a language uses import to actually pull in dependencies, then narrow is likely better, but it's important to note that this isn't necessarily the case. It's not true in Java, nor is it true in C3.

No imports?

One might think that dumping all modules in the current namespace would be unworkable, but if we already use the full path to types and functions, there are no ambiguities. Even C3 abbreviated paths work fine in general.

The downside is that now things like code completion are going to match EVERYTHING in all modules, which just makes for a much worse experience. This also affects things like error messages. The imports help the compiler (and an IDE) to make better guesses and in general just be more friendly.

A middle ground

In C3 imports are implicitly wildcard, so import std::io will also import the sub modules of std::io. It's also possible to have more than one import on a single line, e.g. import std::io, std::math;. To me this seems like a reasonable compromise.

More controversially, C3 modules will implicitly import parent and child modules. So std::io::socket could implicitly import std::io, std and the child module std::io::socket::channel. I am not sure of this feature and it might go away. That said, because there is no sibling module import (e.g. std::io does not implicitly import std::math), the namespace pollution is still fairly low.
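The parent / child rule can be sketched as a simple prefix check. This is an illustration in Java, not the C3 compiler's implementation. It assumes that all ancestors and all descendants count, and the module list (including std::io::file) is made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ImplicitImports {
    // Hypothetical set of modules in a project.
    static List<String> sampleModules() {
        return Arrays.asList(
            "std", "std::io", "std::math",
            "std::io::socket", "std::io::socket::channel", "std::io::file");
    }

    // Modules that 'current' would see without an explicit import:
    // its ancestors and its descendants, but not its siblings.
    static List<String> implicitImports(String current, List<String> allModules) {
        List<String> result = new ArrayList<>();
        for (String other : allModules) {
            if (other.equals(current)) continue;
            boolean isParent = current.startsWith(other + "::");
            boolean isChild = other.startsWith(current + "::");
            if (isParent || isChild) {
                result.add(other);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // std::math and std::io::file are siblings or cousins, so they are left out.
        System.out.println(implicitImports("std::io::socket", sampleModules()));
        System.out.println(implicitImports("std::io", sampleModules()));
    }
}
```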

Dependency resolution

If the import does not resolve the actual dependency graph, then all code must be at least parsed and analysed. For the C3 compiler this is not a problem, since lexing, parsing and semantic analysis are only a fraction of the total compilation time. However, it's desirable to output only the part of the code that is in use.

Exports

We have one more problem: just because a function is public doesn't mean it should be exported in a library.

We can illustrate this with a simple example: let's say we want to build a simple web scraper which creates a list of all the image URLs on a web page. To do so we use a module which handles http + https and write a thin layer on top with a single function that takes a string and returns a list of strings with the URLs. In other words, we only have a single function that we want to export.

But if we create a static library with this functionality and naively export the public functions, we will get not just our single function, but the public functions of the http module as well... plus public functions of anything the http module uses!

While the linker might strip unused code when creating an executable, even in this case we will still generate code that is not used.

Explicit exports

The first necessary feature is to be able to mark functions and globals as being exported. Note that being exported is orthogonal to public / private. Public and private are about source level visibility, and export is about library and linker visibility.

Because exported functions are usually public, some languages conflate public and export, making export simply a variant of "public". (In C3 the @export attribute marks a function or global as exported; it has no effect on visibility between modules.)

Entry points => dependency graph

With export we're now able to make a real dependency graph. For a regular executable the main function can be considered the entry point, otherwise we use functions marked export to trace dependencies.
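Tracing from entry points is an ordinary graph reachability problem. The following Java sketch walks a call graph from the exported functions and keeps only what it reaches. The call graph and the function names (scraper::image_urls, http::get and so on) are invented to mirror the web scraper example; a real compiler would build the graph from the analysed code.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class ExportReachability {
    // Hypothetical call graph: each function maps to the functions it calls.
    static Map<String, List<String>> sampleCalls() {
        Map<String, List<String>> calls = new HashMap<>();
        calls.put("scraper::image_urls", Arrays.asList("http::get", "scraper::parse"));
        calls.put("http::get", Arrays.asList("http::connect"));
        calls.put("http::post", Arrays.asList("http::connect")); // public, but never reached
        return calls;
    }

    // Depth-first walk from the entry points; everything not visited can be left out.
    static Set<String> reachable(Map<String, List<String>> calls, Collection<String> entryPoints) {
        Set<String> seen = new TreeSet<>();
        Deque<String> work = new ArrayDeque<>(entryPoints);
        while (!work.isEmpty()) {
            String fn = work.pop();
            if (!seen.add(fn)) continue; // already visited
            for (String callee : calls.getOrDefault(fn, Collections.<String>emptyList())) {
                work.push(callee);
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        // Only the exported function is used as an entry point.
        Set<String> used = reachable(sampleCalls(), Arrays.asList("scraper::image_urls"));
        System.out.println(used);
    }
}
```

In this sketch http::post is public but unreachable from the exported function, so it would not need to be emitted.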

Summary

We have looked at how static methods and members overlap with module namespaced functions and globals. This means namespacing can be done with modules, static methods and members, or a combination thereof. C3 uses modules only.
Modules may be flat or hierarchal. C3 uses a hierarchal module namespace.
Various methods may be used to reduce repetitive module prefixing. Aliasing and namespace imports are common. C3 uses path shortening.
The simplest visibility semantics only has public and private.
Accessing "private" functions is useful, and there are various solutions.
One method is adding a special visibility level to let a parent or child module access private functions.
Another method is declaring other modules as "friends" that may access private functions as if they were public.
C3 allows a module to import private functions of other modules.
C3 has three visibility levels: @public, @private and @local. "local" means it is local to the current module section.
Imports can be narrow or wide. C3 prefers wildcard imports. Narrow imports are mostly useful when the dependency graph can be inferred directly from the imports.
Exports need to be different from "all of the public functions".
C3 uses @export to mark declarations to export.

If you want to try out C3, you can test it here: https://learn-c3.org.