xvw.lol

OCaml, modules, and import schemes

This article is a translation, the original version is available here.

The OCaml module system can be intimidating, and it typically involves the use of many keywords—for example, open and include, which allow importing definitions into a module. Since version OCaml 4.08, the open primitive has been generalized to allow the opening of arbitrary module expressions. In this article, we’ll explore how to use this generalization to reproduce a common practice in other languages, what I somewhat pompously call import strategies, to describe patterns like import {a, b as c} from K, without relying on a (sub-)language dedicated specifically to importing.

The generalization of open is documented in the paper "Extending OCaml's open", presented at the OCaml Workshop 2017, and implemented—in OCaml version 4.08, in pull requests #1506 and #2147 (likely followed by a few fixes after merging). This generalization greatly increases the flexibility of the open construct, making it possible to use various tricks to finely control the import of module components into another module.

Some of the techniques shown here are directly adapted from the paper, which, beyond discussing implementation strategies, also explores various use cases—though some of them fall outside the scope of this article, as they don't relate to import strategies.

It's likely that many of the tricks presented here won't become idiomatic in OCaml codebases. In my view, their main purpose is to highlight the increased flexibility of the open primitive, without relying on a dedicated syntactic extension for importing components—while also showcasing a few somewhat far-fetched encodings... just for the sake of demonstration.

Importing Module Components

When describing an OCaml program, one constructs collections of modules that export components (types, submodules, exceptions, and functions). It is therefore crucial to finely control their accessibility from other modules. For this, we have two primitives — open and include — whose difference is subtle. To clearly describe the differences between these two primitives, we will base ourselves on this (somewhat artificial) module, which we will use in the following examples:

module A = struct
  type a = T_a
  let a_a = 1
  let a_b = T_a
  module B = struct
    type b = T_b
    let b_a = T_b
    let b_b = T_a
  end
end
module A : sig
  type a = T_a
  val a_a : int
  val a_b : a
  module B : sig
    type b = T_b
    val b_a : b
    val b_b : a
  end
end

As you may notice, the implementation—and de facto, the signature — of the module is not very interesting; it will only serve to illustrate my point. What we want is to use, in a file c.ml (which will denote the module C), the components described in A.

The first approach, and the most obvious one, is to use their full names (a fully qualified call) by using the module path. For example, let's create a pair of int and A.a:

let c_a = (A.a_a, A.B.b_b)

However, using fully qualified paths can sometimes be tedious (or even completely unreadable). That’s why we’ll look at how to import components from module A into module C. However, since the goal of this article is not to be a tutorial on open and include, but to explore generalized opens to describe import schemes, we won’t go into detail about these two features, which are already thoroughly documented in the language manual — in the sections dedicated to modules, overriding through opens, and generalized opens

Module Opening

The open primitive allows importing the components of a module into another module, without re-exporting them in the current module. For example, let's use open to rewrite the function c_a:

open A

let c_a = (a_a, B.b_b)
val c_a : int * A.a

As we can see from the (inferred) signature, it does not export the components of module A and references the type a by its full path. It would also be possible to open A.B using open A.B or even open B (since A is already open).

In the same way that it's possible to open modules in the implementation, it's also possible to open modules in the signature, allowing shorter paths when referring to types (or modules).

Local module opening

The opening cases we previously observed were global to the module in which they were invoked, which can be somewhat restrictive when you want to open multiple modules, exposing, for example, arithmetic operators. Fortunately, it's possible to open at the expression level, in two different ways:

  • let open Module in ... opens the module locally — lexically scoped — within the following expression block. This is very useful when you want to open a module only within a function;

  • Module.(...) opens the module — also lexically scoped — only within the parentheses. This is very useful when you want to open a module just within a single expression. For example, imagine the Float module exposes a submodule Infix providing the usual arithmetic operators; you could describe a floating-point operation like this: let x = Float.Infix.(1.2 + 3.14 + 1.68).

The absence of local opening can be very limiting. For example, the language F# only allows global openings, which makes defining operators in a dedicated module cumbersome. This encourages the use of operator overloading (or even Statically Resolved Type Parameters), sometimes resulting in considerable complexity.

Module inclusion

The include primitive is very similar to the open primitive except that it — as its name suggests — includes the contents of the targeted module into the body of the module where it is called. For example, if we had used include instead of open in our previous example, let’s observe the impact on the inferred signature:

include A

let c_a = (a_a, B.b_b)
type a = A.a = T_a
val a_a : int
val a_b : a
module B = A.B
val c_a : int * A.a

Even though the signature varies slightly from the one we defined earlier — there are some subtleties regarding the propagation of type and module equalities, described in the section "Recovering the type of a module", related to strengthening — we can see that the contents of module A have been included, added to module C. Unlike opening, it is not possible to do local inclusion, which makes perfect sense because inclusion at a local level would have exactly the same effect as opening.

From my personal experience, I generally identify two specific use cases for using include:

  • Extending an existing module (for example, adding a function to the List module within my project, or to extend the standard library);

  • Including submodules within a parent module. For example, it is quite common that within a module, there are operators (or binding operators) which are often confined in dedicated submodules (usually Infix and Syntax). For API reasons, re-exporting them at the parent module level can be a good idea. This is, in fact, intensively used in Preface.

Inclusions are a powerful tool for extension, but also for code sharing, and there is much to say because it often involves substitution, destructive substitution, or strengthening, which would require writing a dedicated article!

Opening VS Inclusion before OCaml 4.08

Before the merge of the proposal for generalizing opens, there was a significant difference in the usage of open versus include: the parameter each primitive accepted.

  • open took a module path, for example: A or even A.B.C ;

  • include took a module expression, for example: paths like A or A.B.C, but also functor applications like F(X), modules constrained by signatures (M : S), or directly the module body struct ... end.

This difference in flexibility involved rather cumbersome workarounds to achieve the same level of expressiveness for open compared to include. Indeed, to allow open to work with functor applications or constraints, it was necessary to use intermediate modules.

In the case of using a path, the two calls are — in terms of expressiveness — identical, because a path can also be a module expression:

include A.B.C
open A.B.C

However, as soon as we try to open somewhat more complex cases, natively supported by include, we quickly had to introduce intermediate modules:

include F(X)
include (M : S)
include struct
  let x = 10
end
module A = F(X)
open A

module B : S = M
open B

module C = struct
  let x = 10
end
open C

Even if at first glance it may not seem dramatic, introducing intermediate modules requires not exporting them in the signature of the module that opens them (in its mli or signature). Additionally, although open and include are often presented symmetrically, their asymmetry in the parameters they take was regrettable. Fortunately, since version 4.08 (released in June 2019), thanks to the generalization of openings, open now accepts an arbitrary module expression, just like include, allowing us to use it to mimic those import schemes mentioned at the beginning of this article.

A first benefit

The fact that the open primitive can take arbitrary module expressions offers a first benefit — probably trivial if you like writing your module signatures: the definition of local expressions. Indeed, opening a module does not export its contents, so it is possible to very easily define non-exported top-level values by defining them inside an open struct ... end expression. For example:

open struct
  let x = 10
  let y = 20
end
let z = x + y
val z : int

The functions x and y are confined within an open block, so they are not exported, which can be very useful when you want to define values (types or modules) locally. Moreover, since a structure can be constrained by a signature, it is also possible, for example, to encapsulate shared mutable state within the local open, preventing it from escaping the scope of its opening. Here are two examples where it is impossible to modify the reference cell without going through the exported combinators, the first using a constraint, the second using nested local opens:

open (
  struct
    let cell = ref 0
    let incr () = cell := !cell + 1
    let decr () = cell := !cell - 1
  end :
    sig
      val incr : unit -> unit
      val decr : unit -> unit
    end)
open struct
  open struct 
    let cell = ref 0 
  end
  let incr () = cell := !cell + 1
  let decr () = cell := !cell - 1
end

Even though the classic approach used by an OCaml developer is to use signatures for encapsulation concerns, when used as such, the generalized module opening allows you to hide, in a relatively elegant way, certain boilerplate elements necessary for building a module (which should expose a public API via a signature).

Now that we have looked at some examples of using generalized opening, let's see how it makes having a dedicated language for import schemes questionably useful.

Import Schemes

Since modularity has become a cornerstone of software design — JavaScript, for example, has layered many proposals where the modularization and import strategy depends on the framework or build system used — languages like Python and Haskell have introduced features similar to OCaml's open primitive to import components into the current module. Generally, these import directives form a small language of their own, governed by specific rules and grammar. Since open has been generalized in OCaml, it is possible to encode much of the usual import patterns — even though some, like those proposed by Haskell, may be somewhat verbose to encode.

For the example, we will use the following module as the import target:

module Foo : sig
  val x : int
  val y : string
  val z : char
end

However, there is an essential nuance regarding the notion of qualified import: in Haskell, to use a module, it must be imported, whereas in OCaml, it is the build system (compilation scheme) that indicates whether a module is present or not. In our various examples, we will assume that the module Foo, which we described earlier, is present in our compilation scheme. Therefore, for qualified imports — where terms are always prefixed by their module path — no additional ceremony is necessary. It is important to keep in mind that the tricks presented below can be combined to build very specific (and probably unrealistic) import schemes, demonstrating that with a bit of verbosity, the language approach still allows more flexibility than a rigid import DSL.

Unqualified Import

The first directive simply imports the definitions from Foo into the current module, namely the functions x, y, and z:

import * from Foo
open Foo

There is no subtlety here; importing all the terms exposed by Foo simply consists of opening it. There isn't much more to say, as we are not leveraging any particular language subtlety here.

Renamed Qualification

Another common directive consists of renaming the module, for example, importing Foo under the name Bar so that Bar.x, Bar.y, and Bar.z are accessible in the module. For this, one can use type-level module aliases.

import Foo as Bar
open struct module Bar = Foo end

We use the open struct ... end construct to keep our alias hidden from the module's public API. This ensures that the alias doesn’t leak outside the module. However, if the module has an explicit signature, this is less critical, as simply omitting Bar from the signature will prevent it from being exposed.

Presence of the renamed module

Using a module alias leaves the module Foo accessible, and in some cases, we might want to make it inaccessible. The simplest solution is to simply empty the module and, to clarify the error related to its undesired use, we can add an alert indicating that the module has been removed:

open struct
  module Bar = Foo
  module Foo = struct end [@@alert erased]
end

Making the use of the module Foo within the current module impossible by raising an alert. However, since it is common practice to provide module signatures — and thereby control the public API — you will more often encounter renamings like: module Bar = Foo. Furthermore, I am not convinced that restricting access to the original module is truly problematic.

Nested renaming

One might imagine renaming like this: import Foo as Bar.Baz, but OCaml does not allow full path descriptions of the form module Bar.Baz = Foo. Instead, you need to describe the module nesting hierarchy explicitly, like this, making the functions Bar.Baz.x, Bar.Baz.y, and Bar.Baz.z available in the current module:

open struct 
  module Bar = struct 
    module Baz = Foo 
  end 
end

Which, I admit, is a bit verbose, but if for some obscure reason you want to rename an existing module using a composed path, you can do so by explicitly declaring the module hierarchy.

Selective Import

Sometimes, importing the entire module is overkill, and we only want a few components exposed by it. That’s why it’s possible to import just a part of a module. In this example, we want to import only x and y:

import {x, y} from Foo
open struct
  let x = Foo.x
  let y = Foo.y
end

Although this approach is also somewhat verbose, it imports only the functions x and y into the current module. It’s possible to simplify this syntax by using tuples and local openings:

open struct
  let (x, y) = Foo.(x, y)
end

Alternatively, it is also possible to constrain the opening using a signature, which requires specifying the types of the functions to export:

open (Foo : sig
  val x : int
  val y : string end)

Several proposals (#10013 and #11558) have been made to enable the use of let-punning, which would make the syntax less verbose. However, the first proposal dropped punning for module members, and the second is still at the issue stage.

Selective import with renaming

Since the first two proposals leave the user in control of naming (it’s just function redefinition), renaming can be trivially integrated. In this example, we expose x and new_y_name, which calls Foo.y:

import {x, y as new_y_name} from Foo
open struct
  let (x, new_y_name) = Foo.(x, y)
end

Unsurprisingly, renaming is quite straightforward. However, if we wanted to use the signature-based approach, it would require a bit more trickery by combining an open with an include:

open (struct
    include Foo
    let new_y_name = y
  end : sig
    val x : int
    val new_y_name : string end)

However, this last proposal is so verbose that it becomes somewhat irrational — especially compared to the previous one — and I imagine it’s the kind of code you’ll never see — or at least never want to see — in a regular codebase. That said, even though it’s heavy, I find it still quite clearly demonstrates how it is possible to compose the different constructions and tools we’ve seen earlier.

Import by Exclusion

Haskell has a somewhat special import modifier that I hesitated to mention for a long time because I had no idea how to implement it. But once again, thanks to the invaluable help of @octachron and @xhtmlboi, who both gave me roughly the same solution, here it is. This modifier allows importing an entire module except for a list of components. In this example, x and y will be imported because we import the whole Foo module, except for the function z.

import Foo hidding (z)

OCaml does not natively support constructing intersections or differences of modules. The solution proposed by @octachron and @xhtmlboi relies on function rewriting combined with the use of an alert, in a way somewhat similar to what we did to exclude a renamed module. However, before examining their solution, let's take a brief detour into the empty variant.

Empty Variant

In OCaml, it is possible to define a sum type with no constructors using the empty variant, which essentially allows you to describe unrepresentable values. To define it, you simply create a sum type with an empty branch (which, important, is not the bottom type, denoted ):

type empty = |

To convince yourself that the compiler can reject cases containing a value of type empty, you can easily experiment with pattern matching. In the example below, the compiler raises no warnings because the patterns are exhaustive. Since it's impossible to construct a value of type empty (except by cheating, for instance using sorcery like the infamous Obj.magic function), we can refute handling the error case:

let f : ('a, empty) result -> 'a = function
  | Ok x -> x

But in our use case, it’s not the refutation that interests us, rather the fact that it’s impossible to describe a value of type empty, which we can leverage to exclude certain functions.

Function Suppression

The solution proposed to me is to make the functions we want to remove from the module impossible to call. To do this, we will first create a placeholder function that we will use to override an existing function:

type empty = |
let expelled : empty -> unit = fun _ -> ()

At first glance, our expelled function is impossible to call because it requires a value of type empty, which cannot be produced. Therefore, we can include the module we want to refine and then override the functions we want to exclude with our expelled function, associating them with an alert to clarify the error triggered by using a removed function:

open struct
  include Foo
  let (z [@alert expelled]) = expelled
end

And there you have it, we can be pretty sure that using z will cause a compilation error, and compiling a module that uses it will raise a warning. However, the solution is far from perfect because it does not remove the component from the module. To be honest, I've very rarely found myself missing this feature natively. From my perspective, selective importation usually suffices quite well.

Type Anchoring

Before concluding this article, @octachron pointed out to me the partial asymmetry between open and include when it comes to anonymous modules (i.e., module expressions struct ... end). This is an issue I had already encountered theoretically, having attended the May 2023 event of OCaml Users in Paris where Clément Blaudeau gave his talk Retrofitting OCaml Modules, which was a summary of his paper OCaml modules: formalization, insights and improvements.

Since open does not export the opened components, without associating an explicit signature, some expressions cannot be typed. For example:

open struct type t = A end
let x = A

In this example, the type t (and its constructor A) is present in the current scope; however, since it is not exported, it is impossible to correctly type x. If the module had a signature, one could easily realize that there is no acceptable type for x and that one should either change the open directive or avoid exporting x. This issue is known as type anchoring, which is extensively described in the paper cited at the beginning of this section.

Conclusion

I sincerely believe that in daily OCaml use, we very rarely encounter such needs. The goal of this article was essentially to show how to use certain primitives related to the module system, alongside the generalized open feature, to demonstrate that having expressive and composable primitives allows one to reproduce, sometimes trivially (and sometimes less so), common import patterns found in other programming languages. There are probably other fun encodings — likely based on functors — so don’t hesitate to share them with me so I can update this article!

To conclude, I would add that even though I proudly boasted that programming this way in OCaml is uncommon, the existence of packages like ppx_import or ppx_open shows that some syntactic sugar wouldn’t hurt—especially for selective imports.