157 changes: 157 additions & 0 deletions content/posts/2025-08-gsoc-modules-driver.md
@@ -0,0 +1,157 @@
---
author: "Naveen Seth Hanig"
date: "2025-08-31"
tags: ["clang", "modules"]
title: "GSoC 2025 - Support simple C++20 modules use from the Clang driver without a build system"
---

Hi, my name is Naveen. For Google Summer of Code 2025, I’ve been working on adding native support for simple Clang and C++20 named module use from the Clang driver.
This post outlines the project and its current status.

My mentor for this project was Michael Spencer.

## Background

Modules solve many of the long-standing problems with the traditional header-based way of sharing code.

They prevent leaking macros, let you explicitly choose what to export, and can improve compile time at scale.

However, because modules must be precompiled before use, builds that rely on them need to schedule compilations in the right order, according to their imports.

At the moment, Clang’s driver lacks native support to do this, which makes even simple tests or tiny programs using modules hard to compile without first setting up a build system.

## Goals

The goal of this project is to extend the build system in Clang's driver to natively support simple use of Clang or C++20 named modules by integrating Clang's existing support for module dependency scanning.

This should also support importing the C++ Standard library modules `std` and `std.compat`, and add no overhead to cases where modules are not used.

With the feature fully implemented, the following example should compile without any issue:

```bash
clang++ -std=c++23 main.cpp A.cpp -fmodules-driver -fmodules -fmodule-map-file=module.modulemap
```

```cpp
// main.cpp
#include "MyLib.h"
import std;
import A;

auto main() -> int {
  std::println("{}", make_greeting("modules"));
  std::println("The answer is: {}", get_answer());
}
```

```cpp
// A.cpp
export module A;
import std;

export auto make_greeting(std::string_view Name) -> std::string {
  return std::format("Hello, {}!", Name);
}
```

```cpp
// module.modulemap
module MyLib {
  header "MyLib.h"
  export *
}
```

```cpp
// MyLib.h
auto get_answer() -> int { return 42; }
```

Although one of the main advantages of modules is that they can be precompiled once and reused by later compilations, support for caching was not in the scope of this GSoC project.

## Design Overview & Challenges

{{< figure src="/img/gsoc2025-modules-driver-design.webp" alt="Modules Driver Design">}}

### 1. Check: Enable the Modules Driver?

Once stabilized, the driver-managed module build feature should be enabled automatically for compilations which make use of C++20 named modules and have two or more source inputs.
To detect named‑module usage without adding noticeable overhead to compilations that don’t use modules, we added a fast scanner that inspects the leading lines of a source input for named module directives only.

We measured compile times for building Clang with the check enabled and disabled, and profiled it using perf. The benchmarks show that the check typically makes up less than 0.1% of total compile time ([full benchmarks](https://github.com/naveen-seth/llvm-dev-cxx-modules-check-benchmark)).
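To give a rough idea of what such a check looks for, here is a minimal, line-based sketch. It is not the scanner from the actual patch, which works on tokens. The name `looksLikeNamedModuleUse` and the 64-line limit are assumptions made for illustration, and the sketch ignores block comments and line continuations.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <string_view>

// Removes leading spaces and tabs.
static std::string_view trimLeft(std::string_view S) {
  while (!S.empty() && (S.front() == ' ' || S.front() == '\t'))
    S.remove_prefix(1);
  return S;
}

// If S starts with the keyword KW (not followed by an identifier character),
// strips the keyword plus trailing whitespace from S and returns true.
static bool consumeKeyword(std::string_view &S, std::string_view KW) {
  if (S.substr(0, KW.size()) != KW)
    return false;
  std::string_view Rest = S.substr(KW.size());
  if (!Rest.empty() &&
      (std::isalnum(static_cast<unsigned char>(Rest.front())) ||
       Rest.front() == '_'))
    return false;
  S = trimLeft(Rest);
  return true;
}

// Hypothetical helper: returns true if one of the first MaxLines lines looks
// like a named module declaration or an import of a named module.
bool looksLikeNamedModuleUse(const std::string &Source,
                             std::size_t MaxLines = 64) {
  assert(MaxLines > 0 && "must inspect at least one line");
  std::istringstream In(Source);
  std::string Line;
  for (std::size_t I = 0; I < MaxLines && std::getline(In, Line); ++I) {
    std::string_view S = trimLeft(Line);
    consumeKeyword(S, "export"); // optional 'export' prefix
    if (!consumeKeyword(S, "module") && !consumeKeyword(S, "import"))
      continue;
    // Named modules start with an identifier, partitions with ':'.
    // Header unit imports (import <x>; import "x";) and 'module;' do not.
    if (!S.empty() &&
        (std::isalpha(static_cast<unsigned char>(S.front())) ||
         S.front() == '_' || S.front() == ':'))
      return true;
  }
  return false;
}
```

Because the loop stops after a fixed number of lines, the cost per input stays small and does not grow with the size of the source file.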

### 2. Modules Driver Logic

At a high level, the modules driver logic boils down to three steps:
(1) scan the inputs for module dependencies, (2) plan the build order, and (3) reorder and modify the jobs accordingly.
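The planning step is essentially a topological sort of the import graph. The sketch below uses Kahn's algorithm on a plain map from each unit to the units it imports. The `ImportGraph` type and the `planBuildOrder` function are hypothetical names for illustration and do not reflect the data structures used in the driver.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <queue>
#include <string>
#include <vector>

// Maps each unit (source file or module) to the units it imports.
using ImportGraph = std::map<std::string, std::vector<std::string>>;

// Returns an order in which every unit comes after all of its imports, or
// std::nullopt if the imports form a cycle (Kahn's algorithm).
std::optional<std::vector<std::string>>
planBuildOrder(const ImportGraph &Graph) {
  std::map<std::string, int> Pending; // number of unbuilt imports per unit
  std::map<std::string, std::vector<std::string>> Importers; // reverse edges

  for (const auto &[Unit, Imports] : Graph) {
    Pending.try_emplace(Unit, 0);
    for (const auto &Dep : Imports) {
      ++Pending[Unit];
      Pending.try_emplace(Dep, 0); // units that only appear as imports
      Importers[Dep].push_back(Unit);
    }
  }

  std::queue<std::string> Ready;
  for (const auto &[Unit, Count] : Pending)
    if (Count == 0)
      Ready.push(Unit);

  std::vector<std::string> Order;
  while (!Ready.empty()) {
    std::string Unit = Ready.front();
    Ready.pop();
    Order.push_back(Unit);
    for (const auto &Importer : Importers[Unit]) {
      assert(Pending[Importer] > 0 && "import counted more than once");
      if (--Pending[Importer] == 0)
        Ready.push(Importer);
    }
  }

  // Units left with pending imports are part of a cycle.
  if (Order.size() != Pending.size())
    return std::nullopt;
  return Order;
}
```

For the example from the Goals section, a graph in which `main.cpp` imports `std`, `A`, and `MyLib`, and `A` imports `std`, yields an order that places `std` before `A` and `main.cpp` last.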

Some parts introduced unique challenges:

#### Handling of the Standard library modules

Clang’s dependency scanning tooling takes the command lines of the generated `-cc1` compilation jobs as input. Since we can’t know in advance whether a Standard library module will be needed, we always create the jobs for `std` and `std.compat`. During the scanning phase, we only scan those Standard library modules if a command-line source input depends on them. If a Standard library module ends up unused, we drop its job and carefully remove its outputs from the linker command line.
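The pruning step can be pictured with the following simplified sketch. The `Job` struct, the function name, and the flat list of linker inputs are assumptions made for illustration; the driver's real job and argument types are considerably richer.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical, heavily simplified stand-in for a driver job.
struct Job {
  std::string Module; // module built by this job, empty for regular sources
  std::string Output; // file the job produces
};

// Drops the jobs for Standard library modules that no input imports and
// removes their outputs from the linker inputs.
void pruneUnusedStdModuleJobs(std::vector<Job> &Jobs,
                              std::vector<std::string> &LinkerInputs,
                              const std::set<std::string> &ImportedModules) {
  auto IsUnusedStd = [&](const Job &J) {
    bool IsStd = J.Module == "std" || J.Module == "std.compat";
    return IsStd && ImportedModules.count(J.Module) == 0;
  };

  // First remove the outputs of the unused jobs from the link step ...
  for (const Job &J : Jobs) {
    if (!IsUnusedStd(J))
      continue;
    assert(!J.Output.empty() && "module job without an output");
    LinkerInputs.erase(
        std::remove(LinkerInputs.begin(), LinkerInputs.end(), J.Output),
        LinkerInputs.end());
  }

  // ... then drop the jobs themselves.
  Jobs.erase(std::remove_if(Jobs.begin(), Jobs.end(), IsUnusedStd),
             Jobs.end());
}
```

Removing the linker inputs before the jobs matters here: once a job is erased, its output name is no longer available to look up.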

#### Propagating scan diagnostics

After completing the dependency scan, we want to forward all diagnostics generated by the scan through the driver's diagnostics engine.
Because those diagnostics are generated outside the driver’s own invocation, they become invalid once the scan process ends.
To prevent this, we serialize each scan diagnostic into an intermediate representation and deserialize it back into a regular diagnostic before emitting it.

## Outcome

The current upstream draft for this feature ([#156248](https://github.com/llvm/llvm-project/pull/156248)) can successfully compile examples that use both C++20 named modules and Clang modules. Importing Standard Library modules is also supported, and the example above compiles without issues.

Additionally, the module dependency graph can be emitted as a diagnostic remark (using `-Rmodules-driver`) in the form of a DOT graph:

{{< figure src="/img/gsoc2025-modules-driver-graphviz.svg" alt="Modules Driver Graph Remark">}}
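For readers unfamiliar with DOT, the sketch below shows how an import graph could be rendered in that format. It is only an illustration: the driver relies on LLVM's GraphWriter, so the exact output differs, and the `toDot` function, the graph name, and the edge direction (importer to imported module) are assumptions.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Wraps a node name in double quotes. Escaping is not handled in this sketch.
static std::string quoted(const std::string &Name) {
  assert(Name.find('"') == std::string::npos && "escaping is not handled");
  return "\"" + Name + "\"";
}

// Renders a map from each unit to its imports as a DOT digraph, with one
// edge from every importer to each module it imports.
std::string
toDot(const std::map<std::string, std::vector<std::string>> &Graph) {
  std::ostringstream OS;
  OS << "digraph modules {\n";
  for (const auto &[Unit, Imports] : Graph) {
    OS << "  " << quoted(Unit) << ";\n";
    for (const auto &Dep : Imports)
      OS << "  " << quoted(Unit) << " -> " << quoted(Dep) << ";\n";
  }
  OS << "}\n";
  return OS.str();
}
```

The resulting text can be passed to Graphviz, for example with `dot -Tsvg`, to produce an image like the one above.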

Regular translation units are able to import both Clang and C++20 named modules. However, importing a Clang module into a C++20 named module interface unit, or vice versa, is not yet supported.

## Future work

While basic examples using modules compile correctly, there are still many command-line options and input configurations that are incompatible or may break the modules driver in unexpected ways. In the near term, I plan to fix the draft's remaining quirks, land this feature, and make it more robust.

In addition, the modules driver should gain support for caching precompiled module files, since caching is one of the core strengths of modules and makes up for the initial overhead they add.

In the longer term, support for imports between different kinds of module units should also be added. Because of the current design of Clang’s dependency scanning tooling, however, allowing C++20 named modules to be imported into Clang modules would require deeper architectural changes.

## Acknowledgements

I’d like to thank my mentor, Michael Spencer, for his invaluable help and guidance, as well as the GSoC and LLVM project admins for making this experience possible.

## Links to all PRs and RFCs

Over the course of this project, I've submitted the following PRs and RFCs:

<details>
<summary> Contributions </summary>

**Project related**

- [#156248](https://github.com/llvm/llvm-project/pull/156248) Add initial support for driver-managed module builds.
- [#155450](https://github.com/llvm/llvm-project/pull/155450) Relocate previous work due to changes in the modules driver design.
- [#149900](https://github.com/llvm/llvm-project/pull/149900) Adds scanner to detect C++20 module usage.
- [#148674](https://github.com/llvm/llvm-project/pull/148674) Fixes a lexing error in the dependency scanning tooling's scanner.
- [#148685](https://github.com/llvm/llvm-project/pull/148685) Fixes a lexing error in the dependency scanning tooling's scanner.
- [#152811](https://github.com/llvm/llvm-project/pull/152811) Allow GraphWriter specialization required for DOT graph remark.
- [#145857](https://github.com/llvm/llvm-project/pull/145857) Adds support for a test for `clang-scan-deps` on Windows.
- [#145221](https://github.com/llvm/llvm-project/pull/145221) Adds C++20 named module outputs to the scanning format `experimental-full` to enable combined scanning of both module kinds.
- [#143950](https://github.com/llvm/llvm-project/pull/143950) Implements P2223R2 for the dependency scanning tooling's scanner.
- [#142455](https://github.com/llvm/llvm-project/pull/142455) (NFC) Moves argument handling: Driver::BuildActions -> handleArguments
- [#155523](https://github.com/llvm/llvm-project/pull/155523) (NFC) Removes dead code in the dependency scanning tooling.

**Misc. contributions**

- [#145243](https://github.com/llvm/llvm-project/pull/145243) Implements P2223R2 for `clang-format`
- [#141230](https://github.com/llvm/llvm-project/pull/141695) Fixes crash related to octal floating-point literals
- [#139457](https://github.com/llvm/llvm-project/pull/139457) Fixes crash related to command line handling of source locations.

</details>

<details>
<summary> RFCs </summary>

- [RFC: Support simple C++20 modules use from the Clang driver without a build system](https://discourse.llvm.org/t/rfc-modules-support-simple-c-20-modules-use-from-the-clang-driver-without-a-build-system/86456?u=naveen-seth)
- [RFC: Link the Driver against clangDependencyScanning](https://discourse.llvm.org/t/rfc-driver-link-the-driver-against-clangdependencyscanning-clangast-clangfrontend-clangserialization-and-clanglex/87469?u=naveen-seth)
</details>

Binary file added static/img/gsoc2025-modules-driver-design.webp
67 changes: 67 additions & 0 deletions static/img/gsoc2025-modules-driver-graphviz.svg