39 commits
3d71860
WIP
danieldk May 12, 2026
2440fae
Packages build
danieldk May 12, 2026
e77b0db
Fix linker scripts
danieldk May 12, 2026
ece6926
Copy include directories
danieldk May 12, 2026
459842d
Working toolchain that can compile simple hello world
danieldk May 12, 2026
1c1e276
More linker scripts
danieldk May 12, 2026
0cf5300
bintools -> full toolchain
danieldk May 12, 2026
c96a01c
Experiment with manylinux stdenv
danieldk May 12, 2026
f9b1822
Do not link libstdc++ statically
danieldk May 12, 2026
864def4
Improve compiler wrapping
danieldk May 12, 2026
4e64c56
wrapping improvements
danieldk May 12, 2026
312a037
Make gcc-unwrapped resemble Nix gcc better
danieldk May 13, 2026
68ab50e
Remove compiler bintools symlinks to force use of wrapper
danieldk May 13, 2026
18f1fcc
Vendor glibc 2.28 derivation
danieldk May 13, 2026
fd58d29
Remove initial bits
danieldk May 13, 2026
0f3a084
Make XPU kernels build
danieldk May 13, 2026
a8ae29e
gcc-13-unwrapped
danieldk May 13, 2026
bdec3de
nix fmt
danieldk May 13, 2026
654875b
Generate stdenvs (adding gcc 13)
danieldk May 14, 2026
976efce
Ensure the right stdenv gets used in CUDA build environments
danieldk May 14, 2026
d076c74
Remove static build vestiges, set C++ version correctly for Torch ker…
danieldk May 14, 2026
1c15aca
Add example/test kernel for newer symbols
danieldk May 14, 2026
a333afd
Test C++20 symbols
danieldk May 14, 2026
e6600d3
AlmaLinux script: add version and arch arguments
danieldk May 15, 2026
fea735f
Add linux-aarch64 support
danieldk May 15, 2026
fb3618d
Add meta for manylinux generic builder
danieldk May 15, 2026
faf8e70
Cache manylinux_2_28 stdenvs
danieldk May 15, 2026
f1b8bc7
Cleanups
danieldk May 15, 2026
b01069f
Better names for the kernel to test recent symbols
danieldk May 15, 2026
b25e990
Remove old glibc 2.27 bits
danieldk May 15, 2026
f77f65e
Experiment with glibc with Red Hat patches
danieldk May 15, 2026
bb24364
Revert "Experiment with glibc with Red Hat patches"
danieldk May 15, 2026
bbe89a7
Hack for sycl-tla, need to fix by shuffling things around
danieldk May 15, 2026
a46eaa9
Cleaner
danieldk May 15, 2026
8e189b3
Fix test name
danieldk May 15, 2026
cba1f0d
Try to fix build issues
danieldk May 16, 2026
efaeafa
Bump up Torch version of the test container to 2.11.0
danieldk May 16, 2026
32992e5
manylinux_2_28 design docs
danieldk May 17, 2026
fb19800
Remove glibc 2.27 dependency
danieldk May 17, 2026
1 change: 1 addition & 0 deletions .github/workflows/build_kernel.yaml
@@ -50,6 +50,7 @@ jobs:
name: built-kernels-${{ matrix.arch }}
path: |
activation-kernel
cpp20-symbols-kernel
cutlass-gemm-kernel
cutlass-gemm-tvm-ffi-kernel
extra-data
4 changes: 4 additions & 0 deletions docs/source/_toctree.yml
@@ -66,3 +66,7 @@
- local: builder-cli
title: Builder CLI Reference
title: CLI Reference
- sections:
- local: builder/design-nix-builder
title: Nix Builder
title: Design
106 changes: 106 additions & 0 deletions docs/source/builder/design-nix-builder.md
@@ -0,0 +1,106 @@
# Nix Builder design

## Introduction

kernel-builder uses a Nix-based builder that orchestrates the build. The Nix
builder provides:

- Reproducible evaluation. The same Nix builder version will always produce
the same derivations (build recipes).
- Largely reproducible builds by using a build sandbox that only has the
dependencies specified in a derivation.
- Seamless creation of different build environments (e.g. different Torch
and CUDA combinations).

## Kernel build steps

A kernel derivation builds a kernel in the following steps:

1. Generate CMake files for the kernel using
   `kernel-builder create-pyproject`.
2. Generate Ninja build files using CMake.
3. Build the kernel using Ninja.
4. Perform various checks on the compiled kernel, such as:
   - Verify that the kernel only uses ABI3/`manylinux_2_28` symbols.
   - Verify that the kernel can be loaded by the `kernels` Python package.
5. Strip runpaths (ELF-embedded library directories) from kernel binaries
   to make the kernel distribution-independent.
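
The symbol check in step 4 can be pictured with a small sketch. This is not
the builder's actual implementation. The function name
`symbols_above_ceiling`, the input format (strings of the form
`name@VERSION`) and the ceiling values (glibc 2.28 from the `manylinux_2_28`
name, and `GLIBCXX_3.4.25` as an assumed system libstdc++ ceiling for
AlmaLinux 8) are all assumptions made for illustration.

```python
import re

# Assumed ceilings, for illustration only.
CEILINGS = {"GLIBC": (2, 28), "GLIBCXX": (3, 4, 25)}

# Matches a trailing version tag such as "@GLIBC_2.14" or "@@GLIBCXX_3.4.29".
_VERSION_TAG = re.compile(r"@@?(GLIBCXX|GLIBC)_([0-9]+(?:\.[0-9]+)*)$")


def symbols_above_ceiling(symbols, ceilings=CEILINGS):
    """Return the symbols whose version tag is newer than the ceiling."""
    offenders = []
    for symbol in symbols:
        match = _VERSION_TAG.search(symbol)
        if match is None:
            # Unversioned symbols and other version families are ignored.
            continue
        family = match.group(1)
        version = tuple(int(part) for part in match.group(2).split("."))
        if version > ceilings[family]:
            offenders.append(symbol)
    return offenders


# Hypothetical input: dynamic symbols referenced by a kernel binary.
example = [
    "memcpy@GLIBC_2.14",
    "fstat@GLIBC_2.33",
    "malloc@GLIBC_2.2.5",
    "_ZSt8to_charsPcS_d@GLIBCXX_3.4.29",
]
print(symbols_above_ceiling(example))
```

Versions are compared as integer tuples rather than as strings, so that
`2.2.5` sorts below `2.28`. A kernel passes this sketch of the check when the
returned list is empty.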

## manylinux_2_28 compatibility

To achieve `manylinux_2_28` compatibility, kernels are built using a
toolchain similar to the `manylinux_2_28` Docker images. This toolchain
is based on the gcc toolsets from AlmaLinux 8. `manylinux_2_28` [uses
AlmaLinux 8 as its base](https://github.com/pypa/manylinux#manylinux_2_28-almalinux-8-based),
so we have to compile against the same glibc/libstdc++ versions to
ensure compatibility.

We repackage the AlmaLinux 8 toolsets and libstdc++ as Nix derivations (see
the `nix-builder/packages/manylinux_2_28` source directory). Then we merge
various toolset packages into an unwrapped gcc that resembles the unwrapped
gcc in nixpkgs. Finally, we wrap binutils and gcc to combine them into a stdenv.

The stdenv does not reuse glibc from AlmaLinux, since its dynamic loader has
hardcoded FHS paths (`/lib64` etc.) that are not valid in Nix. Using this
dynamic loader results in linking errors, because it falls back to these
hardcoded paths as a last resort when linking glibc libraries. Instead, we
build our own glibc 2.28 package and use that.

## The package set pattern

We repackage various existing package sets as Nix derivations. For instance,
this is done for ROCm, XPU, and manylinux_2_28 packages. These package sets
all follow the same pattern:

```nix
{
  lib,
  callPackage,
  newScope,
  pkgs,
}:

{
  packageMetadata,
}:

let
  inherit (lib.fixedPoints) extends composeManyExtensions;

  fixedPoint = final: {
    inherit lib;
  };
  composed = lib.composeManyExtensions [
    # Base package set.
    (import ./components.nix { inherit packageMetadata; })

    # Package-specific overrides.
    (import ./overrides.nix)

    # Additional overlays that extend the package set.
    (import ./some-overlay.nix)
  ];
in
lib.makeScope newScope (lib.extends composed fixedPoint)
```

We use a fixed point to build up the package set as a list of
[overlays](https://nixos.org/manual/nixpkgs/stable/#sec-overlays-definition).
This has various benefits. For instance, it allows us to refine the
package set incrementally and we can refer to the final versions of
packages in intermediate overlays.
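
To make the fixed-point idea concrete, here is a minimal sketch in Python
rather than Nix. It only imitates the mechanism in spirit: `PackageSet`,
`extends`, `compose` and the package names are hypothetical, the values are
plain strings instead of derivations, and zero-argument functions stand in
for Nix's lazy evaluation.

```python
from functools import reduce


class PackageSet:
    """Lazily evaluated package set, standing in for a lazy Nix attribute set."""

    def __init__(self, build):
        self._cache = {}
        # `build` receives the final set itself. This is the fixed point: it
        # works because building only creates thunks and evaluates nothing yet.
        self._thunks = build(self)

    def __getitem__(self, name):
        if name not in self._cache:
            self._cache[name] = self._thunks[name]()
        return self._cache[name]


def extends(overlay, base):
    """Layer `overlay(final, prev)` on top of `base(final)`."""

    def extended(final):
        prev = base(final)
        return {**prev, **overlay(final, prev)}

    return extended


def compose(base, overlays):
    """Apply the overlays from left to right on top of the base set."""
    return reduce(lambda acc, overlay: extends(overlay, acc), overlays, base)


def base(final):
    return {"gcc-unwrapped": lambda: "gcc-unwrapped"}


def wrapper_overlay(final, prev):
    # Refers to `final`, so it sees changes made by later overlays.
    return {"gcc": lambda: f"wrapped({final['gcc-unwrapped']})"}


def overrides_overlay(final, prev):
    # Refers to `prev`, the package as defined by the earlier layers.
    return {"gcc-unwrapped": lambda: prev["gcc-unwrapped"]() + "+fixed-paths"}


pkgs = PackageSet(compose(base, [wrapper_overlay, overrides_overlay]))
print(pkgs["gcc"])  # prints wrapped(gcc-unwrapped+fixed-paths)
```

Although `wrapper_overlay` comes before `overrides_overlay` in the list, the
wrapped compiler picks up the fixed `gcc-unwrapped`, because it looks the
package up in `final` rather than in `prev`.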

The package sets all use a similar list of overlays:

- An initial overlay (`components.nix`) that applies a generic builder
to the package set metadata. The metadata typically comes from a Yum/DNF
repository that contains RPM packages. The generic builder will extract the
RPMs and move binaries, libraries, and headers to the right location. This
results in a set of Nix derivations that may or may not build.
- The next overlay (`overrides.nix`) fixes up derivations created by the
generic builder that do not build. Fixing the derivations typically consists
of adding missing dependencies and changing embedded FHS paths to Nix store
paths.
- Additional overlays with derivations that combine outputs from previous
overlays. A typical example is a derivation that constructs a full compiler
toolchain.
16 changes: 16 additions & 0 deletions examples/kernels/cpp20-symbols/build.toml
@@ -0,0 +1,16 @@
[general]
name = "cpp20-symbols"
version = 1
license = "Apache-2.0"
backends = ["cpu"]

[torch]
src = [
"torch-ext/torch_binding.cpp",
"torch-ext/torch_binding.h",
]

[kernel.cpp20_symbols_cpu]
backend = "cpu"
depends = ["torch"]
src = ["cpu/cpu.cpp"]
18 changes: 18 additions & 0 deletions examples/kernels/cpp20-symbols/cpu/cpu.cpp
@@ -0,0 +1,18 @@
#include <array>
#include <charconv>
#include <stdexcept>

#include <torch/all.h>

// std::to_chars(char*, char*, double) is a floating-point overload that
// requires GLIBCXX_3.4.29, introduced in GCC 11. We use this to verify
// that manylinux_2_28 kernels build correctly: the Red Hat toolset
// statically links the newer libstdc++ symbols that exceed the system
// GLIBCXX_3.4.25 ceiling of AlmaLinux 8 / RHEL 8.
torch::Tensor float_to_chars(torch::Tensor const &input) {
  std::array<char, 32> buf;
  // Pass raw pointers: std::to_chars takes char*, not array iterators.
  auto [ptr, ec] =
      std::to_chars(buf.data(), buf.data() + buf.size(), input.item<double>());
  if (ec != std::errc{})
    throw std::runtime_error("to_chars failed");
  return input;
}
17 changes: 17 additions & 0 deletions examples/kernels/cpp20-symbols/flake.nix
@@ -0,0 +1,17 @@
{
  description = "Flake for cpp20-symbols kernel";

  inputs = {
    kernel-builder.url = "path:../../..";
  };

  outputs =
    {
      self,
      kernel-builder,
    }:
    kernel-builder.lib.genKernelFlakeOutputs {
      inherit self;
      path = ./.;
    };
}
Empty file.
17 changes: 17 additions & 0 deletions examples/kernels/cpp20-symbols/tests/test_cpp20_symbols.py
@@ -0,0 +1,17 @@
import cpp20_symbols
import pytest
import torch


@pytest.mark.kernels_ci
def test_float_to_chars_runs():
    x = torch.tensor([3.14], dtype=torch.float64)
    out = cpp20_symbols.float_to_chars(x)
    torch.testing.assert_close(out, x)


@pytest.mark.kernels_ci
def test_float_to_chars_float32():
    x = torch.tensor([2.71828], dtype=torch.float32)
    out = cpp20_symbols.float_to_chars(x)
    torch.testing.assert_close(out, x)
10 changes: 10 additions & 0 deletions examples/kernels/cpp20-symbols/torch-ext/cpp20_symbols/__init__.py
@@ -0,0 +1,10 @@
import torch

from ._ops import ops


def float_to_chars(input: torch.Tensor) -> torch.Tensor:
    return ops.float_to_chars(input)


__all__ = ["float_to_chars"]
13 changes: 13 additions & 0 deletions examples/kernels/cpp20-symbols/torch-ext/torch_binding.cpp
@@ -0,0 +1,13 @@
#include <torch/library.h>

#include "registration.h"
#include "torch_binding.h"

TORCH_LIBRARY_EXPAND(TORCH_EXTENSION_NAME, ops) {
  ops.def("float_to_chars(Tensor input) -> Tensor");
#if defined(CPU_KERNEL)
  ops.impl("float_to_chars", torch::kCPU, &float_to_chars);
#endif
}

REGISTER_EXTENSION(TORCH_EXTENSION_NAME)
10 changes: 10 additions & 0 deletions examples/kernels/cpp20-symbols/torch-ext/torch_binding.h
@@ -0,0 +1,10 @@
#pragma once

#include <torch/torch.h>

// Uses std::to_chars for floating-point, which requires GLIBCXX_3.4.29
// (introduced in GCC 11). We use this to verify that manylinux_2_28
// kernels build correctly: the Red Hat toolset statically links the newer
// libstdc++ symbols that exceed the system GLIBCXX_3.4.25 ceiling of
// AlmaLinux 8 / RHEL 8.
torch::Tensor float_to_chars(torch::Tensor const &input);
7 changes: 6 additions & 1 deletion examples/kernels/flake.nix
@@ -15,7 +15,7 @@
inherit (kernel-builder.inputs.nixpkgs) lib;

cudaVersion = "cu126";
- torchVersion = "210";
+ torchVersion = "211";
tvmFfiVersion = "01";

# All example kernels to build in CI.
@@ -32,6 +32,11 @@
drv =
sys: out: out.packages.${sys}.redistributable.${"torch${torchVersion}-cxx11-${cudaVersion}-${sys}"};
}
{
name = "cpp20-symbols-kernel";
path = ./cpp20-symbols;
drv = sys: out: out.packages.${sys}.redistributable.${"torch${torchVersion}-cxx11-cpu-${sys}"};
}
{
name = "relu-tvm-ffi-kernel";
path = ./relu-tvm-ffi;
@@ -15,10 +15,6 @@ define_gpu_extension_target(
USE_SABI 3
WITH_SOABI)

- if(NOT (MSVC OR GPU_LANG STREQUAL "SYCL"))
-   target_link_options(${OPS_NAME} PRIVATE -static-libstdc++)
- endif()

if(GPU_LANG STREQUAL "SYCL")
target_link_options(${OPS_NAME} PRIVATE ${sycl_link_flags})
target_link_libraries(${OPS_NAME} PRIVATE dnnl)
@@ -6,10 +6,6 @@ target_compile_definitions(${OPS_NAME} PRIVATE
"-DTVM_FFI_EXTENSION_NAME=${OPS_NAME}")
tvm_ffi_configure_target(${OPS_NAME})

- if(NOT (MSVC OR GPU_LANG STREQUAL "SYCL"))
-   target_link_options(${OPS_NAME} PRIVATE -static-libstdc++)
- endif()

if(GPU_LANG STREQUAL "SYCL")
target_link_options(${OPS_NAME} PRIVATE ${sycl_link_flags})
target_link_libraries(${OPS_NAME} PRIVATE dnnl)
6 changes: 5 additions & 1 deletion kernel-builder/src/pyproject/templates/utils.cmake
@@ -612,7 +612,11 @@ function (define_gpu_extension_target GPU_MOD_NAME)
endif()
endif()

- set_property(TARGET ${GPU_MOD_NAME} PROPERTY CXX_STANDARD 20)
+ if (TORCH_VERSION VERSION_LESS 2.12.0)
+   set_property(TARGET ${GPU_MOD_NAME} PROPERTY CXX_STANDARD 17)
+ else()
+   set_property(TARGET ${GPU_MOD_NAME} PROPERTY CXX_STANDARD 20)
+ endif()

target_compile_options(${GPU_MOD_NAME} PRIVATE
$<$<COMPILE_LANGUAGE:${GPU_LANGUAGE}>:${GPU_COMPILE_FLAGS}>)
4 changes: 2 additions & 2 deletions nix-builder/lib/build.nix
@@ -73,10 +73,9 @@ rec {
);
extraDeps =
let
- inherit (import ./deps.nix { inherit lib pkgs torch; }) resolveCppDeps;
kernelDeps = lib.unique (lib.flatten (lib.mapAttrsToList (_: kernel: kernel.depends) kernels));
in
- resolveCppDeps kernelDeps;
+ extension.resolveCppDeps kernelDeps;

# Use the mkSourceSet function to get the source
src = mkSourceSet path;
@@ -366,6 +365,7 @@ rec {
++ pythonCheckInputs ps
++ [
buildSet.torch
kernels
pip
pytest
]
7 changes: 6 additions & 1 deletion nix-builder/lib/cache.nix
@@ -16,6 +16,10 @@
buildSetOutputs =
buildSet:
with buildSet.pkgs;
let
isLinux = stdenv.hostPlatform.isLinux;
cudaSupport = config.cudaSupport;
in
(
allOutputs buildSet.torch
++ lib.concatMap allOutputs buildSet.extension.extraBuildDeps
@@ -27,7 +31,8 @@
++ allOutputs python3.pkgs.kernels
++ allOutputs python3.pkgs.tvm-ffi
++ allOutputs ruff
- ++ lib.optionals stdenv.hostPlatform.isLinux (allOutputs stdenvGlibc_2_27)
+ ++ lib.optionals (isLinux && cudaSupport) (allOutputs manylinux_2_28.cudaBackendStdenv)
+ ++ lib.optionals (isLinux && !config.cudaSupport) (allOutputs manylinux_2_28.stdenv)
# Only works on recent CUDAs.
++ lib.optionals (!python3.pkgs.nvidia-cutlass-dsl.meta.broken) (
allOutputs python3.pkgs.nvidia-cutlass-dsl
7 changes: 5 additions & 2 deletions nix-builder/lib/deps.nix
@@ -1,6 +1,7 @@
{
lib,
pkgs,
stdenv,
torch,
}:

@@ -27,9 +28,11 @@ let
"torch" = [
torch
];
- "sycl_tla" = [ torch.xpuPackages.sycl-tla ];
+ "sycl_tla" = [
+   (torch.xpuPackages.sycl-tla.override { inherit stdenv; })
+ ];
"metal-cpp" = [
- pkgs.metal-cpp.dev
+ (pkgs.metal-cpp.override { inherit stdenv; }).dev
];
};
