
Pull requests: NVIDIA/TransformerEngine

[JAX] Use avg m,n,k heuristics for Grouped GEMM
#2840 opened Apr 6, 2026 by jberchtold-nvidia
8 of 13 tasks
comm_gemm_test fixes
#2839 opened Apr 6, 2026 by almogsegal
13 tasks
Add grouped unswizzle functionality for MXFP8 scaling factors
#2837 opened Apr 5, 2026 by int-smart
8 of 13 tasks
Fix JAX extension build with NVTE_UB_WITH_MPI=1
#2835 opened Apr 4, 2026 by GaetanLepage
2 of 13 tasks
fix CUDA architectures cmake logic (label: community-contribution)
#2832 opened Apr 3, 2026 by GaetanLepage
2 of 13 tasks
Port softmax ops to libtorch stable ABI
#2830 opened Apr 3, 2026 by pstjohn
Cp thd swa with ag
#2829 opened Apr 3, 2026 by sudhakarsingh27 Draft
13 tasks
[Common] Reduced padding kernel compilation time
#2827 opened Apr 2, 2026 by Oleg-Goncharov
5 of 13 tasks
fix(CP, MLA): CP works fine with MLA in a2a cp_comm_type (label: community-contribution)
#2826 opened Apr 2, 2026 by zhujian19891203
5 of 13 tasks
[Common] Fix fused router for large top-K and expert counts
#2821 opened Apr 1, 2026 by harryzhou2000
7 of 13 tasks
[Pytorch][Common] Hybrid quantization
#2817 opened Mar 31, 2026 by negvet
1 of 13 tasks
Streamline group Hadamard ComputeKernel loads
#2810 opened Mar 29, 2026 by cael-ling
5 of 13 tasks
Single __syncthreads per stage in GroupHadamardAmaxTmaKernel
#2809 opened Mar 29, 2026 by cael-ling
8 of 13 tasks
Precomputed swizzle_idx into group Hadamard ComputeKernel
#2808 opened Mar 29, 2026 by cael-ling
8 of 13 tasks
[PyTorch][Flash Attn] Add fallback import for FA3
#2806 opened Mar 26, 2026 by eattia-nvidia
7 of 13 tasks
[PyT] Fix FSDP2 memory leaks for FP8 weight workspaces and transpose caches
#2805 opened Mar 26, 2026 by pstjohn
3 tasks done