Conversation
There was a problem hiding this comment.
Pull request overview
This pull request adds GPU-accelerated GICP (Generalized Iterative Closest Point) matching using KdTree on CUDA. The implementation provides a GPU-based alternative to the existing CPU-based GICP factor, allowing for faster point cloud registration on systems with CUDA support.
Changes:
- Introduces
KdTreeGPUclass for GPU-accelerated nearest neighbor search using KdTree data structure - Adds
IntegratedGICPFactorGPUfactor class for GPU-accelerated GICP matching cost computation - Implements CUDA kernels for KdTree search, correspondence computation, and GICP derivatives linearization
- Adds comprehensive test coverage for both KdTree GPU functionality and GICP GPU factor integration
- Updates demo and CI configuration to support the new GPU GICP factor
Reviewed changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
include/gtsam_points/ann/kdtree_gpu.hpp |
Header defining KdTreeGPU class with GPU nearest neighbor search interface |
include/gtsam_points/factors/integrated_gicp_factor_gpu.hpp |
Header defining IntegratedGICPFactorGPU class for GPU-accelerated GICP matching |
include/gtsam_points/factors/integrated_gicp_derivatives.cuh |
CUDA header for GICP derivatives computation utilities |
include/gtsam_points/cuda/kernels/kdtree.cuh |
CUDA kernel for KdTree nearest neighbor search on GPU |
include/gtsam_points/cuda/kernels/gicp_derivatives.cuh |
CUDA kernels for computing GICP derivatives and error |
include/gtsam_points/cuda/kernels/correspondence.hpp |
Header defining Correspondence structure for point matching |
src/gtsam_points/ann/kdtree_gpu.cpp |
CPU-side implementation of KdTreeGPU (tree construction, memory management) |
src/gtsam_points/ann/kdtree_gpu.cu |
GPU-side implementation of KdTree nearest neighbor search kernel |
src/gtsam_points/factors/integrated_gicp_factor_gpu.cpp |
Implementation of IntegratedGICPFactorGPU factor methods |
src/gtsam_points/factors/integrated_gicp_derivatives.cu |
Base implementation of GICP derivatives computation |
src/gtsam_points/factors/integrated_gicp_derivatives_inliers.cu |
Inlier correspondence management for GICP |
src/gtsam_points/factors/integrated_gicp_derivatives_linearize.cu |
GICP factor linearization on GPU |
src/gtsam_points/factors/integrated_gicp_derivatives_compute.cu |
GICP error computation on GPU |
src/test/test_kdtree.cpp |
Adds KdTreeGPU unit tests for nearest neighbor search accuracy |
src/test/test_matching_cost_factors.cpp |
Adds GICP_CUDA to integration tests for factor optimization |
src/demo/demo_matching_cost_factors.cpp |
Adds GICP_GPU option to matching cost factors demo |
CMakeLists.txt |
Adds new CUDA source files to build configuration |
.github/workflows/build.yml |
Removes CUDA 12.2 build, keeping 12.5 and 13.1 |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| * @param targfixed_target_pose Fixed target pose | ||
| * @param source_key Source key | ||
| * @param target Target kdtree | ||
| * @param source Source frame |
There was a problem hiding this comment.
Parameter documentation is missing for "target_tree" in the unary constructor. The binary constructor on line 37 documents it, but it's missing here. This inconsistency in documentation should be fixed for clarity.
| * @param targfixed_target_pose Fixed target pose | |
| * @param source_key Source key | |
| * @param target Target kdtree | |
| * @param source Source frame | |
| * @param fixed_target_pose Fixed target pose | |
| * @param source_key Source key | |
| * @param target Target frame | |
| * @param source Source frame | |
| * @param target_tree Target kdtree |
| check_error << cudaMemcpyAsync(nodes, h_nodes.data(), sizeof(KdTreeNodeGPU) * num_nodes, cudaMemcpyHostToDevice, stream); | ||
| } | ||
|
|
||
| KdTreeGPU::~KdTreeGPU() { |
There was a problem hiding this comment.
Inconsistent stream parameter in cudaFreeAsync calls. In the destructor, cudaFreeAsync is called with nullptr for the stream parameter, but these allocations were done with a specific stream in the constructor (line 52-55). The stream should match the allocation stream or be synchronized before freeing. Using nullptr may cause undefined behavior if the allocation stream is still using the memory.
| KdTreeGPU::~KdTreeGPU() { | |
| KdTreeGPU::~KdTreeGPU() { | |
| // Ensure all GPU work that might use 'indices' or 'nodes' has completed | |
| check_error << cudaDeviceSynchronize(); |
| if (stack_size > MAX_STACK_SIZE - 2) { | ||
| printf("kdtree stack overflow!!"); | ||
| } else if (cut_sq_dist < result.second) { | ||
| search_stack[stack_size].first = other_child; | ||
| search_stack[stack_size++].second = cut_sq_dist; | ||
| } | ||
|
|
||
| search_stack[stack_size].first = best_child; | ||
| search_stack[stack_size++].second = 0.0f; |
There was a problem hiding this comment.
Stack overflow handling is incomplete. When stack_size exceeds MAX_STACK_SIZE - 2, a warning is printed but execution continues, potentially writing beyond array bounds on lines 59-60. After printing the warning, the function should either return early with an error result or skip adding to the stack to prevent buffer overflow.
|
|
||
| /** | ||
| * @brief Create a unary GICP_GPU factor between a fixed target pose and an active source pose. | ||
| * @param targfixed_target_pose Fixed target pose |
There was a problem hiding this comment.
Typo in parameter name documentation. "targfixed_target_pose" should be "fixed_target_pose" to match the actual parameter name on line 60.
| * @param targfixed_target_pose Fixed target pose | |
| * @param fixed_target_pose Fixed target pose |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
No description provided.