Replaced unsafe ptr logic with chained split_at_mut in DenseMatrix an… by jackpots28 · Pull Request #371 · smartcorelib/smartcore

jackpots28 · 2026-05-20T20:42:22Z

…d DenseMatrixMutView

Fixes #368

Checklist

[ x ] My branch is up-to-date with development branch.
[ x ] Everything works and tested on latest stable Rust.
[ x ] Coverage and Linting have been applied

Current behaviour

DenseMatrix and DenseMatrixMutView use unsafe pointer arithmetic to generate their mutable iterators.

New expected behavior

DenseMatrix and DenseMatrixMutView continue to produce mutable iterators, but iterate using a "head" / "tail" approach built on split_at_mut.

No test changes required.

Change logs

Replaced unsafe pointer arithmetic in DenseMatrix / DenseMatrixMutView mutable iterators with a safe, chained split_at_mut implementation to ensure memory safety without performance loss.
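
For readers unfamiliar with the pattern, here is a minimal sketch of chained split_at_mut. It is not the PR's code: the free function `strided_refs_mut` and its parameters are hypothetical, and it assumes a column-major buffer in which each column starts `stride` elements after the previous one.

```rust
/// Hypothetical helper (not from the PR): collects mutable references to the
/// first `nrows` elements of each of `ncols` columns laid out `stride` apart.
/// Panics if the buffer is too short for the requested layout.
fn strided_refs_mut<T>(
    values: &mut [T],
    nrows: usize,
    ncols: usize,
    stride: usize,
) -> Vec<&mut T> {
    let mut refs = Vec::with_capacity(nrows * ncols);
    let mut remaining = values;
    for c in 0..ncols {
        // The last column may be shorter than `stride`, so take what is left.
        let end = if c + 1 == ncols { remaining.len() } else { stride };
        // `mem::take` moves the slice out of `remaining`, so both halves
        // keep the full lifetime of the original borrow.
        let (head, tail) = std::mem::take(&mut remaining).split_at_mut(end);
        refs.extend(head[..nrows].iter_mut());
        remaining = tail;
    }
    refs
}
```

Each split hands out a "head" that can never alias the "tail", which is what lets the borrow checker accept many live `&mut T` into one buffer without `unsafe`.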

Mec-iS · 2026-05-20T21:41:22Z

thank you!

please run cargo fmt --all -- --check and the other checks needed.

codecov · 2026-05-20T21:45:37Z

Codecov Report

❌ Patch coverage is 43.24324% with 63 lines in your changes missing coverage. Please review.
✅ Project coverage is 44.34%. Comparing base (70d8a0f) to head (5714314).
⚠️ Report is 17 commits behind head on development.

Files with missing lines	Patch %	Lines
src/linalg/basic/matrix.rs	42.72%	63 Missing ⚠️

Additional details and impacted files

@@               Coverage Diff               @@
##           development     #371      +/-   ##
===============================================
- Coverage        45.59%   44.34%   -1.26%     
===============================================
  Files               93       94       +1     
  Lines             8034     8105      +71     
===============================================
- Hits              3663     3594      -69     
- Misses            4371     4511     +140


jackpots28 · 2026-05-21T02:03:04Z

I updated the formatting and caught the issue with sort_by that clippy complained about. Test coverage is still slightly lower ("src/linalg/basic/matrix.rs: 225/364" | "59.72% coverage, 5238/8771 lines covered"). Should I add some tests for more coverage?

Mec-iS · 2026-05-21T11:15:00Z

adding more tests on the methods affected by changes is always welcome 🐳 thanks. at your convenience.

jackpots28 · 2026-05-21T20:12:50Z

Rolled through and updated matrix.rs to include 33 new cases

genefold-ai

The security goal is right and the overall direction is solid — removing unsafe pointer arithmetic is a clear win for long-term maintainability. Two runtime bugs need to be fixed before merging (see inline comments marked 🔴), and there are three medium-priority cleanup items (🟡) that would meaningfully improve performance and reduce duplication. Happy to discuss any of these — great contribution.

genefold-ai · 2026-05-22T10:24:48Z

+        // Collect all mutable references up-front using split_at_mut so
+        // that the resulting iterator owns no borrow of "self.values"
+
+        match (column_major, axis) {


🔴 [HIGH] Potential panic: col_slice[..nrows] when stride < nrows

In DenseMatrixMutView, when the view has a stride smaller than nrows (e.g. a sub-view into a larger matrix where padding or offset applies), split_at_mut(stride) will produce a col_slice shorter than nrows, causing col_slice[..nrows] to panic at runtime.

The original unsafe implementation handled this correctly via direct offset arithmetic; this safe version does not inherit that safety.

Please add an explicit guard before slicing:

assert!(
    col_slice.len() >= nrows,
    "iter_mut: stride ({stride}) < nrows ({nrows}): view layout is inconsistent"
);
for elem in col_slice[..nrows].iter_mut() {

Or alternatively, use .get_mut(..nrows) and propagate/handle the None case rather than panicking silently later.
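
A minimal sketch of the `.get_mut(..nrows)` alternative, assuming the caller is able to return a `Result`. The function name `scale_column` and the `String` error type are hypothetical and only serve the illustration:

```rust
/// Hypothetical example: scales the first `nrows` elements of one column
/// slice, reporting an error instead of panicking when the slice is short.
fn scale_column(col_slice: &mut [f64], nrows: usize, factor: f64) -> Result<(), String> {
    // Read the length first; `get_mut` holds a mutable borrow afterwards.
    let len = col_slice.len();
    // `get_mut` with a range returns `None` when the range is out of bounds.
    let rows = col_slice
        .get_mut(..nrows)
        .ok_or_else(|| format!("column slice has {len} elements but the view needs {nrows}"))?;
    for elem in rows.iter_mut() {
        *elem *= factor;
    }
    Ok(())
}
```

Whether an `assert!` or a propagated error fits better depends on whether an inconsistent view layout is treated as a programming bug or as recoverable input.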

genefold-ai · 2026-05-22T10:24:59Z

        let nrows = self.nrows;
        let ncols = self.ncols;
-        let ptr = self.values.as_mut_ptr();



🔴 [HIGH] Subtraction underflow: ncols - 1 panics (debug) or wraps (release) when ncols == 0

The expression if _c == ncols - 1 is evaluated without first checking whether ncols is zero. When ncols == 0:

In debug mode: panics with a subtraction overflow.

In release mode: wraps around to usize::MAX, causing the branch condition to never match and likely producing an incorrect or out-of-bounds slice.

This can be triggered by a zero-column DenseMatrixMutView (e.g. a view into an empty region). The same issue exists symmetrically for nrows - 1 in the row-major cases.

Please add an early guard at the top of the method:

if ncols == 0 || nrows == 0 {
    return Box::new(std::iter::empty());
}

This also simplifies all subsequent arithmetic by making zero-size cases unreachable.
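
A sketch of how such a guard might look in context, written as a hypothetical free function `guarded_iter_mut` rather than the actual method. Note that a `for _c in 0..ncols` loop runs zero times when `ncols == 0`, so whether the subtraction is reached depends on where it sits; the sketch hoists `ncols - 1` out of the loop, where the guard is what keeps it from underflowing.

```rust
/// Hypothetical stand-in for the method under review: yields the first `nrows`
/// elements of each of `ncols` columns, with columns `stride` apart.
fn guarded_iter_mut<'a, T>(
    values: &'a mut [T],
    nrows: usize,
    ncols: usize,
    stride: usize,
) -> Box<dyn Iterator<Item = &'a mut T> + 'a> {
    // Early return: the code below never sees an empty shape.
    if nrows == 0 || ncols == 0 {
        return Box::new(std::iter::empty());
    }
    // Safe because `ncols >= 1` is guaranteed by the guard above.
    let last = ncols - 1;
    let mut refs = Vec::with_capacity(nrows * ncols);
    let mut remaining = values;
    for c in 0..ncols {
        let end = if c == last { remaining.len() } else { stride };
        let (head, tail) = std::mem::take(&mut remaining).split_at_mut(end);
        refs.extend(head[..nrows].iter_mut());
        remaining = tail;
    }
    Box::new(refs.into_iter())
}
```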

genefold-ai · 2026-05-22T10:25:14Z

+                let mut indexed: Vec<(usize, &'b mut T)> = by_col
+                    .into_iter()
+                    .enumerate()
+                    .map(|(flat_col_idx, r)| {


🟡 [MEDIUM] Triple Vec allocation in Case A is redundant — refs can be eliminated

In Case A (column-major, row-by-row) the code allocates by_col, then indexed, then a final refs Vec that is immediately consumed by into_iter():

let mut refs: Vec<&'b mut T> = Vec::with_capacity(total);
refs.extend(indexed.into_iter().map(|(_, r)| r));
Box::new(refs.into_iter())

The final refs is a pure copy of the indexed values with keys stripped. It can be removed entirely:

Box::new(indexed.into_iter().map(|(_, r)| r))

This saves one Vec allocation and one full iteration pass per iter_mut call. The same pattern appears in Case D of this method and in both Cases A and D of DenseMatrix::iterator_mut — four sites in total should be cleaned up.
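
As a self-contained illustration of that cleanup, the sketch below sorts the keyed references and strips the keys lazily. The function name `into_ordered_iter` is hypothetical; the sort call mirrors the one quoted elsewhere in this review.

```rust
/// Hypothetical sketch: `indexed` pairs each reference with its target
/// position. After sorting, the keys are dropped lazily by `map` instead of
/// being copied into a second Vec.
fn into_ordered_iter<'a, T>(
    mut indexed: Vec<(usize, &'a mut T)>,
) -> Box<dyn Iterator<Item = &'a mut T> + 'a> {
    indexed.sort_unstable_by_key(|(idx, _)| *idx);
    Box::new(indexed.into_iter().map(|(_, r)| r))
}
```

The returned iterator owns `indexed`, so the references stay valid for as long as the caller keeps iterating.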

genefold-ai · 2026-05-22T10:25:30Z

@@ -143,81 +143,135 @@ impl<'a, T: Debug + Display + Copy + Sized> DenseMatrixMutView<'a, T> {
    }


🟡 [MEDIUM] Duplicated split_at_mut chunking logic across cases — extract a private helper

Cases B and C in DenseMatrixMutView::iter_mut, and their analogues in DenseMatrix::iterator_mut, share nearly identical code for iterating over a strided slice with split_at_mut:

let mut remaining: &'b mut [T] = self.values;
for _c in 0..ncols {
    let col_end = if _c == ncols - 1 { remaining.len() } else { stride };
    let (col_slice, tail) = remaining.split_at_mut(col_end);
    for elem in col_slice[..nrows].iter_mut() {
        refs.push(elem);
    }
    remaining = tail;
}

This 9-line block is repeated (with minor parameter changes) four times across the two impls. Consider extracting a private helper:

/// Collects mutable references to the first `take` elements of each
/// chunk of size `chunk_size` from `slice`, advancing through the tail.
fn collect_strided_mut<'a, T>(
    slice: &'a mut [T],
    chunks: usize,
    chunk_size: usize,
    take: usize,
) -> Vec<&'a mut T> {
    let mut refs = Vec::with_capacity(chunks * take);
    let mut remaining = slice;
    for i in 0..chunks {
        let end = if i == chunks - 1 { remaining.len() } else { chunk_size };
        let (head, tail) = remaining.split_at_mut(end);
        refs.extend(head[..take].iter_mut());
        remaining = tail;
    }
    refs
}

This reduces duplication, makes the zero-size guard easier to add in one place, and makes the bounds check (head.len() >= take) easier to enforce uniformly.

genefold-ai · 2026-05-22T10:25:44Z

+
+            // Case D: row-major, col-by-col
+            (false, _) => {
+                let total = nrows * ncols;


🟡 [MEDIUM] O(n log n) sort for cross-axis traversal — consider direct index computation

Cases A and D both use this pattern to reorder elements for cross-axis iteration:

let mut indexed: Vec<(usize, &'b mut T)> = by_col
    .into_iter()
    .enumerate()
    .map(|(flat_col_idx, elem)| {
        let c = flat_col_idx / nrows;
        let r = flat_col_idx % nrows;
        (r * ncols + c, elem)
    })
    .collect();
indexed.sort_unstable_by_key(|(idx, _)| *idx);

This performs an O(n log n) sort on n = nrows * ncols elements just to reorder them into row-major (or col-major) access order. For large matrices this is a meaningful overhead compared to the original O(n) pointer arithmetic.

For DenseMatrix (contiguous, non-strided allocation), the same reordering can be done directly using chunks_mut with a transposition index formula, which is O(n):

// Case A: column-major, iterate row-by-row (no sort needed)
// Conceptually: output[r * ncols + c] = col_major_data[c * nrows + r]
// This is just a transpose read-order, achievable with two nested iterators.

If a sort is kept for correctness in the strided DenseMatrixMutView case, please add a comment explaining why direct index computation is not viable there.
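
One possible way to get the transposed order in O(n) without a sort, sketched under assumptions: keep one `IterMut` per column and visit the columns round-robin. The type `RowOrderIterMut` is hypothetical and not part of smartcore; it assumes column-major storage, `stride > 0`, and at least `nrows` elements in every column. Because it slices each chunk to `nrows`, the same idea may also cover the strided view case.

```rust
use std::slice::IterMut;

/// Hypothetical iterator (not from the PR): walks a column-major buffer
/// row by row by holding one `IterMut` per column.
struct RowOrderIterMut<'a, T> {
    cols: Vec<IterMut<'a, T>>,
    next_col: usize,
}

impl<'a, T> RowOrderIterMut<'a, T> {
    /// Panics if `stride == 0` or a column has fewer than `nrows` elements.
    fn new(values: &'a mut [T], nrows: usize, ncols: usize, stride: usize) -> Self {
        let cols = values
            .chunks_mut(stride)
            .take(ncols)
            .map(|col| col[..nrows].iter_mut())
            .collect();
        RowOrderIterMut { cols, next_col: 0 }
    }
}

impl<'a, T> Iterator for RowOrderIterMut<'a, T> {
    type Item = &'a mut T;

    fn next(&mut self) -> Option<Self::Item> {
        if self.cols.is_empty() {
            return None;
        }
        // Take one element from the current column, then move to the next one.
        let item = self.cols[self.next_col].next();
        self.next_col = (self.next_col + 1) % self.cols.len();
        item
    }
}
```

This allocates one small iterator per column rather than one reference per element, and it yields elements on demand.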

Mec-iS · 2026-05-22T10:27:34Z

@jackpots28 please see the automated code-review by @genefold-ai and feel free to decide which patches to apply

Mec-iS · 2026-05-22T12:06:27Z

One more issue highlighted by another automated code review (may overlap with the previous ones):

🔴 [HIGH] Bug 1: Subtraction underflow when ncols == 0 or nrows == 0

File: src/linalg/basic/matrix.rs — both DenseMatrixMutView::iter_mut and DenseMatrix::iterator_mut
The expression _c == ncols - 1 (and symmetrically _r == nrows - 1 in the row-major cases) is evaluated without
checking whether ncols or nrows is zero:

for _c in 0..ncols {
    let col_end = if _c == ncols - 1 {  // panics when ncols == 0
        remaining.len()
    } else {
        stride
    };

When ncols == 0:
• Debug builds: Panics with attempt to subtract with overflow.
• Release builds: Wraps to usize::MAX, the condition never matches, and split_at_mut(usize::MAX) panics or produces
garbage.

This was correctly pointed out by the genefold-ai review. Add an early guard:

if ncols == 0 || nrows == 0 {
    return Box::new(std::iter::empty());
}

About the tests:
| 🟢 Low | Missing edge case tests | Add tests for empty/1×N/N×1 cases |

Also I am trying https://docs.rs/cargo-crap/latest/cargo_crap/ to analyse complexity/risk patterns.

Mec-iS · 2026-05-22T12:27:18Z

Not too bad

$ cargo crap --workspace --lcov lcov.info --summary
warning: 13 source files had no matching entry in the LCOV report — verify your --lcov path or coverage tool configuration:
/smartcore/src/dataset/boston.rs
/smartcore/src/dataset/breast_cancer.rs
/smartcore/src/dataset/diabetes.rs
/smartcore/src/dataset/digits.rs
/smartcore/src/dataset/generator.rs
/smartcore/src/dataset/iris.rs
/smartcore/src/dataset/mod.rs
/smartcore/src/linalg/ndarray/matrix.rs
/smartcore/src/linalg/ndarray/vector.rs
/smartcore/src/model_selection/hyper_tuning/grid_search.rs
/smartcore/src/readers/csv.rs
/smartcore/src/readers/error.rs
/smartcore/src/readers/io_testing.rs
Per-crate summary:
┌───────────┬───────────┬────────┐
│ Crate     ┆ Functions ┆ Crappy │
╞═══════════╪═══════════╪════════╡
│ smartcore ┆       917 ┆     16 │
└───────────┴───────────┴────────┘
✗ Analyzed: 917 · Crappy: 16 (threshold 30) · Worst: CategoricalNBDistribution::eq (CRAP 182.0)

Mec-iS · 2026-05-22T12:33:43Z

this is a problem, can't merge with this as it breaks lazy evaluation:

1. Significant Performance Regression — Eager Collection

File: src/linalg/basic/matrix.rs
Lines: Both iter_mut implementations
The original code returned lazy iterators using flat_map and pointer arithmetic. The new implementation eagerly
collects all mutable references into a Vec before returning an iterator:

// New implementation pattern
for i in 0..cols {
    // ... split_at_mut logic ...
    refs.extend(col_slice.iter_mut());  // Collects all refs upfront
}
Box::new(refs.into_iter())  // Returns iterator over collected Vec

Problems:
• Memory overhead: Allocates a full Vec of nrows * ncols mutable references even for simple iteration
• Latency: Must traverse entire matrix before yielding first element
• Breaks iterator composability: Callers expecting lazy evaluation (e.g., .take(5).collect()) now pay full cost

Recommendation: Use flat_map over chunks_mut to maintain lazy evaluation:

fn iter_mut<'b>(&'b mut self, axis: u8) -> Box<dyn Iterator<Item = &'b mut T> + 'b> {
    // Copy the dimensions out so the closure does not need to borrow self.
    let (nrows, ncols) = (self.nrows, self.ncols);
    if axis == 0 {
        Box::new(
            self.values
                .chunks_mut(self.stride)
                .take(ncols)
                .flat_map(move |col| col[..nrows].iter_mut()),
        )
    } else {
        // Similar for axis == 1
        todo!()
    }
}
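
To make the recommendation above easy to try outside the crate, here is a hedged free-function sketch. The name `lazy_iter_mut` is hypothetical, and it assumes column-major storage with `stride > 0` (`chunks_mut` panics on a chunk size of 0) and `stride >= nrows`.

```rust
/// Hypothetical free-function version of the recommendation: lazily yields the
/// first `nrows` elements of each of `ncols` columns spaced `stride` apart.
fn lazy_iter_mut<'a, T>(
    values: &'a mut [T],
    nrows: usize,
    ncols: usize,
    stride: usize,
) -> Box<dyn Iterator<Item = &'a mut T> + 'a> {
    Box::new(
        values
            // Each chunk is one column plus any padding up to `stride`.
            .chunks_mut(stride)
            .take(ncols)
            // Only the first `nrows` elements of a chunk belong to the view.
            .flat_map(move |col| col[..nrows].iter_mut()),
    )
}
```

Nothing is collected up front here, so a caller that chains `.take(3)` only touches the elements it actually consumes.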

also code duplication:

3. Code Duplication Across 4 Cases

File: src/linalg/basic/matrix.rs
Lines: DenseMatrixMutView::iter_mut and DenseMatrix::iterator_mut
Both methods have nearly identical logic for handling:
• Column-major vs row-major storage
• Axis 0 (row-wise) vs axis 1 (column-wise) iteration
• split_at_mut chunking with special case for last column

This results in ~200 lines of duplicated logic.
Recommendation: Extract a shared helper function:

/// Build mutable iterator over matrix elements in specified axis
fn build_mut_iter<'b, T>(
    values: &'b mut [T],
    nrows: usize,
    ncols: usize,
    stride: usize,
    column_major: bool,
    axis: u8,
) -> Box<dyn Iterator<Item = &'b mut T> + 'b> {
    // Shared implementation
}

jackpots28 · 2026-05-22T13:38:17Z

Yeah, I'm working on the deduplication via a single collector that can be used across the four instances instead. I also got curious and started testing with some bigger matrices, and found the degradation you described, since it does the full collect. I'll work through it!

Replaced unsafe ptr logic with chained split_at_mut in DenseMatrix an…

64b90c5

…d DenseMatrixMutView

jackpots28 requested a review from Mec-iS as a code owner May 20, 2026 20:42

jackpots28 added 2 commits May 20, 2026 20:43

Fixed formatting issue with a single-line vec

2cdc24d

Clippy didn't like sort_by; Update to use sort_by_key in preproc: ser…

87ba688

…ies_encoder

Added 13 test containing 33 new cases for DenseMatrix / DenseMatrixMu…

2aff9da

…tView - src/linalg/basic/matrix.rs: 251/364 +7.14% tarpaulin check

Failed to fix fmt before pushing

5714314

genefold-ai suggested changes May 22, 2026
