perf: Optimize NULL handling in array_has #21471

Open
neilconway wants to merge 2 commits into apache:main from neilconway:neilc/perf-array-has-nulls

@neilconway

Which issue does this PR close?

Rationale for this change

array_has uses BooleanArray::builder to construct its results, which updates the NULL buffer incrementally, row by row. It is more efficient to build the result values with BooleanBufferBuilder and to construct the output NULL buffer separately, via NullBuffer::union or similar.
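
To illustrate the idea, here is a minimal std-only sketch. It is not the code in this PR and does not use the arrow-rs API: `Bitmap` and `has_bulk` are hypothetical stand-ins, and the NULL semantics are simplified so that a row is NULL exactly when its haystack or its needle is NULL. The point is that the value bits are written in one pass without touching validity, and the output validity is then derived in bulk with a word-at-a-time AND of the input validity bitmaps.

```rust
/// Simplified stand-in for a packed bitmap (hypothetical; not an arrow-rs type).
#[derive(Debug, Clone, PartialEq)]
struct Bitmap {
    words: Vec<u64>,
    len: usize,
}

impl Bitmap {
    /// All bits cleared.
    fn new_zeroed(len: usize) -> Self {
        Bitmap { words: vec![0; (len + 63) / 64], len }
    }

    fn from_bools(bits: &[bool]) -> Self {
        let mut b = Self::new_zeroed(bits.len());
        for (i, &bit) in bits.iter().enumerate() {
            if bit {
                b.set(i);
            }
        }
        b
    }

    fn set(&mut self, i: usize) {
        self.words[i / 64] |= 1u64 << (i % 64);
    }

    fn get(&self, i: usize) -> bool {
        (self.words[i / 64] >> (i % 64)) & 1 == 1
    }

    /// Bulk combine: a row is valid only if it is valid in both inputs.
    /// Processes 64 rows per AND instead of one row at a time.
    fn and(&self, other: &Bitmap) -> Bitmap {
        assert_eq!(self.len, other.len);
        let words = self
            .words
            .iter()
            .zip(&other.words)
            .map(|(a, b)| a & b)
            .collect();
        Bitmap { words, len: self.len }
    }
}

/// Hypothetical helper: for each row, does the haystack contain the needle?
/// Returns (values, validity). Value bits under NULL rows are unspecified.
fn has_bulk(
    haystacks: &[Vec<i64>],
    hay_valid: &Bitmap,
    needles: &[i64],
    needle_valid: &Bitmap,
) -> (Bitmap, Bitmap) {
    // Pass 1: write only the value bits; no per-row NULL bookkeeping.
    let mut values = Bitmap::new_zeroed(haystacks.len());
    for (i, (hay, needle)) in haystacks.iter().zip(needles).enumerate() {
        if hay.contains(needle) {
            values.set(i);
        }
    }
    // Pass 2: derive the output validity in bulk from the inputs.
    let validity = hay_valid.and(needle_valid);
    (values, validity)
}
```

With a row-by-row builder, every append updates both the value bits and the validity bits. Splitting the two lets the validity be computed from the inputs without visiting individual rows, which is the kind of saving the benchmarks below measure.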

Benchmarks (ARM64):

  array_has_i64 (scalar needle path)
  - found/10: 51.7 µs -> 49.2 µs, -4.7%
  - not_found/10: 43.2 µs -> 40.0 µs, -7.2%
  - found/100: 162.0 µs -> 152.3 µs, -6.3%
  - not_found/100: 143.1 µs -> 135.8 µs, -5.2%
  - found/500: 623.4 µs -> 572.5 µs, -8.4%
  - not_found/500: 593.4 µs -> 560.5 µs, -5.5%

  array_has_all (row-converter path)
  - all_found_small_needle/10: 699.4 µs -> 641.1 µs, -8.1%
  - not_all_found/10: 533.3 µs -> 470.7 µs, -9.5%
  - all_found_small_needle/100: 5.94 ms -> 4.49 ms, -24.4%
  - not_all_found/100: 4.60 ms -> 3.83 ms, -16.7%
  - all_found_small_needle/500: 34.6 ms -> 31.7 ms, -8.3%
  - not_all_found/500: 30.4 ms -> 28.4 ms, -6.7%

  array_has_any (row-converter path)
  - some_match/10: 619.7 µs -> 563.2 µs, -9.5%
  - no_match/10: 1.11 ms -> 1.04 ms, -0.1%
  - scalar_some_match/10: 467.4 µs -> 444.6 µs, -4.2%
  - scalar_no_match/10: 1.15 ms -> 994.5 µs, -14.5%
  - some_match/100: 4.98 ms -> 4.35 ms, -12.6%
  - no_match/100: 10.47 ms -> 8.90 ms, -15.0%
  - scalar_some_match/100: 4.63 ms -> 4.19 ms, -9.4%
  - scalar_no_match/100: 10.40 ms -> 9.98 ms, -4.1%
  - some_match/500: 31.8 ms -> 28.9 ms, -9.1%
  - no_match/500: 66.1 ms -> 58.3 ms, -11.8%
  - scalar_some_match/500: 25.6 ms -> 23.5 ms, -8.3%
  - scalar_no_match/500: 58.8 ms -> 54.1 ms, -8.0%

  array_has_strings (scalar needle path)
  - found/10: 396.3 µs -> 364.7 µs, -8.5%
  - not_found/10: 69.9 µs -> 67.5 µs, -3.6%
  - found/100: 1.34 ms -> 1.25 ms, -7.0%
  - not_found/100: 2.53 ms -> 2.37 ms, -6.1%
  - found/500: 3.36 ms -> 3.11 ms, -7.5%
  - not_found/500: 8.59 ms -> 8.12 ms, -5.5%

  array_has_all_strings (string fast path)
  - all_found/10: 1.08 ms -> 1.01 ms, -7.7%
  - not_all_found/10: 659.0 µs -> 632.5 µs, -3.9%
  - all_found/100: 5.50 ms -> 5.24 ms, -4.6%
  - not_all_found/100: 4.77 ms -> 4.58 ms, -4.0%
  - all_found/500: 27.4 ms -> 26.2 ms, -4.6%
  - not_all_found/500: 30.0 ms -> 28.8 ms, -3.9%

  array_has_any_strings (string fast path)
  - some_match/10: 946.5 µs -> 872.9 µs, -6.7%
  - no_match/10: 1.20 ms -> 1.12 ms, -6.0%
  - scalar_some_match/10: 420.4 µs -> 375.7 µs, -8.7%
  - scalar_no_match/10: 344.0 µs -> 309.5 µs, -10.1%
  - some_match/100: 4.93 ms -> 4.76 ms, -3.4%
  - no_match/100: 8.76 ms -> 8.50 ms, -2.9%
  - scalar_some_match/100: 1.49 ms -> 1.40 ms, -6.0%
  - scalar_no_match/100: 2.94 ms -> 2.72 ms, -7.8%
  - some_match/500: 24.0 ms -> 23.4 ms, -2.7%
  - no_match/500: 57.0 ms -> 55.8 ms, -2.1%
  - scalar_some_match/500: 5.45 ms -> 5.07 ms, -7.0%
  - scalar_no_match/500: 34.5 ms -> 32.9 ms, -4.5%

  array_has_any_scalar (varying scalar size)
  - i64_no_match/1: 173.7 µs -> 162.6 µs, -4.3%
  - i64_no_match/10: 139.8 µs -> 125.7 µs, -9.5%
  - i64_no_match/100: 264.8 µs -> 253.7 µs, -4.6%
  - i64_no_match/1000: 155.4 µs -> 142.4 µs, -8.0%
  - string_no_match/1: 125.1 µs -> 107.5 µs, -14.3%
  - string_no_match/10: 164.8 µs -> 147.0 µs, -10.1%
  - string_no_match/100: 257.8 µs -> 243.4 µs, -5.8%
  - string_no_match/1000: 180.9 µs -> 164.5 µs, -8.9%

What changes are included in this PR?

  • Implement the optimization described above, plus minor code cleanup

Are these changes tested?

Yes.

Are there any user-facing changes?

No.

github-actions bot added the functions (Changes to functions implementation) label on Apr 8, 2026

Development

Successfully merging this pull request may close these issues.

Optimize NULL handling for array_has

1 participant