Skip to content

Conversation

@nalimilan
Copy link
Contributor

@nalimilan nalimilan commented May 29, 2025

Invalid indices were used with OffsetArrays as 1-based indexing was assumed. Fix this, and always wrap them in a ToArrow objet so that they are consistently turned into 1-based arrays.

This allows dropping special code for CategoricalArray in favor of using the standard DataAPI.refpool API combined with the Arrow extension point added to CategoricalArrays (JuliaData/CategoricalArrays.jl#415).

If this looks good I'll backport JuliaData/CategoricalArrays.jl#415 to a minor CategoricalArrays release, as currently it's only on master (soon to become 1.0), which explains why CI fails. Another interesting option would be to move the Arrow-CategoricalArrays extension to Arrow, which would ensure support works even with older CategoricalArray versions. Let me know what you think.

@codecov-commenter
Copy link

codecov-commenter commented May 29, 2025

Codecov Report

❌ Patch coverage is 66.66667% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 4.90%. Comparing base (3712291) to head (f49dfb0).
⚠️ Report is 38 commits behind head on main.

Files with missing lines Patch % Lines
src/arraytypes/dictencoding.jl 0.00% 2 Missing ⚠️

❗ There is a different number of reports uploaded between BASE (3712291) and HEAD (f49dfb0). Click for more details.

HEAD has 27 uploads less than BASE
Flag BASE (3712291) HEAD (f49dfb0)
35 8
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #556       +/-   ##
==========================================
- Coverage   87.43%   4.90%   -82.54%     
==========================================
  Files          26      26               
  Lines        3288    3305       +17     
==========================================
- Hits         2875     162     -2713     
- Misses        413    3143     +2730     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Invalid indices were used with `OffsetArray`s as 1-based indexing was assumed.
Fix this, and always wrap them in a `ToArrow` objet so that they are consistently
turned into 1-based arrays.

This allows dropping special code for `CategoricalArray` in favor of using the
standard `DataAPI.refpool` API combined with the Arrow extension point added to
CategoricalArrays.
@quinnj quinnj merged commit 472baa0 into apache:main Jan 14, 2026
10 of 22 checks passed
@nalimilan nalimilan deleted the nl/categorical branch January 14, 2026 07:51
nalimilan added a commit to nalimilan/arrow-julia that referenced this pull request Jan 14, 2026
This should fix test failures from apache#556.
@nalimilan
Copy link
Contributor Author

Ah I had forgotten this PR, I should have bumped CategoricalArrays to 1.0 now that it's been released to avoid CI failures. See #583.

quinnj added a commit to nalimilan/arrow-julia that referenced this pull request Jan 14, 2026
ArrowTypes 2.3.0 in the registry doesn't include the offset array fix
from PR apache#556. The fix is in the local src/ArrowTypes/ but hasn't been
released yet. Update CI and release verification to use Pkg.develop
to use the local ArrowTypes instead of the registry version.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
quinnj added a commit that referenced this pull request Jan 14, 2026
This should fix test failures from #556.

---------

Co-authored-by: Jacob Quinn <quinn.jacobd@gmail.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants