Language Bindings#
Vortex provides bindings for multiple languages at varying levels of API depth. This page documents the tier model that governs what each language binding exposes, the current state of each binding, and the API surface available at each tier.
Tier Model#
Each language binding targets one of four tiers. Higher tiers are strict supersets of lower tiers.
Tier 0: Arrow I/O#
Read and write Arrow record batches to and from Vortex files. This is the minimum viable integration point — any language with Arrow C Data Interface support can reach this tier.
Capabilities: open files, write files, import/export Arrow streams.
Tier 1: Scan API#
Filter and projection pushdown via expressions. Expressions can be serialized as protobuf bytes or constructed natively in the host language. Results are still returned as Arrow streams, making this tier suitable for query engine integrations (e.g. DataFusion, DuckDB, Spark, Trino).
Capabilities: everything in Tier 0, plus scan builder with filter, projection, limit, and row range pushdown, and expression construction.
Tier 2: Native Arrays#
Return Vortex array streams instead of (or in addition to) Arrow. At this tier, bindings can inspect array trees (walk children, read encoding IDs and metadata), execute compute operations over Vortex arrays, and export results to Arrow. This allows direct access to compressed representations without requiring an upfront conversion to Arrow.
Capabilities: everything in Tier 1, plus Vortex array stream consumption, array tree inspection, compute execution, and Arrow export.
Tier 3: Plugins#
Define custom encodings, compute functions, layouts, and extension DTypes as plugins and register them into a Session. This is full extensibility — the host language participates in Vortex’s encoding and compute ecosystem.
Capabilities: everything in Tier 2, plus registration of array plugins, compute plugins, layout plugins, and extension DTypes.
Per-Tier API Surface#
Capability |
Tier |
Description |
|---|---|---|
Open a Vortex file |
0 |
Open a file from a path or byte source |
Write a Vortex file |
0 |
Write Arrow data into a Vortex file |
Export to Arrow stream |
0 |
Read an entire file as Arrow record batches |
Import from Arrow stream |
0 |
Write Arrow record batches into a file |
Scan with expressions |
1 |
Build a scan with filter, projection, limit, and row range |
Construct expressions |
1 |
Build filter/projection expressions (serialized or native) |
Consume scan results as Arrow |
1 |
Execute a scan and receive Arrow record batches |
Consume Vortex array stream |
2 |
Receive Vortex arrays instead of Arrow from a scan |
Inspect array trees |
2 |
Walk array children, read encoding IDs and metadata |
Execute compute over arrays |
2 |
Run compute functions (e.g. filter, take, cast) on Vortex arrays |
Export arrays to Arrow |
2 |
Convert Vortex arrays to Arrow on demand |
Access scalars |
2 |
Read individual scalar values from arrays |
Register array plugin |
3 |
Define a custom encoding with its own array vtable |
Register compute plugin |
3 |
Define custom compute functions |
Register layout plugin |
3 |
Define a custom file layout |
Register extension DType |
3 |
Define a custom logical type |
Per-Language Status#
Language |
Current Tier |
Target Tier |
Technology |
Notes |
|---|---|---|---|---|
Rust |
3 |
3 |
Native |
Future: stable plugin API via C ABI |
Python |
~2 |
3 |
PyO3 |
Already has native expressions + array access |
C |
~1 |
2 |
cbindgen |
Foundation ABI for all non-Rust bindings |
C++ |
~1 |
2 |
cxx -> C |
Migrate from cxx to wrap C API |
Java (JNI) |
~1 |
1 |
JNI |
Broad JDK compatibility, Arrow-based |
Java (Panama) |
— |
2 |
Panama FFI |
Direct C ABI access, requires JDK 22+ |
Rust#
Rust is the native implementation language and has full Tier 3 access. All array plugins, compute plugins, layouts, and extension DTypes are defined in Rust. Future work may introduce a stable plugin ABI for dynamically loading encoding crates.
Python#
Python bindings are implemented via PyO3. They already provide native expression construction and array access, placing them near Tier 2. The path to Tier 3 involves formalizing which APIs are stable vs experimental and exposing plugin registration.
C#
The C API (generated via cbindgen) is the foundation ABI for non-Rust bindings. It currently provides Tier ~1 capabilities (file I/O and basic scan). The C API is not yet ABI-stable — it evolves with the project and should be statically linked. In the future, a subset of the API will be flagged as stable for use via dynamic linking. The target is Tier 2, which requires array tree inspection, compute execution, and stabilized error handling and memory ownership conventions.
C++#
C++ bindings currently use cxx for Rust interop. The plan is to migrate to wrapping the C API directly, providing RAII wrappers and CMake integration. Target is Tier 2.
Java (JNI)#
Java JNI bindings provide Tier ~1 capabilities today (Arrow I/O and basic scan). JNI will remain at Tier 1 for broad JDK compatibility. This is the current integration point for Spark and Trino connectors.
Java (Panama)#
A new binding layer using Java’s Foreign Function & Memory API (Panama) to call the C API directly. Panama enables native array access without JNI overhead, targeting Tier 2. Requires JDK 22+. Trino already supports JDK 22 and can adopt Panama immediately. Spark targets older LTS releases and will not support Panama for some time, so the JNI path remains essential for Spark integration.