typed-arrow provides a strongly typed, fully compile-time way to declare Arrow schemas in Rust. It maps Rust types directly to arrow-rs typed builders/arrays and arrow_schema::DataType
— without any runtime DataType
switching — enabling zero runtime cost, monomorphized column construction and ergonomic ORM-like APIs.
- Performance: monomorphized builders/arrays with zero dynamic dispatch; avoids runtime
DataType
matching. - Safety: column types, names, and nullability live in the type system; mismatches fail at compile time.
- Interop: uses
arrow-array
/arrow-schema
types directly; no bespoke runtime layer to learn.
use typed_arrow::{prelude::*, schema::SchemaMeta};
use typed_arrow::{Dictionary, TimestampTz, Millisecond, Utc, List};
#[derive(typed_arrow::Record)]
struct Address { city: String, zip: Option<i32> }
#[derive(typed_arrow::Record)]
struct Person {
id: i64,
address: Option<Address>,
tags: Option<List<Option<i32>>>, // List column with nullable items
code: Option<Dictionary<i32, String>>, // Dictionary<i32, Utf8>
joined: TimestampTz<Millisecond, Utc>, // Timestamp(ms) with timezone (UTC)
}
fn main() {
// Build from owned rows
let rows = vec![
Person {
id: 1,
address: Some(Address { city: "NYC".into(), zip: None }),
tags: Some(List::new(vec![Some(1), None, Some(3)])),
code: Some(Dictionary::new("gold".into())),
joined: TimestampTz::<Millisecond, Utc>::new(1_700_000_000_000),
},
Person {
id: 2,
address: None,
tags: None,
code: None,
joined: TimestampTz::<Millisecond, Utc>::new(1_700_000_100_000),
},
];
let mut b = <Person as BuildRows>::new_builders(rows.len());
b.append_rows(rows);
let arrays = b.finish();
// Compile-time schema + RecordBatch
let batch = arrays.into_record_batch();
assert_eq!(batch.schema().fields().len(), <Person as Record>::LEN);
println!("rows={}, field0={}", batch.num_rows(), batch.schema().field(0).name());
}
Add to your Cargo.toml
(derives enabled by default):
[dependencies]
typed-arrow = { version = "0.x" }
When working in this repository/workspace:
[dependencies]
typed-arrow = { path = "." }
Run the included examples to see end-to-end usage:
01_primitives
— deriveRecord
, inspectDataType
, build primitives02_lists
—List<T>
andList<Option<T>>
03_dictionary
—Dictionary<K, String>
04_timestamps
—Timestamp<U>
units04b_timestamps_tz
—TimestampTz<U, Z>
withUtc
and custom markers05_structs
— nested structs →StructArray
06_rows_flat
— row-based building for flat records07_rows_nested
— row-based building with nested struct fields08_record_batch
— compile-time schema +RecordBatch
09_duration_interval
— Duration and Interval types10_union
— Dense Union as a Record column (with attributes)11_map
— Map (incl.Option<V>
values) + as a Record column12_ext_hooks
— Extend#[derive(Record)]
with visitor injection and macro callbacks
Run:
cargo run --example 08_record_batch
Record
: implemented by the derive macro for structs with named fields.ColAt<I>
: per-column associated itemsRust
,ColumnBuilder
,ColumnArray
,NULLABLE
,NAME
, anddata_type()
.ArrowBinding
: compile-time mapping from a Rust value type to its Arrow builder, array, andDataType
.BuildRows
: derive generates<Type>Builders
and<Type>Arrays
withappend_row(s)
andfinish
.SchemaMeta
: derive providesfields()
andschema()
; arrays structs provideinto_record_batch()
.AppendStruct
andStructMeta
: enable nested struct fields andStructArray
building.
- Schema-level: annotate with
#[schema_metadata(k = "owner", v = "data")]
. - Field-level: annotate with
#[metadata(k = "pii", v = "email")]
. - You can repeat attributes to add multiple pairs; later duplicates win.
- Struct fields: struct-typed fields map to Arrow
Struct
columns by default. Make the parent field nullable withOption<Nested>
; child nullability is independent. - Lists:
List<T>
(items non-null) andList<Option<T>>
(items nullable). UseOption<List<_>>
for list-level nulls. - LargeList:
LargeList<T>
andLargeList<Option<T>>
for 64-bit offsets; wrap withOption<_>
for column nulls. - FixedSizeList:
FixedSizeList<T, N>
(items non-null) andFixedSizeListNullable<T, N>
(items nullable). Wrap withOption<_>
for list-level nulls. - Map:
Map<K, V, const SORTED: bool = false>
where keys are non-null; useMap<K, Option<V>>
to allow nullable values. Column nullability viaOption<Map<...>>
.SORTED
setskeys_sorted
in the ArrowDataType
. - OrderedMap:
OrderedMap<K, V>
usesBTreeMap<K, V>
and declareskeys_sorted = true
. - Dictionary:
Dictionary<K, V>
with integral keysK ∈ { i8, i16, i32, i64, u8, u16, u32, u64 }
and values:String
/LargeUtf8
(Utf8/LargeUtf8)Vec<u8>
/LargeBinary
(Binary/LargeBinary)[u8; N]
(FixedSizeBinary)- primitives
i*
,u*
,f32
,f64
Column nullability viaOption<Dictionary<..>>
.
- Timestamps:
Timestamp<U>
(unit-only) andTimestampTz<U, Z>
(unit + timezone). Units:Second
,Millisecond
,Microsecond
,Nanosecond
. UseUtc
or define your ownZ: TimeZoneSpec
. - Decimals:
Decimal128<P, S>
andDecimal256<P, S>
(precisionP
, scaleS
as const generics). - Unions:
#[derive(Union)]
for enums with#[union(mode = "dense"|"sparse")]
, per-variant#[union(tag = N)]
,#[union(field = "name")]
, and optional null carrier#[union(null)]
or container-levelnull_variant = "Var"
.
Supported (arrow-rs v56):
- Primitives: Int8/16/32/64, UInt8/16/32/64, Float16/32/64, Boolean
- Strings/Binary: Utf8, LargeUtf8, Binary, LargeBinary, FixedSizeBinary (via
[u8; N]
) - Temporal: Timestamp (with/without TZ; s/ms/us/ns), Date32/64, Time32(s/ms), Time64(us/ns), Duration(s/ms/us/ns), Interval(YearMonth/DayTime/MonthDayNano)
- Decimal: Decimal128, Decimal256 (const generic precision/scale)
- Nested:
- List (including nullable items), LargeList, FixedSizeList (nullable/non-null items)
- Struct,
- Map (Vec<(K,V)>; use
Option<V>
for nullable values), OrderedMap (BTreeMap<K,V>) withkeys_sorted = true
- Union: Dense and Sparse (via
#[derive(Union)]
on enums) - Dictionary: keys = all integral types; values = Utf8 (String), LargeUtf8, Binary (Vec), LargeBinary, FixedSizeBinary (
[u8; N]
), primitives (i*, u*, f32, f64)
Missing:
- BinaryView, Utf8View
- Utf8View
- ListView, LargeListView
- RunEndEncoded
- Derive extension hooks allow user-level customization without changing the core derive:
- Inject compile-time visitors:
#[record(visit(MyVisitor))]
- Call your macros per field/record:
#[record(field_macro = my_ext::per_field, record_macro = my_ext::per_record)]
- Tag fields/records with free-form markers:
#[record(ext(key))]
- Inject compile-time visitors:
- See
docs/extensibility.md
and the runnable exampleexamples/12_ext_hooks.rs
.