scry-index - Concurrent Learned Index in Rust

01

Overview

why not BTreeMap

Rust's BTreeMap is the standard sorted container and it's fine for most cases. But at scale two costs dominate: O(log n) comparisons per lookup, and cache misses from pointer-chasing through tree nodes. Under concurrency it gets worse - readers have to take a lock alongside writers, so contention scales with thread count.

Learned indexes flip the model. If your keys have any structure at all (sequential IDs, timestamps, sorted bulk-loaded data), a piecewise linear function can predict the position of any key in one step. The tree stays shallow, most lookups resolve in a single prediction, and there's no comparison cascade. The trade-off is that you give up the contiguous-leaves property that makes B-tree range scans so fast, and random-key inserts create deeper conflict chains than they would in a balanced tree.

The angle that makes scry-index worth building: every existing concurrent learned index (SALI, FINEdex, XIndex, APEX) is C++ only. This is the first Rust implementation, built on the LIPP chain method with the FMCD model fitter from the SALI paper (SIGMOD 2024). No unsafe in the public API, #[deny(unsafe_code)] at the crate level, Apache-2.0 / MIT.

02

How it works

LIPP chain method

Each node stores a linear model f(key) = slope * key + intercept and a fixed-size slot array. The model is fitted with FMCD (Fastest Minimum Conflict Degree), which picks a slope that maps every key in the node to a unique slot - zero prediction error by construction. A lookup is: compute f(key), load the slot atomically, return the value. No search, no correction scan.

Root Node f(k) = 0.03k + 0.0 · 64 slots

key=10val="a"

empty

key=30val="c"

child ↓

key=80val="h"

Child Node f(k) = 0.3k - 12.0

key=40val="d"

empty

key=45val="e"

When two keys would land on the same slot, the slot becomes a pointer to a child node with its own model. Conflicts create children rather than shifting data, which is the property that makes lock-free concurrent inserts possible. Well-distributed keys stay in the root; clustered keys form short chains. When a subtree grows past a depth threshold, the inserting thread rebuilds it inline via a CAS-swap of the parent's child pointer.

Bulk loading from sorted data uses the same FMCD pass to build an optimal tree in one shot. Random-order inserts work, but they hit the rebuild path more often - every learned index assumes some structure in the key distribution, and scry-index is honest about it.

Key types supported out of the box: all integers, [u8; N], String, Vec<u8>. Custom types implement the Key trait via two byte-conversion helpers (bytes_to_model_input, bytes_to_exact_ordinal).

03

Concurrency model

no global lock

Reads

Lock-free. A thread pins itself to an epoch via crossbeam, follows the model prediction through atomic loads, and reads inline key-value data. Readers never block, never spin, never coordinate with writers.

Writes

Per-slot CAS retry loops. An insert claims an empty slot atomically, or converts the slot into a child pointer on collision. Contention is bounded to the individual slot - two writers touching different slots in the same node never see each other.

Reclamation

Epoch-based via crossbeam-epoch. Retired nodes (rebuilds, removes, drains) are deferred until all readers from that epoch have unpinned. No reference counting, no GC. 0.1.1 swapped unbounded spin-waits for crossbeam::utils::Backoff to play nicer with the scheduler.

Rebuilds

Localized subtree rebuilds, triggered by depth threshold. The parent's child pointer is frozen with a tag bit during the rebuild so concurrent inserts back off and retry rather than writing into a doomed subtree. No global lock, no stop-the-world.

04

Benchmarks

criterion, 100K-500K keys

100K sequential keys, bulk-loaded. Full workload in examples/simulate.rs.

Point lookups (single-threaded)

Operation	scry-index	BTreeMap
Sequential keys	0.6 ms	2.8 ms	4.7x
Random keys	0.9 ms	5.4 ms	6.2x

Concurrent (8 reader + 4 writer threads)

Operation	scry-index	RwLock<BTreeMap>
8T point reads	1.6 ms	9.8 ms	6.2x
4T writes	1.5 ms	3.4 ms	2.2x
8R + 4W mixed	3.6 ms	27.3 ms	7.5x

Writes (single-threaded)

Operation	scry-index	BTreeMap
Incremental insert (10K into 100K)	0.2 ms	0.3 ms	1.5x
Bulk load	1.4 ms	0.7 ms	0.5x (BTree wins)

Bulk load is slower because FMCD model fitting isn't free. The payoff is at read time. Pure sequential range scans are roughly 30x slower than BTreeMap - that's the structural cost of the chain method, covered in tradeoffs below.

05

Design evolution

five phases

Phase 1 Single-threaded prototype. FMCD model fitting, lookup, insert, iterator. All methods take &mut self. Validate the algorithm works in Rust.
Phase 2 Concurrent. Atomic slots, lock-free reads, CAS writes, epoch-based reclamation via crossbeam. All methods take &self.
Phase 3 O(n) in-order iteration (DFS is naturally sorted). Range queries: range(start..end), first_key_value(), last_key_value(). Model-guided seek for O(depth) range initialization.
Phase 4 Depth-triggered localized subtree rebuilds. The inserting thread rebuilds the degraded subtree inline via CAS-swap. Replaced the Phase 2.5 RwLock with fully lock-free operation.
Phase 5 Production hardening. True FMCD candidate slopes. Adaptive root sizing. Key generalization (String, Vec<u8>, [u8; N]). Entry API. Tombstone compaction. Serde support. Write safety under concurrent rebuilds.

06

API

usage example

use scry_index::LearnedMap;

let map = LearnedMap::new();
let m = map.pin();

m.insert(42u64, "the answer");
m.insert(1, "first");
m.insert(100, "last");

assert_eq!(m.get(&42), Some(&"the answer"));

// sorted iteration, range queries
for (k, v) in m.range(1u64..50) {
    println!("{k}: {v}");
}

// atomic check-and-insert
let val = m.get_or_insert(42, "default");
assert_eq!(val, &"the answer"); // existing value, not "default"

For pre-sorted data, bulk_load builds the tree in one pass with optimal model fitting. bulk_load_dedup handles duplicates (keeps the last value per key, matching BTreeMap::from_iter semantics).

// bulk load from sorted data
let data: Vec<(u64, &str)> = vec![(1, "a"), (2, "b"), (3, "c")];
let map = LearnedMap::bulk_load(&data).unwrap();

// supports String, Vec<u8>, [u8; N], and all integer types
let string_map: LearnedMap<String, u64> = LearnedMap::new();

07

Constraints

non-negotiable rules

No unsafe in the public API

#[deny(unsafe_code)] at the crate level. Internal concurrency primitives use targeted unsafe blocks with documented safety invariants. The public surface is fully safe Rust.

No panics in library code

All fallible operations return Result or Option. unwrap() only appears in tests. Panics are treated as bugs.

Clippy pedantic + nursery

All warnings are errors in CI. Issues are fixed at the source, not suppressed, unless there is a documented exception.

Every bug fix ships a test

Regression tests are required for every fix. Property-based tests cover core algorithms. Target: >90% line coverage on core modules.

08

Tradeoffs

when to use something else

Range scans are slow. Roughly 30x slower than BTreeMap for pure sequential scans. The chain method places keys at model-predicted slots across a tree of nodes, so in-order traversal is a DFS with pointer chasing. BTreeMap's contiguous sorted leaves make range scans essentially sequential memory reads. If your workload is dominated by range(a..b) and you're not running it concurrently with writers, use BTreeMap.

Random-key inserts from empty are 3-5x slower. Keys that don't fit the node's linear model collide into the same slot and create chains until a localized rebuild fires. Time-series, sequential IDs, and pre-sorted bulk loads fit well. Hash-distributed keys or adversarial inputs don't.

Per-operation epoch overhead. Lock-free reclamation adds a small fixed cost on every call. It's the same tax every epoch-based concurrent structure pays (including crossbeam's SkipMap). Single-threaded inserts are ~1.3x slower than BTreeMap because of it. Under actual concurrency the cost amortizes - concurrent insert ends up 2.2x faster than RwLock<BTreeMap>. If you have one thread and never plan to share, BTreeMap is simpler and lighter.

09

Status

0.1.0 on crates.io, 0.1.1 in-flight

Current version 0.1.0 on crates.io (published March 2026), with 0.1.1 in the changelog as unreleased - TSan race fixes in the inline tombstone reuse path, a last_key_value correctness fix after removes, a drain / clear length-counter race, and the spin-wait → Backoff migration. MSRV 1.83. Apache-2.0 OR MIT. Optional serde feature.

The library surface is stable: LearnedMap, LearnedSet, Config, range queries, entry API, bulk_load / bulk_load_dedup, allocated_bytes() for memory introspection. The workspace also contains scry-lsm/, an in-progress LSM-tree built on top.

A sorted map that learns where your keys live