A sorted map that learns where your keys live
A sorted concurrent key-value map for Rust that uses a learned linear model to predict where each key lives, instead of walking a tree. Reads are lock-free, writes are CAS-based, and there's no global lock anywhere in the structure.
Overview
why not BTreeMap
Rust's BTreeMap is the standard sorted container and it's fine for most cases. But at scale two costs dominate:
O(log n) comparisons per lookup, and cache misses from pointer-chasing through tree nodes. Under concurrency it gets worse
- readers have to take a lock alongside writers, so contention scales with thread count.
Learned indexes flip the model. If your keys have any structure at all (sequential IDs, timestamps, sorted bulk-loaded data), a piecewise linear function can predict the position of any key in one step. The tree stays shallow, most lookups resolve in a single prediction, and there's no comparison cascade. The trade-off is that you give up the contiguous-leaves property that makes B-tree range scans so fast, and random-key inserts create deeper conflict chains than they would in a balanced tree.
The angle that makes scry-index worth building: every existing concurrent learned index (SALI, FINEdex, XIndex, APEX) is C++
only. This is the first Rust implementation, built on the LIPP chain method with the FMCD model fitter from the
SALI paper (SIGMOD 2024). No unsafe
in the public API, #[deny(unsafe_code)] at the crate level, Apache-2.0 / MIT.
How it works
LIPP chain method
Each node stores a linear model f(key) = slope * key + intercept and a fixed-size slot array. The model is fitted
with FMCD (Fastest Minimum Conflict Degree), which picks a slope that maps every key in the node to a unique slot - zero
prediction error by construction. A lookup is: compute f(key), load the slot atomically, return the value. No
search, no correction scan.
When two keys would land on the same slot, the slot becomes a pointer to a child node with its own model. Conflicts create children rather than shifting data, which is the property that makes lock-free concurrent inserts possible. Well-distributed keys stay in the root; clustered keys form short chains. When a subtree grows past a depth threshold, the inserting thread rebuilds it inline via a CAS-swap of the parent's child pointer.
Bulk loading from sorted data uses the same FMCD pass to build an optimal tree in one shot. Random-order inserts work, but they hit the rebuild path more often - every learned index assumes some structure in the key distribution, and scry-index is honest about it.
Key types supported out of the box: all integers, [u8; N], String, Vec<u8>. Custom
types implement the Key trait via two byte-conversion helpers (bytes_to_model_input,
bytes_to_exact_ordinal).
Concurrency model
no global lockReads
Lock-free. A thread pins itself to an epoch via crossbeam, follows the model prediction through atomic loads, and reads inline key-value data. Readers never block, never spin, never coordinate with writers.
Writes
Per-slot CAS retry loops. An insert claims an empty slot atomically, or converts the slot into a child pointer on collision. Contention is bounded to the individual slot - two writers touching different slots in the same node never see each other.
Reclamation
Epoch-based via crossbeam-epoch. Retired nodes (rebuilds, removes, drains) are deferred until all readers from that epoch
have unpinned. No reference counting, no GC. 0.1.1 swapped unbounded spin-waits for crossbeam::utils::Backoff to
play nicer with the scheduler.
Rebuilds
Localized subtree rebuilds, triggered by depth threshold. The parent's child pointer is frozen with a tag bit during the rebuild so concurrent inserts back off and retry rather than writing into a doomed subtree. No global lock, no stop-the-world.
Benchmarks
criterion, 100K-500K keys100K sequential keys, bulk-loaded. Full workload in examples/simulate.rs.
Point lookups (single-threaded)
| Operation | scry-index | BTreeMap | |
|---|---|---|---|
| Sequential keys | 0.6 ms | 2.8 ms | 4.7x |
| Random keys | 0.9 ms | 5.4 ms | 6.2x |
Concurrent (8 reader + 4 writer threads)
| Operation | scry-index | RwLock<BTreeMap> | |
|---|---|---|---|
| 8T point reads | 1.6 ms | 9.8 ms | 6.2x |
| 4T writes | 1.5 ms | 3.4 ms | 2.2x |
| 8R + 4W mixed | 3.6 ms | 27.3 ms | 7.5x |
Writes (single-threaded)
| Operation | scry-index | BTreeMap | |
|---|---|---|---|
| Incremental insert (10K into 100K) | 0.2 ms | 0.3 ms | 1.5x |
| Bulk load | 1.4 ms | 0.7 ms | 0.5x (BTree wins) |
Design evolution
five phases-
Phase 1
Single-threaded prototype. FMCD model fitting, lookup, insert, iterator.
All methods take
&mut self. Validate the algorithm works in Rust. -
Phase 2
Concurrent. Atomic slots, lock-free reads, CAS writes, epoch-based
reclamation via crossbeam. All methods take
&self. -
Phase 3
O(n) in-order iteration (DFS is naturally sorted). Range queries:
range(start..end),first_key_value(),last_key_value(). Model-guided seek for O(depth) range initialization. -
Phase 4
Depth-triggered localized subtree rebuilds. The inserting thread rebuilds
the degraded subtree inline via CAS-swap. Replaced the Phase 2.5
RwLockwith fully lock-free operation. -
Phase 5
Production hardening. True FMCD candidate slopes. Adaptive root sizing.
Key generalization (
String,Vec<u8>,[u8; N]). Entry API. Tombstone compaction. Serde support. Write safety under concurrent rebuilds.
API
usage exampleuse scry_index::LearnedMap; let map = LearnedMap::new(); let m = map.pin(); m.insert(42u64, "the answer"); m.insert(1, "first"); m.insert(100, "last"); assert_eq!(m.get(&42), Some(&"the answer")); // sorted iteration, range queries for (k, v) in m.range(1u64..50) { println!("{k}: {v}"); } // atomic check-and-insert let val = m.get_or_insert(42, "default"); assert_eq!(val, &"the answer"); // existing value, not "default"
For pre-sorted data, bulk_load builds the tree in one pass
with optimal model fitting. bulk_load_dedup handles
duplicates (keeps the last value per key, matching BTreeMap::from_iter
semantics).
// bulk load from sorted data let data: Vec<(u64, &str)> = vec![(1, "a"), (2, "b"), (3, "c")]; let map = LearnedMap::bulk_load(&data).unwrap(); // supports String, Vec<u8>, [u8; N], and all integer types let string_map: LearnedMap<String, u64> = LearnedMap::new();
Constraints
non-negotiable rulesNo unsafe in the public API
#[deny(unsafe_code)] at the crate level. Internal concurrency
primitives use targeted unsafe blocks with documented safety invariants.
The public surface is fully safe Rust.
No panics in library code
All fallible operations return Result or Option.
unwrap() only appears in tests. Panics are treated as bugs.
Clippy pedantic + nursery
All warnings are errors in CI. Issues are fixed at the source, not suppressed, unless there is a documented exception.
Every bug fix ships a test
Regression tests are required for every fix. Property-based tests cover core algorithms. Target: >90% line coverage on core modules.
Tradeoffs
when to use something else
Range scans are slow. Roughly 30x slower than BTreeMap for pure sequential scans. The chain method places keys at
model-predicted slots across a tree of nodes, so in-order traversal is a DFS with pointer chasing. BTreeMap's contiguous sorted
leaves make range scans essentially sequential memory reads. If your workload is dominated by range(a..b) and you're
not running it concurrently with writers, use BTreeMap.
Random-key inserts from empty are 3-5x slower. Keys that don't fit the node's linear model collide into the same slot and create chains until a localized rebuild fires. Time-series, sequential IDs, and pre-sorted bulk loads fit well. Hash-distributed keys or adversarial inputs don't.
Per-operation epoch overhead. Lock-free reclamation adds a small fixed cost on every call. It's the same tax
every epoch-based concurrent structure pays (including crossbeam's SkipMap). Single-threaded inserts are ~1.3x
slower than BTreeMap because of it. Under actual concurrency the cost amortizes - concurrent insert ends up 2.2x faster
than RwLock<BTreeMap>. If you have one thread and never plan to share, BTreeMap is simpler and
lighter.
Status
0.1.0 on crates.io, 0.1.1 in-flight
Current version 0.1.0 on crates.io (published March 2026), with 0.1.1 in the changelog as
unreleased - TSan race fixes in the inline tombstone reuse path, a last_key_value correctness fix after
removes, a drain / clear length-counter race, and the spin-wait → Backoff migration. MSRV 1.83. Apache-2.0 OR MIT.
Optional serde feature.
The library surface is stable: LearnedMap, LearnedSet, Config, range queries, entry API,
bulk_load / bulk_load_dedup, allocated_bytes() for memory introspection. The workspace
also contains scry-lsm/, an in-progress LSM-tree built on top.