Caching Strategy

OpenDB uses an LRU (Least Recently Used) cache to accelerate reads while maintaining consistency.

Cache Architecture

┌──────────────────────────────────┐
│         Application              │
└─────────────┬────────────────────┘
              │
         Read/Write
              │
┌─────────────▼────────────────────┐
│         LRU Cache                │
│  ┌──────┬──────┬──────┬──────┐   │
│  │ Hot1 │ Hot2 │ Hot3 │ Hot4 │   │
│  └──────┴──────┴──────┴──────┘   │
└─────────────┬────────────────────┘
              │
       Cache Miss/Write
              │
┌─────────────▼────────────────────┐
│      Storage Backend             │
│         (RocksDB)                │
└──────────────────────────────────┘

Write-Through Policy

All writes go to storage first, then update the cache:

pub fn put(&self, key: &[u8], value: &[u8]) -> Result<()> {
    // 1. Write to storage (ensures durability)
    self.storage.put(ColumnFamilies::DEFAULT, key, value)?;

    // 2. Update cache
    self.cache.insert(key.to_vec(), value.to_vec());

    Ok(())
}

Why Write-Through?

✅ Durability: Data is persisted immediately
✅ Consistency: Cache never has uncommitted data
❌ Slower writes: Every write hits disk

Alternative: Write-Back

✅ Faster writes (batch to disk later)
❌ Risk of data loss if crash before flush
❌ More complex consistency model

Cache Invalidation

Deletes remove the key from storage first, then invalidate the cached entry:

pub fn delete(&self, key: &[u8]) -> Result<()> {
    // 1. Delete from storage
    self.storage.delete(ColumnFamilies::DEFAULT, key)?;

    // 2. Invalidate cache
    self.cache.invalidate(&key.to_vec());

    Ok(())
}
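
The read path is not shown above. As a minimal sketch of how write-through puts, cache-first reads, and delete invalidation fit together, the toy store below uses two in-memory maps. Everything here is an assumption for illustration: `ToyStore`, its fields, and `storage_reads` are hypothetical names, a plain `HashMap` stands in for both RocksDB and the LRU cache, and no eviction is modeled.

```rust
use std::collections::HashMap;

/// Toy stand-in for a storage + cache pair (hypothetical, not OpenDB's types).
#[derive(Default)]
pub struct ToyStore {
    storage: HashMap<Vec<u8>, Vec<u8>>, // stands in for the storage backend
    cache: HashMap<Vec<u8>, Vec<u8>>,   // stands in for the LRU cache (no eviction)
    pub storage_reads: usize,           // counts reads that fell through to storage
}

impl ToyStore {
    /// Write-through: persist first, then refresh the cache.
    pub fn put(&mut self, key: &[u8], value: &[u8]) {
        self.storage.insert(key.to_vec(), value.to_vec());
        self.cache.insert(key.to_vec(), value.to_vec());
    }

    /// Serve from the cache; on a miss, read storage and populate the cache.
    pub fn get(&mut self, key: &[u8]) -> Option<Vec<u8>> {
        if let Some(v) = self.cache.get(key) {
            return Some(v.clone());
        }
        self.storage_reads += 1;
        let v = self.storage.get(key).cloned()?;
        self.cache.insert(key.to_vec(), v.clone());
        Some(v)
    }

    /// Delete from storage first, then drop the cached copy.
    pub fn delete(&mut self, key: &[u8]) {
        self.storage.remove(key);
        self.cache.remove(key);
    }
}
```

Because every `put` refreshes the cache and every `delete` removes the cached copy, a `get` issued after either call cannot observe a stale value in this model.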

LRU Eviction

When the cache is at capacity, inserting a new key evicts the least recently used entry:

Cache (capacity = 3), most recently used entry listed first:

Put("A", "1")  →  [A]
Put("B", "2")  →  [B, A]
Put("C", "3")  →  [C, B, A]
Get("A")       →  [A, C, B]  # A is now most recent
Put("D", "4")  →  [D, A, C]  # B evicted (LRU)
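
The trace above can be reproduced with a small, deliberately naive LRU. This is a sketch for illustration only: `TinyLru` is a hypothetical type, not OpenDB's cache, and it keeps recency order in a `VecDeque` with O(n) updates where a production cache would typically use a linked structure.

```rust
use std::collections::{HashMap, VecDeque};

/// Minimal LRU sketch: `order` holds keys from most to least recently used.
pub struct TinyLru {
    capacity: usize,
    map: HashMap<String, String>,
    order: VecDeque<String>,
}

impl TinyLru {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        Self { capacity, map: HashMap::new(), order: VecDeque::new() }
    }

    /// Move `key` to the front (most recently used position).
    fn touch(&mut self, key: &str) {
        self.order.retain(|k| k.as_str() != key); // O(n); fine for a sketch
        self.order.push_front(key.to_string());
    }

    /// Insert or update; returns the evicted key, if any.
    pub fn put(&mut self, key: &str, value: &str) -> Option<String> {
        let is_new = self.map.insert(key.to_string(), value.to_string()).is_none();
        self.touch(key);
        if is_new && self.map.len() > self.capacity {
            let victim = self.order.pop_back()?; // least recently used
            self.map.remove(&victim);
            return Some(victim);
        }
        None
    }

    /// Look up a key and mark it as most recently used.
    pub fn get(&mut self, key: &str) -> Option<&String> {
        if self.map.contains_key(key) {
            self.touch(key);
        }
        self.map.get(key)
    }

    /// Keys from most to least recently used.
    pub fn keys(&self) -> Vec<&str> {
        self.order.iter().map(String::as_str).collect()
    }
}
```

With a capacity of 3, putting A, B, C, reading A, and then putting D leaves the order D, A, C and reports B as the evicted key, matching the trace.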

Cache Sizes

Default cache sizes:

pub struct OpenDBOptions {
    pub kv_cache_size: usize,       // Default: 1000
    pub record_cache_size: usize,   // Default: 500
}

Tuning Cache Size

let mut options = OpenDBOptions::default();
options.kv_cache_size = 10_000;      // More KV entries
options.record_cache_size = 2_000;   // More Memory records

let db = OpenDB::open_with_options("./db", options)?;

Guidelines:

Small cache (100-1,000 entries): Low memory use, higher cache miss rate
Medium cache (1,000-10,000 entries): Balanced for most workloads
Large cache (more than 10,000 entries): High memory use, lower cache miss rate
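
Since the cache sizes are entry counts, the memory cost depends on how large the entries are. A rough way to turn a size setting into a memory budget is to multiply it by an average entry footprint. The helper below is a back-of-the-envelope sketch, not an OpenDB API: `estimate_cache_bytes` and its `per_entry_overhead` parameter are hypothetical, and the real overhead depends on the cache implementation and the allocator.

```rust
/// Rough memory estimate for a cache holding `entries` items.
/// All inputs are averages supplied by the caller; nothing is measured here.
pub fn estimate_cache_bytes(
    entries: usize,
    avg_key_bytes: usize,
    avg_value_bytes: usize,
    per_entry_overhead: usize, // assumed bookkeeping cost per entry
) -> usize {
    entries.saturating_mul(avg_key_bytes + avg_value_bytes + per_entry_overhead)
}
```

For example, 10,000 entries with 32-byte keys, 256-byte values, and an assumed 64 bytes of overhead come to 10,000 × 352 = 3,520,000 bytes, or about 3.5 MB.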

Cache Hit Rates

Monitor effectiveness (metrics to be added):

Hit Rate = Cache Hits / Total Reads

> 80%: Excellent, cache is effective
50-80%: Good, consider increasing size
< 50%: Poor, increase cache or review access patterns
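
Until built-in metrics exist, the formula can be tracked by hand around read calls. The counter below is a sketch under that assumption: `CacheStats` and its methods are hypothetical and not part of OpenDB, and the caller has to decide what counts as a hit or a miss.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Thread-safe hit/miss counters (hypothetical helper).
#[derive(Default)]
pub struct CacheStats {
    hits: AtomicU64,
    misses: AtomicU64,
}

impl CacheStats {
    pub fn record_hit(&self) {
        self.hits.fetch_add(1, Ordering::Relaxed);
    }

    pub fn record_miss(&self) {
        self.misses.fetch_add(1, Ordering::Relaxed);
    }

    /// Hit rate in [0.0, 1.0]; `None` until at least one read is recorded.
    pub fn hit_rate(&self) -> Option<f64> {
        let hits = self.hits.load(Ordering::Relaxed);
        let total = hits + self.misses.load(Ordering::Relaxed);
        if total == 0 {
            None
        } else {
            Some(hits as f64 / total as f64)
        }
    }
}
```

Returning `None` before any reads avoids reporting a misleading 0% for a cache that has not been exercised yet.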

Multi-Level Caching

OpenDB has two cache levels:

Application Cache (LRU): In-process, fast
RocksDB Block Cache: Built into RocksDB, shared

RocksDB Block Cache

RocksDB maintains its own block cache, which the current OpenDB API does not expose. A future tuning option might look like this:

// Future tuning option
opts.set_block_cache_size(256 * 1024 * 1024); // 256 MB

Concurrent Access

Caches use parking_lot::RwLock for thread safety:

pub struct LruMemoryCache<K, V> {
    cache: RwLock<LruCache<K, V>>,
}

Reads: Multiple concurrent readers
Writes: Exclusive lock during insert/evict
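
One subtlety is worth sketching: in an LRU, a lookup that updates recency mutates the structure, so it needs the exclusive lock, while a lookup that leaves recency untouched can share the read lock. The example below illustrates that split. It is an assumption-based sketch, not OpenDB's implementation: `SharedCache`, `peek`, and `most_recent` are hypothetical names, and `std::sync::RwLock` is used instead of `parking_lot::RwLock` to keep the example dependency-free. Which lock OpenDB's own lookups take is not stated here.

```rust
use std::collections::{HashMap, VecDeque};
use std::sync::RwLock;

struct Inner {
    map: HashMap<String, String>,
    order: VecDeque<String>, // front = most recently used
}

pub struct SharedCache {
    inner: RwLock<Inner>,
}

impl SharedCache {
    pub fn new() -> Self {
        Self {
            inner: RwLock::new(Inner { map: HashMap::new(), order: VecDeque::new() }),
        }
    }

    /// Exclusive lock: inserting changes both the map and the recency order.
    pub fn insert(&self, key: &str, value: &str) {
        let mut guard = self.inner.write().unwrap();
        guard.map.insert(key.to_string(), value.to_string());
        guard.order.retain(|k| k.as_str() != key);
        guard.order.push_front(key.to_string());
    }

    /// Shared lock: reads the value without touching recency.
    pub fn peek(&self, key: &str) -> Option<String> {
        self.inner.read().unwrap().map.get(key).cloned()
    }

    /// Exclusive lock: reads the value and promotes the key to most recent.
    pub fn get(&self, key: &str) -> Option<String> {
        let mut guard = self.inner.write().unwrap();
        let value = guard.map.get(key).cloned()?;
        guard.order.retain(|k| k.as_str() != key);
        guard.order.push_front(key.to_string());
        Some(value)
    }

    /// Shared lock: returns the most recently used key.
    pub fn most_recent(&self) -> Option<String> {
        self.inner.read().unwrap().order.front().cloned()
    }
}
```

Several threads can call `peek` at the same time, whereas `get` and `insert` serialize. Values are cloned out so that no lock guard escapes the method.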

Cache Coherency Guarantees

Write Visibility: Writes are immediately visible after put() returns
Delete Visibility: Deletes are immediately visible after delete() returns
Transaction Isolation: Transactions bypass cache (read from storage snapshot)

Best Practices

Warm Up Cache

// Preload important data
let important_ids = vec!["mem_001", "mem_002", "mem_003"];
for id in important_ids {
    db.get_memory(id)?;  // Populate cache
}

Avoid Thrashing

// ❌ Bad: random keys across a huge key space, so almost every read misses
for _ in 0..1_000_000 {
    let random_key = generate_random_key();
    db.get(&random_key)?;
}

// ✅ Good: sequential or localized access over a key range that fits in the cache
for i in 0..1000 {
    db.get(format!("key_{}", i).as_bytes())?;
}

Cache Bypass for Large Scans

When scanning large datasets, consider bypassing the cache so that the scan does not evict hot entries (future feature):

// Future API
db.scan_prefix_no_cache(b"prefix")?;
