OpenDB

OpenDB is a high-performance hybrid embedded database written in Rust, combining multiple database paradigms into a single, cohesive system.

Features

  • 🔑 Key-Value Store: Fast point lookups and range scans
  • 📄 Structured Records: Document/row storage with schema support
  • 🔗 Graph Database: Relationships and graph traversals
  • 🔍 Vector Search: Semantic search with HNSW-based approximate nearest neighbors
  • 💾 In-Memory Cache: LRU cache for hot data
  • ✅ ACID Transactions: Full transactional guarantees with WAL

Why OpenDB?

OpenDB is designed for applications that need multiple database capabilities without the complexity of managing separate systems:

  • Agent Memory Systems: Store and recall facts, relationships, and semantic information
  • Knowledge Graphs: Build and traverse complex relationship networks
  • Semantic Search: Find similar content using vector embeddings
  • High-Performance Applications: LSM-tree backend for excellent write throughput

Repository

https://github.com/muhammad-fiaz/OpenDB

Quick Example

use opendb::{OpenDB, Memory};

fn main() -> opendb::Result<()> {
    // Open database
    let db = OpenDB::open("./my_database")?;
    
    // Store a memory with embedding
    let memory = Memory::new(
        "memory_1",
        "Rust is awesome!",
        vec![0.1, 0.2, 0.3],
        0.9, // importance
    );
    db.insert_memory(&memory)?;
    
    // Create relationships
    db.link("memory_1", "related_to", "memory_2")?;
    
    // Vector search
    let similar = db.search_similar(&[0.1, 0.2, 0.3], 5)?;
    
    Ok(())
}

Installation

From crates.io (once published)

cargo add opendb

From source

  1. Clone the repository:
git clone https://github.com/muhammad-fiaz/OpenDB.git
cd OpenDB
  2. Build the project:
cargo build --release
  3. Run tests:
cargo test
  4. Run examples:
cargo run --example quickstart
cargo run --example memory_agent
cargo run --example graph_relations

Requirements

  • Rust: 1.70.0 or higher (Rust 2021 edition)
  • Operating System: Linux, macOS, or Windows
  • Dependencies: All dependencies are managed by Cargo

System Dependencies

OpenDB uses RocksDB as its storage backend, which requires:

  • Linux: gcc, g++, make, libsnappy-dev, zlib1g-dev, libbz2-dev, liblz4-dev
  • macOS: Xcode command line tools
  • Windows: Visual Studio Build Tools

Linux Setup

# Ubuntu/Debian
sudo apt-get install -y gcc g++ make libsnappy-dev zlib1g-dev libbz2-dev liblz4-dev

# Fedora/RHEL
sudo dnf install -y gcc gcc-c++ make snappy-devel zlib-devel bzip2-devel lz4-devel

macOS Setup

xcode-select --install

Windows Setup

Install Visual Studio Build Tools

Verifying Installation

cargo test --all

All tests should pass. If you encounter issues:

  1. Check your Rust version: rustc --version
  2. Verify that the build dependencies are installed
  3. Open an issue if problems persist

Quick Start

This guide will walk you through the basic usage of OpenDB.

Opening a Database

use opendb::{OpenDB, Result};

fn main() -> Result<()> {
    // Open or create a database
    let db = OpenDB::open("./my_database")?;
    Ok(())
}

Working with Key-Value Data

#![allow(unused)]
fn main() {
// Store a value
db.put(b"my_key", b"my_value")?;

// Retrieve a value
if let Some(value) = db.get(b"my_key")? {
    println!("Value: {:?}", value);
}

// Delete a value
db.delete(b"my_key")?;

// Check existence
if db.exists(b"my_key")? {
    println!("Key exists!");
}
}

Working with Memory Records

Memory records are structured data with embeddings for semantic search.

#![allow(unused)]
fn main() {
use opendb::Memory;

// Create a memory
let memory = Memory::new(
    "memory_001",
    "The user prefers dark mode",
    vec![0.1, 0.2, 0.3, 0.4], // embedding vector
    0.9, // importance (0.0 to 1.0)
)
.with_metadata("category", "preference")
.with_metadata("source", "user_settings");

// Insert the memory
db.insert_memory(&memory)?;

// Retrieve it
if let Some(mem) = db.get_memory("memory_001")? {
    println!("Content: {}", mem.content);
    println!("Importance: {}", mem.importance);
}

// List all memories with a prefix
let all = db.list_memories("memory")?;
println!("Found {} memories", all.len());
}

Creating Relationships

#![allow(unused)]
fn main() {
// Create relationships between memories
db.link("memory_001", "related_to", "memory_002")?;
db.link("memory_001", "caused_by", "memory_003")?;

// Query relationships
let related = db.get_related("memory_001", "related_to")?;
for id in related {
    println!("Related memory: {}", id);
}

// Get all outgoing edges
let edges = db.get_outgoing("memory_001")?;
for edge in edges {
    println!("{} --[{}]--> {}", edge.from, edge.relation, edge.to);
}
}
#![allow(unused)]
fn main() {
// Search for similar memories
let query_embedding = vec![0.1, 0.2, 0.3, 0.4];
let results = db.search_similar(&query_embedding, 5)?; // top 5

for result in results {
    println!("Memory: {} (distance: {:.4})", 
             result.memory.content, 
             result.distance);
}
}

Using Transactions

#![allow(unused)]
fn main() {
// Begin a transaction
let mut txn = db.begin_transaction()?;

// Perform operations
txn.put("records", b"key1", b"value1")?;
txn.put("records", b"key2", b"value2")?;

// Commit the transaction
txn.commit()?;

// Or rollback if needed
// txn.rollback()?;
}

Flushing to Disk

#![allow(unused)]
fn main() {
// Ensure all writes are persisted
db.flush()?;
}

Complete Example

See the quickstart example for a complete, runnable example.

Architecture Overview

OpenDB is designed as a modular, hybrid database system that combines multiple database paradigms while maintaining high performance and ACID guarantees.

System Architecture

┌─────────────────────────────────────────────────────────┐
│                   OpenDB Public API                     │
├─────────────┬──────────────┬──────────────┬─────────────┤
│  Key-Value  │   Records    │    Graph     │   Vectors   │
│   Store     │  (Memory)    │  Relations   │   (HNSW)    │
├─────────────┴──────────────┴──────────────┴─────────────┤
│              Transaction Manager (ACID)                 │
│           WAL + Optimistic Locking + MVCC               │
├─────────────────────────────────────────────────────────┤
│                   LRU Cache Layer                       │
│           (Write-Through + Invalidation)                │
├─────────────────────────────────────────────────────────┤
│           Storage Trait (Pluggable Backend)             │
├─────────────────────────────────────────────────────────┤
│              RocksDB Backend (LSM Tree)                 │
│       Column Families + Native Transactions + WAL       │
└─────────────────────────────────────────────────────────┘

Core Components

1. Storage Layer

  • Backend: RocksDB (high-performance LSM tree)
  • Column Families: Namespace isolation for different data types
  • Persistence: Write-Ahead Log (WAL) for durability

2. Transaction Manager

  • ACID Guarantees: Full transactional support
  • Isolation: Snapshot isolation via RocksDB transactions
  • Concurrency: Optimistic locking

3. Cache Layer

  • Strategy: LRU (Least Recently Used)
  • Write Policy: Write-through (update storage first, then cache)
  • Coherency: Automatic invalidation on delete

4. Feature Modules

Key-Value Store

  • Direct byte-level storage
  • Prefix scans
  • Cache-accelerated reads

Records Manager

  • Structured Memory records
  • Codec: rkyv (zero-copy deserialization)
  • Metadata support

Graph Manager

  • Bidirectional adjacency lists
  • Forward index: from → [(relation, to)]
  • Backward index: to → [(relation, from)]
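
The two-index layout above can be sketched with plain std HashMaps. This is an illustrative toy (the GraphIndex type and method names are not OpenDB's internals): every link is written to both maps so that outgoing and incoming edges can each be answered with a single lookup.

```rust
use std::collections::HashMap;

/// Toy bidirectional adjacency index mirroring the
/// forward/backward layout (names illustrative).
#[derive(Default)]
struct GraphIndex {
    // from -> [(relation, to)]
    forward: HashMap<String, Vec<(String, String)>>,
    // to -> [(relation, from)]
    backward: HashMap<String, Vec<(String, String)>>,
}

impl GraphIndex {
    fn link(&mut self, from: &str, relation: &str, to: &str) {
        // One logical edge, two physical entries.
        self.forward
            .entry(from.to_string())
            .or_default()
            .push((relation.to_string(), to.to_string()));
        self.backward
            .entry(to.to_string())
            .or_default()
            .push((relation.to_string(), from.to_string()));
    }

    /// Outgoing neighbors for a given relation label.
    fn related(&self, from: &str, relation: &str) -> Vec<String> {
        self.forward
            .get(from)
            .map(|edges| {
                edges
                    .iter()
                    .filter(|(r, _)| r == relation)
                    .map(|(_, to)| to.clone())
                    .collect()
            })
            .unwrap_or_default()
    }
}

fn main() {
    let mut g = GraphIndex::default();
    g.link("mem_001", "related_to", "mem_002");
    g.link("mem_001", "caused_by", "mem_003");
    assert_eq!(g.related("mem_001", "related_to"), vec!["mem_002"]);
    // Reverse lookup: who points at mem_003?
    let incoming = &g.backward["mem_003"];
    assert_eq!(incoming.len(), 1);
    assert_eq!(incoming[0].1, "mem_001");
    println!("ok");
}
```

The cost of the double write is paid once at link time; both traversal directions stay cheap at read time.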

Vector Manager

  • HNSW index for approximate nearest neighbor search
  • Automatic index rebuilding
  • Configurable search quality

Data Flow

Write Path

Application → OpenDB API → Storage Backend (WAL → Disk) → Cache (update)

Read Path (Cache Hit)

Application → OpenDB API → Cache → Return

Read Path (Cache Miss)

Application → OpenDB API → Cache (miss) → Storage Backend → Cache (populate) → Return

Design Decisions

Why RocksDB?

Advantages:

  • Production-tested LSM tree
  • Excellent write throughput
  • Built-in WAL and transactions
  • Column families for organization

Tradeoffs:

  • Not pure Rust (C++ with bindings)
  • Larger binary size

Alternatives Considered:

  • redb: Pure Rust, B-tree based, simpler but lower throughput
  • sled: Pure Rust, but less mature and maintenance concerns
  • Custom LSM: Too much complexity for initial version

Why rkyv for Serialization?

Advantages:

  • Zero-copy deserialization (fast reads)
  • Schema versioning support
  • Type safety

Alternatives:

  • bincode: Simpler but requires full deserialization
  • serde_json: Human-readable but slower

Why HNSW for Vector Search?

Advantages:

  • Excellent accuracy/speed tradeoff
  • Logarithmic search complexity
  • Works well for high-dimensional data

Alternatives:

  • IVF (Inverted File Index): Faster but less accurate
  • Flat index: Exact but O(n) search
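
For contrast, the flat-index alternative fits in a few lines of std-only Rust. This brute-force search is exact but touches every vector on every query, which is the O(n) cost HNSW avoids (function names here are illustrative, not OpenDB's API):

```rust
/// Squared Euclidean distance between two vectors.
fn euclidean_sq(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Flat-index search: score every vector, sort, take the top k.
fn search_flat(vectors: &[Vec<f32>], query: &[f32], k: usize) -> Vec<usize> {
    let mut scored: Vec<(usize, f32)> = vectors
        .iter()
        .enumerate()
        .map(|(i, v)| (i, euclidean_sq(v, query)))
        .collect();
    scored.sort_by(|a, b| a.1.partial_cmp(&b.1).unwrap());
    scored.into_iter().take(k).map(|(i, _)| i).collect()
}

fn main() {
    let vectors = vec![
        vec![0.0, 0.0],
        vec![1.0, 1.0],
        vec![0.1, 0.1],
    ];
    // Query closest to index 2, then index 0.
    let nearest = search_flat(&vectors, &[0.09, 0.09], 2);
    assert_eq!(nearest, vec![2, 0]);
    println!("{:?}", nearest);
}
```

HNSW replaces the full scan with a greedy walk through a layered proximity graph, giving approximate results in roughly logarithmic time.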

Storage Layer

RocksDB Backend

OpenDB uses RocksDB as its default storage backend, providing a robust foundation for ACID transactions and high-performance data access.

Column Families

Data is organized into separate column families (namespaces):

Column Family   | Purpose                 | Data Format
----------------|-------------------------|----------------------------
default         | Key-value store         | Raw bytes
records         | Memory records          | rkyv-encoded Memory structs
graph_forward   | Forward adjacency list  | rkyv-encoded Edge arrays
graph_backward  | Backward adjacency list | rkyv-encoded Edge arrays
vector_data     | Vector embeddings       | bincode-encoded f32 arrays
vector_index    | HNSW metadata           | (currently in-memory)
metadata        | DB metadata             | JSON

Storage Trait

The storage layer is abstracted behind a trait, allowing for pluggable backends:

#![allow(unused)]
fn main() {
pub trait StorageBackend: Send + Sync {
    fn get(&self, cf: &str, key: &[u8]) -> Result<Option<Vec<u8>>>;
    fn put(&self, cf: &str, key: &[u8], value: &[u8]) -> Result<()>;
    fn delete(&self, cf: &str, key: &[u8]) -> Result<()>;
    fn scan_prefix(&self, cf: &str, prefix: &[u8]) -> Result<Vec<(Vec<u8>, Vec<u8>)>>;
    fn begin_transaction(&self) -> Result<Box<dyn Transaction>>;
    fn flush(&self) -> Result<()>;
}
}

Performance Tuning

RocksDB is configured with optimizations for mixed read/write workloads:

#![allow(unused)]
fn main() {
// Write buffer: 128MB
opts.set_write_buffer_size(128 * 1024 * 1024);

// Number of write buffers: 3
opts.set_max_write_buffer_number(3);

// Target file size: 64MB
opts.set_target_file_size_base(64 * 1024 * 1024);

// Compression: LZ4
opts.set_compression_type(rocksdb::DBCompressionType::Lz4);
}

Write-Ahead Log (WAL)

RocksDB's WAL ensures durability:

  1. All writes are first appended to the WAL
  2. Then applied to memtables
  3. Periodically flushed to SST files
  4. Old WAL segments are deleted after checkpoint
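
The append-then-replay idea behind step 1 can be illustrated with a minimal in-memory log. This is a sketch of the principle, not RocksDB's actual WAL format: because every mutation is recorded before it takes effect, the key-value state can always be rebuilt from the log after a crash.

```rust
use std::collections::HashMap;

/// Minimal WAL sketch: each mutation is appended to a log,
/// and the live state is whatever replaying the log produces.
#[derive(Clone, Debug, PartialEq)]
enum LogRecord {
    Put(String, String),
    Delete(String),
}

/// Rebuild the key-value state by replaying the log in order.
fn replay(log: &[LogRecord]) -> HashMap<String, String> {
    let mut state = HashMap::new();
    for rec in log {
        match rec {
            LogRecord::Put(k, v) => { state.insert(k.clone(), v.clone()); }
            LogRecord::Delete(k) => { state.remove(k); }
        }
    }
    state
}

fn main() {
    // Pretend the process crashed after these three appends:
    let wal = vec![
        LogRecord::Put("a".into(), "1".into()),
        LogRecord::Put("b".into(), "2".into()),
        LogRecord::Delete("a".into()),
    ];
    let recovered = replay(&wal);
    assert_eq!(recovered.get("b").map(String::as_str), Some("2"));
    assert!(recovered.get("a").is_none());
    println!("recovered {} keys", recovered.len());
}
```

Step 4 follows naturally: once a checkpoint has persisted the replayed state, the log prefix it covers carries no extra information and can be deleted.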

LSM Tree Structure

RocksDB uses a Log-Structured Merge (LSM) tree:

Write Path:
  Write → WAL → MemTable → (flush) → L0 SST → (compact) → L1 SST → ...

Read Path:
  Read → MemTable → Block Cache → L0 → L1 → ... → Ln

Advantages

  • Write Amplification: Minimized for sequential writes
  • Compression: Data is compressed at each level
  • Compaction: Background process merges and cleans data

Tradeoffs

  • Read Amplification: May need to check multiple levels
  • Space Amplification: Compaction creates temporary overhead
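
Both amplification effects are visible in a toy memtable-plus-runs model. This std-only sketch omits compaction, bloom filters, and block caches, so it is far simpler than a real LSM tree, but it shows why writes are fast (one sorted-map insert) while a read may have to probe several sorted runs:

```rust
use std::collections::BTreeMap;

/// Toy LSM: writes land in a sorted memtable; a full memtable is
/// flushed as an immutable sorted run. Reads check the memtable,
/// then runs from newest to oldest (read amplification).
struct ToyLsm {
    memtable: BTreeMap<String, String>,
    runs: Vec<Vec<(String, String)>>, // newest run last
    memtable_limit: usize,
}

impl ToyLsm {
    fn new(limit: usize) -> Self {
        Self { memtable: BTreeMap::new(), runs: Vec::new(), memtable_limit: limit }
    }

    fn put(&mut self, k: &str, v: &str) {
        self.memtable.insert(k.to_string(), v.to_string());
        if self.memtable.len() >= self.memtable_limit {
            // BTreeMap iterates in key order, so the run is sorted.
            let run: Vec<_> = std::mem::take(&mut self.memtable).into_iter().collect();
            self.runs.push(run);
        }
    }

    fn get(&self, k: &str) -> Option<&str> {
        if let Some(v) = self.memtable.get(k) {
            return Some(v.as_str());
        }
        for run in self.runs.iter().rev() {
            if let Ok(i) = run.binary_search_by(|(rk, _)| rk.as_str().cmp(k)) {
                return Some(run[i].1.as_str());
            }
        }
        None
    }
}

fn main() {
    let mut lsm = ToyLsm::new(2);
    lsm.put("a", "1");
    lsm.put("b", "2"); // triggers a flush of run [a, b]
    lsm.put("a", "3"); // newer value shadows the flushed one
    assert_eq!(lsm.get("a"), Some("3"));
    assert_eq!(lsm.get("b"), Some("2"));
    assert_eq!(lsm.get("z"), None);
    println!("ok");
}
```

Note the shadowed old value of "a" still occupying space in the flushed run: that is the space amplification that compaction exists to reclaim.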

Future Backend Options

redb (Pure Rust B-Tree)

Pros:

  • Pure Rust, no C++ dependencies
  • Simpler architecture
  • Good for read-heavy workloads

Cons:

  • Lower write throughput than LSM
  • Less mature

Custom LSM Implementation

Pros:

  • Full control over optimization
  • Pure Rust

Cons:

  • High development and maintenance cost
  • Risk of bugs in critical path

Transaction Model

OpenDB provides full ACID (Atomicity, Consistency, Isolation, Durability) guarantees through RocksDB's transaction support.

ACID Properties

Atomicity

All operations in a transaction either succeed together or fail together.

#![allow(unused)]
fn main() {
let mut txn = db.begin_transaction()?;
txn.put("records", b"key1", b"value1")?;
txn.put("records", b"key2", b"value2")?;
txn.commit()?; // Both writes succeed or both fail
}

Consistency

Transactions move the database from one consistent state to another.

Isolation

Transactions use snapshot isolation:

  • Each transaction sees a consistent snapshot of the database
  • Concurrent transactions don't interfere with each other
  • RocksDB provides MVCC (Multi-Version Concurrency Control)

Durability

Once a transaction commits, the changes are permanent:

  • Write-Ahead Log (WAL) ensures durability
  • Data survives process crashes
  • Can be verified by reopening the database

Transaction API

Basic Usage

#![allow(unused)]
fn main() {
// Begin transaction
let mut txn = db.begin_transaction()?;

// Perform operations
txn.put("records", b"key1", b"value1")?;
let val = txn.get("records", b"key1")?;

// Commit
txn.commit()?;
}

Rollback

#![allow(unused)]
fn main() {
let mut txn = db.begin_transaction()?;
txn.put("records", b"key1", b"modified")?;

// Something went wrong, rollback
txn.rollback()?;

// Original value remains unchanged
}

Auto-Rollback

Transactions are automatically rolled back if dropped without commit:

#![allow(unused)]
fn main() {
{
    let mut txn = db.begin_transaction()?;
    txn.put("records", b"key1", b"value")?;
    // txn dropped here - auto rollback
}
}

Concurrency Model

Optimistic Locking

RocksDB transactions use optimistic locking:

  1. Read phase: Transaction reads data without locks
  2. Validation phase: Before commit, check if data changed
  3. Write phase: If no conflicts, commit; otherwise abort
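
The validation phase can be sketched with per-key version numbers. These types are illustrative (not RocksDB's transaction internals): a transaction remembers the version it read, and commit fails if the stored version has moved on in the meantime.

```rust
use std::collections::HashMap;

/// Each key carries a version; a committed write bumps it.
struct VersionedStore {
    data: HashMap<String, (u64, String)>, // key -> (version, value)
}

/// An optimistic transaction records what it read, not locks.
struct OptimisticTxn {
    read_version: u64,
    key: String,
    new_value: String,
}

impl VersionedStore {
    /// Read phase: note the current version, take no locks.
    fn begin(&self, key: &str, new_value: &str) -> OptimisticTxn {
        let read_version = self.data.get(key).map(|(v, _)| *v).unwrap_or(0);
        OptimisticTxn {
            read_version,
            key: key.to_string(),
            new_value: new_value.to_string(),
        }
    }

    /// Validation + write phase: abort if the version changed.
    fn commit(&mut self, txn: OptimisticTxn) -> Result<(), &'static str> {
        let current = self.data.get(&txn.key).map(|(v, _)| *v).unwrap_or(0);
        if current != txn.read_version {
            return Err("conflict: key changed since read");
        }
        self.data.insert(txn.key, (current + 1, txn.new_value));
        Ok(())
    }
}

fn main() {
    let mut store = VersionedStore { data: HashMap::new() };
    let t1 = store.begin("counter", "1");
    let t2 = store.begin("counter", "2");
    assert!(store.commit(t1).is_ok());  // first committer wins
    assert!(store.commit(t2).is_err()); // sees the version bump, aborts
    println!("ok");
}
```

This is why the retry loop shown under "Handle Conflicts" below is the idiomatic pattern: an aborted optimistic transaction is cheap to run again.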

Conflict Detection

#![allow(unused)]
fn main() {
// Transaction 1
let mut txn1 = db.begin_transaction()?;
txn1.put("records", b"counter", b"1")?;

// Transaction 2 (concurrent)
let mut txn2 = db.begin_transaction()?;
txn2.put("records", b"counter", b"2")?;

// First to commit wins
txn1.commit()?; // Success
txn2.commit()?; // May fail with conflict error
}

Snapshot Isolation Example

#![allow(unused)]
fn main() {
// Initial state: counter = 0
db.put(b"counter", b"0")?;

// Transaction 1 reads
let mut txn1 = db.begin_transaction()?;
let val1 = txn1.get("default", b"counter")?;

// Meanwhile, Transaction 2 updates
let mut txn2 = db.begin_transaction()?;
txn2.put("default", b"counter", b"5")?;
txn2.commit()?;

// Transaction 1 still sees old snapshot
let val1_again = txn1.get("default", b"counter")?;
assert_eq!(val1, val1_again); // Still "0"
}

Best Practices

Keep Transactions Short

#![allow(unused)]
fn main() {
// โŒ Bad: Long-running transaction
let mut txn = db.begin_transaction()?;
for i in 0..1_000_000 {
    txn.put("default", &i.to_string().as_bytes(), b"value")?;
}
txn.commit()?;

// ✅ Good: Batch commits
for chunk in (0..1_000_000).collect::<Vec<_>>().chunks(1000) {
    let mut txn = db.begin_transaction()?;
    for i in chunk {
        txn.put("default", &i.to_string().as_bytes(), b"value")?;
    }
    txn.commit()?;
}
}

Handle Conflicts

#![allow(unused)]
fn main() {
loop {
    let mut txn = db.begin_transaction()?;
    
    // Read-modify-write
    let val = txn.get("default", b"counter")?.unwrap_or_default();
    let new_val = increment(val);
    txn.put("default", b"counter", &new_val)?;
    
    match txn.commit() {
        Ok(_) => break,
        Err(Error::Transaction(_)) => continue, // Retry on conflict
        Err(e) => return Err(e),
    }
}
}

Use Snapshots for Consistent Reads

For read-only operations across multiple keys, use snapshots (coming soon):

#![allow(unused)]
fn main() {
let snapshot = db.snapshot()?;
let val1 = snapshot.get("records", b"key1")?;
let val2 = snapshot.get("records", b"key2")?;
// val1 and val2 are from the same consistent point in time
}

Limitations

  • Transactions are single-threaded (one transaction per thread)
  • Very large transactions may impact performance

Note that cross-column-family transactions are supported, so multi-namespace writes are not a limitation.

Caching Strategy

OpenDB uses an LRU (Least Recently Used) cache to accelerate reads while maintaining consistency.

Cache Architecture

┌──────────────────────────────────┐
│           Application            │
└───────────────┬──────────────────┘
                │
           Read/Write
                │
┌───────────────▼──────────────────┐
│            LRU Cache             │
│  ┌──────┬──────┬──────┬──────┐   │
│  │ Hot1 │ Hot2 │ Hot3 │ Hot4 │   │
│  └──────┴──────┴──────┴──────┘   │
└───────────────┬──────────────────┘
                │
         Cache Miss/Write
                │
┌───────────────▼──────────────────┐
│         Storage Backend          │
│            (RocksDB)             │
└──────────────────────────────────┘

Write-Through Policy

All writes go to storage first, then update the cache:

#![allow(unused)]
fn main() {
pub fn put(&self, key: &[u8], value: &[u8]) -> Result<()> {
    // 1. Write to storage (ensures durability)
    self.storage.put(ColumnFamilies::DEFAULT, key, value)?;
    
    // 2. Update cache
    self.cache.insert(key.to_vec(), value.to_vec());
    
    Ok(())
}
}

Why Write-Through?

  • ✅ Durability: Data is persisted immediately
  • ✅ Consistency: Cache never has uncommitted data
  • ❌ Slower writes: Every write hits disk

Alternative: Write-Back

  • ✅ Faster writes (batch to disk later)
  • ❌ Risk of data loss if crash before flush
  • ❌ More complex consistency model

Cache Invalidation

Deletes remove from both cache and storage:

#![allow(unused)]
fn main() {
pub fn delete(&self, key: &[u8]) -> Result<()> {
    // 1. Delete from storage
    self.storage.delete(ColumnFamilies::DEFAULT, key)?;
    
    // 2. Invalidate cache
    self.cache.invalidate(&key.to_vec());
    
    Ok(())
}
}

LRU Eviction

When cache reaches capacity, least-recently-used items are evicted:

Cache (capacity = 3):
  
Put("A", "1")  →  [A]
Put("B", "2")  →  [B, A]
Put("C", "3")  →  [C, B, A]
Get("A")       →  [A, C, B]  # A is now most recent
Put("D", "4")  →  [D, A, C]  # B evicted (LRU)
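
That trace can be reproduced with a deliberately naive LRU. This Vec-backed sketch keeps the most-recent entry at the front for readability; a production cache (like the lru crate OpenDB wraps) uses a hash map plus linked list for O(1) operations instead:

```rust
/// Tiny LRU sketch: front of the Vec = most recently used,
/// tail entries past capacity are evicted.
struct TinyLru {
    capacity: usize,
    entries: Vec<(String, String)>,
}

impl TinyLru {
    fn new(capacity: usize) -> Self {
        Self { capacity, entries: Vec::new() }
    }

    fn put(&mut self, k: &str, v: &str) {
        self.entries.retain(|(ek, _)| ek != k); // drop any old copy
        self.entries.insert(0, (k.to_string(), v.to_string()));
        self.entries.truncate(self.capacity); // evict the LRU tail
    }

    fn get(&mut self, k: &str) -> Option<String> {
        let pos = self.entries.iter().position(|(ek, _)| ek == k)?;
        let entry = self.entries.remove(pos);
        let value = entry.1.clone();
        self.entries.insert(0, entry); // promote to most recent
        Some(value)
    }

    fn keys(&self) -> Vec<&str> {
        self.entries.iter().map(|(k, _)| k.as_str()).collect()
    }
}

fn main() {
    let mut cache = TinyLru::new(3);
    cache.put("A", "1");
    cache.put("B", "2");
    cache.put("C", "3");
    let _ = cache.get("A"); // A becomes most recent
    cache.put("D", "4");    // B is least recently used: evicted
    assert_eq!(cache.keys(), vec!["D", "A", "C"]);
    println!("ok");
}
```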

Cache Sizes

Default cache sizes:

#![allow(unused)]
fn main() {
pub struct OpenDBOptions {
    pub kv_cache_size: usize,       // Default: 1000
    pub record_cache_size: usize,   // Default: 500
}
}

Tuning Cache Size

#![allow(unused)]
fn main() {
let mut options = OpenDBOptions::default();
options.kv_cache_size = 10_000;      // More KV entries
options.record_cache_size = 2_000;   // More Memory records

let db = OpenDB::open_with_options("./db", options)?;
}

Guidelines:

  • Small cache (100-1000): Low memory, high cache miss rate
  • Medium cache (1000-10000): Balanced for most workloads
  • Large cache (10000+): High memory, low cache miss rate

Cache Hit Rates

Monitor effectiveness (metrics to be added):

Hit Rate = Cache Hits / Total Reads
  • > 80%: Excellent, cache is effective
  • 50-80%: Good, consider increasing size
  • < 50%: Poor, increase cache or review access patterns
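
Until built-in metrics land, the ratio can be tracked with a small wrapper around cache reads. CacheStats here is a hypothetical helper, not part of OpenDB's API:

```rust
/// Hypothetical hit/miss counter one could keep beside a cache.
#[derive(Default)]
struct CacheStats {
    hits: u64,
    misses: u64,
}

impl CacheStats {
    fn record(&mut self, hit: bool) {
        if hit { self.hits += 1 } else { self.misses += 1 }
    }

    /// Hit Rate = Cache Hits / Total Reads.
    fn hit_rate(&self) -> f64 {
        let total = self.hits + self.misses;
        if total == 0 { 0.0 } else { self.hits as f64 / total as f64 }
    }
}

fn main() {
    let mut stats = CacheStats::default();
    for hit in [true, true, true, false] {
        stats.record(hit);
    }
    assert!((stats.hit_rate() - 0.75).abs() < 1e-9);
    println!("hit rate = {:.0}%", stats.hit_rate() * 100.0);
}
```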

Multi-Level Caching

OpenDB has two cache levels:

  1. Application Cache (LRU): In-process, fast
  2. RocksDB Block Cache: Built into RocksDB, shared

RocksDB Block Cache

RocksDB has its own block cache (not exposed in current API):

#![allow(unused)]
fn main() {
// Future tuning option
opts.set_block_cache_size(256 * 1024 * 1024); // 256 MB
}

Concurrent Access

Caches use parking_lot::RwLock for thread safety:

#![allow(unused)]
fn main() {
pub struct LruMemoryCache<K, V> {
    cache: RwLock<LruCache<K, V>>,
}
}
  • Reads: Multiple concurrent readers
  • Writes: Exclusive lock during insert/evict

Cache Coherency Guarantees

  1. Write Visibility: Writes are immediately visible after put() returns
  2. Delete Visibility: Deletes are immediately visible after delete() returns
  3. Transaction Isolation: Transactions bypass cache (read from storage snapshot)

Best Practices

Warm Up Cache

#![allow(unused)]
fn main() {
// Preload important data
let important_ids = vec!["mem_001", "mem_002", "mem_003"];
for id in important_ids {
    db.get_memory(id)?;  // Populate cache
}
}

Avoid Thrashing

#![allow(unused)]
fn main() {
// โŒ Bad: Random access pattern, poor cache hit rate
for i in 0..1_000_000 {
    let random_key = generate_random_key();
    db.get(&random_key)?;
}

// ✅ Good: Sequential or localized access
for i in 0..1000 {
    db.get(&format!("key_{}", i).as_bytes())?;
}
}

Cache Bypass for Large Scans

For scanning large datasets, consider bypassing cache (future feature):

#![allow(unused)]
fn main() {
// Future API
db.scan_prefix_no_cache(b"prefix")?;
}

Key-Value Store API

OpenDB provides a simple, fast key-value interface for storing arbitrary binary data.

Basic Operations

Put

Store a value under a key:

#![allow(unused)]
fn main() {
use opendb::OpenDB;

let db = OpenDB::open("./db")?;
db.put(b"user:123", b"Alice")?;
}

Signature:

#![allow(unused)]
fn main() {
pub fn put(&self, key: &[u8], value: &[u8]) -> Result<()>
}

Behavior:

  • Writes to storage immediately (write-through cache)
  • Updates cache
  • Returns error if storage fails

Get

Retrieve a value by key:

#![allow(unused)]
fn main() {
let value = db.get(b"user:123")?;
match value {
    Some(bytes) => println!("Found: {}", String::from_utf8_lossy(&bytes)),
    None => println!("Not found"),
}
}

Signature:

#![allow(unused)]
fn main() {
pub fn get(&self, key: &[u8]) -> Result<Option<Vec<u8>>>
}

Behavior:

  • Checks cache first (fast path)
  • Falls back to storage on cache miss
  • Returns None if key doesn't exist

Delete

Remove a key-value pair:

#![allow(unused)]
fn main() {
db.delete(b"user:123")?;
}

Signature:

#![allow(unused)]
fn main() {
pub fn delete(&self, key: &[u8]) -> Result<()>
}

Behavior:

  • Removes from storage
  • Invalidates cache entry
  • Succeeds even if key doesn't exist

Exists

Check if a key exists without fetching the value:

#![allow(unused)]
fn main() {
if db.exists(b"user:123")? {
    println!("User exists");
}
}

Signature:

#![allow(unused)]
fn main() {
pub fn exists(&self, key: &[u8]) -> Result<bool>
}

Behavior:

  • Checks cache first
  • Falls back to storage on cache miss
  • More efficient than get() for existence checks

Advanced Operations

Scan Prefix

Iterate over all keys with a common prefix:

#![allow(unused)]
fn main() {
let users = db.scan_prefix(b"user:")?;
for (key, value) in users {
    println!("{} = {}", 
        String::from_utf8_lossy(&key),
        String::from_utf8_lossy(&value)
    );
}
}

Signature:

#![allow(unused)]
fn main() {
pub fn scan_prefix(&self, prefix: &[u8]) -> Result<Vec<(Vec<u8>, Vec<u8>)>>
}

Behavior:

  • Bypasses cache (reads from storage)
  • Returns all matching key-value pairs
  • Sorted by key (lexicographic order)
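
The reason prefix scans are cheap in a sorted store is that all keys sharing a prefix form one contiguous range. That property is easy to demonstrate with a std BTreeMap (a sketch of the idea, not OpenDB's implementation):

```rust
use std::collections::BTreeMap;

/// Prefix scan over a sorted map: seek to the prefix, then take
/// entries while they still start with it. No full scan needed.
fn scan_prefix(map: &BTreeMap<Vec<u8>, Vec<u8>>, prefix: &[u8]) -> Vec<(Vec<u8>, Vec<u8>)> {
    map.range(prefix.to_vec()..)
        .take_while(|(k, _)| k.starts_with(prefix))
        .map(|(k, v)| (k.clone(), v.clone()))
        .collect()
}

fn main() {
    let mut map = BTreeMap::new();
    map.insert(b"session:abc".to_vec(), b"user:123".to_vec());
    map.insert(b"user:123".to_vec(), b"Alice".to_vec());
    map.insert(b"user:456".to_vec(), b"Bob".to_vec());

    let users = scan_prefix(&map, b"user:");
    assert_eq!(users.len(), 2);
    // Results come back sorted by key, as with OpenDB's scan_prefix.
    assert_eq!(users[0].0, b"user:123".to_vec());
    println!("found {} user keys", users.len());
}
```

RocksDB iterators work the same way conceptually: seek to the first key ≥ the prefix, then iterate until the prefix no longer matches.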

Usage Patterns

Namespacing

Use prefixes to organize data:

#![allow(unused)]
fn main() {
// User namespace
db.put(b"user:123", b"Alice")?;
db.put(b"user:456", b"Bob")?;

// Session namespace
db.put(b"session:abc", b"user:123")?;
db.put(b"session:xyz", b"user:456")?;

// Scan all users
let users = db.scan_prefix(b"user:")?;
}

Counter

Implement atomic counters with transactions:

#![allow(unused)]
fn main() {
fn increment_counter(db: &OpenDB, key: &[u8]) -> Result<u64> {
    let mut txn = db.begin_transaction()?;
    
    let current = txn.get("default", key)?
        .map(|v| u64::from_le_bytes(v.try_into().unwrap()))
        .unwrap_or(0);
    
    let new_val = current + 1;
    txn.put("default", key, &new_val.to_le_bytes())?;
    txn.commit()?;
    
    Ok(new_val)
}

let count = increment_counter(&db, b"visits")?;
}

Binary Data

Store any serializable type:

#![allow(unused)]
fn main() {
use serde::{Serialize, Deserialize};

#[derive(Serialize, Deserialize)]
struct Config {
    host: String,
    port: u16,
}

let config = Config {
    host: "localhost".to_string(),
    port: 8080,
};

// Serialize
let bytes = bincode::serialize(&config)?;
db.put(b"config", &bytes)?;

// Deserialize
let bytes = db.get(b"config")?.unwrap();
let config: Config = bincode::deserialize(&bytes)?;
}

Performance Characteristics

Operation     | Time Complexity | Cache Hit | Cache Miss
--------------|-----------------|-----------|---------------
get()         | O(1) avg        | ~100ns    | ~1-10µs
put()         | O(log n)        | ~1-10µs   | ~1-10µs
delete()      | O(log n)        | ~1-10µs   | ~1-10µs
exists()      | O(1) avg        | ~100ns    | ~1-10µs
scan_prefix() | O(k log n)      | N/A       | ~10µs + k*1µs

Where:

  • n = total keys in database
  • k = number of matching keys

Error Handling

All operations return Result<T, Error>:

#![allow(unused)]
fn main() {
use opendb::{OpenDB, Error};

match db.get(b"key") {
    Ok(Some(value)) => { /* use value */ },
    Ok(None) => { /* key not found */ },
    Err(Error::Storage(e)) => { /* storage error */ },
    Err(Error::Cache(e)) => { /* cache error */ },
    Err(e) => { /* other error */ },
}
}

Thread Safety

All KV operations are thread-safe:

#![allow(unused)]
fn main() {
use std::sync::Arc;
use std::thread;

let db = Arc::new(OpenDB::open("./db")?);

let handles: Vec<_> = (0..10).map(|i| {
    let db = Arc::clone(&db);
    thread::spawn(move || {
        db.put(format!("key_{}", i).as_bytes(), b"value").unwrap();
    })
}).collect();

for handle in handles {
    handle.join().unwrap();
}
}

Records API

The Records API manages structured Memory objects with metadata, timestamps, and embeddings.

Memory Type

#![allow(unused)]
fn main() {
pub struct Memory {
    pub id: String,
    pub content: String,
    pub embedding: Vec<f32>,
    pub importance: f64,
    pub timestamp: i64,
    pub metadata: HashMap<String, String>,
}
}

Creating Memories

New Memory

#![allow(unused)]
fn main() {
use opendb::{OpenDB, Memory};

let memory = Memory::new(
    "mem_001".to_string(),
    "User asked about Rust ownership".to_string(),
);
}

With Metadata

#![allow(unused)]
fn main() {
let memory = Memory::new("mem_002".to_string(), "Content".to_string())
    .with_metadata("category", "conversation")
    .with_metadata("user_id", "123");
}

Custom Builder

#![allow(unused)]
fn main() {
use std::collections::HashMap;

let mut metadata = HashMap::new();
metadata.insert("priority".to_string(), "high".to_string());

let memory = Memory {
    id: "mem_003".to_string(),
    content: "Important note".to_string(),
    embedding: vec![0.1, 0.2, 0.3], // 3D for demo
    importance: 0.95,
    timestamp: chrono::Utc::now().timestamp(),
    metadata,
};
}

CRUD Operations

Insert

#![allow(unused)]
fn main() {
let db = OpenDB::open("./db")?;
let memory = Memory::new("mem_001".to_string(), "Hello world".to_string());
db.insert_memory(&memory)?;
}

Signature:

#![allow(unused)]
fn main() {
pub fn insert_memory(&self, memory: &Memory) -> Result<()>
}

Behavior:

  • Serializes with rkyv (zero-copy)
  • Writes to records column family
  • Updates cache
  • If embedding is non-empty, stores in vector index (requires rebuild for search)

Get

#![allow(unused)]
fn main() {
let memory = db.get_memory("mem_001")?;
match memory {
    Some(mem) => println!("Content: {}", mem.content),
    None => println!("Not found"),
}
}

Signature:

#![allow(unused)]
fn main() {
pub fn get_memory(&self, id: &str) -> Result<Option<Memory>>
}

Behavior:

  • Checks cache first
  • Deserializes from storage on cache miss
  • Returns None if not found

Update

#![allow(unused)]
fn main() {
let mut memory = db.get_memory("mem_001")?.unwrap();
memory.content = "Updated content".to_string();
memory.importance = 0.9;
memory.touch(); // Update timestamp
db.insert_memory(&memory)?; // Upsert
}

Note: insert_memory() acts as upsert (update if exists, insert if not).

Delete

#![allow(unused)]
fn main() {
db.delete_memory("mem_001")?;
}

Signature:

#![allow(unused)]
fn main() {
pub fn delete_memory(&self, id: &str) -> Result<()>
}

Behavior:

  • Removes from storage
  • Invalidates cache
  • Does not remove from vector index (requires rebuild)
  • Does not remove graph edges (handle separately)

Listing Operations

List All IDs

#![allow(unused)]
fn main() {
let ids = db.list_memory_ids()?;
for id in ids {
    println!("Memory ID: {}", id);
}
}

Signature:

#![allow(unused)]
fn main() {
pub fn list_memory_ids(&self) -> Result<Vec<String>>
}

List All Memories

#![allow(unused)]
fn main() {
let memories = db.list_memories()?;
for memory in memories {
    println!("{}: {}", memory.id, memory.content);
}
}

Signature:

#![allow(unused)]
fn main() {
pub fn list_memories(&self) -> Result<Vec<Memory>>
}

Warning: Loads all memories into memory. For large datasets, use pagination (not yet implemented) or filter by prefix.

Advanced Usage

Importance Filtering

#![allow(unused)]
fn main() {
let memories = db.list_memories()?;
let important: Vec<_> = memories.into_iter()
    .filter(|m| m.importance > 0.8)
    .collect();
}

Metadata Queries

#![allow(unused)]
fn main() {
let memories = db.list_memories()?;
let category_matches: Vec<_> = memories.into_iter()
    .filter(|m| {
        m.metadata.get("category")
            .map(|v| v == "conversation")
            .unwrap_or(false)
    })
    .collect();
}

Time Range Queries

#![allow(unused)]
fn main() {
use chrono::{Utc, Duration};

let one_hour_ago = (Utc::now() - Duration::hours(1)).timestamp();
let recent: Vec<_> = db.list_memories()?.into_iter()
    .filter(|m| m.timestamp > one_hour_ago)
    .collect();
}

Embeddings

Setting Embeddings

Embeddings enable semantic search:

#![allow(unused)]
fn main() {
let embedding = generate_embedding("Hello world"); // Your embedding model
let memory = Memory {
    id: "mem_001".to_string(),
    content: "Hello world".to_string(),
    embedding, // Vec<f32>
    ..Default::default()
};
db.insert_memory(&memory)?;
}

Dimension Requirements

All embeddings must have the same dimension (default 384):

#![allow(unused)]
fn main() {
use opendb::OpenDBOptions;

let mut options = OpenDBOptions::default();
options.vector_dimension = 768; // For larger models
let db = OpenDB::open_with_options("./db", options)?;
}

Searching Embeddings

See Vector API for semantic search.

Touch Timestamp

Update access time without modifying content:

#![allow(unused)]
fn main() {
let mut memory = db.get_memory("mem_001")?.unwrap();
memory.touch(); // Sets timestamp to now
db.insert_memory(&memory)?;
}

Default Values

#![allow(unused)]
fn main() {
impl Default for Memory {
    fn default() -> Self {
        Self {
            id: String::new(),
            content: String::new(),
            embedding: Vec::new(),
            importance: 0.5,
            timestamp: chrono::Utc::now().timestamp(),
            metadata: HashMap::new(),
        }
    }
}
}

Performance Tips

  1. Batch Inserts: Use transactions for multiple inserts:
#![allow(unused)]
fn main() {
let mut txn = db.begin_transaction()?;
for memory in memories {
    // Insert via transaction (lower-level API needed)
}
txn.commit()?;
}
  2. Cache Warm-Up: Preload frequently accessed memories:
#![allow(unused)]
fn main() {
for id in important_ids {
    db.get_memory(id)?; // Populate cache
}
}
  3. Lazy Embedding Generation: Only generate embeddings when needed for search:
#![allow(unused)]
fn main() {
let memory = Memory {
    id,
    content,
    ..Default::default()
};
// Don't set embedding unless search is required
db.insert_memory(&memory)?;
}

Error Handling

#![allow(unused)]
fn main() {
use opendb::Error;

match db.get_memory("mem_001") {
    Ok(Some(memory)) => { /* use memory */ },
    Ok(None) => { /* not found */ },
    Err(Error::Codec(_)) => { /* deserialization error */ },
    Err(Error::Storage(_)) => { /* storage error */ },
    Err(e) => { /* other error */ },
}
}

Next

Graph API

OpenDB provides a labeled property graph for modeling relationships between memories.

Core Concepts

  • Nodes: Memory objects (referenced by ID)
  • Edges: Directed relationships with labels and weights
  • Relations: String labels like "causes", "before", "similar_to"

Edge Type

#![allow(unused)]
fn main() {
pub struct Edge {
    pub from: String,
    pub relation: String,
    pub to: String,
    pub weight: f64,
    pub timestamp: i64,
}
}

Linking Memories

#![allow(unused)]
fn main() {
use opendb::{Memory, OpenDB};

let db = OpenDB::open("./db")?;

// Create two memories
let mem1 = Memory { id: "mem_001".into(), content: "Rust is fast".into(), ..Default::default() };
let mem2 = Memory { id: "mem_002".into(), content: "C++ is fast".into(), ..Default::default() };
db.insert_memory(&mem1)?;
db.insert_memory(&mem2)?;

// Link them
db.link("mem_001", "mem_002", "similar_to")?;
}

Signature:

#![allow(unused)]
fn main() {
pub fn link(&self, from: &str, to: &str, relation: &str) -> Result<()>
}

Behavior:

  • Creates directed edge from from โ†’ to
  • Default weight: 1.0
  • Stores in both forward and backward indexes
  • Allows multiple relations between same nodes

Custom Weight

#![allow(unused)]
fn main() {
use opendb::{OpenDB, Edge};

let edge = Edge {
    from: "mem_001".to_string(),
    relation: "causes".to_string(),
    to: "mem_002".to_string(),
    weight: 0.85,  // Custom confidence score
    timestamp: chrono::Utc::now().timestamp(),
};

// Link via graph manager (internal API, use link() for simple cases)
}

Unlinking

Remove a specific relationship:

#![allow(unused)]
fn main() {
db.unlink("mem_001", "mem_002", "similar_to")?;
}

Signature:

#![allow(unused)]
fn main() {
pub fn unlink(&self, from: &str, to: &str, relation: &str) -> Result<()>
}

Behavior:

  • Removes edge from both indexes
  • Succeeds even if edge doesn't exist
  • Does not delete the nodes

Querying Relationships

#![allow(unused)]
fn main() {
let related = db.get_related("mem_001", "similar_to")?;
for edge in related {
    println!("{} --[{}]--> {} (weight: {})", 
        edge.from, edge.relation, edge.to, edge.weight);
}
}

Signature:

#![allow(unused)]
fn main() {
pub fn get_related(&self, id: &str, relation: &str) -> Result<Vec<Edge>>
}

Returns: All edges from id with the specified relation.

Get Outgoing Edges

#![allow(unused)]
fn main() {
let outgoing = db.get_outgoing("mem_001")?;
for edge in outgoing {
    println!("Outgoing: {} --[{}]--> {}", edge.from, edge.relation, edge.to);
}
}

Signature:

#![allow(unused)]
fn main() {
pub fn get_outgoing(&self, id: &str) -> Result<Vec<Edge>>
}

Returns: All edges where id is the source (all relations).

Get Incoming Edges

#![allow(unused)]
fn main() {
let incoming = db.get_incoming("mem_002")?;
for edge in incoming {
    println!("Incoming: {} --[{}]--> {}", edge.from, edge.relation, edge.to);
}
}

Signature:

#![allow(unused)]
fn main() {
pub fn get_incoming(&self, id: &str) -> Result<Vec<Edge>>
}

Returns: All edges where id is the target (all relations).

Relation Types

OpenDB provides predefined relation constants:

#![allow(unused)]
fn main() {
pub mod relation {
    pub const RELATED_TO: &str = "related_to";
    pub const CAUSED_BY: &str = "caused_by";
    pub const BEFORE: &str = "before";
    pub const AFTER: &str = "after";
    pub const REFERENCES: &str = "references";
    pub const SIMILAR_TO: &str = "similar_to";
    pub const CONTRADICTS: &str = "contradicts";
    pub const SUPPORTS: &str = "supports";
}
}

Usage

#![allow(unused)]
fn main() {
use opendb::graph::relation;

db.link("mem_001", "mem_002", relation::CAUSED_BY)?;
db.link("mem_002", "mem_003", relation::BEFORE)?;
}

Custom Relations

You can use any string as a relation:

#![allow(unused)]
fn main() {
db.link("mem_001", "mem_002", "depends_on")?;
db.link("mem_003", "mem_004", "implements")?;
}

Graph Patterns

Temporal Chain

#![allow(unused)]
fn main() {
use opendb::graph::relation;

// Build timeline
db.link("event_1", "event_2", relation::BEFORE)?;
db.link("event_2", "event_3", relation::BEFORE)?;
db.link("event_3", "event_4", relation::BEFORE)?;

// Traverse forward
let next_events = db.get_related("event_1", relation::BEFORE)?;
}

Causal Graph

#![allow(unused)]
fn main() {
use opendb::graph::relation;

// A is caused by B, B is caused by C (C is the root cause)
db.link("symptom_A", "symptom_B", relation::CAUSED_BY)?;
db.link("symptom_B", "symptom_C", relation::CAUSED_BY)?;

// Walk toward the root cause
let causes = db.get_related("symptom_A", relation::CAUSED_BY)?;
}

Knowledge Graph

#![allow(unused)]
fn main() {
use opendb::graph::relation;

// Rust has ownership
db.link("rust", "ownership", "has_feature")?;
// Ownership enables memory_safety
db.link("ownership", "memory_safety", "enables")?;
// Memory_safety prevents bugs
db.link("memory_safety", "bug_prevention", "prevents")?;

// Traverse features
let features = db.get_related("rust", "has_feature")?;
}

Bidirectional Relationships

#![allow(unused)]
fn main() {
// A is similar to B
db.link("mem_A", "mem_B", "similar_to")?;
// B is also similar to A
db.link("mem_B", "mem_A", "similar_to")?;

// Query either direction
let similar_from_A = db.get_related("mem_A", "similar_to")?;
let similar_from_B = db.get_related("mem_B", "similar_to")?;
}

Advanced Queries

Multi-Hop Traversal

#![allow(unused)]
fn main() {
fn traverse_depth_2(db: &OpenDB, start: &str, relation: &str) -> Result<Vec<String>> {
    let mut result = Vec::new();
    
    // First hop
    let hop1 = db.get_related(start, relation)?;
    for edge1 in hop1 {
        result.push(edge1.to.clone());
        
        // Second hop
        let hop2 = db.get_related(&edge1.to, relation)?;
        for edge2 in hop2 {
            result.push(edge2.to.clone());
        }
    }
    
    Ok(result)
}
}
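
The two-hop helper above generalizes to arbitrary depth with a breadth-first walk. A self-contained sketch over a plain adjacency map (the `HashMap` stands in for `get_related`, so this runs without a database; adapt the inner loop to call `db.get_related` in practice):

```rust
use std::collections::{HashMap, HashSet, VecDeque};

// Breadth-first traversal up to `max_depth` hops. `graph` maps a node id to
// the ids reachable via one edge of the chosen relation.
fn traverse(graph: &HashMap<&str, Vec<&str>>, start: &str, max_depth: usize) -> Vec<String> {
    let mut seen: HashSet<&str> = HashSet::from([start]);
    let mut queue: VecDeque<(&str, usize)> = VecDeque::from([(start, 0)]);
    let mut result = Vec::new();

    while let Some((node, depth)) = queue.pop_front() {
        if depth == max_depth {
            continue; // do not expand past the hop limit
        }
        for &next in graph.get(node).into_iter().flatten() {
            if seen.insert(next) {
                result.push(next.to_string());
                queue.push_back((next, depth + 1));
            }
        }
    }
    result
}

fn main() {
    let graph = HashMap::from([
        ("a", vec!["b", "c"]),
        ("b", vec!["d"]),
        ("d", vec!["e"]),
    ]);
    // Two hops from "a" reaches b, c (hop 1) and d (hop 2), but not e.
    let reached = traverse(&graph, "a", 2);
    assert_eq!(reached, vec!["b", "c", "d"]);
}
```

The `seen` set prevents infinite loops on cyclic graphs, which the fixed two-hop version does not guard against.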

Filter by Weight

#![allow(unused)]
fn main() {
let edges = db.get_related("mem_001", "similar_to")?;
let strong_edges: Vec<_> = edges.into_iter()
    .filter(|e| e.weight > 0.8)
    .collect();
}

Aggregate Relations

#![allow(unused)]
fn main() {
use std::collections::HashMap;

let outgoing = db.get_outgoing("mem_001")?;
let mut relation_counts: HashMap<String, usize> = HashMap::new();

for edge in outgoing {
    *relation_counts.entry(edge.relation).or_insert(0) += 1;
}

println!("Relation distribution: {:?}", relation_counts);
}

Performance Characteristics

| Operation        | Time Complexity | Notes                                 |
|------------------|-----------------|---------------------------------------|
| `link()`         | O(log n)        | Two index writes (forward + backward) |
| `unlink()`       | O(k log n)      | k = edges between nodes               |
| `get_related()`  | O(log n + k)    | k = matching edges                    |
| `get_outgoing()` | O(log n + k)    | k = total outgoing edges              |
| `get_incoming()` | O(log n + k)    | k = total incoming edges              |

Storage Details

Edges are stored in two column families:

  1. graph_forward: {from}:{relation} โ†’ Vec<Edge>
  2. graph_backward: {to}:{relation} โ†’ Vec<Edge>

This dual-indexing enables fast queries in both directions.
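
A toy model of that dual-index scheme in plain Rust (two `HashMap`s standing in for the column families; a real encoding would also escape `:` in ids):

```rust
use std::collections::HashMap;

#[derive(Clone, Debug, PartialEq)]
struct Edge {
    from: String,
    relation: String,
    to: String,
}

// Both "column families" are updated on every link, so a lookup by source
// or by target is each a single point read.
#[derive(Default)]
struct GraphIndex {
    forward: HashMap<String, Vec<Edge>>,  // "{from}:{relation}" -> edges
    backward: HashMap<String, Vec<Edge>>, // "{to}:{relation}"   -> edges
}

impl GraphIndex {
    fn link(&mut self, from: &str, to: &str, relation: &str) {
        let edge = Edge { from: from.into(), relation: relation.into(), to: to.into() };
        self.forward
            .entry(format!("{}:{}", from, relation))
            .or_default()
            .push(edge.clone());
        self.backward
            .entry(format!("{}:{}", to, relation))
            .or_default()
            .push(edge);
    }

    fn related(&self, from: &str, relation: &str) -> &[Edge] {
        self.forward
            .get(&format!("{}:{}", from, relation))
            .map(|v| v.as_slice())
            .unwrap_or(&[])
    }

    fn incoming(&self, to: &str, relation: &str) -> &[Edge] {
        self.backward
            .get(&format!("{}:{}", to, relation))
            .map(|v| v.as_slice())
            .unwrap_or(&[])
    }
}

fn main() {
    let mut index = GraphIndex::default();
    index.link("mem_001", "mem_002", "similar_to");
    assert_eq!(index.related("mem_001", "similar_to").len(), 1);
    assert_eq!(index.incoming("mem_002", "similar_to").len(), 1);
}
```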

Error Handling

#![allow(unused)]
fn main() {
use opendb::Error;

match db.link("mem_001", "mem_002", "related_to") {
    Ok(_) => println!("Link created"),
    Err(Error::Storage(_)) => println!("Storage error"),
    Err(Error::Graph(_)) => println!("Graph error"),
    Err(e) => println!("Other error: {}", e),
}
}

Next

Vector Search API

OpenDB provides semantic similarity search using an HNSW (Hierarchical Navigable Small World) index.

Overview

Vector search enables finding memories based on semantic similarity rather than exact matches:

#![allow(unused)]
fn main() {
use opendb::OpenDB;

let db = OpenDB::open("./db")?;

// Insert memories with embeddings
let memory = Memory {
    id: "mem_001".to_string(),
    content: "Rust is a systems programming language".to_string(),
    embedding: generate_embedding("Rust is a systems programming language"),
    ..Default::default()
};
db.insert_memory(&memory)?;

// Search by query embedding
let query_embedding = generate_embedding("What is Rust?");
let results = db.search_similar(&query_embedding, 5)?;
}

Search Similar

Find memories similar to a query vector:

#![allow(unused)]
fn main() {
let results = db.search_similar(&query_embedding, top_k)?;

for result in results {
    println!("ID: {}, Distance: {}", result.id, result.distance);
    let memory = db.get_memory(&result.id)?.unwrap();
    println!("Content: {}", memory.content);
}
}

Signature:

#![allow(unused)]
fn main() {
pub fn search_similar(&self, query: &[f32], top_k: usize) -> Result<Vec<SearchResult>>
}

Parameters:

  • query: Query vector (must match configured dimension)
  • top_k: Number of results to return

Returns: Vec<SearchResult> sorted by distance (closest first).

SearchResult Type

#![allow(unused)]
fn main() {
pub struct SearchResult {
    pub id: String,
    pub distance: f32,
}
}
  • id: Memory ID
  • distance: Euclidean distance (lower = more similar)

Embeddings

Dimension Configuration

Set embedding dimension when opening database:

#![allow(unused)]
fn main() {
use opendb::OpenDBOptions;

let mut options = OpenDBOptions::default();
options.vector_dimension = 768; // e.g. for sentence-transformers/all-mpnet-base-v2
let db = OpenDB::open_with_options("./db", options)?;
}

Default: 384 (for sentence-transformers/all-MiniLM-L6-v2)
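
A dimension mismatch otherwise surfaces only at search time. A cheap app-side guard (a hypothetical helper, not part of the OpenDB API) catches it at insert time instead:

```rust
// Validate an embedding against the configured dimension before inserting.
fn check_dimension(embedding: &[f32], expected: usize) -> Result<(), String> {
    if embedding.len() == expected {
        Ok(())
    } else {
        Err(format!(
            "embedding has {} dimensions, expected {}",
            embedding.len(),
            expected
        ))
    }
}

fn main() {
    assert!(check_dimension(&[0.0; 384], 384).is_ok());
    assert!(check_dimension(&[0.0; 768], 384).is_err());
}
```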

Generating Embeddings

OpenDB does not include embedding generation. Use external models:

Example: sentence-transformers (Python)

from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')
embedding = model.encode("Hello world").tolist()  # [0.1, -0.2, ...]

Example: OpenAI API

#![allow(unused)]
fn main() {
// Pseudo-code (use openai-rust crate)
let embedding = openai_client
    .embeddings("text-embedding-ada-002")
    .create("Hello world")
    .await?;
}

Example: Candle (Rust)

#![allow(unused)]
fn main() {
// Use candle-transformers for local inference
// See: https://github.com/huggingface/candle
}

Synthetic Embeddings (Testing)

For testing without real models:

#![allow(unused)]
fn main() {
fn generate_synthetic_embedding(text: &str, dimension: usize) -> Vec<f32> {
    use std::collections::hash_map::DefaultHasher;
    use std::hash::{Hash, Hasher};
    use rand::{rngs::StdRng, Rng, SeedableRng}; // requires the `rand` crate
    
    let mut hasher = DefaultHasher::new();
    text.hash(&mut hasher);
    let seed = hasher.finish();
    
    // Seeding from the text hash keeps embeddings deterministic per input
    let mut rng = StdRng::seed_from_u64(seed);
    (0..dimension).map(|_| rng.gen_range(-1.0..1.0)).collect()
}
}

Index Management

Automatic Index Building

The HNSW index is built automatically on first search:

#![allow(unused)]
fn main() {
// Insert memories
db.insert_memory(&memory1)?;
db.insert_memory(&memory2)?;

// First search triggers index build
let results = db.search_similar(&query, 5)?; // Builds index here
}

Manual Rebuild

Force index rebuild (e.g., after bulk inserts):

#![allow(unused)]
fn main() {
db.rebuild_vector_index()?;
}

Signature:

#![allow(unused)]
fn main() {
pub fn rebuild_vector_index(&self) -> Result<()>
}

When to rebuild:

  • After bulk memory inserts
  • After changing embeddings
  • After deleting memories (to drop them from the index)

Note: Search automatically rebuilds if index is stale.

HNSW Parameters

HNSW has tunable parameters that trade off speed against accuracy:

Default Parameters

#![allow(unused)]
fn main() {
pub struct HnswParams {
    pub ef_construction: usize, // 200
    pub max_neighbors: usize,   // 16
}
}

Presets

#![allow(unused)]
fn main() {
// High accuracy (slower build, better recall)
HnswParams::high_accuracy()  // ef=400, neighbors=32

// High speed (faster build, lower recall)
HnswParams::high_speed()     // ef=100, neighbors=8

// Balanced (default)
HnswParams::default()        // ef=200, neighbors=16
}

Note: These presets are not currently exposed in the OpenDB API; future versions will allow tuning.

Distance Metric

OpenDB uses Euclidean distance:

$$ d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2} $$

Properties:

  • Lower distance = more similar
  • Distance 0 = identical vectors
  • Sensitive to magnitude (normalize if needed)
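
The formula above in plain Rust, useful as a reference for checking distance thresholds (this is not OpenDB's internal kernel):

```rust
// Euclidean (L2) distance between two equal-length vectors.
fn euclidean(p: &[f32], q: &[f32]) -> f32 {
    assert_eq!(p.len(), q.len(), "vectors must share a dimension");
    p.iter()
        .zip(q)
        .map(|(a, b)| (a - b) * (a - b))
        .sum::<f32>()
        .sqrt()
}

fn main() {
    assert_eq!(euclidean(&[0.0, 0.0], &[3.0, 4.0]), 5.0); // 3-4-5 triangle
    assert_eq!(euclidean(&[1.0, 2.0], &[1.0, 2.0]), 0.0); // identical vectors
}
```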

Normalization

For cosine similarity behavior, normalize embeddings:

#![allow(unused)]
fn main() {
fn normalize(vec: &mut [f32]) {
    let magnitude: f32 = vec.iter().map(|x| x * x).sum::<f32>().sqrt();
    if magnitude > 0.0 { // guard against the zero vector (division would yield NaN)
        for x in vec.iter_mut() {
            *x /= magnitude;
        }
    }
}

let mut embedding = generate_embedding(text);
normalize(&mut embedding);
}

Usage Patterns

#![allow(unused)]
fn main() {
// User asks a question
let query = "How do I prevent memory leaks in Rust?";
let query_embedding = generate_embedding(query);

// Find relevant memories
let results = db.search_similar(&query_embedding, 3)?;
for result in results {
    let memory = db.get_memory(&result.id)?.unwrap();
    println!("Relevant memory: {}", memory.content);
}
}

Deduplication

Find duplicate or near-duplicate content:

#![allow(unused)]
fn main() {
let new_content = "Rust ownership prevents data races";
let new_embedding = generate_embedding(new_content);

let similar = db.search_similar(&new_embedding, 1)?;
if let Some(top) = similar.first() {
    if top.distance < 0.1 {  // Threshold for "duplicate"
        println!("Similar content already exists: {}", top.id);
    }
}
}

Clustering

Group similar memories:

#![allow(unused)]
fn main() {
let all_memories = db.list_memories()?;
let mut clusters: Vec<Vec<String>> = Vec::new();

for memory in all_memories {
    if memory.embedding.is_empty() {
        continue;
    }
    
    let similar = db.search_similar(&memory.embedding, 5)?;
    let cluster: Vec<String> = similar.iter()
        .filter(|r| r.distance < 0.5)  // Similarity threshold
        .map(|r| r.id.clone())
        .collect();
    
    clusters.push(cluster);
}
}

Performance Characteristics

| Operation                | Time Complexity | Typical Latency            |
|--------------------------|-----------------|----------------------------|
| `search_similar()`       | O(log n)        | ~1-10ms                    |
| `rebuild_vector_index()` | O(n log n)      | ~100ms per 1k vectors      |
| Insert with embedding    | O(1) + rebuild  | Instant (rebuild deferred) |

Scalability:

  • 100-1k memories: Instant search
  • 1k-10k memories: <10ms search
  • 10k-100k memories: <50ms search
  • 100k+ memories: Consider sharding (future feature)

Limitations

  1. Dimension Mismatch: All embeddings must have same dimension
  2. No Incremental Updates: Index rebuild is full reconstruction
  3. Memory Usage: HNSW index kept in memory (~4 bytes ร— dimension ร— count)
  4. No GPU Support: Pure CPU implementation
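
Plugging numbers into the ~4 bytes × dimension × count estimate from item 3 (raw vector storage only; the HNSW neighbor lists add further per-node overhead):

```rust
// Rough memory footprint of the raw vectors held by the index.
fn vector_bytes(dimension: usize, count: usize) -> usize {
    4 * dimension * count // f32 = 4 bytes per component
}

fn main() {
    let bytes = vector_bytes(384, 100_000);
    assert_eq!(bytes, 153_600_000); // ~146 MiB for 100k default-dim vectors
    println!("{:.1} MiB", bytes as f64 / (1024.0 * 1024.0));
}
```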

Error Handling

#![allow(unused)]
fn main() {
use opendb::Error;

match db.search_similar(&query, 10) {
    Ok(results) => { /* use results */ },
    Err(Error::VectorIndex(e)) => println!("Index error: {}", e),
    Err(Error::InvalidInput(e)) => println!("Bad query: {}", e),
    Err(e) => println!("Other error: {}", e),
}
}

Best Practices

  1. Batch Inserts: Insert all memories, then rebuild once:
#![allow(unused)]
fn main() {
for memory in memories {
    db.insert_memory(&memory)?;
}
db.rebuild_vector_index()?; // One rebuild for all
}
  2. Lazy Embeddings: Only generate embeddings for searchable content:
#![allow(unused)]
fn main() {
let memory = Memory {
    id,
    content,
    ..Default::default()
};
// Don't set embedding if this memory won't be searched
db.insert_memory(&memory)?;
}
  3. Relevance Filtering: Filter by distance threshold:
#![allow(unused)]
fn main() {
let results = db.search_similar(&query, 20)?;
let relevant: Vec<_> = results.into_iter()
    .filter(|r| r.distance < 1.0)  // Adjust threshold
    .collect();
}
  4. Combine with Metadata: Use metadata to post-filter:
#![allow(unused)]
fn main() {
let results = db.search_similar(&query, 50)?;
for result in results {
    let memory = db.get_memory(&result.id)?.unwrap();
    if memory.metadata.get("category") == Some(&"docs".to_string()) {
        println!("Relevant doc: {}", memory.content);
    }
}
}

Next

Multimodal File Support

OpenDB provides production-ready support for multimodal file processing, designed specifically for AI/LLM applications, RAG (Retrieval Augmented Generation) pipelines, and agent memory systems.

Overview

The multimodal API enables you to:

  • Detect and classify file types (PDF, DOCX, audio, video, text)
  • Process and chunk large documents
  • Store extracted text with embeddings
  • Track processing status for async workflows
  • Add custom metadata for any file type

File Type Detection

FileType Enum

The FileType enum represents supported file formats:

#![allow(unused)]
fn main() {
use opendb::FileType;

// Automatic detection from file extension
let pdf_type = FileType::from_extension("pdf");
assert_eq!(pdf_type, FileType::Pdf);

let audio_type = FileType::from_extension("mp3");
assert_eq!(audio_type, FileType::Audio);

// Get human-readable description
println!("{}", pdf_type.description()); // "PDF document"
println!("{}", audio_type.description()); // "Audio file"
}

Supported File Types

| FileType | Extensions               | Description             |
|----------|--------------------------|-------------------------|
| Text     | .txt                     | Plain text file         |
| Pdf      | .pdf                     | PDF document            |
| Docx     | .docx                    | Microsoft Word document |
| Audio    | .mp3, .wav, .ogg, .flac  | Audio file              |
| Video    | .mp4, .avi, .mkv, .mov   | Video file              |
| Image    | .jpg, .png, .gif, .bmp   | Image file              |
| Unknown  | (others)                 | Unknown file type       |

Example: File Type Detection

#![allow(unused)]
fn main() {
use opendb::FileType;

fn detect_file_type(filename: &str) -> FileType {
    let extension = filename
        .rsplit('.')
        .next()
        .unwrap_or("");
    
    FileType::from_extension(extension)
}

// Usage
let file = "research_paper.pdf";
let file_type = detect_file_type(file);

match file_type {
    FileType::Pdf => println!("Processing PDF document"),
    FileType::Audio => println!("Transcribing audio file"),
    FileType::Video => println!("Extracting video captions"),
    _ => println!("Unsupported file type"),
}
}

Multimodal Documents

MultimodalDocument Structure

The MultimodalDocument struct represents a processed file with extracted content:

#![allow(unused)]
fn main() {
pub struct MultimodalDocument {
    pub id: String,
    pub filename: String,
    pub file_type: FileType,
    pub file_size: usize,
    pub extracted_text: String,
    pub chunks: Vec<DocumentChunk>,
    pub embedding: Option<Vec<f32>>,
    pub metadata: HashMap<String, String>,
    pub processing_status: ProcessingStatus,
    pub created_at: DateTime<Utc>,
    pub updated_at: DateTime<Utc>,
}
}

CRUD Operations

Create

#![allow(unused)]
fn main() {
use opendb::{MultimodalDocument, FileType};

// Create a new multimodal document
let doc = MultimodalDocument::new(
    "doc_001",                     // Unique ID
    "research_paper.pdf",          // Filename
    FileType::Pdf,                 // File type
    1024 * 500,                    // File size in bytes (500 KB)
    "Extracted text content...",   // Extracted text
    vec![0.1; 384],                // Document embedding (384-dim)
);

// Add metadata
let doc = doc
    .with_metadata("author", "Dr. Jane Smith")
    .with_metadata("pages", "25")
    .with_metadata("year", "2024")
    .with_metadata("category", "machine-learning");

println!("Created document: {}", doc.id);
println!("Status: {:?}", doc.processing_status);
}

Read

#![allow(unused)]
fn main() {
// Access document properties
println!("Filename: {}", doc.filename);
println!("File type: {:?}", doc.file_type);
println!("File size: {} KB", doc.file_size / 1024);
println!("Extracted text length: {} chars", doc.extracted_text.len());
println!("Number of chunks: {}", doc.chunks.len());

// Access metadata
if let Some(author) = doc.metadata.get("author") {
    println!("Author: {}", author);
}

// Check processing status
match &doc.processing_status {
    ProcessingStatus::Completed => println!("โœ“ Processing complete"),
    ProcessingStatus::Processing => println!("โณ Still processing..."),
    ProcessingStatus::Failed(err) => println!("โœ— Failed: {}", err),
    ProcessingStatus::Queued => println!("โธ Queued for processing"),
}
}

Update

#![allow(unused)]
fn main() {
use opendb::ProcessingStatus;

// Update processing status
let mut doc = doc.clone();
doc.processing_status = ProcessingStatus::Processing;

// Add more metadata
doc.metadata.insert("processed_by".to_string(), "worker-01".to_string());
doc.metadata.insert("processing_time_ms".to_string(), "1234".to_string());

// Mark as completed
doc.processing_status = ProcessingStatus::Completed;
doc.updated_at = chrono::Utc::now();

println!("Updated document: {}", doc.id);
}

Delete

#![allow(unused)]
fn main() {
// In OpenDB, you would typically delete by ID using the database handle
// This is a conceptual example showing how to remove from memory

let mut documents: Vec<MultimodalDocument> = vec![/* ... */];
documents.retain(|d| d.id != "doc_001");

println!("Document deleted");
}

Document Chunking

DocumentChunk Structure

For large documents, use DocumentChunk to split content into processable segments:

#![allow(unused)]
fn main() {
pub struct DocumentChunk {
    pub chunk_id: String,
    pub content: String,
    pub embedding: Option<Vec<f32>>,
    pub start_offset: usize,
    pub end_offset: usize,
    pub metadata: HashMap<String, String>,
}
}

Creating Chunks

#![allow(unused)]
fn main() {
use opendb::{DocumentChunk, MultimodalDocument};

let mut doc = MultimodalDocument::new(
    "doc_002",
    "large_book.pdf",
    FileType::Pdf,
    1024 * 1024 * 5, // 5 MB
    "Full book content...",
    vec![0.1; 384],
);

// Add chunks (e.g., by chapter or page)
doc.add_chunk(DocumentChunk::new(
    "chunk_0",
    "Chapter 1: Introduction to Rust programming...",
    vec![0.15; 384],  // Chunk-specific embedding
    0,                // Start offset
    1500,             // End offset
).with_metadata("chapter", "1")
  .with_metadata("page_start", "1")
  .with_metadata("page_end", "15"));

doc.add_chunk(DocumentChunk::new(
    "chunk_1",
    "Chapter 2: Ownership and Borrowing...",
    vec![0.25; 384],
    1500,
    3200,
).with_metadata("chapter", "2")
  .with_metadata("page_start", "16")
  .with_metadata("page_end", "32"));

println!("Added {} chunks", doc.chunks.len());
}

Chunk Strategies

1. Fixed-Size Chunking

#![allow(unused)]
fn main() {
fn chunk_by_size(text: &str, chunk_size: usize) -> Vec<String> {
    text.chars()
        .collect::<Vec<_>>()
        .chunks(chunk_size)
        .map(|chunk| chunk.iter().collect())
        .collect()
}

// Usage
let text = "Very long document text...";
let chunks = chunk_by_size(&text, 1000);
}

2. Paragraph-Based Chunking

#![allow(unused)]
fn main() {
fn chunk_by_paragraphs(text: &str, max_paragraphs: usize) -> Vec<String> {
    text.split("\n\n")
        .collect::<Vec<_>>()
        .chunks(max_paragraphs)
        .map(|chunk| chunk.join("\n\n"))
        .collect()
}

// Usage
let chunks = chunk_by_paragraphs(&text, 3);
}

3. Token-Based Chunking (for LLMs)

#![allow(unused)]
fn main() {
// Requires tiktoken-rs or similar tokenizer
fn chunk_by_tokens(text: &str, max_tokens: usize) -> Vec<String> {
    // Pseudo-code - use actual tokenizer in production
    let tokens = tokenize(text);
    tokens
        .chunks(max_tokens)
        .map(|chunk| detokenize(chunk))
        .collect()
}
}

Processing Status

ProcessingStatus Enum

Track the lifecycle of document processing:

#![allow(unused)]
fn main() {
use opendb::ProcessingStatus;

// Status variants
let queued = ProcessingStatus::Queued;
let processing = ProcessingStatus::Processing;
let completed = ProcessingStatus::Completed;
let failed = ProcessingStatus::Failed("OCR error".to_string());

// Pattern matching
match doc.processing_status {
    ProcessingStatus::Queued => {
        println!("Document is queued for processing");
    }
    ProcessingStatus::Processing => {
        println!("Processing in progress...");
    }
    ProcessingStatus::Completed => {
        println!("โœ“ Processing completed successfully");
    }
    ProcessingStatus::Failed(error) => {
        eprintln!("โœ— Processing failed: {}", error);
    }
}
}

Production Workflow

Complete PDF Processing Example

#![allow(unused)]
fn main() {
use opendb::{OpenDB, MultimodalDocument, DocumentChunk, FileType, ProcessingStatus, Result};
use std::fs;

fn process_pdf(filepath: &str, db: &OpenDB) -> Result<String> {
    // 1. Read file
    let file_bytes = fs::read(filepath)?;
    let filename = filepath.rsplit('/').next().unwrap();
    
    // 2. Extract text (use pdf-extract or pdfium in production)
    let extracted_text = extract_pdf_text(&file_bytes)?;
    
    // 3. Generate document embedding
    let doc_embedding = generate_embedding(&extracted_text)?;
    
    // 4. Create multimodal document
    let mut doc = MultimodalDocument::new(
        &generate_id(),
        filename,
        FileType::Pdf,
        file_bytes.len(),
        &extracted_text,
        doc_embedding,
    )
    .with_metadata("source", "upload")
    .with_metadata("pages", &count_pages(&file_bytes).to_string());
    
    // 5. Chunk the document
    let chunks = chunk_text(&extracted_text, 1000);
    for (i, chunk_text) in chunks.iter().enumerate() {
        let chunk_embedding = generate_embedding(chunk_text)?;
        let chunk = DocumentChunk::new(
            &format!("chunk_{}", i),
            chunk_text,
            chunk_embedding,
            i * 1000,
            (i + 1) * 1000,
        )
        .with_metadata("chunk_index", &i.to_string());
        
        doc.add_chunk(chunk);
    }
    
    // 6. Mark as completed
    doc.processing_status = ProcessingStatus::Completed;
    
    // 7. Store in OpenDB (pseudo-code - actual storage via Memory type)
    let doc_id = doc.id.clone();
    store_document(db, &doc)?;
    
    Ok(doc_id)
}

// Helper functions (implement with actual libraries)
fn extract_pdf_text(bytes: &[u8]) -> Result<String> {
    // Use pdf-extract, pdfium, or poppler
    todo!("Implement with pdf-extract crate")
}

fn generate_embedding(text: &str) -> Result<Vec<f32>> {
    // Use sentence-transformers, OpenAI API, or onnxruntime
    todo!("Implement with embedding model")
}

fn chunk_text(text: &str, size: usize) -> Vec<String> {
    // Smart chunking by sentences/paragraphs
    todo!("Implement chunking strategy")
}

fn generate_id() -> String {
    uuid::Uuid::new_v4().to_string()
}

fn count_pages(bytes: &[u8]) -> usize {
    // Parse PDF to count pages
    todo!("Implement page counting")
}

fn store_document(db: &OpenDB, doc: &MultimodalDocument) -> Result<()> {
    // Store document and chunks as Memory records with embeddings
    todo!("Implement storage logic")
}
}

Audio Transcription Example

#![allow(unused)]
fn main() {
use opendb::{MultimodalDocument, DocumentChunk, FileType, ProcessingStatus, Result};
use std::fs;

fn process_audio(filepath: &str) -> Result<MultimodalDocument> {
    let file_bytes = fs::read(filepath)?;
    let filename = filepath.rsplit('/').next().unwrap();
    
    // 1. Transcribe audio (use whisper-rs or OpenAI Whisper API)
    let transcript = transcribe_audio(&file_bytes)?;
    
    // 2. Generate embedding from transcript
    let embedding = generate_embedding(&transcript)?;
    
    // 3. Create multimodal document
    let mut doc = MultimodalDocument::new(
        &generate_id(),
        filename,
        FileType::Audio,
        file_bytes.len(),
        &transcript,
        embedding,
    )
    .with_metadata("duration_seconds", &get_audio_duration(&file_bytes).to_string())
    .with_metadata("transcription_model", "whisper-large-v3");
    
    // 4. Add timestamped chunks
    let timestamped_segments = get_timestamped_segments(&file_bytes)?;
    for (i, segment) in timestamped_segments.iter().enumerate() {
        let chunk_embedding = generate_embedding(&segment.text)?;
        let chunk = DocumentChunk::new(
            &format!("segment_{}", i),
            &segment.text,
            chunk_embedding,
            segment.start_offset,
            segment.end_offset,
        )
        .with_metadata("timestamp_start", &segment.start_time.to_string())
        .with_metadata("timestamp_end", &segment.end_time.to_string());
        
        doc.add_chunk(chunk);
    }
    
    doc.processing_status = ProcessingStatus::Completed;
    Ok(doc)
}

struct AudioSegment {
    text: String,
    start_time: f64,
    end_time: f64,
    start_offset: usize,
    end_offset: usize,
}

fn transcribe_audio(bytes: &[u8]) -> Result<String> {
    // Use whisper-rs or cloud API
    todo!("Implement transcription")
}

fn get_audio_duration(bytes: &[u8]) -> f64 {
    // Parse audio metadata
    todo!("Implement duration extraction")
}

fn get_timestamped_segments(bytes: &[u8]) -> Result<Vec<AudioSegment>> {
    // Use Whisper with timestamps
    todo!("Implement segment extraction")
}
}

Integration with OpenDB

Storing Multimodal Documents

#![allow(unused)]
fn main() {
use opendb::{OpenDB, Memory, MultimodalDocument, Result};

fn store_multimodal_document(db: &OpenDB, doc: &MultimodalDocument) -> Result<()> {
    // Store main document as Memory
    let memory = Memory::new(
        &doc.id,
        &doc.extracted_text,
        doc.embedding.clone().unwrap_or_default(),
        1.0, // importance
    )
    .with_metadata("filename", &doc.filename)
    .with_metadata("file_type", &format!("{:?}", doc.file_type))
    .with_metadata("file_size", &doc.file_size.to_string());
    
    db.insert_memory(&memory)?;
    
    // Store each chunk as separate Memory with relationships
    for chunk in &doc.chunks {
        let chunk_memory = Memory::new(
            &format!("{}_{}", doc.id, chunk.chunk_id),
            &chunk.content,
            chunk.embedding.clone().unwrap_or_default(),
            0.8, // chunk importance
        )
        .with_metadata("parent_doc", &doc.id)
        .with_metadata("chunk_id", &chunk.chunk_id);
        
        db.insert_memory(&chunk_memory)?;
        
        // Link parent document to its chunk
        db.link(&memory.id, &chunk_memory.id, "has_chunk")?;
    }
    
    Ok(())
}
}

Semantic Search Across Documents

#![allow(unused)]
fn main() {
use opendb::{OpenDB, Result, SearchResult};

fn search_documents(
    db: &OpenDB,
    query: &str,
    top_k: usize,
) -> Result<Vec<SearchResult>> {
    // Generate query embedding
    let query_embedding = generate_embedding(query)?;
    
    // Search across all documents and chunks
    let results = db.search_similar(&query_embedding, top_k)?;
    
    Ok(results)
}

// Usage
let results = search_documents(&db, "machine learning algorithms", 5)?;
for result in results {
    let memory = db.get_memory(&result.id)?.unwrap();
    println!("Found: {} (distance: {:.4})", memory.content, result.distance);
}
}

Best Practices

1. Chunking Strategy

  • Small chunks (500-1000 chars): Better precision, more API calls
  • Large chunks (1500-3000 chars): More context, fewer API calls
  • Overlap chunks: 10-20% overlap for continuity
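
A self-contained sketch of overlapped fixed-size chunking (character-based for simplicity; a production version would split on sentence or token boundaries):

```rust
// Split `text` into chunks of `size` chars, each starting `size - overlap`
// chars after the previous one, so consecutive chunks share `overlap` chars.
fn chunk_with_overlap(text: &str, size: usize, overlap: usize) -> Vec<String> {
    assert!(overlap < size, "overlap must be smaller than chunk size");
    let chars: Vec<char> = text.chars().collect();
    let step = size - overlap;
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < chars.len() {
        let end = (start + size).min(chars.len());
        chunks.push(chars[start..end].iter().collect());
        if end == chars.len() {
            break;
        }
        start += step;
    }
    chunks
}

fn main() {
    let chunks = chunk_with_overlap("abcdefghij", 4, 2);
    // Each chunk repeats the last 2 chars of the previous one.
    assert_eq!(chunks, vec!["abcd", "cdef", "efgh", "ghij"]);
}
```

Collecting into `Vec<char>` first keeps slicing safe for multi-byte UTF-8 text, at the cost of one extra allocation.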

2. Metadata Usage

  • Always add source file metadata
  • Include timestamps for temporal data
  • Add processing metadata (model version, date)
  • Store original file path for reference

3. Error Handling

#![allow(unused)]
fn main() {
use opendb::{FileType, MultimodalDocument, ProcessingStatus};

fn safe_process(filepath: &str) -> MultimodalDocument {
    let mut doc = MultimodalDocument::new(
        &generate_id(),
        filepath,
        FileType::Unknown,
        0,
        "",
        vec![],
    );
    
    doc.processing_status = ProcessingStatus::Queued;
    
    match process_file(filepath) {
        Ok(processed) => {
            doc = processed;
            doc.processing_status = ProcessingStatus::Completed;
        }
        Err(e) => {
            doc.processing_status = ProcessingStatus::Failed(e.to_string());
            eprintln!("Processing failed: {}", e);
        }
    }
    
    doc
}
}

4. Memory Management

  • Process files in batches
  • Clear processed chunks from memory
  • Use streaming for very large files
  • Implement backpressure for async processing
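For the backpressure point, a bounded channel is the simplest mechanism: producers block once the queue is full, pacing file ingestion to consumer speed. A stdlib sketch (function and names are illustrative):

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

/// Feed items through a bounded queue of capacity `cap`; the sender
/// blocks when the queue is full, so ingestion matches consumer speed.
fn process_with_backpressure(items: Vec<String>, cap: usize) -> usize {
    let (tx, rx) = sync_channel::<String>(cap);
    let producer = thread::spawn(move || {
        for item in items {
            tx.send(item).unwrap(); // blocks while `cap` items are queued
        }
        // dropping `tx` closes the channel and ends the consumer loop
    });
    let mut processed = 0;
    for _item in rx {
        processed += 1; // stand-in for chunking / embedding work
    }
    producer.join().unwrap();
    processed
}
```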

See Also

Production Libraries

PDF Processing

  • pdf-extract - Text extraction
  • pdfium-render - Rendering and OCR
  • lopdf - Low-level parsing

DOCX Processing

  • docx-rs - Read/write DOCX
  • mammoth-rs - Convert to text

Audio Transcription

  • whisper-rs - Local Whisper
  • OpenAI Whisper API - Cloud service

Video Processing

  • ffmpeg-next - Video/audio extraction
  • Combine with whisper for captions

Embeddings

  • sentence-transformers (Python + PyO3)
  • OpenAI Embeddings API
  • onnxruntime - Local models

Transactions API

OpenDB provides ACID-compliant transactions for atomic multi-operation updates.

Overview

Transactions group multiple operations into a single atomic unit:

#![allow(unused)]
fn main() {
use opendb::OpenDB;

let db = OpenDB::open("./db")?;
let mut txn = db.begin_transaction()?;

txn.put("default", b"key1", b"value1")?;
txn.put("default", b"key2", b"value2")?;
txn.commit()?; // Both writes succeed or both fail
}

Basic API

Begin Transaction

#![allow(unused)]
fn main() {
let mut txn = db.begin_transaction()?;
}

Signature:

#![allow(unused)]
fn main() {
pub fn begin_transaction(&self) -> Result<Transaction>
}

Returns: Transaction handle for performing operations.

Commit

#![allow(unused)]
fn main() {
txn.commit()?;
}

Signature:

#![allow(unused)]
fn main() {
pub fn commit(mut self) -> Result<()>
}

Behavior:

  • Atomically applies all changes
  • Returns error if conflicts detected (optimistic locking)
  • Consumes transaction (can't use after commit)

Rollback

#![allow(unused)]
fn main() {
txn.rollback()?;
}

Signature:

#![allow(unused)]
fn main() {
pub fn rollback(mut self) -> Result<()>
}

Behavior:

  • Discards all changes
  • Always succeeds
  • Consumes transaction

Auto-Rollback

Transactions auto-rollback if dropped without commit:

#![allow(unused)]
fn main() {
{
    let mut txn = db.begin_transaction()?;
    txn.put("default", b"key", b"value")?;
    // txn dropped here → automatic rollback
}

// Key was not written
assert!(db.get(b"key")?.is_none());
}

Transaction Operations

Get

#![allow(unused)]
fn main() {
let value = txn.get("default", b"key")?;
}

Signature:

#![allow(unused)]
fn main() {
pub fn get(&self, cf: &str, key: &[u8]) -> Result<Option<Vec<u8>>>
}

Behavior:

  • Reads from transaction snapshot
  • Sees writes from current transaction
  • Isolated from concurrent transactions

Put

#![allow(unused)]
fn main() {
txn.put("default", b"key", b"value")?;
}

Signature:

#![allow(unused)]
fn main() {
pub fn put(&mut self, cf: &str, key: &[u8], value: &[u8]) -> Result<()>
}

Behavior:

  • Buffers write in transaction
  • Not visible outside transaction until commit
  • Visible to subsequent reads in same transaction

Delete

#![allow(unused)]
fn main() {
txn.delete("default", b"key")?;
}

Signature:

#![allow(unused)]
fn main() {
pub fn delete(&mut self, cf: &str, key: &[u8]) -> Result<()>
}

Behavior:

  • Buffers delete in transaction
  • Subsequent gets in same transaction return None

Column Families

Transactions work across all column families:

#![allow(unused)]
fn main() {
let mut txn = db.begin_transaction()?;

// Write to different column families
txn.put("default", b"kv_key", b"value")?;
txn.put("records", b"mem_001", &encoded_memory)?;
txn.put("graph_forward", b"mem_001:related_to", &edges)?;

txn.commit()?; // All or nothing
}

Available Column Families:

  • "default" - KV store
  • "records" - Memory records
  • "graph_forward" - Outgoing edges
  • "graph_backward" - Incoming edges
  • "vector_data" - Embedding data
  • "vector_index" - HNSW index
  • "metadata" - Database metadata

ACID Examples

Atomicity

Either all operations succeed or none:

#![allow(unused)]
fn main() {
let mut txn = db.begin_transaction()?;

txn.put("default", b"account_A", b"-100")?;
txn.put("default", b"account_B", b"+100")?;

match txn.commit() {
    Ok(_) => println!("Transfer complete"),
    Err(e) => println!("Transfer failed, both accounts unchanged: {}", e),
}
}

Consistency

Maintain invariants across operations:

#![allow(unused)]
fn main() {
// Invariant: memory must exist before linking
let mut txn = db.begin_transaction()?;

// Insert memories
txn.put("records", b"mem_001", &encode_memory(&mem1))?;
txn.put("records", b"mem_002", &encode_memory(&mem2))?;

// Create link (requires both memories exist)
txn.put("graph_forward", b"mem_001:related_to", &encode_edges(&edges))?;

txn.commit()?; // Ensures consistency
}

Isolation

Transactions don't see each other's uncommitted changes:

#![allow(unused)]
fn main() {
// Transaction 1
let mut txn1 = db.begin_transaction()?;
txn1.put("default", b"counter", b"100")?;

// Transaction 2 (concurrent)
let mut txn2 = db.begin_transaction()?;
let val = txn2.get("default", b"counter")?; // Sees old value (not 100)

txn1.commit()?;
txn2.commit()?; // May conflict depending on operations
}

Durability

Committed changes survive crashes:

#![allow(unused)]
fn main() {
let mut txn = db.begin_transaction()?;
txn.put("default", b"important", b"data")?;
txn.commit()?;

// Even if process crashes here, data is safe

// Reopen database
let db = OpenDB::open("./db")?;
assert_eq!(db.get(b"important")?.unwrap(), b"data");
}

Conflict Handling

Transactions use optimistic locking and may fail on conflict:

#![allow(unused)]
fn main() {
use opendb::Error;

loop {
    let mut txn = db.begin_transaction()?;
    
    // Read-modify-write
    let val = txn.get("default", b"counter")?
        .and_then(|v| String::from_utf8(v).ok())
        .and_then(|s| s.parse::<i64>().ok())
        .unwrap_or(0);
    
    let new_val = val + 1;
    txn.put("default", b"counter", new_val.to_string().as_bytes())?;
    
    match txn.commit() {
        Ok(_) => break,
        Err(Error::Transaction(_)) => {
            println!("Conflict detected, retrying...");
            continue; // Retry
        }
        Err(e) => return Err(e),
    }
}
}

Advanced Patterns

Compare-and-Swap

#![allow(unused)]
fn main() {
fn compare_and_swap(
    db: &OpenDB,
    key: &[u8],
    expected: &[u8],
    new_value: &[u8],
) -> Result<bool> {
    let mut txn = db.begin_transaction()?;
    
    let current = txn.get("default", key)?;
    if current.as_deref() != Some(expected) {
        txn.rollback()?;
        return Ok(false); // Value changed
    }
    
    txn.put("default", key, new_value)?;
    txn.commit()?;
    Ok(true)
}
}

Batch Updates

#![allow(unused)]
fn main() {
fn batch_update(db: &OpenDB, updates: Vec<(Vec<u8>, Vec<u8>)>) -> Result<()> {
    let mut txn = db.begin_transaction()?;
    
    for (key, value) in updates {
        txn.put("default", &key, &value)?;
    }
    
    txn.commit()
}
}

Conditional Delete

#![allow(unused)]
fn main() {
fn delete_if_exists(db: &OpenDB, key: &[u8]) -> Result<bool> {
    let mut txn = db.begin_transaction()?;
    
    if txn.get("default", key)?.is_none() {
        txn.rollback()?;
        return Ok(false);
    }
    
    txn.delete("default", key)?;
    txn.commit()?;
    Ok(true)
}
}

Performance Considerations

Transaction Overhead

Transactions have overhead compared to direct writes:

#![allow(unused)]
fn main() {
// โŒ Slower: Many small transactions
for i in 0..1000 {
    let mut txn = db.begin_transaction()?;
    txn.put("default", &format!("key_{}", i).as_bytes(), b"value")?;
    txn.commit()?;
}

// ✅ Faster: One transaction for batch
let mut txn = db.begin_transaction()?;
for i in 0..1000 {
    txn.put("default", &format!("key_{}", i).as_bytes(), b"value")?;
}
txn.commit()?;
}

Transaction Size

Keep transactions reasonably sized:

  • Small (1-100 ops): Best performance
  • Medium (100-1000 ops): Good
  • Large (1000+ ops): May increase conflict rate and memory usage
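One way to stay in the small-to-medium range is to split a large workload into fixed-size batches, one transaction per batch. A generic sketch (the `commit` closure would wrap an OpenDB begin/put/commit cycle; nothing here is OpenDB API):

```rust
/// Apply `updates` in batches of at most `max_ops`, invoking `commit`
/// once per batch; returns the number of batches committed.
fn apply_in_batches<T>(
    updates: &[T],
    max_ops: usize,
    mut commit: impl FnMut(&[T]) -> Result<(), String>,
) -> Result<usize, String> {
    let mut batches = 0;
    for batch in updates.chunks(max_ops) {
        commit(batch)?; // one transaction's worth of operations
        batches += 1;
    }
    Ok(batches)
}
```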

Conflict Rate

High contention increases conflict rate:

#![allow(unused)]
fn main() {
// High contention: many threads updating same key
// Solution: Shard keys or use separate counters
}
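Key sharding spreads a hot counter across several keys so concurrent writers rarely touch the same one; reading the counter then sums the shards. A sketch of the key scheme (illustrative, not an OpenDB API):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Derive one of `shards` sub-keys for `base` from a writer id; each
/// writer increments its own shard, and a read sums all shard values.
fn shard_key(base: &str, writer_id: u64, shards: u64) -> String {
    let mut h = DefaultHasher::new();
    writer_id.hash(&mut h);
    format!("{}_{}", base, h.finish() % shards)
}
```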

Limitations

  1. Single-threaded: One transaction per thread
  2. No nested transactions: Can't begin transaction within transaction
  3. Memory buffering: Large transactions use more memory
  4. Optimistic locking: High contention may cause retries

Error Handling

#![allow(unused)]
fn main() {
use opendb::Error;

let mut txn = db.begin_transaction()?;
txn.put("default", b"key", b"value")?;

match txn.commit() {
    Ok(_) => println!("Success"),
    Err(Error::Transaction(e)) => println!("Conflict: {}", e),
    Err(Error::Storage(e)) => println!("Storage error: {}", e),
    Err(e) => println!("Other error: {}", e),
}
}

Best Practices

  1. Keep transactions short: Minimize duration to reduce conflicts
  2. Handle conflicts: Implement retry logic for read-modify-write
  3. Batch when possible: Group related operations
  4. Use auto-rollback: Let Drop handle cleanup in error paths
  5. Explicit commits: Don't rely on implicit behavior

Next

Performance Tuning

This guide covers optimization strategies for OpenDB deployments.

Profiling

Before optimizing, measure your bottleneck:

#![allow(unused)]
fn main() {
use std::time::Instant;

let start = Instant::now();
db.insert_memory(&memory)?;
println!("Insert took: {:?}", start.elapsed());
}

RocksDB Tuning

Write Buffer Size

Larger write buffers improve write throughput:

#![allow(unused)]
fn main() {
// Default: 128 MB
// For write-heavy workloads, increase:
opts.set_write_buffer_size(256 * 1024 * 1024); // 256 MB
}

Trade-offs:

  • ✅ Fewer flushes to disk
  • ✅ Better write throughput
  • ❌ More memory usage
  • ❌ Longer recovery time after crash

Block Cache

RocksDB's internal cache for disk blocks:

#![allow(unused)]
fn main() {
use rocksdb::{BlockBasedOptions, Cache};

let mut block_opts = BlockBasedOptions::default();
block_opts.set_block_cache(&Cache::new_lru_cache(512 * 1024 * 1024)); // 512 MB
opts.set_block_based_table_factory(&block_opts);
}

Trade-offs:

  • ✅ Faster reads
  • ❌ More memory usage

Compression

Balance CPU vs storage:

#![allow(unused)]
fn main() {
use rocksdb::DBCompressionType;

// Default: LZ4 (fast, moderate compression)
opts.set_compression_type(DBCompressionType::Lz4);

// For better compression (slower writes):
opts.set_compression_type(DBCompressionType::Zstd);

// For faster writes (larger storage):
opts.set_compression_type(DBCompressionType::None);
}

Parallelism

Increase background threads for compaction:

#![allow(unused)]
fn main() {
opts.increase_parallelism(4); // Use 4 threads
}

Cache Tuning

Cache Sizes

Adjust cache capacity based on workload:

#![allow(unused)]
fn main() {
use opendb::OpenDBOptions;

let mut options = OpenDBOptions::default();

// For read-heavy workloads
options.kv_cache_size = 10_000;
options.record_cache_size = 5_000;

// For write-heavy workloads (smaller cache)
options.kv_cache_size = 1_000;
options.record_cache_size = 500;

let db = OpenDB::open_with_options("./db", options)?;
}

Cache Hit Rate

Monitor cache effectiveness:

#![allow(unused)]
fn main() {
use std::sync::atomic::{AtomicU64, Ordering};

// Implement hit rate tracking (example)
struct CacheStats {
    hits: AtomicU64,
    misses: AtomicU64,
}

impl CacheStats {
    fn hit_rate(&self) -> f64 {
        let hits = self.hits.load(Ordering::Relaxed) as f64;
        let misses = self.misses.load(Ordering::Relaxed) as f64;
        hits / (hits + misses)
    }
}
}

Target hit rates:

  • > 90%: Excellent
  • 70-90%: Good
  • < 70%: Increase cache size

Batch Operations

Batch Inserts

Use transactions for bulk inserts:

#![allow(unused)]
fn main() {
// โŒ Slow: Individual commits
for memory in memories {
    db.insert_memory(&memory)?;
}

// ✅ Fast: Batch commit (future API)
let mut txn = db.begin_transaction()?;
for memory in memories {
    // Insert via transaction
}
txn.commit()?;
}

Flush Control

Control when data is flushed to disk:

#![allow(unused)]
fn main() {
// Insert many records
for i in 0..10_000 {
    db.insert_memory(&memory)?;
}

// Explicit flush
db.flush()?;
}

Vector Search Optimization

Index Parameters

Tune HNSW parameters for your use case:

#![allow(unused)]
fn main() {
// High accuracy (slower, better recall)
HnswParams::high_accuracy()  // ef=400, neighbors=32

// High speed (faster, lower recall)
HnswParams::high_speed()     // ef=100, neighbors=8
}

Rebuild Strategy

Rebuild index strategically:

#![allow(unused)]
fn main() {
// โŒ Bad: Rebuild after every insert
for memory in memories {
    db.insert_memory(&memory)?;
    db.rebuild_vector_index()?; // Expensive!
}

// ✅ Good: Rebuild once after batch
for memory in memories {
    db.insert_memory(&memory)?;
}
db.rebuild_vector_index()?; // Once
}

Dimension Reduction

Lower dimensions = faster search:

#![allow(unused)]
fn main() {
// 768D (high quality, slower)
options.vector_dimension = 768;

// 384D (balanced)
options.vector_dimension = 384;

// 128D (fast, lower quality)
options.vector_dimension = 128;
}
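Changing dimensions properly requires re-embedding, but for a quick experiment you can approximate a lower dimension by truncating and re-normalising existing vectors (a lossy sketch; production pipelines use PCA or a model trained for the target size):

```rust
/// Keep the first `dim` components of `v` and rescale to unit length.
fn truncate_embedding(v: &[f32], dim: usize) -> Vec<f32> {
    let t = &v[..dim.min(v.len())];
    let norm = t.iter().map(|x| x * x).sum::<f32>().sqrt();
    t.iter().map(|x| x / norm).collect()
}
```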

Graph Optimization

Batch graph operations:

#![allow(unused)]
fn main() {
// Create all memories first
for memory in memories {
    db.insert_memory(&memory)?;
}

// Then create all links
for (from, relation, to) in edges {
    db.link(from, relation, to)?;
}
}

Prune Unused Relations

Remove stale edges periodically:

#![allow(unused)]
fn main() {
use std::collections::HashSet;

fn prune_orphaned_edges(db: &OpenDB) -> Result<()> {
    let all_ids: HashSet<_> = db.list_memory_ids()?.into_iter().collect();
    
    for id in db.list_memory_ids()? {
        let outgoing = db.get_outgoing(&id)?;
        for edge in outgoing {
            if !all_ids.contains(&edge.to) {
                db.unlink(&edge.from, &edge.relation, &edge.to)?;
            }
        }
    }
    
    Ok(())
}
}

Memory Usage

Estimate Memory Footprint

Total Memory = 
    RocksDB Write Buffers +
    RocksDB Block Cache +
    Application Caches +
    HNSW Index +
    Overhead

Example:
    128 MB (write buffers) +
    256 MB (block cache) +
    10 MB (app caches, 10k entries × 1KB avg) +
    30 MB (HNSW, 10k vectors × 384D × 4 bytes × 2x overhead) +
    50 MB (overhead)
    = ~474 MB
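The HNSW term in the example above can be computed directly (the 2x overhead factor is the rough multiplier used in the example):

```rust
/// Rough HNSW footprint: vectors × dimension × 4 bytes per f32 × overhead.
fn hnsw_bytes(vectors: usize, dimension: usize, overhead: f64) -> u64 {
    (vectors as f64 * dimension as f64 * 4.0 * overhead) as u64
}
```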

Reduce Memory Usage

  1. Smaller caches:
#![allow(unused)]
fn main() {
options.kv_cache_size = 100;
options.record_cache_size = 100;
}
  1. Lower RocksDB buffers:
#![allow(unused)]
fn main() {
use rocksdb::{BlockBasedOptions, Cache};

opts.set_write_buffer_size(64 * 1024 * 1024); // 64 MB
let mut block_opts = BlockBasedOptions::default();
block_opts.set_block_cache(&Cache::new_lru_cache(128 * 1024 * 1024)); // 128 MB
opts.set_block_based_table_factory(&block_opts);
}
  1. Smaller embeddings:
#![allow(unused)]
fn main() {
options.vector_dimension = 128; // Instead of 768
}

Disk Usage

Compaction

Force compaction to reclaim space:

#![allow(unused)]
fn main() {
// Manual compaction (future API)
db.compact_range(None, None)?;
}

Monitoring

Check database size:

#![allow(unused)]
fn main() {
// On Linux
std::process::Command::new("du")
    .args(&["-sh", "./db"])
    .output()?;
}

Benchmarking

Use Criterion for accurate benchmarks:

#![allow(unused)]
fn main() {
use criterion::{black_box, criterion_group, criterion_main, Criterion};
use opendb::{Memory, OpenDB};

fn benchmark_insert(c: &mut Criterion) {
    let db = OpenDB::open("./bench_db").unwrap();
    
    c.bench_function("insert_memory", |b| {
        b.iter(|| {
            let memory = Memory::new("id", "content", vec![0.0; 384], 0.5);
            db.insert_memory(black_box(&memory)).unwrap();
        });
    });
}

criterion_group!(benches, benchmark_insert);
criterion_main!(benches);
}

Monitoring Metrics

Implement metrics collection:

#![allow(unused)]
fn main() {
use std::sync::atomic::{AtomicU64, Ordering};

struct Metrics {
    reads: AtomicU64,
    writes: AtomicU64,
    cache_hits: AtomicU64,
    cache_misses: AtomicU64,
}

impl Metrics {
    fn report(&self) {
        println!("Reads: {}", self.reads.load(Ordering::Relaxed));
        println!("Writes: {}", self.writes.load(Ordering::Relaxed));
        println!("Cache hit rate: {:.2}%", 
            self.cache_hits.load(Ordering::Relaxed) as f64 /
            (self.cache_hits.load(Ordering::Relaxed) + 
             self.cache_misses.load(Ordering::Relaxed)) as f64 * 100.0
        );
    }
}
}

Platform-Specific Tips

Linux

  • Use io_uring for async I/O (future RocksDB feature)
  • Disable transparent huge pages for lower latency
  • Use fallocate for preallocating disk space

macOS

  • APFS filesystem has good performance
  • Use F_NOCACHE for large scans (avoid cache pollution)

Windows

  • Use NTFS for best RocksDB performance
  • Disable indexing on database directory
  • Use SSD for best performance

Common Bottlenecks

  1. Slow writes: Increase write buffer size, disable compression
  2. Slow reads: Increase cache sizes, use SSD
  3. High memory: Reduce cache sizes, lower embedding dimension
  4. Slow vector search: Reduce HNSW parameters, lower dimension
  5. Large database size: Enable compression, run compaction

Next

Extending OpenDB

OpenDB is designed to be extensible. This guide covers custom backends, plugins, and extensions.

Custom Storage Backends

OpenDB uses the StorageBackend trait for pluggability.

Storage Trait

#![allow(unused)]
fn main() {
pub trait StorageBackend: Send + Sync {
    fn get(&self, cf: &str, key: &[u8]) -> Result<Option<Vec<u8>>>;
    fn put(&self, cf: &str, key: &[u8], value: &[u8]) -> Result<()>;
    fn delete(&self, cf: &str, key: &[u8]) -> Result<()>;
    fn exists(&self, cf: &str, key: &[u8]) -> Result<bool>;
    fn scan_prefix(&self, cf: &str, prefix: &[u8]) -> Result<Vec<(Vec<u8>, Vec<u8>)>>;
    fn begin_transaction(&self) -> Result<Box<dyn Transaction>>;
    fn flush(&self) -> Result<()>;
    fn snapshot(&self) -> Result<Box<dyn Snapshot>>;
}
}

Example: In-Memory Backend

#![allow(unused)]
fn main() {
use std::collections::HashMap;
use std::sync::RwLock;
use opendb::storage::{StorageBackend, Transaction, Snapshot};
use opendb::{Result, Error};

pub struct MemoryBackend {
    data: RwLock<HashMap<String, HashMap<Vec<u8>, Vec<u8>>>>,
}

impl MemoryBackend {
    pub fn new() -> Self {
        Self {
            data: RwLock::new(HashMap::new()),
        }
    }
}

impl StorageBackend for MemoryBackend {
    fn get(&self, cf: &str, key: &[u8]) -> Result<Option<Vec<u8>>> {
        let data = self.data.read().unwrap();
        Ok(data.get(cf)
            .and_then(|cf_data| cf_data.get(key))
            .cloned())
    }
    
    fn put(&self, cf: &str, key: &[u8], value: &[u8]) -> Result<()> {
        let mut data = self.data.write().unwrap();
        data.entry(cf.to_string())
            .or_insert_with(HashMap::new)
            .insert(key.to_vec(), value.to_vec());
        Ok(())
    }
    
    fn delete(&self, cf: &str, key: &[u8]) -> Result<()> {
        let mut data = self.data.write().unwrap();
        if let Some(cf_data) = data.get_mut(cf) {
            cf_data.remove(key);
        }
        Ok(())
    }
    
    fn exists(&self, cf: &str, key: &[u8]) -> Result<bool> {
        Ok(self.get(cf, key)?.is_some())
    }
    
    fn scan_prefix(&self, cf: &str, prefix: &[u8]) -> Result<Vec<(Vec<u8>, Vec<u8>)>> {
        let data = self.data.read().unwrap();
        Ok(data.get(cf)
            .map(|cf_data| {
                cf_data.iter()
                    .filter(|(k, _)| k.starts_with(prefix))
                    .map(|(k, v)| (k.clone(), v.clone()))
                    .collect()
            })
            .unwrap_or_default())
    }
    
    fn flush(&self) -> Result<()> {
        // No-op for in-memory
        Ok(())
    }
    
    // Implement Transaction and Snapshot traits...
}
}

Using Custom Backend

#![allow(unused)]
fn main() {
use std::sync::Arc;

let backend = Arc::new(MemoryBackend::new());
let db = OpenDB::with_backend(backend, OpenDBOptions::default())?;
}

Custom Cache Implementations

Implement the Cache trait for custom caching strategies:

#![allow(unused)]
fn main() {
pub trait Cache<K, V>: Send + Sync {
    fn get(&self, key: &K) -> Option<V>;
    fn put(&self, key: K, value: V);
    fn remove(&self, key: &K);
    fn clear(&self);
    fn len(&self) -> usize;
}
}

Example: TTL Cache

#![allow(unused)]
fn main() {
use std::collections::HashMap;
use std::time::{Instant, Duration};
use parking_lot::RwLock;

pub struct TtlCache<K, V> {
    data: RwLock<HashMap<K, (V, Instant)>>,
    ttl: Duration,
}

impl<K: Eq + std::hash::Hash + Clone, V: Clone> Cache<K, V> for TtlCache<K, V> {
    fn get(&self, key: &K) -> Option<V> {
        let data = self.data.read();
        data.get(key).and_then(|(value, inserted)| {
            if inserted.elapsed() < self.ttl {
                Some(value.clone())
            } else {
                None // Expired
            }
        })
    }
    
    fn put(&self, key: K, value: V) {
        let mut data = self.data.write();
        data.insert(key, (value, Instant::now()));
    }
    
    // ... implement other methods
}
}

Custom Vector Indexes

While OpenDB uses HNSW, you can wrap alternative indexes:

Example: Flat Index

#![allow(unused)]
fn main() {
pub struct FlatVectorIndex {
    vectors: RwLock<Vec<(String, Vec<f32>)>>,
}

impl FlatVectorIndex {
    pub fn search(&self, query: &[f32], top_k: usize) -> Vec<SearchResult> {
        let vectors = self.vectors.read();
        let mut results: Vec<_> = vectors.iter()
            .map(|(id, vec)| {
                let distance = euclidean_distance(query, vec);
                SearchResult { id: id.clone(), distance }
            })
            .collect();
        
        results.sort_by(|a, b| a.distance.partial_cmp(&b.distance).unwrap());
        results.truncate(top_k);
        results
    }
}

fn euclidean_distance(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b.iter())
        .map(|(x, y)| (x - y).powi(2))
        .sum::<f32>()
        .sqrt()
}
}
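A cosine-distance variant is a common drop-in for `euclidean_distance` when vector magnitudes are uninformative (sketch):

```rust
/// Cosine distance: 1 − cos(θ). 0 for identical directions, 2 for opposite.
fn cosine_distance(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    1.0 - dot / (norm_a * norm_b)
}
```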

Custom Serialization

Replace rkyv with custom codec:

#![allow(unused)]
fn main() {
pub trait Codec<T> {
    fn encode(&self, value: &T) -> Result<Vec<u8>>;
    fn decode(&self, bytes: &[u8]) -> Result<T>;
}

pub struct JsonCodec;

impl<T: serde::Serialize + serde::de::DeserializeOwned> Codec<T> for JsonCodec {
    fn encode(&self, value: &T) -> Result<Vec<u8>> {
        serde_json::to_vec(value).map_err(|e| Error::Codec(e.to_string()))
    }
    
    fn decode(&self, bytes: &[u8]) -> Result<T> {
        serde_json::from_slice(bytes).map_err(|e| Error::Codec(e.to_string()))
    }
}
}

Plugin System (Future)

Planned plugin architecture:

#![allow(unused)]
fn main() {
pub trait Plugin: Send + Sync {
    fn name(&self) -> &str;
    fn init(&mut self, db: &OpenDB) -> Result<()>;
    fn on_insert(&self, memory: &Memory) -> Result<()>;
    fn on_delete(&self, id: &str) -> Result<()>;
    fn on_link(&self, edge: &Edge) -> Result<()>;
}

// Example: Audit logger plugin
use std::fs::File;
use std::io::Write;
use std::sync::Mutex;

pub struct AuditPlugin {
    log_file: Mutex<File>,
}

impl Plugin for AuditPlugin {
    fn on_insert(&self, memory: &Memory) -> Result<()> {
        let mut file = self.log_file.lock().unwrap();
        writeln!(file, "INSERT: {}", memory.id)?;
        Ok(())
    }
}
}

Custom Relation Types

Extend graph relations for domain-specific needs:

#![allow(unused)]
fn main() {
pub mod custom_relations {
    pub const IMPLEMENTS: &str = "implements";
    pub const EXTENDS: &str = "extends";
    pub const DEPENDS_ON: &str = "depends_on";
    pub const TESTED_BY: &str = "tested_by";
}

use custom_relations::*;

db.link("MyStruct", IMPLEMENTS, "MyTrait")?;
db.link("ChildStruct", EXTENDS, "ParentStruct")?;
}

Embedding Adapters

Create adapters for different embedding models:

#![allow(unused)]
fn main() {
pub trait EmbeddingModel {
    fn dimension(&self) -> usize;
    fn encode(&self, text: &str) -> Result<Vec<f32>>;
}

pub struct SentenceTransformerAdapter {
    // Python bindings via PyO3
}

impl EmbeddingModel for SentenceTransformerAdapter {
    fn dimension(&self) -> usize {
        384 // all-MiniLM-L6-v2
    }
    
    fn encode(&self, text: &str) -> Result<Vec<f32>> {
        // Call Python model
        todo!()
    }
}
}

Future Extension Points

Planned extensibility features:

  1. Query Language: SQL-like interface for complex queries
  2. Triggers: Execute callbacks on events
  3. Views: Virtual collections with custom logic
  4. Migrations: Schema evolution helpers
  5. Replication: Multi-instance synchronization

Contributing Extensions

If you build a useful extension, consider contributing:

  1. Fork the repository
  2. Create a new module in src/extensions/
  3. Document usage and API
  4. Add tests for functionality
  5. Submit a pull request

Best Practices

  1. Follow trait contracts: Implement all required methods
  2. Handle errors: Use Result<T, Error> consistently
  3. Thread safety: Use Send + Sync for shared state
  4. Document: Provide clear documentation and examples
  5. Test: Write comprehensive tests for custom components

Examples

See the examples/ directory for:

  • custom_backend.rs: Alternative storage backend
  • plugin_example.rs: Sample plugin implementation
  • custom_index.rs: Alternative vector index

Next

Contributing to OpenDB

Thank you for your interest in contributing to OpenDB! This guide will help you get started.

Code of Conduct

This project adheres to the Contributor Covenant Code of Conduct. By participating, you are expected to uphold this code.

How to Contribute

Reporting Bugs

  1. Check existing issues to avoid duplicates
  2. Use the bug report template when creating a new issue
  3. Provide details:
    • OpenDB version
    • Rust version (rustc --version)
    • Operating system
    • Minimal reproduction steps
    • Expected vs actual behavior

Suggesting Features

  1. Check the roadmap to see if it's planned
  2. Use the feature request template
  3. Describe:
    • Use case and motivation
    • Proposed API design
    • Alternative solutions considered

Pull Requests

  1. Fork the repository
  2. Create a branch from main:
    git checkout -b feature/my-feature
    
  3. Make your changes following our code style
  4. Write tests for new functionality
  5. Update documentation if needed
  6. Commit with descriptive messages
  7. Push to your fork
  8. Open a pull request with detailed description

Development Setup

Prerequisites

  • Rust 1.70 or later
  • RocksDB development libraries (see Installation guide)

Clone and Build

git clone https://github.com/muhammad-fiaz/OpenDB.git
cd OpenDB
cargo build

Run Tests

# All tests
cargo test

# Specific test
cargo test test_name

# With output
cargo test -- --nocapture

Run Examples

cargo run --example quickstart
cargo run --example memory_agent
cargo run --example graph_relations

Build Documentation

# API docs
cargo doc --open

# mdBook docs
cd docs
mdbook serve --open

Code Style

Formatting

Use rustfmt for consistent formatting:

cargo fmt --all

Linting

Use clippy for code quality:

cargo clippy --all-targets --all-features -- -D warnings

Naming Conventions

  • Types: PascalCase (e.g., OpenDB, StorageBackend)
  • Functions: snake_case (e.g., insert_memory, get_related)
  • Constants: SCREAMING_SNAKE_CASE (e.g., DEFAULT_CACHE_SIZE)
  • Modules: snake_case (e.g., graph, vector)

Documentation

  • Public APIs: Must have /// documentation
  • Examples: Include usage examples in doc comments
  • Errors: Document possible error cases

Example:

#![allow(unused)]
fn main() {
/// Inserts a memory record into the database.
///
/// # Arguments
///
/// * `memory` - The memory record to insert
///
/// # Returns
///
/// Returns `Ok(())` on success, or an error if:
/// - Serialization fails
/// - Storage write fails
///
/// # Example
///
/// ```
/// let memory = Memory::new("id", "content", vec![0.1, 0.2, 0.3], 0.9);
/// db.insert_memory(&memory)?;
/// ```
pub fn insert_memory(&self, memory: &Memory) -> Result<()> {
    // ...
}
}

Testing Guidelines

Unit Tests

Place unit tests in the same file as the code:

#![allow(unused)]
fn main() {
#[cfg(test)]
mod tests {
    use super::*;
    
    #[test]
    fn test_memory_creation() {
        let memory = Memory::new("id", "content", vec![0.1, 0.2, 0.3], 0.5);
        assert_eq!(memory.id, "id");
        assert_eq!(memory.content, "content");
    }
}
}

Integration Tests

Place integration tests in tests/:

#![allow(unused)]
fn main() {
// tests/my_feature_test.rs
use opendb::{OpenDB, Memory};
use tempfile::TempDir;

#[test]
fn test_my_feature() {
    let temp_dir = TempDir::new().unwrap();
    let db = OpenDB::open(temp_dir.path()).unwrap();
    
    // Test logic
}
}

Test Coverage

Aim for:

  • New features: >80% coverage
  • Bug fixes: Regression test included
  • Edge cases: Test error paths

Commit Messages

Follow conventional commits format:

<type>(<scope>): <subject>

<body>

<footer>

Types:

  • feat: New feature
  • fix: Bug fix
  • docs: Documentation changes
  • style: Formatting changes
  • refactor: Code refactoring
  • test: Adding tests
  • chore: Maintenance tasks

Examples:

feat(graph): add weighted edge support

Adds optional weight parameter to link() method,
allowing users to specify edge weights.

Closes #123

fix(cache): prevent race condition in LRU eviction

Fixes deadlock when multiple threads evict simultaneously
by using a write lock during eviction.

Fixes #456

Pull Request Guidelines

PR Title

Use the same format as commit messages:

feat(vector): add cosine similarity distance metric

PR Description

Include:

  1. What: Description of changes
  2. Why: Motivation and context
  3. How: Implementation approach
  4. Testing: How you tested the changes
  5. Checklist:
    • Tests added/updated
    • Documentation updated
    • Changelog updated (for features/fixes)
    • Code formatted with rustfmt
    • Linted with clippy

Review Process

  1. CI checks: All tests must pass
  2. Code review: At least one maintainer approval
  3. Documentation: Verify docs are updated
  4. Changelog: Ensure CHANGELOG.md is updated

Architecture Guidelines

Module Organization

Follow existing structure:

src/
  lib.rs          # Public API exports
  database.rs     # Main OpenDB struct
  error.rs        # Error types
  types.rs        # Core data types
  storage/        # Storage backends
  cache/          # Caching layer
  kv/             # Key-value store
  records/        # Memory records
  graph/          # Graph relationships
  vector/         # Vector search
  transaction/    # Transaction management
  codec/          # Serialization

Adding New Features

  1. New module: Create in appropriate directory
  2. Trait-based: Use traits for extensibility
  3. Error handling: Use Result<T, Error>
  4. Thread safety: Ensure Send + Sync where needed

Performance Considerations

  • Benchmarks: Add benchmarks for performance-critical code
  • Profiling: Profile before optimizing
  • Allocations: Minimize unnecessary allocations
  • Locks: Prefer RwLock for read-heavy workloads

Documentation Updates

When adding features, update:

  1. API docs: /// comments in code
  2. mdBook docs: Relevant pages in docs/src/
  3. Examples: Add example if appropriate
  4. CHANGELOG.md: Document changes
  5. README.md: Update if API changes

Release Process (Maintainers)

  1. Version bump: Update Cargo.toml
  2. Changelog: Update CHANGELOG.md
  3. Tag: Create git tag v0.x.y
  4. Publish: cargo publish
  5. GitHub Release: Create release notes

Getting Help

  • Discussions: GitHub Discussions for questions
  • Issues: GitHub Issues for bugs/features
  • Email: contact@muhammadfiaz.com for private inquiries

Recognition

Contributors are recognized in:

  • CONTRIBUTORS.md file
  • GitHub contributors page
  • Release notes

Thank you for contributing to OpenDB! 🎉

Next