OpenDB
OpenDB is a high-performance hybrid embedded database written in pure Rust, combining multiple database paradigms into a single, cohesive system.
Features
- Key-Value Store: Fast point lookups and range scans
- Structured Records: Document/row storage with schema support
- Graph Database: Relationships and graph traversals
- Vector Search: Semantic search with HNSW-based approximate nearest neighbors
- In-Memory Cache: LRU cache for hot data
- ACID Transactions: Full transactional guarantees with WAL
Why OpenDB?
OpenDB is designed for applications that need multiple database capabilities without the complexity of managing separate systems:
- Agent Memory Systems: Store and recall facts, relationships, and semantic information
- Knowledge Graphs: Build and traverse complex relationship networks
- Semantic Search: Find similar content using vector embeddings
- High-Performance Applications: LSM-tree backend for excellent write throughput
Repository
- GitHub: muhammad-fiaz/OpenDB
- Documentation: https://muhammad-fiaz.github.io/opendb
- Contact: contact@muhammadfiaz.com
Quick Example
```rust
use opendb::{OpenDB, Memory};

fn main() -> opendb::Result<()> {
    // Open the database
    let db = OpenDB::open("./my_database")?;

    // Store a memory with an embedding
    let memory = Memory::new(
        "memory_1",
        "Rust is awesome!",
        vec![0.1, 0.2, 0.3],
        0.9, // importance
    );
    db.insert_memory(&memory)?;

    // Create a relationship (from, to, relation)
    db.link("memory_1", "memory_2", "related_to")?;

    // Vector search: top 5 nearest neighbors
    let similar = db.search_similar(&[0.1, 0.2, 0.3], 5)?;
    println!("Found {} similar memories", similar.len());

    Ok(())
}
```
Installation
From crates.io (once published)
cargo add opendb
From source
- Clone the repository:
git clone https://github.com/muhammad-fiaz/OpenDB.git
cd OpenDB
- Build the project:
cargo build --release
- Run tests:
cargo test
- Run examples:
cargo run --example quickstart
cargo run --example memory_agent
cargo run --example graph_relations
Requirements
- Rust: 1.70.0 or higher (Rust 2021 edition)
- Operating System: Linux, macOS, or Windows
- Dependencies: All dependencies are managed by Cargo
System Dependencies
OpenDB uses RocksDB as its storage backend, which requires:
- Linux: gcc, g++, make, libsnappy-dev, zlib1g-dev, libbz2-dev, liblz4-dev
- macOS: Xcode command line tools
- Windows: Visual Studio Build Tools
Linux Setup
# Ubuntu/Debian
sudo apt-get install -y gcc g++ make libsnappy-dev zlib1g-dev libbz2-dev liblz4-dev
# Fedora/RHEL
sudo dnf install -y gcc gcc-c++ make snappy-devel zlib-devel bzip2-devel lz4-devel
macOS Setup
xcode-select --install
Windows Setup
Install Visual Studio Build Tools
Verifying Installation
cargo test --all
All tests should pass. If you encounter issues, please check:
- Rust version: `rustc --version`
- Build dependencies are installed
- Open an issue if problems persist
Quick Start
This guide will walk you through the basic usage of OpenDB.
Opening a Database
```rust
use opendb::{OpenDB, Result};

fn main() -> Result<()> {
    // Open or create a database
    let db = OpenDB::open("./my_database")?;
    Ok(())
}
```
Working with Key-Value Data
```rust
// Store a value
db.put(b"my_key", b"my_value")?;

// Retrieve a value
if let Some(value) = db.get(b"my_key")? {
    println!("Value: {:?}", value);
}

// Check existence
if db.exists(b"my_key")? {
    println!("Key exists!");
}

// Delete a value
db.delete(b"my_key")?;
```
Working with Memory Records
Memory records are structured data with embeddings for semantic search.
```rust
use opendb::Memory;

// Create a memory
let memory = Memory::new(
    "memory_001",
    "The user prefers dark mode",
    vec![0.1, 0.2, 0.3, 0.4], // embedding vector
    0.9,                      // importance (0.0 to 1.0)
)
.with_metadata("category", "preference")
.with_metadata("source", "user_settings");

// Insert the memory
db.insert_memory(&memory)?;

// Retrieve it
if let Some(mem) = db.get_memory("memory_001")? {
    println!("Content: {}", mem.content);
    println!("Importance: {}", mem.importance);
}

// List all memories with a prefix
let all = db.list_memories("memory")?;
println!("Found {} memories", all.len());
```
Creating Relationships
```rust
// Create relationships between memories (from, to, relation)
db.link("memory_001", "memory_002", "related_to")?;
db.link("memory_001", "memory_003", "caused_by")?;

// Query relationships
let related = db.get_related("memory_001", "related_to")?;
for edge in related {
    println!("Related memory: {}", edge.to);
}

// Get all outgoing edges
let edges = db.get_outgoing("memory_001")?;
for edge in edges {
    println!("{} --[{}]--> {}", edge.from, edge.relation, edge.to);
}
```
Vector Search
```rust
// Search for the 5 most similar memories
let query_embedding = vec![0.1, 0.2, 0.3, 0.4];
let results = db.search_similar(&query_embedding, 5)?;

for result in results {
    println!(
        "Memory: {} (distance: {:.4})",
        result.memory.content, result.distance
    );
}
```
Using Transactions
```rust
// Begin a transaction
let mut txn = db.begin_transaction()?;

// Perform operations
txn.put("records", b"key1", b"value1")?;
txn.put("records", b"key2", b"value2")?;

// Commit the transaction
txn.commit()?;

// Or roll back instead:
// txn.rollback()?;
```
Flushing to Disk
```rust
// Ensure all writes are persisted
db.flush()?;
```
Complete Example
See the quickstart example for a complete, runnable example.
Architecture Overview
OpenDB is designed as a modular, hybrid database system that combines multiple database paradigms while maintaining high performance and ACID guarantees.
System Architecture
```
┌───────────────────────────────────────────────────────────┐
│                     OpenDB Public API                     │
├──────────────┬──────────────┬──────────────┬──────────────┤
│  Key-Value   │   Records    │    Graph     │   Vectors    │
│    Store     │   (Memory)   │  Relations   │    (HNSW)    │
├──────────────┴──────────────┴──────────────┴──────────────┤
│                Transaction Manager (ACID)                 │
│              WAL + Optimistic Locking + MVCC              │
├───────────────────────────────────────────────────────────┤
│                      LRU Cache Layer                      │
│              (Write-Through + Invalidation)               │
├───────────────────────────────────────────────────────────┤
│             Storage Trait (Pluggable Backend)             │
├───────────────────────────────────────────────────────────┤
│                 RocksDB Backend (LSM Tree)                │
│        Column Families + Native Transactions + WAL        │
└───────────────────────────────────────────────────────────┘
```
Core Components
1. Storage Layer
- Backend: RocksDB (high-performance LSM tree)
- Column Families: Namespace isolation for different data types
- Persistence: Write-Ahead Log (WAL) for durability
2. Transaction Manager
- ACID Guarantees: Full transactional support
- Isolation: Snapshot isolation via RocksDB transactions
- Concurrency: Optimistic locking
3. Cache Layer
- Strategy: LRU (Least Recently Used)
- Write Policy: Write-through (update storage first, then cache)
- Coherency: Automatic invalidation on delete
4. Feature Modules
Key-Value Store
- Direct byte-level storage
- Prefix scans
- Cache-accelerated reads
Records Manager
- Structured Memory records
- Codec: rkyv (zero-copy deserialization)
- Metadata support
Graph Manager
- Bidirectional adjacency lists
- Forward index: `from → [(relation, to)]`
- Backward index: `to → [(relation, from)]`
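The dual-index layout can be sketched with plain `HashMap`s. This is a simplified illustration of the idea, not OpenDB's actual persisted structures (which live in the `graph_forward`/`graph_backward` column families):

```rust
use std::collections::HashMap;

// Simplified dual adjacency index (illustration only).
#[derive(Default)]
struct GraphIndex {
    forward: HashMap<String, Vec<(String, String)>>,  // from -> [(relation, to)]
    backward: HashMap<String, Vec<(String, String)>>, // to -> [(relation, from)]
}

impl GraphIndex {
    fn link(&mut self, from: &str, to: &str, relation: &str) {
        // Every edge is written to both indexes, so outgoing and
        // incoming queries are each a single lookup.
        self.forward.entry(from.to_string()).or_default()
            .push((relation.to_string(), to.to_string()));
        self.backward.entry(to.to_string()).or_default()
            .push((relation.to_string(), from.to_string()));
    }

    fn outgoing(&self, id: &str) -> &[(String, String)] {
        self.forward.get(id).map(Vec::as_slice).unwrap_or(&[])
    }

    fn incoming(&self, id: &str) -> &[(String, String)] {
        self.backward.get(id).map(Vec::as_slice).unwrap_or(&[])
    }
}

fn main() {
    let mut g = GraphIndex::default();
    g.link("mem_001", "mem_002", "related_to");
    assert_eq!(g.outgoing("mem_001").len(), 1);
    assert_eq!(g.incoming("mem_002")[0].1, "mem_001");
}
```

The cost of doubling the write is that `get_outgoing` and `get_incoming` never have to scan the opposite index.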
Vector Manager
- HNSW index for approximate nearest neighbor search
- Automatic index rebuilding
- Configurable search quality
Data Flow
Write Path
Application → OpenDB API → Cache (update) → Storage Backend → WAL → Disk
Read Path (Cache Hit)
Application → OpenDB API → Cache → Return
Read Path (Cache Miss)
Application → OpenDB API → Cache (miss) → Storage Backend → Cache (populate) → Return
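The two read paths and the write-through write path can be modeled in a few lines. This is a toy sketch using `HashMap`s for both tiers; OpenDB's real cache is an LRU and its backend is RocksDB:

```rust
use std::collections::HashMap;

// Toy model of the write and read paths (illustration only).
struct Db {
    cache: HashMap<Vec<u8>, Vec<u8>>,
    storage: HashMap<Vec<u8>, Vec<u8>>,
}

impl Db {
    // Write path: storage first (durability), then cache.
    fn put(&mut self, key: &[u8], value: &[u8]) {
        self.storage.insert(key.to_vec(), value.to_vec());
        self.cache.insert(key.to_vec(), value.to_vec());
    }

    // Read path: a cache hit returns immediately; a miss falls back
    // to storage and populates the cache for the next read.
    fn get(&mut self, key: &[u8]) -> Option<Vec<u8>> {
        if let Some(v) = self.cache.get(key) {
            return Some(v.clone()); // cache hit
        }
        let v = self.storage.get(key).cloned()?; // cache miss
        self.cache.insert(key.to_vec(), v.clone()); // populate
        Some(v)
    }
}

fn main() {
    let mut db = Db { cache: HashMap::new(), storage: HashMap::new() };
    db.put(b"greeting", b"hello");
    db.cache.clear(); // simulate an eviction
    assert_eq!(db.get(b"greeting"), Some(b"hello".to_vec())); // served from storage
    assert!(db.cache.contains_key(b"greeting".as_slice()));   // and cached again
}
```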
Design Decisions
Why RocksDB?
Advantages:
- Production-tested LSM tree
- Excellent write throughput
- Built-in WAL and transactions
- Column families for organization
Tradeoffs:
- Not pure Rust (C++ with bindings)
- Larger binary size
Alternatives Considered:
- `redb`: Pure Rust, B-tree based, simpler but lower throughput
- `sled`: Pure Rust, but less mature and maintenance concerns
- Custom LSM: Too much complexity for the initial version
Why rkyv for Serialization?
Advantages:
- Zero-copy deserialization (fast reads)
- Schema versioning support
- Type safety
Alternatives:
- `bincode`: Simpler but requires full deserialization
- `serde_json`: Human-readable but slower
Why HNSW for Vector Search?
Advantages:
- Excellent accuracy/speed tradeoff
- Logarithmic search complexity
- Works well for high-dimensional data
Alternatives:
- IVF (Inverted File Index): Faster but less accurate
- Flat index: Exact but O(n) search
Storage Layer
RocksDB Backend
OpenDB uses RocksDB as its default storage backend, providing a robust foundation for ACID transactions and high-performance data access.
Column Families
Data is organized into separate column families (namespaces):
| Column Family | Purpose | Data Format |
|---|---|---|
| `default` | Key-value store | Raw bytes |
| `records` | Memory records | rkyv-encoded `Memory` structs |
| `graph_forward` | Forward adjacency list | rkyv-encoded `Edge` arrays |
| `graph_backward` | Backward adjacency list | rkyv-encoded `Edge` arrays |
| `vector_data` | Vector embeddings | bincode-encoded `f32` arrays |
| `vector_index` | HNSW metadata | (currently in-memory) |
| `metadata` | DB metadata | JSON |
Storage Trait
The storage layer is abstracted behind a trait, allowing for pluggable backends:
```rust
pub trait StorageBackend: Send + Sync {
    fn get(&self, cf: &str, key: &[u8]) -> Result<Option<Vec<u8>>>;
    fn put(&self, cf: &str, key: &[u8], value: &[u8]) -> Result<()>;
    fn delete(&self, cf: &str, key: &[u8]) -> Result<()>;
    fn scan_prefix(&self, cf: &str, prefix: &[u8]) -> Result<Vec<(Vec<u8>, Vec<u8>)>>;
    fn begin_transaction(&self) -> Result<Box<dyn Transaction>>;
    fn flush(&self) -> Result<()>;
}
```
Performance Tuning
RocksDB is configured with optimizations for mixed read/write workloads:
```rust
// Write buffer: 128 MB
opts.set_write_buffer_size(128 * 1024 * 1024);

// Number of write buffers: 3
opts.set_max_write_buffer_number(3);

// Target file size: 64 MB
opts.set_target_file_size_base(64 * 1024 * 1024);

// Compression: LZ4
opts.set_compression_type(rocksdb::DBCompressionType::Lz4);
```
Write-Ahead Log (WAL)
RocksDB's WAL ensures durability:
- All writes are first appended to the WAL
- Then applied to memtables
- Periodically flushed to SST files
- Old WAL segments are deleted after checkpoint
LSM Tree Structure
RocksDB uses a Log-Structured Merge (LSM) tree:
Write Path:
Write → WAL → MemTable → (flush) → L0 SST → (compact) → L1 SST → ...
Read Path:
Read → MemTable → Block Cache → L0 → L1 → ... → Ln
Advantages
- Write Amplification: Minimized for sequential writes
- Compression: Data is compressed at each level
- Compaction: Background process merges and cleans data
Tradeoffs
- Read Amplification: May need to check multiple levels
- Space Amplification: Compaction creates temporary overhead
Future Backend Options
redb (Pure Rust B-Tree)
Pros:
- Pure Rust, no C++ dependencies
- Simpler architecture
- Good for read-heavy workloads
Cons:
- Lower write throughput than LSM
- Less mature
Custom LSM Implementation
Pros:
- Full control over optimization
- Pure Rust
Cons:
- High development and maintenance cost
- Risk of bugs in critical path
Transaction Model
OpenDB provides full ACID (Atomicity, Consistency, Isolation, Durability) guarantees through RocksDB's transaction support.
ACID Properties
Atomicity
All operations in a transaction either succeed together or fail together.
```rust
let mut txn = db.begin_transaction()?;
txn.put("records", b"key1", b"value1")?;
txn.put("records", b"key2", b"value2")?;
txn.commit()?; // Both writes succeed or both fail
```
Consistency
Transactions move the database from one consistent state to another.
Isolation
Transactions use snapshot isolation:
- Each transaction sees a consistent snapshot of the database
- Concurrent transactions don't interfere with each other
- RocksDB provides MVCC (Multi-Version Concurrency Control)
Durability
Once a transaction commits, the changes are permanent:
- Write-Ahead Log (WAL) ensures durability
- Data survives process crashes
- Can be verified by reopening the database
Transaction API
Basic Usage
```rust
// Begin transaction
let mut txn = db.begin_transaction()?;

// Perform operations
txn.put("records", b"key1", b"value1")?;
let val = txn.get("records", b"key1")?;

// Commit
txn.commit()?;
```
Rollback
```rust
let mut txn = db.begin_transaction()?;
txn.put("records", b"key1", b"modified")?;

// Something went wrong, roll back
txn.rollback()?; // Original value remains unchanged
```
Auto-Rollback
Transactions are automatically rolled back if dropped without commit:
```rust
{
    let mut txn = db.begin_transaction()?;
    txn.put("records", b"key1", b"value")?;
    // txn dropped here - auto rollback
}
```
Concurrency Model
Optimistic Locking
RocksDB transactions use optimistic locking:
- Read phase: Transaction reads data without locks
- Validation phase: Before commit, check if data changed
- Write phase: If no conflicts, commit; otherwise abort
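The three phases can be modeled with version stamps. This is a minimal sketch of optimistic validation, not RocksDB's actual mechanism; the `Store` and `Txn` types here are hypothetical:

```rust
use std::collections::HashMap;

// key -> (version, value); the version bumps on every committed write.
struct Store { data: HashMap<String, (u64, String)> }

// A transaction records the versions it read and the writes it buffered.
struct Txn { reads: Vec<(String, u64)>, writes: Vec<(String, String)> }

impl Txn {
    fn commit(self, store: &mut Store) -> Result<(), &'static str> {
        // Validation phase: abort if any key we read has since changed.
        for (key, version) in &self.reads {
            if store.data.get(key).map(|(v, _)| *v) != Some(*version) {
                return Err("conflict");
            }
        }
        // Write phase: apply buffered writes, bumping versions.
        for (key, value) in self.writes {
            let v = store.data.get(&key).map(|(v, _)| v + 1).unwrap_or(0);
            store.data.insert(key, (v, value));
        }
        Ok(())
    }
}

fn main() {
    let mut store = Store { data: HashMap::new() };
    store.data.insert("counter".into(), (0, "0".into()));

    // txn_a read version 0 and commits first: succeeds, version becomes 1.
    let txn_a = Txn {
        reads: vec![("counter".into(), 0)],
        writes: vec![("counter".into(), "1".into())],
    };
    assert!(txn_a.commit(&mut store).is_ok());

    // txn_b also read version 0, but the key is now at version 1: conflict.
    let txn_b = Txn {
        reads: vec![("counter".into(), 0)],
        writes: vec![("counter".into(), "2".into())],
    };
    assert_eq!(txn_b.commit(&mut store), Err("conflict"));
}
```

No locks are held during the read phase; the price is that a conflicting transaction must retry, as shown in the conflict-handling example later in this chapter.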
Conflict Detection
```rust
// Transaction 1
let mut txn1 = db.begin_transaction()?;
txn1.put("records", b"counter", b"1")?;

// Transaction 2 (concurrent)
let mut txn2 = db.begin_transaction()?;
txn2.put("records", b"counter", b"2")?;

// First to commit wins
txn1.commit()?; // Success
txn2.commit()?; // May fail with a conflict error
```
Snapshot Isolation Example
```rust
// Initial state: counter = 0
db.put(b"counter", b"0")?;

// Transaction 1 reads
let mut txn1 = db.begin_transaction()?;
let val1 = txn1.get("default", b"counter")?;

// Meanwhile, Transaction 2 updates
let mut txn2 = db.begin_transaction()?;
txn2.put("default", b"counter", b"5")?;
txn2.commit()?;

// Transaction 1 still sees its original snapshot
let val1_again = txn1.get("default", b"counter")?;
assert_eq!(val1, val1_again); // Still "0"
```
Best Practices
Keep Transactions Short
```rust
// Bad: one long-running transaction
let mut txn = db.begin_transaction()?;
for i in 0..1_000_000 {
    txn.put("default", i.to_string().as_bytes(), b"value")?;
}
txn.commit()?;

// Good: batch commits
for chunk in (0..1_000_000).collect::<Vec<_>>().chunks(1000) {
    let mut txn = db.begin_transaction()?;
    for i in chunk {
        txn.put("default", i.to_string().as_bytes(), b"value")?;
    }
    txn.commit()?;
}
```
Handle Conflicts
```rust
loop {
    let mut txn = db.begin_transaction()?;

    // Read-modify-write
    let val = txn.get("default", b"counter")?.unwrap_or_default();
    let new_val = increment(val);
    txn.put("default", b"counter", &new_val)?;

    match txn.commit() {
        Ok(_) => break,
        Err(Error::Transaction(_)) => continue, // Retry on conflict
        Err(e) => return Err(e),
    }
}
```
Use Snapshots for Consistent Reads
For read-only operations across multiple keys, use snapshots (coming soon):
```rust
let snapshot = db.snapshot()?;
let val1 = snapshot.get("records", b"key1")?;
let val2 = snapshot.get("records", b"key2")?;
// val1 and val2 are from the same consistent point in time
```
Limitations
- Transactions are single-threaded (one transaction per thread)
- Cross-column-family transactions are supported
- Very large transactions may impact performance
Caching Strategy
OpenDB uses an LRU (Least Recently Used) cache to accelerate reads while maintaining consistency.
Cache Architecture
```
┌───────────────────────────────────┐
│            Application            │
└─────────────────┬─────────────────┘
                  │
              Read/Write
                  │
┌─────────────────▼─────────────────┐
│             LRU Cache             │
│   ┌──────┬──────┬──────┬──────┐   │
│   │ Hot1 │ Hot2 │ Hot3 │ Hot4 │   │
│   └──────┴──────┴──────┴──────┘   │
└─────────────────┬─────────────────┘
                  │
           Cache Miss/Write
                  │
┌─────────────────▼─────────────────┐
│          Storage Backend          │
│             (RocksDB)             │
└───────────────────────────────────┘
```
Write-Through Policy
All writes go to storage first, then update the cache:
```rust
pub fn put(&self, key: &[u8], value: &[u8]) -> Result<()> {
    // 1. Write to storage (ensures durability)
    self.storage.put(ColumnFamilies::DEFAULT, key, value)?;

    // 2. Update cache
    self.cache.insert(key.to_vec(), value.to_vec());
    Ok(())
}
```
Why Write-Through?
- ✅ Durability: Data is persisted immediately
- ✅ Consistency: Cache never has uncommitted data
- ❌ Slower writes: Every write hits disk
Alternative: Write-Back
- ✅ Faster writes (batch to disk later)
- ❌ Risk of data loss if crash before flush
- ❌ More complex consistency model
Cache Invalidation
Deletes remove from both cache and storage:
```rust
pub fn delete(&self, key: &[u8]) -> Result<()> {
    // 1. Delete from storage
    self.storage.delete(ColumnFamilies::DEFAULT, key)?;

    // 2. Invalidate cache
    self.cache.invalidate(&key.to_vec());
    Ok(())
}
```
LRU Eviction
When cache reaches capacity, least-recently-used items are evicted:
Cache (capacity = 3):
```
Put("A", "1") → [A]
Put("B", "2") → [B, A]
Put("C", "3") → [C, B, A]
Get("A")      → [A, C, B]   # A is now most recent
Put("D", "4") → [D, A, C]   # B evicted (LRU)
```
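The trace above can be reproduced with a tiny LRU built on a `VecDeque`. This is an illustration of the eviction policy only; OpenDB uses a proper `LruCache` type rather than this linear-scan sketch:

```rust
use std::collections::VecDeque;

// Minimal LRU for illustration; front of the deque = most recently used.
struct TinyLru { cap: usize, items: VecDeque<(String, String)> }

impl TinyLru {
    fn new(cap: usize) -> Self { Self { cap, items: VecDeque::new() } }

    fn put(&mut self, k: &str, v: &str) {
        self.items.retain(|(key, _)| key != k);
        self.items.push_front((k.to_string(), v.to_string()));
        if self.items.len() > self.cap {
            self.items.pop_back(); // evict the least recently used entry
        }
    }

    fn get(&mut self, k: &str) -> Option<String> {
        let pos = self.items.iter().position(|(key, _)| key == k)?;
        let entry = self.items.remove(pos)?;
        let val = entry.1.clone();
        self.items.push_front(entry); // promote to most recent
        Some(val)
    }

    fn keys(&self) -> Vec<&str> {
        self.items.iter().map(|(k, _)| k.as_str()).collect()
    }
}

fn main() {
    let mut cache = TinyLru::new(3);
    cache.put("A", "1");
    cache.put("B", "2");
    cache.put("C", "3");
    let _ = cache.get("A"); // A becomes most recent
    cache.put("D", "4");    // B (least recently used) is evicted
    assert_eq!(cache.keys(), vec!["D", "A", "C"]);
    assert_eq!(cache.get("B"), None);
}
```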
Cache Sizes
Default cache sizes:
```rust
pub struct OpenDBOptions {
    pub kv_cache_size: usize,     // Default: 1000
    pub record_cache_size: usize, // Default: 500
}
```
Tuning Cache Size
```rust
let mut options = OpenDBOptions::default();
options.kv_cache_size = 10_000;    // More KV entries
options.record_cache_size = 2_000; // More Memory records

let db = OpenDB::open_with_options("./db", options)?;
```
Guidelines:
- Small cache (100-1000): Low memory, high cache miss rate
- Medium cache (1000-10000): Balanced for most workloads
- Large cache (10000+): High memory, low cache miss rate
Cache Hit Rates
Monitor effectiveness (metrics to be added):
Hit Rate = Cache Hits / Total Reads
- > 80%: Excellent, cache is effective
- 50-80%: Good, consider increasing size
- < 50%: Poor, increase cache or review access patterns
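Until OpenDB exposes these metrics, the calculation is simple enough to do from your own counters (the counter names here are hypothetical):

```rust
// Hit rate from application-side counters (hypothetical; OpenDB does
// not yet expose hit/miss statistics).
fn hit_rate(hits: u64, total_reads: u64) -> f64 {
    if total_reads == 0 {
        return 0.0;
    }
    hits as f64 / total_reads as f64
}

fn main() {
    let rate = hit_rate(850, 1000);
    assert!(rate > 0.8); // falls in the "excellent" band above
    println!("hit rate: {:.1}%", rate * 100.0);
}
```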
Multi-Level Caching
OpenDB has two cache levels:
- Application Cache (LRU): In-process, fast
- RocksDB Block Cache: Built into RocksDB, shared
RocksDB Block Cache
RocksDB has its own block cache (not exposed in current API):
```rust
// Future tuning option
opts.set_block_cache_size(256 * 1024 * 1024); // 256 MB
```
Concurrent Access
Caches use parking_lot::RwLock for thread safety:
```rust
pub struct LruMemoryCache<K, V> {
    cache: RwLock<LruCache<K, V>>,
}
```
- Reads: Multiple concurrent readers
- Writes: Exclusive lock during insert/evict
Cache Coherency Guarantees
- Write Visibility: Writes are immediately visible after `put()` returns
- Delete Visibility: Deletes are immediately visible after `delete()` returns
- Transaction Isolation: Transactions bypass the cache (reads come from the storage snapshot)
Best Practices
Warm Up Cache
```rust
// Preload important data
let important_ids = vec!["mem_001", "mem_002", "mem_003"];
for id in important_ids {
    db.get_memory(id)?; // Populate cache
}
```
Avoid Thrashing
```rust
// Bad: random access pattern, poor cache hit rate
for _ in 0..1_000_000 {
    let random_key = generate_random_key();
    db.get(&random_key)?;
}

// Good: sequential or localized access
for i in 0..1000 {
    db.get(format!("key_{}", i).as_bytes())?;
}
```
Cache Bypass for Large Scans
For scanning large datasets, consider bypassing cache (future feature):
```rust
// Future API
db.scan_prefix_no_cache(b"prefix")?;
```
Key-Value Store API
OpenDB provides a simple, fast key-value interface for storing arbitrary binary data.
Basic Operations
Put
Store a value under a key:
```rust
use opendb::OpenDB;

let db = OpenDB::open("./db")?;
db.put(b"user:123", b"Alice")?;
```
Signature:
#![allow(unused)] fn main() { pub fn put(&self, key: &[u8], value: &[u8]) -> Result<()> }
Behavior:
- Writes to storage immediately (write-through cache)
- Updates cache
- Returns error if storage fails
Get
Retrieve a value by key:
```rust
let value = db.get(b"user:123")?;
match value {
    Some(bytes) => println!("Found: {}", String::from_utf8_lossy(&bytes)),
    None => println!("Not found"),
}
```
Signature:
#![allow(unused)] fn main() { pub fn get(&self, key: &[u8]) -> Result<Option<Vec<u8>>> }
Behavior:
- Checks cache first (fast path)
- Falls back to storage on cache miss
- Returns `None` if the key doesn't exist
Delete
Remove a key-value pair:
#![allow(unused)] fn main() { db.delete(b"user:123")?; }
Signature:
#![allow(unused)] fn main() { pub fn delete(&self, key: &[u8]) -> Result<()> }
Behavior:
- Removes from storage
- Invalidates cache entry
- Succeeds even if key doesn't exist
Exists
Check if a key exists without fetching the value:
#![allow(unused)] fn main() { if db.exists(b"user:123")? { println!("User exists"); } }
Signature:
#![allow(unused)] fn main() { pub fn exists(&self, key: &[u8]) -> Result<bool> }
Behavior:
- Checks cache first
- Falls back to storage on cache miss
- More efficient than `get()` for existence checks
Advanced Operations
Scan Prefix
Iterate over all keys with a common prefix:
```rust
let users = db.scan_prefix(b"user:")?;
for (key, value) in users {
    println!(
        "{} = {}",
        String::from_utf8_lossy(&key),
        String::from_utf8_lossy(&value)
    );
}
```
Signature:
#![allow(unused)] fn main() { pub fn scan_prefix(&self, prefix: &[u8]) -> Result<Vec<(Vec<u8>, Vec<u8>)>> }
Behavior:
- Bypasses cache (reads from storage)
- Returns all matching key-value pairs
- Sorted by key (lexicographic order)
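These semantics (start at the prefix, stop at the first non-matching key, results key-sorted) can be modeled over any sorted map. A sketch using `BTreeMap` as a stand-in for the sorted key space; OpenDB actually scans RocksDB, not an in-memory map:

```rust
use std::collections::BTreeMap;

// Prefix-scan semantics over sorted keys (illustration only).
fn scan_prefix<'a>(
    map: &'a BTreeMap<Vec<u8>, Vec<u8>>,
    prefix: &[u8],
) -> Vec<(&'a [u8], &'a [u8])> {
    map.range(prefix.to_vec()..) // seek to the first key >= prefix
        .take_while(|(k, _)| k.starts_with(prefix)) // stop past the prefix
        .map(|(k, v)| (k.as_slice(), v.as_slice()))
        .collect()
}

fn main() {
    let mut map = BTreeMap::new();
    map.insert(b"session:abc".to_vec(), b"u1".to_vec());
    map.insert(b"user:123".to_vec(), b"Alice".to_vec());
    map.insert(b"user:456".to_vec(), b"Bob".to_vec());

    let users = scan_prefix(&map, b"user:");
    assert_eq!(users.len(), 2);
    assert_eq!(users[0].0, &b"user:123"[..]); // results come back key-sorted
}
```

Because the underlying keys are sorted, the scan touches only the contiguous range of matching keys rather than the whole keyspace.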
Usage Patterns
Namespacing
Use prefixes to organize data:
```rust
// User namespace
db.put(b"user:123", b"Alice")?;
db.put(b"user:456", b"Bob")?;

// Session namespace
db.put(b"session:abc", b"user:123")?;
db.put(b"session:xyz", b"user:456")?;

// Scan all users
let users = db.scan_prefix(b"user:")?;
```
Counter
Implement atomic counters with transactions:
```rust
fn increment_counter(db: &OpenDB, key: &[u8]) -> Result<u64> {
    let mut txn = db.begin_transaction()?;
    let current = txn.get("default", key)?
        .map(|v| u64::from_le_bytes(v.try_into().unwrap()))
        .unwrap_or(0);
    let new_val = current + 1;
    txn.put("default", key, &new_val.to_le_bytes())?;
    txn.commit()?;
    Ok(new_val)
}

let count = increment_counter(&db, b"visits")?;
```
Binary Data
Store any serializable type:
```rust
use serde::{Serialize, Deserialize};

#[derive(Serialize, Deserialize)]
struct Config {
    host: String,
    port: u16,
}

let config = Config {
    host: "localhost".to_string(),
    port: 8080,
};

// Serialize
let bytes = bincode::serialize(&config)?;
db.put(b"config", &bytes)?;

// Deserialize
let bytes = db.get(b"config")?.unwrap();
let config: Config = bincode::deserialize(&bytes)?;
```
Performance Characteristics
| Operation | Time Complexity | Cache Hit | Cache Miss |
|---|---|---|---|
| `get()` | O(1) avg | ~100 ns | ~1-10 µs |
| `put()` | O(log n) | ~1-10 µs | ~1-10 µs |
| `delete()` | O(log n) | ~1-10 µs | ~1-10 µs |
| `exists()` | O(1) avg | ~100 ns | ~1-10 µs |
| `scan_prefix()` | O(k log n) | N/A | ~10 µs + k×1 µs |
Where:
- `n` = total keys in the database
- `k` = number of matching keys
Error Handling
All operations return Result<T, Error>:
```rust
use opendb::{OpenDB, Error};

match db.get(b"key") {
    Ok(Some(value)) => { /* use value */ },
    Ok(None) => { /* key not found */ },
    Err(Error::Storage(e)) => { /* storage error */ },
    Err(Error::Cache(e)) => { /* cache error */ },
    Err(e) => { /* other error */ },
}
```
Thread Safety
All KV operations are thread-safe:
```rust
use std::sync::Arc;
use std::thread;

let db = Arc::new(OpenDB::open("./db")?);

let handles: Vec<_> = (0..10).map(|i| {
    let db = Arc::clone(&db);
    thread::spawn(move || {
        db.put(format!("key_{}", i).as_bytes(), b"value").unwrap();
    })
}).collect();

for handle in handles {
    handle.join().unwrap();
}
```
Records API
The Records API manages structured Memory objects with metadata, timestamps, and embeddings.
Memory Type
```rust
pub struct Memory {
    pub id: String,
    pub content: String,
    pub embedding: Vec<f32>,
    pub importance: f64,
    pub timestamp: i64,
    pub metadata: HashMap<String, String>,
}
```
Creating Memories
New Memory
```rust
use opendb::{OpenDB, Memory};

let memory = Memory::new(
    "mem_001".to_string(),
    "User asked about Rust ownership".to_string(),
);
```
With Metadata
```rust
let memory = Memory::new("mem_002".to_string(), "Content".to_string())
    .with_metadata("category", "conversation")
    .with_metadata("user_id", "123");
```
Custom Builder
```rust
use std::collections::HashMap;

let mut metadata = HashMap::new();
metadata.insert("priority".to_string(), "high".to_string());

let memory = Memory {
    id: "mem_003".to_string(),
    content: "Important note".to_string(),
    embedding: vec![0.1, 0.2, 0.3], // 3D for demo
    importance: 0.95,
    timestamp: chrono::Utc::now().timestamp(),
    metadata,
};
```
CRUD Operations
Insert
```rust
let db = OpenDB::open("./db")?;
let memory = Memory::new("mem_001".to_string(), "Hello world".to_string());
db.insert_memory(&memory)?;
```
Signature:
#![allow(unused)] fn main() { pub fn insert_memory(&self, memory: &Memory) -> Result<()> }
Behavior:
- Serializes with `rkyv` (zero-copy)
- Writes to the `records` column family
- Updates cache
- If the embedding is non-empty, stores it in the vector index (requires rebuild for search)
Get
```rust
let memory = db.get_memory("mem_001")?;
match memory {
    Some(mem) => println!("Content: {}", mem.content),
    None => println!("Not found"),
}
```
Signature:
#![allow(unused)] fn main() { pub fn get_memory(&self, id: &str) -> Result<Option<Memory>> }
Behavior:
- Checks cache first
- Deserializes from storage on cache miss
- Returns `None` if not found
Update
```rust
let mut memory = db.get_memory("mem_001")?.unwrap();
memory.content = "Updated content".to_string();
memory.importance = 0.9;
memory.touch(); // Update timestamp

db.insert_memory(&memory)?; // Upsert
```
Note: insert_memory() acts as upsert (update if exists, insert if not).
Delete
#![allow(unused)] fn main() { db.delete_memory("mem_001")?; }
Signature:
#![allow(unused)] fn main() { pub fn delete_memory(&self, id: &str) -> Result<()> }
Behavior:
- Removes from storage
- Invalidates cache
- Does not remove from vector index (requires rebuild)
- Does not remove graph edges (handle separately)
Listing Operations
List All IDs
#![allow(unused)] fn main() { let ids = db.list_memory_ids()?; for id in ids { println!("Memory ID: {}", id); } }
Signature:
#![allow(unused)] fn main() { pub fn list_memory_ids(&self) -> Result<Vec<String>> }
List All Memories
#![allow(unused)] fn main() { let memories = db.list_memories()?; for memory in memories { println!("{}: {}", memory.id, memory.content); } }
Signature:
#![allow(unused)] fn main() { pub fn list_memories(&self) -> Result<Vec<Memory>> }
Warning: Loads all memories into memory. For large datasets, use pagination (not yet implemented) or filter by prefix.
Advanced Usage
Importance Filtering
#![allow(unused)] fn main() { let memories = db.list_memories()?; let important: Vec<_> = memories.into_iter() .filter(|m| m.importance > 0.8) .collect(); }
Metadata Queries
#![allow(unused)] fn main() { let memories = db.list_memories()?; let category_matches: Vec<_> = memories.into_iter() .filter(|m| { m.metadata.get("category") .map(|v| v == "conversation") .unwrap_or(false) }) .collect(); }
Time Range Queries
#![allow(unused)] fn main() { use chrono::{Utc, Duration}; let one_hour_ago = (Utc::now() - Duration::hours(1)).timestamp(); let recent: Vec<_> = db.list_memories()?.into_iter() .filter(|m| m.timestamp > one_hour_ago) .collect(); }
Embeddings
Setting Embeddings
Embeddings enable semantic search:
```rust
let embedding = generate_embedding("Hello world"); // Your embedding model

let memory = Memory {
    id: "mem_001".to_string(),
    content: "Hello world".to_string(),
    embedding, // Vec<f32>
    ..Default::default()
};

db.insert_memory(&memory)?;
```
Dimension Requirements
All embeddings must have the same dimension (default 384):
```rust
use opendb::OpenDBOptions;

let mut options = OpenDBOptions::default();
options.vector_dimension = 768; // For larger models

let db = OpenDB::open_with_options("./db", options)?;
```
Searching Embeddings
See Vector API for semantic search.
Touch Timestamp
Update access time without modifying content:
```rust
let mut memory = db.get_memory("mem_001")?.unwrap();
memory.touch(); // Sets timestamp to now
db.insert_memory(&memory)?;
```
Default Values
```rust
impl Default for Memory {
    fn default() -> Self {
        Self {
            id: String::new(),
            content: String::new(),
            embedding: Vec::new(),
            importance: 0.5,
            timestamp: chrono::Utc::now().timestamp(),
            metadata: HashMap::new(),
        }
    }
}
```
Performance Tips
- Batch Inserts: Use transactions for multiple inserts:
```rust
let mut txn = db.begin_transaction()?;
for memory in memories {
    // Insert via transaction (lower-level API needed)
}
txn.commit()?;
```
- Cache Warm-Up: Preload frequently accessed memories:
```rust
for id in important_ids {
    db.get_memory(id)?; // Populate cache
}
```
- Lazy Embedding Generation: Only generate embeddings when needed for search:
```rust
let memory = Memory::new(id, content);
// Don't set an embedding unless search is required
db.insert_memory(&memory)?;
```
Error Handling
#![allow(unused)] fn main() { use opendb::Error; match db.get_memory("mem_001") { Ok(Some(memory)) => { /* use memory */ }, Ok(None) => { /* not found */ }, Err(Error::Codec(_)) => { /* deserialization error */ }, Err(Error::Storage(_)) => { /* storage error */ }, Err(e) => { /* other error */ }, } }
Graph API
OpenDB provides a labeled property graph for modeling relationships between memories.
Core Concepts
- Nodes: `Memory` objects (referenced by ID)
- Edges: Directed relationships with labels and weights
- Relations: String labels like `"causes"`, `"before"`, `"similar_to"`
Edge Type
```rust
pub struct Edge {
    pub from: String,
    pub relation: String,
    pub to: String,
    pub weight: f64,
    pub timestamp: i64,
}
```
Linking Memories
Basic Link
```rust
use opendb::{OpenDB, Memory};

let db = OpenDB::open("./db")?;

// Create two memories
let mem1 = Memory::new("mem_001".to_string(), "Rust is fast".to_string());
let mem2 = Memory::new("mem_002".to_string(), "C++ is fast".to_string());
db.insert_memory(&mem1)?;
db.insert_memory(&mem2)?;

// Link them
db.link("mem_001", "mem_002", "similar_to")?;
```
Signature:
#![allow(unused)] fn main() { pub fn link(&self, from: &str, to: &str, relation: &str) -> Result<()> }
Behavior:
- Creates a directed edge `from → to`
- Default weight: 1.0
- Stores in both forward and backward indexes
- Allows multiple relations between same nodes
Custom Weight
```rust
use opendb::Edge;

let edge = Edge {
    from: "mem_001".to_string(),
    relation: "causes".to_string(),
    to: "mem_002".to_string(),
    weight: 0.85, // Custom confidence score
    timestamp: chrono::Utc::now().timestamp(),
};

// Link via the graph manager (internal API; use link() for simple cases)
```
Unlinking
Remove a specific relationship:
#![allow(unused)] fn main() { db.unlink("mem_001", "mem_002", "similar_to")?; }
Signature:
#![allow(unused)] fn main() { pub fn unlink(&self, from: &str, to: &str, relation: &str) -> Result<()> }
Behavior:
- Removes edge from both indexes
- Succeeds even if edge doesn't exist
- Does not delete the nodes
Querying Relationships
Get All Related Nodes
#![allow(unused)] fn main() { let related = db.get_related("mem_001", "similar_to")?; for edge in related { println!("{} --[{}]--> {} (weight: {})", edge.from, edge.relation, edge.to, edge.weight); } }
Signature:
#![allow(unused)] fn main() { pub fn get_related(&self, id: &str, relation: &str) -> Result<Vec<Edge>> }
Returns: All edges from id with the specified relation.
Get Outgoing Edges
#![allow(unused)] fn main() { let outgoing = db.get_outgoing("mem_001")?; for edge in outgoing { println!("Outgoing: {} --[{}]--> {}", edge.from, edge.relation, edge.to); } }
Signature:
#![allow(unused)] fn main() { pub fn get_outgoing(&self, id: &str) -> Result<Vec<Edge>> }
Returns: All edges where id is the source (all relations).
Get Incoming Edges
#![allow(unused)] fn main() { let incoming = db.get_incoming("mem_002")?; for edge in incoming { println!("Incoming: {} --[{}]--> {}", edge.from, edge.relation, edge.to); } }
Signature:
#![allow(unused)] fn main() { pub fn get_incoming(&self, id: &str) -> Result<Vec<Edge>> }
Returns: All edges where id is the target (all relations).
Relation Types
OpenDB provides predefined relation constants:
#![allow(unused)] fn main() { pub mod relation { pub const RELATED_TO: &str = "related_to"; pub const CAUSED_BY: &str = "caused_by"; pub const BEFORE: &str = "before"; pub const AFTER: &str = "after"; pub const REFERENCES: &str = "references"; pub const SIMILAR_TO: &str = "similar_to"; pub const CONTRADICTS: &str = "contradicts"; pub const SUPPORTS: &str = "supports"; } }
Usage
#![allow(unused)] fn main() { use opendb::graph::relation; db.link("mem_001", "mem_002", relation::CAUSED_BY)?; db.link("mem_002", "mem_003", relation::BEFORE)?; }
Custom Relations
You can use any string as a relation:
#![allow(unused)] fn main() { db.link("mem_001", "mem_002", "depends_on")?; db.link("mem_003", "mem_004", "implements")?; }
Graph Patterns
Temporal Chain
#![allow(unused)] fn main() { use opendb::graph::relation; // Build timeline db.link("event_1", "event_2", relation::BEFORE)?; db.link("event_2", "event_3", relation::BEFORE)?; db.link("event_3", "event_4", relation::BEFORE)?; // Traverse forward let next_events = db.get_related("event_1", relation::BEFORE)?; }
Causal Graph
#![allow(unused)] fn main() { use opendb::graph::relation; // A causes B, B causes C db.link("symptom_A", "symptom_B", relation::CAUSED_BY)?; db.link("symptom_B", "symptom_C", relation::CAUSED_BY)?; // Find root causes let causes = db.get_incoming("symptom_C")?; }
Knowledge Graph
#![allow(unused)] fn main() { use opendb::graph::relation; // Rust has ownership db.link("rust", "ownership", "has_feature")?; // Ownership enables memory_safety db.link("ownership", "memory_safety", "enables")?; // Memory_safety prevents bugs db.link("memory_safety", "bug_prevention", "prevents")?; // Traverse features let features = db.get_related("rust", "has_feature")?; }
Bidirectional Relationships
#![allow(unused)] fn main() { // A is similar to B db.link("mem_A", "mem_B", "similar_to")?; // B is also similar to A db.link("mem_B", "mem_A", "similar_to")?; // Query either direction let similar_from_A = db.get_related("mem_A", "similar_to")?; let similar_from_B = db.get_related("mem_B", "similar_to")?; }
Advanced Queries
Multi-Hop Traversal
#![allow(unused)] fn main() { fn traverse_depth_2(db: &OpenDB, start: &str, relation: &str) -> Result<Vec<String>> { let mut result = Vec::new(); // First hop let hop1 = db.get_related(start, relation)?; for edge1 in hop1 { result.push(edge1.to.clone()); // Second hop let hop2 = db.get_related(&edge1.to, relation)?; for edge2 in hop2 { result.push(edge2.to.clone()); } } Ok(result) } }
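The depth-2 helper above can be generalized to any depth with a breadth-first traversal that also deduplicates visited nodes (the version above can revisit a node reachable by two paths). This sketch uses a plain adjacency map in place of repeated `get_related` calls against the database:

```rust
use std::collections::{HashMap, HashSet, VecDeque};

// BFS to a maximum depth over an adjacency map; the map stands in for
// repeated `get_related` calls. Each node is reported at most once.
fn traverse_to_depth(
    graph: &HashMap<String, Vec<String>>,
    start: &str,
    max_depth: usize,
) -> Vec<String> {
    let mut visited: HashSet<String> = HashSet::new();
    let mut result = Vec::new();
    let mut queue = VecDeque::new();
    queue.push_back((start.to_string(), 0));
    visited.insert(start.to_string());

    while let Some((node, depth)) = queue.pop_front() {
        if depth >= max_depth {
            continue; // depth budget exhausted for this branch
        }
        for next in graph.get(&node).into_iter().flatten() {
            if visited.insert(next.clone()) {
                result.push(next.clone());
                queue.push_back((next.clone(), depth + 1));
            }
        }
    }
    result
}

fn main() {
    let mut g = HashMap::new();
    g.insert("a".to_string(), vec!["b".to_string()]);
    g.insert("b".to_string(), vec!["c".to_string(), "a".to_string()]);
    // The back-edge b -> a is ignored because "a" was already visited.
    println!("{:?}", traverse_to_depth(&g, "a", 2)); // ["b", "c"]
}
```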
Filter by Weight
#![allow(unused)] fn main() { let edges = db.get_related("mem_001", "similar_to")?; let strong_edges: Vec<_> = edges.into_iter() .filter(|e| e.weight > 0.8) .collect(); }
Aggregate Relations
#![allow(unused)] fn main() { use std::collections::HashMap; let outgoing = db.get_outgoing("mem_001")?; let mut relation_counts: HashMap<String, usize> = HashMap::new(); for edge in outgoing { *relation_counts.entry(edge.relation).or_insert(0) += 1; } println!("Relation distribution: {:?}", relation_counts); }
Performance Characteristics
| Operation | Time Complexity | Notes |
|---|---|---|
| link() | O(log n) | Two index writes (forward + backward) |
| unlink() | O(k log n) | k = edges between nodes |
| get_related() | O(log n + k) | k = matching edges |
| get_outgoing() | O(log n + k) | k = total outgoing edges |
| get_incoming() | O(log n + k) | k = total incoming edges |
Storage Details
Edges are stored in two column families:
- graph_forward: `{from}:{relation}` → `Vec<Edge>`
- graph_backward: `{to}:{relation}` → `Vec<Edge>`
This dual-indexing enables fast queries in both directions.
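The dual-index layout can be sketched with in-memory maps. This is a simplified stand-in for the column families described above, not OpenDB's actual internals: the same edge is written under its source in the forward index and under its target in the backward index, so queries in either direction are a single lookup.

```rust
use std::collections::HashMap;

#[derive(Clone, Debug, PartialEq)]
struct Edge {
    from: String,
    relation: String,
    to: String,
}

#[derive(Default)]
struct GraphIndex {
    forward: HashMap<String, Vec<Edge>>,  // key: "{from}:{relation}"
    backward: HashMap<String, Vec<Edge>>, // key: "{to}:{relation}"
}

impl GraphIndex {
    // Each link writes the edge into both indexes.
    fn link(&mut self, from: &str, to: &str, relation: &str) {
        let edge = Edge {
            from: from.into(),
            relation: relation.into(),
            to: to.into(),
        };
        self.forward
            .entry(format!("{}:{}", from, relation))
            .or_default()
            .push(edge.clone());
        self.backward
            .entry(format!("{}:{}", to, relation))
            .or_default()
            .push(edge);
    }

    fn related(&self, id: &str, relation: &str) -> &[Edge] {
        self.forward
            .get(&format!("{}:{}", id, relation))
            .map(Vec::as_slice)
            .unwrap_or(&[])
    }

    fn incoming(&self, id: &str, relation: &str) -> &[Edge] {
        self.backward
            .get(&format!("{}:{}", id, relation))
            .map(Vec::as_slice)
            .unwrap_or(&[])
    }
}

fn main() {
    let mut g = GraphIndex::default();
    g.link("mem_001", "mem_002", "similar_to");
    assert_eq!(g.related("mem_001", "similar_to").len(), 1);
    assert_eq!(g.incoming("mem_002", "similar_to").len(), 1);
}
```

The cost of this design is write amplification: every link is two writes, which is why `link()` is listed as two index writes in the performance table.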
Error Handling
#![allow(unused)] fn main() { use opendb::Error; match db.link("mem_001", "mem_002", "related_to") { Ok(_) => println!("Link created"), Err(Error::Storage(_)) => println!("Storage error"), Err(Error::Graph(_)) => println!("Graph error"), Err(e) => println!("Other error: {}", e), } }
Next
Vector Search API
OpenDB provides semantic similarity search using HNSW (Hierarchical Navigable Small World) index.
Overview
Vector search enables finding memories based on semantic similarity rather than exact matches:
#![allow(unused)] fn main() { use opendb::OpenDB; let db = OpenDB::open("./db")?; // Insert memories with embeddings let memory = Memory { id: "mem_001".to_string(), content: "Rust is a systems programming language".to_string(), embedding: generate_embedding("Rust is a systems programming language"), ..Default::default() }; db.insert_memory(&memory)?; // Search by query embedding let query_embedding = generate_embedding("What is Rust?"); let results = db.search_similar(&query_embedding, 5)?; }
Search Similar
Find memories similar to a query vector:
#![allow(unused)] fn main() { let results = db.search_similar(&query_embedding, top_k)?; for result in results { println!("ID: {}, Distance: {}", result.id, result.distance); let memory = db.get_memory(&result.id)?.unwrap(); println!("Content: {}", memory.content); } }
Signature:
#![allow(unused)] fn main() { pub fn search_similar(&self, query: &[f32], top_k: usize) -> Result<Vec<SearchResult>> }
Parameters:
- `query`: Query vector (must match the configured dimension)
- `top_k`: Number of results to return
Returns: Vec<SearchResult> sorted by distance (closest first).
SearchResult Type
#![allow(unused)] fn main() { pub struct SearchResult { pub id: String, pub distance: f32, } }
- id: Memory ID
- distance: Euclidean distance (lower = more similar)
Embeddings
Dimension Configuration
Set embedding dimension when opening database:
#![allow(unused)] fn main() {
use opendb::OpenDBOptions;

let mut options = OpenDBOptions::default();
options.vector_dimension = 768; // e.g. for 768-dim models such as all-mpnet-base-v2
let db = OpenDB::open_with_options("./db", options)?;
}
Default: 384 (for sentence-transformers/all-MiniLM-L6-v2)
Generating Embeddings
OpenDB does not include embedding generation. Use external models:
Example: sentence-transformers (Python)
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('all-MiniLM-L6-v2')
embedding = model.encode("Hello world").tolist() # [0.1, -0.2, ...]
Example: OpenAI API
#![allow(unused)] fn main() { // Pseudo-code (use openai-rust crate) let embedding = openai_client .embeddings("text-embedding-ada-002") .create("Hello world") .await?; }
Example: Candle (Rust)
#![allow(unused)] fn main() { // Use candle-transformers for local inference // See: https://github.com/huggingface/candle }
Synthetic Embeddings (Testing)
For testing without real models:
#![allow(unused)] fn main() {
fn generate_synthetic_embedding(text: &str, dimension: usize) -> Vec<f32> {
    use std::collections::hash_map::DefaultHasher;
    use std::hash::{Hash, Hasher};

    let mut hasher = DefaultHasher::new();
    text.hash(&mut hasher);
    let mut state = hasher.finish().max(1); // xorshift seed must be non-zero

    (0..dimension)
        .map(|_| {
            // xorshift64: a tiny deterministic PRNG, so the same text
            // always maps to the same vector (no rand crate needed)
            state ^= state << 13;
            state ^= state >> 7;
            state ^= state << 17;
            (state as f32 / u64::MAX as f32) * 2.0 - 1.0
        })
        .collect()
}
}
Index Management
Automatic Index Building
The HNSW index is built automatically on first search:
#![allow(unused)] fn main() { // Insert memories db.insert_memory(&memory1)?; db.insert_memory(&memory2)?; // First search triggers index build let results = db.search_similar(&query, 5)?; // Builds index here }
Manual Rebuild
Force index rebuild (e.g., after bulk inserts):
#![allow(unused)] fn main() { db.rebuild_vector_index()?; }
Signature:
#![allow(unused)] fn main() { pub fn rebuild_vector_index(&self) -> Result<()> }
When to rebuild:
- After bulk memory inserts
- After changing embeddings
- To incorporate deleted memories
Note: Search automatically rebuilds if index is stale.
HNSW Parameters
HNSW has tunable parameters for speed vs accuracy tradeoff:
Default Parameters
#![allow(unused)] fn main() { pub struct HnswParams { pub ef_construction: usize, // 200 pub max_neighbors: usize, // 16 } }
Presets
#![allow(unused)] fn main() { // High accuracy (slower build, better recall) HnswParams::high_accuracy() // ef=400, neighbors=32 // High speed (faster build, lower recall) HnswParams::high_speed() // ef=100, neighbors=8 // Balanced (default) HnswParams::default() // ef=200, neighbors=16 }
Note: Currently not exposed in OpenDB API. Future versions will allow tuning.
Distance Metric
OpenDB uses Euclidean distance:
$$ d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2} $$
Properties:
- Lower distance = more similar
- Distance 0 = identical vectors
- Sensitive to magnitude (normalize if needed)
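The formula above translates directly into a few lines of Rust; this is a generic sketch of the metric, not OpenDB's internal implementation:

```rust
// Euclidean distance: sqrt of the sum of squared component differences.
fn euclidean_distance(p: &[f32], q: &[f32]) -> f32 {
    assert_eq!(p.len(), q.len(), "vectors must share a dimension");
    p.iter().zip(q).map(|(a, b)| (a - b).powi(2)).sum::<f32>().sqrt()
}

fn main() {
    // Identical vectors have distance 0
    assert_eq!(euclidean_distance(&[1.0, 2.0], &[1.0, 2.0]), 0.0);
    // 3-4-5 right triangle
    assert_eq!(euclidean_distance(&[0.0, 0.0], &[3.0, 4.0]), 5.0);
}
```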
Normalization
For cosine similarity behavior, normalize embeddings:
#![allow(unused)] fn main() { fn normalize(vec: &mut Vec<f32>) { let magnitude: f32 = vec.iter().map(|x| x * x).sum::<f32>().sqrt(); for x in vec.iter_mut() { *x /= magnitude; } } let mut embedding = generate_embedding(text); normalize(&mut embedding); }
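The reason normalization gives cosine-like behavior: for unit vectors, the squared Euclidean distance is `2 * (1 - cos θ)`, so ranking by Euclidean distance is equivalent to ranking by cosine similarity. A small self-contained check:

```rust
fn normalize(v: &mut [f32]) {
    let mag: f32 = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if mag > 0.0 {
        for x in v.iter_mut() {
            *x /= mag;
        }
    }
}

fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

fn main() {
    let mut a = vec![3.0, 4.0];
    let mut b = vec![1.0, 0.0];
    normalize(&mut a); // (0.6, 0.8)
    normalize(&mut b); // (1.0, 0.0)

    // For unit vectors: ||a - b||^2 == 2 * (1 - cos(a, b))
    let d2: f32 = a.iter().zip(&b).map(|(x, y)| (x - y).powi(2)).sum();
    let cos = dot(&a, &b);
    assert!((d2 - 2.0 * (1.0 - cos)).abs() < 1e-6);
}
```

Note the zero-magnitude guard: dividing a zero vector by its magnitude would produce NaNs, which the snippet above avoids.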
Usage Patterns
Semantic Memory Search
#![allow(unused)] fn main() { // User asks a question let query = "How do I prevent memory leaks in Rust?"; let query_embedding = generate_embedding(query); // Find relevant memories let results = db.search_similar(&query_embedding, 3)?; for result in results { let memory = db.get_memory(&result.id)?.unwrap(); println!("Relevant memory: {}", memory.content); } }
Deduplication
Find duplicate or near-duplicate content:
#![allow(unused)] fn main() { let new_content = "Rust ownership prevents data races"; let new_embedding = generate_embedding(new_content); let similar = db.search_similar(&new_embedding, 1)?; if let Some(top) = similar.first() { if top.distance < 0.1 { // Threshold for "duplicate" println!("Similar content already exists: {}", top.id); } } }
Clustering
Group similar memories:
#![allow(unused)] fn main() { let all_memories = db.list_memories()?; let mut clusters: Vec<Vec<String>> = Vec::new(); for memory in all_memories { if memory.embedding.is_empty() { continue; } let similar = db.search_similar(&memory.embedding, 5)?; let cluster: Vec<String> = similar.iter() .filter(|r| r.distance < 0.5) // Similarity threshold .map(|r| r.id.clone()) .collect(); clusters.push(cluster); } }
Performance Characteristics
| Operation | Time Complexity | Typical Latency |
|---|---|---|
| search_similar() | O(log n) | ~1-10ms |
| rebuild_vector_index() | O(n log n) | ~100ms per 1k vectors |
| Insert with embedding | O(1) + rebuild | Instant (rebuild deferred) |
Scalability:
- 100-1k memories: Instant search
- 1k-10k memories: <10ms search
- 10k-100k memories: <50ms search
- 100k+ memories: Consider sharding (future feature)
Limitations
- Dimension Mismatch: All embeddings must have same dimension
- No Incremental Updates: Index rebuild is full reconstruction
- Memory Usage: HNSW index kept in memory (~4 bytes × dimension × count)
- No GPU Support: Pure CPU implementation
Error Handling
#![allow(unused)] fn main() { use opendb::Error; match db.search_similar(&query, 10) { Ok(results) => { /* use results */ }, Err(Error::VectorIndex(e)) => println!("Index error: {}", e), Err(Error::InvalidInput(e)) => println!("Bad query: {}", e), Err(e) => println!("Other error: {}", e), } }
Best Practices
- Batch Inserts: Insert all memories, then rebuild once:
#![allow(unused)] fn main() { for memory in memories { db.insert_memory(&memory)?; } db.rebuild_vector_index()?; // One rebuild for all }
- Lazy Embeddings: Only generate embeddings for searchable content:
#![allow(unused)] fn main() { let memory = Memory::new(id, content); // Don't set embedding if this memory won't be searched db.insert_memory(&memory)?; }
- Relevance Filtering: Filter by distance threshold:
#![allow(unused)] fn main() { let results = db.search_similar(&query, 20)?; let relevant: Vec<_> = results.into_iter() .filter(|r| r.distance < 1.0) // Adjust threshold .collect(); }
- Combine with Metadata: Use metadata to post-filter:
#![allow(unused)] fn main() { let results = db.search_similar(&query, 50)?; for result in results { let memory = db.get_memory(&result.id)?.unwrap(); if memory.metadata.get("category") == Some(&"docs".to_string()) { println!("Relevant doc: {}", memory.content); } } }
Next
Multimodal File Support
OpenDB provides production-ready support for multimodal file processing, designed specifically for AI/LLM applications, RAG (Retrieval Augmented Generation) pipelines, and agent memory systems.
Overview
The multimodal API enables you to:
- Detect and classify file types (PDF, DOCX, audio, video, text)
- Process and chunk large documents
- Store extracted text with embeddings
- Track processing status for async workflows
- Add custom metadata for any file type
File Type Detection
FileType Enum
The FileType enum represents supported file formats:
#![allow(unused)] fn main() { use opendb::FileType; // Automatic detection from file extension let pdf_type = FileType::from_extension("pdf"); assert_eq!(pdf_type, FileType::Pdf); let audio_type = FileType::from_extension("mp3"); assert_eq!(audio_type, FileType::Audio); // Get human-readable description println!("{}", pdf_type.description()); // "PDF document" println!("{}", audio_type.description()); // "Audio file" }
Supported File Types
| FileType | Extensions | Description |
|---|---|---|
| Text | .txt | Plain text file |
| Pdf | .pdf | PDF document |
| Docx | .docx | Microsoft Word document |
| Audio | .mp3, .wav, .ogg, .flac | Audio file |
| Video | .mp4, .avi, .mkv, .mov | Video file |
| Image | .jpg, .png, .gif, .bmp | Image file |
| Unknown | others | Unknown file type |
Example: File Type Detection
#![allow(unused)] fn main() { use opendb::FileType; fn detect_file_type(filename: &str) -> FileType { let extension = filename .rsplit('.') .next() .unwrap_or(""); FileType::from_extension(extension) } // Usage let file = "research_paper.pdf"; let file_type = detect_file_type(file); match file_type { FileType::Pdf => println!("Processing PDF document"), FileType::Audio => println!("Transcribing audio file"), FileType::Video => println!("Extracting video captions"), _ => println!("Unsupported file type"), } }
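One caveat with the `rsplit('.')` approach above: for a filename with no dot at all, `rsplit('.').next()` returns the whole filename, which would then be treated as an extension. A safer sketch (the helper name is ours, not part of the OpenDB API) uses `rsplit_once`:

```rust
// Returns the extension only when a '.' is actually present, so "README"
// yields None instead of being treated as an extension. Dotfiles such as
// ".gitignore" are also rejected (empty stem).
fn file_extension(filename: &str) -> Option<&str> {
    match filename.rsplit_once('.') {
        Some((stem, ext)) if !stem.is_empty() => Some(ext),
        _ => None,
    }
}

fn main() {
    assert_eq!(file_extension("paper.pdf"), Some("pdf"));
    assert_eq!(file_extension("archive.tar.gz"), Some("gz"));
    assert_eq!(file_extension("README"), None);
    assert_eq!(file_extension(".gitignore"), None);
}
```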
Multimodal Documents
MultimodalDocument Structure
The MultimodalDocument struct represents a processed file with extracted content:
#![allow(unused)] fn main() { pub struct MultimodalDocument { pub id: String, pub filename: String, pub file_type: FileType, pub file_size: usize, pub extracted_text: String, pub chunks: Vec<DocumentChunk>, pub embedding: Option<Vec<f32>>, pub metadata: HashMap<String, String>, pub processing_status: ProcessingStatus, pub created_at: DateTime<Utc>, pub updated_at: DateTime<Utc>, } }
CRUD Operations
Create
#![allow(unused)] fn main() { use opendb::{MultimodalDocument, FileType}; // Create a new multimodal document let doc = MultimodalDocument::new( "doc_001", // Unique ID "research_paper.pdf", // Filename FileType::Pdf, // File type 1024 * 500, // File size in bytes (500 KB) "Extracted text content...", // Extracted text vec![0.1; 384], // Document embedding (384-dim) ); // Add metadata let doc = doc .with_metadata("author", "Dr. Jane Smith") .with_metadata("pages", "25") .with_metadata("year", "2024") .with_metadata("category", "machine-learning"); println!("Created document: {}", doc.id); println!("Status: {:?}", doc.processing_status); }
Read
#![allow(unused)] fn main() {
// Access document properties
println!("Filename: {}", doc.filename);
println!("File type: {:?}", doc.file_type);
println!("File size: {} KB", doc.file_size / 1024);
println!("Extracted text length: {} chars", doc.extracted_text.len());
println!("Number of chunks: {}", doc.chunks.len());

// Access metadata
if let Some(author) = doc.metadata.get("author") {
    println!("Author: {}", author);
}

// Check processing status
match &doc.processing_status {
    ProcessingStatus::Completed => println!("✅ Processing complete"),
    ProcessingStatus::Processing => println!("⏳ Still processing..."),
    ProcessingStatus::Failed(err) => println!("❌ Failed: {}", err),
    ProcessingStatus::Queued => println!("⏸ Queued for processing"),
}
}
Update
#![allow(unused)] fn main() { use opendb::ProcessingStatus; // Update processing status let mut doc = doc.clone(); doc.processing_status = ProcessingStatus::Processing; // Add more metadata doc.metadata.insert("processed_by".to_string(), "worker-01".to_string()); doc.metadata.insert("processing_time_ms".to_string(), "1234".to_string()); // Mark as completed doc.processing_status = ProcessingStatus::Completed; doc.updated_at = chrono::Utc::now(); println!("Updated document: {}", doc.id); }
Delete
#![allow(unused)] fn main() { // In OpenDB, you would typically delete by ID using the database handle // This is a conceptual example showing how to remove from memory let mut documents: Vec<MultimodalDocument> = vec![/* ... */]; documents.retain(|d| d.id != "doc_001"); println!("Document deleted"); }
Document Chunking
DocumentChunk Structure
For large documents, use DocumentChunk to split content into processable segments:
#![allow(unused)] fn main() { pub struct DocumentChunk { pub chunk_id: String, pub content: String, pub embedding: Option<Vec<f32>>, pub start_offset: usize, pub end_offset: usize, pub metadata: HashMap<String, String>, } }
Creating Chunks
#![allow(unused)] fn main() { use opendb::{DocumentChunk, MultimodalDocument}; let mut doc = MultimodalDocument::new( "doc_002", "large_book.pdf", FileType::Pdf, 1024 * 1024 * 5, // 5 MB "Full book content...", vec![0.1; 384], ); // Add chunks (e.g., by chapter or page) doc.add_chunk(DocumentChunk::new( "chunk_0", "Chapter 1: Introduction to Rust programming...", vec![0.15; 384], // Chunk-specific embedding 0, // Start offset 1500, // End offset ).with_metadata("chapter", "1") .with_metadata("page_start", "1") .with_metadata("page_end", "15")); doc.add_chunk(DocumentChunk::new( "chunk_1", "Chapter 2: Ownership and Borrowing...", vec![0.25; 384], 1500, 3200, ).with_metadata("chapter", "2") .with_metadata("page_start", "16") .with_metadata("page_end", "32")); println!("Added {} chunks", doc.chunks.len()); }
Chunk Strategies
1. Fixed-Size Chunking
#![allow(unused)] fn main() { fn chunk_by_size(text: &str, chunk_size: usize) -> Vec<String> { text.chars() .collect::<Vec<_>>() .chunks(chunk_size) .map(|chunk| chunk.iter().collect()) .collect() } // Usage let text = "Very long document text..."; let chunks = chunk_by_size(&text, 1000); }
2. Paragraph-Based Chunking
#![allow(unused)] fn main() { fn chunk_by_paragraphs(text: &str, max_paragraphs: usize) -> Vec<String> { text.split("\n\n") .collect::<Vec<_>>() .chunks(max_paragraphs) .map(|chunk| chunk.join("\n\n")) .collect() } // Usage let chunks = chunk_by_paragraphs(&text, 3); }
3. Token-Based Chunking (for LLMs)
#![allow(unused)] fn main() { // Requires tiktoken-rs or similar tokenizer fn chunk_by_tokens(text: &str, max_tokens: usize) -> Vec<String> { // Pseudo-code - use actual tokenizer in production let tokens = tokenize(text); tokens .chunks(max_tokens) .map(|chunk| detokenize(chunk)) .collect() } }
Processing Status
ProcessingStatus Enum
Track the lifecycle of document processing:
#![allow(unused)] fn main() {
use opendb::ProcessingStatus;

// Status variants
let queued = ProcessingStatus::Queued;
let processing = ProcessingStatus::Processing;
let completed = ProcessingStatus::Completed;
let failed = ProcessingStatus::Failed("OCR error".to_string());

// Pattern matching
match doc.processing_status {
    ProcessingStatus::Queued => {
        println!("Document is queued for processing");
    }
    ProcessingStatus::Processing => {
        println!("Processing in progress...");
    }
    ProcessingStatus::Completed => {
        println!("✅ Processing completed successfully");
    }
    ProcessingStatus::Failed(error) => {
        eprintln!("❌ Processing failed: {}", error);
    }
}
}
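In an async pipeline it can help to enforce legal status transitions (Queued → Processing → Completed/Failed). OpenDB itself does not enforce an order; the guard below is a hypothetical sketch using a simplified stand-in enum:

```rust
#[derive(Debug, Clone, PartialEq)]
enum ProcessingStatus {
    Queued,
    Processing,
    Completed,
    Failed(String),
}

// A hypothetical transition guard illustrating a typical pipeline lifecycle:
// a document must be Processing before it can complete or fail.
fn can_transition(from: &ProcessingStatus, to: &ProcessingStatus) -> bool {
    use ProcessingStatus::*;
    matches!(
        (from, to),
        (Queued, Processing) | (Processing, Completed) | (Processing, Failed(_))
    )
}

fn main() {
    assert!(can_transition(&ProcessingStatus::Queued, &ProcessingStatus::Processing));
    // Completed is terminal in this sketch
    assert!(!can_transition(&ProcessingStatus::Completed, &ProcessingStatus::Processing));
}
```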
Production Workflow
Complete PDF Processing Example
#![allow(unused)] fn main() {
use opendb::{OpenDB, MultimodalDocument, DocumentChunk, FileType, ProcessingStatus};
use std::fs;

fn process_pdf(filepath: &str, db: &OpenDB) -> Result<String> {
    // 1. Read file
    let file_bytes = fs::read(filepath)?;
    let filename = filepath.rsplit('/').next().unwrap();

    // 2. Extract text (use pdf-extract or pdfium in production)
    let extracted_text = extract_pdf_text(&file_bytes)?;

    // 3. Generate document embedding
    let doc_embedding = generate_embedding(&extracted_text)?;

    // 4. Create multimodal document
    let mut doc = MultimodalDocument::new(
        &generate_id(),
        filename,
        FileType::Pdf,
        file_bytes.len(),
        &extracted_text,
        doc_embedding,
    )
    .with_metadata("source", "upload")
    .with_metadata("pages", &count_pages(&file_bytes).to_string());

    // 5. Chunk the document
    let chunks = chunk_text(&extracted_text, 1000);
    for (i, chunk_text) in chunks.iter().enumerate() {
        let chunk_embedding = generate_embedding(chunk_text)?;
        let chunk = DocumentChunk::new(
            &format!("chunk_{}", i),
            chunk_text,
            chunk_embedding,
            i * 1000,
            (i + 1) * 1000,
        )
        .with_metadata("chunk_index", &i.to_string());
        doc.add_chunk(chunk);
    }

    // 6. Mark as completed
    doc.processing_status = ProcessingStatus::Completed;

    // 7. Store in OpenDB (pseudo-code - actual storage via Memory type)
    let doc_id = doc.id.clone();
    store_document(db, &doc)?;

    Ok(doc_id)
}

// Helper functions (implement with actual libraries)
fn extract_pdf_text(bytes: &[u8]) -> Result<String> {
    // Use pdf-extract, pdfium, or poppler
    todo!("Implement with pdf-extract crate")
}

fn generate_embedding(text: &str) -> Result<Vec<f32>> {
    // Use sentence-transformers, OpenAI API, or onnxruntime
    todo!("Implement with embedding model")
}

fn chunk_text(text: &str, size: usize) -> Vec<String> {
    // Smart chunking by sentences/paragraphs
    todo!("Implement chunking strategy")
}

fn generate_id() -> String {
    uuid::Uuid::new_v4().to_string()
}

fn count_pages(bytes: &[u8]) -> usize {
    // Parse PDF to count pages
    todo!("Implement page counting")
}

fn store_document(db: &OpenDB, doc: &MultimodalDocument) -> Result<()> {
    // Store document and chunks as Memory records with embeddings
    todo!("Implement storage logic")
}
}
Audio Transcription Example
#![allow(unused)] fn main() { use opendb::{MultimodalDocument, DocumentChunk, FileType, ProcessingStatus}; fn process_audio(filepath: &str) -> Result<MultimodalDocument> { let file_bytes = fs::read(filepath)?; let filename = filepath.rsplit('/').next().unwrap(); // 1. Transcribe audio (use whisper-rs or OpenAI Whisper API) let transcript = transcribe_audio(&file_bytes)?; // 2. Generate embedding from transcript let embedding = generate_embedding(&transcript)?; // 3. Create multimodal document let mut doc = MultimodalDocument::new( &generate_id(), filename, FileType::Audio, file_bytes.len(), &transcript, embedding, ) .with_metadata("duration_seconds", &get_audio_duration(&file_bytes).to_string()) .with_metadata("transcription_model", "whisper-large-v3"); // 4. Add timestamped chunks let timestamped_segments = get_timestamped_segments(&file_bytes)?; for (i, segment) in timestamped_segments.iter().enumerate() { let chunk_embedding = generate_embedding(&segment.text)?; let chunk = DocumentChunk::new( &format!("segment_{}", i), &segment.text, chunk_embedding, segment.start_offset, segment.end_offset, ) .with_metadata("timestamp_start", &segment.start_time.to_string()) .with_metadata("timestamp_end", &segment.end_time.to_string()); doc.add_chunk(chunk); } doc.processing_status = ProcessingStatus::Completed; Ok(doc) } struct AudioSegment { text: String, start_time: f64, end_time: f64, start_offset: usize, end_offset: usize, } fn transcribe_audio(bytes: &[u8]) -> Result<String> { // Use whisper-rs or cloud API todo!("Implement transcription") } fn get_audio_duration(bytes: &[u8]) -> f64 { // Parse audio metadata todo!("Implement duration extraction") } fn get_timestamped_segments(bytes: &[u8]) -> Result<Vec<AudioSegment>> { // Use Whisper with timestamps todo!("Implement segment extraction") } }
Integration with OpenDB
Storing Multimodal Documents
#![allow(unused)] fn main() {
use opendb::{OpenDB, Memory, MultimodalDocument};

fn store_multimodal_document(db: &OpenDB, doc: &MultimodalDocument) -> Result<()> {
    // Store main document as Memory
    let memory = Memory::new(
        &doc.id,
        &doc.extracted_text,
        doc.embedding.clone().unwrap_or_default(),
        1.0, // importance
    )
    .with_metadata("filename", &doc.filename)
    .with_metadata("file_type", &format!("{:?}", doc.file_type))
    .with_metadata("file_size", &doc.file_size.to_string());
    db.insert_memory(&memory)?;

    // Store each chunk as a separate Memory with relationships
    for chunk in &doc.chunks {
        let chunk_memory = Memory::new(
            &format!("{}_{}", doc.id, chunk.chunk_id),
            &chunk.content,
            chunk.embedding.clone().unwrap_or_default(),
            0.8, // chunk importance
        )
        .with_metadata("parent_doc", &doc.id)
        .with_metadata("chunk_id", &chunk.chunk_id);
        db.insert_memory(&chunk_memory)?;

        // Link chunk to parent document (link takes from, to, relation)
        db.link(&memory.id, &chunk_memory.id, "has_chunk")?;
    }

    Ok(())
}
}
Semantic Search Across Documents
#![allow(unused)] fn main() {
use opendb::{OpenDB, SearchResult};

fn search_documents(
    db: &OpenDB,
    query: &str,
    top_k: usize,
) -> Result<Vec<SearchResult>> {
    // Generate query embedding
    let query_embedding = generate_embedding(query)?;

    // Search across all documents and chunks
    let results = db.search_similar(&query_embedding, top_k)?;
    Ok(results)
}

// Usage
let results = search_documents(&db, "machine learning algorithms", 5)?;
for result in results {
    // SearchResult carries only id and distance; fetch the content by id
    let memory = db.get_memory(&result.id)?.unwrap();
    println!("Found: {} (distance: {:.4})", memory.content, result.distance);
}
}
Best Practices
1. Chunking Strategy
- Small chunks (500-1000 chars): Better precision, more API calls
- Large chunks (1500-3000 chars): More context, fewer API calls
- Overlap chunks: 10-20% overlap for continuity
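The overlap suggestion above can be sketched as a character-window chunker; the window size and overlap are illustrative parameters, and a production version would prefer sentence or token boundaries:

```rust
// Fixed-size chunks with overlap: each chunk starts `size - overlap`
// characters after the previous one, so adjacent chunks share context.
fn chunk_with_overlap(text: &str, size: usize, overlap: usize) -> Vec<String> {
    assert!(overlap < size, "overlap must be smaller than chunk size");
    let chars: Vec<char> = text.chars().collect();
    let step = size - overlap;
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < chars.len() {
        let end = (start + size).min(chars.len());
        chunks.push(chars[start..end].iter().collect());
        if end == chars.len() {
            break;
        }
        start += step;
    }
    chunks
}

fn main() {
    // size 4, overlap 1 => each chunk repeats the last char of the previous
    let chunks = chunk_with_overlap("abcdefghij", 4, 1);
    println!("{:?}", chunks); // ["abcd", "defg", "ghij"]
}
```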
2. Metadata Usage
- Always add source file metadata
- Include timestamps for temporal data
- Add processing metadata (model version, date)
- Store original file path for reference
3. Error Handling
#![allow(unused)] fn main() { use opendb::ProcessingStatus; fn safe_process(filepath: &str) -> MultimodalDocument { let mut doc = MultimodalDocument::new( &generate_id(), filepath, FileType::Unknown, 0, "", vec![], ); doc.processing_status = ProcessingStatus::Queued; match process_file(filepath) { Ok(processed) => { doc = processed; doc.processing_status = ProcessingStatus::Completed; } Err(e) => { doc.processing_status = ProcessingStatus::Failed(e.to_string()); eprintln!("Processing failed: {}", e); } } doc } }
4. Memory Management
- Process files in batches
- Clear processed chunks from memory
- Use streaming for very large files
- Implement backpressure for async processing
See Also
- Records Management - Storing Memory records
- Vector Search - Semantic similarity search
- Graph Operations - Linking documents and chunks
- Multimodal Example - Complete working example
Production Libraries
PDF Processing
- pdf-extract - Text extraction
- pdfium-render - Rendering and OCR
- lopdf - Low-level parsing
DOCX Processing
- docx-rs - Read/write DOCX
- mammoth-rs - Convert to text
Audio Transcription
- whisper-rs - Local Whisper
- OpenAI Whisper API - Cloud service
Video Processing
- ffmpeg-next - Video/audio extraction
- Combine with Whisper for captions
Embeddings
- sentence-transformers (Python + PyO3)
- OpenAI Embeddings API
- onnxruntime - Local models
Transactions API
OpenDB provides ACID-compliant transactions for atomic multi-operation updates.
Overview
Transactions group multiple operations into a single atomic unit:
#![allow(unused)] fn main() { use opendb::OpenDB; let db = OpenDB::open("./db")?; let mut txn = db.begin_transaction()?; txn.put("default", b"key1", b"value1")?; txn.put("default", b"key2", b"value2")?; txn.commit()?; // Both writes succeed or both fail }
Basic API
Begin Transaction
#![allow(unused)] fn main() { let mut txn = db.begin_transaction()?; }
Signature:
#![allow(unused)] fn main() { pub fn begin_transaction(&self) -> Result<Transaction> }
Returns: Transaction handle for performing operations.
Commit
#![allow(unused)] fn main() { txn.commit()?; }
Signature:
#![allow(unused)] fn main() { pub fn commit(mut self) -> Result<()> }
Behavior:
- Atomically applies all changes
- Returns error if conflicts detected (optimistic locking)
- Consumes transaction (can't use after commit)
Rollback
#![allow(unused)] fn main() { txn.rollback()?; }
Signature:
#![allow(unused)] fn main() { pub fn rollback(mut self) -> Result<()> }
Behavior:
- Discards all changes
- Always succeeds
- Consumes transaction
Auto-Rollback
Transactions auto-rollback if dropped without commit:
#![allow(unused)] fn main() {
{
    let mut txn = db.begin_transaction()?;
    txn.put("default", b"key", b"value")?;
    // txn dropped here → automatic rollback
}
// Key was not written
assert!(db.get(b"key")?.is_none());
}
Transaction Operations
Get
#![allow(unused)] fn main() { let value = txn.get("default", b"key")?; }
Signature:
#![allow(unused)] fn main() { pub fn get(&self, cf: &str, key: &[u8]) -> Result<Option<Vec<u8>>> }
Behavior:
- Reads from transaction snapshot
- Sees writes from current transaction
- Isolated from concurrent transactions
Put
#![allow(unused)] fn main() { txn.put("default", b"key", b"value")?; }
Signature:
#![allow(unused)] fn main() { pub fn put(&mut self, cf: &str, key: &[u8], value: &[u8]) -> Result<()> }
Behavior:
- Buffers write in transaction
- Not visible outside transaction until commit
- Visible to subsequent reads in same transaction
Delete
#![allow(unused)] fn main() { txn.delete("default", b"key")?; }
Signature:
#![allow(unused)] fn main() { pub fn delete(&mut self, cf: &str, key: &[u8]) -> Result<()> }
Behavior:
- Buffers delete in transaction
- Subsequent gets in the same transaction return `None`
Column Families
Transactions work across all column families:
#![allow(unused)] fn main() { let mut txn = db.begin_transaction()?; // Write to different column families txn.put("default", b"kv_key", b"value")?; txn.put("records", b"mem_001", &encoded_memory)?; txn.put("graph_forward", b"mem_001:related_to", &edges)?; txn.commit()?; // All or nothing }
Available Column Families:
"default"- KV store"records"- Memory records"graph_forward"- Outgoing edges"graph_backward"- Incoming edges"vector_data"- Embedding data"vector_index"- HNSW index"metadata"- Database metadata
ACID Examples
Atomicity
Either all operations succeed or none:
#![allow(unused)] fn main() { let mut txn = db.begin_transaction()?; txn.put("default", b"account_A", b"-100")?; txn.put("default", b"account_B", b"+100")?; match txn.commit() { Ok(_) => println!("Transfer complete"), Err(e) => println!("Transfer failed, both accounts unchanged: {}", e), } }
Consistency
Maintain invariants across operations:
#![allow(unused)] fn main() { // Invariant: memory must exist before linking let mut txn = db.begin_transaction()?; // Insert memories txn.put("records", b"mem_001", &encode_memory(&mem1))?; txn.put("records", b"mem_002", &encode_memory(&mem2))?; // Create link (requires both memories exist) txn.put("graph_forward", b"mem_001:related_to", &encode_edges(&edges))?; txn.commit()?; // Ensures consistency }
Isolation
Transactions don't see each other's uncommitted changes:
#![allow(unused)] fn main() { // Transaction 1 let mut txn1 = db.begin_transaction()?; txn1.put("default", b"counter", b"100")?; // Transaction 2 (concurrent) let mut txn2 = db.begin_transaction()?; let val = txn2.get("default", b"counter")?; // Sees old value (not 100) txn1.commit()?; txn2.commit()?; // May conflict depending on operations }
Durability
Committed changes survive crashes:
#![allow(unused)] fn main() { let mut txn = db.begin_transaction()?; txn.put("default", b"important", b"data")?; txn.commit()?; // Even if process crashes here, data is safe // Reopen database let db = OpenDB::open("./db")?; assert_eq!(db.get(b"important")?.unwrap(), b"data"); }
Conflict Handling
Transactions use optimistic locking and may fail on conflict:
```rust
use opendb::Error;

loop {
    let mut txn = db.begin_transaction()?;

    // Read-modify-write
    let val = txn.get("default", b"counter")?
        .and_then(|v| String::from_utf8(v).ok())
        .and_then(|s| s.parse::<i64>().ok())
        .unwrap_or(0);

    let new_val = val + 1;
    txn.put("default", b"counter", new_val.to_string().as_bytes())?;

    match txn.commit() {
        Ok(_) => break,
        Err(Error::Transaction(_)) => {
            println!("Conflict detected, retrying...");
            continue; // Retry
        }
        Err(e) => return Err(e),
    }
}
```
Advanced Patterns
Compare-and-Swap
```rust
fn compare_and_swap(
    db: &OpenDB,
    key: &[u8],
    expected: &[u8],
    new_value: &[u8],
) -> Result<bool> {
    let mut txn = db.begin_transaction()?;

    let current = txn.get("default", key)?;
    if current.as_deref() != Some(expected) {
        txn.rollback()?;
        return Ok(false); // Value changed
    }

    txn.put("default", key, new_value)?;
    txn.commit()?;
    Ok(true)
}
```
Batch Updates
```rust
fn batch_update(db: &OpenDB, updates: Vec<(Vec<u8>, Vec<u8>)>) -> Result<()> {
    let mut txn = db.begin_transaction()?;
    for (key, value) in updates {
        txn.put("default", &key, &value)?;
    }
    txn.commit()
}
```
Conditional Delete
```rust
fn delete_if_exists(db: &OpenDB, key: &[u8]) -> Result<bool> {
    let mut txn = db.begin_transaction()?;

    if txn.get("default", key)?.is_none() {
        txn.rollback()?;
        return Ok(false);
    }

    txn.delete("default", key)?;
    txn.commit()?;
    Ok(true)
}
```
Performance Considerations
Transaction Overhead
Transactions have overhead compared to direct writes:
```rust
// ❌ Slower: many small transactions
for i in 0..1000 {
    let mut txn = db.begin_transaction()?;
    txn.put("default", format!("key_{}", i).as_bytes(), b"value")?;
    txn.commit()?;
}

// ✅ Faster: one transaction for the whole batch
let mut txn = db.begin_transaction()?;
for i in 0..1000 {
    txn.put("default", format!("key_{}", i).as_bytes(), b"value")?;
}
txn.commit()?;
```
Transaction Size
Keep transactions reasonably sized:
- Small (1-100 ops): Best performance
- Medium (100-1000 ops): Good
- Large (1000+ ops): May increase conflict rate and memory usage
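One way to stay in the small-to-medium range is to split a large batch into fixed-size chunks and commit each chunk in its own transaction. A minimal std-only sketch of the chunking logic (the commented line marks where the transaction calls from this guide would go; `chunk_updates` is our name, not an OpenDB API):

```rust
// Split a large batch of updates into chunks of at most `chunk_size`
// so that each transaction stays reasonably sized.
fn chunk_updates(
    updates: Vec<(Vec<u8>, Vec<u8>)>,
    chunk_size: usize,
) -> Vec<Vec<(Vec<u8>, Vec<u8>)>> {
    updates
        .chunks(chunk_size)
        .map(|c| c.to_vec())
        .collect()
    // For each chunk: begin_transaction(), put() every pair, commit().
}

fn main() {
    let updates: Vec<(Vec<u8>, Vec<u8>)> = (0..2500)
        .map(|i| (format!("key_{}", i).into_bytes(), b"value".to_vec()))
        .collect();

    let chunks = chunk_updates(updates, 1000);
    // 2500 updates -> 3 transactions of 1000, 1000, and 500 ops
    assert_eq!(chunks.len(), 3);
    assert_eq!(chunks[2].len(), 500);
}
```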
Conflict Rate
High contention increases conflict rate:
```rust
// High contention: many threads updating the same key.
// Solution: shard keys or use separate counters.
```
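A common workaround for a hot counter is to split it into N shard keys, have each writer increment its own shard, and sum the shards on read. A sketch of the key-sharding scheme only (std-only; the helper names are ours, and the earlier `txn.put` calls would operate on the returned shard key):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

const NUM_SHARDS: u64 = 16;

// Map a writer id (e.g. a thread name) to one of NUM_SHARDS shard keys,
// so that concurrent increments rarely touch the same key.
fn shard_key(counter: &str, writer_id: &str) -> String {
    let mut hasher = DefaultHasher::new();
    writer_id.hash(&mut hasher);
    format!("{}:shard_{}", counter, hasher.finish() % NUM_SHARDS)
}

// On read, fetch and sum every shard: counter:shard_0 .. counter:shard_15.
fn all_shard_keys(counter: &str) -> Vec<String> {
    (0..NUM_SHARDS)
        .map(|i| format!("{}:shard_{}", counter, i))
        .collect()
}

fn main() {
    let key = shard_key("counter", "thread-7");
    assert!(key.starts_with("counter:shard_"));
    assert_eq!(all_shard_keys("counter").len(), 16);
}
```

The trade-off: writes become nearly conflict-free, while reads pay for N point lookups instead of one.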
Limitations
- Single-threaded: One transaction per thread
- No nested transactions: Can't begin transaction within transaction
- Memory buffering: Large transactions use more memory
- Optimistic locking: High contention may cause retries
Error Handling
```rust
use opendb::Error;

let mut txn = db.begin_transaction()?;
txn.put("default", b"key", b"value")?;

match txn.commit() {
    Ok(_) => println!("Success"),
    Err(Error::Transaction(e)) => println!("Conflict: {}", e),
    Err(Error::Storage(e)) => println!("Storage error: {}", e),
    Err(e) => println!("Other error: {}", e),
}
```
Best Practices
- Keep transactions short: Minimize duration to reduce conflicts
- Handle conflicts: Implement retry logic for read-modify-write
- Batch when possible: Group related operations
- Use auto-rollback: Let Drop handle cleanup in error paths
- Explicit commits: Don't rely on implicit behavior
Next
Performance Tuning
This guide covers optimization strategies for OpenDB deployments.
Profiling
Before optimizing, measure your bottleneck:
```rust
use std::time::Instant;

let start = Instant::now();
db.insert_memory(&memory)?;
println!("Insert took: {:?}", start.elapsed());
```
RocksDB Tuning
Write Buffer Size
Larger write buffers improve write throughput:
```rust
// Default: 128 MB
// For write-heavy workloads, increase:
opts.set_write_buffer_size(256 * 1024 * 1024); // 256 MB
```
Trade-offs:
- ✅ Fewer flushes to disk
- ✅ Better write throughput
- ❌ More memory usage
- ❌ Longer recovery time after a crash
Block Cache
RocksDB's internal cache for disk blocks:
```rust
opts.set_block_cache_size(512 * 1024 * 1024); // 512 MB
```
Trade-offs:
- ✅ Faster reads
- ❌ More memory usage
Compression
Balance CPU vs storage:
```rust
use rocksdb::DBCompressionType;

// Default: LZ4 (fast, moderate compression)
opts.set_compression_type(DBCompressionType::Lz4);

// For better compression (slower writes):
opts.set_compression_type(DBCompressionType::Zstd);

// For faster writes (larger storage):
opts.set_compression_type(DBCompressionType::None);
```
Parallelism
Increase background threads for compaction:
```rust
opts.increase_parallelism(4); // Use 4 background threads
```
Cache Tuning
Cache Sizes
Adjust cache capacity based on workload:
```rust
use opendb::OpenDBOptions;

let mut options = OpenDBOptions::default();

// For read-heavy workloads
options.kv_cache_size = 10_000;
options.record_cache_size = 5_000;

// For write-heavy workloads (smaller cache)
options.kv_cache_size = 1_000;
options.record_cache_size = 500;

let db = OpenDB::open_with_options("./db", options)?;
```
Cache Hit Rate
Monitor cache effectiveness:
```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Implement hit rate tracking (example)
struct CacheStats {
    hits: AtomicU64,
    misses: AtomicU64,
}

impl CacheStats {
    fn hit_rate(&self) -> f64 {
        let hits = self.hits.load(Ordering::Relaxed) as f64;
        let misses = self.misses.load(Ordering::Relaxed) as f64;
        hits / (hits + misses)
    }
}
```
Target hit rates:
- Above 90%: Excellent
- 70-90%: Good
- Below 70%: Increase cache size
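The thresholds above can be folded into a small helper that turns a measured hit rate into a recommendation (an illustration only; `cache_advice` is our name, not an OpenDB API):

```rust
// Classify a cache hit rate (0.0..=1.0) per the guidance above.
fn cache_advice(hit_rate: f64) -> &'static str {
    if hit_rate > 0.90 {
        "excellent"
    } else if hit_rate >= 0.70 {
        "good"
    } else {
        "increase cache size"
    }
}

fn main() {
    assert_eq!(cache_advice(0.95), "excellent");
    assert_eq!(cache_advice(0.80), "good");
    assert_eq!(cache_advice(0.50), "increase cache size");
}
```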
Batch Operations
Batch Inserts
Use transactions for bulk inserts:
```rust
// ❌ Slow: individual commits
for memory in memories {
    db.insert_memory(&memory)?;
}

// ✅ Fast: batch commit (future API)
let mut txn = db.begin_transaction()?;
for memory in memories {
    // Insert via transaction
}
txn.commit()?;
```
Flush Control
Control when data is flushed to disk:
```rust
// Insert many records
for i in 0..10_000 {
    db.insert_memory(&memory)?;
}

// Explicit flush
db.flush()?;
```
Vector Search Optimization
Index Parameters
Tune HNSW parameters for your use case:
```rust
// High accuracy (slower, better recall)
HnswParams::high_accuracy() // ef=400, neighbors=32

// High speed (faster, lower recall)
HnswParams::high_speed() // ef=100, neighbors=8
```
Rebuild Strategy
Rebuild index strategically:
```rust
// ❌ Bad: rebuild after every insert
for memory in memories {
    db.insert_memory(&memory)?;
    db.rebuild_vector_index()?; // Expensive!
}

// ✅ Good: rebuild once after the batch
for memory in memories {
    db.insert_memory(&memory)?;
}
db.rebuild_vector_index()?; // Once
```
Dimension Reduction
Lower dimensions = faster search:
```rust
// 768D (high quality, slower)
options.vector_dimension = 768;

// 384D (balanced)
options.vector_dimension = 384;

// 128D (fast, lower quality)
options.vector_dimension = 128;
```
Graph Optimization
Link Batching
Batch graph operations:
```rust
// Create all memories first
for memory in memories {
    db.insert_memory(&memory)?;
}

// Then create all links
for (from, to, relation) in edges {
    db.link(from, to, relation)?;
}
```
Prune Unused Relations
Remove stale edges periodically:
```rust
use std::collections::HashSet;

fn prune_orphaned_edges(db: &OpenDB) -> Result<()> {
    let all_ids: HashSet<_> = db.list_memory_ids()?.into_iter().collect();

    for id in db.list_memory_ids()? {
        let outgoing = db.get_outgoing(&id)?;
        for edge in outgoing {
            if !all_ids.contains(&edge.to) {
                db.unlink(&edge.from, &edge.to, &edge.relation)?;
            }
        }
    }
    Ok(())
}
```
Memory Usage
Estimate Memory Footprint
```text
Total Memory =
    RocksDB Write Buffers
  + RocksDB Block Cache
  + Application Caches
  + HNSW Index
  + Overhead
```
Example:
```text
  128 MB (write buffers)
+ 256 MB (block cache)
+  10 MB (app caches, 10k entries × 1 KB avg)
+  30 MB (HNSW, 10k vectors × 384D × 4 bytes × 2x overhead)
+  50 MB (overhead)
= ~474 MB
```
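The same arithmetic is easy to reproduce in code. This hypothetical helper (not an OpenDB API) mirrors the example above, with sizes in MB and the HNSW term computed as vectors × dimension × 4 bytes × 2x overhead:

```rust
// Rough memory footprint estimate in MB, mirroring the example above.
fn estimate_memory_mb(
    write_buffers_mb: f64,
    block_cache_mb: f64,
    cache_entries: u64,
    avg_entry_kb: f64,
    num_vectors: u64,
    dimension: u64,
    overhead_mb: f64,
) -> f64 {
    let app_cache_mb = cache_entries as f64 * avg_entry_kb / 1024.0;
    // 4 bytes per f32 component, ~2x index overhead
    let hnsw_mb =
        num_vectors as f64 * dimension as f64 * 4.0 * 2.0 / (1024.0 * 1024.0);
    write_buffers_mb + block_cache_mb + app_cache_mb + hnsw_mb + overhead_mb
}

fn main() {
    let total = estimate_memory_mb(128.0, 256.0, 10_000, 1.0, 10_000, 384, 50.0);
    // 128 + 256 + ~9.8 + ~29.3 + 50, i.e. close to the ~474 MB above
    assert!((total - 474.0).abs() < 5.0);
}
```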
Reduce Memory Usage
- Smaller caches:
  ```rust
  options.kv_cache_size = 100;
  options.record_cache_size = 100;
  ```
- Lower RocksDB buffers:
  ```rust
  opts.set_write_buffer_size(64 * 1024 * 1024);  // 64 MB
  opts.set_block_cache_size(128 * 1024 * 1024);  // 128 MB
  ```
- Smaller embeddings:
  ```rust
  options.vector_dimension = 128; // instead of 768
  ```
Disk Usage
Compaction
Force compaction to reclaim space:
```rust
// Manual compaction (future API)
db.compact_range(None, None)?;
```
Monitoring
Check database size:
```rust
// On Linux
std::process::Command::new("du")
    .args(&["-sh", "./db"])
    .output()?;
```
Benchmarking
Use Criterion for accurate benchmarks:
```rust
use criterion::{black_box, criterion_group, criterion_main, Criterion};

fn benchmark_insert(c: &mut Criterion) {
    let db = OpenDB::open("./bench_db").unwrap();

    c.bench_function("insert_memory", |b| {
        b.iter(|| {
            let memory = Memory::new("id".to_string(), "content".to_string());
            db.insert_memory(black_box(&memory)).unwrap();
        });
    });
}

criterion_group!(benches, benchmark_insert);
criterion_main!(benches);
```
Monitoring Metrics
Implement metrics collection:
```rust
use std::sync::atomic::{AtomicU64, Ordering};

struct Metrics {
    reads: AtomicU64,
    writes: AtomicU64,
    cache_hits: AtomicU64,
    cache_misses: AtomicU64,
}

impl Metrics {
    fn report(&self) {
        println!("Reads: {}", self.reads.load(Ordering::Relaxed));
        println!("Writes: {}", self.writes.load(Ordering::Relaxed));

        let hits = self.cache_hits.load(Ordering::Relaxed) as f64;
        let misses = self.cache_misses.load(Ordering::Relaxed) as f64;
        println!("Cache hit rate: {:.2}%", hits / (hits + misses) * 100.0);
    }
}
```
Platform-Specific Tips
Linux
- Use `io_uring` for async I/O (future RocksDB feature)
- Disable transparent huge pages for lower latency
- Use `fallocate` to preallocate disk space
macOS
- APFS filesystem has good performance
- Use `F_NOCACHE` for large scans (avoids cache pollution)
Windows
- Use NTFS for best RocksDB performance
- Disable indexing on database directory
- Use SSD for best performance
Common Bottlenecks
- Slow writes: Increase write buffer size, disable compression
- Slow reads: Increase cache sizes, use SSD
- High memory: Reduce cache sizes, lower embedding dimension
- Slow vector search: Reduce HNSW parameters, lower dimension
- Large database size: Enable compression, run compaction
Next
Extending OpenDB
OpenDB is designed to be extensible. This guide covers custom backends, plugins, and extensions.
Custom Storage Backends
OpenDB uses the `StorageBackend` trait for pluggability.
Storage Trait
```rust
pub trait StorageBackend: Send + Sync {
    fn get(&self, cf: &str, key: &[u8]) -> Result<Option<Vec<u8>>>;
    fn put(&self, cf: &str, key: &[u8], value: &[u8]) -> Result<()>;
    fn delete(&self, cf: &str, key: &[u8]) -> Result<()>;
    fn exists(&self, cf: &str, key: &[u8]) -> Result<bool>;
    fn scan_prefix(&self, cf: &str, prefix: &[u8]) -> Result<Vec<(Vec<u8>, Vec<u8>)>>;
    fn begin_transaction(&self) -> Result<Box<dyn Transaction>>;
    fn flush(&self) -> Result<()>;
    fn snapshot(&self) -> Result<Box<dyn Snapshot>>;
}
```
Example: In-Memory Backend
```rust
use std::collections::HashMap;
use std::sync::RwLock;
use opendb::storage::{StorageBackend, Transaction, Snapshot};
use opendb::{Result, Error};

pub struct MemoryBackend {
    data: RwLock<HashMap<String, HashMap<Vec<u8>, Vec<u8>>>>,
}

impl MemoryBackend {
    pub fn new() -> Self {
        Self { data: RwLock::new(HashMap::new()) }
    }
}

impl StorageBackend for MemoryBackend {
    fn get(&self, cf: &str, key: &[u8]) -> Result<Option<Vec<u8>>> {
        let data = self.data.read().unwrap();
        Ok(data.get(cf).and_then(|cf_data| cf_data.get(key)).cloned())
    }

    fn put(&self, cf: &str, key: &[u8], value: &[u8]) -> Result<()> {
        let mut data = self.data.write().unwrap();
        data.entry(cf.to_string())
            .or_insert_with(HashMap::new)
            .insert(key.to_vec(), value.to_vec());
        Ok(())
    }

    fn delete(&self, cf: &str, key: &[u8]) -> Result<()> {
        let mut data = self.data.write().unwrap();
        if let Some(cf_data) = data.get_mut(cf) {
            cf_data.remove(key);
        }
        Ok(())
    }

    fn exists(&self, cf: &str, key: &[u8]) -> Result<bool> {
        Ok(self.get(cf, key)?.is_some())
    }

    fn scan_prefix(&self, cf: &str, prefix: &[u8]) -> Result<Vec<(Vec<u8>, Vec<u8>)>> {
        let data = self.data.read().unwrap();
        Ok(data.get(cf)
            .map(|cf_data| {
                cf_data.iter()
                    .filter(|(k, _)| k.starts_with(prefix))
                    .map(|(k, v)| (k.clone(), v.clone()))
                    .collect()
            })
            .unwrap_or_default())
    }

    fn flush(&self) -> Result<()> {
        // No-op for in-memory
        Ok(())
    }

    // Implement the Transaction and Snapshot traits...
}
```
Using Custom Backend
```rust
use std::sync::Arc;

let backend = Arc::new(MemoryBackend::new());
let db = OpenDB::with_backend(backend, OpenDBOptions::default())?;
```
Custom Cache Implementations
Implement the `Cache` trait for custom caching strategies:
```rust
pub trait Cache<K, V>: Send + Sync {
    fn get(&self, key: &K) -> Option<V>;
    fn put(&self, key: K, value: V);
    fn remove(&self, key: &K);
    fn clear(&self);
    fn len(&self) -> usize;
}
```
Example: TTL Cache
```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};
use parking_lot::RwLock;

pub struct TtlCache<K, V> {
    data: RwLock<HashMap<K, (V, Instant)>>,
    ttl: Duration,
}

impl<K: Eq + std::hash::Hash + Clone, V: Clone> Cache<K, V> for TtlCache<K, V> {
    fn get(&self, key: &K) -> Option<V> {
        let data = self.data.read();
        data.get(key).and_then(|(value, inserted)| {
            if inserted.elapsed() < self.ttl {
                Some(value.clone())
            } else {
                None // Expired
            }
        })
    }

    fn put(&self, key: K, value: V) {
        let mut data = self.data.write();
        data.insert(key, (value, Instant::now()));
    }

    // ... implement other methods
}
```
Custom Vector Indexes
While OpenDB uses HNSW, you can wrap alternative indexes:
Example: Flat Index
```rust
use parking_lot::RwLock;

pub struct FlatVectorIndex {
    vectors: RwLock<Vec<(String, Vec<f32>)>>,
}

impl FlatVectorIndex {
    pub fn search(&self, query: &[f32], top_k: usize) -> Vec<SearchResult> {
        let vectors = self.vectors.read();
        let mut results: Vec<_> = vectors.iter()
            .map(|(id, vec)| {
                let distance = euclidean_distance(query, vec);
                SearchResult { id: id.clone(), distance }
            })
            .collect();

        results.sort_by(|a, b| a.distance.partial_cmp(&b.distance).unwrap());
        results.truncate(top_k);
        results
    }
}

fn euclidean_distance(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b.iter())
        .map(|(x, y)| (x - y).powi(2))
        .sum::<f32>()
        .sqrt()
}
```
Custom Serialization
Replace `rkyv` with a custom codec:
```rust
pub trait Codec<T> {
    fn encode(&self, value: &T) -> Result<Vec<u8>>;
    fn decode(&self, bytes: &[u8]) -> Result<T>;
}

pub struct JsonCodec;

impl<T: serde::Serialize + serde::de::DeserializeOwned> Codec<T> for JsonCodec {
    fn encode(&self, value: &T) -> Result<Vec<u8>> {
        serde_json::to_vec(value).map_err(|e| Error::Codec(e.to_string()))
    }

    fn decode(&self, bytes: &[u8]) -> Result<T> {
        serde_json::from_slice(bytes).map_err(|e| Error::Codec(e.to_string()))
    }
}
```
Plugin System (Future)
Planned plugin architecture:
```rust
use std::fs::File;
use std::io::Write;
use std::sync::Mutex;

pub trait Plugin: Send + Sync {
    fn name(&self) -> &str;
    fn init(&mut self, db: &OpenDB) -> Result<()>;
    fn on_insert(&self, memory: &Memory) -> Result<()>;
    fn on_delete(&self, id: &str) -> Result<()>;
    fn on_link(&self, edge: &Edge) -> Result<()>;
}

// Example: audit logger plugin
pub struct AuditPlugin {
    log_file: Mutex<File>,
}

impl Plugin for AuditPlugin {
    fn on_insert(&self, memory: &Memory) -> Result<()> {
        let mut file = self.log_file.lock().unwrap();
        writeln!(file, "INSERT: {}", memory.id)?;
        Ok(())
    }
}
```
Custom Relation Types
Extend graph relations for domain-specific needs:
```rust
pub mod custom_relations {
    pub const IMPLEMENTS: &str = "implements";
    pub const EXTENDS: &str = "extends";
    pub const DEPENDS_ON: &str = "depends_on";
    pub const TESTED_BY: &str = "tested_by";
}

use custom_relations::*;

db.link("MyStruct", "MyTrait", IMPLEMENTS)?;
db.link("ChildStruct", "ParentStruct", EXTENDS)?;
```
Embedding Adapters
Create adapters for different embedding models:
```rust
pub trait EmbeddingModel {
    fn dimension(&self) -> usize;
    fn encode(&self, text: &str) -> Result<Vec<f32>>;
}

pub struct SentenceTransformerAdapter {
    // Python bindings via PyO3
}

impl EmbeddingModel for SentenceTransformerAdapter {
    fn dimension(&self) -> usize {
        384 // all-MiniLM-L6-v2
    }

    fn encode(&self, text: &str) -> Result<Vec<f32>> {
        // Call the Python model
        todo!()
    }
}
```
Future Extension Points
Planned extensibility features:
- Query Language: SQL-like interface for complex queries
- Triggers: Execute callbacks on events
- Views: Virtual collections with custom logic
- Migrations: Schema evolution helpers
- Replication: Multi-instance synchronization
Contributing Extensions
If you build a useful extension, consider contributing:
- Fork the repository
- Create a new module in `src/extensions/`
- Document usage and API
- Add tests for functionality
- Submit a pull request
Best Practices
- Follow trait contracts: Implement all required methods
- Handle errors: Use `Result<T, Error>` consistently
- Thread safety: Use `Send + Sync` for shared state
- Document: Provide clear documentation and examples
- Test: Write comprehensive tests for custom components
Examples
See the `examples/` directory for:

- `custom_backend.rs`: Alternative storage backend
- `plugin_example.rs`: Sample plugin implementation
- `custom_index.rs`: Alternative vector index
Next
Contributing to OpenDB
Thank you for your interest in contributing to OpenDB! This guide will help you get started.
Code of Conduct
This project adheres to the Contributor Covenant Code of Conduct. By participating, you are expected to uphold this code.
How to Contribute
Reporting Bugs
- Check existing issues to avoid duplicates
- Use the bug report template when creating a new issue
- Provide details:
- OpenDB version
- Rust version (`rustc --version`)
- Operating system
- Minimal reproduction steps
- Expected vs actual behavior
Suggesting Features
- Check the roadmap to see if it's planned
- Use the feature request template
- Describe:
- Use case and motivation
- Proposed API design
- Alternative solutions considered
Pull Requests
- Fork the repository
- Create a branch from `main`: `git checkout -b feature/my-feature`
- Make your changes following our code style
- Write tests for new functionality
- Update documentation if needed
- Commit with descriptive messages
- Push to your fork
- Open a pull request with detailed description
Development Setup
Prerequisites
- Rust 1.70 or later
- RocksDB development libraries (see Installation guide)
Clone and Build
git clone https://github.com/muhammad-fiaz/OpenDB.git
cd OpenDB
cargo build
Run Tests
# All tests
cargo test
# Specific test
cargo test test_name
# With output
cargo test -- --nocapture
Run Examples
cargo run --example quickstart
cargo run --example memory_agent
cargo run --example graph_relations
Build Documentation
# API docs
cargo doc --open
# mdBook docs
cd docs
mdbook serve --open
Code Style
Formatting
Use `rustfmt` for consistent formatting:
cargo fmt --all
Linting
Use `clippy` for code quality:
cargo clippy --all-targets --all-features -- -D warnings
Naming Conventions
- Types: `PascalCase` (e.g., `OpenDB`, `StorageBackend`)
- Functions: `snake_case` (e.g., `insert_memory`, `get_related`)
- Constants: `SCREAMING_SNAKE_CASE` (e.g., `DEFAULT_CACHE_SIZE`)
- Modules: `snake_case` (e.g., `graph`, `vector`)
Documentation
- Public APIs: Must have `///` documentation
- Examples: Include usage examples in doc comments
- Errors: Document possible error cases
Example:
````rust
/// Inserts a memory record into the database.
///
/// # Arguments
///
/// * `memory` - The memory record to insert
///
/// # Returns
///
/// Returns `Ok(())` on success, or an error if:
/// - Serialization fails
/// - Storage write fails
///
/// # Example
///
/// ```
/// let memory = Memory::new("id".to_string(), "content".to_string());
/// db.insert_memory(&memory)?;
/// ```
pub fn insert_memory(&self, memory: &Memory) -> Result<()> {
    // ...
}
````
Testing Guidelines
Unit Tests
Place unit tests in the same file as the code:
```rust
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_memory_creation() {
        let memory = Memory::new("id".to_string(), "content".to_string());
        assert_eq!(memory.id, "id");
        assert_eq!(memory.content, "content");
    }
}
```
Integration Tests
Place integration tests in `tests/`:
```rust
// tests/my_feature_test.rs
use opendb::{OpenDB, Memory};
use tempfile::TempDir;

#[test]
fn test_my_feature() {
    let temp_dir = TempDir::new().unwrap();
    let db = OpenDB::open(temp_dir.path()).unwrap();
    // Test logic
}
```
Test Coverage
Aim for:
- New features: >80% coverage
- Bug fixes: Regression test included
- Edge cases: Test error paths
Commit Messages
Follow conventional commits format:
```text
<type>(<scope>): <subject>

<body>

<footer>
```
Types:
- `feat`: New feature
- `fix`: Bug fix
- `docs`: Documentation changes
- `style`: Formatting changes
- `refactor`: Code refactoring
- `test`: Adding tests
- `chore`: Maintenance tasks
Examples:
```text
feat(graph): add weighted edge support

Adds optional weight parameter to link() method,
allowing users to specify edge weights.

Closes #123
```

```text
fix(cache): prevent race condition in LRU eviction

Fixes deadlock when multiple threads evict simultaneously
by using a write lock during eviction.

Fixes #456
```
Pull Request Guidelines
PR Title
Use the same format as commit messages:
```text
feat(vector): add cosine similarity distance metric
```
PR Description
Include:
- What: Description of changes
- Why: Motivation and context
- How: Implementation approach
- Testing: How you tested the changes
- Checklist:
  - [ ] Tests added/updated
  - [ ] Documentation updated
  - [ ] Changelog updated (for features/fixes)
  - [ ] Code formatted with `rustfmt`
  - [ ] Linted with `clippy`
Review Process
- CI checks: All tests must pass
- Code review: At least one maintainer approval
- Documentation: Verify docs are updated
- Changelog: Ensure CHANGELOG.md is updated
Architecture Guidelines
Module Organization
Follow existing structure:
```text
src/
  lib.rs          # Public API exports
  database.rs     # Main OpenDB struct
  error.rs        # Error types
  types.rs        # Core data types
  storage/        # Storage backends
  cache/          # Caching layer
  kv/             # Key-value store
  records/        # Memory records
  graph/          # Graph relationships
  vector/         # Vector search
  transaction/    # Transaction management
  codec/          # Serialization
```
Adding New Features
- New module: Create in appropriate directory
- Trait-based: Use traits for extensibility
- Error handling: Use `Result<T, Error>`
- Thread safety: Ensure `Send + Sync` where needed
Performance Considerations
- Benchmarks: Add benchmarks for performance-critical code
- Profiling: Profile before optimizing
- Allocations: Minimize unnecessary allocations
- Locks: Prefer `RwLock` for read-heavy workloads
Documentation Updates
When adding features, update:
- API docs: `///` comments in code
- mdBook docs: Relevant pages in `docs/src/`
- Examples: Add an example if appropriate
- CHANGELOG.md: Document changes
- README.md: Update if API changes
Release Process (Maintainers)
- Version bump: Update `Cargo.toml`
- Changelog: Update `CHANGELOG.md`
- Tag: Create git tag `v0.x.y`
- Publish: `cargo publish`
- GitHub Release: Create release notes
Getting Help
- Discussions: GitHub Discussions for questions
- Issues: GitHub Issues for bugs/features
- Email: contact@muhammadfiaz.com for private inquiries
Recognition
Contributors are recognized in:
- `CONTRIBUTORS.md` file
- GitHub contributors page
- Release notes
Thank you for contributing to OpenDB!