AccelerateSearch

A self-hosted, production-grade search engine written in Rust.

AccelerateSearch combines the developer experience of Meilisearch with the analytical power of Elasticsearch, all in a single binary that runs on Linux, macOS, and Windows.

Warning

The project is in active development. APIs, on-disk formats, configuration keys, and CLI flags may change between releases. Pin a specific commit or release tag for stability.

Features

Blazing-fast full-text search with BM25 ranking
Low-contention concurrent reads with DashMap and parking_lot
FST-backed term dictionaries for prefix lookups and autocomplete, with lookup cost that scales with the prefix length rather than the dictionary size
Vector and hybrid search with scalar / product / binary quantization
Fuzzy matching with bounded Damerau-Levenshtein typo tolerance
Complex filter expressions (field = "value" AND rating > 4 OR location GEO_BBOX …)
Facet distributions and stats for every field
Per-collection settings: ranking rules, synonyms, stop words, typo tolerance, embedders, distinct field, …
Webhooks that fire on document and index events
Tenant tokens for short-lived, scoped access from the browser
API keys with expiry, scopes, and per-collection ACLs
Prometheus metrics at /metrics, structured logs via tracing
Snapshots (tar + zstd) for backup and restore
Single binary with no external services required

Project layout

crates/
  api/          REST handlers, DTOs, OpenAPI schema
  auth/         master key, API keys, tenant tokens
  cache/        LRU + TTL cache
  collections/  collection metadata service
  config/       TOML config & validation
  documents/    document service
  filters/      filter expression parser & evaluator
  hybrid/       RRF, score normalization
  highlighting/ <em> highlighting
  indexing/     tokenization + inverted index + FST
  metrics/      Prometheus exporter
  models/       shared data types
  search/       BM25, ranking, query parser
  security/     rate limit, CORS, audit logger
  server/       HTTP lifecycle, banner
  storage/      StorageBackend trait + redb
  synonyms/     synonym map storage and lookup
  tasks/        async task queue
  telemetry/    tracing-subscriber setup
  typo/         Damerau-Levenshtein
  utils/        hash, random, time helpers
  validation/   input validation & sanitization
  vector/       embedding types + quantization
config/         default.toml
docs/           this mdbook source
.github/        CI + release + docs workflows

Author

Muhammad Fiaz — contact@muhammadfiaz.com

AccelerateSearch Architecture

AccelerateSearch is a self-hosted, production-grade search engine written in Rust. This document describes the high-level architecture, the crate dependency graph, and the data flow during a search and an indexing request.

Goals

Single binary that serves the full REST API on Linux, macOS, and Windows.
Pluggable storage backend (default: embedded redb).
Fast keyword + vector search with a target p99 latency of 10 ms for collections under one million documents on commodity hardware.
Meilisearch-style developer experience: tasks, settings, scopes, webhooks, tenant tokens.
Elasticsearch-style power: complex filter expressions, facet distributions, multi-index search, ranking rules.
OpenSearch-level observability: Prometheus metrics, structured logging via tracing.

Crate Dependency Graph

The workspace contains 30 library crates and one binary. The binary (accelerate) is a thin shell that wires them together; the server lifecycle (crates/server) owns the actix-web setup, banner, and graceful-shutdown logic. The api crate, which holds the HTTP handlers, sits on top of the domain crates shown beneath it.

                      ┌─────────────────────┐
                      │      accelerate     │  (root binary)
                      └──────────┬──────────┘
                                 │
                      ┌──────────▼──────────┐
                      │       server        │  (HTTP lifecycle, banner)
                      └──────────┬──────────┘
                                 │
            ┌────────────────────┼────────────────────┐
            │                    │                    │
     ┌──────▼──────┐     ┌───────▼───────┐    ┌───────▼───────┐
     │     api     │     │  scheduler    │    │  telemetry    │
     └──────┬──────┘     └───────────────┘    └───────────────┘
            │
   ┌────────┼────────┬────────────┬────────────┬─────────────┐
   │        │        │            │            │             │
┌──▼──┐ ┌───▼──┐ ┌───▼────┐  ┌────▼────┐  ┌────▼────┐  ┌─────▼─────┐
│auth │ │search│ │indexing│  │documents│  │ filters │  │collections│
└──┬──┘ └───┬──┘ └───┬────┘  └────┬────┘  └────┬────┘  └─────┬─────┘
   │        │        │            │            │             │
   │    ┌───▼───┐    │            │            │             │
   │    │ cache │    │            │            │             │
   │    └───────┘    │            │            │             │
   │                 │            │            │             │
┌──▼──────┐     ┌────▼────┐  ┌────▼────┐  ┌────▼────┐   ┌────▼────┐
│security │     │ storage │  │ facets  │  │  typo   │   │ hybrid  │
└─────────┘     └────┬────┘  └─────────┘  └─────────┘   └─────────┘
                     │
              ┌──────▼──────┐
              │    redb     │  (embedded key-value store)
              └─────────────┘

Cross-cutting helpers that all crates can depend on:

Crate	Role
`errors`	Unified `AppError` / `AppResult` with `From` impls
`utils`	Hash, random, time helpers
`models`	Shared DTOs and value types
`validation`	Collection-uid, field-name, query, and filter validation + sanitisation
`highlighting`	`<em>`-style snippet builder
`synonyms`	Synonym map storage and lookup
`vector`	`Embedding` enum + scalar / product / binary quantisation
`metrics`	Prometheus exporter
`cache`	LRU + TTL cache for search results
`tasks`	Async task queue with cancellation
`snapshots`	tar + zstd snapshot read / write
`telemetry`	`tracing-subscriber` setup with daily file rotation
`cluster`, `replication`, `sharding`	Skeleton traits with `// TODO(<scope>)` markers

Data Flow: Search Request

actix-web receives the HTTP request at /api/v1/collections/{uid}/search.
The middleware stack runs in order: tracing → rate limit → auth.
The search handler validates the request and looks up the collection in the in-memory CollectionStore.
SearchEngine::search_with_rules consults the result cache. On hit, the cached response is returned immediately.
On miss, the engine:
- Loads the collection’s InvertedIndex from the IndexStore (cached in a DashMap keyed by CollectionId).
- Resolves synonym expansion for the query terms.
- Applies typo tolerance (bounded Damerau-Levenshtein expansion).
- Scores candidates with BM25 (crates/search::bm25).
- Applies the filter (recursive-descent parser → evaluator) on hydrated documents.
- Applies user-requested sorting and the ruleset (pinned, hidden, sort/filter overrides).
- Computes facet distributions.
The response is JSON-serialised with a processingTimeMs field and returned.
Successful responses are stored in the result cache (TTL + LRU).
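
The typo-tolerance step in the flow above depends on a bounded edit distance. The following is a minimal sketch, assuming the optimal-string-alignment variant of Damerau-Levenshtein and a per-term typo budget. The names osa_distance and within_typos are hypothetical and are not taken from crates/typo.

```rust
/// Optimal-string-alignment distance: insertions, deletions, substitutions,
/// and transpositions of two adjacent characters each cost 1.
fn osa_distance(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    let (n, m) = (a.len(), b.len());

    // d[i][j] = distance between the first i chars of a and the first j chars of b.
    let mut d = vec![vec![0usize; m + 1]; n + 1];
    for i in 0..=n {
        d[i][0] = i;
    }
    for j in 0..=m {
        d[0][j] = j;
    }

    for i in 1..=n {
        for j in 1..=m {
            let cost = if a[i - 1] == b[j - 1] { 0 } else { 1 };
            let mut best = (d[i - 1][j] + 1)
                .min(d[i][j - 1] + 1)
                .min(d[i - 1][j - 1] + cost);
            // Adjacent transposition, e.g. "ru" <-> "ur".
            if i > 1 && j > 1 && a[i - 1] == b[j - 2] && a[i - 2] == b[j - 1] {
                best = best.min(d[i - 2][j - 2] + 1);
            }
            d[i][j] = best;
        }
    }
    d[n][m]
}

/// Bounded check: a length gap larger than the budget can never match,
/// so the full table is only computed for plausible candidates.
fn within_typos(query: &str, candidate: &str, max_typos: usize) -> bool {
    let gap = query.chars().count().abs_diff(candidate.chars().count());
    gap <= max_typos && osa_distance(query, candidate) <= max_typos
}
```

With this sketch, within_typos("serach", "search", 1) accepts the candidate because the swapped letters count as a single edit.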

Data Flow: Indexing Request

POST /api/v1/collections/{uid}/documents is received.
The documents handler validates every document and calls DocumentService::add_or_replace.
The service runs the IndexingPipeline which:
- Tokenises each searchable field with the Analyzer (Unicode NFC, lowercase, stop-word removal, optional stemming).
- Updates the in-memory InvertedIndex (per-field term frequencies, per-document field lengths).
- Recomputes BM25 collection statistics.
- Rebuilds the FST-backed term dictionary used for prefix lookups (autocomplete); lookup cost scales with the prefix length rather than the dictionary size.
- Persists the documents to storage::TABLE_DOCUMENTS and the postings / terms / field-lengths / stats to the matching TABLE_* tables.
The result cache is invalidated for the collection.
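
The tokenise-then-update steps above can be illustrated with a small in-memory structure. This is a sketch under stated assumptions: the tokenize function and the InvertedIndex fields are hypothetical, Unicode NFC normalisation and stemming are left out, and re-adding an existing document is not handled.

```rust
use std::collections::{HashMap, HashSet};

/// Lowercases, splits on non-alphanumeric characters, and drops stop words.
fn tokenize(text: &str, stop_words: &HashSet<&str>) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .filter(|t| !stop_words.contains(t.as_str()))
        .collect()
}

#[derive(Default)]
struct InvertedIndex {
    /// term -> (doc_id -> term frequency)
    postings: HashMap<String, HashMap<u32, u32>>,
    /// doc_id -> number of indexed tokens
    doc_lengths: HashMap<u32, usize>,
}

impl InvertedIndex {
    fn add(&mut self, doc_id: u32, text: &str, stop_words: &HashSet<&str>) {
        let tokens = tokenize(text, stop_words);
        self.doc_lengths.insert(doc_id, tokens.len());
        for token in tokens {
            *self
                .postings
                .entry(token)
                .or_default()
                .entry(doc_id)
                .or_insert(0) += 1;
        }
    }

    /// Number of documents containing the term (the "df" used by BM25).
    fn doc_freq(&self, term: &str) -> usize {
        self.postings.get(term).map_or(0, |docs| docs.len())
    }

    /// Average document length, one of the BM25 collection statistics.
    fn avg_doc_len(&self) -> f64 {
        if self.doc_lengths.is_empty() {
            return 0.0;
        }
        self.doc_lengths.values().sum::<usize>() as f64 / self.doc_lengths.len() as f64
    }
}
```

Keeping term frequencies and document lengths side by side is what lets the collection statistics be recomputed without re-reading the documents.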

Concurrency Model

The server runs actix-web with one Tokio worker per CPU core.
Shared in-memory state lives in DashMap instances (per-collection indexes, hooks, rulesets, key cache).
Longer-running mutations (e.g. writes guarded by an RwLock over an index) use parking_lot locks rather than std::sync for lower contention.
Background jobs (scheduler) run on a Notify-gated Tokio task that can be cancelled on shutdown.
Result caching uses an LRU + TTL TtlCache (parking_lot::Mutex<LruCache<K, Entry<V>>>).
Hot-reloadable config is wrapped in arc-swap so readers never block writers.
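
To make the result-cache item above concrete, here is a simplified stand-in. It assumes std::sync::Mutex in place of parking_lot::Mutex and a plain HashMap in place of LruCache, so eviction removes the oldest-inserted entry rather than the least recently used one. All names are illustrative.

```rust
use std::collections::HashMap;
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// TTL expiry plus a size cap behind a single mutex.
struct TtlCache<V> {
    ttl: Duration,
    max_entries: usize,
    entries: Mutex<HashMap<String, (Instant, V)>>,
}

impl<V: Clone> TtlCache<V> {
    fn new(ttl: Duration, max_entries: usize) -> Self {
        Self { ttl, max_entries, entries: Mutex::new(HashMap::new()) }
    }

    fn get(&self, key: &str) -> Option<V> {
        // Recover the guard even if another thread panicked while holding it.
        let mut map = self.entries.lock().unwrap_or_else(|e| e.into_inner());
        let expired = match map.get(key) {
            Some((stored_at, _)) => stored_at.elapsed() >= self.ttl,
            None => return None,
        };
        if expired {
            map.remove(key);
            return None;
        }
        map.get(key).map(|(_, value)| value.clone())
    }

    fn put(&self, key: &str, value: V) {
        let mut map = self.entries.lock().unwrap_or_else(|e| e.into_inner());
        if map.len() >= self.max_entries && !map.contains_key(key) {
            // Simplification: evict the entry with the oldest insertion time.
            let oldest = map
                .iter()
                .min_by_key(|(_, (stored_at, _))| *stored_at)
                .map(|(k, _)| k.clone());
            if let Some(k) = oldest {
                map.remove(&k);
            }
        }
        map.insert(key.to_string(), (Instant::now(), value));
    }

    /// Mirrors the "invalidate on indexing" step: drop everything at once.
    fn invalidate_all(&self) {
        self.entries.lock().unwrap_or_else(|e| e.into_inner()).clear();
    }

    fn len(&self) -> usize {
        self.entries.lock().unwrap_or_else(|e| e.into_inner()).len()
    }
}
```

A single mutex keeps the sketch short; it serialises readers, which is acceptable for short critical sections like these.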

Storage

The default StorageBackend is an embedded redb key-value store. The schema is table-based, with the following tables defined in crates/storage:

Table	Key	Value
`collections`	`CollectionId`	`Collection` (JSON)
`documents`	`{collection}\u{0}{doc_id}`	raw document bytes
`inverted_index`	`CollectionId`	`IndexRecord` (JSON snapshot)
`postings`	`{collection}\u{0}{term}`	per-doc posting list
`terms`	`{collection}\u{0}{term}`	term metadata (df, total tf)
`field_lengths`	`{collection}\u{0}{doc_id}`	per-doc field lengths
`collection_stats`	`CollectionId`	`CollectionStats`
`vectors`	`{collection}\u{0}{doc_id}`	raw vector bytes
`tasks`	`TaskId`	`Task` (JSON)
`keys`	`ApiKeyId`	`ApiKey` (JSON)
`settings`	`CollectionId`	`CollectionSettings` (JSON)
`snapshots`	`SnapshotName`	`SnapshotMeta` (JSON)
`synonyms`	`{collection}\u{0}{term}`	synonym entries

A different backend (RocksDB, Sled, …) can be plugged in by implementing the StorageBackend trait and swapping the wiring in crates/server::run.
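
Several tables above use composite keys of the form {collection}\u{0}{doc_id}. The sketch below shows how such keys could be built, split, and prefix-scanned. It uses a BTreeMap as a stand-in for an ordered key-value table, and the helper names are hypothetical rather than part of crates/storage.

```rust
use std::collections::BTreeMap;

/// Joins the collection uid and document id with a NUL separator.
fn doc_key(collection: &str, doc_id: &str) -> String {
    format!("{collection}\u{0}{doc_id}")
}

/// Splits a composite key back into (collection, doc_id).
fn split_key(key: &str) -> Option<(&str, &str)> {
    key.split_once('\u{0}')
}

/// Lists the document ids of one collection with an ordered range scan.
fn scan_collection<'a>(table: &'a BTreeMap<String, Vec<u8>>, collection: &str) -> Vec<&'a str> {
    let prefix = format!("{collection}\u{0}");
    table
        .range(prefix.clone()..)
        .take_while(|(key, _)| key.starts_with(&prefix))
        .filter_map(|(key, _)| split_key(key).map(|(_, id)| id))
        .collect()
}
```

Because NUL sorts before every printable character, the prefix for "books" never matches keys that belong to a collection named "books2".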

Configuration

crates/config resolves each key in precedence order: CLI flags first, then environment variables, then config/default.toml (or the path supplied via --config), then the built-in defaults. Validation is performed with the validator crate. See docs/configuration.md for the full key reference.

Security

The master key (stored as a SHA-256 hash) gates every /api/v1/* route; a small allowlist outside that prefix (/health, /version, /metrics, /swagger-ui/*) is served without authentication.
API keys are scoped by Permission and an optional collection list.
Tenant tokens are HS256 JWTs with a short (≤ 1 h) lifetime.
Rate limiting uses governor keyed by client IP.
Security headers (X-Content-Type-Options, CSP, HSTS, …) are added on every response.
All user-supplied strings are sanitised (control characters stripped, whitespace runs collapsed) before storage or query parsing.
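
The sanitisation rule in the last item can be sketched as a single pass over the input. This assumes that whitespace (including tabs and newlines) is collapsed to one space, that other control characters are dropped, and that leading and trailing whitespace is trimmed. The trimming and the function name are assumptions, not confirmed behaviour of crates/validation.

```rust
/// Strips control characters and collapses whitespace runs to one space.
fn sanitise(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut pending_space = false;
    for ch in input.chars() {
        if ch.is_whitespace() {
            // Remember the gap, but only emit it before the next visible char.
            pending_space = !out.is_empty();
        } else if ch.is_control() {
            // Non-whitespace control characters (NUL, BEL, ...) are dropped.
            continue;
        } else {
            if pending_space {
                out.push(' ');
                pending_space = false;
            }
            out.push(ch);
        }
    }
    out
}
```

Under these assumptions, an input of two spaces, "a", a NUL byte, two spaces, "b", two tabs, and "c" comes out as "a b c".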

AccelerateSearch REST API

All resource routes are versioned under /api/v1/; the system routes listed below are served from the root. The OpenAPI specification is served at /api-docs/openapi.json and the Swagger UI at /swagger-ui/.

The response format for errors is:

{
  "error": "code_snake_case",
  "message": "Human-readable description.",
  "code": 404
}

System (no auth required)

Method	Path	Description
`GET`	`/health`	Health check
`GET`	`/version`	Binary version info (version, commit SHA, commit date)
`GET`	`/stats`	Global statistics (collection count, document count)
`GET`	`/metrics`	Prometheus metrics (gated by `[metrics].enabled`)
`GET`	`/instance-id`	Per-instance UUID

All other endpoints require the master key or a scoped API key in the Authorization: Bearer <key> header.

Collections

Method	Path	Description
`POST`	`/api/v1/collections`	Create collection
`GET`	`/api/v1/collections`	List collections
`GET`	`/api/v1/collections/{uid}`	Get collection
`PATCH`	`/api/v1/collections/{uid}`	Update collection metadata
`DELETE`	`/api/v1/collections/{uid}`	Delete collection
`GET`	`/api/v1/collections/{uid}/stats`	Collection stats
`GET`	`/api/v1/collections/{uid}/settings`	Get full settings blob
`PATCH`	`/api/v1/collections/{uid}/settings`	Update full settings blob
`DELETE`	`/api/v1/collections/{uid}/settings`	Reset settings to defaults

Per-setting endpoints (leaf GET / PUT / DELETE)

The following “leaf” settings each have their own GET/PUT/DELETE trio so clients can manage individual settings without round-tripping the full CollectionSettings blob:

Method	Path
`GET` / `PUT` / `DELETE`	`/api/v1/collections/{uid}/settings/filterable-attributes`
`GET` / `PUT` / `DELETE`	`/api/v1/collections/{uid}/settings/sortable-attributes`
`GET` / `PUT` / `DELETE`	`/api/v1/collections/{uid}/settings/searchable-attributes`
`GET` / `PUT` / `DELETE`	`/api/v1/collections/{uid}/settings/displayed-attributes`
`GET` / `PUT` / `DELETE`	`/api/v1/collections/{uid}/settings/stop-words`
`GET` / `PUT` / `DELETE`	`/api/v1/collections/{uid}/settings/ranking-rules`
`GET` / `PUT` / `DELETE`	`/api/v1/collections/{uid}/settings/typo-tolerance`
`GET` / `PUT` / `DELETE`	`/api/v1/collections/{uid}/settings/distinct-field`
`GET` / `PUT` / `DELETE`	`/api/v1/collections/{uid}/settings/synonyms`
`GET` / `PATCH` / `DELETE`	`/api/v1/collections/{uid}/settings/embedders`

The embedders leaf uses PATCH (not PUT) because the embedder configuration is a JSON object that is deep-merged, not replaced.
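
The difference between a deep merge and a replacement can be shown with a tiny JSON-like value type. This is an illustrative sketch only: the Json enum, deep_merge, and the "default" / "model" / "dimensions" keys in the usage note are hypothetical, and whether a null value deletes a key is not covered.

```rust
use std::collections::btree_map::Entry;
use std::collections::BTreeMap;

#[derive(Clone, Debug, PartialEq)]
enum Json {
    Num(f64),
    Str(String),
    Obj(BTreeMap<String, Json>),
}

/// Merges `patch` into `base`: objects are merged key by key,
/// anything else in the patch overwrites the existing value.
fn deep_merge(base: &mut Json, patch: Json) {
    match (base, patch) {
        (Json::Obj(existing), Json::Obj(incoming)) => {
            for (key, value) in incoming {
                match existing.entry(key) {
                    Entry::Occupied(mut slot) => deep_merge(slot.get_mut(), value),
                    Entry::Vacant(slot) => {
                        slot.insert(value);
                    }
                }
            }
        }
        (slot, other) => *slot = other,
    }
}

/// Small helper for building objects in examples.
fn obj(pairs: Vec<(&str, Json)>) -> Json {
    Json::Obj(pairs.into_iter().map(|(k, v)| (k.to_string(), v)).collect())
}
```

Patching {"default": {"dimensions": 768}} into {"default": {"model": "a", "dimensions": 384}} keeps "model" and updates only "dimensions", whereas a PUT-style replacement would drop "model".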

Documents

Method	Path	Description
`POST`	`/api/v1/collections/{uid}/documents`	Add or replace documents (upsert by primary key)
`PUT`	`/api/v1/collections/{uid}/documents`	Partial update by primary key (only supplied fields are written)
`GET`	`/api/v1/collections/{uid}/documents`	List documents (paginated)
`GET`	`/api/v1/collections/{uid}/documents/{id}`	Get a single document by primary key
`DELETE`	`/api/v1/collections/{uid}/documents/{id}`	Delete one document
`DELETE`	`/api/v1/collections/{uid}/documents`	Delete every document in the collection
`POST`	`/api/v1/collections/{uid}/documents/delete-batch`	Bulk delete by an array of IDs
`GET`	`/api/v1/collections/{uid}/documents/export?format=`	Export as `ndjson`, `json`, or `csv`

Search

Method	Path	Description
`POST`	`/api/v1/collections/{uid}/search`	Full search (POST is recommended for complex filters)
`GET`	`/api/v1/collections/{uid}/search?q=&offset=&limit=&filter=&facets=`	Search (GET, lightweight)
`GET`	`/api/v1/collections/{uid}/autocomplete?q=&limit=`	FST-backed term suggestions
`POST`	`/api/v1/multi-search`	Multi-collection search

Search request body

{
  "q": "rust search",
  "offset": 0,
  "limit": 20,
  "filter": "rating >= 4 AND status = \"active\"",
  "facets": ["category", "brand"],
  "attributes_to_retrieve": ["id", "title", "price"],
  "attributes_to_highlight": ["title", "body"],
  "sort": ["price:asc"],
  "show_ranking_score": true,
  "hybrid": { "semantic_ratio": 0.5, "embedder": "default" },
  "vector": [0.1, 0.2, 0.3],
  "distinct": "sku"
}
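
The hybrid field in the request above blends keyword and vector results. One way this could work is weighted reciprocal rank fusion (RRF). The sketch below assumes that semantic_ratio weights the vector ranking and that the smoothing constant k is 60, a commonly used value. Neither assumption is confirmed by this document, and rrf_fuse is a hypothetical name.

```rust
use std::cmp::Ordering;
use std::collections::HashMap;

/// Fuses two ranked id lists. Each list contributes weight / (k + rank),
/// with ranks starting at 1; higher fused scores come first.
fn rrf_fuse(
    keyword: &[&str],
    semantic: &[&str],
    semantic_ratio: f64,
    k: f64,
) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for (rank, id) in keyword.iter().enumerate() {
        *scores.entry(id.to_string()).or_insert(0.0) +=
            (1.0 - semantic_ratio) / (k + rank as f64 + 1.0);
    }
    for (rank, id) in semantic.iter().enumerate() {
        *scores.entry(id.to_string()).or_insert(0.0) +=
            semantic_ratio / (k + rank as f64 + 1.0);
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Sort by score descending, breaking ties by id for a stable order.
    fused.sort_by(|a, b| {
        b.1.partial_cmp(&a.1)
            .unwrap_or(Ordering::Equal)
            .then_with(|| a.0.cmp(&b.0))
    });
    fused
}
```

Because RRF only looks at ranks, the BM25 scores and vector similarities never need to be brought onto a common scale.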

Search response

{
  "query": "rust search",
  "hits": [
    {
      "document": { "id": "1", "title": "Rust in Action", "price": 39.95 },
      "formatted": { "title": "<em>Rust</em> in Action" },
      "ranking_score": 0.86
    }
  ],
  "offset": 0,
  "limit": 20,
  "estimatedTotalHits": 142,
  "processingTimeMs": 4,
  "facetDistribution": { "category": { "counts": { "book": 87, "video": 55 } } }
}

Autocomplete response

{
  "query": "rus",
  "suggestions": [
    { "term": "rust",     "total_term_freq": 1234 },
    { "term": "rusty",    "total_term_freq":  17  },
    { "term": "russian",  "total_term_freq":   9  }
  ],
  "processingTimeMs": 1
}

Filter expression grammar

expr        := or
or          := and ( "OR" and )*
and         := not ( "AND" not )*
not         := "NOT" not | atom
atom        := "(" expr ")" | comparison
comparison  := field op value
op          := "=" | "!=" | ">" | ">=" | "<" | "<=" | "TO"
            | "IN" | "NOT" "IN" | "EXISTS" | "IS" "NULL" | "IS" "NOT" "NULL"
            | "CONTAINS" | "STARTS_WITH" | "ENDS_WITH" | "LIKE"
            | "GEO_BBOX" lat lng lat lng
            | "GEO_RADIUS" lat lng meters
value       := number | string | bool | null | array
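
The or / and / not / atom layers of the grammar above map directly onto a recursive-descent parser. The sketch below covers only the six binary comparison operators and assumes whitespace-separated tokens with no quoted strings containing spaces. The Expr and Parser types are illustrative and are not the ones in crates/filters.

```rust
#[derive(Debug, PartialEq)]
enum Expr {
    Or(Box<Expr>, Box<Expr>),
    And(Box<Expr>, Box<Expr>),
    Not(Box<Expr>),
    Cmp { field: String, op: String, value: String },
}

/// Convenience constructor for comparison nodes.
fn cmp(field: &str, op: &str, value: &str) -> Expr {
    Expr::Cmp { field: field.into(), op: op.into(), value: value.into() }
}

struct Parser {
    tokens: Vec<String>,
    pos: usize,
}

impl Parser {
    fn new(input: &str) -> Self {
        // Pad parentheses so they become standalone tokens.
        let spaced = input.replace('(', " ( ").replace(')', " ) ");
        Parser { tokens: spaced.split_whitespace().map(String::from).collect(), pos: 0 }
    }

    fn peek(&self) -> Option<&str> {
        self.tokens.get(self.pos).map(String::as_str)
    }

    fn advance(&mut self) -> Option<String> {
        let token = self.tokens.get(self.pos).cloned();
        if token.is_some() {
            self.pos += 1;
        }
        token
    }

    // or := and ( "OR" and )*
    fn parse_or(&mut self) -> Result<Expr, String> {
        let mut left = self.parse_and()?;
        while self.peek() == Some("OR") {
            self.pos += 1;
            left = Expr::Or(Box::new(left), Box::new(self.parse_and()?));
        }
        Ok(left)
    }

    // and := not ( "AND" not )*
    fn parse_and(&mut self) -> Result<Expr, String> {
        let mut left = self.parse_not()?;
        while self.peek() == Some("AND") {
            self.pos += 1;
            left = Expr::And(Box::new(left), Box::new(self.parse_not()?));
        }
        Ok(left)
    }

    // not := "NOT" not | atom
    fn parse_not(&mut self) -> Result<Expr, String> {
        if self.peek() == Some("NOT") {
            self.pos += 1;
            return Ok(Expr::Not(Box::new(self.parse_not()?)));
        }
        self.parse_atom()
    }

    // atom := "(" expr ")" | field op value
    fn parse_atom(&mut self) -> Result<Expr, String> {
        if self.peek() == Some("(") {
            self.pos += 1;
            let inner = self.parse_or()?;
            return match self.advance().as_deref() {
                Some(")") => Ok(inner),
                other => Err(format!("expected ')', found {other:?}")),
            };
        }
        let field = self.advance().ok_or("expected a field name")?;
        let op = self.advance().ok_or("expected an operator")?;
        if !["=", "!=", ">", ">=", "<", "<="].contains(&op.as_str()) {
            return Err(format!("unsupported operator {op}"));
        }
        let value = self.advance().ok_or("expected a value")?;
        Ok(Expr::Cmp { field, op, value })
    }
}

fn parse_filter(input: &str) -> Result<Expr, String> {
    let mut parser = Parser::new(input);
    let expr = parser.parse_or()?;
    if parser.pos != parser.tokens.len() {
        return Err("unexpected trailing tokens".to_string());
    }
    Ok(expr)
}
```

Since and sits below or in the call chain, "a = 1 OR b > 2 AND c < 3" groups as a = 1 OR (b > 2 AND c < 3) without any explicit precedence table.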

LIKE patterns use % for “any” and _ for “single character”.
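
The LIKE semantics above can be checked with a short dynamic-programming matcher. This sketch assumes case-sensitive matching and no escape character for a literal % or _. The function name is hypothetical.

```rust
/// Returns true when `text` matches `pattern`, where `%` matches any run of
/// characters (including none) and `_` matches exactly one character.
fn like_match(pattern: &str, text: &str) -> bool {
    let t: Vec<char> = text.chars().collect();
    // reachable[j] == true means the pattern consumed so far matches t[..j].
    let mut reachable = vec![false; t.len() + 1];
    reachable[0] = true;

    for pc in pattern.chars() {
        let mut next = vec![false; t.len() + 1];
        if pc == '%' {
            // '%' can extend any earlier match by zero or more characters.
            let mut any = false;
            for j in 0..=t.len() {
                any = any || reachable[j];
                next[j] = any;
            }
        } else {
            // '_' or a literal must consume exactly one character.
            for j in 1..=t.len() {
                next[j] = reachable[j - 1] && (pc == '_' || pc == t[j - 1]);
            }
        }
        reachable = next;
    }
    reachable[t.len()]
}
```

The matcher runs in time proportional to pattern length times text length, so a pattern with many % wildcards cannot trigger exponential backtracking.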

Tasks

Method	Path	Description
`GET`	`/api/v1/tasks`	List tasks (paginated)
`GET`	`/api/v1/tasks/{taskUid}`	Get a single task by UID
`DELETE`	`/api/v1/tasks`	Cancel every queued/pending task
`POST`	`/api/v1/tasks/cancel`	Cancel a subset of tasks by filter (uid prefix, type, status)

API Keys

Method	Path	Description
`GET`	`/api/v1/keys`	List keys
`POST`	`/api/v1/keys`	Create key
`GET`	`/api/v1/keys/{key_or_uid}`	Get key by raw key value or UID
`PATCH`	`/api/v1/keys/{key_or_uid}`	Update key metadata (name, scopes, expiry)
`DELETE`	`/api/v1/keys/{key_or_uid}`	Delete key

Tenant Tokens

Method	Path	Description
`POST`	`/api/v1/tenant-tokens`	Mint a short-lived HS256 JWT scoped to a search API key

Snapshots

Method	Path	Description
`POST`	`/api/v1/snapshots`	Create snapshot
`GET`	`/api/v1/snapshots`	List snapshots
`GET`	`/api/v1/snapshots/{name}`	Get snapshot info
`DELETE`	`/api/v1/snapshots/{name}`	Delete snapshot
`POST`	`/api/v1/snapshots/{name}/restore`	Restore snapshot

Indexes (alias for collections)

The /api/v1/indexes/* routes are aliases for /api/v1/collections/* and accept the same payloads. They exist for Meilisearch compatibility; the search-rules endpoint lives under this prefix, at /api/v1/indexes/{uid}/settings/rules.

Method	Path	Description
`POST`	`/api/v1/indexes`	Create index
`GET`	`/api/v1/indexes`	List indexes
`GET`	`/api/v1/indexes/{uid}`	Get index
`PATCH`	`/api/v1/indexes/{uid}`	Update index metadata
`DELETE`	`/api/v1/indexes/{uid}`	Delete index
`GET`	`/api/v1/indexes/{uid}/stats`	Index stats
`POST`	`/api/v1/swap-indexes`	Atomically swap two indexes

Search Rules (curated queries)

Method	Path	Description
`GET`	`/api/v1/indexes/{uid}/settings/rules`	Get ruleset
`POST`	`/api/v1/indexes/{uid}/settings/rules`	Replace ruleset
`DELETE`	`/api/v1/indexes/{uid}/settings/rules`	Delete ruleset

Hooks (webhooks)

Method	Path	Description
`GET`	`/api/v1/hooks`	List hooks
`GET`	`/api/v1/hooks/{id}`	Get hook
`POST`	`/api/v1/hooks`	Create hook
`PATCH`	`/api/v1/hooks/{id}`	Update hook
`DELETE`	`/api/v1/hooks/{id}`	Delete hook

Network and experimental features

Method	Path	Description
`GET`	`/api/v1/network`	Cluster network info (skeleton)
`GET`	`/api/v1/experimental-features`	List feature toggles
`PATCH`	`/api/v1/experimental-features`	Update feature toggles

OpenAPI and Swagger

Path	Description
`/api-docs/openapi.json`	OpenAPI 3.1 spec in JSON
`/swagger-ui/`	Interactive Swagger UI (HTML)
`/swagger-ui/{tail:.*}`	Swagger UI assets

Both are gated by the [api_docs] TOML section. Disabling either key returns a 404 for that path.

Configuration Reference

Configuration is parsed at startup from (in order of precedence):

CLI flags (e.g. --host 0.0.0.0 --port 7700)
Environment variables (e.g. ACCELERATE_HOST, ACCELERATE_PORT)
config/default.toml (the file shipped with the binary)
Built-in defaults

The path to the TOML file can be overridden with --config <file> or ACCELERATE_CONFIG=/path/to/file.

Warning

The project is in active development. Configuration keys may be renamed, removed, or have their defaults changed between releases. Always read the config/default.toml of the release you are running for the authoritative reference.

`[server]`

Key	Type	Default	Description
`host`	`string`	`localhost`	Bind address. `localhost`/`127.0.0.1` for loopback, `0.0.0.0` to listen on every interface.
`port`	`u16`	`7700`	TCP port (Meilisearch-compatible).
`workers`	`usize`	`0`	Actix worker threads. `0` = auto = number of CPU cores.
`max_connections`	`usize`	`0`	Maximum simultaneous connections. `0` = unlimited.
`keep_alive`	`string`	`75s`	HTTP keep-alive duration.
`read_timeout`	`string`	`30s`	Maximum time to wait for a request.
`write_timeout`	`string`	`30s`	Maximum time to wait for a response.
`shutdown_timeout`	`string`	`10s`	Graceful shutdown window.
`max_body_size`	`usize`	`104857600`	Max HTTP request body in bytes (default 100 MiB).

`[server.tls]`

Key	Type	Default	Description
`enabled`	`bool`	`false`	Enable TLS on the listen socket.
`cert_path`	`string`	`""`	PEM certificate chain path.
`key_path`	`string`	`""`	PEM private key path.
`ca_cert_path`	`string`	`""`	Optional mTLS CA bundle.
`require_client_cert`	`bool`	`false`	Enforce mTLS client certificates.

`[api_docs]`

Key	Type	Default	Description
`swagger_ui_enabled`	`bool`	`true`	Serve Swagger UI at `/swagger-ui/`.
`openapi_enabled`	`bool`	`true`	Serve the OpenAPI spec at `/api-docs/openapi.json`.

`[data]`

Key	Type	Default	Description
`dir`	`string`	`./data`	On-disk `redb` database directory.
`env`	`string`	`development`	`development` or `production`. Production requires a non-empty `auth.master_key`.

`[auth]`

Key	Type	Default	Description
`master_key`	`string`	`""`	Master API key for admin access. Set with `ACCELERATE_MASTER_KEY` in production.
`disable_auth`	`bool`	`false`	Explicitly disable authentication (development only).

`[search]`

Key	Type	Default	Description
`max_values_per_facet`	`usize`	`100`	Max facet values returned per field.
`pagination_max_total_hits`	`usize`	`1000`	Max total hits reported in a paginated response.
`bm25_k1`	`f32`	`1.2`	BM25 term-frequency saturation.
`bm25_b`	`f32`	`0.75`	BM25 length normalisation.
`default_limit`	`usize`	`20`	Default page size when not supplied by the client.
`max_limit`	`usize`	`1000`	Maximum page size accepted from the client.

`[indexing]`

Key	Type	Default	Description
`max_batch_size`	`usize`	`1000`	Max documents per indexing batch.
`commit_interval_ms`	`u64`	`500`	Force commit after this delay.
`parallelism`	`usize`	`0`	Indexing pipeline parallelism. `0` = auto.
`stem`	`bool`	`true`	Apply language-aware stemming.
`remove_stop_words`	`bool`	`true`	Strip stop words before indexing.

`[vector]`

Key	Type	Default	Description
`enabled`	`bool`	`false`	Enable vector search at the platform level.
`dimensions`	`usize`	`384`	Default embedding dimensions.
`similarity`	`string`	`cosine`	`cosine`, `dot`, or `euclidean`.
`hnsw_m`	`usize`	`16`	HNSW connections per node.
`hnsw_ef_construction`	`usize`	`200`	HNSW search depth during indexing.
`hnsw_ef_search`	`usize`	`50`	HNSW search depth during queries.
`quantization`	`string`	`none`	`none`, `scalar`, `product`, or `binary`.
`pq_m`	`usize`	`8`	Sub-spaces for product quantization.
`pq_k`	`usize`	`256`	Centroids per sub-space for product quantization.
`allow_sparse`	`bool`	`true`	Allow sparse vector embeddings (SPLADE-style).
`allow_multi`	`bool`	`true`	Allow multi-vector embeddings (ColBERT-style).
`embedder_url`	`string`	`""`	Optional external embedder URL for auto-embedding.
`embedder_model`	`string`	`""`	Optional embedder model name for telemetry.
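
As an illustration of the binary option for quantization, the sketch below keeps one sign bit per dimension and compares vectors by Hamming distance. This is one common form of binary quantization, shown under the assumption that positive components map to 1. It is not necessarily how crates/vector implements it, and the function names are hypothetical.

```rust
/// Packs the sign of each component into 64-bit words (bit set when x > 0).
fn binarize(vector: &[f32]) -> Vec<u64> {
    let mut words = vec![0u64; (vector.len() + 63) / 64];
    for (i, &x) in vector.iter().enumerate() {
        if x > 0.0 {
            words[i / 64] |= 1u64 << (i % 64);
        }
    }
    words
}

/// Number of differing bits; smaller means more similar.
fn hamming(a: &[u64], b: &[u64]) -> u32 {
    a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum()
}
```

At the default 384 dimensions this packs each vector into six u64 words, 48 bytes instead of 1536 bytes of f32 data, at the cost of a coarser distance estimate.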

`[logging]`

Key	Type	Default	Description
`level`	`string`	`info`	`trace`, `debug`, `info`, `warn`, `error`.
`format`	`string`	`pretty`	`pretty` or `json`.
`dir`	`string`	`./logs`	Log file directory.
`file_prefix`	`string`	`accelerate`	Log file prefix (`{prefix}.{date}.log`).
`max_files`	`usize`	`7`	Retained log files. `0` = unlimited.
`max_size_mb`	`usize`	`100`	Max single-file size in MB. `0` = unlimited.
`auto_delete_days`	`usize`	`30`	Auto-delete logs older than N days. `0` = never.
`no_console`	`bool`	`false`	Disable console output.
`no_file`	`bool`	`false`	Disable the file log appender.
`no_color`	`bool`	`false`	Strip ANSI color from console output.
`quiet`	`bool`	`false`	Silence non-error log lines.

`[metrics]`

Key	Type	Default	Description
`enabled`	`bool`	`true`	Expose Prometheus metrics at `/metrics`.
`endpoint`	`string`	`/metrics`	Metrics endpoint path.

`[snapshots]`

Key	Type	Default	Description
`dir`	`string`	`./snapshots`	Snapshot directory.
`schedule`	`string`	`0 0 * * *`	Cron schedule for auto-snapshot (UTC).
`auto_create`	`bool`	`false`	Enable automatic snapshot creation.

`[updates]`

Key	Type	Default	Description
`check_enabled`	`bool`	`true`	Check for new versions on startup.
`check_interval`	`string`	`24h`	Interval between version checks.

`[rate_limit]`

Key	Type	Default	Description
`enabled`	`bool`	`true`	Enable per-client rate limiting.
`requests_per_second`	`u32`	`100`	Steady-state RPS per client.
`burst_size`	`u32`	`200`	Allowed burst above the steady-state RPS.

`[telemetry]`

Key	Type	Default	Description
`tracing_enabled`	`bool`	`true`	Enable distributed tracing.
`service_name`	`string`	`accelerate`	Service name reported to tracing backends.

`[cache]`

Key	Type	Default	Description
`enabled`	`bool`	`true`	Toggle the search result cache.
`max_entries`	`usize`	`10000`	Maximum number of cached entries.
`ttl_seconds`	`u64`	`300`	Cache TTL in seconds.

`[cors]`

Key	Type	Default	Description
`enabled`	`bool`	`true`	Apply CORS headers on every response.
`allowed_origins`	`array<string>`	`[]`	Allowed origins. `[]` = allow all.
`allowed_methods`	`array<string>`	`["GET", "POST", "PUT", "PATCH", "DELETE", "OPTIONS"]`	Allowed methods.
`allowed_headers`	`array<string>`	`["Authorization", "Content-Type", "Accept", "Origin", "X-Requested-With"]`	Allowed request headers.
`allow_credentials`	`bool`	`true`	Allow cookies / authorization headers.
`max_age`	`u64`	`3600`	Preflight cache duration in seconds.

Environment variables

Every CLI flag is also exposed as an ACCELERATE_* environment variable. For example, --host 0.0.0.0 is equivalent to ACCELERATE_HOST=0.0.0.0. Variables commonly set via the environment in production (ACCELERATE_MASTER_KEY in particular, since it is sensitive):

Variable	Equivalent CLI flag
`ACCELERATE_CONFIG`	`--config <file>`
`ACCELERATE_MASTER_KEY`	`--master-key <key>`
`ACCELERATE_HOST`	`--host <addr>`
`ACCELERATE_PORT`	`--port <n>`
`ACCELERATE_DATA_DIR`	`--data-dir <dir>`
`ACCELERATE_LOG_LEVEL`	`--log-level <level>`
`ACCELERATE_ENV`	`--env <development|production>`

Precedence example

# config/default.toml contains:  host = "localhost", port = 7700
# env vars:                       ACCELERATE_PORT=8080
# CLI:                            --host 0.0.0.0

# Effective:
#   host = 0.0.0.0   (CLI > env > TOML > default)
#   port = 8080      (env > TOML > default)
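
The layering in the example above amounts to "first source that provides a value wins". A minimal sketch, where resolve is a hypothetical helper rather than the actual function in crates/config:

```rust
/// Picks the highest-precedence value: CLI flag, then environment variable,
/// then TOML file, then the built-in default.
fn resolve<T>(cli: Option<T>, env: Option<T>, file: Option<T>, default: T) -> T {
    cli.or(env).or(file).unwrap_or(default)
}
```

For the example above, resolve(Some("0.0.0.0"), None, Some("localhost"), "localhost") yields "0.0.0.0" and resolve(None, Some(8080), Some(7700), 7700) yields 8080.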

Deployment

Single-node (default)

# Build
cargo build --release

# Run
./target/release/accelerate

A data/ directory is created on first run and holds the embedded redb store, which includes the API key table. With the default configuration, snapshots are written to ./snapshots and daily logs to ./logs.

TLS

Generate a locally trusted certificate for local development with mkcert:

mkcert -install
mkcert accelerate.local

Set in config/default.toml:

[server]
host = "0.0.0.0"
port = 7700

[server.tls]
enabled = true
cert_path = "./accelerate.local.pem"
key_path = "./accelerate.local-key.pem"

For production, use certs from Let’s Encrypt or your corporate CA.

systemd unit

# /etc/systemd/system/accelerate.service
[Unit]
Description=AccelerateSearch
After=network.target

[Service]
Type=simple
User=accelerate
WorkingDirectory=/var/lib/accelerate
Environment=ACCELERATE_CONFIG=/etc/accelerate/config.toml
ExecStart=/usr/local/bin/accelerate
Restart=on-failure
RestartSec=5
LimitNOFILE=65535

[Install]
WantedBy=multi-user.target

sudo useradd -r -s /usr/sbin/nologin accelerate
sudo install -m755 target/release/accelerate /usr/local/bin/
sudo install -d -o accelerate -g accelerate /var/lib/accelerate
sudo install -d -m755 /etc/accelerate
sudo cp config/default.toml /etc/accelerate/config.toml
sudo systemctl daemon-reload
sudo systemctl enable --now accelerate

Docker

docker build -t accelerate:local .
docker run --rm -p 7700:7700 -v "$PWD/data:/data" accelerate:local

docker-compose.yml is provided in the repository root for a single-command bring-up that exposes port 7700.

Reverse proxy

nginx (with TLS termination)

server {
    listen 443 ssl http2;
    server_name accelerate.example.com;

    ssl_certificate     /etc/letsencrypt/live/accelerate.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/accelerate.example.com/privkey.pem;

    client_max_body_size 100m;  # match [server] max_body_size (100 MiB by default)

    location / {
        proxy_pass http://127.0.0.1:7700;
        proxy_set_header Host              $host;
        proxy_set_header X-Real-IP         $remote_addr;
        proxy_set_header X-Forwarded-For   $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}

Capacity planning

A single node comfortably handles:

5 M documents × 1 KB each
30 K QPS search (cache-warm, single collection)
5 K QPS index (small batch updates)

For higher throughput, the intended path is to scale horizontally behind a load balancer. The embedded redb backend is single-node only, and the cluster support in crates/cluster is still a skeleton, so fleet coordination is not yet available out of the box.

Documentation site (GitHub Pages)

The user guide is an mdbook site that is published to GitHub Pages via the .github/workflows/docs.yml workflow.

One-time setup

Create the GitHub repository AccelerateSearch (GitHub Pages derives the project-page path from the repository name, so the name must be AccelerateSearch for the site to live at muhammad-fiaz.github.io/AccelerateSearch/).
In Settings → Pages, set Source to GitHub Actions.
(Optional) Configure a custom domain in Settings → Pages → Custom domain and have the workflow write a CNAME file into site/.
Grant the workflow the pages: write and id-token: write permissions (already declared in the workflow file).

Build locally

# One-off: install the mdbook binary
cargo install mdbook --locked --version 0.4.43

# Build the user guide into ./docs/book/
(cd docs && mdbook build)

# Build cargo doc into ./target/doc/
cargo doc --no-deps --workspace --target-dir target

# Combine both into a single ./site/ directory ready for Pages
mkdir -p site
cp -R docs/book/. site/
mkdir -p site/rust-api
cp -R target/doc/. site/rust-api/
touch site/.nojekyll

Deploy

Every push to main rebuilds and deploys the site. To force a rebuild without a code change, go to the Actions tab, select Docs, and click Run workflow.

The site is served at https://muhammad-fiaz.github.io/AccelerateSearch/.

Backup and restore

# Snapshot
curl -X POST -H "Authorization: Bearer $MASTER" \
     -H "Content-Type: application/json" \
     http://localhost:7700/api/v1/snapshots \
     -d '{"name":"nightly-2026-06-03"}'

# Inspect the snapshot metadata (the tar+zstd archive itself is written to the snapshots directory)
curl -H "Authorization: Bearer $MASTER" \
     http://localhost:7700/api/v1/snapshots/nightly-2026-06-03

# Restore
curl -X POST -H "Authorization: Bearer $MASTER" \
     http://localhost:7700/api/v1/snapshots/nightly-2026-06-03/restore

Always stop the server (or pause writes via --read-only) before restoring, to avoid in-flight write conflicts.

Health checks

curl -fsS http://localhost:7700/health
# {"status":"available"}

Configure your orchestrator to restart the container on a non-200 response.

Upgrades

Drain writes (optional): set [search] readonly = true.
Stop the old binary.
Install the new binary.
Start the new binary; the embedded store is upgraded in place.
Re-enable writes.

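Schema migrations live in crates/storage and run automatically on startup. They are additive and idempotent, which is meant to keep a rollback to the previous version possible. Because on-disk formats may still change between releases while the project is in active development, take a snapshot before upgrading.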

Development

Prerequisites

Rust 1.85+ (edition 2024; 1.88+ recommended for the latest dependencies)
A C toolchain (gcc, clang, or MSVC)
On Linux: pkg-config and mimalloc’s usual build deps
git, curl, and cargo (the toolchain)

Build

git clone https://github.com/muhammad-fiaz/AccelerateSearch
cd AccelerateSearch
cargo build

The binary lands at target/debug/accelerate.

Run

cargo run -- --config config/default.toml

Logs are emitted to stdout and to logs/accelerate.YYYY-MM-DD.log (the {prefix}.{date}.log pattern from [logging]).

Test

cargo test --workspace --no-fail-fast

Unit tests live next to the code they cover (#[cfg(test)] mod tests).
Property-based tests use proptest for the filter parser (filters::evaluator) and the tokenizer (indexing::analyzer).
Public items are documented with rustdoc; cargo doc --no-deps --workspace builds the API docs, and the CI fails on any rustdoc warning.

Lint and format

cargo fmt --all -- --check
cargo clippy --workspace --all-targets -- -D warnings

The CI workflow runs both on every push; failing either blocks the build.

Benchmarks

benchmark/ is a standalone project that spins up a 1M-document collection and measures indexing and search throughput.

cd benchmark
cargo run --release

For micro-benchmarks, use cargo bench in the crate of interest (currently only filters).

Project layout

crates/         # library crates
  api/          # HTTP handlers, DTOs, OpenAPI
  auth/         # master key, API keys, tenant tokens
  cache/        # LRU + TTL cache
  cluster/      # cluster skeleton (TODO)
  collections/  # collection metadata service
  config/       # TOML config, validation
  documents/    # document service (add, update, delete, get, list)
  errors/       # AppError and From impls
  facets/       # facet distribution
  filters/      # filter parser & evaluator
  hybrid/       # hybrid query fusion (RRF)
  highlighting/ # <em> highlight
  indexing/     # tokenization + inverted index + FST term dict
  metrics/      # Prometheus exporter
  models/       # shared data types
  replication/  # replication skeleton (TODO)
  scheduler/    # cron + interval jobs
  search/       # BM25, ranking, query parser
  security/     # rate limit, CORS, audit
  server/       # HTTP lifecycle, banner
  sharding/     # sharding skeleton (TODO)
  snapshots/    # tar+zstd snapshots
  storage/      # StorageBackend trait + redb
  synonyms/     # synonym map storage and lookup
  tasks/        # async task queue
  telemetry/    # tracing-subscriber setup
  typo/         # Damerau-Levenshtein
  utils/        # helpers (hash, random, time)
  validation/   # input validation + sanitization
  vector/       # embedding types + quantization
config/         # default.toml
docs/           # mdbook user guide (this site)
benchmark/      # standalone benchmark project
.github/        # CI + release + docs workflows

Adding a new feature

Decide the layer. Filters belong in filters/, ranking tweaks in search/, persistence in storage/, etc. The crate boundaries exist to keep compile times low; honour them.
Define the data type in models. Public types live there so they can be shared across crates without circular deps.
Write a failing unit test first. Tests are colocated with code.
Implement the feature. No unwrap outside tests; no unsafe.
Add an integration test if the feature is exposed via HTTP.
Update docs/api.md if a new route was added, and docs/configuration.md if a new config key was added.
Run cargo fmt, cargo clippy, and cargo test before opening a PR.
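
One way to follow the "no unwrap outside tests" rule is to convert library errors into the application error type and propagate them with `?`. The sketch below assumes a simplified `AppError` with two made-up variants; the real type in crates/errors will differ, and `parse_limit` with its 1..=1000 range is purely illustrative.

```rust
use std::fmt;
use std::num::ParseIntError;

/// Hypothetical stand-in for the project's unified error type.
#[derive(Debug, PartialEq)]
enum AppError {
    InvalidNumber(String),
    OutOfRange(u32),
}

impl fmt::Display for AppError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            AppError::InvalidNumber(msg) => write!(f, "invalid number: {msg}"),
            AppError::OutOfRange(n) => write!(f, "limit {n} is outside 1..=1000"),
        }
    }
}

impl std::error::Error for AppError {}

// This impl is what lets `?` turn a ParseIntError into an AppError.
impl From<ParseIntError> for AppError {
    fn from(err: ParseIntError) -> Self {
        AppError::InvalidNumber(err.to_string())
    }
}

/// Parses a hypothetical `limit` parameter without calling unwrap.
fn parse_limit(raw: &str) -> Result<u32, AppError> {
    let limit: u32 = raw.trim().parse()?;
    if limit == 0 || limit > 1000 {
        return Err(AppError::OutOfRange(limit));
    }
    Ok(limit)
}

fn main() {
    match parse_limit("20") {
        Ok(limit) => println!("limit = {limit}"),
        Err(err) => eprintln!("error: {err}"),
    }
    if let Err(err) = parse_limit("abc") {
        println!("rejected: {err}");
    }
}
```

The caller decides how to report the failure; the parsing function itself never panics on bad input.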

Adding a new crate

mkdir crates/<name> && cd crates/<name>
Copy a minimal Cargo.toml from a sibling crate; pin the workspace deps and inherit the lints table.
Add it to the [workspace] members list in the workspace Cargo.toml (the default case). If other crates need to depend on it, also add it to [workspace.dependencies].
Update docs/architecture.md to show the new crate in the diagram.

Releasing

Bump versions: cargo set-version --workspace 0.X.0 (provided by cargo-edit; a manual edit is also acceptable).
Update CHANGELOG.md (reverse chronological, newest first).
Tag the commit: git tag v0.X.0.
Push the tag; .github/workflows/release.yml cross-compiles for six targets and publishes a draft release.

Common pitfalls

unwrap in non-test code will fail clippy. Use ?, .expect("…") with a justification, or convert the error via From into AppError.
Forgetting to invalidate the search cache after a document write. See crates/api/src/v1/documents.rs for the pattern.
Using println! for logging. Use tracing::{info, warn, error} so the structured-logging pipeline picks it up.
Adding a new env var that doesn’t go through crates/config. All configuration must be TOML-driven; env vars are an escape hatch only.
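
The cache-invalidation pitfall can be illustrated with a small sketch. This is not the pattern from crates/api/src/v1/documents.rs, which should be consulted directly. `Collection`, `put_document`, and `search` are hypothetical, and clearing the whole cache on every write is the simplest possible policy, assumed here for brevity.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory collection with a search cache keyed by query.
#[derive(Default)]
struct Collection {
    documents: HashMap<u64, String>,
    search_cache: HashMap<String, Vec<u64>>,
}

impl Collection {
    /// Writes a document, then drops every cached search result so that
    /// later searches cannot return stale ids.
    fn put_document(&mut self, id: u64, body: &str) {
        self.documents.insert(id, body.to_string());
        self.search_cache.clear();
    }

    /// Returns the sorted ids of documents containing `query`, serving the
    /// result from the cache when one is present.
    fn search(&mut self, query: &str) -> Vec<u64> {
        if let Some(hit) = self.search_cache.get(query) {
            return hit.clone();
        }
        let mut ids: Vec<u64> = self
            .documents
            .iter()
            .filter(|(_, body)| body.contains(query))
            .map(|(id, _)| *id)
            .collect();
        ids.sort_unstable();
        self.search_cache.insert(query.to_string(), ids.clone());
        ids
    }
}

fn main() {
    let mut collection = Collection::default();
    collection.put_document(1, "rust search engine");
    println!("{:?}", collection.search("rust")); // [1]

    // Without the clear() in put_document, this search would still see [1].
    collection.put_document(2, "rust book");
    println!("{:?}", collection.search("rust")); // [1, 2]
}
```

Removing the `clear()` call reproduces the bug: the second search returns the cached `[1]` even though document 2 matches.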

Rust API Reference

The full Rust API documentation is generated with cargo doc and lives in the target/doc/ directory of the workspace. It is regenerated on every push to main by the GitHub Actions docs workflow and published at:

https://muhammad-fiaz.github.io/AccelerateSearch/rust-api/

Build locally

cargo doc --no-deps --workspace --target-dir target
# Optional: open in your default browser
cargo doc --no-deps --workspace --target-dir target --open

RUSTDOCFLAGS="-D warnings" is set in the CI, so any doc warning fails the build.

Crate index

Crate	Description
`accelerate`	Root binary
`api`	HTTP handlers and DTOs
`auth`	Master key, API keys, tenant tokens
`cache`	LRU + TTL cache
`cluster`	Cluster skeleton
`collections`	Collection metadata service
`config`	TOML configuration
`documents`	Document service
`errors`	Unified error type
`facets`	Facet distribution engine
`filters`	Filter expression parser & evaluator
`hybrid`	Hybrid query fusion (RRF)
`highlighting`	`<em>` highlighting
`indexing`	Tokenization, inverted index, FST
`metrics`	Prometheus exporter
`models`	Shared data types
`replication`	Replication skeleton
`scheduler`	Cron + interval jobs
`search`	BM25, ranking, query parser
`security`	Rate limit, CORS, audit logger
`server`	HTTP lifecycle, banner
`sharding`	Sharding skeleton
`snapshots`	Tar + zstd snapshots
`storage`	`StorageBackend` trait + redb
`synonyms`	Synonym expansion
`tasks`	Async task queue
`telemetry`	tracing-subscriber setup
`typo`	Damerau-Levenshtein
`utils`	Hash, random, time helpers
`validation`	Input validation & sanitization
`vector`	Embedding types + quantization
