HTTP Server

The lexrs-server crate compiles to two binaries: writer and reader. They are designed to run together as a search service, but they have no shared code path at runtime — they communicate only through files on a shared volume and a Consul KV entry.

Install

cargo install lexrs-server

Use cases

Autocomplete / typeahead As a user types into a search box, send the current input to GET /prefix. The response is every known word that starts with what they've typed so far, in lexicographic order. Prefix lookups on a DAWG are fast enough to fire on every keystroke.

# User has typed "app"
curl 'http://localhost/prefix?q=app'
# ["apple", "application", "apply", "appointment"]

Spell checking and did-you-mean When a search query returns no results, retry with GET /search?q=<word>&dist=1. A distance of 1 catches most single-character typos. Return the results as suggestions.

curl 'http://localhost/search?q=recieve&dist=1'
# ["receive"]

Frequency-ranked suggestions Ingest words with counts that reflect how often they appear in your corpus (page views, click counts, query frequency). Use with_count=true in prefix or wildcard search to get counts back, then rank suggestions by count in your application.

curl 'http://localhost/prefix?q=app&with_count=true'
# [{"word":"application","count":9823}, {"word":"apple","count":412}, ...]
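If you want a ranked list client-side, a jq one-liner does the sorting (jq is not part of lexrs-server; the response below is pasted inline for illustration):

```shell
# Sort a with_count response by descending count and keep only the words.
echo '[{"word":"application","count":9823},{"word":"apple","count":412},{"word":"apply","count":4870}]' \
  | jq -c 'sort_by(-.count) | map(.word)'
# ["application","apply","apple"]
```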

Vocabulary validation Check whether a submitted word exists in an allowed list before accepting it. GET /contains is an O(word length) exact lookup — no query parsing, no table scan.

curl 'http://localhost/contains?q=scrabble'
# {"found": true}

Why two binaries?

Search reads and word writes have very different performance profiles.

Writes need a mutable data structure (Trie) and can tolerate some latency — the caller just posted a word and moved on. Reads need an immutable, highly-compressed structure (DAWG) that can serve many concurrent queries without locking.

Splitting into two processes lets you:

  • Scale readers independently. Run 1 writer and 10 readers if your query volume demands it.
  • Isolate faults. A crash in the writer does not affect in-flight search queries.
  • Reload without downtime. Readers swap their in-memory DAWG atomically when a new snapshot arrives — no restart, no dropped requests.

How a word goes from POST /words to a search result

Understanding this flow makes it easier to configure and operate the server.

Step 1 — Ingest. A client posts words to the writer. The writer inserts them into an in-memory Trie. At this point the words are not yet visible to readers.

Step 2 — Compact. Every COMPACT_INTERVAL seconds (or immediately via POST /compact), the writer:

  1. Reads all words out of its Trie.
  2. Opens the previous snapshot file (a sorted text file of word count lines).
  3. Merges the two sorted streams line by line — like the merge step of merge sort. If a word appears in both, its counts are summed. Memory usage during this step is O(1); neither the snapshot nor the Trie is loaded in full.
  4. Writes the merged output to a .tmp file, then renames it atomically to snapshot_N.txt.
  5. Clears the Trie. It now holds only the delta since the last compaction.
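Because both inputs are sorted, step 3 is a plain two-way merge with duplicate counts summed. A rough equivalent with standard Unix tools, using toy files (the writer does this internally in Rust; this is only an illustration):

```shell
# Toy stand-ins for the previous snapshot and the Trie delta, both sorted.
printf 'apple 10\napt 1\n'  > snapshot_1.txt
printf 'apple 2\napply 3\n' > delta.txt

# sort -m merges two already-sorted streams; awk sums counts when the
# same word arrives from both inputs.
sort -m snapshot_1.txt delta.txt | awk '
  $1 == w { c += $2; next }     # same word again: accumulate
  w       { print w, c }        # new word: flush the previous one
          { w = $1; c = $2 }    # start tracking the new word
  END     { if (w) print w, c }'
# apple 12
# apply 3
# apt 1
```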

Step 3 — Announce. The writer stores {"version": N, "path": "/snapshots/snapshot_N.txt"} at the key lexrs/snapshot in Consul's KV store.

Step 4 — Reload. Each reader runs a background loop that long-polls Consul on that key (?wait=30s). When the version changes, the reader:

  1. Opens the new snapshot file.
  2. Loads all words into a new DAWG.
  3. Calls reduce() to finalise minimisation.
  4. Atomically swaps the new DAWG into the serving path using arc-swap. In-flight requests against the old DAWG complete normally.

client                writer                consul           reader(s)
  │                     │                     │                  │
  │  POST /words        │                     │                  │
  │────────────────────▶│                     │                  │
  │  {"inserted": N}    │                     │                  │
  │◀────────────────────│                     │                  │
  │                     │                     │                  │
  │  (60s passes)       │                     │                  │
  │                     │ compact + write     │                  │
  │                     │──snapshot_2.txt────▶ volume            │
  │                     │                     │                  │
  │                     │ PUT lexrs/snapshot  │                  │
  │                     │────────────────────▶│                  │
  │                     │                     │◀────long-poll────│
  │                     │                     │────version=2────▶│
  │                     │                     │                  │ load + reduce
  │                     │                     │                  │ arc-swap
  │  GET /search?q=ap* │                     │                  │
  │────────────────────────────────────────────────────────────▶│
  │  ["apple","apply"]  │                     │                  │
  │◀────────────────────────────────────────────────────────────│

Running the writer

writer \
  --host 0.0.0.0 \
  --port 3000 \
  --snapshot-dir /snapshots \
  --consul http://localhost:8500 \
  --compact-interval 60

Every flag has a corresponding environment variable:

Flag                Env var           Default
--host              WRITER_HOST       0.0.0.0
--port              WRITER_PORT       3000
--snapshot-dir      SNAPSHOT_DIR      /snapshots
--consul            CONSUL_ADDR       http://consul:8500
--compact-interval  COMPACT_INTERVAL  60

Ingesting words

Send a JSON object with a words array. Each element can be a plain string (uses the top-level count) or an object with its own count:

# All words get count = 1 (the default)
curl -X POST http://localhost:3000/words \
  -H 'Content-Type: application/json' \
  -d '{"words": ["apple", "apply", "apt"]}'

# All words get count = 3
curl -X POST http://localhost:3000/words \
  -H 'Content-Type: application/json' \
  -d '{"words": ["apple", "apply", "apt"], "count": 3}'

# Per-word counts — mix strings and objects freely
curl -X POST http://localhost:3000/words \
  -H 'Content-Type: application/json' \
  -d '{
    "words": [
      {"word": "apple", "count": 10},
      {"word": "apply", "count": 3},
      "apt"
    ]
  }'

Response:

{"inserted": 3}
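For bulk loads it is convenient to build the request body from a plain word list. A small sketch (assumes jq is installed; lexrs-server itself does not require it):

```shell
# One word per line.
printf 'apple\napply\napt\n' > words.txt

# jq -Rn reads raw lines and wraps them in the {"words": [...]} shape
# that POST /words expects.
jq -cRn '{words: [inputs]}' < words.txt
# {"words":["apple","apply","apt"]}
```

Pipe the result straight into curl with --data-binary @- to ingest the whole file in one request.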

Triggering compaction

By default the writer compacts every 60 seconds. To flush immediately — useful after a bulk load or in tests:

curl -X POST http://localhost:3000/compact
{"status": "ok", "version": 2}

The version number increments with each compaction. After this call, readers will pick up the new snapshot within one Consul poll cycle (≤ 30 seconds).

Checking writer stats

Stats reflect the live delta Trie — words ingested since the last compaction, not yet visible to readers:

curl http://localhost:3000/stats
{"words": 47, "nodes": 203}

If both numbers are 0 after a compaction, every ingested word has been flushed to the snapshot and will be visible to readers after their next reload.


Running the reader

reader \
  --host 0.0.0.0 \
  --port 3001 \
  --snapshot-dir /snapshots \
  --consul http://localhost:8500

Flag            Env var        Default
--host          READER_HOST    0.0.0.0
--port          READER_PORT    3001
--snapshot-dir  SNAPSHOT_DIR   /snapshots
--consul        CONSUL_ADDR    http://consul:8500

Wildcard search

# Zero or more characters
curl 'http://localhost:3001/search?q=ap*'
# ["apple", "apply", "apt"]

# Exactly one character
curl 'http://localhost:3001/search?q=appl?'
# ["apple", "apply"]

# With frequency counts
curl 'http://localhost:3001/search?q=ap*&with_count=true'
# [{"word":"apple","count":10}, {"word":"apply","count":3}, {"word":"apt","count":1}]

Fuzzy search

# Levenshtein distance ≤ 1
curl 'http://localhost:3001/search?q=aple&dist=1'
# ["apple"]

# Distance ≤ 2 with counts
curl 'http://localhost:3001/search?q=bannana&dist=2&with_count=true'
# [{"word":"banana","count":5}]

Prefix completion

curl 'http://localhost:3001/prefix?q=app'
# ["apple", "apply"]

curl 'http://localhost:3001/prefix?q=app&with_count=true'
# [{"word":"apple","count":10}, {"word":"apply","count":3}]

Exact lookup

curl 'http://localhost:3001/contains?q=apple'
# {"found": true}

curl 'http://localhost:3001/contains?q=appl'
# {"found": false}

Reader stats

Stats reflect the DAWG loaded from the latest snapshot — the full compacted lexicon:

curl 'http://localhost:3001/stats'
# {"words": 1250000, "nodes": 420000}

Snapshot format

Snapshots are plain UTF-8 text files on the shared volume, one entry per line:

apple 10
apply 3
apt 1
banana 5

Lines are sorted lexicographically. The format is intentionally simple — you can inspect, diff, or modify snapshots with standard Unix tools. The filename is snapshot_<version>.txt; old versions are not deleted automatically, so you can roll back by pointing Consul at an older version.
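A few examples of such inspection, using the sample snapshot above (file name assumed):

```shell
# Recreate the sample snapshot from above.
printf 'apple 10\napply 3\napt 1\nbanana 5\n' > snapshot_2.txt

# Verify the file is sorted; sort -c is silent on success and fails otherwise.
sort -c snapshot_2.txt && echo 'sorted'

# Total count across the whole lexicon.
awk '{ total += $2 } END { print total }' snapshot_2.txt
# 19

# Every entry under a prefix.
grep '^app' snapshot_2.txt
# apple 10
# apply 3
```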