diff options
| author | Michelle Tilley <michelle@michelletilley.net> | 2026-02-26 14:42:08 -0800 |
|---|---|---|
| committer | GitHub <noreply@github.com> | 2026-02-26 14:42:08 -0800 |
| commit | 3ba47446f06d5b0fbeaeb59d4ffed768b70729d8 (patch) | |
| tree | 28348bb3d18e9983e9212c26840f691766cad985 /CONTRIBUTING.md | |
| parent | feat: Add history author/intent metadata and v1 record version (#3205) (diff) | |
| download | atuin-3ba47446f06d5b0fbeaeb59d4ffed768b70729d8.zip | |
feat: In-memory search index with atuin daemon (#3201)
## Summary
This PR adds a persistent, in-memory search index to the Atuin daemon,
enabling fast fuzzy search without the startup delay of building an
index each time the TUI opens.
### Key Changes
- **Daemon search service**: A new gRPC service that maintains a Nucleo
fuzzy search index in memory
- **Real-time index updates**: The daemon listens for history events
(new commands, synced records) and updates the index immediately
- **Filter mode support**: All existing filter modes work (Global, Host,
Session, Directory, Workspace)
- **New search engine**: `daemon-fuzzy` search mode that queries the
daemon instead of building a local index
- **Paged history loading**: Database pagination support for efficient
initial index loading
- **Configurable logging**: New `[logs]` settings section for daemon and
search log configuration
- **Component-based daemon architecture**: Refactored daemon internals
into a modular, event-driven system
- **Fallback to DB search for regex**: Since Nucleo doesn't support
regex matching
## Daemon Architecture
The daemon has been refactored to use a component-based, event-driven
architecture that makes it easier to add new functionality and reason
about the system.
### Core Concepts
```
┌─────────────────────────────────────────────────────────────────────────────┐
│ Atuin Daemon │
│ │
│ ┌─────────────┐ ┌──────────────────────────────────────────────────┐ │
│ │ Daemon │ │ Components │ │
│ │ Handle │────▶│ │ │
│ │ │ │ ┌─────────────┐ ┌─────────────┐ ┌────────────┐ │ │
│ │ • emit() │ │ │ History │ │ Search │ │ Sync │ │ │
│ │ • subscribe │ │ │ Component │ │ Component │ │ Component │ │ │
│ │ • settings │ │ │ │ │ │ │ │ │ │
│ │ • databases │ │ │ gRPC service│ │ gRPC service│ │ background │ │ │
│ └─────────────┘ │ │ WIP history │ │ Nucleo index│ │ sync │ │ │
│ │ │ └─────────────┘ └─────────────┘ └────────────┘ │ │
│ │ └──────────────────────────────────────────────────┘ │
│ │ ▲ │
│ ▼ │ │
│ ┌─────────────────────────────────────┴────────────────────────────────┐ │
│ │ Event Bus (broadcast) │ │
│ │ │ │
│ │ HistoryStarted │ HistoryEnded │ RecordsAdded │ SyncCompleted │ ... │ │
│ └──────────────────────────────────────────────────────────────────────┘ │
│ ▲ │
│ ┌───────────────────────────────────┴──────────────────────────────────┐ │
│ │ Control Service (gRPC) │ │
│ │ External event injection from CLI commands │ │
│ └──────────────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────────┘
```
### DaemonHandle
A lightweight, cloneable handle that provides access to shared daemon
resources:
- **Event emission**: `handle.emit(DaemonEvent::...)` broadcasts to all
components
- **Event subscription**: `handle.subscribe()` returns a receiver for
the event bus
- **Settings**: `handle.settings()` for configuration access
- **Databases**: `handle.history_db()` and `handle.store()` for data
access
### Component Trait
Components implement a simple lifecycle:
```rust
#[async_trait]
trait Component: Send + Sync {
fn name(&self) -> &'static str;
async fn start(&mut self, handle: DaemonHandle) -> Result<()>;
async fn handle_event(&mut self, event: &DaemonEvent) -> Result<()>;
async fn stop(&mut self) -> Result<()>;
}
```
### Event-Driven Design
Components communicate via events rather than direct coupling:
| Event | Emitted By | Consumed By |
|-------|-----------|-------------|
| `HistoryStarted` | History gRPC | Search (logging) |
| `HistoryEnded` | History gRPC | Search (index update) |
| `RecordsAdded` | Sync | Search (index update) |
| `HistoryPruned` | CLI (via Control) | Search (index rebuild) |
| `HistoryDeleted` | CLI (via Control) | Search (index rebuild) |
| `ForceSync` | CLI (via Control) | Sync |
| `ShutdownRequested` | Signal handler | All (graceful shutdown) |
### External Event Injection
CLI commands can inject events into a running daemon:
```rust
// After `atuin history prune`
emit_event(DaemonEvent::HistoryPruned).await?;
// After deleting specific items
emit_event(DaemonEvent::HistoryDeleted { ids }).await?;
// Request immediate sync
emit_event(DaemonEvent::ForceSync).await?;
```
This ensures the daemon's search index stays in sync with database
changes made by CLI commands.
## Search Architecture
The search service uses a [forked version of
Nucleo](https://github.com/atuinsh/nucleo-ext) that adds filter and
scorer callbacks, enabling efficient filtering and frecency-based
ranking.
```
┌────────────────────────────────────────────────────────────────┐
│ Atuin Daemon │
│ │
│ ┌─────────────────┐ ┌──────────────────────────────────┐ │
│ │ Event System │───▶│ Search Component │ │
│ │ │ │ │ │
│ │ • RecordsAdded │ │ ┌────────────────────────────┐ │ │
│ │ • HistoryEnded │ │ │ Deduplicated Index │ │ │
│ │ • HistoryPruned │ │ │ │ │ │
│ └─────────────────┘ │ │ CommandData per command: │ │ │
│ │ │ • Global frecency │ │ │
│ ┌─────────────────┐ │ │ • Filter indexes (sets) │ │ │
│ │ Background Task │ │ │ • Invocation history │ │ │
│ │ │ │ └────────────────────────────┘ │ │
│ │ Rebuilds │ │ │ │ │
│ │ frecency map │ │ ▼ │ │
│ │ every 60s │───▶│ ┌────────────────────────────┐ │ │
│ └─────────────────┘ │ │ Nucleo (forked) │ │ │
│ │ │ │ │ │
│ │ │ • Filter callback │ │ │
│ │ │ • Scorer callback │ │ │
│ │ │ • Fuzzy matching │ │ │
│ │ └────────────────────────────┘ │ │
│ └──────────────────────────────────┘ │
│ │ │
│ │ gRPC (Unix socket) │
└──────────────────────────────────────│─────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Search TUI (Client) │
│ │
│ 1. Send query + filter mode + context to daemon │
│ 2. Receive matching history IDs (ranked by frecency) │
│ 3. Hydrate full records from local SQLite database │
│ 4. Display results in TUI │
└─────────────────────────────────────────────────────────────────┘
```
### Nucleo Fork
The [nucleo-ext fork](https://github.com/atuinsh/nucleo-ext) adds two
key features to Nucleo:
1. **Filter callback**: Pre-filter items before fuzzy matching (used for
directory/host/session filtering)
2. **Scorer callback**: Compute custom scores after matching (used for
frecency ranking)
```rust
// Filter: only include commands run in current directory
nucleo.set_filter(Some(Arc::new(|cmd: &String| {
passing_commands.contains(cmd)
})));
// Scorer: combine fuzzy score with frecency
nucleo.set_scorer(Some(Arc::new(|cmd: &String, fuzzy_score: u32| {
let frecency = frecency_map.get(cmd).unwrap_or(0);
fuzzy_score + (frecency * 10)
})));
```
### Deduplicated Index
Commands are stored once per unique command text, with metadata tracking
all invocations:
```rust
struct CommandData {
command: String,
invocations: Vec<Invocation>, // All times this command was run
global_frecency: FrecencyData, // Precomputed frecency score
// O(1) filter indexes
directories: HashSet<String>, // All cwds where command was run
hosts: HashSet<String>, // All hostnames
sessions: HashSet<String>, // All session IDs
}
```
This deduplication means:
- **Fewer items to match**: ~13K unique commands vs ~62K history entries
- **O(1) filter checks**: HashSet lookups instead of scanning
invocations
- **Single frecency score**: Global frecency computed once, used for all
filter modes
### Frecency Scoring
Frecency (frequency + recency) scoring prioritizes recently and
frequently used commands:
```rust
fn compute_frecency(count: u32, last_used: i64, now: i64) -> u32 {
let age_hours = (now - last_used) / 3600;
// Recency: decays over time (half-life ~24 hours)
let recency = (100.0 * (-age_hours as f64 / 24.0).exp()) as u32;
// Frequency: logarithmic scaling
let frequency = (count.ln() * 20.0).min(100.0) as u32;
recency + frequency
}
```
The frecency map is:
- **Precomputed by background task** every 60 seconds
- **Never computed inline** during search (no latency impact)
- **Graceful fallback**: If unavailable, search works without frecency
ranking
### Filter Mode Implementation
| Filter Mode | Implementation |
|-------------|----------------|
| Global | No filter (all commands) |
| Directory | `command.directories.contains(cwd)` |
| Workspace | `command.directories.any(\|d\| d.starts_with(git_root))` |
| Host | `command.hosts.contains(hostname)` |
| Session | `command.sessions.contains(session_id)` |
Filters are pre-computed into a HashSet before the search, making the
filter callback O(1).
### Search Flow
1. **Daemon startup**: Loads history from SQLite in pages, builds
deduplicated index
2. **Frecency precompute**: Background task builds frecency map after
history loads
3. **Search request**: Client sends query with filter mode and context
4. **Filter**: Pre-computed HashSet determines which commands pass the
filter
5. **Match**: Nucleo fuzzy matches the query against command text
6. **Score**: Frecency scorer ranks results (fuzzy score + frecency *
10)
7. **Response**: Returns history IDs for the most recent invocation of
each matching command
8. **Hydration**: Client fetches full records from local SQLite
### Configuration
```toml
# Enable daemon + autostart
[daemon]
enabled = true
autostart = true
# Enable daemon-based fuzzy search
[search]
search_mode = "daemon-fuzzy"
```
## Performance
Performance varies based on several factors, but in most initial testing
with the new architecture shows improvement:
* **Nucleo performs searches up to 4.5x faster**: direct DB search
averages 18.07ms, but the daemon completes the same queries in 3.99ms.
* **IPC overhead is significant, but acceptable**: a significant amount
of wall-time is taken up by the transfer of data over IPC (via UDS in
this case). This averages to about ~7.8ms and accounts for 66% of
client-side wall time.
* **Tail latency improves at every layer**: p99 times correspond to
initial requests, worst-case query patterns, etc. but the average p99
daemon-based response time is 3.6x better than the associated DB-based
search p99 time
* **Query complexity no longer impacts performance**: the Nucleo-based
search shows consistent 2-7ms times regardless of query pattern. The
DB-based search had a 17x variance (3.59ms to 62.46ms).
Interestingly, @ellie - who has a larger history store than I do - gets
even better performance on the IPC layer. This could use a lot more
testing in various edge cases and on various hardware, but seems
promising.
### Regular DB search
```
Individual calls for: db_search
--------------------------------------------------------------------------------------------------------------
# Wall Busy Idle Fields
--------------------------------------------------------------------------------------------------------------
1 32.25ms 32.20ms 47.70µs {"mode":"Fuzzy","query":"^"}
2 19.48ms 19.40ms 84.20µs {"mode":"Fuzzy","query":"^c"}
3 20.40ms 20.10ms 297.00µs {"mode":"Fuzzy","query":"^ca"}
4 13.07ms 13.00ms 69.90µs {"mode":"Fuzzy","query":"^car"}
5 12.17ms 12.10ms 67.10µs {"mode":"Fuzzy","query":"^carg"}
6 20.78ms 20.70ms 76.60µs {"mode":"Fuzzy","query":"^cargo"}
7 9.15ms 9.10ms 53.20µs {"mode":"Fuzzy","query":"^cargo "}
8 10.24ms 10.00ms 237.00µs {"mode":"Fuzzy","query":"^cargo b"}
9 10.01ms 9.68ms 325.00µs {"mode":"Fuzzy","query":"^cargo bu"}
10 5.89ms 5.83ms 57.20µs {"mode":"Fuzzy","query":"^cargo bui"}
11 8.85ms 8.28ms 568.00µs {"mode":"Fuzzy","query":"^cargo buil"}
12 7.70ms 7.49ms 212.00µs {"mode":"Fuzzy","query":"^cargo build"}
13 3.59ms 3.53ms 57.00µs {"mode":"Fuzzy","query":"^cargo build$"}
14 6.50ms 6.44ms 63.60µs {"mode":"Fuzzy","query":"^cargo "}
15 6.48ms 6.38ms 100.00µs {"mode":"Fuzzy","query":"!"}
16 31.68ms 31.60ms 75.90µs {"mode":"Fuzzy","query":"!g"}
17 62.46ms 62.40ms 58.90µs {"mode":"Fuzzy","query":"!gi"}
18 30.35ms 30.30ms 46.90µs {"mode":"Fuzzy","query":"!git"}
19 53.84ms 53.80ms 40.80µs {"mode":"Fuzzy","query":"!git "}
20 19.24ms 19.20ms 39.70µs {"mode":"Fuzzy","query":"!git c"}
21 22.03ms 22.00ms 34.70µs {"mode":"Fuzzy","query":"!git co"}
22 17.13ms 17.00ms 133.00µs {"mode":"Fuzzy","query":"!git com"}
23 16.14ms 15.90ms 242.00µs {"mode":"Fuzzy","query":"!git comm"}
24 5.11ms 5.08ms 28.60µs {"mode":"Fuzzy","query":"!git commi"}
25 7.31ms 7.26ms 52.70µs {"mode":"Fuzzy","query":"!git commit"}
Summary: 25 calls
Wall: avg=18.07ms, min=3.59ms, max=62.46ms, p50=13.07ms, p99=62.46ms
Busy: avg=17.95ms, min=3.53ms, max=62.40ms, p50=13.00ms, p99=62.40ms
```
### Daemon-based search
**Client**
```
Individual calls for: daemon_search
--------------------------------------------------------------------------------------------------------------
# Wall Busy Idle Fields
--------------------------------------------------------------------------------------------------------------
1 13.05ms 2.55ms 10.50ms {"query":"^"}
2 10.65ms 1.40ms 9.25ms {"query":"^c"}
3 10.72ms 1.18ms 9.54ms {"query":"^ca"}
4 5.54ms 485.00µs 5.06ms {"query":"^car"}
5 15.02ms 1.02ms 14.00ms {"query":"^carg"}
6 9.49ms 840.00µs 8.65ms {"query":"^cargo"}
7 5.53ms 555.00µs 4.97ms {"query":"^cargo "}
8 8.56ms 717.00µs 7.84ms {"query":"^cargo b"}
9 12.34ms 1.24ms 11.10ms {"query":"^cargo bu"}
10 8.38ms 650.00µs 7.73ms {"query":"^cargo bui"}
11 13.07ms 770.00µs 12.30ms {"query":"^cargo buil"}
12 17.11ms 709.00µs 16.40ms {"query":"^cargo build"}
13 15.41ms 907.00µs 14.50ms {"query":"^cargo build$"}
14 8.19ms 665.00µs 7.52ms {"query":"^cargo "}
15 7.98ms 1.72ms 6.26ms {"query":"!"}
16 13.56ms 856.00µs 12.70ms {"query":"!g"}
17 8.11ms 624.00µs 7.49ms {"query":"!gi"}
18 14.57ms 775.00µs 13.80ms {"query":"!git"}
19 14.18ms 779.00µs 13.40ms {"query":"!git "}
20 9.62ms 802.00µs 8.82ms {"query":"!git c"}
21 15.50ms 1.50ms 14.00ms {"query":"!git co"}
22 11.58ms 1.48ms 10.10ms {"query":"!git com"}
23 13.82ms 2.12ms 11.70ms {"query":"!git comm"}
24 17.48ms 2.18ms 15.30ms {"query":"!git commi"}
25 14.81ms 1.71ms 13.10ms {"query":"!git commit"}
Summary: 25 calls
Wall: avg=11.77ms, min=5.53ms, max=17.48ms, p50=12.34ms, p99=17.48ms
Busy: avg=1.13ms, min=485.00µs, max=2.55ms, p50=856.00µs, p99=2.55ms
```
**Daemon**
```
Individual calls for: daemon_search_query
--------------------------------------------------------------------------------------------------------------
# Wall Busy Idle Fields
--------------------------------------------------------------------------------------------------------------
1 1.75ms 250ns 1.75ms {"query":"^","query_id":1}
2 4.58ms 125ns 4.58ms {"query":"^c","query_id":2}
3 4.39ms 250ns 4.39ms {"query":"^ca","query_id":3}
4 2.52ms 125ns 2.52ms {"query":"^car","query_id":4}
5 4.44ms 250ns 4.44ms {"query":"^carg","query_id":5}
6 3.66ms 167ns 3.66ms {"query":"^cargo","query_id":6}
7 2.38ms 84ns 2.38ms {"query":"^cargo ","query_id":7}
8 4.13ms 84ns 4.13ms {"query":"^cargo b","query_id":8}
9 4.40ms 167ns 4.40ms {"query":"^cargo bu","query_id":9}
10 3.87ms 125ns 3.87ms {"query":"^cargo bui","query_id":10}
11 4.36ms 84ns 4.36ms {"query":"^cargo buil","query_id":11}
12 3.96ms 333ns 3.96ms {"query":"^cargo build","query_id":12}
13 4.61ms 167ns 4.61ms {"query":"^cargo build$","query_id":13}
14 4.20ms 209ns 4.20ms {"query":"^cargo ","query_id":14}
15 238.17µs 167ns 238.00µs {"query":"!","query_id":15}
16 4.44ms 125ns 4.44ms {"query":"!g","query_id":16}
17 3.47ms 83ns 3.47ms {"query":"!gi","query_id":17}
18 4.57ms 125ns 4.57ms {"query":"!git","query_id":18}
19 7.15ms 167ns 7.15ms {"query":"!git ","query_id":19}
20 4.27ms 250ns 4.27ms {"query":"!git c","query_id":20}
21 5.19ms 292ns 5.19ms {"query":"!git co","query_id":21}
22 4.29ms 417ns 4.29ms {"query":"!git com","query_id":22}
23 4.08ms 125ns 4.08ms {"query":"!git comm","query_id":23}
24 4.50ms 167ns 4.50ms {"query":"!git commi","query_id":24}
25 4.35ms 208ns 4.35ms {"query":"!git commit","query_id":25}
Summary: 25 calls
Wall: avg=3.99ms, min=238.17µs, max=7.15ms, p50=4.29ms, p99=7.15ms
Busy: avg=182ns, min=83ns, max=417ns, p50=167ns, p99=417ns
```
**Nucleo matching time (in daemon)**
```
Individual calls for: nucleo_match
--------------------------------------------------------------------------------------------------------------
# Wall Busy Idle Fields
--------------------------------------------------------------------------------------------------------------
1 1.73ms 125ns 1.73ms {"query":"^","query_id":1}
2 4.57ms 167ns 4.57ms {"query":"^c","query_id":2}
3 4.37ms 125ns 4.37ms {"query":"^ca","query_id":3}
4 2.51ms 84ns 2.51ms {"query":"^car","query_id":4}
5 4.43ms 125ns 4.43ms {"query":"^carg","query_id":5}
6 3.64ms 125ns 3.64ms {"query":"^cargo","query_id":6}
7 2.37ms 84ns 2.37ms {"query":"^cargo ","query_id":7}
8 4.11ms 125ns 4.11ms {"query":"^cargo b","query_id":8}
9 4.36ms 208ns 4.36ms {"query":"^cargo bu","query_id":9}
10 3.85ms 125ns 3.85ms {"query":"^cargo bui","query_id":10}
11 4.35ms 125ns 4.35ms {"query":"^cargo buil","query_id":11}
12 3.94ms 250ns 3.94ms {"query":"^cargo build","query_id":12}
13 4.59ms 125ns 4.59ms {"query":"^cargo build$","query_id":13}
14 4.18ms 84ns 4.18ms {"query":"^cargo ","query_id":14}
15 220.13µs 125ns 220.00µs {"query":"!","query_id":15}
16 4.43ms 125ns 4.43ms {"query":"!g","query_id":16}
17 3.45ms 125ns 3.45ms {"query":"!gi","query_id":17}
18 4.55ms 125ns 4.55ms {"query":"!git","query_id":18}
19 7.12ms 209ns 7.12ms {"query":"!git ","query_id":19}
20 4.25ms 166ns 4.25ms {"query":"!git c","query_id":20}
21 5.18ms 125ns 5.18ms {"query":"!git co","query_id":21}
22 4.27ms 125ns 4.27ms {"query":"!git com","query_id":22}
23 4.06ms 292ns 4.06ms {"query":"!git comm","query_id":23}
24 4.46ms 166ns 4.46ms {"query":"!git commi","query_id":24}
25 4.31ms 208ns 4.31ms {"query":"!git commit","query_id":25}
Summary: 25 calls
Wall: avg=3.97ms, min=220.13µs, max=7.12ms, p50=4.27ms, p99=7.12ms
Busy: avg=147ns, min=84ns, max=292ns, p50=125ns, p99=292ns
```
Diffstat (limited to 'CONTRIBUTING.md')
| -rw-r--r-- | CONTRIBUTING.md | 45 |
1 files changed, 45 insertions, 0 deletions
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 8c1f45b7..f47e1ff4 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -67,6 +67,51 @@ While iterating on the server, I find it helpful to run a new user on my system, Our test coverage is currently not the best, but we are working on it! Generally tests live in the file next to the functionality they are testing, and are executed just with `cargo test`. +## Logging and Debugging + +### Log Files + +Atuin writes logs to `~/.atuin/logs` unless configured otherwise. Log files are rotated daily and retained for 4 days by default: + +- `search.log.*` - Interactive search session logs +- `daemon.log.*` - Background daemon logs + +### Log Levels + +You can set the `ATUIN_LOG` environment variable to override log verbosity from the config file: + +```shell +ATUIN_LOG=debug atuin search # Enable debug logging +ATUIN_LOG=trace atuin search # Enable trace logging (very verbose) +``` + +### Span Timing (Performance Profiling) + +For performance analysis, you can capture detailed span timing data as JSON: + +```shell +ATUIN_SPAN=spans.json atuin search +``` + +This creates a JSON file with timing information for each instrumented span, including: +- `time.busy` - Time actively executing code +- `time.idle` - Time awaiting async operations (I/O, child tasks) + +The `scripts/span-table.ts` script analyzes these logs: + +```shell +# Summary view - shows all spans with timing stats +bun scripts/span-table.ts spans.json + +# Detail view - shows individual calls for a specific span +bun scripts/span-table.ts spans.json --detail daemon_search + +# Filter to specific spans +bun scripts/span-table.ts spans.json --filter "search|hydrate" +``` + +This is useful for comparing performance between different search implementations or identifying bottlenecks. + ## Migrations Be careful creating database migrations - once your database has migrated ahead of current stable, there is no going back |
