moonlock/DECISIONS.md
nevaforget 3e610bdb4b
All checks were successful
Update PKGBUILD version / update-pkgver (push) Successful in 3s
fix: audit LOW fixes — docs, rustdoc, scope, debug gate, lto fat (v0.6.12)
- Update CLAUDE.md and README.md to reflect the blur range [0,200] that
  the code has clamped to since v0.6.8.
- Move the // SYNC: comment above the /// doc on MAX_BLUR_DIMENSION so
  rustdoc renders one coherent paragraph instead of a truncated sentence.
- Narrow check_account visibility to pub(crate) and document the caller
  precondition (username must come from users::get_current_user()).
- Gate MOONLOCK_DEBUG behind #[cfg(debug_assertions)]. Release builds
  always run at LevelFilter::Info so a session script cannot escalate
  journal verbosity to leak fprintd / D-Bus internals.
- Document why pam_setcred is deliberately not called in authenticate().
- Release profile: lto = "fat" instead of "thin" — doubles release build
  time for better cross-crate inlining on the auth + i18n hot paths.
2026-04-24 14:05:17 +02:00

88 lines
14 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Decisions
Architectural and design decisions for Moonlock, in reverse chronological order.
## 2026-04-24 Audit LOW fixes: docs, rustdoc, check_account scope, debug gating, lto fat (v0.6.12)
- **Who**: ClaudeCode, Dom
- **Why**: Six LOW findings cleared in a single pass. (1) Docs referenced the old `[0,100]` blur range; code clamps `[0,200]` since v0.6.8. (2) The `MAX_BLUR_DIMENSION` doc comment was split by a `// SYNC:` block, producing a truncated sentence in rustdoc. (3) `check_account` was `pub` and relied on callers only ever passing `getuid()`-derived usernames; the contract was not enforced by the type system. (4) `MOONLOCK_DEBUG` env var flipped log verbosity to Debug in release builds, letting a compromised session script escalate journal noise about fprintd / D-Bus. (5) `pam_setcred` absence was undocumented. (6) `[profile.release]` used `lto = "thin"` — fine for most crates, but for a latency-critical auth binary compiled once per release, fat LTO's extra cross-crate inlining is worth the ~1 min build hit.
- **Tradeoffs**: `lto = "fat"` roughly doubles release build time (~30 s → ~60 s) for slightly better inlining across PAM FFI wrappers and the i18n/status paths. `#[cfg(debug_assertions)]` on the debug-level selector means you have to run a debug build to raise log level — inconvenient for live troubleshooting, but aligned with the security-first posture.
- **How**: (1) `CLAUDE.md` + `README.md` updated to `[0,200]`. (2) `// SYNC:` block moved above the `///` doc so rustdoc renders one coherent paragraph. (3) `check_account` visibility narrowed to `pub(crate)` with a `Precondition` paragraph explaining the username contract. (4) Debug-level selection wrapped in `#[cfg(debug_assertions)]`; release builds always run at `LevelFilter::Info`. (5) Added a comment block in `authenticate()` documenting why `pam_setcred` is deliberately absent and where it would hook in if needed. (6) `lto = "fat"` in `Cargo.toml`.
## 2026-04-24 Audit MEDIUM fixes: D-Bus cleanup race, TOCTOU open, FP reset, GTK entry clear (v0.6.11)
- **Who**: ClaudeCode, Dom
- **Why**: Second round after the HIGH fixes, addressing the four MEDIUM findings. (1) `cleanup_dbus` spawned VerifyStop + Release as fire-and-forget, then `resume_async` called Claim after only a 2 s timeout — shorter than the 3 s D-Bus timeout, so on a slow bus the Claim could race the Release and fprintd would reject it, leaving the FP listener permanently dead. (2) `load_background_texture` relied on the caller's `symlink_metadata` check, re-opening the path via `gdk::Texture::from_file` — a classic TOCTOU window. (3) `resume_async` unconditionally reset `failed_attempts`, allowing an attacker with sensor control to evade the 10-attempt cap by cycling verify-match → `check_account` fail → resume. (4) The GTK `PasswordEntry` buffer was only cleared on timeout or auth failure, leaving the password in GLib malloc'd memory longer than necessary.
- **Tradeoffs**: The D-Bus cleanup is now split into a synchronous helper (`take_cleanup_proxy` — signal disconnect + flag clear) and an async helper (`perform_dbus_cleanup` — VerifyStop + Release), so `resume_async` can await the release while `stop()` stays fire-and-forget. Dropping the `failed_attempts` reset means a flaky sensor could reach 10 failures faster, but the correct remedy is a new lock session (construction) rather than a reset that also helps attackers.
- **How**: (1) Split `cleanup_dbus` into `take_cleanup_proxy()` (sync) + `perform_dbus_cleanup(proxy)` (async). `resume_async` now awaits `perform_dbus_cleanup` before `begin_verification`. `stop()` still spawns the cleanup fire-and-forget. (2) `load_background_texture` opens with `O_NOFOLLOW` via `std::fs::OpenOptions::custom_flags`, reads to bytes, and builds the texture via `gdk::Texture::from_bytes`. (3) Removed `listener.borrow_mut().failed_attempts = 0` from `resume_async`. (4) `password_entry.set_text("")` now fires right after the `Zeroizing::new(entry.text().to_string())` extraction, shortening the GTK-side window.
## 2026-04-24 Audit fixes: RefCell borrow across await, async avatar decode
- **Who**: ClaudeCode, Dom
- **Why**: Triple audit found two HIGH issues. (1) `init_fingerprint_async` held a `RefCell` immutable borrow across `is_available_async().await` — a concurrent `connect_monitor` signal (hotplug / suspend-resume) invoking `borrow_mut()` during the await would panic. (2) `set_avatar_from_file` decoded avatars synchronously via `Pixbuf::from_file_at_scale`, blocking the GTK main thread inside the `connect_monitor` handler. With `MAX_AVATAR_FILE_SIZE` at 10 MB the worst-case stall was 200500 ms on monitor hotplug.
- **Tradeoffs**: Avatar is shown as the symbolic default icon for a brief window while decoding completes. Wallpaper stays synchronous because `connect_monitor` fires during `lock()` and needs the texture already present (see 2026-04-09).
- **How**: (1) Extract `username` into a local `String` in `init_fingerprint_async`, drop the borrow before the await, re-borrow in a new scope after — no awaits inside the second borrow, so hotplug during signal setup is safe. (2) `set_avatar_from_file` now uses `gio::File::read_future` + `Pixbuf::from_stream_at_scale_future` for async I/O and decode. The default icon is shown immediately; the decoded texture replaces it when ready. `Pixbuf` itself is `!Send`, so `gio::spawn_blocking` does not apply — the GIO async stream loader keeps the `Pixbuf` on the main thread while the kernel does the I/O asynchronously.
## 2026-04-09 Monitor hotplug via connect_monitor signal
- **Who**: ClaudeCode, Dom
- **Why**: moonlock crashed with segfault in libgtk-4.so after suspend/resume — HDMI monitor disconnect/reconnect invalidated GDK monitor objects, and the statically created windows referenced destroyed surfaces. Crash at consistent GTK4 offset (0x278 NULL dereference), 3x reproduced.
- **Tradeoffs**: Wallpaper texture now loaded before `lock()` instead of after (connect_monitor fires during lock() and needs the texture). Local JPEG loading is fast enough that the delay is negligible. Shared state moved to Rc's for the signal closure — slightly more indirection but necessary for dynamic window creation.
- **How**: (1) Bump gtk4-session-lock feature from `v1_1` to `v1_2` to enable `Instance::connect_monitor`. (2) Replace manual monitor iteration with `lock.connect_monitor()` signal handler that creates windows on demand. (3) Signal fires once per existing monitor at `lock()` and again on hotplug. (4) Windows auto-unmap when their monitor disappears (ext-session-lock-v1 guarantee). (5) Fingerprint listener published to shared Rc so hotplugged monitors get FP labels.
## 2026-03-31 Fourth audit: peek icon, blur limit, GResource compression, sync markers
- **Who**: ClaudeCode, Dom
- **Why**: Fourth triple audit found blur limit inconsistency (moonlock 0100 vs moongreet/moonset 0200), missing GResource compression, peek icon inconsistency, and duplicated code without sync markers.
- **Tradeoffs**: Peek icon enabled in lockscreen — user decision favoring UX consistency over shoulder-surfing protection. Acceptable for single-user desktop. Blur limit raised to 200 for ecosystem consistency.
- **How**: (1) `show_peek_icon(true)` in lockscreen password entry. (2) `clamp(0.0, 200.0)` for blur in config.rs. (3) `compressed="true"` on CSS/SVG GResource entries. (4) SYNC comments on duplicated blur/background functions pointing to moongreet and moonset.
## 2026-03-30 Third audit: blur offset, lock-before-IO, FP signal lifecycle, TOCTOU
- **Who**: ClaudeCode, Dom
- **Why**: Third triple audit (quality, performance, security) found: blur padding offset rendering texture at (0,0) instead of (-pad,-pad) causing edge darkening on left/top (BUG), wallpaper disk I/O blocking before lock() extending the unsecured window (PERF/SEC), signal handler duplication on resume_async (SEC), failed_attempts not reset on FP resume (SEC), unknown VerifyStatus with done=false hanging FP listener (SEC), TOCTOU in is_file+is_symlink checks (SEC), dead code in faillock_warning (QUALITY), unbounded blur sigma (SEC).
- **Tradeoffs**: Wallpaper loads after lock() — screen briefly shows without wallpaper until texture is ready. Acceptable: security > aesthetics. Blur sigma clamped to [0.0, 100.0] — arbitrary upper bound but prevents GPU memory exhaustion.
- **How**: (1) Texture offset to (-pad, -pad) in render_blurred_texture. (2) lock.lock() before resolve_background_path. (3) begin_verification disconnects old signal_id before registering new. (4) resume_async resets failed_attempts. (5) Unknown VerifyStatus with done=true triggers restart. (6) symlink_metadata() for atomic file+symlink check. (7) faillock_warning dead code removed, saturating_sub. (8) background_blur clamped. (9) Redundant Zeroizing<Vec<u8>> removed. (10) Default impl for FingerprintListener. (11) on_verify_status restricted to pub(crate). (12) Warn logging for non-UTF-8 GECOS and avatar paths.
## 2026-03-30 Second audit: zeroize CString, FP account check, PAM timeout, blur downscale
- **Who**: ClaudeCode, Dom
- **Why**: Second triple audit (quality, performance, security) found: CString password copy not zeroized (HIGH), fingerprint unlock bypassing pam_acct_mgmt (MEDIUM), no PAM timeout leaving user locked out on hanging modules (MEDIUM), GPU blur on full wallpaper resolution (MEDIUM), no-monitor edge case doing `return` instead of `exit(1)` (MEDIUM).
- **Tradeoffs**: PAM timeout (30s) uses a generation counter to avoid stale result interference — adds complexity but prevents parallel PAM sessions. FP restart after failed account check re-claims the device, adding a D-Bus round-trip, but prevents permanent FP death on transient failures. Blur downscale to 1920px cap trades negligible quality for ~4x less GPU work on 4K wallpapers.
- **How**: (1) `Zeroizing<CString>` wraps password in auth.rs, `zeroize/std` feature enabled. (2) `check_account()` calls pam_acct_mgmt after FP match; `resume_async()` restarts FP on transient failure. (3) `auth_generation` counter invalidates stale PAM results; 30s timeout re-enables UI. (4) `MAX_BLUR_DIMENSION` caps blur input at 1920px, sigma scaled proportionally. (5) `exit(1)` on no-monitor after `lock.lock()`.
## 2026-03-28 Remove embedded wallpaper from binary
- **Who**: ClaudeCode, Dom
- **Why**: Wallpaper is installed by moonarch to /usr/share/moonarch/wallpaper.jpg. Embedding a 374K JPEG in the binary is redundant. GTK background color (Catppuccin Mocha base) is a clean fallback.
- **Tradeoffs**: Without moonarch installed AND without config, lockscreen shows plain dark background instead of wallpaper. Acceptable — that's the expected minimal state.
- **How**: Remove wallpaper.jpg from GResources, return None from resolve_background_path when no file found, skip background picture creation when no texture available.
## 2026-03-28 Audit-driven security and lifecycle fixes (v0.6.0)
- **Who**: ClaudeCode, Dom
- **Why**: Triple audit (quality, performance, security) revealed a critical D-Bus signal spoofing vector, fingerprint lifecycle bugs, and multi-monitor performance issues.
- **Tradeoffs**: `cleanup_dbus()` extraction adds a method but clarifies the stop/match ownership; `running_flag: Rc<Cell<bool>>` adds a field but prevents race between async restart and stop; sender validation adds a check per signal but closes the only known auth bypass.
- **How**: (1) Validate D-Bus VerifyStatus sender against fprintd's unique bus name. (2) Extract `cleanup_dbus()` from `stop()`, call it on verify-match. (3) `Rc<Cell<bool>>` running flag checked after await in `restart_verify_async`. (4) Consistent 3s D-Bus timeouts. (5) Panic hook before logging. (6) Blur and avatar caches shared across monitors. (7) Peek icon disabled. (8) Symlink rejection for background_path. (9) TOML parse errors logged.
## 2026-03-28 GPU blur via GskBlurNode replaces CPU blur
- **Who**: ClaudeCode, Dom
- **Why**: CPU-side Gaussian blur (`image` crate) blocked the GTK main thread for 500ms2s on 4K wallpapers at cold cache. Disk cache mitigated repeat starts but added ~100 lines of complexity.
- **Tradeoffs**: GPU blur quality is slightly different (box-blur approximation vs true Gaussian), acceptable for wallpaper. Removes `image` and `dirs` dependencies entirely. No disk cache needed.
- **How**: `Snapshot::push_blur()` + `GskRenderer::render_texture()` on `connect_realize`. Blur happens once on the GPU when the widget gets its renderer, producing a concrete `gdk::Texture`. Zero startup latency.
## 2026-03-28 Optional background blur via `image` crate (superseded)
- **Who**: ClaudeCode, Dom
- **Why**: Consistent with moonset/moongreet — blurred wallpaper as lockscreen background is a common UX pattern
- **Tradeoffs**: Adds `image` crate dependency (~15 transitive crates); CPU-side Gaussian blur at load time adds startup latency proportional to image size and sigma. Acceptable because blur runs once and the texture is shared across monitors.
- **How**: `load_background_texture(bg_path, blur_radius)` loads texture, optionally applies `imageops::blur()`, returns `gdk::Texture`. Config option `background_blur: Option<f32>` in TOML.
## 2026-03-28 Shared wallpaper texture pattern (aligned with moonset/moongreet)
- **Who**: ClaudeCode, Dom
- **Why**: Previously loaded wallpaper per-window via `Picture::for_filename()`. Multi-monitor setups decoded the JPEG redundantly. Blur feature requires texture pixel access anyway.
- **Tradeoffs**: Slightly more code in main.rs (texture loaded before window creation), but avoids redundant decoding and enables the blur feature.
- **How**: `load_background_texture()` in lockscreen.rs decodes once, `create_background_picture()` wraps shared `gdk::Texture` in `gtk::Picture`. Same pattern as moonset/moongreet.