Description
Automated firmware rehosting relies on accurately modeling device-specific peripherals, but current approaches often require extensive manual effort and ignore peripherals that depend on external inputs. This thesis presents an automated peripheral-modeling method that reduces this bottleneck.
We propose synthesizing one model per register using symbolic regression and neural networks (MLP, CNN, autoregressive), optimizing the neural networks via automated hyperparameter tuning and using sequential replay models as a fallback. Models use execution-trace data collected by forwarding MMIO accesses from an emulator to real hardware over a debug interface. Auxiliary features such as uptime, delta t, autoregression, and register-write flags expand the set of peripherals we can accurately model. For externally driven peripherals (UART, for example), models consume logic-analyzer bitstreams using a sliding-window method, allowing emulation of externally dependent peripherals.
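The sliding-window idea can be illustrated with a minimal sketch. It assumes the logic-analyzer capture is a list of 0/1 samples and that each register read is tagged with the sample index at which it occurred; the function name `window_features`, the left zero-padding, and the window length are hypothetical choices for illustration, not details taken from the thesis.

```python
from typing import List


def window_features(bits: List[int], read_positions: List[int], window: int) -> List[List[int]]:
    """Build one fixed-length feature vector per register read.

    Each vector holds the `window` bitstream samples captured immediately
    before the read. Reads that occur too early in the capture are
    left-padded with zeros (an assumption made for this sketch).
    """
    if window <= 0:
        raise ValueError("window must be positive")

    features = []
    for pos in read_positions:
        if not 0 <= pos <= len(bits):
            raise ValueError(f"read position {pos} is outside the capture")
        start = max(0, pos - window)
        chunk = bits[start:pos]
        padding = [0] * (window - len(chunk))  # pad when the history is short
        features.append(padding + chunk)
    return features


# Hypothetical capture: seven samples, two register reads at samples 2 and 5.
capture = [1, 0, 1, 1, 0, 0, 1]
vectors = window_features(capture, read_positions=[2, 5], window=4)
```

Each resulting vector could then be combined with the auxiliary features (uptime, delta t, and so on) before being passed to a per-register model.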
We automatically select the best-performing model for each register, achieving an average R^2 = 0.99 across six different peripherals on an RP2040 microcontroller. Using these models, we then accurately emulate a password-hashing firmware for the RP2040. Additionally, we successfully model a watchdog-based firmware on an STM32 Nucleo F072RB, demonstrating adaptability across different boards and peripheral types. Worst-case modeling time per register is approximately 25 minutes, but most registers require only seconds using simpler sequential replay.
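Per-register model selection can be sketched as follows. This assumes that selection ranks candidates by R^2 on held-out trace data, which is one plausible reading of "best-performing"; the helper names `r_squared` and `select_best_model`, the candidate names, and the example values are hypothetical.

```python
from typing import Dict, List, Tuple


def r_squared(y_true: List[float], y_pred: List[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_tot = sum((y - mean) ** 2 for y in y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    if ss_tot == 0:
        # Constant register: treat a perfect prediction as 1.0, anything else as 0.0.
        return 1.0 if ss_res == 0 else 0.0
    return 1.0 - ss_res / ss_tot


def select_best_model(y_true: List[float], candidates: Dict[str, List[float]]) -> Tuple[str, float]:
    """Return the name and score of the candidate with the highest R^2."""
    scores = {name: r_squared(y_true, preds) for name, preds in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical held-out register values and predictions from two candidates.
observed = [1.0, 2.0, 3.0, 4.0]
predictions = {
    "sequential_replay": [1.0, 2.0, 3.0, 4.0],
    "mlp": [1.0, 2.0, 3.0, 5.0],
}
best_name, best_score = select_best_model(observed, predictions)
```

Running the same selection independently for every register yields one chosen model per register, matching the one-model-per-register design described above.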
The method remains limited for interrupts, highly stateful memories, and particularly complex peripherals, but effectively automates modeling for many common peripherals, potentially reducing reliance on manual effort and physical hardware during firmware analysis and security testing.