Authors
Liam Hayes, Hendra Gunadi, Adrian Herrera, Jonathon Milford, Shane Magrath, Maggi Sebastian, Michael Norrish, Antony L Hosking
Publication date
2019
Journal
arXiv preprint arXiv:1905.13055
Description
Mutation-based fuzzing typically uses an initial set of valid seed inputs from which to generate new inputs by random mutation. A given corpus of potential seeds will often contain thousands of similar inputs. This lack of diversity can lead to wasted fuzzing effort, as the fuzzer will exhaustively explore mutation from all available seeds. To address this, industrial-strength fuzzers such as American Fuzzy Lop (AFL) come with distillation tools (e.g., afl-cmin) that automatically select seeds as the smallest subset of a given corpus that triggers the same range of instrumentation data points as the full corpus. Experience suggests that minimizing both the number and cumulative size of the seeds may lead to more efficient fuzzing, which we explore systematically here. We present a theoretical foundation for understanding the value of distillation techniques and a new algorithm for minimization based on this theory called MoonLight. The theory allows us to characterize the performance of MoonLight as near-optimal, outperforming existing greedy methods to deliver smaller seed sets. We then evaluate the effectiveness of MoonLight-distilled seed selection in a long fuzzing campaign, comparing it against afl-cmin, with MoonLight configured to give weight to different characteristics of the seeds (i.e., unweighted, file size, or execution time), as well as against each target’s full corpus and a singleton set containing only an “empty” valid input seed. Our results demonstrate that fuzzing with seeds selected by MoonLight outperforms the existing greedy afl-cmin, and that weighting by file size is usually the best option. We target six common open-source programs …
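The distillation problem described above (pick a small or cheap subset of seeds that preserves the coverage of the full corpus) can be viewed as a weighted set-cover problem. The sketch below is a plain greedy heuristic for that problem, offered only as an illustration. It is not MoonLight's algorithm and not the actual implementation of afl-cmin. The function name `greedy_distill`, the seed file names, the coverage point IDs, and the file sizes are all hypothetical.

```python
from typing import Dict, List, Optional, Set


def greedy_distill(
    coverage: Dict[str, Set[int]],
    weights: Optional[Dict[str, float]] = None,
) -> List[str]:
    """Greedy weighted set cover (illustrative sketch, not MoonLight).

    coverage maps each seed to the set of coverage points it triggers.
    weights maps each seed to a positive cost (e.g. file size); when
    omitted, every seed costs 1, which minimizes the number of seeds.
    """
    if weights is None:
        weights = {seed: 1.0 for seed in coverage}

    # Everything the full corpus covers must still be covered at the end.
    uncovered: Set[int] = set().union(*coverage.values())
    chosen: List[str] = []

    while uncovered:
        # Pick the seed with the most new coverage per unit of cost.
        # Sorting first makes tie-breaking deterministic.
        best = max(
            sorted(coverage),
            key=lambda s: len(coverage[s] & uncovered) / weights[s],
        )
        chosen.append(best)
        uncovered -= coverage[best]

    return chosen


# Hypothetical corpus: four seeds and the coverage points each one hits.
coverage = {
    "a.pdf": {1, 2, 3},
    "b.pdf": {3, 4},
    "c.pdf": {1, 2, 3, 4},
    "d.pdf": {5},
}
# Hypothetical file sizes in bytes, used as weights.
sizes = {"a.pdf": 10, "b.pdf": 10, "c.pdf": 100, "d.pdf": 5}

unweighted = greedy_distill(coverage)        # fewest seeds
by_size = greedy_distill(coverage, sizes)    # favors small files
```

In this toy corpus the unweighted run keeps two seeds, including the large `c.pdf`, while the size-weighted run keeps three small seeds with a much lower cumulative size. That illustrates the trade-off between seed count and cumulative seed size that the abstract refers to.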