OnDiskCorpus files be configurable to contain a human readable representation of the input #2538

Open
riesentoaster opened this issue Sep 23, 2024 · 10 comments
Labels
enhancement New feature or request

Comments

@riesentoaster
Contributor

Most fuzzers will likely use some form of OnDiskCorpus (incl. InMemoryOnDiskCorpus, CachedOnDiskCorpus, etc.) for their solutions. To figure out what the problem actually was, one needs to know the content of the testcase/input that triggered the feedbacks. Currently, corpora that store testcases on disk write a bunch of generic information (such as runtime) to the file associated with the testcase/input, but no representation of the input itself.

The only way to add this without resorting to dummy feedbacks that do nothing but add new metadata containing the input content is to implement the filename-generating function on the input so that it extracts the testcase from the corpus and somehow stringifies it:

fn generate_name(&self, id: Option<CorpusId>) -> String;

However, file names have a length restriction, so this isn't usable for inputs that can get somewhat long. Plus, for structured inputs, it would be much easier to have the entire structure nicely formatted in the file.

@domenukk
Member

I don't fully understand: the OnDiskCorpus will contain the "content of the testcase/input that triggered the inputs" - that's what it's for, right?

That being said, currently the correct(tm) way to add metadata to a Testcase is via custom Feedbacks that do nothing like here:

impl StdOutToMetadataFeedback {

@riesentoaster
Contributor Author

Yes, the corpus will contain everything, of course. But it isn't written to disk, so when I kill the fuzzer, I lose everything but the metadata (found in the .metadata file). And by default that doesn't contain the input that triggered a crash (or whatever you're looking for). So I can't reproduce the crash.

@domenukk
Member

Why is the _OnDisk_Corpus not written to disk?
What crash are you talking about? A crash in the fuzzer or a crash in the target? Crashes in the target are of course included in the corpus (if you have a CrashFeedback)? Sorry, I'm confused...

@riesentoaster
Contributor Author

Ah, I see, seems like I missed something. If I understand correctly, the input content is serialised and written to disk by this method on Input. It ends up in the file associated with the crash, the one without an extension or a leading dot:

/// Write this input to the file
fn to_file<P>(&self, path: P) -> Result<(), Error>
where
    P: AsRef<Path>,
{
    write_file_atomic(path, &postcard::to_allocvec(self)?)
}

When initialising the corpus, a format can be passed, and while this leaves the metadata nicely formatted, the input itself is still serialised and thus not human readable.

OnDiskCorpus::with_meta_format(
    PathBuf::from("./crashes"),
    OnDiskMetadataFormat::JsonPretty,
)
.unwrap(),

So I guess I'm asking for an option for human-readable serialisation of the input when written to disk.
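
As a rough illustration of what such an option could look like, here is a minimal sketch using only the standard library. `InputFormat`, `render_input`, and `write_readable` are hypothetical names that do not exist in LibAFL; the sketch assumes the input implements `Debug`, and it uses a plain `fs::write` rather than an atomic write like the one in `to_file`.

```rust
use std::fmt::Debug;
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical format selector, loosely modelled on the idea behind
/// `OnDiskMetadataFormat`. Not part of LibAFL.
pub enum InputFormat {
    /// Single-line `Debug` output.
    Compact,
    /// Multi-line, indented `Debug` output.
    Pretty,
}

/// Render an input as text according to the chosen format.
pub fn render_input<I: Debug>(input: &I, format: &InputFormat) -> String {
    match format {
        InputFormat::Compact => format!("{:?}", input),
        InputFormat::Pretty => format!("{:#?}", input),
    }
}

/// Write the rendered input to `path` (non-atomic, unlike `write_file_atomic`).
pub fn write_readable<I: Debug>(
    input: &I,
    path: &Path,
    format: &InputFormat,
) -> io::Result<()> {
    fs::write(path, render_input(input, format))
}
```

Note that `Debug` output is readable but cannot be deserialised again, so a real option would more likely reuse a serde-based format such as the JSON one already offered for metadata.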

@riesentoaster riesentoaster changed the title OnDiskCorpus files should contain representation of Input OnDiskCorpus files be configurable to contain a human readable representation of the input Sep 25, 2024
@riesentoaster
Contributor Author

I guess I could also just implement this for my input, so a global option may not be strictly necessary, but it would still be nice, just for consistency.

@riesentoaster
Contributor Author

riesentoaster commented Sep 25, 2024

Related question: All input types in the repo (at least as far as I can see) generate their testcase names (fn generate_name(&self, id: Option<CorpusId>) -> String; on Input) the exact same way: hash their content (for collection types, namely Vecs, this is done manually for some reason) and take the first 16 bytes.

Should there not just be a blanket implementation that does this for any input that implements Hash (or where this is derived)?
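
A minimal sketch of what such a blanket implementation could look like, using only the standard library. `GenerateName` is a hypothetical stand-in for the naming part of `Input`, and the `id: Option<CorpusId>` parameter is left out to keep the example self-contained.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for the naming part of LibAFL's `Input` trait.
pub trait GenerateName {
    fn generate_name(&self) -> String;
}

/// Blanket implementation: every hashable type gets a name derived from its hash.
impl<T: Hash> GenerateName for T {
    fn generate_name(&self) -> String {
        let mut hasher = DefaultHasher::new();
        self.hash(&mut hasher);
        // A u64 hash printed as 16 zero-padded hex characters.
        format!("{:016x}", hasher.finish())
    }
}
```

Two caveats with this sketch: the algorithm behind `DefaultHasher` is not guaranteed to stay the same across Rust releases, so a fixed hasher would be needed for names that stay stable over time, and a blanket impl like this prevents hashable types from providing their own implementation.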

@domenukk
Member

domenukk commented Oct 1, 2024

For a human-readable serialization there is the DumpToDiskStage that goes through new inputs and serializes them with a provided closure.
Is this what you are looking for?

@riesentoaster
Contributor Author

riesentoaster commented Oct 3, 2024

Yes, this kind of does what I would want it to do, but

  1. It also serialises the corpus, not just the solutions (and returns an error if passed something like /dev/null)
  2. I need to manually do the serialisation, as opposed to just telling it (like passing OnDiskMetadataFormat::JsonPretty)

Depending on how large your corpus gets and the change rate within it, the first point may be anything from annoying to a considerable downside. The second is not critical, just a bit of extra code; it would simply be easier without it :)

Plus I would expect this kind of functionality in the corpus, especially OnDiskCorpus, not in a stage — that's probably also why I haven't found this.
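
To make the two points concrete, here is a minimal standard-library sketch of a closure-based dump that only touches solutions. `dump_solutions` and the `solution_N.txt` naming are hypothetical and do not reflect how `DumpToDiskStage` is actually implemented.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Write every solution into `dir`, serialising each one with the
/// caller-provided closure. Returns the paths that were written.
/// Hypothetical helper, not part of LibAFL.
pub fn dump_solutions<I, F>(
    solutions: &[I],
    dir: &Path,
    to_bytes: F,
) -> io::Result<Vec<PathBuf>>
where
    F: Fn(&I) -> Vec<u8>,
{
    fs::create_dir_all(dir)?;
    let mut written = Vec::with_capacity(solutions.len());
    for (idx, input) in solutions.iter().enumerate() {
        // File names are derived from the position in the slice.
        let path = dir.join(format!("solution_{}.txt", idx));
        fs::write(&path, to_bytes(input))?;
        written.push(path);
    }
    Ok(written)
}
```

The closure is the "manual serialisation" from the second point: a caller would pass something like `|i| format!("{:?}", i).into_bytes()`, which is exactly the extra code a built-in format option would make unnecessary.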

@domenukk
Member

domenukk commented Oct 3, 2024

Feel free to fix the first point :)
For the second point, we could have a number of serialiser functions in LibAFL, right?

Open for other suggestions of course.

@Slava0135

You can use append_metadata on the objective feedback to store any metadata you want for a solution (see #2556).
