Optimise the output to reduce token consumption #113

Open
wants to merge 6 commits into
base: master
from

Commits on Feb 17, 2024

  1. Created a new method for output optimisation

    Implemented a function in esbmc_util that uses regex patterns to remove some output
    lines. This reduces token consumption and makes it less likely that LLMs such as
    GPT 3.5 turbo reach the token limit on codebases that need
    comprehensive explanations.
    
    Added four new cases to config.json that address common bugs in C programs: buffer overflow,
    arithmetic overflow, array out-of-bounds and memory leaks.
    
    Signed-off-by: mihai.state <mihaita.state@yahoo.com>
    mihaistate05 committed Feb 17, 2024
    783f6b1
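
A minimal sketch of the kind of regex-based line filtering this commit describes. The PR's actual patterns and function body are not shown on this page, so the pattern list, the `reduce_output` name and the sample text below are all hypothetical and serve only as illustration.

```python
import re

# Hypothetical noise patterns; the expressions used in the PR are not shown here.
NOISE_PATTERNS = [
    re.compile(r"^Parsing .*$"),
    re.compile(r"^Converting$"),
    re.compile(r"^Generating GOTO Program$"),
    re.compile(r"^.*[Tt]ime: \d+(\.\d+)?s.*$"),
]


def reduce_output(text: str) -> str:
    """Return text without the lines that match any noise pattern."""
    kept = [
        line
        for line in text.splitlines()
        if not any(rx.search(line) for rx in NOISE_PATTERNS)
    ]
    return "\n".join(kept)


# Illustrative verifier-style output, not captured from a real run.
sample = "\n".join([
    "Parsing main.c",
    "Converting",
    "Generating GOTO Program",
    "GOTO program creation time: 0.215s",
    "Violated property:",
    "  file main.c line 7 function main",
    "  array bounds violated: array `a' upper bound",
])

if __name__ == "__main__":
    print(reduce_output(sample))
```

Only the three lines that describe the violation survive, which is the part an LLM needs in order to explain the bug.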

Commits on Feb 18, 2024

  1. Fix minor changes

    mihaistate05 committed Feb 18, 2024
    d8f6704

Commits on Mar 5, 2024

  1. Improved the ESBMC-AI output reduction

    Created more regex patterns that remove lines that can be omitted,
    giving shorter outputs.
    Added a few more instructions to the config.json file that
    tell the LLM to avoid mentioning time measurements.
    
    Signed-off-by: mihai.state <mihaita.state@yahoo.com>
    mihaistate05 committed Mar 5, 2024
    1dc5c0b

Commits on Mar 25, 2024

  1. Enhanced the output reduction

    Created new output scenarios in the config.json file
    for the NaN check, struct fields check, deadlock check and data race check.
    Created the reduce_output field in the config.json file to tell the
    LLM to produce a smaller output.
    Created new regex patterns in the reduce_output2() function.
    Created remove_patterns_nltk(), which uses NLP methods to identify
    patterns in a sequence of tokenized words. Its advantages are
    that it provides more granular control over sentences and can
    be more precise in some situations.
    Created the test_output_reducer file, which tests the esbmc_output_optimisation(),
    reduce_output2() and remove_patterns_nltk() functions and covers all the string patterns.
    Added the GPT_4_TURBO_PREVIEW model to the enum in ai_models.py and used it on various
    code bases.
    
    Signed-off-by: mihai.state <mihaita.state@yahoo.com>
    mihaistate05 committed Mar 25, 2024
    abbf9a0
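
A rough sketch of matching on token sequences rather than on raw characters, as the commit message describes for remove_patterns_nltk(). The real function is not shown on this page. To stay self-contained, this sketch replaces the NLTK tokenizer with a small regex tokenizer from the standard library, and every name in it (`tokenize`, `contains_sequence`, `drop_lines_with_phrases`) is hypothetical.

```python
import re


def tokenize(text: str) -> list[str]:
    # Stand-in for an NLTK tokenizer: lowercased words and single punctuation marks.
    return re.findall(r"\w+|[^\w\s]", text.lower())


def contains_sequence(tokens: list[str], seq: list[str]) -> bool:
    """True if seq occurs as a contiguous run inside tokens."""
    n = len(seq)
    return any(tokens[i:i + n] == seq for i in range(len(tokens) - n + 1))


def drop_lines_with_phrases(text: str, phrases: list[str]) -> str:
    """Remove every line whose tokens contain one of the phrases' token sequences."""
    # Skip phrases that tokenize to nothing, since an empty sequence matches everywhere.
    sequences = [seq for seq in (tokenize(p) for p in phrases) if seq]
    kept = []
    for line in text.splitlines():
        tokens = tokenize(line)
        if not any(contains_sequence(tokens, seq) for seq in sequences):
            kept.append(line)
    return "\n".join(kept)


if __name__ == "__main__":
    # Illustrative text, not captured from a real run.
    text = "\n".join([
        "Solving with solver Z3 v4.8.12",
        "BMC program time: 1.234s",
        "subprogram timer started",
        "VERIFICATION FAILED",
    ])
    print(drop_lines_with_phrases(text, ["solving with solver", "program time"]))
```

Because the comparison is made on whole tokens, the phrase "program time" removes the timing line but leaves "subprogram timer started" alone. This is one way to read the "more granular control" mentioned in the commit message.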

Commits on Mar 26, 2024

  1. c7ded6f

Commits on Jun 27, 2024

  1. Optimized the regex patterns to be more time efficient

    Removed the 'flags=re.MULTILINE' argument from the regex patterns that work properly without it.
    Used the re.compile method for some patterns so that they are compiled once and reused.
    
    Signed-off-by: mihai.state <mihaita.state@yahoo.com>
    mihaistate05 committed Jun 27, 2024
    36798da
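
A short sketch of the two points in this commit, using hypothetical patterns because the PR's own expressions are not shown on this page. `re.MULTILINE` changes only how `^` and `$` behave, so a pattern without those anchors gains nothing from it. A pattern compiled once at module level can be reused on every call.

```python
import re

# Anchored per line, so this pattern does need re.MULTILINE.
TIME_LINE = re.compile(r"^.*time: \d+\.\d+s\n?", re.MULTILINE)

# No ^ or $ anchors, so re.MULTILINE would not change what this matches.
SOLVER_TAG = re.compile(r"\[solver\] ")


def strip_noise(text: str) -> str:
    """Remove timing lines and solver tags using the precompiled patterns."""
    text = TIME_LINE.sub("", text)
    return SOLVER_TAG.sub("", text)


def demo_multiline() -> tuple[str, str]:
    """Show that ^ matches after a newline only when re.MULTILINE is set."""
    without_flag = re.sub(r"^b", "", "a\nb")
    with_flag = re.sub(r"^b", "", "a\nb", flags=re.MULTILINE)
    return without_flag, with_flag


if __name__ == "__main__":
    # Illustrative text, not captured from a real run.
    print(strip_noise("Symex time: 0.010s\n[solver] Encoding done\nVERIFICATION FAILED"))
    print(demo_multiline())
```

Python's `re` module keeps an internal cache of recently compiled patterns, so the saving from `re.compile` is usually modest. It is most noticeable when the same patterns are applied many times.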