Optimise the output to reduce token consumption #113

Implemented a function in esbmc_util that uses regex patterns to remove some output lines. This reduces token consumption and makes it less likely that LLMs such as GPT-3.5 Turbo reach the token limit on codebases that need comprehensive explanations. Added four new cases to config.json that address common bugs in C programs: buffer overflow, arithmetic overflow, array out-of-bounds and memory leaks.

Signed-off-by: mihai.state <mihaita.state@yahoo.com>
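A minimal sketch of the regex-based line removal described above. The patterns, the sample text, and the name `strip_noise_lines` are hypothetical illustrations; the actual patterns in esbmc_util are not shown in this description.

```python
import re

# Hypothetical patterns, for illustration only. Each one matches a whole
# line (including its trailing newline) that carries little information
# for the LLM.
NOISE_PATTERNS = [
    r"^.*\btime: \d+(\.\d+)?s.*\n?",  # lines reporting a duration in seconds
    r"^Parsing .*\n?",                # progress lines for parsed files
]


def strip_noise_lines(output: str) -> str:
    """Return `output` with every line matching a noise pattern removed."""
    for pattern in NOISE_PATTERNS:
        # MULTILINE makes ^ match at the start of every line, not only
        # at the start of the whole string.
        output = re.sub(pattern, "", output, flags=re.MULTILINE)
    return output


sample = (
    "Parsing main.c\n"
    "GOTO program creation time: 0.312s\n"
    "Violated property:\n"
    "  array bounds violated\n"
)
print(strip_noise_lines(sample))
```

Only the two diagnostic lines survive in this example, so the prompt sent to the model is shorter while the information about the bug is kept.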

Created more regex patterns that remove lines that can be omitted, giving shorter outputs. Added a few more instructions to the config.json file that tell the LLM to avoid mentioning time measurements.

Signed-off-by: mihai.state <mihaita.state@yahoo.com>

Created new output scenarios in the config.json file for the NaN check, struct fields check, deadlock check and data races check. Created the reduce_output field in the config.json file to tell the LLM to produce a smaller output. Created new regex patterns in the reduce_output2() function. Created remove_patterns_nltk, which uses NLP methods to identify patterns in a sequence of tokenized words. Its advantage is that it gives more granular control over sentences and can be more precise in some situations. Created the test_output_reducer file, which tests the esbmc_output_optimisation(), reduce_output2() and remove_patterns_nltk() functions and covers all the string patterns. Added the GPT_4_TURBO_PREVIEW model to the enum in ai_models.py and used it on various code bases.

Signed-off-by: mihai.state <mihaita.state@yahoo.com>
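To illustrate why matching on tokens can be more precise than matching on raw text, here is a rough sketch. It is not the remove_patterns_nltk implementation: a small regex tokenizer stands in for an NLTK tokenizer so that the example has no dependencies, and all function names and sample lines are hypothetical.

```python
import re


def tokenize(sentence: str) -> list[str]:
    # Stand-in for an NLTK tokenizer (assumption): words and punctuation
    # marks become separate tokens.
    return re.findall(r"\w+|[^\w\s]", sentence)


def contains_sequence(tokens: list[str], phrase: list[str]) -> bool:
    """True if `phrase` occurs as a run of consecutive tokens."""
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


def remove_lines_with_phrase(output: str, phrase: list[str]) -> str:
    """Drop every line whose tokens contain `phrase` (case-insensitive)."""
    wanted = [word.lower() for word in phrase]
    kept = [
        line
        for line in output.splitlines()
        if not contains_sequence([t.lower() for t in tokenize(line)], wanted)
    ]
    return "\n".join(kept)


text = "Symex completed in: 0.002s\nBuffer overflow in function main"
print(remove_lines_with_phrase(text, ["completed", "in"]))
```

Because the comparison is made token by token, the phrase only matches whole words: "in" inside "main" is never considered, which a naive substring search could get wrong.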

Removed the 'flags=re.MULTILINE' argument from the regex patterns that work properly without it. Used re.compile for some patterns so that they are compiled once and reused.

Signed-off-by: mihai.state <mihaita.state@yahoo.com>
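A short sketch of both points, assuming a hypothetical pattern (`SECONDS_LINE`) and helper name. re.MULTILINE only changes how `^` and `$` behave, so a pattern without those anchors gives the same result with or without the flag, and a module-level re.compile call builds the pattern once.

```python
import re

# Hypothetical pattern: a whole line that ends in a seconds value such as
# "0.004s". It contains no ^ or $ anchors, so re.MULTILINE would not
# change what it matches.
SECONDS_LINE = re.compile(r"[^\n]*\d+\.\d+s\n")


def drop_seconds_lines(output: str) -> str:
    """Remove lines ending in a seconds value, reusing the compiled pattern."""
    return SECONDS_LINE.sub("", output)


log = "Solver time: 0.004s\nVERIFICATION FAILED\n"
print(drop_seconds_lines(log))

# The same pattern with the flag set gives an identical result.
with_flag = re.sub(r"[^\n]*\d+\.\d+s\n", "", log, flags=re.MULTILINE)
print(with_flag == drop_seconds_lines(log))
```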

Commits on Feb 18, 2024

Fix minor changes

mihaistate05 committed Feb 18, 2024

Commit d8f6704

Commits on Feb 17, 2024

Commits on Feb 18, 2024

Commits on Mar 5, 2024

Commits on Mar 25, 2024

Commits on Mar 26, 2024

Commits on Jun 27, 2024