Build your own AutoFix with Patchwork

September 25, 2024

We describe a patchflow for creating a customizable AutoFix tool that can automatically detect and fix software vulnerabilities using large language models (LLMs).

In this blog post we will look at creating our own AutoFix tool that can automatically detect and fix software vulnerabilities. In the past year, existing application security vendors, development tool providers, and new startups have released tools that use LLMs to help fix vulnerabilities in code. However, these tools are not flexible and do not give users complete control over their configuration and prompts. Moreover, they usually do not allow you to use your own local LLMs or self-host the solution. Recently, we partnered with OpenAI and shared how we built a state-of-the-art (SOTA) fine-tuned model for vulnerability remediation. We have also fine-tuned open-weight models like Llama-3.1. Building on top of our fine-tuning work, we will show how you can build your own AutoFix-like tool using our open-source framework patchwork.

The general workflow of AutoFix is as follows:

Clone the code

The first step is to get a copy of the code you are looking to analyze: clone the code repository and then run a scan to detect the vulnerabilities. Note that there are static analyzers that work on binary or byte code to detect vulnerabilities. But unlike plain static analysis, where we are only looking to find issues, we need access to the source code because it is required for patch generation when fixing the vulnerabilities.
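As a minimal sketch, this step can be done with Python's subprocess module; the repository URL and target directory below are placeholders for illustration only.

```python
import subprocess
from pathlib import Path

def clone_repo(repo_url: str, target_dir: str) -> Path:
    """Clone the repository we want to scan into a local folder."""
    path = Path(target_dir)
    if not path.exists():
        subprocess.run(["git", "clone", repo_url, str(path)], check=True)
    return path

# Placeholder URL; point this at the repository you want to analyze.
repo_path = clone_repo("https://github.com/your-org/your-repo.git", "your-repo")
```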

Scan for vulnerabilities

There are many open-source vulnerability scanners that you can use to find vulnerabilities in the code. In our implementation, we will use Semgrep, a lightweight scanner with an extensible set of rules. Semgrep is already integrated with patchwork, so you can call it as a step in your patchflow:

ScanSemgrep(self.inputs).run()

This step runs the semgrep command locally in the current folder, so as long as you clone the repo and cd into that folder you will be able to scan the repo with Semgrep. Semgrep comes with a set of rules for scanning, which are fine if you are using it on your own code locally on your machine. But if you are looking to run scans via your application or service, you can use our permissively licensed (MIT) rules collection. By default, the ScanSemgrep step outputs the results in a standard format called SARIF. This makes it easy to integrate any other static analyzer, as long as it can output SARIF: the rest of the workflow can assume the input will be SARIF.
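To give an idea of what the downstream steps consume, here is a minimal sketch of reading a SARIF report and listing the reported findings; the `results.sarif` file name is just an assumed output location.

```python
import json

# Assumed output location of the scan; adjust to where your scanner writes its SARIF report.
with open("results.sarif") as f:
    sarif = json.load(f)

# SARIF groups findings under runs[].results[], each with a rule id, message, and location.
for run in sarif["runs"]:
    for result in run["results"]:
        location = result["locations"][0]["physicalLocation"]
        print(
            result["ruleId"],
            location["artifactLocation"]["uri"],
            location["region"]["startLine"],
            result["message"]["text"],
        )
```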

Triage the results

The results returned from static analysis may contain false positives, so it is a good idea to use LLMs to triage them and see if we can eliminate some of the false positives. To do so, we can give the LLM the vulnerable file along with the details from the SARIF report and ask it to classify the finding. Naively, we could give the entire SARIF report to the LLM and ask it to analyze it. But this is not a good idea for two reasons: firstly, the report itself can be very large and may not fit the context length of the model, and secondly, it is better to include the actual source file as part of the prompt so that the model can analyze the code along with the issue.

To address this, we can process the SARIF report and get the file paths to the vulnerable source code files. In patchwork we can then use the ExtractCode step to get the affected code that can be used as part of the context of the model. Note that a single file may itself be quite large, so we do need to chunk it and extract the relevant parts from the source file. All this is managed for you in the ExtractCode step. If you are curious, you can see the various context strategies that are implemented in patchwork here.
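To illustrate the idea (this is not the actual ExtractCode implementation), a simple context strategy could grab a fixed window of lines around the flagged location:

```python
def extract_context(file_path: str, start_line: int, end_line: int, window: int = 40) -> str:
    """Return the source lines around the reported vulnerability as LLM context."""
    with open(file_path) as f:
        lines = f.readlines()
    # Clamp the window to the file boundaries; SARIF line numbers are 1-based.
    lo = max(0, start_line - 1 - window)
    hi = min(len(lines), end_line + window)
    return "".join(lines[lo:hi])
```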

To triage the results we can use a prompt like the one below and include the vulnerability details from the SARIF report along with the vulnerable code. By forcing the LLM to output specific keywords like <NOT VULNERABLE> we make it easy to parse the output. You can also use a JSON schema for more complex output structures, like we do in our app.

You are a senior software engineer who is best in the world at triaging vulnerabilities. Do a vulnerability triage and analyze if the vulnerability can indeed be exploited in the given code.

If the vulnerability cannot be exploited, respond with <NOT VULNERABLE>.

Else, if you cannot generate an exact fix for the vulnerability, respond with <NO FIX POSSIBLE>.

{{Vulnerability Details}}

{{Vulnerable Code}}

Once we parse the response from the LLM we can filter out the vulnerabilities that it classifies as either not vulnerable or not fixable. This should help reduce the false positives among the warnings reported by Semgrep.
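A minimal sketch of this filtering could look like the following; the `findings` list and its keys are hypothetical and stand in for however your patchflow pairs each finding with the raw triage response.

```python
def keep_finding(llm_response: str) -> bool:
    """Drop findings the model marked as not exploitable or not fixable."""
    return (
        "<NOT VULNERABLE>" not in llm_response
        and "<NO FIX POSSIBLE>" not in llm_response
    )

# Hypothetical findings paired with the raw triage responses from the LLM.
findings = [
    {"rule_id": "sql-injection", "triage_response": "The input is concatenated into the query."},
    {"rule_id": "open-redirect", "triage_response": "<NOT VULNERABLE> The URL is allow-listed."},
]
triaged = [f for f in findings if keep_finding(f["triage_response"])]
```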

Generate the patch

The next step is to generate the actual fix that will address the vulnerability. This requires the affected code and the vulnerability details, and we can prompt an LLM to generate the fix. In our experiments we have found that giving the model the full file (or as much context as possible) and asking it to fix the issue by pointing to the vulnerability works better than giving it only the line where the vulnerability was found. So, in this case the prompt looks something like this:

You are an AI assistant specialized in fixing code vulnerabilities. 

Your task is to provide corrected code that addresses the reported security issue. 

Always maintain the original functionality while improving security. 

Be precise and make only necessary changes. 

Maintain the original code style and formatting unless it directly relates to the vulnerability. 

Pay attention to data flow between sources and sinks when provided.

Vulnerability Report:

- Type: {cwe}

- Location: {lines}

- Description: {message}

Original Code:

```

{file_text}

```

Task: Fix the vulnerability in the code above. Provide only the complete fixed code without explanations or comments. Make minimal changes necessary to address the security issue while preserving the original functionality.

This is the exact prompt used in our static analysis eval, as you can see here. We discussed this eval and the performance of various models on it in a previous blog post. The generated code that fixes the vulnerability can now be committed to the repo.
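As an illustration, a sketch of this step might fill in the prompt template and strip any code fences from the model's reply before writing it back to the file; the `call_llm` helper is hypothetical and stands in for whichever LLM client you use.

```python
import re

# The fix prompt shown above, kept as a template with {cwe}, {lines}, {message}, {file_text}.
PROMPT_TEMPLATE = "..."

def generate_fix(call_llm, cwe: str, lines: str, message: str, file_text: str) -> str:
    """Ask the model for a fixed version of the file and strip surrounding code fences."""
    prompt = PROMPT_TEMPLATE.format(cwe=cwe, lines=lines, message=message, file_text=file_text)
    reply = call_llm(prompt)
    match = re.search(r"```(?:\w+)?\n(.*?)```", reply, re.DOTALL)
    return match.group(1) if match else reply
```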

Check compatibility

While generating the fix, the LLM may introduce other details or modify the code in a way that doesn't preserve the original intent. To avoid adding such code to the repo, we can do an additional check to determine whether the generated patch will cause any compatibility issues with the existing repo. This can be done with a prompt similar to our triage prompt:

Do a brief change impact analysis to assess how these modifications might affect the overall system, considering both immediate and potential long-term compatibility issues.

Low: Code diff will be applied to the code base and automatically merged without review.

Medium: Code diff will be applied and a pull request will be sent to the developer to merge, but there are no indirect changes expected to be done in other parts of the system.

High: Code diff will be offered as a suggestion to the developer to review and then apply to the code base. There are likely other changes that need to be done by the developer before the change can be implemented.

Once the vulnerability fixes are classified into Low, Medium and High, you can decide how you want to process them. We could, for instance, generate PRs only for patches with High compatibility and give suggestions for the others. In patchwork we allow users to configure the compatibility level and only generate PRs for what they choose.
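A sketch of this routing logic, with a hypothetical `min_level` setting, might look like this:

```python
# Order the compatibility levels so we can compare them against a configured threshold.
LEVELS = {"Low": 0, "Medium": 1, "High": 2}

def should_open_pr(patch_level: str, min_level: str = "High") -> bool:
    """Only open a pull request when the patch meets the configured compatibility level."""
    return LEVELS[patch_level] >= LEVELS[min_level]

# Hypothetical classified patches; everything below the threshold becomes a suggestion instead.
patches = [
    {"file": "app.py", "compatibility": "High"},
    {"file": "db.py", "compatibility": "Medium"},
]
to_pr = [p for p in patches if should_open_pr(p["compatibility"])]
suggestions = [p for p in patches if not should_open_pr(p["compatibility"])]
```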

Create pull request

The final step is PR generation. There is already a CreatePR step in the patchwork framework that takes the modified code and creates a pull request in the repo. We presented the workflow here as several steps with 3 different prompts, but in our implementation we simplify it all into a single prompt that you can check out in our repo here. You can also see a sample PR generated by AutoFix below:
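For a rough idea of what this step amounts to (this is an illustration with plain git and the GitHub CLI, not the actual CreatePR implementation; the branch name and PR title are placeholders):

```python
import subprocess

def open_pull_request(repo_dir: str, branch: str = "autofix/patch") -> None:
    """Commit the patched files on a new branch, push it, and open a pull request."""
    def run(*cmd: str) -> None:
        subprocess.run(cmd, cwd=repo_dir, check=True)

    run("git", "checkout", "-b", branch)
    run("git", "commit", "-am", "Fix reported vulnerabilities")
    run("git", "push", "-u", "origin", branch)
    # Requires the GitHub CLI (`gh`) to be installed and authenticated for this repo.
    run("gh", "pr", "create", "--title", "Fix reported vulnerabilities",
        "--body", "Automated fix generated by an AutoFix-style patchflow.")
```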

And that's all there is to it. The entire code for AutoFix is available in the patchwork repository here, along with detailed documentation. We published some early benchmarking results with AutoFix a while back; you can read them on our blog.
