Using CodeQL and Semgrep to Assist Vulnerability Research (Part 1 of 6)

March 5th, 2025 by Brian

Huy Dai was previously a summer intern from MIT and has since graduated to join the Caesar Creek Software team in Woburn, MA. During his internship, he performed a security assessment of the Peloton Bike and, upon joining CC-SW full-time, he has conducted research using CodeQL and Semgrep to aid in vulnerability research.

Motivation

At Caesar Creek Software, one of our areas of expertise is performing vulnerability research on embedded devices. Due to the large amount of code in modern embedded devices, automation is an important part of our Vulnerability Research (VR) work. Finding smart ways to reduce tedious tasks and find bugs more quickly can be hugely beneficial to our VR process.

Over the past couple of months, I’ve been investigating CodeQL and Semgrep as complementary SAST (Static Application Security Testing) tools for bug-hunting. In particular, I’ve looked at the following use cases:

1. CodeQL: To analyze open-source libraries and other software for which we have full source and build scripts

2. Semgrep + Ghidra/IDA: To decompile binaries whose source is unavailable to us and scan the output for vulnerabilities

The appeal of tools such as CodeQL and Semgrep is that they allow you to write queries for common bug classes (such as buffer overflow, command injection, integer overflow, etc.), and quickly re-run them across multiple codebases. The idea is that once you have written a query/rule for a given bug pattern, you can scan for similar vulnerable code sections (e.g. identify sister bugs) in either the same or a different project, without having to look for them manually.

Goal

This post is the first of a six-part series covering CodeQL and Semgrep:

  • In this first part, I will be highlighting a broad, generalizable query I wrote that was able to identify integer overflow CVEs across multiple open-source libraries
  • In the second part, I will showcase a CodeQL query I wrote that targets a specific bug in BlueZ (a popular Bluetooth library) that led to identifying an unpatched bug in the same library
  • For parts 3-4, I will dive deep into CodeQL and strategies for writing generalizable queries
  • For parts 5-6, I will dive into similar strategies for writing effective Semgrep rules

This blog series is coming out as part of the research effort I’m doing at Caesar Creek Software, where I focus on writing CodeQL queries and Semgrep rules aimed at covering a specific bug class, then I evaluate their effectiveness by using previously known CVEs as a ground truth.

Despite their relatively simple construction, my queries and rules were able to generate hits on multiple CVEs across different libraries. When optimizing my queries, I focused on specificity (the ability of a query to detect a specific bug with few false positives) and generalizability (the ability of a query to detect other bugs like it within the same codebase and in different ones).

Common Bug Type: Integer Overflow to Malloc

First, I will cover a query aimed at describing a general bug type: an integer overflow leading to malloc.

When looking through the list of vulnerabilities found in open-source libraries such as libcurl, libTIFF, BlueZ, connman, etc. in the past couple of years, one common bug pattern I noticed was an integer overflow leading to malloc(). For example, an allocation such as malloc(var * 2) could be problematic in situations where var is user-controlled and may be greater than SIZE_MAX / 2, which would cause the multiplication to overflow. In that case, the actual allocated size would be smaller than expected, which could lead to out-of-bounds accesses down the road when the buffer is used.
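To make the arithmetic concrete, here is a minimal C sketch of the wrap-around. The function names are hypothetical and exist only for illustration. It relies on the fact that size_t multiplication is unsigned, so an oversized product wraps silently instead of failing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: the size that malloc(var * 2) would actually receive.
 * size_t arithmetic is unsigned, so the product wraps modulo SIZE_MAX + 1. */
size_t doubled_size(size_t var)
{
    return var * 2;
}

/* Returns 1 when doubling var wraps, i.e. when var > SIZE_MAX / 2. */
int doubling_wraps(size_t var)
{
    return var > SIZE_MAX / 2;
}

/* Self-check for one concrete input just past the threshold. */
void demo_doubling(void)
{
    size_t var = SIZE_MAX / 2 + 2;

    assert(doubling_wraps(var));
    /* A huge "length" turns into a request for only 2 bytes. */
    assert(doubled_size(var) == 2);
}
```

Any later write that trusts var as the buffer length would then run far past the 2 bytes that were actually requested.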

Writing the CodeQL Query

Given that we have access to the source and build scripts for many of these open-source libraries, I started with trying to describe this bug using CodeQL. My predicate query looks as follows:

I originally wrote the query to target CVE-2018-14618, a 9.8 (Critical) rated vulnerability in the curl library. Taking a look at the patch commit, we can see that it revolves around a possible integer overflow when using the result of strlen(...) * 2 inside malloc.
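As a rough illustration of the pattern (this is not curl's code), the sketch below doubles the result of strlen(...) before allocating and adds the kind of guard that makes the allocation safe. The function name widen_password and the byte layout it produces are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not taken from curl: expands each input byte into
 * two output bytes, so the buffer needs strlen(password) * 2 bytes. */
unsigned char *widen_password(const char *password, size_t *out_len)
{
    assert(password != NULL && out_len != NULL);

    size_t len = strlen(password);

    /* Guard: without this check, len * 2 could wrap and under-allocate. */
    if (len > SIZE_MAX / 2)
        return NULL;

    /* Request at least one byte so an empty input still yields a buffer. */
    unsigned char *buf = malloc(len ? len * 2 : 1);
    if (buf == NULL)
        return NULL;

    for (size_t i = 0; i < len; i++) {
        buf[2 * i] = (unsigned char)password[i];
        buf[2 * i + 1] = 0;
    }
    *out_len = len * 2;
    return buf;
}
```

A query for this bug class is effectively looking for the same multiplication reaching the allocator without a comparable guard in front of it.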

However, a surprising result is that when re-running this predicate on the rest of the curl library and other open-source libraries, we see that it was also able to detect 6 similar CVEs:

Note: The “Modified” label next to Libexpat indicates that there was additional modification that I had to make to the query to be able to properly detect the bugs.

Specifically, libexpat, a popular XML parser library, does not always call the default malloc and realloc functions directly. Instead, it calls them through function pointers that may point to custom allocation implementations.

Taking a closer look at the allocSizeMightOverflow CodeQL query, you may have noticed that I try to account for this case through the use of the custom classes CCSW_Helper::AllocCall and CustomMulExpr. Due to slight differences in function representation, I had to specify additional constraints to properly detect all allocation calls and multiplication expressions across different libraries:

Syntax Note: One interesting aspect to note is that while the #define preprocessor directive for realloc takes three arguments, REALLOC(parser, p, s), CodeQL resolves the macro and evaluates the underlying expression directly (parser->m_mem.realloc_fcn((p), (s))). This is why we are able to query for a VariableCall with the allocation size at the second argument (rather than the third).
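The macro behaviour in this note can be sketched in plain C. The struct layouts, grow, and make_default_parser below are simplified, hypothetical stand-ins; only REALLOC, m_mem, realloc_fcn, and malloc_fcn come from the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-in for the parser's table of allocator function pointers. */
struct mem_suite {
    void *(*malloc_fcn)(size_t size);
    void *(*realloc_fcn)(void *ptr, size_t size);
};

struct parser {
    struct mem_suite m_mem;
};

/* Three macro parameters, but only two of them reach the underlying call. */
#define REALLOC(parser, p, s) (parser->m_mem.realloc_fcn((p), (s)))

/* Hypothetical caller. After preprocessing, the body is
 * parser->m_mem.realloc_fcn((buf), (count * 2)): a call through a function
 * pointer whose size expression is the second argument. */
void *grow(struct parser *parser, void *buf, size_t count)
{
    assert(parser != NULL && parser->m_mem.realloc_fcn != NULL);
    return REALLOC(parser, buf, count * 2);
}

/* Wires the table to the default libc allocators. */
struct parser make_default_parser(void)
{
    struct parser ps = { { malloc, realloc } };
    return ps;
}
```

Because the analysis sees the expanded call rather than the macro invocation, a query keyed on the third macro parameter would miss the size expression.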

For a more in-depth breakdown on the function identification process and understanding use of function pointer/wrappers that can appear across different codebases, see my following blog posts on important CodeQL techniques for security engineers.

With these custom classes, we are now able to detect bugs that involve integer overflows to an allocation call in Libexpat, such as CVE-2022-22824:

along with CVE-2021-46143 :

This type of customization is something that I come across often while trying to generalize my CodeQL queries. For the most part, once I have the main bug logic down (in this case the allocSizeMightOverflow predicate), I am able to make it work across multiple codebases by improving the function identification and adding edge cases rather than modifying the main bug description.

For example, there are a number of ways we can extend this predicate to improve its generalizability:

  1. Expanding on integer overflows: Currently we are querying for multiplication expressions (var * SOME_CONSTANT) as a source for an integer overflow, but overflows can also occur in addition expressions (var + SOME_CONSTANT or var_1 + var_2). If we were to consider integer underflows, we can also expand our definition to include subtraction.
  2. Expanding on allocation functions: In addition to malloc and realloc, we can also consider the use of calloc or the new operator.
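The addition and subtraction cases can be illustrated with a short C sketch. The helper names and the len/header values are hypothetical; they only show how each operator can wrap before the result reaches an allocation call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: 1 if a + b wraps past SIZE_MAX,
 * as could happen in malloc(len + header). */
int add_wraps(size_t a, size_t b)
{
    return a > SIZE_MAX - b;
}

/* Hypothetical helper: 1 if a - b wraps below zero,
 * as could happen in malloc(len - header). */
int sub_wraps(size_t a, size_t b)
{
    return a < b;
}

/* Self-check with concrete values for both directions. */
void demo_add_sub(void)
{
    size_t len = 4, header = 8;
    size_t big = SIZE_MAX - 1;

    /* Underflow: a tiny input becomes an enormous requested size. */
    assert(sub_wraps(len, header));
    assert(len - header == SIZE_MAX - 3);

    /* Overflow: an enormous input becomes a tiny requested size. */
    assert(add_wraps(big, header));
    assert(big + header == 6);
}
```

Both directions matter for an allocation query: the underflow yields an unexpectedly large request, while the overflow yields an undersized buffer.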

As we try CodeQL scans on more codebases, we can slowly expand these definitions to cover more cases. However, as with any query, we have to be careful to avoid over-generalization. If we were to start capturing every mathematical expression in the code as possible integer overflows, then the surface area for the query becomes too large. This in turn would lead to many false positives in our results, and increase the time needed for manual bug verification.

As such, we have to be careful in the way that we expand our query, and only focus on avenues which we think could help capture the most bugs without compromising too much on accuracy.

Translating to a Semgrep Rule

So far we’ve talked a lot about CodeQL, but can we achieve a similar result using Semgrep?

If you’ve read guides discussing the differences between CodeQL and Semgrep, you will find that Semgrep syntax tends to be easier to learn, as it more closely resembles advanced grep syntax, hence its name. This is in contrast to CodeQL’s implementation of a custom extractor for each language it supports, which leads to different query syntax depending on whether you are working with C/C++, Java, Python, etc. However, a tradeoff of that design is that Semgrep captures syntax information with less granularity and accuracy compared to CodeQL, and it does not have as powerful data flow/taint analysis.

That said, I believe that Semgrep rules can achieve many of the same results as an equivalent CodeQL query. Specifically, I looked at a particularly challenging workflow where I compile the binaries and shared libraries (.so files) from the open-source libraries above, decompile them with Ghidra/IDA Pro, and then scan the resulting pseudo-C/C++ code using Semgrep.

This process is useful in scenarios where security engineers do not have full source and build environments for the targets they are looking at, and instead must depend on looking through decompiled code for potential vulnerabilities.

Using the structure and logic I had in the CodeQL query, I was able to replicate it in Semgrep as follows:

Here we’re relying on Semgrep’s experimental symbolic propagation feature to help track the flow of a multiplication expression (e.g. var * SOME_CONSTANT) to the size argument of malloc and realloc. While not perfect, it helps us extend the contextual window of our Semgrep rule to track situations such as:

where the value of the expression sVar1 * 2 will be propagated to its usage inside Curl_cmalloc(...).
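A decompiler-flavoured C sketch of this kind of situation is shown below. It is an illustration written for this post, not actual Ghidra output: the function alloc_wide and the variable names other than sVar1 are made up, and Curl_cmalloc is defined here as a simple stand-in that forwards to malloc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the allocator hook named in the text; it forwards to malloc. */
void *(*Curl_cmalloc)(size_t size) = malloc;

/* Hypothetical function written in the style of decompiler output. */
void *alloc_wide(const char *param_1)
{
    size_t sVar1;
    size_t sVar2;
    void *pvVar3;

    assert(param_1 != NULL);

    sVar1 = strlen(param_1);
    sVar2 = sVar1 * 2;               /* the multiplication happens here ... */
    pvVar3 = (*Curl_cmalloc)(sVar2); /* ... but is only consumed here */
    return pvVar3;
}
```

A purely syntactic pattern that expects the multiplication inside the call's argument list would miss this, since the argument is just sVar2. Propagating the assigned value is what lets the rule treat sVar2 as sVar1 * 2.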

Overall, from our testing, the Semgrep rule was able to identify six of the seven vulnerabilities from before, even though we are scanning the decompiled pseudocode from these binaries. We can see this from the Semgrep command line results:

CVE-2018-14618 (curl)

CVE-2017-8816 (curl)

CVE-2019-5435 (curl)

CVE-2022-22824 (libexpat)

CVE-2022-22827 (libexpat)

CVE-2021-46143 (libexpat)

The vulnerability that Semgrep wasn’t able to detect was CVE-2020-12762 in json-c, a popular JSON library. From closer inspection, I believe the scan failed because Semgrep’s symbolic propagation didn’t register that a potential value for iVar3 could be p->size * 2, due to the if condition in between. Even if we remove the guard condition requirements from our Semgrep rule, it still fails to register a hit.

That said, this is still a great result given that Semgrep is working with much more limited information and with the uncommon code patterns that we often see in decompiled code. Given its support for regex patterns for function names, we were able to easily search for malloc, realloc, malloc_fcn, Curl_cmalloc, etc. calls without needing any prior knowledge of the codebase.

Summary

So far, we’ve discussed the idea of writing a general query targeting a common bug class that can be run across multiple codebases.

For these queries, we’re interested in capturing high-level code patterns that will be seen across many different projects. As an example, I provided my CodeQL query and Semgrep rule that target an integer overflow leading to a malloc call. By improving on aspects such as function identification, I was able to generalize the query to identify similar-looking CVEs across three open-source libraries. These results help demonstrate the flexibility and potential of both CodeQL and Semgrep as SAST tools that we can use to find common, high-level vulnerabilities.

In the next installment of this blog, I want to focus on another use of CodeQL and Semgrep, which is to write codebase-specific bug patterns that we can use to find sister bugs within the same code project. By trading off generalizability for specificity, we can arrive at queries that allow us to find similar versions of specific vulnerabilities. As an example, I will showcase a query I wrote for a BlueZ buffer overflow CVE that led me to finding an unpatched bug in the library!