Coverage-Based Fuzzing
Software testing is important. Every developer knows that, and we all try our very best to keep up with writing tests. But even good test coverage is not enough when it comes to writing secure software. The space of possible inputs and internal program states is usually far too large to cover by hand, so some special cases will be forgotten. Fuzzing tests your software automatically with randomly mutated input data to find bugs and crashes, and coverage-based fuzzers keep the mutated inputs that reach new code paths and mutate them further.
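To make this concrete, here is a minimal sketch of a fuzz target for libFuzzer, the coverage-guided fuzzer that ships with Clang. The entry point `LLVMFuzzerTestOneInput` is libFuzzer's; `parse_version` is a hypothetical function under test, and the round-trip check is just one example of an invariant worth asserting.

```cpp
// Hypothetical libFuzzer target. Build with Clang, which links in
// libFuzzer's own main():
//   clang++ -g -O1 -fsanitize=fuzzer,address fuzz_version.cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <utility>

// Hypothetical function under test: parses "MAJOR.MINOR" with
// single-digit components, e.g. "3.7".
std::optional<std::pair<int, int>> parse_version(const std::string& s) {
    auto is_digit = [](char c) { return c >= '0' && c <= '9'; };
    if (s.size() != 3 || !is_digit(s[0]) || s[1] != '.' || !is_digit(s[2]))
        return std::nullopt;
    return std::make_pair(s[0] - '0', s[2] - '0');
}

// libFuzzer calls this entry point over and over with mutated inputs and
// keeps the inputs that exercise new code paths.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
    const std::string input(reinterpret_cast<const char*>(data), size);
    if (auto version = parse_version(input)) {
        // Invariant: a successfully parsed version prints back to the input.
        const std::string printed = std::to_string(version->first) + "." +
                                    std::to_string(version->second);
        assert(printed == input);
    }
    return 0;  // 0 tells libFuzzer the input was processed normally
}
```

When built with `-fsanitize=fuzzer,address`, libFuzzer runs the target in a loop; an assertion failure or sanitizer report stops the run and saves the crashing input to a file for reproduction.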
Structure Definitions for Binary Files
When dealing with binary file formats, one sometimes has to look at the raw bits and bytes with a hex editor, especially when debugging a file writer or parser. Reinterpreting the hex numbers as the structures and values of the file format is very tedious. You have to keep all the offsets and types of the values in mind to reach the right conclusion. An advanced feature of Okteta, KDE's hex editor, is structure definitions, which describe the data layout of a file format so that the GUI can visualize the values and structures contained in the file.
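For illustration, this is what the manual decoding looks like in C++ for a hypothetical 14-byte header; the field names, offsets, and little-endian byte order are assumptions, and Okteta's structure definitions are not written in C++. A structure definition captures exactly this offset-and-type bookkeeping once, so you no longer have to do it in your head.

```cpp
#include <array>
#include <cstdint>
#include <string>

// Hypothetical header layout (little-endian):
//   offset  0: char[4]  magic
//   offset  4: uint16   version
//   offset  6: uint32   width
//   offset 10: uint32   height
struct Header {
    std::string magic;
    uint16_t version = 0;
    uint32_t width = 0;
    uint32_t height = 0;
};

// Assemble multi-byte values explicitly so the result does not depend on
// the byte order of the machine doing the decoding.
uint16_t read_u16_le(const uint8_t* p) {
    return static_cast<uint16_t>(p[0] | (p[1] << 8));
}

uint32_t read_u32_le(const uint8_t* p) {
    return static_cast<uint32_t>(p[0]) |
           (static_cast<uint32_t>(p[1]) << 8) |
           (static_cast<uint32_t>(p[2]) << 16) |
           (static_cast<uint32_t>(p[3]) << 24);
}

// Decode each field from its fixed offset in the raw bytes.
Header decode_header(const std::array<uint8_t, 14>& raw) {
    Header h;
    h.magic = std::string(reinterpret_cast<const char*>(raw.data()), 4);
    h.version = read_u16_le(raw.data() + 4);
    h.width = read_u32_le(raw.data() + 6);
    h.height = read_u32_le(raw.data() + 10);
    return h;
}
```

Given the bytes `44 45 4D 4F 02 00 80 07 00 00 38 04 00 00`, `decode_header` yields magic `DEMO`, version 2, width 1920, and height 1080.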
Self-Made Karaoke
I am definitely not the singer type. I have never done karaoke and probably never will. But for some reason I was intrigued to see, from a technical standpoint, how a karaoke song is made. How does one remove the vocals from a song and make the lyrics appear at the right time? I will show you the poor man's approach to making your own karaoke songs and playing them on a website.
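One widely used poor man's trick for removing vocals (not necessarily the approach taken in the post) is center-channel cancellation: lead vocals are usually mixed equally into both stereo channels, so subtracting one channel from the other cancels them. A minimal sketch, assuming interleaved floating-point stereo samples:

```cpp
#include <cstddef>
#include <vector>

// Center-channel cancellation: anything mixed identically into the left and
// right channels (typically the lead vocals) cancels out in L - R.
// Input: interleaved stereo samples L0 R0 L1 R1 ...; output: mono samples.
std::vector<float> remove_center(const std::vector<float>& interleaved) {
    std::vector<float> mono;
    mono.reserve(interleaved.size() / 2);
    for (std::size_t i = 0; i + 1 < interleaved.size(); i += 2) {
        const float left = interleaved[i];
        const float right = interleaved[i + 1];
        mono.push_back(0.5f * (left - right));  // halve to avoid clipping
    }
    return mono;
}
```

The trick also removes everything else panned to the center, often bass and kick drum, and vocal reverb spread across the stereo field survives, so expect some residue.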
Codegen in Databases
Just-in-time compilation is usually associated with managed languages like Java and C#, or scripting languages like JavaScript. As detailed in the previous post, many other applications benefit from ad hoc code generation as well. This post is a tutorial on code generation in relational databases. Relational databases are commonly accessed with SQL, the Structured Query Language. The query optimizer turns a query into an optimized query plan and passes it to the query execution engine, which, in modern systems, generates machine code for faster execution.
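As a rough illustration of why generating code pays off, the sketch below contrasts an interpreted evaluation of `SELECT SUM(a) FROM t WHERE b > 10` with the tight loop a code generator could produce for it. The column-wise `Table` and both functions are hypothetical, and plain C++ stands in for the machine code a real engine would emit.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical table with two integer columns, stored column-wise.
struct Table {
    std::vector<int64_t> a;
    std::vector<int64_t> b;
};

// Interpreted style: a generic engine evaluates the predicate and the
// aggregated value through indirect calls for every row.
int64_t run_interpreted(const Table& t,
                        const std::function<bool(const Table&, std::size_t)>& pred,
                        const std::function<int64_t(const Table&, std::size_t)>& value) {
    int64_t sum = 0;
    for (std::size_t i = 0; i < t.a.size(); ++i)
        if (pred(t, i)) sum += value(t, i);
    return sum;
}

// What a code generator could emit for this specific query:
// one loop with the predicate and aggregate inlined, no indirect calls.
int64_t run_generated(const Table& t) {
    int64_t sum = 0;
    for (std::size_t i = 0; i < t.a.size(); ++i)
        if (t.b[i] > 10) sum += t.a[i];
    return sum;
}
```

Both functions compute the same result; the specialized version avoids the per-row calls through `std::function` and lets the compiler optimize predicate and aggregate together.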
COAT: EDSL for Codegen
Code specialization has a huge impact on performance. Tailored code takes advantage of knowledge about the data types and operations involved. In C++, we can instruct the compiler to generate specialized code at compile time with the help of template metaprogramming and constant expressions. However, constant evaluation is limited, as it cannot leverage runtime information. Just-in-time compilation lifts this limitation by enabling programs to generate code at runtime. In this post, I will present a header-only library that provides abstract types and control flow abstractions to make code generation at runtime easier.
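A small example of compile-time specialization and its limit: `power<N>` is specialized for each exponent known at compile time, while `power_runtime` has to fall back to a generic loop when the exponent is only known at runtime. Both functions are illustrative and are not part of the library presented in the post.

```cpp
// Compile-time specialization: the exponent is a template parameter, so the
// compiler generates a separate, fully unrolled multiplication chain per N.
template <unsigned N>
constexpr double power(double x) {
    if constexpr (N == 0) {
        return 1.0;
    } else {
        return x * power<N - 1>(x);
    }
}

// Runtime variant: the exponent is only known when the program runs
// (e.g., read from input), so templates cannot specialize for it. This
// generic loop is the kind of code a JIT could replace with specialized code.
double power_runtime(double x, unsigned n) {
    double result = 1.0;
    for (unsigned i = 0; i < n; ++i) result *= x;
    return result;
}

// The specialized version can even be evaluated entirely at compile time.
static_assert(power<3>(2.0) == 8.0, "evaluated at compile time");
```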
Static Machine Code Analysis
Modern processors are complex beasts. They reorder instructions within an ever-growing instruction window and speculatively execute subsequent loop iterations by predicting the branch of the loop condition. Both features aim to extract as much instruction-level parallelism from the program as possible to keep superscalar CPUs busy; such CPUs can execute multiple instructions in a single cycle as long as the instructions do not depend on each other. Static machine code analyzers let us look at how our code is executed by modeling the various execution stages of a CPU.
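The effect of dependencies shows up in a simple summation. In this illustrative sketch (not taken from the post), the first loop forms a single chain of dependent additions, while the second splits the work into four independent chains that a superscalar CPU can execute in parallel. Doubles are used because compilers do not reassociate floating-point additions without flags like `-ffast-math`, so the dependency chain in `sum_single` stays intact.

```cpp
#include <cstddef>
#include <vector>

// One accumulator: each addition must wait for the previous one, so the
// loop runs at the latency of the floating-point add.
double sum_single(const std::vector<double>& v) {
    double s = 0.0;
    for (std::size_t i = 0; i < v.size(); ++i) s += v[i];
    return s;
}

// Four independent accumulators: the four additions per iteration do not
// depend on each other, so they can be in flight at the same time.
double sum_four(const std::vector<double>& v) {
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    std::size_t i = 0;
    for (; i + 4 <= v.size(); i += 4) {
        s0 += v[i];
        s1 += v[i + 1];
        s2 += v[i + 2];
        s3 += v[i + 3];
    }
    for (; i < v.size(); ++i) s0 += v[i];  // leftover elements
    return (s0 + s1) + (s2 + s3);
}
```

Feeding the compiler's assembly to a static analyzer such as llvm-mca (for example `clang++ -O2 -S -o - sum.cpp | llvm-mca`) is one way to compare the estimated throughput of the two loops. Since `sum_four` changes the order of the additions, it may round differently from `sum_single` for arbitrary inputs.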
Record And Replay
Developers spend a significant share of their working time debugging their code, a fact many people do not like to acknowledge because it feels unproductive. Software is complex, and one usually does not get it right the first time. Therefore, it is very important to have good debugging tools that support developers in their quest to find the root cause of an issue. Being able to step backwards through the program execution helps tremendously, as we can retrace the execution flow from the crash or assertion failure back to the operation that corrupted the program state.
Return Oriented Programming
As a C++ programmer, I am well aware of memory errors such as buffer overflows and dangling pointers. There are a lot of good debugging tools available, like Memcheck in Valgrind and AddressSanitizer in GCC and Clang, which help identify the root cause of memory corruption. But memory errors are not just bugs resulting in crashes or incorrect program behavior. They are potentially severe security issues. To better understand the risks involved, let us take a look at some basic concepts of exploitation, particularly return-oriented programming (ROP).
X-Macros
A lot of C++ developers avoid preprocessor macros like the plague, and there are genuine reasons for that. Macros might look like functions, but they behave differently, resulting in confusing bugs when not handled carefully. But even in modern C++, macros still have their use cases. In this post, I want to talk about a special kind of macro called the X-macro, which is mostly used to generate various code fragments from a single list of elements.
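A minimal X-macro sketch: the list `COLOR_LIST` is written once and expanded three times, into enumerators, into their names as strings, and into an element count. This variant passes the per-element macro as an argument instead of redefining a macro named `X`; both styles are common, and all names here are made up for the example.

```cpp
#include <string_view>

// The single list of elements. X is whatever macro the caller passes in.
#define COLOR_LIST(X) \
    X(Red)            \
    X(Green)          \
    X(Blue)

// Expansion 1: the enumerators.
enum class Color {
#define X_ENUM(name) name,
    COLOR_LIST(X_ENUM)
#undef X_ENUM
};

// Expansion 2: the names as strings, always in sync with the enum.
constexpr std::string_view to_string(Color c) {
    switch (c) {
#define X_CASE(name) case Color::name: return #name;
        COLOR_LIST(X_CASE)
#undef X_CASE
    }
    return "unknown";
}

// Expansion 3: the number of elements (expands to 0 +1 +1 +1).
#define X_COUNT(name) +1
constexpr int color_count = 0 COLOR_LIST(X_COUNT);
#undef X_COUNT
```

Adding a color now means adding a single `X(...)` line to the list; the enum, the string conversion, and the count follow automatically.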
Identical Code Folding
Even more interesting than removing unused functions is consolidating identical instances of function templates. For each distinct set of template arguments, the compiler generates a new instance. For class templates, it generates code for every member function that is used, once per set of template arguments. The instances can have identical code, e.g., when a member function does not depend on the template parameters, or when the types behave identically for the operations applied. Let's see if we can limit this code explosion by deduplicating identical code.
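A hypothetical class template that tends to produce identical instances: on typical platforms, `int` and `unsigned` have the same size and are pushed, popped, and counted the same way, so `Stack<int>` and `Stack<unsigned>` often compile to byte-identical member functions.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical class template. For same-sized types such as int and
// unsigned, every member performs the same machine-level operations.
template <typename T>
class Stack {
public:
    void push(T value) { items_.push_back(value); }

    T pop() {
        T top = items_.back();
        items_.pop_back();
        return top;
    }

    std::size_t size() const { return items_.size(); }

private:
    std::vector<T> items_;
};

// Two explicit instantiations whose member functions are folding candidates.
template class Stack<int>;
template class Stack<unsigned>;
```

With GCC or Clang, compiling with `-ffunction-sections` and linking with lld or gold using `--icf=all` (for example `-fuse-ld=lld -Wl,--icf=all`) lets the linker fold such duplicates; GCC can also merge identical functions at compile time via `-fipa-icf`, which is enabled at `-O2`. Be aware that `--icf=all` can give distinct functions the same address, which breaks code that compares function pointers; `--icf=safe` is the more conservative option.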
Removing Unused Code
I recently went on a little journey to get a better understanding of the linker in the C++ toolchain. Like many other C++ developers, I have a very limited mental model of what the linker actually does. But I'm always interested in learning new tricks about the tools I use regularly, especially new optimizations slumbering behind some obscure flag. In this post, we will have a look at how unused functions are removed from the executable and how the linker can help us with this task.
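A minimal sketch of the usual GCC/Clang recipe: `-ffunction-sections` and `-fdata-sections` place every function and data object into its own section, and the linker flag `--gc-sections` discards the sections nothing refers to. The two functions are hypothetical, and a `main` that calls only `used_function` is assumed.

```cpp
// Build so that each function lands in its own section, then let the
// linker garbage-collect unreferenced sections:
//   g++ -O2 -ffunction-sections -fdata-sections main.cpp -Wl,--gc-sections
// Check which symbols survived with: nm -C a.out

// Called from main: its section is referenced and kept.
int used_function(int x) { return 2 * x; }

// Never called: its section is unreferenced, so --gc-sections can drop it.
// Without -ffunction-sections it would share .text with used_function and
// could not be removed on its own.
int unused_function(int x) { return x * x + 1; }
```

Adding `-Wl,--print-gc-sections` makes GNU ld and lld list each section they remove.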