Research

Our modern society is fueled by a $699B software industry, which faces a demand so strong that it cannot be satisfied by training more developers; it requires more effective software delivery. My research investigates two aspects of this:

I study software supply chains, which facilitate software creation and have been the productivity driver of the past decades.
I investigate the development and operation of complex software systems to improve developer productivity and their ability to produce better software.

In the following, I detail current results and the future direction of my research along these two lines.

Software Supply Chains

The times of fully integrated software monoliths are long gone: recent studies have shown that, nowadays, 96% of proprietary software uses open-source components. Large ecosystems like Maven and NPM keep growing exponentially and form the backbone of a plethora of applications. Recent incidents have illustrated the attackability of our supply chains and the fragility of our popular ecosystems. My research investigates the foundations of software ecosystems and means for better protection:

Software Ecosystems

Even policymakers have recognized the role that software ecosystems play in innovation and value generation in the economy, as well as the risk of insecure supply chains: US Executive Order 14028 was issued to improve the nation's cybersecurity, and the EU Cyber Resilience Act points in the same direction.

I strongly believe that we need to start treating these ecosystems like essential infrastructure and care more about their maintenance and evolution. I have already started studying this from both the developer and the consumer perspective: We have investigated how library repositories like Maven are used and which quality aspects could limit maintainability. We also identified the typical dimensions of App Stores and how they could inform future research in this area. Our insights allowed us to support reproducibility initiatives with novel, fully automated approaches to make the ecosystems more trustworthy. We also shed light on how software ecosystems are threatened by vulnerability propagation from dependencies.

Moving forward, I want to extend my studies on supply chains and 1) widen these endeavors to other ecosystems beyond Maven and 2) deepen these studies and devise actionable validation models that can secure ecosystems using key metadata like vulnerabilities, license compatibility, or quality attributes.

Dependency Management

Software ecosystems facilitate the release and reuse of proven solutions in the form of libraries. Package managers allow developers to define dependency relations, which can be constrained to guide the resolution, e.g., through compatible version ranges or version exclusions. In most ecosystems, only one version of a specific library can be loaded, which makes the resolution process an NP-complete problem. On the backend side, a recent study of trends in package managers has shown that the use of SAT-based approaches provides critical advantages over other state-of-the-art approaches. On the user side, proper use is challenging, regardless of whether a package manager is designed for usability or stability. Less constraining package managers are more appealing and easier to get started with, but introducing more constraints allows them to find better resolutions. Semantic Versioning presents a convention that allows users to communicate compatibility in the version number. While theoretically useful, it is difficult for developers to follow, hard to analyze and, as such, rarely enforced, which makes it rather unreliable overall.
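To make the resolution problem concrete, the following is a minimal sketch of a backtracking resolver that selects exactly one version per library while honoring version ranges and exclusions. It is an illustration only and not a SAT-based solver as used by real package managers; the package index, the library names, and the helpers `in_range`, `exclude`, and `resolve` are all hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Version = Tuple[int, int, int]
Constraint = Callable[[Version], bool]


def in_range(low: Version, high: Version) -> Constraint:
    """Accept versions v with low <= v < high."""
    return lambda v: low <= v < high


def exclude(bad: Version, inner: Constraint) -> Constraint:
    """Accept what `inner` accepts, except the excluded version."""
    return lambda v: v != bad and inner(v)


# Hypothetical package index: library -> version -> {dependency: constraint}
INDEX: Dict[str, Dict[Version, Dict[str, Constraint]]] = {
    "app": {(1, 0, 0): {"http": in_range((1, 0, 0), (3, 0, 0)),
                        "json": in_range((1, 0, 0), (2, 0, 0))}},
    "http": {(2, 0, 0): {"json": in_range((2, 0, 0), (3, 0, 0))},
             (1, 5, 0): {"json": in_range((1, 0, 0), (2, 0, 0))}},
    "json": {(2, 1, 0): {}, (1, 4, 0): {}, (1, 3, 0): {}},
}


def resolve(pending: List[Tuple[str, Constraint]],
            chosen: Dict[str, Version]) -> Optional[Dict[str, Version]]:
    """Return one version per library satisfying all constraints, or None."""
    if not pending:
        return chosen
    (lib, allowed), rest = pending[0], pending[1:]
    if lib in chosen:
        # Only one version per library can be loaded, so the version
        # that was already picked must satisfy this constraint as well.
        return resolve(rest, chosen) if allowed(chosen[lib]) else None
    for version in sorted(INDEX[lib], reverse=True):  # prefer newer versions
        if not allowed(version):
            continue
        deps = list(INDEX[lib][version].items())
        result = resolve(rest + deps, {**chosen, lib: version})
        if result is not None:
            return result
    return None  # no candidate works: the caller backtracks


solution = resolve([("app", lambda v: True)], {})
pinned = resolve([("json", exclude((1, 4, 0), in_range((1, 0, 0), (2, 0, 0))))], {})
```

In this toy index, the newest `http` release requires a `json` version that conflicts with the constraint of `app`, so the resolver has to backtrack to the older `http` release. The worst-case cost of this search grows exponentially with the number of libraries, which is why the solver-based approaches mentioned above are attractive.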

Moving forward, I will work on novel dependency management mechanisms. I want to replace manual semantic versioning with an automated compatibility detection that can widen the dependency specifications from concrete versions to compatible ranges. A constraint-based resolution can then resolve better (e.g., newer) dependency sets, while simultaneously considering additional factors like vulnerabilities or licenses.
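The idea can be illustrated with a deliberately simplified sketch. It assumes that compatibility can be approximated by checking whether a release still offers every API element that a client actually uses, which is a crude stand-in for a real compatibility analysis. The API surfaces, the advisory set, and the functions `compatible_versions` and `pick` are hypothetical.

```python
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

Version = Tuple[int, int, int]

# Hypothetical API surface of each released version of a single library
API: Dict[Version, FrozenSet[str]] = {
    (1, 0, 0): frozenset({"parse(str)", "dump(obj)"}),
    (1, 1, 0): frozenset({"parse(str)", "dump(obj)", "pretty(obj)"}),
    (2, 0, 0): frozenset({"parse(bytes)", "dump(obj)", "pretty(obj)"}),
    (2, 0, 1): frozenset({"parse(bytes)", "parse(str)", "dump(obj)", "pretty(obj)"}),
}

# Hypothetical advisory data: versions with known vulnerabilities
VULNERABLE: Set[Version] = {(1, 0, 0), (2, 0, 1)}


def compatible_versions(used: FrozenSet[str]) -> List[Version]:
    """Widen a pinned dependency to every version that offers the used API."""
    return sorted(v for v, api in API.items() if used <= api)


def pick(used: FrozenSet[str], vulnerable: Set[Version]) -> Optional[Version]:
    """Choose the newest compatible version without a known vulnerability."""
    safe = [v for v in compatible_versions(used) if v not in vulnerable]
    return max(safe, default=None)


client_usage = frozenset({"parse(str)"})
candidates = compatible_versions(client_usage)
choice = pick(client_usage, VULNERABLE)
```

For a client that only calls `parse(str)`, the detected range skips release 2.0.0 but includes 2.0.1, which a plain reading of the version numbers would not suggest. Because 2.0.1 carries an advisory in this made-up data, the resolution falls back to 1.1.0. License compatibility could be modeled as a further filter in the same way.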

DevOps

Software has a lifecycle that consists of several phases. In my research, I have always focused on two things: the development and release of a new system and the operation and maintenance afterwards. The following topics represent my current research interests in this direction:

Continuous Software Development

It is important to follow a principled development approach, which has proven to increase efficiency. During my PhD, I extensively studied the development practices of individual developers in their IDE to understand usage patterns and to validate recommender models. However, software is developed by teams, so I pivoted to holistic analyses of team processes. I am most fascinated by continuous integration and delivery ("CI/CD"), which has benefits for productivity and efficiency that are widely accepted in both academia and industry. Unfortunately, we found that people deviate from these principles over time and lose the promised benefits. We were able to counteract this decay and improve developers' understanding of their development process and its shortcomings. Projects can embed static analyses, for example, in their build process, and use the outcomes to inform decisions about the software creation. We found ways to speed up such analyses through incrementalization. I think that a central build server is also the perfect connection point between development practices and the research on supply chains that has been introduced earlier. When provenance is the issue, an automated build with validation and signing is key to the solution.

Moving forward, I will investigate how project context affects CI/CD, and how these projects evolve and mature over time. Current CI/CD literature is opinionated and the postulated best practices are tailored to very specific contexts. However, not every project builds web applications and not every company is Google, so a CI/CD model is required that provides a more differentiated view on diverse project backgrounds.

Operating Complex Applications

In contrast to the static checks that are possible during development (or later, in recurring checks of the deliverables), monitoring of running software can capture dynamic application behavior, for example, for intrusion detection using generic usage data. I am convinced that using application-specific data like vulnerability information in the monitoring of complex applications will open new opportunities for system operators. We have introduced a system that provides a Kubernetes dashboard that allows system operators to monitor the execution of vulnerable code in their live systems and to dynamically disable its execution, which substantially cuts the time that is currently required to react to new vulnerability exposures. My experience in this area is based on my previous work as a system administrator, which I have leveraged to establish and operate a Kubernetes-based research cluster that is used by students and project partners of my research group to run experiments. The experience also substantially impacted the design of my MSc course on Release Engineering, which covers topics around Containers and Orchestration.

Moving forward, I would like to extend this part of my research. While the low-level networking is certainly interesting, I want to focus on topics on the software engineering level. I find it most important to improve the Observability of running systems. I am interested in devising better models for sharing runtime information and building tools that support developers when debugging the most complex systems built by humankind. The most immediate next step will be an extension of the vulnerability monitoring that allows operators to detect and prevent active exploits of deployed vulnerable components through instrumentation and dynamic analysis. All of this is also closely related to Software Architecture, which heavily influences the creation of these systems using proven patterns and implementation strategies.

Sebastian Proksch