This is the second episode of the CSI Container series, published and presented at CloudNativeSecurityCon 2024. In this episode, we focus on Kubernetes CSI, how to conduct DFIR activities on K8s and containers, and how to perform static and dynamic analysis.
As we covered in the first episode, DFIR refers to the union of Digital Forensics (DF) and Incident Response (IR). We also highlighted how conducting DFIR activities in a container environment differs from the usual DFIR in a host environment. Due to the peculiarities of containers, specific tools are required to operate effectively.
In this article, we will revisit the Kubernetes feature known as k8s checkpoint, which we have discussed previously. We’ll demonstrate how it can be automated using Falco components, enabling us to create container snapshots that are invaluable for Digital Forensics and Incident Response (DFIR) analysis.
Automating K8s checkpoint
As we covered in a separate blog, the Container Checkpointing feature allows you to checkpoint a running container. This means you can save the current container state and potentially resume it later without losing any information about the running processes or the stored data.
Even though the feature is still in the early stages of development and has several limitations, it’s very interesting for our DFIR use case. What if we could use this feature to snapshot a container state and restore it in a sandbox environment to proceed with our forensic analysis?
The first problem we need to face is that containers are ephemeral. To be able to snapshot a container, it needs to exist. In addition, we want to snapshot the container as early as possible during the attack, so that we can observe more of the attack after we restore it. Therefore, the following Kubernetes response engine fits our use case perfectly.
Using Falco, Falcosidekick, and Argo, we can set up a response engine capable of taking action. In this case, its main goal is to perform a K8s checkpoint as soon as a specific, highly malicious Falco rule is triggered. The checkpoint can then be used for further analysis.
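The article does not show the action that the workflow finally executes. As a minimal sketch, assuming the response ends up calling the kubelet checkpoint endpoint (`POST /checkpoint/{namespace}/{pod}/{container}` on the kubelet port, 10250 by default), the request could be prepared as follows. The function names and the certificate paths are hypothetical, and a real workflow would also need credentials that the kubelet accepts.

```python
import ssl
import urllib.request
from urllib.parse import quote


def checkpoint_url(node: str, namespace: str, pod: str, container: str,
                   port: int = 10250) -> str:
    """Build the kubelet checkpoint URL for a single container."""
    path = "/".join(quote(part, safe="") for part in (namespace, pod, container))
    return f"https://{node}:{port}/checkpoint/{path}"


def request_checkpoint(url: str, cert: str, key: str, ca: str) -> bytes:
    """Send the POST request. cert, key and ca are placeholder file paths
    for client credentials trusted by the kubelet (an assumption)."""
    ctx = ssl.create_default_context(cafile=ca)
    ctx.load_cert_chain(certfile=cert, keyfile=key)
    req = urllib.request.Request(url, data=b"", method="POST")
    with urllib.request.urlopen(req, context=ctx, timeout=60) as resp:
        return resp.read()
```

For example, `checkpoint_url("node1", "default", "web", "app")` yields `https://node1:10250/checkpoint/default/web/app`. Keeping the URL construction separate from the network call makes it easy to log exactly which container a triggered rule asked to checkpoint.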
Real-world scenario
To understand its behavior, let’s look at the automation in action in a real-world scenario.
In this scenario, on the offensive side, we’ll play with a well-known chatbot, specifically an IRC chatbot that, once downloaded and executed in the impacted container, connects to a known C2 server. If you want to know more, GitHub hosts a number of Perl-bot samples. Even though these may look like old techniques, in recent years many campaigns have been targeting different containerized services.
On the defensive side, rather than detecting the malicious activity itself, we’ll focus on identifying malicious connections to well-known IPs using the following Falco rule:
- list: malicious_ips
  items: ['"ip1"', '"ip2"', ...]
- rule: Detect Outbound Connection to Malicious IP
  desc: This rule detects outbound connections to known malicious IPs according to threat intelligence feeds. Interactions with such machines may compromise or damage your systems.
  condition: >
    (evt.type in (connect) and evt.dir=<
      and fd.net != "127.0.0.0/8" )
    and container
    and fd.sip in (malicious_ips)
  output: An outbound connection to %fd.sip on port %fd.sport was initiated by %proc.name and user %user.loginname and was flagged as malicious on %container.name due to Threat Intelligence feeds
  tags: [host, container, crypto, network]
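To make the logic of the rule easier to follow, here is a rough paraphrase of its condition in Python. It is only a sketch: the event dictionary and its keys are hypothetical stand-ins for Falco fields, the addresses come from documentation-only ranges, and the loopback test looks only at the server IP, whereas Falco’s `fd.net` can match either end of the connection.

```python
import ipaddress

# Hypothetical entries (documentation-only address ranges), not real indicators.
MALICIOUS_IPS = {"203.0.113.10", "198.51.100.7"}
LOOPBACK = ipaddress.ip_network("127.0.0.0/8")


def is_flagged(event: dict) -> bool:
    """Return True when an event would satisfy the paraphrased rule condition."""
    # evt.type in (connect) and evt.dir=<  (a completed connect() call)
    if event.get("type") != "connect" or event.get("dir") != "<":
        return False
    # "container" macro: the event must come from a container, not the host
    if event.get("container_id", "host") == "host":
        return False
    server_ip = ipaddress.ip_address(event["sip"])
    # fd.net != "127.0.0.0/8" (simplified to the server side only)
    if server_ip in LOOPBACK:
        return False
    # fd.sip in (malicious_ips)
    return str(server_ip) in MALICIOUS_IPS
```

An event such as `{"type": "connect", "dir": "<", "container_id": "abc123", "sip": "203.0.113.10"}` is flagged, while the same connection made from the host or to an address outside the list is not.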
By downloading and executing the malicious Perl-bot script, we can see how the Kubernetes response engine is triggered and how the checkpoint of the compromised container is correctly performed.
By default, the checkpoint tar file is saved to the filesystem of the Kubernetes node that hosts the impacted container. However, in a more realistic scenario, we should consider moving the checkpoint archive to a safer location, such as a cloud bucket or external storage. Remember that if a container has been compromised, the attacker might have moved laterally to the host, so leaving the file in the host filesystem might not be the smartest choice.
DFIR analysis
Now that the container checkpoint is ready, we can use its files to investigate and understand what happened during the attack and what the attacker’s goals were.
We can feed our static and dynamic analysis using the container checkpoint archive. The following files are in the container checkpoint tar file.
For static analysis, the files changed in the container filesystem can be very useful, especially the binaries or scripts dropped by the attackers. For dynamic analysis, restoring the container and analyzing the execution with proper tools can be very effective in understanding the intended behavior.
Let’s start the analysis using the real-world scenario reported above and move on with the investigation using the previously obtained checkpoint.
Real-world scenario: Static analysis
The first thing we can do for static analysis is to check whether the attacker left binaries or scripts in the filesystem. Since the checkpoint was taken a few seconds after the attacker ran the binary, this is very likely.
As we have seen in the screenshot above, the container checkpoint includes the rootfs-diff.tar archive, which contains the files that were modified in the checkpointed container compared to the base image.
The file perlbot.pl looks interesting, and we can keep it for further static analysis and reverse engineering, applying all the widely known techniques and tools that the forensics world offers.
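Listing what changed in the container filesystem can also be scripted. The following is a minimal sketch, assuming the checkpoint is an uncompressed tar archive that stores `rootfs-diff.tar` under exactly that member name. The function name is hypothetical.

```python
import tarfile


def changed_files(checkpoint_path: str) -> list[str]:
    """Return the regular files stored in rootfs-diff.tar inside a checkpoint archive."""
    with tarfile.open(checkpoint_path) as outer:
        # Raises KeyError if the archive has no member with this name.
        member = outer.extractfile("rootfs-diff.tar")
        if member is None:
            raise ValueError("rootfs-diff.tar is not a regular file")
        # Open the nested archive directly from the outer one, without unpacking to disk.
        with tarfile.open(fileobj=member) as inner:
            return sorted(m.name for m in inner.getmembers() if m.isfile())
```

Reading the nested archive in place avoids extracting attacker-controlled files onto the analysis machine before deciding which ones are worth a closer look.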
Another option we have is checkpointctl. This tool allows us to dig deeper into the checkpoint we previously obtained.
In particular, we can inspect what was running in the checkpointed container by looking at the process tree. In this case, for example, we can easily see the TCP connection to the C2 server established by the malicious [systemd] process.
We can also look at the container memory at the time the container was checkpointed and search for interesting patterns:
For example, in this case, we can easily identify highly suspicious strings and messages exchanged between the bot and the other machines connected to the same IRC channel.
Moreover, checkpointctl can quickly help us identify mounts that were assigned to the container and possibly abused by the attackers to escalate their privileges in the cluster.
In this case, the only interesting mount was the Kubernetes service account attached to the Kubernetes pod’s container, which could have given the attackers access to the Kubernetes API server and maybe even the whole cluster. However, in this scenario, it was the default service account and its permissions were very limited, so we won’t go into detail on that.
Nevertheless, best practices recommend that when sensitive mounts are observed in the impacted container, the investigation should go deeper, widening the scope to the whole cluster or the hosting Kubernetes node.
Another tool in our arsenal for static analysis is CRIT, which analyzes the CRIU image files stored in the checkpoint archive. Using it, we can obtain results similar to the ones we have seen with checkpointctl. For example, we can get the process tree, show the files used by tasks, and even retrieve memory mapping information.
> crit x checkpoint ps
PID PGID SID COMM
1 1 1 tini
7 7 1 sudo
20 7 1 jupyter-lab
77 77 77 bash
102 100 77 [systemd]
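Output like the one above is easy to post-process. The sketch below parses the PID/PGID/SID/COMM table and flags command names wrapped in square brackets. That is only a heuristic: `ps` conventionally prints kernel threads in brackets, and a container’s own process tree would not normally contain them, so a bracketed name such as `[systemd]` deserves a second look. The type and function names are hypothetical.

```python
from typing import NamedTuple


class Proc(NamedTuple):
    pid: int
    pgid: int
    sid: int
    comm: str


def parse_ps(text: str) -> list[Proc]:
    """Parse the PID/PGID/SID/COMM table printed by `crit x <dir> ps`."""
    procs = []
    for line in text.splitlines():
        fields = line.split(None, 3)
        # Skip the header, the echoed command line, and blank lines.
        if len(fields) != 4 or not fields[0].isdigit():
            continue
        pid, pgid, sid, comm = fields
        procs.append(Proc(int(pid), int(pgid), int(sid), comm.strip()))
    return procs


def kernel_thread_lookalikes(procs: list[Proc]) -> list[Proc]:
    """Return processes whose name imitates the bracketed kernel-thread style."""
    return [p for p in procs if p.comm.startswith("[") and p.comm.endswith("]")]
```

Applied to the table above, `kernel_thread_lookalikes(parse_ps(output))` returns only the entry with PID 102, the `[systemd]` process.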
The content stored in the checkpoint can be real gold for our investigation. For example, by reading the raw memory pages, it’s possible to look at environment variables and execution results related to the malicious process.
Here, for example, we retrieved the messages exchanged between the victim bot and the server, printing out the output related to the binary execution.
This can give us an idea of what was executed on the impacted container. It can also directly reveal which messages were sent to the victim and which commands were requested by other machines connected to the same IRC channel.
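Searching raw memory for readable text can be approximated with a few lines of Python, much like the classic `strings` utility. This is a sketch under the assumption that the memory pages are available as a plain binary file extracted from the checkpoint. The keyword list contains standard IRC commands and is only an illustrative starting point, and the function names are hypothetical.

```python
import re

# Standard IRC protocol commands; an illustrative, non-exhaustive list.
IRC_KEYWORDS = ("PRIVMSG", "NICK", "JOIN")


def printable_strings(data: bytes, min_len: int = 6) -> list[str]:
    """Extract runs of printable ASCII characters that are at least min_len long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]


def irc_strings(data: bytes) -> list[str]:
    """Keep only the extracted strings that mention an IRC command."""
    return [s for s in printable_strings(data) if any(k in s for k in IRC_KEYWORDS)]
```

A typical use would be `irc_strings(open(path, "rb").read())`, where `path` points to one of the memory page files. For large images, reading the file in chunks would be kinder to memory.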
Setting the stage for dynamic analysis
If we want to proceed with the dynamic analysis, we can start by restoring the checkpoint taken earlier in a dedicated, closed environment to analyze the malware and monitor its behavior.
Before proceeding, it’s important to note the limitations of the current checkpointing and restoring features. While containers can be checkpointed and restored elsewhere, using the same container engine and CRIU versions on both the affected and the analysis machines is strongly recommended for a smoother restore. As of this writing, this feature wasn’t integrated into containerd and remained unreliable with some runtimes such as crun, so we relied on CRI-O and runc for a more reliable process.
That said, how can the restore process be done?
The first thing we want to do is move the previously obtained checkpoint archive into safe storage. This best practice allows you to keep the evidence safe, ensuring you’ll always have a backup to rely on in case the original checkpoint gets lost, deleted, or tampered with.
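Since tampering is one of the concerns, it can help to record a cryptographic hash of the archive when copying it. The following is a minimal sketch of that idea and not part of the original workflow: the destination directory stands in for whatever safe storage is used, and the function names are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large archives do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def preserve(archive: Path, evidence_dir: Path) -> str:
    """Copy the archive, verify the copy, and store its digest next to it."""
    evidence_dir.mkdir(parents=True, exist_ok=True)
    original_digest = sha256_of(archive)
    copy = Path(shutil.copy2(archive, evidence_dir / archive.name))
    if sha256_of(copy) != original_digest:
        raise RuntimeError("copy does not match the original archive")
    digest_file = evidence_dir / (archive.name + ".sha256")
    digest_file.write_text(f"{original_digest}  {archive.name}\n")
    return original_digest
```

Re-running `sha256_of` on the stored copy at any later point and comparing it with the recorded digest shows whether the evidence is still intact.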
Then, we can build a new container image from the checkpoint archive using the buildah utility. This step can also be automated by extending the previously described response engine. In general, though, the image-building process can be done as follows:
# Start from an empty image and add the checkpoint archive as its only content
newcontainer=$(buildah from scratch)
buildah add $newcontainer /var/lib/kubelet/checkpoints/checkpoint-<pod-name>_<namespace-name>-<container-name>-<timestamp>.tar /
# Mark the image as a checkpoint so that CRI-O restores it instead of starting it fresh
buildah config --annotation=io.kubernetes.cri-o.annotations.checkpoint.name=<container-name> $newcontainer
buildah commit $newcontainer checkpoint-image:latest
buildah rm $newcontainer
# Publish the image so that the analysis cluster can pull it
buildah push localhost/checkpoint-image:latest container-image-registry.example/user/checkpoint-image:latest
…where /var/lib/kubelet/checkpoints/checkpoint-<pod-name>_<namespace-name>-<container-name>-<timestamp>.tar is the location where the checkpoint was written to disk.
By doing this, we can push our new container image to our container registry so that we can later pull and run it on other machines.
Having built the container image from the container checkpoint, it’s time to restore it into a completely separate Kubernetes cluster, where we will reproduce the previously frozen container by deploying it as a simple pod. Here is what our YAML template will look like:
apiVersion: v1
kind: Pod
metadata:
  name: restored-pod
spec:
  containers:
  - name: <container-name>
    image: <container-image-registry.example/user/checkpoint-image:latest>
…where the image is exactly the one we previously pushed to our container registry.
Once we apply that YAML file, we can see that the newly restored pod is now running. By opening an interactive shell in the container, we can see exactly the same process tree we had before, with the same PIDs.
Even more surprisingly, the connection to the IRC bot channel was restored too. Here you can see that after our container was restored, it automatically connected back to the IRC server with the same bot nickname it had before it was checkpointed, as if we had just resumed the execution we had previously frozen.
This scenario clearly shows the potential of container checkpointing and restoring. It also allows us to reproduce and analyze the malicious execution in a separate and restricted environment, where we can adopt a more proactive, forensic approach.
Real-world scenario: Dynamic analysis
Before digging into the details of dynamic analysis, it’s important to stress the best practices to follow in such scenarios and the requirements involved.
To safely reproduce malicious behavior on a machine, it’s necessary to establish strong constraints, such as preventing container escapes or privilege escalation. Proper machine settings must be configured, sensitive information must be locked down, and the constraints must be verified for effective forensics. Moreover, using the right tools is essential for dynamic analysis and for gaining low-level insight into the events happening on the machine.
Tools like Wireshark, Sysdig open source, strace, and others let you see all the events. Having an exhaustive capture of what happened at your disposal can lead you down the right path to resolve the investigation and help you spot the details of any attack.
In our case, we used Sysdig open source to record syscall captures while the container was running. By collecting a capture for the necessary amount of time, right after the container was restored, it’s possible to observe the malicious executions occurring within the container.
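The exact commands used for the capture are not shown in the article. As a hedged sketch, the helpers below assemble two plausible Sysdig invocations: one that writes a time-limited capture restricted to a single container (`-M` for the duration in seconds, `-w` for the output file), and one that reads the capture back and keeps only `execve` events. The container name, file name, and function names are hypothetical.

```python
import subprocess


def capture_cmd(output: str, container_name: str, seconds: int) -> list[str]:
    """Command that records syscalls of one container for a fixed duration."""
    return ["sysdig", "-M", str(seconds), "-w", output,
            f"container.name={container_name}"]


def execve_cmd(capture: str) -> list[str]:
    """Command that replays a capture file and shows only execve events."""
    return ["sysdig", "-r", capture, "evt.type=execve"]


def run(cmd: list[str]) -> None:
    """Run a prepared command; requires sysdig to be installed, usually as root."""
    subprocess.run(cmd, check=True)
```

For instance, `run(capture_cmd("restore.scap", "restored", 120))` would record two minutes of activity, and the resulting file could then be opened in Logray or filtered with `run(execve_cmd("restore.scap"))`.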
With that done, and with the capture at our disposal, we then used Logray to quickly filter the events and carefully analyze what happened during the malicious execution. For those of you who haven’t heard of Logray, it’s Wireshark’s cousin. It can examine syscall captures taken with Sysdig open source, just as Wireshark can inspect network packet traffic.
The two tools share the same UI and the same filtering logic, which should be familiar to most of you.
Here, for example, we dug into the execve syscalls. This allowed us to see all the commands requested by the attackers while talking to our impacted, restored container.
Right after that, we inspected network traffic-related events. Here, we can see how the commands previously requested by the attackers are followed by the replies from the victim container. These outbound network packets were sent by the victim container to return the results of the arbitrary commands to the attackers, in particular the results of `id` and `ls /`.
Finally, since the attacker also requested a port scan of a specific IP address, we filtered the events by the IP involved. Here are all the related syscalls that show how the port scan command was carried out by the Perl bot.
Here is a quick recap of the tools used during the investigation.
Tool | Reason |
Checkpoint Automation | |
Falco + Falcosidekick | Runtime detection and notification tools |
Argo | Open-source Kubernetes-native workflows, events, CI, and CD |
criu | Provides the checkpoint/restore functionality |
Dynamic Analysis | |
wireshark | Generates network captures and performs network analysis |
logray | Capture event analysis |
sysdig, strace, etc. | Generate event/syscall captures |
tcpdump | Packet analyzer tool |
htop | Interactive process viewer tool |
Static Analysis | |
checkpointctl | In-depth analysis of container checkpoints |
crit | CRIU image file analyzer tool |
criu coredump | Converts image files into a coredump |
gdb (or similar) | Binary analysis tool |
Conclusion
In this article, we covered a new research topic, showing you how the container checkpoint/restore functionalities can be applied in the forensics field. In particular, we have seen how a container checkpoint can be created automatically using a Kubernetes response engine that relies on a few rules for malicious behavior, and also how to handle the newly created checkpoint archive.
With that done, we presented different ways to dig deeper using the previously created checkpoint: static analysis, adopting some old-school techniques or tools specifically conceived for container checkpoints, and dynamic analysis, covering some best practices and practical hints to extract the attack’s details.