Botnet Command & Control Detection in IoT Networks


Najwa Laabid

Master’s thesis

School of Computing Computer Science

July 2021


University of Eastern Finland, Faculty of Science and Forestry, Joensuu
School of Computing

Computer Science

Laabid, Najwa: Botnet Command & Control Detection in IoT Networks

Master’s thesis, 65 p.

Supervisors: Prof. Pauli Miettinen, Dr. Gergely Matefi and Mr. Joel Reijonen

July 2021

Abstract: Botnets, overlay networks of infected devices controlled by a third party, are among the biggest threats on today’s Internet. Multiple methods have been developed to identify infected devices, Command and Control (C&C) servers, or bot networks based on network traffic traces. Ideally, botnet infections should be detected before the infected devices cause any harm (e.g., data theft or service disruptions). This can be achieved by focusing on detecting the C&C communication between an infected device and the controlling entity. The ubiquity of IoT devices has led to the emergence of botnet malware specifically targeting Linux/ARM platforms, a popular architecture among embedded systems. The network behavior of these recent IoT botnets has not yet received extensive research attention, despite the damage caused by the malware.

This thesis aims to bridge this gap by proposing an empirical study of the C&C communication of recent IoT botnet malware. We identify three patterns of C&C communication: centralized-beaconing, centralized-stochastic, and decentralized-Distributed Hash Tables (DHT), with centralized-beaconing accounting for the majority of observed malware families. An online detection system, based on a Random Forest classifier with statistical aggregates of flow-based numerical features, is proposed to detect each pattern. The system is evaluated on benign data retrieved from IoT networks managed by Ericsson Finland, malware traces collected in the empirical study, and simulated data based on the main characteristics of each pattern. Experiments show that the detection system handles the decentralized-DHT variant well, but is overwhelmed by false positives with the centralized patterns at low infection ratios with highly regular background traffic.

Keywords: IoT; Botnet; Command&Control Detection

ACM CCS (2012)

• Security and privacy → Intrusion/anomaly detection and malware mitigation


Acknowledgments

I would like to thank Ericsson Finland for giving me the opportunity to gain hands-on experience in network security through this project. I particularly appreciate the support and guidance of my manager Jarno Kyykkä, my supervisor Joel Reijonen, and my colleagues Juha Eskonen and Adam Peltoniemi in everything from working with the company’s internal systems to applying general data science tools to network analysis.

Furthermore, this work would not have been possible without the thorough supervision of Prof. Pauli Miettinen and Dr. Gergely Matefi. Dr. Matefi’s extensive knowledge of network management and malware analysis provided practical support, especially in the data collection stage, as well as general guidance on all technical matters of the thesis. Prof. Pauli Miettinen’s academic expertise and knowledge of data analysis tools assisted with the detection logic and reporting coherence. I am grateful to both for their time and effort, without which this work would not have been completed as it is.

I would also like to thank my friends and family whose love and encouragement carried me through the most stressful phases of this thesis.


1. Introduction

Botnet refers to an overlay network of devices instructed by one or many users (known as botmasters) to carry out malicious activities like data theft, spamming, phishing, or Distributed Denial of Service (DDoS) attacks (Silva et al., 2013). After initially infecting IT devices like phones and computers, botmasters quickly discovered the massive computational potential and wide accessibility of Internet of Things (IoT) devices. Botmasters thus began accumulating flocks of infected IoT gadgets at a rate never seen before (I. Ali et al., 2020). The resulting IoT botnets have the ability to take down victim servers with DDoS attacks of unprecedented power (Woolf, 2016). Consequently, the surge of IoT devices, which is expected to accelerate over the years, makes mitigating botnets an ever more pressing matter (Lueth, 2018).

Botnets use different technologies and designs, but they have one defining feature in common: the presence of a Command and Control (C&C) setup through which the botmaster creates and controls a botnet (García, Zunino, et al., 2014). Many detection efforts focus on identifying the communication between an infected device and a C&C center (Bilge et al., 2012; Gu, Perdisci, et al., 2008; Gu, Zhang, et al., 2008). This type of detection enables timely mitigation of the infection, preferably before the device causes any harm. Such methods seek to profile the behavior of a device through a statistical analysis of its traffic traces (Xing et al., 2021). The profiling is generally built on observations about the C&C architecture of the botnet and assumptions on its future functioning (Bilge et al., 2012). With the continuous development of botnet malware, their C&C architectures keep evolving to become more resilient and adapted to various purposes and networks (Silva et al., 2013). It is thus important to continuously monitor the botnet malware landscape to re-evaluate the assumptions of existing detection systems.

Specific terminology is used to describe botnet components. A botmaster is the user operating the botnet and sending commands to the devices (Khattak et al., 2014). A bot (sometimes also called zombie) is the infected device (Puri, 2003). A group of zombies forms a botnet, which designates the whole network structure (Puri, 2003). Command and Control (C&C) is the channel through which a botmaster controls the botnet. The C&C channel can be either a server or another infected machine (i.e., a peer). In this report, we use the words peer and server interchangeably to designate an entity with which a device is communicating, since we do not seek to distinguish between servers and regular machines.

The contribution of this thesis is threefold. First, we propose an empirical analysis of the C&C architectures and corresponding communication patterns observed in the traffic captures of recent IoT botnet malware, through a representative sample of 14 malware families observed in thousands of honeypots between March and April 2021. Second, we describe processes to simulate traffic mimicking the main communication patterns observed in the empirical study for further analysis and training of detection models. Third and last, we propose a detection system based on a Random Forest model and a set of statistical features characterizing an infected device or a malicious channel. The features are used to train and evaluate three detection modules using real and simulated malware traces and real benign traces from Ericsson-managed networks. The detection modules classify traffic aggregates, then alerts from these classifications are grouped per entity (device or peer). A security analyst can define a per-entity threshold to raise the alarm on malicious components beyond a certain number of accumulated aggregation alerts.

1.1 Research Questions

This thesis focuses on studying recent botnet malware targeting Linux platforms typically used by IoT devices. The goal is to study the C&C communication patterns of the most popular malware families at the time of the study (March to April 2021), and propose detection methods catering for each pattern. Specifically, we would like to answer the following questions:

1. What are the main patterns in the C&C communication of current IoT botnet malware?

(a) How do specific malware families implement their C&C communication?

(b) Are these patterns different from what was studied for IT-based botnet malware?

2. How well can we detect the C&C traffic of each pattern?

(a) What features allow the detection of each pattern?


(b) How well does the detector perform with background traffic of various characteristics?

Question 1 is addressed through an empirical study in which we analyze traffic traces of malware retrieved from different honeypots. Chapter 4 explains the data collection procedure for the study, while Sections 6.1, 7.1, and 8.1 present key observations on the main communication patterns observed in the collected malware.

Question 2 is answered by building a detection system with a module tailored for each communication pattern. Chapter 5 presents a generic overview of the various elements at play in this system, while Chapters 6, 7, and 8 describe the details of specific modules.

The rest of the chapters present the background of botnets and IoT networks (Chapter 2), a literature review on botnet detection techniques (Chapter 3), and a discussion on the limitations of the system and other practical considerations (Chapter 9).

1.2 Design of the Detection System

Our proposed system is meant to detect Command and Control (C&C) traffic (also known as botnet control activity) in industrial IoT networks. We focus on this type of traffic (as opposed to attack activity) since it allows for early mitigation of the infection (Gu, Zhang, et al., 2008). Attack activity, such as DDoS, alters the traffic enough to raise suspicions in carefully monitored environments, like industrial networks (Alahmadi et al., 2020). Furthermore, attacking bots also maintain C&C communication, which makes their detection a subclass of this problem.

In industrial networks, we can observe the traffic from and to monitored devices. We cannot observe the interactions of all components of a botnet, as is the case sometimes with Internet backbone networks (Bilge et al., 2012). Instead, we want to know, from the traffic behavior of a monitored device, if it is talking to a C&C entity or not. This means classifying either the device itself or its peers as malicious or infected. Once a suspicious entity is flagged, mitigation measures can be deployed to neutralize the bot or take down the server.

From a practical point of view, the system is meant to be used by a security analyst in real time, as a complement to other anomaly or malware detection systems. The idea is to have a monitoring interface presenting the system’s predictions for each monitored device or peer, and raise an alarm once a predefined number of alerts have accumulated. The alarm threshold can be manually set by the security analyst depending on the characteristics of the network, or optimized automatically. Ultimately, for a given threshold, the system is supposed to identify all malicious entities and return a manageable number (if any) of false alerts for the analyst to check manually.

To achieve these goals, we build detection modules (henceforth detectors) to classify aggregated traffic flows between every device and all of its peers (i.e., for every observed communication channel in the network). Flow aggregate classification allows the system to operate in near real time, assuming an aggregation window (i.e., the time span over which traffic features are aggregated) shorter than 5 min. By counting the positive aggregates per channel or device, we can monitor the behavior of devices over time and only flag them once the number of alerts is above a certain threshold. This setup, as opposed to raising alerts based on aggregate classification only, helps mitigate misclassifications of the system and directly identify an entity to neutralize.
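The aggregation step can be illustrated with a minimal sketch. It groups flows by channel and fixed time window, then computes a few summary statistics per aggregate. The flow fields (`device`, `peer`, `start`, `n_bytes`), the 60 s window, and the chosen statistics are assumptions made for illustration; they are not the feature set used by the detectors in this thesis.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import NamedTuple


class Flow(NamedTuple):
    device: str   # monitored IoT device (hypothetical field name)
    peer: str     # remote endpoint of the channel
    start: float  # flow start time in seconds
    n_bytes: int  # bytes transferred in the flow


def aggregate(flows, window_s=60.0):
    """Group flows per (device, peer, window index) and summarise each group."""
    groups = defaultdict(list)
    for f in flows:
        groups[(f.device, f.peer, int(f.start // window_s))].append(f)

    features = {}
    for key, group in groups.items():
        sizes = [f.n_bytes for f in group]
        starts = sorted(f.start for f in group)
        # Inter-arrival times between consecutive flows in the window.
        gaps = [b - a for a, b in zip(starts, starts[1:])]
        features[key] = {
            "n_flows": len(group),
            "bytes_mean": mean(sizes),
            "bytes_std": pstdev(sizes),
            "gap_mean": mean(gaps) if gaps else 0.0,
            "gap_std": pstdev(gaps) if gaps else 0.0,
        }
    return features
```

Each resulting feature dictionary corresponds to one traffic aggregate that a classifier could label as benign or malicious. In this sketch, a channel with near-constant flow sizes and inter-arrival times yields standard deviations close to zero, which is one plausible signature of regular, beacon-like communication.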

Identifying the malicious entity itself depends on the network behavior of the monitored device. If a device talks to multiple peers with high regularity, these peers might appear as false positives and delay the distinction of the real malicious address based on alert accumulation. In that sense, peer detection does not happen in real time; rather, aggregate classification allows real-time monitoring of the network. It is up to the security analyst to set the threshold value at which to raise alerts, which involves a tradeoff between fast and accurate detection. If a non-infected device happens to exchange C&C-like flow aggregates occasionally with a benign peer, the system is unlikely to flag either component as suspicious over time. Similarly, the system can still identify a malicious component despite having some false negative classifications.
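The alert accumulation logic can be sketched as follows. The function names, the input format (one `(peer, is_malicious)` pair per classified aggregate), and the use of `>=` for the threshold comparison are assumptions made for illustration rather than the implementation used in this work.

```python
from collections import Counter


def accumulate_alerts(predictions):
    """Count positive (malicious) aggregate predictions per peer.

    `predictions` is an iterable of (peer, is_malicious) pairs, one per
    classified traffic aggregate.
    """
    counts = Counter()
    for peer, is_malicious in predictions:
        if is_malicious:
            counts[peer] += 1
    return counts


def flagged(counts, threshold):
    """Return the peers whose accumulated alerts reach the threshold."""
    return sorted(peer for peer, n in counts.items() if n >= threshold)
```

With such a scheme, a benign peer that occasionally produces a C&C-like aggregate stays below the threshold, while a peer that keeps producing positive aggregates eventually crosses it. Lowering the threshold speeds up detection at the cost of more false alerts.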

Figure 1.1 gives an overview of the system’s use in a real context, while Figures 1.2 and 1.3 give an example of a monitoring interface showing the accumulation of the model’s predictions per peer or per hour and peer. In terms of scalability, we expect the system to handle networks with about 10 000 devices and an average of 1 000 peers per device per day. We can summarize the system requirements as follows:

1. Perform detection online, and allow identifying malicious components (bot or peer) as quickly as possible.

2. Work with devices (and networks) of different behavioral characteristics.

3. Scale to network sizes of around 10 000 devices with an average of 1 000 peers per device (i.e., a total of 10 000 000 active channels in the network per day).

4. Keep the number of reported false positives below a manageable threshold.

[Figure 1.1 diagram. Components: IoT devices; network probe; data (flows per traffic group); features (statistics of aggregated flows); pre-trained classifier (Random Forest); benign and malicious aggregates; real-time monitoring system (distribution of aggregates per peer, and aggregates per peer in an hourly view, showing the C&C server and false positives); security analyst, who sets thresholds for alerts and manually flagged entities.]

Figure 1.1: Overview of the detection system in a real context.

[Figure 1.2 bar chart. Axes: peers (x.x.x.x, peer1, peer2, peer3, peer4) against the number of malicious traffic aggregates (ticks from 0 to 7000).]

Figure 1.2: An example of a peer monitoring graph. The bar labeled x.x.x.x corresponds to the simulated C&C server.

[Figure 1.3 bar charts. Two panels, one for peer x.x.x.x and one for peer peer1. Axes: hours (11 to 23) against a count axis with ticks from 0 to 200.]

Figure 1.3: An example of a per hour monitoring graph. The simulated C&C peer x.x.x.x shows a higher number of accumulated malicious aggregates compared to the benign peer peer1.

2. Background

2.1 Overview of Botnets

Botnets are groups of infected devices instructed by an operator to perform (often criminal) tasks at a large scale (Negash & Che, 2015). The availability of permanently connected and vulnerable devices on the Internet has allowed botnets to reach very large sizes and cause billions of dollars in damages (Sivanathan, 2019). According to Kount (2020), two out of three companies estimate botnet-related damages at $100 000 per attack. As an example of scale, at the height of its activity, Storm was estimated to control between 250 000 and 1 000 000 devices according to an analyst from F-Secure, the company credited for giving the malware its name (Garretson, 2007). The increasing number of malware variants and the widespread geographical distribution of their C&C servers impede timely mitigation efforts (European Union Agency for Cybersecurity, 2020). Figure 2.1 presents some facts about the distribution of C&C servers. Furthermore, with the continuous development of botnet architectures and modes of operation, no detection method is likely to remain relevant for very long (Negash & Che, 2015). Continuous effort should thus be maintained to study the most recent trends in bot malware and upgrade the state-of-the-art security tools in use.

A defining property of botnets is the command channel, which transforms isolated devices into a super-computer capable of launching cyberattacks at scales never seen before (Negash & Che, 2015). Different implementation options exist for C&C channels, with various tradeoffs in resilience and ease of operation (see Section 2.4). Research interest in botnets began in the late 1990s, with publications on the topic increasing exponentially over the years. In this chapter, we review surveys from the early days of botnets (Puri, 2003), the expansion stage when P2P botnets became popular (Khattak et al., 2014; Negash & Che, 2015; Silva et al., 2013), and the most recent studies covering new architecture types and IoT-specific malware (Cozzi et al., 2018; Edwards & Profetis, 2016). We also review the most recent threat reports from key industry players for the most up-to-date statistics and information (European Union Agency for Cybersecurity, 2020; Kount, 2020; Spamhaus Malware Labs, 2019).

[Figure 2.1 panels: (a) Number of Mirai variants in recent years. (b) Geographical distribution of C&C servers. (c) Number of C&C servers in recent years.]

Figure 2.1: General information about the number of Mirai variants (a), the geographical distribution of their C&C servers (b), and the number of C&C servers in recent years (c). Note the 57 % increase in the number of C&C servers between 2018 and 2019. Source: European Union Agency for Cybersecurity, 2020.

[Figure 2.2 diagram. Phase 1: Initial and Secondary Infection; Phase 2: Connection or Rally; Phase 3: Malicious Activity; Phase 4: Maintenance and Upgrade.]

Figure 2.2: Botnet life cycle.

2.2 Botnet Life Cycle

During their existence, botnets go through a series of phases known as the botnet life cycle (see Figure 2.2). The infection stage (phase 1), sometimes divided into initial and secondary infections, is when the bot first receives the malware code. Shortly afterwards, the infected host tries to make itself known to its botmaster or peers through a process known as rallying. This is also the step in which the device establishes a C&C channel used to receive commands which are executed in the third phase of the cycle: malicious activity. Lastly, the botmaster maintains and upgrades its fleet by dispatching software updates in the maintenance and upgrade phase (Silva et al., 2013).

The following paragraphs describe each phase in turn. Phase 2 is discussed in greater detail in Section 2.3, since it is closely related to C&C communication.

Infection. Infection with botnet malware happens similarly to any computer virus infection: through unwanted downloads of malware files from an online source (Angrishi, 2017). Said source can be a suspicious website, infected email attachments, removable disks, or another bot scanning for vulnerable devices nearby (Silva et al., 2013). For IoT devices in particular, the infection exploits weak credentials (usually left as defaults), or public-facing services like webservers (Sivanathan, 2019). Once the initial malware file is downloaded, the host connects to a network database to download and install the malware binaries, in a step sometimes known as secondary infection (Silva et al., 2013). Those binaries are acquired using HTTP, P2P, or FTP protocols (Silva et al., 2013). If the binaries’ database is the same as the C&C server, the secondary infection stage may be the same as rallying (Silva et al., 2013).

Rallying. Rallying, sometimes known as the connection phase, refers to the step at which a bot finds its way to the C&C server (Khattak et al., 2014). It happens every time the bot is restarted to ensure its continuous integration into the botnet. This also means that the action may be observed multiple times in the life cycle of the bot. Rallying relies on hard-coded IP addresses, DNS-based methods, or more recently communication through Twitter, Facebook, and GitHub (Xing et al., 2021). Technical details of each option are discussed in Section 2.3.

Malicious Activities. Malicious activities carried out by botnets are intended to gain benefits for the botnet operator, in the form of financial profit, intelligence collection, or service disruptions. IRC, the technology which gave birth to botnets, was initially intended for benign use (Puri, 2003). Today, botnets are used to send unwanted emails in bulk (spamming), deplete the bandwidth of a target server (Distributed Denial of Service (DDoS)), fake clicks on web advertisements to generate income (click-fraud), steal information from the victim (exfiltration), or mine bitcoins (coinmining), to mention only a few potential cyber crimes (Silva et al., 2013). Although some of these attacks can be blocked at the destination (e.g., using email spam filters), the activity is still allowed to travel across the backbones of the Internet, wasting network resources in the process (e.g., Kount (2020) reported that botnets account for 40 % of all Internet traffic). Botnets are continuously evolving in their usage and architecture, as reported by European Union Agency for Cybersecurity (2020). Still, to this day, DDoS attacks are some of the most common uses of botnets, with well-known companies like Amazon, CNN, and eBay at the receiving end of the attacks. Besides attacks, bots perform scanning to propagate their malware to as many devices as possible. Botmasters mostly look for victims with favorable features, like high transmission rates, low levels of security, low monitoring rates, and distant locations (Puri, 2003).

Maintenance and Upgrade. Maintenance and upgrade is the stage at which the botmaster updates the code of the bot to evade detection, increase functionality, or migrate to another C&C server (Negash & Che, 2015). As an example, upon its discovery, Hajime did not contain any malicious activity code in its binaries (Edwards & Profetis, 2016). This led some analysts to call the malware a white worm, which is a type of malicious program aiming to increase the security of the infected device to prevent other malware from gaining access to it (e.g., Hajime blocks ports 23, 7547, 5555, and 5358, known to be exploitable on IoT devices) (Muncaster, 2017). However, the fact that the bot can receive updates with more malicious code later on warrants attention and close monitoring of the progress of the malware (Edwards & Profetis, 2016).

2.3 Rally Systems

Rallying refers to the many ways an infected device can join the C&C channel. Below we review the main rallying options known to date.

Hard-coded IP addresses. Hard-coded IP addresses are the most straightforward rally alternatives. The list of IPs is usually given with the malware binaries. The IPs could refer to the C&C server directly, or point towards an intermediary server (sometimes known as a stepping stone) used as a proxy to the command server (Silva et al., 2013). In the case of P2P botnets, the list contains the addresses of an initial set of peers which the newly infected bot will try to reach iteratively. This list is usually provided in a separate location from the malware binaries, cleverly disguised in the infected machine’s system with an elusive name (e.g., Kelihos/Hlux stored its peer list in the Windows registry under HKEY_CURRENT_USER/Software/Google (Werner, 2011)). This operation is sometimes known as seeding (Khattak et al., 2014). Using IP addresses directly makes the C&C communication stealthier since it does not rely on DNS services. This option is however the easiest to neutralize since the servers can be identified directly (Silva et al., 2013).

DNS-based rallying. DNS-based rallying is a more resilient model in which the infected host gets the IP address of the server by resolving one or many domain names using DNS queries (Xing et al., 2021). This allows the botmaster to generate multiple IP addresses associated with the C&C server (Xing et al., 2021). To increase resilience, the DNS servers can be spread out geographically in countries unwilling to cooperate in taking down cyber criminals (Negash & Che, 2015). Domain names can be hard-coded or dynamically generated. In the first case, the domain name is provided alongside the malware binaries much like an IP address, with the added flexibility of changing the IP address frequently to evade detection without having to update the bots (Khattak et al., 2014). An extreme version of this updating is known as fast-flux DNS, in which the IP address associated with the domain name changes every few minutes, making the server difficult to track (Negash & Che, 2015). Generated domain names are created by the hosts themselves, using a Domain Generation Algorithm (DGA) agreed upon with the botmaster (Khattak et al., 2014). DGAs generate encrypted strings in a time-dependent manner (using a time component in the encryption process), or in a time-independent manner (using other seeds like semi-random sequences or foreign exchange rates, for example) (T. S. Wang et al., 2017). This allows the creation of domain names with varying patterns that are difficult to identify (Zago et al., 2020). Mitigation efforts in this scenario include reverse-engineering the encryption algorithm in order to predict and register the future domain names before the botmaster does (Zago et al., 2020).
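To make the time-dependent case concrete from a defender’s point of view, the following toy sketch derives domain names from a shared seed and the current date, then enumerates the names for upcoming days. The hashing scheme, the seed, the label length, and the reserved `.example` suffix are all assumptions chosen for illustration; the sketch does not reproduce the algorithm of any real malware family.

```python
import hashlib
from datetime import date, timedelta


def candidate_domains(seed: str, day: date, count: int = 3, tld: str = ".example"):
    """Derive `count` pseudo-random domain names from a seed and a date."""
    domains = []
    for i in range(count):
        material = f"{seed}:{day.isoformat()}:{i}".encode()
        digest = hashlib.sha256(material).hexdigest()
        domains.append(digest[:12] + tld)  # 12 hex characters as the label
    return domains


def upcoming(seed: str, start: date, days: int):
    """Enumerate the candidate domains for the next `days` days, as an
    analyst who has recovered the seed and the algorithm might do."""
    return {
        start + timedelta(days=d): candidate_domains(seed, start + timedelta(days=d))
        for d in range(days)
    }
```

Because both sides compute the same names from the same inputs, an analyst who recovers the seed and the algorithm can list future candidates ahead of time, which is the basis of the pre-registration strategy mentioned above.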

2.4 C&C Architectures

In terms of architecture (also known as topology), C&C environments can be divided into four categories: centralized, decentralized (or Peer2Peer (P2P)), hybrid, and modern platforms (Silva et al., 2013; Xing et al., 2021).

Centralized. Centralized architectures have one or a few C&C servers to which all bots must connect (Negash & Che, 2015). The advantages of this type of architecture include quick reaction times, good coordination across bots, and the ability to exchange direct feedback between the botmaster and the bots (Xing et al., 2021). The drawback of this architecture is that the C&C server(s) represent a single point of failure for the botnet (T. Wang & Yu, 2009). In terms of protocols, centralized botnets often rely on IRC channels or HTTP protocols. The former option dates back to the early days of botnets and has the additional weakness of being unusual in today’s networks, therefore easy to detect and block (Silva et al., 2013). HTTP is nowadays often chosen as a default protocol for centralized botnets due to its ubiquity and consequent ability to blend with benign traffic (Silva et al., 2013).

Decentralized. Decentralized or P2P botnets can be formed using the vulnerable hosts of an existing P2P network (parasite mode), by joining an existing P2P network and depending on its peers for C&C communication (leeching mode), or by building an independent botnet where all members are bots (bot-only mode) (Negash & Che, 2015). Once a new bot is infected, it goes through a bootstrap phase to join the network. One way to implement this phase is by hardcoding an initial list of peers in each P2P bot. This method is common in unstructured and superpeer networks (Silva et al., 2013). The new bot would attempt to access every peer on the list to keep an updated list of peers (Silva et al., 2013). An alternative method, more common in structured networks, is to retrieve peer information from a shared web cache, whose address is given to the new bots (Silva et al., 2013). It is also possible to retrieve a list of peers from the immediate infecting machine, which can reduce the overhead of iteratively checking the list by the new bot (Negash & Che, 2015).

P2P protocols, like torrents, were introduced in an effort to increase botnets’ resilience (Silva et al., 2013). This topology is not only harder to dismantle, due to the existence of multiple control servers, but it is also harder to detect, since talking to multiple peers generates more stochastic traffic metrics (Khattak et al., 2014). Botnets of this category create what is known as overlay networks. We can distinguish three types of such networks: unstructured, structured, and superpeer (Khattak et al., 2014). The unstructured type refers to random topologies with no possibility for key lookups. The structured networks usually implement a distributed hash table used for routing. In superpeer networks, only a small subset of peers is selected to perform networking operations like search and control. Several known P2P applications, like Skype (in its pre-2014 form) and Gnutella, use superpeers (Silva et al., 2013). This model is usually more visible and vulnerable to targeted attacks since it relies on a limited number of key components, which makes it less popular among efficient botnets (Khattak et al., 2014).
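The key lookup that distinguishes structured overlays can be illustrated with a minimal sketch of a distributed hash table. Keys and peers are hashed into the same identifier space, and a key is assigned to the peer with the closest identifier. The XOR distance used here follows the Kademlia family of DHTs and is an assumption, as are the 16-bit identifier space and the function names; the source does not specify which DHT design a given botnet uses.

```python
import hashlib


def node_id(name: str, bits: int = 16) -> int:
    """Hash a name into a `bits`-wide identifier space."""
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest, "big") % (1 << bits)


def responsible_peer(key: str, peers: list[str]) -> str:
    """Return the peer whose identifier is closest to the key (XOR metric)."""
    k = node_id(key)
    return min(peers, key=lambda p: node_id(p) ^ k)
```

Every participant that knows the same set of peers resolves a given key to the same peer, so a lookup needs no central index. Unstructured overlays offer no such deterministic mapping, which is why key lookups are not possible in them.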

Hybrid. Hybrid architectures fall between centralized and decentralized models. They combine the limited number of controlling nodes from a centralized model with the multiple access points for communication from the P2P model (Silva et al., 2013). As an example, in hierarchical botnets, the communication trickles down from a top to a bottom bot, thus every message travels through the entire hierarchy. Another example of a hybrid model, albeit a theoretical one, is the random C&C. In this architecture, the infected hosts do not contact the control center or their peers, but they wait instead to be contacted by the botmaster before an attack (Silva et al., 2013). This is a highly resilient model since it does not use synchronized or persistent communication that might be more easily detected. Its downsides include limited scanning activity, potential delays, and asynchronous behavior (Khattak et al., 2014). The setup is also more complicated than either the centralized or decentralized options, which implies more demanding implementation and maintenance (Khattak et al., 2014).

Modern platforms. Modern platforms, including social networks, blockchains, and cloud services, offer a fertile ground for botnets (Xing et al., 2021). All three environments guarantee decentralization and concealment of public service resources, thus making them ideal upgrades from classical C&C topologies (Xing et al., 2021).

On cloud platforms, botmasters pretend to be legitimate users to build a botnet on the virtual machines of the service (François, Wang, Bronzi, et al., 2011). The cloud-based botnets, sometimes known as botclouds, are quick to create and are always ready to be used by the botmaster, unlike physical compromised devices which can be turned off by their legitimate users (Clark et al., 2011). Other botnets use social media platforms like Facebook, Twitter, and WeChat as transmission channels (Zhang et al., 2016). For example, Stamp et al. (2013) created SocialNetworkingBot, a proof of concept for a botnet using Twitter feeds as a C&C architecture.

Finally, blockchain, a distributed database of cryptographic records used, among other things, to coordinate Bitcoin transactions, is an ideal carrier for C&C communication to piggyback on (S. T. Ali et al., 2017). One advantage of Bitcoin networks is the guaranteed anonymity for the transactions, which also implies that identifying one bot does not necessarily lead to taking down others (Xing et al., 2021). Another advantage is the high cost and legal implications of shutting down a Bitcoin network, including the technical challenge of updating the Bitcoin protocol of clients scattered all around the world, and the difficulty of imposing regulations on the libertarian ideology of Bitcoin (Bustillos, 2013).

Despite the advantages of these modern C&C platforms, traditional platforms still prevail in recent malware due to their simplicity and availability in open-source malware systems.


2.5 IoT Networks

The phrase Internet of Things (IoT) was first used by Kevin Ashton in 1999, in a presentation for Procter & Gamble, in reference to the Radio-Frequency Identification (RFID) technology used by the company to monitor their supply chain (Ashton, 2009). The wording was meant to shed light on the revolutionary potential of linking physical objects to the Internet. The phrase has risen in popularity since then, and with it the cyber-physical devices it designates. Today, IoT refers to all physical objects capable of sensing (and potentially controlling) their environment without human intervention. Examples of such devices range from the automated sensors of production lines to the smart house appliances and entertainment units populating today’s homes.

In terms of numbers, IoT Analytics, a company providing insights on IoT markets, expects to see more than 34 million IoT devices in use by 2025 (Lueth, 2018). This increase is as exciting as it is worrisome due to the security and privacy hazards posed by these devices. In fact, more than 90 % of IoT gadgets have access to private or sensitive data like users’ conversations and personal files (Digicert Inc., 2018). At the same time, more than 10 % communicate through unencrypted text messages (Greene, 2019). In addition, consumer devices in particular are notorious for keeping default login credentials against common recommendations, and for receiving limited security updates throughout their lifetime (Sivanathan, 2019). These practices leave the devices vulnerable to attacks of all kinds, compromising the privacy, infrastructure, and sometimes safety of the users.

In 2015, for instance, The Guardian reported that a Barbie Doll could be easily turned into a spying device (Gibbs, 2015). In 2017, a fish tank was used to hack into a casino’s data system, according to The Washington Post (Greenberg, 2017). Incidents like these are numerous, costing companies millions of dollars yearly (Schiffer, 2016).

This state of vulnerability makes IoT devices ideal candidates for botnet infections. In 2016, the Mirai malware made the news as one of the first malware families to build a botnet of IoT devices. With an estimated 100 000 bots in its network, the botnet managed to take down the security analysis blog Krebs on Security in September 2016 with a DDoS attack, then moved on to paralyze the servers of Dyn, a company controlling much of the Internet Domain Name System (DNS) infrastructure. The attack on Dyn was estimated at 1.2 Tbit/s by security analysts, which is twice the strength of any previously recorded DDoS attack (Woolf, 2016).

This destructive exploitation of IoT devices soared in subsequent years, with the European Union Agency for Cybersecurity (2020) reporting a 57 % increase in observed variants of Mirai in 2019, and a shift in the attack strategy of Mirai botnets from DDoS to credential theft. Other malware families have also been ravaging the Internet since 2016, like Tsunami, Hajime, and Coinminer, to mention only a few (Cozzi et al., 2018).

IoT networks are often monitored using methods known as network forensics (Ghafir et al., 2016). Such methods seek to organize network traffic in a comparable format, including metrics and logged interactions. These measurements are relevant in domains like monitoring, optimization, setup evaluation, and intrusion detection (Hofstede et al., 2014).

There are two main ways to capture the necessary data for network management (Khan et al., 2016). The first is an active way, in which traffic is injected into a network and measured with tools such as ping and traceroute (Hofstede et al., 2014). The second option is known as passive monitoring, in which existing traffic is observed in real conditions as it passes by a measurement point (Fernandez et al., 2017). In this work, we are primarily interested in the latter type, as we will use data from real environments collected by Ericsson Finland in our experiments.

Passive monitoring can be done at the level of packets or flows (Hofstede et al., 2014). A packet is a unit of data made of a header and a payload, used as the basis of data communication in computer networks through a technology known as packet switching. A flow consists of a collection of packets sharing attributes, in particular their source and destination addresses, and is characterized by metadata including the number of packets sent and the total amount of data transferred (Claise et al., 2013). Flow exporting technologies became popular when networks grew too large for packet monitoring (Ghafir et al., 2016). In the nineties, flow export and analysis captured the interest of major industry players in the domain of networking, leading to two main solutions still used to date: Cisco’s NetFlow and the Internet Engineering Task Force (IETF)’s IPFIX (Fernandez et al., 2017).
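The relation between packets and flows can be illustrated with a minimal sketch. The packet fields (`ts`, `src`, `sport`, `dst`, `dport`, `proto`, `size`) and the function name are hypothetical and chosen for illustration only; real flow exporters such as NetFlow or IPFIX probes also handle timeouts, TCP state, and many more attributes.

```python
def packets_to_flows(packets):
    """Group packets into flows keyed by endpoints and protocol."""
    flows = {}
    for pkt in packets:
        # Packets sharing these attributes belong to the same flow.
        key = (pkt["src"], pkt["sport"], pkt["dst"], pkt["dport"], pkt["proto"])
        flow = flows.setdefault(
            key, {"packets": 0, "bytes": 0, "start": pkt["ts"], "end": pkt["ts"]}
        )
        # Flow metadata: packet count, volume, and first/last timestamps.
        flow["packets"] += 1
        flow["bytes"] += pkt["size"]
        flow["start"] = min(flow["start"], pkt["ts"])
        flow["end"] = max(flow["end"], pkt["ts"])
    return flows


# Illustrative packets (addresses from documentation ranges).
packets = [
    {"ts": 0.0, "src": "10.0.0.2", "sport": 40000, "dst": "192.0.2.1",
     "dport": 443, "proto": "tcp", "size": 120},
    {"ts": 0.4, "src": "10.0.0.2", "sport": 40000, "dst": "192.0.2.1",
     "dport": 443, "proto": "tcp", "size": 80},
    {"ts": 1.0, "src": "10.0.0.2", "sport": 5353, "dst": "192.0.2.53",
     "dport": 53, "proto": "udp", "size": 60},
]
flows = packets_to_flows(packets)
```

The three packets collapse into two flow records, which is the reduction in data volume that made flow export attractive for large networks.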


3. Literature Review: C&C Detection Methods

C&C detection is concerned with identifying a botnet infection based on C&C traffic.

Methods are given network traces of devices or whole networks and are expected to alert to the presence of malware either in the form of an infected device or a malicious peer.

Detection methods have evolved over time to accommodate the ever-developing nature of C&C communication (Silva et al., 2013). These methods can be broadly classified into passive methods, which were widely adopted in the early days of botnet detection, and active methods, which usually rely on network monitoring using statistical tools or machine learning models (Hyslip & Pittman, 2015).

Since detection methods developed in response to changes in C&C infrastructure, they can also be classified depending on the type of architecture they are best suited for (Hyslip & Pittman, 2015). The following sections review the main passive and statistical methods for the two most common C&C architectures: P2P and centralized. The review is supported by key literature surveys from different years (see Table 3.1) in addition to the original paper for each method presented (see Tables 3.2 and 3.3).

Table 3.1: Surveys included in the literature review

| Survey | Description |
|---|---|
| García, Zunino, et al., 2014 | Network-based C&C detection methods |
| Hyslip and Pittman, 2015 | Overview of detection methods per C&C infrastructure |
| Singh Rawat et al., 2018 | Survey on P2P botnets |
| Xing et al., 2021 | Recent comprehensive survey with extensive ML methods |


3.1 Honeynets and Signature-Based Methods

Honeynets are one of the first passive network monitoring approaches used to study botnets. A honeynet is a group of computers, known individually as honeypots, intentionally left vulnerable on a network to attract malicious activity for analysis purposes (Hyslip & Pittman, 2015). The state of vulnerability can be achieved by using default credentials and limited security measures on the machine (e.g., a disabled firewall) (Spitzner, 2003). At the same time, the honeypot is prevented from executing any instructions received from the botmaster so as not to cause any harm (Hyslip & Pittman, 2015). The goal is to observe the behavior of new and unknown malicious software to build mitigation approaches against them (Hyslip & Pittman, 2015). Honeypots aim to behave as regularly as possible so as not to raise the suspicion of the attackers, who sometimes equip their malware with techniques to identify sandbox environments (Positive Technologies, 2021). One of the most famous implementations of a honeynet is The Honeynet Project, established in 2000 as one of the earliest botnet mitigation solutions (Spitzner, 2003).

Besides live monitoring, honeynets can also be used for analyzing captures. For example, Cooke et al. (2005) used a honeypot to capture the traffic from an IRC botnet to its C&C server, then developed signatures of botnet traffic based on an analysis of the traffic traces. The authors conclude that no connection-based variables can be used to detect a botnet from its C&C communication, since the botmaster retains the ability to modify the communication characteristics to evade any detection rules in place. Instead, Cooke et al. (2005) propose a more comprehensive approach relying on the correlation of alerts from multiple sources.

This idea of correlation was adopted by Gu et al. (2007) to develop BotHunter, a system analyzing inbound traffic inside a local area network to identify C&C communication. BotHunter relies on two plugins, Statistical Scan Anomaly Detection Engine (SCADE) (monitoring inbound ports) and Statistical Payload Anomaly Detection Engine (SLADE) (monitoring inbound payloads), and one ruleset (monitoring 1383 heuristics of known botnets and malware) developed for the open source intrusion detection system Snort. The outputs of all three units are correlated with weights associated with each type of alert to determine whether a device is infected. The system has been evaluated on virtual and real networks, and overall promising results were reported. Limitations of the system include its reliance on unencrypted communication, which is seldom used nowadays.

Zeng et al. (2010) developed another system relying on correlation across different detection modules. This time, the modules collected host-level features, such as registry changes and file system modifications, and network-level features, like network stack changes using NetFlow data. The technique was effective against IRC, HTTP and P2P botnets. However, it is not scalable and requires that the host module be installed on every monitored device, making it practically usable only in enterprise networks.

Table 3.2: Honeynets and signature-based C&C detection methods, with their main characteristics, target botnet types, main advantages and disadvantages.

| Method | Characteristics | Target botnet types | Advantage | Disadvantage |
|---|---|---|---|---|
| The Honeynet Project (Spitzner, 2003) | honeynet | centralized | launching botnet detection | limited to observed malware |
| (Cooke et al., 2005) | signature-based detection | centralized, IRC | detecting observed malware | connection values are easily tunable |
| BotHunter (Gu et al., 2007) | monitoring inbound ports and payloads | centralized, inside a LAN | good detection on simulated and real networks | requires unencrypted communication |
| (Zeng et al., 2010) | host-level monitoring and NetFlow data analysis | centralized (IRC, HTTP) and decentralized | handles multiple types of botnets | requires access to monitored devices |

3.2 Statistical and Anomaly-Based Methods

BotMiner was introduced to handle different C&C infrastructures including P2P botnets (Gu, Perdisci, et al., 2008). The wide range of detection was enabled by relying on the core description of a botnet: a group of infected devices receiving commands and executing them. With this idea in mind, BotMiner first clusters the monitored devices based on similar traffic characteristics (C-plane clusters) and malicious activity (A-plane clusters). The C-plane uses numerical traffic characteristics (like aggregated volume, packets exchanged, and flow durations) to identify groups of similarly behaving devices. The A-plane uses Snort to capture signatures of malicious activity such as scanning, spamming and downloading binaries. The botnets are then detected by correlating clusters from the two planes. Although the system is flexible and generalizes across C&C architectures, it requires a high number of infected devices for the clustering planes to be effective (Gu, Perdisci, et al., 2008).


BotGrep (Nagaraja et al., 2010) uses graph-based analysis to identify P2P botnets. The system requires an initial seed of botnet information, which can be obtained from a honeynet, to reveal the remaining peers making up the botnet by identifying the pairs of hosts talking to each other. One advantage of the method is that it is not affected by botnets that vary ports or use encryption. One downside is that it requires visibility into the whole network (e.g., by operating on an Internet backbone) and is most reliable with a high number of infected devices.

With the apparent increase in decentralized C&C architectures, modern research focuses on detecting the communication between botnet peers. As an example, BotTrack (François, Wang, State, et al., 2011) – and later on BotCloud (François, Wang, Bronzi, et al., 2011), a MapReduce version of the same system – build on the idea of BotGrep (Nagaraja et al., 2010) of using graph algorithms to identify botnet peers. The system made good use of forensic analysis with NetFlow data, and showed improved detection rates when given prior information on the botnet through honeypot captures. Reliance on graph algorithms again requires full visibility of the network, which is only possible for systems operating on the Internet as a whole.

Bilge et al. (2012) created Disclosure, a system focused on detecting C&C servers rather than bots. The method first separated server traffic from client traffic, then created three sets of features each describing one aspect of C&C communication, namely: flow-size features, client access patterns, and temporal features. While the method distinguished between benign and malicious servers, it also generated a high number of false positives at high detection rates.

More recent methods include BOTection, a system analyzing the behavior of botnet malware using a first-order Markov chain of flow connection states (Alahmadi et al., 2020). The system computes the transition probability between consecutive flows over a fixed number of flows. A Random Forest model is then trained to classify windows of flows from benign and malicious traffic. The authors found that Markov chains fitted on malware samples had distinct features compared to benign sequences (including, for example, an abundance of rejection states), such that the classification had promising results across all malware families (the F1-score was 94 % on average). Although the method seems promising in general, it requires the infected device to be in attack mode to induce changes to its flows’ connection states, so it is unlikely to work for C&C communication.
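The first-order Markov chain idea can be sketched as follows. This is a simplified illustration and not the BOTection implementation: the state labels `SF` (normal connection) and `REJ` (rejected connection) are assumed for the example, and the window is a toy sequence.

```python
from collections import Counter


def transition_probabilities(states):
    """Estimate first-order transition probabilities from a state sequence."""
    # Count each consecutive pair (current state, next state).
    pair_counts = Counter(zip(states, states[1:]))
    # Count how often each state appears as the origin of a transition.
    origin_counts = Counter(states[:-1])
    return {
        (a, b): count / origin_counts[a]
        for (a, b), count in pair_counts.items()
    }


# Hypothetical window of connection states for six consecutive flows.
window = ["SF", "SF", "REJ", "REJ", "REJ", "SF"]
probs = transition_probabilities(window)
```

In a pipeline of this kind, the transition probabilities of each window would be flattened into a fixed-length feature vector before being passed to a classifier. A window dominated by rejection states yields a high `REJ` to `REJ` probability, which matches the abundance of rejection states reported for malware samples.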

The same idea of capturing traffic characteristics through discrete state values is used by Torres et al. (2016). This time, the authors use a more powerful modeling tool: a Recurrent Neural Network (RNN) with a Long Short-Term Memory (LSTM) structure. The states are defined as a sequence of letters, each representing bins of one of three numerical features: duration, volume, and periodicity. State sequences are built between two endpoints defined by a source and peer address on one end, and a peer port and a protocol on the other end. Experiments revealed that data imbalance resulted in a high number of false alerts, which could be mitigated with oversampling or undersampling. The ideal sequence length was empirically shown to be 10 states. The model was also shown to maintain a comparable false positive rate and precision when tested on an imbalanced dataset of an unseen botnet family. This preliminary study revealed the promising potential of RNNs in botnet detection, but suffers from the same limitation as BOTection (Alahmadi et al., 2020): it is not suitable for the much stealthier C&C traffic.

Other advanced machine learning models have been used with numerical traffic features. For instance, Pektaş and Acarman (2018) train a Deep Neural Network (DNN) on over 501 statistical features, including binary encoding of state descriptors, computed from traces of benign and malicious traffic. They evaluate different DNN architectures, with the optimal one containing 2 fully-connected layers with 500 neurons each. Experiments showed that the method is able to beat state-of-the-art results on the benchmark dataset CTU13 (García, Grill, et al., 2014), with an F1-score of 99.1 %. Despite the promising results, the method can be impractical in real systems due to the computational resources required for training and complex parameter tuning. Furthermore, the authors did not analyze the misclassifications of the system, which makes it difficult to identify the limitations of the method in terms of botnet traffic characteristics.


Table 3.3: Statistical C&C detection methods, with their main characteristics, target botnet types, main advantages and disadvantages.

| Method | Characteristics | Target botnet types | Advantage | Disadvantage |
|---|---|---|---|---|
| BotMiner (Gu, Perdisci, et al., 2008) | clustering based on activity patterns | centralized and decentralized | handles multiple C&C architectures | requires multiple infected devices |
| BotGrep (Nagaraja et al., 2010) | graph-based analysis | decentralized | handles encryption and varying ports | full network visibility |
| BotTrack (François, Wang, State, et al., 2011) | graph algorithms | decentralized | beats competing methods | full network visibility |
| Disclosure (Bilge et al., 2012) | classification based on 3 feature modules | centralized | handles large networks | many false positives |
| BOTection (Alahmadi et al., 2020) | Markov chain on connection states | centralized and decentralized | insights into botnet behavior | relies on activity-induced state modifications |
| (Torres et al., 2016) | LSTM-RNN model on connection states | not specified | beats Markov chain models | relies on activity-induced state modifications |
| (Pektaş & Acarman, 2018) | DNN with 501 traffic features’ statistics | not specified | beats SOTA on benchmark CTU13 | computational cost of training and tuning |


4. Data Collection

4.1 A Word on Open-Source Datasets

In security analytics, data collection is usually the bottleneck of studies. There is a consensus in literature on a lack of good quality public data (Casas, 2020). C&C communication traces in particular are difficult to come by, as benchmark open-source datasets often feature infected devices in attack mode with only short traces of C&C activity (Biglar Beigi et al., 2014; García, Grill, et al., 2014). In addition, we focus here on IoT malware, which usually runs on Linux operating systems on top of embedded processors like ARM, PowerPC or MIPS. Compared to Windows/x86-based malware, which has been analyzed for decades, analysis of IoT malware is a young area.

Consequently, tool chains for malware analysis on Linux platforms (particularly for embedded processors) are just appearing, which means public IoT malware traces are rare. For these reasons, we decided to collect our own data. The benign data is captured in an Ericsson-operated network of industrial IoT devices, while the malicious data is collected by executing malware binaries in a partially isolated environment, known as a sandbox, where communication towards C&C servers is enabled.

4.2 Collecting Benign Data

Both the communication characteristics of a device and the heterogeneity of devices within a network affect the detector’s performance (Zhang, 2012). Devices with highly regular communication (e.g., sensors talking to a limited number of specific backends or services) can exhibit benign communication patterns similar to C&C traffic, making the detection more challenging (Bilge et al., 2012). At the same time, devices with user-generated activity often show bursty behavior that is easy to distinguish from botnet communication (Sivanathan, 2019). It would thus be interesting to test the detection method with background traffic with various levels of regularity. For this purpose, the study uses multiple datasets of traffic captures from different device groups, where each group is made of similarly behaving devices.

Table 4.1: Overview of benign datasets

| Dataset | # of Flows | # of Devices | Duration (in days) |
|---|---|---|---|
| ben1 | 545 901 | 2 136 | 3.77 |
| ben1_long | 1 525 099 | 5 000 | 1.48 |
| ben2 | 448 414 | 1 574 | 3.76 |
| ben2_long | 1 835 767 | 4 000 | 6.0 |
| ben3_long | 1 719 863 | 78 | 4.0 |
| ben4 | 864 390 | 1 686 | 3.82 |

In particular, six samples were taken from four networks managed by Ericsson. The networks are monitored using IoT Accelerator (IoT-A), Ericsson’s solution for providing connectivity to industrial IoT devices such as manufacturing sensors, vehicular systems, and networks in smart cities (Ericsson, 2021). Table 4.1 gives an overview of each dataset. The long samples come from the same device groups but either contain more devices or a capture of a longer duration. Group ben4 has devices with the most regular traffic, while group ben3_long contains the least regular communication. The data comes in -similar flow format (Hyslip & Pittman, 2015) (minus a few features like packet counts), with every flow describing the communication channel through endpoints (source/destination addresses and ports), protocol used, and general statistics describing the connection (e.g., volume exchanged, duration of exchange, TCP flags observed, etc.).

The regular features are also augmented with custom data like the unique International Mobile Subscriber Identity (IMSI).

Due to technical discrepancies, some flows are unidirectional (i.e., uplink and downlink packets processed by different probes generating separate flow records). The flows of each dataset are aggregated with a reporting interval of 1 min (same as the original sampling rate of IoT-A) to mitigate this issue by combining pairs of unidirectional flows, which usually have a periodicity of less than 1 min.
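The effect of the 1 min aggregation can be sketched as below. The record fields and the function name are hypothetical, and the sketch only sums volume and counts records. It treats the two directions of a connection as the same channel by sorting the endpoint pair, which is an assumption made for illustration rather than a description of the IoT-A processing.

```python
from collections import defaultdict


def aggregate_flows(records, interval=60):
    """Merge flow records of the same channel that start in the same interval."""
    aggregated = defaultdict(lambda: {"bytes": 0, "records": 0})
    for rec in records:
        # Direction-agnostic channel key: uplink and downlink records of the
        # same connection map to the same sorted endpoint pair.
        endpoints = tuple(
            sorted([(rec["src"], rec["sport"]), (rec["dst"], rec["dport"])])
        )
        slot = int(rec["start"] // interval)  # index of the reporting interval
        entry = aggregated[(endpoints, rec["proto"], slot)]
        entry["bytes"] += rec["bytes"]
        entry["records"] += 1
    return dict(aggregated)


records = [
    # Uplink and downlink halves of one exchange, reported by different probes.
    {"start": 5.0, "src": "10.0.0.2", "sport": 40000, "dst": "192.0.2.1",
     "dport": 443, "proto": "tcp", "bytes": 300},
    {"start": 5.2, "src": "192.0.2.1", "sport": 443, "dst": "10.0.0.2",
     "dport": 40000, "proto": "tcp", "bytes": 900},
    # Same channel, one reporting interval later.
    {"start": 65.0, "src": "10.0.0.2", "sport": 40000, "dst": "192.0.2.1",
     "dport": 443, "proto": "tcp", "bytes": 300},
]
aggregated = aggregate_flows(records)
```

A pair of unidirectional records that straddles an interval boundary would still end up in two entries with this simple bucketing, so the sketch mitigates the issue without eliminating it.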

4.3 Collecting Malicious Data

To get C&C flow data, a sandbox environment was used to run malware binaries.

A sandbox is a semi-isolated virtual machine capable of logging traffic traces in the form of PCAP files using a network monitoring tool like tcpdump (TCP Dump, 2021). The sandbox environment enables us to simulate an infected device while preventing the spread of the virus. The generated PCAP files, which record packet data from a network, are converted to flows using OpenArgus (Keary, 2021; Qosient, 2021).

The details of this procedure are presented in what follows, with Figure 4.1 summarizing the process.

Acquiring malware binaries. (Steps 1–4 in Figure 4.1) The first step is to download malware binaries. The files were retrieved from two main sources: online security-dedicated repositories (e.g., MalwareBazaar (2021)), and honeypot feeds (including a private honeypot and public ones). Since the focus of this work is on C&C communication, the malware files should be as new as possible to guarantee that their servers are still active. This is seldom the case for repository-published binaries, since their servers are taken down quickly upon detecting the botnet and C&C communication cannot be observed afterwards. The honeypot feeds offered sources more relevant to our use case. The challenge however was the abundance of such sources: a daily log of a public honeypot contained thousands of binary URLs. Intrusion logs were mapped to known malware families using pattern-based classification methods such as ClamAV, to only keep the URLs belonging to new types of malware. If the actions the malware attempted were never seen before (i.e., a completely new family of malware), or belonged to a family we did not have in our curated collection, the malware URL was saved for testing.

This reduced the number of tested URLs per day to a maximum of 50. The URLs were then used to download the malware binary in the sandbox environment. Only a handful of these binaries could run successfully on our Linux environment. Around 50 % of the binaries downloaded from honeypots established a successful connection with the C&C servers.

Collecting malware traffic traces. (Step 5 in Figure 4.1) Malware traffic traces were collected by running malware binaries in an isolated environment where the behavior of the infected device could be observed without risking the spread of the virus. To do so, a sandbox was set up using the LiSa open-source framework (Uhříček, 2021). To make sure the malware did not cause unintended harm, the binary was first run in a totally isolated sandbox (i.e., disconnected from the Internet) to watch its behavior and identify the C&C IP address(es). Then communication to the C&C IPs and infrastructural components like DNS was enabled, while all other networking activity remained blocked.

All the while, the LiSa framework recorded all observed traffic from and to the sandbox using tcpdump. The early sandbox captures lasted for 1000 s, which is a hard limit set by LiSa. We later found a way to disable this limit and get 1800 s and 2000 s captures. Both types of captures were retained for comparison purposes and for training with different dataset lengths. The difference in total number of packets is due to the different activity levels of various malware families.

Figure 4.1: Overview of malware data collection. (1) SSH/Telnet honeypots with poor authentication to lure intruders. (2) Intrusion log classifier analyzes logs’ patterns. (3) Malware binaries are downloaded to LiSa, an open-source Linux-specific Sandbox environment. (4) ClamAV identifies the malware family. (5) The malware binaries are executed within LiSa while tcpdump records traffic with close monitoring of network behavior. (6) PCAP files generated by tcpdump are processed to flows using OpenArgus.

Generating flows from PCAP files. (Step 6 in Figure 4.1) After collecting the packet captures of each malware, flow reports were generated using OpenArgus, a network monitoring software with security analysis capabilities (Qosient, 2021). We set the status reporting interval to 1 min.

4.4 Identifying C&C Channels in the Captures

The malware traffic traces collected in this study contain infrastructural traffic like DHCP and DNS, C&C flows, and other flows generated from malicious activities like scanning, depending on the malware family (e.g., Mirai is famous for performing aggressive scanning before contacting the C&C server (Antonakakis et al., 2017)). Since we are interested in the detection of C&C communication in particular, it is important to isolate flows reflecting this communication from the malware captures. This is done through two filtering steps: protocol-based filtering and activity-based filtering.

Protocol-based filtering discards broadcast messages present in the capture by removing all flows with protocols other than TCP or UDP. This rule is inspired by previous studies suggesting the popularity of these two transport layer protocols (Silva et al., 2013; Singh Rawat et al., 2018; Xing et al., 2021), and by manual inspection of the captures confirming that the discarded flows have broadcast addresses. Activity-based filtering removes flows related to malware activity like scanning or flooding attacks. Many botnet malware families are pre-programmed to perform aggressive scanning upon activation (i.e., without waiting for explicit instructions from their botmaster (Antonakakis et al., 2017)).

This behavior was particularly acute in Mirai, Hajime, and Mozi. The aggressive scanning makes activity-based detection effective on these malware families. To identify activity-related flows, a few heuristics were used to distinguish rejected communication attempts, which resulted from the sandbox environment blocking the malware attacks. The heuristics include receiving (or sending) a reset flag (for TCP connections) and a received volume or number of packets of 0 (for both TCP and UDP flows).

Once C&C communication flows are isolated, we use their peer addresses to collect the entire C&C traffic traces. The goal is to include in the dataset any previous failed attempts at contacting the C&C channels that may have been discarded in the filtering stage. It is worth noting here that some captures only contained failed connection attempts (i.e., contacting the C&C server was unsuccessful for the duration of the capture, most likely because the server was taken down previously). These captures were not considered when making observations on the malware behavior, since the focus of this work is on operational botnets with a functioning C&C server.
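The filtering steps described in this section can be sketched as follows. The flow fields (`proto`, `peer`, `flags`, `bytes_in`, `pkts_in`) and function names are hypothetical, and the sketch does not separate infrastructural traffic such as DNS or DHCP from the remaining flows, a step whose details are not covered here.

```python
def is_transport(flow):
    """Protocol-based filter: keep TCP and UDP flows only."""
    return flow["proto"] in {"tcp", "udp"}


def is_rejected(flow):
    """Activity-based heuristics for blocked or rejected attempts."""
    reset = flow["proto"] == "tcp" and "R" in flow["flags"]
    silent = flow["bytes_in"] == 0 or flow["pkts_in"] == 0
    return reset or silent


def isolate_cnc(flows):
    """Return all transport flows exchanged with peers that answered at least once."""
    transport = [f for f in flows if is_transport(f)]
    answered = [f for f in transport if not is_rejected(f)]
    peers = {f["peer"] for f in answered}
    # Re-collect by peer address so that earlier failed attempts towards
    # the same peers are kept in the trace.
    return [f for f in transport if f["peer"] in peers]


flows = [
    {"proto": "arp", "peer": "10.0.0.255", "flags": "", "bytes_in": 0, "pkts_in": 0},
    # Scan attempt blocked by the sandbox.
    {"proto": "tcp", "peer": "198.51.100.7", "flags": "SR", "bytes_in": 0, "pkts_in": 0},
    # Failed first attempt, then a successful exchange with the same peer.
    {"proto": "tcp", "peer": "203.0.113.9", "flags": "R", "bytes_in": 0, "pkts_in": 1},
    {"proto": "tcp", "peer": "203.0.113.9", "flags": "SA", "bytes_in": 420, "pkts_in": 4},
]
cnc_flows = isolate_cnc(flows)
```

Captures in which no peer ever answers return an empty list under this sketch, which corresponds to the captures excluded from the behavioral observations.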


5. General Detection System

5.1 Detection Based on Traffic Patterns

When looking for patterns in malware data, the goal is to identify characteristics allowing the distinction between C&C traffic and benign traffic. At the same time, we take note of how different malware families collected in the study organize their C&C communication, answering research question 1.b. The general patterns are then compared with the C&C architectures observed in previous IoT and IT malware as reported in literature, therefore answering research question 1.a. For generalization purposes, we analyze traffic captured from different malware binaries per family in different time periods. Our observations motivate the design of features used in the detection modules, which are then evaluated with different background traffic, as an attempt to answer questions 2.a and 2.b.

In the 46 traffic captures made, we distinguish three communication patterns: centralized-stochastic, centralized-beaconing, and decentralized-DHT. The three patterns matched the main observations reported in the literature for centralized and decentralized C&C architectures (García, Grill, et al., 2014; Hyslip & Pittman, 2015; Negash & Che, 2015).

Regarding the specific malware families studied in this work, Table 5.1 gives an overview of the C&C architecture classification of each in turn, along with the number of captures analyzed to build the observations. The three patterns are further discussed in Sections 6.1, 7.1, and 8.1 respectively.
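As an illustration of what distinguishes beaconing from less regular traffic, the sketch below computes the coefficient of variation of the intervals between consecutive flows of one channel. This measure is an assumption chosen for the example and is not necessarily one of the features used by the detection modules in later chapters.

```python
from statistics import mean, pstdev


def interval_cv(start_times):
    """Coefficient of variation of the gaps between consecutive flow starts."""
    times = sorted(start_times)
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    if len(gaps) < 2 or mean(gaps) == 0:
        return None  # not enough information to judge regularity
    return pstdev(gaps) / mean(gaps)


# Hypothetical flow start times in seconds.
beaconing = [0, 60, 120, 180, 240]   # one flow per minute
bursty = [0, 2, 3, 90, 91, 300]      # irregular, user-like activity

cv_beaconing = interval_cv(beaconing)
cv_bursty = interval_cv(bursty)
```

Values close to zero indicate highly regular intervals, as expected from a beaconing bot, while larger values indicate bursty or stochastic behavior. Regular benign devices would also score close to zero, which is why such a measure alone cannot separate C&C traffic from regular background traffic.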

5.2 Data Augmentation Through Simulation

Acquiring malware captures requires a lot of time and effort. Consequently, collecting enough data points from each malware family for comprehensive training and testing of the detection system takes months of work and requires considerable computational resources. At the same time, supervised methods like Random Forest are sensitive to the balance of classes in training data (Bader-El-Den et al., 2018). This means that, for complex benign traffic, a model can require hundreds of thousands of data points from each class (benign or malicious), which could mean even longer captures depending on the feature engineering and aggregation level employed in the preprocessing stage.

Table 5.1: Key observations from the C&C behavior of IoT malware. The number of captures per family is given in parentheses.

| Pattern | IoT Malware | Key Observations | Protocol | Rallying System |
|---|---|---|---|---|
| Centralized-stochastic | coinminer (2) | same C&C IP address in both captures | TCP | Hard-coded IP |
| Centralized-beaconing | dakkatoni (3) | multiple servers, different beaconing values per server | TCP/HTTP | DNS |
| Centralized-beaconing | ddostf (2) | fixed beaconing values; alternates between two uePorts | TCP | Hard-coded IP |
| Centralized-beaconing | dnsamp (3) | fixed beaconing values with particularly high volume (> 1000 bytes) | TCP | DNS |
| Centralized-beaconing | gafgyt (5) | fixed beaconing values except for duration (varies between 0.1 s and 60 s) | TCP | Hard-coded IP |
| Centralized-beaconing | mirai (5) | performs aggressive scanning upon activation | TCP | Hard-coded IP |
| Centralized-beaconing | nanobot (3) | near constant beaconing values across captures | TCP | Hard-coded IP |
| Centralized-beaconing | shellbot (1) | behavior and beaconing values very similar to nanobot | TCP | DNS |
| Centralized-beaconing | skidmap (2) | less regular beaconing values with high volume (> 1000 bytes) | TCP/HTTP | DNS |
| Centralized-beaconing | trojan (1) | a few initial failed C&C connection attempts | TCP | DNS |
| Centralized-beaconing | tsunami (4) | multiple C&C IPs; opens a new TCP connection with every flow | TCP | Hard-coded IP |
| Centralized-beaconing | xordos (3) | same C&C IP address across different captures; highly regular beaconing with high volume (> 1000 bytes) | TCP | Hard-coded IP |
| Decentralized-DHT | hajime (6) | traffic patterns differ per peer | UDP | DNS |
| Decentralized-DHT | mozi (6) | traffic patterns differ per peer | UDP | DNS |

Using simulated data allows us to bypass this issue. Note that class balancing techniques like oversampling and undersampling are also an option in this case. Simulation is preferred due to its other advantages discussed below. The idea here is to generate malicious flows capturing all the observations made on C&C communication, while being as close as possible to the real captures. These simulations are then used to train and evaluate the detection model with different dataset imbalance ratios, reflecting different real-world infection scenarios. Additionally, trained models are also tested on real malicious captures to assess the reliability of the detector in the real world.

Another advantage of using simulations is to study the infection of specific devices or device groups through the use of background traffic. Background traffic in this context is a collection of flows (assumed to be benign) fed to the simulator to define the behavior of an infected device. The simulator randomly selects a fraction of the background traffic devices, given by the infection ratio, to act as bots within the network. C&C flows are then generated for each device in turn for a pre-defined duration. One advantage of using background traffic in the simulation is to take into account the active times of the device and use the real IP addresses the device used during those periods. This leads to much more realistic simulations that can also turn out to be more challenging for the detector.
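The selection of bots and the generation of C&C flows can be sketched as below. This is a deliberately simplified illustration and not the simulation script used in this work: it assumes a fixed beaconing period, a single C&C address, and a hypothetical `background` structure holding each device's IP address and active period.

```python
import random


def simulate_infection(background, infection_ratio, period, flow_bytes,
                       cnc_ip, seed=0):
    """Pick a fraction of devices as bots and emit periodic C&C flows for them."""
    rng = random.Random(seed)
    devices = sorted(background)
    n_bots = max(1, round(infection_ratio * len(devices)))
    bots = rng.sample(devices, n_bots)

    flows = []
    for device in bots:
        info = background[device]
        start, end = info["active"]  # active period taken from background traffic
        t = start
        while t < end:
            flows.append({
                "src": info["ip"],   # real address used by the device
                "dst": cnc_ip,
                "start": t,
                "bytes": flow_bytes,
                "label": "malicious",
            })
            t += period
    return bots, flows


# Hypothetical background: ten devices, each active for five minutes.
background = {
    f"dev{i}": {"ip": f"10.0.0.{i}", "active": (0, 300)} for i in range(10)
}
bots, cnc_flows = simulate_infection(
    background, infection_ratio=0.2, period=60, flow_bytes=128,
    cnc_ip="203.0.113.9",
)
```

With an infection ratio of 0.2, two of the ten devices become bots and each emits one flow per minute while active. Lower ratios produce more imbalanced datasets, which is the setting in which false positives become the main concern.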

An important concern when using simulated data is the realism of the simulation. Ideally, the data should neither overfit a single capture of a malware family nor be so general that it fails to reflect key characteristics of the traffic (namely interval values and communication patterns). For this purpose, the simulation script starts by implementing the key observations presented in Sections 6.1 and 7 for each C&C pattern. The quality of the simulation is then assessed by (1) distribution plots comparing simulated and real data, and (2) a comparison of the detection results on real and simulated data. Results of this evaluation are presented for each detector in turn.

5.3 Experiment Pipeline

After choosing the detection model and writing a data augmentation script, it is important to describe an experimentation protocol to guide the evaluation of the detectors. This protocol is presented in the form of a pipeline shown in Figure 5.1.

Dataset generation. The pipeline starts with randomly selected data from benign and malware samples (step 1). Specifically, we start with 6 benign datasets and 46 real malware captures, as described in Sections 4.2 and 4.3 respectively. For the benign data, we split each group into training and test datasets by randomly assigning half of the devices in the group to each. The splitting process is repeated 5 times to remove the bias of device selection from the study. Later on, the results of each experiment are averaged over the 5 selections, and the standard error (defined as the standard deviation divided by the number of attempts) of each measured metric is estimated. The result of this step is a total of 30 pairs of training and testing datasets (6 benign datasets, each split 5 times).
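The repeated device split can be sketched as follows. This is an illustrative sketch only: the group names, device identifiers, group size of 10, and the helper names `split_devices` and `make_splits` are all hypothetical.

```python
import random


def split_devices(devices, rng):
    """Shuffle the devices and assign one half to training, the other to testing."""
    shuffled = list(devices)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def make_splits(benign_groups, n_repeats=5, seed=0):
    """Build one (train, test) device split per benign group and repetition."""
    rng = random.Random(seed)
    pairs = []
    for name, devices in benign_groups.items():
        for repeat in range(n_repeats):
            train, test = split_devices(devices, rng)
            pairs.append((name, repeat, train, test))
    return pairs


# 6 hypothetical benign groups with 10 devices each
groups = {
    f"benign_{g}": [f"dev_{g}_{d}" for d in range(10)] for g in range(6)
}
pairs = make_splits(groups)
print(len(pairs))  # 6 groups x 5 repetitions = 30 pairs
```

Splitting by device rather than by flow keeps every flow of a given device on one side of the split, so the model is never tested on a device it saw during training.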

In parallel to the benign data selection, we choose a benign sample to be mixed with each real malware capture. The sample is given the same duration as the malware capture to ensure a minimum balance in the data (i.e., the detector is evaluated on a duration 𝑥 of both benign and malicious traffic, as opposed to 4 d of benign and 30 min of malicious traffic). From each benign dataset, we take the window of duration 𝑥 with the largest amount of traffic. The duration-based samples are taken from test datasets only, since the model is always trained on simulated data. We get 1380 real test datasets in total (30 test datasets mixed with 46 captures).
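One possible way to find the busiest window of a given duration is a two-pointer scan over sorted flow start times. The sketch below rests on two assumptions not stated in the text: the "amount of traffic" is measured as the number of flows, and candidate windows start at a flow's start time. The function name `busiest_window` and the timestamps are hypothetical.

```python
def busiest_window(timestamps, duration):
    """Return (start, count) for the window [start, start + duration)
    that contains the most flows.

    Assumptions: traffic volume is the flow count, and candidate
    windows begin at the start time of some flow.
    """
    ts = sorted(timestamps)
    best_start, best_count = None, 0
    right = 0
    for left, start in enumerate(ts):
        # Advance the right pointer past every flow inside the window
        while right < len(ts) and ts[right] < start + duration:
            right += 1
        count = right - left
        if count > best_count:
            best_start, best_count = start, count
    return best_start, best_count


# Hypothetical flow start times in seconds; the malware capture lasts 30 s
flow_starts = [0, 10, 20, 100, 105, 110, 115, 300]
start, count = busiest_window(flow_starts, duration=30)
print(start, count)  # 100 4
```

Because both pointers only move forward, the scan is linear after sorting, which matters when a benign dataset spans several days of flows.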

In steps 3 and 4, we simulate malicious data using each benign dataset as background traffic. The simulation is run for each of the 3 malware patterns with different infection ratios. The training infection ratio (step 4) was set to 0.5. The testing infection ratio (step 3) was set to 0.001, 0.01, and 0.2 in turn. The goal is to evaluate the detector’s performance with different proportions of infected devices in a network. The total number of malicious datasets per C&C communication pattern we get at this stage is 270.

Feature extraction and learning. The goal of this module is to process all datasets to extract the features used by each detector in turn. The processed datasets contain the features defined for each detector as dimensions, along with aggregated flows as data points. Next, the processed data is fed to the model for training and evaluation. Since detection features differ between detectors, a detailed description and justification of their extraction is deferred to the corresponding detector’s chapter (Chapters 6, 7, and 8).
