
Classification

Pablo Collado Soto edited this page Dec 2, 2022 · 1 revision

Traffic classification with an SVM (Support Vector Machine)

We have our scenario working properly and the attack is having the desired effect on our network. In other words, it's blowing things up. If we are to detect the attack we need to gather representative data and process it somehow so that we can predict whether we are under attack or not. As Jack the Ripper once said, let's break this into parts. We'll begin by gathering the necessary data and sending it to a database we can easily query. We'll then prepare training datasets for our SVM and get it ready for making guesses. Let's begin!

First step: Getting the data collection to work 😵

What tools are we going to use?

For a previous project in this same subject we were introduced to both telegraf and influxdb. The first is a metrics agent in charge of collecting data about the host it's running on. It's entirely plugin driven, so configuring it is quite a breeze! The second is a DBMS (DataBase Management System) whose architecture is specifically geared towards time series, just what we need! Connecting the two is straightforward, as one of telegraf's output plugins provides native support for influxdb. We'll have to configure both appropriately, and we'll see it wasn't as easy as we first thought because mininet got in the way. We came up with both a "hacky" solution and an alternative any Telecommunications Engineer would be proud of. Just kidding, but the second one does rely on networking concepts rather than workarounds.

Leveraging Mininet's shared filesystem

Have you ever felt like throwing yourself into /dev/null to never come back? That was pretty much our mood when trying to get a host within mininet's network to communicate with the outside world. In order to understand how we ended up "fixing" (it just works 😬) everything we need to go back and take a look at our initial ideas and implementations.

We should not forget that we are looking at ICMP traffic in order to make predictions about the state of the network. We first thought about running telegraf on a network switch that was directly connected to the controller where our InfluxDB instance is running. The good thing about this scheme is that the telegraf process within the switches can communicate with the DB running in the controller through HTTP. This is due to the fact that we are invoking the start() method of the switches during the network configuration so even though there's no "real" link between them (we didn't create it by calling addLink()) they can still communicate.

The above sounds wonderfully well but... switches can only work with information up to the link layer, they know nothing about IP packets or ICMP messages. We should note that ICMP is a layer 3-ish (more like layer 3.5) protocol. As it relies on IP for the network services but doesn't have a port number we cannot assign a particular layer to it... All in all the switches knew nothing about ICMP messages crossing them so we find that we need to run telegraf on one of the hosts if we want to get our metrics. In a real case scenario we could devote a router (which can process ICMP data) instead of a switch for this purpose and reconfigure the network accordingly. Anyway we need to get the telegraf instance running in one of the mininet created hosts to communicate with the influx database found in the controller VM. Let's see how we can go about it...

When discussing the internal mechanisms used by mininet later on we'll find out that it relies solely on network namespaces. This implies that the filesystem is shared across the network elements we create with mininet AND the host machine itself. This host machine has direct connectivity with the VM hosting the controller so we can take advantage of what others consider to be a flaw in mininet's architecture. We are going to run a telegraf instance on mininet's Host 4 whose input plugin will gather ICMP data and whose output will be a file in the VM's home directory. We'll be running a second telegraf instance in the host VM whose input will be the file containing Host 4's output and whose output will be the Influx DB hosted in the controller VM. This architecture leverages the shared filesystem and uses a second telegraf instance as a mere proxy between one of mininet's internal hosts and the controller VM, both living in entirely different networks.

In order to implement this idea we have created all the necessary configuration files under conf, which are then copied to the appropriate places during Vagrant's provisioning stage.

Implementing a NAT (Network Address Translator) in Mininet for external communication

Once we implemented the solution above we were able to continue developing the SVM, as we already had a way of retrieving data. That's why we decided to devote some time to looking for a more elegant solution. Just like we usually do in home LANs, we decided to instantiate a NAT process to reach the network created for the VMs from within the emulated one. Due to problems with the internal functioning of the NAT process provided by Mininet, extra configuration had to be added to achieve the desired connectivity. To solve the problem, a series of predefined rules (flows) were installed in each switch to "route" the traffic from our data collector to the NAT process and from there out to InfluxDB. This could be considered a "fix", but in fairness we are only using the logic of an SDN network to route our traffic in the desired way. You can take a closer look at this implementation in this branch.

What data are we going to use?

We are trying to overwhelm Host 4 with a bunch (a VERY BIG bunch) of ICMP Echo Requests (that is fancy for pings). By reading through telegraf's input plugin list we came across the net plugin capable of providing ICMP data out of the box.

Getting the data to InfluxDB

Instead of directly sending the output to an influxdb instance we are going to send it to a regular file thanks to the file output plugin. This leads us directly to the configuration of the second telegraf instance.

In this second process we'll be using the tail input plugin. Just like Linux's tail, this plugin continuously follows a file so that it can use it as an input data stream. Rather than polling the file periodically, we chose to have telegraf notified whenever the file changes, which makes for a more efficient use of system resources overall. The output plugin we'll be using is now good ol' influxdb. We'll point it to the influxdb instance running on the controller VM so that everything is correctly connected.

The structure of the system we are dealing with is then:

We are now ready to start querying our database and begin working with the acquired information.

A note on the sampling period

When configuring the interconnection between both telegraf instances we initially left the default 10 s interval in both. When we looked at the data arriving at the DB we noticed some "strange" results in between correct readings, so we decided to fiddle with these sampling times in case they were interfering with each other. As both processes communicate through a file, the timing of reads and writes can be critical... We settled on a 2 s sampling interval in "mininet's" telegraf process and a 4 s refresh interval in the VM's instance. This means that we are going to get 2 entries in the DB with each update!

After running some tests we found everything was working flawlessly now 🙆‍♀️ so we just left it as is.

Second step: Generating the training datasets

Weren't we using the received ICMP messages as the input?

Well... yes and no. The cornerstone of the SVM's input is indeed the number of received ICMP messages BUT we decided to use the derivative of the incoming packet count with respect to time instead of its absolute value. This approach lets the network admin apply the exact same SVM for attack detection even if the baseline traffic increases because the network grows. As we are looking for sudden changes in incoming messages rather than for large numbers, this approach is more versatile.
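To make the idea concrete, here is a minimal sketch of how such a rate of change could be computed from consecutive counter readings. The function name `packet_rate` and the `(timestamp, cumulative count)` sample layout are assumptions made for illustration; the derivative may just as well be computed by the database query itself.

```python
from typing import List, Tuple


def packet_rate(samples: List[Tuple[float, int]]) -> List[float]:
    """Approximate d(packets)/dt between consecutive samples.

    Each sample is (timestamp in seconds, cumulative ICMP message count).
    """
    rates = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # skip duplicated or out-of-order samples
        rates.append((c1 - c0) / dt)
    return rates


# Made-up readings: a quiet network and one being flooded.
quiet = [(0.0, 100), (2.0, 102), (4.0, 104)]
flood = [(0.0, 100), (2.0, 2100), (4.0, 6100)]
print(packet_rate(quiet))
print(packet_rate(flood))
```

Note how both series start from the same absolute count: only the rate tells them apart, which is what makes the feature independent of the network's baseline traffic.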

After debating it for a while we settled on including the average of the derivative of the incoming packets as a parameter too. Because normal and attack traffic produce such disparate values, the mean moves back towards its usual level only slowly once an attack ends, so we'll be more likely to classify the aftermath as an attack too. Even though we may no longer be subject to very high incoming packet variations, the network takes a while to resume normal operation, and we decided to let this "recovery time" play a role in the SVM's prediction.
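The "recovery time" effect is easy to see with a few numbers. The sketch below pairs each derivative with the mean of all derivatives seen so far. Whether the mean should be cumulative or taken over a sliding window is not something this page pins down, so the cumulative mean (and the name `derivative_with_running_mean`) is an assumption for illustration.

```python
from typing import Iterable, List, Tuple


def derivative_with_running_mean(rates: Iterable[float]) -> List[Tuple[float, float]]:
    """Pair each rate with the cumulative mean of the rates seen so far."""
    features = []
    total = 0.0
    for n, rate in enumerate(rates, start=1):
        total += rate
        features.append((rate, total / n))
    return features


# Made-up series: calm, two samples of attack, calm again.
rates = [1.0, 1.0, 1000.0, 1000.0, 1.0, 1.0]
for rate, mean in derivative_with_running_mean(rates):
    print(f"rate={rate:8.1f}  mean={mean:8.1f}")
```

In the last two samples the rate is back to 1 message per second, yet the mean is still in the hundreds. That lingering value is what nudges the classifier towards treating the aftermath as part of the attack.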

Writing a script: src/data_gathering.py

Once we have the desired data stored in the DB using the SVM becomes a matter of reading it and formatting it so that the SVM "likes it". In order to make the process faster we decided to write a simple python script that uses influxdb's python API to read the data and prepare a CSV (Comma Separated Values) to later be read by the script implementing the SVM.

The defining quality of training data is meaningfulness. The SVM's predictions will only be as good as the training it received so we need to provide insightful data if we are to get any consistent results.

In order to get appropriate data samples we went ahead and simulated regular traffic by pinging the target host at a rate of roughly 1 ICMP message per second. We then attacked the target until we got around 100 samples into the DB.

Generating the dataset is then just a matter of reading the DB and writing the data to a text file with a .csv extension.
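As a rough sketch of that read-then-dump step, assuming the `influxdb` Python client for InfluxDB 1.x: the helper names, the field names and the file name below are all hypothetical and are not taken from src/data_gathering.py.

```python
import csv
import os
import tempfile
from typing import Dict, Iterable, List


def fetch_points(host: str, database: str, query: str) -> List[Dict]:
    """Run an InfluxQL query and return the resulting points as dicts."""
    # Imported lazily so that write_csv() can be used without the package.
    from influxdb import InfluxDBClient

    client = InfluxDBClient(host=host, port=8086, database=database)
    return list(client.query(query).get_points())


def write_csv(points: Iterable[Dict], path: str, fields: List[str]) -> int:
    """Write the selected fields of each point to path; return the row count."""
    rows = 0
    with open(path, "w", newline="") as handle:
        # extrasaction="ignore" drops keys we did not ask for (e.g. "time").
        writer = csv.DictWriter(handle, fieldnames=fields, extrasaction="ignore")
        writer.writeheader()
        for point in points:
            writer.writerow(point)
            rows += 1
    return rows


# Demo with in-memory points; field names and values are made up.
sample_points = [
    {"time": "2022-12-02T10:00:00Z", "derivative": 1.0, "mean": 1.0},
    {"time": "2022-12-02T10:00:04Z", "derivative": 950.0, "mean": 475.5},
]
demo_path = os.path.join(tempfile.gettempdir(), "training_sample.csv")
print(write_csv(sample_points, demo_path, ["derivative", "mean"]))
```

Against a live database, the output of `fetch_points(...)` would simply be passed to `write_csv(...)` in place of `sample_points`.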

Third step: Putting it all together: src/traffic_classifier.py

Apart from the scenario's setup, the most important program we wrote is without a doubt the traffic classifier. The file defines the gar_py() class, which includes an SVM instance, the query used for getting data and many other configuration parameters as its attributes. This lets us use the same technique in other scenarios; in other words, we are increasing this solution's portability.

The class' constructor will limit itself to initializing its attributes and training the SVM by reading the training files we have already prepared. The main thing to note here is how we need to conform to the input format accepted by the SVM itself.
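To illustrate what "conforming to the input format" means in practice, here is a minimal sketch assuming scikit-learn's `SVC` is the SVM in use (this page does not name the library, so treat that as an assumption). The training values are made up. The point is the shape: one row per sample with one column per feature, plus a flat list of labels.

```python
from sklearn import svm

# Each row is [derivative, mean of the derivative].
# Labels: 0 = normal traffic, 1 = attack. Values are illustrative only.
X_train = [
    [1.0, 1.0],
    [0.8, 0.9],
    [1.2, 1.1],
    [900.0, 300.0],
    [1500.0, 700.0],
    [1100.0, 500.0],
]
y_train = [0, 0, 0, 1, 1, 1]

classifier = svm.SVC(kernel="linear")
classifier.fit(X_train, y_train)

# predict() also expects a 2D input, even for a single sample.
print(classifier.predict([[1.1, 1.0], [1200.0, 600.0]]))
```

In a real run the rows would come from the CSV training files rather than from literals.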

Once it's trained we just need to call the class's work_time() method which will enter an infinite loop whose operation can be summarised into these points:

  1. Read the last 3 entries in the DB.
  2. Verify these entries are indeed new.
  3. Update the parameters we're going to use for the prediction.
  4. Order the SVM to predict whether the new data represents an attack or not.
  5. Write an entry to the appropriate DB signaling whether or not we're under attack.
  6. Wait 5 seconds to read new data. New data is sent to the DB every 4 seconds so reading insanely fast is just throwing resources out the window.
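The steps above can be sketched as the loop below. It is not the actual work_time() method: the function name, the `(timestamp, derivative)` entry layout and the injected callables are assumptions that let the sketch run without a database or a trained SVM.

```python
import time
from typing import Callable, List, Optional, Sequence, Tuple

Entry = Tuple[str, float]  # (timestamp, derivative): an assumed layout


def classify_loop(
    read_last: Callable[[], List[Entry]],
    predict: Callable[[Sequence[float]], int],
    write_verdict: Callable[[int], None],
    period_s: float = 5.0,
    max_iterations: Optional[int] = None,
) -> None:
    """Poll for new entries, classify them and record the verdict."""
    last_seen = None
    mean, count = 0.0, 0
    iteration = 0
    while max_iterations is None or iteration < max_iterations:
        iteration += 1
        entries = read_last()                          # 1. read the latest entries
        if entries and entries[-1][0] != last_seen:    # 2. only act on new data
            last_seen, rate = entries[-1]
            count += 1                                 # 3. update the parameters
            mean += (rate - mean) / count
            verdict = predict([rate, mean])            # 4. ask the classifier
            write_verdict(verdict)                     # 5. record the verdict
        time.sleep(period_s)                           # 6. wait for fresh data


# Demo with fakes: a repeated entry is skipped, a new one is classified.
reads = iter([[("t1", 1.0)], [("t1", 1.0)], [("t2", 1000.0)]])
verdicts: List[int] = []
classify_loop(
    read_last=lambda: next(reads),
    predict=lambda features: int(features[0] > 100),  # stand-in for the SVM
    write_verdict=verdicts.append,
    period_s=0,
    max_iterations=3,
)
print(verdicts)
```

Leaving `max_iterations` as `None` gives the infinite loop described above; the bound only exists so the demo terminates.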

Additionally we used matplotlib to plot the classification we were carrying out. As you can see, the red dots are the data points classified as anomalous (DDoS) traffic. Although it looks like there is only one blue dot belonging to "normal" traffic, that is not the case: there are several, but the deviation between them is minimal ☺️ .

We've also written a signal handler to allow for a graceful exit when pressing CTRL + C.
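A handler of this kind can be as small as the sketch below, which uses Python's standard `signal` module. The handler name and the message are assumptions, and the real script may do additional clean-up before exiting.

```python
import signal
import sys


def handle_sigint(signum, frame):
    """Say goodbye and exit cleanly instead of dumping a traceback."""
    print("\nCTRL + C received, shutting down...")
    sys.exit(0)


# CTRL + C delivers SIGINT to the process; route it to our handler.
signal.signal(signal.SIGINT, handle_sigint)
```

Without it, interrupting the infinite loop would raise KeyboardInterrupt in the middle of whatever the classifier happened to be doing.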

And with that we are finished! 🎉 We hope to have been clear enough but if you still have any questions don't hesitate to contact us. You can find our GitHub profiles over here.