
2.4 Bluetooth Communication

2.4.2 Bluetooth Pairing

The Bluetooth pairing concept is well established and was designed to be easy, providing a wireless connection and thus enabling the setting up of networks. Pairing is the essential first step in connecting two Bluetooth devices; a connection is established after pairing, which is the state in which paired devices communicate. To pair two devices, a shared key is used to authenticate both devices. This shared key is also known as the PIN code. The user usually initiates the pairing of two devices, and the process proceeds automatically after a device has received the request. Pairing is essential to establish the keys that encrypt a link: the keys are shared via a transport-specific key distribution.

The methods, protocols for pairing, and key distribution are defined by the Security Manager, which employs a key distribution procedure to implement identity verification and encryption. The keys can encrypt a link in future connections, be employed in signed data verification, or perform random address resolution (Bluetooth SIG, 2020).

The phases of pairing are listed below:

• Pairing feature exchange.

• Short Term Key (STK) generation (LE Legacy Pairing).

• Long Term Key (LTK) generation (LE Secure Connections).

• Transport specific key distribution.

In the Bluetooth 4.2 Core specification, the Secure Connections feature was introduced for the LE physical transport, advancing the pairing technology. AES-CMAC and P-256 elliptic curve cryptography were introduced, which are FIPS-approved algorithms (Bluetooth SIG, 2014). Therefore, to distinguish it from the Secure Connections pairing introduced in Bluetooth 4.2, LE pairing as defined in the previous Core specifications 4.0 and 4.1 is termed "LE Legacy Pairing" (see Figure 10).

Figure 10. Legacy Pairing and Secure Connection flow chart. (Bluetooth SIG blog, 2020)

2.4.3 Bluetooth Bonding

While pairing is the verification of each device's security attributes and the creation of temporary encryption, Bluetooth bonding involves the verification of long-term keys: this happens after pairing has occurred, and the keys are then stored for future use. Devices that are already bonded can be easily connected the next time.

2.4.4 Bluetooth Authentication

Authentication is the process of ensuring a secure connection through identity verification of the devices in a piconet attempting to connect. Bluetooth technology uses a challenge-response scheme to perform this verification, in which a secret link key is shared between the connecting devices. The claimant's knowledge of the secret key is verified with the use of symmetric keys. It is not a requirement that the master device act as the verifier.

As shown in Figure 11, the challenge-response method is based on the following steps (Iqbal et al., 2010):

• The verifier device generates a random challenge (AU_RAND) and sends it to the claimant.

• The claimant responds with its Bluetooth device address (BD_ADDR).

• The claimant then computes the authentication response with the E1 algorithm, calculating the Signed Response (SRES) using AU_RAND, BD_ADDRB, and the link key as inputs. 32 bits of the 128-bit output are utilized at this stage, and the remaining 96 bits form the input of the Bluetooth encryption key.

• The verifier performs the exact same computation.

• The claimant sends the selected 32 bits of the E1 output, the SRES, to the verifier.

• The verifier compares its own output of the E1 algorithm with the received SRES.

• If the 32 bits match, the authentication is successful; otherwise authentication fails.

To ensure mutual authentication, the process listed above needs to be repeated with the claimant and verifier switching roles. There is a waiting interval between a failed authentication with a claimant and a new authentication attempt. The waiting interval increases with subsequent failed authentications to prevent attackers from making multiple authentication attempts in a short time.

Figure 11. Bluetooth Authentication process.
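As an illustration of the flow above, the following is a minimal MATLAB sketch of the challenge-response exchange. It is not an implementation of the actual E1 (SAFER+-based) algorithm: e1_placeholder and all variable names are hypothetical stand-ins used only to show the structure of the exchange.

% Sketch of the verifier/claimant exchange in Figure 11 (illustrative only).
link_key  = randi([0 255], 1, 16, 'uint8');   % shared 128-bit link key
bd_addr_b = randi([0 255], 1, 6,  'uint8');   % claimant's device address
au_rand   = randi([0 255], 1, 16, 'uint8');   % verifier's random challenge

sres_claimant = e1_placeholder(au_rand, bd_addr_b, link_key);  % claimant side
sres_verifier = e1_placeholder(au_rand, bd_addr_b, link_key);  % verifier side
authenticated = isequal(sres_claimant, sres_verifier);         % compare 32-bit SRES values

function sres = e1_placeholder(au_rand, bd_addr, key)
% Hypothetical stand-in for E1: mixes the inputs and keeps 4 bytes (32 bits).
digest = mod(cumsum(double([au_rand, bd_addr, key])), 256);
sres   = uint8(digest(end-3:end));
end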

2.4.5 Bluetooth Threats and Vulnerabilities

The wide adoption of Bluetooth technology in all areas of life has made it a target for attackers. However, most known vulnerabilities have been addressed in updated versions of the Bluetooth Core specification. New threats and vulnerabilities always emerge. Some of the new threats and vulnerabilities of the year 2020 are the following:

1. Bluetooth Impersonation Attacks (BIAS): The BIAS attack is possible due to a vulnerability in the Bluetooth specification that allows an attacker to impersonate a device during secure connection establishment. This vulnerability results from the lack of mandatory mutual authentication, authentication procedure downgrade, and overly permissive role switching (Antonioli, 2020).

2. Integer Overflow Vulnerability in Android: An incorrect bounds calculation may result in an out-of-bounds write, which could allow remote code execution over Bluetooth without additional execution privileges (Huawei, 2020). The integer overflow vulnerability in Android was assigned a Common Vulnerabilities and Exposures identifier, namely CVE-2020-0022.

Bluetooth connections, like any wireless connection, are subject to threats such as Denial-of-Service (DoS), impersonation, Man-in-the-Middle (MITM) attacks, and eavesdropping. Integrity threats involve information being altered to mislead the recipient. A disclosure threat implies information leaked to an unauthorized eavesdropper. A Denial-of-Service (DoS) threat involves an attacker blocking or limiting access to the service.

Besides the general wireless protocol threats, some other threats are particular to Bluetooth-enabled devices (Stirparo & Loschner, 2013), such as:

Incorrect Protocol Implementation: Flaws in implementation have been the reason for the most famous Bluetooth security breaches. The security quality is a function of the product-specific implementation.

Location Tracking: Devices powered by Bluetooth technology broadcast their unique address, which is essential for connecting with other devices. However, this also makes tracking possible.

Key Management: Key disclosure or tampering is possible.

Bluejacking: A social engineering attack in which unsolicited messages are sent to a susceptible Bluetooth device.

3 DEEP LEARNING

3.1 Introduction to Deep Learning

Deep learning (DL) is a subset of machine learning (IBM, 2020), and machine learning (ML) can be termed a branch of Artificial Intelligence (AI). Therefore, we will first lay out background information to elucidate this relationship.

AI is a vast discipline that has continued to evolve. The concept was first proposed in 1950 by Alan Turing, who introduced the "Turing Test" to investigate whether a machine can exhibit the same level of intelligence as a human. In 1956, John McCarthy introduced the name "Artificial Intelligence". There are various definitions of what AI entails. However, for the purpose of this thesis, we define AI as the simulation of human intelligence in machines, enabling machines to perform tasks commonly associated with intelligent beings. Subsets of AI are shown in Figure 12.

Figure 12. Artificial Intelligence subsets.

Machine Learning (ML) is a subdivision of artificial intelligence involving systems that can learn from data to understand patterns and infer decisions without being specifically programmed. It is primarily concerned with algorithms that allow a system to learn from historical data, identify patterns, and make decisions (IBM, 2020). The algorithm improves automatically through experience. There are three subdivisions of machine learning, as shown in Figure 13:

Figure 13: Machine Learning Subdivision.

Supervised Learning: When a system learns from known datasets to predict the output, this is termed supervised learning. There are two categories of algorithms used in supervised learning:

o Classification
o Regression

Reinforcement Learning: In reinforcement learning, an AI agent is trained by giving it some commands, and it gets a reward for each action as feedback, thus improving its performance. There are two types of reinforcement learning:

o Negative reinforcement learning
o Positive reinforcement learning

Unsupervised Learning: In unsupervised learning, the agent learns patterns without corresponding output values. The algorithms are trained with unlabelled and unclassified data. Unsupervised learning can be classified into two categories:

o Association
o Clustering

Deep learning (DL) is a subset of machine learning, as shown in Figure 12. Deep learning emulates the human brain, empowering systems to learn and perform complex tasks (IBM, 2020). DL enables machines to perform intelligent, human-like tasks without human involvement. Deep learning relies on a neural network architecture: it works on a deep neural network made of multiple layers, as shown in Figure 14.

Figure 14: Deep Neural Network. (IBM cloud education, 2014)

It gains the adjective "deep" because of the multiple layers in the network. Deep learning became possible and effective in the era of big data availability and advancements in computing power and algorithms (Hemsoth, 2017). The major difference between DL and other ML is its capability to learn from unstructured and unlabeled data (IBM, 2020).

3.2 Artificial Neural Network

Artificial neural networks (ANNs), usually called neural networks (NNs), emulate the processing capability of biological neural systems (Meyer-Baese et al., 2014). An ANN is an algorithm that takes in data as input and passes it to a hidden layer, where calculations are made and an inference is deduced, before the data is sent to an output layer, where the inference is assigned a probability (IBM, 2020). It can also be defined as an adaptive statistical model that draws inspiration from and emulates the working principle of human neurons (Abdi et al., 2011). The fundamental concept is to interconnect a high number of simple processing elements to build a system capable of performing complex processing tasks. An ANN consists of simulated neurons. Each neuron is a node connected to other nodes through links that correspond to the biological axon-synapse-dendrite connections. The weight of each link determines the level of influence of one node on another. The main attributes of ANNs are their massively parallel processing architectures and their ability to learn from inputs. There are corresponding learning algorithms for each type of ANN that allow training in an iterative weight-updating manner (Meyer-Baese et al., 2014). These algorithms can be categorized as supervised and unsupervised learning.

3.2.1 Artificial Neuron

The building blocks of ANNs are basic artificial neurons, also called perceptrons (Sahu, 2018). The artificial neuron and the perceptron algorithm were invented by Frank Rosenblatt (Lingireddy & Brion, 2005).

An ANN comprises interconnected artificial neurons. Figure 15 shows a representation of an artificial neuron.

Figure 15: Artificial Neuron. (Lingireddy & Brion, 2005)

An artificial neuron takes the inputs, aggregates them, and passes the output, based on a function, to the neighboring neuron. The inputs are represented by x_1 to x_n and the connection weights by w_1 to w_n. The weighted sum of the inputs is fed through a transfer function, a threshold unit for output generation; b is the bias.

Equation 1 depicts the operation of the neural perceptron, where the neuron threshold, whose value is always 1, is seen as a new input node with weight equal to b, and the summation runs from 0 to n (Cain, 2017):

y = \sum_{i=0}^{n} w_i x_i \quad (1)

Using the bipolar sigmoid F(y) as the transfer function, we can represent the neuron output z as shown in Equation 2, where y is the weighted sum of the inputs (Abdullah et al., 2011):

z = F(y) = \frac{2}{1 + e^{-y}} - 1 \quad (2)

Aside from the bipolar sigmoid used as the activation function in Equation 2, there are other common neuron activation functions (Duch & Jankowski, 2000), as shown in Table 1.

Table 1: Neuron activation functions.

Gaussian    g(x) = e^{-x^{2}/(2\sigma^{2})}
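To make Equations 1 and 2 concrete, the following is a minimal MATLAB sketch of a single neuron with a bipolar sigmoid transfer function; the input and weight values are purely illustrative.

% One artificial neuron (Equations 1 and 2) with illustrative values.
x = [0.5 -0.2 0.8];          % inputs x1..xn
w = [0.4  0.1 -0.6];         % connection weights w1..wn
b = 0.2;                     % bias (threshold input of value 1 with weight b)
y = b + sum(w .* x);         % weighted sum, Equation 1
z = 2 / (1 + exp(-y)) - 1;   % bipolar sigmoid output, Equation 2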

Expressed mathematically, the linear threshold unit (LTU) was the first artificial neuron. The LTU (see Figure 16) is made up of an input X with n values, a mathematical operation that applies an activation function to the computed weighted sum, and an output y.

Figure 16: Linear Threshold Unit. (Medium, 2019)

The weighted sum z is the product of the inputs and their weights, as shown in Equation 3:

z = w^{T} \cdot X = \sum_{i=1}^{n} w_i x_i \quad (3)

When the Heaviside step activation function is applied, we have Equation 4:

\mathrm{step}(z) = \begin{cases} 0 & \text{if } z < 0 \\ 1 & \text{if } z \ge 0 \end{cases} \quad (4)

The step function outputs are 1 and 0, hence the output y in Equation 5 is binary. A single LTU is capable of binary classification (Daniel, 2019).

y = \mathrm{step}(z) = \mathrm{step}(w^{T} \cdot x) \quad (5)

When dealing with a perceptron that has more than one LTU, we introduce a bias vector b in the computation of each LTU's output, as shown in Equation 6:

y_1 = \mathrm{step}(z_1) = \mathrm{step}(w_1^{T} \cdot x + b_1)
y_2 = \mathrm{step}(z_2) = \mathrm{step}(w_2^{T} \cdot x + b_2) \quad (6)

If we combine the two LTUs y_1 and y_2, we have Equation 7, where W stacks the two weight vectors and b the two biases:

y = \mathrm{step}(W \cdot x + b) \quad (7)

The single-layer perceptron can be termed the simplest form of feedforward neural network: it has the limitation of being able to handle only linearly separable problems (Gallo, 2015). This problem is solved by the multilayer perceptron. The feedforward neural network depicts the fundamental principle of a neural network, and more complex ANNs are built on this working principle.
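The LTU computations in Equations 4 to 7 can be sketched in a few lines of MATLAB; the weights, biases, and input below are arbitrary illustrative values, not values used in the thesis experiment.

% A single LTU and a two-LTU layer (Equations 4-7), illustrative values.
step = @(z) double(z >= 0);   % Heaviside step function, Equation 4
x = [1; 0; 1];                % one input instance (n = 3)
w = [0.3; -0.7; 0.5];         % weight vector of a single LTU
y_single = step(w' * x);      % Equation 5: binary output of one LTU
W = [0.3 -0.7 0.5;            % weight matrix, one row per LTU
     0.2  0.4 -0.1];
b = [0.1; -0.2];              % bias vector
y_layer = step(W * x + b);    % Equation 7: both LTU outputs at once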

3.2.2 Gradient Descent

Gradient descent is paramount in neural networks. The gradient is the pointwise derivative of a function; descending in the direction opposite to the gradient gives gradient descent. Finding the solution to the simple linear Equation 8 helps to understand the gradient:

y = w_1 x_1 + w_2 x_2 \quad (8)

The error between the target y and the prediction can be computed using Equation 11; we simply call this quantity the cost, or cost function:

C = \frac{1}{2}(y - w_1 x_1 - w_2 x_2)^{2} \quad (11)

Now, to get the values of w_1 and w_2 for which the cost is minimum, we take the derivatives of C with respect to w_1 and w_2 and set them equal to zero (see Equations 12 and 13):

\frac{dC}{dw_1} = \frac{1}{2} \cdot 2(y - w_1 x_1 - w_2 x_2)(-x_1) \quad (12)

\frac{dC}{dw_2} = \frac{1}{2} \cdot 2(y - w_1 x_1 - w_2 x_2)(-x_2) \quad (13)

Equations 12 and 13 are the gradients. However, to reach the minimum, we update the weight values gradually towards the direction of the minimum, which is opposite to the gradient (see Equations 14 and 15):

w_1 \rightarrow w_1 - lr \cdot \frac{dC}{dw_1} \quad (14)

w_2 \rightarrow w_2 - lr \cdot \frac{dC}{dw_2} \quad (15)

Here lr is the learning rate; Equations 14 and 15 constitute gradient descent.
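The update rules in Equations 12 to 15 can be sketched as a short MATLAB loop; the single training point and the learning rate below are illustrative assumptions.

% Gradient descent for the two-weight example (Equations 12-15), illustrative.
x1 = 1.0; x2 = 2.0; y = 3.0;   % one training point with target y
w1 = 0.0; w2 = 0.0;            % initial weights
lr = 0.05;                     % learning rate
for iter = 1:200
    e   = y - w1*x1 - w2*x2;   % residual used by the cost C = 0.5*e^2
    dw1 = -e * x1;             % dC/dw1, Equation 12
    dw2 = -e * x2;             % dC/dw2, Equation 13
    w1  = w1 - lr * dw1;       % Equation 14
    w2  = w2 - lr * dw2;       % Equation 15
end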

3.2.3 Multilayer Perceptron

The multilayer perceptron (MLP) has many applications and is thus an important type of neural network. The MLP architecture is defined by an input layer, one or more hidden layers, and an output layer, each layer comprising at least one neuron (Meyer-Baese et al., 2014). The choice of the number of neurons and hidden layers is a function of the problem to solve. With too many neurons the network overfits and memorises the input patterns, while with too few it cannot represent the features of the input space; both limit the network's ability to generalise (IBM, 2020). An MLP can handle data that cannot be separated linearly.

In an MLP, every single node in a layer is connected to each node in the following layer; it is therefore fully connected. Figure 17 shows what a multilayer perceptron looks like.

Figure 17: MLP network. (IBM, 2020)

The MLP employs sigmoidal kernel functions in its hidden units, combined through linear weights (Meyer-Baese et al., 2014). Figure 18 depicts an MLP with 3 LTUs in the hidden layer and 2 LTUs in the output layer.

Figure 18: Multilayer perceptron showing LTUs in the hidden and output layers. (David, 2019)

The calculation is the same as in Equation 6; however, there are more layers of LTUs to combine before reaching y (see Equation 16):

h_1 = \mathrm{step}(z_1) = \mathrm{step}(W_1 \cdot x + b_1)
y = \mathrm{step}(z_2) = \mathrm{step}(W_2 \cdot h_1 + b_2) \quad (16)

ANNs are trained in batches. If k instances (Equation 17) are selected from the available m instances and then combined, we have Equation 18:

x_1 = (x_{11}, x_{12}, \dots, x_{1n}) \quad (17)

X = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_k \end{pmatrix} \quad (18)

Representing the input X as a matrix of form (k, n) allows us to show that k is the number of instances and n the number of input values (David, 2019). Equation 19 represents the new way to calculate y:

y = \mathrm{step}(Z) = \mathrm{step}(X \cdot W + b) \quad (19)

The multilayer perceptron has bi-directional propagation, i.e., forward propagation and backward propagation. The MLP uses a nonlinear activation function, such as the hyperbolic tangent or the logistic function. Inputs are multiplied by the weights and supplied to the activation function, and in backpropagation the weights are modified to reduce the loss.
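A batch forward pass through a small MLP, following the layer-by-layer form of Equations 16 and 19, can be sketched as below; the layer sizes and random weights are illustrative, and the step activation is used only to stay consistent with the LTU formulation above.

% Batch forward pass through a tiny MLP (Equations 16 and 19), illustrative.
step = @(z) double(z >= 0);
X  = randi([0 1], 4, 3);      % k = 4 instances, n = 3 inputs (one row each)
W1 = rand(3, 5) - 0.5;        % input -> hidden weights (5 hidden units)
b1 = rand(1, 5) - 0.5;        % hidden biases
W2 = rand(5, 2) - 0.5;        % hidden -> output weights (2 output units)
b2 = rand(1, 2) - 0.5;        % output biases
H  = step(X * W1 + b1);       % hidden layer, first line of Equation 16
Y  = step(H * W2 + b2);       % output layer, second line of Equation 16 / Equation 19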

3.3 Backpropagation

Bryson and Ho first introduced backpropagation in 1969, but it was not well known until 1986, when David Rumelhart, Geoffrey Hinton, and Ronald Williams published a paper titled "Learning representations by back-propagating errors". Backpropagation is a short name for "backward propagation of error": it is an algorithm that uses gradient descent for the supervised learning of ANNs (McGonagle et al., 2019). In simple terms, the backpropagation algorithm lets the network make guesses about the input data using its parameters, then measures the error with a loss function, and finally sends the error back to adjust the wrong parameters in the direction of less error (Pathmind, 2019); these are also the three steps of the algorithm.

Backpropagation utilizes an error function on an ANN to calculate the gradient of that error function with respect to the input weights. It is the act of fine-tuning the weights considering the error rate observed in the last epoch (Al-Masri, 2019). The backpropagation algorithm is bidirectional, with a forward and a backward direction. A training vector is input to the network in the forward direction and classified, and recursive updating of the weights in tandem with the observed error takes place in the backward direction (Meyer-Baese, 2014).

To represent the backpropagation algorithm mathematically, we initialize the weights with numbers not less than -0.1 and not greater than 0.1 (see Equation 20):

w_{ij} = \mathrm{random}([-0.1, 0.1]), \quad 0 \le i \le l, \; 1 \le j \le m
v_{jk} = \mathrm{random}([-0.1, 0.1]), \quad 0 \le j \le m, \; 1 \le k \le n \quad (20)

Equation 20 shows the initialization. We then present the training data p^t = [p_1^t, p_2^t, \dots, p_l^t] for the pair (p^t, c^t), setting x_i = p_i^t for 1 \le i \le l. To compute the values of the neurons at the hidden layer, we employ Equation 21:

h_j = \frac{1}{1 + \exp[-(w_{0j} + \sum_{i=1}^{l} w_{ij} x_i)]}, \quad 1 \le j \le m \quad (21)

Then we compute the values of the output neurons with Equation 22:

o_k = \frac{1}{1 + \exp[-(v_{0k} + \sum_{j=1}^{m} v_{jk} h_j)]}, \quad 1 \le k \le n \quad (22)

Equations 20, 21, and 22 are the same as those used to represent the flow of data in the perceptron during classification. We then represent the errors at the output and hidden layers in the forward computation, \delta o_k and \delta h_j respectively (see Equations 23 and 24):

\delta o_k = o_k (1 - o_k)(c_k^t - o_k) \quad \text{for } 1 \le k \le n \quad (23)

\delta h_j = h_j (1 - h_j) \sum_{k=1}^{n} \delta o_k v_{jk} \quad \text{for } 1 \le j \le m \quad (24)

If v_{jk}(t) represents the weight value after t training steps and the parameter 0 \le \eta \le 1 denotes the learning rate, we can adjust the weights between the output and hidden layer using Equation 25:

v_{jk}(t) = v_{jk}(t-1) + \eta \, \delta o_k \, h_j \quad (25)

Adjusting the weights in the backward computation between the hidden and input layer can be done using Equation 26:

w_{ij}(t) = w_{ij}(t-1) + \eta \, \delta h_j \, p_i^t \quad (26)

For iteration, we repeat the steps from Equations 21 to 26.

Following the random selection of the network weights, the essential computation to minimize the error is done using the backpropagation algorithm. The algorithm stops when the error function is negligible (Rojas, 1996). The steps are as follows (a numerical sketch of one sweep is given after the list):

• Feed-forward calculation

• Backpropagation to output layer

• Backpropagation to hidden layer

• Weight updates
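The following MATLAB sketch performs one such sweep for a single training pair, directly following Equations 20 to 26; the layer sizes, learning rate, and training pair are illustrative assumptions, and the bias rows are initialized but not updated to keep the sketch short.

% One backpropagation sweep for a single pair (Equations 20-26), illustrative.
l = 4; m = 6; n = 2; eta = 0.5;                    % input, hidden, output sizes; learning rate
w = rand(l+1, m) * 0.2 - 0.1;                      % Equation 20: weights in [-0.1, 0.1], row 1 holds w0j
v = rand(m+1, n) * 0.2 - 0.1;                      % hidden -> output weights, row 1 holds v0k
p = rand(1, l); c = [1 0];                         % one training pair (p, c)
h = 1 ./ (1 + exp(-(w(1,:) + p * w(2:end,:))));    % Equation 21: hidden activations
o = 1 ./ (1 + exp(-(v(1,:) + h * v(2:end,:))));    % Equation 22: output activations
delta_o = o .* (1 - o) .* (c - o);                 % Equation 23: output-layer error
delta_h = h .* (1 - h) .* (delta_o * v(2:end,:)'); % Equation 24: hidden-layer error
v(2:end,:) = v(2:end,:) + eta * (h' * delta_o);    % Equation 25: update hidden -> output weights
w(2:end,:) = w(2:end,:) + eta * (p' * delta_h);    % Equation 26: update input -> hidden weights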

4 EXPERIMENTAL SETUP

Bluetooth-enabled device authentication involves sharing a 32-bit link key as a security measure before communication is allowed. The approach in this thesis is to represent the 32-bit link key by randomly generated weights in the Neural Network Toolbox of MATLAB®. The concept is to store the link key in the weight matrix of a backpropagation neural network, which makes reverse tracking a difficult task, thus making the Bluetooth connection more secure.

4.1 MATLAB

MATLAB® is an abbreviation for matrix laboratory, published by MathWorks. It is computing and visualization software built on a matrix-based language that supports the most natural expression of computational mathematics (MathWorks Inc., 1994). The Deep Learning Toolbox of MATLAB, among other functionalities, allows the setup, training, and simulation of neural networks through command-line functions and applications. The system requirements for MATLAB® and Simulink on the Windows operating system are: Windows 10, Windows 7 Service Pack 1, Windows Server 2019, or Windows Server 2016. The minimum processor requirement is an Intel or AMD x86-64 processor, with a minimum of 3.4 GB of HDD space for the MATLAB® package and 8 GB for installation. The RAM should be at least 4 GB, but 8 GB is highly recommended.

4.2 Neural Network Implementation

We created a multilayer feedforward backpropagation ANN with one hidden layer. The numbers of neurons in the input layer, hidden layer, and output layer are 16, 32, and 16, respectively (see Figure 19). There are n samples in the known training keys and m samples in the unknown keys. We set the link key required for authentication as a 32-bit key K. We then split K into two halves, K1 and K2. We input K1, the first 16 bits of each key, into the neural network and compute K_output, with K2 as the target. After the training, the trained ANN has the n keys stored in it. When K_output matches K2, the system authenticates.

Figure 19. MATLAB implementation of neural network.

In our experimentation, n = 100 and m = 1000 (Table 2), and the number of training epochs is 200.

Table 2. MATLAB workspace data.

Name                             Value
keys                             32 x 100 double
keys_length                      32
known_keys_output                16 x 100 double
net                              1 x 1 network
num_bits_differ_known_keys       1 x 100 double
num_bits_differ_unknown_keys     1 x 1000 double
num_of_keys                      100
num_of_unknown_keys              1000
t_train                          16 x 100 double
tr                               1 x 1 struct
unknown_keys                     32 x 1000 double
unknown_keys_input               16 x 1000 double
unknown_keys_output              16 x 1000 double
unknown_keys_target              16 x 1000 double
x_train                          16 x 100 double
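The training setup described above can be reproduced roughly with the Deep Learning Toolbox as sketched below. This is not the thesis's exact script: the variable names mirror the workspace in Table 2, but the key generation and the bit-difference check are simplified assumptions.

% Rough sketch of the key-storing network (see Figure 19 and Table 2).
num_of_keys = 100;
keys    = randi([0 1], 32, num_of_keys);   % random 32-bit link keys, one per column
x_train = keys(1:16, :);                   % K1: first 16 bits as network input
t_train = keys(17:32, :);                  % K2: last 16 bits as training target
net = feedforwardnet(32);                  % one hidden layer with 32 neurons
net.trainParam.epochs = 200;
[net, tr] = train(net, x_train, t_train);  % store the known keys in the weights
known_keys_output = net(x_train);          % K_output: compared against K2 (t_train)
num_bits_differ_known_keys = sum(abs(round(known_keys_output) - t_train), 1);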

4.2.1 Results

At the end of our training, as shown in the performance graph (see Figure 20) and the training state plot (see Figure 21), the best training performance was 0.060238 and the gradient was 0.00014975 at epoch 200.

Figure 20: Performance graph.

Figure 21: Training state plot.

Zero error was recorded (see Figure 22), and the R-squared value from the regression plot is 0.9694 (see Figure 23). This implies that the model explains approximately 97% of the variance in the targets.

Figure 22: Error histogram.

Figure 23: Regression plot.

5 CONCLUSION AND FUTURE WORK

In this thesis, we discuss the importance of Bluetooth technology in today's digital world, its evolution, its technology, and recent security threats and vulnerabilities. We proceed to explain the concept of deep learning, a subdivision of machine learning, which is itself a subset of artificial intelligence. We then introduce the concept of the artificial neural network, which is fundamental to deep learning. The perceptron, multilayer perceptron, and feedforward neural network are explained, leading to the backpropagation algorithm, which is very important in training neural networks.

Using the MATLAB neural network toolbox, we set up a feedforward neural network with the backpropagation algorithm. The numbers of neurons in the input, hidden, and output layers are 16, 32, and 16, respectively. We represented the 32-bit link key of the Bluetooth authentication in the network weights, creating 100 random known keys for training and 1000 unknown keys for testing.