
DATA FORMATS IN UPLINK BASEBAND PROCESSING

Master of Science Thesis
Faculty of Information Technology and Communication Sciences
Examiner: Prof. Jari Nurmi
January 2021


ABSTRACT

Alper Özaslan: Data Formats in Uplink Baseband Processing
Master of Science Thesis
Tampere University
Wireless Communications and RF Systems
January 2021

Baseband processing in uplink transmission requires high dynamic range. Thus, the traditional 16-bit fixed point format is not adequate. The single precision and double precision floating point formats defined by the IEEE-754 standard require large memory space due to their long word length. A better solution must be sought in order to achieve the best performance. Data formats affect dynamic range, accuracy, silicon area, memory alignment, and hardware and software effort.

Simple methods such as scaling the symbols, truncation or saturation are not feasible, resulting in loss of data and low accuracy. A possible solution to achieve high dynamic range and low word length is to use special data formats that stand between the traditional fixed point and floating point formats, providing high dynamic range and sufficient accuracy with minimum data width.

This thesis work focuses on the effects of word length on accuracy for uplink baseband processing blocks. The dynamic range demand varies from one baseband process to another. This thesis presents different number formats that can be supported in different simulation scenarios to achieve high performance with a reduced word length. Simulation results are obtained using Matlab. Due to confidentiality, the code and the implementations are not presented within this thesis report. The results compare traditional data formats and the proposed new configurations in terms of throughput, bit error rate, block error rate and signal to interference plus noise ratio with respect to varying receiver attenuation. The simulation results show that the highest storage saving per physical resource block (PRB) is 37.5% in the scenario with high complexity and 33.3% in the scenario with lower complexity. It is achieved by reducing the fraction part of cyclic prefix removal to 9 bits and of channel estimation to 7 bits in the respective scenarios.

Keywords: Baseband Processing, Uplink, Cyclic Prefix, Dynamic Range, Fixed Point, Floating Point, Throughput, Bit Error Rate, Block Error Rate, Signal to Interference Plus Noise Ratio, Truncation, Saturation, Modulation and Coding Scheme, Physical Resource Block

The originality of this thesis has been checked using the Turnitin OriginalityCheck service.


PREFACE

This Master of Science thesis work was carried out with a company and the Faculty of Information Technology and Communication Sciences at Tampere University (Finland). I would like to thank my supervisors, Erno Salminen from the company and Jari Nurmi from Tampere University, for their technical support and feedback in this work. I received tremendous help from them and I would like to express my special gratitude.

I would like to thank my parents, Nermin and Özgür, who always supported me in every aspect of my life. I always achieved my goals and managed to be successful thanks to them. Thanks to both of my sisters, Deniz and Pınar, for supporting me and giving morale and advice throughout difficult times. Finally, I would like to thank my friends Kaan and Matteo for sharing good and difficult moments.

Thanks to everyone who was with me in Finland and to those who supported me from outside.

Tampere, 25th January 2021 Alper Özaslan


CONTENTS

1 Introduction
2 Number Formats
   2.1 Unsigned and Signed Binary Numbers
   2.2 Traditional Fixed Point (FxP)
   2.3 Floating Point
   2.4 Special Formats
   2.5 Fixed Point & Floating Point Arithmetic
   2.6 Storage and bandwidth requirements
   2.7 Fixed Point & Floating Point Performance Comparison
3 Baseband Uplink Processing
   3.1 Resource Grid
   3.2 Measurement Metrics
4 Modelling
   4.1 Simulation Environment
   4.2 Simulation Scenario
   4.3 Quantization
   4.4 Saturation
   4.5 Quantization Points
5 Results
   5.1 Matlab Results for Reference Models
   5.2 Quantized CP Removal and FFT
   5.3 Quantized Equalization
   5.4 Quantized Channel Estimation
6 Conclusion
References


LIST OF FIGURES

1.1 Telecommunication System Structure (Adapted from [25])
1.2 Digital Receiver Front End (Adapted from [24])
2.1 Example of 16-bit unsigned and signed binary number. Note how the same bit patterns can be interpreted in multiple ways.
2.2 Example of 1.4.3b Fixed Point Format
2.3 Dynamic Range and Step Size of Fixed-Point Numbers
2.4 A sinusoidal sampled by different FxP formats in both time and frequency domain
2.5 Quantization error for FxP 1.0.1 and FxP 1.0.2 formats relative to the original sinusoidal
2.6 IEEE 754 Single Precision Floating Point (1 sign bit, 8 exponent bits, 23 mantissa bits) and bfloat16 (1 sign bit, 8 exponent bits, 7 mantissa bits) representations, with binary-to-decimal conversion examples 242 and -1.75 for both data formats
2.7 Structure of all number formats for both I and Q components
2.8 4-bit Ripple Carry Adder [30]
2.9 4-bit Multiplier [16]
2.10 Floating Point Adder (Adapted from [43])
2.11 Floating Point Multiplier [41]
2.12 Memory Alignment of Number Formats with Arbitrary Word Length
2.13 Energy vs Timing Chart for Fixed and Floating Point [20]
2.14 Accuracy of Separation and Execution Time Comparison for Fixed and Floating Point Implementations [37]
3.1 Baseband Processing Blocks in Uplink
3.2 A sinusoidal obtained from frequency components 15 kHz and 30 kHz along with FFT and IFFT conversions
3.3 Cyclic Prefix Addition (Adapted from [51])
3.4 Resource Grid and Slot Configuration [3]
4.1 Main Stages in Matlab Simulation & Analysis
4.2 Quantization points in CP Removal and FFT Block. Orange (1) indicates optimum fixed point implementation of the whole block and Green (2A, 2B, 2C) indicates individual quantizations for computations CP Removal, FFT and Scaling
4.3 Quantization points in Channel Estimation Block. Orange (1) indicates optimum fixed point implementation of the whole block and Green (2A, 2B) indicates individual quantizations for inputs RE De-mapped and pilot sequence
4.4 Quantization points in Equalization Block. Orange (1) indicates optimum fixed point implementation of the whole block and Green (2A, 2B, 2C) indicates individual quantizations for parameter Weight Matrix and computations Equalization and Scaling
5.1 Reference BER values for fixed and floating point reference models when MCS is 1 and 22
5.2 Reference BLER values for fixed and floating point reference models for MCS 1 and MCS 22
5.3 Reference Throughput values for fixed and floating point reference models for MCS 1 and MCS 22
5.4 BLER with respect to SINR for reference models and locally quantized CP Removal & FFT block
5.5 BLER with respect to varying Rx Attenuation for reference models and locally quantized CP Removal & FFT block
5.6 BLER for each quantization point in Equalization Block. The optimum case has 24 bits for the Weight Matrix and 12 bits for both Scaling and Equalization computation; fraction bits in the quantization points are reduced to 7 bits individually
5.7 BER for each quantization point in Equalization Block (Fig. 4.4). The optimum case has 24 bits for the Weight Matrix and 12 bits for both Scaling and Equalization computation; fraction bits in the quantization points are reduced to 7 bits individually
5.8 BLER for each quantization point in Channel Estimation Block. The optimum case has 12 bits for both Pilot Sequence and De-mapped Data; fraction bits for both parameters (Fig. 4.3) are reduced to 7 bits individually
5.9 BER for each quantization point in Channel Estimation Block. The optimum case has 12 bits for both Pilot Sequence and De-mapped Data; fraction bits for both parameters (Fig. 4.3) are reduced to 7 bits individually


LIST OF TABLES

2.1 Data format comparison in terms of number of bits, range and storage space per PRB
4.1 Fixed parameters and variables for scenario creations
5.1 Configured parameters and computations in CP removal & FFT block
5.2 Configured parameters and computations in equalization block
5.3 Configured parameters and computations in channel estimation block
6.1 Summary Table

LIST OF SYMBOLS AND ABBREVIATIONS

3GPP 3rd Generation Partnership Project
5G Fifth Generation (Cellular Communication)
5GNR 5G New Radio
ADC Analog to Digital Converter
AFE Analog Front End
AGC Automatic Gain Control
AWGN Additive White Gaussian Noise
BER Bit Error Rate
bfloat16 Brain Floating Point
BFP Block Floating Point
BLER Block Error Rate
bps Bits per second
BS Base Station
BSC Base Station Controller
BTS Base Transceiver Station
CP Cyclic Prefix
CPU Central Processing Unit
CRC Cyclic Redundancy Check
DC Direct Current
DFE Digital Front End
DL Downlink
DSP Digital Signal Processor
E Exponent
eCPRI enhanced Common Public Radio Interface
eFxP Exponent Scaler Fixed Point
FDD Frequency Division Duplex
FFT Fast Fourier Transform
FOC Frequency Offset Correction
FPGA Field Programmable Gate Arrays
FxP Fixed-Point
HARQ Hybrid Automatic Repeat Request
HW Hardware
Hz Hertz
I In-phase Signal
IEEE Institute of Electrical and Electronics Engineers
IFFT Inverse Fast Fourier Transform
IQFP IQ Floating Point
ISI Inter-symbol Interference
K Thousand
LDPC Low-Density Parity-Check
LSB Least Significant Bit
LTE Long Term Evolution
M Mantissa/Significand
MCS Modulation and Coding Scheme
MIMO Multiple-Input Multiple-Output
MMSE Minimum Mean-Squared Error
MSB Most Significant Bit
N Number
O-RAN Open Radio Access Network
OFDM Orthogonal Frequency Division Multiplexing
PRB Physical Resource Block
Q Quadrature Signal
QAM Quadrature Amplitude Modulation
QPSK Quadrature Phase Shift Keying
RE Resource Element
RF Radio Frequency
RM Radio Module
Rx Receive
s Sign
SC Subcarrier
SCL Successive Cancellation List
SCS Subcarrier Spacing
SINR Signal to Interference plus Noise Ratio


1 INTRODUCTION

This thesis studies how number formats affect the performance of a telecommunication system. Telecommunications is the technology for transmitting any kind of data or information signal from one point to another. These information signals (audio, video, text, etc.) are transmitted with a transmitter through a noisy channel (environment) and received at the destination point by a suitable receiver device.

Fig. 1.1 presents a simplified wireless telecommunication system. User equipment (UE) forms a connection to a Base Transceiver Station (BTS) that is in range. The UE sends data to the base station, which is connected through the central Base Station Controller (BSC) to the Core Network. The data is then sent from the Core Network to another user following the same concept [25]. The term "uplink" (UL) means towards the base station, whereas "downlink" (DL) means towards the user equipment.

Figure 1.1. Telecommunication System Structure (Adapted from [25])

Along with the fast growth of the number of users in wireless applications, new methods are sought and developed to provide the best quality for the customers. This increase in demand leads to technical challenges where high performance in transmitting and receiving data is needed.

Digital systems in general require more hardware and higher complexity compared to analog systems. Nevertheless, using digital systems provides higher stability, flexibility and reliable reproduction of the processed data. Analog signals tend to be affected more by environmental effects, leading to difficulties in reproducing a signal [10]. A combination of analog and digital systems therefore provides an optimal solution for many telecommunication systems.

Figure 1.2. Digital Receiver Front End. Adapted from [24].

Fig. 1.2 shows a front end receiver architecture of a Radio Module (RM) on a base station. The uplink signal received by the antenna is first processed by the Analog Front End (AFE), which includes components such as filters and op-amps. The processed analog signal from the AFE is sampled at a fixed rate and converted to the digital domain by an Analog to Digital Converter (ADC), since digital processing provides higher flexibility than analog processing [39]. The Digital Front End (DFE) performs many operations, such as cross-talk cancellation, digital down-conversion, sample-rate conversion, and channel filtering. In other words, the DFE is the interface between the RF signal chain and baseband processing [24].

This thesis concentrates on uplink baseband processing and the following sections introduce the main steps there.

This thesis is structured as follows:

• Chapter 2 introduces general information related to number formats and how binary numbers can be represented with these formats. Along with the common number formats, fixed and floating point, it introduces special number formats that are placed between the common formats. A comparison follows in terms of arithmetic, performance, required storage, dynamic range and precision. The main focus here is on fixed and floating point formats.

• Chapter 3 briefly covers the baseband processing blocks in uplink, including the structure of a resource grid and the measurement metrics that this thesis work mainly focuses on.

• Chapter 4 explains the proposed simulation environment, the scenarios chosen for the simulations and the simulation setups. In addition, it introduces quantization, its configuration, and saturation concepts. Lastly, it presents the points where quantization is applied locally within different baseband processing blocks.


• Chapter 5 presents the reference models and the results that will be compared with the tested number format configurations. It then presents the results obtained from the simulation cases proposed in Chapter 4. The comparison is made within each baseband processing block separately. The main measurement metrics this thesis provides are BER, BLER and SINR.

• Chapter 6 concludes this thesis work by summarizing the overall idea, the methods used, the simulation cases and finally the results obtained in Chapter 5. Moreover, this chapter summarizes the storage saving obtained from the simulation results for each simulation case and provides proposals for the continuation of the thesis work.


2 NUMBER FORMATS

Baseband processing requires high dynamic range, especially in uplink [40]. The distance variation between users causes the bins to have different power levels; the difference may be tens of decibels (dB) [52]. Adding one bit to the binary word length doubles the dynamic range, but it also increases the silicon area and power consumption. Hence, there is a trade-off between accuracy and cost. For example, simple methods, such as scaling the symbols, rounding, saturation (keeping the least significant bits) or truncation (keeping the most significant bits), have limited use when high accuracy is needed [55].

The performance of digital signal processing (DSP) depends heavily on number formats. The traditional binary number format, fixed point (FxP), is not feasible where more dynamic range and accuracy are required [40]. Floating point formats provide high dynamic range compared to fixed point implementations; however, they require more complex arithmetic units, consume more power and imply slower operation in general [48]. Better alternatives must be sought and applied to replace the traditional fixed point format and IEEE-754 standard floating point in order to overcome the stated problems. This thesis studies and compares different formats and provides new solutions that can be implemented in future studies, namely eFxP, IQFP, BFP and bfloat16, which are introduced in detail later in this chapter. The proposed formats are then compared in terms of dynamic range, accuracy, silicon area, memory alignment and conversion between the formats.

2.1 Unsigned and Signed Binary Numbers

An N-bit binary number can represent 2^N values. For example, a 16-bit binary number has 2^16 = 65 536 = 64K possible values. It is then a matter of interpretation what these values represent: integers, fractional numbers, floating point numbers, etc. Most processors have a word length that is a multiple of 8 bits (1 byte), e.g. 8, 16, 32, or 64 bits. In contrast, hardware modules and application-specific processors can support arbitrary widths, e.g. 14, 18, 24, or 128 bits [6] [29].

The unsigned binary number format has non-negative integer values only. Fig. 2.1 shows an example of an 8-bit unsigned binary number with the distribution of its weights. The least significant bit (LSB) is usually indexed with 0, and then the weight of each location becomes 2^i, i.e. 1, 2, 4, 8, ...

The signed binary number format can represent both negative and non-negative integer numbers. The most common format is the so-called two's complement. Fig. 2.1 presents the weights of an 8-bit signed binary number as an example. Moreover, we notice that the most significant bit does indicate the sign of the number ('1' for negative values), but that is a different concept than the "sign bit" used in floating point numbers.

Figure 2.1. Example of 16-bit unsigned and signed binary number. Note how the same bit patterns can be interpreted in multiple ways.

For example, a 16-bit word can be interpreted as an unsigned or a signed integer, and the range is

• [0 .. 2^16−1] = [0 .. 65 535] = [0 .. 64K) as unsigned
• [−2^15 .. 2^15−1] = [−32 768 .. 32 767] = [−32K .. 32K) as signed

Note that the signed range is not symmetric, but usually that does not matter.
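As a small illustration (not from the thesis), the same 16-bit pattern can be reinterpreted in Matlab using the built-in bin2dec and typecast functions; the bit pattern below is chosen arbitrarily:

% The same 16-bit pattern read as unsigned and as two's-complement signed.
word = uint16(bin2dec('1111111111110010'));       % 0xFFF2
u = double(word);                                 % 65522 as unsigned
s = double(typecast(word, 'int16'));              % -14 as signed (two's complement)
fprintf('unsigned = %d, signed = %d\n', u, s);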

2.2 Traditional Fixed Point (FxP)

In digital signal processing applications, the fixed point format is used to provide low cost and power consumption compared to floating point [47]. We use the notation s.x.y, where the fields indicate the sign, integer and fraction parts, respectively. The binary point is not (usually) stored anywhere, but "kept in mind" while designing the algorithms. Note that the sign and integer parts (s.x) are here stored together in two's complement format. There is not actually any explicit sign bit, but this is just a handy notation to distinguish unsigned and signed numbers.

Figure 2.2.Example of 1.4.3b Fixed Point Format

Fig. 2.2 shows an example of an FxP 1.4.3 number, which has range [−16 .. 16) with steps of 2^−3. Interpreting this example as a signed integer gives −14. That can be converted to a decimal number as −14/8 = −1.75, since we have 8 different fractional values.
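The s.x.y convention can be mimicked with plain double arithmetic; the following is a minimal sketch (not the thesis code) that reproduces the 1.4.3 example above:

% Encode/decode a value in a generic s.x.y fixed-point format (here 1.4.3).
x = 4; y = 3;                               % integer and fraction bits
v = -1.75;                                  % value to represent
q = round(v * 2^y);                         % stored integer, here -14
q = max(min(q, 2^(x+y) - 1), -2^(x+y));     % keep it inside the signed range
back = q / 2^y;                             % decoded value: -14 / 8 = -1.75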


(a) Step Size

(b) Dynamic Range

Figure 2.3. Dynamic Range and Step Size of Fixed-Point Numbers

It is possible to adjust the accuracy and dynamic range of an algorithm by defining a desired number of fraction and integer bits. In general, the integer part has a signed range [−2^x .. 2^x−1] and the fractional part defines the step size as 2^−y, as shown in Fig. 2.3.

Adding more fraction bits makes the step size smaller, i.e., increases accuracy. Adding more integer bits increases the dynamic range, i.e., the difference between the largest and the smallest values. Both change exponentially as a function of word length.

The traditional fixed point format used in baseband processing is 16 bits, e.g., FxP 1.0.15, so that the range is [-1 .. 1). This has the handy property that the result of a multiplication remains in the same range. In some cases, more bits are needed for better performance, or vice versa, we can use fewer bits while maintaining an adequate performance level with cheaper HW. In order to obtain values outside of this range, scaling is used, causing a loss in accuracy, which is not always the best solution [27]. It is important to have the right number of bits to prevent loss of data. The original data might be lost due to the limited accuracy of the data format.


(a) Time Domain

(b) Frequency Domain

Figure 2.4. A sinusoidal sampled by different FxP formats in both time and frequency domain.

Fig. 2.4 shows a sinusoidal wave in the time and frequency domain when represented in different FxP formats. Here, signals sampled with a larger fraction part give results closer to the original signal due to higher accuracy. The frequency domains of these signals are obtained by taking their Fast Fourier Transform (FFT). The output after the FFT gives the frequency components of the signal, which is 10 Hz in this example. Signals that are quantized with a higher number of fraction bits have an output closer to the original signal due to increased precision. The frequency domain clearly shows a spike at 0 Hz (DC bias) because the quantization rounds towards zero. This relative DC level decreases and the amplitude of the main frequency component increases as the number of fraction bits in the quantization increases.
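A minimal sketch of this kind of experiment (not the confidential thesis code; the sampling rate and quantizer are illustrative assumptions) quantizes a 10 Hz sinusoid with a few fraction-bit counts and inspects the error and the spectrum:

% Quantize a 10 Hz sinusoid with different numbers of fraction bits and
% look at the quantization error and the magnitude spectrum (cf. Fig. 2.4/2.5).
fs = 1000; t = 0:1/fs:1-1/fs;                 % 1 s of samples at 1 kHz
x  = sin(2*pi*10*t);                          % reference signal
for y = [1 2 4 8]                             % fraction bits
    xq  = fix(x * 2^y) / 2^y;                 % quantize by rounding towards zero
    err = x - xq;                             % quantization error
    X   = abs(fft(xq)) / numel(xq);           % bin 1 is DC, bin 11 is the 10 Hz tone
    fprintf('y = %d   max|err| = %.3f   DC = %.3f   tone = %.3f\n', ...
            y, max(abs(err)), X(1), 2*X(11));
end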


DC bias. In order to obtain and evaluate more realistic results, noise can be introduced to the original signal.

Figure 2.5. Quantization error for FxP 1.0.1 and FxP 1.0.2 formats relative to the original sinusoidal

2.3 Floating Point

Floating point formats are used in systems where both very large and very small numbers are required. In general, DSPs that support floating point are more expensive and require higher power consumption compared to fixed point ones. However, floating point provides high accuracy and high dynamic range [27]. According to the IEEE 754 standard [26], a binary floating point number with single precision is 32 bits, consisting of 1 sign bit, 8 exponent bits and 23 significand (also called 'mantissa') bits. The value of a single precision floating point number is given in (1), where s is the sign, M is the significand, and E is the exponent. The value 127 in the last term is the bias for single precision and 2 is the base since the format is binary [11].

(−1)^s * 1.M * 2^(E−127)     (1)

A special floating point format, Brain floating point (bfloat16), is used where reduced precision is feasible and increased throughput is needed [23] [9]. It is formed of one sign bit, eight exponent bits and seven mantissa bits. The range of a bfloat16 is the same as that of a 32-bit floating point, but the precision is lower due to the truncated mantissa size. Moreover, the reduced mantissa size provides reduced silicon area and multiplier power consumption.


(a) IEEE 754 Single Precision Floating Point to Decimal Conversion

(b) Brain Floating Point Format (bfloat16) to Decimal Conversion

Figure 2.6. IEEE 754 Single Precision Floating Point, which consists of 1 sign bit, 8 exponent bits and 23 mantissa bits, and the bfloat16 representation, which is structured as 1 sign bit, 8 exponent bits and 7 mantissa bits. Binary-to-decimal conversion examples 242 and -1.75 for both data formats.

Floating-point numbers are normalized in the form 1.M x 2^E. The '1' here is an additional bit of the significand that is always one. This bit is called the hidden bit [42], and it is not stored in memory since its value is known. Fig. 2.6 shows conversions from the IEEE-754 single precision floating point and bfloat16 formats to decimal. The hidden bit is not shown in the figure.
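To make the conversion concrete, the bit fields of a single precision number can be unpacked directly. The snippet below is only an illustration (it assumes a normalized number, and the bfloat16 value is formed by simply truncating the mantissa to its top 7 bits):

% Decode an IEEE-754 single precision number and form the corresponding
% bfloat16 value by truncating the mantissa (normalized numbers only).
w = typecast(single(-1.75), 'uint32');          % raw 32-bit pattern
s = double(bitshift(w, -31));                   % sign bit
E = double(bitand(bitshift(w, -23), 255));      % 8 exponent bits
M = double(bitand(w, 2^23 - 1));                % 23 mantissa bits (hidden bit not stored)
value = (-1)^s * (1 + M/2^23) * 2^(E - 127);    % -> -1.75, as in Fig. 2.6
bf = bitand(w, bitcmp(uint32(2^16 - 1)));       % keep sign, exponent, top 7 mantissa bits
value_bf16 = typecast(bf, 'single');            % bfloat16 value read back as a single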

2.4 Special Formats

O-RAN (eCPRI) [14] is a standard interface between the radio module and baseband processing. It defines a number format called Block Floating Point (BFP), which is a combination of fixed point and floating point. It can provide higher dynamic range and improved signal-to-noise ratio with modest optical fiber bandwidth compared to floating point or fixed point arithmetic with an equal number of bits [31] [56] [38]. The main idea is that a block of fixed point values shares the same exponent. Here the block size is fixed to 1 physical resource block (PRB), which has 12 sub-carriers, i.e., 12 IQ samples. The default BFP consists of 1 sign bit and 9 mantissa bits for each resource element (RE) and a shared 4-bit exponent for each PRB.
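The sketch below illustrates the shared-exponent idea for one PRB. The mantissa width and shared 4-bit exponent match the description above, but the exact scaling and rounding rules of the O-RAN BFP definition are simplified here:

% Block floating point for one PRB: 12 complex samples, a 10-bit two's
% complement mantissa per I/Q component and one shared 4-bit exponent.
iq    = randn(12,1) + 1j*randn(12,1);                % one PRB of IQ samples
mbits = 10;                                          % sign + 9 mantissa bits
peak  = max([abs(real(iq)); abs(imag(iq))]);         % largest component in the block
expo  = min(max(ceil(log2(peak)), 0), 15);           % shared exponent, 4 bits -> 0..15
scale = 2^(mbits - 1) / 2^expo;                      % mantissa LSB derived from the exponent
mI = max(min(round(real(iq)*scale), 2^(mbits-1)-1), -2^(mbits-1));
mQ = max(min(round(imag(iq)*scale), 2^(mbits-1)-1), -2^(mbits-1));
iq_hat = (mI + 1j*mQ) / scale;                       % values reconstructed by the receiver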

There is also a specific format dubbed IQ Level Floating Point (IQFP), which is a 32-bit float in the form 14+14+4 bits. It can be used for processing IQ modulated signals, where I means in-phase (real part) and Q means quadrature phase (imaginary part). This format differs slightly from the IEEE format, since the I and Q components have a 14-bit mantissa in two's complement format (no sign bit). The exponent is a 4-bit unsigned value and is common to both components.

Figure 2.7. Structure of all number formats for both I and Q components

Another special format is the exponent scaler fixed point (eFxP). Although this number format is fixed point, it contains a 4-bit exponent in addition to the traditional 16-bit FxP and is mainly used with fixed point arithmetic. The exponent bits are used for scaling the step size as 2^n, where n is a non-negative integer, i.e. scaling by 1, 2, 4, 8, ... Higher accuracy can be achieved with low exponent values, or the accuracy can be lowered to reach the range limits with an increased exponent value. It varies case by case how many values share the same scaling exponent, e.g. from 1 PRB (same as BFP) to 273 PRBs (a full 100 MHz symbol). Fig. 2.7 summarizes the structure of all number formats, including both I and Q components.

2.5 Fixed Point & Floating Point Arithmetic

Adders play an important role in processors for various operations such as counters.

Power consumption, dissipation, silicon area and speed requirements determine the design of arithmetic units. Fig. 2.8 represents an adder type called the ripple carry adder. In this design, multiple full adders are cascaded. In this logic circuit, a full adder corresponds to 1 bit. The first bits of the two numbers are added in the first adder, giving carry and sum outputs. The carry out of the first adder is the carry input of the second adder, and so on. Fixed point adders and multipliers are core elements used in fixed point processors, which basically add/subtract or multiply two input operands, respectively. Integer arithmetic, e.g., a ripple carry adder, can provide the notation in fixed point [30].

Figure 2.8. 4-bit Ripple Carry Adder [30].

• S = A xor B xor Cin

• Cout = AB + BCin + ACin
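As a small bit-level illustration of these full-adder equations (not from the thesis), the ripple-carry structure can be simulated on logical vectors:

% 4-bit ripple-carry adder built from the full-adder equations above
% (bit vectors are LSB first; here a = 5 and b = 3).
a = logical([1 0 1 0]);  b = logical([1 1 0 0]);
c = false;  s = false(1, 4);
for i = 1:4
    s(i) = xor(xor(a(i), b(i)), c);                  % S = A xor B xor Cin
    c    = (a(i) & b(i)) | (b(i) & c) | (a(i) & c);  % Cout = AB + BCin + ACin
end
result = sum([s c] .* 2.^(0:4));                     % -> 8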

Fig. 2.9 illustrates a 4-bit integer multiplier architecture. As a first step, each bit of multiplicand A is multiplied by the least significant bit of multiplier B using 'and' gates. Then each bit of A is multiplied one by one with the remaining bits of B, and the partial products are shifted and added using 4-bit adders. It is important to note that N−1 4-bit adders are needed, where N is the number of multiplier bits. The area of the multiplier increases quadratically as the width of the operands increases, while the area of an integer adder increases only linearly.

Figure 2.9. 4-bit Multiplier [16].

• A = a3a2a1a0


then fed to a right shifter. This shifter aligns the significands (mantissas) of the two numbers by the difference of the exponents. This process is followed by adding the significands. The sum is later normalized to obtain the desired format. For example, if the result is 0.001xxxx..., the significand of the sum is left shifted and the exponent adjusted until the most significant bit is 1. Here it is also checked whether the sum over- or underflows. In parallel to the mantissa and exponent computations, the sign bit handling is done using an XOR operation. It is important to note that this example assumes both addends are in IEEE-754 floating point format. Moreover, all blocks mentioned previously in the floating point adder, except the adder/subtractor itself, are extra processing blocks compared to the fixed-point arithmetic counterpart.

Figure 2.10. Floating Point Adder (Adapted from [43])

In a floating point multiplier (Fig. 2.11), the mantissa multiplication is performed in the multiplier tree and passed to the add/round-normalization block, similar to the floating point adder. The multiplication is followed by rounding and normalization. In the exponent logic, the exponents are added. Lastly, the sign logic is performed using an XOR operation [41] [43].


Figure 2.11. Floating Point Multiplier [41].

2.6 Storage and bandwidth requirements

Ideally, smaller number formats are packed "tightly" into memory or onto the transmission channel.

Fig. 2.12 shows the memory alignment for number formats with different word lengths. In this figure, numbers are 32-bit word aligned. Data is either tightly packed or padded with dummy bits to fit the storage boundary. As can be seen in the figure, some number formats cannot fit within the boundaries and some portion of the data jumps to the next line. Moreover, in the tightly packed approach, numbers are stored one after another without any gaps. However, in practice [22], numbers must often be aligned to certain boundaries, which wastes some memory space. For example, if 28-bit values are always aligned to 32-bit words, there will be 4 unused bits. This is called padding, and the storage fitting is done with dummy bits.

Table 2.1 compares different data formats in terms of number of bits, range and storage space required per PRB. The storage is calculated as follows: number of bits per subcarrier x number of subcarriers per block + number of shared bits per block. 16+16b fixed point takes 12 * 32 = 384 bits, and single-precision floating point takes 12 * 64 = 768 bits.
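The same bookkeeping can be written out directly; the figures below simply restate the rows of Table 2.1:

% Storage per PRB = bits per IQ sample * 12 sub-carriers + shared exponent bits.
% Order: Traditional FxP, FLP single, FLP double, eFxP, BFP, bfloat16, IQFP.
bits_iq = [32 64 128 32 20 32 28];        % bits per IQ sample
shared  = [ 0  0   0  4  4  0  4];        % bits shared by the whole PRB
per_prb = bits_iq * 12 + shared;          % -> 384 768 1536 388 244 384 340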

It is notable that single-precision floating point provides a massive range while using only twice the storage space of the traditional 16-bit FxP. Moreover, bfloat16 achieves the same range as single-precision floating point with the same storage space per PRB as the traditional FxP. The high range of bfloat16 comes from having the same number of exponent bits, at the cost of lower accuracy due to its truncated mantissa size. Block floating point (BFP) uses 4 exponent bits shared within each PRB. Hence, it takes 12 * 20 + 4 = 244 bits of storage space, whereas eFxP and IQFP take 388 and 340 bits per PRB respectively with the same approach. The eFxP format provides an increased range with only 4 bits of extra storage space per PRB compared to traditional FxP, at the cost of reduced accuracy. According to this table, single and double-precision floating point take the most storage space and BFP takes the least.

Figure 2.12. Memory Alignment of Number Formats with Arbitrary Word Length

Table 2.1. Data format comparison in terms of number of bits, range and storage space per PRB

Data format              # of bits (IQ)   Range         Storage space per PRB
Traditional FxP          32               ±1            384
FLP - Single Precision   64               ±3.4x10^38    768
FLP - Double Precision   128              ±1.8x10^308   1536
eFxP                     32 (+4)          ±8            388
BFP                      20 (+4)          ±256          244
bfloat16                 32               ±3.4x10^38    384
IQFP                     28 (2+2)         ±4            340

2.7 Fixed Point & Floating Point Performance Comparison

As stated earlier, floating point arithmetic is preferred where high dynamic range is required, e.g., in MIMO detection algorithms in baseband processing. In digital signal processing platforms, the use of fixed or floating point computations affects the performance of the algorithm in question. The performance of an algorithm can be evaluated in terms of accuracy, execution time, complexity, etc.

A study in [20] presents the performance of an algorithm used in both fixed- and floating-point single instruction multiple data (SIMD) processor cores in terms of energy consumption, execution time and complexity. There, MIMO detection algorithms are compared. Due to floating point's high dynamic range, it is possible to use a reduced-complexity implementation. On the other hand, fixed point needs additional calculations to compensate for this difference in dynamic range. In addition, the mantissa in floating point provides scaling, which has to be implemented separately in the fixed point counterpart.

However, the fixed point algorithm is less complex compared to floating point. Moreover, the two cores have different architectures for the MIMO detection algorithm. Therefore, it is important to evaluate these two processors with different algorithms in terms of performance.

Fig. 2.13 in [20] provides a timing chart obtained from different fixed and floating point algorithms implemented for MIMO detection. Different algorithms support different data widths. The figure shows the performance in terms of energy (E) versus clock cycle time (Tc). It is clear that floating point consumes significantly more energy than the fixed point implementations. However, it provides faster processing with more power consumption per cycle. Therefore, the superior choice between these two implementations depends highly on the used algorithm and the needs. Although the difference in energy efficiency between the two data format implementations can be negligible for certain algorithms, the difference changes according to parameters such as modulation order, as stated in the cited paper. The results show that the fixed point implementation's performance catches up with floating point in better channel conditions. Having a combination of these two processors provides both fixed and floating point benefits, which is one of the main discussions in this thesis work.

Figure 2.13. Energy vs Timing Chart for Fixed and Floating Point [20]

In [37], a comparison between a floating point implementation, a fixed point implementation, and floating point emulation on a fixed point DSP is presented. Fig. 2.14a represents the comparison between these three platforms in terms of CPU execution time with respect to the number of samples. The CPU execution time is considerably higher for the floating point emulation on the fixed point DSP compared to both the fixed and floating point implementations. The reason behind this huge difference is that the processor used for floating point emulation supports only fixed point computations. On the other hand, Fig. 2.14b presents the correlation coefficient obtained on the proposed platforms for different source signals. It is a measure that indicates the accuracy between the original signal and the separated one obtained from the different platforms. According to this figure, the difference is insignificant for each platform, i.e., the original signal is separated successfully. The most important point in this paper is that emulating floating point on a DSP that supports only fixed point gives poor performance and high execution time.

(a) Number of Samples vs Execution Time

(b) Correlation Coefficient

Figure 2.14. Accuracy of Separation and Execution Time Comparison for Fixed and Floating Point Implementations [37]


3 BASEBAND UPLINK PROCESSING

5G wireless communication requires complex computations and involves high energy consumption. Thus, different architecture types are required to achieve the best possible performance [17]. Baseband processing within the Radio Access Network (RAN) is split in various ways. Fig. 3.1 shows a simplified data flow of a signal received from the Digital Front End (DFE) through the baseband processor blocks. In this architecture, the first layer is split into two components, as indicated with different colors. The first part of this split includes baseband processing starting with CP Removal and ending with Equalization. The second half continues from equalization and ends with decoding.

Figure 3.1. Baseband Processing Blocks in Uplink

Crosstalk causes difficulties in reconstructing the received data, and Orthogonal Frequency Division Multiplexing (OFDM) is a common method to overcome this problem. This method uses multiple sub-carriers (SC), i.e., it divides the bandwidth into a defined number of slots. The sub-carriers are orthogonal to each other, which prevents undesired interference. Each sub-carrier has a frequency that is an integer multiple of the sub-carrier spacing (SCS), for example ±15, 30, 45 kHz [57]. A sub-carrier has I (cosine) and Q (sine) waves whose amplitude is modulated based on the encoded user data. OFDM is a suitable solution for mitigating frequency-selective fading, since the frequency response of the channel is almost flat for each sub-carrier. Moreover, this method provides high spectral efficiency and high tolerance to delay spread. Introducing guard intervals/spacing between these sub-carriers prevents inter-symbol interference [10] [21].

The very first module of baseband uplink processing in the base station is the Fast Fourier Transform (FFT). In general, downlink has the opposite operations; for example, the last step on the transmitter side is the Inverse Fast Fourier Transform (IFFT). Fig. 3.2 shows an example of both transforms. Two sinusoids with different frequencies are summed in the time domain and their transform in the frequency domain is shown. We can consider these as the I components of two sub-carriers. The figure also shows their sum, which corresponds to the signal transmitted via the antenna. (Once we sum thousands of subcarriers together, the result is not usually very informative for human readers.)

Figure 3.2. A sinusoidal obtained from frequency components 15 kHz and 30 kHz along with FFT and IFFT conversions

Cyclic Prefix (CP) addition is done along with the IFFT module upon transmission to reduce the effects of Inter-Symbol Interference (ISI) and to provide better channel estimation. ISI is the distortion that occurs when subsequent symbols interfere with each other and distort the signal. It happens, for example, when the radio signal propagates via multiple paths, so the effect resembles the echo of audio signals. Therefore, the end of the time-domain symbol is copied and added to the beginning of the same data. The length of the cyclic prefix is chosen such that all echoes of previous symbols have likely ceased before the actual data of the current symbol starts. This method also provides an efficient way of finding symbol borders. CP removal follows the same concept, but removes the added portion of the received data at the receiver side [15] [21].
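A toy Matlab sketch of the mechanism (illustrative only; the FFT size, CP length and sub-carrier indices are arbitrary and not the thesis configuration):

% Toy OFDM symbol with cyclic prefix addition and removal.
N  = 64;  cp = 16;                        % FFT size and CP length (arbitrary here)
X  = zeros(N, 1);  X([2 4]) = [1; 0.5];   % two active sub-carriers
x  = ifft(X, N);                          % time-domain symbol (transmitter IFFT)
tx = [x(end-cp+1:end); x];                % CP addition: copy the tail to the front
rx = tx;                                  % channel and noise omitted
y  = fft(rx(cp+1:end), N);                % CP removal + FFT recovers the sub-carriers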

Beamforming at the receiver side basically combines the received data, providing signal summation in the required angles and cancellation in other angles. Digital beamforming provides higher flexibility compared to analog beamforming by using reconfigurable devices such as Field Programmable Gate Arrays (FPGA). However, a combination of analog and digital beamforming (hybrid) is preferred in many applications [35].

Figure 3.3. Cyclic Prefix Addition (Adapted from [51])

The next module is Resource Element (RE) Mapping for the transmitter and RE De-Mapping for the receiver side, where symbols with complex values are mapped to resource elements in resource blocks and vice versa [4]. On the receiver side, the extracted bits are processed through the Equalization and Channel Estimation modules. As the name implies, channel estimation detects the channel effects in order to obtain the original transmitted symbols that were modified throughout the channel. This block processes pilot bits added as cyclic prefix. These channel effects are removed by the Equalization. This thesis work uses the minimum mean-squared error (MMSE) equalizer method. This method utilizes the SNR, providing high immunity to noise [33] [40] [53].
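For a single receive antenna and one resource element, the MMSE weight has a simple closed form. The snippet below is a generic illustration only; the channel value, noise variance and symbol are made up, and the actual multi-antenna weight matrix used in the thesis is more involved:

% Per-sub-carrier MMSE equalization for a single-antenna channel.
H        = 0.8 + 0.3j;                                  % channel estimate for one RE
noiseVar = 0.01;                                        % estimated noise power
s        = (1 + 1j) / sqrt(2);                          % transmitted QPSK symbol
y        = H*s + sqrt(noiseVar/2)*(randn + 1j*randn);   % received sample
W        = conj(H) / (abs(H)^2 + noiseVar);             % MMSE weight
s_hat    = W * y;                                       % equalized symbol for demodulation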

The second split starts with Layer De-Mapping. Basically, layer mapping distributes modulated symbols across different transmission layers. The rank of the channel matrix defines the number of layers. For instance, a 3x2 MIMO system has a rank of 2 and thus provides only 2 layers. Multi-layer transmission is supported in systems with OFDM. Layer de-mapping extracts the mapped codewords from the layers in uplink transmission. Demodulation, where a complex-valued symbol is converted back to bits, follows Layer De-Mapping [2] [33]. Supported modulation schemes include QPSK, 16-QAM, 64-QAM and 256-QAM.

In transmitters, scrambling randomizes the data at bit level, i.e., before symbols are generated. It introduces a scrambling sequence to secure the data and to prevent the so-called cyclostationary property [28]. The receiver removes this scrambling effect by descrambling to retrieve the information. One of the proposed ideas to detect this scrambling code is to observe the changes in the autocorrelation of the incoming signal. Rate De-Matching follows the descrambling process and matches the radio frame with the block size. Here, either additional bits are introduced or extra bits are removed.

Encoding in the transmitter is needed to provide better transmission in a non-ideal environment where intentional or unintentional interference may occur. Therefore, it is necessary to encode the data to be transmitted for higher reliability. One of the encoding techniques that 5G supports in uplink is Polar coding, introduced by Arikan [5] [7]. Decoders, which form the last baseband processing block in the first layer, basically process the data in-


Baseband processing includes many more operations, such as automatic gain control (AGC) and frequency offset correction (FOC). However, this thesis study does not go into further detail within the blocks shown in Fig. 3.1.

3.1 Resource Grid

Fig. 3.4 shows the so-called resource grid, which is a common way to present OFDM data. Time is on the X axis and sub-carrier frequency is on the Y axis. Each small box in the grid is a single frequency-domain IQ sample. User data and control information are allocated to this grid per slot. For example, UserA might get sub-carriers 0-50, UserB gets sub-carriers 51-80, and so on. In addition to user data, the 3GPP standard also defines when and where to send the control information [1].

Figure 3.4. Resource Grid and Slot Configuration [3]

This example resource grid is equivalent to the duration of one sub-frame and the whole carrier bandwidth, e.g., 500 µs and 100 MHz. A sub-frame consists of at least one slot, which is formed by 14 OFDM symbols. However, the 5G standards support multiple numerologies, and therefore the sub-carrier spacing and the number of slots and symbols within a sub-frame change. The details are not relevant for this thesis. The table in Fig. 3.4 shows the number of slots per sub-frame and the number of OFDM symbols per sub-frame for each numerology.

The smallest unit in this structure is called a resource element (RE), which corresponds to one subcarrier in the frequency domain and one OFDM symbol in the time domain. As mentioned, 12 consecutive REs form a physical resource block (PRB or RB). It is necessary to understand this term in order to understand the concepts of Block Error Rate and Block Floating Point, which are explained elsewhere in this thesis.

3.2 Measurement Metrics

The transmitted or received bit rate in a transmission system is measured in bits per second, abbreviated as bps [8]. As the name implies, it is the number of bits transmitted through the channel per second. However, the throughput is known only once we have simulated all layers of the functional stack.

Fortunately, bit error rate (BER) and block error rate (BLER) are indirect performance metrics that are easier to analyze [12] [34]. They can help in understanding how throughput changes, although there is no linear dependence. Channel fading, interference, multipath propagation, non-linearity of devices, and other factors distort the signal, which causes errors in data transmission. Channel coding (Turbo, Polar, LDPC, ...) seeks to detect and correct transmission errors [46]. After decoding we can measure this lowest level metric.

Control of data transfers happens in blocks, and BLER measures the fraction of blocks containing one or more errors. Hence these error rates are defined as

BER = N_bit_error / N_bit_total (3.1)

BLER = N_block_error / N_block_total (3.2)

BLER ≥ BER (3.3)

Inequality (3.3) holds because a single erroneous bit is enough to make the whole block count as erroneous.
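As an illustration of these definitions (toy numbers, not thesis data), both metrics can be computed from a simulated bit stream:

% BER and BLER from a simulated bit stream split into equal-size blocks.
txBits = randi([0 1], 1, 1200);                      % transmitted bits
rxBits = xor(txBits, rand(1, 1200) < 1e-2);          % received bits, ~1% flipped
bitErr = xor(txBits, rxBits);                        % bit error indicator
BER    = mean(bitErr);                               % Eq. (3.1)
blkLen = 100;                                        % bits per block
BLER   = mean(any(reshape(bitErr, blkLen, []), 1));  % Eq. (3.2): block fails on any bit error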

Digital communication systems have requirements concerning the maximum BER, which should not be exceeded. It is possible to achieve lower error rates with techniques such as channel equalization and channel coding. BER is a performance metric that can be measured at different processing blocks. It can either be measured after the demodulation process, where the ratio is between the erroneous bits and the total number of received bits, or after the decoding process, which compares false bits after the error correction process


kilobits. One or more incorrectly decoded symbols within a block lead to a block error, and the ratio is between the incorrectly decoded symbols and the total number of symbols within the block [32] [13]. An erroneous block is defined as a transport block whose cyclic redundancy check (CRC) is wrong. Corrupted blocks must be re-transmitted, which obviously reduces throughput. One common protocol is called Hybrid Automatic Repeat Request (HARQ). Similar to BER, BLER is calculated after the decoding process. When the error rate increases, the system may switch to simpler modulation schemes that are more robust against errors. When the number of errors grows too much, it will lead to call drops [19]. In the worst case, every block is corrupted and every re-transmission fails as well, so the throughput drops to zero. Therefore, simulation results may show abrupt changes in throughput when BLER becomes less than 1.


4 MODELLING

In engineering, errors are often detected during the integration of hardware and software. These errors can be costly and might lead to loss of time and effort. It is not always affordable or convenient to detect faults or errors after integration; therefore, modelling is proposed as a critical engineering technique. With modelling, it is possible to visualize the communication system, see its behavior in different scenarios and, more importantly, observe any errors, faults and unwanted performance or functionality. In this thesis work, the objective is to use a model that represents a communication system with defined scenarios and to test its performance.

4.1 Simulation Environment

This thesis work uses Matlab, which is a tool that supports the LTE, TDD, FDD and 5GNR standards. Moreover, it is possible to evaluate fixed-point and floating-point performance in different use cases. With Matlab, measurement metrics such as BLER, BER, throughput, SINR and many more parameters can be obtained and analyzed.

Fig. 4.1 shows the main steps of the simulation. The first step in this simulation environment is to modify an existing scenario or create a new one. After the scenario creation, we obtain a '.scn' file which contains all the fixed and varying parameters. This file contains all information about the cell and channel parameters, including the supported standard, carrier frequency, modulation order, subcarrier spacing, etc. Moreover, it is possible to select an appropriate simulation duration to define the data rate, or to select the number of base stations or user equipment and the antenna ports within these units. In this study, the receiver attenuation is defined as the variable. However, Matlab allows its users to define many other variables depending on the use case, including all the parameters defined in Table 4.1. The next step is to run Matlab, and that can be done in various ways. The default results are obtained using floating-point computation entirely. Since the scope of the thesis is to apply different number formats and analyze the difference to this default format, we need to change floating point into fixed point. There are two ways to apply this:

1. The simple way is to quantize only the inputs and outputs of baseband operations.

2. The harder way is to perform all baseband operations with internal calculations in fixed point.


Figure 4.1. Main Stages in Matlab Simulation & Analysis

The latter method is more accurate but also more complex to apply. In this thesis we mainly focus on changing the input and output parameters of the baseband processes. Moreover, a fixed-point switch, which is implemented by the designers in Nokia, is used to allow a comparison between these methods. Matlab creates a .res file where all the scenario parameters and results are dumped. The results in this file can be plotted with any tool, such as Excel or Matlab itself, for analysis.

4.2 Simulation Scenario

Since the scope of this thesis study is to observe the receiver side of a base station, the transmission type is chosen as uplink for both scenarios. The simulations in this thesis work support the 5G NR standard and the parameters are chosen according to this standard. In order to start with a simple case, the beamforming method is disabled. Different simulation lengths are chosen for each scenario to find the optimal duration for clear results. The trade-off is between obtaining more accurate results with more transmitted data and a longer simulation time.

Table 4.1 lists the fixed and variable parameters for both scenarios. These scenarios support both floating and fixed point implementations. Quantization is applied separately only at certain points in certain baseband processing blocks, within the Matlab code. Unfortunately, the structure of the code is not presented in this thesis due to confidentiality. Matlab stores all these defined parameters in '.scn' files to be executed later.

It is possible to vary any given parameter within this table for further analysis. However, it is convenient to make a sweep by changing the receiver attenuation to obtain throughput, BER and BLER with respect to varying SINR. The range of receiver attenuation is different for each scenario. Modulation and coding schemes affect the results significantly, and the idea behind these simulations is to show where these scenarios achieve transmission and where it shuts down completely. Therefore, the simulation range must contain both behaviours for a clear comparison in the end. Using a very wide receiver attenuation range is not preferred, in order to reduce simulation duration. Simulation parameters are the first parameters defined in Matlab. The 1st scenario is executed for receiver attenuation between -15 dB and +15 dB with 1 dB steps, while the 2nd scenario has the range of -30 dB to -6 dB with 2 dB steps. The results of many simulations show that it is adequate to use a simulation length of 1 second for both scenarios. The trade-off here is between simulation duration and more accurate results, where an increased simulation length increases the amount of transmitted data.

Table 4.1. Fixed parameters and variables for scenario creations

The cell and channel parameters follow the 3GPP 38211.g20 [1] standard and are the same for both scenarios. The main difference between these scenarios is the Modulation and Coding Scheme, which is 1 for the 1st scenario and 22 for the 2nd scenario. The channel type is chosen as additive white Gaussian noise (AWGN), which basically introduces wideband noise with a constant spectral density [49]. Lastly, the unit parameters consist of the number of user equipment, the number of antenna ports for the receiver


4.3 Quantization

Fixed point implementations are applied with a Matlab function called Quantize. This function takes the signal to be quantized and 5 separate input arguments. These input arguments are the Quantization Type, Saturation Type, Sign, Integer Bit and Fraction Bit amounts. The quantization type defines the type of rounding, i.e., whether the number is rounded towards the ceiling or the floor, etc. With this function, it is possible to arrange the total word length and the sign. Therefore, the accuracy and precision of the number format can be defined and optimized for an operation.

In the following section, effects of different word lengths and the change in fraction bits are presented and analyzed.

Quantized_signal = Quantize(Flp_signal, 6, 1, 1, 4, 3)

Quantization Type
0 = Return input data unchanged
1 = Round towards minus infinity
2 = Round half-way cases towards infinity
3 = Round always towards plus infinity
4 = Round towards 0
5 = Round towards plus/minus infinity for positive and negative values respectively
6 = Round half-way cases away from 0

Saturation Type
0 = No saturation
1 = Enable saturation in case of over/under flows
2 = Enable saturation
3 = Enable symmetric saturation (no logging)
4 = Enable symmetric saturation (logging enabled)

Sign
0 = Unsigned fixed-point format
1 = Signed fixed-point format

Number of Integer Bits

Number of Fraction Bits

4.4 Saturation

Arithmetic operations such as addition and multiplication require rounding and saturation features [45] [36]. A multiplication of two operands of size n produces a product of size 2n, i.e., an increased word length. This leads to increased area, power consumption, delay, etc. Therefore, rounding or saturating the result is essential to provide high performance. In case the result takes very high or low values, one solution is to saturate the result to the boundary levels, i.e., to the highest or lowest representable values. For example, in case of an overflow, all bits (n bits in the previous example) are set to 1, giving the highest possible unsigned number (2^n − 1). How much the word length can be reduced is determined by the dynamic range and accuracy requirements. In general, the concept in fixed-point conversions is to keep the integer word length such that overflows are prevented, and to reduce the fraction word length while keeping sufficient accuracy. For instance, in the traditional fixed-point format (FxP 1.0.15), any result exceeding its range saturates to 0.111...1 in the positive direction or 1.000...0 in the negative direction. This thesis study uses rounding towards minus infinity as the quantization type and enables saturation in case of over/underflows. The proposed function, Quantize, provides these features automatically by detecting over/underflows, then saturating and rounding the input.
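The actual Quantize implementation is confidential, so the following is only a minimal sketch of a quantizer with the same argument order; quantize_sketch is a hypothetical name, and only a few of the listed rounding modes and a simple saturation behaviour are covered:

function q = quantize_sketch(x, roundType, satType, signBit, intBits, fracBits)
% Minimal quantizer sketch: scale so the LSB becomes 1, round, saturate, rescale.
    s = x * 2^fracBits;
    switch roundType
        case 1, s = floor(s);                 % round towards minus infinity
        case 2, s = floor(s + 0.5);           % round half-way cases towards infinity
        case 4, s = fix(s);                   % round towards zero
        case 6, s = round(s);                 % round half-way cases away from zero
        otherwise, error('rounding type not covered by this sketch');
    end
    if signBit
        lo = -2^(intBits + fracBits);  hi = 2^(intBits + fracBits) - 1;
    else
        lo = 0;                        hi = 2^(intBits + fracBits) - 1;
    end
    if satType > 0
        s = min(max(s, lo), hi);              % saturate over/underflows to the range limits
    end
    q = s / 2^fracBits;
end

With these conventions, quantize_sketch(x, 1, 1, 1, 0, 15) would roughly correspond to the traditional FxP 1.0.15 format, and quantize_sketch(Vector_i, 2, 1, 1, 0, 14) to the FFT input configuration shown in Section 4.5.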

4.5 Quantization Points

Created scenarios can be evaluated in three ways:

1. floating point
2. fixed point full implementation
3. local quantization

Firstly, Matlab uses floating point by default, and this is used for the reference models. Secondly, the full fixed-point implementation is activated along with defining the scenario parameters, which provides the optimum configuration for the fixed-point format at each parameter and computation. Lastly, local quantizations of these individual parameters and computations show how data width and accuracy affect the simulation results. In order to analyze the effect of local quantizations, quantization points/areas are defined in this section. The reason behind applying local quantizations is to have simulations with the same data format in each baseband processing block, whereas the optimal fixed point implementation supports different configurations in different blocks and steps.

The comparison of fixed and floating point computations starts with the very first block of baseband processing, which is CP Removal and FFT. The data arriving at the receiver is processed in this block first, which makes it an essential point for observations. The FFT is also one of the most complex baseband operations in the receiver chain, which makes it worth observing in this study. The quantized data from this block is fed to the other blocks. Thus, reducing the accuracy of the used data formats here is expected to have an impact on the results. Fig. 4.2 shows the block step by step, starting from cyclic prefix (CP) removal and following with FFT and scaling. The colored numbers in this figure indicate each simulation case considering this block. The '1' in orange shows the simulation case where the full fixed point implementation is activated. The '2's in green, on the other hand, show the local quantization points. The same applies for the following baseband processing blocks.

Figure 4.2. Quantization points in CP Removal and FFT Block. Orange (1) indicates optimum fixed point implementation of the whole block and Green (2A, 2B, 2C) indicates individual quantizations for computations CP Removal, FFT and Scaling.

The fixed point representation for the FFT input is as follows:

InputWidth = [1, 0, 14];
Input = Quantize(Vector_i, 2, 1, InputWidth(1), InputWidth(2), InputWidth(3));

Vector_i indicates the received input for the FFT after CP removal is computed. The values 2 and 1 are the rounding and saturation options provided in Section 4.3, respectively. The following three arguments of Quantize provide the sign bit, the integer word length and the fraction word length, in that order. This is the optimized configuration for the fixed-point implementation. Its accuracy is reduced and compared in the next chapter.

The second baseband processing block of interest is the channel estimation block. The quantized parameters within this block are the 'pilot sequence' and the 'demapped data', which is the output of the RE De-mapping block.

Figure 4.3. Quantization points in Channel Estimation Block. Orange (1) indicates optimum fixed point implementation of the whole block and Green (2A, 2B) indicates individual quantizations for inputs RE De-mapped and pilot sequence.

The last block analyzed in this thesis work is equalization, where the weight matrix obtained from the channel estimator and the demapped data from the RE de-mapper are processed. Scaling of the equalized data is the last step within the block. Fig. 4.4 shows each step in order, and the defined quantization points are given.

Figure 4.4. Quantization points in Equalization Block. Orange (1) indicates the optimum fixed point implementation of the whole block and Green (2A, 2B, 2C) indicates individual quantizations for the parameter Weight Matrix and the computations Equalization and Scaling.


5 RESULTS

This section introduces the Matlab results obtained for the reference models, which are the double precision floating point (default) and the 16+16 bit (IQ) full fixed point implementations. The same scenario is simulated separately for both number formats, and Matlab activates the fixed-point implementation for all baseband processing through a switch that is set when the scenario is created. Due to confidentiality, it is not possible to share the exact codes. However, the structure can be illustrated as follows:

Unit1_FxpEnable = 1

Unit indicates whether the user equipment or the base station is configured, and (1) refers to the base station in this scenario. As the name implies, FxpEnable is the switch that enables fixed point processing for the selected component of the base station. In order to activate the fixed point implementation, this parameter must be switched on by setting its value to '1'. Similarly, the fixed point implementation must be activated for all baseband processing blocks in order to obtain fully fixed point computations.
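For illustration, enabling the fixed point implementation for every block could be structured as in the following sketch. Only Unit1_FxpEnable appears above; the per-block parameter names are hypothetical placeholders, since the actual names are confidential.

Unit1_FxpEnable        = 1;   % base station fixed point processing on
Unit1_CpRemovalFft_Fxp = 1;   % hypothetical per-block switches
Unit1_ChannelEst_Fxp   = 1;
Unit1_Equalization_Fxp = 1;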

5.1 Matlab Results for Reference Models

The comparison of the reference models includes the fixed and floating point difference for the MCS values used in this thesis study, which are 1 and 22 (Table 4.1). Fig. 5.1 shows the BER values obtained with respect to increasing SINR values for varying receiver attenuation as stated in Section 4.2. According to this figure, the SINR and BER values obtained for the same receiver attenuation vary significantly between the simulations. Clearly, the average BER throughout the simulated range is lower for both number formats where MCS is 1. On the other hand, MCS 22 requires a higher SINR to obtain transmission without errors. MCS 1 achieves BER 0 even with negative SINR values. Furthermore, there is a sharp drop directly from about 0.3 BER to 0. The behaviour of the number formats varies for different MCS values. Although the fixed and floating point results are closely coupled for both MCS values, the difference between the formats is more significant for the lower MCS.

While the floating point BER drops to 0 at -10 dB SINR, it drops to 0 at -7 dB for the fixed point implementation. In addition, the reduction of BER to 0 is gradual for MCS 22 while it reduces sharply to 0 for MCS 1 with increasing SINR and decreasing receiver attenuation.


Figure 5.1. Reference BER values for fixed and floating point reference models when MCS is 1 and 22.

It is not always clear at which BER value the maximum, or at least some portion, of the throughput can still be obtained. Another way to compare the proposed scenarios is to plot the BLER obtained for the same SINR results. Fig. 5.2 presents this comparison for each MCS and data format. Unlike the BER results, the values in this figure are either 0 or 1, meaning that blocks are either transmitted or lost entirely. The BLER of each simulation case drops to 0 at the same SINR level where the BER values drop to 0, as given in Fig. 5.1. The difference between the fixed and floating point formats is clearer for the lower MCS results. This figure shows that the optimal fixed point implementation is adequate compared to the floating point reference model.

Figure 5.2. Reference BLER values for fixed and floating point reference models for MCS 1 and MCS 22.


Fig. 5.3 presents the corresponding throughput, which is aligned with the BLER results: maximum throughput is reached at the SINR where the BLER drops from 1 to 0. The maximum throughput obtained for MCS 1 and MCS 22 is about 640 kbps and 8.2 Mbps respectively.

According to the results, it is clear at which SINR transmission occurs. Overall, BER and BLER are lower for the low MCS. Moreover, it is common to all simulation cases that maximum throughput is achieved only where BER and BLER are 0. These results serve as the reference for the locally quantized scenarios in the following sections.

Figure 5.3. Reference Throughput values for fixed and floating point reference models for MCS 1 and MCS 22.

5.2 Quantized CP Removal and FFT

This section introduces the simulation results obtained for different data format configurations in the very first baseband process in the receiver, which is CP Removal and FFT. Table 5.1 presents the optimal fixed point configuration, the reduced fixed point configuration and the ratio between these two formats for each parameter. The scaled data requires a higher dynamic range and a longer data word length compared to the other parameters. The lowest accuracy that can be achieved in the simulations is with 9 fraction bits for CP Removal and FFT.

Further decreasing the word length corrupts the results, preventing any throughput at the output. For comparison purposes, the accuracy of all parameters is reduced to the same number of fraction bits. This results in memory savings from a minimum of 20% to a maximum of 37.5%.

Fig. 5.4 depicts the BLER obtained for the scenario with MCS 22 and shows the effects of each configuration provided previously. In this figure, quantized CP Removal and FFT refer to local quantizations at the inputs of these functions.


Table 5.1. Configured parameters and computations in CP removal & FFT block

Parameter     Optimum        Configured    Reduction
CP Removal    FxP 1.0.15     FxP 1.0.9     37.5%
FFT Input     FxP 1.0.14     FxP 1.0.9     33.33%
Scaling       FxP 1.10.14    FxP 1.10.9    20%
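The reduction percentages in Table 5.1 follow directly from the word lengths as the number of removed fraction bits divided by the original total width. The following Matlab lines only illustrate that calculation and are not part of the simulator.

widths_opt = [1 0 15;    % CP Removal (sign, integer, fraction)
              1 0 14;    % FFT Input
              1 10 14];  % Scaling
widths_cfg = [1 0 9; 1 0 9; 1 10 9];
reduction = 100 * (sum(widths_opt, 2) - sum(widths_cfg, 2)) ./ sum(widths_opt, 2);
% reduction = [37.5; 33.3; 20.0] percent, matching Table 5.1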

According to Fig. 5.4, full transmission is achieved at SINR 15.5 dB and 13.1 dB for the quantized FFT and CP Removal respectively, whereas it is about 14 dB for the reference models. At first glance, this plot shows high performance for all simulation cases; in other words, almost the same performance is achieved for the quantized results as for the reference models. Moreover, the quantized CP Removal curve achieves maximum throughput at lower SINR values. However, it is hard to interpret which simulation case has the highest performance by checking only the SINR on the x-axis, which is an indirect parameter obtained after running the simulation. Therefore, it is more convenient to set the receiver attenuation on the x-axis and check both results.

Figure 5.4. BLER with respect to SINR for the reference models and the locally quantized CP Removal & FFT block

Fig. 5.5 illustrates the BLER achieved at each receiver attenuation. This figure shows clearly at which receiver attenuation throughput is achieved; in other words, transmission occurs where BLER is 0. Here, it is easier to define a threshold at which throughput is observed. It shows that the reference models provide throughput at -14 dB attenuation, whereas it is -24 dB and -26 dB for CP Removal and FFT respectively. As expected, the reference models are more resistant to higher attenuations. It is also evident that the FFT process is more sensitive compared to the other steps within the block. On the other hand, reducing the accuracy of scaling to the same number of fraction bits as the other parameters does not change the results in this case, providing an even higher storage saving. In addition, this figure shows that the reduced accuracy distorts the SINR values along with the computations.


Figure 5.5. BLER with respect to varying Rx Attenuation for the reference models and the locally quantized CP Removal & FFT block

The difference between Scaling and the reference models in terms of BER is negligible. These results support the idea given in the previous section that maximum throughput is achieved only when BER and BLER are 0. Therefore, it is not necessary to also analyze the throughput for this and the following simulation cases.

5.3 Quantized Equalization

The scenario with MCS 22 requires higher data accuracy compared to the 1st scenario; thus, it is not possible to reduce the accuracy of the default configurations there. Therefore, the analysis for the equalization block contains simulation results obtained from the 1st scenario (Fig. 4.1) for each equalization configuration. The simulation results in this section consist of the reference models and the local quantizations defined in Fig. 4.4. Table 5.2 shows the optimum fixed-point format in Matlab and the configured formats for each parameter. The reduced word length in each step of equalization is obtained by reducing the fraction part to 7 bits, one parameter at a time. Similarly to the CP Removal and FFT block, the defined format for each parameter differs depending on the dynamic range and accuracy requirements. Therefore, the reduction is different at each quantization point in order to achieve equal accuracy. Unlike in the 2nd scenario, it is possible to reduce the number of fraction bits to 7 bits, since scenarios with lower MCS values are less sensitive to accuracy. The new configurations in these cases provide memory savings from 12.5% to 33.3%.

Fig. 5.6 presents the BLER of each simulation with varying SINR. We can deduce that the accuracy of the number formats affects the results significantly.


Table 5.2. Configured parameters and computations in equalization block

Parameter        Optimum FxP    Configured FxP    Reduction
Weight Matrix    1.13.10        1.13.7            12.5%
Equalization     1.0.11         1.0.7             33.3%
Scaling          1.1.10         1.1.7             25.0%

In this scenario, the highest throughput is 638 kbps and the floating point implementation achieves it where the RX attenuation is lower than 5 dB and the SINR is approximately higher than -10 dB. After floating point, the BLER reaches 0 at the lowest SINR for the optimum fixed-point implementation; in this case, BLER is 0 where the SINR is higher than -7 dB and the RX attenuation is lower than 4 dB. These results are followed by the local quantizations. Apparently, the local quantizations of the scaling process and the weight matrix have a similar impact on the throughput in this scenario. These simulation cases achieve successful transmission at SINR higher than 3.45 dB and RX attenuation lower than -4 dB. The quantized equalization curve is placed rightmost in the figure, showing that the impact of its reduced accuracy was the highest compared to the other steps. In this step, throughput is seen where the SINR is higher than 6.5 dB and the RX attenuation is lower than -7 dB. The simulation results show that the maximum throughput drops sharply to 0 after a certain RX attenuation is reached. This behaviour follows the BLER results directly, so it is not necessary to plot a separate throughput versus SINR graph. Furthermore, we can also deduce from this graph that a number format with the same accuracy and dynamic range applied in each processing step can lead to different results. It is important to notice that, although the Weight Matrix requires a dynamic range as high as the Scaling after the FFT step, it is more sensitive to data accuracy. In other words, with reduced weight matrix accuracy, transmission only occurs at higher SINR values compared to the reference models.

Another figure of merit is the comparison of the BER results for each simulation case. Fig. 5.7 presents the BER results of each simulation case for the equalization block. Similar to the previous graph, the floating point results are leftmost in the graph and the configured equalization results are rightmost. The rate of reduction of the BER is higher in the floating point and optimum fixed point simulations compared to the other setups. It reduces significantly from its maximum value, which is approximately 0.5, to 0 at SINR values of -9.8 dB and -7 dB respectively. These curves are followed by the configured weight matrix and scaling results, as obtained in Fig. 5.6. Moreover, it is important to notice that throughput is achieved where BER is 0 for all simulation cases. Unlike the throughput graph, the BER does not reduce sharply but gradually as SINR increases.

5.4 Quantized Channel Estimation

This section applies a similar approach to the channel estimation block as in the previous blocks in order to analyze its characteristics with different number format configurations. In this block, quantization applies to two main parameters, which are the demapped data and the pilot sequence. Table 5.3 presents the optimal fixed point configuration, the reduced new configura-
