
Eetu Kokkonen

AXI-STREAM VIP OPTIMIZATION FOR SIMULATION ACCELERATION

a Case Study

Master of Science Thesis
Faculty of Information Technology and Communication Sciences
Examiners: Professor Timo D. Hämäläinen

MSc Antti Rautakoura

March 2021


ABSTRACT

Eetu Kokkonen: AXI-Stream VIP Optimization for Simulation Acceleration
Master of Science Thesis

Tampere University

Master’s Degree Programme in Electrical Engineering
March 2021

The purpose of this thesis was to optimize Nokia’s existing AXI-Stream verification IP to perform better in simulation acceleration and to compose coding guidelines for developing test environments that can be used in simulation acceleration. In simulation acceleration, an emulator simulates the design under test while a workstation or a server simulates the test environment. The emulator is a supercomputer specifically designed for verification purposes, and its properties enable more efficient and faster simulation than traditional simulation, in which a workstation simulates both the test environment and the design under test. Simulation acceleration is used to shorten the verification time of digital circuits and IP blocks. It makes it possible to develop circuits faster, since verification can take up to 80 percent of the total time used in a development project.

Following the introduction, functional verification and its execution methods, such as simulation, simulation acceleration and formal verification, are discussed in order to form an overall picture of the context in which simulation acceleration is used. The theory part also presents the SystemVerilog programming language and the Universal Verification Methodology, because the test environment and the AXI-Stream verification IP are built with them.

In traditional simulation the test environment usually consists of one domain, whereas in simulation acceleration it is divided into two domains: a synthesizable domain, simulated in the emulator, and a non-synthesizable domain, simulated on the workstation. The synthesizable domain includes the design under test and part of the communication interface between the simulator and the emulator. The non-synthesizable domain contains the other part of the communication interface and the remaining test environment components, such as input signal generation and result checking. In this thesis the AXI-Stream verification IP operates as the communication interface of the test environment.

The benefit achievable with simulation acceleration is affected by the ratio of the time used to simulate the test environment versus the design under test, the size of the design, the length of the test and the communication between the simulator and the emulator. The AXI-Stream VIP can be used to affect the efficiency of this communication in the test environment. It was optimized to transfer larger amounts of data less frequently between the simulator and the emulator, which decreases the communication time compared to the initial version of the verification IP.

The optimized verification IP was compared to the initial version using four tests. The longest test provided good results, as the simulation run time was almost halved. The three other tests showed only marginal differences, because they were relatively short and thus did not suit simulation acceleration as well as long tests.

Keywords: simulation acceleration, simulation, AXI-Stream VIP, emulator

The originality of this thesis has been checked using the Turnitin OriginalityCheck service.


TIIVISTELMÄ (ABSTRACT IN FINNISH)

Eetu Kokkonen: Optimization of an AXI-Stream verification IP for simulation acceleration
Master of Science Thesis

Tampere University

Degree Programme in Electrical Engineering
March 2021

The purpose of this work was to optimize Nokia’s existing AXI-Stream verification IP to be better suited to simulation acceleration and to compose coding guidelines for test environments built for simulation acceleration. In simulation acceleration, the block under test is simulated on an emulator and the test environment on a workstation or a server. An emulator is a supercomputer designed specifically for verification, whose properties enable more efficient and faster simulation compared to traditional simulation. In traditional simulation, both the block under test and the test environment are simulated on a workstation or a server.

Simulation acceleration aims to shorten the time spent testing digital circuits and IP blocks. By exploiting it, circuit development can be sped up, since verification can consume up to 80 percent of the total time used in a development project.

After the introduction, the work covers functional verification and the ways it is carried out, such as simulation, simulation acceleration and formal verification. This aims to give a picture of the context of simulation acceleration and the goals of its use. The theory part also presents the SystemVerilog programming language and the UVM verification methodology, because the test environment and the AXI-Stream verification IP used in this work are built with them.

The test environment used in traditional simulation usually consists of a single entity, whereas in simulation acceleration the test environment is divided into two parts: a synthesizable part, simulated in the emulator, and a non-synthesizable part, simulated on the workstation. The synthesizable part contains the block under test and part of the communication interface between the simulator and the emulator. The non-synthesizable part contains the other half of the communication interface and the other test environment components, such as signal generation and result checking. In the test environment used in this work, the AXI-Stream verification IP acts as the communication interface between the simulator and the emulator.

The benefit achieved with simulation acceleration is affected by the ratio of the simulation times of the test environment and the block under test, the size of the block under test, the length of the test and the communication between the emulator and the simulator. The AXI-Stream verification IP can be used to affect the efficiency of the communication. The verification IP was optimized to send larger amounts of data at a time, less frequently, which decreases the time spent on communication compared to its original version.

The optimized verification IP was compared to the original version with four tests. For the longest test the result was good, as the simulation time was almost halved. For the three other tests there was hardly any difference to the original, because the tests were rather short and therefore do not suit simulation acceleration as well as long tests.

Keywords: simulation acceleration, simulation, AXI-Stream VIP, emulator

The originality of this publication has been checked with the Turnitin OriginalityCheck service.


PREFACE

I want to thank Nokia Solutions and Networks Oy and my manager Sakari Patrikainen for providing the subject for my master’s thesis. I also want to thank the whole L1low team for helping me in the process, especially Govind Venkatesan, who gave important guidance on technical issues relating to the emulator.

I would also like to thank my supervisors Timo D. Hämäläinen and Antti Rautakoura for the advice and feedback they provided on the thesis.

I am extremely satisfied with my decision to apply to the Tampere University of Technology over six years ago. The courses have been very interesting and have provided lots of new information and skills. In addition, I was able to see the world as an exchange student in Singapore, which has been one of the highlights of these years. Although the studying itself has been rewarding, the journey would have been nothing without all the friends I have made during it.

Finally, I want to thank my beloved family and Stina. You have encouraged and supported me throughout my studies in both good and difficult moments. It has been very important. Thank you.

Tampere, 29th March 2021 Eetu Kokkonen


CONTENTS

1. INTRODUCTION
1.1 Thesis objectives and methods
1.2 Thesis outline
2. FUNCTIONAL VERIFICATION
2.1 Introduction
2.2 Verification process
2.3 SystemVerilog
2.4 Universal Verification Methodology
2.5 Formal verification
2.6 Simulation
2.6.1 Emulators
2.6.2 Simulation acceleration
2.6.3 Emulation
2.6.4 In-circuit emulation
2.6.5 FPGA prototyping
2.6.6 Simulation method comparison
3. RELATED WORKS
4. CASE STUDY SET-UP
4.1 Veloce StratoM emulator
4.1.1 General verification environment for simulation acceleration
4.1.2 Simulation acceleration flow
4.2 Used simulation acceleration environment
4.3 Initial AXI-Stream VIP
4.4 Optimized AXI-Stream VIP
4.5 Coding guidelines
4.5.1 Overview
4.5.2 HDL domain
4.5.3 Testbench domain
5. RESULTS AND DISCUSSION
6. CONCLUSIONS
REFERENCES


LIST OF SYMBOLS AND ABBREVIATIONS

ADC Analog-to-Digital Converter
AMBA Advanced Microcontroller Bus Architecture
API Application Programming Interface
ASIC Application-Specific Integrated Circuit
AXI Advanced eXtensible Interface
BFM Bus-Functional Model
CPU Central Processing Unit
DAC Digital-to-Analog Converter
DSP Digital Signal Processing
DUT Design Under Test
EDA Electronic Design Automation
FPGA Field-Programmable Gate Array
HDL Hardware Description Language
HVL Hardware Verification Language
HW Hardware
I/O Input-Output
IC Integrated Circuit
IEEE Institute of Electrical and Electronics Engineers
IP Intellectual Property
LP Logic Processor
PC Personal Computer
PCB Printed Circuit Board
RTL Register-Transfer Level
SCEMI Standard Co-Emulation Modeling Interface
SoC System-on-a-Chip
SW Software
TLM Transaction-Level Modelling
UVM Universal Verification Methodology
VHDL Very High Speed Integrated Circuit Hardware Description Language
VIP Verification Intellectual Property
5G Fifth Generation Mobile Network


1. INTRODUCTION

Usage of the internet and the services provided over it is part of people’s everyday life. There were 3.8 billion mobile internet users and 12 billion devices connected to the internet in 2019, and the numbers are only expected to grow: the prediction for 2025 is 5 billion mobile internet users and 24.6 billion connected devices. [1] This growth in devices means that communication and data transmission will grow accordingly, leading to greater utilization of wireless networks. Global mobile data usage is predicted to increase fourfold, from 7.5 GB per subscriber per month in 2019 to 28 GB per subscriber per month in 2025 [1]. This development is taken into account and made possible by the fifth-generation (5G) mobile networks, which will offer extremely high data rates, a massive number of connected devices and very low latencies [2].

Because of this, wireless network devices must be able to process large amounts of data, which increases their performance requirements. One solution that telecommunication and networking companies such as Nokia and Ericsson use to answer this demand is the system-on-a-chip (SoC) [3,4]. An SoC is an integrated circuit (IC) that contains multiple hardware (HW) components and software (SW) on a single chip, forming a complete system. The components that an SoC can contain are, for example, a central processing unit (CPU), digital signal processing units (DSPs), analog-to-digital (ADC) and digital-to-analog (DAC) converters, different input and output (I/O) ports, memories and other intellectual properties (IPs). [5] The advantages of SoCs are lower cost and power consumption, smaller physical size and faster operation compared to systems built from separate IC components. This is due, for example, to the fact that in an SoC the components are inside a single chip and do not require separate assembly and wiring on a board. [6] Because of the single package, the components are also closer to each other, which decreases communication latencies between them and thus allows the use of higher clock frequencies.

An SoC can be made as an application-specific integrated circuit (ASIC) or by combining a field-programmable gate array (FPGA) with CPU(s). SoCs made of an FPGA and CPU(s) are also called SoC FPGAs. ASICs are chips designed for specific applications, as the name implies, and cannot be modified after manufacturing, whereas FPGAs can be re-programmed, for example to fix bugs if necessary. Both implementation methods include the same high-level SoC design flow steps presented in Figure 1, although the implementation processes themselves differ considerably.

Figure 1. Example of a high-level SoC design flow [5,7].

The design process begins by collecting the requirements and specifications from the customer. In the second phase, the obtained information is used to specify the architectural model of the design, which determines which functions are implemented with HW and which with SW. [5] Next, the HW and the SW are designed and verified according to the model. This thesis focuses on the HW verification area marked with a dark blue background in Figure 1. This is followed by HW and SW integration and its verification. In the last phase the HW and the SW are mapped to an SoC FPGA or, in the case of an ASIC, a chip is manufactured according to the HW design and the SW is applied to it. Communication is also an important part of the design process, and it occurs between all phases although this is not indicated in Figure 1.

Design verification is said to be the most important part of the product development process, consuming up to 80% of the total development time [8]. In 2018, the average shares of time that ASIC design engineers and ASIC projects used in verification were 46% and 53%, respectively [9]. The large amount of time used in verification is necessary because it is important to confirm that the design functions as specified before taking it to market. This is crucial from an economic perspective, since the earlier bugs are found the cheaper they are to fix. For example, the cost of recalling the Pentium processor was more than 450 million USD for Intel in 1995 [10]. Malfunctioning or faulty products may also have a negative impact on the company’s reputation and thereby cause further negative economic impacts.

Traditional methods used in the hardware verification process are simulation and formal verification. Formal verification is based on mathematical methods that are used to verify the system [10]. In simulation, a design under test (DUT) is simulated by applying stimuli to it while recording its response to confirm correct operation. Simulation is done with software or hardware simulators. [11]

Software simulators usually run on normal workstations (PCs) or on dedicated servers. The trend of growing design size and complexity is causing the simulation times of SW simulators to increase to unreasonable levels [12]. Hardware simulators, or emulators, which are specially designed for simulation, nevertheless make it possible to complete these simulations that require a vast amount of computing performance. The acceleration that emulators offer can reduce simulation times significantly, which permits the practical use of these important simulations, thus increasing the quality and performance of the verification.

SW and HW simulation are not mutually exclusive methods, although they are alike. SW simulation is better suited to the verification of relatively simple designs, short or simple tests, and tests that can be run in parallel. The benefits of SW simulation in these cases are that fewer tools are required, multiple tests can be run concurrently, for example in a company’s existing server grid, and the involved costs are smaller than with emulators. Although emulators have more limited capacity for concurrent testing compared to SW simulators, their advantages become visible when designs and tests are complex and large. To increase emulators’ benefits further, electronic design automation (EDA) vendors are offering emulators as cloud services, which allows better scalability and decreases the involved costs because the purchaser does not need to invest in devices or facilities.

Another use case in which emulation can be used, although not within the scope of this thesis, is the development and verification of an SoC’s software. In this case the designed hardware is mapped to the emulator and the software can be run on it similarly as on a physical chip. This can decrease the SoC’s time to market, because it is not necessary to wait for the physical chip from the manufacturing process before using it in software development and verification. Depending on the chip, manufacturing can take for example 2 months [13].


1.1 Thesis objectives and methods

This thesis was done as a case study for the SoC department of Nokia Solutions and Networks Oy, and it has two objectives. The first is to optimize an existing Advanced eXtensible Interface (AXI) -Stream verification IP (VIP) for simulation acceleration and thus improve simulation performance. The second is to compose coding guidelines for acceleration on Mentor’s Veloce emulator. The purpose of the guidelines is to describe the simulation environment and present the coding rules and limitations that apply to it.

Methods used in this thesis include reviewing literature and Mentor’s documentation for the theory and the coding guidelines. In the modification and testing part of the VIP work, practical coding and simulations with the Veloce emulator were performed.

The general literature review was done first, followed by familiarization with Mentor’s documentation. Next, the coding guidelines were written. These steps were taken to ensure that the fundamentals were understood correctly before optimizing and simulating the VIP.

1.2 Thesis outline

The thesis begins with a basic introduction to the subject by providing an overview of SoCs, their design process and the role of verification and emulators in it. The second chapter begins by introducing functional verification, the general verification flow, SystemVerilog and the Universal Verification Methodology (UVM) to set the ground for further chapters. Next, formal verification is described, followed by simulation methods with emphasis on the use of emulators. The third chapter presents other works related to the subject.

In the fourth chapter the simulation set-up used in the tests is described by presenting the emulator, the test environment and the initial AXI-Stream VIP. Chapter 4 also includes the optimization modifications done to the VIP and the coding guidelines. The simulation results before and after the VIP modifications are presented in Chapter 5. Conclusions are drawn in the final chapter.


2. FUNCTIONAL VERIFICATION

In this chapter the SoC’s general hardware verification process is discussed from the functional perspective. This corresponds to the dark blue part in Figure 1.

First, an introduction to the subject and a general description of the verification process are given. After that, one of the most common hardware verification languages and a methodology for building test environments are introduced. The final parts discuss the two main verification methodologies, formal verification and simulation. The use of software and hardware simulators is the focus of the simulation section.

2.1 Introduction

Design verification aims to ensure that the implementation of the design corresponds to the design’s specifications and requirements [8,11]. It is the most important part of the product development process [8]. If the developed chip is not flawless on the first try, which it usually is not, or the verification process fails, then financial losses are expected. The losses consist of, for example, the time used in finding and fixing the bugs, re-manufacturing the chip, recalling the faulty products, and a delayed time-to-market, which decreases market share and sales. [11]

Design verification can be divided into multiple subareas based on the verification target. Examples of these areas are layout verification, timing verification, electrical verification and functional verification, on which this thesis focuses. [11] The objective of functional verification is to confirm that the behaviour of the design satisfies the specifications [8].

Functional errors in a design have two different causes. One is a faulty specification that leads to a wrong design implementation; the other is a faulty implementation of the specification. Specification errors can be avoided and corrected with sufficient discussion and planning with the customer, whether internal or external. [11] This process is called validation rather than verification [14].

Implementation errors can be prevented with two methods. The first would be to synthesize the design implementation directly from the specifications, although this is limited in practice. The specifications are usually written in a human language, such as English, and not in a precise formal language, for example C++ or VHDL, which is why automated synthesis is not possible for the synthesis tools. This method would also be prone to errors in the used tool’s software and to user errors. [11]


The second method is to produce two or more implementations from the same specifications using different procedures and compare them. The more the procedures used to produce the implementations differ, the more reliable the verification can be considered, in theory. In practice, usually two implementations are made, because developing each implementation can be very time-consuming and expensive, and may create additional implementation-specific errors. Traditional simulation-based verification is an example of this method, since it contains the design, the first implementation, and the reference output, which is the second implementation of the specification. [11] The verifying implementation, also called a testbench, is usually made with a high-level programming language, which allows easier and faster implementation.

2.2 Verification process

The basics of the verification procedure include two phases: implementing the verification model from the specification and comparing it to the design implementation to detect possible errors [11]. Figure 2 presents a more detailed description of the verification process, which can also be called a verification cycle because it is developed continuously based on earlier experiences [13].

The cycle begins by defining the functional specification of the product. It consists of the design’s functional and interface information, describing which functions it must implement and what kind of interfaces are used in communication with the surrounding system. [13] The specification can be determined, for example, in co-operation with an external or internal customer. The verification engineers use this information to produce the verification environment, and the design engineers use it to develop the design [13].

The next phase of the verification process is to develop a verification plan. The plan is based on the functional specification and experiences from previous projects. It describes the verification targets and how they are verified. The plan includes information on the specific tests, methods, resources, completion criteria, and the functions that are and are not verified. In addition, complex designs are divided hierarchically into smaller components to make verification more feasible. For example, a new high-risk and essential IP block can be verified in a more controllable way at a low level, which is not necessary for old and familiar submodules and IP blocks. [13]


Figure 2. Functional verification cycle in chip design [13].

The development of the verification environment starts after the verification plan is ready. The verification environment consists of the tools and software code that the verification engineer uses to find bugs and ensure proper functionality of the DUT. The tools are, for example, the simulators, waveform viewers and emulators that different EDA vendors offer. SystemVerilog is one programming language used, in accordance with UVM, for coding verification environments [15]. The purpose of the test environment is to create stimuli for the DUT and check that its response is correct. The verification engineers create the environment separately from the design team and preferably with a different programming language, to reduce the probability of making similar designs and thus the same mistakes. [11,13]

In the next phase the DUT is tested in the verification environment. First, the verification engineer integrates the DUT into the environment and after that starts running the tests. Possible differences between the reference and design outputs launch a search for the reason behind them. The verification engineer fixes the error if it is in the verification environment, and vice versa, the design engineer fixes the design if the error is caused by it. After the updates, the test is re-run to ensure the problem is really corrected. [13]

Regression means continuously running the tests that are defined in the verification plan. The purpose of regression is to ensure that new fixes or changes do not cause unintentional errors in the design or the test environment. The other main reason is the randomization that is usually included in the test environments. It applies different input stimuli to the design between test runs to achieve greater test coverage, which is why it is beneficial to run enough tests. [13]

If all fabrication criteria, logical and physical, are met, the design is sent to the chip manufacturer. This is called tape-out. Random regressions can be run during fabrication to further increase verification coverage and discover bugs. After fabrication the chip is tested for physical defects before its functional tests begin. The chip is assembled onto a test board or another test system for hardware bring-up. The hardware errors found in this phase are very expensive, and debugging them is much harder than in the verification environment because of the limited visibility into the signals. Depending on the bug, there may be various ways to fix it. For example, software may be used to prevent the faulty state, or fixes to the wires on the chip may be enough to avoid expensive rebuilds of manufacturing masks in re-fabrication. [13]

Finally, escape analysis is done if bugs were found in the hardware bring-up stage. It aims to confirm the verification team’s understanding of the reasons why the bugs were not found earlier. When the reasons are clear, the information can be used to develop the verification process further and thus avoid the same mistakes in the future. [13]

2.3 SystemVerilog

SystemVerilog is a programming language that combines the specification, design and verification of digital logic hardware. SystemVerilog is based on Verilog, which is a hardware description language (HDL), and on the Accellera SystemVerilog 3.1a extensions for the Verilog language. [16,17] The extensions include features from, for example, the VERA, VHDL, C and C++ languages, which make it possible to model and verify designs that are very complex and large [18]. The features it supports include assertions, object-oriented programming, coverage, constrained random verification and application programming interfaces (APIs) to foreign programming languages. In addition, the supported abstraction levels are behavioural, register-transfer level (RTL) and gate level. [17]

The Institute of Electrical and Electronics Engineers (IEEE) standardized SystemVerilog for the first time in 2005 in the IEEE 1800-2005 standard [16]. The latest update to the language is IEEE Standard 1800-2017, which was published in 2018 [17].

The main SystemVerilog concepts relating to simulation are modules, interfaces, virtual interfaces, classes and procedural statements. The main use of modules is to represent design blocks or IPs. They can also be used to encapsulate verification code, the design blocks and the connections between them. Modules are used to create the design hierarchy, and they can contain constructs such as ports, data declarations, class definitions and instantiations of other modules, interfaces and class objects. [17]

The interface encapsulates the communication of the design and verification blocks in the system. In its simplest form the interface is simply a bundle of signals that can be connected to the instantiated modules and other interfaces in the system. An interface can also contain, for example, variables, tasks and functions, which expands the connectivity aspect towards functional operations. Thus, it promotes design reuse and decreases repetition in the code. [17]

A virtual interface is a variable used to represent an interface instance [17]. It is a handle or pointer, in C language terms, to the physical interface [19]. One should note, though, that a SystemVerilog handle has some differences compared to a C pointer; for example, a SystemVerilog handle cannot be used with arbitrary data types, whereas a C pointer can. A virtual interface can be used in functions, tasks or methods by passing it to them as an argument. [17] Virtual interfaces are used to connect the SystemVerilog classes used in the test environment to the physical interfaces. This is done by creating the virtual interface variable inside the class and connecting it to the desired interface, as the sketch below illustrates.
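
The following is a minimal sketch of this mechanism under simple assumptions; the names axis_if, axis_driver_base and drive_word are hypothetical and not from the thesis's actual VIP:

interface axis_if (input logic clk);
  logic        tvalid;
  logic        tready;
  logic [31:0] tdata;
endinterface

class axis_driver_base;
  // Handle to a physical interface instance, assigned by the testbench.
  virtual axis_if vif;

  // Drive one word through the handle; blocks until the sink is ready.
  task drive_word(input bit [31:0] word);
    vif.tdata  <= word;
    vif.tvalid <= 1'b1;
    @(posedge vif.clk iff vif.tready);
    vif.tvalid <= 1'b0;
  endtask
endclass

module tb;
  logic clk = 0;
  always #5 clk = ~clk;

  axis_if bus (.clk(clk));
  assign bus.tready = 1'b1;   // sink always ready in this toy example

  initial begin
    axis_driver_base drv = new();
    drv.vif = bus;            // connect the class to the physical interface
    drv.drive_word(32'hDEADBEEF);
    $finish;
  end
endmodule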

A class is a data type that encapsulates data, called class properties, and subroutines, called methods, which operate on that data. These features specify what an object, that is, an instance of the class, contains and what it can do. It is possible to create and destroy objects dynamically and pass them around the test environment via object handles. In addition, it is not necessary to allocate or deallocate memory as in C++, because SystemVerilog has implicit and automatic garbage collection, as in the Java language. [17] Objects play a crucial part, for example, in the UVM test environment.

Subroutines are called tasks and functions. There are four differences between them. The first is that functions execute in a single simulation time unit and thus cannot consume simulation time, while tasks can contain statements that control and consume time. The second, related to the first, is that a function cannot enable a task, but a task can enable functions or other tasks. Thirdly, a nonvoid function must return a single value, whereas a void function or a task cannot return a value. The last difference is that a nonvoid function can be used as an operand in an expression, but a task cannot, because of the value-returning rule. [17] The fragment below illustrates the distinction.
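
A small hypothetical fragment: add is usable as an operand in an expression and consumes no simulation time, while send_after consumes time and can only be called as a statement:

module sub_demo;
  function automatic int add(input int a, input int b);
    return a + b;            // executes in zero simulation time
  endfunction

  task automatic send_after(input int delay_units, input bit [7:0] value);
    #delay_units;            // a task may consume simulation time
    $display("sent %0h at time %0t", value, $time);
  endtask

  initial begin
    int sum;
    sum = add(2, 3) * 10;    // function used as an expression operand
    send_after(sum, 8'hA5);  // task invoked as a statement
  end
endmodule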

Procedural statements are programming statements that express behavioural code. These statements include, for example, selection (if-else, case) and loop statements, timing controls and subroutine calls. Initial and always are procedural blocks that contain procedural statements and are launched at the start of simulation. The initial procedures are executed only once, while always procedures are executed repeatedly. An initial block can be used, for example, in combination with a forever loop, which executes its statements repeatedly, and a #-delay control, which delays the following statement by the defined amount of time, to create a clock signal in simulation. Event controls with edge-sensitive detection, indicated by the @ sign, and the level-sensitive wait statement are other ways to control timing in SystemVerilog. [17]
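
A minimal sketch of these constructs: an initial block with a forever loop and #-delay generating a free-running clock, and an edge-sensitive always procedure observing it (the 10-unit period is an arbitrary choice):

module clk_demo;
  logic clk;

  // initial + forever + #-delay: a free-running clock
  initial begin
    clk = 1'b0;
    forever #5 clk = ~clk;   // toggles every 5 units, 10-unit period
  end

  // edge-sensitive event control: runs once per rising edge
  always @(posedge clk)
    $display("rising edge at %0t", $time);

  initial #100 $finish;
endmodule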

2.4 Universal Verification Methodology

The Universal Verification Methodology is a standardized verification methodology written in SystemVerilog. It is a base class library defined by a set of application programming interfaces. The purpose of UVM is to harmonize the development and integration of verification environments and verification intellectual properties (VIPs) and to make them easier, more efficient and consistent. The modularity, scalability, reusability and interoperability achieved with UVM increase design and verification quality and reduce the costs of buying and writing new VIPs. [20]

Accellera Systems Initiative is a non-profit standardization organization that consists of different electronics organizations such as Intel, Cadence, Mentor, Synopsys and ARM [21,22]. Mentor, Cadence and Synopsys created UVM in 2010 while working under the Accellera committee. Since then, Accellera’s UVM Working Group has participated in the implementation, development and maintenance of UVM. [15,23]

UVM is built on top of the Open Verification Methodology (OVM), which is a combination of the Advanced Verification Methodology (AVM) and the Universal Reuse Methodology (URM), which in turn was based on the e Reuse Methodology (eRM). UVM also contains concepts from the Verification Methodology Manual (VMM), which was developed from the Reuse Verification Methodology (RVM). In addition, UVM includes transaction-level modelling (TLM) concepts similar to those developed for SystemC. Figure 3 indicates UVM’s development path. [20,24]

Figure 3. The development path of the Universal Verification Methodology [20,24].

A UVM testbench is built from user-defined components that are extended from the UVM library’s base classes. Figure 4 presents an example of a typical UVM testbench and a DUT connected to it. The UVM testbench architecture consists of Test, Configuration Database, Environment, Sequence, Scoreboard, Coverage Collector, Agent, Sequencer, Driver and Monitor objects. [15,25] As Figure 4 illustrates, the UVM architecture is built hierarchically. The top-level module, called the Testbench, encapsulates the Test class objects, the DUT and the SystemVerilog interfaces. The Tests contain the Environment, which again contains more UVM components. The communication between the UVM testbench and the DUT occurs via interfaces and the Driver and Monitor components that are instantiated inside different Agents, as Figure 4 presents. In simulation acceleration, the Driver and the Monitor are the most relevant UVM components, as well as the interface they use to communicate with the DUT.

UVM components are dynamic class objects, which means that they are instantiated during the simulation, while SystemVerilog modules, for example a DUT, and interfaces are static components and thus created in the compilation phase before the simulation is run. Because of this, there must be a top-level module that contains the DUT and the interfaces and launches the instantiation of the UVM environment in the simulation.


Figure 4. An example of the UVM testbench’s architecture [15,25].

Communication in the UVM environment takes place with transactions between the objects. UVM contains TLM ports such as ports, exports, implementation ports and the corresponding analysis ports that connect objects to each other. [20] Ports and exports are used in communication between two individual objects. The emphasis with the different analysis ports is on delivering the same information to multiple targets, for example to the Coverage Collector and the Scoreboard, rather than to a single object [20]. Communication between the objects, mainly the Driver and the Monitor, and the DUT occurs on pin level after transforming transaction packets into signal-level stimulus [15].

The UVM Sequence Item is the transaction class object that encapsulates the data required to model a unit of communication between two components. The data it includes can be, for example, address and data variables, constraints and wait instructions. [15,25] A minimal sequence item is sketched below.
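
The class name and fields below are hypothetical, not those of the thesis's VIP; they merely illustrate the shape of a sequence item:

import uvm_pkg::*;
`include "uvm_macros.svh"

class axis_item extends uvm_sequence_item;
  rand bit [31:0]   data;
  rand int unsigned idle_cycles;   // wait instruction between words

  constraint c_idle { idle_cycles inside {[0:3]}; }

  `uvm_object_utils_begin(axis_item)
    `uvm_field_int(data, UVM_ALL_ON)
    `uvm_field_int(idle_cycles, UVM_ALL_ON)
  `uvm_object_utils_end

  function new(string name = "axis_item");
    super.new(name);
  endfunction
endclass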

The Driver is responsible for transforming transaction-based stimulus into pin-level stimulus. The Driver receives transactions one at a time from the Sequencer via a TLM port. After receiving a transaction, the Driver transforms it into pin-level stimulus and applies or “drives” it to the DUT via the DUT’s interface. [25]
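
A typical run_phase loop for such a driver is sketched below, reusing the hypothetical axis_item and axis_if from the earlier examples:

class axis_driver extends uvm_driver #(axis_item);
  `uvm_component_utils(axis_driver)

  virtual axis_if vif;   // set via the Configuration Database

  function new(string name, uvm_component parent);
    super.new(name, parent);
  endfunction

  task run_phase(uvm_phase phase);
    forever begin
      seq_item_port.get_next_item(req);   // blocking pull from the Sequencer
      repeat (req.idle_cycles) @(posedge vif.clk);
      vif.tdata  <= req.data;             // transaction to pin-level stimulus
      vif.tvalid <= 1'b1;
      @(posedge vif.clk iff vif.tready);
      vif.tvalid <= 1'b0;
      seq_item_port.item_done();          // handshake back to the Sequencer
    end
  endtask
endclass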

The Monitor’s function is to transform the DUT’s pin-level activity into transactions for the rest of the testbench environment. The Monitor observes the pin-level activity of the DUT’s interface and captures information from it. It then combines the signal-level information into a transaction and broadcasts it forward via a TLM analysis port. In addition, the Monitor can process the data, for example by checking it, collecting coverage from it or recording it. [25]

The Sequencer is a component that controls the flow of transactions from one or multiple Sequence objects to the Driver. A Sequence is an object that contains the functionality for generating the transactions that the Sequencer passes to the Driver. A Sequence can contain, for example, purely random stimulus to test the DUT’s behaviour, or well-defined stimulus for initializing the DUT. The order of the transactions can be premeditated or, on the contrary, randomized. [25,26]

The Agent usually consists of a Sequencer, a Driver and a Monitor and is thus a hierarchical component. It may contain other components in addition to these three common ones, for example Coverage Collectors. The Agent is designed to operate with a certain logical DUT interface, such as AXI, by applying stimulus to it and receiving stimulus from it. There is usually one Agent per interface in the test environment. The Agent must be able to operate in two modes. The first is the active mode, in which it controls and generates stimulus to the interface while monitoring it. The second mode is passive, in which it only monitors the interface. [15,25]

The Scoreboard is responsible for checking the correct functionality of the DUT. The Scoreboard is instantiated in the Environment and is connected to the Agent. The Agent sends transactions that contain the DUT’s input and output data to the Scoreboard. The Scoreboard modifies the input data according to a reference model to produce the correct, predicted output data. After that, the reference and received data are compared to check the DUT’s behaviour. [25]

The Coverage Collector is a subscriber component that collects functional coverage data by observing received transactions. It divides transactions into covergroups, which are used to observe verification progress. [15] Thus, covergroups provide information on the tested and untested conditions, which indicates whether enough testing has been done.

The Environment is a hierarchical component that defines the testbench architecture. It usually instantiates Agents, Scoreboards, Coverage Collectors and other Environments, depending on the size and complexity of the DUT. The cases in which an Environment contains other Environments concern, for example, large systems that consist of multiple IP blocks. The individual IP Environments are combined under a top-level Environment to invoke stimulus to the corresponding IPs. The Environment level provides most of the verification reuse in UVM. [20,25]


The Test is UVM’s top-level component, which consists of an Environment. It usually uses the Configuration Database or factory overrides to configure the Environment, and it chooses the Sequences that define the stimulus applied to the DUT. There can be multiple Tests for the same Environment, in which case the configuration of the Environment and/or the used test sequences differ between the Tests. [25]

The Configuration Database is a class that is used to configure UVM component instances. With it, type-specific data can be saved to and retrieved from the resource database and used in configuration. Another use case for the Configuration Database is to pass the DUT’s interface handle to the dynamic test environment, for example to the Driver and the Monitor. [15,26]
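
A common sketch of this interface hand-off, again with hypothetical names: the static top-level module publishes the handle, and a dynamic component retrieves it in its build_phase:

import uvm_pkg::*;
`include "uvm_macros.svh"

class axis_cfg_demo extends uvm_component;
  `uvm_component_utils(axis_cfg_demo)

  virtual axis_if vif;

  function new(string name, uvm_component parent);
    super.new(name, parent);
  endfunction

  function void build_phase(uvm_phase phase);
    super.build_phase(phase);
    // Retrieve the interface handle published by the static testbench.
    if (!uvm_config_db#(virtual axis_if)::get(this, "", "vif", vif))
      `uvm_fatal("NOVIF", "virtual interface not found in config DB")
  endfunction
endclass

module testbench_top;
  logic clk = 0;
  always #5 clk = ~clk;

  axis_if bus (.clk(clk));

  initial begin
    // Publish the static interface handle for the dynamic UVM components.
    uvm_config_db#(virtual axis_if)::set(null, "*", "vif", bus);
    run_test();
  end
endmodule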

2.5 Formal verification

Formal verification is based on mathematical methods. Its objective is to make complete verification of the design possible by covering all of the design’s potential states. [10] As opposed to simulation, formal verification is output driven in the sense that it does not use input stimuli. The verifier specifies the output functionality that the design should implement, and the formal verification tool proves that to be either true or false. Another difference to simulation is that, from the output’s perspective, formal verification verifies a whole property, a group of output signals, at once, while simulation verifies only one point, based on the current input sequence. [11] There are three main formal approaches: equivalence checking, model checking and theorem proving [10].

In equivalence checking, two design versions are compared to each other to investigate whether they are equivalent. This type of verification is used, for example, to compare gate-level netlists before and after scan-chain assembly to ensure that their normal operation corresponds. Also, an RTL implementation is commonly verified against its layout version with equivalence checking. [10,11] If the tool detects that the circuits are not equivalent, it creates an input sequence that points out the differences between the compared circuits in simulation [11].

Model checking, or property checking, proves or disproves whether a given design has the property that is under verification [10,11,27]. A property means some part of the design specification that the design should satisfy [11]. Model checking is a highly automatic and quite fast technique, and it is used to verify finite-state systems [27,28]. “Is deadlock possible in this system?” is an example of a property that can be verified with model checking [27].
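
In SystemVerilog-based flows, such properties are often expressed as assertions that a formal tool can prove or disprove. The sketch below, with hypothetical signal names, states that every request must be granted within ten clock cycles:

module handshake_props (input logic clk, req, gnt);
  // Every request is followed by a grant within 1 to 10 cycles.
  property p_req_granted;
    @(posedge clk) req |-> ##[1:10] gnt;
  endproperty

  assert_req_granted: assert property (p_req_granted);
endmodule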


The principle in model checking is to search the design’s state space completely to find a point at which the property does not hold. If found, this failure point is used as a counter-example to disprove the property. [11,27] Based on the counter-example, the tool generates an input sequence that can be used in debugging the failure [11,28].

The main challenge with model checking is the enormous number of variables that must be solved within complex systems that include a vast number of states. This is known as state explosion, and because of it memory consumption limits this approach. [11,27,28] The problem can be managed by isolating for verification the part of the design that the property concerns. This requires proper modelling of the test environment, including constraining the input values to be the same as if the part were connected to the rest of the design. The constraints might be over-constraints that limit legal inputs, or under-constraints that allow unexpected inputs. Under-constraints may cause unnecessary disproval of the property. Over-constraints, on the other hand, may cause approval of the property although there might be values causing disproval in the legal but constrained-away area. [11]

Theorem proving aims to establish whether the design’s implementation meets the design specifications by mathematical reasoning methods [10,11]. In theorem proving, the design is described as mathematical axioms and the property is treated as a mathematical proposition. If the axioms can be used to deduce the proposition, then the property is true, and otherwise false. This approach is less automatic than model checking and requires user interference, for example in setting up lemmas to proceed with the process. It also demands a strong understanding of the used tool and knowledge of mathematical proofs. Additional differences to model checking are the smaller memory requirements and the ability to process larger and more complex designs. [11]

2.6 Simulation

Simulation-based verification is the most used technique in functional verification [10,11]. This approach is based on an interactive testbench that is connected to a DUT. The testbench feeds input stimuli to the DUT and records its output response. It then compares the recorded response to the reference output, which is considered to be correct. The simulation is run on a simulator. [11] Figure 5 presents a typical simulation set-up in which a verification environment, or testbench, applies stimulus to a DUT and records its response for checking. In Figure 5 the simulation is executed on a simulator that runs on a workstation.


Figure 5. A typical simulation set-up.

The input stimuli and the reference output, also called the golden response, can be generated before the simulation or during the simulation at run time. If the generated input stimuli are targeted to verify a certain functionality of the design, they are called directed tests. Another test type is the randomized test. In randomized tests the input stimuli are randomized to verify areas and input combinations that are not considered by the directed tests. The randomization can be constrained to have certain control and guidance over the stimuli. [11] Constrained random tests are very useful because they increase the input combinations while keeping the values within the legal and specified range, as the sketch below demonstrates.
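
A small sketch of a constrained random stimulus class; the fields and ranges are hypothetical:

class axis_burst;
  rand int unsigned len;
  rand bit [31:0]   words[];

  // Keep randomized values within a legal, specified range.
  constraint c_len   { len inside {[1:64]}; }
  constraint c_words { words.size() == len; }
endclass

module rand_demo;
  initial begin
    axis_burst b = new();
    repeat (3) begin
      if (!b.randomize()) $error("randomization failed");
      $display("burst of %0d words, first = %0h", b.len, b.words[0]);
    end
  end
endmodule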

It is important to know how well a design is stimulated and verified by the different tests in simulation. The code or functional coverage of the simulation provides this information. Code coverage describes how much of the design code the tests have exercised, for example as a percentage of executed branches or statements. Functional coverage, on the other hand, describes the number of functionalities exercised by the tests. Based on the coverage information, a verification engineer can design and develop new tests to verify the areas not covered, and use the information in analysing the maturity of the DUT. [11]
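
Functional coverage is typically collected with covergroups; a minimal hypothetical sketch:

class axis_coverage;
  bit [31:0] data;

  // Functional coverage: which data ranges have the tests exercised?
  covergroup cg;
    cp_data: coverpoint data {
      bins zero  = {0};
      bins small = {[1:255]};
      bins large = {[256:$]};
    }
  endgroup

  function new();
    cg = new();
  endfunction

  function void sample(bit [31:0] d);
    data = d;
    cg.sample();
  endfunction
endclass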

Before simulating the design, it can be checked with a linter program for static errors or coding style violations. The linter analyses the HDL code itself, rather than its purpose, to find defects that impact simulation, synthesis and performance. The static errors that a linter can detect are errors that can be found without input stimuli, for example multiple or no drivers on a signal, or a mismatch of port widths between a module’s definition and its instantiation. [11,29]

The simulation is executed with software or hardware simulators, which come in event-driven and cycle-based variants. An event-driven simulator reacts to events, which are changes in the values of variables or signals. When an event occurs in a component’s input or sensitivity list, the simulator evaluates the component. If any of the component’s output values change, the new value is carried on to all attached components, which are then evaluated. This procedure continues throughout the simulation until all events are handled. [11]

A cycle-based simulator divides the design according to the clock domains used. Evaluation of the circuit happens only once, at the triggering clock edge. Because of this, a cycle-based simulator suits only synchronous circuits that have well-defined clock domains. Additional features and restrictions include, for example, the assumption of zero delay for all components and the inability to simulate combinational loops. Cycle-based simulators also require a stricter coding style than event-driven simulators. The advantage of cycle-based simulators is that they are faster than event-driven simulators. [11]

The principle of simulation is similar in software and hardware simulators, also called emulators, but with large and complex designs the performance of the software simulator limits its usability. The next sections discuss emulators and their use in simulation. FPGA prototyping and the differences between these methods are also considered.

2.6.1 Emulators

Emulators are special-purpose computers manufactured to speed up simulations. They are usually some tens to thousands of times faster than software simulators, depending on the use case. [11] When the design size reaches approximately tens to hundreds of millions of gates, the software simulator’s performance starts to drop and the benefits of using an emulator become visible [30]. An emulator can be implemented in two different ways: indirect implementation and direct implementation [13].

In indirect implementation the logic simulator is implemented with hardware. An example of indirect implementation is the cycle-based EVE machine. It is built with primitive logic processors (LPs) that have four input channels and a Boolean function (e.g. AND or OR) that the LPs execute continuously. The more LPs there are in the system, the more Boolean functions can be evaluated in parallel to increase performance, although communication between the LPs then takes more time. In addition, the emulator includes hardware elements for registers, latches and memory arrays, which increase the functionalities that the emulator supports. [13]

In direct implementation the design is directly mapped to the programmable hardware of the emulator [13]. The programmable hardware can consist of FPGAs, processor arrays or programmable ASICs [11-13]. Because the design typically cannot fit on a single HW element, it is partitioned into multiple sub-circuits that are mapped to the programmable HW elements. The programmable components are connected to each other via the pins of their packages on printed circuit boards (PCBs). There are two different connection architectures: direct and indirect. [12]

In the direct architecture the components are connected directly to each other with physical wiring, while in the indirect approach they are connected using routing chips. Direct routing is a straightforward connection method but is restricted by the components’ limited number of I/O pins. The restricted number of connection pins limits the signals that can be fed into and out of a programmable component. Thus, all possible logic resources of the component might not be usable, because the module mapped there cannot receive the signals it would require to function. This forces the module to be mapped to another component to which the required signals can be connected through available pins. One way to avoid this limitation is to use some of the components’ logic in time-multiplexed routing, or to choose indirect routing. [12]

In the time-multiplexed approach a certain connection wire between two processing elements is used in turns. This may require dividing the emulation clock into sub-cycles to allow signalling between the components and their corresponding logic circuits to occur correctly. [12]

An example of indirect routing with the programmable interconnect approach is the crossbar component. It is a component that can connect any of its pins together and thus improve inter-chip routing. The disadvantage is that the size of the crossbar component increases significantly when the number of components and their connections increases. In this case, a single crossbar can be divided into multiple smaller components to decrease the size of a single component. The time-multiplexing and crossbar interconnection methods can be combined to further improve inter-chip routing in the emulator, as Figure 6 demonstrates. [12] For example, in Figure 6 the usage of the pq line of Chip 1 is divided in time between the signals p and q; in the crossbar of Multiplexer (MUX) Chip 1 these signals are then routed onwards.


Figure 6. Example of indirect inter-chip routing between emulator’s processing units using time-multiplexed and crossbar schemes [12].

Concurrent execution of the RTL design requires correct timing control from an emulator. It must take care of propagating the clock signals without skew to the different processing units to ensure the intended design behaviour. Clock skew can itself cause wrong signal value propagation or cause hold-time violations, which in turn cause metastability. Most timing issues can be fixed by decreasing the emulation clock frequency, which on the other hand decreases emulation performance and its benefit. In addition, this procedure does not suit SoC designs with multiple different clocks and clock phases. The clock skew can be decreased, for example in the case of internally generated secondary clocks, by duplicating the clock generation block into the processing units that require it. Thus, slow inter-chip path delays between these units can be avoided. [12]

The use of an emulator requires a host workstation that partitions and compiles the design code for emulation and programs the processing units, for example FPGAs, to which the design is mapped. The workstation is connected to the emulator via high-speed channels, and it also programs the routing between the processing units. [12] Because of these phases, the design compilation time for an emulator can be 5 to 50 times longer than for a software simulator. Although compile times are longer for emulators in general, there are differences between them; for example, emulators utilizing FPGAs have longer compile times than emulators utilizing processor arrays. [11]

In addition, the run-time control and debugging interface of the emulator is used via the host workstation. However, the communication between the workstation and the emulator should be kept to a minimum to maximize the performance boost that the emulator offers. [11,13] Multiple users can use modern emulators simultaneously with remote access through host workstations.


Emulators cannot be used to verify an RTL design’s timing properties, because the design is mapped to the emulator’s hardware. The hardware and signal routing inside the emulator are not the same as in the future design on another chip, so the results from the emulator would not correlate with the actual implementation. The same mapping of the design to the emulator limits emulators to 2-state logic instead of the 4-state logic used in software simulators. The unknown state (X or U) cannot exist in real hardware, and the high-impedance state (Z) cannot be observed with a digital circuit, which leaves the logical values ‘1’ and ‘0’. [11]

2.6.2 Simulation acceleration

The purpose of simulation acceleration is to decrease the time used in the simulation process, as its name suggests. The acceleration can decrease only the time consumed in simulating the synthesizable design code, because the synthesizable code is executed in the emulator at a higher speed compared to pure simulation.

In simulation acceleration, the parts of the design code that are synthesizable are mapped to the emulator’s hardware for execution. The non-synthesizable design code and the verification environment code are executed on the workstation. [12] In the industry, the parts executed in the emulator are referred to as the “HDL domain”, while the parts executed in the simulator are called the “HVL (hardware verification language) domain” or the “testbench domain”.

Input data is generated in the verification environment on the workstation and applied to the DUT on the emulator via high-speed channels. Output data is transported similarly in the other direction, from the DUT to the testbench, in which the response is checked. Figure 7 illustrates the simulation acceleration set-up. [12]

Figure 7. The general simulation acceleration set-up. [12]


The total time t_total spent in simulation with simulation acceleration can be presented in the form of the equation

t_total = t_emulator + t_workstation + t_communication (1)

in which t_emulator is the time spent in the emulator, t_workstation is the time spent in the workstation and t_communication is the time spent in switching between the emulator and the simulator domains, including the data exchange between them during the simulation.

The total time can be presented as the sum of the three components, as in equation (1), because they occur sequentially during the simulation. When execution is in the workstation, for example to generate new input signals during run time, the emulator is paused. After the data is generated it is transmitted to the emulator, which can then continue to operate. The workstation in turn waits for data from the emulator before it can continue operating. Thus, the simulation generally proceeds in turns, although there are some methods, such as data streaming, to add parallelism and avoid a fully lockstep progression of the simulation. [31]

As equation (1) shows, the time spent in the emulator is not the only factor that impacts the total simulation time. The time spent in the simulator on the workstation limits the achievable time reduction, since the workstation is slower than the emulator [12]. Because of this, the ratio between the time spent simulating the testbench and the synthesizable design code in pure simulation affects the achievable benefit of the acceleration. The benefit decreases if the time used in the testbench, for example to generate and check data, is much longer than the time used to simulate the design code.

Figure 8 presents an example of this with two different scenarios. The blue colour in the bars represents the time spent executing non-synthesizable code in the workstation, the orange colour the time spent executing synthesizable code, and the grey colour the communication between the workstation and the emulator. In the first scenario, the workstation uses 50% of the time to simulate the synthesizable design code and 50% to simulate the non-synthesizable verification code in pure simulation, as the "Simulation 1" bar shows. In this case, the total time could decrease at most to half of the initial time, even if the design code were accelerated to consume zero time, which would also require the communication between the workstation and the emulator to consume no time. The "Simulation acceleration 1" bar illustrates an example acceleration result in the case of time-consuming verification code.

On the other hand, when in the initial state 90% of the time is consumed in simulating the synthesizable code and the remaining 10% in the non-synthesizable code, the acceleration can reduce the simulation time greatly. This second scenario is illustrated with the "Simulation 2" and "Simulation acceleration 2" bars.

Figure 8. Illustration of the simulation acceleration's effect on the simulation time in two example scenarios.
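The limit shown by the two scenarios can also be written out explicitly. Denoting by $p$ the fraction of the pure-simulation time spent on the synthesizable code and by $s$ the factor by which the emulator accelerates that code, and ignoring the communication time as a simplifying assumption, the achievable overall speedup takes the familiar Amdahl form

\[
  \text{speedup} = \frac{1}{(1-p) + p/s} \;\xrightarrow{\;s \to \infty\;}\; \frac{1}{1-p} .
\]

With $p = 0.5$ this gives at most a factor of 2 and with $p = 0.9$ at most a factor of 10, which matches the two scenarios of Figure 8.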

By optimizing the testbench and the simulation technique it is possible that the communication between the emulator and the workstation becomes the bottleneck [12]. Because switching between the emulator and the workstation adds to the simulation's total time, as equation 1 shows, the data exchange events should be kept to a minimum. Transferring as large data amounts per transfer as possible is one way to decrease the communication. [12]
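The following sketch illustrates this batching idea with the SystemVerilog constructs used in this thesis. The HVL-side proxy buffers individual data beats and pushes a whole burst into an HDL-side bus-functional model with a single task call, so that only one domain crossing is needed per burst instead of one per beat. All names (axis_bfm, send_burst, axis_driver_proxy) and the burst size are invented for illustration; they are not the interface of the VIP discussed later.

// HDL domain: a synthesizable bus-functional model as an interface.
interface axis_bfm(input bit clk);
  localparam int BURST_LEN = 256;
  logic        tvalid, tready;
  logic [31:0] tdata;

  // One call from the HVL side drives up to 'len' beats onto the bus.
  task send_burst(input logic [31:0] beats [BURST_LEN], input int len);
    for (int i = 0; i < len; i++) begin
      tdata  <= beats[i];
      tvalid <= 1'b1;
      do @(posedge clk); while (!tready);  // honour the handshake per beat
    end
    tvalid <= 1'b0;
  endtask
endinterface

// HVL domain: the proxy collects beats and flushes them as one burst,
// so the emulator is interrupted once per burst instead of once per beat.
class axis_driver_proxy;
  virtual axis_bfm bfm;
  logic [31:0] buffer [256];
  int          count;

  task write(input logic [31:0] beat);
    buffer[count] = beat;
    count++;
    if (count == 256) flush();
  endtask

  task flush();
    if (count > 0) bfm.send_burst(buffer, count);  // single domain crossing
    count = 0;
  endtask
endclass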

To conclude, the best results from simulation acceleration are achieved when, in the first place, the time spent simulating the synthesizable design code is longer than that spent on the non-synthesizable verification code, and the interaction between the emulator and the workstation is infrequent. When choosing between pure simulation and simulation acceleration, compile times should also be taken into account. The compile time for pure simulation is considerably shorter than for simulation acceleration, which may make the combined time of compilation and simulation shorter for pure simulation. On the other hand, as the number of tests and the total time used in them increase, the significance of the compile time decreases. Pure simulation might be the better choice for short tests, because those are easy to execute in parallel, for example in a server grid.
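As a purely hypothetical illustration of this trade-off, assume pure simulation compiles in 5 minutes and acceleration in 60 minutes. For a long test whose run time drops from 120 minutes to 15 minutes, acceleration wins, while for a short test whose run time drops from 5 minutes to 1 minute, pure simulation remains faster:

\[
  \underbrace{5 + 120}_{\text{simulation: }125\,\text{min}} > \underbrace{60 + 15}_{\text{acceleration: }75\,\text{min}},
  \qquad
  \underbrace{5 + 5}_{\text{simulation: }10\,\text{min}} < \underbrace{60 + 1}_{\text{acceleration: }61\,\text{min}} .
\]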


2.6.3 Emulation

In emulation, both the DUT and the verification environment are mapped into the emulator, which increases the simulation speed further compared to simulation acceleration. The communication between the workstation and the emulator occurs only when required, for example to check execution progress. Figure 9 illustrates the emulation set-up. Because the verification environment is mapped to the emulator, it must be synthesizable, and thus the coding style used to implement it is restricted to synthesizable structures. [12]

Modifying already existing verification environments into synthesizable form may be difficult and time-consuming because of the coding style restrictions. For example, the dynamic classes and the randomized test sequences used in a UVM-based environment are not synthesizable, so functionalities implemented with them must be re-designed. In addition, re-use of synthesizable testbench components is likely to be more challenging compared to non-synthesizable components. Because of these issues, simulation acceleration is generally the more practical option to speed up the verification.

When considering equation 1 from the emulation's perspective, $t_{workstation}$ and $t_{communication}$ decrease because the emulator is now able to run also the verification environment at higher speed. The emulation does not require frequent participation from the workstation, and because of that the communication between them is minor. Thus, the major contributing factor will be $t_{emulator}$, which is determined by the performance of the used emulator.

Figure 9. Emulation set-up in which the verification environment is synthesizable in addition to the DUT. [12]


2.6.4 In-circuit emulation

In-circuit emulation (ICE) uses external hardware, often called the target system board, in combination with an emulator. The DUT and the test environment are synthesized and mapped to the emulator's hardware as in pure emulation. The external hardware is connected to the emulator and used to generate stimuli for the DUT, creating a more realistic verification set-up. The workstation and the emulator exchange data on demand, for example to view execution status. [12] Figure 10 illustrates the ICE set-up.

Figure 10. In-circuit emulation set-up. [12]

The emulator can be used to apply a clock signal to the target system, in which case the system is considered static. This approach allows stopping and resuming the simulation and slowing down or increasing the simulation speed without problems. If the target system has its own clock source, it is considered dynamic. In this case special care must be taken to achieve correct functionality of the system, because there may be differences for example in data and clock rates between the emulator and the target board. In addition, stopping the simulation is not as straightforward as with a static target, which restricts debugging. [12]

2.6.5 FPGA prototyping

In FPGA prototyping the design is synthesized and mapped to an FPGA chip using the FPGA vendor's tools. If the design does not fit into a single FPGA, it must be partitioned and mapped to multiple ones, which makes the mapping process error-prone and more time-consuming. [12]


FPGA prototyping offers the fastest simulation speeds of the methods compared here. The drawbacks of this approach are the limited debugging capabilities and the fact that a possible design partitioning must be done by hand, whereas the emulators do it automatically.

2.6.6 Simulation method comparison

There are several differences between the simulation methods presented above. Table 1 presents a comparison between them.

Table 1. Comparison of the different simulation methods. Modified from [6].

The first notable difference is that software simulators run on workstations and execute the design and verification code serially. In simulation acceleration, the testbench is also executed serially in the workstation, but the design code is executed concurrently in the hardware of the emulator. In emulation, both the design and verification code are executed in the emulator in parallel, and the workstation is practically not needed in the simulation process. FPGA prototyping is done by mapping the design code to FPGAs, which are stimulated for example by connecting them to an existing system.

Secondly, the software simulators are more flexible regarding logic values and timing models, whereas the emulators use 2-state logic and suit better for logic verification [12]. Otherwise the debugging capabilities correspond to each other, because modern emulators offer full RTL-level visibility to signals as do the software simulators, and the debugging is done similarly by using the workstation. On the other hand, debugging RTL code with FPGA prototyping is more challenging since RTL-level visibility is limited and debugging must be done using a software logic analyser.


The third major difference between these methods is the simulation speed. The speed values in Table 1 are approximate and depend on the DUT, the verification environment, the tests, and the used emulator or FPGA. The compilation times also differ depending on the method used. In software simulation the DUT and the testbench are compiled for the simulator, whereas in simulation acceleration the DUT must be synthesized, partitioned, placed and routed for the emulator, which increases the compile time. In emulation the testbench must also be mapped to the emulator, which further increases the compile time compared to acceleration. The compile time of prototyping is the longest because mapping and partitioning of the DUT to the FPGAs must be done manually, which is an automatic process in simulation acceleration and emulation.


3. RELATED WORKS

This chapter presents research results and publications that relate to verification with simulation acceleration. The subject has been studied from different viewpoints, and several beneficial propositions for using it as a verification method have been made.

In [32] Wiśniewski et al. present simulation acceleration results that are approximately 170 times faster compared to pure simulation for the same tests. They tested two RTL modules individually and as a combination. With pure simulation the individual tests took 45 and 30 minutes while the combination test took 75 minutes. The corresponding times with acceleration were 17, 10 and 27 seconds. They also illustrated the advantage of acceleration in incremental prototyping. [32]

In that methodology, the RTL design is developed and tested in modules. When a new RTL module is ready, it is combined with the previous ones, so the final design is completed piece by piece. The advantage of simulation acceleration is then that the readymade and tested modules can be accelerated with the emulator while only the newest module is simulated. Thus, test times do not increase as much as with pure simulation, although the size of the DUT increases. [32]

Jain et al. also achieved notable results with simulation acceleration while using a UVM environment in testing. They used two DUTs of different sizes: one was around 5 million gates and the other almost double in size, around 9.5 million gates. With the smaller DUT, the test time in pure simulation was approximately 657 seconds and in acceleration 21 seconds, which means a gain of about 30 in test time. The difference was even greater with the larger DUT: the corresponding times were around 2044 and 50 seconds, giving a gain of approximately 40 for acceleration over simulation. [33]

Ruan et al., Kim Y.I. et al. and Kim J. et al. have studied simulation acceleration from the perspective of the communication between the simulator and the emulator. Ruan et al. concentrated on developing the communication by using data streaming between the simulator and the emulator. The main idea in data streaming is that the emulator does not need to stop when data is transferred with the simulator but can operate constantly while data flows in and out between it and the simulator. In their study the initial set-up was such that data was exchanged between the simulator and the emulator based on interrupts, which means that the emulator was paused during data exchange. The data streaming method they proposed increased the communication throughput more than 10 times over the initial interrupt-based communication in their test cases. [34]
