arXiv:2210.14364v1 [quant-ph] 25 Oct 2022
Functional Simulation of
Real-Time Quantum Control Software
Leon Riesebos
Department of Electrical and Computer Engineering
Duke University, NC 27708, USA
leon.riesebo[email protected]
Kenneth R. Brown
Department of Electrical and Computer Engineering
Duke University, NC 27708, USA
Abstract—Modern quantum computers rely heavily on real-time
control systems for operation. Software for these systems is
becoming increasingly complex due to the demand for more
features and more real-time devices to control. Unfortunately,
testing real-time control software is often a complex process,
and existing simulation software is not usable or practical for
software testing. For this purpose, we implemented an interactive
simulator that simulates signals at the application programming
interface level. We show that our simulation infrastructure
simulates kernels 6.9 times faster on average compared to
execution on hardware, while the position of the timeline cursor
is simulated with an average accuracy of 97.9% when choosing
the appropriate configuration.
Index Terms—real-time control software, signal simulation,
software testing, quantum computing
I. INTRODUCTION
State-of-the-art quantum hardware is becoming increasingly
powerful with recent systems demonstrating computations on
tens of qubits [1]–[7]. Recent papers [1], [5], [8], [9] have
shown that such systems rely heavily on real-time control
systems to control tens to hundreds of devices with nanosecond
precision. Programmable real-time control systems, as described
in [10]–[14], are already available and widely adopted.
An often underexposed area of such real-time control systems
is the increasingly complex control software required to operate
them. Larger quantum systems control more real-time
devices, which leads to an increasing amount of software. In
addition, real-time software is taking on more responsibilities
ranging from hardware latency compensation to decomposing
quantum gates into device control which further increases its
complexity.
With the growing complexity of real-time control software,
functional testing and verification are becoming increasingly
important. Unfortunately, testing real-time control software
is often complex, time-consuming, and resource-intensive.
Testing on hardware requires access to control hardware and
test equipment, such as oscilloscopes and signal generators,
to probe and stimulate the control system, as illustrated in
Figure 1. Even if all required test equipment is available,
configuring the equipment to simulate the correct test signals
can be complex and time-consuming. Additionally, black-box
testing on hardware might not give enough insight into the
state of the software if incorrect behavior is observed. Software
testing with hardware requires hardware to be available, which
might not be the case in the early stages of development. The
use of simulation could enable testing of real-time control
software, but simulators are usually not available for real-time
control systems, as is the case for [10]–[13]. Existing
simulation approaches that might be available, such as
cycle-accurate hardware simulation, often focus on the
microarchitectural level. Such simulations are too slow, inflexible,
and low-level to be useful for testing real-time control software.
Fig. 1. The equipment required for hardware testing, which includes the
real-time control system, oscilloscopes, and signal generators.
In this paper, we present an open-source functional simulator
for real-time control software targeting the advanced real-time
infrastructure for quantum physics (ARTIQ) open-source
software and hardware ecosystem [10], [15]. Our interactive
simulator simulates all aspects of real-time control software,
including classical constructs, real-time events, and device
input. Real-time device signals are simulated at the application
programming interface (API) level, which enables functional
software testing and fast simulation speeds. Our simulator
integrates seamlessly into the ARTIQ host environment and
is capable of simulating interactions between the host and the
real-time control system. With our simulation infrastructure,
users can test and verify real-time control software using
existing tools for step debugging, unit testing, and continuous
integration. Without the need for any of the test hardware
shown in Figure 1, our simulator enables software testing
in the early development stages. We show that our kernel
simulation is on average 6.9 times faster than execution on
control hardware. Even with the presence of variable delays
and simplified timing models for devices, the position of the
timeline cursor is simulated with an average accuracy of 97.9%
when appropriately configured.
The remainder of this paper is structured as follows.
Section II briefly covers related work, and in Section III
we will provide an overview of the ARTIQ hardware and
software components that we will simulate. The design of
our simulation platform is presented in Section IV, while the
results of our performance and accuracy measurements can be
found in Section V. We conclude our paper in Section VI.
Fig. 2. Schematic overview of the accelerator model with a host program
and one or more kernels.
II. RELATED WORK
Real-time control hardware and software can be simulated
with techniques similar to the ones used for the simulation of
embedded systems. Previous work such as [16], [17] proposes
various techniques and approaches for such simulations.
Real-time control hardware can be simulated on a microarchitectural
level based on its hardware description using the same binaries
as the actual hardware. Cycle-accurate microarchitectural
simulations can be performed with tools such as GEM5 [18],
SystemC [19], [20], Chisel [21], or SimSoC [22]. Most of
these tools can perform low-level and detailed cycle-accurate
simulations of the hardware. Unfortunately, cycle-accurate
simulations are often not usable for software testing and
verification because simulations run slowly and the simulated
signals are too low-level for testing real-time software and
device behavior. These simulations also require detailed device
models that might not be available in the early development
stages. The same holds for simulation techniques based on
communication models of the microarchitecture, such as [17],
[22]–[24].
High-level simulation approaches for quantum computer
architecture as discussed in [25]–[27] can be fast and test
real-time quantum programs. Unfortunately, these simulators
operate on the quantum-gate level and do not simulate the
real-time device control required to implement such operations.
Hence, high-level simulators are not usable for testing
real-time control software on a real-time device and signal level.
III. SYSTEM OVERVIEW
Our simulator targets the advanced real-time infrastructure
for quantum physics (ARTIQ) open-source software and
hardware ecosystem [10], [15] which is used by dozens of
research groups and has deployed over 200 real-time control
systems worldwide. The ARTIQ ecosystem combines a
Python-based software environment with modular real-time
control hardware, and its programming paradigm is based on
the accelerator model as described in [13], [26], [28]–[33].
The ARTIQ software environment runs on a host computer
that communicates with the control hardware, also referred to
as the core device, over Ethernet. Users can program the system
using a Python host environment while kernels are executed
on the core device as illustrated in Figure 2.
Fig. 3. A schematic overview of the microarchitectural components in the
core device.
A. Hardware
The core device is driven by a field-programmable gate
array (FPGA) which contains a classical CPU combined with
an event-based real-time I/O (RTIO) subsystem similar to the
systems outlined in [13], [34]. Figure 3 shows a simplified
schematic of the relevant microarchitectural components in the
FPGA. The classical CPU will handle all classical instructions
of the kernel and has additional access to a timeline cursor
and an event timeline. The timeline cursor is a register that
holds the current position on a timeline. The cursor is stored
as an integer value that represents a time in machine units
(MU), which normally corresponds to a timestamp expressed
in nanoseconds. The CPU can also post events to the event
timeline where an event is defined as a tuple of a timestamp
and an I/O command. To change the state of a device, the CPU
sets the timeline cursor to the time at which the change should
occur before posting the I/O command to the event timeline.
The current value of the timeline cursor will be used to store
the event on the timeline. If the CPU posts two commands
for the same device at the same timestamp, the last event will
overwrite the first one. By posting a series of events, a program
can build up an event timeline that represents the real-time
control of devices.
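The cursor-and-event model described above can be illustrated with a small Python sketch. This is a toy model written for this explanation, not ARTIQ code; the class `EventTimeline` and its method names are hypothetical.

```python
from typing import Dict, Tuple


class EventTimeline:
    """Toy model of the timeline cursor and event timeline (hypothetical names)."""

    def __init__(self) -> None:
        self.cursor_mu = 0  # timeline cursor, in machine units (MU)
        # Events keyed by (device, timestamp); the value is the I/O command.
        self.events: Dict[Tuple[str, int], str] = {}

    def at_mu(self, t: int) -> None:
        """Set the timeline cursor to an absolute time."""
        self.cursor_mu = t

    def post(self, device: str, command: str) -> None:
        """Store an event at the current cursor position.

        A second command for the same device and timestamp replaces the first.
        """
        self.events[(device, self.cursor_mu)] = command


timeline = EventTimeline()
timeline.at_mu(1000)
timeline.post("ttl0", "on")
timeline.post("ttl0", "off")  # same device, same timestamp: replaces "on"
timeline.at_mu(2000)
timeline.post("ttl0", "on")
```

Keying events by device and timestamp makes the overwrite rule fall out of ordinary dictionary assignment.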
In parallel to the CPU’s execution, the RTIO subsystem
continuously verifies if any events are due. The RTIO counter
represents a timestamp in MU and is incremented every
nanosecond. The RTIO engine reads the event timeline and
verifies if any events are due based on the current value of
the RTIO counter. If an event is due, the RTIO engine updates
the corresponding device according to the command defined
by the event. In case an event generates a return value, for
example, when reading the value of a digital input, the return
value is inserted into the input buffers. The CPU can read
results from the input buffers whenever they are available.
Fig. 4. A schematic overview of a host program and a kernel with access to
APIs for the timeline cursor and device drivers.
For the RTIO system to operate properly, the slack (i.e.
the difference between the timeline cursor and the RTIO
counter) must be positive. Posting an event with negative slack
translates to changing the state of a device in the past, which
is not possible. Doing so will result in an underflow exception.
Kernels normally start their program by synchronizing the
timeline cursor to the RTIO counter and incrementing the
timeline cursor with a fixed value of $125 \times 10^3$ MU to ensure
positive slack at the start of the program.
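As a minimal sketch of the slack rule, the check below computes the slack and rejects events that would land in the past. The function names and the local exception class are stand-ins chosen for this illustration; on real hardware the check happens in the RTIO subsystem, not in Python.

```python
SYNC_OFFSET_MU = 125_000  # fixed start-of-kernel offset from the text


class UnderflowSketchError(Exception):
    """Stand-in for the underflow exception (hypothetical name)."""


def slack_mu(cursor_mu: int, rtio_counter_mu: int) -> int:
    """Slack is the timeline cursor minus the RTIO counter."""
    return cursor_mu - rtio_counter_mu


def check_post(cursor_mu: int, rtio_counter_mu: int) -> None:
    """Reject an event whose timestamp already lies in the past."""
    if slack_mu(cursor_mu, rtio_counter_mu) < 0:
        raise UnderflowSketchError("event timestamp is in the past")


# Start of a kernel: synchronize the cursor to the counter, then add the offset.
rtio_counter_mu = 5_000_000
cursor_mu = rtio_counter_mu + SYNC_OFFSET_MU
check_post(cursor_mu, rtio_counter_mu)  # passes: slack is 125 000 MU
```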
B. Software
The ARTIQ software environment is Python-based and
programs that run on the system are called experiments. An
experiment consists of Python code that runs on the host and
can additionally contain kernel functions that run on the core
device. Kernel functions are written in the ARTIQ domain-specific
language (DSL) which is a subset of the Python language.
Inside kernels, programmers have access to additional
functions to manipulate the timeline cursor, post events, and
read input buffers. The latter two are normally not directly
used by programmers as these functions are encapsulated in
device drivers. Such device drivers provide an application
programming interface (API) to translate functional device
behavior (e.g. switch off a digital output pin) to low-level
events. A schematic overview of a host program and a kernel
with access to APIs for the timeline cursor and device drivers
is shown in Figure 4.
When the host calls a kernel function, the ARTIQ compiler
assembles a kernel binary at runtime which is then uploaded
to and executed by the core device. Variables from the host
environment that are accessed in a kernel will be compiled
into the binary. During kernel execution, the host will handle
any (a)synchronous remote procedure calls (RPCs) initiated by
the kernel. Once the kernel is finished executing, the context
switches back to the host, and any variables modified in the
kernel are synchronized with the host environment before
the experiment resumes executing on the host. As a result,
the context switch between host and kernel code is almost
seamless from a programmer’s perspective.
IV. SIMULATION
Our goal is to enable the simulation of real-time control
software for software testing and verification. A simulator
should integrate into the existing ARTIQ environment,
simulate kernel execution, and simulate any interactions between
the host environment and the kernel as described in Section III.
The simulator should be fast enough to test complete
experiments within a reasonable time. No real-time control hardware
should be required to run simulations, only a model of the
hardware listing the available devices. Hardware/software
co-simulation for embedded systems is not new, and existing
work proposes various techniques and approaches for such
simulations [16], [17]. At the most detailed level, we find
cycle-accurate simulations, such as [18], [19], [21], that take
the same binary as the real system and simulate the components
and registers of the microarchitecture in great detail.
Such simulations require highly detailed models making them
inflexible and potentially time-consuming to develop.
Cycle-accurate simulators are extremely detailed and accurate but are
also slow. It is not our goal to do performance analysis on the
ARTIQ microarchitecture, and we do not need such a level of
detail. Since our target is software testing and not hardware
performance analysis, we will focus on API simulation. An
API simulation cross-compiles the target program to a
simulator that implements the same API as the target system. The
simulator requires no execution model of the hardware and
can therefore be fast. Based on our requirements, we decide
to target functional simulation of kernels and real-time devices
using API simulation. Timeline cursor manipulations will be
simulated at the API level. Real-time devices are simulated
at their driver API level, and functional behavior will be
based on a simplified device model. Hence, we will replace
the timeline cursor API and the device driver APIs shown in
Figure 4 with calls to our simulation infrastructure. The state
of the RTIO counter and RTIO engine are not simulated, which
would require the use of a cycle-accurate simulator. Instead,
we estimate the value of the RTIO counter when synchronizing
the timeline cursor with the RTIO counter.
For simulation of real-time kernels, we will need to cover
classical constructs (i.e. the CPU), the timeline cursor, the
event timeline, and input buffers. Since both the host code
and the classical constructs of the kernels are valid Python
code, we decided to use the host Python process to simulate
kernels. Hence, our simulator is implemented in Python and
all components in Figure 4 will be executed by the Python
interpreter. Using the same Python process will also instantly
implement host-kernel variable synchronization and handling
of RPCs. We decided to split the simulation of the remaining
components into two parts: time and signals. The time
component covers the simulation of the timeline cursor, and the
signals component covers the simulation of the event timeline
and input buffers. Figure 5 shows a schematic overview of the
simulated components. In the remainder of this section, we
will cover time and signal simulation.
A. Time
A kernel can read and write the value of the timeline cursor
using the functions now_mu() and at_mu(t), respectively.
Additionally, the cursor can be moved relative to its current
position using the functions delay_mu(d) and delay(d).
The latter function is used with a delay time expressed in
seconds instead of MU. Since the delay in seconds is converted
to a delay in MU, the delay(d) function is not further
discussed. Functions used to modify the timeline cursor behave
differently depending on the timing context in which they are
used. There are two timing contexts, sequential and parallel,
which are used as regular Python context managers using the
with statement. The two contexts are used to specify if a
set of RTIO operations should be executed sequentially or in
parallel. The contexts can be nested arbitrarily, and by default,
every function starts in a sequential context. As a result, the
timeline cursor simulation will have to adapt based on the
current timing context.
Fig. 5. A schematic overview of the simulated microarchitectural components.
In a sequential context, any modification to the timeline
cursor is interpreted as a sequence of operations. Hence, two
successive delays with durations $d_0$ and $d_1$ are equal to one
delay with duration $d_0 + d_1$. Any call to at_mu(t) is applied
instantly. Modifications to the timeline cursor in a parallel
context are postponed such that operations in the context can
be interpreted as parallel. When the program exits the parallel
context, the timeline cursor will be moved forward by the
duration of the longest positive delay. If a parallel context
containing delays with durations $d_0, \ldots, d_n$ is entered with
the timeline cursor at $t_{\text{start}}$, the timeline cursor will be set to
$t_{\text{start}} + \max(0, d_0, \ldots, d_n)$ when the context exits. In a parallel
context, calls to at_mu(t) with value $t_{\text{new}}$ are interpreted as
delays with duration $t_{\text{new}} - t_{\text{start}}$.
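As a worked example of the parallel rule, consider a parallel context entered at an assumed cursor position of 1000 MU that contains three delays. The numbers are chosen purely for illustration.

```latex
\begin{align*}
t_{\text{start}} &= 1000, \qquad (d_0, d_1, d_2) = (200,\ 500,\ -100) \\
t_{\text{exit}} &= t_{\text{start}} + \max(0, d_0, d_1, d_2) = 1000 + 500 = 1500
\end{align*}
```

The negative delay does not move the cursor backward on exit because the maximum is taken together with zero. In a sequential context the same three delays would instead sum to $200 + 500 - 100 = 600$ and leave the cursor at 1600 MU.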
We simulate the timeline cursor using a stack of simulation
contexts that represent the nested timing contexts. The
appropriate simulation context is pushed on and popped off the
stack when a timing context is entered and exited, respectively.
Each simulation context holds a current time $t_{\text{current}}$ and a
duration $t_{\text{duration}}$ variable in MU. When pushed to the stack,
$t_{\text{current}}$ is inherited from the simulation context currently at the
top of the stack while $t_{\text{duration}}$ is always initialized to zero.
When a simulation context is popped off the stack, $t_{\text{duration}}$
is propagated to the underlying simulation context as a delay.
There is a sequential and a parallel simulation context available
and when the simulation starts, the stack is initialized with a
sequential simulation context with $t_{\text{current}} = 0$. At any time,
interactions with the timeline cursor are handled by the context
at the top of the stack. now_mu() always returns $t_{\text{current}}$
while calls to delay_mu(d) are handled differently by the
sequential and parallel simulation context. For a sequential
simulation context, a delay with duration $d$ will increment
$t_{\text{current}}$ and $t_{\text{duration}}$ by $d$ while for a parallel simulation context,
$t_{\text{current}}$ is not changed and $t_{\text{duration}} = \max(t_{\text{duration}}, d)$. For both
simulation contexts, calls to at_mu(t) with value $t_{\text{new}}$ are
converted to delays with duration $t_{\text{new}} - t_{\text{start}}$. The described
system using the stack of simulation contexts accurately
simulates the behavior of the timeline cursor.
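The stack of simulation contexts can be sketched in plain Python as follows. This is an illustrative reconstruction from the description above, not the DAX source; all class and method names are hypothetical. One assumption is flagged in the code: the at_mu delay is computed relative to the context's own $t_{\text{current}}$, which coincides with $t_{\text{start}}$ in a parallel context because $t_{\text{current}}$ never changes there.

```python
from typing import List, Type


class SimContext:
    """Holds t_current and t_duration (both in MU) for one timing context."""

    def __init__(self, t_current: int) -> None:
        self.t_current = t_current
        self.t_duration = 0

    def delay_mu(self, d: int) -> None:
        raise NotImplementedError

    def at_mu(self, t_new: int) -> None:
        # Assumption: the delay is taken relative to this context's t_current.
        self.delay_mu(t_new - self.t_current)


class SequentialContext(SimContext):
    def delay_mu(self, d: int) -> None:
        self.t_current += d
        self.t_duration += d


class ParallelContext(SimContext):
    def delay_mu(self, d: int) -> None:
        # t_current is untouched; only the longest delay is remembered.
        self.t_duration = max(self.t_duration, d)


class TimeManager:
    def __init__(self) -> None:
        # The stack starts with a sequential context at t_current = 0.
        self._stack: List[SimContext] = [SequentialContext(0)]

    def enter(self, context_type: Type[SimContext]) -> None:
        self._stack.append(context_type(self._stack[-1].t_current))

    def exit(self) -> None:
        if len(self._stack) == 1:
            raise RuntimeError("cannot exit the root context")
        popped = self._stack.pop()
        # Propagate the duration to the underlying context as a delay.
        self._stack[-1].delay_mu(popped.t_duration)

    def now_mu(self) -> int:
        return self._stack[-1].t_current

    def delay_mu(self, d: int) -> None:
        self._stack[-1].delay_mu(d)


tm = TimeManager()
tm.delay_mu(100)           # sequential: cursor moves to 100
tm.enter(ParallelContext)
tm.delay_mu(30)
tm.delay_mu(50)
tm.delay_mu(-10)
inside = tm.now_mu()       # cursor does not move inside the parallel context
tm.exit()                  # longest delay (50) is applied on exit
after = tm.now_mu()
```

Because a popped context hands its duration to the context below it through the ordinary `delay_mu` path, arbitrary nesting needs no special cases.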
For correct synchronization of the timeline cursor to the
RTIO counter, we keep track of a timeline horizon which is
essentially an estimate of the RTIO counter state. For a simulation
with events at timestamps $t_0, \ldots, t_n$, the timeline horizon is
defined as $\max(t_{\text{cursor}}, t_0, \ldots, t_n)$ where $t_{\text{cursor}}$ is the current
position of the timeline cursor. When we synchronize the
timeline cursor to the RTIO counter, we first set the position
of the timeline cursor to the position of the timeline horizon
before inserting a delay of $125 \times 10^3$ MU. Using the timeline
horizon for synchronization is necessary to simulate code with
negative delays correctly. Negative delays are commonly used
to compensate for latencies of physical equipment.
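The following sketch shows why the horizon matters when negative delays are present. It follows the definition above directly; the class `TimelineHorizon` and its methods are hypothetical names used only for this illustration.

```python
from typing import List

SYNC_OFFSET_MU = 125_000  # delay inserted after synchronization


class TimelineHorizon:
    """Tracks the cursor and event timestamps to estimate the RTIO counter."""

    def __init__(self) -> None:
        self.cursor_mu = 0
        self.event_timestamps: List[int] = []

    def delay_mu(self, d: int) -> None:
        self.cursor_mu += d

    def post_event(self) -> None:
        self.event_timestamps.append(self.cursor_mu)

    def horizon_mu(self) -> int:
        # max(t_cursor, t_0, ..., t_n)
        return max([self.cursor_mu, *self.event_timestamps])

    def sync(self) -> None:
        # Move the cursor to the horizon first, then add the fixed offset.
        self.cursor_mu = self.horizon_mu() + SYNC_OFFSET_MU


t = TimelineHorizon()
t.delay_mu(1000)
t.post_event()      # event at 1000 MU
t.delay_mu(-200)    # negative delay: cursor back at 800 MU
t.sync()            # horizon is 1000 MU, so the cursor lands at 126 000 MU
```

Synchronizing from the cursor alone would give 125 800 MU here, which is earlier relative to the already posted event than a real synchronization would allow.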
B. Signals
For signal simulation, we need to simulate the event timeline
and the input buffers. Interactions with the event timeline
and input buffers happen through device drivers. We simulate
device drivers on an API level, and each driver simulates the
signals and state of a device based on a simplified model.
Signals will be simulated on a functional level, for example,
frequency and phase for a direct digital synthesis (DDS) chip
and a binary state for a digital output. To enable signal
simulation, we will capture all function calls to drivers by
replacing each device driver with a matching simulation driver.
During initialization, each simulation driver obtains one or
more named signal objects corresponding to the state of the
device. Each time a driver function is called to change the
state of the device, the driver will push new values to the
appropriate signal objects. Pushing a new value to a signal
object will cause an event to be created at the current position
of the timeline cursor. Each signal object stores its events
and therefore possesses a part of the complete event timeline
of the system. If two events for a single signal have the
same timestamp, the latest event overwrites the existing event.
Additionally, the simulation driver can keep an internal state
and perform any additional processing for proper signal and
time simulation.
To test real-time control software, we must have the ability
to read the value of a signal at any given timestamp. To pull
the value of a signal at a specific timestamp, we search for the
event with the highest timestamp that is less than or equal to the
timestamp of interest. The value of that event will represent
the value of the signal at the given timestamp. If no event is
found, the signal has not been set, and its value is unknown.
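The push and pull rules for a signal object can be sketched with two parallel sorted lists and the standard `bisect` module. This is one possible realization written for illustration; the class `Signal` and the choice to raise `LookupError` for an unset signal are assumptions, not the DAX API.

```python
import bisect
from typing import Any, List


class Signal:
    """Stores the events of one signal, sorted by timestamp."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._timestamps: List[int] = []
        self._values: List[Any] = []

    def push(self, timestamp: int, value: Any) -> None:
        i = bisect.bisect_left(self._timestamps, timestamp)
        if i < len(self._timestamps) and self._timestamps[i] == timestamp:
            # Same timestamp: the latest event overwrites the existing one.
            self._values[i] = value
        else:
            self._timestamps.insert(i, timestamp)
            self._values.insert(i, value)

    def pull(self, timestamp: int) -> Any:
        # Find the event with the highest timestamp <= the requested one.
        i = bisect.bisect_right(self._timestamps, timestamp)
        if i == 0:
            raise LookupError(f"signal {self.name!r} not set at {timestamp}")
        return self._values[i - 1]


state = Signal("ttl0.state")
state.push(100, "on")
state.push(300, "on")
state.push(300, "off")  # overwrites the event at 300 MU
```

With this data, `state.pull(200)` returns the value pushed at 100 MU, and pulling before 100 MU raises because the signal has not been set yet.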
The last component that must be simulated is the input
buffers. Values in these buffers originate from events with
return values, such as sampling the value of a digital input
device. For software testing, return values from input devices
must be configurable by a test case. For that purpose, we
introduce input signals that describe the state of a hypothetical
device that generates the input signal observed by a device.
Just as output signals, input signals are obtained by the device
drivers during initialization, for example, an input probability
signal for a digital input device. When the simulation driver
is called to sample the input value, the driver pulls the current
value of the input probability signal and uses it to generate
a return value. The return value is stored in the input buffer
that is part of the simulation driver. Once the actual sampled
value is requested from the driver, the value is taken from the
buffer and returned. Each input device has input signals that
match the level of its functionality, such as input voltage for an
analog-to-digital converter (ADC) and input frequency for a
digital edge counter. During software testing, input signals can
be configured using the same push/pull infrastructure used for
output signals. This allows input signals to be adjusted using
the same event timeline as output signals.
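A digital input driven by an input probability signal might look like the sketch below. The class and method names (`SimulatedDigitalInput`, `sample`, `sample_get`) are hypothetical, and the linear search in `_pull_probability` is a simplification of the signal lookup described earlier.

```python
import random
from collections import deque
from typing import Deque, Dict


class SimulatedDigitalInput:
    """Toy simulation driver for a digital input (hypothetical API)."""

    def __init__(self, seed: int = 0) -> None:
        self._probability: Dict[int, float] = {}  # input signal events
        self._buffer: Deque[int] = deque()        # simulated input buffer
        self._rng = random.Random(seed)

    def push_probability(self, timestamp_mu: int, p: float) -> None:
        """Called by a test case to configure the input signal."""
        self._probability[timestamp_mu] = p

    def _pull_probability(self, timestamp_mu: int) -> float:
        earlier = [t for t in self._probability if t <= timestamp_mu]
        if not earlier:
            raise LookupError("input probability signal not set")
        return self._probability[max(earlier)]

    def sample(self, now_mu: int) -> None:
        """Sample the input: generate a return value and buffer it."""
        p = self._pull_probability(now_mu)
        self._buffer.append(1 if self._rng.random() < p else 0)

    def sample_get(self) -> int:
        """Return the oldest buffered sample."""
        return self._buffer.popleft()


dev = SimulatedDigitalInput(seed=1)
dev.push_probability(0, 0.0)    # input is always low from 0 MU
dev.push_probability(500, 1.0)  # input is always high from 500 MU
dev.sample(100)
dev.sample(600)
first, second = dev.sample_get(), dev.sample_get()
```

Probabilities of exactly 0.0 and 1.0 make the outcome deterministic, which is convenient for test cases that need a fixed return value.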
C. Implementation
We have implemented a simulation platform for ARTIQ
based on the proposed methodologies for time and signal
simulation. The simulator is part of our open-source library Duke
ARTIQ extensions (DAX) [35] which integrates tightly with
the ARTIQ open-source software environment. The integration
entry point for the DAX simulator is the device database
(DDB), a central file in every ARTIQ project that defines
the list of available real-time devices and their corresponding
drivers. To enable simulation, users make a small modification
that allows the DAX simulation infrastructure to mutate the
DDB before ARTIQ reads it at the start of an experiment.
During DDB mutation, all device drivers are replaced by
matching simulation drivers, and an extra simulation
configuration device is inserted into the DDB. When the driver for
the core device is loaded in an experiment, the core device
simulation driver will be loaded, which in turn loads the driver
for the simulation configuration device. The DAX simulation
infrastructure is loaded during initialization of the simulation
configuration device, which includes the setup of a time and a
signal manager. Any other simulation drivers that are loaded
will request their signal objects from the signal manager.
When the experiment runs and a kernel function is called,
the core device driver is requested to compile the kernel and
execute it on the core device. Instead, the simulation driver
for the core device will just run the kernel function inside a
sequential time context using the current Python process. Any
interactions with the timeline cursor or time context APIs are
forwarded to the time manager for simulation while simulation
drivers will perform all the signal simulations. Events for
each signal are stored in a sorted dictionary based on their
timestamps, and binary search algorithms are used to push
and pull events.
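The DDB mutation step can be pictured as a pure function over a device database dictionary. The sketch below assumes the common ARTIQ `device_db` entry layout with `type`, `module`, `class`, and `arguments` keys and string-valued aliases. The simulation package name `example_sim` and the configuration device entry are hypothetical placeholders and do not reflect the actual DAX module layout.

```python
import copy
from typing import Any, Dict

REAL_PREFIX = "artiq.coredevice."
SIM_PREFIX = "example_sim.coredevice."  # hypothetical simulation package


def mutate_ddb(ddb: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the DDB with drivers swapped for simulation drivers."""
    mutated = copy.deepcopy(ddb)
    for entry in mutated.values():
        # Aliases are plain strings; only dict entries describe drivers.
        if isinstance(entry, dict) and entry.get("module", "").startswith(REAL_PREFIX):
            entry["module"] = SIM_PREFIX + entry["module"][len(REAL_PREFIX):]
    # Extra simulation configuration device (hypothetical entry).
    mutated["sim_config"] = {
        "type": "local",
        "module": "example_sim.config",
        "class": "SimConfig",
        "arguments": {},
    }
    return mutated


device_db = {
    "core": {"type": "local", "module": "artiq.coredevice.core",
             "class": "Core", "arguments": {"host": "192.0.2.1", "ref_period": 1e-9}},
    "ttl0": {"type": "local", "module": "artiq.coredevice.ttl",
             "class": "TTLOut", "arguments": {"channel": 0}},
    "led": "ttl0",  # alias
}
sim_db = mutate_ddb(device_db)
```

Working on a deep copy keeps the original database intact, so the same file can serve both hardware runs and simulation.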
We integrated our simulation platform with the standard
Python unit test framework such that users can run tests for
real-time control software using existing testing environments.
The DAX unit test base class, which inherits the standard
Python unit test class, provides functions to push, pull, and test
signal values at any timeline cursor position. Existing tools for
step debugging, automated testing, and continuous integration
will allow real-time control software to be tested to the same
level as any other production-level software project.
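To give a feel for such a test, the sketch below builds a tiny base class on top of `unittest.TestCase` with push, pull, and expect helpers. It is a self-contained illustration and not the DAX test class; the names `SignalTestCase`, `push`, `pull`, and `expect` are hypothetical, and the signal store is reduced to a dictionary.

```python
import unittest
from typing import Any, Dict


class SignalTestCase(unittest.TestCase):
    """Hypothetical base class with helpers to push, pull, and test signals."""

    def setUp(self) -> None:
        self.now_mu = 0  # simulated timeline cursor
        self.signals: Dict[str, Dict[int, Any]] = {}

    def push(self, name: str, value: Any) -> None:
        self.signals.setdefault(name, {})[self.now_mu] = value

    def pull(self, name: str) -> Any:
        events = self.signals.get(name, {})
        earlier = [t for t in events if t <= self.now_mu]
        return events[max(earlier)] if earlier else None

    def expect(self, name: str, value: Any) -> None:
        self.assertEqual(self.pull(name), value)


class PulseTest(SignalTestCase):
    def test_pulse(self) -> None:
        # Stand-in for a simulated kernel that pulses a digital output.
        self.push("ttl0.state", 1)
        self.now_mu += 1000
        self.push("ttl0.state", 0)
        # Inspect the signal at different cursor positions.
        self.now_mu = 500
        self.expect("ttl0.state", 1)
        self.now_mu = 1000
        self.expect("ttl0.state", 0)
```

Because the class is an ordinary `unittest.TestCase`, it runs under any standard test runner, for example `python -m unittest`.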
D. Limitations
Functional simulation of kernels at the API level is fast
and especially useful for testing and verification of real-time
control software, but it also has limitations. Without simulation
of the RTIO counter and the RTIO engine, slack cannot be
reliably simulated. As a result, API simulation cannot
accurately predict underflow exceptions. A low-level and
cycle-accurate microarchitectural simulation would be required to
simulate slack. Such simulators are much slower and are not
convenient for software testing and verification at the level
discussed in this paper.
Some limitations are specific to our implementation of the
simulation infrastructure. We use the running Python process
to execute kernels, but the ARTIQ DSL only supports a
subset of the Python language. Hence, the simulation is more
permissive than the ARTIQ compiler. We can mitigate this
issue by compiling kernels before simulation. By default, the
DAX simulator skips this compilation step so that simulations
run faster.
Host-kernel attribute synchronization also behaves differently
in simulation. When running on a core device, the
ARTIQ environment synchronizes host variables modified in a
kernel when the kernel finishes executing (see Section III-B).
During simulation, attributes are continuously synchronized
due to the use of a single Python process for host and kernel
code. The behavior of the simulator could be different when
a kernel modifies the same variable used by an RPC function
it calls. Such code would have confusing semantics to start
with, and we have not encountered any such code.
The model of the parallel timing context described in
Section IV-A differs slightly from the timing model implemented
in the ARTIQ compiler. The DAX simulator propagates the
parallel semantics until a sequential context is entered (deep
parallel) while the ARTIQ compiler only propagates the
parallel semantics to top-level statements in the context (shallow
parallel). Kernel code that potentially behaves differently with
deep and shallow parallel semantics can be detected using
abstract syntax tree (AST) analysis. We have developed a
separate tool [36] that flags such kernel code.
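A rough idea of such an AST check is sketched below using the standard `ast` module. It is a simplified heuristic written for illustration and is not the tool from [36]: it flags loops, conditionals, and try blocks that appear directly inside a `with parallel:` block, since statements nested in them are where deep and shallow semantics can diverge. The context manager is assumed to be referenced by the bare name `parallel`.

```python
import ast
from typing import List

# Compound statements whose bodies are "nested" relative to the parallel block.
COMPOUND = (ast.For, ast.While, ast.If, ast.Try)


def _is_parallel(node: ast.With) -> bool:
    """True if the with-statement uses a context manager named `parallel`."""
    return any(
        isinstance(item.context_expr, ast.Name) and item.context_expr.id == "parallel"
        for item in node.items
    )


def find_ambiguous_parallel(source: str) -> List[int]:
    """Return line numbers of compound statements directly in a parallel block."""
    flagged = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.With) and _is_parallel(node):
            flagged.extend(
                stmt.lineno for stmt in node.body if isinstance(stmt, COMPOUND)
            )
    return sorted(flagged)


SOURCE = """
def kernel_fn(self):
    with parallel:
        self.ttl0.pulse_mu(100)
        for i in range(2):
            delay_mu(10)
    with parallel:
        with sequential:
            delay_mu(10)
        delay_mu(20)
"""
flagged_lines = find_ambiguous_parallel(SOURCE)
```

The second parallel block is not flagged because its nested statements sit inside an explicit sequential context, where both semantics agree.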
V. EVALUATION
To evaluate the performance of the DAX simulation platform,
we measured its kernel execution time and compared
it to the execution time on hardware.
Label        Experiment
mw_freq      Microwave frequency scan
mw_rabi      Microwave Rabi frequency scan
mw_ramsey    Microwave Ramsey scan
mw_gate      Microwave repeated gate scan
gco_freq     Global co-propagating frequency scan
gco_rabi     Global co-propagating Rabi frequency scan
gco_ramsey   Global co-propagating Ramsey scan
ico_freq     Individual co-propagating frequency scan
ico_time     Individual co-propagating time scan
state_init   Qubit state initialization scan
tickle       Tickle scan
direct_rb    Direct randomized benchmarking
gst          Gate set tomography
sqst         Single-qubit state tomography
TABLE I
LIST OF EXPERIMENTS USED FOR THE EVALUATION.
We used two experimental trapped-ion quantum processors for
our evaluation, the
software-tailored architecture for quantum co-design (STAQ)
system [8] and the red chamber (RC) system [37]. Both
systems are controlled by an ARTIQ control system, but
STAQ uses a core device based on the Kasli 2.0 controller
[15] while RC uses a KC705-based controller [38]. Besides
the different real-time control systems and devices, the main
difference between these two setups is that STAQ is at
cryogenic temperatures while RC is at room temperature. We
chose 14 commonly used experiments with a single kernel for
the STAQ system. The set of experiments, listed in Table I,
contains 11 scanning-type experiments used for calibration and
three benchmarking experiments, namely direct randomized
benchmarking (RB) [39]–[41], gate set tomography (GST)
[42], and single-qubit state tomography (SQST) [43]. Both
systems use modular real-time control software developed with
the DAX modular software framework [44], and parts of the
system-specific control software are available in the DAX-zoo
repository [45]. The three benchmark experiments are portable
and can also run on RC while the four microwave (MW)
calibration experiments have an equivalent implementation
for the RC system. All scanning-type experiments scan over
20 points and take 100 samples per point. Direct RB is
performed with circuit lengths starting at 1 and scaling up
exponentially to 16. For each circuit length, we benchmark
ten different circuits with 100 samples for each circuit. The
GST benchmarks are performed with a total of 523 different
circuits based on our germs, taking 100 samples per circuit.
Finally, SQST is performed with a grid of 5 times 10 angles
taking 100 samples for each point.
For our evaluation, we run the experiments for both
systems on a Kasli 2.0 controller. The RC software can run
on an appropriately configured Kasli controller by replacing
the DDB. All calibration experiments are executed with and
without buffering. Buffering allows the real-time control
software to schedule the operations for the next samples while
the incoming data of earlier samples are kept temporarily in
hardware buffers. ARTIQ supports such hardware buffers,
but the real-time software must be designed appropriately to
utilize them. Buffering can further increase the throughput
and performance of kernels by reducing stalling time at the
cost of increased latency between receiving and processing
input events. None of the experiments are sensitive to the
increased latency and will benefit from increased throughput.
We configure a buffer size of 16 samples, which should be
large enough to get the maximum performance gain achievable
with buffering. The Direct RB and GST experiments are
always buffered with a fixed buffer size of 1 and SQST is
always unbuffered. The kernel execution time is measured with
nanosecond precision using the real-time clock available in
the Kasli controller. We then run the same experiments using
our DAX simulation platform on a computer equipped with
an AMD Ryzen 7 3700X CPU and 32 GB of memory. The
computer runs Ubuntu 20.04 LTS, and the execution time
of the kernel simulation is measured in nanoseconds using
the standard Python time library. All experiments run five
times on hardware and five times in simulation, and we take
the average execution time of each. Our measurements are
performed using ARTIQ version 6.7659.c6a7b8a8 and the results
are presented in Figure 6.
Fig. 6. Kernel execution time speedup for our simulator relative to the
execution time on a core device. [Plot: speedup (y-axis, 0 to 30) per
experiment (x-axis: mw_freq, mw_rabi, mw_ramsey, mw_gate, gco_freq,
gco_rabi, gco_ramsey, ico_freq, ico_time, state_init, tickle, direct_rb,
gst, sqst).]
The results in Figure 6 show that simulation speeds up execution by up
to 26.8 times with an average speedup of 6.9 times. The mw_ramsey,
gco_ramsey, and tickle experiments in particular achieve large
speedups. The exceptional speedup for these experiments is caused by
the long delays that are part of the experiment. The core device waits
for these delays before the kernel finishes execution, while the
simulator only simulates the passing of time but does not wait for it.
The experiments that show the least speedup are the direct_rb and gst
experiments. For STAQ, both experiments only yield a 1.3 times speedup,
while for RC, the direct_rb experiment has no speedup and the gst
experiment is slower with a speedup of 0.8 times. The limited speedup
of these two experiments is caused by short delays and a high number of
operations, which results in a high event density. As a result, the
simulator must process many events while the experiment has a
relatively short execution time on hardware. In general, we could state
that the execution time on hardware $t_{\text{hardware}}$ is mostly
limited by the length of delays inserted during the experiment. These
delays sum up to the total length of the timeline and therefore the
duration of the experiment when running on hardware. The execution time
of the simulator $t_{\text{sim}}$ is not much affected by delays and
instead is mostly limited by the total number of events present in the
experiment. Speedup is defined as
$S = t_{\text{hardware}} / t_{\text{sim}}$. Roughly speaking, the
speedup is therefore proportional to the total duration of an
experiment and inversely proportional to its total number of events.
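
The proportionality argument above can be made concrete with a first-order model. The sketch below assumes that the hardware time equals the timeline length and that the simulator spends a constant time per event; the function name, the per-event cost, and all numbers are hypothetical and chosen only to illustrate the two regimes.

```python
def estimated_speedup(timeline_length_s: float, num_events: int,
                      sim_cost_per_event_s: float) -> float:
    """First-order estimate of S = t_hardware / t_sim.

    Assumes t_hardware is dominated by the timeline length and t_sim by a
    constant cost per event; both are simplifications.
    """
    if num_events <= 0 or sim_cost_per_event_s <= 0:
        raise ValueError("num_events and sim_cost_per_event_s must be positive")
    t_hardware = timeline_length_s
    t_sim = num_events * sim_cost_per_event_s
    return t_hardware / t_sim


# Hypothetical numbers, not measurements from the evaluation.
COST_PER_EVENT_S = 10e-6  # assumed simulator cost per event

# Long delays and few events: the simulator skips the waiting time.
long_delays = estimated_speedup(2.0, 10_000, COST_PER_EVENT_S)

# Short delays and many events: event processing dominates.
dense_events = estimated_speedup(0.5, 60_000, COST_PER_EVENT_S)

print(f"long delays: {long_delays:.1f}x, dense events: {dense_events:.1f}x")
```

In this model a speedup below 1 appears as soon as the event count times the per-event cost exceeds the timeline length, which matches the qualitative behavior described for experiments with a high event density.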
We can see from Figure 6 that the experiments running on the RC system
always yield lower speedup compared to the same experiment running on
STAQ. The different results are caused by differences in the control
for the cooling and pumping procedures. Both procedures are executed by
all experiments at the start of each sample. STAQ uses three digital
outputs and one DDS while RC has additional features and uses five
digital outputs and a DDS. As a result, RC inserts more events for each
cooling and pumping procedure. Additionally, STAQ uses a constant DDS
frequency for both procedures while RC uses a different frequency for
each procedure, which adds two additional DDS configuration events for
each sample. Hence, the total number of events for RC experiments is
higher than for STAQ, which reduces the speedup. The additional DDS
operations also insert extra delays into the experiment, but these
delays do not compensate for the increased number of events. Figure 6
also shows that buffered experiments tend to have slightly less speedup
compared to their unbuffered counterparts. Buffering can reduce the
execution time overhead of experiments, resulting in faster execution
on hardware. The total number of events per experiment is not affected
by buffering. The result is a reduced speedup for experiments with
buffering. The reduction in execution time by buffering is limited,
though, due to the highly optimized control software.
In addition to speedup, we have also measured the timing accuracy of
the simulated timeline cursor compared to execution on the core device.
High timing accuracy is not a specific requirement for correct
functional simulation, but a simulator with high timing accuracy could
be used for estimating the timing of experiments. The timeline cursor
simulation is accurate, but variable delays and inaccurate delays in
simulated device drivers can still introduce errors. Variable delays
mainly occur when the timeline cursor is synchronized with the RTIO
counter. Such synchronization is performed at least once at the start
of the experiment (see Section III-A) but can also occur at other
moments. We simulate the synchronization of the timeline cursor using a
timeline horizon and insert an additional delay of
$125 \times 10^{3}$ MU. We would like to emphasize that the presence of
a variable delay indicates that the relative timing between the events
before and after the delay is not relevant, and any variation will not
negatively impact the functionality of the experiment or the
simulation. Hence, simulating timeline cursor synchronization with a
timeline horizon is sufficient for correct functional simulation. A
variable delay can also occur when an experiment needs to wait for an
input event that occurs at an unpredictable time, though none of the
experiments in Table I contain such constructions. Inaccurate delays in
simulated device drivers are often caused by a simplified timing model
of the device driver. In practically all cases with inaccuracy, the
simulated driver inserts less delay than the actual driver.

Fig. 7. The error of the simulated timeline cursor relative to the
timeline cursor of the core device. The filled markers represent the
regular configuration while empty markers represent the optimistic
configuration. (Plot: relative error for the experiments mw_freq,
mw_rabi, mw_ramsey, mw_gate, gco_freq, gco_rabi, gco_ramsey, ico_freq,
ico_time, state_init, tickle, direct_rb, gst, and sqst.)
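
The synchronization of the timeline cursor with a timeline horizon, as described above, could look roughly like the sketch below. This is not the DAX implementation: the class and method names are hypothetical, and it assumes that the horizon is the latest timestamp the cursor has reached so far.

```python
class SimulatedTimeline:
    """Toy timeline cursor that synchronizes against a timeline horizon."""

    def __init__(self, sync_delay_mu: int = 125_000) -> None:
        self.cursor_mu = 0    # current position of the timeline cursor
        self.horizon_mu = 0   # latest timestamp reached so far (assumed definition)
        self.sync_delay_mu = sync_delay_mu  # fixed delay added on synchronization

    def _update_horizon(self) -> None:
        self.horizon_mu = max(self.horizon_mu, self.cursor_mu)

    def delay_mu(self, duration_mu: int) -> None:
        """Move the cursor; negative durations move it back in time."""
        self.cursor_mu += duration_mu
        self._update_horizon()

    def synchronize(self) -> None:
        """Stand-in for synchronizing the cursor with the RTIO counter."""
        self.cursor_mu = self.horizon_mu + self.sync_delay_mu
        self._update_horizon()


def timeline_length_mu(timeline: SimulatedTimeline, samples: int = 3) -> int:
    """Hypothetical experiment: one synchronization per sample."""
    timeline.synchronize()
    start_mu = timeline.cursor_mu  # cursor after the first synchronization
    for _ in range(samples):
        timeline.delay_mu(10_000)  # operations of one sample
        timeline.synchronize()     # variable delay on hardware, fixed here
    return timeline.cursor_mu - start_mu


with_delay = timeline_length_mu(SimulatedTimeline(sync_delay_mu=125_000))
without_delay = timeline_length_mu(SimulatedTimeline(sync_delay_mu=0))
print(with_delay, without_delay)  # prints "405000 30000"
```

Because the simulator never waits for wall-clock time, the only effect of a synchronization is the jump of the cursor to the horizon plus the fixed delay, which is why the chosen delay directly determines the simulated timeline length.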
To measure the timing accuracy of the simulated timeline cursor, we
store the value of the timeline cursor after the first synchronization
with the RTIO counter and at the end of the experiment. The difference
between the two values represents the total length of the event
timeline in MU. We run the simulations with two configurations: regular
and optimistic. When the timeline cursor is synchronized with the RTIO
counter, our simulator inserts a fixed delay of $125 \times 10^{3}$ MU
and 0 MU for the regular and optimistic configuration, respectively. We
measured the event timeline length on the core device and with the two
simulation configurations for all experiments listed in Table I using
the STAQ and RC system. For each combination of system, experiment, and
configuration, we calculate the relative error of the simulation, which
is defined as $(t_{\text{sim}} - t_{\text{exe}}) / t_{\text{exe}}$,
where $t_{\text{exe}}$ and $t_{\text{sim}}$ are the measured event
timeline lengths on the core device and during simulation,
respectively. The results are shown in Figure 7 and are also listed in
Tables II and III.
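
The relative error defined above can be written as a small helper. The timeline lengths below are hypothetical values in MU, not measurements from the evaluation, and the function name is chosen for illustration only.

```python
def relative_error(t_sim_mu: int, t_exe_mu: int) -> float:
    """Relative error (t_sim - t_exe) / t_exe of a simulated timeline length."""
    if t_exe_mu <= 0:
        raise ValueError("t_exe_mu must be positive")
    return (t_sim_mu - t_exe_mu) / t_exe_mu


# Hypothetical event timeline lengths in MU.
t_exe = 4_000_000         # measured on the core device
t_regular = 3_900_000     # simulated with the regular configuration
t_optimistic = 3_640_000  # simulated with the optimistic configuration

for name, t_sim in (("regular", t_regular), ("optimistic", t_optimistic)):
    # A negative value means the simulation underestimates the timeline length.
    print(f"{name}: {relative_error(t_sim, t_exe):+.1%}")
```

With this sign convention, a simulation that inserts less delay than the hardware yields a negative relative error.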
The results in Figure 7 show the error of the simulated timeline cursor
relative to the timeline cursor of the core device. The regular and
optimistic configurations are represented by the filled and empty
markers, respectively. When comparing the results of the two different
configurations, we see that the optimistic configuration always
estimates a shorter timeline length, which is expected. If we only look
at the results for the optimistic configuration, we see that all have a
relative error lower than or equal to 0.0. The optimistic configuration
represents the lower-bound execution time where variable delays are
always zero. When running on actual hardware, variable delays are not
always zero, and as a result, the optimistic configuration
underestimates the timeline length. We also noticed that all unbuffered
results with the regular configuration have a relative error lower than
or equal to 0.0. When running on hardware without buffers, the system
has negative slack after each sample, and timeline synchronizations
will insert delays larger than $125 \times 10^{3}$ MU. The regular
configuration underestimates the length of the variable delay and
therefore underestimates the total timeline length. Regardless, the
estimation of the regular configuration is better than that of the
optimistic configuration for unbuffered experiments. The opposite is
true for buffered experiments. Buffering reduces the length of variable
delays caused by timeline synchronizations by maintaining slack between
samples. The regular configuration is often too pessimistic for
buffered experiments, and the estimation of the optimistic
configuration is better most of the time.

TABLE II
THE ERROR OF THE SIMULATED TIMELINE CURSOR RELATIVE TO THE
TIMELINE CURSOR OF THE CORE DEVICE FOR STAQ.

Experiment    STAQ                    STAQ (buffered)
              Regular   Optimistic    Regular   Optimistic
mw_freq       -4.9%     -13.2%        7.4%      -2.0%
mw_rabi       -4.6%     -12.4%        6.9%      -1.8%
mw_ramsey     -0.7%     -1.8%         0.9%      -0.2%
mw_gate       -2.9%     -8.0%         4.4%      -1.2%
gco_freq      -4.6%     -12.7%        7.2%      -1.9%
gco_rabi      -0.5%     -1.4%         0.6%      -0.2%
gco_ramsey    -0.7%     -1.7%         0.9%      -0.2%
ico_freq      -4.8%     -12.8%        7.2%      -1.9%
ico_time      -3.9%     -10.3%        5.6%      -1.5%
state_init    -5.0%     -13.7%        7.8%      -2.0%
tickle        -1.2%     -1.3%         -1.2%     -1.3%
direct_rb     -         -             6.2%      -1.3%
gst           -         -             6.5%      -1.7%
sqst          -1.9%     -5.8%         -         -

TABLE III
THE ERROR OF THE SIMULATED TIMELINE CURSOR RELATIVE TO THE
TIMELINE CURSOR OF THE CORE DEVICE FOR RC.

Experiment    RC                      RC (buffered)
              Regular   Optimistic    Regular   Optimistic
mw_freq       -4.2%     -10.8%        5.6%      -1.7%
mw_rabi       -4.0%     -10.2%        5.3%      -1.6%
mw_ramsey     -0.7%     -1.7%         0.8%      -0.3%
mw_gate       -2.8%     -7.1%         3.5%      -1.1%
direct_rb     -         -             1.4%      -3.2%
gst           -         -             2.5%      -2.6%
sqst          -1.6%     -4.9%         -         -
We noticed two other trends in Figure 7 that relate to the total
timeline length of experiments. First, the results of some experiments
have little spread, in particular mw_ramsey, gco_rabi, gco_ramsey, and
tickle. These are all calibration experiments with relatively long
delays and long total timeline lengths. The long timeline length
combined with the limited sources of errors (i.e., a low density of
variable delays and events) results in a small relative error and
therefore a small spread between different configurations. Second, the
results of the RC system tend to be closer to 0.0 than the equivalent
STAQ results. We already mentioned that due to differences in the
cooling and pumping procedures, the RC system inserts more events for
each sample of the experiment. These additional events also insert
extra delays into the experiment. As a result, the total timeline
length of RC experiments is on average 28.1% longer than that of their
STAQ equivalents. Again, the increased timeline length with no
additional sources of errors reduces the relative error.
Overall, the average relative error, taken in absolute value, is 3.6%
for the regular configuration and 4.4% for the optimistic
configuration. Based on our analysis of the regular and optimistic
configurations, we concluded that the timeline lengths of unbuffered
and buffered experiments are better estimated by the regular and
optimistic configurations, respectively. When choosing the optimistic
configuration for buffered experiments and the regular configuration
for unbuffered experiments, the resulting average relative error is
reduced to 2.1%, leading to an average accuracy of 97.9%. We can
conclude that even in the presence of variable delays and simulated
device drivers with simplified timing models, the position of the
timeline cursor is simulated with high accuracy when choosing the
appropriate configuration.
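
The averages quoted above can be reproduced from the values in Tables II and III. The sketch below assumes that the averages are means of the absolute relative errors over all 36 listed results, and that the direct_rb and gst rows belong to the buffered columns and the sqst rows to the unbuffered columns, as the evaluation setup describes.

```python
from statistics import mean

# (regular, optimistic) relative errors in percent, from Tables II and III.
UNBUFFERED = [
    # STAQ: calibration experiments, then sqst
    (-4.9, -13.2), (-4.6, -12.4), (-0.7, -1.8), (-2.9, -8.0), (-4.6, -12.7),
    (-0.5, -1.4), (-0.7, -1.7), (-4.8, -12.8), (-3.9, -10.3), (-5.0, -13.7),
    (-1.2, -1.3), (-1.9, -5.8),
    # RC: calibration experiments, then sqst
    (-4.2, -10.8), (-4.0, -10.2), (-0.7, -1.7), (-2.8, -7.1), (-1.6, -4.9),
]
BUFFERED = [
    # STAQ: calibration experiments, then direct_rb and gst
    (7.4, -2.0), (6.9, -1.8), (0.9, -0.2), (4.4, -1.2), (7.2, -1.9),
    (0.6, -0.2), (0.9, -0.2), (7.2, -1.9), (5.6, -1.5), (7.8, -2.0),
    (-1.2, -1.3), (6.2, -1.3), (6.5, -1.7),
    # RC: calibration experiments, then direct_rb and gst
    (5.6, -1.7), (5.3, -1.6), (0.8, -0.3), (3.5, -1.1), (1.4, -3.2), (2.5, -2.6),
]

all_results = UNBUFFERED + BUFFERED

# Mean absolute error when one configuration is used for every experiment.
regular_avg = mean(abs(regular) for regular, _ in all_results)
optimistic_avg = mean(abs(optimistic) for _, optimistic in all_results)

# Regular for unbuffered experiments, optimistic for buffered experiments.
combined_avg = mean(
    [abs(regular) for regular, _ in UNBUFFERED]
    + [abs(optimistic) for _, optimistic in BUFFERED]
)

print(f"regular: {regular_avg:.1f}%")        # regular: 3.6%
print(f"optimistic: {optimistic_avg:.1f}%")  # optimistic: 4.4%
print(f"combined: {combined_avg:.1f}%")      # combined: 2.1%
print(f"accuracy: {100 - combined_avg:.1f}%")  # accuracy: 97.9%
```

Selecting the configuration per experiment only requires knowing whether the experiment is buffered, which is a property of the control software and is known before the simulation runs.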
VI. CONCLUSION
We have presented a functional simulation platform for real-time
control software that enables software testing and verification. To
simplify testing and verification, timeline manipulations and device
drivers are simulated on the application programming interface (API)
level. Our simulation platform accurately simulates a timeline cursor
using a stack while the event timeline is simulated using signals and
events. Input signals are also simulated on a functional level and use
the same interactive signal and event infrastructure used for output
signals. We implemented a simulator based on the proposed concepts,
which is part of our open-source library Duke ARTIQ extensions (DAX).
Our simulator integrates tightly into the advanced real-time
infrastructure for quantum physics (ARTIQ) environment and is capable
of simulating real-time kernels and host-kernel interactions. We
integrated our simulator with the standard Python unit test frameworks
such that real-time control software can be tested using existing tools
for step debugging, unit testing, and continuous integration. Compared
to kernel execution on the core device, kernel simulation is 6.9 times
faster on average. Even in the presence of variable delays and
simplified timing models for device drivers, the position of the
timeline cursor is simulated with an average accuracy of 97.9% when
choosing the appropriate configuration.
ACKNOWLEDGMENT
This work is funded by EPiQC, an NSF Expeditions in Computing
(1832377), the Office of the Director of National Intelligence -
Intelligence Advanced Research Projects Activity through an Army
Research Office contract (W911NF-16-1-0082) and the NSF STAQ project
(1818914).
REFERENCES
[1] F. Arute, K. Arya, R. Babbush, et al., “Quantum supremacy using a
programmable superconducting processor,” Nature, vol. 574, no. 7779,
pp. 505–510, Oct. 2019, ISSN: 1476-4687. DOI:
10.1038/s41586-019-1666-5. [Online]. Available:
https://doi.org/10.1038/s41586-019-1666-5.
[2] C. Ryan-Anderson, J. G. Bohnet, K. Lee, et al., “Realization of
real-time fault-tolerant quantum error correction,” Phys. Rev. X,
vol. 11, p. 041058, 4 Dec. 2021. DOI: 10.1103/PhysRevX.11.041058.
[Online]. Available:
https://link.aps.org/doi/10.1103/PhysRevX.11.041058.
[3] L. Postler, S. Heußen, I. Pogorelov, et al., Demonstration of
fault-tolerant universal quantum gate operations, 2021. DOI:
10.48550/ARXIV.2111.12654. [Online]. Available:
https://arxiv.org/abs/2111.12654.
[4] Y. Wang, Y. Li, Z.-q. Yin, et al., “16-qubit ibm universal quantum
computer can be fully entangled,” npj Quantum information, vol. 4,
no. 1, pp. 1–6, 2018.
[5] I. Pogorelov, T. Feldker, C. D. Marciniak, et al., “Compact
ion-trap quantum computing demonstrator,” PRX Quantum, vol. 2,
p. 020343, 2 Jun. 2021. DOI: 10.1103/PRXQuantum.2.020343. [Online].
Available: https://link.aps.org/doi/10.1103/PRXQuantum.2.020343.
[6] R. Acharya, I. Aleiner, R. Allen, et al., Suppressing quantum
errors by scaling a surface code logical qubit, 2022. DOI:
10.48550/ARXIV.2207.06431. [Online]. Available:
https://arxiv.org/abs/2207.06431.
[7] G. Pagano, A. Bapat, P. Becker, et al., “A quantum approximate
optimization algorithm in a trapped-ion quantum simulator,” Oct. 2020.
[Online]. Available:
https://tsapps.nist.gov/publication/get_pdf.cfm?pub_id=928237.
[8] J. Kim, T. Chen, J. Whitlow, et al., “Hardware design of a
trapped-ion quantum computer for software-tailored architecture for
quantum co-design (staq) project,” in Quantum 2.0, Optical Society of
America, 2020, QM6A–2.
[9] M. Blok, V. Ramasesh, T. Schuster, et al., “Quantum information
scrambling in a superconducting qutrit processor,” arXiv preprint
arXiv:2003.03307, 2020.
[10] S. Bourdeauducq, R. Jördens, P. Zotov, et al., Artiq 1.0,
version 1.0, May 2016. DOI: 10.5281/zenodo.51303. [Online]. Available:
https://doi.org/10.5281/zenodo.51303.
[11] V. Negnevitsky, “Feedback-stabilised quantum states in a
mixed-species ion system,” Ph.D. dissertation, ETH Zurich, 2018.
[12] P. Maunz, J. Mizrahi, and J. Goldberg, Ioncontrol
v. 1.0, version 00, Jul. 2016. [Online]. Available:
https://www.osti.gov/biblio/1326630.
[13] X. Fu, L. Riesebos, M. A. Rol, et al., “Eqasm: An executable
quantum instruction set architecture,” in 2019 IEEE International
Symposium on High Performance Computer Architecture (HPCA), 2019,
pp. 224–237. DOI: 10.1109/HPCA.2019.00040.
[14] C. A. Ryan, B. R. Johnson, D. Ristè, et al., “Hardware for dynamic
quantum computing,” Review of Scientific Instruments, vol. 88, no. 10,
p. 104703, 2017.
[15] G. Kasprowicz, P. Kulik, M. Gaska, et al., “Artiq and sinara: Open
software and hardware stacks for quantum physics,” in OSA Quantum 2.0
Conference, Optical Society of America, 2020, QTu8B.14. DOI:
10.1364/QUANTUM.2020.QTu8B.14. [Online]. Available:
http://www.osapublishing.org/abstract.cfm?URI=QUANTUM-2020-QTu8B.14.
[16] J. Rowson, “Hardware/software co-simulation,” in 31st Design
Automation Conference, 1994, pp. 439–440. DOI:
10.1109/DAC.1994.204143.
[17] K. Hines and G. Borriello, “Dynamic communication models in
embedded system co-simulation,” in Proceedings of the 34th Annual
Design Automation Conference, ser. DAC ’97, Anaheim, California, USA:
Association for Computing Machinery, 1997, pp. 395–400, ISBN:
0897919203. DOI: 10.1145/266021.266178. [Online]. Available:
https://doi.org/10.1145/266021.266178.
[18] J. Lowe-Power, A. M. Ahmad, A. Akram, et al., The gem5 simulator:
Version 20.0+, 2020. DOI: 10.48550/ARXIV.2007.03152. [Online].
Available: https://arxiv.org/abs/2007.03152.
[19] P. R. Panda, “Systemc: A modeling platform supporting multiple
design abstractions,” in Proceedings of the 14th International
Symposium on Systems Synthesis, ser. ISSS ’01, Montréal, P.Q., Canada:
Association for Computing Machinery, 2001, pp. 75–80, ISBN:
1581134185. DOI: 10.1145/500001.500018. [Online]. Available:
https://doi.org/10.1145/500001.500018.
[20] “Ieee standard for standard systemc language reference manual,”
IEEE Std 1666-2011 (Revision of IEEE Std 1666-2005), pp. 1–638, 2012.
DOI: 10.1109/IEEESTD.2012.6134619.
[21] J. Bachrach, H. Vo, B. Richards, et al., “Chisel: Constructing
hardware in a scala embedded language,” in DAC Design Automation
Conference 2012, 2012, pp. 1212–1221. DOI: 10.1145/2228360.2228584.
[22] C. Helmstetter and V. Joloboff, “Simsoc: A systemc tlm integrated
iss for full system simulation,” in APCCAS 2008 - 2008 IEEE Asia
Pacific Conference on Circuits and Systems, 2008, pp. 1759–1762. DOI:
10.1109/APCCAS.2008.4746381.
[23] C. Erbas, A. D. Pimentel, M. Thompson, et al., “A framework for
system-level modeling and simulation of embedded systems
architectures,” EURASIP Journal on Embedded Systems, vol. 2007, no. 1,
p. 082123, Jul. 2007, ISSN: 1687-3963. DOI: 10.1155/2007/82123.
[Online]. Available: https://doi.org/10.1155/2007/82123.
[24] A. Pimentel, C. Erbas, and S. Polstra, “A systematic approach to
exploring embedded system architectures at multiple abstraction
levels,” IEEE Transactions on Computers, vol. 55, no. 2, pp. 99–112,
2006. DOI: 10.1109/TC.2006.16.
[25] G. Li, Y. Ding, and Y. Xie, “Sanq: A simulation framework for
architecting noisy intermediate-scale quantum computing system,” arXiv
preprint arXiv:1904.11590, 2019.
[26] X. Fu, J. Yu, X. Su, et al., “Quingo: A programming framework for
heterogeneous quantum-classical computing with nisq features,” arXiv
preprint arXiv:2009.01686, 2020.
[27] L. Riesebos, X. Fu, S. Varsamopoulos, et al., “Pauli frames for
quantum computer architectures,” in Proceedings of the 54th Annual
Design Automation Conference 2017, ser. DAC ’17, Austin, TX, USA:
Association for Computing Machinery, 2017, ISBN: 9781450349277. DOI:
10.1145/3061639.3062300. [Online]. Available:
https://doi.org/10.1145/3061639.3062300.
[28] L. Riesebos, X. Fu, A. Moueddenne, et al., “Quantum accelerated
computer architectures,” in 2019 IEEE International Symposium on
Circuits and Systems (ISCAS), 2019, pp. 1–4. DOI:
10.1109/ISCAS.2019.8702488.
[29] K. M. Svore, A. Geller, M. Troyer, et al., “Q#: Enabling scalable
quantum computing and development with a high-level domain-specific
language,” arXiv preprint arXiv:1803.00652, 2018.
[30] T. Nguyen, A. Santana, T. Kharazi, et al., “Extending c++ for
heterogeneous quantum-classical computing,” arXiv preprint
arXiv:2010.03935, 2020.
[31] R. S. Smith, M. J. Curtis, and W. J. Zeng, “A practical quantum
instruction set architecture,” arXiv preprint arXiv:1608.03355, 2016.
[32] F. T. Chong, D. Franklin, and M. Martonosi, “Programming languages
and compiler design for realistic quantum hardware,” Nature, vol. 549,
no. 7671, pp. 180–187, 2017.
[33] J. E. Stone, D. Gohara, and G. Shi, “Opencl: A parallel
programming standard for heterogeneous computing systems,” Computing
in science & engineering, vol. 12, no. 3, p. 66, 2010.
[34] X. Fu, M. A. Rol, C. C. Bultink, et al., “An experimental
microarchitecture for a superconducting quantum processor,” in
Proceedings of the 50th Annual IEEE/ACM International Symposium on
Microarchitecture, ser. MICRO-50 ’17, Cambridge, Massachusetts:
Association for Computing Machinery, 2017, pp. 813–825, ISBN:
9781450349529. DOI: 10.1145/3123939.3123952. [Online]. Available:
https://doi.org/10.1145/3123939.3123952.
[35] L. Riesebos, B. Bondurant, and K. R. Brown, Duke
artiq extensions (dax), 2021. [Online]. Available:
https://gitlab.com/duke-artiq/dax.
[36] L. Riesebos, Flake8 artiq plugin, 2020. [Online]. Available:
https://gitlab.com/duke-artiq/flake8-artiq.
[37] Y. Wang, S. Crain, C. Fang, et al., “High-fidelity two-qubit gates
using a microelectromechanical-system-based beam steering system for
individual qubit addressing,” Phys. Rev. Lett., vol. 125, p. 150505,
15 Oct. 2020. DOI: 10.1103/PhysRevLett.125.150505. [Online]. Available:
https://link.aps.org/doi/10.1103/PhysRevLett.125.150505.
[38] Xilinx kc705. [Online]. Available:
https://www.xilinx.com/products/boards-and-kits/ek-k7-kc705-g.html.
[39] E. Magesan, J. M. Gambetta, and J. Emerson, “Scalable and robust
randomized benchmarking of quantum processes,” Physical review
letters, vol. 106, no. 18, p. 180504, 2011.
[40] T. J. Proctor, A. Carignan-Dugas, K. Rudinger, et al., “Direct
randomized benchmarking for multiqubit devices,” Phys. Rev. Lett.,
vol. 123, p. 030503, 3 Jul. 2019. DOI: 10.1103/PhysRevLett.123.030503.
[Online]. Available:
https://link.aps.org/doi/10.1103/PhysRevLett.123.030503.
[41] J. M. Epstein, A. W. Cross, E. Magesan, et al., “Investigating the
limits of randomized benchmarking protocols,” Physical Review A,
vol. 89, no. 6, p. 062321, 2014.
[42] R. Blume-Kohout, J. K. Gamble, E. Nielsen, et al., Robust,
self-consistent, closed-form tomography of quantum logic gates on a
trapped ion qubit, 2013. DOI: 10.48550/ARXIV.1310.4492. [Online].
Available: https://arxiv.org/abs/1310.4492.
[43] R. Schmied, “Quantum state tomography of a single qubit:
Comparison of methods,” Journal of Modern Optics, vol. 63, no. 18,
pp. 1744–1758, 2016.
[44] L. Riesebos, B. Bondurant, J. Whitlow, et al., “Modular software
for real-time quantum control systems,” in 2022 IEEE International
Conference on Quantum Computing and Engineering (QCE), 2022.
[45] Duke artiq extensions (dax) zoo, 2022. [Online]. Avail-
able: https://gitlab.com/duke-artiq/dax-zoo.