Scaling to Build the Consolidated Audit Trail: A Financial Services Application of Google Cloud Bigtable

Neil Palmer, Michael Sherman, Yingxia Wang, Sebastian Just
(neil.palmer;michael.sherman;yingxia.wang;sebastian.just)@fisglobal.com
FIS

Abstract

Google Cloud Bigtable is a fully managed, high-performance, extremely scalable NoSQL database service offered through the industry-standard, open-source Apache HBase API, powered by Bigtable. The Consolidated Audit Trail (CAT) is a massive, government-mandated database that will track every equities and options market event in the US financial industry over a six-year period. We consider Google Cloud Bigtable in the context of CAT, and perform a series of experiments measuring the speed and scalability of Cloud Bigtable. We find that Google Cloud Bigtable scales well and will be able to meet the requirements of CAT. Additionally, we discuss how Google Cloud Bigtable fits into the larger full technology stack of the CAT platform.

1 Background
1.1 Introduction
Google Cloud Bigtable¹ is the latest formerly-internal Google technology to be opened to the public through Google Cloud Platform [1] [2]. We analyzed the capabilities of Google Cloud Bigtable, specifically to see whether Bigtable would fit into a proof-of-concept system for the Consolidated Audit Trail (CAT) [3]. In this whitepaper, we provide some context, explain the experiments we ran on Bigtable, and share the results of our experiments.
1.2 The Consolidated Audit Trail (CAT)
FIS (which completed its purchase of SunGard on November 30, 2015) is a bidder to build CAT [5], an SEC-mandated data repository that will hold every event that happens in the US equities and options markets. This means every order, every trade, every route of an order between different organizations, every cancellation, and every change to an order — essentially everything that happens from the moment someone enters a trade on a computer screen or communicates an order to a broker on a phone until that order is fulfilled or terminated [6].
There are many requirements that the SEC and the stock exchanges
have identified for CAT [3]. Most crucially, CAT must be able to ingest,
process, and store as many as 100 billion market events every trading
day (about 30 petabytes of data over the next six years [7]), and this data
must be available to regulators for analysis, monitoring, and surveillance [3].
But the step in between is the most challenging aspect of building
CAT: taking a huge amount of raw data from broker-dealers and stock
exchanges and turning it into something useful for market regulators.
The core part of converting this mass of raw market events into something that regulators can leverage is presenting it in a logical, consistent, and analysis-friendly way. Furthermore, a key aspect of making stock market activity analysis-friendly is the notion of linking individual market events together into order lifecycles. An order lifecycle is the flow of a stock market order from when a customer places the order with a broker-dealer until that order is fulfilled (or, in some cases, cancelled or changed) [8]. Figure 1 shows an example order lifecycle.
Order lifecycles can involve thousands of market events, and there are millions of order lifecycles per trading day, all of which have to be assembled by the CAT system. In a graph theory context, order lifecycle assembly can be thought of as finding the edges between billions of vertices across millions of different graphs, where only one configuration is correct. Throughout the remainder of this whitepaper, we refer to the building of order lifecycle graphs as “linkage.”
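For intuition only, the sketch below shows the basic data structure: events that reference parent events are attached to them, yielding one small graph per order lifecycle. The field names and IDs are hypothetical, and in the real problem the parent references are not given up front; they must be inferred across firms from order identifiers, timestamps, and routing data, which is the hard part of linkage.

    import java.util.*;

    // Toy model of order-lifecycle linkage. Each reported event lists the events it
    // derives from; grouping events by those references yields one graph (lifecycle)
    // per root order. All names here are illustrative, not the CAT event schema.
    final class MarketEvent {
        final String eventId;
        final List<String> parentIds; // empty for a brand-new customer order

        MarketEvent(String eventId, List<String> parentIds) {
            this.eventId = eventId;
            this.parentIds = parentIds;
        }
    }

    public class LifecycleLinker {

        /** Builds the lifecycle edges: a map from each event ID to the IDs of its child events. */
        static Map<String, List<String>> link(List<MarketEvent> events) {
            Map<String, List<String>> children = new HashMap<>();
            for (MarketEvent event : events) {
                for (String parent : event.parentIds) {
                    children.computeIfAbsent(parent, k -> new ArrayList<>()).add(event.eventId);
                }
            }
            return children;
        }

        public static void main(String[] args) {
            // The Figure 1 scenario: two customer orders are combined by a broker and routed to an exchange.
            List<MarketEvent> events = Arrays.asList(
                    new MarketEvent("jane-order", Collections.emptyList()),
                    new MarketEvent("ramon-order", Collections.emptyList()),
                    new MarketEvent("broker-combined-order", Arrays.asList("jane-order", "ramon-order")),
                    new MarketEvent("exchange-fill", Collections.singletonList("broker-combined-order")));
            System.out.println(link(events));
        }
    }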
¹ Throughout this whitepaper we use the terms “Google Cloud Bigtable”, “Cloud Bigtable”, and “Bigtable” to refer to Google Cloud Bigtable. If we are referring to Google’s internal Bigtable or the Bigtable technology as described in the original Bigtable paper [4], it is mentioned specifically.
[Figure 1: order lifecycle diagram. Jane sends an order for 1000 shares of Google to a stockbroker; Ramon sends an order for 3000 shares of Google to a stockbroker; the stockbroker combines the two orders into an order for 4000 shares of Google and sends it to a stock exchange; the trade for 4000 shares of Google is made on the stock exchange.]
Figure 1: An example of an order lifecycle. Two stock market orders from Jane and
Ramon are combined by a stockbroker into a new order, which the stockbroker
sends to a stock exchange. The stock exchange then completes the order. For CAT,
the stockbroker would submit three events (the orders from Jane and Ramon, and
the order the stockbroker sends to the stock market) and the stock exchange would
submit two events (one event for the order received from the stockbroker, and
another event for filling the order).
1.3 Google Cloud Bigtable as a Solution to the
CAT Problem
Google Cloud Bigtable is a fully managed, high-performance, extremely
scalable NoSQL database service offered through the industry-standard,
open-source Apache HBase API [1]. Under the hood, this new service
is powered by Bigtable [1]. Cloud Bigtable offers features that meet
the key technological requirements for CAT. These requirements and
features are detailed in Table 1.
2 Evaluating the Scalability of Data
Insertion into Google Cloud Bigtable
To determine if Cloud Bigtable can handle the volumes of data CAT
will process and to obtain a benchmark of performance, we undertook
a series of experiments. All the experiments involved the validation,
parsing, and insertion of large volumes of Financial Information
eXchange (FIX) messages² [9] into Bigtable. Our technology stack and architecture are described in Figure 2.
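A raw FIX message is a flat string of tag=value fields separated by the SOH (0x01) delimiter. The following minimal sketch, with a made-up message and only a token malformed-field check, illustrates the kind of parsing our validation and insertion jobs perform before anything is written to Bigtable; it is not our production parser.

    import java.util.LinkedHashMap;
    import java.util.Map;

    public class FixParserSketch {
        private static final String SOH = "\u0001"; // standard FIX field delimiter

        /** Splits a FIX message into a tag -> value map; rejects obviously malformed fields. */
        static Map<String, String> parse(String message) {
            Map<String, String> fields = new LinkedHashMap<>();
            for (String field : message.split(SOH)) {
                if (field.isEmpty()) continue;
                int eq = field.indexOf('=');
                if (eq <= 0) throw new IllegalArgumentException("Malformed FIX field: " + field);
                fields.put(field.substring(0, eq), field.substring(eq + 1));
            }
            return fields;
        }

        public static void main(String[] args) {
            // Hypothetical new-order-style message (tags abbreviated for illustration).
            String msg = "8=FIX.4.2" + SOH + "35=D" + SOH + "55=GOOG" + SOH + "38=1000" + SOH + "54=1" + SOH;
            Map<String, String> fields = parse(msg);
            System.out.println(fields.get("55") + " quantity " + fields.get("38")); // GOOG quantity 1000
        }
    }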
Table 1: A list of CAT requirements with the applicable Google Cloud Bigtable feature fulfilling the requirement.

CAT Requirement | Google Cloud Bigtable Feature [4]
Validate, process, store, and create linkages for 100 billion market events in 4 hours; identify errors on reported events within 4 hours | Highly scalable; atomicity of rows and distribution of tablets allow many simultaneous table operations
Support heterogeneous types of data (e.g. options vs. regular stocks; completed orders vs. cancelled orders); handle new types of linkages and new kinds of orders not currently supported by the industry | Flexible schema and support for billions of columns; column data is only written to rows that have values for that column, for optimal performance
Identify linkage errors, but be resilient enough to continue with the linkage when some events in the order lifecycle contain certain errors | Column families allow error data to be written to a separate logical location stored within the same table
Maintain a history and status of changes to an event and order lifecycle as error corrections are submitted | Built-in timestamping of all insertions means change history is always kept and available for audit
Begin processing events and creating linkages before all data is submitted | Auto-balancing and compactions ensure optimal speeds as more data arrives, but reasonable costs while data volumes are low
Scale up quickly when demand is above expectations; tolerate volume increases up to 25% annually without losing speed | New tablet servers can be added to a Bigtable cluster in minutes, making Bigtable highly scalable
Run queries during the lifecycle linkage process | Nothing prevents querying while data is being written, and distribution of data across tablets means different computers can serve multiple requests simultaneously
Backups during the linkage process, for quick recovery | Backups are co-written with production data, with a quick, invisible switchover in case of failure
Affordable, with no need to stockpile computing resources used only on “worst-case” days | Cloud computing model means no charges for unused computing resources
² FIX is an electronic communications protocol widely used in financial services. Specifically, for the tests in this whitepaper, FIX messages were text strings of field-data pairs. Many organizations in the financial services industry hope CAT will support FIX messages [12], minimizing compliance overhead.
[Figure 2: technology stack diagram. (1) FIX messages in Google Cloud Storage → (2) Hadoop MapReduce on Google Compute Engine → (3) HBase API → (4) Cloud Bigtable holding CAT records.]
Figure 2: The technology stack for our Bigtable experiments. 1: FIX messages were stored in Google Cloud Storage [10] buckets. Google Cloud Storage runs on Google File System (GFS)³ [11], which is similar to HDFS⁴. 2: Using Google Cloud’s bdutil [14] tool, Hadoop clusters were created using virtual machines running on Google Compute Engine [15]. No Hadoop settings were changed from the default settings provided by bdutil. The Hadoop clusters used Google Cloud Storage buckets as a shared file system instead of HDFS [14]. 3: MapReduce jobs were run on the Hadoop clusters. The MapReduce jobs communicated with Bigtable through the open-source HBase API. No MapReduce settings were changed from the defaults provided by bdutil. 4: Processed data was stored in Bigtable, in a custom format we developed specifically for CAT’s requirements. Note that Bigtable resides in a separate, Bigtable-only cluster, distinct from the cluster running the MapReduce jobs. This is a characteristic of the Cloud Bigtable service.
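To make step 3 of the stack concrete, the sketch below shows the general shape of a map-only task that parses one FIX message per input line and emits an HBase Put, which the HBase TableOutputFormat then writes to Cloud Bigtable. The row-key layout, column family name, and simplified parser are assumptions for illustration, not our production code, and the job setup that binds TableOutputFormat to the target table is omitted.

    import java.io.IOException;
    import java.util.LinkedHashMap;
    import java.util.Map;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Map-only sketch: each input line is one raw FIX message; the parsed fields become
    // one Bigtable row written through the HBase API. Names here are illustrative only.
    public class FixToBigtableMapper
            extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {

        private static final byte[] EVENT_FAMILY = Bytes.toBytes("event");

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            Map<String, String> fields = parse(line.toString());                      // validate + parse
            byte[] rowKey = Bytes.toBytes(fields.get("55") + "#" + fields.get("11")); // e.g. symbol#ClOrdID
            Put put = new Put(rowKey);
            for (Map.Entry<String, String> field : fields.entrySet()) {
                put.addColumn(EVENT_FAMILY, Bytes.toBytes(field.getKey()), Bytes.toBytes(field.getValue()));
            }
            context.write(new ImmutableBytesWritable(rowKey), put);
        }

        /** Minimal tag=value split on the SOH delimiter; production validation is far richer. */
        private static Map<String, String> parse(String message) {
            Map<String, String> fields = new LinkedHashMap<>();
            for (String field : message.split("\u0001")) {
                int eq = field.indexOf('=');
                if (eq > 0) fields.put(field.substring(0, eq), field.substring(eq + 1));
            }
            return fields;
        }
    }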
2.1 Experiment 1: Performance of Different Virtual Machine Configurations

This first experiment compared the performance of different virtual machine (VM) configurations available through Google Compute Engine (GCE).

Four different groups of VM configurations are available on GCE (standard, highcpu, shared core, and highmem). Initial tests showed that the highcpu and shared-core VM groups did not have enough memory to handle our MapReduce job, and the highmem VM group performed only slightly better than the standard VM group despite costing more to run. Therefore, we chose to formally test only the standard VM group. There are five different VM configurations available in GCE’s standard VM group, outlined in Table 2 [16].
Table 2: Different virtual machine configurations (virtual CPUs and memory) available in Google Compute Engine’s standard virtual machine group [16].

Virtual Machine Configuration | Virtual CPUs | Memory (GB)
n1-standard-1 | 1 | 3.75
n1-standard-2 | 2 | 7.50
n1-standard-4 | 4 | 15
n1-standard-8 | 8 | 30
n1-standard-16 | 16 | 60
The costs per core are the same across the standard VM group, so we wanted the VM configuration that gave us the best performance per core. Our experimental procedure:

1. Four different clusters were created using bdutil, each having a total of 80 cores and a variable number of VMs. Each cluster was composed entirely of one configuration of standard VM, except n1-standard-1, which was not powerful enough to run our MapReduce job.
2. Each cluster ran the same MapReduce job: validate 100 million FIX messages, parse the FIX messages into a CAT-compliant schema, then insert the parsed messages into Cloud Bigtable.
3. The time to complete the MapReduce job was recorded for each cluster, and this time was transformed into a throughput measurement of FIX messages validated, parsed, and inserted per second per core (see the formula below).

Results are in Figure 3.
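For reference, the per-core throughput plotted in Figure 3 is simply the job size divided by the product of job duration and total core count. The worked numbers below are hypothetical, chosen only to show the arithmetic, and are not measurements from our runs.

\[
\text{per-core throughput} = \frac{N_{\mathrm{messages}}}{t_{\mathrm{job}} \times n_{\mathrm{cores}}},
\qquad \text{e.g. } \frac{100{,}000{,}000}{2{,}000\ \mathrm{s} \times 80\ \mathrm{cores}} = 625\ \text{messages/s/core}.
\]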
n1-standard-8 gave the best results. n1-standard-4 also performed well, but we opted to run further experiments with n1-standard-8 because clusters with fewer VMs are easier to manage.

The difference in performance is likely due to how many JVMs are created, which depends on the CPU and RAM configuration of the worker node. We observed CPU underutilization for the n1-standard-2 (2 core) and n1-standard-4 (4 core) VM configurations, and CPU overutilization for the n1-standard-16 (16 core) VM configuration. By tuning the Hadoop and MapReduce settings and other system parameters, we could potentially get better relative performance from the 2-, 4-, and 16-core VM configurations, but to stay consistent with the idea of using managed services we did not tune anything.

Lastly, relative VM performance can change as cluster sizes change, but this test provided a starting baseline for the remaining experiments. Research into the behavior of different VM configurations with different cluster sizes, and on the more advanced linkage engine, is ongoing.
³ File storage at Google also involves a technology called Colossus [13], but limited public information about Colossus is available.
⁴ HDFS is the Hadoop Distributed File System, which was inspired by the original GFS whitepaper [17].
Figure 3: This figure shows the number of FIX messages validated, parsed, and inserted into Cloud Bigtable per second per core across four different Hadoop clusters, each composed entirely of workers of a single VM configuration. Each cluster consisted of the same number of virtual CPU cores (80), but a different number of worker nodes. The results were standardized to allow for a per-core comparison, since the cost per core is consistent across all VM configurations in the figure. n1-standard-4 and n1-standard-8 performed similarly, but since fewer VMs are easier to manage we opted to rely on n1-standard-8 as the VM configuration for all other experiments.
2.2 Experiment 2: Relative Performance of Google
Cloud Bigtable Insertion
We then measured the speed of Bigtable insertions relative to baselines.
Three MapReduce jobs were run, each on a single Google Compute
Engine n1-standard-8 (8 core) VM instance. The input to all three jobs
was 1 million FIX messages, but the outputs were different:
MapReduce job “GFS” processed and parsed each FIX message, then appended the output to a text file in a Google Cloud Storage bucket.

MapReduce job “Bigtable” processed and parsed each FIX message, then inserted the parsed output as a row into Bigtable.

MapReduce job “No Write” processed and parsed each FIX message, then did nothing — no write to anywhere.

The relative performance of the three MapReduce jobs can be seen in Figure 4. The “Bigtable” job ran about half as fast as the “No Write” job (47.5% as fast), and about two-thirds as fast as the “GFS” job (66.7% as fast).
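Because the three jobs differ only in their output step, these ratios can be turned into a rough per-message cost breakdown. Assuming the parse path is truly identical across the jobs, the fraction of the “Bigtable” job’s per-message time spent on the Bigtable write itself is

\[
\frac{t_{\mathrm{write}}}{t_{\mathrm{Bigtable\ job}}} = 1 - \frac{r_{\mathrm{Bigtable}}}{r_{\mathrm{No\ Write}}} = 1 - 0.475 \approx 0.53,
\]

i.e., roughly half of the wall-clock time per message in the “Bigtable” job goes to the insertion rather than to validation and parsing.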
Figure 4: This figure shows the number of FIX messages validated, parsed, and then written (or not) per second across three MapReduce jobs with different output destinations (but otherwise identical). All jobs were run on a cluster consisting of only a single n1-standard-8 (8 core) worker. “Write to Bigtable” inserted all parsed FIX messages into Bigtable, “Write to GFS” wrote all parsed FIX messages to a text file in a Google Cloud Storage bucket, and “No Write” parsed FIX messages but then did not write anything.
2.3 Experiment 3: Performance of Google Cloud
Bigtable with Increasing Job Size
The next experiment evaluated Bigtable’s performance with increasing MapReduce job size. This experiment primarily functioned as a basic check of Bigtable insertion, as embarrassingly parallel jobs should not require more or less time per input unit as the number of input units increases⁵ [18].

Clusters of 10 n1-standard-8 (8 core) VMs were used. Each cluster validated, parsed, and inserted a different number of FIX messages into Bigtable. The amount of time to finish each job was tracked. Figures 5a and 5b show the results of this experiment.

Unsurprisingly, as seen in Figure 5b, the cluster throughput was roughly constant regardless of the amount of data processed [18].
2.4 Experiment 4: Performance of Google Cloud
Bigtable with Increasing Cluster Size
As a nal evaluation of small-scale
6
performance, we analyzed
the performance of Bigtable as the number of worker nodes in a
cluster increased.
⁵ It is worth noting that MapReduce jobs have a fixed amount of overhead that does not change based on job size, and we do not explicitly take this fixed overhead into account. Our MapReduce jobs were all long enough (a half hour) that the impact of fixed overhead is negligible, and less than the time variation we would often see between different runs of the same job. Variation in times between identical jobs is expected with large parallel systems.
⁶ “Small” being a relative term: many large broker-dealers could process a day’s worth of FIX messages in a reasonable amount of time on the cluster sizes we tested in this section.
Figures 5a and 5b: These figures show the effect on job duration and cluster throughput as the size of the MapReduce job is increased. All data is from clusters of 10 n1-standard-8 (8 core) virtual machines. Figure 5a shows the amount of time required to validate, parse, and insert into Bigtable an increasing number of FIX messages. Figure 5b is a transformation of the same data in Figure 5a, created by dividing the total number of FIX messages in a job by the number of seconds the job took to complete; it shows the whole-cluster throughput of the different jobs. The near-linearity of the lines connecting the data points in both figures shows linear scaling with linearly increasing job size.
Figures 6a and 6b: These figures show the effect of increasing cluster size on MapReduce job duration and worker throughput. All data is from clusters of n1-standard-8 (8 core) VMs. Figure 6a shows the amount of time required to validate, parse, and insert into Bigtable 300 million FIX messages on increasingly large clusters. Figure 6b is a transformation of the same data in Figure 6a, created by dividing 300 million by the job duration in minutes and by the number of worker nodes in the cluster; it shows the average throughput of each worker in each cluster. The near-linearity of the line connecting the data points in Figure 6b shows near-linear scaling with linearly increasing cluster size.
Hadoop clusters of n1-standard-8 (8 core) VMs were created in sizes
from 5 to 20 worker nodes. Each cluster ran the same MapReduce job:
the validation, parsing, and insertion into Bigtable of 300 million FIX
messages. The amount of time to finish each job was tracked. Figures
6a and 6b show the results of this experiment.
2.5 Experiment 5: Performance of Google Cloud
Bigtable with Extremely Large Clusters
After our initial observations, it qualitatively appeared we had not used large enough clusters to get a full understanding of Bigtable insertion scalability. We were also far below the scale required for CAT. Thus, we undertook another experiment, similar in design to experiment 4.

First, per the instructions of the Cloud Bigtable team, we “primed” Cloud Bigtable by creating a small table with a similar structure to the data we were about to insert. This “priming” allows Cloud Bigtable to balance the table out amongst many tablets prior to large-scale insertion, which increases performance. Then, three Hadoop clusters (in sizes of 100, 200, and 300 worker nodes) of n1-standard-8 (8 core) VMs were created, data was inserted from each cluster, and the throughput of the clusters was measured⁷. This data was combined with the data from experiment 4 and the data from the single-worker “Write to Bigtable” insertion test in experiment 2, and then the universal scalability model was applied to this full dataset [19]. The results are shown in Figure 7, with the predicted scalability model and a 95% confidence band overlaid.

The universal scalability model interprets the scaling as linear and predicts infinite scalability. While truly linear scalability is impossible, at the scales we measured directly, Cloud Bigtable insertion effectively scales linearly.
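For context, the universal scalability model of Gunther [19] fits relative throughput at N workers with a contention term and a coherency term; fitted values of both parameters near zero correspond to the effectively linear scaling described above, consistent with our not observing measurable contention or coherency issues (see Section 3.3).

\[
C(N) = \frac{N}{1 + \sigma\,(N - 1) + \kappa\,N\,(N - 1)}
\]

Here C(N) is throughput relative to a single worker, σ captures contention, and κ captures coherency (crosstalk) costs.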
3 Discussion
3.1 Key Findings
• Bigtable demonstrated the ability to handle data at CAT scale, with peak ingestion rates of 10 billion FIX messages per hour and sustained ingestion of 6.25 billion FIX messages in one hour.
• With clusters of up to 300 worker nodes, Bigtable insertion scaling does not begin to deviate from linear.
• Tuning of virtual machines, Hadoop clusters, or MapReduce jobs is not necessary to get good performance from Cloud Bigtable.
• Given no tuning or optimization, n1-standard-8 (8 core) virtual machines provide the best performance per dollar.
• Bigtable insertion (via MapReduce) job duration scales linearly with increasing job size.
3.2 Performance Metrics
Something not explicitly stated with our experimental results above is exactly how well Bigtable performed, both relative to CAT requirements and more generally. We have collected some key performance metrics in Table 3.
Figure 7: This figure shows the effect of increasing cluster size on cluster throughput and the induced universal scalability model. All data is from clusters of n1-standard-8 (8 core) VMs. Data in this figure is from MapReduce jobs of both varying cluster size and varying job size, which may have caused inconsistencies but was necessary given the large spread of cluster sizes covered. The data points are the throughput measurements, the line is the predicted scalability as obtained from the universal scalability model, and the colored band is the 95% confidence band on both sides of the prediction. The model is linear, and infinite scalability is predicted.
Table 3: Various measurements of Cloud Bigtable performance.

Metric | Approximate Performance
Maximum peak throughput, general | 2.7 gigabytes written per second
Data writable with 60 minutes of peak throughput, general | 10 terabytes
Maximum peak throughput, task-specific | 2.7 million FIX messages processed and inserted per second
Data writable with one hour of peak throughput, task-specific | 10 billion FIX messages processed and inserted
Maximum sustained throughput for one hour, general | 1.7 gigabytes written per second
Data written in an hour of sustained throughput, general | 6 terabytes
Maximum sustained throughput for one hour, task-specific | 1.7 million FIX messages processed and inserted per second
Data written in an hour of sustained throughput, task-specific | 6.25 billion FIX messages processed and inserted
⁷ We expect the instability of the large-cluster results is due to the rebalancing of the tablets as our insertion throughput increased. See the “Scalability of Google Cloud Bigtable” subsection in the Discussion for more on tablet rebalancing.
It is important to note that these metrics were constrained by the
number of worker nodes and the reserved bandwidth we provisioned
within Google Cloud Platform, and not by Bigtable itself. Future
testing will involve many more workers, and will reserve greater internal
bandwidth between Google Compute Engine, Google Cloud Storage,
and Google Bigtable.
3.3 Scalability of Google Cloud Bigtable
Bigtable scaled extremely well — at the volumes we tested, we did not
experience any measurable contention or coherency issues. We believe
that Bigtable can easily handle the scale of CAT: linear scaling was seen
up to a level of performance that can process and insert 1/16 of a
“worst-case” trading day in one hour.
However, we do want to clarify an important point: because Cloud Bigtable is a managed service, we could not measure the scalability of Bigtable directly. Additionally, our data was automatically moved between tablets to increase access speeds, a process described in the Bigtable paper [4]. This likely had some impact on our results, but we could only monitor the resources communicating with Bigtable, meaning we could calculate the scalability of our Hadoop worker cluster but not of the Bigtable cluster itself. One way we mitigated this impact was by “priming” Bigtable with data similar to the data it was about to ingest, which causes some of the rebalancing work to happen prior to the start of the timed jobs. This “priming” would not be an issue in production; it is merely necessary for proper performance tests on newly created tables.
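As an illustration of that priming step (the table name, column family, and row-key shape below are assumptions, and our real priming data mirrored the CAT schema rather than placeholder rows), a priming pass can be as simple as writing a small batch of representative rows through the HBase API before the timed job starts. The Cloud Bigtable connection settings are supplied through the client configuration and are omitted here.

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    // Writes a small sample of rows whose keys span the expected key range, giving the
    // service a chance to split and balance tablets before the large timed insertion.
    public class PrimeTable {
        public static void main(String[] args) throws IOException {
            Configuration conf = HBaseConfiguration.create(); // Cloud Bigtable is reached via the HBase client
            try (Connection connection = ConnectionFactory.createConnection(conf);
                 Table table = connection.getTable(TableName.valueOf("cat_events"))) {
                List<Put> sample = new ArrayList<>();
                for (char firstLetter = 'A'; firstLetter <= 'Z'; firstLetter++) {
                    byte[] rowKey = Bytes.toBytes(firstLetter + "AAA#2015-05-04#sample");
                    Put put = new Put(rowKey);
                    put.addColumn(Bytes.toBytes("event"), Bytes.toBytes("placeholder"), Bytes.toBytes("x"));
                    sample.add(put);
                }
                table.put(sample); // Table#put(List<Put>) sends the whole batch
            }
        }
    }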
3.4 Optimal VM Configurations and Tunings

Our experiment to determine the optimal VM configuration was very basic, and more testing is being done to optimize the worker nodes. For the purposes of testing Google Cloud Platform and the Cloud Bigtable service, we decided to keep as many default settings as possible, in the spirit of managed services. However, defaults are not always optimal, especially with systems as complex as Hadoop, MapReduce, and Bigtable.

For example, we believe the n1-standard-16 (16 core) VM’s drop in per-core throughput relative to the n1-standard-8 (8 core) VM is due to a sub-optimal balance between the number of MapReduce tasks spawned on the node and the amount of free memory on the node. These settings can be adjusted manually.
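For example, the per-task memory settings that govern how many map-task JVMs run concurrently on a worker can be set explicitly instead of relying on bdutil’s defaults. The property names below are standard Hadoop 2/YARN settings (Hadoop 1 uses different names), and the values are placeholders rather than recommendations we validated.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    // Sketch of overriding the per-map-task memory settings that determine how many
    // task JVMs run concurrently on a worker; the values shown are illustrative only.
    public class TunedJobSetup {
        public static Job configure() throws Exception {
            Configuration conf = new Configuration();
            conf.set("mapreduce.map.memory.mb", "3072");      // YARN container size per map task
            conf.set("mapreduce.map.java.opts", "-Xmx2560m"); // map-task JVM heap, kept below the container size
            return Job.getInstance(conf, "fix-to-bigtable");
        }
    }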
Additionally, the pattern of performance seen with single-worker
clusters does not hold exactly at different cluster sizes, and further
formal experiments are ongoing.
3.5 MapReduce vs. Google Dataflow

For these experiments, we used MapReduce with the HBase API to communicate with our Bigtable cluster. Our MapReduce job was relatively naive, and was never subjected to refactoring or tuning. We are certain the MapReduce job can be streamlined for a performance improvement, but we do not feel that optimization is a priority because we plan to switch from MapReduce to Google Dataflow.

Google Dataflow (which is derived from Google’s internal “FlumeJava”) is a much more manageable framework than MapReduce with comparable or superior performance [20]. It is the “big data” ETL framework of choice internally at Google [21], and we expect performance gains when we complete this conversion [20]. Additionally, a Dataflow codebase will be much easier to manage than a MapReduce codebase, due to the far greater simplicity and flexibility of the framework.
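As a rough sketch of the shape such a pipeline takes, the example below is written against the Apache Beam API, the open-source successor of the Dataflow SDK that was current when these experiments ran; the bucket paths are placeholders and the parse step is a stub, with a Bigtable sink eventually replacing the final text output.

    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.io.TextIO;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;
    import org.apache.beam.sdk.transforms.DoFn;
    import org.apache.beam.sdk.transforms.ParDo;

    // Read raw FIX messages from Cloud Storage, validate/parse them, and write the
    // parsed records back out; a Bigtable sink would replace the final TextIO step.
    public class FixPipelineSketch {
        public static void main(String[] args) {
            Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());
            pipeline
                .apply("ReadFix", TextIO.read().from("gs://example-bucket/fix/*.txt"))
                .apply("ValidateAndParse", ParDo.of(new DoFn<String, String>() {
                    @ProcessElement
                    public void processElement(ProcessContext c) {
                        // placeholder for the validation and CAT-schema parsing logic
                        c.output(c.element().trim());
                    }
                }))
                .apply("WriteParsed", TextIO.write().to("gs://example-bucket/parsed/part"));
            pipeline.run().waitUntilFinish();
        }
    }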
3.6 Bigtable Schema Design
Bigtable is a “NoSQL” database, and represents a very different way
of thinking about data and databases from traditional relational models
and even other popular NoSQL databases like MongoDB and Redis.
We made only minimal Bigtable-specific optimizations for our tests,
and because of this we missed out on some advantages of Bigtable.
Of particular interest to us are the notions of column families and
column sparsity in Bigtable. Column families speed up data access
while still allowing operations by row [4]. In addition, Bigtable’s
unique handling of columns allows for billions of columns without
performance degradation [4]. Both of these features have factored
into our designs for CAT, and have played a key role in our order
linkage engine.
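To make the column-family point concrete, the sketch below shows a hypothetical CAT-style table (not our actual schema) that keeps reported event fields in one family and validation errors in a second, so error data lives in a separate logical location while staying in the same row, and every write is timestamped automatically.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Admin;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    // Illustrative schema: an "event" family for reported fields and an "error" family for
    // validation errors; Bigtable stores only the columns actually written on each row.
    public class CatSchemaSketch {
        public static void main(String[] args) throws IOException {
            Configuration conf = HBaseConfiguration.create();
            try (Connection connection = ConnectionFactory.createConnection(conf);
                 Admin admin = connection.getAdmin()) {
                HTableDescriptor descriptor = new HTableDescriptor(TableName.valueOf("cat_events"));
                descriptor.addFamily(new HColumnDescriptor("event"));
                descriptor.addFamily(new HColumnDescriptor("error"));
                admin.createTable(descriptor);

                try (Table table = connection.getTable(TableName.valueOf("cat_events"))) {
                    Put put = new Put(Bytes.toBytes("GOOG#2015-05-04#order-123#evt-1")); // hypothetical row key
                    put.addColumn(Bytes.toBytes("event"), Bytes.toBytes("38"), Bytes.toBytes("1000"));
                    put.addColumn(Bytes.toBytes("error"), Bytes.toBytes("38"), Bytes.toBytes("quantity exceeds reported fill"));
                    table.put(put); // built-in timestamps keep the change history automatically
                }
            }
        }
    }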
3.7 Querying Google Cloud Bigtable
Post-linkage, when the power of Bigtable is not as crucial, CAT data can quickly and easily be exported to BigQuery or to an HDFS-based querying tool (e.g., Hive) where common SQL commands can be used to query it⁸.
We are currently using Google’s BigQuery as an analytical store because
of its high-speed data access (terabytes in seconds) via common SQL-
like commands and its ability to be queried by existing sophisticated
business intelligence (BI) tools such as Tableau and Qlikview.
Experiments with Google’s BigQuery Connector for Hadoop [22] have been very successful, with speeds comparable to insertion into Bigtable, thus allowing for balanced performance. However, as HBase [23] and associated tooling such as Apache Phoenix [24] mature, the need for a separate analytical store may disappear, with access becoming more user-friendly than the HBase API alone.
3.8 Limitations of Our Experiments
There are a number of limitations we have identified, both in the general design of the experiments and in applying the experimental results to evaluate the suitability of Bigtable for CAT. These limitations, as well as some comments and future work to address them, are listed in Table 4.
3.9 Limitations of Google Cloud Bigtable
During our experiments we came across a few limitations of the
current version of Cloud Bigtable that might require accommodations
in selected use cases. These are listed in Table 5.
⁸ Note that many HDFS-compatible data processing and querying tools will work on data stored in Google Cloud Storage.
3.10 Next Steps
We expect to achieve in excess of 25 billion FIX messages processed
per hour, in order to ensure Bigtable can handle the “worst-case” CAT
scenario. Additionally, we have an event linkage engine in development that
handles reconstruction of full order lifecycles. We are working on scaling
the linkage engine through both volume and increased complexity of the
order scenarios. Finally, we are developing specific analytical stores and tools
to allow regulators to quickly learn what they need from the collected CAT
data. These analytical stores and tools will be fed from Bigtable, but will not
require technical skills beyond SQL commands and API calls and will allow
for rapid access to all six years of market event data stored by CAT.
4 Conclusions
We have described CAT, the challenge of order lifecycle linkage, and
shared results from a series of experiments assessing Cloud Bigtable’s
abilities to fulll the needs of CAT. Although experimentation is ongoing,
thus far Cloud Bigtable has met or exceeded all our expectations. We
have achieved throughput as high as 2.7 GB per second, which let us
process about 2.7 million FIX messages per second in a partially-linked,
CAT-compliant way. Based on our initial results, our next goal is to
exceed a rate of 25 billion validated and linked CAT events per hour,
which would fully satisfy initial CAT requirements.
We were able to conduct these experiments with both a relatively small
team and budget, which gives us another data point in understanding how
usage of Google Cloud Platform in general, and Bigtable in particular,
will help keep the costs of building and running CAT as low as possible.
While Cloud Bigtable is a recently developed public version of Google’s
internal system and very unlike other databases (with the exception of
HBase), we found it was an easy transition and have come to realize that
Bigtable’s unique way of storing data opens more doors than it closes.
We are conducting further research at this time, and look forward to
sharing more information with CAT stakeholders about how we are
using Bigtable’s characteristics to elegantly handle the whole of order
linkage and other CAT requirements. We believe that Google Cloud
Bigtable is clearly suitable for CAT and the order linkage problem. It
scales well, can handle the complex nature of the data, and interfaces
well with other tools necessary to meet all CAT requirements. We are
also analyzing other financial services use cases for the architecture, and encourage curious readers to contact the authors of the paper. Any use case involving large volumes of data that need to be accessed quickly is a good use case for Bigtable, particularly with varied data that does not fit cleanly into a relational-style schema.
Table 4: Some limitations of our experiments, with comments about the limitations and descriptions of future work to address them.

Limitation | Comments and Future Work
Not enough trials, given the variation in run time endemic to large, complex MapReduce jobs | Further tests have been run and continue to be run, and results appear similar. Most data points in this whitepaper are already the average of multiple identical runs.
Formalized testing of reads not completed | This is the next phase of our testing, and is already underway.
No access to real CAT data; cannot possibly account for all possible order scenarios | Real CAT data does not exist yet. We have used a combination of real FIX messages and simulated data. Testing of a larger variety and higher complexity of orders is underway.
No evaluation of durability | Both workers and tablet servers died during some of our tests, but this did not stop the MapReduce jobs and would not have been noticed except for the logging messages generated. Formal durability evaluation is planned, possibly with a Chaos Monkey-like tool [25].
Relative performance of virtual machine configurations only measured on small clusters | Testing of virtual machine configurations at higher scales has already commenced.
Failure to account for rebalancing and scaling of the Bigtable cluster | While this may have affected some of our experiments, a better understanding of the Bigtable cluster could only raise our performance metrics. Making rebalancing and scaling of the Bigtable cluster more transparent and manageable would help increase our understanding in future experiments.
Table 5: Some limitations of the current version of Cloud Bigtable, with comments about the limitations and possible accommodations to address them.

Limitation | Comments and Accommodations
Cannot directly manage the Bigtable cluster | During our experiments, Bigtable managed itself quite well. We have the option to manually change the cluster size, and intelligent use of Bigtable aids the Bigtable service’s smart management.
Must use the HBase API to access Bigtable; no SQL query support for Bigtable | Easy, fast transfer of data to BigQuery with the BigQuery connector allows SQL-like querying.
Acknowledgements
We would like to thank the following people for providing their
help,
support, and technical and domain expertise in getting this proof of
concept built: Sal Sferrazza, Sam Sinha, Benjamin Tortorelli, Meghadri
Ghosh, Srikanth Narayana, Keith Gould, Peter Giesin, Chris Tancsa,
Patricia Motta, Adriana Villasenor, Petra Kass, Gemma Wipfler, Steven
Silberstein, and Marianne Brown at FIS, and especially the Google Cloud
Bigtable team, who have been awesome.
More Information
FIS (formerly SunGard) Tests CAT Prototype Using Google Cloud Bigtable, with
Quarter Hour Ingestion Rates Equivalent to 10 Billion Financial Trade
Records per Hour: http://bit.ly/1JOn4do
SunGard Homepage: http://www.sungard.com
Announcing Google Cloud Bigtable: Google’s world-famous database is now
available for everyone: http://goo.gl/lVWXuy
References
[1] Google. “Announcing Google Cloud Bigtable: Google’s world-famous database is now available for everyone,” googlecloudplatform.blogspot.com. [Online]. Available: http://goo.gl/lVWXuy [Accessed: May 4, 2015].
[2] Google. “Google Cloud Platform,” cloud.google.com. [Online].
Available: https://cloud.google.com/ [Accessed: May 4, 2015].
[3] CAT NMS Plan Participants, National Market System Plan Governing
the Consolidated Audit Trail Pursuant to Rule 613 of Regulation NMS
under the Securities Exchange Act of 1934. catnmsplan.com [Online].
Available: http://catnmsplan.com/web/groups/catnms/@catnms/
documents/appsupportdocs/p602500.pdf [Accessed: May 4, 2015].
[4] F. Chang, J. Dean, S. Ghemawat, W. Hsieh, D. Wallach, M. Burrows,
T. Chandra, A. Fikes and R. Gruber, “Bigtable: A Distributed Storage
System for Structured Data,” in Proceedings of the 7th USENIX
Symposium on Operating Systems Design and Implementation, May 30-Jun.
3, 2006, Boston, MA, Berkeley, CA: USENIX Association, 2006.
[5] CAT NMS Plan Participants, “SROs Select Short List Bids for the
Consolidated Audit Trail”, catnmsplan.com [Online]. Available: http://
catnmsplan.com/web/groups/catnms/@catnms/documents/
appsupportdocs/p542077.pdf [Accessed: May 4, 2015].
[6] A. Tabb and S. Bali, “The Consolidated Audit Trail (Part I): Reconstructing Humpty Dumpty,” Tabb Group, Westborough, MA, Mar. 2, 2015.
[7] A. Tabb and S. Bali, “The Consolidated Audit Trail (Part II): Problems and Pitfalls,” Tabb Group, Westborough, MA, Mar. 10, 2015.
[8] United States of America. Securities and Exchange Commission,
Consolidated Audit Trail. 17 CFR § 242 (2012).
[9] Financial Information Exchange Protocol Version 4.2, FIX Protocol Limited, 2001.
[10] Google. “Cloud Storage,” cloud.google.com. [Online]. Available: https://cloud.google.com/storage/ [Accessed: May 4, 2015].
[11] S. Gallagher (2012, Jan. 26). “The Great Disk Drive in the Sky: How Web giants store big—and we mean big—data,” Ars Technica [Online]. Available: http://arstechnica.com/business/2012/01/26/the-big-disk-drive-in-the-sky-how-the-giants-of-the-web-store-big-data/ [Accessed: May 4, 2015].
[12] Financial Information Forum, FIF Consolidated Audit Trail (CAT)
Working Group Response to Proposed RFP Concepts Document, Financial
Information Forum, Jan. 18 2013. [Online]. Available: http://
catnmsplan.com/web/groups/catnms/@catnms/documents/
appsupportdocs/p602500.pdf [Accessed: May 4, 2015].
[13] C. Metz (2012, Jul. 10). “Google Remakes Online Empire with
‘Colossus’”, Wired [Online]. Available: http://www.wired.
com/2012/07/google-colossus/ [Accessed: May 4, 2014].
[14] Google. “Command-Line Deployment,” cloud.google.com [Online].
Available: https://cloud.google.com/hadoop/setting-up-a-hadoop-
cluster [Accessed: May 4, 2015].
[15] Google. “Google Compute Engine,” cloud.google.com. [Online]. Available: https://cloud.google.com/compute/ [Accessed: May 4, 2015].
[16] Google. “Instances,” cloud.google.com. [Online]. Available: https://
cloud.google.com/compute/docs/instances [Accessed: May 4, 2015].
[17] R. Vijayakumari, R. Kirankumar and K. Gangadhara Rao,
“Comparative analysis of Google File System and Hadoop
Distributed File System”, International Journal of Advanced Trends
in Computer Science and Engineering, vol. 3, no. 1, pp. 553-558, Feb.
2014. [Online]. Available: http://www.warse.org/pdfs/2014/
icetetssp106.pdf [Accessed: May 4, 2015].
[18] S. Kaisler, Software Paradigms. Hoboken, N.J.: Wiley-Interscience,
2005.
[19] N. Gunther, Guerrilla Capacity Planning. Berlin: Springer, 2007.
[20] C. Chambers, A. Raniwala, F. Perry, S. Adams, R. Henry, R.
Bradshaw and N. Weizenbaum, “FlumeJava,” in Proceedings of the
2010 ACM SIGPLAN conference on Programming language design and
implementation, June 5-10, 2010, Toronto, ON, New York: ACM, 2010.
pp. 363-375.
[21] Y. Sverdlik (2014, Jun 25). “Google Dumps MapReduce in Favor of
New Hyper-Scale Cloud Analytics System”. Data Center Knowledge
[Online]. Available: http://www.datacenterknowledge.com/
archives/2014/06/25/google-dumps-mapreduce-favor-new-
hyper-scale-analytics-system/ [Accessed: May 4, 2015].
[22] Google. “BigQuery Connector for Hadoop,” cloud.google.com.
[Online]. Available: https://cloud.google.com/hadoop/bigquery-
connector [Accessed: May 4, 2015].
[23] L. George, HBase: The Definitive Guide. Sebastopol, CA: O’Reilly, 2011.
[24] Apache. “Apache Phoenix: High performance relational database layer over HBase for low latency applications,” phoenix.apache.org. [Online]. Available: http://phoenix.apache.org/ [Accessed: May 4, 2014].
[25] F. Faghri, S. Bazarbayev, M. Overholt, R. Farivar, R. Campbell and
W. Sanders, “Failure Scenario as a Service (FSaaS) for Hadoop
Clusters”, in SDMCMM ‘12: Proceedings of the Workshop on Secure
and Dependable Middleware for Cloud Monitoring and Management, Dec. 4,
2012, Montreal, QC, New York, NY: ACM, 2012.