AWS Prescriptive Guidance

Archiving data in Amazon RDS for MySQL, Amazon RDS for MariaDB, and Aurora MySQL-Compatible

Copyright © 2024 Amazon Web Services, Inc. and/or its affiliates. All rights reserved.
Amazon's trademarks and trade dress may not be used in connection with any product or service
that is not Amazon's, in any manner that is likely to cause confusion among customers, or in any
manner that disparages or discredits Amazon. All other trademarks not owned by Amazon are
the property of their respective owners, who may or may not be affiliated with, connected to, or
sponsored by Amazon.
Table of Contents

Introduction
    Overview
Targeted outcomes
Archive from partitioned tables
Archive from unpartitioned tables
Move data to Amazon S3
    Use SELECT INTO OUTFILE S3
    Use Data Pipeline
    Use AWS Glue
Accessing archived data
    Standard storage class
    S3 Glacier storage classes
    Best practices
Cleanup
Resources
Appendix I
Appendix II
Document history
Glossary
Archiving data in Amazon RDS for MySQL, Amazon RDS
for MariaDB, and Aurora MySQL-Compatible
Shyam Sunder Rakhecha, Abhishek Karmakar, Oliver Francis, and Saumya Singh, Amazon Web Services (AWS)
April 2023 (document history)
The need to archive historical data can stem from different use cases. Your application might have been designed without archiving capability, and growth in your business over time could result in large amounts of historical data, which inevitably degrades performance. You might also need to retain historical data because of compliance requirements within your organization.
This guide discusses how to archive your historical data in Amazon Simple Storage Service (Amazon
S3) with minimal impact to your application and retrieve archived information when you need it.
Overview
This guide covers different approaches for archiving historical data from large tables in Amazon
Relational Database Service (Amazon RDS) for MySQL, Amazon RDS for MariaDB, and Amazon
Aurora MySQL-Compatible Edition on the Amazon Web Services (AWS) Cloud. In this guide, you
will learn how to archive both partitioned table data and data that is not partitioned and resides in
large tables. You can implement the approaches presented in the guide to reduce the size of your
live data while keeping important historical data for further analysis.
Archiving your table data regularly results in a slimmer set of live data in your tables, which
leads to faster reads and writes and improves the performance of your application. Regular data
archiving falls under the operational excellence and performance efficiency pillars of the AWS Well-Architected Framework. When you move older data to Amazon Simple Storage Service (Amazon
S3) and clean up your archived data in your Amazon RDS instance or Aurora MySQL-Compatible
cluster, you can save on storage costs. This fits the cost optimization pillar and helps you avoid
unnecessary costs on AWS.
Targeted business outcomes
This guide focuses on the following business outcomes:
Improved user experience
Data compliance requirements fulfilled
Reduced storage costs
Organized data
Improved user experience
Databases that retain historical data can have sluggish performance because of large tables and
indexes. When you archive your historical data, you slim your tables and indexes. This has a direct
positive impact on customer-facing API operations that interact with your database.
Data compliance requirements fulfilled
Industries such as financial services, healthcare, and the public sector have stringent archival requirements. By archiving application data that resides in your Amazon RDS for MySQL, Amazon RDS for MariaDB, or Aurora MySQL-Compatible database to Amazon S3, you can meet regulatory compliance requirements, including the following:
Payment Card Industry Data Security Standard (PCI DSS)
Health Insurance Portability and Accountability Act (HIPAA) and Health Information Technology
for Economic and Clinical Health (HITECH) Act
Federal Risk and Authorization Management Program (FedRAMP)
General Data Protection Regulation (GDPR)
Federal Information Processing Standards (FIPS) 140-2
National Institute of Standards and Technology (NIST) 800–171
Reduced storage costs
Keeping data in Amazon RDS increases storage cost and requires higher IOPS. If you compare the
cost of storage per GB-month for Amazon RDS for MySQL Multi-AZ GP2 with that of Amazon S3
Glacier in the us-east-1 AWS Region, the S3 Glacier storage cost is about 57 times lower than
that of Amazon RDS.
Organized data
It's good to keep data that your application accesses frequently in the database. However, applications also generate large quantities of data that isn't required very often or becomes stale. Such records can be archived to low-cost storage, which saves money and doesn't affect application performance.
Archiving data in partitioned tables
MySQL supports partitioning for the InnoDB storage engine, and you can use this feature to partition large tables. Partitions within the table are stored as separate physical tables, although SQL statements that operate on the partitioned table see it as a single table. This gives you the freedom to remove unneeded partitions from the table without performing row-by-row deletes, so you can archive historical rows in your database.
Consider the following example code. The table orders exists within the orderprocessing
schema. Its historical data is present in the partition phistorical, which contains data belonging
to 2021 and earlier. In the same table, application-level hot data is present in the live partitions for
each month of 2022. To archive the data in the partition phistorical, you can create an archive
table orders_2021_and_older with the same structure in the archive schema. You can then
use the MySQL ALTER TABLE ... EXCHANGE PARTITION statement to move the partition phistorical into that table. Note that the archive table is not partitioned. After archiving, you can verify your data and move it to Amazon S3.
CREATE TABLE orders (
orderid bigint NOT NULL AUTO_INCREMENT,
customerid bigint DEFAULT NULL,
............
............
order_date date NOT NULL,
PRIMARY KEY (`orderid`,`order_date`))
PARTITION BY RANGE (TO_DAYS(order_date)) (
PARTITION pstart VALUES LESS THAN (0),
PARTITION phistorical VALUES LESS THAN (TO_DAYS('2022-01-01')),
PARTITION p2022JAN VALUES LESS THAN (TO_DAYS('2022-02-01')),
PARTITION p2022FEB VALUES LESS THAN (TO_DAYS('2022-03-01')),
PARTITION p2022MAR VALUES LESS THAN (TO_DAYS('2022-04-01')),
PARTITION p2022APR VALUES LESS THAN (TO_DAYS('2022-05-01')),
PARTITION p2022MAY VALUES LESS THAN (TO_DAYS('2022-06-01')),
PARTITION p2022JUN VALUES LESS THAN (TO_DAYS('2022-07-01')),
PARTITION p2022JUL VALUES LESS THAN (TO_DAYS('2022-08-01')),
PARTITION p2022AUG VALUES LESS THAN (TO_DAYS('2022-09-01')),
PARTITION p2022SEP VALUES LESS THAN (TO_DAYS('2022-10-01')),
PARTITION p2022OCT VALUES LESS THAN (TO_DAYS('2022-11-01')),
PARTITION p2022NOV VALUES LESS THAN (TO_DAYS('2022-12-01')),
PARTITION p2022DEC VALUES LESS THAN (TO_DAYS('2023-01-01')),
PARTITION pfuture VALUES LESS THAN MAXVALUE
);
CREATE TABLE orders_2021_and_older (
orderid bigint NOT NULL AUTO_INCREMENT,
customerid bigint DEFAULT NULL,
............
............
order_date date NOT NULL,
PRIMARY KEY (`orderid`,`order_date`));
mysql> alter table orderprocessing.orders exchange partition phistorical with table
archive.orders_2021_and_older;
Query OK, 0 rows affected (0.33 sec)
When you use the EXCHANGE PARTITION feature to archive historical data, we recommend the
following best practices:
Create a separate schema for storing archive data in your application. This schema will contain
archive tables that will house archived data. An archive table in your archive schema should have
the same structure as your live table, including its indexes and primary key. However, the target
archive table cannot be a partitioned table. Exchanging partitions between two partitioned
tables is not permitted in MySQL.
Follow a naming convention for your archive table that helps you to identify the historical data
stored in it. This is useful when you perform auditing tasks or design jobs that move this data out
to Amazon S3.
Perform the EXCHANGE PARTITION data definition language (DDL) statement in a downtime
window when there is no traffic coming into your Aurora MySQL-Compatible writer, Amazon RDS
for MySQL, or Amazon RDS for MariaDB instances.
It might be possible to run EXCHANGE PARTITION during low-traffic windows in your application or microservice. However, there should be no writes and few or no SELECT queries on the partitioned table. Existing long-running SELECT queries can cause your EXCHANGE PARTITION DDL to wait, causing resource contention on your database. Design scripts that verify that all these conditions are met before you run EXCHANGE PARTITION on your system (see the sketch after this list).
If your application design can support partitioned data and you currently have an unpartitioned
table, consider moving your data into partitioned tables to support archiving your data. For more
information, see the MySQL documentation.
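The following Python sketch illustrates such a pre-check. It is a minimal example, not a tested procedure from this guide: it assumes the pymysql driver, the example orders table from this section, and placeholder connection details, and it treats any statement against the table that has run for more than 5 seconds as a blocker.

import sys
import pymysql

conn = pymysql.connect(host="<HOST>", user="<USER>", password="<PASSWORD>")

with conn.cursor() as cur:
    # Look for active statements against the partitioned table, excluding
    # this connection's own query.
    cur.execute(
        """SELECT id, time, info FROM information_schema.processlist
           WHERE info LIKE %s AND time > 5 AND id <> CONNECTION_ID()""",
        ("%orderprocessing.orders%",),
    )
    blockers = cur.fetchall()
    if blockers:
        print(f"Found {len(blockers)} long-running statements; aborting.")
        sys.exit(1)

    # No contention detected: swap the historical partition into the
    # unpartitioned archive table.
    cur.execute(
        """ALTER TABLE orderprocessing.orders
           EXCHANGE PARTITION phistorical
           WITH TABLE archive.orders_2021_and_older"""
    )

print("Partition exchanged.")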
Archiving data from unpartitioned tables
In database tables where partitioning is not possible, you can use the Percona Toolkit pt-archiver
tool to archive your table’s data into another table in your MySQL database.
The pt-archiver tool is used to archive the records from large tables to other tables or files. It’s a
read/write tool, which means it deletes data from the source table after archiving it, so you don’t
have to manage source data deletion separately. The main purpose of this script is to archive old
data from the table without impacting the existing online transaction processing (OLTP) query load
(see Appendix I) and insert the data into another table on the same or a different server.
You can download the Percona Toolkit and install it on your local machine or on the Amazon Elastic
Compute Cloud (Amazon EC2) instance from where you are connecting to the database. To run the
pt-archiver tool, use the following syntax.
pt-archiver --source h=<HOST>,D=<DATABASE>,t=<TABLE>,u=<USER>,p=<PASSWORD> \
  --dest h=<HOST>,D=<DATABASE>,t=<TABLE> --where "1=1" --statistics
Replace the HOST, DATABASE, TABLE, and USER with your source and destination database details
and credentials.
You can also use AWS Batch to create and schedule this job for your tables.
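As an illustration, the following Boto3 sketch submits such a job. The job name, job queue, and job definition are hypothetical and assume a container image that has the Percona Toolkit installed; they are not resources defined in this guide.

import boto3

batch = boto3.client("batch")

response = batch.submit_job(
    jobName="archive-orders-table",
    jobQueue="db-archiving-queue",       # hypothetical job queue
    jobDefinition="pt-archiver-job:1",   # hypothetical job definition
    containerOverrides={
        "command": [
            "pt-archiver",
            "--source", "h=<HOST>,D=<DATABASE>,t=<TABLE>,u=<USER>,p=<PASSWORD>",
            "--dest", "h=<HOST>,D=<DATABASE>,t=<TABLE>",
            "--where", "1=1",
            "--statistics",
        ]
    },
)
print("Submitted job:", response["jobId"])

You can then schedule this submission, for example with Amazon EventBridge, if the archiving must run on a recurring basis.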
When you use the pt-archiver tool to archive your table’s data, consider the following:
Having a primary key on the source table will improve the performance of this tool. If the table
doesn’t have a primary key, you can create an index on a unique column, which will help pt-
archiver to go through all the rows of the table and archive them.
By default, pt-archiver deletes the data after archiving the table. Before you run it on the
production server, be sure to test your archiving jobs with --dry-run. Alternatively, you can use
the --no-delete option.
The pt-archiver tool adjusts its rate of archiving based on the load on your system (see Appendix
II). With higher loads, you can expect slower archiving performance.
After you run pt-archiver, your archived data should be in the corresponding table in the archive
schema. From there, you can move it to Amazon S3.
Moving archived table data to Amazon S3
Amazon Simple Storage Service (Amazon S3) is the natural target for archived data. It provides
99.999999999 percent durability, and it's less expensive than database storage.
Furthermore, Amazon S3 has built-in storage classes that are priced based on the retrieval pattern.
You have the option of transitioning the offloaded S3 objects into a lower-priced storage tier based
on the data's retrieval frequency. For more information about storage classes and pricing, see the
Amazon S3 documentation.
For applications that use fleets of MySQL instances, offloading into Amazon S3 would mean saving
money on data that meets the following criteria:
Must be archived from the database to increase efficiency
Isn’t required immediately, or is sparingly needed for any business process
Must be retained for a long term because of audit requirements
You can archive MySQL data in the following ways:
Export data by using SELECT INTO OUTFILE S3.
Export data by using AWS Data Pipeline.
Export data by using AWS Glue.
Export data by using SELECT INTO OUTFILE S3
To copy data from the Aurora MySQL-Compatible DB directly into Amazon S3, you can use the
statement SELECT INTO OUTFILE S3. This SQL statement can be run on the table, and the
required number of rows can be offloaded as comma-separated values (CSV) files with a maximum
size of 6 GB. When the 6 GB threshold is crossed, multiple .csv files are created.
For the export to work, configure the following:
AWS Identity and Access Management (IAM) roles and policies
Database permissions granted to the user to run the command
The Amazon S3 location for offloading
For more information, see the Amazon Aurora documentation.
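The following Python sketch shows the shape of the statement, run through the pymysql driver against the example archive table from this guide. The endpoint, credentials, and bucket path are placeholders, and the statement assumes that the cluster has an IAM role attached that allows writes to the bucket.

import pymysql

conn = pymysql.connect(host="<AURORA_ENDPOINT>", user="<USER>",
                       password="<PASSWORD>", database="archive")

EXPORT_SQL = """
SELECT * FROM orders_2021_and_older
INTO OUTFILE S3 's3://my-archive-bucket/orders/2021_and_older'
FORMAT CSV HEADER
OVERWRITE ON
"""

with conn.cursor() as cur:
    # Aurora writes one or more numbered part files of up to 6 GB each.
    cur.execute(EXPORT_SQL)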
Note: To transition the S3 objects into cost-saving storage tiers, we recommend configuring
Amazon S3 Lifecycle rules in the S3 bucket where the data is being exported.
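A minimal Boto3 sketch of such a rule follows; the bucket, prefix, and the 30-day transition window are placeholder choices, not recommendations from this guide.

import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="my-archive-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "transition-db-archives",
                "Filter": {"Prefix": "orders/"},
                "Status": "Enabled",
                # Move exported objects to S3 Glacier Flexible Retrieval
                # 30 days after they are created.
                "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
            }
        ]
    },
)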
Export data by using AWS Data Pipeline
AWS Data Pipeline is a web service for data-driven workflows. You can use Data Pipeline to
automate the movement of data from Amazon RDS for MySQL to Amazon S3. Data Pipeline runs a
series of actions or tasks serially, and it exports the data into Amazon S3 as a .csv file.
The pipeline can be created in either of the following ways:
On the console, select the template Full copy of RDS MySQL table to S3 for the source, and
provide values for the parameters.
From AWS CLI, define the data pipeline objects as JSON, create the pipeline, upload, and
activate.
The advantage of using Data Pipeline is that the export job can be scheduled and monitored.
Additionally, for larger tables, you can use Amazon EMR instead of Amazon EC2. For more
information and examples, see the AWS Data Pipeline documentation.
Note: To transition the S3 objects into cost-saving storage tiers, we recommend configuring
Amazon S3 Lifecycle rules in the S3 bucket where the data export is being done.
Export data by using AWS Glue
You can archive MySQL data in Amazon S3 by using AWS Glue, which is a serverless analytical
service for big data scenarios. AWS Glue is powered by Apache Spark, a widely used distributed
cluster-computing framework that supports many database sources.
The off-loading of archived data from the database to Amazon S3 can be performed with a
few lines of code in an AWS Glue job. The biggest advantage that AWS Glue offers is horizontal
scalability and a pay-as-you-go model, providing operational efficiency and cost optimization.
The following diagram shows a basic architecture for database archiving.
1. The MySQL database contains the archive or backup table to be off-loaded to Amazon S3.
2. An AWS Glue job is initiated by one of the following approaches:
Synchronously as a step within an AWS Step Functions state machine
Asynchronously by an Amazon EventBridge event
Through a manual request by using AWS CLI or an AWS SDK
3. DB credentials are retrieved from AWS Secrets Manager.
4. The AWS Glue job uses a Java Database Connectivity (JDBC) connection to access the database,
and read the table.
5. AWS Glue writes the data in Amazon S3 in Parquet format, which is an open, columnar, space-
saving data format.
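The following PySpark sketch outlines the core of such a job, covering steps 3 through 5. The secret name, JDBC URL, table, and S3 path are placeholders, and the sketch assumes that the job runs with an IAM role that can read the secret and write to the bucket.

import json

import boto3
from awsglue.context import GlueContext
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())

# Step 3: retrieve the database credentials from AWS Secrets Manager.
secret = json.loads(
    boto3.client("secretsmanager")
    .get_secret_value(SecretId="mysql-archive-credentials")["SecretString"]
)

# Step 4: read the archive table through a JDBC connection.
source = glue_context.create_dynamic_frame.from_options(
    connection_type="mysql",
    connection_options={
        "url": "jdbc:mysql://<HOST>:3306/archive",
        "dbtable": "orders_2021_and_older",
        "user": secret["username"],
        "password": secret["password"],
    },
)

# Step 5: write the data to Amazon S3 in Parquet format.
glue_context.write_dynamic_frame.from_options(
    frame=source,
    connection_type="s3",
    connection_options={"path": "s3://my-archive-bucket/orders/"},
    format="parquet",
)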
Configuring the AWS Glue job
To work as intended, the AWS Glue job requires the following components and configurations:
AWS Glue connections – This is an AWS Glue Data Catalog object that you attach to the job to
access the database. A job can have multiple connections for making calls to multiple databases.
The connections contain the securely stored database credentials.
GlueContext – This is a custom wrapper over the SparkContext class. The GlueContext class provides higher-order API operations to interact with Amazon S3 and database sources. It enables integration with Data Catalog. It also removes the need to rely on drivers for the database connection, which is handled within the AWS Glue connection. Additionally, the GlueContext class provides ways to handle Amazon S3 API operations, which is not possible with the original SparkContext class.
IAM policies and roles – Because AWS Glue interacts with other AWS services, you must set up
appropriate roles with the least privilege required. Services that require appropriate permissions
to interact with AWS Glue include the following:
Amazon S3
AWS Secrets Manager
AWS Key Management Service (AWS KMS)
Best practices
For reading entire tables that have a large number of rows to be off-loaded, we recommend
using the read replica endpoint to increase read throughput without degrading performance of
the main writer instance.
To achieve efficiency in the number of nodes used for processing the job, turn on auto scaling in
AWS Glue 3.0.
If the S3 bucket is a part of data lake architecture, we recommend off-loading data by organizing
it into physical partitions. The partition scheme should be based on the access patterns.
Partitioning based on date values is one of the most recommended practices.
Saving the data into open formats such as Parquet or Optimized Row Columnar (ORC) helps
to make the data available to other analytical services such as Amazon Athena and Amazon
Redshift.
To make the off-loaded data read-optimized for other distributed services, control the number of output files. It is almost always beneficial to have a smaller number of larger files instead of a large number of small files. Spark has built-in configuration properties and methods (such as repartition and coalesce) to control part-file generation.
Archived data is, by definition, infrequently accessed. To achieve cost efficiency for storage, the Amazon S3 storage class should be transitioned into less expensive tiers. This can be done by using two approaches:
Synchronously transitioning the tier while offloading – If you know beforehand that the off-loaded data must be transitioned as part of the process, you can use the GlueContext mechanism transition_s3_path within the same AWS Glue job that writes the data into Amazon S3 (see the sketch after this list).
Asynchronously transitioning using S3 Lifecycle – Set up the S3 Lifecycle rules with appropriate parameters for Amazon S3 storage class transitioning and expiration. After this is configured on the bucket, the rules remain in effect until you change or remove them.
Create and configure a subnet with a sufficient IP address range within the virtual private
cloud (VPC) where the database is deployed. This will avoid AWS Glue job failures caused by an
insufficient number of network addresses when a large number of data processing units (DPUs)
are configured.
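The synchronous approach might look like the following sketch, which continues from the glue_context created in the earlier job sketch. The path, account ID, and role ARN are placeholders, and the options shown are a subset of what transition_s3_path accepts.

# Transition the freshly written archive objects to S3 Glacier from
# within the same AWS Glue job.
glue_context.transition_s3_path(
    s3_path="s3://my-archive-bucket/orders/",
    transition_to="GLACIER",
    options={
        "retentionPeriod": 0,  # hours of newest files to leave untouched
        "accountId": "111122223333",
        "roleArn": "arn:aws:iam::111122223333:role/GlueTransitionRole",
    },
)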
Accessing archived data on Amazon S3
Amazon S3 provides a number of tools for reading the contents of the data. However, depending
on the storage class, a few preprocessing steps might be required. This section includes the
following:
Reading archived S3 objects with Standard storage class, using AWS Glue or Amazon S3 Select
Reading an archived S3 object with the S3 Glacier storage classes using Amazon S3 Glacier Select
or S3 Batch Operations
Best practices
Reading archived S3 objects with Standard storage class
You can read S3 objects that are archived with the Amazon S3 Standard storage class by using either AWS Glue or Amazon S3 Select.
Using AWS Glue
The data off-loaded from MySQL to Amazon S3 retains the same structural rigidity and consistency
typical of a relational database management system (RDBMS).
An AWS Glue crawler crawls the S3 objects, infers the data types, and creates the table metadata as an external table DDL. When you configure the crawler job, use Amazon S3 as the source, and specify the S3 prefix location where all the data files are created. In the configuration, include the following:
Crawler run options
Optional table prefix preference
Target database for creating the table
IAM roles with required permissions
After you invoke the job, it will scan through the data to infer the schema and preserve it in AWS
Glue Data Catalog as AWS Glue tables. AWS Glue tables are essentially external tables that can be
queried with SQL statements like a normal database table using analytical services such as Amazon
Athena, Amazon Redshift Spectrum, and Apache Hive on Amazon EMR. For more information
about the crawler, see the AWS Glue documentation.
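For illustration, the following Boto3 sketch creates and starts such a crawler. The crawler name, role, Data Catalog database, table prefix, and S3 path are placeholders.

import boto3

glue = boto3.client("glue")

glue.create_crawler(
    Name="orders-archive-crawler",
    Role="arn:aws:iam::111122223333:role/GlueCrawlerRole",
    DatabaseName="archive_catalog",   # target database in the Data Catalog
    TablePrefix="archived_",          # optional prefix for created tables
    Targets={"S3Targets": [{"Path": "s3://my-archive-bucket/orders/"}]},
)

glue.start_crawler(Name="orders-archive-crawler")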
For .csv files with a column header specified, the resultant table column names will reflect the
same field names. The data type is inferred based on the values in the data object.
For Parquet files, the schema is preserved within the data itself and the resultant table will reflect
the same field names and data type.
Alternatively, you can run a DDL manually within Athena to create the table definition with the
required column names and data type. This creates the table definition within Data Catalog. For
more information about creating Athena tables, see the Amazon Athena documentation.
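As an illustration, the following sketch submits a DDL statement through the Athena API. The table name, columns, and locations are examples only and echo the orders table used earlier in this guide.

import boto3

athena = boto3.client("athena")

DDL = """
CREATE EXTERNAL TABLE IF NOT EXISTS archive_catalog.orders_2021_and_older (
  orderid BIGINT,
  customerid BIGINT,
  order_date DATE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3://my-archive-bucket/orders/'
TBLPROPERTIES ('skip.header.line.count' = '1')
"""

athena.start_query_execution(
    QueryString=DDL,
    ResultConfiguration={"OutputLocation": "s3://my-athena-results-bucket/"},
)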
Note: If the header row is missing from the CSV file, the crawler creates generic field names such as c_0, c_1, c_2, and so on.
Using Amazon S3 Select
You can use Amazon S3 Select to read the S3 objects programmatically by using SQL expressions.
The API operation can be invoked by using the AWS CLI command select-object-content
or by using an SDK such as Boto3 and invoking the operation select_object_content from
Python.
The API operations support SQL statements as parameters and can read files of type CSV, JSON, and Parquet. The outputs can be redirected as output files. These operations are invoked for each S3 object. For multiple files, run the operation once for each object.
For more information about running the operations by using AWS CLI, see the AWS CLI
documentation. For more information about running S3 Select by using the Python SDK Boto3, see
the Boto3 documentation.
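The following Boto3 sketch reads a filtered subset of one archived CSV object. The bucket, key, and filter expression are placeholders.

import boto3

s3 = boto3.client("s3")

response = s3.select_object_content(
    Bucket="my-archive-bucket",
    Key="orders/2021_and_older.part_00000",
    ExpressionType="SQL",
    Expression="SELECT * FROM s3object s WHERE s.customerid = '42'",
    InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},
    OutputSerialization={"CSV": {}},
)

# The response payload is an event stream; collect the Records events.
for event in response["Payload"]:
    if "Records" in event:
        print(event["Records"]["Payload"].decode("utf-8"))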
Reading archived S3 objects with S3 Glacier storage classes
Amazon S3 Glacier classes are special storage classes with inexpensive pricing but high retrieval
time. Unlike S3 Standard objects, S3 Glacier objects can’t be read as AWS Glue tables. To make
the data available for analytical queries or reporting, you first restore the S3 Glacier objects. The
restoration is an asynchronous process that takes time, and the restored copy has a retention period. After the objects are restored, they can be copied to a different location as S3 Standard objects. After the retention period, the restored copies are removed and the objects remain in the S3 Glacier storage class.
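For a single object, the restore can be initiated and checked with Boto3, as in the following sketch; the bucket, key, retrieval tier, and 7-day retention period are placeholders.

import boto3

s3 = boto3.client("s3")

# Initiate an asynchronous restore and keep the copy for 7 days.
s3.restore_object(
    Bucket="my-archive-bucket",
    Key="orders/2021_and_older.part_00000",
    RestoreRequest={"Days": 7, "GlacierJobParameters": {"Tier": "Standard"}},
)

# The Restore header reports ongoing-request="true" until the copy is ready.
head = s3.head_object(Bucket="my-archive-bucket",
                      Key="orders/2021_and_older.part_00000")
print(head.get("Restore"))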
Using Amazon S3 Glacier Select
Similar to using Amazon S3 Select with S3 Standard, you can query S3 Glacier objects to fetch a
subset of data. This enables programmatic access of data without needing any preprocessing such
as object restore or other AWS analytical services. For an example, see the request syntax for using
the Boto3 initiate_job operation to read S3 Glacier Select data.
Using S3 Batch Operations
S3 Batch Operations enables large-scale batch operations on Amazon S3 in the order of billions
of objects containing exabytes of data. Amazon S3 tracks progress, sends notifications, and stores
a detailed completion report of all actions, providing a fully managed, auditable, and serverless
experience.
S3 Batch Operations supports the Restore operation, which initiates S3 object restore for the
following storage tiers:
Objects archived in the S3 Glacier Flexible Retrieval or S3 Glacier Deep Archive storage classes
Objects archived through the S3 Intelligent-Tiering storage class in the Archive Access or Deep
Archive Access tiers
The batch operation can be invoked both programmatically and on the Amazon S3 console. For input, it requires a .csv manifest file that contains the list of objects to restore.
You can use an Amazon S3 Inventory report as the input for the batch job. The inventory report is configured for a bucket and can be limited to objects under specific prefixes. It is an automated report that is generated daily or weekly in CSV, ORC, or Parquet format.
For more information about configuring an inventory report, see the Amazon S3 documentation.
For information about using Boto3 to create an S3 Batch Operations job, see the Boto3
documentation.
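The following Boto3 sketch outlines such a job. The account ID, role ARN, manifest location and ETag, and report bucket are placeholders that you would replace with your own resources.

import boto3

s3control = boto3.client("s3control")

s3control.create_job(
    AccountId="111122223333",
    ConfirmationRequired=False,
    Priority=10,
    RoleArn="arn:aws:iam::111122223333:role/S3BatchRestoreRole",
    # Restore each object listed in the manifest for 7 days, using the
    # low-cost bulk retrieval tier.
    Operation={
        "S3InitiateRestoreObject": {
            "ExpirationInDays": 7,
            "GlacierJobTier": "BULK",
        }
    },
    Manifest={
        "Spec": {
            "Format": "S3BatchOperations_CSV_20180820",
            "Fields": ["Bucket", "Key"],
        },
        "Location": {
            "ObjectArn": "arn:aws:s3:::my-archive-bucket/manifests/restore.csv",
            "ETag": "<MANIFEST_ETAG>",
        },
    },
    Report={
        "Bucket": "arn:aws:s3:::my-report-bucket",
        "Format": "Report_CSV_20180820",
        "Enabled": True,
        "Prefix": "batch-restore-reports",
        "ReportScope": "AllTasks",
    },
)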
Best practices
We recommend the following best practices for accessing archived data:
Amazon S3 Select and Amazon S3 Glacier Select are suitable for cases where a subset of the data is the output, selected by using basic SQL expressions. Applications that must query Amazon S3 programmatically to derive selected datasets show dramatic performance improvements, in many cases as much as 400 percent. For more information, see the blog post S3 Select and S3 Glacier Select – Retrieving Subsets of Objects.
S3 Select and S3 Glacier Select support reading data in only CSV, JSON, and Parquet formats. For
other open formats, such as ORC or Avro, you can use Amazon Athena or Amazon Redshift.
For huge archival datasets, we recommend creating AWS Glue tables on top of the data so that they can be read by using query engines such as Athena and Amazon Redshift. Both Athena and Amazon Redshift provide horizontal scaling of query performance. They also use a pay-per-query model, which is cost-effective in a one-time querying scenario. Additionally, Amazon Redshift has the Advanced Query Accelerator (AQUA) engine under the hood, which speeds up read performance at no extra cost.
Archived data offloaded regularly in Amazon S3 should not be stored as a heap dump. Instead, it
should be saved as a new partition. A date partition will separate data into date dimensions (for
example, year=<value>/month=<value>/day=<value>). This is extremely beneficial in two
situations:
If AWS Glue tables are created by AWS Glue crawlers, these partitions act as pseudo columns.
This enhances read performance by restricting data scanned to the partitions in the range
query.
This helps in an S3 Glacier restoration operation when you are restoring only a subset of the
object as S3 Standard.
AWS Glue crawlers show great value when archived data saved in Amazon S3 is partitioned physically. Every time that data is off-loaded as a new prefix partition, the crawler scans only the new partition and updates the metadata for that partition. If the schema of the table changes, those changes are captured in the partition-level metadata.
Archive table cleanup
The final stage in the archive process is to clean up the tables in your archive schema. You can do this after you confirm that your data is safely archived in Amazon S3. To avoid any impact to your application, we recommend dropping the archive schema tables during a scheduled downtime or maintenance window, or during a very low-traffic window in your application. Because these tables are not actively queried by your application, dropping them shouldn't affect ongoing transactions. Still, it is a best practice to run DDL statements during a downtime.
After the storage for large archive schema tables is freed up, Amazon Aurora uses dynamic resizing
to help you save on storage costs.
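A cleanup script might first confirm that exported objects exist under the archive prefix before dropping the table, as in the following sketch. The bucket, prefix, and connection details are placeholders, and the table names are the examples used earlier in this guide.

import boto3
import pymysql

s3 = boto3.client("s3")

# Confirm that at least one exported object exists under the archive prefix.
listing = s3.list_objects_v2(Bucket="my-archive-bucket", Prefix="orders/")
if listing.get("KeyCount", 0) == 0:
    raise SystemExit("No archive objects found in Amazon S3; not dropping.")

conn = pymysql.connect(host="<HOST>", user="<USER>", password="<PASSWORD>")
with conn.cursor() as cur:
    # Run during a downtime or low-traffic window.
    cur.execute("DROP TABLE archive.orders_2021_and_older")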
Resources
Documentation
Amazon Aurora
AWS Data Pipeline
AWS Glue
Amazon S3
Querying Amazon S3 Inventory with Amazon Athena
Configuring Amazon S3 Inventory
Performing large-scale batch operations on Amazon S3 objects
Boto3 documentation
MySQL partitioning documentation
Pricing
Amazon RDS for MySQL Multi-AZ GP2 pricing
Amazon S3 Glacier pricing
Tools
Percona Toolkit
pt-archiver
sysbench
Blog posts
S3 Select and S3 Glacier Select – Retrieving Subsets of Objects
Archive and Purge Data for Amazon RDS for PostgreSQL and Amazon Aurora with PostgreSQL
Compatibility using pg_partman and Amazon S3
Appendix I
All the tests are performed on an Amazon RDS for MySQL instance running on the
db.r6g.8xlarge instance class.
The following sysbench commands were used to prepare and run the load on the database.
sysbench oltp_read_write --db-driver=mysql --mysql-db=<DATABASE> --mysql-user=<USER> \
  --mysql-password=<PASSWORD> --mysql-host=<ENDPOINT> --tables=500 --table-size=2000000 \
  --threads=500 prepare
sysbench oltp_read_write --db-driver=mysql --mysql-db=<DATABASE> --mysql-user=<USER> \
  --mysql-password=<PASSWORD> --mysql-host=<ENDPOINT> --tables=500 --rate=500 \
  --time=7200 run
In the following graph, an OLTP workload was running, and the pt-archiver process started at the point marked by the arrow.
There is no significant change in CPU utilization with pt-archiver running in parallel, which indicates that pt-archiver doesn't impact OLTP queries while running.
Appendix II
This section provides the benchmarking results for pt-archiver tools in different scenarios. The
sysbench tool is used in this testing to put load on the database. All the tests are performed on
Amazon RDS for MySQL instance running on db.r6g.8xlarge instance class.
The following sysbench commands were used to prepare and run the load on the database:
sysbench oltp_read_write --db-driver=mysql --mysql-db=<DATABASE> --mysql-user=<USER> \
  --mysql-password=<PASSWORD> --mysql-host=<ENDPOINT> --tables=1000 --table-size=2000000 \
  --threads=500 prepare

sysbench oltp_read_write --db-driver=mysql --mysql-db=<DATABASE> --mysql-user=<USER> \
  --mysql-password=<PASSWORD> --mysql-host=<ENDPOINT> --tables=1000 --rate=500 \
  --threads=500 run
Archiving a table that has no primary key and only one index (no load on the database)
Started at 2022-11-07T05:29:12, ended at 2022-11-07T06:03:31
Action Count  Time Pct
commit600050 1715.3582 83.31
select 300025 166.5470 8.09
inserting 300024 165.4025 8.03
other 0 11.6644 0.57
It took around 34 minutes to archive 300,024 rows. This table had 2 million rows, but the tool
archived only the rows with unique data for the indexed column.
Archiving a table that has a primary key (no load on the database)
Started at 2022-11-16T08:53:49, ended at 2022-11-16T12:38:18
Action     Count  Time   Pct
commit    4000000 11065.9534   82.16
select    2000000 1278.1854    9.49
inserting  1999999 1050.4961    7.80
other       0  74.1519    0.55
It took around 3 hours, 44 minutes, and 29 seconds to archive 1,999,999 rows.
The following graph shows that pt-archiver consumes very little CPU and resources when run on its
own without any load existing in the system.
20
AWS Prescriptive Guidance Archiving data in Amazon RDS for MySQL, Amazon RDS for MariaDB,
and Aurora MySQL-Compatible
Archiving a table that has a primary key (with load on the database)
Started at 2022-11-16T17:37:07, ended at 2022-11-17T03:20:43
Action     Count  Time    Pct
commit    4000000 19688.8362   56.23
inserting  1999999 13933.4418   39.79
select    2000000 1305.1770    3.73
other       0  89.1787    0.25
It took around 9 hours, 43 minutes, and 36 seconds to archive 1,999,999 rows.
The following graph shows that during the test, the CPU utilization was up to 15 percent because of the load applied by sysbench. After the load completed, pt-archiver continued to run, consuming minimal CPU as expected, to complete the archival.
As is evident from the graphs, pt-archiver doesn't archive aggressively when there is a load on your
database.
Document history
The following table describes significant changes to this guide. If you want to be notified about
future updates, you can subscribe to an RSS feed.
Change               Date
Initial publication  June 8, 2023
AWS Prescriptive Guidance glossary
The following are commonly used terms in strategies, guides, and patterns provided by AWS
Prescriptive Guidance. To suggest entries, please use the Provide feedback link at the end of the
glossary.
Numbers
7 Rs
Seven common migration strategies for moving applications to the cloud. These strategies build
upon the 5 Rs that Gartner identified in 2011 and consist of the following:
Refactor/re-architect – Move an application and modify its architecture by taking full advantage of cloud-native features to improve agility, performance, and scalability. This typically involves porting the operating system and database. Example: Migrate your on-premises Oracle database to the Amazon Aurora PostgreSQL-Compatible Edition.
Replatform (lift and reshape) – Move an application to the cloud, and introduce some level of optimization to take advantage of cloud capabilities. Example: Migrate your on-premises Oracle database to Amazon Relational Database Service (Amazon RDS) for Oracle in the AWS Cloud.
Repurchase (drop and shop) – Switch to a different product, typically by moving from a traditional license to a SaaS model. Example: Migrate your customer relationship management (CRM) system to Salesforce.com.
Rehost (lift and shift) – Move an application to the cloud without making any changes to take advantage of cloud capabilities. Example: Migrate your on-premises Oracle database to Oracle on an EC2 instance in the AWS Cloud.
Relocate (hypervisor-level lift and shift) – Move infrastructure to the cloud without purchasing new hardware, rewriting applications, or modifying your existing operations. You migrate servers from an on-premises platform to a cloud service for the same platform. Example: Migrate a Microsoft Hyper-V application to AWS.
Retain (revisit) – Keep applications in your source environment. These might include applications that require major refactoring that you want to postpone until a later time, and legacy applications that you want to retain because there's no business justification for migrating them.
Retire – Decommission or remove applications that are no longer needed in your source
environment.
A
ABAC
See attribute-based access control.
abstracted services
See managed services.
ACID
See atomicity, consistency, isolation, durability.
active-active migration
A database migration method in which the source and target databases are kept in sync (by
using a bidirectional replication tool or dual write operations), and both databases handle
transactions from connecting applications during migration. This method supports migration in
small, controlled batches instead of requiring a one-time cutover. It’s more flexible but requires
more work than active-passive migration.
active-passive migration
A database migration method in which the source and target databases are kept in sync, but only the source database handles transactions from connecting applications while data is replicated to the target database. The target database doesn't accept any transactions during migration.
aggregate function
A SQL function that operates on a group of rows and calculates a single return value for the
group. Examples of aggregate functions include SUM and MAX.
AI
See artificial intelligence.
AIOps
See artificial intelligence operations.
anonymization
The process of permanently deleting personal information in a dataset. Anonymization can help
protect personal privacy. Anonymized data is no longer considered to be personal data.
anti-pattern
A frequently used solution for a recurring issue where the solution is counter-productive,
ineffective, or less effective than an alternative.
application control
A security approach that allows the use of only approved applications in order to help protect a
system from malware.
application portfolio
A collection of detailed information about each application used by an organization, including
the cost to build and maintain the application, and its business value. This information is key to
the portfolio discovery and analysis process and helps identify and prioritize the applications to
be migrated, modernized, and optimized.
artificial intelligence (AI)
The field of computer science that is dedicated to using computing technologies to perform
cognitive functions that are typically associated with humans, such as learning, solving
problems, and recognizing patterns. For more information, see What is Artificial Intelligence?
artificial intelligence operations (AIOps)
The process of using machine learning techniques to solve operational problems, reduce
operational incidents and human intervention, and increase service quality. For more
information about how AIOps is used in the AWS migration strategy, see the operations
integration guide.
asymmetric encryption
An encryption algorithm that uses a pair of keys, a public key for encryption and a private key
for decryption. You can share the public key because it isn’t used for decryption, but access to
the private key should be highly restricted.
atomicity, consistency, isolation, durability (ACID)
A set of software properties that guarantee the data validity and operational reliability of a
database, even in the case of errors, power failures, or other problems.
attribute-based access control (ABAC)
The practice of creating fine-grained permissions based on user attributes, such as department,
job role, and team name. For more information, see ABAC for AWS in the AWS Identity and
Access Management (IAM) documentation.
authoritative data source
A location where you store the primary version of data, which is considered to be the most
reliable source of information. You can copy data from the authoritative data source to other
locations for the purposes of processing or modifying the data, such as anonymizing, redacting,
or pseudonymizing it.
Availability Zone
A distinct location within an AWS Region that is insulated from failures in other Availability
Zones and provides inexpensive, low-latency network connectivity to other Availability Zones in
the same Region.
AWS Cloud Adoption Framework (AWS CAF)
A framework of guidelines and best practices from AWS to help organizations develop an
efficient and effective plan to move successfully to the cloud. AWS CAF organizes guidance
into six focus areas called perspectives: business, people, governance, platform, security,
and operations. The business, people, and governance perspectives focus on business skills
and processes; the platform, security, and operations perspectives focus on technical skills
and processes. For example, the people perspective targets stakeholders who handle human
resources (HR), staffing functions, and people management. For this perspective, AWS CAF
provides guidance for people development, training, and communications to help ready the
organization for successful cloud adoption. For more information, see the AWS CAF website and
the AWS CAF whitepaper.
AWS Workload Qualification Framework (AWS WQF)
A tool that evaluates database migration workloads, recommends migration strategies, and
provides work estimates. AWS WQF is included with AWS Schema Conversion Tool (AWS SCT). It
analyzes database schemas and code objects, application code, dependencies, and performance
characteristics, and provides assessment reports.
B
bad bot
A bot that is intended to disrupt or cause harm to individuals or organizations.
BCP
See business continuity planning.
behavior graph
A unified, interactive view of resource behavior and interactions over time. You can use a
behavior graph with Amazon Detective to examine failed logon attempts, suspicious API
calls, and similar actions. For more information, see Data in a behavior graph in the Detective
documentation.
big-endian system
A system that stores the most significant byte first. See also endianness.
binary classification
A process that predicts a binary outcome (one of two possible classes). For example, your ML model might need to predict problems such as "Is this email spam or not spam?" or "Is this product a book or a car?"
bloom filter
A probabilistic, memory-efficient data structure that is used to test whether an element is a
member of a set.
blue/green deployment
A deployment strategy where you create two separate but identical environments. You run the
current application version in one environment (blue) and the new application version in the
other environment (green). This strategy helps you quickly roll back with minimal impact.
bot
A software application that runs automated tasks over the internet and simulates human
activity or interaction. Some bots are useful or beneficial, such as web crawlers that index
information on the internet. Some other bots, known as bad bots, are intended to disrupt or
cause harm to individuals or organizations.
botnet
A network of bots that are infected by malware and are under the control of a single party, known as a bot herder or bot operator. Botnets are the best-known mechanism to scale bots and their impact.
branch
A contained area of a code repository. The first branch created in a repository is the main
branch. You can create a new branch from an existing branch, and you can then develop
features or fix bugs in the new branch. A branch you create to build a feature is commonly
referred to as a feature branch. When the feature is ready for release, you merge the feature
branch back into the main branch. For more information, see About branches (GitHub
documentation).
break-glass access
In exceptional circumstances and through an approved process, a quick means for a user to
gain access to an AWS account that they don't typically have permissions to access. For more
information, see the Implement break-glass procedures indicator in the AWS Well-Architected
guidance.
brownfield strategy
The existing infrastructure in your environment. When adopting a brownfield strategy for a
system architecture, you design the architecture around the constraints of the current systems
and infrastructure. If you are expanding the existing infrastructure, you might blend brownfield
and greenfield strategies.
buffer cache
The memory area where the most frequently accessed data is stored.
business capability
What a business does to generate value (for example, sales, customer service, or marketing).
Microservices architectures and development decisions can be driven by business capabilities.
For more information, see the Organized around business capabilities section of the Running
containerized microservices on AWS whitepaper.
business continuity planning (BCP)
A plan that addresses the potential impact of a disruptive event, such as a large-scale migration,
on operations and enables a business to resume operations quickly.
C
CAF
See AWS Cloud Adoption Framework.
canary deployment
The slow and incremental release of a version to end users. When you are confident, you deploy
the new version and replace the current version in its entirety.
CCoE
See Cloud Center of Excellence.
CDC
See change data capture.
change data capture (CDC)
The process of tracking changes to a data source, such as a database table, and recording
metadata about the change. You can use CDC for various purposes, such as auditing or
replicating changes in a target system to maintain synchronization.
chaos engineering
Intentionally introducing failures or disruptive events to test a system’s resilience. You can use
AWS Fault Injection Service (AWS FIS) to perform experiments that stress your AWS workloads
and evaluate their response.
CI/CD
See continuous integration and continuous delivery.
classification
A categorization process that helps generate predictions. ML models for classification problems
predict a discrete value. Discrete values are always distinct from one another. For example, a
model might need to evaluate whether or not there is a car in an image.
client-side encryption
Encryption of data locally, before the target AWS service receives it.
Cloud Center of Excellence (CCoE)
A multi-disciplinary team that drives cloud adoption efforts across an organization, including
developing cloud best practices, mobilizing resources, establishing migration timelines, and
leading the organization through large-scale transformations. For more information, see the
CCoE posts on the AWS Cloud Enterprise Strategy Blog.
cloud computing
The cloud technology that is typically used for remote data storage and IoT device
management. Cloud computing is commonly connected to edge computing technology.
cloud operating model
In an IT organization, the operating model that is used to build, mature, and optimize one or
more cloud environments. For more information, see Building your Cloud Operating Model.
cloud stages of adoption
The four phases that organizations typically go through when they migrate to the AWS Cloud:
Project – Running a few cloud-related projects for proof of concept and learning purposes
Foundation – Making foundational investments to scale your cloud adoption (e.g., creating a
landing zone, defining a CCoE, establishing an operations model)
Migration – Migrating individual applications
Re-invention – Optimizing products and services, and innovating in the cloud
These stages were defined by Stephen Orban in the blog post The Journey Toward Cloud-First
& the Stages of Adoption on the AWS Cloud Enterprise Strategy blog. For information about
how they relate to the AWS migration strategy, see the migration readiness guide.
CMDB
See configuration management database.
code repository
A location where source code and other assets, such as documentation, samples, and scripts,
are stored and updated through version control processes. Common cloud repositories include
GitHub or AWS CodeCommit. Each version of the code is called a branch. In a microservice
structure, each repository is devoted to a single piece of functionality. A single CI/CD pipeline
can use multiple repositories.
cold cache
A buffer cache that is empty, not well populated, or contains stale or irrelevant data. This
affects performance because the database instance must read from the main memory or disk,
which is slower than reading from the buffer cache.
cold data
Data that is rarely accessed and is typically historical. When querying this kind of data, slow
queries are typically acceptable. Moving this data to lower-performing and less expensive
storage tiers or classes can reduce costs.
computer vision (CV)
A field of AI that uses machine learning to analyze and extract information from visual formats
such as digital images and videos. For example, AWS Panorama offers devices that add CV to
on-premises camera networks, and Amazon SageMaker provides image processing algorithms
for CV.
configuration drift
For a workload, a configuration change from the expected state. It might cause the workload to
become noncompliant, and it's typically gradual and unintentional.
configuration management database (CMDB)
A repository that stores and manages information about a database and its IT environment,
including both hardware and software components and their configurations. You typically use
data from a CMDB in the portfolio discovery and analysis stage of migration.
conformance pack
A collection of AWS Config rules and remediation actions that you can assemble to customize
your compliance and security checks. You can deploy a conformance pack as a single entity in
an AWS account and Region, or across an organization, by using a YAML template. For more
information, see Conformance packs in the AWS Config documentation.
continuous integration and continuous delivery (CI/CD)
The process of automating the source, build, test, staging, and production stages of the
software release process. CI/CD is commonly described as a pipeline. CI/CD can help you
automate processes, improve productivity, improve code quality, and deliver faster. For more
information, see Benefits of continuous delivery. CD can also stand for continuous deployment.
For more information, see Continuous Delivery vs. Continuous Deployment.
CV
See computer vision.
D
data at rest
Data that is stationary in your network, such as data that is in storage.
data classification
A process for identifying and categorizing the data in your network based on its criticality and
sensitivity. It is a critical component of any cybersecurity risk management strategy because
it helps you determine the appropriate protection and retention controls for the data. Data
classification is a component of the security pillar in the AWS Well-Architected Framework. For
more information, see Data classification.
data drift
A meaningful variation between the production data and the data that was used to train an ML
model, or a meaningful change in the input data over time. Data drift can reduce the overall
quality, accuracy, and fairness in ML model predictions.
data in transit
Data that is actively moving through your network, such as between network resources.
data mesh
An architectural framework that provides distributed, decentralized data ownership with
centralized management and governance.
data minimization
The principle of collecting and processing only the data that is strictly necessary. Practicing
data minimization in the AWS Cloud can reduce privacy risks, costs, and your analytics carbon
footprint.
data perimeter
A set of preventive guardrails in your AWS environment that help make sure that only trusted
identities are accessing trusted resources from expected networks. For more information, see
Building a data perimeter on AWS.
data preprocessing
To transform raw data into a format that is easily parsed by your ML model. Preprocessing data
can mean removing certain columns or rows and addressing missing, inconsistent, or duplicate
values.
data provenance
The process of tracking the origin and history of data throughout its lifecycle, such as how the
data was generated, transmitted, and stored.
data subject
An individual whose data is being collected and processed.
data warehouse
A data management system that supports business intelligence, such as analytics. Data
warehouses commonly contain large amounts of historical data, and they are typically used for
queries and analysis.
database definition language (DDL)
Statements or commands for creating or modifying the structure of tables and objects in a
database.
database manipulation language (DML)
Statements or commands for modifying (inserting, updating, and deleting) information in a
database.
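For illustration, a minimal sketch that contrasts DDL and DML by using Python's built-in
sqlite3 module; the orders table is hypothetical, invented for this example:

import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: statements that create or modify the structure of database objects.
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.execute("ALTER TABLE orders ADD COLUMN total REAL")

# DML: statements that insert, update, or delete the data in those objects.
conn.execute("INSERT INTO orders (status, total) VALUES ('open', 19.99)")
conn.execute("UPDATE orders SET status = 'shipped' WHERE id = 1")
conn.execute("DELETE FROM orders WHERE status = 'cancelled'")
conn.commit()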
DDL
See database definition language.
deep ensemble
To combine multiple deep learning models for prediction. You can use deep ensembles to
obtain a more accurate prediction or for estimating uncertainty in predictions.
deep learning
An ML subfield that uses multiple layers of artificial neural networks to identify mapping
between input data and target variables of interest.
defense-in-depth
An information security approach in which a series of security mechanisms and controls are
thoughtfully layered throughout a computer network to protect the confidentiality, integrity,
and availability of the network and the data within. When you adopt this strategy on AWS,
you add multiple controls at different layers of the AWS Organizations structure to help
secure resources. For example, a defense-in-depth approach might combine multi-factor
authentication, network segmentation, and encryption.
delegated administrator
In AWS Organizations, a compatible service can register an AWS member account to administer
the organization’s accounts and manage permissions for that service. This account is called the
delegated administrator for that service. For more information and a list of compatible services,
see Services that work with AWS Organizations in the AWS Organizations documentation.
deployment
The process of making an application, new features, or code fixes available in the target
environment. Deployment involves implementing changes in a code base and then building and
running that code base in the application’s environments.
development environment
See environment.
detective control
A security control that is designed to detect, log, and alert after an event has occurred.
These controls are a second line of defense, alerting you to security events that bypassed the
preventative controls in place. For more information, see Detective controls in Implementing
security controls on AWS.
development value stream mapping (DVSM)
A process used to identify and prioritize constraints that adversely affect speed and quality in
a software development lifecycle. DVSM extends the value stream mapping process originally
designed for lean manufacturing practices. It focuses on the steps and teams required to create
and move value through the software development process.
digital twin
A virtual representation of a real-world system, such as a building, factory, industrial
equipment, or production line. Digital twins support predictive maintenance, remote
monitoring, and production optimization.
dimension table
In a star schema, a smaller table that contains data attributes about quantitative data in a
fact table. Dimension table attributes are typically text fields or discrete numbers that behave
like text. These attributes are commonly used for query constraining, filtering, and result set
labeling.
disaster
An event that prevents a workload or system from fulfilling its business objectives in its primary
deployed location. These events can be natural disasters, technical failures, or the result of
human actions, such as unintentional misconfiguration or a malware attack.
disaster recovery (DR)
The strategy and process you use to minimize downtime and data loss caused by a disaster. For
more information, see Disaster Recovery of Workloads on AWS: Recovery in the Cloud in the
AWS Well-Architected Framework.
DML
See database manipulation language.
domain-driven design
An approach to developing a complex software system by connecting its components to
evolving domains, or core business goals, that each component serves. This concept was
introduced by Eric Evans in his book, Domain-Driven Design: Tackling Complexity in the Heart of
Software (Boston: Addison-Wesley Professional, 2003). For information about how you can use
domain-driven design with the strangler fig pattern, see Modernizing legacy Microsoft ASP.NET
(ASMX) web services incrementally by using containers and Amazon API Gateway.
DR
See disaster recovery.
drift detection
Tracking deviations from a baselined configuration. For example, you can use AWS
CloudFormation to detect drift in system resources, or you can use AWS Control Tower to detect
changes in your landing zone that might affect compliance with governance requirements.
DVSM
See development value stream mapping.
E
EDA
See exploratory data analysis.
edge computing
The technology that increases the computing power for smart devices at the edges of an IoT
network. When compared with cloud computing, edge computing can reduce communication
latency and improve response time.
encryption
A computing process that transforms plaintext data, which is human-readable, into ciphertext.
encryption key
A cryptographic string of randomized bits that is generated by an encryption algorithm. Keys
can vary in length, and each key is designed to be unpredictable and unique.
endianness
The order in which bytes are stored in computer memory. Big-endian systems store the most
significant byte first. Little-endian systems store the least significant byte first.
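For illustration, a minimal sketch with Python's built-in struct module, which can pack the
same integer in either byte order:

import struct

value = 0x12345678
print(struct.pack(">I", value).hex())  # big-endian:    "12345678"
print(struct.pack("<I", value).hex())  # little-endian: "78563412"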
endpoint
See service endpoint.
endpoint service
A service that you can host in a virtual private cloud (VPC) to share with other users. You can
create an endpoint service with AWS PrivateLink and grant permissions to other AWS accounts
or to AWS Identity and Access Management (IAM) principals. These accounts or principals
can connect to your endpoint service privately by creating interface VPC endpoints. For more
information, see Create an endpoint service in the Amazon Virtual Private Cloud (Amazon VPC)
documentation.
enterprise resource planning (ERP)
A system that automates and manages key business processes (such as accounting, MES, and
project management) for an enterprise.
envelope encryption
The process of encrypting an encryption key with another encryption key. For more
information, see Envelope encryption in the AWS Key Management Service (AWS KMS)
documentation.
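For illustration, a minimal sketch of the pattern that assumes the third-party cryptography
Python package; both keys are generated locally here, whereas in practice the root key would
be managed by a service such as AWS KMS:

from cryptography.fernet import Fernet

root_key = Fernet.generate_key()   # stands in for a KMS-managed root key
data_key = Fernet.generate_key()   # per-object data encryption key

# Encrypt the data with the data key, then encrypt (wrap) the data key itself.
ciphertext = Fernet(data_key).encrypt(b"archived row data")
wrapped_key = Fernet(root_key).encrypt(data_key)

# To decrypt, unwrap the data key with the root key, then decrypt the data.
plaintext = Fernet(Fernet(root_key).decrypt(wrapped_key)).decrypt(ciphertext)
assert plaintext == b"archived row data"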
environment
An instance of a running application. The following are common types of environments in cloud
computing:
development environment – An instance of a running application that is available only to the
core team responsible for maintaining the application. Development environments are used
to test changes before promoting them to upper environments. This type of environment is
sometimes referred to as a test environment.
lower environments – All development environments for an application, such as those used
for initial builds and tests.
production environment – An instance of a running application that end users can access. In a
CI/CD pipeline, the production environment is the last deployment environment.
upper environments – All environments that can be accessed by users other than the core
development team. This can include a production environment, preproduction environments,
and environments for user acceptance testing.
epic
In agile methodologies, functional categories that help organize and prioritize your work. Epics
provide a high-level description of requirements and implementation tasks. For example, AWS
CAF security epics include identity and access management, detective controls, infrastructure
security, data protection, and incident response. For more information about epics in the AWS
migration strategy, see the program implementation guide.
ERP
See enterprise resource planning.
exploratory data analysis (EDA)
The process of analyzing a dataset to understand its main characteristics. You collect or
aggregate data and then perform initial investigations to find patterns, detect anomalies,
and check assumptions. EDA is performed by calculating summary statistics and creating data
visualizations.
F
fact table
The central table in a star schema. It stores quantitative data about business operations.
Typically, a fact table contains two types of columns: those that contain measures and those
that contain a foreign key to a dimension table.
fail fast
A philosophy that uses frequent and incremental testing to reduce the development lifecycle. It
is a critical part of an agile approach.
fault isolation boundary
In the AWS Cloud, a boundary such as an Availability Zone, AWS Region, control plane, or data
plane that limits the effect of a failure and helps improve the resilience of workloads. For more
information, see AWS Fault Isolation Boundaries.
feature branch
See branch.
features
The input data that you use to make a prediction. For example, in a manufacturing context,
features could be images that are periodically captured from the manufacturing line.
feature importance
How significant a feature is for a model’s predictions. This is usually expressed as a numerical
score that can be calculated through various techniques, such as Shapley Additive Explanations
(SHAP) and integrated gradients. For more information, see Machine learning model
interpretability with AWS.
feature transformation
To optimize data for the ML process, including enriching data with additional sources, scaling
values, or extracting multiple sets of information from a single data field. This enables the ML
model to benefit from the data. For example, if you break down the “2021-05-27 00:15:37”
date into “2021”, “May”, “Thu”, and “15”, you can help the learning algorithm learn nuanced
patterns associated with different data components.
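For illustration, a minimal sketch of the date decomposition described above, using Python's
built-in datetime module:

from datetime import datetime

ts = datetime.strptime("2021-05-27 00:15:37", "%Y-%m-%d %H:%M:%S")
features = {
    "year": ts.year,               # 2021
    "month": ts.strftime("%b"),    # "May"
    "weekday": ts.strftime("%a"),  # "Thu"
    "minute": ts.minute,           # 15
}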
FGAC
See fine-grained access control.
fine-grained access control (FGAC)
The use of multiple conditions to allow or deny an access request.
flash-cut migration
A database migration method that uses continuous data replication through change data
capture to migrate data in the shortest time possible, instead of using a phased approach. The
objective is to keep downtime to a minimum.
G
geo blocking
See geographic restrictions.
geographic restrictions (geo blocking)
In Amazon CloudFront, an option to prevent users in specific countries from accessing content
distributions. You can use an allow list or block list to specify approved and banned countries.
For more information, see Restricting the geographic distribution of your content in the
CloudFront documentation.
Gitflow workflow
An approach in which lower and upper environments use different branches in a source code
repository. The Gitflow workflow is considered legacy, and the trunk-based workflow is the
modern, preferred approach.
greenfield strategy
The absence of existing infrastructure in a new environment. When adopting a greenfield
strategy for a system architecture, you can select all new technologies without the restriction
of compatibility with existing infrastructure, also known as brownfield. If you are expanding the
existing infrastructure, you might blend brownfield and greenfield strategies.
guardrail
A high-level rule that helps govern resources, policies, and compliance across organizational
units (OUs). Preventive guardrails enforce policies to ensure alignment to compliance standards.
They are implemented by using service control policies and IAM permissions boundaries.
Detective guardrails detect policy violations and compliance issues, and generate alerts
for remediation. They are implemented by using AWS Config, AWS Security Hub, Amazon
GuardDuty, AWS Trusted Advisor, Amazon Inspector, and custom AWS Lambda checks.
H
HA
See high availability.
heterogeneous database migration
Migrating your source database to a target database that uses a different database engine
(for example, Oracle to Amazon Aurora). Heterogeneous migration is typically part of a re-
architecting effort, and converting the schema can be a complex task. AWS provides AWS SCT
that helps with schema conversions.
high availability (HA)
The ability of a workload to operate continuously, without intervention, in the event of
challenges or disasters. HA systems are designed to automatically fail over, consistently deliver
high-quality performance, and handle different loads and failures with minimal performance
impact.
historian modernization
An approach used to modernize and upgrade operational technology (OT) systems to better
serve the needs of the manufacturing industry. A historian is a type of database that is used to
collect and store data from various sources in a factory.
homogeneous database migration
Migrating your source database to a target database that shares the same database engine
(for example, Microsoft SQL Server to Amazon RDS for SQL Server). Homogeneous migration
is typically part of a rehosting or replatforming effort. You can use native database utilities to
migrate the schema.
hot data
Data that is frequently accessed, such as real-time data or recent transactional data. This data
typically requires a high-performance storage tier or class to provide fast query responses.
hotfix
An urgent fix for a critical issue in a production environment. Due to its urgency, a hotfix is
usually made outside of the typical DevOps release workflow.
hypercare period
Immediately following cutover, the period of time when a migration team manages and
monitors the migrated applications in the cloud in order to address any issues. Typically, this
period is 1–4 days in length. At the end of the hypercare period, the migration team typically
transfers responsibility for the applications to the cloud operations team.
I
IaC
See infrastructure as code.
identity-based policy
A policy attached to one or more IAM principals that defines their permissions within the AWS
Cloud environment.
idle application
An application that has an average CPU and memory usage between 5 and 20 percent over
a period of 90 days. In a migration project, it is common to retire these applications or retain
them on premises.
IIoT
See industrial Internet of Things.
immutable infrastructure
A model that deploys new infrastructure for production workloads instead of updating,
patching, or modifying the existing infrastructure. Immutable infrastructures are inherently
more consistent, reliable, and predictable than mutable infrastructure. For more information,
see the Deploy using immutable infrastructure best practice in the AWS Well-Architected
Framework.
inbound (ingress) VPC
In an AWS multi-account architecture, a VPC that accepts, inspects, and routes network
connections from outside an application. The AWS Security Reference Architecture recommends
setting up your Network account with inbound, outbound, and inspection VPCs to protect the
two-way interface between your application and the broader internet.
incremental migration
A cutover strategy in which you migrate your application in small parts instead of performing
a single, full cutover. For example, you might move only a few microservices or users to the
new system initially. After you verify that everything is working properly, you can incrementally
move additional microservices or users until you can decommission your legacy system. This
strategy reduces the risks associated with large migrations.
Industry 4.0
A term that was introduced by Klaus Schwab in 2016 to refer to the modernization of
manufacturing processes through advances in connectivity, real-time data, automation,
analytics, and AI/ML.
infrastructure
All of the resources and assets contained within an application’s environment.
infrastructure as code (IaC)
The process of provisioning and managing an application’s infrastructure through a set
of configuration files. IaC is designed to help you centralize infrastructure management,
standardize resources, and scale quickly so that new environments are repeatable, reliable, and
consistent.
industrial Internet of Things (IIoT)
The use of internet-connected sensors and devices in the industrial sectors, such as
manufacturing, energy, automotive, healthcare, life sciences, and agriculture. For more
information, see Building an industrial Internet of Things (IIoT) digital transformation strategy.
inspection VPC
In an AWS multi-account architecture, a centralized VPC that manages inspections of network
traffic between VPCs (in the same or different AWS Regions), the internet, and on-premises
networks. The AWS Security Reference Architecture recommends setting up your Network
account with inbound, outbound, and inspection VPCs to protect the two-way interface
between your application and the broader internet.
Internet of Things (IoT)
The network of connected physical objects with embedded sensors or processors that
communicate with other devices and systems through the internet or over a local
communication network. For more information, see What is IoT?
interpretability
A characteristic of a machine learning model that describes the degree to which a human
can understand how the model's predictions depend on its inputs. For more information, see
Machine learning model interpretability with AWS.
IoT
See Internet of Things.
IT information library (ITIL)
A set of best practices for delivering IT services and aligning these services with business
requirements. ITIL provides the foundation for ITSM.
IT service management (ITSM)
Activities associated with designing, implementing, managing, and supporting IT services for
an organization. For information about integrating cloud operations with ITSM tools, see the
operations integration guide.
ITIL
See IT information library.
ITSM
See IT service management.
L
label-based access control (LBAC)
An implementation of mandatory access control (MAC) where the users and the data itself are
each explicitly assigned a security label value. The intersection between the user security label
and data security label determines which rows and columns can be seen by the user.
landing zone
A landing zone is a well-architected, multi-account AWS environment that is scalable and
secure. This is a starting point from which your organizations can quickly launch and deploy
workloads and applications with confidence in their security and infrastructure environment.
For more information about landing zones, see Setting up a secure and scalable multi-account
AWS environment.
large migration
A migration of 300or more servers.
LBAC
See label-based access control.
least privilege
The security best practice of granting the minimum permissions required to perform a task. For
more information, see Apply least-privilege permissions in the IAM documentation.
lift and shift
See 7 Rs.
little-endian system
A system that stores the least significant byte first. See also endianness.
lower environments
See environment.
M
machine learning (ML)
A type of artificial intelligence that uses algorithms and techniques for pattern recognition and
learning. ML analyzes and learns from recorded data, such as Internet of Things (IoT) data, to
generate a statistical model based on patterns. For more information, see Machine Learning.
main branch
See branch.
malware
Software that is designed to compromise computer security or privacy. Malware might disrupt
computer systems, leak sensitive information, or gain unauthorized access. Examples of
malware include viruses, worms, ransomware, Trojan horses, spyware, and keyloggers.
managed services
AWS services for which AWS operates the infrastructure layer, the operating system, and
platforms, and you access the endpoints to store and retrieve data. Amazon Simple Storage
Service (Amazon S3) and Amazon DynamoDB are examples of managed services. These are also
known as abstracted services.
manufacturing execution system (MES)
A software system for tracking, monitoring, documenting, and controlling production processes
that convert raw materials to finished products on the shop floor.
MAP
See Migration Acceleration Program.
mechanism
A complete process in which you create a tool, drive adoption of the tool, and then inspect the
results in order to make adjustments. A mechanism is a cycle that reinforces and improves itself
as it operates. For more information, see Building mechanisms in the AWS Well-Architected
Framework.
member account
All AWS accounts other than the management account that are part of an organization in AWS
Organizations. An account can be a member of only one organization at a time.
MES
See manufacturing execution system.
Message Queuing Telemetry Transport (MQTT)
A lightweight, machine-to-machine (M2M) communication protocol, based on the
publish/subscribe pattern, for resource-constrained IoT devices.
microservice
A small, independent service that communicates over well-defined APIs and is typically
owned by small, self-contained teams. For example, an insurance system might include
microservices that map to business capabilities, such as sales or marketing, or subdomains,
such as purchasing, claims, or analytics. The benefits of microservices include agility, flexible
scaling, easy deployment, reusable code, and resilience. For more information, see Integrating
microservices by using AWS serverless services.
microservices architecture
An approach to building an application with independent components that run each application
process as a microservice. These microservices communicate through a well-defined interface
by using lightweight APIs. Each microservice in this architecture can be updated, deployed,
and scaled to meet demand for specific functions of an application. For more information, see
Implementing microservices on AWS.
Migration Acceleration Program (MAP)
An AWS program that provides consulting support, training, and services to help organizations
build a strong operational foundation for moving to the cloud, and to help offset the initial
cost of migrations. MAP includes a migration methodology for executing legacy migrations in a
methodical way and a set of tools to automate and accelerate common migration scenarios.
migration at scale
The process of moving the majority of the application portfolio to the cloud in waves, with
more applications moved at a faster rate in each wave. This phase uses the best practices and
lessons learned from the earlier phases to implement a migration factory of teams, tools, and
processes to streamline the migration of workloads through automation and agile delivery. This
is the third phase of the AWS migration strategy.
migration factory
Cross-functional teams that streamline the migration of workloads through automated, agile
approaches. Migration factory teams typically include operations, business analysts and owners,
migration engineers, developers, and DevOps professionals working in sprints. Between 20
and 50 percent of an enterprise application portfolio consists of repeated patterns that can
be optimized by a factory approach. For more information, see the discussion of migration
factories and the Cloud Migration Factory guide in this content set.
migration metadata
The information about the application and server that is needed to complete the migration.
Each migration pattern requires a different set of migration metadata. Examples of migration
metadata include the target subnet, security group, and AWS account.
migration pattern
A repeatable migration task that details the migration strategy, the migration destination, and
the migration application or service used. Example: Rehost migration to Amazon EC2 with AWS
Application Migration Service.
Migration Portfolio Assessment (MPA)
An online tool that provides information for validating the business case for migrating to
the AWS Cloud. MPA provides detailed portfolio assessment (server right-sizing, pricing, TCO
comparisons, migration cost analysis) as well as migration planning (application data analysis
and data collection, application grouping, migration prioritization, and wave planning). The
MPA tool (requires login) is available free of charge to all AWS consultants and APN Partner
consultants.
Migration Readiness Assessment (MRA)
The process of gaining insights about an organization’s cloud readiness status, identifying
strengths and weaknesses, and building an action plan to close identified gaps, using the AWS
CAF. For more information, see the migration readiness guide. MRA is the first phase of the AWS
migration strategy.
migration strategy
The approach used to migrate a workload to the AWS Cloud. For more information, see the 7 Rs
entry in this glossary and see Mobilize your organization to accelerate large-scale migrations.
ML
See machine learning.
modernization
Transforming an outdated (legacy or monolithic) application and its infrastructure into an agile,
elastic, and highly available system in the cloud to reduce costs, gain efficiencies, and take
advantage of innovations. For more information, see Strategy for modernizing applications in
the AWS Cloud.
modernization readiness assessment
An evaluation that helps determine the modernization readiness of an organization’s
applications; identifies benefits, risks, and dependencies; and determines how well the
organization can support the future state of those applications. The outcome of the assessment
is a blueprint of the target architecture, a roadmap that details development phases and
milestones for the modernization process, and an action plan for addressing identified gaps. For
more information, see Evaluating modernization readiness for applications in the AWS Cloud.
monolithic applications (monoliths)
Applications that run as a single service with tightly coupled processes. Monolithic applications
have several drawbacks. If one application feature experiences a spike in demand, the
entire architecture must be scaled. Adding or improving a monolithic application’s features
also becomes more complex when the code base grows. To address these issues, you can
use a microservices architecture. For more information, see Decomposing monoliths into
microservices.
MPA
See Migration Portfolio Assessment.
MQTT
See Message Queuing Telemetry Transport.
multiclass classification
A process that helps generate predictions for multiple classes (predicting one of more than
two outcomes). For example, an ML model might ask "Is this product a book, car, or phone?" or
"Which product category is most interesting to this customer?"
mutable infrastructure
A model that updates and modifies the existing infrastructure for production workloads. For
improved consistency, reliability, and predictability, the AWS Well-Architected Framework
recommends the use of immutable infrastructure as a best practice.
O
OAC
See origin access control.
OAI
See origin access identity.
OCM
See organizational change management.
offline migration
A migration method in which the source workload is taken down during the migration process.
This method involves extended downtime and is typically used for small, non-critical workloads.
OI
See operations integration.
OLA
See operational-level agreement.
online migration
A migration method in which the source workload is copied to the target system without being
taken offline. Applications that are connected to the workload can continue to function during
the migration. This method involves zero to minimal downtime and is typically used for critical
production workloads.
OPC-UA
See Open Process Communications - Unified Architecture.
Open Process Communications - Unified Architecture (OPC-UA)
A machine-to-machine (M2M) communication protocol for industrial automation. OPC-UA
provides an interoperability standard with data encryption, authentication, and authorization
schemes.
operational-level agreement (OLA)
An agreement that clarifies what functional IT groups promise to deliver to each other, to
support a service-level agreement (SLA).
operational readiness review (ORR)
A checklist of questions and associated best practices that help you understand, evaluate,
prevent, or reduce the scope of incidents and possible failures. For more information, see
Operational Readiness Reviews (ORR) in the AWS Well-Architected Framework.
operational technology (OT)
Hardware and software systems that work with the physical environment to control industrial
operations, equipment, and infrastructure. In manufacturing, the integration of OT and
information technology (IT) systems is a key focus for Industry 4.0 transformations.
operations integration (OI)
The process of modernizing operations in the cloud, which involves readiness planning,
automation, and integration. For more information, see the operations integration guide.
organization trail
A trail that’s created by AWS CloudTrail that logs all events for all AWS accounts in an
organization in AWS Organizations. This trail is created in each AWS account that’s part of the
organization and tracks the activity in each account. For more information, see Creating a trail
for an organization in the CloudTrail documentation.
organizational change management (OCM)
A framework for managing major, disruptive business transformations from a people, culture,
and leadership perspective. OCM helps organizations prepare for, and transition to, new
systems and strategies by accelerating change adoption, addressing transitional issues, and
driving cultural and organizational changes. In the AWS migration strategy, this framework is
called people acceleration, because of the speed of change required in cloud adoption projects.
For more information, see the OCM guide.
origin access control (OAC)
In CloudFront, an enhanced option for restricting access to secure your Amazon Simple Storage
Service (Amazon S3) content. OAC supports all S3 buckets in all AWS Regions, server-side
encryption with AWS KMS (SSE-KMS), and dynamic PUT and DELETE requests to the S3 bucket.
origin access identity (OAI)
In CloudFront, an option for restricting access to secure your Amazon S3 content. When you
use OAI, CloudFront creates a principal that Amazon S3 can authenticate with. Authenticated
principals can access content in an S3 bucket only through a specific CloudFront distribution.
See also OAC, which provides more granular and enhanced access control.
ORR
See operational readiness review.
OT
See operational technology.
outbound (egress) VPC
In an AWS multi-account architecture, a VPC that handles network connections that are
initiated from within an application. The AWS Security Reference Architecture recommends
setting up your Network account with inbound, outbound, and inspection VPCs to protect the
two-way interface between your application and the broader internet.
P
permissions boundary
An IAM managed policy that is attached to IAM principals to set the maximum permissions
that the user or role can have. For more information, see Permissions boundaries in the IAM
documentation.
personally identifiable information (PII)
Information that, when viewed directly or paired with other related data, can be used to
reasonably infer the identity of an individual. Examples of PII include names, addresses, and
contact information.
PII
See personally identifiable information.
playbook
A set of predefined steps that capture the work associated with migrations, such as delivering
core operations functions in the cloud. A playbook can take the form of scripts, automated
runbooks, or a summary of processes or steps required to operate your modernized
environment.
PLC
See programmable logic controller.
PLM
See product lifecycle management.
policy
An object that can define permissions (see identity-based policy), specify access conditions (see
resource-based policy), or define the maximum permissions for all accounts in an organization
in AWS Organizations (see service control policy).
polyglot persistence
Independently choosing a microservice’s data storage technology based on data access patterns
and other requirements. If your microservices have the same data storage technology, they can
encounter implementation challenges or experience poor performance. Microservices are more
easily implemented and achieve better performance and scalability if they use the data store
best adapted to their requirements. For more information, see Enabling data persistence in
microservices.
portfolio assessment
A process of discovering, analyzing, and prioritizing the application portfolio in order to plan
the migration. For more information, see Evaluating migration readiness.
predicate
A query condition that returns true or false, commonly located in a WHERE clause.
predicate pushdown
A database query optimization technique that filters the data in the query before transfer. This
reduces the amount of data that must be retrieved and processed from the relational database,
and it improves query performance.
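For illustration, a minimal sketch with Python's built-in sqlite3 module; the events table and
dates are hypothetical. The WHERE clause is the predicate, and evaluating it inside the engine
means only matching rows are returned:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, created TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?)",
                 [(1, "2020-01-15"), (2, "2024-06-01"), (3, "2019-03-09")])

# Pushed down: the predicate is evaluated by the database engine.
old_rows = conn.execute(
    "SELECT id FROM events WHERE created < '2021-01-01'").fetchall()

# Not pushed down: every row is transferred, then filtered in the application.
old_rows_slow = [r[0] for r in conn.execute("SELECT id, created FROM events")
                 if r[1] < "2021-01-01"]
assert old_rows == [(1,), (3,)] and old_rows_slow == [1, 3]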
preventative control
A security control that is designed to prevent an event from occurring. These controls are a first
line of defense to help prevent unauthorized access or unwanted changes to your network. For
more information, see Preventative controls in Implementing security controls on AWS.
principal
An entity in AWS that can perform actions and access resources. This entity is typically a root
user for an AWS account, an IAM role, or a user. For more information, see Principal in Roles
terms and concepts in the IAM documentation.
Privacy by Design
An approach in system engineering that takes privacy into account throughout the whole
engineering process.
private hosted zones
A container that holds information about how you want Amazon Route 53 to respond to DNS
queries for a domain and its subdomains within one or more VPCs. For more information, see
Working with private hosted zones in the Route 53 documentation.
proactive control
A security control designed to prevent the deployment of noncompliant resources. These
controls scan resources before they are provisioned. If the resource is not compliant with the
control, then it isn't provisioned. For more information, see the Controls reference guide in the
AWS Control Tower documentation and see Proactive controls in Implementing security controls
on AWS.
product lifecycle management (PLM)
The management of data and processes for a product throughout its entire lifecycle, from
design, development, and launch, through growth and maturity, to decline and removal.
production environment
See environment.
programmable logic controller (PLC)
In manufacturing, a highly reliable, adaptable computer that monitors machines and automates
manufacturing processes.
pseudonymization
The process of replacing personal identifiers in a dataset with placeholder values.
Pseudonymization can help protect personal privacy. Pseudonymized data is still considered to
be personal data.
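For illustration, a minimal sketch using Python's built-in hmac and hashlib modules; the
secret key and email address are hypothetical, and the key would be stored securely in
practice:

import hashlib
import hmac

SECRET_KEY = b"replace-with-a-managed-secret"

def pseudonymize(identifier: str) -> str:
    # A keyed hash maps the same identifier to the same placeholder value.
    digest = hmac.new(SECRET_KEY, identifier.encode(), hashlib.sha256)
    return "user-" + digest.hexdigest()[:12]

print(pseudonymize("jane.doe@example.com"))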
publish/subscribe (pub/sub)
A pattern that enables asynchronous communications among microservices to improve
scalability and responsiveness. For example, in a microservices-based MES, a microservice can
publish event messages to a channel that other microservices can subscribe to. The system can
add new microservices without changing the publishing service.
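For illustration, a minimal in-process sketch of the pattern in Python; in a real
microservices system, the channel would be a managed broker such as Amazon SNS:

from collections import defaultdict

subscribers = defaultdict(list)

def subscribe(channel, handler):
    subscribers[channel].append(handler)

def publish(channel, message):
    # The publisher doesn't know or care which subscribers exist.
    for handler in subscribers[channel]:
        handler(message)

subscribe("orders", lambda m: print("shipping service saw:", m))
subscribe("orders", lambda m: print("billing service saw:", m))
publish("orders", {"order_id": 42, "status": "created"})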
Q
query plan
A series of steps, like instructions, that are used to access the data in a SQL relational database
system.
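For illustration, a minimal sketch with Python's built-in sqlite3 module; SQLite exposes its
plan through EXPLAIN QUERY PLAN, and other engines provide similar EXPLAIN statements:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v TEXT)")
for step in conn.execute("EXPLAIN QUERY PLAN SELECT v FROM t WHERE id = 1"):
    print(step)  # each row describes one step, such as a primary key search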
query plan regression
When a database service optimizer chooses a less optimal plan than it did before a given
change to the database environment. This can be caused by changes to statistics, constraints,
environment settings, query parameter bindings, and updates to the database engine.
R
RACI matrix
See responsible, accountable, consulted, informed (RACI).
ransomware
A malicious software that is designed to block access to a computer system or data until a
payment is made.
RASCI matrix
See responsible, accountable, consulted, informed (RACI).
RCAC
See row and column access control.
read replica
A copy of a database that’s used for read-only purposes. You can route queries to the read
replica to reduce the load on your primary database.
re-architect
See 7 Rs.
recovery point objective (RPO)
The maximum acceptable amount of time since the last data recovery point. This determines
what is considered an acceptable loss of data between the last recovery point and the
interruption of service.
recovery time objective (RTO)
The maximum acceptable delay between the interruption of service and restoration of service.
refactor
See 7 Rs.
Region
A collection of AWS resources in a geographic area. Each AWS Region is isolated and
independent of the others to provide fault tolerance, stability, and resilience. For more
information, see Specify which AWS Regions your account can use.
regression
An ML technique that predicts a numeric value. For example, to solve the problem of "What
price will this house sell for?" an ML model could use a linear regression model to predict a
house's sale price based on known facts about the house (for example, the square footage).
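For illustration, a minimal sketch of that example; it assumes the third-party scikit-learn
package, and the training data is made up:

from sklearn.linear_model import LinearRegression

square_feet = [[1000], [1500], [2000], [2500]]  # known facts (features)
price = [200_000, 290_000, 410_000, 500_000]    # known sale prices (targets)

model = LinearRegression().fit(square_feet, price)
print(model.predict([[1800]]))  # predicted sale price for 1,800 square feet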
rehost
See 7 Rs.
release
In a deployment process, the act of promoting changes to a production environment.
relocate
See 7 Rs.
replatform
See 7 Rs.
repurchase
See 7 Rs.
resiliency
An application's ability to resist or recover from disruptions. High availability and disaster
recovery are common considerations when planning for resiliency in the AWS Cloud. For more
information, see AWS Cloud Resilience.
resource-based policy
A policy attached to a resource, such as an Amazon S3 bucket, an endpoint, or an encryption
key. This type of policy specifies which principals are allowed access, supported actions, and any
other conditions that must be met.
responsible, accountable, consulted, informed (RACI) matrix
A matrix that defines the roles and responsibilities for all parties involved in migration activities
and cloud operations. The matrix name is derived from the responsibility types defined in the
matrix: responsible (R), accountable (A), consulted (C), and informed (I). The support (S) type
is optional. If you include support, the matrix is called a RASCI matrix, and if you exclude it, it’s
called a RACI matrix.
responsive control
A security control that is designed to drive remediation of adverse events or deviations from
your security baseline. For more information, see Responsive controls in Implementing security
controls on AWS.
retain
See 7 Rs.
retire
See 7 Rs.
rotation
The process of periodically updating a secret to make it more difficult for an attacker to access
the credentials.
row and column access control (RCAC)
The use of basic, flexible SQL expressions that have defined access rules. RCAC consists of row
permissions and column masks.
RPO
See recovery point objective.
RTO
See recovery time objective.
runbook
A set of manual or automated procedures required to perform a specific task. These are
typically built to streamline repetitive operations or procedures with high error rates.
S
SAML 2.0
An open standard that many identity providers (IdPs) use. This feature enables federated
single sign-on (SSO), so users can log into the AWS Management Console or call the AWS API
operations without you having to create a user in IAM for everyone in your organization. For more
information about SAML 2.0-based federation, see About SAML 2.0-based federation in the IAM
documentation.
SCADA
See supervisory control and data acquisition.
SCP
See service control policy.
secret
In AWS Secrets Manager, confidential or restricted information, such as a password or user
credentials, that you store in encrypted form. It consists of the secret value and its metadata.
The secret value can be binary, a single string, or multiple strings. For more information, see
What's in a Secrets Manager secret? in the Secrets Manager documentation.
security control
A technical or administrative guardrail that prevents, detects, or reduces the ability of a threat
actor to exploit a security vulnerability. There are four primary types of security controls:
preventative, detective, responsive, and proactive.
security hardening
The process of reducing the attack surface to make it more resistant to attacks. This can include
actions such as removing resources that are no longer needed, implementing the security best
practice of granting least privilege, or deactivating unnecessary features in configuration files.
security information and event management (SIEM) system
Tools and services that combine security information management (SIM) and security event
management (SEM) systems. A SIEM system collects, monitors, and analyzes data from servers,
networks, devices, and other sources to detect threats and security breaches, and to generate
alerts.
security response automation
A predefined and programmed action that is designed to automatically respond to or remediate
a security event. These automations serve as detective or responsive security controls that help
you implement AWS security best practices. Examples of automated response actions include
modifying a VPC security group, patching an Amazon EC2 instance, or rotating credentials.
server-side encryption
Encryption of data at its destination, by the AWS service that receives it.
service control policy (SCP)
A policy that provides centralized control over permissions for all accounts in an organization
in AWS Organizations. SCPs define guardrails or set limits on actions that an administrator can
delegate to users or roles. You can use SCPs as allow lists or deny lists, to specify which services
or actions are permitted or prohibited. For more information, see Service control policies in the
AWS Organizations documentation.
service endpoint
The URL of the entry point for an AWS service. You can use the endpoint to connect
programmatically to the target service. For more information, see AWS service endpoints in
AWS General Reference.
service-level agreement (SLA)
An agreement that clarifies what an IT team promises to deliver to their customers, such as
service uptime and performance.
service-level indicator (SLI)
A measurement of a performance aspect of a service, such as its error rate, availability, or
throughput.
service-level objective (SLO)
A target metric that represents the health of a service, as measured by a service-level indicator.
shared responsibility model
A model describing the responsibility you share with AWS for cloud security and compliance.
AWS is responsible for security of the cloud, whereas you are responsible for security in the
cloud. For more information, see Shared responsibility model.
SIEM
See security information and event management system.
single point of failure (SPOF)
A failure in a single, critical component of an application that can disrupt the system.
SLA
See service-level agreement.
SLI
See service-level indicator.
SLO
See service-level objective.
split-and-seed model
A pattern for scaling and accelerating modernization projects. As new features and product
releases are defined, the core team splits up to create new product teams. This helps scale your
organization’s capabilities and services, improves developer productivity, and supports rapid
innovation. For more information, see Phased approach to modernizing applications in the AWS
Cloud.
SPOF
See single point of failure.
star schema
A database organizational structure that uses one large fact table to store transactional or
measured data and uses one or more smaller dimensional tables to store data attributes. This
structure is designed for use in a data warehouse or for business intelligence purposes.
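For illustration, a minimal sketch of a star schema with Python's built-in sqlite3 module;
the table names and data are hypothetical:

import sqlite3

conn = sqlite3.connect(":memory:")
# Dimension table: descriptive attributes about the measured data.
conn.execute("CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, "
             "category TEXT)")
# Fact table: quantitative measures plus foreign keys to dimension tables.
conn.execute("CREATE TABLE fact_sales (sale_id INTEGER PRIMARY KEY, "
             "product_id INTEGER REFERENCES dim_product (product_id), "
             "amount REAL)")
conn.execute("INSERT INTO dim_product VALUES (1, 'books')")
conn.execute("INSERT INTO fact_sales VALUES (100, 1, 12.50)")

# A typical query constrains and labels facts by dimension attributes.
rows = conn.execute(
    "SELECT p.category, SUM(f.amount) FROM fact_sales f "
    "JOIN dim_product p USING (product_id) GROUP BY p.category").fetchall()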
strangler fig pattern
An approach to modernizing monolithic systems by incrementally rewriting and replacing
system functionality until the legacy system can be decommissioned. This pattern uses the
analogy of a fig vine that grows into an established tree and eventually overcomes and replaces
its host. The pattern was introduced by Martin Fowler as a way to manage risk when rewriting
monolithic systems. For an example of how to apply this pattern, see Modernizing legacy
Microsoft ASP.NET (ASMX) web services incrementally by using containers and Amazon API
Gateway.
subnet
A range of IP addresses in your VPC. A subnet must reside in a single Availability Zone.
supervisory control and data acquisition (SCADA)
In manufacturing, a system that uses hardware and software to monitor physical assets and
production operations.
symmetric encryption
An encryption algorithm that uses the same key to encrypt and decrypt the data.
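For illustration, a minimal sketch that assumes the third-party cryptography Python package;
Fernet is a symmetric scheme, so one key serves both operations:

from cryptography.fernet import Fernet

key = Fernet.generate_key()
token = Fernet(key).encrypt(b"confidential")
assert Fernet(key).decrypt(token) == b"confidential"  # same key both ways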
synthetic testing
Testing a system in a way that simulates user interactions to detect potential issues or to
monitor performance. You can use Amazon CloudWatch Synthetics to create these tests.
T
tags
Key-value pairs that act as metadata for organizing your AWS resources. Tags can help you
manage, identify, organize, search for, and filter resources. For more information, see Tagging
your AWS resources.
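For illustration, a minimal sketch with the boto3 Python SDK; it assumes configured AWS
credentials, and the instance ID and tag values are hypothetical:

import boto3

ec2 = boto3.client("ec2")
ec2.create_tags(
    Resources=["i-0123456789abcdef0"],  # hypothetical EC2 instance ID
    Tags=[
        {"Key": "environment", "Value": "archive"},
        {"Key": "owner", "Value": "data-team"},
    ],
)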
target variable
The value that you are trying to predict in supervised ML. This is also referred to as an outcome
variable. For example, in a manufacturing setting the target variable could be a product defect.
task list
A tool that is used to track progress through a runbook. A task list contains an overview of
the runbook and a list of general tasks to be completed. For each general task, it includes the
estimated amount of time required, the owner, and the progress.
test environment
See environment.
training
To provide data for your ML model to learn from. The training data must contain the correct
answer. The learning algorithm finds patterns in the training data that map the input data
attributes to the target (the answer that you want to predict). It outputs an ML model that
captures these patterns. You can then use the ML model to make predictions on new data for
which you don’t know the target.
transit gateway
A network transit hub that you can use to interconnect your VPCs and on-premises
networks. For more information, see What is a transit gateway in the AWS Transit Gateway
documentation.
trunk-based workflow
An approach in which developers build and test features locally in a feature branch and then
merge those changes into the main branch. The main branch is then built to the development,
preproduction, and production environments, sequentially.
trusted access
Granting permissions to a service that you specify to perform tasks in your organization in AWS
Organizations and in its accounts on your behalf. The trusted service creates a service-linked
role in each account, when that role is needed, to perform management tasks for you. For more
information, see Using AWS Organizations with other AWS services in the AWS Organizations
documentation.
tuning
To change aspects of your training process to improve the ML model's accuracy. For example,
you can train the ML model by generating a labeling set, adding labels, and then repeating
these steps several times under different settings to optimize the model.
two-pizza team
A small DevOps team that you can feed with two pizzas. A two-pizza team size ensures the best
possible opportunity for collaboration in software development.
U
uncertainty
A concept that refers to imprecise, incomplete, or unknown information that can undermine the
reliability of predictive ML models. There are two types of uncertainty: Epistemic uncertainty
is caused by limited, incomplete data, whereas aleatoric uncertainty is caused by the noise and
randomness inherent in the data. For more information, see the Quantifying uncertainty in
deep learning systems guide.
undifferentiated tasks
Also known as heavy lifting, work that is necessary to create and operate an application but
that doesn’t provide direct value to the end user or provide competitive advantage. Examples of
undifferentiated tasks include procurement, maintenance, and capacity planning.
upper environments
See environment.
V
vacuuming
A database maintenance operation that involves cleaning up after incremental updates to
reclaim storage and improve performance.
version control
Processes and tools that track changes, such as changes to source code in a repository.
VPC peering
A connection between two VPCs that allows you to route traffic by using private IP addresses.
For more information, see What is VPC peering in the Amazon VPC documentation.
vulnerability
A software or hardware flaw that compromises the security of the system.
W
warm cache
A buffer cache that contains current, relevant data that is frequently accessed. The database
instance can read from the buffer cache, which is faster than reading from the main memory or
disk.
warm data
Data that is infrequently accessed. When querying this kind of data, moderately slow queries
are typically acceptable.
window function
A SQL function that performs a calculation on a group of rows that relate in some way to the
current record. Window functions are useful for processing tasks, such as calculating a moving
average or accessing the value of rows based on the relative position of the current row.
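For illustration, a minimal sketch of a moving average, run through Python's built-in sqlite3
module (window functions require SQLite 3.25 or later); the table and values are hypothetical:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (day INTEGER, value REAL)")
conn.executemany("INSERT INTO readings VALUES (?, ?)",
                 [(1, 10.0), (2, 20.0), (3, 30.0), (4, 40.0)])

# AVG over a sliding window of the current row and the two before it.
rows = conn.execute(
    "SELECT day, AVG(value) OVER (ORDER BY day "
    "ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) FROM readings").fetchall()
# rows == [(1, 10.0), (2, 15.0), (3, 20.0), (4, 30.0)]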
workload
A collection of resources and code that delivers business value, such as a customer-facing
application or backend process.
workstream
Functional groups in a migration project that are responsible for a specific set of tasks. Each
workstream is independent but supports the other workstreams in the project. For example,
the portfolio workstream is responsible for prioritizing applications, wave planning, and
collecting migration metadata. The portfolio workstream delivers these assets to the migration
workstream, which then migrates the servers and applications.
WORM
See write once, read many.
WQF
See AWS Workload Qualification Framework.
write once, read many (WORM)
A storage model that writes data a single time and prevents the data from being deleted or
modified. Authorized users can read the data as many times as needed, but they cannot change
it. This data storage infrastructure is considered immutable.
Z
zero-day exploit
An attack, typically malware, that takes advantage of a zero-day vulnerability.
zero-day vulnerability
An unmitigated flaw or vulnerability in a production system. Threat actors can use this type of
vulnerability to attack the system. Developers frequently become aware of the vulnerability as a
result of the attack.
zombie application
An application that has an average CPU and memory usage below 5percent. In a migration
project, it is common to retire these applications.