SUPPORT POOL
OF EXPERTS PROGRAMME
AI Auditing
Checklist for AI Auditing
by Dr. Gemma GALDON CLAVELL
As part of the SPE programme, the EDPB may commission contractors to provide reports and tools
on specific topics.
The views expressed in the deliverables are those of their authors and they do not necessarily reflect
the official position of the EDPB. The EDPB does not guarantee the accuracy of the information
included in the deliverables. Neither the EDPB nor any person acting on the EDPB’s behalf may be
held responsible for any use that may be made of the information contained in the deliverables.
Some excerpts may be redacted or removed from the deliverables as their publication would
undermine the protection of legitimate interests, including, inter alia, the privacy and integrity of an
individual regarding the protection of personal data in accordance with Regulation (EU) 2018/1725
and/or the commercial interests of a natural or legal person.
Table of Contents
1. Introduction
2. Scope of algorithmic auditing
3. Auditing process
3.1. Model card
3.2. System map
3.3. Moments and sources of bias
3.4. Bias testing
3.5. Adversarial audit (optional)
4. The audit report
Document submitted in January 2023
1. Introduction
Algorithmic auditing is a way to inspect AI systems in their specific contexts. It is an approach and
methodology that allows for a dynamic appraisal of regulation, standards and impacts. If its results
are public, it is also a tool for transparency and accountability.
AI audits are key tools for regulators and society, who can use audit reports to assess how systems
work and their impacts. But they are also useful for those developing and acquiring AI systems. An
end-to-end, socio-technical approach like the one proposed here generates documentation that
improves system accountability, organizational memory and compliance with AI and data regulations.
For those acquiring and incorporating AI systems into their operations, audits provide crucial evidence
that enables due diligence and a proper assessment and comparison of the characteristics of
different systems and vendors.
The AI audit checklist proposed is specifically focused on AI impacts. This means that while it gathers
information on compliance and trust and safety mechanisms both before and after an AI system is
launched, the focus of the audit is to validate that AI developers and implementors have taken all
necessary measures, at all different stages, to make sure that the impacts of their systems are in line
with existing laws, trust and safety best practices and societal expectations.
It should be noted that an audit process carried out in the framework of the controller's implementation
of the accountability principle and an inspection/investigation carried out by a Supervisory Authority
may differ. Such differences lie, among other things, in the final purpose of the two activities (the SA
may carry out an inspection to obtain evidence of an infringement), in the scope (limited to the GDPR,
which applies to personal data processing activities but not to technologies as such) and in the national
regulations governing inspections by control authorities.
This document includes a methodology in the form of a checklist to perform an audit of an AI system.
We define an AI system as a logic with a specific outcome. An AI system may be composed of several
algorithms, and an AI service or product may include several AI systems.
In addition, it should be noted that there are different techniques for developing artificial intelligence:
neural networks, rule-based systems, fuzzy logic, machine learning, expert systems, adaptive systems,
genetic algorithms, multi-agent systems, etc.
This document is focused on auditing an algorithm for artificial intelligence based on machine learning,
whose life cycle is divided into three stages that are fully independent from the point of view of data
processing: algorithm training (pre-processing), the operation of the algorithm implementing one or
more operations in the framework of a personal data processing activity (in-processing inference), and
the decision-making and its impact on the processing (post-processing model deployment). A fourth
stage could be added: the evolution of the algorithm. Each of these stages could be a different
processing activity and could involve different controllers.
2. Scope of algorithmic auditing
An end-to-end, socio-technical algorithmic audit (E2EST/AA) should inspect a system in the actual
implementation, processing activity and running context, looking at the specific data used and the
data subjects impacted. It is an end-to-end approach because it recognizes that algorithmic systems
work with data produced by complex and imperfect individuals and societies, and operate and
intervene in complex social and organizational contexts. Thus, AI systems are deeply socio-technical,
and a focus on technical issues would fail to incorporate both problems and possibilities for system
improvement and impact testing that go beyond in-processing. In fact, most of the E2EST/AA focuses
on pre-processing and post-processing stages of that algorithmic life-cycle. Models and systems that
have optimal performance and accuracy rates in-processing may perform in inefficient or harmful
ways when audited end-to-end and using social and technical means.
The E2EST/AA process is designed to inspect algorithmic systems used in ranking, image recognition
and natural language processing. It works with systems that make decisions on individuals or groups
based on known data sources, regardless of whether they use machine learning or classic computing.
This definition includes most systems used by the public and private sectors to make decisions on
resource allocation, categorization and identification/verification in sectors such as health, education,
security, finance and for applications like fraud detection, hiring, operations management, or
prediction/risk assessment.
The E2EST/AA is focused on bias assessment, but not limited to it. The methodology to carry out an
E2EST/AA incorporates questions related to broader social impact and desirability, as well as the
incorporation of end-users in the design process and the existence of recourse mechanisms for those
impacted by algorithmic systems. For a system to pass an algorithmic audit, issues of impact,
proportionality, participation and recourse must be tackled.
A clear limitation of any audit process is that it is based on an existing system. This means that an audit
methodology does not prompt a reflection on whether a system should exist in the first place.
3. Auditing process
An E2EST/AA is an iterative process of interaction between the auditor/s and the development team/s.
The method provides templates and instructions to guide such interaction, specifying the data inputs
that are necessary for auditors to complete the assessment and validate results.
3.1. Model card
Model cards are documents designed to compile information about the training and testing of AI
models, as well as the features and the motivations of a given dataset or algorithmic model. A sample
model card like the one proposed below can be used and slightly adapted to different systems or
compliance concerns.
General information
o System name/code and version (5.2 GDPR)
o Leaflet version and version history (5.2 GDPR)
o System owner and suppliers data
o Suppliers’ role
o Risk level (AI Act)
o Governance roles (Chapter IV GDPR)
o Distribution date (5.2 GDPR)
o Existing documentation
Information on process
o Description of intended purposes, uses, context and role/service provided (Article 5.1.b,
5.2 and 24.1 GDPR)
o Stakeholder involvement
o Organizational context
o Human role/s
Information on training/validation data
o Data sources/collection methodology (Articles 5 and 9 GDPR)
o Data types and characteristics (Article 5.1.a, b GDPR)
o Privacy by Design (Article 25 GDPR)
o Datasets (Article 5.1.a, b GDPR)
Information on the model
o Method/s used and justification
o Simplified output/s
o Decision variables
o Objective function/s (Article 5.1.d GDPR)
Information on bias and impacts (in lab/operational settings)
o Metrics (Articles 5.1.a and 5.1.b GDPR)
o Protected categories (Articles 13.1.e, 14.1.e and 35.9 GDPR)
o Impact rates per category and profile (Article 5.1.d GDPR)
o Auditability (Articles 5 and 22 GDPR)
Information on redress:
o Explainability profiling (Recital 71 GDPR)
o Redress or review (Articles 13.2.f, 14.2.g and 15 GDPR)
o Redress metrics, if applicable
The Model Card gives the auditor an initial picture of the system and of the available information; it is
therefore crucial for determining which issues need to be further explored and for questioning system
developers on how some of the information provided was determined. It is also a useful way to gather
all existing documentation on the system.
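By way of illustration only, the model card can also be kept as a machine-readable record alongside the documentation. A minimal sketch in Python follows; the field names and example values are hypothetical and mirror only part of the checklist above, not a prescribed schema:

```python
# Minimal, machine-readable model card sketch. The structure is illustrative,
# not a prescribed schema; fields mirror part of the checklist above.
from dataclasses import dataclass, field

@dataclass
class ModelCard:
    # General information (Article 5.2 GDPR)
    system_name: str
    version: str
    distribution_date: str
    system_owner: str
    risk_level: str                      # e.g. an AI Act risk category
    # Information on process
    intended_purpose: str
    human_roles: list[str] = field(default_factory=list)
    # Information on training/validation data
    data_sources: list[str] = field(default_factory=list)
    # Information on bias and impacts
    protected_categories: list[str] = field(default_factory=list)
    metrics: dict[str, float] = field(default_factory=dict)

# Hypothetical example values.
card = ModelCard(
    system_name="demand-forecast",
    version="2.1.0",
    distribution_date="2023-01-15",
    system_owner="ACME Mobility Ltd.",
    risk_level="limited",
    intended_purpose="Predict ride demand to adjust pricing",
    human_roles=["pricing analyst reviews model output daily"],
    data_sources=["internal trip logs 2019-2022"],
    protected_categories=["gender", "age"],
    metrics={"accuracy": 0.91},
)
print(card.system_name, card.version)
```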
Specifically, the inspector should record the existence of the following documents, noting for each
whether it is available or not applicable (N/A):
o DPIA/HRIA — link to document or metadata (author/s, date, etc.)
o Data reuse permissions/authorizations — link to documents
o Data sharing agreements — link to documents
o Ethics/IRB approval — link to request and approval documents
o DPA approval — link to request and approval documents
o Transparency report — link to documents
o Academic paper/s — full references
o GitHub/public repositories — link
3.2. System map
The system map puts the algorithm in context, establishing the relationships and interactions between
an algorithmic model, a technical system and a decision-making process. A first version can be
designed by the auditor/s following the information provided in the MC, to be completed and
validated by the development team/s.
Model: The model is the trained algorithm, that is, the rules adapted to a particular domain, which
constitute the foundation of the technology we audit. Models are subject to performance evaluation
and testing, and can be compared to each other via benchmark datasets. The model is the core of an
AI system, but it usually relies on other elements (e.g. data pipelines, visualization platforms) to work.
System: The system here refers to the entire technology. For a mobility service, it could be the app
that integrates a Machine Learning (ML) model to predict demand and adjust pricing, including the UI
and, for example, the data pipelines and protocols.
Process: By process we mean the entire lifecycle of any unit of work, from the moment it enters the
workflow all the way to the decision and, if part of the process, the actual way the decision is
used.
Specifically, in the framework of an investigation carried out by a Supervisory Authority, the SA should
record the existence of:
IDENTIFICATION AND TRANSPARENCY OF THE AI-BASED COMPONENT
Inventory of the audited AI-based component [Article 5.2]
Look for evidence to check, at least, the following questions:
o Is the AI-based component identified in the documentation by means of a name or
code, identification of version and date of creation?
o Do the code and any additional files defined by the version include a digital signature
over the entire package to guarantee its integrity (see the integrity-check sketch after
this list)?
o Is a version history of the evolution of the AI component available?
o Does every version recorded include the parameters used in the training of the
component and everything that ensures the traceability of the evolution/changes in
the component?
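By way of illustration, a digest over the release package can support this integrity check; a full digital signature would additionally involve a private key (e.g. via GPG), which is beyond this sketch. A minimal hash-based version in Python, with hypothetical paths and values:

```python
# Minimal integrity check: compute a single SHA-256 digest over all files in a
# release package (sorted for determinism) and compare it to the recorded one.
import hashlib
from pathlib import Path

def package_digest(package_dir: str) -> str:
    h = hashlib.sha256()
    for f in sorted(Path(package_dir).rglob("*")):
        if f.is_file():
            h.update(f.name.encode())   # bind file names into the digest
            h.update(f.read_bytes())    # and file contents
    return h.hexdigest()

# Hypothetical values: the recorded digest would be stored with the version
# history at release time; the path is invented for illustration.
recorded = "<digest stored with the version record>"
current = package_digest("./model_release_2.1.0")
print("integrity OK" if current == recorded else "package was modified")
```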
Identification of responsibilities [Chapter IV]
Look for evidence to check, at least, the following questions:
o Is there an identification of the person(s) or institution(s) who manage the life cycle
stages of the AI-based component?
o Is there an identification of the associated managers and of the representatives of the
controller and of the processor at every life cycle stage?
o Does every contract associated to each processing stage specify the distribution of
responsibilities with regard to personal data protection?
o Has every contract associated to each processing stage been audited?
o Is there a registration in the Records of Processing Activities of the corresponding
controllers and processors?
o Is a Data Protection Officer appointed? If not, why?
o Has the Data Protection Officer been identified and his or her identity communicated
to the relevant Supervisory Authority?
Transparency [Article 5.1.a and Chapter III - Section 1, Articles 13.2.f and 14.2.g of Chapter
III - Section 2].
Look for evidence to check, at least, the following questions:
o Are data sources documented?
o Has an information mechanism been implemented?
o Are the characteristics of data used to train the AI component identified, documented
and duly justified?
o Is the model chosen for the AI-based component appropriate in terms of simplicity and
intelligibility, considering efficiency, quality and accuracy?
o Is the algorithm code explainability documented in order to facilitate its readability,
logic comprehension and internal consistency?
o Does the algorithm code documentation include information regarding the metadata of
the AI-based component, its logic and the consequences that may arise from its use,
and is this information accessible to data subjects together with the means or
mechanisms available to exercise their rights in case of objections to the results?
o Does the algorithm code documentation include information about its behaviour
regarding input data sets, data use, intermediate data and output data?
o Can input data sets, data use, intermediate data and output data be traced?
o In case of erroneous behaviour of the AI-based component that could cause harm to
data subjects, have mechanisms been established to minimise such damage? Are
communication channels provided to facilitate communication among all stakeholders
involved in the process?
PURPOSE OF THE AI-BASED COMPONENT
Identification of intended purposes and uses [Article 5.1.b].
Look for evidence to check, at least, the following questions:
o Is the intended purpose of the AI-based component documented both in quantitative
and qualitative terms?
o Is the use of the AI component related to the ultimate purpose of the processing and
to the conditions guaranteeing the lawfulness of such processing?
o Are the different dynamics, activities and/or processes within the organization in
which the life cycle stage of the audited AI component is integrated identified,
delimiting the context of use as much as possible?
o Are potential users of the AI-based component categorized?
o Are other possible uses and secondary users for the AI component? Have been
described together with the legal grounds for its use?
Definition of the intended context of the AI-based component [Article 24.1]
Look for evidence to check, at least, the following questions:
o Are there any legal, social, economic, organizational, technical, scientific or other
contexts identified related to the inclusion of the AI-based component in the
processing? Are they documented?
o Is the organisational and/or contractual structure between the parties defined?
o Are the tasks and responsibilities distributed through the structure?
o Are the determining factors of the efficacy of the AI component described (including
legal guarantees, applicable laws and regulations, organizational and technical
resources, available data and internal dynamics that personal data processing needs
to implement the audited AI-based component with the appropriate guarantees)?
o Are the requirements applicable to human operators in charge of supervising and
interpreting the operation of the AI-based component defined?
o Is there any interaction between the AI-based component and other own or third-
party components, systems or applications? Are responsibilities for maintenance,
updating and minimising system privacy issues distributed and documented?
o Are levels or thresholds defined for interpreting and using the inference results?
o Are the contexts in which the AI-based component is not recommended for a
processing activity defined (in terms of its purpose or characteristics, or when it
represents an inadequate level of reliability and/or accuracy with regard to the rest of
the processing)?
Analysis of proportionality and necessity [Article 35.7.b]
Look for evidence to check, at least, the following questions:
o Has the use of the AI component been assessed against other possible options from
an approach focusing on the rights and freedoms of data subjects?
o In the case of new developments, has a comparative analysis of the efficiency and
adequacy of results of the AI-based component been carried out against other, more
thoroughly tested components which use stricter minimisation criteria or involve fewer
risks for the rights and freedoms of persons, especially those that make less intensive
use of special data categories?
o In case of addressing a new issue, have the motivations and grounds for addressing
this issue by using an AI-based component been documented?
o When addressing a well-known problem, have the grounds that led to a change in the
previous mode of operation been documented, including a description of the new
control objectives pursued by using the AI component in the framework of the
procedure?
o Has the risk to the rights and freedoms of data subjects introduced by using an AI-
based component in data processing been analysed and managed?
Definition of the potential recipients of data [Chapter III; specially Articles 13.1.e and
14.1.e]
Look for evidence to check, at least, the following questions:
o Is the information obligation to data subjects identified regarding data processing
arising from the inclusion of the AI-based component?
o Are such obligations identified both for data directly obtained from data subjects and
for data obtained from other sources of information?
o When determining such obligations:
- are the recipients or categories of recipients to whom the personal data
processed by the AI-based component were or are to be communicated
identified (including those in third countries or international organizations)?
- are the intentions of the controller to transfer personal data to a recipient
in a third country or international organization, and the existence or absence
of a Commission adequacy decision, identified?
o Are data recipients, including those from third countries or international
organizations, identified under the activity or activities recorded in the Records of
Processing Activities in which the relevant AI-based component is included?
Limitation of data storage [Article 5.1.e, exceptions Article 89.1]
Look for evidence to check, at least, the following questions:
o Are the legal grounds identified for storing personal data used by the AI-based
component for a period of time that exceeds the period established for the processing
purposes (especially when the storage relates to compatible purposes or falls under
any of the exceptions provided for in the regulations)?
o Is it justified to store personal data once they have been processed in any life cycle
stage of the AI-based component?
o Have appropriate technical and organizational measures and criteria been defined to
store personal data?
o Are the time limits for erasure of stored personal data defined?
o Has a conservation policy been defined to keep a sample of training data for the
purpose of auditing the AI component? Does it consider the minimum or assumable
risks for the data subjects?
o Are there procedures to verify storage periods, criteria and implemented measures?
o For those cases where an excessive pattern of data storage has been detected, either
in terms of time or quantity, has a procedure for reviewing the analysis of the need
and the proportionality of data storage been defined?
o Has a storage policy for personal data included in the activity records of the AI-based
component and privacy strategies (minimisation, hiding, separation or abstraction)
been defined? Has it been implemented for operation purposes?
Analysis of categories of data subjects [Article 35.9]
Look for evidence to check, at least, the following questions:
o Are the categories of data subjects affected by the development of the AI component
and its use in the framework of the intended processing identified?
o Are the short- and long-term consequences that the implementation of the AI
component may have on the categories of data subjects identified?
o Is there any procedure that analyses the social context in which the AI component is
used, collecting information from people, groups or organizations affected by such AI
component for the purposes of knowing their levels of satisfaction, position, concerns
and uncertainties regarding the application of this technique for processing their
data?
BASES OF THE AI COMPONENT
Identification of the AI-based component development policy [Article 24.1]
Look for evidence to check, at least, the following questions:
o Do the documents with development policies of products and systems consider the
data protection policy?
o Are the policies reviewed and version controlled?
Involvement of the Data Protection Officer (DPO) [Section 4 of Chapter IV]
Look for evidence to check, at least, the following questions:
o Does the DPO have the necessary professional qualifications and, particularly, the
legal and technical expertise, as well as data protection practice appropriate to the
project?
o Is the DPO assisted and advised by experts on specific matters relating to the AI
component?
o Are internal procedures defined within the organisation for correct communication
between the DPO and the people in charge of projects that may have an impact on
data processing, in order to obtain assistance, particularly when developing the data
protection impact assessment for those processing activities which include AI-based
components?
o Has the DPO played an active role in the stages being audited? Have his or her
independence of judgment within the organisation and his or her obligation to
cooperate with the supervisory authorities been respected, and his or her opinions,
remarks and considerations taken into account?
Adjustment of basic theoretical models [Article 5.1.a]
Look for evidence to check, at least, the following questions:
o Has an analysis been carried out regarding the theoretical framework and previous
similar experience on which the development of the AI component is based?
o Have the basic hypotheses and premises considered in order to create and develop the
relevant model been accurately described, justified and documented?
o Is a critical and verified procedural revision defined for the reasoning arising from the
acceptance of hypotheses important for the development of the AI-based component
(e.g. examining the arguments for a causal relationship that models an algorithm,
such as the selection of variables defining a certain phenomenon)?
o Have appropriate premises been established regarding the potential proxy variables
intervening in the AI-based components after carrying out a careful analysis?
Appropriateness of the methodological framework [Article 5.1.a]
Look for evidence to check, at least, the following questions:
o Is there proper documentation that includes the methodological framework for
defining the model and creating the AI component in the audited stages, such as the
methods for selecting, collecting and preparing the component's training data, labelling,
model building, using intermediate data, selecting the test/validation data subset or
measuring deviations for improvement purposes?
o Is the development model to be used properly determined, depending on the results of
the analysis of the problem to be solved, and in a justified way (e.g. supervised,
unsupervised or other)? In the case of supervised models, does it specify the procedure
for supervising the learning process of the algorithm, the degree of supervision and
the basis for such supervision?
o Are the metrics for measuring the behaviour of the AI component duly selected and
measured?
o Has a procedure been implemented for recording and monitoring deviations in the
behaviour of the AI component with respect to the defined metrics, allowing
identification of the circumstances which may result in erroneous or biased behaviour?
Identification of the basic architecture of the AI-based component [Article 5.2]
Look for evidence to check, at least, the following questions:
o Does the project analysis phase of the AI-based component include, as part of the
requirement catalogue, a series of specific requirements to guarantee privacy and
personal data protection?
o Is there documentation which assures that, when programming AI-based components,
coding principles and best practices are followed in order to guarantee that the code
is readable, secure, low-maintenance and robust?
o Is the basic architecture of the AI component identified and documented? This must
include information on the chosen machine learning technique, the type(s) of
algorithms tested and, when appropriate, dismissed at the learning and training
stages, and other data on the functioning of the relevant component, such as the
model loss function or cost function.
o Does a systematic procedure for documenting the component implementation exist,
and is it implemented? Such a procedure must guarantee the registration and
subsequent retrieval of all information necessary to identify the component, its
elements and its environment, to understand what it does and why, and to verify
code quality and legibility for auditing purposes: description of the programming
language(s) used, most recent code version, commented code, necessary packages
and libraries, interfaces with other components and, when appropriate, the APIs
used, as well as useful documents such as requirements specifications, functional
and organic analyses, guidelines, etc.
o Is the AI component code impossible to access? If so, is a reverse-engineering process
or another alternative method used (e.g. a zero-knowledge proof (ZKP))? A reverse-
engineering process makes it possible to learn more about the component's function
and to establish the logic of the rules applied, in order to detect inconsistencies,
direct manipulations and underestimation or overestimation of the variables used in
the original component.
3.3. Moments and sources of bias
Bias refers to a deviation from the standard. As such, and in technical terms, bias may be needed and
desirable. In the context of AI accountability, however, “bias” has become a hypernym or umbrella
term for lack of fairness and discrimination in data processes which result in individual and/or
collective harms. By identifying and mitigating bias, we can ensure or protect fairness in AI systems.
Bias is the result of many factors, social and technical: from systematic errors introduced by
algorithmic design choices, dirty data, sampling procedures, reporting protocols, or wrong
assumptions that cause a mismatch between the input features and the target outputs. To date, most
studies on bias have focused on historical and aggregation bias, that is, the need to identify protected
groups and calculate disparate treatment and impact. This is at the heart of the E2EST/AA
methodology. However, bias and inefficiencies can emerge at other times, and a focus on historical
and aggregation bias alone will lead to incomplete and therefore harmful assessments of bias. This
will result in rights violations, stereotyping, bad or inefficient decisions, discrimination of individuals
and groups, and the reproduction of processes of inequality and dispossession. Partial or wrongful
identification of bias sources and inadequate mitigation measures will lead to unacceptable societal
harms and compliance risks.
The E2EST/AA distinguishes between moments and sources of bias. This provides the auditor with an
overview of the possible causes of a given disparate impact, understood not only as an individual
function of accuracy or performance but also as a general measure of (lack of) fairness in an
algorithmic process. The E2EST/AA method defines and identifies moments and sources of bias,
establishes the documents and tests needed to assess compliance with legal and social requirements,
provides an opportunity to address and mitigate inefficiencies and harms, and provides a measure for
overall system fairness and impact.
AI life-cycle: pre-processing → in-processing (model inference) → post-processing (model deployment)

Moments of bias
o Pre-processing: World → Data; Data → Population; Population → Sample; Sample → Variables + Values
o In-processing: Variables + Values → Patterns; Patterns → Predictions
o Post-processing: Predictions → Decisions; Decisions → World

Sources of bias
o Pre-processing: Techno-solutionist bias; Selection bias; Historical bias; Label bias; Generalization bias; Statistical bias; Oversimplification, partial or biased featurization; Omitted variable
o In-processing: Over and underfitting; Measurement bias; Hot hand fallacy; Privacy bias; Aggregation bias
o Post-processing: Benchmark test bias; Data visualization; Automation bias; Deployment bias
Specifically, in the framework of an investigation carried out by a Supervisory Authority, the SA should
record the existence of:
DATA MANAGEMENT
Data quality assurance [Article 5.1]
Look for evidence to check, at least, the following questions:
o Is there a documented procedure to manage and ensure proper data governance,
which allows verification and provides guarantees of the accuracy, integrity, veracity,
currency and adequacy of the datasets used for training, testing and
operation?
o Are there supervisory mechanisms for data collection, processing, storage and use
processes?
o Has a prior analysis been carried out, together with a measurement of the sample
used for training the relevant model? Has the sample size been verified as adequate?
Has it been verified that the frequency and distribution of each feature, their
intersections and the relevant groups for the study are appropriate with regard to the
defined parameters or to reality (a representativeness-check sketch follows this list)?
o Has the learning process been analysed, both at the beginning and in each iteration of
the global learning process, and on the sample used to train the model? Has it been
verified that the final dataset is representative with respect to the population of the
context to which the AI-based component is oriented and that the groups defined by
said AI component are appropriate?
o Is the feature distribution appropriate, so that the AI component is neither especially
sensitive to any feature nor ignores any of them?
o Are there procedures to analyse, measure and detect any possible imbalances
between the amount of data that the component collects on one feature with
respect to another, which may lead to behaviour deviations?
o Has an accurate trade-off analysis been carried out in order to establish the
relationship between the amount and type of data to be collected/discarded and the
data necessary to guarantee that the AI component is effective and efficient?
o Has a sample size analysis been carried out regarding data storage for audit
purposes?
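As referenced in the list above, a simple representativeness screen for one categorical feature can compare the training-sample counts against known population proportions. A minimal sketch using a chi-square goodness-of-fit test, with invented figures:

```python
# Screen one categorical feature for representativeness: chi-square
# goodness-of-fit of training-sample counts against population proportions.
from collections import Counter
from scipy.stats import chisquare

# Invented example: feature values observed in the training sample, and
# assumed reference proportions for the population the system will serve.
sample = ["urban"] * 700 + ["suburban"] * 200 + ["rural"] * 100
population_props = {"urban": 0.55, "suburban": 0.30, "rural": 0.15}

counts = Counter(sample)
categories = list(population_props)
observed = [counts[c] for c in categories]
expected = [population_props[c] * len(sample) for c in categories]

stat, p_value = chisquare(observed, f_exp=expected)
print(f"chi2={stat:.1f}, p={p_value:.4f}")
if p_value < 0.05:  # illustrative significance level
    print("sample distribution deviates from the reference population")
```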
Definition of the origin of the data sources [Articles 5 and 9]
Look for evidence to check, at least, the following questions:
o Have the origin and context of the data sources used for training and validating the
model been identified?
o Is there documentation that justifies the selection process of the data sources used to
train the relevant AI-based component?
o Are the legal grounds to use personal data in the different stages of the AI-based
component life cycle identified?
o Is there a justification to collect and use personal data when such data are not
necessary in the training stage, in order to test the model behaviour in the subsequent
stages of component verification and validation?
o If sensitive personal data are processed, has the need for their use been assessed, and
do specific circumstances justify lifting the general prohibition on processing such data?
Preprocessing of personal data [Article 5]
Look for evidence to check, at least, the following questions:
o Are the criteria for carrying out prior cleansing of the original data sets, and any other
tasks needed throughout the different iterations of the AI training process, duly
identified and documented?
o Are data cleaning techniques and best practices used in the data cleansing process
properly selected and documented?
o Do classifying features define clearly distinguishable and identifiable types?
o Are the structure and properties of the processed data set documented, including the
number of data subjects and the extent of the data used?
o Have data been previously classified into categories, organizing them into non-personal
and personal data and, for the latter, identifying which fields constitute identifiers,
quasi-identifiers and special data categories?
o Have the relevant features for the model been determined (identifying those
associated with special data categories and proxy variables, including the information
necessary for their interpretation)?
o Have data minimisation criteria been determined and applied to the different stages
of the AI component, using strategies such as data hiding, separation, abstraction,
anonymisation and pseudonymisation as applicable, for the purposes of
maximising privacy in the operation of the relevant AI-based component?
o Do databases have an associated data dictionary for their analysis and understanding?
o Have segregation and de-identification strategies been implemented on additional
information that is not required for training purposes but will be required in the
verification and validation of the model's behaviour? This information is needed to
analyse correlations between variables, measure the degree of accuracy of the AI
component with regard to certain attributes and ensure that no biases are introduced.
o Have data selection and assessment been carried out with the involvement of an
expert in modelling techniques and data science?
o Have training and validation data been previously pre-processed and cleaned in order
to detect any possible abnormality which may require prior processing (e.g.
boundary values, incomplete records, etc.) and to convert any heterogeneous data
sources to a homogeneous format?
o If input data are not appropriate for the functioning of the AI component, or not
representative of the reality they are intended to reflect, have the necessary
modifications been introduced in the format of these data?
o When necessary, has a data anonymisation analysis, including the possible risk of re-
identification, been carried out (a basic k-anonymity sketch follows this list)?
o When necessary, if data imputation techniques have been used to complete the
information of the data set, have the procedures and algorithms used for such
imputation been documented?
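As referenced above, a basic re-identification screen is a k-anonymity count over the quasi-identifiers identified during data classification: groups of size 1 are directly re-identifiable. A minimal sketch with invented data and column names:

```python
# Basic k-anonymity screen: group records by their quasi-identifiers and
# report the smallest group size k; k == 1 means a record is unique.
import pandas as pd

# Invented dataset with two quasi-identifiers.
df = pd.DataFrame({
    "zip": ["08001", "08001", "08002", "08002", "08003"],
    "age_band": ["30-39", "30-39", "30-39", "40-49", "40-49"],
})

group_sizes = df.groupby(["zip", "age_band"]).size()
k = group_sizes.min()
print(f"dataset satisfies {k}-anonymity over the chosen quasi-identifiers")
print(group_sizes[group_sizes == 1])  # uniquely identifiable combinations
```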
Bias control [Article 5.1.d]
Look for evidence to check, at least, the following questions:
o Have appropriate procedures been defined in order to identify and remove, or at least
limit, any bias in the data used to train the relevant model (a simple screen is
sketched after this list)?
o Has it been verified that the training data did not contain pre-existing biases?
o Is there a procedure to assess the need to have additional data for improving precision
or removing any possible bias?
o Are there human supervision mechanisms implemented in order to control and ensure
that results are bias-free?
o Are mechanisms implemented to enable data subjects to request human intervention,
provide feedback or refute the results obtained by means of automated decision-
making algorithms?
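As a sketch of one such screen (one among many; a real bias assessment goes well beyond this), the prevalence of the positive label can be compared across protected groups before training. Data and threshold below are invented:

```python
# Compare positive-label prevalence across groups in the training data.
# Large gaps flag potential historical/label bias that needs investigation.
import pandas as pd

# Invented training data: protected attribute + target label.
df = pd.DataFrame({
    "group": ["A"] * 600 + ["B"] * 400,
    "label": [1] * 300 + [0] * 300 + [1] * 120 + [0] * 280,
})

prevalence = df.groupby("group")["label"].mean()
print(prevalence)              # A: 0.50, B: 0.30
gap = prevalence.max() - prevalence.min()
if gap > 0.1:                  # illustrative threshold, not a legal standard
    print(f"label prevalence gap of {gap:.2f} warrants review")
```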
VERIFICATION AND VALIDATION
Adapting the verification and validation process of the AI-based component [Articles 5.1.b
and 5.2]
Look for evidence to check, at least, the following questions:
o Is there documentation that duly describes the verification and validation process, the
techniques used, the verification and test assemblies carried out, the results obtained,
and the proposed actions?
o Have guidelines, standards or regulations been established or followed in order to
carry out a systematic procedure to verify and validate the AI-based component and
its behaviour once integrated in the processing activities it supports?
o Are control and supervision mechanisms in place to ensure that the AI-based
component efficiently complies with its intended goals and purposes?
o Are metrics and criteria, on which verifications within the verification and validation
process shall be carried out, defined and justified?
o Is a testing strategy defined? In relation to this strategy, is there a testing plan to assess
the correctness of the AI component in both structural and functional terms?
o Are the personnel involved in AI-component verification and validation tasks qualified
to carry out the necessary checks in order to ensure that the component has been
correctly built and behaves as expected?
Verification and validation of the AI-based component [Articles 5.1.a and 5.1.b]
Look for evidence to check, at least, the following questions:
o Does the testing plan include reviews and, when appropriate, inspections for the
purposes of early identification and remedy of defects in requirements or design,
incorrect specifications or deviations from applicable criteria during development?
o Is white-box testing of the network design or the AI component considered as part of
the testing plan?
o Is white-box testing at code and implementation levels included in the testing plan?
o Is black-box testing considered as part of the testing plan in order to ensure that the
functionality of the AI-based component is guaranteed, that it behaves as expected
and that information integrity is preserved?
o Are security check tests provided as part of the test plan in relation to the protection
of rights and freedoms in their holistic definition (physical and IT), in the case of AI
components implemented in robotic systems, Industry 4.0 or the Internet of Things?
o Does the validation test plan include verification of boundary values and extreme test
cases which might make the component function in an unexpected manner?
o Is there a cleaning procedure to correct any errors, shortcomings or inconsistencies
detected during the verification and validation process?
Performance [Article 5.1.d]
Look for evidence to check, at least, the following questions:
o Are metrics or sets of aggregated metrics established to determine the precision,
accuracy, sensitivity or other performance parameters of the relevant component, in
consideration of the principle of data accuracy?
o Are the rates of false positives and false negatives yielded by the AI component
known, analysed and interpreted in order to establish the accuracy, specificity and
sensitivity of the component's behaviour (a computation sketch follows this list)?
o Has the level and definition of the performance parameters required for the AI-based
component in the framework of the processing been assessed?
o Have the performance values between different options of AI components been
compared in the context of a process of selection of the most appropriate component
for a specific processing?
o Are output variables defined and determined with special consideration to those that
constitute special data categories?
o Have measures been adopted to ensure that the data used are exhaustive and up to date?
o Have relevant parameters and their cut-off values been determined (so the model
considers certain variables in order to obtain significant results)?
o Are there procedures to detect whether the response of the AI-based component to
input data is erroneous or exceeds a predetermined error threshold, or whether there
are different error thresholds associated with different categories of data subjects in
the data set?
o Has a dimension reduction been carried out in order to achieve a balance between
complexity and generalization?
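A minimal sketch of how the false positive/false negative rates and the derived accuracy, sensitivity and specificity referenced in this list can be computed from a component's predictions (example data invented):

```python
# Confusion-matrix based performance figures for a binary AI component.
# y_true = ground truth, y_pred = component output (invented example data).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # true positives
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # false negatives
tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))  # true negatives
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # false positives

accuracy = (tp + tn) / len(y_true)
sensitivity = tp / (tp + fn)   # true positive rate (recall)
specificity = tn / (tn + fp)   # true negative rate
fpr = fp / (fp + tn)           # false positive rate
fnr = fn / (fn + tp)           # false negative rate
print(f"acc={accuracy:.2f} sens={sensitivity:.2f} "
      f"spec={specificity:.2f} FPR={fpr:.2f} FNR={fnr:.2f}")
```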
Consistency [Article 5.1.d]
Look for evidence to check, at least, the following questions:
o Is there a procedure to verify whether the obtained results present significant changes
with respect to the results expected, and to act accordingly?
o Has a threshold been established to determine when an obtained result deviates from
the expected result based on identical or similar data inputs (significant deviations)?
o Has it been analysed whether the AI-based component behaves differently when
processing data from individuals who differ in personal characteristics associated
with special data categories or in the values of proxy variables?
o Has the effect of changes in low-prevalence variables within the training dataset on
the output results of the AI-based component been assessed?
o Are there measures adopted to ensure component independence?
o Is it verified that there is no correlation between the results and additional variables
associated with data subjects that are not part of the process variables, which may
evidence the existence of biases?
Stability and robustness [Article 5.2]
Look for evidence to check, at least, the following questions:
o Within the possible or actual context of operation of the relevant component, are the
factors whose variation may impact the properties of the AI component, and which
may establish the need to manage its readjustment, identified (a drift-screening
sketch follows this list)?
o Has the behaviour of the AI component in unexpected environments been assessed?
o Has an estimation of the time required for reassessment, readjustment or reboot of
the component, in order to keep it adjusted to deviations in input data or changes in
decision-making criteria, been analysed?
o Is there any documentation that shows whether the AI component has been built with
a static approach, a dynamic approach or a continuous learning approach by design?
o In the case of a continuous learning AI component, has the degree of adaptability to
new input data or types of data been assessed? Have monitoring procedures and
mechanisms been defined in order to verify that the conclusions obtained remain valid,
that the AI component is capable of acquiring new knowledge and/or that previously
learned associations have not been lost?
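One common way to screen for the input-data deviations mentioned above is a two-sample test comparing a numeric feature at training time against the same feature in operation. A sketch using the Kolmogorov-Smirnov test on synthetic data:

```python
# Drift screen for one numeric input feature: compare its distribution at
# training time against live/operational data with a two-sample KS test.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
train_feature = rng.normal(loc=0.0, scale=1.0, size=5000)  # training-time data
live_feature = rng.normal(loc=0.3, scale=1.0, size=5000)   # shifted in operation

stat, p_value = ks_2samp(train_feature, live_feature)
print(f"KS={stat:.3f}, p={p_value:.2e}")
if p_value < 0.01:  # illustrative threshold
    print("input drift detected: readjustment/retraining may be needed")
```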
Traceability [Articles 5 and 22]
Look for evidence to check, at least, the following questions:
o Is there a version control system in place for all elements of the AI-based component:
used datasets, AI-based component code, libraries used and any other element
associated with the component?
o Is there a formal and documented procedure, subject to reassessment as appropriate,
for assessing the risks arising from the changes that may occur in the implementation
of the AI-based component throughout its life cycle?
o Have monitoring and supervision mechanisms (such as log files and results records; a
logging sketch follows this list) been implemented to properly assess the behaviour of
the component in interaction with its environment, to verify that the relevant outputs
are adjusted to the responses of the real-life processes they model, and to detect any
potential inconsistency between expected and automated behaviour?
o Is there a record of incidents and previous abnormal behaviours detected and
remedied?
o Are monitoring mechanisms available to human operators for supervision and
verification purposes?
o Has a procedure been implemented and documented to ensure human intervention in
decision-making, either on its own initiative, when results deviate from expected
behaviour, or on request of data subjects affected by the AI-based component’s
output?
o Are mechanisms adopted within the framework of the processing so that the results
and decisions taken may be entirely the responsibility of human operators?
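As an illustration of the log files and results records referenced in this list, each inference can be appended to a simple audit log that ties inputs, output, model version and the responsible operator together. All field names here are hypothetical:

```python
# Append-only inference log: one JSON line per decision, tying input data,
# output, model version and the responsible human operator together.
import json, time, uuid

def log_inference(logfile: str, model_version: str, inputs: dict,
                  output, operator: str) -> None:
    record = {
        "id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "model_version": model_version,
        "inputs": inputs,
        "output": output,
        "operator": operator,   # human accountable for the decision
    }
    with open(logfile, "a") as f:
        f.write(json.dumps(record) + "\n")

# Hypothetical usage:
log_inference("audit_log.jsonl", "2.1.0",
              {"age_band": "30-39", "region": "north"}, "approve", "analyst_07")
```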
Security [Articles 5.1.f, 25 and 32]
Look for evidence to check, at least, the following questions:
o Has a risk analysis been developed with regard to the risks to the rights and freedoms
of persons? Have the results of this risk analysis been used to determine the security
and privacy requirements of the AI-based component in the framework of the processing?
o Are data protection and security requirements defined from the outset, together with
any other requirements, regardless of whether they are to be applied to the design of
a new AI-based component or to the modification of an existing one?
o Have standards and best practices been taken into consideration for the secure
configuration and development of the relevant AI component?
o Are measures implemented to ensure the protection of the processed data,
particularly those oriented to guarantee confidentiality by means of data
anonymisation or pseudonymisation, and integrity, to protect the component
implementation from accidental or intentional manipulation?
o Are measures to guarantee component resilience and its capacity to withstand an
attack implemented?
o Have procedures been implemented in order to properly monitor the functioning of the
component and detect at an early stage any potential data leak, unauthorised access
or other security breaches?
o Do component users and operators have sufficient information, and are they aware of
their security duties and responsibilities regarding data protection and the
safeguarding of data subjects' rights and freedoms?
3.4. Bias testing
Based on the documentation provided, access to the development teams and the data available, different
types of tests can be designed to determine whether different types of bias are impacting systems in
ways that may cause harm to individuals, groups, society or the efficient functioning of an AI system.
In all cases, bias testing involves a documentation and literature review, interviews with
developers/implementors and a good understanding of who is impacted by AI systems and how. Bias
testing involves statistical analysis and checking, and auditors have a range of fairness definitions and
metrics to choose from. Statistical notions of fairness such as those described by Verma & Rubin (2018)
are a good starting point and can be the basis for more advanced approaches such as similarity-based
measures and causal reasoning. In some cases, bias testing requires reaching out to end users or to those
impacted by systems.
As it may not be feasible for an inspector to go through all moments and sources of bias, the main
steps of the inspection exercise must include:
a) Definition of protected groups: in the context of artificial intelligence, a protected group is a
group of people who are historically disadvantaged or marginalized, and who may be at risk
of discrimination or negative impacts from the development and deployment of AI. Protected
groups may be defined by characteristics such as race, ethnicity, gender, sexual orientation,
religion, age, ability, and socio-economic status.
b) Testing the output of the AI system: one way to measure bias is to test the output of the AI
system and compare it to a benchmark or ground truth. For example, if an AI system is
intended to classify objects in images, the inspector could test its performance on a dataset
that includes a diverse range of objects and see how accurately it classifies them.
c) Examining the training data: another way to measure bias is to examine the training data that
was used to develop the AI system. If the training data is not representative of the population
that the AI system will be used on, or if it contains biased examples, then the AI system may
also be biased.
d) Using fairness metrics: fairness metrics are used to determine whether a protected group has
sufficient presence, receives consistent treatment and is properly represented in the system.
A good place to start is calculating the Risk Difference (RD = p1 − p2) and the Risk Ratio
(RR = p1/p2), where p1 and p2 are the rates of the outcome of interest for the protected and
reference groups respectively. The inspector can also measure demographic parity, equal
opportunity and equalized odds, and seek to measure both direct and indirect bias through
different means (a computational sketch follows this list). The best means to use will be
determined by the system's transparency, complexity and the inspection point (pre-processing,
in-processing or post-processing). Any inconsistency detected will point to issues that need to
be discussed with the development team and further explored, to ensure that all necessary
precautions and measures for a fair functioning of the system have been taken and documented.
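A minimal computational sketch of the metrics in point d), assuming binary predictions, binary ground truth and two groups (all data invented; a real audit must also handle small groups and multiple protected attributes):

```python
# Group fairness screen: Risk Difference (RD = p1 - p2), Risk Ratio (p1/p2),
# and equalized-odds components (TPR/FPR gaps) between two groups.
import numpy as np

def rates(y_true, y_pred, group, g):
    """Positive-prediction rate, TPR and FPR for the rows where group == g."""
    m = group == g
    yt, yp = y_true[m], y_pred[m]
    pos_rate = yp.mean()        # P(pred=1 | group=g)
    tpr = yp[yt == 1].mean()    # P(pred=1 | y=1, group=g)
    fpr = yp[yt == 0].mean()    # P(pred=1 | y=0, group=g)
    return pos_rate, tpr, fpr

# Invented audit data: a bias is injected against the "protected" group.
rng = np.random.default_rng(1)
group = rng.choice(["protected", "reference"], size=2000)
y_true = rng.integers(0, 2, size=2000)
y_pred = np.where((group == "protected") & (rng.random(2000) < 0.2), 0, y_true)

p1, tpr1, fpr1 = rates(y_true, y_pred, group, "protected")
p2, tpr2, fpr2 = rates(y_true, y_pred, group, "reference")

print(f"Risk Difference (demographic parity gap): {p1 - p2:+.3f}")
print(f"Risk Ratio: {p1 / p2:.3f}")
print(f"Equal opportunity (TPR gap): {tpr1 - tpr2:+.3f}")
print(f"Equalized odds also considers the FPR gap: {fpr1 - fpr2:+.3f}")
```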
3.5. Adversarial audit (optional)
The most thorough auditing methodology can still miss things. Omitted variables or proxies that only
become visible once an algorithmic system is functioning in real-life production settings will result in
unfair treatment and harmful impacts. For unsupervised ML models, reverse-engineering may be the
only way to trace back model attributes. For high-risk and unsupervised ML systems, performing an
adversarial audit once a system is implemented is highly recommended. Adversarial audits are also
useful to verify that the information provided during the auditing process is complete and accurate.
Adversarial auditing can reveal the existence of the moments of bias listed above, but also additional
sources of bias such as learning bias, which occurs when an unsupervised ML system incorporates new
variables and labels that emerge from the training data without human intervention or control,
leading to potential harms that are only identified when the auditor can access impact data at scale.
To conduct an adversarial audit, the auditor needs to gather impact data at scale. This can be done
by scraping web sources (in the case of web-based systems), by interviewing end users, by
crowdsourcing end-user data or by sockpuppeting a system (creating fake profiles or input data with
specific characteristics to trigger model outcomes and analyse them).
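A hedged sketch of the sockpuppeting approach: `query_system` below is a simulated stand-in for the audited system's real interface, with a bias injected purely for demonstration. Matched pairs of fake profiles differ only in one protected attribute, and the mean outcome gap is measured:

```python
# Sockpuppet probe: paired fake profiles identical except for one protected
# attribute; systematic outcome differences point to disparate treatment.
import random

def query_system(profile: dict) -> float:
    """Simulated stand-in for the audited system (hypothetical interface)."""
    base = 0.5 + 0.01 * random.random()
    return base - (0.1 if profile["gender"] == "F" else 0.0)  # injected bias

random.seed(0)
diffs = []
for _ in range(500):
    base_profile = {"age": random.randint(20, 60),
                    "income": random.randint(20, 90)}
    score_m = query_system({**base_profile, "gender": "M"})
    score_f = query_system({**base_profile, "gender": "F"})
    diffs.append(score_m - score_f)

mean_gap = sum(diffs) / len(diffs)
print(f"mean outcome gap across {len(diffs)} matched pairs: {mean_gap:+.3f}")
```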
Adversarial audits can complement an E2EST/AA or be conducted as a stand-alone exercise when
impacted communities or regulators do not have access to an algorithmic system.
4. The audit report
Audits should always result in a public document. However, this is not the only report that will be
produced during the audit process. A crucial part of auditing is documentation, and so all interactions
and documents exchanged must be compiled and kept on file by system owners (and, if both
parties agree, by auditors). There are three main audit reports:
a) Internal E2EST/AA report with mitigation measures and annexes
This document captures the process followed, the issues identified and the mitigation measures that
have been applied or can be applied. Unlike financial auditors, algorithmic auditors do engage in
proposing solutions, monitoring their implementation and reporting on the final results. The internal
audit report need not be published.
b) Public E2EST/AA report
Final version of the audit process, where auditors describe the system, the auditing methodology, the
mitigation and improvement measures implemented and further recommendations, if any. The public
audit report must also include a proposal for the periodicity and methodology/metrics to be used in
follow-up audits.
c) Periodic E2EST/AA reports
Follow-up audit reports. These must always refer to and provide access to the initial audit report, if it is
still relevant, and provide guarantees that the system developers have continued to test for bias,
implement mitigation measures and control for impact. Depending on the complexity of the system(s),
both parties may agree to produce an internal and a public version of each periodic audit.