European Commission’s Ethics Guidelines for Trustworthy AI


This is a lengthy document that frames its goal as “trustworthy AI”, as opposed to the “ethical AI” of most other frameworks; the authors instead park “ethical” under the “trustworthy” umbrella.

Trustworthy AI

The guidelines define trustworthy AI via three qualities:

  • Lawful
  • Ethical
  • Robust

Context-specific

These guidelines recognize that AI systems and their associated problems are contextual.

Given the context-specificity of AI systems, the implementation of these Guidelines needs to be adapted to the particular AI-application.

Foundations (Chapter 1)

This section highlights four ethical principles underpinning ethical AI, derived from fundamental human rights.

  • Respect for human autonomy
  • Prevention of harm
  • Fairness
  • Explicability

Context-specific Nature of Ethics

A domain-specific ethics code – however consistent, developed and fine-grained future versions of it may be – can never function as a substitute for ethical reasoning itself, which must always remain sensitive to contextual details that cannot be captured in general Guidelines.

[…] ethical norms that are not necessarily legally binding yet crucial to ensure trustworthiness.

[…] while many legal obligations reflect ethical principles, adherence to ethical principles goes beyond formal compliance with existing laws.

The Four Principles

Respect for human autonomy

This places emphasis on AI as an augmentation rather than a decision-maker, with references made to human-centric design and human-in-the-loop systems.

AI systems should not unjustifiably subordinate, coerce, deceive, manipulate, condition or herd humans. Instead, they should be designed to augment, complement and empower human cognitive, social and cultural skills.

The elaboration also implicitly covers cases of misinformation and news fabrication using AI technologies.

Prevention of harm

This section covers both intentional and unintentional harm in its mental, physical and environmental aspects, making reference to the notions of robustness and fairness. Fairness here considers diversity, bias and power asymmetry.

Fairness

While we acknowledge that there are many different interpretations of fairness, we believe that fairness has both a substantive and a procedural dimension.

Substantive: This refers to ensuring non-discrimination in AI systems, as well as equal access to “education, goods, services and technology”. It also covers the prohibition of deception and misinformation, and the balancing of competing interests.

Procedural: This refers to the capacity for humans to “contest and seek effective redress”, which entails human oversight, accountability and explicability.

Explicability

Explicability here refers primarily to explainability and transparency of AI systems. Where these are unavailable, “other explicability measures (e.g. traceability, auditability and transparent communication on system capabilities) may be required”.

The degree to which explicability is needed is highly dependent on the context and on the severity of the consequences should the system’s output be erroneous or otherwise inaccurate.

Tensions

The guidelines explicitly recognize that tradeoffs between the above principles may be required, citing predictive policing as an example of a tradeoff between autonomy and prevention of harm.

But they also mention that certain principles are absolute, without explicitly listing which ones.

Certain fundamental rights and correlated principles are absolute and cannot be subject to a balancing exercise (e.g. human dignity).

Implementation (Chapter 2)

This section lists seven key requirements for the implementation of trustworthy AI, although the guidelines consider the list to be non-exhaustive.

  • Human agency and oversight
  • Technical robustness and safety
  • Privacy and data governance
  • Transparency
  • Diversity, non-discrimination and fairness
  • Societal and environmental wellbeing
  • Accountability

Requirements

Human agency and oversight

This considers the ways AI systems can enable or undermine freedom and democratic society, as well as human autonomy in the form of oversight over AI systems.

This section also considers different mechanisms “such as a human-in-the-loop (HITL), human-on-the-loop (HOTL), or human-in-command (HIC) approach”.

Technical robustness and safety

This focuses on the risk of unintended harm from unreliable systems. Robustness and safety here span several aspects, including:

  • Resilience to attack (cybersecurity, adversarial attacks and physical sabotage)
  • Safeguards (safety measures including graceful degradation, e.g. falling back from a deep learning model to a rule-based system to human intervention; this covers both system failure and wrong predictions; a minimal fallback sketch follows this list)
  • Accuracy (this forms part of auditing and goes beyond naive accuracy to include other metrics; it also considers explicit visualization of confidence levels / likelihoods, characterization of error rates and differentiation between types of errors)
  • Reliability and Reproducibility (this forms the second part of auditing; note, however, that in some cases reliability can only be estimated, not proven)
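
As a concrete illustration of the graceful-degradation safeguard above, here is a minimal Python sketch. Everything in it (`ml_model`, `rule_based_decision`, the 0.8 confidence floor) is an illustrative assumption; the guidelines prescribe no particular implementation.

```python
from typing import Optional, Tuple

def rule_based_decision(features) -> Optional[int]:
    """Deterministic backup rules; returns None when no rule applies."""
    # Illustrative rule: reject clearly out-of-range inputs outright.
    if features and features[0] < 0:
        return 0
    return None

def predict_with_fallback(features, ml_model,
                          confidence_floor: float = 0.8) -> Tuple[Optional[int], str]:
    """Degrade gracefully: ML model -> rule-based system -> human intervention.
    Returns (prediction, name of the tier that produced it)."""
    try:
        # Assumes a scikit-learn-style classifier exposing predict_proba.
        proba = ml_model.predict_proba([features])[0]
        if proba.max() >= confidence_floor:
            return int(proba.argmax()), "ml_model"
    except Exception:
        pass  # model crashed or failed to load: fall through to the rules tier
    rule_result = rule_based_decision(features)
    if rule_result is not None:
        return rule_result, "rules"
    return None, "human_review"  # no automated tier is confident: escalate
```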

Privacy and data governance

Closely linked to the principle of prevention of harm is privacy, a fundamental right particularly affected by AI systems.

This section considers all aspects of data governance, including physical and cyber security and the quality / integrity of data. Quality / integrity considerations include concerns about self-learning systems, algorithmic bias and adversarial attacks.

One aspect not explicitly mentioned here is the exploitation of AI systems to reveal training data, such as membership inference attacks.
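
To make that concern concrete, here is a minimal sketch of a loss-threshold membership inference attack, a common formulation from the research literature rather than anything described in the guidelines; the synthetic data, model and threshold choice are all illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic data: the first half trains the model ("members"), the rest is held out.
X = rng.normal(size=(400, 20))
y = (X[:, 0] + 0.5 * rng.normal(size=400) > 0).astype(int)
X_train, y_train, X_out, y_out = X[:200], y[:200], X[200:], y[200:]

model = LogisticRegression().fit(X_train, y_train)

def per_example_loss(model, X, y):
    """Cross-entropy loss of each example under the model."""
    p = model.predict_proba(X)[np.arange(len(y)), y]
    return -np.log(np.clip(p, 1e-12, None))

# Attack: guess "member" whenever the loss is below the mean training loss.
threshold = per_example_loss(model, X_train, y_train).mean()
guessed_train = per_example_loss(model, X_train, y_train) < threshold
guessed_out = per_example_loss(model, X_out, y_out) < threshold

# Balanced attack accuracy above 0.5 signals training-data leakage;
# heavily overfit models leak considerably more.
print(0.5 * (guessed_train.mean() + (1 - guessed_out.mean())))
```

The intuition: a model tends to fit its training examples unusually well, so abnormally low loss on an example is (weak) evidence that the example was part of the training set.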

Transparency

This section considers transparency of several components: the AI system, the data pipeline and the business models, as well as their constituent parts.

  • Traceability (this focuses on documentation and identification of why a decision / mistake was made, facilitating auditability and explainability; a minimal logging sketch follows this list)
  • Explainability (emphasis on human understanding and the tradeoff between explainability and utility / accuracy; this also includes explanation of how the AI system factors into the business model, e.g. what data is required and why, what the AI influences and why it is necessary)
  • Communication (this has several aspects, including: self-identification of AI systems or explicit mention of the use of AI (e.g. Duplex) and the option for human interaction, communication of capabilities and limitations, and informed consent for data collection)
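
To illustrate the traceability point above, here is a minimal sketch of an append-only decision log; the record fields (model version, input hash, confidence) are assumptions about what an audit might need, not fields mandated by the guidelines.

```python
import hashlib
import json
import time

def log_decision(logfile, model_version, features, prediction, confidence):
    """Append one auditable record per decision: enough to later reconstruct
    which model saw which input and what it decided, without storing raw data."""
    record = {
        "timestamp": time.time(),
        "model_version": model_version,  # which model / weights produced this
        "input_sha256": hashlib.sha256(  # identifies the input without retaining it
            json.dumps(features, sort_keys=True).encode()
        ).hexdigest(),
        "prediction": prediction,
        "confidence": confidence,
    }
    logfile.write(json.dumps(record) + "\n")  # append-only JSON lines

# Usage:
#   with open("decisions.jsonl", "a") as f:
#       log_decision(f, "v1.3.0", {"age": 41}, "approve", 0.92)
```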

Diversity, non-discrimination and fairness

Non-discrimination here refers to the prevention of algorithmic bias in datasets and algorithms, as well as diversity in the workplaces that design and implement these systems. It also refers to accessibility of these goods and services across demographics, making explicit reference to Universal Design principles. This section also highlights the importance of open feedback and participatory mechanisms.
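
As one deliberately simple check for the algorithmic bias mentioned above, here is a sketch of the demographic parity difference; this is only one of many competing fairness metrics, none of which the guidelines prescribe, and the binary predictions and binary protected attribute are illustrative assumptions.

```python
import numpy as np

def demographic_parity_difference(y_pred, group):
    """Difference in positive-prediction rates between two groups.
    0.0 means parity; larger magnitudes indicate more disparity."""
    y_pred, group = np.asarray(y_pred), np.asarray(group)
    return y_pred[group == 0].mean() - y_pred[group == 1].mean()

# Example: an 80% positive rate for group 0 vs 20% for group 1 gives 0.6.
print(demographic_parity_difference([1, 1, 1, 0, 1, 0, 0, 1, 0, 0],
                                    [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]))
```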

Societal and environmental wellbeing

This considers the societal and environmental impact of AI systems across the AI pipeline: energy consumption, invisible low-wage labor, alteration of social relations and values, effects on users and society at large, and so on. There is also an implicit reference to misinformation.

Accountability

This includes the facilitation of formal and informal, external and internal audits, including the considerations of accuracy and reliability mentioned above. Beyond audits, this section also covers the notion of redress, i.e. compensation, which it treats as a key component of trust “when things go wrong”.

Technical and Non-technical Methods

This section gives a general list of technical and non-technical methods for implementing trustworthy AI; refer to the original document for details.

Assessment (Chapter 3)

This is an interesting section that details a long list of assessment questions AI practitioners can use to put trustworthy AI into practice; refer to the original document for details.