Explainable artificial intelligence (XAI)
Explainable AI (XAI) refers to methods and techniques in the application of artificial intelligence (AI) technology such that the results of the solution can be understood by human experts. It contrasts with the concept of the “black box” in machine learning, where even the system’s designers cannot explain why the AI arrived at a specific decision.[1] XAI is an implementation of the social right to explanation.[2] Some claim that transparency rarely comes for free and that there are often trade-offs between the accuracy and the explainability of a solution.[3]
The technical challenge of explaining AI decisions is sometimes known as the interpretability problem.[4] Another consideration is infobesity (information overload); full transparency may therefore not always be possible or even required.[citation needed]
AI systems optimize behavior to satisfy a mathematically specified goal system chosen by the system designers, such as the command “maximize the accuracy of assessing how positive film reviews are in the test dataset”. The AI may learn useful general rules from the test set, such as “reviews containing the word ‘horrible’ are likely to be negative”. However, it may also learn inappropriate rules, such as “reviews containing ‘Daniel Day-Lewis’ are usually positive”; such rules may be undesirable if they are deemed likely to fail to generalize outside the test set, or if people consider the rule to be “cheating” or “unfair”. A human can audit the rules in an XAI system to get an idea of how likely the system is to generalize to future real-world data outside the test set.[4]
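A minimal sketch (not from the article) of what such an audit can look like: train a transparent bag-of-words sentiment model and inspect which words drive its predictions. The toy reviews, labels, and variable names below are invented for illustration only.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Invented toy data: two negative and two positive film reviews.
reviews = [
    "a horrible, boring film",
    "horrible pacing and a weak script",
    "a brilliant performance by Daniel Day-Lewis",
    "Daniel Day-Lewis is superb in this moving drama",
]
labels = [0, 0, 1, 1]  # 0 = negative, 1 = positive

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(reviews)
model = LogisticRegression().fit(X, labels)

# Each coefficient shows how strongly a word pushes the prediction toward
# "positive". An auditor can spot that the model leans on an actor's name
# ("daniel", "day", "lewis") rather than on genuine sentiment words, a rule
# unlikely to generalize to reviews of other films.
for word, weight in sorted(zip(vectorizer.get_feature_names_out(), model.coef_[0]),
                           key=lambda pair: pair[1]):
    print(f"{word:>12s}  {weight:+.2f}")
```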
Cooperation between agents, in this case algorithms and humans, depends on trust. If humans are to accept algorithmic prescriptions, they need to trust them. Incompleteness in the formalization of trust criteria is a barrier to straightforward optimization approaches. For that reason, interpretability and explainability are posited as intermediate goals for checking other criteria.[5]
AI systems sometimes learn undesirable tricks that do an optimal job of satisfying explicit pre-programmed goals on the training data but that do not reflect the complicated implicit desires of the human system designers. For example, a 2017 system tasked with image recognition learned to “cheat” by looking for a copyright tag that happened to be associated with horse pictures, rather than learning how to tell if a horse was actually pictured.[1] In another 2017 system, a supervised learning AI tasked with grasping items in a virtual world learned to cheat by placing its manipulator between the object and the viewer such that it falsely appeared to be grasping the object.[6][7]
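One simple post-hoc explanation technique that can expose this kind of “cheating” is occlusion sensitivity: cover parts of the input and see how much the model’s score drops. The sketch below, under stated assumptions, uses an invented placeholder classifier (`predict_horse_probability`) that stands in for a real image model; it is not part of any system mentioned in the article.

```python
import numpy as np

def predict_horse_probability(image: np.ndarray) -> float:
    """Hypothetical stand-in classifier: returns a 'horse' score for the image."""
    # Purely illustrative: this fake model keys on a bright tag in the corner.
    corner = image[-8:, -8:]
    return float(corner.mean())

def occlusion_map(image: np.ndarray, patch: int = 8) -> np.ndarray:
    """Slide a gray patch over the image and record how much the score drops."""
    baseline = predict_horse_probability(image)
    heat = np.zeros_like(image, dtype=float)
    for y in range(0, image.shape[0], patch):
        for x in range(0, image.shape[1], patch):
            occluded = image.copy()
            occluded[y:y + patch, x:x + patch] = 0.5  # cover this region
            heat[y:y + patch, x:x + patch] = baseline - predict_horse_probability(occluded)
    return heat

# Invented example image: blank scene with a watermark-like tag in one corner.
image = np.zeros((64, 64))
image[-8:, -8:] = 1.0
heat = occlusion_map(image)

# If the largest score drop sits on the corner tag rather than on the animal,
# the auditor has evidence the model relies on a spurious cue.
print("most influential region:", np.unravel_index(np.argmax(heat), heat.shape))
```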
One transparency project, the DARPA XAI program, aims to produce “glass box” models that are explainable to a “human-in-the-loop” without greatly sacrificing AI performance. Human users should be able to understand the AI’s cognition (both in real time and after the fact) and should be able to determine when to trust the AI and when it should be distrusted.[8][9] Other applications of XAI are knowledge extraction from black-box models and model comparisons.[10] The term “glass box” has also been used to refer to a system that monitors the inputs and outputs of another system, with the purpose of verifying that system’s adherence to ethical and socio-legal values and, therefore, producing value-based explanations.[11] Furthermore, the same term has been used to name a voice assistant that produces counterfactual statements as explanations.[12]
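To make the idea of a counterfactual statement concrete, here is a minimal sketch, assuming an invented decision rule (`approve_loan`) and feature names that stand in for a learned model: search for the smallest change to the input that would flip the decision, then phrase that change as a “what if” statement.

```python
def approve_loan(income: float, debt: float) -> bool:
    """Hypothetical decision rule standing in for a learned model."""
    return income - 2 * debt >= 50

def counterfactual(income: float, debt: float, step: float = 1.0, max_steps: int = 1000):
    """Greedily raise income until the decision flips; return the required increase."""
    if approve_loan(income, debt):
        return None  # already approved, no counterfactual needed
    for k in range(1, max_steps + 1):
        if approve_loan(income + k * step, debt):
            return k * step
    return None

needed = counterfactual(income=40, debt=10)
if needed is not None:
    # A counterfactual explanation reads naturally as a "what if" statement.
    print(f"The loan was declined; it would have been approved if income were {needed:.0f} higher.")
```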