[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-pbiecek--xai_resources":3,"tool-pbiecek--xai_resources":61},[4,18,26,36,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",160015,2,"2026-04-18T11:30:52",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":32,"last_commit_at":42,"category_tags":43,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",109154,"2026-04-18T11:18:24",[14,15,13],{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":32,"last_commit_at":50,"category_tags":51,"status":17},6121,"gemini-cli","google-gemini\u002Fgemini-cli","gemini-cli 是一款由谷歌推出的开源 AI 命令行工具，它将强大的 Gemini 大模型能力直接集成到用户的终端环境中。对于习惯在命令行工作的开发者而言，它提供了一条从输入提示词到获取模型响应的最短路径，无需切换窗口即可享受智能辅助。\n\n这款工具主要解决了开发过程中频繁上下文切换的痛点，让用户能在熟悉的终端界面内直接完成代码理解、生成、调试以及自动化运维任务。无论是查询大型代码库、根据草图生成应用，还是执行复杂的 Git 操作，gemini-cli 都能通过自然语言指令高效处理。\n\n它特别适合广大软件工程师、DevOps 
人员及技术研究人员使用。其核心亮点包括支持高达 100 万 token 的超长上下文窗口，具备出色的逻辑推理能力；内置 Google 搜索、文件操作及 Shell 命令执行等实用工具；更独特的是，它支持 MCP（模型上下文协议），允许用户灵活扩展自定义集成，连接如图像生成等外部能力。此外，个人谷歌账号即可享受免费的额度支持，且项目基于 Apache 2.0 协议完全开源，是提升终端工作效率的理想助手。",100752,"2026-04-10T01:20:03",[52,13,15,14],"插件",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":32,"last_commit_at":59,"category_tags":60,"status":17},4721,"markitdown","microsoft\u002Fmarkitdown","MarkItDown 是一款由微软 AutoGen 团队打造的轻量级 Python 工具，专为将各类文件高效转换为 Markdown 格式而设计。它支持 PDF、Word、Excel、PPT、图片（含 OCR）、音频（含语音转录）、HTML 乃至 YouTube 链接等多种格式的解析，能够精准提取文档中的标题、列表、表格和链接等关键结构信息。\n\n在人工智能应用日益普及的今天，大语言模型（LLM）虽擅长处理文本，却难以直接读取复杂的二进制办公文档。MarkItDown 恰好解决了这一痛点，它将非结构化或半结构化的文件转化为模型“原生理解”且 Token 效率极高的 Markdown 格式，成为连接本地文件与 AI 分析 pipeline 的理想桥梁。此外，它还提供了 MCP（模型上下文协议）服务器，可无缝集成到 Claude Desktop 等 LLM 应用中。\n\n这款工具特别适合开发者、数据科学家及 AI 研究人员使用，尤其是那些需要构建文档检索增强生成（RAG）系统、进行批量文本分析或希望让 AI 助手直接“阅读”本地文件的用户。虽然生成的内容也具备一定可读性，但其核心优势在于为机器",93400,"2026-04-06T19:52:38",[52,14],{"id":62,"github_repo":63,"name":64,"description_en":65,"description_zh":66,"ai_summary_zh":66,"readme_en":67,"readme_zh":68,"quickstart_zh":69,"use_case_zh":70,"hero_image_url":71,"owner_login":72,"owner_name":73,"owner_avatar_url":74,"owner_bio":75,"owner_company":76,"owner_location":77,"owner_email":78,"owner_twitter":79,"owner_website":80,"owner_url":81,"languages":82,"stars":87,"forks":88,"last_commit_at":89,"license":78,"difficulty_score":90,"env_os":91,"env_gpu":92,"env_ram":92,"env_deps":93,"category_tags":96,"github_topics":97,"view_count":32,"oss_zip_url":78,"oss_zip_packed_at":78,"status":17,"created_at":101,"updated_at":102,"faqs":103,"releases":104},9307,"pbiecek\u002Fxai_resources","xai_resources","Interesting resources related to XAI (Explainable Artificial Intelligence)","xai_resources 是一个专注于可解释人工智能（XAI）领域的优质资源聚合库。它系统性地整理了学术论文、专业书籍、软件工具、新闻报道及学位论文等多类资料，旨在解决 XAI 领域知识分散、难以快速定位高质量参考文献的痛点。\n\n面对 AI 模型日益复杂的“黑箱”特性，如何让决策过程透明化并满足临床或金融等实际场景的需求，是当前技术落地的关键挑战。xai_resources 通过精选如多模态医疗影像评估、基于实际应用的后验解释方法对比等前沿研究，帮助用户深入理解不同解释算法的优劣与适用边界。例如，库中收录的研究揭示了现有热力图在多模态数据中的局限性，以及 LIME、SHAP 等工具在真实欺诈检测任务中对人类决策的实际影响。\n\n这份资源特别适合 AI 研究人员、算法开发者以及需要向利益相关者展示模型逻辑的产品设计师使用。无论是希望跟进最新学术动态，还是寻找经过验证的工具来优化系统透明度，xai_resources 都能提供坚实的理论与实证支持，助力构建更可信、更易理解的人工智能系统。","# Interesting resources related to XAI (Explainable Artificial Intelligence)\n\n* [Papers and preprints in scientific journals](README.md#papers)\n* [Books and longer materials](README.md#books)\n* [Software tools](README.md#tools)\n* [Short articles in newspapers](README.md#articles)\n* [Misc](README.md#theses)\n\n## Papers\n\n### 2021\n\n* [Evaluating Explainable AI on a Multi-Modal Medical Imaging Task: Can Existing Algorithms Fulfill Clinical Requirements?](https:\u002F\u002Fwww2.cs.sfu.ca\u002F~hamarneh\u002Fecopy\u002Faaai2022.pdf).  Weina Jin, Xiaoxiao Li, Ghassan Hamarneh. Being able to explain the prediction to clinical end-users is a necessity to leverage the power of artificial intelligence (AI) models for clinical decision support. For medical images, a feature attribution map, or heatmap, is the most common form of explanation that highlights important features for AI models’ prediction. However, it is still unknown how well heatmaps perform on explaining decisions on multi-modal medical images, where each modality\u002Fchannel carries distinct clinical meanings of the same underlying biomedical phenomenon. Understanding such modality-dependent features is essential for clinical users’ interpretation of AI decisions. 
To tackle this clinically important but technically ignored problem, we propose the Modality-Specific Feature Importance (MSFI) metric. It encodes the clinical requirements on modality prioritization and modality-specific feature localization. We conduct a clinical requirement-grounded, systematic evaluation on 16 commonly used XAI algorithms, assessed by MSFI, other non-modality-specific metrics, and a clinician user study. The results show that most existing XAI algorithms can not adequately highlight modality-specific important features to fulfill clinical requirements. The evaluation results and the MSFI metric can guide the design and selection of XAI algorithms to meet clinician’s requirements on multi-modal explanation.\n\n![EvaluatingExplainableAI](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_f41215d95ba2.png)\n\n\n* [How can I choose an explainer? An Application-grounded Evaluation of Post-hoc Explanations](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002F10.1145\u002F3442188.3445941). There have been several research works proposing new Explainable AI (XAI) methods designed to generate model explanations having specific properties, or desiderata, such as fidelity, robustness, or human-interpretability. However, explanations are seldom evaluated based on their true practical impact on decision-making tasks. Without that assessment, explanations might be chosen that, in fact, hurt the overall performance of the combined system of ML model + end-users. This study aims to bridge this gap by proposing XAI Test, an application-grounded evaluation methodology tailored to isolate the impact of providing the end-user with different levels of information. We conducted an experiment following XAI Test to evaluate three popular post-hoc explanation methods -- LIME, SHAP, and TreeInterpreter -- on a real-world fraud detection task, with real data, a deployed ML model, and fraud analysts. During the experiment, we gradually increased the information provided to the fraud analysts in three stages: Data Only, i.e., just transaction data without access to model score nor explanations, Data + ML Model Score, and Data + ML Model Score + Explanations. Using strong statistical analysis, we show that, in general, these popular explainers have a worse impact than desired. Some of the conclusion highlights include: i) showing Data Only results in the highest decision accuracy and the slowest decision time among all variants tested, ii) all the explainers improve accuracy over the Data + ML Model Score variant but still result in lower accuracy when compared with Data Only; iii) LIME was the least preferred by users, probably due to its substantially lower variability of explanations from case to case.\n\n![choose_explainer](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_6b7fa7fb4bbf.png)\n\n\n* [Reasons, Values, Stakeholders: A Philosophical Framework for Explainable Artificial Intelligence](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002Fpdf\u002F10.1145\u002F3442188.3445866). The societal and ethical implications of the use of opaque artificial intelligence systems in consequential decisions, such as welfare allocation and criminal justice, have generated a lively debate among multiple stakeholders, including computer scientists, ethicists, social scientists, policy makers, and end users. 
However, the lack of a common language or a multi-dimensional framework to appropriately bridge the technical, epistemic, and normative aspects of this debate nearly prevents the discussion from being as productive as it could be. Drawing on the philosophical literature on the nature and value of explanations, this paper offers a multi-faceted framework that brings more conceptual precision to the present debate by identifying the types of explanations that are most pertinent to artificial intelligence predictions, recognizing the relevance and importance of the social and ethical values for the evaluation of these explanations, and demonstrating the importance of these explanations for incorporating a diversified approach to improving the design of truthful algorithmic ecosystems. The proposed philosophical framework thus lays the groundwork for establishing a pertinent connection between the technical and ethical aspects of artificial intelligence systems.\n\n![RVSFramework](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_d2923e724fbf.png)\n\n* [Formalizing Trust in Artificial Intelligence: Prerequisites, Causes and Goals of Human Trust in AI](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002Fpdf\u002F10.1145\u002F3442188.3445923). \nTrust is a central component of the interaction between people and AI, in that 'incorrect' levels of trust may cause misuse, abuse or disuse of the technology. But what, precisely, is the nature of trust in AI? What are the prerequisites and goals of the cognitive mechanism of trust, and how can we promote them, or assess whether they are being satisfied in a given interaction? This work aims to answer these questions. We discuss a model of trust inspired by, but not identical to, interpersonal trust (i.e., trust between people) as defined by sociologists. This model rests on two key properties: the vulnerability of the user; and the ability to anticipate the impact of the AI model's decisions. We incorporate a formalization of 'contractual trust', such that trust between a user and an AI model is trust that some implicit or explicit contract will hold, and a formalization of 'trustworthiness' (that detaches from the notion of trustworthiness in sociology), and with it concepts of 'warranted' and 'unwarranted' trust. We present the possible causes of warranted trust as intrinsic reasoning and extrinsic behavior, and discuss how to design trustworthy AI, how to evaluate whether trust has manifested, and whether it is warranted. Finally, we elucidate the connection between trust and XAI using our formalization.\n\n![FormalizingTrust](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_f47b239a8db3.png)\n\n* [Comparative evaluation of contribution-value plots for machine learning understanding](https:\u002F\u002Flink.springer.com\u002Farticle\u002F10.1007\u002Fs12650-021-00776-w) The field of explainable artificial intelligence aims to help experts understand complex machine learning models. One key approach is to show the impact of a feature on the model prediction. This helps experts to verify and validate the predictions the model provides. However, many challenges remain open. For example, due to the subjective nature of interpretability, a strict definition of concepts such as the contribution of a feature remains elusive. Different techniques have varying underlying assumptions, which can cause inconsistent and conflicting views. 
In this work, we introduce local and global contribution-value plots as a novel approach to visualize feature impact on predictions and the relationship with feature value. We discuss design decisions and show an exemplary visual analytics implementation that provides new insights into the model. We conducted a user study and found the visualizations aid model interpretation by increasing correctness and confidence and reducing the time taken to obtain insights. [[website]](https:\u002F\u002Fexplaining.ml\u002Fcvplots)\n\n![CVPlots2021](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_4755a3c97b5a.jpg)\n\n### 2020\n\n* [A Performance-Explainability Framework to Benchmark Machine Learning Methods: Application to Multivariate Time Series Classifiers](https:\u002F\u002Farxiv.org\u002Fabs\u002F2005.14501); Kevin Fauvel, Véronique Masson, Élisa Fromont; Our research aims to propose a new performance-explainability analytical framework to assess and benchmark machine learning methods. The framework details a set of characteristics that operationalize the performance-explainability assessment of existing machine learning methods. In order to illustrate the use of the framework, we apply it to benchmark the current state-of-the-art multivariate time series classifiers.\n\n![MultivariateTimeSeriesClassifiers](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_50e61ed1ab50.png)\n\n* [EXPLAN: Explaining Black-box Classifiers using Adaptive Neighborhood Generation](https:\u002F\u002Fieeexplore.ieee.org\u002Fdocument\u002F9206710); Peyman Rasouli and Ingrid Chieh Yu; Defining a representative locality is an urgent challenge in perturbation-based explanation methods, which influences the fidelity and soundness of explanations. We address this issue by proposing a robust and intuitive approach for EXPLaining black-box classifiers using Adaptive Neighborhood generation (EXPLAN). EXPLAN is a module-based algorithm consisted of dense data generation, representative data selection, data balancing, and rule-based interpretable model. It takes into account the adjacency information derived from the black-box decision function and the structure of the data for creating a representative neighborhood for the instance being explained. As a local model-agnostic explanation method, EXPLAN generates explanations in the form of logical rules that are highly interpretable and well-suited for qualitative analysis of the model's behavior. We discuss fidelity-interpretability trade-offs and demonstrate the performance of the proposed algorithm by a comprehensive comparison with state-of-the-art explanation methods LIME, LORE, and Anchor. The conducted experiments on real-world data sets show our method achieves solid empirical results in terms of fidelity, precision, and stability of explanations. [[Paper]](https:\u002F\u002Fieeexplore.ieee.org\u002Fdocument\u002F9206710) [[Github]](https:\u002F\u002Fgithub.com\u002Fpeymanras\u002FEXPLAN)\n\n* [GRACE: Generating Concise and Informative Contrastive Sample to Explain Neural Network Model's Prediction](https:\u002F\u002Farxiv.org\u002Fabs\u002F1911.02042); Thai Le, Suhang Wang, Dongwon Lee; Despite the recent development in the topic of explainable AI\u002FML for image and text data, the majority of current solutions are not suitable to explain the prediction of neural network models when the datasets are tabular and their features are in high-dimensional vectorized formats. 
To mitigate this limitation, therefore, we borrow two notable ideas (i.e., \"explanation by intervention\" from causality and \"explanations are contrastive\" from philosophy) and propose a novel solution, named as GRACE, that better explains neural network models' predictions for tabular datasets. In particular, given a model's prediction as label X, GRACE intervenes and generates a minimally-modified contrastive sample to be classified as Y, with an intuitive textual explanation, answering the question of \"Why X rather than Y?\" We carry out comprehensive experiments using eleven public datasets of different scales and domains (e.g., # of features ranges from 5 to 216) and compare GRACE with competing baselines on different measures: fidelity, conciseness, info-gain, and influence. The user-studies show that our generated explanation is not only more intuitive and easy-to-understand but also facilitates end-users to make as much as 60% more accurate post-explanation decisions than that of Lime.\n\n* [ExplainExplore: Visual Exploration of Machine Learning Explanation](https:\u002F\u002Fresearch.tue.nl\u002Ffiles\u002F170065756\u002F09086281.pdf); Dennis Collaris, Jarke J. van Wijk; Machine learning models often exhibit complex behavior that is difficult to understand. Recent research in explainable AI has produced promising techniques to explain the inner workings of such models using feature contribution vectors. These vectors are helpful in a wide variety of applications. However, there are many parameters involved in this process and determining which settings are best is difficult due to the subjective nature of evaluating interpretability. To this end, we introduce ExplainExplore: an interactive explanation system to explore explanations that fit the subjective preference of data scientists. We leverage the domain knowledge of the data scientist to find optimal parameter settings and instance perturbations, and enable the discussion of the model and its explanation with domain experts. We present a use case on a real-world dataset to demonstrate the effectiveness of our approach for the exploration and tuning of machine learning explanations. [[website]](https:\u002F\u002Fexplaining.ml)\n\n![ExplainExplore2020](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_e678eff96025.png)\n\n* [FACE: Feasible and Actionable Counterfactual Explanations](https:\u002F\u002Farxiv.org\u002Fabs\u002F1909.09369); Rafael Poyiadzi, Kacper Sokol, Raul Santos-Rodriguez, Tijl De Bie, Peter Flach; Work in Counterfactual Explanations tends to focus on the principle of \"the closest possible world\" that identifies small changes leading to the desired outcome. In this paper we argue that while this approach might initially seem intuitively appealing it exhibits shortcomings not addressed in the current literature. First, a counterfactual example generated by the state-of-the-art systems is not necessarily representative of the underlying data distribution, and may therefore prescribe unachievable goals (e.g., an unsuccessful life insurance applicant with severe disability may be advised to do more sports). Secondly, the counterfactuals may not be based on a \"feasible path\" between the current state of the subject and the suggested one, making actionable recourse infeasible (e.g., low-skilled unsuccessful mortgage applicants may be told to double their salary, which may be hard without first increasing their skill level). 
\n\n![FACE2020](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_26eb939e021f.png)\n\n* [Explainability Fact Sheets: A Framework for Systematic Assessment of Explainable Approaches](https:\u002F\u002Farxiv.org\u002Fabs\u002F1912.05100); Kacper Sokol, Peter Flach; Explanations in Machine Learning come in many forms, but a consensus regarding their desired properties is yet to emerge. In this paper we introduce a taxonomy and a set of descriptors that can be used to characterise and systematically assess explainable systems along five key dimensions: functional, operational, usability, safety and validation. In order to design a comprehensive and representative taxonomy and associated descriptors we surveyed the eXplainable Artificial Intelligence literature, extracting the criteria and desiderata that other authors have proposed or implicitly used in their research. The survey includes papers introducing new explainability algorithms to see what criteria are used to guide their development and how these algorithms are evaluated, as well as papers proposing such criteria from both computer science and social science perspectives. This novel framework allows to systematically compare and contrast explainability approaches, not just to better understand their capabilities but also to identify discrepancies between their theoretical qualities and properties of their implementations. We developed an operationalisation of the framework in the form of Explainability Fact Sheets, which enable researchers and practitioners alike to quickly grasp capabilities and limitations of a particular explainable method. \n\n\n* [One Explanation Does Not Fit All: The Promise of Interactive Explanations for Machine Learning Transparency](https:\u002F\u002Farxiv.org\u002Fabs\u002F2001.09734); Kacper Sokol, Peter Flach; The need for transparency of predictive systems based on Machine Learning algorithms arises as a consequence of their ever-increasing proliferation in the industry. Whenever black-box algorithmic predictions influence human affairs, the inner workings of these algorithms should be scrutinised and their decisions explained to the relevant stakeholders, including the system engineers, the system's operators and the individuals whose case is being decided. While a variety of interpretability and explainability methods is available, none of them is a panacea that can satisfy all diverse expectations and competing objectives that might be required by the parties involved. We address this challenge in this paper by discussing the promises of Interactive Machine Learning for improved transparency of black-box systems using the example of contrastive explanations -- a state-of-the-art approach to Interpretable Machine Learning. Specifically, we show how to personalise counterfactual explanations by interactively adjusting their conditional statements and extract additional explanations by asking follow-up \"What if?\" questions.\n\n![oneXdoesnotFitAll](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_1cf1d08654c1.png)\n\n* [FAT Forensics: A Python Toolbox for Algorithmic Fairness, Accountability and Transparency](https:\u002F\u002Farxiv.org\u002Fabs\u002F1909.05167); Kacper Sokol, Raul Santos-Rodriguez, Peter Flach; Given the potential harm that ML algorithms can cause, qualities such as fairness, accountability and transparency of predictive systems are of paramount importance. 
Recent literature suggested voluntary self-reporting on these aspects of predictive systems -- e.g., data sheets for data sets -- but their scope is often limited to a single component of a machine learning pipeline, and producing them requires manual labour. To resolve this impasse and ensure high-quality, fair, transparent and reliable machine learning systems, we developed an open source toolbox that can inspect selected fairness, accountability and transparency aspects of these systems to automatically and objectively report them back to their engineers and users. We describe design, scope and usage examples of this Python toolbox in this paper. The toolbox provides functionality for inspecting fairness, accountability and transparency of all aspects of the machine learning process: data (and their features), models and predictions.\n\n![FATForensics](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_861fcb880679.png)\n\n* [Adaptive Explainable Neural Networks (AxNNs)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2004.02353); Jie Chen, Joel Vaughan, Vijayan Nair, Agus Sudjianto; While machine learning techniques have been successfully applied in several fields, the black-box nature of the models presents challenges for interpreting and explaining the results. We develop a new framework called Adaptive Explainable Neural Networks (AxNN) for achieving the dual goals of good predictive performance and model interpretability. For predictive performance, we build a structured neural network made up of ensembles of generalized additive model networks and additive index models (through explainable neural networks) using a two-stage process. This can be done using either a boosting or a stacking ensemble. For interpretability, we show how to decompose the results of AxNN into main effects and higher-order interaction effects. \n\n![AxNN](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_eb75b98c1e1a.png)\n\n* [Information Leakage in Embedding Models](https:\u002F\u002Farxiv.org\u002Fabs\u002F2004.00053); Congzheng Song, Ananth Raghunathan; We demonstrate that embeddings, in addition to encoding generic semantics, often also present a vector that leaks sensitive information about the input data. We develop three classes of attacks to systematically study information that might be leaked by embeddings. First, embedding vectors can be inverted to partially recover some of the input data. Second, embeddings may reveal sensitive attributes inherent in inputs and independent of the underlying semantic task at hand. Third, embedding models leak moderate amount of membership information for infrequent training data inputs. We extensively evaluate our attacks on various state-of-the-art embedding models in the text domain. We also propose and evaluate defenses that can prevent the leakage to some extent at a minor cost in utility.\n\n![InformationLeakage](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_4b996cea9451.png)\n\n\n* [Closing the AI Accountability Gap: Defining an End-to-End Framework for Internal Algorithmic Auditing](https:\u002F\u002Farxiv.org\u002Fabs\u002F2001.00973); Inioluwa Deborah Raji, et. al. Rising concern for the societal implications of artificial intelligence systems has inspired a wave of academic and journalistic literature in which deployed systems are audited for harm by investigators from outside the organizations deploying the algorithms. 
However, it remains challenging for practitioners to identify the harmful repercussions of their own systems prior to deployment, and, once deployed, emergent issues can become difficult or impossible to trace back to their source. In this paper, we introduce a framework for algorithmic auditing that supports artificial intelligence system development end-to-end, to be applied throughout the internal organization development lifecycle. Each stage of the audit yields a set of documents that together form an overall audit report, drawing on an organization's values or principles to assess the fit of decisions made throughout the process. \n\n![AlgorithmicAuditing](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_f20190352c48.png)\n\n* [Explaining the Explainer: A First Theoretical Analysis of LIME](https:\u002F\u002Farxiv.org\u002Fabs\u002F2001.03447); Damien Garreau, Ulrike von Luxburg; Machine learning is used more and more often for sensitive applications, sometimes replacing humans in critical decision-making processes. As such, interpretability of these algorithms is a pressing need. One popular algorithm to provide interpretability is LIME (Local Interpretable Model-Agnostic Explanation). In this paper, we provide the first theoretical analysis of LIME. We derive closed-form expressions for the coefficients of the interpretable model when the function to explain is linear. The good news is that these coefficients are proportional to the gradient of the function to explain: LIME indeed discovers meaningful features. However, our analysis also reveals that poor choices of parameters can lead LIME to miss important features. \n\n![extLIME](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_036499b457ed.png)\n\n### 2019\n\n* [bLIMEy: Surrogate Prediction Explanations Beyond LIME?](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1910.13016.pdf); Kacper Sokol, Alexander Hepburn, Raul Santos-Rodriguez, Peter Flach. Surrogate explainers of black-box machine learning predictions are of paramount importance in the field of eXplainable Artificial Intelligence since they can be applied to any type of data (images, text and tabular), are model-agnostic and are post-hoc (i.e., can be retrofitted). The Local Interpretable Model-agnostic Explanations (LIME) algorithm is often mistakenly unified with a more general framework of surrogate explainers, which may lead to a belief that it is the solution to surrogate explainability. In this paper we empower the community to \"build LIME yourself\" (bLIMEy) by proposing a principled algorithmic framework for building custom local surrogate explainers of black-box model predictions, including LIME itself. To this end, we demonstrate how to decompose the surrogate explainers family into algorithmically independent and interoperable modules and discuss the influence of these component choices on the functional capabilities of the resulting explainer, using the example of LIME.\n\n![bLIMEy](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_9132ffb20851.png)\n\n* [Are Sixteen Heads Really Better than One?](https:\u002F\u002Farxiv.org\u002Fabs\u002F1905.10650); Paul Michel, Omer Levy, Graham Neubig. Attention is a powerful and ubiquitous mechanism for allowing neural models to focus on particular salient pieces of information by taking their weighted average when making predictions. 
In particular, multi-headed attention is a driving force behind many recent state-of-the-art NLP models such as Transformer-based MT models and BERT. In this paper we make the surprising observation that even if models have been trained using multiple heads, in practice, a large percentage of attention heads can be removed at test time without significantly impacting performance. In fact, some layers can even be reduced to a single head. We further examine greedy algorithms for pruning down models, and the potential speed, memory efficiency, and accuracy improvements obtainable therefrom. \n\n![DoWeNeed16Heads](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_20be89a3848e.png)\n\n\n* [Revealing the Dark Secrets of BERT](https:\u002F\u002Farxiv.org\u002Fabs\u002F1908.08593); Olga Kovaleva, Alexey Romanov, Anna Rogers, Anna Rumshisky. BERT-based architectures currently give state-of-the-art performance on many NLP tasks, but little is known about the exact mechanisms that contribute to its success. In the current work, we focus on the interpretation of self-attention, which is one of the fundamental underlying components of BERT. Using a subset of GLUE tasks and a set of handcrafted features-of-interest, we propose the methodology and carry out a qualitative and quantitative analysis of the information encoded by the individual BERT's heads. Our findings suggest that there is a limited set of attention patterns that are repeated across different heads, indicating the overall model overparametrization. While different heads consistently use the same attention patterns, they have varying impact on performance across different tasks. We show that manually disabling attention in certain heads leads to a performance improvement over the regular fine-tuned BERT models.\n\n![DarkSecrets](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_d192ef9cbbd6.png)\n\n* [Explanation in Artificial Intelligence: Insights from the Social Sciences](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1706.07269.pdf); Tim Miller. There has been a recent resurgence in the area of explainable artificial intelligence as researchers and practitioners seek to make their algorithms more understandable. Much of this research is focused on explicitly explaining decisions or actions to a human observer, and it should not be controversial to say that looking at how humans explain to each other can serve as a useful starting point for explanation in artificial intelligence. However, it is fair to say that most work in explainable artificial intelligence uses only the researchers' intuition of what constitutes a `good' explanation. There exist vast and valuable bodies of research in philosophy, psychology, and cognitive science of how people define, generate, select, evaluate, and present explanations, which argues that people employ certain cognitive biases and social expectations towards the explanation process. This paper argues that the field of explainable artificial intelligence should build on this existing research, and reviews relevant papers from philosophy, cognitive psychology\u002Fscience, and social psychology, which study these topics. 
\n\n![SocialSciences4XAI](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_2c899036fb3e.png)\n\n![SocialSciences4XAI2](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_e78ec9aebd0a.png)\n\n* [AnchorViz: Facilitating Semantic Data Exploration and Concept Discovery for Interactive Machine Learning](https:\u002F\u002Fwww.microsoft.com\u002Fen-us\u002Fresearch\u002Fpublication\u002Fanchorviz-facilitating-semantic-data-exploration-and-concept-discovery-for-interactive-machine-learning\u002F); Jina Suh et. al., When building a classifier in interactive machine learning (iML), human knowledge about the target class can be a powerful reference to make the classifier robust to unseen items. The main challenge lies in finding unlabeled items that can either help discover or refine concepts for which the current classifier has no corresponding features (i.e., it has feature blindness). Yet it is unrealistic to ask humans to come up with an exhaustive list of items, especially for rare concepts that are hard to recall. This article presents AnchorViz, an interactive visualization that facilitates the discovery of prediction errors and previously unseen concepts through human-driven semantic data exploration.\n\n![AnchorViz](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_1c91f0644700.png)\n\n* [Randomized Ablation Feature Importance](https:\u002F\u002Farxiv.org\u002Fabs\u002F1910.00174); Luke Merrick; Given a model f that predicts a target y from a vector of input features x=x1,x2,…,xM, we seek to measure the importance of each feature with respect to the model's ability to make a good prediction. To this end, we consider how (on average) some measure of goodness or badness of prediction (which we term \"loss\"), changes when we hide or ablate each feature from the model. To ablate a feature, we replace its value with another possible value randomly. By averaging over many points and many possible replacements, we measure the importance of a feature on the model's ability to make good predictions. Furthermore, we present statistical measures of uncertainty that quantify how confident we are that the feature importance we measure from our finite dataset and finite number of ablations is close to the theoretical true importance value. \n\n* [Explainable AI for Trees: From Local Explanations to Global Understanding](https:\u002F\u002Farxiv.org\u002Fabs\u002F1905.04610); Scott M. Lundberg, Gabriel Erion, Hugh Chen, Alex DeGrave, Jordan M. Prutkin, Bala Nair, Ronit Katz, Jonathan Himmelfarb, Nisha Bansal, Su-In Lee; Tree-based machine learning models such as random forests, decision trees, and gradient boosted trees are the most popular non-linear predictive models used in practice today, yet comparatively little attention has been paid to explaining their predictions. Here we significantly improve the interpretability of tree-based models through three main contributions: 1) The first polynomial time algorithm to compute optimal explanations based on game theory. 2) A new type of explanation that directly measures local feature interaction effects. 3) A new set of tools for understanding global model structure based on combining many local explanations of each prediction. We apply these tools to three medical machine learning problems and show how combining many high-quality local explanations allows us to represent global structure while retaining local faithfulness to the original model. 
These tools enable us to i) identify high magnitude but low frequency non-linear mortality risk factors in the general US population, ii) highlight distinct population sub-groups with shared risk characteristics, iii) identify non-linear interaction effects among risk factors for chronic kidney disease, and iv) monitor a machine learning model deployed in a hospital by identifying which features are degrading the model's performance over time. Given the popularity of tree-based machine learning models, these improvements to their interpretability have implications across a broad set of domains. [GitHub](https:\u002F\u002Fgithub.com\u002Fsuinleelab\u002Ftreeexplainer-study)\n\n![treeeexplainerpr](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_751c05eee74a.png)\n\n* [One Explanation Does Not Fit All: A Toolkit and Taxonomy of AI Explainability Techniques](https:\u002F\u002Farxiv.org\u002Fabs\u002F1909.03012); Vijay Arya, Rachel K. E. Bellamy, Pin-Yu Chen, Amit Dhurandhar, Michael Hind, Samuel C. Hoffman, Stephanie Houde, Q. Vera Liao, Ronny Luss, Aleksandra Mojsilović, Sami Mourad, Pablo Pedemonte, Ramya Raghavendra, John Richards, Prasanna Sattigeri, Karthikeyan Shanmugam, Moninder Singh, Kush R. Varshney, Dennis Wei, Yunfeng Zhang; \nAs artificial intelligence and machine learning algorithms make further inroads into society, calls are increasing from multiple stakeholders for these algorithms to explain their outputs. At the same time, these stakeholders, whether they be affected citizens, government regulators, domain experts, or system developers, present different requirements for explanations. Toward addressing these needs, we introduce AI Explainability 360 (this http URL), an open-source software toolkit featuring eight diverse and state-of-the-art explainability methods and two evaluation metrics. Equally important, we provide a taxonomy to help entities requiring explanations to navigate the space of explanation methods, not only those in the toolkit but also in the broader literature on explainability. For data scientists and other users of the toolkit, we have implemented an extensible software architecture that organizes methods according to their place in the AI modeling pipeline. We also discuss enhancements to bring research innovations closer to consumers of explanations, ranging from simplified, more accessible versions of algorithms, to tutorials and an interactive web demo to introduce AI explainability to different audiences and application domains. Together, our toolkit and taxonomy can help identify gaps where more explainability methods are needed and provide a platform to incorporate them as they are developed. \n[GitHub](https:\u002F\u002Fgithub.com\u002FIBM\u002FAIX360); [Demo](http:\u002F\u002Faix360.mybluemix.net\u002Fdata)\n\n![aix360](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_8dcf5ea2ca62.png)\n\n* [LIRME: Locally Interpretable Ranking Model Explanation](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002F10.1145\u002F3331184.3331377); Manisha Verma, Debasis Ganguly; Information retrieval (IR) models often employ complex variations in term weights to compute an aggregated similarity score of a query-document pair. Treating IR models as black-boxes makes it difficult to understand or explain why certain documents are retrieved at top-ranks for a given query. Local explanation models have emerged as a popular means to understand individual predictions of classification models. 
However, there is no systematic investigation that learns to interpret IR models, which is in fact the core contribution of our work in this paper. We explore three sampling methods to train an explanation model and propose two metrics to evaluate explanations generated for an IR model. Our experiments reveal some interesting observations, namely that a) diversity in samples is important for training local explanation models, and b) the stability of a model is inversely proportional to the number of parameters used to explain the model.\n\n* [Understanding complex predictive models with Ghost Variables](https:\u002F\u002Farxiv.org\u002Fabs\u002F1912.06407); Pedro Delicado, Daniel Peña; Procedure for assigning a relevance measure to each explanatory variable in a complex predictive model. We assume that we have a training set to fit the model and a test set to check the out of sample performance. First, the individual relevance of each variable is computed by comparing the predictions in the test set, given by the model that includes all the variables with those of another model in which the variable of interest is substituted by its ghost variable, defined as the prediction of this variable by using the rest of explanatory variables. Second, we check the joint effects among the variables by using the eigenvalues of a relevance matrix that is the covariance matrix of the vectors of individual effects. It is shown that in simple models, as linear or additive models, the proposed measures are related to standard measures of significance of the variables and in neural networks models (and in other algorithmic prediction models) the procedure provides information about the joint and individual effects of the variables that is not usually available by other methods.\n\n![ghostVariables](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_2044715397d1.png)\n\n* [Unmasking Clever Hans predictors and assessing what machines really learn](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41467-019-08987-4); Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, Klaus-Robert Müller; Current learning machines have successfully solved hard application problems, reaching high accuracy and displaying seemingly intelligent behavior. Here we apply recent techniques for explaining decisions of state-of-the-art learning machines and analyze various tasks from computer vision and arcade games. This showcases a spectrum of problem-solving behaviors ranging from naive and short-sighted, to well-informed and strategic. We observe that standard performance evaluation metrics can be oblivious to distinguishing these diverse problem solving behaviors. Furthermore, we propose our semi-automated Spectral Relevance Analysis that provides a practically effective way of characterizing and validating the behavior of nonlinear learning machines. This helps to assess whether a learned model indeed delivers reliably for the problem that it was conceived for. 
Furthermore, our work intends to add a voice of caution to the ongoing excitement about machine intelligence and pledges to evaluate and judge some of these recent successes in a more nuanced manner.\n\n![spray](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_8364bc190380.png)\n\n* [Feature Impact for Prediction Explanation](https:\u002F\u002Fwww.researchgate.net\u002Fpublication\u002F335189270_Feature_Impact_for_Prediction_Explanation); Mohammad Bataineh; Companies across the globe have been adapting complex Machine Learning (ML) techniques to build advanced predictive models to improve their operations and services and help in decision making. While these ML techniques are extremely powerful and have found success in different industries for helping with decision making, a common feedback heard across many industries worldwide is that too often these techniques are opaque in nature with no details as to why a particular prediction probability was reached. This work presents an innovative algorithm that addresses this limitation by providing a ranked list of all features according to their contribution to a model's prediction. This new algorithm, Feature Impact for Prediction Explanation (FIPE), incorporates individual feature variations and correlations to calculate feature impact for a prediction. The true power of FIPE lies in its computationally-efficient ability to provide feature impact irrespective of the base ML technique used.\n\n![FIPE](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_6856590a0323.png)\n\n* [Relative Attributing Propagation: Interpreting the Comparative Contributions of Individual Units in Deep Neural Networks](https:\u002F\u002Farxiv.org\u002Fabs\u002F1904.00605); Woo-Jeoung Nam, Shir Gur, Jaesik Choi, Lior Wolf, Seong-Whan Lee; As Deep Neural Networks (DNNs) have demonstrated superhuman performance in a variety of fields, there is an increasing interest in understanding the complex internal mechanisms of DNNs. In this paper, we propose Relative Attributing Propagation (RAP), which decomposes the output predictions of DNNs with a new perspective of separating the relevant (positive) and irrelevant (negative) attributions according to the relative influence between the layers. The relevance of each neuron is identified with respect to its degree of contribution, separated into positive and negative, while preserving the conservation rule. Considering the relevance assigned to neurons in terms of relative priority, RAP allows each neuron to be assigned with a bi-polar importance score concerning the output: from highly relevant to highly irrelevant. Therefore, our method makes it possible to interpret DNNs with much clearer and attentive visualizations of the separated attributions than the conventional explaining methods. To verify that the attributions propagated by RAP correctly account for each meaning, we utilize the evaluation metrics: (i) Outside-inside relevance ratio, (ii) Segmentation mIOU and (iii) Region perturbation. 
In all experiments and metrics, we present a sizable gap in comparison to the existing literature.\n\n![Relative_Attributing_Propagation](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_7cf15fffab41.png)\n\n* [The Bouncer Problem: Challenges to Remote Explainability](https:\u002F\u002Farxiv.org\u002Fabs\u002F1910.01432v1); Erwan Le Merrer, Gilles Tredan; The concept of explainability is envisioned to satisfy society's demands for transparency on machine learning decisions. The concept is simple: like humans, algorithms should explain the rationale behind their decisions so that their fairness can be assessed. While this approach is promising in a local context (e.g. to explain a model during debugging at training time), we argue that this reasoning cannot simply be transposed in a remote context, where a trained model by a service provider is only accessible through its API. This is problematic as it constitutes precisely the target use-case requiring transparency from a societal perspective. Through an analogy with a club bouncer (which may provide untruthful explanations upon customer reject), we show that providing explanations cannot prevent a remote service from lying about the true reasons leading to its decisions. More precisely, we prove the impossibility of remote explainability for single explanations, by constructing an attack on explanations that hides discriminatory features to the querying user. We provide an example implementation of this attack. We then show that the probability that an observer spots the attack, using several explanations for attempting to find incoherences, is low in practical settings. This undermines the very concept of remote explainability in general. \n\n![https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_50e90ef5a10d.png](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_50e90ef5a10d.png)\n\n\n* [Understanding Black-box Predictions via Influence Functions](https:\u002F\u002Farxiv.org\u002Fabs\u002F1703.04730); Pang Wei Koh, Percy Liang;  How can we explain the predictions of a black-box model? In this paper, we use influence functions -- a classic technique from robust statistics -- to trace a model's prediction through the learning algorithm and back to its training data, thereby identifying training points most responsible for a given prediction. To scale up influence functions to modern machine learning settings, we develop a simple, efficient implementation that requires only oracle access to gradients and Hessian-vector products. We show that even on non-convex and non-differentiable models where the theory breaks down, approximations to influence functions can still provide valuable information. On linear models and convolutional neural networks, we demonstrate that influence functions are useful for multiple purposes: understanding model behavior, debugging models, detecting dataset errors, and even creating visually-indistinguishable training-set attacks. 
\n\n![https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_85feb97fb7ff.png](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_85feb97fb7ff.png)\n\n* [Towards XAI: Structuring the Processes of Explanations](https:\u002F\u002Fwww.researchgate.net\u002Fprofile\u002FMennatallah_El-Assady\u002Fpublication\u002F332802468_Towards_XAI_Structuring_the_Processes_of_Explanations\u002Flinks\u002F5ccad56b92851c8d22146613\u002FTowards-XAI-Structuring-the-Processes-of-Explanations.pdf); Mennatallah El-Assady, et al.; Explainable Artificial Intelligence describes a process to reveal the logical propagation of operations that transform a given input to a certain output. In this paper, we investigate the design space of explanation processes based on factors gathered from six research areas, namely, Pedagogy, Storytelling, Argumentation, Programming, Trust-Building, and Gamification. We contribute a conceptual model describing the building blocks of explanation processes, including a comprehensive overview of explanation and verification phases, pathways, mediums, and strategies. We further argue for the importance of studying effective methods of explainable machine learning, and discuss open research challenges and opportunities.\n\n\u003Ccenter>\u003Cimg width=\"500px\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_d1c77c96e0f7.png\">\u003C\u002Fcenter>\n\n* [Towards Automated Machine Learning: Evaluation and Comparison of AutoML Approaches and Tools](https:\u002F\u002Farxiv.org\u002Fabs\u002F1908.05557); Anh Truong, Austin Walters, Jeremy Goodsitt, Keegan Hines, C. Bayan Bruss, Reza Farivar; There has been considerable growth and interest in industrial applications of machine learning (ML) in recent years. ML engineers, as a consequence, are in high demand across the industry, yet improving the efficiency of ML engineers remains a fundamental challenge. Automated machine learning (AutoML) has emerged as a way to save time and effort on repetitive tasks in ML pipelines, such as data pre-processing, feature engineering, model selection, hyperparameter optimization, and prediction result analysis. In this paper, we investigate the current state of AutoML tools aiming to automate these tasks. We conduct various evaluations of the tools on many datasets, in different data segments, to examine their performance, and compare their advantages and disadvantages on different test cases. \n\n\u003Ccenter>\u003Cimg width=\"500px\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_d1c77c96e0f7.png\">\u003C\u002Fcenter>\n\n* [Intelligible Models for HealthCare: Predicting Pneumonia Risk and Hospital 30-day Readmission](https:\u002F\u002Fwww.microsoft.com\u002Fen-us\u002Fresearch\u002Fwp-content\u002Fuploads\u002F2017\u002F06\u002FKDD2015FinalDraftIntelligibleModels4HealthCare_igt143e-caruanaA.pdf); Rich Caruana et al.; In machine learning often a tradeoff must be made between accuracy and intelligibility. More accurate models such as boosted trees, random forests, and neural nets usually are not intelligible, but more intelligible models such as logistic regression, naive-Bayes, and single decision trees often have significantly worse accuracy. 
This tradeoff sometimes limits the accuracy of models that can be applied in mission-critical applications such as healthcare where being able to understand, validate, edit, and trust a learned model is important. We present two case studies where high-performance generalized additive models with pairwise interactions (GA2Ms) are applied to real healthcare problems yielding intelligible models with state-of-the-art accuracy. In the pneumonia risk prediction case study, the intelligible model uncovers surprising patterns in the data that previously had prevented complex learned models from being fielded in this domain, but because it is intelligible and modular allows these patterns to be recognized and removed. In the 30-day hospital readmission case study, we show that the same methods scale to large datasets containing hundreds of thousands of patients and thousands of attributes while remaining intelligible and providing accuracy comparable to the best (unintelligible) machine learning methods.\n\n![https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_6efa5bd55b8a.png](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_6efa5bd55b8a.png)\n\n* [Shapley Decomposition of R-Squared in Machine Learning Models](https:\u002F\u002Farxiv.org\u002Fabs\u002F1908.09718); Nickalus Redell; In this paper we introduce a metric aimed at helping machine learning practitioners quickly summarize and communicate the overall importance of each feature in any black-box machine learning prediction model. Our proposed metric, based on a Shapley-value variance decomposition of the familiar R2 from classical statistics, is a model-agnostic approach for assessing feature importance that fairly allocates the proportion of model-explained variability in the data to each model feature. This metric has several desirable properties including boundedness at 0 and 1 and a feature-level variance decomposition summing to the overall model R2. Our implementation is available in the R package shapFlex. \n\n* [Data Shapley: Equitable Valuation of Data for Machine Learning](https:\u002F\u002Farxiv.org\u002Fabs\u002F1904.02868); Amirata Ghorbani, James Zou; As data becomes the fuel driving technological and economic growth, a fundamental challenge is how to quantify the value of data in algorithmic predictions and decisions. For example, in healthcare and consumer markets, it has been suggested that individuals should be compensated for the data that they generate, but it is not clear what is an equitable valuation for individual data. In this work, we develop a principled framework to address data valuation in the context of supervised machine learning. Given a learning algorithm trained on n data points to produce a predictor, we propose data Shapley as a metric to quantify the value of each training datum to the predictor performance. Data Shapley value uniquely satisfies several natural properties of equitable data valuation. We develop Monte Carlo and gradient-based methods to efficiently estimate data Shapley values in practical settings where complex learning algorithms, including neural networks, are trained on large datasets. 
In addition to being equitable, extensive experiments across biomedical, image and synthetic data demonstrate that data Shapley has several other benefits: 1) it is more powerful than the popular leave-one-out or leverage score in providing insight on what data is more valuable for a given learning task; 2) low Shapley value data effectively capture outliers and corruptions; 3) high Shapley value data inform what type of new data to acquire to improve the predictor. \n\n![https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_7c6c751d165e.png](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_7c6c751d165e.png)\n\n* [A Stratification Approach to Partial Dependence for Codependent Variables](https:\u002F\u002Farxiv.org\u002Fabs\u002F1907.06698); Terence Parr, James Wilson; Model interpretability is important to machine learning practitioners, and a key component of interpretation is the characterization of partial dependence of the response variable on any subset of features used in the model. The two most common strategies for assessing partial dependence suffer from a number of critical weaknesses. In the first strategy, linear regression model coefficients describe how a unit change in an explanatory variable changes the response, while holding other variables constant. But, linear regression is inapplicable for high dimensional (p>n) data sets and is often insufficient to capture the relationship between explanatory variables and the response. In the second strategy, Partial Dependence (PD) plots and Individual Conditional Expectation (ICE) plots give biased results for the common situation of codependent variables and they rely on fitted models provided by the user. When the supplied model is a poor choice due to systematic bias or overfitting, PD\u002FICE plots provide little (if any) useful information. To address these issues, we introduce a new strategy, called StratPD, that does not depend on a user's fitted model, provides accurate results in the presence of codependent variables, and is applicable to high dimensional settings. The strategy works by stratifying a data set into groups of observations that are similar, except in the variable of interest, through the use of a decision tree. Any fluctuations of the response variable within a group are likely due to the variable of interest. We apply StratPD to a collection of simulations and case studies to show that StratPD is a fast, reliable, and robust method for assessing partial dependence with clear advantages over state-of-the-art methods. \n\n![https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_29fd04707d99.png](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_29fd04707d99.png)\n\n* [DLIME: A Deterministic Local Interpretable Model-Agnostic Explanations Approach for Computer-Aided Diagnosis Systems](https:\u002F\u002Farxiv.org\u002Fabs\u002F1906.10263); Muhammad Rehman Zafar, Naimul Mefraz Khan; While LIME and similar local algorithms have gained popularity due to their simplicity, the random perturbation and feature selection methods result in \"instability\" in the generated explanations, where for the same prediction, different explanations can be generated. This is a critical issue that can prevent deployment of LIME in a Computer-Aided Diagnosis (CAD) system, where stability is of utmost importance to earn the trust of medical professionals. 
In this paper, we propose a deterministic version of LIME. Instead of random perturbation, we utilize agglomerative Hierarchical Clustering (HC) to group the training data together and K-Nearest Neighbour (KNN) to select the relevant cluster of the new instance that is being explained. After finding the relevant cluster, a linear model is trained over the selected cluster to generate the explanations. Experimental results on three different medical datasets show the superiority of Deterministic Local Interpretable Model-Agnostic Explanations (DLIME), where we quantitatively determine the stability of DLIME compared to LIME utilizing the Jaccard similarity among multiple generated explanations.\n\n![https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_a9f2c7c78154.png](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_a9f2c7c78154.png)\n\n* [Exploiting patterns to explain individual predictions](https:\u002F\u002Fpeople.eng.unimelb.edu.au\u002Fbaileyj\u002Fpapers\u002FKAIS2019.pdf); Yunzhe Jia, James Bailey, Kotagiri Ramamohanarao, Christopher Leckie, Xingjun Ma; Users need to understand the predictions of a classifier, especially when decisions based on the predictions can have severe consequences. The explanation of a prediction reveals the reason why a classifier makes a certain prediction and it helps users to accept or reject the prediction with greater confidence. This paper proposes an explanation method called Pattern Aided Local Explanation (PALEX) to provide instance-level explanations for any classifier. PALEX takes a classifier, a test instance and a frequent pattern set summarizing the training data of the classifier as inputs, then outputs the supporting evidence that the classifier considers important for the prediction of the instance. To study the local behavior of a classifier in the vicinity of the test instance, PALEX uses the frequent pattern set from the training data as an extra input to guide generation of new synthetic samples in the vicinity of the test instance. Contrast patterns are also used in PALEX to identify locally discriminative features in the vicinity of a test instance. PALEX is particularly effective for scenarios where there exist multiple explanations.\n\n\n![https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_d73c1e951e4c.png](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_d73c1e951e4c.png)\n\n* [Fair is Better than Sensational: Man is to Doctor as Woman is to Doctor](https:\u002F\u002Farxiv.org\u002Fabs\u002F1905.09866); Malvina Nissim, Rik van Noord, Rob van der Goot; Analogies such as man is to king as woman is to X are often used to illustrate the amazing power of word embeddings. Concurrently, they have also exposed how strongly human biases are encoded in vector spaces built on natural language. While finding that queen is the answer to man is to king as woman is to X leaves us in awe, papers have also reported finding analogies deeply infused with human biases, like man is to computer programmer as woman is to homemaker, which instead leave us with worry and rage. In this work we show that, often unknowingly, embedding spaces have not been treated fairly. Through a series of simple experiments, we highlight practical and theoretical problems in previous works, and demonstrate that some of the most widely used biased analogies are in fact not supported by the data. 
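\n\nThe point made by Nissim et al. is easy to reproduce: the standard analogy query filters the input words out of the candidate answers, so "doctor" can never be returned for man:doctor :: woman:?. A minimal sketch, assuming gensim and a small pretrained GloVe model (both are illustrative choices, and the exact rankings depend on the embedding used):

```python
# Hedged sketch: compare the constrained analogy query with an unconstrained one.
import gensim.downloader as api

wv = api.load("glove-wiki-gigaword-100")  # any pretrained KeyedVectors will do

# Standard analogy query: gensim filters the input words out of the results,
# so "doctor" itself cannot appear among the answers.
print(wv.most_similar(positive=["doctor", "woman"], negative=["man"], topn=5))

# Unconstrained variant: score the raw offset vector against the whole vocabulary;
# the query word "doctor" typically comes back as the nearest neighbour.
offset = wv["doctor"] - wv["man"] + wv["woman"]
print(wv.similar_by_vector(offset, topn=5))
```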
\n\n![https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_421ae2cf95ac.png](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_421ae2cf95ac.png)\n\n* [Interpretable Counterfactual Explanations Guided by Prototypes](https:\u002F\u002Farxiv.org\u002Fabs\u002F1907.02584); Arnaud Van Looveren, Janis Klaise; We propose a fast, model agnostic method for finding interpretable counterfactual explanations of classifier predictions by using class prototypes. We show that class prototypes, obtained using either an encoder or through class specific k-d trees, significantly speed up the search for counterfactual instances and result in more interpretable explanations. We introduce two novel metrics to quantitatively evaluate local interpretability at the instance level. We use these metrics to illustrate the effectiveness of our method on an image and tabular dataset, respectively MNIST and Breast Cancer Wisconsin (Diagnostic).\n* [Learning Explainable Models Using Attribution Priors](https:\u002F\u002Farxiv.org\u002Fabs\u002F1906.10670); Gabriel Erion, Joseph D. Janizek, Pascal Sturmfels, Scott Lundberg, Su-In Lee; Two important topics in deep learning both involve incorporating humans into the modeling process: Model priors transfer information from humans to a model by constraining the model's parameters; Model attributions transfer information from a model to humans by explaining the model's behavior. We propose connecting these topics with attribution priors, which allow humans to use the common language of attributions to enforce prior expectations about a model's behavior during training. We develop a differentiable axiomatic feature attribution method called expected gradients and show how to directly regularize these attributions during training. We demonstrate the broad applicability of attribution priors: 1) on image data, 2) on gene expression data, 3) on a health care dataset.  \n\n![https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_52b76eef8c9b.png](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_52b76eef8c9b.png)\n\n* [Guidelines for Responsible and Human-Centered Use of Explainable Machine Learning](https:\u002F\u002Farxiv.org\u002Fabs\u002F1906.03533); Patrick Hall; Explainable machine learning (ML) has been implemented in numerous open source and proprietary software packages and explainable ML is an important aspect of commercial predictive modeling. However, explainable ML can be misused, particularly as a faulty safeguard for harmful black-boxes, e.g. fairwashing, and for other malevolent purposes like model stealing. This text discusses definitions, examples, and guidelines that promote a holistic and human-centered approach to ML which includes interpretable (i.e. white-box) models and explanatory, debugging, and disparate impact analysis techniques. \n\n* [Concept Tree: High-Level Representation of Variables for More Interpretable Surrogate Decision Trees](https:\u002F\u002Farxiv.org\u002Fabs\u002F1906.01297); Xavier Renard, Nicolas Woloszko, Jonathan Aigrain, Marcin Detyniecki; Interpretable surrogates of black-box predictors trained on high-dimensional tabular datasets can struggle to generate comprehensible explanations in the presence of correlated variables. We propose a model-agnostic interpretable surrogate that provides global and local explanations of black-box classifiers to address this issue. 
We introduce the idea of concepts as intuitive groupings of variables that are either defined by a domain expert or automatically discovered using correlation coefficients. Concepts are embedded in a surrogate decision tree to enhance its comprehensibility. \n\n![https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_afba0ab6a7cb.png](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_afba0ab6a7cb.png)\n\n* [The Secrets of Machine Learning: Ten Things You Wish You Had Known Earlier to be More Effective at Data Analysis](https:\u002F\u002Farxiv.org\u002Fabs\u002F1906.01998v1); Cynthia Rudin, David Carlson; Despite the widespread usage of machine learning throughout organizations, there are some key principles that are commonly missed. In particular: 1) There are at least four main families for supervised learning: logical modeling methods, linear combination methods, case-based reasoning methods, and iterative summarization methods. 2) For many application domains, almost all machine learning methods perform similarly (with some caveats). Deep learning methods, which are the leading technique for computer vision problems, do not maintain an edge over other methods for most problems (and there are reasons why). 3) Neural networks are hard to train and weird stuff often happens when you try to train them. 4) If you don't use an interpretable model, you can make bad mistakes. 5) Explanations can be misleading and you can't trust them. 6) You can pretty much always find an accurate-yet-interpretable model, even for deep neural networks. 7) Special properties such as decision making or robustness must be built in, they don't happen on their own. 8) Causal inference is different than prediction (correlation is not causation). 9) There is a method to the madness of deep neural architectures, but not always. 10) It is a myth that artificial intelligence can do anything. \n\n![https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_c0e7b5f18911.png](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_c0e7b5f18911.png)\n\n* [Proposals for model vulnerability and security](https:\u002F\u002Fwww.oreilly.com\u002Fideas\u002Fproposals-for-model-vulnerability-and-security); Patrick Hall;  Apply fair and private models, white-hat and forensic model debugging, and common sense to protect machine learning models from malicious actors. \n\n![https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_49cc095f9a50.png](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_49cc095f9a50.png)\n\n* [On Explainable Machine Learning Misconceptions and A More Human-Centered Machine Learning](https:\u002F\u002Fgithub.com\u002Fjphall663\u002Fxai_misconceptions\u002Fblob\u002Fmaster\u002Fxai_misconceptions.pdf); Patrick Hall; Due to obvious community and commercial demand, explainable machine learning (ML) methods have already been implemented in popular open source software and in commercial software. Yet, as someone who has been involved in the implementation of explainable ML software for the past three years, I find a lot of what I read about the topic confusing and detached from my personal, hands-on experiences. This short text presents arguments, proposals, and references to address some observed explainable ML misconceptions. 
\n\n![https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_c5da3cffce71.png](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_c5da3cffce71.png)\n\n* [Model Cards for Model Reporting](https:\u002F\u002Farxiv.org\u002Fabs\u002F1810.03993); Margaret Mitchell, Simone Wu, Andrew Zaldivar, Parker Barnes, Lucy Vasserman, Ben Hutchinson, Elena Spitzer, Inioluwa Deborah Raji, Timnit Gebru; Trained machine learning models are increasingly used to perform high-impact tasks in areas such as law enforcement, medicine, education, and employment. In order to clarify the intended use cases of machine learning models and minimize their usage in contexts for which they are not well suited, we recommend that released models be accompanied by documentation detailing their performance characteristics. In this paper, we propose a framework that we call model cards, to encourage such transparent model reporting. Model cards are short documents accompanying trained machine learning models that provide benchmarked evaluation in a variety of conditions, such as across different cultural, demographic, or phenotypic groups (e.g., race, geographic location, sex, Fitzpatrick skin type) and intersectional groups (e.g., age and race, or sex and Fitzpatrick skin type) that are relevant to the intended application domains. Model cards also disclose the context in which models are intended to be used, details of the performance evaluation procedures, and other relevant information. While we focus primarily on human-centered machine learning models in the application fields of computer vision and natural language processing, this framework can be used to document any trained machine learning model. To solidify the concept, we provide cards for two supervised models: One trained to detect smiling faces in images, and one trained to detect toxic comments in text. We propose model cards as a step towards the responsible democratization of machine learning and related AI technology, increasing transparency into how well AI technology works. We hope this work encourages those releasing trained machine learning models to accompany model releases with similar detailed evaluation numbers and other relevant documentation. \n\n![https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_6dd6997c2d53.png](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_6dd6997c2d53.png)\n\n* [Unbiased Measurement of Feature Importance in Tree-Based Methods](https:\u002F\u002Farxiv.org\u002Fabs\u002F1903.05179); Zhengze Zhou, Giles Hooker; We propose a modification that corrects for split-improvement variable importance measures in Random Forests and other tree-based methods. These methods have been shown to be biased towards increasing the importance of features with more potential splits. We show that by appropriately incorporating split-improvement as measured on out of sample data, this bias can be corrected yielding better summaries and screening tools. \n* [Please Stop Permuting Features: An Explanation and Alternatives](https:\u002F\u002Farxiv.org\u002Fabs\u002F1905.03151); Giles Hooker, Lucas Mentch;   This paper advocates against permute-and-predict (PaP) methods for interpreting black box functions. 
Methods such as the variable importance measures proposed for random forests, partial dependence plots, and individual conditional expectation plots remain popular because of their ability to provide model-agnostic measures that depend only on the pre-trained model output. However, numerous studies have found that these tools can produce diagnostics that are highly misleading, particularly when there is strong dependence among features. Rather than simply add to this growing literature by further demonstrating such issues, here we seek to provide an explanation for the observed behavior. In particular, we argue that breaking dependencies between features in hold-out data places undue emphasis on sparse regions of the feature space by forcing the original model to extrapolate to regions where there is little to no data. We explore these effects through various settings where a ground-truth is understood and find support for previous claims in the literature that PaP metrics tend to over-emphasize correlated features both in variable importance and partial dependence plots, even though applying permutation methods to the ground-truth models does not. As an alternative, we recommend more direct approaches that have proven successful in other settings: explicitly removing features, conditional permutations, or model distillation methods. \n    \n![https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_b7fc695a1c83.png](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_b7fc695a1c83.png)\n    \n* [Why should you trust my interpretation? Understanding uncertainty in LIME predictions](https:\u002F\u002Farxiv.org\u002Fabs\u002F1904.12991.pdf); Hui Fen (Sarah) Tan, Kuangyan Song, Madeleine Udell, Yiming Sun, Yujia Zhang; Methods for interpreting machine learning black-box models increase the outcomes' transparency and in turn generate insight into the reliability and fairness of the algorithms. However, the interpretations themselves could contain significant uncertainty that undermines the trust in the outcomes and raises concern about the model's reliability. Focusing on the method \"Local Interpretable Model-agnostic Explanations\" (LIME), we demonstrate the presence of two sources of uncertainty, namely the randomness in its sampling procedure and the variation of interpretation quality across different input data points. Such uncertainty is present even in models with high training and test accuracy. We apply LIME to synthetic data and two public data sets, text classification in 20 Newsgroup and recidivism risk-scoring in COMPAS, to support our argument. \n* [Aequitas: A Bias and Fairness Audit Toolkit](https:\u002F\u002Farxiv.org\u002Fabs\u002F1811.05577); Pedro Saleiro, Benedict Kuester, Loren Hinkson, Jesse London, Abby Stevens, Ari Anisfeld, Kit T. Rodolfa, Rayid Ghani; Recent work has raised concerns on the risk of unintended bias in AI systems being used nowadays that can affect individuals unfairly based on race, gender or religion, among other possible characteristics. While a lot of bias metrics and fairness definitions have been proposed in recent years, there is no consensus on which metric\u002Fdefinition should be used and there are very few available resources to operationalize them. Therefore, despite recent awareness, auditing for bias and fairness when developing and deploying AI systems is not yet a standard practice. 
We present Aequitas, an open source bias and fairness audit toolkit that is an intuitive and easy to use addition to the machine learning workflow, enabling users to seamlessly test models for several bias and fairness metrics in relation to multiple population sub-groups. Aequitas facilitates informed and equitable decisions around developing and deploying algorithmic decision making systems for both data scientists, machine learning researchers and policymakers. \n\n![https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_016e18cc2424.png](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_016e18cc2424.png)\n\n* [Variable Importance Clouds: A Way to Explore Variable Importance for the Set of Good Models](https:\u002F\u002Farxiv.org\u002Fabs\u002F1901.03209); Jiayun Dong, Cynthia Rudin; Variable importance is central to scientific studies, including the social sciences and causal inference, healthcare, and in other domains. However, current notions of variable importance are often tied to a specific predictive model. This is problematic: what if there were multiple well-performing predictive models, and a specific variable is important to some of them and not to others? In that case, we may not be able to tell from a single well-performing model whether a variable is always important in predicting the outcome. Rather than depending on variable importance for a single predictive model, we would like to explore variable importance for all approximately-equally-accurate predictive models. This work introduces the concept of a variable importance cloud, which maps every variable to its importance for every good predictive model. We show properties of the variable importance cloud and draw connections to other areas of statistics. We introduce variable importance diagrams as a projection of the variable importance cloud into two dimensions for visualization purposes. \n\n![https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_fe00cef53fcb.png](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_fe00cef53fcb.png)\n\n* [A systematic review shows no performance benefit of machine learning over logistic regression for clinical prediction models](https:\u002F\u002Fwww.sciencedirect.com\u002Fscience\u002Farticle\u002Fpii\u002FS0895435618310813); Evangelia Christodoulou, Jie Ma, Gary Collins, Ewout Steyerberg, Jan Verbakel, Ben Van Calster; Objectives: The objective of this study was to compare performance of logistic regression (LR) with machine learning (ML) for clinical prediction modeling in the literature. Study Design and Setting: We conducted a Medline literature search (1\u002F2016 to 8\u002F2017) and extracted comparisons between LR and ML models for binary outcomes. Results: We included 71 of 927 studies. The median sample size was 1,250 (range 72–3,994,872), with 19 predictors considered (range 5–563) and eight events per predictor (range 0.3–6,697). The most common ML methods were classification trees, random forests, artificial neural networks, and support vector machines. In 48 (68%) studies, we observed potential bias in the validation procedures. Sixty-four (90%) studies used the area under the receiver operating characteristic curve (AUC) to assess discrimination. Calibration was not addressed in 56 (79%) studies. We identified 282 comparisons between an LR and ML model (AUC range, 0.52–0.99). 
For 145 comparisons at low risk of bias, the difference in logit(AUC) between LR and ML was 0.00 (95% confidence interval, −0.18 to 0.18). For 137 comparisons at high risk of bias, logit(AUC) was 0.34 (0.20–0.47) higher for ML. Conclusion: We found no evidence of superior performance of ML over LR. Improvements in methodology and reporting are needed for studies that compare modeling algorithms.\n\n![https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_96060aa30437.png](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_96060aa30437.png)\n\n* [iBreakDown: Uncertainty of Model Explanations for Non-additive Predictive Models](https:\u002F\u002Farxiv.org\u002Fabs\u002F1903.11420); Alicja Gosiewska, Przemyslaw Biecek; Explainable Artificial Intelligence (XAI) brings a lot of attention recently. Explainability is being presented as a remedy for lack of trust in model predictions. Model agnostic tools such as LIME, SHAP, or Break Down promise instance level interpretability for any complex machine learning model. But how certain are these explanations? Can we rely on additive explanations for non-additive models? In this paper, we examine the behavior of model explainers under the presence of interactions. We define two sources of uncertainty, model level uncertainty, and explanation level uncertainty. We show that adding interactions reduces explanation level uncertainty. We introduce a new method iBreakDown that generates non-additive explanations with local interaction. \n\n![https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_3d27e10c0441.png](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_3d27e10c0441.png)\n\n* [Sampling, Intervention, Prediction, Aggregation: A Generalized Framework for Model Agnostic Interpretations](https:\u002F\u002Farxiv.org\u002Fabs\u002F1904.03959); Christian A. Scholbeck, Christoph Molnar, Christian Heumann, Bernd Bischl, Giuseppe Casalicchio; Non-linear machine learning models often trade off a great predictive performance for a lack of interpretability. However, model agnostic interpretation techniques now allow us to estimate the effect and importance of features for any predictive model. Different notations and terminology have complicated their understanding and how they are related. A unified view on these methods has been missing. We present the generalized SIPA (Sampling, Intervention, Prediction, Aggregation) framework of work stages for model agnostic interpretation techniques and demonstrate how several prominent methods for feature effects can be embedded into the proposed framework. We also formally introduce pre-existing marginal effects to describe feature effects for black box models. Furthermore, we extend the framework to feature importance computations by pointing out how variance-based and performance-based importance measures are based on the same work stages. The generalized framework may serve as a guideline to conduct model agnostic interpretations in machine learning. 
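\n\nThe four SIPA stages map directly onto familiar techniques; partial dependence is the simplest instance. A minimal sketch, with an arbitrary scikit-learn model and dataset standing in for the black box (all names below are illustrative, not from the paper):

```python
# Partial dependence written out as the SIPA work stages.
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

X, y = load_diabetes(return_X_y=True)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

feature = 2                                    # feature whose effect we profile
grid = np.linspace(X[:, feature].min(), X[:, feature].max(), num=20)

# Sampling: draw a subset of observations to keep the computation cheap.
rng = np.random.default_rng(0)
sample = X[rng.choice(len(X), size=100, replace=False)]

pd_curve = []
for value in grid:
    intervened = sample.copy()
    intervened[:, feature] = value             # Intervention: force the feature to a grid value
    predictions = model.predict(intervened)    # Prediction: query the black-box model
    pd_curve.append(predictions.mean())        # Aggregation: average over the sample

print(list(zip(np.round(grid, 3), np.round(pd_curve, 1))))
```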
\n\n![https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_e7f6e4e95509.png](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_e7f6e4e95509.png)\n\n* [Quantifying Interpretability of Arbitrary Machine Learning Models Through Functional Decomposition](https:\u002F\u002Farxiv.org\u002Fabs\u002F1904.03867); Christoph Molnar, Giuseppe Casalicchio, Bernd Bischl; To obtain interpretable machine learning models, either interpretable models are constructed from the outset - e.g. shallow decision trees, rule lists, or sparse generalized linear models - or post-hoc interpretation methods - e.g. partial dependence or ALE plots - are employed. Both approaches have disadvantages. While the former can restrict the hypothesis space too conservatively, leading to potentially suboptimal solutions, the latter can produce too verbose or misleading results if the resulting model is too complex, especially w.r.t. feature interactions. We propose to make the compromise between predictive power and interpretability explicit by quantifying the complexity \u002F interpretability of machine learning models. Based on functional decomposition, we propose measures of number of features used, interaction strength and main effect complexity. We show that post-hoc interpretation of models that minimize the three measures becomes more reliable and compact. Furthermore, we demonstrate the application of such measures in a multi-objective optimization approach which considers predictive power and interpretability at the same time. \n\n![https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_059f816b3e00.png](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_059f816b3e00.png)\n\n* [One pixel attack for fooling deep neural networks](https:\u002F\u002Farxiv.org\u002Fabs\u002F1710.08864); Jiawei Su, Danilo Vasconcellos Vargas, Sakurai Kouichi; Recent research has revealed that the output of Deep Neural Networks (DNN) can be easily altered by adding relatively small perturbations to the input vector. In this paper, we analyze an attack in an extremely limited scenario where only one pixel can be modified. For that we propose a novel method for generating one-pixel adversarial perturbations based on differential evolution (DE). It requires less adversarial information (a black-box attack) and can fool more types of networks due to the inherent features of DE. The results show that 68.36% of the natural images in CIFAR-10 test dataset and 41.22% of the ImageNet (ILSVRC 2012) validation images can be perturbed to at least one target class by modifying just one pixel with 73.22% and 5.52% confidence on average. Thus, the proposed attack explores a different take on adversarial machine learning in an extremely limited scenario, showing that current DNNs are also vulnerable to such low dimension attacks. Besides, we also illustrate an important application of DE (or broadly speaking, evolutionary computation) in the domain of adversarial machine learning: creating tools that can effectively generate low-cost adversarial attacks against neural networks for evaluating robustness. 
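\n\nThe attack itself is simple to prototype on a small scale. A toy sketch in the spirit of the paper, using SciPy's differential evolution against a scikit-learn classifier on 8x8 digit images rather than the CNNs and datasets used by Su et al. (model, dataset and search budget are illustrative assumptions):

```python
# One-pixel attack sketch: search (row, col, new_value) with differential evolution
# so as to minimise the classifier's confidence in the true class.
import numpy as np
from scipy.optimize import differential_evolution
from sklearn.datasets import load_digits
from sklearn.neural_network import MLPClassifier

digits = load_digits()
X, y = digits.data / 16.0, digits.target          # 8x8 images, pixels scaled to [0, 1]
clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0).fit(X, y)

image, label = X[0], y[0]

def objective(params):
    """Return the true-class probability after overwriting a single pixel."""
    row, col, value = int(round(params[0])), int(round(params[1])), params[2]
    perturbed = image.copy()
    perturbed[row * 8 + col] = value
    return clf.predict_proba(perturbed.reshape(1, -1))[0, label]

result = differential_evolution(objective, bounds=[(0, 7), (0, 7), (0, 1)],
                                maxiter=30, seed=0)
print("true-class confidence after the one-pixel change:", round(result.fun, 4))
```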
\n\n![https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_5eaaed96cbf2.png](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_5eaaed96cbf2.png)\n\n* [VINE: Visualizing Statistical Interactions in Black Box Models](https:\u002F\u002Farxiv.org\u002Fabs\u002F1904.00561); Matthew Britton; As machine learning becomes more pervasive, there is an urgent need for interpretable explanations of predictive models. Prior work has developed effective methods for visualizing global model behavior, as well as generating local (instance-specific) explanations. However, relatively little work has addressed regional explanations - how groups of similar instances behave in a complex model, and the related issue of visualizing statistical feature interactions. The lack of utilities available for these analytical needs hinders the development of models that are mission-critical, transparent, and align with social goals. We present VINE (Visual INteraction Effects), a novel algorithm to extract and visualize statistical interaction effects in black box models. We also present a novel evaluation metric for visualizations in the interpretable ML space. \n\n![https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_0e9c6127ab31.png](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_0e9c6127ab31.png)\n\n* [Clinical applications of machine learning algorithms: beyond the black box](https:\u002F\u002Fwww.bmj.com\u002Fcontent\u002F364\u002Fbmj.l886); David Watson et al; Machine learning algorithms may radically improve our ability to diagnose and treat disease; For moral, legal, and scientific reasons, it is essential that doctors and patients be able to understand and explain the predictions of these models; Scalable, customisable, and ethical solutions can be achieved by working together with relevant stakeholders, including patients, data scientists, and policy makers\n* [ICIE 1.0: A Novel Tool for Interactive Contextual Interaction Explanations](http:\u002F\u002Fwwwis.win.tue.nl\u002F~wouter\u002FPubl\u002FW6-ICIE.pdf); Simon B. van der Zon et al; With the rise of new laws around privacy and awareness, explanation of automated decision making becomes increasingly important. Nowadays, machine learning models are used to aid experts in domains such as banking and insurance to find suspicious transactions, approve loans and credit card applications. Companies using such systems have to be able to provide the rationale behind their decisions; blindly relying on the trained model is not sufficient. There are currently a number of methods that provide insights in models and their decisions, but often they are either good at showing global or local behavior. Global behavior is often too complex to visualize or comprehend, so approximations are shown, and visualizing local behavior is often misleading as it is difficult to define what local exactly means (i.e. our methods don’t “know” how easily a feature-value can be changed; which ones are flexible, and which ones are static). 
We introduce the ICIE framework (Interactive Contextual Interaction Explanations) which enables users to view explanations of individual instances under different contexts. We will see that various contexts for the same case lead to different explanations, revealing different feature interactions.\n\n![https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_b5e6cf803dc3.png](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_b5e6cf803dc3.png)\n\n* [Explanation in Human-AI Systems: A Literature Meta-Review, Synopsis of Key Ideas and Publications, and Bibliography for Explainable AI](https:\u002F\u002Farxiv.org\u002Fabs\u002F1902.01876v1); Shane T. Mueller, Robert R. Hoffman, William Clancey, Abigail Emrey, Gary Klein; This is an integrative review that addresses the question, \"What makes for a good explanation?\" with reference to AI systems. Pertinent literatures are vast. Thus, this review is necessarily selective. That said, most of the key concepts and issues are expressed in this Report. The Report encapsulates the history of computer science efforts to create systems that explain and instruct (intelligent tutoring systems and expert systems). The Report expresses the explainability issues and challenges in modern AI, and presents capsule views of the leading psychological theories of explanation. Certain articles stand out by virtue of their particular relevance to XAI, and their methods, results, and key points are highlighted. \n* [Explaining Explanations: An Overview of Interpretability of Machine Learning](https:\u002F\u002Farxiv.org\u002Fabs\u002F1806.00069); Leilani H. Gilpin, David Bau, Ben Z. Yuan, Ayesha Bajwa, Michael Specter, Lalana Kagal; There has recently been a surge of work in explanatory artificial intelligence (XAI). This research area tackles the important problem that complex machines and algorithms often cannot provide insights into their behavior and thought processes. XAI allows users and parts of the internal system to be more transparent, providing explanations of their decisions in some level of detail. These explanations are important to ensure algorithmic fairness, identify potential bias\u002Fproblems in the training data, and to ensure that the algorithms perform as expected. However, explanations produced by these systems are neither standardized nor systematically assessed. In an effort to create best practices and identify open challenges, we provide our definition of explainability and show how it can be used to classify existing literature. We discuss why current approaches to explanatory methods especially for deep neural networks are insufficient. \n* [SAFE ML: Surrogate Assisted Feature Extraction for Model Learning](https:\u002F\u002Farxiv.org\u002Fabs\u002F1902.11035); Alicja Gosiewska, Aleksandra Gacek, Piotr Lubon, Przemyslaw Biecek; Complex black-box predictive models may have high accuracy, but opacity causes problems like lack of trust, lack of stability, sensitivity to concept drift. On the other hand, interpretable models require more work related to feature engineering, which is very time consuming. Can we train interpretable and accurate models, without time-consuming feature engineering? In this article, we show a method that uses elastic black-boxes as surrogate models to create simpler, less opaque, yet still accurate and interpretable glass-box models. New models are created on newly engineered features extracted\u002Flearned with the help of a surrogate model. 
We show applications of this method for model level explanations and possible extensions for instance level explanations. We also present an example implementation in Python and benchmark this method on a number of tabular data sets.\n* [Attention is not Explanation](https:\u002F\u002Farxiv.org\u002Fabs\u002F1902.10186); Sarthak Jain, Byron C. Wallace; Attention mechanisms have seen wide adoption in neural NLP models. In addition to improving predictive performance, these are often touted as affording transparency: models equipped with attention provide a distribution over attended-to input units, and this is often presented (at least implicitly) as communicating the relative importance of inputs. However, it is unclear what relationship exists between attention weights and model outputs. In this work, we perform extensive experiments across a variety of NLP tasks that aim to assess the degree to which attention weights provide meaningful `explanations` for predictions. We find that they largely do not. For example, learned attention weights are frequently uncorrelated with gradient-based measures of feature importance, and one can identify very different attention distributions that nonetheless yield equivalent predictions. Our findings show that standard attention modules do not provide meaningful explanations and should not be treated as though they do. \n* [Efficient Search for Diverse Coherent Explanations](https:\u002F\u002Farxiv.org\u002Fabs\u002F1901.04909); Chris Russell; This paper proposes new search algorithms for counterfactual explanations based upon mixed integer programming. We are concerned with complex data in which variables may take any value from a contiguous range or an additional set of discrete states. We propose a novel set of constraints that we refer to as a \"mixed polytope\" and show how this can be used with an integer programming solver to efficiently find coherent counterfactual explanations i.e. solutions that are guaranteed to map back onto the underlying data structure, while avoiding the need for brute-force enumeration. We also look at the problem of diverse explanations and show how these can be generated within our framework.\n* [Seven Myths in Machine Learning Research](https:\u002F\u002Farxiv.org\u002Fabs\u002F1902.06789v1); Oscar Chang, Hod Lipson; As deep learning becomes more and more ubiquitous in high stakes applications like medical imaging, it is important to be careful of how we interpret decisions made by neural networks. For example, while it would be nice to have a CNN identify a spot on an MRI image as a malignant cancer-causing tumor, these results should not be trusted if they are based on fragile interpretation methods\n* [Towards Aggregating Weighted Feature Attributions](https:\u002F\u002Farxiv.org\u002Fabs\u002F1901.10040); Umang Bhatt, Pradeep Ravikumar, Jose M. F. Moura; Current approaches for explaining machine learning models fall into two distinct classes: antecedent event influence and value attribution. The former leverages training instances to describe how much influence a training point exerts on a test point, while the latter attempts to attribute value to the features most pertinent to a given prediction. In this work, we discuss an algorithm, AVA: Aggregate Valuation of Antecedents, that fuses these two explanation classes to form a new approach to feature attribution that not only retrieves local explanations but also captures global patterns learned by a model. 
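\n\nThe counterfactual search problem that Russell formulates as a mixed-integer program can be illustrated with a deliberately naive baseline: scan single-feature changes and keep the cheapest one that flips the prediction. This brute-force sketch (toy dataset and model, not the paper's method) only shows the objective, not the mixed-polytope machinery or the diversity constraints:

```python
# Naive single-feature counterfactual search on a toy classifier.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)
clf = LogisticRegression(max_iter=5000).fit(X, y)

x = X[0].copy()
original_class = clf.predict(x.reshape(1, -1))[0]

best = None  # (cost in feature standard deviations, feature index, new value)
for feature in range(X.shape[1]):
    for value in np.linspace(X[:, feature].min(), X[:, feature].max(), num=50):
        candidate = x.copy()
        candidate[feature] = value
        if clf.predict(candidate.reshape(1, -1))[0] != original_class:
            cost = abs(value - x[feature]) / (X[:, feature].std() + 1e-9)
            if best is None or cost < best[0]:
                best = (cost, feature, value)

print("original class:", original_class)
print("cheapest single-feature counterfactual (cost, feature, value):", best)
```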
\n* [An Evaluation of the Human-Interpretability of Explanation](https:\u002F\u002Farxiv.org\u002Fabs\u002F1902.00006v1); Isaac Lage, Emily Chen, Jeffrey He, Menaka Narayanan, Been Kim, Sam Gershman, Finale Doshi-Velez; What kinds of explanation are truly human-interpretable remains poorly understood. This work advances our understanding of what makes explanations interpretable under three specific tasks that users may perform with machine learning systems: simulation of the response, verification of a suggested response, and determining whether the correctness of a suggested response changes under a change to the inputs. Through carefully controlled human-subject experiments, we identify regularizers that can be used to optimize for the interpretability of machine learning systems. Our results show that the type of complexity matters: cognitive chunks (newly defined concepts) affect performance more than variable repetitions, and these trends are consistent across tasks and domains. This suggests that there may exist some common design principles for explanation systems.\n* [Interpretable machine learning: definitions, methods, and applications](https:\u002F\u002Fexport.arxiv.org\u002Fpdf\u002F1901.04592); W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu; Machine-learning models have demonstrated great success in learning complex patterns that enable them to make predictions about unobserved data. In addition to using models for prediction, the ability to interpret what a model has learned is receiving an increasing amount of attention. However, this increased focus has led to considerable confusion about the notion of interpretability. In particular, it is unclear how the wide array of proposed interpretation methods are related, and what common concepts can be used to evaluate them.\n* [Learning Optimal and Fair Decision Trees for Non-Discriminative Decision-Making](http:\u002F\u002Fwww-bcf.usc.edu\u002F~vayanou\u002Fpapers\u002F2019\u002FFair_DT_AAAI_2019_CameraReady.pdf); Sina Aghaei, Mohammad Javad Azizi, Phebe Vayanos; In recent years, automated data-driven decision-making systems have enjoyed a tremendous success in a variety of fields (e.g., to make product recommendations, or to guide the production of entertainment). More recently, these algorithms are increasingly being used to assist socially sensitive decision-making (e.g., to decide who to admit into a degree program or to prioritize individuals for public housing). Yet, these automated tools may result in discriminative decision-making in the sense that they may treat individuals unfairly or unequally based on membership to a category or a minority, resulting in disparate treatment or disparate impact and violating both moral and ethical standards. This may happen when the training dataset is itself biased (e.g., if individuals belonging to a particular group have historically been discriminated upon). However, it may also happen when the training dataset is unbiased, if the errors made by the system affect individuals belonging to a category or minority differently (e.g., if misclassification rates for Blacks are higher than for Whites). In this paper, we unify the definitions of unfairness across classification and regression. We propose a versatile mixed-integer optimization framework for learning optimal and fair decision trees and variants thereof to prevent disparate treatment and\u002For disparate impact as appropriate. 
This translates to a flexible schema for designing fair and interpretable policies suitable for socially sensitive decision-making. We conduct extensive computational studies that show that our framework improves the state-of-the-art in the field (which typically relies on heuristics) to yield non-discriminative decisions at lower cost to overall accuracy.\n\n* [Understanding Individual Decisions of CNNs via Contrastive Backpropagation](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1812.02100.pdf); Jindong Gu, Yinchong Yang, Volker Tresp; A number of backpropagation-based approaches such as DeConvNets, vanilla Gradient Visualization and Guided Backpropagation have been proposed to better understand individual decisions of deep convolutional neural networks. The saliency maps produced by them are proven to be non-discriminative. Recently, the Layer-wise Relevance Propagation (LRP) approach was proposed to explain the classification decisions of rectifier neural networks. In this work, we evaluate the discriminativeness of the generated explanations and analyze the theoretical foundation of LRP, i.e. Deep Taylor Decomposition. The experiments and analysis conclude that the explanations generated by LRP are not class-discriminative. Based on LRP, we propose Contrastive Layer-wise Relevance Propagation (CLRP), which is capable of producing instance-specific, class-discriminative, pixel-wise explanations. In the experiments, we use the CLRP to explain the decisions and understand the difference between neurons in individual classification decisions. We also evaluate the explanations quantitatively with a Pointing Game and an ablation study. Both qualitative and quantitative evaluations show that the CLRP generates better explanations than the LRP. The code is available.\n\n### 2018\n\n* [Conversational Explanations of Machine Learning Predictions Through Class-contrastive Counterfactual Statements](https:\u002F\u002Fwww.ijcai.org\u002Fproceedings\u002F2018\u002F0836.pdf); Kacper Sokol, Peter Flach; Machine learning models have become pervasive in our everyday life; they decide on important matters influencing our education, employment and judicial system. Many of these predictive systems are commercial products protected by trade secrets, hence their decision-making is opaque. Therefore, in our research we address interpretability and explainability of predictions made by machine learning models. Our work draws heavily on human explanation research in social sciences: contrastive and exemplar explanations provided through a dialogue. This user-centric design, focusing on a lay audience rather than domain experts, applied to machine learning allows explainees to drive the explanation to suit their needs instead of being served a precooked template.\n\n* [Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV)](https:\u002F\u002Farxiv.org\u002Fabs\u002F1711.11279); Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, Rory Sayres; The interpretation of deep learning models is a challenge due to their size, complexity, and often opaque internal state. In addition, many systems, such as image classifiers, operate on low-level features rather than high-level concepts. To address these challenges, we introduce Concept Activation Vectors (CAVs), which provide an interpretation of a neural net's internal state in terms of human-friendly concepts. 
The key idea is to view the high-dimensional internal state of a neural net as an aid, not an obstacle. We show how to use CAVs as part of a technique, Testing with CAVs (TCAV), that uses directional derivatives to quantify the degree to which a user-defined concept is important to a classification result--for example, how sensitive a prediction of \"zebra\" is to the presence of stripes. Using the domain of image classification as a testing ground, we describe how CAVs may be used to explore hypotheses and generate insights for a standard image classification network as well as a medical application. [TowardsDataScience](https:\u002F\u002Ftowardsdatascience.com\u002Ftcav-interpretability-beyond-feature-attribution-79b4d3610b4d).\n\n![https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_ac8fe0645fa3.png](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_ac8fe0645fa3.png)\n\n* [Machine Decisions and Human Consequences](https:\u002F\u002Farxiv.org\u002Fabs\u002F1811.06747); Draft of a chapter that has been accepted for publication by Oxford University Press in the forthcoming book “Algorithmic Regulation”; Teresa Scantamburlo, Andrew Charlesworth, Nello Cristianini; The discussion here focuses primarily on the case of enforcement decisions in the criminal justice system, but draws on similar situations emerging from other algorithms utilised in controlling access to opportunities, to explain how machine learning works and, as a result, how decisions are made by modern intelligent algorithms or 'classifiers'. It examines the key aspects of the performance of classifiers, including how classifiers learn, the fact that they operate on the basis of correlation rather than causation, and that the term 'bias' in machine learning has a different meaning to common usage. An example of a real world 'classifier', the Harm Assessment Risk Tool (HART), is examined, through identification of its technical features: the classification method, the training data and the test data, the features and the labels, validation and performance measures. Four normative benchmarks are then considered by reference to HART: (a) prediction accuracy (b) fairness and equality before the law (c) transparency and accountability (d) informational privacy and freedom of expression, in order to demonstrate how its technical features have important normative dimensions that bear directly on the extent to which the system can be regarded as a viable and legitimate support for, or even alternative to, existing human decision-makers. \n\n* [Controversy Rules - Discovering Regions Where Classifiers (Dis-)Agree Exceptionally](https:\u002F\u002Farxiv.org\u002Fabs\u002F1808.07243); Oren Zeev-Ben-Mordehai, Wouter Duivesteijn, Mykola Pechenizkiy; Finding regions for which there is higher controversy among different classifiers is insightful with regard to the domain and our models. Such evaluation can falsify assumptions, assert some, or also bring to attention unknown phenomena. The present work describes an algorithm, which is based on the Exceptional Model Mining framework, and enables that kind of investigation. We explore several public datasets and show the usefulness of this approach in classification tasks. We show in this paper a few interesting observations about those well explored datasets, some of which are general knowledge, and others that, as far as we know, were not reported before.  
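\n\nA much-simplified version of the controversy idea can be prototyped without the Exceptional Model Mining machinery the paper relies on: train two different classifiers and scan simple single-feature subgroups for disagreement rates well above the average (dataset, models and the subgroup language below are illustrative assumptions):

```python
# Find a single-feature subgroup where two classifiers disagree exceptionally often.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB

X, y = load_breast_cancer(return_X_y=True)
rf = RandomForestClassifier(random_state=0).fit(X, y)
nb = GaussianNB().fit(X, y)

disagree = rf.predict(X) != nb.predict(X)
print("overall disagreement rate:", round(disagree.mean(), 3))

best = None  # (disagreement rate, feature index, threshold)
for feature in range(X.shape[1]):
    for threshold in np.quantile(X[:, feature], [0.25, 0.5, 0.75]):
        mask = X[:, feature] > threshold
        if mask.sum() < 30:                    # skip tiny subgroups
            continue
        rate = disagree[mask].mean()
        if best is None or rate > best[0]:
            best = (rate, feature, threshold)

print("most controversial subgroup (rate, feature, threshold):", best)
```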
\n\n![https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_960942485b7a.png](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_960942485b7a.png)\n\n\n* [Stealing Hyperparameters in Machine Learning](https:\u002F\u002Farxiv.org\u002Fabs\u002F1802.05351); Binghui Wang, Neil Zhenqiang Gong; Hyperparameters are critical in machine learning, as different hyperparameters often result in models with significantly different performance. Hyperparameters may be deemed confidential because of their commercial value and the confidentiality of the proprietary algorithms that the learner uses to learn them. In this work, we propose attacks on stealing the hyperparameters that are learned by a learner. We call our attacks hyperparameter stealing attacks. Our attacks are applicable to a variety of popular machine learning algorithms such as ridge regression, logistic regression, support vector machine, and neural network. We evaluate the effectiveness of our attacks both theoretically and empirically. For instance, we evaluate our attacks on Amazon Machine Learning. Our results demonstrate that our attacks can accurately steal hyperparameters. We also study countermeasures. Our results highlight the need for new defenses against our hyperparameter stealing attacks for certain machine learning algorithms. \n* [Distill-and-Compare: Auditing Black-Box Models Using Transparent Model Distillation](https:\u002F\u002Farxiv.org\u002Fabs\u002F1710.06169); Black-box risk scoring models permeate our lives, yet are typically proprietary or opaque. We propose Distill-and-Compare, a model distillation and comparison approach to audit such models. To gain insight into black-box models, we treat them as teachers, training transparent student models to mimic the risk scores assigned by black-box models. We compare the student model trained with distillation to a second un-distilled transparent model trained on ground-truth outcomes, and use differences between the two models to gain insight into the black-box model. Our approach can be applied in a realistic setting, without probing the black-box model API. We demonstrate the approach on four public data sets: COMPAS, Stop-and-Frisk, Chicago Police, and Lending Club. We also propose a statistical test to determine if a data set is missing key features used to train the black-box model. Our test finds that the ProPublica data is likely missing key feature(s) used in COMPAS. \n\n![https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_b55a94189979.png](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_b55a94189979.png)\n\n* [DIVE: A Mixed-Initiative System Supporting Integrated Data Exploration Workflows](https:\u002F\u002Fstatic1.squarespace.com\u002Fstatic\u002F5759bc7886db431d658b7d33\u002Ft\u002F5b969d5c89858325956a939f\u002F1536597342848\u002FDIVE_HILDA_2018.pdf); Kevin Hu et al; Generating knowledge from data is an increasingly important activity. This process of data exploration consists of multiple tasks: data ingestion, visualization, statistical analysis, and storytelling. Though these tasks are complementary, analysts often execute them in separate tools. Moreover, these tools have steep learning curves due to their reliance on manual query specification. Here, we describe the design and implementation of DIVE, a web-based system that integrates state-of-the-art data exploration features into a single tool. 
DIVE contributes a mixed-initiative interaction scheme that combines recommendation with point-and-click manual specification, and a consistent visual language that unifies different stages of the data exploration workflow. In a controlled user study with 67 professional data scientists, we find that DIVE users were significantly more successful and faster than Excel users at completing predefined data visualization and analysis tasks.\n\n![https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_5aadf12c71f4.png](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_5aadf12c71f4.png)\n\n* [Learning Explanatory Rules from Noisy Data](https:\u002F\u002Farxiv.org\u002Fabs\u002F1711.04574); Richard Evans, Edward Grefenstette; Artificial Neural Networks are powerful function approximators capable of modelling solutions to a wide variety of problems, both supervised and unsupervised. As their size and expressivity increases, so too does the variance of the model, yielding a nearly ubiquitous overfitting problem. Although mitigated by a variety of model regularisation methods, the common cure is to seek large amounts of training data---which is not necessarily easily obtained---that sufficiently approximates the data distribution of the domain we wish to test on. In contrast, logic programming methods such as Inductive Logic Programming offer an extremely data-efficient process by which models can be trained to reason on symbolic domains. However, these methods are unable to deal with the variety of domains neural networks can be applied to: they are not robust to noise in or mislabelling of inputs, and perhaps more importantly, cannot be applied to non-symbolic domains where the data is ambiguous, such as operating on raw pixels. In this paper, we propose a Differentiable Inductive Logic framework, which can not only solve tasks which traditional ILP systems are suited for, but shows a robustness to noise and error in the training data which ILP cannot cope with. \n* [Towards Interpretable R-CNN by Unfolding Latent Structures](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1711.05226.pdf); Tianfu Wu, Xilai Li, Xi Song, Wei Sun, Liang Dong and Bo Li; This paper presents a method of learning qualitatively interpretable models in object detection using popular two-stage region-based ConvNet detection systems (i.e., R-CNN). R-CNN consists of a region proposal network and a RoI (Region-of-Interest) prediction network. By interpretable models, we focus on weakly-supervised extractive rationale generation, that is, learning to unfold latent discriminative part configurations of object instances automatically and simultaneously in detection without using any supervision for part configurations. We utilize a top-down hierarchical and compositional grammar model embedded in a directed acyclic AND-OR Graph (AOG) to explore and unfold the space of latent part configurations of RoIs. We propose an AOGParsing operator to substitute the RoIPooling operator widely used in R-CNN, so the proposed method is applicable to many state-of-the-art ConvNet-based detection systems. \n* [Fair lending needs explainable models for responsible recommendation](https:\u002F\u002Farxiv.org\u002Fabs\u002F1809.04684); Jiahao Chen; The financial services industry has unique explainability and fairness challenges arising from compliance and ethical considerations in credit decisioning. 
These challenges complicate the use of model machine learning and artificial intelligence methods in business decision processes.\n* [ICIE 1.0: A Novel Tool for Interactive Contextual Interaction Explanations](https:\u002F\u002Flink.springer.com\u002Fchapter\u002F10.1007\u002F978-3-030-13463-1_6); Simon B. van der Zon; Wouter Duivesteijn; Werner van Ipenburg; Jan Veldsink; Mykola Pechenizkiy; With the rise of new laws around privacy and awareness, explanation of automated decision making becomes increasingly important. Nowadays, machine learning models are used to aid experts in domains such as banking and insurance to find suspicious transactions, approve loans and credit card applications. Companies using such systems have to be able to provide the rationale behind their decisions; blindly relying on the trained model is not sufficient. There are currently a number of methods that provide insights in models and their decisions, but often they are either good at showing global or local behavior. Global behavior is often too complex to visualize or comprehend, so approximations are shown, and visualizing local behavior is often misleading as it is difficult to define what local exactly means (i.e. our methods don’t “know” how easily a feature-value can be changed; which ones are flexible, and which ones are static). We introduce the ICIE framework (Interactive Contextual Interaction Explanations) which enables users to view explanations of individual instances under different contexts. We will see that various contexts for the same case lead to different explanations, revealing different feature interactions.\n* [Delayed Impact of Fair Machine Learning](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1803.04383.pdf); Lydia T. Liu, Sarah Dean, Esther Rolf, Max Simchowitz, Moritz Hardt; Fairness in machine learning has predominantly been studied in static classification settings without concern for how decisions change the underlying population over time. Conventional wisdom suggests that fairness criteria promote the long-term well-being of those groups they aim to protect.  We study how static fairness criteria interact with temporal indicators of well-being, such as long-term improvement, stagnation, and decline in a variable of interest. We demonstrate that even in a one-step feedback model, common fairness criteria in general do not promote improvement over time, and may in fact cause harm in cases where an unconstrained objective would not.  We completely characterize the delayed impact of three standard criteria, contrasting the regimes in which these exhibit qualitatively different behavior. In addition, we find that a natural form of measurement error broadens the regime in which fairness criteria perform favorably.  Our results highlight the importance of measurement and temporal modeling in the evaluation of fairness criteria, suggesting a range of new challenges and trade-offs.\n* [The Challenge of Crafting Intelligible Intelligence](https:\u002F\u002Farxiv.org\u002Fabs\u002F1803.04263); Daniel S. Weld, Gagan Bansal; Since Artificial Intelligence (AI) software uses techniques like deep lookahead search and stochastic optimization of huge neural networks to fit mammoth datasets, it often results in complex behavior that is difficult for people to understand. Yet organizations are deploying AI algorithms in many mission-critical settings. 
To trust their behavior, we must make AI intelligible, either by using inherently interpretable models or by developing new methods for explaining and controlling otherwise overwhelmingly complex decisions using local approximation, vocabulary alignment, and interactive explanation. This paper argues that intelligibility is essential, surveys recent work on building such systems, and highlights key directions for research.\n* [An Interpretable Model with Globally Consistent Explanations for Credit Risk](https:\u002F\u002Farxiv.org\u002Fabs\u002F1811.12615); Chaofan Chen, Kangcheng Lin, Cynthia Rudin, Yaron Shaposhnik, Sijia Wang, Tong Wang; We propose a possible solution to a public challenge posed by the Fair Isaac Corporation (FICO), which is to provide an explainable model for credit risk assessment. Rather than present a black box model and explain it afterwards, we provide a globally interpretable model that is as accurate as other neural networks. Our \"two-layer additive risk model\" is decomposable into subscales, where each node in the second layer represents a meaningful subscale, and all of the nonlinearities are transparent. We provide three types of explanations that are simpler than, but consistent with, the global model. One of these explanation methods involves solving a minimum set cover problem to find high-support globally-consistent explanations. We present a new online visualization tool to allow users to explore the global model and its explanations.\n* [HELOC Applicant Risk Performance Evaluation by Topological Hierarchical Decomposition](https:\u002F\u002Farxiv.org\u002Fabs\u002F1811.10658); Kyle Brown, Derek Doran, Ryan Kramer, Brad Reynolds; Strong regulations in the financial industry mean that any decisions based on machine learning need to be explained. This precludes the use of powerful supervised techniques such as neural networks. In this study we propose a new unsupervised and semi-supervised technique known as the topological hierarchical decomposition (THD). This process breaks a dataset down into ever smaller groups, where groups are associated with a simplicial complex that approximate the underlying topology of a dataset. We apply THD to the FICO machine learning challenge dataset, consisting of anonymized home equity loan applications using the MAPPER algorithm to build simplicial complexes. We identify different groups of individuals unable to pay back loans, and illustrate how the distribution of feature values in a simplicial complex can be used to explain the decision to grant or deny a loan by extracting illustrative explanations from two THDs on the dataset.\n* [From Black-Box to White-Box: Interpretable Learning with Kernel Machines](https:\u002F\u002Flink.springer.com\u002Fchapter\u002F10.1007%2F978-3-319-96136-1_18); Hao Zhang, Shinji Nakadai, Kenji Fukumizu; We present a novel approach to interpretable learning with kernel machines. In many real-world learning tasks, kernel machines have been successfully applied. However, a common perception is that they are difficult to interpret by humans due to the inherent black-box nature. This restricts the application of kernel machines in domains where model interpretability is highly required. In this paper, we propose to construct interpretable kernel machines. 
Specifically, we design a new kernel function based on random Fourier features (RFF) for scalability, and develop a two-phase learning procedure: in the first phase, we explicitly map pairwise features to a high-dimensional space produced by the designed kernel, and learn a dense linear model; in the second phase, we extract an interpretable data representation from the first phase, and learn a sparse linear model. Finally, we evaluate our approach on benchmark datasets, and demonstrate its usefulness in terms of interpretability by visualization.\n* [From Soft Classifiers to Hard Decisions: How fair can we be?](https:\u002F\u002Farxiv.org\u002Fabs\u002F1810.02003); Ran Canetti, Aloni Cohen, Nishanth Dikkala, Govind Ramnarayan, Sarah Scheffler, Adam Smith;  We study the feasibility of achieving various fairness properties by post-processing calibrated scores, and then show that deferring post-processors allow for more fairness conditions to hold on the final decision. Specifically, we show: 1. There does not exist a general way to post-process a calibrated classifier to equalize protected groups' positive or negative predictive value (PPV or NPV). For certain \"nice\" calibrated classifiers, either PPV or NPV can be equalized when the post-processor uses different thresholds across protected groups... 2. When the post-processing is allowed to defer on some decisions (that is, to avoid making a decision by handing off some examples to a separate process), then for the non-deferred decisions, the resulting classifier can be made to equalize PPV, NPV, false positive rate (FPR) and false negative rate (FNR) across the protected groups. This suggests a way to partially evade the impossibility results of Chouldechova and Kleinberg et al., which preclude equalizing all of these measures simultaneously. We also present different deferring strategies and show how they affect the fairness properties of the overall system.  We evaluate our post-processing techniques using the COMPAS data set from 2016.\n* [A Survey of Methods for Explaining Black Box Models](https:\u002F\u002Fdl.acm.org\u002Fcitation.cfm?id=3236009); Riccardo Guidotti, Anna Monreale, Salvatore Ruggieri, Franco Turini, Fosca Giannotti, Dino Pedreschi;  In recent years, many accurate decision support systems have been constructed as black boxes, that is as systems that hide their internal logic to the user. This lack of explanation constitutes both a practical and an ethical issue. The literature reports many approaches aimed at overcoming this crucial weakness, sometimes at the cost of sacrificing accuracy for interpretability. The applications in which black box decision systems can be used are various, and each approach is typically developed to provide a solution for a specific problem and, as a consequence, it explicitly or implicitly delineates its own definition of interpretability and explanation. The aim of this article is to provide a classification of the main problems addressed in the literature with respect to the notion of explanation and the type of black box system. Given a problem definition, a black box type, and a desired explanation, this survey should help the researcher to find the proposals more useful for his own work. 
The proposed classification of approaches to open black box models should also be useful for putting the many open research questions in perspective.\n* [Deep k-Nearest Neighbors: Towards Confident, Interpretable and Robust Deep Learning](https:\u002F\u002Farxiv.org\u002Fabs\u002F1803.04765); Nicolas Papernot, Patrick McDaniel; In this work, we exploit the structure of deep learning to enable new learning-based inference and decision strategies that achieve desirable properties such as robustness and interpretability. We take a first step in this direction and introduce the Deep k-Nearest Neighbors (DkNN). This hybrid classifier combines the k-nearest neighbors algorithm with representations of the data learned by each layer of the DNN: a test input is compared to its neighboring training points according to the distance that separates them in the representations. We show the labels of these neighboring points afford confidence estimates for inputs outside the model's training manifold, including on malicious inputs like adversarial examples--and therein provides protection against inputs that are outside the model's understanding. This is because the nearest neighbors can be used to estimate the nonconformity of, i.e., the lack of support for, a prediction in the training data. The neighbors also constitute human-interpretable explanations of predictions.\n* [RISE: Randomized Input Sampling for Explanation of Black-box Models](https:\u002F\u002Farxiv.org\u002Fabs\u002F1806.07421); Vitali Petsiuk, Abir Das, Kate Saenko; Deep neural networks are being used increasingly to automate data analysis and decision making, yet their decision-making process is largely unclear and is difficult to explain to the end users. In this paper, we address the problem of Explainable AI for deep neural networks that take images as input and output a class probability. We propose an approach called RISE that generates an importance map indicating how salient each pixel is for the model's prediction. In contrast to white-box approaches that estimate pixel importance using gradients or other internal network state, RISE works on black-box models. It estimates importance empirically by probing the model with randomly masked versions of the input image and obtaining the corresponding outputs. We compare our approach to state-of-the-art importance extraction methods using both an automatic deletion\u002Finsertion metric and a pointing metric based on human-annotated object segments. Extensive experiments on several benchmark datasets show that our approach matches or exceeds the performance of other methods, including white-box approaches. (A minimal illustrative sketch of this random-masking idea appears at the end of this list, just before the 2017 papers below.) \n* [Visualizing the Feature Importance for Black Box Models](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1804.06620.pdf); Giuseppe Casalicchio, Christoph Molnar, and Bernd Bischl; Based on a recent method for model-agnostic global feature importance, we introduce a local feature importance measure for individual observations and propose two visual tools: partial importance (PI) and individual conditional importance (ICI) plots which visualize how changes in a feature affect the model performance on average, as well as for individual observations. Our proposed methods are related to partial dependence (PD) and individual conditional expectation (ICE) plots, but visualize the expected (conditional) feature importance instead of the expected (conditional) prediction. 
Furthermore, we show that averaging ICI curves across observations yields a PI curve, and integrating the PI curve with respect to the distribution of the considered feature results in the global feature importance.\n* [Interpreting Blackbox Models via Model Extraction](https:\u002F\u002Farxiv.org\u002Fabs\u002F1705.08504); Osbert Bastani, Carolyn Kim, Hamsa Bastani; Interpretability has become incredibly important as machine learning is increasingly used to inform consequential decisions. We propose to construct global explanations of complex, blackbox models in the form of a decision tree approximating the original model---as long as the decision tree is a good approximation, then it mirrors the computation performed by the blackbox model. We devise a novel algorithm for extracting decision tree explanations that actively samples new training points to avoid overfitting. We evaluate our algorithm on a random forest to predict diabetes risk and a learned controller for cart-pole. Compared to several baselines, our decision trees are both substantially more accurate and equally or more interpretable based on a user study. Finally, we describe **several insights provided by our interpretations, including a causal issue validated by a physician.**\n* [A Game-Based Approximate Verification of Deep Neural Networks with Provable Guarantees](https:\u002F\u002Fexport.arxiv.org\u002Fpdf\u002F1807.03571); Min Wu, Matthew Wicker, Wenjie Ruan, Xiaowei Huang, Marta Kwiatkowska; Despite the improved accuracy of deep neural networks, the discovery of adversarial examples has raised serious safety concerns. In this paper, we study two variants of pointwise robustness, the maximum safe radius problem, which for a given input sample computes the minimum distance to an adversarial example, and the feature robustness problem, which aims to quantify the robustness of individual features to adversarial perturbations. We demonstrate that, under the assumption of Lipschitz continuity, both problems can be approximated using finite optimisation by discretising the input space, and the approximation has provable guarantees, i.e., the error is bounded. We then show that the resulting optimisation problems can be reduced to the solution of two-player turn-based games, where the first player selects features and the second perturbs the image within the feature. While the second player aims to minimise the distance to an adversarial example, depending on the optimisation objective the first player can be cooperative or competitive. We employ an anytime approach to solve the games, in the sense of approximating the value of a game by monotonically improving its upper and lower bounds. The Monte Carlo tree search algorithm is applied to compute upper bounds for both games, and the Admissible A* and the Alpha-Beta Pruning algorithms are, respectively, used to compute lower bounds for the maximum safe radius and feature robustness games. When working on the upper bound of the maximum safe radius problem, our tool demonstrates competitive performance against existing adversarial example crafting algorithms. 
Furthermore, we show how our framework can be deployed to evaluate pointwise robustness of neural networks in safety-critical applications such as traffic sign recognition in self-driving cars.\n* [All Models are Wrong but Many are Useful: Variable Importance for Black-Box, Proprietary, or Misspecified Prediction Models, using Model Class Reliance](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1801.01489.pdf); Aaron Fisher, Cynthia Rudin, Francesca Dominici; Variable importance (VI) tools describe how much covariates contribute to a prediction model’s accuracy. However, important variables for one well-performing model (for example, a linear model f(x) = xᵀβ with a fixed coefficient vector β) may be unimportant for another model. In this paper, we propose model class reliance (MCR) as the range of VI values across all well-performing models in a prespecified class. Thus, MCR gives a more comprehensive description of importance by accounting for the fact that many prediction models, possibly of different parametric forms, may fit the data well.\n* [Please Stop Explaining Black Box Models for High Stakes Decisions](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1811.10154v1.pdf); Cynthia Rudin; There are black box models now being used for high stakes decision-making throughout society. The practice of trying to explain black box models, rather than creating models that are interpretable in the first place, is likely to perpetuate bad practices and can potentially cause catastrophic harm to society. There is a way forward – it is to design models that are inherently interpretable.\n\n* [State of the Art in Fair ML: From Moral Philosophy and Legislation to Fair Classifiers](https:\u002F\u002Farxiv.org\u002Fabs\u002F1811.09539v1); Elias Baumann, Josef Rumberger; Machine learning is becoming an ever-present part of our lives as many decisions, e.g. whether to grant credit, are no longer made by humans but by machine learning algorithms. However, those decisions are often unfair and discriminate against individuals belonging to protected groups based on race or gender. With the recent General Data Protection Regulation (GDPR) coming into effect, new awareness has been raised for such issues and, with computer scientists having such a large impact on people's lives, it is necessary that actions are taken to discover and prevent discrimination. This work aims to give an introduction into discrimination, legislative foundations to counter it and strategies to detect and prevent machine learning algorithms from showing such behavior.\n\n* [Explaining Explanations in AI](https:\u002F\u002Farxiv.org\u002Fabs\u002F1811.01439); Brent Mittelstadt, Chris Russell, Sandra Wachter; Recent work on interpretability in machine learning and AI has focused on the building of simplified models that approximate the true criteria used to make decisions. These models are a useful pedagogical device for teaching trained professionals how to predict what decisions will be made by the complex system, and most importantly how the system might break. However, when considering any such model it's important to remember Box's maxim that \"All models are wrong but some are useful.\" We focus on the distinction between these models and explanations in philosophy and sociology. These models can be understood as a \"do it yourself kit\" for explanations, allowing a practitioner to directly answer \"what if\" questions or generate contrastive explanations without external assistance. 
Although a valuable ability, giving these models as explanations appears more difficult than necessary, and other forms of explanation may not have the same trade-offs. We contrast the different schools of thought on what makes an explanation, and suggest that machine learning might benefit from viewing the problem more broadly.\n\n* [On Human Predictions with Explanations and Predictions of Machine Learning Models: A Case Study on Deception Detection](https:\u002F\u002Farxiv.org\u002Fabs\u002F1811.07901v1); Vivian Lai, Chenhao Tan; Humans are the final decision makers in critical tasks that involve ethical and legal concerns, ranging from recidivism prediction, to medical diagnosis, to fighting against fake news. Although machine learning models can sometimes achieve impressive performance in these tasks, these tasks are not amenable to full automation. To realize the potential of machine learning for improving human decisions, it is important to understand how assistance from machine learning models affect human performance and human agency. In this paper, we use deception detection as a testbed and investigate how we can harness explanations and predictions of machine learning models to improve human performance while retaining human agency. We propose a spectrum between full human agency and full automation, and develop varying levels of machine assistance along the spectrum that gradually increase the influence of machine predictions. We find that without showing predicted labels, explanations alone do not statistically significantly improve human performance in the end task. In comparison, human performance is greatly improved by showing predicted labels (>20% relative improvement) and can be further improved by explicitly suggesting strong machine performance. Interestingly, when predicted labels are shown, explanations of machine predictions induce a similar level of accuracy as an explicit statement of strong machine performance. Our results demonstrate a tradeoff between human performance and human agency and show that explanations of machine predictions can moderate this tradeoff.\n\n* [On the Art and Science of Machine Learning Explanations](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1810.02909v1.pdf); Patrick Hall; explanatory methods that go beyond the error measurements and plots traditionally used to assess machine learning models. Some of the methods are tools of the trade while others are rigorously derived and backed by long-standing theory. The methods, decision tree surrogate models, individual conditional expectation (ICE) plots, local interpretable model agnostic explanations (LIME), partial dependence plots, and Shapley explanations, vary in terms of scope, fidelity, and suitable application domain. Along with descriptions of these methods, this text presents real-world usage recommendations supported by a use case and in-depth software examples.\n\n* [Interpretable to Whom? A Role-based Model for Analyzing Interpretable Machine Learning Systems](https:\u002F\u002Farxiv.org\u002Fabs\u002F1806.07552); Richard Tomsett, Dave Braines, Dan Harborne, Alun Preece, Supriyo Chakraborty; we should not ask if the system is interpretable, but to whom is it interpretable. We describe a model intended to help answer this question, by identifying different roles that agents can fulfill in relation to the machine learning system. 
We illustrate the use of our model in a variety of scenarios, exploring how an agent's role influences its goals, and the implications for defining interpretability. Finally, we make suggestions for how our model could be useful to interpretability researchers, system developers, and regulatory bodies auditing machine learning systems.\n\n* [Interpreting Models by Allowing to Ask](https:\u002F\u002Farxiv.org\u002Fabs\u002F1811.05106); Sungmin Kang, David Keetae Park, Jaehyuk Chang, Jaegul Choo; Questions convey information about the questioner, namely what one does not know. In this paper, we propose a novel approach to allow a learning agent to ask what it considers as tricky to predict, in the course of producing a final output. By analyzing when and what it asks, we can make our model more transparent and interpretable. We first develop this idea to propose a general framework of deep neural networks that can ask questions, which we call asking networks. A specific architecture and training process for an asking network is proposed for the task of colorization, which is an exemplar one-to-many task and thus a task where asking questions is helpful in performing the task accurately. Our results show that the model learns to generate meaningful questions, asks difficult questions first, and utilizes the provided hint more efficiently than baseline models. We conclude that the proposed asking framework makes the learning agent reveal its weaknesses, which poses a promising new direction in developing interpretable and interactive models.\n\n* [Contrastive Explanation: A Structural-Model Approach](https:\u002F\u002Farxiv.org\u002Fabs\u002F1811.03163); Tim Miller; ...Research in philosophy and social sciences shows that explanations are contrastive: that is, when people ask for an explanation of an event *the fact* they (sometimes implicitly) are asking for an explanation relative to some contrast case; that is, \"Why P rather than Q?\". In this paper, we extend the structural causal model approach to define two complementary notions of contrastive explanation, and demonstrate them on two classical AI problems: classification and planning. \n\n* [Explainable AI for Designers: A Human-Centered Perspective on Mixed-Initiative Co-Creation](http:\u002F\u002Fantoniosliapis.com\u002Fpapers\u002Fexplainable_ai_for_designers.pdf); Jichen Zhu, Antonios Liapis, Sebastian Risi, Rafael Bidarra, Michael Youngblood; In this vision paper, we propose a new research area of eXplainable AI for Designers (XAID), specifically for game designers. By focusing on a specific user group, their needs and tasks, we propose a human-centered approach for facilitating game designers to co-create with AI\u002FML techniques through XAID. We illustrate our initial XAID framework through three use cases, which require an understanding both of the innate properties of the AI techniques and users’ needs, and we identify key open challenges.\n\n* [AI in Education needs interpretable machine learning: Lessons from Open Learner Modelling](https:\u002F\u002Farxiv.org\u002Fabs\u002F1807.00154); Cristina Conati, Kaska Porayska-Pomsta, Manolis Mavrikis; Interpretability of the underlying AI representations is a key raison d'être for Open Learner Modelling (OLM) -- a branch of Intelligent Tutoring Systems (ITS) research. OLMs provide tools for 'opening' up the AI models of learners' cognition and emotions for the purpose of supporting human learning and teaching. 
- use case\n\n* [Instance-Level Explanations for Fraud Detection: A Case Study](https:\u002F\u002Farxiv.org\u002Fabs\u002F1806.07129); Dennis Collaris, Leo M. Vink, Jarke J. van Wijk; Fraud detection is a difficult problem that can benefit from predictive modeling. However, the verification of a prediction is challenging; for a single insurance policy, the model only provides a prediction score. We present a case study where we reflect on different instance-level model explanation techniques to aid a fraud detection team in their work. To this end, we designed two novel dashboards combining various state-of-the-art explanation techniques.\n\n* [On the Robustness of Interpretability Methods](https:\u002F\u002Farxiv.org\u002Fabs\u002F1806.08049); David Alvarez-Melis, Tommi S. Jaakkola; We argue that robustness of explanations---i.e., that similar inputs should give rise to similar explanations---is a key desideratum for interpretability. We introduce metrics to quantify robustness and demonstrate that current methods do not perform well according to these metrics. Finally, we propose ways that robustness can be enforced on existing interpretability approaches.\n\n* [Contrastive Explanations with Local Foil Trees](https:\u002F\u002Farxiv.org\u002Fabs\u002F1806.07470); Jasper van der Waa, Marcel Robeer, Jurriaan van Diggelen, Matthieu Brinkhuis, Mark Neerincx; Recent advances in interpretable Machine Learning (iML) and eXplainable AI (XAI) construct explanations based on the importance of features in classification tasks. However, in a high-dimensional feature space this approach may become unfeasible without restraining the set of important features. We propose to utilize the human tendency to ask questions like \"Why this output (the fact) instead of that output (the foil)?\" to reduce the number of features to those that play a main role in the asked contrast. Our proposed method utilizes locally trained one-versus-all decision trees to identify the disjoint set of rules that causes the tree to classify data points as the foil and not as the fact. \n\n* [Evaluating Feature Importance Estimates](https:\u002F\u002Farxiv.org\u002Fabs\u002F1806.10758); Sara Hooker, Dumitru Erhan, Pieter-Jan Kindermans, Been Kim; Estimating the influence of a given feature to a model prediction is challenging. We introduce ROAR, RemOve And Retrain, a benchmark to evaluate the accuracy of interpretability methods that estimate input feature importance in deep neural networks. We remove a fraction of input features deemed to be most important according to each estimator and measure the change to the model accuracy upon retraining. \n\n* [Interpreting Embedding Models of Knowledge Bases: A Pedagogical Approach](https:\u002F\u002Farxiv.org\u002Fabs\u002F1806.09504); Arthur Colombini Gusmão, Alvaro Henrique Chaim Correia, Glauber De Bona, Fabio Gagliardi Cozman; Embedding models attain state-of-the-art accuracy in knowledge base completion, but their predictions are notoriously hard to interpret. In this paper, we adapt \"pedagogical approaches\" (from the literature on neural networks) so as to interpret embedding models by extracting weighted Horn rules from them. 
We show how pedagogical approaches have to be adapted to take on the large-scale relational aspects of knowledge bases and show experimentally their strengths and weaknesses.\n\n* [Manifold: A Model-Agnostic Framework for Interpretation and Diagnosis of Machine Learning Models](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1808.00196.pdf); Jiawei Zhang, Yang Wang, Piero Molino, Lezhi Li and David S. Ebert; Introduces Manifold, a tool for visual exploration of a model during inspection (hypothesis), explanation (reasoning), and refinement (verification). Supports comparison of multiple models. A visual exploratory approach for machine learning model development.\n\n* [Interpretable Explanations of Black Boxes by Meaningful Perturbation](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1704.03296.pdf); Ruth C. Fong, Andrea Vedaldi; (from abstract) A general framework for learning different kinds of explanations for any black box algorithm; a framework to find the part of an image most responsible for a classifier decision... The method is model-agnostic and testable because it is grounded in explicit and interpretable image perturbations.\n\n* [Interpretability is Harder in the Multiclass Setting: Axiomatic Interpretability for Multiclass Additive Models](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1810.09092.pdf); Xuezhou Zhang, Sarah Tan, Paul Koch, Yin Lou, Urszula Chajewska, Rich Caruana; (...) We then develop a post-processing technique (API) that provably transforms pretrained additive models to satisfy the interpretability axioms without sacrificing accuracy. The technique works not just on models trained with our algorithm, but on any multiclass additive model. We demonstrate API on a 12-class infant-mortality dataset. (...) Initially for Generalized additive models (GAMs).\n\n* [Statistical Paradises and Paradoxes in Big Data](https:\u002F\u002Fstatistics.fas.harvard.edu\u002Ffiles\u002Fstatistics-2\u002Ffiles\u002Fstatistical_paradises_and_paradoxes_in_big_data_.pdf); Xiao-Li Meng; (...) Paradise gained or lost? Data quality-quantity tradeoff. (“Which one should I trust more: a 1% survey with 60% response rate or a non-probabilistic dataset covering 80% of the population?”); Data Quality × Data Quantity × Problem Difficulty; \n\n* [Explanation Methods in Deep Learning: Users, Values, Concerns and Challenges](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1803.07517.pdf); Gabrielle Ras, Marcel van Gerven, Pim Haselager; Issues regarding explainable AI involve four components: users, laws & regulations, explanations and algorithms. Overall, it is clear that (visual) explanations can be given about various aspects of the influence of the input on the output ... It is likely that in the future we will see the rise of a new category of explanation methods that combine aspects of rule-extraction, attribution and intrinsic methods, to answer specific questions in a simple, human-interpretable language. Furthermore, it is obvious that current explanation methods are tailored to expert users, since the interpretation of the results requires knowledge of the DNN process. As far as we are aware, explanation methods, e.g. intuitive explanation interfaces, for lay users do not exist.\n\n* [TED: Teaching AI to Explain its Decisions](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1811.04896v1.pdf); Noel C. F. Codella et al; Artificial intelligence systems are being increasingly deployed due to their potential to increase the efficiency, scale, consistency, fairness, and accuracy of decisions. 
However, as many of these systems are opaque in their operation, there is a growing demand for such systems to provide explanations for their decisions. Conventional approaches to this problem attempt to expose or discover the inner workings of a machine learning model with the hope that the resulting explanations will be meaningful to the consumer. In contrast, this paper suggests a new approach to this problem. It introduces a simple, practical framework, called Teaching Explanations for Decisions (TED), that provides meaningful explanations that match the mental model of the consumer. \n\n* [Transparency in Algorithmic and Human Decision-Making: Is There a Double Standard?](https:\u002F\u002Flink.springer.com\u002Farticle\u002F10.1007\u002Fs13347-018-0330-6); John Zerilli, Alistair Knott, James Maclaurin, Colin Gavaghan; We are sceptical of concerns over the opacity of algorithmic decision tools. While transparency and explainability are certainly important desiderata in algorithmic governance, we worry that automated decision-making is being held to an unrealistically high standard, possibly owing to an unrealistically high estimate of the degree of transparency attainable from human decision-makers. In this paper, we review evidence demonstrating that much human decision-making is fraught with transparency problems, show in what respects AI fares little worse or better and argue that at least some regulatory proposals for explainable AI could end up setting the bar higher than is necessary or indeed helpful. The demands of practical reason require the justification of action to be pitched at the level of practical reason. Decision tools that support or supplant practical reasoning should not be expected to aim higher than this. We cast this desideratum in terms of Daniel Dennett’s theory of the “intentional stance” and argue that since the justification of action for human purposes takes the form of intentional stance explanation, the justification of algorithmic decisions should take the same form. In practice, this means that the sorts of explanations for algorithmic decisions that are analogous to intentional stance explanations should be preferred over ones that aim at the architectural innards of a decision tool.\n\n* [A comparative study of fairness-enhancing interventions in machine learning](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1802.04422.pdf); Sorelle A. Friedler, Carlos Scheidegger, Suresh Venkatasubramanian, Sonam Choudhary, Evan P. Hamilton, Derek Roth; Computers are increasingly used to make decisions that have significant impact in people's lives. Often, these predictions can affect different population subgroups disproportionately. As a result, the issue of fairness has received much recent interest, and a number of fairness-enhanced classifiers and predictors have appeared in the literature. This paper seeks to study the following questions: how do these different techniques fundamentally compare to one another, and what accounts for the differences? Specifically, we seek to bring attention to many under-appreciated aspects of such fairness-enhancing interventions. Concretely, we present the results of an open benchmark we have developed that lets us compare a number of different algorithms under a variety of fairness measures, and a large number of existing datasets. We find that although different algorithms tend to prefer specific formulations of fairness preservations, many of these measures strongly correlate with one another. 
In addition, we find that fairness-preserving algorithms tend to be sensitive to fluctuations in dataset composition (simulated in our benchmark by varying training-test splits), indicating that fairness interventions might be more brittle than previously thought.\n\n* [Check yourself before you wreck yourself: Assessing discrete choice models through predictive simulations](https:\u002F\u002Farxiv.org\u002Fabs\u002F1806.02307); Timothy Brathwaite; Graphical model checks: Typically, discrete choice modelers develop ever-more advanced models and estimation methods. Compared to the impressive progress in model development and estimation, model-checking techniques have lagged behind. Often, choice modelers use only crude methods to assess how well an estimated model represents reality. Such methods usually stop at checking parameter signs, model elasticities, and ratios of model coefficients. In this paper, I greatly expand the discrete choice modelers' assessment toolkit by introducing model checking procedures based on graphical displays of predictive simulations. \n\n* [Example and Feature importance-based Explanations for Black-box Machine Learning Models](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1812.09044); Ajaya Adhikari, D.M.J Tax, Riccardo Satta, Matthias Fath; As machine learning models become more accurate, they typically become more complex and uninterpretable by humans. The black-box character of these models holds back their acceptance in practice, especially in high-risk domains where the consequences of failure could be catastrophic, such as health-care or defense. Providing understandable and useful explanations behind ML models or predictions can increase the trust of the user. Example-based reasoning, which entails leveraging previous experience with analogous tasks to make a decision, is a well-known strategy for problem solving and justification. This work presents a new explanation extraction method called LEAFAGE, for a prediction made by any black-box ML model. The explanation consists of the visualization of similar examples from the training set and the importance of each feature. Moreover, these explanations are contrastive, which aims to take the expectations of the user into account. LEAFAGE is evaluated in terms of fidelity to the underlying black-box model and usefulness to the user. The results showed that LEAFAGE performs overall better than the current state-of-the-art method LIME in terms of fidelity, on ML models with non-linear decision boundaries. A user study was conducted which focused on revealing the differences between example-based and feature importance-based explanations. It showed that example-based explanations performed significantly better than feature importance-based explanations, in terms of perceived transparency, information sufficiency, competence and confidence. Counter-intuitively, when the gained knowledge of the participants was tested, it showed that they learned less about the black-box model after seeing a feature importance-based explanation than after seeing no explanation at all. The participants found feature importance-based explanations vague and hard to generalize to other instances. 
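\n\nThe RISE entry above describes a simple black-box recipe: probe the model with randomly masked copies of an input and average the masks, weighted by the class score each masked copy receives. Below is a minimal, illustrative sketch of that idea in Python. It is not the authors' reference implementation: the nearest-neighbour mask upsampling is a simplification of the paper's smoothed, randomly shifted masks, and `model_fn` is a hypothetical stand-in for whatever black-box scoring function is being explained.\n\n```python\nimport numpy as np\n\ndef rise_saliency(image, model_fn, target_class, n_masks=1000, grid=7, p_keep=0.5, seed=0):\n    # Monte Carlo estimate of a RISE-style importance map.\n    # image: H x W x C array; model_fn maps a batch of images to class probabilities.\n    rng = np.random.default_rng(seed)\n    h, w = image.shape[:2]\n    saliency = np.zeros((h, w), dtype=float)\n    cell_h, cell_w = int(np.ceil(h \u002F grid)), int(np.ceil(w \u002F grid))\n    for _ in range(n_masks):\n        # Coarse binary grid: each cell is kept with probability p_keep.\n        coarse = (rng.random((grid, grid)) < p_keep).astype(float)\n        # Upsample to image size (nearest neighbour keeps the sketch simple).\n        mask = np.kron(coarse, np.ones((cell_h, cell_w)))[:h, :w]\n        # Score the masked image and accumulate the mask weighted by that score.\n        masked = (image * mask[..., None])[None, ...]\n        score = model_fn(masked)[0, target_class]\n        saliency += score * mask\n    # Normalise by the expected number of times each pixel was kept.\n    return saliency \u002F (n_masks * p_keep)\n```\n\nPixels the model relies on receive high scores whenever they happen to be kept, so they accumulate larger weights; the deletion\u002Finsertion metrics mentioned in the RISE abstract can then be computed by removing pixels in decreasing order of this map.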
\n\n### 2017\n\n* [Explainable AI: Beware of Inmates Running the Asylum Or: How I Learnt to Stop Worrying and Love the Social and Behavioural Sciences](https:\u002F\u002Farxiv.org\u002Fabs\u002F1712.00547); Tim Miller, Piers Howe, Liz Sonenberg; In his seminal book *The Inmates are Running the Asylum: Why High-Tech Products Drive Us Crazy And How To Restore The Sanity* [2004, Sams Indianapolis, IN, USA], Alan Cooper argues that a major reason why software is often poorly designed (from a user perspective) is that programmers are in charge of design decisions, rather than interaction designers. As a result, programmers design software for themselves, rather than for their target audience, a phenomenon he refers to as the inmates running the asylum. This paper argues that explainable AI risks a similar fate. While the re-emergence of explainable AI is positive, this paper argues that most of us as AI researchers are building explanatory agents for ourselves, rather than for the intended users. But explainable AI is more likely to succeed if researchers and practitioners understand, adopt, implement, and improve models from the vast and valuable bodies of research in philosophy, psychology, and cognitive science, and if evaluation of these models is focused more on people than on technology. From a light scan of literature, we demonstrate that there is considerable scope to infuse more results from the social and behavioural sciences into explainable AI, and present some key results from these fields that are relevant to explainable AI. \n\n* [Interactive Graphics for Visually Diagnosing Forest Classifiers in R](https:\u002F\u002Farxiv.org\u002Fabs\u002F1704.02502); Natalia da Silva, Dianne Cook, Eun-Kyung Lee; This paper describes structuring data and constructing plots to explore forest classification models interactively. A forest classifier is an example of an ensemble, produced by bagging multiple trees. The process of bagging and combining results from multiple trees produces numerous diagnostics which, with interactive graphics, can provide a lot of insight into class structure in high dimensions. Various aspects are explored in this paper, to assess model complexity, individual model contributions, variable importance and dimension reduction, and uncertainty in prediction associated with individual observations. The ideas are applied to the random forest algorithm, and to the projection pursuit forest, but could be more broadly applied to other bagged ensembles.\n\n![https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_3361c622da3e.png](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_3361c622da3e.png)\n\n* [Black Hat Visualization](https:\u002F\u002Fidl.cs.washington.edu\u002Ffiles\u002F2017-BlackHatVis-DECISIVe.pdf); Michael Correll, Jeffrey Heer; People lie, mislead, and bullshit in a myriad of ways. Visualizations, as a form of communication, are no exception to these tendencies. Yet, the language we use to describe how people can use visualizations to mislead can be relatively sparse. For instance, one can be “lying with vis” or using “deceptive visualizations.” In this paper, we use the language of computer security to expand the space of ways that unscrupulous people (black hats) can manipulate visualizations for nefarious ends. 
In addition to forms of deception well-covered in the visualization literature, we also focus on visualizations which have fidelity to the underlying data (and so may not be considered deceptive in the ordinary use of the term in visualization), but still have a negative impact on how data are perceived. We encourage designers to think defensively and comprehensively about how their visual designs can result in data being misinterpreted.\n\n![https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_1b87d594402e.png](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_1b87d594402e.png)\n\n* [A Workflow for Visual Diagnostics of Binary Classifiers using Instance-Level Explanations](https:\u002F\u002Farxiv.org\u002Fabs\u002F1705.01968); Josua Krause, Aritra Dasgupta, Jordan Swartz, Yindalon Aphinyanaphongs, Enrico Bertini; Human-in-the-loop data analysis applications necessitate greater transparency in machine learning models for experts to understand and trust their decisions. To this end, we propose a visual analytics workflow to help data scientists and domain experts explore, diagnose, and understand the decisions made by a binary classifier. The approach leverages \"instance-level explanations\", measures of local feature relevance that explain single instances, and uses them to build a set of visual representations that guide the users in their investigation. The workflow is based on three main visual representations and steps: one based on aggregate statistics to see how data distributes across correct \u002F incorrect decisions; one based on explanations to understand which features are used to make these decisions; and one based on raw data, to derive insights on potential root causes for the observed patterns. \n* [Fair Forests: Regularized Tree Induction to Minimize Model Bias](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1712.08197.pdf); Edward Raff, Jared Sylvester, Steven Mills; The potential lack of fairness in the outputs of machine learning algorithms has recently gained attention both within the research community as well as in society more broadly. Surprisingly, there is no prior work developing tree-induction algorithms for building fair decision trees or fair random forests. These methods have widespread popularity as they are one of the few to be simultaneously interpretable, non-linear, and easy-to-use. In this paper we develop, to our knowledge, the first technique for the induction of fair decision trees. We show that our \"Fair Forest\" retains the benefits of the tree-based approach, while providing both greater accuracy and fairness than other alternatives, for both \"group fairness\" and \"individual fairness\". We also introduce new measures for fairness which are able to handle multinomial and continuous attributes as well as regression problems, as opposed to binary attributes and labels only. Finally, we demonstrate a new, more robust evaluation procedure for algorithms that considers the dataset in its entirety rather than only a specific protected attribute.\n* [Towards A Rigorous Science of Interpretable Machine Learning](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1702.08608.pdf); Finale Doshi-Velez and Been Kim; In such cases, a popular fallback is the criterion of interpretability: if the system can explain its reasoning, we then can verify whether that reasoning is sound with respect to these auxiliary criteria. 
Unfortunately, there is little consensus on what interpretability in machine learning is and how to evaluate it for benchmarking. To a large extent, both evaluation approaches rely on some notion of “you’ll know it when you see it.” Should we be concerned about a lack of rigor? Discusses multi-objective trade-offs, mismatched objectives, ethics, safety, and scientific understanding.\n* [Attentive Explanations: Justifying Decisions and Pointing to the Evidence](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1711.07373.pdf); Dong Huk Park et al; Deep models are the de facto standard in visual decision problems due to their impressive performance on a wide array of visual tasks. We propose two large-scale datasets with annotations that visually and textually justify a classification decision for various activities, i.e. ACT-X, and for question answering, i.e. VQA-X. \n* [SPINE: SParse Interpretable Neural Embeddings](https:\u002F\u002Farxiv.org\u002Fabs\u002F1711.08792); Anant Subramanian, Danish Pruthi, Harsh Jhamtani, Taylor Berg-Kirkpatrick, Eduard Hovy; Prediction without justification has limited utility. Much of the success of neural models can be attributed to their ability to learn rich, dense and expressive representations. While these representations capture the underlying complexity and latent trends in the data, they are far from being interpretable. We propose a novel variant of denoising k-sparse autoencoders that generates highly efficient and interpretable distributed word representations (word embeddings), beginning with existing word representations from state-of-the-art methods like GloVe and word2vec. Through large scale human evaluation, we report that our resulting word embeddings are much more interpretable than the original GloVe and word2vec embeddings. Moreover, our embeddings outperform existing popular word embeddings on a diverse suite of benchmark downstream tasks.\n* [Detecting concept drift in data streams using model explanation](https:\u002F\u002Fwww.researchgate.net\u002Fpublication\u002F320177686_Detecting_concept_drift_in_data_streams_using_model_explanation); Jaka Demšar, Zoran Bosnic; An interesting use case for explainers: PDP-like explainers are used to identify concept drift.\n* [Explanation of Prediction Models with ExplainPrediction](http:\u002F\u002Fwww.informatica.si\u002Findex.php\u002Finformatica\u002Farticle\u002Fview\u002F2227\u002F1121); introduces two methods, EXPLAIN and IME (R packages), for local and global explanations.\n* [What do we need to build explainable AI systems for the medical domain?](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1712.09923.pdf); Andreas Holzinger, Chris Biemann, Constantinos Pattichis, Douglas Kell. In this paper we outline some of our research topics in the context of the relatively new area of explainable-AI with a focus on the application in medicine, which is a very special domain. This is due to the fact that medical professionals are working mostly with distributed heterogeneous and complex sources of data. In this paper we concentrate on three sources: images, omics data and text. We argue that research in explainable-AI would generally help to facilitate the implementation of AI\u002FML in the medical domain, and specifically help to facilitate transparency and trust. 
However, the full effectiveness of all AI\u002FML success is limited by the algorithm’s inability to explain its results to human experts - but exactly this is a big issue in the medical domain.\n\n### 2016 \n\n* [Equality of Opportunity in Supervised Learning](https:\u002F\u002Farxiv.org\u002Fabs\u002F1610.02413); Moritz Hardt, Eric Price, Nathan Srebro; We propose a criterion for discrimination against a specified sensitive attribute in supervised learning, where the goal is to predict some target based on available features. Assuming data about the predictor, target, and membership in the protected group are available, we show how to optimally adjust any learned predictor so as to remove discrimination according to our definition. Our framework also improves incentives by shifting the cost of poor classification from disadvantaged groups to the decision maker, who can respond by improving the classification accuracy.\nIn line with other studies, our notion is oblivious: it depends only on the joint statistics of the predictor, the target and the protected attribute, but not on interpretation of individual features. We study the inherent limits of defining and identifying biases based on such oblivious measures, outlining what can and cannot be inferred from different oblivious tests.\nWe illustrate our notion using a case study of FICO credit scores. \n![https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_be06e21002f3.png](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_be06e21002f3.png)\n\n* [Interacting with Predictions: Visual Inspection of Black-box Machine Learning Models](http:\u002F\u002Fperer.org\u002Fpapers\u002FadamPerer-Prospector-CHI2016.pdf); Josua Krause, Adam Perer, Kenney Ng; Describes Prospector, a tool for visual exploration of predictive models, with a few interesting and novel ideas, such as Partial Dependence Bars. Prospector can compare models and shows both local and global explanations.\n\n* [The Mythos of Model Interpretability](https:\u002F\u002Farxiv.org\u002Fabs\u002F1606.03490); Zachary C. Lipton; Supervised machine learning models boast remarkable predictive capabilities. But can you trust your model? Will it work in deployment? What else can it tell you about the world? We want models to be not only good, but interpretable. And yet the task of interpretation appears underspecified. (...) First, we examine the motivations underlying interest in interpretability, finding them to be diverse and occasionally discordant. Then, we address model properties and techniques thought to confer interpretability, identifying transparency to humans and post-hoc explanations as competing notions. Throughout, we discuss the feasibility and desirability of different notions, and question the oft-made assertions that linear models are interpretable and that deep neural networks are not. \n\n* [What makes classification trees comprehensible?](https:\u002F\u002Fwww.sciencedirect.com\u002Fscience\u002Farticle\u002Fpii\u002FS0957417416302901); Rok Piltaver, Mitja Luštrek, Matjaž Gams, Sanda Martinčić-Ipšić; Classification trees are attractive for practical applications because of their comprehensibility. However, the literature on the parameters that influence their comprehensibility and usability is scarce. This paper systematically investigates how tree structure parameters (the number of leaves, branching factor, tree depth) and visualisation properties influence the tree comprehensibility. 
In addition, we analyse the influence of the question depth (the depth of the deepest leaf that is required when answering a question about a classification tree), which turns out to be the most important parameter, even though it is usually overlooked. The analysis is based on empirical data that is obtained using a carefully designed survey with 98 questions answered by 69 respondents. The paper evaluates several tree-comprehensibility metrics and proposes two new metrics (the weighted sum of the depths of leaves and the weighted sum of the branching factors on the paths from the root to the leaves) that are supported by the survey results. The main advantage of the new comprehensibility metrics is that they consider the semantics of the tree in addition to the tree structure itself.\n\n### 2015\n\n* [The Residual-based Predictiveness Curve - A Visual Tool to Assess the Performance of Prediction Models](https:\u002F\u002Fwww.ncbi.nlm.nih.gov\u002Fpubmed\u002F26676377); Giuseppe Casalicchio, Bernd Bischl, Anne-Laure Boulesteix, Matthias Schmid; The RBP (residual-based predictiveness) curve reflects both the calibration and the discriminatory power of a prediction model. In addition, the curve can be conveniently used to conduct valid performance checks and marker comparisons. The RBP curve is implemented in the R package RBPcurve. \n* [Interpretable classifiers using rules and Bayesian analysis: Building a better stroke prediction model](https:\u002F\u002Farxiv.org\u002Fabs\u002F1511.01644); Benjamin Letham, Cynthia Rudin, Tyler H. McCormick, David Madigan; We aim to produce predictive models that are not only accurate, but are also interpretable to human experts. Our models are decision lists, which consist of a series of if...then... statements (e.g., if high blood pressure, then stroke) that discretize a high-dimensional, multivariate feature space into a series of simple, readily interpretable decision statements. We introduce a generative model called Bayesian Rule Lists that yields a posterior distribution over possible decision lists. It employs a novel prior structure to encourage sparsity.\n\n### 2009\n\n* [How to Explain Individual Classification Decisions](https:\u002F\u002Farxiv.org\u002Fpdf\u002F0912.1128.pdf), David Baehrens, Timon Schroeter, Stefan Harmeling, Motoaki Kawanabe, Katja Hansen, Klaus-Robert Muller; (from abstract) The only methods that are currently able to provide such explanations are decision trees. ... A model-agnostic method; introduces *explanation vectors* that summarise the steepness of changes of model decisions as a function of model inputs.\n\n### 2005\n\n* [The Tyranny of Tacit Knowledge: What Artificial Intelligence Tells us About Knowledge Representation](https:\u002F\u002Fwww.researchgate.net\u002Fpublication\u002F224755645_The_Tyranny_of_Tacit_Knowledge_What_Artificial_Intelligence_Tells_us_About_Knowledge_Representation); Kurt D. Fenstermacher; Polanyi's tacit knowledge captures the idea \"we can know more than we can tell.\" Many researchers in the knowledge management community have used the idea of tacit knowledge to draw a distinction between that which cannot be formally represented (tacit knowledge) and knowledge which can be so represented (explicit knowledge). I argue that the deference that knowledge management researchers give to tacit knowledge hinders potentially fruitful work for two important reasons. First, the inability to explicate knowledge does not imply that the knowledge cannot be formally represented. 
Second, assuming the inability to formalize tacit knowledge as it exists in the minds of people does not exclude the possibility that computer systems might perform the same tasks using alternative representations. By reviewing work from artificial intelligence, I will argue that a richer model of cognition and knowledge representation is needed to study and build knowledge management systems.\n\n### 2004\n\n* [Discovering additive structure in black box functions](https:\u002F\u002Fdl.acm.org\u002Fcitation.cfm?doid=1014052.1014122), Giles Hooker\n\n\n## Books\n\n### 2020\n\n* [Robustness and Explainability of Artificial Intelligence](https:\u002F\u002Fpublications.jrc.ec.europa.eu\u002Frepository\u002Fbitstream\u002FJRC119336\u002Fdpad_report.pdf); Hamon Ronan, Junklewitz Henrik, Sanchez Ignacio; From technical to policy solutions; JRC technical report; Technical report by the Joint Research Centre (JRC), the European Commission’s science and knowledge service. \n\n![jrc_xai](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_de5cd8e3c268.png)\n\n\n### 2019\n\n* [Predictive Models: Explore, Explain, and Debug](https:\u002F\u002Fgithub.com\u002Fpbiecek\u002FPM_VEE); Przemyslaw Biecek, Tomasz Burzykowski. Today, the  bottleneck in predictive modelling is not the lack of data, nor the lack of computational power, nor inadequate algorithms, nor the lack of flexible models. It is the lack of tools for model validation, model exploration, and explanation of model decisions. Thus, in this book, we present a collection of methods that may be used for this purpose.\n\n![drwhy_local_explainers.png](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_dcd02fdc6284.png)\n\n* [Explainable AI: Interpreting, Explaining and Visualizing Deep Learning](https:\u002F\u002Fwww.springer.com\u002Fgp\u002Fbook\u002F9783030289539); Samek, W., Montavon, G., Vedaldi, A., Hansen, L.K., Müller, K.-R.; The development of “intelligent” systems that can take decisions and perform autonomously might lead to faster and more consistent decisions. A limiting factor for a broader adoption of AI technology is the inherent risks that come with giving up human control and oversight to “intelligent” machines. For sensitive tasks involving critical infrastructures and affecting human well-being or health, it is crucial to limit the possibility of improper, non-robust and unsafe decisions and actions. Before deploying an AI system, we see a strong need to validate its behavior, and thus establish guarantees that it will continue to perform as expected when deployed in a real-world environment. In pursuit of that objective, ways for humans to verify the agreement between the AI decision structure and their own ground-truth knowledge have been explored. Explainable AI (XAI) has developed as a subfield of AI, focused on exposing complex AI models to humans in a systematic and interpretable manner.\nThe 22 chapters included in this book provide a timely snapshot of algorithms, theory, and applications of interpretable and explainable AI and AI techniques that have been proposed recently reflecting the current discourse in this field and providing directions of future development. 
The book is organized in six parts: towards AI transparency; methods for interpreting AI systems; explaining the decisions of AI systems; evaluating interpretability and explanations; applications of explainable AI; and software for explainable AI.\n\n### 2018\n\n* [Machine Learning Interpretability with H2O Driverless AI](http:\u002F\u002Fdocs.h2o.ai\u002Fdriverless-ai\u002Flatest-stable\u002Fdocs\u002Fbooklets\u002FMLIBooklet.pdf); Patrick Hall, Navdeep Gill, Megan Kurka, Wen Phan; \n* [An Introduction to Machine Learning Interpretability](https:\u002F\u002Fwww.oreilly.com\u002Flibrary\u002Fview\u002Fan-introduction-to\u002F9781492033158\u002F); Navdeep Gill, Patrick Hall; Lots of great figures; a high-level overview of the most common techniques for model interpretability.\n* [Interpretable Machine Learning](https:\u002F\u002Fchristophm.github.io\u002Finterpretable-ml-book\u002F); Christoph Molnar; Introduces the most popular methods (LIME, PDP, SHAP and a few others) along with a more general bird's-eye view of interpretability. \n\n\n## Tools\n\n### 2019\n* [ExplainX](https:\u002F\u002Fgithub.com\u002FexplainX\u002Fexplainx); ExplainX is a fast, lightweight, and scalable explainable AI framework for data scientists to explain any black-box model to business stakeholders in just one line of code. This library is maintained by the AI researchers at the New York University VIDA Lab. Detailed documentation can also be found on this [website](https:\u002F\u002Fwww.explainx.ai\u002F)\n\n![https:\u002F\u002Fcamo.githubusercontent.com\u002F03f9e0729544717710427ed393dae32b8d055159\u002F68747470733a2f2f692e6962622e636f2f7734534631474a2f47726f75702d322d312e706e67](https:\u002F\u002Fcamo.githubusercontent.com\u002F03f9e0729544717710427ed393dae32b8d055159\u002F68747470733a2f2f692e6962622e636f2f7734534631474a2f47726f75702d322d312e706e67)\n\n* [EthicalML \u002F xai](https:\u002F\u002Fgithub.com\u002FEthicalML\u002Fxai); XAI is a Machine Learning library that is designed with AI explainability at its core. XAI contains various tools that enable analysis and evaluation of data and models. The XAI library is maintained by The Institute for Ethical AI & ML, and it was developed based on the [8 principles for Responsible Machine Learning](https:\u002F\u002Fethical.institute\u002Fprinciples.html). You can find the documentation at https:\u002F\u002Fethicalml.github.io\u002Fxai\u002Findex.html. \n\n![https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_aacd2288881e.png](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_aacd2288881e.png)\n\n* [Aequitas: A Bias and Fairness Audit Toolkit](https:\u002F\u002Farxiv.org\u002Fabs\u002F1811.05577v2); Recent work has raised concerns on the risk of unintended bias in AI systems being used nowadays that can affect individuals unfairly based on race, gender or religion, among other possible characteristics. While a lot of bias metrics and fairness definitions have been proposed in recent years, there is no consensus on which metric\u002Fdefinition should be used and there are very few available resources to operationalize them. Aequitas facilitates informed and equitable decisions around developing and deploying algorithmic decision making systems for data scientists, machine learning researchers and policymakers. 
\n\n![fairnessTree.png](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_2e1fbfdbc401.png)\n\n* [tf-explain](https:\u002F\u002Fgithub.com\u002Fsicara\u002Ftf-explain); tf-explain implements interpretability methods as Tensorflow 2.0 callbacks to ease understanding of neural networks. See [Introducing tf-explain, Interpretability for Tensorflow 2.0](https:\u002F\u002Fblog.sicara.com\u002Ftf-explain-interpretability-tensorflow-2-9438b5846e35)\n\n* [InterpretML by Microsoft](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Finterpret); Python library by Microsoft related to explainability of ML models\n\n![interpretML.png](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_b95cb8fc2191.png)\n\n* [Assessing Causality from Observational Data using Pearl's Structural Causal Models](https:\u002F\u002Fblog.methodsconsultants.com\u002Fposts\u002Fpearl-causality\u002F); \n* [sklearn_explain](https:\u002F\u002Fgithub.com\u002Fantoinecarme\u002Fsklearn_explain); Model explanation provides the ability to interpret the effect of the predictors on the composition of an individual score.\n* [heatmapping.org](http:\u002F\u002Fwww.heatmapping.org\u002F); This webpage aims to regroup publications and software produced as part of a joint project at Fraunhofer HHI, TU Berlin and SUTD Singapore on developing new methods to understand nonlinear predictions of state-of-the-art machine learning models. Machine learning models, in particular deep neural networks (DNNs), are characterized by very high predictive power, but in many cases are not easily interpretable by a human. Interpreting a nonlinear classifier is important to gain trust in the prediction, and to identify potential data selection biases or artefacts. The project studies in particular techniques to decompose the prediction in terms of contributions of individual input variables such that the produced decomposition (i.e. explanation) can be visualized in the same way as the input data.\n\n* [iNNvestigate neural networks!](https:\u002F\u002Fgithub.com\u002Falbermax\u002Finnvestigate); A toolbox created by the authors of [heatmapping.org](http:\u002F\u002Fwww.heatmapping.org\u002F) in the attempt to understand neural networks better. It contains implementations of, e.g., Saliency, Deconvnet, GuidedBackprop, SmoothGrad, IntegratedGradients, LRP, PatternNet and PatternAttribution. This library provides a common interface and out-of-the-box implementation for many analysis methods. \n\n![innvestigate.PNG](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_f60d49b63985.png)\n\n* [ggeffects](https:\u002F\u002Fstrengejacke.wordpress.com\u002F2019\u002F01\u002F14\u002Fggeffects-0-8-0-now-on-cran-marginal-effects-for-regression-models-rstats\u002F); Daniel Lüdecke; Computes marginal effects from statistical models and returns the results as tidy data frames. These data frames are ready to use with the 'ggplot2'-package. Marginal effects can be calculated for many different models. 
Interaction terms, splines and polynomial terms are also supported. The main functions are ggpredict(), ggemmeans() and ggeffect(). There is a generic plot()-method to plot the results using 'ggplot2'.\n\n* [Contrastive LRP](https:\u002F\u002Fgithub.com\u002FJindong-Explainable-AI\u002FContrastive-LRP) - A PyTorch implementation of the paper [Understanding Individual Decisions of CNNs via Contrastive Backpropagation](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1812.02100.pdf). The code creates CLRP saliency maps to explain individual classifications on a VGG16 model.\n\n* [Relative Attributing Propagation](https:\u002F\u002Fgithub.com\u002FwjNam\u002FRelative_Attributing_Propagation) - Relative attributing propagation (RAP) decomposes the output predictions of DNNs with a new perspective of separating the relevant (positive) and irrelevant (negative) attributions according to the relative influence between the layers. A detailed description of this method is provided in the paper https:\u002F\u002Farxiv.org\u002Fpdf\u002F1904.00605.pdf.\n\n### 2018\n\n* [KDD 2018: Explainable Models for Healthcare AI](https:\u002F\u002Fnotepad.mmakowski.com\u002FTech\u002FKDD%202018:%20Explainable%20Models%20for%20Healthcare%20AI); The Explainable Models for Healthcare AI tutorial was presented by a trio from KenSci Inc. that included a data scientist and a clinician. The premise of the session was that explainability is particularly important in healthcare applications of machine learning, due to the far-reaching consequences of decisions, high cost of mistakes, fairness and compliance requirements. The tutorial walked through a number of aspects of interpretability and discussed techniques that can be applied to explain model predictions.\n* [MAGMIL: Model Agnostic Methods for Interpretable Machine Learning](https:\u002F\u002Fgithub.com\u002Fankitbit\u002FMAGMIL); The European Union’s new General Data Protection Regulation, which is going to be enforced beginning on the 25th of May, 2018, will have a potential impact on the routine use of machine learning algorithms by restricting automated individual decision-making (that is, algorithms that make decisions based on user-level predictors) which “significantly affect” users. The law will also effectively create a “right to explanation,” whereby a user can ask for an explanation of an algorithmic decision that was made about them. Considering such challenging norms on the use of machine learning systems, we are making an attempt to make the models more interpretable. While we are concerned about developing a deeper understanding of decisions made by a machine learning model, the idea of extracting explanations from the machine learning system, also known as model-agnostic interpretability methods, has some benefits over techniques such as model-specific interpretability methods in terms of flexibility.\n* [A toolbox to iNNvestigate neural networks' predictions!](https:\u002F\u002Fgithub.com\u002Falbermax\u002Finnvestigate); Maximilian Alber; In recent years neural networks have furthered the state of the art in many domains like, e.g., object detection and speech recognition. Despite this success, neural networks are typically still treated as black boxes. Their internal workings are not fully understood and the basis for their predictions is unclear. In the attempt to understand neural networks better, several methods were proposed, e.g., Saliency, Deconvnet, GuidedBackprop, SmoothGrad, IntegratedGradients, LRP, PatternNet and PatternAttribution. 
Due to the lack of reference implementations, comparing them is a major effort. This library addresses this by providing a common interface and out-of-the-box implementation for many analysis methods. Our goal is to make analyzing neural networks' predictions easy!\n* [Black Box Auditing and Certifying and Removing Disparate Impact](https:\u002F\u002Fgithub.com\u002Falgofairness\u002FBlackBoxAuditing); This repository contains a sample implementation of Gradient Feature Auditing (GFA) meant to be generalizable to most datasets. For more information on the repair process, see our paper on Certifying and Removing Disparate Impact. For information on the full auditing process, see our paper on Auditing Black-box Models for Indirect Influence.\n* [Skater: Python Library for Model Interpretation\u002FExplanations](https:\u002F\u002Fgithub.com\u002Fdatascienceinc\u002FSkater); Skater is a unified framework to enable model interpretation for all forms of models, to help one build the interpretable machine learning systems often needed for real-world use cases (the authors are actively working towards enabling faithful interpretability for all forms of models). It is an open source Python library designed to demystify the learned structures of a black box model both globally (inference on the basis of a complete data set) and locally (inference about an individual prediction).\n* [Weight Watcher](https:\u002F\u002Fgithub.com\u002FCalculatedContent\u002FWeightWatcher); Charles Martin; Weight Watcher analyzes the Fat Tails in the weight matrices of Deep Neural Networks (DNNs). This tool can predict the trends in the generalization accuracy of a series of DNNs, such as VGG11, VGG13, ..., or even the entire series of ResNet models--without needing a test set! This relies upon recent research into the Heavy (Fat) Tailed Self Regularization in DNNs.\n* [Adversarial Robustness Toolbox - ART](https:\u002F\u002Fgithub.com\u002FIBM\u002Fadversarial-robustness-toolbox); This is a library dedicated to adversarial machine learning. Its purpose is to allow rapid crafting and analysis of attacks and defense methods for machine learning models. The Adversarial Robustness Toolbox provides an implementation for many state-of-the-art methods for attacking and defending classifiers.\n* [Model Describer](https:\u002F\u002Fgithub.com\u002FDataScienceSquad\u002Fmodel-describer); A Python script that generates an HTML report summarizing predictive models. Interactive and rich in descriptions.\n* [AI Fairness 360](https:\u002F\u002Fgithub.com\u002FIBM\u002Faif360); A Python library developed by IBM to help detect and remove bias in machine learning models. [Some introduction](https:\u002F\u002Farxiv.org\u002Fabs\u002F1810.01943)\n* [The What-If Tool: Code-Free Probing of Machine Learning Models](https:\u002F\u002Fai.googleblog.com\u002F2018\u002F09\u002Fthe-what-if-tool-code-free-probing-of.html); An interactive tool for What-If scenarios developed at Google, part of TensorBoard.\n\n### 2017\n\n* [Impact encoding for categorical features](https:\u002F\u002Fgithub.com\u002FDpananos\u002FCategorical-Features); Imagine working with a dataset containing all the zip codes in the United States. That is a dataset containing nearly 40,000 unique categories. How would you deal with that kind of data if you planned to do predictive modelling? One hot encoding doesn't get you anywhere useful, since that would add 40,000 sparse variables to your dataset. Throwing the data out could be leaving valuable information on the table, so that doesn't seem right either. In this post, I'm going to examine how to deal with categorical variables with high cardinality using a strategy called impact encoding. To illustrate this example, I use a data set containing used car sales. The problem is especially well suited because there are several categorical features with many levels. Let's get started.
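\n\nA minimal sketch of the impact-encoding idea from that post, on an invented toy frame (the column names, values and smoothing constant are placeholders, not taken from the original analysis); in practice the encoding should be fitted on training folds only, to avoid target leakage:\n\n```python\nimport pandas as pd\n\n# Toy stand-in for a high-cardinality feature such as a zip code.\ndf = pd.DataFrame({\n    'zip': ['10001', '10001', '94103', '94103', '94103', '60601'],\n    'price': [300, 320, 820, 790, 805, 410],\n})\n\nglobal_mean = df['price'].mean()\nstats = df.groupby('zip')['price'].agg(['mean', 'count'])\n\n# Shrink each category mean toward the global mean so that rare categories\n# do not receive noisy, overconfident encodings.\nk = 2.0  # smoothing strength, an arbitrary choice for this sketch\nsmoothed = (stats['count'] * stats['mean'] + k * global_mean) \u002F (stats['count'] + k)\n\n# A single numeric column replaces the high-cardinality categorical one.\ndf['zip_impact'] = df['zip'].map(smoothed)\nprint(df)\n```\n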
\n* [FairTest](https:\u002F\u002Fgithub.com\u002Fcolumbia\u002Ffairtest); FairTest enables developers or auditing entities to discover and test for unwarranted associations between an algorithm's outputs and certain user subpopulations identified by protected features.\n* [Explanation Explorer](https:\u002F\u002Fgithub.com\u002Fnyuvis\u002Fexplanation_explorer); A visual tool implemented in Python for diagnostics of binary classifiers using instance-level explanations (local explainers).\n* [ggeffects](https:\u002F\u002Fstrengejacke.wordpress.com\u002F2017\u002F05\u002F24\u002Fggeffects-create-tidy-data-frames-of-marginal-effects-for-ggplot-from-model-outputs-rstats\u002F); Create Tidy Data Frames of Marginal Effects for ‘ggplot’ from Model Outputs. The aim of the ggeffects-package is similar to the broom-package: transforming “untidy” input into a tidy data frame, especially for further use with ggplot. However, ggeffects does not return model-summaries; rather, this package computes marginal effects at the mean or average marginal effects from statistical models and returns the result as a tidy data frame (as tibbles, to be more precise).\n\n## Articles\n\n### 2019\n\n* [AI Black Box Horror Stories — When Transparency was Needed More Than Ever](https:\u002F\u002Fmedium.com\u002F@ODSC\u002Fai-black-box-horror-stories-when-transparency-was-needed-more-than-ever-3d6ac0439242) Arguably, one of the biggest debates happening in data science in 2019 is the need for AI explainability. The ability to interpret machine learning models is turning out to be a defining factor for the acceptance of statistical models for driving business decisions. Enterprise stakeholders are demanding transparency in how and why these algorithms are making specific predictions. A firm understanding of any inherent bias in machine learning keeps boiling up to the top of requirements for data science teams. As a result, many top vendors in the big data ecosystem are launching new tools to take a stab at resolving the challenge of opening the AI “black box.”\n\n![https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_e54b757d6536.png](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_e54b757d6536.png)\n\n* [Artificial Intelligence Confronts a 'Reproducibility' Crisis](https:\u002F\u002Fwww.wired.com\u002Fstory\u002Fartificial-intelligence-confronts-reproducibility-crisis\u002F); A few years ago, Joelle Pineau, a computer science professor at McGill, was helping her students design a new algorithm when they fell into a rut. Her lab studies reinforcement learning, a type of artificial intelligence that’s used, among other things, to help virtual characters (“half cheetah” and “ant” are popular) teach themselves how to move about in virtual worlds. It’s a prerequisite to building autonomous robots and cars. Pineau’s students hoped to improve on another lab’s system. But first they had to rebuild it, and their design, for reasons unknown, was falling short of its promised results. 
Until, that is, the students tried some “creative manipulations” that didn’t appear in the other lab’s paper.\n* [Model explainers and the press secretary — directly optimizing for trust in machine learning may be harmful](https:\u002F\u002Fmedium.com\u002F@stuart.reynolds\u002Fmodel-explainers-and-the-press-secretary-optimizing-for-trust-in-machine-learning-may-be-harmful-84275b27bea6); If black-box model explainers optimize human trust in machine learning models, why shouldn’t we expect that black-box model explainers will function like a dishonest government Press Secretary?\n* [Decoding the Black Box: An Important Introduction to Interpretable Machine Learning Models in Python](https:\u002F\u002Fwww.analyticsvidhya.com\u002Fblog\u002F2019\u002F08\u002Fdecoding-black-box-step-by-step-guide-interpretable-machine-learning-models-python\u002F); Ankit Choudhary; Interpretable machine learning is a critical concept every data scientist should be aware of; How can you build interpretable machine learning models? This article will provide a framework; We will also code these interpretable machine learning models in Python\n\n* [I, Black Box: Explainable Artificial Intelligence and the Limits of Human Deliberative Processes](https:\u002F\u002Fwarontherocks.com\u002F2019\u002F07\u002Fi-black-box-explainable-artificial-intelligence-and-the-limits-of-human-deliberative-processes\u002F); Much has been made about the importance of understanding the inner workings of machines when it comes to the ethics of using artificial intelligence (AI) on the battlefield. Delegates at the Group of Government Expert meetings on lethal autonomous weapons continue to raise the issue. Concerns expressed by legal and scientific scholars abound. One commentator sums it up: “for human decision makers to be able to retain agency over the morally relevant decisions made with AI they would need a clear insight into the AI black box, to understand the data, its provenance and the logic of its algorithms.”\n* [Teaching AI, Ethics, Law and Policy](https:\u002F\u002Farxiv.org\u002Fabs\u002F1904.12470); Asher Wilk; Cyberspace and the development of intelligent systems using Artificial Intelligence (AI) have created new challenges for computer professionals, data scientists, regulators and policy makers. For example, self-driving cars raise new technical, ethical, legal and policy issues. This paper proposes a course Computers, Ethics, Law, and Public Policy, and suggests a curriculum for such a course. This paper presents ethical, legal, and public policy issues relevant to building and using software and artificial intelligence. It describes ethical principles and values relevant to AI systems. \n* [An introduction to explainable AI, and why we need it](https:\u002F\u002Fwww.kdnuggets.com\u002F2019\u002F04\u002Fintroduction-explainable-ai.html); Patrick Ferris; I was fortunate enough to attend the Knowledge Discovery and Data Mining (KDD) conference this year. Of the talks I went to, there were two main areas of research that seem to be on a lot of people’s minds: Firstly, finding a meaningful representation of graph structures to feed into neural networks. Oriol Vinyals from DeepMind gave a talk about their Message Passing Neural Networks. The second area, and the focus of this article, is explainable AI models. 
As we generate newer and more innovative applications for neural networks, the question of ‘How do they work?’ becomes more and more important.\n* [The AI Black Box Explanation Problem](https:\u002F\u002Fercim-news.ercim.eu\u002Fen116\u002Fspecial\u002Fthe-ai-black-box-explanation-problem); At a very high level, we articulated the problem in two different flavours: eXplanation by Design (XbD): given a dataset of training decision records, how to develop a machine learning decision model together with its explanation; Black Box eXplanation (BBX): given the decision records produced by a black box decision model, how to reconstruct an explanation for it.\n* [VOZIQ Launches ‘Agent Connect,’ an Explainable AI Product to Enable Large-Scale Customer Retention Programs](https:\u002F\u002Fwww.einnews.com\u002Fpr_news\u002F481152181\u002Fvoziq-launches-agent-connect-an-explainable-ai-product-to-enable-large-scale-customer-retention-programs); RESTON, VIRGINIA, USA, April 3, 2019 \u002FEINPresswire.com\u002F -- VOZIQ, an enterprise cloud-based application solution provider that enables recurring revenue businesses to drive large-scale predictive customer retention programs, announced the launch of its new eXplainable AI (XAI) product ‘Agent Connect’ to help businesses enhance proactive retention capabilities of their most critical resource – customer retention agents. ‘Agent Connect’ is VOZIQ’s newest product powered by next-generation eXplainable AI (XAI) that brings together multiple retention risk signals with expressed and inferred needs, sentiment, churn drivers and behaviors that lead to attrition of customers discovered directly from millions of customer interactions by analyzing unstructured and structured customer data, and converts those insights into easy-to-act, prescriptive intelligence about predicted health for any customer.\n* [Derisking machine learning and artificial intelligence](https:\u002F\u002Fwww.mckinsey.com\u002Fbusiness-functions\u002Frisk\u002Four-insights\u002Fderisking-machine-learning-and-artificial-intelligence); Machine learning and artificial intelligence are set to transform the banking industry, using vast amounts of data to build models that improve decision making, tailor services, and improve risk management. According to the McKinsey Global Institute, this could generate value of more than $250 billion in the banking industry. But there is a downside, since machine-learning models amplify some elements of model risk. And although many banks, particularly those operating in jurisdictions with stringent regulatory requirements, have validation frameworks and practices in place to assess and mitigate the risks associated with traditional models, these are often insufficient to deal with the risks associated with machine-learning models. Conscious of the problem, many banks are proceeding cautiously, restricting the use of machine-learning models to low-risk applications, such as digital marketing. Their caution is understandable given the potential financial, reputational, and regulatory risks. Banks could, for example, find themselves in violation of antidiscrimination laws, and incur significant fines—a concern that pushed one bank to ban its HR department from using a machine-learning résumé screener. 
A better approach, however, and ultimately the only sustainable one if banks are to reap the full benefits of machine-learning models, is to enhance model-risk management.\n* [Explainable AI should help us avoid a third 'AI winter'](https:\u002F\u002Fwww.computing.co.uk\u002Fctg\u002Fopinion\u002F3073390\u002Fexplainable-ai-should-help-us-avoid-a-third-ai-winter); The General Data Protection Regulation (GDPR) that came into force last year across Europe has rightly made consumers and businesses more aware of personal data. However, there is a real risk that through over-correcting around data collection critical AI development will be negatively impacted. This is not only an issue for data scientists, but also those companies that use AI-based solutions to increase competitiveness. The potential negative impact would not only be on businesses implementing AI but also on consumers who may miss out on the benefits AI could bring to the products and services they rely on.\n* [Explainable AI: From Prediction To Understanding](https:\u002F\u002Fmedium.com\u002F@ODSC\u002Fexplainable-ai-from-prediction-to-understanding-38c81c11460); It’s not enough to make predictions. Sometimes, you need to generate a deep understanding. Just because you model something doesn’t mean you really know how it works. In classical machine learning, the algorithm spits out predictions, but in some cases, this isn’t good enough. Dr. George Cevora explains why the black box of AI may not always be appropriate and how to go from prediction to understanding.\n* [Why Explainable AI (XAI) is the future of marketing and e-commerce](https:\u002F\u002Fwww.the-future-of-commerce.com\u002F2019\u002F03\u002F11\u002Fwhat-is-explainable-ai-xai\u002F); “New machine-learning systems will have the ability to explain their rationale, characterize their strengths and weaknesses, and convey an understanding of how they will behave in the future.” – David Gunning, Head of DARPA. As machine learning begins to play a greater role in the delivery of personalized customer experiences in commerce and content, one of the most powerful opportunities is the development of systems that offer marketers the ability to maximize every dollar spent on marketing programs via actionable insights. But the rise of AI in business for actionable insights also creates a challenge: How can marketers know and trust the reasoning behind why an AI system is making recommendations for action? Because AI makes decisions using incredibly complex processes, its decisions are often opaque to the end-user.\n* [Interpretable AI or How I Learned to Stop Worrying and Trust AI](https:\u002F\u002Ftowardsdatascience.com\u002Finterpretable-ai-or-how-i-learned-to-stop-worrying-and-trust-ai-e61f9e8ee2c2) Techniques to build Robust, Unbiased AI Applications; Ajay Thampi; In the last five years alone, AI researchers have made significant breakthroughs in areas such as image recognition, natural language understanding and board games! As companies are considering handing over critical decisions to AI in industries like healthcare and finance, the lack of understanding of complex machine learned models is hugely problematic. 
This lack of understanding could result in models propagating bias and we’ve seen quite a few examples of this in criminal justice, politics, retail, facial recognition and language understanding.\n* [In Search of Explainable Artificial Intelligence](https:\u002F\u002Fwww.geopoliticalmonitor.com\u002Fin-search-of-explainable-artificial-intelligence\u002F); Today, if a new entrepreneur wants to understand why the banks rejected a loan application for his start-up, or if a young graduate wants to know why the large corporation for which he was hoping to work did not invite her for an interview, they will not be able to discover the reasons that led to these decisions. Both the bank and the corporation used artificial intelligence (AI) algorithms to determine the outcome of the loan or the job application. In practice, this means that if your loan application is rejected, or your CV rejected, no explanation can be provided. This produces an embarrassing scenario, which tends to relegate AI technologies to suggesting solutions, which must be validated by human beings.\n* [Explainable AI and the Rebirth of Rules](https:\u002F\u002Fwww.forbes.com\u002Fsites\u002Ftomdavenport\u002F2019\u002F03\u002F18\u002Fexplainable-ai-and-the-rebirth-of-rules\u002F); Artificial intelligence (AI) has been described as a set of “prediction machines.” In general, the technology is great at generating automated predictions. But if you want to use artificial intelligence in a regulated industry, you better be able to explain how the machine predicted a fraud or criminal suspect, a bad credit risk, or a good candidate for drug trials. International law firm Taylor Wessing (the firm) wanted to use AI as a triage tool to help advise clients of the firm about their predicted exposure to regulations such as the Modern Slavery Act or the Foreign Corrupt Practices Act.  Clients often have suppliers or acquisitions around the world, and they need systematic due diligence to determine where they should investigate more deeply into possible risk. Supply chains can be especially complicated with hundreds of small suppliers. Rumors of Rule Engines’ Death Have Been Greatly Exaggerated\n* [Attacking discrimination with smarter machine learning](https:\u002F\u002Fresearch.google.com\u002Fbigpicture\u002Fattacking-discrimination-in-ml\u002F); Here we discuss \"threshold classifiers,\" a part of some machine learning systems that is critical to issues of discrimination. A threshold classifier essentially makes a yes\u002Fno decision, putting things in one category or another. We look at how these classifiers work, ways they can potentially be unfair, and how you might turn an unfair classifier into a fairer one. As an illustrative example, we focus on loan granting scenarios where a bank may grant or deny a loan based on a single, automatically computed number such as a credit score. 
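\n\nIn the spirit of that demo (and of the ‘equal opportunity’ criterion from the 2016 Hardt, Price and Srebro paper listed under Papers), here is a minimal sketch on purely synthetic numbers of how a single global threshold compares with per-group thresholds; nothing below comes from the original article:\n\n```python\nimport numpy as np\n\nrng = np.random.default_rng(0)\n\n# Synthetic loan-scoring data: a protected group, a true repayment label and a model score.\n# All numbers are invented purely to illustrate the threshold mechanics.\nn = 2000\ngroup = rng.integers(0, 2, size=n)                        # protected attribute (0 or 1)\nrepaid = rng.binomial(1, np.where(group == 0, 0.5, 0.4))  # would the applicant repay?\nscore = np.clip(0.2 * repaid + rng.normal(0.5, 0.2, size=n) + 0.05 * group, 0, 1)\n\ndef tpr(threshold, scores, labels):\n    # Share of actual positives (people who would repay) that are granted a loan.\n    return np.mean(scores[labels == 1] >= threshold)\n\n# A single global threshold can give the two groups very different TPRs.\nt_global = 0.6\nfor g in (0, 1):\n    m = group == g\n    print(f'group {g}: TPR at global threshold = {tpr(t_global, score[m], repaid[m]):.2f}')\n\n# Group-specific thresholds chosen to hit (roughly) the same TPR, i.e. equal opportunity.\ntarget_tpr = 0.8\nfor g in (0, 1):\n    m = group == g\n    pos_scores = np.sort(score[m][repaid[m] == 1])\n    t_g = pos_scores[int((1 - target_tpr) * len(pos_scores))]\n    print(f'group {g}: threshold {t_g:.2f} gives TPR {tpr(t_g, score[m], repaid[m]):.2f}')\n```\n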
\n* [Better Preference Predictions: Tunable and Explainable Recommender Systems](https:\u002F\u002Fblog.insightdatascience.com\u002Ftunable-and-explainable-recommender-systems-cd52b6287bad); Amber Roberts; Ad recommendations should be understandable to the individual consumer, but is it possible to increase interpretability without sacrificing accuracy?\n* [Machine Learning is Creating a Crisis in Science](https:\u002F\u002Fwww.governmentciomedia.com\u002Fmachine-learning-creating-crisis-science); Kevin McCaney; The adoption of machine-learning techniques is contributing to a worrying number of research findings that cannot be repeated by other researchers.\n* [Artificial Intelligence and Ethics](https:\u002F\u002Fharvardmagazine.com\u002F2019\u002F01\u002Fartificial-intelligence-limitations); Jonathan Shaw; On march 2018, at around 10 P.M., Elaine Herzberg was wheeling her bicycle across a street in Tempe, Arizona, when she was struck and killed by a self-driving car. Although there was a human operator behind the wheel, an autonomous system—artificial intelligence—was in full control. This incident, like others involving interactions between people and AI technologies, raises a host of ethical and proto-legal questions. What moral obligations did the system’s programmers have to prevent their creation from taking a human life? And who was responsible for Herzberg’s death? The person in the driver’s seat? The company testing the car’s capabilities? The designers of the AI system, or even the manufacturers of its onboard sensory equipment?\n* [Building Trusted Human-Machine Partnerships](https:\u002F\u002Fwww.darpa.mil\u002Fnews-events\u002F2019-01-31); A key ingredient in effective teams – whether athletic, business, or military – is trust, which is based in part on mutual understanding of team members’ competence to fulfill assigned roles. When it comes to forming effective teams of humans and autonomous systems, humans need timely and accurate insights about their machine partners’ skills, experience, and reliability to trust them in dynamic environments. At present, autonomous systems cannot provide real-time feedback when changing conditions such as weather or lighting cause their competency to fluctuate. The machines’ lack of awareness of their own competence and their inability to communicate it to their human partners reduces trust and undermines team effectiveness.\n* [HOW AUGMENTED ANALYTICS AND EXPLAINABLE AI WILL CAUSE A DISRUPTION IN 2019 & BEYOND](https:\u002F\u002Fwww.analyticsinsight.net\u002Fhere-is-how-augmented-analytics-and-explainable-ai-will-cause-a-disruption-in-2019-beyond\u002F); Kamalika Some; Artificial intelligence (AI) is a transformational $15 trillion opportunity which has caught the attention of all tech users, leaders and influencers. Yet, as AI becomes more sophisticated, the algorithmic ‘black box’ dominates more to make all the decisions. To have a confident outcome and stakeholder trust with an ultimate aim to capitalise on the opportunities, it is essential to know the rationale of how the algorithm arrived at its recommendation or decision, the basic premise behind Explainable AI (XAI).\n* [Why ‘Explainable AI’ is the Next Frontier in Financial Crime Fighting ](http:\u002F\u002Fwww.bankingexchange.com\u002Fnews-feed\u002Fitem\u002F7785-why-explainable-ai-is-the-next-frontier-in-financial-crime-fighting); Chad Hetherington; Financial institutions (FIs) must manage compliance budgets without losing sight of primary functions and quality control. 
To answer this, many have made the move to automating time-intensive, rote tasks like data gathering and sorting through alerts by adopting innovative technologies like AI and machine learning to free up time-strapped analysts for more informed and precise decision-making processes.\n* [Machine Learning Interpretability: Do You Know What Your Model Is Doing?](https:\u002F\u002Fwww.inovex.de\u002Fblog\u002Fmachine-learning-interpretability\u002F); Marcel Spitzer; With the adoption of GDPR, there are now EU-wide regulations concerning automated individual decision-making and profiling (Art. 22, also termed the “right to explanation”), requiring companies to give individuals information about processing, to introduce ways for them to request intervention and to even carry out regular checks to make sure that the systems are working as intended. \n* [Building explainable machine learning models](https:\u002F\u002Fwww.fastdatascience.com\u002F2019\u002F02\u002F08\u002Fbuilding-explainable-machine-learning-models\u002F); Thomas Wood; Sometimes as data scientists we will encounter cases where we need to build a machine learning model that should not be a black box, but which should make transparent decisions that humans can understand. This can go against our instincts as scientists and engineers, as we would like to build the most accurate model possible.\n* [AI is not IT](https:\u002F\u002Fwww.linkedin.com\u002Fpulse\u002Fai-silvie-spreeuwenberg); Silvie Spreeuwenberg; XAI suggests something in between. It is still narrow AI but used in such a way that there is a feedback loop to the environment. The feedback loop may involve human intervention. We understand the scope of the narrow AI solution. We can adjust the solution when the task at hand requires more knowledge, or be warned in a meaningful way when the task at hand does not fit in the scope of the AI solution.\n* [A computer program used for bail and sentencing decisions was labeled biased against blacks. It’s actually not that clear.](https:\u002F\u002Fwww.washingtonpost.com\u002Fnews\u002Fmonkey-cage\u002Fwp\u002F2016\u002F10\u002F17\u002Fcan-an-algorithm-be-racist-our-analysis-is-more-cautious-than-propublicas\u002F); This past summer, a heated debate broke out about a tool used in courts across the country to help make bail and sentencing decisions. It’s a controversy that touches on some of the big criminal justice questions facing our society. And it all turns on an algorithm.\n* [AAAS: Machine learning 'causing science crisis'](https:\u002F\u002Fwww.bbc.com\u002Fnews\u002Fscience-environment-47267081); Machine-learning techniques used by thousands of scientists to analyse data are producing results that are misleading and often completely wrong. Dr Genevera Allen from Rice University in Houston said that the increased use of such systems was contributing to a “crisis in science”. She warned scientists that if they didn’t improve their techniques they would be wasting both time and money.\n* [Automatic Machine Learning is broken](https:\u002F\u002Fpplonski.github.io\u002Fautomatic-machine-learning-is-broken\u002F); The debt that comes with the maintenance and understanding of complex models.\n* [Charles River Analytics creates tool to help AI communicate effectively with humans](http:\u002F\u002Fmil-embedded.com\u002Fnews\u002Fcharles-river-analytics-creates-tool-to-help-ai-communicate-effectively-with-humans\u002F); Developer of intelligent systems solutions, Charles River Analytics Inc. 
created the Causal Models to Explain Learning (CAMEL) approach under the Defense Advanced Research Projects Agency's (DARPA) Explainable Artificial Intelligence (XAI) effort. The goal of the CAMEL approach will be to help artificial intelligence effectively communicate with human teammates.\n* [Inside DARPA’s effort to create explainable artificial intelligence](https:\u002F\u002Fbdtechtalks.com\u002F2019\u002F01\u002F10\u002Fdarpa-xai-explainable-artificial-intelligence\u002F); Among DARPA’s many exciting projects is Explainable Artificial Intelligence (XAI), an initiative launched in 2016 aimed at solving one of the principal challenges of deep learning and neural networks, the subset of AI that is becoming increasingly prominent in many different sectors.\n* [Boston University researchers develop framework to improve AI fairness](https:\u002F\u002Fventurebeat.com\u002F2019\u002F01\u002F30\u002Fboston-university-researchers-develop-framework-to-improve-ai-fairness\u002F); Experience in the past few years shows AI algorithms can manifest gender and racial bias, raising concern over their use in critical domains, such as deciding whose loan gets approved, who’s qualified for a job, who gets to walk free and who stays in prison. New research by scientists at Boston University shows just how hard it is to evaluate fairness in AI algorithms and tries to establish a framework for detecting and mitigating problematic behavior in automated decisions. Titled “From Soft Classifiers to Hard Decisions: How fair can we be?,” the research paper is being presented this week at the Association for Computing Machinery conference on Fairness, Accountability, and Transparency (ACM FAT*).\n\n### 2018\n\n* [Understanding Explainable AI](https:\u002F\u002Fwww.quantiply.com\u002Fblog\u002Funderstanding-explainable-ai); (Extracted from The Basis Technology Handbook for Integrating AI in Highly Regulated Industries) For the longest time, the public perception of AI has been linked to visions of the apocalypse: AI is Skynet, and we should be afraid of it. You can see that fear in the reactions to the Uber self-driving car tragedy. Despite the fact that people cause tens of thousands of automobile deaths per year, it strikes a nerve when even a single accident involves AI. This fear belies something very important about the technical infrastructure of the modern world: AI is already thoroughly baked in. That’s not to say that there aren’t reasons to get skittish about our increasing reliance on AI technology. The “black box” problem is one such justified reason for hesitation.\n* [The Importance of Human Interpretable Machine Learning](https:\u002F\u002Ftowardsdatascience.com\u002Fhuman-interpretable-machine-learning-part-1-the-need-and-importance-of-model-interpretation-2ed758f5f476); This article is the first in my series of articles aimed at ‘Explainable Artificial Intelligence (XAI)’. The field of Artificial Intelligence powered by Machine Learning and Deep Learning has gone through some phenomenal changes over the last decade. Starting off as a purely academic and research-oriented domain, the field has seen widespread industry adoption across diverse domains including retail, technology, healthcare, science and many more. Rather than just running lab experiments to publish a research paper, the key objective of data science and machine learning in the 21st century has changed to tackling and solving real-world problems, automating complex tasks and making our life easier and better. 
More often than not, the standard toolbox of machine learning, statistical or deep learning models remains the same. New models, like Capsule Networks, do come into existence, but industry adoption usually takes several years. Hence, in industry, the main focus of data science or machine learning is more ‘applied’ than theoretical, and the effective application of these models on the right data to solve complex real-world problems is of paramount importance.\n* [Uber Has Open-Sourced Autonomous Vehicle Visualization](https:\u002F\u002Fwww.designnews.com\u002Fdesign-hardware-software\u002Fuber-has-open-sourced-autonomous-vehicle-visualization\u002F38672905960296); With an open source version of its Autonomous Visualization System, Uber is hoping to create a standard visualization system for engineers to use in autonomous vehicle development.\n* [Holy Grail of AI for Enterprise - Explainable AI (XAI)](https:\u002F\u002Fblog.goodaudience.com\u002Fholy-grail-of-ai-for-enterprise-explainable-ai-xai-6e630902f2a0); Saurabh Kaushik; Beyond addressing the above scenarios, XAI offers deeper business benefits, such as: improved AI model performance, as explanations help pinpoint issues in data and feature behaviors; better decision making, as explanations provide added information and confidence for the man-in-the-middle to act wisely and decisively; a sense of control, as an AI system owner clearly knows the levers for the system’s behavior and boundaries; a sense of safety, as each decision can be checked against safety guidelines, with alerts on violations; trust with stakeholders, who can see the reasoning behind each and every decision made; monitoring of ethical issues and violations due to bias in training data; a better mechanism for complying with accountability requirements within the organization for auditing and other purposes; and better adherence to regulatory requirements (like GDPR), where a ‘right to explain’ is a must-have.\n* [Artificial Intelligence Is Not A Technology](https:\u002F\u002Fwww.forbes.com\u002Fsites\u002Fcognitiveworld\u002F2018\u002F11\u002F01\u002Fartificial-intelligence-is-not-a-technology\u002F); Kathleen Walch; Making intelligent machines is both the goal of AI and the underlying science behind understanding what it takes to make a machine intelligent. AI represents our desired outcome, and many of the developments along the way of that understanding, such as self-driving vehicles, image recognition technology, or natural language processing and generation, are steps along the journey to AGI.\n* [The Building Blocks of Interpretability](https:\u002F\u002Fdistill.pub\u002F2018\u002Fbuilding-blocks\u002F); Chris Olah ...; Interpretability techniques are normally studied in isolation. We explore the powerful interfaces that arise when you combine them — and the rich structure of this combinatorial space.\n* [Why Machine Learning Interpretability Matters](https:\u002F\u002Fblog.dataiku.com\u002Fwhy-machine-learning-interpretability-matters); Even though machine learning (ML) has been around for decades, it seems that in the last year, much of the news (notably in mainstream media) surrounding it has turned to interpretability - including ideas like trust, the ML black box, and fairness or ethics. Surely, if the topic is growing in popularity, that must mean it’s important. 
But why, exactly - and to whom?\n* [IBM, Harvard develop tool to tackle black box problem in AI translation](https:\u002F\u002Fventurebeat.com\u002F2018\u002F11\u002F01\u002Fibm-harvard-develop-tool-to-tackle-black-box-problem-in-ai-translation\u002F); seq2seq vis; Researchers at IBM and Harvard University have developed a new debugging tool to address this issue. Presented at the IEEE Conference on Visual Analytics Science and Technology in Berlin last week, the tool lets creators of deep learning applications visualize the decision-making an AI makes when translating a sequence of words from one language to another.\n* [The Five Tribes of Machine Learning Explainers](https:\u002F\u002Fwww.slideshare.net\u002Flopusz\u002Fthe-five-tribes-of-machine-learning-explainers); Michał Łopuszyński; Lightning talk from PyData Berlin 2018\n* [Beware Default Random Forest Importances](https:\u002F\u002Fexplained.ai\u002Frf-importance\u002Findex.html); Terence Parr, Kerem Turgutlu, Christopher Csiszar, and Jeremy Howard; TL;DR: The scikit-learn Random Forest feature importance and R's default Random Forest feature importance strategies are biased. To get reliable results in Python, use permutation importance, provided here and in our rfpimp package (via pip). For R, use importance=T in the Random Forest constructor, then type=1 in R's importance() function. In addition, your feature importance measures will only be reliable if your model is trained with suitable hyper-parameters.
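\n\nA minimal sketch of that recommendation, using scikit-learn's permutation_importance (the authors' own rfpimp package is an alternative); the dataset and hyper-parameters below are placeholders chosen only so the snippet runs end to end:\n\n```python\nfrom sklearn.datasets import load_breast_cancer\nfrom sklearn.ensemble import RandomForestClassifier\nfrom sklearn.inspection import permutation_importance\nfrom sklearn.model_selection import train_test_split\n\n# Any tabular dataset works; this built-in one just keeps the example self-contained.\nX, y = load_breast_cancer(return_X_y=True, as_frame=True)\nX_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)\n\nmodel = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)\n\n# Permutation importance on held-out data: shuffle one column at a time\n# and record how much the model's score drops.\nresult = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)\nfor i in result.importances_mean.argsort()[::-1][:5]:\n    print(f'{X.columns[i]:<25} {result.importances_mean[i]:.4f}')\n```\n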
\n* [A Case For Explainable AI & Machine Learning](https:\u002F\u002Fwww.kdnuggets.com\u002F2018\u002F12\u002Fexplainable-ai-machine-learning.html); A very nice list of possible use-cases for XAI; examples: Energy theft detection - Different types of theft require different action by the investigators; Credit scoring - The Fair Credit Reporting Act (FCRA) is a federal law that regulates credit reporting agencies and compels them to ensure the information they gather and distribute is a fair and accurate summary of a consumer's credit history; Video threat detection - Flagging an individual as a threat has a potential for significant legal implications; \n\n* [Ethics of AI: A data scientist’s perspective](https:\u002F\u002Fmedium.com\u002F@QuantumBlack\u002Fethics-of-ai-a-data-scientists-perspective-cb7cdb1c8392); QuantumBlack\n\n* [Explainable AI vs Explaining AI](https:\u002F\u002Fmedium.com\u002F@ahmad.hajmosa\u002Fexplainable-ai-vs-explaining-ai-part-1-d39ea5053347); Ahmad Haj Mosa; Some ideas that link tools for XAI with ideas from “Thinking, Fast and Slow”.\n\n* [Regulating Black-Box Medicine](http:\u002F\u002Fmichiganlawreview.org\u002Fregulating-black-box-medicine\u002F); Data drive modern medicine. And our tools to analyze those data are growing ever more powerful. As health data are collected in greater and greater amounts, sophisticated algorithms based on those data can drive medical innovation, improve the process of care, and increase efficiency. Those algorithms, however, vary widely in quality. Some are accurate and powerful, while others may be riddled with errors or based on faulty science. When an opaque algorithm recommends an insulin dose to a diabetic patient, how do we know that dose is correct? Patients, providers, and insurers face substantial difficulties in identifying high-quality algorithms; they lack both expertise and proprietary information. How should we ensure that medical algorithms are safe and effective?\n\n* [3 Signs of a Good AI Model](https:\u002F\u002Ftdwi.org\u002Farticles\u002F2018\u002F11\u002F26\u002Fadv-all-3-signs-of-a-good-ai-model.aspx); Troy Hiltbrand; Until recently, the success of an AI project was judged only by its outcomes for the company, but an emerging industry trend suggests another goal -- explainable artificial intelligence (XAI). The gravitation toward XAI stems from demand from consumers (and ultimately society) to better understand how AI decisions are made. Regulations, such as the General Data Protection Regulation (GDPR) in Europe, have increased the demand for more accountability when AI is used to make automated decisions, especially in cases where bias has a detrimental effect on individuals.\n\n* [Rapid new advances are now underway in AI](https:\u002F\u002Fwww.technative.io\u002Fwhy-its-important-to-create-a-movement-around-explainable-ai\u002F); Yet, as AI gets more widely deployed, the importance of having explainable models will increase. Simply, if systems are responsible for making a decision, there comes a step in the process whereby that decision has to be shown: communicating what the decision is, how it was made and, now, why the AI did what it did.\n\n* [Why We Need to Audit Algorithms](https:\u002F\u002Fhbr.org\u002F2018\u002F11\u002Fwhy-we-need-to-audit-algorithms); James Guszcza, Iyad Rahwan, Will Bible, Manuel Cebrian, Vic Katyal; Algorithmic decision-making and artificial intelligence (AI) hold enormous potential and are likely to be economic blockbusters, but we worry that the hype has led many people to overlook the serious problems of introducing algorithms into business and society. Indeed, we see many succumbing to what Microsoft’s Kate Crawford calls “data fundamentalism” — the notion that massive datasets are repositories that yield reliable and objective truths, if only we can extract them using machine learning tools. A more nuanced view is needed. It is by now abundantly clear that, left unchecked, AI algorithms embedded in digital and social technologies can encode societal biases, accelerate the spread of rumors and disinformation, amplify echo chambers of public opinion, hijack our attention, and even impair our mental wellbeing.\n\n* [Taking machine thinking out of the black box](https:\u002F\u002Fnews.mit.edu\u002F2018\u002Fmit-lincoln-laboratory-adaptable-interpretable-machine-learning-0905); Anne McGovern; The Adaptable Interpretable Machine Learning project is redesigning machine learning models so humans can understand what computers are thinking.\n\n* [Explainable AI won’t deliver. Here’s why](https:\u002F\u002Fhackernoon.com\u002Fexplainable-ai-wont-deliver-here-s-why-6738f54216be); Cassie Kozyrkov; Interpretability: you do understand it but it doesn’t work well. Performance: you don’t understand it but it does work well. Why not have both?\n\n* [We Need an FDA For Algorithms](http:\u002F\u002Fnautil.us\u002Fissue\u002F66\u002Fclockwork\u002Fwe-need-an-fda-for-algorithms); Hannah Fry; Do we need to develop a brand-new intuition about how to interact with algorithms? What do you mean when you say that the best algorithms are the ones that take the human into account at every stage? 
What is the most dangerous algorithm?\n\n* [Explainable AI, interactivity and HCI](https:\u002F\u002Fwww.linkedin.com\u002Fpulse\u002Fexplainable-ai-interactivity-hci-erik-stolterman-bergqvist\u002F); Erik Stolterman Bergqvist; Considers several angles: developing AI systems that can technically explain their inner workings in some way that makes sense to people; approaching XAI from a legal point of view, where explainable AI is needed for practical reasons; and approaching the topic from a more philosophical perspective, asking some broader questions about how reasonable it is for humans to ask systems to be able to explain their actions.\n\n* [Why your firm must embrace explainable AI to get ahead of the hype and understand the business logic of AI](https:\u002F\u002Fwww.hfsresearch.com\u002Fpointsofview\u002Fescape-the-black-box-take-steps-toward-explainable-ai-today-or-risk-damaging-your-business); Maria Terekhova; If AI is to have true business-ready capabilities, it will only succeed if we can design the business logic behind it. That means business leaders who are steeped in business logic need to be front-and-center in the AI design and management processes.\n\n* [Explainable AI: The margins of accountability](https:\u002F\u002Fwww.information-age.com\u002Fexplainable-ai-123476397\u002F); Yaroslav Kuflinski; How much can anyone trust a recommendation from an AI? Increasing the adoption of ethics in artificial intelligence.\n\n### 2017\n\n* [Sent to Prison by a Software Program’s Secret Algorithms](https:\u002F\u002Fwww.nytimes.com\u002F2017\u002F05\u002F01\u002Fus\u002Fpolitics\u002Fsent-to-prison-by-a-software-programs-secret-algorithms.html); Adam Liptak, The New York Times; The report in Mr. Loomis’s case was produced by a product called Compas, sold by Northpointe Inc. It included a series of bar charts that assessed the risk that Mr. Loomis would commit more crimes. The Compas report, a prosecutor told the trial judge, showed “a high risk of violence, high risk of recidivism, high pretrial risk.” The judge agreed, telling Mr. Loomis that “you’re identified, through the Compas assessment, as an individual who is a high risk to the community.”\n* [AI Could Resurrect a Racist Housing Policy](https:\u002F\u002Fmotherboard.vice.com\u002Fen_us\u002Farticle\u002F4x44dp\u002Fai-could-resurrect-a-racist-housing-policy) And why we need transparency to stop it. \"The fact that we can't investigate the COMPAS algorithm is a problem.\"\n\n### 2016\n\n* [How We Analyzed the COMPAS Recidivism Algorithm](https:\u002F\u002Fwww.propublica.org\u002Farticle\u002Fhow-we-analyzed-the-compas-recidivism-algorithm); ProPublica investigation. Black defendants were often predicted to be at a higher risk of recidivism than they actually were. Our analysis found that black defendants who did not recidivate over a two-year period were nearly twice as likely to be misclassified as higher risk compared to their white counterparts (45 percent vs. 23 percent). 
The analysis also showed that even when controlling for prior crimes, future recidivism, age, and gender, black defendants were 45 percent more likely to be assigned higher risk scores than white defendants.\n\n## Theses\n\n### 2018\n\n* [Shedding Light on Black Box Machine Learning Algorithms, Development of an Axiomatic Framework to Assess the Quality of Methods that Explain Individual Predictions](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1808.05054.pdf); Milo Honegger\n\n### 2016\n\n* [Uncertainty and Label Noise in Machine Learning](https:\u002F\u002Fdial.uclouvain.be\u002Fpr\u002Fboreal\u002Fobject\u002Fboreal:134618\u002Fdatastream\u002FPDF_01\u002Fview); Benoit Frenay; This thesis addresses three challenges of machine learning: high-dimensional data, label noise and limited computational resources.\n\n## Audio\n\n### 2018\n\n* [Explaining Explainable AI](https:\u002F\u002Fwww.brighttalk.com\u002Fwebcast\u002F16463\u002F346891\u002Fexplaining-explainable-ai); In this webinar, we will conduct a panel discussion with Patrick Hall and Tom Aliff around the business requirements of explainable AI and the subsequent value that can benefit any organization.\n* [Approaches to Fairness in Machine Learning with Richard Zemel](https:\u002F\u002Ftwimlai.com\u002Ftwiml-talk-209-approaches-to-fairness-in-machine-learning-with-richard-zemel\u002F); Today we continue our exploration of Trust in AI with this interview with Richard Zemel, Professor in the department of Computer Science at the University of Toronto and Research Director at Vector Institute.\n\n* [Making Algorithms Trustworthy with David Spiegelhalter](https:\u002F\u002Ftwimlai.com\u002Ftwiml-talk-212-making-algorithms-trustworthy-with-david-speigelhalter\u002F); In this, the second episode of our NeurIPS series, we’re joined by David Spiegelhalter, Chair of the Winton Center for Risk and Evidence Communication at Cambridge University and President of the Royal Statistical Society.\n\n## Workshops\n\n### 2018\n\n* [2nd Workshop on Explainable Artificial Intelligence](https:\u002F\u002Fcris.vub.be\u002Ffiles\u002F38962039\u002Fproceedings_XAI_2018.pdf); David W. Aha, Trevor Darrell, Patrick Doherty and Daniele Magazzeni\n* [Explainable AI](http:\u002F\u002Fcdn.bdigital.org\u002FPDF\u002FBDC18\u002FBDC18_ExplainableAI.pdf); Ricardo Baeza-Yates; Big Data Congress 2018\n* [Trust and explainability: The relationship between humans & AI](http:\u002F\u002Fwww.imm.dtu.dk\u002F~tobo\u002FAI_chora2.pdf); Thomas Bolander; The measure of success for AI applications is the value they create for human lives. In that light, they should be designed to enable people to understand AI systems successfully, participate in their use, and build their trust. AI technologies already pervade our lives. As they become a central force in society, the field is shifting from simply building systems that are intelligent to building intelligent systems that are human-aware and trustworthy.\n* [21 fairness definitions and their politics](https:\u002F\u002Ffairmlbook.org\u002Ftutorial2.html); This tutorial has two goals. The first is to explain the technical definitions. In doing so, I will aim to make explicit the values embedded in each of them. This will help policymakers and others better understand what is truly at stake in debates about fairness criteria (such as individual fairness versus group fairness, or statistical parity versus error-rate equality). 
It will also help computer scientists recognize that the proliferation of definitions is to be celebrated, not shunned, and that the search for one true definition is not a fruitful direction, as technical considerations cannot adjudicate moral debates.\n* [Proceedings of the 2018 ICML Workshop on Human Interpretability in Machine Learning (WHI 2018)](https:\u002F\u002Farxiv.org\u002Fhtml\u002F1807.01308)\n\n### 2017\n\n* [NIPS 2017 Tutorial on Fairness in Machine Learning](https:\u002F\u002Ffairmlbook.org\u002Ftutorial1.html); Solon Barocas, Moritz Hardt\n* [Interpretability for AI safety](http:\u002F\u002Fs.interpretable.ml\u002Fnips_interpretable_ml_2017_victoria_Krakovna.pdf); Victoria Krakovna; Long-term AI safety, reliably specifying human preferences and values to advanced AI systems, setting incentives for AI systems that are aligned with these preferences\n* [Debugging machine-learning](https:\u002F\u002Fwww.slideshare.net\u002Flopusz\u002Fdebugging-machinelearning); Michał Łopuszyński; Model introspection: you can answer the why question only for very simple models (e.g., linear models, basic decision trees). Sometimes it is instructive to run such a simple model on your dataset, even though it does not provide top-level performance. You can boost your simple model by feeding it with more advanced (non-linearly transformed) features.\n\n## Other\n\n* All about explainable AI, algorithmic fairness and more by Andrey Sharapov [https:\u002F\u002Fgithub.com\u002Fandreysharapov\u002Fxaience](https:\u002F\u002Fgithub.com\u002Fandreysharapov\u002Fxaience)\n* FAT ML [Fairness, Accountability, and Transparency in Machine Learning](http:\u002F\u002Fwww.fatml.org\u002F)\n* UW Interactive Data Lab [IDL](https:\u002F\u002Fidl.cs.washington.edu\u002Fpapers\u002F)\n* CS 294: Fairness in Machine Learning [Fairness Berkeley](https:\u002F\u002Ffairmlclass.github.io\u002F)\n* [Machine Learning Fairness by Google](https:\u002F\u002Fdevelopers.google.com\u002Fmachine-learning\u002Ffairness-overview\u002F)\n* [Awesome Interpretable Machine Learning](https:\u002F\u002Fgithub.com\u002Flopusz\u002Fawesome-interpretable-machine-learning) by Michał Łopuszyński\n* [Explainable AI: Expanding the frontiers of artificial intelligence](https:\u002F\u002Fwww.linkedin.com\u002Flearning\u002Flearning-xai-explainable-artificial-intelligence\u002Fexplainable-ai-expanding-the-frontiers-of-artificial-intelligence)\n* [Google - Explainable AI](https:\u002F\u002Fcloud.google.com\u002Fexplainable-ai\u002F) - Tools and frameworks to deploy interpretable and inclusive machine learning models.\n* [Google Explainability whitepaper](https:\u002F\u002Fstorage.googleapis.com\u002Fcloud-ai-whitepapers\u002FAI%20Explainability%20Whitepaper.pdf)\n","# 与可解释人工智能（XAI）相关的有趣资源\n\n* [科学期刊中的论文和预印本](README.md#papers)\n* [书籍和其他较长的资料](README.md#books)\n* [软件工具](README.md#tools)\n* [报纸上的短篇文章](README.md#articles)\n* [其他](README.md#theses)\n\n## 论文\n\n### 2021年\n\n* 
[在多模态医学影像任务上评估可解释人工智能：现有算法能否满足临床需求？](https:\u002F\u002Fwww2.cs.sfu.ca\u002F~hamarneh\u002Fecopy\u002Faaai2022.pdf)。魏娜·金、李晓晓、加桑·哈马尔内。能够向临床终端用户解释预测结果，是充分利用人工智能（AI）模型进行临床决策支持的必要条件。对于医学影像而言，特征归因图或热图是最常见的解释形式，它会突出显示对AI模型预测至关重要的特征。然而，目前尚不清楚热图在解释多模态医学影像决策方面的表现如何——在这种情况下，每种模态\u002F通道都承载着同一生物医学现象的不同临床意义。理解这些依赖于模态的特征对于临床用户解读AI决策至关重要。为了解决这一具有重要临床意义但技术上却被忽视的问题，我们提出了模态特异性特征重要性（MSFI）指标。该指标编码了关于模态优先级和模态特异性特征定位的临床要求。我们基于MSFI、其他非模态特异性的指标以及临床医生用户研究，对16种常用的XAI算法进行了以临床需求为导向的系统性评估。结果表明，大多数现有的XAI算法无法充分突出模态特异性的重要特征来满足临床需求。评估结果和MSFI指标可以指导XAI算法的设计和选择，以满足临床医生对多模态解释的要求。\n\n![EvaluatingExplainableAI](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_f41215d95ba2.png)\n\n\n* [我该如何选择解释器？基于应用的后验解释评估](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002F10.1145\u002F3442188.3445941)。近年来，许多研究工作提出了新的可解释人工智能（XAI）方法，旨在生成具有特定属性或期望特征的模型解释，例如保真度、鲁棒性或人类可解释性。然而，这些解释很少根据其对决策任务的实际影响进行评估。如果没有这样的评估，所选择的解释可能会实际上损害ML模型与最终用户组成的整体系统的性能。本研究旨在通过提出XAI测试这一面向应用的评估方法来填补这一空白，该方法专门用于隔离向最终用户提供不同程度信息的影响。我们按照XAI测试的方法，对三种流行的后验解释方法——LIME、SHAP和TreeInterpreter——在一项真实的欺诈检测任务中进行了评估，使用真实数据、已部署的ML模型以及欺诈分析师。实验过程中，我们逐步增加了提供给欺诈分析师的信息，分为三个阶段：仅数据，即仅有交易数据，不访问模型分数或解释；数据+ML模型分数；以及数据+ML模型分数+解释。通过强有力的统计分析，我们表明，总体而言，这些流行的解释器的效果并不理想。结论要点包括：i) 仅提供数据时，在所有测试变体中，决策准确率最高且决策时间最长；ii) 所有解释器都能提高“数据+ML模型分数”变体的准确率，但与“仅数据”相比，准确率仍然较低；iii) LIME最受用户青睐，这可能是因为其在不同案例之间的解释差异性显著较低。\n\n![choose_explainer](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_6b7fa7fb4bbf.png)\n\n\n* [理由、价值、利益相关者：可解释人工智能的哲学框架](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002Fpdf\u002F10.1145\u002F3442188.3445866)。在福利分配和刑事司法等重大决策中使用不透明的人工智能系统所带来的社会和伦理影响，已经在计算机科学家、伦理学家、社会科学家、政策制定者和最终用户等多个利益相关者之间引发了热烈的讨论。然而，由于缺乏共同的语言或一个多维框架来恰当地连接这场辩论的技术、认识论和规范性方面，讨论几乎无法达到应有的成效。本文借鉴了关于解释的本质和价值的哲学文献，提出了一种多方面的框架，通过识别与人工智能预测最相关的解释类型、承认社会和伦理价值在评估这些解释中的相关性和重要性，以及展示这些解释对于采用多样化方法改进真实算法生态系统设计的重要性，从而为当前的辩论带来了更多的概念精确性。因此，所提出的哲学框架为建立人工智能系统的技术和伦理层面之间的相关性奠定了基础。\n\n![RVSFramework](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_d2923e724fbf.png)\n\n* [人工智能信任的正式化：人类对AI信任的前提、原因和目标](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002Fpdf\u002F10.1145\u002F3442188.3445923)。\n信任是人与AI互动的核心组成部分，因为“不正确”的信任水平可能导致技术的误用、滥用或弃用。那么，究竟什么是AI中的信任本质呢？信任的认知机制有哪些前提和目标？我们又该如何促进这些前提和目标，或者评估它们是否在特定的互动中得到满足呢？本研究旨在回答这些问题。我们讨论了一个受人际信任（即社会学家定义的人与人之间的信任）启发但并不完全相同的信任模型。该模型基于两个关键属性：用户的脆弱性；以及预测AI模型决策影响的能力。我们引入了“契约式信任”的正式化概念，即用户与AI模型之间的信任是基于某种隐含或显式的契约将被遵守的信任；同时还引入了“可信度”的正式化概念（这与社会学中的“可信度”概念有所区别），并由此引出了“合理信任”和“不合理信任”的概念。我们指出，合理信任的可能原因包括内在推理和外在行为，并讨论了如何设计可信的AI、如何评估信任是否已经显现以及这种信任是否合理。最后，我们利用我们的正式化框架阐明了信任与XAI之间的联系。\n\n![FormalizingTrust](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_f47b239a8db3.png)\n\n* [机器学习理解中贡献值图的比较评估](https:\u002F\u002Flink.springer.com\u002Farticle\u002F10.1007\u002Fs12650-021-00776-w) 可解释人工智能领域旨在帮助专家理解复杂的机器学习模型。其中一种关键方法是展示某个特征对模型预测的影响，从而协助专家验证和确认模型的预测结果。然而，这一领域仍面临诸多挑战。例如，由于可解释性的主观性，诸如“特征贡献”之类的概念至今尚未有明确的定义。不同的技术往往基于不同的假设，这可能导致不一致甚至相互矛盾的观点。在本研究中，我们提出了一种新颖的方法——局部与全局贡献值图，用于可视化特征对预测的影响及其与特征取值之间的关系。我们讨论了设计决策，并展示了一个示例性的视觉分析实现，该实现为理解模型提供了新的洞见。通过用户研究，我们发现这些可视化工具能够提高解释的准确性和可信度，同时缩短获取洞察所需的时间，从而有效辅助模型解释。[[网站]](https:\u002F\u002Fexplaining.ml\u002Fcvplots)\n\n![CVPlots2021](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_4755a3c97b5a.jpg)\n\n\n\n### 2020年\n\n* 
[用于基准测试机器学习方法的性能-可解释性框架：应用于多变量时间序列分类器](https:\u002F\u002Farxiv.org\u002Fabs\u002F2005.14501)；凯文·福韦尔、维罗妮克·马松、埃莉莎·弗龙东；我们的研究旨在提出一种新的性能-可解释性分析框架，以评估和基准测试机器学习方法。该框架详细定义了一组特性，用以具体化对现有机器学习方法的性能与可解释性评估。为了说明该框架的应用，我们将其用于基准测试当前最先进的多变量时间序列分类器。\n\n![MultivariateTimeSeriesClassifiers](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_50e61ed1ab50.png)\n\n* [EXPLAN：利用自适应邻域生成解释黑盒分类器](https:\u002F\u002Fieeexplore.ieee.org\u002Fdocument\u002F9206710)；佩曼·拉苏利和英姬·齐赫·于；在基于扰动的解释方法中，如何定义具有代表性的局部区域是一个亟待解决的问题，它直接影响解释的忠实性和合理性。为此，我们提出了一种稳健且直观的方法——利用自适应邻域生成来解释黑盒分类器（EXPLAN）。EXPLAN是一种模块化的算法，包括密集数据生成、代表性数据选择、数据平衡以及基于规则的可解释模型构建。该方法会综合考虑从黑盒决策函数中提取的邻近信息以及数据本身的结构，从而为待解释的实例创建一个具有代表性的邻域。作为一种局部的、模型无关的解释方法，EXPLAN以逻辑规则的形式生成解释，这些规则高度可解释，非常适合用于定性分析模型的行为。我们探讨了忠实性与可解释性之间的权衡，并通过与当前最先进的解释方法LIME、LORE和Anchor的全面对比，展示了所提算法的性能。在真实数据集上进行的实验表明，我们的方法在解释的忠实性、精确性和稳定性方面均取得了扎实的实证结果。[[论文]](https:\u002F\u002Fieeexplore.ieee.org\u002Fdocument\u002F9206710) [[GitHub]](https:\u002F\u002Fgithub.com\u002Fpeymanras\u002FEXPLAN)\n\n* [GRACE：生成简洁而富有信息量的对比样本以解释神经网络模型的预测](https:\u002F\u002Farxiv.org\u002Fabs\u002F1911.02042)；泰·勒、王苏航、李东元；尽管近年来针对图像和文本数据的可解释人工智能\u002F机器学习领域取得了显著进展，但目前大多数解决方案并不适用于解释那些数据集为表格形式、特征以高维向量化格式呈现的神经网络模型的预测。为缓解这一局限性，我们借鉴了两个重要理念——即来自因果关系的“干预式解释”和来自哲学的“解释具有对比性”——并提出了一种名为GRACE的新方案，以更好地解释针对表格型数据集的神经网络模型预测。具体而言，给定模型预测标签为X时，GRACE会通过干预生成一个经过最小修改的对比样本，使其被分类为Y，并附带直观的文字说明，回答“为何是X而非Y？”这一问题。我们使用十一组不同规模和领域的公开数据集进行了全面实验（例如，特征数量范围从5到216），并在忠实性、简洁性、信息增益和影响力等多个指标上将GRACE与竞争基线进行了比较。用户研究表明，我们生成的解释不仅更加直观易懂，还能使最终用户在获得解释后做出的决策准确性比使用Lime时提高多达60%。\n\n* [ExplainExplore：机器学习解释的可视化探索](https:\u002F\u002Fresearch.tue.nl\u002Ffiles\u002F170065756\u002F09086281.pdf)；丹尼斯·科拉里斯、杰克·J·范威克；机器学习模型通常表现出复杂的行为，难以理解。近年来，可解释人工智能领域的研究已经开发出一些有前景的技术，可以通过特征贡献向量来解释此类模型的内部运作。这些向量在多种应用场景中都非常有用。然而，这一过程中涉及大量参数，而由于评估可解释性的主观性，很难确定最佳设置。为此，我们推出了ExplainExplore：一个交互式的解释系统，用于探索符合数据科学家主观偏好的解释方式。我们借助数据科学家的专业知识，找到最优的参数设置和实例扰动方案，并促进数据科学家与领域专家就模型及其解释展开讨论。我们以一个真实数据集为例，展示了该方法在机器学习解释的探索与调优方面的有效性。[[网站]](https:\u002F\u002Fexplaining.ml)\n\n![ExplainExplore2020](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_e678eff96025.png)\n\n* [FACE：可行且可操作的反事实解释](https:\u002F\u002Farxiv.org\u002Fabs\u002F1909.09369)；拉斐尔·波亚季、卡茨珀·索科尔、劳尔·桑托斯-罗德里格斯、蒂耶尔·德·比、彼得·弗拉赫；反事实解释领域的研究通常聚焦于“最接近可能世界”原则，即寻找能够导致期望结果的最小改动。然而，在本文中，我们指出尽管这一方法在直觉上颇具吸引力，但它存在当前文献尚未解决的缺陷。首先，由现有最先进系统生成的反事实示例并不一定代表底层数据分布，因此可能会提出难以实现的目标（例如，一位因严重残疾而未能通过人寿保险申请的人，可能会被建议多参加体育锻炼）。其次，这些反事实可能并未基于主体当前状态与建议状态之间的“可行路径”，从而使得可操作的补救措施变得不可行（如，技能较低且未能获得抵押贷款的申请人可能会被告知要将收入翻倍，但在不先提升自身技能的情况下，这显然难以实现）。\n\n![FACE2020](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_26eb939e021f.png)\n\n* [可解释性信息表：一种系统评估可解释性方法的框架](https:\u002F\u002Farxiv.org\u002Fabs\u002F1912.05100)；卡茨珀·索科尔、彼得·弗拉赫；机器学习中的解释形式多种多样，但关于其理想属性的共识尚未形成。在本文中，我们提出了一套分类体系和描述符，可用于从功能、运行、可用性、安全性和验证五个关键维度对可解释性系统进行系统化刻画与评估。为了设计出全面且具有代表性的分类体系及相应描述符，我们调研了可解释人工智能领域的相关文献，提取了其他研究者在其工作中明确提出或隐含使用的标准与要求。该调研涵盖了介绍新型可解释性算法的论文，以了解指导其开发的标准以及这些算法的评估方式；同时也包括从计算机科学和社会科学视角提出此类标准的文献。这一创新框架能够系统地比较和对比不同的可解释性方法，不仅有助于更好地理解它们的能力，还能识别其理论特性与实际实现之间的差异。我们进一步将该框架具体化为可解释性信息表，使研究人员和从业者都能快速把握特定可解释性方法的优势与局限。\n\n* 
[一种解释无法满足所有需求：交互式解释在提升机器学习透明度方面的潜力](https:\u002F\u002Farxiv.org\u002Fabs\u002F2001.09734)；卡茨珀·索科尔、彼得·弗拉赫；随着基于机器学习算法的预测系统在工业界日益普及，对其透明性的需求也愈发迫切。每当黑箱算法的预测影响到人类事务时，就必须审视这些算法的内部运作机制，并向相关利益方——包括系统工程师、系统操作人员以及案件当事人——解释其决策依据。尽管目前已有多种可解释性和可说明性方法可供选择，但尚无一种万能方案能够同时满足各方的不同期望与相互冲突的目标。本文旨在应对这一挑战，以对比式解释这一最先进的可解释机器学习方法为例，探讨交互式机器学习在提升黑箱系统透明度方面的潜力。具体而言，我们展示了如何通过交互调整条件语句来个性化反事实解释，并借助后续的“如果……会怎样？”问题进一步挖掘额外的解释信息。\n\n![oneXdoesnotFitAll](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_1cf1d08654c1.png)\n\n* [FAT取证工具包：用于算法公平性、问责制和透明度的Python工具箱](https:\u002F\u002Farxiv.org\u002Fabs\u002F1909.05167)；卡茨珀·索科尔、劳尔·桑托斯-罗德里格斯、彼得·弗拉赫；鉴于机器学习算法可能带来的潜在危害，预测系统的公平性、问责制和透明度显得尤为重要。近期文献曾建议对这些方面进行自愿性自我报告——例如针对数据集的数据说明书——但其覆盖范围往往仅限于机器学习流程中的某一环节，且制作过程需要大量人工投入。为打破这一僵局，确保高质量、公平、透明且可靠的机器学习系统，我们开发了一个开源工具箱，能够自动、客观地检测并报告这些系统在公平性、问责制和透明度方面的特定指标，供其工程师和用户参考。本文详细介绍了该Python工具箱的设计、适用范围及使用示例。该工具箱可对机器学习过程的各个环节——数据及其特征、模型和预测——的公平性、问责制和透明度进行检测。\n\n![FATForensics](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_861fcb880679.png)\n\n* [自适应可解释神经网络（AxNNs）](https:\u002F\u002Farxiv.org\u002Fabs\u002F2004.02353)；陈杰、乔尔·沃恩、维贾扬·奈尔、阿古斯·苏吉安托；尽管机器学习技术已在多个领域取得成功应用，但模型的黑箱特性却给结果的解释与说明带来了挑战。为此，我们提出了一种名为自适应可解释神经网络（AxNNs）的新框架，旨在同时实现优异的预测性能与模型的可解释性。在预测性能方面，我们采用两阶段流程，构建由广义加性模型网络和加性指数模型组成的结构化神经网络（通过可解释神经网络实现），可使用提升集成或堆叠集成的方法完成。而在可解释性方面，我们展示了如何将AxNN的输出分解为主效应和高阶交互效应。\n\n![AxNN](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_eb75b98c1e1a.png)\n\n* [嵌入模型中的信息泄露](https:\u002F\u002Farxiv.org\u002Fabs\u002F2004.00053)；宋 Congzheng、Ananth Raghunathan；我们证明，嵌入除了编码通用语义外，通常还会生成一个泄露输入数据敏感信息的向量。我们开发了三类攻击方法，以系统性地研究嵌入可能泄露的信息。首先，嵌入向量可以被逆向还原，从而部分恢复部分输入数据。其次，嵌入可能会揭示输入中固有的、与当前语义任务无关的敏感属性。第三，对于不常出现的训练数据样本，嵌入模型会泄露一定程度的成员身份信息。我们在文本领域的多种最先进嵌入模型上广泛评估了这些攻击方法。此外，我们还提出并评估了一些防御措施，这些措施能够在对模型效用影响较小的情况下，在一定程度上防止信息泄露。\n\n![InformationLeakage](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_4b996cea9451.png)\n\n\n* [弥合人工智能问责鸿沟：定义端到端的内部算法审计框架](https:\u002F\u002Farxiv.org\u002Fabs\u002F2001.00973)；Inioluwa Deborah Raji 等人。人们对人工智能系统社会影响日益增长的担忧，催生了一波学术和新闻报道，其中部署的系统由算法部署组织外部的调查人员进行危害性审计。然而，从业者在系统部署前识别其潜在危害仍然面临挑战，而一旦系统上线，新出现的问题往往难以甚至无法追溯到其根源。在本文中，我们提出了一种端到端支持人工智能系统开发的算法审计框架，可在组织内部开发生命周期的各个阶段应用。审计的每个阶段都会产生一组文档，这些文档共同构成一份整体审计报告，该报告基于组织的价值观或原则来评估整个过程中所做决策的合理性。\n\n![AlgorithmicAuditing](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_f20190352c48.png)\n\n* [解释解释器：LIME 的首次理论分析](https:\u002F\u002Farxiv.org\u002Fabs\u002F2001.03447)；Damien Garreau、Ulrike von Luxburg；机器学习越来越多地被应用于敏感领域，有时甚至取代人类参与关键决策过程。因此，这些算法的可解释性变得尤为迫切。目前广受欢迎的可解释性算法之一是 LIME（局部可解释的模型无关解释）。在本文中，我们首次对 LIME 进行了理论分析。当待解释函数为线性时，我们推导出了可解释模型系数的闭式表达式。好消息是，这些系数与待解释函数的梯度成正比：LIME 确实能够发现有意义的特征。然而，我们的分析也表明，参数选择不当可能导致 LIME 错过重要特征。\n\n![extLIME](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_036499b457ed.png)\n\n\n\n### 2019年\n\n* [bLIMEy：超越 LIME 的代理预测解释？](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1910.13016.pdf)；Kacper Sokol、Alexander Hepburn、Raul Santos-Rodriguez、Peter Flach。黑箱机器学习预测的代理解释器在可解释人工智能领域具有至关重要的地位，因为它们可以应用于任何类型的数据（图像、文本和表格数据），且与具体模型无关，属于事后解释方法。然而，局部可解释的模型无关解释（LIME）算法常常被误认为是代理解释器这一更广泛框架的代表，这可能导致人们误以为它是解决代理可解释性的唯一方案。在本文中，我们通过提出一种构建自定义局部黑箱模型预测代理解释器的原则性算法框架——包括 LIME 本身——鼓励社区“自己动手构建 LIME”（bLIMEy）。为此，我们展示了如何将代理解释器家族分解为算法上独立且可互操作的模块，并以 LIME 为例，探讨了这些组件选择对最终解释器功能特性的影响。\n\n![bLIMEy](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_9132ffb20851.png)\n\n* 
[十六个注意力头真的比一个更好吗？](https:\u002F\u002Farxiv.org\u002Fabs\u002F1905.10650)；Paul Michel、Omer Levy、Graham Neubig。注意力机制是一种强大而普遍存在的技术，它允许神经网络在做出预测时通过对信息加权平均来聚焦于特定的关键信息。尤其是多头注意力机制，已成为许多最新 NLP 模型的核心驱动力，例如基于 Transformer 的机器翻译模型和 BERT。在本文中，我们发现了一个令人惊讶的现象：即使模型在训练时使用了多个注意力头，但在实际测试时，很大一部分注意力头都可以被移除，而不会显著影响性能。事实上，某些层甚至可以简化为单个注意力头。我们进一步研究了用于剪枝模型的贪心算法，以及由此可能带来的速度、内存效率和准确率提升。\n\n![DoWeNeed16Heads](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_20be89a3848e.png)\n\n\n* [揭秘 BERT 的黑暗秘密](https:\u002F\u002Farxiv.org\u002Fabs\u002F1908.08593)；Olga Kovaleva、Alexey Romanov、Anna Rogers、Anna Rumshisky。基于 BERT 的架构目前在许多 NLP 任务中表现出最先进的性能，但关于其成功背后的精确机制却知之甚少。在本工作中，我们专注于对自注意力机制的解读，这是 BERT 的基础组成部分之一。我们利用 GLUE 数据集的部分任务和一组精心设计的关注特征，提出了一套方法论，并对 BERT 各个注意力头所编码的信息进行了定性和定量分析。我们的研究结果表明，不同注意力头之间存在有限的一组重复出现的注意力模式，这暗示了模型的整体过度参数化。尽管不同的注意力头持续使用相同的注意力模式，它们对不同任务的性能影响却各不相同。我们发现，手动禁用某些注意力头反而能使微调后的 BERT 模型性能得到提升。\n\n![DarkSecrets](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_d192ef9cbbd6.png)\n\n* [人工智能中的解释：来自社会科学的洞见](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1706.07269.pdf)；蒂姆·米勒。近年来，随着研究人员和从业者致力于使算法更加易于理解，可解释人工智能领域迎来了新的发展热潮。这一领域的许多研究都集中在向人类观察者明确解释决策或行为上，而认为从人类彼此之间的解释方式中汲取灵感，可以为人工智能的解释提供有益的起点，这一点并不令人争议。然而，也必须承认，目前大多数可解释人工智能的研究仅仅依赖于研究者对“良好”解释的直觉判断。事实上，在哲学、心理学和认知科学中，已有大量且极具价值的研究探讨了人们如何定义、生成、选择、评估并呈现解释，并指出人们在解释过程中会受到特定认知偏差和社会期望的影响。本文主张，可解释人工智能领域应当建立在这些既有研究的基础上，并综述了来自哲学、认知心理学\u002F认知科学以及社会心理学的相关文献，这些文献正是围绕上述主题展开的。\n\n![SocialSciences4XAI](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_2c899036fb3e.png)\n\n![SocialSciences4XAI2](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_e78ec9aebd0a.png)\n\n* [AnchorViz：促进交互式机器学习中的语义数据探索与概念发现](https:\u002F\u002Fwww.microsoft.com\u002Fen-us\u002Fresearch\u002Fpublication\u002Fanchorviz-facilitating-semantic-data-exploration-and-concept-discovery-for-interactive-machine-learning\u002F)；Jina Suh 等人。在构建交互式机器学习（iML）分类器时，人类关于目标类别的知识可以成为强有力的参考，从而使分类器对未见过的样本更具鲁棒性。主要挑战在于找到那些能够帮助发现或细化当前分类器尚无对应特征的概念的未标注样本（即分类器存在特征盲区）。然而，要求人类列出详尽的样本清单并不现实，尤其是对于难以回忆的稀有概念而言更是如此。本文介绍了一种名为 AnchorViz 的交互式可视化工具，它通过人类主导的语义数据探索，促进对预测错误及此前未见概念的发现。\n\n![AnchorViz](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_1c91f0644700.png)\n\n* [随机化消融特征重要性](https:\u002F\u002Farxiv.org\u002Fabs\u002F1910.00174)；卢克·梅里克。假设有模型 f，可根据输入特征向量 x=x1,x2,…,xM 预测目标 y。我们希望衡量每个特征对于模型做出准确预测能力的重要性。为此，我们考察当将每个特征从模型中隐藏或消融时，某种用于衡量预测优劣的指标（我们称之为“损失”）平均会发生怎样的变化。所谓消融特征，即用另一个可能的值随机替换该特征的原始值。通过对多个数据点及多种可能的替换进行平均，我们可以量化某一特征对模型预测准确性的影响程度。此外，我们还提出了统计上的不确定性度量，以说明我们基于有限的数据集和有限次消融实验所测得的特征重要性，究竟在多大程度上接近其理论上的真实值。\n\n* [树模型的可解释人工智能：从局部解释到全局理解](https:\u002F\u002Farxiv.org\u002Fabs\u002F1905.04610)；斯科特·M·伦德伯格、加布里埃尔·埃里昂、休·陈、亚历克斯·德格雷夫、乔丹·M·普鲁特金、巴拉·奈尔、罗尼特·卡茨、乔纳森·希梅尔法布、尼莎·班萨尔、苏-因·李。基于树的机器学习模型，如随机森林、决策树和梯度提升树，是当今实践中最为流行的非线性预测模型，但对其预测结果的解释却一直关注较少。在此，我们通过三项主要贡献显著提升了树模型的可解释性：1) 首个基于博弈论的多项式时间最优解释算法；2) 一种直接衡量局部特征交互效应的新类型解释；3) 一套结合每条预测的局部解释来理解全局模型结构的新工具。我们将这些工具应用于三个医学机器学习问题，证明通过整合大量高质量的局部解释，既能够展现全局结构，又能保持对原始模型的忠实性。借助这些工具，我们得以：i) 在美国普通人群中识别出高风险但低频的非线性死亡危险因素；ii) 突出具有共同风险特征的不同人群子群；iii) 发现慢性肾脏病风险因素之间的非线性交互作用；iv) 监控医院中部署的机器学习模型，从而确定哪些特征正在随时间推移降低模型性能。鉴于基于树的机器学习模型的广泛应用，这些对其可解释性的改进将在广泛的领域产生深远影响。[GitHub](https:\u002F\u002Fgithub.com\u002Fsuinleelab\u002Ftreeexplainer-study)\n\n![treeeexplainerpr](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_751c05eee74a.png)\n\n* [一种解释并不适用于所有情况：人工智能可解释性技术工具包与分类体系](https:\u002F\u002Farxiv.org\u002Fabs\u002F1909.03012)；Vijay Arya、Rachel K. E. 
Bellamy、Pin-Yu Chen、Amit Dhurandhar、Michael Hind、Samuel C. Hoffman、Stephanie Houde、Q. Vera Liao、Ronny Luss、Aleksandra Mojsilović、Sami Mourad、Pablo Pedemonte、Ramya Raghavendra、John Richards、Prasanna Sattigeri、Karthikeyan Shanmugam、Moninder Singh、Kush R. Varshney、Dennis Wei、Yunfeng Zhang；\n随着人工智能和机器学习算法在社会中的应用日益广泛，来自各方利益相关者的呼声也愈发高涨——要求这些算法能够对其输出结果作出解释。与此同时，这些利益相关者，无论是受影响的普通民众、政府监管机构、领域专家，还是系统开发者，对于解释的需求各不相同。为应对这些需求，我们推出了 AI Explainability 360（此网址），这是一个开源软件工具包，内含八种多样且处于前沿的可解释性方法以及两种评估指标。同样重要的是，我们提供了一个分类体系，以帮助需要解释的主体在解释方法的广阔领域中找到方向——不仅包括工具包中的方法，也涵盖更广泛的可解释性文献。针对数据科学家及其他工具包用户，我们设计了一套可扩展的软件架构，将各种方法按照其在 AI 模型构建流程中的位置进行组织。此外，我们还探讨了如何通过简化算法、推出更易懂的版本、编写教程以及开发交互式网页演示等方式，使研究成果更加贴近解释的受众，并向不同群体及应用领域介绍人工智能的可解释性。综上所述，我们的工具包与分类体系有助于识别当前尚需补充的可解释性方法空白，并为未来新方法的引入提供平台。\n[GitHub](https:\u002F\u002Fgithub.com\u002FIBM\u002FAIX360)；[演示](http:\u002F\u002Faix360.mybluemix.net\u002Fdata)\n\n![aix360](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_8dcf5ea2ca62.png)\n\n* [LIRME：局部可解释的排序模型解释](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002F10.1145\u002F3331184.3331377)；Manisha Verma、Debasis Ganguly；信息检索（IR）模型通常会采用复杂的词权重变化来计算查询-文档对的综合相似度得分。如果将 IR 模型视为黑箱，则很难理解或解释为何某些文档会在特定查询下被检索到前列。局部解释模型已成为理解分类模型单个预测的一种流行方式。然而，目前尚无系统性的研究来探索如何解释 IR 模型，而这正是本文的核心贡献。我们探讨了三种采样方法来训练解释模型，并提出了两种指标用于评估针对 IR 模型生成的解释。实验结果揭示了一些有趣的发现：一是样本多样性对训练局部解释模型至关重要；二是模型的稳定性与其用于解释该模型的参数数量呈反比关系。\n\n* [借助“幽灵变量”理解复杂预测模型](https:\u002F\u002Farxiv.org\u002Fabs\u002F1912.06407)；Pedro Delicado、Daniel Peña；一种为复杂预测模型中的每个解释变量分配相关性度量的方法。我们假设已有一个训练集用于拟合模型，以及一个测试集用于检验模型的泛化性能。首先，通过比较包含所有变量的模型与仅用“幽灵变量”替代目标变量的另一模型在测试集上的预测结果，计算出每个变量的单独相关性。其次，利用由各个变量单独效应向量构成的相关性矩阵的特征值，检查变量之间的联合效应。研究表明，在线性或加性等简单模型中，所提出的度量与变量的标准显著性度量相关；而在神经网络模型（以及其他算法类预测模型）中，该方法能够提供其他常规方法难以获得的关于变量联合与单独效应的信息。\n\n![ghostVariables](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_2044715397d1.png)\n\n* [揭秘“聪明汉斯”式预测器并评估机器究竟学到了什么](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41467-019-08987-4)；Sebastian Lapuschkin、Stephan Wäldchen、Alexander Binder、Grégoire Montavon、Wojciech Samek、Klaus-Robert Müller；当前的学习型机器已经成功解决了许多高难度的实际问题，达到了极高的准确率，并展现出看似智能的行为。在此，我们运用最新的解释先进学习机器决策的技术，分析了来自计算机视觉和街机游戏领域的多项任务。这展示了从天真短视到深谋远虑、从简单直接到策略周密等多种问题解决行为模式。我们观察到，标准的性能评估指标往往无法区分这些多样的问题解决行为。此外，我们提出了一种半自动化的光谱相关性分析方法，能够切实有效地刻画和验证非线性学习机器的行为。这有助于评估所学习的模型是否确实能可靠地解决其设计初衷所针对的问题。同时，我们的工作旨在为当前关于机器智能的热烈讨论增添一份审慎的声音，并承诺以更为细致入微的方式评估和评判近期取得的一些成功成果。\n\n![spray](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_8364bc190380.png)\n\n* [预测解释中的特征影响力](https:\u002F\u002Fwww.researchgate.net\u002Fpublication\u002F335189270_Feature_Impact_for_Prediction_Explanation)；Mohammad Bataineh；全球各地的企业都在积极采用复杂的机器学习（ML）技术，构建先进的预测模型，以优化运营和服务，并辅助决策。尽管这些 ML 技术功能强大，在多个行业中都取得了显著成效，但许多行业普遍反馈的一个问题是：这些技术往往过于“黑箱”，缺乏对特定预测概率是如何得出的细节说明。本文提出了一种创新算法，通过按各特征对模型预测的贡献程度排列清单，从而弥补这一不足。这种名为“预测解释中的特征影响力”（FIPE）的新算法，结合了单个特征的变化及其相互关联，计算出每个特征对预测的影响。FIPE 的真正优势在于其高效的计算能力，无论使用何种基础 ML 技术，都能快速给出特征影响力的结果。\n\n![FIPE](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_6856590a0323.png)\n\n* [相对归因传播：解读深度神经网络中各单元的相对贡献](https:\u002F\u002Farxiv.org\u002Fabs\u002F1904.00605)；Nam Woo-Jeoung、Shir Gur、Choi Jaesik、Wolf Lior、Lee Seong-Whan；随着深度神经网络（DNN）在多个领域展现出超越人类的表现，人们对理解DNN复杂内部机制的兴趣日益浓厚。本文提出了一种名为相对归因传播（RAP）的方法，该方法从层间相对影响的角度出发，将DNN的输出预测分解为相关（正向）和不相关（负向）归因两部分。每个神经元的相关性根据其贡献程度被划分为正向与负向，并同时遵循守恒原则。通过考虑神经元按相对优先级分配的相关性，RAP能够为每个神经元赋予一个与输出相关的双极重要性评分——从高度相关到高度不相关。因此，我们的方法相比传统解释方法，能够以更加清晰、细致的可视化方式呈现分离后的归因结果，从而更好地解释DNN的行为。为了验证RAP传播的归因是否准确反映了每种含义，我们采用了以下评估指标：(i) 外部-内部相关性比例，(ii) 分割mIOU，以及(iii) 
区域扰动。在所有实验和指标中，我们的方法均显著优于现有文献。\n\n![Relative_Attributing_Propagation](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_7cf15fffab41.png)\n\n* [守门人问题：远程可解释性的挑战](https:\u002F\u002Farxiv.org\u002Fabs\u002F1910.01432v1)；Le Merrer Erwan、Tredan Gilles；可解释性的概念旨在满足社会对机器学习决策透明度的需求。其核心思想很简单：就像人类一样，算法也应当能够解释其决策背后的逻辑，以便对其公平性进行评估。然而，尽管这种思路在本地场景下颇具前景（例如在训练过程中调试模型时对其进行解释），但我们认为，这一逻辑无法简单地迁移到远程场景中——即由服务提供商训练好的模型仅能通过API访问的情况。这恰恰构成了从社会视角来看最需要透明度的应用场景，因而显得尤为棘手。通过类比于夜店守门员（可能在拒绝顾客入场时给出不实的解释），我们证明了提供解释并不能阻止远程服务就其决策的真实原因撒谎。更具体地说，我们通过构造一种针对解释的攻击，证明了对于单一解释而言，远程可解释性是不可能实现的；该攻击会向查询用户隐藏具有歧视性的特征。我们还提供了这一攻击的具体实现示例。随后，我们进一步指出，在实际应用中，观察者利用多份解释来寻找不一致之处从而发现此类攻击的概率非常低。这从根本上削弱了远程可解释性这一概念本身。\n\n![https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_50e90ef5a10d.png](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_50e90ef5a10d.png)\n\n* [通过影响力函数理解黑盒预测](https:\u002F\u002Farxiv.org\u002Fabs\u002F1703.04730)；Koh Pang Wei、Liang Percy；我们如何解释黑盒模型的预测呢？在本文中，我们使用来自稳健统计学的经典技术——影响力函数——沿着学习算法的路径追溯至训练数据，从而识别出对特定预测贡献最大的训练样本点。为了将影响力函数扩展到现代机器学习环境中，我们开发了一种简单高效的实现方案，仅需梯度和Hessian向量乘积的oracle访问权限即可。我们证明，即使在理论失效的非凸、不可微分模型上，对影响力函数的近似计算仍能提供有价值的信息。在线性模型和卷积神经网络中，我们展示了影响力函数在多种用途上的价值：理解模型行为、调试模型、检测数据集错误，甚至用于创建视觉上难以区分的训练集投毒攻击。\n\n![https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_85feb97fb7ff.png](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_85feb97fb7ff.png)\n\n* [迈向XAI：构建解释流程体系](https:\u002F\u002Fwww.researchgate.net\u002Fprofile\u002FMennatallah_El-Assady\u002Fpublication\u002F332802468_Towards_XAI_Structuring_the_Processes_of_Explanations\u002Flinks\u002F5ccad56b92851c8d22146613\u002FTowards-XAI-Structuring-the-Processes-of-Explanations.pdf)；El-Assady Mennatallah等；可解释的人工智能描述了一种揭示操作逻辑传播过程的方法，这些操作将给定输入转化为特定输出。在本文中，我们基于教育学、故事讲述、论证、编程、信任建立和游戏化六个研究领域的相关因素，探讨了解释流程的设计空间。我们提出了一种概念模型，用以描述解释流程的基本构成模块，其中包括对解释与验证阶段、路径、媒介和策略的全面概述。此外，我们还强调了研究有效可解释机器学习方法的重要性，并讨论了当前开放的研究挑战与机遇。\n\n\u003Ccenter>\u003Cimg width=\"500px\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_d1c77c96e0f7.png\">\u003C\u002Fcenter>\n\n* [迈向自动化机器学习：AutoML方法与工具的评估与比较](https:\u002F\u002Farxiv.org\u002Fabs\u002F1908.05557)；Truong Anh、Walters Austin、Goodsitt Jeremy、Hines Keegan、Bruss C. 
Bayan、Farivar Reza；近年来，机器学习（ML）在工业领域的应用迅速增长，相关兴趣也持续升温。因此，ML工程师在整个行业中需求旺盛，但如何提升ML工程师的工作效率仍然是一个根本性挑战。自动化机器学习（AutoML）应运而生，旨在节省ML流水线中重复性任务的时间与精力，例如数据预处理、特征工程、模型选择、超参数优化以及预测结果分析等。在本文中，我们调研了当前用于自动化这些任务的AutoML工具现状，针对多个数据集和不同数据子集进行了多项评估，以考察其性能，并比较它们在不同测试案例下的优缺点。\n\n\u003Ccenter>\u003Cimg width=\"500px\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_d1c77c96e0f7.png\">\u003C\u002Fcenter>\n\n* [可理解的医疗模型：预测肺炎风险与医院30天再入院](https:\u002F\u002Fwww.microsoft.com\u002Fen-us\u002Fresearch\u002Fwp-content\u002Fuploads\u002F2017\u002F06\u002FKDD2015FinalDraftIntelligibleModels4HealthCare_igt143e-caruanaA.pdf); Rich Caruana 等；在机器学习中，常常需要在准确性和可理解性之间做出权衡。更精确的模型，如提升树、随机森林和神经网络，通常难以理解；而更易理解的模型，如逻辑回归、朴素贝叶斯和单棵决策树，则往往准确度显著较低。这种权衡有时会限制那些应用于医疗等关键任务中的模型的准确性，因为在这些领域中，能够理解、验证、修改并信任所学模型至关重要。我们展示了两个案例研究，其中高性能的具有两两交互作用的广义加性模型（GA2Ms）被应用于真实的医疗问题，生成了兼具先进准确度与可理解性的模型。在肺炎风险预测案例中，该可理解模型揭示了数据中一些令人惊讶的模式，而这些模式此前曾阻碍复杂的学习模型在此领域的部署。然而，由于该模型既可理解又模块化，这些模式得以被识别并移除。在30天医院再入院案例中，我们证明了相同的方法可以扩展到包含数十万患者和数千个特征的大规模数据集上，同时保持可理解性，并提供与最佳（不可理解）机器学习方法相当的准确度。\n\n![https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_6efa5bd55b8a.png](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_6efa5bd55b8a.png)\n\n* [机器学习模型中R²的夏普利分解](https:\u002F\u002Farxiv.org\u002Fabs\u002F1908.09718); Nickalus Redell；本文介绍了一种旨在帮助机器学习从业者快速总结并传达任何黑盒机器学习预测模型中各特征总体重要性的指标。我们提出的这一指标基于经典统计学中熟悉的R²的夏普利值方差分解，是一种不依赖于具体模型的特征重要性评估方法，能够公平地将模型解释的数据变异比例分配给每个特征。该指标具有若干理想特性，包括取值范围限定在0到1之间，且各特征层面的方差分解之和等于整个模型的R²。我们的实现已在R语言的shapFlex包中提供。\n\n* [数据夏普利：机器学习中数据的公平估值](https:\u002F\u002Farxiv.org\u002Fabs\u002F1904.02868); Amirata Ghorbani, James Zou；随着数据成为推动技术和经济增长的动力，如何量化数据在算法预测与决策中的价值成为一个根本性挑战。例如，在医疗和消费市场中，有人建议应就个人所产生的数据给予补偿，但何为公平的个体数据估值仍不甚明确。在本工作中，我们针对监督式机器学习场景下数据估值的问题，提出了一套原则性的框架。给定一个基于n个数据点训练而成的预测模型，我们提出使用“数据夏普利”作为衡量每个训练数据对模型性能贡献大小的指标。数据夏普利值独特地满足了公平数据估值的若干自然属性。我们开发了蒙特卡洛方法和基于梯度的方法，以高效估算复杂学习算法（包括神经网络）在大型数据集上训练时的各数据夏普利值。除了具备公平性之外，我们在生物医学、图像及合成数据上的大量实验还表明，数据夏普利还有其他多项优势：1) 与流行的留一法或杠杆率相比，它更能揭示哪些数据对于特定学习任务更为重要；2) 夏普利值较低的数据能有效捕捉异常值和数据污染；3) 夏普利值较高的数据则提示应采集何种新数据来进一步提升模型性能。\n\n![https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_7c6c751d165e.png](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_7c6c751d165e.png)\n\n* [针对共依变量的偏依赖分层分析方法](https:\u002F\u002Farxiv.org\u002Fabs\u002F1907.06698); Terence Parr, James Wilson；模型可解释性对机器学习从业者至关重要，而解释的关键组成部分之一就是刻画响应变量与模型中任意一组特征之间的偏依赖关系。目前最常用的两种偏依赖评估策略都存在诸多严重缺陷。第一种策略是通过线性回归模型的系数来描述当某一解释变量发生单位变化时，在其他变量保持不变的情况下，响应变量会如何变化。然而，线性回归并不适用于高维（p>n）数据集，且往往不足以捕捉解释变量与响应变量之间的复杂关系。第二种策略则是使用偏依赖图（PD）和个体条件期望图（ICE），但这两种方法在处理共依变量的常见情形时会产生偏差，而且它们依赖于用户提供的拟合模型。若用户提供的模型因系统性偏差或过拟合而选择不当，则PD\u002FICE图几乎无法提供任何有用的信息。为解决这些问题，我们提出了一种名为StratPD的新方法，该方法不依赖于用户提供的拟合模型，在存在共依变量的情况下仍能给出准确结果，并且适用于高维场景。其核心思想是利用决策树将数据集划分为若干组，每组内的观测值除目标变量外均相似，因此某组内响应变量的变化很可能是由目标变量引起的。我们将StratPD应用于一系列模拟和案例研究中，证实该方法速度快、可靠且稳健，相较于现有先进技术具有明显优势。\n\n![https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_29fd04707d99.png](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_29fd04707d99.png)\n\n* [DLIME：一种用于计算机辅助诊断系统的确定性局部可解释模型无关解释方法](https:\u002F\u002Farxiv.org\u002Fabs\u002F1906.10263)；Muhammad Rehman Zafar、Naimul Mefraz 
Khan；尽管LIME及类似局部算法因其简单性而广受欢迎，但其随机扰动和特征选择方式会导致生成的解释“不稳定”，即对于同一预测，可能会产生不同的解释。这一问题极为关键，可能阻碍LIME在计算机辅助诊断（CAD）系统中的部署，因为在该领域，稳定性是赢得医疗专业人员信任的重中之重。本文提出了一种LIME的确定性版本。我们未采用随机扰动，而是利用凝聚层次聚类（HC）对训练数据进行分组，并使用K近邻算法（KNN）来选择待解释新实例的相关簇。找到相关簇后，会在该簇上训练线性模型以生成解释。针对三个不同医学数据集的实验结果表明，确定性局部可解释模型无关解释（DLIME）具有显著优势，我们通过计算多次生成解释之间的Jaccard相似度，定量评估了DLIME相较于LIME的稳定性。\n\n![https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_a9f2c7c78154.png](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_a9f2c7c78154.png)\n\n* [利用模式解释个体预测](https:\u002F\u002Fpeople.eng.unimelb.edu.au\u002Fbaileyj\u002Fpapers\u002FKAIS2019.pdf)；Yunzhe Jia、James Bailey、Kotagiri Ramamohanarao、Christopher Leckie、Xingjun Ma；用户需要理解分类器的预测，尤其是在基于这些预测做出的决策可能带来严重后果时。对某一预测的解释能够揭示分类器作出该预测的原因，从而帮助用户更自信地接受或拒绝该预测。本文提出了一种名为模式辅助局部解释（PALEX）的解释方法，旨在为任意分类器提供实例级别的解释。PALEX以分类器、测试实例以及总结分类器训练数据的频繁模式集作为输入，输出分类器认为对该实例预测至关重要的支持证据。为了研究分类器在测试实例附近的行为，PALEX将训练数据中的频繁模式集作为额外输入，用以指导在测试实例附近生成新的合成样本。此外，PALEX还利用对比模式来识别测试实例附近的局部判别性特征。PALEX在存在多种解释的情境下尤为有效。\n\n\n![https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_d73c1e951e4c.png](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_d73c1e951e4c.png)\n\n* [公平胜过哗众取宠：男人之于医生，正如女人之于医生](https:\u002F\u002Farxiv.org\u002Fabs\u002F1905.09866)；Malvina Nissim、Rik van Noord、Rob van der Goot；诸如“男人之于国王，正如女人之于X”之类的类比常被用来展示词嵌入的强大能力。然而，它们同时也揭示了人类偏见如何强烈地编码在基于自然语言构建的向量空间中。当我们发现“女王”是“男人之于国王，正如女人之于X”这一类比的答案时，往往会感到惊叹；但也有论文报告称，发现了深植人类偏见的类比，例如“男人之于计算机程序员，正如女人之于家庭主妇”，这类例子则令人担忧和愤怒。在本工作中，我们表明，嵌入空间常常未能得到公平对待，而且往往是无意之中。通过一系列简单的实验，我们指出了先前研究中存在的实际与理论问题，并证明了一些最广泛使用的有偏类比实际上并不符合数据支持。\n\n![https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_421ae2cf95ac.png](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_421ae2cf95ac.png)\n\n* [由原型引导的可解释反事实解释](https:\u002F\u002Farxiv.org\u002Fabs\u002F1907.02584)；Arnaud Van Looveren、Janis Klaise；我们提出了一种快速且模型无关的方法，利用类别原型来寻找分类器预测的可解释反事实解释。我们证明，无论是通过编码器还是通过特定于类别的k-d树获得的类别原型，都能显著加快反事实实例的搜索速度，并生成更具可解释性的解释。我们引入了两个新颖的指标，用于定量评估实例层面的局部可解释性。借助这两个指标，我们分别在图像和表格型数据集——MNIST和威斯康星州乳腺癌（诊断）数据集——上展示了我们方法的有效性。\n\n* [利用归因先验学习可解释模型](https:\u002F\u002Farxiv.org\u002Fabs\u002F1906.10670)；Gabriel Erion、Joseph D. 
Janizek、Pascal Sturmfels、Scott Lundberg、Su-In Lee；深度学习中有两个重要议题都涉及将人类因素融入建模过程：模型先验通过约束模型参数，将人类信息传递给模型；而模型归因则通过解释模型行为，将信息从模型传递给人类。我们提出将这两者结合，引入归因先验的概念，使人类能够借助归因这一通用语言，在训练过程中对模型行为施加先验期望。我们开发了一种可微分的公理化特征归因方法——期望梯度，并展示了如何在训练过程中直接对这些归因进行正则化。我们通过实证展示了归因先验的广泛应用：1) 在图像数据上，2) 在基因表达数据上，3) 在医疗健康数据集上。  \n\n![https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_52b76eef8c9b.png](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_52b76eef8c9b.png)\n\n* [可解释机器学习负责任且以人为本的使用指南](https:\u002F\u002Farxiv.org\u002Fabs\u002F1906.03533)；Patrick Hall；可解释机器学习（ML）已被应用于众多开源和专有软件包中，同时也是商业预测建模的重要组成部分。然而，可解释ML也可能被滥用，尤其可能被用作对有害黑盒模型的虚假保护措施，例如“漂绿”行为，或用于模型窃取等恶意目的。本文讨论了相关定义、示例及指导原则，旨在推动一种整体性和以人为本的机器学习方法，其中包括可解释（即白盒）模型，以及解释、调试和差异影响分析等技术。\n\n* [概念树：用于更易解释的代理决策树的变量高层次表示](https:\u002F\u002Farxiv.org\u002Fabs\u002F1906.01297)；Xavier Renard、Nicolas Woloszko、Jonathan Aigrain、Marcin Detyniecki；在高维表格型数据集上训练的黑箱预测器的可解释代理模型，在存在相关变量的情况下，往往难以生成易于理解的解释。为此，我们提出了一种模型无关的可解释代理模型，能够提供黑箱分类器的全局和局部解释。我们引入了“概念”的概念，即由领域专家定义或利用相关系数自动发现的直观变量分组。这些概念被嵌入到代理决策树中，以提升其可解释性。\n\n![https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_afba0ab6a7cb.png](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_afba0ab6a7cb.png)\n\n* [机器学习的奥秘：你在数据分析中若早知这十点，将会更加高效](https:\u002F\u002Farxiv.org\u002Fabs\u002F1906.01998v1)；Cynthia Rudin、David Carlson；尽管机器学习已在各组织中得到广泛应用，但仍有一些关键原则常常被忽视。具体而言：1）监督学习至少有四大类方法：逻辑建模方法、线性组合方法、基于案例的推理方法以及迭代归纳方法。2）对于许多应用领域，几乎所有机器学习方法的表现都相差无几（当然也有一些例外）。深度学习作为计算机视觉问题的主流技术，并未在大多数其他问题上保持优势——这背后自有其原因。3）神经网络训练难度大，且在训练过程中经常会出现一些奇怪的现象。4）如果不使用可解释模型，就可能犯下严重错误。5）解释本身也可能具有误导性，不可盲目信任。6）即使对于深度神经网络，也几乎总能找到既准确又可解释的模型。7）诸如决策能力或鲁棒性等特殊属性必须在设计时就加以构建，而不能自然形成。8）因果推断与预测不同（相关性并不等于因果关系）。9）深度神经网络架构看似复杂，其中确实存在一定的规律，但并非总是如此。10）人工智能无所不能只是一种误解。\n\n![https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_c0e7b5f18911.png](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_c0e7b5f18911.png)\n\n* [关于模型脆弱性和安全性的建议](https:\u002F\u002Fwww.oreilly.com\u002Fideas\u002Fproposals-for-model-vulnerability-and-security)；Patrick Hall；应采用公平且私密的模型、白帽与取证式的模型调试方法，以及常识性的防护措施，以保护机器学习模型免受恶意攻击。\n\n![https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_49cc095f9a50.png](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_49cc095f9a50.png)\n\n* [关于可解释机器学习的误解及更具人文关怀的机器学习](https:\u002F\u002Fgithub.com\u002Fjphall663\u002Fxai_misconceptions\u002Fblob\u002Fmaster\u002Fxai_misconceptions.pdf)；Patrick Hall；由于社区和商业领域的强烈需求，可解释机器学习（ML）方法已被广泛应用于流行的开源软件和商业软件中。然而，作为一名在过去三年中一直参与可解释ML软件开发的人，我发现目前关于这一主题的许多文献既令人困惑，又与我个人的实践经验相去甚远。本文旨在通过论点、建议和参考文献，澄清一些常见的可解释机器学习误解。\n\n![https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_c5da3cffce71.png](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_c5da3cffce71.png)\n\n* [用于模型报告的模型卡片](https:\u002F\u002Farxiv.org\u002Fabs\u002F1810.03993)；Margaret Mitchell、Simone Wu、Andrew Zaldivar、Parker Barnes、Lucy Vasserman、Ben Hutchinson、Elena Spitzer、Inioluwa Deborah Raji、Timnit 
Gebru；经过训练的机器学习模型正越来越多地被用于执法、医疗、教育和就业等领域中具有重大影响的任务。为了明确机器学习模型的预期用途，并尽量避免将其应用于不适宜的场景，我们建议在发布模型时附带一份详细说明其性能特征的文档。本文提出了一种名为“模型卡片”的框架，以促进透明的模型报告。模型卡片是随训练好的机器学习模型附带的简短文件，其中提供了在多种条件下的基准评估结果，例如针对不同文化、人口统计或表型群体（如种族、地理位置、性别、菲茨帕特里克皮肤类型）以及交叉群体（如年龄与种族、性别与菲茨帕特里克皮肤类型）的相关评估。此外，模型卡片还会披露模型的预期使用场景、性能评估的具体流程以及其他相关信息。虽然我们的重点主要放在计算机视觉和自然语言处理领域中以人为本的机器学习模型上，但该框架同样适用于任何经过训练的机器学习模型。为巩固这一理念，我们为两款监督学习模型提供了示例卡片：一款用于检测图像中的微笑面孔，另一款用于识别文本中的有毒评论。我们提议将模型卡片作为推动机器学习及相关AI技术负责任地普及化的重要一步，从而提高人们对AI技术实际效果的透明度。我们希望这项工作能够鼓励那些发布训练好的机器学习模型的机构，在每次发布时都附上类似的详细评估数据及其他相关文档。\n\n![https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_6dd6997c2d53.png](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_6dd6997c2d53.png)\n\n* [基于树的方法中特征重要性的无偏估计](https:\u002F\u002Farxiv.org\u002Fabs\u002F1903.05179)；Zhengze Zhou、Giles Hooker；我们提出了一种改进方法，用于校正随机森林及其他基于树的方法中基于分裂增益的特征重要性度量。已有研究表明，这类方法倾向于高估具有更多潜在分裂点的特征的重要性。我们证明，通过合理地利用在留出数据上计算的分裂增益，可以纠正这一偏差，从而得到更可靠的汇总结果和筛选工具。\n  \n* [请停止打乱特征：解释与替代方案](https:\u002F\u002Farxiv.org\u002Fabs\u002F1905.03151)；Giles Hooker、Lucas Mentch；本文反对使用“打乱并预测”（PaP）方法来解释黑箱模型。诸如随机森林中的特征重要性度量、部分依赖图以及个体条件期望图等方法之所以广受欢迎，是因为它们能够提供与具体模型无关的度量，仅依赖于预先训练好的模型输出。然而，大量研究发现，这些工具可能会产生极具误导性的诊断结果，尤其是在特征之间存在强相关性时。与其简单地重复已有文献中关于这些问题的论述，我们在此尝试对观察到的现象进行解释。具体而言，我们认为，在保留数据中打破特征间的依赖关系，会过度强调特征空间中的稀疏区域，迫使原始模型外推出现极少或没有数据的区域。我们通过若干已知真实情况的场景对此进行了探讨，并支持了先前文献中的观点：即使对真实模型应用打乱方法也不会出现类似现象，但PaP指标在特征重要性和部分依赖图中仍倾向于过度强调相关特征。作为替代方案，我们建议采用其他已被证明在不同场景中有效的直接方法，例如显式移除特征、条件打乱或模型蒸馏技术。\n    \n![https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_b7fc695a1c83.png](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_b7fc695a1c83.png)\n    \n* [为什么你应该信任我的解释？理解LIME预测中的不确定性](https:\u002F\u002Farxiv.org\u002Fabs\u002F1904.12991.pdf)；Hui Fen (Sarah) Tan、Kuangyan Song、Madeilene Udell、Yiming Sun、Yujia Zhang；用于解释机器学习黑箱模型的方法能够提高结果的透明度，进而帮助人们深入了解算法的可靠性和公平性。然而，这些解释本身可能包含显著的不确定性，从而削弱人们对结果的信任，并引发对模型可靠性的担忧。以“局部可解释的模型无关解释”（LIME）方法为例，我们展示了两种不确定性来源：其采样过程中的随机性，以及不同输入数据点之间解释质量的差异。即使在训练和测试准确率都很高的模型中，这种不确定性依然存在。我们分别将LIME应用于合成数据、20 Newsgroup文本分类数据集以及COMPAS再犯风险评分数据集，以佐证我们的论点。\n  \n* [Aequitas：偏见与公平性审计工具包](https:\u002F\u002Farxiv.org\u002Fabs\u002F1811.05577)；Pedro Saleiro、Benedict Kuester、Loren Hinkson、Jesse London、Abby Stevens、Ari Anisfeld、Kit T. 
Rodolfa、Rayid Ghani；近期的研究引发了人们对当前人工智能系统中潜在无意偏见的担忧，这些偏见可能基于种族、性别或宗教等特征，对个人造成不公平的影响。尽管近年来提出了许多偏见度量和公平性定义，但对于应采用哪种度量或定义尚未达成共识，且可用于实际操作的资源也极为有限。因此，尽管人们的意识有所提高，但在开发和部署人工智能系统时进行偏见与公平性审计仍未成为标准做法。我们推出了Aequitas开源工具包，它是一个直观易用的机器学习工作流补充工具，使用户能够无缝地针对多个群体子集，测试模型在多种偏见和公平性指标上的表现。Aequitas有助于数据科学家、机器学习研究人员和政策制定者做出更加知情和公平的决策，以指导算法决策系统的开发和部署。\n  \n![https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_016e18cc2424.png](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_016e18cc2424.png)\n\n* [特征重要性云：探索一组优秀模型中特征重要性的方法](https:\u002F\u002Farxiv.org\u002Fabs\u002F1901.03209)；Jiayun Dong、Cynthia Rudin；特征重要性在科学研究中占据核心地位，涵盖社会科学与因果推断、医疗健康等领域。然而，目前的特征重要性概念往往与特定的预测模型紧密相关。这存在问题：如果存在多个性能良好的预测模型，而某个特征只对其中一部分重要，对另一部分则不重要，那么仅凭单一表现良好的模型，我们便无法判断该特征是否始终对预测结果至关重要。因此，我们希望不再依赖单个预测模型的特征重要性，而是探索所有近似精度相当的预测模型中特征的重要性。本文提出了“特征重要性云”的概念，它将每个特征映射到其在所有优秀预测模型中的重要性。我们探讨了特征重要性云的性质，并将其与统计学的其他领域建立了联系。此外，我们还引入了“特征重要性图”，作为特征重要性云在二维空间中的投影，以便于可视化展示。\n  \n![https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_fe00cef53fcb.png](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_fe00cef53fcb.png)\n\n* [一项系统综述表明，对于临床预测模型而言，机器学习并不比逻辑回归更具性能优势](https:\u002F\u002Fwww.sciencedirect.com\u002Fscience\u002Farticle\u002Fpii\u002FS0895435618310813)；Evangelia Christodoulou、Jie Ma、Gary Collins、Ewout Steyerberg、Jan Yerbakela、Ben Van Calster；研究目的：本研究旨在比较文献中用于临床预测建模的逻辑回归（LR）与机器学习（ML）的性能。研究设计与设置：我们对Medline数据库进行了检索（2016年1月至2017年8月），并提取了针对二分类结局的逻辑回归模型与机器学习模型之间的比较结果。研究结果：在927项研究中，我们纳入了71项。样本量的中位数为1,250（范围72–3,994,872），平均每个模型包含19个预测变量（范围5–563），每个多重共线性变量对应8个事件（范围0.3–6,697）。最常见的机器学习方法包括分类树、随机森林、人工神经网络和支持向量机。在48项（68%）研究中，我们观察到验证程序存在潜在偏倚。64项（90%）研究使用受试者工作特征曲线下面积（AUC）来评估区分能力。而在56项（79%）研究中并未考虑校准问题。我们共识别出282组逻辑回归模型与机器学习模型的比较（AUC范围0.52–0.99）。对于145组低偏倚风险的比较，逻辑回归与机器学习之间logit(AUC)的差异为0.00（95%置信区间，−0.18至0.18）。而对于137组高偏倚风险的比较，机器学习的logit(AUC)则高出0.34（0.20–0.47）。结论：我们未发现机器学习优于逻辑回归的证据。在比较不同建模算法的研究中，仍需改进方法学与报告规范。\n\n![https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_96060aa30437.png](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_96060aa30437.png)\n\n* [iBreakDown：非可加性预测模型的解释不确定性](https:\u002F\u002Farxiv.org\u002Fabs\u002F1903.11420)；Alicja Gosiewska、Przemyslaw Biecek；可解释人工智能（XAI）近年来备受关注。解释性被视为解决人们对模型预测缺乏信任的一种途径。诸如LIME、SHAP或Break Down等模型无关工具，承诺为任何复杂的机器学习模型提供实例级别的可解释性。然而，这些解释究竟有多可靠？我们能否依赖于针对非可加性模型的可加性解释呢？在本文中，我们探讨了在存在交互作用时模型解释器的行为。我们定义了两种不确定性来源：模型层面的不确定性以及解释层面的不确定性。我们证明，引入交互作用可以降低解释层面的不确定性。同时，我们提出了一种新的方法——iBreakDown，该方法能够生成包含局部交互效应的非可加性解释。\n\n![https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_3d27e10c0441.png](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_3d27e10c0441.png)\n\n* [采样、干预、预测、聚合：一种通用的模型无关解释框架](https:\u002F\u002Farxiv.org\u002Fabs\u002F1904.03959)；Christian A. 
Scholbeck、Christoph Molnar、Christian Heumann、Bernd Bischl、Giuseppe Casalicchio；非线性机器学习模型往往以牺牲可解释性为代价，换取卓越的预测性能。然而，如今的模型无关解释技术使我们能够估算任意预测模型中各特征的影响及重要性。不同的表述方式和术语使得这些方法的理解及其相互关系变得复杂。目前尚缺乏对这些方法的统一视角。我们提出了SIPA（采样、干预、预测、聚合）这一通用的工作流程框架，用于模型无关的解释技术，并展示了如何将几种主流的特征效应分析方法嵌入到该框架中。此外，我们正式引入了“边际效应”这一概念，用以描述黑箱模型中的特征效应。进一步地，我们将该框架扩展至特征重要性的计算，指出基于方差和基于性能的重要性度量均建立在相同的工作步骤之上。这一通用框架可作为开展机器学习中模型无关解释工作的指导方针。\n\n![https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_e7f6e4e95509.png](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_e7f6e4e95509.png)\n\n* [通过功能分解量化任意机器学习模型的可解释性](https:\u002F\u002Farxiv.org\u002Fabs\u002F1904.03867)；Christoph Molnar、Giuseppe Casalicchio、Bernd Bischl；为了获得可解释的机器学习模型，人们通常会选择从一开始就构建可解释的模型——例如浅层决策树、规则列表或稀疏广义线性模型——或者采用事后解释方法——如部分依赖图或ALE图。然而，这两种方法各有其缺点。前者可能过于保守地限制假设空间，从而导致次优解；而后者在面对复杂模型尤其是涉及特征交互作用时，则可能产生过于冗长或误导性的结果。为此，我们建议通过量化机器学习模型的复杂性与可解释性之间的权衡，明确预测能力与可解释性之间的折衷。基于功能分解，我们提出了三个衡量指标：使用的特征数量、交互作用强度以及主效应的复杂程度。我们证明，对那些在这三项指标上都尽可能低的模型进行事后解释，会更加可靠且简洁。此外，我们还展示了一种多目标优化方法的应用，在该方法中同时考虑模型的预测能力和可解释性。\n\n![https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_059f816b3e00.png](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_059f816b3e00.png)\n\n* [仅用一个像素即可攻破深度神经网络的攻击方法](https:\u002F\u002Farxiv.org\u002Fabs\u002F1710.08864)；Jiawei Su、Danilo Vasconcellos Vargas、Sakurai Kouichi；近期研究表明，只需在输入向量中加入相对较小的扰动，就能轻易改变深度神经网络（DNN）的输出。在本文中，我们分析了一种极端受限场景下的攻击方式，即仅能修改图像中的一个像素。为此，我们提出了一种基于差分进化的新型单像素对抗性扰动生成方法。该方法所需对抗信息较少（属于黑盒攻击），并且由于差分进化本身的特性，能够欺骗更多类型的网络。实验结果显示，在CIFAR-10测试数据集中，68.36%的自然图像以及在ImageNet（ILSVRC 2012）验证集中的41.22%图像，仅通过修改一个像素便能被分别以73.22%和5.52%的置信度引导至至少一个目标类别。因此，所提出的攻击方法从极端受限的角度重新审视了对抗性机器学习，揭示了当前的深度神经网络同样容易受到此类低维度攻击的影响。此外，我们还说明了差分进化（或更广泛地说，进化计算）在对抗性机器学习领域的重要应用：即开发能够高效生成低成本对抗性攻击的工具，用于评估神经网络的鲁棒性。\n\n![https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_5eaaed96cbf2.png](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_5eaaed96cbf2.png)\n\n* [VINE：在黑盒模型中可视化统计交互作用](https:\u002F\u002Farxiv.org\u002Fabs\u002F1904.00561)；马修·布里顿；随着机器学习的日益普及，对预测模型的可解释性说明的需求变得十分迫切。先前的研究已经开发出有效的方法来可视化全局模型行为，并生成局部（特定实例）的解释。然而，针对区域级解释——即在复杂模型中相似实例群体的行为方式——以及相关的统计特征交互作用可视化问题，相关工作相对较少。由于缺乏满足这些分析需求的工具，阻碍了关键任务型、透明且符合社会目标的模型的开发。我们提出了VINE（Visual INteraction Effects，交互效应可视化），这是一种新颖的算法，用于提取并可视化黑盒模型中的统计交互效应。同时，我们还提出了一种新的评估指标，用于可解释机器学习领域的可视化效果。\n\n![https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_0e9c6127ab31.png](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_0e9c6127ab31.png)\n\n* [机器学习算法的临床应用：超越黑盒](https:\u002F\u002Fwww.bmj.com\u002Fcontent\u002F364\u002Fbmj.l886)；大卫·沃森等；机器学习算法有望从根本上提升我们诊断和治疗疾病的能力；出于道德、法律和科学层面的考虑，医生和患者必须能够理解并解释这些模型的预测结果；通过与包括患者、数据科学家和政策制定者在内的相关利益方紧密合作，可以实现可扩展、可定制且符合伦理的解决方案。\n* [ICIE 1.0：一种用于交互式上下文交互解释的新工具](http:\u002F\u002Fwwwis.win.tue.nl\u002F~wouter\u002FPubl\u002FW6-ICIE.pdf)；西蒙·B·范德宗等；随着有关隐私和知情权的新法律法规的出台，对自动化决策的解释愈发重要。如今，机器学习模型被广泛应用于银行和保险等领域，以帮助专业人士识别可疑交易、审批贷款和信用卡申请。使用此类系统的公司必须能够提供其决策背后的依据；单纯依赖训练好的模型是远远不够的。目前已有多种方法可以洞察模型及其决策过程，但它们往往只擅长展示全局或局部行为。全局行为通常过于复杂而难以直观呈现或理解，因此只能采用近似的方式；而局部行为的可视化则容易产生误导，因为很难准确定义“局部”究竟指什么（例如，我们的方法无法判断某个特征值的改变难度，哪些是灵活可变的，哪些则是固定不变的）。为此，我们提出了ICIE框架（交互式上下文交互解释），使用户能够在不同情境下查看单个实例的解释。我们将看到，同一案例在不同情境下的解释会有所不同，从而揭示出不同的特征交互模式。\n\n![https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_b5e6cf803dc3.png](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_b5e6cf803dc3.png)\n\n* 
[人机系统中的解释：文献元分析、关键思想与出版物概要以及可解释人工智能的参考文献](https:\u002F\u002Farxiv.org\u002Fabs\u002F1902.01876v1)；谢恩·T·穆勒、罗伯特·R·霍夫曼、威廉·克兰西、阿比盖尔·埃姆雷、加里·克莱因；这是一篇综合性综述，围绕“什么样的解释才是好的解释”这一问题展开讨论，并以人工智能系统为参照。相关文献浩如烟海，因此本综述必然具有选择性。尽管如此，报告中仍涵盖了大多数关键概念和议题。报告梳理了计算机科学领域在构建能够解释与教学的系统（如智能辅导系统和专家系统）方面的历史进程。同时，报告阐述了现代人工智能中的可解释性问题与挑战，并简要介绍了主流的解释心理学理论。其中一些文章因其与XAI的高度相关性而尤为突出，其研究方法、结果及核心观点均被重点呈现。\n* [解释的解释：机器学习可解释性的概述](https:\u002F\u002Farxiv.org\u002Fabs\u002F1806.00069)；莱拉尼·H·吉尔平、大卫·鲍、本·Z·袁、阿耶莎·巴杰瓦、迈克尔·斯佩克特、拉拉娜·卡加尔；近年来，解释性人工智能（XAI）领域的研究呈现出蓬勃发展的态势。该研究方向致力于解决一个重要问题：复杂的机器和算法往往难以揭示其行为与思维过程。XAI通过提高用户与系统内部各环节的透明度，能够在一定程度上对决策作出解释。这些解释对于确保算法公平性、识别训练数据中的潜在偏差或问题，以及保证算法按预期运行至关重要。然而，目前由这些系统生成的解释既缺乏标准化，也未得到系统的评估。为了制定最佳实践并明确尚未解决的挑战，我们提出了可解释性的定义，并说明如何利用这一概念对现有文献进行分类。此外，我们还探讨了当前针对解释方法，尤其是深度神经网络的解释方法为何存在不足之处。\n* [SAFE ML：基于代理模型的特征提取用于模型学习](https:\u002F\u002Farxiv.org\u002Fabs\u002F1902.11035)；艾丽西亚·戈谢夫斯卡、亚历山德拉·加切克、皮奥特尔·卢邦、普热米斯瓦夫·别切克；复杂的黑盒预测模型可能具有较高的准确性，但其不透明性会导致信任缺失、稳定性不足以及对概念漂移的敏感等问题。另一方面，可解释模型则需要投入大量精力进行特征工程，耗时较长。那么，我们能否在无需耗费大量时间进行特征工程的情况下，训练出既可解释又准确的模型呢？本文提出了一种方法，即利用弹性黑盒作为代理模型，从而构建更为简单、透明度更高，同时保持准确性和可解释性的白盒模型。新模型基于借助代理模型提取或学习到的新特征而构建。我们展示了该方法在模型层面解释中的应用，并探讨了其向实例层面解释扩展的可能性。此外，文中还提供了一个Python实现示例，并在多个表格型数据集上对该方法进行了基准测试。\n* [注意力并非解释](https:\u002F\u002Farxiv.org\u002Fabs\u002F1902.10186)；萨尔塔克·贾因、拜伦·C·华莱士；注意力机制已在神经网络自然语言处理模型中得到广泛应用。除了提升预测性能外，它们还常被宣传为能够提供透明度：配备注意力机制的模型会输出对输入单元的关注分布，而这往往被（至少是隐性地）视为传达了各输入的重要性权重。然而，注意力权重与模型输出之间究竟存在何种关系尚不明确。在本研究中，我们针对多种NLP任务开展了广泛的实验，旨在评估注意力权重在多大程度上能够为预测提供有意义的“解释”。结果表明，它们在很大程度上并不能做到这一点。例如，学习到的注意力权重通常与基于梯度的特征重要性度量无关，而且即使关注分布截然不同，也可能产生相同的预测结果。我们的研究发现，标准的注意力模块并不能提供有意义的解释，也不应被视为能够做到这一点。\n* [高效搜索多样且一致的反事实解释](https:\u002F\u002Farxiv.org\u002Fabs\u002F1901.04909)；克里斯·拉塞尔；本文提出了一种基于混合整数规划的反事实解释新搜索算法。我们关注的是复杂数据，其中变量可以取连续范围内的任意值，或属于一组离散状态。我们提出了一组新颖的约束条件，称为“混合多面体”，并展示了如何将其与整数规划求解器结合使用，以高效地找到一致的反事实解释——即那些能够可靠地映射回原始数据结构，同时避免暴力枚举的解决方案。此外，我们还探讨了多样化解释的问题，并说明如何在我们的框架内生成此类解释。\n* [机器学习研究中的七个误区](https:\u002F\u002Farxiv.org\u002Fabs\u002F1902.06789v1)；奥斯卡·张、霍德·利普森；随着深度学习在医学影像等高风险应用中日益普及，我们必须谨慎对待对神经网络所作决策的解读。例如，虽然让卷积神经网络将核磁共振图像上的一个斑点识别为恶性肿瘤是一件好事，但如果这一结论建立在脆弱的解释方法之上，则不应轻信。\n* [迈向加权特征归因的聚合](https:\u002F\u002Farxiv.org\u002Fabs\u002F1901.10040)；乌芒·巴特、普拉迪普·拉维库马尔、若泽·M·F·莫拉；当前用于解释机器学习模型的方法主要分为两类：先决事件影响法和数值归因法。前者利用训练样本描述某个训练点对测试点的影响程度，而后者则试图为与特定预测最相关的特征赋予数值意义。在本工作中，我们讨论了一种名为AVA的算法——先决事件价值聚合法——它将这两种解释方法融合在一起，形成一种新的特征归因方式，不仅能够获取局部解释，还能捕捉模型所学习到的全局模式。\n* [人类对解释的可理解性评估](https:\u002F\u002Farxiv.org\u002Fabs\u002F1902.00006v1)；艾萨克·拉格、艾米莉·陈、杰弗里·何、梅纳卡·纳拉亚南、彬·金、萨姆·格什曼、菲娜莱·多希-维莱兹；究竟怎样的解释才真正符合人类的理解习惯，这一问题至今仍缺乏清晰的认识。本研究通过三项用户在使用机器学习系统时可能执行的具体任务——模拟响应、验证建议响应，以及判断建议响应的正确性是否会因输入变化而改变——来深化我们对解释可理解性的理解。通过精心设计的人体实验，我们确定了可用于优化机器学习系统可解释性的正则化参数。研究结果表明，复杂性的类型至关重要：认知块（新定义的概念）对解释效果的影响大于变量重复，且这一趋势在不同任务和领域中均保持一致。这提示我们，解释系统或许存在一些共通的设计原则。\n* [可解释的机器学习：定义、方法与应用](https:\u002F\u002Fexport.arxiv.org\u002Fpdf\u002F1901.04592)；W·詹姆斯·默多卡、钱丹·辛格、卡尔·昆比埃拉、雷扎·阿巴西-阿斯、宾·宇；机器学习模型在学习复杂模式并据此对未观测数据进行预测方面取得了巨大成功。除了利用模型进行预测之外，如何解释模型所学到的内容也日益受到关注。然而，这种关注度的提升却引发了关于可解释性概念的诸多困惑。特别是，人们尚不清楚各种提出的解释方法之间有何关联，以及可以用哪些共同的概念来对其进行评估。\n* 
[学习最优且公平的决策树以实现非歧视性决策](http:\u002F\u002Fwww-bcf.usc.edu\u002F~vayanou\u002Fpapers\u002F2019\u002FFair_DT_AAAI_2019_CameraReady.pdf)；西纳·阿加伊、穆罕默德·贾瓦德·阿齐齐、菲比·瓦亚诺斯；近年来，基于数据驱动的自动化决策系统在多个领域取得了巨大成功（例如用于产品推荐或指导娱乐内容的制作）。近来，这类算法越来越多地被用于辅助涉及社会敏感性的决策（如决定录取哪些学生进入学位课程，或为公共住房分配优先顺序）。然而，这些自动化工具可能会导致歧视性决策，即根据个体所属的群体或少数族裔身份对其区别对待，从而造成差别待遇或差别影响，进而违背道德与伦理规范。这种情况可能发生在训练数据本身存在偏见时（例如，某一特定群体的成员历来遭受歧视）。但也可能出现在训练数据无偏见的情况下，如果系统错误对不同群体或少数族裔成员造成的影响不同（例如，黑人的误分类率高于白人）。在本文中，我们统一了分类与回归任务中不公平的定义。随后，我们提出了一种多功能的混合整数优化框架，用于学习最优且公平的决策树及其变体，以根据具体情况防止差别待遇和\u002F或差别影响。这相当于为设计公平且可解释的政策提供了一个灵活的方案，适用于各类社会敏感决策。我们进行了大量的计算实验，结果表明，我们的框架能够改进该领域的现有技术水平（通常依赖启发式方法），从而在不降低整体准确性的前提下实现非歧视性决策。\n\n* [通过对比反向传播理解CNN的个体决策](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1812.02100.pdf)；Jindong Gu、Yinchong Yang、Volker Tresp；为了更好地理解深度卷积神经网络的个体决策，已经提出了许多基于反向传播的方法，如DeConvNets、普通梯度可视化和引导反向传播等。然而，这些方法生成的显著性图被证明缺乏判别性。最近，层级相关性传播（LRP）方法被提出用于解释整流器神经网络的分类决策。在本工作中，我们评估了所生成解释的判别能力，并分析了LRP的理论基础——深度泰勒分解。实验和分析表明，LRP生成的解释并不具备类别判别性。在此基础上，我们提出了对比层级相关性传播（CLRP），该方法能够生成实例特异、类别判别且像素级别的解释。在实验中，我们利用CLRP来解释决策过程，并理解个体分类决策中不同神经元之间的差异。此外，我们还通过“指向游戏”和消融研究对解释进行了定量评估。定性和定量评估均显示，CLRP生成的解释优于LRP。代码已公开。\n\n\n\n### 2018年\n\n* [通过类别对比的反事实陈述进行机器学习预测的对话式解释](https:\u002F\u002Fwww.ijcai.org\u002Fproceedings\u002F2018\u002F0836.pdf)；Kacper Sokol、Peter Flach；机器学习模型已广泛渗透到我们的日常生活中，它们决定了影响教育、就业和司法系统等重要事务的决策。许多此类预测系统属于受商业秘密保护的商业产品，因此其决策过程不透明。为此，在我们的研究中，我们关注机器学习模型预测的可解释性和说明性问题。我们的工作大量借鉴了社会科学领域的人类解释研究：通过对话提供的对比式和示例式解释。这种以用户为中心的设计，面向普通大众而非领域专家，应用于机器学习时，能够让被解释者根据自身需求主导解释过程，而不是被动接受预设模板。\n\n* [超越特征归因的可解释性：使用概念激活向量进行量化测试（TCAV）](https:\u002F\u002Farxiv.org\u002Fabs\u002F1711.11279)；Been Kim、Martin Wattenberg、Justin Gilmer、Carrie Cai、James Wexler、Fernanda Viegas、Rory Sayres；由于深度学习模型规模庞大、结构复杂且内部状态往往不透明，对其解释一直是一项挑战。此外，许多系统（如图像分类器）是基于低层次特征而非高层次概念进行操作的。为应对这些挑战，我们引入了概念激活向量（CAV），它能够用人类友好的概念来解释神经网络的内部状态。核心思想是将神经网络的高维内部状态视为一种辅助工具，而非障碍。我们展示了如何将CAV作为“使用CAV进行测试”（TCAV）技术的一部分，利用方向导数来量化用户定义的概念对分类结果的重要性——例如，“斑马”的预测对条纹存在的敏感程度。以图像分类领域为试验平台，我们描述了如何利用CAV探索假设并为标准图像分类网络以及医疗应用提供洞见。[TowardsDataScience](https:\u002F\u002Ftowardsdatascience.com\u002Ftcav-interpretability-beyond-feature-attribution-79b4d3610b4d)。\n\n![https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_ac8fe0645fa3.png](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_ac8fe0645fa3.png)\n\n* [机器决策与人类后果](https:\u002F\u002Farxiv.org\u002Fabs\u002F1811.06747)；牛津大学出版社即将出版的《算法监管》一书中已获接受发表的一章草稿；Teresa Scantamburlo、Andrew Charlesworth、Nello Cristianini；此处的讨论主要聚焦于刑事司法系统中的执法决策案例，但也借鉴了其他用于控制机会获取的算法中的类似情形，以解释机器学习的工作原理及其导致现代智能算法或“分类器”作出决策的方式。文中考察了分类器性能的关键方面，包括分类器的学习方式、其基于相关性而非因果关系运作的事实，以及机器学习中“偏差”一词与日常用法的不同含义。随后，以现实世界中的一个典型“分类器”——伤害评估风险工具（HART）为例，通过识别其技术特征——分类方法、训练数据与测试数据、特征与标签、验证及性能指标——展开分析。接着，参照HART，从四个规范性基准出发进行考量：(a) 预测准确性 (b) 公平与法律面前的平等 (c) 透明度与问责制 (d) 信息隐私与言论自由，以此展示其技术特征所具有的重要规范性维度，这些维度直接决定了该系统在多大程度上可以被视为现有人类决策者的可行且合法的支持，甚至替代方案。\n\n* [争议规则——发现分类器之间存在异常分歧的区域](https:\u002F\u002Farxiv.org\u002Fabs\u002F1808.07243)；Oren Zeev-Ben-Mordehai、Wouter Duivesteijn、Mykola Pechenizkiy；寻找不同分类器之间存在更高争议的区域，对于特定领域及其模型而言具有重要的洞察意义。这类评估既可以证伪某些假设，也可以强化部分假设，甚至揭示此前未知的现象。本文描述了一种基于异常模型挖掘框架的算法，能够支持此类研究。我们探索了多个公开数据集，并展示了该方法在分类任务中的实用性。在论文中，我们分享了几项关于这些已被广泛研究的数据集的有趣观察，其中一些是广为人知的知识，另一些则据我们所知此前未曾报道过。\n\n![https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_960942485b7a.png](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_960942485b7a.png)\n\n* 
[机器学习中的超参数窃取](https:\u002F\u002Farxiv.org\u002Fabs\u002F1802.05351)；王炳辉、戈登·尼尔；超参数在机器学习中至关重要，因为不同的超参数设置往往会导致模型性能出现显著差异。由于其商业价值以及训练者用于学习这些超参数的专有算法的保密性，超参数可能被视为机密信息。在本工作中，我们提出了针对训练者所学习超参数的窃取攻击，并将其称为“超参数窃取攻击”。我们的攻击适用于多种流行的机器学习算法，如岭回归、逻辑回归、支持向量机和神经网络。我们从理论和实验两方面评估了这些攻击的有效性。例如，我们在亚马逊机器学习平台上对这些攻击进行了评估。结果表明，我们的攻击能够准确地窃取超参数。此外，我们也研究了相应的防御措施。研究结果强调，对于某些机器学习算法而言，亟需开发新的防御机制来抵御此类超参数窃取攻击。\n  \n* [蒸馏与比较：利用透明模型蒸馏审计黑盒模型](https:\u002F\u002Farxiv.org\u002Fabs\u002F1710.06169)；黑盒风险评分模型广泛存在于我们的生活中，但通常属于专有或不透明的系统。我们提出了一种名为“蒸馏与比较”的模型蒸馏与对比方法，用于审计这类模型。为了深入了解黑盒模型，我们将它们视为教师，训练透明的学生模型以模仿黑盒模型所分配的风险评分。随后，我们将通过蒸馏训练得到的学生模型与另一款基于真实标签数据训练的未蒸馏透明模型进行比较，并利用两者之间的差异来洞察黑盒模型的特性。该方法无需探测黑盒模型的API即可在实际场景中应用。我们在四个公开数据集上展示了这一方法：COMPAS、Stop-and-Frisk、芝加哥警察局数据集以及Lending Club数据集。此外，我们还提出了一项统计检验方法，用于判断某个数据集是否缺失了训练黑盒模型时使用的关键特征。我们的检验结果显示，ProPublica的数据很可能缺少COMPAS模型训练过程中使用的关键特征。\n\n![https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_b55a94189979.png](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_b55a94189979.png)\n\n* [DIVE：支持集成式数据探索工作流的混合交互系统](https:\u002F\u002Fstatic1.squarespace.com\u002Fstatic\u002F5759bc7886db431d658b7d33\u002Ft\u002F5b969d5c89858325956a939f\u002F1536597342848\u002FDIVE_HILDA_2018.pdf)；胡凯文等；从数据中生成知识正变得越来越重要。这一数据探索过程包含多个环节：数据导入、可视化、统计分析和故事讲述。尽管这些任务相辅相成，但分析师通常会在不同的工具中分别执行它们。此外，由于这些工具依赖于手动查询语句的编写，其学习曲线往往较为陡峭。在此，我们描述了DIVE系统的设计与实现——这是一个将最先进的数据探索功能整合到单一工具中的Web平台。DIVE采用一种混合交互模式，将推荐机制与点选式的手动指定相结合，并提供一套统一的视觉语言，以衔接数据探索流程的不同阶段。在一项针对67名专业数据科学家的受控用户研究中，我们发现，与使用Excel的用户相比，DIVE用户在完成预定义的数据可视化和分析任务时，不仅效率更高，而且速度更快。\n\n![https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_5aadf12c71f4.png](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_5aadf12c71f4.png)\n\n* [从噪声数据中学习解释规则](https:\u002F\u002Farxiv.org\u002Fabs\u002F1711.04574)；理查德·埃文斯、爱德华·格雷芬斯特特；人工神经网络是强大的函数逼近器，能够对各类有监督和无监督问题的解进行建模。随着网络规模和表达能力的增加，模型的方差也随之增大，从而导致几乎普遍存在的过拟合问题。尽管可以通过多种模型正则化方法来缓解这一问题，但常见的解决办法仍然是获取大量训练数据——而这些数据并不总是容易获得——以充分近似我们希望测试的领域中的数据分布。相比之下，诸如归纳逻辑编程之类的逻辑编程方法提供了一种极高的数据效率流程，使模型能够在符号域上进行推理。然而，这些方法无法处理神经网络可应用的多样化领域：它们对输入中的噪声或标签错误缺乏鲁棒性，更重要的是，无法应用于数据具有歧义性的非符号域，例如直接操作原始像素。在本文中，我们提出了一种可微归纳逻辑框架，它不仅能够解决传统ILP系统擅长的任务，还表现出ILP无法应对的训练数据噪声和误差下的鲁棒性。\n* [通过展开潜在结构实现可解释的R-CNN](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1711.05226.pdf)；吴天富、李西来、宋曦、孙伟、董亮和李博；本文提出了一种在目标检测中学习定性可解释模型的方法，使用流行的两阶段基于区域的卷积神经网络检测系统（即R-CNN）。R-CNN由区域建议网络和RoI（感兴趣区域）预测网络组成。所谓可解释模型，我们关注的是弱监督提取式理由生成，即在不使用任何关于部件配置的监督信息的情况下，自动且同时地在检测过程中展开对象实例的潜在判别性部件配置。我们利用一种自顶向下、分层且组合式的语法模型，该模型嵌入在一个有向无环与或图（AOG）中，以探索并展开RoI的潜在部件配置空间。我们提出用AOG解析算子替代R-CNN中广泛使用的RoIPooling算子，因此所提方法适用于许多最先进的基于卷积神经网络的检测系统。\n* [公平信贷需要可解释模型以实现负责任的推荐](https:\u002F\u002Farxiv.org\u002Fabs\u002F1809.04684)；陈嘉豪；金融服务行业在信用决策中面临着由合规性和伦理考量引发的独特可解释性和公平性挑战。这些挑战使得在业务决策过程中使用机器学习和人工智能方法变得复杂。\n* [ICIE 1.0：用于交互式上下文交互解释的新工具](https:\u002F\u002Flink.springer.com\u002Fchapter\u002F10.1007\u002F978-3-030-13463-1_6)；西蒙·B·范德宗、沃特·杜伊夫斯泰因、维尔纳·范伊彭堡、扬·费尔德辛克、米科拉·佩切尼茨基；随着有关隐私和知情权的新法律的出台，对自动化决策的解释变得日益重要。如今，机器学习模型被用于协助银行和保险等领域的专家识别可疑交易、审批贷款和信用卡申请。使用此类系统的公司必须能够提供其决策背后的依据；仅仅依赖于训练好的模型是不够的。目前已有多种方法可以提供对模型及其决策的洞察，但这些方法往往要么擅长展示全局行为，要么擅长展示局部行为。全局行为通常过于复杂而难以可视化或理解，因此只能展示近似结果；而局部行为的可视化又常常具有误导性，因为很难界定“局部”究竟意味着什么（即我们的方法并不知道某个特征值可以多容易地被改变，哪些是灵活的，哪些是固定的）。我们引入了ICIE框架（交互式上下文交互解释），使用户能够在不同情境下查看单个实例的解释。我们会发现，同一案例在不同情境下的解释会有所不同，揭示出不同的特征交互作用。\n* 
[公平机器学习的延迟效应](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1803.04383.pdf)；莉迪娅·T·刘、萨拉·迪恩、埃丝特·罗尔夫、马克斯·辛乔维茨、莫里茨·哈特；机器学习中的公平性研究主要集中在静态分类场景中，而较少关注决策如何随时间推移改变底层人群。传统观点认为，公平性标准有助于其所保护群体的长期福祉。我们研究静态公平性标准与福祉的时间指标之间的相互作用，例如某一变量的长期改善、停滞和下降。我们证明，即使在单步反馈模型中，常见的公平性标准通常并不能促进长期改善，反而可能在原本不受约束的目标不会造成伤害的情况下带来损害。我们全面刻画了三种标准公平性准则的延迟效应，并对比了它们在不同情况下表现出的质性差异。此外，我们还发现，一种自然的测量误差会扩大公平性标准表现良好的范围。我们的研究结果强调了在评估公平性标准时测量和时间建模的重要性，指出了诸多新的挑战和权衡。\n* [构建可理解智能的挑战](https:\u002F\u002Farxiv.org\u002Fabs\u002F1803.04263)；丹尼尔·S·韦尔德、加甘·班萨尔；由于人工智能（AI）软件采用深度前瞻搜索和大型神经网络的随机优化等技术来拟合海量数据集，其行为往往非常复杂，难以被人理解。然而，各组织却在许多关键任务环境中部署AI算法。为了信任这些算法的行为，我们必须使AI变得可理解，要么使用本身可解释的模型，要么开发新的方法，通过局部近似、词汇对齐和交互式解释来解释和控制那些原本极其复杂的决策。本文认为可理解性至关重要，综述了近年来构建此类系统的研究进展，并指出了未来研究的关键方向。\n* [具有全局一致解释的信用风险可解释模型](https:\u002F\u002Farxiv.org\u002Fabs\u002F1811.12615)；陈超凡、林康成、辛西娅·鲁丁、亚伦·沙波什尼克、王思佳、王彤；我们针对公平艾萨克公司（FICO）提出的公开挑战——提供一个信用风险评估的可解释模型——提出了一种解决方案。我们没有简单地呈现一个黑盒模型后再加以解释，而是提供了一个与其它神经网络同样准确的全局可解释模型。“两层加法风险模型”可以分解为多个子尺度，第二层中的每个节点都代表一个有意义的子尺度，且所有的非线性部分都是透明的。我们提供了三种比全局模型更简单但与其保持一致的解释方式。其中一种解释方法涉及求解最小集合覆盖问题，以找到高支持度的全局一致解释。我们还展示了一款新的在线可视化工具，供用户探索全局模型及其解释。\n* [通过拓扑层次分解评估HELOC申请人风险表现](https:\u002F\u002Farxiv.org\u002Fabs\u002F1811.10658)；凯尔·布朗、德里克·多兰、瑞安·克莱默、布拉德·雷诺兹；金融行业的严格监管要求所有基于机器学习的决策都必须加以解释。这限制了诸如神经网络等强大有监督技术的应用。在本研究中，我们提出了一种名为拓扑层次分解（THD）的新无监督及半监督技术。该过程将数据集逐步分解为越来越小的组别，每组与一个单纯复形相关联，该复形近似表示数据集的底层拓扑结构。我们将THD应用于FICO机器学习挑战数据集，该数据集包含匿名化的房屋净值贷款申请，并使用MAPPER算法构建单纯复形。我们识别出无法偿还贷款的不同人群，并说明如何利用单纯复形中特征值的分布来解释批准或拒绝贷款的决定，具体做法是从数据集上的两个THD中提取说明性解释。\n* [从黑箱到白箱：基于核机器的可解释学习](https:\u002F\u002Flink.springer.com\u002Fchapter\u002F10.1007%2F978-3-319-96136-1_18)；张浩、中台真司、福水健二；我们提出了一种基于核机器的可解释学习新方法。在许多实际学习任务中，核机器已被成功应用。然而，普遍的看法是，由于其固有的黑箱性质，核机器难以被人理解。这限制了核机器在对模型可解释性要求极高的领域的应用。在本文中，我们提出构建可解释的核机器。具体而言，我们设计了一种基于随机傅里叶特征（RFF）的新型核函数以提高可扩展性，并开发了一种两阶段学习流程：在第一阶段，我们将成对特征显式映射到由所设计核产生的高维空间，并学习一个稠密的线性模型；在第二阶段，从第一阶段提取可解释的数据表示，并学习一个稀疏的线性模型。最后，我们使用基准数据集评估了我们的方法，并通过可视化展示了其在可解释性方面的优势。\n* [从软分类器到硬决策：我们能有多公平？](https:\u002F\u002Farxiv.org\u002Fabs\u002F1810.02003)；兰·卡内蒂、阿洛尼·科恩、尼山特·迪卡拉、戈文德·拉姆纳拉扬、萨拉·谢弗勒、亚当·史密斯；我们研究通过后处理校准分数来实现各种公平属性的可行性，并表明允许后处理器推迟某些决策，可以使最终决策满足更多的公平条件。具体来说，我们证明：1. 并不存在一种通用的方法，可以通过后处理使校准分类器在受保护群体之间实现相等的阳性预测值或阴性预测值（PPV或NPV）。对于某些“良好”的校准分类器，当后处理器在不同受保护群体间使用不同阈值时，PPV或NPV可以被均等化……2. 
当允许后处理在某些决策上推迟执行（即通过将部分示例转交给另一个流程来避免做出决策）时，对于未推迟的决策，最终得到的分类器可以在受保护群体之间实现PPV、NPV、假阳性率（FPR）和假阴性率（FNR）的均等化。这暗示了一种部分规避丘尔德乔娃和克莱因伯格等人提出的不可能性结果的方式，这些结果禁止同时均等化所有这些指标。我们还介绍了不同的推迟策略，并展示了它们如何影响整个系统的公平性。我们使用2016年的COMPAS数据集评估了我们的后处理技术。\n* [黑箱模型解释方法综述](https:\u002F\u002Fdl.acm.org\u002Fcitation.cfm?id=3236009)；里卡多·圭多蒂、安娜·蒙雷亚莱、萨尔瓦托雷·鲁吉耶里、弗朗科·图里尼、福斯卡·詹诺蒂、迪诺·佩德雷斯基；近年来，许多精确的决策支持系统被构建为黑箱，即隐藏其内部逻辑的系统。这种缺乏解释既是一个实际问题，也是一个伦理问题。文献中报告了许多旨在克服这一关键弱点的方法，有时甚至是以牺牲准确性为代价来换取可解释性。黑箱决策系统可应用的领域多种多样，而每种方法通常都是为了解决特定问题而开发的，因此它明确或隐含地界定了自己对可解释性和解释的定义。本文旨在根据解释的概念和黑箱系统的类型，对文献中讨论的主要问题进行分类。给定一个问题定义、黑箱类型和期望的解释，这篇综述应帮助研究人员找到对其工作更有用的方案。所提出的开放黑箱模型方法分类也有助于将众多研究中的开放性问题置于更清晰的视角之下。\n* [深度k近邻：迈向自信、可解释且鲁棒的深度学习](https:\u002F\u002Farxiv.org\u002Fabs\u002F1803.04765)；尼古拉斯·帕佩罗特、帕特里克·麦克丹尼尔；在本工作中，我们利用深度学习的结构，开发新的基于学习的推理和决策策略，以实现鲁棒性和可解释性等理想特性。我们迈出了第一步，提出了深度k近邻（DkNN）。这种混合分类器将k近邻算法与深度神经网络每一层所学习到的数据表示相结合：测试输入会根据其在这些表示中的距离，与相邻的训练样本进行比较。我们发现，这些相邻样本的标签可以为模型训练流形之外的输入提供置信度估计，包括对抗样本等恶意输入——从而为超出模型理解范围的输入提供保护。这是因为可以通过最近邻来估计预测在训练数据中是否缺乏支持，即非一致性程度。同时，这些邻居也为预测提供了人类可解释的说明。\n* [RISE：用于黑箱模型解释的随机输入采样](https:\u002F\u002Farxiv.org\u002Fabs\u002F1806.07421)；维塔利·佩秋克、阿比尔·达斯、凯特·萨恩科；深度神经网络越来越多地被用于自动化数据分析和决策制定，然而其决策过程在很大程度上仍不明确，也难以向最终用户解释。在本文中，我们针对以图像为输入、输出类别概率的深度神经网络的可解释AI问题提出了解决方案。我们提出了一种名为RISE的方法，该方法可以生成一张重要性地图，指示每个像素对模型预测的重要性。与使用梯度或其他内部网络状态来估计像素重要性的白箱方法不同，RISE适用于黑箱模型。它通过用随机遮蔽的输入图像版本探测模型，并获取相应的输出，以经验方式估计重要性。我们还将我们的方法与最先进的重要性提取方法进行了比较，既使用自动删除\u002F插入指标，也使用基于人工标注对象片段的指向指标。在多个基准数据集上的大量实验表明，我们的方法在性能上与白箱方法相当或超越之。\n* [可视化黑箱模型的特征重要性](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1804.06620.pdf)；朱塞佩·卡萨利奇奥、克里斯托夫·莫尔纳尔和贝恩德·比施；基于一种近期的模型无关全局特征重要性方法，我们引入了针对单个观测的局部特征重要性度量，并提出了两种可视化工具：部分重要性（PI）和个体条件重要性（ICI）图，它们分别展示了特征变化如何平均影响模型性能，以及对单个观测的影响。我们提出的方法与部分依赖（PD）和个体条件期望（ICE）图相关，但它们展示的是预期的（条件性）特征重要性，而非预期的（条件性）预测。此外，我们还表明，将各个观测的ICI曲线取平均即可得到PI曲线，而将PI曲线按所考虑特征的分布积分，则可得到全局特征重要性。\n* [通过模型提取解释黑箱模型](https:\u002F\u002Farxiv.org\u002Fabs\u002F1705.08504)；奥斯伯特·巴斯塔尼、卡罗琳·金、汉萨·巴斯塔尼；随着机器学习越来越多地用于支持重大决策，可解释性变得愈发重要。我们提议以一棵近似原模型的决策树形式，为复杂的黑箱模型构建全局解释——只要这棵决策树是良好的近似，它就能反映黑箱模型所做的计算。我们设计了一种新颖的决策树解释提取算法，该算法会主动采样新的训练点，以避免过拟合。我们以一棵用于预测糖尿病风险的随机森林和一个用于控制倒立摆的已学习控制器为例，评估了我们的算法。与多个基线相比，我们的决策树不仅精度显著更高，而且在用户研究中也被认为同样或更加可解释。最后，我们描述了**由我们的解释所提供的若干见解，其中包括一项经医生验证的因果关系问题**。\n* [基于游戏的深度神经网络近似验证，附带可证明的保证](https:\u002F\u002Fexport.arxiv.org\u002Fpdf\u002F1807.03571)；吴敏、马修·维克1、阮文杰、黄晓伟、玛尔塔·克维亚特科夫斯卡；尽管深度神经网络的准确度有所提高，但对抗样本的发现引发了严重的安全担忧。在本文中，我们研究了两种点态鲁棒性的变体：最大安全半径问题，即对于给定的输入样本，计算其与对抗样本之间的最小距离；以及特征鲁棒性问题，旨在量化单个特征对对抗扰动的抵抗能力。我们证明，在假设利普希茨连续性的前提下，这两个问题都可以通过离散化输入空间来进行有限优化近似，且这种近似具有可证明的保证，即误差是有界的。随后，我们指出，由此产生的优化问题可以简化为两人轮流进行的游戏，其中一方选择特征，另一方在选定的特征范围内扰动图像。虽然第二方的目标是尽量缩短与对抗样本的距离，但根据优化目标的不同，第一方可能是合作的，也可能是竞争的。我们采用随时可用的方法来解决这些游戏，即通过单调改进游戏的上下界来近似其价值。我们使用蒙特卡洛树搜索算法来计算这两场游戏的上界，而可接受A*算法和Alpha-Beta剪枝算法则分别用于计算最大安全半径和特征鲁棒性游戏的下界。在计算最大安全半径问题的上界时，我们的工具在与现有对抗样本生成算法的竞争中表现出优异性能。此外，我们还展示了如何将我们的框架应用于评估自动驾驶汽车中交通标志识别等安全关键应用中的神经网络点态鲁棒性。\n* [所有模型都是错的，但许多是有用的：利用模型类别依赖性计算黑箱、专有或误设预测模型的变量重要性](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1801.01489.pdf)；亚伦·费舍尔、辛西娅·鲁丁、弗朗切斯卡·多米尼奇；变量重要性（VI）工具描述了协变量对预测模型准确度的贡献程度。然而，对于一个表现良好的模型（例如线性模型f(x) = x T β，其中系数向量β固定）来说重要的变量，对于另一个模型可能并不重要。在本文中，我们提出使用模型类别依赖性（MCR）作为预先指定类别中所有表现良好模型的VI值范围。因此，MCR通过考虑到许多预测模型（可能具有不同的参数形式）都能很好地拟合数据这一事实，提供了更为全面的关于重要性的描述。\n* [请停止为高风险决策解释黑箱模型](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1811.10154v1.pdf)；辛西娅·鲁丁；当前社会上存在许多用于高风险决策的黑箱模型。与其一开始就创建可解释的模型，不如不断尝试解释黑箱模型的做法，很可能会延续不良实践，甚至可能对社会造成灾难性危害。正确的出路在于设计本身就可解释的模型。\n\n* 
[公平机器学习的最新进展：从道德哲学和立法到公平分类器](https:\u002F\u002Farxiv.org\u002Fabs\u002F1811.09539v1)；埃利亚斯·鲍曼、约瑟夫·伦贝格；随着许多决策，例如是否发放贷款，不再由人类而是由机器学习算法做出，机器学习正日益渗透到我们的生活中。然而，这些决策往往存在不公平现象，会基于种族或性别等受保护特征歧视特定群体。随着近期《通用数据保护条例》（GDPR）的生效，人们对这类问题的认识显著提高。鉴于计算机科学家对人们生活的影响日益深远，采取行动以发现并防止歧视行为显得尤为必要。本文旨在介绍歧视的概念、应对歧视的法律基础，以及检测和预防机器学习算法出现此类行为的策略。\n\n* [人工智能中的解释性研究](https:\u002F\u002Farxiv.org\u002Fabs\u002F1811.01439)；布伦特·米特尔施塔特、克里斯·拉塞尔、桑德拉·瓦赫特；近年来，关于机器学习和人工智能可解释性的研究主要集中在构建能够近似真实决策标准的简化模型上。这些模型对于培训专业人员理解复杂系统将如何作出决策，尤其是系统可能出现哪些故障，具有重要的教学价值。然而，在评估任何此类模型时，都应牢记乔治·博克斯的名言：“所有模型都是错的，但有些模型是有用的。”我们着重探讨这些模型与哲学和社会学中“解释”概念之间的区别。这些模型可以被视为一种用于生成解释的“自助工具包”，使从业者无需外部协助即可直接回答“如果……会怎样”的问题，或生成对比性解释。尽管这一能力颇具价值，但将这些模型作为解释提供却显得过于复杂，而其他形式的解释可能并不具备同样的权衡取舍。我们比较了关于何为有效解释的不同理论流派，并建议机器学习领域可以从更广阔的视角来审视这一问题。\n\n* [人类在获得解释与机器学习模型预测情况下的判断：以欺骗检测为例](https:\u002F\u002Farxiv.org\u002Fabs\u002F1811.07901v1)；维维安·赖、陈浩·谭；在涉及伦理和法律问题的关键任务中，如累犯风险预测、医学诊断以及打击虚假新闻等，人类始终是最终决策者。尽管机器学习模型在这些任务中有时能取得令人瞩目的表现，但这些任务并不适合完全自动化。为了充分发挥机器学习改善人类决策的潜力，有必要了解机器学习模型的辅助如何影响人类的表现与自主性。在本文中，我们以欺骗检测作为实验平台，探讨如何利用机器学习模型提供的解释和预测来提升人类表现，同时保持人类的自主性。我们提出了一个介于完全人类自主与完全自动化之间的连续谱，并沿该谱设计了不同层次的机器辅助方案，逐步增加机器预测的影响力。研究发现，仅提供解释而不展示预测标签时，人类在最终任务中的表现并无统计学上的显著提升。相比之下，展示预测标签可大幅提高人类表现（相对提升超过20%），而明确提示机器表现出色则能进一步提升效果。有趣的是，当同时展示预测标签和机器预测的解释时，其准确率与直接声明机器表现出色的效果相当。我们的研究结果揭示了人类表现与自主性之间的权衡，并表明机器预测的解释可以在一定程度上缓解这一矛盾。\n\n* [机器学习解释的艺术与科学](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1810.02909v1.pdf)；帕特里克·霍尔；介绍了超越传统用于评估机器学习模型的误差度量和图表的解释方法。其中一些方法属于行业常用工具，另一些则基于长期积累的理论体系严谨推导而来。这些方法包括决策树代理模型、个体条件期望图（ICE）、局部可解释的模型无关解释（LIME）、部分依赖图以及沙普利值解释等，它们在适用范围、忠实度及应用场景等方面各有差异。除了对这些方法的详细描述外，本文还结合实际案例和深入的软件示例，给出了相应的使用建议。\n\n* [可解释性是为谁而设？一种基于角色的可解释性机器学习系统分析模型](https:\u002F\u002Farxiv.org\u002Fabs\u002F1806.07552)；理查德·汤姆塞特、戴夫·布雷内斯、丹·哈本、阿伦·普里斯、苏普里约·查克拉博蒂；我们不应仅仅询问系统是否可解释，而应关注它对谁而言是可解释的。为此，我们提出了一种模型，旨在通过识别主体在机器学习系统中可能扮演的不同角色来帮助解答这一问题。我们通过多种场景展示了该模型的应用，探讨了主体角色对其目标的影响，以及这对定义可解释性的意义。最后，我们还就该模型如何为可解释性研究人员、系统开发者以及负责审计机器学习系统的监管机构提供帮助提出了建议。\n\n* [通过允许提问来解释模型](https:\u002F\u002Farxiv.org\u002Fabs\u002F1811.05106)；康成珉、朴基泰、张在赫、秋在国；问题本身蕴含着提问者的信息，即其所不了解的内容。在本文中，我们提出了一种新方法，允许学习型智能体在生成最终输出的过程中，主动询问自己认为难以预测的部分。通过分析其提问的时机和内容，我们可以使模型更加透明且易于理解。首先，我们将这一想法扩展为一套通用的深度神经网络框架——我们称之为“提问网络”。针对色彩化这一典型的“一对多”任务，我们提出了专门的架构和训练流程，因为在这种任务中，通过提问有助于更准确地完成工作。研究结果表明，该模型能够学会提出有意义的问题，优先询问难点，并比基准模型更高效地利用所提供的提示信息。我们得出结论：所提出的提问框架能够让学习型智能体暴露自身的薄弱环节，这为开发可解释且具有交互性的模型开辟了一个充满前景的新方向。\n\n* [对比性解释：基于结构模型的方法](https:\u002F\u002Farxiv.org\u002Fabs\u002F1811.03163)；蒂姆·米勒；……哲学和社会科学研究表明，解释具有对比性：也就是说，当人们询问某个事件的原因时，他们（有时是隐含地）实际上是在寻求相对于某个对比情境的解释，即“为什么是P而不是Q？”在本文中，我们扩展了结构因果模型方法，定义了两种互补的对比性解释概念，并将其应用于两个经典的AI问题：分类和规划。\n\n* [面向设计师的可解释AI：一种以人为本的混合式协同创作视角](http:\u002F\u002Fantoniosliapis.com\u002Fpapers\u002Fexplainable_ai_for_designers.pdf)；朱继晨、安东尼奥斯·利亚皮斯、塞巴斯蒂安·里西、拉斐尔·比达拉、迈克尔·扬布拉德；在这篇愿景论文中，我们提出了一种新的研究领域——面向设计师的可解释AI（XAID），尤其针对游戏设计师。通过聚焦特定用户群体的需求与任务，我们提出了一种以人为本的方法，旨在借助XAID技术帮助游戏设计师与AI\u002FML系统共同创作。我们通过三个用例展示了初步的XAID框架，这些用例既需要理解AI技术的本质特性，也需要洞察用户需求，并在此基础上识别出关键的开放性挑战。\n\n* [教育中的AI需要可解释的机器学习：来自开放学习者建模的经验教训](https:\u002F\u002Farxiv.org\u002Fabs\u002F1807.00154)；克里斯蒂娜·科纳蒂、卡斯卡·波赖斯卡-蓬斯塔、马诺利斯·马夫里基斯；底层AI表示的可解释性是开放学习者建模（OLM）——智能辅导系统（ITS）研究的一个分支——的核心价值所在。OLM提供了工具，能够“打开”学习者认知与情感的AI模型，从而支持人类的学习与教学。——用例\n\n* [欺诈检测中的实例级解释：一个案例研究](https:\u002F\u002Farxiv.org\u002Fabs\u002F1806.07129)；丹尼斯·科拉里斯、利奥·M·芬克、雅克·J·范·维克；欺诈检测是一个复杂的问题，预测建模可以为其提供帮助。然而，对预测结果的验证却颇具挑战：对于单个保险单，模型仅会给出一个预测分数。我们展示了一个案例研究，探讨了不同的实例级模型解释技术如何协助欺诈检测团队开展工作。为此，我们设计了两款新颖的仪表盘，结合了多种最先进的解释技术。\n\n* 
[论可解释性方法的鲁棒性](https:\u002F\u002Farxiv.org\u002Fabs\u002F1806.08049)；大卫·阿尔瓦雷斯-梅利斯、汤米·S·雅各卡；我们认为，解释的鲁棒性——即相似的输入应产生相似的解释——是可解释性的关键要求之一。我们引入了用于量化鲁棒性的指标，并证明当前的方法在这些指标下表现不佳。最后，我们提出了在现有可解释性方法上强化鲁棒性的途径。\n\n* [基于局部对照树的对比性解释](https:\u002F\u002Farxiv.org\u002Fabs\u002F1806.07470)；贾斯珀·范德瓦、马塞尔·罗贝尔、尤里安·范迪格伦、马蒂厄·布林克胡伊斯、马克·尼林克斯；近年来，可解释机器学习（iML）和可解释AI（XAI）领域的进展主要基于特征在分类任务中的重要性来构建解释。然而，在高维特征空间中，如果不限制重要特征的集合，这种方法可能会变得不可行。我们建议利用人类倾向于提出“为什么是这个输出（事实）而不是那个输出（对照）？”这类问题的心理倾向，将特征数量缩减到那些在所提对比中起主要作用的特征。我们提出的方法使用本地训练的一对多决策树，以识别出使树将数据点分类为对照而非事实的互斥规则集。\n\n* [评估特征重要性估计](https:\u002F\u002Farxiv.org\u002Fabs\u002F1806.10758)；萨拉·胡克、杜米特鲁·埃尔汉、彼得-扬·金德曼斯、彬·金；估计某一特征对模型预测的影响是一项挑战。我们引入ROAR——移除并重新训练——这一基准测试，用于评估深度神经网络中估计输入特征重要性的可解释性方法的准确性。我们根据每种估计方法认为最重要的特征，移除一部分输入特征，并测量重新训练后模型准确率的变化。\n\n* [知识库嵌入模型的解释：一种教学法方法](https:\u002F\u002Farxiv.org\u002Fabs\u002F1806.09504)；阿瑟·科隆比尼·古斯芒、阿尔瓦罗·恩里克·柴姆·科雷亚、格劳伯·德·博纳、法比奥·加利亚尔迪·科兹曼；嵌入模型在知识库补全任务中达到了最先进的精度，但其预测却以难以解释而闻名。在本文中，我们借鉴神经网络文献中的“教学法方法”，通过从嵌入模型中提取加权霍恩规则来实现对其的解释。我们展示了教学法方法如何适应知识库的大规模关系特性，并通过实验说明了其优势与局限性。\n\n* [流形：一种与模型无关的机器学习模型解释与诊断框架](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1808.00196.pdf)；张佳伟、王洋、皮耶罗·莫利诺、李乐志和戴维·S·埃伯特；介绍了流形——一种用于在检查（假设）、解释（推理）和优化（验证）过程中对模型进行可视化探索的工具。支持多模型比较。这是一种面向机器学习模型开发的可视化探索方法。\n\n* [通过有意义扰动实现黑盒模型的可解释性解释](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1704.03296.pdf)；露丝·C·冯、安德烈娅·韦达尔迪；（摘自摘要）一种通用框架，可用于为任何黑盒算法学习不同类型的解释。该框架能够找到图像中对分类器决策贡献最大的部分……该方法与模型无关且可测试，因为它基于明确且可解释的图像扰动。\n\n* [多分类场景下的可解释性更为困难：多分类加法模型的公理化可解释性](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1810.09092.pdf)；张学周、莎拉·谭、保罗·科赫、尹娄、乌尔苏拉·查耶夫斯卡、里奇·卡鲁阿纳；（……）随后，我们开发了一种后处理技术（API），该技术能够保证将预训练的加法模型转换为满足可解释性公理的形式，同时不牺牲准确性。该技术不仅适用于使用我们的算法训练的模型，也适用于任何多分类加法模型。我们在一个包含12个类别的婴儿死亡率数据集上演示了该API。（……）最初针对广义加法模型（GAMs）。\n\n* [大数据中的统计天堂与悖论](https:\u002F\u002Fstatistics.fas.harvard.edu\u002Ffiles\u002Fstatistics-2\u002Ffiles\u002Fstatistical_paradises_and_paradoxes_in_big_data_.pdf)；孟晓犁；(...) 
天堂得失？数据质量与数量的权衡。（“我应该更信任回复率为60%的1%抽样调查，还是覆盖了80%人口的非概率数据集？”）；数据质量 × 数据数量 × 问题难度；\n\n* [深度学习中的解释方法：用户、价值、关切与挑战](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1803.07517.pdf)；加布里埃尔·拉斯、马塞尔·范·赫尔文、皮姆·哈塞拉格尔；可解释AI相关问题涉及四个要素：用户、法律法规、解释本身以及算法。总体而言，很明显，关于输入对输出影响的各个方面都可以给出（可视化）解释……未来很可能会出现一类新的解释方法，它们结合规则提取、归因分析和内在解释等技术，以简单易懂的人类语言回答具体问题。此外，显而易见的是，当前的解释方法主要面向专家用户，因为解读结果需要对深度神经网络的工作原理有深入了解。据我们所知，目前尚不存在专为普通用户设计的解释方法，例如直观的解释界面。\n\n* [TED：教会AI解释其决策](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1811.04896v1.pdf)；诺埃尔·C·F·科德拉等人；由于人工智能系统在提升决策效率、规模、一致性、公平性和准确性方面的潜力，其应用正日益广泛。然而，许多此类系统的工作机制不透明，因此社会对这些系统提供决策解释的需求也不断增长。传统的解决思路是试图揭示或挖掘机器学习模型的内部运作机制，期望由此产生的解释能够被用户理解。与此不同，本文提出了一种全新的方法，即“教人解释决策”（TED）框架——一个简单实用的体系，能够生成符合用户认知模型的有意义解释。\n\n* [算法与人类决策中的透明度：是否存在双重标准？](https:\u002F\u002Flink.springer.com\u002Farticle\u002F10.1007\u002Fs13347-018-0330-6)；约翰·泽里利、阿里斯泰尔·诺特、詹姆斯·麦克劳林、科林·加瓦根；我们对算法决策工具不透明性的担忧持怀疑态度。尽管透明度和可解释性确实是算法治理中值得追求的重要目标，但我们担心自动化决策正被要求达到一种不切实际的高标准，这或许源于人们对人类决策者所能达到的透明度水平存在过高估计。在本文中，我们回顾了大量证据，表明许多人类决策同样存在透明度问题，并指出AI在这些方面并未显著优于或劣于人类；同时认为，某些关于可解释AI的监管提案可能将标准定得过高，甚至超出必要范围，反而不利于实际应用。实践理性的要求决定了行动的理由应以实践理性所能接受的程度来阐述。那些支持或替代实践推理的决策工具，不应被期待达到更高的标准。我们将这一要求置于丹尼尔·丹内特的“意向立场”理论框架下，认为既然人类行为的合理性通常以意向立场式的解释来表达，那么算法决策的合理性也应采取相同的形式。实际上，这意味着在算法决策的解释中，应优先选择与意向立场式解释相类似的类型，而非那些试图深入剖析决策工具内部结构的解释。\n\n* [机器学习中增强公平性的干预措施比较研究](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1802.04422.pdf)；索雷尔·A·弗里德勒、卡洛斯·谢伊德格、苏雷什·文卡塔苏布拉马尼安、索南·乔杜里、埃文·P·汉密尔顿、德里克·罗斯；计算机越来越多地被用于做出对人们生活产生重大影响的决策。然而，这些预测往往会对不同的人口子群体造成不成比例的影响。因此，公平性问题近年来备受关注，学术界也涌现出了许多旨在提升公平性的分类器和预测模型。本文旨在探讨以下问题：这些不同的技术在本质上如何相互比较？造成差异的原因又是什么？具体而言，我们希望引起人们对这类公平性干预措施中诸多未受足够重视方面的关注。为此，我们介绍了一个自行开发的公开基准测试平台，该平台允许我们在多种公平性指标及大量现有数据集上比较不同的算法。研究发现，尽管不同算法倾向于偏好特定形式的公平性保障，但这些指标之间却高度相关。此外，我们还发现，保持公平性的算法对数据集构成的变化较为敏感（我们的基准测试通过调整训练集与测试集的比例来模拟这种变化），这表明公平性干预措施的实际稳健性可能比先前预期的要差。\n\n* [三思而后行：通过预测性模拟评估离散选择模型](https:\u002F\u002Farxiv.org\u002Fabs\u002F1806.02307)；蒂莫西·布拉思韦特；图形化模型检验：通常，离散选择模型的研究者会不断开发更为复杂的模型和估计方法。然而，相较于模型开发和估计技术取得的显著进步，模型检验技术却相对滞后。许多选择模型研究者仅采用一些粗略的方法来评估已估计模型对现实的拟合程度。这些方法往往仅限于检查参数符号、模型弹性以及系数之间的比例关系。在本文中，我通过引入基于预测性模拟图形展示的模型检验程序，大幅扩展了离散选择模型研究者的评估工具箱。\n\n* [基于示例和特征重要性的黑盒机器学习模型解释](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1812.09044); Ajaya Adhikari, D.M.J Tax, Riccardo Satta, Matthias Fath; 随着机器学习模型精度的提高，它们往往变得更加复杂，难以被人类理解。这些模型的“黑箱”特性阻碍了其在实际应用中的推广，尤其是在医疗保健或国防等高风险领域，失败的后果可能极为严重。为机器学习模型或预测提供可理解且有用的解释，能够增强用户的信任。基于示例的推理——即利用以往处理类似任务的经验来做出决策——是一种广为人知的问题解决与论证策略。本文提出了一种新的解释提取方法LEAFAGE，用于任何黑盒机器学习模型的预测。该解释包括训练集中相似示例的可视化以及各特征的重要性。此外，这些解释具有对比性，旨在考虑用户的期望。LEAFAGE从对底层黑盒模型的忠实度和对用户的实用性两个方面进行了评估。结果表明，在具有非线性决策边界的机器学习模型上，LEAFAGE在忠实度方面总体优于当前最先进的方法LIME。研究还进行了一项用户研究，重点比较基于示例的解释与基于特征重要性的解释之间的差异。结果显示，就感知到的透明度、信息充分性、能力感和信心而言，基于示例的解释显著优于基于特征重要性的解释。出乎意料的是，在测试参与者所掌握的知识时发现，他们在查看基于特征重要性的解释后，对黑盒模型的理解反而比完全没有解释时更少。参与者认为基于特征重要性的解释模糊不清，且难以推广到其他实例。\n\n\n\n### 2017年\n\n* [可解释人工智能：警惕疯人院里的病人掌权——或者：我如何学会不再担忧并爱上社会与行为科学](https:\u002F\u002Farxiv.org\u002Fabs\u002F1712.00547); Tim Miller, Piers Howe, Liz Sonenberg; 在其经典著作《疯人院里的病人正在掌权：为什么高科技产品让我们抓狂，以及如何恢复理智》[2004，Sams印第安纳波利斯，美国]中，Alan Cooper认为，软件经常从用户角度设计不佳的一个主要原因在于，程序设计师而非交互设计师主导了设计决策。因此，程序员往往为自己设计软件，而不是为目标用户服务，他将这一现象称为“疯人院里的病人掌权”。本文指出，可解释人工智能也面临类似的命运。尽管可解释人工智能的重新兴起是积极的，但本文认为，大多数人工智能研究人员实际上是在为自己构建解释性工具，而非为最终用户服务。然而，如果研究人员和从业者能够理解、采纳、实施并改进哲学、心理学和认知科学等领域丰富而有价值的理论成果，并且在评估这些模型时更多地关注人本身而非技术本身，那么可解释人工智能才更有可能取得成功。通过对相关文献的初步梳理，我们证明了将更多来自社会与行为科学的研究成果融入可解释人工智能具有广阔的空间，并列举了一些与可解释人工智能相关的关键研究成果。\n\n* [用于在R中交互式诊断随机森林分类器的交互式图形](https:\u002F\u002Farxiv.org\u002Fabs\u002F1704.02502); Natalia da Silva, Dianne Cook, Eun-Kyung Lee; 
本文描述了如何组织数据并构建图表，以交互方式探索随机森林分类模型。随机森林分类器是集成学习的一种，由对多棵决策树进行自助采样并组合其结果而成。通过自助采样和整合多棵树的结果，会产生大量诊断指标，结合交互式图形，可以深入洞察高维空间中的类别结构。本文探讨了多个方面，用以评估模型复杂性、各子模型的贡献、变量重要性和降维，以及与单个观测相关的预测不确定性。这些思路被应用于随机森林算法和投影寻踪森林，但也同样适用于其他采用自助采样的集成模型。\n\n![https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_3361c622da3e.png](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_3361c622da3e.png)\n\n* [黑帽可视化](https:\u002F\u002Fidl.cs.washington.edu\u002Ffiles\u002F2017-BlackHatVis-DECISIVe.pdf); Michael Correll, Jeffrey Heer; 人们会以各种方式撒谎、误导或胡说八道。作为沟通形式之一，可视化也不例外。然而，我们用来描述人们如何利用可视化手段进行误导的语言却相对匮乏。例如，人们可能会“用可视化撒谎”或使用“欺骗性可视化”。在本文中，我们借鉴计算机安全领域的术语，扩展了不道德者（黑帽）为不良目的操纵可视化的方式。除了可视化文献中已有详细讨论的欺骗形式外，我们还重点关注那些忠实于原始数据的可视化（因此在常规的可视化语境下可能并不被视为欺骗），但仍然会对数据的感知产生负面影响。我们鼓励设计师从防御性和全面性的角度思考其视觉设计可能导致的数据误读问题。\n\n![https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_1b87d594402e.png](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_1b87d594402e.png)\n\n* [基于实例级解释的二分类器可视化诊断工作流](https:\u002F\u002Farxiv.org\u002Fabs\u002F1705.01968)；Josua Krause、Aritra Dasgupta、Jordan Swartz、Yindalon Aphinyanaphongs、Enrico Bertini；人机协作的数据分析应用要求机器学习模型具有更高的透明度，以便专家理解并信任其决策。为此，我们提出了一种可视分析工作流，帮助数据科学家和领域专家探索、诊断并理解二分类器的决策过程。该方法利用“实例级解释”——即用于解释单个样本的局部特征重要性度量——构建一系列可视化表示，引导用户进行深入调查。该工作流主要包含三种可视化表示及相应步骤：一种基于聚合统计，用于观察数据在正确与错误决策之间的分布情况；一种基于解释，用于理解哪些特征被用来做出这些决策；另一种则基于原始数据，以挖掘导致所观察到模式的潜在根本原因。\n* [公平森林：通过正则化树结构构建最小化模型偏差的随机森林](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1712.08197.pdf)；Edward Raff、Jared Sylvester、Steven Mills；近年来，机器学习算法输出中可能存在的不公平性问题，不仅在学术界，也在更广泛的社会层面引起了关注。令人惊讶的是，此前尚未有研究致力于开发用于构建公平决策树或公平随机森林的树结构生成算法。这类方法因其兼具可解释性、非线性建模能力以及易用性而广受欢迎。本文首次提出了公平决策树的构建技术。实验表明，“公平森林”既保留了树模型的优势，又在“群体公平”和“个体公平”两个维度上，同时实现了比其他方法更高的准确性和公平性。此外，我们还引入了新的公平性度量指标，能够处理多类别和连续型特征以及回归问题，而不仅仅是二元特征和标签。最后，我们提出了一种更为稳健的算法评估流程，该流程会综合考虑整个数据集，而非仅针对某一特定受保护属性。\n* [迈向可解释机器学习的严谨科学](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1702.08608.pdf)；Finale Doshi-Velez 和 Been Kim；在这种情况下，一个常见的应对策略是采用可解释性标准：如果系统能够解释其推理过程，我们便可以验证该推理是否符合这些辅助性标准。然而，目前对于什么是机器学习中的可解释性以及如何对其进行基准测试式评估，仍缺乏共识。在很大程度上，现有的评估方法都依赖于一种“一目了然”的主观判断。我们是否应该对这种缺乏严谨性的现状感到担忧？多目标权衡：目标不一致：伦理：安全：科学理解：\n* [注意力驱动的解释：为决策提供依据并指向证据](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1711.07373.pdf)；Dong Huk Park 等；由于在各类视觉任务中表现出色，深度模型已成为视觉决策问题的事实标准。我们提出了两个大规模数据集，其中包含针对不同活动分类决策（ACT-X）以及问答任务（VQA-X）的视觉和文本双重解释性标注。\n* [SPINE：稀疏可解释的神经网络词嵌入](https:\u002F\u002Farxiv.org\u002Fabs\u002F1711.08792)；Anant Subramanian、Danish Pruthi、Harsh Jhamtani、Taylor Berg-Kirkpatrick、Eduard Hovy；缺乏解释的预测其实际价值有限。神经网络模型的成功很大程度上归功于它们能够学习丰富、稠密且富有表现力的表征。然而，尽管这些表征能够捕捉数据背后的复杂性和潜在趋势，却远未达到可解释的程度。为此，我们提出了一种新颖的去噪k稀疏自编码器变体，能够从现有最先进的GloVe和word2vec等词嵌入基础上，生成高效且可解释的分布式词表示。通过大规模的人工评估，我们发现所生成的词嵌入相比原始的GloVe和word2vec嵌入，可解释性显著提升。此外，在一系列下游基准任务中，我们的词嵌入也优于现有的主流词嵌入。\n* [利用模型解释检测数据流中的概念漂移](https:\u002F\u002Fwww.researchgate.net\u002Fpublication\u002F320177686_Detecting_concept_drift_in_data_streams_using_model_explanation)；Jaka Demšar、Zoran Bosnic；解释工具的一个有趣应用场景——类似PDP的解释器可用于识别概念漂移。\n* [使用ExplainPrediction解释预测模型](http:\u002F\u002Fwww.informatica.si\u002Findex.php\u002Finformatica\u002Farticle\u002Fview\u002F2227\u002F1121)介绍了两种用于局部和全局解释的方法EXPLAIN和IME（R包）。\n* [为医疗领域构建可解释人工智能系统需要什么？](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1712.09923.pdf)；Andreas Holzinger、Chris Biemann、Constantinos Pattichis、Douglas 
Kell。本文概述了我们在相对较新的可解释人工智能领域的一些研究方向，并特别关注其在医学领域的应用，因为医学是一个非常特殊的领域。这主要是由于医疗专业人员通常需要处理分散、异构且复杂的数据源。本文聚焦于三大类数据：影像、组学数据和文本。我们认为，可解释人工智能的研究总体上有助于推动人工智能和机器学习在医疗领域的应用，尤其是能够提升透明度和信任度。然而，人工智能和机器学习技术的整体效能，往往受限于算法无法向人类专家解释其结果这一缺陷——而这正是医疗领域面临的一大挑战。\n\n### 2016年\n\n* [监督学习中的机会均等](https:\u002F\u002Farxiv.org\u002Fabs\u002F1610.02413)；莫里茨·哈特、埃里克·普赖斯、内森·斯雷布罗；我们提出了一种在监督学习中针对特定敏感属性进行歧视的判定标准，其目标是基于可用特征预测某个目标变量。假设我们已知预测器、目标变量以及个体所属受保护群体的相关数据，我们展示了如何对任意已学习的预测器进行最优调整，以消除按照我们的定义所界定的歧视。此外，我们的框架通过将低分类准确率的成本从弱势群体转移到决策者身上来改善激励机制，而决策者则可以通过提高分类准确性来应对这一变化。\n与其它研究一致，我们的概念是“无感知”的：它仅依赖于预测器、目标变量和受保护属性的联合统计分布，而不涉及对单个特征的解释。我们研究了基于此类无感知度量来定义和识别偏见的内在局限性，并阐明了不同无感知测试能够推断出什么、又不能推断出什么。\n我们以FICO信用评分为例说明了这一概念。\n![https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_be06e21002f3.png](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_be06e21002f3.png)\n\n* [与预测交互：黑箱机器学习模型的可视化检查](http:\u002F\u002Fperer.org\u002Fpapers\u002FadamPerer-Prospector-CHI2016.pdf)；约书亚·克劳斯、亚当·佩雷尔、肯尼·吴；介绍了Prospector——一款用于预测模型可视化探索的工具。其中包含一些有趣且新颖的想法，例如部分依赖条形图。Prospector可以比较多个模型，并同时展示局部和全局的解释。\n\n* [模型可解释性的迷思](https:\u002F\u002Farxiv.org\u002Fabs\u002F1606.03490)；扎卡里·C·利普顿；监督学习模型具有非凡的预测能力。但你真的能信任你的模型吗？它在实际部署中会表现良好吗？它还能告诉你关于世界的哪些信息呢？我们希望模型不仅性能优异，而且具备可解释性。然而，可解释性的任务似乎缺乏明确的定义。(…) 首先，我们考察了人们对可解释性感兴趣的各种动机，发现这些动机多种多样，有时甚至相互矛盾。接着，我们探讨了被认为能够赋予模型可解释性的特性及技术，指出面向人类的透明性和事后解释是两种相互竞争的概念。在整个过程中，我们讨论了不同概念的可行性和合理性，并质疑一种常见的说法——即线性模型是可解释的，而深度神经网络则不是。\n\n* [是什么让分类树易于理解？](https:\u002F\u002Fwww.sciencedirect.com\u002Fscience\u002Farticle\u002Fpii\u002FS0957417416302901)；罗克·皮尔塔韦尔、米特雅·卢什特雷克、马蒂亚日·甘姆斯、桑达·马丁契奇-伊普希奇；分类树因其易懂性而在实际应用中备受青睐。然而，关于影响其可理解性和可用性的参数的研究却十分匮乏。本文系统地研究了树结构参数（叶子数量、分支因子、树深度）以及可视化特性如何影响分类树的可理解性。此外，我们还分析了问题深度（回答有关分类树的问题时所需访问的最深叶子的深度）的影响，结果表明这是最重要的参数，尽管它通常被忽视。分析基于精心设计的问卷调查所得的实证数据，该问卷包含98个问题，由69名受访者作答。论文评估了几种树的可理解性指标，并提出了两个新指标（叶子深度加权总和以及从根到叶子路径上分支因子的加权总和），这些指标得到了调查结果的支持。新可理解性指标的主要优势在于，它们不仅考虑了树的结构，还结合了树的语义信息。\n\n### 2015年\n\n* [基于残差的预测曲线——评估预测模型性能的可视化工具](https:\u002F\u002Fwww.ncbi.nlm.nih.gov\u002Fpubmed\u002F26676377)；朱塞佩·卡萨利基奥、伯恩德·比施尔、安妮-洛尔·布勒斯泰克斯、马蒂亚斯·施密德；RBP（基于残差的预测）曲线同时反映了预测模型的校准程度和区分能力。此外，该曲线还可方便地用于进行有效的性能检验和指标比较。RBP曲线已在R包RBPcurve中实现。\n* [基于规则与贝叶斯分析的可解释分类器：构建更好的脑卒中预测模型](https:\u002F\u002Farxiv.org\u002Fabs\u002F1511.01644)；本杰明·莱瑟姆、辛西娅·鲁丁、泰勒·H·麦科米克、大卫·马迪根；我们的目标是构建不仅准确，而且对人类专家可解释的预测模型。我们的模型是决策列表，由一系列“如果…那么…”语句组成（例如，“如果血压高，则发生脑卒中”），这些语句将高维的多变量特征空间离散化为一系列简单且易于理解的决策陈述。我们引入了一种名为贝叶斯规则列表的生成模型，该模型给出了可能的决策列表的后验分布，并采用了一种新颖的先验结构来鼓励稀疏性。\n\n### 2009年\n\n* [如何解释个体分类决策](https:\u002F\u002Farxiv.org\u002Fpdf\u002F0912.1128.pdf)，大卫·贝伦斯、蒂蒙·施罗特、斯特凡·哈梅林、川边元章、卡佳·汉森、克劳斯-罗伯特·穆勒；（摘自摘要）目前唯一能够提供此类解释的方法是决策树。……一种模型无关的方法，引入了“解释向量”，用以总结模型决策随输入变化的陡峭程度。\n\n### 2005年\n\n* [隐性知识的暴政：人工智能告诉我们关于知识表示的什么](https:\u002F\u002Fwww.researchgate.net\u002Fpublication\u002F224755645_The_Tyranny_of_Tacit_Knowledge_What_Artificial_Intelligence_Tells_us_About_Knowledge_Representation)；库尔特·D·芬斯特马赫；波兰尼的隐性知识概念揭示了一个观点：“我们所知道的往往多于我们所能言说的。”许多知识管理领域的研究者都曾利用隐性知识的概念，将无法正式表述的知识（隐性知识）与能够被正式表述的知识（显性知识）区分开来。我则认为，知识管理研究者对隐性知识的过度推崇，由于两个重要原因，阻碍了潜在的富有成效的工作。首先，无法将知识明确表达出来并不意味着该知识就无法被正式表示；其次，即使承认人们头脑中的隐性知识无法被形式化，也不能排除计算机系统通过其他表示方式完成相同任务的可能性。通过对人工智能相关工作的回顾，我将论证，为了研究和构建知识管理系统，我们需要一个更为丰富的认知与知识表示模型。\n\n### 2004年\n\n* [在黑盒函数中发现加性结构](https:\u002F\u002Fdl.acm.org\u002Fcitation.cfm?doid=1014052.1014122)，贾尔斯·胡克\n\n\n## 图书\n\n### 2020年\n\n* 
[人工智能的鲁棒性与可解释性](https:\u002F\u002Fpublications.jrc.ec.europa.eu\u002Frepository\u002Fbitstream\u002FJRC119336\u002Fdpad_report.pdf)；哈蒙·罗南、容克莱维茨·亨里克、桑切斯·伊格纳西奥著；从技术解决方案到政策解决方案；JRC技术报告；由欧盟委员会的科学与知识服务机构——联合研究中心（JRC）发布的技术报告。\n\n![jrc_xai](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_de5cd8e3c268.png)\n\n\n### 2019年\n\n* [预测模型：探索、解释与调试](https:\u002F\u002Fgithub.com\u002Fpbiecek\u002FPM_VEE)；普热米斯瓦夫·别切克、托马什·布尔齐科夫斯基著。如今，预测建模的瓶颈并不在于数据不足、计算能力匮乏、算法不够完善或模型不够灵活，而在于缺乏用于模型验证、模型探索以及解释模型决策的工具。因此，在本书中，我们介绍了一系列可用于此目的的方法。\n\n![drwhy_local_explainers.png](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_dcd02fdc6284.png)\n\n* [可解释的人工智能：深度学习的解读、解释与可视化](https:\u002F\u002Fwww.springer.com\u002Fgp\u002Fbook\u002F9783030289539)；萨梅克、蒙塔冯、韦达尔迪、汉森、穆勒—克劳斯等著。能够自主做出决策并执行任务的“智能”系统的开发，可能会带来更快、更一致的决策。然而，人工智能技术更广泛普及的一个限制因素，是将人类控制权和监督权交予“智能”机器所带来的固有风险。对于涉及关键基础设施、影响人类福祉或健康的敏感任务而言，必须严格限制不当、不稳健且不安全的决策和行动的可能性。在部署人工智能系统之前，我们迫切需要对其行为进行验证，从而确保其在真实环境中部署后仍能按预期运行。为实现这一目标，人们一直在探索让人类能够验证人工智能决策结构与其自身基于事实的知识之间是否一致的方法。可解释的人工智能（XAI）作为人工智能的一个子领域，专注于以系统化且易于理解的方式向人类揭示复杂的人工智能模型。\n\n本书包含的22章，及时呈现了近年来提出的可解释人工智能及相关技术的算法、理论和应用，反映了该领域的当前讨论，并指明了未来的发展方向。全书共分为六部分：迈向人工智能透明度；人工智能系统的解释方法；解释人工智能系统的决策；评估可解释性和解释结果；可解释人工智能的应用；以及可解释人工智能的软件工具。\n\n### 2018年\n\n* [使用H2O Driverless AI进行机器学习可解释性分析](http:\u002F\u002Fdocs.h2o.ai\u002Fdriverless-ai\u002Flatest-stable\u002Fdocs\u002Fbooklets\u002FMLIBooklet.pdf)；帕特里克·霍尔、纳夫迪普·吉尔、梅根·库尔卡、温·范著；\n* [机器学习可解释性导论](https:\u002F\u002Fwww.oreilly.com\u002Flibrary\u002Fview\u002Fan-introduction-to\u002F9781492033158\u002F)；纳夫迪普·吉尔、帕特里克·霍尔著；书中包含大量优秀图表，对模型可解释性的常用技术进行了高层次概述。\n* [可解释的机器学习](https:\u002F\u002Fchristophm.github.io\u002Finterpretable-ml-book\u002F)；克里斯托夫·莫尔纳尔著；介绍了最流行的方法（LIME、PDP、SHAP等）以及关于可解释性的更为宏观的概览。\n\n\n## 工具\n\n### 2019年\n* [ExplainX](https:\u002F\u002Fgithub.com\u002FexplainX\u002Fexplainx)；ExplainX 是一个快速、轻量且可扩展的可解释人工智能框架，旨在帮助数据科学家仅用一行代码就能向业务相关方解释任何黑盒模型。该库由纽约大学 VIDA 实验室的人工智能研究人员维护。详细的文档也可在此[网站](https:\u002F\u002Fwww.explainx.ai\u002F)上找到。\n\n![https:\u002F\u002Fcamo.githubusercontent.com\u002F03f9e0729544717710427ed393dae32b8d055159\u002F68747470733a2f2f692e6962622e636f2f7734534631474a2f47726f75702d322d312e706e67](https:\u002F\u002Fcamo.githubusercontent.com\u002F03f9e0729544717710427ed393dae32b8d055159\u002F68747470733a2f2f692e6962622e636f2f7734534631474a2f47726f75702d322d312e706e67)\n\n* [EthicalML \u002F xai](https:\u002F\u002Fgithub.com\u002FEthicalML\u002Fxai)；XAI 是一个以 AI 可解释性为核心设计的机器学习库。它包含多种工具，可用于对数据和模型进行分析与评估。XAI 库由伦理人工智能与机器学习研究所维护，并基于“负责任机器学习的八项原则”开发而成。您可以在 https:\u002F\u002Fethicalml.github.io\u002Fxai\u002Findex.html 找到相关文档。\n\n![https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_aacd2288881e.png](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_aacd2288881e.png)\n\n* [Aequitas：偏见与公平审计工具包](https:\u002F\u002Farxiv.org\u002Fabs\u002F1811.05577v2)；近期的研究引发了人们对当前使用的 AI 系统中潜在无意偏见的担忧，这些偏见可能基于种族、性别或宗教等特征而对个人造成不公平的影响。尽管近年来提出了许多偏见度量指标和公平性定义，但尚未就应采用哪种指标或定义达成共识，且可用于实际操作的资源也十分有限。Aequitas 能够帮助数据科学家、机器学习研究人员以及政策制定者在开发和部署算法决策系统时做出知情且公平的决策。\n\n![fairnessTree.png](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_2e1fbfdbc401.png)\n\n* [tf-explain](https:\u002F\u002Fgithub.com\u002Fsicara\u002Ftf-explain)；tf-explain 将可解释性方法实现为 TensorFlow 2.0 的回调函数，从而简化神经网络的理解过程。详情请参阅[介绍 tf-explain：TensorFlow 2.0 的可解释性工具](https:\u002F\u002Fblog.sicara.com\u002Ftf-explain-interpretability-tensorflow-2-9438b5846e35)。\n\n* [由伦理人工智能与机器学习研究所维护的 XAI 
库](https:\u002F\u002Fgithub.com\u002FEthicalML\u002Fxai)；XAI 是一个以 AI 可解释性为核心设计的机器学习库。它包含多种工具，可用于对数据和模型进行分析与评估。该库由伦理人工智能与机器学习研究所维护，并基于[负责任机器学习的八项原则](https:\u002F\u002Fethical.institute\u002Fprinciples.html)开发而成。\n\n* [微软的 InterpretML](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Finterpret)；微软推出的一款与机器学习模型可解释性相关的 Python 库。\n\n![interpretML.png](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_b95cb8fc2191.png)\n\n* [利用佩尔的结构因果模型从观测数据中评估因果关系](https:\u002F\u002Fblog.methodsconsultants.com\u002Fposts\u002Fpearl-causality\u002F)；\n* [sklearn_explain](https:\u002F\u002Fgithub.com\u002Fantoinecarme\u002Fsklearn_explain)；模型解释能够帮助理解各个预测变量对个体评分构成的影响。\n* [heatmapping.org](http:\u002F\u002Fwww.heatmapping.org\u002F)；该网页旨在汇集弗劳恩霍夫 HHI 研究所、柏林工业大学和新加坡科技与设计大学联合项目所产生的出版物及软件，致力于开发新的方法来理解最先进机器学习模型的非线性预测。机器学习模型，尤其是深度神经网络（DNN），具有极强的预测能力，但在许多情况下却难以被人类理解。解释非线性分类器对于建立对预测结果的信任、识别潜在的数据选择偏差或伪影至关重要。该项目特别研究如何将预测分解为各个输入变量的贡献，以便将这种分解结果（即解释）以与输入数据相同的方式可视化。\n\n* [iNNvestigate 神经网络！](https:\u002F\u002Fgithub.com\u002Falbermax\u002Finnvestigate)；由 [heatmapping.org](http:\u002F\u002Fwww.heatmapping.org\u002F) 的作者创建的一个工具箱，旨在更好地理解神经网络。它包含了显著图、反卷积网络、引导反向传播、平滑梯度、集成梯度、LRP、PatternNet 和归因等多种方法的实现。该库提供了一个通用接口和开箱即用的实现方案，适用于多种分析方法。\n\n![innvestigate.PNG](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_f60d49b63985.png)\n\n* [ggeffects](https:\u002F\u002Fstrengejacke.wordpress.com\u002F2019\u002F01\u002F14\u002Fggeffects-0-8-0-now-on-cran-marginal-effects-for-regression-models-rstats\u002F)；丹尼尔·吕德克；用于计算统计模型的边际效应，并将结果以整洁的数据框形式返回。这些数据框可以直接与 ggplot2 包配合使用。边际效应可以针对多种不同的模型进行计算。交互项、样条曲线和多项式项同样受到支持。主要函数包括 ggpredict()、ggemmeans() 和 ggeffect()。此外，还有一个通用的 plot() 方法，可用于借助 ggplot2 绘制结果。\n\n* [对比 LRP](https:\u002F\u002Fgithub.com\u002FJindong-Explainable-AI\u002FContrastive-LRP)——一篇论文《通过对比反向传播理解 CNN 的个体决策》（https:\u002F\u002Farxiv.org\u002Fpdf\u002F1812.02100.pdf）的 PyTorch 实现。该代码生成 CLRP 显著图，用于解释 VGG16 模型中的单个分类结果。\n\n* [相对归因传播](https:\u002F\u002Fgithub.com\u002FwjNam\u002FRelative_Attributing_Propagation)——一种新的 DNN 输出预测分解方法，它根据各层之间的相对影响，将归因分为相关（正向）和无关（负向）两部分。该方法的详细描述发表在论文 https:\u002F\u002Farxiv.org\u002Fpdf\u002F1904.00605.pdf 中。\n\n### 2018年\n\n* [KDD 2018：医疗AI中的可解释模型](https:\u002F\u002Fnotepad.mmakowski.com\u002FTech\u002FKDD%202018:%20Explainable%20Models%20for%20Healthcare%20AI)；该教程由KenSci公司的三位专家主讲，其中包括一位数据科学家和一位临床医生。会议的核心观点是，在机器学习的医疗应用中，可解释性尤为重要，因为决策的影响范围广、错误代价高，且涉及公平性和合规性要求。教程详细介绍了可解释性的多个方面，并探讨了可用于解释模型预测结果的技术。\n* [MAGMIL：面向可解释机器学习的模型无关方法](https:\u002F\u002Fgithub.com\u002Fankitbit\u002FMAGMIL)；欧盟将于2018年5月25日起生效的新《通用数据保护条例》将对机器学习算法的常规使用产生潜在影响，它限制了可能“显著影响”用户的自动化个人决策（即基于用户级特征进行决策的算法）。该法律还将有效确立“知情权”，允许用户要求对其作出的算法决策提供解释。鉴于这些对机器学习系统使用的严格规范，我们正致力于提高模型的可解释性。在努力深入理解机器学习模型决策的同时，从机器学习系统中提取解释信息——也称为模型无关的可解释性方法——相比特定于模型的可解释性方法，在灵活性上具有一定的优势。\n* [用于探究神经网络预测的工具箱！](https:\u002F\u002Fgithub.com\u002Falbermax\u002Finnvestigate)；Maximilian Alber；近年来，神经网络在目标检测、语音识别等诸多领域不断刷新技术前沿。然而，尽管取得了成功，神经网络通常仍被视为黑箱，其内部工作机制尚未完全阐明，预测依据也并不清晰。为了更好地理解神经网络，研究者们提出了多种方法，如显著性图、反卷积网络、引导反向传播、平滑梯度、积分梯度、LRP、PatternNet及归因分析等。由于缺乏统一的参考实现，对比这些方法的工作量极大。该库通过提供通用接口和开箱即用的实现，解决了这一问题，旨在简化神经网络预测的分析过程！\n* [黑盒审计、认证与消除差异性影响](https:\u002F\u002Fgithub.com\u002Falgofairness\u002FBlackBoxAuditing)；该仓库包含梯度特征审计（GFA）的示例实现，旨在使其能够推广到大多数数据集。有关修复流程的更多信息，请参阅我们的论文《认证与消除差异性影响》。关于完整的审计流程，请参阅我们的论文《针对间接影响的黑盒模型审计》。\n* 
[Skater：用于模型解释的Python库](https:\u002F\u002Fgithub.com\u002Fdatascienceinc\u002FSkater)；Skater是一个统一的框架，旨在为各种类型的模型提供解释能力，帮助构建适用于实际应用场景的可解释机器学习系统（我们正在积极努力实现对所有类型模型的忠实可解释性）。它是一个开源的Python库，设计用于揭示黑箱模型的内在结构，无论是在全局层面（基于完整数据集的推断）还是局部层面（针对单个预测的推断）。\n* [Weight Watcher](https:\u002F\u002Fgithub.com\u002FCalculatedContent\u002FWeightWatcher)；Charles Martin；Weight Watcher用于分析深度神经网络（DNN）权重矩阵中的厚尾分布。该工具无需测试集，即可预测一系列DNN模型（如VGG11、VGG13等）或整个ResNet系列模型的泛化准确率趋势。其原理基于近期关于DNN中重尾自正则化的研究成果。\n* [对抗鲁棒性工具箱 - ART](https:\u002F\u002Fgithub.com\u002FIBM\u002Fadversarial-robustness-toolbox)；这是一个专门用于对抗性机器学习的库，旨在快速构建和分析针对机器学习模型的攻击与防御方法。对抗鲁棒性工具箱提供了许多最先进的分类器攻击与防御方法的实现。\n* [模型描述器](https:\u002F\u002Fgithub.com\u002FDataScienceSquad\u002Fmodel-describer)；一个生成HTML报告的Python脚本，用于总结预测模型的信息，交互性强且内容丰富。\n* [AI公平性360](https:\u002F\u002Fgithub.com\u002FIBM\u002Faif360)；由IBM开发的Python库，用于帮助检测和消除机器学习模型中的偏差。[相关介绍](https:\u002F\u002Farxiv.org\u002Fabs\u002F1810.01943)\n* [What-If工具：无代码的机器学习模型探查工具](https:\u002F\u002Fai.googleblog.com\u002F2018\u002F09\u002Fthe-what-if-tool-code-free-probing-of.html)；这是谷歌开发的一款交互式What-If场景工具，属于TensorBoard的一部分。\n\n### 2017年\n\n* [分类特征的影响编码](https:\u002F\u002Fgithub.com\u002FDpananos\u002FCategorical-Features)；想象一下，你正在处理一个包含美国所有邮政编码的数据集。这个数据集几乎有4万个唯一的类别。如果你打算进行预测建模，该如何处理这类数据呢？独热编码并不能带来任何有用的结果，因为它会为你的数据集增加4万个稀疏变量。而直接丢弃这些数据又可能浪费掉宝贵的信息，因此也不太合适。在这篇文章中，我将探讨如何使用一种称为“影响编码”的策略来处理高基数的分类变量。为了说明这一点，我使用了一个二手车销售的数据集。这个问题特别适合演示，因为其中包含多个具有大量水平的分类特征。让我们开始吧。\n* [FairTest](https:\u002F\u002Fgithub.com\u002Fcolumbia\u002Ffairtest)；FairTest使开发者或审计机构能够发现并测试算法输出与由受保护特征标识的特定用户子群体之间是否存在不合理的关联。\n* [Explanation Explorer](https:\u002F\u002Fgithub.com\u002Fnyuvis\u002Fexplanation_explorer)；这是一个用Python实现的可视化工具，用于通过实例级解释（局部解释器）对二元分类器进行视觉诊断。\n* [ggeffects](https:\u002F\u002Fstrengejacke.wordpress.com\u002F2017\u002F05\u002F24\u002Fggeffects-create-tidy-data-frames-of-marginal-effects-for-ggplot-from-model-outputs-rstats\u002F)；从模型输出创建适用于ggplot的边际效应整洁数据框。ggeffects包的目标与broom包类似：将“不整洁”的输入转换为整洁的数据框，尤其便于后续与ggplot结合使用。然而，ggeffects并不返回模型摘要；相反，该包会计算统计模型在均值处的边际效应或平均边际效应，并以整洁的数据框形式（更准确地说，是tibbles格式）返回结果。\n\n## 文章\n\n### 2019年\n\n* [AI黑箱恐怖故事——当透明度比以往任何时候都更加重要时](https:\u002F\u002Fmedium.com\u002F@ODSC\u002Fai-black-box-horror-stories-when-transparency-was-needed-more-than-ever-3d6ac0439242)；可以说，2019年数据科学领域最大的争论之一就是对AI可解释性的需求。解释机器学习模型的能力正逐渐成为企业决策中接受统计模型的关键因素。企业利益相关者要求透明地了解这些算法是如何以及为何做出特定预测的。对机器学习中任何潜在偏见的清晰理解，始终处于数据科学团队需求的最前沿。因此，大数据生态系统中的许多顶级供应商都在推出新工具，试图破解打开AI“黑箱”这一难题。\n\n![https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_e54b757d6536.png](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_readme_e54b757d6536.png)\n\n* [人工智能面临“可重复性”危机](https:\u002F\u002Fwww.wired.com\u002Fstory\u002Fartificial-intelligence-confronts-reproducibility-crisis\u002F)；几年前，麦吉尔大学的计算机科学教授乔埃尔·派诺正在帮助她的学生设计一种新的算法，但他们的工作陷入僵局。她的实验室研究强化学习，这是一种人工智能技术，常被用来让虚拟角色（如“半猎豹”和“蚂蚁”等）在虚拟世界中自主学习运动方式。这是构建自主机器人和自动驾驶汽车的基础。派诺的学生希望改进另一个实验室的系统，但首先他们需要重新构建该系统，然而由于某种未知原因，他们的设计始终无法达到预期效果。直到有一天，学生们尝试了一些并未出现在另一家实验室论文中的“创造性操作”。\n* [模型解释器与新闻发言人——直接优化机器学习的信任度可能有害](https:\u002F\u002Fmedium.com\u002F@stuart.reynolds\u002Fmodel-explainers-and-the-press-secretary-optimizing-for-trust-in-machine-learning-may-be-harmful-84275b27bea6)；如果黑箱模型解释器旨在优化人类对机器学习模型的信任，那么我们为什么不能期待黑箱模型解释器会像一位不诚实的政府新闻发言人一样行事呢？\n* 
[破解黑箱：Python中可解释机器学习模型的重要入门指南](https:\u002F\u002Fwww.analyticsvidhya.com\u002Fblog\u002F2019\u002F08\u002Fdecoding-black-box-step-by-step-guide-interpretable-machine-learning-models-python\u002F)；安基特·乔杜里；可解释的机器学习是每位数据科学家都应了解的关键概念；那么，如何构建可解释的机器学习模型呢？本文将提供一个框架，并且我们还将使用Python编写这些可解释的机器学习模型。\n\n* [我，黑箱：可解释的人工智能与人类 deliberative 过程的局限](https:\u002F\u002Fwarontherocks.com\u002F2019\u002F07\u002Fi-black-box-explainable-artificial-intelligence-and-the-limits-of-human-deliberative-processes\u002F)；在战场上使用人工智能（AI）的伦理问题上，关于理解机器内部运作的重要性已被广泛讨论。致命性自主武器政府专家组会议的代表们仍在不断提出这一议题。法律和科学学者对此表达了大量担忧。一位评论员总结道：“为了让人类决策者能够在涉及 AI 的道德相关决策中保持自主性，他们需要清晰地洞察 AI 黑箱，理解数据、其来源以及算法逻辑。”\n* [教授人工智能、伦理、法律与政策](https:\u002F\u002Farxiv.org\u002Fabs\u002F1904.12470)；Asher Wilk；网络空间以及利用人工智能（AI）开发智能系统，为计算机专业人士、数据科学家、监管机构和政策制定者带来了新的挑战。例如，自动驾驶汽车引发了全新的技术、伦理、法律和政策问题。本文提出了一门名为“计算机、伦理、法律与公共政策”的课程，并给出了相应的课程大纲。文章探讨了构建和使用软件及人工智能时所涉及的伦理、法律和公共政策问题，同时阐述了与 AI 系统相关的伦理原则和价值观。\n* [可解释人工智能简介及其必要性](https:\u002F\u002Fwww.kdnuggets.com\u002F2019\u002F04\u002Fintroduction-explainable-ai.html)；Patrick Ferris；今年我有幸参加了知识发现与数据挖掘（KDD）大会。在众多报告中，有两个研究方向尤为引人关注：一是如何为图结构找到有意义的表示形式，以输入到神经网络中。DeepMind 的 Oriol Vinyals 介绍了他们的消息传递神经网络。另一个方向，也是本文的重点，就是可解释的人工智能模型。随着我们开发出更多创新的神经网络应用，关于“它们是如何工作的？”这一问题变得愈发重要。\n* [AI 黑箱解释难题](https:\u002F\u002Fercim-news.ercim.eu\u002Fen116\u002Fspecial\u002Fthe-ai-black-box-explanation-problem)；从高层次来看，我们将这一问题归纳为两种不同的模式：设计驱动型解释（XbD），即给定一组用于训练的决策记录数据集，如何开发一个兼具决策功能及其解释的机器学习模型；黑箱解释（BBX），即给定由黑箱决策模型生成的决策记录，如何为其重建解释。\n* [VOZIQ 推出“Agent Connect”，一款可解释的人工智能产品，助力大规模客户留存计划](https:\u002F\u002Fwww.einnews.com\u002Fpr_news\u002F481152181\u002Fvoziq-launches-agent-connect-an-explainable-ai-product-to-enable-large-scale-customer-retention-programs)；美国弗吉尼亚州雷斯顿，2019 年 4 月 3 日 \u002FEINPresswire.com\u002F——VOZIQ 是一家基于云的企业级应用解决方案提供商，旨在帮助实现经常性收入的企业开展大规模预测性客户留存计划。该公司宣布推出其全新的可解释人工智能（XAI）产品“Agent Connect”，以提升企业最关键资源——客户留存专员——的主动式留存能力。“Agent Connect”是 VOZIQ 最新推出的下一代可解释人工智能产品，它能够整合多种留存风险信号，结合客户表达与推断的需求、情感倾向、流失驱动因素及行为模式，从而从数百万次客户互动中直接挖掘出导致客户流失的关键信息，并将这些洞察转化为易于执行的、具有指导性的预测性客户健康状况分析。\n* [降低机器学习和人工智能的风险](https:\u002F\u002Fwww.mckinsey.com\u002Fbusiness-functions\u002Frisk\u002Four-insights\u002Fderisking-machine-learning-and-artificial-intelligence)；机器学习和人工智能有望通过利用海量数据构建模型，从而改善决策、定制服务并加强风险管理，进而彻底改变银行业。据麦肯锡全球研究院估计，这将在银行业创造超过 2500 亿美元的价值。然而，这一趋势也存在潜在风险：机器学习模型会放大某些模型风险因素。尽管许多银行，尤其是那些位于监管要求严格的司法管辖区的银行，已经建立了验证框架和实践来评估并缓解传统模型的相关风险，但这些措施往往不足以应对机器学习模型带来的风险。鉴于此，许多银行正采取谨慎态度，仅将机器学习模型应用于低风险场景，如数字营销。考虑到可能面临的财务、声誉和监管风险，这种谨慎做法完全可以理解。例如，银行可能会因违反反歧视法律而面临巨额罚款——正是这一担忧促使一家银行禁止其人力资源部门使用机器学习简历筛选工具。不过，更好的方法——也是银行若想充分受益于机器学习模型所必须采取的唯一可持续方式——是加强模型风险管理。\n* [可解释的人工智能应帮助我们避免第三次“AI 冬季”](https:\u002F\u002Fwww.computing.co.uk\u002Fctg\u002Fopinion\u002F3073390\u002Fexplainable-ai-should-help-us-avoid-a-third-ai-winter)；去年在整个欧洲生效的《通用数据保护条例》（GDPR）确实提高了消费者和企业的个人数据意识。然而，过度纠正数据收集行为可能会对关键的人工智能发展产生负面影响，这一风险不容忽视。这不仅是数据科学家面临的问题，同样也影响着那些利用 AI 解决方案提升竞争力的企业。潜在的负面影响不仅会波及实施 AI 的企业，还可能导致消费者错失 AI 为其依赖的产品和服务所带来的益处。\n* [可解释的人工智能：从预测到理解](https:\u002F\u002Fmedium.com\u002F@ODSC\u002Fexplainable-ai-from-prediction-to-understanding-38c81c11460)；仅仅做出预测是不够的。有时，我们需要获得深入的理解。仅仅建立了一个模型，并不意味着我们就真正了解它的运作机制。在传统的机器学习中，算法会输出预测结果，但在某些情况下，这并不足以解决问题。George Cevora 博士解释了为什么 AI 的黑箱模式并不总是适用，以及如何从预测转向理解。\n* [为什么可解释的人工智能（XAI）是营销和电子商务的未来](https:\u002F\u002Fwww.the-future-of-commerce.com\u002F2019\u002F03\u002F11\u002Fwhat-is-explainable-ai-xai\u002F)；“新一代机器学习系统将具备解释其推理过程、描述其优缺点以及传达对其未来行为方式理解的能力。”——DARPA 负责人 David Gunning。随着机器学习在商业和内容领域个性化客户体验中的作用日益增强，最有力的机会之一就是开发能够让营销人员通过可操作的洞察最大化每一分营销预算投入的系统。然而，AI 在商业中用于提供可操作性洞察的同时，也带来了一个挑战：营销人员如何才能了解并信任 AI 
系统作出行动建议背后的逻辑？由于 AI 借助极其复杂的流程做出决策，其决策过程往往对终端用户而言不透明。\n* [可解释的 AI 或者我如何学会不再担心并信任 AI](https:\u002F\u002Ftowardsdatascience.com\u002Finterpretable-ai-or-how-i-learned-to-stop-worrying-and-trust-ai-e61f9e8ee2c2) 构建稳健且无偏见的 AI 应用程序的技术；Ajay Thampi；仅在过去五年里，AI 研究人员就在图像识别、自然语言理解以及棋类游戏等领域取得了重大突破！随着企业开始考虑将关键决策权交予 AI，尤其是在医疗和金融等行业，人们对复杂机器学习模型缺乏理解的问题显得尤为突出。这种理解不足可能导致模型传播偏见，我们在刑事司法、政治、零售、面部识别和语言理解等领域都已目睹了不少此类案例。\n* [探寻可解释的人工智能](https:\u002F\u002Fwww.geopoliticalmonitor.com\u002Fin-search-of-explainable-artificial-intelligence\u002F)；如今，如果一位初创企业家想要弄清银行为何拒绝了他的贷款申请，或者如果一位年轻毕业生想知道她心仪的大型企业为何没有邀请她参加面试，他们将无法得知这些决定背后的原因。因为银行和企业都使用了人工智能（AI）算法来决定贷款或求职申请的结果。实际上，这意味着如果你的贷款申请被拒，或者你的简历未获采纳，你将不会收到任何解释。这种情况令人尴尬，也使得 AI 技术常常只能提供需要人类进一步验证的解决方案。\n* [可解释的人工智能与规则的复兴](https:\u002F\u002Fwww.forbes.com\u002Fsites\u002Ftomdavenport\u002F2019\u002F03\u002F18\u002Fexplainable-ai-and-the-rebirth-of-rules\u002F)；人工智能（AI）常被称为“预测机器”。总体而言，这项技术非常擅长生成自动化预测。然而，如果你想在受监管的行业中使用人工智能，最好能够解释机器是如何预测欺诈行为、犯罪嫌疑人、不良信用风险或适合参与药物试验的候选人的。国际律师事务所泰勒·韦辛希望利用 AI 作为分诊工具，帮助其客户评估其可能受到《现代奴隶法》或《反海外腐败法》等法规约束的程度。这些客户通常在全球范围内拥有供应商或进行并购，因此需要系统的尽职调查来确定哪些地方需要更深入的审查。供应链尤其复杂，特别是当其中包含数百家小型供应商时。关于规则引擎即将消亡的传言被大大夸大了。\n* [用更智能的机器学习打击歧视](https:\u002F\u002Fresearch.google.com\u002Fbigpicture\u002Fattacking-discrimination-in-ml\u002F)；这里我们讨论的是“阈值分类器”，它是某些机器学习系统中与歧视问题密切相关的一部分。阈值分类器本质上是一种二元决策机制，将事物归入某一类别或另一类别。我们探讨了这些分类器的工作原理、可能存在的不公平之处，以及如何将不公平的分类器改进为更加公平的形式。以贷款审批为例，银行可能会根据一个自动计算出的单一数值（如信用评分）决定是否批准贷款。\n* [更精准的偏好预测：可调谐且可解释的推荐系统](https:\u002F\u002Fblog.insightdatascience.com\u002Ftunable-and-explainable-recommender-systems-cd52b6287bad)；Amber Roberts；广告推荐应当让每位消费者都能理解，但能否在不牺牲准确性的前提下提高其可解释性呢？\n* [机器学习正在引发科学危机](https:\u002F\u002Fwww.governmentciomedia.com\u002Fmachine-learning-creating-crisis-science)；Kevin McCaney；机器学习技术的广泛应用正导致越来越多的研究成果无法被其他研究人员重复验证。\n* [人工智能与伦理](https:\u002F\u002Fharvardmagazine.com\u002F2019\u002F01\u002Fartificial-intelligence-limitations)；Jonathan Shaw；2018 年 3 月，晚上 10 点左右，Elaine Herzberg 正在亚利桑那州坦佩市的一条街道上推行自行车，却被一辆自动驾驶汽车撞倒并身亡。尽管当时车内有驾驶员，但完全掌控车辆的是一个自主系统——人工智能。这起事件以及其他涉及人与 AI 技术交互的事故，都引发了诸多伦理和准法律问题。该系统的程序员在防止其产品夺走人类生命方面负有哪些道德义务？而 Herzberg 的死又该由谁来负责？是坐在驾驶座上的人？还是测试车辆性能的公司？抑或是 AI 系统的设计者，甚至是其车载传感器设备的制造商？\n* [构建值得信赖的人机合作关系](https:\u002F\u002Fwww.darpa.mil\u002Fnews-events\u002F2019-01-31)；无论是体育、商业还是军事团队，高效协作的关键要素之一就是信任，而信任的基础部分在于团队成员对彼此履行职责能力的相互理解。在组建人类与自主系统组成的高效团队时，人类需要及时、准确地了解其机器伙伴的能力、经验和可靠性，才能在动态环境中对其产生信任感。目前，当天气或光照等条件变化导致机器能力波动时，自主系统尚无法提供实时反馈。机器自身对其能力缺乏认知，也无法将其传达给人类伙伴，这降低了信任度并削弱了团队效能。\n* [增强分析与可解释的人工智能将如何在 2019 年及以后引发颠覆性变革](https:\u002F\u002Fwww.analyticsinsight.net\u002Fhere-is-how-augmented-analytics-and-explainable-ai-will-cause-a-disruption-in-2019-beyond\u002F)；Kamalika Some；人工智能（AI）是一项价值 15 万亿美元的变革性机遇，吸引了所有科技用户、领导者和意见领袖的关注。然而，随着 AI 技术的日益成熟，算法“黑箱”在决策过程中占据了主导地位。为了确保最终结果的可信度和利益相关者的信任，并充分利用这一机遇，我们必须了解算法得出其推荐或决策的依据，而这正是可解释的人工智能（XAI）的基本前提。\n* [为什么“可解释的人工智能”是打击金融犯罪的下一个前沿](http:\u002F\u002Fwww.bankingexchange.com\u002Fnews-feed\u002Fitem\u002F7785-why-explainable-ai-is-the-next-frontier-in-financial-crime-fighting)；Chad Hetherington；金融机构（FIs）必须在管理合规预算的同时，不偏离其核心职能和质量控制目标。为此，许多机构开始采用 AI 和机器学习等创新技术，自动化耗时且重复性强的数据收集和警报筛选工作，从而释放时间紧张的分析师，使其能够专注于更明智、更精确的决策过程。\n* [机器学习的可解释性：你知道你的模型在做什么吗？](https:\u002F\u002Fwww.inovex.de\u002Fblog\u002Fmachine-learning-interpretability\u002F)；Marcel Spitzer；随着 GDPR 的实施，欧盟范围内现已出台有关自动化个人决策和画像的法规（第 22 条，又称“解释权”），要求企业向个人提供处理相关信息，允许其请求干预，甚至定期检查以确保系统按预期运行。\n* [构建可解释的机器学习模型](https:\u002F\u002Fwww.fastdatascience.com\u002F2019\u002F02\u002F08\u002Fbuilding-explainable-machine-learning-models\u002F)；Thomas 
Wood；有时作为数据科学家，我们会遇到需要构建一种不应是黑箱的机器学习模型的情况，这种模型应当做出人类可以理解的透明决策。这可能与我们的科研直觉相悖，因为我们往往倾向于构建尽可能精确的模型。\n* [AI 不是 IT](https:\u002F\u002Fwww.linkedin.com\u002Fpulse\u002Fai-silvie-spreeuwenberg)；Silvie Spreeuwenberg；XAI 提供了一种介于两者之间的解决方案。它仍然是窄义 AI，但以一种能够与环境形成反馈回路的方式使用。这种反馈回路可能涉及人工干预。我们可以明确窄义 AI 解决方案的适用范围。当任务需要更多知识时，我们可以调整方案；反之，当任务超出 AI 解决方案的范围时，我们也会得到有意义的提示。\n* [用于保释和量刑判决的计算机程序被指存在针对黑人的偏见。但实际上情况并不那么明确。](https:\u002F\u002Fwww.washingtonpost.com\u002Fnews\u002Fmonkey-cage\u002Fwp\u002F2016\u002F10\u002F17\u002Fcan-an-algorithm-be-racist-our-analysis-is-more-cautious-than-propublicas\u002F)；今年夏天，一场激烈的辩论围绕着全国各法院用来辅助保释和量刑判决的工具展开。这场争议触及了我们社会面临的一些重大刑事司法问题。而这一切的核心就在于一个算法。\n* [AAAS：机器学习“引发科学危机”](https:\u002F\u002Fwww.bbc.com\u002Fnews\u002Fscience-environment-47267081)；数千名科学家使用的机器学习技术在分析数据时会产生误导性甚至完全错误的结果。休斯敦莱斯大学的 Genevera Allen 博士表示，这类系统的使用增加正在导致“科学危机”。她警告科学家们，如果不改进自己的技术，他们将会浪费时间和金钱。\n* [自动机器学习已失效](https:\u002F\u002Fpplonski.github.io\u002Fautomatic-machine-learning-is-broken\u002F)；维护和理解复杂模型所带来的负担\n* [查尔斯河分析公司开发工具，帮助 AI 与人类有效沟通](http:\u002F\u002Fmil-embedded.com\u002Fnews\u002Fcharles-river-analytics-creates-tool-to-help-ai-communicate-effectively-with-humans\u002F)；智能系统解决方案开发商查尔斯河分析公司，在美国国防高级研究计划局（DARPA）的可解释人工智能（XAI）项目框架下，提出了因果模型解释学习（CAMEL）方法。CAMEL 工具的目标是帮助人工智能与其人类队友进行有效沟通。\n* [DARPA 创建可解释的人工智能的努力内幕](https:\u002F\u002Fbdtechtalks.com\u002F2019\u002F01\u002F10\u002Fdarpa-xai-explainable-artificial-intelligence\u002F)；在 DARPA 许多令人振奋的项目中，可解释的人工智能（XAI）是一项始于 2016 年的倡议，旨在解决深度学习和神经网络领域的主要挑战之一——而这一子集的人工智能正日益在各个不同行业中占据重要地位。\n* [波士顿大学研究人员开发框架，以提升 AI 的公平性](https:\u002F\u002Fventurebeat.com\u002F2019\u002F01\u002F30\u002Fboston-university-researchers-develop-framework-to-improve-ai-fairness\u002F)；近年来的经验表明，AI 算法可能会表现出性别和种族偏见，这引发了人们对其在关键领域的应用的担忧，比如决定谁的贷款获批、谁有资格获得工作、谁可以获释、谁将继续服刑。波士顿大学科学家的新研究表明，评估 AI 算法的公平性有多么困难，并试图建立一个框架来检测和缓解自动化决策中的问题行为。这篇题为“从软分类器到硬决策：我们能有多公平？”的研究论文将于本周在计算机协会关于公平、问责制和透明度（ACM FAT*）的会议上发表。\n\n### 2018年\n\n* [理解可解释人工智能](https:\u002F\u002Fwww.quantiply.com\u002Fblog\u002Funderstanding-explainable-ai)；（摘自《在高度监管行业整合人工智能的基础技术手册》）长期以来，公众对人工智能的看法往往与末日景象联系在一起：人工智能就是“天网”，我们应该对其感到恐惧。这种恐惧情绪在人们对优步自动驾驶汽车悲剧的反应中可见一斑。尽管每年因人为因素导致的交通事故死亡人数高达数万，但只要涉及人工智能的事故哪怕只有一起，就会引发强烈的社会关注。然而，这种恐惧掩盖了一个关于现代世界技术基础设施的重要事实：人工智能早已深度融入我们的生活。这并不是说我们无需对日益依赖人工智能技术保持警惕。“黑箱”问题正是这种担忧的一个合理理由。\n* [人类可解释机器学习的重要性](https:\u002F\u002Ftowardsdatascience.com\u002Fhuman-interpretable-machine-learning-part-1-the-need-and-importance-of-model-interpretation-2ed758f5f476)；本文是我关于“可解释人工智能（XAI）”系列文章的第一篇。过去十年间，由机器学习和深度学习驱动的人工智能领域经历了翻天覆地的变化。从最初纯粹的学术研究领域，如今已广泛应用于零售、科技、医疗、科学等多个行业。进入21世纪后，数据科学和机器学习的核心目标已不再是单纯为发表论文而进行实验室实验，而是致力于解决现实世界中的复杂问题、自动化繁琐任务，从而让生活更加便捷美好。尽管如此，常用的机器学习、统计或深度学习模型工具箱却变化不大。虽然诸如胶囊网络等新模型不断涌现，但这些新技术真正被行业采纳通常需要数年时间。因此，在实际应用中，数据科学和机器学习更注重“落地”而非理论探讨，如何将这些模型有效应用于正确的数据以解决复杂的现实问题，显得尤为重要。\n* [优步开源自动驾驶车辆可视化系统](https:\u002F\u002Fwww.designnews.com\u002Fdesign-hardware-software\u002Fuber-has-open-sourced-autonomous-vehicle-visualization\u002F38672905960296)；通过开源其自动驾驶可视化系统，优步希望为工程师开发自动驾驶车辆时提供一套标准化的可视化工具。\n* [企业级人工智能的圣杯——可解释人工智能（XAI）](https:\u002F\u002Fblog.goodaudience.com\u002Fholy-grail-of-ai-for-enterprise-explainable-ai-xai-6e630902f2a0)；Saurabh Kaushik；除了应对上述场景外，XAI还能带来更深层次的商业价值，例如：提升AI模型性能，因为解释有助于 pinpoint 数据和特征行为中的问题；促进更明智、果断的决策，因为解释提供了额外的信息和信心，使决策者能够采取恰当行动；增强掌控感，作为AI系统的拥有者，可以清晰了解影响系统行为的关键因素及边界；提升安全感，因为每项决策都可以经过安全准则的检验，并在违反时发出警报；赢得利益相关者的信任，他们能够全面理解每一项决策背后的逻辑；监控伦理问题及由训练数据偏差引发的违规行为；更好地满足组织内部审计及其他用途下的问责制要求；更有效地遵守监管规定（如GDPR），其中“知情权”已成为系统的基本要求。\n* 
[人工智能并非一项技术](https:\u002F\u002Fwww.forbes.com\u002Fsites\u002Fcognitiveworld\u002F2018\u002F11\u002F01\u002Fartificial-intelligence-is-not-a-technology\u002F)；Kathleen Walch；制造智能机器既是人工智能的目标，也是理解如何使机器具备智能这一根本科学的核心所在。人工智能代表了我们期望达到的结果，而在此过程中取得的一系列进展，例如自动驾驶汽车、图像识别技术以及自然语言处理与生成等，都是通往通用人工智能（AGI）道路上的重要步骤。\n* [可解释性的构建模块](https:\u002F\u002Fdistill.pub\u002F2018\u002Fbuilding-blocks\u002F)；Chris Olah …；通常，可解释性技术会被单独研究。我们则探索当这些技术组合使用时所产生的强大交互作用，以及由此形成的丰富组合空间结构。\n* [为什么机器学习的可解释性至关重要](https:\u002F\u002Fblog.dataiku.com\u002Fwhy-machine-learning-interpretability-matters)；尽管机器学习（ML）已经存在数十年，但似乎在过去一年里，围绕它的新闻报道（尤其是在主流媒体中）更多地聚焦于可解释性问题——包括信任、机器学习“黑箱”以及公平性和伦理等议题。毫无疑问，如果这一话题越来越受到关注，那一定是因为它很重要。但究竟为何重要？又对哪些人而言重要呢？\n* [IBM与哈佛大学联合开发工具，解决人工智能翻译中的“黑箱”问题](https:\u002F\u002Fventurebeat.com\u002F2018\u002F11\u002F01\u002Fibm-harvard-develop-tool-to-tackle-black-box-problem-in-ai-translation\u002F)；seq2seq可视工具；IBM和哈佛大学的研究人员共同开发了一款新的调试工具来解决这一问题。该工具上周在柏林举行的IEEE视觉分析科学与技术大会上亮相，它能够让深度学习应用的开发者直观地观察到人工智能在将一段文字从一种语言翻译成另一种语言时的决策过程。\n* [机器学习解释器的五大流派](https:\u002F\u002Fwww.slideshare.net\u002Flopusz\u002Fthe-five-tribes-of-machine-learning-explainers)；Michał Łopuszyński；PyData Berlin 2018闪电演讲\n* [警惕随机森林特征重要性的默认设置](https:\u002F\u002Fexplained.ai\u002Frf-importance\u002Findex.html)；Terence Parr、Kerem Turgutlu、Christopher Csiszar 和 Jeremy Howard；简而言之：scikit-learn 随机森林的特征重要性算法以及 R 语言中随机森林的默认特征重要性计算方法都存在偏差。若要在 Python 中获得可靠结果，应使用置换重要性指标，该指标在此处及我们的 rfpimp 包中均有提供（可通过 pip 安装）。对于 R 语言，则应在随机森林构造函数中设置 importance=T，随后在 R 的 importance() 函数中指定 type=1。此外，只有当模型采用合适的超参数进行训练时，所得到的特征重要性评估才具有可靠性。\n* [可解释人工智能与机器学习的应用案例](https:\u002F\u002Fwww.kdnuggets.com\u002F2018\u002F12\u002Fexplainable-ai-machine-learning.html)；非常详尽地列举了可解释人工智能的潜在应用场景，例如：检测能源窃取行为——不同类型的窃电行为需要调查人员采取不同的应对措施；信用评分——《公平信用报告法》（FCRA）是一项联邦法律，旨在规范信用报告机构的行为，并要求其确保所收集和发布的消费者信用信息真实、准确；视频威胁检测——将某人标记为威胁可能引发严重的法律后果；\n\n* [人工智能伦理：数据科学家的视角](https:\u002F\u002Fmedium.com\u002F@QuantumBlack\u002Fethics-of-ai-a-data-scientists-perspective-cb7cdb1c8392)；QuantumBlack\n\n* [可解释AI与解释AI](https:\u002F\u002Fmedium.com\u002F@ahmad.hajmosa\u002Fexplainable-ai-vs-explaining-ai-part-1-d39ea5053347)；Ahmad Haj Mosa；一些观点将XAI工具与《思考，快与慢》中的理念联系起来。\n\n* [监管黑箱医疗](http:\u002F\u002Fmichiganlawreview.org\u002Fregulating-black-box-medicine\u002F)；数据驱动现代医学。而我们用于分析这些数据的工具正变得越来越强大。随着健康数据的不断积累，基于这些数据的复杂算法能够推动医疗创新、改善诊疗流程并提高效率。然而，这些算法的质量参差不齐。有些准确且强大，而另一些则可能充满错误或建立在有缺陷的科学基础上。当一个不透明的算法为糖尿病患者推荐胰岛素剂量时，我们如何确定该剂量是正确的呢？患者、医疗服务提供者和保险公司都面临着识别高质量算法的巨大困难——他们既缺乏专业知识，也难以获取相关专有信息。那么，我们应如何确保医疗算法的安全性和有效性？\n\n* [优秀AI模型的三个标志](https:\u002F\u002Ftdwi.org\u002Farticles\u002F2018\u002F11\u002F26\u002Fadv-all-3-signs-of-a-good-ai-model.aspx)；Troy Hiltbrand；直到最近，AI项目的成功与否仅以公司收益来衡量，但一种新兴的行业趋势提出了另一个目标——可解释的人工智能（XAI）。向XAI发展的动因来自消费者（乃至整个社会）对理解AI决策过程的需求增加。例如，欧洲的《通用数据保护条例》（GDPR）等法规，进一步提升了在使用AI进行自动化决策时的问责性要求，尤其是在偏见可能对个人造成不利影响的情况下。\n\n* [人工智能领域正迅速取得新进展](https:\u002F\u002Fwww.technative.io\u002Fwhy-its-important-to-create-a-movement-around-explainable-ai\u002F)；然而，随着AI应用的日益广泛，具备可解释性的模型的重要性也将不断提升。简而言之，如果系统负责做出决策，那么在流程中就必须展示这一决策——即说明决策是什么、如何得出的，以及如今更为关键的——为什么AI会做出这样的选择。\n\n* [为何我们需要审计算法](https:\u002F\u002Fhbr.org\u002F2018\u002F11\u002Fwhy-we-need-to-audit-algorithms)；James Guszcza, Iyad Rahwan, Will Bible, Manuel Cebrian, Vic Katyal；算法决策和人工智能（AI）蕴含巨大潜力，有望成为经济领域的重磅突破，但我们担心炒作使得许多人忽视了将算法引入商业和社会所面临的严重问题。事实上，我们看到许多人陷入了微软的Kate Crawford所称的“数据原教旨主义”——即认为海量数据集本身就是可靠客观真理的宝库，只要我们能用机器学习工具将其挖掘出来即可。然而，这种看法过于简单化，现在已十分清楚，若不加以约束，嵌入数字和社交技术中的AI算法可能会固化社会偏见、加速谣言和虚假信息的传播、放大舆论回音壁效应、劫持我们的注意力，甚至损害我们的心理健康。\n\n\n* 
[让机器思维走出黑箱](https:\u002F\u002Fnews.mit.edu\u002F2018\u002Fmit-lincoln-laboratory-adaptable-interpretable-machine-learning-0905)；Anne McGovern；可适应可解释机器学习项目正在重新设计机器学习模型，以便人类能够理解计算机的思考过程。\n\n* [可解释AI无法兑现承诺。原因如下](https:\u002F\u002Fhackernoon.com\u002Fexplainable-ai-wont-deliver-here-s-why-6738f54216be)；Cassie Kozyrkov；可解释性：你确实能理解它，但它效果并不好。性能：你并不理解它，但它却非常有效。为何不能两者兼得呢？\n\n* [我们需要一个算法版的FDA](http:\u002F\u002Fnautil.us\u002Fissue\u002F66\u002Fclockwork\u002Fwe-need-an-fda-for-algorithms)；Hannah Fry；我们是否需要培养一种全新的直觉，来理解如何与算法互动？当你说最好的算法是在每个环节都考虑到人的因素时，具体指的是什么？最危险的算法又是什么？\n\n* [可解释AI、交互性与人机交互](https:\u002F\u002Fwww.linkedin.com\u002Fpulse\u002Fexplainable-ai-interactivity-hci-erik-stolterman-bergqvist\u002F)；\nErik Stolterman Bergqvist；开发能够在技术层面上以人类易于理解的方式解释其内部运作的AI系统。从法律角度探讨XAI。可解释AI出于实际需求而必要，也可以从更哲学的角度切入，提出关于人类要求系统解释自身行为是否合理等更广泛的问题。\n\n* [为何你的企业必须拥抱可解释AI，才能在热潮中脱颖而出并理解AI的业务逻辑](https:\u002F\u002Fwww.hfsresearch.com\u002Fpointsofview\u002Fescape-the-black-box-take-steps-toward-explainable-ai-today-or-risk-damaging-your-business)；Maria Terekhova；如果AI要真正具备可应用于商业的能力，只有当我们能够清晰地设计其背后的业务逻辑时才有可能实现。这意味着深谙业务逻辑的企业领导者必须处于AI设计和管理流程的核心位置。\n \n* [可解释AI：问责的边界](https:\u002F\u002Fwww.information-age.com\u002Fexplainable-ai-123476397\u002F)；Yaroslav Kuflinski；人们究竟能在多大程度上信任AI的建议呢？提升人工智能伦理的应用程度\n\n\n\n### 2017年\n\n* [被软件程序的秘密算法送进监狱](https:\u002F\u002Fwww.nytimes.com\u002F2017\u002F05\u002F01\u002Fus\u002Fpolitics\u002Fsent-to-prison-by-a-software-programs-secret-algorithms.html)；Adam Liptak，《纽约时报》；Loomis先生案件中的报告是由Northpointe公司销售的一款名为Compas的产品生成的。报告包含一系列条形图，用于评估Loomis先生再次犯罪的风险。检察官在庭审中告诉法官，Compas报告表明“暴力风险高、累犯风险高、审前风险高”。法官同意这一观点，并告知Loomis先生：“根据Compas评估，您被认定为对社会构成高度威胁的人。”\n* [AI可能复活种族歧视性住房政策](https:\u002F\u002Fmotherboard.vice.com\u002Fen_us\u002Farticle\u002F4x44dp\u002Fai-could-resurrect-a-racist-housing-policy) 以及为何我们需要透明度来阻止它——“我们无法调查Compas算法的事实本身就是一个问题”\n\n### 2016年\n\n* [我们如何分析COMPAS累犯预测算法](https:\u002F\u002Fwww.propublica.org\u002Farticle\u002Fhow-we-analyzed-the-compas-recidivism-algorithm)；ProPublica调查报告。黑人被告往往被预测具有比实际情况更高的累犯风险。我们的分析发现，在两年内未再次犯罪的黑人被告中，被错误归类为高风险的概率几乎是白人被告的两倍（45%对比23%）。该分析还表明，即使在控制了既往犯罪记录、未来累犯可能性、年龄和性别等因素后，黑人被告被分配到高风险评分的可能性仍比白人被告高出45%。\n\n## 学位论文\n\n### 2018年\n\n* [揭示黑盒机器学习算法：开发用于评估个体预测解释方法质量的公理化框架](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1808.05054.pdf) 米洛·霍内格；\n\n### 2016年\n\n* [机器学习中的不确定性与标签噪声](https:\u002F\u002Fdial.uclouvain.be\u002Fpr\u002Fboreal\u002Fobject\u002Fboreal:134618\u002Fdatastream\u002FPDF_01\u002Fview)；贝努瓦·弗雷奈；这篇论文探讨了机器学习面临的三大挑战：高维数据、标签噪声以及计算资源有限的问题。\n\n## 音频\n\n### 2018年\n\n* [解释可解释的人工智能](https:\u002F\u002Fwww.brighttalk.com\u002Fwebcast\u002F16463\u002F346891\u002Fexplaining-explainable-ai)；在本次网络研讨会上，我们将与帕特里克·霍尔和汤姆·阿利夫围绕可解释人工智能的业务需求及其对任何组织可能带来的价值展开小组讨论。\n* [与理查德·泽梅尔探讨机器学习中的公平性问题](https:\u002F\u002Ftwimlai.com\u002Ftwiml-talk-209-approaches-to-fairness-in-machine-learning-with-richard-zemel\u002F)；今天，我们将继续探索“AI之信任”这一主题，采访多伦多大学计算机科学系教授兼Vector研究所研究主任理查德·泽梅尔。\n* [与大卫·施皮格尔哈尔特探讨如何使算法值得信赖](https:\u002F\u002Ftwimlai.com\u002Ftwiml-talk-212-making-algorithms-trustworthy-with-david-speigelhalter\u002F)；在本系列NeurIPS专题的第二期节目中，我们邀请到了剑桥大学温顿风险与证据传播中心主任、英国皇家统计学会会长大卫·施皮格尔哈尔特。\n\n## 研讨会\n\n### 2018年\n\n* [第二届可解释人工智能研讨会](https:\u002F\u002Fcris.vub.be\u002Ffiles\u002F38962039\u002Fproceedings_XAI_2018.pdf)；戴维·W·阿哈、特雷弗·达雷尔、帕特里克·多赫蒂和丹尼尔·马加泽尼；\n* [可解释的人工智能](http:\u002F\u002Fcdn.bdigital.org\u002FPDF\u002FBDC18\u002FBDC18_ExplainableAI.pdf)；里卡多·巴埃萨-耶茨；2018年大数据大会\n* 
[信任与可解释性：人与AI之间的关系](http:\u002F\u002Fwww.imm.dtu.dk\u002F~tobo\u002FAI_chora2.pdf)；托马斯·博兰德；衡量AI应用成功与否的标准，在于其为人类生活创造的价值。从这个角度来看，AI系统的设计应当帮助人们更好地理解这些系统、参与其使用并建立信任。如今，AI技术已广泛渗透到我们的生活中。随着它逐渐成为社会的核心力量，该领域正从单纯构建智能系统，转向打造具备人类感知且值得信赖的智能系统。\n* [21种公平性定义及其政治内涵](https:\u002F\u002Ffairmlbook.org\u002Ftutorial2.html)；本教程有两个目标。一是解释各种技术性定义，同时明确其中所蕴含的价值观。这将有助于政策制定者及其他相关人士更深入地理解关于公平性标准的争论真正涉及的内容（例如个体公平与群体公平、统计均等与误差率平等之间的区别）。此外，也能让计算机科学家认识到，定义的多样化应被积极看待，而非回避；而一味追求单一的“正确”定义并无意义，因为技术考量无法解决道德层面的争议。\n* [2018年ICML机器学习中人类可解释性研讨会论文集（WHI 2018）](https:\u002F\u002Farxiv.org\u002Fhtml\u002F1807.01308)\n\n### 2017年\n\n* [NIPS 2017机器学习公平性教程](https:\u002F\u002Ffairmlbook.org\u002Ftutorial1.html)；索隆·巴罗卡斯、莫里茨·哈特\n* [可解释性与AI安全](http:\u002F\u002Fs.interpretable.ml\u002Fnips_interpretable_ml_2017_victoria_Krakovna.pdf)；维多利亚·克拉科夫娜；长期AI安全、可靠地向高级AI系统传达人类偏好与价值观、为与这些偏好一致的AI系统设置激励机制\n* [调试机器学习](https:\u002F\u002Fwww.slideshare.net\u002Flopusz\u002Fdebugging-machinelearning)；米哈尔·沃普什任斯基；模型内省 只有非常简单的模型（如线性模型、基础决策树）才能回答“为什么”的问题有时，即使这样的简单模型无法达到顶级性能，将其应用于你的数据集仍然具有启发性你可以通过引入更高级的（非线性变换的）特征来提升简单模型的表现\n\n## 其他\n\n* 安德烈·沙拉波夫关于可解释人工智能、算法公平性等方面的全部内容 [https:\u002F\u002Fgithub.com\u002Fandreysharapov\u002Fxaience](https:\u002F\u002Fgithub.com\u002Fandreysharapov\u002Fxaience)\n* FAT ML [机器学习中的公平、问责与透明度](http:\u002F\u002Fwww.fatml.org\u002F)\n* 华盛顿大学交互式数据实验室 [IDL](https:\u002F\u002Fidl.cs.washington.edu\u002Fpapers\u002F)\n* CS 294：机器学习中的公平性 [Fairness Berkeley](https:\u002F\u002Ffairmlclass.github.io\u002F)\n* [谷歌的机器学习公平性指南](https:\u002F\u002Fdevelopers.google.com\u002Fmachine-learning\u002Ffairness-overview\u002F)\n* 米哈尔·沃普什任斯基的[优秀可解释机器学习资源库](https:\u002F\u002Fgithub.com\u002Flopusz\u002Fawesome-interpretable-machine-learning)\n* [可解释的人工智能：拓展人工智能的边界](https:\u002F\u002Fwww.linkedin.com\u002Flearning\u002Flearning-xai-explainable-artificial-intelligence\u002Fexplainable-ai-expanding-the-frontiers-of-artificial-intelligence)\n* [谷歌 - 可解释的人工智能](https:\u002F\u002Fcloud.google.com\u002Fexplainable-ai\u002F) —— 用于部署可解释且包容性机器学习模型的工具和框架。\n* [谷歌可解释性白皮书](https:\u002F\u002Fstorage.googleapis.com\u002Fcloud-ai-whitepapers\u002FAI%20Explainability%20Whitepaper.pdf)","# xai_resources 快速上手指南\n\n`xai_resources` 并非一个可安装的软件库或工具包，而是一个**可解释人工智能（XAI）领域的精选资源汇总列表**。它主要收集了相关的学术论文、书籍、软件工具链接及文章。因此，本指南旨在指导开发者如何高效地浏览和利用这些资源，而非执行传统的安装流程。\n\n## 环境准备\n\n由于该项目本质上是文档和资源索引，**无需特定的操作系统或复杂的依赖环境**。\n\n*   **系统要求**：任意支持现代浏览器的操作系统（Windows, macOS, Linux）。\n*   **前置依赖**：\n    *   稳定的网络连接（部分论文链接可能需要学术网络权限）。\n    *   Git（可选，仅用于克隆仓库到本地浏览）。\n    *   Markdown 阅读器（可选，用于本地查看 `.md` 文件）。\n\n## 获取资源\n\n你可以通过以下两种方式访问资源列表：\n\n### 方式一：在线直接浏览（推荐）\n直接访问该项目的 GitHub 仓库页面查看整理好的目录和链接：\n*   **GitHub 地址**: `https:\u002F\u002Fgithub.com\u002F...\u002Fxai_resources` (请替换为实际仓库地址)\n*   **国内加速方案**: 如果访问 GitHub 缓慢，可使用镜像站如 `https:\u002F\u002Fghproxy.com\u002F` 前缀访问，或使用 Gitee 上的镜像仓库（如有）。\n\n### 方式二：克隆到本地\n如果你希望离线阅读或贡献内容，可以使用 Git 克隆项目：\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002F...\u002Fxai_resources.git\ncd xai_resources\n```\n\n*国内用户推荐使用国内镜像源克隆（如果存在）：*\n```bash\ngit clone https:\u002F\u002Fgitee.com\u002Fmirror\u002Fxai_resources.git\ncd xai_resources\n```\n\n## 基本使用\n\n该项目的使用核心在于**按需检索**。打开根目录下的 `README.md` 文件，你将看到以下分类导航：\n\n1.  **Papers and preprints (学术论文)**\n    *   按年份（如 2021, 2020）分类，包含论文标题、摘要简介及 PDF\u002FDOI 链接。\n    *   *示例*: 查找关于“多模态医学影像”的解释性研究，可定位到 2021 年的 *\"Evaluating Explainable AI on a Multi-Modal Medical Imaging Task\"* 条目。\n\n2.  **Books and longer materials (书籍与长篇资料)**\n    *   提供系统性学习 XAI 理论的书籍链接。\n\n3.  
**Software tools (软件工具)**\n    *   汇总了具体的 XAI 算法实现库（如 LIME, SHAP, EXPLAN, GRACE 等）的官方代码库链接。\n    *   *操作*: 点击对应工具的名称，跳转至其独立的 GitHub 仓库进行安装和使用。\n\n4.  **Short articles & Misc (短文与其他)**\n    *   包含新闻报道、学位论文等非正式出版物资源。\n\n**使用建议**：\n开发者应根据当前需求（例如：需要寻找特定场景的评估指标，或寻找可用的开源解释器代码），直接在 `README.md` 中搜索关键词（如 \"Medical\", \"Trust\", \"Tabular data\"），点击对应链接获取原始文献或工具源码。","某医疗 AI 团队正在开发一款多模态医学影像辅助诊断系统，急需向临床医生解释模型为何做出特定癌症预测以通过伦理审查。\n\n### 没有 xai_resources 时\n- 团队在海量文献中盲目搜索，难以找到针对“多模态影像”这一特定场景的可解释性评估指标，导致选用的热力图无法区分不同成像通道的临床意义。\n- 直接套用通用的 LIME 或 SHAP 算法，未察觉这些方法在真实欺诈检测或医疗决策中可能反而降低人类专家的判断准确率，存在误导医生的风险。\n- 缺乏系统的评估框架，无法回答监管机构关于“算法是否满足临床优先级需求”的质询，项目上线审批被迫搁置。\n- 开发人员花费数周时间整理零散的论文和工具列表，严重拖慢了从模型训练到临床验证的迭代进度。\n\n### 使用 xai_resources 后\n- 团队迅速定位到《Evaluating Explainable AI on a Multi-Modal Medical Imaging Task》等关键论文，直接采用文中提出的 MSFI 指标，精准评估并优化了模型对特定影像模态的特征提取能力。\n- 参考《How can I choose an explainer?》中的实证研究结论，团队避免了盲目部署通用解释器，转而设计符合医生实际决策习惯的人机协作流程，提升了最终诊断准确率。\n- 利用资源库中分类清晰的软件工具和评估方法论，快速构建了符合临床要求的解释性报告，顺利通过了医院伦理委员会的严格审查。\n- 借助汇总好的书籍、预印本及案例文章，团队将调研时间从数周缩短至几天，将更多精力投入到核心算法的临床适配与优化中。\n\nxai_resources 通过提供经过筛选的权威资源与实证依据，帮助开发者避开理论陷阱，让可解释性 AI 真正服务于高风险领域的落地决策。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpbiecek_xai_resources_f41215d9.png","pbiecek","Przemysław Biecek","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fpbiecek_e81268d8.jpg","I just like to make things. \r\nhttps:\u002F\u002Fmi2.ai\u002F","@ModelOriented @MI2DataLab @MI2-Education @BetaAndBit  @mim-uw University of Warsaw","Warsaw, Poland",null,"PrzeBiec","http:\u002F\u002Fwww.linkedin.com\u002Fin\u002Fpbiecek","https:\u002F\u002Fgithub.com\u002Fpbiecek",[83],{"name":84,"color":85,"percentage":86},"R","#198CE7",100,853,138,"2026-04-10T04:56:10",1,"","未说明",{"notes":94,"python":92,"dependencies":95},"该仓库（xai_resources）并非一个可运行的软件工具或代码库，而是一个关于可解释人工智能（XAI）的资源列表（包含论文、书籍、工具链接等）。因此，它没有特定的操作系统、GPU、内存、Python 版本或依赖库要求。用户只需通过浏览器阅读 README 或访问其中列出的外部链接即可。",[],[14],[98,99,100],"xai","interpretability","interpretable-machine-learning","2026-03-27T02:49:30.150509","2026-04-19T06:02:49.958189",[],[]]