EMA Releases Annex 22: Artificial Intelligence ("Regulatory Interpretation")

等风来123456715 · posted 2025-8-1 13:36:54

1. Scope
This annex applies to all types of computerised systems used in the manufacturing of medicinal products and active substances, where Artificial Intelligence models are used in critical applications with direct impact on patient safety, product quality or data integrity, e.g. to predict or classify data. The document provides additional guidance to Annex 11 for computerised systems in which AI models are embedded.
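• Interpretation: This annex applies to all types of computerised systems used in the manufacture of medicinal products and active substances, specifically where those systems use AI models in critical applications. An application is critical when it has a direct impact on patient safety, product quality or data integrity, for example when it predicts or classifies data. The annex supplements Annex 11 with additional guidance for computerised systems in which AI models are embedded.
• Examples:
◦ An AI model that predicts whether a batch is likely to have a quality problem during manufacturing.
◦ An AI model that classifies finished product as "conforming" or "nonconforming".
Both applications directly affect product quality and patient safety, so they fall within the scope of this annex.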
The document applies to machine learning (AI/ML) models which have obtained their functionality through training with data, rather than being explicitly programmed. Models may consist of several individual models, each automating specific process steps in GMP.
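• Interpretation: This document applies to machine learning (AI/ML) models that obtained their functionality through training with data rather than through explicit programming. A complete model may consist of several sub-models, each automating a specific process step in GMP (Good Manufacturing Practice).
• Example: An AI system learns to recognise defective product by analysing a large volume of historical production data instead of relying on hard-coded rules. The system might contain one sub-model for image recognition and another for data analysis.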
The document applies to static models, i.e. models that do not adapt their performance during use by incorporating new data. The use of dynamic models which continuously and automatically learn and adapt performance during use, is not covered by this document, and should not be used in critical GMP applications.
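• Interpretation: This document applies to static models, i.e. models that do not change their performance during use by incorporating new data. Dynamic models, which continuously and automatically learn and adapt their performance during use, are outside the scope of this document and should not be used in critical GMP applications.
• Examples:
◦ Static model: an AI model that is trained and validated before deployment. Its parameters stay fixed in operation, and it does not adjust its decision logic automatically in response to new input data.
◦ Dynamic model (not covered): an AI model that runs in real time on the production line and automatically fine-tunes its recognition accuracy after each newly processed product. Such a model is not permitted in critical GMP applications.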
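One way to support the "parameters stay fixed" claim for a static model is to record a fingerprint of the parameters at validation time and compare it against the deployed model later. The following is only a minimal sketch under that assumption: the parameter dictionary, its values, and the function names are hypothetical and are not prescribed by Annex 22.

```python
import hashlib
import json


def fingerprint(params: dict) -> str:
    """Return a SHA-256 digest of the model parameters (hypothetical flat dict)."""
    # Sorting the keys gives a canonical serialisation, so equal parameters
    # always produce the same digest.
    canonical = json.dumps(params, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def is_unchanged(current_params: dict, expected_digest: str) -> bool:
    """True if the deployed parameters still match the validated state."""
    return fingerprint(current_params) == expected_digest


# Hypothetical parameters recorded when the model was validated.
validated_params = {"weight": 0.82, "bias": -0.15, "threshold": 0.5}
validated_digest = fingerprint(validated_params)

# A model that silently adapted in use would no longer match the digest.
drifted_params = dict(validated_params, weight=0.83)
print(is_unchanged(validated_params, validated_digest))
print(is_unchanged(drifted_params, validated_digest))
```

A check like this only shows that the stored parameters were not altered; it does not replace change control or revalidation.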
The document applies to models with a deterministic output which, when given identical inputs, provide identical outputs. Models with a probabilistic output which, when given identical inputs, might not provide identical outputs are not covered by this document and should not be used in critical GMP applications.
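• Interpretation: This document applies to models with a deterministic output, i.e. models that always produce the same output for the same input. Models with a probabilistic output, which may produce different outputs for the same input, are outside the scope of this document and should not be used in critical GMP applications.
• Examples:
◦ Deterministic model: an AI model that calculates the endpoint of a chemical reaction from specific parameters. Given the same parameters, it returns exactly the same endpoint value every time.
◦ Probabilistic model (not covered): an AI model that generates new molecular structures and may produce a different structure on each run even when the input is identical. Such a model is not permitted in critical GMP applications.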
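The deterministic-output requirement can be probed empirically by calling a model repeatedly with identical inputs and comparing the results. The sketch below assumes the model is exposed as a plain Python callable; `endpoint_model` and `sampling_model` are hypothetical stand-ins, not real models.

```python
import random
from typing import Callable, Iterable, Tuple

Params = Tuple[float, float]


def is_deterministic(model: Callable[[Params], float],
                     inputs: Iterable[Params],
                     runs: int = 5) -> bool:
    """Return False as soon as any input yields two different outputs."""
    for x in inputs:
        first = model(x)
        if any(model(x) != first for _ in range(runs - 1)):
            return False
    return True


def endpoint_model(params: Params) -> float:
    """Hypothetical deterministic model: fixed arithmetic on its inputs."""
    temperature, concentration = params
    return 2.0 * temperature + 0.5 * concentration


def sampling_model(params: Params) -> float:
    """Hypothetical probabilistic model: adds random noise on every call."""
    return endpoint_model(params) + random.random()


test_inputs = [(25.0, 1.0), (30.0, 0.8)]
print(is_deterministic(endpoint_model, test_inputs))
print(is_deterministic(sampling_model, test_inputs))
```

Passing such a check is supporting evidence rather than proof, since it only covers the inputs and the number of repetitions that were actually tried.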
Following the above, the document does not apply to Generative AI and Large Language Models (LLM), and such models should not be used in critical GMP applications. If used in non-critical GMP applications, which do not have direct impact on patient safety, product quality or data integrity, personnel with adequate qualification and training should always be responsible for ensuring that the outputs from such models are suitable for the intended use, i.e. a human-in-the-loop (HITL) and the principles described in this document may be considered where applicable.
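• Interpretation: Following from the above, this document does not apply to generative AI or large language models (LLMs), and such models should not be used in critical GMP applications. Where they are used in non-critical GMP applications, i.e. those without direct impact on patient safety, product quality or data integrity, suitably qualified and trained personnel must be responsible for ensuring that the outputs are fit for the intended use. In other words, a human-in-the-loop (HITL) is required, and the principles described in this document may be considered where applicable.
• Examples:
◦ Not permitted (critical GMP application): an LLM that automatically generates batch release reports for medicinal products. This is a critical application, so the LLM may not be used.
◦ Permitted with HITL (non-critical GMP application): an LLM that drafts internal training documents, which do not directly affect patient safety or product quality. The draft must still be reviewed and approved by qualified staff to ensure it is accurate, which is what "human-in-the-loop" means here.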
------------------------------------------------------------------------
2. Principles
2.1. Personnel. In order to adequately understand the intended use and the associated risks of the application of an AI model in a GMP environment, there should be close cooperation between all relevant parties during algorithm selection, and model training, validation, testing and operation. This includes but may not be limited to process subject matter experts (SMEs), QA, data scientists, IT, and consultants. All personnel should have adequate qualifications, defined responsibilities and appropriate level of access.
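• Interpretation: To adequately understand the intended use of an AI model in a GMP environment and the risks associated with it, all relevant parties should cooperate closely during algorithm selection and during model training, validation, testing and operation. These parties include, but are not limited to, process subject matter experts (SMEs), quality assurance (QA), data scientists, IT staff and consultants. All personnel should have adequate qualifications, defined responsibilities and an appropriate level of access.
• Example: When an AI model for quality control is developed, the pharmaceutical process engineer (SME) works with the data scientists to define the data features, QA reviews the validation plan, and IT handles system deployment, so that each step is covered by the right expertise and the work is coordinated.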
2.2. Documentation. Documentation for activities described in this section should be available and reviewed by the regulated user irrespective of whether a model is trained, validated and tested in-house or whether it is provided by a supplier or service provider.
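• Interpretation: Documentation for all activities described in this section should be available and reviewed by the regulated user, regardless of whether the model is trained, validated and tested in-house or provided by a supplier or service provider.
• Example: Whether an AI quality inspection system is developed in-house or outsourced, all related documentation, including the training process, validation reports and test results, must be reviewed and archived by the company's quality unit.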
2.3. Quality Risk Management. Activities described in this document should be implemented based on the risk to patient safety, product quality and data integrity.
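• Interpretation: All activities described in this document should be managed and implemented according to the risk to patient safety, product quality and data integrity.
• Example: When deciding how rigorously an AI model should be tested, the potential risk it poses to patient health needs to be assessed. For instance, an AI model used for critical diagnostics requires a higher degree of validation and risk control than one used to predict equipment maintenance.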
------------------------------------------------------------------------
3. Intended Use
3.1. Intended use. The intended use of a model and the specific tasks it is designed to assist or automate should be described in detail based on an in-depth knowledge of the process the model is integrated in. This should include a comprehensive characterisation of the data the model is intended to use as input and all common and rare variations; i.e. the input sample space. Any limitations and possible erroneous and biased inputs should be identified. A process subject matter expert (SME) should be responsible for the adequacy of the description, and it should be documented and approved before the start of acceptance testing.
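• Interpretation: The intended use of the model, and the specific tasks it is meant to assist or automate, should be described in detail on the basis of in-depth knowledge of the process into which the model is integrated. The description should comprehensively characterise the input data the model is expected to use, including all common and rare variations, i.e. it should define the input sample space. Any limitations, and any inputs that may be erroneous or biased, should be identified. A process subject matter expert (SME) should be responsible for the adequacy of the description, which should be documented and approved before acceptance testing starts.
• Example:
◦ An AI model is used to identify tablet appearance defects. Its intended use should be described in detail, e.g. "automatically detect defects on the tablet surface such as cracks, chipped edges and discoloured spots".
◦ The input sample space should explicitly cover the range of tablet sizes, colours and batch-to-batch differences, as well as every defect type that may occur during production, both common and rare.
◦ Defect types the model may fail to detect (for example, very small internal defects) and possible misjudgements caused by factors such as changes in lighting should also be identified. These are the limitations and erroneous inputs.
◦ These descriptions must be confirmed by the SME, documented and approved before acceptance testing can begin.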
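A documented input sample space can also be expressed in a form that software can check, so that inputs outside the described range are flagged rather than silently classified. This is a minimal sketch assuming a tablet-inspection use case; the class name, fields and numeric limits are illustrative assumptions, not values taken from Annex 22.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class InputSampleSpace:
    """Documented input sample space (all values here are hypothetical)."""
    min_diameter_mm: float
    max_diameter_mm: float
    colours: FrozenSet[str]

    def out_of_scope_reasons(self, diameter_mm: float, colour: str) -> List[str]:
        """List the reasons why an input falls outside the documented space."""
        reasons = []
        if not (self.min_diameter_mm <= diameter_mm <= self.max_diameter_mm):
            reasons.append(f"diameter {diameter_mm} mm outside documented range")
        if colour not in self.colours:
            reasons.append(f"colour '{colour}' not in documented sample space")
        return reasons


space = InputSampleSpace(
    min_diameter_mm=6.0,
    max_diameter_mm=12.0,
    colours=frozenset({"white", "red"}),
)

print(space.out_of_scope_reasons(8.0, "white"))   # inside the documented space
print(space.out_of_scope_reasons(14.0, "blue"))   # two reasons reported
```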
3.2. Subgroups. Where applicable, the input sample space should be divided into subgroups based on relevant characteristics. Subgroups may be defined by characteristics like the decision output (e.g. ‘accept’ or ‘reject’), process specific baseline characteristics (e.g. geographical site or equipment), specific characteristics in material or product, and characteristics specific to the task being automated (e.g. types and severity of defects).
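• Interpretation: Where applicable, the input sample space should be divided into subgroups based on relevant characteristics. Subgroups can be defined by the decision output (e.g. "accept" or "reject"), by process-specific baseline characteristics (e.g. geographical site or equipment), by specific characteristics of the material or product, and by characteristics specific to the automated task (e.g. defect type and severity).
• Example:
◦ For an AI model used in drug quality control, the input data can be divided into subgroups by dosage form (e.g. tablet, capsule), production batch, production equipment, and defect type (e.g. uneven colour, size deviation, impurities).
◦ The model's "accept" or "reject" decisions may also perform differently, or be held to different criteria, from one subgroup to another.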
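Dividing the input sample space into subgroups amounts to partitioning the samples by the chosen characteristics. The sketch below assumes each sample is a dictionary of attributes; the field names and records are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical sample records; real data would come from the test data set.
samples = [
    {"dosage_form": "tablet", "equipment": "line_A", "label": "accept"},
    {"dosage_form": "tablet", "equipment": "line_A", "label": "reject"},
    {"dosage_form": "tablet", "equipment": "line_B", "label": "accept"},
    {"dosage_form": "capsule", "equipment": "line_B", "label": "reject"},
]


def split_into_subgroups(records: List[dict],
                         keys: Sequence[str]) -> Dict[Tuple[str, ...], List[dict]]:
    """Group records by the values of the selected characteristics."""
    groups = defaultdict(list)
    for record in records:
        groups[tuple(record[k] for k in keys)].append(record)
    return dict(groups)


subgroups = split_into_subgroups(samples, ("dosage_form", "equipment"))
for key, members in subgroups.items():
    print(key, len(members))
```

Counting the members of each subgroup in this way also makes it visible when a subgroup has too few samples to be tested meaningfully.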
3.3. Human-in-the-loop. Where a model is used to give an input to a decision made by a human operator (human-in-the-loop), and where the effort to test such model has been diminished, the description of the intended use should include the responsibility of the operator. In this case, the training and consistent performance of the operator should be monitored like any other manual process.
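• Interpretation: Where a model provides input to a decision made by a human operator (human-in-the-loop), and the effort to test the model has been reduced for that reason, the description of the intended use should include the operator's responsibility. In this case, the operator's training and consistent performance should be monitored like any other manual process.
• Example:
◦ Based on image recognition results, an AI model indicates to the operator that a batch may contain defects, and the operator makes the final decision to "release" or "reject". Because the AI only provides a recommendation, the testing effort may be comparatively reduced.
◦ In this case, the intended use description must state clearly that the operator is responsible for the final decision. The operator's training must also be assessed periodically, and it must be monitored whether operators consistently make correct judgements based on the AI's recommendations.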
------------------------------------------------------------------------
4. Acceptance Criteria
4.1. Test metrics. Suitable, case dependent test metrics, should be defined to measure the performance of the model according to the intended use. As an example, suitable test metrics for a model used to classify products (e.g. ‘accept’ or ‘reject’) may include, but may not be limited to, a confusion matrix, sensitivity, specificity, accuracy, precision and/or F1 score.
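• Interpretation: Suitable, case-dependent test metrics should be defined according to the intended use in order to measure the model's performance. For a model that classifies product (e.g. "accept" or "reject"), suitable test metrics may include, but are not limited to, a confusion matrix, sensitivity, specificity, accuracy, precision and/or F1 score.
• Example: For an AI model that identifies nonconforming product, the test metrics need to be clearly defined. Besides overall accuracy, attention should be paid to:
◦ Sensitivity (recall): the proportion of nonconforming items that are correctly identified as nonconforming (guards against missed defects).
◦ Specificity: the proportion of conforming items that are correctly identified as conforming (guards against false alarms).
◦ Confusion matrix: a table showing the numbers of true positives, true negatives, false positives and false negatives.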
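The metrics named above all follow from the four cells of the confusion matrix. The following sketch computes them in plain Python, treating "reject" as the positive class; the labels and predictions are made-up illustration data, and the code assumes both classes occur so that no denominator is zero.

```python
from typing import Dict, Sequence


def confusion_counts(actual: Sequence[str], predicted: Sequence[str],
                     positive: str = "reject") -> Dict[str, int]:
    """Count true/false positives and negatives for a two-class problem."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if a == positive:
            counts["tp" if p == positive else "fn"] += 1
        else:
            counts["fp" if p == positive else "tn"] += 1
    return counts


def classification_metrics(c: Dict[str, int]) -> Dict[str, float]:
    """Derive the standard metrics from the confusion-matrix counts."""
    sensitivity = c["tp"] / (c["tp"] + c["fn"])   # share of true rejects found
    specificity = c["tn"] / (c["tn"] + c["fp"])   # share of true accepts kept
    precision = c["tp"] / (c["tp"] + c["fp"])     # share of flagged items that are real rejects
    accuracy = (c["tp"] + c["tn"]) / sum(c.values())
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "precision": precision, "accuracy": accuracy, "f1": f1}


# Illustration data: 4 truly nonconforming items and 6 conforming items.
actual = ["reject"] * 4 + ["accept"] * 6
predicted = ["reject", "reject", "reject", "accept"] + ["accept"] * 5 + ["reject"]

counts = confusion_counts(actual, predicted)
metrics = classification_metrics(counts)
print(counts)
print(metrics)
```

With this data the model misses one of four nonconforming items, so sensitivity is 0.75 even though overall accuracy is 0.8, which is why accuracy alone is not a sufficient metric.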
4.2. Acceptance criteria. Acceptance criteria for the defined test metrics should be established by which the performance of the model should be considered acceptable for the intended use. The acceptance criteria may differ for specific subgroups within the intended use. A process subject matter expert (SME) should be responsible for the definition of the acceptance criteria, which should be documented and approved before the start of acceptance testing.
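• Interpretation: Acceptance criteria should be established for the defined test metrics and used to judge whether the model's performance is acceptable for the intended use. The acceptance criteria may differ for specific subgroups within the intended use. A process subject matter expert (SME) should be responsible for defining the acceptance criteria, which should be documented and approved before acceptance testing starts.
• Example:
◦ For the AI model that identifies tablet appearance defects, the acceptance criteria might be set as "overall accuracy not lower than 98%, and sensitivity for crack defects not lower than 95%".
◦ If tablets come in different colours, the recognition accuracy allowed for red tablets might be slightly lower than for white tablets, because red tablets are harder to inspect optically. Such subgroup-specific criteria must be defined by the SME.
◦ The criteria must be signed off by the SME and fully documented before acceptance testing begins.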
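Once criteria are approved, checking test results against them is a mechanical comparison per subgroup and metric. The sketch below reuses the thresholds from the example (98% overall accuracy, 95% sensitivity for cracks); the observed results and the dictionary layout are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Criteria = Dict[str, Dict[str, float]]
Failure = Tuple[str, str, Optional[float], float]

# Thresholds from the example above; the structure itself is an assumption.
criteria: Criteria = {
    "overall": {"accuracy": 0.98},
    "crack": {"sensitivity": 0.95},
}

# Hypothetical results from an acceptance test run.
results: Criteria = {
    "overall": {"accuracy": 0.985},
    "crack": {"sensitivity": 0.94},
}


def find_failures(observed: Criteria, required: Criteria) -> List[Failure]:
    """Return every (subgroup, metric) that is missing or below its minimum."""
    failures = []
    for subgroup, thresholds in required.items():
        for metric, minimum in thresholds.items():
            value = observed.get(subgroup, {}).get(metric)
            if value is None or value < minimum:
                failures.append((subgroup, metric, value, minimum))
    return failures


failures = find_failures(results, criteria)
print("PASS" if not failures else f"FAIL: {failures}")
```

Treating a missing result as a failure is a deliberate choice in this sketch, so that an untested subgroup cannot pass by omission.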
4.3. No decrease. The acceptance criteria of a model, should be at least as high as the performance of the process it replaces. This implies, that the performance should be known for the process which is to be replaced by a model (see Annex 11 2.7).
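• Interpretation: The acceptance criteria of a model should be at least as high as the performance of the process it replaces. This means the performance of the existing process that the model will replace must be known.
• Example: If an AI model is to replace manual visual inspection of tablets for defects, the model's defect recognition accuracy must at least match, or exceed, that of skilled manual visual inspection. It follows that the performance of manual visual inspection must be clearly quantified before the AI is introduced.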
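The "no decrease" rule can be checked before testing even starts, by comparing each acceptance criterion with the measured performance of the process being replaced. The sketch below assumes both are expressed as minimum values for the same metrics; the baseline figures are invented for illustration.

```python
from typing import Dict, List


def criteria_below_baseline(criteria: Dict[str, float],
                            baseline: Dict[str, float]) -> List[str]:
    """Return the metrics whose criterion is missing or lower than the baseline."""
    return sorted(
        metric for metric, baseline_value in baseline.items()
        if criteria.get(metric, float("-inf")) < baseline_value
    )


# Hypothetical measured performance of manual visual inspection.
manual_inspection = {"sensitivity": 0.93, "specificity": 0.97}

# Hypothetical proposed acceptance criteria for the replacing model.
proposed_criteria = {"sensitivity": 0.95, "specificity": 0.96}

print(criteria_below_baseline(proposed_criteria, manual_inspection))
```

Here the proposed specificity criterion would be flagged, because it sits below what the manual process already achieves.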
------------------------------------------------------------------------
5. Test Data
(Reposted from the WeChat official account OpenPharmSolutions)

xqliu · posted 2025-8-2 10:04:03

Learned a lot, thanks for sharing.
