Open issues in validating AI components



A. AI COMPONENT VALIDATION

Both the automotive and airborne domains have reacted to AI by viewing it as "specialized software" in standards such as ISO 8800 [14] and [13]. This approach has the great utility of leveraging all the past work in generic mechanical safety and in software validation. However, one must now manage the fact that the "code" is generated from data rather than written as conventional program code. In the world of V&V, this difference manifests in three significant aspects: coverage analysis, code reviews, and version control.

TABLE III
^ V&V Technique ^ Software ^ AI/ML ^
| Coverage analysis | Code structure provides the basis of coverage | No structure |
| Code reviews | Crowd-source expert knowledge | No code to review |
| Version control | Careful construction/release | Very difficult with data |
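To make the coverage-analysis row concrete, research prototypes have proposed structural analogues such as neuron coverage: the fraction of a network's units exercised by a test suite, in the spirit of statement coverage. Below is a minimal sketch assuming a tiny NumPy MLP; the network, weights, and test data are all hypothetical.

<code python>
import numpy as np

# Minimal sketch of "neuron coverage", one proposed AI/ML analogue of code
# coverage. The network, weights, and test data below are all hypothetical.

rng = np.random.default_rng(0)

# Tiny 2-layer MLP: 8 inputs -> 16 hidden units -> 3 outputs.
W1, b1 = rng.normal(size=(8, 16)), np.zeros(16)
W2, b2 = rng.normal(size=(16, 3)), np.zeros(3)

def hidden_activations(x):
    """ReLU activations of the hidden layer for one input vector."""
    return np.maximum(0.0, x @ W1 + b1)

def neuron_coverage(test_set, threshold=0.0):
    """Fraction of hidden neurons that fire above `threshold` on at least
    one test input; the rough analogue of statement coverage."""
    covered = np.zeros(16, dtype=bool)
    for x in test_set:
        covered |= hidden_activations(x) > threshold
    return covered.mean()

tests = rng.normal(size=(100, 8))   # stand-in for a real test suite
print(f"neuron coverage: {neuron_coverage(tests):.0%}")
</code>

Note that even 100% neuron coverage supports no completeness argument, which is precisely the gap the table captures.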

These differences create an enormous problem for intelligent test generation and for any argument of completeness. This is an area of active research, and two threads have emerged:

1) Training Set Validation: Since the final trained component is very hard to analyze, one approach is to examine the training set and the ODD (operational design domain) to find interesting tests which may expose the cracks between them [16].

2) Robustness to Noise: Either through simulation or using formal methods [17], the approach is to assert various higher-level properties and use these to test the component. An example in object recognition might be to assert the property that an object should be recognized independent of its orientation; a sketch of such a check follows below.

Overall, developing robust methods for AI component validation remains an active and unsolved research topic even for "fixed"-function AI components, that is, components whose function changes only under active version control. Of course, many AI applications prefer a model where the AI component is constantly morphing. Validating the morphing situation is a topic of future research.
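As a concrete illustration of the robustness thread, the sketch below encodes the rotation property as a metamorphic test. The `classify` function is a hypothetical stand-in for the AI component under test; a real harness would wrap the actual model and richer perturbations (noise, occlusion, lighting).

<code python>
import numpy as np

def classify(img):
    """Hypothetical stand-in for the AI component under test;
    returns a class label for an HxW image."""
    return int(img.sum() > 0)

def check_rotation_invariance(images):
    """Metamorphic test: the label must not change when the input is
    rotated. Returns the (image index, rotation) pairs that violate it."""
    failures = []
    for i, img in enumerate(images):
        base = classify(img)
        for k in (1, 2, 3):                      # 90, 180, 270 degrees
            if classify(np.rot90(img, k)) != base:
                failures.append((i, 90 * k))
    return failures

rng = np.random.default_rng(1)
suite = [rng.normal(size=(32, 32)) for _ in range(20)]   # stand-in test images
bad = check_rotation_invariance(suite)
print(f"{len(bad)} rotation-invariance violations in {3 * len(suite)} checks")
</code>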

B. AI SPECIFICATION

For well-defined systems with available system-level abstractions, AI/ML components significantly increase the difficulty of intelligent test generation. With a golden spec, one can follow a structured process to make significant progress in validation and even gate the AI results with conventional safeguards. Unfortunately, one of the most compelling uses of AI is to employ it in situations where the specification of the system is not well defined or not viable using conventional programming. In these Specification-Less ML (SLML) situations, not only is building interesting tests difficult, but evaluating the correctness of the results creates further difficulty. Moreover, most of the major systems in autonomous vehicles (perception, location services, path planning, etc.) fall into this category of system function and AI usage. To date, there have been two approaches to attacking the lack-of-specification problem: Anti-Spec and AI-Driver.

1) Anti-Spec: In these situations, the only approach left is to specify correctness through an anti-spec. The simplest anti-spec is to avoid accidents. Based on some initial work by Intel, there is a standard, IEEE 2846, "Assumptions in Safety-Related Models for Automated Driving Systems" [18], which establishes a framework for defining a minimum set of assumptions regarding the reasonably foreseeable behaviors of other road users. For each scenario, it specifies assumptions about the kinematic properties of other road users, including their speed, acceleration, and possible maneuvers (a sketch of how such assumptions might be checked follows below). Challenges include an argument for completeness, a specification of the machinery for checking against the standard, and the connection to a liability governance framework.
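To illustrate how an anti-spec of the IEEE 2846 kind might be checked mechanically, the sketch below encodes reasonably foreseeable kinematic bounds for another road user and replays a logged trace against them. The dataclass and all numeric bounds are illustrative assumptions, not values from the standard.

<code python>
from dataclasses import dataclass

@dataclass
class KinematicAssumptions:
    """Hypothetical encoding of IEEE 2846-style assumptions about another
    road user. The numeric bounds below are illustrative, not normative."""
    max_speed_mps: float     # reasonably foreseeable top speed
    max_accel_mps2: float    # forward acceleration bound
    max_decel_mps2: float    # braking bound (positive magnitude)

CAR = KinematicAssumptions(max_speed_mps=55.0, max_accel_mps2=4.0,
                           max_decel_mps2=9.0)

def within_assumptions(speed, accel, bounds):
    """True if an observed (speed, acceleration) sample lies inside the
    assumed envelope; behavior outside it is, per the anti-spec, beyond
    what the AV is required to anticipate."""
    if not 0.0 <= speed <= bounds.max_speed_mps:
        return False
    if accel >= 0.0:
        return accel <= bounds.max_accel_mps2
    return -accel <= bounds.max_decel_mps2

# Replay a logged scenario trace and flag out-of-envelope samples.
trace = [(30.0, 1.5), (32.0, 3.8), (31.0, -11.2)]   # (speed, accel) samples
violations = [s for s in trace if not within_assumptions(*s, CAR)]
print(f"{len(violations)} sample(s) outside the assumed envelope: {violations}")
</code>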

2) AI-Driver: While IEEE 2846 comes from a bottom-up technology perspective, Koopman/Widen [19] have proposed the concept of defining an AI driver that must replicate all the competencies of a human driver in a complex, real-world environment. Key points of Koopman's AI driver concept include:

a) Full Driving Capability: The AI driver must handle the entire driving task, including perception (sensing the environment), decision-making (planning and responding to scenarios), and control (executing physical movements like steering and braking). It must also account for nuances like social driving norms and unexpected events.

b) Safety Assurance: Koopman stresses that AVs need rigorous safety standards, similar to those in industries like aviation. This includes identifying potential failures, managing risks, and ensuring safe operation even in the face of unforeseen events.

c) Human Equivalence: The AI driver must meet or exceed the performance of a competent human driver. This involves adhering to traffic laws, responding to edge cases (rare or unusual driving scenarios), and maintaining situational awareness at all times.

d) Ethical and Legal Responsibility: An AI driver must operate within ethical and legal frameworks, including handling situations that involve moral decisions or liability concerns.

e) Testing and Validation: Koopman emphasizes the importance of robust testing, simulation, and on-road trials to validate AI driver systems. This includes covering edge cases and long-tail risks and ensuring that systems generalize across diverse driving conditions.

Overall, this is a very ambitious endeavor, and there are significant challenges to building such a specification of a reasonable driver. First, the idea of a "reasonable" driver is not well encoded even on the human side; rather, this definition of "reasonableness" has been built over a long history of legal distillation, and of course the human standard rests on the understanding of humans by other humans. Second, the complexity of such a standard would be very high, and it is not clear whether it is achievable. Finally, it may take a long period of legal distillation to reach some level of closure on a human-like "AI driver."

Currently, the state of the art in specification is relatively poor for both ADAS and AVs. ADAS systems, which are widely proliferated, show massive divergences in behavior and completeness: when customers buy ADAS, it is not entirely clear what they are getting. Tests by industry groups such as AAA, Consumer Reports, and IIHS have shown the significant shortcomings of existing solutions [20]. In 2024, IIHS introduced a ratings program to evaluate the safeguards of partial driving automation systems; out of 14 systems tested, only one received an acceptable rating, highlighting the need for improved measures to prevent misuse and ensure driver engagement [21]. Today, there is only one non-process-oriented regulation in the marketplace: the NHTSA regulation around automatic emergency braking (AEB) [22]. A sketch of how such a requirement might be scripted as a test follows below.
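As a small illustration of what a non-process-oriented requirement can look like in executable form, the sketch below scripts a time-to-collision (TTC) style AEB check. The trigger logic, threshold, and scenario values are illustrative assumptions, not the actual NHTSA requirements.

<code python>
TTC_THRESHOLD_S = 1.5   # illustrative pass/fail line, not the NHTSA value

def ttc(gap_m, closing_speed_mps):
    """Time to collision; infinite if the gap is not closing."""
    return gap_m / closing_speed_mps if closing_speed_mps > 0 else float("inf")

def aeb_under_test(gap_m, closing_speed_mps):
    """Hypothetical stand-in for the vehicle's AEB decision logic."""
    return ttc(gap_m, closing_speed_mps) < 2.0    # brakes when TTC < 2 s

def run_scenario(initial_gap_m, ego_speed_mps, dt=0.1):
    """Close on a stopped obstacle at constant speed; fail if TTC crosses
    the threshold before the system ever commands braking."""
    gap = initial_gap_m
    while gap > 0.0:
        t = ttc(gap, ego_speed_mps)
        if aeb_under_test(gap, ego_speed_mps):
            return f"PASS: braking commanded at TTC = {t:.2f} s"
        if t < TTC_THRESHOLD_S:
            return "FAIL: threshold crossed with no braking command"
        gap -= ego_speed_mps * dt
    return "FAIL: collision"

print(run_scenario(initial_gap_m=40.0, ego_speed_mps=20.0))
</code>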
