====== Overview of V&V Techniques ======
{{:en:iot-open:czapka_m.png?50| Masters (2nd level) classification icon }}
Validation and verification (V&V) are critical processes in systems engineering and software development that ensure a system meets its intended purpose and functions reliably. **Verification** evaluates whether a product, service, or system complies with its specified requirements; it asks, "Did we build the system right?" It involves activities such as inspections, simulations, tests, and reviews throughout the development lifecycle. **Validation**, on the other hand, ensures that the final system fulfills its intended use in the real-world environment; it asks, "Did we build the right system?" This typically includes user acceptance testing, field trials, and performance assessments under operational conditions. Together, V&V reduce risks, improve safety and quality, and increase confidence that a system will operate effectively and as expected. In the context of autonomous systems, V&V combines two historical threads: the first comes from mechanical systems engineering, and the second, more recent one, from classical digital decision systems. Finally, AI adds further complexity to the testing of digital decision systems.
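To make the distinction concrete, here is a minimal, hypothetical Python sketch; all names and limit values are invented for illustration. Verification checks a function against its written specification, while validation checks fielded behavior against the intended use.

```python
# Hypothetical sketch: the same braking function is *verified* against a
# written specification, then *validated* against an operational target.

SPEC_MAX_LATENCY_MS = 50          # assumed value from a written requirement

def brake_command_latency_ms(load: float) -> float:
    """Hypothetical stand-in for a measured actuation latency under CPU load."""
    return 20.0 + 25.0 * load

def verify() -> bool:
    """Did we build the system right? Check against the spec."""
    return all(brake_command_latency_ms(load) <= SPEC_MAX_LATENCY_MS
               for load in (0.0, 0.5, 1.0))

def validate(field_stopping_distances_m: list[float]) -> bool:
    """Did we build the right system? Check intended use in the field."""
    return max(field_stopping_distances_m) <= 40.0   # assumed operational target

print(verify())                        # spec compliance
print(validate([31.2, 35.8, 38.9]))    # operational fitness
```

A system can pass verification (it meets its spec) and still fail validation (the spec did not capture the real operational need), which is why both activities are required.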
For traditional safety-critical systems in automotive, the evolution of V&V has been closely linked to regulatory standards frameworks such as ISO 26262. Key elements of this framework include:
  - System Design Process: a structured development assurance approach for complex systems, incorporating safety certification within the integrated development process.
  - Formalization: the formal definition of system operating conditions, functionalities, expected behaviors, risks, and hazards that must be mitigated.
  - Lifecycle Management: the management of components, systems, and development processes throughout their lifecycle.
The primary objective was to meticulously and formally define the system design, anticipate expected behaviors and potential issues, and comprehend the impact over the product's lifespan.

With the advent of conventional software paradigms, safety-critical V&V adapted by preserving the original system design approach while integrating software as system components. These software components maintained the same overall structure of fault analysis, lifecycle management, and hazard analysis within system design. However, certain aspects required extension. For instance, in the airborne domain, standard DO-178C, "Software Considerations in Airborne Systems and Equipment Certification," updated the concept of a hazard from physical failure mechanisms to functional defects, acknowledging that software does not degrade due to physical processes. Lifecycle management concepts were also revised to reflect traditional software development practices. Design Assurance Levels (DALs) were incorporated, allowing the integration of software components into system design, functional allocation, performance specification, and the V&V process, akin to SOTIF in the automotive industry.
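As a sketch of how DALs tie rigor to hazard severity: DO-178C assigns levels A through E according to the worst-case failure condition of the software. The mapping below is the standard one; the function name is ours.

```python
# DO-178C Design Assurance Levels keyed by failure-condition severity.
DAL_BY_FAILURE_CONDITION = {
    "catastrophic": "A",   # most rigorous objectives
    "hazardous":    "B",
    "major":        "C",
    "minor":        "D",
    "no effect":    "E",   # no certification objectives
}

def required_dal(failure_condition: str) -> str:
    """Look up the DAL for a given failure-condition category."""
    return DAL_BY_FAILURE_CONDITION[failure_condition.lower()]

print(required_dal("Catastrophic"))   # → A
```

The level then drives how many verification objectives apply and how much independence is required when meeting them.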
| |
{{:en:safeav:as:table1.png?400|}}
| |
Table 1 above shows the difference between ISO 26262 and SOTIF. In general, the fundamental characteristics of digital software systems are problematic in safety-critical systems. However, the IT sector has been a key megatrend that has transformed the world over the last 50 years. In the process, it has developed large ecosystems around semiconductors, operating systems, communications, and application software. At this point, using these ecosystems is critical to nearly every product's success, so mixed-domain safety-critical products are now a reality. Mixed-domain structures can be classified into three broad paradigms, each of which has very different V&V requirements: mechanical replacement (big physical, small digital), electronics adjacent (separate physical and digital), and autonomy (big digital, small physical).
Drive-by-wire functionality is an example of the mechanical replacement paradigm, where the original mechanical functionality is implemented by electronic components (HW/SW). In their initial configurations, these mixed electronic/mechanical systems were physically separated as independent subsystems. In this configuration, the V&V process looked very similar to the traditional mechanical verification process.
| |
The paradigm of separate physical subsystems has the advantage of V&V simplification and safety, but the large disadvantage of component skew and material cost. Thus, an emerging trend has been to build underlying computational fabrics with networking and to separate functionality virtually (through software). From a V&V perspective, this means that the virtual backbone which maintains this separation (e.g., an RTOS) must be verified to a very high standard.

Infotainment systems are an example of electronics adjacent integration. Generally, there is an independent IT infrastructure working alongside the safety-critical infrastructure, and from a V&V perspective, they can be validated separately. However, infotainment systems bring very powerful communication technologies (5G, Bluetooth, etc.) through which the cyber-physical system can be affected by external third parties. From a safety perspective, the simplest way to maintain safety would be to physically separate these systems. However, this is typically not done because a connection is required to provide "over-the-air" updates to the device. Thus, the V&V capability must again verify that the virtual safeguards against malicious intent are robust.
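As one illustration of such a virtual safeguard, the sketch below authenticates an over-the-air update with a keyed MAC before the safety-critical side accepts it. This is a minimal sketch using Python's standard ''hmac'' module; the key, image contents, and function names are assumptions, and a production system would use asymmetric signatures and secure key storage.

```python
import hashlib
import hmac

SHARED_KEY = b"example-provisioned-key"   # assumption: provisioned at manufacture

def sign_image(image: bytes) -> str:
    """Compute an HMAC-SHA256 tag over a firmware image."""
    return hmac.new(SHARED_KEY, image, hashlib.sha256).hexdigest()

def accept_update(image: bytes, received_tag: str) -> bool:
    """Accept an OTA image only if its tag verifies (constant-time compare)."""
    return hmac.compare_digest(sign_image(image), received_tag)

firmware = b"v2.1 brake-controller image"
tag = sign_image(firmware)
print(accept_update(firmware, tag))             # untampered image accepted
print(accept_update(firmware + b"!", tag))      # tampered image rejected
```

The V&V task is then to show that every path from the connected side to the safety-critical side passes through a check of this kind.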
Finally, the last level of integration is autonomy. In autonomy, the processes of sensing, perception, location services, and path planning envelop the traditional mechanical functionality.
| |
{{:en:safeav:avt:ai_sw_dev.jpg?600|}}
| |
Moving beyond software, AI has built a "learning" paradigm. In this paradigm, there is a training period during which the AI machine "learns" from data to build its own rules; learning here is defined on top of traditional optimization algorithms that try to minimize some notion of error. This is effectively data-driven software development, as shown in the figure above.
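The "learning as optimization" idea can be sketched in a few lines: fit a rule (here, a line) from data by gradient descent on squared error. This toy example is ours, not from the source.

```python
# Learn y = w*x + b from samples of y = 2x + 1 by minimizing mean squared
# error with plain gradient descent: the "rule" is extracted from data.
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]

w, b, lr = 0.0, 0.0, 0.1
for _ in range(2000):
    # Gradients of mean squared error with respect to w and b.
    dw = sum(2 * ((w * x + b) - y) * x for x, y in data) / len(data)
    db = sum(2 * ((w * x + b) - y) for x, y in data) / len(data)
    w, b = w - lr * dw, b - lr * db

print(round(w, 2), round(b, 2))   # learned rule approaches w=2, b=1
```

No one wrote the rule "multiply by 2 and add 1"; the optimizer recovered it from data, which is exactly what makes the resulting behavior hard to inspect and verify at scale.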
However, there are profound differences between AI software and conventional software, and the introduction of AI-generated components raises significant issues for the V&V task, as shown in Table 2 below. These differences have generated three "elephants in the room": AI component validation, AI specification, and intelligent scaling.
{{:en:safeav:avt:table2.png?600|}}
| |
===== Intelligent Test Generation =====

Recognizing the importance of intelligent scenarios for testing, three major styles of intelligent test generation are currently active: physical testing, real-world seeding, and virtual testing.
==== Physical Testing ====

Typically, physical testing is the most expensive way to verify functionality. However, Tesla has built a flow in which its existing fleet acts as a large distributed testbed. Using this fleet, Tesla's approach to autonomous driving uses a sophisticated data pipeline and deep-learning system designed to process vast amounts of sensor data efficiently [23]. In this flow, the scenario under construction is the one driven by the driver, and the criterion for correctness is the driver's corrective action. Behind the scenes, the MaVV flow can be managed by large databases and supercomputers (Dojo) [24]. With this methodology, Tesla knows that its scenarios are always valid. However, there are challenges with this approach. First, the real world moves very slowly in terms of new unique situations. Second, by definition, the scenarios seen are closely tied to Tesla's market presence and thus are not predictive of new situations. Finally, the process of capturing data, discerning an error, and building a corrective action is non-trivial. At the extreme, this process is akin to taking crash logs from broken computers, diagnosing them, and building the fixes.
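The fleet-as-testbed idea can be sketched as mining drive logs for moments where the driver overrode the system and treating those as candidate test scenarios. The log format and field names below are invented for illustration.

```python
# Hypothetical fleet log: each entry records whether autonomy was engaged
# and whether the driver took a corrective action at that moment.
logs = [
    {"t": 10.0, "autonomy_on": True,  "driver_override": False},
    {"t": 10.1, "autonomy_on": True,  "driver_override": True},   # corrective action
    {"t": 10.2, "autonomy_on": False, "driver_override": False},
]

def candidate_scenarios(logs):
    """Timestamps where the driver corrected the system while it was engaged."""
    return [e["t"] for e in logs if e["autonomy_on"] and e["driver_override"]]

print(candidate_scenarios(logs))   # moments worth replaying in simulation
```

The hard part, as noted above, is everything after this filter: reconstructing the scene, diagnosing the error, and building the fix.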
==== Real-World Seeding ====

Another line of test generation uses physical situations as seeds for further virtual testing. PEGASUS, the seminal project initiated in Germany, took such an approach. The project emphasized a scenario-based testing methodology which used observed data from real-world conditions as a base [25]. A similar effort comes from the University of Warwick, with a focus on test environments, safety analysis, scenario-based testing, and safe AI. One of the contributions from Warwick is the Safety Pool Scenario Database [26]. Databases and seeding methods, especially of interesting situations, offer some value, but their completeness is not clear. Further, databases of tests are very susceptible to being over-optimized against by AI algorithms.
==== Virtual Testing ====

Another important contribution is ASAM OpenSCENARIO 2.0 [27], a domain-specific language designed to enhance the development, testing, and validation of Advanced Driver-Assistance Systems (ADAS) and Automated Driving Systems (ADS). A high-level language allows a symbolic, higher-level description of the scenario, with the ability to grow in complexity through rules of composition. Underneath the symbolic apparatus is pseudo-random test generation, which can scale the scenario-generation process. The randomness also offers a chance to expose "unknown-unknown" errors.
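The flavor of seeded pseudo-random, compositional scenario generation can be sketched as follows. The parameter names are illustrative only and are not the actual OpenSCENARIO 2.0 schema.

```python
import random

# Scenario building blocks that compose into concrete test cases.
WEATHER = ["clear", "rain", "fog"]
ACTORS  = ["pedestrian", "cyclist", "stopped car"]

def generate_scenario(rng: random.Random) -> dict:
    """Compose one concrete scenario from the building blocks above."""
    return {
        "weather": rng.choice(WEATHER),
        "actor": rng.choice(ACTORS),
        "ego_speed_kph": rng.randint(20, 120),
        "actor_distance_m": rng.randint(5, 150),
    }

rng = random.Random(42)   # fixed seed: any failing scenario is reproducible
suite = [generate_scenario(rng) for _ in range(1000)]

# A random sweep can surface combinations no one thought to write by hand,
# e.g. high speed with a close-range actor in fog.
risky = [s for s in suite
         if s["weather"] == "fog"
         and s["ego_speed_kph"] > 100
         and s["actor_distance_m"] < 30]
print(len(suite))
```

Fixing the seed makes the pseudo-random suite reproducible, which is what lets a failure found at scale be replayed and debugged.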
Beyond component validation, solutions have been proposed specifically for autonomous systems, such as UL 4600, "Standard for Safety for the Evaluation of Autonomous Products" [28]. Similar to ISO 26262/SOTIF, UL 4600 focuses on safety risks across the full lifecycle of the product and introduces a structured "safety case" approach. The crux of this methodology is to document and justify how autonomous systems meet safety goals. It also emphasizes the importance of identifying and validating against a wide range of real-world scenarios, including edge cases and rare events, and it includes attention to human-machine interactions. UL 4600 is a good step forward, but in the end, it is a process standard and does not offer advice on how exactly to solve the "elephants in the room" for AI validation.

Overall, nearly all current standards and regulations are process-centric. They focus on the product developer making an argument and obtaining approval either through self-certification or from an explicit regulator. This methodology has an Achilles heel: the product owner has no method to get past the critical issues, nor does the regulator have a way to assess completeness.

All of these techniques have moved the state of the art forward, but a very fundamental issue remains. For both physical and virtual execution, how does one scale sufficiently to reasonably explore the operational design domain (ODD)? Further, when performing virtual execution, what level of abstraction is appropriate? Is it better to have abstract models or highly detailed physics-based models? Typically, the answer depends on the nature of the verification. If so, how do these abstraction levels connect to each other? A key missing piece is the ability to split the problem into manageable pieces and then recompose the result. This capability has not yet been developed for cyber-physical systems, although it has been developed for semiconductor designs.