  * Behavioral prediction – anticipating the likely trajectories and intentions of dynamic agents.
  
===== Deep Learning Architectures =====
  
Deep learning architectures form the computational backbone of AI-based perception systems in autonomous vehicles.
Alternatively, point-based networks like ''PointNet'' and ''PointNet++'' operate directly on raw point sets without voxelization, preserving fine geometric detail.
These models are critical for estimating the shape and distance of objects in 3D space, especially under challenging lighting or weather conditions.

{{ :en:safeav:maps:cnn.webp?400 |}}
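The core idea behind point-based networks can be illustrated with a small PyTorch sketch (a simplified, hypothetical module, not the published ''PointNet'' code): a shared MLP processes every point independently, and a symmetric max-pooling step produces an order-invariant global feature.

<code python>
import torch
import torch.nn as nn

class TinyPointNet(nn.Module):
    """Simplified PointNet-style encoder: shared per-point MLP + symmetric max-pooling."""
    def __init__(self, in_dim: int = 3, feat_dim: int = 256, num_classes: int = 10):
        super().__init__()
        # The same MLP is applied to every point independently (weight sharing).
        self.point_mlp = nn.Sequential(
            nn.Linear(in_dim, 64), nn.ReLU(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        # points: (batch, num_points, 3) raw coordinates, no voxelization required
        per_point = self.point_mlp(points)          # (batch, num_points, feat_dim)
        global_feat, _ = per_point.max(dim=1)       # order-invariant aggregation
        return self.classifier(global_feat)         # (batch, num_classes)

# Example: classify a batch of 2 point clouds with 1024 points each
logits = TinyPointNet()(torch.randn(2, 1024, 3))
</code>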
  
=== Transformer Architectures ===
In autonomous driving, transformers are used for **feature fusion**, **bird’s-eye-view (BEV) mapping**, and **trajectory prediction**.
Notable examples include ''DETR'' (Detection Transformer), ''BEVFormer'', and ''TransFusion'', which unify information from cameras and LiDARs into a consistent spatial representation.
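As a rough illustration of the query-based designs mentioned above, the following PyTorch sketch shows learned queries attending over camera and LiDAR tokens (a minimal, hypothetical module, not the actual ''DETR'' or ''BEVFormer'' implementation):

<code python>
import torch
import torch.nn as nn

class QueryFusion(nn.Module):
    """Learned object/BEV queries attend to camera and LiDAR feature tokens."""
    def __init__(self, dim: int = 128, num_queries: int = 100, num_heads: int = 8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, dim))
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(dim, dim * 4), nn.ReLU(), nn.Linear(dim * 4, dim))

    def forward(self, cam_tokens: torch.Tensor, lidar_tokens: torch.Tensor) -> torch.Tensor:
        # cam_tokens: (batch, N_cam, dim), lidar_tokens: (batch, N_lidar, dim)
        tokens = torch.cat([cam_tokens, lidar_tokens], dim=1)    # joint key/value set
        q = self.queries.unsqueeze(0).expand(tokens.size(0), -1, -1)
        fused, _ = self.cross_attn(q, tokens, tokens)            # queries attend to both modalities
        return fused + self.ffn(fused)                           # (batch, num_queries, dim)

# Example: 2 samples, 600 camera tokens and 400 LiDAR tokens of width 128
out = QueryFusion()(torch.randn(2, 600, 128), torch.randn(2, 400, 128))
</code>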
  
=== Recurrent and Temporal Models ===
They are common in **object tracking** and **motion prediction**, where maintaining consistent identities and velocities of moving objects over time is essential.
More recent architectures use temporal convolutional networks or transformers to achieve similar results with greater parallelism and stability.

{{ :en:safeav:maps:lstm.png?400 |}}
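A minimal sketch of how a recurrent model can be used for motion prediction (hypothetical input shapes and prediction horizon, PyTorch):

<code python>
import torch
import torch.nn as nn

class TrajectoryLSTM(nn.Module):
    """Encode an observed track with an LSTM and regress a short future trajectory."""
    def __init__(self, hidden: int = 64, horizon: int = 12):
        super().__init__()
        self.horizon = horizon
        self.encoder = nn.LSTM(input_size=2, hidden_size=hidden, batch_first=True)
        self.decoder = nn.Linear(hidden, horizon * 2)    # predict (x, y) per future step

    def forward(self, history: torch.Tensor) -> torch.Tensor:
        # history: (batch, T_obs, 2) past positions of one tracked agent
        _, (h_n, _) = self.encoder(history)
        future = self.decoder(h_n[-1])                   # use the final hidden state
        return future.view(-1, self.horizon, 2)          # (batch, horizon, 2)

# Example: predict 12 future positions from 8 observed ones
pred = TrajectoryLSTM()(torch.randn(4, 8, 2))
</code>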
  
=== Graph Neural Networks (GNNs) ===
For instance, a CNN may extract image features, a point-based network may process LiDAR geometry, and a transformer may fuse both into a joint representation.
These hierarchical and multimodal architectures enable robust perception across varied environments and sensor conditions, providing the high-level scene understanding required for safe autonomous behavior.
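Schematically, such a hybrid pipeline can be wired together as in the following sketch (purely illustrative tiny modules; real systems add calibration, projection, and BEV transforms):

<code python>
import torch
import torch.nn as nn

class HybridPerception(nn.Module):
    """Illustrative multimodal pipeline: CNN for images, point MLP for LiDAR, transformer fusion."""
    def __init__(self, dim: int = 128):
        super().__init__()
        self.img_backbone = nn.Sequential(                  # tiny CNN stand-in for an image backbone
            nn.Conv2d(3, dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(dim, dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.point_encoder = nn.Sequential(nn.Linear(3, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.fusion = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True), num_layers=2
        )

    def forward(self, image: torch.Tensor, points: torch.Tensor) -> torch.Tensor:
        # image: (batch, 3, H, W); points: (batch, N, 3)
        img_feat = self.img_backbone(image)                 # (batch, dim, H/4, W/4)
        img_tokens = img_feat.flatten(2).transpose(1, 2)    # (batch, H*W/16, dim)
        pt_tokens = self.point_encoder(points)              # (batch, N, dim)
        tokens = torch.cat([img_tokens, pt_tokens], dim=1)  # joint token set
        return self.fusion(tokens)                          # fused scene representation

fused = HybridPerception()(torch.randn(1, 3, 64, 96), torch.randn(1, 512, 3))
</code>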

==== Data Requirements ====

The effectiveness of **AI-based perception** systems depends fundamentally on the quality, diversity, and management of data used throughout their development lifecycle.
Because deep neural networks do not rely on explicit programming but instead learn to interpret the environment from large, annotated datasets, data becomes the foundation of reliable perception for autonomous vehicles.

Robust perception requires exposure to the full range of operating conditions that a vehicle may encounter.
Datasets must include variations in:

  * **Sensor modalities** – data from cameras, LiDAR, radar, GNSS, and IMU, reflecting the multimodal nature of perception.
  * **Environmental conditions** – daytime and nighttime scenes, different seasons, weather effects such as rain, fog, or snow.
  * **Geographical and cultural contexts** – urban, suburban, and rural areas; diverse traffic rules and road signage conventions.
  * **Behavioral diversity** – normal driving, aggressive maneuvers, and rare events such as jaywalking or emergency stops.
  * **Edge cases** – rare but safety-critical situations, including near-collisions or sensor occlusions.

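One practical way to keep such coverage measurable is to tag every recorded sequence with metadata along the axes above and audit the distribution before training. A minimal sketch (hypothetical field names, not a standard schema):

<code python>
from collections import Counter
from dataclasses import dataclass

@dataclass
class SequenceMeta:
    """Per-sequence metadata along the coverage axes (illustrative fields only)."""
    sensors: tuple        # e.g. ("camera", "lidar", "radar")
    weather: str          # e.g. "clear", "rain", "fog"
    time_of_day: str      # e.g. "day", "night"
    region: str           # e.g. "urban", "rural"
    behavior_tags: tuple  # e.g. ("jaywalking",), or () for nominal driving

def coverage_report(sequences: list[SequenceMeta]) -> dict:
    """Count how often each condition occurs, to reveal under-represented cases."""
    return {
        "weather": Counter(s.weather for s in sequences),
        "time_of_day": Counter(s.time_of_day for s in sequences),
        "region": Counter(s.region for s in sequences),
        "rare_events": Counter(tag for s in sequences for tag in s.behavior_tags),
    }

# Example: two sequences, one nominal, one with a rare event at night
report = coverage_report([
    SequenceMeta(("camera", "lidar"), "clear", "day", "urban", ()),
    SequenceMeta(("camera", "lidar"), "rain", "night", "rural", ("jaywalking",)),
])
</code>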
A balanced dataset should capture both common and unusual situations to ensure that perception models generalize safely beyond the training distribution.
Because collecting real-world data for every possible scenario is impractical, simulated or synthetic data are often used to supplement real-world datasets.
Photorealistic simulators such as ''CARLA'', ''LGSVL'', or ''AirSim'' allow the generation of labeled sensor data under controlled conditions, including rare or hazardous events.
Synthetic data helps to fill gaps in real-world coverage and supports transfer learning, though domain adaptation is often required to mitigate the so-called ''sim-to-real gap'', i.e. the differences between simulated and actual sensor distributions.
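As an illustration, the ''CARLA'' Python API can record camera frames from a simulated vehicle (a minimal sketch assuming a CARLA server running locally on the default port; ground-truth labels would be derived from the simulator state):

<code python>
import carla

# Connect to a locally running CARLA simulator (default port 2000)
client = carla.Client("localhost", 2000)
client.set_timeout(10.0)
world = client.get_world()

# Spawn a vehicle and attach an RGB camera to it
blueprints = world.get_blueprint_library()
vehicle_bp = blueprints.filter("vehicle.*")[0]
spawn_point = world.get_map().get_spawn_points()[0]
vehicle = world.spawn_actor(vehicle_bp, spawn_point)

camera_bp = blueprints.find("sensor.camera.rgb")
camera_bp.set_attribute("image_size_x", "800")
camera_bp.set_attribute("image_size_y", "600")
camera_tf = carla.Transform(carla.Location(x=1.5, z=2.4))
camera = world.spawn_actor(camera_bp, camera_tf, attach_to=vehicle)

# Save every frame to disk while the vehicle drives on autopilot
camera.listen(lambda image: image.save_to_disk("out/%06d.png" % image.frame))
vehicle.set_autopilot(True)
</code>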

=== Annotation and Labeling ===
Supervised learning models rely on accurately annotated datasets, where each image, frame, or point cloud is labeled with semantic information such as object classes, bounding boxes, or segmentation masks.
Annotation quality is critical: inconsistent or noisy labels can propagate systematic errors through the learning process.
Modern annotation pipelines combine human labeling with automation, using pre-trained models, interactive tools, and active learning to accelerate the process.
High-precision labeling is particularly demanding for LiDAR point clouds and multi-sensor fusion datasets, where 3D geometric consistency must be maintained across frames.
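The output of such a pipeline is typically a structured label per frame or point cloud. A minimal sketch of a 3D bounding-box annotation record (illustrative fields, not a specific dataset format):

<code python>
from dataclasses import dataclass, field

@dataclass
class BoundingBox3D:
    """One annotated object in a LiDAR frame (fields are illustrative)."""
    category: str                         # e.g. "car", "pedestrian", "cyclist"
    center: tuple[float, float, float]    # x, y, z in the sensor frame [m]
    size: tuple[float, float, float]      # length, width, height [m]
    yaw: float                            # heading angle [rad]
    track_id: int                         # identity kept consistent across frames
    num_lidar_points: int = 0             # helps flag boxes with too little evidence

@dataclass
class FrameAnnotation:
    frame_id: str
    timestamp: float
    boxes: list[BoundingBox3D] = field(default_factory=list)

# Example: one frame with a single labeled car
frame = FrameAnnotation("000042", 1716200000.0, [
    BoundingBox3D("car", (12.3, -1.8, 0.9), (4.5, 1.9, 1.6), 0.02,
                  track_id=7, num_lidar_points=153),
])
</code>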

=== Ethical and Privacy Considerations ===
Data used in autonomous driving frequently includes imagery of people, vehicles, and property.
To comply with privacy regulations and ethical standards, datasets must be anonymized by blurring faces and license plates, encrypting location data, and maintaining secure data storage.
Fairness and inclusivity in dataset design are equally important to prevent bias across geographic regions or demographic contexts.
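Anonymization is often automated as a preprocessing step: faces and license plates are detected and blurred before the data enters the training set. A minimal sketch with ''OpenCV'' (the detector that produces the boxes is assumed and not shown):

<code python>
import cv2
import numpy as np

def blur_regions(image: np.ndarray, boxes: list[tuple[int, int, int, int]]) -> np.ndarray:
    """Blur detected faces / license plates given as (x, y, w, h) boxes."""
    out = image.copy()
    for x, y, w, h in boxes:
        roi = out[y:y + h, x:x + w]
        # A strong Gaussian blur makes the region unrecognizable while keeping the image layout
        out[y:y + h, x:x + w] = cv2.GaussianBlur(roi, (51, 51), 0)
    return out

# Example: blur one hypothetical plate region in a frame loaded from disk
frame = cv2.imread("frame.png")          # assumes the file exists
anonymized = blur_regions(frame, [(420, 310, 120, 40)])
cv2.imwrite("frame_anonymized.png", anonymized)
</code>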
  
==== Scene Understanding ====
Spatial relations describe, e.g., mutual distance, relative velocity, and possible collision of trajectories. Functional relations describe when one entity modifies, limits, or restricts the functions of another, e.g., traffic lanes modify the movement of vehicles, railings restrict the movement of pedestrians, etc.
  
These relations can be explicitly represented by scene graphs, where nodes represent entities and edges represent relationships, or encoded in different types of neural networks, e.g., visual-language models.

{{:en:safeav:maps:scenegraph.jpg?400|}}
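A scene graph can be represented very directly as typed nodes with labeled edges. A minimal sketch using ''networkx'' (hypothetical entity and relation names):

<code python>
import networkx as nx

# Nodes are scene entities with attributes; directed edges carry relation labels.
scene = nx.DiGraph()
scene.add_node("ego", category="vehicle", speed_mps=8.3)
scene.add_node("car_1", category="vehicle", speed_mps=6.1)
scene.add_node("ped_1", category="pedestrian", speed_mps=1.4)
scene.add_node("lane_2", category="lane")
scene.add_node("crosswalk_1", category="crosswalk")

# Spatial relations (distances, potential conflicts) and functional relations (constraints)
scene.add_edge("ego", "car_1", relation="follows", distance_m=18.0)
scene.add_edge("car_1", "lane_2", relation="drives_in")            # the lane constrains the car's motion
scene.add_edge("ped_1", "crosswalk_1", relation="approaches", distance_m=2.5)
scene.add_edge("ped_1", "ego", relation="possible_conflict")

# Downstream planning can query the graph, e.g. all potential conflicts with the ego vehicle
conflicts = [u for u, v, d in scene.in_edges("ego", data=True)
             if d["relation"] == "possible_conflict"]
</code>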
  
Scene understanding must maintain temporal stability across frames. Flickering detections or inconsistent semantic labels can lead to unstable planning.
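A common mitigation is to smooth per-object semantics over a short temporal window instead of trusting single-frame outputs. A minimal sketch (hypothetical tracker interface):

<code python>
from collections import Counter, defaultdict, deque

class LabelSmoother:
    """Stabilize per-track class labels by majority vote over the last N frames."""
    def __init__(self, window: int = 10):
        self.history = defaultdict(lambda: deque(maxlen=window))

    def update(self, track_id: int, raw_label: str) -> str:
        self.history[track_id].append(raw_label)
        # Majority voting suppresses single-frame flicker (e.g. car -> truck -> car)
        return Counter(self.history[track_id]).most_common(1)[0][0]

# Example: a track that flickers for one frame still reports a stable label
smoother = LabelSmoother(window=5)
for label in ["car", "car", "truck", "car"]:
    stable = smoother.update(track_id=7, raw_label=label)
# stable == "car"
</code>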