| - | ==== Deep Learning Architectures ==== | + | ===== Deep Learning Architectures |
Deep learning architectures form the computational backbone of AI-based perception systems in autonomous vehicles.

Alternatively,
These models are critical for estimating the shape and distance of objects in 3D space, especially under challenging lighting or weather conditions.
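
As an illustration only, the following sketch shows how a point-based network of this kind can encode an unordered LiDAR point cloud with a shared per-point MLP and symmetric max pooling. It is a PointNet-style layout in which all layer sizes and names are assumptions, not the specific models referenced above.

<code python>
# Minimal PointNet-style encoder sketch (illustrative; layer sizes are arbitrary assumptions).
import torch
import torch.nn as nn

class PointCloudEncoder(nn.Module):
    """Encodes an unordered LiDAR point cloud (N points, xyz) into a global feature."""
    def __init__(self, feature_dim: int = 256):
        super().__init__()
        # Shared per-point MLP: each point is processed independently.
        self.point_mlp = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, 128), nn.ReLU(),
            nn.Linear(128, feature_dim),
        )

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        # points: (batch, num_points, 3)
        per_point = self.point_mlp(points)         # (batch, num_points, feature_dim)
        # Max pooling is symmetric, so the result is invariant to point order.
        global_feature, _ = per_point.max(dim=1)   # (batch, feature_dim)
        return global_feature

if __name__ == "__main__":
    encoder = PointCloudEncoder()
    cloud = torch.randn(2, 1024, 3)                # two dummy scans of 1024 points each
    print(encoder(cloud).shape)                    # torch.Size([2, 256])
</code>
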
| + | |||
| + | {{ : | ||
=== Transformer Architectures ===
In autonomous driving, transformers are used for **feature fusion**, **bird’s-eye-view (BEV) mapping**, and **trajectory prediction**.
Notable examples include ''
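
As a hedged illustration of how such attention-based fusion can look in practice, the sketch below lets a set of learned BEV queries attend to flattened camera feature tokens. The query count, feature dimension, and class names are assumptions for demonstration, not the architecture of any model named above.

<code python>
# Illustrative BEV cross-attention sketch (shapes and names are assumptions).
import torch
import torch.nn as nn

class BEVCrossAttention(nn.Module):
    """Learned BEV queries attend to flattened camera feature maps."""
    def __init__(self, num_queries: int = 100, dim: int = 256, heads: int = 8):
        super().__init__()
        self.bev_queries = nn.Parameter(torch.randn(num_queries, dim))
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, image_features: torch.Tensor) -> torch.Tensor:
        # image_features: (batch, num_tokens, dim), e.g. a flattened CNN feature map
        batch = image_features.shape[0]
        queries = self.bev_queries.unsqueeze(0).expand(batch, -1, -1)
        fused, _ = self.attn(query=queries, key=image_features, value=image_features)
        return self.norm(fused + queries)          # (batch, num_queries, dim)

if __name__ == "__main__":
    model = BEVCrossAttention()
    feats = torch.randn(2, 900, 256)               # dummy camera tokens
    print(model(feats).shape)                      # torch.Size([2, 100, 256])
</code>
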
| + | |||
| + | |||
=== Recurrent and Temporal Models ===
They are common in **object tracking** and **motion prediction**,
More recent architectures use temporal convolutional networks or transformers to achieve similar results with greater parallelism and stability.
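
The sketch below illustrates the idea with a small GRU that encodes a track’s past positions and regresses a fixed number of future positions. The prediction horizon, hidden size, and interface are assumptions chosen for demonstration only.

<code python>
# Illustrative GRU motion-prediction sketch (dimensions are assumptions).
import torch
import torch.nn as nn

class TrajectoryPredictor(nn.Module):
    """Predicts future 2D positions of a tracked object from its past positions."""
    def __init__(self, hidden: int = 64, future_steps: int = 12):
        super().__init__()
        self.future_steps = future_steps
        self.encoder = nn.GRU(input_size=2, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, future_steps * 2)

    def forward(self, past_xy: torch.Tensor) -> torch.Tensor:
        # past_xy: (batch, past_steps, 2) observed positions
        _, h = self.encoder(past_xy)                  # h: (1, batch, hidden)
        out = self.head(h.squeeze(0))                 # (batch, future_steps * 2)
        return out.view(-1, self.future_steps, 2)     # (batch, future_steps, 2)

if __name__ == "__main__":
    model = TrajectoryPredictor()
    history = torch.randn(4, 8, 2)                    # 4 tracks, 8 past positions each
    print(model(history).shape)                       # torch.Size([4, 12, 2])
</code>
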
| + | |||
| + | {{ : | ||
=== Graph Neural Networks (GNNs) ===
For instance, a CNN may extract image features, a point-based network may process LiDAR geometry, and a transformer may fuse both into a joint representation.
These hierarchical and multimodal architectures enable robust perception across varied environments and sensor conditions, providing the high-level scene understanding required for safe autonomous behavior.
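
To make the hybrid idea concrete, the following minimal sketch combines a tiny CNN image branch, a shared-MLP point branch, and a transformer encoder that fuses one token per modality. Every layer choice and size here is an assumption for illustration, not a production design.

<code python>
# Illustrative hybrid fusion sketch: CNN image branch + point branch + transformer fusion.
import torch
import torch.nn as nn

class HybridPerception(nn.Module):
    def __init__(self, dim: int = 128):
        super().__init__()
        # Tiny CNN branch for camera images.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Point branch: shared MLP followed by max pooling over LiDAR points.
        self.point_mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, dim))
        # Transformer encoder fuses the two modality tokens.
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.fusion = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, image: torch.Tensor, points: torch.Tensor) -> torch.Tensor:
        img_token = self.cnn(image)                            # (batch, dim)
        pts_token = self.point_mlp(points).max(dim=1).values   # (batch, dim)
        tokens = torch.stack([img_token, pts_token], dim=1)    # (batch, 2, dim)
        fused = self.fusion(tokens)                            # (batch, 2, dim)
        return fused.mean(dim=1)                               # joint scene representation

if __name__ == "__main__":
    model = HybridPerception()
    out = model(torch.randn(2, 3, 64, 64), torch.randn(2, 512, 3))
    print(out.shape)                                           # torch.Size([2, 128])
</code>
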
| + | |||
| + | ==== Data Requirements ==== | ||
| + | |||
| + | The effectiveness of ''' | ||
| + | As deep neural networks do not rely on explicit programming, | ||
| + | |||
| + | Robust perception requires exposure to the full range of operating conditions that a vehicle may encounter. | ||
| + | Datasets must include variations in: | ||
| + | |||
| + | * **Sensor modalities** – data from cameras, LiDAR, radar, GNSS, and IMU, reflecting the multimodal nature of perception. | ||
| + | * **Environmental conditions** – daytime and nighttime scenes, different seasons, weather effects such as rain, fog, or snow. | ||
| + | * **Geographical and cultural contexts** – urban, suburban, and rural areas; diverse traffic rules and road signage conventions. | ||
| + | * **Behavioral diversity** – normal driving, aggressive maneuvers, and rare events such as jaywalking or emergency stops. | ||
| + | * **Edge cases** – rare but safety-critical situations, including near-collisions or sensor occlusions. | ||
| + | |||
| + | A balanced dataset should capture both common and unusual situations to ensure that perception models generalize safely beyond the training distribution. | ||
| + | Because collecting real-world data for every possible scenario is impractical and almost impossible, simulated or synthetic data are often used to supplement real-world datasets. | ||
| + | Photorealistic simulators such as '' | ||
| + | Synthetic data helps to fill gaps in real-world coverage and supports transfer learning, though domain adaptation is often required to mitigate the so-called '' | ||
| + | |||
| + | === Annotation and Labeling === | ||
| + | Supervised learning models rely on accurately annotated datasets, where each image, frame, or point cloud is labeled with semantic information such as object classes, bounding boxes, or segmentation masks. | ||
| + | Annotation quality is critical: inconsistent or noisy labels can propagate systematic errors through the learning process. | ||
| + | Modern annotation pipelines combine human labeling with automation — using pre-trained models, interactive tools, and active learning to accelerate the process. | ||
| + | High-precision labeling is particularly demanding for LiDAR point clouds and multi-sensor fusion datasets, where 3D geometric consistency must be maintained across frames. | ||
| + | |||
| + | |||
| + | === Ethical and Privacy Considerations === | ||
| + | Data used in autonomous driving frequently includes imagery of people, vehicles, and property. | ||
| + | To comply with privacy regulations and ethical standards, datasets must be anonymized by blurring faces and license plates, encrypting location data, and maintaining secure data storage. | ||
| + | Fairness and inclusivity in dataset design are equally important to prevent bias across geographic regions or demographic contexts. | ||
| + | |||
==== Scene Understanding ====
Spatial relations describe, e.g., mutual distance, relative velocity, and possible collision of trajectories. Functional relations describe when one entity modifies, limits, or restricts the functions of another, e.g., traffic lanes modify the movement of vehicles, railings restrict the movement of pedestrians,
These relations can be explicitly represented by scene graphs, where nodes represent entities and edges represent relationships,
| + | |||
| + | {{: | ||
Scene understanding must maintain temporal stability across frames. Flickering detections or inconsistent semantic labels can lead to unstable planning.
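
One simple way to stabilize per-frame outputs is to smooth class confidences over time, for example with an exponential moving average as in the sketch below; the smoothing factor and the score format are assumptions for illustration.

<code python>
# Illustrative temporal-smoothing sketch: an exponential moving average over per-class
# confidence keeps labels from flickering between frames (alpha is an assumed tuning value).
def smooth_scores(frame_scores, alpha=0.3):
    """frame_scores: list of dicts mapping class -> confidence, one dict per frame."""
    smoothed, state = [], {}
    for scores in frame_scores:
        for cls, value in scores.items():
            prev = state.get(cls, value)
            state[cls] = alpha * value + (1 - alpha) * prev
        smoothed.append(dict(state))
    return smoothed

if __name__ == "__main__":
    noisy = [{"pedestrian": 0.9}, {"pedestrian": 0.2}, {"pedestrian": 0.85}]  # flickering detections
    for frame in smooth_scores(noisy):
        print(frame)   # confidence varies far less than the raw per-frame scores
</code>
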