==== Deep Learning Architectures ====
Deep learning architectures form the computational backbone of AI-based perception systems in autonomous vehicles.
Alternatively,
These models are critical for estimating the shape and distance of objects in 3D space, especially under challenging lighting or weather conditions.
=== Transformer Architectures ===
In autonomous driving, transformers are used for **feature fusion**, **bird’s-eye-view (BEV) mapping**, and **trajectory prediction**.
Notable examples include ''
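
As a rough illustration of how transformer attention is applied in this setting, the sketch below lifts camera features into a bird's-eye-view grid with cross-attention. It is a minimal PyTorch example; the module names, grid sizes, and dimensions are illustrative assumptions and do not reproduce any specific published model.
<code python>
# Minimal sketch of transformer-based BEV mapping via cross-attention.
# Assumes camera features were already extracted by a CNN backbone.
import torch
import torch.nn as nn

class BEVCrossAttention(nn.Module):
    def __init__(self, bev_h=50, bev_w=50, dim=256, heads=8):
        super().__init__()
        # One learnable query per BEV grid cell.
        self.bev_queries = nn.Parameter(torch.randn(bev_h * bev_w, dim))
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.bev_h, self.bev_w = bev_h, bev_w

    def forward(self, cam_feats):
        # cam_feats: (B, N_tokens, dim) flattened multi-camera image features.
        b = cam_feats.shape[0]
        q = self.bev_queries.unsqueeze(0).expand(b, -1, -1)   # (B, H*W, dim)
        bev, _ = self.attn(query=q, key=cam_feats, value=cam_feats)
        bev = self.norm(bev + q)                               # residual + norm
        # Reshape into a spatial BEV feature map for downstream heads.
        return bev.reshape(b, self.bev_h, self.bev_w, -1)

# Example: fuse tokens from 6 cameras (300 tokens each, dim 256).
feats = torch.randn(2, 6 * 300, 256)
bev_map = BEVCrossAttention()(feats)   # -> (2, 50, 50, 256)
</code>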
=== Recurrent and Temporal Models ===
They are common in **object tracking** and **motion prediction**,
More recent architectures use temporal convolutional networks or transformers to achieve similar results with greater parallelism and stability.
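
The following minimal PyTorch sketch shows the kind of recurrent motion-prediction model described above: an LSTM encodes an object's past positions and then rolls out future positions step by step. The layout and dimensions are illustrative assumptions only.
<code python>
# Minimal LSTM motion-prediction sketch: encode a past (x, y) track,
# then roll out future positions step by step.
import torch
import torch.nn as nn

class TrajectoryLSTM(nn.Module):
    def __init__(self, hidden=64, horizon=12):
        super().__init__()
        self.horizon = horizon
        self.encoder = nn.LSTM(input_size=2, hidden_size=hidden, batch_first=True)
        self.decoder = nn.LSTMCell(2, hidden)
        self.head = nn.Linear(hidden, 2)      # predicts the next (x, y) offset

    def forward(self, past_xy):
        # past_xy: (B, T_past, 2) observed positions of one tracked object.
        _, (h, c) = self.encoder(past_xy)
        h, c = h[-1], c[-1]                   # final hidden state of the encoder
        pos = past_xy[:, -1, :]               # start from the last observed point
        future = []
        for _ in range(self.horizon):
            h, c = self.decoder(pos, (h, c))
            pos = pos + self.head(h)          # integrate the predicted displacement
            future.append(pos)
        return torch.stack(future, dim=1)     # (B, horizon, 2)

# Example: predict 12 future steps from 8 observed steps for 4 tracked objects.
pred = TrajectoryLSTM()(torch.randn(4, 8, 2))
</code>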
=== Graph Neural Networks (GNNs) ===
For instance, a CNN may extract image features, a point-based network may process LiDAR geometry, and a transformer may fuse both into a joint representation.
These hierarchical and multimodal architectures enable robust perception across varied environments and sensor conditions, providing the high-level scene understanding required for safe autonomous behavior.
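
The sketch below illustrates the hybrid composition just described: a small CNN for image features, a PointNet-style shared MLP for LiDAR points, and a transformer encoder fusing both token sets. It is a simplified PyTorch example with assumed shapes and sizes, not a production architecture.
<code python>
# Sketch of a hybrid pipeline: CNN image encoder + PointNet-style LiDAR
# encoder + transformer fusion. All sizes are deliberately small.
import torch
import torch.nn as nn

dim = 128

image_encoder = nn.Sequential(                 # CNN: image -> feature tokens
    nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(32, dim, 3, stride=2, padding=1), nn.ReLU(),
)

point_encoder = nn.Sequential(                 # shared MLP over raw LiDAR points
    nn.Linear(3, 64), nn.ReLU(),
    nn.Linear(64, dim), nn.ReLU(),
)

fusion = nn.TransformerEncoder(                # joint self-attention over both modalities
    nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True),
    num_layers=2,
)

def fuse(image, points):
    # image: (B, 3, H, W) camera frame; points: (B, N, 3) LiDAR coordinates.
    img_tokens = image_encoder(image).flatten(2).transpose(1, 2)   # (B, H'*W', dim)
    pt_tokens = point_encoder(points)                              # (B, N, dim)
    tokens = torch.cat([img_tokens, pt_tokens], dim=1)             # one joint sequence
    return fusion(tokens)                                          # fused representation

fused = fuse(torch.randn(2, 3, 64, 64), torch.randn(2, 1024, 3))
</code>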
==== Data Requirements ====
The effectiveness of ''
As deep neural networks do not rely on explicit programming,
Robust perception requires exposure to the full range of operating conditions that a vehicle may encounter.
Datasets must include variations in the following areas (a simple coverage check is sketched after the list):
  * **Sensor modalities** – data from cameras, LiDAR, radar, GNSS, and IMU, reflecting the multimodal nature of perception.
  * **Environmental conditions** – daytime and nighttime scenes, different seasons, weather effects such as rain, fog, or snow.
  * **Geographical and cultural contexts** – urban, suburban, and rural areas; diverse traffic rules and road signage conventions.
  * **Behavioral diversity** – normal driving, aggressive maneuvers, and rare events such as jaywalking or emergency stops.
  * **Edge cases** – rare but safety-critical situations, including near-collisions or sensor occlusions.
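
The following sketch shows one hypothetical way to audit such coverage: counting samples per condition tag in a metadata table and flagging underrepresented combinations. The field names (''weather'', ''time_of_day'') and the threshold are assumptions, not part of any particular dataset format.
<code python>
# Hypothetical coverage audit: count samples per (weather, time_of_day) pair
# in a metadata list and flag combinations that fall below a minimum share.
from collections import Counter
from itertools import product

def audit_coverage(metadata, min_share=0.02):
    # metadata: list of dicts, e.g. {"weather": "rain", "time_of_day": "night"}.
    counts = Counter((m["weather"], m["time_of_day"]) for m in metadata)
    total = len(metadata)
    weathers = {m["weather"] for m in metadata}
    times = {m["time_of_day"] for m in metadata}
    gaps = []
    for combo in product(weathers, times):
        share = counts.get(combo, 0) / total
        if share < min_share:
            gaps.append((combo, share))
    return sorted(gaps, key=lambda g: g[1])   # rarest combinations first

# Toy example: night-time rain is badly underrepresented.
records = ([{"weather": "clear", "time_of_day": "day"}] * 900
           + [{"weather": "rain", "time_of_day": "night"}] * 5)
print(audit_coverage(records))
</code>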
A balanced dataset should capture both common and unusual situations to ensure that perception models generalize safely beyond the training distribution.
Because collecting real-world data for every possible scenario is impractical, simulated or synthetic data are often used to supplement real-world datasets.
Photorealistic simulators such as ''
Synthetic data helps to fill gaps in real-world coverage and supports transfer learning, though domain adaptation is often required to mitigate the so-called ''
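
One simple way to combine the two sources is to concatenate real and synthetic datasets and control how often each domain is sampled. The PyTorch sketch below assumes two already-built placeholder datasets and is illustrative only; it does not reflect the API of any specific simulator.
<code python>
# Sketch: concatenate real and synthetic datasets and oversample the real
# domain so that synthetic frames supplement rather than dominate training.
import torch
from torch.utils.data import (ConcatDataset, DataLoader, TensorDataset,
                              WeightedRandomSampler)

# Placeholder datasets; in practice these would load camera/LiDAR frames.
real_ds = TensorDataset(torch.randn(800, 8), torch.zeros(800))    # domain label 0
synth_ds = TensorDataset(torch.randn(4000, 8), torch.ones(4000))  # domain label 1

combined = ConcatDataset([real_ds, synth_ds])

# Weight every real sample higher than every synthetic one (here 4:1),
# keeping each batch anchored to real-world statistics.
weights = torch.cat([torch.full((len(real_ds),), 4.0),
                     torch.full((len(synth_ds),), 1.0)])
sampler = WeightedRandomSampler(weights, num_samples=len(combined), replacement=True)

loader = DataLoader(combined, batch_size=32, sampler=sampler)
for frames, domain_label in loader:
    break   # the training loop goes here; domain_label can drive adaptation losses
</code>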
=== Annotation and Labeling ===
Supervised learning models rely on accurately annotated datasets, where each image, frame, or point cloud is labeled with semantic information such as object classes, bounding boxes, or segmentation masks.
Annotation quality is critical: inconsistent or noisy labels can propagate systematic errors through the learning process.
Modern annotation pipelines combine human labeling with automation, using pre-trained models, interactive tools, and active learning to accelerate the process.
High-precision labeling is particularly demanding for LiDAR point clouds and multi-sensor fusion datasets, where 3D geometric consistency must be maintained across frames.
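
As one illustration of the active-learning step mentioned above, the sketch below ranks unlabeled samples by the entropy of the model's class predictions so that human annotators see the most uncertain ones first. The model, data, and selection criterion are placeholder assumptions.
<code python>
# Uncertainty-based sample selection for annotation (active learning):
# rank unlabeled frames by the entropy of the model's class predictions.
import torch

def select_for_labeling(model, unlabeled, k=10):
    # unlabeled: tensor of frames or features, shape (N, ...); model returns logits.
    model.eval()
    with torch.no_grad():
        probs = torch.softmax(model(unlabeled), dim=-1)            # (N, C)
        entropy = -(probs * probs.clamp_min(1e-9).log()).sum(-1)   # (N,)
    return torch.topk(entropy, k).indices   # indices to queue for human labeling

# Example with a placeholder classifier over 4 object classes.
clf = torch.nn.Linear(16, 4)
pool = torch.randn(500, 16)
to_label = select_for_labeling(clf, pool, k=20)
</code>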
=== Ethical and Privacy Considerations ===
Data used in autonomous driving frequently includes imagery of people, vehicles, and property.
To comply with privacy regulations and ethical standards, datasets must be anonymized by blurring faces and license plates, encrypting location data, and maintaining secure data storage.
Fairness and inclusivity in dataset design are equally important to prevent bias across geographic regions or demographic contexts.
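
The snippet below sketches the blurring step with OpenCV, assuming that face and license-plate regions have already been located by an upstream detector; the detection itself, the box format, and the blur kernel size are assumptions for illustration.
<code python>
# Sketch of image anonymization: blur already-detected face / plate regions.
# Region detection is assumed to happen upstream and is not shown here.
import cv2
import numpy as np

def anonymize(image, boxes, ksize=31):
    # image: HxWx3 BGR array; boxes: list of (x, y, w, h) pixel rectangles.
    out = image.copy()
    for x, y, w, h in boxes:
        roi = out[y:y + h, x:x + w]
        out[y:y + h, x:x + w] = cv2.GaussianBlur(roi, (ksize, ksize), 0)
    return out

# Example on a synthetic frame with two hypothetical detections.
frame = np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8)
blurred = anonymize(frame, [(100, 120, 90, 70), (300, 200, 80, 60)])
</code>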
==== Scene Understanding ====