Revolutionizing Embedded Systems Debugging: Applying Retrieval-Augmented Generation To Heterogeneous Log Analysis
Keywords:
Retrieval-Augmented Generation, Embedded Systems Debugging, Heterogeneous Log Analysis, Root Cause Analysis, Large Language Models
Abstract
The complexity of modern embedded systems, particularly in the automotive sector, has created a significant and growing challenge in diagnostic data management. A single failure in a distributed, software-defined vehicle may produce log artifacts spanning a Real-Time Operating System (RTOS), a high-level application framework, and low-level hardware interfaces operating on separate bus protocols. Traditional keyword-based search is insufficient for this heterogeneous data landscape, which leads to extended triage cycles and delayed software releases. This article investigates Large Language Models (LLMs) combined with Retrieval-Augmented Generation (RAG) as a mechanism for automating root cause analysis of complex, intermittent software defects. Treating log analysis as a semantic reasoning problem rather than a pattern-matching problem lets AI agents reason over disparate diagnostic events across the software stack, which can substantially reduce Mean Time to Resolution (MTTR). The proposed architecture addresses three core limitations of conventional tooling: vocabulary mismatch, the semantic gap in log interpretation, and temporal correlation failure across heterogeneous sources. Our evaluation on a synthetic embedded log corpus shows that the hybrid dense-plus-sparse retrieval architecture bridges the vocabulary gap between engineering fault concepts and system log strings. Hybrid search outperforms a keyword-only baseline on both Precision@10 and Recall@10, and HyDE (Hypothetical Document Embeddings) query expansion further increases the recall of hybrid search for queries whose vocabulary does not match the log text.
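For readers unfamiliar with the two retrieval metrics named above, the following is a minimal sketch of how Precision@k and Recall@k are conventionally computed for a single query. The function names and the log identifiers are hypothetical and chosen only for illustration; the article's own evaluation harness is not shown here, and the sketch assumes each retrieved identifier appears at most once in the ranking.

```python
from typing import Sequence, Set


def precision_at_k(ranked_ids: Sequence[str], relevant_ids: Set[str], k: int = 10) -> float:
    """Fraction of the top-k retrieved items that are relevant (denominator is k)."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / k


def recall_at_k(ranked_ids: Sequence[str], relevant_ids: Set[str], k: int = 10) -> float:
    """Fraction of all relevant items that appear in the top-k retrieved items."""
    if k <= 0:
        raise ValueError("k must be positive")
    if not relevant_ids:
        return 0.0  # convention assumed here: no relevant items means recall is 0
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / len(relevant_ids)


# Hypothetical ranking returned for one fault query, best match first.
ranked = ["log_07", "log_02", "log_11", "log_05"]
# Hypothetical ground-truth set of log entries related to the fault.
relevant = {"log_02", "log_05", "log_09"}

p = precision_at_k(ranked, relevant, k=4)
r = recall_at_k(ranked, relevant, k=4)
print(f"Precision@4 = {p:.3f}, Recall@4 = {r:.3f}")
```

In this toy case two of the four retrieved entries are relevant and two of the three relevant entries are retrieved. Averaging the per-query values over a query set gives the corpus-level figures that retrieval comparisons of this kind typically report.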




