Designing Grammar-Guided LLM Outputs for Open Data Integration – A DSR Approach to IoT Data Platforms

Samir Chatterjee; Jan Brocke; Ricardo Anderson (Eds.). Local Solutions for Global Challenges: 20th International Conference on Design Science Research in Information Systems and Technology, DESRIST 2025, Montego Bay, Jamaica, June 2-4, 2025, Proceedings, Part I. Vol. 1. Cham: Springer Nature Switzerland 2025, pp. 178-195

Year of publication: 2025

Publication type: Miscellaneous (conference paper)

Language: English

DOI/URN: https://doi.org/10.1007/978-3-031-93976-1_12

Verified: Library

Abstract


This paper designs and implements an artifact for converting unstructured or semi-structured open data into outputs conforming to the OGC SensorThings API (STA). Motivated by the growing influx of heterogeneous data in Internet-of-Things environments, the study employs an Action Design Research process to apply formalized grammars to Large Language Models (LLMs) to produce valid, STA-compliant JSON documents. Early prototypes using JSON schemas and Pydantic models highlighted the need for stricter control mechanisms to handle real-world open data complexity. Evaluation across multiple open data sources demonstrates the effectiveness of grammar-driven constraints in reducing malformed or incomplete outputs. Three smaller LLMs (Qwen 2.5 Instruct, Llama 3.1 Instruct, and Phi-4) were tested, showing that grammar length and input context can significantly influence output quality and model throughput. The findings underscore the advantages of embedding strict syntax requirements without sacrificing flexibility for diverse use cases. While domain-level validation (e.g., verifying realistic time-series values) remains a future direction, this research confirms the promise of grammar-based generation for streamlining data ingestion in IoT platforms. The approach facilitates more consistent and maintainable pipelines, potentially boosting interoperability and data quality in sensor-driven environments.

  • Context-free Grammar
  • Large Language Model
  • Open Data
  • SensorThings API

Authors


Arz von Straussenburg, Arnold F. (Author)
