International Journal of Electrical and Computer Engineering
Vol 12, No 6: December 2022

Towards a new hybrid approach for building document-oriented data warehouses

Nawfal El Moukhi (Ibn Tofail University)
Ikram El Azami (Ibn Tofail University)
Soufiane Hajbi (Ibn Tofail University)



Article Info

Publish Date
01 Dec 2022

Abstract

Schemaless databases offer large storage capacity while guaranteeing high performance in data processing, unlike relational databases, which are rigid and have shown their limitations in managing large amounts of data. However, the absence of a well-defined schema and structure in not only SQL (NoSQL) databases makes the use of data for decision analysis purposes even more complex and difficult. In this paper, we propose an original approach to building a document-oriented data warehouse from unstructured data. The new approach follows a hybrid paradigm that combines data analysis and user requirements analysis. The first, data-driven step exploits the fast, distributed processing of the Spark engine to generate a general schema for each collection in the database. The second, requirement-driven step analyzes the semantics of the decisional requirements expressed in natural language and maps them to the schemas of the collections. At the end of the process, a decisional schema is generated in JavaScript Object Notation (JSON) format, and the data is loaded with the necessary transformations.
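As an illustration of the data-driven step, the following is a minimal sketch of how a general schema could be derived for one collection by merging the field paths and value types observed across its documents. It is written in plain Python rather than Spark and is not the authors' algorithm; the function names (document_schema, merge_schemas, collection_schema) and the sample orders collection are hypothetical and chosen only for this example.

```python
from collections import defaultdict
from functools import reduce


def document_schema(doc, prefix=""):
    """Map each dotted field path in one document to the set of type names seen."""
    schema = defaultdict(set)
    for key, value in doc.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            # Record the nested object itself, then recurse into its fields.
            schema[path].add("object")
            for sub_path, types in document_schema(value, path + ".").items():
                schema[sub_path] |= types
        else:
            schema[path].add(type(value).__name__)
    return schema


def merge_schemas(left, right):
    """Union two schemas path by path; the operation is associative."""
    merged = defaultdict(set)
    for schema in (left, right):
        for path, types in schema.items():
            merged[path] |= types
    return merged


def collection_schema(documents):
    """Fold the per-document schemas into one general schema for the collection."""
    return reduce(merge_schemas, map(document_schema, documents), defaultdict(set))


# Hypothetical sample collection with heterogeneous documents.
orders = [
    {"id": 1, "customer": {"name": "Ada"}, "total": 12.5},
    {"id": "A2", "customer": {"name": "Bob", "city": "Kenitra"}},
]
general = collection_schema(orders)
```

Because merge_schemas is associative, the same map-then-reduce shape could in principle be distributed across partitions by an engine such as Spark, although the sketch makes no claim about how the paper implements it.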

Copyright © 2022






Journal Info

Abbrev

IJECE

Subject

Computer Science & IT; Electrical & Electronics Engineering

Description

International Journal of Electrical and Computer Engineering (IJECE, ISSN: 2088-8708, a Scopus-indexed journal; SNIP: 1.001; SJR: 0.296; CiteScore: 0.99; SJR and CiteScore Q2 in both Electrical & Electronics Engineering and Computer Science) is the official publication of the Institute of ...