Journal Article:
Design and implementation of ETL processes using BPMN and relational algebra

dc.contributor.author: Awiti, Judith
dc.contributor.author: Vaisman, Alejandro Ariel
dc.contributor.author: Zimányi, Esteban
dc.date.accessioned: 2020-09-28T20:01:47Z
dc.date.available: 2020-09-28T20:01:47Z
dc.date.issued: 2020-06-13
dc.description.abstract: "Extraction, transformation, and loading (ETL) processes are used to extract data from internal and external sources of an organization, transform these data, and load them into a data warehouse. The Business Process Model and Notation (BPMN) has been proposed for expressing ETL processes at a conceptual level. A different approach is studied in this paper, where relational algebra (RA), extended with update operations, is used for specifying ETL processes. In this approach, data tasks in an ETL workflow can be automatically translated into SQL queries to be executed over a DBMS. To illustrate this study, the paper addresses the problem of updating Slowly Changing Dimensions (SCDs) with dependencies, that is, the case when updating an SCD table affects associated SCD tables. Tackling this problem requires extending the classic RA with update operations. The paper also shows the implementation of a portion of the TPC-DI benchmark that results from both approaches. Thus, the paper presents three implementations: (a) an SQL implementation based on the extended RA-based specification of an ETL process expressed in BPMN4ETL; and (b) two implementations of workflows that follow from BPMN4ETL, one that uses the Pentaho DI tool and another that uses Talend Open Studio for DI. Experiments over these implementations of the TPC-DI benchmark for different scale factors were carried out and are described and discussed in the paper; they show that the extended RA approach results in more efficient processes than those produced by implementing the BPMN4ETL specification over the mentioned ETL tools. The reasons for this result are also discussed." [en]
dc.identifier.issn: 0169-023X
dc.identifier.uri: http://ri.itba.edu.ar/handle/123456789/3080
dc.language.iso: en [en]
dc.relation: info:eu-repo/semantics/altIdentifier/10.1016/j.datak.2020.101837
dc.relation: info:eu-repo/semantics/acceptedVersion
dc.relation: info:eu-repo/grantAgreement/EC/EMJDs/IT4BI-DC/ BE. Brussels
dc.relation: info:eu-repo/grantAgreement/ANPCyT/PICT/2017-1054/AR. Ciudad Autónoma de Buenos Aires
dc.subject: ALMACENES DE DATOS (Data warehouses) [es]
dc.subject: OLAP [en]
dc.subject: ETL [en]
dc.subject: BPMN [en]
dc.title: Design and implementation of ETL processes using BPMN and relational algebra [en]
dc.type: Artículos de Publicaciones Periódicas (Journal articles) [es]
dspace.entity.type: Artículo de Publicación Periódica (Journal Article)
itba.description.filiation: Affiliation: Awiti, Judith. Université Libre de Bruxelles; Belgium.
itba.description.filiation: Affiliation: Vaisman, Alejandro Ariel. Instituto Tecnológico de Buenos Aires; Argentina.
itba.description.filiation: Affiliation: Zimányi, Esteban. Université Libre de Bruxelles; Belgium.
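The abstract centers on updating Slowly Changing Dimensions (SCDs) with dependencies, where a change in one SCD table forces changes in associated SCD tables. As a rough illustration of that dependency only, and not of the paper's extended RA operators or its SQL translation, the following Python sketch applies a Type 2 update (close the current version, append a new one) to a hypothetical customer dimension and then propagates it to a hypothetical account dimension whose current rows reference the old customer surrogate key. The names `CustomerRow`, `AccountRow`, `update_customer_city`, and `END_OF_TIME`, the 9999-12-31 "still current" sentinel, and the max-plus-one surrogate key scheme are all assumptions made for this sketch.

```python
from dataclasses import dataclass, replace
from datetime import date

# Hypothetical sentinel marking a row version that is still current (assumption).
END_OF_TIME = date(9999, 12, 31)


@dataclass(frozen=True)
class CustomerRow:
    sk: int            # surrogate key of this version
    customer_id: int   # business key, stable across versions
    city: str          # tracked attribute (Type 2)
    valid_from: date
    valid_to: date


@dataclass(frozen=True)
class AccountRow:
    sk: int            # surrogate key of this version
    account_id: int    # business key
    sk_customer: int   # references CustomerRow.sk (the dependency)
    valid_from: date
    valid_to: date


def is_current(row) -> bool:
    return row.valid_to == END_OF_TIME


def update_customer_city(customers, accounts, customer_id, new_city, as_of):
    """Type 2 update of a customer's city, propagated to dependent accounts.

    Returns new lists; the inputs are not modified.
    """
    customers = list(customers)
    accounts = list(accounts)

    # Selection on the business key and the open validity interval.
    idx = next((i for i, c in enumerate(customers)
                if c.customer_id == customer_id and is_current(c)), None)
    if idx is None:
        raise KeyError(f"no current version for customer {customer_id}")
    old = customers[idx]
    if old.city == new_city:
        return customers, accounts  # no change, so no new version

    # Close the old customer version and append the new one.
    new_sk = max(c.sk for c in customers) + 1
    customers[idx] = replace(old, valid_to=as_of)
    customers.append(CustomerRow(new_sk, customer_id, new_city, as_of, END_OF_TIME))

    # Dependency: current accounts pointing at the old customer version must
    # themselves get a new version that references the new surrogate key.
    next_acc_sk = max((a.sk for a in accounts), default=0) + 1
    for i, a in enumerate(list(accounts)):  # iterate over a snapshot
        if a.sk_customer == old.sk and is_current(a):
            accounts[i] = replace(a, valid_to=as_of)
            accounts.append(AccountRow(next_acc_sk, a.account_id, new_sk,
                                       as_of, END_OF_TIME))
            next_acc_sk += 1

    return customers, accounts


if __name__ == "__main__":
    d0 = date(2020, 1, 1)
    customers = [CustomerRow(1, 100, "Brussels", d0, END_OF_TIME)]
    accounts = [AccountRow(1, 500, 1, d0, END_OF_TIME)]
    customers, accounts = update_customer_city(
        customers, accounts, 100, "Buenos Aires", date(2020, 6, 13))
    for row in customers + accounts:
        print(row)
```

The propagation loop is what makes the update "with dependencies": without it, the current account row would keep pointing at a closed customer version. In a relational setting the same two steps would be expressed as updates and insertions over the dimension tables rather than as list manipulation.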

Files

Original bundle
Name: Awiti_2020_ING_INFORMATICA_embargo24meses.pdf
Size: 3.6 MB
Format: Adobe Portable Document Format
Description: Articulo_Awiti
License bundle
Name: license.txt
Size: 1.6 KB
Format: Item-specific license agreed upon to submission
Description: