Ingeniería Informática
Permanent URI for this community
Browse
Browsing Ingeniería Informática by Subject "ALMACENES DE DATOS"
Now showing 1 - 10 of 10
Results Per Page
Sort Options
artículo de publicación periódica.listelement.badge An algebra for OLAP(2017) Kuijpers, Bart; Vaisman, Alejandro Ariel"Online Analytical Processing (OLAP) comprises tools and algorithms that allow querying multidimensional databases. It is based on the multidimensional model, where data can be seen as a cube, where each cell contains one or more measures can be aggregated along dimensions. Despite the extensive corpus of work in the field, a standard language for OLAP is still needed, since there is no well-defined, accepted semantics, for many of the usual OLAP operations. In this paper, we address this problem, and present a set of operations for manipulating a data cube. We clearly define the semantics of these operations, and prove that they can be composed, yielding a language powerful enough to express complex OLAP queries. We express these operations as a sequence of atomic transformations over a fixed multidimensional matrix, whose cells contain a sequence of measures. Each atomic transformation produces a new measure. When a sequence of transformations defines an OLAP operation, a flag is produced indicating which cells must be considered as input for the next operation. In this way, an elegant algebra is defined. Our main contribution, with respect to other similar efforts in the field is that, for the first time, a formal proof of the correctness of the operations is given, thus providing a clear semantics for them. We believe the present work will serve as a basis to build more solid practical tools for data analysis."artículo de publicación periódica.listelement.badge Design and implementation of ETL processes using BPMN and relational algebra(2020-06-13) Awiti, Judith; Vaisman, Alejandro Ariel; Zimányi, Esteban"Extraction, transformation, and loading (ETL) processes are used to extract data from internal and external sources of an organization, transform these data, and load them into a data warehouse. The Business Process Modeling and Notation (BPMN) has been proposed for expressing ETL processes at a conceptual level. A different approach is studied in this paper, where relational algebra (RA), extended with update operations, is used for specifying ETL processes. In this approach, data tasks in an ETL workflow can be automatically translated into SQL queries to be executed over a DBMS. To illustrate this study, the paper addresses the problem of updating Slowly Changing Dimensions (SCDs) with dependencies, that is, the case when updating a SCD table impacts on associated SCD tables. Tackling this problem requires extending the classic RA with update operations. The paper also shows the implementation of a portion of the TPC-DI benchmark that results from both approaches. Thus, the paper presents three implementations: (a) An SQL implementation based on the extended RA-based specification of an ETL process expressed in BPMN4ETL; and (b) Two implementations of workflows that follow from BPMN4ETL, one that uses the Pentaho DI tool, and another one that uses Talend Open Studio for DI. Experiments over these implementations of the TPC-DI benchmark for different scale factors were carried out, and are described and discussed in the paper, showing that the extended RA approach results in more efficient processes than the ones produced by implementing the BPMN4ETL specification over the mentioned ETL tools. The reasons for this result are also discussed."artículo de publicación periódica.listelement.badge Efficient analytical queries on semantic web data cubes(2017-12) Etcheverry, Lorena; Vaisman, Alejandro Ariel"The amount of multidimensional data published on the semantic web (SW) is constantly increasing, due to initiatives such as Open Data and Open Government Data, among other ones. Models, languages, and tools, that allow obtaining valuable information e ciently, are thus required. Multidimensional data are typically represented as data cubes, and exploited using Online Analytical Processing (OLAP) techniques. The RDF Data Cube Vocabulary, also denoted QB, is the current W3C standard to represent statistical data on the SW. Given that QB does not include key features needed for OLAP analysis, in previous work we have proposed an extension, denoted QB4OLAP, to overcome this problem without the need of modifying already published data. Once data cubes are appropriately represented on the SW, we need mechanisms to analyze them. However, in the current state-of-the-art, writing e cient analytical queries over SW data cubes demands a deep knowledge of standards like RDF and SPARQL. These skills are unlikely to be found in typical analytical users. Further, OLAP languages like MDX are far from being easily understood by the final user. The lack of friendly tools to exploit multidimensional data on the SW is a barrier that needs to be broken to promote the publication of such data. This is the problem we address in this paper. Our approach is based on allowing analytical users to write queries using what they know best: OLAP operations over data cubes, without dealing with SW technicalities. For this, we devised CQL (standing for Cube Query Language), a simple, high-level query language that operates over data cubes. Taking advantage of structural metadata provided by QB4OLAP, we translate CQL queries into SPARQL ones. Then, we propose query improvement strategies to produce e cient SPARQL queries, adapting general-purpose SPARQL query optimization techniques. We evaluate our implementation using the Star-Schema benchmark, showing that our proposal outperforms others. The QB4OLAP toolkit,a web application that allows exploring and querying (using CQL) SW data cubes, completes our contributions."ponencia en congreso.listelement.badge From conceptual to logical ETL design using BPMN and relational algebra(2019) Awiti, Judith; Vaisman, Alejandro Ariel; Zimányi, Esteban"Extraction, transformation, and loading (ETL) processes are used to extract data from internal and external sources of an organization, transform these data, and load them into a data warehouse. The Business Process Modeling Notation (BPMN) has been proposed for expressing ETL processes at a conceptual level. This paper extends relational algebra (RA) with update operations for specifying ETL processes at a logical level. In this approach, data tasks can be automatically translated into SQL queries to be executed over a DBMS. An extension of RA is presented, as well as a translation mechanism from BPMN to the RA specification. Throughout the paper, the TPC-DI benchmark is used for comparing both approaches. Experiments show the efficiency of the resulting ETL flow with respect to the Pentaho Data Integration tool."artículo de publicación periódica.listelement.badge Mobility data warehouses(2019-04) Vaisman, Alejandro Ariel; Zimányi, Esteban"The interest in mobility data analysis has grown dramatically with the wide availability of devices that track the position of moving objects. Mobility analysis can be applied, for example, to analyze traffic flows. To support mobility analysis, trajectory data warehousing techniques can be used. Trajectory data warehouses typically include, as measures, segments of trajectories, linked to spatial and non-spatial contextual dimensions. This paper goes beyond this concept, by including, as measures, the trajectories of moving objects at any point in time. In this way, online analytical processing (OLAP) queries, typically including aggregation, can be combined with moving object queries, to express queries like “List the total number of trucks running at less than 2 km from each other more than 50% of its route in the province of Antwerp” in a concise and elegant way. Existing proposals for trajectory data warehouses do not support queries like this, since they are based on either the segmentation of the trajectories, or a pre-aggregation of measures. The solution presented here is implemented using MobilityDB, a moving object database that extends the PostgresSQL database with temporal data types, allowing seamless integration with relational spatial and non-spatial data. This integration leads to the concept of mobility data warehouses. This paper discusses modeling and querying mobility data warehouses, providing a comprehensive collection of queries implemented using PostgresSQL and PostGIS as database backend, extended with the libraries provided by MobilityDB."proyecto final de grado.listelement.badge Modelado y consulta de data warehouses usando bases de datos de grafo(2019-06) Besteiro, María Florencia; Valverde Melito, Maximiliano Javier; Vaisman, Alejandro Ariel"El objetivo de este proyecto es poder modelar y consultar un data warehouse usando bases de datos de grafos, de forma de poder comparar la performance del mismo en contraste con su alternativa relacional. Para ello se definirán distintos tipos de consultas que luego serán ejecutadas en ambos modelos, con el fin de determinar que situaciones se destaca cada uno de ellos en términos de performance."ponencia en congreso.listelement.badge Modelling and querying star and snowflake warehouses using graph databases(2019) Vaisman, Alejandro Ariel; Besteiro, María Florencia; Valverde Melito, Maximiliano Javier"In current “Big Data” scenarios, graph databases are increasingly being used. Online Analytical Processing (OLAP) operations can expand the possibilities of graph analysis beyond the traditional graphbased computation. This paper studies graph databases as an alternative to implement star and snowflake schemas, the typical choices for data warehouse design. For this, the MusicBrainz database is used. A data warehouse for this database is designed, and implemented over a Postgres relational database. This warehouse is also represented as a graph, and implemented over the Neo4j graph database. A collection of typical OLAP queries is used to compare both implementations. The results reported here show that in ten out of thirteen queries tested, the graph implementation outperforms the relational one, in ratios that go from 1.3 to 26 times faster, and performs similarly to the relational implementation in the three remaining cases."artículo de publicación periódica.listelement.badge Online analytical processsing on graph data(2020) Gómez, Leticia Irene; Kuijpers, Bart; Vaisman, Alejandro Ariel"Online Analytical Processing (OLAP) comprises tools and algorithms that allow querying multidimensional databases. It is based on the multidimensional model, where data can be seen as a cube such that each cell contains one or more measures that can be aggregated along dimensions. In a “Big Data” scenario, traditional data warehousing and OLAP operations are clearly not sufficient to address current data analysis requirements, for example, social network analysis. Furthermore, OLAP operations and models can expand the possibilities of graph analysis beyond the traditional graph-based computation. Nevertheless, there is not much work on the problem of taking OLAP analysis to the graph data model. This paper proposes a formal multidimensional model for graph analysis, that considers the basic graph data, and also background information in the form of dimension hierarchies. The graphs in this model are node- and edge-labelled directed multihypergraphs, called graphoids, which can be defined at several different levels of granularity using the dimensions associated with them. Operations analogous to the ones used in typical OLAP over cubes are defined over graphoids. The paper presents a formal definition of the graphoid model for OLAP, proves that the typical OLAP operations on cubes can be expressed over the graphoid model, and shows that the classic data cube model is a particular case of the graphoid data model. Finally, a case study supports the claim that, for many kinds of OLAP-like analysis on graphs, the graphoid model works better than the typical relational OLAP alternative, and for the classic OLAP queries, it remains competitive."artículo de publicación periódica.listelement.badge Schema evolution in multiversion data warehouses(2021) Ahmed, Waqas; Vaisman, Alejandro Ariel; Zimányi, Esteban; Wrembel, Robert"Data warehouses (DWs) evolve in both their content and schema due to changes of user requirements, business processes, or external sources to name a few. Although multiple approaches using temporal and/or multiversion DWs have been proposed to handle these changes, an efficient solution for this problem is still lacking. The authors' approach is to separate concerns and use temporal DWs to deal with content changes, and multiversion DWs to deal with schema changes. To address the former, previously, they have proposed a temporal multidimensional (MD) model. In this paper, they propose a multiversion MD model for schema evolution to tackle the latter problem. The two models complement each other and allow managing both content and schema evolution. In this paper, the semantics of schema modification operators (SMOs) to derive various schema versions are given. It is also shown how online analytical processing (OLAP) operations like roll-up work on the model. Finally, the mapping from the multiversion MD model to a relational schema is given along with OLAP operations in standard SQL."artículo de publicación periódica.listelement.badge A temporal multidimensional model and OLAP operators(2020) Ahmed, Waqas; Zimányi, Esteban; Vaisman, Alejandro Ariel; Wrembel, Robert"Usually, data in data warehouses (DWs) are stored using the notion of the multidimensional (MD) model. Often, DWs change in content and structure due to several reasons, like, for instance, changes in a business scenario or technology. For accurate decision-making, a DW model must allow storing and analyzing time-varying data. This paper addresses the problem of keeping track of the history of the data in a DW. For this, first, a formalization of the traditional MD model is proposed and then extended as a generalized temporal MD model. The model comes equipped with a collection of typical online analytical processing (OLAP) operations with temporalsemantics, which isformalized for the four classic operations, namely roll-up, dice, project, and drill-across. Finally, the mapping from the generalized temporal model into a relational schema is presented together with an implementation of the temporal OLAP operations in standard SQL."