Browsing by Subject "OLAP"

Conference paper
Aggregation languages for moving object and places of interest (2008)
Gómez, Leticia Irene; Kuijpers, Bart; Vaisman, Alejandro Ariel
"We address aggregate queries over GIS data and moving object data, where non-spatial information is stored in a data warehouse. We propose a formal data model and query language to express complex aggregate queries. Next, we study the compression of trajectory data, produced by moving objects, using the notions of stops and moves. We show that stops and moves are expressible in our query language and we consider a fragment of this language, consisting of regular expressions to talk about temporally ordered sequences of stops and moves. This fragment can be used not only for querying, but also for expressing data mining and pattern matching tasks over trajectory data."

Journal article
An algebra for OLAP (2017)
Kuijpers, Bart; Vaisman, Alejandro Ariel
"Online Analytical Processing (OLAP) comprises tools and algorithms that allow querying multidimensional databases. It is based on the multidimensional model, where data can be seen as a cube, where each cell contains one or more measures that can be aggregated along dimensions. Despite the extensive corpus of work in the field, a standard language for OLAP is still needed, since there is no well-defined, accepted semantics for many of the usual OLAP operations. In this paper, we address this problem, and present a set of operations for manipulating a data cube. We clearly define the semantics of these operations, and prove that they can be composed, yielding a language powerful enough to express complex OLAP queries. We express these operations as a sequence of atomic transformations over a fixed multidimensional matrix, whose cells contain a sequence of measures. Each atomic transformation produces a new measure.
When a sequence of transformations defines an OLAP operation, a flag is produced indicating which cells must be considered as input for the next operation. In this way, an elegant algebra is defined. Our main contribution, with respect to other similar efforts in the field, is that, for the first time, a formal proof of the correctness of the operations is given, thus providing a clear semantics for them. We believe the present work will serve as a basis to build more solid practical tools for data analysis."

Doctoral thesis
Categorical sequential pattern mining in a spatio-temporal environment (c2009)
Gómez, Leticia Irene; Vaisman, Alejandro Ariel
"In this thesis we argue that trajectory information can also be integrated with GIS and OLAP data, yielding a powerful analysis framework."

Journal article
A data model and query language for spatio-temporal decision support (2010)
Gómez, Leticia Irene; Kuijpers, Bart; Vaisman, Alejandro Ariel
"In recent years, applications aimed at exploring and analyzing spatial data have emerged, powered by the increasing need for software that integrates Geographic Information Systems (GIS) and On-Line Analytical Processing (OLAP). These applications have been called SOLAP (Spatial OLAP). In previous work, the authors have introduced Piet, a system based on a formal data model that integrates in a single framework GIS, OLAP (On-Line Analytical Processing), and Moving Object data. Real-world problems are inherently spatio-temporal. Thus, in this paper we present a data model that extends Piet, allowing tracking the history of spatial data in the GIS layers. We present a formal study of the two typical ways of introducing time into Piet: timestamping the thematic layers in the GIS, and timestamping the spatial objects in each layer.
We denote these strategies snapshot-based and timestamp-based representations, respectively, following well-known terminology borrowed from temporal databases. We present and discuss the formal model for both alternatives. Based on the timestamp-based representation, we introduce a formal First-Order spatio-temporal query language, which we denote Lt, able to express spatio-temporal queries over GIS, OLAP, and trajectory data. Finally, we discuss implementation issues, the update operators that must be supported by the model, and sketch a temporal extension to Piet-QL, the SQL-like query language that supports Piet."

Journal article
Design and implementation of ETL processes using BPMN and relational algebra (2020-06-13)
Awiti, Judith; Vaisman, Alejandro Ariel; Zimányi, Esteban
"Extraction, transformation, and loading (ETL) processes are used to extract data from internal and external sources of an organization, transform these data, and load them into a data warehouse. The Business Process Modeling and Notation (BPMN) has been proposed for expressing ETL processes at a conceptual level. A different approach is studied in this paper, where relational algebra (RA), extended with update operations, is used for specifying ETL processes. In this approach, data tasks in an ETL workflow can be automatically translated into SQL queries to be executed over a DBMS. To illustrate this study, the paper addresses the problem of updating Slowly Changing Dimensions (SCDs) with dependencies, that is, the case when updating an SCD table impacts on associated SCD tables. Tackling this problem requires extending the classic RA with update operations. The paper also shows the implementation of a portion of the TPC-DI benchmark that results from both approaches.
Thus, the paper presents three implementations: (a) An SQL implementation based on the extended RA-based specification of an ETL process expressed in BPMN4ETL; and (b) Two implementations of workflows that follow from BPMN4ETL, one that uses the Pentaho DI tool, and another one that uses Talend Open Studio for DI. Experiments over these implementations of the TPC-DI benchmark for different scale factors were carried out, and are described and discussed in the paper, showing that the extended RA approach results in more efficient processes than the ones produced by implementing the BPMN4ETL specification over the mentioned ETL tools. The reasons for this result are also discussed."

Journal article
Efficient analytical queries on semantic web data cubes (2017-12)
Etcheverry, Lorena; Vaisman, Alejandro Ariel
"The amount of multidimensional data published on the semantic web (SW) is constantly increasing, due to initiatives such as Open Data and Open Government Data, among others. Models, languages, and tools that allow obtaining valuable information efficiently are thus required. Multidimensional data are typically represented as data cubes, and exploited using Online Analytical Processing (OLAP) techniques. The RDF Data Cube Vocabulary, also denoted QB, is the current W3C standard to represent statistical data on the SW. Given that QB does not include key features needed for OLAP analysis, in previous work we have proposed an extension, denoted QB4OLAP, to overcome this problem without the need of modifying already published data. Once data cubes are appropriately represented on the SW, we need mechanisms to analyze them. However, in the current state-of-the-art, writing efficient analytical queries over SW data cubes demands a deep knowledge of standards like RDF and SPARQL. These skills are unlikely to be found in typical analytical users. Further, OLAP languages like MDX are far from being easily understood by the final user.
The lack of friendly tools to exploit multidimensional data on the SW is a barrier that needs to be broken to promote the publication of such data. This is the problem we address in this paper. Our approach is based on allowing analytical users to write queries using what they know best: OLAP operations over data cubes, without dealing with SW technicalities. For this, we devised CQL (standing for Cube Query Language), a simple, high-level query language that operates over data cubes. Taking advantage of structural metadata provided by QB4OLAP, we translate CQL queries into SPARQL ones. Then, we propose query improvement strategies to produce efficient SPARQL queries, adapting general-purpose SPARQL query optimization techniques. We evaluate our implementation using the Star-Schema benchmark, showing that our proposal outperforms others. The QB4OLAP toolkit, a web application that allows exploring and querying (using CQL) SW data cubes, completes our contributions."

Conference paper
From conceptual to logical ETL design using BPMN and relational algebra (2019)
Awiti, Judith; Vaisman, Alejandro Ariel; Zimányi, Esteban
"Extraction, transformation, and loading (ETL) processes are used to extract data from internal and external sources of an organization, transform these data, and load them into a data warehouse. The Business Process Modeling Notation (BPMN) has been proposed for expressing ETL processes at a conceptual level. This paper extends relational algebra (RA) with update operations for specifying ETL processes at a logical level. In this approach, data tasks can be automatically translated into SQL queries to be executed over a DBMS. An extension of RA is presented, as well as a translation mechanism from BPMN to the RA specification. Throughout the paper, the TPC-DI benchmark is used for comparing both approaches.
Experiments show the efficiency of the resulting ETL flow with respect to the Pentaho Data Integration tool."

Journal article
Mobility data warehouses (2019-04)
Vaisman, Alejandro Ariel; Zimányi, Esteban
"The interest in mobility data analysis has grown dramatically with the wide availability of devices that track the position of moving objects. Mobility analysis can be applied, for example, to analyze traffic flows. To support mobility analysis, trajectory data warehousing techniques can be used. Trajectory data warehouses typically include, as measures, segments of trajectories, linked to spatial and non-spatial contextual dimensions. This paper goes beyond this concept, by including, as measures, the trajectories of moving objects at any point in time. In this way, online analytical processing (OLAP) queries, typically including aggregation, can be combined with moving object queries, to express queries like “List the total number of trucks running at less than 2 km from each other more than 50% of its route in the province of Antwerp” in a concise and elegant way. Existing proposals for trajectory data warehouses do not support queries like this, since they are based on either the segmentation of the trajectories, or a pre-aggregation of measures. The solution presented here is implemented using MobilityDB, a moving object database that extends the PostgreSQL database with temporal data types, allowing seamless integration with relational spatial and non-spatial data. This integration leads to the concept of mobility data warehouses.
This paper discusses modeling and querying mobility data warehouses, providing a comprehensive collection of queries implemented using PostgreSQL and PostGIS as database backend, extended with the libraries provided by MobilityDB."

Conference paper
Modelling and querying star and snowflake warehouses using graph databases (2019)
Vaisman, Alejandro Ariel; Besteiro, María Florencia; Valverde Melito, Maximiliano Javier
"In current “Big Data” scenarios, graph databases are increasingly being used. Online Analytical Processing (OLAP) operations can expand the possibilities of graph analysis beyond the traditional graph-based computation. This paper studies graph databases as an alternative to implement star and snowflake schemas, the typical choices for data warehouse design. For this, the MusicBrainz database is used. A data warehouse for this database is designed, and implemented over a Postgres relational database. This warehouse is also represented as a graph, and implemented over the Neo4j graph database. A collection of typical OLAP queries is used to compare both implementations. The results reported here show that in ten out of thirteen queries tested, the graph implementation outperforms the relational one, in ratios that go from 1.3 to 26 times faster, and performs similarly to the relational implementation in the three remaining cases."

Doctoral thesis
Un modelo y lenguaje de consulta genérico para el procesamiento analítico online y su aplicación a campos de datos continuos [A generic model and query language for online analytical processing and its application to continuous data fields] (2014)
Gómez, Silvia Alicia; Vaisman, Alejandro Ariel
"The analysis of historical data is crucial for strategic management and decision-making in different types of organizations, from commercial companies to governmental or civil entities. Unlike in the early days, the current proliferation of useful data extends beyond the boundaries of the organizations themselves. Moreover, the data incorporated into organizational analysis are highly complex, involving images, geographic features, satellite maps, web logs, social network information, and bioinformatics data, among others."

Journal article
Online analytical processing on graph data (2020)
Gómez, Leticia Irene; Kuijpers, Bart; Vaisman, Alejandro Ariel
"Online Analytical Processing (OLAP) comprises tools and algorithms that allow querying multidimensional databases. It is based on the multidimensional model, where data can be seen as a cube such that each cell contains one or more measures that can be aggregated along dimensions. In a “Big Data” scenario, traditional data warehousing and OLAP operations are clearly not sufficient to address current data analysis requirements, for example, social network analysis. Furthermore, OLAP operations and models can expand the possibilities of graph analysis beyond the traditional graph-based computation. Nevertheless, there is not much work on the problem of taking OLAP analysis to the graph data model. This paper proposes a formal multidimensional model for graph analysis, that considers the basic graph data, and also background information in the form of dimension hierarchies. The graphs in this model are node- and edge-labelled directed multihypergraphs, called graphoids, which can be defined at several different levels of granularity using the dimensions associated with them. Operations analogous to the ones used in typical OLAP over cubes are defined over graphoids. The paper presents a formal definition of the graphoid model for OLAP, proves that the typical OLAP operations on cubes can be expressed over the graphoid model, and shows that the classic data cube model is a particular case of the graphoid data model.
Finally, a case study supports the claim that, for many kinds of OLAP-like analysis on graphs, the graphoid model works better than the typical relational OLAP alternative, and for the classic OLAP queries, it remains competitive."

Conference paper
Performing OLAP over graph data: query language, implementation, and a case study (2017-08)
Gómez, Leticia Irene; Kuijpers, Bart; Vaisman, Alejandro Ariel
"In current Big Data scenarios, traditional data warehousing and Online Analytical Processing (OLAP) operations on cubes are clearly not sufficient to address the current data analysis requirements. Nevertheless, OLAP operations and models can expand the possibilities of graph analysis beyond the traditional graph-based computation. In spite of this, there is not much work on the problem of taking OLAP analysis to the graph data model. In previous work we proposed a multidimensional (MD) data model for graph analysis, that considers not only the basic graph data, but background information in the form of dimension hierarchies as well. The graphs in our model are node- and edge-labelled directed multi-hypergraphs, called graphoids, defined at several different levels of granularity. In this paper we show how we implemented this proposal over the widely used Neo4J graph database, discuss implementation issues, and present a detailed case study to show how OLAP operations can be used on graphs."

Journal article
Piet: a GIS-OLAP implementation (2007)
Vaisman, Alejandro Ariel; Gómez, Leticia Irene; Kuijpers, Bart; Escribano, Ariel
"Data aggregation in Geographic Information Systems (GIS) is a desirable feature, although only marginally present in commercial systems, which also fail to provide integration between GIS and OLAP (On Line Analytical Processing).
With this in mind, we have developed Piet, a system that makes use of a novel query processing technique: first, a process called sub-polygonization decomposes each thematic layer in a GIS into open convex polygons; then, another process computes and stores in a database the overlay of those layers for later use by a query processor. We describe the implementation of Piet, and provide experimental evidence that overlay precomputation can outperform GIS systems that employ indexing schemes based on R-trees."

Conference paper
Querying semantic web data cubes (2016)
Etcheverry, Lorena; Vaisman, Alejandro Ariel
"We address the problem of querying data cubes for Online Analytical Processing (OLAP) analysis, directly on the Semantic Web (SW). We first introduce CQL, a simple algebra for querying data cubes at a conceptual level. Taking advantage of QB4OLAP metadata, we automatically translate CQL queries into SPARQL ones, and propose query optimization strategies that adapt, to the particular OLAP setting, general-purpose techniques. A web application allows exploring and querying OLAP cubes on the SW, using the machinery presented here."

Journal article
Schema evolution in multiversion data warehouses (2021)
Ahmed, Waqas; Vaisman, Alejandro Ariel; Zimányi, Esteban; Wrembel, Robert
"Data warehouses (DWs) evolve in both their content and schema due to changes of user requirements, business processes, or external sources, to name a few. Although multiple approaches using temporal and/or multiversion DWs have been proposed to handle these changes, an efficient solution for this problem is still lacking. The authors' approach is to separate concerns and use temporal DWs to deal with content changes, and multiversion DWs to deal with schema changes. To address the former, previously, they have proposed a temporal multidimensional (MD) model.
In this paper, they propose a multiversion MD model for schema evolution to tackle the latter problem. The two models complement each other and allow managing both content and schema evolution. In this paper, the semantics of schema modification operators (SMOs) to derive various schema versions are given. It is also shown how online analytical processing (OLAP) operations like roll-up work on the model. Finally, the mapping from the multiversion MD model to a relational schema is given along with OLAP operations in standard SQL."

Journal article
A temporal multidimensional model and OLAP operators (2020)
Ahmed, Waqas; Zimányi, Esteban; Vaisman, Alejandro Ariel; Wrembel, Robert
"Usually, data in data warehouses (DWs) are stored using the notion of the multidimensional (MD) model. Often, DWs change in content and structure due to several reasons, like, for instance, changes in a business scenario or technology. For accurate decision-making, a DW model must allow storing and analyzing time-varying data. This paper addresses the problem of keeping track of the history of the data in a DW. For this, first, a formalization of the traditional MD model is proposed and then extended as a generalized temporal MD model. The model comes equipped with a collection of typical online analytical processing (OLAP) operations with temporal semantics, which is formalized for the four classic operations, namely roll-up, dice, project, and drill-across.
Finally, the mapping from the generalized temporal model into a relational schema is presented together with an implementation of the temporal OLAP operations in standard SQL."

Conference paper
Temporal SOLAP: query language, implementation, and a use case (2012)
Bisceglia, Pablo; Gómez, Leticia Irene; Vaisman, Alejandro Ariel
"The integration of Geographic Information Systems (GIS) and On-Line Analytical Processing (OLAP), denoted SOLAP, is aimed at exploring and analyzing spatial data. In real-world SOLAP applications, spatial and non-spatial data are subject to changes. In this paper we present a temporal query language for SOLAP, called TPiet-QL, supporting so-called discrete changes (for example, in land use or cadastral applications there are situations where parcels are merged or split). TPiet-QL allows expressing integrated GIS-OLAP queries in a scenario where spatial objects change across time. We also present a prototype implementation, and show how this application is used in a real-world scenario: the analysis of protected areas in Uruguay."

Conference paper
Towards temporal graph database (2016)
Campos, Alexander; Mozzino, Jorge; Vaisman, Alejandro Ariel
"In spite of the extensive literature on graph databases (GDBs), temporal GDBs have not received too much attention so far. Temporal GDBs can capture, for example, the evolution of social networks across time, a relevant topic in data analysis nowadays. We propose a data model and query language (denoted TEG-QL) for temporal GDBs, based on the notion of attribute graphs.
This allows a straightforward translation to Neo4J, a well-known GDB."

Conference paper
Trajectory sequential patterns with regular expression constraints including spatial queries (2010-05)
Gardella, Pablo; Gómez, Leticia Irene; Vaisman, Alejandro Ariel
"Moving object (MO) data representation and computing have received a fair share of attention over recent years from the database community. Replacing raw trajectory data (i.e., MO positions at different time instants) by sequences of application-dependent stops occurring at so-called places of interest (PoIs) leads to the notion of semantic trajectories. Different techniques exist for sequential pattern analysis of trajectories defined in this way. One of them, RE-SPaM, expresses sequential patterns by means of regular expressions built not only over item identifiers, but also over constraints defined on the (temporal and non-temporal) attributes of the items to be analyzed. This analysis could be greatly enriched if spatial and non-spatial data associated with the MO are taken into account. In this paper we show how we can take advantage of the extensibility properties of RE-SPaM to augment its expressive power by allowing spatial queries to be included in the constraints. For this, we make use of Piet, a framework that integrates OLAP, GIS and MO data, and its associated query language denoted Piet-QL, providing a link between moving object data and their geographic environment."