Ingeniería Informática (Computer Engineering)

Permanent URI for this collection

http://ri.itba.edu.ar/handle/123456789/38

Showing 1 - 20 of 42

An algebra for OLAP
(2017) Kuijpers, Bart; Vaisman, Alejandro Ariel
"Online Analytical Processing (OLAP) comprises tools and algorithms that allow querying multidimensional databases. It is based on the multidimensional model, where data can be seen as a cube, where each cell contains one or more measures can be aggregated along dimensions. Despite the extensive corpus of work in the field, a standard language for OLAP is still needed, since there is no well-defined, accepted semantics, for many of the usual OLAP operations. In this paper, we address this problem, and present a set of operations for manipulating a data cube. We clearly define the semantics of these operations, and prove that they can be composed, yielding a language powerful enough to express complex OLAP queries. We express these operations as a sequence of atomic transformations over a fixed multidimensional matrix, whose cells contain a sequence of measures. Each atomic transformation produces a new measure. When a sequence of transformations defines an OLAP operation, a flag is produced indicating which cells must be considered as input for the next operation. In this way, an elegant algebra is defined. Our main contribution, with respect to other similar efforts in the field is that, for the first time, a formal proof of the correctness of the operations is given, thus providing a clear semantics for them. We believe the present work will serve as a basis to build more solid practical tools for data analysis."
Analytical queries on semantic trajectories using graph databases
(2019-10) Gómez, Leticia Irene; Kuijpers, Bart; Vaisman, Alejandro Ariel
"This article studies the analysis of moving object data collected by location-aware devices, such as GPS, using graph databases. Such raw trajectories can be transformed into so-called semantic trajectories, which are sequences of stops that occur at “places of interest.” Trajectory data analysis can be enriched if spatial and non-spatial contextual data associated with the moving objects are taken into account, and aggregation of trajectory data can reveal hidden patterns within such data. When trajectory data are stored in relational databases, there is an “impedance mismatch” between the representation and storage models. Graphs in which the nodes and edges are annotated with properties are gaining increasing interest to model a variety of networks. Therefore, this article proposes the use of graph databases (Neo4j in this case) to represent and store trajectory data, which can thus be analyzed at different aggregation levels using graph query languages (Cypher, for Neo4j). Through a real-world public data case study, the article shows that trajectory queries are expressed more naturally on the graph-based representation than over the relational alternative, and perform better in many typical cases."
Analyzing public transport in the city of Buenos Aires with MobilityDB
(2022) Godfrid, Juan; Radnic, Pablo; Vaisman, Alejandro Ariel; Zimányi, Esteban
"The General Transit Feed Specification (GTFS) is a data format widely used to share data about public transportation schedules and associated geographic information. GTFS comes in two versions: GTFS Static describing the planned itineraries and GTFS Realtime describing the actual ones. MobilityDB is a novel and free open-source moving object database, developed as a PostgreSQL and PostGIS extension, that adds spatial and temporal data types along with a large number of functions, that facilitate the analysis of mobility data. Loading GTFS data into MobilityDB is a quite complex task that, nevertheless, must be done in an ad-hoc fashion. This work describes how MobilityDB is used to analyze public transport mobility in the city of Buenos Aires, using both, static and real-time GTFS data for the Buenos Aires public transportation system. Visualizations are also produced to enhance the analy-sis. To the authors’ knowledge, this is the first attempt to analyze GTFS data with a moving object database."
Analyzing the quality of Twitter data streams
(2022) Arolfo, Franco; Cortés Rodriguez, Kevin; Vaisman, Alejandro Ariel
"There is a general belief that the quality of Twitter data streams is generally low and unpredictable, making, in some way, unreliable to take decisions based on such data. The work presented here addresses this problem from a Data Quality (DQ) perspective, adapting the traditional methods used in relational databases, based on quality dimensions and metrics, to capture the characteristics of Twitter data streams in particular, and of Big Data in a more general sense. Therefore, as a first contribution, this paper re-defines the classic DQ dimensions and metrics for the scenario under study. Second, the paper introduces a software tool that allows capturing Twitter data streams in real time, computing their DQ and displaying the results through a wide variety of graphics. As a third contribution of this paper, using the aforementioned machinery, a thorough analysis of the DQ of Twitter streams is performed, based on four dimensions: Readability, Completeness, Usefulness, and Trustworthiness. These dimensions are studied for several different cases, namely unfiltered data streams, data streams filtered using a collection of keywords, and classifying tweets referring to different topics, studying the DQ for each topic. Further, although it is well known that the number of geolocalized tweets is very low, the paper studies the DQ of tweets with respect to the place from where they are posted. Last but not least, the tool allows changing the weights of each quality dimension considered in the computation of the overall data quality of a tweet. This allows defining weights that fit different analysis contexts and/or different user profiles. Interestingly, this study reveals that the quality of Twitter streams is higher than what would have been expected."
Analyzing the quality of Twitter data streams
(2020) Arolfo, Franco A.; Cortes Rodriguez, Kevin; Vaisman, Alejandro Ariel
"There is a general belief that the quality of Twitter data streams is generally low and unpredictable, making, in some way, unreliable to take decisions based on such data. The work presented here addresses this problem from a Data Quality (DQ) perspective, adapting the traditional methods used in relational databases, based on quality dimensions and metrics, to capture the characteristics of Twitter data streams in particular, and of Big Data in a more general sense. Therefore, as a first contribution, this paper re-defines the classic DQ dimensions and metrics for the scenario under study. Second, the paper introduces a software tool that allows capturing Twitter data streams in real time, computing their DQ and displaying the results through a wide variety of graphics. As a third contribution of this paper, using the aforementioned machinery, a thorough analysis of the DQ of Twitter streams is performed, based on four dimensions: Readability, Completeness, Usefulness, and Trustworthiness. These dimensions are studied for several different cases, namely unfiltered data streams, data streams filtered using a collection of keywords, and classifying tweets referring to different topics, studying the DQ for each topic. Further, although it is well known that the number of geolocalized tweets is very low, the paper studies the DQ of tweets with respect to the place from where they are posted. Last but not least, the tool allows changing the weights of each quality dimension considered in the computation of the overall data quality of a tweet. This allows defining weights that fit different analysis contexts and/or different user profiles. Interestingly, this study reveals that the quality of Twitter streams is higher than what would have been expected."
ATR: Template-based repair for alloy specifications
(2022) Zheng, Guolong; Vu Nguyen, Thanh; Gutiérrez Brida, Simón; Regis, Germán; Aguirre, Nazareno; Frías, Marcelo F.; Bagheri, Hamid
"Automatic Program Repair (APR) is a practical research topic that studies techniques to automatically repair programs to fix bugs. Most existing APR techniques are designed for imperative programming languages, such as C and Java, and rely on analyzing correct and incorrect executions of programs to identify and repair suspicious statements. We introduce a new APR approach for software specifications written in the Alloy declarative language, where specifications are not “executed”, but rather converted into logical formulas and analyzed using backend constraint solvers, to find specification instances and counterexamples to assertions. We present ATR, a technique that takes as input an Alloy specification with some violated assertion and returns a repaired specification that satisfies the assertion. The key ideas are (i) analyzing the differences between counterexamples that do not satisfy the assertion and instances that do satisfy the assertion to guide the repair and (ii) generating repair candidates from specific templates and pruning the space of repair candidates using the counterexamples and satisfying instances. Experimental results using existing large Alloy benchmarks show that ATR is effective in generating difficult repairs. ATR repairs 66.3% of 1974 fault specifications, including specification repairs that cannot be handled by existing Alloy repair techniques."
Automated workarounds from Java program specifications based on SAT solving
(2018-11) Uva, Marcelo; Ponzio, Pablo; Regis, Germán; Aguirre, Nazareno; Frías, Marcelo
"The failures that bugs in software lead to can sometimes be bypassed by the so-called workarounds: when a (faulty) routine fails, alternative routines that the system offers can be used in place of the failing one, to circumvent the failure. Existing approaches to workaround-based system recovery consider workarounds that are produced from equivalent method sequences, utomatically computed from user-provided abstract models, or directly produced from user-provided equivalent sequences of operations. In this paper, we present two techniques for computing workarounds from Java code equipped with formal specifications, that improve previous approaches in two respects. First, the particular state where the failure originated is actively involved in computing workarounds, thus leading to repairs that are more state specific. Second, our techniques automatically compute workarounds on concrete program state characterizations, avoiding abstract software models and user-provided equivalences. The first technique uses SAT solving to compute a sequence of methods that is equivalent to a failing method on a specific failing state, but which can also be generalized to schemas for workaround reuse. The second technique directly exploits SAT to circumvent a failing method, building a state that mimics the (correct) behaviour of a failing routine, from a specific program state too. We perform an experimental evaluation based on case studies involving implementations of collections and a library for date arithmetic, showing that the techniques can effectively compute workarounds from complex contracts in an important number of cases, in time that makes them feasible to be used for run-time repairs. Our results also show that our state-specific workarounds enable us to produce repairs in many cases where previous workaround-based approaches are inapplicable."
Characterization of electric load with information theory quantifiers
(2017-01) Aquino, Andre L. L.; Ramos, Heitor S.; Frery, Alejandro C.; Viana, Leonardo P.; Cavalcante, Tamer S. G.; Rosso, Osvaldo A.
"This paper presents a study of the electric load behavior based on the Causality Complexity–Entropy Plane.We use a public data set, namely REDD, which contains detailed power usage information from several domestic appliances. In our characterization, we use the available power data of the circuit/devices of all houses. The Bandt–Pompe methodology combined with the Causality Complexity–Entropy Plane was used to identify and characterize regimes and behaviors over these data. The results showed that this characterization provides a useful insight into the underlying dynamics that govern the electric load."
Comparing samples from the 𝒢0 distribution using a geodesic distance
(2020-06) Frery, Alejandro C.; Gambini, Juliana
"The 𝒢0 distribution is widely used for monopolarized SAR image modeling because it can characterize regions with different degrees of texture accurately. It is indexed by three parameters: the number of looks (which can be estimated for the whole image), a scale parameter and a texture parameter. This paper presents a new proposal for comparing samples from the 𝒢0 distribution using a geodesic distance (GD) as a measure of dissimilarity between models. The objective is quantifying the difference between pairs of samples from SAR data using both local parameters (scale and texture) of the 𝒢0 distribution. We propose three tests based on the GD which combine the tests presented in Naranjo-Torres et al. (IEEE J Sel Top Appl Earth Obs Remote Sens 10(3):987–997, 2017), and we estimate their probability distributions using permutation methods."
A data model and query language for spatio-temporal decision support
(2010) Gómez, Leticia Irene; Kuijpers, Bart; Vaisman, Alejandro Ariel
"In recent years, applications aimed at exploring and analyzing spatial data have emerged, powered by the increasing need of software that integrates Geographic Information Systems(GIS) and On-Line Analytical Processing (OLAP). These applications have been called SOLAP (Spatial OLAP). In previous work, the authors have introduced Piet, a system based on a formal data model that integrates in a single framework GIS, OLAP (On-Line Analytical Processing), and Moving Object data. Real-world problems are inherently spatio-temporal. Thus, in this paper we present a data model that extends Piet, allowing tracking the history of spatial data in the GIS layers. We present a formal study of the two typical ways of intro ducing time into Piet: timestamping the thematic layers in the GIS, and timestamping the spatial objects in each layer. We denote these strategies snapshot-based and timestamp-based representations, respectively, following well-known terminology borrowed from temporal databases. We present and discuss the formal model for both alternatives. Based on the timestamp-based representation, we introduce a formal First-Order spatio-temporal query language, which we denote Lt, able to express spatio-temporal queries over GIS, OLAP, and trajectory data. Finally, we discuss implementation issues, the update operators that must be supported by the model, and sketch a temporal extension to Piet-QL, the SQL-like query language that supports Piet."
Data-driven simulation of pedestrian collision avoidance with a nonparametric neural network
(2020-02) Martin, Rafael F.; Parisi, Daniel
"Data-driven simulation of pedestrian dynamics is an incipient and promising approach for building reliable microscopic pedestrian models. We propose a methodology based on generalized regression neural networks, which does not have to deal with a huge number of free parameters as in the case of multilayer neural networks. Although the method is general, we focus on the one pedestrian - one obstacle problem. Experimental data were collected in a motion capture laboratory providing high-precision trajectories. The proposed model allows us to simulate the trajectory of a pedestrian avoiding an obstacle from any direction. Together with the methodology specifications, we provide the data set needed for performing the simulations of this kind of pedestrian dynamic system."
Design and implementation of ETL processes using BPMN and relational algebra
(2020-06-13) Awiti, Judith; Vaisman, Alejandro Ariel; Zimányi, Esteban
"Extraction, transformation, and loading (ETL) processes are used to extract data from internal and external sources of an organization, transform these data, and load them into a data warehouse. The Business Process Modeling and Notation (BPMN) has been proposed for expressing ETL processes at a conceptual level. A different approach is studied in this paper, where relational algebra (RA), extended with update operations, is used for specifying ETL processes. In this approach, data tasks in an ETL workflow can be automatically translated into SQL queries to be executed over a DBMS. To illustrate this study, the paper addresses the problem of updating Slowly Changing Dimensions (SCDs) with dependencies, that is, the case when updating a SCD table impacts on associated SCD tables. Tackling this problem requires extending the classic RA with update operations. The paper also shows the implementation of a portion of the TPC-DI benchmark that results from both approaches. Thus, the paper presents three implementations: (a) An SQL implementation based on the extended RA-based specification of an ETL process expressed in BPMN4ETL; and (b) Two implementations of workflows that follow from BPMN4ETL, one that uses the Pentaho DI tool, and another one that uses Talend Open Studio for DI. Experiments over these implementations of the TPC-DI benchmark for different scale factors were carried out, and are described and discussed in the paper, showing that the extended RA approach results in more efficient processes than the ones produced by implementing the BPMN4ETL specification over the mentioned ETL tools. The reasons for this result are also discussed."
EEG waveform analysis of P300 ERP with applications to brain computer interfaces
(2018-11) Ramele, Rodrigo; Villar, Ana Julia; Santos, Juan Miguel
"The Electroencephalography (EEG) is not just a mere clinical tool anymore. It has become the de-facto mobile, portable, non-invasive brain imaging sensor to harness brain information in real time. It is now being used to translate or decode brain signals, to diagnose diseases or to implement Brain Computer Interface (BCI) devices. The automatic decoding is mainly implemented by using quantitative algorithms to detect the cloaked information buried in the signal. However, clinical EEG is based intensively on waveforms and the structure of signal plots. Hence, the purpose of this work is to establish a bridge to fill this gap by reviewing and describing the procedures that have been used to detect patterns in the electroencephalographic waveforms, benchmarking them on a controlled pseudo-real dataset of a P300-Based BCI Speller and verifying their performance on a public dataset of a BCI Competition."
Efficient analytical queries on semantic web data cubes
(2017-12) Etcheverry, Lorena; Vaisman, Alejandro Ariel
"The amount of multidimensional data published on the semantic web (SW) is constantly increasing, due to initiatives such as Open Data and Open Government Data, among other ones. Models, languages, and tools, that allow obtaining valuable information e ciently, are thus required. Multidimensional data are typically represented as data cubes, and exploited using Online Analytical Processing (OLAP) techniques. The RDF Data Cube Vocabulary, also denoted QB, is the current W3C standard to represent statistical data on the SW. Given that QB does not include key features needed for OLAP analysis, in previous work we have proposed an extension, denoted QB4OLAP, to overcome this problem without the need of modifying already published data. Once data cubes are appropriately represented on the SW, we need mechanisms to analyze them. However, in the current state-of-the-art, writing e cient analytical queries over SW data cubes demands a deep knowledge of standards like RDF and SPARQL. These skills are unlikely to be found in typical analytical users. Further, OLAP languages like MDX are far from being easily understood by the final user. The lack of friendly tools to exploit multidimensional data on the SW is a barrier that needs to be broken to promote the publication of such data. This is the problem we address in this paper. Our approach is based on allowing analytical users to write queries using what they know best: OLAP operations over data cubes, without dealing with SW technicalities. For this, we devised CQL (standing for Cube Query Language), a simple, high-level query language that operates over data cubes. Taking advantage of structural metadata provided by QB4OLAP, we translate CQL queries into SPARQL ones. Then, we propose query improvement strategies to produce e cient SPARQL queries, adapting general-purpose SPARQL query optimization techniques. We evaluate our implementation using the Star-Schema benchmark, showing that our proposal outperforms others. 
The QB4OLAP toolkit,a web application that allows exploring and querying (using CQL) SW data cubes, completes our contributions."
An evolutionary approach to translating operational specifications into declarative specifications
(2019-07) Molina, Facundo; Cornejo, César; Degiovanni, Renzo; Regis, Germán; Castro, Pablo; Aguirre, Nazareno; Frías, Marcelo
"Various tools for program analysis, including run-time assertion checkers and static analyzers such as verification and test generation tools, require formal specifications of the programs being analyzed. Moreover, many of these tools and techniques require such specifications to be written in a particular style, or follow certain patterns, in order to obtain an acceptable performance from the corresponding analyses. Thus, having a formal specification sometimes is not enough for using a particular technique, since such specification may not be provided in the right formalism. In this paper, we deal with this problem in the increasingly common case of having an operational specification, while for analysis reasons requiring a declarative specification. We propose an evolutionary approach to translate an operational specification written in a sequential programming language, into a declarative specification, in relational logic. We perform experiments on a benchmark of data structure implementations, for which operational invariants are available, and show that our evolutionary computation based approach to translating specifications achieves very good precision in this context, and produces declarative specifications that are more amenable to analyses that demand specifications in this style. This is assessed in two contexts: bounded verification of data structure invariant preservation, and instance enumeration using symbolic execution aided by tight bounds."
Foundations and applications for secure triggers
(2006-02) Futoransky, Ariel; Kargieman, Emiliano; Sarraute, Carlos; Waissbein, Ariel
"Imagine there is certain content we want to maintain private until some particular event occurs, when we want to have it automatically disclosed. Suppose furthermore, that we want this done in a (possibly) malicious host. Say, the conﬁdential content is a piece of code belonging to a computer program that should remain ciphered and then “be triggered” (i.e., deciphered and executed) when the underlying system satisﬁes a preselected condition which must remain secret after code inspection. In this work we present diﬀerent solutions for problems of this sort, using diﬀerent “declassiﬁcation” criteria, based on a primitive we call secure triggers. We establish the notion of secure triggers in the universally-composable security framework of [Canetti 2001] and introduce several examples. Our examples demonstrate that a new sort of obfuscation is possible. Finally, we motivate its use with applications in realistic scenarios."
The geodesic distance between 𝒢I0 models and its application to region discrimination
(2017-03) Naranjo-Torres, José; Gambini, Juliana; Frery, Alejandro C.
"The 𝒢I0 distribution is able to characterize different regions in monopolarized SAR imagery. It is indexed by three parameters: the number of looks (which can be estimated in the whole image), a scale parameter, and a texture parameter. This paper presents a new proposal for feature extraction and region discrimination in SAR imagery, using the geodesic distance as a measure of dissimilarity between 𝒢I0 models. We derive geodesic distances between models that describe several practical situations, assuming the number of looks known, for same and different texture and for same and different scale. We then apply this new tool to the problems of identifying edges between regions with different texture, and quantify the dissimilarity between pairs of samples in actual SAR data. We analyze the advantages of using the geodesic distance when compared to stochastic distances."
Histogram of gradient orientations of signal plots applied to P300 detection
(2019-07) Ramele, Rodrigo; Villar, Ana Julia; Santos, Juan Miguel
"The analysis of Electroencephalographic (EEG) signals is of ulterior importance to aid in the diagnosis of mental disease and to increase our understanding of the brain. Traditionally, clinical EEG has been analyzed in terms of temporal waveforms, looking at rhythms in spontaneous activity, subjectively identifying troughs and peaks in Event-Related Potentials (ERP), or by studying graphoelements in pathological sleep stages. Additionally, the discipline of Brain Computer Interfaces (BCI) requires new methods to decode patterns from non-invasive EEG signals. This field is developing alternative communication pathways to transmit volitional information from the Central Nervous System. The technology could potentially enhance the quality of life of patients affected by neurodegenerative disorders and other mental illness. This work mimics what electroencephalographers have been doing clinically, visually inspecting, and categorizing phenomena within the EEG by the extraction of features from images of signal plots. These features are constructed based on the calculation of histograms of oriented gradients from pixels around the signal plot. It aims to provide a new objective framework to analyze, characterize and classify EEG signal waveforms. The feasibility of the method is outlined by detecting the P300, an ERP elicited by the oddball paradigm of rare events, and implementing an offline P300-based BCI Speller. The validity of the proposal is shown by offline processing a public dataset of Amyotrophic Lateral Sclerosis (ALS) patients and an own dataset of healthy subjects."
Improving lazy abstraction for SCR specifications through constraint relaxation
(2018-03) Degiovanni, Renzo; Ponzio, Pablo; Aguirre, Nazareno; Frías, Marcelo
"Formal requirements specifications, eg, software cost reduction (SCR) specifications, are challenging to analyse using automated techniques such as model checking. Since such specifications are meant to capture requirements, they tend to refer to real-world magnitudes often characterized through variables over large domains. At the same time, they feature a high degree of nondeterminism, as opposed to other analysis contexts such as (sequential) program verification. This makes model checking of SCR specifications difficult even for symbolic approaches. Moreover, automated abstraction refinement techniques such as counterexample guided abstraction refinement fail in many cases in this context, since the concrete state space is typically large, and reaching specific states of interest may require complex executions involving many different states, causing these approaches to perform many abstraction refinements, and making them ineffective in practice. In this paper, an approach to tackle the above situation, through a 2-stage abstraction, is presented. The specification is first relaxed, by disregarding the constraints imposed in the specification by physical laws or by the environment, before being fed to a counterexample guided abstraction refinement procedure, tailored to SCR. By relaxing the original specification, shorter spurious counterexamples are produced, favouring the abstraction refinement through the introduction of fewer abstraction predicates. Then, when a counterexample is concretizable with respect to the relaxed (concrete) specification but it is spurious with respect to the original specification, an efficient though incomplete refinement step is applied to the constraints, to cause the removal of the spurious case. This approach is experimentally assessed, comparing it with related techniques in the verification of properties and in automated test case generation, using various SCR specifications drawn from the literature as case studies. 
The experiments show that this new approach runs faster and scales better to larger, more complex specifications than related techniques."
An inference engine based on fuzzy logic for uncertain and imprecise expert reasoning
(2002-07) D'Aquila, Raimundo; Crespo Crespo, Cecilia; Mate, J. L.; Pazos, J.
"This paper addresses the development and computational implementation of an inference engine based on a full fuzzy logic, excluding only imprecise quantifiers, for handling uncertainty and imprecision in rule-based expert systems. The logical model exploits some connectives of Lukasiewicz’s infinite multi-valued logic and is mainly founded on the work of L.A. Zadeh and J.F. Baldwin. As it is oriented to expert systems, the inference engine was developed to be as knowledge domain independent as possible, while having satisfactory computational efficiency (...)."
