Computer Engineering

Permanent URI for this collection

Browse

Recent submissions

Showing 1 - 20 of 42
  • Journal article
    User-centered road network traffic analysis with MobilityDB
    (2022) Sakr, Mahmoud; Zimányi, Esteban; Vaisman, Alejandro Ariel; Bakli, Mohamed
    "Performance indicators of road networks are a long-lasting topic of research. Existing schemes assess network properties such as the average speed on road segments and the queuing time at intersections. The increasing availability of user trajectories, collected mainly using mobile phones with a variety of applications, creates opportunities for developing user-centered performance indicators. Performing such an analysis on big trajectory data sets remains a challenge for the existing data management systems, because they lack support for spatiotemporal trajectory data. This article presents an end-to-end solution, based on MobilityDB, a novel moving object database system that extends PostgreSQL with spatiotemporal data types and functions. A new class of indicators is proposed, focused on the users' experience. The indicators address the network design, the traffic flow, and the driving comfort of the motorists. Furthermore, these indicators are expressed as analytical MobilityDB queries over a big set of real vehicle trajectories."
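The paper expresses its indicators as MobilityDB SQL queries; as a language-neutral sketch of the underlying idea, the following pure-Python fragment computes one hypothetical user-centred indicator from a trajectory. The function names and the 50 km/h free-flow threshold are our illustrative assumptions, not taken from the paper.

```python
from math import hypot

def speeds(trajectory):
    """Instantaneous speeds (m/s) from a list of (t, x, y) samples."""
    out = []
    for (t0, x0, y0), (t1, x1, y1) in zip(trajectory, trajectory[1:]):
        out.append(hypot(x1 - x0, y1 - y0) / (t1 - t0))
    return out

def congestion_ratio(trajectory, free_flow=13.9):
    """Share of travel segments slower than half the free-flow speed
    (13.9 m/s ~ 50 km/h): a crude user-centred 'driving comfort' indicator."""
    s = speeds(trajectory)
    return sum(v < free_flow / 2 for v in s) / len(s)

# toy trajectory: fast first segment, then two slow ones
traj = [(0, 0, 0), (10, 100, 0), (20, 130, 0), (30, 160, 0)]
ratio = congestion_ratio(traj)
```

In MobilityDB the same computation would be pushed into the database via its temporal types, rather than done client-side as here.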
  • Journal article
    ATR: Template-based repair for alloy specifications
    (2022) Zheng, Guolong; Vu Nguyen, Thanh; Gutiérrez Brida, Simón; Regis, Germán; Aguirre, Nazareno; Frías, Marcelo F.; Bagheri, Hamid
    "Automatic Program Repair (APR) is a practical research topic that studies techniques to automatically repair programs to fix bugs. Most existing APR techniques are designed for imperative programming languages, such as C and Java, and rely on analyzing correct and incorrect executions of programs to identify and repair suspicious statements. We introduce a new APR approach for software specifications written in the Alloy declarative language, where specifications are not “executed”, but rather converted into logical formulas and analyzed using backend constraint solvers, to find specification instances and counterexamples to assertions. We present ATR, a technique that takes as input an Alloy specification with some violated assertion and returns a repaired specification that satisfies the assertion. The key ideas are (i) analyzing the differences between counterexamples that do not satisfy the assertion and instances that do satisfy the assertion to guide the repair and (ii) generating repair candidates from specific templates and pruning the space of repair candidates using the counterexamples and satisfying instances. Experimental results using existing large Alloy benchmarks show that ATR is effective in generating difficult repairs. ATR repairs 66.3% of 1974 faulty specifications, including specification repairs that cannot be handled by existing Alloy repair techniques."
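ATR itself manipulates Alloy formulas with constraint solvers; as a toy illustration of its key idea only (template-generated repair candidates pruned by counterexamples and satisfying instances), here is a sketch over plain boolean predicates. The templates, instances, and names are all hypothetical stand-ins, not ATR's actual machinery.

```python
# Candidate repair templates over two boolean atoms a, b.
TEMPLATES = [
    ("a and b", lambda a, b: a and b),
    ("a or b",  lambda a, b: a or b),
    ("a",       lambda a, b: a),
    ("not a",   lambda a, b: not a),
]

def repair(satisfying, counterexamples):
    """Keep templates accepted by every satisfying instance and
    rejected by every counterexample -- the pruning step."""
    keep = []
    for name, f in TEMPLATES:
        if all(f(*inst) for inst in satisfying) and \
           not any(f(*cex) for cex in counterexamples):
            keep.append(name)
    return keep

# instances where the assertion should hold vs. counterexamples
cands = repair(satisfying=[(True, True)],
               counterexamples=[(True, False), (False, True)])
```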
  • Journal article
    Statistical properties of the entropy from ordinal patterns
    (2022) Chagas, Eduarda T. C.; Frery, Alejandro C.; Gambini, Juliana; Lucini, María M.; Ramos, Heitor S.; Rey, Andrea
    "The ultimate purpose of the statistical analysis of ordinal patterns is to characterize the distribution of the features they induce. In particular, knowing the joint distribution of the pair entropy-statistical complexity for a large class of time series models would allow statistical tests that are unavailable to date. Working in this direction, we characterize the asymptotic distribution of the empirical Shannon’s entropy for any model under which the true normalized entropy is neither zero nor one. We obtain the asymptotic distribution from the central limit theorem (assuming large time series), the multivariate delta method, and a third-order correction of its mean value. We discuss the applicability of other results (exact, first-, and second-order corrections) regarding their accuracy and numerical stability. Within a general framework for building test statistics about Shannon’s entropy, we present a bilateral test that verifies if there is enough evidence to reject the hypothesis that two signals produce ordinal patterns with the same Shannon’s entropy. We applied this bilateral test to the daily maximum temperature time series from three cities (Dublin, Edinburgh, and Miami) and obtained sensible results."
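The empirical quantity the paper studies can be computed directly. A minimal sketch of the normalized Shannon entropy of ordinal patterns (embedding dimension d = 3, natural logarithm; parameter choices are illustrative):

```python
from math import factorial, log

def ordinal_patterns(series, d=3):
    """Map each length-d window to the permutation that sorts it."""
    return [tuple(sorted(range(d), key=lambda i: w[i]))
            for w in zip(*(series[i:] for i in range(d)))]

def normalized_entropy(series, d=3):
    """Empirical normalized Shannon entropy of ordinal patterns, in [0, 1]."""
    pats = ordinal_patterns(series, d)
    n = len(pats)
    probs = [pats.count(p) / n for p in set(pats)]
    h = -sum(p * log(p) for p in probs)
    return h / log(factorial(d))  # normalize by log(d!)
```

A monotone series yields a single pattern and hence entropy 0; a series visiting many patterns approaches 1.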
  • Journal article
    Analyzing public transport in the city of Buenos Aires with MobilityDB
    (2022) Godfrid, Juan; Radnic, Pablo; Vaisman, Alejandro Ariel; Zimányi, Esteban
    "The General Transit Feed Specification (GTFS) is a data format widely used to share data about public transportation schedules and associated geographic information. GTFS comes in two versions: GTFS Static, describing the planned itineraries, and GTFS Realtime, describing the actual ones. MobilityDB is a novel, free and open-source moving object database, developed as a PostgreSQL and PostGIS extension, that adds spatial and temporal data types along with a large number of functions that facilitate the analysis of mobility data. Loading GTFS data into MobilityDB is a quite complex task that, nevertheless, must be done in an ad-hoc fashion. This work describes how MobilityDB is used to analyze public transport mobility in the city of Buenos Aires, using both static and real-time GTFS data for the Buenos Aires public transportation system. Visualizations are also produced to enhance the analysis. To the authors' knowledge, this is the first attempt to analyze GTFS data with a moving object database."
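Loading GTFS starts from flat CSV files such as stop_times.txt; a minimal sketch of that first parsing step, before anything reaches MobilityDB (the two-row feed below is fabricated for illustration):

```python
import csv
import io

# Hypothetical two-row extract of a GTFS stop_times.txt file.
STOP_TIMES = """trip_id,arrival_time,departure_time,stop_id,stop_sequence
T1,08:00:00,08:00:30,S1,1
T1,08:07:00,08:07:30,S2,2
"""

def trips(stop_times_csv):
    """Group stop events per trip, ordered by stop_sequence."""
    rows = csv.DictReader(io.StringIO(stop_times_csv))
    out = {}
    for r in rows:
        out.setdefault(r["trip_id"], []).append(
            (int(r["stop_sequence"]), r["stop_id"], r["arrival_time"]))
    for events in out.values():
        events.sort()
    return out

schedule = trips(STOP_TIMES)
```

In the pipeline the paper describes, rows like these would then be converted into MobilityDB temporal points rather than kept as Python tuples.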
  • Journal article
    Analyzing the quality of Twitter data streams
    (2022) Arolfo, Franco; Cortés Rodriguez, Kevin; Vaisman, Alejandro Ariel
    "There is a general belief that the quality of Twitter data streams is generally low and unpredictable, making it somewhat unreliable to base decisions on such data. The work presented here addresses this problem from a Data Quality (DQ) perspective, adapting the traditional methods used in relational databases, based on quality dimensions and metrics, to capture the characteristics of Twitter data streams in particular, and of Big Data in a more general sense. Therefore, as a first contribution, this paper re-defines the classic DQ dimensions and metrics for the scenario under study. Second, the paper introduces a software tool that allows capturing Twitter data streams in real time, computing their DQ and displaying the results through a wide variety of graphics. As a third contribution of this paper, using the aforementioned machinery, a thorough analysis of the DQ of Twitter streams is performed, based on four dimensions: Readability, Completeness, Usefulness, and Trustworthiness. These dimensions are studied for several different cases, namely unfiltered data streams, data streams filtered using a collection of keywords, and classifying tweets referring to different topics, studying the DQ for each topic. Further, although it is well known that the number of geolocalized tweets is very low, the paper studies the DQ of tweets with respect to the place from where they are posted. Last but not least, the tool allows changing the weights of each quality dimension considered in the computation of the overall data quality of a tweet. This allows defining weights that fit different analysis contexts and/or different user profiles. Interestingly, this study reveals that the quality of Twitter streams is higher than what would have been expected."
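The re-weighting of quality dimensions described above can be pictured as a weighted average of per-dimension scores; a minimal sketch, with dimension scores invented for illustration:

```python
def overall_quality(dimensions, weights):
    """Weighted average of per-dimension DQ scores in [0, 1].
    Re-weighting mimics the tool's user-adjustable weights."""
    total = sum(weights.values())
    return sum(dimensions[d] * w for d, w in weights.items()) / total

# hypothetical scores for one tweet
tweet_dq = {"readability": 0.9, "completeness": 0.5,
            "usefulness": 0.7, "trustworthiness": 0.6}

equal = overall_quality(tweet_dq, {d: 1 for d in tweet_dq})
trust_heavy = overall_quality(
    tweet_dq, {"readability": 1, "completeness": 1,
               "usefulness": 1, "trustworthiness": 3})
```

Emphasizing trustworthiness lowers this tweet's overall score, illustrating how different weight profiles fit different analysis contexts.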
  • Journal article
    A data model and query language for spatio-temporal decision support
    (2010) Gómez, Leticia Irene; Kuijpers, Bart; Vaisman, Alejandro Ariel
    "In recent years, applications aimed at exploring and analyzing spatial data have emerged, powered by the increasing need of software that integrates Geographic Information Systems (GIS) and On-Line Analytical Processing (OLAP). These applications have been called SOLAP (Spatial OLAP). In previous work, the authors have introduced Piet, a system based on a formal data model that integrates in a single framework GIS, OLAP (On-Line Analytical Processing), and Moving Object data. Real-world problems are inherently spatio-temporal. Thus, in this paper we present a data model that extends Piet, allowing one to track the history of spatial data in the GIS layers. We present a formal study of the two typical ways of introducing time into Piet: timestamping the thematic layers in the GIS, and timestamping the spatial objects in each layer. We denote these strategies snapshot-based and timestamp-based representations, respectively, following well-known terminology borrowed from temporal databases. We present and discuss the formal model for both alternatives. Based on the timestamp-based representation, we introduce a formal First-Order spatio-temporal query language, which we denote Lt, able to express spatio-temporal queries over GIS, OLAP, and trajectory data. Finally, we discuss implementation issues, the update operators that must be supported by the model, and sketch a temporal extension to Piet-QL, the SQL-like query language that supports Piet."
  • Journal article
    Texture descriptors for robust SAR image segmentation
    (2022-12-28) Rey, Andrea; Gambini, Juliana; Delrieux, Claudio
    "SAR (synthetic aperture radar) and PolSAR (polarimetric synthetic aperture radar) images fulfill a fundamental role in Earth observation, due to their advantages over optical images. However, the presence of speckle noise hinders their automatic interpretation and unsupervised use, rendering traditional segmentation tools ineffective. For this reason, advanced image segmentation models are sought to overcome the limitations that make an adequate treatment of speckled images difficult. We propose a procedure for SAR and PolSAR image classification, based on texture descriptors, that combines fractal dimension, a specific probability distribution function, Tsallis entropy, and the entropic index. A vector of local texture features is built using a set of reference regions, then a support vector machine classifier is applied. The proposed algorithm is tested with synthetic and actual monopolarimetric and polarimetric SAR imagery, exhibiting visually remarkable and robust results consistent with quantitative quality metrics such as accuracy and F1-score."
  • Journal article
    Object detection and statistical analysis of microscopy image sequences
    (2022) Hurovitz, Sasha Ivan; Chan, Debora; Ramele, Rodrigo; Gambini, Juliana
    "Confocal microscope images are widely used in medical diagnosis and research. The automatic interpretation of this type of image is very important, but it is a challenging endeavor in the image processing area, since these images are heavily contaminated with noise and have low contrast and low resolution. This work deals with the problem of analyzing the penetration velocity of a chemotherapy drug in an ocular tumor called retinoblastoma. The primary retinoblastoma cell cultures are exposed to the topotecan drug and the penetration evolution is documented by producing sequences of microscopy images. It is possible to quantify the penetration rate of the topotecan drug because it produces fluorescence emission by laser excitation, which is captured by the camera. In order to estimate the topotecan penetration time in the whole retinoblastoma cell culture, a procedure based on an active contour detection algorithm, a neural network classifier, and a statistical model and its validation is proposed. This new inference model allows estimating the penetration time. Results show that the penetration mean time strongly depends on tumorsphere size and on the chemotherapeutic treatment that the patient has previously received."
  • Journal article
    Revisiting soliton dynamics in fiber optics under strict photon-number conservation
    (2021) Linale, N.; Fierens, Pablo Ignacio; Grosz, Diego
    "We revisit the complex interplay between the Raman-induced frequency shift (RIFS) and the effect of self-steepening (SS) in the propagation of solitons, and in the framework of an equation that ensures strict conservation of the number of photons. The generalized nonlinear Schrödinger equation (GNLSE) is shown to severely fail in preserving the number of photons for sub-100-fs solitons, leading to a large overestimation of the frequency shift. Furthermore, when considering the case of a frequency-dependent nonlinear coefficient, the GNLSE also fails to provide a good estimation of the time shift experienced by the soliton. We tackle these shortcomings of the GNLSE by resorting to the recently introduced photon-conserving GNLSE (pcGNLSE) and study the interplay between the RIFS and self-steepening. As a result, we make apparent the impact of higher-order nonlinearities on short-soliton propagation and propose an original and direct method for the estimation of the second-order nonlinear coefficient."
  • Journal article
    Joint position and clock tracking of wireless node
    (2021) Grisales Campeón, Juan Pablo; Fierens, Pablo Ignacio
    "In this paper we consider the problem of joint position and clock tracking of a mobile wireless node by a set of reference nodes. Imperfections of the mobile clock are characterized by its skew and offset, which are assumed to change with time according to simple random walk models. We put forth a measurement protocol, similar to that used in two-way ranging, and apply extended and unscented Kalman filters to estimate the position and the velocity of the mobile, and the skew and offset of its clock. We analyze the performance of the algorithms by means of extensive simulations, where the mobile’s velocity is assumed to follow a random walk. Simulation results are compared to the Cramér-Rao bound for a simplified model of a mobile with constant velocity. We show that estimation errors are largely independent of the mean values of the offset and the skew, but they increase with the mean speed. We also study how estimation errors are influenced by other factors such as the number of reference nodes. We believe these results to be of relevance, especially in indoor positioning applications."
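The paper's extended/unscented Kalman filters track a multi-dimensional state; as a minimal sketch of the same machinery, here is a scalar Kalman filter tracking a single random-walk quantity such as a clock offset. The noise variances and measurements are illustrative, not from the paper.

```python
def kalman_1d(measurements, q, r, x0=0.0, p0=1.0):
    """Scalar Kalman filter for a random-walk state:
    x_k = x_{k-1} + process noise (variance q),
    z_k = x_k + measurement noise (variance r)."""
    x, p = x0, p0
    estimates = []
    for z in measurements:
        p += q                # predict: random walk inflates uncertainty
        k = p / (p + r)       # Kalman gain
        x += k * (z - x)      # correct with the measurement residual
        p *= (1 - k)
        estimates.append(x)
    return estimates

# noisy observations of an offset near 1.0
est = kalman_1d([1.0, 1.2, 0.9, 1.1, 1.0], q=0.01, r=0.1)
```

The extended and unscented variants used in the paper generalize exactly this predict/correct loop to nonlinear, multi-dimensional measurement models.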
  • Journal article
    Towards the Internet of water: Using graph databases for hydrological analysis on the Flemish river system
    (2021-07) Bollen, Erik; Hendrix, Rik; Kuijpers, Bart; Vaisman, Alejandro Ariel
    "The “Internet of Water” project will deploy 2,500 sensors along the Flemish river system, in Belgium. These sensors will be part of a monitoring system. This will produce an enormous amount of data, on which prediction and analysis tasks can be performed. To represent, store, and query river data, relational databases are normally used. However, this choice introduces an “impedance mismatch” between the conceptual representation (typically a graph) and the storage model (relational tables). To solve this problem, this article proposes to use graph databases. The Flemish river system is presented as a use case and the Neo4j graph database and its high-level query language, Cypher, are used for storing and querying the data, respectively. A relational alternative is implemented over the PostgreSQL database. A collection of representative queries of interest for hydrologists is defined over both database implementations."
  • Journal article
    Schema evolution in multiversion data warehouses
    (2021) Ahmed, Waqas; Vaisman, Alejandro Ariel; Zimányi, Esteban; Wrembel, Robert
    "Data warehouses (DWs) evolve in both their content and schema due to changes of user requirements, business processes, or external sources to name a few. Although multiple approaches using temporal and/or multiversion DWs have been proposed to handle these changes, an efficient solution for this problem is still lacking. The authors' approach is to separate concerns and use temporal DWs to deal with content changes, and multiversion DWs to deal with schema changes. To address the former, previously, they have proposed a temporal multidimensional (MD) model. In this paper, they propose a multiversion MD model for schema evolution to tackle the latter problem. The two models complement each other and allow managing both content and schema evolution. In this paper, the semantics of schema modification operators (SMOs) to derive various schema versions are given. It is also shown how online analytical processing (OLAP) operations like roll-up work on the model. Finally, the mapping from the multiversion MD model to a relational schema is given along with OLAP operations in standard SQL."
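A schema modification operator can be pictured as a function from one schema version to the next, with every version retained; a toy sketch in which the ADD/DROP vocabulary is a simplified stand-in for the paper's SMOs:

```python
def apply_smo(schema, smo):
    """Apply a schema modification operator, returning a *new* schema
    version and leaving the input version untouched."""
    op, table, attr = smo
    new = {t: set(attrs) for t, attrs in schema.items()}
    if op == "ADD":
        new[table].add(attr)
    elif op == "DROP":
        new[table].discard(attr)
    return new

v1 = {"Sales": {"date", "store", "amount"}}
v2 = apply_smo(v1, ("ADD", "Sales", "promo"))
v3 = apply_smo(v2, ("DROP", "Sales", "store"))
versions = [v1, v2, v3]   # all versions coexist, as in a multiversion DW
```

Keeping every version, rather than mutating one schema in place, is what lets queries be answered against the schema that was valid when the data was loaded.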
  • Journal article
    A simple linearization of the self-shrinking generator by means of cellular automata
    (2010) Fúster-Sabater, Amparo; Pazo-Robles, María Eugenia; Caballero-Gil, Pino
    "In this work, it is shown that the output sequence of a well-known cryptographic generator, the so-called self-shrinking generator, can be obtained from a simple linear model based on cellular automata. In fact, such a cellular model is a linear version of a nonlinear keystream generator currently used in stream ciphers. The linearization procedure is immediate and is based on the concatenation of a basic structure. The obtained cellular automata can be easily implemented with FPGA logic. Linearity and symmetry properties in such automata can be advantageously exploited for the analysis and/or cryptanalysis of this particular type of sequence generator."
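The self-shrinking rule itself is simple to state: the generator reads its LFSR output in pairs and emits the second bit of a pair only when the first bit is 1. A minimal sketch over a small Fibonacci LFSR (the 4-bit register and tap positions are illustrative; real designs use much longer registers):

```python
def lfsr(seed, taps, n):
    """Fibonacci LFSR: emit n bits; the feedback bit is the XOR of the
    0-based tap positions of the current state."""
    state = list(seed)
    out = []
    for _ in range(n):
        out.append(state[0])
        fb = 0
        for t in taps:
            fb ^= state[t]
        state = state[1:] + [fb]
    return out

def self_shrink(bits):
    """Self-shrinking rule: for each pair (b0, b1), output b1 iff b0 == 1."""
    return [b1 for b0, b1 in zip(bits[::2], bits[1::2]) if b0 == 1]

keystream = self_shrink(lfsr([1, 0, 0, 1], taps=[0, 3], n=16))
```

The cellular-automata linearization studied in the paper reproduces this same output sequence with linear rules, which is what makes the generator amenable to cryptanalysis.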
  • Journal article
    Online guidance updates using neural networks
    (2010) Filici, Cristian; Sánchez-Peña, Ricardo
    "The aim of this article is to present a method for the online guidance update for a launcher ascent trajectory that is based on the utilization of a neural network approximator. Generation of training patterns and selection of the input and output spaces of the neural network are presented, and implementation issues are discussed. The method is illustrated by a 2-dimensional launcher simulation."
  • Journal article
    A model and query language for temporal graph databases
    (2021-09) Debrouvier, Ariel; Parodi, Eliseo; Perazzo, Matías; Soliani, Valeria; Vaisman, Alejandro Ariel
    "Graph databases are becoming increasingly popular for modeling different kinds of networks for data analysis. They are built over the property graph data model, where nodes and edges are annotated with property-value pairs. Most existing work in the field is based on graphs where the temporal dimension is not considered. However, time is present in most real-world problems. Many different kinds of changes may occur in a graph as the world it represents evolves across time. For instance, edges, nodes, and properties can be added and/or deleted, and property values can be updated. This paper addresses the problem of modeling, storing, and querying temporal property graphs, allowing one to keep the history of a graph database. This paper introduces a temporal graph data model, where nodes and relationships contain attributes (properties) timestamped with a validity interval. Graphs in this model can be heterogeneous, that is, relationships may be of different kinds. Associated with the model, a high-level graph query language, denoted T-GQL, is presented, together with a collection of algorithms for computing different kinds of temporal paths in a graph, capturing different temporal path semantics. T-GQL can express queries like “Give me the friends of the friends of Mary, who lived in Brussels at the same time as her, and also give me the periods when this happened”. As a proof-of-concept, a Neo4j-based implementation of the above is also presented, and a client-side interface allows submitting queries in T-GQL to a Neo4j server. Finally, experiments were carried out over synthetic and real-world data sets, with a twofold goal: on the one hand, to show the plausibility of the approach; on the other hand, to analyze the factors that affect performance, like the length of the paths mentioned in the query, and the size of the graph."
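Queries like the Brussels example hinge on intersecting validity intervals attached to relationships; a toy sketch over a three-person temporal graph (the data, names, and closed-interval convention are invented for illustration):

```python
def overlap(i1, i2):
    """Intersection of two [start, end] validity intervals, or None."""
    start, end = max(i1[0], i2[0]), min(i1[1], i2[1])
    return (start, end) if start <= end else None

# toy temporal property graph: person -> list of (city, (from, to))
lived_in = {
    "Mary":  [("Brussels", (2000, 2010))],
    "Peter": [("Brussels", (2005, 2015))],
    "John":  [("Antwerp",  (2000, 2020))],
}

def same_city_periods(p1, p2, graph):
    """Periods during which p1 and p2 lived in the same city at the same time."""
    out = []
    for c1, iv1 in graph[p1]:
        for c2, iv2 in graph[p2]:
            if c1 == c2 and (iv := overlap(iv1, iv2)):
                out.append((c1, iv))
    return out

res = same_city_periods("Mary", "Peter", lived_in)
```

T-GQL expresses this declaratively and extends it to paths; the interval-intersection step above is the core temporal semantics.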
  • Journal article
    Time-series-based queries on stable transportation networks equipped with sensors
    (2021) Bollen, Erik; Hendrix, Rik; Kuijpers, Bart; Vaisman, Alejandro Ariel
    "In this paper, we propose a formalism to query transportation networks that are equipped with sensors that produce time-series data. The core of the proposed query mechanism is a logic-based language capable of returning time, value, and time-series outputs, as well as answering Boolean queries. We can also use the language for node selection and path selection. Furthermore, we propose an implementation of this language in a graph database system and evaluate its working on a fragment of the Flemish river system that is equipped with sensors that measure the water height at regular moments in time."
  • Journal article
    People counting using visible and infrared images
    (2020-10) Biagini, Martín; Filipic, Joaquín; Mas, Ignacio; Pose, Claudio D.; Giribet, Juan I.; Parisi, Daniel
    "We propose the use of convolutional neural networks (CNNs) for counting and positioning people in visible and infrared images. Our data set is made of semi-artificial images created from real photographs taken from a drone using a dual FLIR camera. We compare the performance between CNNs using 3 (RGB) and 4 (RGB+IR) channels, both under different lighting conditions. The 4-channel network responds better in all situations, particularly in cases of poor visible illumination that can be found in night scenarios. The proposed methodology could be applied to real situations when an extensive databank of 4-channel images becomes available."
  • Journal article
    Online analytical processing on graph data
    (2020) Gómez, Leticia Irene; Kuijpers, Bart; Vaisman, Alejandro Ariel
    "Online Analytical Processing (OLAP) comprises tools and algorithms that allow querying multidimensional databases. It is based on the multidimensional model, where data can be seen as a cube such that each cell contains one or more measures that can be aggregated along dimensions. In a “Big Data” scenario, traditional data warehousing and OLAP operations are clearly not sufficient to address current data analysis requirements, for example, social network analysis. Furthermore, OLAP operations and models can expand the possibilities of graph analysis beyond the traditional graph-based computation. Nevertheless, there is not much work on the problem of taking OLAP analysis to the graph data model. This paper proposes a formal multidimensional model for graph analysis, that considers the basic graph data, and also background information in the form of dimension hierarchies. The graphs in this model are node- and edge-labelled directed multihypergraphs, called graphoids, which can be defined at several different levels of granularity using the dimensions associated with them. Operations analogous to the ones used in typical OLAP over cubes are defined over graphoids. The paper presents a formal definition of the graphoid model for OLAP, proves that the typical OLAP operations on cubes can be expressed over the graphoid model, and shows that the classic data cube model is a particular case of the graphoid data model. Finally, a case study supports the claim that, for many kinds of OLAP-like analysis on graphs, the graphoid model works better than the typical relational OLAP alternative, and for the classic OLAP queries, it remains competitive."
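A roll-up over a graph can be pictured as re-mapping node labels through a dimension hierarchy and aggregating the measures of the edges that become parallel; a toy sketch using simple directed graphs rather than the paper's multihypergraphs (the city-to-country data is invented):

```python
def roll_up(edges, hierarchy):
    """OLAP-style roll-up on a graph: map each node to its parent in a
    dimension hierarchy (e.g. city -> country) and sum the measures of
    edges that collapse onto the same (source, target) pair."""
    agg = {}
    for (src, dst), measure in edges.items():
        key = (hierarchy[src], hierarchy[dst])
        agg[key] = agg.get(key, 0) + measure
    return agg

# call volume between cities, rolled up to the country level
calls = {("Brussels", "Antwerp"): 10, ("Brussels", "Paris"): 4,
         ("Ghent", "Paris"): 6}
city_to_country = {"Brussels": "BE", "Antwerp": "BE",
                   "Ghent": "BE", "Paris": "FR"}
by_country = roll_up(calls, city_to_country)
```

This is the graph analogue of rolling a cube up a dimension hierarchy: the graphoid at country granularity summarizes the one at city granularity.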
  • Journal article
    Low-cost robust estimation for the single-look 𝒢I0 model using the Pareto distribution
    (2020) Chan, Debora; Rey, Andrea; Gambini, Juliana; Frery, Alejandro C.
    "The statistical properties of Synthetic Aperture Radar (SAR) image texture reveal useful target characteristics. It is well-known that these images are affected by speckle and prone to extreme values due to double bounce and corner reflectors. The 𝒢I0 distribution is flexible enough to model different degrees of texture in speckled data. It is indexed by three parameters: α, related to the texture, γ, a scale parameter, and L, the number of looks. Quality estimation of α is essential due to its immediate interpretability. In this letter, we exploit the connection between the 𝒢I0 and Pareto distributions. With this, we obtain six estimators that have not been previously used in the SAR literature. We compare their behavior with others in the noisiest case for monopolarized intensity data, namely the single-look case. We evaluate them using Monte Carlo methods for noncontaminated and contaminated data, considering convergence rate, bias, mean squared error, and computational time. We conclude that two of these estimators based on the Pareto law are the safest choices when dealing with actual data and small samples, as is the case of despeckling techniques and segmentation, to name just two applications. We verify the results with an actual SAR image."
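The 𝒢I0-Pareto connection makes classical Pareto shape estimators applicable to SAR texture. As a sketch, here is the textbook maximum-likelihood estimator of the Pareto shape parameter, recovered from synthetic inverse-CDF samples; it is not necessarily one of the six estimators the letter studies.

```python
import random
from math import log

def pareto_mle_alpha(sample, xm):
    """Maximum-likelihood estimate of the Pareto shape alpha with known
    scale xm: alpha_hat = n / sum(log(x_i / xm))."""
    return len(sample) / sum(log(x / xm) for x in sample)

def pareto_sample(alpha, xm, n, rng):
    """Inverse-CDF sampling: x = xm * u**(-1/alpha) for u ~ Uniform(0, 1)."""
    return [xm * rng.random() ** (-1.0 / alpha) for _ in range(n)]

rng = random.Random(42)             # fixed seed for reproducibility
data = pareto_sample(alpha=3.0, xm=1.0, n=20_000, rng=rng)
alpha_hat = pareto_mle_alpha(data, xm=1.0)
```

With 20,000 samples the estimate lands close to the true shape 3.0; the letter's interest is precisely which estimators stay reliable when samples are small and contaminated.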
  • Journal article
    Analyzing the quality of Twitter data streams
    (2020) Arolfo, Franco A.; Cortes Rodriguez, Kevin; Vaisman, Alejandro Ariel
    "There is a general belief that the quality of Twitter data streams is generally low and unpredictable, making it somewhat unreliable to base decisions on such data. The work presented here addresses this problem from a Data Quality (DQ) perspective, adapting the traditional methods used in relational databases, based on quality dimensions and metrics, to capture the characteristics of Twitter data streams in particular, and of Big Data in a more general sense. Therefore, as a first contribution, this paper re-defines the classic DQ dimensions and metrics for the scenario under study. Second, the paper introduces a software tool that allows capturing Twitter data streams in real time, computing their DQ and displaying the results through a wide variety of graphics. As a third contribution of this paper, using the aforementioned machinery, a thorough analysis of the DQ of Twitter streams is performed, based on four dimensions: Readability, Completeness, Usefulness, and Trustworthiness. These dimensions are studied for several different cases, namely unfiltered data streams, data streams filtered using a collection of keywords, and classifying tweets referring to different topics, studying the DQ for each topic. Further, although it is well known that the number of geolocalized tweets is very low, the paper studies the DQ of tweets with respect to the place from where they are posted. Last but not least, the tool allows changing the weights of each quality dimension considered in the computation of the overall data quality of a tweet. This allows defining weights that fit different analysis contexts and/or different user profiles. Interestingly, this study reveals that the quality of Twitter streams is higher than what would have been expected."