Ingeniería Informática
Examinar
Envíos recientes
- Artículo de Publicación PeriódicaUser-centered road network traffic analysis with MobilityDB(2022)"Performance indicators of road networks are a long-lasting topic of research. Existing schemes assess network properties such as the average speed on road segments and the queuing time at intersections. The increasing availability of user trajectories, collected mainly using mobile phones with a variety of applications, creates opportunities for developing user-centered performance indicators. Performing such an analysis on big trajectory data sets remains a challenge for the existing data management systems, because they lack support for spatiotemporal trajectory data. This article presents an end-to-end solution, based on MobilityDB, a novel moving object database system that extends PostgreSQL with spatiotemporal data types and functions. A new class of indicators is proposed, focused on the users' experience. The indicators address the network design, the traffic flow, and the driving comfort of the motorists. Furthermore, these indicators are expressed as analytical MobilityDB queries over a big set of real vehicle trajectories."
- Artículo de Publicación PeriódicaATR: Template-based repair for alloy specifications(2022)"Automatic Program Repair (APR) is a practical research topic that studies techniques to automatically repair programs to fix bugs. Most existing APR techniques are designed for imperative programming languages, such as C and Java, and rely on analyzing correct and incorrect executions of programs to identify and repair suspicious statements. We introduce a new APR approach for software specifications written in the Alloy declarative language, where specifications are not “executed”, but rather converted into logical formulas and analyzed using backend constraint solvers, to find specification instances and counterexamples to assertions. We present ATR, a technique that takes as input an Alloy specification with some violated assertion and returns a repaired specification that satisfies the assertion. The key ideas are (i) analyzing the differences between counterexamples that do not satisfy the assertion and instances that do satisfy the assertion to guide the repair and (ii) generating repair candidates from specific templates and pruning the space of repair candidates using the counterexamples and satisfying instances. Experimental results using existing large Alloy benchmarks show that ATR is effective in generating difficult repairs. ATR repairs 66.3% of 1974 fault specifications, including specification repairs that cannot be handled by existing Alloy repair techniques."
- Artículo de Publicación PeriódicaStatistical properties of the entropy from ordinal patterns(2022)"The ultimate purpose of the statistical analysis of ordinal patterns is to characterize the distribution of the features they induce. In particular, knowing the joint distribution of the pair entropy-statistical complexity for a large class of time series models would allow statistical tests that are unavailable to date. Working in this direction, we characterize the asymptotic distribution of the empirical Shannon’s entropy for any model under which the true normalized entropy is neither zero nor one. We obtain the asymptotic distribution from the central limit theorem (assuming large time series), the multivariate delta method, and a third-order correction of its mean value. We discuss the applicability of other results (exact, first-, and second-order corrections) regarding their accuracy and numerical stability. Within a general framework for building test statistics about Shannon’s entropy, we present a bilateral test that verifies if there is enough evidence to reject the hypothesis that two signals produce ordinal patterns with the same Shannon’s entropy. We applied this bilateral test to the daily maximum temperature time series from three cities (Dublin, Edinburgh, and Miami) and obtained sensible results."
- Artículo de Publicación PeriódicaAnalyzing public transport in the city of Buenos Aires with MobilityDB(2022)"The General Transit Feed Specification (GTFS) is a data format widely used to share data about public transportation schedules and associated geographic information. GTFS comes in two versions: GTFS Static describing the planned itineraries and GTFS Realtime describing the actual ones. MobilityDB is a novel and free open-source moving object database, developed as a PostgreSQL and PostGIS extension, that adds spatial and temporal data types along with a large number of functions, that facilitate the analysis of mobility data. Loading GTFS data into MobilityDB is a quite complex task that, nevertheless, must be done in an ad-hoc fashion. This work describes how MobilityDB is used to analyze public transport mobility in the city of Buenos Aires, using both, static and real-time GTFS data for the Buenos Aires public transportation system. Visualizations are also produced to enhance the analy-sis. To the authors’ knowledge, this is the first attempt to analyze GTFS data with a moving object database."
- Artículo de Publicación PeriódicaAnalyzing the quality of Twitter data streams(2022)"There is a general belief that the quality of Twitter data streams is generally low and unpredictable, making, in some way, unreliable to take decisions based on such data. The work presented here addresses this problem from a Data Quality (DQ) perspective, adapting the traditional methods used in relational databases, based on quality dimensions and metrics, to capture the characteristics of Twitter data streams in particular, and of Big Data in a more general sense. Therefore, as a first contribution, this paper re-defines the classic DQ dimensions and metrics for the scenario under study. Second, the paper introduces a software tool that allows capturing Twitter data streams in real time, computing their DQ and displaying the results through a wide variety of graphics. As a third contribution of this paper, using the aforementioned machinery, a thorough analysis of the DQ of Twitter streams is performed, based on four dimensions: Readability, Completeness, Usefulness, and Trustworthiness. These dimensions are studied for several different cases, namely unfiltered data streams, data streams filtered using a collection of keywords, and classifying tweets referring to different topics, studying the DQ for each topic. Further, although it is well known that the number of geolocalized tweets is very low, the paper studies the DQ of tweets with respect to the place from where they are posted. Last but not least, the tool allows changing the weights of each quality dimension considered in the computation of the overall data quality of a tweet. This allows defining weights that fit different analysis contexts and/or different user profiles. Interestingly, this study reveals that the quality of Twitter streams is higher than what would have been expected."