    Analyzing public transport in the city of Buenos Aires with MobilityDB
    (2022) Godfrid, Juan; Radnic, Pablo; Vaisman, Alejandro Ariel; Zimányi, Esteban
    "The General Transit Feed Specification (GTFS) is a data format widely used to share data about public transportation schedules and associated geographic information. GTFS comes in two versions: GTFS Static, describing the planned itineraries, and GTFS Realtime, describing the actual ones. MobilityDB is a novel, free, and open-source moving object database, developed as a PostgreSQL and PostGIS extension, that adds spatial and temporal data types along with a large number of functions that facilitate the analysis of mobility data. Loading GTFS data into MobilityDB is a quite complex task that, nevertheless, must be done in an ad-hoc fashion. This work describes how MobilityDB is used to analyze public transport mobility in the city of Buenos Aires, using both static and real-time GTFS data for the Buenos Aires public transportation system. Visualizations are also produced to enhance the analysis. To the authors’ knowledge, this is the first attempt to analyze GTFS data with a moving object database."
    Analyzing the quality of Twitter data streams
    (2022) Arolfo, Franco; Cortés Rodriguez, Kevin; Vaisman, Alejandro Ariel
    "There is a general belief that the quality of Twitter data streams is low and unpredictable, making it, in some way, unreliable to base decisions on such data. The work presented here addresses this problem from a Data Quality (DQ) perspective, adapting the traditional methods used in relational databases, based on quality dimensions and metrics, to capture the characteristics of Twitter data streams in particular, and of Big Data in a more general sense. Therefore, as a first contribution, this paper re-defines the classic DQ dimensions and metrics for the scenario under study. Second, the paper introduces a software tool that allows capturing Twitter data streams in real time, computing their DQ, and displaying the results through a wide variety of graphics. As a third contribution of this paper, using the aforementioned machinery, a thorough analysis of the DQ of Twitter streams is performed, based on four dimensions: Readability, Completeness, Usefulness, and Trustworthiness. These dimensions are studied for several different cases, namely unfiltered data streams, data streams filtered using a collection of keywords, and tweets classified by topic, studying the DQ for each topic. Further, although it is well known that the number of geolocalized tweets is very low, the paper studies the DQ of tweets with respect to the place from where they are posted. Last but not least, the tool allows changing the weights of each quality dimension considered in the computation of the overall data quality of a tweet. This allows defining weights that fit different analysis contexts and/or different user profiles. Interestingly, this study reveals that the quality of Twitter streams is higher than what would have been expected."
    A data model and query language for spatio-temporal decision support
    (2010) Gómez, Leticia Irene; Kuijpers, Bart; Vaisman, Alejandro Ariel
    "In recent years, applications aimed at exploring and analyzing spatial data have emerged, powered by the increasing need of software that integrates Geographic Information Systems (GIS) and On-Line Analytical Processing (OLAP). These applications have been called SOLAP (Spatial OLAP). In previous work, the authors have introduced Piet, a system based on a formal data model that integrates in a single framework GIS, OLAP (On-Line Analytical Processing), and Moving Object data. Real-world problems are inherently spatio-temporal. Thus, in this paper we present a data model that extends Piet, allowing tracking the history of spatial data in the GIS layers. We present a formal study of the two typical ways of introducing time into Piet: timestamping the thematic layers in the GIS, and timestamping the spatial objects in each layer. We denote these strategies snapshot-based and timestamp-based representations, respectively, following well-known terminology borrowed from temporal databases. We present and discuss the formal model for both alternatives. Based on the timestamp-based representation, we introduce a formal First-Order spatio-temporal query language, which we denote Lt, able to express spatio-temporal queries over GIS, OLAP, and trajectory data. Finally, we discuss implementation issues, the update operators that must be supported by the model, and sketch a temporal extension to Piet-QL, the SQL-like query language that supports Piet."
    Texture descriptors for robust SAR image segmentation
    (2022-12-28) Rey, Andrea; Gambini, Juliana; Delrieux, Claudio
    "SAR (synthetic aperture radar) and PolSAR (polarimetric synthetic aperture radar) images fulfill a fundamental role in Earth observation, due to their advantages over optical images. However, the presence of speckle noise hinders their automatic interpretation and unsupervised use, rendering traditional segmentation tools ineffective. For this reason, advanced image segmentation models are sought to overcome the limitations that make an adequate treatment of speckled images difficult. We propose a procedure for SAR and PolSAR image classification, based on texture descriptors, that combines fractal dimension, a specific probability distribution function, Tsallis entropy, and the entropic index. A vector of local texture features is built using a set of reference regions, then a support vector machine classifier is applied. The proposed algorithm is tested with synthetic and actual monopolarimetric and polarimetric SAR imagery, exhibiting visually remarkable and robust results in coincidence with quantitative quality metrics such as accuracy and F1-score."
    Object detection and statistical analysis of microscopy image sequences
    (2022) Hurovitz, Sasha Ivan; Chan, Debora; Ramele, Rodrigo; Gambini, Juliana
    "Confocal microscope images are widely useful in medical diagnosis and research. The automatic interpretation of this type of image is very important, but it is a challenging endeavor in the image processing area, since these images are heavily contaminated with noise and have low contrast and low resolution. This work deals with the problem of analyzing the penetration velocity of a chemotherapy drug in an ocular tumor called retinoblastoma. The primary retinoblastoma cell cultures are exposed to the drug topotecan, and the penetration evolution is documented by producing sequences of microscopy images. It is possible to quantify the penetration rate of topotecan because it produces fluorescence emission by laser excitation, which is captured by the camera. In order to estimate the topotecan penetration time in the whole retinoblastoma cell culture, a procedure based on an active contour detection algorithm, a neural network classifier, and a statistical model and its validation is proposed. This new inference model allows estimating the penetration time. Results show that the mean penetration time strongly depends on tumorsphere size and on the chemotherapeutic treatment that the patient has previously received."