# MiDaS Millennium Nucleus Seminar

Scientific seminar of the Millennium Nucleus Center for the Discovery of Structures in Complex Data (MiDaS). More information at: midas.mat.uc.cl
2020-06-17
15:00hrs.
Miguel de Carvalho. University of Edinburgh
Elements of Bayesian geometry
Zoom (request the link from Luis Gutiérrez)
Abstract:
In this talk, I will discuss a geometric interpretation of Bayesian inference that yields a natural measure of the level of agreement between priors, likelihoods, and posteriors. The starting point for the construction of the proposed geometry is the observation that the marginal likelihood can be regarded as an inner product between the prior and the likelihood. A key concept in our geometry is that of compatibility, a measure which is based on the same construction principles as Pearson correlation, but which can be used to assess how much the prior agrees with the likelihood, to gauge the sensitivity of the posterior to the prior, and to quantify the coherency of the opinions of two experts. Estimators for all the quantities involved in our geometric setup are discussed; these can be computed directly from the posterior simulation output. Some examples are used to illustrate our methods, including data related to on-the-job drug usage, midge wing length, and prostate cancer. Joint work with G. L. Page and with B. J. Barney.
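
The inner-product view described in the abstract can be illustrated numerically. The following is only a minimal sketch, assuming the inner product is the usual $L^2$ one, $\langle f, g \rangle = \int f(\theta) g(\theta)\, d\theta$, and that compatibility is the cosine-type ratio $\langle f, g \rangle / (\lVert f \rVert \lVert g \rVert)$, by analogy with Pearson correlation. The Beta(2, 2) prior, the binomial data (7 successes in 10 trials), and all function names are hypothetical choices for illustration, and the grid integration stands in for the simulation-based estimators mentioned in the talk.

```python
import numpy as np

def integrate(y, x):
    """Trapezoidal rule on a grid."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def inner(f, g, x):
    """Assumed L2 inner product <f, g> = integral of f(theta) * g(theta)."""
    return integrate(f * g, x)

def compatibility(f, g, x):
    """Cosine-type agreement <f, g> / (||f|| ||g||); lies in (0, 1] for nonnegative f, g."""
    return inner(f, g, x) / np.sqrt(inner(f, f, x) * inner(g, g, x))

# Hypothetical example: Beta(2, 2) prior and a binomial likelihood (7 successes, 10 trials).
theta = np.linspace(0.0, 1.0, 2001)
prior = 6.0 * theta * (1.0 - theta)                    # Beta(2, 2) density
likelihood = 120.0 * theta**7 * (1.0 - theta)**3       # C(10, 7) = 120

marginal = inner(prior, likelihood, theta)             # marginal likelihood as <prior, likelihood>
posterior = prior * likelihood / marginal              # Bayes' rule on the grid

kappa_prior_lik = compatibility(prior, likelihood, theta)    # prior vs. likelihood
kappa_prior_post = compatibility(prior, posterior, theta)    # prior vs. posterior
```

Values of the ratio close to 1 indicate that the two functions concentrate on the same region of the parameter space; values close to 0 indicate near-orthogonality, that is, strong disagreement.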
2020-05-27
15:00hrs.
Nicolás Kuschinski. Pontificia Universidad Católica de Chile
Grid-Uniform Copulas and Rectangle Exchanges: Model and Bayesian Inference Method for a Rich Class of Copula Functions
Zoom (request the link from Luis Gutiérrez)
Abstract:
We introduce a new class of copulas which we call Grid-Uniform Copulas. We show the richness of this class of copulas by proving that for any copula $C$ and any $\epsilon>0$ there is a Grid-Uniform Copula that approximates it within Hellinger distance $\epsilon$. We then proceed to show how Grid-Uniform Copulas can be used to create semiparametric models for multivariate data, and show an elegant way to perform MCMC sampling for these models.
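
To make the idea concrete, here is a rough sketch under an assumption that goes beyond the abstract: that a grid-uniform copula in two dimensions has a density that is constant on each cell of a regular $n \times n$ grid, so that it is determined by a matrix of cell probabilities whose rows and columns each sum to $1/n$. The "rectangle exchange" below, which shifts mass between the four cells at the corners of a rectangle, is likewise only one reading of the name; the talk's precise definitions may differ. All function names are hypothetical.

```python
import numpy as np

def is_grid_uniform(P, tol=1e-12):
    """Check that cell probabilities P (n x n) are nonnegative with rows and columns summing to 1/n."""
    n = P.shape[0]
    return bool(P.min() >= -tol
                and np.allclose(P.sum(axis=0), 1.0 / n, atol=tol)
                and np.allclose(P.sum(axis=1), 1.0 / n, atol=tol))

def density(P, u, v):
    """Piecewise-constant density at (u, v): n^2 times the probability of the containing cell."""
    n = P.shape[0]
    i = min(int(u * n), n - 1)
    j = min(int(v * n), n - 1)
    return n * n * P[i, j]

def rectangle_exchange(P, i, k, j, l, delta):
    """Add mass delta to cells (i, j) and (k, l), remove it from (i, l) and (k, j).

    Every row and column gains and loses the same amount, so the margins stay uniform.
    """
    Q = P.copy()
    Q[i, j] += delta
    Q[k, l] += delta
    Q[i, l] -= delta
    Q[k, j] -= delta
    if Q.min() < 0:
        raise ValueError("delta too large: a cell probability would become negative")
    return Q

n = 4
independence = np.full((n, n), 1.0 / n**2)     # independence copula expressed on the grid
moved = rectangle_exchange(independence, 0, 3, 0, 3, 0.05)
```

Because the exchange preserves the row and column sums, a sampler built from such moves never leaves the class, which is one plausible reason moves of this kind are convenient inside MCMC.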
2020-05-20
15:00hrs.
Mauricio Castro. Pontificia Universidad Católica de Chile
Automated learning of t factor analysis models with complete and incomplete data
Zoom (request the link from Luis Gutiérrez)
Abstract:
The t factor analysis (tFA) model is a promising tool for robust reduction of high-dimensional data in the presence of heavy-tailed noise. When determining the number of factors of the tFA model, a two-stage procedure is commonly performed in which parameter estimation is carried out for a number of candidate models, and then the best model is chosen according to a penalized likelihood index such as the Bayesian information criterion. However, the computational cost of achieving optimal performance with such a procedure can be extremely high, particularly for very large data sets. In this paper, we develop a novel automated learning method in which parameter estimation and model selection are seamlessly integrated into a one-stage algorithm. This new scheme is called the automated tFA (AtFA) algorithm, and it also works when values are missing. In addition, we derive the Fisher information matrix to approximate the asymptotic covariance matrix associated with the ML estimators of tFA models. Experiments on real and simulated data sets reveal that the AtFA algorithm not only provides fitting results identical to those of traditional two-stage procedures but also runs much faster, especially when values are missing.
2020-05-13
15:00hrs.
Freddy Palma Mancilla. Universidad Nacional Autónoma de México
Intertwinings for Markov branching processes
Zoom (request the link from Luis Gutiérrez)
Abstract:
Using a stochastic filtering framework, we devise some intertwining relationships in the setting of Markov branching processes. One of our results turns out to be the basis of an exact simulation method for this kind of process. Also, the population dynamic scheme inherent in the model helps to study the behavior of prolific individuals by observing the total size of the population. Moreover, we study a population with two types of immigration, where only the total immigration is observed and our objective is to study each immigration separately. This result allows us to link continuous-time Markov chains with continuous-state branching (CB) processes.
2020-05-06
15:00hrs.
Luis Gutiérrez. Pontificia Universidad Católica de Chile
Bayesian nonparametric hypothesis testing procedures
Zoom (request the link from Luis Gutiérrez)
Abstract:
Scientific knowledge is firmly based on the use of statistical hypothesis testing procedures. A scientific hypothesis can be established by performing one or many statistical tests based on the evidence provided by the data. Given the importance of hypothesis testing in science, these procedures are an essential part of statistics. The literature on hypothesis testing is vast and covers a wide range of practical problems. However, most of the methods are based on restrictive parametric assumptions. In this talk, we will discuss Bayesian nonparametric approaches to constructing hypothesis tests in different contexts. Our proposal draws on the model selection literature to define Bayesian tests for multiple samples, paired samples, and longitudinal data analysis. Applications with real-life datasets and illustrations with simulated data will be discussed.
2020-04-29
15:00hrs.
Inés Varas. Pontificia Universidad Católica de Chile
Linking measurements: a Bayesian nonparametric approach
Zoom (request the link from Luis Gutiérrez)
Abstract:
Equating methods are a family of statistical models and methods used to adjust scores on different test forms so that the scores are comparable and can be used interchangeably. These methods rely on functions that transform scores between two or more versions of a test. Most of the proposed approaches for estimating these functions are based on continuous approximations of the score distributions, even though these are, most of the time, discrete. Considering scores as ordinal random variables, we propose a flexible dependent Bayesian nonparametric model for test equating. The new approach avoids continuity assumptions on the score distributions, in contrast to current equating methods. Additionally, it allows the use of covariates in the estimation of the score distribution functions, an approach not explored at all in the equating literature. Applications of the proposed model to real and simulated data under different sampling designs are discussed. Several methods are considered to evaluate the performance of our method and to compare it with current methods of equating. Compared with discrete versions of equated scores obtained from traditional equating methods, the results show that the proposed method performs better.
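
For readers unfamiliar with equating functions, the classical equipercentile transformation $\varphi(x) = F_Y^{-1}(F_X(x))$ is a standard example of a function that maps scores on form X to the scale of form Y. The sketch below is not the Bayesian nonparametric model of the talk; it only computes a discrete empirical version of $\varphi$ from two hypothetical score samples, using the generalized inverse of the empirical distribution function. The function name and data are illustrative assumptions.

```python
import numpy as np

def equipercentile(x, scores_x, scores_y):
    """Map a form-X score to the form-Y scale via phi(x) = F_Y^{-1}(F_X(x))."""
    scores_x = np.asarray(scores_x)
    scores_y = np.sort(np.asarray(scores_y))
    n, m = scores_x.size, scores_y.size
    count = int(np.sum(scores_x <= x))         # n * F_X(x), kept as an integer
    # Generalized inverse: smallest observed y with F_Y(y) >= F_X(x).
    idx = -(-count * m // n) - 1               # ceil(count * m / n) - 1 in integer arithmetic
    return float(scores_y[max(idx, 0)])

# Hypothetical samples of scores on two test forms.
form_x = [1, 2, 3, 4, 5]
form_y = [11, 12, 13, 14, 15]
equated = equipercentile(3, form_x, form_y)
```

Because both empirical distributions are step functions, the equated score always lands on an observed form-Y score, which is the discreteness issue that continuous approximations are meant to work around.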
2020-04-22
15:00hrs.
Diego Morales Navarrete. Pontificia Universidad Católica de Chile
On modeling and estimating geo-referenced count spatial data
Zoom (request the link from Luis Gutiérrez)
Abstract:

Modeling spatial data is a challenging task in statistics. In many applications, the observed data can be modeled using Gaussian, skew-Gaussian or even restricted random field models. However, in several fields, such as population genetics, epidemiology and aquaculture, the data of interest are often count data, and therefore the models mentioned above are not suitable for their analysis. Consequently, there is a need for spatial models that can properly describe data coming from counting processes. Commonly, three approaches are used to model this type of data: GLMMs with Gaussian random field (GRF) effects, hierarchical models, and copula models. Unfortunately, these approaches do not give an explicit characterization of the count random field, such as its q-dimensional distribution or correlation function. It is important to stress that GLMMs and hierarchical models induce a discontinuity in the sample paths. Therefore, samples located nearby are more dissimilar in value than when the correlation function is continuous at the origin. Moreover, there are cases in which the copula representation for discrete distributions is not unique, so it is unidentifiable. To deal with these issues, we propose a novel approach to model spatial count data in an efficient and accurate manner. Briefly, starting from independent copies of a “parent” Gaussian random field, a set of transformations is applied, and the result is a non-Gaussian random field. This approach is based on the characterization of count random fields that inherit the well-known geometric properties of Gaussian random fields.

2020-01-29
12:00 hrs.
José Quinlan. Pontificia Universidad Católica de Chile
On the Support of Yao-based Random Ordered Partitions for Change-Point Analysis
Room 1, Faculty of Mathematics
Abstract:

In Bayesian change-point analysis for univariate time series, prior distributions on the set of ordered partitions play a key role in change-point detection. In this context, mixtures of product partition models based on Yao's cohesion are very popular due to their tractability and simplicity. However, how flexible are these prior processes in describing different beliefs about the number and locations of change-points? In this talk, I will address this question in terms of the weak support of these priors.
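
As background, the following sketch enumerates a product partition prior on ordered partitions for a small series. It assumes the form of Yao's cohesion commonly given in the change-point literature, $c(S) = p(1-p)^{|S|-1}$ for every block except the last and $c(S) = (1-p)^{|S|-1}$ for the last block, with a fixed change probability $p$; this is stated from the general literature, not from the talk. The function names and the values $n = 5$, $p = 0.3$ are hypothetical.

```python
from itertools import combinations

def ordered_partitions(n):
    """Yield every partition of 1..n into contiguous blocks, as a tuple of block sizes."""
    for b in range(1, n + 1):
        for cuts in combinations(range(1, n), b - 1):
            bounds = (0,) + cuts + (n,)
            yield tuple(hi - lo for lo, hi in zip(bounds, bounds[1:]))

def yao_prior(block_sizes, p):
    """Product of the assumed Yao-type cohesions for one ordered partition."""
    prob = 1.0
    last = len(block_sizes) - 1
    for j, size in enumerate(block_sizes):
        prob *= (1.0 - p) ** (size - 1)        # no change inside the block
        if j < last:
            prob *= p                          # a change-point closes every block but the last
    return prob

n, p = 5, 0.3
prior = {rho: yao_prior(rho, p) for rho in ordered_partitions(n)}
total = sum(prior.values())                    # the cohesions already normalize the prior
```

With $p$ fixed, the prior probability of a partition depends only on its number of blocks; placing a distribution on $p$ gives a mixture of such models, the setting the abstract refers to.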

2020-01-22
12:00 hrs.
Miles Ott. Smith College
Respondent-Driven Sampling: Challenges and Opportunities
Room 1, Faculty of Mathematics
Abstract:
Respondent-driven sampling leverages social networks to sample hard-to-reach human populations, including people who inject drugs and sexual minority, sex worker, and migrant populations. As with other link-tracing sampling strategies, sampling involves recruiting a small convenience sample, whose members invite their contacts into the sample, who in turn invite their own contacts, until the desired sample size is reached. Typically, the sample is used to estimate prevalence, though multivariable analyses of data collected through respondent-driven sampling are becoming more common. Although respondent-driven sampling may allow for quickly attaining large and varied samples, its reliance on social network contacts, participant recruitment decisions, and self-report of ego-network size makes it subject to several concerns for statistical inference. After introducing respondent-driven sampling, I will discuss how these data are actually being collected and analyzed, and opportunities for statisticians to improve upon this widely adopted method.
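
To show why self-reported ego-network size matters for prevalence estimation, here is a small sketch of the inverse-degree weighted estimator widely used with respondent-driven samples (often called the Volz-Heckathorn or RDS-II estimator). It is offered as general background, not as the speaker's method, and the data and function name are hypothetical.

```python
def rds_ii_prevalence(outcomes, degrees):
    """Inverse-degree weighted prevalence: sum(y_i / d_i) / sum(1 / d_i)."""
    if len(outcomes) != len(degrees):
        raise ValueError("outcomes and degrees must have the same length")
    if any(d <= 0 for d in degrees):
        raise ValueError("reported degrees must be positive")
    weights = [1.0 / d for d in degrees]       # well-connected people are over-sampled
    return sum(w * y for w, y in zip(weights, outcomes)) / sum(weights)

# Hypothetical sample: the two positive cases report many more contacts.
outcomes = [1, 1, 0, 0]
degrees = [10, 10, 2, 2]
naive = sum(outcomes) / len(outcomes)          # unweighted sample proportion
weighted = rds_ii_prevalence(outcomes, degrees)
```

In this toy sample the unweighted proportion is one half, while the weighted estimate is one sixth, so errors in the reported degrees feed directly into the estimate.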
2020-01-15
12:00 hrs.
Nicolás Kuschinski. Pontificia Universidad Católica de Chile
FATSO: A family of operators for variable selection in linear models
Room 1, Faculty of Mathematics
Abstract:
In linear models, it is common to encounter situations where several of the regression coefficients are 0. In these situations, a common tool is a sparsity-promoting variable selection operator. The most common of these operators is the LASSO, which promotes estimates at 0. However, the LASSO and its derivatives offer little in terms of easily interpretable parameters for controlling the degree of selectivity. In this talk, a new family of selection operators will be proposed, which takes the geometry of the LASSO as its basis but has a different analytic form, and which gives an easily interpretable way to control the degree of selectivity. These operators correspond to proper prior densities and can therefore be used for Bayesian inference.
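
As a reminder of how the LASSO sets estimates exactly to 0, the sketch below applies the soft-thresholding operator, which is the LASSO solution in the special case of an orthonormal design: each least-squares coefficient $z$ is replaced by the minimizer of $\tfrac{1}{2}(z - b)^2 + \lambda |b|$. It illustrates only the LASSO baseline, not the FATSO operators, whose form is not given in the abstract. The coefficient values and $\lambda = 1$ are hypothetical.

```python
import numpy as np

def soft_threshold(z, lam):
    """Minimizer of 0.5 * (z - b)**2 + lam * |b|: shrink toward 0 and clip at 0."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

# Hypothetical least-squares coefficients under an orthonormal design.
ols = np.array([2.5, -0.3, 0.8, -1.7, 0.05])
lasso = soft_threshold(ols, 1.0)
selected = np.flatnonzero(lasso)               # indices of coefficients kept in the model
```

The single parameter $\lambda$ controls both how many coefficients are zeroed and how strongly the remaining ones are shrunk, which is one way to see why its effect on selectivity is hard to interpret directly.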