Seminario de Estadística


2017-12-06
12:00hrs.
Guillermina Eslava. Departamento de Matemáticas, Facultad de Ciencias, UNAM
Log-Linear Models, Logistic Regression and Bayesian Networks: Their Relationship and Application
Sala 1, Facultad de Matemáticas, Edificio Rolando Chuaqui, Campus San Joaquin, Pontificia Universidad Católica de Chile
Abstract:

Log-linear models are statistical models that are useful for explaining the interrelationship among a set of discrete random variables $X = (X_1, \ldots, X_p)$. The model is expressed through the logarithm of the joint distribution, $\log(P(X_1 = x_1,\ldots,X_p = x_p))$. If one of these variables, say $X_1 \in \{0,1\}$, is regarded as the response and the rest as explanatory variables, we may consider a logistic regression model for the log-odds $\log(P(X_1 = 1 \mid X_2, \ldots, X_p)/P(X_1 = 0 \mid X_2,\ldots, X_p))$.

These two models are closely related in the sense that each of them can express the conditional probability of $X_1$ given the remaining variables, $P(X_1 = 1 \mid X_2 = x_2,\ldots, X_p = x_p)$. In general, however, the conditional probability $P(X_1 = 1 \mid X_2 = x_2,\ldots, X_p = x_p)$ derived from the log-linear model and the one derived from the logistic regression are not equal.
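
As a schematic illustration with three binary variables (generic notation, not taken from the talk), a hierarchical log-linear model expands the joint log-probability into interaction terms, and collecting the terms involving $X_1$ yields the induced conditional log-odds; a logistic regression reproduces this conditional probability only when its linear predictor matches those terms:

$$
\log P(x_1,x_2,x_3) = u + u_1(x_1) + u_2(x_2) + u_3(x_3) + u_{12}(x_1,x_2) + u_{13}(x_1,x_3) + u_{23}(x_2,x_3) + u_{123}(x_1,x_2,x_3),
$$
$$
\log\frac{P(X_1 = 1\mid x_2, x_3)}{P(X_1 = 0\mid x_2, x_3)} = \big[u_1(1)-u_1(0)\big] + \big[u_{12}(1,x_2)-u_{12}(0,x_2)\big] + \big[u_{13}(1,x_3)-u_{13}(0,x_3)\big] + \big[u_{123}(1,x_2,x_3)-u_{123}(0,x_2,x_3)\big].
$$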

In this talk we give the condition under which this conditional probability is the same under a hierarchical log-linear model and under a logistic regression. An applied example is presented that illustrates the usefulness of the models and the relationship between them. Additionally, we present the same examples from the perspective of probabilistic directed graphical models, also known as Bayesian networks.

2017-12-04
12:00hrs.
Mogens Bladt. Institute for Applied Mathematics and Systems, UNAM, México
Fitting Phase-Type Scale Mixtures To Heavy-Tailed Data and Distributions
Sala 1, Facultad de Matemáticas, Edificio Rolando Chuaqui, Campus San Joaquin, Pontificia Universidad Católica de Chile
Abstract:

If X has a phase-type distribution and N is any positive discrete random variable, then we say that the distribution of X · N belongs to the class of NPH distributions. Such distributions preserve the tractability and generality of phase-type distributions (often allowing for explicit solutions to stochastic models and being dense in the class of distributions on the positive reals) but with a different tail behaviour, which is essentially dictated by the tail of N. We thereby gain a tool for specifying distributions with a “body” shaped by X and a tail defined by N. After reviewing the construction and basic properties of distributions from the NPH class, we will consider the problem of their estimation. To this end we will employ the EM algorithm, using a method similar to that for finite-dimensional phase-type distributions. We consider the fitting of an NPH distribution to observed data, (left-, right- and interval-)censored data, theoretical distributions and histograms, and we present a couple of examples.
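
As a brief reminder of the construction (a standard identity rather than anything specific to the talk): if $X$ is phase-type with initial distribution $\boldsymbol{\alpha}$ and sub-intensity matrix $\mathbf{T}$, so that $\bar F_X(x) = \boldsymbol{\alpha} e^{\mathbf{T}x}\mathbf{1}$, and $N$ is a positive discrete random variable independent of $X$ with $p_n = P(N = n)$, then the NPH survival function of $Y = X \cdot N$ is the scale mixture

$$
\bar F_Y(y) = P(X \cdot N > y) = \sum_{n \ge 1} p_n \, \boldsymbol{\alpha}\, e^{\mathbf{T} y / n}\, \mathbf{1}, \qquad y \ge 0,
$$

so the tail of $Y$ is governed by how slowly $p_n$ decays, while the body retains the phase-type shape inherited from $X$.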

2017-11-24
12:00hrs.
Juan F. Olivares. Universidad de Atacama
Inference and Bias Reduction in the Normal and Elliptical Structural Model
Sala 2, Facultad de Matemáticas, Edificio Rolando Chuaqui, Campus San Joaquin, Pontificia Universidad Católica de Chile
Abstract:

In this work we study elliptical structural models under the approach in which the errors are independent and have elliptical marginal distributions. We adopt an approach studied by Gleser (1992), for which we obtain an alternative and convenient representation of the usual form of the log-likelihood function. This representation involves a new parametrization that produces a more parsimonious model, transforming the classical measurement-error model into a regression model with the usual random design matrix, but heteroscedastic. Moreover, this parametrization has the advantage of establishing a useful connection between the usual multivariate regression models and multivariate measurement-error models. The identification problems present in normal structural models also arise in elliptical structural models; additional assumptions are therefore necessary to make the estimation problem feasible. These assumptions can be regarded as standard in the measurement-error literature. We also derive the score vector and the expected information matrix for the reparametrized model. Simple expressions are given for computing the elements of the information matrix, in which only a few univariate moments need to be computed numerically. An illustration with elliptical distributions is considered. Finally, since in the measurement-error context the maximum likelihood estimators are biased, we implement two algorithms to correct the bias of the estimates. In particular, we consider the bias-reduction method proposed by Firth (1993), which has the advantage of not depending directly on the existence of the maximum likelihood estimators.
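
For readers unfamiliar with the setting, a generic structural measurement-error model of the kind referred to above can be written as (the notation is ours and need not match the talk's)

$$
Y_i = \alpha + \boldsymbol{\beta}^\top \boldsymbol{\xi}_i + e_i, \qquad \mathbf{X}_i = \boldsymbol{\xi}_i + \mathbf{u}_i, \qquad i = 1, \dots, n,
$$

where the latent covariates $\boldsymbol{\xi}_i$ are random (the "structural" case) and, in the elliptical version, the errors $(e_i, \mathbf{u}_i)$ have elliptical marginal distributions.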

2017-11-10
12:00hrs.
Daniel Taylor Rodriguez. Portland State University
Intrinsic Bayesian Analysis for Occupancy Models
Sala 2, Facultad de Matemáticas, Edificio Rolando Chuaqui, Campus San Joaquin, Pontificia Universidad Católica de Chile
Abstract:
Occupancy models are used to determine the probability that a species is present at a site while controlling for the effect of imperfect detection. The data collected for this type of analysis include multiple predictors intended to characterize habitat suitability as well as how easily the species of interest can be detected. However, not all of the collected predictors turn out to be relevant, so it is necessary to identify the variables that carry explanatory value. The usual practice is to rely on the Akaike information criterion (AIC). Given the absence of alternatives adapted to this type of response, we propose the first non-informative Bayesian strategy. We construct priors on the parameter space based on the intrinsic-prior methodology, and we incorporate priors on the model space that control for multiple testing and respect the hierarchical ordering of the predictors when interactions and second- (or higher-) order terms are considered. The method adequately controls the inclusion of false positives without compromising its ability to identify relevant predictors. We validate the performance of the methodology through simulations and with real data.
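
For context, the single-season occupancy model underlying this kind of analysis (MacKenzie et al.) can be sketched as follows, with $z_i$ the latent presence indicator at site $i$ and $y_{ij}$ the detection record on visit $j$ (notation ours):

$$
z_i \sim \mathrm{Bernoulli}(\psi_i), \quad \mathrm{logit}(\psi_i) = \mathbf{x}_i^\top \boldsymbol{\beta}, \qquad y_{ij} \mid z_i \sim \mathrm{Bernoulli}(z_i\, p_{ij}), \quad \mathrm{logit}(p_{ij}) = \mathbf{w}_{ij}^\top \boldsymbol{\gamma},
$$

so that variable selection operates simultaneously on the occupancy covariates $\mathbf{x}_i$ and the detection covariates $\mathbf{w}_{ij}$.
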
2017-11-03
12:00hrs.
Brajendra Sutradhar. Carleton University, Ottawa, Canada and Memorial University, St. John's, Canada
Semi-Parametric Models for Longitudinal Count, Binary and Multinomial Data
Sala 2, Facultad de Matemáticas, Edificio Rolando Chuaqui, Campus San Joaquin, Pontificia Universidad Católica de Chile
Abstract:

In this talk, I will demonstrate the use of dynamic models to fit (1) longitudinal count responses, such as the repeated number of yearly physician visits by an individual; (2) longitudinal binary responses, such as the repeated asthma status (yes or no) of an individual over several months; and (3) longitudinal multinomial responses, such as the repeated stress levels (low, medium, high) of an individual worker over a period of a few years. These dynamic models are developed to accommodate the correlations of the repeated responses and then to find out the regression effects of certain primary covariates on the repeated responses. In some situations, it may be necessary to add a non-parametric function in certain secondary covariates to the regression function. The extended models are referred to as longitudinal semi-parametric models. Estimation theory for the model parameters and the non-parametric functions will be discussed. The models and the inference methodologies will also be illustrated by numerical examples.
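
As a schematic example of such a model (a generic form, not necessarily the exact specification used in the talk), a semi-parametric dynamic logit model for repeated binary responses $y_{it}$ could be written as

$$
\Pr(y_{it} = 1 \mid y_{i,t-1}, \mathbf{x}_{it}, z_{it}) = \frac{\exp\{\mathbf{x}_{it}^\top \boldsymbol{\beta} + \rho\, y_{i,t-1} + \psi(z_{it})\}}{1 + \exp\{\mathbf{x}_{it}^\top \boldsymbol{\beta} + \rho\, y_{i,t-1} + \psi(z_{it})\}},
$$

where $\mathbf{x}_{it}$ are the primary covariates, $\rho$ captures the serial dependence through the lagged response, and $\psi(\cdot)$ is a non-parametric function of a secondary covariate $z_{it}$.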

2017-10-23
12:00hrs.
Claudia Wehrhahn. University of California, Santa Cruz
A Bayesian Nonparametric Approach for Human Mobility Modeling
Sala 5, Facultad de Matemáticas, Edificio Rolando Chuaqui, Campus San Joaquin, Pontificia Universidad Católica de Chile
Abstract:
Mobility models are widely used in disciplines as diverse as engineering, computer science, sociology, and ecology. For instance, human mobility models are a key tool in the design and evaluation of wireless network protocols. There is a vast literature on human mobility models, but most proposed models rely on relatively simple assumptions about human behavior. In this talk, we present a Bayesian nonparametric model for human mobility. First, we discuss a non-homogeneous time-dependent Poisson process in which the intensity function is modeled using multivariate Bernstein polynomials. Then we discuss how to incorporate human interactions into the model by means of repulsive Matérn point processes. The performance of the model is illustrated on simulated and real GPS traces collected from groups of individuals.
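
To fix ideas, a one-dimensional version of the intensity specification mentioned above (our sketch, with time rescaled to $[0,1]$) expands the intensity in a Bernstein polynomial basis:

$$
\lambda(t) = M \sum_{k=1}^{K} w_k \,\mathrm{Be}(t;\, k,\, K - k + 1), \qquad w_k \ge 0, \quad \sum_{k=1}^{K} w_k = 1,
$$

where $\mathrm{Be}(\cdot\,; a, b)$ is the Beta density, $M > 0$ controls the expected number of points, and the weights $(w_1,\dots,w_K)$ can be given a flexible nonparametric prior; the multivariate case replaces the Beta density by products of Beta densities over the coordinates.
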
2017-10-06
12:00hrs.
Tamara Broderick. Massachusetts Institute of Technology
Fast Quantification of Uncertainty and Robustness With Variational Bayes
Sala 2, Facultad de Matemáticas, Edificio Rolando Chuaqui, Campus San Joaquin, Pontificia Universidad Católica de Chile
Abstract:
In Bayesian analysis, the posterior follows from the data and a choice of a prior and a likelihood. These choices may be somewhat subjective and reasonably vary over some range. Thus, we wish to measure the sensitivity of posterior estimates to variation in these choices. While the field of robust Bayes has been formed to address this problem, its tools are not commonly used in practice. We demonstrate that variational Bayes (VB) techniques are readily amenable to fast robustness analysis. Since VB casts posterior inference as an optimization problem, its methodology is built on the ability to calculate derivatives of posterior quantities with respect to model parameters. We use this insight to develop local prior robustness measures for mean-field variational Bayes (MFVB), a particularly popular form of VB due to its fast runtime on large data sets. MFVB has a well-known major failing, however: it can severely underestimate uncertainty and provides no information about covariance. We generalize linear response methods from statistical physics to deliver accurate uncertainty estimates for MFVB---both for individual variables and coherently across variables. We call our method linear response variational Bayes (LRVB).
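
As background on why derivatives of posterior quantities are the natural object here, recall the standard local-sensitivity identity (not specific to VB): if the prior is perturbed multiplicatively as $p_\varepsilon(\theta) \propto p_0(\theta) \exp\{\varepsilon\, \psi(\theta)\}$, then

$$
\frac{d}{d\varepsilon}\, \mathbb{E}_{p_\varepsilon(\theta \mid x)}\big[g(\theta)\big]\Big|_{\varepsilon = 0} = \mathrm{Cov}_{p_0(\theta \mid x)}\big(g(\theta),\, \psi(\theta)\big),
$$

so local prior robustness reduces to posterior covariances, which is precisely the quantity that linear response methods aim to recover accurately from an MFVB approximation.
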
2017-09-08
12:00hrs.
Daniela Castro. King Abdullah University of Science and Technology (Saudi Arabia)
Spatial Analysis of U.S. Precipitation Extremes: A Local Likelihood Approach for Estimating Complex Tail Dependence Structures in High Dimensions
Sala 2, Facultad de Matemáticas, Edificio Rolando Chuaqui, Campus San Joaquin, Pontificia Universidad Católica de Chile
Abstract:
In order to model the complex non-stationary dependence structure of precipitation extremes over the entire contiguous U.S., we propose a flexible local approach based on factor copula models. Specifically, by using Gaussian location mixture processes, we assume that there is a common, unobserved, random factor affecting the joint dependence of all measurements in small regional neighborhoods. Choosing this common factor to be exponentially distributed, one can show that the joint upper tail of the resulting copula is asymptotically equivalent to the (max-stable) Hüsler-Reiss copula or its Pareto process counterpart; therefore, the so-called exponential factor copula model captures tail dependence, but unlike the latter, has weakening dependence strength as events become more extreme, a feature commonly observed with precipitation data. In order to describe the stochastic behavior of extreme precipitation events over the U.S., we embed the exponential factor model in a more general non-stationary model, but we fit its locally stationary counterpart to high threshold exceedances under the assumption of local stationarity. This allows us to gain in flexibility, while making inference for such a large and complex dataset feasible. Adopting a local censored likelihood approach, inference is made on a fine spatial grid, and local model fitting is performed taking advantage of distributed computing resources and of the embarrassingly parallel nature of this estimation method. The local model is efficiently fitted at all grid points, and uncertainty is measured using a block bootstrap procedure. Simulation results show that our approach is able to adequately capture complex dependencies on a local scale, therefore providing valuable input for regional risk assessment. Additionally, our data application shows that the model is able to flexibly represent extreme rainfall characteristics on a continental scale. A comparison between past and current U.S. rainfall data suggests that extremal dependence might be stronger nowadays than during the first half of the twentieth century in some areas, which has important implications for regional flood risk assessment.
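
Schematically, the Gaussian location mixture underlying the exponential factor copula can be written (in our notation, which may differ from the talk's) as

$$
W(\mathbf{s}) = Z(\mathbf{s}) + V, \qquad V \sim \mathrm{Exp}(\lambda) \ \text{independent of the Gaussian process } Z,
$$

where the single random factor $V$ is common to all sites in a regional neighborhood and drives the joint upper-tail behaviour of the copula of $W$.
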
2017-07-07
12:00hrs.
Gabriel Muñoz. Pontificia Universidad Católica de Chile
Identification in Rasch-Type Models: Turning 1PL-AG into 2*1PL
Sala 2, Facultad de Matemáticas
Abstract:

Identifiability in Rasch-type models is of great importance since it establishes a bijection between the parameters that are identified by nature and the parameters of interest. While in some of these models it has been possible to identify the parameters of interest (1PL, 1PL-G, 2PL), in others it remains an open problem (1PL-AG, 3PL). The techniques usually employed to solve these problems relate the parameters of interest to one another and then impose restrictions on the parameter space. In this talk, we show a way of relating different models (1PL-AG with 1PL, and 3PL with 2PL) in order to obtain identification of the parameters of interest from results established for simpler models.
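
For reference, the item response models mentioned above are commonly parametrized as follows, with $\theta_i$ the ability of person $i$ and, for item $j$, difficulty $b_j$, discrimination $a_j$ and guessing $c_j$ (the parametrization used in the talk may differ):

$$
\text{1PL: } P_{ij} = F(\theta_i - b_j), \qquad \text{2PL: } P_{ij} = F\big(a_j(\theta_i - b_j)\big), \qquad \text{3PL: } P_{ij} = c_j + (1 - c_j) F\big(a_j(\theta_i - b_j)\big),
$$
$$
\text{1PL-G: } P_{ij} = c_j + (1 - c_j) F(\theta_i - b_j),
$$

where $P_{ij} = P(Y_{ij} = 1 \mid \theta_i)$ and $F$ is the logistic link; the 1PL-AG model, as we understand it, modifies this structure by letting the guessing component depend on the ability, which is part of what makes its identification harder.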

2017-06-30
12:00 hrs.
Giovanni Motta. Pontificia Universidad Católica de Chile
Cholesky Decomposition for Time-Varying Covariances and Auto-Covariances
Sala 2, Facultad de Matemáticas
Abstract:

In this work we introduce a positive-definite non-parametric estimator of the covariance matrix that permits different smoothing parameters. The estimator is based on the Cholesky decomposition of a pre-estimator of the covariance matrix.

Kernel-type smoothers of the cross-products are based on the same bandwidth for all the variances and co-variances. Though this approach guarantees positive semi-definiteness of the estimated covariance matrix, the use of one common bandwidth for all the entries might be restrictive, as variances and co-variances are in general characterized by different degrees of smoothness. On the other hand, a kernel-type estimator based on different smoothing parameters may deliver an estimated matrix which is not necessarily positive semi-definite. The estimator we propose in this paper is based on the Cholesky decomposition of a preliminary raw estimate of the local covariances. This new approach allows for different smoothing bandwidths while preserving positive definiteness. This approach is particularly appealing for locally stationary time series. In particular, we address the estimation problems for two types of time-varying covariance matrices: the contemporaneous covariance matrix of a multivariate non-stationary time series, and the auto-covariance matrix of a univariate non-stationary time series.
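
In symbols, the construction can be sketched as follows (our notation): let $\widehat{\Sigma}_{\mathrm{raw}}(t) = \widehat{L}(t)\widehat{L}(t)^\top$ be a preliminary positive-definite local estimate obtained with a common pilot bandwidth, with lower-triangular Cholesky factor $\widehat{L}(t)$. Each entry of the factor is then re-smoothed with its own bandwidth $h_{jk}$,

$$
\widetilde{L}_{jk}(t) = \sum_{s} w_s(t;\, h_{jk})\, \widehat{L}_{jk}(s), \qquad \widetilde{\Sigma}(t) = \widetilde{L}(t)\, \widetilde{L}(t)^\top,
$$

and the final estimate $\widetilde{\Sigma}(t)$ is positive semi-definite by construction, whatever bandwidths $h_{jk}$ are chosen.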

2017-06-23
12:00hrs.
Carlos Sing-Long. Ingeniería Matemática y Computacional, Escuela de Ingeniería
Computational Methods for Unbiased Risk Estimation
Sala 2, Facultad de Matemáticas
Abstract:

In many engineering applications, one seeks to estimate, recover or reconstruct an unknown object of interest from an incomplete set of linear measurements. Mathematically, the unknown object can be represented as the solution to an underdetermined system of linear equations. In recent years it has been shown that it is possible to recover the true object by exploiting a priori information about its structure, such as sparsity in compressed sensing or low-rank in matrix completion. However, in practice the measurements are corrupted by noise and exact recovery is not possible.

A popular approach to address this issue is to solve an unconstrained convex optimization problem to obtain an estimate that both explains the measurements and resembles the known structural characteristics of the true object. The objective function quantifies the trade-off between data fidelity and structural fidelity, which is usually controlled by a single regularization parameter. One possible criterion for selecting the value of this parameter is to minimize an unbiased estimate for the prediction error as a surrogate for the true prediction risk. Unfortunately, evaluating this estimate requires an expression for the weak divergence of the predicted observations. Therefore, it is necessary to characterize the regularity of the solution to the convex optimization problem with respect to the measurements.
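
For concreteness, in the Gaussian model $y = \mu + \varepsilon$ with $\varepsilon \sim \mathcal{N}(0, \sigma^2 I_n)$ and a weakly differentiable predictor $\hat{\mu}(y)$, Stein's unbiased risk estimate takes the form

$$
\mathrm{SURE}(\hat{\mu}) = -n\sigma^2 + \|y - \hat{\mu}(y)\|_2^2 + 2\sigma^2\, \mathrm{div}_y\, \hat{\mu}(y), \qquad \mathbb{E}\big[\mathrm{SURE}(\hat{\mu})\big] = \mathbb{E}\,\|\hat{\mu}(y) - \mu\|_2^2,
$$

which makes explicit why an expression for the weak divergence $\mathrm{div}_y\, \hat{\mu}(y) = \sum_i \partial \hat{\mu}_i / \partial y_i$ of the predicted observations is required.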

In this talk I will present a conceptual and practical framework to study the regularity of the solution to a popular class of such optimization problems. The approach consists of using an auxiliary optimization problem that characterizes the smoothness of the predicted observations. In particular, we can relate the analytic singularities of the predicted observations with the geometric singularities of the feasible set to this problem. I will then present a disciplined approach for obtaining closed-form expressions for the derivatives of the predicted measurements that are amenable to computation. Finally, I will explain how the expressions establish a connection between the geometry of the convex optimization problem and the unbiased estimate for the prediction risk.

2017-06-09
12:00 hrs.
Ricardo Bórquez. Departamento de Economía Agraria, Facultad de Agronomía e Ingeniería Forestal
Financial Markets: Controversial or Untestable Theories?
Sala 2, Facultad de Matemáticas
Abstract:

This paper demonstrates a measurability property of no-arbitrage prices, which precludes martingale law identification based on price information. Theoretical model hypotheses defined on martingale laws are therefore untestable using prices. Examples of such untestable models are found in the literature on rational bubbles, efficient markets and, more recently, arbitrage pricing.

2017-06-02
12:00hrs.
Mauricio Castro. Universidad de Concepción
Recent Advances in Censored Regression Models
Sala 2, Facultad de Matemáticas
Abstract:

Recently, statistical models where the dependent variable is censored have been studied in different fields, namely econometric analysis, clinical assays and biostatistical analysis, among others. In practice, it is quite common to assume Gaussianity for the random components of the model, mainly due to the computational flexibility it provides for parameter estimation.
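
As a canonical example of this setting (a standard left-censored, Tobit-type formulation; the notation is ours), suppose the response of interest $y_i^{*}$ is only observed when it exceeds a limit $c_i$, as with viral loads below a detection limit:

$$
y_i^{*} = \mathbf{x}_i^\top \boldsymbol{\beta} + \varepsilon_i, \qquad
y_i =
\begin{cases}
y_i^{*}, & y_i^{*} > c_i,\\
c_i, & y_i^{*} \le c_i \ \text{(censored)},
\end{cases}
$$

where Gaussianity of $\varepsilon_i$ is the computationally convenient default assumption.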

However, such an assumption may not be realistic. In fact, the likelihood-based and Bayesian inferences for censored models can be seriously affected by the presence of atypical observations, skewness and/or the misspecification of the distributions for random terms.

The objective of this talk is to provide a review about statistical models for censored data using non-Gaussian families of distributions to model human immunodeficiency virus (HIV) dynamics. 

2017-05-26
12:00 hrs.
Nedret Billor. Department of Mathematics and Statistics, Auburn University
Robust Inference in Functional Data Analysis
Sala 2, Facultad de Matemáticas
Abstract:

In the last twenty years, a substantial amount of attention has been drawn to the field of functional data analysis. While the study of the probabilistic tools for infinite-dimensional variables started at the beginning of the 20th century, statistical models and methods for functional data have only really been developed in the last two decades, since many scientific fields involving applied statistics have started measuring and recording massive continuous data due to rapid technological advancements. The methods developed in this field mainly require homogeneity of the functional data, namely freedom from outliers. However, methods that account for the presence of outliers have only recently been studied. In this talk, we focus on the effect of outliers on functional data analysis techniques. We then introduce robust estimation and variable selection methods for a special functional regression model, as well as a simultaneous confidence band for the mean function of functional data. Simulation studies and data applications are presented to compare the performance of the proposed methods with non-robust techniques.

2017-05-19
12:00 hrs.
Isabelle Beaudry. Pontificia Universidad Católica de Chile
Correcting for Nonrandom Recruitment With Rds Data: Design-Based and Bayesian Approaches
Sala 2, Facultad de Matemáticas
Abstract:

Respondent-driven sampling (RDS) is a sampling mechanism that has proven very effective for sampling hard-to-reach human populations connected through social networks. A small number of individuals, typically known to the researcher, are initially sampled and asked to recruit a small fixed number of their contacts who are also members of the target population. Subsequent sampling waves are produced by peer recruitment until a desired sample size is achieved. However, the researcher's lack of control over the sampling process has posed several challenges to producing valid statistical inference from RDS data. For instance, participants are generally assumed to recruit completely at random among their contacts, despite growing empirical evidence that suggests otherwise and the substantial sensitivity of most RDS estimators to this assumption. The main contributions of this study are to parameterize alternative recruitment behaviors and to propose a design-based and a model-based (Bayesian) estimator to correct for nonrandom recruitment.
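
For context, the benchmark design-based estimator that relies on the random-recruitment assumption is the RDS-II (Volz-Heckathorn) estimator of a population mean, which weights each respondent $i$ in the sample $S$ inversely to his or her reported degree $d_i$:

$$
\hat{\mu}_{\mathrm{RDS\text{-}II}} = \frac{\sum_{i \in S} y_i / d_i}{\sum_{i \in S} 1 / d_i};
$$

the sensitivity of estimators of this type to departures from random recruitment is what motivates the corrections proposed here.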

2017-04-21
12:00hrs.
Felipe Osorio. Instituto de Estadística, Pontificia Universidad Católica de Valparaíso
The Gradient Test for Extremum Estimators
Sala 2, Facultad de Matemáticas
Abstract:

In this work we introduce the gradient test proposed by Terrell [Comp. Sci. Stat. 34: 206-215, 2002] into the context of estimators that arise as the extremum of an objective function. This general class of estimators, often known as extremum estimators, provides a unified framework for studying different estimation procedures that share common principles. In this talk we focus mainly on testing nonlinear hypotheses, as well as on the application of the gradient test to influence diagnostics. The methodology is applied to test the equality of Sharpe ratios associated with the returns of pension funds in the Chilean pension system.
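
For reference, Terrell's gradient statistic for a hypothesis that restricts the parameter combines the score evaluated at the restricted estimate with the unrestricted estimate,

$$
T_G = U(\tilde{\theta})^\top (\hat{\theta} - \tilde{\theta}),
$$

where $U(\cdot)$ is the gradient of the objective function, $\tilde{\theta}$ the restricted and $\hat{\theta}$ the unrestricted extremum estimator; under the null hypothesis and standard regularity conditions, $T_G$ shares the asymptotic $\chi^2$ distribution of the likelihood-ratio, Wald and score statistics, while requiring neither the information matrix nor second derivatives.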

2017-03-06
12:00hrs.
Garritt Page. Brigham Young University
Estimation and Prediction in the Presence of Spatial Confounding for Spatial Linear Models
Sala 2, Facultad de Matemáticas
Abstract:

In studies that produce data with spatial structure it is common that covariates of interest vary spatially in addition to the error. Because of this, the error and covariate are often correlated. When this occurs it is difficult to distinguish the covariate effect from residual spatial variation. In an iid normal error setting, it is well known that this type of correlation produces biased coefficient estimates but predictions remain unbiased. In a spatial setting, recent studies have shown that coefficient estimates remain biased, but spatial prediction has not been addressed. The purpose of this paper is to provide a more detailed study of coefficient estimation from spatial models when covariate and error are correlated and then begin a formal study regarding spatial prediction. This is carried out by investigating properties of the generalized least squares estimator and the best linear unbiased predictor when a spatial random effect and a covariate are jointly modeled. Under this setup we demonstrate that the mean squared prediction error is possibly reduced when covariate and error are correlated.
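
In symbols, the setting can be sketched as the spatial linear model (our notation)

$$
Y(\mathbf{s}) = \mathbf{x}(\mathbf{s})^\top \boldsymbol{\beta} + \eta(\mathbf{s}) + \varepsilon(\mathbf{s}),
$$

where $\eta$ is a spatial random effect and $\varepsilon$ is white noise; spatial confounding refers to correlation between the covariate process $\mathbf{x}(\cdot)$ and $\eta(\cdot)$, which biases the generalized least squares estimator $\hat{\boldsymbol{\beta}}_{\mathrm{GLS}} = (X^\top \Sigma^{-1} X)^{-1} X^\top \Sigma^{-1} Y$ yet, as argued above, may reduce the mean squared prediction error of the associated best linear unbiased predictor.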

2017-01-11
12:00hrs.
Marc G. Genton. King Abdullah University of Science and Technology (KAUST), Saudi Arabia
Computational Challenges With Big Environmental Data
Sala 2, Facultad de Matemáticas
Abstract:

Two types of computational challenges arising from big environmental data are discussed. The first type occurs with multivariate or spatial extremes. Indeed, inference for max-stable processes observed at a large collection of locations is among the most challenging problems in computational statistics, and current approaches typically rely on less expensive composite likelihoods constructed from small subsets of data. We explore the limits of modern state-of-the-art computational facilities to perform full likelihood inference and to efficiently evaluate high-order composite likelihoods. With extensive simulations, we assess the loss of information of composite likelihood estimators with respect to a full likelihood approach for some widely used multivariate or spatial extreme models. The second type of challenge occurs with the emulation of climate model outputs. We consider fitting a statistical model to over 1 billion global 3D spatio-temporal temperature data points using a distributed computing approach. The statistical model exploits the gridded geometry of the data and parallelization across processors. It is therefore computationally convenient and makes it possible to fit a non-trivial model to a data set with a covariance matrix comprising 10^{18} entries. We provide 3D visualization of the results. The talk is based on joint work with Stefano Castruccio and Raphael Huser.
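
As a reminder of the first ingredient (standard notation, not specific to the talk), a pairwise composite log-likelihood for a max-stable model with parameter $\theta$ observed at sites $s_1, \dots, s_D$ replaces the full joint density by a weighted sum of bivariate contributions,

$$
\ell_C(\theta) = \sum_{i < j} w_{ij}\, \log f\big(y(s_i), y(s_j); \theta\big),
$$

and higher-order versions use $q$-dimensional margins; the trade-off studied here is the information lost by such truncations relative to the far more expensive full likelihood.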

2017-01-11
11:00hrs.
Ying Sun. King Abdullah University of Science and Technology (KAUST), Saudi Arabia
Total Variation Depth for Functional Data
Sala 2, Facultad de Matemáticas
Abstract:

There has been extensive work on data depth-based methods for robust multivariate data analysis. Recent developments have moved to infinite-dimensional objects such as functional data. In this work, we propose a new notion of depth, the total variation depth, for functional data. As a measure of depth, its properties are studied theoretically, and the associated outlier detection performance is investigated through simulations. Compared to magnitude outliers, shape outliers are often masked among the rest of samples and harder to identify. We show that the proposed total variation depth has many desirable features and is well suited for outlier detection. In particular, we propose to decompose the total variation depth into two components that are associated with shape and magnitude outlyingness, respectively. This decomposition allows us to develop an effective procedure for outlier detection and useful visualization tools, while naturally accounting for the correlation in functional data. Finally, the proposed methodology is demonstrated using real datasets of curves, images, and video frames. The talk is based on joint work with Huang Huang.

2016-12-16
12:00hrs.
Fernanda de Bastiani. Pontificia Universidad Católica de Chile
Flexible Regression and Smoothing: Gaussian Markov Random Field Models in GAMLSS
Sala 2, Facultad de Matemáticas
Abstract:

This work gives a brief history of GAMLSS and describes the modelling and fitting of Gaussian Markov random field components within a GAMLSS model. This allows any or all of the parameters of the distribution of the response variable to be modelled using explanatory variables and spatial effects. The response variable distribution is allowed to be a non-exponential-family distribution. A new package developed in R to achieve this is presented. We use Gaussian Markov random fields to model the spatial effect in the Munich rent data and explore some features and characteristics of the data. The potential of using spatial analysis within GAMLSS is discussed. We argue that the flexibility of parametric distributions, the ability to model all the parameters of the distribution, and the diagnostic tools of GAMLSS provide an ideal environment for modelling spatial features of data.
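
Schematically, a GAMLSS with a Gaussian Markov random field spatial component models each distributional parameter on its own link scale (our notation):

$$
y_i \sim \mathcal{D}(\mu_i, \sigma_i, \nu_i, \tau_i), \qquad g_k(\theta_{ki}) = \mathbf{x}_{ki}^\top \boldsymbol{\beta}_k + \gamma_{k, r(i)}, \qquad \boldsymbol{\gamma}_k \sim \mathcal{N}\big(\mathbf{0},\, \lambda_k^{-1} \mathbf{Q}^{-}\big),
$$

where $\theta_k \in \{\mu, \sigma, \nu, \tau\}$, $r(i)$ is the region (e.g. Munich district) of observation $i$, and $\mathbf{Q}$ is the GMRF structure matrix built from the neighbourhood graph, so that any or all of the four parameters may carry a spatial effect.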