Statistics Seminar (Seminario de Estadística)


2018-12-21
12:00hrs.
Tamara Fernandez. Gatsby Computational Neuroscience Unit
RKHS testing for censored data
Sala 5, Facultad de Matemáticas, Edificio Rolando Chuaqui, Campus San Joaquin, Pontificia Universidad Católica de Chile
Abstract:
We introduce kernel-based tests for censored data, where observations may be missing in random time intervals: a common occurrence in clinical trials and industrial life testing. Our approach is based on computing distances between embeddings of probability distributions in a reproducing kernel Hilbert space (RKHS). This approach has previously been applied with very good results in many machine learning and statistical settings; its main advantage is the ability of kernels to deal with complex and high-dimensional data. In this talk, we return to the real-line problem, in which the complexity of the data is due to censored observations. In particular, we propose an extension of this set of tools to censored data, derive its asymptotic properties, and explain its relation to dominant approaches in survival analysis such as the log-rank test. We conclude with an empirical evaluation in which our methods outperform competing approaches in multiple scenarios.
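The RKHS distance between two distribution embeddings is usually called the maximum mean discrepancy (MMD). As a minimal sketch for orientation, the code below computes the standard unbiased estimate of the squared MMD between two uncensored samples with a Gaussian kernel. It does not implement the censored-data extension described in the talk; the Gaussian kernel, the bandwidth of 1.0 and the function names are assumptions for illustration.

```python
import math
import random


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel on the real line; the bandwidth is an assumed tuning choice."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def mmd2_unbiased(xs, ys, bandwidth=1.0):
    """Unbiased estimate of the squared MMD between two uncensored samples."""
    m, n = len(xs), len(ys)
    # Within-sample averages exclude the diagonal (i == j) to remove bias.
    k_xx = sum(gaussian_kernel(xs[i], xs[j], bandwidth)
               for i in range(m) for j in range(m) if i != j) / (m * (m - 1))
    k_yy = sum(gaussian_kernel(ys[i], ys[j], bandwidth)
               for i in range(n) for j in range(n) if i != j) / (n * (n - 1))
    # Cross-sample average over all pairs.
    k_xy = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys) / (m * n)
    return k_xx + k_yy - 2.0 * k_xy


# Toy example with simulated data: same distribution vs. a location shift.
rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(200)]
ys_same = [rng.gauss(0.0, 1.0) for _ in range(200)]
ys_shift = [rng.gauss(1.0, 1.0) for _ in range(200)]
mmd_same = mmd2_unbiased(xs, ys_same)
mmd_shift = mmd2_unbiased(xs, ys_shift)
print(f"same: {mmd_same:.4f}  shifted: {mmd_shift:.4f}")
```

In a two-sample test the observed statistic would typically be compared with its permutation distribution; that step is omitted to keep the sketch short.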

Seminar organized by the Centro para el Descubrimiento de Estructuras en Datos Complejos (Center for the Discovery of Structures in Complex Data) - MiDaS.
http://midas.mat.uc.cl
2018-11-30
12:00hrs.
Danilo Alvares. Departamento de Estadística, Pontificia Universidad Católica de Chile
A sequential approach to updating posterior information
Sala 5, Facultad de Matemáticas, Edificio Rolando Chuaqui, Campus San Joaquin, Pontificia Universidad Católica de Chile
Abstract:
In this talk we show the performance of a sequential Monte Carlo (SMC) algorithm. As a prerequisite, we discuss the Metropolis-Hastings algorithm and illustrate the general idea of particle-based methods. The SMC algorithm presented here is a particular case of sequential methods in which the objective is to update the posterior distribution in "static" models.
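The idea of updating a posterior as data arrive in a static model can be sketched with a basic resample-move particle scheme: particles drawn from the prior are reweighted by the likelihood of each new observation, resampled when the effective sample size drops below half the number of particles, and then moved with one random-walk Metropolis-Hastings step. This is a minimal sketch, not necessarily the specific SMC algorithm presented in the talk; the conjugate Normal model, the N(0, 10^2) prior, the step size and the resampling threshold are assumptions chosen so the result can be checked against the exact posterior mean.

```python
import math
import random


def log_prior(theta):
    # Assumed prior: theta ~ N(0, 10^2), up to an additive constant.
    return -0.5 * (theta / 10.0) ** 2


def log_lik(y, theta):
    # Assumed likelihood: y ~ N(theta, 1), up to an additive constant.
    return -0.5 * (y - theta) ** 2


def log_post(theta, observations):
    return log_prior(theta) + sum(log_lik(v, theta) for v in observations)


def normalize(logw):
    # Convert log-weights to normalized weights in a numerically stable way.
    mx = max(logw)
    unnorm = [math.exp(lw - mx) for lw in logw]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def smc_posterior_mean(data, n_particles=1000, step=0.5, seed=1):
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, 10.0) for _ in range(n_particles)]  # prior draws
    logw = [0.0] * n_particles
    seen = []
    for y in data:
        seen.append(y)
        # Reweight: the incremental weight is the likelihood of the new observation.
        logw = [lw + log_lik(y, th) for lw, th in zip(logw, particles)]
        weights = normalize(logw)
        ess = 1.0 / sum(w * w for w in weights)
        if ess < n_particles / 2:
            # Multinomial resampling, then reset to equal weights.
            particles = rng.choices(particles, weights=weights, k=n_particles)
            logw = [0.0] * n_particles
            # One random-walk Metropolis-Hastings move per particle, targeting
            # the posterior given all observations seen so far.
            moved = []
            for th in particles:
                prop = th + rng.gauss(0.0, step)
                log_u = math.log(1.0 - rng.random())  # 1 - U lies in (0, 1]
                accept = log_u < log_post(prop, seen) - log_post(th, seen)
                moved.append(prop if accept else th)
            particles = moved
    weights = normalize(logw)
    return sum(w * th for w, th in zip(weights, particles))


# Simulated data; the conjugate model gives an exact posterior mean to compare with.
data_rng = random.Random(7)
data = [data_rng.gauss(2.0, 1.0) for _ in range(20)]
smc_mean = smc_posterior_mean(data)
exact_mean = sum(data) / (len(data) + 1.0 / 100.0)
print(f"SMC: {smc_mean:.3f}  exact: {exact_mean:.3f}")
```

The Metropolis-Hastings move is what keeps the particle cloud from collapsing onto a few repeated values after resampling; with a symmetric proposal the acceptance ratio reduces to a ratio of posterior densities.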

Seminar organized by the Centro para el Descubrimiento de Estructuras en Datos Complejos (Center for the Discovery of Structures in Complex Data) - MiDaS.
http://midas.mat.uc.cl
2018-11-23
12:00hrs.
Carlos Díaz Ávalos. IIMAS-UNAM, México
Spatial point processes as an analysis tool in ecology
Sala 5, Facultad de Matemáticas, Edificio Rolando Chuaqui, Campus San Joaquin, Pontificia Universidad Católica de Chile
Abstract:
Spatial point processes have gained popularity in recent years because of their usefulness in answering a variety of questions in scientific fields. In community ecology, point processes have proved useful for detecting intra- and interspecific interactions in forest ecosystems and for assessing the risk of, and the factors associated with, ecological disturbances such as forest fires. Although estimating model parameters in spatial point process applications can be complicated, computational advances have made acceptable numerical approximations possible, which has contributed to their use in many fields of human knowledge.

This talk presents an overview of the theoretical foundations of spatial point processes, illustrated with an example of their application to building forest-fire risk maps.

Seminar organized by the Centro para el Descubrimiento de Estructuras en Datos Complejos (Center for the Discovery of Structures in Complex Data) - MiDaS.
http://midas.mat.uc.cl
2018-11-16
12:00hrs.
Nishant Mehta. University of Victoria
Fast Rates for Unbounded Losses: from ERM to Generalized Bayes
Sala 5, Facultad de Matemáticas, Edificio Rolando Chuaqui, Campus San Joaquin, Pontificia Universidad Católica de Chile
Abstract:
I will present new excess risk bounds for randomized and deterministic estimators, discarding boundedness assumptions to handle general unbounded loss functions like log loss and squared loss under heavy tails. These bounds have a PAC-Bayesian flavor in both derivation and form, and their expression in terms of the information complexity forms a natural connection to generalized Bayesian estimators. The bounds hold with high probability and achieve a fast $\tilde{O}(1/n)$ rate in parametric settings, under the recently introduced 'central' condition (or various weakenings of this condition with consequently weaker results) and a type of 'empirical witness of badness' condition. The former conditions are related to the Tsybakov margin condition in classification and the Bernstein condition for bounded losses, and they help control the lower tail of the excess loss. The 'witness' condition is new and suitably controls the upper tail of the excess loss. These conditions and our techniques revolve tightly around a pivotal concept, the generalized reversed information projection, which generalizes the reversed information projection of Li and Barron. Along the way, we connect excess risk (a KL divergence in our language) to a generalized Rényi divergence, generalizing previous results connecting Hellinger distance to KL divergence.

This is joint work with Peter Grünwald.

Seminar organized by the Centro para el Descubrimiento de Estructuras en Datos Complejos (Center for the Discovery of Structures in Complex Data) - MiDaS.
http://midas.mat.uc.cl
2018-10-26
12:00hrs.
Fernando Quintana. Departamento de Estadística, Pontificia Universidad Católica de Chile
Discovering Interactions Using Covariate Informed Random Partition Models
Sala 5, Facultad de Matemáticas, Edificio Rolando Chuaqui, Campus San Joaquin, Pontificia Universidad Católica de Chile
Abstract:
Combination chemotherapy regimens created for patients diagnosed with childhood acute lymphoblastic leukemia have greatly improved cure rates. Unfortunately, patients prescribed these regimens have shown susceptibility to the onset of osteonecrosis. Some have suggested that this is due to a pharmacokinetic interaction between two agents in the treatment regimen (asparaginase and dexamethasone) and other physiological variables. Determining which physiological variables to consider when searching for interactions in scenarios like these, without a priori guidance, has proved to be a challenging problem, particularly if interactions influence the response distribution in ways beyond shifts in expectation or dispersion. In this work we propose an exploratory technique that can discover associations between covariates and responses in a very general way. The procedure connects covariates to responses flexibly through dependent random partition prior distributions, and then employs machine learning techniques to highlight potential associations found in each cluster. We apply the method to data from a study dedicated to learning which physiological predictors influence the severity of osteonecrosis multiplicatively.

Seminar organized by the Centro para el Descubrimiento de Estructuras en Datos Complejos (Center for the Discovery of Structures in Complex Data) - MiDaS.
http://midas.mat.uc.cl
2018-10-24
15:00hrs.
Leidy Rocío León Dávila. Escuela de Matemáticas y Estadística de la Universidad Pedagógica y Tecnológica de Colombia
Book presentation: Análisis de Datos Categóricos (Categorical Data Analysis)
Sala 2, Facultad de Matemáticas, Edificio Rolando Chuaqui, Campus San Joaquin, Pontificia Universidad Católica de Chile
Abstract:
The book was written for readers who need statistical tools for analyzing information, mainly of a categorical type, arising from research work. Although its primary audience is people working on problems in health and biology, the statistical material it offers can be used by researchers in other disciplines: it suffices to change the setting of the examples and illustrations to make the text a supporting resource for several fields.
2018-10-19
12:00hrs.
Claudio Chamorro. Departamento de Ciencias de la Salud UC
A clinical prediction rule to determine which patients are candidates for rotator cuff surgery of the shoulder
Sala 5, Facultad de Matemáticas, Edificio Rolando Chuaqui, Campus San Joaquin, Pontificia Universidad Católica de Chile
Abstract:

Rotator cuff surgery is the most common surgical procedure among shoulder pathologies. However, post-surgical patient satisfaction is quite variable, ranging from 38% to 95%. The decision whether or not to operate is complex, since the clinical manifestations of the pathology and patients' expectations are very diverse. International studies report substantial increases in the frequency of this procedure, from 1.4 surgeries per 100,000 subjects in 2000 to 16.3 surgeries per 100,000 subjects in 2015. The exact figure for Chile is unknown. Rotator cuff pathology in the public sector generates a considerable associated cost for the country's health systems. Direct costs of surgery plus post-surgical rehabilitation range from US$10,000 to more than US$17,000 per procedure. To these high costs must be added the complex postoperative rehabilitation process (around 6 months) and long periods of medical leave. The high associated costs, potential postoperative complications and surgeries that leave the patient unsatisfied call for guidelines that help the treating physician decide which patients should be operated on and which should not. Several prognostic factors for success have been studied in the literature, such as patient age, duration of symptoms, number of tendons involved, presence or absence of retraction, and size of the tendon tear. Findings on the influence of these risk factors are diffuse and inconsistent across studies, and their statistical analyses are questionable, which makes the results difficult to interpret. Given the importance that the patient's perception of the surgical outcome has acquired, much work in recent years has relied on self-administered subjective questionnaires, which are now relevant to health care providers.
Among the subjective questionnaires used for the shoulder, the QuickDASH has shown high inter-rater reliability and internal consistency, and it takes little time to complete. A cross-cultural adaptation for the Chilean population already exists, but its psychometric properties have not been determined. The aims of the study are: i) to determine the psychometric properties of the QuickDASH questionnaire for the Chilean population; ii) to create a clinical prediction rule based on the relative influence of the different patient- and pathology-related prognostic factors on functional outcome as measured by the QuickDASH; iii) based on QuickDASH scores, to compare the success of rotator cuff surgery in patients selected by the treating physician's judgment supported by the clinical prediction rule vs. patients selected by the treating physician's judgment alone; and iv) to compare QuickDASH scores between patients assigned to conservative (non-surgical) treatment by the treating physician's judgment supported by the clinical prediction rule vs. patients assigned by the treating physician's judgment alone.

2018-10-08
16:00hrs.
Timothy E. O’Brien. Department of Mathematics and Statistics and Institute of Environmental Sustainability, Loyola University Chicago
Statistical Modelling and Robust Experimental Design Strategies
Sala 3, Facultad de Matemáticas, Edificio Rolando Chuaqui, Campus San Joaquin, Pontificia Universidad Católica de Chile
Abstract:

Researchers often find that nonlinear regression models are more applicable than linear ones for modelling various biological, physical and chemical processes, since they tend to fit the data well and since these models (and their parameters) are more scientifically meaningful. These researchers therefore often require optimal or near-optimal designs for a given nonlinear model. A common shortcoming of most optimal designs for nonlinear models used in practice, however, is that they typically focus only on (first-order) parameter variance or predicted variance, and thus ignore the inherent nonlinearity of the assumed model function. Another shortcoming of optimal designs is that they often have only $p$ support points, where $p$ is the number of model parameters.

Furthermore, measures of marginal curvature, first introduced in Clarke (1987) and extended in Haines et al. (2004), provide a useful means of assessing this nonlinearity. Other relevant developments are the second-order volume design criterion introduced in Hamilton and Watts (1985) and extended in O’Brien (2010), and the second-order MSE criterion developed and illustrated in Clarke and Haines (1995).

In the context of applied statistical modelling, this talk examines various robust design criteria and those based on second-order (curvature) considerations. These techniques, coded in popular software packages, are illustrated with several examples including one from a preclinical dose-response setting encountered in a recent consulting session.

2018-09-14
12:00hrs.
Alejandro Murua. Universidad de Montreal
Cox regression with Potts-driven latent clusters model
Sala 5, Facultad de Matemáticas, Edificio Rolando Chuaqui, Campus San Joaquin, Pontificia Universidad Católica de Chile
Abstract:
We consider a Bayesian nonparametric survival regression model with latent partitions. Our goal is to predict survival and to cluster patients within the context of building prognosis systems. We propose the Potts clustering model as a prior on the covariate space so as to drive cluster formation on individuals and/or Tumor-Node-Metastasis stage system patient blocks. For any given partition, our model assumes an interval-wise Weibull distribution for the baseline hazard rate. The number of intervals is unknown; it is estimated with a lasso-type penalty given by a sequential double exponential prior. Estimation and inference are done with the aid of MCMC. To simplify the computations, we use Laplace's approximation to estimate some constants and to propose parameter updates within MCMC. We illustrate the methodology with an application to cancer survival.

Seminar organized by the Centro para el Descubrimiento de Estructuras en Datos Complejos (Center for the Discovery of Structures in Complex Data) - MiDaS.
http://midas.mat.uc.cl
2018-09-07
12:00hrs.
Luis Gutierrez. Departamento de Estadística, Pontificia Universidad Católica de Chile
A Bayesian nonparametric multiple testing procedure for comparing several treatments against a control
Sala 5, Facultad de Matemáticas, Edificio Rolando Chuaqui, Campus San Joaquin, Pontificia Universidad Católica de Chile
Abstract:
We propose a Bayesian nonparametric strategy to test for differences between a control group and several treatment regimes. Most of the existing tests for this type of comparison are based on the differences between location parameters. In contrast, our approach identifies differences across the entire distribution, avoids strong modeling assumptions over the distributions for each treatment, and accounts for multiple testing through the prior distribution on the space of hypotheses. The proposal is compared to other commonly used hypothesis testing procedures under simulated scenarios. A real application is also analyzed with the proposed methodology.

Seminar organized by the Centro para el Descubrimiento de Estructuras en Datos Complejos (Center for the Discovery of Structures in Complex Data) - MiDaS.
http://midas.mat.uc.cl
2018-08-31
12:00hrs.
Claudia Wehrhahn. University of California, Santa Cruz
A Bayesian approach to Disease Clustering using restricted Chinese restaurant processes
Sala 5, Facultad de Matemáticas, Edificio Rolando Chuaqui, Campus San Joaquin, Pontificia Universidad Católica de Chile
Abstract:
Disease clustering models, whose goal is to detect clusters of regions with unusually high incidence, are important in epidemiology and public health. We describe a restricted Chinese restaurant process that constrains clusters to be formed of contiguous regions. The model is illustrated using synthetic data sets and an application to oral cancer in Germany. Its performance is compared with that of the disease clustering model proposed by Knorr-Held and Rasser [Biometrics 56(1), 2000].
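As a point of reference, a draw from an ordinary (unrestricted) Chinese restaurant process can be generated sequentially: each new item joins an existing cluster with weight equal to that cluster's size, or opens a new cluster with weight alpha. This sketch does not implement the contiguity restriction described in the abstract; the concentration parameter value and the function name are illustrative assumptions.

```python
import random


def sample_crp(n, alpha=1.0, seed=0):
    """Seat n customers sequentially; returns (table label per customer, table sizes)."""
    rng = random.Random(seed)
    table_sizes = []
    labels = []
    for _ in range(n):
        # Existing table k is chosen with weight equal to its size, a new table
        # with weight alpha; dividing by (customers so far + alpha) is implicit.
        weights = table_sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(table_sizes):
            table_sizes.append(1)  # open a new table (cluster)
        else:
            table_sizes[k] += 1
        labels.append(k)
    return labels, table_sizes


labels, sizes = sample_crp(50, alpha=1.0)
print(f"{len(sizes)} clusters with sizes {sizes}")
```

A restricted version would change which tables an item is allowed to join (for example, only clusters adjacent to its region), which is where the model in the talk departs from this baseline.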

Seminar organized by the Centro para el Descubrimiento de Estructuras en Datos Complejos (Center for the Discovery of Structures in Complex Data) - MiDaS.
http://midas.mat.uc.cl
2018-08-24
12:00hrs.
Garritt Page. Department of Statistics, Brigham Young University
Temporal and Spatio-Temporal Random Partition Models
Auditorio Ninoslav Bralic, Facultad de Matemáticas, Edificio Rolando Chuaqui, Campus San Joaquin, Pontificia Universidad Católica de Chile
Abstract:
Spatially referenced data often represent an instantaneous point in time at which the spatial process is measured. Because of this, it is becoming more common to monitor spatial processes over time. We propose capturing the temporal evolution of dependent structures by jointly modeling a sequence of partitions indexed by time. We derive a few characteristics of the joint model and show how it impacts dependence at the observation level. Computational strategies are detailed, and the method is applied to Chilean standardized test scores.

Seminar organized by the Centro para el Descubrimiento de Estructuras en Datos Complejos (Center for the Discovery of Structures in Complex Data) (http://bnp.mat.uc.cl).

http://bnp.mat.uc.cl
2018-08-17
12:00hrs.
Evan Ray. Mount Holyoke College
Ensemble Forecasts of Infectious Disease
Sala 5, Facultad de Matemáticas, Edificio Rolando Chuaqui, Campus San Joaquin, Pontificia Universidad Católica de Chile
Abstract:
Real-time forecasts of measures of the timing and severity of the spread of infectious disease are important inputs for public policy officials planning interventions designed to slow or stop the spread of the disease. A wide variety of models have been developed to generate these forecasts, using different data sources and model structures. In general, no single modeling approach always outperforms all others. In recent forecasting competitions focusing on influenza in the United States, models with very different structures and data sources have performed at or near the top of the rankings. Here we describe two recent experiments using ensemble approaches to combine the forecasts from several different component models. We show that these ensembles improve on the individual component models, performing better on average and more consistently across multiple seasons.
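One simple way to combine probabilistic forecasts over discrete bins is a weighted average of the component distributions (a linear opinion pool), scored with the logarithmic score. The sketch below combines two hypothetical component forecasts with equal weights; the numbers are invented for illustration, are not results from the talk, and the weighting scheme used in the described experiments may differ.

```python
import math


def linear_pool(forecasts, weights):
    """Weighted average of binned predictive distributions (linear opinion pool)."""
    total_w = sum(weights)
    n_bins = len(forecasts[0])
    return [sum(w * f[b] for w, f in zip(weights, forecasts)) / total_w
            for b in range(n_bins)]


def log_score(forecast, observed_bin):
    """Logarithmic score: log of the probability assigned to the observed bin."""
    return math.log(forecast[observed_bin])


# Hypothetical component forecasts over three bins (e.g. low / medium / high peak).
model_a = [0.7, 0.2, 0.1]
model_b = [0.1, 0.3, 0.6]
ensemble = linear_pool([model_a, model_b], weights=[1.0, 1.0])

for observed in (0, 2):  # two hypothetical seasons with different outcomes
    scores = [log_score(f, observed) for f in (model_a, model_b, ensemble)]
    print(observed, [round(s, 3) for s in scores])
```

Because the logarithm is concave, the log score of an equal-weight linear pool is never lower than the average of the component log scores for the same outcome, which is one reason such ensembles tend to perform more consistently across seasons.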

Seminar organized by the Centro para el Descubrimiento de Estructuras en Datos Complejos (Center for the Discovery of Structures in Complex Data) (http://bnp.mat.uc.cl).

2018-08-09
16:00hrs.
Victor Hugo Lachos. Department of Statistics, University of Connecticut
Censored Regression Models for Complex Data
Sala 3, Facultad de Matemáticas, Edificio Rolando Chuaqui, Campus San Joaquin, Pontificia Universidad Católica de Chile
Abstract:
Measurement data can be subject to upper and/or lower detection limits because of restrictions or limitations of the experimental apparatus. A complication arises when these continuous measurements present heavy-tailed behavior, because inference can be seriously affected by misspecification of their parametric distribution. For such data structures, we discuss some useful models and strategies for robust estimation. The practical utility of the proposed methods is illustrated using real datasets.
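To make the role of detection limits concrete, here is a minimal sketch of the log-likelihood for a simple linear regression with left- and/or right-censored responses: a censored value contributes the probability of lying beyond the limit instead of a density. Normal errors are an assumption for simplicity; the heavy-tailed models discussed in the talk would replace the Normal density and CDF. Function and argument names are hypothetical.

```python
import math


def norm_logpdf(z):
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def norm_logcdf(z):
    # log Phi(z) via erfc, which avoids the cancellation of 1 + erf in the lower tail.
    return math.log(0.5 * math.erfc(-z / math.sqrt(2.0)))


def censored_normal_loglik(xs, ys, beta0, beta1, sigma, lower=None, upper=None):
    """Log-likelihood of y = beta0 + beta1 * x + N(0, sigma^2) with detection limits.

    Values at or below `lower` (or at or above `upper`) are treated as censored
    and contribute a tail probability instead of a density.
    """
    ll = 0.0
    for x, y in zip(xs, ys):
        mu = beta0 + beta1 * x
        if lower is not None and y <= lower:
            ll += norm_logcdf((lower - mu) / sigma)  # P(Y <= lower)
        elif upper is not None and y >= upper:
            ll += norm_logcdf((mu - upper) / sigma)  # P(Y >= upper)
        else:
            ll += norm_logpdf((y - mu) / sigma) - math.log(sigma)
    return ll


# Hypothetical measurements with a lower detection limit of 0.5;
# the first value is recorded at the limit and is treated as censored.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.5, 1.2, 2.1, 2.9]
ll_example = censored_normal_loglik(xs, ys, beta0=0.0, beta1=1.0, sigma=1.0, lower=0.5)
print(f"log-likelihood: {ll_example:.4f}")
```

Maximizing this function over (beta0, beta1, sigma) gives the classical censored (Tobit-type) regression estimate, which is the baseline that robust alternatives aim to improve on.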
2018-08-08
16:00hrs.
Mariangela Guidolin. Dipartimento Di Scienze Statistiche, Università Di Padova
On inverse product cannibalisation: A new Lotka-Volterra model for asymmetric competition in the ICTs
Sala 2, Facultad de Matemáticas, Edificio Rolando Chuaqui, Campus San Joaquin, Pontificia Universidad Católica de Chile
Abstract:
Product cannibalisation is a well-known phenomenon in marketing and technological research, describing the case when a new product steals sales from another product under the same brand. A very special case of cannibalisation may occur when the older product reacts to the competitive strength of the newer one, absorbing the corresponding market shares. Given its special character, we call this phenomenon inverse product cannibalisation. We suppose that a case of inverse cannibalisation is observed between two products of Apple Inc., the iPhone and the more recent iPad, and that the former has been able to succeed at the expense of the latter. To explore this hypothesis from a diffusion-of-innovations perspective, we propose a modified Lotka-Volterra model for mean trajectories in asymmetric competition, allowing us to test the presence and extent of the inverse cannibalisation phenomenon. A SARMAX refinement integrates the short-term predictions with seasonal and autodependent components. A non-dimensional representation of the proposed model shows that the penetration of the second technology has been beneficial for the first, both in terms of market size and life-cycle length.
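The abstract builds on the Lotka-Volterra competition framework. For orientation, the textbook two-species competition model can be simulated with a simple Euler scheme as below; unequal interaction coefficients a12 and a21 give asymmetric competition. This is the classical form, not the modified model for mean trajectories or the SARMAX refinement proposed in the talk, and the parameter values and function name are hypothetical.

```python
def simulate_lv(x0, y0, r1, r2, k1, k2, a12, a21, dt=0.01, steps=10_000):
    """Euler integration of the two-species Lotka-Volterra competition model.

    dx/dt = r1 * x * (1 - (x + a12 * y) / k1)
    dy/dt = r2 * y * (1 - (y + a21 * x) / k2)
    """
    x, y = x0, y0
    path = [(x, y)]
    for _ in range(steps):
        dx = r1 * x * (1.0 - (x + a12 * y) / k1)
        dy = r2 * y * (1.0 - (y + a21 * x) / k2)
        x, y = x + dt * dx, y + dt * dy
        path.append((x, y))
    return path


# Hypothetical parameters: no interaction vs. asymmetric competition.
alone = simulate_lv(0.01, 0.01, 1.0, 1.0, 1.0, 1.0, a12=0.0, a21=0.0)
asym = simulate_lv(0.01, 0.01, 1.0, 1.0, 1.0, 1.0, a12=0.2, a21=0.8)
print(alone[-1], asym[-1])
```

With a12 = 0.2 and a21 = 0.8 both series settle at the coexistence equilibrium x* = (k1 - a12 k2) / (1 - a12 a21) = 20/21 and y* = (k2 - a21 k1) / (1 - a12 a21) = 5/21, so the series that is competed against more strongly ends up with the smaller long-run level.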
2018-08-06
16:00hrs.
Bruno Scarpa. Dipartimento Di Scienze Statistiche, Università Di Padova
Bayesian modelling of networks in business intelligence problems
Sala 5, Facultad de Matemáticas, Edificio Rolando Chuaqui, Campus San Joaquin, Pontificia Universidad Católica de Chile
Abstract:
Complex network data problems are increasingly common in many fields of application. Our motivation is drawn from strategic marketing studies monitoring customer choices of specific products, along with co-subscription networks encoding multiple purchasing behavior.

Data are available for several agencies within the same insurance company, and our goal is to efficiently exploit co-subscription networks to inform targeted advertising of cross-sell strategies to currently mono-product customers. We address this goal by developing a Bayesian hierarchical model, which clusters agencies according to common mono-product customer choices and co-subscription networks. Within each cluster, we efficiently model customer behavior via a cluster-dependent mixture of latent eigenmodels. This formulation provides key information on mono-product customer choices and multiple purchasing behavior within each cluster, informing targeted cross-sell strategies. We develop simple algorithms for tractable inference, and assess performance in simulations and an application to business intelligence.
2018-06-08
12:00hrs.
Fabio Lopes. Instituto de Estadística, Pontificia Universidad Católica de Valparaíso
Extensions of the birth-and-assassination process and their applications
Sala 1, Facultad de Matemáticas, Edificio Rolando Chuaqui, Campus San Joaquin, Pontificia Universidad Católica de Chile
Abstract:
In this talk, we introduce some extensions of the birth-and-assassination (BA) process. The BA process is a variant of the continuous-time branching process introduced by Aldous and Krebs. In this model, each individual reproduces independently at rate $\lambda$ throughout its lifetime, but is not at risk of dying until its parent's death. A typical realization of this process resembles a finite collection of "clans", where only the current leaders of clans can be killed. This model behaves differently from the classical branching process, and has found interesting applications in queueing theory and the spread of rumors.
 
The extensions we will introduce can be seen as versions of the BA process with infinitely many types and mutations. We will illustrate them with some applications related to immunology.

This is joint work with C. Grejo (USP), F. Machado (USP) and A. Roldán-Correa (U. Antioquia).
2018-06-01
12:00hrs.
Christian Caamaño Carrillo. Departamento de Estadística, Universidad del Bío-Bío
Modeling and estimation of some non-Gaussian random fields
Sala 5, Facultad de Matemáticas, Edificio Rolando Chuaqui, Campus San Joaquin, Pontificia Universidad Católica de Chile
Abstract:

In this work, we propose two types of models for the regression and dependence analysis of (i) positive continuous spatio-temporal data and (ii) continuous spatio-temporal data with possible asymmetry and/or heavy tails. For the first case, we propose two (possibly non-stationary) random fields with Gamma and Weibull marginals. Both random fields are obtained by transforming a rescaled sum of independent copies of squared Gaussian random fields. For the second case, we propose a random field with t marginal distribution. We then consider two possible generalizations allowing for asymmetry. In the first approach we obtain a skew-t random field by mixing a skew-Gaussian random field with an inverse square root Gamma random field. In the second approach we obtain a two-piece t random field by mixing a specific binary discrete random field with a half-t random field.

We study the associated second-order properties and, in the stationary case, the geometrical properties. Since maximum likelihood estimation is computationally infeasible even for relatively small datasets, we propose the use of pairwise likelihood. The effectiveness of our proposal for the Gamma and Weibull cases is illustrated through a simulation study and a re-analysis of the Irish wind speed data (Haslett and Raftery, 1989) without any prior transformation of the data, unlike previous statistical analyses. For the t and asymmetric t cases we present a simulation study to show the performance of our method.

2018-05-25
12:00hrs.
Moreno Bevilacqua. Instituto de Estadística, Universidad de Valparaíso
Estimation and prediction of Gaussian random fields using generalized Wendland covariance functions under fixed domain asymptotics
Sala 5, Facultad de Matemáticas, Edificio Rolando Chuaqui, Campus San Joaquin, Pontificia Universidad Católica de Chile
Abstract:
We present some results on the estimation and prediction of Gaussian random fields with covariance models belonging to the generalized Wendland class, under fixed domain asymptotics. As with the Matérn class, this class allows for a continuous parameterization of the smoothness of the underlying Gaussian random field, while additionally being compactly supported.

We first study the equivalence of two Gaussian measures with Matérn and generalized Wendland covariance models. We then establish strong consistency and the asymptotic distribution of the maximum likelihood estimator of the microergodic parameter associated with the generalized Wendland covariance model, under fixed domain asymptotics. Finally, we give some results on the (misspecified) best linear unbiased predictor under fixed domain asymptotics when the true model is Matérn and the misspecified model is generalized Wendland.
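For concreteness, the member of the generalized Wendland family with smoothness parameter kappa = 1 has a simple closed form, sketched below; the general class with continuous kappa involves an integral representation and is not coded here. The parameter names sigma2 (variance), beta (support radius) and mu are our own labels, and the sketch is illustrative rather than the parameterization used in the talk. With mu = 3 it reduces to the classical Wendland function (1 - r)^4 (1 + 4r), which is a valid covariance in up to three dimensions.

```python
def gen_wendland_k1(h, sigma2=1.0, beta=1.0, mu=3.0):
    """Generalized Wendland covariance with smoothness parameter kappa = 1.

    C(h) = sigma2 * (1 - r)_+^(mu + 1) * (1 + (mu + 1) * r), with r = |h| / beta,
    so C(h) = 0 whenever |h| >= beta (compact support).
    """
    r = abs(h) / beta
    if r >= 1.0:
        return 0.0
    return sigma2 * (1.0 - r) ** (mu + 1.0) * (1.0 + (mu + 1.0) * r)


# Compact support makes covariance matrices sparse: on 10 equally spaced sites
# with support radius beta = 3, only pairs closer than 3 units are correlated.
sites = list(range(10))
nonzero = sum(1 for s in sites for t in sites if gen_wendland_k1(s - t, beta=3.0) > 0.0)
print(f"{nonzero} of {len(sites) ** 2} covariance entries are nonzero")
```

This sparsity is the practical appeal of compactly supported covariances compared with the Matérn class, whose covariance matrices are dense.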
2018-05-18
12:00hrs.
Orietta Nicolis. Instituto de Estadística, Universidad de Valparaíso
Spatio-temporal models for environmental risk assessment
Sala 5, Facultad de Matemáticas, Edificio Rolando Chuaqui, Campus San Joaquin, Pontificia Universidad Católica de Chile
Abstract:
The main goal of this work is to use spatio-temporal statistical models to build environmental risk maps for natural and anthropogenic disasters, in order to improve the assessment, prevention and mitigation of their impacts. To this end, we analyze the spatial and temporal variability of georeferenced points and observations, study the dependence on exogenous variables, and produce risk maps. Although the data describing many environmental phenomena (such as earthquakes, forest fires, avalanches, air pollution, etc.) can all be characterized by temporal and spatial variability, different assumptions must be made to define the model correctly. Some case studies will be presented using the Chilean earthquake catalogue and air pollution data from Santiago de Chile.