2020-07-15
12:00hrs.
Carolina Marchant. Facultad de Ciencias Básicas, Universidad Católica del Maule, Talca, Chile
Multivariate Birnbaum-Saunders Distributions: Modelling and Applications
Sala 5, Facultad de Matemáticas, Edificio Rolando Chuaqui, Campus San Joaquin, Pontificia Universidad Católica de Chile
Abstract:

Since its origins in materials science, where it has found numerous applications, the Birnbaum-Saunders family of distributions has come into widespread use in other areas of applied science, such as environmental science and medicine, as well as in quality control, among others. It can model varied data behaviour and hence provides a flexible alternative to the most commonly used distributions. The family includes Birnbaum-Saunders and log-Birnbaum-Saunders distributions in univariate and multivariate versions. Well-developed methods for estimation and diagnostics now allow in-depth analyses. This presentation reviews these methods and the relevant literature, introducing properties and theoretical results in a systematic way. To emphasise the range of suitable applications, full analyses are included of examples based on regression and diagnostics in materials science and on control charts for environmental monitoring.
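A Birnbaum-Saunders variable can be made concrete through its standard normal representation, T = β[αZ/2 + √((αZ/2)² + 1)]² with Z ~ N(0, 1). The sketch below is our own illustration, not code from the talk; the function name `rbs` and the parameter values are hypothetical.

```python
import math
import random
import statistics


def rbs(n, alpha, beta, seed=None):
    """Draw n Birnbaum-Saunders(alpha, beta) variates using the representation
    T = beta * (alpha*Z/2 + sqrt((alpha*Z/2)**2 + 1))**2 with Z ~ N(0, 1)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        w = alpha * rng.gauss(0.0, 1.0) / 2.0
        draws.append(beta * (w + math.sqrt(w * w + 1.0)) ** 2)
    return draws


# T is increasing in Z, so the median of T is beta (the value at Z = 0).
sample = rbs(20001, alpha=0.5, beta=2.0, seed=42)
print(round(statistics.median(sample), 2))
```

Because the map from Z to T is monotone, the sample median should sit close to β, which is a quick sanity check on any implementation.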

Seminar organized by the Centro para el Descubrimiento de Estructuras en Datos Complejos - MiDaS.

http://midas.mat.uc.cl
2019-08-23
12:00 hrs.
Katherine R. McLaughlin. Department of Statistics, Oregon State University
Visibility Imputation for Population Size Estimation using Respondent-Driven Sampling
Sala 5, Facultad de Matemáticas, Edificio Rolando Chuaqui, Campus San Joaquin, Pontificia Universidad Católica de Chile
Abstract:
Respondent-driven sampling (RDS) is a network sampling method commonly used to access hidden populations, such as those at high risk for HIV/AIDS and related diseases, in situations where sampling frames do not exist and conventional sampling techniques are not possible. In RDS, participants recruit their peers into the study; this has proven effective as an enrollment strategy but requires careful statistical analysis when making inference about the population. Data from RDS surveys inform key policy and resource allocation decisions; in particular, population size estimates are essential for understanding the number of at-risk individuals, developing counseling and treatment programs, and monitoring health needs and epidemics. Successive sampling population size estimation (SS-PSE) is a commonly used method to estimate population size from RDS surveys, in which the decrease in the social network sizes of participants over the study period is used to gauge the sample fraction. However, SS-PSE relies on self-reported social network sizes, which are subject to missingness, misreporting, and bias, and it is not robust to extreme values. In this talk, we present a modification to the SS-PSE methodology that jointly models the effective social network size of each individual along with the population size in a Bayesian framework. The model for effective network size, which we call visibility to reflect its use as a proxy for inclusion probability, incorporates a measurement error model for self-reported social network size, as well as the number of recruits an individual was able to enroll and the time they had to recruit. We present and assess the imputed visibility SS-PSE framework and demonstrate its utility using an RDS study of people who inject drugs (PWID) in Kosovo.
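The successive sampling mechanism behind SS-PSE can be simulated in a few lines: units are drawn without replacement with probability proportional to their network size, so high-degree people tend to enter early and the mean degree of recruits falls as the sample exhausts the population. This is a minimal sketch of that generic mechanism only, not the imputed-visibility model; the population and the function name `successive_sample` are hypothetical.

```python
import random
import statistics


def successive_sample(degrees, n, seed=None):
    """Successive sampling: at each step, draw one remaining unit with
    probability proportional to its degree, without replacement.
    Returns the degrees in the order they were drawn."""
    rng = random.Random(seed)
    remaining = list(degrees)
    drawn = []
    for _ in range(n):
        i = rng.choices(range(len(remaining)), weights=remaining, k=1)[0]
        drawn.append(remaining.pop(i))
    return drawn


# Hypothetical population of 1,000 people with network sizes between 1 and 50.
pop_rng = random.Random(7)
population = [pop_rng.randint(1, 50) for _ in range(1000)]

order = successive_sample(population, n=500, seed=11)
early, late = statistics.mean(order[:100]), statistics.mean(order[-100:])
print(f"mean degree, first 100 draws: {early:.1f}; last 100 draws: {late:.1f}")
```

The size of the drop between early and late draws depends on the sample fraction, which is the signal SS-PSE uses to infer the unknown population size.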
2019-07-19
12:00hrs.
Jorge Luis Bazán. Department of Applied Mathematics and Statistics, University of São Paulo
Performance of asymmetric links and correction methods for imbalanced data in binary regression
Sala 5, Facultad de Matemáticas, Edificio Rolando Chuaqui, Campus San Joaquin, Pontificia Universidad Católica de Chile
Abstract:
In binary regression, imbalanced data arise when the proportion of zeros (or ones) is significantly greater than the proportion of ones (or zeros). In this work, we evaluate two methods developed to deal with imbalanced data and compare them with the use of asymmetric links. Simulation results show that the correction methods do not adequately correct the bias in the estimation of regression coefficients, and that models with power and reverse power links produce better results for certain types of imbalanced data. Additionally, we present an application to imbalanced data, identifying the best model among the various ones proposed. The parameters are estimated using a Bayesian approach based on Hamiltonian Monte Carlo with the No-U-Turn Sampler algorithm, and the models were compared using different criteria for model comparison, predictive evaluation and quantile residuals.
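To see why power links are asymmetric, the sketch below assumes one common parametrization with a logistic baseline F: the power link sets P(y = 1) = F(η)^λ and the reverse power link sets P(y = 1) = 1 − F(−η)^λ. Both reduce to the ordinary logit when λ = 1. The function names are illustrative, and the talk's exact parametrization may differ.

```python
import math


def logistic(eta):
    """Standard logistic CDF (the symmetric logit baseline)."""
    return 1.0 / (1.0 + math.exp(-eta))


def power_logit(eta, lam):
    """P(y = 1) under a power logit link: F(eta) ** lam."""
    return logistic(eta) ** lam


def reverse_power_logit(eta, lam):
    """P(y = 1) under a reverse power logit link: 1 - F(-eta) ** lam."""
    return 1.0 - logistic(-eta) ** lam


# With lam = 1 both reduce to the logit link; lam != 1 makes the curve asymmetric.
for eta in (-2.0, 0.0, 2.0):
    print(eta, round(power_logit(eta, 3.0), 3), round(reverse_power_logit(eta, 3.0), 3))
```

At η = 0 the logit gives 0.5, whereas with λ = 2 the power link gives 0.25 and the reverse power link 0.75, which is how these links can shift probability mass toward the rare class.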

Seminar organized by the Centro para el Descubrimiento de Estructuras en Datos Complejos - MiDaS.

http://midas.mat.uc.cl
2019-07-19
11:00hrs.
Caio L. N. Azevedo. Department of Statistics, Institute of Mathematics, Statistics and Scientific Computing, University of Campinas, Brazil.
Birnbaum-Saunders Linear Mixed-Effects Models with Censored Data: Bayesian MCMC Inference
Sala 5, Facultad de Matemáticas, Edificio Rolando Chuaqui, Campus San Joaquin, Pontificia Universidad Católica de Chile
Abstract:

Linear mixed-effects models are commonly used in data analysis when the responses are clustered around some random effects. This paper focuses on Bayesian inference for log-Birnbaum-Saunders linear mixed (log-BSLM) models, which were previously studied in the literature from a frequentist point of view. We explore Markov chain Monte Carlo (MCMC) methods, which provide an alternative to the marginal maximum likelihood approach, since the latter depends on an approximation of the likelihood. Besides parameter estimation, we developed residual analysis, influence diagnostics, model comparison and Bayesian prediction. We developed two MCMC algorithms, with and without an acceleration procedure. Simulation studies under different scenarios of interest show that the Bayesian approach generally provides better results than the frequentist one. In addition, the algorithm with the acceleration procedure showed better convergence than the usual MCMC approach. A real dataset is also analyzed, showing that our approach works properly. Finally, some directions toward extensions are discussed.
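If T follows a Birnbaum-Saunders distribution BS(α, β), then Y = log T follows a sinh-normal law with location μ = log β and scale 2, because Z = (2/α) sinh((Y − μ)/2) is standard normal. The sketch below codes that log-density and the conditional log-likelihood an MCMC step would evaluate with the random effects held fixed. The function names and the layout of the linear predictor are our own assumptions, not the authors' implementation.

```python
import math


def log_bs_logpdf(y, alpha, mu, sigma=2.0):
    """Log-density of Y = log(T) with T ~ BS(alpha, exp(mu)): the sinh-normal
    law, where Z = (2/alpha) * sinh((Y - mu)/sigma) is standard normal."""
    u = (y - mu) / sigma
    z = (2.0 / alpha) * math.sinh(u)
    return (math.log(2.0 / (alpha * sigma * math.sqrt(2.0 * math.pi)))
            + math.log(math.cosh(u)) - 0.5 * z * z)


def conditional_loglik(y, linear_predictor, alpha):
    """Log-likelihood of responses given the mixed-model linear predictor
    mu_ij = x_ij' beta + b_i, with the random effects b_i held fixed."""
    return sum(log_bs_logpdf(yi, alpha, mi) for yi, mi in zip(y, linear_predictor))


print(conditional_loglik([0.1, -0.2, 0.3], [0.0, 0.0, 0.2], alpha=0.5))
```

The density is symmetric about μ, which is why log-BS models are often written as location-scale regressions on the log scale.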

Seminar organized by the Centro para el Descubrimiento de Estructuras en Datos Complejos - MiDaS.

http://midas.mat.uc.cl
2019-06-14
12:00hrs.
Daniela Castro. University of Glasgow
A spliced Gamma-Generalized Pareto model for short-term extreme wind speed probabilistic forecasting
Sala 5, Facultad de Matemáticas, Edificio Rolando Chuaqui, Campus San Joaquin, Pontificia Universidad Católica de Chile
Abstract:
Renewable sources of energy, such as wind power, have become a sustainable alternative to fossil fuel-based energy. However, the uncertainty and fluctuation of wind speed, which derive from its intermittent nature, pose a serious threat to the stability of wind power production and to the wind turbines themselves. Much recent work has focused on models that forecast average wind speed values, yet surprisingly little has focused on models that accurately forecast extreme wind speeds, which can damage the turbines. In this work, we develop a flexible spliced Gamma-Generalized Pareto model to forecast extreme and non-extreme wind speeds simultaneously. Our model belongs to the class of latent Gaussian models, for which inference is conveniently performed based on the integrated nested Laplace approximation method. Considering a flexible additive regression structure, we propose two models for the latent linear predictor to capture the spatio-temporal dynamics of wind speeds. Our models are fast to fit and can describe both the bulk and the tail of the wind speed distribution while producing short-term extreme and non-extreme wind speed probabilistic forecasts.
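A spliced bulk-tail model can be written as one CDF: below a threshold u, a Gamma distribution renormalized to [0, u] carries mass 1 − φ; above u, a Generalized Pareto distribution for the exceedance X − u carries the remaining mass φ. The sketch below assumes this standard splicing construction with fixed, hypothetical parameters; it does not reproduce the latent Gaussian regression structure or the INLA fitting described in the talk.

```python
import math


def gamma_cdf(x, shape, scale):
    """Regularized lower incomplete gamma P(shape, x/scale) via its power
    series; adequate for moderate values of x/scale."""
    if x <= 0.0:
        return 0.0
    z = x / scale
    term = 1.0 / shape
    total = term
    for n in range(1, 1000):
        term *= z / (shape + n)
        total += term
        if term < 1e-16 * total:
            break
    return math.exp(shape * math.log(z) - z - math.lgamma(shape)) * total


def gpd_cdf(y, xi, sigma):
    """Generalized Pareto CDF for an exceedance y >= 0."""
    if y <= 0.0:
        return 0.0
    if xi == 0.0:
        return 1.0 - math.exp(-y / sigma)
    base = 1.0 + xi * y / sigma
    if base <= 0.0:  # beyond the upper endpoint when xi < 0
        return 1.0
    return 1.0 - base ** (-1.0 / xi)


def spliced_cdf(x, u, phi, shape, scale, xi, sigma):
    """Gamma bulk on [0, u] with mass 1 - phi; GPD tail above u with mass phi."""
    if x <= u:
        return (1.0 - phi) * gamma_cdf(x, shape, scale) / gamma_cdf(u, shape, scale)
    return (1.0 - phi) + phi * gpd_cdf(x - u, xi, sigma)


# Hypothetical parameters: probability that wind speed exceeds 25 m/s.
params = dict(u=15.0, phi=0.1, shape=2.0, scale=4.0, xi=0.1, sigma=3.0)
print(1.0 - spliced_cdf(25.0, **params))
```

Splicing lets the tail shape ξ be estimated from extremes without being dominated by the far more numerous moderate wind speeds in the bulk.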
2019-06-07
12:00hrs.
Multidisciplinary challenges and opportunities for a research program in human neuroscience in the current context
Sala 5, Facultad de Matemáticas, Edificio Rolando Chuaqui, Campus San Joaquin, Pontificia Universidad Católica de Chile
Abstract:
The last decade gave rise to one of the most interesting concepts in cognitive neuroscience: the study of cognition in its natural state. This encompasses a multidisciplinary effort that seeks to develop a new phase in the understanding of the biophysics of human experience. Given the complexity of the object of study, trying to draw boundaries between disciplines is fruitless; the goals range from the development of software and hardware to new hypotheses, analyses and interventions. This talk describes the current context regarding the main challenges and opportunities in measuring, analyzing and understanding the brain/body dynamics that underlie cognitive processes and human subjectivity, with particular emphasis on the opportunities available for interdisciplinary collaboration.

This work is part of FONDECYT project No. 11180620, "From the perception-action cycle to social interaction: exploring neural correlates of human interaction".
2019-05-17
12:00 hrs.
Federico Crudu. Department of Economics and Statistics, Università di Siena
Inference in instrumental variables models with heteroskedasticity and many instruments
Sala 5, Facultad de Matemáticas, Edificio Rolando Chuaqui, Campus San Joaquin, Pontificia Universidad Católica de Chile
Abstract:
This paper proposes a specification test for instrumental variable models that is robust to the presence of heteroskedasticity. The test can be seen as a generalization of the Anderson-Rubin test. Our approach is based on the jackknife principle. We show that, under the null hypothesis, the proposed statistic has a Gaussian limiting distribution. Moreover, a simulation study shows that it has competitive finite-sample properties in terms of size and power.
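The jackknife idea can be illustrated with one common form of a jackknife Anderson-Rubin-type statistic: take the residuals e = y − Xβ₀ under the null, form the quadratic form in the instrument projection matrix P with its diagonal removed (so no residual is paired with itself), and standardize by a variance estimate built from squared residuals. This is a sketch under our own assumptions; the statistic in the paper may differ in details such as the variance estimator, and the data below are simulated and hypothetical.

```python
import numpy as np


def jackknife_ar(y, X, Z, beta0):
    """Jackknife Anderson-Rubin-type statistic for H0: beta = beta0.
    Uses the projection matrix P = Z (Z'Z)^{-1} Z' with its diagonal removed,
    so that each residual is never paired with itself."""
    e = y - X @ beta0                      # structural residuals under H0
    P = Z @ np.linalg.solve(Z.T @ Z, Z.T)  # projection onto the instruments
    P_off = P - np.diag(np.diag(P))        # leave-own-observation-out
    K = Z.shape[1]
    numerator = e @ P_off @ e / np.sqrt(K)
    # Variance estimate built from squared residuals, so it tolerates heteroskedasticity.
    phi = 2.0 / K * np.sum(P_off ** 2 * np.outer(e ** 2, e ** 2))
    return float(numerator / np.sqrt(phi))


# Hypothetical data: 400 observations, 30 instruments, heteroskedastic errors.
rng = np.random.default_rng(3)
n, K = 400, 30
Z = rng.standard_normal((n, K))
X = (Z @ np.full(K, 0.1) + rng.standard_normal(n)).reshape(-1, 1)
u = rng.standard_normal(n) * (0.5 + np.abs(Z[:, 0]))
y = X[:, 0] * 1.0 + u
stat = jackknife_ar(y, X, Z, beta0=np.array([1.0]))
print(round(stat, 3))  # compare with standard normal critical values
```

Removing the diagonal of P is what makes the numerator centred at zero under the null even when the number of instruments grows with the sample size.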
2019-05-10
12:00hrs.
Alessandra Guglielmi. Department of Mathematics, Politecnico di Milano
Determinantal Point Process Mixtures Via Spectral Density Approach
Sala 5, Facultad de Matemáticas, Edificio Rolando Chuaqui, Campus San Joaquin, Pontificia Universidad Católica de Chile
Abstract:
We consider mixture models where location parameters are a priori encouraged to be well separated. We explore a class of determinantal point process (DPP) mixture models, which provide the desired notion of separation or repulsion. Instead of using the rather restrictive case where analytical results are partially available, we adopt a spectral representation from which approximations to the DPP density functions can be readily computed. For the sake of concreteness, the presentation focuses on a power exponential spectral density, but the proposed approach is in fact quite general. We later extend our model to incorporate covariate information in the likelihood and also in the assignment to mixture components, yielding a trade-off between repulsiveness of locations in the mixtures and attraction among subjects with similar covariates. We develop full Bayesian inference, and explore model properties and posterior behavior using several simulation scenarios and data illustrations. The talk is based on the following paper: Bianchini, Guglielmi, Quintana (2019), Determinantal Point Process Mixtures Via Spectral Density Approach, Bayesian Analysis.
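The repulsion that a DPP induces can be seen from the determinant of a kernel matrix evaluated at the component locations: when two locations are close, the corresponding rows are nearly identical and the determinant collapses toward zero. The sketch below uses a Gaussian kernel on one-dimensional locations purely to illustrate that mechanism; it is not the spectral density approximation used in the talk, and the function names are hypothetical.

```python
import numpy as np


def gaussian_kernel_matrix(locations, length_scale):
    """Gram matrix K_ij = exp(-(x_i - x_j)^2 / (2 * length_scale^2)) for 1-D locations."""
    x = np.asarray(locations, dtype=float)
    sq_dist = (x[:, None] - x[None, :]) ** 2
    return np.exp(-sq_dist / (2.0 * length_scale ** 2))


def repulsion_weight(locations, length_scale=1.0):
    """det(K): an unnormalized L-ensemble DPP weight for a set of locations.
    Nearby locations make rows of K nearly collinear, driving det(K) toward 0."""
    return float(np.linalg.det(gaussian_kernel_matrix(locations, length_scale)))


print(repulsion_weight([0.0, 0.1, 2.0]))  # two close centres: small weight
print(repulsion_weight([0.0, 1.5, 3.0]))  # well-separated centres: larger weight
```

Used as a prior on mixture locations, such a weight penalizes redundant components that sit on top of each other.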
2019-04-05
12:00hrs.
Ilias Diakonikolas. University of Southern California
Algorithmic Questions in High-Dimensional Robust Statistics
Sala 5, Facultad de Matemáticas, Edificio Rolando Chuaqui, Campus San Joaquin, Pontificia Universidad Católica de Chile
Abstract:
Fitting a model to a collection of observations is one of the quintessential questions in statistics. The standard assumption is that the data was generated by a model of a given type (e.g., a mixture model). This simplifying assumption is at best only approximately valid, as real datasets are typically exposed to some source of contamination. Hence, any estimator designed for a particular model must also be robust in the presence of corrupted data. This is the prototypical goal in robust statistics, a field that took shape in the 1960s with the pioneering works of Tukey and Huber. Until recently, even for the basic problem of robustly estimating the mean of a high-dimensional dataset, all known robust estimators were hard to compute. Moreover, the quality of the common heuristics degrades badly as the dimension increases.

In this talk, we will survey the recent progress in algorithmic high-dimensional robust statistics. We will describe the first computationally efficient algorithms for robust mean and covariance estimation and the main insights behind them. We will also present practical applications of these estimators to exploratory data analysis and adversarial machine learning. Finally, we will discuss new directions and opportunities for future work.

The talk will be based on a number of joint works with (various subsets of) G. Kamath, D. Kane, J. Li, A. Moitra, and A. Stewart.
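The intuition behind filtering-style robust mean estimators can be sketched as follows: if the inliers have identity covariance, corrupted points that shift the mean must also inflate the variance in some direction, so one repeatedly finds the top eigenvector of the empirical covariance and discards the points that project farthest along it. This is a heavily simplified heuristic under that known-covariance assumption, not one of the algorithms with provable guarantees discussed in the talk; the threshold, removal fraction and data are hypothetical.

```python
import numpy as np


def filtered_mean(X, eig_threshold=1.5, frac=0.01, max_iter=100):
    """Simplified spectral filter: while the empirical covariance has an
    unusually large eigenvalue, drop the points that project farthest along
    its top eigenvector, then return the mean of what remains."""
    X = np.asarray(X, dtype=float)
    for _ in range(max_iter):
        centered = X - X.mean(axis=0)
        cov = centered.T @ centered / len(X)
        eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
        if eigvals[-1] <= eig_threshold:
            break  # no direction of suspiciously large variance is left
        scores = np.abs(centered @ eigvecs[:, -1])
        n_drop = max(1, int(frac * len(X)))
        X = X[np.argsort(scores)[:-n_drop]]  # keep all but the n_drop largest scores
    return X.mean(axis=0)


# Hypothetical data: 2,000 N(0, I) inliers in 20 dimensions plus 222 identical outliers.
rng = np.random.default_rng(0)
d = 20
inliers = rng.standard_normal((2000, d))
outliers = np.tile(np.full(d, 10.0 / np.sqrt(d)), (222, 1))  # outlier point has norm 10
data = np.vstack([inliers, outliers])

naive = data.mean(axis=0)
robust = filtered_mean(data)
print(np.linalg.norm(naive), np.linalg.norm(robust))
```

The true mean is zero, so the printed norms are the estimation errors: about 10% contamination moves the naive mean by roughly one unit, while the filtered estimate stays close to the origin.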

Seminar organized by the Centro para el Descubrimiento de Estructuras en Datos Complejos - MiDaS.
http://midas.mat.uc.cl
2019-03-22
12:00 hrs.
Debajyoti Sinha. Florida State University
Semiparametric Bayesian latent variable regression for skewed multivariate data
Sala 5, Facultad de Matemáticas, Edificio Rolando Chuaqui, Campus San Joaquin, Pontificia Universidad Católica de Chile
Abstract:
For many real-life studies with skewed multivariate responses, the level of skewness and association structure assumptions are essential for evaluating the covariate effects on the response and its predictive distribution. We present a novel semiparametric multivariate model and associated Bayesian analysis for multivariate skewed responses. Like the multivariate Gaussian, this multivariate model is closed under marginalization, allows a wide class of multivariate associations, and has meaningful physical interpretations of skewness levels and covariate effects on the marginal density. Other desirable properties of our model include Markov chain Monte Carlo computation through available statistical software and the assurance of consistent Bayesian estimates of the parameters and the nonparametric error density under a set of plausible prior assumptions. We illustrate the practical advantages of our methods over existing alternatives via simulation studies, the analysis of a clinical study on periodontal disease and extensions to Bayesian regression trees.

This is joint work with Drs. A. Bhingare, S. Lipsitz, D. Bandyopadhyay and A. Linero.

Seminar organized by the Centro para el Descubrimiento de Estructuras en Datos Complejos - MiDaS.
http://midas.mat.uc.cl
2019-01-11
12:00 hrs.
David Dahl. Department of Statistics, Brigham Young University
Summarizing Distributions of Latent Structure
Sala 5, Facultad de Matemáticas, Edificio Rolando Chuaqui, Campus San Joaquin, Pontificia Universidad Católica de Chile
Abstract:
In a typical Bayesian analysis, considerable effort is placed on "fitting the model" (e.g., obtaining samples from the posterior distribution), but this is only half of the inference problem. Meaningful inference usually requires summarizing the posterior distribution of the parameters of interest. Posterior summaries can be especially important in communicating the results and conclusions of a Bayesian analysis to a diverse audience. If the parameters of interest live in R^n, common posterior summaries are means, medians, and modes. Summarizing posterior distributions of parameters with complicated structure is a more difficult problem. For example, the "average" network in a posterior distribution over networks is not easily defined. This paper reviews methods for summarizing distributions of latent structure and then proposes a novel search algorithm for posterior summaries. We apply our method to distributions on variable selection indicators, partitions, feature allocations, and networks. We illustrate our approach in a variety of models for both simulated and real datasets.
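For partitions, one long-standing summary is least-squares clustering: estimate the posterior probability that each pair of items is clustered together, then choose the sampled partition whose co-clustering indicators are closest to those probabilities in squared error. The sketch below implements that older approach to show what "summarizing a distribution of partitions" means in practice; it is not the novel search algorithm proposed in the talk, and the MCMC draws are hypothetical.

```python
from itertools import combinations


def posterior_similarity(samples):
    """Fraction of posterior samples in which items i and j share a cluster."""
    n_items, n_samples = len(samples[0]), len(samples)
    return {
        (i, j): sum(s[i] == s[j] for s in samples) / n_samples
        for i, j in combinations(range(n_items), 2)
    }


def least_squares_partition(samples):
    """Return the sampled partition whose co-clustering indicators are closest,
    in squared error, to the posterior similarity matrix."""
    psm = posterior_similarity(samples)

    def loss(labels):
        return sum(((labels[i] == labels[j]) - p) ** 2 for (i, j), p in psm.items())

    return min(samples, key=loss)


# Hypothetical MCMC output: cluster labels for 4 items over 4 iterations.
draws = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 0, 1], [0, 1, 1, 1]]
print(least_squares_partition(draws))  # [0, 0, 1, 1]
```

Restricting the search to sampled partitions is the main limitation of this approach, which is one motivation for dedicated search algorithms over the space of partitions.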

Seminar organized by the Centro para el Descubrimiento de Estructuras en Datos Complejos - MiDaS.
http://midas.mat.uc.cl
2019-01-10
12:00 hrs.
Dae-Jin Lee. BCAM - Basque Center for Applied Mathematics
Hierarchical modelling of patient-reported outcomes data based on the beta-binomial distribution
Sala 1, Facultad de Matemáticas, Edificio Rolando Chuaqui, Campus San Joaquin, Pontificia Universidad Católica de Chile
Abstract:
The beta-binomial distribution does not belong to the exponential family and, hence, classical regression techniques cannot be used when dealing with outcomes that follow this distribution. In this talk, we propose and develop regression models based on the beta-binomial distribution for the analysis of U-, J- or inverse J-shaped discrete and bounded outcomes. Although this work focuses on the analysis of patient-reported outcomes (PROs), which usually follow these distributional shapes, the proposed models can also be applied in several other fields. First, we review and compare existing beta-binomial regression approaches in the context of independent data, concluding that the marginal approach is the most adequate. However, PRO studies are usually carried out in a longitudinal framework, where patients' responses are measured over time. This leads to a multilevel or correlated data structure; consequently, we extend the marginal beta-binomial regression approach to include random effects that accommodate the hierarchical structure of the data. We develop the estimation and inference procedures for the proposed model. Furthermore, we compare the performance of our proposal with similar approaches in the literature, showing that it reduces the bias of the estimates. We apply the model to a longitudinal Chronic Obstructive Pulmonary Disease study carried out at Galdakao Hospital in Biscay, Spain, obtaining clinically and statistically relevant results about the evolution of the patients over time. PROs are usually obtained using rating scale questionnaires consisting of questions or items grouped into one or more subscales, often called dimensions or domains. Therefore, we also propose a multivariate regression model based on the beta-binomial distribution for the joint analysis of all the longitudinal dimensions provided by different questionnaires.
Finally, we have implemented all the proposed regression models in the PROreg R package, which is available on CRAN.
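The beta-binomial probability mass function shows how U- and J-shaped bounded scores arise. The sketch below codes P(Y = k) = C(m, k) B(k + α, m − k + β) / B(α, β) and a mean-dispersion mapping with μ = α/(α + β) and φ = 1/(α + β + 1). That mapping is one common parametrization assumed here for illustration; PROreg's own parametrization may differ, and the function names and values are hypothetical.

```python
import math


def log_beta(a, b):
    """log B(a, b) via log-gamma functions."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def betabinom_pmf(k, m, alpha, beta):
    """P(Y = k) for Y ~ BetaBinomial(m, alpha, beta), k = 0, ..., m."""
    return math.comb(m, k) * math.exp(log_beta(k + alpha, m - k + beta) - log_beta(alpha, beta))


def shape_from_mean_dispersion(mu, phi):
    """Map a mean mu in (0, 1) and dispersion phi = 1 / (alpha + beta + 1) in (0, 1)
    to the shape parameters (alpha, beta)."""
    s = 1.0 / phi - 1.0  # s = alpha + beta
    return mu * s, (1.0 - mu) * s


# A U-shaped score on a 0-10 scale, as is common for PROs (hypothetical values).
a, b = shape_from_mean_dispersion(mu=0.5, phi=0.5)
print([round(betabinom_pmf(k, 10, a, b), 3) for k in range(11)])
```

With α = β = 0.5 the probabilities pile up at both ends of the scale, the U shape that a binomial model cannot reproduce.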
2018-12-21
12:00hrs.
Tamara Fernandez. Gatsby Computational Neuroscience Unit
RKHS testing for censored data
Sala 5, Facultad de Matemáticas, Edificio Rolando Chuaqui, Campus San Joaquin, Pontificia Universidad Católica de Chile
Abstract:
We introduce kernel-based tests for censored data, where observations may be missing in random time intervals, a common occurrence in clinical trials and industrial life testing. Our approach is based on computing distances between embeddings of probability distributions in a reproducing kernel Hilbert space (RKHS). This approach has previously been applied with very good results in many machine learning and statistical settings. Its main advantage is the ability of kernels to deal with complex and high-dimensional data. In this talk, we return to the real-line problem, in which the complexity of the data is due to censored observations. In particular, we propose an extension of this set of tools to censored data, derive its asymptotic properties, and explain its relation to dominant approaches in survival analysis such as the log-rank test. We conclude with an empirical evaluation in which our methods outperform competing approaches in multiple scenarios.
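The basic RKHS building block is the maximum mean discrepancy (MMD): the distance between the kernel mean embeddings of two distributions, estimated from samples as mean k(x, x') + mean k(y, y') − 2 mean k(x, y). The sketch below computes the biased (V-statistic) estimate for fully observed one-dimensional samples with a Gaussian kernel. It shows only the uncensored starting point; the censored-data extension proposed in the talk is not reproduced, and the bandwidth and data are hypothetical.

```python
import math


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel on the real line."""
    return math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2))


def mean_kernel(xs, ys, bandwidth):
    """Average kernel value over all pairs (x, y)."""
    return sum(gaussian_kernel(a, b, bandwidth) for a in xs for b in ys) / (len(xs) * len(ys))


def mmd2_biased(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between two 1-D samples."""
    return (mean_kernel(xs, xs, bandwidth) + mean_kernel(ys, ys, bandwidth)
            - 2.0 * mean_kernel(xs, ys, bandwidth))


x = [0.0, 1.0, 2.0, 3.0]
print(mmd2_biased(x, [0.1, 1.1, 2.1, 3.1]), mmd2_biased(x, [5.0, 6.0, 7.0, 8.0]))
```

In a test, the observed value would be compared with a null distribution obtained, for example, by permutation or from asymptotic results.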

Seminar organized by the Centro para el Descubrimiento de Estructuras en Datos Complejos - MiDaS.
http://midas.mat.uc.cl
2018-11-30
12:00hrs.
A sequential approach to updating posterior information
Sala 5, Facultad de Matemáticas, Edificio Rolando Chuaqui, Campus San Joaquin, Pontificia Universidad Católica de Chile
Abstract:
In this talk we show the performance of a sequential Monte Carlo (SMC) algorithm. As a prerequisite for understanding it, we discuss the Metropolis-Hastings algorithm and illustrate the general idea of particle-based methods. The SMC algorithm presented here is a particular case of sequential methods in which the objective is to update the posterior distribution in "static" models.
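The sketch below shows the general shape of such an algorithm for a static model: particles start from the prior, are reweighted as each observation arrives, are resampled when the effective sample size drops, and are then moved with one random-walk Metropolis-Hastings step that targets the current posterior. We assume a normal mean with known unit variance and a N(0, 10²) prior so that the exact posterior mean is available for comparison; these choices and the function names are ours, not the algorithm presented in the talk.

```python
import math
import random
import statistics


def log_lik(theta, y, sigma=1.0):
    """Gaussian log-likelihood of one observation, up to a constant."""
    return -0.5 * ((y - theta) / sigma) ** 2


def normalize(logw):
    """Turn log-weights into normalized weights without overflow."""
    m = max(logw)
    w = [math.exp(lw - m) for lw in logw]
    s = sum(w)
    return [wi / s for wi in w]


def smc_posterior_mean(data, n_particles=2000, prior_sd=10.0, seed=1):
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, prior_sd) for _ in range(n_particles)]
    logw = [0.0] * n_particles
    for t, y in enumerate(data):
        # Reweight: the target moves from pi_{t-1} to pi_t by one likelihood term.
        logw = [lw + log_lik(th, y) for lw, th in zip(logw, particles)]
        w = normalize(logw)
        ess = 1.0 / sum(wi * wi for wi in w)
        if ess < n_particles / 2:
            # Resample, then rejuvenate with one random-walk MH step targeting pi_t.
            particles = rng.choices(particles, weights=w, k=n_particles)
            logw = [0.0] * n_particles
            seen = data[: t + 1]

            def log_target(th):
                return -0.5 * (th / prior_sd) ** 2 + sum(log_lik(th, yy) for yy in seen)

            step = statistics.pstdev(particles) or 1e-3
            moved = []
            for th in particles:
                prop = th + rng.gauss(0.0, step)
                accept_prob = math.exp(min(0.0, log_target(prop) - log_target(th)))
                moved.append(prop if rng.random() < accept_prob else th)
            particles = moved
    w = normalize(logw)
    return sum(wi * th for wi, th in zip(w, particles))


data = [0.5, 1.2, 0.8, 1.5, 1.0, 0.9, 1.1, 1.3, 0.7, 1.0]
est = smc_posterior_mean(data)
exact = sum(data) / (len(data) + 1.0 / 10.0 ** 2)  # conjugate normal posterior mean
print(round(est, 3), round(exact, 3))
```

The MH move is what keeps the particle cloud diverse after resampling; without it, repeated resampling would leave only a handful of distinct values.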

Seminar organized by the Centro para el Descubrimiento de Estructuras en Datos Complejos - MiDaS.
http://midas.mat.uc.cl
2018-11-23
12:00hrs.
Carlos Díaz Ávalos. IIMAS-UNAM, México
Spatial point processes as an analysis tool in ecology
Sala 5, Facultad de Matemáticas, Edificio Rolando Chuaqui, Campus San Joaquin, Pontificia Universidad Católica de Chile
Abstract:
Spatial point processes have gained popularity in recent years because of their usefulness in answering a variety of questions in scientific fields. In community ecology, point processes have proven useful for detecting intra- and interspecific interactions in forest ecosystems and for assessing the risk of, and the factors associated with, ecological disturbances such as forest fires. Although estimating model parameters in applications of spatial point processes can be complicated, computational advances have made acceptable numerical approximations possible, which has contributed to their use in many fields of human knowledge.

This talk presents a general overview of the theoretical foundations of spatial point processes, illustrated with an example of their application to the construction of forest fire risk maps.
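A risk map corresponds to the intensity function of an inhomogeneous Poisson process, and such a process can be simulated by thinning: generate a homogeneous process at a dominating rate λ_max, then keep each point with probability λ(x, y)/λ_max. The sketch below does this on the unit square with a hypothetical intensity that increases toward the east; it illustrates the basic model only, not the estimation methods or the fire data discussed in the talk.

```python
import random


def inhomogeneous_poisson(intensity, lam_max, seed=None):
    """Simulate an inhomogeneous Poisson process on the unit square by thinning:
    generate a homogeneous process with rate lam_max (> 0), then keep each point
    with probability intensity(x, y) / lam_max. Requires intensity <= lam_max."""
    rng = random.Random(seed)
    points = []
    # Exponential inter-arrival times on [0, 1) give a Poisson(lam_max) count
    # of candidate points, since the window has unit area.
    t = rng.expovariate(lam_max)
    while t < 1.0:
        x, y = rng.random(), rng.random()
        if rng.random() < intensity(x, y) / lam_max:
            points.append((x, y))
        t += rng.expovariate(lam_max)
    return points


def risk(x, y):
    """Hypothetical risk surface: intensity grows toward the east edge."""
    return 300.0 * x


pts = inhomogeneous_poisson(risk, lam_max=300.0, seed=5)
east = sum(1 for x, _ in pts if x > 0.5)
print(len(pts), east)
```

Fitting works in the opposite direction: given observed points, one estimates the intensity surface, often as a function of covariates, and maps it as relative risk.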

Seminar organized by the Centro para el Descubrimiento de Estructuras en Datos Complejos - MiDaS.
http://midas.mat.uc.cl
2018-11-16
12:00hrs.
Nishant Mehta. University of Victoria
Fast Rates for Unbounded Losses: from ERM to Generalized Bayes
Sala 5, Facultad de Matemáticas, Edificio Rolando Chuaqui, Campus San Joaquin, Pontificia Universidad Católica de Chile
Abstract:
I will present new excess risk bounds for randomized and deterministic estimators, discarding boundedness assumptions to handle general unbounded loss functions like log loss and squared loss under heavy tails. These bounds have a PAC-Bayesian flavor in both derivation and form, and their expression in terms of the information complexity forms a natural connection to generalized Bayesian estimators. The bounds hold with high probability and a fast $\tilde{O}(1/n)$ rate in parametric settings, under the recently introduced 'central' condition (or various weakenings of this condition with consequently weaker results) and a type of 'empirical witness of badness' condition. The former conditions are related to the Tsybakov margin condition in classification and the Bernstein condition for bounded losses, and they help control the lower tail of the excess loss. The 'witness' condition is new and suitably controls the upper tail of the excess loss. These conditions and our techniques revolve tightly around a pivotal concept, the generalized reversed information projection, which generalizes the reversed information projection of Li and Barron. Along the way, we connect excess risk (a KL divergence in our language) to a generalized Rényi divergence, generalizing previous results connecting Hellinger distance to KL divergence.

This is joint work with Peter Grünwald.
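A generalized (Gibbs) Bayesian posterior replaces the likelihood with an exponentiated loss, π(θ) ∝ prior(θ) · exp(−η Σᵢ loss(θ, xᵢ)); with log loss and η = 1 it reduces to the ordinary Bayesian posterior. The sketch below computes such a posterior on a finite grid under a uniform prior and squared loss, to make the role of the learning rate η visible. The grid, data and function name are hypothetical, and nothing here reproduces the bounds in the talk.

```python
import math


def generalized_posterior(grid, data, eta, loss=lambda theta, x: (x - theta) ** 2):
    """Weights of the eta-generalized (Gibbs) posterior on a finite grid with a
    uniform prior: pi(theta) proportional to exp(-eta * sum_i loss(theta, x_i))."""
    log_w = [-eta * sum(loss(theta, x) for x in data) for theta in grid]
    m = max(log_w)  # subtract the max before exponentiating, for stability
    w = [math.exp(lw - m) for lw in log_w]
    s = sum(w)
    return [wi / s for wi in w]


grid = [0.0, 0.5, 1.0, 1.5, 2.0]
data = [1.0, 1.2, 0.8]
for eta in (0.0, 1.0, 5.0):
    print(eta, [round(p, 3) for p in generalized_posterior(grid, data, eta)])
```

At η = 0 the posterior equals the prior, and as η grows it concentrates on the empirical risk minimizer, which is the ERM end of the spectrum named in the title.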

Seminar organized by the Centro para el Descubrimiento de Estructuras en Datos Complejos - MiDaS.
http://midas.mat.uc.cl
2018-10-26
12:00hrs.
Discovering Interactions Using Covariate Informed Random Partition Models
Sala 5, Facultad de Matemáticas, Edificio Rolando Chuaqui, Campus San Joaquin, Pontificia Universidad Católica de Chile
Abstract:
Combination chemotherapy treatment regimens created for patients diagnosed with childhood acute lymphoblastic leukemia have had great success in improving cure rates. Unfortunately, patients prescribed these types of treatment regimens have displayed susceptibility to the onset of osteonecrosis. Some have suggested that this is due to a pharmacokinetic interaction between two agents in the treatment regimen (asparaginase and dexamethasone) and other physiological variables. Determining which physiological variables to consider when searching for interactions in scenarios like these, without a priori guidance, has proved to be a challenging problem, particularly if interactions influence the response distribution in ways beyond shifts in expectation or dispersion only. In this paper we propose an exploratory technique that is able to discover associations between covariates and responses in a very general way. The procedure connects covariates to responses very flexibly through dependent random partition prior distributions, and then employs machine learning techniques to highlight potential associations found in each cluster. We apply the method to data produced from a study dedicated to learning which physiological predictors influence severity of osteonecrosis multiplicatively.
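Random partition priors assign a probability to every way of clustering the subjects. The simplest baseline is the Chinese restaurant process (Dirichlet process) prior, with probability α^K Π(n_k − 1)! Γ(α)/Γ(α + n) for a partition with K blocks of sizes n_k. The sketch below enumerates all partitions of a small set and evaluates that baseline. Covariate-informed and dependent partition priors such as those in the talk modify this baseline, for example by weighting blocks according to covariate similarity; that step is not shown, and the function names are hypothetical.

```python
import math


def set_partitions(items):
    """Yield every partition of a list of items as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` into each existing block in turn...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ...or give it a block of its own.
        yield [[first]] + partition


def crp_eppf(block_sizes, alpha):
    """Chinese restaurant process probability of a partition with the given
    block sizes: alpha^K * prod (n_k - 1)! * Gamma(alpha) / Gamma(alpha + n)."""
    n = sum(block_sizes)
    log_p = (len(block_sizes) * math.log(alpha)
             + sum(math.lgamma(size) for size in block_sizes)
             + math.lgamma(alpha) - math.lgamma(alpha + n))
    return math.exp(log_p)


for p in set_partitions([0, 1, 2]):
    print(p, round(crp_eppf([len(b) for b in p], alpha=1.0), 4))
```

Because this baseline depends only on block sizes, it cannot by itself favour grouping subjects with similar covariates, which is exactly what covariate-informed extensions add.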

Seminar organized by the Centro para el Descubrimiento de Estructuras en Datos Complejos - MiDaS.
http://midas.mat.uc.cl
2018-10-24
15:00hrs.
Leidy Rocío León Dávila. School of Mathematics and Statistics, Universidad Pedagógica y Tecnológica de Colombia
Book presentation: Análisis de Datos Categóricos
Sala 2, Facultad de Matemáticas, Edificio Rolando Chuaqui, Campus San Joaquin, Pontificia Universidad Católica de Chile
Abstract:
The text was written with a reader in mind who needs statistical tools for analyzing information, mainly of a categorical nature, resulting from research work. Although the primary audience is people working on health and biology problems, the statistical material offered can be used by researchers in other disciplines: it suffices to change the setting of the examples and illustrations to make this text a support tool for various disciplines.
2018-10-19
12:00hrs.
Claudio Chamorro. Department of Health Sciences, UC
Clinical prediction rule for determining which patients are candidates for shoulder rotator cuff surgery
Sala 5, Facultad de Matemáticas, Edificio Rolando Chuaqui, Campus San Joaquin, Pontificia Universidad Católica de Chile
Abstract: