Te Tari Pāngarau me te Tatauranga
Department of Mathematics & Statistics

Archived seminars in Statistics

Seminars 1 to 50

Epidemic modelling: successes and challenges

Valerie Isham, NZMS 2018 Forder Lecturer

University College London

Date: Monday 23 April 2018

Note time and venue of this public lecture.
Epidemic models are developed as a means of gaining understanding about the dynamics of the spread of infection (human and animal pathogens, computer viruses etc.) and of rumours and other information. This understanding can then inform control measures to limit spread, or in some cases enhance it (e.g., viral marketing). In this talk, I will give an introduction to simple generic epidemic models and their properties, the role of stochasticity and the effects of population structure (metapopulations and networks) on transmission dynamics, illustrating some past successes and outlining some future challenges.
Estimating dated phylogenetic trees with applications in epidemiology, immunology, and macroevolution

Alexandra Gavryushkina

Department of Biochemistry

Date: Monday 23 April 2018

Note day, time and venue for this seminar.
Newly available data require developing new approaches to reconstructing dated phylogenetic trees. In this talk, I will present new methods that employ birth-death-sampling models to reconstruct dated phylogenetic trees in a Bayesian framework. These methods have been successfully applied in epidemiology and macroevolution. Dated phylogenetic histories can be informative about past events: for example, we can learn from a reconstructed transmission tree which individuals were likely to have infected other individuals. By reconstructing dated phylogenetic trees, we can also learn about the parameters of the tree-generating process. For example, we can estimate and predict how fast epidemics spread or how fast new species arise or go extinct. In immunology, dating HIV antibody lineages can be important for vaccine design.
Confidence distributions

David Fletcher

Department of Mathematics and Statistics

Date: Thursday 19 April 2018

In frequentist statistics, it is common to summarise inference about a parameter using a point estimate and confidence interval. A useful alternative is a confidence distribution, first suggested by David Cox sixty years ago. This provides a visual summary of the set of confidence intervals obtained when we allow the confidence level to vary, and can be thought of as the frequentist analogue of a Bayesian posterior distribution. I will discuss the potential benefits of using confidence distributions and their link with Fisher's controversial concept of a fiducial distribution. I will also outline current work with Peter Dillingham and Jimmy Zeng on the calculation of a model-averaged confidence distribution.
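As a concrete toy example (mine, not from the talk): for a normal mean with known standard error, the confidence distribution is C(theta) = Phi((theta - xbar)/se), and any confidence interval can be read off as a pair of its quantiles.

```python
import math

# Confidence distribution for a normal mean (toy illustration):
# C(theta) = Phi((theta - xbar) / se). The 95% confidence interval
# endpoints are the 0.025 and 0.975 quantiles of this distribution.
def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))

def cd(theta, xbar, se):
    return Phi((theta - xbar) / se)

xbar, se = 10.0, 2.0
lo, hi = xbar - 1.96 * se, xbar + 1.96 * se
print(cd(lo, xbar, se), cd(hi, xbar, se))   # ≈ 0.025 and 0.975
```

Varying the pair of quantiles recovers the whole family of confidence intervals at once, which is exactly the "visual summary" the abstract describes.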
A statistics-related seminar in Preventive and Social Medicine: Meta-analysis and its implications for public health policy decisions

Andrew Anglemyer

Naval Postgraduate School, California

Date: Wednesday 4 April 2018

When recommending policies, clinical guidelines, and treatment decisions, policy makers and practitioners alike can benefit greatly from clear evidence obtained from available empirical data. Methods for synthesizing these data that have been developed for use in clinical environments may prove to be a powerful tool in evidence-based decision making in other fields as well. In this discussion, I will give an overview of how meta-analysis techniques have provided guidance in public health policy decisions (e.g., HIV treatment guidelines), of methods for synthesizing data, and of possible limitations of these approaches. Additionally, I will apply meta-analysis techniques to a uniquely Kiwi question to illustrate possible ways to provide guidance in health decisions.
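For a flavour of the data-synthesis methods mentioned, here is a minimal inverse-variance fixed-effect meta-analysis sketch (the study numbers are hypothetical, not from the talk):

```python
import math

# Fixed-effect meta-analysis: the pooled estimate is the
# precision-weighted mean of the study estimates, and the pooled
# variance is the reciprocal of the total precision.
def fixed_effect(estimates, variances):
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return pooled, se

# three hypothetical log-odds-ratio estimates with their variances
est, se_pooled = fixed_effect([-0.4, -0.2, -0.3], [0.04, 0.09, 0.02])
print(round(est, 3), round(se_pooled, 3))   # pooled ≈ -0.316, SE ≈ 0.108
```

A random-effects model would add a between-study variance component; this sketch ignores heterogeneity entirely, which is one of the limitations such analyses must confront.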

Dr. Andrew Anglemyer is an epidemiologist who specializes in infectious diseases and study design methodology at the Naval Postgraduate School (and previously at the University of California, San Francisco). Since 2009 he has been a member of the World Health Organization’s HIV Treatment Guidelines development committee, and he was the statistics and methods editor for the HIV/AIDS Cochrane Review Group at UC San Francisco until 2014. Dr. Anglemyer has co-authored dozens of peer-reviewed public health and clinical papers on a wide range of topics, including HIV prevention and treatment in high-risk populations, firearms-related injury, paediatric encephalitis and hyponatremia. He received an MPH in Epidemiology/Biostatistics and a PhD in Epidemiology from the University of California, Berkeley.
A statistics-related seminar in Public Health - Mapping for public health: Effective use of spatial analysis to communicate epidemiological information

Jason Gilliland

Western University, Canada

Date: Thursday 29 March 2018

In this seminar I will present some background and lessons on the use of mapping and spatial analytical methods for public health. With practical examples from my own research, I will cover some important considerations for public health researchers wanting to bring GIS-based analyses into their own projects. The presentation will focus on key methodological issues related to using spatial data which are often overlooked by epidemiologists and other health researchers. Discussion will revolve around opportunities for using qualitative data in Health GIS projects and some other future directions and challenges.
Professor Jason Gilliland is Director of the Urban Development Program and Professor in the Dept of Geography, Dept of Paediatrics, School of Health Studies and Dept of Epidemiology & Biostatistics at Western University in Canada. He is also a Scientist with the Children's Health Research Institute and the Lawson Health Research Institute, two of Canada's leading hospital-based research institutes. His research is primarily focused on identifying environmental influences on children’s health issues such as poor nutrition, physical inactivity, obesity, and injury. He is also Director of the Human Environments Analysis Lab, an innovative research and training environment which specializes in community-based research and identifying interventions to inform public policy and neighbourhood design to promote the health and quality of life of children and youth.
Genetic linkage map construction in the next generation sequencing era: do old frameworks work with new challenges?

Phil Wilcox

Department of Mathematics and Statistics

Date: Thursday 29 March 2018

The low cost and high throughput of new DNA sequencing technologies have led to a data ‘revolution’ in genomics: two-to-three orders of magnitude more data can be generated for the same cost compared to previous technologies. This has facilitated genome-wide investigations in non-model species at scales not previously possible. However, these new technologies also present new challenges, particularly in genetic linkage mapping, where errors due to sequencing and heterozygote undercalling upwardly bias estimates of linkage map lengths and create difficulties in reliably ordering clustered loci. In this talk I will describe the application of an exome-capture-based genotyping panel to genetic linkage map construction in Pinus radiata D.Don. I will show that previously applied approaches first proposed in the mid-1990s still provide a suitable analytical framework for constructing robust linkage maps even in this modern data-rich era.
Case-control logistic regression is more complicated than you think

Thomas Lumley

University of Auckland

Date: Thursday 22 March 2018

It is a truth universally acknowledged that logistic regression gives consistent and fully efficient estimates of the regression parameter under case-control sampling, so we can often ignore the distinction between retrospective and prospective sampling. I will talk about two issues that are more complicated than this. First, the behaviour of pseudo-r² statistics under case-control sampling: most of these are not consistently estimated. Second, the question of when and why unweighted logistic regression is much more efficient than survey-weighted logistic regression: the traditional answers of 'always' and 'because of variation in weights' are wrong.
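The textbook result behind the first sentence, that the slope is consistent under case-control sampling while the intercept absorbs the log of the ratio of the case and control sampling fractions, can be checked in a small simulation (a sketch with illustrative parameters, not code from the talk):

```python
import math
import random

# Simulate a population, sample all cases and 20% of controls, then fit
# ordinary (unweighted) logistic regression by Newton's method.
random.seed(42)
a, b = -2.0, 1.0                              # true intercept and slope

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

sample = []
for _ in range(50000):
    x = random.gauss(0.0, 1.0)
    y = 1 if random.random() < sigmoid(a + b * x) else 0
    if y == 1 or random.random() < 0.2:       # case-control sampling
        sample.append((x, y))

b0, b1 = 0.0, 0.0
for _ in range(25):
    g0 = g1 = h00 = h01 = h11 = 0.0
    for x, y in sample:
        p = sigmoid(b0 + b1 * x)
        w = p * (1.0 - p)
        g0 += y - p                           # score vector
        g1 += (y - p) * x
        h00 += w                              # information matrix
        h01 += w * x
        h11 += w * x * x
    det = h00 * h11 - h01 * h01
    b0 += (h11 * g0 - h01 * g1) / det         # Newton step
    b1 += (h00 * g1 - h01 * g0) / det

# slope stays near the true value 1; intercept shifts by about log(5)
print(round(b0, 2), round(b1, 2))
```

The intercept offset is log(1/0.2) = log 5 here; anything computed from the fitted intercept, such as most pseudo-r² statistics, inherits the distortion the talk describes.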
Visual trumpery: How charts lie - and how they make us smarter

Ihaka Lecture #3: Alberto Cairo

University of Miami

Date: Wednesday 21 March 2018

With facts and truth increasingly under assault, many interest groups have enlisted charts — graphs, maps, diagrams, etc. — to support all manner of spin. Because digital images are inherently shareable and can quickly amplify messages, sifting through the visual information and misinformation is an important skill for any citizen.

The use of graphs, charts, maps and infographics to explore data and communicate science to the public has become more and more popular. However, this rise in popularity has not been accompanied by an increasing awareness of the rules that should guide the design of these visualisations.

This talk teaches normal citizens principles for becoming more critical and better-informed readers of charts.

Alberto Cairo is the Knight Chair in Visual Journalism at the University of Miami. He’s also the director of the visualisation programme at UM’s Center for Computational Science. Cairo has been a director of infographics and multimedia at news publications in Spain (El Mundo, 2000-2005) and Brazil (Editora Globo, 2010-2012), and a professor at the University of North Carolina-Chapel Hill. Besides teaching at UM, he works as a freelancer and consultant for companies such as Google and Microsoft. He’s the author of the books The Functional Art: An Introduction to Information Graphics and Visualization (2012) and The Truthful Art: Data, Charts, and Maps for Communication (2016).

The lectures are live-streamed from 6.30pm NZDT onwards on 7, 14 and 21 March 2018.

Join the local group in the Mathematics and Statistics Department for this live-stream viewing and discussion.
Making colour accessible

Ihaka Lecture #2: Paul Murrell

University of Auckland

Date: Wednesday 14 March 2018

The 'BrailleR' package for R generates text descriptions of R plots.

When combined with screen reader software, this provides information for blind and visually-impaired R users about the contents of an R plot. A minor difficulty that arises in the generation of these text descriptions involves the information about colours within a plot. As far as R is concerned, colours are described as six-digit hexadecimal strings, e.g. "#123456", but that is not very helpful for a human audience. It would be more useful to report colour names like "red" or "blue".

This talk will make a mountain out of that molehill and embark on a daring Statistical Graphics journey featuring colour spaces, high-performance computing, Te Reo, and XKCD. The only disappointment will be the ending.
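The molehill itself, naively, is a nearest-named-colour lookup. A minimal sketch (the colour list and distance metric are illustrative assumptions, not BrailleR's actual implementation):

```python
# Map a hex colour string to the nearest of a small set of named
# colours by Euclidean distance in RGB space. Illustrative only:
# BrailleR's real method and colour vocabulary differ.
NAMED = {
    "red": (255, 0, 0),
    "green": (0, 128, 0),
    "blue": (0, 0, 255),
    "yellow": (255, 255, 0),
    "black": (0, 0, 0),
    "white": (255, 255, 255),
}

def hex_to_rgb(h):
    h = h.lstrip("#")
    return tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))

def nearest_name(hex_colour):
    r, g, b = hex_to_rgb(hex_colour)
    def dist2(name):
        nr, ng, nb = NAMED[name]
        return (nr - r) ** 2 + (ng - g) ** 2 + (nb - b) ** 2
    return min(NAMED, key=dist2)

print(nearest_name("#fe0102"))   # → red
```

Part of the mountain is that Euclidean distance in RGB matches human perception poorly, which is where the colour spaces mentioned in the abstract come in.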

~~Paul Murrell is an Associate Professor in the Department of Statistics at The University of Auckland. He is a member of the core development team for R, with primary responsibility for the graphics system.~~

The lectures are live-streamed from 6.30pm NZDT onwards on 7, 14 and 21 March 2018.

Join the local group in the Mathematics and Statistics Department for this live-stream viewing and discussion.

Ihaka Lectures: A thousand words: visualising statistical data

Live-streamed, 1st of 3 lectures

Date: Wednesday 7 March 2018

Following on from the success of last year's inaugural series, the theme of the 2018 Ihaka lectures is A thousand words: visualising statistical data.

The lectures are live-streamed from 6.30pm NZDT onwards on 7, 14 and 21 March 2018.

Scalable Gaussian process models for analyzing large spatial and spatio-temporal datasets

Alan E Gelfand

Duke University

Date: Tuesday 28 November 2017

Spatial process models for analyzing geostatistical data entail computations that become prohibitive as the number of spatial locations becomes large. This talk considers computationally feasible Gaussian process-based approaches to address this problem. We consider two approaches to approximate an intended Gaussian process; both are Gaussian processes in their own right. One, the Predictive Gaussian Process (PGP), is based upon the idea of dimension reduction. The other, the Nearest Neighbor Gaussian Process (NNGP), is based upon sparsity ideas.

The predictive process is simple to understand, routine to implement, with straightforward bias correction. It enjoys several attractive properties within the class of dimension reduction approaches and works well for datasets of order 10³ or 10⁴. It suffers several limitations including spanning only a finite dimensional subspace, over-smoothing, and underestimation of uncertainty.

So, we focus primarily on the nearest neighbor Gaussian process which draws upon earlier ideas of Vecchia and of Stein. It is a bit more complicated to grasp and implement but it is highly scalable, having been applied to datasets as large as 10⁶. It is a well-defined spatial process providing legitimate finite dimensional Gaussian densities with sparse precision matrices. Scalability is achieved by using local information from a few nearest neighbors, i.e., by using the neighbor sets in a conditional specification of the model. This is equivalent to sparse modeling of Cholesky factors of large covariance matrices. We show a multivariate spatial illustration as well as a space-time example. We also consider automating the selection of the neighbor set size.
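The neighbour-set conditioning described above is, in notation of my own choosing, the standard Vecchia-style factorisation:

```latex
% Sketch of the NNGP approximation (notation mine, not from the talk).
% The exact joint density is approximated by conditioning each w_i on a
% small neighbour set N(i) instead of on all previous locations.
p(w_1,\dots,w_n) = \prod_{i=1}^{n} p\big(w_i \mid w_1,\dots,w_{i-1}\big)
                 \approx \prod_{i=1}^{n} p\big(w_i \mid w_{N(i)}\big)
```

where N(i) contains at most m nearest neighbours of location i among the previously ordered locations. Because each conditional is Gaussian and involves at most m other variables, the joint density has a sparse precision matrix with a sparse Cholesky factor, which is the source of the scalability described above.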

For either specification, we embed the PGP as a dimension reduction prior and the NNGP as a sparsity-inducing prior within a rich hierarchical modeling framework and outline how computationally efficient Markov chain Monte Carlo (MCMC) algorithms can be executed. However, the future likely lies with the NNGP since it can accommodate spatial scales that preclude dimension-reducing methods.
Why does the stochastic gradient method work?

Matthew Parry

Department of Mathematics and Statistics

Date: Tuesday 24 October 2017

The stochastic gradient (SG) method seems particularly suited to the numerical optimization problems that arise in large-scale machine learning applications. In a recent paper, Bottou et al. give a comprehensive theory of the SG algorithm and make some suggestions as to how it can be further improved. In this talk, I will briefly give the background to the optimization problems of interest and contrast the batch and stochastic approaches to optimization. I will then give the mathematical basis for the success of the SG method. If time allows, I will discuss how the SG method can also be applied to sampling algorithms.
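For intuition, here is a toy stochastic gradient run on a one-parameter least-squares problem (step size and data are my own illustrative choices): each update uses the gradient of a single randomly chosen term rather than the full batch gradient.

```python
import random

# SG for f(w) = mean_i (w * x_i - y_i)^2 with y_i = 3 * x_i:
# each step follows the gradient of one randomly chosen summand.
random.seed(1)
xs = [random.uniform(-1, 1) for _ in range(1000)]
ys = [3.0 * x for x in xs]                    # true slope is 3

w = 0.0
for _ in range(5000):
    i = random.randrange(len(xs))
    grad = 2 * (w * xs[i] - ys[i]) * xs[i]    # gradient of the i-th term
    w -= 0.1 * grad                           # stochastic gradient update
print(w)                                      # close to the true slope 3
```

Each step costs O(1) rather than O(n), which is the trade analysed by Bottou et al.: cheaper, noisier steps that still converge under suitable step-size conditions.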
The changing face of undergraduate mathematics education: a US perspective

Rachel Weir

Allegheny College, Pennsylvania

Date: Monday 16 October 2017

Note day and time of this seminar.
A common theme in the United States in recent years has been a call to increase the number of graduates in STEM (science, technology, engineering, and mathematics) fields and to enhance the scientific literacy of students in other disciplines. For example, in the 2012 report Engage to Excel, the Obama administration announced a goal of "producing, over the next decade, 1 million more college graduates in STEM fields than expected under current assumptions." Achieving these types of goals will require us to harness the potential of all students, forcing us to identify and acknowledge the barriers encountered by students from traditionally underrepresented groups. Over the past few years, I have been working to understand these barriers to success, particularly in mathematics. In this talk, I will share what I have learned so far and how it has influenced my teaching.
What is n?

David Fletcher

Department of Mathematics and Statistics

Date: Thursday 12 October 2017

In some settings, the definition of "sample size" will depend on the purpose of the analysis. I will consider several examples that illustrate this issue, and point out some of the problems that can arise if we are not clear about what we mean by "n".
Project presentations

Honours and PGDip students

Department of Mathematics and Statistics

Date: Friday 6 October 2017

Jodie Buckby: Model checking for hidden Markov models
Jie Kang: Model averaging for renewal process
Yu Yang: Robustness of temperature reconstruction for the past 500 years

Sam Bremer: An effective model for particle distribution in waterways
Joshua Mills: Hyperbolic equations and finite difference schemes
Gems of Ramanujan and their lasting impact on mathematics

Ken Ono

Emory University; 2017 NZMS/AMS Maclaurin Lecturer

Date: Thursday 5 October 2017

Note venue of this public lecture.
Ramanujan’s work has had a truly transformative effect on modern mathematics, and continues to do so as we understand further lines from his letters and notebooks. This lecture will present some of Ramanujan’s work that is most accessible to the general public, and discuss how his findings fundamentally changed modern mathematics and influenced the lecturer’s own work. The speaker is an Associate Producer of the film The Man Who Knew Infinity (starring Dev Patel and Jeremy Irons) about Ramanujan, and will share several clips from the film during the lecture.

Biography: Ken Ono is the Asa Griggs Candler Professor of Mathematics at Emory University. He is considered to be an expert in the theory of integer partitions and modular forms. He has been invited to speak to audiences all over North America, Asia and Europe. His contributions include several monographs and over 150 research and popular articles in number theory, combinatorics and algebra. He received his Ph.D. from UCLA and has received many awards for his research in number theory, including a Guggenheim Fellowship, a Packard Fellowship and a Sloan Fellowship. He was awarded a Presidential Early Career Award for Science and Engineering (PECASE) by Bill Clinton in 2000 and he was named the National Science Foundation’s Distinguished Teaching Scholar in 2005. In addition to being a thesis advisor and postdoctoral mentor, he has also mentored dozens of undergraduates and high school students. He serves as Editor-in-Chief for several journals and is an editor of The Ramanujan Journal. He is also a member of the US National Committee for Mathematics at the National Academy of Science.
A statistics-related seminar in Physics: Where do your food and clothes come from? Oritain finds the answer in chemistry and statistics

Katie Jones and Olya Shatova

Oritain Dunedin

Date: Monday 2 October 2017

A statistics-related seminar in the Physics Department.
Note day, time and venue.
Oritain Global Ltd is a scientific traceability company that verifies the origin of food, fibre, and pharmaceutical products by combining trace element and isotope chemistry with statistics. Born in the research labs of the Chemistry Department at the University of Otago, Oritain has grown into a multinational company with offices in Dunedin, London, and Sydney, and with clients from around the globe. Dr Katie Jones and Dr Olya Shatova are Otago alumni working as scientists at Oritain Dunedin. They will provide an overview of the science behind Oritain and discuss their transition from academic research to commercialized science.
Quantitative genetics in forest tree breeding

Mike and Sue Carson

Carson Associates Ltd

Date: Thursday 28 September 2017

Forest tree breeding, utilising quantitative genetic (QG) methods, is employed across a broad range of plant species for improvement of a wide diversity of products, or ‘breeding objectives’. Examples of breeding objectives range from the traditional sawn timber and pulpwood products desired largely from pines and eucalypts, to antibiotic factors in honey obtained from NZ manuka, and including plant oil products from oil palms. The standard population breeding approach recognises a hierarchy of populations (the ‘breeding triangle’) with a broad and diverse gene resource population at the base, and a highly-improved but less diverse deployment population at the peak. With the constraint that the deployment population must contain a ‘safe’ amount of genetic diversity, the main goal for any tree improvement program is to use selection and recombination to maximise deployment population gains in the target traits. The key QG tools used in tree improvement programs for trial data analysis, estimation of breeding values, index ranking and selection, and mating and control of pedigree are in common with most other plant and livestock breeding programs. However, the perennial nature of most tree crops requires tree breeders to place a greater emphasis on the use of well-designed, long-term field trials, in combination with efficient and secure databases like Gemview. Recent advances using factor analytic models are providing useful tools for examining and interpreting genotype and site effects and their interaction on breeding values. Genomic selection is expected to enhance, rather than replace, conventional field screening methods for at least the medium term.
Genomic data analysis: bioinformatics, statistics or data science?

Mik Black

Department of Biochemistry

Date: Thursday 21 September 2017

Analysis of large-scale genomic data has become a core component of modern genetics, with public data repositories providing enormous opportunities for both exploratory and confirmatory studies. To take advantage of these opportunities, however, potential data analysts need to possess a range of skills, including those drawn from the disciplines of bioinformatics, data science and statistics, as well as domain-specific knowledge about their biological area of interest. While traditional biology-based teaching programmes provide an excellent foundation in the latter skill set, relatively little time is spent equipping students with the skills required for genomic data analysis, despite high demand for graduates with this knowledge. In this talk I will work through a fairly typical analysis of publicly accessible genomic data, highlighting the various bioinformatics, statistical and data science concepts and techniques being utilized. I will also discuss current efforts being undertaken at the University of Otago to provide training in these areas, both inside and outside the classroom.
Thinking statistically when constructing genetic maps

Timothy Bilton

Department of Mathematics and Statistics

Date: Thursday 14 September 2017

A genetic linkage map shows the relative positions of and genetic distances between genetic markers (positions of the genome which exhibit variation), and underpins the study of species' genomes in a number of scientific applications. Genetic maps are constructed by tracking the transmission of genetic information from individuals to their offspring, which is frequently modelled using a hidden Markov model (HMM), since only the expression and not the transmission of genetic information is observed. However, data generated using the latest sequencing technology often contain only partially observed information which, if unaccounted for, typically results in inflated estimates. Most approaches to circumvent this issue involve a combination of filtering and correcting individual data points using ad hoc methods. Instead, we develop a new methodology that models the partially observed information by incorporating an additional layer of latent variables into the HMM. Results show that our methodology is able to produce accurate genetic map estimates, even in situations where a large proportion of the data is only partially observed.
Network tomography for integer valued traffic

Martin Hazelton

Massey University

Date: Thursday 7 September 2017

Volume network tomography is concerned with inference about traffic flow characteristics based on traffic measurements at fixed locations on the network. The quintessential example is estimation of the traffic volume between any pair of origin and destination nodes using traffic counts obtained from a subset of the links of the network. The data provide only indirect information about the target variables, generating a challenging type of statistical linear inverse problem.

In this talk I will discuss network tomography for a rather general class of traffic models. I will describe some recent progress on model identifiability. I will then discuss the development of effective MCMC samplers for simulation-based inference, based on insight provided by an examination of the geometry of the space of feasible route flows.
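As a toy illustration (my own example, not from the talk) of why this is a linear inverse problem: observed link counts y = A x pin down the route-flow vector x only up to integer moves in the kernel of A.

```python
# Two routes share link 1, two share link 2; A maps route flows to
# link counts. Distinct integer flow vectors give identical counts.
A = [
    [1, 1, 0],    # link 1 carries routes 1 and 2
    [0, 1, 1],    # link 2 carries routes 2 and 3
]

def link_counts(x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

x1 = [4, 2, 5]
x2 = [5, 1, 6]    # x1 plus the kernel move (+1, -1, +1)
print(link_counts(x1), link_counts(x2))   # both are [6, 7]
```

MCMC samplers for such problems move around the feasible set of non-negative integer flows using exactly this kind of kernel move, which is where the geometry mentioned above enters.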
TensorFlow: a short intro

Lech Szymanski

Department of Computer Science

Date: Thursday 31 August 2017

TensorFlow is an open source software library for numerical computation. Its underlying paradigm of computation uses data flow graphs, which allow for automatic differentiation and effortless deployment that parallelises across CPUs or GPUs. I have been working in TensorFlow for about a year now, using it to build and train deep learning models for image classification. In this talk I will give a brief introduction to TensorFlow as well as share some of my experiences of working with it. I will try to make this talk not about deep learning with TensorFlow, but rather about TensorFlow itself, which I happen to use for deep learning.
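TensorFlow's own API is not reproduced here; instead, a toy Python sketch (my own, not TensorFlow code) of the data-flow-graph idea: each node records the operation that produced it, so the graph can be traversed backwards to accumulate gradients automatically.

```python
# Minimal reverse-mode automatic differentiation over a data flow
# graph. Each Node stores (parent, local_gradient) pairs; backward()
# propagates gradients through the recorded graph.
class Node:
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents      # (parent_node, local_gradient) pairs
        self.grad = 0.0

    def __add__(self, other):
        return Node(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        return Node(self.value * other.value,
                    [(self, other.value), (other, self.value)])

def backward(node, seed=1.0):
    node.grad += seed
    for parent, local in node.parents:
        backward(parent, seed * local)

x = Node(2.0)
y = Node(3.0)
z = x * y + x                       # z = x*y + x
backward(z)
print(z.value, x.grad, y.grad)      # 8.0 4.0 2.0
```

A framework like TensorFlow builds the same kind of graph, then additionally optimises it and schedules the node computations across CPUs or GPUs.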
Theory and application of latent variable models for multivariate binomial data

John Holmes

Department of Mathematics and Statistics

Date: Thursday 24 August 2017

A large body of work has been devoted to developing latent variable models for exponential family distributed multivariate data exhibiting interdependencies. For the binomial case, however, extensions of these models beyond the analysis of binary data are almost entirely missing. Focusing on principal component/factor analysis representations, we will show that under the canonical logit link, latent variable models can be fitted in closed form, via Gibbs sampling, to multivariate binomial data of arbitrary trial size by applying Pólya-gamma augmentation to the binomial likelihood. In this talk, the properties of binomial latent variable models under Pólya-gamma data augmentation will be discussed from both a theoretical perspective and through application to a range of simulated and real demographic datasets.
Māori student success: Findings from the Graduate Longitudinal Study New Zealand

Moana Theodore

Department of Psychology

Date: Thursday 17 August 2017

Māori university graduates are role models for educational success and important for the social and economic wellbeing of Māori whānau (extended family), communities and society in general. Describing their experiences can help to build an evidence base to inform practice, decision-making and policy. I will describe findings for Māori graduates from all eight New Zealand universities who are participants in the Graduate Longitudinal Study New Zealand. Data were collected when the Māori participants were in their final year of study in 2011 (n=626) and two years post-graduation in 2014 (n=455). First, I will focus on what Māori graduates describe as helping or hindering the completion of their qualifications, including external (e.g. family), institutional (e.g. academic support) and student/personal (e.g. persistence) factors. Second, I will describe Māori graduate outcomes at 2 years post-graduation. In particular, I will describe the private benefits of higher education, such as labour market outcomes (e.g. employment and income), as well as the social benefits such as civic participation and volunteerism. Overall, our findings suggest that boosting higher education success for Māori may reduce ethnic inequalities in New Zealand labour market outcomes and may impart substantial social benefits as a result of Māori graduates’ contribution to society.
Bayes factors, priors and mixtures

Matthew Schofield

Department of Mathematics and Statistics

Date: Thursday 10 August 2017

It is well known that Bayes factors are sensitive to the prior distribution chosen on the parameters. This has led to comments such as “Diffuse prior distributions ... must be used with care” (Robert 2014) and “We do not see Bayesian methods as generally useful for giving the posterior probability that a model is true, or the probability for preferring model A over model B” (Gelman and Shalizi 2013). We consider the calculation of Bayes factors for nested models. We show this is equivalent to a model with a mixture prior distribution, where the weights on the resulting posterior are related to the Bayes factor. These results allow us to directly compare Bayes factors to shrinkage priors, such as the Laplace prior used in the Bayesian lasso. We use these results as the basis for offering practical suggestions for estimation and selection in nested models.
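The prior sensitivity behind those quotations can be seen in a toy normal example (my own illustration, not from the talk): testing H0: mu = 0 against H1: mu ~ N(0, tau^2), with an estimate xbar of standard error se, the Bayes factor in favour of H0 grows without bound as the prior scale tau increases, whatever the data say.

```python
import math

# Bayes factor for H0: mu = 0 vs H1: mu ~ N(0, tau^2), given
# xbar ~ N(mu, se^2). BF01 is the ratio of marginal likelihoods;
# under H1 the marginal of xbar is N(0, se^2 + tau^2).
def normal_pdf(x, var):
    return math.exp(-x * x / (2 * var)) / math.sqrt(2 * math.pi * var)

def bf01(xbar, se, tau):
    return normal_pdf(xbar, se ** 2) / normal_pdf(xbar, se ** 2 + tau ** 2)

xbar, se = 2.0, 1.0          # estimate two standard errors from zero
for tau in (1.0, 10.0, 100.0):
    print(tau, bf01(xbar, se, tau))
# the Bayes factor in favour of H0 increases with the prior scale tau
```

In the mixture view described above, a prior w*delta_0 + (1-w)*N(0, tau^2) gives posterior odds on the null equal to BF01 times the prior odds w/(1-w), so the same sensitivity carries over to the posterior model weights.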
Development and implementation of culturally informed guidelines for medical genomics research involving Māori communities

Phil Wilcox

Department of Mathematics and Statistics

Date: Thursday 3 August 2017

Medical genomic research is usually conducted within a ‘mainstream’ cultural context. Māori communities have been underrepresented in such research despite being impacted by heritable diseases and other conditions that could potentially be unravelled via modern genomic technologies. Reasons for low participation of Māori communities include negative experiences with genomics and genetics researchers, such as the notorious ‘Warrior Gene’ saga, and an unease with technologies that are often implemented by non-Māori researchers in a manner inconsistent with Māori values. In my talk I will describe recently developed guidelines for ethically appropriate genomics research with Māori communities; how these guidelines were informed by my iwi, Ngāti Rakaipaaka, who had previously been involved in a medical genomics investigation; and current efforts to complete that research via a partnership with Te Tari Pāngarau me Tātauranga ki Te Whare Wānaka o Otakou (Department of Mathematics and Statistics at the University of Otago).
Who takes Statistics? A look at student composition, 2000-2016

Peter Dillingham

Department of Mathematics and Statistics

Date: Thursday 27 July 2017

In this blended seminar and discussion, we will examine how student data can help inform curriculum development and review, focussing on the Statistics programme as an example. Currently, the Statistics academic staff are reviewing our programme to ensure that we continue to provide a high quality and modern curriculum that meets the needs of students. An important component of this process is to understand who our students are and what they are interested in, from first-year service teaching through to students majoring in statistics. As academics, we often have a reasonable answer to these questions, but we can be more specific by poring over student data. While not glamorous, this sort of data can help confirm those things we think we know, identify opportunities or risks, and help answer specific questions where we know that we don’t know the answer.
A missing value approach for breeding value estimation

Alastair Lamont

Department of Mathematics and Statistics

Date: Thursday 20 July 2017

A key goal in quantitative genetics is the identification and selective breeding of individuals with high economic value. For a particular trait, an individual’s breeding value is the genetic worth it has for its progeny. While methods for estimating breeding values have existed since the middle of last century, the march of technology now allows the genotypes of individuals to be directly measured. This additional information allows for improved breeding value estimation, supplementing observed measurements and known pedigree information. However, while it can be cost efficient to genotype some animals, it is unfeasible to genotype every individual in most populations of interest, due to either cost or logistical issues. As such, any approach must be able to accommodate missing data, while also managing computational efficiency, as the dimensionality of data can be immense. Most modern approaches tend to impute or average over the missing data in some fashion, rather than fully incorporating it into the model. These approximations lead to a loss in estimation accuracy. Similar models are used within human genetics, but for different purposes. With different data and different goals from quantitative genetics, these approaches natively include missing data within the model. We are developing an approach which utilises a human genetics framework, adapted so as to estimate breeding values.
Assessing and dealing with imputation inaccuracy in genomic predictions

Michael Lee

Department of Mathematics and Statistics

Date: Thursday 13 July 2017

Genomic predictions rely on having genotypes from high-density SNP chips for many individuals. National animal evaluations, which predict breeding values, may include millions of animals, an increasing proportion of which have genotype information. Imputation can be used to make genomic predictions more cost effective. For example, in the NZ sheep industry genomic predictions can be done by genotyping animals with a lower-density SNP chip (e.g. 5-15K) and imputing each animal's genotypes to a density of about 50K, where the imputation process needs a reference panel of 50K genotypes. The imputed genotypes are used in genomic predictions, and the accuracy of imputation is a function of the quality of the reference panel. A study to assess the imputation accuracy of a wide range of animals was undertaken. The goal was to quantify the levels of inaccuracy and to determine the best strategy for dealing with this inaccuracy in the context of single-step genomic best linear unbiased prediction (ssGBLUP).
Twists and trends in exercise science

Jim Cotter

School of Physical Education, Sport and Exercise Sciences

Date: Thursday 1 June 2017

From my perspective, exercise science is entering an age of enlightenment, but misuse of statistics remains a serious limitation to its contributions and progress in human health, performance, and basic knowledge. This seminar will summarise our recent and current work in hydration, heat stress and patterns of dosing/prescribing exercise, and the implications for human health and performance. These contexts will be used to discuss methodological issues including research design, analysis and interpretation.
Hidden Markov models for incompletely observed point processes

Amina Shahzadi

Department of Mathematics and Statistics

Date: Thursday 25 May 2017

Natural phenomena such as earthquakes and volcanic eruptions can cause catastrophic damage. Such phenomena can be modelled using point processes. However, this is complicated and potentially biased by the problem of missing data in the records. The degree of completeness of volcanic records varies dramatically over time; often, the older the record is, the more incomplete it is. One way to handle such records with missing data is to use hidden Markov models (HMMs). An HMM is a two-layered process based on an observed process and an unobserved first-order stationary Markov chain with geometrically distributed state durations. This limits the application of HMMs in the field of volcanology, where the processes leading to missed observations do not necessarily behave in a memoryless and time-independent manner. We propose inhomogeneous hidden semi-Markov models (IHSMMs) to investigate the time-inhomogeneity of the completeness of volcanic eruption catalogues, in order to obtain reliable hazard estimates.
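The two-layered structure of an HMM can be made concrete with the forward algorithm for a basic discrete-emission HMM, which computes the likelihood of an observed sequence by summing over the hidden chain. This is a minimal generic sketch (the function name and discrete-emission setup are illustrative), not the inhomogeneous hidden semi-Markov machinery proposed in the talk:

```python
import numpy as np

def forward_log_likelihood(obs, pi, A, B):
    """Log-likelihood of an observation sequence under a basic HMM,
    via the scaled forward algorithm.

    obs : sequence of observation symbols (integers)
    pi  : (K,) initial state distribution
    A   : (K, K) transition matrix, A[i, j] = P(state j | state i)
    B   : (K, M) emission matrix, B[k, m] = P(symbol m | state k)
    """
    alpha = pi * B[:, obs[0]]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # propagate through the chain, then weight by the emission
            alpha = (alpha @ A) * B[:, obs[t]]
        c = alpha.sum()          # scaling constant P(obs_t | obs_1..t-1)
        log_lik += np.log(c)
        alpha = alpha / c        # rescale to avoid numerical underflow
    return log_lik
```

The scaling constants guard against underflow on long records; their logs sum to the log-likelihood.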

Jin Zhang

Department of Accountancy and Finance

Date: Thursday 18 May 2017

The CBOE SKEW is an index launched by the Chicago Board Options Exchange (CBOE) in February 2011. Its term structure tracks the risk-neutral skewness of the S&P 500 (SPX) index for different maturities. In this paper, we develop a theory for the CBOE SKEW by modelling SPX using a jump-diffusion process with stochastic volatility and stochastic jump intensity. With the term structure data of VIX and SKEW, we estimate model parameters and obtain the four processes of variance, jump intensity and their long-term mean levels. Our results can be used to describe SPX risk-neutral distribution and to price SPX options.
Finding true identities in a sample using MCMC methods

Paula Bran

Department of Mathematics and Statistics

Date: Thursday 11 May 2017

Uncertainty about the true identities behind observations is known in statistics as a misidentification problem. Observations may be duplicated, wrongly reported or missing, resulting in error-prone data collection. These errors can seriously affect inferences and conclusions. A wide variety of MCMC algorithms have been developed for simulating the latent identities of individuals in a dataset using Bayesian inference. In this talk, the DIU (Direct Identity Updater) algorithm is introduced. It is a Metropolis-Hastings sampler with an application-specific proposal density. Its performance and efficiency are compared with two other algorithms solving similar problems. Convergence to the correct stationary distribution is discussed using a toy example where the data consist of genotypes observed with uncertainty. As the state space is small, the behaviour of the chains is easily visualised. Interestingly, while they converge to the same stationary distribution, the transition matrices of the different algorithms have little in common.
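The DIU sampler itself is application-specific, but the Metropolis-Hastings scheme it instantiates can be sketched generically. In this illustrative sketch all names are placeholders and the proposal is a simple independence proposal, not Bran's actual proposal density:

```python
import numpy as np

def metropolis_hastings(log_target, propose, log_q, x0, n_iter, rng):
    """Generic Metropolis-Hastings sampler.

    log_target : log of the (unnormalised) target density
    propose    : function (rng, x) -> candidate state x'
    log_q      : log proposal density, log_q(x_to, x_from) = log q(x_to | x_from)
    """
    x = x0
    samples = []
    for _ in range(n_iter):
        x_new = propose(rng, x)
        # log acceptance ratio: target ratio times reverse/forward proposal ratio
        log_accept = (log_target(x_new) - log_target(x)
                      + log_q(x, x_new) - log_q(x_new, x))
        if np.log(rng.uniform()) < log_accept:
            x = x_new
        samples.append(x)
    return np.array(samples)
```

For instance, with a uniform independence proposal over a three-point state space, the chain's empirical frequencies converge to the target probabilities, which is the kind of small-state-space behaviour that can be visualised directly.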
Correlated failures in multicomponent systems

Richard Arnold

Victoria University of Wellington

Date: Thursday 4 May 2017

Multicomponent systems may experience failures with correlations amongst failure times of groups of components, and some subsets of components may experience common cause, simultaneous failures. We present a novel, general approach to model construction and inference in multicomponent systems incorporating these correlations in an approach that is tractable even in very large systems. In our formulation the system is viewed as being made up of Independent Overlapping Subsystems (IOS). In these systems components are grouped together into overlapping subsystems, and further into non-overlapping subunits. Each subsystem has an independent failure process, and each component's failure time is the time of the earliest failure in all of the subunits of which it is a part.

This is joint work with Stefanka Chukova (VUW) and Yu Hayakawa (Waseda University, Tokyo)
Integration of IVF technologies with genomic selection to generate high merit AI bulls: a simulation study

Fiona Hely


Date: Thursday 27 April 2017

New reproductive technologies, such as genotyping of embryos prior to cloning and IVF, allow the possibility of targeting elite AI bull calves from high merit sires and dams. A stochastic simulation model was set up to replicate both progeny testing and genomic selection dairy genetic improvement schemes, with and without the use of IVF to generate bull selection candidates. The reproductive process was simulated using a series of random variates to assess the likelihood of a given cross between a selected sire and dam producing a viable embryo, and the superiority of the resulting viable bulls was assessed from the perspective of a commercial breeding company.
Recovery and recolonisation by New Zealand southern right whales: making the most of limited sampling opportunities

Will Rayment

Department of Marine Science

Date: Thursday 13 April 2017

Studies of marine megafauna are often logistically challenging, thus limiting our ability to gain robust insights into the status of populations. This is especially true for southern right whales, a species which was virtually extirpated in New Zealand waters by commercial whaling in the 19th century, and restricted to breeding around the remote sub-Antarctic Auckland Islands. We have gathered photo-ID and distribution data during annual 3-week duration trips to study right whales at the Auckland Islands since 2006. Analysis of the photo-ID data has yielded estimates of demographic parameters including survival rate and calving interval, essential for modelling the species’ recovery, while species-distribution models have been developed to reveal the specific habitat preferences of calving females. These data have been supplemented by visual and acoustic autonomous monitoring, in order to investigate seasonal occurrence of right whales in coastal habitats. Understanding population recovery, and potential recolonization of former habitats around mainland New Zealand, is essential if the species is to be managed effectively in the future.
Ion-selective electrode sensor arrays: calibration, characterisation, and estimation

Peter Dillingham

Department of Mathematics and Statistics

Date: Thursday 6 April 2017

Ion-selective electrodes (ISEs) have undergone a renaissance over the last 20 years. New fabrication techniques, which allow mass production, have led to their increasing use in demanding environmental and health applications. These deployable low-cost sensors are now capable of measuring sub-micromolar concentrations in complex and variable solutions, including blood, sweat, and soil. However, these measurement challenges have highlighted the need for modern calibration techniques to properly characterise ISEs and report measurement uncertainty. In this talk, our group’s developments will be discussed, with a focus on modelling ISEs, properly defining the limit of detection, and extensions to sensor arrays.
What in the world caused that? Statistics of sensory spike trains and neural computation for inference

Mike Paulin

Department of Zoology

Date: Thursday 30 March 2017

Before the “Cambrian explosion” 542 million years ago, animals without nervous systems reacted to environmental signals mapped onto the body surface. Later animals constructed internal maps from noisy partial observations gathered at the body surface. Considering the energy costs of data acquisition and inference, versus the costs of not doing this, in late Precambrian ecosystems leads us to model spike trains recorded from sensory neurons (in sharks, frogs and other animals) as samples from a family of Inverse Gaussian-censored Poisson (a.k.a. Exwald) point processes. Neurons that evolved for other reasons turn out to be natural mechanisms for generating samples from Exwald processes, and natural computers for inferring the posterior density of their parameters. This is a consequence of a curious correspondence between the likelihood function for sequential inference from a censored Poisson process and the impulse response function of a neuronal membrane. We conclude that modern animals, including humans, are natural Bayesians because when neurons evolved 560 million years ago they provided our ancestors with a choice between being Bayesian or being dead.
This is joint work with recent Otago PhD students Kiri Pullar and Travis Monk, honours student Ethan Smith, and UCLA neuroscientist Larry Hoffman.
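An Exwald interval is the convolution of an exponential and a Wald (inverse Gaussian) random variable, so drawing samples is a one-liner. This hypothetical helper is a sketch of that definition only, not the authors' model code; parameter names are ours:

```python
import numpy as np

def sample_exwald(n, tau, mu, lam, rng):
    """Draw n samples from an Exwald distribution: the sum of an
    Exponential(mean tau) component and a Wald (inverse Gaussian)
    component with mean mu and shape lam, as used to model sensory
    neuron inter-spike intervals."""
    return rng.exponential(tau, n) + rng.wald(mu, lam, n)
```

By construction the mean of the samples is tau + mu, which gives a quick sanity check on simulated inter-spike intervals.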
Brewster Glacier - a benchmark for investigating glacier-climate interactions in the Southern Alps of New Zealand

Nicolas Cullen

Department of Geography

Date: Thursday 23 March 2017

The advance of some fast-responding glaciers in the Southern Alps of New Zealand at the end of the 20th and beginning of the 21st century during three of the warmest decades of the instrumental era provides clear evidence that changes in large-scale atmospheric circulation in the Southern Hemisphere can act as a counter-punch to global warming. The Southern Alps are surrounded by vast areas of ocean and are strongly influenced by both subtropical and polar air masses, with the interaction of these contrasting air masses in the prevailing westerly airflow resulting in weather systems having a strong influence on glacier mass balance. Until recently, one of the challenges in assessing how large-scale atmospheric circulation influences glacier behaviour has been the lack of observational data from high-elevation sites in the Southern Alps. However, high-quality meteorological and glaciological observations from Brewster Glacier allow us to now assess in detail how atmospheric processes at different scales influence glacier behaviour. This talk will provide details about the observational programme on Brewster Glacier, which has been continuous for over a decade, and then target how weather systems influence daily ablation and precipitation (snowfall).
Estimating overdispersion in sparse multinomial data

Farzana Afroz

Department of Mathematics and Statistics

Date: Thursday 16 March 2017

When overdispersion is present in a data set, ignoring it may lead to serious underestimation of standard errors and potentially misleading model comparisons. Generally we estimate the overdispersion parameter $\phi$ by dividing Pearson's goodness-of-fit statistic $X^2$ by the residual degrees of freedom. But when the data are sparse, that is, when there are many zero or small counts, it may not be reasonable to use this statistic, since $X^2$ is unlikely to be $\chi^2$-distributed. This study presents a comparison of four estimators of the overdispersion parameter $\phi$, in terms of bias, root mean squared error and standard deviation, when the data are sparse and multinomial. Dead-recovery data on herring gulls from Kent Island, Canada are used to provide a practical example of sparse multinomial data. In a simulation study, we consider the Dirichlet-multinomial distribution and a finite mixture distribution, which are widely used to model extra variation in multinomial data.
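The conventional estimator described above, $\hat\phi = X^2/\mathrm{df}$, is simple to compute. A minimal sketch for rows of multinomial counts, assuming the cell probabilities are known rather than estimated (the function name and that assumption are ours, not from the study):

```python
import numpy as np

def pearson_phi(counts, probs):
    """Naive overdispersion estimate phi-hat = X^2 / df for multinomial data.

    counts : (R, K) observed counts, one multinomial sample per row
    probs  : (K,) cell probabilities assumed known under the model
    """
    counts = np.asarray(counts, float)
    n = counts.sum(axis=1, keepdims=True)          # row totals
    expected = n * probs
    X2 = ((counts - expected) ** 2 / expected).sum()
    df = counts.shape[0] * (counts.shape[1] - 1)   # R * (K - 1), probs known
    return X2 / df
```

Under genuine multinomial sampling this returns a value near 1; it is exactly this estimator whose behaviour degrades when many expected counts are small, motivating the alternatives compared in the talk.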
Fast computation of spatially adaptive kernel smooths

Tilman Davies

Department of Mathematics and Statistics

Date: Thursday 9 March 2017

Kernel smoothing of spatial point data can often be improved using an adaptive, spatially-varying bandwidth instead of a fixed bandwidth. However, computation with a varying bandwidth is much more demanding, especially when edge correction and bandwidth selection are involved. We propose several new computational methods for adaptive kernel estimation from spatial point pattern data. A key idea is that a variable-bandwidth kernel estimator for d-dimensional spatial data can be represented as a slice of a fixed-bandwidth kernel estimator in (d+1)-dimensional "scale space", enabling fast computation using discrete Fourier transforms. Edge correction factors have a similar representation. Different values of global bandwidth correspond to different slices of the scale space, so that bandwidth selection is greatly accelerated. Potential applications include estimation of multivariate probability density and spatial or spatiotemporal point process intensity, relative risk, and regression functions. The new methods perform well in simulations and real applications.
Joint work with Professor Adrian Baddeley, Curtin University, Perth.
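As background to the fast methods above, a direct (non-FFT) spatially adaptive estimator of Abramson type in one dimension shows what a spatially varying bandwidth looks like. This sketch is purely illustrative and is not the scale-space algorithm of the paper:

```python
import numpy as np

def adaptive_kde(x, data, h0):
    """Spatially adaptive (Abramson-type) kernel density estimate in 1-D.

    A pilot fixed-bandwidth estimate supplies per-point bandwidths
    h_i = h0 * sqrt(g / pilot_i), with g the geometric mean of the
    pilot values, so kernels widen in sparse regions and narrow in
    dense ones.
    """
    data = np.asarray(data, float)
    n = len(data)
    # pilot fixed-bandwidth Gaussian KDE evaluated at the data points
    diffs = (data[:, None] - data[None, :]) / h0
    pilot = np.exp(-0.5 * diffs**2).sum(axis=1) / (n * h0 * np.sqrt(2 * np.pi))
    g = np.exp(np.mean(np.log(pilot)))        # geometric mean of pilot values
    h = h0 * np.sqrt(g / pilot)               # per-point adaptive bandwidths
    # variable-bandwidth estimate at the evaluation points x
    u = (np.asarray(x, float)[:, None] - data[None, :]) / h[None, :]
    return (np.exp(-0.5 * u**2) / (h * np.sqrt(2 * np.pi))).mean(axis=1)
```

The direct double sum costs O(mn), which is precisely the burden that the scale-space representation and discrete Fourier transforms are designed to avoid.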
Detection and replenishment of missing data in the observation of point processes with independent marks

Jiancang Zhuang

Institute of Statistical Mathematics, Tokyo

Date: Thursday 2 March 2017

Records of geophysical events such as earthquakes and volcanic eruptions, which are usually modeled as marked point processes, often have missing data that result in underestimation of the corresponding hazards. This study presents a fast approach for replenishing missing data in the record of a temporal point process with time-independent marks. The basis of this method is that, if such a point process is completely observed, it can be transformed into a homogeneous Poisson process on the unit square $[0,1]^2$ by a biscale empirical transformation. This method is tested on a synthetic dataset and applied to the record of volcanic eruptions at the Hakone Volcano, Japan, and to several datasets of aftershock sequences following large earthquakes. In particular, by comparing the analysis results from the original and replenished datasets of an aftershock sequence, we have found that both the Omori-Utsu formula and the ETAS model are stable, and that the variations in the estimated parameters with different magnitude thresholds in past studies were caused by the short-term missing of small events.
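The biscale transformation extends the classical one-dimensional time-rescaling result: mapping event times through the cumulative intensity $\Lambda(t)$ of a correctly specified model turns them into a unit-rate Poisson process, whose gaps are i.i.d. Exponential(1). A sketch of that 1-D setting, with simulation by thinning (all names are ours, not the paper's):

```python
import numpy as np

def simulate_inhomogeneous_poisson(rate, rate_max, t_end, rng):
    """Simulate an inhomogeneous Poisson process on [0, t_end] by thinning:
    propose at the constant rate rate_max, keep with probability rate(t)/rate_max."""
    t, events = 0.0, []
    while True:
        t += rng.exponential(1.0 / rate_max)
        if t > t_end:
            return np.array(events)
        if rng.uniform() < rate(t) / rate_max:
            events.append(t)

def rescaled_gaps(event_times, cumulative_intensity):
    """Time-rescaling check: gaps of Lambda(t_i) are Exp(1) under a correct model."""
    tau = cumulative_intensity(np.asarray(event_times, float))
    return np.diff(np.concatenate([[0.0], tau]))
```

Departures of the rescaled gaps from the Exponential(1) distribution signal model misfit or, in the setting of the talk, stretches of missing events.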
A new multidimensional stress release statistical model based on coseismic stress transfer

Shiyong Zhou

Peking University

Date: Tuesday 14 February 2017

NOTE venue is not our usual
Following the stress release model (SRM) proposed by Vere-Jones (1978), we developed a new multidimensional SRM, which is a space-time-magnitude version based on multidimensional point processes. First, we interpreted the exponential hazard functional of the SRM as the mathematical expression of static fatigue failure caused by stress corrosion. Then, we reconstructed the SRM in multidimensions through incorporating four independent submodels: the magnitude distribution function, the space weighting function, the loading rate function and the coseismic stress transfer model. Finally, we applied the new model to analyze the historical earthquake catalogues in North China. An expanded catalogue, which contains the information of origin time, epicentre, magnitude, strike, dip angle, rupture length, rupture width and average dislocation, is composed for the new model. The estimated model can simulate the variations of seismicity with space, time and magnitude. Compared with the previous SRMs with the same data, the new model yields much smaller values of Akaike information criterion and corrected Akaike information criterion. We compared the predicted rates of earthquakes at the epicentres just before the related earthquakes with the mean spatial seismic rate. Among all 37 earthquakes in the expanded catalogue, the epicentres of 21 earthquakes are located in the regions of higher rates.
Next generation ABO blood type genetics and genomics

Keolu Fox

University of California San Diego

Date: Wednesday 1 February 2017

The ABO gene encodes a glycosyltransferase, which adds sugars (N-acetylgalactos-amine for A and α-D-galactose for B) to the H antigen substrate. Single nucleotide variants in the ABO gene affect the function of this glycosyltransferase at the molecular level by altering the specificity and efficiency of this enzyme for these specific sugars. Characterizing variation in ABO is important in transfusion and transplantation medicine because variants in ABO have significant consequences with regard to recipient compatibility. Additionally, variation in the ABO gene has been associated with cardiovascular disease risk (e.g., myocardial infarction) and quantitative blood traits (von Willebrand factor (VWF), Factor VIII (FVIII) and Intercellular Adhesion Molecule 1 (ICAM-1)). Relating ABO genotypes to actual blood antigen phenotype requires the analysis of haplotypes. Here we will explore variation (single nucleotide, insertions and deletions, and structural variation) in blood cell trait gene loci (ABO) using multiple datasets enriched for heart, lung and blood-related diseases (including both African-Americans and European-Americans) from multiple NGS datasets (e.g. the NHLBI Exome Sequencing Project (ESP) dataset). I will also describe the use of a new ABO haplotyping method, ABO-seq, to increase the accuracy of ABO blood type and subtype calling using variation in multiple NGS datasets. Finally, I will describe the use of multiple read-depth based approaches to discover previously unsuspected structural variation (SV) in genes not shown to harbor SV, such as the ABO gene, by focusing on understudied populations, including individuals of Hispanic and African ancestry.

Keolu has a strong background in using genomic technologies to understand human variation and disease. Throughout his career he has made it his priority to focus on the interface of minority health and genomic technologies. Keolu earned a Ph.D. in Debbie Nickerson's lab in the University of Washington's Department of Genome Sciences (August, 2016). In collaboration with experts at Bloodworks Northwest, (Seattle, WA) he focused on the application of next-generation genome sequencing to increase compatibility for blood transfusion therapy and organ transplantation. Currently Keolu is a postdoc in Alan Saltiel's lab at the University of California San Diego (UCSD) School of Medicine, Division of Endocrinology and Metabolism and the Institute for Diabetes and Metabolic Health. His current project focuses on using genome editing technologies to investigate the molecular events involved in chronic inflammatory states resulting in obesity and catecholamine resistance.
To be or not to be (Bayesian) Non-Parametric: A tale about Stochastic Processes

Roy Costilla

Victoria University of Wellington

Date: Tuesday 24 January 2017

Thanks to advances in theory and computation over the last decades, Bayesian Non-Parametric (BNP) models are now used in many fields, including Biostatistics, Bioinformatics, Machine Learning, Linguistics and many others.

Despite their name, however, BNP models are actually massively parametric. A parametric model uses a function with a finite-dimensional parameter vector as its prior; Bayesian inference then proceeds to approximate the posterior of these parameters given the observed data. In contrast, a BNP model is defined on an infinite-dimensional probability space through the use of a stochastic process as a prior. In other words, the prior for a BNP model is a space of functions with an infinite-dimensional parameter vector. Therefore, instead of avoiding parametric forms, BNP inference uses a large number of them to gain more flexibility.

To illustrate this, we present simulations and a case study of life satisfaction in NZ over 2009-2013. We estimate the models using a finite Dirichlet Process Mixture (DPM) prior. We show that this BNP model is tractable, i.e. easily computed using Markov chain Monte Carlo (MCMC) methods, allowing us to handle data with large sample sizes and to estimate the model parameters correctly. Coupled with a post-hoc clustering of the DPM locations, the BNP model also allows approximation of the number of mixture components, a very important parameter in mixture modelling.
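The DPM prior rests on the stick-breaking construction of Dirichlet-process weights. A truncated sketch of that construction (the truncation level k_max is an illustrative device, not the authors' estimation code):

```python
import numpy as np

def stick_breaking_weights(alpha, k_max, rng):
    """Truncated stick-breaking construction of Dirichlet-process weights:
    v_k ~ Beta(1, alpha), w_k = v_k * prod_{j<k} (1 - v_j).

    Setting the last v to 1 truncates the infinite sequence so that the
    returned weights sum exactly to one."""
    v = rng.beta(1.0, alpha, size=k_max)
    v[-1] = 1.0
    w = v * np.concatenate([[1.0], np.cumprod(1 - v[:-1])])
    return w
```

Smaller values of the concentration parameter alpha pile mass onto the first few sticks, which is what lets a post-hoc clustering of DPM locations approximate the number of mixture components.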
Computational methods and statistical modelling in the analysis of co-occurrences: where are we now?

Jorge Navarro Alberto

Universidad Autónoma de Yucatán (UADY)

Date: Wednesday 9 November 2016

NOTE day and time of this seminar
The subject of the talk is statistical methods (both theoretical and applied) and computational algorithms for the analysis of binary data, which have been applied in ecology in the study of species composition in systems of patches with the ultimate goal to uncover ecological patterns. As a starting point, I review Gotelli and Ulrich's (2012) six statistical challenges in null model analysis in Ecology. Then, I exemplify the most recent research carried out by me and other statisticians and ecologists to face those challenges, and applications of the algorithms outside the biological sciences. Several topics of research are proposed, seeking to motivate statisticians and computer scientists to venture and, eventually, to specialize in the subject of the analysis of co-occurrences.
Reference: Gotelli, NJ and Ulrich, W, 2012. Statistical challenges in null model analysis. Oikos 121: 171-180
Extensions of the multiset sampler

Scotland Leman

Virginia Tech, USA

Date: Tuesday 8 November 2016

NOTE day and time of this seminar
In this talk I will primarily discuss the Multiset Sampler (MSS), a general ensemble-based Markov chain Monte Carlo (MCMC) method for sampling from complicated stochastic models. I will then briefly introduce the audience to my interactive visual analytics research.

Proposal distributions for complex structures are essential for virtually all MCMC sampling methods. However, such proposal distributions are difficult to construct so that their probability distributions match the true target distribution, which in turn hampers the efficiency of the overall MCMC scheme. The MSS entails sampling from an augmented distribution that has more desirable mixing properties than the original target model, while utilizing simple independent proposal distributions that are easily tuned. I will discuss applications of the MSS to sampling from tree-based models (e.g. Bayesian CART; phylogenetic models), and to general model selection, model averaging and predictive sampling.

In the final 10 minutes of the presentation I will discuss my research interests in interactive visual analytics and the Visual To Parametric Interaction (V2PI) paradigm. I'll discuss the general concepts in V2PI with an application of Multidimensional Scaling, its technical merits, and the integration of such concepts into core statistics undergraduate and graduate programs.
New methods for estimating spectral clustering change points for multivariate time series

Ivor Cribben

University of Alberta

Date: Wednesday 19 October 2016

NOTE day and time of this seminar
Spectral clustering is a computationally feasible and model-free method widely used in the identification of communities in networks. We introduce a data-driven method, namely Network Change Points Detection (NCPD), which detects change points in the network structure of a multivariate time series, with each component of the time series represented by a node in the network. Spectral clustering allows us to consider high dimensional time series where the number of time series is greater than the number of time points. NCPD allows for estimation of both the time of change in the network structure and the graph between each pair of change points, without prior knowledge of the number or location of the change points. Permutation and bootstrapping methods are used to perform inference on the change points. NCPD is applied to various simulated high dimensional data sets as well as to a resting state functional magnetic resonance imaging (fMRI) data set. The new methodology also allows us to identify common functional states across subjects and groups. Extensions of the method are also discussed. Finally, the method promises to offer a deep insight into the large-scale characterisations and dynamics of the brain.
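A minimal version of the spectral step, partitioning the nodes of an absolute-correlation network by the sign of the Fiedler vector of the normalised Laplacian, can be sketched as follows. This is a generic two-community illustration (all names are ours); NCPD's change-point machinery is not shown:

```python
import numpy as np

def spectral_bipartition(ts):
    """Split multivariate time-series nodes into two communities via
    spectral clustering on the absolute-correlation network.

    ts : (T, p) array with one column per node.
    """
    W = np.abs(np.corrcoef(ts.T))       # similarity = |correlation|
    np.fill_diagonal(W, 0.0)
    d = W.sum(axis=1)
    # symmetric normalised Laplacian L = I - D^{-1/2} W D^{-1/2}
    L = np.eye(len(d)) - W / np.sqrt(np.outer(d, d))
    eigvals, eigvecs = np.linalg.eigh(L)
    fiedler = eigvecs[:, 1]             # eigenvector of second-smallest eigenvalue
    return (fiedler > 0).astype(int)    # community labels from the sign pattern
```

Repeating this within candidate segments of the series, and comparing the resulting partitions, is the intuition behind detecting change points in network structure.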
Inverse prediction for paleoclimate models

John Tipton

Colorado State University

Date: Tuesday 18 October 2016

NOTE day and time of this seminar
Many scientific disciplines have strong traditions of developing models to approximate nature. Traditionally, statistical models have not included scientific models and have instead focused on regression methods that exploit correlation structures in data. The development of Bayesian methods has generated many examples of forward models that bridge the gap between scientific and statistical disciplines. The ability to fit forward models using Bayesian methods has generated interest in paleoclimate reconstructions, but there are many challenges in model construction and estimation that remain.

I will present two statistical reconstructions of climate variables using paleoclimate proxy data. The first example is a joint reconstruction of temperature and precipitation from tree rings using a mechanistic process model. The second reconstruction uses microbial species assemblage data to predict peat bog water table depth. I validate predictive skill using proper scoring rules in simulation experiments, providing justification for the empirical reconstruction. Results show forward models that leverage scientific knowledge can improve paleoclimate reconstruction skill and increase understanding of the latent natural processes.
Ultrahigh dimensional variable selection for interpolation of point referenced spatial data

Benjamin Fitzpatrick

Queensland University of Technology

Date: Monday 17 October 2016

NOTE day and time of this seminar
When making inferences concerning the environment, ground truthed data will frequently be available as point referenced (geostatistical) observations accompanied by a rich ensemble of potentially relevant remotely sensed and in-situ observations.
Modern soil mapping is one such example characterised by the need to interpolate geostatistical observations from soil cores and the availability of data on large numbers of environmental characteristics for consideration as covariates to aid this interpolation.

In this talk I will outline my application of Least Absolute Shrinkage and Selection Operator (LASSO) regularized multiple linear regression (MLR) to build models for predicting full cover maps of soil carbon when the number of potential covariates greatly exceeds the number of observations available (the p > n or ultrahigh dimensional scenario). I will outline how I have applied LASSO regularized MLR models to data from multiple (geographic) sites and discuss investigations into treatments of site membership in models and the geographic transferability of the models developed. I will also present novel visualisations of the results of ultrahigh dimensional variable selection and briefly outline some related work in ground cover classification from remotely sensed imagery.

Key references:
Fitzpatrick, B. R., Lamb, D. W., & Mengersen, K. (2016). Ultrahigh Dimensional Variable Selection for Interpolation of Point Referenced Spatial Data: A Digital Soil Mapping Case Study. PLoS ONE, 11(9): e0162489.
Fitzpatrick, B. R., Lamb, D. W., & Mengersen, K. (2016). Assessing Site Effects and Geographic Transferability when Interpolating Point Referenced Spatial Data: A Digital Soil Mapping Case Study.
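A LASSO-regularized linear regression of the kind described above can be fitted with cyclic coordinate descent; the l1 penalty drives most coefficients exactly to zero, which is what makes it usable when p > n. This is a generic textbook sketch (function name and defaults are ours), not the authors' digital soil mapping pipeline:

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """LASSO via cyclic coordinate descent, minimising
    (1/(2n)) * ||y - X b||^2 + lam * ||b||_1.

    Each pass soft-thresholds one coefficient at a time while keeping
    a running residual r = y - X b."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.copy()
    for _ in range(n_iter):
        for j in range(p):
            # partial correlation of column j with the residual (plus own term)
            rho = X[:, j] @ r / n + col_sq[j] * b[j]
            b_new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            r += X[:, j] * (b[j] - b_new)   # update residual in place
            b[j] = b_new
    return b
```

With suitably chosen lam (typically by cross-validation), only the covariates with genuine predictive signal retain nonzero coefficients, even when the candidate covariate set is far larger than the sample.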