Aarhus-Copenhagen Data Science Meeting 2022

15 December 2022

This meeting will bring together senior and early-career data science researchers from Aarhus University and the University of Copenhagen. Both groups are internationally renowned for their strong research profiles in data science, each with its own focus areas. The core of the workshop is scientific presentations and informal discussions, aimed at fostering exchange between the two groups.


Participation in the event is open to everyone and free of charge. If you would like to participate, please register by November 13th through the following link: google form


Venue: Aarhus University, Aud. G2 (1532-122)


Start: 9:15 End: 17:00
Time Speaker Title Misc
9:00-9:30 Welcome/Registration
9:30-10:10 Asger Hobolth Statistical methods for understanding and decoding evolutionary processes
Abstract: Two of my major research interests are statistical methods for (a) understanding genetic diversity within populations, and (b) decoding mutational processes in cancer evolution. In this talk I will describe the core probability models in these two areas of molecular evolution. In particular, I plan to emphasize recent developments and challenges that could serve as foundations for future collaborations.
10:10-10:50 Dmytro Marushkevych Parametric drift estimation for high-dimensional diffusions
Abstract: This talk is dedicated to the problem of parametric estimation in the diffusion setting and concentrates mostly on properties of the Lasso estimator of the drift component. More specifically, we consider a multivariate parametric diffusion model X observed continuously over the interval [0,T] and investigate drift estimation under sparsity constraints. We allow the dimensions of the model and the parameter space to be large. We obtain an oracle inequality for the Lasso estimator and derive an error bound for the L2-distance using concentration inequalities for linear functionals of diffusion processes. The probabilistic part is based upon elements of empirical process theory and, in particular, on the chaining method. Some alternative estimation procedures, such as the adaptive Lasso and Slope, will also be discussed to give a perspective on improving the obtained results.
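As a toy illustration of the setting, not of the talk's continuous-observation framework or its oracle-inequality analysis, one can discretize a sparse multivariate Ornstein-Uhlenbeck model and recover its drift matrix with an l1-penalized least-squares fit. The model, tuning parameter, and the simple ISTA solver below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

# --- Simulate a sparse 10-dimensional Ornstein-Uhlenbeck process ---
d, T, dt = 10, 200.0, 0.01
n_steps = int(T / dt)
A = np.zeros((d, d))
np.fill_diagonal(A, 1.0)          # sparse drift matrix: dX = -A X dt + dW
A[0, 1] = A[3, 4] = 0.5           # a few off-diagonal interactions
X = np.zeros((n_steps + 1, d))
for k in range(n_steps):
    X[k + 1] = X[k] - A @ X[k] * dt + np.sqrt(dt) * rng.normal(size=d)

# --- Lasso estimate of the drift via proximal gradient (ISTA) ---
def lasso_drift(X, dt, lam, n_iter=500):
    """Estimate B in  dX ~ B X dt + dW  with an l1 penalty on B."""
    Z, Y = X[:-1], np.diff(X, axis=0) / dt   # regress increments on states
    n = len(Z)
    S = Z.T @ Z / n                # empirical second moment of the state
    C = Y.T @ Z / n                # cross-moment of increments and state
    L = np.linalg.eigvalsh(S).max()  # Lipschitz constant of the gradient
    B = np.zeros_like(S)
    for _ in range(n_iter):
        B = B - (B @ S - C) / L                                # gradient step
        B = np.sign(B) * np.maximum(np.abs(B) - lam / L, 0.0)  # soft-threshold
    return -B   # model drift is -A, so the estimate of A is -B

A_hat = lasso_drift(X, dt, lam=0.05)
print(np.round(A_hat, 2))
```

With a long observation horizon the non-zero entries of A are recovered (slightly shrunk by the l1 penalty), and the soft-thresholding step pushes most spurious entries to or near zero.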
10:50-11:10 Coffee break
11:10-11:50 Carsten Chong Statistical inference for rough volatility: Central limit theorems
Abstract: In recent years, there has been substantive empirical evidence that stochastic volatility is rough. In other words, the local behavior of stochastic volatility is much more irregular than that of semimartingales and resembles that of a fractional Brownian motion with Hurst parameter H<0.5. In this talk, we derive a consistent and asymptotically mixed normal estimator of H based on high-frequency price observations. In contrast to previous works, we work in a semiparametric setting and do not assume any a priori relationship between volatility estimators and true volatility. Furthermore, our estimator attains a rate of convergence that is known to be optimal in a minimax sense in parametric rough volatility models. This talk is based on joint work with Marc Hoffmann (Paris Dauphine), Yanghui Liu (Baruch College), Mathieu Rosenbaum and Grégoire Szymanski (both Ecole Polytechnique).
11:50-12:30 Niklas Pfister Distribution Generalization and Identifiability in IV Models
Abstract: Causal models can provide good predictions even under distributional shifts. This observation has led to the development of various methods that use causal learning to improve the generalization performance of predictive models. In this talk, we consider this type of approach for instrumental variable (IV) models. IV models allow us to identify a causal function between covariates X and a response Y, even in the presence of unobserved confounding. In many practical prediction settings, however, the causal function is not fully identifiable. We consider two approaches for dealing with this under-identified setting: (1) adding a sparsity constraint and (2) introducing the invariant most predictive (IMP) model, which deals with the under-identifiability by selecting the most predictive model among all feasible IV solutions. Furthermore, we analyze to which types of distributional shifts these models generalize.
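As background for the IV setting, here is a minimal textbook two-stage least squares sketch in numpy showing how an instrument removes confounding bias. This is a fully identified toy model, not the under-identified setting or the IMP estimator discussed in the talk; all variable names and coefficients are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000
Z = rng.normal(size=n)                       # instrument
H = rng.normal(size=n)                       # unobserved confounder
X = Z + H + 0.5 * rng.normal(size=n)         # covariate, confounded by H
Y = 2.0 * X + H + 0.5 * rng.normal(size=n)   # true causal effect of X on Y is 2

# OLS is biased because H affects both X and Y
beta_ols = (X @ Y) / (X @ X)

# Two-stage least squares: project X on Z, then regress Y on the projection
X_hat = Z * (Z @ X) / (Z @ Z)
beta_iv = (X_hat @ Y) / (X_hat @ X_hat)

print(beta_ols, beta_iv)   # OLS is biased upward; IV recovers ~2
```

The projection of X onto the instrument keeps only the variation in X that is uncorrelated with the confounder, which is why the second-stage regression recovers the causal coefficient.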
12:30-14:00 Lunch
14:00-14:40 Andreas Basse-O'Connor Berry-Esseen Theorem for Functionals of Infinitely Divisible Processes
Abstract: In this talk, we derive Berry-Esseen bounds for non-linear functionals of infinitely divisible processes. More precisely, we consider the convergence rate in the Central Limit Theorem for functionals of heavy-tailed moving averages, including the linear fractional stable noise, stable fractional ARIMA processes, and stable Ornstein-Uhlenbeck processes. Our rates are obtained for the Wasserstein and Kolmogorov distances and depend strongly on the interplay between the process's memory, controlled by a parameter a, and its tail-index, controlled by a parameter b. For example, we obtain the classical n^{-1/2} convergence rate when the tails are not too heavy and the memory is not too strong, more precisely, when a*b>3 or a*b>4 in the Wasserstein and Kolmogorov distance, respectively. Our quantitative bounds rely on a new second-order Poincaré inequality on the Poisson space, which we derive through Stein's method and Malliavin calculus. This inequality improves and generalizes a result by Last, Peccati, and Schulte. The talk is based on joint work with M. Podolskij (University of Luxembourg) and C. Thäle (Ruhr University Bochum).
14:40-15:20 Thomas Mikosch Testing independence of random elements with the distance covariance
Abstract: This is joint work with Herold Dehling (Bochum), Muneya Matsui (Nagoya), Gennady Samorodnitsky (Cornell) and Laleh Tafakori (Melbourne). Distance covariance was introduced by Székely, Rizzo and Bakirov (2007) as a measure of dependence between vectors of possibly distinct dimensions. Since then it has attracted attention in various fields of statistics and applied probability. The distance covariance of two random vectors X, Y is a weighted L2 distance between the joint characteristic function of (X,Y) and the product of the characteristic functions of X and Y. It has the desirable property that it is zero if and only if X, Y are independent. This is in contrast to classical measures of dependence such as the correlation between two random variables: zero correlation corresponds to the absence of linear dependence but does not give any information about other kinds of dependencies. We consider the distance covariance for stochastic processes X, Y defined on some interval and having square integrable paths, including Lévy processes, fractional Brownian motions, diffusions, stable processes, and many more. Since distance covariance is defined for vectors, we consider discrete approximations to X, Y. We show that sample versions of the discretized distance covariance converge to zero if and only if X, Y are independent. The sample distance covariance is a degenerate V-statistic and, therefore, has a rate of convergence which is much faster than the classical square-root n-rates. This fact also shows nicely in simulation studies for independent X, Y in contrast to dependent X, Y.
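The sample version of distance covariance is short to implement. Below is a generic numpy sketch of the Székely-Rizzo V-statistic for vector data, not the discretized-process procedure of the talk; the example data and the dependence pattern are my own choices:

```python
import numpy as np

def dcov_sq(x, y):
    """Sample (squared) distance covariance of paired samples x, y.

    x, y: arrays of shape (n,) or (n, d) with paired observations.
    The V-statistic is ~0 exactly when x and y are independent.
    """
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    # Pairwise Euclidean distance matrices
    a = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    b = np.linalg.norm(y[:, None, :] - y[None, :, :], axis=-1)
    # Double centering: subtract row/column means, add the grand mean
    A = a - a.mean(axis=0) - a.mean(axis=1)[:, None] + a.mean()
    B = b - b.mean(axis=0) - b.mean(axis=1)[:, None] + b.mean()
    return (A * B).mean()

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
y_indep = rng.normal(size=n)               # independent of x
y_dep = x**2 + 0.1 * rng.normal(size=n)    # dependent, yet uncorrelated with x
print(dcov_sq(x, y_indep))  # close to 0
print(dcov_sq(x, y_dep))    # clearly positive
```

The second pair illustrates the point made in the abstract: x and y_dep have (near) zero correlation, but the distance covariance still detects their dependence.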
15:20-15:40 Coffee break
15:40-17:00 PhD Lightning round


Thomas Mikosch (University of Copenhagen)

Dr. Mikosch is a Professor at the Department. He received his Master's degree in Mathematics from TU Dresden (1981), defended his PhD in Probability Theory at St. Petersburg University (1984), and completed his Habilitation at TU Dresden (1990). Before joining the Department on January 1, 2001, he worked at TU Dresden, ETH Zürich, ISOR Wellington, and RUG Groningen.

Asger Hobolth (Aarhus University)
Asger Hobolth is currently a Professor of Data Science at the Department of Mathematics, Aarhus University. He received his PhD in Theoretical Statistics from Aarhus University in 2002. His research interests include computational statistics, genomics, and high-dimensional statistics, among others.

Niklas Pfister (University of Copenhagen)
My research focuses on statistical methodology for complex data structures. Developing statistical tools that help us understand underlying causal mechanisms is becoming more important as our data collection capabilities increase. Much of my work uses notions of causality and stability to model and infer parts of such mechanisms from data. Statistical methodology should be driven by applied problems, a principle I try to reflect in my approach to research.
Andreas Basse-O'Connor (Aarhus University)
I am a Professor of Stochastics at the Department of Mathematics, Aarhus University, Denmark. My research interests are probability theory and in particular stochastic processes.

Dmytro Marushkevych (University of Copenhagen)
I am a postdoc on a Thiele Data Science Fellowship at the Department of Mathematics, University of Copenhagen. I received my PhD in 2019 from Le Mans University, France, under the supervision of Marina Kleptsyna, and from 2019 to 2021 I held postdoctoral positions at Aarhus University and the University of Luxembourg, mentored by Mark Podolskij. My research interests include high-dimensional statistics, asymptotic statistics, statistics under partial observations, and filtering problems, focusing mainly on diffusion and fractional processes.
Special Guest: Carsten Chong (Columbia University)
Since Fall 2020, I have been an Assistant Professor in the Department of Statistics at Columbia University. Prior to this, I did my PhD at the Technical University of Munich under the supervision of Claudia Klüppelberg and Jean Jacod and spent a couple of years as a postdoc in the group of Robert Dalang at EPFL.


  • Claudia Strauch (Aarhus University)
  • Lukas Trottner (Aarhus University)
  • Munir Hiabu (University of Copenhagen)