Welcome to my personal website!

I am an Associate Professor in Econometrics and Statistics and James S. Kemper Foundation Faculty Scholar at the University of Chicago Booth School of Business.

My work brings together statistics and machine learning to analyze and develop tools for learning from large datasets.

My research interests reside at the intersection of Bayesian and frequentist statistics, and include: machine learning, variable selection, optimization, non-parametric methods, factor models, high-dimensional decision theory and inference.

News!

I received the 2020 NSF CAREER Award. The Faculty Early Career Development (CAREER) Program is a Foundation-wide activity that offers the National Science Foundation's most prestigious awards in support of early-career faculty who have the potential to serve as academic role models in research and education and to lead advances in the mission of their department or organization. Activities pursued by early-career faculty should build a firm foundation for a lifetime of leadership in integrating education and research. More info here.

Attention!

Interested in working with me?


I am inviting applications for a (post-doctoral) Principal Research Professional. I am looking for exceptional doctoral-level candidates with a strong statistics background.

Job Description


Contact Information

Veronika.Rockova@ChicagoBooth.edu
369 Charles M. Harper Center
5807 South Woodlawn Avenue
Chicago, IL 60637

Publications and Manuscripts

NEW!

  • Ideal Bayesian Spatial Adaptation
    Rockova V. and Rousseau J. (2021)
    Submitted
    link
  • Metropolis-Hastings via Classification
    Kaji T. and Rockova V. (2021)
    Journal of the American Statistical Association, Theory and Methods (In Revision)
    link
  • The Bayesian Bootstrap Spike-and-Slab LASSO
    Nie L. and Rockova V. (2020)
    Journal of the American Statistical Association, Theory and Methods (In Revision)
    link
  • The Art of BART: On Flexibility of Bayesian Forests
    Jeong S. and Rockova V. (2020)
    Manuscript
    link
  • Dynamic Sparse Factor Analysis
    McAlinn K., Saha E. and Rockova V. (2018+)
    Journal of Applied Econometrics (In Revision)
    pdf

Statistics and Machine Learning

  • Uncertainty Quantification for Bayesian CART
    Castillo I. and Rockova V. (2021)
    The Annals of Statistics (In Press)
    link
  • Adaptive Bayesian SLOPE: Model Selection with Incomplete Data
    Jiang W., Bogdan M., Josse J., Majewski S., Miasojedow B., Rockova V. and TraumaBase Group (2021)
    Journal of Computational and Graphical Statistics (In Press)
    link
  • Variable Selection via Thompson Sampling
    Liu Y. and Rockova V. (2021)
    Winner of SBSS 2020 Student Paper Competition awarded by ASA
    Journal of the American Statistical Association, Theory and Methods (In Press)
    link
  • ABC Variable Selection with Bayesian Forests
    Liu Y., Rockova V. and Wang Y. (2021)
    Journal of the Royal Statistical Society, Series B (In Press)
    pdf
  • The Median Probability Model and Correlated Variables
    Barbieri M., Berger J., George E. and Rockova V. (2020)
    Bayesian Analysis (In Press)
    link
  • Regularization via Bayesian Penalty Mixing
    Comment on: Ridge Regularization: An Essential Concept in Data Science by Trevor Hastie
    George E. and Rockova V. (2020)
    Technometrics (62), 438-442
    link
  • Spike-and-Slab Meets the LASSO: A Review of the Spike-and-Slab LASSO
    Bai R., George E. and Rockova V. (2020)
    Handbook on Bayesian Variable Selection (In Press)
    link
  • Determinantal Priors for Bayesian Variable Selection
    Rockova V. and George E. (2020)
    Statistics in the Public Interest - In Memory of Stephen E. Fienberg (In Press)
    link
  • Spike-and-Slab LASSO Biclustering
    Moran G., Rockova V. and George E. (2020)
    The Annals of Applied Statistics (15), 148-173
    link
  • On Semi-parametric Inference for BART
    Rockova V. (2020)
    37th International Conference on Machine Learning (119), 8137–8146
    pdf
  • Uncertainty Quantification for Sparse Deep Learning
    Wang Y. and Rockova V. (2020)
    23rd Conference on Artificial Intelligence and Statistics (108), 298–308
    pdf
  • Dynamic Variable Selection with Spike-and-Slab Process Priors
    Rockova V. and McAlinn K. (2020)
    Bayesian Analysis (16), 233-269
    pdf
  • Posterior Concentration for Bayesian Regression Trees and Forests
    Rockova V. and van der Pas S. (2020)
    The Annals of Statistics (48), 2108-2131
    pdf | Supplement
  • On Theory for BART
    Rockova V. and Saha E. (2019)
    22nd Conference on Artificial Intelligence and Statistics (89), 2839–2848
    pdf
  • Posterior Concentration for Sparse Deep Learning
    Polson N. and Rockova V. (2018)
    32nd Annual Conference on Neural Information Processing Systems (NeurIPS)
    pdf
  • Simultaneous Variable and Covariance Selection with the Multivariate Spike-and-Slab Lasso
    Deshpande S., Rockova V. and George E. (2019)
    Journal of Computational and Graphical Statistics (28), 921–931
    link
  • On Variance Estimation for Bayesian Variable Selection
    Moran G., Rockova V. and George E. (2019)
    Bayesian Analysis (14), 1091–1119
    pdf | Supplement
  • Particle EM for Variable Selection
    Rockova V. (2018)
    Journal of the American Statistical Association, Theory and Methods (113), 1684-1697
    pdf | supplement
  • The Spike-and-Slab LASSO
    Rockova V. and George E. (2018)
    Journal of the American Statistical Association, Theory and Methods (113), 431-444
    pdf | supplement
  • Bayesian Estimation of Sparse Signals with a Continuous Spike-and-Slab Prior
    Rockova V. (2018)
    The Annals of Statistics (46), 401-437
    pdf | supplement
  • Bayesian Dyadic Trees and Histograms for Regression
    van der Pas S. and Rockova V. (2017)
    31st Annual Conference on Neural Information Processing Systems (NeurIPS)
    pdf
  • Hospital Mortality Rate Estimation for Public Reporting
    George E., Rockova V., Rosenbaum P., Satopaa V. and Silber J. (2017)
    Journal of the American Statistical Association, Applications (112), 933-947
    link
  • Fast Bayesian Factor Analysis via Automatic Rotations to Sparsity
    Rockova V. and George E. (2016)
    Journal of the American Statistical Association, Theory and Methods (111), 1608-1622
    pdf | Supplement
  • Determinantal Regularization for Ensemble Variable Selection
    Rockova V., Moran G. and George E. (2016)
    19th International Conference on Artificial Intelligence & Statistics
    pdf
  • Bayesian Penalty Mixing: The Case of a Non-separable Penalty
    Rockova V. and George E. (2015)
    Statistical Analysis for High-Dimensional Data - The Abel Symposium 2014, Springer Series
    pdf
  • EMVS: The EM Approach to Bayesian Variable Selection
    Rockova V. and George E. (2014)
    Journal of the American Statistical Association, Theory and Methods (109), 828-846
    link
  • Negotiating Multicollinearity with Spike-and-Slab Priors
    Rockova V. and George E. (2014)
    Metron (72), 217-229
    link
  • Incorporating Grouping in Bayesian Variable Selection with Applications in Genomics
    Rockova V. and Lesaffre E. (2014)
    Bayesian Analysis (9), 221-258
    link
  • Hierarchical Bayesian Formulations for Selecting Variables in Regression Models
    Rockova V., Lesaffre E., Luime J. and Lowenberg B. (2012)
    Statistics in Medicine (31), 1221-1237
    link

Public Health and Biomedical

  • Improving Medicare's Hospital Compare Mortality Model
    Silber, J. H., Satopaa, V. A., Mukherjee, N., Rockova, V., Wang, W., Hill, A., Even-Shoshan, O., Rosenbaum, P. R., and George, E. (2016)
    Health Services Research Journal
  • Risk-stratification of Intermediate-risk Acute Myeloid Leukemia: Integrative Analysis of a Multitude of Gene Mutation and Expression Markers
    Rockova V., Abbas S., Wouters B.J., Erpelinck C., Beverloo B., Delwel R., van Putten W., Lowenberg B. and Valk P. (2011)
    Blood (118), 1069-1076
  • The Prognostic Relevance of miR-212 Expression with Survival in Cytogenetically and Molecularly Heterogeneous AML
    Sun S., Rockova V., Bullinger L., Dijkstra M., Dohner H., Lowenberg B., Jongen-Lavrencic M. (2013)
    Leukemia (27), 100-106
  • Mutant DNMT3A: a Marker of Poor Prognosis in Acute Myeloid Leukemia
    Ribeiro A., Pratcorona M., Erpelinck C., Rockova V., Sanders M., Abbas S., Figueroa M., Zeilemaker Z., Melnick A., Lowenberg B., Valk P. and Delwel R. (2012)
    Blood (119), 5824-5831
  • Retroviral Integration Mutagenesis in Mice and Comparative Analysis in Human AML Identify Reduced PTP4A3 Expression as a Prognostic Indicator
    Beekman E., Valkhof M., Erkeland S., Taskesen E., Rockova V., Peeters J., Valk P., Lowenberg B. and Touw I. (2011)
    PLoS ONE 6(10), e26537
  • Deregulated Expression of EVI1 Defines a Poor Prognostic Subset of MLL-Rearranged Acute Myeloid Leukemias
    Groschel S., Schlenk R., Engelmann J., Rockova V., Teleanu V., Kuhn M., Eiwen K., Erpelinck C., Havermans M., Lubbert M., Germing U., Schmidt-Wolf I., Beverloo B., Schuurhuis G., Bargetzi M., Krauter J., Ganser A., Valk P., Lowenberg B., Dohner K., Dohner H., Delwel R. (2013)
    Journal of Clinical Oncology 31(1), 95-103

Refereed Proceedings

  • Fast Bayesian Factor Analysis with the Indian Buffet Process
    Rockova V. and George E. (2014)
    47th Scientific Meeting of Italian Statistical Society
  • Dual Coordinate Ascent EM for Bayesian Variable Selection
    George E., Rockova V., Lesaffre E. (2013)
    28th International Workshop in Statistical Modeling, ISBN: 978-88-96251-47-8, 165-171
  • Sparse Bayesian Factor Regression Approach to Genomic Data Integration
    Rockova V. and Lesaffre E. (2013)
    28th International Workshop in Statistical Modeling, ISBN: 978-88-96251-47-8, 337-343
  • Incorporating Prior Biological Knowledge in Bayesian Modeling of Sparse Networks
    Rockova V. and Lesaffre E. (2012)
    27th International Workshop in Statistical Modeling, ISBN: 978-80-263-0250-6, 291-296

"Machinarium"


BB-SSL

This R package implements BB-SSL (Bayesian Bootstrap Spike-and-Slab LASSO) from Nie and Rockova (2021). BB-SSL is an approximate posterior sampling strategy for the Spike-and-Slab LASSO.

Download Here

See documentation and examples.
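
For readers unfamiliar with the Bayesian bootstrap that gives BB-SSL its name, here is a minimal Python sketch of the generic idea: reweight the observations with random Dirichlet(1, ..., 1) weights and recompute a statistic under each reweighting. This is not the BB-SSL algorithm or the package's code. The function names and the toy data are made up for illustration, and the statistic is a plain weighted mean rather than a Spike-and-Slab LASSO fit.

```python
import random


def dirichlet_uniform_weights(n, rng):
    """Draw Dirichlet(1, ..., 1) weights by normalizing n independent Exp(1) draws."""
    draws = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]


def bayesian_bootstrap_mean(data, num_draws, seed=0):
    """Return num_draws weighted means, one per random reweighting of the data."""
    rng = random.Random(seed)
    samples = []
    for _ in range(num_draws):
        weights = dirichlet_uniform_weights(len(data), rng)
        samples.append(sum(w * x for w, x in zip(weights, data)))
    return samples


# Hypothetical toy data, used only to exercise the functions above.
data = [2.1, 3.4, 1.9, 5.0, 4.2]
draws = bayesian_bootstrap_mean(data, num_draws=2000)
print("average of the weighted means:", sum(draws) / len(draws))
```

Each element of `draws` is one approximate posterior draw of the mean. The package documentation describes how BB-SSL applies this kind of reweighting to the Spike-and-Slab LASSO.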

TVS

This R package implements TVS (Thompson sampling for variable selection) from Liu and Rockova (2021). TVS is a reinforcement learning algorithm for Bayesian subset selection.

Download Here

See examples.

EMVS

An R package, written in C++, implementing the EM algorithm for Bayesian variable selection described in Rockova and George (2014). The software is made available as is, and no warranty - about the software, its performance or its conformity to any specification - is given or implied. Please email me with comments and suggestions. The package can be installed via R CMD build and R CMD INSTALL from a local R library directory.

Download Here

Now available on CRAN!


Check out help(EMVS) for examples.

Spike-and-Slab LASSO

An R package, written in C, implementing coordinate-wise optimization for Spike-and-Slab LASSO priors in linear regression (Rockova and George (2015)). Spike-and-Slab LASSO is a spike-and-slab refinement of the LASSO procedure, using a mixture of Laplace priors indexed by lambda0 (spike) and lambda1 (slab). The SSLASSO procedure fits coefficient paths for Spike-and-Slab LASSO-penalized linear regression models over a grid of values for the regularization parameter lambda0.

Now available on CRAN!


Download Here

Check out help(SSLASSO) for examples.
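
To make the "mixture of Laplace priors" concrete, here is a minimal Python sketch of the prior density on a single coefficient. It is not code from the SSLASSO package. The mixing weight `theta` (the prior probability of the slab) and all numeric values are assumptions chosen for illustration, and the function names are hypothetical.

```python
import math


def laplace_density(beta, lam):
    """Density of a Laplace (double-exponential) distribution with rate lam."""
    return 0.5 * lam * math.exp(-lam * abs(beta))


def ssl_prior_density(beta, lambda0, lambda1, theta):
    """Two-component Laplace mixture: slab (rate lambda1) with weight theta,
    spike (rate lambda0) with weight 1 - theta."""
    slab = laplace_density(beta, lambda1)
    spike = laplace_density(beta, lambda0)
    return theta * slab + (1.0 - theta) * spike


def slab_probability(beta, lambda0, lambda1, theta):
    """Conditional probability that a coefficient value beta came from the slab."""
    slab = theta * laplace_density(beta, lambda1)
    spike = (1.0 - theta) * laplace_density(beta, lambda0)
    return slab / (slab + spike)


# Illustrative values only: a sharp spike (large lambda0) and a diffuse slab (small lambda1).
lambda0, lambda1, theta = 20.0, 1.0, 0.5
for beta in (0.0, 0.1, 0.5, 2.0):
    density = ssl_prior_density(beta, lambda0, lambda1, theta)
    p_slab = slab_probability(beta, lambda0, lambda1, theta)
    print(f"beta={beta:4.1f}  density={density:.4f}  P(slab | beta)={p_slab:.4f}")
```

Coefficient values near zero are attributed almost entirely to the spike, while larger values are attributed to the slab. A larger lambda0 makes the spike sharper.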

Factor Rotations to Sparsity

R code implementing rotations to sparsity in high-dimensional factor models (Rockova and George (2015)). FACTOR ROTATE is a unified Bayesian approach that incorporates factor rotations within the model fitting process, greatly enhancing the effectiveness of sparsity-inducing priors. These automatic transformations are embedded within a new PXL-EM algorithm, a Bayesian variant of parameter-expanded EM for fast posterior mode detection.

Download Here

Particle EM

An R package, written in C, implementing Particle EM of Rockova (2017), a new population-based optimization strategy that harvests multiple modes in search spaces that present many local maxima. Motivated by non-parametric variational Bayes strategies, Particle EM achieves this goal by deploying an ensemble of interactive repulsive particles. These particles are geared towards uncharted areas of the posterior, providing a more comprehensive summary of its topography than simple parallel EM deployments.

Download Here

My Team


Seonghyun Jeong, Ph.D.

Seonghyun is a Senior Research Professional in Econometrics and Statistics at the University of Chicago Booth School of Business. Seonghyun Jeong received his Ph.D. at North Carolina State University in the Department of Statistics under the guidance of Dr. Subhashis Ghosal. He also obtained his M.A. and B.A. in Statistics from Yonsei University in Seoul, Korea. His research interests include Bayesian asymptotic theories, high- and infinite-dimensional inference, and nonparametric Bayes.


Enakshi Saha

Enakshi is a 4th year PhD Student in the Department of Statistics at the University of Chicago. She received BStat and MStat degrees from the Indian Statistical Institute in Kolkata, India. Her research interests primarily include Bayesian statistics, factor analysis, nonparametric and high-dimensional statistical methods, and time series analysis.


Yi Liu

Yi is a 3rd year PhD Student in the Department of Statistics at the University of Chicago. Yi obtained his undergraduate degree at Imperial College London and later a master's degree in statistics from Stanford. Yi is deeply interested in Bayesian methods and the application of Bayesian statistics to fields such as genetics and epidemiology. In his free time, Yi likes to watch documentaries about history and nature.



Lizhen Nie

Lizhen is a 3rd year PhD Student in the Department of Statistics at the University of Chicago. Lizhen obtained her undergraduate degree at Zhejiang University, China. Her research interests primarily include Bayesian methods with a particular emphasis on Bayesian computation and model selection.


Qing Yan

Qing is a 3rd year PhD Student in the Department of Statistics at the University of Chicago. Qing received his Bachelor’s degree in Pure and Applied Mathematics from Tsinghua University. His research interests include nonparametric Bayesian methods, causal inference, computer vision, deep generative models, and invertible neural networks.

Yuexi Wang

Yuexi is a 2nd year PhD student in Econometrics and Statistics at Booth. Yuexi received her master's degree in Statistics at the University of Chicago. Her research interests include Bayesian statistics and machine learning. Prior to UChicago, she obtained a Bachelor’s degree in Mathematics and Applied Mathematics from Zhejiang University (China).





Teaching

Big Data (BUS 41201)

Course Description

BUS 41201 is a course about data mining: the analysis, exploration, and simplification of large high-dimensional datasets. Students will learn how to model and interpret complicated "Big Data" and become adept at building powerful models for prediction and classification. Techniques covered include an advanced overview of linear and logistic regression, model choice and false discovery rates, multinomial and binary regression, classification, decision trees, factor models, clustering, the bootstrap and cross-validation. We learn both basic underlying concepts and practical computational skills, including techniques for analysis of distributed data. Heavy emphasis is placed on analysis of actual datasets, and on development of application-specific methodology. Among other examples, we will consider consumer database mining, internet and social media tracking, network analysis, and text mining.

Syllabus

Teaching Assistants:

Ken McAlinn (kenmcalinn@gmail.com)
Wenxi Li (wenxi.li@chicagobooth.edu)
Jianfei Cao (jcao0@chicagobooth.edu)

Office Hours:

By appointment

Review Sessions:

Saturday at Gleacher.
Instructor: Kenichiro (Ken) McAlinn (Senior Research Professional in Econometrics and Statistics)

R Resources:

Download R, R Project Site, R Studio

Tutorials: Google developer, Princeton, TryR code school, Quick R
Books: R in a Nutshell, Art of R Programming, Library E-Books, Introductory Statistics with R

Piazza link

piazza.com/uchicago/spring2017/busn412010185bigdata/home

First Class Assignment:

Make yourself familiar with R! The course is a fast-paced introduction to a wide variety of statistical learning methods. Knowing the basics of R before you start will make your life much easier and allow you to concentrate your effort on learning data science tools and concepts. If you are new to R, I recommend starting with an R tutorial, such as the TryR tutorial at http://tryr.codeschool.com.

Week 1 : Inference at Scale


Slides


Datasets:


Trucks: pickup.R, pickup.csv
Diabetes: dm2_pvals.R, dm2_fdr.R, diabetes.csv
Cholesterol: lipids.R, jointGwasMc_LDL.txt
Extra Code: fdr.R


Week 2 : Regression


Slides


Datasets:


Orange juice: oj.R, oj.csv
Spam: spam.R, spam.csv
Extra Code: deviance.R

Week 3 : Model Selection


Slides


Datasets:


Comscore: comscore.R, CS2006demographics.csv, CS2006domains.csv.csv, CS2006sites.txt, CS2006totalspend.csv
Semiconductor: semiconductor.R, semiconductor.csv
Extra Code: naref.R

Week 4 : Treatment Effects


Slides


Datasets:


Abortion: abortion.dat, abortion.R, us_cellphone.csv
Paidsearch: paidsearch.csv, paidsearch.R
Extra Code: mab.R

Week 5 : Classification


Slides


Datasets:


Credit: credit.csv, credit.R, data_description
Glass: glass.R
Extra Code: roc.R

Week 6 : Networks


Slides


Datasets:


Marriage: firenze.R, firenze.txt
Karate: karate.R
Lastfm: lastfm.R, lastfm.csv
Websearch: CaliforniaEdges.csv, CaliforniaNodes.txt, websearch.R

Week 7 : Clustering


Slides


Datasets:


Protein: protein.R, protein.csv
Wine: wine.R, wine.csv
We8there: we8there.R
Extra Code: kIC.R


Week 8 : Factor Models


Slides


Datasets:


Protein: protein.R, protein.csv
Rollcall: rollcall_votes.R, rollcall.csv, rollcall-members.csv
NBC: nbc_demographics.csv, nbc_pilotsurvey.csv, nbc_showdetails.csv, nbc.R
Gas: gas.R, gasoline.csv

Week 9 : Trees


Slides


Datasets:


Prostate: prostate_cancer.R, prostate.csv
Mcycle: mcycle.R
Calhomes: CAhousing.csv, calhomes.R