Education
- Ph.D. in Statistics, University of Chicago, 2020 (expected)
- B.A. in Mathematics and Economics (Music Minor), Cornell University, 2013
Awards
- Davis Wallace Award for Applied Statistics, 2018
- Scholarship for Summer Institutes at the University of Washington in Seattle, 2017
Research
- De-mystifying Drop-outs in Single Cell UMI Data - Kim, T., Zhou, X., and Chen, M.
- Dynamic Gene Coexpression Analysis with Correlation Modeling - Kim, T. and Nicolae, D.
- Multivariate Bayesian Analysis with Incomplete Data: Application to Local Ancestry Effects on Admixed Transcriptome - Kim, T. and Nicolae, D.
- Minimizing confounders and increasing data quality in murine models for studies of the gut microbiome - Miyoshi, J. et al.
Software
- HIPPO (Heterogeneity Inspired Pre-Processing Tool), available on Bioconductor: HIPPO
- Multivariate Missing Bayesian Variable Selection, available on CRAN: MMVBVS
- Differential Network Analysis: diffNet
- Clustering Noisy Single Cell Data: SCNoisyClustering
- Change Point Analysis for Copy Number Variation: CopyNumberCellShift
Talks
- University of Chicago Consulting Seminar
- Application of Neural Networks to Predicting Winter Wheat Yield
- University of Chicago, Medical School, Section of Genetic Medicine
- Autoencoders with Parametric Noise Models for De-Noising scRNA-seq Data
Other Projects
- Noisy Data Clustering, advised by Mengjie Chen
- Designed a similarity learning algorithm to cluster highly noisy, zero-inflated data and applied it to single-cell sequencing data
- Copy Number Variation Change Point Analysis, advised by Mengjie Chen
- Devised an alternating descent algorithm combining the group fused lasso and a mixed effects model to detect copy number alterations in cancer cells
- Ancestry-eGenes, advised by Dan Nicolae
- Found genes that are differentially expressed based on local ancestry and genotype across 44 human tissues in African Americans and European Americans; found enrichment in the immunity-related region
- Microbiome Data Analysis, consulting team leader
- Advised on a manuscript revision for a medical journal to ensure sound analysis of microbiome data, particularly modeling within- and between-group variance while accounting for complex batch effects
- Effects of Maternal Language Use in Children’s Brain Development, consulting team leader
- Led a consulting team to provide statistical analysis for complex correlated data for a psychologist’s post-doctoral project
- Computing Variance across Large Data Sets, advised by Lars Vilhuber
- Manually implemented map-reduce for 3.3 TB of census data with several million time series while avoiding memory issues and ensuring numerical stability
Work experience
- Zurich North America, 2019
- Data Scientist Summer Intern
- Built a neural network model to predict winter wheat yields for Multi-Peril Crop Insurance, reducing RMSE by 10% compared to the existing GBM model
- Conducted extensive research to evaluate the current state of the data for explaining the variability of crop yields
- Hanwha Life Insurance, 2013 - 2014
- Actuarial Associate
- Participated as a member of the product design team
- Revised contracts to comply with amendments to the national tax system
- LnB Prep, 2014 - 2015
- Instructor
- Taught SAT, ACT, and TOEFL
- Developed strategy textbooks for the new SAT
Skills
- Statistical Modeling and Inference
- Coding Languages
- R, Rcpp, Python, Matlab, Java, bash, Julia (basic)
- Other Technical Skills
- Git, TeX, Linux/Unix, HTML, MS Office
- Languages
- English, Korean
Coursework
- Theoretical Statistics
- Distribution Theory (STAT 304)
- Mathematical Statistics 1 (STAT 301)
- Mathematical Statistics 2: Bayesian Analysis and Principles (STAT 302)
- Nonparametric Inference (STAT 374)
- Multiple Testing, Modern Inference, and Replicability (STAT 308)
- Applied Statistics
- Applied Linear Statistical Methods (STAT 343)
- Design and Analysis of Experiments (STAT 345)
- Generalized Linear Models (STAT 347)
- Computational Biology: Models and Inference (STAT 354)
- Statistical Genetics (STAT 355)
- Computational Statistics
- Mathematical Computation I: Matrix Computation (STAT 309)
- Mathematical Computation II: Nonlinear Optimization (STAT 310)
- Machine Learning (STAT 377)
- Machine Learning and Cancer (CMSC 337)
- Machine Learning and Large-Scale Data Analysis (STAT 376)
Teaching
I served as a course assistant in Stat 331, Stat 226, and Stat 200 and was in charge of grading and answering the students’ questions. For Stat 234, I served as the lead TA who was in charge of the overall organization of the course.
- Stat 331 Sample Survey (Autumn 2016, Autumn 2017, Autumn 2018)
- Stat 234 Statistical Models and Methods I (Spring 2016, Spring 2017)
- Stat 226 Analysis of Categorical Data (Winter 2017, Winter 2019)
- Stat 200 Elementary Statistics (Winter 2015)
Other Activities
- Cornell University Chorus, 2011 - 2013
- Rockefeller Chapel Choir, 2015 - Present