Biography

Ye Zheng is an NIH/NHGRI K99/R00 fellow and a tenure-track Assistant Professor in the Bioinformatics and Computational Biology Department of the University of Texas MD Anderson Cancer Center. Dr. Zheng received her postdoctoral training at the Fred Hutchinson Cancer Center from both molecular biology and quantitative modeling perspectives, mentored by Dr. Steven Henikoff and Dr. Raphael Gottardo. She has also established close collaborations with Dr. Cameron Turtle and Dr. Evan Newell to decipher variations in CAR-T cell immunotherapy response. Before her postdoctoral training, Dr. Zheng received a Ph.D. in Statistics from the University of Wisconsin-Madison under the supervision of Dr. Sündüz Keleş. Her dissertation centered on statistical modeling of three-dimensional chromatin structure (3D genomics) for promoter-enhancer inference.

At MD Anderson Cancer Center, Dr. Zheng leads a quantitative research group dedicated to statistical modeling and computational pipeline development using bulk and single-cell transcriptomics, proteomics, epigenomics, and 3D genomics data to address biological and clinical challenges. Her wet lab specializes in the epigenomic profiling of formalin-fixed, paraffin-embedded (FFPE) samples.

The goal of Dr. Zheng’s research group is to solve biologically and clinically important, methodologically challenging problems by developing innovative statistical models. The group is actively hiring at all levels, including but not limited to postdocs, Ph.D. students, Master's students, technicians, and undergraduate or graduate interns, and is open to discussion and collaboration. This highly interdisciplinary group looks forward to being inspired and motivated by novel and intriguing problems in other disciplines.

Hiring Projects

  1. [3D genomics] Investigating the three-dimensional chromatin organization and the long-range gene regulation through multimodality integrative modeling and accompanying software development, using data such as scHi-C and scRNA-seq.
  2. [Epigenomics] RNA PolII profiling on formalin-fixed paraffin-embedded (FFPE) samples provides a cost-effective and robust approach to generating critical data for cancer research and motivates new association and prediction models with patient phenotypes. This project also has a wet lab training option and is actively hiring a technician.
  3. [Epigenomics] Single-cell epigenomics data are known for their ultra-sparsity. Denoising and imputation models are needed to gain useful cell information and integrate across epigenomic markers.
  4. [Proteomics] Cell surface protein measurement can provide deeper and standardized single-cell cell-type annotations and status descriptions. The project integrates CITE-seq and cytometry data across studies and platforms for joint disease analysis.
  5. [Immunotherapy and Multi-omics] Statistical modeling and computational analysis of immunological and immunotherapeutic studies using multi-omics bulk and single-cell genomics data to decipher key genotypic and phenotypic features that drive efficacy versus toxicity in CAR-T cell immunotherapy.
  6. [ML] Genomic-disease association machine learning modeling that leverages the unprecedented resolution of single-cell data to unravel the genomic underpinnings of disease phenotypes and therapeutic responses.

Please send your CV/resume, a brief cover letter describing your relevant experience and motivations, a GitHub link to the repository or any other materials that can best demonstrate your programming skills, and any related research manuscripts/writing samples (if applicable) to yzheng8@mdanderson.org.

Interests
  • Statistical Genomics
  • Computational Biology
  • Multi-omics
  • Immunotherapy and Cancer Study
Education
  • Ph.D. in Statistics - Minor in Quantitative Biology, 2019

    University of Wisconsin - Madison

  • B.E. in Statistics, 2014

    Renmin University of China

Professional Experience

 
 
 
 
 
Fred Hutchinson Cancer Center
Postdoctoral Research Fellow
Nov 2019 – Sep 2024 Seattle, WA, USA

Integrative modeling of bulk and single-cell transcriptomics, epigenomics and proteomics:

  • Developed an FFPE epigenomic landscape processing and analysis pipeline. Proposed a customized normalization method for hypertranscription tumor samples.
  • Developed a normalization method, ADTnorm, for CITE-seq data to remove technical batch effects and facilitate data integration across studies.
  • Constructed a data processing and analysis pipeline for CUT&RUN and CUT&Tag data and created a spike-in-free normalization method for multiple-sample comparison.
  • Developed statistical models and computational tools for integrative analysis of single-cell 3D genomics, transcriptomics and epigenomics for gene cis-regulatory mechanism discovery.

Wet lab experimental training:

  • Learned and implemented epigenomic experiments, including FFPE-CUTAC, CUT&Tag, scCUT&Tag, scMulTI-Tag, to collect data for multiple ongoing projects in the Henikoff lab.

Chimeric antigen receptor T (CAR-T) Cell Immunotherapy:

  • Multi-omics integrative analysis to profile the genomic signatures of CAR-T cell therapy products from transcriptomics and epigenomics perspectives using CITE-seq, CUT&RUN and scCUT&Tag data.
  • Constructed tree-based machine learning models to detect genomic features associated with CAR-T cell immunotherapy efficacy and toxicity.

Statistical consulting for genomics and biomedical studies:

  • Statistical support for single-cell RNA-seq analysis in Hsieh Lab, trajectory analysis in Prlic Lab, and collaborative analysis of cytotoxic HDAC inhibition with Seattle Children’s Hospital.
 
 
 
 
 

Dissertation Research:

  • Developed a biologically motivated hierarchical generative model to investigate 3D chromatin architectures using Hi-C data and investigated the genomic features involving repetitive regions of the genome.
  • Developed a computational tool for fast simulation of 3D proximity ligation sequencing data.
  • Constructed hierarchical testing to detect differential 3D genome interactions with precise False Discovery Rate control.
  • Investigated protein-DNA interactions residing in repetitive regions and integrated multi-mapping reads into Encyclopedia of DNA Elements (ENCODE) ChIP-seq data processing pipeline.

Collaborative Work with the Bresnick Lab:

  • Leveraged multi-omics analysis, particularly using ATAC-seq and RNA-seq data, to reveal the GATA/Heme regulation mechanism controlling hemoglobin synthesis and erythrocyte development.
  • Investigated the impact of single nucleotide mutation in the Ets motif of GATA2 enhancer on its function to control hematopoiesis through a comprehensive transcriptomic differential analysis.
 
 
 
 
 
Business Analysis, IBM
Data Scientist Intern
Feb 2014 – Jun 2014 Beijing, China

Projects:

  • Developed dynamic text mining model using IBM communication database to infer topic networks.
  • Constructed a modified Latent Dirichlet Allocation model to optimize the CPU usage of IBM servers.
 
 
 
 
 
School of Information, Renmin University of China
Project Assistant
Feb 2012 – Jan 2014 Beijing, China
Developed a multi-objective operations research model to improve proposal grouping accuracy and efficiency, using the Multi-Objective Particle Swarm Optimization (MOPSO) algorithm for optimization.
 
 
 
 
 
Department of Biology, Mathematics and Statistics, University of Ottawa
Research Assistant
Jun 2013 – Sep 2013 Ottawa, Canada
Investigated Approximate Bayesian Computation (ABC), ABC–Markov Chain Monte Carlo and ABC–Sequential Monte Carlo samplers in estimating the transmission networks of viruses in human populations.
 
 
 
 
 
Department of Statistics and Actuarial Science, The University of Hong Kong
Exchange Study
Sep 2012 – Jan 2013 Hong Kong, China

Publications

+: co-first authors, ++: co-corresponding authors

Epigenomics and FFPE

  1. Henikoff S+, Zheng Y+, Paranal R, Xu Y, Greene J, Henikoff J, Russell Z, Szulzewsky F, Thirimanne H, Kugel S, Holland E, Ahmad K. RNA Polymerase II at histone genes predicts outcome in human cancer. Under review at Science. (2024) A previous version is available at: https://www.biorxiv.org/content/10.1101/2024.02.28.582647v3.

  2. Henikoff S, Zheng Y, Ahmad K. Mitotic errors do not explain aneuploidy in cancer. Under review at Trends in Genetics. (2024)

  3. [Book] Savonen et al. Choosing Genomics Tools. (2023) Chapter 19 CUT&RUN and CUT&Tag. Full author list. Online book chapter.

  4. Wu S, Furlan S, Mihalas A, Kaya-Okur H, Feroze H, Emerson S, Zheng Y, Carson K, Cimino P, Keene C, Holland E, Sarthy J, Gottardo R, Ahmad K, Henikoff S, Patel A. [Single-cell CUT&Tag analysis of chromatin modifications in differentiation and tumor progression](https://doi.org/10.1038/s41587-021-00865-z). Nature Biotechnology. 2021.

  5. Zheng Y, Ahmad K, Henikoff S. CUT&Tag Data Processing and Analysis Tutorial. Protocols.io. (2020) https://www.protocols.io/view/cut-amp-tag-data-processing-and-analysis-tutorial-e6nvw93x7gmk/v1. (17,459 views, 4,355 exports, and 239 questions)

  6. Liao R+, Zheng Y+, Liu X, Zhang Y, Seim G, Tanimura N, Wilson G, Hematti P, Coon J, Fan J, Xu J, Keleş S++ and Bresnick E++. Discovering How Heme Controls Genome Function Through Heme-omics. Cell Reports. 2020.

  7. Zeng X, Li B, Welch R, Rojo C, Zheng Y, Dewey CN, Keleş S. Perm-seq: Mapping Protein-DNA Interactions in Segmental Duplication and Highly Repetitive Regions of Genomes with Prior-enhanced Read Mapping. PLoS Computational Biology. 2015.

Three-dimensional chromatin interaction and long-range gene regulation

  1. Zheng Y+, Shen S+, Keleş S. Normalization and de-noising of single-cell Hi-C data with BandNorm and scVI-3D. Accepted by Genome Biology. 2022.

  2. Shen S, Zheng Y++, Keleş S++. scGAD: single-cell gene associating domain scores for exploratory analysis of scHi-C data. Bioinformatics. 2022.

  3. Cheng J, Clayton J, Acemel R, Zheng Y, Taylor R, Keleş S, Harley J, Quail E, Gómez-Skarmeta J and Ulgiati D. Regulatory architecture of the RCA gene cluster captures an intragenic TAD boundary and enhancer elements in B cells. Frontiers in Immunology, section B Cell Biology. 2022.

  4. Zheng Y++, Zhou P, Keleş S++. FreeHi-C Spike-in Simulations for Benchmarking Differential Chromatin Interaction Detection. Methods. 2021.

  5. Huang K, Wu Y, Shin J, Zheng Y, Siahpirani A, Lin Y, Ni Z, Chen J, You J, Keleş S, Wang D, Roy S, Lu Q. Transcriptome-wide transmission disequilibrium analysis identifies novel risk genes for autism spectrum disorder. PLOS Genetics. 2021.

  6. Zheng Y, Keleş S. FreeHi-C simulates high-fidelity Hi-C data for benchmarking and data augmentation. Nature Methods. 2020.

  7. The ENCODE Project Consortium, et al. Expanded Encyclopedias of DNA Elements in the Human and Mouse Genomes. Nature. 2020.

  8. The ENCODE Project Consortium, Snyder, M.P., Gingeras, T.R., Moore, J.E., Weng, Z., Gerstein, M.B., Ren, B., Hardison, R.C., Stamatoyannopoulos, J.A., Graveley, B.R., Feingold, E.A. and Pazin, M.J. Perspectives on ENCODE. Nature. 2020.

  9. Zheng Y, Ay F, Keleş S. Generative modeling of multi-mapping reads with mHi-C advances analysis of Hi-C studies. eLife. 2019.

Single-cell Proteomics

  1. Zheng Y+, Caron D+, Kim J, Jun S, Tian Y, Florian M, Stuart K, Sims P, Gottardo R. ADTnorm: Robust Integration of Single-cell Protein Measurement across CITE-seq Datasets. BioRxiv. Under review at Nature Communications. (2024) https://www.biorxiv.org/content/10.1101/2022.04.29.489989v2.

Statistical Modeling and Computational Analysis of Immunological and Immunotherapeutic Studies

  1. Fiorenza S, Zheng Y, Purushe J, Bock T, Sarthy J, Janssens D, Sheih A, Kimble E, Kirchmeier D, Phi T, Gauthier J, Hirayama A, Riddell S, Wu Q, Gottardo R, Maloney D, Yang J, Henikoff S, Turtle C. Histone marks identify novel transcription factors that parse CAR-T subset-of-origin, clinical potential and expansion. Accepted at Nature Communications. (2024)

  2. Germanos AA, Arora S, Zheng Y, Goddard ET, Coleman IM, Ku AT, Wilkinson S, Amezquita RA, Zager M, Long A, Yang YC, Bielas J, Gottardo R, Ghajar C, Nelson P, Sowalsky A, Setty M, Hsieh A. Defining cellular population dynamics at single cell resolution during prostate cancer progression. eLife. 2022.

  3. Hirayama AV, Zheng Y, Dowling MR, Sheih A, Phi TD, Kirchmeier DR, Chucka AW, Gauthier J, Maloney DG, Gottardo R, Turtle CJ. Long-Term Follow-up and Single-Cell Multiomics Characteristics of Infusion Products in Patients with Chronic Lymphocytic Leukemia Treated with CD19 CAR-T Cells. Blood. 2021.

  4. Vitanza N, Biery M, Myers C, Ferguson E, Zheng Y, Girard E, Przystal J, Park G, Noll A, Pakiam F, Winter C, Morris S, Sarthy J, Cole B, Leary S, Crane C, Lieberman N, Mueller S, Nazarian J, Gottardo R, Brusniak M, Mhyre A, Olson J. Optimal therapeutic targeting by HDAC inhibition in biopsy-derived treatment-naïve diffuse midline glioma models. Neuro-Oncology. 2021.

  5. Soukup AA, Zheng Y, Mehta C, Liu P, Hofmann I, Zhou Y, Zhang J, Choi K, Johnson KD, Keleş S, Bresnick EH. Single-nucleotide human disease mutation inactivates a blood-regenerative GATA2 enhancer. Journal of Clinical Investigation. 2019.

  6. Tanimura N, Liao R, Wilson GM, Dent MR, Cao M, Burstyn JN, Hematti P, Liu X, Zhang Y, Zheng Y, Keleş S, Xu J, Coon J, Bresnick E. GATA/Heme Multi-omics Reveals a Trace Metal-dependent Cellular Differentiation Mechanism. Developmental Cell. 2018.

Software

  • ADTnorm: R package providing normalization and integration tools for CITE-seq cell surface protein measurement.

  • scGAD: R package for extracting three-dimensional chromatin interactions at the unit of genes and facilitating the integration of single-cell 3D genomics with other single-cell modalities.

  • scVI-3D: Python pipeline for normalization and de-noising of single-cell Hi-C data using deep generative modeling.

  • BandNorm: R package for fast band normalization for single-cell Hi-C data. (Co-developer)

  • FreeHiC Spike-In: FreeHi-C Python pipeline with a user/data-driven spike-in module to allow a comprehensive comparison of differential chromatin interaction detection methods where the ground truth differential chromatin interactions are known.

  • FreeHiC: Python pipeline using FRagment Interactions Empirical Estimation method for fast simulation of Hi-C and other 3D proximity ligation sequencing data. Major computing parts are accelerated by C.

  • mHiC: Python pipeline implementing a multi-mapping strategy for Hi-C data by probabilistically assigning reads originating from repetitive regions. Major computing parts are accelerated by C.

  • permseq: R package for mapping protein-DNA interactions in highly repetitive regions of the genome with prior-enhanced read mapping.

  • permseqExample: R package for illustration and demo runs of the permseq package. Smaller raw data and demo R scripts are provided for quick runs to get to know the permseq package.

Computing Skills

R

Daily Usage

shell

Daily Usage

Python

Optional if you excel at R

Rmarkdown/Jupyter Notebook

Reproducible Report

git

Reproducible Research

Grid/Distributed computing systems

Daily Usage

Teaching and Mentoring

Teaching

  • STAT 423/623 - Probability Bioinformatics and Genetics (Spring 2024 Guest Lecturer at Rice University):

    Gave a lecture to statistics and biostatistics graduate and undergraduate students about statistical analysis in single-cell genomics.

  • STAT 877 - Statistical Methods for Molecular Biology (Fall 2020 Guest Lecturer at University of Wisconsin - Madison):

    Gave a lecture to statistics and biostatistics graduate students about 3D genomics and long-range gene regulation.

  • AMSI BioInfoSummer (Winter 2019 Workshop Lecturer at University of Sydney, Australia)

    Gave a workshop to the faculty and students attending the AMSI BioInfoSummer conference about basic concepts of 3D Genomics and how to do computational data processing and statistical modeling in a practical manner. Led interactive computational group work to process real Hi-C data using Google Box.

  • STAT 998 - Statistical Consulting (Fall 2019 Guest Lecturer at University of Wisconsin - Madison):

    Led lectures discussing real-world consulting problems with statistics graduate students, using traditional and modern statistical tools.

  • STAT 877 - Statistical Methods for Molecular Biology (Spring 2019 Guest Lecturer at University of Wisconsin - Madison):

    Gave a lecture to statistics and biostatistics graduate students about 3D genomics and long-range gene regulation.

  • 2017-2018 Single-cell Technologies Journal Club (Organizer and Instructor at University of Wisconsin - Madison):

    Gave lectures about single-cell research topics, such as scRNA-seq, scATAC-seq and scHi-C, to graduate students and postdocs with statistics backgrounds, and led paper review discussions.

  • 2017-2018 Three-dimensional Chromatin Interactions Journal Club (Organizer and Instructor at University of Wisconsin - Madison):

    Gave lectures about 3D chromatin architecture research topics to graduate students and postdocs with statistics backgrounds, and led paper review discussions.

  • STAT 301 - Introduction to Statistical Methods (Fall 2014 Guest Lecturer for Discussion Sections at University of Wisconsin - Madison):

    Led undergraduate student discussions on solving hypothesis testing and statistical estimation problems.

Mentoring

  • Long Nguyen, Bioinformatics Analyst I at Fred Hutchinson Cancer Center, currently Master's student at University of Michigan (Feb. 2022 to July 2024):

    Single-cell transcriptomics and proteomics integrative analysis for cell atlas construction of CAR-T cell therapy CITE-seq data and association of gene and protein markers with clinical responses.

  • Siqi Shen, Ph.D. Candidate at UW-Madison (June 2020 to Dec. 2023):

    Co-mentor with Dr. Sündüz Keleş on single-cell 3D chromatin organization normalization and integrative analysis with single-cell transcriptomics and epigenomics.

  • Fanding Zhou, VISP student at UW-Madison, currently Ph.D. student at UC Berkeley (June 2020 to Sep. 2021):

    Co-mentor with Dr. Sündüz Keleş on constructing tree-based statistical models for false discovery rate control in differential detection of 3D chromatin organization.

  • Olivia Rae Steidl, Summer Undergraduate Student at University of Wisconsin - Madison, currently Ph.D. student at University of Wisconsin - Madison (Summer 2019):

    Co-mentor with Dr. Sündüz Keleş on investigating poly(UG) tails at the ends of RNAs and their function in humans using eCLIP-seq data.

Professional Activity

Journal Reviewer

  • Genome Medicine

  • Science Advances

  • Nature Biotechnology

  • Briefings in Bioinformatics

  • Scientific Reports

  • eLife

  • Bioinformatics

  • PLOS Computational Biology

  • BMC Bioinformatics

  • Life Science Alliance

  • Annals of Applied Statistics

  • Computational and Structural Biotechnology Journal

Committee

  • Grant Review Committee: 2024 Fred Hutchinson Cancer Center TDS IRC Postdoctoral Fellowship and TDS IRC Pilot Award

  • Program Committee: 2024 Regulatory and Systems Genomics Conference with DREAM Challenges (RSGDREAM2024)

Blogs

Download data from GEO
Download data from GEO using the Linux command line.