RESEARCH HIGHLIGHTS

The aims are detailed below:

1. Identify a gene signature for response to immunotherapy in melanoma

Immune checkpoint inhibitor (CPI) therapy is currently at the frontier of metastatic cancer treatment. However, the objective response rate remains very low. We shall first analyze whole-genome sequencing data from UKBB to reveal variants, and then genes, relevant to tumor mutation burden. Next, mimicking the process by which immune cells kill tumor cells, we shall identify the immune cell subtype (e.g., plasma cells) and its unique signature most associated with response to CPI by analyzing bulk and single-cell RNA-seq data together with clinical data from melanoma patients. A two-stage penalized regression model will be trained on the signature and tested. Finally, spatial transcriptomic and clinical data will be analyzed to refine the signature. A prediction score based on the signature genes will be derived and tested on multiple datasets.


2. Predict disease occurrence using multi-site biobanks

In integrating biobanks, the primary challenges are policy and the heterogeneity among biobanks. Neural network (NN) modeling is highly flexible in handling the heterogeneity and complex data structures found in biobanks, and has been widely used in various areas. Google and many other researchers use tokens (discrete pieces of data) to analyze electronic health records (eHR). Most of these studies rely on single-institution databases; however, the same method could also be applied to heterogeneous biobanks. For instance, site bias can be modeled as a separate variable or encoded in the tokens themselves. We aim to predict the progression of chronic kidney disease (CKD) and to explore the tokenization technique. First, we will work on the large eHR database from Chang Gung Hospital, which has rich information on hematology tests, medication history, and demographic details. Second, we will apply our method to UKBB and TWBB. Lastly, we will investigate the heterogeneity across biobanks by comparing UKBB with TWBB.
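
The two ways of handling site bias mentioned above can be illustrated with a minimal sketch. It is not the project's actual pipeline: the function names (`tokenize_record`, `build_vocab`), the token formats (`SITE=...`, `TYPE:code`), and the example event codes are all hypothetical, chosen only to show the difference between a separate site token and a site fused into every token.

```python
from typing import Dict, List


def tokenize_record(site: str, events: List[Dict[str, str]],
                    site_in_token: bool = False) -> List[str]:
    """Turn one patient's events into a sequence of discrete tokens.

    site_in_token=False: the site is emitted once, as its own leading token
                         (site modeled as a separate variable).
    site_in_token=True:  the site is fused into every event token
                         (site included in the token).
    The token formats here are illustrative assumptions.
    """
    tokens = [] if site_in_token else [f"SITE={site}"]
    for event in events:
        base = f"{event['type']}:{event['code']}"
        tokens.append(f"{site}|{base}" if site_in_token else base)
    return tokens


def build_vocab(sequences: List[List[str]]) -> Dict[str, int]:
    """Assign an integer id to every distinct token; id 0 is reserved for padding."""
    vocab = {"<pad>": 0}
    for seq in sequences:
        for tok in seq:
            vocab.setdefault(tok, len(vocab))
    return vocab


# Hypothetical events for one patient: a lab result and a medication.
events = [{"type": "LAB", "code": "eGFR_low"}, {"type": "MED", "code": "ACEi"}]

separate = tokenize_record("UKBB", events)                    # site as its own token
fused = tokenize_record("TWBB", events, site_in_token=True)   # site inside each token
vocab = build_vocab([separate, fused])
ids = [vocab[tok] for tok in separate]  # integer ids that an NN embedding layer would consume
```

With a separate site token, event tokens are shared across biobanks and the vocabulary stays small; fusing the site into each token lets the model learn site-specific meanings for the same code, at the cost of a vocabulary that grows with the number of sites.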


3. Unravel the statistical relations among genotype and imaging features and diverse phenotypes

We aim to automatically extract relevant features from multimodal data and to unravel associations among the genotypes, images, and phenotypes in UKBB and TWBB. We shall establish a causal chain from genetic variations to structural/functional variations, and then to macroscopic/medical phenotypes. For instance, associations between specific genetic variations and Alzheimer’s disease can be mediated by variations in the sizes of certain brain areas. Using deep neural network (NN) models such as convolutional neural networks (CNN) and autoencoders, we shall retrieve the abstract features relevant for fitting the data in the mediation analysis, and then map these abstract features back onto interpretable image features.
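
The mediation idea can be made concrete with a minimal sketch of the classical product-of-coefficients approach under linear models. This is a simplification and an assumption on our part: the aim itself concerns NN-derived features, whereas here the mediator is a single scalar (e.g., the size of one brain area). The function name `mediation_effects` and the toy numbers are hypothetical.

```python
from statistics import mean
from typing import Dict, Sequence


def _cov(u: Sequence[float], v: Sequence[float]) -> float:
    """Population covariance of two equal-length sequences."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)


def mediation_effects(x: Sequence[float], m: Sequence[float],
                      y: Sequence[float]) -> Dict[str, float]:
    """Linear mediation: genotype x -> mediator m -> phenotype y.

    Fits  m = a*x + e1  and  y = direct*x + b*m + e2  by ordinary least squares.
    The indirect (mediated) effect is the product a*b.
    """
    sxx, smm, sxm = _cov(x, x), _cov(m, m), _cov(x, m)
    sxy, smy = _cov(x, y), _cov(m, y)
    a = sxm / sxx
    # Solve the 2x2 normal equations for the regression of y on x and m.
    det = sxx * smm - sxm ** 2
    direct = (sxy * smm - smy * sxm) / det
    b = (smy * sxx - sxy * sxm) / det
    return {"a": a, "b": b, "indirect": a * b,
            "direct": direct, "total": a * b + direct}


# Toy data (invented for illustration): allele counts, a mediator, a phenotype.
x = [0, 1, 2, 0, 1, 2]
m = [1, 1, 4, -1, 3, 4]
y = [0.5 * xi + 3 * mi for xi, mi in zip(x, m)]
effects = mediation_effects(x, m, y)
```

In this linear setting the total effect of `x` on `y` decomposes into the direct effect plus the indirect effect `a * b`. With NN-extracted features, the scalar mediator would be replaced by a learned representation and this simple decomposition would no longer apply directly.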