Seminar: Machine Learning (Summer Term 2019) – Chair for Clinical Bioinformatics

The aim of this practical seminar (7 CP) is to carry out a complete machine learning project, from the problem statement to a working solution, on our Deep Learning cluster. The topics are proposed by different groups of the Mathematics and Computer Science departments and cover, for example, computational linguistics, bioinformatics, computer vision, computer graphics, language processing, and optimization. Each topic is supervised by the group that proposed it, and up to three students work on one topic. The project seminar consists of three parts. In the first part, the student group gets acquainted with the topic and the input data and researches potential solutions to the given problem. The second part is the implementation and testing of solutions. In the third part, at the end of the course, the groups present their topics and solutions in a seminar.
Organizer: Dr. Pedro Guimarães (see below for the project supervisor/tutor)
Dates:

Registration *	HERE from April 16th to April 23rd, 2019
Kick-off meeting	April 25th, 2019, 10:00-11:00, Building E 2.1 Room 2.06
Deadline to register in HISPOS OR de-register from seminar *	May 16th, 2019
Lecture dates	General introduction: April 26th, 2019, 9:00-10:00, Building E 2.1 Room 2.06; Technical introduction: May 3rd, 2019, 9:00-10:00, Building E 2.1 Room 2.06
Intermediate Progress Meeting	May 24th, 2019, 10:00, Building E 2.1 Room 2.06
Deadline for handing in results of implementation	July 12th, 2019
Presentations	July 19th, 2019, 10:00, Building E 1.1 Room 2.06
Final report deadline	July 26th, 2019

* If you want to deregister from the seminar, please send the tutor an email, regardless of whether you (de)registered in HISPOS.

Requirements for participation:

at least one course in Machine Learning or Statistical Learning or Neural Networks: Implementation and Applications

Certificate requirements:

Successful presentation:
- Talk: 30 minutes
- Questions from the tutors/audience after the presentation
Taking minutes during the practical part to make clear which student worked on which part of the project.
Handing in a final report after the presentation along with the minutes of the practical part.

Final grade:

Based on the given presentation (see “Certificate requirements”)
May be influenced by the submitted report and handling of the practical part

Topics

#	Supervisor	Topic	Participants

1	Peter Minko	Non-occlusive mesenteric ischemia (NOMI) detection
1	Our data set consists of 10,000 patients who underwent heart and/or thorax surgery. From these patients we extracted more than 12 million laboratory values to be analyzed with regard to non-occlusive mesenteric ischemia (NOMI). The disease is graded by a scoring system: 0 points = no NOMI, 1-3 points = mild signs, and 4-7 points = severe signs of NOMI. This scoring system has been validated by previous data and publications. There are several questions we want to pursue: 1. Correlation of the severity of NOMI with the lab values; 2. How the lab values change over time: pre-operative (earliest value), post-operative (24 h after the operation), and 24 h, 48 h, and 72 h after the last angiography; 3. Correlation of the indication (emergency vs. planned operation) with the lab values; 4. Correlation of re-bleeding of the thorax with the lab values pre-operatively and on the first post-operative day.
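The scoring system described above can be written down as a small lookup, which may be handy when turning raw scores into class labels. This is only a minimal sketch: the function name `nomi_grade` and the label strings are hypothetical, and the only facts taken from the description are the score ranges (0, 1-3, 4-7).

```python
def nomi_grade(score: int) -> str:
    """Map a NOMI score (0-7) to the severity grade from the topic description.

    The label strings "none", "mild", and "severe" are illustrative choices,
    not part of the validated scoring system.
    """
    if not 0 <= score <= 7:
        raise ValueError(f"NOMI score must be between 0 and 7, got {score}")
    if score == 0:
        return "none"      # 0 points: no NOMI
    if score <= 3:
        return "mild"      # 1-3 points: mild signs
    return "severe"        # 4-7 points: severe signs


if __name__ == "__main__":
    for s in range(8):
        print(s, nomi_grade(s))
```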

2	Veit Flockerzi & Claudia Fecher-Trost	Comparison of proteomes derived from primary trophoblast cells of wild-type (WT) and Trpv6 gene-deficient mice (Trpv6-/- or Trpv6KO)
2	TRPV6 channels selectively conduct calcium ions (Fecher-Trost et al. (2013); Störger & Flockerzi (2014)) and are expressed during pregnancy in the maternal decidua and the fetal labyrinth of the placenta (Fecher-Trost et al. (2019)). As we have recently shown, TRPV6 channels are necessary for proper embryonic and bone development (Fecher-Trost et al. (2019)): homozygous Trpv6KO embryos that develop in a Trpv6KO mother accumulate less calcium and have reduced bone mineralization and altered bone biomechanics that persist into adulthood. This suggests a causal relationship between the functional TRPV6 channel in the placenta and bone development. We have already shown that calcium uptake in isolated trophoblasts, which are part of the fetal placental labyrinth, is greatly reduced in Trpv6KO animals. In addition, the morphology and cell-cell contacts of the trophoblasts of the placental labyrinth are reduced compared to the WT. This suggests a correlation between functional trophoblast TRPV6 channels, calcium uptake, and placental labyrinth morphology. We isolated trophoblasts from wild-type (WT) and Trpv6KO placentae (n = 5 each) of pregnant mice, examined their growth behavior in cell culture, and analyzed them by mass spectrometry (nanoLC-MS/MS). Although the amount of protein and the number of spectra obtained were almost equal for both groups, the overall number of identified proteins was 2875 for WT trophoblasts and 1776 for Trpv6KO trophoblasts. We would like to get answers to the following questions: 1. Can proteins be identified (qualitatively and quantitatively) that show whether the protein endowment of wild-type and Trpv6KO trophoblasts differs? Or are there only individual differences between the proteomes, independent of the respective genotype, i.e. wild-type or Trpv6KO? 2. If there are differences between the two categories (WT vs. Trpv6KO), which parameters differ, and are these parameters related to cellular functions such as cell adhesion, cell-cell contacts, hormonal regulation, transport, angiogenesis, calcium homeostasis/signaling, or metabolism? 3. Is it possible to draw any conclusions regarding underlying changes in the cellular pathways of trophoblasts in Trpv6KO animals?

3	Klaus Illgner & Sunil Jaiswal	Robust depth estimation	Samim Taray
3	The company K|Lens GmbH is a spin-off of Saarbrücken University and the Max-Planck Institute for Computer Science (https://www.k-lens.de/). We are developing a light field lens and the corresponding software (http://resources.mpi-inf.mpg.de/KaleidoCam/). A core module of our software is a robust depth estimation algorithm. We are working on different approaches, one of which is based on machine learning. Over the past months, we have analyzed various solutions and consider the FlowNet 2.0 algorithm to be the most promising basis (https://lmb.informatik.uni-freiburg.de/Publications/2017/IMSKDB17/). We now want to optimize this algorithm for our hardware, accelerate it, and train the network for our very specific camera logic. Ideally, the algorithm will then be used within our photographic product as well as to solve industrial challenges. We will fully support the project and are interested in concrete results.

4	Matthias Mueter	Sugar beet plants segmentation	Natalia Sergeeva, Vladimir Bokor
4	Due to public pressure, the use of herbicides in farming will be reduced in the near future. An alternative is mechanical weed hoeing. The main problems of mechanical weed control within the row are the online detection of plant positions and the distinction between weed and crop (https://www.youtube.com/watch?v=TFL2K1tyat0). The goal of the project is to obtain the coordinates (local horizontal and vertical position in pixels) of each sugar beet in the image, that is, of the plant center (where the plant grows out of the soil). As a secondary target, we want to distinguish between weed and sugar beet plants.

5	Mustafa Kahraman	Identification of tumor patients with Deep Learning approaches based on microRNAs measured in blood samples	Harry Ritchie, Hui-Syuan Yeh, Koushik Chowdhury
5	Over the last 10 years, the number of studies on body fluid diagnostics and on research into diverse diseases has increased. Interest is focused especially on the early identification of tumor patients, since the diagnostic methods currently applied work only for diseases at an advanced stage, which is in most cases too late for a successful curative treatment. While the majority of the studies deal with messenger RNAs as biomarkers, the community working with microRNAs (miRNAs) is becoming more and more established. Current research shows that there is huge diagnostic potential in the regulatory behavior of miRNAs. In bioinformatics, this is usually analyzed with classical machine learning methods and hypothesis tests. In recent years, Deep Learning has become one of the newest trends in data science and is applied in different fields. However, only a few groups use these new methods in miRNA research. The aim of this project is to find out whether Deep Learning approaches can find sets of biomarkers that distinguish well enough between tumor and non-tumor patients, or even outperform classical machine learning methods on the same dataset. The dataset consists of over 3000 patients, of whom 500 have tumors.
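With roughly one tumor patient in six, the dataset is imbalanced, so a classifier that always predicts "non-tumor" would already look fairly accurate. One common way to counter this is to weight classes inversely to their frequency. The sketch below illustrates the idea under loud assumptions: it treats the dataset as exactly 3000 patients with 500 tumor cases (the description only says "over 3000"), and the function name `balanced_class_weights` and the label strings are hypothetical. The formula `n_samples / (n_classes * count)` is the usual "balanced" heuristic and is not claimed to be the method the supervisors intend.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return a weight per class, inversely proportional to class frequency.

    Uses the common heuristic n_samples / (n_classes * class_count).
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Assumed round numbers for illustration only: 3000 patients, 500 with tumors.
labels = ["tumor"] * 500 + ["non-tumor"] * 2500
weights = balanced_class_weights(labels)

if __name__ == "__main__":
    print(weights)
```

Such weights could then be passed to a weighted loss function, so that errors on the rarer tumor class count more during training.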

6	Vera Demberg & Xudong Hong	Coherent Visual Storytelling
6	Storytelling is one of the oldest activities of language use. Automatic storytelling has been a research focus in modern artificial intelligence. An interesting variant of this task is visual storytelling, generating stories given visual input like images or videos (Huang et al., 2016). State-of-the-art models can generate short stories given five consecutive photos (Wang et al., 2018). However, these models do not consider whether the generated stories are coherent. One problem in the output of current storytelling systems is that referring expressions are often not used correctly (e.g., a pronoun has no antecedent to resolve to, as when the text talks about “he” but no male person has previously been mentioned). Some recent works on coherent language generation integrate coherence models, which explicitly represent the entities that have been previously introduced, into the storytelling task (Clark et al., 2018; Xu et al., 2018; Fan et al., 2018). In this project, we would like to apply these coherent storytelling methods to the visual storytelling task so that we can generate coherent stories given photo sequences. A second potential direction is to include some measure of coherence in the objective during training (Nguyen et al., 2017). One of our baseline models makes use of reinforcement learning for visual storytelling, which provides a test-bed for different rewards. We aim to design an auxiliary reward that is sensitive to text coherence to improve this baseline.

7	Pedro Guimarães	Parkinson’s Disease diagnosis from SPECT imaging	Anna Olah, Egla Hajdini, Jayesh Mahapatra
7	There are no quantitative, objective biomarkers for the diagnosis and progression of Parkinson’s Disease (PD), the second most common neurodegenerative disorder. It is a lifelong condition that heavily impacts individuals and their families. Symptom-based evaluation hinders both daily clinical practice and research into novel pharmaceutical therapies. Several studies have reported high misdiagnosis rates, and early stages often go undiagnosed. In this work we will pursue the automatic diagnosis of PD from single-photon emission computed tomography (SPECT) imaging using convolutional neural networks. Furthermore, we will take advantage of a comprehensive longitudinal dataset to search for progression biomarkers.

8	Tobias Fehlmann	RNAfold using Deep Learning
8	Computationally fast and efficient prediction of RNA secondary structure in dot-bracket notation and minimum free energy computation based on deep neural networks. Example input/output sequence: In: UAGCUUAUCAGACUGAUGUUGACUGUUGAAUCUCAUGGCAACACCAGUCGAUGGGCUGUC Out: (((((((((.(((((.(((((.((((.((...))))))))))).)))))))))))))).. Webserver: http://rna.tbi.univie.ac.at//cgi-bin/RNAWebSuite/RNAfold.cgi Toolsuite (ViennaRNA package): https://www.tbi.univie.ac.at/RNA/
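In dot-bracket notation, each "(" pairs with a matching ")" and each "." marks an unpaired base, so a structure string has the same length as its sequence. As a minimal sketch of how such a string can be decoded and validated (for example, to check network outputs), the hypothetical helper `parse_dot_bracket` below uses a stack to recover the base pairs. It handles only the three characters "(", ")", and "." and is not part of the ViennaRNA package.

```python
def parse_dot_bracket(structure: str) -> dict[int, int]:
    """Return a dict mapping each paired position to its partner (0-based).

    Raises ValueError if the string is unbalanced or contains characters
    other than '(', ')', and '.'.
    """
    stack = []   # positions of currently open '('
    pairs = {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            j = stack.pop()
            pairs[i] = j
            pairs[j] = i
        elif ch != ".":
            raise ValueError(f"invalid character {ch!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return pairs


if __name__ == "__main__":
    # Toy hairpin: two base pairs enclosing a three-base loop.
    print(parse_dot_bracket("((...))"))
```

A predicted structure that makes this function raise is not a valid secondary structure, which gives a cheap sanity check before any energy computation.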

9	Tim Kehl	Multi-task learning for drug sensitivity prediction	Naajil Khan, Trang Do
9	Cancer is a heterogeneous class of diseases that are caused by an interplay of various genetic and environmental factors and that can be characterized by a common set of features, known as the Hallmarks of Cancer. The high genotypic and phenotypic diversity among tumors makes the treatment of malignant tumors a grand challenge. As a remedy, optimal therapies have to be determined in a personalized manner based on an in-depth characterization of each tumor's genetic and epigenetic makeup. In the last few years, several research projects have shown that neural networks, and especially deep learning techniques, can successfully be used to predict the sensitivity of tumor cells to individual drugs based on their molecular characteristics. In this project, deep learning techniques will be used to develop and apply a multi-task learning approach to this problem. To this end, we use the Genomics of Drug Sensitivity in Cancer Project (GDSC) dataset, which contains measurements of gene expression, genetic aberrations, and methylation for over 1000 cancer cell lines and their response to 250 anti-cancer drugs.