The aim of this practical seminar (7 CP) is to carry out a complete project pipeline, from the problem statement to a working solution, using machine learning methods on our Deep Learning cluster. The topics are proposed by different groups of the Mathematics and Computer Science departments and cover, for example, computational linguistics, bioinformatics, computer vision, computer graphics, language processing, and optimization.
Each topic is supervised by the group that proposed it. Up to three students work on one topic. The project seminar consists of three parts. In the first part, the student group gets acquainted with the topic and the input data and researches potential solutions for the given problem. The second part is the implementation and testing of solutions for the problem. In the third part, at the end of the course, the groups present their topics and solutions in a seminar.
|Registration *||from April 10th, 2018 to April 17th, 2018 HERE|
|Kick-off meeting||April 19th 2018, 10:00-11:00, Building E 2.1 Room 2.06|
|Deadline to register in HISPOS OR de-register from seminar *||May 10th, 2018|
|Lecture dates||General introduction: April 20th 2018, 9:00-10:00, Building E 2.1 Room 2.06
Technical introduction: April 27th 2018, 9:00-10:00, Building E 2.1 Room 2.06|
|Intermediate Progress Meeting||May 18th 2018, 8:00-10:00, Building E 2.1 Room 2.06|
|Deadline for handing in results of implementation||July 13th, 2018|
|Presentations||July 20th 2018, 10:15-12:30, Building E 2.1 Room 2.06|
|Final report deadline||July 27th 2018|
* If you want to deregister from the seminar, please send the tutor an email, regardless of whether you (de)registered in HISPOS.
Requirements for participation:
- at least one course in Machine Learning or Statistical Learning or Neural Networks: Implementation and Applications
Certificate requirements:
- Successful presentation:
  - Talk: 30 minutes
  - Questions from the tutors/audience after the presentation
- Taking minutes during the practical part to make clear which student worked on which part of the project.
- Handing in a final report after the presentation along with the minutes of the practical part.
Grading:
- Based on the given presentation (see “Certificate requirements”)
- May be influenced by the submitted report and handling of the practical part
|1||Prof. Keller||miRNA target prediction||miRNAs are small non-coding RNA molecules that regulate gene expression post-transcriptionally. The prediction of miRNA targets is still challenging and relies on feature engineering and extraction. Deep learning approaches may help to resolve these issues by learning complex features directly from the data. One of the proposed approaches uses auto-encoders to learn a new mRNA/miRNA representation from the sequence data and feeds it into a subsequent neural network for target classification. The goal of the project is to implement the described model using Keras and TensorFlow in Python. The model should be trained, tuned, and validated on the given data. Moreover, different model structures should be tested, e.g. CNN (convolutional neural network) and RNN (recurrent neural network) layers.
|Saheli De||Fridays, 9:00-10:00|
|2||Prof. Klakow||New types of word representations||Representing words as vectors has been extremely popular since the advent of tools like word2vec. However, vectors have poor (that is, unrealistic) compositional properties. Consider “Only he told me that …” vs. “He told only me that …”. Both versions use the same words but have very different meanings. Adding up the vectors representing the words in this example gives the same result for both versions, because vector addition is commutative while composing words into a sentence is not. The simplest possible extension is to represent words as matrices. This has the advantage that matrix multiplication is non-commutative, just like word composition. Computing sentence meaning by matrix multiplication is therefore more realistic. The goal of this project is to develop a word2mat tool and check its properties on standard language modelling tasks.||Mossad Helali
Muhammad Ehtisham Ali
|3||Mario Fritz & Yang He||Depth-aware dilated convolution networks for RGBD semantic segmentation of Traffic Scene||Understanding street scenes is a key ingredient to the success of autonomous and assisted driving. One widely used formulation of this task is a pixel-wise labeling into relevant semantic classes such as road, car, pedestrian and so on. State-of-the-art methods rely on contextual information for highly accurate predictions in realistic conditions. For example, dilated convolutions are widely used in semantic segmentation because they provide a large receptive field while preserving feature map resolution. In addition, it has been shown that dilation factors in convolutions can be learned automatically for better context modeling. However, that work exploits only color information to learn the dilation factors. When depth is easy to capture, depth information is potentially more effective than the color image for determining the size of regions or objects. Therefore, this project will investigate the use of depth information for learning more suitable dilation factors in RGBD semantic segmentation of traffic scenes.
|4||Vera Demberg & Wei Shi||Domain-adaptation for neural discourse relation classifiers.||Discourse relation classification is the task of determining how two different sentences relate to one another. For instance, in a pair of sentences like “It’s cold. The radiator is broken.”, human comprehenders will usually infer a causal relation between the two sentences. Being able to automatically calculate such inferences is an important step in deep language understanding; however, the task is very challenging when explicit cues like “because”, “but” or “however” are not present in the text.
Current state-of-the-art systems in automatic discourse relation classification rely on neural networks, e.g. LSTMs. Due to the relatively small amount of training data, it is important to train models such that they don’t just learn idiosyncrasies of the training data, but also generalize well to new domains. The central goal of this project is to apply neural discourse relation classifiers to a new out-of-domain dataset to establish a baseline, and then improve over this baseline by using neural domain adaptation methods as well as additional weakly labelled data.
|5||Tim Kehl & Kerstin Lenhof||Predicting the sensitivity of cancer drugs||Cancer is a heterogeneous class of diseases that are caused by an interplay of various genetic and environmental factors and that can be characterized by a common set of features, known as the Hallmarks of Cancer. The high genotypic and phenotypic diversity among tumors makes the treatment of malignant tumors a grand challenge. As a remedy, optimal therapies have to be determined in a personalized manner based on the in-depth characterization of each tumor's genetic and epigenetic makeup.
In this project, machine learning techniques should be used to predict the sensitivity of certain drugs based on genetic and epigenetic markers of tumors. To this end, we use the Genomics of Drug Sensitivity in Cancer Project (GDSC) dataset, which contains measurements of gene expression, genetic aberrations and methylation for over 1000 cancer cell lines and their response to 250 anti-cancer drugs.
|Hasan Md Tusfiqur Alam
Anilkumar Erappanakoppal Swamy
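
The commutativity argument in topic 2 (new types of word representations) can be made concrete with a small sketch. The 2-dimensional vectors and 2x2 matrices below are hypothetical toy values chosen for illustration only; they are not learned representations and not part of any word2mat tool.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Hypothetical toy representations (assumed values, not trained embeddings).
word_vectors = {"only": [1, 0], "he": [0, 1], "told": [1, 1], "me": [2, 1]}
word_matrices = {
    "only": [[1, 1], [0, 1]],
    "he":   [[1, 0], [1, 1]],
    "told": [[2, 0], [0, 1]],
    "me":   [[0, 1], [1, 0]],
}

def compose_by_addition(words):
    """Sentence representation as the sum of word vectors (order-insensitive)."""
    total = [0, 0]
    for w in words:
        total = [x + y for x, y in zip(total, word_vectors[w])]
    return total

def compose_by_multiplication(words):
    """Sentence representation as the left-to-right product of word matrices."""
    product = [[1, 0], [0, 1]]  # start from the identity matrix
    for w in words:
        product = matmul(product, word_matrices[w])
    return product

sentence_a = ["only", "he", "told", "me"]
sentence_b = ["he", "told", "only", "me"]

print(compose_by_addition(sentence_a), compose_by_addition(sentence_b))
print(compose_by_multiplication(sentence_a), compose_by_multiplication(sentence_b))
```

With these toy values, both word orders sum to the same vector `[4, 3]`, while the matrix products differ (`[[1, 4], [1, 2]]` vs. `[[2, 2], [3, 2]]`), so only the matrix composition distinguishes the two sentences.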
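
For topic 3, the claim that dilated convolutions enlarge the receptive field while keeping the feature map resolution can be illustrated in one dimension. This is a minimal sketch with a fixed integer dilation and zero padding; the function names are hypothetical, and it does not implement the learned or depth-aware dilation factors that the project itself is about.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """1D dilated convolution (cross-correlation form), zero-padded so that
    the output has the same length as the input."""
    k = len(kernel)
    span = (k - 1) * dilation            # distance between first and last kernel tap
    pad_left = span // 2
    padded = [0.0] * pad_left + list(signal) + [0.0] * (span - pad_left)
    return [
        sum(kernel[j] * padded[i + j * dilation] for j in range(k))
        for i in range(len(signal))
    ]

def receptive_field(kernel_size, dilations):
    """Receptive field (in input samples) of stacked stride-1 convolutions."""
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf

impulse = [0, 0, 0, 0, 1, 0, 0, 0, 0]
kernel = [1, 1, 1]

dense = dilated_conv1d(impulse, kernel, dilation=1)
dilated = dilated_conv1d(impulse, kernel, dilation=2)

# Positions where the single input impulse influences the output.
print([i for i, v in enumerate(dense) if v != 0])
print([i for i, v in enumerate(dilated) if v != 0])
print(receptive_field(3, [1, 1, 1]), receptive_field(3, [1, 2, 4]))
```

Both outputs keep the input length of 9. With dilation 2 the same three-tap kernel reaches positions 2, 4 and 6 instead of 3, 4 and 5, and three stacked layers with dilations 1, 2 and 4 cover 15 input samples instead of 7, without any downsampling.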