Alternate title: Message Passing GNNs are all you need for solving PDEs
-----

__Summary on GraphCast: Learning skillful medium-range global weather forecasting **by Nils**__
GraphCast rivals traditional numerical medium-range weather forecasting methods using a GNN approach. It takes the world's weather data, represented as 0.25-degree grid points across the Earth, as its input. It then autoregressively learns to predict the weather in 6-hour time steps via a message-passing GNN architecture. The most distinctive part of this architecture is that the grid points are mapped onto a so-called multi-mesh by the encoder and mapped back from the multi-mesh by the decoder for the actual predictions. The multi-mesh can be thought of as an icosahedron whose faces are repeatedly split into smaller triangles (in this case six times), resulting in a graph with over 40,000 nodes. Crucially, this graph retains all the edges from the intermediate splits, i.e. including the edges of the original icosahedron, so that messages can propagate over long spatial distances in only a few hops.
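To make the multi-mesh construction concrete, here is a minimal sketch of the refine-and-accumulate-edges idea. It is plain Python/NumPy on a flat triangle rather than a sphere, and all names are assumptions of mine, not GraphCast's actual code:

<code python>
import numpy as np

def refine(triangles, vertices):
    """Split each triangle into four by adding midpoint vertices."""
    vertices = list(vertices)
    midpoint_cache = {}

    def midpoint(i, j):
        key = tuple(sorted((i, j)))
        if key not in midpoint_cache:
            vertices.append((vertices[i] + vertices[j]) / 2)
            midpoint_cache[key] = len(vertices) - 1
        return midpoint_cache[key]

    new_triangles = []
    for a, b, c in triangles:
        ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
        new_triangles += [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]
    return new_triangles, vertices

def edges_of(triangles):
    return {tuple(sorted(e)) for a, b, c in triangles
            for e in ((a, b), (b, c), (c, a))}

# Start from a single coarse triangle for illustration; GraphCast starts
# from an icosahedron and refines six times, reaching ~40,000 mesh nodes.
triangles = [(0, 1, 2)]
vertices = [np.array(p, dtype=float) for p in [(0, 0), (1, 0), (0.5, 1)]]

multi_mesh_edges = set()
for level in range(4):
    multi_mesh_edges |= edges_of(triangles)  # keep the coarse-level edges
    triangles, vertices = refine(triangles, vertices)
multi_mesh_edges |= edges_of(triangles)      # plus the finest-level edges

print(len(vertices), "nodes,", len(multi_mesh_edges), "multi-mesh edges")
</code>

Running this shows the node count growing with each split while the edge set keeps the long coarse-level edges, which is what lets messages travel far across the globe in few message-passing steps.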
-----

__Summary on Improved protein structure prediction using potentials from deep learning **by Ferdinand**__
A novel method to predict protein structures is presented which, contrary to most previous approaches, never relies on templates of homologous proteins, making it more useful for predicting unknown proteins. The structure of a specific protein is predicted by gradient descent on a potential built from pairwise distances between protein residues, whilst a CNN is trained to predict these distances in the form of a histogram (a distribution over distance bins) for each residue pair. The training data was based on the Protein Data Bank, and HHblits was used to construct multiple sequence alignments as additional input features.
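The mechanics of folding by gradient descent on a predicted distogram can be sketched in a few lines of PyTorch. This is a toy version under assumptions of mine: it optimises Cartesian coordinates with a soft-binning likelihood, whereas the paper optimises torsion angles against a spline-fitted potential with additional physical terms:

<code python>
import torch

n_res, n_bins = 8, 16
bin_centers = torch.linspace(2.0, 20.0, n_bins)

# Stand-in for the CNN output: random but normalised distance histograms
# (in the paper these come from a network trained on PDB structures).
probs = torch.softmax(torch.randn(n_res, n_res, n_bins), dim=-1)
probs = (probs + probs.transpose(0, 1)) / 2   # symmetrise over (i, j)

coords = torch.randn(n_res, 3, requires_grad=True)
opt = torch.optim.Adam([coords], lr=0.05)

for step in range(500):
    diff = coords.unsqueeze(0) - coords.unsqueeze(1)
    dists = torch.sqrt((diff ** 2).sum(-1) + 1e-8)    # (n_res, n_res)
    # Softly assign each distance to the bins and score it under the
    # histogram -- a differentiable stand-in for the paper's spline fit.
    w = torch.softmax(-(dists.unsqueeze(-1) - bin_centers) ** 2, dim=-1)
    log_lik = (w * torch.log(probs + 1e-9)).sum(-1)
    loss = -log_lik.triu(diagonal=1).sum()            # each pair i < j once
    opt.zero_grad(); loss.backward(); opt.step()

print("final potential:", float(loss))
</code>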
-----

__Summary on Highly accurate protein structure prediction with AlphaFold **by Sarthak**__
Let's start the journey. We assume modelling long-range dependencies over MSAs with a vanilla transformer would suffice, since that would be the correct inductive bias for reasoning over a protein sequence. It doesn't: attending over every position of every sequence at once is far too expensive, so attention over the MSA is factorised into row-wise attention (within each sequence) and column-wise attention (across sequences at the same residue position).
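That factorisation can be sketched with stock PyTorch attention; the shapes and the residual wiring here are assumptions of mine, not the real Evoformer:

<code python>
import torch
import torch.nn as nn

n_seq, n_res, d, heads = 4, 32, 64, 4
msa = torch.randn(n_seq, n_res, d)   # MSA representation: (sequences, residues, channels)

row_attn = nn.MultiheadAttention(d, heads, batch_first=True)
col_attn = nn.MultiheadAttention(d, heads, batch_first=True)

# Row attention: each sequence attends over its own residue positions.
msa = msa + row_attn(msa, msa, msa)[0]

# Column attention: each residue position attends across the sequences,
# letting evolutionary information flow between the rows of the MSA.
cols = msa.transpose(0, 1)           # (n_res, n_seq, d)
msa = msa + col_attn(cols, cols, cols)[0].transpose(0, 1)
</code>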
Since they need to represent edges, they add a bias term to the row attention for the MSA which encodes the pairwise relationships (edges); they search structural databases for templates to help initialise this pair representation. We may think doing so once suffices. It doesn't: the exchange runs both ways and is repeated across a deep stack of Evoformer blocks, with the MSA updating the pair representation through an outer-product mean and the pair representation itself being refined by triangle updates that keep the edges geometrically consistent.
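A single, simplified exchange step might look as follows (one attention head, hypothetical layer names of mine; the actual block also includes triangle updates and transition layers):

<code python>
import torch
import torch.nn as nn

n_seq, n_res, d, c = 4, 32, 64, 16
msa = torch.randn(n_seq, n_res, d)     # node features per (sequence, residue)
pair = torch.randn(n_res, n_res, d)    # edge features per residue pair

to_qkv = nn.Linear(d, 3 * d)
pair_to_bias = nn.Linear(d, 1)         # edge feature -> attention logit bias
a_proj, b_proj = nn.Linear(d, c), nn.Linear(d, c)
outer_to_pair = nn.Linear(c * c, d)

# Row attention biased by the pair representation (single head for brevity).
q, k, v = to_qkv(msa).chunk(3, dim=-1)
logits = q @ k.transpose(-1, -2) / d ** 0.5       # (n_seq, n_res, n_res)
logits = logits + pair_to_bias(pair).squeeze(-1)  # same edge bias in every row
msa = msa + torch.softmax(logits, dim=-1) @ v

# Outer-product mean: the MSA writes back into the pair representation.
a, b = a_proj(msa), b_proj(msa)                      # (n_seq, n_res, c)
outer = torch.einsum('sic,sjd->ijcd', a, b) / n_seq  # mean over sequences
pair = pair + outer_to_pair(outer.flatten(-2))
</code>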
Now we may think that picking the top row of the MSA and the 2D pairwise representation would suffice for building the 3D structure directly. It doesn't: the structure module represents each residue as a rigid-body frame (a rotation plus a translation, all initialised at the origin) and refines these frames over several iterations using invariant point attention, so that the updates do not depend on the global orientation of the protein.
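A rough sketch of the frame-refinement loop, heavily simplified: the per-residue update comes from a plain linear layer here, whereas AlphaFold derives it with invariant point attention over the single and pair representations:

<code python>
import torch
import torch.nn as nn

n_res, d = 32, 64
single = torch.randn(n_res, d)     # "single" representation (top MSA row)
to_update = nn.Linear(d, 6)        # 3 rotation + 3 translation parameters

# "Black hole" initialisation: identity rotations, every residue at origin.
R = torch.eye(3).repeat(n_res, 1, 1)
t = torch.zeros(n_res, 3)

def small_rotation(w):
    """First-order rotation matrices from small axis-angle vectors."""
    wx, wy, wz = w.unbind(-1)
    zero = torch.zeros_like(wx)
    K = torch.stack([zero, -wz, wy, wz, zero, -wx, -wy, wx, zero], dim=-1)
    return torch.eye(3) + K.view(-1, 3, 3)

for it in range(8):                # refinement iterations, shared weights
    delta = to_update(single)
    R = small_rotation(0.1 * delta[:, :3]) @ R
    t = t + delta[:, 3:]

print(R.shape, t.shape)            # (n_res, 3, 3) rotations, (n_res, 3) offsets
</code>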
This suffices for the backbone, but what about the side chains?

They take the first row of the MSA to predict torsion angles, which, combined with the backbone through local transformations, gets them the side chains. We now have a final prediction, a ground truth, and a loss function. We think a simple loss function would of course suffice. Not true: they include the FAPE loss (a chirality-aware loss), auxiliary losses, a distogram loss, an MSA masked-prediction loss and a confidence loss. We assume running the network once would suffice. It doesn't: the whole network is applied several times (recycling), feeding its own predictions back in as inputs to iteratively refine the structure.
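To see why FAPE is chirality-aware, here is a stripped-down sketch (the real loss has extra normalisation and is applied at many intermediate points): every atom position is expressed in every residue's local frame before comparing prediction and ground truth, and since the frames carry orientation, a mirror image of the true structure does not score zero.

<code python>
import torch

def to_local(R, t, x):
    """Coordinates of points x (n, 3) in each frame (R_i, t_i): R_i^T (x_j - t_i)."""
    diff = x.unsqueeze(0) - t.unsqueeze(1)        # (n_frames, n_points, 3)
    return torch.einsum('iba,ijb->ija', R, diff)  # rotate by R_i^T

def fape(R_p, t_p, x_p, R_t, t_t, x_t, clamp=10.0, eps=1e-8):
    d = to_local(R_p, t_p, x_p) - to_local(R_t, t_t, x_t)
    err = torch.sqrt((d ** 2).sum(-1) + eps)      # frame-aligned point errors
    return err.clamp(max=clamp).mean()            # robust: large errors clamped

# Toy usage with identity frames and random points.
n = 16
R = torch.eye(3).expand(n, 3, 3)
print(fape(R, torch.zeros(n, 3), torch.randn(n, 3),
           R, torch.zeros(n, 3), torch.randn(n, 3)))
</code>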
Alternate title: Unfolding AlphaFold2