teaching:ss2024:dl4science:summaries (created 2024/05/08 by Karnik Ram, last modified 2024/08/06)
__Summary on Fourier Neural Operator__

Developed a method for learning the solution operator of parametric PDEs, giving the solution without having to specify the exact parameter values, boundary conditions, or the discretization. This is achieved by exploiting the fact that a kernel integral operator can be represented as a convolution in Fourier space, parameterized by linear transformations on the lowest Fourier modes.
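The core spectral layer can be sketched in a few lines of NumPy. The function name, shapes, and random stand-in weights below are illustrative assumptions, not the paper's code:

```python
import numpy as np

def spectral_conv_1d(v, weights, n_modes):
    """One Fourier layer (sketch): FFT, truncate to the lowest n_modes
    frequencies, apply a learned linear map per mode, inverse FFT."""
    v_hat = np.fft.rfft(v, axis=0)                 # (n_freq, channels)
    out_hat = np.zeros_like(v_hat)
    out_hat[:n_modes] = np.einsum("mc,mcd->md", v_hat[:n_modes], weights)
    return np.fft.irfft(out_hat, n=v.shape[0], axis=0)

rng = np.random.default_rng(0)
n, channels, n_modes = 64, 4, 8
v = rng.standard_normal((n, channels))             # input function on a grid
W = rng.standard_normal((n_modes, channels, channels)).astype(complex)
u = spectral_conv_1d(v, W, n_modes)                # same grid, same shape
print(u.shape)  # (64, 4)
```

Because the learned weights live in frequency space rather than on the grid, the same layer can be evaluated on a finer discretization without retraining, which is what makes the operator discretization-independent.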
-----
__Summary on Message Passing Neural PDE Solvers **by Nils**__

The paper aims to learn the solution to various kinds of partial differential equations with an autoregressive model. They deviate from the more typical "LLM-style" autoregressive approach in two main ways, which they call the pushforward trick and temporal bundling.

Alternate title: Message Passing GNNs are all you need for solving PDEs
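A toy sketch of the rollout structure (the names and the linear "dynamics" are made up; the real model is a message-passing GNN). Temporal bundling means one network call emits K future steps, so a trajectory needs far fewer calls; the pushforward trick additionally feeds the model its own predictions during training:

```python
import numpy as np

K = 4  # temporal bundling: one network call emits K future time steps

def model(u, theta):
    """Stand-in for the message-passing GNN: maps the current state to
    the next K states (here via toy linear decay dynamics)."""
    return np.stack([u * theta ** (k + 1) for k in range(K)])

def rollout(u0, theta, n_calls):
    """Autoregressive rollout: n_calls network evaluations cover
    n_calls * K time steps. During training, the pushforward trick feeds
    the model its own prediction here *without* backpropagating through
    it, so it learns to be stable against its own errors."""
    states, u = [], u0
    for _ in range(n_calls):
        pred = model(u, theta)
        states.extend(pred)
        u = pred[-1]   # last bundled step becomes the next input
    return np.array(states)

traj = rollout(np.ones(10), 0.9, n_calls=3)
print(traj.shape)  # (12, 10): 3 calls x 4 bundled steps
```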

-----

__Summary on GraphCast: Learning skillful medium-range global weather forecasting **by Nils**__

GraphCast rivals traditional numerical medium-range weather forecasting methods with a GNN approach. It takes the world's weather data, sampled at 0.25-degree grid points across the earth, as its input. It then autoregressively learns to predict the weather in 6-hour time intervals with a message-passing GNN architecture. The most unique part of this architecture is that the grid points are mapped to a so-called multi-mesh by the encoder and mapped back from the multi-mesh for the actual predictions. The multi-mesh can be thought of as an icosahedron whose faces are repeatedly split into smaller triangles (in this case six times), resulting in a graph with over 40,000 nodes. However, this graph keeps all the edges from the intermediate splits, i.e. including the edges of the original icosahedron, so that long-range interactions can be propagated in few message-passing steps.
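The node count can be checked with simple bookkeeping for triangle subdivision: splitting every triangle into four adds one vertex per edge, so V' = V + E, E' = 2E + 3F, F' = 4F.

```python
# Multi-mesh size check: start from an icosahedron and refine six times,
# as described for GraphCast.
V, E, F = 12, 30, 20  # vertices, edges, faces of the icosahedron
for _ in range(6):
    V, E, F = V + E, 2 * E + 3 * F, 4 * F
print(V)  # 40962 nodes, i.e. "over 40,000"
```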

-----

__Summary on Improved protein structure prediction using potentials from deep learning **by Ferdinand**__

A novel method to predict protein structures is presented, which, contrary to most previous approaches, does not rely on templates of homologous proteins, making it more useful for predicting unknown proteins. The structure of a specific protein is obtained by gradient descent on the pairwise distances of the protein's residues, while a CNN is trained to predict these distances in the form of a histogram of pairwise distances. The training data was based on the Protein Data Bank, using HHblits to construct multiple sequence alignments.
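The structure-realization step can be sketched as plain gradient descent on a pairwise-distance potential. This is a toy version with made-up sizes and exact target distances; the paper's potential is built from the predicted distance histograms and includes further terms:

```python
import numpy as np

def potential(x, target):
    """Squared deviation between current and predicted pairwise distances."""
    diff = x[:, None, :] - x[None, :, :]
    d = np.sqrt((diff ** 2).sum(-1) + 1e-12)
    return ((d - target) ** 2).sum() / 2

def grad(x, target):
    """Analytic gradient of the potential w.r.t. the coordinates."""
    diff = x[:, None, :] - x[None, :, :]
    d = np.sqrt((diff ** 2).sum(-1) + 1e-12)
    np.fill_diagonal(d, 1.0)          # avoid division by ~0 on the diagonal
    w = (d - target) / d
    np.fill_diagonal(w, 0.0)          # a point exerts no force on itself
    return 2 * (w[:, :, None] * diff).sum(axis=1)

rng = np.random.default_rng(1)
true_x = rng.standard_normal((8, 3))  # stand-in "true" residue coordinates
target = np.linalg.norm(true_x[:, None] - true_x[None, :], axis=-1)
x0 = rng.standard_normal((8, 3))      # random initialization
x = x0.copy()
for _ in range(300):                  # plain gradient descent
    x -= 0.005 * grad(x, target)
print(potential(x, target) < potential(x0, target))  # True
```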

-----

__Summary on Highly accurate protein structure prediction with AlphaFold **by Sarthak**__

Let's start the journey. We assume modelling long-range dependencies over MSAs with vanilla transformers would suffice, since that would be the correct inductive bias for reasoning over a protein sequence; it doesn't.

Since they need to represent edges, they add a bias term to the row attention for the MSA, which encodes the pairwise relationships (edges); they search structure data for this. We may think doing so once suffices; it doesn't.
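The pair-bias mechanism itself is small. A single-head sketch (shapes and names are illustrative stand-ins, not the DeepMind code): the bias derived from the pair representation is simply added to the attention logits, so edge information steers where each residue attends.

```python
import numpy as np

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def row_attention_with_pair_bias(msa_row, pair_bias, Wq, Wk, Wv):
    """Single-head row attention over one MSA row (sketch).
    pair_bias[i, j] comes from the pair (edge) representation and is
    added to the attention logits, injecting pairwise information."""
    q, k, v = msa_row @ Wq, msa_row @ Wk, msa_row @ Wv
    logits = q @ k.T / np.sqrt(q.shape[-1]) + pair_bias
    return softmax(logits) @ v

rng = np.random.default_rng(2)
L, d = 6, 8                       # sequence length, channel dimension
msa_row = rng.standard_normal((L, d))
pair_bias = rng.standard_normal((L, L))
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
out = row_attention_with_pair_bias(msa_row, pair_bias, Wq, Wk, Wv)
print(out.shape)  # (6, 8)
```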

Now we may think that picking the top row and the 2D pairwise representation would suffice for building the 3D structure; it doesn't.

This suffices for the backbone, but what about the side chains?

They take the first row of the MSA to predict torsion angles, which, combined with the backbone through local transformations, gets them the side chains. We now have a final representation and a ground truth with a loss function. We think a simple loss function would of course suffice; not true: they include the FAPE loss (a chirality-aware loss), auxiliary losses, a distribution loss, an MSA loss, and a confidence loss. We assume doing so once would suffice; it doesn't.

Alternate title: Unfolding AlphaFold2

-----

__Summary on PINNs **by Niklas**__

In one sentence, the main application of a physics-informed neural network (PINN) is to learn the solution of a PDE by creating custom loss and activation functions that encode physical constraints on the solution. That way, multiple loss terms, such as initial or boundary conditions or even special properties like zero divergence, are all calculated separately and added up into a final loss term. The most important loss term, which always needs to be present, measures how well the neural network's output satisfies the PDE itself.
In the paper two cases are discussed: the data-driven solution approach tries to predict the hidden state of the system given fixed parameters of the PDE, while data-driven discovery tries to find the underlying PDE and its parameters that best match given data. For both approaches, continuous and discrete variants are discussed as well. Generally they only tested their methods on simple MLPs, but the ideas can be extended to more advanced architectures.
With the easy creation of virtual (collocation) points within the domain boundaries, the methods generally do not need large amounts of real data during training, which helps especially when only sparse measurements are available. Another main advantage over previous work is that PINNs are not limited to linear PDEs and do not have to linearize them first, which often yields far better results when dealing with non-linear PDEs. Additionally, no prior assumptions are necessary, and the paper showed that PINNs are somewhat robust to noise.
However, one thing users need to be very careful about is the weighting of the added loss terms, which can require extensive testing or hyperparameter tuning. Sometimes the inference time can be quite slow, and it might be better to use a numerical solver.
All in all, the paper introduced a simple and very effective way to enforce physical properties in the approximations of a neural network, which forces the network to hallucinate less and come up with realistic solutions. Thereby the authors made a significant contribution to the field of deep learning for the natural sciences.


Alternate title: How to make your neural network start caring about physics and stop hallucinating.
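The loss assembly can be illustrated with a toy 1D problem, u_xx = -u on [0, pi/2]. Here the derivatives come from finite differences instead of the automatic differentiation a real PINN would use, and the weights lam_* are exactly the hyperparameters whose tuning the summary warns about:

```python
import numpy as np

def pde_residual(u, x):
    """Residual of the toy PDE u_xx + u = 0 via central finite
    differences; a real PINN would use autodiff of the network."""
    h = x[1] - x[0]
    u_xx = (u[2:] - 2 * u[1:-1] + u[:-2]) / h ** 2
    return u_xx + u[1:-1]

def pinn_loss(u, x, u0, u1):
    """Weighted sum of the PDE-residual loss and the boundary losses;
    lam_pde and lam_bc are the weighting hyperparameters."""
    lam_pde, lam_bc = 1.0, 10.0
    loss_pde = np.mean(pde_residual(u, x) ** 2)
    loss_bc = (u[0] - u0) ** 2 + (u[-1] - u1) ** 2
    return lam_pde * loss_pde + lam_bc * loss_bc

x = np.linspace(0, np.pi / 2, 101)
u_good = np.sin(x)        # exact solution: u(0)=0, u(pi/2)=1
u_bad = x / (np.pi / 2)   # satisfies the boundaries but not the PDE
print(pinn_loss(u_good, x, 0.0, 1.0) < pinn_loss(u_bad, x, 0.0, 1.0))  # True
```

The linear candidate hits both boundary conditions exactly, yet the PDE-residual term correctly penalizes it; this is the mechanism that keeps the network from "hallucinating" unphysical solutions.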

-----

__Summary on A Self-Attention Ansatz for Ab-initio Quantum Chemistry **by Leon**__

The authors introduce the PsiFormer, which is designed to solve the many-electron Schrödinger equation using self-attention.

In quantum chemistry, the Schrödinger equation defines the physical behaviour, with the Hamiltonian describing the details of the system. Variational Monte Carlo (VMC) methods optimize a parametric wave function to find the ground-state solution. The ansatz uses a Slater determinant to satisfy physical constraints like antisymmetry and the Kato cusp conditions. To mitigate infinite potential energies due to electron overlap, a Jastrow factor can be introduced.
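A minimal Slater-Jastrow sketch (toy orbital matrix and a toy Jastrow term; in the PsiFormer the orbitals come from the self-attention network). The determinant enforces antisymmetry, so exchanging two electrons flips the sign of psi but leaves |psi| unchanged:

```python
import numpy as np

def log_abs_psi(orbitals, positions):
    """log|psi| for a Slater-Jastrow ansatz: |psi| = |det(Phi)| * exp(J).
    orbitals: (n, n) matrix Phi of orbital values at the n electron
    positions. The Jastrow term J here is a toy pairwise function of the
    kind used to handle electron-electron cusps."""
    _, logdet = np.linalg.slogdet(orbitals)
    diff = positions[:, None, :] - positions[None, :, :]
    r = np.sqrt((diff ** 2).sum(-1))
    iu = np.triu_indices(len(positions), k=1)
    jastrow = -0.25 * np.sum(r[iu] / (1.0 + r[iu]))
    return logdet + jastrow

rng = np.random.default_rng(3)
n = 4
pos = rng.standard_normal((n, 3))
phi = rng.standard_normal((n, n))   # stand-in for network-predicted orbitals
perm = [1, 0, 2, 3]                 # exchange electrons 0 and 1
print(np.isclose(log_abs_psi(phi, pos),
                 log_abs_psi(phi[perm], pos[perm])))  # True
```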

The main advancements over existing neural networks like FermiNet, PauliNet and SchNet lie in the use of self-attention via a sequence of multi-head self-attention layers. The motivation behind this is that the electron-electron dependence in the Hamiltonian introduces a complex dependence in the wavefunction, which attention between electrons is well suited to capture.

Compared to FermiNet, where only electron-nuclear features are used to compute the inputs, the PsiFormer lets all electrons interact through the attention layers.

The PsiFormer was evaluated against FermiNet with SchNet convolutions for small and large molecules. It could outperform them for small molecules, despite having fewer parameters. The differences become more pronounced for larger molecules, where the PsiFormer not only outperforms FermiNet by a significant margin but also performs better than the best conventional methods (DMC energy).

One limitation of the paper is that it does not report inference times for the PsiFormer.

-----

__Summary on Ab-Initio Potential Energy Surfaces by Pairing GNNs with Neural Wave Functions **by Ferdinand**__

A novel approach for predicting molecular properties is presented, which allows quantum-chemistry predictions for molecules without separate training runs for each potential energy surface. This is achieved by leveraging FermiNet, a previous model used to learn wave functions for single geometries, but supplying its inputs from a GNN rather than hand-crafting the input for a specific molecular geometry. With the wave-function model trained on a wide range of possible input geometries, the GNN learns the specific combinations of inputs that lead to wave functions matching the desired molecular geometry. To achieve this, all training is done on distances only, while directionality is preserved by transforming geometries and coordinates into an equivariant coordinate system based on PCA. The resulting model required shorter training times than previous NN-based approaches such as PauliNet and FermiNet, while being less susceptible to producing false results (from e.g. a wrongly assumed spherical symmetry) and achieving higher accuracy. However, it still requires training for each different molecule, and e.g. cyclobutadiene required almost 20x the time of a simple hydrogen chain due to its more complex energy surface.
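The PCA-based frame can be sketched as follows; this is a simplified stand-in for the paper's equivariant coordinate construction, and the sign fix shown is just one common way to remove the eigenvector ambiguity:

```python
import numpy as np

def canonical_frame(coords):
    """Rotation-invariant coordinates via PCA (sketch): express the
    centred geometry in the eigenbasis of its covariance matrix, with
    eigenvector signs fixed so the frame is reproducible."""
    centred = coords - coords.mean(axis=0)
    _, vecs = np.linalg.eigh(centred.T @ centred)
    proj = centred @ vecs
    # fix the sign ambiguity: make the largest-magnitude entry of each
    # principal axis positive
    signs = np.sign(proj[np.abs(proj).argmax(axis=0), np.arange(3)])
    return proj * signs

rng = np.random.default_rng(4)
coords = rng.standard_normal((5, 3))       # toy nuclear geometry
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))  # random orthogonal matrix
rotated = coords @ Q.T
# the canonical frame is the same before and after rotation
print(np.allclose(canonical_frame(coords), canonical_frame(rotated), atol=1e-6))
```

Feeding the model such canonicalized coordinates (or the distances derived from them) is what lets training work on distances only without losing directional information.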