SEMINAR

Next generation of protein structure analysis using ColabFold and Foldseek

Date
2022-09-20 16:00:00
Department
Biomedical Engineering
Venue
110-N104
Lecturer
Martin Steinegger, Ph.D. / Seoul National Univ.

Structure prediction with AlphaFold2 is set to have a huge impact on biology, medicine, and biotechnology.
AlphaFold2 is not only accurate but, when optimized, also fast. Our ColabFold-AlphaFold2 pipeline accurately predicts the structures of a whole proteome within two days on a single GPU, approximately 100 times faster than the original AlphaFold2 system.

The availability of these methods, together with DeepMind/EBI's large-scale effort to predict the structure of every UniRef90 protein sequence (>100 million), is rapidly increasing the number of available structures. Analysing these structural datasets has become a major bottleneck.
In particular, a simple search for homologous structures in a database of one million entries takes a week on a single core using currently available tools. To address this issue, we developed Foldseek for fast and sensitive similarity searching through large structural databases. Foldseek is about four orders of magnitude faster than current structural aligners, allowing one to search through millions of structures in seconds.

Another issue is storing the predicted structures. While the current AlphaFold database requires 24 TB of space, our method Foldcomp compresses it down to 900 GB with a reconstruction error below 0.1 Å, while being nearly as fast as gzip.

During this talk, I will explain how we designed ColabFold to predict highly accurate structures in seconds, how Foldseek efficiently queries large structural databases, and how Foldcomp stores structural information efficiently. These tools are open source and can be accessed at colabfold.com, foldseek.com, and github.com/steineggerlab/foldcomp, respectively.
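As a rough back-of-envelope check of the Foldseek and Foldcomp figures quoted above, the sketch below derives the numbers they imply. It is not part of any of the tools. It assumes that the four-orders-of-magnitude speed-up applies to the same single-core, one-million-entry search, and that storage sizes use decimal units (1 TB = 1000 GB). Both are our assumptions, not statements from the talk. The function names are hypothetical.

```python
"""Back-of-envelope arithmetic for the Foldseek and Foldcomp figures.

Inputs come from the abstract; the unit conventions are assumptions.
"""

SECONDS_PER_DAY = 24 * 60 * 60


def foldseek_search_seconds(baseline_days: float = 7.0,
                            orders_of_magnitude: int = 4) -> float:
    """Time for the one-million-entry search if it runs 10**k times faster.

    Assumes the speed-up applies to the same single-core search
    that takes about a week with existing tools.
    """
    return baseline_days * SECONDS_PER_DAY / 10 ** orders_of_magnitude


def foldcomp_ratio(original_tb: float = 24.0,
                   compressed_gb: float = 900.0) -> float:
    """Compression ratio, assuming decimal units (1 TB = 1000 GB)."""
    return original_tb * 1000 / compressed_gb


if __name__ == "__main__":
    print(f"Foldseek search: ~{foldseek_search_seconds():.0f} s")
    print(f"Foldcomp ratio: ~{foldcomp_ratio():.1f}x")
```

Under these assumptions, the one-week search drops to about a minute on a single core, and the AlphaFold database shrinks roughly 27-fold.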