[R] Applying Linearly Scalable Transformers to Model Longer Protein Sequences


To extend Transformer models to longer sequences than standard quadratic attention allows, researchers from Google, the University of Cambridge, DeepMind, and the Alan Turing Institute have proposed a new Transformer architecture called “Performer” — based on what they call fast attention via orthogonal random features (FAVOR).
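For readers curious how FAVOR makes attention scale linearly, below is a minimal NumPy sketch of the underlying random-feature idea: the softmax kernel exp(q·k) is approximated by an inner product of randomized feature maps φ(q)·φ(k), so attention can be computed as φ(Q)(φ(K)ᵀV) in time linear in the sequence length rather than quadratic. This is an illustration only, not the authors' implementation — it uses the positive exponential features from later Performer work and a plain Gaussian projection, whereas the paper employs orthogonal random features; all function names here are hypothetical.

```python
import numpy as np

def random_feature_map(x, w):
    # phi(x) = exp(w.x - |x|^2 / 2) / sqrt(m), so that
    # exp(q.k) ~= E_w[phi(q) . phi(k)] for w ~ N(0, I).
    m = w.shape[0]
    proj = x @ w.T                                     # (n, m)
    sq_norm = np.sum(x ** 2, axis=-1, keepdims=True) / 2.0
    return np.exp(proj - sq_norm) / np.sqrt(m)

def linear_attention(q, k, v, n_features=256, seed=0):
    # Approximate softmax attention in O(n * m * d) instead of O(n^2 * d):
    # compute phi(K)^T V (an m x d_v matrix) once, then multiply by phi(Q).
    rng = np.random.default_rng(seed)
    d = q.shape[-1]
    w = rng.standard_normal((n_features, d))           # Gaussian projections
    # Scale q and k by d^(-1/4) each so q.k picks up the usual 1/sqrt(d) factor.
    q_prime = random_feature_map(q / d ** 0.25, w)     # (n, m)
    k_prime = random_feature_map(k / d ** 0.25, w)     # (n, m)
    kv = k_prime.T @ v                                 # (m, d_v)
    normalizer = q_prime @ k_prime.sum(axis=0)         # (n,) row-wise softmax denominator
    return (q_prime @ kv) / normalizer[:, None]

# Usage: for n = 4096 tokens, the n x n attention matrix is never materialized.
q = np.random.randn(4096, 64)
k = np.random.randn(4096, 64)
v = np.random.randn(4096, 64)
out = linear_attention(q, k, v)                        # (4096, 64)
```

Because the length-n-by-length-n attention matrix is never formed, memory and compute grow linearly with sequence length, which is what makes the approach attractive for long protein sequences.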

Here is a quick read: Applying Linearly Scalable Transformers to Model Longer Protein Sequences

The paper Masked Language Modeling for Proteins via Linearly Scalable Long-Context Transformers is on arXiv.
