Nyström Approximation of Attention as a Subunit of Network in Network Transformer
Paper Coming Soon