Energy-Based Transformers Are Scalable Learners and Thinkers (alexiglad.github.io)
2 points by cs702 3 hours ago
See also https://www.reddit.com/r/MachineLearning/comments/1lu1ia0/r_...