Transformer

Also known as: Transformer Model, Transformer Architecture

A deep learning architecture introduced by Vaswani et al. in 2017 that models sequences using attention mechanisms alone, without recurrence (as in RNNs) or convolution. Rather than stepping through the input one token at a time, a transformer processes all positions of the input sequence in parallel and uses "self-attention" to weigh how relevant every other position is when computing the representation of each position. In accessibility applications, transformers power state-of-the-art sign language translation, automatic captioning, image description, and speech recognition systems. Models like GPT, BERT, and their successors are all transformer-based. The architecture excels at capturing long-range dependencies and has largely replaced RNN/LSTM approaches for natural language processing.
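The self-attention step described above can be illustrated with a minimal sketch of scaled dot-product attention, softmax(QK^T / sqrt(d_k))V, the formulation given in Vaswani et al. This is an illustration under simplifying assumptions, not a full transformer: it uses the input vectors directly as queries, keys, and values, and leaves out the learned projection matrices, multiple heads, positional encodings, and feed-forward layers. The function names `softmax` and `self_attention` and the toy input `tokens` are hypothetical and chosen only for this example.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors.

    Returns (outputs, weights): one output vector and one row of
    attention weights per query position.
    """
    d_k = len(keys[0])  # key dimension, used to scale the dot products
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this position's query to every position's key.
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Toy input: three "token" vectors of dimension 2 (made-up values).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens, tokens, tokens)
for row in weights:
    print([round(w, 3) for w in row])
```

Each printed row shows how strongly one position attends to every position in the sequence, itself included. Because no row depends on another, all rows can be computed at the same time, which is what allows a transformer to process a whole sequence in parallel.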

Category: Machine Learning · Natural Language Processing · Deep Learning

Related: LSTM · Attention Mechanism · Neural Network · Sign Language Translation

Sources

https://arxiv.org/abs/1706.03762
https://doi.org/10.1145/3477498