Attention Mechanism
Also known as: Attention
A technique in neural networks that lets a model focus on the most relevant parts of the input when generating each part of the output, rather than relying solely on a single fixed-length context vector. In sequence-to-sequence models, attention computes a weighted combination of all encoder states when predicting each output token; the weights come from a learned relevance score between the current decoder state and each encoder state, normalized (typically with a softmax) so that they sum to one. This helps models handle long sequences and capture dependencies between distant elements. In accessibility applications, attention mechanisms can improve sign language translation by letting the model focus on the most relevant video frames when generating each word, and can enhance image captioning by attending to specific image regions when describing them.
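The weighted combination of encoder states described above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: it assumes dot-product scoring (one of several common choices; additive scoring with a small learned network is another), and the names `softmax`, `attend`, `query`, and `encoder_states` are hypothetical, as are the toy vectors. In a real model the query and encoder states would come from trained layers.

```python
import math

def softmax(scores):
    # Subtract the maximum for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, encoder_states):
    # Assumption: relevance is scored with a plain dot product.
    scores = [sum(q * h for q, h in zip(query, state))
              for state in encoder_states]
    # Normalize the scores into attention weights that sum to one.
    weights = softmax(scores)
    # The context vector is the weighted sum of the encoder states.
    dim = len(encoder_states[0])
    context = [sum(w * state[i] for w, state in zip(weights, encoder_states))
               for i in range(dim)]
    return weights, context

# Toy inputs: three encoder states and one decoder query (illustrative only).
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [2.0, 0.0]
weights, context = attend(query, encoder_states)
print(weights)
print(context)
```

In this toy case the first and third encoder states align with the query and receive most of the weight, so the context vector is dominated by them. A decoder would recompute the weights with a new query at every output step.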
Category: Machine Learning · Deep Learning · Natural Language Processing
Related: Transformer · Sequence-to-Sequence · LSTM · Neural Network