Transformer models are taking advantage of GPU compute.
Transformer-based models have proven very efficient to train on GPUs because they parallelize the ingestion of large amounts of data. Attention mechanisms let the model focus on specific parts of the input sequence while processing it, improving its ability to understand and generate complex patterns.
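The core operation behind this is attention over all positions at once. Below is a minimal sketch of scaled dot-product attention in plain NumPy; the function name, shapes, and toy data are illustrative assumptions, not taken from any specific library.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q, K, V: arrays of shape (seq_len, d_model)."""
    d_k = Q.shape[-1]
    # Similarity of every query with every key, computed for all
    # positions in one matrix product -- this is what maps so well
    # onto GPU parallelism.
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax turns the scores into attention weights per position.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output vector is a weighted mix of the value vectors,
    # i.e. the model "focuses" on the most relevant positions.
    return weights @ V

# Toy usage: 4 tokens with 8-dimensional embeddings, self-attention.
x = np.random.randn(4, 8)
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # (4, 8)
```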
"Looking at previous words only”
Advertisement
Luke, I am your best worst mother
"Looking at all words at once"
Luke, I am your father
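The difference between the two patterns comes down to the attention mask. The sketch below, assuming the figure's example sentence, contrasts a causal mask (each word sees only previous words) with full attention (every word sees every other word); the variable names are hypothetical.

```python
import numpy as np

tokens = ["Luke,", "I", "am", "your", "father"]
n = len(tokens)

# Full attention: every position may attend to every other position.
full_mask = np.ones((n, n), dtype=bool)

# Causal attention: position i may only attend to positions <= i,
# i.e. "looking at previous words only".
causal_mask = np.tril(np.ones((n, n), dtype=bool))

print(causal_mask.astype(int))
# [[1 0 0 0 0]
#  [1 1 0 0 0]
#  [1 1 1 0 0]
#  [1 1 1 1 0]
#  [1 1 1 1 1]]
```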
Models like GPT-3 have been trained on terabytes of public text data. Yet these data sets pale in comparison to the full body of text-based content humans have created. Future state-of-the-art (SOTA) models will be trained on as-yet-untapped non-public and unstructured data.
[Figure: state-of-the-art LLMs have been trained on only a tiny fraction of human-created text; non-public text data such as emails remains largely untapped.]