Papyros
The Archive
transformers
2019
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
2017
Attention Is All You Need