GigaGPT: GPT-3 models in 565 lines of code that raise questions about robustness and adoption of established standards in the deep learning community – Developpez.com
GigaGPT presents itself as a Cerebras implementation of Andrej Karpathy’s nanoGPT and impresses with its simplicity and compactness: only 565 lines of code. This release promises to push the boundaries of model size and exceed 100 billion parameters without resorting to third-party code or frameworks. This is made possible by leveraging the memory …