Contains implementations of prominent Vision Transformer (ViT) architectures, broken down into modular components such as the encoder, attention mechanism, and decoder.
Makes it easy to build custom models by combining components from different architectures.
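
To illustrate the idea of composing components, here is a minimal, library-independent sketch in plain Python. It does not use this project's actual API: the names `Encoder`, `Model`, `mean_attention`, `identity_attention`, `mean_pool_decoder`, and `first_token_decoder` are all hypothetical, and the "attention" here is a toy stand-in rather than real self-attention. The point is only that an encoder, an attention mechanism, and a decoder with matching interfaces can be swapped independently.

```python
from dataclasses import dataclass
from typing import Callable, List

Tokens = List[List[float]]  # a sequence of token vectors


def mean_attention(tokens: Tokens) -> Tokens:
    """Toy stand-in for attention: average each token with the sequence mean."""
    dim = len(tokens[0])
    mean = [sum(t[i] for t in tokens) / len(tokens) for i in range(dim)]
    return [[(x + m) / 2 for x, m in zip(t, mean)] for t in tokens]


def identity_attention(tokens: Tokens) -> Tokens:
    """Toy stand-in that leaves tokens unchanged (returns a copy)."""
    return [list(t) for t in tokens]


@dataclass
class Encoder:
    """Hypothetical encoder that applies a pluggable attention function `depth` times."""
    attention: Callable[[Tokens], Tokens]
    depth: int = 2

    def __call__(self, tokens: Tokens) -> Tokens:
        for _ in range(self.depth):
            tokens = self.attention(tokens)
        return tokens


def mean_pool_decoder(tokens: Tokens) -> List[float]:
    """Hypothetical decoder head: average all tokens into one vector."""
    dim = len(tokens[0])
    return [sum(t[i] for t in tokens) / len(tokens) for i in range(dim)]


def first_token_decoder(tokens: Tokens) -> List[float]:
    """Hypothetical decoder head: read out the first token only."""
    return list(tokens[0])


@dataclass
class Model:
    """Hypothetical model assembled from any compatible encoder and decoder."""
    encoder: Callable[[Tokens], Tokens]
    decoder: Callable[[Tokens], List[float]]

    def __call__(self, tokens: Tokens) -> List[float]:
        return self.decoder(self.encoder(tokens))


# Mix and match: the same encoder class with different attention and decoder parts.
model_a = Model(Encoder(mean_attention, depth=2), first_token_decoder)
model_b = Model(Encoder(identity_attention), mean_pool_decoder)
```

Because each part depends only on a shared call signature, replacing one component (for example, the attention function) does not require changes to the others.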
Code Documentation