Transformer Library Documentation¶
Overview¶
A polished PyTorch implementation of a state-of-the-art (SOTA) Transformer, designed as a baseline for research and engineering.
Features:

- Fully configurable architecture (layers, heads, dimensions, etc.)
- HuggingFace-compatible API (`PreTrainedModel`, `GenerationMixin`)
- Multi-Head Attention (MHA), Grouped-Query Attention (GQA), and other attention variants
- Rotary Position Embeddings (RoPE) and SwiGLU feed-forward
- Optional weight tying, QK normalization, and bias control
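To illustrate the GQA variant listed above: with `n_heads` query heads and fewer `n_kv_heads` key/value heads, each K/V head is shared by a group of query heads. The sketch below is a minimal standalone illustration in plain PyTorch; the shapes and variable names are illustrative and do not reflect this library's internals.

```python
import torch

batch, seq_len, head_dim = 2, 8, 8
n_heads, n_kv_heads = 8, 2  # 4 query heads share each KV head

q = torch.randn(batch, n_heads, seq_len, head_dim)
k = torch.randn(batch, n_kv_heads, seq_len, head_dim)
v = torch.randn(batch, n_kv_heads, seq_len, head_dim)

# Expand K/V so each group of query heads attends to its shared KV head.
group = n_heads // n_kv_heads
k = k.repeat_interleave(group, dim=1)  # (batch, n_heads, seq_len, head_dim)
v = v.repeat_interleave(group, dim=1)

scores = (q @ k.transpose(-2, -1)) / head_dim**0.5
out = scores.softmax(dim=-1) @ v
print(out.shape)  # torch.Size([2, 8, 8, 8])
```

Shrinking `n_kv_heads` cuts the KV-cache size during generation, which is the usual motivation for GQA over MHA.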
Quick Example¶
```python
import torch
from transformer import Transformer, TransformerConfig

config = TransformerConfig(vocab_size=32000, n_layers=12, n_heads=16, d_model=1024)
model = Transformer(config)

input_ids = torch.randint(0, 32000, (2, 512))  # random token ids: (batch_size, seq_len)
outputs = model(input_ids)
logits = outputs.logits  # shape: (2, 512, 32000) [batch_size, seq_len, vocab_size]
```
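Given a logits tensor shaped like the quick example's output, greedy next-token selection can be sketched as follows. This is a self-contained illustration using a random tensor in place of real model output, not a replacement for the `GenerationMixin` API.

```python
import torch

# Stand-in for model output: (batch_size, seq_len, vocab_size)
logits = torch.randn(2, 512, 32000)

next_token_logits = logits[:, -1, :]            # last position per sequence
next_tokens = next_token_logits.argmax(dim=-1)  # greedy decoding
print(next_tokens.shape)  # torch.Size([2])
```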