# Transformer Library Documentation

```{toctree}
:maxdepth: 2
:caption: Getting Started

installation
quickstart
```

```{toctree}
:maxdepth: 3
:caption: Guide

guide
```

```{toctree}
:maxdepth: 2
:caption: API Reference

api
```

```{toctree}
:maxdepth: 3
:caption: Usage Examples

examples
```

```{toctree}
:maxdepth: 2
:caption: Project Info

contributing
```

## Overview

A polished PyTorch implementation of a current state-of-the-art (SOTA) Transformer, designed as a baseline for research and engineering.

**Features:**

- Fully configurable architecture (layers, heads, dimensions, etc.)
- HuggingFace-compatible API (`PreTrainedModel`, `GenerationMixin`)
- Multi-Head Attention (MHA), Grouped-Query Attention (GQA), and other attention variants
- Rotary Position Embeddings (RoPE) and SwiGLU feed-forward
- Optional weight tying, QK normalization, and bias control

### Quick Example

```python
import torch
from transformer import Transformer, TransformerConfig

# Configure a 12-layer model with 16 attention heads and a hidden size of 1024.
config = TransformerConfig(vocab_size=32000, n_layers=12, n_heads=16, d_model=1024)
model = Transformer(config)

# Random token IDs with shape (batch_size=2, seq_len=512).
input_ids = torch.randint(0, 32000, (2, 512))

outputs = model(input_ids)
logits = outputs.logits  # shape: (2, 512, 32000) = (batch_size, seq_len, vocab_size)
```

### Indices and tables

- {ref}`genindex`
- {ref}`modindex`
- {ref}`search`