VLM from Scratch
Built two VLM variants on MNIST: a symbolic (rule-based) system and a fully learned model that combines a ViT encoder, a custom character-level tokenizer, and a trained Transformer decoder. Both are exposed through an interactive drawing UI for visual QA.
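As an illustration of the character-level tokenizer component, here is a minimal sketch in Python. It is not the project's actual code: the class name `CharTokenizer`, the special tokens `<pad>`, `<bos>`, `<eos>`, and the example questions are all assumptions made for the example. It shows one common way to map question/answer text to integer IDs that a Transformer decoder could consume.

```python
class CharTokenizer:
    """Hypothetical character-level tokenizer (illustrative, not the project's code)."""

    # Assumed special tokens; they occupy IDs 0, 1, and 2.
    SPECIALS = ["<pad>", "<bos>", "<eos>"]

    def __init__(self, corpus):
        # Vocabulary = special tokens followed by every distinct character in the corpus.
        chars = sorted(set("".join(corpus)))
        self.itos = self.SPECIALS + chars
        self.stoi = {tok: i for i, tok in enumerate(self.itos)}

    def encode(self, text):
        # Wrap the character IDs in <bos> ... <eos>.
        # Characters not seen in the corpus raise KeyError in this sketch.
        body = [self.stoi[ch] for ch in text]
        return [self.stoi["<bos>"]] + body + [self.stoi["<eos>"]]

    def decode(self, ids):
        # Drop special tokens and join the remaining characters.
        n_special = len(self.SPECIALS)
        return "".join(self.itos[i] for i in ids if i >= n_special)


# Example usage with made-up visual-QA text.
tok = CharTokenizer(["what digit is this?", "this is a 7"])
ids = tok.encode("this is a 7")
text = tok.decode(ids)
```

A character-level vocabulary stays very small for a domain like MNIST question answering, which keeps the decoder's embedding and output layers tiny at the cost of longer sequences.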