- Fuck it, today we're open-sourcing the codebase used to train SmolVLM from scratch on 256 H100s 🔥 Inspired by our team's effort to open-source DeepSeek's R1, we are releasing the training and evaluation code on top of the weights 🫡 Now you can train any SmolVLM—or create your own custom VLMs!
- Launching the training for SmolVLM 256M is as simple as: ./vision/experiments/pretraining/vloom/tr_341_smolvlm_025b_1st_stage/01_launch.sh Then we use wandb to track the losses. Check out the file for the details!
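For context, the wandb side of that is just a logging loop inside the training step. Here is a minimal sketch, not the actual training loop; the project and run names are illustrative, and the loss is a stand-in:

```python
import math

import wandb

# Offline mode so this sketch runs without a wandb account.
# Project/run names are assumptions, not what the launch script configures.
wandb.init(
    mode="offline",
    project="smolvlm-pretraining",
    name="tr_341_smolvlm_025b_1st_stage",
)

for step in range(100):
    # Stand-in for the real training step: a loss that decays over time.
    loss = 2.0 * math.exp(-step / 50)
    wandb.log({"train/loss": loss}, step=step)

wandb.finish()
```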
- After training, you can run the evaluation on all of these tasks with: sbatch vision/experiments/evaluation/vloom/async_evals_tr_346/run_evals_0_shots_val_2048.slurm
- The codebase is full of interesting insights like this one in our dataset.py file: How do you get reproducible randomness in different processes across different machines? Seed a separate random number generator in each process from the tuple (seed, rank)!
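A minimal sketch of that trick; make_rank_rng is an illustrative name, not the actual helper in dataset.py:

```python
import random

import numpy as np
import torch


def make_rank_rng(seed: int, rank: int):
    """Deterministic, per-process randomness derived from (seed, rank).

    Every process mixes the same global seed with its own rank, so
    re-running the job reproduces the exact same streams, while no two
    ranks ever share a stream.
    """
    # SeedSequence mixes the two integers into high-quality seed material.
    ss = np.random.SeedSequence([seed, rank])
    derived = int(ss.generate_state(1)[0])
    py_rng = random.Random(derived)                     # Python-level shuffles
    np_rng = np.random.default_rng(ss)                  # NumPy sampling
    torch_rng = torch.Generator().manual_seed(derived)  # DataLoader shuffling
    return py_rng, np_rng, torch_rng


# Rank 0 and rank 1 get different but individually reproducible shuffles.
rng0, _, _ = make_rank_rng(seed=42, rank=0)
rng1, _, _ = make_rank_rng(seed=42, rank=1)
print(rng0.random(), rng1.random())
```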
- And it also has a bunch of bugs like this one in our modeling_vllama3.py file. We start from a pretrained LLM, but for some reason the weights of the head are not loaded from the checkpoint. I still don't know why :(
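One way to catch this class of bug: load_state_dict with strict=False reports which parameters were not found in the checkpoint, so a silently re-initialized head shows up in missing_keys. This is a hedged sketch with a toy model standing in for VLlama3, not the real class:

```python
import torch
import torch.nn as nn


class ToyLM(nn.Module):
    """Toy stand-in for a pretrained LLM with a separate output head."""

    def __init__(self):
        super().__init__()
        self.backbone = nn.Linear(16, 16)
        self.lm_head = nn.Linear(16, 100, bias=False)


model = ToyLM()

# Simulate the failure mode described above: a checkpoint that contains
# the backbone but is missing the head weights.
checkpoint = {
    "backbone.weight": torch.zeros(16, 16),
    "backbone.bias": torch.zeros(16),
}
missing, unexpected = model.load_state_dict(checkpoint, strict=False)
if any("lm_head" in key for key in missing):
    print("lm_head was not loaded; it kept its random init:", missing)
```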
- And that's why we didn't release this before. It's live research code. Most of it gets rewritten fairly often, and some parts have been the same for years. It works, it manages to produce SOTA results at 256M and 80B sizes, but it's not production code. Go check it out: github.com/huggingface/...