- Fuck it, today we're open-sourcing the codebase used to train SmolVLM from scratch on 256 H100s 🔥 Inspired by our team's effort to open-source DeepSeek's R1, we are releasing the training and evaluation code on top of the weights 🫡 Now you can train any SmolVLM—or create your own custom VLMs!
- Launching the training for SmolVLM 256M is as simple as: ./vision/experiments/pretraining/vloom/tr_341_smolvlm_025b_1st_stage/01_launch.sh Then we use wandb to track the losses. Check out the file for the details!
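For context, the wandb side of that is just a logging loop inside the training step. Here is a minimal sketch, not the actual training loop; the project and run names are illustrative, and the loss is a stand-in:

```python
import math

import wandb

# Offline mode so this sketch runs without a wandb account.
# Project/run names are assumptions, not what the launch script configures.
wandb.init(
    mode="offline",
    project="smolvlm-pretraining",
    name="tr_341_smolvlm_025b_1st_stage",
)

for step in range(100):
    # Stand-in for the real training step: a loss that decays over time.
    loss = 2.0 * math.exp(-step / 50)
    wandb.log({"train/loss": loss}, step=step)

wandb.finish()
```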
- After training, you can run the evaluation on all of these tasks with: sbatch vision/experiments/evaluation/vloom/async_evals_tr_346/run_evals_0_shots_val_2048.slurm
- The codebase is full of interesting insights like this one in our dataset.py file: How do you get reproducible randomness in different processes across different machines? Seed a separate random number generator in each process from the tuple (seed, rank)!
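A minimal sketch of that trick; make_rank_rng is an illustrative name, not the actual helper in dataset.py:

```python
import random

import numpy as np
import torch


def make_rank_rng(seed: int, rank: int):
    """Deterministic, per-process randomness derived from (seed, rank).

    Every process mixes the same global seed with its own rank, so
    re-running the job reproduces the exact same streams, while no two
    ranks ever share a stream.
    """
    # SeedSequence mixes the two integers into high-quality seed material.
    ss = np.random.SeedSequence([seed, rank])
    derived = int(ss.generate_state(1)[0])
    py_rng = random.Random(derived)                     # Python-level shuffles
    np_rng = np.random.default_rng(ss)                  # NumPy sampling
    torch_rng = torch.Generator().manual_seed(derived)  # DataLoader shuffling
    return py_rng, np_rng, torch_rng


# Rank 0 and rank 1 get different but individually reproducible shuffles.
rng0, _, _ = make_rank_rng(seed=42, rank=0)
rng1, _, _ = make_rank_rng(seed=42, rank=1)
print(rng0.random(), rng1.random())
```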
- And it also has a bunch of bugs like this one in our modeling_vllama3.py file. We start from a pretrained LLM, but for some reason the weights of the head are not loaded from the checkpoint. I still don't know why :(
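One way to catch this class of bug: load_state_dict with strict=False reports which parameters were not found in the checkpoint, so a silently re-initialized head shows up in missing_keys. This is a hedged sketch with a toy model standing in for VLlama3, not the real class:

```python
import torch
import torch.nn as nn


class ToyLM(nn.Module):
    """Toy stand-in for a pretrained LLM with a separate output head."""

    def __init__(self):
        super().__init__()
        self.backbone = nn.Linear(16, 16)
        self.lm_head = nn.Linear(16, 100, bias=False)


model = ToyLM()

# Simulate the failure mode described above: a checkpoint that contains
# the backbone but is missing the head weights.
checkpoint = {
    "backbone.weight": torch.zeros(16, 16),
    "backbone.bias": torch.zeros(16),
}
missing, unexpected = model.load_state_dict(checkpoint, strict=False)
if any("lm_head" in key for key in missing):
    print("lm_head was not loaded; it kept its random init:", missing)
```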
- And that's why we didn't release this before. It's live research code. Most of it gets rewritten fairly often, and some parts have been the same for years. It works, it manages to produce SOTA results at 256M and 80B sizes, but it's not production code. Go check it out: github.com/huggingface/...