Tobias Weyand

tobw.net

Researcher at Google DeepMind working towards human-level video understanding 🔗 tobw.net

Joined November 2024

Tobias Weyand tobw.net · May 13, 2025
We're excited to release Minerva 🕵️‍♀️, a benchmark to evaluate if AI can truly reason about videos, from spotting game-changing moments in sports 🏀 to understanding character motivations in short films 🍿. We provide the "why" behind the answers! Pointers below 👇

Tobias Weyand tobw.net · May 13, 2025
📜 Paper: arxiv.org/abs/2505.006...
📊 Dataset: github.com/google-deepm...
This is work with my amazing colleagues and collaborators Arsha Nagrani, Sachit Menon, Ahmet Iscen, Shyamal Buch, Ramin Mehran, Nilpa Jha, Anja Hauth, Yukun Zhu, Carl Vondrick, Mikhail Sirotenko, and Cordelia Schmid.
MINERVA: Evaluating Complex Video Reasoning

Multimodal LLMs are turning their focus to video benchmarks, however most video benchmarks only provide outcome supervision, with no intermediate or interpretable reasoning steps. This makes it challe...

arxiv.org

Tobias Weyand tobw.net · May 13, 2025
The newly released Gemini 2.5 Pro (Preview 05/06) sets the state of the art on Minerva with 63.5% accuracy. Human accuracy is 92.5%. developers.googleblog.com/en/gemini-2-...
Advancing the frontier of video understanding with Gemini 2.5 - Google Developers Blog

Explore Gemini 2.5, enhancing video understanding and combining audio-visual data and code for new interactive applications

developers.googleblog.com

Tobias Weyand tobw.net · May 13, 2025
Listen to the AI Breakdown podcast on Minerva here: aibreakdown.org/arxiv-paper-...
Arxiv paper – MINERVA: Evaluating Complex Video Reasoning

In this episode, we discuss MINERVA: Evaluating Complex Video Reasoning by Arsha Nagrani, Sachit Menon, Ahmet Iscen, Shyamal Buch, Ramin Mehran, Nilpa Jha, Anja Hauth, Yukun Zhu, Carl Vondrick, Mikhai...

aibreakdown.org

Tobias Weyand tobw.net · Jan 8, 2025
6yo daughter: Papa, are you the boss of Google?
Me: No
6yo daughter: Why?

Tobias Weyand tobw.net · Dec 5, 2024
Excited to share Long-Video Masked Autoencoder (LVMAE), which our team just published at NeurIPS'24! We boost the context length of video models using an adaptive decoder and a dual-masking strategy, and achieve SotA on several video benchmarks. Paper: arxiv.org/abs/2411.13683
- Mikhail Sirotenko msirotenko.bsky.social · Dec 4, 2024
  The blog post about our recent work on training masked autoencoders on long(-er) videos is out. The paper was accepted to NeurIPS'24. More at: goo.gle/4fW5aIc
  Extending video masked autoencoders to 128 frames
  
  goo.gle

Tobias Weyand tobw.net · Nov 24, 2024
Is there a better way to find the publication venue of an ArXiv paper than searching for the title on Google / Google Scholar / OpenReview and checking authors' websites?

Tobias Weyand tobw.net · Nov 24, 2024
Tap, tap. Is this thing on?
