VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training. In this paper, we show that video masked autoencoders (VideoMAE) are data-efficient learners for self-supervised video pre-training (SSVP). Inspired by the recent ImageMAE, we propose customized video tube masking with an extremely high masking ratio.
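To make the tube-masking idea concrete, here is a minimal sketch: a single spatial mask is sampled and repeated along the temporal axis, so every masked patch extends through time as a "tube". The function name, token counts, and the 0.9 ratio below are illustrative assumptions, not the paper's reference implementation.

```python
import torch

def tube_mask(num_temporal_tokens: int, num_spatial_tokens: int,
              mask_ratio: float = 0.9) -> torch.Tensor:
    """Sample one spatial mask and repeat it over time (hypothetical helper).

    Returns a boolean tensor of shape
    (num_temporal_tokens * num_spatial_tokens,), True = masked token.
    """
    num_masked = int(mask_ratio * num_spatial_tokens)
    # Randomly pick which spatial positions to mask.
    perm = torch.randperm(num_spatial_tokens)
    spatial_mask = torch.zeros(num_spatial_tokens, dtype=torch.bool)
    spatial_mask[perm[:num_masked]] = True
    # The same spatial mask is shared by every temporal slice -> "tubes".
    return spatial_mask.repeat(num_temporal_tokens)

# e.g. 8 temporal slices (16 frames / tubelet size 2) x 14x14 spatial patches
mask = tube_mask(8, 14 * 14, mask_ratio=0.9)
print(mask.shape, mask.float().mean())  # torch.Size([1568]), ~0.9
```

Because the mask is constant over time, the model cannot recover a masked patch by copying it from a neighboring frame, which is what makes the extremely high ratio workable.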
[CVPR 2023] Official Implementation of VideoMAE V2 - GitHub. [2023.05.29] VideoMAE V2-g features for the THUMOS14 and FineAction datasets are now available in TAD.md. [2023.05.11] Testing of our distilled models is now supported in MMAction2 (dev version)!
MCG-NJU/videomae-base · Hugging Face. VideoMAE is an extension of Masked Autoencoders (MAE) to video. The architecture of the model is very similar to that of a standard Vision Transformer (ViT), with a decoder on top for predicting pixel values for masked patches.
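As a usage sketch, the checkpoint above can be loaded through the Hugging Face transformers library and run in its pre-training (masked reconstruction) mode. The uniformly random boolean mask below is only a stand-in for the paper's tube mask; the video tensor is random dummy data.

```python
import numpy as np
import torch
from transformers import AutoImageProcessor, VideoMAEForPreTraining

num_frames = 16
# Dummy video: 16 frames of 3x224x224 random pixels.
video = list(np.random.randint(0, 256, (num_frames, 3, 224, 224)))

image_processor = AutoImageProcessor.from_pretrained("MCG-NJU/videomae-base")
model = VideoMAEForPreTraining.from_pretrained("MCG-NJU/videomae-base")

pixel_values = image_processor(video, return_tensors="pt").pixel_values

# Sequence length = temporal slices (frames / tubelet_size) x spatial patches.
num_patches_per_frame = (model.config.image_size // model.config.patch_size) ** 2
seq_length = (num_frames // model.config.tubelet_size) * num_patches_per_frame

# Random mask over all tokens (illustrative; VideoMAE proper uses tube masking).
bool_masked_pos = torch.randint(0, 2, (1, seq_length)).bool()

outputs = model(pixel_values, bool_masked_pos=bool_masked_pos)
loss = outputs.loss  # reconstruction loss on the masked patches
```

The decoder is only used during pre-training; for downstream tasks such as video classification, it is discarded and a task head is placed on the encoder instead.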