- GitHub - MCG-NJU/VideoMAE: [NeurIPS 2022 Spotlight] VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
VideoMAE performs masked video modeling for video pre-training. We propose an extremely high masking ratio (90%-95%) and a tube masking strategy to create a challenging task for self-supervised video pre-training. VideoMAE uses a simple masked autoencoder with a plain ViT backbone to perform video self-supervised learning.
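To make the tube masking strategy concrete, here is a minimal PyTorch sketch with hypothetical names (not the official implementation): a single random spatial mask is drawn and repeated across every temporal position, so masked patches form space-time tubes that cannot be trivially recovered from neighboring frames.

```python
import torch

def tube_mask(num_frames: int, num_patches: int, mask_ratio: float = 0.9) -> torch.Tensor:
    """Sketch of tube masking: draw one random spatial mask and repeat it
    across all temporal positions (True = masked)."""
    num_masked = int(num_patches * mask_ratio)
    perm = torch.randperm(num_patches)
    spatial_mask = torch.zeros(num_patches, dtype=torch.bool)
    spatial_mask[perm[:num_masked]] = True
    # The same spatial mask at every temporal position -> a space-time tube.
    return spatial_mask.unsqueeze(0).expand(num_frames, num_patches)

# Example: 8 temporal token slots, 14x14 spatial patches, 90% masked.
mask = tube_mask(num_frames=8, num_patches=14 * 14)
print(mask.shape, mask.float().mean().item())  # torch.Size([8, 196]), ~0.9
```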
- VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
In this paper, we show that video masked autoencoders (VideoMAE) are data-efficient learners for self-supervised video pre-training (SSVP). We are inspired by the recent ImageMAE and propose customized video tube masking with an extremely high ratio.
- [NeurIPS 2022] VideoMAE: A Simple and Efficient New Paradigm for Self-Supervised Video Pre-Training
Video self-supervised learning: learning spatiotemporal representations from video data without labels, by designing self-supervised pretext tasks. Existing video self-supervised pre-training algorithms fall into two main categories: (1) contrastive-learning-based methods, such as CoCLR and CVRL; (2) methods based on temporal pretext tasks, such as DPC, SpeedNet, and Pace. Action recognition: classifying a given trimmed video to identify the action performed by the person in it.
- VideoMAE - Hugging Face
In this paper, we show that video masked autoencoders (VideoMAE) are data-efficient learners for self-supervised video pre-training (SSVP). We are inspired by the recent ImageMAE and propose customized video tube masking and reconstruction.
- VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking
This paper shows that the video masked autoencoder (VideoMAE) is a scalable and general self-supervised pre-trainer for building video foundation models. We scale VideoMAE in both model and data with a core design. Specifically, we present a dual masking strategy for efficient pre-training, with an encoder operating on a subset of video tokens.
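As a rough illustration of dual masking (a sketch with hypothetical names and illustrative ratios, not the paper's exact sampling scheme): the encoder sees only the visible tokens, while the decoder reconstructs just a sampled subset of the masked tokens rather than all of them, cutting cost on both sides.

```python
import torch

def dual_masking(num_tokens: int, enc_mask_ratio: float = 0.9,
                 dec_keep_ratio: float = 0.5):
    """Sketch of dual masking: the encoder sees only visible tokens,
    and the decoder reconstructs only a subset of the masked ones."""
    perm = torch.randperm(num_tokens)
    num_visible = round(num_tokens * (1 - enc_mask_ratio))
    visible_idx = perm[:num_visible]   # tokens fed to the encoder
    masked_idx = perm[num_visible:]    # tokens hidden from the encoder
    # Decoder masking: sample a fraction of the masked tokens as
    # reconstruction targets, shrinking the decoder's workload too.
    num_dec = round(len(masked_idx) * dec_keep_ratio)
    decode_idx = masked_idx[torch.randperm(len(masked_idx))[:num_dec]]
    return visible_idx, decode_idx

# 16-frame clip, 14x14 patches, tubelets of 2 frames -> 8 * 196 = 1568 tokens.
visible, targets = dual_masking(num_tokens=1568)
print(len(visible), len(targets))  # ~157 encoder tokens, ~706 decoder targets
```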
- [CVPR 2023] Official Implementation of VideoMAE V2 - GitHub
VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking. Limin Wang, Bingkun Huang, Zhiyu Zhao, Zhan Tong, Yinan He, Yi Wang, Yali Wang, and Yu Qiao. Nanjing University, Shanghai AI Lab, CAS.
- MCG-NJU/videomae-base - Hugging Face
VideoMAE is an extension of Masked Autoencoders (MAE) to video. The architecture of the model is very similar to that of a standard Vision Transformer (ViT), with a decoder on top for predicting pixel values for masked patches.
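Since this checkpoint is hosted on the Hugging Face Hub, features can be extracted with the transformers library. A minimal sketch, assuming a recent transformers version and using random frames as a stand-in for a real 16-frame clip:

```python
import numpy as np
import torch
from transformers import VideoMAEImageProcessor, VideoMAEModel

processor = VideoMAEImageProcessor.from_pretrained("MCG-NJU/videomae-base")
model = VideoMAEModel.from_pretrained("MCG-NJU/videomae-base")

# 16 random RGB frames stand in for a real video clip.
video = list(np.random.randint(0, 256, (16, 224, 224, 3), dtype=np.uint8))
inputs = processor(video, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)
# (batch, num_tokens, hidden_size), e.g. (1, 1568, 768) for videomae-base.
print(outputs.last_hidden_state.shape)
```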