Self-Supervised Learning for 3D Action Prediction with Graph Convolutional Recurrent Network
Keywords:
3D action prediction, self-supervised learning, state discrimination, spatio-temporal consistency, contrastive learning
Abstract
Existing 3D action prediction methods depend heavily on labeled data. To reduce this dependence, we propose a graph convolutional recurrent method for 3D action prediction based on state discrimination and spatio-temporal self-supervised contrastive learning. In the state discrimination task, cross-sample sampling and relative action completeness perception train the model to learn generalized state information across instances and classes. In the spatio-temporal contrastive task, spatio-temporal consistency information is introduced into the feature representation to enrich its action semantics. In addition, to fully extract the spatio-temporal information in 3D action sequences, we design a spatio-temporal feature extraction network (STFEN) based on a graph convolutional recurrent network. Experimental results on public datasets demonstrate the effectiveness of the proposed method.
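As an illustration of the kind of objective a spatio-temporal contrastive task might use, the following is a minimal sketch of an InfoNCE-style loss. The abstract does not state the actual loss, so the InfoNCE form, the cosine similarity, the temperature value, and the function names `cosine` and `info_nce` are all assumptions made for illustration. The sketch treats two features from the same action sequence as a positive pair and features from other sequences as negatives.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss for one anchor (hypothetical helper, not the paper's loss).

    anchor, positive: features assumed to come from the same action sequence.
    negatives: features assumed to come from other sequences.
    """
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    # The loss is small when the anchor is closer to the positive than to the negatives.
    return -math.log(pos / (pos + neg))


# Toy 2-D features: the aligned positive pair should give the lower loss.
aligned = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
misaligned = info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
```

In this toy setting `aligned` is close to zero and `misaligned` is large, which is the behavior a consistency objective relies on: features of the same sequence are pulled together and features of different sequences are pushed apart.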