FIXMYPOSE: Pose Correctional Captioning and Retrieval

Abstract: Interest in physical therapy and individual exercises such as yoga/dance has increased alongside the well-being trend, and people globally enjoy such exercises at home/office via video streaming platforms. However, such exercises are hard to follow without expert guidance. Even if experts can help, it is almost impossible to give personalized feedback to every trainee remotely. Thus, automated pose correction systems are required more than ever, and we introduce a new captioning dataset named FixMyPose to address this need. We collect natural language descriptions of correcting a “current” pose to look like a “target” pose. To support a multilingual setup, we collect descriptions in both English and Hindi. The collected descriptions have interesting linguistic properties such as egocentric relations to the environment objects, analogous references, etc., requiring an understanding of spatial relations and commonsense knowledge about postures. Further, to avoid ML biases, we maintain a balance across characters with diverse demographics, who perform a variety of movements in several interior environments (e.g., homes, offices). From our FixMyPose dataset, we introduce two tasks: the pose-correctional-captioning task and its reverse, the target-pose-retrieval task. During the correctional-captioning task, models must generate the descriptions of how to move from the current to the target pose image, whereas in the retrieval task, models should select the correct target pose given the initial pose and the correctional description. We present strong cross-attention baseline models (uni/multimodal, RL, multilingual) and also show that our baselines are competitive with other models when evaluated on other image-difference datasets. 
We also propose new task-specific metrics (object-match, body-part-match, direction-match) and conduct a human evaluation for more reliable assessment, and we demonstrate a large human-model performance gap, suggesting room for promising future work. Finally, to verify the sim-to-real transfer of our FixMyPose dataset, we collect a set of real images and show promising performance on these images. Data and code are available at https://fixmypose-unc.github.io.

Hyounghun Kim, Abhay Zala, Graham Burri, Mohit Bansal

Similar Papers

Towards Universal Representation Learning for Deep Face Recognition

Yichun Shi, Xiang Yu, Kihyuk Sohn, Manmohan Chandraker, Anil K. Jain

Keywords: face recognition, universal representation, data augmentation

WP2-GAN: Wavelet-based Multi-level GAN for Progressive Facial Expression Translation with Parallel Generators

Jun Shao, Tien Bui

Keywords: expression translation, parallel training, progressive training, wavelet packet transform, multi-level GAN

Partially-Aligned Data-to-Text Generation with Distant Supervision

Zihao Fu, Bei Shi, Wai Lam, Lidong Bing, Zhiyuan Liu

Keywords: data-to-text task, generation task, dataset problem, over-generation problem

Render In-between: Motion Guided Video Synthesis for Action Interpolation

Hsuan-I Ho, Xu Chen, Jie Song, Otmar Hilliges

Keywords: video interpolation, action prediction, human motion modeling, human generation, human centric video, neural renderer, transformer

Sample elicitation

Jiaheng Wei, Zuyue Fu, Yang Liu, Xingyu Li, Zhuoran Yang, Zhaoran Wang

Non-contact Pain Recognition from Video Sequences with Remote Physiological Measurements Prediction

Ruijing Yang, Ziyu Guan, Zitong Yu, Xiaoyi Feng, Jinye Peng, Guoying Zhao

Keywords: Computer Vision, Biometrics, Face and Gesture Recognition, AI for Life Science

Attentive Adversarial Network for Large-Scale Sleep Staging

Samaneh Nasiri, Gari D. Clifford

Robust Local Features for Improving the Generalization of Adversarial Training

Chuanbiao Song, Kun He, Jiadong Lin, Liwei Wang, John E. Hopcroft

Keywords: adversarial robustness, adversarial training, adversarial example, deep learning

Subpixel Heatmap Regression for Facial Landmark Localization

Adrian Bulat, Enrique Sanchez, Georgios Tzimiropoulos

Keywords: face alignment, landmarks estimation, face tracking

Interactive hybrid approach to combine machine and human intelligence for personalized rehabilitation assessment

Min Hun Lee, Daniel P. Siewiorek, Asim Smailagic, Alexandre Bernardino, Sergi Bermúdez i Badia

Inter-Task Association Critic for Cross-Resolution Person Re-Identification

Zhiyi Cheng, Qi Dong, Shaogang Gong, Xiatian Zhu

Keywords: person re-identification, cross-resolution person re-identification, inter-task, image super-resolution, low-resolution, image retrieval

Landmark-RxR: Solving Vision-and-Language Navigation with Fine-Grained Alignment Supervision

Keji He, Yan Huang, Qi Wu, Jianhua Yang, Dong An, Shuanglin Sima, Liang Wang

Task-Assisted Domain Adaptation With Anchor Tasks

Zhizhong Li, Linjie Luo, Sergey Tulyakov, Qieyun Dai, Derek Hoiem

Domain-transferred Face Augmentation Network

Hao-Chiang Shao, Kang-Yu Liu, Chia-Wen Lin, Jiwen Lu

Selective Spatio-Temporal Aggregation Based Pose Refinement System: Towards Understanding Human Activities in Real-World Videos

Di Yang, Rui Dai, Yaohui Wang, Rupayan Mallick, Luca Minciullo, Gianpiero Francesca, Francois Bremond

DESC: Domain Adaptation for Depth Estimation via Semantic Consistency

Adrian Lopez-Rodriguez, Krystian Mikolajczyk

Keywords: domain adaptation, depth estimation, monocular, depth, domain, KITTI, Virtual KITTI

Learning discriminative joint embeddings for efficient face and voice association

Rui Wang, Xin Liu, Yiu-ming Cheung, Kai Cheng, Nannan Wang, Wentao Fan

Keywords: bi-directional ranking constraint, face-voice association, cross-modal verification, discriminative joint embedding

Precise Yet Efficient Semantic Calibration and Refinement in ConvNets for Real-time Polyp Segmentation from Colonoscopy Videos

Huisi Wu, Jiafu Zhong, Wei Wang, Zhenkun Wen, Jing Qin

I Am Going MAD: Maximum Discrepancy Competition for Comparing Classifiers Adaptively

Haotao Wang, Tianlong Chen, Zhangyang Wang, Kede Ma

Keywords: model comparison

BCaR: Beginner Classifier as Regularization Towards Generalizable Re-ID

Masato Tamura, Tomoaki Yoshinaga

Keywords: person re-identification, generalizable, soft label, knowledge distillation, Re-ID, domain generalization

Prior Guided GAN Based Semantic Inpainting

Avisek Lahiri, Arnav Kumar Jain, Sanskar Agrawal, Pabitra Mitra, Prabir Kumar Biswas

Keywords: semantic inpainting, generative adversarial networks, video inpainting, facial keypoints, generative models
