filename : Dib17a.pdf
entry : article
conference : Eurographics 2017
pages :
year : 2017
month :
title : DeepGarment: 3D Garment Shape Estimation from a Single Image
subtitle :
author : Radek Danecek*, Endri Dibra*, A. Cengiz Öztireli, Remo Ziegler, Markus Gross
booktitle : Computer Graphics Forum (Proceedings of Eurographics 2017)
ISSN/ISBN :
editor :
publisher : The Eurographics Association and John Wiley & Sons Ltd.
publ.place :
volume : 36
issue : 2
language : English
keywords : Computational Geometry and Object Modeling, Three-Dimensional Graphics and Realism
abstract : 3D garment capture is an important component of applications such as free-viewpoint video, virtual avatars, online shopping, and virtual cloth fitting. Due to the complexity of the deformations involved, capturing 3D garment shapes requires controlled and specialized setups. A viable alternative is image-based garment capture. Capturing 3D garment shapes from a single image, however, is a challenging problem, and current solutions come with assumptions on the lighting, camera calibration, the complexity of the human or mannequin poses considered, and, more importantly, a stable physical state for the garment and the underlying human body. In addition, most existing works require manual interaction and exhibit long run-times. We propose a new technique that overcomes these limitations, making garment shape estimation from a single image a practical approach for dynamic garment capture. Starting from synthetic garment shape data generated through physically based simulation on various human bodies in complex poses obtained from Mocap sequences, and rendered under varying camera positions and lighting conditions, our novel method learns a mapping from rendered garment images to the underlying 3D garment model. This is achieved by training Convolutional Neural Networks (CNNs) to estimate 3D vertex displacements from a template mesh with a specialized loss function.
We illustrate that this technique recovers the global shape of dynamic 3D garments from a single image at interactive rates, under varying factors such as challenging human poses, self-occlusions, and diverse camera poses and lighting conditions. Accuracy improves further when more than one view is integrated. Additionally, we show applications of our method to video.
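The displacement-based formulation in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: a stand-in predictor replaces the trained CNN, the toy arrays stand in for a real garment template and a physically simulated target, and the per-vertex L2 loss is only a common choice for this kind of regression (the paper's specialized loss function is not reproduced here).

```python
import numpy as np

def reconstruct(template_vertices, displacements):
    # The network regresses per-vertex 3D displacements from a template mesh;
    # the estimated garment shape is the template plus those offsets.
    return template_vertices + displacements

def vertex_l2_loss(predicted, ground_truth):
    # Mean per-vertex Euclidean distance between the predicted and the
    # simulated (ground-truth) mesh -- a generic stand-in loss.
    return np.mean(np.linalg.norm(predicted - ground_truth, axis=1))

# Toy data: a 4-vertex "template" at the origin and a target produced by
# a known displacement field (standing in for a physics simulation).
template = np.zeros((4, 3))
true_disp = np.full((4, 3), 0.1)
target = reconstruct(template, true_disp)

loss_exact = vertex_l2_loss(reconstruct(template, true_disp), target)
loss_zero = vertex_l2_loss(reconstruct(template, np.zeros((4, 3))), target)
print(loss_exact, loss_zero)  # 0.0 and ~0.173 (||(0.1,0.1,0.1)|| = 0.1*sqrt(3))
```

Training then amounts to minimizing such a loss over rendered-image/mesh pairs, so that the CNN's predicted displacements pull the template toward the simulated garment.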