Thesis proposal for students of:
1. Saarland University, Saarbrücken, Germany
2. Technische Universität Berlin, Germany
Published: 26.05.2026
Proposed by the SCAAI Group (Social, Cognitive and Affective AI, https://scaai.dfki.de) at the German Research Center for Artificial Intelligence (DFKI), Saarbrücken, Germany
## Introduction/Problem
One of the most effective approaches for translating text into sign language is through the animation of signing 3D avatars.
Translation pipelines normally work in two stages: first, text is translated into the description of a sequence of signs to be performed; second, animations of each single sign are concatenated.
For an avatar to be useful in a translation pipeline, a repository of single isolated signs, i.e., signs in their “citation form”, must be available.
Collecting single signs through motion capture technologies is extremely expensive and time-consuming.
As an alternative, it should be possible to create such a repository by programmatically analysing existing online repositories of isolated signs.
Methods exist for estimating 3D human motion from affordable RGB cameras (Pavlakos et al., 2019), but they typically require extensive post-processing for dynamic motion. This challenge is even greater for sign language, which involves rapid movements, self-occlusions, and highly expressive facial gestures.
Recently, O’Brien et al. (2026) evaluated the quality of different openly available systems for reconstructing 3D body and face motion from RGB videos, specifically targeting the sign language domain. Their results show that different body parts are best recognized by different libraries, suggesting that an approach mixing several technologies gives the best results.
Public open repositories of isolated signs exist for the community as video vocabularies (e.g., Spreadthesign: https://www.spreadthesign.com), but they lack a 3D version in a standardized format.
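The idea of mixing several technologies can be illustrated with a minimal Python sketch. It is not the method of O’Brien et al. (2026): the estimator names, the `PREFERRED_SOURCE` mapping, and the `merge_frame` helper are hypothetical, and the sketch assumes that the outputs of all estimators have already been aligned to a common coordinate frame and skeleton layout, which in practice is a substantial part of the work.

```python
from typing import Dict, List, Tuple

Keypoint = Tuple[float, float, float]   # (x, y, z) in a shared coordinate frame
Frame = Dict[str, List[Keypoint]]       # body part name -> list of keypoints

# Hypothetical mapping: which estimator to trust for each body part.
PREFERRED_SOURCE: Dict[str, str] = {
    "body": "estimator_a",
    "hands": "estimator_b",
    "face": "estimator_c",
}


def merge_frame(
    per_estimator: Dict[str, Frame],
    preferred: Dict[str, str] = PREFERRED_SOURCE,
) -> Frame:
    """Build one frame by taking each body part from its preferred estimator.

    If the preferred estimator did not return a part, fall back to the first
    estimator that did. Parts that nobody returned are left out.
    """
    merged: Frame = {}
    for part, source in preferred.items():
        candidate = per_estimator.get(source, {})
        if part in candidate:
            merged[part] = candidate[part]
            continue
        for other in per_estimator.values():
            if part in other:
                merged[part] = other[part]
                break
    return merged


# Example input for one video frame (made-up values).
example = {
    "estimator_a": {"body": [(0.0, 0.0, 0.0)], "hands": [(1.0, 1.0, 1.0)]},
    "estimator_b": {"hands": [(2.0, 2.0, 2.0)]},
}
merged_example = merge_frame(example)
```

In this example the body comes from `estimator_a`, the hands from `estimator_b`, and the face is absent because no estimator provided it.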
## Goals
The goals of this thesis are to:
* Investigate 3D motion reconstruction techniques for the human body, hands, and face that run on plain RGB video.
* Identify online repositories for collecting sign language videos (including investigating licensing issues).
* Design and build a repository for video and 3D motion data of isolated signs.
* Apply 3D
In addition to the written manuscript, the thesis must produce a demonstrable working system, released as open-source software in a public repository.
Satisfactory results may lead to a scientific publication.
As initial material, the candidate can have a look at our sign language synthesis software (Nunnari et al., 2025), which will be used for running quality tests: https://github.com/DFKI-SignLanguage/MMS-Player
## Requirements
* Passion for 3D graphics and human character animation.
* Strong Python programming skills.
* Software engineering skills for organizing large code repositories into modules, classes, and scripts (not only plain sequential notebooks).
* Experience with the Blender 3D editor and its APIs (https://www.blender.org).
* Proactive attitude and willingness to propose ideas.
* Knowledge of any sign language is not required but would be considered a plus.
* For Saarland University students, the thesis will follow the guidelines of the UMTL department at Saarland University: https://umtl.cs.uni-saarland.de/teaching/thesis.html
## Contacts
If interested, send a CV, a transcript of your grades, and, if available, links to selected open-source software repositories of yours to Fabrizio Nunnari <fabrizio.nunnari@dfki.de>
## References
Nunnari, F., Mishra, S. and Gebhard, P. (2025) “MMS Player: an open source software for parametric data-driven animation of Sign Language avatars,” Adjunct Proceedings of the 25th ACM International Conference on Intelligent Virtual Agents. 9th Workshop on Sign Language Translation and Avatar Technology, Berlin, Germany: ACM, pp. 1–8. Available at: https://doi.org/10.1145/3742886.3756710.
O’Brien, C. et al. (2026) “Evaluation of Pose Estimation Systems for Sign Language Translation,” 12th Workshop on the Representation and Processing of Sign Languages: Language in Motion. LREC, ELRA, pp. 371–386. Available at: https://www.sign-lang.uni-hamburg.de/lrec/pub/26035.pdf
Pavlakos, G. et al. (2019) “Expressive Body Capture: 3D Hands, Face, and Body from a Single Image,” in Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 10975–10985.
