[1804.00326] Seeing Voices and Hearing Faces: Cross-modal biometric matching
Arsha Nagrani, Samuel Albanie, Andrew Zisserman
We introduce a seemingly impossible task: given only an audio clip of someone speaking, decide which of two face images is the speaker. In this paper we study this, and a number of related cross-modal tasks, aimed