A voice can reveal a lot about a person, but few would expect it to give away what the speaker looks like after only a few seconds of speech.
A recent study by the Massachusetts Institute of Technology (MIT) in the USA shows that a trained AI can not only infer a person's gender, age, and ethnicity from their voice, but can even guess what their face looks like.
Using a dataset of millions of YouTube videos, the researchers trained a neural network model called Speech2Face. According to the experimental results, after hearing a voice for just 6 seconds, the system can reconstruct the speaker's face with high accuracy.
Speech2Face works in two parts. The first is the speech encoder, which analyzes the input speech and predicts the relevant facial features. The second is the face decoder, which combines those facial features to create an image.
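To make the two-part structure concrete, here is a minimal Python sketch of an encoder followed by a decoder. It is only an illustration of the pipeline shape described above, not MIT's implementation: the function names, the vector sizes, the time-averaging step, and the random stand-in weights are all assumptions made for this example, and a real system would use trained neural networks in both stages.

```python
import random

FEATURE_DIM = 8   # hypothetical size of the face-feature vector
IMAGE_SIDE = 4    # hypothetical output image is IMAGE_SIDE x IMAGE_SIDE pixels
N_BINS = 5        # hypothetical number of frequency bins per audio frame


def make_weights(rows, cols, seed):
    """Random stand-in weights; a real model would learn these from data."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def matvec(weights, vector):
    """Multiply a weight matrix (a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


def speech_encoder(spectrogram, weights):
    """Average the audio frames over time, then project to face features."""
    n_frames = len(spectrogram)
    n_bins = len(spectrogram[0])
    pooled = [sum(frame[b] for frame in spectrogram) / n_frames
              for b in range(n_bins)]
    return matvec(weights, pooled)


def face_decoder(features, weights):
    """Map face features to pixel values and reshape them into a square image."""
    pixels = matvec(weights, features)
    return [pixels[r * IMAGE_SIDE:(r + 1) * IMAGE_SIDE]
            for r in range(IMAGE_SIDE)]


def speech_to_face(spectrogram, enc_weights, dec_weights):
    """Run the two stages in sequence: voice -> face features -> image."""
    return face_decoder(speech_encoder(spectrogram, enc_weights), dec_weights)


# Toy input: 10 audio frames, each with N_BINS made-up frequency values.
toy_spectrogram = [[(t + b) / 10.0 for b in range(N_BINS)] for t in range(10)]
enc_w = make_weights(FEATURE_DIM, N_BINS, seed=0)
dec_w = make_weights(IMAGE_SIDE * IMAGE_SIDE, FEATURE_DIM, seed=1)

face_features = speech_encoder(toy_spectrogram, enc_w)
face_image = speech_to_face(toy_spectrogram, enc_w, dec_w)
```

The point of the split is that the intermediate feature vector, not the final picture, carries what the model inferred from the voice; the decoder only turns those features into something a person can look at.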
The MIT team points out that their goal is not to reconstruct the exact appearance of the speaker. The model was created primarily to study the correlation between speech and human appearance.
According to the training results, Speech2Face identifies gender best and can clearly distinguish between white and Asian speakers. Its age predictions are also somewhat more accurate for speakers in the 30-40 and 70 age groups.
In addition to gender and age, Speech2Face can even predict facial features such as the structure of the nose, the thickness and shape of the lips, and the approximate proportions of the facial bone structure. In general, the longer the audio clip, the more accurate the AI.
Of course, the AI also gets confused in some cases. The researchers found that the system would identify a boy whose voice had not yet broken as female, and would misjudge some people with distinctive voices. This is understandable, because a voice is not a definitive indicator: a voice that sounds female may well belong to a man.
The results also suggest that Speech2Face's limitations are partly due to a lack of ethnic diversity in the dataset, which also led to inaccurate results for the voices of black speakers.
The potential applications of this technology are broad. In the simplest case, imagine saying just a few words and having software build a face for your avatar that looks 70-80% lifelike.
A voice can also serve as an identifier, much like DNA or a fingerprint. In the future, the technology could be developed further so that police can use it to narrow the scope of criminal investigations or track down pranksters who call in fake reports.
Currently, HSBC, Standard Chartered, JPMorgan Chase, and several other banks use similar voice technology to create a "voice ID" that helps detect whether a customer's account has been stolen or misused.
Some companies, such as Metropolitan Life Insurance in its customer service centers, also use AI systems to help identify customers' emotions over the phone and thereby assess whether a caller intends to commit insurance fraud.
A number of major technology companies have also applied AI in recruitment, analyzing candidates' personalities to see whether they are suited to the vacancy.
At CES 2017, Toyota showed off a car with an infrared camera, a sensor, and a voice recognition and dialogue system. These work together to determine whether the driver is fatigued and, if so, issue a warning.
Of course, compared with the applications above, MIT's technology could be deployed more widely. The researchers hope that one day it can be used to remotely diagnose diseases such as Parkinson's. Studies have already found that patients with coronary artery disease have a characteristic frequency signature in their voices. In the future, doctors may "listen" to patients in order to diagnose their diseases better.