Xiaomi XiaoAi Announces Fully In-House Acoustic Speech Technology

Xiaomi recently released the Xiaoai Speaker Art, its ninth smart speaker. The acoustic speech technology behind it has been significantly upgraded: the speaker ships with the third generation of the Xiaoai voice assistant and supports emotional voice interaction.

Today, Xiaomi officially announced that its acoustic speech technology is now developed entirely in-house, and said it continues to lead the industry in some of these areas.

The first highlight is "emotional" voice interaction. To add emotion to synthesized speech despite a limited amount of emotional training data, Xiaomi AI Lab combined different acoustic models with different vocoders to produce natural, human-like emotional TTS (text-to-speech). Xiaomi says this makes it the first company in the industry to launch emotional TTS at scale.

Emotional voice interaction

The Xiaomi Xiaoai Speaker Art fully supports emotional voice interaction, built on a limited but varied set of emotional audio data covering emotions such as happiness, concern, shyness and surprise. In the future, Xiaomi Voice plans to upgrade this technology to support real-time emotional TTS synthesis.

Second, the AIoT playback technology has been upgraded so that, for the first time, the same audio can be played in sync throughout the house. The Xiaomi Xiaoai Speaker Art is reportedly the first device to support whole-house voice playback: users can simply say "play XXX in the whole house" to the Xiao Ai voice assistant, and a single voice command starts playback everywhere.

Third, the upgrade adds nearby wake-up, so the speaker closest to the user responds to the wake word. If an alarm goes off on a distant speaker, the speaker that was woken up can turn that alarm off directly. Xiaomi says this feature is an industry first, and the Xiaomi Xiaoai Speaker Art is the first product to support it.

Fourth, a new two-microphone-array wake-up strategy balances low power consumption with high performance, delivering efficient noise reduction and clean voice capture. The Xiaomi Xiaoai Speaker Art supports this two-microphone array wake-up technology. For the microphone array, Xiaomi uses a two-mic blind source separation (BSS) noise-reduction front end. By combining blind source separation, noise reduction, echo cancellation and other speech-enhancement techniques, it can isolate the user's voice in noisy environments with multiple sound sources. Even while the speaker itself is playing music, it can remove strong noise interference and capture clean, accurate voice audio.

At present, 250 million smart devices are connected to the Xiaomi IoT platform, and Xiaomi's speaker shipments have reached 22 million units.
