Cutting Through the Noise: How We Built a High-performance Microphone for the Classroom

Javier Villafana and Shom Ponoth
September 9, 2022

The Merlyn voice assistant-based Symphony Classroom device is designed to offer a high-performance audio experience in the classroom. A typical classroom is a challenging acoustic environment. Its noise sources fall broadly into two groups: non-stationary noise, such as children talking (“babble” noise), and stationary noise, such as HVAC systems or projector fans. Variations in classroom dimensions and reverberation make voice pick-up harder still. A high-performance far-field microphone solution, comprising the hardware and the voice processing algorithm, was selected and tuned to address these challenges.

This brief write-up provides a high-level overview of part of the effort needed to build and productize a far-field microphone solution from a hardware point of view. Other key elements of a voice assistant's capability are not addressed here: speaker design and echo cancellation, far-field testing, and wake-word and automatic speech recognition optimizations.

Exploded view showing the microphone array for the Symphony Classroom multi-modal teacher assistant device

Implementing a production-quality, high-performance far-field voice solution is an involved process. It spans acoustic, electrical, and mechanical design, acoustic characterization, far-field algorithm tuning, and testing at multiple stages of the build. For the Printed Circuit Board Assembly (PCBA) phase within the Engineering Validation Test (EVT) stage, key steps include selecting the microphone component, verifying that the PCBA is manufactured correctly (including the port design on the board and the soldering of the microphones), confirming that there is no unwanted electrical noise pickup, and characterizing the PCBA acoustically in an anechoic chamber. The microphone enclosure design and the sealing of the PCBA are critical for two reasons: they minimize degradation of the frequency response, and they keep the microphones acoustically independent of each other. The far-field algorithm needs both to function correctly. On the production line, acoustic checks continue to monitor key quality parameters such as microphone frequency response, Total Harmonic Distortion (THD), and microphone sealing and matching, among others. It is also critical to check far-field microphone and speaker performance at the different stages of the build, namely post-EVT, DVT, and PVT.
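As an illustration of one of the production-line checks mentioned above, THD can be estimated from a recording of a pure test tone by comparing the energy in the harmonics with the amplitude of the fundamental. The sketch below is only that: the function names, the 1 kHz test tone, and the synthetic capture are assumptions made for this example, not a description of the actual line test.

```python
import math


def tone_amplitude(samples, sample_rate, freq_hz):
    """Estimate the peak amplitude of the sinusoid at freq_hz (single-bin DFT)."""
    n = len(samples)
    w = 2.0 * math.pi * freq_hz / sample_rate
    re = sum(s * math.cos(w * i) for i, s in enumerate(samples))
    im = sum(s * math.sin(w * i) for i, s in enumerate(samples))
    return 2.0 * math.hypot(re, im) / n


def thd_percent(samples, sample_rate, fundamental_hz, max_harmonic=6):
    """THD: root-sum-square of harmonics 2..max_harmonic over the fundamental."""
    nyquist = sample_rate / 2.0
    fundamental = tone_amplitude(samples, sample_rate, fundamental_hz)
    harmonics = [
        tone_amplitude(samples, sample_rate, k * fundamental_hz)
        for k in range(2, max_harmonic + 1)
        if k * fundamental_hz < nyquist  # skip harmonics that cannot be sampled
    ]
    return 100.0 * math.sqrt(sum(h * h for h in harmonics)) / fundamental


def make_test_capture(sample_rate=48000, num_samples=4800):
    """Synthetic data (hypothetical): 1 kHz tone plus a 1% second harmonic."""
    return [
        math.sin(2 * math.pi * 1000 * i / sample_rate)
        + 0.01 * math.sin(2 * math.pi * 2000 * i / sample_rate)
        for i in range(num_samples)
    ]


if __name__ == "__main__":
    capture = make_test_capture()
    print(f"THD: {thd_percent(capture, 48000, 1000):.3f} %")
```

The single-bin estimate is only accurate when the capture holds a whole number of cycles of the test tone, as the synthetic capture here does. A real measurement would typically apply a window and account for the noise floor.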

For the far-field algorithm, we worked closely with our algorithm partner to customize a seven-microphone high-performance far-field voice solution. Key algorithm modules include beamforming, Acoustic Echo Cancellation (AEC), and noise reduction. When the algorithm identifies target speech, the beamformer can increase the signal-to-noise ratio (SNR) by up to 16 dB, giving the downstream components a cleaner, clearer speech signal. For the beamformer to give the best results, the microphone signals must be isolated from one another, ideally by at least 30 dB. Insufficient isolation degrades far-field performance. Achieving good performance therefore requires careful design and assembly of the mechanical components, including the enclosure, gaskets, and PCB.
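To put the decibel figures above in perspective, the sketch below converts a measured leak between two microphone channels into an isolation value and compares it with the 30 dB target. It also computes the textbook array gain of a plain delay-and-sum beamformer against spatially uncorrelated noise, 10·log10(N). The function names and the measurement values are hypothetical, and the partner's algorithm is not described here, so this does not reproduce the 16 dB figure.

```python
import math

MIN_ISOLATION_DB = 30.0  # target stated in the article


def isolation_db(driven_rms, leaked_rms):
    """Isolation between two channels from RMS amplitudes (same units)."""
    return 20.0 * math.log10(driven_rms / leaked_rms)


def meets_isolation(driven_rms, leaked_rms, minimum_db=MIN_ISOLATION_DB):
    """True if the leak into the neighbouring channel is low enough."""
    return isolation_db(driven_rms, leaked_rms) >= minimum_db


def ideal_array_gain_db(num_mics):
    """Delay-and-sum SNR gain against spatially uncorrelated noise."""
    return 10.0 * math.log10(num_mics)


if __name__ == "__main__":
    # Hypothetical readings: one microphone port driven, a neighbour observed.
    print(f"Isolation: {isolation_db(1.0, 0.01):.1f} dB")
    print(f"Passes 30 dB target: {meets_isolation(1.0, 0.05)}")
    print(f"Ideal 7-mic array gain: {ideal_array_gain_db(7):.2f} dB")
```

A 30 dB isolation corresponds to an amplitude ratio of roughly 31.6 to 1, so a leak of 5% of the driven level (about 26 dB) would fall short of the target.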

The specification and choice of the microphone component are further key design parameters for a high-performance far-field voice solution. The fully assembled part should have a flat frequency response and low distortion across the speech region of interest (100 Hz to 8 kHz). Both bottom-port and top-port microphones can be considered. In either case, the microphone port should be designed carefully so that its Helmholtz resonance falls outside the speech region; a resonance inside that region distorts the speech frequencies. Because the voice algorithm uses a multi-microphone array, it is also important that all microphone components are sensitivity-matched to each other.
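A rough way to see where the port resonance lands is the lumped Helmholtz resonator formula, f = (c / 2π) · sqrt(A / (V · L_eff)), where A is the port cross-section, V the cavity volume behind it, and L_eff the port length plus an end correction. The sketch below applies it. The dimensions, the 1.7 × radius end correction, and the function names are assumptions chosen for illustration, not the Symphony Classroom design values, and the lumped model only holds when the port and cavity are much smaller than the wavelength.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # air at roughly 20 degrees C


def helmholtz_resonance_hz(port_radius_m, port_length_m, cavity_volume_m3,
                           end_correction=1.7):
    """Lumped-element Helmholtz resonance of a cylindrical port and cavity.

    end_correction is in units of port radius; 1.7 is a common assumption
    for a port flanged at both ends.
    """
    area = math.pi * port_radius_m ** 2
    effective_length = port_length_m + end_correction * port_radius_m
    return (SPEED_OF_SOUND_M_S / (2.0 * math.pi)) * math.sqrt(
        area / (cavity_volume_m3 * effective_length)
    )


def clears_speech_band(resonance_hz, band_top_hz=8000.0):
    """True if the resonance sits above the top of the speech region."""
    return resonance_hz > band_top_hz


if __name__ == "__main__":
    # Hypothetical geometry: 0.5 mm port radius, 2 mm total port length
    # (PCB plus gasket plus enclosure wall), 1 mm^3 front cavity.
    f_res = helmholtz_resonance_hz(0.5e-3, 2.0e-3, 1.0e-9)
    print(f"Resonance: {f_res / 1000:.1f} kHz, "
          f"clears band: {clears_speech_band(f_res)}")
```

The formula shows the design levers directly: a longer or narrower port, or a larger cavity, pulls the resonance down toward the speech region. A resonance only slightly above 8 kHz can still lift the response below it, so some margin is likely needed in practice.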

To conclude, productizing a far-field microphone-based assistant involves significant technical design and production know-how. We are thrilled to have built and offered such a solution for a teacher-focused device with our Symphony Classroom Hub.

Javier Villafana is the Senior Staff Acoustics Engineer at Merlyn Mind.

Shom Ponoth is the Chief Hardware Officer at Merlyn Mind.