High-performance speech neuroprosthesis
Prof. Francis R. Willett from the Department of Neurobiology, Stanford University School of Medicine, reported a newly developed brain-computer interface (BCI) device and method for a speech neuroprosthesis.
They focused on the communication challenges faced by individuals with paralysis, particularly those who can no longer speak due to conditions such as amyotrophic lateral sclerosis (ALS). People with neurological disorders often experience severe speech and motor impairments, up to the complete loss of speech (locked-in syndrome). While BCIs that let individuals communicate through decoded hand-movement activity have advanced considerably, speech BCIs have not yet achieved high accuracy for unconstrained communication with large vocabularies. The aim of the study was to develop a high-performance speech neuroprosthesis, specifically a BCI, that can restore rapid communication for people with paralysis who can no longer speak intelligibly.
The researchers developed a high-performance speech neuroprosthesis based on a BCI. Specifically, they recorded neural activity with intracortical microelectrode arrays. The participant, who had ALS and retained limited orofacial movement but could no longer produce intelligible speech, attempted to speak, and the BCI decoded the resulting neural activity into text. The study also used a language model with a large vocabulary of 125,000 words. The methods and device enabled the participant to achieve low word error rates, a substantial improvement in accuracy over previous speech BCIs.
They decoded the attempted speech using a brain-to-text decoding algorithm. Neural activity was recorded from the participant with intracortical microelectrode arrays, then temporally binned and smoothed on each electrode. A recurrent neural network (RNN) converted the neural activity into probabilities for each phoneme, and these phoneme probabilities were combined with a language model, which exploited the statistics of the English language, to infer the most likely sequence of words. The RNN decoder was a five-layer gated recurrent-unit (GRU) architecture trained using TensorFlow.

During decoding, the participant prepared to speak a sentence, and a "go" cue triggered the neural decoding. The RNN decoder produced real-time decoded words, reflecting the language model's best guess, which appeared on a screen; the participant finalized the decoded output by pressing a button. Two language models were used: a large-vocabulary model with 125,000 words and a small-vocabulary model with 50 words. Decoding performance was evaluated over several days of attempted-speaking and mouthing (silent-speech) sessions. The participant achieved word error rates of 9.1% with the 50-word vocabulary and 23.8% with the 125,000-word vocabulary.
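As a rough sketch of this pipeline, the snippet below Gaussian-smooths binned neural features on each electrode and passes them through a five-layer GRU stack that outputs per-bin phoneme probabilities. The electrode count, smoothing kernel, layer width, and phoneme inventory are illustrative assumptions, not the study's actual parameters, and the language-model search over phoneme probabilities is omitted:

```python
import numpy as np
import tensorflow as tf

N_ELECTRODES = 256   # assumption: total channels across the implanted arrays
N_PHONEMES = 40      # assumption: ~39 English phonemes plus a silence token

def bin_and_smooth(binned_counts, kernel_sd_bins=2.0):
    """Gaussian-smooth each electrode's binned feature stream.
    binned_counts: array of shape (n_bins, N_ELECTRODES)."""
    width = int(4 * kernel_sd_bins)
    t = np.arange(-width, width + 1)
    kernel = np.exp(-0.5 * (t / kernel_sd_bins) ** 2)
    kernel /= kernel.sum()
    return np.apply_along_axis(
        lambda ch: np.convolve(ch, kernel, mode="same"), 0, binned_counts)

def build_decoder():
    """Five stacked GRU layers mapping smoothed neural features to
    per-bin phoneme probabilities (the layer width is an assumption)."""
    layers = [tf.keras.Input(shape=(None, N_ELECTRODES))]
    layers += [tf.keras.layers.GRU(512, return_sequences=True) for _ in range(5)]
    layers += [tf.keras.layers.Dense(N_PHONEMES, activation="softmax")]
    return tf.keras.Sequential(layers)

decoder = build_decoder()
fake_counts = np.random.poisson(1.0, size=(150, N_ELECTRODES))  # simulated bins
features = bin_and_smooth(fake_counts).astype(np.float32)
phoneme_probs = decoder(features[np.newaxis])  # shape (1, 150, N_PHONEMES)
# A language model would then search these per-bin phoneme probabilities
# for the most likely word sequence; that step is omitted here.
print(phoneme_probs.shape)
```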
In addition to analyzing phonemes, the researchers examined the representation of vowels, which have a two-dimensional articulatory structure. The saliency vectors for vowels mirrored this structure: similar vowels had similar neural representations, and the neural activity contained a plane that directly reflected the two articulatory dimensions of vowels. These findings were verified using additional methods and in additional able-bodied speakers.

These results suggest that the neural representation of speech was preserved in the participant's brain despite the inability to speak intelligibly. The saliency vectors extracted from the neural activity captured details of phoneme and vowel articulation, which is encouraging for the development of speech neuroprostheses.
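To make the "vowel plane" idea concrete, the sketch below plants a two-dimensional structure in synthetic neural features, fits a least-squares map from those features to the two articulatory coordinates, and checks that held-out trials recover it. All data, dimensions, and trial counts here are fabricated stand-ins, not the study's analysis or measurements:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in data: 200 attempted-vowel trials, each with a
# 64-electrode neural feature vector and the vowel's 2-D articulatory
# coordinates (tongue height, front/back position).
n_trials, n_electrodes = 200, 64
articulatory = rng.uniform(-1, 1, size=(n_trials, 2))
true_plane = rng.normal(size=(2, n_electrodes))          # planted structure
neural = articulatory @ true_plane + rng.normal(
    scale=0.5, size=(n_trials, n_electrodes))            # plus noise

# Fit the "vowel plane": a least-squares map from neural activity to the
# two articulatory dimensions, using half the trials for fitting.
train, test = slice(0, 100), slice(100, 200)
coef, *_ = np.linalg.lstsq(neural[train], articulatory[train], rcond=None)
pred = neural[test] @ coef

# If the neural code really contains a 2-D vowel plane, held-out
# predictions should track the articulatory coordinates.
for d, name in enumerate(["height", "front/back"]):
    r = np.corrcoef(pred[:, d], articulatory[test, d])[0, 1]
    print(f"{name}: r = {r:.2f}")
```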
They also explored three factors that affect the accuracy and usability of speech BCIs: language-model vocabulary size, microelectrode count, and training-dataset size.
Regarding vocabulary size, they found that only very small vocabularies, such as 50-100 words, retained a large accuracy advantage. Word error rates saturated at around 1,000 words, indicating that restricting the decoder to an intermediate vocabulary size is unlikely to improve accuracy.

They also investigated the effect of the number of electrodes used for decoding. Accuracy improved with a log-linear trend, meaning that doubling the electrode count nearly halved the word error rate (illustrated in the sketch below). This suggests that intracortical devices capable of recording from more electrodes could yield higher accuracies in the future.

In summary, the design considerations for speech BCIs involve optimizing the language-model vocabulary size, increasing the number of electrodes used for decoding, and scaling the training dataset to improve the accuracy and usability of these neuroprosthetic devices.
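One way to read the electrode-scaling trend is as a line in log-log coordinates with a slope near minus one, which is exactly the "doubling electrodes nearly halves word error rate" claim. The fit below demonstrates this on made-up numbers chosen to exhibit that trend; they are not the paper's measurements:

```python
import numpy as np

# Illustrative numbers only: word error rate when decoding from
# progressively more electrodes.
electrodes = np.array([32, 64, 128, 256])
wer = np.array([0.62, 0.34, 0.18, 0.10])

# Fit log2(WER) = a + b * log2(n_electrodes). A slope b near -1 means
# doubling the electrode count roughly halves the word error rate.
b, a = np.polyfit(np.log2(electrodes), np.log2(wer), deg=1)
print(f"slope = {b:.2f}; doubling electrodes multiplies WER by {2**b:.2f}")
```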