This study addresses the challenge of selective auditory attention in noisy environments by proposing an EEG-based target speaker extraction model, ASEAF, designed to mimic neural decoding through tailored spatio-temporal feature extraction and cross-modal fusion. The model achieves precise extraction of the target speaker's speech by simultaneously processing EEG and audio signals....
No actionable clinical change yet; ASEAF is a proof-of-concept deep-learning model that may inform future neuro-steered hearing aid design but has not been validated in clinical populations.
EEG-audio fused auditory attention decoding represents a key research frontier for next-generation neuro-steered hearing aids, making this directly relevant to audiology's technology pipeline.
- ASEAF fuses EEG brainwave data with audio signals using attention mechanisms and a SincNet architecture.
- The model aims to decode which speaker a listener is attending to, a task known as auditory attention decoding (AAD).
- Designed to work in noisy, multi-speaker environments, a critical challenge for hearing aid users.
- Represents a step toward brain-controlled hearing devices that automatically amplify the desired speaker.
- Published in a biomedical physics/engineering journal; results are computational, not yet from clinical trials.
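To make the idea of attention-based EEG-audio fusion concrete, here is a minimal, dependency-free sketch of one common pattern: scaled dot-product cross-attention in which each audio frame queries a sequence of EEG frames. This is an illustrative assumption, not the ASEAF architecture; the summary does not describe the model's layers. The function name `cross_attention`, the toy feature vectors, and the choice to concatenate audio features with the attended EEG context are all hypothetical, and the learned projection matrices that a real model would use are omitted.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(audio_frames, eeg_frames):
    """Let each audio frame (query) attend over EEG frames (keys and values).

    Both inputs are lists of equal-length feature vectors. Returns the fused
    features (audio vector concatenated with its EEG context vector) and the
    attention weights. Hypothetical helper: no learned projections are used.
    """
    dim = len(audio_frames[0])
    scale = 1.0 / math.sqrt(dim)
    fused, weights = [], []
    for query in audio_frames:
        # Similarity between this audio frame and every EEG frame.
        scores = [scale * sum(q * k for q, k in zip(query, key))
                  for key in eeg_frames]
        attn = softmax(scores)
        # Weighted average of the EEG frames (the "context" for this frame).
        context = [sum(w * value[d] for w, value in zip(attn, eeg_frames))
                   for d in range(dim)]
        fused.append(list(query) + context)
        weights.append(attn)
    return fused, weights


# Toy example: 3 audio frames and 2 EEG frames, each with 2 features.
audio = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
eeg = [[1.0, 0.0], [0.0, 1.0]]
fused, weights = cross_attention(audio, eeg)
```

In this toy setup the first audio frame puts more weight on the first EEG frame because their features align, while the third audio frame weights both EEG frames equally. A trained model would learn the projections that make such alignments meaningful.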
ASEAF can extract a target speaker's signal from a noisy mixture by fusing EEG and audio inputs. (study; partially supported)
The model mimics selective auditory attention as observed in the human brain. (study; partially supported)
- PMID: 42102832
- DOI: 10.1088/2057-1976/ae6aa0
- Journal: Biomedical Physics & Engineering Express
- Publication type: Research article
- Evidence level: N/A
- Population: Computational model evaluated on EEG and audio datasets (no clinical patient cohort specified)
- Intervention: ASEAF deep-learning model combining EEG signals and audio for target speaker extraction
- Comparator: Baseline speaker extraction models without EEG fusion
- Primary outcomes: Target speaker extraction accuracy in noisy multi-speaker environments; auditory attention decoding performance
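The summary does not say which metric the study uses to score extraction quality. As an illustration only, the sketch below computes scale-invariant signal-to-distortion ratio (SI-SDR), a measure widely used in the speech separation literature. Treating it as the study's metric would be an assumption. The function name `si_sdr`, the `eps` guard, and the synthetic sine-wave signals are hypothetical and are not data from the paper.

```python
import math


def si_sdr(estimate, reference, eps=1e-12):
    """Scale-invariant signal-to-distortion ratio in dB (higher is better).

    The estimate is projected onto the reference; whatever is left over is
    counted as distortion. `eps` guards against division by zero.
    """
    dot = sum(e * r for e, r in zip(estimate, reference))
    ref_energy = sum(r * r for r in reference)
    alpha = dot / (ref_energy + eps)           # optimal scaling of the reference
    target = [alpha * r for r in reference]    # part of estimate explained by reference
    residual = [e - t for e, t in zip(estimate, target)]
    target_energy = sum(t * t for t in target)
    residual_energy = sum(n * n for n in residual)
    return 10.0 * math.log10((target_energy + eps) / (residual_energy + eps))


# Synthetic signals: a "target" tone and an "interfering" tone.
n = 100
target_tone = [math.sin(2 * math.pi * 5 * t / n) for t in range(n)]
interferer = [math.sin(2 * math.pi * 13 * t / n) for t in range(n)]

# A good extraction leaves little interference; a poor one leaves a lot.
good = [s + 0.1 * i for s, i in zip(target_tone, interferer)]
poor = [s + 1.0 * i for s, i in zip(target_tone, interferer)]

good_db = si_sdr(good, target_tone)
poor_db = si_sdr(poor, target_tone)
```

Because the score is scale-invariant, multiplying an estimate by a constant gain leaves it unchanged, so a model cannot improve its score by simply making the output louder.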