A Dual-Stage Perceptual-Harmonic Hybrid Estimator for Speech Enhancement

Sally Taha Yousif
Basheera M. Mahmmod

Abstract

This paper proposes a hybrid speech enhancement estimator that integrates the Perceptually-motivated Karhunen–Loève Transform (PKLT) with the Dual-Masking Harmonic-based (DMH) algorithm in a unified framework termed PKDMH. The main novelty lies in combining perceptual subspace projection with harmonic-residual suppression, which enables the system to remove noise while preserving speech-relevant spectral cues. PKLT first performs perceptual subspace projection and suppresses inaudible components; DMH then eliminates the remaining broadband and harmonic residuals. The proposed PKDMH system was evaluated on the TIMIT dataset contaminated with five noise types (White, Pink, F16, Airport, and Car) at five SNR levels (−10 dB, −5 dB, 0 dB, +5 dB, and +10 dB). Objective evaluation used standard perceptual and signal-level measures: PESQ, STOI, SNRseg, Csig, Cbak, and Covl. The results show clear quantitative improvements in separation quality and in the signal ratio between the enhanced signals and the original target binary mask, with average PESQ gains of 1.099, 0.888, and 0.824 for White, Pink, and F16 noise, respectively. These results highlight the perceptual benefit of the PKDMH cascade and indicate that it is a more robust enhancement approach under low-SNR and acoustically varying conditions.
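The abstract names the processing stages but does not give their equations, so the sketch below does not reproduce PKDMH. As an assumption, it shows three generic building blocks in Python/NumPy: scaling a noise signal to a target SNR (as in the −10 dB to +10 dB test conditions), the plain time-domain KLT signal-subspace estimator in the style of Ephraim and Van Trees (1995), which PKLT presumably extends with perceptual weighting, and segmental SNR (SNRseg). The function names, the white-noise model with known variance, the 32-sample frame, the single global covariance estimate, and the [−10, 35] dB per-frame clamp in SNRseg are illustrative choices, not details taken from the paper; the perceptual masking of PKLT and the DMH stage are not modeled.

```python
import numpy as np

# Hypothetical helper names for illustration; not taken from the paper.


def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so that 10*log10(P_clean / P_noise) == snr_db, then add it to `clean`."""
    noise = noise[: len(clean)]
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_clean / (p_noise * 10.0 ** (snr_db / 10.0)))
    return clean + scale * noise


def klt_subspace_enhance(noisy, noise_var, K=32, mu=1.0):
    """Generic (non-perceptual) KLT signal-subspace estimator, assuming white noise.

    Assumption: one covariance estimate for the whole signal; a practical system
    would update it frame by frame and track the noise statistics.
    """
    hop = K // 2
    n = np.arange(K)
    win = 0.5 - 0.5 * np.cos(2.0 * np.pi * n / K)  # periodic Hann: overlapping copies sum to 1 at hop K/2
    n_frames = 1 + (len(noisy) - K) // hop
    frames = np.stack([noisy[i * hop:i * hop + K] for i in range(n_frames)])
    R_y = frames.T @ frames / n_frames              # K x K noisy covariance
    lam_y, U = np.linalg.eigh(R_y)                  # KLT basis = eigenvectors (columns of U)
    lam_x = np.maximum(lam_y - noise_var, 0.0)      # estimated clean-speech eigenvalues
    gain = lam_x / (lam_x + mu * noise_var)         # zero gain on the noise-only subspace
    H = U @ np.diag(gain) @ U.T                     # KLT -> per-eigenvector gain -> inverse KLT
    out = np.zeros(len(noisy))
    for i in range(n_frames):
        out[i * hop:i * hop + K] += win * (H @ frames[i])  # windowed overlap-add
    return out


def segmental_snr(clean, enhanced, frame=256, lo=-10.0, hi=35.0):
    """Mean frame-level SNR in dB, each frame clamped to [lo, hi]."""
    vals = []
    for i in range(len(clean) // frame):
        s = clean[i * frame:(i + 1) * frame]
        err = s - enhanced[i * frame:(i + 1) * frame]
        snr = 10.0 * np.log10(np.sum(s ** 2) / (np.sum(err ** 2) + 1e-12) + 1e-12)
        vals.append(np.clip(snr, lo, hi))
    return float(np.mean(vals))
```

For example, mixing a synthetic harmonic signal with white noise at 0 dB via `mix_at_snr` and passing it through `klt_subspace_enhance` should raise `segmental_snr` relative to the noisy input. Discarding the eigen-directions whose eigenvalues do not exceed the noise variance is the subspace idea in its simplest form; real speech and coloured noise (Pink, F16, Airport, Car) need prewhitening and frame-local covariance estimates.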

How to Cite

Yousif, S.T. and Mahmmod, B.M., 2026. A Dual-Stage Perceptual-Harmonic Hybrid Estimator for Speech Enhancement. Journal of Engineering, 32(3), pp. 173–192. https://doi.org/10.31026/j.eng.2026.03.10.

References

Abdulhussain, S.H., Mahmmod, B.M., Naser, M.A., Alsabah, M., and Mustafina, J., 2021. Speech enhancement algorithm based on a hybrid estimator. IOP Conference Series: Materials Science and Engineering, 1090(1), Article 012102. https://doi.org/10.1088/1757-899X/1090/1/012102.

Al-Zubaidi, A.S., Abduljabbar, R.B., Mahmmod, B.M., Abdulhussain, S.H., Naser, M.A., Alsabah, M., Hussain, A., and Al-Jumeily, D., 2024. Speech enhancement algorithm using deep learning and Hahn polynomials. In: Proceedings of the 17th International Conference on Developments in eSystems Engineering (DeSE), pp. 42–47. IEEE. https://doi.org/10.1109/DeSE63988.2024.10911938.

Awad, H.A., Hameed, S.M., Mahmmod, B.M., Abdulhussain, S.H., and Hussain, A.J., 2023. Dual stages of speech enhancement algorithm based on super Gaussian speech models. Journal of Engineering, University of Baghdad, 29(9), pp. 1–13. https://doi.org/10.31026/j.eng.2023.09.01.

Boll, S.F., 1979. Suppression of acoustic noise in speech using spectral subtraction. IEEE Transactions on Acoustics, Speech, and Signal Processing, 27(2), pp. 113–120. https://doi.org/10.1109/TASSP.1979.1163209.

Cappé, O., 1994. Elimination of the musical noise phenomenon with the Ephraim and Malah noise suppressor. IEEE Transactions on Speech and Audio Processing, 2(2), pp. 345–349. https://doi.org/10.1109/89.279283.

Cohen, I., 2002. Optimal speech enhancement under signal presence uncertainty using log-spectral amplitude estimator. IEEE Signal Processing Letters, 9(4), pp. 113–116. https://doi.org/10.1109/97.995823.

Elert, G., 2016. The nature of sound. The Physics Hypertextbook. physics.info, pp. 6–20.

Ephraim, Y. and Malah, D., 1984. Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator. IEEE Transactions on Acoustics, Speech, and Signal Processing, 32(6), pp. 1109–1121. https://doi.org/10.1109/TASSP.1984.1164453.

Ephraim, Y. and Van Trees, H.L., 1995. A signal subspace approach for speech enhancement. IEEE Transactions on Speech and Audio Processing, 3(4), pp. 251–266. https://doi.org/10.1109/89.397090.

Fu, S.W., Yu, C., Hsieh, T.A., Plantinga, P., Ravanelli, M., Lu, X. and Tsao, Y., 2021. MetricGAN+: An improved version of MetricGAN for speech enhancement. In: Proceedings of Interspeech 2021, pp. 201–205. https://doi.org/10.48550/arXiv.2104.03538.

Garofolo, J.S., Lamel, L.F., Fisher, W.M., Fiscus, J.G., Pallett, D.S., and Dahlgren, N.L., 1993. TIMIT Acoustic-Phonetic Continuous Speech Corpus. Linguistic Data Consortium, Philadelphia. https://doi.org/10.35111/17gk-bn40.

Harris, F.J., 1978. On the use of windows for harmonic analysis with the discrete Fourier transform. Proceedings of the IEEE, 66(1), pp. 51–83. https://doi.org/10.1109/PROC.1978.10837.

Hasan, T. and Hasan, M.K., 2010. MMSE estimator for speech enhancement considering the constructive and destructive interference of noise. IET Signal Processing, 4(1), pp. 1–11. https://doi.org/10.1049/iet-spr.2008.0114.

Hattaraki, S.M. and Kambalimath, S.G., 2024. Enhancing speech intelligibility in hearing aids using spectral subtraction. Advanced Engineering Science, 56(7), pp. 4793–4799.

Hu, Y., Liu, Y., Lv, S., Xing, M., Zhang, S., Fu, Y., Wu, J., Zhang, B. and Xie, L., 2020. DCCRN: Deep complex convolution recurrent network for phase-aware speech enhancement. In: Proceedings of Interspeech 2020, pp. 2472–2476. https://doi.org/10.21437/Interspeech.2020-2537.

Hu, Y. and Loizou, P.C., 2007. Evaluation of objective quality measures for speech enhancement. IEEE Transactions on Audio, Speech, and Language Processing, 16(1), pp. 229–238. https://doi.org/10.1109/TASL.2007.911054.

Huang, J. and Zhao, Y., 2000. A DCT-based fast signal subspace technique for robust speech recognition. IEEE Transactions on Speech and Audio Processing, 8(6), pp. 747–751. https://doi.org/10.1109/89.876314.

Jabloun, F. and Champagne, B., 2003. Incorporating the human hearing properties in the signal subspace approach for speech enhancement. IEEE Transactions on Speech and Audio Processing, 11(6), pp. 700–708. https://doi.org/10.1109/TSA.2003.819954.

Jerjees, S.A., Mohammed, H.J., Radeaf, H.S., Mahmmod, B.M., and Abdulhussain, S.H., 2023. Deep learning-based speech enhancement algorithm using Charlier transform. In: Proceedings of the 15th International Conference on Developments in eSystems Engineering (DeSE), pp. 100–105. IEEE. https://doi.org/10.1109/DeSE58274.2023.10099854.

Kolbæk, M., Tan, Z.H. and Jensen, J., 2016. Speech intelligibility potential of general and specialized deep neural network-based speech enhancement systems. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 25(1), pp. 153–167. https://doi.org/10.1109/TASLP.2016.2628641.

Loizou, P.C., 2013. Speech Enhancement: Theory and Practice. 2nd ed. Boca Raton: CRC Press.

Michelsanti, D., Tan, Z.H., Zhang, S.X., Xu, Y., Yu, M., Yu, D. and Jensen, J., 2021. An overview of deep-learning-based audio-visual speech enhancement and separation. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 29, pp. 1368–1396. https://doi.org/10.1109/TASLP.2021.3066303.

Nabi, W., Aloui, N. and Cherif, A., 2016. Speech enhancement in dual-microphone mobile phones using Kalman filter. Applied Acoustics, 109, pp. 1–4. https://doi.org/10.1016/j.apacoust.2016.02.009.

Nasir, R.J. and Abdulmohsin, H.A., 2025. A hybrid method for speech noise reduction using Log-MMSE. Iraqi Journal of Science, 66(2), pp. 860–875. https://doi.org/10.24996/ijs.2025.66.2.24.

Natarajan, S., Al-Haddad, S.A.R., Ahmad, F.A., Kamil, R., et al., 2025. Deep neural networks for speech enhancement and speech recognition: A systematic review. Ain Shams Engineering Journal, 16(7), Article 103405. https://doi.org/10.1016/j.asej.2025.103405.

Ochieng, P., 2023. Deep neural network techniques for monaural speech enhancement and separation: state of the art analysis. Artificial Intelligence Review, 56(Suppl 3), pp. 3651–3703. https://doi.org/10.48550/arXiv.2212.00369.

Plapous, C., Marro, C. and Scalart, P., 2006. Improved signal-to-noise ratio estimation for speech enhancement. IEEE Transactions on Audio, Speech, and Language Processing, 14(6), pp. 2098–2108. https://doi.org/10.1109/TASL.2006.872626.

Rix, A.W., Beerends, J.G., Hollier, M.P. and Hekstra, A.P., 2001. Perceptual evaluation of speech quality (PESQ): A new method for speech quality assessment of telephone networks and codecs. In: Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2, pp. 749–752. https://doi.org/10.1109/ICASSP.2001.941023.

Roy, S.K. and Paliwal, K.K., 2021. Robustness and sensitivity tuning of the Kalman filter for single-channel speech enhancement in real-life noise conditions. Signals, 2(3), Article 27. https://doi.org/10.3390/signals2030027.

Scalart, P. and Vieira Filho, J., 1996. Speech enhancement based on a priori signal-to-noise ratio estimation. In: Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2, pp. 629–632. https://doi.org/10.1109/ICASSP.1996.543199.

Scalart, P., Vieira Filho, J. and Chiquito, J.G., 1996. On speech enhancement algorithms based on MMSE estimation. In: 1996 8th European Signal Processing Conference (EUSIPCO 1996), pp. 1–4. IEEE. https://doi.org/10.5281/zenodo.36358.

Shi, S., Paliwal, K. and Busch, A., 2023. On DCT-based MMSE estimation of short-time spectral amplitude for single-channel speech enhancement. Applied Acoustics, 202, Article 109134. https://doi.org/10.1016/j.apacoust.2022.109134.

Soon, Y., Koh, S.N. and Yeo, C.K., 1998. Noisy speech enhancement using discrete cosine transform. Speech Communication, 24(3), pp. 249–257. https://doi.org/10.1016/S0167-6393(98)00019-3.

Stylianou, Y., 2001. Removing noise from speech using spectral subtraction and harmonicity-based masking. Speech Communication, 34(3), pp. 271–288. https://doi.org/10.1016/S0167-6393(00)00052-3.

Taal, C.H., Hendriks, R.C., Heusdens, R. and Jensen, J., 2011. An algorithm for intelligibility prediction of time–frequency weighted noisy speech. IEEE Transactions on Audio, Speech, and Language Processing, 19(7), pp. 2125–2136. https://doi.org/10.1109/TASL.2011.2114881.

Verteletskaya, E. and Simak, B., 2010. Noise reduction based on modified spectral subtraction method. In: Proceedings of the 17th International Conference on Systems, Signals and Image Processing (IWSSIP), pp. 233–236.

Vetter, R., 2001. Single-channel speech enhancement using MDL-based subspace approach in Bark domain. In: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP’01), 1, pp. 641–644. https://doi.org/10.1109/ICASSP.2001.940913.

Yousif, S.T. and Mahmmod, B.M., 2025. Speech enhancement algorithms: A systematic literature review. Algorithms, 18(5), Article 272. https://doi.org/10.3390/a18050272.

Zwicker, E. and Fastl, H., 1990. Psychoacoustics. Berlin: Springer-Verlag. https://doi.org/10.1007/978-3-540-68888-4.

Zwicker, E. and Fastl, H., 1999. Psychoacoustics: Facts and Models. 2nd ed. Berlin: Springer. https://doi.org/10.1007/978-3-662-03976-6.