An Optimized and Robust Machine Learning Framework for Early Parkinson's Disease Prediction Using Speech Signals
Keywords:
XGBoost, Tree-structured Parzen Estimator, Data Augmentation, SMOTE, Decision Support System
Abstract
Despite rapid technological progress, early prediction of Parkinson's disease (PD) with non-invasive, low-cost methods such as machine learning (ML) analysis of speech remains challenging, and existing tools do not yet inspire enough confidence for healthcare providers to rely on them in daily practice. This study therefore presents an optimized early PD prediction tool and investigates its stability and robustness through a rigorous evaluation procedure. For early PD prediction from speech signal data, the hyperparameters of an eXtreme Gradient Boosting (XGB) model are optimized with the Tree-structured Parzen Estimator (TPE) method, and the Synthetic Minority Oversampling Technique (SMOTE) is used to address class imbalance in the dataset. The model's performance was rigorously evaluated to ensure reliability and to earn the trust of clinicians for real-world operational use. To validate its trustworthiness and predictive capability, the model was evaluated over 10 different runs of Stratified 10-Fold Cross Validation (SCV). The average results, with accuracy of 96.76%, precision of 97.70%, recall of 95.91%, F1-score of 96.70%, and ROC-AUC of 98.72%, compare favorably with similar works. Performance and stability were also assessed under a range of conditions, and the results indicate that the proposed model is stable and robust enough to serve as a practical tool in routine medical care. The tool can be offered through a website as a decision support system that detects PD early from a patient's voice signal, remotely, at low cost, and in a non-invasive way.
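The evaluation protocol described above (10 runs of stratified 10-fold cross-validation with SMOTE for class balancing) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say where in the pipeline SMOTE is applied, so the sketch assumes the common practice of oversampling only the training portion of each fold; the toy two-feature data, the function names, and the neighbour count of 5 are all hypothetical. The XGB model and the TPE search are omitted; a classifier would be fitted on each returned training set and scored on the matching held-out fold.

```python
import random
from collections import Counter, defaultdict


def stratified_folds(labels, k, rng):
    """Split sample indices into k folds that preserve class proportions."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        # Deal indices round-robin; carrying the offset across classes
        # keeps the overall fold sizes as equal as possible.
        for i, idx in enumerate(members):
            folds[(offset + i) % k].append(idx)
        offset += len(members)
    return folds


def smote(minority, n_new, k_neighbors, rng):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        nearest = sorted(others, key=lambda p: sq_dist(base, p))[:k_neighbors]
        neighbour = rng.choice(nearest)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def repeated_stratified_smote_splits(X, y, n_repeats=10, k=10,
                                     k_neighbors=5, seed=0):
    """Yield (X_train, y_train, test_idx) for every fold of every repeat.

    Assumption: SMOTE is applied to the training part only, so the held-out
    fold contains real recordings exclusively. Binary labels are assumed.
    """
    rng = random.Random(seed)
    for _ in range(n_repeats):
        for test_idx in stratified_folds(y, k, rng):
            held_out = set(test_idx)
            train_idx = [i for i in range(len(y)) if i not in held_out]
            X_train = [X[i] for i in train_idx]
            y_train = [y[i] for i in train_idx]
            counts = Counter(y_train)
            minority_label = min(counts, key=counts.get)
            deficit = max(counts.values()) - counts[minority_label]
            minority = [x for x, lab in zip(X_train, y_train)
                        if lab == minority_label]
            new_points = smote(minority, deficit, k_neighbors, rng)
            yield (X_train + new_points,
                   y_train + [minority_label] * deficit,
                   test_idx)


# Hypothetical toy data: 40 majority (label 1) and 20 minority (label 0) samples.
data_rng = random.Random(42)
X = ([[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(40)]
     + [[data_rng.gauss(2, 1), data_rng.gauss(2, 1)] for _ in range(20)])
y = [1] * 40 + [0] * 20

splits = list(repeated_stratified_smote_splits(X, y))
print(len(splits), "train/test splits generated")
```

Averaging a metric over all 100 splits, and also inspecting its spread across the 10 repeats, gives one way to check the kind of stability the abstract reports.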
License
Copyright (c) 2026 Ghadeer Aqil Ali, Leila Sharifi (Author); Parviz Rashidi-Khazaee; Hossein Nahid-Titkanlue (Author)

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

