

Genetic Algorithm-Optimized B-Spline Feature Extraction for Accurate Concentration Prediction by Online Raman Spectroscopy: a Comparative Analysis of the Efficacy of Sparse Training Data
Abstract
Raman spectroscopy combined with machine learning is a promising approach for quantitative substance analysis. Online Raman spectrometers, however, are constrained by their sampling conditions: surface-enhanced Raman scattering (SERS) techniques cannot be used, which hinders high-precision prediction for low-concentration analytes. This paper introduces a framework that integrates B-spline fitting for feature extraction with a least-squares concentration prediction model, with hyperparameters optimized by a genetic algorithm (GA). The framework was evaluated against four alternative GA-optimized prediction models: wavelet transform feature extraction with ridge regression, linear regression neural networks, standalone ridge regression, and polynomial fitting using least squares. Experimental validation used Raman spectral datasets obtained from boric acid and nitric acid solutions at 11 concentration levels evenly spaced over the range 0–500 mg/L. A stratified data partitioning approach assigned six concentration levels to the test set and used the remaining five to create three separate training subsets (3, 4, and 5 concentration levels). The comparison showed that the B-spline least-squares model achieved its best prediction accuracy when trained on four concentration levels, with a mean root-mean-square error (RMSE) of 5.83 mg/L for both analytes. The wavelet-ridge regression model (5-level training subset, RMSE = 6.02 mg/L) was the second-best method. Linear regression neural networks, ridge regression, and polynomial least-squares models each performed best with five training concentrations, yielding mean RMSE values of 7.35, 9.17, and 12.21 mg/L, respectively.
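The core idea of the abstract, B-spline fitting as feature extraction followed by a least-squares concentration model, can be illustrated with a minimal sketch. This is not the authors' implementation. The spectra are synthetic (a single Gaussian band whose height scales with concentration), the function names, the Raman-shift axis, the number of replicates, and the train/test split are all hypothetical, and the hyperparameters `n_basis` and `degree` are fixed by hand here, whereas the paper tunes its hyperparameters with a GA. The only detail taken from the abstract is the grid of 11 evenly spaced levels over 0–500 mg/L, five for training and six for testing; which levels went where is not stated, so an alternating split is assumed.

```python
import numpy as np

def bspline_design_matrix(x, n_basis, degree=3):
    """Clamped uniform B-spline basis evaluated at x (Cox-de Boor recursion)."""
    lo, hi = float(x.min()), float(x.max())
    n_inner = n_basis - degree - 1                      # interior knots
    inner = np.linspace(lo, hi, n_inner + 2)[1:-1]
    knots = np.concatenate([[lo] * (degree + 1), inner, [hi] * (degree + 1)])
    # Degree 0: indicator function of each half-open knot span.
    B = np.zeros((x.size, knots.size - 1))
    for i in range(knots.size - 1):
        B[:, i] = (x >= knots[i]) & (x < knots[i + 1])
    B[x == hi, n_basis - 1] = 1.0                       # include the right endpoint
    # Raise the degree one step at a time.
    for k in range(1, degree + 1):
        B_new = np.zeros((x.size, knots.size - 1 - k))
        for i in range(knots.size - 1 - k):
            d1 = knots[i + k] - knots[i]
            d2 = knots[i + k + 1] - knots[i + 1]
            if d1 > 0:
                B_new[:, i] += (x - knots[i]) / d1 * B[:, i]
            if d2 > 0:
                B_new[:, i] += (knots[i + k + 1] - x) / d2 * B[:, i + 1]
        B = B_new
    return B                                            # shape (len(x), n_basis)

def extract_features(spectra, basis):
    """Least-squares B-spline coefficients, one row of features per spectrum."""
    coef, *_ = np.linalg.lstsq(basis, spectra.T, rcond=None)
    return coef.T

def fit_linear(features, conc):
    """Ordinary least squares with an intercept: conc ~ w0 + features @ w."""
    X = np.column_stack([np.ones(len(features)), features])
    w, *_ = np.linalg.lstsq(X, conc, rcond=None)
    return w

def predict(w, features):
    return w[0] + features @ w[1:]

# --- Synthetic data (assumption: one Gaussian band, height proportional to conc) ---
rng = np.random.default_rng(0)
shift = np.linspace(600.0, 1200.0, 400)                 # hypothetical axis, cm^-1
peak = np.exp(-0.5 * ((shift - 880.0) / 15.0) ** 2)

def simulate(levels, n_rep=10):
    """n_rep noisy replicate spectra per concentration level (mg/L)."""
    conc = np.repeat(np.asarray(levels, dtype=float), n_rep)
    noise = rng.normal(0.0, 0.02, (conc.size, shift.size))
    return 0.01 * conc[:, None] * peak + noise, conc

train_levels = [50, 150, 250, 350, 450]                 # hypothetical split
test_levels = [0, 100, 200, 300, 400, 500]
S_train, y_train = simulate(train_levels)
S_test, y_test = simulate(test_levels)

basis = bspline_design_matrix(shift, n_basis=12, degree=3)
w = fit_linear(extract_features(S_train, basis), y_train)
y_pred = predict(w, extract_features(S_test, basis))
rmse = float(np.sqrt(np.mean((y_pred - y_test) ** 2)))
print(f"test RMSE on synthetic data: {rmse:.2f} mg/L")
```

In this sketch each spectrum of 400 points is compressed to 12 spline coefficients, so the regression has far fewer unknowns than raw intensities. A GA, as used in the paper, would search over settings such as `n_basis` and `degree` instead of fixing them. The RMSE printed here reflects only the synthetic data and says nothing about the values reported in the abstract.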
About the Authors
Shu Wang
China
Zhejiang
Peng-Fan Xiong
China
Zhejiang
Bo Xu
China
Zhejiang
Yan-Long Meng
China
Zhejiang
Chun-Lian Zhan
China
Zhejiang
Zheng-Ye Zhou
China
Zhejiang
For citations:
Wang Sh., Xiong P., Xu B., Meng Ya., Zhan Ch., Zhou Zh. Genetic Algorithm-Optimized B-Spline Feature Extraction for Accurate Concentration Prediction by Online Raman Spectroscopy: a Comparative Analysis of the Efficacy of Sparse Training Data. Zhurnal Prikladnoii Spektroskopii. 2025;92(5):708.