Advances in avian acoustic recognition through artificial intelligence: a systematic review of techniques and environmental applications
DOI: https://doi.org/10.5327/Z2176-94782514
Keywords: convolutional neural networks; birdsong; machine learning.
Abstract
The accelerating loss of biodiversity has driven the adoption of automated technologies for environmental monitoring, including the acoustic recognition of birds using artificial intelligence. This study aimed to systematically review the primary methods employed in the automatic recognition of bird vocalizations, with an emphasis on the evolution of techniques and their environmental applications. The integrative literature review covered publications from 2013 to 2025, with searches conducted in the Scopus, Web of Science, IEEE Xplore, and Google Scholar databases using terms related to machine learning and bioacoustics. After screening 2,435 publications, 25 studies were selected for in-depth analysis. The findings indicate a methodological shift from approaches based on Mel-Frequency Cepstral Coefficients (MFCC) to Convolutional Neural Networks (CNN), accompanied by improvements in classification accuracy and noise robustness. The review concludes that neural networks are becoming increasingly effective tools for biodiversity conservation, although challenges remain regarding model generalization and computational cost.
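To make the two feature families named in the abstract concrete, the following is a minimal sketch, not the implementation of any reviewed study. It computes a log-mel spectrogram (the kind of time-frequency input commonly fed to a CNN) and then derives MFCC-style coefficients from one frame by applying a DCT-II. All function names, the 8 kHz sample rate, the frame and hop lengths, the 20 filters, the 13 coefficients, and the synthetic test tones are illustrative assumptions.

```python
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def power_spectrum(frame):
    """Power spectrum (bins 0..N/2) of one Hann-windowed frame via a naive DFT."""
    n = len(frame)
    win = [x * (0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)))
           for i, x in enumerate(frame)]
    spec = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2.0 * math.pi * k * i / n) for i, x in enumerate(win))
        im = sum(x * math.sin(2.0 * math.pi * k * i / n) for i, x in enumerate(win))
        spec.append((re * re + im * im) / n)
    return spec

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose centre frequencies are evenly spaced in mel."""
    mel_max = hz_to_mel(sample_rate / 2.0)
    edges = [mel_to_hz(mel_max * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for f in bin_hz:
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))   # rising slope
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def log_mel_spectrogram(signal, sample_rate, frame_len=256, hop=128, n_filters=20):
    """List of frames, each a list of log mel-band energies (time x mel)."""
    bank = mel_filterbank(n_filters, frame_len, sample_rate)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        spec = power_spectrum(signal[start:start + frame_len])
        frames.append([math.log(sum(w * p for w, p in zip(row, spec)) + 1e-10)
                       for row in bank])
    return frames

def mfcc(log_mel_frame, n_coeffs=13):
    """Unnormalised DCT-II of one log-mel frame, keeping the first n_coeffs."""
    m = len(log_mel_frame)
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / m)
                for j, e in enumerate(log_mel_frame))
            for c in range(n_coeffs)]

def tone(freq, sample_rate, n):
    """Synthetic sine tone standing in for a recording (assumption)."""
    return [math.sin(2.0 * math.pi * freq * i / sample_rate) for i in range(n)]

SAMPLE_RATE = 8000
low = log_mel_spectrogram(tone(500.0, SAMPLE_RATE, 512), SAMPLE_RATE)
high = log_mel_spectrogram(tone(3000.0, SAMPLE_RATE, 512), SAMPLE_RATE)
low_mfcc = mfcc(low[0])
print(len(low), len(low[0]), len(low_mfcc))
```

In an MFCC-based pipeline, each frame would typically be reduced to a short coefficient vector like `low_mfcc` and passed to a classical classifier. In a CNN-based pipeline, the full time-by-mel grid (`low` or `high` here) would be treated as an image-like input. In practice a library routine with an FFT would replace the naive DFT used in this sketch.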
License
Copyright (c) 2025 Revista Brasileira de Ciências Ambientais

This work is licensed under a Creative Commons Attribution 4.0 International License.















