Full text available in English in Adobe Acrobat format: https://www.food.actapol.net/volume24/issue4/4_4_2025.pdf

Computer vision has become a cornerstone technology in the food and agriculture industries, driving innovation and enabling automation across a wide range of processes. Within this field, object detection plays a critical role, supporting efficiency, accuracy, and scalability in real-world applications. The Transformer, first introduced in natural language processing, demonstrated outstanding performance thanks to its powerful self-attention mechanism and parallel processing capabilities. More recently, it has been rapidly adopted in object detection and is emerging as a strong alternative to traditional convolutional neural networks. However, much of the related research remains scattered and interdisciplinary. This paper systematically reviews the development of transformer-based models for computer vision, analysing research trends, key topics, and distinctions from other algorithms. It introduces the basic architecture of the Vision Transformer (ViT) and other transformer-based vision models, explains core principles such as self-attention and multi-stage processing, and examines applications in food and agriculture, including food quality analysis, crop monitoring, pest and disease detection, and weed identification. Challenges and future directions of transformer-based models are also discussed, alongside a review of the latest research for reference. By consolidating a large body of literature, this study provides a comprehensive overview of the structure, development, advantages, and limitations of transformer-based vision models, while highlighting their potential to deliver more intelligent, sustainable, and efficient decision-support systems for precision food and farming practices.
| MLA | Lin, Maolan, et al. "A systematic review of transformer-based vision models for object detection in food and agriculture." Acta Sci.Pol. Technol. Aliment. 24.4 (2025): 489-511. https://doi.org/10.17306/J.AFS.001425 |
| APA | Lin M., Gao Z., Liao W., Cai H. (2025). A systematic review of transformer-based vision models for object detection in food and agriculture. Acta Sci.Pol. Technol. Aliment. 24 (4), 489-511. https://doi.org/10.17306/J.AFS.001425 |
| ISO 690 | LIN, Maolan, et al. A systematic review of transformer-based vision models for object detection in food and agriculture. Acta Sci.Pol. Technol. Aliment., 2025, 24.4: 489-511. https://doi.org/10.17306/J.AFS.001425 |