Modular Retrieval- and Active Learning-Guided Pipeline for High-Quality Pollen Microscopy Training Set Generation

Biricz András, ELTE (2025.04.01 - 08.31)

Abstract: In this study, we developed a modular and scalable pipeline for the automated generation of high-quality object detection datasets from large-scale Whole Slide Images (WSIs), with a particular focus on pollen microscopy. The framework uses recent open-vocabulary vision-language models (e.g., OWL-ViT) for one-shot object detection. The resulting candidate detections are then refined using vision transformer embeddings (DINOv2) and distribution-aware clustering techniques.
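To illustrate the embedding-based refinement step, the following is a minimal sketch rather than the project's actual implementation. It assumes each candidate crop already has an embedding vector (for example from DINOv2) and drops crops whose cosine similarity to the group centroid is unusually low relative to the rest of the group. The function names, the median/MAD rule, and the default `k=3.0` are hypothetical choices made for this example; the abstract does not specify which distribution-aware criterion was used.

```python
import math
from statistics import median


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors):
    """Component-wise mean of a list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def filter_outliers(embeddings, k=3.0):
    """Return indices of embeddings that are not low-similarity outliers.

    Assumption for this sketch: a crop is kept when its similarity to the
    centroid is at least median - k * MAD, where MAD is the median absolute
    deviation of all similarities. The threshold therefore adapts to the
    spread of the data instead of being a fixed constant.
    """
    center = centroid(embeddings)
    sims = [cosine(e, center) for e in embeddings]
    med = median(sims)
    mad = median(abs(s - med) for s in sims)
    threshold = med - k * mad
    return [i for i, s in enumerate(sims) if s >= threshold]


# Toy 2-D "embeddings": four similar crops and one clearly different crop.
toy_embeddings = [
    [1.0, 0.0],
    [0.9, 0.1],
    [0.95, 0.05],
    [1.0, 0.05],
    [0.0, 1.0],  # likely a false positive
]
kept = filter_outliers(toy_embeddings)
```

With real embeddings the same idea would typically be applied per cluster, and a small lower bound on the MAD may be needed so that a very tight cluster does not reject nearly identical crops.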

To ensure reliable training data, we adopted an active learning paradigm in which only clean, high-confidence detections were used to fine-tune specialized Faster R-CNN models with ViT backbones. These models were then deployed in a refinement loop to recover additional detections and improve coverage. The pipeline is generalizable to other domains, such as medical histology or cellular imaging.
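The bookkeeping behind such a refinement loop can be sketched as follows. This is an illustrative outline under stated assumptions, not the authors' code: detections are represented as dictionaries with a `box` in `(x1, y1, x2, y2)` format and a `score`, the thresholds `min_score=0.9` and `max_iou=0.5` are hypothetical, and the actual fine-tuning of the Faster R-CNN model is left out. The sketch only shows how confident detections might be selected and how newly recovered detections might be merged without duplicating existing ones.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def select_confident(detections, min_score=0.9):
    """Keep only detections whose score reaches the (assumed) threshold."""
    return [d for d in detections if d["score"] >= min_score]


def merge_new(existing, candidates, max_iou=0.5):
    """Add candidates that do not substantially overlap any kept detection."""
    merged = list(existing)
    for cand in candidates:
        if all(iou(cand["box"], kept["box"]) <= max_iou for kept in merged):
            merged.append(cand)
    return merged


# Round 0: detections from the one-shot detector (toy values).
seed = [
    {"box": (0, 0, 10, 10), "score": 0.95},
    {"box": (20, 20, 30, 30), "score": 0.40},  # too uncertain to train on
]
training_set = select_confident(seed)

# Round 1: output of a model fine-tuned on the clean set (toy values).
refined = [
    {"box": (1, 1, 10, 10), "score": 0.97},    # duplicate of a known object
    {"box": (20, 20, 30, 30), "score": 0.92},  # newly recovered object
]
training_set = merge_new(training_set, select_confident(refined))
```

In this toy run the uncertain seed detection is excluded from the first training set, and the second round recovers it once the fine-tuned model assigns it a high score, while the near-duplicate box is discarded.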

High-performance GPU resources were essential for processing high-resolution WSIs, computing embedding representations for hundreds of thousands of image crops, and training large-scale vision transformer models.
