Combining DINO with Grounded Pre-Training Can Improve Performance in Open-Set Object Detection
Chinese researchers report that combining DINO with grounded pre-training can improve performance in open-set object detection.
Grounding DINO is an open-set object detector that uses language to detect arbitrary objects from human inputs such as category names or referring expressions. The model builds upon DINO, a transformer-based detector, and incorporates multi-level text information through grounded pre-training. To fuse the two modalities effectively, the authors introduce a tight fusion solution comprising a feature enhancer, language-guided query selection, and a cross-modality decoder.
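
To make the idea of language-guided query selection more concrete, here is a minimal sketch of one plausible reading: score each image token by its highest similarity to any text token, then keep the top-scoring tokens as decoder queries. The function name `select_queries`, the toy feature vectors, and the use of a plain dot product are illustrative assumptions, not the authors' implementation.

```python
def select_queries(image_feats, text_feats, num_queries):
    """Return indices of the image tokens most similar to any text token.

    Hypothetical helper for illustration only; real models operate on
    batched tensors with learned features.
    """
    scores = []
    for img_vec in image_feats:
        # Dot product of this image token with every text token.
        sims = [sum(a * b for a, b in zip(img_vec, txt_vec))
                for txt_vec in text_feats]
        # Keep the best match across all text tokens.
        scores.append(max(sims))
    # Rank image tokens by score, highest first, and keep the top ones.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:num_queries]


# Toy data (assumed): four image tokens and two text tokens, 2-D features.
image_feats = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-1.0, 0.0]]
text_feats = [[1.0, 0.0], [0.9, 0.1]]

selected = select_queries(image_feats, text_feats, num_queries=2)
print(selected)
```

In this toy setup the image tokens that align best with the text are chosen as queries, so the decoder would start from regions the prompt is most likely to describe.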