Daniel Wainstock
Paper download is intended for registered attendees only, and is
subjected to the IEEE Copyright Policy. Any other use is strongly forbidden.
Papers from this author
On Identification and Retrieval of Near-Duplicate Biological Images: A New Dataset and Protocol
Thomas E. Koker, Sai Spandana Chintapalli, San Wang, Blake A. Talbot, Daniel Wainstock, Marcelo Cicconet, Mary C. Walsh
Auto-TLDR; BINDER: Bio-Image Near-Duplicate Examples Repository for Image Identification and Retrieval
Manipulation and re-use of images in scientific publications is a growing issue, not only for biomedical publishers, but also for the research community in general. In this work we introduce BINDER -- Bio-Image Near-Duplicate Examples Repository, a novel dataset to help researchers develop, train, and test models to detect same-source biomedical images. BINDER contains 7,490 unique image patches for model training, 1,821 same-size patch duplicates for validation and testing, and 868 different-size image/patch pairs for image retrieval validation and testing. Except for the training set, patches already contain manipulations including rotation, translation, scale, perspective transform, contrast adjustment and/or compression artifacts. We further use the dataset to demonstrate how novel adaptations of existing image retrieval and metric learning models can be applied to achieve high-accuracy inference results, creating a baseline for future work. In aggregate, we thus present a supervised protocol for near-duplicate image identification and retrieval without any "real-world" training example. Our dataset and source code are available at hms-idac.github.io/BINDER.