Yan Liu
Paper download is intended for registered attendees only, and is
subjected to the IEEE Copyright Policy. Any other use is strongly forbidden.
Papers from this author
Position-Aware Safe Boundary Interpolation Oversampling
Auto-TLDR; PABIO: Position-Aware Safe Boundary Interpolation-Based Oversampling for Imbalanced Data
Abstract Slides Poster Similar
The class imbalance problem is characterized by the unequal distribution of different class samples, usually resulting in a learning bias toward the majority class. In the past decades, kinds of techniques have been proposed to alleviate this problem. Among those approaches, one promising method, interpolation- based oversampling, proposes to generate synthetic minority samples based on selected reference data, which can effectively solve the skewed distribution of data samples. However, there are several unsolved issues in interpolation-based oversampling. Existing methods often suffer from noisy synthetic samples due to improper data clusterings and unsatisfactory reference selection. In this paper, we propose the position-aware safe boundary interpolation oversampling algorithm (PABIO) to address such issues. We firstly introduce a combined clustering algorithm for minority samples to overcome the shortage of clustering using only distance-based or density-based. Then a position- aware interpolation-based oversampling algorithm is proposed for different minority clusters. Especially, we develop a novel method to leverage the majority class information to learn a safe boundary for generating synthetic points. The proposed PABIO is evaluated on multiple imbalanced data sets classified by two base classifiers: support vector machine (SVM) and C4.5 decision tree classifier. Experimental results show that our proposed PABIO outperforms other baselines among benchmark data sets.