Abstract
Multi-label feature selection improves the performance of learning models by removing irrelevant features. However, most existing methods assume that the labels in the training set take only simple logical values and that all relevant labels describe an instance equally well. In practical applications, different labels may influence an instance to different degrees. To address this, a feature selection method based on fuzzy neighborhood information entropy and a mutual discrimination index is proposed. First, label enhancement is used to transform the original multi-label dataset into a label distribution dataset. Next, neighborhood information entropy is used to quantify the similarity between samples in the label space. Finally, the fuzzy neighborhood mutual discrimination index combines the feature space with the label space to identify a feature subset with strong discriminative ability. Experiments on six datasets show that the proposed algorithm achieves better classification performance than the compared algorithms.
Key words
feature selection /
fuzzy neighborhood /
multi-label learning
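The last two steps of the abstract (measuring sample similarity in the label space, then combining it with the feature space) can be illustrated with a small sketch. This is not the paper's algorithm. It is a simplified, crisp (non-fuzzy) stand-in that uses the neighborhood discrimination index of Wang et al. (2017), log(n² / |R|), where R is the set of sample pairs lying within a radius of each other. The function names, the radii `delta_x` and `delta_d`, and the per-feature ranking are illustrative assumptions, and the label distribution matrix `D` is assumed to have been produced already by some label enhancement step.

```python
import numpy as np

def neighborhood_relation(X, delta):
    """Boolean n x n matrix: True where two samples lie within Euclidean distance delta."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]  # treat a single feature as an n x 1 matrix
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    return dist <= delta

def discrimination_index(R):
    """log(n^2 / |R|): zero when every pair is related, larger when samples are separated."""
    n = R.shape[0]
    return np.log(n * n / R.sum())

def mutual_discrimination_index(R_a, R_b):
    """H(a) + H(b) - H(a, b), with the joint relation taken as the intersection."""
    joint = discrimination_index(R_a & R_b)
    return discrimination_index(R_a) + discrimination_index(R_b) - joint

def rank_features(X, D, delta_x=0.2, delta_d=0.2):
    """Score each feature against the label distribution D (hypothetical helper).

    Assumption: features are scored one at a time; a real method would
    typically search feature subsets and account for redundancy.
    """
    X = np.asarray(X, dtype=float)
    R_label = neighborhood_relation(D, delta_d)
    scores = np.array([
        mutual_discrimination_index(neighborhood_relation(X[:, j], delta_x), R_label)
        for j in range(X.shape[1])
    ])
    order = np.argsort(-scores)  # highest score first
    return order, scores

# Toy data: feature 0 separates the two label-distribution groups, feature 1 is constant.
X_demo = np.array([[0.0, 0.5], [0.0, 0.5], [1.0, 0.5], [1.0, 0.5]])
D_demo = np.array([[0.9, 0.1], [0.9, 0.1], [0.1, 0.9], [0.1, 0.9]])
order_demo, scores_demo = rank_features(X_demo, D_demo)
```

On this toy data the constant feature relates every pair of samples, so it carries no discriminative information about the label space and receives a score of zero, while the feature that mirrors the two label groups is ranked first.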