HLLE Algorithm Based on the Weighted Distance

Transactions on Computer Science and Technology June 2013, Volume 2, Issue 2, PP.17-23

Shuaibin Lian 1#, Qiuli Kong 2, Xianhua Dai 1

1. College of Information Science and Technology, Sun Yat-sen University, Guangzhou 510006, China
2. College of Mathematical Sciences, Guangxi Normal University, Guangxi 541004, China

#Email: shuai_lian@qq.com

Abstract: HLLE is an effective nonlinear dimension reduction algorithm that is widely applied in machine learning, pattern recognition, data mining, and related fields. However, HLLE is very sensitive to neighborhood selection and to non-uniform data sampling. This paper proposes WHLLE, an improved HLLE based on weighted distance, which avoids unreasonable neighborhood selection by using a weighted Euclidean distance. Besides reducing dimension more effectively, WHLLE preserves the intrinsic geometric structure of the original manifold. We validate the performance of WHLLE on two classical artificial manifolds. The experiments confirm that WHLLE preserves the neighborhood relationships of the data points, the global distribution, and the intrinsic structure of the data better than other related algorithms.

Keywords: Machine Learning; Dimension Reduction; Hessian Locally Linear Embedding (HLLE) Algorithm; Weighted Distance

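HLLE Algorithm Based on the Weighted Distance*

Shuaibin Lian 1, Qiuli Kong 2, Xianhua Dai 1

1. College of Information Science and Technology, Sun Yat-sen University, Guangzhou, Guangdong 510006
2. College of Mathematical Sciences, Guangxi Normal University, Guilin, Guangxi 541004

Abstract: Hessian Locally Linear Embedding (HLLE) is a very effective nonlinear dimension reduction method and is widely applied in machine learning, pattern recognition, data mining, and other fields. However, HLLE is very sensitive to neighborhood selection and to non-uniform data sampling. This paper proposes an HLLE algorithm based on weighted distance (WHLLE). It selects neighborhoods with a weighted distance, which avoids the unreasonable neighborhoods produced by the Euclidean distance, and it preserves the overall intrinsic geometric structure of the original manifold while reducing dimension. We validate WHLLE on two classical artificial manifolds. The experimental results show that, in addition to reducing dimension well, WHLLE keeps the global distribution and the intrinsic geometric structure of the data unchanged.

Keywords: Machine Learning; Dimension Reduction; Hessian Locally Linear Embedding Algorithm; Weighted Distance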
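The abstract attributes the improvement to choosing each point's neighborhood with a weighted Euclidean distance rather than the plain Euclidean distance. The weighting scheme itself is not stated in this part of the paper, so the sketch below only assumes a generic per-coordinate form, d_w(x, y) = sqrt(sum_k w_k (x_k - y_k)^2). The function names, the weights, and the toy points are hypothetical and are not taken from the authors' method.

```python
import math

def weighted_euclidean(x, y, w):
    """Generic weighted Euclidean distance sqrt(sum_k w_k * (x_k - y_k)^2).

    Assumption: per-coordinate weights w_k >= 0. The paper's actual
    weighting may differ from this form.
    """
    return math.sqrt(sum(wk * (xk - yk) ** 2 for xk, yk, wk in zip(x, y, w)))

def k_nearest_neighbors(points, i, k, w):
    """Indices of the k points closest to points[i] under the weighted distance."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: weighted_euclidean(points[i], points[j], w))
    return others[:k]

# Toy data: hypothetical 2-D samples, not taken from the paper.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 3.0), (4.0, 0.0)]

# Equal weights reproduce ordinary Euclidean neighbor selection.
uniform = k_nearest_neighbors(points, 0, 2, w=(1.0, 1.0))

# Down-weighting the second coordinate changes which neighbor ranks first.
reweighted = k_nearest_neighbors(points, 0, 2, w=(1.0, 0.01))
```

With equal weights the neighbors of the first point are ranked by ordinary distance. With the second coordinate down-weighted, the point at (0, 3) moves ahead of the point at (1, 0). This illustrates how a change of metric alone can alter the neighborhoods on which a local method such as HLLE is built.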

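1 Introduction

High-dimensional data, such as face images and speech spectrograms, arise frequently in fields such as machine learning, pattern recognition, and data mining. Dimension reduction is a very effective way to handle such data. Classical dimension reduction methods include principal component analysis (PCA) [1], multidimensional scaling (MDS) [2], isometric mapping (ISOMAP) [3], locally linear embedding (LLE) [4], Laplacian eigenmaps (LE) [5], and Hessian locally linear embedding (HLLE) [6]. Among them, HLLE is regarded as a very effective nonlinear dimension reduction method: it recovers the low-dimensional structure of high-dimensional data well while largely preserving the neighborhood relationships of the high-dimensional data points, and it has therefore been widely applied in many fields [7][8]. However, HLLE is particularly sensitive to neighborhood selection, and it performs poorly on non-uniformly distributed data manifolds. Over the years, researchers have, regarding the HLLE algorithm, …

* Funding: National Natural Science Foundation of China (Grant No. G61174163). http://www.ivypub.org/cst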
