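Abstract
The support vector machine (SVM) is a recently proposed technique for pattern recognition that is grounded in statistical learning theory. Its strong learning performance has made it a focus of research in the international machine learning community. Classical statistics assumes that sufficiently many samples are available, and it struggles to give good results when the number of samples is limited. In practice samples are often limited, so some learning methods that are excellent in theory fall short of expectations in real applications. The SVM handles this situation comparatively well.
This thesis describes the basic principles of SVM classification, the design approach of the project, and the methods used to implement it, and closes with an analysis of how the choice of kernel function and the sample size affect performance. The design is as follows. Kernel functions such as the linear kernel (Linear) and the radial basis function (RBF) are chosen to serve as the inner product, and an SVM decomposition algorithm is used to solve the quadratic programming problem and thereby build an SVM classifier. The main feature of the experimental system is a graphical interface through which the user can conveniently carry out basic operations: loading an existing data set or creating a new one, and choosing among kernel functions to train the SVM classifier. The system classifies the data correctly and displays the classification results in a fairly intuitive way.
Keywords: support vector machine, pattern classification, training algorithm, quadratic programming problem, kernel function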
The Experimental System about the Method of Pattern Classification with SVM Based on Matlab
Author: Ning Shu
Tutor: Zhong Qingliu
Abstract
Support Vector Machine (SVM), based on statistical learning theory, has recently been introduced as a new technique for solving pattern recognition problems. It has become a hot topic in machine learning because of its excellent learning performance. Conventional statistical learning theory provides conclusions only for the case where the sample size tends to infinity, so methods built on it may not work well in practical cases with limited samples. In real problems the samples are often finite, which means that some theoretically excellent learning methods fail to reach the expected results in practice. SVM handles these problems better.
This paper expounds the basic principle of pattern classification with SVM, the design approach of the project, and the related implementation methods, and finally analyses the influence of the choice of kernel and of the sample size on performance. The design scheme is as follows. First, the linear kernel and the radial basis function (RBF) kernel are chosen as the inner product operation, and a decomposition algorithm is used to solve the quadratic programming problem and create an SVM classifier. The main feature of the experimental system is that the user can conveniently perform basic operations through a graphical interface, namely loading existing data sets or creating new ones and choosing different kernel functions to train the SVM classifier. The classifier can classify the given data correctly, and the system shows the classification results and a comparison of performance intuitively.
Key words: Support Vector Machine, pattern classification, training algorithm, quadratic programming problem, kernel function
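The design scheme in the abstract (a kernel standing in for the inner product, and a decomposition algorithm solving the quadratic programming problem) can be illustrated with a minimal sketch. The experimental system itself is built in Matlab and its code is not reproduced in this section, so what follows is a pure-Python illustration under stated assumptions, not the author's implementation. It uses the smallest possible working set, two multipliers per step as in SMO, which is one simple form of decomposition. The names `linear_kernel`, `rbf_kernel`, `train_svm` and `predict`, and the default values of `C`, `gamma`, `tol` and `max_sweeps`, are hypothetical.

```python
import math

def linear_kernel(x, z):
    """Plain inner product <x, z>."""
    return sum(a * b for a, b in zip(x, z))

def rbf_kernel(x, z, gamma=1.0):
    """Radial basis function kernel exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def train_svm(X, y, kernel, C=10.0, tol=1e-3, max_sweeps=500):
    """Solve the SVM dual QP by optimising two multipliers at a time.

    Assumption: labels in y are +1 or -1. Each step solves the
    two-variable sub-problem analytically (SMO-style decomposition).
    """
    n = len(X)
    K = [[kernel(X[i], X[j]) for j in range(n)] for i in range(n)]
    alpha = [0.0] * n
    b = 0.0

    def f(i):
        # Decision value for training point i under the current alpha, b.
        return sum(alpha[k] * y[k] * K[k][i] for k in range(n)) + b

    for _ in range(max_sweeps):
        changed = 0
        for i in range(n):
            Ei = f(i) - y[i]
            # Skip points that already satisfy the KKT conditions within tol.
            if not ((y[i] * Ei < -tol and alpha[i] < C) or
                    (y[i] * Ei > tol and alpha[i] > 0)):
                continue
            for j in range(n):
                if j == i:
                    continue
                Ej = f(j) - y[j]
                ai_old, aj_old = alpha[i], alpha[j]
                # Box bounds that keep sum(alpha * y) = 0 and 0 <= alpha <= C.
                if y[i] != y[j]:
                    lo, hi = max(0.0, aj_old - ai_old), min(C, C + aj_old - ai_old)
                else:
                    lo, hi = max(0.0, ai_old + aj_old - C), min(C, ai_old + aj_old)
                eta = 2.0 * K[i][j] - K[i][i] - K[j][j]
                if lo == hi or eta >= 0:
                    continue
                aj = min(hi, max(lo, aj_old - y[j] * (Ei - Ej) / eta))
                if abs(aj - aj_old) < 1e-8:
                    continue
                ai = ai_old + y[i] * y[j] * (aj_old - aj)
                b1 = b - Ei - y[i] * (ai - ai_old) * K[i][i] - y[j] * (aj - aj_old) * K[i][j]
                b2 = b - Ej - y[i] * (ai - ai_old) * K[i][j] - y[j] * (aj - aj_old) * K[j][j]
                alpha[i], alpha[j] = ai, aj
                if 0 < ai < C:
                    b = b1
                elif 0 < aj < C:
                    b = b2
                else:
                    b = (b1 + b2) / 2.0
                changed += 1
                break  # Ei is stale after an update; move on to the next i
        if changed == 0:
            break  # a full sweep made no progress: stop
    return {"X": X, "y": y, "alpha": alpha, "b": b, "kernel": kernel}

def predict(model, x):
    """Classify x as +1 or -1 from the sign of the kernel expansion."""
    s = sum(a * t * model["kernel"](p, x)
            for a, t, p in zip(model["alpha"], model["y"], model["X"]))
    return 1 if s + model["b"] >= 0 else -1
```

To try it, pass a list of points and a list of +1/-1 labels to `train_svm` together with one of the two kernels, then call `predict` on new points. A linear kernel suffices for linearly separable data, while an XOR-like layout needs the RBF kernel. The pair selection here simply takes the first partner that makes progress, which favours brevity over speed.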
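Contents
1 Introduction
1.1 Background and Purpose of the Research
1.2 Current State of Research and Development Trends at Home and Abroad
1.3 Research Content and Methods
1.4 Development Tools
2 Overall System Design
2.1 Functional Requirements Analysis
2.2 Overall Design Approach
2.3 Overall System Flowchart and Functional Module Design
2.3.1 Overall Flowchart of the SVM Classification Model
2.3.2 Design of the Main Functional Modules
2.3.3 Flowcharts of the Programs Implementing the System
3 Technical Analysis of the System
3.1 Overview of Support Vector Machines
3.1.1 Core Idea of Support Vector Machines
3.1.2 Basic Methods of Support Vector Machines
3.2.3.1 The Linear Case
3.2.3.2 The Nonlinear Case
3.2 Model Selection for Support Vector Machines
3.3 Decomposition Algorithms for SVM Classifiers
4 Design and Implementation of the System
4.1 Functional Description of the System
4.1.1 Main System Interface
4.1.2 Main Interface of the Data Classification Simulation Module
4.1.3 Data Creation Module
4.2 System Run Results
4.3 System Performance Analysis
4.3.1 Effect of Kernel Function Choice on Classification
4.3.2 Effect of the Number of Training Samples on Classification
5 Conclusion
Acknowledgements
References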
