A Study of Cancer Gene Data Classification Based on the SVM Algorithm
Chinese Abstract
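Cancer is one of the major diseases posing a serious threat to human life, and early diagnosis is the key to improving the survival rate of cancer patients. With the rapid development of DNA microarray technology, massive amounts of cancer gene expression data have been accumulated. Building on molecular biology, how to use these large volumes of gene expression data for early cancer diagnosis has become a research hotspot of the post-genome era. However, cancer gene expression data are generally high-dimensional, small in sample size, and nonlinear, which makes classifying such data difficult.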
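In view of these general characteristics of gene expression data, this thesis applies a classification method based on the support vector machine (SVM) to classify cancer data samples. SVM is a new generation of machine learning method developed on the basis of statistical learning theory. It adopts the structural risk minimization principle in place of the empirical risk minimization principle, and it uses kernel functions to convert nonlinear problems into linear ones, showing many distinctive advantages in pattern recognition problems with limited samples, nonlinearity, and high dimensionality.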
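The kernel idea mentioned above can be illustrated with a minimal sketch. It is not the thesis's implementation: the toy XOR-style data, the degree-2 polynomial kernel, and the names `poly_kernel` and `feature_map` are assumptions chosen for illustration. The sketch shows that the kernel value equals an inner product in an explicit feature space, and that data which no straight line separates in the input space becomes linearly separable there.

```python
import math


def dot(u, v):
    """Plain inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def poly_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel K(x, z) = (x . z)^2."""
    return dot(x, z) ** 2


def feature_map(x):
    """Explicit map phi for 2-D inputs with phi(x) . phi(z) = (x . z)^2."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


# XOR-style toy data (hypothetical): not linearly separable in 2-D.
samples = [
    ((1.0, 1.0), 1),
    ((-1.0, -1.0), 1),
    ((1.0, -1.0), -1),
    ((-1.0, 1.0), -1),
]

# The kernel computes the feature-space inner product without building phi.
kernel_matches = all(
    math.isclose(
        poly_kernel(x, z),
        dot(feature_map(x), feature_map(z)),
        abs_tol=1e-9,
    )
    for x, _ in samples
    for z, _ in samples
)

# In feature space a hyperplane with normal w = (0, 1, 0) and zero bias
# separates the two classes, because the middle coordinate is sqrt(2)*x1*x2.
w = (0.0, 1.0, 0.0)
predictions = [1 if dot(w, feature_map(x)) > 0 else -1 for x, _ in samples]

print(kernel_matches)
print(predictions)
```

An SVM never needs to build `feature_map` explicitly. It works only with kernel values, which is what keeps the approach practical when the implicit feature space is very high-dimensional.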
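Although SVM effectively addresses under-fitting and over-fitting, the small sample size and high dimensionality of gene expression data inevitably affect classification accuracy. Classifying the raw data directly is computationally expensive and does not give satisfactory results. Dimensionality reduction therefore becomes the key problem in classifying cancer gene data. This thesis first applies dimensionality reduction methods to the original gene expression data and then classifies the resulting lower-dimensional data with SVM. By comparing several dimensionality reduction methods and setting the SVM parameters appropriately, a higher cancer diagnosis accuracy can be achieved. The dimensionality reduction methods used include sparse principal component analysis (SPCA), generalized discriminant analysis (GDA), and Laplacian Eigenmaps, among others.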
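The focus of this thesis is how to use dimensionality reduction to optimize the data. Experiments on two publicly available data sets show that GDA gives the best dimensionality reduction for the Prostate Tumor data, while MDS gives the best for the Leukemia data. The experimental results indicate that finding the best dimensionality reduction method and tuning the SVM parameters appropriately can effectively optimize the gene data, improve the classification performance of SVM, and achieve higher classification accuracy.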
Keywords: DNA microarray; gene expression data; dimensionality reduction; SVM; data classification
Author: 黄燕红
Supervisor: 翁桂荣
A Study of Cancer Gene Data Classification Based on SVM Algorithm
Abstract
Cancer is one of the main diseases posing serious threats to human life. Early diagnosis of cancer is the key to improving the survival rate of patients. With the rapid development of DNA microarray technology, vast amounts of cancer gene expression data have been collected. On the basis of molecular biology, making use of this huge volume of gene expression data for early cancer diagnosis has become a hot topic in the post-genome era. However, gene expression data are typically characterized by small sample size, high dimensionality, and nonlinearity, which makes their classification difficult.
To address the problems mentioned above, this paper introduces a classification method based on SVM (support vector machine), which can be used for cancer diagnosis. SVM is a new machine learning method based on SLT (statistical learning theory) that uses the SRM (Structural Risk Minimization) principle instead of the ERM (Empirical Risk Minimization) rule. Kernel functions are applied to convert a nonlinear problem into a linear one, which gives SVM good generalization ability. As a result, SVM has many unique advantages, especially in solving pattern recognition problems that are sample-limited, nonlinear, and high-dimensional.
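For reference, the distinction between the two risk principles is usually motivated by Vapnik's generalization bound from statistical learning theory. The statement below is the standard textbook form, not a formula taken from this thesis; the symbols $n$ (number of training samples), $h$ (VC dimension of the function class), and $\eta$ (confidence parameter) are notation introduced here. With probability at least $1-\eta$:

```latex
R(f) \;\le\; R_{\mathrm{emp}}(f) \;+\;
\sqrt{\frac{h\left(\ln\frac{2n}{h} + 1\right) - \ln\frac{\eta}{4}}{n}}
```

ERM minimizes only the first term, the training error $R_{\mathrm{emp}}(f)$. SRM also controls the second, capacity-dependent term, which matters most when $n$ is small relative to $h$, as is typical for gene expression data.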
SVM effectively avoids the over-fitting and under-fitting problems. However, small sample size and high dimensionality still affect classification accuracy. Dimensionality reduction has therefore become an important step in cancer gene data classification. In this paper, several dimensionality reduction methods are applied to obtain lower-dimensional data, and SVM is then used for classification. Higher cancer diagnosis accuracy is achieved by comparing various dimensionality reduction methods and setting appropriate SVM parameters. This paper uses SPCA, GDA, Laplacian Eigenmaps, etc.
The main emphasis of this paper is optimizing gene data by dimensionality reduction methods. Two public datasets, “Prostate Tumor” and “Leukemia”, are chosen for the experiments. The results and analysis of the experiments show that GDA is the best dimensionality reduction method for the Prostate Tumor dataset, and MDS is the best dimensional reduction