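For the Prostate Tumor data, GDA gives the best dimensionality reduction, while for the Leukemia data, MDS gives the best. The experimental results show that finding the most suitable dimensionality reduction method and tuning the SVM parameters appropriately can effectively optimize gene data, improve the classification performance of SVM, and achieve high classification accuracy.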
A Study of Cancer Gene Data Classification Based on SVM Algorithm
Cancer is one of the main diseases posing serious threats to human health. Early diagnosis is key to improving the survival rate of patients. With the rapid development of DNA microarray technology, vast amounts of cancer gene expression data have been collected. On the basis of molecular biology, making use of this large volume of gene expression data for early cancer diagnosis has become a hot topic in the post-genome era. However, gene expression data are typically characterized by small sample size, high dimensionality, and nonlinearity.
To address these problems, this paper introduces a classification method based on SVM (support vector machine), which can be used for cancer diagnosis. SVM is a new machine learning method based on SLT (statistical learning theory) that uses the SRM (structural risk minimization) principle instead of the ERM (empirical risk minimization) principle, which gives it good generalization ability. A kernel function is applied to convert a nonlinear problem in the input space into a linear problem in a higher-dimensional feature space. As a result, SVM has distinct advantages in pattern recognition problems with limited samples, nonlinearity, and high dimensionality.
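The kernel idea described above can be illustrated with a minimal sketch. It is not taken from the paper: the degree-2 polynomial kernel, the XOR-style toy points, and the helper names `poly_kernel`, `feature_map`, and `dot` are all assumptions chosen for illustration. The sketch shows that points which no straight line separates in the input plane become linearly separable after an explicit feature map, and that the kernel returns the same inner product without constructing that map.

```python
import math

def poly_kernel(x, y):
    """Homogeneous degree-2 polynomial kernel: k(x, y) = (x . y)^2."""
    return (x[0] * y[0] + x[1] * y[1]) ** 2

def feature_map(x):
    """Explicit map phi such that phi(x) . phi(y) equals poly_kernel(x, y)."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)

def dot(u, v):
    """Plain inner product of two equal-length tuples."""
    return sum(a * b for a, b in zip(u, v))

# XOR-style toy points (illustrative only): no single straight line in the
# input plane separates the +1 points from the -1 points.
points = [(1, 1), (-1, -1), (1, -1), (-1, 1)]
labels = [1, 1, -1, -1]

# In feature space the middle coordinate sqrt(2)*x1*x2 separates the classes
# with a simple threshold at zero, i.e. a linear decision rule.
predictions = [1 if feature_map(p)[1] > 0 else -1 for p in points]

# The kernel gives the feature-space inner product without building phi.
# abs_tol is needed because some inner products are exactly zero.
kernel_matches = all(
    math.isclose(poly_kernel(p, q), dot(feature_map(p), feature_map(q)),
                 rel_tol=1e-9, abs_tol=1e-9)
    for p in points
    for q in points
)

print(predictions == labels, kernel_matches)
```

An SVM only needs the kernel values between pairs of samples, which is why it can work in a high-dimensional feature space without ever computing the coordinates there.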
SVM effectively avoids over-fitting and under-fitting. However, small sample size and high dimensionality still affect classification accuracy, so dimensionality reduction has become an important step in cancer gene data classification. In this paper, several dimensionality reduction methods are applied to obtain lower-dimensional data, and SVM is then used for classification. Higher cancer diagnosis accuracy is achieved by comparing various dimensionality reduction methods and setting appropriate SVM parameters. The methods used in this paper include SPCA, GDA, and Laplacian Eigenmaps, among others.
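The reduce-then-classify workflow with parameter tuning might be sketched as follows using scikit-learn. This is a sketch under stated assumptions rather than the paper's implementation: the data are synthetic (not the Prostate Tumor or Leukemia sets), ordinary PCA stands in for the paper's reduction methods (SPCA, GDA, Laplacian Eigenmaps), and the parameter grid values are arbitrary examples.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for microarray data: few samples, many features.
# (Assumption: this is NOT one of the datasets used in the paper.)
X, y = make_classification(
    n_samples=80, n_features=2000, n_informative=20,
    n_redundant=0, random_state=0,
)

# Reduction and classification live in one pipeline, so the reducer is
# refit on each training fold and never sees the held-out samples.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("reduce", PCA(random_state=0)),  # stand-in for SPCA / GDA / etc.
    ("svm", SVC(kernel="rbf")),
])

# Example grid (assumed values): target dimension plus SVM C and gamma.
param_grid = {
    "reduce__n_components": [5, 10, 20],
    "svm__C": [0.1, 1, 10],
    "svm__gamma": ["scale", 0.01],
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(pipeline, param_grid, cv=cv, scoring="accuracy")
search.fit(X, y)

best_params = search.best_params_
best_score = search.best_score_
print(best_params, round(best_score, 3))
```

To compare reduction methods in this setup, the `"reduce"` step can be swapped for another transformer and the cross-validated accuracies compared. Fitting the reducer inside the cross-validation loop matters with small samples, since reducing on the full dataset first would leak information from the test folds.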
The main emphasis of this paper is optimizing gene data through dimensionality reduction. Two public datasets, “Prostate Tumor” and “Leukemia”, are chosen for the experiments. The experimental results and analysis show that GDA is the best dimensionality reduction method for the Prostate Tumor dataset, and MDS is the best dimensionality reduction method for the Leukemia dataset.