
A Guide to Efficient Data Processing in R: Basic Data Processing

Updated: 2023-06-19 03:49:49


1 Basic exploration of a dataset

The three functions str, summary, and head are the standard trio for exploratory analysis of a data frame.

> str(iris)

'data.frame': 150 obs. of 5 variables:

$ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...

$ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...

$ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...

$ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...

$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...

> summary(iris)

Sepal.Length Sepal.Width Petal.Length Petal.Width Species

Min. :4.300 Min. :2.000 Min. :1.000 Min. :0.100 setosa :50

1st Qu.:5.100 1st Qu.:2.800 1st Qu.:1.600 1st Qu.:0.300 versicolor:50

Median :5.800 Median :3.000 Median :4.350 Median :1.300 virginica :50

Mean :5.843 Mean :3.057 Mean :3.758 Mean :1.199

3rd Qu.:6.400 3rd Qu.:3.300 3rd Qu.:5.100 3rd Qu.:1.800

Max. :7.900 Max. :4.400 Max. :6.900 Max. :2.500

> head(iris)

Sepal.Length Sepal.Width Petal.Length Petal.Width Species

1 5.1 3.5 1.4 0.2 setosa

2 4.9 3.0 1.4 0.2 setosa

3 4.7 3.2 1.3 0.2 setosa

4 4.6 3.1 1.5 0.2 setosa

5 5.0 3.6 1.4 0.2 setosa

6 5.4 3.9 1.7 0.4 setosa
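A few other base R functions are often used alongside str, summary, and head. The following is a minimal sketch on the same iris data; the variable names dims, col_types, and last_rows are chosen here for illustration only.

```r
# Quick structural checks that complement str, summary, and head.
dims <- dim(iris)                 # number of rows and number of columns
col_types <- sapply(iris, class)  # class of each column
last_rows <- tail(iris, 3)        # the last three rows, mirroring head()

print(dims)
print(col_types)
print(last_rows)
```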

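2 Implementing the basic paradigms

2.1 Create (read.csv/data.frame)

2.1.1 Importing from an external file

First we need a csv file. To keep the steps uniform, we write a table out of R with the write.csv function and then read it back in with read.csv. Taking the iris dataset as an example, first write it to the root of drive D: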

> write.csv(iris,file = "D:/iris.csv")


# The "file =" part of the call above can be omitted; that is, it can be written as:

# write.csv(iris,"D:/iris.csv")

Once that is done, the data can be read back in from the external file and assigned to iris2:

> iris2 = read.csv("D:/iris.csv")

> iris2

X Sepal.Length Sepal.Width Petal.Length Petal.Width Species

1 1 5.1 3.5 1.4 0.2 setosa

2 2 4.9 3.0 1.4 0.2 setosa

3 3 4.7 3.2 1.3 0.2 setosa

4 4 4.6 3.1 1.5 0.2 setosa

5 5 5.0 3.6 1.4 0.2 setosa

…………

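By default, read.csv treats the first row as a header and assigns it to the data frame as the column names. If that is not wanted, set the argument "header = F". Using header = F on data that does contain column names is a mistake, however, because the names are then read in as a data row.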
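The extra X column in iris2 comes from the row names that write.csv stores by default. The sketch below shows one way to avoid it and what header = FALSE does to a file that has a header line. It writes to a temporary file instead of D:/iris.csv, and the names csv_path, iris3, and raw are illustrative assumptions.

```r
# Write iris without row names, then read it back.
csv_path <- tempfile(fileext = ".csv")   # hypothetical path standing in for D:/iris.csv
write.csv(iris, file = csv_path, row.names = FALSE)

iris3 <- read.csv(csv_path)              # header = TRUE is the default
print(names(iris3))                      # no extra X column this time

# With header = FALSE the header line is read as an ordinary data row.
raw <- read.csv(csv_path, header = FALSE)
print(nrow(raw))                         # one row more than iris3
```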

2.1.2 Creating inside R (data.frame)

To create a data frame inside R, use the data.frame function directly:

> df = data.frame(x = 1:3,y = c("a","b","c"))

> df

x y

1 1 a

2 2 b

3 3 c
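Data frames are often assembled from vectors that already exist. The sketch below assumes two illustrative vectors, ids and labels, and sets stringsAsFactors explicitly so that the result does not depend on the R version.

```r
# Build a data frame from existing vectors of equal length.
ids <- 1:3
labels <- c("a", "b", "c")
df2 <- data.frame(x = ids, y = labels, stringsAsFactors = FALSE)

print(dim(df2))       # rows and columns
print(class(df2$y))   # stays character because stringsAsFactors = FALSE
```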

2.2 Delete (rm)

If a variable is no longer needed in R, it can be deleted with the rm function:

> rm(df)

> df

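function (x, df1, df2, ncp, log = FALSE)

{

if (missing(ncp))

.Call(C_df, x, df1, df2, log)

else .Call(C_dnf, x, df1, df2, ncp, log)

}

<bytecode: 0x0000000004c0a8c0>

<environment: namespace:stats>

The data frame is gone: the name df now resolves to the df function from the stats package (the density of the F distribution), which is why a function definition is printed.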

To see which variables exist in the environment, use the ls function:

> ls()

[1] "iris2"

To clear every variable from the environment, do this:


> rm(list = ls())

Note, however, that the built-in datasets cannot be deleted this way.
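Whether a name still refers to your own object after rm can be checked programmatically. This is a small sketch, assuming the stats package is attached as it is in a default R session; before, after, and still_here are illustrative names.

```r
# Create a variable, remove it, and check what the name refers to afterwards.
df <- data.frame(x = 1:3, y = c("a", "b", "c"))
before <- is.data.frame(df)   # df is our data frame

rm(df)
after <- is.function(df)      # the name now finds the df function in stats

# exists() with inherits = FALSE looks only in the current environment.
still_here <- exists("df", inherits = FALSE)
print(c(before, after, still_here))
```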

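2.3 Index (DF[i , j])

Indexing divides into row indexing and column indexing. A few points to note: use square brackets; rows go before the comma and columns after it; use a colon for consecutive rows; use a vector c(* , * , *) for non-consecutive rows.

2.3.1 Row indexing

Row 33 of iris: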

> iris[33,]

Sepal.Length Sepal.Width Petal.Length Petal.Width Species

33 5.2 4.1 1.5 0.1 setosa

To select several rows, for example rows 33 to 35:

> iris[33:35,]

Sepal.Length Sepal.Width Petal.Length Petal.Width Species

33 5.2 4.1 1.5 0.1 setosa

34 5.5 4.2 1.4 0.2 setosa

35 4.9 3.1 1.5 0.2 setosa

To select non-consecutive rows, for example rows 33, 36, and 38:

> iris[c(33,36,38),]

Sepal.Length Sepal.Width Petal.Length Petal.Width Species

33 5.2 4.1 1.5 0.1 setosa

36 5.0 3.2 1.2 0.2 setosa

38 4.9 3.6 1.4 0.1 setosa

2.3.2 Column indexing

The same rules apply to columns:

> iris1 = iris[2:5,]

> iris1[,2:4]

Sepal.Width Petal.Length Petal.Width

2 3.0 1.4 0.2

3 3.2 1.3 0.2

4 3.1 1.5 0.2

5 3.6 1.4 0.2

Because columns have names, they can also be selected by name, for example the Petal.Length column:

> iris1[,"Petal.Length"]

[1] 1.4 1.3 1.5 1.4

A column can also be selected with the $ operator; the example above can be written like this:

> iris1$Petal.Length

[1] 1.4 1.3 1.5 1.4

To select several columns, use a vector:

> iris1[,c("Sepal.Length","Petal.Length")]

Sepal.Length Petal.Length

2 4.9 1.4

3 4.7 1.3

4 4.6 1.5

5 5.0 1.4
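Selecting a single column returns a plain vector, while selecting several returns a data frame. When a one-column data frame is wanted, the drop = FALSE argument of the bracket operator keeps the structure. A minimal sketch, with as_vector, as_frame, and one_cell as illustrative names:

```r
iris1 <- iris[2:5, ]

as_vector <- iris1[, "Petal.Length"]                # numeric vector
as_frame  <- iris1[, "Petal.Length", drop = FALSE]  # one-column data frame

# Rows and columns can be combined in one call: second row of iris1,
# which is row 3 of the original iris.
one_cell <- iris1[2, "Sepal.Width"]

print(class(as_vector))
print(class(as_frame))
print(one_cell)
```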

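2.4 Insert (rbind/cbind)

As with indexing, insertion divides into row insertion and column insertion.

2.4.1 Row insertion (rbind)

When inserting rows into a data frame, the two data frames must have the same number of columns and matching column names. Below, I take two subsets of iris (i1 holds rows 1 to 3 and i2 holds row 4) and then combine them, inserting the second table into the first: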

> i1

Sepal.Length Sepal.Width Petal.Length Petal.Width Species

1 5.1 3.5 1.4 0.2 setosa

2 4.9 3.0 1.4 0.2 setosa

3 4.7 3.2 1.3 0.2 setosa

> i2

Sepal.Length Sepal.Width Petal.Length Petal.Width Species

4 4.6 3.1 1.5 0.2 setosa

> rbind(i1,i2)->i

> i

Sepal.Length Sepal.Width Petal.Length Petal.Width Species

1 5.1 3.5 1.4 0.2 setosa

2 4.9 3.0 1.4 0.2 setosa

3 4.7 3.2 1.3 0.2 setosa

4 4.6 3.1 1.5 0.2 setosa

Note that the "->" arrow assigns to whatever it points at.
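The row insertion can be reproduced from scratch as follows. The definitions of i1 and i2 are inferred from the printed tables, and the renamed column in bad is a hypothetical mismatch added only to show what happens when the column names differ.

```r
i1 <- iris[1:3, ]   # rows 1 to 3 (inferred from the printed output)
i2 <- iris[4, ]     # row 4 (inferred from the printed output)
i <- rbind(i1, i2)
print(i)

# Column names must match; with a renamed column rbind stops with an error.
bad <- i2
names(bad)[1] <- "Length"   # hypothetical mismatching name
result <- tryCatch(rbind(i1, bad), error = function(e) "error")
print(result)
```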

2.4.2 Column insertion (cbind)

The c in cbind stands for column.

> i1[,1:2]->i3

> i1[,3]->i4

> cbind(i3,i4)->i5

> i3

Sepal.Length Sepal.Width

1 5.1 3.5

2 4.9 3.0

3 4.7 3.2

> i4

[1] 1.4 1.4 1.3

> i5

Sepal.Length Sepal.Width i4

1 5.1 3.5 1.4

2 4.9 3.0 1.4

3 4.7 3.2 1.3

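Note that i4 is a plain vector with no column name, so after binding, the variable name i4 is automatically used as its column name in the data frame. To change the column names, use the colnames function or the names function: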

> names(i5) = c("a","b","c")

> i5

a b c

1 5.1 3.5 1.4

2 4.9 3.0 1.4

3 4.7 3.2 1.3
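Instead of renaming afterwards, the new column can be given a name at bind time by passing the vector as a named argument to cbind. A short sketch, assuming i1 is the first three rows of iris as in the row-insertion example:

```r
i1 <- iris[1:3, ]   # assumed definition, matching the printed i1
i3 <- i1[, 1:2]
i4 <- i1[, 3]

# Name the new column while binding.
i5 <- cbind(i3, Petal.Length = i4)
print(names(i5))

# A single column name can still be changed later by position.
colnames(i5)[3] <- "c"
print(names(i5))
```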

2.5 Sort (order)

How the order function works: it takes a vector and returns the positions of its elements in sorted order. For example:

> c(3,5,2,6,4,8)->a


> order(a)

[1] 3 1 5 2 4 6

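To explain: the first number in the output, 3, means that the 3rd element of the vector should come first, and so on.

Take the first six rows of iris to demonstrate:

> test<-iris[1:6,]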

> test[order(test$Sepal.Length),]

Sepal.Length Sepal.Width Petal.Length Petal.Width Species

4 4.6 3.1 1.5 0.2 setosa

3 4.7 3.2 1.3 0.2 setosa

2 4.9 3.0 1.4 0.2 setosa

5 5.0 3.6 1.4 0.2 setosa

1 5.1 3.5 1.4 0.2 setosa

6 5.4 3.9 1.7 0.4 setosa

For descending order, put a minus sign in front of the argument to order:

> test[order(-test$Sepal.Length),]

Sepal.Length Sepal.Width Petal.Length Petal.Width Species

6 5.4 3.9 1.7 0.4 setosa

1 5.1 3.5 1.4 0.2 setosa

5 5.0 3.6 1.4 0.2 setosa

2 4.9 3.0 1.4 0.2 setosa

3 4.7 3.2 1.3 0.2 setosa

4 4.6 3.1 1.5 0.2 setosa

The order function accepts several arguments; each later vector is used to break ties in the earlier ones.
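Sorting by more than one key can be sketched as follows on the same six rows. The choice of Petal.Width as the first key and Sepal.Length as the tie-breaker is only an illustration.

```r
test <- iris[1:6, ]

# Sort by Petal.Width ascending, then break ties by Sepal.Length descending.
idx <- order(test$Petal.Width, -test$Sepal.Length)
sorted <- test[idx, ]
print(sorted)

# decreasing = TRUE is an alternative to the minus sign and also
# works for non-numeric vectors.
desc <- test[order(test$Sepal.Length, decreasing = TRUE), ]
print(desc)
```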

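2.6 Filter (DF[condition,])

Filtering a data frame is still an operation on rows, and indexing can in fact be done with logical values. Note that the call below has no comma inside the brackets, so its logical vector picks columns, keeping the first four and dropping the fifth (Species); a logical vector placed before the comma picks rows in the same way. For example: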

> test[c(T,T,T,T,F)]

Sepal.Length Sepal.Width Petal.Length Petal.Width

1 5.1 3.5 1.4 0.2

2 4.9 3.0 1.4 0.2

3 4.7 3.2 1.3 0.2

4 4.6 3.1 1.5 0.2

5 5.0 3.6 1.4 0.2

6 5.4 3.9 1.7 0.4
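The difference the comma makes can be seen side by side in this small sketch; cols_kept and rows_kept are illustrative names, and the row vector is written out at full length (six values) so that nothing depends on recycling.

```r
test <- iris[1:6, ]

# No comma: the logical vector selects columns (Species is dropped).
cols_kept <- test[c(TRUE, TRUE, TRUE, TRUE, FALSE)]

# Before the comma: the logical vector selects rows (row 5 is dropped).
rows_kept <- test[c(TRUE, TRUE, TRUE, TRUE, FALSE, TRUE), ]

print(names(cols_kept))
print(rownames(rows_kept))
```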

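To filter the records whose Sepal.Length is greater than 5, first check which rows satisfy the condition, then use the resulting logical values to select the rows: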

> test$Sepal.Length > 5

[1] TRUE FALSE FALSE FALSE FALSE TRUE

> test[test$Sepal.Length > 5,]

Sepal.Length Sepal.Width Petal.Length Petal.Width Species

1 5.1 3.5 1.4 0.2 setosa

6 5.4 3.9 1.7 0.4 setosa
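Conditions can be combined with & (and) or | (or), and base R offers which and subset as alternatives to the raw logical vector. The thresholds below are arbitrary values chosen for illustration.

```r
test <- iris[1:6, ]

# Two conditions combined with &: both must hold for a row to be kept.
both <- test[test$Sepal.Length > 4.6 & test$Sepal.Width < 3.5, ]
print(both)

# which() turns the logical vector into row positions.
pos <- which(test$Sepal.Length > 5)
print(pos)

# subset() filters rows and picks columns in one call.
small <- subset(test, Sepal.Length > 5, select = c(Sepal.Length, Species))
print(small)
```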

2.7 Summarize (apply)
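As a minimal sketch of summarizing with apply, assuming the four numeric columns of iris as input: the second argument (MARGIN) chooses whether the function is applied per row (1) or per column (2). The names num, col_means, and row_sums are illustrative.

```r
num <- iris[, 1:4]                       # numeric columns only

col_means <- apply(num, 2, mean)         # MARGIN = 2: one value per column
row_sums  <- apply(num[1:3, ], 1, sum)   # MARGIN = 1: one value per row

print(round(col_means, 3))
print(row_sums)
```

The column means agree with the Mean line printed by summary(iris) in section 1.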
