How to remove multiple rows from an R data frame using the dplyr package?

Updated on 2025/6/25 8:22:17

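Sometimes a data set contains unnecessary information that has to be removed: a single case, several cases, an entire variable, or anything else that does not serve the goal of the analysis. To remove such rows from an R data frame with the dplyr package, we can use the anti_join function, which returns the rows of the first data frame that have no match in the second.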

Example

Consider the following data frame:

> set.seed(2514)
> x1 <- rnorm(20, 5)
> x2 <- rnorm(20, 5, 0.05)
> df1 <- data.frame(x1, x2)
> df1

Output

         x1       x2
1  5.567262 4.998607
2  5.343063 4.931962
3  2.211267 5.034461
4  5.092191 5.075641
5  3.883282 4.997900
6  5.950218 5.038626
7  4.903268 5.010087
8  7.462286 4.974513
9  5.056762 5.097812
10 6.031768 5.002989
11 3.814416 4.990552
12 3.359167 4.891964
13 5.304671 4.950883
14 4.768564 4.953290
15 3.842797 4.950219
16 5.270018 4.995953
17 6.344269 5.008545
18 5.366249 4.905290
19 5.547608 5.098554
20 5.266844 5.003416

Loading the dplyr package:

> library(dplyr)

Removing rows 1 to 5 from df1:

> anti_join(df1, df1[1:5, ])
Joining, by = c("x1", "x2")
         x1       x2
1  5.950218 5.038626
2  4.903268 5.010087
3  7.462286 4.974513
4  5.056762 5.097812
5  6.031768 5.002989
6  3.814416 4.990552
7  3.359167 4.891964
8  5.304671 4.950883
9  4.768564 4.953290
10 3.842797 4.950219
11 5.270018 4.995953
12 6.344269 5.008545
13 5.366249 4.905290
14 5.547608 5.098554
15 5.266844 5.003416
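
The "Joining, by = ..." line is a message that dplyr prints when it has to guess the key columns. A minimal sketch, assuming the same data frame as in this example, names the keys through the `by` argument so the call is explicit and the message does not appear. The names `rows_to_drop` and `remaining` are illustrative and not part of the original article.

```r
library(dplyr)

# Recreate the data frame from the example
set.seed(2514)
df1 <- data.frame(x1 = rnorm(20, 5), x2 = rnorm(20, 5, 0.05))

# Rows to drop, selected by position (illustrative name)
rows_to_drop <- df1[1:5, ]

# Naming the key columns explicitly avoids the "Joining, by = ..." message
remaining <- anti_join(df1, rows_to_drop, by = c("x1", "x2"))

nrow(remaining)  # 15 rows are left
```

Listing both columns in `by` means a row is dropped only when its x1 and x2 values both match a row in `rows_to_drop`.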

Removing rows 11 to 18 from df1:

> anti_join(df1, df1[11:18, ])
Joining, by = c("x1", "x2")
         x1       x2
1  5.567262 4.998607
2  5.343063 4.931962
3  2.211267 5.034461
4  5.092191 5.075641
5  3.883282 4.997900
6  5.950218 5.038626
7  4.903268 5.010087
8  7.462286 4.974513
9  5.056762 5.097812
10 6.031768 5.002989
11 5.547608 5.098554
12 5.266844 5.003416

Removing rows 6 to 12 from df1:

> anti_join(df1, df1[6:12, ])
Joining, by = c("x1", "x2")
         x1       x2
1  5.567262 4.998607
2  5.343063 4.931962
3  2.211267 5.034461
4  5.092191 5.075641
5  3.883282 4.997900
6  5.304671 4.950883
7  4.768564 4.953290
8  3.842797 4.950219
9  5.270018 4.995953
10 6.344269 5.008545
11 5.366249 4.905290
12 5.547608 5.098554
13 5.266844 5.003416

Removing rows 15 to 20 from df1:

> anti_join(df1, df1[15:20, ])
Joining, by = c("x1", "x2")
         x1       x2
1  5.567262 4.998607
2  5.343063 4.931962
3  2.211267 5.034461
4  5.092191 5.075641
5  3.883282 4.997900
6  5.950218 5.038626
7  4.903268 5.010087
8  7.462286 4.974513
9  5.056762 5.097812
10 6.031768 5.002989
11 3.814416 4.990552
12 3.359167 4.891964
13 5.304671 4.950883
14 4.768564 4.953290

Removing rows 5 to 18 from df1:

> anti_join(df1, df1[5:18, ])
Joining, by = c("x1", "x2")
        x1       x2
1 5.567262 4.998607
2 5.343063 4.931962
3 2.211267 5.034461
4 5.092191 5.075641
5 5.547608 5.098554
6 5.266844 5.003416
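
The rows to remove do not have to form a contiguous range. As a rough sketch under the same setup, any vector of row positions can be used to build the second argument of anti_join. The positions in `drop_idx` are hypothetical and chosen only for illustration.

```r
library(dplyr)

# Recreate the data frame from the example
set.seed(2514)
df1 <- data.frame(x1 = rnorm(20, 5), x2 = rnorm(20, 5, 0.05))

# Hypothetical, non-contiguous row positions to remove
drop_idx <- c(2, 7, 15)

# Keep every row of df1 that has no match among the selected rows
remaining <- anti_join(df1, df1[drop_idx, ], by = c("x1", "x2"))

nrow(remaining)  # 17 rows are left
```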

Removing rows 11 to 20 from df1:

> anti_join(df1, df1[11:20, ])
Joining, by = c("x1", "x2")
         x1       x2
1  5.567262 4.998607
2  5.343063 4.931962
3  2.211267 5.034461
4  5.092191 5.075641
5  3.883282 4.997900
6  5.950218 5.038626
7  4.903268 5.010087
8  7.462286 4.974513
9  5.056762 5.097812
10 6.031768 5.002989

Removing rows 1 to 10 from df1:

> anti_join(df1, df1[1:10, ])
Joining, by = c("x1", "x2")
         x1       x2
1  3.814416 4.990552
2  3.359167 4.891964
3  5.304671 4.950883
4  4.768564 4.953290
5  3.842797 4.950219
6  5.270018 4.995953
7  6.344269 5.008545
8  5.366249 4.905290
9  5.547608 5.098554
10 5.266844 5.003416

Removing rows 2 to 11 from df1:

> anti_join(df1, df1[2:11, ])
Joining, by = c("x1", "x2")
         x1       x2
1  5.567262 4.998607
2  3.359167 4.891964
3  5.304671 4.950883
4  4.768564 4.953290
5  3.842797 4.950219
6  5.270018 4.995953
7  6.344269 5.008545
8  5.366249 4.905290
9  5.547608 5.098554
10 5.266844 5.003416
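
One point worth keeping in mind: anti_join matches rows by their values, not by their position. The examples here behave like position-based deletion because every row of df1 is unique. The sketch below uses a small hypothetical data frame, `dup_df`, that contains a duplicated row to show the difference, and compares the result with dplyr's position-based `slice()` as an alternative that is not used in the original article.

```r
library(dplyr)

# Hypothetical data frame in which rows 1 and 2 are identical
dup_df <- data.frame(id = c(1, 1, 2, 3),
                     value = c("a", "a", "b", "c"))

# Value-based removal: asking to drop row 1 also drops its duplicate, row 2
by_value <- anti_join(dup_df, dup_df[1, ], by = c("id", "value"))
nrow(by_value)     # 2

# Position-based removal: a negative index drops only row 1
by_position <- slice(dup_df, -1)
nrow(by_position)  # 3
```

When the data may contain duplicate rows and only specific positions should go, the position-based form is probably the safer choice.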
