How do you remove rows of an R data frame that contain NA values in certain columns?
Updated 2025/6/24 13:37:17
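When a data frame contains missing values, some of them can be replaced if we know enough about the characteristics of the cases with missing information. If we lack that information and cannot find a suitable way to replace the missing values, we can drop the affected rows instead: apply the complete.cases function to the columns of interest and use its result to subset the data frame, which removes every row that has an NA in any of those columns.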
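To make the mechanism concrete, here is a minimal sketch on a small hand-made data frame. The name `toy` and its values are assumptions chosen for illustration only. `complete.cases` returns one logical value per row, TRUE where none of the inspected columns is NA, and that vector is then used as a row index.

```r
# Hypothetical toy data frame; values chosen only for illustration
toy <- data.frame(a = c(1, NA, 3, 4),
                  b = c("u", "v", NA, "x"),
                  c = c(NA, 2, 3, 4))

# One logical per row: TRUE when columns a and b are both non-missing
keep <- complete.cases(toy[c("a", "b")])
keep
# [1]  TRUE FALSE FALSE  TRUE

# Use the logical vector as a row index; the trailing comma keeps all columns
toy_clean <- toy[keep, ]
```

Row 1 survives even though its value in column c is NA, because column c was not among the columns passed to `complete.cases`.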
Example
Consider the following data frame:
> set.seed(19991)
> x1<-sample(c(NA,rnorm(5,2,1)),20,replace=TRUE)
> x2<-sample(c(NA,rnorm(5,40,0.87)),20,replace=TRUE)
> x3<-sample(c(NA,rnorm(5,1,0.015)),20,replace=TRUE)
> x4<-sample(c(NA,rnorm(10,5,1.27)),20,replace=TRUE)
> x5<-sample(c(NA,rnorm(8,1,0.20)),20,replace=TRUE)
> df1<-data.frame(x1,x2,x3,x4,x5)
> df1
Output
          x1       x2        x3       x4        x5
1  0.8287962 39.74094 0.9983586 6.338327 0.8692225
2  1.3167347       NA        NA 4.133738 0.8692225
3  3.9911408 38.84212 1.0047761 5.825111 0.8423061
4  0.6426335 39.74094 1.0047761 5.177329        NA
5  1.3167347       NA 0.9963252 5.073915 0.8423061
6  0.8287962 38.84212 0.9963252 5.154073 1.0566156
7         NA 40.36844 0.9927987       NA 0.8423061
8  0.1952913 40.36844 1.0047761 6.338327        NA
9  3.9911408       NA 1.0366262 5.154073 1.1936387
10 0.6426335 39.77818 0.9927987 5.177329 0.8557775
11        NA       NA 1.0047761 7.216787 0.9506370
12        NA 38.84212 0.9983586       NA 0.8423061
13 1.3167347 39.77818 0.9963252 5.825111 0.8557775
14 0.8287962 39.77818 1.0366262 5.177329        NA
15 0.1952913       NA 0.9927987 5.073915 0.8692225
16 0.1952913 38.84212 1.0366262 5.154073 0.8286973
17 0.1952913 38.84212 1.0366262       NA 0.9506370
18 1.3167347 40.36844 0.9983586       NA 1.0566156
19 0.1952913 39.80231        NA 5.073915        NA
20        NA       NA 0.9983586 5.073915 0.8557775
Remove the rows of df1 that contain NA in columns 3 to 5 (x3, x4, x5); NA values in the other columns are ignored:
Example
> df1[complete.cases(df1[3:5]),]
Output
          x1       x2        x3       x4        x5
1  0.8287962 39.74094 0.9983586 6.338327 0.8692225
3  3.9911408 38.84212 1.0047761 5.825111 0.8423061
5  1.3167347       NA 0.9963252 5.073915 0.8423061
6  0.8287962 38.84212 0.9963252 5.154073 1.0566156
9  3.9911408       NA 1.0366262 5.154073 1.1936387
10 0.6426335 39.77818 0.9927987 5.177329 0.8557775
11        NA       NA 1.0047761 7.216787 0.9506370
13 1.3167347 39.77818 0.9963252 5.825111 0.8557775
15 0.1952913       NA 0.9927987 5.073915 0.8692225
16 0.1952913 38.84212 1.0366262 5.154073 0.8286973
20        NA       NA 0.9983586 5.073915 0.8557775
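Positional indices such as 3:5 point at different columns if the column order ever changes. The following is a sketch of the same kind of filter with the columns selected by name; the data frame `d`, the vector `cols`, and all values are stand-ins assumed for illustration.

```r
# Hypothetical stand-in data frame, not the df1 generated above
d <- data.frame(x1 = c(NA, 2, 3),
                x2 = c(10, NA, 30),
                x3 = c(0.1, 0.2, NA))

# Columns to check, given by name rather than by position
cols <- c("x2", "x3")

# Keeps only row 1; its NA in x1 is retained because x1 is not checked
res <- d[complete.cases(d[, cols]), ]
res
```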
Remove the rows of df1 that contain NA in columns 1 to 3 (x1, x2, x3):
Example
> df1[complete.cases(df1[1:3]),]
Output
          x1       x2        x3       x4        x5
1  0.8287962 39.74094 0.9983586 6.338327 0.8692225
3  3.9911408 38.84212 1.0047761 5.825111 0.8423061
4  0.6426335 39.74094 1.0047761 5.177329        NA
6  0.8287962 38.84212 0.9963252 5.154073 1.0566156
8  0.1952913 40.36844 1.0047761 6.338327        NA
10 0.6426335 39.77818 0.9927987 5.177329 0.8557775
13 1.3167347 39.77818 0.9963252 5.825111 0.8557775
14 0.8287962 39.77818 1.0366262 5.177329        NA
16 0.1952913 38.84212 1.0366262 5.154073 0.8286973
17 0.1952913 38.84212 1.0366262       NA 0.9506370
18 1.3167347 40.36844 0.9983586       NA 1.0566156
Remove the rows of df1 that contain NA in columns 2 to 4 (x2, x3, x4):
Example
> df1[complete.cases(df1[2:4]),]
Output
          x1       x2        x3       x4        x5
1  0.8287962 39.74094 0.9983586 6.338327 0.8692225
3  3.9911408 38.84212 1.0047761 5.825111 0.8423061
4  0.6426335 39.74094 1.0047761 5.177329        NA
6  0.8287962 38.84212 0.9963252 5.154073 1.0566156
8  0.1952913 40.36844 1.0047761 6.338327        NA
10 0.6426335 39.77818 0.9927987 5.177329 0.8557775
13 1.3167347 39.77818 0.9963252 5.825111 0.8557775
14 0.8287962 39.77818 1.0366262 5.177329        NA
16 0.1952913 38.84212 1.0366262 5.154073 0.8286973
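Before dropping rows it can help to know how many would be lost. The sketch below counts them and cross-checks `complete.cases` against an equivalent base R expression, `rowSums(is.na(...)) == 0`. The data frame `d` and its values are assumptions for illustration.

```r
# Hypothetical data frame used only for this check
d <- data.frame(p = c(1, NA, 3, 4, NA),
                q = c(NA, 2, 3, 4, 5),
                r = c(1, 2, NA, 4, 5))

checked <- d[2:3]                         # columns q and r
keep_cc <- complete.cases(checked)        # TRUE where q and r are both present
keep_rs <- rowSums(is.na(checked)) == 0   # same condition: zero NAs in the row

all(keep_cc == keep_rs)                   # the two approaches agree
n_dropped <- sum(!keep_cc)                # number of rows that would be removed
n_dropped
```

Here two rows would be removed: row 1 (NA in q) and row 3 (NA in r). The NA values in column p do not count, since p was not selected.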
Let's look at another example:
Example
> y1<-sample(c(NA,rpois(5,2)),20,replace=TRUE)
> y2<-sample(c(NA,rpois(5,5)),20,replace=TRUE)
> y3<-sample(c(NA,rpois(5,1)),20,replace=TRUE)
> y4<-sample(c(NA,rpois(5,2)),20,replace=TRUE)
> df2<-data.frame(y1,y2,y3,y4)
> df2
Output
   y1 y2 y3 y4
1   0  2  0 NA
2   6 NA NA NA
3   0  9  1  1
4   6  4 NA  1
5   2  2  0  2
6   2  9 NA NA
7   6  2  0  1
8   2  4  1 NA
9   2  2  1  1
10  6  4  1  2
11  2  2  0 NA
12  6  2  3  1
13  0  4  1  1
14  2  4  1  0
15  2  9  0  1
16  2  2  1  1
17  2  9 NA  1
18  2  9  0  1
19  2  9  1  0
20 NA  2  3  1
Remove the rows of df2 that contain NA in columns 1 to 3 (y1, y2, y3):
Example
> df2[complete.cases(df2[1:3]),]
Output
   y1 y2 y3 y4
1   0  2  0 NA
3   0  9  1  1
5   2  2  0  2
7   6  2  0  1
8   2  4  1 NA
9   2  2  1  1
10  6  4  1  2
11  2  2  0 NA
12  6  2  3  1
13  0  4  1  1
14  2  4  1  0
15  2  9  0  1
16  2  2  1  1
18  2  9  0  1
19  2  9  1  0
Remove the rows of df2 that contain NA in columns 2 to 4 (y2, y3, y4):
Example
> df2[complete.cases(df2[2:4]),]
Output
   y1 y2 y3 y4
3   0  9  1  1
5   2  2  0  2
7   6  2  0  1
9   2  2  1  1
10  6  4  1  2
12  6  2  3  1
13  0  4  1  1
14  2  4  1  0
15  2  9  0  1
16  2  2  1  1
18  2  9  0  1
19  2  9  1  0
20 NA  2  3  1
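Remove the rows of df2 that contain NA in columns 1 and 3 (y1 and y3), two non-adjacent columns selected with c(1,3):
Example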
> df2[complete.cases(df2[c(1,3)]),]
Output
   y1 y2 y3 y4
1   0  2  0 NA
3   0  9  1  1
5   2  2  0  2
7   6  2  0  1
8   2  4  1 NA
9   2  2  1  1
10  6  4  1  2
11  2  2  0 NA
12  6  2  3  1
13  0  4  1  1
14  2  4  1  0
15  2  9  0  1
16  2  2  1  1
18  2  9  0  1
19  2  9  1  0
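Since every example above follows the same pattern, it can be wrapped in a small helper. This is only a sketch: `drop_na_rows` is a hypothetical name, not a base R function, and the data frame `d` is made up for illustration.

```r
# Hypothetical helper (not part of base R): drop rows with NA in `cols`.
# `cols` may hold column positions or column names.
drop_na_rows <- function(df, cols) {
  # df[cols] always returns a data frame, even for a single column
  df[complete.cases(df[cols]), , drop = FALSE]
}

# Hypothetical data for the usage example
d <- data.frame(y1 = c(0, 6, NA, 2),
                y2 = c(2, NA, 9, 4),
                y3 = c(0, NA, 1, NA))

by_position <- drop_na_rows(d, c(1, 3))        # checks y1 and y3
by_name     <- drop_na_rows(d, c("y1", "y3"))  # same columns, selected by name
by_position
```

Only row 1 remains in both results. When every column should be checked, `complete.cases(d)` without a column selection, or `na.omit(d)`, covers that case directly.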