How to remove everything up to and including the underscore from column values of an R data frame?
Updated on 2025/6/25 10:22:17
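If a column of an R data frame contains strings in which a constant prefix is joined to the value of interest by an underscore, such as ID_1 or GRP_10, the prefix only makes the values longer without adding information. In that case it is better to remove the prefix together with the underscore, which makes the data easier to read and to analyze. We can do this with the gsub function.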
Consider the following data frame:
Example
set.seed(191)
ID <- c("ID_1", "ID_2", "ID_3", "ID_4", "ID_5", "ID_6", "ID_7", "ID_8", "ID_9", "ID_10",
        "ID_11", "ID_12", "ID_13", "ID_14", "ID_15", "ID_16", "ID_17", "ID_18", "ID_19", "ID_20")
Salary <- sample(20000:50000, 20)
df1 <- data.frame(ID, Salary)
df1
Output
      ID Salary
1   ID_1  33170
2   ID_2  22747
3   ID_3  42886
4   ID_4  22031
5   ID_5  45668
6   ID_6  32584
7   ID_7  34779
8   ID_8  20471
9   ID_9  38689
10 ID_10  29660
11 ID_11  49664
12 ID_12  24284
13 ID_13  36537
14 ID_14  37693
15 ID_15  30265
16 ID_16  36004
17 ID_17  48247
18 ID_18  20750
19 ID_19  27400
20 ID_20  20553
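Remove everything before the number in the ID column, including the underscore:
Example
# "_" is not a regex metacharacter, so it needs no backslash;
# writing "\_" inside an R string is rejected as an unrecognized escape.
df1$ID <- gsub("^.*_", "", df1$ID)
df1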
Output
   ID Salary
1   1  33170
2   2  22747
3   3  42886
4   4  22031
5   5  45668
6   6  32584
7   7  34779
8   8  20471
9   9  38689
10 10  29660
11 11  49664
12 12  24284
13 13  36537
14 14  37693
15 15  30265
16 16  36004
17 17  48247
18 18  20750
19 19  27400
20 20  20553
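The pattern used above is greedy, which matters when a value contains more than one underscore. The following is a minimal sketch of that behavior; the vector x and its values are hypothetical and do not come from the data frames in this article. It uses sub rather than gsub, since a pattern anchored with ^ can match at most once per string.

```r
# Hypothetical values, including one with two underscores and one with none
x <- c("ID_1", "GRP_A_10", "no-underscore")

# Greedy ".*" runs to the LAST underscore, so only the final piece is kept
after_last <- sub("^.*_", "", x)
after_last
# "1" "10" "no-underscore"

# "[^_]*" cannot cross an underscore, so only the FIRST prefix is removed
after_first <- sub("^[^_]*_", "", x)
after_first
# "1" "A_10" "no-underscore"
```

Strings without an underscore are returned unchanged in both cases, because the pattern does not match them.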
Let's look at another example:
Example
Group <- c("GRP_1", "GRP_2", "GRP_3", "GRP_4", "GRP_5", "GRP_6", "GRP_7", "GRP_8", "GRP_9", "GRP_10",
           "GRP_11", "GRP_12", "GRP_13", "GRP_14", "GRP_15", "GRP_16", "GRP_17", "GRP_18", "GRP_19", "GRP_20")
Ratings <- sample(0:10, 20, replace = TRUE)
df2 <- data.frame(Group, Ratings)
df2
Output
    Group Ratings
1   GRP_1       6
2   GRP_2       9
3   GRP_3       7
4   GRP_4      10
5   GRP_5      10
6   GRP_6       9
7   GRP_7       9
8   GRP_8       3
9   GRP_9       2
10 GRP_10       0
11 GRP_11       3
12 GRP_12       7
13 GRP_13       6
14 GRP_14      10
15 GRP_15       1
16 GRP_16       3
17 GRP_17      10
18 GRP_18       2
19 GRP_19       9
20 GRP_20       0
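Remove everything before the group number in the Group column, that is, the GRP prefix and the underscore:
Example
# The underscore is written without a backslash; "\_" is not a valid escape in an R string.
df2$Group <- gsub("^.*_", "", df2$Group)
df2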
Output
   Group Ratings
1      1       6
2      2       9
3      3       7
4      4      10
5      5      10
6      6       9
7      7       9
8      8       3
9      9       2
10    10       0
11    11       3
12    12       7
13    13       6
14    14      10
15    15       1
16    16       3
17    17      10
18    18       2
19    19       9
20    20       0
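After the prefix is removed, the column still holds character strings, so sorting treats the values as text rather than as numbers. If the remaining values are meant to be numbers, one possible follow-up is to convert them with as.integer. The sketch below assumes a small stand-in data frame named demo with illustrative values; it is not the df2 object from the example above.

```r
# Illustrative stand-in data frame (names and values are assumptions)
demo <- data.frame(Group = c("GRP_10", "GRP_2", "GRP_1"),
                   Ratings = c(0, 9, 6))

# Strip the prefix, then convert the remaining digits to integers
demo$Group <- as.integer(sub("^.*_", "", demo$Group))

# The column is now numeric, so ordering follows numeric value
sorted <- demo[order(demo$Group), ]
sorted$Group
# 1 2 10
```

Note that as.integer returns NA, with a warning, for any value that is not a valid number after the prefix is removed, so it is worth checking the column with anyNA before relying on the result.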