Using lapply to list percentage of null variables in every column in R
I was given a large CSV that is 115 columns across and 1000 rows. The columns hold a variety of data: character-based, integer, etc. However, the data has a lot of null variables of varying types (NA, -999, NULL, etc.).
What I want to do is write a script that generates a list of the columns in which over 30% of the data is a null of some type.
To do this, I wrote a script that gives me the null percentage (as a decimal) for one column. This script works fine for me.
length(which(indata$observationyear == "" | is.na(indata$observationyear) | indata$observationyear == "na" | indata$observationyear == "-999" | indata$observationyear == "0"))/nrow(indata)
I want to write a script that does this for all columns. I believe I need to use the lapply function.
I attempted to do this here; however, I can't seem to get the script to work at all:
null_counter <- lapply(indata, 2, length(x), length(which(indata == "" | is.na(indata) | indata == "na" | indata == "-999" | indata == "0")))
names(indata(which(0.3>=null_counter / nrow(indata))))
I get the following errors:
Error in match.fun(FUN) : '2' is not a function, character or symbol
and:
Error: could not find function "indata"
Ideally, I want this to give me a vector or list of the column names in which the percentage of null variables (NA, -999, 0, NULL) is over 30%.
Can anyone help?
I believe you want to use apply rather than lapply, since lapply applies a function to a list. Try this:
null_counter <- apply(indata, 2, function(x) length(which(x == "" | is.na(x) | x == "na" | x == "-999" | x == "0"))/length(x))
null_name <- colnames(indata)[null_counter >= 0.3]
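One thing to watch with apply on a data frame: it first converts the data frame to a matrix with as.matrix(). With mixed column types that gives a character matrix, and numeric values can pick up padding that makes comparisons such as x == "0" miss. A rough per-column sketch that avoids this is below. The helper name is_nullish, its sentinel list (including the upper-case "NA" and "NULL" strings), and the small example data frame are made up for illustration and are not from the question:

```r
# Hypothetical helper: TRUE where a value counts as "null" for this data set.
# The sentinel strings are an assumption; adjust them to match your file.
is_nullish <- function(x, sentinels = c("", "na", "NA", "NULL", "-999", "0")) {
  is.na(x) | trimws(as.character(x)) %in% sentinels
}

# Small made-up data frame standing in for indata.
indata <- data.frame(
  observationyear = c(2001, -999, 0, NA, 2005),
  site            = c("a", "", "na", "b", "c"),
  count           = c(10, 12, 7, 3, 5),
  stringsAsFactors = FALSE
)

# Fraction of null-like values per column, computed one column at a time
# so that no column is coerced to another column's type.
null_counter <- vapply(indata, function(x) mean(is_nullish(x)), numeric(1))

# Names of the columns where at least 30% of the values are null-like.
null_name <- names(null_counter)[null_counter >= 0.3]
```

This keeps the >= 0.3 cutoff from the answer above; use > 0.3 if a column sitting at exactly 30% should be left out. Taking mean() of a logical vector gives the same fraction as length(which(...)) / length(x).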