Using lapply to list the percentage of null variables in every column in R


I have been given a large CSV with 115 columns and 1000 rows. The columns hold a variety of data: character-based, integer, etc. However, the data has a lot of null values of varying types (NA, -999, NULL, etc.).

What I want to do is write a script that generates a list of the columns in which over 30% of the data is null of some type.

To do this, I wrote a script that gives me the null percentage (as a decimal) for one column. This script works fine for me.

length(which(indata$observationyear == "" | is.na(indata$observationyear) | indata$observationyear == "na" | indata$observationyear == "-999" | indata$observationyear == "0"))/nrow(indata) 
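The one-column check above can be wrapped in a small function so the same test can be reused on any vector. This is only a sketch: `null_fraction`, its `null_strings` argument, and the toy `indata` are hypothetical names and data, and it assumes the placeholder values are the ones listed in the check above.

```r
# Hypothetical helper: fraction of "null-like" entries in one vector.
# The placeholder strings mirror the one-column check; adjust as needed.
null_fraction <- function(x, null_strings = c("", "na", "-999", "0")) {
  # as.character() lets the same test work for numeric, character
  # and factor columns; real NA values are caught by is.na().
  is_null <- is.na(x) | as.character(x) %in% null_strings
  mean(is_null)  # share of TRUE values, i.e. the null fraction
}

# Toy stand-in for the real data (made up for illustration).
indata <- data.frame(
  observationyear = c("2001", "", NA, "-999", "0", "1999", "na", "2005"),
  stringsAsFactors = FALSE
)

null_fraction(indata$observationyear)  # 5 of 8 entries are null-like: 0.625
```

Using `%in%` instead of a chain of `==` comparisons keeps the list of placeholder values in one place.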

I want to write a script that does this for all columns. I believe I need to use the lapply function.

I attempted this here; however, I can't seem to get the script to work at all:

null_counter <- lapply(indata, 2, length(x),
                   length(which(indata == "" | is.na(indata) | indata == "na" | indata == "-999" | indata == "0")))
names(indata(which(0.3>=null_counter / nrow(indata))))

I get the following errors:

Error in match.fun(FUN) : '2' is not a function, character or symbol

and:

Error: could not find function "indata"

Ideally, I want it to give me a vector or list of the column names where the percentage of null values (NA, -999, 0, NULL) is over 30%.

Can anyone help?

I believe you want to use apply rather than lapply, which applies a function to a list. Try this:

null_counter <- apply(indata, 2, function(x) length(which(x == "" | is.na(x) | x == "na" | x == "-999" | x == "0"))/length(x))
null_name <- colnames(indata)[null_counter >= 0.3]
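One thing to watch for: apply() coerces a data frame to a matrix first, and with mixed column types that is a character matrix in which numeric values can be padded with spaces, so a comparison such as x == "0" may miss some entries. Because a data frame is itself a list of columns, a column-wise variant with vapply() avoids that coercion. The following is a sketch under assumptions, not a drop-in replacement: the toy `indata`, its column names, and the `null_strings` vector are made up for illustration.

```r
# Toy stand-in for the real CSV (column names and values are made up).
indata <- data.frame(
  id    = 1:10,
  year  = c(2001, -999, 0, NA, 1999, -999, 2005, 0, 2010, 2012),
  label = c("a", "", "b", "c", NA, "d", "e", "f", "g", "h"),
  stringsAsFactors = FALSE
)

# Assumed placeholder values, taken from the one-column check in the question.
null_strings <- c("", "na", "-999", "0")

# vapply() visits each column separately, so every column keeps its own type;
# as.character() is applied per column, which avoids the padding issue.
null_counter <- vapply(
  indata,
  function(x) mean(is.na(x) | as.character(x) %in% null_strings),
  numeric(1)
)

# Names of the columns whose null fraction is at least 30%.
null_name <- names(null_counter)[null_counter >= 0.3]
null_name  # "year"
```

Here `year` has 5 null-like entries out of 10 (0.5), `label` has 2 (0.2), and `id` has none. Use `> 0.3` instead of `>= 0.3` if only columns strictly over 30% should be reported.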
