r - Subset of a table that contains at least one element of another table -

- January 15, 2013

I have two tables of intervals measured in bp. Table 1 has large intervals and table 2 has short intervals (just 2 bp). I want to make a new table that contains only the table 1 ranges that have at least one element of table 2 contained in them. If no element of table 2 falls within a table 1 range, that range should not be included.

In the example, row 2 (1, 600, 1500) of table 1 (df) should not be included:

    df <- "chromosome start end
    1 1 450
    1 600 1500
    2 3500 3585
    2 7850 10000"
    df <- read.table(text=df, header=TRUE)

Table 2 (df2):

    df2 <- "chromosome start end
    1 5 6
    1 598 599
    2 3580 3581
    2 7851 7852
    2 7859 7860"
    df2 <- read.table(text=df2, header=TRUE)

New table (dfout):

    dfout <- "chromosome start end
    1 1 450
    2 3500 3585
    2 7850 10000"
    dfout <- read.table(text=dfout, header=TRUE)

Try foverlaps from data.table:

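    library(data.table)
    df1 <- df  # the question's first table, referred to as df1 below
    setkey(setDT(df1), chromosome, start, end)
    setkey(setDT(df2), chromosome, start, end)
    # keep the chromosome plus the start/end of the df1 range (columns 1, 4, 5)
    setnames(unique(foverlaps(df1, df2, nomatch=0)[, c(1,4:5), with=FALSE]),
             names(df1))[]
    #   chromosome start   end
    #1:          1     1   450
    #2:          2  3500  3585
    #3:          2  7850 10000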

Or, as @Arun commented, you can use which=TRUE (to extract the indices) and subset 'df1' using the yid column.

    df1[unique(foverlaps(df2, df1, nomatch=0L, which=TRUE)$yid)]
    #    chromosome start   end
    #1:          1     1   450
    #2:          2  3500  3585
    #3:          2  7850 10000
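For comparison, the same filter can be sketched in base R without data.table. This is only an illustration and not part of the original answer: it assumes "contained" means the short interval lies entirely inside the large range on the same chromosome, and the name has_hit is made up for this example. It compares every large range against every short interval, so it will likely be much slower than foverlaps on big tables.

```r
df1 <- read.table(header = TRUE, text = "
chromosome start end
1 1 450
1 600 1500
2 3500 3585
2 7850 10000")

df2 <- read.table(header = TRUE, text = "
chromosome start end
1 5 6
1 598 599
2 3580 3581
2 7851 7852
2 7859 7860")

# For each large range, check whether any short interval on the same
# chromosome lies entirely inside it (assumed meaning of "contained").
has_hit <- vapply(seq_len(nrow(df1)), function(i) {
  any(df2$chromosome == df1$chromosome[i] &
      df2$start >= df1$start[i] &
      df2$end   <= df1$end[i])
}, logical(1))

# Keep only the large ranges with at least one hit.
dfout <- df1[has_hit, ]
print(dfout)
```

With the example data this keeps rows 1, 3 and 4 of df1 and drops the (1, 600, 1500) range, matching the expected dfout.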
