R - Subset of a table that contains at least one element of another table
I have 2 tables made of intervals of bp. Table 1 has large intervals and the second has short intervals (just 2 bp). I want to make a new table that contains only the table 1 ranges that have at least 1 element of table 2 contained in the "large" ranges. If no element of table 2 corresponds to a table 1 range, that range of table 1 should not be included.
In this example, row 2 (1, 600, 1500) of table1 (df) should not be included:
    df <- "chromosome start end
    1 1 450
    1 600 1500
    2 3500 3585
    2 7850 10000"
    df <- read.table(text=df, header=TRUE)
table2 (df2):
    df2 <- "chromosome start end
    1 5 6
    1 598 599
    2 3580 3581
    2 7851 7852
    2 7859 7860"
    df2 <- read.table(text=df2, header=TRUE)
The new table (dfout):
    dfout <- "chromosome start end
    1 1 450
    2 3500 3585
    2 7850 10000"
    dfout <- read.table(text=dfout, header=TRUE)
Try foverlaps from data.table:
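    library(data.table)
    # df1 is table1 from the question (called df there); setDT() converts in place
    setkey(setDT(df1), chromosome, start, end)
    setkey(setDT(df2), chromosome, start, end)
    # keep chromosome plus df1's own interval (columns 4:5), then deduplicate
    setnames(unique(foverlaps(df1, df2, nomatch=0)[, c(1,4:5), with=FALSE]), names(df1))[]
    #    chromosome start   end
    # 1:          1     1   450
    # 2:          2  3500  3585
    # 3:          2  7850 10000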
Or, as @Arun commented, you can use which=TRUE (to extract the matching indices) and subset 'df1' using the yid column.
    df1[unique(foverlaps(df2, df1, nomatch=0L, which=TRUE)$yid)]
    #    chromosome start   end
    # 1:          1     1   450
    # 2:          2  3500  3585
    # 3:          2  7850 10000
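For comparison, the same filter can be sketched in base R without data.table. This is only an illustration, not part of the original answer: it rebuilds the example tables under the names df1 and df2, and the helper vector has_hit is a name chosen here. It tests strict containment (a df2 interval lying entirely inside a df1 range), whereas foverlaps by default matches any overlap; on this example data the two criteria give the same rows.

```r
# Rebuild the two example tables from the question as plain data frames.
df1 <- read.table(header = TRUE, text = "chromosome start end
1 1 450
1 600 1500
2 3500 3585
2 7850 10000")
df2 <- read.table(header = TRUE, text = "chromosome start end
1 5 6
1 598 599
2 3580 3581
2 7851 7852
2 7859 7860")

# For each row of df1, check whether any df2 interval on the same
# chromosome lies entirely inside [start, end] of that row.
has_hit <- vapply(seq_len(nrow(df1)), function(i) {
  any(df2$chromosome == df1$chromosome[i] &
      df2$start >= df1$start[i] &
      df2$end <= df1$end[i])
}, logical(1))

# Keep only the df1 ranges that contain at least one df2 interval.
dfout <- df1[has_hit, ]
print(dfout)
```

The loop compares every df1 row against all of df2, so it is fine for small tables but scales poorly; for large interval sets the keyed foverlaps approach is likely the better choice.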