R - Subset of a table that contains at least one element of another table
I have two tables made of intervals of bp. Table 1 has large intervals and table 2 has short intervals (just 2 bp). I want to make a new table that contains only the table 1 ranges that have at least one element of table 2 contained in them. If no element of table 2 falls within a table 1 range, that range should not be included.
In this example, row 2 (1, 600, 1500) of table 1 (df) should not be included:
    df <- "chromosome start end
    1 1 450
    1 600 1500
    2 3500 3585
    2 7850 10000"
    df <- read.table(text=df, header=TRUE)

Table 2 (df2):

    df2 <- "chromosome start end
    1 5 6
    1 598 599
    2 3580 3581
    2 7851 7852
    2 7859 7860"
    df2 <- read.table(text=df2, header=TRUE)

New table (dfout):

    dfout <- "chromosome start end
    1 1 450
    2 3500 3585
    2 7850 10000"
    dfout <- read.table(text=dfout, header=TRUE)
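Try foverlaps from data.table (here df1 is table 1, called df in the question):

    library(data.table)
    # foverlaps needs both tables keyed, with the interval columns last in the key
    setkey(setDT(df1), chromosome, start, end)
    setkey(setDT(df2), chromosome, start, end)
    # keep the chromosome and the df1 interval columns, drop duplicate matches
    setnames(unique(foverlaps(df1, df2, nomatch=0)[, c(1,4:5), with=FALSE]), names(df1))[]
    #    chromosome start   end
    # 1:          1     1   450
    # 2:          2  3500  3585
    # 3:          2  7850 10000

Or, as @Arun commented, you can use which=TRUE (to extract the indices) and subset 'df1' using the yid column.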

    df1[unique(foverlaps(df2, df1, nomatch=0L, which=TRUE)$yid)]
    #    chromosome start   end
    # 1:          1     1   450
    # 2:          2  3500  3585
    # 3:          2  7850 10000
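
For comparison, the same filter can be sketched in base R without data.table. This is only an illustrative cross-check, not part of the original answer. The name has_hit is made up for this sketch, and the first table is named df1 to match the answer. It tests strict containment (the short interval lies entirely inside the large one), whereas foverlaps with its default type="any" matches any overlap. The two agree on this example data but could differ when a short interval straddles a range boundary.

```r
# Table 1: large intervals (named df1 to match the answer)
df1 <- read.table(text = "chromosome start end
1 1 450
1 600 1500
2 3500 3585
2 7850 10000", header = TRUE)

# Table 2: short intervals
df2 <- read.table(text = "chromosome start end
1 5 6
1 598 599
2 3580 3581
2 7851 7852
2 7859 7860", header = TRUE)

# For each row of df1, check whether any df2 interval on the same
# chromosome lies entirely inside [start, end] of that row.
# has_hit is a hypothetical helper name used only in this sketch.
has_hit <- vapply(seq_len(nrow(df1)), function(i) {
  any(df2$chromosome == df1$chromosome[i] &
      df2$start >= df1$start[i] &
      df2$end   <= df1$end[i])
}, logical(1))

# Keep only the df1 ranges that contain at least one df2 interval
dfout <- df1[has_hit, ]
print(dfout)
```

This loops over the rows of df1 and scans all of df2 each time, so it is fine for small tables, but the keyed foverlaps join is likely the better choice for genome-scale data.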