How to compare 2 lists of ranges in bash?
Answer #1 100 %It depends on how big your files are, of course. If they are not big enough to exhaust the memory, you can try this 100% bash solution:
declare -a min=() # array of lower bounds of ranges
declare -a max=() # array of upper bounds of ranges
# read ranges in second file, store then in arrays min and max
while read a b; do
min+=( "$a" );
max+=( "$b" );
done < file2
# read ranges in first file
while read a b; do
# loop over indexes of min (and max) array
for i in "${!min[@]}"; do
if (( max[i] >= a && min[i] <= b )); then # if ranges overlap
echo "${min[i]} ${max[i]}" # print range
unset min[i] max[i] # performance optimization
fi
done
done < file1
This is just a starting point. There are many possible performance / memory footprint improvements. But they strongly depend on the sizes of your files and on the distributions of your ranges.
EDIT 1: improved the range overlap test.
EDIT 2: reused the excellent optimization proposed by RomanPerekhrest (unset already printed ranges from file2
). The performance should be better when the probability that ranges overlap is high.
EDIT 3: performance comparison with the awk
version proposed by RomanPerekhrest (after fixing the initial small bugs): awk
is between 10 and 20 times faster than bash
on this problem. If performance is important and you hesitate between awk
and bash
, prefer:
awk 'NR == FNR { a[FNR] = $1; b[FNR] = $2; next; }
{ for (i in a)
if ($1 <= b[i] && a[i] <= $2) {
print a[i], b[i]; delete a[i]; delete b[i];
}
}' file2 file1