Thursday, January 17, 2013

A sort -u alternative in awk for few distinct items and many dups

| awk '{a[$0]=1} END{for (k in a) print k}'
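
Note that awk's "for (k in a)" visits keys in an unspecified order, so the one-liner above removes duplicates but does not guarantee sorted output. If sorted output is needed, one possible approach is to dedupe with awk first and then sort only the few distinct lines that remain. The sketch below assumes that is acceptable for your data; the function name awk_sort_u is made up for this example, and it uses the common '!seen[$0]++' idiom instead of the END loop.

```shell
# awk_sort_u is a hypothetical helper name, not a standard tool.
# awk prints each line only the first time it is seen, so sort
# receives just the distinct lines instead of the whole input.
awk_sort_u() {
  awk '!seen[$0]++' "$@" | sort
}

# Reads stdin when no file arguments are given.
printf 'b\na\nc\na\nb\n' | awk_sort_u
```

It also accepts file names, e.g. awk_sort_u few_items_many_dups.txt. Since sort then handles only a handful of lines, its cost should be negligible next to the awk pass.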

Performance test:
# create a file with few items and many dups
$ for i in {1..1000000}; do printf "a\nb\nc\n" >> few_items_many_dups.txt; done

# sort uniq using awk
$ time cat few_items_many_dups.txt | awk '{a[$0]=1} END{for (k in a) print k}' > few_items_many_dups.txt.awk_sortuniq
real    0m1.291s


# sort uniq using sort -u
$ time cat few_items_many_dups.txt | sort -u > few_items_many_dups.txt.sortuniq
real    0m36.726s