COMP111 Lab Session 4
Data Processing with grep, sort and uniq
The trick is that the delimiter is '*', which is a shell metacharacter
and therefore has to be single-quoted.
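To see why the quoting matters, here is a minimal sketch. The scratch directory and the file names a.txt and b.txt are made up for illustration and are not part of the lab files.

```shell
# Scratch directory holding two hypothetical files.
dir=$(mktemp -d)
touch "$dir/a.txt" "$dir/b.txt"

# Unquoted: the shell replaces * with the matching file names
# before the command ever sees it.
unquoted=$(cd "$dir" && echo *)

# Single-quoted: the command receives the literal character *.
quoted=$(cd "$dir" && echo '*')

echo "unquoted: $unquoted"   # unquoted: a.txt b.txt
echo "quoted:   $quoted"     # quoted:   *
```

The same expansion would happen to an unquoted * given to cut -d, so cut would receive a file name instead of the delimiter.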
Sort the file in ascending order
according to the year of entry of the students.
cut -d '*' -f2 lab4.data | sort -n
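Note that the command above prints only the second field. If whole records should be kept, sort can be told to use '*' as its field separator. The sketch below assumes records of the form "name*studentID*email" with the year of entry in the leading digits of the ID; that layout is inferred from the cut commands in this lab, and the three sample records are invented.

```shell
# Hypothetical sample records; the real lab4.data layout is an assumption.
printf '%s\n' \
  'Wong Siu Ming*9917940*smwong@cs.ust.hk' \
  'Chan Tai Man*9805123*tmchan@stu.ust.hk' \
  'Lee Ka Yan*9701456*kylee@home.ust.hk' > sample.data

# -t sets the field separator; -k2,2n sorts numerically on field 2 only,
# so each full record is kept and ordered by student ID.
sort -t '*' -k2,2n sample.data
```

With these records, the line for 9701456 comes first and the line for 9917940 comes last.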
Output student IDs which are duplicated in the file, as in the
following:
9917940
cut -d '*' -f2 lab4.data | sort -n | uniq -d
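The sort step is needed because uniq compares only adjacent lines. A small sketch with made-up IDs (the file name ids.txt is hypothetical) shows the difference:

```shell
# Hypothetical IDs; 9917940 appears twice, but not on adjacent lines.
printf '%s\n' 9917940 9805123 9917940 9701456 > ids.txt

# Without sorting, the two copies are not neighbours, so nothing is reported.
unsorted=$(uniq -d ids.txt)

# After sorting, the copies are adjacent and -d prints one line per duplicate.
sorted=$(sort -n ids.txt | uniq -d)

echo "without sort: '$unsorted'"   # without sort: ''
echo "with sort:    '$sorted'"     # with sort:    '9917940'
```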
List the most popular last name and the number of students using
that last name, as in the following:
6 Chan
cut -d ' ' -f1 lab4.data | sort | uniq -c | sort -rn | head -1
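The output of uniq -c is ordered by name, not by count, so the first line is the most popular name only if that name also sorts first alphabetically. A reverse numeric sort on the counts removes that dependency. The sketch below uses invented records and assumes the last name is the first space-separated word on each line.

```shell
# Hypothetical records: Chan appears 3 times, Wong twice, Au once.
printf '%s\n' \
  'Chan Tai Man*9805123*tmchan@stu.ust.hk' \
  'Wong Siu Ming*9917940*smwong@cs.ust.hk' \
  'Au Ka Ho*9701456*khau@home.ust.hk' \
  'Chan Mei Ling*9912345*mlchan@cs.ust.hk' \
  'Wong Ho Yin*9803210*hywong@cs.ust.hk' \
  'Chan Kwok Wai*9704321*kwchan@ismt.ust.hk' > names.data

# Alphabetical order: the first line is Au with a count of 1.
cut -d ' ' -f1 names.data | sort | uniq -c | head -n 1

# Ranked by count (largest first): the first line is Chan with a count of 3.
cut -d ' ' -f1 names.data | sort | uniq -c | sort -rn | head -n 1
```

The counts printed by uniq -c are padded with leading spaces; sort -n skips leading blanks, so the padding does not affect the ranking.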
How many domain names are used in the email addresses (the part after
the @ sign)? Generate the output as in the following:
11 cs.ust.hk
9 home.ust.hk
5 ismt.ust.hk
5 stu.ust.hk
cut -d '@' -f2 lab4.data | sort | uniq -c
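If only the number of distinct domains is wanted, rather than a count per domain, the unique list can be passed to wc -l. This is a sketch on invented records, assuming each line contains exactly one @ sign and ends with the domain.

```shell
# Hypothetical records with three distinct domains.
printf '%s\n' \
  'Chan Tai Man*9805123*tmchan@stu.ust.hk' \
  'Wong Siu Ming*9917940*smwong@cs.ust.hk' \
  'Au Ka Ho*9701456*khau@home.ust.hk' \
  'Chan Mei Ling*9912345*mlchan@cs.ust.hk' > mail.data

# Count per domain, as in the lab answer.
cut -d '@' -f2 mail.data | sort | uniq -c

# Number of distinct domains: sort -u keeps one copy of each, wc -l counts them.
# tr strips the padding some wc implementations add.
distinct=$(cut -d '@' -f2 mail.data | sort -u | wc -l | tr -d ' ')
echo "$distinct"   # 3
```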
Put the commands and the output into lab4.out.
Finally, mail "lab4.out" to the lab TA.