COMP111 Lab Session 4

Data Processing with grep, sort and uniq

The trick is that the delimiter is '*', which is a shell metacharacter in UNIX and thus has to be quoted (here, with single quotes).
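As a minimal sketch of why the quoting matters, the following runs cut on a made-up two-line sample. The record layout (name*student ID*email) and the sample values are assumptions for illustration, not the real lab4.data. Without quoting, the shell would try to expand * into the file names in the current directory before cut ever sees it.

```shell
# Hypothetical sample records in an assumed "name*student ID*email" layout.
sample=$(mktemp)
printf '%s\n' 'Chan Tai Man*9917940*cstm@cs.ust.hk' \
              'Lee Siu Ming*9812345*eesm@home.ust.hk' > "$sample"

# Single-quoted: cut receives a literal '*' as its delimiter.
ids=$(cut -d '*' -f2 "$sample")
echo "$ids"

# A backslash protects the '*' from the shell just as well.
ids_escaped=$(cut -d \* -f2 "$sample")
echo "$ids_escaped"

rm -f "$sample"
```

Both forms hand the same one-character delimiter to cut, so either can be used.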

  • Sort the file in ascending order according to the year of entry of the students.

    sort -t '*' -k 2,2n lab4.data
    

  • Output student IDs which are duplicated in the file, as in the following:

    9917940
    

    cut -d '*' -f2 lab4.data | sort -n | uniq -d
    
  • List the most popular last name and the number of students using that last name, as in the following:

       6 Chan
    

    cut -d ' ' -f1 lab4.data | sort | uniq -c | sort -rn | head -n 1
    
  • Count how many email addresses use each domain name (the part after the @ sign). Generate the output as in the following:

      11 cs.ust.hk
       9 home.ust.hk
       5 ismt.ust.hk
       5 stu.ust.hk
    

    cut -d '@' -f2 lab4.data | sort | uniq -c
    

  • Put the commands and the output into lab4.out.

    Finally, mail "lab4.out" to the lab TA.
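The pipelines above can be tried out on a small made-up file before running them on lab4.data. This is only a sketch: the record layout (name*student ID*email), the sample values, and the variable names are assumptions for illustration.

```shell
# Hypothetical records in an assumed "name*student ID*email" layout.
data=$(mktemp)
cat > "$data" <<'EOF'
Chan Tai Man*9917940*cstm@cs.ust.hk
Lee Siu Ming*9812345*eesm@home.ust.hk
Chan Ka Yan*9756789*csky@cs.ust.hk
Wong Mei Ling*9917940*isml@ismt.ust.hk
EOF

# Whole records ordered numerically by the second '*'-separated field.
sorted=$(sort -t '*' -k 2,2n "$data")

# IDs that occur more than once (uniq -d needs sorted input).
dups=$(cut -d '*' -f2 "$data" | sort -n | uniq -d)

# Most frequent last name: count each name, then order by count, largest first.
top=$(cut -d ' ' -f1 "$data" | sort | uniq -c | sort -rn | head -n 1)

# Number of addresses per email domain.
domains=$(cut -d '@' -f2 "$data" | sort | uniq -c)

printf '%s\n' "$sorted" "$dups" "$top" "$domains"
rm -f "$data"
```

Note that uniq only compares adjacent lines, which is why every pipeline sorts before calling it.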

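One possible way to collect both the commands and their output in a single file is a small helper function. This is a sketch under assumptions: the function name record and the temporary output file are hypothetical, and in the lab the target would be lab4.out and the commands would be the ones from the exercises.

```shell
# Temporary file standing in for lab4.out.
out=$(mktemp)

# Append the command text, then its output, to "$out".
record() {
    printf '$ %s\n' "$1" >> "$out"
    sh -c "$1" >> "$out" 2>&1
}

# Example command; any of the pipelines from the exercises could go here.
record "printf 'b\na\n' | sort"

captured=$(cat "$out")
echo "$captured"
rm -f "$out"
```

The script(1) command is another common way to record a terminal session, if it is available on the lab machines.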
