This lab consists of two parts:
Pay attention to:
Note: When attempting these short questions, do not rely on trial and error alone. When something does not work, try to understand why.
One powerful feature of Unix is its filter and pipe mechanism: the output of one command can be piped into another, and so on through a long chain of filters, until the desired result appears on the standard output or is redirected to a file. In this exercise you will work through a real problem to gain first-hand experience of the mindset you need to develop to exploit this feature fully.
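As a generic illustration of the mechanism, here is a minimal sketch on made-up data that has nothing to do with the lab file. The function name count_matches is a hypothetical helper chosen for this example only.

```shell
# count_matches is a hypothetical helper name used only for illustration.
count_matches() {
    # printf emits four lines; grep keeps only the lines containing "alpha";
    # wc -l counts the lines that survive the filter.
    printf 'alpha\nbeta\ngamma\nalpha beta\n' | grep 'alpha' | wc -l
}

# Each "|" connects the standard output of the command on its left
# to the standard input of the command on its right.
count_matches
```

Two of the four input lines contain "alpha", so the count reported is 2 (some systems pad the number with leading spaces). Appending "> somefile" to the last command would send the result to a file instead of the terminal.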
New command in this lab: tr
Given a text file, how can we determine the frequency of every word that appears in it? Consider the following text:
What is Life?<tab>Perhaps,we will never know what life is!
We would like to obtain the following output:
2 is
2 life
2 what
1 know
1 never
1 perhaps
1 we
1 will
You may think of writing a C/C++ program to solve this problem. We will guide you to write a pipeline of Unix commands that can achieve the same result.
1. Create the file "lab5.dat" with the following content:
What is Life?<tab>Perhaps,we will never know what life is!
2. The first step is to put each word in "lab5.dat" on a line of its own. That is, we need to convert every sequence of non-letters, such as spaces, tabs, and punctuation marks, into a single newline. The "tr" command serves this purpose. Since we did not cover this command in class, we give you the answer. Your task is to understand the answer by checking the manpage of "tr". To get the manpage for /usr/ucb/tr, use the command: man -s 1b tr
/usr/ucb/tr -sc '[A-Z][a-z]' '\012' < lab5.dat > lab5.out1
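To help with reading the manpage: -c complements the first set (so it selects everything that is not a letter), -s squeezes each run of repeated output characters into one, and '\012' is the octal code for newline. On systems that do not provide /usr/ucb/tr, the following sketch uses POSIX-style range syntax and is intended to have the same effect on input like ours. The function name split_words and the sample string are assumptions made for this illustration, and LC_ALL=C is set so that the ranges A-Z and a-z mean plain ASCII letters.

```shell
# split_words is a hypothetical helper name used only for illustration.
split_words() {
    # -c : use the complement of 'A-Za-z', i.e. every non-letter
    # -s : squeeze each run of resulting newlines into a single newline
    LC_ALL=C tr -cs 'A-Za-z' '\n'
}

# Sample input that is deliberately different from lab5.dat.
printf 'Hello, world!  Bye.\n' | split_words
```

The sample prints Hello, world and Bye on three separate lines. Note that input beginning with a non-letter would produce an empty first line.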
3. The output in "lab5.out1" contains words such as "What" and "what", and "Life" and "life". We treat words that differ only in case as identical. What command can you use to convert all uppercase letters to lowercase? Save your output to the file "lab5.out2".
4. The content of "lab5.out2" should be:
what
is
life
perhaps
we
will
never
know
what
life
is
Two commands that we've covered in class can be used to produce the following output from "lab5.out2".
2 is
1 know
2 life
1 never
1 perhaps
1 we
2 what
1 will
The first field of each line is the frequency of occurrence of the second field. Determine what these two commands are, execute them, and save your output as "lab5.out3".
5. Now sort the contents of "lab5.out3" in descending order of the frequency field. Your final result should be:
2 is
2 life
2 what
1 know
1 never
1 perhaps
1 we
1 will
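Notice that lines with equal frequency in the result above appear in alphabetical order. As a hedged sketch of how a sort on more than one key can be expressed, the example below uses made-up data rather than "lab5.out3". The function name rank_counts is hypothetical, and this is only one possible way to request such an ordering.

```shell
# rank_counts is a hypothetical helper name used only for illustration.
rank_counts() {
    # -k1,1nr : compare the first field numerically, largest first
    # -k2,2   : break ties on the second field, in ascending order
    LC_ALL=C sort -k1,1nr -k2,2
}

# Made-up input: a count followed by a label on each line.
printf '1 b\n2 z\n2 a\n1 a\n' | rank_counts
```

For this input the lines come out as "2 a", "2 z", "1 a", "1 b": the counts decrease, while labels with the same count stay in alphabetical order.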
6. Combine the commands from steps 2 to 5 inclusive into a single pipeline. Save the final result to the file "lab5.ans". At the same time, the final result should be displayed on the standard output.
7. Add the pipeline of commands used in step 6 to the file "lab5.ans" and mail it to the TA.
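One standard tool for displaying a result and saving it in the same pipeline is "tee", which copies its standard input both to a named file and to its standard output. The sketch below demonstrates it on made-up data. The variable out_file and the use of mktemp for a scratch file are assumptions made for this illustration.

```shell
# out_file is a hypothetical scratch file; any writable path would do.
out_file=$(mktemp)

# sort -r reverses the order of the two lines; tee then writes them to
# "$out_file" and also passes them through to standard output.
printf 'one\ntwo\n' | LC_ALL=C sort -r | tee "$out_file"
```

After this runs, the terminal shows "two" followed by "one", and the scratch file holds the same two lines.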