COMP111 Lab Session 5

This lab consists of two parts:

Overview of the solution for homework 1.
Further exercises on data processing.

Discussion of Homework 1

Pay attention to:

Open and close several connections, one at a time, within the same ftp session.
Bugs in xhost on several Solaris machines in lab 2.
Use a combination of head and tail to extract lines from a file.
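
As an illustration of the last point, here is a minimal sketch that extracts lines 3 to 5 of a file. The file name sample.txt and its contents are made up for this example and are not part of the homework.

```shell
# Create a hypothetical six-line sample file.
printf 'one\ntwo\nthree\nfour\nfive\nsix\n' > sample.txt

# head keeps the first 5 lines; tail then keeps the last 3 of those,
# which leaves lines 3, 4 and 5 of the original file.
middle=$(head -n 5 sample.txt | tail -n 3)
echo "$middle"
```

On older systems head and tail may expect the count in the form -5 and -3 rather than -n 5 and -n 3; check the manpages on the lab machines.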

Small Exercises on Shell Variables, cut and grep

ypcat passwd lists the password file from the system's Network Information Service (NIS), but DON'T run it on its own now because the file is very large!
Use ypcat and other UNIX commands to list all records that contain Ho as an individual word in the fifth field. List only the fifth field.
Repeat the above, but do not show names with Ho as a middle name.
Define a variable FIRST with value Dik and a variable LAST with value Lee.
Use ypcat, grep and the two variables to find all records containing Dik Lee.
Define a variable ALEX with value Alex Kean's Project.
Find the userid of that project from the passwd file.

Note: While attempting these short questions, don't just proceed by trial and error. When something does not work, try to understand why.
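
The following sketch shows the general pattern of combining a shell variable with grep and cut. It deliberately uses a different question (who uses a given login shell?) and a small made-up file, sample.passwd, instead of the real ypcat output; the variable name WANTED and all records are assumptions for illustration only.

```shell
# Hypothetical passwd-style records (seven colon-separated fields each).
cat > sample.passwd <<'EOF'
alice:x:1001:100:Alice Wong:/home/alice:/bin/csh
bob:x:1002:100:Bob Chan:/home/bob:/bin/sh
carol:x:1003:100:Carol Lam:/home/carol:/bin/csh
EOF

# Define a variable; there must be no spaces around the = sign.
WANTED=/bin/csh

# Inside double quotes the variable is expanded, while \$ stays a literal $
# that anchors the pattern at the end of the line (the seventh field).
# cut then keeps only the first colon-separated field, the login name.
users=$(grep ":$WANTED\$" sample.passwd | cut -d: -f1)
echo "$users"
```

With single quotes around the pattern the variable would not be expanded, which is one common reason why a command of this kind finds nothing.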

Analysis of Word Frequencies in a Passage

One powerful feature of Unix is its filter and pipe mechanism: the output of one command can be piped into another, and so on through a long chain of filters, until the desired result appears on the standard output or is redirected to a file. In this exercise you will work through a real problem to gain first-hand experience of the mindset you need to develop to exploit this feature fully.
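
A small sketch of such a chain, unrelated to the exercise itself; the word list and the file name first_two.txt are made up for this example.

```shell
# Three filters chained together: printf produces four lines, sort orders
# them alphabetically, and head keeps only the first two.
first_two=$(printf 'pear\napple\nfig\nbanana\n' | sort | head -n 2)
echo "$first_two"

# The same chain with its result redirected to a file instead of the screen.
printf 'pear\napple\nfig\nbanana\n' | sort | head -n 2 > first_two.txt
```

Each command in the chain reads only its standard input and writes only its standard output, so any stage can be replaced or extended without touching the others.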

New command in this lab: tr

Notice that there are two versions of tr: one stored in /usr/ucb/tr and the other in /usr/bin/tr; another copy identical to /usr/bin/tr may be stored in /bin/tr as well (so there may actually be more than 2 copies!).
Which command can be used to find out which copy of tr you would run if you just type tr at the Unix prompt?
If you want tr to refer to /usr/ucb/tr by default, what would you have to do? (Note: you don't need to make any change to your shell; just give the answer in words.)

Given a text file, how can we determine the frequency of every word that appears in it? Consider the following text:

What is Life?<tab>Perhaps,we will never know what life is!

We would like to obtain the following output:

2 is
2 life
2 what
1 know
1 never
1 perhaps
1 we
1 will

You may think of writing a C/C++ program to solve this problem. We will guide you to write a pipeline of Unix commands that can achieve the same result.

1. Create the file "lab5.dat" with the following content:

What is Life?<tab>Perhaps,we will never know what life is!

2. The first step is to put each word in "lab5.dat" on a line of its own. That is, we need to convert any sequence of nonletters, such as spaces, tabs, and punctuation marks, into a single newline. The "tr" command serves this purpose. Since we did not cover this command in class, we will give you the answer. Your task is to understand the answer by checking the manpage of "tr". To get the manpage for /usr/ucb/tr, use the command man -s 1b tr

/usr/ucb/tr -sc '[A-Z][a-z]' '\012' < lab5.dat > lab5.out1
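
To see what the -c and -s options contribute, it may help to try a similar command on a different sentence. The sketch below is an assumption-laden variant, not the lab answer: it calls whichever tr is first on your PATH and writes the range as 'A-Za-z' without brackets, a form accepted by BSD-style and GNU versions of tr. The sample sentence is made up.

```shell
# -c: work on the complement of the set A-Za-z, i.e. every nonletter.
# -s: squeeze each run of the resulting newlines into a single newline.
# '\012' is the octal code for newline.
words=$(printf 'To be, or not to be?\n' | tr -sc 'A-Za-z' '\012')
echo "$words"
```

The comma and the space after "be" form one run of nonletters, so they produce one newline rather than an empty line. Note that tr implementations differ in how they handle a second string that is shorter than the first, so the same command may behave differently under /usr/bin/tr; the manpage is the authority for each version.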

3. The output in "lab5.out1" contains words such as "What" and "what", and "Life" and "life". We treat words that differ only in case as identical. What command can you use to convert all capital letters to small letters? Save your output to the file "lab5.out2".

4. The content of "lab5.out2" should be:

what
is
life
perhaps
we
will
never
know
what
life
is

Two commands that we've covered in class can be used to produce the following output from "lab5.out2".

2 is
1 know
2 life
1 never
1 perhaps
1 we
2 what
1 will

The first field of each line is the frequency of occurrence of the second field. Determine what these two commands are, execute them, and save your output as "lab5.out3".

5. Now sort the contents of "lab5.out3" in descending order of the frequency field. Your final result is:

2 is
2 life
2 what
1 know
1 never
1 perhaps
1 we
1 will

6. Combine the commands from steps 2 to 5 inclusive into a pipeline. Save the final result to the file "lab5.ans". At the same time, the final result should be displayed on the standard output.

7. Add the pipeline of commands used in step 6 to the file "lab5.ans" and mail the file to the TA.
