Approximation Algorithms -- COMP 670J

Class Project FAQ

This file contains tips and answers to frequently asked question concerning the COMP670J project. It will be updated frequently in response to your questions so I suggest that, at least for the first week or so, you check it every few days for additions.

LAST UPDATE: Mon Dec 3 11:15:14 HKT 2001

Linear Programming: There are many LP solvers available in books and on the web. One possible source is the book Numerical Recipes in C, which contains a general purpose simplex routine (the ``simplex method'' is one of the standard ways of solving LP problems). The book is available in the library and an older version is available on line here. (The simplex code is in section 10.8)

Another place to look is here which has links to an object-oriented simplex library. This looks more complicated to use than the Numerical Recipes code, though. Yet another place to look is the linear programming FAQ.

In the TSP project you will need to implement MST, Euler tour and Minimum cost weighted matching algorithms as subroutines. MST can be found in most textbooks on algorithms, e.g., the Cormen, Leiserson, Rivest, ``Introduction to Algorithms'' book. That same book has Euler Tour'' as an exercise (page 496) with hints. The Minimum cost weighted matching algorithm is the hardest of the three and, in my opinion, is where most of the work in this project will be needed. The best description of the algorithm that I know of is in the book ``Combinatorial Optimization'' by Steiglitz and Papadimitriou, Chapter 11, section 11.3, if you have trouble finding the book please come and see me to photocopy the relevant section.

You can also check out this code. It contains code for a weighted matching solver (it solves for max weighted matching but you can easily use this for min matching by negating the edge costs). I haven't tested it myself but it looks as if it should be useful for you. One warning; you'll have to compile a package.

In the Shortest Superstring project you will need to implement a Minimum cost bipartite weighted matching algorithm. The algorithm for this problem (much easier than the one for the general case) can also be found in Steiglitz and Papadimitriou, Chapter 11, this time in section 11.2. This problem is sometimes also known as the assignment problem and the algorithm for solving it, the Hungarian algorithm. I've been told that the book The Stanford GraphBase by Donald Knuth, available in our library, also contains good Hungarian algorithm code. Of course, you can also use the code for solving the general min-cost weighted matching problem to solve the bipartite case as well (it's just longer and more compkicated code).

The Stony Brook Algorithm Repository is a good place to look for algorithm code. In particular their linear program section (1.2.6) has pointers to various LP solvers.

In the SONET Ring paper there is a typo in the proof of Proposition 4.2 on page 6. The 9th line of the page starts with M = D_{g,h} - C_g - C_h. This is a mistake. It should be M = C_g + C_h - D_{g,h}.$

For this project you should be aware that even though the algorithm is not difficult the paper is not that easy to read; it will occasionally skip steps that you have to fill in for yourselves. If you are doing this project and are having difficulty with the paper please come and see me and we can work through the bottlenecks.

Added Mon Dec 3 11:15:14 HKT 2001

Q: What do you mean by a short description of how you will generate the data upon which you will test the performance of your program?

A: You should send me two things.
(i) for each set of parameters describe how you will generate the random data for the set of parameters.
Please address any issues that you considered when choosing this data generation method, e.g., how
well the data will test the approximation algorithm.

As an example, in Max-Sat the parameters are m = number of variables and n = number of clauses.
In this case you need to describe how for any given (m,n) you will generate a random instance.
An issue that you need to consider here is how to choose the number of variables in a particular
clause. If, for example, you let every clause have greater than 8 variables then, from the analaysis in the notes, we see that every clause has probability > .99 of being satsified. So, taking instances with clauses of length > 8 will not really test the differences between the random algorithm and best-of-two very well; you need to somehow guarantee many short clauses. In particular, the strategy of letting a variable have 50% probability of appearing in each clause is not a good generation method if the number of variable is > 16 or so. Max-Sat is is just an example, many of the projects have similar considerations.

(ii) After describing how you will generate a random data set for a particular set of parameters you will need to also specify for what values of the parameters you will generate the data sets and, how many times for each set of parameters (I recommend at least 10). As an example, in Max-Sat you might
decide to let m=3, 6, 9, 12 15, 25, 30 n = 5, 10, 15, 20, 25,30, 35, 40, 45, 50, 55, 60 and, for each,
(m,n) pair, generate 10 random instances.

If, after writing your code you find that for some reson you want to run it on more datasets, that
would be fine; if you want to not run it on some of the data sets you describe then,in your report, you will need to explain why, e.g., you did not realize in advance the limitations of the LP code that you would be using.