PhD Qualifying Examination

"A Survey on the Fringe of the Similarity World"

By

Mr. Yo-Sub Han

Abstract:

The similarity of two objects plays a crucial role in applications such as face identification in security systems, character recognition on handheld devices, and document version control in document warehouses. The study of similarity began with string comparison in the 1970s and has since extended to tree and graph similarity.

Fast networks and fast machines have made the web so popular that a tremendous number of documents are now online. A web document written in a markup language such as HTML or XML has a tree structure. Since the World Wide Web Consortium adopted XML as the method for specifying web-based structured documents, many online documents have been written in XML. An XML DTD defines a grammar that prescribes how documents should be structured.

To process and manage a huge number of web documents, it is important to categorize them by grouping similar documents together. The question then is how to define the similarity between two XML documents. We suggest two approaches: one is to compare the grammars (the DTDs, in XML) of the documents, and the other is to compare the documents themselves when no grammar is provided. For the former, we study the structural properties of automata for regular expressions; for the latter, we look at tree, graph, and structured document similarities. We end the survey by presenting some open problems in web document similarity.

Date: Tuesday, 27 January 2004
Time: 10:00 a.m. - 12:00 noon
Venue: Room 2302 (lifts 17-18)

Committee Members:
Prof. Derick Wood (Supervisor)
Prof. Mordecai Golin (Chairperson)
Prof. Siu-Wing Cheng
Prof. Rudolf Fleischer

**** ALL are Welcome ****