Joint Embeddings of Chinese Words, Characters, and Fine-grained Subcharacter Components
MPhil Thesis Defence

Title: "Joint Embeddings of Chinese Words, Characters, and Fine-grained Subcharacter Components"

By Mr. Jinxing YU

Abstract

Word embedding has attracted much attention recently, given its simple word representation and its ability to generalize to many downstream tasks. Unlike words in alphabetic writing systems such as English, Chinese characters are often composed of subcharacter components that are also semantically informative. In this paper, we propose an approach to jointly embed Chinese words together with their characters and fine-grained subcharacter components. We use three likelihoods to evaluate whether the context words, characters, and components can predict the current target word, and we collect 13,253 subcharacter components to demonstrate that the existing approaches to decomposing Chinese characters are insufficient. Evaluation on intrinsic word similarity and word analogy tasks, as well as on extrinsic downstream classification tasks, demonstrates the superior performance of our model.

Date: Monday, 20 November 2017
Time: 2:00pm - 4:00pm
Venue: Room 5510, Lifts 25/26

Committee Members:
Prof. Nevin Zhang (Supervisor)
Dr. Raymond Wong (Chairperson)
Dr. Yangqiu Song

**** ALL are Welcome ****