L1. Adaptive Block-Size Transform Based Just-Noticeable Difference Profile for Videos

Lin Ma and King N. Ngan

The Chinese University of Hong Kong, Hong Kong

In this paper, we propose a novel adaptive block-size transform (ABT) based just-noticeable difference (JND) model for videos. First, the ABT-based spatial JND profile is extended to a spatio-temporal JND model for videos by considering the temporal contrast sensitivity function (TCSF), eye movement, and the motion information of the objects in the video sequence. Furthermore, a metric named motion characteristics distance (MCD) is proposed to depict the similarity in motion characteristics between a macroblock and its corresponding sub-blocks. Based on the proposed MCD and the obtained spatial image content information, a novel balanced strategy is proposed to determine which transform size is employed to generate the resulting JND model. Experimental results demonstrate that the proposed scheme can tolerate more distortion while preserving better perceptual quality than other JND profiles, which means that the proposed model is consistent with the human visual system (HVS). Moreover, for the balanced strategy, experiments show that temporal motion characteristics accord very well with the spatial image content information, which demonstrates the efficiency of the proposed balanced strategy.

L2. Robust Joint Design of Linear Relay Precoder and Destination Equalizer for Dual-Hop Amplify-and-Forward MIMO Relay Systems

Chengwen Xing, Shaodan Ma, and Yik-Chung Wu

The University of Hong Kong

This paper addresses the problem of robust linear relay precoder and destination equalizer design for a dual-hop amplify-and-forward (AF) multiple-input multiple-output (MIMO) relay system, with Gaussian random channel uncertainties in both hops. By taking the channel uncertainties into account, two robust design algorithms are proposed to minimize the mean-square error (MSE) of the output signal at the destination. One is an iterative algorithm whose convergence is proved analytically. The other is an approximate closed-form solution with much lower complexity than the iterative algorithm. Although the closed-form solution involves a minor relaxation in the general case, when the column covariance matrix of the channel estimation error at the second hop is proportional to the identity matrix, no relaxation is needed and the proposed closed-form solution is the optimal solution. Simulation results show that the proposed algorithms reduce the sensitivity of AF MIMO relay systems to channel estimation errors, and perform better than the algorithm using estimated channels only. Furthermore, the closed-form solution provides performance comparable to that of the iterative algorithm.

L3. Improving Speech Recognition by Explicit Modeling of Phone Deletions

Tom Ko and Brian Mak

The Hong Kong University of Science and Technology

In a paper published by Greenberg in 1998, it was reported that in conversational speech, the phone deletion rate may go as high as 12% whereas the syllable deletion rate is about 1%. The finding prompted a new research direction of syllable modeling for speech recognition. To date, the syllable approach has not yet fulfilled its promise. On the other hand, there have been few attempts to model phone deletions explicitly in current ASR systems. In this paper, fragmented word models were derived from well-trained cross-word triphone models, and phone deletion was implemented by skip arcs for words consisting of at least four phonemes. An evaluation on the CSR-II WSJ1 Hub2 5K task shows that even with this limited implementation of phone deletions in read speech, we obtained a word error rate reduction of 6.73%.

L4. Gradient-Directed Image Composition for High Dynamic Range Imaging

Wei Zhang and Wai-Kuen Cham

The Chinese University of Hong Kong

In this paper, we present a simple but effective method that takes advantage of gradient information to tackle challenging high dynamic range (HDR) imaging tasks in both static and dynamic scenes. Given multiple images at different exposures, the proposed approach is capable of directly producing a pleasant and artifact-free, tone-mapped-like HDR image by compositing them under the guidance of gradient-based quality assessment. In particular, two novel quality measures, visibility and consistency, are developed based on observations of the gradient change among different exposures. Compared to previous work, the proposed method is more appealing in practice since it is computationally efficient and frees users from the tedious radiometric calibration and tone mapping steps. Experimental results in static and dynamic HDR tasks with various exposure sequences demonstrate the effectiveness of the proposed method.

L5. Secrecy Rate Maximization of A MISO Channel with Multiple Multi-Antenna Eavesdroppers via Semidefinite Programming

Qiang Li and Wing-Kin Ma

The Chinese University of Hong Kong

Advances in multi-antenna techniques have recently led to renewed interest in physical-layer secrecy, a topic that aims to prevent eavesdroppers from retrieving information intended for a legitimate user through physical-layer designs. This paper addresses a secrecy-rate maximization problem for the scenario of a multi-input single-output channel overheard by multiple multi-antenna eavesdroppers, e.g., in the downlink. This problem is non-convex and has no analytical solution. Through careful analysis and reformulation, we show that the secrecy-rate maximization problem has a convex equivalent in the form of a semidefinite program (SDP). We also prove that the respective optimal transmit covariance generally can yield a rank-one structure, implying that transmit beamforming is secrecy-rate optimal in the considered scenario. Simulation results are also provided to illustrate that the optimal transmit design solved by our SDP approach can yield significantly higher secrecy rates than an existing closed-form design.

L6. Speech Enhancement in Car Noise Environment Based on An Analysis-Synthesis Approach Using Harmonic Noise Model

R. F. Chen¹,², C. F. Chan¹, H. C. So¹, Jonathan S. C. Lee², and C. Y. Leung²

¹City University of Hong Kong

²Avantwave Limited

This paper presents a speech enhancement method for car noise environments, based on an analysis-synthesis framework using the harmonic noise model (HNM). The major advantages of this method are effective suppression of car noise even in very low signal-to-noise ratio environments and mitigation of the “musical tones” that are generally introduced by conventional methods. In this paper, we devise a complete analysis-synthesis based speech enhancement system, and give details of HNM modeling, parameter estimation, and car noise adaptation. Subjective evaluation results show that the proposed method exhibits better noise suppression than conventional approaches without obvious degradation of speech quality.

L7. Novel Directional Gradient Descent Searches for Fast Block Motion Estimation

Lai-Man Po, Ka-Ho Ng, Kwok-Wai Cheung, Ka-Man Wong, Yusuf Md. Salah Uddin, and Chi-Wang Ting

City University of Hong Kong

Search point pattern-based fast block motion estimation algorithms provide significant speedup for motion estimation but are easily trapped in local minima. This may lead to low robustness in prediction accuracy, particularly for video sequences with complex motions. This problem is especially serious in one-at-a-time search (OTS) and block-based gradient descent search (BBGDS), which provide very high speedup ratios. A multipath search using more than one search path has been proposed to improve the robustness of BBGDS, but the computational requirement is greatly increased. To tackle this drawback, a novel directional gradient descent search (DGDS) algorithm using multiple OTSs and gradient descent searches on the error surface in eight directions is proposed in this letter. The search point patterns in each stage depend on the minima found in these eight directions, and thus the global minimum can be traced more efficiently. In addition, a fast version of the DGDS (FDGDS) algorithm is also described to further improve the speed of DGDS. Experimental results show that DGDS reduces the computation load significantly compared with well-known fast block motion estimation algorithms. Moreover, FDGDS can achieve a higher speedup than the UMHexagonS algorithm in an H.264/AVC implementation while maintaining very similar rate-distortion performance.
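As a point of reference for the baseline mentioned above, the following is a minimal sketch of a greedy 3x3 block-based gradient descent search using the sum of absolute differences (SAD) as the matching error. It is not the DGDS or FDGDS algorithm, whose details the abstract does not give; the function names `sad` and `bbgds`, the `max_disp` window, and the stopping rule are assumptions made for illustration.

```python
import numpy as np

def sad(block, ref, top, left):
    """Sum of absolute differences between `block` and the same-sized patch of `ref`."""
    h, w = block.shape
    patch = ref[top:top + h, left:left + w]
    return int(np.abs(block.astype(np.int64) - patch.astype(np.int64)).sum())

def bbgds(block, ref, top, left, max_disp=7):
    """Greedy descent on the SAD surface, starting from zero displacement.

    Returns ((dy, dx), cost). Hypothetical helper, not the authors' code.
    """
    h, w = block.shape
    best = (0, 0)
    best_cost = sad(block, ref, top, left)
    while True:
        center = best
        # Check the 3x3 neighbourhood around the current centre.
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                cand = (center[0] + dy, center[1] + dx)
                y, x = top + cand[0], left + cand[1]
                # Skip candidates outside the search window or the frame.
                if max(abs(cand[0]), abs(cand[1])) > max_disp:
                    continue
                if y < 0 or x < 0 or y + h > ref.shape[0] or x + w > ref.shape[1]:
                    continue
                cost = sad(block, ref, y, x)
                if cost < best_cost:
                    best, best_cost = cand, cost
        if best == center:
            # No neighbour improved on the centre: a (possibly local) minimum.
            return best, best_cost
```

Because the search only ever moves to a strictly better neighbour, it stops at the first local minimum it reaches, which is the weakness the abstract attributes to this family of algorithms.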

P1. Closed-Form Power Allocation Scheme for Space-Time Coded Multiple-Antenna Systems with Imperfect CSI

Quan Kuang¹, Shu-Hung Leung¹ and Xiangbin Yu²

¹City University of Hong Kong

²Nanjing University of Aeronautics and Astronautics

This paper presents a closed-form power allocation scheme for space-time coded multiple-antenna systems with imperfect channel state information at the transmitter. The proposed scheme is based on a so-called compressed signal-to-noise ratio (CSNR) criterion, where a single compression factor is used to minimize the bit-error-rate (BER). The obtained closed-form solution is computationally efficient, achieves almost the same performance as the existing optimal approach, which requires numerical search, and outperforms the existing suboptimal closed-form algorithm.

P2. VISA: Versatile Impulse Structure Approximation for Time-Domain Linear Macromodeling

Chi-Un Lei and Ngai Wong

The University of Hong Kong

We develop a rational function macromodeling algorithm named VISA (Versatile Impulse Structure Approximation) for macromodeling of system responses with (discrete) time-sampled data. The ideas of the Walsh theorem and the complementary signal are introduced to convert the macromodeling problem into a non-pole-based Steiglitz-McBride (SM) iteration without initial guess and eigenvalue computation. We demonstrate the fast convergence and the versatile macromodeling requirement adoption through a P-norm approximation expansion, using examples from practical data.

P3. On Clock Synchronization Algorithms for Wireless Sensor Networks under Unknown Delay

Mei Leng and Yik-Chung Wu

The University of Hong Kong

In this paper, three clock synchronization algorithms for wireless sensor networks (WSNs) under unknown delay are derived. They include the maximum likelihood estimator (MLE), a generalization of an estimator from [15], and a novel low-complexity estimator. Their corresponding performance bounds are derived and compared, and their complexities are also analyzed. It is found that the MLE achieves the best performance at the price of high complexity. The generalized version of the estimator from [15] has low complexity, but its performance is degraded with respect to the MLE. On the other hand, the newly proposed estimator achieves the same performance as the MLE, and its complexity is at the same level as that of the generalized version of the estimator in [15].

P4. Semi-definite Programming Algorithms for Sensor Network Node Localization with Anchor Position Uncertainty

Kenneth W. K. Lui∗, W.-K. Ma†, H. C. So∗ and Frankie K. W. Chan∗

∗Department of Electronic Engineering, City University of Hong Kong

†Department of Electronic Engineering, The Chinese University of Hong Kong

Finding the positions of nodes in an ad hoc wireless sensor network (WSN) using incomplete and noisy distance measurements between nodes, together with anchor position information, is currently an important and challenging research topic. However, most WSN localization studies have assumed that the anchor positions are perfectly known, which is not a valid assumption in underwater and underground scenarios. In this paper, semi-definite programming (SDP) algorithms are devised for node localization in the presence of anchor position uncertainty. Computer simulations are included to contrast the performance of the proposed algorithms with the conventional SDP method and the Cramér-Rao lower bound (CRLB).

P5. Q-ary LDPC Decoder with Euclidean-distance-based Sorting Criterion

X. H. Shen and Francis C. M. Lau

The Hong Kong Polytechnic University

Q-ary low-density parity-check (LDPC) codes, compared with binary ones, produce a better error performance but with a higher decoding complexity. Various solutions, such as speeding up single operations or reducing the total number of operations, have been proposed for accelerating the decoding process. In this letter, we propose a modification to the extended min-sum (EMS) decoding algorithm. The aim is to improve the decoding speed without sacrificing any error performance over an additive white Gaussian noise (AWGN) channel environment.

P6. Effects of Language Mixing for Automatic Recognition of Cantonese-English Code-Mixing Utterances

Houwei Cao, P. C. Ching and Tan Lee

The Chinese University of Hong Kong

While automatic speech recognition of either Cantonese or English alone has achieved a great degree of success, recognition of Cantonese-English code-mixing speech is not as trivial. This paper attempts to analyze the effect of language mixing on the recognition performance of code-mixing utterances. By examining the recognition results of Cantonese-English code-mixing speech, where Cantonese is the matrix language and English is the embedded language, we noticed that the recognition accuracy of the embedded language plays a significant role in the overall performance. In particular, significant performance degradation is found in the matrix language if the embedded words cannot be recognized correctly. We also studied the error propagation effect of the embedded English. The results show that an error in an embedded English word may propagate to two neighboring Cantonese syllables. Finally, analysis is carried out to determine the factors influencing recognition performance in embedded English.

P7. A Discriminant Subspace Framework for Speaker Recognition

Zhifeng Li, Weiwu Jiang and Helen Meng

The Chinese University of Hong Kong

We propose a new framework for speaker recognition, referred to as Fishervoice. It includes the design of a feature representation known as the structured score vector (SSV), which relates acoustic structures with “key” frames in an input utterance to capture relevant speaker characteristics. The framework also applies nonparametric Fisher’s discriminant analysis to map the SSVs into a compressed discriminant subspace, where matching is performed between a test sample and reference speaker samples to achieve speaker recognition. The objective is to reduce intra-speaker variability and emphasize discriminative class boundary information to facilitate speaker recognition. Experiments based on the XM2VTSDB corpus show that the Fishervoice framework gives superior performance compared with other commonly used approaches, e.g. GMM-UBM and Eigenvoice.

P8. Automatic Estimation of Decoding Parameters Using Large-Margin Iterative Linear Programming

Brian Mak and Tom Ko

The Hong Kong University of Science and Technology

The decoding parameters in automatic speech recognition, namely the grammar factor and the word insertion penalty, are usually determined by performing a grid search on a development set. Recently, we cast their estimation as a convex optimization problem, and proposed a solution using an iterative linear programming algorithm. However, the solution depends on how well the development data set matches the test set. In this paper, we further investigate an improvement on the generalization property of the solution by using large-margin training within the iterative linear programming framework. Empirical evaluation on the WSJ0 5K speech recognition tasks shows that the recognition performance of the decoding parameters found by the improved algorithm using only a subset of the acoustic model training data is even better than that of the decoding parameters found by grid search on the development data, and is close to the performance of those found by grid search on the test set.
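The grid-search baseline described above can be sketched as follows. This is only an illustration of the conventional approach, not the iterative linear programming method of the paper. It assumes the usual log-linear combination (acoustic log-probability plus grammar factor times language-model log-probability plus insertion penalty times word count) and N-best rescoring; the names `pick_hypothesis`, `word_errors`, and `grid_search`, and the sign convention of the penalty, are hypothetical.

```python
from itertools import product

def pick_hypothesis(nbest, grammar_factor, insertion_penalty):
    """Return the word list of the highest-scoring hypothesis.

    Each hypothesis is (words, acoustic_logp, lm_logp). The combined score
    is an assumed log-linear form, not taken from the abstract.
    """
    def score(hyp):
        words, acoustic_logp, lm_logp = hyp
        return acoustic_logp + grammar_factor * lm_logp + insertion_penalty * len(words)
    return max(nbest, key=score)[0]

def word_errors(ref, hyp):
    """Levenshtein distance between two word lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (r != h)))
        prev = cur
    return prev[-1]

def grid_search(dev_set, grammar_factors, insertion_penalties):
    """Try every parameter pair on dev_set, a list of (reference_words, nbest).

    Returns (total_word_errors, grammar_factor, insertion_penalty) for the
    first pair that attains the fewest errors.
    """
    best = None
    for gf, wip in product(grammar_factors, insertion_penalties):
        errors = sum(word_errors(ref, pick_hypothesis(nbest, gf, wip))
                     for ref, nbest in dev_set)
        if best is None or errors < best[0]:
            best = (errors, gf, wip)
    return best
```

The cost of this procedure grows with the product of the two grid sizes, and the selected pair is only as good as the match between the development data and the test data.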

P9. Prosodic Attribute Model for Spoken Language Identification

Raymond W. M. Ng¹, Cheung-Chi Leung², Tan Lee¹, Bin Ma² and Haizhou Li²,³

¹The Chinese University of Hong Kong

²Institute for Infocomm Research, Singapore

³University of Eastern Finland

Prosodic information is believed to carry language-specific information useful to spoken language recognition. Modeling prosodic features is a challenging problem, on which a wide diversity of approaches have been investigated. In this paper, a novel prosodic attribute model (PAM) is proposed to capture prosodic features with compact models. It models the language-specific co-occurrence statistics of a comprehensive set of prosodic features. When the prosodic LID system with PAM is evaluated in NIST Language Recognition Evaluations (LRE) 2007 and 2009, it demonstrates respectively 21% and 11% relative EER reduction compared to a phonotactic LID system. The contributions of prosodic features in detecting some of the target languages, including tonal languages, are even more substantial. It is also noted that most prosodic attributes in the comprehensive set are making positive contributions.

P10. Exploration of Vocal Excitation Modulation Features for Speaker Recognition

Ning Wang, P. C. Ching, and Tan Lee

The Chinese University of Hong Kong

To derive spectro-temporal vocal source features that complement the conventional spectral-based vocal tract features and improve the performance and reliability of a speaker recognition system, the excitation-related modulation properties are studied. Through a multi-band demodulation method, source-related amplitude and phase quantities are parameterized into feature vectors. Evaluation of the proposed features is carried out first through a set of designed experiments on artificially generated inputs, and then by simulations on a speech database. It is observed via the designed experiments that the proposed features are capable of capturing the vocal differences in terms of F0 variation, pitch epoch shape, and relevant excitation details between epochs. In the real task simulations, when combined with the standard spectral features, both the amplitude-related and the phase-related features are shown to reduce the identification error rate and the equal error rate in the context of a Gaussian mixture model-based speaker recognition system.

P11. Fast GMM Computation for Speaker Verification Using Scalar Quantization and Discrete Densities

Guoli Ye¹, Brian Mak¹, Man-Wai Mak²

¹The Hong Kong University of Science and Technology

²The Hong Kong Polytechnic University

Most current state-of-the-art speaker verification (SV) systems use a Gaussian mixture model (GMM) to represent the universal background model (UBM) and the speaker models (SM). For an SV system that employs the log-likelihood ratio between the SM and the UBM to make the decision, its computational efficiency is largely determined by the GMM computation. This paper attempts to speed up GMM computation by converting a continuous-density GMM to a single discrete density or a mixture of discrete densities using scalar quantization. We investigated a spectrum of such discrete models: from high-density discrete models to discrete mixture models, and their combination, called high-density discrete-mixture models. For the NIST 2002 SV task, we obtained an overall speedup by a factor of 2–100 with little loss in EER performance.
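To make the cost that the paper targets concrete, here is a minimal sketch of the continuous-density log-likelihood ratio score. It assumes diagonal-covariance GMMs and per-frame averaging, neither of which is stated in the abstract, and it does not implement the scalar-quantized discrete densities; `gmm_loglik` and `llr_score` are hypothetical names.

```python
import numpy as np

def gmm_loglik(frames, weights, means, variances):
    """Average per-frame log-likelihood under a diagonal-covariance GMM.

    frames: (T, D); weights: (M,); means and variances: (M, D).
    """
    diff = frames[:, None, :] - means[None, :, :]                 # (T, M, D)
    log_gauss = -0.5 * (np.log(2 * np.pi * variances)[None]
                        + diff ** 2 / variances[None]).sum(axis=2)  # (T, M)
    log_joint = np.log(weights)[None, :] + log_gauss              # (T, M)
    # Log-sum-exp over mixture components, shifted for numerical stability.
    peak = log_joint.max(axis=1, keepdims=True)
    per_frame = peak[:, 0] + np.log(np.exp(log_joint - peak).sum(axis=1))
    return float(per_frame.mean())

def llr_score(frames, speaker, ubm):
    """Log-likelihood ratio between a speaker model and the UBM.

    `speaker` and `ubm` are (weights, means, variances) tuples.
    """
    return gmm_loglik(frames, *speaker) - gmm_loglik(frames, *ubm)
```

Every frame is evaluated against every Gaussian in both models, so the work scales with T x M x D; this is the computation that replacing Gaussians with table lookups over quantized feature values is meant to reduce.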

P12. Speeding Up Subcellular Localization by Extracting Informative Regions of Protein Sequences for Profile Alignment

Wei Wang, Man-Wai Mak and Sun-Yuan Kung

The Hong Kong Polytechnic University

The function of a protein is closely related to its subcellular location. In the post-proteomics era, the amount of gene and protein data grows exponentially, which necessitates the prediction of subcellular localization by computational means. This paper proposes mitigating the computational burden of alignment-based approaches to subcellular localization prediction by using the information provided by the N-terminal sorting signals. To this end, a cascaded fusion of cleavage site prediction and profile alignment is proposed. Specifically, the informative segments of protein sequences are identified by a cleavage site predictor. Then, only the informative segments are applied to a homology-based classifier for predicting the subcellular locations. Experimental results on a newly constructed dataset show that the method can make use of the best properties of both approaches and can attain an accuracy higher than that obtained using the full-length sequences. Moreover, the method can reduce the computation time 20-fold. We advocate that the method will be important for biologists conducting large-scale protein annotation and for bioinformaticians performing preliminary investigations on new algorithms that involve pairwise alignments.

P13. New Motion Compensation Model via Frequency Classification for Fast Video Super-resolution

Kwok-Wai Hung and Wan-Chi Siu

The Hong Kong Polytechnic University

A typical dynamic reconstruction-based video super-resolution system involves three independent processes: registration, fusion and restoration. Fast video super-resolution systems apply a translational motion compensation model for registration with low computational cost. The traditional motion compensation model assumes that the whole spectrum of pixels is consistent between frames. In reality, the low-frequency component of pixels often varies significantly. We propose a translational motion compensation model via frequency classification for video super-resolution systems. A novel idea is presented to implement motion compensation by combining the up-sampled current frame and the high-frequency part of the previous frame through the SAD framework. Experimental results show that the new motion compensation model via frequency classification has an average gain of 2 dB over the traditional motion compensation model. The SR quality gains 0.25 dB on average after the fusion process, which minimizes error by making use of the new motion-compensated frame.

P14. Viewpoint Switching in Multiview Videos Using SP-frames

Ki-Kit Lai, Yui-Lam Chan, Chang-Hong Fu, and Wan-Chi Siu

The Hong Kong Polytechnic University

The distinguishing feature of multiview video lies in its interactivity, which allows users to select their favorite viewpoint. It switches the bit-stream to a particular view when necessary instead of transmitting all the views. The new SP-frame in H.264 was originally developed for multiple bit-rate streaming with the support of seamless switching. The SP-frame can also be directly employed in the viewpoint switching of multiview videos. Although SP-frames guarantee seamless switching, the cost is the bulky size of secondary SP-frames. This requires a significant amount of additional space or bandwidth for storage or transmission, especially in the multiview scenario. For this reason, a new motion estimation and compensation technique operating in the quantized transform (QDCT) domain is designed for coding secondary SP-frames in this paper. Our proposed work aims at keeping the secondary SP-frames as small as possible without affecting the size of primary SP-frames by incorporating QDCT-domain motion estimation and compensation in the secondary SP-frame coding. Simulation results show that the size of secondary SP-frames can be reduced remarkably in viewpoint switching.

P15. Elastic Block Set Reconstruction for Face Recognition

Dong Li¹, Xudong Xie², Kin-Man Lam¹ and Zhigang Jin³

¹The Hong Kong Polytechnic University

²Tsinghua University

³Tianjin University

In this paper, a novel face recognition algorithm named elastic block set reconstruction (EBSR) is proposed. In our method, the EBSR face is used to represent a set of training faces and to simulate different factors in a query image. An EBSR face is constructed by using the blocks from the training face images which best match the blocks of the query image at the corresponding locations. The elastic local reconstruction (ELR) error is then used to evaluate how well a block pair matches, and the query image is classified based on the accumulated reconstruction error. The proposed method can effectively explore local information in the training set and deal with various conditions well. Also, the reconstruction error can be considered as a kind of dissimilarity measure, which gives a new approach to designing the training set so as to maximize the robustness of recognition. Experiments show that consistent and promising results are obtained.

P16. Subtractive Impairment, Additive Impairment and Image Visual Quality

Songnan Li and King Ngi Ngan

The Chinese University of Hong Kong

In this paper, we propose an engineering-based image quality metric which distinguishes subtractive impairment from additive impairment. Since the amount of subtractive impairment is upper-bounded by the total detail within the reference image, whereas no such limit applies to additive impairment, the visual quality degradation due to the two types of impairment should intuitively be measured differently. In the proposed metric, subtractive and additive impairments are separated and represented in the wavelet domain, and their influences on image visual quality are measured by different equations. We tested the proposed metric on five subjectively rated databases and demonstrated its effectiveness in objective image quality assessment.

P17. H.264 Fast Intra Mode Selection Algorithm Based Direction Difference Measure in the Pixel Domain

Li-Li Wang and Wan-Chi Siu

The Hong Kong Polytechnic University

In this paper, a fast mode decision algorithm for Intra prediction in H.264/AVC is proposed. We use the characteristics of each directional prediction mode to compute the strength of directional differences in the original pixel domain to find the minimal direction error. This is the first report in the literature in which the intrinsic differences between the real data and the predictors of the modes are used to form an algorithm for mode decision. The approach allows us to select several better candidate modes for evaluation instead of using the full search. Experimental results show that the proposed method can achieve more than 80% reduction in computation with negligible degradation in rate-distortion performance, and the results are better than those of other algorithms available in the literature.

P18. A Novel Weighted Cross Prediction for H.264 Intra Coding

Liping Wang, Lai-Man Po, Yusuf Md. Salah Uddin, Ka-Man Wong, and Shenyuan Li

City University of Hong Kong

In this paper, a novel weighted cross prediction (WCP) mode is proposed to replace the DC mode in Intra_4x4 prediction of H.264/AVC. In the proposed scheme, the upper right part of a 4x4 block mainly employs vertical prediction while the lower left part mainly uses horizontal prediction, so that a single block is predicted in both the vertical and horizontal directions. This scheme uses simple prediction equations with fixed weighting coefficients. Experimental results show that WCP improves on H.264 and is very competitive with other Intra_4x4 prediction algorithms.

P19. Block-Matching Translation and Zoom Motion-Compensated Prediction

Ka-Man Wong¹, Lai-Man Po¹, and Kwok-Wai Cheung²

¹City University of Hong Kong

²Chu Hai College of Higher Education

In modern video coding standards, motion compensated prediction (MCP) plays a key role in achieving video compression efficiency. Most of them make use of block matching techniques and assume that the motions are purely translational. Attempts toward a more general motion model are usually too complex to be practical in the near future. In this paper, a new Block-Matching Translation and Zoom Motion-Compensated Prediction (BTZMP) is proposed to extend the pure translational model to a more general model with zooming. It accounts for camera zooming and for object motions that appear as zooming when projected onto video frames. Experimental results show that BTZMP can give a prediction gain of up to 2.25 dB for various sequences compared to conventional block-matching MCP. BTZMP can also be combined with the multiple reference frames technique to give extra improvement, as evidenced by prediction gains ranging from 2.03 to 3.68 dB in the empirical simulations.