L1. Adaptive
Block-Size Transform Based Just-Noticeable Difference Profile for Videos
Lin Ma and King N. Ngan
The Chinese University of Hong Kong, Hong Kong
In this paper, we propose a novel
adaptive block-size transform (ABT) based just-noticeable difference (JND)
model for videos. Firstly, the ABT-based spatial JND profile is extended to a
spatio-temporal JND model for videos by considering the temporal contrast
sensitivity function (TCSF), eye movement, and the motion information of the
objects in the video sequence. Furthermore, a metric named motion characteristics
distance (MCD) is proposed to depict the motion characteristics similarity between
a macroblock and its corresponding sub-blocks. Based
on the proposed MCD and the obtained spatial image content information, a novel
balanced strategy is proposed to determine which transform size is employed to
generate the resulting JND model. Experimental results have demonstrated that
our proposed scheme can tolerate more distortion while preserving better perceptual
quality than other JND profiles, which means that the proposed model is consistent
with the human visual system (HVS). Moreover, for the balanced strategy,
experiments have shown that temporal motion characteristics accord very well
with the spatial image content information, which has demonstrated the
efficiency of our proposed balanced strategy.
L2. Robust
Joint Design of Linear Relay Precoder and Destination
Equalizer for Dual-Hop Amplify-and-Forward MIMO Relay Systems
Chengwen Xing, Shaodan Ma, and Yik-Chung Wu
The University of Hong Kong
This paper addresses the problem of robust linear relay precoder and destination equalizer design for a dual-hop
amplify-and-forward (AF) multiple-input multiple-output (MIMO) relay system,
with Gaussian random channel uncertainties in both hops. By taking the channel
uncertainties into account, two robust design algorithms are proposed to
minimize the mean-square error (MSE) of the output signal at the destination.
One is an iterative algorithm with its convergence proved analytically. The
other is an approximated closed-form solution with much lower complexity than
the iterative algorithm. Although the closed-form solution involves a minor
relaxation for the general case, when the column covariance matrix of the
channel estimation error at the second hop is proportional to the identity matrix,
no relaxation is needed and the proposed closed-form solution is the optimal
solution. Simulation results show that the proposed algorithms reduce the sensitivity
of the AF MIMO relay systems to channel estimation errors, and perform better
than the algorithm using estimated channels only. Furthermore, the closed-form
solution provides a comparable performance to that of the iterative algorithm.
L3. Improving
Speech Recognition by Explicit Modeling of Phone Deletions
Tom Ko and Brian Mak
The Hong Kong University of Science and Technology
Greenberg reported in 1998 that, in conversational speech, the phone deletion
rate may be as high as 12%, whereas the syllable deletion rate is about 1%. The finding
prompted a new research direction of syllable modeling for speech recognition.
To date, the syllable approach has not yet fulfilled its promise. On the other
hand, there were few attempts to model phone deletions explicitly in current
ASR systems. In this paper, fragmented word models were derived from
well-trained cross-word triphone
models, and phone deletion was implemented by skip arcs for words consisting of
at least four phonemes. An evaluation on the CSR-II WSJ1 Hub2 5K task shows that
even with this limited implementation of phone deletions in read speech, we
obtained a word error rate reduction of 6.73%.
L4. Gradient-Directed
Image Composition for High Dynamic Range Imaging
Wei Zhang
and Wai-Kuen Cham
The Chinese University of Hong Kong
In this paper, we present a simple but effective method
that takes advantage of the gradient information to tackle the challenging high
dynamic range (HDR) imaging tasks in both static and dynamic scenes. Given
multiple images at different exposures, the proposed approach is capable of
producing a pleasing, artifact-free, tone-mapped-like HDR image directly by
compositing them with the guidance of gradient-based quality assessment.
In particular, two novel quality measures, visibility and consistency, are
developed based on the observations of gradient change among different
exposures. Compared to previous work, the proposed method is more appealing in
practice since it is computationally efficient and frees users from the tedious
radiometric calibration and tone mapping steps. Experimental results in static
and dynamic HDR tasks with various exposure sequences demonstrate the effectiveness
of the proposed method.
L5. Secrecy
Rate Maximization of A MISO Channel with Multiple Multi-Antenna Eavesdroppers
via Semidefinite Programming
Qiang Li and Wing-Kin Ma
The Chinese University of Hong Kong
Advances in multi-antenna techniques have recently led
to renewed interest in physical-layer secrecy, a meaningful topic that enables us
to prevent eavesdroppers from retrieving information intended for a legitimate
user through physical-layer designs. This paper addresses a secrecy-rate
maximization problem for the scenario of a multi-input single-output channel
overheard by multiple multi-antenna eavesdroppers, e.g., in the downlink. This
problem is non-convex and has no analytical solution. Through a careful
analysis and reformulation, we show that the secrecy-rate maximization problem has
a convex equivalent in the form of a semidefinite program
(SDP). We also prove that the respective optimal transmit covariance generally can
yield a rank-one structure, implying that transmit beamforming
is secrecy-rate optimal in the considered scenario. Simulation results are also
provided to illustrate that the optimal transmit design obtained by our SDP
approach can yield significantly higher secrecy rates than an existing
closed-form design.
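As an illustration of the quantity being maximized, the sketch below evaluates the secrecy rate of a given transmit beamformer. It assumes the standard Gaussian MISO wiretap model with the same noise power at every receiver, which the abstract does not spell out. The function name secrecy_rate and its argument layout are hypothetical, and the SDP reformulation itself is not reproduced.

```python
import math


def secrecy_rate(h, eavesdroppers, w, noise_power=1.0):
    """Secrecy rate in bits/s/Hz of beamformer w (assumed model, see lead-in).

    h             -- legitimate channel, one complex gain per transmit antenna
    eavesdroppers -- list of eavesdroppers; each is a list of channel vectors,
                     one vector (length = transmit antennas) per receive antenna
    w             -- transmit beamforming vector
    """
    # Rate of the legitimate user: log2(1 + |h^H w|^2 / noise).
    signal = abs(sum(hi.conjugate() * wi for hi, wi in zip(h, w))) ** 2
    legitimate = math.log2(1.0 + signal / noise_power)

    # For a rank-one transmit covariance, the log-det rate of a multi-antenna
    # eavesdropper reduces to log2(1 + total received power / noise).
    worst = 0.0
    for channel_vectors in eavesdroppers:
        leaked = sum(
            abs(sum(gi.conjugate() * wi for gi, wi in zip(g, w))) ** 2
            for g in channel_vectors
        )
        worst = max(worst, math.log2(1.0 + leaked / noise_power))

    # The secrecy rate is limited by the strongest eavesdropper and is never negative.
    return max(0.0, legitimate - worst)
```

A beamformer aligned with the legitimate channel and orthogonal to every eavesdropper channel keeps the full legitimate rate, whereas any leakage to the strongest eavesdropper is subtracted from it.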
L6. Speech
Enhancement in Car Noise Environment Based on An Analysis-Synthesis Approach
Using Harmonic Noise Model
R. F. Chen1,2, C. F. Chan1, H. C. So1,
Jonathan S. C. Lee2, and C. Y. Leung2
1City University of Hong Kong
2Avantwave Limited
This paper presents a speech enhancement
method based on an analysis-synthesis framework using harmonic noise model
(HNM) in car noise environments. The major advantages of this method are
effective suppression of car noise even in very low signal-to-noise ratio
environments and mitigation of the "musical tones" which are generally introduced
by conventional methods. In this paper, we devise a complete analysis-synthesis
based speech enhancement system, and give details on HNM modeling, parameter
estimation, and car noise adaptation. Subjective evaluation results show that
the proposed method exhibits better noise suppression ability than conventional
approaches without obvious degradation of speech quality.
L7. Novel
Directional Gradient Descent Searches for Fast Block Motion Estimation
Lai-Man Po, Ka-Ho Ng, Kwok-Wai Cheung, Ka-Man Wong, Yusuf Md. Salah Uddin, and Chi-Wang Ting
City University of Hong Kong
Search point pattern-based fast block motion estimation
algorithms provide significant speedup for motion estimation but usually suffer
from being easily trapped in local minima. This may lead to low robustness in
prediction accuracy particularly for video sequences with complex motions. This
problem is especially serious in one-at-a-time search (OTS) and block-based
gradient descent search (BBGDS), which provide very high speedup ratios. A
multipath search using more than one search path has been proposed to improve
the robustness of BBGDS but the computational requirement is much increased. To
tackle this drawback, a novel directional gradient descent search (DGDS)
algorithm using multiple OTSs and gradient descent searches on the error
surface in eight directions is proposed in this letter. The search point
patterns in each stage depend on the minima found in these eight directions,
and thus the global minimum can be traced more efficiently. In addition, a fast
version of the DGDS (FDGDS) algorithm is also described to further improve the
speed of DGDS. Experimental results show that DGDS reduces computational load
significantly compared with the well-known fast block motion estimation algorithms.
Moreover, FDGDS can achieve a higher speedup than the UMHexagonS algorithm in an H.264/AVC implementation while
maintaining very similar rate-distortion performance.
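To make the local-minimum issue concrete, the sketch below implements the baseline block-based gradient descent search (BBGDS) mentioned in the abstract, not the proposed DGDS or FDGDS. It assumes a sum-of-absolute-differences cost, frames stored as lists of rows, and a 3x3 checking pattern around the current best position; all function names are hypothetical.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between an n x n block of cur at (bx, by)
    and the block of ref displaced by the candidate motion vector (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def gradient_descent_search(cur, ref, bx, by, n=4, search_range=3):
    """Greedy descent on the SAD surface, starting from the zero motion vector."""
    height, width = len(ref), len(ref[0])

    def cost(dx, dy):
        # Reject candidates outside the search window or the reference frame.
        if abs(dx) > search_range or abs(dy) > search_range:
            return None
        if bx + dx < 0 or by + dy < 0:
            return None
        if bx + dx + n > width or by + dy + n > height:
            return None
        return sad(cur, ref, bx, by, dx, dy, n)

    best, best_cost = (0, 0), cost(0, 0)
    while True:
        cx, cy = best
        moved = False
        # Check the 3x3 neighbourhood of the current centre.
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                c = cost(cx + dx, cy + dy)
                if c is not None and c < best_cost:
                    best, best_cost, moved = (cx + dx, cy + dy), c, True
        if not moved:
            # The centre is the minimum of its neighbourhood; this may be
            # only a local minimum of the full error surface.
            return best, best_cost
```

Because the search follows a single descent path, it stops at the first position whose neighbours are all worse, which is the weakness that multipath and directional searches aim to reduce.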
P1. Closed-Form
Power Allocation Scheme for Space-Time Coded Multiple-Antenna Systems with
Imperfect CSI
Quan Kuang1, Shu-Hung Leung1
and Xiangbin Yu2
1City University of Hong Kong
2Nanjing University of Aeronautics and Astronautics
This paper presents a closed-form power allocation scheme
for space-time coded multiple-antenna systems with imperfect channel state
information at the transmitter. The proposed scheme is based on a so-called
compressed signal-to-noise ratio (CSNR) criterion, where a single compression
factor is used to minimize the bit-error-rate (BER). The obtained closed-form
solution is computationally efficient and achieves almost the same performance as
the existing optimal approach, which requires a numerical search, and outperforms
the existing suboptimal closed-form algorithm.
P2. VISA:
Versatile Impulse Structure Approximation for Time-Domain Linear Macromodeling
Chi-Un Lei and Ngai Wong
The University of Hong Kong
We develop a rational function macromodeling algorithm named VISA (Versatile Impulse
Structure Approximation) for macromodeling of system
responses with (discrete) time-sampled data. The ideas
of Walsh's theorem and complementary signal are introduced to convert the macromodeling problem into a non-pole-based Steiglitz-McBride (SM) iteration without initial guess and
eigenvalue computation. We demonstrate the fast convergence and the versatile macromodeling requirement adoption through a P-norm
approximation expansion, using examples from practical data.
P3. On Clock
Synchronization Algorithms for Wireless Sensor Networks under Unknown Delay
Mei Leng
and Yik-Chung Wu
The University of Hong Kong
In this paper, three clock synchronization algorithms for
wireless sensor networks (WSNs) under unknown delay are derived. They include
the maximum likelihood estimator (MLE), a generalization of an estimator from
[15], and a novel low complexity estimator. Their corresponding performance
bounds are derived and compared, and complexities are also analyzed. It is
found that the MLE achieves the best performance at the price of high
complexity. The generalized version of the estimator from [15] has
low complexity, but its performance is degraded with respect to the MLE. On the
other hand, the newly proposed estimator achieves the same performance as the
MLE, with complexity at the same level as the generalized version of the
estimator in [15].
P4. Semi-definite
Programming Algorithms for Sensor Network Node Localization with Anchor
Position Uncertainty
Kenneth W. K. Lui∗, W.-K. Ma, H. C. So∗ and Frankie K. W. Chan∗
Department of Electronic Engineering, City University of Hong Kong
Department of Electronic Engineering, The Chinese University of Hong Kong
Finding the positions of nodes in an ad hoc wireless
sensor network (WSN) with the use of the incomplete and noisy distance
measurements between nodes as well as anchor position information is currently
an important and challenging research topic. However, most WSN localization
studies have assumed that the anchor positions are perfectly known, which is
not a valid assumption in underwater and underground scenarios. In this
paper, semi-definite programming (SDP) algorithms are devised for node
localization in the presence of anchor position uncertainty. Computer simulations are
included to contrast the performance of the proposed algorithms with the conventional
SDP method and the Cramér-Rao lower bound (CRLB).
P5. Q-ary LDPC Decoder with Euclidean-distance-based Sorting
Criterion
X. H. Shen
and Francis C. M. Lau
The Hong Kong Polytechnic University
Q-ary
low-density parity-check (LDPC) codes, compared with binary ones, produce a
better error performance but with a higher decoding complexity. Various
solutions, such as speeding up single operations or reducing the total number
of operations, have been proposed for accelerating the decoding process. In
this letter, we propose a modification to the extended min-sum (EMS) decoding
algorithm. The aim is to improve the decoding speed without sacrificing any
error performance over an additive white Gaussian noise (AWGN) channel
environment.
P6. Effects
of Language Mixing for Automatic Recognition of Cantonese-English Code-Mixing
Utterances
Houwei Cao, P. C. Ching and Tan Lee
The Chinese University of Hong Kong
While automatic speech recognition of
either Cantonese or English alone has achieved a great degree of success,
recognition of Cantonese-English code-mixing speech is not as straightforward. This paper attempts
to analyze the effect of language mixing on recognition performance of
code-mixing utterances. By examining the recognition results of Cantonese-English
code-mixing speech, where Cantonese is the matrix language and English is the
embedded language, we noticed that recognition accuracy of the embedded
language plays a significant role in the overall performance. In particular,
significant performance degradation is found in the matrix language if the
embedded words cannot
be recognized correctly. We also studied the error
propagation effect of the embedded English. The results show that the error in
embedded English words may propagate to two neighboring Cantonese syllables.
Finally, analysis is carried out to determine the influencing factors for
recognition performance in embedded English.
P7. A Discriminant Subspace Framework for Speaker Recognition
Zhifeng Li, Weiwu Jiang and Helen Meng
The Chinese University of Hong Kong
We propose a new framework for speaker
recognition, referred to as Fishervoice. It includes the
design of a feature representation known as the structured score vector (SSV),
which relates acoustic structures with "key" frames in an input utterance in
capturing relevant speaker characteristics. The framework also applies
nonparametric Fisher's discriminant analysis to map the SSVs into a compressed
discriminant subspace, where matching is performed between a test sample and
reference speaker samples to achieve speaker recognition. The objective is to
reduce intra-speaker variability and emphasize discriminative class boundary
information to facilitate speaker recognition. Experiments based on the XM2VTSDB
corpus show that the Fishervoice framework gives
superior performance compared with other commonly used approaches, e.g.,
GMM-UBM and Eigenvoice.
P8. Automatic
Estimation of Decoding Parameters Using Large-Margin Iterative Linear
Programming
Brian Mak
and Tom Ko
The Hong Kong University of Science and Technology
The decoding parameters in automatic speech recognition,
namely the grammar factor and word insertion penalty, are usually determined by
performing a grid search on a development set. Recently, we cast their
estimation as a convex optimization problem, and proposed a solution using an
iterative linear programming algorithm. However, the solution depends on how
well the development data set matches the test set. In this paper, we further
investigate an improvement on the generalization
property of the solution by using large margin training within the iterative
linear programming framework. Empirical evaluation on the WSJ0 5K speech
recognition tasks shows that the recognition performance of the decoding
parameters found by the improved algorithm using only a subset of the acoustic
model training data is even better than that of the decoding parameters found
by grid search on the development data, and is close to the performance of
those found by grid search on the test set.
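The grid-search baseline referred to above can be sketched as follows. This is a minimal illustration that assumes N-best rescoring with the combined score acoustic + grammar_factor * language_model + insertion_penalty * word_count; the dictionary keys and function names are hypothetical, sign conventions for the penalty vary between decoders, and the large-margin linear programming method itself is not shown.

```python
from itertools import product


def total_errors(nbest_lists, grammar_factor, insertion_penalty):
    """Word errors made when each utterance keeps its top-scoring hypothesis."""
    errors = 0
    for hypotheses in nbest_lists:
        best = max(
            hypotheses,
            key=lambda h: h["am"] + grammar_factor * h["lm"]
            + insertion_penalty * h["words"],
        )
        errors += best["errors"]
    return errors


def grid_search(nbest_lists, factors, penalties):
    """Return the (grammar_factor, insertion_penalty) pair with the fewest errors."""
    return min(
        product(factors, penalties),
        key=lambda pair: total_errors(nbest_lists, pair[0], pair[1]),
    )


# Toy development set: one utterance with two competing hypotheses.
dev_set = [[
    {"am": -10.0, "lm": -2.0, "words": 2, "errors": 0},
    {"am": -9.0, "lm": -5.0, "words": 3, "errors": 1},
]]
best_pair = grid_search(dev_set, factors=[0.0, 1.0], penalties=[0.0])
```

In this toy case the error-free hypothesis only wins once the language model score is taken into account, so the search selects the non-zero grammar factor.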
P9. Prosodic Attribute Model for Spoken
Language Identification
Raymond W. M. Ng1, Cheung-Chi Leung2,
Tan Lee1, Bin Ma2 and Haizhou Li2,3
1The Chinese University of Hong Kong
2Institute for Infocomm
Research, Singapore
3University of Eastern Finland
Prosodic information is believed to carry
language-specific information useful to spoken language recognition. Modeling
prosodic features is a challenging problem, for which a wide diversity of approaches
have been investigated. In this paper, a novel
prosodic attribute model (PAM) is proposed to capture prosodic features with compact
models. It models the language-specific co-occurrence statistics of a
comprehensive set of prosodic features. When the prosodic LID system with PAM
is evaluated in NIST Language Recognition Evaluations (LRE) 2007 and 2009, it
demonstrates 21% and 11% relative equal error rate (EER) reductions, respectively, compared to a phonotactic LID system. The contributions of prosodic
features in detecting some of the target languages, including tonal languages, are
even more substantial. It is also noted that most prosodic attributes in the
comprehensive set are making positive contributions.
P10. Exploration of Vocal Excitation
Modulation Features for Speaker Recognition
Ning Wang, P. C. Ching, and Tan Lee
The Chinese University of Hong Kong
To derive spectro-temporal
vocal source features complementary to the conventional spectral-based vocal
tract features in improving the performance and reliability of a speaker
recognition system, the excitation-related modulation properties are studied.
Through a multi-band demodulation method, source-related amplitude and phase
quantities are parameterized into feature vectors. Evaluation of the proposed
features is carried out first through a set of designed experiments on
artificially generated inputs, and then by simulations on a speech database. It is
observed via the designed experiments that the proposed features are capable of
capturing the vocal differences in terms of F0 variation, pitch epoch shape,
and relevant excitation details between epochs. In the real task simulations,
in combination with the standard spectral features, both the amplitude and the phase-related
features are shown to clearly reduce the identification error rate and equal
error rate in the context of the Gaussian mixture model-based speaker
recognition system.
P11. Fast GMM Computation for Speaker
Verification Using Scalar Quantization and Discrete Densities
Guoli Ye1, Brian Mak1, Man-Wai Mak2
1The Hong Kong University of Science and Technology
2The Hong Kong Polytechnic University
Most current state-of-the-art speaker verification
(SV) systems use a Gaussian mixture model (GMM) to represent the universal background
model (UBM) and the speaker models (SM). For an SV system that employs the
log-likelihood ratio between SM and UBM to make the decision, its computational
efficiency is largely determined by the GMM computation. This paper attempts to
speed up GMM computation by converting a continuous-density GMM to a single or a
mixture of discrete densities using scalar quantization. We investigated a
spectrum of such discrete models: from high-density discrete models to discrete
mixture models, and their combination called high-density
discrete-mixture models. For the NIST 2002 SV task, we obtained an overall
speedup by a factor of 2–100 with little loss in EER performance.
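The idea of replacing Gaussian evaluations with table lookups can be sketched as below. This is only an illustration under stated assumptions: a diagonal-covariance GMM, a uniform scalar quantizer shared by all feature dimensions, and one lookup table per component and dimension. The function names and the quantizer design are hypothetical and are not the discrete models investigated in the paper.

```python
import math


def build_tables(means, variances, lo, hi, levels):
    """Tabulate each 1-D Gaussian log-density at the centres of uniform bins."""
    step = (hi - lo) / levels
    centres = [lo + (i + 0.5) * step for i in range(levels)]
    return [
        [-0.5 * (math.log(2.0 * math.pi * v) + (c - m) ** 2 / v) for c in centres]
        for m, v in zip(means, variances)
    ]


def quantize(x, lo, hi, levels):
    """Map a feature value to a bin index, clamping values outside [lo, hi)."""
    index = int((x - lo) / (hi - lo) * levels)
    return min(max(index, 0), levels - 1)


def discrete_gmm_loglik(frame, weights, component_tables, lo, hi, levels):
    """Approximate GMM log-likelihood of one frame using only table lookups."""
    indices = [quantize(x, lo, hi, levels) for x in frame]
    terms = [
        math.log(w) + sum(table[d][i] for d, i in enumerate(indices))
        for w, table in zip(weights, component_tables)
    ]
    # Log-sum-exp over the mixture components, stabilised by the largest term.
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))
```

The tables are built once per model, after which each frame costs one quantization per dimension plus additions, at the price of a quantization error that shrinks as the number of levels grows.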
P12. Speeding Up Subcellular Localization
by Extracting Informative Regions of Protein Sequences for Profile Alignment
Wei Wang, Man-Wai Mak and Sun-Yuan Kung
The Hong Kong Polytechnic University
The function of a protein is closely related to its subcellular
location. In the post-proteomics era, the amount of gene and protein data grows
exponentially, which necessitates the prediction of subcellular localization by
computational means. This paper proposes mitigating the computation burden of
alignment-based approaches to subcellular localization prediction by using the
information provided by the N-terminal sorting signals. To this end, a cascaded
fusion of cleavage site prediction and profile alignment is proposed.
Specifically, the informative segments of protein sequences
are identified by a cleavage site predictor. Then, only the informative
segments are applied to a homology-based classifier for predicting the subcellular
locations. Experimental results on a newly constructed dataset show that the
method can make use of the best properties of both approaches and can attain an
accuracy higher than that obtained using the full-length sequences. Moreover, the method can
reduce the computation time 20-fold. We believe that the method will be
important for biologists to conduct large-scale protein annotation or for bioinformaticians to perform preliminary investigations on
new algorithms that involve pairwise alignments.
P13. New Motion Compensation Model via
Frequency Classification for Fast Video Super-resolution
Kwok-Wai Hung and Wan-Chi Siu
The Hong Kong Polytechnic University
A typical dynamic reconstruction-based video super-resolution system
involves three independent processes: registration, fusion and restoration.
Fast video super-resolution systems apply a translational motion compensation
model for registration with low computational cost. The traditional motion compensation
model assumes that the whole spectrum of pixels is consistent between frames.
In reality, the low-frequency component of pixels often varies significantly.
We propose a translational motion compensation model via frequency
classification for video super-resolution systems. A novel idea to implement
motion compensation by combining the up-sampled current frame and the high-frequency
part of the previous frame through the sum of absolute differences (SAD) framework is presented. Experimental
results show that the new motion compensation model via frequency
classification has an advantage of 2 dB gain on average over the traditional
motion compensation model. The SR quality has a 0.25 dB gain on average after the
fusion process, which minimizes the error by making
use of the new motion compensated frame.
P14. Viewpoint Switching in Multiview Videos Using SP-frames
Ki-Kit Lai, Yui-Lam Chan,
Chang-Hong Fu, and Wan-Chi Siu
The Hong Kong Polytechnic University
The distinguishing feature of multiview
video lies in the interactivity, which allows users to select their favorite
viewpoint. It switches bit-stream at a particular view when necessary instead
of transmitting all the views. The new SP-frame in H.264 was originally
developed for multiple bit-rate streaming with the support of seamless
switching. The SP-frame can also be directly employed in the viewpoint
switching of multiview videos. Notwithstanding the
guarantee of seamless switching using SP-frames, the
cost is the bulky size of secondary SP-frames. This requires a significant
amount of additional space or bandwidth for storage or transmission, especially
for the multiview scenario.
For this reason, a new motion estimation and compensation technique operating
in the quantized transform (QDCT) domain is designed for coding secondary
SP-frames in this paper. Our proposed work aims at keeping the secondary
SP-frames as small as possible without affecting the size of primary SP-frames
by incorporating QDCT-domain motion estimation and compensation in the secondary
SP-frame coding. Simulation results show that the size of secondary SP-frames
can be reduced remarkably in viewpoint switching.
P15. Elastic Block Set Reconstruction for
Face Recognition
Dong Li1, Xudong Xie2,
Kin-Man Lam1 and
Zhigang Jin3
1The Hong Kong Polytechnic University
2Tsinghua University
3Tianjin University
In this paper,
a novel face recognition algorithm named elastic block set reconstruction
(EBSR) is proposed. In our method, the EBSR face is used to represent a set of
training faces and to simulate different factors in a query image. An EBSR face
is constructed by using the blocks from the training face images which best
match the blocks of the query image at the corresponding locations. The
elastic local reconstruction (ELR) error is then used to evaluate how well a
block pair matches, and the query image is classified based on the accumulated
reconstruction error. The proposed method can effectively explore local
information in the training set and deal with various conditions well. Also,
the reconstruction error can be considered as a kind of dissimilarity measure,
which gives a new approach to designing the training set so as to maximize
robustness of recognition. Experiments show that consistent and promising
results are obtained.
P16. Subtractive Impairment, Additive
Impairment and Image Visual Quality
Songnan Li and King Ngi Ngan
The Chinese University of Hong Kong
In this paper, we propose an engineering-based image
quality metric which distinguishes subtractive
impairment from additive impairment. Since the amount of subtractive impairment
is upper-bounded by the total details within the reference image but the same
limitation cannot be applied to additive impairment, intuitively visual quality
degradation due to the two types of impairments should be measured differently.
In the proposed metric, subtractive and additive impairments are separated and
represented in the wavelet domain, and their influences on image visual quality
are measured by different equations. We tested the proposed metric on five
subjectively rated databases and verified its effectiveness in objective image quality
assessment.
P17. H.264 Fast Intra Mode Selection
Algorithm Based Direction Difference Measure in the Pixel Domain
Li-Li Wang and Wan-Chi Siu
The Hong Kong Polytechnic University
In this paper, a fast mode decision algorithm for Intra prediction
in H.264/AVC is proposed. We use the characteristics of each directional
prediction mode to compute the strength of directional differences in the original
pixel domain to find the minimal direction error. This is the first time
in the literature that the intrinsic differences between the real data
and the predictors of modes are used to form an algorithm for mode decision.
The approach allows us to select several better candidate modes for evaluation
instead of using the full search. Experimental results show that the proposed
method can achieve more than 80% reduction in computation with negligible
degradation in rate-distortion performance, and the results are better than
other algorithms available in the literature.
P18. A Novel Weighted Cross Prediction for
H.264 Intra Coding
Liping Wang, Lai-Man Po, Yusuf Md. Salah Uddin, Ka-Man Wong, and Shenyuan Li
City University of Hong Kong
In this paper, a novel weighted cross
prediction (WCP) mode is proposed to replace DC mode in Intra_4x4 prediction of
H.264/AVC. In the proposed scheme, the upper right part of one 4x4 block mainly
employs vertical prediction while the lower left part mainly uses horizontal
prediction, predicting both in vertical and horizontal directions in one block.
This scheme uses simple prediction equations with fixed weighting coefficients.
Experimental results show that WCP improves on H.264 and is
very competitive with other Intra_4x4 prediction algorithms.
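The description above can be illustrated with a small sketch in which every pixel is a fixed-weight blend of the reconstructed pixel above its column and the one to the left of its row. The weights used here (0.75 and 0.25 off the diagonal, 0.5 on it) are purely hypothetical placeholders chosen for illustration; the abstract does not give the actual WCP prediction equations or coefficients.

```python
def weighted_cross_prediction(top, left, strong=0.75):
    """Blend vertical and horizontal prediction for a square block.

    top    -- reconstructed pixels of the row above the block
    left   -- reconstructed pixels of the column left of the block
    strong -- hypothetical weight given to the dominant direction
    """
    n = len(top)
    prediction = []
    for y in range(n):
        row = []
        for x in range(n):
            if x > y:
                # Upper-right part: mainly vertical prediction.
                w_vertical = strong
            elif x < y:
                # Lower-left part: mainly horizontal prediction.
                w_vertical = 1.0 - strong
            else:
                # Diagonal: equal contribution from both directions.
                w_vertical = 0.5
            row.append(w_vertical * top[x] + (1.0 - w_vertical) * left[y])
        prediction.append(row)
    return prediction
```

With fixed weights the predictor needs no side information beyond the mode index, which is consistent with its use as a drop-in replacement for DC mode.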
P19. Block-Matching Translation and Zoom
Motion-Compensated Prediction
Ka-Man Wong1, Lai-Man Po1, and Kwok-Wai Cheung2
1City University of Hong Kong
2Chu Hai College of Higher
Education
In modern video coding standards, motion compensated prediction
(MCP) plays a key role in achieving video compression efficiency. Most of them
make use of block matching techniques and assume the motions are purely
translational. Attempts toward a more general motion model are usually too
complex to be practical in the near future. In this paper, a new Block-Matching
Translation and Zoom Motion-Compensated Prediction (BTZMP) is proposed to extend
the pure translational model to a more general model with zooming. It accommodates
camera zooming and object motions that become zooming when projected onto video
frames. Experimental results show that BTZMP can give prediction gain up to
2.25 dB for various sequences compared to conventional block-matching MCP. BTZMP
can also be incorporated with the multiple reference frames technique to give extra
improvement, as evidenced by the prediction gain ranging from 2.03 to 3.68 dB in
the empirical simulations.