Algorithms, Applications, and Verification of Causal Structure Learning
PhD Thesis Proposal Defence
Title: "Algorithms, Applications, and Verification of Causal Structure
Learning"
by
Mr. Pingchuan MA
Abstract:
Understanding causal relations is one of the most fundamental problems in
scientific fields such as clinical trials and economics. The gold standard for
inferring causal relations is to conduct randomized experiments, which,
however, are often infeasible due to high costs or ethical concerns. In
contrast, causal structure learning (a.k.a., causal discovery) aims to infer
causal relations from observational data and learn the probabilistic graphical
model of the underlying data. Historically, conventional causal structure
learning algorithms generally rely on carefully crafted criteria to deduce
graph structures. For instance, the PC (Peter-Clark) algorithm conducts
conditional independence tests to constrain graphical structures and gradually
deduces the whole graph from data. As a result, these algorithms often produce
spurious causal relations.
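To make the idea of a conditional independence test concrete, here is a minimal sketch, not the PC algorithm itself and not code from the thesis. It assumes linear-Gaussian data and uses a Fisher z-test on the (partial) correlation, restricted to at most one conditioning variable. The function names, the simulated chain X -> Y -> Z, and the significance level are all illustrative assumptions.

```python
import math
import random

def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)

def partial_corr(x, z, y=None):
    """Correlation of x and z, optionally controlling for a single variable y."""
    r_xz = pearson(x, z)
    if y is None:
        return r_xz
    r_xy, r_yz = pearson(x, y), pearson(y, z)
    return (r_xz - r_xy * r_yz) / math.sqrt((1 - r_xy ** 2) * (1 - r_yz ** 2))

def is_independent(x, z, y=None, alpha=0.01):
    """Fisher z-test: True if independence is NOT rejected at level alpha."""
    n = len(x)
    k = 0 if y is None else 1            # size of the conditioning set
    r = partial_corr(x, z, y)
    r = max(min(r, 0.999999), -0.999999)  # keep the log below finite
    fisher_z = 0.5 * math.log((1 + r) / (1 - r))
    stat = math.sqrt(n - k - 3) * abs(fisher_z)
    p_value = math.erfc(stat / math.sqrt(2))  # two-sided normal p-value
    return p_value > alpha

# Simulated chain X -> Y -> Z (hypothetical data, for illustration only).
rng = random.Random(0)
xs = [rng.gauss(0, 1) for _ in range(5000)]
ys = [2.0 * x + rng.gauss(0, 1) for x in xs]
zs = [-1.5 * y + rng.gauss(0, 1) for y in ys]

print("X independent of Z?        ", is_independent(xs, zs))
print("X independent of Z given Y?", is_independent(xs, zs, ys))
```

In a constraint-based algorithm, a test that fails to reject independence given some conditioning set is what justifies removing the edge between the two variables. Because each such decision rests on a finite-sample statistical test, individual tests can be wrong, which is one way spurious or missing edges can arise.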
In this thesis, we propose two novel algorithms, namely, ML4S and SPOT, which
leverage machine learning techniques to predict causal relations from
observational data. ML4S is a supervised causal structure learning algorithm
that predicts edge adjacencies in the causal skeleton. SPOT first infers the
posteriors of causal skeletons using amortized variational inference, and then
uses the posteriors to guide the search of the causal graph (in a continuous
optimization setting). We show that both algorithms outperform the
state-of-the-art causal structure learning algorithms in terms of both accuracy
and scalability.
Then, we show an application of causal structure learning in the context of
explaining query results in databases. We present XINSIGHT, a general framework
for XDA (explainable data analysis). XINSIGHT provides data analysis with
qualitative and quantitative explanations of causal and non-causal semantics.
This way, it will significantly improve human understanding and confidence in
the outcomes of data analysis, facilitating accurate data interpretation and
decision making in the real world. XINSIGHT is a three-module, end-to-end
pipeline designed to extract causal graphs, translate causal primitives into
XDA semantics, and quantify the contribution of each explanation
to a data fact. XINSIGHT uses a set of design concepts and optimizations to
address the inherent difficulties associated with integrating causality into
XDA. Experiments on synthetic and real-world datasets as well as a user study
demonstrate the highly promising capabilities of XINSIGHT.
Finally, we propose a runtime verification tool called CICHECK, designed to
harden causal structure learning algorithms from reliability and privacy
perspectives. CICHECK employs a sound and decidable encoding scheme that
translates conditional independence reasoning (CIR) into SMT (satisfiability
modulo theories) problems. To solve the CIR problem efficiently, CICHECK
introduces a four-stage decision procedure with three lightweight optimizations
that actively prove or refute consistency, and only resort to costly SMT-based
reasoning when necessary. Based on the decision procedure for CIR, CICHECK
includes two variants: ED-CHECK and P-CHECK, which detect erroneous CI tests
(to enhance reliability) and prune excessive CI tests (to enhance privacy),
respectively. We evaluate CICHECK on four real-world datasets and 100 CIR
instances, showing its effectiveness in detecting erroneous CI tests and
reducing excessive CI tests while retaining practical performance.
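As a rough illustration of what consistency checking over CI (conditional independence) test outcomes can look like, here is a deliberately simplified sketch. It is not CICHECK's encoding or decision procedure: it involves no SMT solving and applies only the symmetry rule (X independent of Y given Z is the same statement as Y independent of X given Z). The class and method names are hypothetical.

```python
def normalize(x, y, cond):
    """Canonical key for 'x vs y given cond'; symmetric in x and y."""
    return (frozenset((x, y)), frozenset(cond))

class CIKnowledgeBase:
    """Stores CI test outcomes and flags direct contradictions (assumed design)."""

    def __init__(self):
        # canonical key -> True (independent) or False (dependent)
        self._facts = {}

    def lookup(self, x, y, cond=()):
        """Return the known outcome, or None if the test has not been seen."""
        return self._facts.get(normalize(x, y, cond))

    def record(self, x, y, cond, independent):
        """Store an outcome; return False if it contradicts a stored one."""
        key = normalize(x, y, cond)
        known = self._facts.get(key)
        if known is not None and known != independent:
            return False
        self._facts[key] = independent
        return True

kb = CIKnowledgeBase()
first_ok = kb.record("A", "B", ("C",), True)      # A independent of B given C
cached = kb.lookup("B", "A", ("C",))              # same statement by symmetry
conflict_ok = kb.record("B", "A", ["C"], False)   # contradicts the first result
print(first_ok, cached, conflict_ok)
```

The `lookup` call loosely mirrors the pruning idea (an outcome that is already implied needs no further access to the data), and a failed `record` mirrors the error-detection idea (a new outcome that contradicts earlier ones is suspect). A realistic checker would reason with far richer inference rules than symmetry alone.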
Date: Friday, 28 June 2024
Time: 4:00pm - 6:00pm
Venue: Room 2127A
Lift 19
Committee Members: Dr. Shuai Wang (Supervisor)
Prof. Raymond Wong (Chairperson)
Prof. Bo Li
Dr. Dan Xu