Algorithms, Applications, and Verification of Causal Structure Learning

PhD Thesis Proposal Defence


Title: "Algorithms, Applications, and Verification of Causal Structure 
Learning"

by

Mr. Pingchuan MA


Abstract:

Understanding causal relations is one of the most fundamental problems in
scientific discovery, in areas such as clinical trials and economics. The gold
standard for inferring causal relations is to conduct randomized experiments,
which, however, are often infeasible due to high costs or ethical concerns. In
contrast, causal structure learning (a.k.a., causal discovery) aims to infer 
causal relations from observational data and learn the probabilistic graphical 
model of the underlying data. Conventional causal structure learning algorithms
generally rely on carefully crafted criteria to deduce graph structures. For
instance, the PC (Peter-Clark) algorithm conducts conditional independence (CI)
tests to constrain the graph structure and gradually deduces the whole graph
from data. As a result, such algorithms often produce spurious causal
relations.

In this thesis, we propose two novel algorithms, ML4S and SPOT, which leverage
machine learning techniques to predict causal relations from observational
data. ML4S is a supervised causal structure learning algorithm that predicts
edge adjacencies in the causal skeleton. SPOT first infers the posteriors of
causal skeletons using amortized variational inference, and then uses the
posteriors to guide the search for the causal graph (in a continuous
optimization setting). We show that both algorithms outperform the
state-of-the-art causal structure learning algorithms in terms of both accuracy 
and scalability.

Then, we show an application of causal structure learning in the context of 
explaining query results in databases. We present XINSIGHT, a general framework
for XDA (explainable data analysis). XINSIGHT provides data analysis with 
qualitative and quantitative explanations of causal and non-causal semantics. 
This way, it will significantly improve human understanding and confidence in 
the outcomes of data analysis, facilitating accurate data interpretation and 
decision making in the real world. XINSIGHT is a three-module, end-to-end 
pipeline designed to extract causal graphs, translate causal primitives into 
XDA semantics, and quantify the contribution of each explanation
to a data fact. XINSIGHT uses a set of design concepts and optimizations to 
address the inherent difficulties associated with integrating causality into 
XDA. Experiments on synthetic and real-world datasets as well as a user study 
demonstrate the highly promising capabilities of XINSIGHT.

Finally, we propose a runtime verification tool called CICHECK, designed to 
harden causal structure learning algorithms from reliability and privacy 
perspectives. CICHECK employs a sound and decidable encoding scheme that
translates conditional independence reasoning (CIR) into SMT problems. To solve
the CIR problem efficiently, CICHECK introduces a four-stage decision procedure
with three lightweight optimizations that actively prove or refute consistency
and resort to costly SMT-based reasoning only when necessary. Based on this
decision procedure for CIR, CICHECK includes two variants, ED-CHECK and
P-CHECK, which detect erroneous CI tests (to enhance reliability) and prune
excessive CI tests (to enhance privacy), respectively. We evaluate CICHECK on
four real-world datasets and 100 CIR
instances, showing its effectiveness in detecting erroneous CI tests and 
reducing excessive CI tests while retaining practical performance.


Date:                   Friday, 28 June 2024

Time:                   4:00pm - 6:00pm

Venue:                  Room 2127A
                        Lift 19

Committee Members:      Dr. Shuai Wang (Supervisor)
                        Prof. Raymond Wong (Chairperson)
                        Prof. Bo Li
                        Dr. Dan Xu