Improving the Expressiveness and Efficiency of Pattern Matching

PhD Thesis Proposal Defence


Title: "Improving the Expressiveness and Efficiency of Pattern Matching"

by

Mr. Luyu CHENG


Abstract:

Pattern matching is an important feature in many programming languages.
Originally popularized by functional programming languages, it is now
pervasive in other languages such as Python, Rust, and Scala. Despite
decades of research, we argue that pattern matching can still be improved
in both expressiveness and efficiency.

With respect to expressiveness, we introduce a new pattern-matching syntax
that unifies and generalizes traditional conditional forms while improving
readability. The syntax supports multiway and nested matches interleaved
with intermediate computations and bindings. Its expressiveness is enhanced
by two mechanisms: condition splitting, which factors out common conditional
prefixes, and conditional pattern flowing, which propagates bindings across
nested branches. We formalize the syntax in a surface calculus and define a
two-stage translation pipeline: (1) desugaring into a core calculus with only
flat patterns, followed by (2) normalization that removes backtracking. We
prove the semantic preservation property of the normalization. Additionally,
we provide a coverage-checking algorithm that operates over the normalized
core representation.

In terms of efficiency, we propose a language construct called pattern
definitions, which enables abstraction over patterns and addresses a key
limitation of traditional pattern matching and recursive functions:
expressing ambiguous or nondeterministic matches often requires manual
control flow, leading to verbose and non-modular code. Pattern definitions
support both data validation—for example, checking structured JSON input
against a schema-like pattern—and data transformation in a type-safe and
modular way. They naturally support recursive patterns, allowing programmers
to describe structures corresponding to regular tree languages. We also
present a concise formalization that captures the essence of structural
refinement types, giving patterns clear input/output semantics. We compile
pattern definitions into backtracking-free matcher functions that run in
asymptotically linear time, and we formally prove the correctness of the
compilation.

We design and implement our research contributions on MLscript, an
experimental mixed-paradigm ML-family language developed in our lab.
However, the constructs introduced are not specific to MLscript and can be
integrated, in part or in whole, into existing and future programming
languages.


Date:                   Monday, 24 November 2025

Time:                   4:00pm - 6:00pm

Venue:                  Room 5510
                        Lift 25/26

Committee Members:      Dr. Lionel Parreaux (Supervisor)
                        Prof. Charles Zhang (Chairperson)
                        Dr. Dongdong She