Talk Details

The following shows the talk details.

Session A: Streaming, Security/Privacy and Real-World Applications

Time: 9:05am-10:30am

Talk 1

Talk A1: A Bird's-Eye View of Complex Streaming Data Analytics

Speaker: Minos Garofalakis (TUC & RC Athena, Greece)

Abstract: Massive continuous data streams arise naturally in several dynamic big data analytics applications, such as enabling observability for complex distributed systems, network-operations monitoring in large ISPs, or incremental federated learning over dynamic distributed data. In such settings, usage information from numerous devices needs to be continuously collected and analyzed for interesting trends and real-time reaction to different conditions (e.g., anomalies/hotspots, DDoS attacks, or concept drifts). Streaming data raises important memory-, time-, and communication-efficiency issues, making it critical to carefully optimize the use of available computation and communication resources. In this talk, I will provide a (biased) overview of some key algorithmic tools in the space of streaming data analytics, along with relevant applications and challenges.

Biography: Minos Garofalakis is the Director of the Information Management Systems Institute (IMSI) at the ATHENA Research Center and a Professor at the School of ECE at the Technical University of Crete (TUC). He also works as a (part-time) senior research consultant for Huawei ISR/ERC and is the Co-founder and Director of Research at Agora Labs, a startup company bringing state-of-the-art data privacy technologies to the healthcare domain. Minos received the MSc and PhD degrees from the University of Wisconsin-Madison, and previously held senior/principal researcher positions at Bell Labs (1998-2005), Intel Research Berkeley (2005-2007), and Yahoo! Research (2007-2008); in parallel, he held an Adjunct Professor position at the EECS Department of UC Berkeley (2006-2008). Between 2/2022-2/2023, he also worked as a consulting Senior Principal Scientist for Amazon Web Services (AWS). Minos's research interests lie in the broad area of Big Data Analytics. He has published over 170 papers that have received more than 17,000 citations (h-index=70) according to Google Scholar and is listed as an inventor in more than 35 US patent filings (29 issued patents) for companies such as Lucent, Yahoo!, AT&T, and Huawei. Minos is an ACM and IEEE Fellow, a Member of Academia Europaea, and a recipient of several awards, including the TUC "Excellence in Research" Award (2015), the Bell Labs President's Gold Award (2004), two Best Research Paper Awards (VLDB'2024, ICDE'2009), and ten "best of" conference paper selections.

Talk 18

Talk A2: A Black-Box Reduction for Deferred Data Structuring

Speaker: Yufei Tao (The Chinese University of Hong Kong)

Abstract: This talk will discuss how to minimize the total cost of answering r queries over n elements in an online manner when the value r <= n is unknown in advance. Traditional indexing, which first builds a complete index on the n elements before answering queries, may be unsuitable because the index's construction time — usually Ω(n log n) — can become the performance bottleneck. In contrast, for many problems, a lower bound of Ω(n log(1 + r)) holds on the total cost of r queries for every r ∈ [1, n]. Matching this lower bound is a primary objective of deferred data structuring (DDS), also known as database cracking in the system community. For a wide class of problems, we will present a generic reduction to convert traditional indexes into DDS algorithms that match the lower bound for a long range of r.
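As a worked reading of the bounds quoted in the abstract (an illustration only, not material from the talk), evaluating the lower bound at the two ends of the range of r shows where deferred structuring can beat building a full index first. The symbol T(r) for the total cost of r queries is introduced here for convenience.

```latex
\[
T(r) = \Omega\bigl(n \log(1 + r)\bigr), \qquad 1 \le r \le n .
\]
\[
r = 1:\quad T(1) = \Omega(n \log 2) = \Omega(n),
\qquad
r = n:\quad T(n) = \Omega\bigl(n \log(1 + n)\bigr) = \Omega(n \log n).
\]
```

An index-first approach pays the Ω(n log n) construction cost regardless of r, so the gap to the lower bound is a log n factor when r is a constant and disappears as r approaches n.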

Biography: Yufei Tao is a computer scientist and professor at CUHK. He is an ACM Fellow, an IEEE Fellow, and the Editor-in-Chief of ACM TODS.

Talk 15

Talk A3: Secure and Privacy-Preserving RAG as a Service

Speaker: Jianliang Xu (Hong Kong Baptist University)

Abstract: Retrieval-Augmented Generation (RAG) offers powerful capabilities for enhancing large language models with external knowledge. While RAG as a Service (RAGaaS) can address the complexities of managing external data sources, it raises critical privacy concerns around sensitive query data. In this talk, we will explore the promise and challenges of RAGaaS, focusing on the trade-off between leveraging external knowledge and preserving user privacy. We will also discuss potential solutions and future directions for secure and privacy-preserving RAGaaS.

Biography: Prof. Xu is a Chair Professor and the Head of the Department of Computer Science at Hong Kong Baptist University. He received his BEng degree in Computer Science and Engineering from Zhejiang University, Hangzhou, China, and his PhD degree in Computer Science from the Hong Kong University of Science and Technology. His research interests include databases, blockchain, applied AI & LLM, and data security & privacy. With an H-index of 64, Prof. Xu has published over 250 technical articles in refereed journals and conferences, including SIGMOD, PVLDB, ICDE, and TKDE. His research has been funded by RGC, ITF, and NSFC, with total funding exceeding HK$40 million. Prof. Xu has served as an Associate Editor for multiple IEEE Transactions (TKDE, TBD, and TPDS) and the Proceedings of the VLDB Endowment (PVLDB). He is a Fellow of IEEE.

Talk 13

Talk A4: Local Differential Privacy Made Practical: Security, Efficiency, and Beyond

Speaker: Haibo Hu (Hong Kong Polytechnic University)

Abstract: While local differential privacy (LDP) has become the de facto privacy model in big data analytics and deep learning, there are a few practical challenges to address in real-life applications. In this talk, I will present some recent progress along this line. The first challenge is the impractical semi-honest threat model. I will show several techniques for defending against poisoning attacks in an LDP setting. Then I will present how local differential privacy schemes can be leveraged in practical multi-service systems and multi-class mining. Finally, I will show some open challenges for next-generation deep learning and data analytics.

Biography: Dr. Haibo Hu is a professor with the Department of Electrical and Electronic Engineering, The Hong Kong Polytechnic University. His research interests include cybersecurity, data privacy, and adversarial machine learning. He has published over 180 research papers in refereed journals, international conferences, and book chapters, and has been granted 6 US patents and 4 China/HK patents. He is the recipient of a number of titles and awards, including IWAIT 2021 Best Paper Award, IEEE MDM 2019 Best Paper Award, WAIM Distinguished Young Lecturer, ACM-HK Best PhD Paper, Microsoft Imagine Cup, and GS1 Internet of Things Award. He is a senior member of ACM, IEEE and CCF, and a certified Cisco CCNA Security Trainer.

Talk 7

Talk A5: Scaling Private Computation: Next-Generation Oblivious Primitives for Systems, Databases, and LLMs

Speaker: Ioannis Demertzis (UC Santa Cruz, CA, USA)

Abstract: Any privacy-preserving computation on encrypted data that relies solely on encryption can leak significant information about the plaintext input through leakage-abuse attacks. Industrial approaches that support confidential computing through hardware enclaves are susceptible to side-channel attacks; however, hardware enclaves provide an affordable and low-cost solution for any privacy-preserving computation. Oblivious primitives are a robust cryptographic tool that can mitigate leakage-abuse and software side-channel attacks (when combined with hardware enclaves).

Oblivious primitives find applications in various areas, including Signal's contact discovery, Anonymous Key Transparency, end-to-end encrypted email search, differential privacy in the shuffle model, large-scale software monitoring (e.g., Google's Prochlo), private federated learning, LLM privacy, Google's Privacy Sandbox, Google's FLEDGE, Titan Security Key, Asylo, and broader confidential computing efforts. In this talk, we explore recent advances in hardware-enclave-based oblivious primitives that scale private computations to terabyte-sized inputs—far exceeding the previous 100MB–4GB range. Our state-of-the-art oblivious primitives include a high-throughput oblivious key-value store (used in Signal's contact discovery, SOSP'21), a low-latency approach (PVLDB'24), the most scalable oblivious sort and shuffle (SP'24), and the first scalable oblivious filter, group-by, join approaches (USENIX'25). We will also discuss the current challenges, emerging opportunities, and potential avenues for future collaboration to further enhance the scalability and efficiency of private computation systems.

Biography: Ioannis Demertzis is an Assistant Professor in the Computer Science and Engineering Dept. at the University of California, Santa Cruz. His research focuses on applied cryptography, security & privacy, and secure databases/systems. His work has been published at top security, system and database conferences including USENIX, CRYPTO, NDSS, S&P, SIGMOD, SOSP, PVLDB and TODS. He is the recipient of the ACM SIGSAC Doctoral Dissertation Award Runner-up, Distinguished Dissertation Award of ECE (University of Maryland), and the Symantec Research Labs Graduate Fellowship. Before joining UCSC, he was a Postdoctoral Researcher at the EECS Dept. of UC Berkeley hosted by Prof. Raluca Ada Popa. He received his Ph.D. from the ECE Dept. of the University of Maryland, College Park advised by Prof. Charalampos Papamanthou. He obtained his ECE Diploma and M.Sc at the Technical University of Crete, under the supervision of Minos Garofalakis.

Talk 16

Talk A6: Graph Data Science for Social Goods: STAR Lab's Experience

Speaker: Reynold Cheng (The University of Hong Kong)

Abstract: In many metropolitan cities, there is a lack of manpower in social care. In Hong Kong, for example, elderly care homes report a 70% shortage of employees. To alleviate these issues, there has recently been considerable attention on data science for social goods, or the use of technologies for enhancing service quality and streamlining the administrative work of social workers. In this talk, I will discuss how the HKU STAR (Social Technology And Research) Lab uses data science technologies to support elderly and family care services. I will first introduce HINCare, a software platform that supports volunteering and cultivates a mutual-help culture in the community. HINCare uses the HIN (Heterogeneous Information Network) to recommend helpers to elders or other service recipients, and is now supporting 14 NGOs and 7,000 users. I will also discuss our collaboration with the Hong Kong Jockey Club Charities Trust for developing a novel case management and data analysis system for 40% of the family care centers in Hong Kong. These projects have received an HKICT Award, Asia Smart App Awards, and HKU Knowledge Exchange Awards.

Biography: Prof. Reynold Cheng is currently the Division Head and Professor (Computer Science) at the School of Computing and Data Science at the University of Hong Kong (HKU). He is a Steering Committee Member of the HKU Musketeers Foundation Institute of Data Science. He is an academic advisor to the College of Professional and Continuing Education of HKPU. He was an Associate Dean of Engineering in 2022-24. His research interests are in data science, big graph analytics and uncertain data management.

Professor Cheng was named an AI 2000 Most Influential Scholar Honorable Mention in Database in 2023 and 2024. He received the ACM Distinguished Membership Award and the HKU Outstanding Research Student Supervisor Award in 2023. He was listed among the World's Top 2% Scientists by Stanford University in 2022. He received the SIGMOD Research Highlights Award 2020, HKICT Awards (2021, 2023), HKU Knowledge Exchange Award (2024) and HKU Knowledge Exchange Award (Engineering) (2024, 2021). He was granted an Outstanding Young Researcher Award 2011-12 by HKU. He received the Universitas 21 Fellowship in 2011, and two HKPU Computing Performance Awards in 2006 and 2007. He was a PC co-chair of IEEE ICDE 2021. He is on the editorial board of IS, DAPD and DSEJ.

Session B: Data Integration and Deep Learning

Time: 11:00am-11:25am

Talk 14

Talk B1: Entity Matching in Low-Resource Contexts

Speaker: Wen Hua (Hong Kong Polytechnic University)

Abstract: Entity matching is a fundamental process in data integration, playing a crucial role in ensuring the accuracy and consistency of information across various data sources. In today's data-driven world, where vast amounts of information are generated from diverse sources, entity matching becomes essential to identify and match the same entities that are represented differently in disparate datasets. It helps to eliminate data redundancy, improve data quality, and provide a unified view of the data, which further enables more accurate data analysis and decision-making in various domains such as healthcare, education, e-commerce, transportation, etc. In this talk, we will present the development of entity-matching models, discuss the challenges confronted when applying entity matching in practice, especially in low-resource scenarios, and share with you some promising solutions we have explored recently to overcome these issues.

Biography: Dr Wen Hua is an Associate Professor and Presidential Young Scholar in the Department of Data Science and Artificial Intelligence at the Hong Kong Polytechnic University. She received her Bachelor's and PhD degrees in Computer Science from Renmin University of China in 2010 and 2015, respectively. Before joining PolyU, she worked as a Senior Lecturer and an ARC DECRA Senior Research Fellow in the School of Information Technology and Electrical Engineering at the University of Queensland. She was awarded the Advance Queensland Early Career Research Fellowship in 2017 and the ARC Discovery Early Career Researcher Award in 2021, two highly competitive fellowships for early-career researchers. Her current research interests include knowledge graphs, information extraction and retrieval, data integration, high-performance query processing, and spatiotemporal data management. She has published 80+ papers in reputed journals and top international conferences, and served actively in various conference organizing committees and review boards. Her publications have received the ICDE 2015 Best Paper Award, the CIKM 2022 Best Paper Honourable Mention Award, and some other notable awards.

Talk 12

Talk B2: Effective data preprocessing for deep learning

Speaker: Lei Chen (HKUST (Guangzhou))

Abstract: TBC

Biography: Lei Chen received his BS degree in Computer Science at Tianjin University, P.R. China (BS 94), and an MA degree in Computer Science at the Asian Institute of Technology (AIT) (MS 97). He received a PhD degree in Computer Science at the University of Waterloo.

Session C: Indexing and Graph Processing

Time: 2:15pm-3:30pm

Talk 4

Talk C1: Adaptive indexing in multidimensional spaces

Speaker: Nikos Mamoulis (UoI & RC Athena, Greece)

Abstract: Adaptive indexing (a.k.a. database cracking) refers to the process of progressively constructing an index while evaluating a workload of range queries. While this problem has been widely studied for the case where the indexed column is of a simple type, little attention has been given to the case where the indexed data are multidimensional and/or are of a range type. The main challenge is to make proper cracking decisions, given the exponential growth of available options as dimensionality grows. I will present our efforts in the past few years on adaptively indexing spatial and multidimensional data, in response to multidimensional range queries and kNN queries. I will also briefly present our suggested techniques for updating multidimensional adaptive indexes when serving a mixed workload of queries and data updates.
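For readers new to the idea, the following is a minimal sketch of classic one-dimensional cracking on a simple column, the well-studied baseline that the abstract contrasts with the multidimensional case. It is not the speaker's method: the class name `CrackedColumn` and its methods are hypothetical, and the partitioning step copies each piece for clarity rather than swapping in place.

```python
import bisect


class CrackedColumn:
    """Hypothetical 1D cracker column: each range query partitions only
    the pieces its bounds fall into, so the index emerges from the workload."""

    def __init__(self, values):
        self.data = list(values)   # physically reorganized as queries arrive
        self.pivots = []           # sorted pivot values seen so far
        self.positions = []        # positions[i]: first index holding a value >= pivots[i]

    def _crack(self, pivot):
        i = bisect.bisect_left(self.pivots, pivot)
        if i < len(self.pivots) and self.pivots[i] == pivot:
            return self.positions[i]          # already cracked on this value
        # The only piece that can contain values on both sides of the pivot.
        lo = self.positions[i - 1] if i > 0 else 0
        hi = self.positions[i] if i < len(self.pivots) else len(self.data)
        piece = self.data[lo:hi]
        smaller = [v for v in piece if v < pivot]
        larger = [v for v in piece if v >= pivot]
        self.data[lo:hi] = smaller + larger   # partition just this piece
        pos = lo + len(smaller)
        self.pivots.insert(i, pivot)
        self.positions.insert(i, pos)
        return pos

    def range_query(self, low, high):
        """Return all values v with low <= v < high (in arbitrary order)."""
        start = self._crack(low)
        end = self._crack(high)
        return self.data[start:end]


col = CrackedColumn([13, 16, 4, 9, 2, 12, 7, 1, 19, 3])
first = sorted(col.range_query(5, 13))
second = sorted(col.range_query(2, 8))
print(first, second, col.pivots)
```

Each query touches only the pieces around its two bounds, so regions of the column that are never queried stay unsorted and cost nothing beyond the initial copy.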

Biography: Nikos Mamoulis is a professor at the Department of Computer Science and Engineering, University of Ioannina (UoI), and a lead researcher at Archimedes Research Unit of Athena RC. Before joining UoI, he was a faculty member at the Department of Computer Science, University of Hong Kong. He holds a BEng degree in computer engineering and informatics from the University of Patras, Greece, and a PhD in computer science from HKUST. His research focuses on the management and mining of complex data types. His work on spatio-temporal data management has received best paper and test-of-time awards. He is the recipient of a Marie Curie fellowship (2014-2018) and he has been granted over 15 research projects as a PI in Hong Kong and Greece. He is a senior ACM member. He has participated in many organization boards of international conferences and has been a PC member in more than 120 program committees of top conferences in database research (e.g., SIGMOD, VLDB, ICDE, EDBT, KDD). He has been an associate editor and an editorial board member in several journals including TKDE, VLDBJ, KAIS, Geoinformatica, and ACM TSAS.

Talk 11

Talk C2: Adaptive Indexing of Multidimensional Points

Speaker: Dimitris Papadias (HKUST)

Abstract: Adaptive indexes are generated progressively as a response to query processing. Consequently, parts of the index that participate in more queries are more refined than the rest. In this talk, we apply the concepts of adaptive indexing to multidimensional points. As a first step, we discuss multidimensional bulk loading. Specifically, whereas conventional indexes are bulk loaded in advance (i.e., before the first query), we propose an adaptive structure that is built on demand, when unprocessed nodes are encountered during query processing. This exhibits significant advantages compared to non-adaptive competitors when the queries cover a small part of the data space, in which case only a partial index is generated. As a second step, we extend the proposed techniques for dynamic updates, so that only parts of the data space that receive queries are progressively refined. For the rest of the index, we allow sub-optimal structure, in order to enhance update efficiency.

Biography: Dimitris Papadias is a Professor of Computer Science and Engineering, HKUST. Before joining HKUST, he worked and studied at the German National Research Center for Information Technology (GMD), the UCSD (California), the Technical University of Vienna, the NTUA, Queen's University (Canada), and University of Patras (Greece). He has served on the editorial boards of the VLDBJ, IEEE TKDE, and Information Systems, and has been the PC Chair of ACM SIGMOD.

Talk 10

Talk C3: Efficient Indexing for Flexible Label-Constrained Shortest Path Queries in Road Networks

Speaker: Raymond Wong (HKUST)

Abstract: The point-to-point shortest path query is widely used in many spatial applications, e.g., navigation systems. However, the returned shortest path minimizing only one objective fails to satisfy users' various routing requirements in practice. For example, the user may specify the order of using several transportation modes in the planned route. The Label-Constrained Shortest Path (LCSP) query under regular languages is powerful enough to express diversified routing demands in a labeled road network where each edge is associated with a label to denote its road type. The complex routing demand can be formulated by a regular language, and the edge labels along each path should be a word under the given regular language. Previous LCSP solutions were either inefficient in query processing or inflexible in their use of the languages since they made some assumptions about the given language. In this paper, we propose an efficient index-based solution called Border-based State Move (BSM), which can answer LCSP queries quickly with flexible use of the language constraint. Specifically, our BSM builds indexes to skip the exploration between a vertex and its border vertices during query processing. Our experiments conducted on real road networks demonstrated the superiority of our proposed BSM. It can reduce the query time over state-of-the-art solutions by two orders of magnitude.

Biography: Raymond Chi-Wing Wong is a Professor in Computer Science and Engineering (CSE) of The Hong Kong University of Science and Technology (HKUST). He is currently the associate head of Department of Computer Science and Engineering (CSE) and the director of Undergraduate Research Opportunities Program (UROP). He was the associate director of the Data Science & Technology (DSCT) program (from 2019 to 2021), the director of the Risk Management and Business Intelligence (RMBI) program (from 2017 to 2019), the director of the Computer Engineering (CPEG) program (from 2014 to 2016) and the associate director of the Computer Engineering (CPEG) program (from 2012 to 2014). He received the BSc, MPhil and PhD degrees in Computer Science and Engineering from the Chinese University of Hong Kong (CUHK) in 2002, 2004 and 2008, respectively. In 2004-2005, he worked as a research and development assistant under an R&D project funded by ITF and a local industrial company called Lifewood.

He received 43 awards. He published 130 conference papers (e.g., SIGMOD, SIGKDD, VLDB, ICDE and ICDM), 49 journal/chapter papers (e.g., TODS, DAMI, TKDE, VLDB journal and TKDD) and 1 book. He reviewed papers from conferences and journals related to data mining and database, including VLDB conference, SIGMOD, TODS, VLDB Journal, TKDE, TKDD, ICDE, SIGKDD, ICDM, DAMI, DaWaK, PAKDD, EDBT and IJDWM. He is a program committee member of conferences, including SIGMOD, VLDB, ICDE, KDD, ICDM and SDM, and a referee of journals, including TODS, VLDBJ, TKDE, TKDD, DAMI and KAIS. His research interests include database and data mining.

Talk 6

Talk C4: A Tale of Intervals

Speaker: Panagiotis Bouros (JGU Mainz, Germany)

Abstract: Given a discrete or continuous 1D space, an interval is a range data type defined by a starting and an ending point in this domain. Collections of intervals (associated with objects) are found in a wide range of applications and fields, for instance in temporal databases, probabilistic databases, anonymized databases, computational geometry and spatial databases, and data streaming applications. Fundamental querying operations on intervals include point (stabbing) or range selections and joins. In this talk, I summarise my work on managing intervals; I discuss novel methods for organising and indexing data collections and for scalable query processing.

Biography: Panagiotis Bouros holds a diploma and a doctorate degree from the School of Electrical and Computer Engineering at the National Technical University of Athens, Greece. Since 2018, he has been an assistant professor at the Institute of Computer Science in Johannes Gutenberg University Mainz, Germany, and the head of the Data Management group. Prior to Mainz, he held research positions at Aarhus University, Denmark, Humboldt-Universität zu Berlin, Germany and the University of Hong Kong, China. His research interests are in scalable data management and query processing with a special focus on non-traditional data types such as spatial and temporal data, text and graphs. Results of his work have appeared multiple times in international journals and conference proceedings including the VLDB Journal, Proceedings of the VLDB Endowment, IEEE TKDE, ACM SIGMOD, IEEE ICDE, EDBT, ACM SIGSPATIAL.

Talk 17

Talk C5: Graph Prompt Learning and Pre-training

Speaker: Hong Cheng (The Chinese University of Hong Kong)

Abstract: Recently, "pre-training and fine-tuning" has been adopted as a standard workflow for many graph tasks since it can take general graph knowledge to relieve the lack of graph annotations from each application. However, graph tasks at the node, edge, and graph levels are highly diverse, making the pre-training pretext often incompatible with these multiple tasks. This gap may even cause a negative transfer to the specific application, leading to poor results. Inspired by prompt learning in natural language processing (NLP), which has proven highly effective in leveraging prior knowledge for various NLP tasks, in the first work, we study the prompting topic for graphs with the motivation of filling the gap between pre-trained models and various graph tasks. We propose a novel multi-task prompting method for graph models. To narrow the gap between various graph tasks and state-of-the-art pre-training strategies, we study the task space of various graph applications and reformulate downstream problems to the graph-level task. Afterward, we introduce meta-learning to efficiently learn a better initialization for the multi-task prompt of graphs so that our prompting framework can be more reliable and general for different tasks. In the second work, we study cross-domain graph pre-training and propose a novel approach called Graph COordinators for PrEtraining (GCOPE) that harnesses the underlying commonalities across diverse graph datasets to enhance few-shot learning. By successfully leveraging the synergistic potential of multiple graph datasets for pretraining, our work stands as a pioneering contribution to the realm of graph foundation models.

Biography: Hong Cheng is a Professor in the Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong. She received the Ph.D. degree from the University of Illinois at Urbana-Champaign in 2008. Her research interests include data mining, database systems, and machine learning. She received best research paper awards at SIGKDD'23. She received the 2010 Vice-Chancellor's Exemplary Teaching Award at The Chinese University of Hong Kong.

Session D: Similarity and Database System Design

Time: 4:00pm-4:40pm

Talk 9

Talk D1: Optimizing Similarity Search: From LSH Theory to Data-Driven Practice

Speaker: Xiaofang Zhou (HKUST)

Abstract: Workloads in the post-LLM era (e.g., RAG) frequently require semantic similarity search on embedding vectors. This shift has led to a renewed need for efficient range search, nearest neighbor (NN), and related tasks. In this talk, I will introduce a locality-sensitive hashing (LSH) framework for high-dimensional approximate NN search problems, which offers the best-known query complexity in theory, along with several optimizations to improve query quality and accelerate lookups in practice. Next, I will show how machine learning can leverage data distribution patterns to address the dilemma in similarity search, where neither traditional low-dimensional indexes (e.g., R-tree) nor high-dimensional indexes (e.g., LSH) are well-suited or effective.

Biography: Professor Xiaofang Zhou holds the Otto Poon Professorship in Engineering and is a Chair Professor of Computer Science and Engineering at HKUST, where he leads the department. His work spans database systems, data quality management, big data analytics, machine learning, and AI. He chaired the IEEE ICDE 2013, ACM CIKM 2016, and PVLDB 2020 conferences, and was General Chair for ICDE 2025 and ACM MM 2015. Prior to HKUST, he was a Computer Science Professor at The University of Queensland, heading its Data Science discipline. He is a Global STEM Scholar and an IEEE Fellow.

Talk 5

Talk D2: Explanations and Responsible Data Management

Speaker: Evaggelia Pitoura (UoI & RC Athena, Greece)

Abstract: Explainable AI (XAI) is essential for building trust, ensuring accountability, and enabling human oversight in critical applications where AI-driven decisions impact lives. In this short talk, I will present our ongoing work on explainability with emphasis on counterfactual explanations. Counterfactual explanations provide insights by identifying the minimum changes to input data that would alter the output of an algorithm. We have deployed counterfactual explanations, among other uses, to understand clustering models, identify biases in data, and gain a deeper understanding of Retrieval-Augmented Generation.

Biography: Evaggelia Pitoura is a Professor at the Department of Computer Science and Engineering at the University of Ioannina and a Lead Researcher at Archimedes Research Unit of Athena RC, Greece. She holds a BEng degree from the University of Patras, Greece, and an MS and PhD from Purdue University, USA. Her current research interests focus on two primary areas: responsible AI and graph exploration and analysis. For her work, she has received best paper awards, a Marie Curie Fellowship and two Recognition of Service Awards from ACM. She is an ACM senior member, chair of the Greek ACM-W event steering committee, chair of the Hellenic ACM SIGMOD chapter, and member of the sectorial scientific council of Greece National Council for Research, Technology and Innovation.

Talk 5

Talk D3: Efficient Execution of UDF Queries in Modern Data Engines

Speaker: Alkis Simitsis (RC Athena, Greece)

Abstract: User-defined functions (UDFs) have been widely used to overcome the expressivity limitations of SQL and complement its declarative nature with functional capabilities. UDFs are particularly useful in today's applications that involve complex data analytics and machine learning algorithms and logic. However, UDFs pose significant performance challenges in query processing and optimization, largely due to the mismatch of the UDF execution and SQL processing environments. To deal with this problem, research and commercial systems employ a broad scope of solutions ranging from algebraic, cost-based optimization to low level, physical query optimization, compilation, and execution. In this talk, we will highlight effective techniques to boost the performance of UDF queries including vectorization, parallelization, tracing JIT compilation, and operator fusion for various types of UDF (scalar, aggregate, table UDFs) and relational operators.

Biography: Alkis Simitsis is a Research Director at Athena Research Center. In the past, he held positions with HP/HPE Labs, Micro Focus, Unravel Data, and IBM Research, including Chief Scientist, Systems Architect, and Principal Research Scientist. Alkis brings 20+ years of experience in both startup and corporate environments, building innovative information and data management solutions and enterprise-grade products in areas such as scalable big data infrastructure, data-intensive analytics, information management, business intelligence, massively parallel processing, distributed databases, column-store databases, security analytics, and cloud computing. Alkis holds 45 U.S. patents and 1 European patent, has published 130+ papers in refereed international journals and conferences (7500+ citations, h-index: 46), and frequently serves on the organization and program committees of top-tier international scientific conferences. His most recent service includes General co-chair for VLDB 2027, PC co-chair for IEEE ICDE 2026 and EDBT 2025, associate editor for ACM SIGMOD 2026, PVLDB 2026/2025/2023, IEEE ICDE 2023, and he is on the editorial board of ACM/IMS J. of Data Science, VLDB Journal, and Elsevier DKE. Alkis is a recipient of the ACM DOLAP 2023 Test-of-Time Award, several best demo paper awards (IEEE ICDE 2024, EDBT 2024, ACM CIKM 2020, ACM SIGMOD 2014 and 2012) and service awards (PVLDB 2023, IEEE ICDE 2023, EDBT 2023/2024, ACM SIGMOD 2021).