Theta-Join SQL Operator Optimization on Distributed Systems

This project aims to design algorithms to accelerate theta-join SQL operators on distributed systems.

Project Details

Algorithms:
- “Equal” SQL operator acceleration algorithm
- “Less than” SQL operator acceleration algorithm
- “Greater than” SQL operator acceleration algorithm
Environment:
- Memory: 40 × 50 GB
- CPU cores: 40 × 7
- Platform: Spark 2.2.0 + JDK 1.8
Data Set:
- TPC-H
- 1 GB, 10 GB, 100 GB
Performance:
- 5 to 400 times faster than Spark SQL
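
For readers unfamiliar with theta-joins, the three predicates listed under Algorithms can be illustrated with a small single-machine sketch in Python. This is not the project's algorithm and does not use Spark. It only contrasts a naive nested-loop join with one common idea (sort one side, then locate each matching range by binary search). The function names and the operator strings "=", "<", ">" are assumptions made for this illustration.

```python
from bisect import bisect_left, bisect_right
import operator

# Hypothetical mapping from operator strings to predicates, for illustration only.
PREDICATES = {"=": operator.eq, "<": operator.lt, ">": operator.gt}


def nested_loop_theta_join(left, right, op):
    """Baseline: test every (l, r) pair against `l op r`."""
    theta = PREDICATES[op]
    return [(l, r) for l in left for r in right if theta(l, r)]


def sorted_theta_join(left, right, op):
    """Sort the right side once, then binary-search the matching range per left value.

    The predicate is read as `l op r`. The search avoids scanning non-matching
    values, although the output itself can still be large for inequality joins.
    """
    right_sorted = sorted(right)
    n = len(right_sorted)
    result = []
    for l in left:
        if op == "=":
            # All r equal to l sit between the two bisection points.
            lo, hi = bisect_left(right_sorted, l), bisect_right(right_sorted, l)
        elif op == "<":
            # l < r: every r strictly greater than l.
            lo, hi = bisect_right(right_sorted, l), n
        elif op == ">":
            # l > r: every r strictly smaller than l.
            lo, hi = 0, bisect_left(right_sorted, l)
        else:
            raise ValueError("unsupported operator: " + op)
        result.extend((l, r) for r in right_sorted[lo:hi])
    return result
```

Both functions return the same set of pairs for a given operator, possibly in a different order; for example, sorted_theta_join([3, 1, 4], [2, 4, 4, 5], ">") yields the pairs (3, 2) and (4, 2). A distributed version would additionally have to partition the data across workers, which this sketch does not attempt.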

People

Zijian Li

Xiaofei Zhang

Wenjie Liu

Associate Professor