Apache Spark
Apache Spark is an open-source distributed computing framework for processing large datasets across clusters of machines. Current research emphasizes automated parameter tuning, using techniques such as Bayesian optimization and transfer learning to reduce resource consumption (CPU, memory) and shorten execution time across diverse workloads, including machine learning tasks. Efficient resource management and automated optimization matter for large-scale data analysis in domains such as healthcare, finance, and scientific computing.
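To make the idea of automated parameter tuning concrete, here is a minimal sketch in plain Python. It is not taken from any of the listed papers. The configuration keys are real Spark settings, but the candidate values, the `synthetic_runtime` cost model, and the `random_search` helper are hypothetical and stand in for actually running and timing a job. Random search is used as a simple baseline; a Bayesian optimizer would replace the random sampling step with a model that proposes the next configuration from past results.

```python
import random

# Real Spark configuration keys; the candidate values are illustrative assumptions.
SEARCH_SPACE = {
    "spark.executor.memory": ["2g", "4g", "8g"],
    "spark.executor.cores": [1, 2, 4],
    "spark.sql.shuffle.partitions": [50, 200, 800],
}


def synthetic_runtime(config):
    """Hypothetical stand-in for running a Spark job and timing it (seconds)."""
    memory_gb = int(config["spark.executor.memory"].rstrip("g"))
    cores = config["spark.executor.cores"]
    partitions = config["spark.sql.shuffle.partitions"]
    # Made-up cost model: more resources help, badly sized shuffles hurt.
    return 600 / (cores * memory_gb) + abs(partitions - 200) / 10


def random_search(objective, space, n_trials=20, seed=0):
    """Sample configurations at random and keep the one with the lowest cost."""
    rng = random.Random(seed)
    best_config, best_cost = None, float("inf")
    for _ in range(n_trials):
        config = {key: rng.choice(values) for key, values in space.items()}
        cost = objective(config)
        if cost < best_cost:
            best_config, best_cost = config, cost
    return best_config, best_cost


best_config, best_cost = random_search(synthetic_runtime, SEARCH_SPACE)
print(best_config, best_cost)
```

In a real setting, the objective would submit a job with the chosen settings (for example through `spark-submit --conf key=value`) and return a measured runtime or resource cost, which is far more expensive per trial and is the reason sample-efficient methods are studied.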