[Openinfralabs] Project Caerus Update

Hui Lei dr.huilei at gmail.com
Wed Aug 4 03:01:53 UTC 2021

Dear all,

I would like to take this opportunity to give you another update on Project
Caerus. As you may remember, the project develops techniques such as
near-data processing and semantic caching to optimize the performance of
disaggregated data lakes. On the near-data processing front, we have
implemented the pushdown of a wide range of SQL operators from a Spark
cluster to a storage cluster that deploys either HDFS (CSV format) or S3.
Our evaluation using TPC-H has shown significant improvements in application
latency, network I/O and compute-side CPU time. You can check out our design
and latest evaluation results on GitHub.
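
For readers new to the idea, here is a minimal Python sketch of why pushing a
filter and projection down to the storage side reduces the data shipped to
the compute side. It is only an illustration, not the Caerus implementation:
the sample data, the function names (storage_read_all, storage_read_pushdown,
compute_filter) and the in-memory "storage" are all hypothetical.

```python
import csv
import io

# Hypothetical CSV object held by the storage cluster (illustrative data only).
CSV_OBJECT = "order_id,region,amount\n" + "".join(
    f"{i},{'EU' if i % 10 == 0 else 'US'},{i * 2}\n" for i in range(1000)
)


def storage_read_all():
    """No pushdown: storage ships the whole object to the compute side."""
    return CSV_OBJECT


def storage_read_pushdown(columns, predicate):
    """Pushdown: storage applies the filter and projection before replying."""
    reader = csv.DictReader(io.StringIO(CSV_OBJECT))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(columns)
    for row in reader:
        if predicate(row):
            writer.writerow([row[c] for c in columns])
    return out.getvalue()


def compute_filter(payload, columns, predicate):
    """Compute-side filter and projection over a full CSV payload."""
    reader = csv.DictReader(io.StringIO(payload))
    return [tuple(row[c] for c in columns) for row in reader if predicate(row)]


def is_eu(row):
    return row["region"] == "EU"


COLUMNS = ["order_id", "amount"]

# Baseline: transfer everything, then filter on the compute side.
baseline_payload = storage_read_all()
baseline_rows = compute_filter(baseline_payload, COLUMNS, is_eu)

# Pushdown: only matching rows and requested columns cross the network.
pushdown_payload = storage_read_pushdown(COLUMNS, is_eu)
pushdown_rows = [tuple(r) for r in list(csv.reader(io.StringIO(pushdown_payload)))[1:]]

print("bytes without pushdown:", len(baseline_payload))
print("bytes with pushdown:   ", len(pushdown_payload))
```

Both paths return the same rows; the difference is how many bytes leave the
storage side and how much filtering work remains for the compute side.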

On the semantic caching front, which explores opportunistic caching of a
variety of data and metadata, we have the core functionality working, with
a 4x-5x improvement in execution time and CPU time. Again, the design
document and the initial evaluation results are available on GitHub.
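
As a rough illustration of what "semantic" means here, the Python sketch
below answers a query from a cached result whenever the new query is
contained in an earlier one, instead of requiring an exact match. It is a toy
under stated assumptions, not the Caerus design: the RangeCache class, the
remote_fetch function and the single-column range queries are hypothetical.

```python
class RangeCache:
    """Toy semantic cache for range queries 'lo <= key < hi' over one table."""

    def __init__(self, fetch):
        self.fetch = fetch      # hypothetical function that reads remote storage
        self.entries = []       # list of (lo, hi, rows) for cached results
        self.remote_calls = 0   # how many queries had to go to storage

    def query(self, lo, hi):
        # Reuse any cached result whose range covers the requested range.
        for c_lo, c_hi, rows in self.entries:
            if c_lo <= lo and hi <= c_hi:
                return [r for r in rows if lo <= r[0] < hi]
        # Cache miss: fetch from storage and remember the result.
        self.remote_calls += 1
        rows = self.fetch(lo, hi)
        self.entries.append((lo, hi, rows))
        return rows


# Hypothetical remote table of (key, value) rows.
TABLE = [(k, k * k) for k in range(100)]


def remote_fetch(lo, hi):
    return [r for r in TABLE if lo <= r[0] < hi]


cache = RangeCache(remote_fetch)
first = cache.query(10, 50)    # miss: goes to storage
second = cache.query(20, 30)   # hit: served from the cached [10, 50) result
print("remote calls:", cache.remote_calls)
```

An exact-match cache would have sent the second query to storage as well;
reasoning about query containment is what lets the cached result be reused.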

As always, your comments and contributions are welcome.

- Hui