[Openinfralabs] Project Caerus Update

Hui Lei dr.huilei at gmail.com
Wed Aug 4 03:01:53 UTC 2021


Dear all,

I would like to take this opportunity to give you another update on Project
Caerus. As you may remember, the project develops techniques such as
near-data processing and semantic caching to optimize the performance of
disaggregated data lakes. On the near-data processing front, we have
implemented the pushdown of a wide range of SQL operators from a Spark
cluster to a storage cluster that deploys either HDFS (CSV format) or S3.
Our evaluation using TPC-H has shown significant improvements in application
latency, network I/O, and compute-side CPU time. You can check out our design
document
<https://github.com/open-infrastructure-labs/caerus-dike/blob/master/doc/ndp_design.pdf>
and the latest evaluation results
<https://github.com/open-infrastructure-labs/caerus-dike/blob/master/doc/s3_hdfs_results_6_1_2021.pdf>
on GitHub.
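
To make the pushdown idea concrete, here is a minimal sketch in plain Python.
It is not the Caerus implementation and does not use Spark, HDFS, or S3: the
sample CSV data and the names storage_scan, query_without_pushdown, and
query_with_pushdown are all hypothetical. It only illustrates why evaluating a
filter and projection on the storage side can reduce the bytes sent to the
compute side while returning the same answer.

```python
import csv
import io

# Hypothetical CSV object held by the storage node (illustrative data only).
STORED_CSV = (
    "l_orderkey,l_quantity,l_shipmode\n"
    "1,17,TRUCK\n"
    "2,36,MAIL\n"
    "3,8,AIR\n"
    "4,45,MAIL\n"
)


def storage_scan(columns=None, predicate=None):
    """Storage-side scan: optionally filter and project before returning CSV bytes."""
    reader = csv.DictReader(io.StringIO(STORED_CSV))
    wanted = columns or reader.fieldnames
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=wanted, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        if predicate is None or predicate(row):
            writer.writerow({name: row[name] for name in wanted})
    return out.getvalue().encode("utf-8")


def is_mail(row):
    """Example predicate: keep rows shipped by MAIL."""
    return row["l_shipmode"] == "MAIL"


def query_without_pushdown():
    """Fetch the whole object, then filter and project on the compute side."""
    payload = storage_scan()
    rows = csv.DictReader(io.StringIO(payload.decode("utf-8")))
    result = [int(r["l_quantity"]) for r in rows if is_mail(r)]
    return result, len(payload)


def query_with_pushdown():
    """Ask storage to filter and project, so only matching values cross the network."""
    payload = storage_scan(columns=["l_quantity"], predicate=is_mail)
    rows = csv.DictReader(io.StringIO(payload.decode("utf-8")))
    result = [int(r["l_quantity"]) for r in rows]
    return result, len(payload)


if __name__ == "__main__":
    plain, plain_bytes = query_without_pushdown()
    pushed, pushed_bytes = query_with_pushdown()
    print("same result:", plain == pushed)
    print("bytes transferred:", plain_bytes, "vs", pushed_bytes)
```

Both functions compute the same list of quantities; the difference is only in
how many bytes storage_scan has to return.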

On the semantic caching front, which explores opportunistic caching of a
variety of data and metadata, we have the core functionality working, with a
4x-5x improvement in execution time and CPU time. Again, the design document
<https://github.com/open-infrastructure-labs/caerus-semantic-cache/blob/master/Design.docx>
and the initial evaluation results
<https://github.com/open-infrastructure-labs/caerus-semantic-cache/blob/master/Evaluation.docx>
are available on GitHub.
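
As a rough illustration of what "semantic" can mean for a cache, the sketch
below keys cached results on a normalized description of the request rather
than on the literal query text. This is an assumption made for illustration,
not a description of how the Caerus semantic cache works; the SemanticCache
class, its key scheme, and the example table and filter strings are all
hypothetical.

```python
class SemanticCache:
    """Toy cache keyed on a normalized (table, columns, filters) description."""

    def __init__(self):
        self._entries = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(table, columns, filters):
        # Assumption: results are rows addressed by column name and the filters
        # are AND-ed together, so the order of columns and filters is irrelevant.
        return (table, tuple(sorted(columns)), tuple(sorted(filters)))

    def get_or_compute(self, table, columns, filters, compute):
        """Return a cached result for an equivalent request, else compute and store it."""
        key = self._key(table, columns, filters)
        if key in self._entries:
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        result = compute()
        self._entries[key] = result
        return result


if __name__ == "__main__":
    cache = SemanticCache()

    def expensive_scan():
        # Stand-in for a real scan of the data lake.
        return [{"l_quantity": 36, "l_shipmode": "MAIL"}]

    first = cache.get_or_compute(
        "lineitem", ["l_quantity", "l_shipmode"],
        ["l_shipmode = 'MAIL'", "l_quantity > 30"], expensive_scan)
    # Same request with columns and filters listed in a different order.
    second = cache.get_or_compute(
        "lineitem", ["l_shipmode", "l_quantity"],
        ["l_quantity > 30", "l_shipmode = 'MAIL'"], expensive_scan)
    print(first == second, cache.hits, cache.misses)
```

The second request is written differently but means the same thing, so it is
served from the cache instead of triggering another scan.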

As always, your comments and contributions are welcome.

- Hui