Authors
Vaibhav Khadilkar, Kerim Yasin Oktay, Bijit Hore, Murat Kantarcioglu, Sharad Mehrotra, Bhavani Thuraisingham
Publication date
2011/11
Description
This paper explores query processing in a hybrid cloud model where a user's local computing capability is exploited alongside public cloud services to deliver an efficient and secure data management solution. Hybrid clouds offer numerous economic advantages, including the ability to better manage data privacy and confidentiality and to control the monetary expense of consuming cloud services by exploiting local resources. Nonetheless, query processing in hybrid clouds introduces numerous challenges, the foremost of which is how to partition data and computation between the public and private components of the cloud. The solution must account for the characteristics of the workload that will be executed, the monetary costs of acquiring and operating cloud services, and the risks associated with storing sensitive data on a public cloud. This paper proposes a principled framework for distributing data and processing in a hybrid cloud that meets the conflicting goals of performance, disclosure risk, and resource allocation cost. The proposed solution is implemented as an add-on tool for a Hadoop- and Hive-based cloud computing infrastructure.