Authors: Narasimman S, Jayavarman V, Parandhaman P, Vasanth V, Umavathi V
Abstract: Rising storage and computational capacities have led to the accumulation of voluminous datasets. These datasets contain insights that describe natural phenomena, usage patterns, trends, and other aspects of complex, real-world systems. We propose greedy K-NN (K-Nearest Neighbor) data allocation strategies (across the agents) that improve the probability of identifying data leakages. These methods do not rely on alterations of the released data (e.g., watermarks). In some cases, we can also inject “realistic but fake” data records to further improve our chances of detecting leakage and identifying the guilty party. Mining large datasets requires intensive computing resources and data mining expertise, which may be inaccessible to most users. With readily available cloud computing resources, data mining tasks can now be moved to the cloud or outsourced to a third party to save cost. In this new paradigm, data and model confidentiality becomes the major concern for the data owner. Data owners have to understand the possible trade-offs among client-side costs, model quality, and confidentiality to justify outsourcing solutions. In this paper, we propose the RASP-Boost framework to address these problems in confidential cloud-based learning. The RASP-Boost approach works with our previously developed Random Space Data Perturbation (RASP) method to protect data confidentiality, and uses the boosting framework to overcome the difficulty of learning high-quality classifiers from RASP-perturbed data. We develop several cloud-client collaborative boosting algorithms. These algorithms require low client-side computation and communication costs, and the client does not need to stay online while the models are being learned. We have also systematically studied the confidentiality of the data, the model, and the learning process under a realistic security model.