A Utility-Optimized Framework for Personalized Private Histogram Estimation


Abstract: Recently, local differential privacy (LDP), a strong and practical notion, has been applied to privacy issues in data collection. However, existing LDP-based strategies mainly focus on utility optimization at a single privacy level while ignoring the varied privacy preferences of data providers and the multilevel privacy demands for statistics. In this paper, we propose, for the first time, a framework to optimize the utility of histogram estimation under these two privacy requirements. To clarify the goal of privacy protection, we personalize the traditional definition of LDP. We design two independent approaches to minimize the utility loss: Advanced Combination, which composes multilevel results for utility optimization, and Data Recycle with Personalized Privacy, which enlarges the sample size for estimation. We demonstrate their effectiveness on privacy and utility, respectively. Moreover, we embed these approaches within a Recycle and Combination Framework and prove that the framework stably achieves the optimal utility by quantifying its error bounds. On real-world datasets, our approaches are experimentally validated and remarkably outperform baseline methods.


Existing system: According to the procedure of AC, the accuracy of the existing results is the bottleneck for its optimization effect. As shown in Eq. (9), one way to improve the accuracy of the results estimated by PG (i.e., the components that serve AC) is to add more samples to that privacy group. A naive sample expansion scheme (denoted NE) relies on the composition theorem: it randomly splits the privacy budget of the level a user chooses in order to enlarge the average data volume over all levels. The specific algorithm is given in Appendix D. NE undoubtedly achieves __-PVLDP. However, it is not a good mechanism in terms of utility: although the average data volume grows, the volume expansion at some levels comes at the cost of volume reduction at others. We prove this claim in what follows by analyzing its theoretical error and experimental performance.

Proposed system: Local differential privacy (LDP) achieves DP on the user side. The first relevant study dates back to randomized response (RR), proposed by Warner. Based on RR, Duchi et al. design an LDP mechanism for multinomial estimation (i.e., histogram estimation) and prove its utility optimality in the minimax framework. Since this mechanism is simple and utility-optimal, we adopt it as the basic model in our framework (i.e., PG), on which AC, DRPP and RCF are built. Moreover, Bassily et al. propose an LDP protocol for succinct histogram estimation (i.e., SHP) and prove its asymptotic optimality. SHP is a utility optimization method for PG, but it is computationally complex and hard to apply to other estimations. Compared to SHP, our methods perform better and can be extended to other estimations with quantified ℓ2-error.

Advantages: The black column is always the lowest one, which confirms in practice that RCF not only overcomes the weaknesses of DRPP and AC but also retains their advantages in utility optimization. Moreover, the best performance of RCF appears at L8 (Figs. 6h and 7h). Even though DRPP provides relatively little extra derived data at this level, the relatively larger budget still ensures a considerable utility improvement; meanwhile, AC performs well at L8 thanks to the sufficient extra estimated results from other levels. Therefore, the utility peak is located at L8.

Disadvantages: These practical works focus on LDP mechanism design for specific problems and ignore utility. In contrast, the methods proposed in this paper for utility improvement are more general and extend more readily to other problems (since AC works on quantified ℓ2-errors to combine existing results, and DRPP is used to increase the sample size). In this paper, we study the problem of utility optimization for personalized private histogram estimation. We propose an a priori strategy, DRPP, and an a posteriori strategy, AC, to optimize the utility at each privacy level. We theoretically analyze their effectiveness and limitations.

Modules:

Local differential privacy: Local differential privacy (LDP) is a de facto standard for protecting user privacy without relying on any trusted third party. It ensures that (i) the data collector never possesses the raw data; (ii) useful statistics can still be derived from privatized data. This notion opens a win-win prospect for privacy mechanism design: individual privacy is preserved while the utility for statistical analysts is not compromised. Many utility-optimized mechanisms and applications have been proposed with LDP, such as optimal mechanisms for classical estimations under the minimax framework, succinct histogram estimation with low communication, and the practical LDP-based technique RAPPOR. Most of them assume that all users and statistical analysts have identical privacy requirements. However, this assumption is impractical. Owing to different careers and cultural backgrounds, users' privacy preferences can vary for the same information in real scenarios.
For instance, location information is highly sensitive for celebrities, since some irrational fans may use it to stalk them, but it is far less risky for ordinary travelers, who may even share their coordinates with others for personal safety.

Recycle and Combination Framework (RCF):


RCF is designed for histogram estimation (HE) with personalized multilevel privacy. For the first challenge, we propose the Advanced Combination (AC) scheme. It selects multiple estimations from the privacy levels that accord with the credibility of analysts and combines them with reasonable weights. Using a least squares method, we solve for the optimal combination coefficients and prove that our scheme minimizes the utility loss (w.r.t. ℓ2-error). Additionally, AC is a problem-independent scheme for utility optimization and is suitable for various estimations with quantifiable error. To handle the second challenge (i.e., data deficiency caused by personalized privacy), we give a preprocessing strategy named Data Recycle with Personalized Privacy (DRPP). By integrating the technique of data derivation with LDP in a novel way, our strategy can replenish data for a privacy level without additional information, and its effectiveness on privacy and utility can be validated both theoretically and practically.

Utility Improvement with Data Derivation: Our design of Data Recycle with Personalized Privacy (DRPP) shows, for the first time, that statistical utility can be improved by independent sample expansion without extra information or samples. By integrating data derivation techniques with LDP, our strategy can derive different private versions from the one offered by a user (i.e., generate privatized data for a level different from the level the user chose). It satisfies personalized LDP, and its derived versions are valid for the corresponding estimations without extra bias. DRPP opens a new way to optimize the utility of private estimation problems.

Data Recycle with Personalized Privacy: According to the procedure of AC, the accuracy of the existing results is the bottleneck for its optimization effect. As shown in Eq. (9), one way to improve the accuracy of the results estimated by PG (i.e., the components that serve AC) is to add more samples to that privacy group.
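To see why the accuracy of the per-level results limits what a combination step can achieve, the following is a minimal sketch, assuming independent, unbiased per-level estimates with known variances. Under those assumptions the least-squares-optimal weights are inverse-variance weights. It is an illustration only, not the paper's AC procedure or its Eq. (9); the function name `combine` and all numbers are hypothetical.

```python
def combine(estimates, variances):
    """Combine unbiased, independent estimates of one quantity.

    Weights are proportional to 1/variance and sum to 1, which minimizes
    the variance of the weighted sum under the independence assumption.
    Returns (combined_estimate, combined_variance, weights).
    """
    if len(estimates) != len(variances) or not estimates:
        raise ValueError("need one variance per estimate")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")
    precisions = [1.0 / v for v in variances]
    total = sum(precisions)
    weights = [p / total for p in precisions]
    combined = sum(w * e for w, e in zip(weights, estimates))
    combined_var = 1.0 / total  # variance of the weighted sum
    return combined, combined_var, weights


# Hypothetical estimates of one histogram bin from three privacy levels,
# with assumed (made-up) variances for each level.
est, var, w = combine([0.30, 0.26, 0.34], [0.004, 0.001, 0.016])
```

In this sketch the combined variance is 1 / (1/0.004 + 1/0.001 + 1/0.016), about 0.00076, which is below the smallest component variance but only modestly so. The gain is driven by how accurate the components already are, which is why enlarging the sample behind each component matters. If the per-level estimates are correlated, as they may be when data is reused across levels, the independence assumption fails and the weights would need the full covariance.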
A naive sample expansion scheme (denoted NE) relies on the composition theorem: it randomly splits the privacy budget of the level a user chooses in order to enlarge the average data volume over all levels. The specific algorithm is given in Appendix D. NE undoubtedly achieves __-PVLDP. However, it is not a good mechanism in terms of utility: although the average data volume grows, the volume expansion at some levels comes at the cost of volume reduction at others. We prove this claim in what follows by analyzing its theoretical error and experimental performance. In this section, inspired by the method proposed by Xiao et al., we propose a strategy, Data Recycle with Personalized Privacy (DRPP), which generates multiple private versions by reusing the one provided by a user. It increases the sample size for a level without harming others and stably improves the estimation accuracy.
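The idea of generating another private version from a report the user has already provided can be sketched for binary randomized response. The text above does not spell out DRPP's construction, so what follows is only an assumed illustration: a report produced with budget `eps_high` is flipped once more so that the result is distributed exactly like a randomized response with a smaller budget `eps_low`. All function names and parameter values are hypothetical.

```python
import math
import random


def keep_prob(eps):
    """Probability that binary randomized response reports the true bit."""
    return math.exp(eps) / (1.0 + math.exp(eps))


def perturb(bit, eps, rng):
    """Warner-style randomized response on a single bit (0 or 1)."""
    return bit if rng.random() < keep_prob(eps) else 1 - bit


def second_stage_keep(eps_high, eps_low):
    """Keep probability q for re-randomizing an eps_high report so that the
    overall keep probability equals keep_prob(eps_low).

    Solves p_high*q + (1 - p_high)*(1 - q) = p_low for q.
    """
    if not 0 < eps_low <= eps_high:
        raise ValueError("require 0 < eps_low <= eps_high")
    p_high, p_low = keep_prob(eps_high), keep_prob(eps_low)
    return (p_high + p_low - 1.0) / (2.0 * p_high - 1.0)


def derive(report, eps_high, eps_low, rng):
    """Derive an eps_low-level report using only the eps_high-level report."""
    q = second_stage_keep(eps_high, eps_low)
    return report if rng.random() < q else 1 - report


def estimate_frequency(reports, eps):
    """Unbiased estimate of the fraction of 1s from randomized reports."""
    p = keep_prob(eps)
    observed = sum(reports) / len(reports)
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)


rng = random.Random(0)
truth = [1] * 3000 + [0] * 7000              # true frequency of 1s is 0.3
high = [perturb(b, 2.0, rng) for b in truth]
low = [derive(r, 2.0, 0.5, rng) for r in high]
print(estimate_frequency(high, 2.0), estimate_frequency(low, 0.5))
```

Because `derive` never touches the raw bit, the derived report is post-processing of the original one, so releasing both costs no more than `eps_high` in this sketch, and no budget has to be split as in NE. The derived reports are correlated with the originals, which any later combination of the two levels would have to account for.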

