Local differential privacy

From HandWiki
Revision as of 18:54, 8 February 2024 by Rtextdoc (talk | contribs) (simplify)

Local differential privacy (LDP) is a model of differential privacy with the added requirement that even if an adversary has access to the individual responses a user contributed to the database, the adversary will still be unable to learn much about that user's personal data. This contrasts with global differential privacy, a model that incorporates a central aggregator with access to the raw data.[1] LDP addresses the concern that data fusion and analysis techniques can expose individuals to attacks and disclosures. It is a well-known privacy model for distributed architectures that aims to provide privacy guarantees for each user while data are collected and analyzed, protecting against privacy leaks on both the client and the server.[2] LDP has been widely adopted to alleviate contemporary privacy concerns in the era of big data.[3]

History

In 2003, Alexandre V. Evfimievski, Johannes Gehrke, and Ramakrishnan Srikant[4] gave a definition equivalent to local differential privacy. In 2008, Kasiviswanathan et al.[5] gave a formal definition conforming with the now-standard definition of differential privacy.

The prototypical example of a mechanism with local differential privacy is the randomized response survey technique proposed by Stanley L. Warner in 1965.[6] Warner's innovation was the introduction of the “untrusted curator” model, where the entity collecting the data may not be trustworthy. Before users' responses are sent to the curator, the answers are randomized in a controlled manner guaranteeing differential privacy while still allowing valid population-wide statistical inferences.
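The randomization step and the population-level inference described above can be illustrated with a minimal Python sketch. It uses the common ε-parameterized form of randomized response for a single yes/no question, which is an assumption made for illustration and not Warner's original spinner-based procedure; the function names are hypothetical.

```python
import math
import random


def randomized_response(truth, epsilon, rng):
    """Report the true bit with probability e^eps / (1 + e^eps), else flip it."""
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return truth if rng.random() < p_truth else 1 - truth


def estimate_proportion(reports, epsilon):
    """Debias the observed fraction of 1s to estimate the true proportion.

    If pi is the true proportion and p the probability of a truthful report,
    the expected observed fraction is pi * (2p - 1) + (1 - p); solve for pi.
    """
    p = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    observed = sum(reports) / len(reports)
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)


if __name__ == "__main__":
    rng = random.Random(42)
    truths = [1] * 3000 + [0] * 7000          # illustrative population
    eps = math.log(3)                         # truthful with probability 0.75
    reports = [randomized_response(t, eps, rng) for t in truths]
    print(estimate_proportion(reports, eps))  # close to 0.3 for large samples
```

Each individual report is plausibly deniable, yet the curator can still recover the population proportion because the flipping probability is public and its effect can be inverted on average.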

Applications

The era of big data exhibits high demand for machine learning services that provide privacy protection for users. Demand for such services has pushed research into algorithmic paradigms that provably satisfy specific privacy requirements.

Anomaly Detection

Anomaly detection is formally defined as the process of identifying unexpected items or events in data sets.[7] The rise of social networking has raised many concerns related to information privacy. As more users rely on social networks, they are increasingly threatened by privacy breaches, unauthorized access to personal information, and leakage of sensitive data. To address this issue, the authors of "Anomaly Detection over Differential Preserved Privacy in Online Social Networks" propose a privacy-preserving model that sanitizes the collection of user information from a social network using restricted local differential privacy (LDP) and saves synthetic copies of the collected data. The model uses the reconstructed data to classify user activity and detect abnormal network behavior. The experimental results indicate that the proposed method achieves high data utility together with improved privacy preservation, and that data sanitized with local differential privacy remain suitable for subsequent analyses: anomaly detection on the reconstructed data achieves a detection accuracy similar to that on the original data.[8]

Blockchain Technology

Potential combinations of blockchain technology with local differential privacy have received research attention. Blockchains implement distributed, secured, and shared ledgers used to record and track data within a decentralized network, and they have successfully replaced certain prior systems of economic transactions within and between organizations. Increased usage of blockchains has raised some questions regarding privacy and security of data they store, and local differential privacy of various kinds has been proposed as a desirable property for blockchains containing sensitive data.[9]

Context-Free Privacy

Local differential privacy provides context-free privacy even in the absence of a trusted data collector, though often at the expense of a significant drop in utility. The classical definition of LDP assumes that all elements in the data domain are equally sensitive. However, in many applications, some symbols are more sensitive than others. A context-aware framework of local differential privacy[10] can allow a privacy designer to incorporate the application’s context into the privacy definition. For binary data domains, algorithmic research has provided a universally optimal privatization scheme and highlighted its connections to Warner’s randomized response[11] (RR) and Mangat’s improved response. For k-ary data domains, motivated by geolocation and web search applications, researchers have considered at least two special cases of context-aware LDP: block-structured LDP and high-low LDP (the latter is also defined in [12]). The research has provided communication-efficient, sample-optimal schemes and information theoretic lower bounds for both models.

Facial Recognition

Facial recognition, though convenient, can potentially lead to a leak of biometric features that identify the user

Facial recognition has become increasingly widespread in recent years. Recent smartphones, for example, use facial recognition to unlock the user's phone and to authorize payments with their credit card. Though convenient, this poses privacy concerns. Face recognition is a resource-intensive task that often involves third parties, which opens a gap where the user’s privacy could be compromised. Biometric information delivered to untrusted third-party servers in an uncontrolled manner can constitute a significant privacy leak, as biometrics can be correlated with sensitive data such as healthcare or financial records. Chamikara et al. propose a privacy-preserving technique for “controlled information release” that disguises the original face image and prevents leakage of biometric features while still identifying the person. They introduce a privacy-preserving face recognition protocol named PEEP (Privacy using Eigenface Perturbation) that utilizes local differential privacy. PEEP perturbs Eigenfaces using differential privacy and stores only the perturbed data on third-party servers, which run a standard Eigenface recognition algorithm. As a result, the trained model is not vulnerable to privacy attacks such as membership inference and model memorization attacks.[13] This work illustrates a potential solution to the problem of biometric privacy leaks.

Federated Learning (FL)

Researchers have found federated learning coupled with local differential privacy to be effective for facilitating crowdsourcing applications while protecting users' privacy.

Federated learning aims to protect data privacy through distributed learning methods that keep the data where they are stored. Likewise, differential privacy (DP) aims to improve the protection of data privacy by measuring the privacy loss in the communication among the elements of federated learning. The natural fit of federated learning and differential privacy to the challenges of data privacy protection has led to the release of several software tools that support their functionalities, but these tools lack a unified vision of the techniques and a methodological workflow that supports their usage. In a study sponsored by the Andalusian Research Institute in Data Science and Computational Intelligence, researchers developed Sherpa.ai FL, an open-research unified FL and DP framework that aims to foster the research and development of AI services at the edges and to preserve data privacy. The characteristics of FL and DP tested and summarized in the study suggest that they are good candidates to support AI services at the edges while preserving data privacy. In particular, the study finds that setting [math]\displaystyle{ \epsilon }[/math] to lower values guarantees higher privacy at the cost of lower accuracy.[14]
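The trade-off between [math]\displaystyle{ \epsilon }[/math] and accuracy can be made concrete with a small Python sketch. It is not the Sherpa.ai FL API: the function names, the L1 clipping bound, and the choice of the Laplace mechanism are all assumptions made for illustration. Each client clips its model update and adds Laplace noise before sending it, and the server only averages the noisy updates.

```python
import random


def clip_l1(update, bound):
    """Scale the update so that its L1 norm is at most `bound`."""
    norm = sum(abs(v) for v in update)
    if norm <= bound or norm == 0.0:
        return list(update)
    return [v * bound / norm for v in update]


def laplace_sample(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatize_update(update, epsilon, bound, rng):
    """Clip, then add per-coordinate Laplace noise of scale 2 * bound / epsilon.

    Any two clipped updates differ by at most 2 * bound in L1 norm, so this
    noise scale makes a single report epsilon-locally differentially private.
    """
    scale = 2.0 * bound / epsilon
    return [v + laplace_sample(scale, rng) for v in clip_l1(update, bound)]


def aggregate(noisy_updates):
    """Server side: coordinate-wise mean of the clients' noisy updates."""
    n = len(noisy_updates)
    dim = len(noisy_updates[0])
    return [sum(u[i] for u in noisy_updates) / n for i in range(dim)]


if __name__ == "__main__":
    rng = random.Random(0)
    clients = [[0.2, -0.1]] * 5000            # illustrative identical updates
    for eps in (0.5, 2.0, 8.0):
        noisy = [privatize_update(u, eps, 1.0, rng) for u in clients]
        print(eps, aggregate(noisy))
```

The noise scale is inversely proportional to [math]\displaystyle{ \epsilon }[/math], so smaller values hide each client's update more thoroughly but make the averaged update noisier, which is the privacy and accuracy trade-off the study reports.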

Health Data Aggregation

The rise of technology has changed not only the way we work and live but also the health industry, a change made prominent by the big data era. With the rapid growth in the scale of health data, the limited storage and computation resources of wireless body area sensor networks are becoming a barrier to the development of the health industry. To address this, outsourcing encrypted health data to the cloud has been an appealing strategy. However, it has potential downsides: data aggregation becomes more difficult, and patients' sensitive information becomes more vulnerable to data breaches. In the article "Privacy-Enhanced and Multifunctional Health Data Aggregation under Differential Privacy Guarantees", Hao Ren and his team propose a privacy-enhanced and multifunctional health data aggregation scheme (PMHA-DP) under differential privacy. The aggregation function is designed to protect the aggregated data from cloud servers. The performance evaluation in their study shows that the proposal leads to less communication overhead than existing data aggregation models.[15]

Internet Connected Vehicles

Internet access in one's car would have seemed a dream during the last century, yet most recent vehicles now include this feature for the convenience of their users. Though convenient, it poses yet another threat to the user's privacy. The Internet of connected vehicles (IoV) is expected to enable intelligent traffic management, intelligent dynamic information services, intelligent vehicle control, and more. However, vehicles’ data privacy is argued to be a major barrier to the application and development of IoV, and it has therefore attracted wide attention. Local differential privacy (LDP), described in this literature as a relaxed version of the differential privacy standard, can protect users’ data privacy against an untrusted third party in the worst adversarial setting. The computational cost of LDP is one concern among researchers, as it is expensive to implement in a setting that requires high mobility and short connection times.[16] Furthermore, as the number of vehicles increases, the frequent communication between vehicles and the cloud server incurs unexpected amounts of communication cost. To avoid the privacy threat and reduce the communication cost, researchers propose integrating federated learning with LDP to facilitate crowdsourcing applications that train the machine learning model.[17]

Phone Blacklisting

LDP-based systems have been shown to counter the ever-growing volume of spam calls while protecting users' privacy.

Spam phone calls have become an increasingly relevant problem and a growing nuisance, and researchers have been looking at ways to minimize it. To counter this increasingly successful attack vector, federal agencies such as the US Federal Trade Commission (FTC) have been working with telephone carriers to design systems for blocking robocalls. Furthermore, a number of commercial smartphone apps that promise to block spam phone calls have been created, but they come with a subtle cost: the private information that users expose when granting an app the access it needs to block spam calls may be leaked without their consent or knowledge. In the study,[18] the researchers analyze the challenges and trade-offs related to using local differential privacy, evaluate the LDP-based system on real-world user-reported call records collected by the FTC, and show that it is possible to learn a phone blacklist using a reasonable overall privacy budget while preserving users’ privacy and maintaining utility for the learned blacklist.

Trajectory Cross-Correlation Constraint

To address the problem of low data utility under privacy protection, Hu and Yang propose a personalized differential privacy protection method based on cross-correlation constraints. By protecting sensitive location points on the trajectory, this extended differential privacy model combines the sensitivity of the user’s trajectory locations with the user's privacy protection requirements and privacy budget. Using an autocorrelation Laplace transform, white noise is transformed into noise that is correlated with the user's real trajectory sequence in both time and space. This noise is used to establish the cross-correlation constraint mechanism of the trajectory sequence in the model. The method addresses the problem that adding independent, uncorrelated noise with the same degree of scrambling results in low privacy protection and poor data availability.[19]

ε-local differential privacy

Definition of ε-local differential privacy

Let ε be a positive real number and [math]\displaystyle{ \mathcal{A} }[/math] be a randomized algorithm that takes a user's private data as input. Let [math]\displaystyle{ \textrm{im} \mathcal{A} }[/math] denote the image of [math]\displaystyle{ \mathcal{A} }[/math]. The algorithm [math]\displaystyle{ \mathcal{A} }[/math] is said to provide [math]\displaystyle{ \epsilon }[/math]-local differential privacy if, for all pairs of users' possible private data [math]\displaystyle{ x }[/math] and [math]\displaystyle{ x^\prime }[/math] and all subsets [math]\displaystyle{ S }[/math] of [math]\displaystyle{ \textrm{im} \mathcal{A} }[/math]:

[math]\displaystyle{ \Pr[\mathcal{A}(x) \in S] \leq e^{\epsilon} \times \Pr[\mathcal{A}(x^\prime) \in S], }[/math]

where the probability is taken over the random measure implicit in the algorithm.
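For a mechanism with finitely many inputs and outputs, the inequality above can be checked directly. The following Python sketch is an illustration only: the function name and the table representation of the mechanism are hypothetical. It computes the smallest ε for which a given mechanism satisfies the definition.

```python
import math


def ldp_epsilon(channel):
    """Smallest epsilon for which a finite mechanism is epsilon-LDP.

    `channel[x][y]` is Pr[A(x) = y]. With finitely many outputs it suffices
    to compare single outputs y, because the bound for any subset S follows
    by summing the per-output bounds. Returns math.inf when some output is
    possible under one input but impossible under another.
    """
    worst = 0.0
    for y in range(len(channel[0])):
        probs = [row[y] for row in channel]
        high, low = max(probs), min(probs)
        if high == 0.0:
            continue                  # output never occurs; no constraint
        if low == 0.0:
            return math.inf           # ratio is unbounded
        worst = max(worst, math.log(high / low))
    return worst


if __name__ == "__main__":
    # Randomized response that tells the truth with probability 0.75.
    rr = [[0.75, 0.25],
          [0.25, 0.75]]
    print(ldp_epsilon(rr))            # ln 3, about 1.0986
    # Reporting the true value unchanged offers no local privacy.
    print(ldp_epsilon([[1.0, 0.0], [0.0, 1.0]]))  # inf
```

A mechanism that ignores its input entirely (all rows equal) yields ε = 0, the strongest guarantee and the least useful output.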

The main difference between this definition and the definition of standard (global) differential privacy is that in the standard definition the probabilities concern the outputs of an algorithm that takes all users' data as input, whereas here they concern an algorithm that takes a single user's data.

Other formal definitions of local differential privacy concern algorithms that take all users' data as input and output a collection of all responses (such as the definition in Raef Bassily, Kobbi Nissim, Uri Stemmer and Abhradeep Guha Thakurta's 2017 paper[20]).

Deployment

Algorithms guaranteeing local differential privacy have been deployed by several internet companies:

  • RAPPOR,[21] where Google ensured local differential privacy while collecting data from users about running processes and Chrome home pages
  • Private Count Mean Sketch (and variants)[22] where Apple ensured local differential privacy while collecting emoji usage data, word usage, and other information from iPhone users
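Both deployments collect frequency statistics over many possible values. The Python sketch below shows a much simpler frequency estimator in the same spirit, based on one-hot encoding with independent per-bit randomized response. It is an illustrative assumption, not the RAPPOR algorithm (which adds Bloom filters and two levels of randomization) and not Apple's sketch-based method; all names are hypothetical.

```python
import math
import random


def keep_probability(epsilon):
    """Probability of reporting a bit unchanged.

    Two one-hot vectors differ in two bits, so each bit is given a budget
    of epsilon / 2 and the whole report satisfies epsilon-LDP.
    """
    half = math.exp(epsilon / 2.0)
    return half / (1.0 + half)


def perturb_one_hot(value, domain_size, epsilon, rng):
    """One-hot encode `value`, then keep each bit or flip it independently."""
    p = keep_probability(epsilon)
    bits = [1 if i == value else 0 for i in range(domain_size)]
    return [b if rng.random() < p else 1 - b for b in bits]


def estimate_counts(reports, epsilon):
    """Unbiased estimate of how many users hold each value.

    The expected column sum is c * p + (n - c) * q with q = 1 - p,
    which is inverted to recover the true count c.
    """
    p = keep_probability(epsilon)
    q = 1.0 - p
    n = len(reports)
    sums = [sum(r[i] for r in reports) for i in range(len(reports[0]))]
    return [(s - n * q) / (p - q) for s in sums]


if __name__ == "__main__":
    rng = random.Random(1)
    values = [0] * 5000 + [1] * 3000 + [2] * 2000   # value 3 is never used
    reports = [perturb_one_hot(v, 4, 2.0, rng) for v in values]
    print(estimate_counts(reports, 2.0))
```

The estimates can be negative or fractional for rare values, since the debiasing step subtracts the expected contribution of flipped bits; production systems typically post-process such estimates.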

References

  1. "Local vs. global differential privacy – Ted is writing things". https://desfontain.es/privacy/local-global-differential-privacy.html. 
  2. Joseph, Matthew; Roth, Aaron; Ullman, Jonathan; Waggoner, Bo (2018-11-19). "Local Differential Privacy for Evolving Data". arXiv:1802.07128 [cs.LG].
  3. Wang, Teng; Zhang, Xuefeng; Feng, Jingyu; Yang, Xinyu (2020-12-08). "A Comprehensive Survey on Local Differential Privacy toward Data Statistics and Analysis". Sensors (Basel, Switzerland) 20 (24): 7030. doi:10.3390/s20247030. ISSN 1424-8220. PMID 33302517. Bibcode2020Senso..20.7030W. 
  4. Evfimievski, Alexandre V.; Gehrke, Johannes; Srikant, Ramakrishnan (June 9–12, 2003). "Limiting privacy breaches in privacy preserving data mining". Proceedings of the Twenty-Second ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems. pp. 211–222. doi:10.1145/773153.773174. ISBN 1-58113-670-6. 
  5. Kasiviswanathan, Shiva Prasad; Lee, Homin K.; Nissim, Kobbi; Raskhodnikova, Sofya; Smith, Adam D. (2008). "What Can We Learn Privately?". 2008 49th Annual IEEE Symposium on Foundations of Computer Science. pp. 531–540. doi:10.1109/FOCS.2008.27. ISBN 978-0-7695-3436-7. 
  6. Warner, Stanley L. (1965). "Randomized Response: A Survey Technique for Eliminating Evasive Answer Bias". Journal of the American Statistical Association 60 (309): 63–69. doi:10.1080/01621459.1965.10480775. PMID 12261830. 
  7. "Medium" (in en). 2 July 2019. https://towardsdatascience.com/anomaly-detection-for-dummies-15f148e559c1?gi=709c2cece27e. 
  8. Aljably, Randa; Tian, Yuan; Al-Rodhaan, Mznah; Al-Dhelaan, Abdullah (2019-04-25). "Anomaly detection over differential preserved privacy in online social networks". PLOS ONE 14 (4): e0215856. doi:10.1371/journal.pone.0215856. ISSN 1932-6203. PMID 31022238. Bibcode2019PLoSO..1415856A. 
  9. Ul Hassan, Muneeb; Rehmani, Mubashir Husain; Chen, Jinjun (2020-11-01). "Differential privacy in blockchain technology: A futuristic approach" (in en). Journal of Parallel and Distributed Computing 145: 50–74. doi:10.1016/j.jpdc.2020.06.003. ISSN 0743-7315. https://www.sciencedirect.com/science/article/abs/pii/S0743731520303105. 
  10. Acharya, Jayadev; Bonawitz, Kallista; Kairouz, Peter; Ramage, Daniel; Sun, Ziteng (2020-11-21). "Context Aware Local Differential Privacy" (in en). International Conference on Machine Learning (PMLR): 52–62. http://proceedings.mlr.press/v119/acharya20a.html. 
  11. Kim, Jong-Min; Warde, William D. (2004-02-15). "A stratified Warner's randomized response model" (in en). Journal of Statistical Planning and Inference 120 (1–2): 155–165. doi:10.1016/S0378-3758(02)00500-1. ISSN 0378-3758. https://www.sciencedirect.com/science/article/abs/pii/S0378375802005001. 
  12. Murakami, Takao; Kawamoto, Yusuke (2019). "Utility-Optimized Local Differential Privacy Mechanisms for Distribution Estimation" (in en). Proceedings of the 28th USENIX Security Symposium: 1877–1894. https://www.usenix.org/system/files/sec19-murakami.pdf. 
  13. Chamikara, M.A.P.; Bertok, P.; Khalil, I.; Liu, D.; Camtepe, S. (2020-10-01). "Privacy Preserving Face Recognition Utilizing Differential Privacy" (in en). Computers & Security 97: 101951. doi:10.1016/j.cose.2020.101951. ISSN 0167-4048. https://www.sciencedirect.com/science/article/pii/S0167404820302273. 
  14. Rodríguez-Barroso, Nuria; Stipcich, Goran; Jiménez-López, Daniel; Antonio Ruiz-Millán, José; Martínez-Cámara, Eugenio; González-Seco, Gerardo; Luzón, M. Victoria; Veganzones, Miguel Ángel et al. (2020). "Federated Learning and Differential Privacy: Software Tools Analysis, the Sherpa.ai FL Framework and Methodological Guidelines for Preserving Data Privacy.". Information Fusion 64: 270–92. doi:10.1016/j.inffus.2020.07.009. 
  15. Ren, Hao; Li, Hongwei; Liang, Xiaohui; He, Shibo; Dai, Yuanshun; Zhao, Lian (2016-09-10). "Privacy-Enhanced and Multifunctional Health Data Aggregation under Differential Privacy Guarantees". Sensors (Basel, Switzerland) 16 (9): 1463. doi:10.3390/s16091463. ISSN 1424-8220. PMID 27626417. Bibcode2016Senso..16.1463R. 
  16. Zhao, Ping; Zhang, Guanglin; Wan, Shaohua; Liu, Gaoyang; Umer, Tariq (2020-11-01). "A survey of local differential privacy for securing internet of vehicles". The Journal of Supercomputing 76 (11): 8391–8412. doi:10.1007/s11227-019-03104-0. https://www.researchgate.net/publication/337842824. 
  17. Zhao, Yang; Zhao, Jun; Yang, Mengmeng; Wang, Teng; Wang, Ning; Lyu, Lingjuan; Niyato, Dusit; Lam, Kwok-Yan (2020-11-10). "Local Differential Privacy based Federated Learning for Internet of Things". IEEE Internet of Things Journal PP (11): 8836–8853. doi:10.1109/JIOT.2020.3037194. https://www.researchgate.net/publication/346833990. 
  18. Ucci, Daniele; Perdisci, Roberto; Lee, Jaewoo; Ahamad, Mustaque (2020-06-01). "Towards a Practical Differentially Private Collaborative Phone Blacklisting System". Annual Computer Security Applications Conference. pp. 100–115. doi:10.1145/3427228.3427239. ISBN 978-1-4503-8858-0. 
  19. Hu, Zhaowei; Yang, Jing (2020-08-12). "Differential privacy protection method based on published trajectory cross-correlation constraint" (in en). PLOS ONE 15 (8): e0237158. doi:10.1371/journal.pone.0237158. ISSN 1932-6203. PMID 32785242. Bibcode2020PLoSO..1537158H. 
  20. Bassily, Raef; Nissim, Kobbi; Stemmer, Uri; Thakurta, Abhradeep Guha (2017). "Privacy Aware Learning". Practical Locally Private Heavy Hitters. Advances in Neural Information Processing Systems. 30. pp. 2288–2296. Bibcode2017arXiv170704982B. http://papers.nips.cc/paper/4505-privacy-aware-learning. 
  21. Erlingsson, Úlfar; Pihur, Vasyl; Korolova, Aleksandra (2014). RAPPOR: Randomized Aggregatable Privacy-Preserving Ordinal Response. doi:10.1145/2660267.2660348. Bibcode2014arXiv1407.6981E. https://ai.google/research/pubs/pub42852. 
  22. Learning with Privacy at Scale. 2017. https://machinelearning.apple.com/2017/12/06/learning-with-privacy-at-scale.html.