Adversarial stylometry

From HandWiki
Short description: Textual anonymisation techniques


Adversarial stylometry is the practice of altering writing style to reduce the potential for stylometry to discover the author's identity or their characteristics. This task is also known as authorship obfuscation or authorship anonymisation. Stylometry poses a significant privacy challenge in its ability to unmask anonymous authors or to link pseudonyms to an author's other identities, which, for example, creates difficulties for whistleblowers, activists, hoaxers, and fraudsters. The privacy risk is expected to grow as machine learning techniques and text corpora develop.

All adversarial stylometry shares the core idea of faithfully paraphrasing the source text so that the meaning is unchanged but the stylistic signals are obscured. Such a faithful paraphrase is an adversarial example for a stylometric classifier. Several broad approaches to this exist, with some overlap: imitation, replacing the author's own style with another's; translation, applying machine translation in the hope that this eliminates characteristic style in the source text; and obfuscation, deliberately modifying a text's style so that it does not resemble the author's own.

Manually obscuring style is possible but laborious; in some circumstances, it is preferable or necessary. Automated tooling, either semi- or fully automatic, could assist an author. How best to perform the task and how to design such tools are open research questions. While some approaches have been shown to defeat particular stylometric analyses, particularly those that do not account for the potential of adversariality, establishing safety in the face of unknown analyses remains difficult. Ensuring the faithfulness of the paraphrase is a critical challenge for automated tools.

It is uncertain if the practice of adversarial stylometry is detectable in itself. Some studies have found that particular methods produced signals in the output text, but a stylometrist who is uncertain of what methods may have been used may not be able to reliably detect them.

History

Rao & Rohatgi (2000), an early work in adversarial stylometry,[1] identified machine translation as a possibility, but noted that the quality of translators available at the time presented severe challenges.[2] Kacmarcik & Gamon (2006) is another early work. Brennan, Afroz & Greenstadt (2012) performed the first evaluation of adversarial stylometric methods on actual texts.[1]

Brennan & Greenstadt introduced the first corpus of adversarially authored texts specifically for evaluating stylometric methods;[3] other corpora include the International Imitation Hemingway Competition, the Faux Faulkner contest, and the hoax blog A Gay Girl in Damascus.[4]

Motivations

Rao & Rohatgi (2000) suggest that short, unattributed documents (i.e., anonymous posts) are not at risk of stylometric identification, but pseudonymous authors who have produced corpora of thousands of words without practising adversarial stylometry may be vulnerable.[5] Narayanan et al. (2012) attempted large-scale deanonymisation of 100,000 blog authors with mixed results: the identifications were significantly better than chance, but matched blog and author correctly only a fifth of the time;[6] identification improved with the number of posts written by the author in the corpus.[7] Even if an author is not identified, some of their characteristics may still be deduced stylometrically,[8] or stylometry may narrow the anonymity set of potential authors sufficiently for other information to complete the identification.[7] Detecting author characteristics (e.g., gender or age) is often simpler than identifying an author from a large, possibly open, set of candidates.[9]

Modern machine learning techniques offer powerful tools for identification;[10] further development of corpora and computational stylometric techniques is likely to raise further privacy issues.[11] Gröndahl & Asokan (2020a) say that the general validity of the hypothesis underlying stylometry (that authors have invariant, content-independent 'style fingerprints') is uncertain, but "the deanonymisation attack is a real privacy concern".[12]

Those interested in practising adversarial stylometry and stylistic deception include whistleblowers avoiding retribution;[13] journalists and activists;[10] perpetrators of frauds and hoaxes;[14] authors of fake reviews;[15] literary forgers;[16] criminals disguising their identity from investigators;[17] and, generally, anyone with a desire for anonymity or pseudonymity.[13] Authors, or agents acting on behalf of authors, may also attempt to remove stylistic clues to author characteristics (e.g., race or gender) so that knowledge of those characteristics cannot be used for discrimination (e.g., through algorithmic bias).[18][19] Another possible use for adversarial stylometry is in disguising automatically generated text as human-authored.[20]

Methods

With imitation, the author attempts to mislead stylometry by matching their style to another author's.[21] An incomplete imitation, where some of the true author's unique characteristics appear alongside the imitated author's, can be a detectable signal for the use of adversarial stylometry.[22] Imitation can be performed automatically with style transfer systems, though this typically requires a large corpus in the target style for the system to learn from.[23]

Another approach is translation, which employs machine translation of a source text to eliminate characteristic style, often through multiple translators in sequence to produce a round-trip translation. Such chained translation can lead to texts being significantly altered, even to the point of incomprehensibility; improved translation tools reduce this risk. More simply structured texts can be easier to machine translate without losing the original meaning.[21] Machine translation blurs into direct stylistic imitation or obfuscation achieved through automated style transfer, which can be viewed as a "translation" with the same language as input and output.[24][25] With low-quality translation tools, an author may need to correct major translation errors manually while avoiding the hazard of re-introducing stylistic characteristics.[2] Wang, Juola & Riddell (2022) found that gross errors introduced by Google Translate were rare, though more common with several intermediate translations. However, occasional simple or short sentences and misspellings in the source text appeared verbatim in the output, potentially providing an identifying signal.[26] Chain translation can leave characteristic traces of its application in a document, which may allow reconstruction of the intermediate languages used and the number of translation steps performed.[23]
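
A round-trip chain can be sketched as a fold over consecutive language pairs. In the Python sketch below, `translate` is a hypothetical callable standing in for whatever machine-translation backend is available (no real API is assumed), and `verbatim_sentences` is an illustrative check for source sentences that survive the chain unchanged, the kind of leak described above. The sentence splitter is deliberately naive.

```python
import re
from typing import Callable, List

# Hypothetical backend signature: (text, source_lang, target_lang) -> text.
Translator = Callable[[str, str, str], str]


def round_trip(text: str, chain: List[str], translate: Translator) -> str:
    """Translate through each language in `chain`, then back to the first."""
    if len(chain) < 2:
        raise ValueError("need a source language and at least one intermediate")
    hops = chain + [chain[0]]
    for src, dst in zip(hops, hops[1:]):
        text = translate(text, src, dst)
    return text


def split_sentences(text: str) -> List[str]:
    """Naive split after '.', '!' or '?'; adequate for a sketch only."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def verbatim_sentences(source: str, output: str) -> List[str]:
    """Source sentences that appear unchanged in the round-tripped output."""
    survived = set(split_sentences(output))
    return [s for s in split_sentences(source) if s in survived]
```

Any sentence returned by `verbatim_sentences` is a candidate for manual rewording before publication, since it carries the author's original wording through untouched.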

Obfuscation involves deliberately changing the style of a text to reduce its similarity to other texts by some metric; this may be performed at the time of writing by conscious modification, or as part of a revision process in which feedback from the targeted metric is used to decide when the text has been sufficiently obfuscated. In contrast to translation, complex texts can offer more opportunities for effective obfuscation without altering meaning,[27] and likewise genres with more permissible variation allow more obfuscation.[28] However, longer texts are harder to thoroughly obfuscate.[29] Obfuscation can blend into imitation if the author develops a novel target style, distinct from their original style.[30] With respect to masking author characteristics, obfuscation may aim to achieve a union (adding signals for imitated characteristics) or an intersection (removing signals and normalising) of other authors' styles.[31] Avoiding the author's own idiosyncrasies and producing a "normalised" text is a critical obfuscatory step: an author may have a unique tendency to misspell certain words, use particular variants, or format a document in a characteristic way.[2][32] Stylometric signals vary in how simply they can be adversarially masked; an author may easily change their vocabulary by conscious choice, but altering the pattern of grammar or the letter frequency in their text may be harder to achieve, though Juola & Vescovi (2011) report that imitation typically succeeds at masking more characteristics than obfuscation.[33] Automated obfuscation may require large amounts of training data written by the author.[29]
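
The normalisation step can be illustrated with a small rewriting pass. This is a minimal sketch, assuming the author has already compiled a table of their own habits; the `habits` mapping (lower-case idiosyncratic form to standard form) and the function name `normalise` are hypothetical, and building such a table reliably is the hard part that the sketch does not address.

```python
import re
from typing import Dict


def normalise(text: str, habits: Dict[str, str]) -> str:
    """Replace each known idiosyncratic form with its standard form.

    `habits` keys must be lower-case; matching is case-insensitive and
    restricted to whole words.
    """
    if not habits:
        return text
    # Longest keys first so that overlapping habits prefer the longer match.
    keys = sorted(habits, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(k) for k in keys) + r")\b",
        re.IGNORECASE,
    )

    def swap(match):
        found = match.group(0)
        standard = habits[found.lower()]
        # Keep a leading capital so sentence starts stay well-formed.
        if found[0].isupper():
            return standard[:1].upper() + standard[1:]
        return standard

    return pattern.sub(swap, text)
```

For example, with `habits = {"alot": "a lot", "recieve": "receive"}`, the input `"Alot of people recieve mail."` becomes `"A lot of people receive mail."`. A pass like this only removes signals the author already knows about; it gives no assurance about the rest.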

Two possible automated implementations of adversarial stylometry are rule-based systems for paraphrasing and encoder–decoder architectures, where the text passes through an intermediate format that is (intended to be) style-neutral.[34] Another division in automated methods is whether or not there is feedback from an identification system.[35] With such feedback, finding paraphrases for author masking has been characterised as a heuristic search problem: textual variants are explored until the result is stylistically sufficiently far from the author's style (in the case of obfuscation) or sufficiently near the target style (in the case of imitation), at which point it constitutes an adversarial example for that identification system.[36][37]
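
The search formulation with feedback can be sketched as a greedy loop. The Python below is an illustration under strong simplifying assumptions, not any published system: the "identification system" is a toy profile of function-word frequencies, the paraphrase rules are single-word substitutions supplied by the caller, and all names (`profile`, `distance`, `obfuscate`, `FUNCTION_WORDS`) are hypothetical. It shows the obfuscation case, where the loop stops once the text is far enough from the author's profile; imitation would instead minimise distance to a target profile.

```python
import re
from collections import Counter
from math import sqrt
from typing import Dict, List

# Illustrative feature set; real stylometric models use far richer features.
FUNCTION_WORDS = ["the", "of", "and", "to", "but", "however", "on", "upon"]


def profile(text: str) -> List[float]:
    """Relative frequency of each function word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = max(len(words), 1)
    return [counts[w] / total for w in FUNCTION_WORDS]


def distance(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two profiles."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def swap_first(text: str, old: str, new: str) -> str:
    """Replace the first whole-word occurrence of `old` with `new`."""
    return re.sub(r"\b" + re.escape(old) + r"\b", lambda m: new, text, count=1)


def obfuscate(text: str, author_profile: List[float], rules: Dict[str, str],
              threshold: float, max_steps: int = 50) -> str:
    """Greedily apply the substitution that moves the text furthest from
    the author's profile, until the distance reaches `threshold`."""
    current = text
    for _ in range(max_steps):
        score = distance(profile(current), author_profile)
        if score >= threshold:
            break
        candidates = [swap_first(current, old, new) for old, new in rules.items()]
        candidates = [c for c in candidates if c != current]
        if not candidates:
            break  # no rule applies any more
        best = max(candidates, key=lambda c: distance(profile(c), author_profile))
        if distance(profile(best), author_profile) <= score:
            break  # no rule moves the text further away
        current = best
    return current
```

Because the loop optimises against one specific metric, its output is an adversarial example for that metric only; nothing in the sketch indicates how the result would fare against a different analysis.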

Evaluation

How best to mask stylometric characteristics in practice, and which tasks to perform manually, which with tool assistance, and which fully automatically, are open research questions, especially for short documents with limited potential variability.[38][11] Manual adversarial stylometry can be preferred or even required if the author does not trust available computers with the task (as may be the case for a whistleblower, for example).[23] Software tools require maintenance; Wang, Juola & Riddell (2022) report that there is no maintained obfuscatory software suitable for general use.[39] Zhai et al. (2022) identify DS-PAN (Castro-Castro Ortega Bueno) and Mutant-X (Mahmood et al. 2019) as the 2022 state of the art in automated obfuscation.[40] Manual stylistic modulation takes significant effort and scales poorly; tool assistance can reduce the burden to varying degrees.[41] Deterministic automated methods can lose effectiveness against a classifier trained adversarially, where output from the style transfer program is used in the classifier's training set.[42]

Potthast, Hagen & Stein (2016) give three criteria for use in evaluation of adversarial stylometry methods: safety, meaning that stylistic characteristics are reliably eliminated; soundness, meaning that the semantic content of the text is not unacceptably altered; and sensibility, meaning that the output is "well-formed and inconspicuous". Compromising any of these too deeply is typically unacceptable, and the three trade off against each other in practice.[43] They find that automatically evaluating sensibility, and specifically whether output is acceptably grammatical and well-formed, is difficult;[44] automated evaluation of soundness is somewhat more promising, but manual review is the best method.[45]

Despite safety being an important property of an adversarial stylometry method, it can still be usefully traded away if the conceded stylometric identification potential is otherwise possible by non-stylometric analysis—for example, an author discussing their own upbringing in Britain is unlikely to care if stylometry can reveal that their text is typical of British English.[46][47]

Evaluating the safety of different approaches is complicated by how identification-resistance fundamentally depends on the methods of identification under consideration.[48] The property of being resilient to unknown analyses is called transferability.[49] Gröndahl & Asokan (2020b) identify four different threat models for authors, varying with their knowledge of how their text will be analysed and what training data will be used: query access, with the weakest analyst and the strongest author, who knows both the methods of analysis and the training data; architecture access, where the author knows the analysis methods but not the training data; data access, where the author knows the training data but not the analysis methods; and surrogate access, with the weakest author and the strongest analyst, where the author knows neither the methods of analysis nor the training data.[34] Further, when an author chooses a method, they must rely on their threat model and trust that it is valid, and that unknown analyses able to detect remaining stylistic signals cannot or will not be performed, or that the masking successfully transfers;[50] a stylometrist with knowledge of how the author attempted to mask their style, however, may be able to exploit some weakness in the method and render it unsafe.[51] Much of the research into automated methods has assumed that the author has query access, which may not generalise to other settings.[52] Masking methods that internally use an ensemble of different analyses as a model for their adversary may transfer better against unseen analyses.[35]
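
The four threat models form a two-by-two grid over what the author knows. A minimal Python sketch of that grid follows; the `ThreatModel` enum and `threat_model` function are illustrative names, not taken from the cited work.

```python
from enum import Enum


class ThreatModel(Enum):
    QUERY_ACCESS = "query access"                # knows methods and data
    ARCHITECTURE_ACCESS = "architecture access"  # knows methods only
    DATA_ACCESS = "data access"                  # knows training data only
    SURROGATE_ACCESS = "surrogate access"        # knows neither


def threat_model(knows_methods: bool, knows_training_data: bool) -> ThreatModel:
    """Map the author's knowledge of the analyst onto one of the four models."""
    if knows_methods and knows_training_data:
        return ThreatModel.QUERY_ACCESS
    if knows_methods:
        return ThreatModel.ARCHITECTURE_ACCESS
    if knows_training_data:
        return ThreatModel.DATA_ACCESS
    return ThreatModel.SURROGATE_ACCESS
```

An evaluation that reports results only for `threat_model(True, True)` describes the most favourable case for the author, which is why results obtained under query access may not carry over to the other three cells.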

A thorough loss of soundness defeats the purpose of communication, though some degree of meaning change may be tolerable if the core message is preserved; requiring only textual entailment or allowing automatic summarisation are other ways to lose some meaning in a possibly tolerable way.[53] Rewriting an input text to defeat stylometry, as opposed to consciously removing stylistic characteristics during composition, poses challenges in retaining textual meaning.[54] Gröndahl & Asokan (2020a) assess the problem of unsoundness as "the most important challenge" for research into fully automatic approaches.[11]

For sensibility, if a text is so ungrammatical as to be incomprehensible or so ill-formed that it cannot fit into its genre, then the method has failed, but compromises short of that point may be useful.[44] If inconspicuity is partially lost, then there is the possibility that more expensive and less scalable analyses will be performed (e.g., consulting a forensic linguist) to confirm suspicions or gather further evidence.[55] The impact of a total inconspicuity failure varies depending on the motivation for performing adversarial stylometry: for someone simply attempting to stay anonymous (e.g., a whistleblower), detection may not be an issue; for a literary forger, however, detection would be disastrous.[16] Adversarial stylometry can leave evidence of its practice, which is an inconspicuity failure.[56][57] In the Brennan–Greenstadt corpus, the texts have been found to share a common "style" of their own.[58] However, Gröndahl & Asokan (2020a) assess existing evidence as insufficient to prove that adversarial stylometry is always detectable, with only limited methods having been studied.[59] Improving the smoothness of the output text may reduce the detectability of automated tools.[60] The overall detectability of adversarial authorship has not been thoroughly studied; if the methods available to the author are unknown to the stylometrist, detection may be impossible.[11]

The problems of author identification and verification in an adversarial setting are greatly different from recognising naïve or cooperative authors.[61] Deliberate attempts to mask authorship are described by Juola & Vescovi (2011) as a "problem for the current state of stylometric art",[62] and Brennan, Afroz & Greenstadt (2012) state that, despite stylometry's high performance in identifying non-adversarial authors, manual application of adversarial methods renders it unreliable.[63]

Kacmarcik & Gamon (2006) observe that low-dimensional stylometric models which operate on small numbers of features are less resistant to adversarial stylometry.[64] Research has found that authors vary in how well they are able to modulate their style, with some able to successfully perform the task even without training.[39] Wang, Juola & Riddell (2022), a replication and reproduction of Brennan, Afroz & Greenstadt (2012), found that all three of imitation, translation and obfuscation meaningfully reduced the effectiveness of authorship attribution, with manual obfuscation being somewhat more effective than manual imitation or translation, which performed similarly to each other; the original study found that imitation was superior.[65] Potthast, Hagen & Stein (2016) reported that even simple automated methods of adversarial stylometry caused major difficulties for state-of-the-art authorship identification systems, though at significant cost to soundness and sensibility.[66] Adversarially aware identification systems can perform much better against adversarial stylometry provided that they know which obfuscation methods may have been used, even if the identifier makes mistakes in determining which one was actually applied.[67]

References

  1. Brennan, Afroz & Greenstadt 2012, pp. 3–4.
  2. Kacmarcik & Gamon 2006, p. 445.
  3. Juola & Vescovi 2011, p. 117.
  4. Afroz, Brennan & Greenstadt 2012, p. 466.
  5. Rao & Rohatgi 2000, 1.3 Contributions.
  6. Gröndahl & Asokan 2020a, p. 19.
  7. Narayanan et al. 2012, p. 301.
  8. Emmery, Kádár & Chrupała 2021, p. 2388.
  9. Shetty, Schiele & Fritz 2018, 1 Introduction.
  10. Mahmood et al. 2019, p. 54.
  11. Gröndahl & Asokan 2020a, p. 28.
  12. Gröndahl & Asokan 2020a, p. 3.
  13. Kacmarcik & Gamon 2006, p. 444.
  14. Afroz, Brennan & Greenstadt 2012, p. 461.
  15. Gröndahl & Asokan 2020a, p. 4.
  16. Potthast, Hagen & Stein 2016, p. 5.
  17. Juola & Vescovi 2011, p. 115.
  18. Xu et al. 2019, p. 247.
  19. Mireshghallah & Berg-Kirkpatrick 2021, p. 2009.
  20. Uchendu, Le & Lee 2022, p. 1.
  21. Neal et al. 2018, p. 6.
  22. Kacmarcik & Gamon 2006, p. 446.
  23. Wang, Juola & Riddell 2022, p. 2.
  24. Adelani et al. 2021, p. 8687.
  25. Wang, Juola & Riddell 2022, p. 8.
  27. Neal et al. 2018, pp. 6–7.
  28. Neal et al. 2018, p. 26.
  29. Mahmood et al. 2019, p. 55.
  30. Afroz, Brennan & Greenstadt 2012, p. 471.
  31. Mireshghallah & Berg-Kirkpatrick 2021, pp. 2009–2010.
  32. Rao & Rohatgi 2000, 5 Future Directions.
  33. Juola & Vescovi 2011, pp. 121–123.
  34. Gröndahl & Asokan 2020b, p. 177.
  35. Haroon et al. 2021, p. 1.
  36. Bevendorff et al. 2019, p. 1098.
  37. Saedi & Dras 2020, p. 181.
  38. Neal et al. 2018, p. 27.
  39. Wang, Juola & Riddell 2022, p. 3.
  40. Zhai et al. 2022, p. 7374.
  41. Gröndahl & Asokan 2020a, pp. 21–22.
  42. Gröndahl & Asokan 2020b, p. 176.
  43. Potthast, Hagen & Stein 2016, p. 6.
  44. Potthast, Hagen & Stein 2016, pp. 12–13.
  45. Potthast, Hagen & Stein 2016, p. 11.
  46. Almishari, Oguz & Tsudik 2014, p. 6.
  47. Xu et al. 2019, pp. 247–248.
  48. Kacmarcik & Gamon 2006, p. 448.
  49. Haroon et al. 2021, p. 3.
  50. Emmery, Kádár & Chrupała 2021, pp. 2388–2389.
  51. Potthast, Hagen & Stein 2016, pp. 9–10.
  52. Gröndahl & Asokan 2020b, p. 189.
  53. Potthast, Hagen & Stein 2016, pp. 11–12.
  54. McDonald et al. 2012, 7.1 Further Work.
  55. Potthast, Hagen & Stein 2016, p. 13.
  56. Mahmood, Shafiq & Srinivasan 2020, p. 2235.
  57. Afroz, Brennan & Greenstadt 2012, p. 462.
  58. Juola 2012, pp. 93–94.
  59. Gröndahl & Asokan 2020a, p. 2.
  60. Mahmood, Shafiq & Srinivasan 2020, p. 2243.
  61. Afroz, Brennan & Greenstadt 2012, p. 464.
  62. Juola & Vescovi 2011, p. 123.
  63. Brennan, Afroz & Greenstadt 2012, p. 2.
  64. Kacmarcik & Gamon 2006, p. 451.
  65. Wang, Juola & Riddell 2022, pp. 7–8.
  66. Potthast, Hagen & Stein 2016, p. 21.
  67. Zhai et al. 2022, p. 7373.

Bibliography