Text simplification

Short description: Automated process

Text simplification is an operation used in natural language processing to modify, enhance, classify, or otherwise process an existing body of human-readable text so that its grammar and structure are greatly simplified while the underlying meaning and information remain the same. It is an important area of research because of communication needs in an increasingly complex and interconnected world dominated by science, technology, and new media. Natural human languages pose serious problems, however, because they ordinarily contain large vocabularies and complex constructions that machines, no matter how fast and well programmed, cannot easily process. To reduce this linguistic diversity, researchers can use methods of semantic compression to limit and simplify the set of words used in a given text.

Example

Text simplification is illustrated with an example used by Siddharthan (2006).^[1] The original sentence contains two relative clauses and one conjoined verb phrase. A text simplification system aims to rewrite it as a group of simpler sentences, shown immediately after the original.

Also contributing to the firmness in copper, the analyst noted, was a report by Chicago purchasing agents, which precedes the full purchasing agents report that is due out today and gives an indication of what the full report might hold.
Also contributing to the firmness in copper, the analyst noted, was a report by Chicago purchasing agents. The Chicago report precedes the full purchasing agents report. The Chicago report gives an indication of what the full report might hold. The full report is due out today.

One approach to text simplification is lexical simplification via lexical substitution, a two-step process of first identifying complex words and then replacing them with simpler synonyms. A key challenge is the identification step, which is performed by a machine learning classifier trained on labeled data. Because the classical labeling method, in which research subjects describe each word as either simple or complex, has proved problematic, researchers have found that they get higher consistency across more levels of complexity if they ask labelers to sort the words presented to them in order of complexity.^[2]
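The two-step process described above can be sketched in a few lines of Python. This is a minimal illustration, not the method of any cited system: the frequency table, the synonym table, and the threshold are all hypothetical values invented for the example, and a simple frequency cutoff stands in for the trained classifier that would normally identify complex words.

```python
import re

# Hypothetical toy resources, invented for illustration only.
# Assumed corpus counts: a higher count means a more familiar word.
WORD_FREQUENCY = {
    "use": 900, "help": 850, "buy": 800, "start": 780,
    "utilize": 40, "assist": 120, "purchase": 150, "commence": 15,
}
# Assumed synonym table; a real system would use a thesaurus or embeddings.
SYNONYMS = {
    "utilize": ["use"],
    "assist": ["help"],
    "purchase": ["buy"],
    "commence": ["start", "begin"],
}
COMPLEXITY_THRESHOLD = 200  # assumed cutoff, stands in for a classifier


def is_complex(word):
    """Step 1: flag a known word whose assumed frequency is below the cutoff."""
    freq = WORD_FREQUENCY.get(word.lower())
    return freq is not None and freq < COMPLEXITY_THRESHOLD


def simplest_synonym(word):
    """Step 2: pick the most frequent listed synonym, or keep the word."""
    candidates = SYNONYMS.get(word.lower(), [])
    if not candidates:
        return word
    return max(candidates, key=lambda w: WORD_FREQUENCY.get(w, 0))


def simplify(text):
    """Replace every complex word in the text with a simpler synonym."""
    def replace(match):
        word = match.group(0)
        if not is_complex(word):
            return word
        substitute = simplest_synonym(word)
        # Keep a leading capital so sentence-initial words still look right.
        return substitute.capitalize() if word[0].isupper() else substitute
    return re.sub(r"[A-Za-z]+", replace, text)


print(simplify("We will commence the work and utilize new tools."))
```

Words missing from the frequency table are left untouched, so the sketch only ever substitutes words it has evidence about. A practical system would also need to check that a substitute fits the context and the inflection of the word it replaces, which this example does not attempt.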

References

[1] Siddharthan, Advaith (28 March 2006). "Syntactic Simplification and Text Cohesion". Research on Language and Computation 4 (1): 77–109. doi:10.1007/s11168-006-9011-1.
[2] Gooding, Sian; Kochmar, Ekaterina; Sarkar, Advait; Blackwell, Alan (August 2019). "Comparative judgments are more consistent than binary classification for labelling word complexity". Proceedings of the 13th Linguistic Annotation Workshop: 208–214. doi:10.18653/v1/W19-4024. https://www.aclweb.org/anthology/W19-4024/. Retrieved 22 November 2019.

Wei Xu, Chris Callison-Burch and Courtney Napoles. "Problems in Current Text Simplification Research". In Transactions of the Association for Computational Linguistics (TACL), Volume 3, 2015, Pages 283–297.
Advaith Siddharthan. "Syntactic Simplification and Text Cohesion". In Research on Language and Computation, Volume 4, Issue 1, Jun 2006, Pages 77–109, Springer Science, the Netherlands.
Siddhartha Jonnalagadda, Luis Tari, Joerg Hakenberg, Chitta Baral and Graciela Gonzalez. "Towards Effective Sentence Simplification for Automatic Processing of Biomedical Text". In Proc. of the NAACL-HLT 2009, Boulder, USA, June.

External links

Original source: https://en.wikipedia.org/wiki/Text simplification.


