Software:Apache cTAKES

Apache cTAKES
Developer(s)	Apache Software Foundation
Stable release	4.0.0.1 / January 20, 2021
Repository	cTakes Repository
Written in	Java, Scala
Operating system	Cross-platform
Type	Natural language processing, Bioinformatics, Text mining, Information Extraction
License	Apache License 2.0
Website	ctakes.apache.org

Short description: Natural language processing system

Apache cTAKES (clinical Text Analysis and Knowledge Extraction System) is an open-source natural language processing (NLP) system that extracts clinical information from the unstructured text of electronic health records. It processes clinical notes and identifies several types of clinical named entities: drugs, diseases/disorders, signs/symptoms, anatomical sites and procedures. Each named entity carries attributes for its text span, its ontology mapping code, its context (family history of, current, unrelated to patient), and whether it is negated.^[1]
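The entity attributes listed above can be pictured as a small record type. The following Python sketch is illustrative only: the names `ClinicalEntity`, `Context`, and `covered_text` are hypothetical and are not taken from the cTAKES type system, and the ontology code is a placeholder rather than a real identifier.

```python
from dataclasses import dataclass
from enum import Enum


class Context(Enum):
    """Context values named in the article text (hypothetical enum)."""
    FAMILY_HISTORY = "family history of"
    CURRENT = "current"
    UNRELATED_TO_PATIENT = "unrelated to patient"


@dataclass(frozen=True)
class ClinicalEntity:
    """One named entity found in a clinical note (illustrative, not the cTAKES API)."""
    begin: int          # character offset where the mention starts (inclusive)
    end: int            # character offset where the mention ends (exclusive)
    entity_type: str    # e.g. "drug", "disease/disorder", "sign/symptom"
    ontology_code: str  # code the mention maps to in an ontology
    context: Context    # who or what period the mention refers to
    negated: bool       # True if the note asserts the finding is absent

    def covered_text(self, note: str) -> str:
        """Return the text span of the note that this entity covers."""
        return note[self.begin:self.end]


note = "Patient denies chest pain."
entity = ClinicalEntity(
    begin=15,
    end=25,
    entity_type="sign/symptom",
    ontology_code="CODE-PLACEHOLDER",  # placeholder, not a real ontology code
    context=Context.CURRENT,
    negated=True,
)
print(entity.covered_text(note))
```

Storing character offsets rather than a copy of the text keeps each annotation tied to its exact position in the original note.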

cTAKES was built using the UIMA (Unstructured Information Management Architecture) framework and the OpenNLP natural language processing toolkit.^[2]^[3]

Components

Components of cTAKES are specifically trained for the clinical domain, and create rich linguistic and semantic annotations that can be utilized by clinical decision support systems and clinical research.^[4]

These components include:

Named Section identifier
Sentence boundary detector
Rule-based tokenizer
Formatted list identifier
Normalizer
Context dependent tokenizer
Part-of-speech tagger
Phrasal chunker
Dictionary lookup annotator
Context annotator
Negation detector
Uncertainty detector
Subject detector
Dependency parser
Patient smoking status identifier
Drug mention annotator
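To give a rough sense of how such components can be chained, the Python sketch below runs four toy stages in sequence: sentence boundary detection, tokenization, dictionary lookup, and negation detection. It is a simplified illustration under stated assumptions, not cTAKES code: the real components are UIMA annotators that rely on trained models and ontology-backed dictionaries, whereas the term list, trigger list, and all function names here are hypothetical.

```python
import re
from dataclasses import dataclass

# Hypothetical toy resources; real systems use far larger, curated resources.
DICTIONARY = {"chest pain", "fever"}
NEGATION_TRIGGERS = {"no", "denies", "without"}


@dataclass
class Mention:
    text: str
    begin: int
    end: int
    negated: bool = False


def sentence_spans(text):
    """Sentence boundary detection: split after '.', '!' or '?'."""
    return [(m.start(), m.end())
            for m in re.finditer(r"[^.!?]+[.!?]?", text)
            if m.group().strip()]


def tokens(text, begin, end):
    """Rule-based tokenizer: lowercase word tokens with document offsets."""
    return [(m.group().lower(), begin + m.start())
            for m in re.finditer(r"\w+", text[begin:end])]


def lookup(text, begin, end):
    """Dictionary lookup: find whole-word dictionary terms in one sentence."""
    sentence = text[begin:end].lower()
    found = []
    for term in DICTIONARY:
        for m in re.finditer(r"\b" + re.escape(term) + r"\b", sentence):
            start, stop = begin + m.start(), begin + m.end()
            found.append(Mention(text[start:stop], start, stop))
    return found


def run_pipeline(text):
    """Run the stages in order and return mentions sorted by position."""
    mentions = []
    for begin, end in sentence_spans(text):
        toks = tokens(text, begin, end)
        for mention in lookup(text, begin, end):
            # Negation detection: a trigger token earlier in the same sentence.
            mention.negated = any(tok in NEGATION_TRIGGERS and pos < mention.begin
                                  for tok, pos in toks)
            mentions.append(mention)
    return sorted(mentions, key=lambda m: m.begin)


for mention in run_pipeline("Patient denies chest pain. She reports fever."):
    print(mention)
```

Each stage consumes the output of the previous one, which is why the order of components matters: negation detection, for example, needs both sentence boundaries and tokens before it can look for trigger words.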

History

Development of cTAKES began at the Mayo Clinic in 2006. The development team, led by Dr. Guergana Savova and Dr. Christopher Chute, included physicians, computer scientists and software engineers. After its deployment, cTAKES became an integral part of Mayo's clinical data management infrastructure, processing more than 80 million clinical notes.^[5]

When Dr. Savova moved to Boston Children's Hospital in early 2010, the core development team grew to include members there. Further external collaborations followed.^[5]

Such collaborations have extended cTAKES' capabilities into other areas such as Temporal Reasoning, Clinical Question Answering, and coreference resolution for the clinical domain.^[5]

In 2010, cTAKES was adopted by the i2b2 program, and it is a central component of SHARP Area 4.^[5]

In 2013, cTAKES 3.0 became the project's first release as an Apache incubator project.^[citation needed]

In March 2013, cTAKES became an Apache Top Level Project (TLP).^[5]

References

[1] Denecke, Kerstin (2015-08-31). "Tools and Resources for Information Extraction". Health Web Science: Social Media Data for Healthcare. Springer. p. 67. ISBN 978-3-319-20582-3. https://books.google.com/books?id=yVp4CgAAQBAJ.
[2] Khalifa, Abdulrahman; Meystre, Stéphane (2015-12-01). "Adapting existing natural language processing resources for cardiovascular risk factors identification in clinical notes". Journal of Biomedical Informatics. Proceedings of the 2014 i2b2/UTHealth Shared-Tasks and Workshop on Challenges in Natural Language Processing for Clinical Data 58 (Supplement): S128–S132. doi:10.1016/j.jbi.2015.08.002. PMID 26318122.
[3] Khudairi, Sally (2017-04-25). "The Apache Software Foundation Announces Apache® cTAKES™ v4.0" (Press release). Forest Hill, MD: The Apache Software Foundation. Globe Newswire. Retrieved 2017-09-20.
[4] Savova, Guergana K; Masanz, James J; Ogren, Philip V; Zheng, Jiaping; Sohn, Sunghwan; Kipper-Schuler, Karin C; Chute, Christopher G (2010). "Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications". Journal of the American Medical Informatics Association 17 (5): 507–513. doi:10.1136/jamia.2009.001560. ISSN 1067-5027. PMID 20819853.
[5] "History". 2015-06-22. http://ctakes.apache.org/history.html.

External links

cTAKES Official Website
Apache cTAKES Project Information page from ASF
Abstract (JAMIA)
Open Health Natural Language Processing (OHNLP) Consortium
Strategic Health IT Advanced Research Projects (SHARP) Program
SHARP Area 4 - Secondary Use of EHR Data
The Automated Retrieval Console (ARC)
Health Information Text Extraction (HITEx): a rule-based NLP pipeline based on the GATE framework, developed as part of the i2b2 (Informatics for Integrating Biology and the Bedside) project.
Computational Language and Education Research toolkit (cleartk) (no longer maintained): developed at the University of Colorado at Boulder, it provides a framework for developing statistical NLP components in Java. It is built on top of Apache UIMA.
NegEx: a tool developed at the University of Pittsburgh to detect negated terms in clinical text. The system uses trigger terms to identify likely negation scenarios within a sentence.
ConText: an extension to NegEx, also developed at the University of Pittsburgh. ConText extends NegEx to detect not only negated concepts but also temporality (recent, historical or hypothetical scenarios) and the subject of the experience (patient or other).
MetaMap (by the United States National Library of Medicine): a comprehensive concept tagging system built on top of the Unified Medical Language System. It requires an active UMLS Metathesaurus License Agreement (and account) for use.
MedEx: a tool for extracting medication information from clinical text. MedEx processes free-text clinical records to recognize medication names and signature information, such as drug dose, frequency, route, and duration. Use is free with a UMLS license. It is a standalone application for Linux and Windows.
SecTag (section tagging hierarchy): recognizes note section headers using NLP, Bayesian, spelling correction, and scoring techniques. Use is free with either a UMLS or LOINC license.
Stanford Named Entity Recognizer (NER): a Conditional Random Field sequence model, together with well-engineered features for named entity recognition in English and German.
Stanford CoreNLP: an integrated suite of natural language processing tools for English in Java, including tokenization, part-of-speech tagging, named entity recognition, parsing, and coreference.


Original source: https://en.wikipedia.org/wiki/Apache cTAKES.

