ChatScript

From HandWiki
Revision as of 20:44, 6 February 2024 by LinuxGuru (talk | contribs) (linkage)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)

ChatScript is a combination Natural Language engine and dialog management system designed initially for creating chatbots, but is currently also used for various forms of NL processing. It is written in C++. The engine is an open source project at SourceForge.[1] and GitHub.[2] ChatScript was written by Bruce Wilcox and originally released in 2011, after Suzette (written in ChatScript) won the 2010 Loebner Prize, fooling one of four human judges.[3]

Features

In general ChatScript aims to author extremely concisely, since the limiting scalability of hand-authored chatbots is how much/fast one can write the script.

Because ChatScript is designed for interactive conversation, it automatically maintains user state across volleys. A volley is any number of sentences the user inputs at once and the chatbots response.

The basic element of scripting is the rule. A rule consists of a type, a label (optional), a pattern, and an output. There are three types of rules. Gambits are something a chatbot might say when it has control of the conversation. Rejoinders are rules that respond to a user remark tied to what the chatbot just said. Responders are rules that respond to arbitrary user input which is not necessarily tied to what the chatbot just said. Patterns describe conditions under which a rule may fire. Patterns range from extremely simplistic to deeply complex (analogous to Regex but aimed for NL). Heavy use is typically made of concept sets, which are lists of words sharing a meaning. ChatScript contains some 2000 predefined concepts and scripters can easily write their own. Output of a rule intermixes literal words to be sent to the user along with common C-style programming code.

Rules are bundled into collections called topics. Topics can have keywords, which allows the engine to automatically search the topic for relevant rules based on user input.

Example code

Topic: ~food(  ~fruit  fruit food eat)

t: What is your favorite food?
    a: (~fruit) I like fruit also.
    a: (~metal) I prefer listening to heavy metal music rather than eating it.

?:  WHATMUSIC ( << what music you ~like >>) I prefer rock music.
s: ( I * ~like * _~music_types)    ^if (_0 == country) {I don't like country.} else {So do I.}

Words starting with ~ are concept sets. For example, ~fruit is the list of all known fruits. The simple pattern (~fruit) reacts if any fruit is mentioned immediately after the chatbot asks for favorite food. The slightly more complex pattern for the rule labelled WHATMUSIC requires all the words what, music, you and any word or phrase meaning to like, but they may occur in any order. Responders come in three types.  ?: rules react to user questions. s: rules react to user statements. u: rules react to either.

ChatScript code supports standard if-else, loops, user-defined functions and calls, and variable assignment and access.

Data

Some data in ChatScript is transient, meaning it will disappear at the end of the current volley. Other data is permanent, lasting forever until explicitly killed off. Data can be local to a single user or shared across all users at the bot level.

Internally all data is represented as text and is automatically converted to a numeric form as needed.

Variables

User variables come in several kinds. Variables purely local to a topic or function are transient. Global variables can be declared as transient or permanent. A variable is generally declared merely by using it, and its type depends on its prefix ($, $$, $_).

$_local  = 1			is a local transient variable being assigned a 1
$$global1.value = “hi”	is a transient global variable which is a JSON object
$global2 += 20		is a permanent global variable

Facts

In addition to variables, ChatScript supports facts – triples of data, which can also be transient or permanent. Functions can query for facts having particular values of some of the fields, making them act like an in-memory database. Fact retrieval is very quick and efficient the number of available in-memory facts is largely constrained to the available memory of the machine running the ChatScript engine. Facts can represent record structures and are how ChatScript represents JSON internally. Tables of information can be defined to generate appropriate facts.

table: ~inventors(^who ^what)
createfact(^who invent ^what)
DATA:
"Johannes Gutenberg" "printing press"
"Albert Einstein" ["Theory of Relativity" photon "Theory of General Relativity"]

The above table links people to what they invented (1 per line) with Einstein getting a list of things he did.

External communication

ChatScript embeds the Curl library and can directly read and write facts in JSON to a website.

Server

A ChatScript engine can run in local or server mode.

Pos-tagging, parsing, and ontology

ChatScript comes with a copy of English WordNet embedded within, including its ontology, and creates and extends its own ontology via concept declarations. It has an English language pos-tagger and parser and supports integration with TreeTagger for pos-tagging a number of other languages (TreeTagger commercial license required).

Databases

In addition to an internal fact database, ChatScript supports PostgreSQL, MySQL, MSSQL and MongoDB both for access by scripts, but also as a central filesystem if desired so ChatScript can be scaled horizontally. A common use case is to use a centralized database to host the user files and multiple servers to scale the ChatScript engine.

JavaScript

ChatScript also embeds DukTape, ECMAScript E5/E5.1 compatibility, with some semantics updated from ES2015+.

Spelling Correction

ChatScript has built-in automatic spell checking, which can be augmented in script as both simple word replacements or context sensitive changes. With appropriate simple rules you can change perfect legal words into other words or delete them. E.g., if you have a concept of ~electronic_goods and don't want an input of Radio Shack (a store name) to be detected as an electronic good, you can get the input to change to Radio_Shack (a single word), or allow the words to remain but block the detection of the concept.

This is particularly useful when combined with speech-to-text code that is imperfect, but you are familiar with common failings of it and can compensate for them in script.

Control flow

A chatbot's control flow is managed by the control script. This is merely another ordinary topic of rules, that invokes API functions of the engine. Thus control is fully configurable by the scripter (and functions exist to allow introspection into the engine). There are pre-processing control flow and post-processing control flow options available, for special processing.

References

  1. ChatScript, SourceForge
  2. [1], GitHub
  3. Prizewinning chatbot steers the conversation, New Scientist, 27 October 2010

External links