Software:MeCab

MeCab
Developer(s)	Taku Kudou, Google Japanese Input project
Stable release	0.996 / 18 February 2013; 12 years ago
Written in	C++, with bindings for C, C#, Java, Perl, Python, and Ruby
Platform	Cross-platform
License	Tri-licensed under GPL, LGPL and BSD licenses
Website	https://taku910.github.io/mecab


MeCab is an open-source text segmentation library for Japanese text. It was originally developed by the Nara Institute of Science and Technology and is currently maintained by Taku Kudou (工藤拓) as part of his work on the Google Japanese Input project.^[1]^[2] The name derives from the developer's favorite food, mekabu (和布蕪), a Japanese dish made from wakame seaweed.

MeCab was originally based on ChaSen and developed under the name ChaSenTNG, but it has since been rewritten from scratch and is developed independently of ChaSen. Its analysis accuracy is comparable to ChaSen's, and it is on average 3–4 times faster.

MeCab segments a sentence into words and tags each word with its part of speech. Several dictionaries are available for MeCab; as with ChaSen, IPADIC is the most commonly used.
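Because Japanese is written without spaces between words, segmentation means choosing among the many ways a string could be split into dictionary words. The Python sketch below illustrates only that general idea: it uses dynamic programming to pick the split whose summed word costs are lowest, over a tiny lexicon with made-up costs. It is not MeCab's implementation; MeCab's own model is more elaborate (it also scores how adjacent words connect), and `LEXICON`, `UNKNOWN_COST` and `segment` are hypothetical names chosen for this illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon: surface form -> (part of speech, cost).
# Lower cost means a more likely word; these numbers are made up.
LEXICON: Dict[str, Tuple[str, int]] = {
    "誰": ("名詞", 10),
    "で": ("助詞", 30),
    "も": ("助詞", 30),
    "でも": ("助詞", 15),
    "編集": ("名詞", 10),
    "できる": ("動詞", 10),
}
UNKNOWN_COST = 1000  # hypothetical penalty for a single character not in the lexicon


def segment(text: str) -> List[Tuple[str, str]]:
    """Split text into (word, part of speech) pairs with the lowest total cost."""
    n = len(text)
    best = [float("inf")] * (n + 1)  # best[i]: lowest cost of segmenting text[:i]
    back = [(0, "")] * (n + 1)       # back[i]: (start of last word, its part of speech)
    best[0] = 0
    for end in range(1, n + 1):
        for start in range(end):
            word = text[start:end]
            if word in LEXICON:
                pos, cost = LEXICON[word]
            elif end - start == 1:
                pos, cost = "UNKNOWN", UNKNOWN_COST  # hypothetical unknown-character label
            else:
                continue
            if best[start] + cost < best[end]:
                best[end] = best[start] + cost
                back[end] = (start, pos)
    # Follow the back pointers from the end of the text to recover the words.
    words = []
    i = n
    while i > 0:
        start, pos = back[i]
        words.append((text[start:i], pos))
        i = start
    return words[::-1]


print(segment("誰でも編集できる"))
# [('誰', '名詞'), ('でも', '助詞'), ('編集', '名詞'), ('できる', '動詞')]
```

Here でも (cost 15) beats で + も (cost 60), and 編集 as one word beats two unknown characters, so the cheapest path matches the segmentation in the example below.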

In 2007, Google used MeCab to generate n-gram data from a large corpus of Japanese text and announced the data's release on its Google Japan blog.^[3]

MeCab is also used for Japanese input on Mac OS X 10.5 and 10.6, and in iOS since version 2.1.^[4]^[5]

Example

Input:

ウィキペディア（Ｗｉｋｉｐｅｄｉａ）は誰でも編集できるフリー百科事典です

Output:

ウィキペディア	名詞,一般,*,*,*,*,*
（	記号,括弧開,*,*,*,*,（,（,（
Ｗｉｋｉｐｅｄｉａ	名詞,固有名詞,組織,*,*,*,*
）	記号,括弧閉,*,*,*,*,）,）,）
は	助詞,係助詞,*,*,*,*,は,ハ,ワ
誰	名詞,代名詞,一般,*,*,*,誰,ダレ,ダレ
でも	助詞,副助詞,*,*,*,*,でも,デモ,デモ
編集	名詞,サ変接続,*,*,*,*,編集,ヘンシュウ,ヘンシュー
できる	動詞,自立,*,*,一段,基本形,できる,デキル,デキル
フリー	名詞,一般,*,*,*,*,フリー,フリー,フリー
百科	名詞,一般,*,*,*,*,百科,ヒャッカ,ヒャッカ
事典	名詞,一般,*,*,*,*,事典,ジテン,ジテン
です	助動詞,*,*,*,特殊・デス,基本形,です,デス,デス
EOS

Besides segmenting the text, MeCab lists each word's part of speech and, where applicable and available in the dictionary, its conjugation, base form, reading and pronunciation. In the above example, the verb できる (dekiru, "to be able to") is classified as an independent (自立) ichidan (一段) verb (動詞) in its basic, or dictionary, form (基本形), and the word でも (demo) is identified as an adverbial particle (副助詞). Words missing from the dictionary, such as ウィキペディア and Ｗｉｋｉｐｅｄｉａ here, end after the seventh column, with no reading or pronunciation. Because not every column applies to every word, an asterisk marks any column that does not apply; this allows everything after the word and its tab character to be formatted as comma-separated values.
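The format described above can be read with a few lines of code. The Python sketch below is not part of MeCab. It assumes the default output with IPADIC's column order, as in the example (part of speech, three subcategories, conjugation type, conjugation form, base form, reading, pronunciation), and that no feature value itself contains a comma. It treats an asterisk or a missing column as "not applicable". The names `Morpheme`, `parse_mecab_output` and the column constants are hypothetical.

```python
from typing import List, NamedTuple, Optional

# Hypothetical column indices, assuming IPADIC's feature order.
POS, CONJ_TYPE, CONJ_FORM, BASE_FORM, READING, PRONUNCIATION = 0, 4, 5, 6, 7, 8


class Morpheme(NamedTuple):
    surface: str         # the word as it appears in the input
    features: List[str]  # comma-separated columns after the tab

    def field(self, index: int) -> Optional[str]:
        """Return a feature column, or None if it is '*' or missing."""
        if index < len(self.features) and self.features[index] != "*":
            return self.features[index]
        return None


def parse_mecab_output(output: str) -> List[Morpheme]:
    """Parse default output: one 'surface<TAB>features' line per word."""
    morphemes = []
    for line in output.splitlines():
        if not line or line == "EOS":  # EOS marks the end of each sentence
            continue
        surface, _, feature_str = line.partition("\t")
        morphemes.append(Morpheme(surface, feature_str.split(",")))
    return morphemes


sample = (
    "できる\t動詞,自立,*,*,一段,基本形,できる,デキル,デキル\n"
    "ウィキペディア\t名詞,一般,*,*,*,*,*\n"
    "EOS\n"
)
for m in parse_mecab_output(sample):
    print(m.surface, m.field(POS), m.field(CONJ_TYPE), m.field(READING))
# できる 動詞 一段 デキル
# ウィキペディア 名詞 None None
```

Splitting on the first tab before splitting on commas keeps a surface form that is itself a comma intact.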

MeCab also supports several output formats, selected with the -O option. One of them, chasen, outputs tab-separated values in a format that programs written for ChaSen can use. Another, yomi (from 読む yomu, "to read"), outputs the reading of the input text in katakana,^[6] leaving words with no reading in the dictionary unchanged, as shown below for the example sentence.

ウィキペディア（Ｗｉｋｉｐｅｄｉａ）ハダレデモヘンシュウデキルフリーヒャッカジテンデス
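The yomi output above can be approximated from the default output: take the reading column of each word and fall back to the word as written when the dictionary gives no reading, as happens for ウィキペディア and Ｗｉｋｉｐｅｄｉａ. The function `to_yomi` below is a hypothetical Python sketch under that assumption (IPADIC column order, with the reading at index 7), not MeCab's own code; in practice `mecab -Oyomi` produces this output directly.

```python
def to_yomi(output: str) -> str:
    """Concatenate word readings from MeCab's default (IPADIC-style) output."""
    READING = 7  # assumed index of the reading column in IPADIC
    parts = []
    for line in output.splitlines():
        if not line or line == "EOS":  # skip end-of-sentence markers
            continue
        surface, _, feature_str = line.partition("\t")
        features = feature_str.split(",")
        if len(features) > READING and features[READING] != "*":
            parts.append(features[READING])  # dictionary reading in katakana
        else:
            parts.append(surface)  # no reading (e.g. unknown word): keep as written
    return "".join(parts)


print(to_yomi("は\t助詞,係助詞,*,*,*,*,は,ハ,ワ\n誰\t名詞,代名詞,一般,*,*,*,誰,ダレ,ダレ\nEOS"))
# ハダレ
```

Using the reading column rather than the pronunciation column is what yields ハ rather than ワ for the particle は, matching the yomi line shown above. This sketch simply joins all sentences and ignores sentence boundaries.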

References

[1] "「ググる」の精度を高めるために必要なもの－＠IT自分戦略研究所" [What is needed to improve the accuracy of "googling"] (in Japanese). ITmedia. 2006-03-15. http://jibun.atmarkit.co.jp/lcareer01/rensai/cas003/cas001.html. Retrieved 2009-04-09.
[2] "思いどおりの日本語入力 - Google 日本語入力" [Japanese input the way you want it: Google Japanese Input] (in Japanese). Google. 2009-12-03. http://googlejapan.blogspot.com/2009/12/google_03.html. Retrieved 2009-12-03.
[3] "Google Japan Blog: 大規模日本語 n-gram データの公開" [Google Japan Blog: Release of large-scale Japanese n-gram data] (in Japanese). Google. 2007-11-01. http://googlejapan.blogspot.com/2007/11/n-gram.html. Retrieved 2009-04-09.
[4] "大規模テキスト処理を支える形態素解析技術（工藤拓氏・Google）" [Morphological analysis technology supporting large-scale text processing (Taku Kudou, Google)] (in Japanese). 2009-12-03. http://d.hatena.ne.jp/kazama/20080115/p1. Retrieved 2009-12-03.
[5] "iPhoneの仮名漢字変換はMeCabを利用" [iPhone's kana-kanji conversion uses MeCab] (in Japanese). 2009-12-03. Archived from the original on 2008-09-18. https://web.archive.org/web/20080918005625/http://yebo-blog.blogspot.com/2008/09/iphonemecab.html. Retrieved 2009-12-03.
[6] Kudou, Taku. "MeCab: Yet Another Part-of-Speech and Morphological Analyzer" (in Japanese). https://taku910.github.io/mecab/#parse. Retrieved 2018-01-23.
