NLTK's Built-in Stemmers
The code below comes from 《Python自然语言处理》 (Natural Language Processing with Python), p. 116.
(python2.7) appleyuchi@ubuntu:~/.virtualenvs/python2.7/bin$ python
Python 2.7.12 (default, Nov 19 2016, 06:48:10)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> raw="""DENNIS:Listen,strange women lying in ponds distributing swords is... is no basis for a system of goverment. Supreme executive power derives from... a mandate from the masses, not from some farcical aquatic ceremony."""
>>> import nltk
>>> tokens=nltk.word_tokenize(raw)
>>> porter = nltk.PorterStemmer()
>>> lancaster=nltk.LancasterStemmer()
>>> [porter.stem(t) for t in tokens]
[u'denni', ':', 'listen', ',', u'strang', 'women', u'lie', 'in', u'pond', u'distribut', u'sword', 'is', '...', 'is', 'no', u'basi', 'for', 'a', 'system', 'of', u'gover', '.', u'suprem', u'execut', 'power', u'deriv', 'from', '...', 'a', u'mandat', 'from', 'the', u'mass', ',', 'not', 'from', 'some', u'farcic', u'aquat', u'ceremoni', '.']
>>> [lancaster.stem(t) for t in tokens]
['den', ':', 'list', ',', 'strange', 'wom', 'lying', 'in', 'pond', 'distribut', 'sword', 'is', '...', 'is', 'no', 'bas', 'for', 'a', 'system', 'of', 'gov', '.', 'suprem', 'execut', 'pow', 'der', 'from', '...', 'a', 'mand', 'from', 'the', 'mass', ',', 'not', 'from', 'som', 'farc', 'aqu', 'ceremony', '.']

In the session above, raw is the raw corpus text, and the two lists at the end are the stemming results. The u'' prefixes in the first list are Python 2 unicode string markers. Note that raw as typed contains the misspelling "goverment", which is why the corresponding stems are gover and gov.
The session above uses two stemmers, Porter and Lancaster, and applies each to the same list of tokens.
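To see where the two stemmers disagree, the two result lists can be compared position by position. The following is a minimal sketch, not part of the book's code: the stems are copied from the session output (with the Python 2 u'' prefixes dropped), the regular expression is only a stand-in for nltk.word_tokenize that is assumed adequate for this one text, and the helper name compare is hypothetical.

```python
import re

raw = """DENNIS:Listen,strange women lying in ponds distributing swords is... is no basis for a system of goverment. Supreme executive power derives from... a mandate from the masses, not from some farcical aquatic ceremony."""

# Stand-in tokenizer (an assumption, NOT nltk.word_tokenize): for this text it
# yields 41 tokens, matching the length of the result lists in the session.
tokens = re.findall(r"\.\.\.|\w+|[^\w\s]", raw)

# Stems copied from the session output above.
porter_stems = [
    'denni', ':', 'listen', ',', 'strang', 'women', 'lie', 'in', 'pond',
    'distribut', 'sword', 'is', '...', 'is', 'no', 'basi', 'for', 'a',
    'system', 'of', 'gover', '.', 'suprem', 'execut', 'power', 'deriv',
    'from', '...', 'a', 'mandat', 'from', 'the', 'mass', ',', 'not', 'from',
    'some', 'farcic', 'aquat', 'ceremoni', '.',
]
lancaster_stems = [
    'den', ':', 'list', ',', 'strange', 'wom', 'lying', 'in', 'pond',
    'distribut', 'sword', 'is', '...', 'is', 'no', 'bas', 'for', 'a',
    'system', 'of', 'gov', '.', 'suprem', 'execut', 'pow', 'der', 'from',
    '...', 'a', 'mand', 'from', 'the', 'mass', ',', 'not', 'from', 'som',
    'farc', 'aqu', 'ceremony', '.',
]


def compare(tokens, first, second):
    """Return (token, first_stem, second_stem) for positions where the stems differ."""
    return [(t, a, b) for t, a, b in zip(tokens, first, second) if a != b]


differences = compare(tokens, porter_stems, lancaster_stems)

# Print a small three-column table: token, Porter stem, Lancaster stem.
for token, p_stem, l_stem in differences:
    print("{:<14}{:<12}{}".format(token, p_stem, l_stem))
```

On this text the two stemmers disagree on 14 of the 41 tokens. Lancaster usually cuts deeper (wom, pow, der), although in this session it leaves strange, lying and ceremony unchanged where Porter alters them.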