[Lucene 4.8 Tutorial, Part 1] Basic Indexing and Searching with Lucene 4.8

Published: 2024/1/23

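Lucene's text processing can be roughly divided into three parts:

1. Indexing: extract and analyze the document content to build the index.

2. Searching: search the index and return the results that match the search keywords.

3. Analysis: analyze the search terms and produce a Query object.

Note: in practice, everything except the most basic exact-match search requires an analysis step before searching.

Without the analysis step, searching for JAVA returns no results, because all terms were converted to lowercase during indexing, while the search shown here requires the keyword to match an indexed term exactly.

With the QueryParser class, the search terms are analyzed according to the concrete Analyzer implementation. This covers, for example, case conversion and the interpretation of query expressions such as java AND ant.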
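To make the note above concrete, here is a minimal plain-Java sketch of a toy inverted index that lowercases tokens at index time and looks terms up by exact match. It does not use Lucene at all; the class ToyIndex and its methods are hypothetical names chosen for illustration, and Lucene's real index structures are far more elaborate.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy index, NOT Lucene code: it only mimics the behavior described above.
public class ToyIndex {
    // term -> ids of the documents that contain it
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Index time: split on non-letters and lowercase every token.
    public void add(int docId, String text) {
        for (String token : text.split("[^\\p{L}]+")) {
            if (token.isEmpty()) {
                continue;
            }
            postings.computeIfAbsent(token.toLowerCase(Locale.ROOT), k -> new TreeSet<>())
                    .add(docId);
        }
    }

    // Exact-match lookup: the search word is used as-is, with no analysis.
    public Set<Integer> exactSearch(String term) {
        return postings.getOrDefault(term, Collections.emptySet());
    }

    // Lookup that first applies the same lowercasing that was used at index time.
    public Set<Integer> analyzedSearch(String word) {
        return exactSearch(word.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.add(0, "Java and Ant build tools");
        index.add(1, "Learning JAVA");
        System.out.println(index.exactSearch("JAVA"));    // prints []
        System.out.println(index.analyzedSearch("JAVA")); // prints [0, 1]
    }
}
```

The exact lookup misses because only the lowercased form "java" was ever stored; applying the same normalization to the search word restores the match.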

I. Indexing Files

The basic steps are:

1. Create the index store with IndexWriter

2. Build a Document from each file

3. Write the document content to the index

    package com.ljh.search.index;

    import java.io.File;
    import java.io.FileReader;
    import java.io.IOException;
    import java.io.Reader;

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.LongField;
    import org.apache.lucene.document.StringField;
    import org.apache.lucene.document.TextField;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    // 1. Create the index store (IndexWriter)
    // 2. Build a Document from each file
    // 3. Write the document content to the index
    public class IndexFiles {

        public static void main(String[] args) throws IOException {
            String usage = "java IndexFiles"
                    + " -index INDEX_PATH -docs DOCS_PATH \n\n"
                    + "This indexes the documents in DOCS_PATH, creating a Lucene index "
                    + "in INDEX_PATH that can be searched with Searcher";
            String indexPath = null;
            String docsPath = null;
            for (int i = 0; i + 1 < args.length; i++) {
                if ("-index".equals(args[i])) {
                    indexPath = args[++i];
                } else if ("-docs".equals(args[i])) {
                    docsPath = args[++i];
                }
            }
            // Both paths are required; the original only checked docsPath.
            if (indexPath == null || docsPath == null) {
                System.err.println("Usage: " + usage);
                System.exit(1);
            }

            final File docDir = new File(docsPath);
            if (!docDir.exists() || !docDir.canRead()) {
                System.out.println("Document directory '" + docDir.getAbsolutePath()
                        + "' does not exist or is not readable, please check the path");
                System.exit(1);
            }

            IndexWriter writer = null;
            try {
                // 1. Create the index store (IndexWriter)
                writer = getIndexWriter(indexPath);
                index(writer, docDir);
            } catch (IOException e) {
                e.printStackTrace();
            } finally {
                // writer stays null if getIndexWriter() failed
                if (writer != null) {
                    writer.close();
                }
            }
        }

        private static IndexWriter getIndexWriter(String indexPath) throws IOException {
            Directory indexDir = FSDirectory.open(new File(indexPath));
            IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_48,
                    new StandardAnalyzer(Version.LUCENE_48));
            return new IndexWriter(indexDir, iwc);
        }

        private static void index(IndexWriter writer, File file) throws IOException {
            if (file.isDirectory()) {
                // Recurse into subdirectories
                String[] files = file.list();
                if (files != null) {
                    for (int i = 0; i < files.length; i++) {
                        index(writer, new File(file, files[i]));
                    }
                }
            } else {
                // FileReader uses the platform default charset
                try (Reader reader = new FileReader(file)) {
                    // 2. Build a Document from the file
                    Document doc = new Document();
                    // path: indexed as a single term and stored, so it can be read back from hits
                    doc.add(new StringField("path", file.getPath(), Field.Store.YES));
                    // modified: indexed but not stored
                    doc.add(new LongField("modified", file.lastModified(), Field.Store.NO));
                    // contents: tokenized by the analyzer, not stored
                    doc.add(new TextField("contents", reader));
                    System.out.println("Indexing " + file.getName());
                    // 3. Write the document content to the index
                    writer.addDocument(doc);
                }
            }
        }
    }

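(1) Run the program with "java com.ljh.search.index.IndexFiles -index d:/index -docs d:/tmp" (with the Lucene JARs on the classpath). This indexes the files under d:/tmp and places the index files in d:/index.

(2) The generated index files can be inspected with Luke. Luke has since moved to GitHub for hosting.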

II. Searching Files

1. Open the index with IndexSearcher
2. Search for the keyword
3. Iterate over the results and process them

    package com.ljh.search.search;

    // 1. Open the index (IndexSearcher)
    // 2. Search for the keyword
    // 3. Iterate over the results and process them
    import java.io.File;
    import java.io.IOException;

    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    public class Searcher {

        public static void main(String[] args) throws IOException {
            String indexPath = null;
            String term = null;
            for (int i = 0; i + 1 < args.length; i++) {
                if ("-index".equals(args[i])) {
                    indexPath = args[++i];
                } else if ("-term".equals(args[i])) {
                    term = args[++i];
                }
            }
            if (indexPath == null || term == null) {
                System.err.println("Usage: java Searcher -index INDEX_PATH -term TERM");
                System.exit(1);
            }
            System.out.println("Searching " + term + " in " + indexPath);

            // 1. Open the index
            Directory indexDir = FSDirectory.open(new File(indexPath));
            IndexReader ir = DirectoryReader.open(indexDir);
            try {
                IndexSearcher searcher = new IndexSearcher(ir);

                // 2. Search for the keyword. TermQuery matches the term exactly as given,
                //    so an uppercase keyword will not match the lowercased index terms.
                TopDocs docs = searcher.search(new TermQuery(new Term("contents", term)), 20);

                // 3. Iterate over the results and process them
                ScoreDoc[] hits = docs.scoreDocs;
                System.out.println(hits.length);
                for (ScoreDoc hit : hits) {
                    System.out.println("doc: " + hit.doc + " score: " + hit.score);
                }
            } finally {
                ir.close();
            }
        }
    }

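III. Analysis

In practice, everything except the most basic exact-match search requires an analysis step before searching.

With the QueryParser class, the search terms are analyzed according to the concrete Analyzer implementation. This covers, for example, case conversion and the interpretation of query expressions such as java AND ant.

Analysis involves two basic steps:

1. Create a QueryParser object

2. Call QueryParser.parse() to produce a Query object.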

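In the code, replace the following lines:

    // 2. Search for the keyword
    TopDocs docs = searcher.search(new TermQuery(new Term("contents", term)), 20);

with:

    // Additional imports needed:
    //   org.apache.lucene.analysis.core.SimpleAnalyzer
    //   org.apache.lucene.queryparser.classic.ParseException
    //   org.apache.lucene.queryparser.classic.QueryParser
    //   org.apache.lucene.search.Query
    //   org.apache.lucene.util.Version

    // 2. Search for the keyword
    // Note: the index was built with StandardAnalyzer, while SimpleAnalyzer is used here.
    // Both lowercase the terms; using the same analyzer on both sides is usually safer.
    QueryParser parser = new QueryParser(Version.LUCENE_48, "contents",
            new SimpleAnalyzer(Version.LUCENE_48));
    TopDocs docs;
    try {
        Query query = parser.parse(term);
        docs = searcher.search(query, 30);
    } catch (ParseException e) {
        // Do not continue with a null query if the input cannot be parsed
        throw new IOException("Cannot parse query: " + term, e);
    }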
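As a rough picture of what analysis plus query parsing achieves for an input such as JAVA AND Ant, the following plain-Java sketch lowercases the words, treats AND as an operator, and keeps only the documents that contain every remaining term. It is a simplified stand-in with hypothetical names (ToyQuery, terms, matchAll), not the logic of Lucene's QueryParser, which supports a much richer syntax.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; it does not use or reproduce Lucene's QueryParser.
public class ToyQuery {

    // Split a query such as "JAVA AND Ant" into lowercased terms, dropping the AND operator.
    public static List<String> terms(String query) {
        List<String> result = new ArrayList<>();
        for (String word : query.trim().split("\\s+")) {
            if (word.isEmpty() || word.equals("AND")) {
                continue;
            }
            result.add(word.toLowerCase(Locale.ROOT));
        }
        return result;
    }

    // Return the ids of documents whose lowercased word set contains every query term.
    public static Set<Integer> matchAll(Map<Integer, String> docs, String query) {
        List<String> wanted = terms(query);
        Set<Integer> hits = new TreeSet<>();
        for (Map.Entry<Integer, String> doc : docs.entrySet()) {
            Set<String> words = new HashSet<>(Arrays.asList(
                    doc.getValue().toLowerCase(Locale.ROOT).split("[^\\p{L}]+")));
            if (!wanted.isEmpty() && words.containsAll(wanted)) {
                hits.add(doc.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<Integer, String> docs = new LinkedHashMap<>();
        docs.put(0, "Building Java projects with Ant");
        docs.put(1, "Java concurrency in practice");
        System.out.println(terms("JAVA AND Ant"));          // prints [java, ant]
        System.out.println(matchAll(docs, "JAVA AND Ant")); // prints [0]
    }
}
```

Only document 0 contains both terms once case differences are removed, which is the kind of result the analyzed query is meant to produce.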
