Full-Text Search Engine Solr Series: Solr Core Concepts and Configuration Files

Published: 2025/3/19 | Programming Q&A | 豆豆

Document

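A Document is the basic unit of indexing (the verb) and searching in Solr. It is similar to a row in a relational database table and can contain one or more fields (Field), each with a name and a text value. A field can be stored in the index at the same time it is indexed, so that a search can return the field's value. A document should normally contain an id field that uniquely identifies it. For example: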

<doc>
    <field name="id">company123</field>
    <field name="companycity">Atlanta</field>
    <field name="companystate">Georgia</field>
    <field name="companyname">Code Monkeys R Us, LLC</field>
    <field name="companydescription">we write lots of code</field>
    <field name="lastmodified">2013-06-01T15:26:37Z</field>
</doc>
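To illustrate how such a document might be produced by a client, here is a minimal Python sketch using only the standard library. The helper name `build_add_xml` is hypothetical; wrapping the `<doc>` in an `<add>` element follows Solr's XML update message format. Sending the message to a server is left out.

```python
import xml.etree.ElementTree as ET


def build_add_xml(fields):
    """Serialize a dict of field name -> value into an <add><doc> message.

    A list value produces one <field> element per item (a multi-valued field).
    """
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in fields.items():
        values = value if isinstance(value, list) else [value]
        for v in values:
            field = ET.SubElement(doc, "field", name=name)
            field.text = str(v)
    return ET.tostring(add, encoding="unicode")


company = {
    "id": "company123",
    "companycity": "Atlanta",
    "companystate": "Georgia",
    "companyname": "Code Monkeys R Us, LLC",
    "companydescription": "we write lots of code",
    "lastmodified": "2013-06-01T15:26:37Z",
}

xml_message = build_add_xml(company)
print(xml_message)
```

Each key in the dictionary becomes one `<field>` element, which mirrors the name/value structure described above.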

Schema

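The Schema in Solr is similar to a table definition in a relational database. It exists as a text file, schema.xml, in the conf directory, and a Schema must be in place before documents are added to the index. The Schema file mainly contains three parts: fields (Field), field types (FieldType), and the unique key (uniqueKey).

Field type (FieldType): defines the type of a field (Field) in the XML documents added to the index, such as int, string, or date.
Field (Field): the name of a field as it is added to the index.
Unique key (uniqueKey): the field that uniquely identifies a document; it is used when updating and deleting documents.

For example: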

<!-- the opening <schema> tag is added to match the closing tag; its name and version values are illustrative -->
<schema name="example" version="1.5">
    <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
    <field name="title" type="text_general" indexed="true" stored="true" multiValued="true"/>

    <uniqueKey>id</uniqueKey>

    <fieldType name="string" class="solr.StrField" sortMissingLast="true" />
    <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
        <analyzer type="index">
            <tokenizer class="solr.StandardTokenizerFactory"/>
            <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
            <!-- in this example, we will only use synonyms at query time
            <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
            -->
            <filter class="solr.LowerCaseFilterFactory"/>
        </analyzer>
        <analyzer type="query">
            <tokenizer class="solr.StandardTokenizerFactory"/>
            <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
            <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
            <filter class="solr.LowerCaseFilterFactory"/>
        </analyzer>
    </fieldType>
</schema>
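The role the schema plays when documents are added can be sketched with a small client-side check. The Python below is a simplified illustration, not Solr's own validation: the function name `check_document` is hypothetical, the `<schema>` wrapper attributes are assumed, and features such as dynamic fields are ignored.

```python
import xml.etree.ElementTree as ET

# A reduced copy of the field definitions above; the <schema> attributes are assumed.
SCHEMA_XML = """
<schema name="example" version="1.5">
  <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
  <field name="title" type="text_general" indexed="true" stored="true" multiValued="true"/>
  <uniqueKey>id</uniqueKey>
</schema>
"""


def check_document(schema_xml, doc):
    """Return a list of problems found when comparing doc to the schema."""
    root = ET.fromstring(schema_xml)
    fields = {f.get("name"): f.attrib for f in root.iter("field")}
    unique_key = root.findtext("uniqueKey")
    problems = []
    # Every field marked required="true" must be present.
    for name, attrs in fields.items():
        if attrs.get("required") == "true" and name not in doc:
            problems.append(f"missing required field: {name}")
    # Every field in the document must be declared, and only
    # multiValued fields may carry a list of values.
    for name, value in doc.items():
        if name not in fields:
            problems.append(f"unknown field: {name}")
        elif isinstance(value, list) and fields[name].get("multiValued") != "true":
            problems.append(f"field is not multiValued: {name}")
    # The uniqueKey field identifies the document for updates and deletes.
    if unique_key and unique_key not in doc:
        problems.append(f"missing uniqueKey field: {unique_key}")
    return problems


ok = check_document(SCHEMA_XML, {"id": "1", "title": ["first", "second"]})
bad = check_document(SCHEMA_XML, {"title": ["no id here"], "color": "red"})
print(ok)
print(bad)
```

The first document passes, while the second is reported as missing its `id` and as carrying an undeclared field.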

Field

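In Solr, a field (Field) is the basic building block of a Document and corresponds to a column in a database table. A field is metadata that consists of a name, a type, and instructions for how the field's value is processed. For example:

indexed: when indexed="true", the field is processed by Solr and added to the index. Only indexed fields can be searched.
stored: when stored="true", a copy of the field's original value is kept in the index and can be returned by search components. For performance reasons, long text is not well suited to being stored in the index.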
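The difference between the two attributes can be shown with a toy model. The Python sketch below is not how Lucene or Solr work internally; the class `ToyIndex` and its whitespace tokenization are assumptions made purely for illustration.

```python
from collections import defaultdict


class ToyIndex:
    """A toy model of indexed vs. stored fields."""

    def __init__(self, field_options):
        # field_options: field name -> {"indexed": bool, "stored": bool}
        self.field_options = field_options
        self.postings = defaultdict(set)  # (field, term) -> set of doc ids
        self.stored = {}                  # doc id -> {field: original value}

    def add(self, doc_id, doc):
        self.stored[doc_id] = {}
        for field, value in doc.items():
            opts = self.field_options[field]
            if opts["indexed"]:
                # Indexed fields become searchable terms.
                for term in value.lower().split():
                    self.postings[(field, term)].add(doc_id)
            if opts["stored"]:
                # Stored fields keep a copy of the original value.
                self.stored[doc_id][field] = value

    def search(self, field, term):
        ids = sorted(self.postings.get((field, term.lower()), set()))
        return [self.stored[i] for i in ids]


index = ToyIndex({
    "title": {"indexed": True, "stored": True},
    "body": {"indexed": True, "stored": False},
    "note": {"indexed": False, "stored": True},
})
index.add("1", {
    "title": "Solr basics",
    "body": "a long article about search",
    "note": "internal draft",
})

hits_body = index.search("body", "search")  # matches, but the body text is not returned
hits_note = index.search("note", "draft")   # no match: the field is not indexed
print(hits_body)
print(hits_note)
```

Searching on `body` finds the document even though its long text is never returned, while `note` is returned with every hit but cannot itself be searched.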

Field Type

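Every field in Solr has a corresponding field type, such as float, long, double, date, or text. Solr provides a rich set of field types, and you can also define custom types suited to your own data. For example: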

<fieldType name="text_cn_stopword" class="solr.TextField">
    <analyzer type="index">
        <tokenizer class="org.wltea.analyzer.lucene.IKAnalyzerSolrFactory" useSmart="false"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="org.wltea.analyzer.lucene.IKAnalyzerSolrFactory" useSmart="true"/>
    </analyzer>
</fieldType>

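Solrconfig

If the Schema is regarded as Solr's Model, then Solrconfig is Solr's Configuration. It defines how Solr handles requests such as indexing, highlighting, and searching, and it also specifies the caching strategy. The most commonly used elements include:

Specifying the index data path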

<!--
    Used to specify an alternate directory to hold all index data
    other than the default ./data under the Solr home.
    If replication is in use, this should match the replication configuration.
-->
<dataDir>${solr.data.dir:}</dataDir>

Cache parameters

<filterCache
    class="solr.FastLRUCache"
    size="512"
    initialSize="512"
    autowarmCount="0"/>

<!-- queryResultCache caches results of searches - ordered lists of
     document ids (DocList) based on a query, a sort, and the range
     of documents requested. -->
<queryResultCache
    class="solr.LRUCache"
    size="512"
    initialSize="512"
    autowarmCount="0"/>

<!-- documentCache caches Lucene Document objects (the stored fields for each document).
     Since Lucene internal document ids are transient, this cache will not be autowarmed. -->
<documentCache
    class="solr.LRUCache"
    size="512"
    initialSize="512"
    autowarmCount="0"/>
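The caches above are bounded by `size` and evict entries on a least-recently-used basis. The eviction idea can be sketched in a few lines of Python. This is a generic LRU illustration, not Solr's implementation; only `size` is modeled, and `initialSize` and `autowarmCount` are left out.

```python
from collections import OrderedDict


class LRUCache:
    """Minimal least-recently-used cache; `size` is the maximum number of entries."""

    def __init__(self, size):
        self.size = size
        self.entries = OrderedDict()

    def get(self, key):
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)  # mark as most recently used
        return self.entries[key]

    def put(self, key, value):
        if key in self.entries:
            self.entries.move_to_end(key)
        self.entries[key] = value
        if len(self.entries) > self.size:
            self.entries.popitem(last=False)  # evict the least recently used entry


cache = LRUCache(size=2)
cache.put("q=solr", [1, 2, 3])
cache.put("q=lucene", [4, 5])
cache.get("q=solr")         # touch "q=solr" so it becomes the most recent entry
cache.put("q=schema", [6])  # exceeds size, so "q=lucene" is evicted
remaining = list(cache.entries)
print(remaining)
```

With `size="512"`, the same rule applies once the 513th distinct entry is inserted.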

Request handlers
A request handler receives an HTTP request, performs the search, and returns the response. For example, the query request handler:

<requestHandler name="/query" class="solr.SearchHandler">
    <lst name="defaults">
        <str name="echoParams">explicit</str>
        <str name="wt">json</str>
        <str name="indent">true</str>
        <str name="df">text</str>
    </lst>
</requestHandler>

Each request handler includes a set of configurable search parameters, such as wt, indent, and df.
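Parameters such as wt and df act as defaults, and a client can override them per request by passing the same names in the query string. The Python sketch below only builds the request URL. The host, port, and core name `collection1` are assumptions for illustration, as are the helper name `build_query_url` and the `/query` handler path.

```python
from urllib.parse import urlencode

# Assumed values for illustration: a local Solr instance and a core named "collection1".
BASE_URL = "http://localhost:8983/solr/collection1/query"


def build_query_url(q, **overrides):
    """Build a request URL; parameters passed here override the handler defaults."""
    params = {"q": q}
    params.update(overrides)
    return BASE_URL + "?" + urlencode(params)


# Ask for XML instead of the default JSON, and search companyname instead of text.
url = build_query_url("code monkeys", wt="xml", df="companyname")
print(url)
```

Any parameter that is not overridden, such as indent here, keeps the value configured in the handler's defaults.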

Reposted from: https://my.oschina.net/u/2929819/blog/760871
