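Synchronizing Solr (search engine service) with MongoDB through mongo-connector: a working setup and a summary of the pitfalls encountered (for Solr 5.3.x), giving real-time incremental indexing from MongoDB into Solr
Solr configuration and MongoDB installation
Installing and configuring Solr has become very simple; see the official documentation: http://lucene.apache.org/solr/quickstart.html. The official guide uses the cloud example (selected with -e); in the end I used techproducts instead. The basic commands are as follows: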
Note: if unzip is not installed, install it first: apt-get install unzip

root@xxx:xxx# ls solr-*
solr-5.3.1.zip
root@xxx:xxx# unzip -q solr-5.3.1.zip
root@xxx:xxx# cd solr-5.3.1/

----- Start: test the installation with the techproducts example bundled with Solr -----
root@xxx:/home/software/solr-5.3.1# bin/solr start -e techproducts -noprompt
Your current version of Java is too old to run this version of Solr
We found version 1.7.0_79, using command '/usr/java/jdk1.7.0_79/bin/java'
Please install latest version of Java 8 or set JAVA_HOME properly.

Debug information:
JAVA_HOME: /usr/java/jdk1.7.0_79
Active Path:
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/usr/java/jdk1.7.0_79/bin:/usr/local/mongodb/bin

If you see the message above, install a JDK newer than 1.7 (the message itself recommends Java 8).

After installing it, run:
root@xxx:/home/software/solr-5.3.1# bin/solr start -e techproducts -noprompt

Then open http://localhost:8983/solr/#/techproducts in a browser (screenshot of the result omitted).
The techproducts shown there is a SolrCore.
----- End: test the installation with the techproducts example bundled with Solr -----
1. Installing, starting, and stopping Solr
According to the official documentation, to stop Solr when you are done and remove the data created by this example, run:
root@xxx:/home/software/solr-5.3.1# pwd
/home/software/solr-5.3.1
root@xxx:/home/software/solr-5.3.1# bin/solr stop -all
Sending stop command to Solr running on port 8983 ... waiting 5 seconds to allow Jetty process 1816 to stop gracefully.

root@xxx:/home/software/solr-5.3.1# rm -Rf example/techproducts/

Note: if, after stopping everything, you run bin/solr start -all -noprompt, open http://localhost:8983/solr/, and add a SolrCore, Solr looks for the core under /home/software/solr-5.3.1/server/solr by default. If it is not there, copy /home/software/solr-5.3.1/example/techproducts/techproducts to /home/software/solr-5.3.1/server/solr and rename techproducts to docdetection.
Then change the contents of core.properties in docdetection to:
#Written by CorePropertiesLocator
#Fri Mar 31 18:36:50 UTC 2017
name=docdetection
config=solrconfig.xml
schema=schema.xml
dataDir=data
To create several cores, add more directories at the same level as docdetection. For example, for a core named test, core.properties contains:
#Written by CorePropertiesLocator
#Fri Mar 31 18:36:50 UTC 2017
name=test
config=solrconfig.xml
schema=schema.xml
dataDir=data
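The core.properties files above differ only in the name line, so creating one per core can be scripted. The following is a minimal sketch, not part of the original setup: the function name write_core_properties is hypothetical, and a temporary directory stands in for /home/software/solr-5.3.1/server/solr. It writes only core.properties; the conf directory still has to be copied over as described above.

```python
from pathlib import Path
import tempfile


def write_core_properties(solr_home: Path, core_name: str) -> Path:
    """Create <solr_home>/<core_name>/core.properties with the four keys shown above."""
    core_dir = solr_home / core_name
    core_dir.mkdir(parents=True, exist_ok=True)
    lines = [
        f"name={core_name}",
        "config=solrconfig.xml",
        "schema=schema.xml",
        "dataDir=data",
    ]
    props = core_dir / "core.properties"
    props.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return props


if __name__ == "__main__":
    # Assumption: a temporary directory is used here instead of the real Solr home.
    with tempfile.TemporaryDirectory() as tmp:
        path = write_core_properties(Path(tmp), "test")
        print(path.read_text(encoding="utf-8"))
```

Calling it once per core name (for example docdetection and test) gives each core its own directory next to the others.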
Go to /home/software/solr-5.3.1/server/solr-webapp/webapp/WEB-INF and edit the following entry in web.xml:
<env-entry>
    <env-entry-name>solr/home</env-entry-name>
    <env-entry-value>/home/software/solr-5.3.1/server/solr</env-entry-value>
    <env-entry-type>java.lang.String</env-entry-type>
</env-entry>
Then refresh http://localhost:8983/solr to see the final admin interface (screenshot omitted).
2. Integrating Solr with MongoDB
Judging from Solr's official quickstart, it can index XML, JSON, CSV and other kinds of documents, but nothing there suggests it can be integrated with MongoDB. Still, people always find a way to make things work together; maybe there really is an almighty god.
Reference: http://www.cnblogs.com/sysuys/p/3403670.html

Integrating Solr with MongoDB requires mongo-connector; see https://github.com/10gen-labs/mongo-connector/wiki/Getting-Started

mongo-connector can be downloaded from https://github.com/mongodb-labs/mongo-connector

1) Set up a MongoDB replica set
Install python-pip and git:
root@xxx:~# apt-get install python-pip
root@xxx:~# apt-get install git
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following extra packages will be installed:
root@iZm5effj2tm01xy2qqmnlzZ:~#
For configuring the replica set, see: http://blog.csdn.net/tototuzuoquan/article/details/76473441
2) Install mongo-connector
2.1) Installing mongo-connector (recommended)
Follow https://github.com/10gen-labs/mongo-connector; installation is very simple and takes a single command.
mongo-connector can also be set up to run as a background process at install time; the steps are below. Open config.json to review its settings.
root@xxx:/home/software/mongo-connector-master# pwd
/home/software/mongo-connector-master

root@xxx:/home/software/mongo-connector-master# pip install mongo_connector[solr]

Note that synchronizing to Solr later requires a doc manager, so that has to be installed as well. If you install only as shown above, synchronization fails with an error. After a long search online, I finally found this address in mongo-connector's README.rst: https://github.com/mongodb-labs/solr-doc-manager (other kinds of doc manager are installed the same way).

If solr-doc-manager is not installed, run:
pip install solr-doc-manager

To find where mongo-connector was installed:
root@xxx:/etc/init.d# find / -name mongo-connector
/home/software/mongo-connector-master/scripts/mongo-connector
/usr/local/bin/mongo-connector
root@xxx:/etc/init.d#

The following installs the elastic2-doc-manager doc manager:
root@xxx:/home/software/mongo-connector-master# pip install elastic2-doc-manager

Note: if pip is reported as missing, install python-pip with apt-get. Do not run mongo-connector just yet, though: it reads Solr's configuration, so a few things on the Solr side have to be set up first. Once they are, running it is a single command.
Note: guides online describe installing with pip but not uninstalling. Here is pip's help:
root@xxx:/home/software/mongo-connector-master# pip --help

Usage:
  pip <command> [options]
mongo-connector can be uninstalled as follows:
root@xxx:/home/software/mongo-connector-master# pip uninstall mongo-connector
2.2) Second way to install mongo-connector:
git clone https://github.com/10gen-labs/mongo-connector.git
cd mongo-connector
# Before installing, edit mongo_connector/constants.py and set DEFAULT_COMMIT_INTERVAL = 0
python setup.py install
2.3) Third way: download mongo-connector-master.zip from https://github.com/mongodb-labs/mongo-connector (not recommended)
root@xxx:/home/software# unzip mongo-connector-master.zip
root@xxx:/home/software# chmod +x setup.py
root@xxx:/home/software# cd mongo-connector-master/
root@xxx:/home/software/mongo-connector-master# python setup.py install
running install_service
creating /var/log/mongo-connector
copying ./config.json -> /etc/mongo-connector.json
copying ./scripts/mongo-connector -> /etc/init.d

root@xxx:/home/software/mongo-connector-master# chmod +x /etc/init.d/mongo-connector
Run the following command to make sure the system startup configuration is updated:
root@xxx:/home/software/mongo-connector-master# update-rc.d mongo-connector defaults
update-rc.d: warning: default start runlevel arguments (2 3 4 5) do not match mongo-connector Default-Start values (3 4 5)
 Adding system startup for /etc/init.d/mongo-connector ...
   /etc/rc0.d/K20mongo-connector -> ../init.d/mongo-connector
   /etc/rc1.d/K20mongo-connector -> ../init.d/mongo-connector
   /etc/rc6.d/K20mongo-connector -> ../init.d/mongo-connector
   /etc/rc2.d/S20mongo-connector -> ../init.d/mongo-connector
   /etc/rc3.d/S20mongo-connector -> ../init.d/mongo-connector
   /etc/rc4.d/S20mongo-connector -> ../init.d/mongo-connector
   /etc/rc5.d/S20mongo-connector -> ../init.d/mongo-connector
root@iZm5effj2tm01xy2qqmnlzZ:/home/software/mongo-connector-master#

To remove the background service, run:
python setup.py uninstall_service
This removes /etc/init.d/mongo-connector and /etc/mongo-connector.json.
3) Configuration on the Solr side:
Locate schema.xml and edit it:
root@xxx:/home/software/solr-5.3.1# find ./ -name "schema.xml"
./example/example-DIH/solr/rss/conf/schema.xml
./example/example-DIH/solr/tika/conf/schema.xml
./example/example-DIH/solr/solr/conf/schema.xml
./example/example-DIH/solr/mail/conf/schema.xml
./example/example-DIH/solr/db/conf/schema.xml
./example/techproducts/solr/techproducts/conf/schema.xml
./server/solr/configsets/sample_techproducts_configs/conf/schema.xml
./server/solr/configsets/basic_configs/conf/schema.xml
root@iZm5effj2tm01xy2qqmnlzZ:/home/software/solr-5.3.1#

Open the file:
vi ./server/solr/configsets/sample_techproducts_configs/conf/schema.xml
Change the following line (in vi, search for it with: Esc, then :/<uniqueKey>)
<uniqueKey>id</uniqueKey>
so that it uses the id with a leading underscore:
<uniqueKey>_id</uniqueKey>
Then add these fields (in vi, jump to the last line with: Esc, then Shift + g):
<field name="_id" type="string" indexed="true" stored="true" />
<field name="_ts" type="long" indexed="true" stored="true" />
<field name="ns" type="string" indexed="true" stored="true"/>

(Screenshot of the file after adding the fields omitted.)

Comment out the original id field (in vi, search for it with: Esc, then :/name="id"):
<!--
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
-->
(Screenshot omitted.)
Otherwise every JSON or XML document added to Solr must contain this id field, because it is declared with required="true".
Those are all the changes to schema.xml.
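The schema.xml edits above (switch uniqueKey to _id, add _id, _ts and ns, drop the required id field) can also be expressed as a small script. This is only a sketch under assumptions: it works on a tiny made-up schema string rather than the real techproducts file, the function name patch_schema is hypothetical, and it deletes the id field instead of commenting it out. Python's ElementTree also discards XML comments by default, so for the real schema.xml editing by hand, as I did, is probably the safer route.

```python
import xml.etree.ElementTree as ET

# Assumption: a minimal stand-in for schema.xml, not the real techproducts schema.
SAMPLE_SCHEMA = """<schema name="example" version="1.5">
  <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>
  <field name="name" type="string" indexed="true" stored="true"/>
  <uniqueKey>id</uniqueKey>
</schema>"""

# The three fields added in the steps above.
CONNECTOR_FIELDS = [
    {"name": "_id", "type": "string", "indexed": "true", "stored": "true"},
    {"name": "_ts", "type": "long", "indexed": "true", "stored": "true"},
    {"name": "ns", "type": "string", "indexed": "true", "stored": "true"},
]


def patch_schema(xml_text: str) -> str:
    """Return schema XML with uniqueKey set to _id and the connector fields present."""
    root = ET.fromstring(xml_text)

    unique_key = root.find("uniqueKey")
    if unique_key is None:
        raise ValueError("schema has no <uniqueKey> element")
    unique_key.text = "_id"

    # Fields sit either directly under <schema> or inside a <fields> wrapper.
    parent = root.find("fields")
    if parent is None:
        parent = root

    # Remove the original required "id" field (stands in for commenting it out).
    for field in parent.findall("field"):
        if field.get("name") == "id":
            parent.remove(field)

    # Add the connector fields only if they are not there yet.
    existing = {field.get("name") for field in parent.findall("field")}
    for attrs in CONNECTOR_FIELDS:
        if attrs["name"] not in existing:
            ET.SubElement(parent, "field", attrs)

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(patch_schema(SAMPLE_SCHEMA))
```

Running the function twice on the same text adds nothing the second time, which makes it safe to rerun.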
Edit solrconfig.xml
Open:
vi ./server/solr/configsets/sample_techproducts_configs/conf/solrconfig.xml
Find the following line (if its class is misconfigured, data will not synchronize between Solr and MongoDB; see https://github.com/mongodb-labs/mongo-connector/wiki/Usage%20with%20Solr#make-sure-the-lukerequesthandler-is-enabled):
<requestHandler name="/admin/luke" class="org.apache.solr.handler.admin.LukeRequestHandler" />

Uncomment it; if it is missing, add it. mongo-connector depends on this handler: it requests the schema defined in schema.xml above, and this handler is what serves that request, so it is essential.
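Since a commented-out handler is easy to overlook, a quick check of solrconfig.xml can help. The sketch below is my own illustration, not something mongo-connector provides: the function name luke_handler_enabled is hypothetical, the two config strings are made-up stand-ins, and it only recognizes the exact class string quoted above. It relies on the fact that ElementTree skips XML comments when parsing, so a commented-out handler is not found.

```python
import xml.etree.ElementTree as ET

LUKE_CLASS = "org.apache.solr.handler.admin.LukeRequestHandler"

# Assumption: tiny stand-ins for solrconfig.xml, one with the handler enabled
# and one where it is still commented out.
ENABLED = """<config>
  <requestHandler name="/admin/luke" class="org.apache.solr.handler.admin.LukeRequestHandler" />
</config>"""

COMMENTED_OUT = """<config>
  <!--
  <requestHandler name="/admin/luke" class="org.apache.solr.handler.admin.LukeRequestHandler" />
  -->
</config>"""


def luke_handler_enabled(solrconfig_xml: str) -> bool:
    """Return True if an active /admin/luke handler with the expected class exists."""
    root = ET.fromstring(solrconfig_xml)
    for handler in root.iter("requestHandler"):
        if handler.get("name") == "/admin/luke" and handler.get("class") == LUKE_CLASS:
            return True
    return False


if __name__ == "__main__":
    print(luke_handler_enabled(ENABLED))
    print(luke_handler_enabled(COMMENTED_OUT))
```

For a real file, the text would come from reading the solrconfig.xml path opened above.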
Finally:
Shut Solr down as described earlier, remove the example/techproducts directory, and start Solr again. Restarting the techproducts example produces some errors, because schema.xml was modified and uniqueKey is now _id rather than id. These errors can be ignored; in fact, if none appear, something is wrong. Afterwards you will find that the two edited configuration files have been copied in as the configuration of the example/techproducts example, as described above.
root@xxx:/home/software/solr-5.3.1# cd /home/software/solr-5.3.1
root@xxx:/home/software/solr-5.3.1# bin/solr stop -all
Sending stop command to Solr running on port 8983 ... waiting 5 seconds to allow Jetty process 1816 to stop gracefully.

root@xxx:/home/software/solr-5.3.1#
root@xxx:/home/software/solr-5.3.1# bin/solr start -e techproducts -noprompt
4) Connect Solr and MongoDB with mongo-connector.
With the setup so far, run the following (for a mongo-connector reference, see http://blog.csdn.net/hyman_yx/article/details/51684218):
mongo-connector -m localhost:27017 --auto-commit-interval=0 -t http://localhost:8983/solr/techproducts -d solr_doc_manager 1>/dev/null 2>&1 &
mongo-connector -m localhost:27017 --auto-commit-interval=0 -t http://localhost:8983/solr/docdetection -d solr_doc_manager 1>/dev/null 2>&1 &

Note: if re-creating the index does not seem to take effect, run the commands below (and also delete the index and rebuild it):
root@iZm5effj2tm01xy2qqmnlzZ:/home/software/solr-5.3.1# rm -rf mongo-connector.log
root@iZm5effj2tm01xy2qqmnlzZ:/home/software/solr-5.3.1/server/solr/docdetection/data# pwd
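The two commands above differ only in the target core, so the argument list can be built in one place. The following is a rough sketch, assuming mongo-connector is on the PATH; the helper names build_connector_command and launch_in_background are hypothetical, and only the flags already used above (-m, --auto-commit-interval, -t, -d) appear in it.

```python
import shlex
import subprocess


def build_connector_command(mongo_host: str, solr_core_url: str, commit_interval: int = 0) -> list:
    """Build the mongo-connector argument list with the same flags as the commands above."""
    return [
        "mongo-connector",
        "-m", mongo_host,
        f"--auto-commit-interval={commit_interval}",
        "-t", solr_core_url,
        "-d", "solr_doc_manager",
    ]


def launch_in_background(command: list) -> subprocess.Popen:
    """Start the command detached from the terminal output, like `1>/dev/null 2>&1 &`."""
    return subprocess.Popen(command, stdout=subprocess.DEVNULL, stderr=subprocess.STDOUT)


if __name__ == "__main__":
    for core in ("techproducts", "docdetection"):
        cmd = build_connector_command("localhost:27017", f"http://localhost:8983/solr/{core}")
        # Only print the command here; call launch_in_background(cmd) to actually start it.
        print(shlex.join(cmd))
```

Passing the arguments as a list avoids shell quoting issues if a host or URL ever contains unusual characters.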
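The result after running the commands (screenshot omitted).
Inspect the content that mongo-connector has pushed into Solr.
After the configuration steps above, the synchronized documents finally show up in Solr; at this point the setup works (screenshot omitted).
The corresponding content in MongoDB (screenshot omitted).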
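If you find that Solr does not pick up new data automatically, the cause is Solr's default configuration: the hard auto-commit does not open a new searcher (openSearcher is false) and the soft auto-commit is disabled (maxTime defaults to -1), so newly indexed documents do not become visible on their own. The fix is to change these settings in solrconfig.xml as follows.
Go to the Solr directory (note: my Solr is in /home/software/solr-5.3.1):
cd /home/software/solr-5.3.1
Find solrconfig.xml:
find ./ -name solrconfig.xml
The result is:
./example/files/conf/solrconfig.xml
./example/example-DIH/solr/rss/conf/solrconfig.xml
./example/example-DIH/solr/tika/conf/solrconfig.xml
./example/example-DIH/solr/solr/conf/solrconfig.xml
./example/example-DIH/solr/mail/conf/solrconfig.xml
./example/example-DIH/solr/db/conf/solrconfig.xml
./example/techproducts/solr/techproducts/conf/solrconfig.xml
./server/solr/configsets/sample_techproducts_configs/conf/solrconfig.xml
./server/solr/configsets/basic_configs/conf/solrconfig.xml
./server/solr/configsets/data_driven_schema_configs/conf/solrconfig.xml

In the files that were highlighted in red in the original post (the four edited in changes 1 to 4 below), modify the following section: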
<autoCommit>
    <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
    <openSearcher>false</openSearcher>
</autoCommit>

<!-- softAutoCommit is like autoCommit except it causes a
     'soft' commit which only ensures that changes are visible
     but does not ensure that data is synced to disk.  This is
     faster and more near-realtime friendly than a hard commit. -->
<autoSoftCommit>
    <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
</autoSoftCommit>
Change 1: vim example/techproducts/solr/techproducts/conf/solrconfig.xml
<autoCommit>
    <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
    <openSearcher>false</openSearcher>
</autoCommit>

<!-- softAutoCommit is like autoCommit except it causes a
     'soft' commit which only ensures that changes are visible
     but does not ensure that data is synced to disk.  This is
     faster and more near-realtime friendly than a hard commit. -->

<autoSoftCommit>
    <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
</autoSoftCommit>

Change it to:
<autoCommit>
    <maxTime>300000</maxTime>
    <maxDocs>10000</maxDocs>
    <openSearcher>true</openSearcher>
</autoCommit>

<!-- softAutoCommit is like autoCommit except it causes a
     'soft' commit which only ensures that changes are visible
     but does not ensure that data is synced to disk.  This is
     faster and more near-realtime friendly than a hard commit. -->
<autoSoftCommit>
    <maxDocs>1000</maxDocs>
    <maxTime>60000</maxTime>
</autoSoftCommit>
Change 2: vim server/solr/configsets/sample_techproducts_configs/conf/solrconfig.xml
<autoCommit>
    <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
    <openSearcher>false</openSearcher>
</autoCommit>

<!-- softAutoCommit is like autoCommit except it causes a
     'soft' commit which only ensures that changes are visible
     but does not ensure that data is synced to disk.  This is
     faster and more near-realtime friendly than a hard commit. -->

<autoSoftCommit>
    <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
</autoSoftCommit>

Change it to:
<autoCommit>
    <maxTime>300000</maxTime>
    <maxDocs>10000</maxDocs>
    <openSearcher>true</openSearcher>
</autoCommit>

<!-- softAutoCommit is like autoCommit except it causes a
     'soft' commit which only ensures that changes are visible
     but does not ensure that data is synced to disk.  This is
     faster and more near-realtime friendly than a hard commit. -->
<autoSoftCommit>
    <maxDocs>1000</maxDocs>
    <maxTime>60000</maxTime>
</autoSoftCommit>
Change 3: vim server/solr/configsets/basic_configs/conf/solrconfig.xml
<autoCommit>
    <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
    <openSearcher>false</openSearcher>
</autoCommit>

<!-- softAutoCommit is like autoCommit except it causes a
     'soft' commit which only ensures that changes are visible
     but does not ensure that data is synced to disk.  This is
     faster and more near-realtime friendly than a hard commit. -->
<autoSoftCommit>
    <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
</autoSoftCommit>

Change it to:
<autoCommit>
    <maxTime>300000</maxTime>
    <maxDocs>10000</maxDocs>
    <openSearcher>true</openSearcher>
</autoCommit>

<!-- softAutoCommit is like autoCommit except it causes a
     'soft' commit which only ensures that changes are visible
     but does not ensure that data is synced to disk.  This is
     faster and more near-realtime friendly than a hard commit. -->
<autoSoftCommit>
    <maxDocs>1000</maxDocs>
    <maxTime>60000</maxTime>
</autoSoftCommit>
Change 4: vim server/solr/configsets/data_driven_schema_configs/conf/solrconfig.xml
<autoCommit>
    <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
    <openSearcher>false</openSearcher>
</autoCommit>

Change it to:
<autoCommit>
    <maxTime>300000</maxTime>
    <maxDocs>10000</maxDocs>
    <openSearcher>true</openSearcher>
</autoCommit>

<!-- softAutoCommit is like autoCommit except it causes a
     'soft' commit which only ensures that changes are visible
     but does not ensure that data is synced to disk.  This is
     faster and more near-realtime friendly than a hard commit. -->
<autoSoftCommit>
    <maxDocs>1000</maxDocs>
    <maxTime>60000</maxTime>
</autoSoftCommit>
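With four files to edit, it is easy to miss one. The sketch below checks whether a given commit configuration would make new documents visible without a manual commit. It is only an illustration under assumptions: the function names are hypothetical, the two XML strings are cut-down stand-ins for the updateHandler section of solrconfig.xml, ${name:default} placeholders are resolved to their default value only, and I assume openSearcher counts as true when the element is absent.

```python
import re
import xml.etree.ElementTree as ET

# Matches Solr's ${property:default} syntax and captures the default value.
_PROPERTY = re.compile(r"^\$\{[^:}]+:([^}]*)\}$")

# Assumption: cut-down stand-ins for the sections shown above.
DEFAULT_CONFIG = """<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <autoSoftCommit>
    <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
  </autoSoftCommit>
</updateHandler>"""

MODIFIED_CONFIG = """<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>300000</maxTime>
    <maxDocs>10000</maxDocs>
    <openSearcher>true</openSearcher>
  </autoCommit>
  <autoSoftCommit>
    <maxDocs>1000</maxDocs>
    <maxTime>60000</maxTime>
  </autoSoftCommit>
</updateHandler>"""


def resolve_default(text) -> str:
    """Turn '${some.property:15000}' into '15000'; return other text stripped."""
    text = (text or "").strip()
    match = _PROPERTY.match(text)
    return match.group(1) if match else text


def _fires(section) -> bool:
    """A commit section triggers if maxTime or maxDocs is a positive number."""
    if section is None:
        return False
    for tag in ("maxTime", "maxDocs"):
        raw = resolve_default(section.findtext(tag, default=""))
        try:
            if int(raw) > 0:
                return True
        except ValueError:
            continue
    return False


def makes_new_documents_visible(config_xml: str) -> bool:
    """True if a soft commit fires, or a hard commit fires and opens a searcher."""
    root = ET.fromstring(config_xml)
    hard = root.find(".//autoCommit")
    soft = root.find(".//autoSoftCommit")
    if _fires(soft):
        return True
    if _fires(hard):
        opens = resolve_default(hard.findtext("openSearcher", default="true")).lower()
        return opens == "true"
    return False


if __name__ == "__main__":
    print("default: ", makes_new_documents_visible(DEFAULT_CONFIG))
    print("modified:", makes_new_documents_visible(MODIFIED_CONFIG))
```

The default section fails the check because its hard commit keeps openSearcher at false and its soft commit interval is -1, which matches the behaviour described above.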
Note that to index additional MongoDB keys in Solr, you have to add them as fields in the SolrCore's schema.xml. Here is an example:
<?xml version="1.0" encoding="UTF-8" ?>
<schema name="example" version="1.5">
  <types>
    <fieldType name="string" class="solr.StrField" sortMissingLast="true" />
    <fieldType name="boolean" class="solr.BoolField" sortMissingLast="true"/>
    <fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>
    <fieldType name="float" class="solr.TrieFloatField" precisionStep="0" positionIncrementGap="0"/>
    <fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
    <fieldType name="double" class="solr.TrieDoubleField" precisionStep="0" positionIncrementGap="0"/>
    <fieldType name="date" class="solr.TrieDateField" precisionStep="0" positionIncrementGap="0"/>
    <!--
    <fieldType name="text_ik" class="solr.TextField">
      <analyzer class="org.wltea.analyzer.lucene.IKAnalyzer"/>
    </fieldType>
    -->
    <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
        <!-- in this example, we will only use synonyms at query time
        <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
        -->
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
  </types>
  <fields>
    <field name="_version_" type="long" indexed="true" stored="true"/>
    <field name="_id" type="string" indexed="true" stored="true" />
    <field name="_ts" type="long" indexed="true" stored="true" />
    <field name="ns" type="string" indexed="true" stored="true"/>
    <field name="docLibrayId" type="string" indexed="true" stored="true"/>
    <field name="originalDocPath" type="string" indexed="true" stored="true"/>
    <field name="htmlDocPath" type="string" indexed="true" stored="true" />
    <field name="originalFileName" type="string" indexed="true" stored="true"/>
    <field name="majorId" type="string" indexed="true" stored="true"/>
    <field name="majorName" type="string" indexed="true" stored="true"/>
    <field name="propertyId" type="string" indexed="true" stored="true"/>
    <field name="propertyName" type="string" indexed="true" stored="true"/>
    <field name="wordNum" type="int" indexed="true" stored="true"/>
    <field name="paragNum" type="int" indexed="true" stored="true"/>
    <field name="sentenceNum" type="int" indexed="true" stored="true"/>
    <field name="content" type="text_general" indexed="false" stored="true" multiValued="true"/>
  </fields>
  <uniqueKey>_id</uniqueKey>
  <defaultSearchField>majorName</defaultSearchField>
  <solrQueryParser defaultOperator="OR"/>
</schema>
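After changing the schema, rebuild the index. Go back to the directory from which the mongo-connector command was run:
mongo-connector -m localhost:27017 --auto-commit-interval=0 -t http://localhost:8983/solr/docdetection -d solr_doc_manager &
Find oplog.timestamp there and delete it. The mongo-connector.log file can be deleted as well.
Go to the directory where the index is stored:
cd /home/software/solr-5.3.1/server/solr/docdetection/data
Delete all generated index data with rm -rf * (make sure you are in /home/software/solr-5.3.1/server/solr/docdetection/data when you run it).
Then restart Solr (the commands are given earlier in this post) and run mongo-connector again:
mongo-connector -m localhost:27017 --auto-commit-interval=0 -t http://localhost:8983/solr/docdetection -d solr_doc_manager &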
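The cleanup part of these steps (remove oplog.timestamp and mongo-connector.log, then empty the index data directory) can be wrapped in a small helper so that rm -rf * is never run in the wrong directory by accident. This is a minimal sketch, not a tested tool: the function name reset_sync_state is hypothetical, temporary directories stand in for the real paths, and it assumes Solr and mongo-connector are both stopped before it runs.

```python
import shutil
import tempfile
from pathlib import Path


def reset_sync_state(connector_dir: Path, index_data_dir: Path) -> list:
    """Delete mongo-connector's progress files and everything inside the index data directory."""
    removed = []

    # Files mongo-connector leaves in the directory it was started from.
    for name in ("oplog.timestamp", "mongo-connector.log"):
        target = connector_dir / name
        if target.exists():
            target.unlink()
            removed.append(target)

    # Equivalent of `rm -rf *` inside the data directory, but with an explicit path.
    for child in list(index_data_dir.iterdir()):
        if child.is_dir() and not child.is_symlink():
            shutil.rmtree(child)
        else:
            child.unlink()
        removed.append(child)

    return removed


if __name__ == "__main__":
    # Assumption: temporary directories replace the real locations, e.g.
    # /home/software/solr-5.3.1/server/solr/docdetection/data for the index.
    with tempfile.TemporaryDirectory() as run_dir, tempfile.TemporaryDirectory() as data_dir:
        (Path(run_dir) / "oplog.timestamp").write_text("demo")
        (Path(data_dir) / "index").mkdir()
        for path in reset_sync_state(Path(run_dir), Path(data_dir)):
            print("removed", path)
```

The data directory itself is kept, only its contents are removed, so Solr can recreate the index there on the next start.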