【Sphinx】MySQL+Sphinx 全文检索的使用和测试
? ? 導讀:最近在線上由于某些不可名狀的原因需要在數據庫中過濾一些詞語,所以線上比較多的語句都是 select c1,c2 fron tb1 where c1 like '%name%'類似與這個樣子的模糊查詢。開始想到了全文索引來進行分詞,但是又想到這些都是線上數據庫,并且表中的數據都是幾千萬行的數據,在線online ddl 或者說使用第三方的工具的話對于在修改期間會阻塞線上的DML操作,并且針對全文索引的維護也需要消耗一定的資源,所以暫時打消了這個想法。在和開發進行討論并且結合實際業務,最后還是打算選型第三方的全文檢索工具,在這里選擇了全文檢索工具sphinx。
? ? sphinx的安裝和測試:
sphinx的安裝
? 2. 配置文件
#?Minimal?Sphinx?configuration?sample?(clean,?simple,?functional)
#
source?sbtest_sbtest1
{
type =?mysql
sql_host =?localhost?
sql_user =?root
sql_pass =?redhat
sql_db =?sbtest
sql_port =?3306 #?optional,?default?is?3306
sql_query =?\
SELECT?userid?as??id,?\
????????????????k,?\
????????????????c,?\
????????????????pad?\
FROM?sbtest1?\
????????????????where?c?like?'%03679578678%'?\
sql_attr_uint??? =?k??#數據類型int或者timestamp
????????sql_field_string?????????=?c??#數據類型text或者string
????????sql_attr_string?????????=?pad?#數據類型string
}
index?sbtest1
{
source =?sbtest_sbtest1
path =?/var/lib/sphinx/sbtest1?#索引文件的路徑
}
index?testrt
{
type =?rt
rt_mem_limit =?128M
path =?/var/lib/sphinx/testrt
rt_field =?title
rt_field =?content
rt_attr_uint =?gid
}
indexer
{
mem_limit =?128M
}
searchd
{
listen =?9312?
listen =?9306:mysql41????#對外提供服務的端口,類似與mysql中的3306端口
log =?/var/log/sphinx/searchd.log
query_log =?/var/log/sphinx/query.log
read_timeout =?5
max_children =?30
pid_file =?/var/run/sphinx/searchd.pid
seamless_rotate =?1
preopen_indexes =?1
unlink_old =?1
workers =?threads?#?for?RT?to?work
binlog_path =?/var/lib/sphinx/
}
? ? 注意:在配置文件中需要注意的是:
? ? a. 在sql_query參數后面是sql語句,但是這個sql語句的第一個字段必須是id字段,在后面的參數類型定義的時候不需要定義,否則的話sphinx是不能識別這個屬性的。
? ? b. 在屬性的定義的時候必須包含一個sql_filed_string類型的字段,否則sphinx認為你沒有必須使用全文檢索這個功能。
??3.創建索引
indexer?sbtest1? 4.啟動searchd 服務
? 5.查看searchd服務狀態
Sphinx?2.3.2-id64-beta?(4409612)
Copyright?(c)?2001-2016,?Andrew?Aksyonoff
Copyright?(c)?2008-2016,?Sphinx?Technologies?Inc?(http://sphinxsearch.com)
using?config?file?'/etc/sphinx/sphinx.conf'...
searchd?status
--------------
uptime:?80403
connections:?11
maxed_out:?0
command_search:?7
command_excerpt:?0
command_update:?0
command_delete:?0
command_keywords:?0
command_persist:?0
command_status:?1
command_flushattrs:?0
agent_connect:?0
agent_retry:?0
queries:?7
dist_queries:?0
query_wall:?0.010
query_cpu:?OFF
dist_wall:?0.000
dist_local:?0.000
dist_wait:?0.000
query_reads:?OFF
query_readkb:?OFF
query_readtime:?OFF
avg_query_wall:?0.001
avg_query_cpu:?OFF
avg_dist_wall:?0.000
avg_dist_local:?0.000
avg_dist_wait:?0.000
avg_query_reads:?OFF
avg_query_readkb:?OFF
avg_query_readtime:?OFF
qcache_max_bytes:?16777216
qcache_thresh_msec:?3000
qcache_ttl_sec:?60
qcache_cached_queries:?0
qcache_used_bytes:?0
qcache_hits:?0
? 6.測試使用:mysql 客戶端
mysql:?[Warning]?Using?a?password?on?the?command?line?interface?can?be?insecure.
+------+----------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
|?id???|?k????????|?c???????????????????????????????????????????????????????????????????????????????????????????????????????????????????????|?pad?????????????????????????????????????????????????????????|
+------+----------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
|????1?|?15324329?|?64733237507-56788752464-03679578678-53343296505-31167207241-10603050901-64148678956-82738243332-73912678420-24566188603?|?78326593386-76411244360-77646817949-14319822046-41963083322?|
+------+----------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
? 7.python測試使用
#!/usr/bin/env?python#coding:utf-8
import?pymysql
con?=?pymysql.connect(host='127.0.0.1',?port=9306,?user="",?passwd="",?db="")
with?con.cursor(pymysql.cursors.DictCursor)?as?cur:
????cur.execute("select?*?from?sbtest1?where?match('%03679578678%')")
????print(cur.fetchall())
????
[root@TiDB-node2?~]#?python?sphinx.py
[{u'c':?'64733237507-56788752464-03679578678-53343296505-31167207241-10603050901-64148678956-82738243332-73912678420-24566188603',?u'k':?15324329,?u'pad':?'78326593386-76411244360-77646817949-14319822046-41963083322',?u'id':?1}]
? 8.表結構:
root@mysqldb?18:06:??[sbtest]>?show?create?table?sbtest1?\G***************************?1.?row?***************************
???????Table:?sbtest1
Create?Table:?CREATE?TABLE?`sbtest1`?(
??`id`?int(11)?NOT?NULL?AUTO_INCREMENT,
??`k`?int(11)?NOT?NULL?DEFAULT?'0',
??`c`?char(120)?NOT?NULL?DEFAULT?'',
??`pad`?char(60)?NOT?NULL?DEFAULT?'',
??PRIMARY?KEY?(`id`),
??KEY?`k_1`?(`k`)
)?ENGINE=InnoDB?AUTO_INCREMENT=25000000?DEFAULT?CHARSET=utf8mb4
? ? 測試環境中表sbtest1的數據有2500萬行,測試sql是select * from sbtest1 where c like '%03679578678%'。在MySQL執行這個sql的時候由于使用不了索引只能進行全表掃描,花費了將近4分鐘,但是在sphinx創建索引的時候也花費了70多秒的樣子。在索引創建成功之后基本上面執行python腳本或者使用mysql客戶端獲取得到的花費基本是在毫秒級別的。性能提升了很多,并且sphinx和MySQL的在創建索引成功之后之間就沒有鏈接了。并不會影響MySQL的操作,sphinx可以獨立部署到一臺服務器上面。
? 9.更新索引文件:增量更新索引文件
indexer?--rotate?test1
轉載于:https://blog.51cto.com/11819159/2052167
總結
以上是生活随笔為你收集整理的【Sphinx】MySQL+Sphinx 全文检索的使用和测试的全部內容,希望文章能夠幫你解決所遇到的問題。
- 上一篇: 关于http中Transfer-Enco
- 下一篇: 《12个球问题》分析