PostgreSQL GIN multi-key search 优化
標(biāo)簽
PostgreSQL , gin , in , or , multi key , right link scan , skip scan
背景
PostgreSQL中,有一種GIN索引,被廣泛應(yīng)用于多值類型,例如數(shù)組,分詞,同時(shí)也被應(yīng)用于模糊查詢等領(lǐng)域。
gin索引,將列(比如數(shù)組,全文檢索類型)中的值拿出來,再存儲(chǔ)到樹形結(jié)構(gòu)中(類似B-TREE,鍵值+heap行號(hào)s),對于低頻值,會(huì)作為posting list直接存在樹的gin的葉子節(jié)點(diǎn)中,而對于高頻值,行號(hào)s會(huì)存儲(chǔ)在另外樹結(jié)構(gòu)(posting tree)中,gin的葉子節(jié)點(diǎn)中存儲(chǔ)的是指向posting tree的pointer。
GIN本質(zhì)上是elemet為key的樹結(jié)構(gòu),而value則為"posting tree pointer"或者"posting list"。
Internally, a GIN index contains a B-tree index constructed over keys, where each key is an element of one or more indexed items (a member of an array, for example) and where each tuple in a leaf page contains either a pointer to a B-tree of heap pointers (a “posting tree”), / or a simple list of heap pointers (a “posting list”) when the list is small enough to fit into a single index tuple along with the key value.關(guān)于GIN的一些介紹,可參考
《從難纏的模糊查詢聊開 - PostgreSQL獨(dú)門絕招之一 GIN , GiST , SP-GiST , RUM 索引原理與技術(shù)背景》
posting list\tree
gin 索引的葉子節(jié)點(diǎn)中存儲(chǔ)的為posting list(heap page number, itempointers),或者posting tree(heap pointers構(gòu)建的樹)。
那么什么時(shí)候使用list什么時(shí)候使用tree呢?
因?yàn)閜osting list是在GIN的葉子節(jié)點(diǎn)里面直接存儲(chǔ)的,所以指當(dāng)heap pointers較少,小于TOAST_INDEX_TARGET時(shí)(參考自PostgreSQL數(shù)據(jù)庫內(nèi)核分析,基于8.4的版本編寫),使用posting list.
否則使用posting tree。
gin結(jié)構(gòu)
可以使用pageinspect觀察gin索引的內(nèi)容。
https://www.postgresql.org/docs/devel/static/pageinspect.html
多值搜索例子
多值查詢,例如 where column @> aa and column @> bb and column @> cc and ....
多值查詢是比較常見的需求,例如有一個(gè)表存儲(chǔ)的是店家售賣的商品ID,每個(gè)店家一行記錄,其中列A為數(shù)組類型,數(shù)組中包含了這家店鋪售賣的商品ID。
找出有哪些店鋪在售賣熱水器(ID=1)、筆記本(ID=2)以及臺(tái)式機(jī)(ID=3)。可以這樣把你的需求告訴數(shù)據(jù)庫 where column @> array[1,2,3] 或者 column @> 1 and column @> 2 and column @> 3
這種查詢可以使用GIN索引,由于本質(zhì)上GIN還是個(gè)樹結(jié)構(gòu),所以掃描方法和B-Tree實(shí)際上是相差不大的,B-Tree類似的優(yōu)化手段同樣適用。
多值搜索優(yōu)化
2014-01-29, 30 提交的幾個(gè)patch針對多值搜索場景進(jìn)行了優(yōu)化
1. where column @> aa and column @> bb and column @> cc
gin 索引掃描方法是bitmap scan,也就是對gin中每一個(gè)命中KEY的posting list/tree中存儲(chǔ)的CTID(heap行號(hào))排序后,再開始從HEAP掃描結(jié)果。
當(dāng)找到了一條滿足 "某一條件(如column @> aa)" 的記錄后,首先對該posting list/tree里面存儲(chǔ)的CTIDs進(jìn)行排序,這個(gè)時(shí)候就得到了一個(gè)有序的ctid列表LIST-A,由于INDEX沒有版本信息,所以要從HEAP搜索對應(yīng)的記錄,判斷可見性。
當(dāng)可見時(shí),這條記錄的CTID-A會(huì)被用于掃描優(yōu)化,也就是這個(gè)patch的優(yōu)化點(diǎn)。
另一個(gè)條件(如column @> bb),也會(huì)對該posting list/tree里面存儲(chǔ)的CTIDs進(jìn)行排序,這個(gè)時(shí)候也會(huì)得到一個(gè)有序的ctid列表LIST-B。
優(yōu)化點(diǎn)在于,當(dāng)LIST-B的ctid < CTID-A時(shí),這些LIST-B的ctid會(huì)直接跳過,不需要到HEAP進(jìn)行CHECK可見性,從而減少IO操作。
為什么可以跳過呢?
因?yàn)間in使用了bitmap scan,所以一定是ctid的有序掃描。
那么從邏輯推理角度來看,比如A條件,第一個(gè)滿足條件的行號(hào)為1000,B條件的posting list/tree為(1,2,3,1000,2000),那么1,2,3則可以直接跳過,因?yàn)樗贏條件中肯定不存在。
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=e20c70cb0fa74d5bffa080e21a99b44bf0768667
Allow skipping some items in a multi-key GIN search. In a multi-key search, ie. something like "col @> 'foo' AND col @> 'bar'", as soon as we find the next item that matches the first criteria, we don't need to check the second criteria for TIDs smaller the first match. That saves a lot of effort, especially if one of the terms is rare, while the second occurs very frequently. Based on ideas from Alexander Korotkov's fast scan patch.當(dāng)某個(gè)條件比較罕見,而另一個(gè)條件很常見時(shí),有立竿見影的效果。(也就是說一個(gè)有很多行記錄滿足條件,另一個(gè)則只有少量記錄滿足條件)
例子(posting list/tree最小的先掃描,所以直接跳過若干ctid的掃描)
2. 依舊是multi-key的優(yōu)化,優(yōu)化點(diǎn)和1類似,對于所有tid都更小的posting list segments,連decoding都不做。
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=25b1dafab63f465a65c63b26834dc18857f0fa0c
Further optimize multi-key GIN searches. If we're skipping past a certain TID, avoid decoding posting list segments that only contain smaller TIDs. Extracted from Alexander Korotkov's fast scan patch, heavily modified. + GinPostingList *seg = GinDataLeafPageGetPostingList(page); Size len = GinDataLeafPageGetPostingListSize(page); + Pointer endptr = ((Pointer) seg) + len; + GinPostingList *next; + + /* Skip to the segment containing advancePast+1 */ + if (ItemPointerIsValid(&advancePast)) + { + next = GinNextPostingListSegment(seg); + while ((Pointer) next < endptr && + ginCompareItemPointers(&next->first, &advancePast) <= 0) + { + seg = next; + next = GinNextPostingListSegment(seg); + } + len = endptr - (Pointer) seg; + } - result = ginPostingListDecodeAllSegments(ptr, len, nitems); + if (len > 0) + result = ginPostingListDecodeAllSegments(seg, len, nitems); + else + { + result = NULL; + *nitems = 0; + }3. 跳躍掃描優(yōu)化,指posting tree的掃描優(yōu)化,當(dāng)skip的element已經(jīng)不在當(dāng)前posting tree的當(dāng)前page時(shí),返回posting tree的root開始掃描。
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=626a120656a75bf4fe64b1d0d83c23cb38d3771a
Further optimize GIN multi-key searches. When skipping over some items in a posting tree, re-find the new location by descending the tree from root, rather than walking the right links. This can save a lot of I/O. Heavily modified from Alexander Korotkov's fast scan patch. + bool stepright; + + if (!BufferIsValid(entry->buffer)) + { + entry->isFinished = true; + return; + } + + /* + * We have two strategies for finding the correct page: step right from + * the current page, or descend the tree again from the root. If + * advancePast equals the current item, the next matching item should be + * on the next page, so we step right. Otherwise, descend from root. + */ + if (ginCompareItemPointers(&entry->curItem, &advancePast) == 0) + { + stepright = true; + LockBuffer(entry->buffer, GIN_SHARE); + } + else + { + GinBtreeStack *stack; + + ReleaseBuffer(entry->buffer); + + /* + * Set the search key, and find the correct leaf page. + */ + if (ItemPointerIsLossyPage(&advancePast)) + { + ItemPointerSet(&entry->btree.itemptr, + GinItemPointerGetBlockNumber(&advancePast) + 1, + FirstOffsetNumber); + } + else + { + entry->btree.itemptr = advancePast; + entry->btree.itemptr.ip_posid++; + } + entry->btree.fullScan = false; + stack = ginFindLeafPage(&entry->btree, true); + + /* we don't need the stack, just the buffer. */ + entry->buffer = stack->buffer; + IncrBufferRefCount(entry->buffer); + freeGinBtreeStack(stack); + stepright = false; + } + + elog(DEBUG2, "entryLoadMoreItems, %u/%u, skip: %d", + GinItemPointerGetBlockNumber(&advancePast), + GinItemPointerGetOffsetNumber(&advancePast), + !stepright);跳躍掃描優(yōu)化類似于遞歸查詢的優(yōu)化
參考
《時(shí)序數(shù)據(jù)合并場景加速分析和實(shí)現(xiàn) - 復(fù)合索引,窗口分組查詢加速,變態(tài)遞歸加速》
《distinct xx和count(distinct xx)的變態(tài)遞歸優(yōu)化方法 - 索引收斂(skip scan)掃描》
《用PostgreSQL找回618秒逝去的青春 - 遞歸收斂優(yōu)化》
posting list 壓縮優(yōu)化
posting list的壓縮優(yōu)化也是9.4對GIN的優(yōu)化之一。
fastupdate, pending list 優(yōu)化
由于多值類型的變更,插入,可能影響GIN索引的若干個(gè)KEY,所以IO巨大,為了減少這種IO,提高數(shù)據(jù)的寫入\變更速度,提出了pending list的結(jié)構(gòu),類似緩沖區(qū),這部分?jǐn)?shù)據(jù)非樹結(jié)構(gòu),可以有效合并IO,使用速度提升非常明顯。
但是要注意pending list的存在,使得查詢效率有一定的下降,特別是pending list中有大量數(shù)據(jù)時(shí),使用vacuum可以手動(dòng)將pending list合并到gin tree中。
或者等pending list寫滿后觸發(fā)合并的動(dòng)作,或者等待autovcauum來合并。
https://www.postgresql.org/docs/9.6/static/gin-tips.html
其他
btree_gin
https://www.postgresql.org/docs/9.6/static/btree-gin.html
btree_gin為普通類型的GIN索引接口。
int2, int4, int8, float4, float8, timestamp with time zone, timestamp without time zone, time with time zone, time without time zone, date, interval, oid, money, "char", varchar, text, bytea, bit, varbit, macaddr, inet, and cidr它主要是GIN的開發(fā)例子,或者復(fù)合索引(如int, tsvector的復(fù)合查詢,可以建立GIN的單一索引)
Also, for queries that test both a GIN-indexable column and a B-tree-indexable column, it might be more efficient to create a multicolumn GIN index that uses one of these operator classes than to create two separate indexes that would have to be combined via bitmap ANDing.由于這些標(biāo)量類型默認(rèn)只有B-Tree和hash索引掃描方法,當(dāng)查詢需求包含數(shù)組列,同時(shí)還包含這些標(biāo)量數(shù)據(jù)列的查詢時(shí)。
1. 如果有兩個(gè)索引,那么會(huì)對兩個(gè)索引的CTID進(jìn)行合并
bitmap anding
例子
create table t1(id int , c1 int[]);create index idx1 on t1 using btree (id); create index idx2 on t1 using gin (c1);select ? from t1 where id=? and c1 @> ....;2. 而如果是一個(gè)GIN復(fù)合索引(標(biāo)量+多值類型),則不需要bitmap anding操作。
例子 , 使用gin復(fù)合索引
create extension btree_gin; create index idx3 on t1 using gin (id, c1); select ? from t1 where id=? and c1 @> ....;參考
GIN in 9.4 and further
https://www.postgresql.org/docs/current/static/indexes-bitmap-scans.html
https://www.postgresql.org/docs/current/static/indexes-multicolumn.html
A multicolumn B-tree index can be used with query conditions that involve any subset of the index's columns, but the index is most efficient when there are constraints on the leading (leftmost) columns. The exact rule is that equality constraints on leading columns, plus any inequality constraints on the first column that does not have an equality constraint, will be used to limit the portion of the index that is scanned. Constraints on columns to the right of these columns are checked in the index, so they save visits to the table proper, but they do not reduce the portion of the index that has to be scanned. For example, given an index on (a, b, c) and a query condition WHERE a = 5 AND b >= 42 AND c < 77, the index would have to be scanned from the first entry with a = 5 and b = 42 up through the last entry with a = 5. Index entries with c >= 77 would be skipped, but they'd still have to be scanned through. This index could in principle be used for queries that have constraints on b and/or c with no constraint on a — but the entire index would have to be scanned, so in most cases the planner would prefer a sequential table scan over using the index.A multicolumn GiST index can be used with query conditions that involve any subset of the index's columns. Conditions on additional columns restrict the entries returned by the index, but the condition on the first column is the most important one for determining how much of the index needs to be scanned. A GiST index will be relatively ineffective if its first column has only a few distinct values, even if there are many distinct values in additional columns.A multicolumn GIN index can be used with query conditions that involve any subset of the index's columns. Unlike B-tree or GiST, index search effectiveness is the same regardless of which index column(s) the query conditions use.https://www.postgresql.org/docs/devel/static/gin-implementation.html
Multicolumn GIN indexes are implemented by building a single B-tree over composite values (column number, key value). The key values for different columns can be of different types.src/backend/access/gin/README
* In a single-column index, a key tuple just contains the key datum, but in a multi-column index, a key tuple contains the pair (column number, key datum) where the column number is stored as an int2. This is needed to support different key data types in different columns. This much of the tuple is built by index_form_tuple according to the usual rules. The column number (if present) can never be null, but the key datum can be, in which case a null bitmap is present as usual. (As usual for index tuples, the size of the null bitmap is fixed at INDEX_MAX_KEYS.)backend/access/gin/ginentrypage.c: itup = index_form_tuple(ginstate->tupdesc[attnum - 1], datums, isnull);
backend/access/nbtree/nbtree.c: itup = index_form_tuple(RelationGetDescr(rel), values, isnull);
backend/access/common/indextuple.c:index_form_tuple(TupleDesc tupleDescriptor,
總結(jié)
以上是生活随笔為你收集整理的PostgreSQL GIN multi-key search 优化的全部內(nèi)容,希望文章能夠幫你解決所遇到的問題。
- 上一篇: 字符编码(一)
- 下一篇: 3720: Gty的妹子树