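RocksDB: GetApproximateSizes returns 0 after writing data?
In a project I needed to get the data size of a key range from the storage engine for metrics reporting. During testing, after writing some data, GetApproximateSizes returned 0 for the range of keys I had just written. Had I found a bug? (Delighted.)
The data I wrote was smaller than one SST data block (4 KB by default), so perhaps GetApproximateSizes reports 0 for anything smaller than a data block? For a rigorous engine, such an obvious problem would be unacceptable.
Problem code: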
#include <cstdio>
#include <cstdlib>
#include <iostream>
#include <string>

#include <rocksdb/db.h>
#include <rocksdb/slice.h>

#define VALUE_SIZE 100

using namespace std;
using namespace rocksdb;

// Print the failed operation and exit if the status is not OK.
void check_status(Status s, std::string op) {
  if (!s.ok()) {
    cout << " Execute " << op << " failed " << s.ToString() << endl;
    exit(1);
  }
}

// Build a fixed-width key such as "key000001".
static std::string Key(int i) {
  char buf[100];
  snprintf(buf, sizeof(buf), "key%06d", i);
  return std::string(buf);
}

int main() {
  rocksdb::DB* db;
  rocksdb::Options options;
  options.create_if_missing = true;
  options.compression = kNoCompression;

  // Open the DB
  check_status(rocksdb::DB::Open(options, "./db", &db), "Open DB");

  // Write 10 key-value pairs; each value is 100 bytes
  for (int i = 0; i < 10; i++) {
    check_status(db->Put(WriteOptions(), Key(i),
                         Slice(string(VALUE_SIZE, 'a' + (i % 26)))),
                 "Put DB");
  }

  // Take the key range [Key(1), Key(3)) and ask for the size of the
  // key-value pairs that fall inside it
  uint64_t size;
  string start = Key(1);
  string end = Key(3);
  Range r(start, end);
  db->GetApproximateSizes(&r, 1, &size);
  cout << "Approximate size is " << size << endl;

  delete db;
  return 0;
}
The result of running it:
Approximate size is 0
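I was quite pleased: an obvious problem, so I would analyze the cause and send a PR to the community. After a quick look at the source code the excitement was gone. I had been naive.
The interface for getting the size of a key range takes an extra parameter, include_flags: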
virtual void GetApproximateSizes(const Range* ranges, int n, uint64_t* sizes,
                                 uint8_t include_flags = INCLUDE_FILES) {
  GetApproximateSizes(DefaultColumnFamily(), ranges, n, sizes, include_flags);
}
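This extra parameter specifies which RocksDB component the size of the key range is taken from: the memtables, the SST files, or both.
I called the interface with the default flag and wrote only a small amount of data, clearly not enough to trigger a flush, so everything was still in the memtable. With the default of SST files only, nothing in that range can be found, hence the 0.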
We can look further at the lower-level implementation:
Status DBImpl::GetApproximateSizes(const SizeApproximationOptions& options,
                                   ColumnFamilyHandle* column_family,
                                   const Range* range, int n, uint64_t* sizes) {
  ......
  Version* v;
  auto cfh = static_cast_with_check<ColumnFamilyHandleImpl>(column_family);
  auto cfd = cfh->cfd();
  // Take a reference on the current SuperVersion of this column family
  SuperVersion* sv = GetAndRefSuperVersion(cfd);
  v = sv->current;

  // Several ranges may be passed in at once; iterate over them
  for (int i = 0; i < n; i++) {
    Slice start = range[i].start;
    Slice limit = range[i].limit;

    // Add timestamp if needed
    std::string start_with_ts, limit_with_ts;
    if (ts_sz > 0) {
      // Maximum timestamp means including all key with any timestamp
      AppendKeyWithMaxTimestamp(&start_with_ts, start, ts_sz);
      // Append a maximum timestamp as the range limit is exclusive:
      // [start, limit)
      AppendKeyWithMaxTimestamp(&limit_with_ts, limit, ts_sz);
      start = start_with_ts;
      limit = limit_with_ts;
    }
    // Convert user_key into a corresponding internal key.
    InternalKey k1(start, kMaxSequenceNumber, kValueTypeForSeek);
    InternalKey k2(limit, kMaxSequenceNumber, kValueTypeForSeek);
    sizes[i] = 0;
    // Size of the key range as stored in SST files
    if (options.include_files) {
      sizes[i] += versions_->ApproximateSize(
          options, v, k1.Encode(), k2.Encode(), /*start_level=*/0,
          /*end_level=*/-1, TableReaderCaller::kUserApproximateSize);
    }
    // Size of the key range as stored in memtables, both the active
    // memtable (mem) and the immutable ones (imm)
    if (options.include_memtabtles) {
      sizes[i] += sv->mem->ApproximateStats(k1.Encode(), k2.Encode()).size;
      sizes[i] += sv->imm->ApproximateStats(k1.Encode(), k2.Encode()).size;
    }
  }

  // Release the reference on the SuperVersion
  ReturnAndCleanupSuperVersion(cfd, sv);
  return Status::OK();
}
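For SST files in the block-based table format, the lookup creates an iterator over the block-based index and uses it to fetch the offsets of the data blocks that the start and end keys belong to.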
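For the memtable, the skiplist has to be walked, descending level by level, to find the number of keys that fall within the start-end range; the size is then computed from that count.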
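After the analysis above, it is clear that to get a reasonably accurate key-value size for a range through GetApproximateSizes, both the memtable and the SST sizes have to be counted.
P.S. The same data takes up a different amount of space in a memtable than in an SST file, because an SST file holds not only the key-value data in its data blocks but also the index block, the metaindex block, and the footer. So the size reported for the same data will differ somewhat between the memtable and the SST file.
The correct way to use the GetApproximateSizes() interface is as follows: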
#include <cstdio>
#include <cstdlib>
#include <iostream>
#include <string>

#include <rocksdb/db.h>
#include <rocksdb/slice.h>

#define VALUE_SIZE 100

using namespace std;
using namespace rocksdb;

// Print the failed operation and exit if the status is not OK.
void check_status(Status s, std::string op) {
  if (!s.ok()) {
    cout << " Execute " << op << " failed " << s.ToString() << endl;
    exit(1);
  }
}

// Build a fixed-width key such as "key000001".
static std::string Key(int i) {
  char buf[100];
  snprintf(buf, sizeof(buf), "key%06d", i);
  return std::string(buf);
}

int main() {
  rocksdb::DB* db;
  rocksdb::Options options;
  options.create_if_missing = true;
  options.compression = kNoCompression;

  // Start from an empty DB so earlier runs do not affect the sizes
  check_status(rocksdb::DestroyDB("./db", options), "DestroyDB");
  check_status(rocksdb::DB::Open(options, "./db", &db), "Open DB");

  // Write 3 key-value pairs; each value is 100 bytes
  for (int i = 0; i < 3; i++) {
    check_status(db->Put(WriteOptions(), Key(i),
                         Slice(string(VALUE_SIZE, 'a' + (i % 26)))),
                 "Put DB");
  }

  uint64_t size;
  string start = Key(1);
  string end = Key(3);
  Range r(start, end);

  // Default flag: SST files only
  db->GetApproximateSizes(&r, 1, &size);
  cout << "Approximate size is " << size << endl;

  // Count both SST files and memtables
  uint8_t include_both = DB::SizeApproximationFlags::INCLUDE_FILES |
                         DB::SizeApproximationFlags::INCLUDE_MEMTABLES;
  db->GetApproximateSizes(&r, 1, &size, include_both);
  cout << "After set memtable flag, Approximate size is " << size << endl;

  // After a flush the data is in an SST file, so the default flag sees it
  check_status(db->Flush(FlushOptions()), "Flush DB");
  db->GetApproximateSizes(&r, 1, &size);
  cout << "After flush, Approximate size is " << size << endl;

  delete db;
  return 0;
}
The output is:
Approximate size is 0
After set memtable flag, Approximate size is 238
After flush, Approximate size is 1151
OK, no bug to report after all...