PMDK -- Introduction to libpmemlog

Published: 2023/11/27

Contents

- 1. libpmemlog background
- 2. Using libpmemlog
  - 2.1 Basic interfaces
  - 2.2 Using the interfaces
- 3. libpmemlog performance
  - 3.1 write system call performance
  - 3.2 libpmemlog performance

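1. libpmemlog background

This article introduces libpmemlog, the persistent log library in PMDK, the development kit for Intel Optane persistent memory (PMEM).

A conventional system saves its log as a LOG file on disk, and the process on PMEM is similar. However, when you build a high-performance application on PMEM, writing the log through the traditional file system interfaces cannot make full use of the write speed PMEM offers, and this indirectly lowers system performance. A performance comparison of the two approaches appears later in this article.

For systems developed on PMEM, it is therefore advisable to persist log data with the PMEM log library as well.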

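2. Using libpmemlog

The community provides usage instructions for both Linux and Windows, and there are some basic differences between the two.

2.1 Basic interfaces

pmemlog_create creates a pmempool on PMEM to store data.

A pmempool is the form in which PMDK manages PMEM space: you first carve a region of storage out of PMEM as a pool, then write and delete data through file-like interfaces.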
```
PMEMlogpool *pmemlog_create(const char *path, size_t poolsize, mode_t mode);
```
The arguments specify the pool's storage path, the pool size, and the access permissions.

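In the implementation shown here, the interface is built on libpmemobj (the code appears to come from the PMDK example that reimplements the pmemlog API on top of libpmemobj), so creation is delegated to pmemobj_create: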
```
PMEMlogpool *
pmemlog_create(const char *path, size_t poolsize, mode_t mode)
{
    return (PMEMlogpool *)pmemobj_create(path, LAYOUT_NAME,
            poolsize, mode);
}
```
pmemlog_open opens an existing pmempool and, on success, returns a handle to the pool.
```
PMEMlogpool *pmemlog_open(const char *path);
```
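Only the path of the pool file needs to be given.

Underneath, the pool is again opened through the corresponding pmemobj interface, which is what provides the consistency and atomicity guarantees for the pool.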
```
PMEMlogpool *
pmemlog_open(const char *path)
{
    return (PMEMlogpool *)pmemobj_open(path, LAYOUT_NAME);
}
```

pmemlog_append writes data to an open pmempool in append mode, similar to write.
```
int pmemlog_append(PMEMlogpool *plp, const void *buf, size_t count);
```
The arguments are the pool object plp, the data to write buf, and the number of bytes to write count.

In this libpmemobj-based implementation, the write is wrapped in a pmemobj transaction to make it atomic:
```
int
pmemlog_append(PMEMlogpool *plp, const void *buf, size_t count)
{
    PMEMobjpool *pop = (PMEMobjpool *)plp;
    PMEMoid baseoid = pmemobj_root(pop, sizeof(struct base));
    struct base *bp = pmemobj_direct(baseoid);

    /* set the return point */
    jmp_buf env;
    if (setjmp(env)) {
        /* end the transaction */
        (void) pmemobj_tx_end();
        return 1;
    }

    /* begin a transaction, also acquiring the write lock for the log */
    if (pmemobj_tx_begin(pop, env, TX_PARAM_RWLOCK, &bp->rwlock,
            TX_PARAM_NONE))
        return -1;

    /* allocate the new node to be inserted */
    PMEMoid log = pmemobj_tx_alloc(count + sizeof(struct log_hdr),
            LOG_TYPE);

    struct log *logp = pmemobj_direct(log);
    logp->hdr.size = count;
    memcpy(logp->data, buf, count);
    logp->hdr.next = OID_NULL;

    /* add the modified root object to the undo log */
    pmemobj_tx_add_range(baseoid, 0, sizeof(struct base));
    if (bp->tail.off == 0) {
        /* update head */
        bp->head = log;
    } else {
        /* add the modified tail entry to the undo log */
        pmemobj_tx_add_range(bp->tail, 0, sizeof(struct log));
        ((struct log *)pmemobj_direct(bp->tail))->hdr.next = log;
    }

    bp->tail = log; /* update tail */
    bp->bytes_written += count;

    pmemobj_tx_commit();
    (void) pmemobj_tx_end();
    return 0;
}
```
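
The head/tail bookkeeping in the append path is easier to follow without the transaction machinery. The following is a minimal in-memory sketch, not PMDK code: toy_log, toy_entry and toy_append are hypothetical names, and plain malloc stands in for pmemobj_tx_alloc, so nothing here is persistent or crash-safe.

```c
#include <stdlib.h>
#include <string.h>

/* One log entry: a header (next pointer and size) followed by the data. */
struct toy_entry {
    struct toy_entry *next;
    size_t size;
    char data[];            /* flexible array member holding the payload */
};

/* Root object: mirrors the head, tail and bytes_written fields of struct base. */
struct toy_log {
    struct toy_entry *head;
    struct toy_entry *tail;
    size_t bytes_written;
};

/* Append count bytes from buf as a new tail entry; returns 0, or -1 on failure. */
int
toy_append(struct toy_log *lp, const void *buf, size_t count)
{
    struct toy_entry *e = malloc(sizeof(*e) + count);
    if (e == NULL)
        return -1;
    e->next = NULL;
    e->size = count;
    memcpy(e->data, buf, count);

    if (lp->tail == NULL)
        lp->head = e;           /* first entry: update head */
    else
        lp->tail->next = e;     /* link the entry after the current tail */
    lp->tail = e;               /* update tail */
    lp->bytes_written += count;
    return 0;
}
```

What the transactional version adds on top of this is the undo log: the root object and the old tail are registered with pmemobj_tx_add_range before they are modified, so an interrupted append can be rolled back.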

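pmemlog_walk iterates over the data written to the pmempool.
```
void pmemlog_walk(PMEMlogpool *plp, size_t chunksize,
    int (*process_chunk)(const void *buf, size_t len, void *arg),
    void *arg);
```
Here plp is the pmemlogpool object, and the process_chunk callback is invoked to visit all data in plp from beginning to end. The access granularity is chunksize, although the libpmemobj-based implementation below never reads this variable; each call receives one appended entry instead.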
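
For comparison, the libpmemlog manual describes chunksize as the size of the blocks handed to the callback, with 0 meaning that the whole log is passed in a single call; the callback returns non-zero to continue or 0 to stop the walk. A minimal sketch of that contract over an ordinary memory buffer follows. walk_buffer and count_calls are hypothetical helpers written for this illustration and are not part of PMDK.

```c
#include <stddef.h>

typedef int (*chunk_fn)(const void *buf, size_t len, void *arg);

/*
 * Walk len bytes of data in chunksize pieces. A chunksize of 0 hands the
 * whole buffer to the callback once. The walk stops early when the
 * callback returns 0.
 */
void
walk_buffer(const char *data, size_t len, size_t chunksize,
    chunk_fn process_chunk, void *arg)
{
    if (chunksize == 0) {
        process_chunk(data, len, arg);
        return;
    }
    size_t off = 0;
    while (off < len) {
        /* the last chunk may be shorter than chunksize */
        size_t n = len - off < chunksize ? len - off : chunksize;
        if (process_chunk(data + off, n, arg) == 0)
            break;
        off += n;
    }
}

/* Example callback: count invocations and ask the walk to continue. */
int
count_calls(const void *buf, size_t len, void *arg)
{
    (void)buf;
    (void)len;
    ++*(int *)arg;
    return 1;
}
```

Walking a 10-byte buffer with a chunksize of 4 invokes the callback three times (4, 4 and 2 bytes); with a chunksize of 0 it is invoked once.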

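Note that pmemlog_walk holds a read lock for the whole walk so that the callback sees a consistent view of the log. process_chunk must therefore not call pmemlog_append, which needs the write lock on the same log; doing so may deadlock.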

```
void
pmemlog_walk(PMEMlogpool *plp, size_t chunksize,
    int (*process_chunk)(const void *buf, size_t len, void *arg), void *arg)
{
    PMEMobjpool *pop = (PMEMobjpool *)plp;

    /*
     * Get the root object (it is created on first use and simply
     * returned afterwards); it is the entry point for reading the pool.
     */
    struct base *bp = pmemobj_direct(pmemobj_root(pop, sizeof(struct base)));

    /* take the read lock; return if locking fails */
    if (pmemobj_rwlock_rdlock(pop, &bp->rwlock) != 0)
        return;

    /*
     * process all chunks: start at the head of the list and visit every
     * entry in order, since append adds entries at the tail
     */
    struct log *next = pmemobj_direct(bp->head);
    while (next != NULL) {
        (*process_chunk)(next->data, next->hdr.size, arg);
        next = pmemobj_direct(next->hdr.next);
    }

    /* done reading, release the read lock */
    pmemobj_rwlock_unlock(pop, &bp->rwlock);
}
```
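
One way to respect the locking rule is to have the callback only record what it wants to add, and to append after the walk has returned. The sketch below simulates this with a toy in-memory log, where a walking flag stands in for the read lock and an append during a walk fails instead of blocking. sim_log, sim_append, sim_walk, pending and note_summary are all hypothetical names for this illustration.

```c
#include <stdio.h>
#include <string.h>
#include <stddef.h>

#define SIM_CAP 256

/* Toy log: a byte buffer plus a flag that stands in for the read lock. */
struct sim_log {
    char data[SIM_CAP];
    size_t used;
    int walking;            /* non-zero while sim_walk() is running */
};

/* Append fails during a walk; the real call would block on the lock. */
int
sim_append(struct sim_log *lp, const void *buf, size_t count)
{
    if (lp->walking || count > SIM_CAP - lp->used)
        return -1;
    memcpy(lp->data + lp->used, buf, count);
    lp->used += count;
    return 0;
}

/* Hand the whole log to the callback while the "lock" is held. */
void
sim_walk(struct sim_log *lp,
    int (*process_chunk)(const void *buf, size_t len, void *arg), void *arg)
{
    lp->walking = 1;
    process_chunk(lp->data, lp->used, arg);
    lp->walking = 0;
}

/* Data the callback wants appended once the walk is over. */
struct pending {
    char buf[64];
    size_t len;
};

/* Callback: record a summary instead of calling sim_append() here. */
int
note_summary(const void *buf, size_t len, void *arg)
{
    (void)buf;
    struct pending *p = arg;
    p->len = (size_t)snprintf(p->buf, sizeof(p->buf), "[%zu bytes seen]", len);
    return 0;
}
```

The caller runs sim_walk with note_summary, then passes the pending buffer to sim_append after the walk has returned; the same ordering applies to pmemlog_walk and pmemlog_append.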

pmemlog_rewind discards all data in the pmemlogpool; in this implementation the cleanup is done as a transaction.

pmemlog_close closes the pmempool object.

2.2 Using the interfaces

The following program uses the pmemlog interfaces to write persistent data and then print it.

```
#include <stdio.h>
#include <fcntl.h>
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <libpmemlog.h>

/* size of the pmemlog pool -- 1 GB */
#define POOL_SIZE ((size_t)(1 << 30))

/*
 * printit -- log processing callback for use with pmemlog_walk()
 */
int
printit(const void *buf, size_t len, void *arg)
{
    fwrite(buf, len, 1, stdout);
    return 0;
}

int
main(int argc, char *argv[])
{
    const char path[] = "./pmem";
    PMEMlogpool *plp;
    size_t nbyte;
    const char *str; /* const, because it points at string literals */

    /* create the pmemlog pool or open it if it already exists */
    plp = pmemlog_create(path, POOL_SIZE, 0666);
    if (plp == NULL)
        plp = pmemlog_open(path);
    if (plp == NULL) {
        perror(path);
        exit(1);
    }

    /* how many bytes does the log hold? */
    nbyte = pmemlog_nbyte(plp);
    printf("log holds %zu bytes\n", nbyte);

    /* append to the log... */
    str = "This is the first string appended\n";
    if (pmemlog_append(plp, str, strlen(str)) < 0) {
        perror("pmemlog_append");
        exit(1);
    }
    str = "This is the second string appended\n";
    if (pmemlog_append(plp, str, strlen(str)) < 0) {
        perror("pmemlog_append");
        exit(1);
    }

    /* print the log contents */
    printf("log contains:\n");
    pmemlog_walk(plp, 0, printit, NULL);

    pmemlog_close(plp);
    return 0;
}
```

Compile:
```
g++ libpmemlog_walk.cc -o log_walk -lpmem -lpmemlog
```

The output is as follows:
```
$ ./log_walk
log holds 1073733632 bytes
log contains:
This is the first string appended
This is the second string appended
```
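
The output shows that the log holds slightly less than the 1 GB pool file: 1073741824 - 1073733632 = 8192 bytes are not available for log data in this run. The sketch below applies that arithmetic to estimate how many fixed-size appends fit in a pool. The 8192-byte overhead is taken from the output above and is only an assumption for other PMDK versions; appends_that_fit is a hypothetical helper, and real code should query pmemlog_nbyte instead.

```c
#include <stddef.h>

/* Overhead observed in the run above: pool file size minus pmemlog_nbyte(). */
#define OBSERVED_OVERHEAD ((size_t)8192)

/* Number of write_size-byte appends that fit in a pool file of pool_size bytes. */
size_t
appends_that_fit(size_t pool_size, size_t write_size)
{
    if (write_size == 0 || pool_size <= OBSERVED_OVERHEAD)
        return 0;
    return (pool_size - OBSERVED_OVERHEAD) / write_size;
}
```

Under that assumption, a 64 MB pool holds 15 full 4 MB appends rather than 16.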

The file system mounted on PMEM now contains the pmem pool as a file of type data, and the pmempool tool can show what was written to it:
```
$ ls -l pmem
-rw-rw-rw- 1 server server 1073741824 Jan 29 11:50 pmem
$ file pmem
pmem: data
$ pmempool dump pmem
00000000  54 68 69 73 20 69 73 20  74 68 65 20 66 69 72 73  |This is the firs|
00000010  74 20 73 74 72 69 6e 67  20 61 70 70 65 6e 64 65  |t string appende|
00000020  64 0a 54 68 69 73 20 69  73 20 74 68 65 20 73 65  |d.This is the se|
00000030  63 6f 6e 64 20 73 74 72  69 6e 67 20 61 70 70 65  |cond string appe|
00000040  6e 64 65 64 0a                                    |nded.           |
```

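3. libpmemlog performance

As mentioned at the start, when a storage system is built on PMEM hardware, the choice of logging method affects performance to some degree. How large the gap is between writing log files through the traditional VFS system calls and using PMDK's libpmemlog should be measured before adopting either, so that the result can guide the choice of a logging scheme.

3.1 write system call performance

The test code below uses write to produce 20 files of 64 MB each, in 4 MB writes.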

```
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include <inttypes.h>
#include <sys/time.h>
#include <unistd.h>
#include <sys/types.h>
#include <fcntl.h>

#define POOL_SIZE ((uint64_t)(64 << 20))

const uint64_t file_op = 20;
const uint64_t write_size = (4 * 1024 * 1024);

static uint64_t get_now_micros() {
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (uint64_t)tv.tv_sec * 1000000 + tv.tv_usec;
}

int main(int argc, char *argv[]) {
    char *filename;
    /* take the file name prefix from the command line */
    if (argc == 2) {
        filename = argv[1];
        printf("filename : %s \n", filename);
    } else {
        printf("args is not valid\n");
        return 1;
    }

    uint64_t start_time, end_time, do_time, next_report_;
    char namebuf[100];
    int fd;
    int i, j;
    bool error = false;

    /* build the payload; each write is 4 MB */
    void *buf = malloc(write_size);
    if (buf == NULL) {
        perror("malloc");
        return 1;
    }
    memset(buf, '1', write_size);

    /* start time */
    start_time = get_now_micros();
    next_report_ = 0;

    /* file_op (20) files in total */
    for (i = 0; i < (int)file_op; i++) {
        snprintf(namebuf, sizeof(namebuf), "%s_vfs_%03d.log", filename, i);
        /* O_CREAT requires the mode argument */
        fd = open(namebuf, O_CREAT | O_RDWR | O_APPEND, 0644);
        if (fd == -1) {
            perror(namebuf);
            return 1;
        }

        /* 4 MB per write */
        for (j = 0; j < (int)(POOL_SIZE / write_size); j++) {
            /* write() returns ssize_t; a size_t result can never be < 0 */
            ssize_t x = write(fd, buf, write_size);
            if (x < 0) {
                perror("write");
                error = true;
                break;
            }
        }

        /* make sure the data reaches the medium */
        fsync(fd);
        close(fd);
        if (error) {
            break;
        }

        if ((uint64_t)i >= next_report_) {
            if      (next_report_ < 1000)   next_report_ += 100;
            else if (next_report_ < 5000)   next_report_ += 500;
            else if (next_report_ < 10000)  next_report_ += 1000;
            else if (next_report_ < 50000)  next_report_ += 5000;
            else if (next_report_ < 100000) next_report_ += 10000;
            else if (next_report_ < 500000) next_report_ += 50000;
            else                            next_report_ += 100000;
            fprintf(stderr, "... finished %d ops%30s\r", i, "");
            fflush(stderr);
        }
    }

    /* end time */
    end_time = get_now_micros();
    do_time = end_time - start_time;

    printf("file_size:%" PRIu64 " MB file_op:%" PRIu64 " write_size:%" PRIu64 " K\n",
        POOL_SIZE / 1048576, file_op, write_size / 1024);
    printf("%11.3f micros/op %6.1f MB/s\n",
        (1.0 * do_time) / (file_op * (POOL_SIZE / write_size)),
        (POOL_SIZE * file_op / 1048576.0) / (do_time * 1e-6));

    free(buf);
    return error ? 1 : 0;
}
```

Compile:
```
g++ -std=c++11 vfs_log.cc -o vfs_log
```

The result of a run:
```
$ ./vfs_log test
filename : test
file_size:64 MB file_op:20 write_size:4096 K
8976.303 micros/op  445.6 MB/s
```
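
The benchmark counts every write call as a full 4 MB. write may return fewer bytes than requested, so code that must persist a complete log record usually loops until everything is written. A minimal sketch, with write_all as a hypothetical helper name:

```c
#include <errno.h>
#include <unistd.h>

/* Write exactly count bytes to fd, retrying on short writes and EINTR. */
int
write_all(int fd, const void *buf, size_t count)
{
    const char *p = buf;
    while (count > 0) {
        ssize_t n = write(fd, p, count);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before writing anything: retry */
            return -1;          /* real error, errno is set by write() */
        }
        p += n;                 /* skip the part that was written */
        count -= (size_t)n;
    }
    return 0;
}
```

In the benchmark loop, a call such as write_all(fd, buf, write_size) would replace the bare write call, with a return value of -1 treated as the failure case.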

3.2 libpmemlog performance

The same 20 x 64 MB workload is used here, with the file system calls replaced by the pmemlog interfaces:

```
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include <inttypes.h>
#include <sys/time.h>
#include <libpmemlog.h>

#define POOL_SIZE ((uint64_t)(64 << 20))

const char filename[] = "./pmem";
const uint64_t file_op = 20;
const uint64_t write_size = (4 * 1024 * 1024);

static uint64_t get_now_micros() {
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (uint64_t)tv.tv_sec * 1000000 + tv.tv_usec;
}

int main(int argc, char **argv)
{
    uint64_t start_time, end_time, do_time, next_report_;
    uint64_t total_ops = 0;
    PMEMlogpool *plp;
    char namebuf[100];
    int i, j;
    bool error_flag = false;

    void *buf = malloc(write_size);
    if (buf == NULL) {
        perror("malloc");
        return 1;
    }
    memset(buf, '1', write_size);

    start_time = get_now_micros();
    next_report_ = 0;

    for (i = 0; i < (int)file_op; i++) {
        snprintf(namebuf, sizeof(namebuf), "%sceshi%04d.pool", filename, i);
        plp = pmemlog_create(namebuf, POOL_SIZE, 0666);
        if (plp == NULL)
            plp = pmemlog_open(namebuf);
        if (plp == NULL) {
            perror(namebuf);
            return 1;
        }
        pmemlog_rewind(plp);

        /*
         * The usable capacity is slightly below POOL_SIZE, so the number
         * of appends is derived from pmemlog_nbyte() rather than from
         * POOL_SIZE / write_size.
         */
        int appends = (int)(pmemlog_nbyte(plp) / write_size);
        for (j = 0; j < appends; j++) {
            /* pmemlog_append returns int; an unsigned result can never be < 0 */
            int ret = pmemlog_append(plp, buf, write_size);
            if (ret < 0) {
                perror("pmemlog_append");
                error_flag = true;
                break;
            }
            total_ops++;
        }
        pmemlog_close(plp);
        if (error_flag) {
            break;
        }

        if ((uint64_t)i >= next_report_) {
            if      (next_report_ < 1000)   next_report_ += 100;
            else if (next_report_ < 5000)   next_report_ += 500;
            else if (next_report_ < 10000)  next_report_ += 1000;
            else if (next_report_ < 50000)  next_report_ += 5000;
            else if (next_report_ < 100000) next_report_ += 10000;
            else if (next_report_ < 500000) next_report_ += 50000;
            else                            next_report_ += 100000;
            fprintf(stderr, "... finished %d ops%30s\r", i, "");
            fflush(stderr);
        }
    }

    end_time = get_now_micros();
    do_time = end_time - start_time;
    free(buf);
    if (error_flag || total_ops == 0)
        return 1;

    printf("file_size:%" PRIu64 " MB file_op:%" PRIu64 " write_size:%" PRIu64 " K\n",
        POOL_SIZE / 1048576, file_op, write_size / 1024);
    /* figures are based on the appends that actually succeeded */
    printf("%11.3f micros/op %6.1f MB/s\n",
        (1.0 * do_time) / total_ops,
        (total_ops * write_size / 1048576.0) / (do_time * 1e-6));
    return 0;
}
```

Compile:
```
g++ libpmemlog_test.cc -o t2 -lpmemlog -lpmem -pthread
```

The result of a run:
```
$ ./t2
file_size:64 MB file_op:20 write_size:4096 K
4772.434 micros/op  838.1 MB/s
```

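Compared with VFS write, libpmemlog shows a clear latency advantage in this test: about 4772 versus 8976 microseconds per 4 MB operation, or roughly 838 versus 446 MB/s (write has to go through the VFS layer of the operating system kernel). One caveat when reading these numbers: a pmemlog pool's usable capacity is slightly below its file size (the 1 GB pool above reports 1073733632 bytes), so a 64 MB pool cannot hold sixteen full 4 MB appends. If the reported figure counts 16 appends per pool, the libpmemlog throughput is overstated by up to one sixteenth, which would still leave it well ahead of write.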
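
Both programs derive their figures from the elapsed time in the same way. The sketch below reproduces that arithmetic so the two results can be cross-checked; micros_per_op and mb_per_sec are hypothetical helper names, and 1 MB is taken as 1048576 bytes, as in the test programs.

```c
#include <stdint.h>

/* Average latency of one operation in microseconds. */
double
micros_per_op(uint64_t elapsed_micros, uint64_t ops)
{
    return (double)elapsed_micros / (double)ops;
}

/* Throughput in MB/s (1 MB = 1048576 bytes) for bytes moved in elapsed_micros. */
double
mb_per_sec(uint64_t bytes, uint64_t elapsed_micros)
{
    return ((double)bytes / 1048576.0) / ((double)elapsed_micros * 1e-6);
}
```

With 320 operations (20 files of 16 writes each), the reported 8976.303 and 4772.434 micros/op correspond to about 2.87 s and 1.53 s of elapsed time, a ratio of roughly 1.88 in favor of libpmemlog.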
