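WAL Retention Policy in PostgreSQL
I. Background
In PostgreSQL, WAL retention is commonly configured with the wal_keep_segments parameter. When the primary generates WAL quickly, we usually raise it to a fairly large value so that standbys can keep up with streaming replication reliably.
Sometimes, however, the primary retains far more WAL than this parameter calls for. WAL can even grow so fast that the disk is at risk of filling up.
This is where another mechanism comes in: replication slots. According to the official documentation, replication slots provide an automated way to ensure that the primary does not remove WAL segments until they have been received by all standbys. So wal_keep_segments and replication slots must interact in some way to determine how much WAL is retained.
II. How It Works
1. Setting up the scenario
To explore the mechanism, I set up the following scenario:
wal_keep_segments is set to 16, and a replication slot named standby_repl_slot is created on the primary for the standby to use for streaming replication.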
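With the primary and standby connected normally, stop the standby, then run batch DML on the primary. The WAL retained on the primary keeps growing: 179 files so far, and the count is still increasing.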
[postgres@postgres_primary:pg11.5:6548/opt/postgres/postgresql-11.5/pg11debug/data/pg_wal]$ ll 00000001*| wc -l
179
[postgres@postgres_primary:pg11.5:6548/opt/postgres/postgresql-11.5/pg11debug/data/pg_wal]$ ll 00000001*| head -5
-rw-------. 1 postgres postgres 16777216 Oct 8 19:06 0000000100000000000000F8
-rw-------. 1 postgres postgres 16777216 Oct 8 19:06 0000000100000000000000F9
-rw-------. 1 postgres postgres 16777216 Oct 8 19:06 0000000100000000000000FA
-rw-------. 1 postgres postgres 16777216 Oct 8 19:09 0000000100000000000000FB
-rw-------. 1 postgres postgres 16777216 Oct 8 19:09 0000000100000000000000FC
[postgres@postgres_primary:pg11.5:6548/opt/postgres/postgresql-11.5/pg11debug/data/pg_wal]$
The replication slot information is shown below. The slot is currently inactive:
psql (11.5)
Type "help" for help.
postgres=# select * from pg_replication_slots ;
     slot_name     | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn
-------------------+--------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------
 standby_repl_slot |        | physical  |        |          | f         | f      |            |      |              | 0/F8000140  |
(1 row)
postgres=#
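2. Code walkthrough
WAL retention is enforced by the checkpointer process. During CreateCheckPoint or CreateRestartPoint, it computes the position from which WAL must be kept, then recycles or removes the segments before that position.
Taking CreateCheckPoint as an example:
When a checkpoint is created, KeepLogSeg computes which log segments must be kept, and RemoveOldXlogFiles disposes of everything that no longer needs to be kept.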
void
CreateCheckPoint(int flags)
{
    bool        shutdown;
    CheckPoint  checkPoint;
    XLogRecPtr  recptr;
    XLogSegNo   _logSegNo;
    XLogCtlInsert *Insert = &XLogCtl->Insert;
    uint32      freespace;
    XLogRecPtr  PriorRedoPtr;
    XLogRecPtr  curInsert;
    XLogRecPtr  last_important_lsn;
    VirtualTransactionId *vxids;
    int         nvxids;

    /* ... part of the code omitted ... */

    recptr = XLogInsert(RM_XLOG_ID,
                        shutdown ? XLOG_CHECKPOINT_SHUTDOWN :
                        XLOG_CHECKPOINT_ONLINE);

    XLogFlush(recptr);

    /* ... part of the code omitted ... */

    /*
     * Update the average distance between checkpoints if the prior checkpoint
     * exists.
     */
    /* Estimate the distance between the two checkpoints */
    if (PriorRedoPtr != InvalidXLogRecPtr)
        UpdateCheckPointDistanceEstimate(RedoRecPtr - PriorRedoPtr);

    /*
     * Delete old log files, those no longer needed for last checkpoint to
     * prevent the disk holding the xlog from growing full.
     */
    XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size);
    /* Compute the WAL segment number that must be kept */
    KeepLogSeg(recptr, &_logSegNo);
    /* Step back by one: this segment and everything before it is no longer needed */
    _logSegNo--;
    /* Recycle or remove the segments that are no longer needed */
    RemoveOldXlogFiles(_logSegNo, RedoRecPtr, recptr);

    /* ... part of the code omitted ... */
}
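Now look at how the segment to keep is computed:
Two policies each produce a WAL segment number that must be kept, and the smaller (older) of the two wins.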
static void
KeepLogSeg(XLogRecPtr recptr, XLogSegNo *logSegNo)
{
    XLogSegNo   segno;
    XLogRecPtr  keep;

    /* segno is a WAL segment number, recptr is the record of this checkpoint, wal_segment_size is 16MB here */
    /* XLByteToSeg yields the segment holding this checkpoint's record, normally the newest WAL segment */
    XLByteToSeg(recptr, segno, wal_segment_size);

    /* keep is the LSN currently held back by the replication slots, */
    /* i.e. XLogCtl->replicationSlotMinLSN */
    keep = XLogGetReplicationSlotMinimumLSN();

    /* Policy 1: compute the segment to keep from wal_keep_segments */
    /* compute limit for wal_keep_segments first */
    if (wal_keep_segments > 0)
    {
        /* avoid underflow, don't go below 1 */
        if (segno <= wal_keep_segments)
            segno = 1;
        else
            /* segment to keep = newest segment number - 16 */
            segno = segno - wal_keep_segments;
    }

    /* Policy 2: compute the segment to keep from the replication slots */
    /* then check whether slots limit removal further */
    /* This policy is only considered when replication slots are in use */
    if (max_replication_slots > 0 && keep != InvalidXLogRecPtr)
    {
        XLogSegNo   slotSegNo;

        /* Convert the slot position keep into its segment number slotSegNo */
        XLByteToSeg(keep, slotSegNo, wal_segment_size);

        if (slotSegNo <= 0)
            segno = 1;
        /* Compare the results of the two policies and take the smaller, i.e. the older segment */
        else if (slotSegNo < segno)
            segno = slotSegNo;
    }

    /* If the computed segment is older than the segment of this checkpoint's redo pointer, store it in *logSegNo */
    /* don't delete WAL segments newer than the calculated segment */
    if (segno < *logSegNo)
        *logSegNo = segno;
}
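To make the comparison concrete, the sketch below replays the two policies on the numbers from the scenario above. It is a simplified standalone model, not PostgreSQL code: oldest_segment_to_keep is a hypothetical helper, the LSNs are plain 64-bit integers, and the checkpoint record and redo positions (segments 0x1AA and 0x1A9) are assumed values chosen for illustration. Only the slot position 0/F8000140, the 16MB segment size and wal_keep_segments = 16 come from the scenario.

```c
#include <stdint.h>

typedef uint64_t XLogRecPtr;    /* byte position in the WAL stream */
typedef uint64_t XLogSegNo;     /* WAL segment number */

#define INVALID_LSN ((XLogRecPtr) 0)

/*
 * Hypothetical helper modelled on KeepLogSeg: returns the oldest segment
 * that must be kept, given the checkpoint record position, the redo
 * position, the minimum slot LSN and the wal_keep_segments setting.
 */
static XLogSegNo
oldest_segment_to_keep(XLogRecPtr recptr, XLogRecPtr redo_ptr,
                       XLogRecPtr slot_min_lsn,
                       uint64_t keep_segments, uint64_t seg_size)
{
    XLogSegNo segno = recptr / seg_size;
    XLogSegNo redo_segno = redo_ptr / seg_size;

    /* Policy 1: stay keep_segments behind the newest segment */
    if (keep_segments > 0)
        segno = (segno <= keep_segments) ? 1 : segno - keep_segments;

    /* Policy 2: never go past the oldest position a slot still needs */
    if (slot_min_lsn != INVALID_LSN)
    {
        XLogSegNo slot_segno = slot_min_lsn / seg_size;

        if (slot_segno == 0)
            segno = 1;
        else if (slot_segno < segno)
            segno = slot_segno;
    }

    /* Never keep less than what the redo pointer requires */
    return (segno < redo_segno) ? segno : redo_segno;
}
```

With the slot at 0/F8000140, the call oldest_segment_to_keep(0x1AA000028, 0x1A9000028, 0xF8000140, 16, 16 * 1024 * 1024) returns 248 (0xF8): the slot's segment is older than the 410 that wal_keep_segments alone would give, so the slot wins. Passing INVALID_LSN for the slot position returns 410 instead, which is why the extra retention only appears once a slot falls behind.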
Next, the logic that cleans up old segments:
static void
RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr RedoRecPtr, XLogRecPtr endptr)
{
    DIR        *xldir;
    struct dirent *xlde;
    char        lastoff[MAXFNAMELEN];

    /*
     * Construct a filename of the last segment to be kept. The timeline ID
     * doesn't matter, we ignore that in the comparison. (During recovery,
     * ThisTimeLineID isn't set, so we can't use that.)
     */
    /* Build the WAL file name from the segment number; call it the cutoff point */
    XLogFileName(lastoff, 0, segno, wal_segment_size);

    elog(DEBUG2, "attempting to remove WAL segments older than log file %s", lastoff);

    xldir = AllocateDir(XLOGDIR);

    /* Loop over every file in the pg_wal directory */
    while ((xlde = ReadDir(xldir, XLOGDIR)) != NULL)
    {
        /* Ignore files that are not XLOG segments */
        if (!IsXLogFileName(xlde->d_name) &&
            !IsPartialXLogFileName(xlde->d_name))
            continue;

        /*
         * We ignore the timeline part of the XLOG segment identifiers in
         * deciding whether a segment is still needed. This ensures that we
         * won't prematurely remove a segment from a parent timeline. We could
         * probably be a little more proactive about removing segments of
         * non-parent timelines, but that would be a whole lot more
         * complicated.
         *
         * We use the alphanumeric sorting property of the filenames to decide
         * which ones are earlier than the lastoff segment.
         */
        /* Compare with strcmp: a file at or before the cutoff point can be recycled or removed, provided it has been archived (when archiving is enabled) */
        if (strcmp(xlde->d_name + 8, lastoff + 8) <= 0)
        {
            /* Check that archiving is done, i.e. a matching .done file exists under pg_wal/archive_status */
            if (XLogArchiveCheckDone(xlde->d_name))
            {
                /* Update the last removed location in shared memory first */
                UpdateLastRemovedPtr(xlde->d_name);

                /* The function that actually recycles or removes the file; removal uses unlink */
                RemoveXlogFile(xlde->d_name, RedoRecPtr, endptr);
            }
        }
    }

    FreeDir(xldir);
}
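The file name comparison can be tried in isolation. The sketch below is a standalone approximation, not the PostgreSQL macros themselves: wal_file_name and is_removal_candidate are hypothetical helpers that follow the 24-character naming scheme (8 hex digits each for timeline, high part and low part of the segment number) and the strcmp test that skips the first 8 characters. In the scenario the slot's restart_lsn 0/F8000140 lies in segment 0xF8, so after the decrement in CreateCheckPoint the cutoff is segment 0xF7.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format a WAL file name from a timeline ID and a
 * segment number, following the layout used by XLogFileName.
 */
static void
wal_file_name(char *buf, size_t len, uint32_t tli,
              uint64_t segno, uint64_t seg_size)
{
    /* Number of segments covered by one 4GB "xlog id" */
    uint64_t segs_per_id = UINT64_C(0x100000000) / seg_size;

    snprintf(buf, len, "%08X%08X%08X",
             (unsigned int) tli,
             (unsigned int) (segno / segs_per_id),
             (unsigned int) (segno % segs_per_id));
}

/*
 * Hypothetical helper: a file is a removal candidate when its name,
 * ignoring the 8-character timeline prefix, sorts at or before the cutoff.
 */
static int
is_removal_candidate(const char *fname, const char *lastoff)
{
    return strcmp(fname + 8, lastoff + 8) <= 0;
}
```

With a cutoff built by wal_file_name(buf, sizeof buf, 0, 247, 16 * 1024 * 1024), which yields 0000000000000000000000F7, the file 0000000100000000000000F8 from the directory listing is not a candidate, which matches it being the oldest file still present in pg_wal. Because the timeline prefix is skipped, a file from another timeline such as 0000000200000000000000F0 would still count as older than the cutoff.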
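RemoveXlogFile does the recycling and removal. Recycling means picking some of the no-longer-needed segments and keeping them for future use (how many depends on the amount of WAL generated between two checkpoints). The rest are removed.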
/*
 * Recycle or remove a log file that's no longer needed.
 *
 * endptr is current (or recent) end of xlog, and RedoRecPtr is the
 * redo pointer of the last checkpoint. These are used to determine
 * whether we want to recycle rather than delete no-longer-wanted log files.
 * If RedoRecPtr is not known, pass invalid, and the function will recycle,
 * somewhat arbitrarily, 10 future segments.
 */
static void
RemoveXlogFile(const char *segname, XLogRecPtr RedoRecPtr, XLogRecPtr endptr)
{
    char        path[MAXPGPATH];
#ifdef WIN32
    char        newpath[MAXPGPATH];
#endif
    struct stat statbuf;
    XLogSegNo   endlogSegNo;
    XLogSegNo   recycleSegNo;

    /*
     * Initialize info about where to try to recycle to.
     */
    /* Compute the newest WAL segment number */
    XLByteToSeg(endptr, endlogSegNo, wal_segment_size);
    /* An important step: compute the recycle limit recycleSegNo */
    /* If RedoRecPtr is not known (passed as invalid), the limit is the newest segment number + 10, */
    /* in other words, 10 segments are recycled */
    if (RedoRecPtr == InvalidXLogRecPtr)
        recycleSegNo = endlogSegNo + 10;
    else /* otherwise XLOGfileslop computes the recycle limit */
        recycleSegNo = XLOGfileslop(RedoRecPtr);

    snprintf(path, MAXPGPATH, XLOGDIR "/%s", segname);

    /*
     * Before deleting the file, see if it can be recycled as a future log
     * segment. Only recycle normal files, pg_standby for example can create
     * symbolic links pointing to a separate archive directory.
     */
    /* If the newest segment number has not passed the recycle limit and the file is a regular file, InstallXLogFileSegment recycles it */
    if (endlogSegNo <= recycleSegNo &&
        lstat(path, &statbuf) == 0 && S_ISREG(statbuf.st_mode) &&
        InstallXLogFileSegment(&endlogSegNo, path,
                               true, recycleSegNo, true))
    {
        /* At server log level debug2, a message reports the file being recycled */
        ereport(DEBUG2,
                (errmsg("recycled write-ahead log file \"%s\"",
                        segname)));
        CheckpointStats.ckpt_segs_recycled++;
        /* Needn't recheck that slot on future iterations */
        endlogSegNo++;
    }
    else /* remove the remaining WAL files */
    {
        /* No need for any more */
        int         rc;

        ereport(DEBUG2,
                (errmsg("removing write-ahead log file \"%s\"",
                        segname)));

#ifdef WIN32
        /*
         * On Windows, if another process (e.g another backend) holds the file
         * open in FILE_SHARE_DELETE mode, unlink will succeed, but the file
         * will still show up in directory listing until the last handle is
         * closed. To avoid confusing the lingering deleted file for a live
         * WAL file that needs to be archived, rename it before deleting it.
         *
         * If another process holds the file open without FILE_SHARE_DELETE
         * flag, rename will fail. We'll try again at the next checkpoint.
         */
        snprintf(newpath, MAXPGPATH, "%s.deleted", path);
        if (rename(path, newpath) != 0)
        {
            ereport(LOG,
                    (errcode_for_file_access(),