Skywalking Scheduled Cleanup of Stored Data Not Working: Root Cause Analysis and Solution
2021-04-26
Symptom: Skywalking Scheduled Data Cleanup Not Working
The Skywalking backend OAP server was deployed in the production environment with Elasticsearch 6 as storage. The symptom: data as old as two weeks stayed in ES and was never cleaned up according to the expected retention of 3 days and 7 days set by the default recordDataTTL and metricsDataTTL values in the configuration file.
- recordDataTTL: The lifecycle of record data. Record data includes traces, top n sampled records, and logs. Unit is day. Minimal value is 2. (SW_CORE_RECORD_DATA_TTL, default 3)
- metricsDataTTL: The lifecycle of metrics data, including the metadata. Unit is day. Recommend metricsDataTTL >= recordDataTTL. Minimal value is 2. (SW_CORE_METRICS_DATA_TTL, default 7)
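For illustration, a minimal sketch of overriding these two retention periods on the OAP deployment through the documented environment keys; the env-list placement is an assumption about a typical Kubernetes deployment, not taken from the environment described here:

env:
  - name: SW_CORE_RECORD_DATA_TTL      # record data (traces, top n records, logs), in days
    value: "3"
  - name: SW_CORE_METRICS_DATA_TTL     # metrics data, in days
    value: "7"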
Checking the cleanup-related settings in the configuration file: by default the data cleanup DataKeeperExecutor should be enabled and should run every 5 minutes.
- enableDataKeeperExecutor: Controller of TTL scheduler. Once disabled, TTL wouldn't work. (SW_CORE_ENABLE_DATA_KEEPER_EXECUTOR, default true)
- dataKeeperExecutePeriod: The execution period of TTL scheduler, unit is minute. Execution doesn't mean deleting data. The storage provider could override this, such as ElasticSearch storage. (SW_CORE_DATA_KEEPER_EXECUTE_PERIOD, default 5)
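Taken together, these four settings live in the core section of application.yml; the sketch below is paraphrased from the documented defaults above and may differ slightly from the shipped 8.4.0 file:

core:
  selector: ${SW_CORE:default}
  default:
    # retention in days for record data and metrics data
    recordDataTTL: ${SW_CORE_RECORD_DATA_TTL:3}
    metricsDataTTL: ${SW_CORE_METRICS_DATA_TTL:7}
    # TTL scheduler: enabled by default, executed every 5 minutes
    enableDataKeeperExecutor: ${SW_CORE_ENABLE_DATA_KEEPER_EXECUTOR:true}
    dataKeeperExecutePeriod: ${SW_CORE_DATA_KEEPER_EXECUTE_PERIOD:5}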
Following the Skywalking OAP server pod log with kubectl logs -f, a record like this appears every five minutes:
2021-04-26 06:05:51,082 - org.apache.skywalking.oap.server.core.storage.ttl.DataTTLKeeperTimer -325937 [pool-10-thread-1] INFO [] - The selected first getAddress is 100.67.187.229_11800. Skip.
The DataTTLKeeperTimer class seen here should be the one doing the scheduled data cleanup, so the next step is to look at the Skywalking 8.4.0 source code to find the cause.
Skywalking 8.4.0 Data Cleanup Flow in the Source Code
In the Skywalking 8.4.0 source of DataTTLKeeperTimer, the delete method that produces this log message can be found. Its comment states that DataTTLKeeperTimer starts on every OAP node, but the deletion is only performed when the current OAP node is the first one in the node list; otherwise the run is skipped.
/**
 * DataTTLKeeperTimer starts in every OAP node, but the deletion only work when it is as the first node in the OAP
 * node list from {@link ClusterNodesQuery}.
 */
private void delete() {
    List<RemoteInstance> remoteInstances = clusterNodesQuery.queryRemoteNodes();
    if (CollectionUtils.isNotEmpty(remoteInstances) && !remoteInstances.get(0).getAddress().isSelf()) {
        log.info("The selected first getAddress is {}. Skip.", remoteInstances.get(0).toString());
        return;
    }

    log.info("Beginning to remove expired metrics from the storage.");
    IModelManager modelGetter = moduleManager.find(CoreModule.NAME).provider().getService(IModelManager.class);
    List<Model> models = modelGetter.allModels();
    models.forEach(this::execute);
}
This matches our test environment: there is only one OAP node, yet the log still shows the skip message. The only explanation is that remoteInstances.get(0).getAddress().isSelf() evaluates to false. Tracing where isSelf() gets its value, it is set while the remoteInstances list is built in queryRemoteNodes() of KubernetesCoordinator, the implementation of the ClusterNodesQuery interface used in Kubernetes cluster mode: each pod's metadata uid (pod.getMetadata().getUid()) is compared with the uid captured when the KubernetesCoordinator instance was created.
public List<RemoteInstance> queryRemoteNodes() {
    try {
        initHealthChecker();
        List<V1Pod> pods = NamespacedPodListInformer.INFORMER.listPods().orElseGet(this::selfPod);
        if (log.isDebugEnabled()) {
            List<String> uidList = pods
                .stream()
                .map(item -> item.getMetadata().getUid())
                .collect(Collectors.toList());
            log.debug("[kubernetes cluster pods uid list]:{}", uidList.toString());
        }
        if (port == -1) {
            port = manager.find(CoreModule.NAME).provider().getService(ConfigService.class).getGRPCPort();
        }
        List<RemoteInstance> remoteInstances =
            pods.stream()
                .filter(pod -> StringUtil.isNotBlank(pod.getStatus().getPodIP()))
                .map(pod -> new RemoteInstance(
                    new Address(pod.getStatus().getPodIP(), port, pod.getMetadata().getUid().equals(uid))))
                .collect(Collectors.toList());
        healthChecker.health();
        return remoteInstances;
    } catch (Throwable e) {
        healthChecker.unHealth(e);
        throw new ServiceQueryException(e.getMessage());
    }
}
To see exactly why pod.getMetadata().getUid() and the uid captured at KubernetesCoordinator construction time differ, the source code was modified to print both the cluster pods uid list and the KubernetesCoordinator uid:
@Override
public List<RemoteInstance> queryRemoteNodes() {
    try {
        initHealthChecker();
        List<V1Pod> pods = NamespacedPodListInformer.INFORMER.listPods().orElseGet(this::selfPod);
        // if (log.isDebugEnabled()) {
        List<String> uidList = pods
            .stream()
            .map(item -> item.getMetadata().getUid())
            .collect(Collectors.toList());
        log.info("[kubernetes cluster pods uid list]:{}", uidList.toString());
        log.info("[KubernetesCoordinator uid: ]:{}", uid);
        // }
        if (port == -1) {
            port = manager.find(CoreModule.NAME).provider().getService(ConfigService.class).getGRPCPort();
        }
        List<RemoteInstance> remoteInstances =
            pods.stream()
                .filter(pod -> StringUtil.isNotBlank(pod.getStatus().getPodIP()))
                .map(pod -> new RemoteInstance(
                    new Address(pod.getStatus().getPodIP(), port, pod.getMetadata().getUid().equals(uid))))
                .collect(Collectors.toList());
        healthChecker.health();
        return remoteInstances;
    } catch (Throwable e) {
        healthChecker.unHealth(e);
        throw new ServiceQueryException(e.getMessage());
    }
}
With the source modified, a custom Skywalking OAP docker image was built manually using the Makefile that ships with the source tree.
Building a Custom Skywalking OAP Image
The image is built by running make docker with the Makefile included in the project; once the build prerequisites are in place, run make docker SKIP_TEST=true to skip the tests while packaging the image.
After the make command finishes, the generated image can be seen with docker images:
$ docker images
REPOSITORY                     TAG         IMAGE ID       CREATED          SIZE
weiwei11/oap                   latest      607ca94fd742   11 minutes ago   537MB
apache/skywalking-ui           8.4.0       5f4d7292cd19   2 months ago
apache/skywalking-oap-server   8.4.0-es6   35183ada1fbf   2 months ago
elasticsearch                  6.5.1       32f93c89076d   2 years ago      773MB
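The next step is to point the test deployment at this image. A rough sketch of the relevant fragment of the deployment spec follows; the container name is a placeholder, and only the image reference comes from the build output above:

# illustrative fragment of the skyoap-test Deployment
spec:
  template:
    spec:
      containers:
        - name: oap                    # placeholder container name
          image: weiwei11/oap:latest   # the custom image, after tagging and pushing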
After tagging the custom image and pushing it to the registry, the yaml of the skyoap-test deployment was modified to use it, as sketched above. The pod log then contains entries like these:
2021-04-25 09:29:54,270 - org.apache.skywalking.oap.server.cluster.plugin.kubernetes.KubernetesCoordinator -16079 [poo
2021-04-25 09:29:54,271 - org.apache.skywalking.oap.server.cluster.plugin.kubernetes.KubernetesCoordinator -16080 [pool-3-thread-1] INFO [] - [KubernetesC
From these (truncated) lines it can be seen that the uid was null when KubernetesCoordinator was created, so even with a single OAP node isSelf() comes out false and the data cleanup mechanism never runs.
Solution
Looking at the Kubernetes-related part of Skywalking's application.yml, there is a uidEnvName setting:
kubernetes:
  namespace: ${SW_CLUSTER_K8S_NAMESPACE:default}
  labelSelector: ${SW_CLUSTER_K8S_LABEL:app=collector,release=skywalking}
  uidEnvName: ${SW_CLUSTER_K8S_UID:SKYWALKING_COLLECTOR_UID}
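Note that this block only takes effect when the cluster module selector is set to kubernetes; a sketch of the surrounding cluster section, abridged from the 8.4.0 defaults and shown here only for context:

cluster:
  selector: ${SW_CLUSTER:standalone}   # must resolve to "kubernetes" for KubernetesCoordinator to be active
  kubernetes:
    namespace: ${SW_CLUSTER_K8S_NAMESPACE:default}
    labelSelector: ${SW_CLUSTER_K8S_LABEL:app=collector,release=skywalking}
    uidEnvName: ${SW_CLUSTER_K8S_UID:SKYWALKING_COLLECTOR_UID}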
And the uid field of KubernetesCoordinator is indeed read through this uidEnvName environment variable when the coordinator is constructed:
public KubernetesCoordinator(final ModuleDefineHolder manager,
                             final ClusterModuleKubernetesConfig config) {
    this.uid = new UidEnvSupplier(config.getUidEnvName()).get();
    this.manager = manager;
}
Searching online for SW_CLUSTER_K8S_UID and SKYWALKING_COLLECTOR_UID turned up the following way (the Kubernetes Downward API) to inject the pod's metadata.uid into the Skywalking OAP server:
- name: SKYWALKING_COLLECTOR_UID
  valueFrom:
    fieldRef:
      apiVersion: v1
      fieldPath: metadata.uid
After adding this to the Skywalking OAP server deployment yaml, the scheduled data cleanup runs as expected.
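For reference, a minimal sketch of how this entry sits in the OAP container's env list alongside the cluster-mode selection; everything except SKYWALKING_COLLECTOR_UID and SW_CLUSTER is illustrative:

containers:
  - name: oap                           # placeholder container name
    image: weiwei11/oap:latest          # or the stock apache/skywalking-oap-server image
    env:
      - name: SW_CLUSTER                # select the kubernetes cluster coordinator
        value: kubernetes
      - name: SKYWALKING_COLLECTOR_UID  # consumed via uidEnvName in application.yml
        valueFrom:
          fieldRef:
            apiVersion: v1
            fieldPath: metadata.uid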