ETCD Core Mechanisms Explained
The Overall ETCD Mechanism
etcd is a distributed, reliable key-value store designed for holding the most critical data of a distributed system.
The nodes of an etcd cluster reach distributed consensus through the Raft algorithm. The algorithm elects one node as the leader, which is responsible for synchronizing and distributing data. When the leader fails, the system automatically elects another node as the new leader and re-synchronizes the data.
High availability of an etcd cluster rests mainly on the quorum mechanism: the cluster can keep serving requests only while more than half of its nodes are available. The quorum mechanism is used very widely in distributed consensus algorithms and is not elaborated further here.
Raft data updates and etcd calls follow a two-phase mechanism (a minimal sketch follows below):
Phase one: the caller sends its request to the leader; the leader writes the key-value data into its log as an uncommitted entry and replicates the log entry to the followers via the Raft algorithm; the followers acknowledge the replication.
Phase two: once the entry has been replicated to a quorum of nodes, the leader commits it locally and returns the result to the client, and finally notifies the followers asynchronously so that they complete their commits.
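As a rough illustration of the flow above, here is a minimal, hypothetical sketch in Go; the leader, node, and entry types are invented for this example and do not correspond to etcd's internal implementation.

package main

import "fmt"

type entry struct {
    key, value string
    committed  bool
}

type node struct{ log []entry }

type leader struct {
    node
    followers []*node
}

// Phase one: the leader appends an uncommitted entry and replicates it to
// the followers, counting their acknowledgements.
func (l *leader) propose(key, value string) (index, acks int) {
    e := entry{key: key, value: value}
    l.log = append(l.log, e)
    for _, f := range l.followers {
        f.log = append(f.log, e) // a real follower would acknowledge over the network
        acks++
    }
    return len(l.log) - 1, acks
}

// Phase two: once a quorum (a majority of the whole cluster) holds the entry,
// the leader commits locally and can answer the client; followers commit later.
func (l *leader) commit(index, acks int) bool {
    quorum := (len(l.followers)+1)/2 + 1
    if acks+1 < quorum { // +1 counts the leader itself
        return false
    }
    l.log[index].committed = true
    return true
}

func main() {
    l := &leader{followers: []*node{{}, {}}}
    idx, acks := l.propose("foo", "bar")
    fmt.Println("committed:", l.commit(idx, acks))
}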
ETCD Core API Analysis
The APIs exposed by etcd mainly cover key-value operations, leases, and watches. Looking at the source code:
KV-related interface:
type KV interface {
    // Put puts a key-value pair into etcd.
    // Note that key,value can be plain bytes array and string is
    // an immutable representation of that bytes array.
    // To get a string of bytes, do string([]byte{0x10, 0x20}).
    Put(ctx context.Context, key, val string, opts ...OpOption) (*PutResponse, error)

    // Get retrieves keys.
    // By default, Get will return the value for "key", if any.
    // When passed WithRange(end), Get will return the keys in the range [key, end).
    // When passed WithFromKey(), Get returns keys greater than or equal to key.
    // When passed WithRev(rev) with rev > 0, Get retrieves keys at the given revision;
    // if the required revision is compacted, the request will fail with ErrCompacted.
    // When passed WithLimit(limit), the number of returned keys is bounded by limit.
    // When passed WithSort(), the keys will be sorted.
    Get(ctx context.Context, key string, opts ...OpOption) (*GetResponse, error)

    // Delete deletes a key, or optionally using WithRange(end), [key, end).
    Delete(ctx context.Context, key string, opts ...OpOption) (*DeleteResponse, error)

    // Compact compacts etcd KV history before the given rev.
    Compact(ctx context.Context, rev int64, opts ...CompactOption) (*CompactResponse, error)

    // Txn creates a transaction.
    Txn(ctx context.Context) Txn
}
The main methods are Put, Get, Delete, Compact, Do, and Txn. Put writes data into the etcd cluster, stored as key-value pairs; Get retrieves the data stored in etcd for a given key; Delete removes data from etcd by deleting the key; Compact compacts the event history of the etcd key-value store so that it does not grow without bound; Txn processes multiple requests within a single transaction. The etcd transaction pattern is (a client-side sketch follows the pattern):
if compare
then op
else op
commit
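For illustration, a minimal client-side sketch of this if/then/else/commit pattern, assuming the go.etcd.io/etcd/client/v3 package (aliased clientv3) and a placeholder endpoint and key:

package main

import (
    "context"
    "fmt"
    "time"

    clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
    cli, err := clientv3.New(clientv3.Config{
        Endpoints:   []string{"127.0.0.1:2379"}, // placeholder endpoint
        DialTimeout: 5 * time.Second,
    })
    if err != nil {
        panic(err)
    }
    defer cli.Close()

    ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
    defer cancel()

    // if compare / then op / else op / commit
    resp, err := cli.Txn(ctx).
        If(clientv3.Compare(clientv3.Value("demo-key"), "=", "old")).
        Then(clientv3.OpPut("demo-key", "new")).
        Else(clientv3.OpGet("demo-key")).
        Commit()
    if err != nil {
        panic(err)
    }
    fmt.Println("transaction succeeded:", resp.Succeeded)
}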
Lease-related interface:
type Lease interface {
    // Grant creates a new lease.
    Grant(ctx context.Context, ttl int64) (*LeaseGrantResponse, error)

    // Revoke revokes the given lease.
    Revoke(ctx context.Context, id LeaseID) (*LeaseRevokeResponse, error)

    // TimeToLive retrieves the lease information of the given lease ID.
    TimeToLive(ctx context.Context, id LeaseID, opts ...LeaseOption) (*LeaseTimeToLiveResponse, error)

    // Leases retrieves all leases.
    Leases(ctx context.Context) (*LeaseLeasesResponse, error)

    // KeepAlive keeps the given lease alive forever. If the keepalive response
    // posted to the channel is not consumed immediately, the lease client will
    // continue sending keep alive requests to the etcd server at least every
    // second until latest response is consumed.
    //
    // The returned "LeaseKeepAliveResponse" channel closes if underlying keep
    // alive stream is interrupted in some way the client cannot handle itself;
    // given context "ctx" is canceled or timed out. "LeaseKeepAliveResponse"
    // from this closed channel is nil.
    //
    // If client keep alive loop halts with an unexpected error (e.g. "etcdserver:
    // no leader") or canceled by the caller (e.g. context.Canceled), the error
    // is returned. Otherwise, it retries.
    //
    // TODO(v4.0): post errors to last keep alive message before closing
    // (see https://github.com/coreos/etcd/pull/7866)
    KeepAlive(ctx context.Context, id LeaseID) (<-chan *LeaseKeepAliveResponse, error)

    // KeepAliveOnce renews the lease once. The response corresponds to the
    // first message from calling KeepAlive. If the response has a recoverable
    // error, KeepAliveOnce will retry the RPC with a new keep alive message.
    //
    // In most of the cases, Keepalive should be used instead of KeepAliveOnce.
    KeepAliveOnce(ctx context.Context, id LeaseID) (*LeaseKeepAliveResponse, error)

    // Close releases all resources Lease keeps for efficient communication
    // with the etcd server.
    Close() error
}
A lease is a common concept in distributed systems and represents a distributed, time-bound grant. A typical use case is detecting whether a node in a distributed system is still alive, which is exactly where a lease mechanism is needed.
Grant creates a lease, which expires if the server does not receive a keep-alive within the given time-to-live. Revoke revokes a lease; all keys attached to it expire and are deleted. TimeToLive retrieves the information of a lease. KeepAlive keeps a lease alive using a stream of keep-alive requests from the client to the server and a stream of keep-alive responses back from the server to the client. To detect whether a process in a distributed system is alive, the process can create a lease and call KeepAlive periodically: as long as everything is healthy the lease is kept alive, and if the process dies the lease eventually expires on its own. etcd also allows multiple keys to be attached to the same lease, which greatly reduces the overhead of refreshing lease objects.
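A minimal sketch of this lease workflow, again assuming the go.etcd.io/etcd/client/v3 package; the endpoint and the "services/demo" key are placeholders:

package main

import (
    "context"
    "fmt"
    "time"

    clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
    cli, err := clientv3.New(clientv3.Config{
        Endpoints:   []string{"127.0.0.1:2379"}, // placeholder endpoint
        DialTimeout: 5 * time.Second,
    })
    if err != nil {
        panic(err)
    }
    defer cli.Close()

    ctx := context.Background()

    // Grant a lease with a 10-second TTL.
    grant, err := cli.Grant(ctx, 10)
    if err != nil {
        panic(err)
    }

    // Attach a key to the lease; it is deleted automatically when the lease expires.
    if _, err := cli.Put(ctx, "services/demo", "alive", clientv3.WithLease(grant.ID)); err != nil {
        panic(err)
    }

    // Keep the lease alive for as long as this process runs.
    ch, err := cli.KeepAlive(ctx, grant.ID)
    if err != nil {
        panic(err)
    }
    for resp := range ch {
        fmt.Println("lease renewed, TTL:", resp.TTL)
    }
}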
Watch-related interface:
type Watcher interface {
    // Watch watches on a key or prefix. The watched events will be returned
    // through the returned channel. If revisions waiting to be sent over the
    // watch are compacted, then the watch will be canceled by the server, the
    // client will post a compacted error watch response, and the channel will close.
    // If the context "ctx" is canceled or timed out, returned "WatchChan" is closed,
    // and "WatchResponse" from this closed channel has zero events and nil "Err()".
    // The context "ctx" MUST be canceled, as soon as watcher is no longer being used,
    // to release the associated resources.
    //
    // If the context is "context.Background/TODO", returned "WatchChan" will
    // not be closed and block until event is triggered, except when server
    // returns a non-recoverable error (e.g. ErrCompacted).
    // For example, when context passed with "WithRequireLeader" and the
    // connected server has no leader (e.g. due to network partition),
    // error "etcdserver: no leader" (ErrNoLeader) will be returned,
    // and then "WatchChan" is closed with non-nil "Err()".
    // In order to prevent a watch stream being stuck in a partitioned node,
    // make sure to wrap context with "WithRequireLeader".
    //
    // Otherwise, as long as the context has not been canceled or timed out,
    // watch will retry on other recoverable errors forever until reconnected.
    //
    // TODO: explicitly set context error in the last "WatchResponse" message and close channel?
    // Currently, client contexts are overwritten with "valCtx" that never closes.
    // TODO(v3.4): configure watch retry policy, limit maximum retry number
    // (see https://github.com/etcd-io/etcd/issues/8980)
    Watch(ctx context.Context, key string, opts ...OpOption) WatchChan

    // RequestProgress requests a progress notify response be sent in all watch channels.
    RequestProgress(ctx context.Context) error

    // Close closes the watcher and cancels all watch requests.
    Close() error
}
etcd's Watch mechanism lets clients subscribe in real time to incremental data updates in etcd; a watch can target a single key or a key prefix. Watch observes events that are about to happen or have already happened; both the input and the output are streams. The input stream is used to create and cancel watches, while the output stream delivers events. A single watch RPC can watch several key ranges at once and stream events for all of them, and the entire event history can be watched starting from the last compaction revision.
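A minimal watch sketch on a key prefix, assuming the go.etcd.io/etcd/client/v3 package; the endpoint and the "config/" prefix are only examples:

package main

import (
    "context"
    "fmt"
    "time"

    clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
    cli, err := clientv3.New(clientv3.Config{
        Endpoints:   []string{"127.0.0.1:2379"}, // placeholder endpoint
        DialTimeout: 5 * time.Second,
    })
    if err != nil {
        panic(err)
    }
    defer cli.Close()

    // Cancel the context once the watcher is no longer needed; WithRequireLeader
    // prevents the stream from getting stuck on a partitioned member.
    ctx, cancel := context.WithCancel(clientv3.WithRequireLeader(context.Background()))
    defer cancel()

    // Watch every key under the "config/" prefix.
    for wresp := range cli.Watch(ctx, "config/", clientv3.WithPrefix()) {
        if err := wresp.Err(); err != nil {
            panic(err)
        }
        for _, ev := range wresp.Events {
            fmt.Printf("%s %q -> %q (mod_revision=%d)\n",
                ev.Type, ev.Kv.Key, ev.Kv.Value, ev.Kv.ModRevision)
        }
    }
}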
ETCD Data Version Mechanism
etcd's data versioning revolves around two values: term, which denotes the leader's term of office, and revision, which is the version of the global data. Whenever the cluster switches leaders, term is incremented by 1; a leader switch happens when a node fails, when the leader has network problems, or when the whole cluster is stopped and brought back up. Whenever data changes, whether by creation, modification, or deletion, the corresponding revision is incremented by 1. Across leader terms the revision keeps increasing monotonically and globally, and every modification in the cluster corresponds to a unique revision; this is what allows revision to support both MVCC and Watch.
For every KeyValue node, etcd records three versions:
the first, create_revision, is the revision at which the KeyValue was created;
the second, mod_revision, is the revision at which its data was last modified;
the third, version, is a counter recording how many times the KeyValue has been modified.
Within a single leader term, all modifications carry the same term value while their revisions increase monotonically; after the cluster is restarted, subsequent modifications carry a term value that has been incremented by 1.
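These three fields can be read directly from a Get response. A minimal sketch, assuming the go.etcd.io/etcd/client/v3 package and a placeholder endpoint and key:

package main

import (
    "context"
    "fmt"
    "time"

    clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
    cli, err := clientv3.New(clientv3.Config{
        Endpoints:   []string{"127.0.0.1:2379"}, // placeholder endpoint
        DialTimeout: 5 * time.Second,
    })
    if err != nil {
        panic(err)
    }
    defer cli.Close()

    ctx := context.Background()
    if _, err := cli.Put(ctx, "demo-key", "v1"); err != nil {
        panic(err)
    }
    if _, err := cli.Put(ctx, "demo-key", "v2"); err != nil {
        panic(err)
    }

    resp, err := cli.Get(ctx, "demo-key")
    if err != nil {
        panic(err)
    }
    kv := resp.Kvs[0]
    // create_revision stays fixed, mod_revision tracks the latest change,
    // and version counts how many times this key has been modified.
    fmt.Println("create_revision:", kv.CreateRevision)
    fmt.Println("mod_revision:   ", kv.ModRevision)
    fmt.Println("version:        ", kv.Version)
    fmt.Println("header revision:", resp.Header.Revision)
}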
MVCC Concurrency Control in ETCD
MVCC should be familiar to most readers: MySQL's InnoDB uses MVCC to support highly concurrent data access by keeping multiple versions of the data and relying on transaction visibility rules so that each transaction sees exactly the versions it should. etcd likewise uses MVCC for concurrency control.
etcd allows the same key to be modified many times, and each modification is assigned its own version number. etcd records the data of every modification, so a key holds multiple historical versions in etcd. If no version number is specified in a query, etcd returns the latest version of the key; etcd also supports querying historical data at a specified version.
Because etcd records every modification, a watcher can be created from any historical point (a specified revision) when subscribing with watch. A data pipe is established between the client and etcd, and etcd pushes all data changes starting from the given revision. etcd's watch mechanism guarantees that any subsequent modification of the key is promptly pushed to the client through this pipe.
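A minimal sketch of reading a historical version and replaying changes from a given revision with WithRev, under the same assumptions as the earlier examples:

package main

import (
    "context"
    "fmt"
    "time"

    clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
    cli, err := clientv3.New(clientv3.Config{
        Endpoints:   []string{"127.0.0.1:2379"}, // placeholder endpoint
        DialTimeout: 5 * time.Second,
    })
    if err != nil {
        panic(err)
    }
    defer cli.Close()

    ctx := context.Background()
    put1, err := cli.Put(ctx, "demo-key", "v1")
    if err != nil {
        panic(err)
    }
    if _, err := cli.Put(ctx, "demo-key", "v2"); err != nil {
        panic(err)
    }

    // Read the key as it was at the revision of the first Put.
    old, err := cli.Get(ctx, "demo-key", clientv3.WithRev(put1.Header.Revision))
    if err != nil {
        panic(err)
    }
    fmt.Printf("historical value: %q\n", old.Kvs[0].Value)

    // Watch from the next revision onward: the "v2" change is replayed immediately.
    wctx, cancel := context.WithTimeout(ctx, 3*time.Second)
    defer cancel()
    for wresp := range cli.Watch(wctx, "demo-key", clientv3.WithRev(put1.Header.Revision+1)) {
        for _, ev := range wresp.Events {
            fmt.Printf("replayed event: %s -> %q\n", ev.Type, ev.Kv.Value)
        }
    }
}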
Looking at the source code:
type revision struct {
    // main is the main revision of a set of changes that happen atomically.
    main int64

    // sub is the sub revision of a change in a set of changes that happen
    // atomically. Each change has different increasing sub revision in that
    // set.
    sub int64
}

func (a revision) GreaterThan(b revision) bool {
    if a.main > b.main {
        return true
    }
    if a.main < b.main {
        return false
    }
    return a.sub > b.sub
}
etcd's MVCC implementation defines a revision struct: main is the id of the current transaction, a globally increasing logical timestamp; sub is the sub-id of the current operation inside that transaction, starting from 0 and increasing within the transaction. Transaction versions are compared with the GreaterThan method.
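As a standalone illustration of this ordering, the struct and method above can be exercised like this (they are duplicated here only so that the snippet compiles on its own):

package main

import "fmt"

// revision mirrors the struct shown above for a self-contained comparison demo.
type revision struct {
    main int64
    sub  int64
}

func (a revision) GreaterThan(b revision) bool {
    if a.main > b.main {
        return true
    }
    if a.main < b.main {
        return false
    }
    return a.sub > b.sub
}

func main() {
    // Two changes in the same transaction (same main, increasing sub) ...
    fmt.Println(revision{main: 5, sub: 1}.GreaterThan(revision{main: 5, sub: 0})) // true
    // ... are both older than any change from a later transaction.
    fmt.Println(revision{main: 5, sub: 9}.GreaterThan(revision{main: 6, sub: 0})) // false
}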
ETCD Storage Data Structures
All data in etcd is stored in a btree data structure; the btree is kept on disk and mapped into memory via mmap to support fast access. treeIndex is defined as follows:
type treeIndex struct {
    sync.RWMutex
    tree *btree.BTree
}

func newTreeIndex() index {
    return &treeIndex{
        tree: btree.New(32),
    }
}
The operations that index binds onto the btree include Put, Get, Revision, Range, and Visit. Taking the Put method as an example, its source is:
func (ti *treeIndex) Put(key []byte, rev revision) {
    keyi := &keyIndex{key: key}

    ti.Lock()
    defer ti.Unlock()
    item := ti.tree.Get(keyi)
    if item == nil {
        // the key is new: record the revision and insert the keyIndex into the btree
        keyi.put(rev.main, rev.sub)
        ti.tree.ReplaceOrInsert(keyi)
        return
    }
    // the key already exists: append the new revision to its keyIndex
    okeyi := item.(*keyIndex)
    okeyi.put(rev.main, rev.sub)
}
As the source shows, reads and writes on the btree are performed while holding the lock, which guarantees data consistency under concurrent access.
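To show the same pattern in a self-contained way, here is a hypothetical miniature index built on the github.com/google/btree package that treeIndex also uses; the item and index types below are simplified stand-ins, not etcd's actual keyIndex and treeIndex:

package main

import (
    "fmt"
    "sync"

    "github.com/google/btree"
)

// item is a minimal stand-in for etcd's keyIndex, mapping a key to its revisions.
type item struct {
    key  string
    revs []int64
}

func (a *item) Less(b btree.Item) bool { return a.key < b.(*item).key }

// index guards the btree with a RWMutex, mirroring the treeIndex pattern.
type index struct {
    sync.RWMutex
    tree *btree.BTree
}

func (ti *index) Put(key string, rev int64) {
    ti.Lock()
    defer ti.Unlock()
    if got := ti.tree.Get(&item{key: key}); got != nil {
        it := got.(*item)
        it.revs = append(it.revs, rev) // existing key: append the new revision
        return
    }
    ti.tree.ReplaceOrInsert(&item{key: key, revs: []int64{rev}}) // new key: insert
}

func (ti *index) Get(key string) []int64 {
    ti.RLock()
    defer ti.RUnlock()
    if got := ti.tree.Get(&item{key: key}); got != nil {
        return got.(*item).revs
    }
    return nil
}

func main() {
    ti := &index{tree: btree.New(32)}
    ti.Put("foo", 1)
    ti.Put("foo", 2)
    fmt.Println(ti.Get("foo")) // [1 2]
}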