
Designing Data-Intensive Applications (3): Storage and Retrieval

Updated: 2023-06-04 10:19:51


Chapter 3 mainly covers persistent data indexes. The mainstream persistent index structures are:

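1. Hash Index.

2. LSM-Tree.

3. B-Tree.

4. B+Tree. The book does not mention B+Tree, probably because it is so similar to B-Tree. Since B+Tree is the index structure of InnoDB, the default storage engine of MySQL, one of the world's most popular relational databases, this article covers it anyway.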

Hash Index

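Hash Index is a relatively simple index structure. Almost every programming language ships an in-memory hash map/table in its standard library, for example std::unordered_map in C++, dict in Python, and map in Go.

A simple Hash Index adds persistence on top of a hash map: keep an in-memory hash map from key to <offset, size>, and keep an append-only file on disk that stores the actual data.

A rough C++ implementation follows: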

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#include <string>
#include <unordered_map>
#include <vector>

class HashIndex {
 public:
  HashIndex(const std::string& data_fname)
      : data_fname_(data_fname), data_fd_(-1) {}

  // Opens (or creates) the append-only data file.
  int Init() {
    data_fd_ = open(data_fname_.c_str(), O_CREAT | O_RDWR | O_APPEND, 0666);
    if (data_fd_ < 0) {
      fprintf(stderr, "open %s error %s\n", data_fname_.c_str(),
              strerror(errno));
      return -1;
    }
    return 0;
  }

  // Returns 0 on success, 1 if the key is absent, and < 0 on error.
  int Get(const std::string& key, std::string* value) {
    auto itr = hash_.find(key);
    if (itr == hash_.end()) {
      return 1;
    }
    std::vector<char> buf(itr->second.size);
    ssize_t rsize =
        pread(data_fd_, &buf[0], itr->second.size, itr->second.offset);
    if (rsize != (ssize_t)itr->second.size) {
      fprintf(stderr, "pread fd %d offset %llu size %u rsize %zd error %s\n",
              data_fd_, (unsigned long long)itr->second.offset,
              itr->second.size, rsize, strerror(errno));
      return -1;
    }
    std::string tmp_key;
    DecodeData(buf.data(), tmp_key, *value);
    // The key stored on disk must match the key we looked up.
    if (tmp_key != key) {
      return -2;
    }
    return 0;
  }

  // Appends the record to the file, then points the in-memory index at it.
  int Set(const std::string& key, const std::string& value) {
    std::string buf = EncodeData(key, value);
    // With O_APPEND every write lands at the end of the file, so the record's
    // offset is the current file size (SEEK_END), not the current position.
    off_t offset = lseek(data_fd_, 0, SEEK_END);
    if (offset < 0) {
      fprintf(stderr, "lseek fd %d error %s\n", data_fd_, strerror(errno));
      return -1;
    }
    ssize_t wsize = write(data_fd_, buf.data(), buf.size());
    if (wsize != (ssize_t)buf.size()) {
      fprintf(stderr, "write fd %d buf size %zu wsize %zd error %s\n",
              data_fd_, buf.size(), wsize, strerror(errno));
      return -1;
    }
    auto& value_info = hash_[key];
    value_info.offset = offset;
    value_info.size = buf.size();
    return 0;
  }

 private:
  uint32_t EncodeDataSize(uint32_t ksize, uint32_t vsize) {
    return sizeof(uint32_t) * 2 + ksize + vsize;
  }

  // Record layout: [ksize][key][vsize][value], sizes as native uint32_t.
  std::string EncodeData(const std::string& key, const std::string& value) {
    std::string result;
    uint32_t ksize = key.size();
    uint32_t vsize = value.size();
    result.append((char*)&ksize, sizeof(ksize));
    result.append(key);
    result.append((char*)&vsize, sizeof(vsize));
    result.append(value);
    return result;
  }

  void DecodeData(const char* buf, std::string& key, std::string& value) {
    // memcpy avoids unaligned reads through a casted pointer.
    uint32_t ksize = 0;
    memcpy(&ksize, buf, sizeof(ksize));
    buf += sizeof(uint32_t);
    key = std::string(buf, ksize);
    buf += ksize;
    uint32_t vsize = 0;
    memcpy(&vsize, buf, sizeof(vsize));
    buf += sizeof(uint32_t);
    value = std::string(buf, vsize);
  }

  struct ValueInfo {
    uint64_t offset;
    uint32_t size;
  };

  std::unordered_map<std::string, ValueInfo> hash_;
  std::string data_fname_;
  int data_fd_;
};

int main() {
  HashIndex hash("/tmp/hash_index_test");
  int ret = hash.Init();
  assert(ret == 0);

  // The in-memory index starts empty, so the key is not found yet.
  std::string v0;
  ret = hash.Get("hello", &v0);
  assert(ret == 1);

  ret = hash.Set("hello", "world");
  assert(ret == 0);
  ret = hash.Get("hello", &v0);
  assert(ret == 0);
  assert(v0 == "world");

  ret = hash.Set("hash", "HashTable");
  assert(ret == 0);
  ret = hash.Set("lsm", "LSMTree");
  assert(ret == 0);
  ret = hash.Set("b-", "B-Tree");
  assert(ret == 0);
  ret = hash.Set("b+", "B+Tree");
  assert(ret == 0);

  ret = hash.Get("hash", &v0);
  assert(ret == 0);
  assert(v0 == "HashTable");
  ret = hash.Get("lsm", &v0);
  assert(ret == 0);
  assert(v0 == "LSMTree");
  ret = hash.Get("b-", &v0);
  assert(ret == 0);
  assert(v0 == "B-Tree");
  ret = hash.Get("b+", &v0);
  assert(ret == 0);
  assert(v0 == "B+Tree");

  // Overwriting a key appends a new record; the index points at the newest.
  ret = hash.Set("hello", "WORLD");
  assert(ret == 0);
  ret = hash.Get("hello", &v0);
  assert(ret == 0);
  assert(v0 == "WORLD");
  return 0;
}

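This implementation leaves many issues unaddressed, for example:

1. Deleting records. A special delete flag can be written to mark a key as deleted.

2. Crash recovery. After the process restarts, how is the index rebuilt? One option is to scan the whole file sequentially, but rebuilding takes a long time when the file is very large.

3. Partial write failures. A file write is not guaranteed to be atomic, so the process may crash halfway through a record. Index rebuilding has to detect such records and discard them.

4. Concurrency control.

5. Reclaiming stale data, and so on.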


To see how these problems can be solved, refer to the paper Bitcask: A Log-Structured Hash Table for Fast Key/Value Data.
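As an illustration of points 2 and 3 above, here is a minimal sketch of rebuilding the index by scanning the log and stopping at a torn record. It is not the Bitcask algorithm: the log is modeled as an in-memory std::string instead of a file, the names RecordInfo, AppendRecord and RebuildIndex are made up for this example, and a torn write is detected only by length (real systems typically also store a checksum per record).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <unordered_map>

// Location of one record inside the log (hypothetical name).
struct RecordInfo {
  uint64_t offset;
  uint32_t size;
};

// Appends one record: [ksize][key][vsize][value], sizes as native uint32_t.
void AppendRecord(std::string* log, const std::string& key,
                  const std::string& value) {
  uint32_t ksize = static_cast<uint32_t>(key.size());
  uint32_t vsize = static_cast<uint32_t>(value.size());
  log->append(reinterpret_cast<const char*>(&ksize), sizeof(ksize));
  log->append(key);
  log->append(reinterpret_cast<const char*>(&vsize), sizeof(vsize));
  log->append(value);
}

// Scans the log from the start and rebuilds key -> location. Stops at the
// first record that does not fit in the remaining bytes (a torn write) and
// returns the number of valid bytes, so a caller could truncate the file.
size_t RebuildIndex(const std::string& log,
                    std::unordered_map<std::string, RecordInfo>* index) {
  assert(index != nullptr);
  index->clear();
  size_t pos = 0;  // start of the record currently being parsed
  while (true) {
    size_t cur = pos;
    uint32_t ksize = 0;
    uint32_t vsize = 0;
    if (log.size() - cur < sizeof(ksize)) break;
    std::memcpy(&ksize, log.data() + cur, sizeof(ksize));
    cur += sizeof(ksize);
    if (log.size() - cur < ksize) break;
    std::string key(log.data() + cur, ksize);
    cur += ksize;
    if (log.size() - cur < sizeof(vsize)) break;
    std::memcpy(&vsize, log.data() + cur, sizeof(vsize));
    cur += sizeof(vsize);
    if (log.size() - cur < vsize) break;
    cur += vsize;
    // Later records overwrite earlier ones, so the newest value wins.
    (*index)[key] = RecordInfo{pos, static_cast<uint32_t>(cur - pos)};
    pos = cur;
  }
  return pos;
}
```

Because the scan is linear in the file size, this also shows why recovery gets slow for large files.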

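In addition, Hash Index has some limitations:

1. The whole hash map must fit in memory, so the index size is bounded by memory.

2. It does not support range queries (or rather, range queries are very inefficient and generally require a full scan).

The LSM-Tree, B-Tree, and B+Tree introduced below are not bounded by memory size and support reasonably efficient range queries, so they are more general-purpose than Hash Index.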
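The range-query limitation comes from key order: a hash map stores keys in no particular order, while an ordered structure can seek to the start of a range and scan forward. The sketch below uses std::map purely as an in-memory stand-in for an ordered index; RangeQuery is a name invented for this example.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Returns all values whose key lies in [lo, hi), in key order.
std::vector<std::string> RangeQuery(
    const std::map<std::string, std::string>& index, const std::string& lo,
    const std::string& hi) {
  assert(lo <= hi);
  std::vector<std::string> out;
  // lower_bound jumps to the first key >= lo in O(log n); from there the
  // scan only touches keys inside the range.
  for (auto it = index.lower_bound(lo); it != index.end() && it->first < hi;
       ++it) {
    out.push_back(it->second);
  }
  return out;
}
```

With std::unordered_map there is no equivalent of lower_bound, so the same query would have to visit every entry.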

LSM-Tree


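LSM-Tree most likely originates from the paper The Log-Structured Merge-Tree (LSM-Tree); its design goal is to improve write performance.

LSM-Tree improves write performance by turning random writes into sequential writes (on both HDDs and SSDs, sequential I/O is clearly faster than random I/O). The price is read amplification (a single lookup may have to read from several places, each costing I/O) and write amplification (compaction).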

As the figure above shows:

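1. An LSM-Tree implementation generally consists of an in-memory MemTable, a WAL (Write Ahead Log) on external storage (HDD/SSD), and SSTs (Sorted String Tables) on external storage. A Manifest file stores some metadata.

2. Writes are simple: 1) write the WAL; 2) write the MemTable.

3. Reads are more involved: the MemTable and the SSTs must be checked from newest to oldest until the target value is found. Range lookups are somewhat more complex and are not covered in detail here.

4. When the MemTable fills up, it is converted into an Immutable MemTable, which a background thread then dumps to external storage as an SST file. This process is also called Minor Compaction.

5. As data and files accumulate on external storage, SSTs are compacted with one another to keep the data as ordered as possible and to reclaim invalid data. This process is called Major Compaction, and it is the main source of write amplification in LSM-Tree.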
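The steps above can be sketched with a toy, purely in-memory model. This is an illustration under strong assumptions, not how leveldb or rocksdb are implemented: there is no WAL, no files, no background thread, and no delete markers, each SST is just a sorted std::map, and the class name TinyLsm and its methods are invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy in-memory model of an LSM-Tree; the WAL and real files are omitted.
class TinyLsm {
 public:
  explicit TinyLsm(size_t memtable_limit) : limit_(memtable_limit) {
    assert(limit_ > 0);
  }

  // Write path: a real engine appends to the WAL first, then updates the
  // MemTable. Here only the MemTable step is modeled.
  void Put(const std::string& key, const std::string& value) {
    memtable_[key] = value;
    if (memtable_.size() >= limit_) Flush();
  }

  // Read path: MemTable first, then SSTs from newest to oldest.
  bool Get(const std::string& key, std::string* value) const {
    auto it = memtable_.find(key);
    if (it != memtable_.end()) {
      *value = it->second;
      return true;
    }
    for (auto sst = ssts_.rbegin(); sst != ssts_.rend(); ++sst) {
      auto hit = sst->find(key);
      if (hit != sst->end()) {
        *value = hit->second;
        return true;
      }
    }
    return false;
  }

  // Minor compaction: the full MemTable becomes an immutable sorted run.
  void Flush() {
    if (memtable_.empty()) return;
    ssts_.push_back(std::move(memtable_));
    memtable_.clear();
  }

  // Major compaction: merge all runs into one, newest value per key wins.
  void Compact() {
    std::map<std::string, std::string> merged;
    for (const auto& sst : ssts_) {  // iterates oldest to newest
      for (const auto& kv : sst) merged[kv.first] = kv.second;
    }
    ssts_.clear();
    if (!merged.empty()) ssts_.push_back(std::move(merged));
  }

  size_t SstCount() const { return ssts_.size(); }

 private:
  size_t limit_;
  std::map<std::string, std::string> memtable_;
  std::vector<std::map<std::string, std::string>> ssts_;  // oldest first
};
```

The model makes the trade-off visible: Put touches only the MemTable, while Get may have to look into every SST until Compact reduces their number.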

LSM-Tree has been very popular in recent years. Well-known open-source implementations include:

leveldb - /google/leveldb

rocksdb - /facebook/rocksdb

dgraph-io/badger - /dgraph-io/badger

cockroachdb/pebble - /cockroachdb/pebble

B-Tree

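The 1970 paper Organization and maintenance of large ordered indices introduced B-Tree, a data structure that manages external storage in pages and is well suited to random access.

B-Tree is one of many balanced trees. Its design idea is to minimize the number of external storage accesses per read or write. Most B-Tree operations (search, insert, delete) need only O(h) disk accesses, where h is the height of the tree. Because B-Tree is a flat tree with high fan-out, h is usually small.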
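To get a feel for how small h is, the following sketch computes the smallest number of levels needed for a given fan-out. It is a rough estimate that assumes every node is completely full; the function name MinLevels and the fan-out values used with it are illustrative assumptions, not figures from the book.

```cpp
#include <cassert>
#include <cstdint>

// Smallest number of levels h such that fanout^h >= num_keys, i.e. a rough
// lower bound on the height of a tree whose nodes each have `fanout` children.
int MinLevels(uint64_t num_keys, uint64_t fanout) {
  assert(fanout >= 2);
  int levels = 0;
  uint64_t capacity = 1;
  while (capacity < num_keys) {
    // If the next multiplication would overflow, one more level is enough.
    if (capacity > UINT64_MAX / fanout) return levels + 1;
    capacity *= fanout;
    ++levels;
  }
  return levels;
}
```

For one billion keys, a fan-out of 256 gives 4 levels, whereas a binary tree (fan-out 2) needs 30.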

B-Tree divides data into fixed-size pages, typically 4, 8, or 16 KB, and reads or writes one page at a time. The data within a page is kept sorted, which makes lookups fast.
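Keeping a page sorted means a lookup inside it can use binary search. The sketch below shows this for an internal page that routes a key to a child page. The InternalPage layout is a hypothetical in-memory view invented for this example; real engines pack keys and child pointers into the raw bytes of the page.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical in-memory view of an internal page. Child 0 covers keys below
// keys[0]; child i covers keys in [keys[i-1], keys[i]); the last child covers
// everything from the last separator upward.
struct InternalPage {
  std::vector<std::string> keys;  // n separator keys, sorted
  std::vector<int> children;      // n + 1 child page ids
};

// Returns the id of the child page that may contain `key`.
int FindChild(const InternalPage& page, const std::string& key) {
  assert(page.children.size() == page.keys.size() + 1);
  // upper_bound finds the first separator strictly greater than key.
  auto it = std::upper_bound(page.keys.begin(), page.keys.end(), key);
  return page.children[static_cast<size_t>(it - page.keys.begin())];
}
```

A search repeats this step once per level, reading one page each time, which is where the O(h) disk accesses come from.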
