The most popular data structure used for indexing in relational databases is the B-tree (or its variant, the B+tree). B-trees rose to popularity because they do fewer disk I/O operations to run a lookup compared to other balanced trees. To the best of my knowledge, MemSQL is the first commercial relational database in production today to use a skiplist as its primary index backing data structure. A lot of research and prototyping went into the decision to use a skiplist. I hope to provide some of the rationale for this choice and to demonstrate the power and relative simplicity of MemSQL's skiplist implementation. I'll show some very simple single-threaded table scans that run more than eight times faster on MemSQL compared to MySQL as a very basic demonstration (MemSQL performs much better than this on more aggressive and complex workloads). This article will stick to high-level design choices and leave most of the nitty-gritty implementation details for other posts.
What is a Skiplist
B-trees made a lot of sense for databases when the data lived most of its life on disk and was only pulled into memory as needed during a query. B-trees do extra work to reduce disk I/O, and that work is needless overhead if your data fits into memory. MemSQL is a memory-optimized database, so it is free to use other data structures, such as skiplists, that are not well suited to having data stored on disk.
Skiplists are a relatively recent invention. The seminal skiplist paper was published in 1990 by William Pugh (Skip Lists: A Probabilistic Alternative to Balanced Trees). This makes the skiplist about 20 years younger than the B-tree, which was first proposed in the 1970s. A skiplist is an ordered data structure providing expected O(log n) lookup, insertion and deletion complexity. It provides this level of efficiency without the need for complex tree balancing or page splitting like that required by B-trees, red-black trees or AVL trees. As a result, it's a much simpler and more concise data structure to implement. Lock-free skiplist implementations have recently been developed (Lock-Free Linked Lists and Skip Lists) that provide thread safety with better parallelism under a concurrent read/write workload than thread-safe balanced trees that require locking. I won't dig into the details of how to implement a lock-free skiplist here, but to get an idea of how it might be done, see this blog post about common pitfalls in writing lock-free algorithms.
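For readers who want to see where the expected O(log n) bound comes from, here is a brief sketch of the standard argument. It assumes, as one of the parameter choices discussed in Pugh's paper, that a tower grows one more level with probability p = 1/2 (a fair coin flip) and that the skiplist holds n elements. Following Pugh's notation, L(n) = log_{1/p} n is the level at which about one tower is expected to remain.

$$
\Pr[\text{height} \ge k] = p^{k-1} = 2^{-(k-1)}, \qquad
\mathbb{E}\big[\#\{\text{towers with height} \ge k\}\big] = n \cdot 2^{-(k-1)}
$$

The expected count drops to 1 at $k = \log_2 n + 1 = L(n) + 1$, so only about $\log_2 n$ levels carry useful structure. Retracing a search path backwards from the bottom level, each step climbs a level with probability $p$ and moves left otherwise, which is how Pugh bounds the expected number of steps:

$$
\mathbb{E}[\text{search cost}] \le \frac{L(n)}{p} + \frac{1}{1-p} = 2\log_2 n + 2 \quad \text{for } p = \tfrac{1}{2}
$$

Insertion and deletion cost one search plus relinking one pointer per level of the affected tower, and the expected tower height is $\sum_{k \ge 1} p^{k-1} = \frac{1}{1-p} = 2$, so they are also O(log n) in expectation.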
A skiplist is made up of elements attached to towers. Each tower in a skiplist is linked, at each level of the tower, to the next tower of at least that height, forming a group of linked lists, one for each level of the skiplist. When an element is inserted into the skiplist, its tower height is determined randomly via successive coin flips (a tower of height n occurs roughly once in every 2^n insertions). Once its height has been determined, the element is linked into the linked list at each level of its tower. The towers support a search that behaves much like binary search: it starts at the top level of the tallest tower and works towards the bottom, using the tower links at each level to decide whether to move forward in the list or down the tower to a lower level.
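To make the tower mechanics concrete, here is a minimal, single-threaded Python sketch of the structure just described. It is not MemSQL's implementation (which is lock-free and far more involved); the class and method names, the `MAX_HEIGHT` cap and the fair coin (p = 1/2) are assumptions chosen for illustration.

```python
# Minimal single-threaded skiplist sketch (hypothetical names; not MemSQL's code).
import random
from typing import Iterator, List, Optional


class Node:
    """One element plus its tower: next[i] is the successor at level i."""

    def __init__(self, key: Optional[int], height: int) -> None:
        self.key = key
        self.next: List[Optional["Node"]] = [None] * height


class SkipList:
    MAX_HEIGHT = 32  # assumed cap on tower height, for illustration only

    def __init__(self, seed: Optional[int] = None) -> None:
        self.rng = random.Random(seed)
        self.head = Node(None, self.MAX_HEIGHT)  # sentinel tower, full height
        self.height = 1  # number of levels currently in use

    def random_height(self) -> int:
        # Successive fair coin flips: each "heads" adds one level, so
        # P(height >= k) = 1 / 2**(k - 1).
        h = 1
        while h < self.MAX_HEIGHT and self.rng.random() < 0.5:
            h += 1
        return h

    def _find_predecessors(self, key: int) -> List[Node]:
        # Start at the top level; move forward while the next key is
        # smaller than the target, otherwise drop down one level.
        preds = [self.head] * self.MAX_HEIGHT
        node = self.head
        for level in range(self.height - 1, -1, -1):
            nxt = node.next[level]
            while nxt is not None and nxt.key < key:
                node = nxt
                nxt = node.next[level]
            preds[level] = node
        return preds

    def contains(self, key: int) -> bool:
        candidate = self._find_predecessors(key)[0].next[0]
        return candidate is not None and candidate.key == key

    def insert(self, key: int) -> bool:
        preds = self._find_predecessors(key)
        candidate = preds[0].next[0]
        if candidate is not None and candidate.key == key:
            return False  # this sketch keeps keys unique
        h = self.random_height()
        self.height = max(self.height, h)
        node = Node(key, h)
        # Once the height is known, link the tower into each level's list.
        for level in range(h):
            node.next[level] = preds[level].next[level]
            preds[level].next[level] = node
        return True

    def __iter__(self) -> Iterator[int]:
        # Level 0 is a plain sorted linked list holding every element.
        node = self.head.next[0]
        while node is not None:
            yield node.key
            node = node.next[0]


if __name__ == "__main__":
    sl = SkipList(seed=42)
    for k in [30, 10, 50, 20, 40]:
        sl.insert(k)
    print(list(sl))  # [10, 20, 30, 40, 50]
    print(sl.contains(20), sl.contains(25))  # True False
```

Because the search only ever moves forward or down, the same predecessor walk serves both lookups and inserts, and no rebalancing step follows an insert: the random tower heights keep the structure balanced in expectation.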
Why a Skiplist Index for MemSQL
1) Memory Optimized
MemSQL is a memory-optimized database, so we can assume that data always resides in memory. Being memory-optimized means indexes are free to use pointers to rows directly, without the need for indirection. In a traditional database, rows need to be addressable by some means other than a pointer, because their primary storage location is on disk. This indirection usually takes the form of a cache of memory-resident pages (often called a buffer pool) that is consulted to find a particular row's in-memory address, or to read it into memory from disk if needed. This indirection is expensive and is usually done at the page level (e.g., 8 KB at a time in SQL Server). MemSQL doesn't have to worry about this overhead, which makes data structures that refer directly to individual rows, as a skiplist does, feasible. Dereferencing a pointer is much less expensive than looking up a page in the buffer pool.
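The difference in indirection can be sketched in a few lines of Python. This is a toy model, not how SQL Server, InnoDB or MemSQL actually lay out storage: `Row`, `BufferPool`, the `(page_id, slot)` addressing and the dictionaries standing in for an index are all hypothetical simplifications. The disk-style lookup must resolve an address through the buffer pool (reading a whole page on a miss), while the memory-style lookup follows a direct reference to the row.

```python
# Toy model of row addressing (hypothetical names; not any real engine's layout).
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Row:
    id: int
    name: str


class BufferPool:
    """Hypothetical, greatly simplified buffer pool: a cache of pages."""

    def __init__(self, disk: Dict[int, List[Row]]) -> None:
        self.disk = disk  # page_id -> page (list of rows); stands in for disk
        self.cache: Dict[int, List[Row]] = {}
        self.disk_reads = 0

    def get_page(self, page_id: int) -> List[Row]:
        page = self.cache.get(page_id)
        if page is None:  # miss: "read" the whole page into memory
            self.disk_reads += 1
            page = self.disk[page_id]
            self.cache[page_id] = page
        return page


def lookup_disk_style(index: Dict[int, Tuple[int, int]], pool: BufferPool, key: int) -> Row:
    # The index stores a (page_id, slot) address rather than the row,
    # so every access is resolved through the buffer pool first.
    page_id, slot = index[key]
    return pool.get_page(page_id)[slot]


def lookup_memory_style(index: Dict[int, Row], key: int) -> Row:
    # The index stores a direct reference to the row: one dereference.
    return index[key]


if __name__ == "__main__":
    rows = [Row(i, f"row{i}") for i in range(4)]
    disk = {0: rows[0:2], 1: rows[2:4]}  # two rows per page
    disk_index = {r.id: (r.id // 2, r.id % 2) for r in rows}
    mem_index = {r.id: r for r in rows}
    pool = BufferPool(disk)
    assert lookup_disk_style(disk_index, pool, 3) is lookup_memory_style(mem_index, 3)
    print(pool.disk_reads)  # 1
```

Both paths return the same row object; the disk-style path simply pays for an extra page lookup on every access and a page read on every miss.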
2) Simple
MemSQL's skiplist implementation is about 1,500 lines of code, including comments. Having recently spent some time in both SQL Server's and InnoDB's B-tree implementations, I can tell you they are each close to 50 times larger in terms of lines of code, and both have many more moving parts. For example, a B-tree has to deal with page splitting and page compaction, while a skiplist has no equivalent operations. The first generally available build of MemSQL took a little over a year to build and stabilize. This feat wouldn't have been possible with a more complex indexing data structure.
3) Lock-free
A lock-free or non-blocking algorithm is one in which some thread is always able to make progress, no matter how the threads' executions are interleaved by the OS. MemSQL is designed to support highly concurrent workloads running on hardware with many cores, and these goals make lock-free algorithms desirable for MemSQL. Writing a thread-safe, lock-free skiplist is now a solved problem in academia; a number of papers have been published on the subject in the past decade. It's much harder to make a lock-free skiplist perform well when there is low contention (i.e., a single thread iterating over the entire skiplist with no other concurrent operations executing). Optimizing this case is a more active area of research. Our approach to solving this particular problem is a topic for another time.
B-trees, on the other hand, have historically needed a complex locking scheme to achieve thread safety. Some newer lock-free, B-tree-like data structures, such as the Bw-tree, have recently been proposed that avoid this problem. Again, the complexity of the Bw-tree far outpaces that of a skiplist or even a traditional B-tree (it requires more complex compaction algorithms than a B-tree and depends on a log-structured storage system to persist its pages). The simplicity of the skiplist is what makes it well suited for a lock-free implementation.
4) Fast
The speed of a skiplist comes mostly from its simplicity. MemSQL executes fewer instructions to insert, delete, search or iterate than other databases do.
5) Flexible
Skiplists also support some extra operations that are useful for query processing and that aren't readily implementable in a balanced tree.