数据库CH (12)

更新时间:2023-07-03 10:31:45 阅读: 评论:0

C H A P T E R12
Query Processing
This chapter describes the process by which queries are executed efficiently by
a databa system.The chapter starts off with measures of cost,then proceeds
to algorithms for evaluation of relational algebra operators and expressions.This
chapter applies concepts from Chapter2,6,10,and11.
Query processing algorithms can be covered without tedious and distracting details of size estimation.Although size estimation is covered later,in Chapter13,
the prentation there has been simplified by omitting some details.Instructors
can choo to cover query processing but omit query optimization,without loss
of continuity with later chapters.
Exercis
12.10Suppo you need to sort a relation of40gigabytes,with4kilobyte blocks,
using a memory size of40megabytes.Suppo the cost of a ek is5
milliconds,while the disk transfer rate is40megabytes per cond.
a.Find the cost of sorting the relation,in conds,with b b=1and with
b b=100.
b.In each ca,how many merge pass are required?
c.Suppo aflash storage device is ud instead of a disk,and it has a
ek time of1microcond,and a transfer rate of40megabytes per
cond.Recompute the cost of sorting the relation,in conds,with
b b=1and with b b=100,in this tting.
Answer:
人事工作总结
教资考试真题a.The number of blocks in the main memory buffer available for sort-
=104.The number of blocks containing records of
ing(M)is40×106
3
=107.Then the cost of sorting the
the given relation(b r)is40×109
3
relation is:(Number of disk eks×Disk ek cost)+(Number of block
transfers×Block transfer time).Here Disk ek cost is5x10−3conds
103
104Chapter12Query Processing
香怜
and Block transfer time is10−4conds(4×103
40×106).The number of block
transfers is independent of b b and is equal to25×106.
•Ca1:b b=1
Using the equation in Section12.4,the number of disk eks is
5002×103.Therefore the cost of sorting the relation is:(5002×
103)×(5×10−3)+(25×106)×(10−4)=25×103+2500=27500
conds.
•Ca2:b b=100
The number of disk eks is:52×103.Therefore the cost of
sorting the relation is:(52×103)×(5×10−3)+(25×106)×(10−4)
=260+2500=2760conds.
b.Disk storage The number of merge pass required is given by
⌈log M−1(b r)⌉.This is independent of b b.Substituting the values above,
we get⌈log104−1(107
4
)⌉which evaluates to1.
c.Flash storage:
•Ca1:b b=1
红颜知己和蓝颜知己的区别The number of disk eks is:5002×103.Therefore the cost of
sorting the relation is:(5002×103)×(1×10−6)+(25×106)×
(10−4)=5.002+25002506conds.
•Ca2:b b=100
The number of disk eks is:52×103.Therefore the cost of sorting
the relation is:(52×103)×(1×10−6)+(25×106)×(10−4)=
0.052+2500,which is approx=2500conds.
12.11Consider the following extended relational-algebra operators.Describe
how to implement each operation using sorting,and using hashing.
a.Semijoin(⋉␪):r⋉␪s is defined as R(r1␪s),where R is the t
of attributes in the schema of r;that it lects tho tuples r i in r for
which there is a tuple s j in s such that r i and s j satisfy predicate␪.
b.Anti-mijoin(¯⋉␪):r¯⋉␪s is defined as r− R(r1␪s);that it it lects
tho tuples r i in r for which there is no tuple s j in s such that r i and
s j satisfy predicate␪.
Answer:As in the ca of join algorithms,mijoin and anti-mijoin can be done efficiently if the join conditions are equijoin conditions.We describe below how to efficiently handle the ca of equijoin conditions using sorting and hashing.With arbitrary join conditions,sorting and hashing cannot be ud;(block)nested loops join needs to be ud instead.
a.Semijoin:
•Semijoin using Sorting:Sort both r and s on the join attributes in␪.Perform a scan of both r and s similar to the merge al-
Exercis105 gorithm and add tuples of r to the result whenever the join
attributes of the current tuples of r and s match.
•Semijoin using Hashing:Create a hash index in s on the join attributes in␪.Iterate over r,and for each distinct value of the
join attributes,perform a hash lookup in s.If the hash lookup
returns a value,add the current tuple of r to the result.
Note that if r and s are large,they can be partitioned on the
join attributesfirst,and the above procedure applied on each
partition.If r is small but s is large,a hash index can be built on
r,and probed using s;and if an s tuple matches an r tuple,the
r tuple can be output and deleted from the hash index.
b.Anti-mijoin:
•Anti-Semijoin using Sorting:Sort both r and s on the join attributes in␪.Perform a scan of both r and s similar to the
红高梁merge algorithm and add tuples of r to the result if no tuple of
s satisfies the join predicate for the corresponding tuple of r.
•Anti-Semijoin using Hashing:Create a hash index in s on the join attributes in␪.Iterate over r,and for each distinct value
of the join attributes,perform a hash lookup in s.If the hash
lookup returns a null value,add the current tuple of r to the
result.
As for mijoin,partitioning can be ud if r and s are large.An
index on r can be ud instead of an index on s,but then when
an s tuple matches an r tuple,the r tuple is deleted from the
index.After processing all s tuples,all remaining r tuples in the
index are output as the result of the anti-mijoin operation. 12.12Why is it not desirable to force urs to make an explicit choice of a query-
processing strategy?Are there cas in which it is desirable for urs to be aware of the costs of co
mpeting query-processing strategies?Explain your answer.
Answer:In general it is not desirable to force urs to choo a query processing strategy becau naive urs might lect an inefficient strat-egy.The reason urs would make poor choices about processing queries is that they would not know how a relation is stored,nor about its indices.
It is unreasonable to force urs to be aware of the details since ea of u is a major object of databa query languages.If urs are aware of the costs of different strategies they could write queries efficiently,thus helping performance.This could happen if experts were using the system.
12.13Design a variant of the hybrid merge-join algorithm for the ca where
both relations are not physically sorted,but both have a sorted condary index on the join attributes.
Answer:We merge the leaf entries of thefirst sorted condary index with the leaf entries of the cond sorted condary index.The resultfile contains pairs of address,thefirst address in each pair pointing to a
106Chapter12Query Processing
tuple in thefirst relation,and the cond address pointing to a tuple in
the cond relation.
This resultfile isfirst sorted on thefirst relation’s address.The relation
is then scanned in physical storage order,and address in the resultfile
are replaced by the actual tuple values.Then the resultfile is sorted on
the cond relation’s address,allowing a scan of the cond relation in
physical storage order to complete the join.
12.14Estimate the number of block transfers and eks required by your solu-
tion to Exerci12.13for r11r2,where r1and r2are as defined in Practice
Exerci12.3.
Answer:r1occupies800blocks,and r2occupies1500blocks.Let there be
n pointers per index leaf block(we assume that both the indices have leaf
blocks and pointers of equal sizes).Let us assume M pages of memory,
M<800.r1’s index will need B1=⌈20000
n ⌉leaf blocks,and r2’s index
will need B2=⌈45000⌉leaf blocks.Therefore the merge join will need B3=B1+B2access,without output.The number of output tuples
is estimated as n o=⌈20000∗45000
12⌉.Each output tuple will need two
pointers,so the number of blocks of join output will be B o1=⌈n o⌉.Hence the join needs B j=B3+B o1disk block access.
蛲虫Now we have to replace the pointers by actual tuples.For thefirst sorting,
B s1=B o1(2⌈log M−1(B o1/M)⌉+2)disk access are needed,including the
writing of output to disk.The number of blocks of r1which have to be accesd in order to replace the
pointers with tuple values is min(800,n o).
Let n1pairs of the form(r1tuple,pointer to r2)fit in one disk block.Therefore the intermediate result after replacing the r1pointers will occupy B o2=⌈(n o/n1)⌉blocks.Hence thefirst pass of replacing the r1-pointers will cost
B f=B s1+B o1+min(800,n o)+B o2disk access.
The cond pass for replacing the r2-pointers has a similar analysis.Let n2 tuples of thefinal joinfit in one block.Then the cond pass of replacing the r2-pointers will cost B s=B s2+B o2+min(1500,n o)disk access,where
B s2=B o2(2⌈log M−1(B o2/M)⌉+2).
Hence the total number of disk access for the join is B j+B f+B s,and the number of pages of output is⌈n o/n2⌉.
12.15The hash-join algorithm as described in Section12.5.5computes the nat-
ural join of two relations.Describe how to extend the hash-join algorithm to compute the natural left o
uter join,the natural right outer join and the natural full outer join.(Hint:Keep extra information with each tuple in the hash index,to detect whether any tuple in the probe relation matches the tuple in the hash index.)Try out your algorithm on the takes and student relations.
Answer:For the probe relation tuple t r under consideration,if no matching tuple is found in the build relation’s hash partition,it is padded with nulls and included in the result.This will give us the natural left outer join t r1t s.To get the natural right outer join t r1t s,we can keep
Exercis107
a booleanflag with each tuple in the current build relation partition s i
residing in memory,and t it whenever any probe relation tuple matches with it.When we arefinished with s i,all the tuples in it which do not have theirflag t,are padded with nulls and included in the result.To get the natural full outer join,we do both the above operations together.
To try out our algorithm,we u the sample student and takes relations of Figures A.9and A.10.Let us assume that there is enough memory to hold three tuples of the build relation plus a hash index for tho three tuples.
We u takes as the build relation.We u the simple hashing function which returns the student ID mod10.Taking the partition corresponding to value7,we get r1={(“Snow”)},and s1=␾.The tuple in the probe relation partition will have no matching tuple,so(“70557”,“Snow”,“Physics”,“0”,null)is outputted.Proceeding in a similar way,we process all the partitions and complete the join.
12.16Pipelining is ud to avoid writing intermediate results to disk.Suppo
you need to sort relation r using sort–merge and merge-join the result with an already sorted relation s.
a.Describe how the output of the sort of r can be pipelined to the
merge join without being written back to disk.
b.The same idea is applicable even if both inputs to the merge join
are the outputs of sort–merge operations.However,the available
memory has to be shared between the two merge operations(the
merge-join algorithm itlf needs very little memory).What is the
effect of having to share memory on the cost of each sort–merge
operation?
Answer:世说新语的作者是谁
a.Using pipelining,output from the sorting operation on r is written千里之外mv
to a buffer B.When B is full,the merge-join process tuples from B,
joining them with tuples from s until B is empty.At this point,the
sorting operation is resumed and B is refilled.This process continues
until the merge-join is complete.
b.If the sort–merge operations are run in parallel and memory is
shared equally between the two,each operation will have only M/2
frames for its memory buffer.This may increa the number of runs
required to merge the data.
12.17Write pudocode for an iterator that implements a version of the sort
–merge algorithm where the result of thefinal merge is pipelined to its consumers.Your pudocode must define the standard iterator functions open(),next(),and clo().Show what state information the iterator must maintain between calls.
Answer:Let M denote the number of blocks in the main memory buffer available for sorting.For simplicity we assume that there are less than M

本文发布于:2023-07-03 10:31:45,感谢您对本站的认可!

本文链接:https://www.wtabcd.cn/fanwen/fan/89/1065836.html

版权声明:本站内容均来自互联网,仅供演示用,请勿用于商业和其他非法用途。如果侵犯了您的权益请与我们联系,我们将在24小时内删除。

标签:知己   考试   总结
相关文章
留言与评论(共有 0 条评论)
   
验证码:
推荐文章
排行榜
Copyright ©2019-2022 Comsenz Inc.Powered by © 专利检索| 网站地图