数据库CH (12)

更新时间:2023-07-03 10:31:45 阅读：评论：0

C H A P T E R12

Query Processing

This chapter describes the process by which queries are executed efﬁciently by

a databa system.The chapter starts off with measures of cost,then proceeds

to algorithms for evaluation of relational algebra operators and expressions.This

chapter applies concepts from Chapter2,6,10,and11.

Query processing algorithms can be covered without tedious and distracting details of size estimation.Although size estimation is covered later,in Chapter13,

the prentation there has been simpliﬁed by omitting some details.Instructors

can choo to cover query processing but omit query optimization,without loss

of continuity with later chapters.

Exercis

12.10Suppo you need to sort a relation of40gigabytes,with4kilobyte blocks,

using a memory size of40megabytes.Suppo the cost of a ek is5

milliconds,while the disk transfer rate is40megabytes per cond.

a.Find the cost of sorting the relation,in conds,with b b=1and with

b b=100.

b.In each ca,how many merge pass are required?

c.Suppo aﬂash storage device is ud instead of a disk,and it has a

ek time of1microcond,and a transfer rate of40megabytes per

cond.Recompute the cost of sorting the relation,in conds,with

b b=1and with b b=100,in this tting.

Answer:

人事工作总结

教资考试真题a.The number of blocks in the main memory buffer available for sort-

=104.The number of blocks containing records of

ing(M)is40×106

=107.Then the cost of sorting the

the given relation(b r)is40×109

relation is:(Number of disk eks×Disk ek cost)+(Number of block

transfers×Block transfer time).Here Disk ek cost is5x10−3conds

103

104Chapter12Query Processing

香怜

and Block transfer time is10−4conds(4×103

40×106).The number of block

transfers is independent of b b and is equal to25×106.

•Ca1:b b=1

Using the equation in Section12.4,the number of disk eks is

5002×103.Therefore the cost of sorting the relation is:(5002×

103)×(5×10−3)+(25×106)×(10−4)=25×103+2500=27500

conds.

•Ca2:b b=100

The number of disk eks is:52×103.Therefore the cost of

sorting the relation is:(52×103)×(5×10−3)+(25×106)×(10−4)

=260+2500=2760conds.

b.Disk storage The number of merge pass required is given by

⌈log M−1(b r)⌉.This is independent of b b.Substituting the values above,

we get⌈log104−1(107

)⌉which evaluates to1.

c.Flash storage:

•Ca1:b b=1

红颜知己和蓝颜知己的区别The number of disk eks is:5002×103.Therefore the cost of

sorting the relation is:(5002×103)×(1×10−6)+(25×106)×

(10−4)=5.002+25002506conds.

•Ca2:b b=100

The number of disk eks is:52×103.Therefore the cost of sorting

the relation is:(52×103)×(1×10−6)+(25×106)×(10−4)=

0.052+2500,which is approx=2500conds.

12.11Consider the following extended relational-algebra operators.Describe

how to implement each operation using sorting,and using hashing.

a.Semijoin(⋉␪):r⋉␪s is deﬁned as R(r1␪s),where R is the t

of attributes in the schema of r;that it lects tho tuples r i in r for

which there is a tuple s j in s such that r i and s j satisfy predicate␪.

b.Anti-mijoin(¯⋉␪):r¯⋉␪s is deﬁned as r− R(r1␪s);that it it lects

tho tuples r i in r for which there is no tuple s j in s such that r i and

s j satisfy predicate␪.

Answer:As in the ca of join algorithms,mijoin and anti-mijoin can be done efﬁciently if the join conditions are equijoin conditions.We describe below how to efﬁciently handle the ca of equijoin conditions using sorting and hashing.With arbitrary join conditions,sorting and hashing cannot be ud;(block)nested loops join needs to be ud instead.

a.Semijoin:

•Semijoin using Sorting:Sort both r and s on the join attributes in␪.Perform a scan of both r and s similar to the merge al-

Exercis105 gorithm and add tuples of r to the result whenever the join

attributes of the current tuples of r and s match.

•Semijoin using Hashing:Create a hash index in s on the join attributes in␪.Iterate over r,and for each distinct value of the

join attributes,perform a hash lookup in s.If the hash lookup

returns a value,add the current tuple of r to the result.

Note that if r and s are large,they can be partitioned on the

join attributesﬁrst,and the above procedure applied on each

partition.If r is small but s is large,a hash index can be built on

r,and probed using s;and if an s tuple matches an r tuple,the

r tuple can be output and deleted from the hash index.

b.Anti-mijoin:

•Anti-Semijoin using Sorting:Sort both r and s on the join attributes in␪.Perform a scan of both r and s similar to the

红高梁merge algorithm and add tuples of r to the result if no tuple of

s satisﬁes the join predicate for the corresponding tuple of r.

•Anti-Semijoin using Hashing:Create a hash index in s on the join attributes in␪.Iterate over r,and for each distinct value

of the join attributes,perform a hash lookup in s.If the hash

lookup returns a null value,add the current tuple of r to the

result.

As for mijoin,partitioning can be ud if r and s are large.An

index on r can be ud instead of an index on s,but then when

an s tuple matches an r tuple,the r tuple is deleted from the

index.After processing all s tuples,all remaining r tuples in the

index are output as the result of the anti-mijoin operation. 12.12Why is it not desirable to force urs to make an explicit choice of a query-

processing strategy?Are there cas in which it is desirable for urs to be aware of the costs of co

mpeting query-processing strategies?Explain your answer.

Answer:In general it is not desirable to force urs to choo a query processing strategy becau naive urs might lect an inefﬁcient strat-egy.The reason urs would make poor choices about processing queries is that they would not know how a relation is stored,nor about its indices.

It is unreasonable to force urs to be aware of the details since ea of u is a major object of databa query languages.If urs are aware of the costs of different strategies they could write queries efﬁciently,thus helping performance.This could happen if experts were using the system.

12.13Design a variant of the hybrid merge-join algorithm for the ca where

both relations are not physically sorted,but both have a sorted condary index on the join attributes.

Answer:We merge the leaf entries of theﬁrst sorted condary index with the leaf entries of the cond sorted condary index.The resultﬁle contains pairs of address,theﬁrst address in each pair pointing to a

106Chapter12Query Processing

tuple in theﬁrst relation,and the cond address pointing to a tuple in

the cond relation.

This resultﬁle isﬁrst sorted on theﬁrst relation’s address.The relation

is then scanned in physical storage order,and address in the resultﬁle

are replaced by the actual tuple values.Then the resultﬁle is sorted on

the cond relation’s address,allowing a scan of the cond relation in

physical storage order to complete the join.

12.14Estimate the number of block transfers and eks required by your solu-

tion to Exerci12.13for r11r2,where r1and r2are as deﬁned in Practice

Exerci12.3.

Answer:r1occupies800blocks,and r2occupies1500blocks.Let there be

n pointers per index leaf block(we assume that both the indices have leaf

blocks and pointers of equal sizes).Let us assume M pages of memory,

M<800.r1’s index will need B1=⌈20000

n ⌉leaf blocks,and r2’s index

will need B2=⌈45000⌉leaf blocks.Therefore the merge join will need B3=B1+B2access,without output.The number of output tuples

is estimated as n o=⌈20000∗45000

12⌉.Each output tuple will need two

pointers,so the number of blocks of join output will be B o1=⌈n o⌉.Hence the join needs B j=B3+B o1disk block access.

蛲虫Now we have to replace the pointers by actual tuples.For theﬁrst sorting,

B s1=B o1(2⌈log M−1(B o1/M)⌉+2)disk access are needed,including the

writing of output to disk.The number of blocks of r1which have to be accesd in order to replace the

pointers with tuple values is min(800,n o).

Let n1pairs of the form(r1tuple,pointer to r2)ﬁt in one disk block.Therefore the intermediate result after replacing the r1pointers will occupy B o2=⌈(n o/n1)⌉blocks.Hence theﬁrst pass of replacing the r1-pointers will cost

B f=B s1+B o1+min(800,n o)+B o2disk access.

The cond pass for replacing the r2-pointers has a similar analysis.Let n2 tuples of theﬁnal joinﬁt in one block.Then the cond pass of replacing the r2-pointers will cost B s=B s2+B o2+min(1500,n o)disk access,where

B s2=B o2(2⌈log M−1(B o2/M)⌉+2).

Hence the total number of disk access for the join is B j+B f+B s,and the number of pages of output is⌈n o/n2⌉.

12.15The hash-join algorithm as described in Section12.5.5computes the nat-

ural join of two relations.Describe how to extend the hash-join algorithm to compute the natural left o

uter join,the natural right outer join and the natural full outer join.(Hint:Keep extra information with each tuple in the hash index,to detect whether any tuple in the probe relation matches the tuple in the hash index.)Try out your algorithm on the takes and student relations.

Answer:For the probe relation tuple t r under consideration,if no matching tuple is found in the build relation’s hash partition,it is padded with nulls and included in the result.This will give us the natural left outer join t r1t s.To get the natural right outer join t r1t s,we can keep

Exercis107

a booleanﬂag with each tuple in the current build relation partition s i

residing in memory,and t it whenever any probe relation tuple matches with it.When we areﬁnished with s i,all the tuples in it which do not have theirﬂag t,are padded with nulls and included in the result.To get the natural full outer join,we do both the above operations together.

To try out our algorithm,we u the sample student and takes relations of Figures A.9and A.10.Let us assume that there is enough memory to hold three tuples of the build relation plus a hash index for tho three tuples.

We u takes as the build relation.We u the simple hashing function which returns the student ID mod10.Taking the partition corresponding to value7,we get r1={(“Snow”)},and s1=␾.The tuple in the probe relation partition will have no matching tuple,so(“70557”,“Snow”,“Physics”,“0”,null)is outputted.Proceeding in a similar way,we process all the partitions and complete the join.

12.16Pipelining is ud to avoid writing intermediate results to disk.Suppo

you need to sort relation r using sort–merge and merge-join the result with an already sorted relation s.

a.Describe how the output of the sort of r can be pipelined to the

merge join without being written back to disk.

b.The same idea is applicable even if both inputs to the merge join

are the outputs of sort–merge operations.However,the available

memory has to be shared between the two merge operations(the

merge-join algorithm itlf needs very little memory).What is the

effect of having to share memory on the cost of each sort–merge

operation?

Answer:世说新语的作者是谁

a.Using pipelining,output from the sorting operation on r is written千里之外mv

to a buffer B.When B is full,the merge-join process tuples from B,

joining them with tuples from s until B is empty.At this point,the

sorting operation is resumed and B is reﬁlled.This process continues

until the merge-join is complete.

b.If the sort–merge operations are run in parallel and memory is

shared equally between the two,each operation will have only M/2

frames for its memory buffer.This may increa the number of runs

required to merge the data.

12.17Write pudocode for an iterator that implements a version of the sort

–merge algorithm where the result of theﬁnal merge is pipelined to its consumers.Your pudocode must deﬁne the standard iterator functions open(),next(),and clo().Show what state information the iterator must maintain between calls.

Answer:Let M denote the number of blocks in the main memory buffer available for sorting.For simplicity we assume that there are less than M

本文发布于:2023-07-03 10:31:45，感谢您对本站的认可！

本文链接：https://www.wtabcd.cn/fanwen/fan/89/1065836.html

上一篇：关于垃圾分类英语作文120词

下一篇：2022年江苏省苏州张家港市中考一模英语试题(原版)

标签：知己考试总结

留言与评论（共有 0 条评论）