Random sampling in Oracle
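The requirement: every few hours, the program takes 100 rows from a table and checks whether they are correct.
Written the usual way, every run returns exactly the same rows.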
For example, with select * from ba_part where rownum<10, Oracle uses the same execution plan on every run, so the data is scanned in the same order and the same rows come back each time.
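There are generally two ways to fetch rows at random that you will find online.
The first is to sort by a random number. It does return correct results, but with large data volumes the sort is very time-consuming. My tables mostly hold more than a million rows, so I had to rule this approach out. Sorting by a random number effectively treats the random value as an extra column and sorts on it. The corresponding statement is: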
SELECT t.*,DBMS_RANDOM.value FROM TableName1 t ORDER BY DBMS_RANDOM.value;
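To make the cost of this approach concrete, here is a small Python model of "ORDER BY DBMS_RANDOM.value, then keep the first n rows". It is only an analogy, assuming Python's random.random() can stand in for DBMS_RANDOM.value and a list of integers can stand in for the table; the function name random_sort_sample is hypothetical. The point it shows: every row gets a key and the whole set is sorted, so the work grows with the table size, not with the 100 rows actually needed.

```python
import random

def random_sort_sample(rows, n, rng=None):
    """Model of ORDER BY DBMS_RANDOM.value followed by taking the first n rows.

    Every row gets a random key (like the extra DBMS_RANDOM.value column),
    then the whole set is sorted on that key: O(N log N) work for N rows,
    no matter how small n is.
    """
    if rng is None:
        rng = random.Random()
    keyed = [(rng.random(), row) for row in rows]  # one random key per row
    keyed.sort(key=lambda pair: pair[0])            # full sort on the random key only
    return [row for _, row in keyed[:n]]            # keep the first n rows

if __name__ == "__main__":
    table = list(range(1_000_000))  # stand-in for a table with a million rows
    picked = random_sort_sample(table, 100, random.Random(42))
    print(len(picked), len(set(picked)))  # 100 distinct rows
```

Even though only 100 rows are kept, all one million keys are generated and sorted, which is the cost that made this approach impractical for large tables.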
The second is the SAMPLE keyword, which performs random sampling. The official documentation describes the SAMPLE clause very clearly:
sample_clause: The sample_clause lets you instruct the database to select from a random sample of data from the table, rather than from the entire table.
BLOCK: BLOCK instructs the database to attempt to perform random block sampling instead of random row sampling. (BLOCK means the database reads random blocks rather than random rows.)
Block sampling is possible only during full table scans or index fast full scans. If a more efficient execution path exists, then Oracle Database does not perform block sampling. If you want to guarantee block sampling for a particular table or index, then use the FULL or INDEX_FFS hint. (Random block reads only apply during a full table scan or an index fast full scan.)
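The difference between row sampling and block sampling can be sketched in Python. This is a toy model, not Oracle's implementation: it assumes a hypothetical layout where every block holds exactly 50 rows (real blocks hold varying numbers of rows), and the helper names make_table, row_sample and block_sample are made up for illustration. Row sampling decides row by row; block sampling decides block by block and then returns every row in a chosen block.

```python
import random

def make_table(num_rows, rows_per_block):
    """Split row ids 0..num_rows-1 into fixed-size blocks (hypothetical layout)."""
    rows = list(range(num_rows))
    return [rows[i:i + rows_per_block] for i in range(0, num_rows, rows_per_block)]

def row_sample(blocks, percent, rng):
    """Row sampling: every row is kept independently with probability percent/100."""
    p = percent / 100
    return [row for block in blocks for row in block if rng.random() < p]

def block_sample(blocks, percent, rng):
    """Block sampling: every block is kept with probability percent/100,
    and a kept block contributes all of its rows."""
    p = percent / 100
    return [row for block in blocks if rng.random() < p for row in block]

if __name__ == "__main__":
    table = make_table(100_000, 50)  # assumed: 50 rows per block
    rng = random.Random(7)
    by_row = row_sample(table, 1, rng)
    by_block = block_sample(table, 1, rng)
    print("row sampling:", len(by_row), "rows, first few:", by_row[:5])
    print("block sampling:", len(by_block), "rows, first few:", by_block[:5])
```

In this model block sampling touches far fewer blocks, but the rows it returns come in runs of consecutive ids and the row count moves in steps of the block size, so the sample is less evenly spread than with row sampling.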
sample_percent: For sample_percent, specify the percentage of the total row or block count to be included in the sample. The value must be in the range .000001 to, but not including, 100. This percentage indicates the probability of each row, or each cluster of rows in the case of block sampling, being selected as part of the sample. It does not mean that the database will retrieve exactly sample_percent of the rows of table. (The percentage in the SAMPLE clause is the probability that each row or block is read. For example, on a 100-row table, SAMPLE(10) does not necessarily return 10 rows; in the extreme case it could return 0 rows or 100 rows. With basic probability theory this is easy to see.)
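The SAMPLE(10) example on a 100-row table can be checked with a quick simulation. This sketch assumes each row is kept independently with probability percent/100, which is how the documentation describes row sampling; it models the behavior rather than calling Oracle, and sample_count is a hypothetical helper name.

```python
import random
from collections import Counter

def sample_count(num_rows, percent, rng):
    """Number of rows returned when each row is kept with probability percent/100."""
    p = percent / 100
    return sum(1 for _ in range(num_rows) if rng.random() < p)

if __name__ == "__main__":
    rng = random.Random(2024)
    # 10,000 simulated runs of "SAMPLE(10)" against a 100-row table
    counts = [sample_count(100, 10, rng) for _ in range(10_000)]
    print("mean rows:", sum(counts) / len(counts))  # close to 100 * 10% = 10
    print("min/max rows:", min(counts), max(counts))
    print("runs returning exactly 10 rows:", Counter(counts)[10])
```

The average settles near 10, but individual runs scatter around it, and only a minority of runs return exactly 10 rows.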
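SEED seed_value: Specify this clause to instruct the database to attempt to return the same sample from one execution to the next. The seed_value must be an integer between 0 and 4294967295. If you omit this clause, then the resulting sample will change from one execution to the next. (The same seed value returns the same sample.)
I adopted this method. It has one edge case: the result may contain fewer than 100 rows. But because the tables are large, with tens of millions of rows, raising the sampling percentage appropriately makes this essentially a low-probability event. And if the edge case does occur, it does no real harm. So I left it at that.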
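How small "low-probability" is can be estimated. Assuming row sampling keeps each row independently with probability p = percent/100, the number of returned rows follows a binomial distribution, and the risk of getting fewer than 100 rows is its lower tail. The sketch below computes that tail in log space; the table size of 10,000,000 rows and the percentages tried are illustrative assumptions, and prob_fewer_than is a hypothetical helper.

```python
import math

def prob_fewer_than(target, num_rows, percent):
    """P(X < target) for X ~ Binomial(num_rows, percent/100).

    Terms are computed in log space (lgamma) so tables with millions
    of rows do not overflow the binomial coefficients.
    """
    p = percent / 100
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for k in range(target):  # k = 0, 1, ..., target - 1
        log_term = (math.lgamma(num_rows + 1) - math.lgamma(k + 1)
                    - math.lgamma(num_rows - k + 1)
                    + k * log_p + (num_rows - k) * log_q)
        total += math.exp(log_term)
    return total

if __name__ == "__main__":
    table_rows = 10_000_000  # assumed table size ("tens of millions of rows")
    for percent in (0.001, 0.0012, 0.002):  # expected rows: 100, 120, 200
        expected = table_rows * percent / 100
        risk = prob_fewer_than(100, table_rows, percent)
        print(f"SAMPLE({percent}): expected {expected:.0f} rows, P(< 100 rows) = {risk:.3g}")
```

If the expected count is set to exactly 100, roughly half the runs fall short; raising the percentage so the expected count sits well above 100 makes a short result rare, which matches the "raise the sampling percentage appropriately" reasoning above.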