Codeforces科学刷题指南
简要介绍如何科学地刷算法题,来提⾼⾃⼰解决问题的能⼒,并利⽤爬⾍抓取Codeforces的题库,来分析题⽬难度以及算法分类的关系
⽆论做什么事,多尝试、找套路、然后刻意练习都是⾄关重要的。对信息科学竞赛(Olympiad in Informatics)爱好者来说,找套路的关键就是多刷题。然⽽题海茫茫,单以Codeforces来说,截⽌2017年1⽉3⽇,总共有3206道题。换⾔之,如果⼀个⼈⾜够勤奋,能够⼀天刷三道题,那也得快三年才能把题⽬刷完,⽽且题⽬数量还在扩充。所以盲⽬的刷题简直是浪费⽣命,本⼈从16年上半年⼀直按照题⽬解决⼈数从⾼到低排序,不断的刷⽔题。显然易见,刷⽔题的后果就是没有长进,熟悉的还是熟悉,不懂的还是不懂,唯⼀让⾃⼰开⼼的就是刷题数量的累积。所以科学刷题的本质在于不断挑战新⾼度,在⼀个平台练习⾜够久⾜够熟练之后,就要进⼊下⼀个难度平台。为了⽅便⼤家,我把Codeforces上截⽌2017年1⽉3⽇的所有题⽬的基本信息⽤爬⾍收集了下来,并存储到excel⾥。更进⼀步,本⽂试图分析不同算法在不同难度等级上的出现频率分布,以及不同算法在不同难度等级上被解决次数的分布。最后,我会简要介绍的我的刷题观,以及如何爬取Codeforces上的信息。
先说结论
⼀张图
Codeforces_Algorithms_Tag_Frequency.jpg
上⾯这张图反映了不同算法(第⼀列)在不同问题难度(第⼀⾏)上的频率分布,基于该图,⼤概就可以知道在什么样的⽔平下应该掌握什么样的算法。不过这⾥我没有区分Div1和Div2之间的差别,仅仅是按照题号(A、B、C等等)来推断难度。可以看到对简单的A题⽽⾔,⼤部分都是考察基本的编程功底,诸如implementation(⼤概就是题⽬说什么,你做什么就是了),math(四则运算、取模取整等等)以及brute force(暴⼒枚举)。⽽随着难度的增加,⽐如说E题,主要就在于考察对dp(动态规划),data structures(数据结构)。当然了,从图中也可以看出,⾼难度题⽬主要在math,geometry(计算⼏何),shortest
path(图论)以及games(博弈)上。下⾯再免费附送领⼀张图,反映了不同算法在不同问题难度上被解决次数的频率分布。
Codeforces_Algorithms_Tag_Solved.jpg
⼀张表
感恩作文题目Codeforces-ProblemSet.jpg
然后祭上刷题⽬录,也就是这⼀张表,汇总了截⽌2017年1⽉3⽇Codeforces题⽬上的所有算法题。基
于这张表,⼀来可以按照解决⼈数来进⾏刷题,⼆来可以按照题⽬难度进⾏刷题,三来还可以进⾏主题刷题。具体的⽂件下载链接请见⽂末。
我的刷题观
这⾥搬来我在知乎上的回答,详见
如果想提升⾃⼰的思维能⼒ ,可以按照AC率或者解决⼈数由低到⾼⼆分查找匹配⾃⼰当前⽔平难度的题⽬,然后适当挑战⾼难度题(⼆分时间复杂度是 ,⾄少⽐从易到难的 节省时间)
展拓香港界址专条如果想巩固某⼀专题 ,那⾃然应该按照tag来刷题,但是因为所⽤的⽅法在求解前已知,不太利于思维能⼒的提升
如果什么都不懂 ,那么建议随机刷题,⼀来可以涨见识,⼆来进步空间⽐较⼤
如果想提⾼AC率或者增加⾃信 ,那么建议刷⽔题
混搭以上策略 ,⽐如针对某⼀专题,然后⽤⼆分查找来选择问题求解
动作片的英文再有个建议,题⽬如果太难超过⾃⼰当前能⼒的话,尝试⼀定时间后还是⽼⽼实实看题解吧,⼈与⼈
之间还是有天赋差别的,但区别在于经验可以慢慢积累。特别是即使做对题之后,还要想尽办法看有没有提⾼的余地,并参考别⼈的代码,看如何精简代码以及精简时间空间复杂度。
tourist.jpg
据说⼤神们的刷题量都是上万的,所以正式⽐赛⾥可以看到诸多⼤神不到⼀分钟就秒了⼀道题,⼿速太快。对Competitive Programming ⽽⾔,把题⽬做对是基本要求(题⽬太难则另当别论),⽤更快的速度求解才是顶尖⾼⼿之间的核⼼区别。如果说真的有天赋存在的话,那我们也⽆能为⼒;但希望能像卖油翁⼀样说出,『⽆他,但⼿熟尔』。
如何⽤爬⾍获取信息
必要的库
1: import re
2: quest
3: from bs4 import BeautifulSoup
4: import os
5: import csv
6: import time
爬取Codeforces的所有算法题
1: #%% retrieve the problem t
2: def spider(url):
宝宝缺锌怎么补3: respon = quest.urlopen(url)
4: soup = ad())
5: pattern = {'name': 'tr'}
6: content = soup.findAll(**pattern)
7: for row in content:
8: item = row.findAll('td')
9: try:
10: # get the problem id
11: id = item[0].find('a').string.strip()
12: col2 = item[1].findAll('a')
13: # get the problem title
14: title = col2[0].string.strip()
15: # get the problem tags
16: tags = [foo.string.strip() for foo in col2[1:]]
17: # get the number of AC submissions
18: solved = re.findall('x(\d+)', str(item[3].find('a')))[0]
19: # update the problem info
20: codeforces[id] = {'title':title, 'tags':tags, 'solved':solved, 'accepted':0,}
21: except:
22: continue
23: return soup
24:
25: codeforces = {}
26: wait = 15 # wait time to avoid the blocking of spider
27: last_page = 33 # the total page number of problem t page
28: url = ['/problemt/page/%d' % page for page in range(1,last_page+1)] 29: for foo in url:
30: print('Processing URL %s' % foo)
31: spider(foo)
32: print('Wait %f conds' % wait)
33: time.sleep(wait)
标记已解决的算法题
1: #%% mark the accepted problems
2: def accepted(url):
3: respon = quest.urlopen(url)
4: soup = ad())
5: pattern = {'name':'table', 'class':'status-frame-datatable'}
6: table = soup.findAll(**pattern)[0]
7: pattern = {'name': 'tr'}
8: content = table.findAll(**pattern)
9: for row in content:
10: try:
11: item = row.findAll('td')
12: # check whether this problem is solved
13: if 'Accepted' in str(row):
14: id = item[3].find('a').string.split('-')[0].strip()
15: codeforces[id]['accepted'] = 1
16: except:
17: continue
18: return soup
19:
20: wait = 15 # wait time to avoid the blocking of spider
21: last_page = 10 # the total page number of ur submission
22: handle = 'Greenwicher' # plea input your handle
23: url = ['/submissions/%s/page/%d' % (handle, page) for page in range(1, last_page+1)] 24: for foo in url:
25: print('Processing URL %s' % foo)
26: accepted(foo)
27: print('Wait %f conds' % wait)
28: time.sleep(wait)
输出爬取信息到csv⽂本
1: #%% output the problem t to csv files好笔画
2: root = os.getcwd()
3: with open(os.path.join(root,"CodeForces-ProblemSet.csv"),"w", encoding="utf-8") as f_out: 4: f_csv = csv.writer(f_out)
张杰闰土
5: f_csv.writerow(['ID', 'Title', 'Tags', 'Solved', 'Accepted'])
6: for id in codeforces:
7: title = codeforces[id]['title']
8: tags = ', '.join(codeforces[id]['tags'])
9: solved = codeforces[id]['solved']求根公式一元二次方程
10: accepted = codeforces[id]['accepted']
11: f_csv.writerow([id, title, tags, solved, accepted])
跨年诗句
12: f_out.clo()
分析题⽬难度以及算法分类的关系
1: #%% analyze the problem t
2: # initialize the difficult and tag list
3: difficult_level = {}
4: tags_level = {}
5: for id in codeforces:
6: difficult = re.findall('([A-Z])', id)[0]
7: tags = codeforces[id]['tags']
8: difficult_level[difficult] = (difficult, 0) + 1
9: for tag in tags:
10: tags_level[tag] = (tag, 0) + 1
11: import operator
12: tag_level = sorted(tags_level.items(), key=operator.itemgetter(1))[::-1]
13: tag_list = [foo[0] for foo in tag_level]
14: difficult_level = sorted(difficult_level.items(), key=operator.itemgetter(0))
15: difficult_list = [foo[0] for foo in difficult_level]16:
17: # initialize the 2D relationships matrix
18: # matrix_solved: the number of AC submission for each tag in each difficult level
19: # matrix_freq: the number of tag frequency for each diffiicult level
20: matrix_solved, matrix_freq = [[[0] * len(difficult_list) for _ in range(len(tag_list))] for _ in range(2)]21:
22:
23: # construct the 2D relationships matrix
24: for id in codeforces:
25: difficult = re.findall('([A-Z])', id)[0]
26: difficult_id = difficult_list.index(difficult)
27: tags = codeforces[id]['tags']
28: solved = codeforces[id]['solved']
29: for tag in tags:
30: tag_id = tag_list.index(tag)
31: matrix_solved[tag_id][difficult_id] += int(solved)32: matrix_freq[tag_id][difficult_id] += 1下载本⽂源代码以及分析结果
</div>
</div>
</article>