Codeforces科学刷题指南

更新时间:2023-08-01 15:02:39 阅读：评论：0

Codeforces科学刷题指南

简要介绍如何科学地刷算法题，来提⾼⾃⼰解决问题的能⼒，并利⽤爬⾍抓取Codeforces的题库，来分析题⽬难度以及算法分类的关系

⽆论做什么事，多尝试、找套路、然后刻意练习都是⾄关重要的。对信息科学竞赛（Olympiad in Informatics）爱好者来说，找套路的关键就是多刷题。然⽽题海茫茫，单以Codeforces来说，截⽌2017年1⽉3⽇，总共有3206道题。换⾔之，如果⼀个⼈⾜够勤奋，能够⼀天刷三道题，那也得快三年才能把题⽬刷完，⽽且题⽬数量还在扩充。所以盲⽬的刷题简直是浪费⽣命，本⼈从16年上半年⼀直按照题⽬解决⼈数从⾼到低排序，不断的刷⽔题。显然易见，刷⽔题的后果就是没有长进，熟悉的还是熟悉，不懂的还是不懂，唯⼀让⾃⼰开⼼的就是刷题数量的累积。所以科学刷题的本质在于不断挑战新⾼度，在⼀个平台练习⾜够久⾜够熟练之后，就要进⼊下⼀个难度平台。为了⽅便⼤家，我把Codeforces上截⽌2017年1⽉3⽇的所有题⽬的基本信息⽤爬⾍收集了下来，并存储到excel⾥。更进⼀步，本⽂试图分析不同算法在不同难度等级上的出现频率分布，以及不同算法在不同难度等级上被解决次数的分布。最后，我会简要介绍的我的刷题观，以及如何爬取Codeforces上的信息。

先说结论

⼀张图

Codeforces_Algorithms_Tag_Frequency.jpg

上⾯这张图反映了不同算法（第⼀列）在不同问题难度（第⼀⾏）上的频率分布，基于该图，⼤概就可以知道在什么样的⽔平下应该掌握什么样的算法。不过这⾥我没有区分Div1和Div2之间的差别，仅仅是按照题号（A、B、C等等）来推断难度。可以看到对简单的A题⽽⾔，⼤部分都是考察基本的编程功底，诸如implementation（⼤概就是题⽬说什么，你做什么就是了），math（四则运算、取模取整等等）以及brute force（暴⼒枚举）。⽽随着难度的增加，⽐如说E题，主要就在于考察对dp（动态规划），data structures（数据结构）。当然了，从图中也可以看出，⾼难度题⽬主要在math，geometry（计算⼏何），shortest

path（图论）以及games（博弈）上。下⾯再免费附送领⼀张图，反映了不同算法在不同问题难度上被解决次数的频率分布。

Codeforces_Algorithms_Tag_Solved.jpg

⼀张表

感恩作文题目Codeforces-ProblemSet.jpg

然后祭上刷题⽬录，也就是这⼀张表，汇总了截⽌2017年1⽉3⽇Codeforces题⽬上的所有算法题。基

于这张表，⼀来可以按照解决⼈数来进⾏刷题，⼆来可以按照题⽬难度进⾏刷题，三来还可以进⾏主题刷题。具体的⽂件下载链接请见⽂末。

我的刷题观

这⾥搬来我在知乎上的回答，详见

如果想提升⾃⼰的思维能⼒，可以按照AC率或者解决⼈数由低到⾼⼆分查找匹配⾃⼰当前⽔平难度的题⽬，然后适当挑战⾼难度题（⼆分时间复杂度是，⾄少⽐从易到难的节省时间）

展拓香港界址专条如果想巩固某⼀专题，那⾃然应该按照tag来刷题，但是因为所⽤的⽅法在求解前已知，不太利于思维能⼒的提升

如果什么都不懂，那么建议随机刷题，⼀来可以涨见识，⼆来进步空间⽐较⼤

如果想提⾼AC率或者增加⾃信，那么建议刷⽔题

混搭以上策略，⽐如针对某⼀专题，然后⽤⼆分查找来选择问题求解

动作片的英文再有个建议，题⽬如果太难超过⾃⼰当前能⼒的话，尝试⼀定时间后还是⽼⽼实实看题解吧，⼈与⼈

之间还是有天赋差别的，但区别在于经验可以慢慢积累。特别是即使做对题之后，还要想尽办法看有没有提⾼的余地，并参考别⼈的代码，看如何精简代码以及精简时间空间复杂度。

tourist.jpg

据说⼤神们的刷题量都是上万的，所以正式⽐赛⾥可以看到诸多⼤神不到⼀分钟就秒了⼀道题，⼿速太快。对Competitive Programming ⽽⾔，把题⽬做对是基本要求（题⽬太难则另当别论），⽤更快的速度求解才是顶尖⾼⼿之间的核⼼区别。如果说真的有天赋存在的话，那我们也⽆能为⼒；但希望能像卖油翁⼀样说出，『⽆他，但⼿熟尔』。

如何⽤爬⾍获取信息

必要的库

1: import re

2: quest

3: from bs4 import BeautifulSoup

4: import os

5: import csv

6: import time

爬取Codeforces的所有算法题

1: #%% retrieve the problem t

2: def spider(url):

宝宝缺锌怎么补3: respon = quest.urlopen(url)

4: soup = ad())

5: pattern = {'name': 'tr'}

6: content = soup.findAll(**pattern)

7: for row in content:

8: item = row.findAll('td')

9: try:

10: # get the problem id

11: id = item[0].find('a').string.strip()

12: col2 = item[1].findAll('a')

13: # get the problem title

14: title = col2[0].string.strip()

15: # get the problem tags

16: tags = [foo.string.strip() for foo in col2[1:]]

17: # get the number of AC submissions

18: solved = re.findall('x(\d+)', str(item[3].find('a')))[0]

19: # update the problem info

20: codeforces[id] = {'title':title, 'tags':tags, 'solved':solved, 'accepted':0,}

21: except:

22: continue

23: return soup

24:

25: codeforces = {}

26: wait = 15 # wait time to avoid the blocking of spider

27: last_page = 33 # the total page number of problem t page

28: url = ['/problemt/page/%d' % page for page in range(1,last_page+1)] 29: for foo in url:

30: print('Processing URL %s' % foo)

31: spider(foo)

32: print('Wait %f conds' % wait)

33: time.sleep(wait)

标记已解决的算法题

1: #%% mark the accepted problems

2: def accepted(url):

3: respon = quest.urlopen(url)

4: soup = ad())

5: pattern = {'name':'table', 'class':'status-frame-datatable'}

6: table = soup.findAll(**pattern)[0]

7: pattern = {'name': 'tr'}

8: content = table.findAll(**pattern)

9: for row in content:

10: try:

11: item = row.findAll('td')

12: # check whether this problem is solved

13: if 'Accepted' in str(row):

14: id = item[3].find('a').string.split('-')[0].strip()

15: codeforces[id]['accepted'] = 1

16: except:

17: continue

18: return soup

19:

20: wait = 15 # wait time to avoid the blocking of spider

21: last_page = 10 # the total page number of ur submission

22: handle = 'Greenwicher' # plea input your handle

23: url = ['/submissions/%s/page/%d' % (handle, page) for page in range(1, last_page+1)] 24: for foo in url:

25: print('Processing URL %s' % foo)

26: accepted(foo)

27: print('Wait %f conds' % wait)

28: time.sleep(wait)

输出爬取信息到csv⽂本

1: #%% output the problem t to csv files好笔画

2: root = os.getcwd()

3: with open(os.path.join(root,"CodeForces-ProblemSet.csv"),"w", encoding="utf-8") as f_out: 4: f_csv = csv.writer(f_out)

张杰闰土

5: f_csv.writerow(['ID', 'Title', 'Tags', 'Solved', 'Accepted'])

6: for id in codeforces:

7: title = codeforces[id]['title']

8: tags = ', '.join(codeforces[id]['tags'])

9: solved = codeforces[id]['solved']求根公式一元二次方程

10: accepted = codeforces[id]['accepted']

11: f_csv.writerow([id, title, tags, solved, accepted])

跨年诗句

12: f_out.clo()

分析题⽬难度以及算法分类的关系

1: #%% analyze the problem t

2: # initialize the difficult and tag list

3: difficult_level = {}

4: tags_level = {}

5: for id in codeforces:

6: difficult = re.findall('([A-Z])', id)[0]

7: tags = codeforces[id]['tags']

8: difficult_level[difficult] = (difficult, 0) + 1

9: for tag in tags:

10: tags_level[tag] = (tag, 0) + 1

11: import operator

12: tag_level = sorted(tags_level.items(), key=operator.itemgetter(1))[::-1]

13: tag_list = [foo[0] for foo in tag_level]

14: difficult_level = sorted(difficult_level.items(), key=operator.itemgetter(0))

15: difficult_list = [foo[0] for foo in difficult_level]16:

17: # initialize the 2D relationships matrix

18: # matrix_solved: the number of AC submission for each tag in each difficult level

19: # matrix_freq: the number of tag frequency for each diffiicult level

20: matrix_solved, matrix_freq = [[[0] * len(difficult_list) for _ in range(len(tag_list))] for _ in range(2)]21:

22:

23: # construct the 2D relationships matrix

24: for id in codeforces:

25: difficult = re.findall('([A-Z])', id)[0]

26: difficult_id = difficult_list.index(difficult)

27: tags = codeforces[id]['tags']

28: solved = codeforces[id]['solved']

29: for tag in tags:

30: tag_id = tag_list.index(tag)

31: matrix_solved[tag_id][difficult_id] += int(solved)32: matrix_freq[tag_id][difficult_id] += 1下载本⽂源代码以及分析结果

</div>

</article>

本文发布于:2023-08-01 15:02:39，感谢您对本站的认可！

本文链接：https://www.wtabcd.cn/fanwen/fan/89/1104484.html

上一篇：九年级英语人教版第一单元测试题

下一篇：英语词性转换填空题

标签：算法刷题科学在于

留言与评论（共有 0 条评论）