如何判断字符串是否在Python中重复?

更新时间:2023-07-20 10:06:19 阅读: 评论:0

如何判断字符串是否在Python中重复?
本⽂翻译⾃:
I'm looking for a way to test whether or not a given string repeats itlf for the entire string or not. 我正在寻找⼀种⽅法来测
我正在寻找⼀种⽅法来测试⼀个给定的字符串是否为整个字符串重复⾃⼰。
Examples: 例⼦:
例⼦:
[
'0045662100456621004566210045662100456621',            # '00456621'
'0072992700729927007299270072992700729927',            # '00729927'
当我想起你张学友'001443001443001443001443001443001443001443',          # '001443'
'037037037037037037037037037037037037037037037',        # '037'
'047619047619047619047619047619047619047619',          # '047619'
'002457002457002457002457002457002457002457',          # '002457'
'001221001221001221001221001221001221001221',          # '001221'
'001230012300123001230012300123001230012300123',        # '00123'
'0013947001394700139470013947001394700139470013947',    # '0013947'
'001001001001001001001001001001001001001001001001001',  # '001'
'001406469760900140646976090014064697609',              # '0014064697609'
]
are strings which repeat themlves, and 是重复⾃⼰的字符串,和
是重复⾃⼰的字符串,和
[
'004608294930875576036866359447',
'00469483568075117370892018779342723',
'004739336492890995260663507109',
'001508295625942684766214177978883861236802413273',
'007518796992481203',
'0071942446043165467625899280575539568345323741',
'0434782608695652173913',
'0344827586206896551724137931',
'002481389578163771712158808933',
'002932551319648093841642228739',
'0035587188612099644128113879',
'003484320557491289198606271777',
'00115074798619102416570771',
]
are examples of ones that do not. 是那些没有的例⼦。
是那些没有的例⼦。
The repeating ctions of the strings I'm given can be quite long, and the strings themlves can be 500 or more characters, so looping through each character trying to build a pattern then checking the pattern vs the rest of the string ems awful slow. 我给出的字符串的重复部分可能很长,并且字符串本⾝可以是500或更多字符,因此循环遍历每个字符尝试构我给出的字符串的重复部分可能很长,并且字符串本⾝可以是500或更多字符,因此循环遍历每个字符尝试构建模式然后检查模式与字符串的其余部分似乎⾮常慢。 Multiply that by potentially hundreds of strings and I can't e any
建模式然后检查模式与字符串的其余部分似乎⾮常慢。
intuitive solution. 乘以可能数百个字符串,我看不到任何直观的解决⽅案。
乘以可能数百个字符串,我看不到任何直观的解决⽅案。
I've looked into regexes a bit and they em good for when you know what you're looking for, or at least the length of the pattern you're looking for. 我已经看了⼀下正则表达式,当你知道你在寻找什么,或者⾄少是你正在寻找的模式的长度时,它们看我已经看了⼀下正则表达式,当你知道你在寻找什么,或者⾄少是你正在寻找的模式的长度时,它们看
不幸的是,我也不知道。
起来很好。 Unfortunately, I know neither. 不幸的是,我也不知道。
起来很好。
How can I tell if a string is repeating itlf and if it is, what the shortest repeating subquence is? 如何判断⼀个字符串是否
如何判断⼀个字符串是否重复,如果是,那么最短的重复⼦序列是什么?
#1楼
#2楼
Here's a solution using regular expressions. 这是使⽤正则表达式的解决⽅案。
这是使⽤正则表达式的解决⽅案。import re
REPEATER = re.compile(r"(.+?)\1+$")
好听女英文名def repeated(s):
match = REPEATER.match(s)
up(1) if match el None
Iterating over the examples in the question: 迭代问题中的⽰例:
迭代问题中的⽰例:
examples = [
'0045662100456621004566210045662100456621',
'0072992700729927007299270072992700729927',
'001443001443001443001443001443001443001443',
'037037037037037037037037037037037037037037037',
'047619047619047619047619047619047619047619',
'002457002457002457002457002457002457002457',
'001221001221001221001221001221001221001221',
'001230012300123001230012300123001230012300123',
'0013947001394700139470013947001394700139470013947',
'001001001001001001001001001001001001001001001001001',
excited是什么意思
'001406469760900140646976090014064697609',
'004608294930875576036866359447',
'00469483568075117370892018779342723',
'004739336492890995260663507109',
'001508295625942684766214177978883861236802413273',
'007518796992481203',
'0071942446043165467625899280575539568345323741',
'0434782608695652173913',
'0344827586206896551724137931',
'002481389578163771712158808933',
'002932551319648093841642228739',
'0035587188612099644128113879',
'003484320557491289198606271777',
'00115074798619102416570771',
]
for e in examples:
sub = repeated(e)
if sub:
print("%r: %r" % (e, sub))
el:
print("%r does not repeat." % e)
... produces this output: ...产⽣这个输出:
...产⽣这个输出:
'0045662100456621004566210045662100456621': '00456621'
'0072992700729927007299270072992700729927': '00729927'
'001443001443001443001443001443001443001443': '001443'
'037037037037037037037037037037037037037037037': '037'
'047619047619047619047619047619047619047619': '047619'
'002457002457002457002457002457002457002457': '002457'
'001221001221001221001221001221001221001221': '001221'
'001230012300123001230012300123001230012300123': '00123'
'0013947001394700139470013947001394700139470013947': '0013947'
'001001001001001001001001001001001001001001001001001': '001'
'001406469760900140646976090014064697609': '0014064697609'
'004608294930875576036866359447' does not repeat.
'00469483568075117370892018779342723' does not repeat.
'004739336492890995260663507109' does not repeat.
'001508295625942684766214177978883861236802413273' does not repeat.describes
'007518796992481203' does not repeat.
'0071942446043165467625899280575539568345323741' does not repeat.
'0434782608695652173913' does not repeat.
法语  '0344827586206896551724137931' does not repeat.一月英文缩写
'002481389578163771712158808933' does not repeat.
'002932551319648093841642228739' does not repeat.
'0035587188612099644128113879' does not repeat.
'003484320557491289198606271777' does not repeat.
'00115074798619102416570771' does not repeat.
The regular expression (.+?)\\1+$ is divided into three parts: 正则表达式
正则表达式(.+?)\\1+$分为三个部分:
1. (.+?) is a matching group containing at least one (but as few as possible) of any character (becau ).(.+?)是⼀个匹配
组,包含⾄少⼀个(但尽可能少)任何字符(因为 )。
2. \\1+ checks for at least one repetition of the matching group in the first part.\\1+检查第⼀部分中匹配组的⾄少⼀次重复。
3. $ checks for the end of the string, to ensure that there's no extra, non-repeating content after the repeated substrings
(and using ensures that there's no non-repeating text before the repeated substrings).$检查字符串的结尾,以确保在重复的⼦字符串之后没有额外的,⾮重复的内容(并且使⽤确保在重复的⼦字符串之前没有⾮重复的⽂本)。
In Python 3.4 and later, you could drop the $ and u instead, or (in any Python at least as far back as 2.3) go the other way and u with the regex ^(.+?)\\1+$ , all of which are more down to personal taste than anything el. 在Python 3.4及更⾼版本
在Python 3.4及更⾼版本中,您可以删除$并使⽤代替,或者(在任何Python中⾄少早于2.3) 另⼀种⽅式并使⽤与正则表达式^(.+?)\\1+$ ,所有这些都⽐其他任何东西都更符合个⼈品味。
#3楼
Non-regex solution: ⾮正则表达式解决⽅案:
⾮正则表达式解决⽅案:
def repeat(string):
for i in range(1, len(string)//2+1):
if not len(string)%len(string[0:i]) and string[0:i]*(len(string)//len(string[0:i])) == string:
return string[0:i]
Faster non-regex solution, thanks to @ThatWeirdo (e comments): 更快的⾮正则表达式解决⽅案,感谢@ThatWeirdo(见评
更快的⾮正则表达式解决⽅案,感谢@ThatWeirdo(见评论):
def repeat(string):
l = len(string)
for i in range(1, len(string)//2+1):
if l%i: continue
s = string[0:i]
if s*(l//i) == string:
return s
The above solution is very rarely slower than the original by a few percent, but it's usually a good bit faster - sometimes a whole lot faster. 上⾯的解决⽅案很少⽐原始解决⽅案慢⼏个百分点,但它通常要快⼀点 - 有时速度要快很多。
上⾯的解决⽅案很少⽐原始解决⽅案慢⼏个百分点,但它通常要快⼀点 - 有时速度要快很多。 It's still not faster than davidism's for longer strings, and zero's regex solution is superior for short strings. 对于较长的字符串,它仍然不⽐
对于较长的字符串,它仍然不⽐davidism快,⽽对于短字符串,零的正则表达式解决⽅案更胜⼀筹。 It comes out to the fastest (according to davidism's test davidism快,⽽对于短字符串,零的正则表达式解决⽅案更胜⼀筹。
on github - e his answer) with strings of about 1000-1500 characters. 它以最快的速度出现(根据dithidism对github的测
它以最快的速度出现(根据dithidism对github的测试 - 请参阅他的回答),其中包含⼤约1000-1500个字符的字符串。 Regardless, it's reliably cond-fastest (or better) in all 试 - 请参阅他的回答),其中包含⼤约1000-1500个字符的字符串。
cas I tested. ⽆论如何,在我测试的所有情况下,它都是可靠的第⼆快(或更好)。
⽆论如何,在我测试的所有情况下,它都是可靠的第⼆快(或更好)。 Thanks, ThatWeirdo. 谢谢
谢,ThatWeirdo。
Test: 测试:
测试:
print(repeat('009009009'))
print(repeat('254725472547'))
print(repeat('abcdeabcdeabcdeabcde'))
print(repeat('abcdefg'))
print(repeat('09099099909999'))
print(repeat('025********'))
Results: 结果:
结果:
009
2547
abcde
None
None
None
#4楼
You can make the obrvation that for a string to be considered repeating, its length must be divisible by the length of its repeated quence. 你可以观察到⼀个字符串被认为是重复的,它的长度必须能够被重复序列的长度整除。
你可以观察到⼀个字符串被认为是重复的,它的长度必须能够被重复序列的长度整除。 Given that, here is a solution that generates divisors of the length from 1 to n / 2 inclusive, divides the original string into substrings with the length of the divisors, and tests the equality of the result t: 鉴于此,这是⼀个⽣成从
鉴于此,这是⼀个⽣成从1到n / 2的长度除数的解决⽅案,将原始字符串除以具有除数长度的⼦串,并测试结果集的相等性:
from math import sqrt, floor
def divquot(n):
if n > 1:
yield 1, n
swapped = []
for d in range(2, int(floor(sqrt(n))) + 1):
q, r = divmod(n, d)
if r == 0:
yield d, q
swapped.append((q, d))
while swapped:
yield swapped.pop()
def repeats(s):
n = len(s)
for d, q in divquot(n):
2021年6月六级成绩查询时间
sl = s[0:d]
if sl * q == s:
return sl
return None
EDIT: In Python 3, the / operator has changed to do float division by default. 编辑:在Python 3中,
编辑:在Python 3中, /运算符已默认更改为浮
要从Python 2获得int除法,可以使⽤//运算点除法。
pearlite点除法。 To get the int division from Python 2, you can u the // operator instead. 要从Python 2获得
符。 Thank you to @TigerhawkT3 for bringing this to my attention. 感谢@ TigerhawkT3引起我的注意。
感谢@ TigerhawkT3引起我的注意。
符。
The // operator performs integer division in both Python 2 and Python 3, so I've updated the answer to support both versions.//运算符在Python 2和Python 3中执⾏整数除法,因此我更新了答案以⽀持这两个版本。
运算符在Python 2和Python 3中执⾏整数除法,因此我更新了答案以⽀持这两个版本。 The part where we test to e if all the substrings are equal is now a short-circuiting operation using all and a generator expression. 我们测试以查看所
我们测试以查看所有⼦串是否相等的部分现在是使⽤all和⽣成器表达式的短路操作。
UPDATE: In respon to a change in the original question, the code has now been updated to return the smallest repeating substring if it exists and None if it does not. 更新:响应原始问题的更改,代码现在已更新为返回最⼩的重复⼦字符串(如果存
更新:响应原始问题的更改,代码现在已更新为返回最⼩的重复⼦字符串(如果存在)和None如果不存在)。
如果不存在)。 @godlygeek has suggested using divmod to reduce the number of iterations on the divisors generator, and the code has been updated to match that as well. @godlygeek建议使⽤
@godlygeek建议使⽤divmod来减少divisors⽣成器上的迭代
它现在以次数,并且代码也已更新以匹配它。 It now returns all positive divisors of n in ascending order, exclusive of n itlf. 它现在以次数,并且代码也已更新以匹配它。
升序返回n所有正除数,不包括n本⾝。
Further update for high performance: After multiple tests, I've come to the conclusion that simply testing for string equality has the best performance out of any slicing or iterator solution in Python. 进⼀步更新以获得⾼性能:经过多次测试后,我得出
长颈鹿英语怎么说进⼀步更新以获得⾼性能:经过多次测试后,我得出的结论是,简单地测试字符串相等性在Python中的任何切⽚或迭代器解决⽅案中都具有最佳性能。 Thus, I've taken a leaf out of 的结论是,简单地测试字符串相等性在Python中的任何切⽚或迭代器解决⽅案中都具有最佳性能。
@TigerhawkT3 's book and updated my solution. 因此,我从@ TigerhawkT3的书中摘了⼀条叶⼦并更新了我的解决⽅案。
因此,我从@ TigerhawkT3的书中摘了⼀条叶⼦并更新了我的解决⽅案。 It's now over 6x as fast as before, noticably faster than Tigerhawk's solution but slower than David's. 现在它的速度⽐以前快了6
现在它的速度⽐以前快了6倍,明显快于Tigerhawk的解决⽅案,但⽐⼤卫的速度慢。
#5楼
Here's a straight forward solution, without regexes. 这是⼀个直接的解决⽅案,没有正则表达式。
这是⼀个直接的解决⽅案,没有正则表达式。
For substrings of s starting from zeroth index, of lengths 1 through len(s) , check if that substring, substr is the repeated pattern. 对于从第0个索引开始的
是否为重复模式。 This check can be 对于从第0个索引开始的s⼦字符串,长度为1到len(s) ,检查⼦字符串substr是否为重复模式。
performed by concatenating substr with itlf ratio times, such that the length of the string thus formed is equal to the length of s . 可以通过将
英语辅导报
的长度。 Hence 可以通过将substr与⾃⾝的ratio时间连接来执⾏该检查,使得由此形成的串的长度等于s的长度。
因此, ratio=len(s)/len(substr) 。
ratio=len(s)/len(substr) . 因此,
Return when first such substring is found. 找到第⼀个这样的⼦字符串时返回。
找到第⼀个这样的⼦字符串时返回。 This would provide the smallest possible substring, if one exists. 这将提供尽可能⼩的⼦字符串(如果存在)。
这将提供尽可能⼩的⼦字符串(如果存在)。

本文发布于:2023-07-20 10:06:19,感谢您对本站的认可!

本文链接:https://www.wtabcd.cn/fanwen/fan/90/183098.html

版权声明:本站内容均来自互联网,仅供演示用,请勿用于商业和其他非法用途。如果侵犯了您的权益请与我们联系,我们将在24小时内删除。

标签:字符串   长度   测试   模式   解决   是否   检查
相关文章
留言与评论(共有 0 条评论)
   
验证码:
Copyright ©2019-2022 Comsenz Inc.Powered by © 专利检索| 网站地图