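C++11 Regular Expressions and the ECMAScript Grammar
I recently wanted to write a web crawler and realized that regular expressions would make the job easier.
C++11 provides a regex library (header <regex>). It can be used to:
1. Match: compare the entire input against a regular expression.
2. Search: find a subsequence that matches the regular expression.
3. Tokenize: use the regular expression as a separator and collect the pieces of text between the separators.
4. Replace: replace the subsequences that match the regular expression.
Main functions: regex_match(), regex_search(), regex_replace()
Main types: sregex_iterator, sregex_token_iterator, regex, smatch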
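Examples:
[_[:alpha:]][_[:alnum:]]* matches an identifier: an underscore or a letter, followed by any number of underscores, letters, or digits.
[123]?[0-9]\.1?[0-9]\.20[0-9]{2} matches a date in German format, such as 24.12.2010.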
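C++11 uses the ECMAScript grammar by default. The table below shows how to build a regular expression:
Expression            Meaning
.                     Any character except newline
[...]                 Any one of the characters ...
[^...]                Any character except the characters ...
[[:charclass:]]       Any character of the character class charclass (see the next table)
\n, \t, \f, \r, \v    A newline, tab, form feed, carriage return, or vertical tab
\xhh, \uhhhh          A character given by its hexadecimal or Unicode code
*                     The preceding character or group, any number of times
?                     The preceding character or group, optional (zero or one time)
+                     The preceding character or group, at least once
{n}                   The preceding character or group, exactly n times
{n,}                  The preceding character or group, at least n times
{n,m}                 The preceding character or group, at least n and at most m times
...|...               The pattern before or the pattern after the |; for example, (.|\n)* matches any sequence of characters, newlines included
(...)                 Defines a group
\1, \2, \3            The n-th group (the first group has index 1)
\b                    A word boundary: the position where a word begins or ends, that is, between a \w character and a non-\w character or the edge of the input
\B                    Not a word boundary: a position that is neither the beginning nor the end of a word
^                     The beginning of a line
$                     The end of a line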
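Character class   Abbreviation   Escape   Meaning
[[:alnum:]]                               A letter or a digit
[[:alpha:]]                               A letter
[[:blank:]]                               A space or a tab
[[:cntrl:]]                               A control character
[[:digit:]]       [[:d:]]        \d       A digit
                                 \D       Not a digit
[[:graph:]]                               A printable non-whitespace character; equivalent to [[:alnum:][:punct:]]
[[:lower:]]                               A lowercase letter
[[:print:]]                               A printable character, including whitespace
[[:punct:]]                               A punctuation character (not a space, digit, or letter)
[[:space:]]                      \s       A whitespace character
                                 \S       Not a whitespace character
[[:upper:]]                               An uppercase letter
[[:xdigit:]]                              A hexadecimal digit
                                 \w       A letter, digit, or underscore
                                 \W       Not a letter, digit, or underscore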
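Here is a test program:
#include <regex>
#include <iostream>
#include <string>
#include <iomanip>
#include <algorithm>
#include <iterator>  // back_inserter
#include <cstdlib>   // system
using namespace std;
// Prints whether a match was found.
void out(bool b){
    cout << (b ? "found" : "not found") << endl;
}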
void regex1();
void regex2();
void regex3();
void regex4();
void regex5();
void regex6();
int main(){
    //regex1();
    //regex2();
    //regex3();
    //regex4();
    //regex5();
    //regex6();
    string data = "1994-06-25\n"
                  "2015-09-13\n"
                  "2015 09 13\n";
    // Groups 1 to 3 capture the year, month, and day.
    regex reg("(\\d{4})[- ](\\d{2})[- ](\\d{2})");
    // Keep the regex in a named variable: the iterator stores a pointer to it,
    // so passing a temporary regex(...) here would leave it dangling.
    sregex_iterator pos(data.cbegin(), data.cend(), reg);
    sregex_iterator end;
    for( ; pos != end; ++pos){
        cout << pos->str() << " ";
        cout << pos->str(1) << " " << pos->str(2) << " " << pos->str(3) << endl;
    }
    system("pause");  // Windows only: keeps the console window open
    return 0;
}
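/*
 * regex_replace(string, regex, format)
 * Replaces every substring matched by regex with the format string.
 * $1 and $2 in the format string refer to the captured groups.
 */
void regex6(){
    string data = "<person>\n"
                  "<first>Nico</first>\n"
                  "<last>Josuttis</last>\n"
                  "</person>\n";
    regex reg("<(.*)>(.*)</(\\1)>");
    cout << regex_replace(data, reg, "<$1 value=\"$2\"/>") << endl;
    string res2;
    // format_no_copy: do not copy the text that does not match.
    // format_first_only: replace only the first match.
    regex_replace(back_inserter(res2),
                  data.cbegin(), data.cend(),
                  reg,
                  "<$1 value=\"$2\"/>",
                  regex_constants::format_no_copy
                  | regex_constants::format_first_only);
    cout << res2 << endl;
}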
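/*
 * sregex_token_iterator: a tokenizer.
 * See the program output for details; for example, it can be used
 * to extract the names in the list below.
 */
void regex5(){
    string data = "<person>\n"
                  "<first>Nico</first>\n"
                  "<last>Josuttis</last>\n"
                  "</person>\n";
    regex reg("<(.*)>(.*)</(\\1)>");
    // Index 0 selects the whole match for each token.
    sregex_token_iterator pos(data.cbegin(), data.cend(), reg, 0);
    sregex_token_iterator end;
    for(; pos != end; ++pos){
        cout << "match: " << pos->str() << endl;
    }
    cout << endl;
    string names = "nico,jim,helmut,paul,tim,john paul,rita";
    // Separator: a comma, semicolon, or period with optional surrounding whitespace.
    regex sep("[ \t\n]*[,;.][ \t\n]*");
    // Index -1 selects the text between the separators.
    sregex_token_iterator p(names.cbegin(), names.cend(), sep, -1);
    sregex_token_iterator e;
    for(; p != e; ++p){
        cout << "name: " << *p << endl;
    }
}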
/*
 * sregex_iterator: iterates over all matching substrings.
 * Note that the begin and end passed in must be const iterators,
 * so use cbegin() and cend().
 */
void regex4(){
    string data = "<person>\n"
                  "<first>Nico</first>\n"
                  "<last>Josuttis</last>\n"
                  "</person>\n";
    regex reg("<(.*)>(.*)</(\\1)>");
    sregex_iterator pos(data.cbegin(), data.cend(), reg);
    sregex_iterator end;
    for(; pos != end; ++pos){
        cout << "match: " << pos->str(0) << endl;
        cout << "tag: " << pos->str(1) << endl;
        cout << "value: " << pos->str(2) << endl;
    }
    // The same traversal with for_each and a lambda.
    sregex_iterator beg(data.cbegin(), data.cend(), reg);
    for_each(beg, end, [](const smatch& m){
        cout << "match: " << m.str() << endl;
        cout << "tag: " << m.str(1) << endl;
        cout << "value: " << m.str(2) << endl;
    });
}
/*
 * bool regex_search(begin, end, smatch, regex)
 * Searches the character range with the regex and finds the first matching substring.
 * As the regex2 example shows, m.suffix() is the text after the match,
 * so m.suffix().first points at the first character after the match.
 * Looping from there finds each of the following matches in turn.
 */
void regex3(){
    string data = "<person>\n"
                  "<first>Nico</first>\n"
                  "<last>Josuttis</last>\n"
                  "</person>\n";
    regex reg("<(.*)>(.*)</(\\1)>");
    auto pos = data.cbegin();
    auto end = data.cend();
    smatch m;
    for(; regex_search(pos, end, m, reg); pos = m.suffix().first){
        cout << "match: " << m.str() << endl;
        cout << "tag: " << m.str(1) << endl;
        cout << "value: " << m.str(2) << endl;
        cout << "m.prefix(): " << m.prefix().str() << endl;
        cout << "m.suffix(): " << m.suffix().str() << endl;
    }
}
/*
 * bool regex_search(string, smatch, regex)
 * Searches the whole string with the regex and finds the first matching substring.
 * The code below shows how to read the matched text through smatch;
 * the index corresponds to the group.
 */
void regex2(){
    string data = "XML tag: <tag-name>the value</tag-name>.";
    cout << "data: " << data << "\n\n";
    smatch m;
    bool found = regex_search(data, m, regex("<(.*)>(.*)</(\\1)>"));
    cout << "m.empty(): " << boolalpha << m.empty() << endl;
    cout << "m.size(): " << m.size() << endl;
    if(found){
        cout << "m.str(): " << m.str() << endl;
        cout << "m.length(): " << m.length() << endl;
        cout << "m.position(): " << m.position() << endl;
        cout << "m.prefix().str(): " << m.prefix().str() << endl;
        cout << "m.suffix().str(): " << m.suffix().str() << endl;
        cout << endl;
        // Index 0 is the whole match; indexes 1 and up are the groups.
        for(size_t i = 0; i < m.size(); i++){
            cout << "m[" << i << "].str(): " << m[i].str() << endl;
            cout << "m.str(" << i << "): " << m.str(i) << endl;
            cout << "m.position(" << i << "): " << m.position(i) << endl;
        }
        cout << endl;
        cout << "matches:" << endl;
        for(auto pos = m.begin(); pos != m.end(); ++pos){
            cout << " " << *pos << " ";
            cout << "(length: " << pos->length() << ")" << endl;
        }
    }
}
/*
 * bool regex_match(string, regex)
 * Matches the whole string against the regex; it succeeds only if
 * the entire string matches. regex_search, by contrast, succeeds
 * if any substring matches.
 */
void regex1(){
    regex reg1("<.*>.*</.*>");
    bool found = regex_match("<tag>value</tag>", reg1);
    out(found);
    regex reg2("<(.*)>.*</\\1>");
    found = regex_match("<tag>value</tag>", reg2);
    out(found);
    // The same pattern in grep syntax, where groups are written \( ... \).
    regex reg3("<\\(.*\\)>.*</\\1>", regex_constants::grep);
    found = regex_match("<tag>value</tag>", reg3);
    out(found);
    found = regex_match("<tag>value</tag>", regex("<(.*)>.*</\\1>"));
    out(found);
    cout << endl;
    found = regex_match("XML tag: <tag>value</tag>",
                        regex("<(.*)>.*</\\1>"));
    out(found);
    found = regex_match("XML tag: <tag>value</tag>",
                        regex(".*<(.*)>.*</\\1>"));
    out(found);
    found = regex_search("XML tag: <tag>value</tag>",
                         regex("<(.*)>.*</\\1>"));
    out(found);
    found = regex_search("XML tag: <tag>value</tag>",
                         regex(".*<(.*)>.*</\\1>"));
    out(found);
}