Parsing HTML in Java with the Document class

Updated: 2023-07-26 11:42:51

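At work today I needed to parse an HTML page and pull out the contents of certain tags, so I am recording the approach here.
First, thanks to two posts:
References:
Here is my use case: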
<DIV class="navbar navbar-inver navbar-fixed-top">
<DIV class="navbar-inner">
<DIV class="container-fluid">
<a class="brand lnk-file-title" STYLE="text-decoration: none; width: 200px" TITLE=" "> </a>
<a id="btnPrint" STYLE="margin:0px;padding:10px;" href="javascript:;" onClick="printDoc()">
<img src="./1281e94387f5efb28be502f828edc032.files/print.png">
</a>
<DIV class="changePage">
<a class="pageUp" href="javascript:;" onClick="slidePage(0)"></a>
<a class="pageDown" href="javascript:;" onClick="slidePage(1)"></a>
<SPAN STYLE="padding:0px 10px 0px 10px">Page:</SPAN>
<INPUT class="activePage" type="text" Value="1" onBlur="changePage(this.value)" onkeyup="this.value=this.value.replace(/[^0-9]/g,'')" onafterpaste="this.value=this.value.replace(/[^0-9]/g,'')">
<SPAN class="totalPage"></SPAN>
</DIV>
</DIV>
</DIV>
</DIV>
<DIV id="printArea" STYLE="display:none"></DIV>
<DIV class="container-fluid container-fluid-content">
<DIV class="row-fluid">
<DIV class="span12 docArea">
<DIV class="word-page" STYLE="width:921px;height:1275px" data-loaded="true">
<DIV class="word-content">
<embed src="1281e94387f5efb28be502f828edc032.files/1.svg" width="100%" height="100%" type="image/svg+xml"></embed>
</DIV>
</DIV>
<DIV class="word-page" STYLE="width:921px;height:1275px" data-loaded="true">
<DIV class="word-content">
<embed src="1281e94387f5efb28be502f828edc032.files/2.svg" width="100%" height="100%" type="image/svg+xml"></embed>
</DIV>
</DIV>
<DIV class="word-page" STYLE="width:921px;height:1275px" data-loaded="true">
<DIV class="word-content">
<embed src="1281e94387f5efb28be502f828edc032.files/3.svg" width="100%" height="100%" type="image/svg+xml"></embed>
</DIV>
</DIV>
<DIV class="word-page" STYLE="width:921px;height:1275px" data-loaded="true">
<DIV class="word-content">
<embed src="1281e94387f5efb28be502f828edc032.files/4.svg" width="100%" height="100%" type="image/svg+xml"></embed>
</DIV>
</DIV>
<DIV class="word-page" STYLE="width:921px;height:1275px" data-loaded="true">
<DIV class="word-content">
<embed src="1281e94387f5efb28be502f828edc032.files/5.svg" width="100%" height="100%" type="image/svg+xml"></embed>
</DIV>
</DIV>
<DIV class="word-page" STYLE="width:921px;height:1275px">
<DIV class="word-content"></DIV>
</DIV>
</DIV>
</DIV>
</DIV>
What I need is the value of the src attribute of each embed tag.
Combining the approaches from the two posts above:
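import java.io.IOException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// The class name is arbitrary; the original post shows only the two methods.
public class HtmlAttrMatcher {
    // Returns the value of attr for every <element ...> tag found in source.
    public static List<String> match(String source, String element, String attr) {
        List<String> result = new ArrayList<String>();
        String reg = "<" + element + "[^<>]*?\\s" + attr + "=['\"]?(.*?)['\"]?\\s.*?>";
        Matcher m = Pattern.compile(reg).matcher(source);
        while (m.find()) {
            String r = m.group(1);
            result.add(r);
        }
        return result;
    }
    public static void main(String[] args) throws IOException {
        // Fetch and parse the page (100000 ms timeout). The URL's scheme and host are missing in the original post.
        Document doc = Jsoup.parse(new URL("/2018/11/1281e94387f5efb28be502f828edc032.html"), 100000);
        String html = doc.toString();
        // String source = "<a title=中国体育报 href=''>aaa</a><a title='北京日报' href=''>bbb</a>";
        List<String> list = match(html, "embed", "src");
        System.out.println(list);
    }
}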
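To check the regular expression without fetching the page, the same match logic can be run against an inline string, much like the commented-out source line in main. The following is only a minimal sketch: the class name MatchDemo and the shortened doc.files/... paths are made up for illustration, and it uses nothing beyond java.util.regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; only the regex itself comes from the post.
public class MatchDemo {
    // Same pattern as match() above: capture the value of attr in each <element ...> tag.
    public static List<String> match(String source, String element, String attr) {
        List<String> result = new ArrayList<>();
        String reg = "<" + element + "[^<>]*?\\s" + attr + "=['\"]?(.*?)['\"]?\\s.*?>";
        Matcher m = Pattern.compile(reg).matcher(source);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample, shortened from the page markup shown earlier.
        String html =
            "<DIV class=\"word-content\">"
          + "<embed src=\"doc.files/1.svg\" width=\"100%\" type=\"image/svg+xml\"></embed>"
          + "</DIV>"
          + "<DIV class=\"word-content\">"
          + "<embed src=\"doc.files/2.svg\" width=\"100%\" type=\"image/svg+xml\"></embed>"
          + "</DIV>";
        System.out.println(match(html, "embed", "src"));
    }
}
```

Running main prints [doc.files/1.svg, doc.files/2.svg]. Note that the pattern expects whitespace right after the attribute value, so it suits tags like the embed elements here, where src is followed by further attributes.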
The jar used, as a Maven dependency:
<dependency>
<groupId>org.jsoup</groupId>
<artifactId>jsoup</artifactId>
<version>1.10.2</version>
</dependency>
