
Parsing HTML in Java with the Document class

Updated: 2023-07-26 11:42:51


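At work today I needed to parse some HTML and extract the content of one of its tags, so I am writing it down here:

First, thanks to two posts:

Quoted:

Next is my own use case: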

</a>
<DIV class="changePage">
<a class="pageDown" href="javascript:;" onClick="slidePage(1)"></a>
<SPAN STYLE="padding:0px 10px 0px 10px">Page:</SPAN>
</DIV>
</DIV>
</DIV>
</DIV>
</DIV>
</DIV>
</DIV>
</DIV>
</DIV>
</DIV>
</DIV>

What I want is the value of the src attribute of the embed tag:

Combining the two posts above:

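import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Collects the value of attribute `attr` from every `element` tag found in `source`.
public static List<String> match(String source, String element, String attr) {
    List<String> result = new ArrayList<String>();
    // Note: this pattern expects whitespace after the attribute value.
    String reg = "<" + element + "[^<>]*?\\s" + attr + "=['\"]?(.*?)['\"]?\\s.*?>";
    Matcher m = Pattern.compile(reg).matcher(source);
    while (m.find()) {
        String r = m.group(1);
        result.add(r);
    }
    return result;
}

public static void main(String[] args) throws MalformedURLException, IOException {
    // The URL appears without scheme and host in the source; it is kept as-is.
    Document doc = Jsoup.parse(new URL("/2018/11/1281e94387f5efb28be502f828edc032.html"), 100000);
    String html = doc.toString();
    // String source = "<a title=中国体育报 href=''>aaa</a><a title='北京日报' href=''>bbb</a>";
    List<String> list = match(html, "embed", "src");
    System.out.println(list);
}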
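To try the regex approach without a network call, the match method can be fed a literal string. The following is a minimal sketch: the class name EmbedSrcRegexDemo and the sample HTML (including the example.com URL) are made up for illustration, and the method body assumes the pattern is compiled with Pattern.compile. The sample keeps another attribute after src because the pattern expects whitespace after the attribute value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; only the JDK is needed to run it.
class EmbedSrcRegexDemo {

    // Same regex-based extraction as in the post: returns the value of
    // `attr` for every `element` tag found in `source`.
    static List<String> match(String source, String element, String attr) {
        List<String> result = new ArrayList<>();
        String reg = "<" + element + "[^<>]*?\\s" + attr + "=['\"]?(.*?)['\"]?\\s.*?>";
        Matcher m = Pattern.compile(reg).matcher(source);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample markup; src is followed by another attribute.
        String html = "<p><embed width=\"400\" src=\"http://example.com/movie.swf\" "
                + "type=\"application/x-shockwave-flash\"></p>";
        System.out.println(match(html, "embed", "src"));
        // prints [http://example.com/movie.swf]
    }
}
```

If src is the last attribute before the closing angle bracket, this pattern may miss the tag or capture too much, so it is worth testing against the real page markup.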

The jar used:

<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <!-- version not given in the source -->
</dependency>
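Since jsoup is already on the classpath, the same attribute can also be read from the parsed Document with a CSS selector instead of a regex. This is only a sketch of that alternative, not the approach taken in the post: the class name EmbedSrcSelectorDemo, the helper embedSources, and the sample HTML are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical demo class; requires the jsoup jar on the classpath.
class EmbedSrcSelectorDemo {

    // Returns the src value of every <embed> element that has a src attribute.
    static List<String> embedSources(String html) {
        Document doc = Jsoup.parse(html);
        List<String> result = new ArrayList<>();
        for (Element embed : doc.select("embed[src]")) {
            result.add(embed.attr("src"));
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample markup; src is the only attribute here.
        String html = "<div><embed src=\"http://example.com/movie.swf\"></div>";
        System.out.println(embedSources(html));
    }
}
```

Because the selector works on the parsed tree, it does not depend on attribute order or on which quote style the page uses.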

Published: 2023-07-26 11:42:51

Article link: https://www.wtabcd.cn/fanwen/fan/82/1118107.html
