Previewing PDFs in the Front End: iframe, embed, PDFObject, PDF.js

Updated: 2023-07-26 10:32:40

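To display a PDF file on a web page, the <object>, <embed>, and <iframe> tags are enough on their own (no JavaScript required). Looking around online, I also found quite a few third-party JavaScript libraries for PDF preview, such as jQuery Document Viewer, dia.js, PDFObject, and PDF.js. I took a quick look at PDFObject and PDF.js. The former is not a PDF renderer; it displays the PDF through an <embed> tag. The latter parses the content of the PDF file and can render the PDF to a Canvas.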

<iframe>

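All browsers support the <iframe> tag; setting src to the PDF file is enough to preview it. In addition, any content placed between <iframe> and </iframe> serves as a fallback for browsers that do not understand iframe. For example, the following code offers a download link for the PDF: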

<iframe src="/index.pdf">
    This browser does not support PDFs. Please download the PDF to view it: <a href="/index.pdf">Download PDF</a>
</iframe>

<embed>

The <embed> tag defines embedded content, such as a plug-in. In HTML5 this tag has four attributes:

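Attribute    Value        Description
height       pixels       Sets the height of the embedded content.
width        pixels       Sets the width of the embedded content.
type         MIME type    Defines the type of the embedded content.
src          url          The URL of the embedded content.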

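Note, however, that this tag cannot provide a fallback. Unlike <iframe></iframe>, <embed> is a void (self-closing) element, so if the browser does not support embedded PDFs, nothing is shown at all. Usage:

<embed src="/index.pdf" type="application/pdf" width="100%" height="100%">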

<object>

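<object> defines an embedded object; use this element to add multimedia to a page. It lets you specify the data and parameters of the object inserted into the HTML document, as well as the code used to display and manipulate that data. It can contain objects such as images, audio, video, Java applets, ActiveX, PDF, and Flash. Almost all mainstream browsers support the <object> tag at least partially. For PDFs it is used much like <iframe>, and it also supports a fallback: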

<object data="/index.pdf" type="application/pdf">
    This browser does not support PDFs. Please download the PDF to view it: <a href="/index.pdf">Download PDF</a>
</object>

Of course, combining <object> and <iframe> gives an even more robust fallback:

<object data="/index.pdf" type="application/pdf">
    <iframe src="/index.pdf">
        This browser does not support PDFs. Please download the PDF to view it: <a href="/index.pdf">Download PDF</a>
    </iframe>
</object>
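To reuse the nested fallback for more than one document, the markup could be generated from a URL. The sketch below is only an illustration: escapeHtml and buildPdfFallbackMarkup are hypothetical helper names (not part of any library), and the width and height values are arbitrary choices.

```javascript
// Hypothetical helper: escape characters that are special in HTML attributes and text.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical helper: <object> wraps <iframe>, which wraps a plain download link.
// A browser that cannot handle the outer element falls through to the next layer.
function buildPdfFallbackMarkup(pdfUrl) {
  const url = escapeHtml(pdfUrl);
  return [
    `<object data="${url}" type="application/pdf" width="100%" height="100%">`,
    `  <iframe src="${url}" width="100%" height="100%">`,
    `    This browser does not support PDFs. Please download the PDF to view it: <a href="${url}">Download PDF</a>`,
    "  </iframe>",
    "</object>",
  ].join("\n");
}

console.log(buildPdfFallbackMarkup("/index.pdf"));
```

The resulting string could be assigned to a container's innerHTML; escaping the URL first keeps a stray quote in the path from breaking the attributes.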

The three tags above preview PDFs without any JavaScript. PDFObject and PDF.js, covered below, are both JavaScript libraries.

PDFObject

According to the official site, PDFObject is not a PDF renderer; it, too, relies on the <embed> tag to preview PDFs:

PDFObject is not a rendering engine. PDFObject just writes an <embed> element to the page, and relies on the browser or browser plugins to render the PDF. If the browser does not support embedded PDFs, PDFObject is not capable of forcing the browser to render the PDF.

PDFObject provides PDFObject.supportsPDFs to check whether the browser can display PDFs inline, and therefore whether PDFObject can be used:

if(PDFObject.supportsPDFs){
    console.log("Yay, this browser supports inline PDFs.");
} else {
    console.log("Boo, inline PDFs are not supported by this browser");
}

PDFObject is very simple to use. Complete code:

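<!DOCTYPE html>
<html>
<head>
<style>
html,body,#pdf_viewer{
    width: 100%;
    height: 100%;
    margin: 0;
    padding: 0;
}
</style>
</head>
<body>
<div id="pdf_viewer"></div>
<!-- Assumed path to the PDFObject library file. -->
<script src="pdfobject.min.js"></script>
<script>
if(PDFObject.supportsPDFs){
    // Embed the PDF into the page ("/index.pdf" is an assumed path).
    PDFObject.embed("/index.pdf", "#pdf_viewer");
} else {
    location.href = "/canvas";
}
</script>
</body>
</html>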
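The support check and the fallback can also be wrapped in one function, which makes the decision easy to exercise without a browser. This is a minimal sketch: previewPdf is a hypothetical name, and the pdfObject argument is assumed to look like the PDFObject global, that is, to have a supportsPDFs flag and an embed(url, target) method.

```javascript
// Hypothetical wrapper around a PDFObject-like object.
// `onUnsupported` is a caller-supplied fallback, for example a redirect to a canvas viewer.
function previewPdf(pdfObject, url, targetSelector, onUnsupported) {
  if (pdfObject && pdfObject.supportsPDFs) {
    // The browser can show PDFs inline, so let PDFObject write the <embed>.
    pdfObject.embed(url, targetSelector);
    return "embedded";
  }
  // No inline PDF support: hand over to the fallback.
  onUnsupported(url);
  return "fallback";
}

// In a page that has loaded PDFObject, usage might look like:
// previewPdf(PDFObject, "/index.pdf", "#pdf_viewer", () => { location.href = "/canvas"; });
```

Passing the library in as an argument is a design choice for testability; calling the global directly works just as well in a real page.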

PDF.js

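PDF.js is a powerful open-source library for reading and parsing PDF documents. It lets you view PDFs directly in an HTML page and can render PDF files to a Canvas. PDF.js consists mainly of two library files, pdf.js and pdf.worker.js: the former provides the API layer and the latter does the core parsing.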

First, include the pdf.js file:

<script type="text/javascript" src="pdf.js"></script>

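Most of the PDF.js API is Promise-based. getDocument(url) gives you a Promise that resolves with the parsed document (in current releases it is exposed as the promise property of the returned loading task):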

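// pdfjsLib is the global exposed by current PDF.js builds (older releases used PDFJS).
// "/index.pdf" is an assumed path.
pdfjsLib.getDocument("/index.pdf").promise.then(function(pdf){
    var numPages = pdf.numPages;
    var start = 1;
    renderPageAsync(pdf, numPages, start);
});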

The pdf value the Promise resolves with is a PDFDocumentProxy object. The official API documentation describes it as:

Proxy to a PDFDocument in the worker thread. Also, contains commonly used properties that can be read synchronously.

Parsing a page is done through pdf.getPage(pageNumber), which also returns a Promise, so the PDF can be processed page by page with async/await:

async function renderPageAsync(pdf, numPages, current){
    for(let i=1; i<=numPages; i++){
        // Load and parse the page.
        let page = await pdf.getPage(i);
        // Render it.
        // ...
    }
}
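The loop above waits for each page before requesting the next. When page order matters but the requests need not be serialized, one alternative is to request every page up front and wait for all of them together. This is a sketch of that variation, not the approach used in this article; loadAllPages is a hypothetical name, and pdf is assumed to behave like a PDFDocumentProxy (a numPages property and a getPage method).

```javascript
// Request all pages at once and wait for them together.
function loadAllPages(pdf) {
  const requests = [];
  for (let i = 1; i <= pdf.numPages; i++) {
    requests.push(pdf.getPage(i)); // page numbers are 1-based
  }
  // Promise.all keeps the results in request order, so pages[0] is page 1.
  return Promise.all(requests);
}

// Usage might look like:
// loadAllPages(pdf).then((pages) => console.log("loaded", pages.length, "pages"));
```

For very large documents the sequential loop may still be preferable, since it avoids holding every page in memory at once.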

The resulting page is a PDFPageProxy object, that is, a "Proxy to a PDFPage in the worker thread". It holds the parsing result for that page. Here are some of the methods it provides:

Method            Returns
getAnnotations    A promise that is resolved with an {Array} of the annotation objects.
getTextContent    A promise that is resolved with a TextContent object that represents the page's text content.
getViewport       Contains 'width' and 'height' properties along with transforms required for rendering.
render            An object that contains the promise, which is resolved when the page finishes rendering.

Let's try calling getTextContent and printing the result:
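The call itself is not shown here. A minimal sketch of what it might look like follows; logTextContent is a hypothetical name, and page is assumed to be a PDFPageProxy obtained from pdf.getPage.

```javascript
// Fetch the text content of one page and print it as formatted JSON.
async function logTextContent(page) {
  const textContent = await page.getTextContent();
  console.log(JSON.stringify(textContent, null, 2));
  return textContent;
}

// Usage might look like:
// pdf.getPage(1).then(logTextContent);
```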

Part of the result for the first page:

{
    "items": [
        {
            "str": "小册子标题",
            "dir": "ltr",
            "width": 240,
            "height": 2304,
            "transform": [
                48,
                45.32495,
                679.04
            ],
            "fontName": "g_d0_f1"
        },
        {
            "str": " ",
            "dir": "ltr",
            "width": 9.600000000000001,
            "height": 2304,
            "transform": [
                48,
                285.325,
                679.04
            ],
            "fontName": "g_d0_f2"
        }
    ],
    "styles": {
        "g_d0_f1": {
            "fontFamily": "monospace",
            "ascent": 1.05810546875,
            "descent": -0.26171875,
            "vertical": false
        },
        "g_d0_f2": {
            "fontFamily": "sans-serif",
            "ascent": 0.74365234375,
            "descent": -0.25634765625
        }
    }
}

As we can see, PDF.js extracts the strings, positions, and fonts of the text on each page, which is quite impressive.
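To make the string, position, and font information easier to work with, the items can be flattened into simple records. The sketch below assumes the usual six-number transform matrix [a, b, c, d, e, f], in which e and f are the x and y offsets of the text run (the excerpt above lists only part of each transform). summarizeTextItems is a hypothetical name, and the sample data is made up for illustration.

```javascript
// Assumption: item.transform is [a, b, c, d, e, f]; e and f give the run's x/y position.
function summarizeTextItems(textContent) {
  return textContent.items.map((item) => {
    // Look up the font style that the item refers to by name.
    const style = textContent.styles[item.fontName] || {};
    return {
      text: item.str,
      x: item.transform[4],
      y: item.transform[5],
      fontFamily: style.fontFamily || "unknown",
    };
  });
}

// Made-up sample in the same shape as a getTextContent() result.
const sampleTextContent = {
  items: [
    { str: "Title", dir: "ltr", width: 240, height: 48, transform: [48, 0, 0, 48, 45, 679], fontName: "g_d0_f1" },
    { str: " ", dir: "ltr", width: 9.6, height: 48, transform: [48, 0, 0, 48, 285, 679], fontName: "g_d0_f9" },
  ],
  styles: {
    g_d0_f1: { fontFamily: "monospace", ascent: 1.05, descent: -0.26, vertical: false },
  },
};

console.log(summarizeTextItems(sampleTextContent));
```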

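This is what makes it possible to select text in the previewed file (at first I wondered how text could still be selected after rendering to a Canvas). Rendering a page to a Canvas is done through the render method: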

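async function renderPageAsync(pdf, numPages, current){
    console.log("renderPage async");
    for(let i=1; i<=numPages; i++){
        // Load the page.
        let page = await pdf.getPage(i);
        let scale = 1.5;
        // Current PDF.js releases take an options object; older ones accepted getViewport(scale).
        let viewport = page.getViewport({ scale: scale });
        // Prepare canvas using PDF page dimensions.
        let canvas = document.createElement("canvas");
        let context = canvas.getContext('2d');
        document.body.appendChild(canvas);
        canvas.height = viewport.height;
        canvas.width = viewport.width;
        // Render PDF page into canvas context.
        let renderContext = {
            canvasContext: context,
            viewport: viewport
        };
        // render() returns a task whose promise resolves when the page has been drawn.
        await page.render(renderContext).promise;
    }
}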
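The code above uses a fixed scale of 1.5. If the page should fill a container of known width instead, the scale can be derived from the page's unscaled width. This is a minimal sketch under two assumptions: fitWidthViewport is a hypothetical name, and page.getViewport takes the object form { scale } used by current PDF.js releases.

```javascript
// Compute a viewport whose width matches the container width.
function fitWidthViewport(page, containerWidth) {
  // At scale 1 the viewport reports the page's natural size.
  const unscaled = page.getViewport({ scale: 1 });
  const scale = containerWidth / unscaled.width;
  return page.getViewport({ scale });
}

// Usage might look like:
// const viewport = fitWidthViewport(page, document.body.clientWidth);
// canvas.width = viewport.width;
// canvas.height = viewport.height;
```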


Published: 2023-07-26 10:32:40

Article link: https://www.wtabcd.cn/fanwen/fan/82/1117988.html
