
Updated: 2023-01-04 04:54:52


Posted January 4, 2023 (author: kneel)

NE Analysis Methods: Native Stack Restoration and Debugging Tips


Introduction

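NE stands for Native Exception. Here it mainly means an abnormal crash in an Android C/C++ program. Because the Camera HAL is implemented in C/C++, NE problems come up often during camera system development. They have many causes, such as null pointers, memory corruption (stray writes), fd leaks, and out-of-bounds array access. When one occurs, the kernel sends a signal to userspace, where the tombstoned process receives and handles it. Before the faulting process dies, tombstoned captures its backtrace, memory map, and related information, saves them to /data/tombstones/tombstone_xx, and also prints the tombstone to logcat. On some platforms, once configured, the whole crashed process can be saved as a coredump file, which can then be debugged with Trace32 or GDB.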

This article mainly covers native stack restoration, that is, locating the faulting code from the NE report. The tool used is addr2line.

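Note: with memory corruption, the reported location is usually not where the error actually happened, so these problems generally need dedicated tools to track down.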

Native Stack Restoration

1. After capturing an NE, first export the tombstone file from /data/tombstones/tombstone_xx, for example:

************************************************
Build fingerprint: 'XXXX/XX/XX:10/QKQ1.200412.002/.20200611.122340:user/release-keys'
Revision: '0'
ABI: 'arm64'
Timestamp: 2020-06-13 11:18:11+0800
pid: 13260, tid: 13260, name: provider@2.4-se  >>> /vendor/bin/hw/er@2.4-service_64 <<<
uid: 1047
signal 6 (SIGABRT), code 0 (SI_USER from pid 4396, uid 0), fault addr --------
xfd7b20x10089x200000000fffffffex30000
x40000x500000000ffffffffx600000000ffffffffx7000000716d685000
x80062x90089x100009x110000
xcf54b47xceec98dx140000x150482
x1600000071f170a950x1700000071f1695320x18af530x1900000000fffffffe
x200000x21fd7b20x220089x2300000071f30a2188
x2400000071f1db9020x250002x26000000716d686000x2792c8
x280002x290000007ff862f280
sp 0000007ff862f220 lr 00000071f16988ac pc 00000071f169533c

backtrace:
#00 pc 033c /apex/e/lib64/bionic/ (syscall+28) (BuildId: 778f9db29d872fa660c03bee8d69f746)
#01 pc 38a8 /apex/e/lib64/bionic/ (__futex_wait_ex(void volatile*, bool, int, bool, timespec const*)+140) (BuildId: 778f9db29d872fa660c03bee8d69f746)
#02 pc e7a98 /apex/e/lib64/bionic/ (NonPI::MutexLockWithTimeout(pthread_mutex_internal_t*, bool, timespec const*)+596) (BuildId: 778f9db29d872fa660c03bee8d69f746)
#03 pc d444 /vendor/lib64/hw/ (CamX::Mutex::Lock()+116) (BuildId: f3ec37ddca55cd2b52366606c94f3e2a)
#04 pc 14fc /vendor/lib64/hw/ (CamX::Session::Destroy()+572) (BuildId: f3ec37ddca55cd2b52366606c94f3e2a)
#05 pc ae9a8 /vendor/lib64/hw/ (CamX::ChiContext::DestroySession(CamX::CHISession*)+40) (BuildId: f3ec37ddca55cd2b52366606c94f3e2a)
#06 pc 341c /vendor/lib64/hw/ (Session::Destroy(int)+84) (BuildId: ec7b9034d259422289af1a300c889c42)
#07 pc b606c /vendor/lib64/hw/ (UsecaseMultiCamera::Destroy(int)+1660) (BuildId: ec7b9034d259422289af1a300c889c42)
#08 pc 3e64 /vendor/lib64/hw/ (Usecase::DestroyObject(int)+732) (BuildId: ec7b9034d259422289af1a300c889c42)
#09 pc 7a34 /vendor/lib64/hw/ (ExtensionModule::TeardownOverrideUsecase(camera3_device const*, int)+628) (BuildId: ec7b9034d259422289af1a300c889c42)
#10 pc 6f0c /vendor/lib64/hw/ (ExtensionModule::TeardownOverrideSession(camera3_device const*, unsigned long, void*)+724) (BuildId: ec7b9034d259422289af1a300c889c42)
#11 pc dae8 /vendor/lib64/hw/ (CamX::HALDevice::Close()+960) (BuildId: f3ec37ddca55cd2b52366606c94f3e2a)
#12 pc 13d0 /vendor/lib64/hw/ (CamX::close(hw_device_t*)+3928) (BuildId: f3ec37ddca55cd2b52366606c94f3e2a)
#13 pc 5e60 /vendor/lib64/hw/ (CamX::close(hw_device_t*)+336) (BuildId: f3ec37ddca55cd2b52366606c94f3e2a)
#14 pc d888 /vendor/lib64/@ (android::hardware::camera::device::V3_2::implementation::CameraDeviceSession::close()+248) (BuildId: 53ae9500ac4e483bad90ffa7a69dfc)
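As a rough sketch, the frames can be pulled out of a saved tombstone so that each pc can later be handed to addr2line. The helper name `list_frames` is hypothetical, and the pattern assumes the usual `#NN pc ADDR PATH ...` layout of backtrace lines in an intact tombstone.

```shell
#!/bin/sh
# list_frames (hypothetical helper): read tombstone text from stdin and print
# "frame pc path" for every backtrace line shaped like "#NN pc ADDR PATH ...".
list_frames() {
    awk '$1 ~ /^#[0-9]+$/ && $2 == "pc" { print substr($1, 2), $3, $4 }'
}
```

A typical call would be `list_frames < tombstone_xx`, where the file name is whatever was exported from the device.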

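2. Using the build information and the crash stack in the tombstone, find the .so that carries debug info (usually under the symbols directory of the same build, e.g. out/target/product/K81950AA1/symbols/), and use the signal value to make a first guess at the type of error. How do you tell whether a .so carries debug info?

On Linux, read it with the file command. If the output shows "not stripped", the library carries debug info.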

file out/target/product/K81950AA1/symbols/vendor/lib64/hw/
out/target/product/K81950AA1/symbols/vendor/lib64/hw/: ELF 64-bit LSB shared object, ARM aarch64, version 1 (SYSV), dynamically linked, BuildID[md5/uuid]=4e4e59619ceda634c1cd826fd2717bd8, not stripped
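The check can be scripted along these lines. This is a minimal sketch: `has_debug_info` is a hypothetical helper that only looks for the "not stripped" marker in text produced by `file`. Strictly speaking that marker means the symbol table is present, and newer versions of `file` may additionally print `with debug_info`.

```shell
#!/bin/sh
# has_debug_info (hypothetical helper): succeed when the text produced by
# `file <lib>` (passed as $1) reports the ELF as "not stripped".
has_debug_info() {
    case "$1" in
        *"not stripped"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, `has_debug_info "$(file "$so")" && echo "symbols present"`, with `$so` pointing at the library under the symbols directory.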

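Common signal meanings:

SIGABRT  6   Abort signal raised by abort(3); usually the code path reached an abort() call

SIGSEGV  11  Invalid memory reference: segmentation fault, memory problem, null pointer, etc.

For the meaning of other signals, see the reference.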
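For triage scripts, the two signals listed here can be turned into a small lookup. This is only a sketch: `signal_hint` is a hypothetical helper, and it covers nothing beyond the two entries described in this article.

```shell
#!/bin/sh
# signal_hint (hypothetical helper): map the number from the tombstone's
# "signal N (...)" line to the rough cause described in the article.
signal_hint() {
    case "$1" in
        6)  echo "SIGABRT: abort() was reached" ;;
        11) echo "SIGSEGV: invalid memory reference" ;;
        *)  echo "signal $1: not covered here" ;;
    esac
}
```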

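3. In the backtrace above, the pc values are relative addresses (relative to where the .so is mapped) and can be passed straight to addr2line, e.g. frame 5 (#05 pc ae9a8 /vendor/lib64/hw/). Sometimes the pc is an absolute address instead. How do we compute the relative address from an absolute one? (The example below is not from the same backtrace as the one above.)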

#0 pc 7591fab52c
#1 pc 750f5a2044
#2 pc 750f881e08
#3 pc 750f884f40
#4 pc 750f9e0c50
#5 pc 750fa072bc

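Dump the native maps (the tombstone file usually contains them; they can also be obtained before the process crashes with adb shell cat /proc/$pid/maps). From the pc values, frames #1 to #5 fall in: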

750f4e0000-750fada000 r-xp /system/lib64/
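Looking up the mapping by hand gets tedious with many frames, so the search can be sketched as below. `find_map` is a hypothetical helper, and it assumes lines in the /proc/$pid/maps layout (`START-END PERMS ...` with plain hex addresses). The memory map printed inside a tombstone may be formatted differently and would need adapting first.

```shell
#!/bin/sh
# find_map (hypothetical helper): read maps lines ("START-END PERMS ... PATH")
# from stdin and print the first line whose range contains the pc in $1.
# $1 is a hex string without the 0x prefix.
find_map() {
    pc=$((0x$1))
    while IFS= read -r line; do
        range=${line%% *}                 # "START-END"
        start=$((0x${range%-*}))
        end=$((0x${range#*-}))
        if [ "$pc" -ge "$start" ] && [ "$pc" -lt "$end" ]; then
            printf '%s\n' "$line"
            return 0
        fi
    done
    return 1                              # pc is not inside any listed range
}
```

Usage would look like `adb shell cat /proc/$pid/maps | find_map 750f5a2044`.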

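A normal .so is linked to load at address 0 by default, but some are linked at a different address, and the load address may differ between platforms. readelf can confirm this. readelf is located under prebuilts/gcc/linux-x86/arm/arm-linux-androideabi-4.X/bin. Running it on the library prints the program headers: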

Program Headers:
Type  Offset    VirtAddr  PhysAddr  FileSiz   MemSiz    Flg  Align
PHDR  0x000040  0x7040    0x7040    0x0001c0  0x0001c0  R    0x8
LOAD  0x000000  0x7000    0x7000    0x5eaf64  0x5eaf64  R E  0x1000
LOAD  0x5eb270  0x3270    0x3270    0x013318  0x016438  RW   0x1000

The first LOAD segment is loaded at 0x27000, so the library's load address is 0x27000.
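Reading the load address off the listing can be automated roughly as follows. `first_load_vaddr` is a hypothetical helper, and the sketch assumes the input comes from `readelf -lW <lib>`, where `-l` prints the program headers and `-W` keeps each header on one line.

```shell
#!/bin/sh
# first_load_vaddr (hypothetical helper): read `readelf -lW <lib>` output from
# stdin and print the VirtAddr column of the first LOAD program header.
first_load_vaddr() {
    awk '$1 == "LOAD" { print $3; exit }'
}
```

For example, `readelf -lW "$so" | first_load_vaddr`, using the readelf that matches the library's architecture.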

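Computing the relative address

relative address = pc - maps start address + .so load address

In the example above the load address is 0x27000, so for #1 pc 750f5a2044 the calculation is 750f5a2044 - 750f4e0000 + 0x27000 = 0xE9044.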
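The same arithmetic can be done with shell arithmetic expansion, which accepts hex constants. `rel_addr` is a hypothetical helper name. Its three arguments are the pc, the maps start address, and the load address, all as hex strings without the 0x prefix.

```shell
#!/bin/sh
# rel_addr (hypothetical helper): relative = pc - maps_start + load_addr.
# Arguments are hex strings without the 0x prefix; the result is printed in hex.
rel_addr() {
    printf '%x\n' $(( 0x$1 - 0x$2 + 0x$3 ))
}

# Frame #1 from the example: prints e9044
rel_addr 750f5a2044 750f4e0000 27000
```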

4. Resolve the address (addr2line comes in 32-bit and 64-bit builds; pick the one that matches the ABI of the library being resolved)

prebuilts/gcc/linux-x86/aarch64/aarch64-linux-android-4.9/bin/aarch64-linux-android-addr2line -Cfe out/target/product/K81950AA1/symbols/vendor/lib64/hw/ ae9a8
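addr2line accepts several addresses in one call, so all frames that belong to the same library can be resolved together. The wrapper below is a small sketch: `symbolize` is a hypothetical name, and the addr2line binary and the unstripped library are passed in by the caller. The options are the same as above: `-C` demangles C++ names, `-f` prints function names, and `-e` names the ELF file.

```shell
#!/bin/sh
# symbolize (hypothetical helper): resolve one or more relative addresses
# against a single unstripped library.
#   $1 = addr2line binary, $2 = library with debug info, rest = addresses
symbolize() {
    a2l=$1
    so=$2
    shift 2
    "$a2l" -C -f -e "$so" "$@"
}
```

A call might look like `symbolize "$A2L" "$SO" ae9a8 14fc d444`, where `$A2L` and `$SO` are placeholders for the toolchain path and the library under the symbols directory.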

Debugging Tips

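When a process hangs, for example when the camera is stuck on a black screen, we often kill it deliberately during debugging with adb shell kill -6 $PID to obtain a tombstone. The information can also be obtained with adb shell debuggerd -b $PID.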

Before killing the process, it is best to first collect its memory, fd, and other information. Here is a script for that:

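#!/bin/sh
# Collect memory and fd information of the target process before killing it.

logpath=$1
if [ -z "$logpath" ]
then
    logpath=$(pwd)"/MemoryDebug"
    echo $(pwd)
fi

adb root
adb wait-for-device

# need to close selinux, otherwise the dumps fail
adb shell setenforce 0

# The process name is truncated in the source and the pid lookup command is
# missing; pidof is an assumption here.
pid=`adb shell pidof er@2.4-service_64`
echo "dump pid="${pid}

time=$logpath"/pid"${pid}"_"$(date +%Y%m%d%H%M%S)
echo "logpath:"$time
mkdir -p ${time}

# The output file names after $time/ are truncated in the source.
# get maps and smaps info
adb shell cat /proc/$pid/maps > $time/
adb shell cat /proc/$pid/smaps > $time/
adb shell "pidof er@2.4-service_64 | xargs pmap -x" > $time/

# get process fd info
adb shell ls -a -l /proc/$pid/fd > $time/

# get all process info before dump
adb shell ps -AT > $time/all_

# get process file state (thread info)
adb shell lsof -p $pid > $time/$pid"_process_"

# get current meminfo before dump
adb shell dumpsys meminfo $pid > $time/$pid"_"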

#

>$time/
