NE Problem Analysis: Native Stack Restoration and Debugging Tips
Introduction
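NE stands for Native Exception. Here it mainly means errors in Android C/C++ programs. Because the Camera HAL is implemented in C/C++, NE problems come up often in camera system development. They have many causes, such as null pointers, memory corruption (memory stomping), FD leaks, and out-of-bounds array accesses. When one occurs, the kernel delivers a signal to the faulting userspace process. Before the process dies, Android's crash handling (the debuggerd handler, crash_dump and the tombstoned daemon) captures its backtrace, memory map and other information, saves it to /data/tombstones/tombstone_xx, and also writes the tombstone to logcat. On some platforms, with the right configuration, the whole crashed process can be saved as a coredump file, which can then be debugged with Trace32 or GDB.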
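This article focuses on native stack restoration: using the NE report to locate the faulting code. The tool used is addr2line.
Note: with memory corruption (memory stomping), the reported crash location is often not where the corruption happened, so such problems usually need dedicated tools to track down.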
Native Stack Restoration
1. After catching an NE, first pull the tombstone file /data/tombstones/tombstone_xx off the device. Example:
************************************************
Build fingerprint: 'XXXX/XX/XX:10/QKQ1.200412.002/.20200611.122340:user/release-keys'
Revision: '0'
ABI: 'arm64'
Timestamp: 2020-06-13 11:18:11+0800
pid: 13260, tid: 13260, name: provider@2.4-  >>> /vendor/bin/hw/er@2.4-service_64 <<<
uid: 1047
signal 6 (SIGABRT), code 0 (SI_USER from pid 4396, uid 0), fault addr --------
xfd7b20x10089x200000000fffffffex30000
x40000x500000000ffffffffx600000000ffffffffx7000000716d685000
x80062x90089x100009x110000
xcf54b47xceec98dx140000x150482
x1600000071f170a950x1700000071f1695320x18af530x1900000000fffffffe
x200000x21fd7b20x220089x2300000071f30a2188
x2400000071f1db9020x250002x26000000716d686000x2792c8
x280002x290000007ff862f280
sp 0000007ff862f220  lr 00000071f16988ac  pc 00000071f169533c
backtrace:
#00 pc 033c  /apex/e/lib64/bionic/ (syscall+28) (BuildId: 778f9db29d872fa660c03bee8d69f746)
#01 pc 38a8  /apex/e/lib64/bionic/ (__futex_wait_ex(void volatile*, bool, int, bool, timespec const*)+140) (BuildId: 778f9db29d872fa660c03bee8d69f746)
#02 pc e7a98  /apex/e/lib64/bionic/ (NonPI::MutexLockWithTimeout(pthread_mutex_internal_t*, bool, timespec const*)+596) (BuildId: 778f9db29d872fa660c03bee8d69f746)
#03 pc d444  /vendor/lib64/hw/ (CamX::Mutex::Lock()+116) (BuildId: f3ec37ddca55cd2b52366606c94f3e2a)
#04 pc 14fc  /vendor/lib64/hw/ (CamX::Session::Destroy()+572) (BuildId: f3ec37ddca55cd2b52366606c94f3e2a)
#05 pc ae9a8  /vendor/lib64/hw/ (CamX::ChiContext::DestroySession(CamX::CHISession*)+40) (BuildId: f3ec37ddca55cd2b52366606c94f3e2a)
#06 pc 341c  /vendor/lib64/hw/ (Session::Destroy(int)+84) (BuildId: ec7b9034d259422289af1a300c889c42)
#07 pc b606c  /vendor/lib64/hw/ (UcaMultiCamera::Destroy(int)+1660) (BuildId: ec7b9034d259422289af1a300c889c42)
#08 pc 3e64  /vendor/lib64/hw/ (Uca::DestroyObject(int)+732) (BuildId: ec7b9034d259422289af1a300c889c42)
#09 pc 7a34  /vendor/lib64/hw/ (ExtensionModule::TeardownOverrideUca(camera3_device const*, int)+628) (BuildId: ec7b9034d259422289af1a300c889c42)
#10 pc 6f0c  /vendor/lib64/hw/ (ExtensionModule::TeardownOverrideSession(camera3_device const*, unsigned long, void*)+724) (BuildId: ec7b9034d259422289af1a300c889c42)
#11 pc dae8  /vendor/lib64/hw/ (CamX::HALDevice::Close()+960) (BuildId: f3ec37ddca55cd2b52366606c94f3e2a)
#12 pc 13d0  /vendor/lib64/hw/ (CamX::close(hw_device_t*)+3928) (BuildId: f3ec37ddca55cd2b52366606c94f3e2a)
#13 pc 5e60  /vendor/lib64/hw/ (CamX::close(hw_device_t*)+336) (BuildId: f3ec37ddca55cd2b52366606c94f3e2a)
#14 pc d888  /vendor/lib64/@ (android::hardware::camera::device::V3_2::implementation::CameraDeviceSession::close()+248) (BuildId: 53ae9500ac4e483bad90ffa7a69dfc)
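Once the tombstone is on the host (for example via adb pull after adb root), the frame lines are what the following steps work on. The sketch below prints the frame number, relative pc and library path of every frame. It assumes the usual frame layout `#NN pc ADDRESS  /path/lib.so (symbol+offset) (BuildId: ...)`; the function name `list_frames` is our own, not part of any Android tool.

```shell
#!/bin/sh
# list_frames FILE: print "frame pc library" for each backtrace frame.
# Assumes frame lines look like "#NN pc ADDRESS  /path/lib.so (...)".
list_frames() {
    # Keep lines whose first field is "#<digits>" and second field is "pc".
    awk '$1 ~ /^#[0-9]+$/ && $2 == "pc" { print $1, $3, $4 }' "$1"
}
```

Usage: `list_frames tombstone_00`. Frames that were wrapped over several lines are still picked up, because only the first line of a frame carries the pc and path.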
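2. Using the build information in the tombstone (Build fingerprint) and the libraries in the backtrace, find the unstripped .so files. They are usually in the symbols directory of the same build, e.g. out/target/product/K81950AA1/symbols/. The signal number gives a first indication of the error type. A library only matches the crashing one if its BuildID equals the BuildId shown for it in the backtrace. How do you check that a .so still has its symbols? On Linux, run the file command: "not stripped" in the output means the symbols were kept.
file out/target/product/K81950AA1/symbols/vendor/lib64/hw/
out/target/product/K81950AA1/symbols/vendor/lib64/hw/: ELF 64-bit LSB shared object, ARM aarch64, version 1 (SYSV), dynamically linked, BuildID[md5/uuid]=4e4e59619ceda634c1cd826fd2717bd8, not stripped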
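Both checks from step 2 can be scripted on the text that `file` prints. A minimal sketch, assuming the output format shown above (`..., BuildID[md5/uuid]=<hex>, not stripped`); the helper names `is_unstripped` and `file_build_id` are hypothetical.

```shell
#!/bin/sh
# is_unstripped OUTPUT: succeed if the `file` output reports "not stripped".
is_unstripped() {
    case "$1" in
        *", not stripped"*) return 0 ;;
        *) return 1 ;;
    esac
}

# file_build_id OUTPUT: print the BuildID hex string from the `file` output,
# to compare with the "BuildId:" of the same library in the tombstone.
file_build_id() {
    printf '%s\n' "$1" | sed -n 's/.*BuildID\[[^]]*]=\([0-9a-fA-F]*\).*/\1/p'
}
```

Usage: `out=$(file "$so"); is_unstripped "$out" && file_build_id "$out"`, then compare the printed ID with the BuildId in the backtrace frame for that library.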
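Common signals:
SIGABRT (6): termination requested by abort(3); usually the code logic reached an abort() call. Note that in the tombstone above, code 0 (SI_USER from pid 4396) means the SIGABRT was sent by another process with kill(), not raised by abort().
SIGSEGV (11): invalid memory reference: segmentation fault, memory corruption, null pointer, etc.
For the meaning of other signals, see the signal(7) man page.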
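The first-pass classification can be read straight from the tombstone's `signal N (SIGxxx), code ...` line. This sketch assumes that line format and Linux signal numbering; `tombstone_signal` and `describe_signal` are illustrative names, and the descriptions only restate the list above.

```shell
#!/bin/sh
# tombstone_signal FILE: print N from the first "signal N (SIGxxx), ..." line.
tombstone_signal() {
    sed -n 's/^ *signal \([0-9][0-9]*\) .*/\1/p' "$1" | head -n 1
}

# describe_signal N: first-pass reading of a signal number (Linux numbering).
describe_signal() {
    case "$1" in
        6)  echo "SIGABRT: abort() was reached, or SIGABRT was sent, e.g. by kill -6" ;;
        11) echo "SIGSEGV: invalid memory reference (null pointer, memory corruption, etc.)" ;;
        *)  echo "SIG$(kill -l "$1"): see signal(7)" ;;
    esac
}
```

Usage: `describe_signal "$(tombstone_signal tombstone_00)"`. For SIGABRT, also check the `code` field as noted above before concluding that abort() was called.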
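3. In the backtrace above, the pc values are relative addresses (relative to the library's mapping) and can be passed to addr2line directly, e.g. frame #05 (#05 pc ae9a8 /vendor/lib64/hw/). Sometimes, however, the pc is an absolute address. How do we get the relative address from it? (The example below is not from the same backtrace as the one above.)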
#0 pc 7591fab52c
#1 pc 750f5a2044
#2 pc 750f881e08
#3 pc 750f884f40
#4 pc 750f9e0c50
#5 pc 750fa072bc
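From the native memory maps (usually included in the tombstone; before the process crashes they can also be read with adb shell cat /proc/$pid/maps), the pc values of frames #1 to #5 fall in:
750f4e0000-750fada000 r-xp /system/lib64/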
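Finding the mapping for a pc by eye is error-prone with long maps files. The sketch below does the range check with shell arithmetic, assuming the `/proc/<pid>/maps` line format (`start-end perms offset dev inode path`, hex without 0x) and a shell with 64-bit arithmetic such as bash or dash; lines that are not plain hex ranges are skipped. `find_mapping` is a hypothetical helper name.

```shell
#!/bin/sh
# find_mapping PC MAPS_FILE: print every maps line whose range contains PC.
# PC is hex, with or without a 0x prefix.
find_mapping() (
    pc=${1#0x}
    while read -r range perms rest; do
        case $range in
            *[!0-9a-fA-F-]*) continue ;;       # not a plain "start-end" hex range
            [0-9a-fA-F]*-[0-9a-fA-F]*) ;;
            *) continue ;;
        esac
        start=${range%-*}
        end=${range#*-}
        # maps ranges are half-open: start <= pc < end
        if [ $((0x$pc >= 0x$start && 0x$pc < 0x$end)) -eq 1 ]; then
            printf '%s %s %s\n' "$range" "$perms" "$rest"
        fi
    done < "$2"
)
```

Usage: `find_mapping 750f5a2044 maps.txt`. The printed line gives the mapping start address needed for the calculation below.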
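Normally the first LOAD segment of a .so has virtual address 0, but some libraries are linked at a non-zero address, and this can differ between platforms; readelf confirms it. readelf is in prebuilts/gcc/linux-x86/arm/arm-linux-androideabi-4.X/bin. Running readelf -l on the library prints its program headers: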
Program Headers:
  Type  Offset    VirtAddr  PhysAddr  FileSiz   MemSiz    Flg  Align
  PHDR  0x000040  0x27040   0x27040   0x0001c0  0x0001c0  R    0x8
  LOAD  0x000000  0x27000   0x27000   0x5eaf64  0x5eaf64  R E  0x1000
  LOAD  0x5eb270  0x3270    0x3270    0x013318  0x016438  RW   0x1000
The first LOAD segment is at 0x27000, so the library's load address is 0x27000.
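Reading the first LOAD address can be automated. A minimal sketch, assuming the program-header listing from `readelf -lW` (wide output, one header per line, VirtAddr in the third column) is piped in; `first_load_vaddr` is a hypothetical helper.

```shell
#!/bin/sh
# first_load_vaddr: read `readelf -lW LIB` output on stdin and
# print the VirtAddr of the first LOAD program header.
first_load_vaddr() {
    awk '$1 == "LOAD" { print $3; exit }'
}
```

Usage: `readelf -lW libfoo.so | first_load_vaddr`, where `libfoo.so` stands for the library from the maps line.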
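Compute the relative address:
relative address = pc - maps start address + .so load address
In the example above the load address is 0x27000, so for #1 pc 750f5a2044: 0x750f5a2044 - 0x750f4e0000 + 0x27000 = 0xE9044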
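The same arithmetic as a shell function, useful when many frames need converting. The formula above assumes the executable mapping starts at file offset 0 and the pc lies in the first LOAD segment. As an assumption-labelled generalization, the optional MAP_OFFSET (offset column of the maps line) and LOAD_OFFSET (Offset of the LOAD segment covering that mapping) terms extend it to mappings with a non-zero file offset; with both left at 0 it is exactly the formula above. `rel_pc` is a hypothetical name.

```shell
#!/bin/sh
# rel_pc PC MAP_START LOAD_VADDR [MAP_OFFSET [LOAD_OFFSET]]
# All arguments are hex, with or without 0x. Prints the ELF-relative address.
rel_pc() (
    pc=${1#0x}; start=${2#0x}; vaddr=${3#0x}
    moff=${4:-0}; moff=${moff#0x}
    loff=${5:-0}; loff=${loff#0x}
    # pc - start: offset inside the mapping; + moff: file offset;
    # + vaddr - loff: convert a file offset in that segment to a virtual address.
    printf '0x%x\n' $((0x$pc - 0x$start + 0x$moff + 0x$vaddr - 0x$loff))
)
```

Usage: `rel_pc 750f5a2044 750f4e0000 0x27000` reproduces the worked example and prints `0xe9044`.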
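4. Resolve the address with addr2line. addr2line comes in 32-bit and 64-bit toolchain variants; choose the one that matches the ABI of the library being resolved. -C demangles C++ names, -f also prints the function name, and -e names the library to read symbols from:
prebuilts/gcc/linux-x86/aarch64/aarch64-linux-android-4.9/bin/aarch64-linux-android-addr2line -Cfe out/target/product/K81950AA1/symbols/vendor/lib64/hw/ ae9a8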
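To resolve a whole backtrace rather than one address, the addr2line call can be run per frame. This is a sketch under stated assumptions: the frames use the layout shown in step 1, their pc values are already relative, and each library path in the backtrace also exists under the symbols directory. `symbolize_tombstone`, `ADDR2LINE` and `SYMBOLS_DIR` are hypothetical names chosen for the example.

```shell
#!/bin/sh
# symbolize_tombstone FILE: run addr2line on every backtrace frame.
# ADDR2LINE:   the addr2line matching the libraries' ABI (set by the caller)
# SYMBOLS_DIR: the symbols directory of the same build (set by the caller)
symbolize_tombstone() (
    : "${ADDR2LINE:?set ADDR2LINE to the addr2line for the library ABI}"
    : "${SYMBOLS_DIR:?set SYMBOLS_DIR to the build symbols directory}"
    awk '$1 ~ /^#[0-9]+$/ && $2 == "pc" { print $1, $3, $4 }' "$1" |
    while read -r frame pc lib; do
        echo "$frame $lib"
        # same flags as the manual command: demangle, function names, file
        "$ADDR2LINE" -Cfe "$SYMBOLS_DIR$lib" "$pc"
    done
)
```

Usage: `ADDR2LINE=prebuilts/gcc/linux-x86/aarch64/aarch64-linux-android-4.9/bin/aarch64-linux-android-addr2line SYMBOLS_DIR=out/target/product/K81950AA1/symbols symbolize_tombstone tombstone_00`. Frames whose library is missing from the symbols directory will simply make addr2line report an error for that frame.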
Debugging Tips
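When a process hangs, for example when the camera freezes on a black screen, we often kill it deliberately with adb shell kill -6 $PID during debugging to get a tombstone. adb shell debuggerd -b $PID also prints the backtraces of all threads, and it does not kill the process.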
Before killing the process, it is best to collect its memory and fd information first. Here is a script for that:
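#!/bin/sh
# Usage: sh <this script> LOGPATH PROCESS_NAME
#   LOGPATH: output directory ("" means ./MemoryDebug)
#   PROCESS_NAME: the service to inspect, as in the tombstone's "name:" line
logpath=$1
procname=$2
if [ -z "$logpath" ]
then
    logpath=$(pwd)"/MemoryDebug"
    echo "$(pwd)"
fi
if [ -z "$procname" ]
then
    echo "usage: $0 LOGPATH PROCESS_NAME" >&2
    exit 1
fi
adb root
adb wait-for-device
# SELinux must be permissive (setenforce 0), otherwise some dumps fail
adb shell setenforce 0
pid=$(adb shell pidof "$procname" | tr -d '\r')
if [ -z "$pid" ]
then
    echo "process $procname not found" >&2
    exit 1
fi
echo "dump pid=${pid}"
outdir=$logpath"/pid"${pid}"_"$(date +%Y%m%d%H%M%S)
echo "logpath: $outdir"
mkdir -p "$outdir"
# get maps and smaps info (output file names below are illustrative)
adb shell cat /proc/$pid/maps > "$outdir/maps.txt"
adb shell cat /proc/$pid/smaps > "$outdir/smaps.txt"
adb shell pmap -x $pid > "$outdir/pmap.txt"
# get process fd info
adb shell ls -a -l /proc/$pid/fd > "$outdir/fd.txt"
# get all process and thread info before the dump
adb shell ps -AT > "$outdir/all_ps.txt"
# get the process's open-file state
adb shell lsof -p $pid > "$outdir/${pid}_process_lsof.txt"
# get current meminfo before the dump
adb shell dumpsys meminfo $pid > "$outdir/${pid}_meminfo.txt"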
Published: 2023-01-04 04:54:52
Link: http://www.wtabcd.cn/fanwen/fan/90/88723.html