
[Summary] 9i RAC Instance Crash due to error 29702 -- Bug 4390716

Updated: 2023-07-11 19:09:19

[Summary] 9i RAC LMON: terminating instance due to error 29702

---------- Bug 4390716: resolution process

Compiled by: 王琦

Date: 2008-02-18

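Basic configuration:

Linux AS3.0

Kernel version: 2.4.21-37.ELsmp

Oracle 9.2.0.4 upgraded to Oracle 9.2.0.7, RAC, two nodes. The clusterware was installed from the 9.2.0.4 software.

Later investigation showed that when an Oracle 9.2.0.4 RAC system is upgraded to Oracle 9.2.0.7, the Oracle RDBMS software can be upgraded to 9.2.0.7, but the 9.2.0.7 patchset contains no upgraded version of the ORACM cluster manager. This is an Oracle bug (9.2.0.7.0 Bug 4163445). The 9.2.0.4 clusterware software (ORACM) can only be upgraded from the Oracle 9.2.0.6 patchset. (The Oracle CM log shows that the ORACM installed with 9.2.0.4 is oracm 9.2.0.2.0; the 9.2.0.7 patchset does not upgrade ORACM, while the 9.2.0.6 patchset upgrades it to oracm 9.2.0.6.0.52.) Note that every upgrade step must strictly follow the README, although Oracle's README does not necessarily cover everything, and this problem is one example.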

www.itpub/viewthread.php?tid=922265&extra=&highlight=%2Btolywang&page=3

(9.2.0.7.0 Bug 4163445)

Problem description:

The problem is as follows (the instances on node 1 and node 2 crashed alternately, roughly once every 5 to 8 days):

alert_orcl1.log

-------------------------------------------------------------------------------------------------

Sat Jan 5 18:44:19 2008

ARC1: Evaluating archive log 1 thread 1 sequence 122

ARC1: Beginning to archive log 1 thread 1 sequence 122

Creating archive destination LOG_ARCHIVE_DEST_1: '/ocfs_arch1/orcl/1_122.dbf'

ARC1: Completed archiving log 1 thread 1 sequence 122

Sat Jan 5 19:36:06 2008

Thread 1 advanced to log sequence 124

Current log# 4 seq# 124 mem# 0: /ocfs_ctrl_redo/orcl/redo04.log

Current log# 4 seq# 124 mem# 1: /ocfs_data/orcl/redo04b.log

Sat Jan 5 19:36:06 2008

ARC1: Evaluating archive log 3 thread 1 sequence 123

ARC1: Beginning to archive log 3 thread 1 sequence 123

Creating archive destination LOG_ARCHIVE_DEST_1: '/ocfs_arch1/orcl/1_123.dbf'

ARC1: Completed archiving log 3 thread 1 sequence 123

Sat Jan 5 19:45:15 2008

Errors in file /u01/product/admin/orcl/bdump/orcl1_:

ORA-29702: error occurred in Cluster Group Service operation

Sat Jan 5 19:45:15 2008

LMON: terminating instance due to error 29702

Sat Jan 5 19:45:16 2008

System state dump is made for local instance

Sat Jan 5 19:45:20 2008

Instance terminated by LMON, pid = 14214

Sat Jan 5 19:54:53 2008

Starting ORACLE instance (normal)

Sat Jan 5 19:54:53 2008

Global Enqueue Service Resources = 26694, pool = 4

Sat Jan 5 19:54:53 2008

Global Enqueue Service Enqueues = 39350

LICENSE_MAX_SESSION = 0

LICENSE_SESSIONS_WARNING = 0

SCN scheme 2

Using log_archive_dest parameter default value

LICENSE_MAX_USERS = 0

SYS auditing is disabled

Starting up ORACLE RDBMS Version: 9.2.0.7.0.

System parameters with non-default values:

processes = 1000

timed_statistics = FALSE

resource_limit = TRUE

shared_pool_size = 419430400

large_pool_size = 33554432

java_pool_size = 33554432
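To check how often the instance is terminated (the post reports a crash every 5 to 8 days), the alert log can be scanned for LMON termination entries. The following is only a minimal sketch: it assumes timestamp lines in the form shown in the excerpt above ("Sat Jan 5 19:45:15 2008") and an English locale, and the function names are made up for illustration.

```python
from datetime import datetime

# Timestamp layout as it appears in the alert log excerpt (assumed to be stable).
TS_FORMAT = "%a %b %d %H:%M:%S %Y"
MARKER = "LMON: terminating instance due to error"


def parse_timestamp(line):
    """Return a datetime if the line is an alert-log timestamp, else None."""
    try:
        return datetime.strptime(line.strip(), TS_FORMAT)
    except ValueError:
        return None


def find_lmon_terminations(lines):
    """Yield (timestamp, message) for each LMON termination entry.

    The timestamp is the most recent timestamp line seen before the entry.
    """
    last_ts = None
    for line in lines:
        ts = parse_timestamp(line)
        if ts is not None:
            last_ts = ts
        elif MARKER in line:
            yield last_ts, line.strip()


# Sample lines copied from the alert log excerpt in this post.
sample = """\
Sat Jan 5 19:45:15 2008
Errors in file /u01/product/admin/orcl/bdump/orcl1_:
ORA-29702: error occurred in Cluster Group Service operation
Sat Jan 5 19:45:15 2008
LMON: terminating instance due to error 29702
Sat Jan 5 19:45:20 2008
Instance terminated by LMON, pid = 14214
""".splitlines()

events = list(find_lmon_terminations(sample))
for ts, msg in events:
    print(ts.isoformat(sep=" "), msg)
```

Run against the full alert log of each node, the differences between consecutive timestamps give the interval between crashes.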

$ vi /u01/product/admin/orcl/bdump/orcl1_

=============

/u01/product/admin/orcl/bdump/orcl1_

Oracle9i Enterprise Edition Release 9.2.0.7.0 - Production

With the Partitioning, Real Application Clusters, OLAP and Oracle Data Mining options

JServer Release 9.2.0.7.0 - Production

ORACLE_HOME = /u01/product/oracle

System name: Linux

Node name: DELL-RAC01

Release: 2.4.21-37.ELsmp

Version: #1 SMP Wed Sep 7 13:28:55 EDT 2005

Machine: i686

Instance name: orcl1

Redo thread mounted by this instance: 0 <none>

Oracle process number: 4

Unix process pid: 14214, image: oracle@DELL-RAC01 (LMON)

*** SESSION ID:(3.1) 2007-12-31 12:07:45.591

GES IPC: Receivers 3 Senders 3

GES IPC: Buffers Receive 1000 Send (i:2230 b:2230) Reserve 1000

GES IPC: Msg Size Regular 396 Batch 2048

Batch msg size = 2048

Batching factor: enqueue replay 48, ack 53

Batching factor: cache replay 34 size per lock 56

kjxggin: receive buffer size = 32768

kjxgmin: SKGXN ver (2 1 Oracle 9i Reference CM)

CMCLI WARNING: CMInitContext: init ctx(0xb6d93f8)

*** 2007-12-31 12:07:49.396

kjxgmrcfg: Reconfiguration started, reason 1

kjxgmcs: Setting state to 0 0.

*** 2007-12-31 12:07:49.396

Name Service frozen

kjxgmcs: Setting state to 0 1.

kjfcpiora: publish my weight 122787

kjxgmps: proposing substate 2

kjxgmcs: Setting state to 1 2.

Performed the unique instance identification check

kjxgmps: proposing substate 3

kjxgmcs: Setting state to 1 3.

Name Service recovery started

Deleted all dead-instance name entries

kjxgmps: proposing substate 4

kjxgmcs: Setting state to 1 4.

Multicasted all local name entries for publish

Replayed all pending requests

kjxgmps: proposing substate 5

kjxgmcs: Setting state to 1 5.

Name Service normal

Name Service recovery done

*** 2007-12-31 12:07:49.611

kjxgmps: proposing substate 6

kjxgmcs: Setting state to 1 6.

*** 2007-12-31 12:07:49.832

Reconfiguration started (old inc 0, new inc 1)

Synchronization timeout interval: 600 sec

List of nodes:

Global Resource Directory frozen

node 0

release 9 2 0 7

* kjshashcfg: I'm the only node in the cluster (node 0)

Active Sendback Threshold = 50 %

Communication channels reestablished

Master broadcasted resource hash value bitmaps

Non-local Process blocks cleaned out

Resources and enqueues cleaned out

Resources remastered 0

0 GCS shadows traversed, 0 cancelled, 0 closed

0 GCS resources traversed, 0 cancelled

set master node info

Submitted all remote-enqueue requests

Update rdomain variables

Dwn-cvts replayed, VALBLKs dubious

All grantable enqueues granted

*** 2007-12-31 12:07:50.121

0 GCS shadows traversed, 0 replayed, 0 unopened

Submitted all GCS cache requests

0 write requests issued in 0 GCS resources

0 PIs marked suspect, 0 flush PI msgs

ORACM log entries at the time: ERROR: WriteEventPort: write failed with error 32

------------------------------------------------------------

Debug Hang :ClientProcListener (PID=14257) UnRegistered with watchdog daemon. {Sat Jan 5 19:45:16 2008 }

>WARNING: ReadCommPort: socket closed by peer on recv()., tid = ClientProcListener:688145 file = unixinc.c, line = 767 {Sat Jan 5 19:45:16 2008 }

>ERROR: WriteEventPort: write failed with error 32., tid = ClientProcListener:688145 file = unixinc.c, line = 915 {Sat Jan 5 19:45:16 2008 }

Debug Hang :ClientProcListener (PID=14261) UnRegistered with watchdog daemon. {Sat Jan 5 19:45:16 2008 }

>WARNING: ReadCommPort: socket closed by peer on recv()., tid = ClientProcListener:622615 file = unixinc.c, line = 767 {Sat Jan 5 19:45:16 2008 }

Debug Hang :ClientProcListener (PID=14255) UnRegistered with watchdog daemon. {Sat Jan 5 19:45:16 2008 }

>WARNING: ReadCommPort: socket closed by peer on recv()., tid = ClientProcListener:557077 file = unixinc.c, line = 767 {Sat Jan 5 19:45:16 2008 }

Diag trace log :

/u01/product/admin/orcl/bdump/orcl2_

Oracle9i Enterprise Edition Release 9.2.0.7.0 - Production

With the Partitioning, Real Application Clusters, OLAP and Oracle Data Mining options

JServer Release 9.2.0.7.0 - Production

ORACLE_HOME = /u01/product/oracle

System name: Linux

Node name: DELL-RAC02

Release: 2.4.21-37.ELsmp

Version: #1 SMP Wed Sep 7 13:28:55 EDT 2005

Machine: i686

Instance name: orcl2

Redo thread mounted by this instance: 0 <none>

Oracle process number: 3

Unix process pid: 14211, image: oracle@DELL-RAC02 (DIAG)

*** SESSION ID:(2.1) 2008-01-16 12:16:14.524

CMCLI WARNING: CMInitContext: init ctx(0xb9115f4)

kjzcprt:rcv port created

Node id: 1

List of nodes: 0, 1,

*** 2008-01-16 12:16:14.526

Reconfiguration starts [incarn=0]

I'm the voting node

Send my bitmap to master 0

Rcfg confirmation is received from master 0

I agree with the rcfg confirmation

*** 2008-01-16 12:16:25.233

Reconfiguration completes [incarn=2]

*** 2008-01-19 04:50:21.933

Instance is terminating by process 14215 [ospid=oracle@DELL-RAC02 (LMON)]

Performing diagnostic data dump for this instance

CMCLI WARNING: CommonContextCleanup: closing comm port

DIAG detachs from CM

error 29723 detected in background process

OPIRIP: Uncaught error 447. Error stack:

ORA-00447: fatal error in background process

ORA-29723: Failed to attach to the global enqueue service (status=32)

Judging from the error description on MetaLink, the cause appears to be that libskgxn9.so is not identical on the two instances of the RAC environment.
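One way to test that suspicion is to compare checksums of the libskgxn9.so copies from both nodes. The sketch below is an illustration only, not a procedure from MetaLink: it assumes the library has already been copied from each node to local files, and the file names and stand-in contents in the demo are hypothetical.

```python
import hashlib
import os
import tempfile


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def libraries_match(paths):
    """Return (all_equal, {path: digest}) for the given library copies."""
    digests = {p: file_digest(p) for p in paths}
    return len(set(digests.values())) == 1, digests


# Demo with stand-in files; real use would pass the copies of libskgxn9.so
# collected from node 1 and node 2 (hypothetical local file names).
with tempfile.TemporaryDirectory() as tmp:
    node1 = os.path.join(tmp, "node1_libskgxn9.so")
    node2 = os.path.join(tmp, "node2_libskgxn9.so")
    with open(node1, "wb") as f:
        f.write(b"stand-in bytes for the node 1 library")
    with open(node2, "wb") as f:
        f.write(b"stand-in bytes for the node 2 library")

    same, digests = libraries_match([node1, node2])
    print("identical" if same else "MISMATCH")
    for path, digest in digests.items():
        print(os.path.basename(path), digest)

    # Sanity check: a file always matches itself.
    identical_to_self, _ = libraries_match([node1, node1])
```

A mismatch would only show that the two copies differ; which ORACM version each copy belongs to still has to be read from the CM log.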

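Resolution:

1. Because this was an upgrade from Oracle 9.2.0.4 to Oracle 9.2.0.7, and the 9.2.0.7 patchset contains only the RDBMS software with no upgraded ORACM, ORACM must additionally be upgraded from oracm 9.2.0.2 to oracm 9.2.0.6.0.52 using the 9.2.0.6 patchset. Be sure to follow the README strictly.

2. After upgrading the Oracle RDBMS and ORACM 9.2.0.6, scripts such as catproc.sql still have to be run to update the data dictionary; these steps are all listed in the README.

3. Some bugs are not published and cannot be found through Google or Baidu; they can only be seen on MetaLink.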
