Wednesday, March 12, 2014

ERROR: sending message to EXTRACT EFLE01 (Timeout waiting for message).

As mentioned in my earlier posts, this is one of the issues that I have faced with GoldenGate 11.

I had three groups in this GoldenGate environment to migrate an 11gR1 database on HP to 11gR2 on Linux.
One group consistently failed with this "Timeout waiting for message" error a few minutes after it was started.

As you can see below, the Time Since Chkpt is over 18 hours!

Program     Status      Group       Lag at Chkpt  Time Since Chkpt

MANAGER     RUNNING
EXTRACT     RUNNING     EFLE01      00:00:01      18:50:04
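The symptom above is a group that still reports RUNNING while Time Since Chkpt keeps growing. It can be flagged automatically instead of spotted by eye. The Python sketch below is only a detector, not a fix. It assumes the column layout of the GGSCI INFO ALL output shown above. The function names and the 600-second threshold are hypothetical choices for illustration.

```python
import re

# Assumed layout of an EXTRACT/REPLICAT row in GGSCI "INFO ALL" output, e.g.
# "EXTRACT     RUNNING     EFLE01      00:00:01      18:50:04"
ROW = re.compile(
    r"^(?P<program>EXTRACT|REPLICAT)\s+(?P<status>\S+)\s+(?P<group>\S+)"
    r"\s+(?P<lag>\d+:\d{2}:\d{2})\s+(?P<since>\d+:\d{2}:\d{2})\s*$"
)


def to_seconds(hms: str) -> int:
    """Convert an HH:MM:SS field (hours may exceed 24) to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def suspect_hung_groups(info_all: str, max_since_chkpt: int = 600) -> list:
    """Return groups that report RUNNING but have not checkpointed recently.

    max_since_chkpt is a hypothetical threshold in seconds; tune it for
    your own environment.
    """
    suspects = []
    for line in info_all.splitlines():
        match = ROW.match(line.strip())
        # Skip headers, MANAGER lines, and groups that are not RUNNING.
        if not match or match["status"] != "RUNNING":
            continue
        if to_seconds(match["since"]) > max_since_chkpt:
            suspects.append(match["group"])
    return suspects


if __name__ == "__main__":
    sample = """
Program     Status      Group       Lag at Chkpt  Time Since Chkpt

MANAGER     RUNNING
EXTRACT     RUNNING     EFLE01      00:00:01      18:50:04
"""
    print(suspect_hung_groups(sample))
```

Fed the INFO ALL output shown above, it returns `['EFLE01']`. The low lag combined with a huge Time Since Chkpt is what singles the group out.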

So when you try to get the stats, you get the timeout error:

GGSCI () 9> stats efle01, status

Sending STATS request to EXTRACT EFLE01 ...

ERROR: sending message to EXTRACT EFLE01 (Timeout waiting for message).

I tried to STOP the extract and got the same error:

GGSCI () 10> stop efle01

Sending STOP request to EXTRACT EFLE01 ...

ERROR: sending message to EXTRACT EFLE01 (Timeout waiting for message).

Basically, this group is hung, and the only way to stop it is to kill it (kill EFLE01) and restart it. That is not an acceptable workaround.

So I reached out to Oracle and applied a patch to GG 11 to get to 11...17. That caused some more trouble (covered in a previous post) and did not resolve this hang with "Timeout waiting for message".
After going back and forth on the Oracle SR, it was finally discovered that this is a bug in the Oracle release we are using (11.1.0.7), and that we would have to apply patch 16320411.

I was not happy with that solution. We are in the process of migrating from 11.1 on HP to 11.2 on Linux, and applying a patch to production right before that is not a convincing solution.

So I looked for a way around the issue that would let GoldenGate replication continue without applying the patch.

This group (Group #1) had only two tables, and both tables have a LOB column and use partitions and subpartitions.

So I decided to split these two tables into separate groups and try that. My reasoning was that whenever this group hung, I could see the GoldenGate session in Oracle sitting on the wait event "SQL*Net vector data to client".

That was it. No more "Timeout waiting for message" errors, and just yesterday we successfully completed the migration to Linux.
