Thursday, March 27, 2014

Wait Event - cell physical IO interconnect bytes - in 11g (11.2.0.3)

I was working on a query performance issue and noticed this wait event "cell physical IO interconnect bytes".

Reading more on this tells me that this event is meant for Exadata. But our database is not on Exadata, so why am I seeing this event?!

Well, it sounds like a bug to me, as the value reported for this event is exactly the same as "physical read bytes".
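
If you want to compare the two values yourself, here is a quick check (a minimal sketch, assuming you can query v$sysstat, where these cell statistics show up even on non-Exadata systems):

select name, value
  from v$sysstat
 where name in ('cell physical IO interconnect bytes', 'physical read bytes');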

Tuesday, March 25, 2014

alter partition with INTERVAL (11g New feature)

INTERVAL partitioning is a new feature introduced in 11g.
Make sure you have the COMPATIBLE parameter set to >= 11.1.
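
A quick way to confirm the setting (assuming you can query v$parameter):

select name, value from v$parameter where name = 'compatible';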

CREATE TABLE san_interval_tab (
  col1 NUMBER,
  col2 VARCHAR2(10),
  col3 VARCHAR2(50),
  created_date DATE
)
PARTITION BY RANGE (created_date)
(
   PARTITION part_01 VALUES LESS THAN (TO_DATE('01-OCT-2010','DD-MON-YYYY'))
);

The statement above creates the table with just one partition.

insert into san_interval_tab values (1,'a','b', to_date('20-nov-2010','dd-mon-yyyy'));

--ORA-14400: inserted partition key does not map to any partition -- you get this error because no partition exists to hold this value. In 10g environments, we would create partitions manually a couple of months or years in advance to avoid these errors.

In 11g, we can simply alter the table to use INTERVAL partitioning, and there is no need to worry about monitoring for new partitions.

alter table san_interval_tab add partition part_02 values LESS THAN (TO_DATE('01-dec-2010','DD-MON-YYYY')); --> this is how you would handle it without the INTERVAL option.

In 11g:
alter table san_interval_tab set interval (NUMTOYMINTERVAL(1,'MONTH')); --> this will create partitions automatically.

alter table san_interval_tab set interval (NUMTOYMINTERVAL(3,'MONTH')); --> example showing the interval range is 3 months instead of 1 month.
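
With the interval in place, the insert that failed earlier with ORA-14400 now succeeds, and Oracle adds a system-named partition (SYS_Pnnn) on the fly:

insert into san_interval_tab values (1,'a','b', to_date('20-nov-2010','dd-mon-yyyy'));
-- 1 row created; a new SYS_Pnnn partition covering this date range is generated automatically.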

Use NUMTODSINTERVAL if you want the range to be day-wise instead, as in the sketch below.
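
For example, a day-wise interval on the same table is a one-liner (one new partition per day of data):

alter table san_interval_tab set interval (NUMTODSINTERVAL(1,'DAY'));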

Check the dictionary view for the partitions and the high value assigned to each:
select partition_name, high_value from user_tab_partitions
 where table_name = 'SAN_INTERVAL_TAB';

Wednesday, March 12, 2014

ERROR: sending message to EXTRACT EFLE01 (Timeout waiting for message).

As mentioned in my earlier posts, this is one of the issues that I have faced with GoldenGate 11.

I had three groups in this GoldenGate environment to migrate an 11gR1 database from HP to 11gR2 on Linux.
One group constantly gives me this "Timeout waiting for message" error a few minutes after starting the group.

As you see below, the Time Since Chkpt is like 18 hours!!!

Program     Status      Group       Lag at Chkpt  Time Since Chkpt

MANAGER     RUNNING
EXTRACT     RUNNING     EFLE01      00:00:01      18:50:04

So, when you try to get the stats, you get the timeout error:

GGSCI () 9> stats efle01, status

Sending STATS request to EXTRACT EFLE01 ...

ERROR: sending message to EXTRACT EFLE01 (Timeout waiting for message).

I tried to STOP the extract and still got the same error:

GGSCI () 10> stop efle01

Sending STOP request to EXTRACT EFLE01 ...

ERROR: sending message to EXTRACT EFLE01 (Timeout waiting for message).

Basically, this group is hung, and the only way to stop it is to kill it (kill EFLE01) and restart. Well, that is not going to be acceptable.

So, I reached out to Oracle and applied a patch to bring GG 11 up to 11.2.1.0.17. That gave me some more trouble (which is posted in the previous post), but it did not resolve this HANG issue with "Timeout waiting for message".
Going back and forth on the Oracle SR, it was finally identified as a bug in the Oracle release we are using (11.1.0.7), and we would have to apply patch 16320411...

I was not happy with that solution, as we are in the process of migrating from 11.1 on HP to 11.2 on Linux, and applying a database patch to production right before that is not a convincing option.

So, here is how I tried to work around the issue and continue the GG replication without applying the patch.

This group (Group#1) had only two tables, and both tables have a LOB column and are partitioned and sub-partitioned.

So, I decided to split these two tables into separate groups and try it out. The reason for this was that whenever the group hangs, I see its GG session in Oracle sitting on the wait event "SQL*Net vector data to client".
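
For reference, the split looked roughly like this. The second group name (EFLE01B), table name (APP.LOB_TAB2), and trail (./dirdat/lb) below are placeholders, not the actual names from our environment, and the matching TABLE entry was removed from the original EFLE01 parameter file:

-- efle01b.prm: the new extract captures only one of the two LOB tables
EXTRACT EFLE01B
USERID gguser, PASSWORD ********
EXTTRAIL ./dirdat/lb
TABLE APP.LOB_TAB2;

-- Register the new group and its trail, then start it:
GGSCI> ADD EXTRACT EFLE01B, TRANLOG, BEGIN NOW
GGSCI> ADD EXTTRAIL ./dirdat/lb, EXTRACT EFLE01B, MEGABYTES 100
GGSCI> START EXTRACT EFLE01B

(With a separate trail, the downstream pump/replicat side needs a matching group for it as well.)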

That's it. No more "Timeout waiting for message" issue, and just yesterday we also successfully completed the migration to Linux.

Sunday, March 2, 2014

GoldenGate REPLICAT abending with OGG-05500

OK, I have multiple issues going on with one of our GG environments.
The database involved in this GG replication is about 2TB and has LOBs with partitions and sub-partitions.
The first issue was that the GG Extract process hangs, and the only thing it reports is:

ERROR: sending message to EXTRACT EFLE01 (Timeout waiting for message).

Still working with Oracle on that issue.
Meanwhile, we were told to upgrade the GG environment to 11.2.1.0.17. Even with this upgrade the issue is still not resolved, and Oracle raised a bug with their team to see what's causing the hang.

Anyway, after upgrading to 11.2.1.0.17, the Pump group, which was working fine in the same environment, started giving this new error (basically, a bug introduced in 11.2.1.0.14 causes these OGG-05500 errors):

OGG-05500  Oracle GoldenGate Capture for Oracle, PFLE01.prm:  Detected database metadata mismatch between current trail file dirdat/la000010 and the previous sequence.
*DBCLIENTCHARSET: [1032]/[1073741824].

Went through Metalink Doc ID 1602359.1 and fixed that Extract group.

But now, it started throwing the same error on the Replicat side:

INFO OGG-00996 Oracle GoldenGate Delivery for Oracle, RFLE02.prm: REPLICAT RFLE02 started.
2014-03-01 22:33:39 ERROR OGG-05500 Oracle GoldenGate Delivery for Oracle, RFLE02.prm: Detected database metadata mismatch between current trail file dirdat/rb000030 and the previous sequence.

Well, looking at "info rfle02, detail", it had finished processing trail file 29 but could not read file 30 because of the upgrade. So, I just altered the replicat (alter replicat rfle02, extseqno 30, extrba 0), started it, and it is happy now.
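
For reference, the GGSCI sequence was roughly:

GGSCI> INFO REPLICAT RFLE02, DETAIL
-- shows the last trail file processed (29)
GGSCI> ALTER REPLICAT RFLE02, EXTSEQNO 30, EXTRBA 0
GGSCI> START REPLICAT RFLE02
GGSCI> INFO REPLICAT RFLE02
-- confirm it is RUNNING and the lag is coming down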

Still dealing with the other group that is stuck/hanging... Hope to resolve that issue soon.