This is a story from long ago, from when we were going through the alert log of our Oracle database after a sudden failure.
We found a severe discrepancy in the alert log file: some of the lines seemed to have simply been deleted... We all thought our system admin team might be trying to hide some serious operational mistakes by deleting lines from the Oracle alert log file.
We were running an active/passive cluster environment using Veritas Cluster with external storage...
So what actually happened? The HBA (host bus adapter) card developed a problem. To put it simply, the HBA is the device that connects your server to your storage.
Unfortunately, the design was such that the Oracle logs were also on the same storage device. So although Veritas Cluster started the Oracle database on the other (passive) node after things failed on the first node, Oracle was unable to write events to the alert log file during the failure. The "deleted" lines had never been written in the first place.
Is achieving 100% availability a joke? I don't know...