Wed 28 May, 2008 07:01 am
We purchased sixteen of these. Every single one of them, when put into a busy production environment, has failed, requiring a hard reboot.
Has anyone here had a similar experience?
Machines have 24-32GB of RAM, internal RAID-1 mirrored boot disks, five GbE ports (two heartbeat LANs), dual SAN Fibre Channel cards, and run Serviceguard. Applications vary but include MFG/PRO and WebSphere. We are running SLES9 SP3.
Just sitting there, they work fine, but under duress, blammo. They literally lock up tight - you can get in remotely via the iLO port, but the console won't respond. You have to use the virtual power controls to power cycle them.
DoS attack? Does your firewall limit embryonic connections?
Our corporate firewall is far removed from this environment. If it's coming from internal sources, I will say that some of these are behind secondary firewalls, which only allow connections on a very few ports.
1. Not all attacks come from outside. In fact, a poorly-behaved application can appear very similar to a DoS attack.
2. "Firewall" is a very slippery term. If you have a port filter, that's fine, but a port filter is no longer considered to be an adequate firewall. At a minimum, you need stateful inspection, and preferably something that can perform deep packet inspection.
Sounds to me as if one of your apps is misbehaving, though. Do you run the same apps on a different hardware platform? Does the problem occur only during heavy loads (too many threads), or after a certain number of requests (memory leak)?
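One way to tell those two cases apart is to log memory and load at short intervals, so the last entries before a lockup show whether free memory drained steadily (which would point to a leak) or the task count spiked (which would point to overload). A minimal Python sketch, assuming a Linux /proc filesystem; the function names and the log path are illustrative, not taken from anyone's setup here:

```python
import os
import time


def parse_meminfo(text):
    """Return {field: value in kB} from the contents of /proc/meminfo."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Keep only "Name:  12345 kB" style lines; skip anything else.
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def parse_loadavg(text):
    """Return (1-minute load, total task count) from /proc/loadavg."""
    parts = text.split()
    # Fourth field looks like "running/total".
    _running, total = parts[3].split("/")
    return float(parts[0]), int(total)


def format_sample(meminfo_text, loadavg_text, now=None):
    """Build one log line: timestamp, free memory, free swap, load, tasks."""
    mem = parse_meminfo(meminfo_text)
    load1, tasks = parse_loadavg(loadavg_text)
    stamp = int(time.time() if now is None else now)
    return "%d MemFree=%dkB SwapFree=%dkB load1=%.2f tasks=%d" % (
        stamp, mem.get("MemFree", 0), mem.get("SwapFree", 0), load1, tasks)


def log_forever(path="/var/log/lockup-watch.log", interval=5):
    """Append a sample every `interval` seconds (hypothetical log path)."""
    with open(path, "a") as log:
        while True:
            with open("/proc/meminfo") as m, open("/proc/loadavg") as l:
                log.write(format_sample(m.read(), l.read()) + "\n")
            # Force each line to disk: the box may hard-lock at any moment.
            log.flush()
            os.fsync(log.fileno())
            time.sleep(interval)
```

Calling `log_forever()` from a background process on one of the affected machines would leave a trail to read after the next power cycle. The fsync on every line costs some I/O, but without it the final samples are likely to be lost in the hang.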
We're really leaning towards hardware - at least something specific to our environment that doesn't like the G4's. We have G3's that never have any issues.
OK; that's why I asked about other hardware platforms.
I'd try slapping in a different network card, then.
oops: unable to handle kernel paging request at 0000000000014b00 RIP:
amongst other locations. I've never seen that address repeat.
Do these boxes have more memory than the G3s? Not specifically with Linux, but I've seen boxes misbehave from having TOO MUCH ram. The OS had to be tweaked.
Yes I'm pretty sure they do.
I'd look at that too, then. Pulling some of the RAM should reduce the size of the paging file, as well.
G3's have 22GB, G4's have 32GB