May 11, 2006
Xeon Heat Management
Last week I wrote about the trouble I had getting some of my older
32-bit Athlon processors to run in a low-power, low-heat mode during
idle conditions. As I said then, being able to switch into this mode
when the operating system isn't busy gets you most of the
way toward decent power and thermal management, although sometimes you
need to do other things as well, like using better fans or heatsinks. To illustrate
just how much extra effort can sometimes be required, I thought I would
talk about my efforts in trying to get one of my Xeon-based servers to
operate at low temperatures.
The adventure all started when I picked up a couple of 3.4 GHz Xeon
processors to use for application and network testing under VMware. I
wanted to load the system with multiple Gigabit Ethernet cards, so I
bought a SuperMicro
X6DHE-XG2 dual-Xeon motherboard with multiple 64-bit PCI slots. I
also picked up a Chenbro
RM-216 2U chassis that came bundled with a 460-watt Zippy
P2G-6460P power supply and four generic high-speed intake cooling
fans. I also got a pair of Swiftech
MCX603-V low-profile Xeon heatsinks, which fit inside the 2U chassis
and came with their own factory-supplied cooling fans. Lastly, I
got hold of four Intel
PRO/1000 MT dual-port server NICs, which came with low-profile brackets
that would fit inside the chassis.
Once everything was assembled and the base software installed, the
system seemed to run fine, but it also had a couple of problems. For
one thing, all the fans made the server VERY LOUD, especially since it
was sitting in my office a few feet from my desk--I literally couldn't
even talk on the telephone while the server was running. Second, and
much more worrisome, the system would periodically reboot or shut itself
down with no apparent warning. After some poking and probing, I discovered
that the source of all these problems was extreme heat from the CPUs.
The system was running at 70 degrees Celsius at idle and would sometimes
get up into the 90-degree range when the VMs were busy. In turn, this
caused the system BIOS to run all the fans at their highest speed (which
was the primary source of the noise) and also caused the system to shut
itself off when things got too hot.
My first attempt at fixing this problem was simply to reseat
the heatsinks with better thermal paste, but that had only a minor effect,
so I started experimenting with different fans and placements.
The X6DHE-XG2 motherboard has headers for eight different fans (two
for the CPUs, two for the rear exhaust, and four intakes right behind
the drives), and the Chenbro chassis also has cutouts for the same intake
and exhaust fans. Because I wasn't using the rear exhausts, adding those
in seemed to be the most obvious solution. However, since the chassis
is a 2U low-profile unit, the cutouts for the two exhaust fans are above
the motherboard's rear-panel connector block, meaning they can only accommodate
40 mm fans. Now, a decent 80 mm fan (the most common size) can move a
reasonable amount of air at a reasonable decibel level, but 40 mm fans
have to really crank up the RPMs in order to move even a modest amount
of air, and that means higher noise. Even after I tried several different
brands and models, the temperature either didn't drop by any significant
amount or actually got worse because the fans constrained the airflow. Overall,
the only noticeable change was that the system got a lot louder.
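To put a rough number on the small-fan problem, here's a back-of-the-envelope
sketch using the idealized fan affinity law, which says that airflow for
geometrically similar fans scales with rotational speed times the cube of
the diameter. The function name and the 2,000 RPM reference speed are made
up for illustration, and real fans differ in blade design and thickness, so
treat the result as an order-of-magnitude guide rather than a spec.

```python
def rpm_to_match_airflow(ref_rpm: float, ref_diameter_mm: float,
                         diameter_mm: float) -> float:
    """Estimate the RPM a fan of diameter_mm needs to match a reference fan's airflow.

    Assumes geometrically similar fans and the idealized affinity law
    (airflow proportional to rpm * diameter**3). Hypothetical helper,
    not taken from any vendor data sheet.
    """
    return ref_rpm * (ref_diameter_mm / diameter_mm) ** 3


# Hypothetical reference point: an 80 mm fan turning at 2,000 RPM.
print(rpm_to_match_airflow(2000, 80, 40))  # 16000.0
```

By this estimate, a 40 mm fan would need roughly eight times the RPM of a
comparable 80 mm fan to move the same amount of air, which goes a long way
toward explaining the noise.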
For the next attempt, I tried to tackle the problem head-on by increasing
the airflow around the CPUs themselves. The MCX603-V heatsinks have pretty
good ratings on various message boards, and the bundled Delta
AFB0812M fans have pretty good airflow and decibel ratings. However,
the fans are also 20 mm thick, and in this application they appeared
to be touching the lid of the chassis, which restricted airflow
and also caused some baffling vibration problems that
produced extra noise.
In order to eliminate these problems, I ordered a set of Evercool
EC8015H12B 15 mm-thick high-speed fans and swapped them in for the
Deltas. The CPU temperature immediately dropped by a good 10 degrees
just from the extra half-centimeter of clearance on top of the heatsinks.
I was able to further lower the temperature by pointing the fans upwards,
which amplified the natural convection effect of the CPU and heatsinks
and allowed the chassis intake fans to push hot air off the top of the
heatsinks at the same time they were blowing cool air onto the CPUs.
At this point I was pretty pleased with the results, but the noise
from all the fans was getting on my nerves, so I went back to experimenting
with replacement models for the chassis itself. I must have gone through
a half-dozen vendors and fan models--"smart" variable-speed fans, high-density
fans, you name it--but eventually I settled on an intake system consisting
of the two Delta fans that came from Swiftech and a pair of Unincom
U8025 sleeve fans, which together provided a good combination
of targeted airflow and quiet operation. I left the exhaust ports open
and unused because none of the 40 mm fans I tried ever seemed to do anything
positive.
The biggest reduction in noise, however, came from replacing the power
supply. One day I was reading about home-theater PC gear, and I stumbled
across a site for a custom rackmount chassis that used Enermax EG451P-VD
2U power supplies. I figured that if they were quiet enough for home-theater
gear, they had to be quieter than the Zippy power supply I was using,
so I ordered one to test. Not only did the noise all but disappear, but
the system temperature dropped another 10 degrees, too. These power supplies
have separate intake and exhaust fans, each of which runs at a lower rotational
speed than the fans in a normal unit, which both improves airflow and keeps
the noise down. Unfortunately, these power supplies seem either to have been
discontinued or never to have been widely released in the U.S. market, and I've
bought all the units I've been able to find for sale (I use them in all
my rackmount systems now). [Update 3 (May 27): they have been discontinued;
iStar seems to have some fairly quiet, dual-fan rackmount power supplies,
though.]
All told, my system is now running at a respectable 44 degrees Celsius
at idle and only goes up into the low 70s when multiple VMs are active
and busy (which isn't very common in the kind of testing I usually do).
It even runs cool enough that I'm able to use the BIOS "workstation" fan
setting, which uses a lower fan voltage during normal operations, but
turns up the speed when the system starts to get warm. Overall, the energy
requirements are about as low as can be expected (not very), there are
no heat problems to speak of, and best of all I can't hear the server
running unless I really put the system to work (thus causing the fans
to spin up), even though it's only a few feet from my desk. However,
getting to this point took almost a year of experimentation and several
dozen pieces of trial-and-error gear, most of which didn't help and some
of which produced unexpected results.
On a broader note, this kind of story is most useful for illustrating
just how difficult it can be to get heat management issues under control
with the current generation of processors, and why it's so important
for vendors and users to pay attention to this stuff. Without this effort,
the system would have been unusable for its intended purpose due to the
instability from the out-of-control thermal issues, while the ham-handed
fix of sticking more powerful fans into the box only produces louder
systems that have to be isolated farther away from the operators. This
is all very bad. But the good news is that chip designers have begun
to realize that this is a dead-end path, and that the real fix here is
to lower the power and temperature demands of the CPUs to begin with.
There's also some interesting technology in the gaming and high-end PC
sector that will probably find its way into servers pretty soon. I'll
keep an eye out in this area and let you know what I find.
Written by Eric A. Hall.
Copyright © 2006 CMP Media, Used with permission.