Hi everyone, sorry for the delay in blog posts. Electricity in Connecticut has been so expensive lately that except for our winter heating Folding@Home cluster, it wasn’t affordable to keep running all those GPUs (even with our solar panels, which is really saying something). However, I did manage to get some good data on the top-tier Nvidia RTX 3090, which I got during COVID as the GPU in a prebuilt HP Omen gaming desktop. I transplanted the 3090 into my benchmark desktop, so these stats are comparable to previous cards I’ve tested.
Wait, what are we doing here?
For those just joining, this is a blog about optimizing computers for energy efficiency. I’m running Folding@Home, a distributed computing research project that uses your computer to help fight diseases such as cancer, COVID, and a host of other ailments. For more information, check out the project website here: https://foldingathome.org/
Look at this bad boy!
This is the HP OEM version of an RTX 3090. I was impressed that it had lots of copper heat pipes and a metal back plate. Overall this was a very solid card for an OEM offering.
At the time of my testing, the RTX 3090 was the top-tier card from Nvidia’s new Ampere line. They have since released the 3090 Ti, which is ever so slightly faster. To give you an idea of where the RTX 3090 stacks up compared to the previous cards I have tested, here is a table. Note that 350 watt TDP! That is a lot of power for this air cooler to dissipate.
I ran Folding@Home on my benchmark desktop in Windows 10, using Folding@Home client 7.6.13. I was immediately blown away by the insane Points Per Day (PPD) that the 3090 can spit out! Here’s a screen shot of the client, where the card was doing a very impressive 6.4 million PPD!
What was really interesting about the 3090, though, was how much variation there was in performance depending on the size of the molecule being worked on. Very large molecules with high atom counts benefited greatly from the number of CUDA cores on this card, and it kicked butt in both raw performance (PPD) and efficiency (PPD/Watt). Smaller molecules, however, did not fully utilize this card’s impressive potential, which resulted in lower efficiency and more wasted power. I would assume that running two smaller Ampere cards, for example 3080s, with small models would be more efficient than using the 3090, but I don’t have any 3080s to test that assumption (yet!).
In the plots below, you can see that the smaller model (89k atoms) resulted in a peak PPD of about 4 million, as opposed to the 7 million PPD with a 312k atom model. PPD/Watt at 100% card power was also lower for the smaller model, coming in at about 10,000 PPD/Watt vs. 16,500 PPD/Watt for the large model. These are still great efficiency numbers, which shows how far GPU computing has come from previous generations.
Reduce GPU TDP Power Target to Improve Efficiency
I’ve previously shown how GPUs are set up for maximum performance out of the box, which makes sense for video gaming. However, if you are trying to maximize the energy efficiency of your computational machines, reducing the power target of the GPU can result in massive efficiency gains. The GeForce RTX 3090 is a great example of this. When solving large models, this beast of a card benefits from throttling the power down, gaining 2.35% improved energy efficiency with the power target set to 85%. The huge improvement, however, comes when solving smaller models. When running the 89k atom work unit, I got a whopping 29% efficiency improvement by setting the power target to 55%, with only a 14% performance reduction! Since the F@H project gives out a lot of smaller work units in addition to some larger ones, I chose to run my machine at a 75% power target. On average, this splits the difference, and gives a noticeable efficiency improvement without sacrificing too much raw PPD performance. In the RTX 3090’s case, a 75% power target massively reduced the computer’s power draw (wall consumption went from 434 to 360 watts), as well as the heat and noise coming out of the chassis. This makes for a happier office environment and a happier computer that will last longer!
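If you want to sanity-check those 55%-power-target numbers, here’s a minimal Python sketch (the percentages are the ones quoted above; everything else is arithmetic) that backs out how much card power the lower target must have saved:

```python
# Back-of-envelope check of the 55% power-target numbers quoted above.
# Efficiency = PPD / Watts, so: eff_ratio = ppd_ratio / power_ratio.

ppd_ratio = 1 - 0.14   # 14% performance (PPD) reduction at the 55% target
eff_ratio = 1 + 0.29   # 29% efficiency (PPD/Watt) improvement

# Solve eff_ratio = ppd_ratio / power_ratio for the implied power ratio:
power_ratio = ppd_ratio / eff_ratio

print(f"Implied card power at 55% target: {power_ratio:.0%} of stock")
# -> roughly 67% of stock, i.e. about a one-third power cut for only a
# 14% PPD hit, which is exactly why the efficiency jumps 29%.
```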
Tuning Results: 89K Atoms (Small Model)
Here are the tuning plots for a smaller molecule. In all cases, the X-axis is the power target, set in the Nvidia Driver. 100% corresponds to 350 Watts in the case of the RTX 3090.
Tuning Results: 312K Atoms (Large Model)
And here are the tuning results for a larger molecule.
Here are the comparison results to the previous hardware configurations I have tested. Note that now that the F@H client supports enabling CUDA, I did some tests with CUDA on vs. off with the RTX 2080 Ti and the 3090. Pro Tip: MAKE SURE CUDA IS ON! It really speeds things up and also improves energy efficiency.
The key takeaways from below are that the 3090 offers 50% more performance (PPD) than the 2080 Ti, and is almost 30% more energy efficient while doing it! Note this does not mean this card sips power…it actually uses more watts than any of the other cards I’ve tested. However, it does a lot more computation with those watts, so it is putting the electricity to better use. Thus, a data center or workstation equipped with 3090s can get through more work in a shorter amount of time than with other cards, using less power overall to solve a given amount of work. This is better for the environment!
The flagship Ampere architecture Nvidia GeForce RTX 3090 is an excellent card for compute applications. It does draw a ton of power, but this can be mitigated by reducing the power target in the driver to gain efficiency and reduce heat and noise. In the case of Folding@Home disease research, this card is a step change in both performance and energy efficiency, offering 50% more compute power and 30% more efficiency than the previous generation. I look forward to testing out other Ampere cards, as well as the new 40xx “Lovelace” architecture, if Eversource ever drops the electric rate back to normal levels in CT.
This is part four of my Folding@Home review for AMD’s top-tier desktop processor, the Ryzen 9 3950x 16-core CPU. Up until recently, this was AMD’s absolute beast-mode gaming and content creation desktop processor. If you happen to have one, or are looking for a good CPU to fight COVID and Cancer with, you’ve come to the right place.
Folding@Home is a distributed computing project where users can donate computational runtime on their home computers to fight diseases like cancer, Alzheimer’s, Mad Cow, and many others. For better or for worse, COVID-19 caused an explosion in F@H popularity, because the project was retooled to focus on understanding the coronavirus molecule to help researchers develop ways to fight it. This increase in users caused Folding@Home to become (once again) the most powerful supercomputer in the world. Of course, this comes with a cost: namely, in the form of electricity. Most of my articles to date have focused on GPU folding. However, the point of this series of articles is to investigate how someone running CPU folding can optimize their settings to do the most work for the least amount of power, thus reducing their power bill and the environmental impact of all this computing.
In the last part of this review, I investigated the differences seen between running Folding@Home with SMT (also known as Hyperthreading) on and off. The conclusion from that review was that performance does scale with virtual cores, and that the best science-fighting and energy efficiency is seen with 30 or 32 threads enabled on the CPU folding slot.
The previous testing was all performed with Core Performance Boost off. CPB is the AMD equivalent of Intel’s Turbo Boost, which is basically automatic, dynamic overclocking of the processor (both CPU frequency and voltage) based on the load on the chip. Keeping CPB turned off in previous testing resulted in all tests being run with the CPU frequency at the base 3.5 GHz.
In this final article, I enabled CPB to allow the Ryzen 9 3950x to scale its frequency and voltage based on the load and the available thermal and power headroom. Note that for this test, I used the default AMD settings in the BIOS of my Asus Prime X570-P motherboard, which is to say I did not enable Precision Boost Overdrive or any other setting to increase the automatic overclocking beyond the default power and thermal limits.
As with the other parts of this review, I used my new Folding@Home benchmark machine, which was previously described in this post. The only tweaks to the computer since that post was written were swapping out a few 120 mm fans for different models to improve cooling and noise. I also eliminated the 80 mm side intake fan, since all it did was disrupt the front-to-back airflow around the CPU and didn’t make any noticeable difference in temperatures. All of these cooling changes made less than a 2 watt difference in the machine’s idle power consumption (almost unmeasurable), so I’m not going to worry about correcting the comparison plots.
Because it’s been a while since I wrote about this, I figured I’d recap a few things from the previous posts. The current configuration of the machine is:
Case: Raidmax Sagitta
Power Supply: Seasonic Prime 750 Watt Titanium
Intake Cooling: 2 x 120mm fan (front)
Exhaust Cooling: 1 x 120 mm (rear) + PSU exhaust (top)
CPU Cooler: Noctua NH-D15 SE AM4
CPU: AMD Ryzen 9 3950x
Motherboard: Asus Prime X570-P
Memory: 32 GB Corsair Vengeance LPX DDR4 3600 MHz
GPU: Zotac Nvidia GeForce GTX 1650 (installed for CPU testing)
OS Drive: Samsung 970 Evo Plus 512 GB NVME SSD
Storage Drive #1: Samsung 860 EVO 2TB SSD
Storage Drive #2: Western Digital Blue 128 GB NVME SSD
Optical Drive: Samsung SH-B123L Blu-Ray Drive
Operating System: Windows 10 Home
The Folding@Home software client used was version 7.6.13.
The point of this testing is to identify the best settings for performance and energy efficiency when running Folding@Home on the Ryzen 3950x 16-core processor. To do this, I set the # of threads to a specific value between 1 and 32 and ran five work units. For each work unit, I recorded the instantaneous points per day (PPD) as reported in the client, as well as power consumption of the machine as reported on my P3 Kill A Watt meter. I repeated this 32 times, for a total of 160 tests. By running 5 tests at each nCPU setting, some of the work unit variability can be averaged out.
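To make the bookkeeping concrete, here is a minimal sketch of the averaging described above. The work-unit numbers below are made-up placeholders, not my measured data:

```python
# Sketch of the per-setting averaging described above (placeholder data).
# For each thread-count setting, five work units were run; each run yields
# an instantaneous PPD (from the client) and wall watts (Kill A Watt meter).

results = {
    # n_threads: [(ppd, watts), ...] -- hypothetical numbers only
    16: [(310_000, 180), (295_000, 178), (305_000, 181),
         (320_000, 183), (300_000, 179)],
    32: [(420_000, 208), (390_000, 207), (405_000, 209),
         (430_000, 208), (400_000, 206)],
}

for n_threads, runs in results.items():
    avg_ppd = sum(ppd for ppd, _ in runs) / len(runs)
    avg_watts = sum(w for _, w in runs) / len(runs)
    print(f"{n_threads:>2} threads: {avg_ppd:,.0f} PPD, "
          f"{avg_watts:.0f} W, {avg_ppd / avg_watts:,.0f} PPD/Watt")
```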
Folding@Home Performance: Ryzen 9 3950X
Folding@Home performance is measured in Points Per Day (PPD). This is the number that most people running the project are most interested in, as generating lots of PPD means your machine is doing a lot of good science to aid the researchers in their fight against diseases. The following plot shows the trend of Points Per Day vs. the number of CPU threads engaged. The average work unit variation came out to around 12%…this results in a pretty significant spread in performance between different work units at higher thread counts. As in the previous testing, I plotted a pair of boundary lines to capture the 95% confidence interval, meaning that, assuming a Gaussian distribution of data points, 95% of the work units will perform within this boundary region.
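In case the boundary lines need explaining: assuming the per-work-unit scatter is Gaussian, the band is just the mean plus or minus about two standard deviations. A quick sketch with placeholder numbers:

```python
import statistics

# Placeholder PPD samples for one thread-count setting (not real data).
ppd_samples = [310_000, 295_000, 305_000, 320_000, 300_000]

mean = statistics.mean(ppd_samples)
sd = statistics.stdev(ppd_samples)   # sample standard deviation

# For a Gaussian, ~95% of work units fall within +/- 1.96 standard
# deviations of the mean -- those are the two boundary lines.
lower, upper = mean - 1.96 * sd, mean + 1.96 * sd
print(f"mean {mean:,.0f} PPD, 95% band {lower:,.0f} to {upper:,.0f}")
```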
As can be seen in the above plot, in general, the Folding@Home client’s Points Per Day production increases with increasing core count. As with the previous results, the initial performance improvement is fairly linear, but once the physical number of CPU cores is exceeded (16 in this case), the performance improvement drops off, only ramping up again when the core settings get into the mid 20’s. This is really strange behavior. I suspect it has something to do with how Windows 10 schedules logical process threads onto physical CPU cores, but more investigation is needed.
One thing that is different about this test is that the Folding@Home consortium started releasing new work units based on the A8 core. These work units support the AVX2_256 instruction set, which allows some mathematical operations to be performed more efficiently on processors that support AVX2 (specifically, an add operation and a multiply operation can be performed at the same time). As you can see, the Core A8 work units, denoted by purple dots, fall far above the average performance and the 95% confidence interval lines. Although it is awesome that the Folding@Home developers are constantly improving the software to take advantage of improved hardware and programming techniques, this influx of fancy work units really slowed my testing down! There were entire days when all I would get were core A8 units, when I really needed core A7 units to compare to my previous testing. Sigh…such is the price of progress. Anyway, these work units were excluded from the 5-work-unit averages composing each data point, since I want to be able to compare the average performance line to previous testing, which did not include these new work units.
As noted in my previous posts, some settings of the # of CPU threads result in the client defaulting to a lower thread count to prevent numerical problems that can arise for certain mathematical operations. For reference, the equivalent thread settings are shown in the table below:
Equivalent Thread Settings:
Folding@Home Power Consumption
Here is a much simpler plot. This is simply the power consumption as reported by my P3 Kill A Watt meter at the wall. This is total system power consumption. As expected, it increases with increasing core count. Since the instantaneous power the computer is using wobbles around a bit as the machine is working, I consider this to be an “eyeball averaged” plot, with an accuracy of about 5 watts.
As can be seen in the above plot, something interesting starts happening at higher thread counts: namely, the power consumption plateaus. This wasn’t seen in previous testing with Core Performance Boost set to off. Essentially, with CPB on, the machine is auto-overclocking itself within the factory defined thermal and power consumption limits. Eventually, with enough cores being engaged, a limit is reached.
Investigating what is happening with AMD’s Ryzen Master software is pretty enlightening. For example, consider the following three screen shots, taken during testing with 2, 6, and 16 threads engaged:
2 Thread Solve:
6 Thread Solve
16 Thread Solve
First off, please notice that the temperature limit (first little dial indicator) is never hit during any test condition, thanks to the crazy cooling of the Noctua NH-D15 SE. Thus, we don’t have to worry about an insufficient thermal solution marring the test results.
Next, have a look at the second and third dial indicators. For the 2-thread solve, the peak CPU speed is a blistering 4277 MHz! That is a factory overclock of 22% over the Ryzen 9 3950x’s base clock of 3500 MHz. This is Core Performance Boost in action! At this setting, with only 2 threads engaged, the total package power (PPT) is showing 58% use, which means that there is plenty of electrical headroom to add more CPU cores. For the 6-thread solve, the peak CPU speed has come down a bit to 4210 MHz, and the PPT has risen to 79% of the rated 142 watt maximum. What’s happening is the extra CPU cores are using more power, and the CPU is throttling those cores back a bit to keep everything stable. Still, there is plenty of headroom.
That story changes when you look at the plot for the 16-thread solve. Here, the peak clock rate has decreased to 4103 MHz and the total package power has hit the limit at 142 watts (a good deal beyond the 105 watt TDP of the 3950X!). This means that the Core Performance Boost setting has pushed the clocks and voltage as high as can be allowed under the default auto-overclocking limits of CPB. This power limit on the CPU is the reason the system’s wall power consumption plateaus at 208 watts.
If you’re wondering what makes up the difference between the 208 watts reported by my watt meter and the 142 watts reported by Ryzen Master, the answer is the rest of the system besides the CPU socket. In other words, the motherboard, memory, video card, fans, hard drives, optical drive, and the power supply’s efficiency.
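As a rough check on that breakdown (the PSU efficiency here is my assumption, not a measurement; a Prime Titanium should be somewhere in the low 90s percent at this load):

```python
# Rough accounting for the 208 W (wall) vs. 142 W (CPU PPT) gap.
wall_watts = 208        # measured at the Kill A Watt meter
cpu_ppt_watts = 142     # CPU package power reported by Ryzen Master
psu_efficiency = 0.93   # assumed for an 80+ Titanium unit at this load

dc_watts = wall_watts * psu_efficiency      # power actually delivered
psu_loss = wall_watts - dc_watts            # lost as heat inside the PSU
rest_of_system = dc_watts - cpu_ppt_watts   # board, RAM, GPU, fans, drives

print(f"PSU loss: ~{psu_loss:.0f} W, rest of system: ~{rest_of_system:.0f} W")
# -> roughly 15 W of PSU loss and ~50 W for everything besides the CPU socket
```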
Just for fun, here is the screen shot of Ryzen Master for the full 32-thread solve!
Here, we have an all-core peak frequency of 3855 MHz. Interestingly, the CPU temp and PPT have decreased slightly from the 16-thread solve, even though the processor is theoretically working harder. What’s happening here is that yet another limit has been reached. Look at the 6th dial indicator, labeled ‘TDC’. This is the thermally constrained current limit, in amperes, for the CPU socket. Apparently with 32 threads, this 95 amp current limit is getting hit, so clock speed and voltage are reduced, resulting in a lower average socket power (PPT) than the 16-thread solve.
Now for my favorite plot…Efficiency! Here, I am taking the average performance in PPD (excluding the newfangled A8 work units for now) and dividing it by the system’s wall power consumption. This provides a measure of how much work per unit of power (PPD/Watt) the computer is doing.
This plot looks fairly similar to the performance plot. In general, throwing more CPU threads at the problem lets the computer do more work in a given unit of time. Although higher thread counts consume more power than lower thread counts, the additional power use is offset by the massive amount of extra computational work being done. In short, efficiency improves as thread count increases.
There is a noticeable dent in the curve, however, from 15 to 23 threads. This is the interesting region where things get weird. As I mentioned before, I think what might be happening is some oddity in how Windows 10 schedules jobs once the physical number of CPU cores has been exceeded. I’m not 100% sure, but what I think Windows is doing is potentially juggling the threads around to keep a few physical CPU cores free (basically, it’s putting two threads on one CPU core, i.e. utilizing SMT, even when it doesn’t have to, in order to keep some CPU cores available for other tasks, such as using Windows). It isn’t until we get over 24 threads that Windows decides we are serious about running all these jobs, and reluctantly schedules the jobs out for pure performance.
I do have some evidence to back up this theory. Investigating what is going on with Ryzen Master with Folding@Home set to 20 threads is pretty telling.
Since 20 threads exceeds the 16-core capacity of the processor, one would think all 16 cores would be spun up to max in order to get through this work as fast as possible. However, that is not the case. Only 12 cores are clocked up. Now, if you consider SMT, these 12 cores can handle 24 threads of computation. So, virtual cores are being used as well as physical cores to handle this 20-thread job. This obviously isn’t ideal from a performance or an efficiency standpoint, but it makes sense considering what Windows 10 is: a user’s operating system, not a high performance computing operating system. By keeping some physical CPU cores free when it can, Microsoft is hoping to ensure users a smooth computing experience.
Comparison to Previous Results
The above plots are fun and all, but the real juice is the comparison to the previous results. As a reminder, these were covered in detail in these posts:
In the previous parts of this article, the difference between SMT (aka Hyperthreading) being on or off was shown to be negligible on the Ryzen 9 3950x in the physical core region (thread count of 16 or less). The major advantage of SMT was that it allowed more solver threads to be piled on, which eventually results in increased performance and efficiency for thread counts above 25. In the plot below, the third curve basically shows the effect of overclocking. In this case, Core Performance Boost, AMD’s auto-overclocking routine, provides a fairly uniform 10-20 percent improvement. This diminishes at high thread-count settings, though, becoming a nominal 5% improvement above 28 threads. It should be noted that the effects of work-unit-to-work-unit variation are still apparent, even with five averages per test case, so don’t try to draw any specific conclusions at any one thread count. Rather, just consider the overall trend.
The power consumption plot shows a MASSIVE difference between the wall power used for the CPB testing and the other two tests. This shouldn’t come as a surprise. Overclocking a processor requires more voltage, and the dynamic power of CMOS logic scales with the square of the supply voltage and linearly with the switching frequency. So, boosting frequency and voltage together compounds quickly: there are more transistor switching events per unit of time, and each one costs more energy.
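For reference, the textbook scaling is P ≈ C·V²·f for dynamic CMOS power: voltage hurts twice (squared) and frequency once. A small sketch, with made-up voltage and frequency values purely for illustration:

```python
# Dynamic CMOS power scales roughly as P ~ C * V^2 * f, so a simultaneous
# voltage and frequency bump compounds quickly.
# Illustrative numbers only -- not measured values from my 3950x.

v_base, f_base = 1.10, 3.5     # volts, GHz (hypothetical baseline)
v_boost, f_boost = 1.30, 4.2   # volts, GHz (hypothetical boosted state)

power_ratio = (v_boost / v_base) ** 2 * (f_boost / f_base)
print(f"Relative dynamic power: {power_ratio:.2f}x")
# -> about 1.68x: a 20% frequency gain at higher voltage can cost nearly
# 70% more CPU power, which is why boost clocks wreck PPD/Watt.
```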
In short, we are looking at a very noticeable increase in your electrical bill to run Folding@Home on an overclocked machine.
Efficiency is the whole point of this article and this blog, so behold! I’ve shown in previous articles on both CPUs and GPUs that overclocking typically hurts efficiency (and conversely, that underclocking and undervolting improves efficiency). The story doesn’t change with factory automatic overclocking routines like CPB. In the plot below, there is a very strong case for disabling Core Performance Boost, since the chip is up to 25% less efficient with it enabled.
The Ryzen 9 3950x is a very good processor for fighting disease with Folding@Home. The high core count produces exceptional efficiency numbers for a CPU, with a setting of 30 threads being ideal. Leaving 2 threads free for the rest of Windows 10 doesn’t seem to hurt performance or efficiency too much. Given the work unit variation, I’d say that 30 and 32 threads produce the same result on this processor.
As far as optimum settings go, to get the most bang for your electrical buck (i.e. efficiency), running that 30-thread CPU slot requires SMT to be enabled. Disabling CPB, which is on by default, results in a massive efficiency improvement by cutting over 50 watts off the power consumption. For a dedicated folding computer running 24/7, shaving off that 50 watts would save 438 kWh of energy per year. In my state, that would save me about $83 annually, and it would also keep about 112 lbs of CO2 from being released into the atmosphere. Imagine the environmental impact if the 100,000+ computers running Folding@Home could each reduce their power consumption by 50 watts by just changing a setting!
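Here is that savings estimate as a quick calc. The electricity rate and CO2 factor are backed out from my own numbers above, so treat them as my local assumptions rather than universal constants:

```python
# Annual savings from shaving 50 W off a 24/7 folding rig.
watts_saved = 50
hours_per_year = 24 * 365    # 8,760 hours of around-the-clock folding

kwh_saved = watts_saved * hours_per_year / 1000
dollars_saved = kwh_saved * 0.19   # ~$/kWh, my CT rate at the time
co2_lbs = kwh_saved * 0.256        # lbs CO2 per kWh implied by my figures

print(f"{kwh_saved:.0f} kWh/year, ~${dollars_saved:.0f}/year, "
      f"~{co2_lbs:.0f} lbs of CO2")
# -> 438 kWh, ~$83, and ~112 lbs of CO2 per year, matching the text above
```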
If there is one thing to be said about overclocking a Ryzen 3xxx-series processor, it’s that the possibilities are endless. A downside to disabling CPB is that if you aren’t folding all the time, your processor will be locked at its base clock rate, and thus your single-threaded performance will suffer. This is where things like PBO come in. PBO = Precision Boost Overdrive. This is yet another layer on top of CPB to fine-tune the overclocking while allowing the system to run in automatic mode (thus adapting to the loads that the computer sees). Typically, people use PBO to let the system sustain higher clock rates than standard CPB would allow. However, PBO also allows a user to enter in power, thermal, and voltage targets. Theoretically, it should be possible to set up the system to allow frequency scaling for low CPU core counts but to pull down the power limit for high core-counts, thus giving a boost to lightly threaded jobs while maintaining high core count efficiency. This is something I plan to investigate, although getting comparable results to this set of plots is going to be hard due to the prevalence of the new AVX2 enabled work units.
Maybe I’ll just have to do it all over again with the new work units? Sigh…
A while ago, I wrote a pair of articles on why it’s important to consider the energy efficiency of your computer’s power supply. Those articles showed how maximizing the efficiency of your Power Supply Unit (PSU) can actually save you money, since less electricity is wasted as heat with efficient power supplies.
In this article, I’m putting this into practice, because the PSU in my Ubuntu folding box (Codenamed “Voyager”) is on the fritz.
This PSU is a basic Seasonic S12 III, which is a surprisingly bad power supply for such a good company as Seasonic. For one, it uses a group regulated design, which is inherently less efficient than the more modern DC-DC units. Also, the S12 is prone to coil whine (mine makes tons of noise even when the power supply is off). Finally, in my case, the computer puts a bunch of feedback onto the electrical circuits in my house, causing my LED lights to flicker when I’m running Folding@Home. That’s no good at all! Shame on you, Seasonic, shame!
Don’t believe me on how bad this PSU is? Read reviews here:
Now, I love Seasonic in general. They are one of the leading PSU manufacturers, and I use their high-end units in all of my machines. So, to replace the S12iii, I picked up one of their midrange PSUs in the Focus line…specifically, the Focus Gold 450. I got a sweet deal on eBay (a used one for about $40; MSRP new on the SSR-450FM is $80).
Here they are side by side. One immediate advantage of the new Focus PSU is that it is semi-modular, which will help me with some cable clutter.
Inspecting the specification labels also shows a few differences…namely, the Focus is a bit less powerful (three fewer amps on the +12V rail), which isn’t a big deal for Voyager, since it is only running a single GeForce 1070 Ti card (180 watt TDP) and an AMD A10-7700K (95 watt TDP). Another point worth noting is the efficiency…whereas the S12iii is certified to the 80+ Bronze standard, the new Focus unit is certified 80+ Gold.
Now this is where things get interesting. Voyager has a theoretical power draw of about 300 Watts max (180 Watts for the video card, 95 for the CPU, and about 25 Watts for the motherboard, ram, and drives combined). This is right around the 60% capacity rating of these power supplies. Here is the efficiency scorecard for the various 80+ certifications:
80+ Efficiency Table
As you can see, there is about a 5% improvement in efficiency going from 80+ Bronze to 80+ Gold. For a 300 watt machine, that would equate to roughly 15 watts of difference between the Focus and the S12iii PSUs. By upgrading to the Focus, I should more effectively turn the 120V AC power from my wall into 12V DC to run my computer, resulting in less total power draw from the wall (and less waste heat into my room).
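For the curious, here’s a slightly more careful version of that estimate. Wall draw is DC load divided by efficiency, so the theoretical saving pencils out a little higher than the 5%-of-load shortcut; the real folding load is also below 300 watts DC, which is part of why the measured saving further below comes in lower:

```python
# Expected wall-power difference between 80+ Bronze and 80+ Gold
# for a ~300 W DC load (roughly 50% load on these power supplies).
dc_load = 300        # watts the components actually consume
eff_bronze = 0.85    # 80+ Bronze at 50% load (115 V spec)
eff_gold = 0.90      # 80+ Gold at 50% load

wall_bronze = dc_load / eff_bronze
wall_gold = dc_load / eff_gold

print(f"Bronze: {wall_bronze:.0f} W at the wall, Gold: {wall_gold:.0f} W, "
      f"saving ~{wall_bronze - wall_gold:.0f} W")
# -> ~353 W vs. ~333 W: roughly 20 W saved at that hypothetical full load
```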
I tested it out, using Stanford’s Folding@Home distributed computing project of course! Might as well cure some cancer, you know!
To do this test, I first let Voyager pull down a set of work units from Stanford’s server (GPU + CPU folding slots enabled). When the computer was in the middle of number crunching, I took a look at the instantaneous power consumption as measured by my watt meter:
80+ Bronze PSU: 259.1 Watts @ Full Load
260 Watts is about the max I ever see Voyager draw in practice, since Folding@Home never fully loads the hardware (typically it can hit the GFX card for about 90% capacity). So, this result made perfect sense. Next, I shut the machine down with the work units half-finished and swapped out the 80+ Bronze S12iii for the 80+ Gold Focus unit. I turned the machine back on and let it get right back to doing science.
Here is the updated power consumption number with the more efficient power supply.
80+ Gold PSU Power Consumption @ 100% Load
As you can see, the 80+ Gold Rated power supply shaved 11.8 watts off the top. This is about 4.5% of the old PSU unit’s previous draw, and it is about 4.8% of the new PSU unit’s power draw. So, it is very close to the advertised 5% efficiency improvement one would expect per the 80+ specifications. Conclusion: I’m saving electricity and the planet! Yay!
As a side note, all the weird coil whine and light flickering issues I was having with the S12iii went away when I switched to Seasonic’s better Focus PSU.
But, Was It Worth It?
Now, as an environmentalist, I would say that this type of power savings is of course worth it, because it’s that much less energy wasted and that much less pollution. But, we are really talking about just a few watts (albeit on a machine that is trying to cure cancer 24/7 for years on end).
To get a better understanding of the financial implications of my $40 upgrade, I did a quick calc in Excel, using Connecticut’s average price of electricity as provided by Eversource ($0.18 per kWh).
Voyager PSU Efficiency Upgrade Calc
Performing this calculation is fairly straightforward. Basically, it’s just taking the difference in wattage between the two power supply units and turning that into energy by multiplying it by one year’s worth of run time (Energy = Power * Time). Then, I multiply that by the cost of energy to get a yearly savings of about 20 bucks. That’s not bad! At that rate, I could pay for my PSU upgrade in two years if I run the machine constantly.
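The same calc in Python, for anyone allergic to Excel (the measured saving and rate are the ones above):

```python
# Payback time on the $40 used PSU, from the measured 11.8 W saving.
watts_saved = 11.8
price_per_kwh = 0.18   # Eversource average rate used above
psu_cost = 40          # what I paid on eBay

kwh_per_year = watts_saved * 24 * 365 / 1000
savings_per_year = kwh_per_year * price_per_kwh

print(f"{kwh_per_year:.0f} kWh/year -> ${savings_per_year:.2f}/year, "
      f"payback in {psu_cost / savings_per_year:.1f} years of 24/7 folding")
# -> ~103 kWh and ~$18.60 per year, so the PSU pays for itself in ~2 years
```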
Things get better if I sell the old PSU. Getting $20 for a Seasonic anything should be easy (ignoring the moral dilemma of sticking someone with a shitty power supply that whines and makes their lights flicker). Then, I’d recoup my investment in a year, all while saving the planet!
So, from my perspective as someone who runs the computer 24/7, this power supply efficiency upgrade makes a lot of sense. It might not make as much sense for people whose computers are off for most of the day, or for computers that just sit around idle, because then it would take a lot longer to recover the costs.
P.S. Now when I pop the side panel off Voyager, I am reminded to focus…
In the last article, I investigated how the power limit setting on an Nvidia Geforce GTX 1080 graphics card could affect the card’s performance and efficiency for doing charitable disease research in the Folding@Home distributed computing project. The conclusion was that a power limit of 60% offers only a slight reduction in raw performance (Points Per Day), but a large boost in energy efficiency (PPD/Watt). Two articles ago, I looked at the effect of GPU core clock. In this article, I’m experimenting with a different variable. Namely, the memory clock rate.
The effect of memory clock rate on video games is well defined. Gamers looking for the highest frame rates typically overclock both their graphics GPU and Memory speeds, and see benefits from both. For computation projects like Stanford University’s Folding@Home, the results aren’t as clear. I’ve seen arguments made both ways in the hardware forums. The intent of this article is to simply add another data point, albeit with a bit more scientific rigor.
To conduct this experiment, I ran the Folding@Home V7 GPU client for a minimum of 3 days continuously on my Windows 10 test computer. Folding@Home points per day (PPD) numbers were taken from Stanford’s Servers via the helpful team at https://folding.extremeoverclocking.com. I measured total system power consumption at the wall with my P3 Kill A Watt meter. I used the meter’s KWH function to capture the total energy consumed, and divided out by the time the computer was on in order to get an average wattage value (thus eliminating a lot of variability). The test computer specs are as follows:
Test Setup Specs
Case: Raidmax Sagitta
CPU: AMD FX-8320e
Mainboard : Gigabyte GA-880GMA-USB3
GPU: Asus GeForce 1080 Turbo
Ram: 16 GB DDR3L (low voltage)
Power Supply: Seasonic X-650 80+ Gold
Drives: 1x SSD, 2 x 7200 RPM HDDs, Blu-Ray Burner
Fans: 1x CPU, 2 x 120 mm intake, 1 x 120 mm exhaust, 1 x 80 mm exhaust
OS: Win10 64 bit
Video Card Driver Version: 372.90
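Going back to the averaging trick mentioned above the spec list: the meter’s KWH mode turns into an average wattage like this (the readings are placeholders, not my logged data):

```python
# Average wall power from the Kill A Watt's cumulative KWH reading.
kwh_consumed = 16.2    # energy accumulated on the meter (placeholder)
hours_elapsed = 72.0   # time the computer spent folding (3 days)

avg_watts = kwh_consumed / hours_elapsed * 1000
print(f"Average draw: {avg_watts:.0f} W")   # -> 225 W
# Averaging over days smooths out the work-unit-to-work-unit wobble.
```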
I ran this test with the memory clock rate at the stock clock for the P2 power state (4500 MHz), along with the gaming clock rate of 5000 MHz and a reduced clock rate of 4000 MHz. This gives me three data points of comparison. I left the GPU core clock at +175 MHz (the optimum setting from my first article on the 1080 GTX) and the power limit at 100%, to ensure I had headroom to move the memory clock without affecting the core clock. I verified I wasn’t hitting the power limit in MSI Afterburner.
*Update. Some people may ask why I didn’t go beyond the standard P0 gaming memory clock rate of 5000 MHz (same thing as 10,000 MHz double data rate, which is the card’s advertised memory clock). Basically, I didn’t want to get into the territory where the GDDR5’s error checking comes into play. If you push the memory too hard, there can be errors in the computation but work units can still complete (unlike a GPU core overclock, where work units will fail due to errors). The reason is the built-in error checking on the card memory, which corrects errors as they come up but results in reduced performance. By staying away from 5000+ MHz territory on the memory, I can ensure the relationship between performance and memory clock rate is not affected by memory error correction.
Memory Overclocking Performed in MSI Afterburner
I put together a table of results in order to show how the averaging was done, and the # of work units backing up my +500 MHz and -500 MHz data points. Having a bunch of work units is key, because there is significant variability in PPD and power consumption numbers between work units. Note that the performance and efficiency numbers for the baseline memory speed (+0 MHz, aka 4500 MHz) come from my extended testing baseline for the 1080 and have even more sample points.
Nvidia GTX 1080 Folding@Home Production History: Data shows increased performance with a higher memory speed
The following graphs show the PPD, Power Consumption, and Efficiency curves as a function of graphics card memory speed. Since I had three points of data, I was able to do a simple three-point-curve linear trendline fit. The R-squared value of the trendline shows how well the data points represent a linear relationship (higher is better, with 1 being ideal). Note that for the power consumption, the card seems to have used more power with a lower memory clock rate than the baseline memory clock. I am not sure why this is…however, the difference is so small that it is likely due to work unit variability or background tasks running on the computer. One could even argue that all of the power consumption results are suspect, since the changes are so small (on the order of 5-10 watts between data points).
Increasing the memory speed of the Nvidia GeForce GTX 1080 results in a modest increase in PPD and efficiency, and arguably a slight increase in power consumption. The differences between the fastest (+500 MHz) and slowest (-500 MHz) data points I tested are:
PPD: +81K PPD (11.5%)
Power: +9.36 Watts (3.8%)
Efficiency: +212.8 PPD/Watt (7.4%)
Keep in mind that these are for a massive difference in ram speed (5000 MHz vs 4000 MHz).
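Those three deltas hang together arithmetically, since PPD/Watt is just PPD divided by watts. A one-line consistency check using the percentages above:

```python
# Consistency check: the efficiency change follows from the PPD and
# power changes, because PPD/Watt = PPD / Watts.
ppd_gain = 0.115     # +11.5% PPD at +500 MHz vs. -500 MHz
power_gain = 0.038   # +3.8% wall power

eff_gain = (1 + ppd_gain) / (1 + power_gain) - 1
print(f"Implied efficiency gain: {eff_gain:.1%}")   # -> ~7.4%, as measured
```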
Another way to look at these results is that underclocking the graphics card ram in hopes of improving efficiency doesn’t work (you’ll actually lose efficiency). I expect this trend will hold true for the rest of the Nvidia Pascal series of cards (GTX 10xx), although so far my testing of this has been limited to this one card, so your mileage may vary. Please post any insights if you have them.
In an effort to make as much use of the colder months in New England as I can, I’m running tons of Stanford University’s Folding@Home on my computer to do charitable science for disease research while heating my house. In the last article, I reviewed a slightly older AMD card, the RX 480, to determine its performance and efficiency running Folding@Home. Today, I’ll be taking a look at one of the favorite cards from Nvidia for both folding and gaming: The 1070 Ti.
The GeForce GTX 1070 Ti was released in November 2017, and sits between the 1070 and 1080 in terms of raw performance. As of February 2019, the 1070 Ti can be had for a deep discount on the used market, now that the RTX 20xx series cards have been released. I got my Asus version on eBay for $250.
Based on Nvidia’s 16nm Pascal architecture, the 1070 Ti has 2432 CUDA cores and 8 GB of GDDR5 memory, with a memory bandwidth of 256 GB/s. The base clock rate of the GPU is 1607 MHz, although the cards automatically boost well past the advertised boost clock of 1683 MHz. Thermal Design Power (TDP) is 180 watts.
The 3rd party Asus card I got is nothing special. It appears to be a dual-slot reference design, and uses a blower cooler to exhaust hot air out the back of the case. It requires one supplemental 8-pin PCI-E Power connection.
ASUS GeForce GTX 1070 Ti
One thing I will note about this card is its length. At 10.5 inches (which is similar to many high-end Nvidia cards), it can be a bit problematic to fit in some cases. I have a Raidmax Sagitta mid-tower case from way back in 2006, and it fits, but barely. I had the same problem with the EVGA GeForce 1070 I reviewed earlier.
ASUS GTX 1070 Ti – Installed.
Testing was done in Windows 10 on my AMD FX-based system, which is old but holds up pretty well, all things considered. You can read more on that here. The system was built for both performance and efficiency, using AMD’s 8320e processor (a bit less power hungry than the other 8-core FX processors), a Seasonic 650 watt 80+ Gold power supply, and 8 GB of low voltage DDR3 memory. The real key here, since I take all my power measurements at the wall with a P3 Kill-A-Watt meter, is that the system is the same for all of my tests.
The Folding@Home Client version is 7.5.1, running a single GPU slot with the following settings:
GPU Slot Options for Maximum PPD
These settings tend to result in slightly higher points per day (PPD), because they request large, advanced work units from Stanford.
Initial Test Results
Initial testing was done on one of the oldest drivers I could find to support the 1070 Ti (driver version 388.13). The thought here was that older drivers would have less gaming optimizations, which tend to hurt performance for compute jobs (unlike AMD, Nvidia doesn’t include a compute mode in their graphics driver settings).
Unfortunately, the best Nvidia driver for the non-Ti GTX 10xx cards (372.90) doesn’t work with the 1070 Ti, because the Ti version came out a few months later than the original cards. So, I was stuck with version 388.13.
Nvidia GTX 1070 Ti Monitoring – Baseline Clocks
I ran F@H for three days using the stock clock rate of 1823 MHz core, with the memory at 3802 MHz. Similar to what I found when testing the 1070, Folding@Home does not trigger the card to go into the high power (max performance) P0 state. Instead, it is stuck in the power-saving P2 state, so the core and memory clocks do not boost.
The PPD average for three days when folding at this rate was 632,380 PPD. Checking the Kill-A-Watt meter over the course of those days showed an approximate average system power consumption of 220 watts. Interestingly, this is less power draw than the GTX 1070 (which used 227 watts, although that was with overclocking + the more efficient 372.90 driver). The PPD average was also less than the GTX 1070, which had done about 640,000 PPD. Initial efficiency, in PPD/Watt, was thus 2875 (compared to the GTX 1070’s 2820 PPD/Watt).
The lower power consumption number and lower PPD performance score were a bit surprising, since the GTX 1070 Ti has 512 more CUDA cores than the GTX 1070. However, in my previous review of the 1070, I had done a lot of optimization work, both with overclocking and with driver tuning. So, now it was time to do the same to the 1070 Ti.
Tuning the Card
By running UNIGINE’s Heaven video game benchmark in windowed mode, I was able to watch what the card did in MSI Afterburner. The core clock boosted up to 1860 MHz (a modest increase from the 1823 MHz base clock), and the memory went up to 4000 MHz (the default). I tried these overclocking settings and saw only a modest increase in PPD numbers. So, I decided to push it further, despite the Asus card having only a reference-style blower cooler. From my 1070 review, I found I was able to fold nice and stable with a core clock of 2012 MHz and a memory clock of 3802 MHz. So, I set up the GTX 1070 Ti with those same settings. After running it for five days, I pushed the core a little higher, to 2050 MHz. A few days later, I upgraded the driver to the latest (417.71).
Nvidia GTX 1070 Ti Monitoring – Overclocked
With these settings, I did have to increase the fan speed to keep the card below 70 degrees Celsius. Since the Asus card uses a blower cooler, it was a bit loud, but nothing too crazy. Open-air coolers with lots of heat pipes and multiple fans would probably let me push the card higher, but from what I’d read, people start running into stability problems at core clocks over 2100 MHz. Since the goal of Folding@Home is to produce reliable science to help Stanford University fight disease, I didn’t want to risk dropping a work unit due to an unstable overclock.
Nvidia GTX1070 Ti Folding@Home Production Time History
As you can see below, the overclock helped improve the performance of the GTX 1070 Ti. Using the last five days worth of data points (which has the graphics driver set to 417.71 and the 2050 MHz core overclock), I got an average PPD of 703,371 PPD with a power consumption at the wall of 225 Watts. This gives an overall system efficiency of 3126 PPD/Watt.
Finally, these results are starting to make more sense. Now, this card is outpacing the GTX 1070 in terms of both PPD and energy efficiency. However, the gain in performance isn’t enough to confidently say the card is doing better, since there is typically a +/- 10% PPD difference depending on what work unit the computer receives. This is clear from the amount of variability, or “hash”, in the time history plot.
Interestingly, the GTX 1070 Ti is still using about the same amount of power as the base model GTX 1070, which has a Thermal Design Power of 150 watts, compared to the GTX 1070 Ti’s TDP of 180 watts. So, why isn’t my system consuming 30 watts more at the wall than it did when equipped with the base 1070?
I suspect the issue here is that the drivers available for the 1070 Ti are not as good for folding as the 372.90 driver for the non-Ti 10-series Nvidia cards. As you can see from the MSI Afterburner screen shots above, GPU Usage on the GTX 1070 Ti during folding hovers in the 80-90% range, which is lower than the 85-93% range seen when using the non-Ti GTX 1070. In short, folding on the 1070 Ti seems to be a bit handicapped by the drivers available in Windows.
Comparison to Similar Cards
Here are the Production and Efficiency Plots for comparison to other cards I’ve tested.
GTX 1070 Ti Performance Comparison
GTX 1070 Ti Efficiency Comparison
The Nvidia GTX 1070 Ti is a very good graphics card for running Folding@Home. With an average PPD of 703K and a system efficiency of 3126 PPD/Watt, it is the fastest and most efficient graphics card I’ve tested so far. As far as maximizing the amount of science done per electricity consumed, this card continues the trend…higher-end video cards are more efficient, despite the increased power draw.
One side note about the GTX 1070 Ti is that the drivers don’t seem as optimized as they could be. This is a known problem for running Folding@Home in Windows. But, since the proven Nvidia driver 372.90 is not available for the Ti-flavor of the 1070, the hit here is more than normal. On the used market in 2019, you can get a GTX 1070 for $200 on ebay, whereas the GTX 1070 Ti’s go for $250. My opinion is that if you’re going to fold in Windows, a tuned GTX 1070 running the 372.90 driver is the way to go.
To fully unlock the capability of the GTX 1070 Ti, I realized I’m going to have to switch operating systems. Stay tuned for a follow-up article in Linux.
Folding@home is Stanford University’s charitable distributed computing project. It’s charitable because you can donate electricity, as converted into work through your home computer, to fight cancer, Alzheimer’s, and a host of other diseases. It’s distributed because anyone can run it with almost any desktop PC hardware. But, not all hardware configurations are created equal. If you’ve been following along, you know the point of this blog is to do the most work for as little power consumption as possible. After all, electricity isn’t free, and killing the planet to cure cancer isn’t a very good trade-off.
Today we’re testing out Folding@home on EVGA’s single-fan version of the NVIDIA GTX 1060 graphics card. This is an impressive little card in that it offers a lot of gaming performance in a small package. This is a very popular graphics card for gamers who don’t want to spend $400+ on GTX 1070s and 1080s. But, how well does it fold?
Model #: 06G-P4-6163
Model Name: EVGA GeForce GTX 1060 SC GAMING (Single Fan)
Max TDP: 120 Watts
Power: 1 x PCI Express 6-pin
GPU: 1280 CUDA Cores @ 1607 MHz (Boost Clock of 1835 MHz)
Memory: 6 GB GDDR5
Bus: PCI-Express X16 3.0
EVGA Nvidia GeForce GTX 1060 (photo by EVGA)
Folding@Home Test Setup
For this test I used my normal desktop computer as the benchmark machine. Testing was done using Stanford’s V7 client on Windows 7 64-bit running FAH Core 21 work units. The video driver version used was 381.65. All power consumption measurements were taken at the wall and are thus full system power consumption numbers.
If you’re interested in reading about the hardware configuration of my test rig, it is summarized in this post:
FOLDING@HOME TEST RESULTS – 305K PPD AND 1650 PPD/WATT
The Nvidia GTX 1060 delivers the best Folding@Home performance and efficiency of all the hardware I’ve tested so far. As seen in the screen shot below, the native F@H client has shown up to 330K PPD. I ran the card for over a week and averaged the results as reported to Stanford to come up with the nominal 305K Points Per Day number. I’m going to use 305 K PPD in the charts in order to be conservative. The power draw at the wall was 185 watts, which is very reasonable, especially considering this graphics card is in an 8-core gaming rig with 16 GB of ram. This results in a F@H efficiency of about 1650 PPD/Watt, which is very good.
Screen Shot from F@H V7 Client showing Estimated Points per Day:
Nvidia GTX 1060 Folding @ Home Results: Windows V7 Client
Here are the averaged results based on actual returned work units
Note that in this plot, the reported results previous to the circled region are also from the 1060, but I didn’t have it running all the time. The 305K PPD average is generated only from the work units returned within the time frame of the red circle (7/12 thru 7/21)
Production and Efficiency Plots
NVidia GTX 1060 Folding@Home PPD Production Graph
Nvidia GTX 1060 Folding@Home Efficiency Graph
For about $250 (or $180 used if you get lucky on eBay), you can do some serious disease research by running Stanford University’s Folding@Home distributed computing project on the Nvidia GTX 1060 graphics card. This card is a good middle ground in terms of price (it is the entry level of Nvidia’s current GTX series of gaming cards). Stepping up to a 1070 or 1080 will likely continue the trend of increased energy efficiency and performance, but those cards cost between $400 and $800. The GTX 1060 reviewed here was still very impressive, and I’ll also point out that it runs my old video games at absolute max settings (Skyrim, Need for Speed Rivals). Being a relatively small video card, it easily fits in a mid-tower ATX computer case, and only requires one supplemental PCI-Express power connector. Doing over 300K PPD on only 185 watts, this Folding@Home setup is both efficient and fast. For 2017, the Nvidia 1060 is an excellent bang-for-the-buck Folding@Home graphics card.
Request: Anyone want to loan me a 1070 or 1080 to test? I’ll return it fully functional (I promise!)
After reading my last post about the AMD Phenom II X6 1100T’s performance and efficiency, you might be wondering if anything can be done to further improve this system’s energy efficiency. The answer is yes, of course! The 1100T is the top-end Phenom II processor, and is unlocked to allow tweaking to your heart’s content. Normal people push these processors higher in frequency, which causes them to need more voltage and use more power. While that is a valid tactic for gaining more raw points per day, I wondered if the extra points would be offset by a non-proportional increase in power consumption. How is efficiency related to clock speed and voltage? My aim here is to show you how you can improve your PPD/Watt by adjusting these settings. By increasing the efficiency of your processor, you can reduce the guilt you feel about killing the planet with your cancer-fighting computer. Note that the following method can be applied to any CPU/motherboard combo that allows you to adjust clock frequencies and voltages in the BIOS. If you built your folding rig from scratch, you are in luck, because most custom PCs allow this sort of BIOS fun. If you are using your dad’s stock Dell, you’re probably out of luck.
AMD Phenom II X6: Efficiency Improved through Undervolting
The baseline stats for the Phenom II X6 1100T are a 3.3 GHz core speed with 2000 MHz HyperTransport and Northbridge clocks. This is achieved with the CPU operating at 1.375 volts, with a rated TDP (max power consumption) of 125 watts. Running the V7 client in SMP-6 with my passkey, I saw roughly 12K PPD on A3 work units. This is what was documented in my blog post from last time.
Now for the fun part. Since this is a Black Edition processor from AMD, the voltages, base frequencies, and multipliers are all adjustable in the system BIOS (assuming your motherboard isn’t a piece of junk). So, off I went to tweak the numbers. I let the system “soak” at each setting in order to establish a consistent PPD baseline. I got my PPD numbers by verifying what the client was reporting with the online statistics reporting. Wattage numbers come from my trusty P3 Kill-A-Watt meter.
First, I tried overclocking the processor. I upped the voltage as necessary to keep it stable (stable = folding overnight with no errors in F@H or my standard benchmark tests). It was soon clear that from an efficiency standpoint, overclocking wasn’t really the way to go. So, then I went the other way, and took a bit of clock speed and voltage out.
F@H Efficiency Curve: AMD Phenom II X6 1100T
These results are very interesting. Overclocking does indeed produce more points per day, but to go to higher frequencies required so much voltage that the power consumption went up even more, resulting in reduced efficiency. However, a slight sacrifice of raw PPD performance allowed the 1100T to be stable at 1.225 volts, which caused a marked improvement in efficiency. With a little more experimenting on the underclocking / undervolting side of things, I bet I could have got this CPU to almost 100 PPD / Watt!
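As a rough rule of thumb for why undervolting pays off so well: CPU power scales approximately with the square of core voltage at a given clock. A sketch using the voltages above (this ignores static leakage and the small clock reduction, so it’s an estimate only):

```python
# Approximate effect of the 1.375 V -> 1.225 V undervolt on CPU power,
# using the rough P ~ V^2 scaling at a fixed clock. Estimate only.
v_stock, v_undervolt = 1.375, 1.225
tdp_stock = 125   # watts, rated TDP of the Phenom II X6 1100T

power_ratio = (v_undervolt / v_stock) ** 2
print(f"CPU power: ~{power_ratio:.0%} of stock "
      f"(~{tdp_stock * power_ratio:.0f} W vs. {tdp_stock} W rated)")
# -> ~79% of stock: a ~20% cut in CPU power for a small PPD sacrifice
```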
PPD/Watt efficiency went up by about 30% for the Phenom II X6 1100T, just by tweaking some settings in the BIOS. Optimizing core speed and voltage for efficiency should work for any CPU (or even graphics card, if your card has adjustable voltage). If you care about the planet, try undervolting / underclocking your hardware slightly. It will run cooler, quieter, and will likely last longer, in addition to doing more science for a given amount of electricity.
My Gaming / Folding computer with Q6600 / GTX 460 Installed
Since the dawn of Folding@Home, Stanford’s single-threaded CPU client known as “uniprocessor” has been the standard choice for stable folding@home installations. For people who don’t want to tinker with many settings, and for people who don’t plan on running 24/7, this has been a good choice of clients because it allows a small science contribution to be done without very much hassle. It’s a fairly invisible program that runs in the background and doesn’t spin up all your computer’s fans and heat up your room. But, is it really efficient?
The question, more specifically targeted for folding freaks reading this blog, is this: Does the uniprocessor client make sense for an efficient 24/7 folding@home rig? My answer: a resounding NO! Kill that process immediately!
A basic Google search on this will show that you can get vastly more points per day running the multicore client (SMP), a dedicated graphics card client (GPU), or both. Just type “PPD Uniprocessor SMP Folding” into Google and read for about 20 minutes and you’ll get the idea. I’m too lazy to point to any specific threads (no pun intended), but the various forum discussions reveal that the uniprocessor client is slower than slow. This should not be surprising. One CPU core is slower than two, which is slower than three! Yay, math!
Also, Stanford’s point reward system isn’t linear. If you return a work unit twice as fast, you get more than twice as many points as a reward, because prompt results are very valuable in the scientific world. This bonus is known as the Quick Return Bonus, and it is available to users running with a passkey (a long auto-generated password that proves you are who you say you are to Stanford’s servers). I won’t regurgitate all that info on passkeys and points here, because if you are reading this site then you most likely know it already. If not, start by downloading Stanford’s latest all-in-one client, known as Client V7. Make sure you set yourself up with a username as well as a passkey, in case you didn’t have one. Once you return 10 successful work units using your passkey, you can get the extra QRB points. For the record, this is the setup I am using for this blog at the moment: V7 client version 7.3.6, running with a passkey.
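For the curious, the commonly cited form of the bonus is final points = base points × max(1, sqrt(k × deadline / elapsed)), with k set per work unit. The exact constants don’t matter for the argument; the point, sketched below, is that PPD grows superlinearly with speed, because finishing faster both raises the per-unit bonus and lets you return more units per day:

```python
import math

def ppd(base_points, k, deadline_days, elapsed_days):
    """PPD under the commonly cited Quick Return Bonus formula:
    points = base * max(1, sqrt(k * deadline / elapsed)),
    completing (1 / elapsed) work units per day."""
    bonus = max(1.0, math.sqrt(k * deadline_days / elapsed_days))
    return base_points * bonus / elapsed_days

# Hypothetical work unit: 1000 base points, k = 0.75, 3-day deadline.
slow = ppd(1000, 0.75, 3.0, 1.0)   # finished in one day
fast = ppd(1000, 0.75, 3.0, 0.5)   # finished twice as fast

print(f"{fast / slow:.2f}x the PPD for 2x the speed")   # -> ~2.83x
```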
Unlike the older 6.x client interfaces, the new V7 client lets you pick the specific work package type you want to do within one program. “Uniprocessor” is no longer a separate installation, but is selectable by adding a CPU slot within the V7 client and telling it how many threads to run. V7 then downloads the correct work unit to munch on.
I thought I was talking efficiency! Well, to that end, what we want to do is maximize the F@H output relative to the input. We want to make as many Points per Day while drawing the fewest watts from the wall as possible. It should be clear by now where this is going (I hope). Because Stanford’s points system heavily favors the fast return of work units, it is often the case that the PPD/Watt increases as more and more CPU cores or GPU shaders are engaged, even though the resulting power draw of the computer increases.
Limiting ourselves to CPU-only folding for the moment, let’s have a look at what one of my Folding@Home rigs can do. It’s Specs Time (Yay SPECS!). Here are the specs of my beloved gaming computer, known as Sagitta (outdated picture was up at the top).
Intel Q6600 Quad Core CPU @ 2.4 GHz
Gigabyte AMD Radeon HD 7870 Gigahertz Edition
8 GB Kingston DDR2-800 Ram
Gigabyte 965-P S3 motherboard
Seasonic X-650 80+ Gold PSU
2 x 500 GB Western Digital HDDs RAID-1
2 x 120 MM Intake Fans
1 x 120 MM Exhaust Fan
1 x 80 MM Exhaust Fan
Arctic Cooling Freezer 7 CPU Cooler
Generic PCI Slot centrifugal exhaust fan
Ancient Pic of Sagitta (2006 Vintage). I really need to take a new pic of the current configuration.
You’ll probably say right away that this system, except for the graphics card, is pretty out of date for 2014, but for relative A-to-B comparisons within the V7 client this doesn’t matter. For new i7 CPUs, the relative performance and efficiency differences seen by increasing the number of CPU cores for folding reveal the same trend as will be shown here. I’ll start by just looking at the 1-core option (uniprocessor) vs. a dual-core F@H solve.
As you can see, switching to a 2-CPU solve within the V7 client yields almost twice as many PPD (12.11 vs 6.82). And, this isn’t even a fair comparison, because the dual-core work unit I received was one of the older A3 cores, which tend to produce less PPD than the A4 work units.
In conclusion, if everyone who is out there running the uniprocessor client switched to a dual-core client, FOLDING AT HOME WOULD BECOME TWICE AS EFFICIENT! I can’t scream this loud enough. Part of the reason for this is because it doesn’t take many more watts to feed another core in a computer that is already fired up and folding. In the above example, we really started getting twice the amount of work done for only 13 more watts of power consumed. THIS IS AWESOME, and it is just the beginning. In the next article, I’ll look at the efficiency of 3 and 4 CPU Folding on the Q6600, as well as 6-CPU folding on my other computer, which is powered by a newer processor (AMD Phenom II X6 1100T). I’ll then move on to dual-CPU systems (non BIGADV at this point for those of you who know what this means, but we will get there too), and to graphics cards. If you think 12 PPD/Watt is good, just wait until you read the next article!
The last post gave an overview of why efficiency matters for power supplies. This post focuses on how to pull it off in practice. The 80+ (80 Plus) certification is an optional certification that power supply makers can earn on their retail PSUs by submitting samples for testing at an independent lab. There are various efficiency tiers within the standard, but any unit that achieves the basic 80+ rating can be considered efficient compared to the average 60-70% efficient PSUs of old.
80+ Efficiency Table
For around-the-clock computer operation, you should get the most efficient unit possible, although the 80+ Platinum and Titanium units can be cost prohibitive. My recommendation is to stick with an 80+ Gold unit, because they are significantly more efficient than most power supplies and can be obtained without first having to sell a kidney on the black market. Note that the greatest efficiency can theoretically be achieved by selecting a power supply with a rated maximum wattage of about twice what your computer requires to run F@H full-blast. This is because power supplies tend to be most efficient at around 50% of their rated maximum load. For example, if your shiny new F@H rig requires 300 watts of power to run, an 80+ Gold PSU rated at 600 watts should put you right at its peak efficiency of about 90%.
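As a quick sanity check on that sizing rule of thumb, here is a little Python sketch. The 300 watt load and 90% efficiency figure are assumptions for the example, not measurements.

```python
def wall_draw(dc_load_watts, efficiency):
    # AC watts pulled from the outlet for a given DC load.
    return dc_load_watts / efficiency

rig_load = 300                # hypothetical folding rig, watts DC
psu_rating = 2 * rig_load     # rule of thumb: run the PSU near 50% load
print(f"Suggested PSU rating: {psu_rating} W")
print(f"Wall draw at 90% efficiency: {wall_draw(rig_load, 0.90):.0f} W")
```

In other words, the 600 watt Gold unit runs the hypothetical 300 watt rig right at its 50% sweet spot, drawing about 333 watts from the wall.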
For many power supplies you can find an efficiency curve that graphs the unit’s efficiency vs. load, but to save yourself valuable time you might as well just buy a reputable power supply from a good manufacturer that has the 80+ Gold certification. As with any computer part, read the user reviews before purchase to avoid a serious frowny face later. JonnyGuru.com has some excellent power supply reviews, and they test their review samples in a much more grueling temperature environment than the 80+ standard requires. When buying from Newegg, just filter your PSU search by efficiency rating and then by user reviews to immediately find some good candidates.

My personal favorite is the Seasonic X-series of Gold-rated PSUs, although Antec, PC Power & Cooling, Thermaltake, Cooler Master, Corsair, and many others also make good units. I have been using the Seasonic X-650 Gold, which is a great power supply for a bunch of reasons beyond efficiency (modular cables, multiple PCI Express power connections, a smart fan, the latest ATX standard, great build quality, and so on until I’m blue in the face). The Seasonic reduced my desktop’s power consumption by over 32 watts at idle and 49 watts at load compared to the Ultra X2 Connect 500 watt PSU I had before. I pitched the old one into the computer recycling bin at the local transfer station to make sure it stays out of service. It made a nice-sounding kerthunk, by the way. (Random environmental tip: most city dumps take computer electronics for recycling free of charge, so bring your old wasteful power supply, as well as any of those nasty mercury-laden compact fluorescent light bulbs, to the dump for recycling instead of throwing them in the trash.)
Good morning! This is an intro article, so feel free to skip it if you already know what efficiency means for power supplies. Part 2 goes into detail on the 80 Plus standard and is likely a more enthralling read for you spec heads!
Let’s talk about the most important piece of hardware that a desktop computer can have: the power supply! This little guy is responsible for electrifying all the goodies inside your computer. Furthermore, a good power supply protects your computer from dirty power (voltage spikes, EMI, ripple, power fluctuations, etc.). If you have ever read an article on custom desktop building, you probably know how crucial a good power supply is, as well as the consequences of using a cheap PSU. Suffice it to say that, for the sake of your computer’s health, this is one area where you don’t want to skimp on cost.
There is one trait of quality power supplies that is often overlooked, and that is energy efficiency. In a perfect world, a PSU would convert every watt of 120 V AC input power into usable DC power. In reality a portion of the power is lost as heat. The more efficient a power supply, the less energy it wastes as heat. In other words, your computer simply draws fewer watts from the wall.
Having an efficient power supply is crucial for F@H contributors and non-folders alike, because it will make your computer less power hungry no matter what it is doing. From gaming and graphics design to office work and Folding@Home, an efficient PSU will put a smile on your P3’s cute little face. (If you don’t get the reference, please also read the previous post about Watt meters)
Before I go on, I should note that the target audience of this article is people who have built or are building their own custom desktop. People with laptops or name-brand consumer desktops are sometimes out of luck, because the power supplies are often proprietary and can’t be upgraded. However, it doesn’t hurt to find out from the manufacturer what the efficiency of your computer’s power supply is. Some brands, such as Dell, HP, and Apple (among others), do put energy efficient power supplies of varying levels in their machines.
Cheap No-Name Brand Power Supply Unit
If your power supply looks as lame as the one in the above pic, then it probably has an efficiency rating of 60 to 70 percent. This means that if your computer parts need roughly 200 watts of power to run, your PSU draws 250 watts or more from the wall to supply that 200 watts of DC power (at 65 percent efficiency, it would actually pull over 300 watts, since 200 / 0.65 is about 308). That extra 50 to 100 watts is wasted as heat.
PC Power & Cooling SILENCER PSU
Seasonic SS-380GB PSU Installed
But if your power supply looks like the one in Pic #2 or #3, it might be closer to 80 or 90 percent efficient. For that same 200 watt load, it draws perhaps 220 watts from the wall. Even a conservative thirty watt difference might not seem like much, but for a folding rig running 24/7 the wasted wattage of the el-cheapo unit adds up. Let’s assume we are running a machine with the craptastic PSU. To calculate the total extra energy wasted relative to the better PSU (remember, a watt is a unit of power, meaning energy per unit time), we multiply the wasted wattage by the amount of time the computer is in service to get an energy quantity in watt-hours. So, 30 watts * 24 hours/day * 365 days/year = 262,800 watt-hours. Converting to kilowatt-hours (dividing by 1,000) gives 262.8 kWh. Assuming an average electricity cost of ten cents per kWh, we get an annual cost of 262.8 kWh * 0.10 $/kWh = $26.28. Assuming the folding computer runs on that same power supply for 5 years (mine has been going for longer), that is over $125 wasted, not to mention a slap in the face for poor planet Earth! A good energy efficient PSU could have been bought for $40 in the first place to negate this wasted energy cost and lessen the environmental impact.
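Here is the same arithmetic wrapped in a few lines of Python, so you can plug in your own wattage difference and local electricity rate. The 30 watt figure and ten cent rate are just the example numbers from above.

```python
def annual_waste_cost(extra_watts, dollars_per_kwh=0.10):
    # Convert a constant wasted wattage into kWh per year of 24/7
    # operation, then into dollars.
    kwh_per_year = extra_watts * 24 * 365 / 1000
    return kwh_per_year * dollars_per_kwh

yearly = annual_waste_cost(30)   # the 30 W difference from the example
print(f"${yearly:.2f} per year, ${5 * yearly:.2f} over five years")
```

That works out to $26.28 per year and $131.40 over five years, matching the figures above.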
So how can you spot an efficient power supply unit? Well, for that you can go by the independent test & certification program known as 80+. I will cover this in detail in the next article, so that people who want to jump right into the specs and skip this intro can do so.