Tag Archives: GPU

Folding@Home Efficiency vs. GPU Power Limit

Folding@Home: The Need for Efficiency

Distributed computing projects like Stanford University’s Folding@Home sometimes get a bad rap on account of all the power that is consumed in the name of science.  Critics argue that any potential gains that are made in the area of disease research are offset by the environmental damage caused by thousands of computers sucking down electricity.

This blog hopes to find a balance by optimizing the way the computational research is done. In this article, I’m going to show how a simple setting in the graphics card driver can improve Folding@Home’s Energy Efficiency.

This blog uses an Nvidia graphics card, but the general idea should also work with AMD cards. The specific card here is an EVGA GeForce GTX 1060 (6 GB).  Green F@H Review here: Folding on the NVidia GTX 1060

If you are folding on a CPU, similar efficiency improvements can be achieved by optimizing the clock frequencies and voltages in the BIOS.  For an example on how to do this, see these posts:

F@H Efficiency: AMD Phenom X6 1100T

F@H Efficiency: Overclock or Undervolt?

(at this point in time I really just recommend folding on a GPU for optimum production and efficiency)

GPU Power Limit Overview

The GPU Power limit slider is a quick way to control how much power the graphics card is allowed to draw. Typically, graphics cards are optimized for speed, with efficiency a second goal (if at all). When a graphics card is pushed harder, it will draw more power (until it runs into the power limit). Today’s graphics cards will also boost their clock rate when loaded, and reduce it when the load goes away. Sometimes, a few extra MHz can be achieved for minimal extra power, but go too far and the amount of power needed to drive the card will grow exponentially. Sure the card is doing a bit more work (or playing a game a bit faster), but the heaps of extra power needed to do this are making it very inefficient.

What I’m going to quickly show is that going the other way (reducing power) can actually improve efficiency, albeit at a reduction of raw output. For  this quick test, I’m just going to look a the default power limit, 100%, vs 50%. Specific tuning is going to be dependent on your actual graphics card. But, with a few days at different settings, you should be able to find a happy balance between performance and efficiency.

For these plots, I used my watt meter to obtain actual power consumption at the wall. You can read about my watt meters here.

Changing the Power Limit

A tool such as MSI Afterburner can be used to view the graphics card’s settings, including the power limit. In the below screenshot, I reduced the card’s power limit by 50% midway through taking data. You can clearly see the power consumption and GPU temperature drop. This suggests the entire computer should be drawing less power from the wall. I confirmed this with my watt meter.

Adjust Power Limit MSI Afterburner

MSI Afterburner is used to reduce the graphics card’s power limit.

Effect on Results

I ran the card for multiple days at each power setting and used Stanford’s actual stats to generate an averaged number for PPD. Reporting an average number like this lends more confidence that the results are real, since PPD as reported in the client varies a lot with time, and PPD can bounce around by +/- 10 percent with different projects.

Below is the production time history plot, courtesy of https://folding.extremeoverclocking.com/. I marked on the plot the actual power consumption numbers I was seeing from my computer at the wall. As you can see, reducing the power limit on the 1060 from 100% to 50% saved about 40 watts of power at the wall.

GTX 1060 F@H Reduced Power Limit Production

GTX 1060 Folding@Home Performance at 100% and 50% Power

On the efficiency plot, you can see that reducing the power limit on the 1060 actually improved its efficiency slightly. This is a great way to fold more effectively.

Nvidia 1060 PPD per Watt Updated

NVidia GTX 1060 Folding@Home Efficiency Results

There is a downside of course, and that is in raw production. The Points Per Day plot below shows a pretty big reduction in PPD for the reduced power 1060, although it is still beating its little brother, the 1050 TI. One of the reasons PPD falls off so hard is that Stanford provides bonus points that are tied to how fast your computer can return a work unit. These points increase exponentially the faster your computer can do work. So, by slowing the card down, we not only lose on base points, but we lose on  the quick return bonus as well.

Nvidia 1060 PPD Updated

NVidia GTX 1060 Folding@Home Performance Results

Conclusion

Reducing the power limit on a graphics card can increase its computational energy efficiency in Folding@Home, although at the cost of raw PPD. There is probably a sweet spot for efficiency vs. performance at some power setting between 50% and 100%. This will likely be different for each graphics card. The process outlined above can be used for various power limit settings to find the best efficiency point.

 

Advertisements

Folding on the Nvidia GTX 1070

Overview

Folding@home is Stanford University’s charitable distributed computing project. It’s charitable because you can donate electricity, as converted into work through your home computer, to fight cancer, Alzheimer’s, and a host of other diseases.  It’s distributed, because anyone can run it with almost any desktop PC hardware.  But, not all hardware configurations are created equally.  If you’ve been following along, you know the point of this blog is to do the most work for as little power consumption as possible.  After all, electricity isn’t free, and killing the planet to cure cancer isn’t a very good trade-off.

Today we’re testing out Folding@home on an EVGA NVIDIA GTX 1070 graphics card.  This card offers a big step up in gaming and compute horsepower compared to the 1060 I reviewed previously, and is capable of pushing solid frame rates at 4K resolution. So, how well does it fold?

Card Specifications (Nvidia Reference Specs)

1070 specs

Nvidia GTX 1070 Specifications

evga 1070 acx stock photo

EVGA Nvidia GTX 1070 ACX 3.0 (photo credit: EVGA)

FOLDING@HOME TEST SETUP

For this test I used my normal desktop computer as the benchmark machine.  Testing was done using Stanford’s V7 client on Windows 10 64-bit running FAH Core 21 work units.  The video driver version used was initially 388.59, and subsequently 372.90. Power consumption measurements reported in the charts were taken at the wall and are thus full system power consumption numbers.

If you’re interested in reading about the hardware configuration of my test rig, it is summarized in this post:

https://greenfoldingathome.com/2017/04/21/cpu-folding-revisited-amd-fx-8320e-8-core-cpu/

Information on my watt meter readings can be found here:

I Got a New Watt Meter!

Initial Testing and Troubleshooting

Like the GTX 1060, the 1070 uses Nvidia’s Pascal architecture, which is very efficient and has a reputation for solid compute performance. The 1070 has 50% more CUDA cores than the 1060, and with Folding@Home’s exponential points system (the quick return bonus gives you more points for doing work quickly), we should see roughly double the PPD of the 1060, which does 300 – 350 thousand PPD depending on the work unit. Based on various people’s experiences, and especially this forum post, I was expecting the 1070 to produce somewhere in the range of 600-700K PPD.

That wasn’t what happened. The card wasn’t exactly slow, but initial testing showed an estimated 450 to 550K PPD, as reported by the client. I ran it for a few days, since PPD can vary a good deal depending on the work unit, but the result was unfortunately the same. 550K PPD was about as much as my card would do.

initial_1070_results

Initial GTX 1070 Results – 544K PPD

At first I thought it might be due to the card running hot. Unlike my test of a brand new 1060, I obtained my 1070 used off of eBay for a great price of $200 dollars + shipping. It was a little dusty, so I blew it all out and fired up MSI Afterburner to check out the temps. Unfortunately, the fans on the card weren’t even breaking a sweat, and it was nice and cool. Points didn’t increase.

evga 1070 acx 3.0

My Used EVGA GTX 1070 ACX 3.0 – eBay Price: $200

initial 1070 afterburner report

MSI Afterburner Report: NVidia GTX 1070, Stock Clocks, Driver 388.59

After doing some more digging, I ran across a few threads online that indicated the 1070 (along with a few other GTX models) don’t always boost up to their maximum clock rates for compute loads. Opening up a video, or Folding@home’s protein viewer, can sometimes force the card to clock up. I tried this and didn’t have any luck. My card was running at the stock clocks, and in fact the memory even appeared to be running 200 Megahertz below the 4000 Mhz reference clock rate. This suggested the card was in a low-power mode.

Thankfully, Nvidia’s System Management Interface tool can be used to see what is going on. This tool, which in Windows 10 lives in C:\Program Files\Nvidia Corporation, can be accessed by the command line. I followed the tutorial here to learn a few things about what my 1070 was doing. Although that write-up is geared at people mining for cryptocurrency, the steps are still releveant.

As can be seen here, my card was in the “P2” state, which is not the high-performance “P0” state. This is why the card wasn’t boosting, and why the memory clock seems diminished.

1070 performance state

Nvidia 1070 Performance State

Another feature of the Nvidia System Management Interface is the ability to get the power consumption at the card. This is measured by the driver, using the card’s hardware, and is the total instantaneous power the card is consuming (PCI slot power + supplemental power connections). As you can see, in the P2 state, the card is very rarely nearing the 150 watt TDP.

Now, this doesn’t necessarily mean the card would get closer to 150 watts in the P0 state. F@H does not utilize every portion of the graphics card, and it is expected that the power consumption would not be right at the limit. Still, these numbers seemed a bit low to me.

1070 card-level power consumption (before tuning)

1070 card-level power consumption (before tuning)

Overclocking Manually to Approximate P0 State

Unlike what was suggested in that crypto mining article, I wasn’t able to use the NVSMI tool to force a P0 state. For some reason, my NVSMI tool wouldn’t show me the available clock rate settings for my 1070. However, manual overclocking with a program such as MSI Afterburner is really easy. By maxing out the power limit and setting the core clock to a higher value, I can basically make the card run at its boost frequency, or higher.

First, I set the power limit to the maximum allowed (112%). Don’t worry, this won’t hurt anything. It is limited in the driver to not cause any damage. Basically, this will allow the card to sip a bit more electricity (albeit at a reduction of efficiency). For a card that was in the P0 state (say, running a video game), this would allow higher boost clocks.

Next, I started upping the core clock in increments of 100 Mhz. I didn’t run into any stability problems, and settled in on a core clock of 2000 Mhz (factory clock is 1506 Mhz / 1683 boost). Note that that factory boost number is deceiving, since the latest drivers will crank the GPU core up past 1900 MHz if there is power and voltage headroom. From what I read, many people can run the 1070 stable at 2050 Mhz without adding voltage.

I decided not to boost the voltage, and to stay 50 Mhz below that supposedly stable number, because it’s not worth risking the stability of Folding@home. We want accurate, repeatable science! Plus, dropping work units is much worse for PPD than running slightly below a card’s maximum capability.

I experimented with clocking the memory up from 3800 MHz to 4000 MHz (note it’s double data rate so this equates to 8000 MHz as reported by some programs). This didn’t seem to affect results. F@H has historically been fairly insensitive to memory clocks, and boosting memory too much can cause slowdowns due to the error-checking routines having to work harder to ensure clean results. Basically, everyone says it’s not worth it. I ran it at 4000 MHz long enough to confirm this (a day), then throttled it back down to 3800 MHz. The benefit here will be more power available for the GPU cores, which is what really counts for folding.

Here are my final overclock numbers. The card has been running with these clocks for a week and a half non-stop, with no stability issues:

final 1070 afterburner report

Overclocked Settings: +160 MHz Core, 112% Power Limit

Note the driver version as shown in the updated Afterburner screen shot is different…as it turns out, this can have a huge effect on F@H PPD. More on that in a moment.

Overclocking Result: An Extra 50,000 PPD

Running the core at 2012 MHz (+160 MHz boost from the P2 power state) and upping the card’s power limit by 12% made the average PPD, as observed over two days, climb from 500-550K PPD to 550K-600K PPD. So, that’s a 50,000 PPD increase for minimal effort. But, something still seemed off. At the time I was still running driver version 388.59, and one of the things I had discovered when searching around for 1070 tuning tips is that not all drivers are created equal.

Nvidia Driver 372.90: The Best Folding Driver for the GTX 1070

Nvidia has been updating drivers with more and more emphasis on gaming optimizations and less on compute. So, it makes sense that older drivers might actually offer better compute performance. There are many threads in the Folding@Home Hardware Forum discussing this, and one driver version that keeps being mentioned is 372.90. It’s a bit tricky to keep it installed on Windows 10, since Windows is always trying to push a newer version, but for my 24/7 folding rig, I installed it and simply never rebooted it in order to get a week’s worth of data.

This driver change alone seemed to also offer a 50,000 point boost. After running various core 21 work units, the GTX 1070’s PPD has stayed between 630,000 and 660,000. This is normal variation between work units, and I feel confident reporting a final PPD of 640K. As I write this, the client is estimating 660K PPD.

final_1070_results

Nvidia GTX 1070: 660K PPD on Project 13815 (Core 21)

This is an excellent result. It’s twice the PPD of the GTX 1060, although eking out that last 100K PPD took a manual overclock plus a driver “update” to an older version.

Now, for the fun part. Efficiency! This 1070 is rated at 150 watts, which is only 30 watts more than the 1060. So we are supposedly doing 100% more science for Stanford University, and for a meager 25% increase in power consumption. Time to bust out the watt meter and find out!

Power Consumption at the Wall

Using my P3 Kill-A-Watt Power Meter, I measured the total system power consumption. This is the same way I measure all of my graphics cards (as opposed to estimating the card’s power by the TDP or using the video card driver to spit out instantaneous card power). The reason is that I like to have a full-system view, factoring in the power usage of my CPU, main board, and RAM, all essential components to keep the card happy.

While folding with the GTX 1070, my system’s total power draw varied between 225 and 230 watts. I’m going to go with 227 watts as the average power number. 

Efficiency

Computing computational efficiency as Points Per Day (PPD) / Power (Watts) gives:

640,000 PPD / 227 Watts = 2820 PPD/Watt.

Conclusion

The Nvidia GTX 1070 is a very efficient card for running Stanford’s Folding@Home Distributed Computing Project. The trend established in my previous articles seems to be continuing, namely that the more expensive high-end video cards are more efficient, despite their higher power draw. In this case of the 1070, some manual overclocking was needed to unlock the full PPD potential. As proven by many others, the default drivers weren’t very good, but the 372.90 drivers really opened it up.

Base PPD: 550,000

Tuned PPD (drivers + overclock) = 640,000

PPD/Watt(@wall) = 2820

1070 ppd plot

Nvidia GTX 1070 Performance Comparison

1070 efficiency plot

Nvidia 1070 Efficiency Comparison

As a final note, this post focused more on PPD than efficiency, since for much of the testing my watt meter was not installed (my kids keep playing with it). At some point in the future, I’ll do an article where I tune one of these cards to find the best efficiency point. This will likely be at a lower power limit than 100%, with perhaps a slight reduction in clock rate.

Folding@Home on the Nvidia GeForce GTX 1050 TI: Extended Testing

Hi again.  Last week, I looked at the performance and energy efficiency of using an Nvidia GeForce GTX 1050 TI to run Stanford’s charitable distributed computing project Folding@home.  The conclusion of that study was that the GTX 1050 TI offers very good Points Per Day (PPD) and PPD/Watt energy efficiency.  Now, after some more dedicated testing, I have a few more thoughts on this card.

Average Points Per Day

In the last article, I based the production and efficiency numbers on the estimated completion time of one work unit (Core 21), which resulted in a PPD of 192,000 and an efficiency of 1377 PPD/Watt.  To get a better number, I let the card complete four work units and report the results to Stanford’s collection server.  The end result was a real-world performance of 185K PPD and 1322 PPD/Watt (power consumption is unchanged at 140 watts @ the wall).  These are still very good numbers, and I’ve updated the charts accordingly.  It should be noted that this still only represents one day of folding, and I am suspicious that this PPD is still on the high end of what this card should produce as an average.  Thus, after this article is complete, I’ll be running some more work units to try and get a better average.

Folding While Doing Other Things

Unlike the AMD Radeon HD 7970 reviewed here, the Nvidia GTX 1050 TI doesn’t like folding while you do anything else on the machine.  To use the computer, we ended up pausing folding on multiple occasions to watch videos and browse the internet.  This results in a pretty big hit in the amount of disease-fighting science you can do, and it is evident in the PPD results.

Folding on a Reduced Power Setting

Finally, we went back to uninterrupted folding on the card, but at a reduced power setting (90%, set using MSI Afterburner).  This resulted in a 7 watt reduction of power consumption as measured at the wall (133 watts vs. 140 watts).  However, in order to produce this reduction in power, the graphics card’s clock speed is reduced, resulting in more than a performance hit.  The power settings can be seen here:

GTX 1050 Throttled

MSI Afterburner is used to reduce GPU Power Limit

Observing the estimated Folding@home PPD in the Windows V7 client shows what appears to be a massive reduction in PPD compared to previous testing.  However, since production is highly dependent on the individual projects and work units, this reduction in PPD should be taken with a grain of salt.

GTX 1050 V7 Throttled Performance

In order to get some more accurate results at the reduced power limit, we let the machine chug along uninterrupted for a week.  Here is the PPD production graph courtesy of http://folding.extremeoverclocking.com/

GTX 1050 Extended Performance Testing

Nvidia GTX 1050 TI Folding@Home Extended Performance Testing

It appears here that the 90% power setting has caused a significant reduction in PPD. However, this is based on having only one day’s worth of results (4 work units) for the 100% power case, as opposed to 19 work units worth of data for the 90% power case. More testing at 100% power should provide a better comparison.

Updated Charts (pending further baseline testing)

GTX 1050 PPD Underpowered

Nvidia GTX 1050 PPD Chart

GTX 1050 Efficiency Underpowered

Nvidia GTX 1050 TI Efficiency

As expected, you can contribute the most to Stanford’s Folding@home scientific disease research with a dedicated computer.  Pausing F@H to do other tasks, even for short periods, significantly reduces performance and efficiency.  Initial results seem to indicate that reducing the power limit of the graphics card significantly hurts performance and efficiency.  However, there still isn’t enough data to provide a detailed comparison, since the initial PPD numbers I tested on the GTX 1050 were based on the results of only 4 completed work units.  Further testing should help characterize the difference.