Category Archives: Computer Efficiency

Folding@Home Efficiency vs. GPU Power Limit

Folding@Home: The Need for Efficiency

Distributed computing projects like Stanford University’s Folding@Home sometimes get a bad rap on account of all the power that is consumed in the name of science.  Critics argue that any potential gains that are made in the area of disease research are offset by the environmental damage caused by thousands of computers sucking down electricity.

This blog hopes to find a balance by optimizing the way the computational research is done. In this article, I’m going to show how a simple setting in the graphics card driver can improve Folding@Home’s Energy Efficiency.

This blog uses an Nvidia graphics card, but the general idea should also work with AMD cards. The specific card here is an EVGA GeForce GTX 1060 (6 GB).  Green F@H Review here: Folding on the NVidia GTX 1060

If you are folding on a CPU, similar efficiency improvements can be achieved by optimizing the clock frequencies and voltages in the BIOS.  For an example on how to do this, see these posts:

F@H Efficiency: AMD Phenom X6 1100T

F@H Efficiency: Overclock or Undervolt?

(at this point in time I really just recommend folding on a GPU for optimum production and efficiency)

GPU Power Limit Overview

The GPU Power limit slider is a quick way to control how much power the graphics card is allowed to draw. Typically, graphics cards are optimized for speed, with efficiency a second goal (if at all). When a graphics card is pushed harder, it will draw more power (until it runs into the power limit). Today’s graphics cards will also boost their clock rate when loaded, and reduce it when the load goes away. Sometimes, a few extra MHz can be achieved for minimal extra power, but go too far and the power needed to drive the card grows much faster than the clock rate, because higher clocks usually require higher voltage as well. Sure, the card is doing a bit more work (or playing a game a bit faster), but the heaps of extra power needed to do this make it very inefficient.

What I’m going to quickly show is that going the other way (reducing power) can actually improve efficiency, albeit at a reduction of raw output. For this quick test, I’m just going to look at the default power limit, 100%, vs. 50%. Specific tuning is going to be dependent on your actual graphics card. But, with a few days at different settings, you should be able to find a happy balance between performance and efficiency.

For these plots, I used my watt meter to obtain actual power consumption at the wall. You can read about my watt meters here.

Changing the Power Limit

A tool such as MSI Afterburner can be used to view the graphics card’s settings, including the power limit. In the below screenshot, I reduced the card’s power limit by 50% midway through taking data. You can clearly see the power consumption and GPU temperature drop. This suggests the entire computer should be drawing less power from the wall. I confirmed this with my watt meter.

Adjust Power Limit MSI Afterburner

MSI Afterburner is used to reduce the graphics card’s power limit.
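
For readers on Linux, or anyone who prefers a command line, a similar cap can usually be applied with Nvidia's `nvidia-smi` tool instead of MSI Afterburner. This is not how the test above was run; it is only a rough Python sketch. The 120 watt default limit is an assumption based on the reference GTX 1060 spec (check your own card), the function names are made up for illustration, and the driver may reject values below the card's minimum allowed limit.

```python
import subprocess


def power_limit_command(default_limit_w, fraction, gpu_index=0):
    """Build the nvidia-smi command that caps a GPU at a fraction of its default limit."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    target_w = round(default_limit_w * fraction)
    # -i selects the GPU, -pl sets the power limit in watts.
    return ["nvidia-smi", "-i", str(gpu_index), "-pl", str(target_w)]


def apply_power_limit(default_limit_w, fraction, gpu_index=0):
    """Run the command. Typically needs root/administrator rights."""
    subprocess.run(power_limit_command(default_limit_w, fraction, gpu_index), check=True)


# Assumed 120 W default limit; 50% matches the setting tested in this post.
print(" ".join(power_limit_command(120.0, 0.5)))
```

Building the command separately from running it makes it easy to double-check the wattage before anything is changed on the card.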

Effect on Results

I ran the card for multiple days at each power setting and used Stanford’s actual stats to generate an averaged number for PPD. Reporting an average number like this lends more confidence that the results are real, since PPD as reported in the client varies a lot with time, and PPD can bounce around by +/- 10 percent with different projects.

Below is the production time history plot, courtesy of https://folding.extremeoverclocking.com/. I marked on the plot the actual power consumption numbers I was seeing from my computer at the wall. As you can see, reducing the power limit on the 1060 from 100% to 50% saved about 40 watts of power at the wall.

GTX 1060 F@H Reduced Power Limit Production

GTX 1060 Folding@Home Performance at 100% and 50% Power

On the efficiency plot, you can see that reducing the power limit on the 1060 actually improved its efficiency slightly. This is a great way to fold more effectively.

Nvidia 1060 PPD per Watt Updated

NVidia GTX 1060 Folding@Home Efficiency Results

There is a downside of course, and that is in raw production. The Points Per Day plot below shows a pretty big reduction in PPD for the reduced power 1060, although it is still beating its little brother, the 1050 Ti. One of the reasons PPD falls off so hard is that Stanford provides bonus points that are tied to how fast your computer can return a work unit. These bonus points grow faster than linearly with how quickly your computer returns work. So, by slowing the card down, we not only lose on base points, but we lose on the quick return bonus as well.

Nvidia 1060 PPD Updated

NVidia GTX 1060 Folding@Home Performance Results

Conclusion

Reducing the power limit on a graphics card can increase its computational energy efficiency in Folding@Home, although at the cost of raw PPD. There is probably a sweet spot for efficiency vs. performance at some power setting between 50% and 100%. This will likely be different for each graphics card. The process outlined above can be used for various power limit settings to find the best efficiency point.
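
The search for a sweet spot can be written down as a small bookkeeping exercise: record the multi-day average PPD and the wall power at each power limit, then pick the setting with the best PPD per watt. Here is a minimal Python sketch of that idea. The numbers in it are made-up placeholders chosen only to show the mechanics, not measurements from this test, and the `Run` class and `most_efficient` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Run:
    power_limit_pct: int  # driver power-limit setting, in percent
    avg_ppd: float        # multi-day average PPD from the stats site
    wall_watts: float     # whole-system draw measured at the wall

    @property
    def ppd_per_watt(self):
        return self.avg_ppd / self.wall_watts


def most_efficient(runs):
    """Return the run with the highest PPD per watt."""
    return max(runs, key=lambda r: r.ppd_per_watt)


# PLACEHOLDER values for illustration only; substitute your own measurements.
runs = [
    Run(power_limit_pct=100, avg_ppd=300_000, wall_watts=200),
    Run(power_limit_pct=75, avg_ppd=280_000, wall_watts=175),
    Run(power_limit_pct=50, avg_ppd=250_000, wall_watts=160),
]

best = most_efficient(runs)
print(best.power_limit_pct, round(best.ppd_per_watt))
```

If raw production matters too, the same list can be filtered first (for example, dropping any setting below a minimum acceptable PPD) before picking the most efficient entry.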

 

Squeezing a few more PPD out of the FX-8320E

In the last post, the 8-core AMD FX-8320E was compared against the AMD Radeon 7970 in terms of both raw Folding@home computational performance and efficiency.  It lost, although it is the best processor I’ve tested so far.  It also turns out it is a very stable processor for overclocking.

Typical CPU overclocking focuses on raw performance only, and involves upping the clock frequency of the chip as well as the supplied voltage.  When tuning for efficiency, doing more work for the same (or less) power is what is desired.  In that frame of mind, I increased the clock rate of my FX-8320E without adjusting the voltage to try to find an improved efficiency point.

Overclocking Results

My FX-8320E proved to be very stable at stock voltage at frequencies up to 3.6 GHz.  By very stable, I mean running Folding@home at max load on all CPUs for over 24 hours with no crashes, while also using the computer for daily tasks.   This is a 400 MHz increase over the stock clock rate of 3.2 GHz.  As expected, F@H production went up a noticeable amount (over 3000 PPD).  Power consumption also increased slightly.  It turns out the efficiency was also slightly higher (190 PPD/watt vs. 185 PPD/watt).  So, overclocking was a success on all fronts.

FX 8320e overclock PPD

FX 8320e overclock efficiency

Folding Stats Table FX-8320e OC
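
To put the gains in perspective, it helps to express them as percentages. The short Python sketch below does that using only the figures quoted above (3.2 GHz to 3.6 GHz, and 185 to 190 PPD/watt); the helper name `percent_change` is just an illustrative choice.

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


clock_gain = percent_change(3.2, 3.6)      # stock vs. overclocked frequency, GHz
efficiency_gain = percent_change(185, 190)  # PPD/watt, stock vs. overclocked

print(f"clock: +{clock_gain:.1f}%  efficiency: +{efficiency_gain:.1f}%")
# prints: clock: +12.5%  efficiency: +2.7%
```

So the clock rate rose by about an eighth, while efficiency improved by a few percent, which fits the observation that power consumption also crept up.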

Conclusion

As demonstrated with the AMD FX-8320e, mild overclocking can be a good way to earn more Points Per Day at a similar or greater efficiency than the stock clock rate.  Small tweaks like this to Folding@home systems, if applied everywhere, could result in more disease research being done more efficiently.

F@H Efficiency on Dell Inspiron 1545 Laptop

Laptops!  

When browsing internet forums looking for questions that people ask about F@H, I often see people asking if it is worth folding on laptops (note that I am talking about normal, battery-life optimized laptops, not Alienware gaming laptops / desktop replacements).  In general, the consensus from the community is that folding on laptops is a waste of time.  Well, that is true from a raw performance perspective.  Laptops, tablets, and other mobile devices are not the way to rise to the top of the Folding at Home leader boards.  They’re just too slow, due to the reduced clock speeds and voltages employed to maximize battery life.

But wait, didn’t you say that low voltage is good for efficiency?

I did, in the last article.  By undervolting and slightly underclocking the Phenom II X6 in a desktop computer, I was able to get close to 90 PPD/Watt while still doing an impressive twelve thousand PPD.

However, this raised the interesting question of what would happen if someone tried to fold on a computer that was optimized for low voltage, such as a laptop.  Let’s find out!

Dell Inspiron 1545

Specs:

  • Intel T9600 Core 2 Duo
  • 8 GB DDR2 Ram
  • 250 GB spinning disk style HDD (5400 RPM, slow as molasses)
  • Intel Integrated HD Graphics (horrible for gaming, great for not using much extra electricity)
  • LCD Off during test  to reduce power

I did this test on my Dell Inspiron 1545, because it is what I had lying around.  It’s an older laptop that originally shipped with a slow Socket P Intel Pentium dual core.  This 2.1 GHz chip was going to be so slow at folding that I decided to splurge and pick up a 2.8 GHz T9600 Core 2 Duo from Ebay for 25 bucks (can you believe this processor used to cost $400?).  This high end laptop processor has the same 35 watt TDP as the Pentium it is replacing, but has 6 times the total cache.  This is a dual core part that is roughly similar in architecture to the Q6600 I tested earlier, so one would expect the PPD and the efficiency to be close to the Q6600 when running on only 2 cores (albeit a bit higher due to the T9600’s higher clock speed).  I didn’t bother doing a test with the old laptop processor, because it would have been pretty bad (same power consumption but much slower).

After upgrading the processor (rather easy on this model of laptop, since there is a rear access panel that lets you get at everything), I ran this test in Windows 7 using the V7 client.  My computer picked up a nice A4 work unit and started munching away.  I made sure to use my passkey to ensure I get the quick return bonus.

Results:

The Intel T9600 laptop processor produced slightly more PPD than the similar Q6600 desktop processor when running on 2 cores (2235 PPD vs 1960 PPD). This is a decent production rate for a dual core, but it pales in comparison to the roughly 6000 PPD of the Q6600 running with all 4 cores, or newer processors such as the AMD 1100T (over 12K PPD).

However, from an efficiency standpoint, the T9600 Core2 Duo blows away the desktop Core2 Quad by a lot, as seen in the chart and graph below.

Intel T9600 Folding@Home Efficiency

Intel T9600 Folding@Home Efficiency vs. Intel Desktop Processors

Intel T9600 Folding@Home Efficiency vs. Desktop Processors

Conclusion

So, the people who say that laptops are slow are correct.  Compared to all the crazy desktop processors out there, a little dual core in a laptop isn’t going to do very many points per day.  Even modern quad-core laptops are fairly tame compared to their desktop brethren.  However, the efficiency numbers tell a different story.

Because everything from the motherboard, video card, and audio circuit to the hard drive and processor is optimized for low voltage, the total system power consumption was only 39 watts (with the lid closed).  This meant that the 2235 PPD was enough to earn an efficiency score of 57.29 PPD/Watt.  This number beats all of the efficiency numbers from the most similar desktop processor tested so far (Q6600), even when the Q6600 is using all four cores.
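
The efficiency metric used throughout this blog is simply points per day divided by watts at the wall. As a quick sanity check, here is that arithmetic in Python for the laptop and for the four-core Q6600 (using the roughly 6000 PPD at 169 watts quoted in the Phenom X6 post). The function name is an illustrative choice, and because the inputs are rounded, the laptop result lands within a few hundredths of the figure quoted above rather than matching it exactly.

```python
def ppd_per_watt(ppd, wall_watts):
    """Folding efficiency: points per day per watt drawn at the wall."""
    if wall_watts <= 0:
        raise ValueError("wall_watts must be positive")
    return ppd / wall_watts


laptop = ppd_per_watt(2235, 39)    # T9600 laptop, lid closed
desktop = ppd_per_watt(6000, 169)  # Q6600 desktop, all four cores (approximate PPD)

print(f"laptop: {laptop:.1f} PPD/W, desktop: {desktop:.1f} PPD/W")
# prints: laptop: 57.3 PPD/W, desktop: 35.5 PPD/W
```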

So, laptops can be efficient F@H computers, even though they are not good at raw PPD production.  It should also be noted that during this experiment the little T9600 processor heated up to a whopping 67 degrees C. That’s really warm compared to the 40 degrees Celsius the Q6600 runs at in the desktop.  Over time, that heat load would probably break my poor laptop and give me an excuse to get that Alienware I’ve been wanting.  

F@H Efficiency: AMD Phenom X6 1100T

Welcome back to the fold!  In the last post, I showed how increasing the # of CPU cores has a massive positive impact on the amount of cancer-fighting research your computer does, as well as how efficiently it does it.  In stock form, the quad core Intel Q6600 delivered just shy of 6000 points per day of F@H with all 4 cores engaged.  My computer’s total power draw at the wall was 169 watts.  So, that works out to be 6000 PPD / 169 Watts = 35 PPD/Watt.  Not too bad, considering the horrible efficiency numbers of the uniprocessor client.

In this article, I’m jumping forward in time to a more modern processor…the AMD Phenom II X6 1100T.  This six-core beast is the last of the true core-for-core chips from AMD (Bulldozer-family CPUs pair 2 integer cores with only 1 shared floating point unit per module).  With 6 physical floating point cores, the AMD 1100T should be good at folding.

Note that I am obviously using a completely different computer setup here than in the last post (I have an AMD machine and an Intel machine).  So, the efficiency numbers aren’t a perfect apples-to-apples comparison, due to the different supporting parts in both computers.  However, the difference between processors is so large that the differences in the host computers really don’t matter.  The newer AMD chip is much better, and that is what is driving the results!

Test Rig Specs:

AMD Phenom II X6 1100T
Gigabyte GA-880GMA-USB3 Micro ATX Motherboard
8 GB Kingston ValueRam DDR3 1333 MHz (4 x 2GB)
Seasonic S12 II 380W 80+ PSU
Hitachi 80 G SATA Hard Drive
Linkworld MicroATX
Fans: 2 x 80mm Side Intake, 1 x 80mm front intake, 1 x 92 mm Exhaust
Noctua NH-C12P SE14 140mm SSO CPU Cooler

A note about the operating system…

The previous tests on my Intel Q6600 were performed using Windows 7 with the V7 folding client.  Due to Windows costing money, I used Ubuntu Linux on my AMD system with the V7 folding client.  Linux is a bit more capable of maxing out a PC’s hardware than Windows, so the resulting PPD numbers are likely slightly higher than they would be had the machine been running Windows.  However, the difference is typically small (5 percent or so).  Note that over time, this performance bonus can really add up.  This is why Linux is the preferred operating system for many dedicated Folding at Home users.

AMD Folding Rig - Phenom II X6 Configuration

Test Results

AMD Phenom II X6 1100T Folding at Home Performance and Efficiency

AMD 1100T 6-core CPU pushes the efficiency curve further

As expected, the 6-core 1100T is a performer when it comes to F@H.  Producing just shy of 13,000 Points Per Day with a total system power draw of 185 watts, this setup has an efficiency of 67 PPD/Watt.  This is almost twice that of the older Intel quad-cores.  Note that I am not Intel-bashing here…if you do some Google searching, you will likely see that the new Intel Core i5 and i7 processors do even better in both raw PPD and PPD/W than the AMD 1100T.  The moral of the story is that you should try to set up your folding rig with the most powerful, latest-generation processor you can.  I recommend upgrading at least once a year to keep improving the performance and efficiency of your F@H contributions.  Don’t be that guy running an old-school Athlon X2 generating 300 points per day (while using 150 watts to do it).

Folding at Home CPU Efficiency: Multi-Core Intel Q6600

In the last post, I showed how environmentally unfriendly it is to run just the uniprocessor client.  In this post, I’ll finish off the study about # of CPU cores vs. folding efficiency.  As it turns out, you can virtually double your folding at home efficiency when you double the number of CPU cores you are running with. Using the same Intel Q6600 as before, I told the Folding at Home client to ramp up and use three cores.  Then, once I had some data, I switched it to four-core folding.  With the CPU fully engaged, my computer became a bit slow to use, but that’s not a problem since what we are all about here is dedicated F@H rigs (the only way to fold efficiently is to fold 100%).   If I want to use my computer, I’ll stop the folding to do so, then start it up later.

Here are the results of the 1 through 4 core F@H PPD experiment!

Q6600_Efficiency

As you can see, both performance (PPD) and energy efficiency (technically efficacy in PPD/Watt) scale with the # of CPU cores being used.  Yes, the system does use more total electricity when more cores are engaged (169 watts vs. 142), but the amount of work being done per day has far surpassed the slight increase in power consumption.  In graph form:

Intel Q6600 Folding@Home Points Per Day / Watt Graph

Intel Q6600 Folding at Home Efficiency Graph

In conclusion, it makes the most sense from a performance and efficiency standpoint to use as much of your CPU as you can.  In the next post, I’ll look at a few more powerful CPU-based folding@home systems.

PPD/Watt Shootout: Uniprocessor Client is a Bad Idea

My Gaming / Folding computer with Q6600 / GTX 460 Installed

Since the dawn of Folding@Home, Stanford’s single-threaded CPU client known as “uniprocessor” has been the standard choice for stable folding@home installations.  For people who don’t want to tinker with many settings, and for people who don’t plan on running 24/7, this has been a good choice of clients because it allows a small science contribution to be done without very much hassle.  It’s a fairly invisible program that runs in the background and doesn’t spin up all your computer’s fans and heat up your room.  But, is it really efficient?  

The question, more specifically targeted for folding freaks reading this blog, is this:  Does the uniprocessor client make sense for an efficient 24/7 folding@home rig?  My answer:  a resounding NO!  Kill that process immediately!

A basic Google search on this will show that you can get vastly more points per day running the multicore client (SMP), a dedicated graphics card client (GPU), or both.  Just type “PPD Uniprocessor SMP Folding” into Google and read for about 20 minutes and you’ll get the idea.  I’m too lazy to point to any specific threads (no pun intended), but the various forum discussions reveal that the uniprocessor client is slower than slow.  This should not be surprising.  One CPU core is slower than two, which is slower than three!  Yay, math!

Also, Stanford’s point reward system isn’t linear.  If you return work units twice as fast, you earn more than twice as many points per day, because prompt results are very valuable in the scientific world.  This bonus is known as the Quick Return Bonus, and it is available to users running with a passkey (a long auto-generated password that proves you are who you say you are to Stanford’s servers).  I won’t regurgitate all that info on passkeys and points here, because if you are reading this site then you most likely know it already.  If not, start by downloading Stanford’s latest all-in-one client known as Client V7.  Make sure you set yourself up with a username as well as a passkey, in case you didn’t have one.  Once you return 10 successful work units using your passkey, you can get the extra QRB points.  For the record, this is the setup I am using for this blog at the moment: V7 Client Version 7.3.6, running with passkey.
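
To see why speed pays off so well, here is a small Python sketch of the Quick Return Bonus as it is described in the Folding@home points documentation, to the best of my understanding: the base points are multiplied by the square root of (k × deadline / elapsed time), with a floor of 1. The work unit values below (base points, k, deadline) are hypothetical, picked only to make the arithmetic easy to follow; real projects set their own.

```python
import math


def wu_points(base_points, k, deadline_days, elapsed_days):
    """Points for one work unit, including the quick return bonus."""
    bonus = max(1.0, math.sqrt(k * deadline_days / elapsed_days))
    return base_points * bonus


def points_per_day(base_points, k, deadline_days, elapsed_days):
    """PPD for a machine that keeps finishing identical work units back to back."""
    return wu_points(base_points, k, deadline_days, elapsed_days) / elapsed_days


# Hypothetical work unit: 1000 base points, k = 0.75, 6-day deadline.
slow = points_per_day(1000, 0.75, 6.0, elapsed_days=1.0)
fast = points_per_day(1000, 0.75, 6.0, elapsed_days=0.5)

print(round(fast / slow, 2))  # 2.83
```

In this sketch, halving the completion time multiplies PPD by about 2.83 (2 times the square root of 2): twice as many work units per day, each worth about 41% more points.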

Unlike the older 6.x client interfaces, the new V7 client lets you pick the specific work package type you want to do within one program.  “Uniprocessor” is no longer a separate installation, but is selectable by adding a CPU slot within the V7 client and telling it how many threads to run.  V7 then downloads the correct work unit to munch on.

I thought I was talking efficiency!  Well, to that end, what we want to do is maximize the F@H output relative to the input.  We want to make as many Points per Day while drawing the fewest watts from the wall as possible.  It should be clear by now where this is going (I hope).  Because Stanford’s points system heavily favors the fast return of work units, it is often the case that the PPD/Watt increases as more and more CPU cores or GPU shaders are engaged, even though the resulting power draw of the computer increases.

Limiting ourselves to CPU-only folding for the moment, let’s have a look at what one of my Folding@Home rigs can do.  It’s Specs Time (Yay SPECS!). Here are the specs of my beloved gaming computer, known as Sagitta (outdated picture was up at the top).

  • Intel Q6600 Quad Core CPU @ 2.4 GHz
  • Gigabyte AMD Radeon HD 7870 Gigahertz Edition
  • 8 GB Kingston DDR2-800 Ram
  • Gigabyte 965-P S3 motherboard
  • Seasonic X-650 80+ Gold PSU
  • 2 x 500 GB Western Digital HDDs RAID-1
  • 2 x 120 MM Intake Fans
  • 1 x 120 MM Exhaust Fan
  • 1 x 80 MM Exhaust Fan
  • Arctic Cooling Freezer 7 CPU Cooler
  • Generic PCI Slot centrifugal exhaust fan

Ancient Pic of Sagitta (2006 Vintage). I really need to take a new pic of the current configuration.

You’ll probably say right away that this system, except for the graphics card, is pretty out of date for 2014, but for relative A to B comparisons within the V7 client this doesn’t matter.  For new i7 CPUs, the relative performance and efficiency differences seen by increasing the number of CPU cores for folding reveal the same trend as will be shown here.  I’ll start by just looking at the 1-core option (uniprocessor) vs. a dual-core F@H solve.

Uniprocessor Is Slow

As you can see, switching to a 2-CPU solve within the V7 client yields almost twice the efficiency (12.11 vs. 6.82 PPD/Watt).  And, this isn’t even a fair comparison, because the dual-core work unit I received was one of the older A3 cores, which tend to produce less PPD than the A4 work units.

In conclusion, if everyone who is out there running the uniprocessor client switched to a dual-core client, FOLDING AT HOME WOULD BECOME TWICE AS EFFICIENT!  I can’t scream this loud enough.  Part of the reason is that it doesn’t take many more watts to feed another core in a computer that is already fired up and folding.  In the above example, we got roughly twice the amount of work done for only 13 more watts of power consumed.  THIS IS AWESOME, and it is just the beginning.  In the next article, I’ll look at the efficiency of 3 and 4 CPU Folding on the Q6600, as well as 6-CPU folding on my other computer, which is powered by a newer processor (AMD Phenom II X6 1100T). I’ll then move on to dual-CPU systems (non BIGADV at this point for those of you who know what this means, but we will get there too), and to graphics cards.  If you think 12 PPD/Watt is good, just wait until you read the next article!

Until next time…

-C

Energy Efficient Power Supplies: Part 2

A Seasonic 80+ Gold Modular Power Supply is the Perfect PSU for my Dual Opteron 4184 12-Core Server

The last post gave an overview of why efficiency matters for power supplies. This post is focused on how to pull this off in practice.  The 80+ (80 Plus) certification is an optional certification that power supply makers can get on their retail PSUs by submitting samples for testing at an independent lab. There are various levels of efficiency rankings within the standard, but any unit that achieves the basic 80+ rating can be considered efficient compared to the average 60-70% efficient PSUs of old.

80+ Efficiency Table

For around the clock computer operation, you should get the most efficient unit possible, although the 80+ Platinum and Titanium units can be cost prohibitive.  My recommendation is to stick with an 80+ Gold unit, because they are significantly more efficient than most power supplies and can be obtained without first having to sell a kidney on the black market.  Note that the greatest efficiency can theoretically be achieved by selecting a power supply that has a rated maximum wattage of twice what your computer requires to run F@H full-blast.  For example, if your shiny new F@H rig requires 300 watts of power to run, getting an 80+ Gold PSU rated at 600 watts should guarantee you an excellent efficiency rating of 90%.  This is because power supplies tend to be most efficient at 50% of their rated maximum load.
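
The sizing rule and the payoff from a more efficient unit are both simple arithmetic, sketched below in Python. Two assumptions are baked in: the 300 watts in the example is treated as the load the components draw from the PSU (not the wall reading), and the 70% figure for an older supply is taken from the top of the 60-70% range mentioned earlier. The function names are illustrative only.

```python
def recommended_psu_rating(dc_load_watts, target_load_fraction=0.5):
    """PSU rating that puts the given load at the target fraction of maximum."""
    return dc_load_watts / target_load_fraction


def wall_watts(dc_load_watts, efficiency):
    """Power drawn from the wall to deliver a given load at a given efficiency."""
    return dc_load_watts / efficiency


load = 300  # watts delivered to the components (assumed interpretation)

print(recommended_psu_rating(load))  # 600.0

gold = wall_watts(load, 0.90)  # 80+ Gold near 50% load
old = wall_watts(load, 0.70)   # older, roughly 70% efficient unit
print(round(old - gold))       # 95
```

Under these assumptions, the older supply pulls about 95 more watts from the wall to do exactly the same work, around the clock.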

For many power supplies you can find an efficiency curve that graphs out the unit’s efficiency vs. load, but to save yourself valuable time you might as well just buy a reputable power supply from a good manufacturer that has the 80+ Gold certification.  As with any computer part, read the user reviews before purchase to avoid a serious frowny face later.  JonnyGuru.com has some excellent power supply reviews, and they test their review samples in a much more grueling temperature environment than the 80+ standard requires. When buying from Newegg, just filter your PSU search by efficiency rating and then by user reviews to immediately find some good candidates.  My personal favorite is the Seasonic X-series of Gold-rated PSUs, although Antec, PC Power & Cooling, Thermaltake, Cooler Master, Corsair, and many others also make good units.  I have been using the Seasonic X-650 Gold, which is a great power supply for a bunch of reasons other than efficiency (modular cables, multiple PCI Express power connections, a smart fan, the latest ATX standard, great build quality, and so on until I’m blue in the face).  The Seasonic has reduced my desktop’s power consumption by over 32 watts at idle and 49 watts at load, compared to the Ultra X2 connect 500 watt PSU I had before.  I pitched the old one into the computer recycling bin at the local transfer station to make sure it stays out of service.  It made a nice sounding kerthunk, by the way.  (Random environmental tip: Most city dumps take computer electronics for recycling for free, so take your old wasteful power supply as well as any of those nasty compact fluorescent mercury-ridden light bulbs to the dump for recycling instead of throwing them in the trash.)