Folding@Home on GeForce RTX 3090 Review

Hi everyone, sorry for the delay in blog posts. Electricity in Connecticut has been so expensive lately that, aside from our winter heating Folding@Home cluster, it wasn’t affordable to keep all those GPUs running (even with our solar panels, which is really saying something). However, I did manage to get some good data on the top-tier Nvidia RTX 3090, which I got during COVID as the GPU in a prebuilt HP Omen gaming desktop. I transplanted the 3090 into my benchmark desktop, so these stats are comparable to previous cards I’ve tested.

Wait, what are we doing here?

For those just joining, this is a blog about optimizing computers for energy efficiency. I’m running Folding@Home, a distributed computing research project that uses your computer to help fight diseases such as cancer, COVID-19, and a host of other ailments. For more information, check out the project website here: https://foldingathome.org/

Look at this bad boy!

This is the HP OEM version of an RTX 3090. I was impressed that it had lots of copper heat pipes and a metal back plate. Overall this was a very solid card for an OEM offering.

HP OEM Nvidia RTX 3090 installed in my AMD Ryzen 9 3950X benchmark desktop

At the time of my testing, the RTX 3090 was the top-tier card from Nvidia’s new Ampere line. They have since released the 3090 Ti, which is ever so slightly faster. To give you an idea of where the RTX 3090 stacks up compared to the previous cards I have tested, here is a table. Note that 350 watt TDP! That is a lot of power for this air cooler to dissipate.

The Test

I ran Folding@Home on my benchmark desktop in Windows 10, using Folding@Home client 7.6.13. I was immediately blown away by the insane Points Per Day (PPD) that the 3090 can spit out! Here’s a screen shot of the client, where the card was doing a very impressive 6.4 million PPD!

What was really interesting about the 3090 though was how much variation there was in performance depending on the size of the molecule being worked on. Very large molecules with high atom counts benefited greatly from the number of CUDA cores on this card, and it kicked butt in both raw performance (PPD) and efficiency (PPD/Watt). Smaller molecules, however, did not fully utilize this card’s impressive potential. This resulted in less efficiency and more wasted power. I would assume that running two smaller Ampere cards, for example 3080s, on small models would be more efficient than using the 3090, but I don’t have any 3080s to test that assumption (yet!).

In the plots below, you can see that the smaller model (89k atoms) resulted in a peak PPD of about 4 million, as opposed to the 7 million PPD with a 312k atom model. PPD/Watt at 100% card power was also less efficient for the smaller model, coming in at about 10,000 PPD/Watt vs. 16,500 PPD/Watt for the larger model. These are still great efficiency numbers, which shows how far GPU computing has come since previous generations.

Reduce GPU TDP Power Target to Improve Efficiency

I’ve previously shown how GPUs are set up for maximum performance out of the box, which makes sense for video gaming. However, if you are trying to maximize the energy efficiency of your computational machines, reducing the power target of the GPU can result in massive efficiency gains. The GeForce RTX 3090 is a great example of this. When solving large models, this beast of a card benefits from throttling the power down, gaining 2.35% improved energy efficiency with the power target set to 85%. The huge improvement, however, comes when solving smaller models. When running the 89k atom work unit, I got a whopping 29% efficiency improvement by setting the power target to 55%, with only a 14% performance reduction! Since the F@H project gives out a lot of smaller work units in addition to some larger ones, I chose to run my machine at a 75% power target. On average, this splits the difference, and gives a noticeable efficiency improvement without sacrificing too much raw PPD performance. In the RTX 3090’s case, a 75% power target massively reduced the power draw of the computer (wall consumption dropped from 434 to 360 watts), as well as the heat and noise coming out of the chassis. This promotes a happier office environment and a happier computer that will last longer!
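As an aside, the power target is easy to script if you ever want to automate this. The following is a minimal sketch (my own convenience wrapper, not part of any Folding@Home tooling) that uses Nvidia’s stock nvidia-smi utility, which accepts a power limit in absolute watts and requires administrator/root privileges:

```python
import subprocess

BOARD_POWER_W = 350  # the RTX 3090's 100% power target

def set_power_target(percent: int) -> None:
    """Set the GPU power limit to a percentage of board power.

    Uses nvidia-smi's power-limit flag (-pl), which takes watts.
    """
    watts = round(BOARD_POWER_W * percent / 100)
    subprocess.run(["nvidia-smi", "-pl", str(watts)], check=True)

set_power_target(75)  # the 75% target I settled on (~262 watts)
```

For the plots in this article, I set the equivalent percentage directly in the Nvidia driver software on Windows.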

Tuning Results: 89K Atoms (Small Model)

Here are the tuning plots for a smaller molecule. In all cases, the X-axis is the power target, set in the Nvidia Driver. 100% corresponds to 350 Watts in the case of the RTX 3090.

Tuning Results: 312K Atoms (Large Model)

And here are the tuning results for a larger molecule.

Overall Results

Here are the results compared to the previous hardware configurations I have tested. Now that the F@H client supports enabling CUDA, I did some tests with CUDA on vs. off on the RTX 2080 Ti and the 3090. Pro Tip: MAKE SURE CUDA IS ON! It really speeds things up and also improves energy efficiency.

The key takeaway from the plots below is that the 3090 offers 50% more performance (PPD) than the 2080 Ti, and is almost 30% more energy efficient while doing it! Note that this does not mean this card sips power…it actually uses more watts than any of the other cards I’ve tested. However, it does a lot more computation with those watts, so it is putting the electricity to better use. Thus, a data center or workstation can get through more work in a shorter amount of time with 3090s vs. other cards, and thus use less power overall to solve a given amount of work. This is better for the environment!

Nvidia RTX 3090 Folding@Home Performance (green bars) compared to other hardware configurations
Nvidia RTX 3090 Folding@Home Total System Power Consumption (green bars) compared to other hardware configurations
Nvidia RTX 3090 Folding@Home Energy Efficiency (green bars) compared to other hardware configurations.

Conclusion

The flagship Ampere architecture Nvidia GeForce RTX 3090 is an excellent card for compute applications. It does draw a ton of power, but this can be mitigated by reducing the power target in the driver to gain efficiency and reduce heat and noise. In the case of Folding@Home disease research, this card is a step change in both performance and energy efficiency, offering 50% more compute power and 30% more efficiency than the previous generation. I look forward to testing out other Ampere cards, as well as the new 40xx “Lovelace” architecture, if Eversource ever drops the electric rate back to normal levels in CT.

AMD Ryzen 9 3950x Part 4: Full Throttle Folding with CPB Overclocking and SMT

This is part four of my Folding@Home review for AMD’s top-tier desktop processor, the Ryzen 9 3950x 16-core CPU. Up until recently, this was AMD’s absolute beast-mode gaming and content creation desktop processor. If you happen to have one, or are looking for a good CPU to fight COVID and Cancer with, you’ve come to the right place.

Folding@Home is a distributed computing project where users can donate computational runtime on their home computers to fight diseases like Cancer, Alzheimer’s, Mad-Cow, and many others. For better or for worse, COVID-19 caused an explosion of F@H popularity, because the project was retooled to focus on understanding the coronavirus molecule to aid researchers in developing ways to fight it. This increase in users caused Folding@Home to become (once again) the most powerful supercomputer in the world. Of course, this comes at a cost: namely, electricity. Most of my articles to date have focused on GPU folding. However, the point of this series of articles is to investigate how someone running CPU folding can optimize their settings to do the most work for the least amount of power, thus reducing their power bill and reducing the environmental impact of all this computing.

In the last part of this review, I investigated the differences seen between running Folding@Home with SMT (also known as Hyperthreading) on and off. The conclusion from that review was that performance does scale with virtual cores, and that the best performance and energy efficiency are seen with 30 or 32 threads enabled on the CPU folding slot.

The previous testing was all performed with Core Performance Boost off. CPB is the AMD equivalent of Intel’s Turbo Boost, which is basically automatic, dynamic overclocking of the processor (both CPU frequency and voltage) based on the load on the chip. Keeping CPB turned off in previous testing resulted in all tests being run with the CPU frequency at the base 3.5 GHz.

In this final article, I enabled CPB to allow the Ryzen 9 3950x to scale its frequency and voltage based on the load and the available thermal and power headroom. Note that for this test, I used the default AMD settings in the BIOS of my Asus Prime X570-P motherboard, which is to say I did not enable Precision Boost Overdrive or any other setting to increase the automatic overclocking beyond the default power and thermal limits.

Test Setup

As with the other parts of this review, I used my new Folding@Home benchmark machine, which was previously described in this post. The only tweaks to the computer since that post was written were the swap-outs of a few 120mm fans for different models to improve cooling and noise. I also eliminated the 80 mm side intake fan, since all it did was disrupt the front-to-back airflow around the CPU and didn’t make any noticeable difference in temperatures. All of these cooling changes made less than a 2 watt difference in the machine’s idle power draw (almost unmeasurable), so I’m not going to worry about correcting the comparison plots.

Because it’s been a while since I wrote about this, I figured I’d recap a few things from the previous posts. The current configuration of the machine is:

  • Case: Raidmax Sagitta
  • Power Supply: Seasonic Prime 750 Watt Titanium
  • Intake Cooling: 2 x 120mm fan (front)
  • Exhaust Cooling: 1 x 120 mm (rear) + PSU exhaust (top)
  • CPU Cooler: Noctua NH-D15 SE AM4
  • CPU: AMD Ryzen 9 3950x
  • Motherboard: Asus Prime X570-P
  • Memory: 32 GB Corsair Vengeance LPX DDR4 3600 MHz
  • GPU: Zotac Nvidia GeForce GTX 1650 installed for CPU testing
  • OS Drive: Samsung 970 Evo Plus 512 GB NVME SSD
  • Storage Drive #1: Samsung 860 EVO 2TB SSD
  • Storage Drive #2: Western Digital Blue 128 GB NVME SSD
  • Optical Drive: Samsung SH-B123L Blu-Ray Drive
  • Operating System: Windows 10 Home

The Folding@Home software client used was version 7.6.13.

Test Methodology

The point of this testing is to identify the best settings for performance and energy efficiency when running Folding@Home on the Ryzen 3950x 16-core processor. To do this, I set the # of threads to a specific value between 1 and 32 and ran five work units. For each work unit, I recorded the instantaneous points per day (PPD) as reported in the client, as well as the power consumption of the machine as reported by my P3 Kill A Watt meter. I repeated this for each of the 32 thread settings, for a total of 160 tests. By running 5 tests at each nCPU setting, some of the work unit variability can be averaged out.

The Number of CPU threads can be set by editing the slot configuration

Folding@Home Performance: Ryzen 9 3950X

Folding@Home performance is measured in Points Per Day (PPD). This is the number that most people running the project are most interested in, as generating lots of PPD means your machine is doing a lot of good science to aid the researchers in their fight against diseases. The following plot shows the trend of Points Per Day vs. # of CPU threads engaged. The average work unit variation came out to around 12%…this results in a pretty significant spread in performance between different work units at higher thread counts. As in the previous testing, I plotted a pair of boundary lines to capture the 95% confidence interval, meaning that, assuming a Gaussian distribution of data points, 95% of the work units will fall within this boundary region.
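For the curious, each point and the band around it are computed from the five raw samples like this (a minimal sketch with made-up PPD values, not actual test data):

```python
import numpy as np

# Five instantaneous PPD readings at one thread-count setting
samples = [310e3, 290e3, 335e3, 305e3, 280e3]

mean = np.mean(samples)
sigma = np.std(samples, ddof=1)  # sample standard deviation

# Assuming a Gaussian distribution, ~95% of work units should land
# within two standard deviations of the mean
lower, upper = mean - 2 * sigma, mean + 2 * sigma
print(f"Average: {mean:,.0f} PPD, 95% band: [{lower:,.0f}, {upper:,.0f}]")
```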

AMD Ryzen 9 3950X Folding@Home Performance: Core Performance Boost and Simultaneous Multi-Threading Enabled

As can be seen in the above plot, in general, the Folding@Home client’s Points Per Day production increases with increasing thread count. As with the previous results, the initial performance improvement is fairly linear, but once the physical number of CPU cores is exceeded (16 in this case), the performance improvement drops off, only ramping up again when the thread settings get into the mid-20s. This is really strange behavior. I suspect it has something to do with how Windows 10 schedules logical process threads onto physical CPU cores, but more investigation is needed.

One thing that is different about this test is that the Folding@Home consortium started releasing new work units based on the A8 core. These work units support the AVX2_256 instruction set, which allows some mathematical operations to be performed more efficiently on processors that support AVX2 (specifically, an add operation and a multiply operation can be performed at the same time). As you can see, the Core A8 work units, denoted by purple dots, fall far above the average performance and the 95% confidence interval lines. Although it is awesome that the Folding@Home developers are constantly improving the software to take advantage of improved hardware and programming techniques, this influx of fancy work units really slowed my testing down! There were entire days when all I would get were core A8 units, when I really needed core A7 units to compare to my previous testing. Sigh…such is the price of progress. Anyway, these work units were excluded from the 5-work unit averages composing each data point, since I want to be able to compare the average performance line to previous testing, which did not include these new work units.

As noted in my previous posts, some settings of the # of CPU threads result in the client defaulting to a lower thread count to prevent numerical problems that can arise for certain mathematical operations. For reference, the equivalent thread settings are shown in the table below:

Equivalent Thread Settings:

The Folding@Home Client Adjusts the Thread Count to Avoid Numerical Problems Arising with Prime Numbers and Multiples Thereof…

Folding@Home Power Consumption

Here is a much simpler plot. This is simply the power consumption as reported by my P3 Kill A Watt meter at the wall. This is total system power consumption. As expected, it increases with increasing core count. Since the instantaneous power the computer is using wobbles around a bit as the machine is working, I consider this to be an “eyeball averaged” plot, with an accuracy of about 5 watts.

AMD Ryzen 9 3950X Folding@Home Power Consumption: Core Performance Boost and Simultaneous Multi-Threading Enabled

As can be seen in the above plot, something interesting starts happening at higher thread counts: namely, the power consumption plateaus. This wasn’t seen in previous testing with Core Performance Boost set to off. Essentially, with CPB on, the machine is auto-overclocking itself within the factory defined thermal and power consumption limits. Eventually, with enough cores being engaged, a limit is reached.

Investigating what is happening with AMD’s Ryzen Master software is pretty enlightening. For example, consider the following three screen shots, taken during testing with 2, 6, and 16 threads engaged:

2 Thread Solve

AMD Ryzen Master: Folding@Home CPU Folding, 2 Threads Engaged

6 Thread Solve

AMD Ryzen Master: Folding@Home CPU Folding, 6 Threads Engaged

16 Thread Solve

AMD Ryzen Master: Folding@Home CPU Folding, 16 Threads Engaged

First off, please notice that the temperature limit (first little dial indicator) is never hit during any test condition, thanks to the crazy cooling of the Noctua NH-D15 SE. Thus, we don’t have to worry about an insufficient thermal solution marring the test results.

Next, have a look at the second and third dial indicators. For the 2-thread solve, the peak CPU speed is a blistering 4277 MHz! This is a factory overclock of 22% over the Ryzen 9 3950x’s base clock of 3500 MHz. This is Core Performance Boost in action! At this setting, with only 2 CPU cores engaged, the total package power (PPT) is showing 58% use, which means that there is plenty of electrical headroom to add more CPU cores. For the 6-thread solve, the peak CPU speed has come down a bit to 4210 MHz, and the PPT has risen to 79% of the rated 142 watt maximum. What’s happening is the extra CPU cores are using more power, and the CPU is throttling those cores back a bit to keep everything stable. Still, there is plenty of headroom.

That story changes when you look at the screenshot for the 16-thread solve. Here, the peak clock rate has decreased to 4103 MHz and the total package power has hit the limit at 142 watts (a good deal beyond the 105 watt TDP of the 3950X!). This means that Core Performance Boost has pushed the clocks and voltage as high as can be allowed under its default auto-overclocking limits. This power limit on the CPU is the reason the system’s wall power consumption plateaus at 208 watts.

If you’re wondering what makes up the difference between the 208 watts reported by my watt meter and the 142 watts reported by Ryzen Master, the answer is the rest of the system besides the CPU socket. In other words, the motherboard, memory, video card, fans, hard drives, optical drive, and the power supply’s efficiency.

Just for fun, here is the Ryzen Master screenshot for the full 32-thread solve!

AMD Ryzen Master: Folding@Home CPU Folding, 32 Threads Engaged

Here, we have an all-core peak frequency of 3855 MHz. Interestingly, the CPU temp and PPT have decreased slightly from the 16-thread solve, even though the processor is theoretically working harder. What’s happening here is yet another limit has been reached. Look at the 6th dial indicator, labeled ‘TDC’. This is a measure of the instantaneous peak current, in amperes, being applied to the CPU. Apparently with 32 threads, this peak current limit of 95 amps is getting hit, so clock speed and voltage are reduced, resulting in a lower average socket power (PPT) than the 16-thread solve.

Folding@Home Efficiency

Now for my favorite plot…Efficiency! Here, I am taking the average performance in PPD (excluding the newfangled A8 work units for now) and dividing it by the system’s wall power consumption. This provides a measure of how much work per unit of power (PPD/Watt) the computer is doing.
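The math really is that simple. As a quick sketch, using round numbers in the ballpark of my earlier CPB-off testing (roughly 400K PPD at a bit under 160 watts at the wall):

```python
def efficiency(ppd: float, wall_watts: float) -> float:
    """Folding@Home energy efficiency: points per day per watt."""
    return ppd / wall_watts

# ~400K PPD at ~158 W works out to roughly 2,500 PPD/Watt
print(f"{efficiency(400_000, 158):,.0f} PPD/Watt")
```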

AMD Ryzen 9 3950X Folding@Home Efficiency: Core Performance Boost and Simultaneous Multi-Threading Enabled

This plot looks fairly similar to the performance plot. In general, throwing more CPU threads at the problem lets the computer do more work in a unit of time. Although higher thread counts consume more power than lower thread counts, the additional power use is offset by the massive amount of extra computational work being done. In short, efficiency improves as thread count increases.

There is a noticeable dent in the curve, however, from 15 to 23 threads. This is the interesting region where things get weird. As I mentioned before, I think what might be happening is some oddity in how Windows 10 schedules jobs once the physical number of CPU cores has been exceeded. I’m not 100% sure, but what I think Windows is doing is juggling the threads around to keep a few physical CPU cores free (basically, it’s putting two threads on one CPU core, i.e. utilizing SMT, even when it doesn’t have to, in order to keep some CPU cores available for other tasks, such as using Windows). It isn’t until we get over 24 threads that Windows decides we are serious about running all these jobs, and reluctantly schedules the jobs out for pure performance.

I do have some evidence to back up this theory. Investigating what is going on with Ryzen Master with Folding@Home set to 20 threads is pretty telling.

AMD Ryzen Master: Folding@Home CPU Folding, 20 Threads Engaged

Since 20 threads exceeds the 16-core capacity of the processor, one would think all 16 cores would be spun up to max in order to get through this work as fast as possible. However, that is not the case. Only 12 cores are clocked up. Now, if you consider SMT, these 12 cores can handle 24 threads of computation. So, virtual cores are being used as well as physical cores to handle this 20-thread job. This obviously isn’t ideal from a performance or an efficiency standpoint, but it makes sense considering what Windows 10 is: a user’s operating system, not a high performance computing operating system. By keeping some physical CPU cores free when it can, Windows is hoping to ensure a smooth computing experience for users.

Comparison to Previous Results

The above plots are fun and all, but the real juice is the comparison to the previous results. As a reminder, these were covered in detail in these posts:

SMT On, CPB Off

SMT Off, CPB Off

Performance Comparison

In the previous parts of this article, the difference between SMT (aka Hyperthreading) being on or off was shown to be negligible on the Ryzen 9 3950x in the physical core region (thread count = 16 or less). The major advantage of SMT was that it allowed more solver threads to be piled on, which eventually results in increased performance and efficiency for thread counts above 25. In the plot below, the third curve basically shows the effect of overclocking. In this case, Core Performance Boost, AMD’s auto-overclocking routine, provides a fairly uniform 10-20 percent improvement. This diminishes for high thread count settings though, becoming a nominal 5% improvement above 28 threads. It should be noted that the effects of work unit to work unit variation are still apparent, even with five samples averaged per test case, so don’t try to draw any specific conclusions at any one thread count. Rather, just consider the overall trend.

AMD Ryzen 9 3950X Folding@Home Performance Comparison: Various Settings

Power Comparison

The power consumption plot shows a MASSIVE difference between the wall power used for the CPB testing vs. the other two tests. This shouldn’t come as a surprise. Overclocking a processor’s frequency requires more voltage. Power is average voltage times average current, so for a given current draw at the CPU socket, an increase in voltage increases the power consumed. This is compounded by the higher clock frequency, which means more transistor switching events occur in a given unit of time, each of which dissipates energy.
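For reference, the standard first-order model for CMOS dynamic power captures both effects (this is textbook circuit theory, not something I measured on this machine):

$$P_{\text{dynamic}} \approx \alpha \, C \, V^2 \, f$$

Here, $\alpha$ is the fraction of transistors switching in a given cycle, $C$ is the switched capacitance, $V$ is the core voltage, and $f$ is the clock frequency. Because reaching a higher $f$ generally requires raising $V$ as well, power consumption grows much faster than linearly with clock speed.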

In short, we are looking at a very noticeable increase in your electric bill to run Folding@Home on an overclocked machine.

AMD Ryzen 9 3950X Folding@Home Power Comparison: Various Settings

Efficiency Comparison

Efficiency is the whole point of this article and this blog, so behold! I’ve shown in previous articles on both CPUs and GPUs that overclocking typically hurts efficiency (and conversely, that underclocking and undervolting improve efficiency). The story doesn’t change with factory automatic overclocking routines like CPB. In the plot below, there is a very strong case for disabling Core Performance Boost, since the machine is up to 25% less efficient with it enabled.

AMD Ryzen 9 3950X Folding@Home Efficiency Comparison: Various Settings

Conclusion

The Ryzen 9 3950x is a very good processor for fighting disease with Folding@Home. The high core count produces exceptional efficiency numbers for a CPU, with a setting of 30 threads being ideal. Leaving 2 threads free for the rest of Windows 10 doesn’t seem to hurt performance or efficiency too much. Given the work unit variation, I’d say that 30 and 32 threads produce the same result on this processor.

As far as optimum settings, to get the most bang for electrical buck (i.e. efficiency), running that 30-thread CPU slot requires SMT to be enabled. Disabling CPB, which is on by default, results in a massive efficiency improvement by cutting over 50 watts off the power consumption. For a dedicated folding computer running 24/7, shaving that 50 watts off the electric bill would save 438 kWh/year of energy. In my state, that would save me $83 annually, and it would also save about 112 lbs of CO2 from being released into the atmosphere. Imagine the environmental impact if the 100,000+ computers running Folding@Home could each reduce their power consumption by 50 watts by just changing a setting!
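If you want to check my math, here it is (the kWh figure follows directly from the 50 watt savings; the dollar and CO2 rates are the implied local values, back-solved from my numbers rather than quoted from a source):

```python
watts_saved = 50
kwh_saved = watts_saved * 24 * 365 / 1000    # 24/7 operation
print(f"{kwh_saved:.0f} kWh/year")           # 438 kWh/year

# Implied local rates, back-solved from the figures above
print(f"${83 / kwh_saved:.2f} per kWh")          # ~$0.19/kWh electricity
print(f"{112 / kwh_saved:.2f} lbs CO2 per kWh")  # ~0.26 lbs CO2/kWh grid mix
```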

Future Work

If there is one thing to be said about overclocking a Ryzen 3xxx-series processor, it’s that the possibilities are endless. A downside to disabling CPB is that if you aren’t folding all the time, your processor will be locked at its base clock rate, and thus your single-threaded performance will suffer. This is where things like PBO come in. PBO = Precision Boost Overdrive. This is yet another layer on top of CPB to fine-tune the overclocking while allowing the system to run in automatic mode (thus adapting to the loads that the computer sees). Typically, people use PBO to let the system sustain higher clock rates than standard CPB would allow. However, PBO also allows a user to enter in power, thermal, and voltage targets. Theoretically, it should be possible to set up the system to allow frequency scaling for low CPU core counts but to pull down the power limit for high core-counts, thus giving a boost to lightly threaded jobs while maintaining high core count efficiency. This is something I plan to investigate, although getting comparable results to this set of plots is going to be hard due to the prevalence of the new AVX2-enabled work units.

Maybe I’ll just have to do it all over again with the new work units? Sigh…

AMD Ryzen 9 3950X Folding@Home Review: Part 3: SMT (Hyperthreading)

Hi all. In my last post, I showed that the AMD Ryzen 9 3950x is quite a good processor for fighting diseases like Cancer, Alzheimer’s, and COVID-19. Folding@Home, the distributed computing project helping researchers understand various diseases, definitely makes good use of the 16 cores / 32 threads on the 3950x.

In this article, I’m taking a look at how virtualized CPU cores (Simultaneous Multithreading in AMD speak or Hyperthreading for you Intel fans) helps computational performance and efficiency when running Folding@Home on a high-end CPU such as the Ryzen 9 3950x.

Instead of regurgitating all of the previous information, here are some links to bring you up to speed if you haven’t read the previous posts.

Socket AM4 Benchmark Machine

AMD Ryzen 9 3950x Review: Part 1 (Overview)

AMD Ryzen 9 3950X Review: Part 2 (Average Results vs. # of Threads)

Test Setup

For this test, I used the same settings as in Part 2, except that I disabled SMT in the BIOS on my motherboard. Thus, Windows 10 will only see the 16 physical CPU cores, and will not be able to run two logical threads per CPU core. As before, I ran all testing using Folding@Home’s V7 client. I set the CPU slot configuration for a thread value of 1-16. At each setting, I ran five work units and averaged the results. Note that AMD’s Core Performance Boost was turned off for all tests, so at all times the processor ran at 3.5 GHz.

Performance

As expected, as you throw more CPU cores at a problem, the computer can chew through the math faster. Thus, more science gets done in a given amount of time. In the case of Folding@Home, this performance is rated in terms of Points Per Day (PPD). The following plot shows the increase in computational performance as a function of # of threads utilized by the solver. Unlike in my previous testing on the 3950x, here an increase of 1 thread corresponds to an increase of 1 engaged CPU core, since virtual threads (SMT / Hyperthreading) are disabled.

The plot below includes the individual samples at each data point as light gray dots, as well as a + / – 2 sigma (95%) confidence interval. This means that 95% of the results for a given thread setting are statistically predicted to fall within the dashed lines.

AMD Ryzen 9 3950x Performance SMT Off

As a side note, certain settings of thread count actually result in the exact same performance, because the Folding@Home client is internally using a different number than the specified value. For example, setting the CPU slot to 5 threads will still result in a 4-thread solve, because the solver is avoiding the numerical issues that occur when trying to stitch the solution together with 5 threads (5 is a tricky prime number to work with numerically). I noted these regions on the plot. If you would like more detail about this, please read the previous part of this review (part 2).

One interesting observation is that the maximum performance occurs with 15 CPU cores enabled, not the complete 16! This is somewhat similar to what was observed in Part 2 of this review (SMT enabled), where 30 threads provided slightly more points than 32 threads. More on that in a moment…

Power Consumption

Using my P3 Kill A Watt Power Meter, I measured the power consumption of the entire computer at the wall. As expected, as you increase the number of CPUs engaged, the instantaneous power consumption goes up. The power numbers reported here are averaged by “the eyeball method”, since the actual instantaneous power goes up and down by a few watts as the computer does its thing. I’d estimate that these numbers are accurate within 5 watts.

AMD Ryzen 9 3950x Power Consumption SMT Off

Efficiency

The ultimate goal of this blog is to find the most efficient settings for computer hardware, so that we can do the most scientific research for a given amount of power consumption. Thus, this next plot is just performance (in PPD) divided by power consumption (in watts). I left off all the work unit variation and confidence interval lines, since it looks about the same as the performance plot, and it’s cleaner with just the one average line.

AMD Ryzen 9 3950x Efficiency SMT Off

As with performance, setting Folding@Home to use 15 CPUs instead of the full 16 is surprisingly the best option for efficiency. The difference is pretty profound here: the processor used more power at 16 threads than at 15, while producing fewer points.

Comparison to Hyperthreaded Results

To get a better idea of what’s going on, here are the same three plots again with the average results overlaid on the previous results from when SMT was enabled. Of course the SMT results go up to 32 threads, since with virtual cores enabled, the 16-core Ryzen 9 3950x can support 32 total threads.

AMD Ryzen 9 3950x Performance SMT Off vs On

AMD Ryzen 9 3950X Performance: SMT Study

AMD Ryzen 9 3950x Power SMT Off vs On

AMD Ryzen 9 3950X Power Consumption: SMT Study

AMD Ryzen 9 3950x Efficiency SMT Off vs On

AMD Ryzen 9 3950X Efficiency: SMT Study

Conclusion

Disabling SMT (aka Hyperthreading) essentially limits the Ryzen 9 3950x to a maximum thread count of 16 (one thread per physical core). The results from 1-16 threads are very similar to those results obtained with SMT enabled. Due to work unit variation, the performance and efficiency plots show what I would say is effectively the same result with SMT on vs. off, up to 16 threads. One thing to note was that the power consumption in the 12-16 thread range did trend higher for the SMT off case, although the offset was small (about 5-10 watts). This is likely due to Windows scheduling work to a new physical core to handle the higher thread count when SMT is disabled, as opposed to virtualizing the work onto an already-running core using SMT. Ultimately, this slightly higher power consumption didn’t have a noticeable effect on the efficiency plot.

The big takeaway is that for thread counts above 16 (the physical core count), the Ryzen 9 3950x can utilize thread virtualization very well. The logical processors that Windows sees don’t work quite as well as true physical cores (hence the decrease in slope on the performance and efficiency plots above 16 CPUs). However, when the thread count is doubled, SMT still allows the processor to eke out an extra 100K PPD (about 33% more) and run more efficiently than when it is limited to scheduling work on physical CPUs.

Pro Tip #1: Turn on Hyperthreading / SMT and run with high core counts to get the most out of Folding@Home!

The final observation worth noting is that in both cases, setting the F@H client to use the maximum available number of threads (16 for SMT off, 32 for SMT on) is not the fastest or most efficient setting. Backing the physical core count down to 15 (and, similarly, the SMT core count down to 30) results in the fastest and most efficient solver performance.

My theory is that by leaving one physical core free (one physical core = 2 threads with SMT on), the computer has enough spare capacity to run all the crap that Windows 10 does in the background. Thus, there is less competition for CPU resources, and everything just works better. The computer is also easier to use for other tasks when you don’t fully max out the CPU core count. This is also especially valuable for those people also trying to fold on a GPU while CPU folding (more on that in the next article).

Pro Tip #2: For high core count CPUs, don’t fold at 100% of your processor’s core capacity. Go right to the limit, and then back it off by a core.

Since you’re using SMT / Hyperthreading due to Pro Tip #1, this means setting the CPUs box in the client to 2 less than the maximum allowed. On my 16-core, 32-thread Ryzen 9 3950x, this means CPUs = 32 (theoretical max) – 2 (2 threads per core) = 30
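Written out as a rule of thumb (just a sketch of the heuristic from the two pro tips, not an official Folding@Home recommendation):

```python
def recommended_cpus(physical_cores: int, smt_enabled: bool) -> int:
    """Back off by one physical core to leave headroom for the OS.

    With SMT on, one physical core is two threads, so subtract two.
    """
    max_threads = physical_cores * (2 if smt_enabled else 1)
    return max_threads - (2 if smt_enabled else 1)

print(recommended_cpus(16, smt_enabled=True))   # 30 (3950x, SMT on)
print(recommended_cpus(16, smt_enabled=False))  # 15 (3950x, SMT off)
```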

CPU Slot Config

This result will be different on CPUs with different numbers of cores, so YMMV…I always recommend testing out your individual processor. For lower core count processors such as Intel’s quad core Q6600, running with the maximum number of cores offers the best performance. I previously showed this here.

Future Work

In the next article, I’m going to kick off folding on the GPU, an Nvidia GeForce GTX 1650, which I previously tested by its lonesome here. In a CPU + GPU folding configuration, it’s important to make sure the CPU has enough resources free to “feed” the GPU, or else points will suffer.

I’ve also started re-running the thread tests with Core Performance Boost enabled. This allows the processor to scale up in frequency automatically based on the power and thermal headroom. This should significantly change the character of the SMT On and SMT Off plots, since everything up till now has been run at the stock speed of 3.5 GHz.

Support My Blog (please!)

If you are interested in measuring the power consumption of your own computer (or any device), please consider purchasing a P3 Kill A Watt Power Meter from Amazon. You’ll be surprised what a $35 investment in a watt meter can tell you about your home’s power usage, and if you make a few changes based on what you learn you will save money every year! Using this link won’t cost you anything extra, but will provide me with a small percentage of the sale to support the site hosting fees of GreenFolding@Home.

If you enjoyed this article, perhaps you are in the market for an AMD Ryzen 9 3950x or similar Ryzen processor. If so, please consider using one of the links below to buy one from Amazon. Thanks for reading!

AMD Ryzen 9 3950x Direct Link

AMD Ryzen (Amazon Search)

AMD Ryzen 9 3950X Folding@Home Review: Part 2: Averaging, Efficiency, and Variation

Welcome back everyone! In my last post, I used my rebuilt benchmark machine to revisit CPU folding on my AMD Ryzen 9 3950x 16-core processor. This article is a follow-on. As promised, it includes the companion power consumption and efficiency plots for thread settings of 1-32. As a quick reminder, I did this test with multi-threading (SMT) on, but with Core Performance Boost disabled, so all cores are running at the base 3.5 GHz setting.

Performance

The Folding@Home distributed computing project has come a long way from its humble disease-fighting beginnings back in 2000. The purpose of this testing is to see just how well the V7 CPU client scales on a modern, high core-count processor. With all the new Folding@Home donors coming onboard to fight COVID, having some insight into how to set up the configuration for the most performance is hopefully helpful.

For this test, I simply set the # of threads the client can use to a value and ran five sequential work units. I averaged the performance (Points Per Day), but I also plotted the individual work unit performance values to give you a sense of the variation. Since the Ryzen 9 3950x supports 32 threads, I essentially ran 160 tests. Since I wanted the Folding@Home Consortium to get useful data in their fight against COVID-19, I let each work unit run to completion, even though I only needed them to run to about 10-20% complete to get an accurate PPD estimate from the client.

So, without further blabbing on my part, here is the graph of Folding@Home performance vs. thread count in Windows 10 on the Ryzen 9 3950x:

Ryzen_3950x_Performance_SMT_Off_CPB_On

Here, the solid blue line is the averaged performance, and the gray circles are the individual tests. The dashed blue lines represent a statistical 95% confidence interval, which is computed based on the variation. The Points Per Day (PPD) of a work unit run on the 3950x is expected to fall within this band 95% of the time.

My first observation is, holy crap! This is a fast processor. Some work units at high thread counts get really close to 500K PPD, which for me has only been achievable by GPU folding up to this point.

My second observation is that there is a lot of variation between different work units. This makes sense, because some work units have much larger molecules to solve than others. In my testing, I found the average variation of all 160 tests to be 12.78%, with individual variance up to 25%.

My third observation is that there seem to be two different regions on this plot. For the first half, the thread count setting is less than the number of physical cores on the chip, and the results are fairly linear. For the second half, the thread count setting is higher than the number of physical cores on the chip (thus forcing the CPU to virtualize those threads using SMT). Performance seems to fall off when the CPU cores become fully saturated (threads = 16), and it takes a while to climb out of the hole (threads = 24 starts showing some more gains).

As a side note, the client does not actually run all of these thread count settings, since some prime numbers, especially large primes (7, 11) and multiples thereof, cause numerical issues. For example, when you try to run a 7-thread solve, the client automatically backs the thread count down to 6. You can see warnings in the log file about this when it happens.

Prime Number Thread Adjust

I noted all the relevant thread counts where this happens on the x-axis of the plot. Theoretically, these should be equivalent settings. The fact that the average performance varies a bit between them is just due to work unit variation (I’d have to run hundreds of averages to cancel all the variation out).

Finally, I noticed that the highest PPD actually occurred with a thread count of 30 (PPD = 407200) vs a thread count of 32 (PPD = 401485). This is a small but interesting difference, and is within the range of statistical variation. Thus I would say that setting the thread count to 30 vs 32 provides the same performance, while leaving two CPU threads free for other tasks (such as GPU folding…more on that later!).

Power Consumption

Power consumption numbers for each thread setting were taken at the wall, using my P3 Kill A Watt meter. Since the power numbers tend to walk around a bit as the computer works, it’s hard to get an instantaneous reading. Thus these are “eyeball averaged”. There was enough change at each CPU thread setting to clearly see a difference (not counting those thread settings that are actually equivalent to an adjacent setting).

Ryzen_3950x_Power_SMT_Off_CPB_On

The total measured power consumption rose fairly linearly from just under 80 watts to just under 160 watts. There’s nothing too surprising here. As you throw more threads at the CPU, it clocks up idle cores and does more work (which causes more transistors to switch, which thus takes more power). This seems pretty believable to me. At the high end, the system is drawing just under 160 watts of power. The AMD Ryzen 9 3950x is rated at a 105 watt TDP, and with CPB turned off it should be pretty close to this number. My rough back-of-the-envelope calculation for this rig was as follows:

  1. CPU Loaded Power = 105 Watts
  2. GPU Idle Power (Nvidia GTX 1650) = 10 Watts
  3. Motherboard Power = 15 Watts
  4. Ram Power = 2 watts * 4 sticks = 8 watts
  5. NVME Power = 2 watts * 2 drives = 4 watts
  6. SSD Power = 2 watts

Total Estimated Watts @ F@H CPU Load = 144 Watts

Factor in a boatload of case fans, some silly LED lights, and a bit of a PSU efficiency hit (about 90% efficient for my Seasonic unit), and it’ll be close to the 160 watts as measured.
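Here is that budget written out (the component wattages are the estimates from the list above; the fan/LED allowance and the exact efficiency figure are assumptions):

```python
dc_loads_w = {
    "CPU (loaded)": 105,
    "GPU (GTX 1650, idle)": 10,
    "Motherboard": 15,
    "RAM (4 x 2 W)": 8,
    "NVME (2 x 2 W)": 4,
    "SSD": 2,
    "Fans + LEDs (assumed)": 5,
}
psu_efficiency = 0.90  # roughly right for this Seasonic unit at this load

wall_watts = sum(dc_loads_w.values()) / psu_efficiency
print(f"Estimated wall draw: {wall_watts:.0f} W")  # ~166 W vs. ~160 W measured
```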

Efficiency

This being a blog about saving the planet while still doing science with computers, I am very interested in energy efficiency. For Folding@Home, this means doing the most work (PPD) for the least amount of power (watts). So, this plot is just PPD/Watts. Easy!

Similar to the PPD plot, this efficiency plot averages five data points for each thread setting. I chose to leave off the individual points and the confidence interval, because that looks about the same on this plot as it does on the PPD plot, and leaving all the clutter off makes this easier to read.

Ryzen_3950x_Efficiency_SMT_Off_CPB_On

As with the PPD plot, there seem to be two regions on the efficiency curve. The first region (threads less than 16) shows a pretty good linear ramp-up in efficiency as more threads are added. The second region (threads 16 or greater) is what I’m calling the “core saturation” region. Here, there are more threads than physical cores, and efficiency stays relatively flat. It actually drops off at 16 cores (similar to the PPD plot), and doesn’t start improving again until 24 or more threads are allocated to the solver.

This plot, at first glance, suggests that the maximum efficiency is realized at # of threads = 30. However, it should be noted that work unit variation still has a lot of influence, even when reporting results of a 5-sample average. You can see this effect by looking at the efficiency drop at threads = 31. Theoretically, the efficiency should be the same at threads = 31 and threads = 30, because the solver runs a 30-thread solution even when set to 31 in order to avoid domain decomposition problems.

Thus, similar to the PPD plot, I’d say the max efficiency is effectively achieved at thread counts of 30 and 32. My personal opinion is that you might as well run with # of threads = 30 (leaving two threads free for other tasks). This setting results in the maximum PPD as well.

Weird Results at Threads = 16-23

Some of you might be wondering why the performance and efficiency drops off when the thread count is set to the actual number of cores (16) or higher. I was too, so I re-ran some tests and looked at what was happening with AMD’s built-in Ryzen Master tool. As you can see in the screen shot below, even though the # of threads was set to 18 in Folding@Home (a number greater than the 16 physical cores), not all 16 cores were fully engaged on the processor. In fact, only 14 were clocked up, and two were showing relatively lazy clock rates.

Two Cores are Lazy!

Folding@Home 18-Thread CPU Solve on 16-Core Processor

I suspect what is happening is that some of the threads were loaded onto “virtual” CPU cores (i.e. SMT / Hyperthreading). This might be something Windows 10 does to preserve a few free CPU cores for other tasks. In fact, I didn’t see all of the cores turbo up to full speed until I set Folding@Home’s thread count to 24. This incidentally is when performance starts coming back in on the plots above.

This weird SMT / Hyperthreading behavior is likely what is responsible for the large drop-off / flat part of the performance and efficiency curves from thread count = 16 to 23. As you can see in the picture below, once you fully load all the available threads, the CPU frequencies on each core all hit the maximum value, as expected.

Ryzen_Master_32_Thread_Solve

Folding@Home 32-Thread CPU Solve on 16-Core Processor

Results Comparison

The following plots compare overall performance, power consumption, and efficiency of my new AMD Ryzen 9 3950x Folding@Home rig to other hardware configurations I have tested so far.

Performance

As you can see from the plot below, the Ryzen 9 3950x running a 32-thread Folding@Home solve can compete with relatively modern graphics cards in terms of raw performance. High-end GPUs will still offer more performance, but for a processor, getting over 400K PPD is very impressive. This is significantly more PPD than the previous processors I have tested (the AMD Bulldozer-based FX-8320e, AMD Phenom II X6 1100T, Intel Core 2 Quad Q6600, etc.). Admittedly, I have not tested very many CPUs, since this is much more involved than just swapping out graphics cards.

AMD Ryzen 9 3950x Performance

Power Consumption

From a total system power consumption standpoint, my new benchmark machine with the AMD Ryzen 9 3950x has a surprisingly low total power draw when running Folding. Another interesting point is that since the 3950x lacks onboard graphics, I had to have a graphics card installed to get a display. In my case, I had the Nvidia GTX 1650 installed, since this is a relatively low power consumption card that should provide minimal overhead. As you can see below, folding on the 3950x CPU (with the 1650 GPU idle) uses nearly the same amount of power as folding on the 1650 GPU (with the 3950x idle).

AMD Ryzen 9 3950x Power Consumption

Efficiency

Efficiency is the point of this blog, and in this respect the 3950x comes in towards the upper middle of the pack of hardware configurations I have tested. It’s definitely the most efficient processor I have tested so far, but graphics cards such as the 1660 Super and 1080 Ti are more efficient. Despite drawing more total power from the wall, these high-end GPUs do a lot more science.

Still, a PPD/Watt of over 2500 is not bad, and in this case the 3950x is more efficient than folding on the modest GPU installed in the same box (the Nvidia GTX 1650). Compared to the much older AMD FX-8320e, the Ryzen 9 3950x is 14x more efficient! What a difference 7 years can make!

AMD Ryzen 9 3950x Efficiency

Conclusion

The 16-core, 32-thread AMD Ryzen 9 3950x is one fast processor, and can do a lot of science for the Folding@Home distributed computing project. Although mid to high-end graphics cards such as the 1080 Ti ($450 on the used market) can outperform the $700 3950x in terms of performance and efficiency, it is still important to have a smattering of high-end CPU folding rigs on the Folding@Home network, because some molecules can only be solved on CPUs.

There is a general trend of increasing efficiency and performance as the # of CPU threads allocated to Folding@Home increases. For the Ryzen 9 3950x, using a setting of 30 or 32 threads is recommended for maximum performance and efficiency. If you plan on using your computer for other tasks, or for simultaneously folding on the GPU, 30 threads is the ideal CPU slot setting.

Please Support My Blog!

If you are interested in measuring the power consumption of your own computer (or any device), please consider purchasing a P3 Kill A Watt Power Meter from Amazon. You’ll be surprised what a $35 investment in a watt meter can tell you about your home’s power usage, and if you make a few changes based on what you learn you will save money every year! Using this link won’t cost you anything extra, but will provide me with a small percentage of the sale to support the site hosting fees of GreenFolding@Home.

If you enjoyed this article, perhaps you are in the market for an AMD Ryzen 9 3950x or similar Ryzen processor. If so, please consider using one of the links below to buy one from Amazon. Thanks for reading!

AMD Ryzen 9 3950x Direct Link

AMD Ryzen (Amazon Search)

Future Work

In the next article, I’ll disable multithreading (SMT) to see the effect of virtualized CPU cores on Folding@Home performance.

Later, I plan to enable core performance boost on the 3950x to see what effect the automatic clock frequency and voltage overclocking has on Folding@Home performance and efficiency.

How to Make a Folding@Home Space Heater (and why would you want to?)

My normal posts on this site are all about how to do as much science as possible with Folding@Home, for the least amount of power. This is because I think disease research, while a noble and essential cause, shouldn’t be done without respecting the environment.

With that said, I think there is a use case for a power-hungry, inefficient Folding@Home computer. Namely, as a space heater for those in colder climates.

The logic is this: Running Folding@Home, or any other piece of software, makes your computer do work. Electricity flows through the circuits, flipping tiny silicon switches, and producing heat in the process. Ultimately all of the energy that flows into your computer comes back out as heat (well, a small amount comes out as light, or electromagnetic radiation, or noise, but all of those can and do get converted back into heat as they strike things in the room).
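For anyone who thinks in furnace terms, the conversion is direct: 1 watt of electrical power is 3.412 BTU/h of heat. A quick sketch comparing the rigs mentioned in this article to a typical retail space heater:

```python
WATTS_TO_BTU_PER_HR = 3.412  # physical constant

rigs = [
    ("Dual 980 Ti rig", 500),
    ("Quad-GPU heater, full tilt", 1000),
    ("Typical retail space heater", 1500),
]
for label, watts in rigs:
    print(f"{label}: {watts} W = {watts * WATTS_TO_BTU_PER_HR:,.0f} BTU/h")
```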

Have you ever noticed how running your gaming computer with the door to your room closed makes your feet nice and toasty in the winter? It’s the same idea. Here, one of my high-performance rigs (dual NVidia 980 Ti GPUs) is silently humming away, putting off about 500 watts of pleasant heat. My son is investigating:

My Folding@Home Space Heater Experiment

Folding@Home uses CPUs and GPUs to run molecular dynamics models to help researchers understand and fight diseases. You get the most points per day (PPD) by using cutting-edge hardware, but the Folding@Home Consortium and Stanford University openly encourage everyone to run the software on whatever they happen to have.

With this in mind, I started thinking about all the old hardware that is out there…CPUs and graphics cards that are destined for landfills because they are no longer fast enough to do any useful gaming or decode 4K video. People describe this type of hardware as “bricks” or “space heaters”–useful for nothing other than wasting power.

That gave me an idea…

It didn’t take me long to find a sweet deal on an nForce 680i-based system on eBay for $60 shipped (an EVGA board with the Nvidia nForce 680i chipset, supporting three full-length PCI-E X16 slots). I swapped out the Core 2 Duo that this machine came with for a Core 2 Quad, and purchased four Fermi-based Nvidia graphics cards, plus a used 1300 Watt Seasonic 80+ Gold power supply. All of this was amazingly cheap. The beautiful Antec case was alone worth the $60 cost of the parts that came with it. Because I knew power delivery would be critical here, I spent most of the money on a high-end power supply (also used, from eBay). Later on, I found that I needed to also upgrade the cooling (read: cut a hole in the side panel and strap on some more fans).

  • Antec Mid-Tower Case + Corsair 520 Watt PSU, EVGA 680i motherboard, Core 2 Duo CPU, 4 GB Ram, CD Drives, and 4 Fans = $60
  • 2 x EVGA Nvidia GeForce GTX 480 graphics cards: $40
  • 1 x EVGA Nvidia GeForce GTX 580 graphics card: $50
  • 1 x EVGA Nvidia GeForce GTX 460 graphics card: $20
  • 1 x PCI-E X1 to X16 Riser: $10
  • 1 x Core 2 Quad Q6600 CPU (Socket 775) – $6
  • 1 x Seasonic 1300 Watt 80+ Gold Modular Power Supply: $90
  • 2 x Noctua 120 MM fans + custom aluminum bracket (for modifying side panel): $60
  • 1 x Arctic Cooling Freezer Tower Cooler – $10
  • 1 x Western Digital Black 640GB HDD – $10

Total Cost (Estimated): $356

This is the cost before I sold some of the parts I didn’t need (Core 2 Duo, Corsair PSU, etc).

Here is a shot of the final build. It took a bit of tweaking to get it to this point.

F@H_Space_Heater_Quad_GPUs

Used Parts Disclaimer!

Note that when dealing with used parts on eBay, it’s always good to do some basic service. For the GPUs in this build, I took them apart, cleaned them, applied fresh thermal paste (Arctic MX-4), and re-assembled. It was good that I did…these cards were pretty gross, and the decade-old thermal paste was dried on from years of use.

I mean, come on now, look at the dust cake on the second GTX 480! Clean your graphics cards, random eBay people!

GTX 480 Dust

Here’s how the 3 + 1 GPUs are set up. The two GTX 480s and the GTX 580 are on the mobo in the X16 slots. I remotely mounted the GTX 460 in the drive bay. I used blower-style (slot exhaust) cards on purpose here, because they exhaust 100% of the hot air outside the case. Open-fan style cards would have overheated instantly in this setup.

To keep costs down, I just used Ubuntu Linux as the operating system. I configured the machine for 4-slot GPU folding using the proprietary Nvidia drivers. Although I ultimately control all of my remote Linux machines with TeamViewer, it is helpful to have a portable monitor and a combo wireless keyboard/mouse for initial configuration and testing. In the shot below (of an earlier config), I learned a lot just trying to get the machine stable with 3 cards.

Space_Heater_Early_Config_Initial_Fireup_small

Initial Testing on the Space Heater (3 GPUs installed). This test showed me that I needed better CPU cooling (hence I chucked that stock Intel cooler)

I also did some thermal testing along the way to make sure things weren’t getting too hot. It turns out this testing was a bit misleading, because the system was running a lot cooler with the side panel off than with it on.

Some Thermal Camera Images During Initial Burn-In (3 GPUs, stock CPU cooler):

Now that’s some heat coming out of this beast! Thankfully, the upgraded 14-gauge power plug and my watt meter aren’t at risk of melting, although they are pretty warm.

Once I had the machine up and running with all four GPUs in the final configuration, I found that it produced about 55-95K PPD on average (based on the work unit), with the following breakdown:

  • GTX 460: 10-20K PPD
  • GTX 480: 20-30K PPD each
  • GTX 580: 25-45K PPD

Power consumption, as measured at the wall, ranged from 900 to 1000 watts with all 4 GPUs engaged. By turning different GPUs on and off, I could get varying levels of power. The machine idles at about 200 watts; I typically ran it with one 580 and one 480 folding, for an average power consumption of about 600 watts.

Space_Heater_Power_Consumption

After running the machine for a while, my room was nice and toasty, as expected!

One thing that I should mention was the effect of the two additional intake fans that I mounted in the side panel. Originally I did not have these, and the top graphics card in the stack was hitting 97 degrees C according to the onboard monitoring! After modding this custom side-intake into the case (found a nice fan bracket on Amazon, and put my dremel tool to good use), the temps went down quite a lot. I used fan grilles on the inside of the fans to keep internal cables out of them, and mesh filters on the outside to match the intake filters on the rest of the case.

The top card stays under 85 degrees C (with the fan at 50%). The middle card stays under 80 degrees C, and the bottom card runs at 60 degrees C. The GTX 460 mounted in the drive bay never goes over 60 degrees C, but it’s a less powerful card and is mounted on the other side of the case.

Here’s some more pictures of the modded side panel, along with a little cooling diagram I threw together:

PPD, Wattage, and Efficiency Comparison

I debated putting these plots in here, because the point of this machine was not primarily to make points (pun intended), or to be efficient from a PPD/Watt perspective. The point of this machine was to replace the 1500 watt space heater I use in the winter to keep a room warm.

As you can see, the scientific production (PPD) on this machine, even with 4 GPUs, is not all that impressive in 2020, since the GPUs being used are ten years old. Similarly, the efficiency (PPD/Watt) is terrible. There’s no surprise there, since it averages just under 1000 watts of power consumption at the wall!

Conclusion

It is totally possible to build a (relatively) inexpensive desktop computer out of old, used parts to use as a space heater. If the primary goal is to make heat, then this might not be a bad idea (although at $350, it still costs way more than a $20 heater from Walmart). The obvious benefit is that this sort of space heater is actually doing something useful besides keeping you warm (in this case, helping scientists learn more about diseases thanks to Folding@Home).

Another benefit I found was remote control (via TeamViewer), which lets me use my cellphone to turn GPUs on and off to vary the heat output. Also, I think running this machine for extended durations at its medium-high setting (700 watts or so) is much healthier for the electrical wiring in my house than the constant cycling on and off of a traditional 1500 watt space heater.

From an environmental standpoint, you can do much worse than using electric heat. In my case, electric space heaters make a lot of sense, especially at night. I can shut off the entire heating zone (my house only has two zones) to the upstairs and just keep the bedroom warm. This drastically reduces my fossil fuel usage (good old New England, where home heating oil is the primary method of keeping warm in the winter). Since my house has an 8.23 kW solar panel array on the roof, a lot of my electricity comes directly from the sun, making this electric heat solution even greener.

Parting Thoughts:

I would not recommend running a machine like this during the warmer months. If warm air is not wanted, all the waste heat from this machine will do nothing but rack up your power bill for relatively little science being done. If you want to run an efficient summer-time F@H rig that uses low power (so as to not fight your AC), check out my article on the GTX 1660 and 1650.

In a future article, I plan to show how I actually saved on heating costs by running Folding@Home space heaters all last winter (with a total of seven Folding@Home desktops placed strategically throughout my house, so that I hardly had to burn any oil).

 

AMD Ryzen 9 3950X Folding@Home Review: Part 1: PPD vs # of Threads

Welcome back everyone. Over the last month, I’ve been experimenting with my new Folding@Home benchmark machine to see how effectively AMD’s flagship Ryzen processor (Ryzen 9 3950X) can fight diseases such as COVID-19, Cancer, and Alzheimer’s. I’ve been running Folding@Home, a charitable distributed computing project, which provides scientists with valuable computing resources to study diseases and learn how to combat them.

This blog is typically focused on energy efficiency, where I try to show how to do the most science for the least amount of power consumption possible. In this post, I’m stepping away from that (at least for now) in order to understand something much simpler: how does the Folding@Home CPU client scale with # of processor threads?

I’d previously investigated Folding@Home performance and efficiency vs. # of CPU cores on an old Intel Q6600. I’ve also done a few CPU articles on AMD’s venerable Phenom II X6 1100T and my previous processor, the AMD FX-8320e. These CPU articles were few and far between, however, as I typically focus on using graphics cards (GPUs). The reason is twofold. Historically, graphics cards have produced many more points per day (PPD) for a given amount of power, thanks to their massively parallel architecture, which is well-suited for running single-precision molecular dynamics problems such as those used by Folding@Home. Also, graphics cards are much easier to swap out, so it was relatively easy to build a large database of GPU performance and efficiency.

Still, CPU folding is just as important, because there are certain classes of problems that can only be efficiently computed on the CPU. Folding@Home, while originally a project that ran exclusively on CPUs, obtains the bulk of its computational power from GPU donors these days. However, the CPU folders still play a key part, running work units that cannot be solved on GPUs, thus providing a complete picture of the molecular dynamics.

In my last article, I highlighted the need to build a new benchmark machine for testing out GPUs, since my old rig would soon become a bottleneck and slow the GPUs down (thus potentially skewing any comparison plots I make). Now that this Ryzen-based 16-core monster of a desktop is complete, I figured I’d revisit CPU folding once more to see just what a modern enthusiast-class processor like the $749 Ryzen 9 3950X is capable of. For this first part of a multi-part review, I am simply looking at the preliminary results from running Folding@Home on the CPU. Instead of running with the default thread settings, I manually set up the client, examining how performance scales from 1 all the way to the 32 available threads on the Ryzen 9 3950X.

Test Setup

Testing was performed in Windows 10 Home, using the latest Folding@Home client (7.6.13). Points Per Day were estimated from the client window for each setting of # of CPU threads. These instantaneous estimates have a lot of variability, so future testing will investigate the effect of averaging (running multiple tests at each setting) on the results.
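
For anyone who wants to replicate the thread sweep: the thread count can be set from the client's slot configuration window, or directly in its config.xml. Here is a minimal sketch of what the relevant entry looks like (the exact schema can vary by client version, so treat this as illustrative):

    <config>
      <!-- user, team, and passkey entries omitted -->
      <slot id='0' type='CPU'>
        <cpus v='16'/> <!-- number of threads for this slot; I swept this from 1 to 32 -->
      </slot>
    </config>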

Benchmark Machine Hardware:

  • Case: Raidmax Sagitta (2006)
  • Power Supply: Seasonic Prime 750 Titanium
  • Fresh Air: 2 x 120 mm Enermax front intake
  • Rear Exhaust: 1 x 120 mm Scythe Gentle Typhoon
  • Side Exhaust: 1 x 80 mm Noctua
  • Top Exhaust: 1 x 120 mm (Seasonic PSU)
  • CPU Cooler: Noctua NH-D15 SE AM4
  • Thermal Paste: Arctic MX-4
  • CPU: AMD Ryzen 9 3950X 16 Core 32 Thread (105W TDP)
  • Motherboard: ASUS Prime X570-P Socket AM4
  • Memory: 32 GB (4 x 8 GB) Corsair Vengeance LPX DDR4 3600 MHz
  • GPU: Zotac Nvidia GeForce GTX 1650
  • OS Drive: Samsung 970 Evo Plus 512 GB NVMe SSD
  • Storage #1: Samsung 860 Evo 2 TB SSD
  • Storage #2: Western Digital Blue 256 GB NVMe SSD (for Linux)
  • Optical: Samsung SH-B123L Blu-Ray Drive
  • OS: Windows 10 Home, Ubuntu Linux (on 2nd NVMe)

Processor Settings:

The AMD Ryzen 9 3950x is a beast. With 16 cores and 32 threads, it has a nominal power consumption of 105 watts, but can easily double that when overclocked. With the factory Core Performance Boost (CPB) enabled, the processor will routinely draw 150+ watts when loaded due to the individual cores turboing as high as 4.7 GHz, up from the 3.5 GHz base clock. Under heavy multi-threaded work loads, the processor supports an all-core overclock of up to 4.3 GHz, assuming sufficient cooling and motherboard power delivery.

This automatic core turbo behavior is problematic for creating a plot of Folding@Home performance (PPD) vs. # of threads, since for lightly threaded loads, the processor will scale individual cores up to much higher speeds. In order to make an apples-to-apples comparison, I disabled CPB, so that all CPU cores run at the base speed of 3.5 GHz when loaded. In future testing, I will repeat this study with CPB on in order to see the effect of the factory automatic overclocking.

A note about Cores vs. Threads

Like many Intel processors with Hyper-Threading, modern AMD chips support running multiple execution threads on one CPU core. Simultaneous Multi-Threading (SMT) is simply AMD’s term for the same idea: duplicating certain parts within each processor core so that two threads can execute per core. The historical problem with both Hyper-Threading and SMT is that they do not actually double a CPU core’s capacity to perform complex floating-point math, since there is only one FPU per CPU core. SMT and Hyper-Threading work best when one large job hogs a core and a smaller job can execute in the remaining part of the core as a second thread. Two equally intensive threads can end up competing for resources within a core, making the SMT-enabled processor actually slower. For example: https://www.techspot.com/review/1882-ryzen-9-smt-on-vs-off/

For the purposes of this article, I left SMT on in order to make the coolest plot possible (1-32 threads!). However, I suspect that SMT might actually hurt Folding@Home performance, for the reasons mentioned above. Thus in future testing, I will also try disabling this to see the effect.

Preliminary Results: PPD vs # Threads on Ryzen 9 3950x

So, to summarize the caveats: this test was performed once at each thread count, giving a single data point for each of the 32 settings. SMT was on (so Folding@Home can run two threads on one CPU core). CPB was off (all cores set to 3.5 GHz).

The figure below shows the results. As you can see, there is a general trend of increasing performance with # of threads, up to around the halfway point. Then, the trend appears to get messy, although by the end of the plot, it is clear that the higher thread counts realize a higher PPD.

Ryzen 9 3950X PPD vs Thread Count 1

Observations

It is clear that, at least initially, adding threads to the solution yields a fairly linear improvement in points per day. Eventually, however, the CPU cores are likely becoming saturated, and more of the work is being executed via SMT. Due to the significant work unit variability in Folding@Home (as much as 10-20% between molecules), these results should be taken with a grain of salt. I am currently re-running all of these tests, so that I can show a plot of average PPD vs. # of threads. I am also logging power using my watt meter, so that we can make wall power consumption and efficiency plots.
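
For Part 2, the averaging itself is trivial; here's the sort of minimal Python sketch I'll use for the post-processing (all the sample numbers below are placeholders, not measured results):

    # Sketch: averaging instantaneous PPD estimates across several work units.
    # All sample values are placeholders, not real data.
    from statistics import mean, stdev

    ppd_samples = {  # threads -> client PPD estimates from individual work units
        16: [250_000, 270_000, 240_000],
        32: [410_000, 380_000, 430_000],
    }

    for threads, samples in sorted(ppd_samples.items()):
        print(f"{threads} threads: {mean(samples):,.0f} PPD "
              f"(+/- {stdev(samples):,.0f})")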

Conclusions

Seeing a processor produce nearly half a million points per day in Folding@Home was insane! My previous testing with old 4, 6, and 8-core processors was lucky to show numbers over 20K PPD. In general, allowing Folding@Home to use more processor threads increases performance, but there is significant additional work needed to verify a statistical trend. Stay tuned for Part II (averaging).

P.S.

Man, that’s a lot of cores! You’d better be scared, COVID-19…I’m coming for you!

Cores!

So Many Cores!

Ryzen Update / Consider Supporting my Writing (somewhat off topic)

For those following along, it might be a bit until I get the next article published on testing out CPU folding on the Ryzen 3950x. I’m doing a # of threads vs. PPD and PPD/Watt efficiency plot, but with 32 threads this will take a while. I’m running a minimum of 3 work units per test to try to minimize the work unit variation. I also want to do this both with and without Hyper-Threading (SMT) enabled, to see its effect. So far, I’m seeing PPD of up to 400K, which is insane for a CPU!

In the meantime, I’m working on my other writing projects. For anyone who likes science fiction, you can check out my free online web novel The Chronicles of the Starfighters over on Royal Road. You can also check out my books Sagitta and Hrain on Amazon. These are pulpy, science-fiction adventure stories (think Star Trek or Star Wars) with teenage and/or alien protagonists. It’s a far cry from the non-fiction posts of this blog, but it is equally nerdy, I promise.

For those interested in supporting this blog, giving Chronicles a review or rating on Royal Road would be a great way to help me gain traction (honest reviews only please, I’m looking for feedback on what I can improve as well as what I do well).

Starfighter Promo

You can read more about my sci-fi writing projects at my other blog: https://starfightersf.com/

Finally, since I don’t have a Patreon (yet), anyone interested in donating to help me fund more graphics card purchases can consider buying a kindle book. Even if you don’t enjoy science fiction, you can gift copies to someone who does!

Kindle book links:

Sagitta (story about a human warp ship that changes the course of an alien war)

Hrain (story about a young telepathic alien who beats people up and stuff)

Thanks! And hopefully I’ll have that Ryzen data up soon.

-Chris

New Folding@Home Benchmark Machine: It’s RYZEN TIME!

Folding@Home, the distributed computing project that fights diseases such as COVID-19 and cancer, has hit an all-time high in popularity. I’m stunned to find that my blog is now getting more views every day than it did every month last year. With that said, this is a perfect opportunity to reach out and see if all the new donors are interested in tuning their computers for efficiency, to save a little on power, lighten the burden on your wallet, and hopefully produce nearly the same amount of science. If this sounds interesting to you, let me know in the comments below!

In my last post, I noted that the latest generation of graphics cards are starting to push the limits of what my primary GPU Folding@Home benchmark rig can do. That computer is based on an 11-year-old chipset (AMD 880), and only supports PCI-Express 2.0. In order for me to keep testing modern fast graphics cards in Windows 10, I wanted to make sure that PCI-Express slot bandwidth wasn’t going to artificially bottleneck me.

So, without further ado, let me present the new, re-built Folding@Home rig, SAGITTA:

Sagitta Desktop

I’ve (re)created a monster!

This build leverages the Raidmax Sagitta case that I’ve had since 2006. This machine has hosted multiple builds (Pentium D 805, Core 2 Duo e8600, Core 2 Quad Q6600, Phenom II X6 1100T, and the most recent FX-8320e Bulldozer). There have been too many graphics cards to count, but the latest one (Nvidia GTX 1650 by Zotac) was carried over for some continuity testing. The case fans and (initially) the power supply were also carried over from the previous FX build (they aren’t the same ones from back in 2006…those got loud and died long ago). I also kept my Blu-Ray drive and 3.5 inch card reader. That’s where the similarities end. Here is a specs comparison:

Sagitta Rebuild Benchmark Machine Specs

  • Note I ended up updating the power supply to the one shown in the table. More on that below…

System Power Consumption

Initially, the power consumption at idle of the new Ryzen 9 build, measured with my P3 Kill A Watt Meter, was 86 watts. The power consumption while running GPU Folding was 170 watts (and the all-core CPU folding was over 250 watts, but that’s another article entirely).

Using the same Nvidia GeForce GTX 1650 graphics card, these idle and GPU folding power numbers were unfortunately higher than the old benchmark machine, which came in at 70 watts idle and 145 watts load. This is likely due to the overkill hardware that I put into the new rig (X570 motherboards alone are known to draw twice the power of a more normal board). The 25-watt difference while folding was especially problematic for my efficiency testing, since efficiency plots made on the new machine would not be comparable to those from graphics cards tested on the old one.

To solve this, I could either:

A: Use a 25 watt offset to scale the new GPU F@H efficiency plots

B: Do nothing and just have less accurate efficiency comparisons to previous tests

C: Reduce the power consumption of the new build so that it matches the old one

This being a blog about energy efficiency, I decided to go with Option C, since that’s the one that actually helps the environment. Let’s see if we can trim the fat off of this beast of a computer!

Efficiency Boost #1: Power Supply Upgrade

The first thing I tried was upgrading the power supply. As noted here, the power supply’s efficiency rating is a great place to start when building an energy-efficient machine. My old Seasonic X-650 is a very good power supply, and carries an 80+ Gold rating. Still, things have come a long way, and switching to an 80+ Titanium PSU can gain a few efficiency percentage points, especially at low loads.

80+ Table

80+ Efficiency Table

With that 3-5% efficiency boost in mind, I picked up a new Seasonic 750 Watt Prime 80+ Titanium modular power supply. At $200, this PSU isn’t cheap, but it provides a noticeable efficiency improvement at both idle and load. Other nice features were the additional 100 watts of capacity, and the fact that it supported my new motherboard’s dual (8-pin + 4-pin) CPU aux power connectors. That extra 4-pin isn’t required to make the X570 board work, but it does allow for more overclocking headroom.

Disclaimer: Before we get into it, I should note that these power readings are “eyeball” readings, taken by glancing at the watt meter and trying to judge the average usage. The actual number jumps around a bit (even at idle) as the computer executes various background tasks. I’d say the measurement precision on any eyeball watt meter readings is +/- 5 watts, so take the below with a grain of salt. These are very small efficiency improvements that are difficult to measure, and your mileage may vary. 

After upgrading the power supply, idle power dropped an impressive 10 watts, from 86 watts to 76. This is an awesome 11% efficiency improvement. This might be due to the new 80+ Titanium power supply having an efficiency target at very low loads (90% efficiency at 10% load), whereas the old 80+ Gold spec did not have a low load efficiency requirement. Thus, even though I used a large 750 watt power supply, the machine can still remain relatively efficient at idle.

Under moderate load (GPU folding), the new 80+ Titanium PSU provided a 4% efficiency improvement, dropping the power consumption from 170 watts to 163. This is more in line with expectations.
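
The math behind these gains is simple: wall draw is the DC load divided by the PSU's efficiency at that load point. Here's a small sketch using approximate numbers from the 80+ table above (the DC-side load is an assumed value for illustration):

    # Sketch: wall power = DC load / PSU efficiency at that load point.
    # The 68 W DC load is an assumption; efficiencies approximate the 80+ table.
    def wall_watts(dc_load, efficiency):
        return dc_load / efficiency

    print(wall_watts(68, 0.80))  # older Gold unit at a very light load: ~85 W
    print(wall_watts(68, 0.90))  # Titanium's 90%-at-10%-load spec: ~76 W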

Efficiency Boost #2: Processor Underclock / Undervolt

Thanks to video gaming mentality, enthusiast-grade desktop processors and motherboards are tuned out of the box for performance. We’re talking about blistering fast, competition-crushing benchmark scores. For most computing tasks (such as running Folding@Home on a graphics card), this aggressive CPU behavior is wasting electricity while offering no discernible performance benefit. Despite what my kid’s shirt says, we need to reel these power hungry CPUs in for maximum GPU folding efficiency.

Never Slow Down

Kai Says: Never Slow Down

One way to improve processor efficiency is to reduce the clock rate and associated voltage. I’d previously investigated this here. It takes disproportionately more voltage to support higher frequencies, so just by dropping the clock rate by 100 MHz or so, you can lower the voltage a good deal and save on power.
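
The first-order rule here is that CMOS dynamic power scales roughly as P ≈ C·V²·f, so a small frequency drop that also allows a voltage drop pays off twice. A quick illustrative sketch (the percentages are made-up examples, not measurements from my chip):

    # Sketch: CMOS dynamic power scales roughly as C * V^2 * f.
    # The ratios below are illustrative, not measurements from my chip.
    def relative_power(v_ratio, f_ratio):
        return v_ratio ** 2 * f_ratio

    # Dropping clocks ~5% while shaving voltage ~10%:
    print(relative_power(0.90, 0.95))  # ~0.77, i.e. roughly 23% less dynamic power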

With the advent of processors that up-clock and up-volt themselves (as well as going in the other direction), manual tuning can be a bit more difficult. It’s far easier to first try the automatic settings, to see if some efficiency can be gained.

But wait, this is a GPU folding benchmark rig? Why does the CPU’s frequency and power settings matter?

For GPU folding with an Nvidia graphics card, one CPU core is fully loaded per GPU slot in order to “feed” the card. This is because Nvidia’s OpenCL implementation uses a polling (checking) method. In order to keep the graphics card chugging along, the CPU constantly checks on the GPU to see if it needs any data. This polling loop is not efficient and burns unnecessary power. You can read more about it here: https://foldingforum.org/viewtopic.php?f=80&t=34023. In contrast, AMD’s method (interrupts) is a much more graceful implementation that doesn’t lock up a CPU core.
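
To illustrate the difference, here's a toy Python sketch of the two strategies (this is conceptual only, not actual driver code):

    # Toy sketch: why a polling loop pins a CPU core while an
    # interrupt/event-style wait does not. Conceptual only.
    import threading

    gpu_needs_data = threading.Event()

    def polling_feed():  # Nvidia-style: spin and check constantly
        while not gpu_needs_data.is_set():
            pass         # burns a full core doing nothing useful

    def event_feed():    # AMD-style: sleep until the GPU signals
        gpu_needs_data.wait()  # core is free for other work meanwhile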

The constant polling loop drives modern gaming-oriented processors to clock up their cores unnecessarily. For the most part, the GPU does not need work at every waking moment. To save power, we can turn down the frequency, so that the CPU is not constantly knocking on the GPU’s metaphorical door.

To do this, I disabled AMD’s Core Performance Boost (CPB) in the AMD Overclocking section of the BIOS (same thing as Intel’s Turbo Boost). This caps the processor speed at the base maximum clock rate (3.5 GHz for the Ryzen 9 3950x), and also eliminates any high voltage values required to support the boost clocks.

Success! GPU folding total system power consumption is now much lower. With less superfluous power draw from the CPU, the wattage is much more comparable to the old Bulldozer rig.

Ryzen 9 3950x Power Reduction Table

It is interesting that idle power consumption came down as well. That wasn’t expected. When the computer isn’t doing anything, the CPU cores should be down-clocked / slept out. Perhaps my machine was doing something in the background during the earlier tests, thus throwing the results off. More investigation is needed.

GPU Benchmark Consistency Check

I fired up GPU folding on the Nvidia GeForce GTX 1650, a card that I have performance data for from my previous benchmark desktop. After monitoring it for a week, the Folding@Home Points Per Day performance was so similar to the previous results that I ended up using the same value (310K PPD) as the official estimate for the 1650’s production. This shows that the old benchmark rig was not a bottleneck for a budget card like the GeForce GTX 1650.

Using the updated system power consumption of nominally 140 watts (vs 145 watts of the previous benchmark machine), the efficiency plots (PPD/Watt) come out very nearly the same. I typically consider power measurements of + / – 5 watts to be within the measurement accuracy of my eyeball on the watt meter anyway, due to normal variations as the system runs. The good news is that even with this variation, it doesn’t change the conclusion of the figure (in terms of graphics card efficiency ranking).

GTX 1650 Efficiency on Ryzen 9

* Benchmark performed on updated Ryzen 9 build

Conclusion

I have a new 16-core beast of a benchmark machine. This computer wasn’t built exclusively for efficiency, but after a few tweaks, I was able to improve energy efficiency at low CPU loads (such as Windows Idle + GPU Folding).

For most of the graphics cards I have tested so far, the massive upgrade in system hardware will not likely affect performance or efficiency results. Very fast cards, such as the 1080 Ti, might benefit from the new benchmark rig’s faster hardware, especially that PCI-Express 4.0 x16 graphics card slot. Most importantly, future tests of blistering fast graphics cards (2080 Ti, 3080 Ti, etc) will probably not be limited by the benchmark machine’s background hardware.

Oh, I can also now encode my backup copies of my blu-ray movies at 40 fps in H.265 in Handbrake (old speed was 6.5 fps on the FX-8320e). That’s a nice bonus too.

Efficiency Note (for GPU Folding@Home Users)

Disabling the automatic processor frequency and voltage scaling (Turbo Boost / Core Performance Boost) didn’t have any effect on the PPD being generated by the graphics card. This makes sense; even relatively slow 2.0 GHz CPU cores are still fast enough to feed most GPUs, and my modern Ryzen 9 at 3.5 GHz is no bottleneck for feeding the 1650. By disabling CPB, I shaved 23 watts off of the system’s power consumption for literally no performance impact while running GPU folding. This is a 16 percent boost in PPD/Watt efficiency, for free!
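
For anyone checking that 16 percent figure, the arithmetic is just the watt ratio, since the PPD stayed constant:

    # With PPD unchanged, the PPD/Watt gain reduces to old_watts / new_watts.
    before, after = 163, 140  # wall watts while GPU folding: CPB on vs. off
    print(f"{(before / after - 1) * 100:.1f}% efficiency gain")  # ~16.4%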

This also dropped CPU temps from 70 degrees C to 55, and resulted in a lower CPU cooler fan speed / quieter machine. This should promote longevity of the hardware, and reduce how much my computer fights my air conditioning in the summer, thus having a compounding positive effect on my monthly electric bill.

Future Articles

  • Re-Test the 1080 Ti to see if a fast graphics card makes better use of the faster PCI-Express bus on the AM4 build
  • Investigate CPU folding efficiency on the Ryzen 9 3950x

 

Shout out to the helpers…Kai and Sam

Folding@Home on Turing (NVidia GTX 1660 Super and GTX 1650 Combined Review)

Hey everyone. Sorry for the long delay (I have been working on another writing project, more on that later…). Recently I got a pair of new graphics cards based on Nvidia’s new Turing architecture. This has been advertised as being more efficient than the outgoing Pascal architecture, and is the basis of the popular RTX series GeForce cards (2060, 2070, 2080, etc). It’s time to see how well they do some charitable computing, running the now world-famous disease research distributed computing project Folding@Home.

Since those RTX cards with their ray-tracing cores (which do nothing for folding) are so expensive, I opted to start testing with two lower-end models: the GeForce GTX 1660 Super and the GeForce GTX 1650.

 

These are really tiny cards, and should be perfect for some low-power consumption summertime folding. Also, today is the first time I’ve tested anything from Zotac (the 1650). The 1660 Super is from EVGA.

GPU Specifications

Here’s a quick table I threw together comparing these latest Turing-based GTX 16xx series cards to the older Pascal lineup.

Turing GPU Specs

It should be immediately apparent that these are very low power cards. The GTX 1650 has a design power of only 75 watts, and doesn’t even need a supplemental PCI-Express power cable. The GTX 1660 Super also has a very low power rating at 125 Watts. Due to their small size and power requirements, these cards are good options for small form factor PCs with non-gaming oriented power supplies.

Test Setup

Testing was done in Windows 10 using Folding@Home Client version 7.5.1. The Nvidia Graphics Card driver version was 445.87. All power measurements were made at the wall (measuring total system power consumption) with my trusty P3 Kill-A-Watt Power Meter. Performance numbers in terms of Points Per Day (PPD) were estimated from the client during individual work units. This is a departure from my normal PPD metric (averaging the time-history results reported by Folding@Home’s servers), but was necessary due to the recent lack of work units caused by the surge in F@H users due to COVID-19.

Note: This will likely be the last test I do with my aging AMD FX-8320e based desktop, since the motherboard only supports PCI Express 2.0. That is not a problem for the cards tested here, but Folding@Home on very fast modern cards (such as the RTX 2080 Ti) shows a modest slowdown if the cards are limited by PCI Express 2.0 x16 (around 10%). Thus, in the next article, expect to see a new benchmark machine!

System Specs:

  • CPU: AMD FX-8320e
  • Mainboard: Gigabyte GA-880GMA-USB3
  • GPU: EVGA 1080 Ti (Reference Design)
  • Ram: 16 GB DDR3L (low voltage)
  • Power Supply: Seasonic X-650 80+ Gold
  • Drives: 1x SSD, 2 x 7200 RPM HDDs, Blu-Ray Burner
  • Fans: 1x CPU, 2 x 120 mm intake, 1 x 120 mm exhaust, 1 x 80 mm exhaust
  • OS: Win10 64 bit

Goal of the Testing

For those of you who have been following along, you know that the point of this blog is to determine not only which hardware configurations can fight the most cancer (or coronavirus), but to determine how to do the most science with the least amount of electrical power. This is important. Just because we have all these diseases (and computers to combat them with) doesn’t mean we should kill the planet by sucking down untold gigawatts of electricity.

To that end, I will be reporting the following:

Net Worth of Science Performed: Points Per Day (PPD)

System Power Consumption (Watts)

Folding Efficiency (PPD/Watt)

As a side-note, I used MSI Afterburner to reduce the GPU power limit of the GTX 1660 Super and GTX 1650 to the minimum allowed by the driver / board vendor (in this case, 56% for the 1660 Super and 50% for the 1650). This is because my previous testing, plus the results of various people in the Folding@Home forums, has shown that by reducing the power cap on the card, you can get an efficiency boost. Let’s see if that holds true for the Turing architecture!
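
If you’d rather not use MSI Afterburner, Nvidia cards can also be capped from the command line with nvidia-smi. A hedged sketch (this requires administrator rights, and the allowable range varies by card and vendor BIOS):

    # Sketch: capping an Nvidia GPU's power limit via nvidia-smi.
    # Needs admin rights; the 70 W value is just an example (~56% of
    # the 1660 Super's 125 W TDP). Check your card's supported range first.
    import subprocess

    subprocess.run(["nvidia-smi", "--query-gpu=power.min_limit,power.max_limit",
                    "--format=csv"], check=True)
    subprocess.run(["nvidia-smi", "-pl", "70"], check=True)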

Performance

The following plots show the two new Turing architecture cards relative to everything else I have tested. As can be seen, these little cards punch well above their weight class, with the GTX 1660 Super and GTX 1650 giving the 1070 Ti and 1060 a run for their money. Also, the power throttling applied to the cards did reduce raw PPD, but not by too much.

Nvidia GTX 1650 and 1660 performance

Power Draw

This is the plot where I was most impressed. In the summer, any Folding@Home I do directly competes with the air conditioning. Running big graphics cards, like the 1080 Ti, makes my power bill go crazy, not only from the computer itself but also from the increased air conditioning required.

Thus, for people in hot climates, extra consideration should be given to the overall power consumption of your Folding@Home computer. With the GTX 1660 Super running in reduced power mode, I was able to get a total system power consumption of just over 150 watts while still making over 500K PPD! That’s not half bad. On the super low power end, I was able to beat the GTX 1050’s power consumption level…getting my beastly FX-8320e 8-core rig to draw 125 watts total while folding was quite a feat. The best thing was that it still made almost 300K PPD, which is well above the previous generation’s small cards.

Nvidia GTX 1650 and 1660 Power Consumption

Efficiency

This is my favorite part. How do these low-power Turing cards do on the efficiency scale? This is simply looking at how many PPD you can get per watt of power draw at the wall.

Nvidia GTX 1650 and 1660 Efficiency

And…wow! Just wow. For about $220 new, you can pick up a GTX 1660 Super and be just as efficient as the previous generation’s top card (GTX 1080 Ti), which still goes for $400-500 used on eBay. Sure, the 1660 Super won’t be as good of a gaming card, and it makes only about 2/3 the PPD of the 1080 Ti, but on an energy efficiency metric it holds its own.

The GTX 1650 did pretty well too, coming in somewhere toward the middle of the pack. It is still much more efficient than the similar market segment card of the previous generation (GTX 1050), but it is hampered overall by not being able to return work units as quickly to the scientists, who reward fast work with bonus points (the Quick Return Bonus).

Conclusion

NVIDIA’s entry-level Turing architecture graphics cards perform very well in Folding@Home, both from a performance and an efficiency standpoint. They offer significant gains relative to legacy cards, and can be a good option for a budget Folding@Home build.

Join My Team!

Interested in fighting COVID-19, Cancer, Alzheimer’s, Parkinson’s, and many other diseases with your computer? Please consider downloading Folding@Home and joining Team Nuclear Wessels (54345). See my tutorial here.

Interested in Buying a GTX 1660 or GTX 1650?

Please consider supporting my blog by using one of the below Amazon affiliate search links to find your next card! It won’t cost you anything extra, but will provide me with a small part of Amazon’s profit so I can keep paying for this site.

GTX 1660 Amazon Search Affiliate Link!

GTX 1650 Amazon Search Affiliate Link!

How to Run Folding@Home on a Graphics Card in Windows 10

(A Folding at Home Unofficial Configuration Guide for GPU, Multi-GPU, and CPU/GPU Folding)

Folding@Home is a distributed computing project for fighting diseases. If you’re reading this post, then you are probably looking for some help getting Folding@Home running on your graphics card. GPU folding, when configured properly, is one of the best ways to do tons of science efficiently. I hope this Folding@Home GPU guide helps you start kicking butt against cancer and other diseases. So, let’s get started.

Note: for people who already have the Folding@Home client up and running and you want to switch from CPU folding to GPU folding, skip right to Step 3. Please note that if you are changing your hardware configuration on a machine that is already folding, it is courteous to let the existing work units finish by using the “finish” option on the client prior to re-arranging hardware. This keeps work units from being lost.

Step 0: System Requirements

Yes, we’re starting at zero, because computer indexing starts here too. Plus, before you even try this, you need the right stuff in the box.

Operating System

While Folding@Home supports many operating systems, this guide is aimed at Windows users. I’ll be using Windows 10, but the steps are the same for Windows 7.

Overall Computer

CPU

Give Me Efficiency or Give Me an Empty CPU Socket!

You do need to think about what goes in this socket, even if you’re GPU folding

Even though this is a guide about graphics card folding, the rest of your computer needs to be up to snuff to keep the card fed. Ideally, you want one dedicated CPU core for your overall Windows environment, plus one CPU core for each graphics card you want to run F@H on. So, for a 1-GPU computer, having two CPU cores available is optimal. A dual-GPU computer should have 3 cores available, a three-GPU computer should have four cores available, etc. In terms of clock rate, almost all modern processors above 2.0 GHz will work. Remember, we aren’t doing CPU folding here; the CPU just needs to be fast enough to keep the GPU fed.

Motherboard

Circuit City

Circuit City

Motherboards don’t matter too much, except that you should have a full-width PCI-Express x16 slot for each graphics card you want to fold on. When you get into really fast, new graphics cards like the RTX 2080, a PCI-E 3.0 x16 slot will ensure the data flows fast enough to the card. PCI-Express 2.0 bandwidth will work with these ultra-fast cards, but there will be a slight bottleneck. Note that I have never seen any bottlenecks with my GTX 1080 Ti on PCI-Express 2.0 x16 in Windows, but when adding a second card (using an x1 riser), I did see a slowdown on my Gigabyte 880-series socket AM3 board.

Memory

You should also aim to have 8 GB of RAM (16 ideally), just because Windows tends to be a resource hog. Some people can fold just fine with 4 GB, but for this guide I am assuming you want to be able to use the machine as well. Memory channel configuration and speed don’t matter very much for Folding@Home, especially on GPUs.

Hard Drives

Any old hard drive with 60 GB or so of free space will do. The F@H client takes up almost no space; the 60 GB of free space is really just what Windows 10 itself needs to run well, regardless of what the machine is being used for.

Internet Connection

Almost anything works, as long as it doesn’t drop out.

Power Supply

PC P&C PSU

PC Power & Cooling SILENCER PSU

This is a critical and often overlooked component in the world of computational computing. I’ve written many articles on power supplies, so feel free to browse through my site to learn more. In short, make sure your system has enough PSU wattage to drive the video card, based on the video card’s recommendation. You’ll also need to make sure your power supply has the correct auxiliary power cables (PCI-Express 6-pin and/ or 8-pin) to supply enough current to cards requiring supplemental power.

For multiple cards, you’ll need more nameplate PSU wattage. Power supplies should be 80+ Bronze certified or better to help deliver power efficiently, because no one likes wasting money on misused electricity (and this hurts the environment). Also, you should try and stick with major manufacturers, such as (but not limited to) Corsair, Antec, Seasonic, Cooler Master, PC Power & Cooling, Thermaltake, etc.

Here are some common computer configurations and a reasonable power supply wattage to drive them:

  • 1 x Low-End GPU (GTX 1050, RX 560, etc.) –> 380 Watt PSU
  • 1 x Mid-Range GPU (GTX 1060, RX 570, etc.) –> 450 Watt PSU
  • 1 x High-End GPU (GTX 1080, Vega 64, etc.) –> 550 Watt PSU
  • 2 x Mid-Range GPUs or 3 x Low-End GPUs –> 600 Watt PSU
  • 2 x High-End GPUs or 3 x Mid-Range GPUs –> 800 Watt PSU
  • 3 x High-End GPUs or 4 x Mid-Range GPUs –> 1000 Watt PSU
  • 4 x High-End GPUs (you’re crazy!) –> 1200+ Watt PSU

Saving the Planet Tip: Any PSU supplying an active load of 600 Watts or more should be 80+ Gold certified or better. This will minimize waste heat due to efficiency losses, which really start to add up for high power-draw computers.

Cooling

This is another overlooked requirement. Any computer doing 24/7 computations on a graphics card is going to get pretty toasty. Thankfully, most modern PC cases come with enough space and fans to deal with this. You’ll want at least 1 dedicated 120 mm exhaust fan (not including the PSU fan) and one 120 mm intake fan to keep the air flowing. If you have dual graphics cards, having an intake fan right on the side panel blowing on the cards is one of the best ways to keep a hot pocket of air from forming between the cards. Consider reference-style video cards (centrifugal 2-slot blower coolers) for multi-card setups to help dump the heat, since open-fan cards tend to just drown in their own heat if there isn’t enough airflow. I also recommend aftermarket coolers on CPUs, since your processor will be actively spooled up and feeding your graphics card. That said, CPU cooling doesn’t need to be overkill.

Icy Opteron 4184

NOCTUA OVERKILL!

Graphics Card

Graphics Card Showdown: EVGA Nvidia Geforce GTX 1050 TI vs. Gigabyte AMD Radeon HD 7970 GHz Edition

Graphics Cards: You’ll Need One

First off, you should actually have a discrete graphics card. While F@H might run on some onboard / APU graphics solutions, the performance won’t be worth it, and you might as well just run CPU folding.

Folding@Home works on many discrete graphics cards that support OpenCL, but not all cards are supported. AMD Radeon HD 5xxx cards and Nvidia GeForce 4xx cards and newer are currently supported, but that can always change. See the project’s system requirements for a complete list. I personally recommend using Nvidia 9xx series cards or AMD RX 5xx cards or newer, since these are more efficient than older hardware. My review of the GeForce 1080 Ti has some plots on efficiency and performance that might be helpful if you are selecting a card specifically for folding. Make sure you have the latest drivers for your card from either AMD or Nvidia.

Step 1: System Prep

Before even downloading Folding@Home, you should do a few basic things just to make sure the system is going to be stable for heavy computations. On the software side, this means updating drivers, making sure virus definitions and Windows updates are up to date, etc. On the hardware side, I recommend fully air-canning the dust out of your machine to optimize cooling. If the computer is older and the GPU you plan to use has been installed for a while, it’s worth taking the graphics card out and hitting it with some compressed air from all angles to clean out the heat sinks.

Step 2: Download and Install V7 Client

The Folding@Home V7 client can be found here:

The current client version is 7.5.1. Go ahead and install it. For this part, it’s basically just following the prompts. F@H’s default Windows install guide works well enough, and you can read that here. All of this can be configured later within the client (and this will be required for GPU folding). So, I’m linking to the standard install guide instead of regurgitating the steps, because I’m lazy and I want this to be done identically to how the F@H Consortium (formerly Stanford) recommends it be done. If you don’t want to fold anonymously, select the “Set up an identity” button. You’ll want to pick a user name and enter a team number if you have one you’d like to join.

For example, if you wanted to join our team, you’d enter number 54345 in the team number field to join team Nuclear Wessels!

A note about passkeys: you want one of these if you want to earn lots of points and compete on the F@H leaderboards. A passkey is a secure key that makes sure your points are your own (i.e. no one is using your username to generate points elsewhere). You need a passkey to be eligible for the Quick Return Bonus (more points given to users who do science quickly); you become eligible for the bonus once you have successfully completed ten work units with a valid passkey. You can get a passkey here (but you don’t have to do this right away; just like configuring your GPUs, it can be done later).

Step 3: Configure the Client for GPU Folding

FAH_Molecule

Now we are going to edit settings within the Advanced Control section of the Folding@Home client. To get here, look at your Windows task bar (next to the clock). Once F@H is installed, there should be a little molecule there. Right-click that bad boy and select “Advanced Control” to open the local client window.

Right-Click FAH

This opens up the client view. Here is what mine currently looks like (with GPU slots configured). Depending on how you got here, you might or might not have a team name and user identity displayed, and you might or might not have CPU folding enabled.

F@H Control V7

Go ahead and click the “Configure” button in the top-left of the window. Go to the “Identity” tab first.

Identity

Here, you can change any of the user info and team name info you entered when you installed the V7 client. You can also enter a Passkey if you have one (for those sweet, sweet Quick Return Bonus Points!).

Pitch: I’d be honored if you joined team # 54345 (Nuclear Wessels). We are currently doing everything we can to fight the COVID-19 coronavirus.

Nuclear Wessels Meme

Next, go one tab over to “Slots”. Here, you can see what devices Folding@Home plans to use (either CPU or GPU). For my setup, I have removed all CPU slots and added two GPU slots (one slot for my 980 Ti and one for my 1080 Ti). If you originally started folding on the CPU and want to switch to GPU folding, you can delete your CPU slot here and add GPU slot(s) for your graphics card(s).

Note: If you want to do mixed hardware folding (CPU + GPU), I will talk about that in Step 4.

Slots

The slot configuration window opens up when you add or edit a slot. Here are the options.

Slot Config

Selecting the GPU button and leaving all the index settings at -1 is a good place to start. Nine times out of ten, the client will properly detect graphics cards this way. For my computer, adding two GPU slots with settings like this resulted in it properly detecting and folding on my installed GTX 980 Ti and GTX 1080 Ti cards.

In rare cases, the client might get confused. This happens in systems with onboard graphics (such as with AMD APUs). What happens is you are trying to fold on your discrete graphics card, and instead the F@H client is running the GPU slot on the APU. When this happens, I’ve found the easiest thing to do is reboot the computer, go into the BIOS, and disable the APU graphics from there, so that the client can’t even see the APU. Thus, the GPU slot with a -1 index defaults to the discrete graphics card.

Alternatively, you can use the gpu-index, opencl-index, and cuda-index boxes to try and get the slot to run on the correct graphics card. This is a trial and error process that is beyond the scope of this guide (leave me a comment if you need help with this, or ask someone in the Folding@Home Forums).
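
For reference, those index overrides are stored as slot options in the client’s config.xml. A hedged example of what a manually pinned GPU slot might look like (the index values are illustrative; yours depend on your hardware, and the exact schema can vary by client version):

    <slot id='0' type='GPU'>
      <gpu-index v='0'/>    <!-- which physical GPU this slot binds to -->
      <opencl-index v='0'/> <!-- OpenCL device index, if it needs forcing -->
      <cuda-index v='0'/>   <!-- CUDA device index, if it needs forcing -->
    </slot>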

Advanced Slot Options

The Extra Slot Options (expert only) box on the bottom can sometimes help you eke a bit more performance out of the GPU slots. However, your mileage may vary. You can add or remove slot options with the + and – buttons on the bottom-right.

The settings I tend to add are these:

Advanced Options

Here, client-type advanced lets me get “late stage beta” work units, which might be a bit more unstable than normal work units, yet this helps the Folding@Home Consortium get new projects tested sooner. Max-Packet-Size Big (other options are “normal” and “small”) lets me download large molecules that will push the system a bit harder (more VRAM needed, more internet bandwidth, etc). Pause-on-start (value of “true” or “false”) tells the system to pause the folding slot when the computer boots (instead of automatically folding as soon as the machine is on). This is nice for when I want to kick folding off manually. Set this to “false” or leave it blank if you want the computer to fold automatically after a restart.
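
In config.xml form, the same three options look something like this (a sketch based on my settings above; the exact schema may vary by client version):

    <slot id='1' type='GPU'>
      <client-type v='advanced'/> <!-- accept late-stage beta work units -->
      <max-packet-size v='big'/>  <!-- allow large work units to download -->
      <pause-on-start v='true'/>  <!-- wait for a manual start after boot -->
    </slot>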

For a detailed list of these slot options, see the config guide here. Note: some of this is out of date.

Step 4 (Optional): Configure a CPU Slot as well

If you have CPU cores to spare, you can add a CPU folding slot in addition to the GPU slots. I recommend leaving 1 CPU core free for Windows background tasks (unless you are making a dedicated folding rig and don’t mind it being a bit slow to use). You should also keep 1 CPU core free for feeding each GPU that you have in your system. So, for my 8-core AMD FX-8320e with my two graphics cards, I could do something like this:

Total CPU Cores: 8

Cores needed for Windows: 1

Cores Needed for GPU Slots: 2 (one for each GPU)

Cores Remaining: = 8-1-2 = 5

So, theoretically, I can set my CPU folding slot to use 5 CPU cores. Now, an interesting fact: in multi-core computing, prime numbers of threads like 3, 5, and 7 do not work so well. Folding@Home also doesn’t do well with high prime numbers, or multiples thereof (such as 14 threads, which is a multiple of the prime number 7). It has to do with how all the data threads are stitched together.

For example, you get similar performance folding with 4 CPU cores as with 5 (4 is a nice base 2 number that computers like). In my case, for a non-dedicated folding rig, I set up a CPU slot with 4 CPU cores enabled, leaving two cores to handle whatever else the computer is doing and 2 cores to feed the graphics cards. Incidentally, if this were a guide about just setting up CPU folding, I would leave this box at “-1”.

4 CPU Core Config

Now, just hit the OK button and then save the slot configuration.

Save Slot Config

Step 5: Observe Slot Descriptions in the Client

Now, I can see that I have three slots (two GPU and one CPU) listed in the client window.

Ready Slots

Here, you should see that the CPU slot is using the number of threads you told it to use (4, in my case), and that the graphics cards are correctly identified. This all looks good.

Step 6: Watch it run!

Once you have your slots configured, you should be able to sit back and watch your computer fight disease with everything it’s got. One last thing: a helpful tool for graphics card monitoring is something like MSI Afterburner, or AMD’s built-in Wattman. It’s good to use these to make sure your card has enough thermal headroom to perform (keep it under 80 degrees C if you can!). If your card is thermally throttling, you’ll see an impact on Folding@Home PPD. I find that setting custom fan curves, or just setting the fan to run a bit faster than it normally would, is often enough to eliminate this.
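
If you prefer logging over watching a GUI, Nvidia cards can also be polled from a script. Here’s a minimal sketch (Nvidia-only, and it assumes nvidia-smi is on your system path):

    # Minimal sketch: log GPU temperature, power, and load once a minute.
    # Nvidia-only; assumes nvidia-smi is on the system path.
    import subprocess, time

    while True:
        subprocess.run(["nvidia-smi",
                        "--query-gpu=temperature.gpu,power.draw,utilization.gpu",
                        "--format=csv,noheader"])
        time.sleep(60)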

Troubleshooting

The V7 client installer does the best job at detecting your specific graphics hardware during initial software installation. If you added a new graphics card that is not recognized, you should do a clean re-install of the V7 client. Write down your Name, Team Number, and Passkey, uninstall the client completely (including data), reinstall, and see if the new card is detected.

Some new graphics cards are also not immediately supported upon release. For example, the Radeon 5700 XT is only recently gaining support with advanced beta work units, but work is progressing to get this card fully supported (as of 3/2020). You can read up on which cards are supported and which aren’t yet on the GPU Whitelist Thread.

Leave me a comment if…

Did this guide help you? Did I miss something? Let me know how I can help and make this better by leaving a comment. Thanks!

-Chris

Addendum: Helpful Links to Other Tutorials

HFM.net – A remote monitoring program for F@H Clients

HFM.net monitoring tutorial (Youtube) – Video Tutorial by Frax1006

Teamviewer Guide – A remote desktop solution to let you log into folding machines and monitor / configure them. This is an excellent write-up by Pyroball.

Official F@H Advanced User Custom Installation Guide

Official F@H Configuration Guide

Overclocker’s Club F@H Guide