Welcome back everyone! In my last post, I used my rebuilt benchmark machine to revisit CPU folding on my AMD Ryzen 9 3950x 16-core processor. This article is a follow-on. As promised, this includes the companion power consumption and efficiency plots for thread settings of 1-32 cores. As a quick reminder, I did this test with multi-threading (SMT) on, but with Core Performance Boost disabled, so all cores are running at the base 3.5 GHz setting.
The Folding@Home distributed computing project has come a long way from its humble disease-fighting beginnings back in 2000. The purpose of this testing is to see just how well the V7 CPU client scales on a modern, high core-count processor. With all the new Folding@Home donors coming onboard to fight COVID, having some insight into how to set up the configuration for the most performance is hopefully helpful.
For this test, I simply set the # of threads the client can use to a value and ran five sequential work units. I averaged the performance (Points Per Day), but I also plot the individual work unit performance values to give you a sense of the variation. Since the Ryzen 9 3950x supports 32 threads, I essentially ran 160 tests. Since I wanted the Folding@Home Consortium to get useful data in their fight against COVID-19, I let each work unit run to completion, even though I only need them to run to about 10-20% complete to get an accurate PPD estimate from the client.
So, without further blabbing on my part, here is the graph of Folding@Home performance vs. thread count in Windows 10 on the Ryzen 9 3950x
Here, the solid blue line is the averaged performance, and the gray circles are the individual tests. The dashed blue lines represent a statistical 95% confidence interval, which is computed based on the variation. The expected Points Per Day (PPD) of a work unit run on the 3950x is expected to fall within this band 95% of the time.
My first observation is, holy crap! This is a fast processor. Some work units at high thread counts get really close to 500K PPD, which for me has only been achievable by GPU folding up to this point.
My second observation is that there is a lot of variation between different work units. This makes sense, because some work units have much larger molecules to solve than others. In my testing, I found the average variation of all 160 tests to be 12.78%, with individual variance up to 25%.
My third observation is that there seems to be two different regions on this plot. For the first half, the thread count setting is less than the number of physical cores on the chip, and the results are fairly linear. For the second half, the thread count setting is higher than the number of physical cores on the chip (thus forcing the CPU to virtualize those cores using SMT). Performance seems to fall off when the CPU cores become fully saturated (threads = 16), and it takes a while to climb out of the hole (threads = 24 starts showing some more gains).
As a side note, the client does not actually run all of these thread count settings, since some prime numbers, especially large primes (7, 11) and multiples thereof cause numerical issues. For example, when you try to run a 7-thread solve, the client automatically backs the thread count down to 6. You can see warnings in the log file about this when it happens.
I noted all the relevant thread counts where this happens on the x-axis of the plot. Theoretically, these should be equivalent settings. The fact that the average performance varies a bit between them is just due to work unit variation (I’d have to run hundreds of averages to cancel all the variation out).
Finally, I noticed that the highest PPD actually occurred with a thread count of 30 (PPD = 407200) vs a thread count of 32 (PPD = 401485). This is a small but interesting difference, and is within the range of statistical variation. Thus I would say that setting the thread count to 30 vs 32 provides the same performance, while leaving two CPU threads free for other tasks (such as GPU folding…more on that later!).
Power consumption numbers for each thread setting were taken at the wall, using my P3 Kill A Watt meter. Since the power numbers tend to walk around a bit as the computer works, it’s hard to get an instantaneous reading. Thus these are “eyeball averaged”. There was enough change at each CPU thread setting to clearly see a difference (not counting those thread settings that are actually equivalent to an adjacent setting).
The total measured power consumption rose fairly linearly from just under 80 watts to just under 160 watts. There’s not too much surprising here. As you throw more threads at the CPU, it clocks up idle cores and does more work (which causes more transistors to switch, which thus takes more power). This seems pretty believable to me. At the high end, the system is drawing just under 160 watts of power. The AMD Ryzen 9 3950x is rated at a 105 watt TDP, and with CPB turned off it should be pretty close to this number. My rough back of the hand calculation for this rig was as follows:
- CPU Loaded Power = 105 Watts
- GPU Idle Power (Nvidia GTX 1650) = 10 Watts
- Motherboard Power = 15 Watts
- Ram Power = 2 watts * 4 sticks = 8 watts
- NVME Power = 2 watts * 2 drives = 4 watts
- SSD Power = 2 watts
Total Estimated Watts @ F@H CPU Load = 144 Watts
Factor in a boat load of case fans, some silly LED lights, and a bit of PSU efficiency hit (about 90% efficient for my Seasonic unit) and it’ll be close to the 160 watts as measured.
This being a blog about saving the planet while still doing science with computers, I am very interested in energy efficiency. For Folding@Home, this means at doing the most work (PPD) for the least amount of power (watts). So, this plot is just PPD/Watts. Easy!
Similar to the PPD plot, this efficiency plot averages five data points for each thread setting. I chose to leave off the individual points and the confidence interval, because that looks about the same on this plot as it does on the PPD plot, and leaving all the clutter off makes this easier to read.
As with the PPD plot, there seem to be two regions on the efficiency curve. The first region (threads less than 16) shows a pretty good linear ramp-up in efficiency as more threads are added. The second region (threads 16 or greater) is what I’m calling the “core saturation” region. Here, there are more threads than physical cores, and efficiency stays relatively flat. It actually drops off at 16 cores (similar to the PPD plot), and doesn’t start improving again until 24 or more threads are allocated to the solver.
This plot, at first glance, suggests that the maximum efficiency is realized at # of threads = 30. However, it should be noted that work unit variation still has a lot of influence, even with reporting results of a 5-sample average. You can see this effect by looking at the efficiency drop at threads = 31. Theoretically, the efficiency should be the same at threads = 31 and threads = 30, because the solver runs a 30-thread solution even when set to 31 to prevent domain decomposition.
Thus, similar to the PPD plot, I’d say the max efficiency is effectively achieved at thread counts of 30 and 32. My personal opinion is that you might as well run with # of threads = 30 (leaving two threads free for other tasks). This setting results in the maximum PPD as well.
Weird Results at Threads = 16-23
Some of you might be wondering why the performance and efficiency drops off when the thread count is set to the actual number of cores (16) or higher. I was too, so I re-ran some tests and looked at what was happening with AMD’s built-in Ryzen Master tool. As you can see in the screen shot below, even though the # of threads was set to 18 in Folding@Home (a number greater than the 16 physical cores), not all 16 cores were fully engaged on the processor. In fact, only 14 were clocked up, and two were showing relatively lazy clock rates.
I suspect what is happening is that some of the threads were loaded onto “virtual” CPU cores (i.e. SMT / hyper threading). This might be something Windows 10 does to preserve a few free CPU cores for other tasks. In fact, I didn’t see all of the cores turbo up to full speed until I set Folding@Home’s thread count to 24. This incidentally is when performance starts coming back in on the plots above.
This weird SMT / Hyper-threading behavior is likely what is responsible for the large drop-off / flat part of the performance and efficiency curves that exists from thread count = 16 to 23. As you can see in the picture below, once you fully load all the available threads, the CPU frequencies on each core all hit the maximum value, as expected.
The following plots compare overall performance, power consumption, and efficiency of my new AMD Ryzen 9 3950x Folding@Home rig to other hardware configurations I have tested so far.
As you can see from the plot below, the Ryzen 9 3950x running a 32-thread Folding@Home solve can compete with relatively modern graphics cards in terms of raw performance. High-end GPUs will still offer more performance, but for a processor, getting over 400K PPD is very impressive. This is significantly more PPD than the previous processors I have tested (AMD Bulldozer-based FX-8320e, AMD Phenom II X6 1100t, Intel Core2Quad Q6600, etc). Admittedly I have not tested very many CPUs, since this is much more involved than just swapping out graphics cards to test.
From a total system power consumption standpoint, my new benchmark machine with the AMD Ryzen 9 3950x has a surprisingly low total power draw when running Folding. Another interesting point is that since the 3950x lacks onboard graphics, I had to have a graphics card installed to get display. In my case, I had the Nvidia GTX 1650 installed, since this is a relatively low power consumption card that should provide minimal overhead. As you can see below, folding on the 3950x CPU (with the 1650 GPU idle) uses nearly the same amount of power as folding on the 1650 GPU (with the 3950x idle).
Efficiency is the point of this blog, and in this respect the 3950x comes in towards the upper middle of the pack of hardware configurations I have tested. It’s definitely the most efficient processor I have tested so far, but graphics cards such as the 1660 Super and 1080 Ti are more efficient. Despite drawing more total power from the wall, these high-end GPUs do a lot more science.
Still, a PPD/Watt of over 2500 is not bad, and in this case the 3950x is more efficient than folding on the modest GPU installed in the same box (the Nvidia GTX 1650). Compared to the much older AMD FX-8320e, the Ryxen 9 3950x is 14x more efficient! What a difference 7 years can make!
The 16-core, 32-thread AMD Ryzen 9 3950x is one fast processor, and can do a lot of science for the Folding@Home distributed computing project. Although mid to high-end graphics cards such as the 1080 Ti ($450 on the used market) can outperform the $700 3950x in terms of performance and efficiency, it is still important to have a smattering of high-end CPU folding rigs on the Folding@Home network, because some molecules can only be solved on CPUs.
There is a general trend of increasing efficiency and performance as the # of CPU threads allocated to Folding@Home increases. For the Ryzen 9 3950x, using a setting of 30 or 32 threads is recommended for maximum performance and efficiency. If you plan on using your computer for other tasks, or for simultaneously folding on the GPU, 30 threads is the ideal CPU slot setting.
Please Support My Blog!
If you are interested in measuring the power consumption of your own computer (or any device), please consider purchasing a P3 Kill A Watt Power Meter from Amazon. You’ll be surprised what a $35 investment in a watt meter can tell you about your home’s power usage, and if you make a few changes based on what you learn you will save money every year! Using this link won’t cost you anything extra, but will provide me with a small percentage of the sale to support the site hosting fees of GreenFolding@Home.
If you enjoyed this article, perhaps you are in the market for an AMD Ryzen 9 3950x or similar Ryzen processor. If so, please consider using one of the links below to buy one from Amazon. Thanks for reading!
In the next article, I’ll disable multithreading (SMT) to see the effect of virtualized CPU cores on Folding@Home performance.
Later, I plan to enable core performance boost on the 3950x to see what effect the automatic clock frequency and voltage overclocking has on Folding@Home performance and efficiency.