GPU plot generator v4.1.1 (Win/Linux)



  • Hey guys,

    Has anyone else had an issue with gpuPlotGenerator where, in buffer mode, it starts out using pretty much 100% of my GPU (which is what I want) at around 30k-50k nonces/minute and then drops to around 12k nonces/minute? Sometimes it stays at the higher rate and I can get 1 TB plotted in an hour or less. Other times it drops and only utilizes around 17% of my GPU. It's weird that this doesn't happen all the time, and I was wondering if anyone else has had this problem.

    Also, I have an MSI Gaming X RX 480 8GB GPU.

    My devices.txt is:
    0 0 8096 64 8192

    My Batch file looks like this:
    gpuPlotGenerator.exe generate buffer j:\Burst\plots\3817333640460646654_1026441216_4091904_36864
    [Image: GPU usage graph, 0_1488893107768_gpu.png]

    As you can see there are a couple spikes here and there, but it is barely using my GPU.

    The only gain I have managed to get so far came from running gpuPlotGenerator as admin, which seemed to raise GPU utilization to 23%-27% and the rate to 18k-20k nonces/minute.

    Any help would be greatly appreciated!


  • admin

    @KB-Bountyhunter The plot generator can only generate plots as fast as your drive can write them. It starts at 30-50k because the first nonces are generated in memory; once it is writing to disk, the rate may drop to the write speed of your drive. You could plot to multiple drives at once to increase plotting speed.



  • Hi.

    I've been using Xplotter for some time, but recently got an RX 480 card.

    One feature I loved about Xplotter was that if you set the nonce number to 0, it calculates the biggest possible plot file and fills the drive completely.

    Will this be a feature in a future update?

    An argument to automatically split the file would also be great.
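
    A rough sketch of the "fill the drive" calculation described above, not Xplotter's actual code. It assumes 256 KiB per nonce and that the nonce count should be a whole multiple of the stagger; `max_nonces` and `max_nonces_for_path` are hypothetical helper names.

```python
import shutil

NONCE_BYTES = 256 * 1024  # one nonce occupies 256 KiB on disk


def max_nonces(free_bytes, stagger):
    """Largest nonce count that fits in free_bytes and is a multiple of stagger.

    Rounding down to a multiple of stagger is an assumption of this sketch.
    """
    nonces = free_bytes // NONCE_BYTES
    return nonces - nonces % stagger


def max_nonces_for_path(path, stagger):
    """Same calculation, using the free space currently reported for path."""
    return max_nonces(shutil.disk_usage(path).free, stagger)
```

    Calling `max_nonces_for_path(plot_dir, 8192)` would give a nonce count to put in the plot file name; leaving a little headroom for filesystem overhead is probably wise.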



  • Meh, idk, I tried plotting with an HD6870 but the system started to freeze.



  • Hi,

    Is this project still being developed?
    I'm not really good at C++ since most of my work is in Java, but I think there are still some points that could be improved. I looked at my GPU and HDD usage and found it very odd to see about 50% average GPU usage and 50% average HDD usage, because generation and writing are not done in parallel...

    For example, it should be possible to add two features:

    1. A hybrid mode, which fills part of the HDD (like direct mode does for the full plot size) while the GPU calculates the plot, and then writes it.
    2. Reserve twice (4x?) as much system RAM, then generate part after part (which keeps the GPU running all the time) while the HDD writes those parts.


  • @tco42 I think in most cases the GPU will generate data much faster than the HDD can write it, so it would still sit idle.



    Yeah, that's part of the problem... Let's say your GPU generates 24,000 nonces/min (equal to 100 MB/s) and your HDD can write 100 MB/s, so 1 TB should take about 10,000 seconds. In reality it looks like this (for example, at a 4 GB stagger size):

    • GPU generates 4 GB of data in 40 seconds
    • HDD writes 4 GB of data in 40 seconds
    • GPU generates the 2nd set of 4 GB of data in 40 seconds
    • HDD writes the 2nd set of 4 GB of data in 40 seconds
      etc.
      -> it will take about 20,000 seconds to finish (250 blocks x 80 s)

    If you add additional RAM to hold a second set of data, it looks like this:

    • GPU generates 4 GB of data in 40 seconds
    • HDD writes 4 GB of data while the GPU generates the 2nd set of data in 40 seconds
    • HDD writes 4 GB of data while the GPU generates the 3rd set of data in 40 seconds
      etc.
      -> it will only take about 10,000 seconds to finish (250 x 40 s plus the first 40 s of generation), so about 100% faster

    If your GPU is faster than your HDD's write speed, you would still be able to keep your HDD writing 100% of the time, so 48,000 nonces/min and 100 MB/s write speed would look like this:

    • GPU generates 4 GB of data in 20 seconds
    • HDD writes 4 GB of data (40 s) while the GPU generates the 2nd set of data (20 s) and waits (20 s), in 40 seconds
    • HDD writes 4 GB of data (40 s) while the GPU generates the 3rd set of data (20 s) and waits (20 s), in 40 seconds
      etc.
      -> it will take about 10,000 seconds to finish (250 x 40 s plus the first 20 s of generation)

    EDIT:
    A faster GPU would of course change the first example to 40 s HDD and 20 s GPU per block, which results in about 15,000 seconds, so the optimized code would take about 33% less time.

    EDIT 2:
    You should be able to optimize the plot while it's in system RAM and waiting to be written, without any additional time.
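
    The arithmetic in this post can be checked with a small model. This is only a sketch under the post's own assumptions (decimal units, fixed rates, exactly two buffers); `plot_times` is a hypothetical helper and does not describe how gpuPlotGenerator is implemented.

```python
import math


def plot_times(total_gb, block_gb, gpu_mb_s, hdd_mb_s):
    """Return (serial_seconds, overlapped_seconds) for plotting total_gb.

    Uses decimal units as in the post: 1 GB = 1000 MB.
    """
    blocks = math.ceil(total_gb / block_gb)
    gen = block_gb * 1000 / gpu_mb_s    # seconds to generate one block
    write = block_gb * 1000 / hdd_mb_s  # seconds to write one block

    # Serial: GPU and HDD take turns, so every block costs gen + write.
    serial = blocks * (gen + write)

    # Two buffers: only the first generation and the last write are not
    # overlapped; in between, the slower of the two stages sets the pace.
    overlapped = gen + (blocks - 1) * max(gen, write) + write
    return serial, overlapped


equal_rates = plot_times(1000, 4, 100, 100)  # 24,000 nonces/min GPU
faster_gpu = plot_times(1000, 4, 200, 100)   # 48,000 nonces/min GPU
```

    With these inputs the model gives 20,000 s serial versus 10,040 s overlapped for equal rates, and 15,000 s versus 10,020 s for the faster GPU.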



  • What the hell is this supposed to mean:

    Install the build-essential and g++ packets. Install OpenCL (available in the manufacturer SDKs). You may have to install the opencl headers ([apt-get install opencl-headers] on Ubuntu).

    Modify the [PLATFORM] variable to one of [32] or [64] depending on the target platform. Modify the [OPENCL_INCLUDE] and [OPENCL_LIB] variables of the Makefile to the correct path. Example:

    OPENCL_INCLUDE = /opt/AMDAPPSDK-2.9-1/include
    OPENCL_LIB = /opt/AMDAPPSDK-2.9-1/lib/x86_64

    ????



  • @Bradiss69 said in GPU plot generator v4.0.3 (Win/Linux):

    Hello everyone. I decided to give this GPU plotting method a try. Once I got through the initial stages of prepping my system I recall there being a question of whether gpuPlotGenerator 4.0.3 supported CUDA or not.

    It is OpenCL, but the NVIDIA CUDA drivers include OpenCL support.



  • So I came across 4 Fermi M2090, installed CentOS-6.9, the toolchain, …, and was able to build gpuPlotGenerator-4.0.4

    #bin/gpuPlotGenerator.exe listDevices 0
    Id:                          3
    Type:                        GPU
    Name:                        Tesla M2090
    Vendor:                      NVIDIA Corporation
    Version:                     OpenCL 1.1 CUDA
    Driver version:              375.66
    Max clock frequency:         1301MHz
    Max compute units:           16
    Global memory size:          5GB 946MB 704KB
    Max memory allocation size:  1GB 492MB 688KB
    Max work group size:         1024
    Local memory size:           48KB
    Max work-item sizes:         (1024, 1024, 64)
    

    BUT: Plotting in buffer mode just hangs at the last "buffer block" - up to that point all is good, GPUs are busy, the output file grows. Then the last piece is never written, GPUs and plotter process are idle.

    This command line produces a 14 GiB file and omits the last 2 GiB:

    #bin/gpuPlotGenerator.exe generate buffer /tmp/12345678901234567890_738197504_65536_8192
    bin/gpuPlotGenerator.exe: /usr/local/cuda-8.0/targets/x86_64-linux/lib/libOpenCL.so.1: no version information available (required by bin/gpuPlotGenerator.exe)
    -------------------------
    GPU plot generator v4.0.4
    -------------------------
    Author:   Cryo
    Bitcoin:  138gMBhCrNkbaiTCmUhP9HLU9xwn5QKZgD
    Burst:    BURST-YA29-QCEW-QXC3-BKXDL
    ----
    Loading platforms...
    Loading devices...
    Loading devices configurations...
    Initializing generation devices...
        [0] Device: Tesla M2090 (OpenCL 1.1 CUDA)
        [0] Device memory: 512MB
        [0] CPU memory: 512MB
    Initializing generation contexts...
        [0] Path: /tmp/12345678901234567890_738197504_65536_8192
        [0] Nonces: 738197504 to 738263039 (16GB 0MB)
        [0] CPU memory: 2GB 0MB
    ----
    Devices number: 1
    Plots files number: 1
    Total nonces number: 65536
    CPU memory: 2GB 512MB
    ----
    Generating nonces...
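
    The sizes in this log follow from the plot file name, assuming the usual accountId_startNonce_noncesNumber_stagger naming and 256 KiB per nonce. A minimal sketch (`PlotName`, `parse_plot_name` and `describe` are hypothetical names, not part of gpuPlotGenerator):

```python
from typing import NamedTuple

NONCE_KIB = 256  # one nonce occupies 256 KiB


class PlotName(NamedTuple):
    account_id: int
    start_nonce: int
    nonces: int
    stagger: int


def parse_plot_name(name):
    """Split 'account_start_nonces_stagger' into its four integer fields."""
    account_id, start_nonce, nonces, stagger = (int(p) for p in name.split("_"))
    return PlotName(account_id, start_nonce, nonces, stagger)


def describe(plot):
    """Derive the last nonce, the file size and the CPU buffer size in GiB."""
    return {
        "last_nonce": plot.start_nonce + plot.nonces - 1,
        "file_gib": plot.nonces * NONCE_KIB / 1024**2,
        "buffer_gib": plot.stagger * NONCE_KIB / 1024**2,
        "stagger_blocks": plot.nonces // plot.stagger,
    }


info = describe(parse_plot_name("12345678901234567890_738197504_65536_8192"))
```

    For the file name above this yields a last nonce of 738263039, a 16 GiB file and a 2 GiB buffer, matching the log. It also gives 8 stagger blocks of 2 GiB each, so the 14 GiB file on disk would correspond to 7 of the 8 blocks having been written.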
    

    Oh, and I played around with devices.txt, single+multiple GPU, various localWorkSize + hashesNumber, to no avail;

    0 0 2048 128 8192
    0 0 2048 256 8192
    0 0 4096 512 8192
    ...
    

    globalWorkSize (corresponding to RAM on GPU) MUST be under 4 GiB, although these M2090 have 6. -?-
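
    To see how a devices.txt line maps to GPU memory, here is a small sketch. It assumes the five fields are platformId, deviceId, globalWorkSize, localWorkSize and hashesNumber (the first two names are an assumption, the others appear in this thread) and that each nonce in globalWorkSize needs 256 KiB on the device. `parse_device_line` and `device_memory_mib` are hypothetical helpers.

```python
NONCE_KIB = 256  # device memory assumed per nonce in globalWorkSize

FIELDS = (
    "platform_id",       # assumed name
    "device_id",         # assumed name
    "global_work_size",
    "local_work_size",
    "hashes_number",
)


def parse_device_line(line):
    """Turn one devices.txt line into a dict of named integer fields."""
    values = [int(v) for v in line.split()]
    if len(values) != len(FIELDS):
        raise ValueError("expected %d fields, got %d" % (len(FIELDS), len(values)))
    return dict(zip(FIELDS, values))


def device_memory_mib(cfg):
    """Device memory implied by globalWorkSize, in MiB."""
    return cfg["global_work_size"] * NONCE_KIB / 1024


first_try = device_memory_mib(parse_device_line("0 0 2048 128 8192"))
```

    The line `0 0 2048 128 8192` comes out at 512 MiB, which matches the "Device memory: 512MB" shown in the log above.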

    Yes, /tmp has 32 GiB of free space.

    Ideas, anyone ?



  • @Akito you have to install the build toolchain (compiler, kernel-headers, nvidia drivers and cuda-devel) and build it yourself.

    I did this (NOT a dev type) with the help of google within 2 hours. Add 3 hours trying to install centos7 (crashes?! wtf?) and going back to centos6. Did I mention my hatred for linux ?

    Or you download the binaries.



  • @tco42 said in GPU plot generator v4.0.3 (Win/Linux):

    EDIT 2:
    you should be able to optimize the plot while its in system ram and waiting to be writen without any additional time

    It does optimize all data collected from GPUs, that is the reason the main process needs memory.

    You define the GPU's memory (3rd parameter in devices.txt), and you give a stagger value for the plotfile - THAT defines the amount of RAM your process needs for collection/reordering.

    In order to keep computing, you'd need twice the RAM for the main process (plus synchronization/barrier logic). This practically halves your stagger, which is not good for later optimization, as it influences the read speed of your hard disk (lower stagger -> more head movement).

    The source is there, hack away.



  • @vaxman I already did this before I posted what you replied to. It doesn't work this way. At all. It literally doesn't work in any way. No matter what I tried to install from all the things you mentioned, it always failed with a huge dependency hell, that isn't even manually fixable. (This of course applies to the machine that has actually a high-end graphics card.)



  • @vaxman I'm still not sure why there is such low GPU usage on my setup...
    I'm currently using 2x 1080 Ti, 2x Xeon E5-2620, 128 GB RAM and 12x 8 TB HDDs.

    I can get more than 200,000 nonces/min when plotting to a single Samsung 960 Pro, which should be the maximum those GPUs can calculate. When plotting to those 12 HDDs, my speed drops to about 50,000 nonces/min. It seems like the GPUs are only working when the HDDs have finished writing, and the HDDs are only writing when the GPUs have finished calculating.



  • @tco42

    It seems like the gpus are only working when the hdds finished writing and the hdds are only writing when the gpus finished calculating.

    Yes, this is the way it works. You outlined the reasons above.
    There is one special case, though:

    1- when you have the same globalWorkSize in devices.txt and stagger value for output file
    2- your OS is able to ingest and buffer at least this amount of data
    3- you write that file asynchronously to the filesystem - i.e. don't wait for confirmation.

    (1) minimizes data shuffling, but this is not too important as memory bandwidth is high today
    (2)+(3) are the important things here, and the programmer has no control over that.

    If your output channel is blocked, you can't get data out. You see that in your own installation:
    When the target is fast enough, the data just streams out, no major hiccups.
    You say you get ~800 MiB/s (200k nonce/min) when your plot target is a SSD. This is good.

    When the target is not fast enough or simply has more latency, the OS needs to buffer more.
    If you have an array set up that should be able to ingest 800 MB/s, you are seeing effects
    of filesystem design on moving disk heads.

    So let's see.

    1. Are the disks single or in any sort of array ? >200 MB/s plot speed dictates some array.
    2. Which filesystem ?


  • @tco42 said in GPU plot generator v4.0.3 (Win/Linux):

    I'm currently using 2* 1080Ti, 2* Xeon E5-2620, 128gb ram and 12* 8Tb HDDs. I can get more than 200k nonce /min when plotting to a single Samsung 960-Pro which should be the maximum those gpus can calculate. when plotting to those 12 HDDs my speed drops to about 50k Nonce /Min. It seems like the GPUs are only working when the hdds finished writing and the HDDs are only writing when the GPUs finished calculating.

    tco42. ~ You mention writing plots to a Samsung-SSD. Is that a large-capacity device, and do you then transfer the completed plots over to your HDD-8Tb workers? Plus, are you able to combine the 2x GPUs via SLI & does that work any better? Also, are you able to plot to several HDD (simultaneously) from those powerful video cards, as others suggest, or is that just not viable?

    Thx.



  • @vaxman

    1. Those discs (HGST Deskstar 8 TB) are all connected to a 24-port SAS HBA (https://www.broadcom.com/products/storage/host-bus-adapters/sas-9305-24i).
      I tested the maximum speed of the HBA by copying from my first 12 HDDs to the second set of 12 HDDs, which resulted in about 200 MB/s per drive -> 2.4 GB/s total read and write with all HDDs connected to the same HBA.

    2. I'm currently using NTFS.

    @BeholdMiNuggets
    Those plots were just some tests to find the bottleneck that limits my plotting.
    You don't need to connect the GPUs using SLI; the difference is minor. Those GPUs should be able to push lots of data to the drives (200k nonces/min = 50 GB/min = 20 min/TB), but there is a bottleneck (maybe in the plotter, maybe in Windows) that I'm still trying to find.
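
    The rate conversions used in this thread can be reproduced with a couple of lines. This sketch assumes 256 KiB per nonce and binary units; the function names are hypothetical.

```python
NONCE_KIB = 256  # one nonce occupies 256 KiB


def nonces_per_min_to_mib_s(rate):
    """Convert a plotting rate in nonces/min to a data rate in MiB/s."""
    return rate * NONCE_KIB / 1024 / 60


def minutes_per_tib(rate):
    """Minutes needed to plot 1 TiB at the given rate in nonces/min."""
    gib_per_min = rate * NONCE_KIB / 1024**2
    return 1024 / gib_per_min


ssd_rate = nonces_per_min_to_mib_s(200_000)  # the single-SSD case
hdd_rate = nonces_per_min_to_mib_s(50_000)   # the 12-HDD case
```

    At 200k nonces/min this gives roughly 830 MiB/s and about 21 minutes per TiB, in line with the ~800 MiB/s and 20 min/TB figures quoted above; 50k nonces/min works out to roughly 210 MiB/s.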



  • @tco42 said in GPU plot generator v4.0.3 (Win/Linux):

    1. those discs (hgst deskstar 8tb) are all connected to a 24 port sas hba.

    Do you find the speed difference between SAS/SATA makes any difference for Burst Mining? And what about the Hard-Drive Cache?

    (https://www.broadcom.com/products/storage/host-bus-adapters/sas-9305-24i).

    Only see 12x data slots on that [PCIe 8x] DIC. Do they all double up? Presume you also have a dedicated (rack) server case to store all those (24) drives. Any pictures?

    i tested the maximum speeds of the hba by copying from my first 12 hdds to the second set of 12 hdds which resulted in about 200mb/s per drive -> 2,4gb/s total read and write with all hdds connected to the same hba.

    I guess that's why you need Server CPUs. What's the MoBo & Psu?

    1. I'm currently using ntfs

    Is there a possible, superior option that you're considering?

    those plots were just some tests to find the bottleneck that limits my plotting.> you dont need to connect the gpus using sli, there are just minor differences. those gpus should be able to put lots of data to the drives (200k nonce / min = 50gb / min = 20 min/tb). But there is a bottleneck (mb in the plotter, mb in windows) that i'm still trying to find.

    Fortunately, each HDD only has to be plotted once or twice. So, not that burdensome considering the projected working life of the Drives.



  • @BeholdMiNuggets

    @BeholdMiNuggets said in GPU plot generator v4.0.3 (Win/Linux):

    @tco42 said in GPU plot generator v4.0.3 (Win/Linux):

    1. those discs (hgst deskstar 8tb) are all connected to a 24 port sas hba.

    Do you find the speed difference between SAS/SATA makes any difference for Burst Mining? And what about the Hard-Drive Cache?

    (https://www.broadcom.com/products/storage/host-bus-adapters/sas-9305-24i).

    Only see 12x data slots on that [PCIe 8x] DIC. Do they all double up? Presume you also have a dedicated (rack) server case to store all those (24) drives. Any pictures?

    SAS or SATA makes no difference in this case, because I use SATA drives and SAS->SATA adapter cables.

    There are 6 SAS connectors on the Broadcom card and each connector carries 4 SAS3 connections. My "professional" rack case is self-made 😃 will post pictures soon

    i tested the maximum speeds of the hba by copying from my first 12 hdds to the second set of 12 hdds which resulted in about 200mb/s per drive -> 2,4gb/s total read and write with all hdds connected to the same hba.

    I guess that's why you need Server CPUs. What's the MoBo & Psu?

    You can use the same PCIe adapter in any mobo with a PCIe x8 slot. It does not need much CPU power. The main reason to go for Xeons is that I get 2x 40 PCIe lanes, which will be quite useful when I use that server as a workstation/gaming PC once mining is no longer profitable.

    1. I'm currently using ntfs

    Is there a possible, superior option that you're considering?

    EXT4 could be useful when switching from windows to linux.

    those plots were just some tests to find the bottleneck that limits my plotting.> you dont need to connect the gpus using sli, there are just minor differences. those gpus should be able to put lots of data to the drives (200k nonce / min = 50gb / min = 20 min/tb). But there is a bottleneck (mb in the plotter, mb in windows) that i'm still trying to find.

    Fortunately, each HDD only has to be plotted once or twice. So, not that burdensome considering the projected working life of the Drives.

    Yeah, that's a big plus. But I still needed about 120 h to plot only 12 drives (96 TB), and this server might get upgraded to more than a PB soon 🙂