@xmagmax First I must say I'm not an expert; second, different PCs will have different settings depending on the devices they have.
Now that that's been said, your settings look a bit odd to me. Your first value, 1, is the platform number, which starts from 0, 1, 2 and so on. Normally it should be 0, because most machines have only one platform; at least mine does. The second value is the OpenCL device number, which also starts from 0, 1, 2 and so on. The device list includes all the CPUs and GPUs in your system, depending on the machine. You have to run the "gpuPlotGenerator setup" command (e.g. from a .bat file) to check and decide which device you're going to use. The setup command also tells you the remaining values, but bear in mind these are the potential maximum values your machine can handle. If gpuPlotGenerator crashes while plotting, you have to reduce these values (by half).
In my case, I use an RX480 with 8GB VRAM to plot. My setting values are 0 0 8096 32 4096 (my GPU is device 0, and my PC has 16 GB of RAM). I was able to plot a 10TB Seagate internal SATA3 HDD in around 30 hours and an 8TB WD external USB3 HDD in around 26 hours, all in direct mode. But the 8TB Seagate Backup Plus Hub external USB3 HDD is an odd ball for me: even with the same settings, it took over 5 days to finish, and I don't know why.
So what I do to speed things up, and to reduce the chance of the PC crashing before plotting finishes, is to split the work into smaller plots. I plot a 1GB plot onto a 1GB SSD, then copy it to the actual mining HDD, one at a time until the whole HDD is done.
Great work! It's working on Linux for me. For some reason it does sometimes fail halfway through with "error creating thread", but the program says it can be restored from step x. How do you restore the plotting?
You're trying to compile AVX2 code on an eight-year-old processor that lacks that instruction set. You already compiled the SSE4 version of the code; just omit the AVX/AVX2 version altogether. See the targets in the Makefile.
The stagger value directly relates to the amount of memory allocated to the plotting process. Nonces are computed sequentially (each containing 4096 sequential scoops), but writing a 1024-nonce file with stagger 1 produces a 256 MiB (1024 * 4096 * 64 Bytes) file that stores nonces like this:
nonce 0: scoops 0,1,2,..,4095; nonce 1: scoops 0,1,2,..,4095; ... (1024 nonces total)
which is very unfortunate for mining, as a lot of seeks have to be performed.
While mining, we're interested in a specific scoop in all nonces - this hardens Burst against GPU/ASIC attacks.
Therefore, the perfect organization of this example file would be: scoop 0 of all 1024 nonces, then scoop 1 of all 1024 nonces, and so on up to scoop 4095.
This organization allows the miner process to read the 1024 scoops needed for the particular block to be solved in one go: a sequential read of 1024 * 64 Bytes = 64 KiB, as opposed to the same 1024 * 64 Bytes with 1023 seeks (head movements) in between.
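To make the seek arithmetic concrete, here is a small sketch (mine, not the plotter's code; it assumes the 1024-nonce file from the example above) of where a mined scoop sits in each layout:

```python
SCOOP_SIZE = 64          # bytes per scoop
SCOOPS_PER_NONCE = 4096
NONCES = 1024            # file size from the example above

def offset_stagger1(nonce, scoop):
    # stagger 1: each nonce is stored whole, scoops sequential inside it
    return (nonce * SCOOPS_PER_NONCE + scoop) * SCOOP_SIZE

def offset_optimized(nonce, scoop):
    # fully optimized: all copies of one scoop are stored contiguously
    return (scoop * NONCES + nonce) * SCOOP_SIZE

# while mining we need, say, scoop 42 of every nonce
stagger1 = [offset_stagger1(n, 42) for n in range(NONCES)]
optimized = [offset_optimized(n, 42) for n in range(NONCES)]

print(stagger1[1] - stagger1[0])    # 262144 -> a 256 KiB gap (seek) per nonce
print(optimized[1] - optimized[0])  # 64     -> back to back: one 64 KiB read
```

The 1024 reads in the stagger-1 file are each 256 KiB apart, while the optimized file serves them as a single sequential 64 KiB read.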
So the plotter process computes as many nonces as fit into the configured memory limit. Upon writing into the file,
all scoop-0 entries are collected and written sequentially,
then the same is repeated for the remaining 4095 scoops.
If you allocated 32 MiB to the plotter (32 MiB / 256 KiB = 128 nonces), the internal structure of the file is: scoop 0 of nonces 0..127, scoop 1 of nonces 0..127, ..., scoop 4095 of nonces 0..127, then the same for nonces 128..255, and so on (repeated for a total of 8 times).
The plotter can therefore only plot a multiple of 128 nonces (in this example).
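The group arithmetic can be sketched like this (my own illustration, using the 32 MiB example above; the `offset` helper is mine):

```python
SCOOP_SIZE = 64
SCOOPS_PER_NONCE = 4096
NONCE_SIZE = SCOOP_SIZE * SCOOPS_PER_NONCE   # 256 KiB per nonce

memory = 32 * 1024 * 1024        # 32 MiB allocated to the plotter
stagger = memory // NONCE_SIZE   # nonces held (and optimized) per group
total_nonces = 1024
groups = total_nonces // stagger

print(stagger)  # 128 -> plot sizes must be a multiple of 128 nonces
print(groups)   # 8   -> the internal structure repeats 8 times

def offset(nonce, scoop):
    # byte offset inside a staggered file: jump to the nonce's group,
    # then to the scoop's run within that group, then to the nonce
    group, local = divmod(nonce, stagger)
    return group * stagger * NONCE_SIZE + (scoop * stagger + local) * SCOOP_SIZE
```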
gpuplot has another "tweak": it uses twice the memory for a shadow copy, so that computing and file I/O can run in parallel.
If your stagger value is less than, say, 131,072 nonces (8 MiB runs per scoop), your disk may not operate optimally because it spends more time seeking than reading. You then need to optimize your file, that is: reorganize it from the staggered layout into the fully optimized layout (stagger == total nonces).
But there is a variant on this, and it trades compute power against IO.
Using the gpuplot terminology, you may plot in "direct" or "buffer" mode.
"Buffer" mode is what example 1 above described.
"direct mode" computes file-length times the scoop 0 and writes them out.
Then it computes file-length times the scoop 0 to 1, discards 0, writes all scoop 1 out.
Repeat until finished (4096 scoops).
The problem is that scoops (64 Bytes) cannot be computed individually, only as part of a whole nonce (4096 * 64 Bytes).
If you plot a 1 TiB file ( 1 TiB = 4,194,304 nonces ) every single scoop needs 256 MiB ( 1 TiB / 4096 ).
If you assign 4 GiB memory to gpuplot, 16 scoops can be computed in one go ( 4 GiB / 256 MiB ).
No double buffering here, as computing is slow - you throw away ~127 of 128 results - you keep only the requested 16 out of computed 4096 values.
(I think it aborts right after hitting the wanted scoop, hence 127/128 and not 255/256, a 50% speedup).
This scenario only makes sense if your IO is a lot slower than computation (true for most users: a single disk and a potent GPU).
The plotter can therefore only plot a multiple of 16 nonces (in this example).
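The direct-mode numbers above can be checked with a few lines (my sketch, assuming the 1 TiB file / 4 GiB memory example):

```python
SCOOP_SIZE = 64
SCOOPS_PER_NONCE = 4096
NONCE_SIZE = SCOOP_SIZE * SCOOPS_PER_NONCE   # 256 KiB per nonce

file_size = 1024**4                  # 1 TiB plot file
nonces = file_size // NONCE_SIZE
print(nonces)                        # 4194304 nonces

scoop_column = nonces * SCOOP_SIZE   # one scoop across all nonces
print(scoop_column // 2**20)         # 256 (MiB), i.e. 1 TiB / 4096

memory = 4 * 1024**3                 # 4 GiB assigned to gpuplot
scoops_per_pass = memory // scoop_column
print(scoops_per_pass)               # 16 scoops computed in one go

print(SCOOPS_PER_NONCE // scoops_per_pass)  # 256 passes over the file
```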
Another variant is XPlotter, which has a fast mode that pre-allocates a sequential file of the target size on NTFS.
wplot (?) pre-allocates a contiguous file by writing out <target-size> zeroes (so the file is physically sequential on disk) and then seeks within this file to write a fully optimized file (length == stagger).
It then computes nonces as usual (scoops 0..4095), aggregates them according to the available memory, and writes out a fully optimized file by seeking to each scoop's perfect placement.
If we still have 4 GiB of memory for plotting, the granularity is 256 write passes (1 TiB / 4 GiB).
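For this wplot-style scheme, the same numbers give the batch size and the seek targets; a sketch (the `write_offset` helper name is mine):

```python
SCOOP_SIZE = 64
SCOOPS_PER_NONCE = 4096
NONCE_SIZE = SCOOP_SIZE * SCOOPS_PER_NONCE   # 256 KiB per nonce

total_nonces = 4194304         # 1 TiB file, as above
memory = 4 * 1024**3           # 4 GiB for plotting
batch = memory // NONCE_SIZE   # whole nonces computed per pass
print(batch)                   # 16384
print(total_nonces // batch)   # 256 write passes (1 TiB / 4 GiB)

def write_offset(batch_start, scoop):
    # where one batch's run of a given scoop lands in the
    # fully optimized (length == stagger) output file
    return (scoop * total_nonces + batch_start) * SCOOP_SIZE
```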
I'm looking into developing this to support low-power ARM devices such as the Raspberry Pi.
Hey, this may be what I'm looking for. I'm an unraid user as well.
Is this using the latest version of XPlotter as the plotter?
Do you also have the Blago miner in a Docker?
I will be giving this a try and will send some Burst your way.
If I want to plot my 8TB drive, do I start from 35,962,880? The last file started at 6963200 and had 28999680 nonces plotted. My friend is telling me to start at 35962880+1, but I believe otherwise.
Afaik, there's no penalty for having a gap betwixt your nonces, but there is a loss of burst-mining capacity for an overlap. So why not just start plotting at the next (convenient & memorable) round number?
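A quick sanity check of the arithmetic in the question above (my sketch; a file covers nonces start .. start + count - 1, so the first free nonce is start + count):

```python
last_start = 6963200
last_count = 28999680

# the last file occupies nonces last_start .. last_start + last_count - 1
last_used = last_start + last_count - 1
next_start = last_start + last_count

print(last_used)    # 35962879
print(next_start)   # 35962880 -> no "+1" needed; that would leave a 1-nonce gap
```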