pyseer supports the use of multiple CPUs through the
--cpu option. This
sends batches of processed variants to a core, which will fit the chosen model
on all variants in the batch.
--block-size controls the number of variants sent to each
core. The higher this is set the more efficient the use of CPUs will be (up to
a limit, set by the time spent reading the variant input) at the expense of
a roughly linear increase in memory usage. The default is 1000, using which on
8 cores required around 1.5Gb of memory for a 1.4x speedup with the mixed model.
Increasing this to 30000 while using 4 cores gave a similar (1.5x) speedup, but needed 12Gb of memory.
Depending on your computing architecture, you may wish to split the input and
run separate jobs. This will be more efficient, but is less convenient. This
can be done using GNU
split -d -n l/8 fsm_kmers.txt fsm_out
This would split the input k-mers into 8 separate files.