Multiprocessing

pyseer supports the use of multiple CPUs through the --cpu option. Processed variants are sent to the cores in batches, and each core fits the chosen model to every variant in its batch.

The --block-size option controls the number of variants sent to each core. The higher it is set, the more efficiently the CPUs are used (up to a limit set by the time spent reading the variant input), at the expense of a roughly linear increase in memory usage. The default is 1000. With the mixed model, the default on 8 cores required around 1.5 GB of memory for a 1.4x speedup. Increasing the block size to 30000 while using 4 cores gave a similar (1.5x) speedup, but needed 12 GB of memory.

Depending on your computing architecture, you may wish to split the input and run separate jobs. This is more efficient, but less convenient. The input can be split using GNU split:

split -d -n l/8 fsm_kmers.txt fsm_out

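This would split the input k-mers into 8 separate files, named fsm_out00 to fsm_out07. The l/8 form of -n splits only at line boundaries, so no k-mer line is cut in half, and -d gives the output files numeric suffixes.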
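To check how the split command above behaves before running it on a large input, it can be tried on a small file. The following is a minimal sketch, assuming GNU coreutils is installed; the 20 dummy lines stand in for a real k-mer file and are not in pyseer's input format.

```shell
# Work in a scratch directory so the demo does not touch real data.
workdir=$(mktemp -d)
cd "$workdir"

# Stand-in for a real k-mer file: 20 dummy lines (hypothetical content).
seq 1 20 | sed 's/^/kmer_/' > fsm_kmers.txt

# Same command as in the text: 8 chunks, numeric suffixes, split at line boundaries.
split -d -n l/8 fsm_kmers.txt fsm_out

# Count the chunks, then confirm that concatenating them reproduces the input.
n_chunks=$(ls fsm_out* | wc -l)
if cat fsm_out* | cmp -s - fsm_kmers.txt; then
    roundtrip=ok
else
    roundtrip=failed
fi
echo "chunks: $n_chunks, round trip: $roundtrip"
```

Because the chunks concatenate back to the original file, every variant ends up in exactly one chunk, so each chunk can be passed to its own pyseer job.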