pyseer was first written as a Python reimplementation of seer, which was written in C++.
pyseer uses linear models with fixed or mixed effects to estimate the
effect of genetic variation in a bacterial population on a phenotype of
interest, while accounting for potentially very strong confounding population
structure. This allows genome-wide association studies (GWAS) to be performed in
clonal organisms such as bacteria and viruses.
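As a rough illustration of the fixed effects idea described above, the sketch below fits an ordinary least squares model of a phenotype on one variant, with and without population structure covariates. It is not pyseer's code: the function name `fixed_effects_association`, the simulated data, and the assumption that population structure is summarised as a few numeric covariates are all hypothetical choices made for this example.

```python
import numpy as np


def fixed_effects_association(variant, phenotype, covariates=None):
    """Fit phenotype ~ intercept + variant (+ covariates) by least squares.

    Returns the estimated effect size of the variant and its t statistic.
    Hypothetical helper for illustration; not part of pyseer.
    """
    n = len(phenotype)
    # Design matrix: intercept, the variant under test, then any covariates
    columns = [np.ones(n), np.asarray(variant, dtype=float)]
    if covariates is not None:
        columns.append(np.asarray(covariates, dtype=float))
    X = np.column_stack(columns)

    beta, _, _, _ = np.linalg.lstsq(X, phenotype, rcond=None)
    residuals = phenotype - X @ beta
    dof = n - X.shape[1]
    sigma2 = residuals @ residuals / dof
    # Standard error of the variant coefficient (column index 1)
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return beta[1], beta[1] / se


# Simulated data (assumption): the phenotype depends only on population
# structure, and the variant is correlated with that structure.
rng = np.random.default_rng(0)
n_samples = 200
structure = rng.normal(size=(n_samples, 2))
variant = (structure[:, 0] + rng.normal(size=n_samples) > 0).astype(float)
phenotype = 2.0 * structure[:, 0] + rng.normal(size=n_samples)

naive_beta, naive_t = fixed_effects_association(variant, phenotype)
adjusted_beta, adjusted_t = fixed_effects_association(variant, phenotype, structure)

print(f"without covariates: beta={naive_beta:.2f}, t={naive_t:.2f}")
print(f"with covariates:    beta={adjusted_beta:.2f}, t={adjusted_t:.2f}")
```

In this simulation the variant has no true effect, so the unadjusted fit picks up a spurious association that shrinks once the structure covariates are included.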
The original version of seer used sequence elements (k-mers) to represent
variation across the pan-genome.
pyseer also allows variants stored in VCF
files (e.g. SNPs and INDELs mapped against a reference genome) or Rtab files
(e.g. from roary or piggy) to be used. There is also a greater range of association models
available, and tools to help with processing the output.
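To make the presence/absence input more concrete, here is a minimal sketch of reading an Rtab-style table. It assumes the layout is a tab-separated header row (a label followed by sample names) and then one row per gene with 1/0 calls. The function `read_rtab` and the example data are hypothetical and are not how pyseer itself loads these files.

```python
import csv
import io


def read_rtab(handle):
    """Yield (variant_name, {sample: present}) from a presence/absence table.

    Assumes a header row of sample names and 1/0 calls in each data row.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:]  # first column holds the gene/variant name
    for row in reader:
        if not row:
            continue  # skip blank lines
        name, calls = row[0], row[1:]
        yield name, {sample: call == "1" for sample, call in zip(samples, calls)}


# Made-up example data, held in memory so the sketch is self-contained
example = (
    "Gene\tsample_1\tsample_2\tsample_3\n"
    "geneA\t1\t0\t1\n"
    "geneB\t0\t0\t1\n"
)
table = dict(read_rtab(io.StringIO(example)))

for gene, calls in table.items():
    carriers = [sample for sample, present in calls.items() if present]
    print(gene, "present in", carriers)
```

Each row then gives one binary variant per sample, which is the shape of input an association test needs.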
Testing shows that results (p-values) should be the same as the original
seer, with a runtime that is roughly twice as long as the optimised C++ code.
We have also extended pyseer to fit association models to the whole genome, which
allows the use of machine learning to predict traits in new samples.
If you find pyseer useful, please cite:
Lees, John A., Galardini, M., et al. pyseer: a comprehensive tool for microbial pangenome-wide association studies. Bioinformatics 34:4310–4312 (2018). doi:10.1093/bioinformatics/bty539.
If you use unitigs (through unitig-counter) please cite:
Jaillard M., Lima L. et al. A fast and agnostic method for bacterial genome-wide association studies: Bridging the gap between k-mers and genetic events. PLOS Genetics. 14, e1007758 (2018). doi:10.1371/journal.pgen.1007758.
If you use the whole genome/predictive models, please cite:
Lees, John A., Mai, T. T., et al. Improved inference and prediction of bacterial genotype-phenotype associations using interpretable pangenome-spanning regressions. (2020) Preprint: https://doi.org/10.1101/852426
- pyseer documentation
- Option reference
- Best practices
- GWAS tutorial
- Prediction tutorial
- Reference documentation