SLEDGe: Inference of ancient whole genome duplications using machine learning


Ancient whole-genome duplications (WGDs), past genome duplication events that have since been eroded via diploidization, are increasingly identified throughout eukaryotes. One constraint on large-scale studies of ancient eukaryotic WGD is that definitively establishing such events often requires relatively large, high-quality datasets; the lower-input alternative, interpreting plots of genome-wide synonymous substitution rates (Ks plots), is prone to bias and inconsistency. We address the shortcomings of the current Ks plot method by building a Ks plot simulator. This data-agnostic approach simulates common distributions found in Ks plots in the presence or absence of ancient WGD signatures. In conjunction with a machine-learning classifier, this approach can quickly assess the likelihood that transcriptomic and genomic data bear WGD signatures. On independently generated synthetic data and real plant transcriptomic data, SLEDGe correctly identifies ancient WGD in 93-100% of samples. This approach can serve as a quick classification step in large-scale genomic analyses, identifying putative ancient polyploids for further study.
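The simulate-then-classify idea can be illustrated with a minimal sketch. This is not SLEDGe's implementation: the abstract does not specify the simulator's distributions or the classifier, so everything below is an assumption chosen for illustration. The background of small-scale duplicates is modeled as an exponential decay, the WGD signature as a lognormal peak, and the classifier is a simple nearest-centroid rule on histogram features. All function names and parameter values are hypothetical.

```python
import math
import random


def simulate_ks(n_pairs=2000, wgd=False, rng=None, wgd_fraction=0.3,
                peak_ks=0.8, peak_sd=0.15, decay_rate=1.5, max_ks=3.0):
    """Draw synthetic Ks values (all parameter values are illustrative).

    Background paralogs follow an exponential decay; when wgd is True,
    a fraction of pairs is drawn from a lognormal peak instead.
    """
    if rng is None:
        rng = random.Random()
    values = []
    while len(values) < n_pairs:
        if wgd and rng.random() < wgd_fraction:
            ks = rng.lognormvariate(math.log(peak_ks), peak_sd)
        else:
            ks = rng.expovariate(decay_rate)
        if 0.0 < ks <= max_ks:  # discard values outside the plotted range
            values.append(ks)
    return values


def histogram_features(ks_values, n_bins=30, max_ks=3.0):
    """Turn Ks values into a normalized histogram (the 'Ks plot')."""
    counts = [0] * n_bins
    for ks in ks_values:
        idx = min(int(ks / max_ks * n_bins), n_bins - 1)
        counts[idx] += 1
    total = len(ks_values)
    return [c / total for c in counts]


def train_centroids(examples):
    """Average the feature vectors of each label (nearest-centroid model)."""
    sums, counts = {}, {}
    for feats, label in examples:
        acc = sums.setdefault(label, [0.0] * len(feats))
        for i, value in enumerate(feats):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc]
            for label, acc in sums.items()}


def predict(centroids, feats):
    """Return the label whose centroid is closest in squared distance."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(feats, centroid))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def demo_accuracy(seed=0, n_train=50, n_test=20):
    """Train on simulated plots, then score on freshly simulated ones."""
    rng = random.Random(seed)

    def sample(label):
        return histogram_features(simulate_ks(wgd=label, rng=rng)), label

    train = [sample(label) for label in (True, False) for _ in range(n_train)]
    centroids = train_centroids(train)
    test = [sample(label) for label in (True, False) for _ in range(n_test)]
    hits = sum(predict(centroids, feats) == label for feats, label in test)
    return hits / len(test)
```

Calling `demo_accuracy()` reports the fraction of held-out simulated plots that are labeled correctly. Because the classifier only ever sees simulated histograms, the same trained model could in principle be applied to a histogram built from real Ks values, which is the sense in which such an approach is data-agnostic.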

Competing Interest Statement

The authors have declared no competing interest.
