The findings below are for the new version of EPP, whose rewrite began in May 2021 with the goal of making it fast, independent of AutoGate, and interoperable with FlowJo and other software. The new version adds a clustering method, “modal clustering”, for the higher levels of a gating hierarchy, but it still relies on DBM clusters to determine stop points. Certain time-costly elements of the old EPP are no longer used.
To assess EPP’s value as an unsupervised, fully automatic gater, we took the first two steps below; the remaining items are the findings.
- Run it on 7 published data sets that have expert manual gates defined, so that the automatic gating results can be compared against them.
- Compare EPP accuracy and speed to UMAP. Flow informaticians frequently run UMAP and tSNE as unsupervised fully automatic gaters. The UMAP we run here is not our UST with supervised templates. We use basic reduction to find “data islands” and then DBM clustering to delineate the borders of those clusters.
- For 4 data sets, EPP achieves very high match scores against the expert gates, and better ones than UMAP.
- For 1 data set (Eliver’s), both gating methods achieve very high and equal match scores.
- For 2 data sets, UMAP achieves high match scores, and better ones than EPP.
- EPP’s results are much easier to explain to the conventional flow analyst than UMAP’s or the results of most “all at once” gating methods.
- Neither EPP nor UMAP is consistently the winner in speed. EPP seems to do better with larger data (> 200k cells), but the number of splits in the data can nullify any advantage from data size. The UMAP on which my results are based is (naturally) our UMAP, which we have sped up significantly since last September. It is now much faster than other implementations, particularly with large data. The paper reviewer may prefer that we compare EPP with the original Python UMAP that everyone (including FlowJo) uses.
- The speed of matching depends on the difference in number of subsets between the sets of subsets being matched.
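
To make "match score" concrete, here is a minimal sketch of scoring automatic subsets against expert gates. It assumes an F-measure overlap on cell indices, which these notes do not specify; the function names (`f_measure`, `best_matches`) and the toy gates are hypothetical and are not EPP's or our UMAP's actual matching code.

```python
from typing import Dict, Optional, Set, Tuple


def f_measure(expert: Set[int], auto: Set[int]) -> float:
    """Harmonic mean of precision and recall for two sets of cell indices."""
    overlap = len(expert & auto)
    if overlap == 0:
        return 0.0
    precision = overlap / len(auto)
    recall = overlap / len(expert)
    return 2 * precision * recall / (precision + recall)


def best_matches(
    expert_gates: Dict[str, Set[int]],
    auto_gates: Dict[str, Set[int]],
) -> Dict[str, Tuple[Optional[str], float]]:
    """For each expert gate, find the best-scoring automatic subset.

    Assumption: one-to-one comparison only; no merging of subsets is tried.
    """
    result: Dict[str, Tuple[Optional[str], float]] = {}
    for name, cells in expert_gates.items():
        best_name, best_score = None, 0.0
        for auto_name, auto_cells in auto_gates.items():
            score = f_measure(cells, auto_cells)
            if score > best_score:
                best_name, best_score = auto_name, score
        result[name] = (best_name, best_score)
    return result


# Toy data (hypothetical): cell indices per gate / per automatic subset.
expert_gates = {
    "T cells": set(range(0, 100)),
    "B cells": set(range(100, 150)),
}
auto_gates = {
    "c1": set(range(0, 90)),
    "c2": set(range(90, 150)),
    "c3": set(range(150, 160)),
}

matches = best_matches(expert_gates, auto_gates)
for gate, (subset, score) in matches.items():
    print(f"{gate}: best match {subset}, F-measure {score:.3f}")
```

This sketch compares every expert gate with every automatic subset once. A matcher that also tries merging subsets when the two sides have different numbers of subsets would have more candidates to evaluate, which is one plausible reading of the speed remark above.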