Greater phylogenetic signal is often found in parsimony-based analyses of third codon positions of protein-coding genes relative to their corresponding first and second codon positions, even for early-derived (“basal”) clades. We used the Soltis et al. (2000; Bot. J. Linn. Soc. 133:381–461) data matrix of atpB and rbcL from 567 seed plants to quantify how each of six factors (observed character-state space, frequencies of observed character states, substitution probabilities among nucleotides, rate heterogeneity among sites, overall rate of evolution, and number of parsimony-informative characters) contributed to this phenomenon. Each of these six factors was estimated from the original data matrix for parsimony-informative third codon positions considered separately from first and second codon positions combined. One of the most parsimonious trees found was used as the constraint topology; branch lengths were estimated using likelihood-based distances, and characters were simulated on this tree. Differential frequencies of observed character states were found to be the most limiting of the factors simulated for all three codon positions. Differential frequencies of observed character states and differential substitution probabilities among states were relatively advantageous for first and second codon positions. In contrast, differential numbers of observed character states, differential rate heterogeneity among sites, the greater number of parsimony-informative characters, and the higher overall rate of evolution were relatively advantageous for third codon positions. The amount of possible synapomorphy was predictive of the overall success of resolution.
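Two of the six factors named above, the number of parsimony-informative characters and the frequencies of observed character states, can be tallied directly from an alignment. The following is a minimal Python sketch of that tally, not the authors' pipeline. It assumes an in-frame alignment whose first column is a first codon position, counts only unambiguous A, C, G, and T, and uses hypothetical function names introduced here for illustration.

```python
from collections import Counter

NUCLEOTIDES = set("ACGT")  # assumption: gaps and ambiguity codes are ignored


def is_parsimony_informative(column):
    """Return True if at least two states each occur in at least two taxa."""
    counts = Counter(base for base in column.upper() if base in NUCLEOTIDES)
    return sum(1 for n in counts.values() if n >= 2) >= 2


def summarize_by_codon_position(alignment):
    """Tally informative sites and their observed states per codon position.

    `alignment` is a list of equal-length sequences, assumed to be in frame
    so that column index 0 is a first codon position.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to equal length")
    summary = {pos: {"informative": 0, "states": Counter()} for pos in (1, 2, 3)}
    for i in range(length):
        column = "".join(seq[i] for seq in alignment)
        if not is_parsimony_informative(column):
            continue
        pos = i % 3 + 1  # codon position 1, 2, or 3
        summary[pos]["informative"] += 1
        summary[pos]["states"].update(
            base for base in column.upper() if base in NUCLEOTIDES
        )
    return summary


def state_frequencies(counter):
    """Convert raw state counts to relative frequencies."""
    total = sum(counter.values())
    return {base: counter[base] / total for base in sorted(counter)} if total else {}


# Toy four-taxon alignment (illustrative data only, not from the study).
toy = ["ATGAAA", "ATGAAC", "ATGAAC", "ATGAAA"]
toy_summary = summarize_by_codon_position(toy)
toy_third_freqs = state_frequencies(toy_summary[3]["states"])
```

In the toy alignment only the last column varies, so it is the sole parsimony-informative site and falls at a third codon position. Simulating characters on a constraint tree, as the study did, would require a substitution model and branch lengths and is outside the scope of this sketch.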
Simmons, M. P., Zhang, L.-B., Webb, C. T., & Reeves, A. (2006). How can third codon positions outperform first and second codon positions in phylogenetic inference? An empirical example from the seed plants. Systematic Biology, 55(2), 245-258. https://doi.org/10.1080/10635150500481473