In that case, it might be fine to include all the data, regardless of their sample sizes. The alternative of removing some data can introduce biases. If this is a genuine concern, you could try both approaches and compare the differences.
Is the PF column generated based on a customized function? An alternative could be to consider an adaptive approach, as discussed in this blog post.
Gang