I. Introduction, with Example in Population and Area of Countries and Country-Like Entities
In this post, I introduce a way of looking at correlated data I will term “dual frontier analysis”.
What motivates this idea? Often, we like to compare entities via a certain “rate”: how much of one quantity there is per unit of another quantity. One example of this is population density. But if you, like me, have glanced at a chart of countries ranked by population density, you may have had the same first reaction I had: “the top of the chart is pretty much just a listing of city-states!” You might then question whether it really makes sense to compare this quantity between city-states and “more normal” countries. Maybe we want a way of looking at this data that better captures our prior idea of what an “impressively high” or “impressively low” population density is: Bangladesh’s population density definitely “feels” more impressive, even if it’s not as numerically high as Bahrain’s.
There are probably solutions to this problem that involve designing a prior distribution for one variable in terms of the other, and then comparing percentiles along the respective distributions. But going down this path requires crunching a lot of numbers and, more importantly, extensive prior knowledge of the subject being analyzed.
Here is another solution: output the data on the dual frontiers. If two attributes are somewhat correlated, a scatterplot for entities in these attributes probably looks something like this.
What we’re outputting is this.
That is, we’re outputting entities for which no other entity has both more of one attribute and less of the other attribute than this entity.
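To make the definition concrete, here is a minimal Python sketch of how the two frontiers could be computed. The names `Entity`, `beats`, and `dual_frontiers` are my own for illustration, the data is made up, and I assume "more" and "less" are both strict comparisons. It is a brute-force pass over all pairs, which is fine for a few hundred countries.

```python
from typing import NamedTuple


class Entity(NamedTuple):
    name: str
    area: float
    population: float


def beats(a: Entity, b: Entity) -> bool:
    """True if a has strictly more population AND strictly less area than b."""
    return a.population > b.population and a.area < b.area


def dual_frontiers(entities: list[Entity]) -> tuple[list[Entity], list[Entity]]:
    # Dense frontier: no other entity has more population on less area.
    dense = [e for e in entities if not any(beats(o, e) for o in entities)]
    # Sparse frontier: no other entity has less population on more area.
    sparse = [e for e in entities if not any(beats(e, o) for o in entities)]
    return dense, sparse


# Made-up toy data (hypothetical entities, not real countries).
toy = [
    Entity("A", 1, 5),
    Entity("B", 10, 50),
    Entity("C", 12, 8),
    Entity("F", 20, 30),
    Entity("D", 90, 20),
    Entity("E", 100, 200),
]

dense, sparse = dual_frontiers(toy)
print([e.name for e in dense])   # ['A', 'B', 'E']
print([e.name for e in sparse])  # ['A', 'C', 'D', 'E']
```

Note that the smallest and largest entities in this toy set (A and E) land on both frontiers, since nothing can beat them in either direction.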
In this way, we would capture, for instance, the country with the highest population density among countries of similar size. (We could even extend this to become a quantitative metric for entities not on this frontier: the percentage of the way an entity is from one frontier to the other.)
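The parenthetical metric is left open above, so the following is only one possible reading, sketched in Python. My assumptions: each frontier is treated as a staircase (the ceiling at a given area is the largest population among entities no bigger than that area, the floor is the smallest population among entities no smaller), and the distance between them is measured on a log scale. The function name `frontier_position` and the data are hypothetical.

```python
import math


def frontier_position(area, population, points):
    """Fraction of the way from the sparse frontier (0.0) to the dense one (1.0).

    points: list of (area, population) pairs that includes the entity itself.
    Returns None when the two frontiers coincide at this area.
    """
    # Ceiling: most people anyone fits into this much area or less.
    hi = max(p for a, p in points if a <= area)
    # Floor: fewest people anyone has on this much area or more.
    lo = min(p for a, p in points if a >= area)
    if hi == lo:
        return None
    # Assumption: compare on a log scale, since populations span orders of magnitude.
    return (math.log(population) - math.log(lo)) / (math.log(hi) - math.log(lo))


# Made-up (area, population) pairs, not real countries.
points = [(1, 5), (10, 50), (12, 8), (20, 30), (90, 20), (100, 200)]

print(frontier_position(10, 50, points))  # on the dense frontier: 1.0
print(frontier_position(12, 8, points))   # on the sparse frontier: 0.0
print(frontier_position(20, 30, points))  # somewhere in between
```

A linear scale or an interpolated frontier would give different numbers; the staircase version is simply the easiest to state.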
One could also take an entity in this data and compare it to its neighboring entities, to see how much larger in one attribute another entity must be to be larger in the other attribute as well (as otherwise, this entity would also be in the frontier). That gap shows how prominently impressive a particular entity is in the ratio.
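As a rough illustration of this comparison, here is a small Python sketch for the dense side: for a given entity, it finds the smallest area among all entities with a larger population and reports it as a multiple of the entity's own area. The name `area_gap` and the data are made up for this example, and using a ratio rather than a difference is my assumption.

```python
def area_gap(name, entities):
    """How many times larger in area the nearest more-populous entity is.

    entities: dict mapping name -> (area, population).
    Returns None if no entity has a larger population.
    """
    area, population = entities[name]
    # Areas of every entity that outdoes this one in population.
    bigger = [a for a, p in entities.values() if p > population]
    if not bigger:
        return None
    return min(bigger) / area


# Made-up toy data (hypothetical entities, not real countries).
toy = {
    "A": (1, 5),
    "B": (10, 50),
    "C": (12, 8),
    "F": (20, 30),
    "D": (90, 20),
    "E": (100, 200),
}

print(area_gap("B", toy))  # 10.0: the only more populous entity has 10x the area
print(area_gap("F", toy))  # 0.5: a smaller entity already has more people
print(area_gap("E", toy))  # None: nobody has more people
```

A value above 1 means the entity holds its place on the dense frontier by that margin, while a value below 1 means some smaller entity already has more people, so the entity is not on that frontier. The mirror-image computation (largest area among less populous entities) would do the same job for the sparse frontier.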