Does Harvard really have an admissions problem when it comes to Asian students? The numbers suggest not

Background

Harvard came under heavy scrutiny last year when a lawsuit brought by an activist group known as Students for Fair Admissions went to trial. The suit, originally filed in 2014, accuses the Ivy League institution of discriminating against Asian students in its admissions process.

The trial, which lasted from October to November 2018, raised several questions not only among the general public but also among other top-rated schools in the United States.

As the merits of both sides are weighed by federal judge Allison Burroughs from Massachusetts, implications of the trial are already apparent.

Likely in response to the case, the share of Asian Americans among Harvard’s admitted students increased from 22.7% last year to 25.4% this year. This is a dramatic spike considering that, according to data released as part of the lawsuit, the last time such a number was reported was in 1980.

Fig 1. Bar graph showing the share of Asian Americans among Harvard’s admitted students over the last decade

From Fig 1, it can be seen that this share rose unusually sharply for the most recent admission year compared to the previous ones. Apart from 2016, the other years in the graph above show a gradual increase until we get to 2023, which happens to be the most recent admitted class.

Since Harvard practices affirmative action, which takes a candidate’s race into consideration during applications, the case has garnered even more attention. Students for Fair Admissions put forth a model which they claimed proved that Harvard did in fact discriminate against Asian students during its admissions process.

To find out whether there is any merit to this case, we gathered data for colleges across the country in order to compare Harvard’s admission numbers against the others.

Note that the demographic in question is Asian Americans specifically, not international students from Asia.

Data Description

This analysis uses data from College Scorecard, which contains extensive information on universities from across the U.S. The website provides data dictionaries, changelogs and reports to help navigate through the vast amount of data.

The data is grouped into useful categories such as academics, admissions, students and cost, and can be downloaded directly in CSV format or retrieved using API calls.

For this project, data for the most recent cohort available was extracted, representing the 2017-2018 academic year. Since the datasets are available in CSV format, loading them into Python for wrangling and analysis is convenient using the Pandas library.

To get a sense of the vastness of the dataset, there are 7058 rows (universities) and 1977 columns (college variables). A snippet of the initial dataset is shown in Fig 2, displaying variables such as university name (INSTNM), location (city, state, zip) and website URL (INSTURL).
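The loading step described above can be sketched as follows. This is only an illustration: the column names (SATVRMID, SATMTMID, ADM_RATE, UGDS, UGDS_ASIAN) are assumed from the College Scorecard data dictionary, the file name in the comment is hypothetical, and the inline CSV is made-up data standing in for the real download.

```python
import io
import pandas as pd

# Columns assumed from the College Scorecard data dictionary (not confirmed by the article).
COLUMNS = ["INSTNM", "CITY", "STABBR", "SATVRMID", "SATMTMID",
           "ADM_RATE", "UGDS", "UGDS_ASIAN"]

def load_scorecard(source):
    """Read only the columns needed, treating suppressed or missing markers as NaN."""
    return pd.read_csv(
        source,
        usecols=COLUMNS,
        na_values=["PrivacySuppressed", "NULL"],
        low_memory=False,
    )

# Made-up stand-in for the real file (for example a hypothetical "MERGED2017_18_PP.csv").
sample_csv = io.StringIO(
    "INSTNM,CITY,STABBR,INSTURL,SATVRMID,SATMTMID,ADM_RATE,UGDS,UGDS_ASIAN\n"
    "College A,Springfield,XX,a.example.edu,730,790,0.08,1000,0.40\n"
    "College B,Shelbyville,XX,b.example.edu,NULL,NULL,0.65,5000,0.05\n"
    "College C,Ogdenville,XX,c.example.edu,PrivacySuppressed,600,NULL,2000,0.10\n"
)
df = load_scorecard(sample_csv)
print(df.shape)
```

Restricting the read to a handful of columns keeps memory use low, which matters when the full file has close to two thousand columns.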

Fig 2. Snippet of initial dataset

For our analysis, just a handful of variables are picked based on information that is required to answer the research question at hand.

Methodology

Exploratory Data Analysis

In order to get a preliminary idea of Asian student enrollment for each university, the dataset is first filtered to include schools that are comparable to Harvard. Since it is known that Harvard is a top school and would likely have a high SAT score requirement, the filtering is done by determining the highest combined SAT scores, specifically taking reading and math scores into consideration.

The dataset provides separate median reading and math SAT scores for each school, so the combined score is calculated by simply adding the two values together.
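A minimal sketch of this step, assuming the median reading and math scores live in columns named SATVRMID and SATMTMID (names taken from the College Scorecard data dictionary, not from the article). The SAT_COMBINED column name and the rows are made up for illustration.

```python
import pandas as pd

# Made-up rows; the real data has one row per institution.
schools = pd.DataFrame({
    "INSTNM": ["College A", "College B", "College C", "College D"],
    "SATVRMID": [730, 650, None, 700],
    "SATMTMID": [790, 640, 600, 720],
})

# Adding a number to NaN gives NaN, so schools missing either score
# end up without a combined score and are dropped below.
schools["SAT_COMBINED"] = schools["SATVRMID"] + schools["SATMTMID"]

ranked = (
    schools.dropna(subset=["SAT_COMBINED"])
    .sort_values("SAT_COMBINED", ascending=False)
)
top = ranked.head(10)
print(top[["INSTNM", "SAT_COMBINED"]])
```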

Fig 3. Table showing universities ranked by highest combined reading and math SAT scores

Fig 3 shows the top 10 schools based on combined reading and math SAT scores. It comes as no surprise that prestigious institutions such as California Institute of Technology (CalTech), Rice University, Massachusetts Institute of Technology (MIT) and University of Chicago make up the first four spots on the list, all with scores greater than 1530. In fact, all the colleges on this list have a score greater than or equal to 1510, which is quite impressive.

As expected, Harvard sits high on this list as well at fifth position with a score of 1525. It’s not enough to beat CalTech with a mammoth total of 1560 but impressive nonetheless.

To get a broader picture of the rankings, a bar plot of the top 30 colleges by combined SAT score is shown in Fig 4 below. Harvard is marked in red with the fifth highest score, placing it at the 99.67 percentile among colleges that have valid combined scores. It turns out that only 1233 schools report combined SAT scores.
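The article does not state how its percentiles were computed. One common definition, used here purely as an assumption, is the percentage of schools with a valid value that score at or below the school in question. The helper name and the scores are made up, and this definition may not reproduce the article's exact figures.

```python
import pandas as pd

def percentile_rank(values: pd.Series, target: float) -> float:
    """Percentage of non-missing values that are less than or equal to target."""
    valid = values.dropna()
    return 100.0 * (valid <= target).mean()

# Made-up combined SAT scores; None marks a school with no reported score.
scores = pd.Series([1560, 1540, 1525, 1400, 1300, 1200, 1100, 1000, None, 900])
print(round(percentile_rank(scores, 1525), 2))
```

Dropping missing values first matters here, since most schools in the dataset do not report combined SAT scores at all.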

Fig 4. Bar plot of the top 30 universities ranked by combined SAT score with Harvard shown in red

To keep the comparison manageable, the dataset is filtered to include only schools with a combined SAT score of more than 1200, which leaves 256 schools.

For a first cut, the universities are sorted by highest Asian enrollment; the top 10 schools can be seen in Fig 5. Immediately, one notices that Harvard does not make this list. University of California Berkeley (Berkeley) sits on top with a huge 10554 Asian students, followed by University of California Irvine and University of California San Diego with 10452 and 9890 Asian students respectively.
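The filtering and sorting above could look roughly like the sketch below. It assumes that UGDS holds undergraduate enrollment and that UGDS_ASIAN holds the Asian share as a fraction between 0 and 1 (per the College Scorecard data dictionary), so the head count is their product. The SAT_COMBINED, ASIAN_COUNT and ASIAN_PCT column names and all rows are made up.

```python
import pandas as pd

schools = pd.DataFrame({
    "INSTNM": ["College A", "College B", "College C", "College D"],
    "SAT_COMBINED": [1520, 1290, 1150, 1420],
    "UGDS": [1000, 30000, 20000, 7000],
    "UGDS_ASIAN": [0.40, 0.35, 0.30, 0.20],
})

# Keep only schools with a combined SAT score above 1200.
selective = schools[schools["SAT_COMBINED"] > 1200].copy()

# Assumed: head count = total enrollment * Asian share; percentage = share * 100.
selective["ASIAN_COUNT"] = (selective["UGDS"] * selective["UGDS_ASIAN"]).round()
selective["ASIAN_PCT"] = selective["UGDS_ASIAN"] * 100

by_count = selective.sort_values("ASIAN_COUNT", ascending=False)
by_pct = selective.sort_values("ASIAN_PCT", ascending=False)
print(by_count[["INSTNM", "ASIAN_COUNT", "ASIAN_PCT"]])
```

Sorting by count favors large schools, while sorting by percentage does not, which is why the two orderings can differ substantially.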

Fig 5. Table showing the top 10 universities ranked by total Asian enrollment

Fig 6 shows a bar plot of the top 35 schools ranked by total Asian enrollment, and Harvard is still nowhere to be seen.

Fig 6. Bar plot of the top 35 universities ranked by total Asian enrollment with Harvard shown in red

So, where does Harvard lie when it comes to Asian enrollment? It is actually in 54th spot with an Asian enrollment of 1463, which corresponds to a percentile of 79.29 and is too low to be captured in the plot above.

However, one thing to note is that Harvard has a relatively low admission rate and, as a result, a lower total enrollment. Therefore, total Asian enrollment is not the best metric for judging the school’s admission of Asian students.

In Fig 7, the top 10 universities are shown ordered by highest Asian student enrollment percentage. CalTech once again tops the charts with a massive 43.29%, followed by University of the Sciences, University of California Irvine and University of California San Diego at 35.80%, 35.68% and 34.61% respectively. Interestingly, Harvard does not make this list, while half of the schools turn out to be from California.

Fig 7. Table showing the top 10 universities ranked by Asian enrollment percentage

Once again, to get a bigger picture of these rankings, the bar plot in Fig 8 visualizes the top 35 universities with the highest Asian enrollment percentage. Notice that Harvard (marked in red) has the lowest percentage of the 35 schools at 19.42%.

Fig 8. Bar plot of the top 35 universities ranked by Asian enrollment percentage with Harvard shown in red

Compared to the entire list of 7058 colleges, rank 35 makes for good reading. But when only the top 256 schools (in terms of SAT score) are taken into consideration, Harvard sits at the 87.62 percentile, a considerable drop from its SAT score percentile of 99.67.

To further study these trends in Asian student enrollment, the change in rankings from SAT score to Asian enrollment percentage is analyzed. To do that, the ranks of each university are stored separately for both SAT score and Asian enrollment percentage. The delta between these two ranks is then visualized to get an idea of how the rankings change for the 30 universities with the highest SAT scores.
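A small sketch of the rank comparison, under the assumption that the delta is the SAT rank minus the Asian enrollment percentage rank. That sign convention is inferred from the article's description (a positive value means the school ranks better on enrollment percentage than on SAT score). The column names and values are made up.

```python
import pandas as pd

schools = pd.DataFrame({
    "INSTNM": ["College A", "College B", "College C", "College D"],
    "SAT_COMBINED": [1560, 1540, 1525, 1500],
    "ASIAN_PCT": [43.0, 12.0, 19.0, 25.0],
})

# Rank 1 is the highest value; method="min" gives tied schools the same (best) rank.
schools["SAT_RANK"] = schools["SAT_COMBINED"].rank(ascending=False, method="min").astype(int)
schools["PCT_RANK"] = schools["ASIAN_PCT"].rank(ascending=False, method="min").astype(int)

# Negative delta: the school ranks worse on Asian enrollment percentage than on SAT score.
schools["RANK_DELTA"] = schools["SAT_RANK"] - schools["PCT_RANK"]
print(schools[["INSTNM", "SAT_RANK", "PCT_RANK", "RANK_DELTA"]])
```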

Fig 9. Table showing a snippet of the comparison of ranks for SAT score, Asian enrollment and Asian enrollment percentage

To illustrate this technique, Fig 9 shows a snippet of the table for the top five universities (ranked by SAT score), with the corresponding Asian enrollment rank and Asian enrollment percentage rank provided for comparison.

Fig 10 is a horizontal bar plot of the change in rankings when going from SAT score to Asian enrollment percentage. For example, Franklin W Olin College of Engineering and Webb Institute see the greatest change, whereas Wellesley College is ranked higher for enrollment percentage than for SAT score, as depicted by its positive value.

Fig 10. Bar plot showing the delta between SAT score rank and Asian enrollment percentage rank for the top 30 universities ordered by SAT score with Harvard in green

The goal of the above plot is to show whether a school with a high SAT score drops significantly in ranking when Asian student enrollment is taken into consideration. On closer inspection, the more renowned colleges such as CalTech (no change), Johns Hopkins, Duke University, Princeton and Stanford see very little variation. By comparison, Harvard (shown in green) sees a somewhat larger change of around -30, although nothing too dramatic.

Lowest admission rates

To truly judge Harvard’s Asian enrollment numbers, the school is further compared with others that have low admission rates. To filter these schools correctly, both admission rate and Asian enrollment percentage were required to be above 0; otherwise, the dataset would include invalid or unwanted entries.
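The validity filter described above can be sketched like this, assuming the admission rate and Asian share are stored in columns named ADM_RATE and UGDS_ASIAN (College Scorecard names). The rows are made up, and the cutoff of 3 stands in for the 30 or 256 used in the article.

```python
import pandas as pd

schools = pd.DataFrame({
    "INSTNM": ["College A", "College B", "College C", "College D",
               "College E", "College F", "College G"],
    "ADM_RATE": [0.05, 0.0, 0.07, None, 0.30, 0.10, 0.08],
    "UGDS_ASIAN": [0.19, 0.10, 0.0, 0.25, 0.08, 0.43, 0.21],
})

# Comparisons with NaN evaluate to False, so missing values are dropped
# along with zero entries.
valid = schools[(schools["ADM_RATE"] > 0) & (schools["UGDS_ASIAN"] > 0)]

# Take the most selective schools, then order them by Asian share.
most_selective = valid.nsmallest(3, "ADM_RATE")
by_asian_share = most_selective.sort_values("UGDS_ASIAN", ascending=False)
print(by_asian_share)
```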

Fig 11 shows a table with the 10 schools having the lowest admission rates. It comes as no surprise that Harvard sits at third spot with an incredible 5.16%. The only schools higher ranked are Curtis Institute of Music and Stanford with 3.3% and 4.73% respectively.

Other well-known schools such as Princeton, Columbia, Yale and CalTech also make the list.

Fig 11. Table showing the top 10 schools ranked by lowest admission rates

The table in Fig 12 shows the schools with the 30 lowest admission rates sorted by highest Asian enrollment percentage. Harvard just about sneaks into the top 10, sitting at number 10.

Fig 12. Table showing the 10 schools with the highest Asian enrollment percentage from the dataset containing schools with the 30 lowest admission rates

The bar plot in fig 13 shows the top 30 universities with Harvard marked in red at 10th spot. CalTech is by far the highest ranked followed by MIT and Johns Hopkins.

Fig 13. Bar plot of top 30 schools with highest Asian enrollment percentage after filtering the schools with the 30 lowest admission rates.

Finally, the most complete picture of the data is displayed in fig 14 which shows a histogram of the Asian enrollment percentage of the 256 schools with the lowest admission rates. The number 256 was picked since the analysis with the SAT scores in the previous section was also done using the top 256 schools.

Fig 14. Histogram of Asian enrollment percent of the 256 schools with lowest admission rates

As can be seen from the plot above, the distribution is highly skewed to the right, with the mean (8.77%) higher than the median (5.405%), as expected. Harvard sits well above the median at 19.42%, at a percentile of 87.11.

Ivy league

Another important way to gauge Harvard’s admission practices is to compare it with its fellow Ivy League colleges.

Fig 15. Table showing the eight Ivy league colleges ranked by the highest Asian enrollment percentage

Fig 15 shows the eight Ivy league colleges ordered by highest Asian enrollment percentage. Princeton leads the pack with 21.44% followed by University of Pennsylvania with 20.05% and Harvard with 19.42%. It is interesting to note that Harvard is in fact ranked higher here than five other Ivy league schools.

Statistical Inference

The population that was chosen for statistical analysis was universities with SAT scores above 700. There were several reasons for doing so:

  • To satisfy the conditions for hypothesis testing, the sample size should ideally be less than 10% of the population.
  • A SAT score threshold of 700 yields a dataset of 310 observations, from which 30 can be randomly sampled. This satisfies the criterion of a sample size of at least 30 when the distribution is not normal, and 30 out of 310 is just under the 10% limit.

Fig 16 shows the Asian enrollment percentage distribution of the 310 schools with the highest SAT scores. The 95% confidence interval (CI), shown in black, is (-4.63, 33.18).
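The article does not say how this interval was computed. The reported bounds are symmetric around the mean of 14.28%, which is consistent with the mean plus or minus 1.96 standard deviations of the data (a normal approximation of where roughly 95% of schools fall) rather than an interval for the mean itself. The sketch below follows that reading as an assumption; the helper name and data are made up.

```python
import pandas as pd

def normal_interval(values: pd.Series, z: float = 1.96):
    """Return mean -/+ z standard deviations of the data (normal approximation)."""
    mean = values.mean()
    std = values.std()  # sample standard deviation (ddof=1)
    return mean - z * std, mean + z * std

# Made-up, right-skewed Asian enrollment percentages.
pct = pd.Series([2.0, 5.0, 8.0, 11.0, 14.0, 20.0, 40.0])
low, high = normal_interval(pct)
print(round(low, 2), round(high, 2))
```

With right-skewed data the lower bound can fall below zero, as it does in the article, even though a percentage cannot be negative. That is a sign the normal approximation fits the population only loosely.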

Fig 16. Histogram showing the distribution for the 310 schools with the highest SAT scores.

Once again, Harvard’s percentage is higher than both the mean (14.28%) and median (11.32%).

A sample of 30 observations was randomly selected from the dataset; its distribution is shown in Fig 17. As in the larger dataset, Harvard’s percentage is higher than both the mean and the median, which makes sense as this is a random sample from the population.
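Drawing such a sample with Pandas might look like the following sketch. The population values are synthetic, the ASIAN_PCT column name is made up, and the fixed random_state is an assumption added only to make the draw reproducible.

```python
import pandas as pd

# Synthetic stand-in for the 310 schools in the population.
population = pd.DataFrame({"ASIAN_PCT": [float(i % 45) for i in range(310)]})

# Sample 30 rows without replacement; random_state fixes the draw.
sample = population.sample(n=30, random_state=0)

print(len(sample) / len(population))  # fraction of the population sampled
print(sample["ASIAN_PCT"].mean(), sample["ASIAN_PCT"].median())
```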

Fig 17. Histogram showing the distribution of 30 samples from the 310 observations

For the 30 samples, the CI is (1.18, 28.71), which is similar to that of the larger dataset. In the sample distribution, the mean and median are very close together, which is often an indicator of a normal or nearly normal distribution. The histogram in Fig 17 does look much less skewed than the population distribution, but the distribution is still not completely normal.

Both of the graphs above indicate that Harvard is in fact towards the higher end of the distribution when it comes to Asian enrollment percentage when compared with other schools with high SAT scores.

This theory is further explored with a hypothesis test. These are the hypotheses used:

  • Null hypothesis (H0): population mean (μ) = Harvard’s score (19.42%)
  • Alternate hypothesis (HA): μ < 19.42%

The original question is whether Harvard’s enrollment percentage is unusually low compared with the population mean. Since we have already seen that it is actually higher than the mean, the question is formulated the other way around.

We are now testing whether the population mean is significantly lower than Harvard’s score, which, if true, would indicate that Harvard’s Asian enrollment percentage is in fact well above the average Asian enrollment percentage.

Sample               t-statistic    p-value
Entire population    -9.3695        7.93E-19
30 samples           -3.4326        0.000909

The table above shows the results of the hypothesis testing using both the entire population (310 observations) as well as the 30 samples.

When using the entire population, the p-value comes out to be a very small number, much less than any significance level that we would pick (the usual value is 5% or 0.05). This suggests that the null hypothesis can be rejected in favor of the alternative, which would mean that the population mean is significantly smaller than Harvard’s score. For the 30-sample dataset, the p-value is not as small but is still small enough to safely reject the null hypothesis once again.
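A test of this kind can be sketched as a one-sample, one-sided t-test. The use of scipy.stats.ttest_1samp is an assumption, since the article does not name the function it used. The one-sided p-value is obtained here by halving the two-sided one, and the helper name and data are made up.

```python
import pandas as pd
from scipy import stats

def one_sided_ttest_less(values, popmean):
    """One-sample t-test of H0: mean == popmean against HA: mean < popmean."""
    t_stat, p_two_sided = stats.ttest_1samp(values, popmean)
    # Halve the two-sided p-value when the statistic points in the direction of HA.
    p_one_sided = p_two_sided / 2 if t_stat < 0 else 1 - p_two_sided / 2
    return t_stat, p_one_sided

# Made-up Asian enrollment percentages; 19.42 is Harvard's figure from the article.
pct = pd.Series([2.0, 5.0, 8.0, 11.0, 14.0, 20.0, 40.0])
t_stat, p_value = one_sided_ttest_less(pct, popmean=19.42)
print(t_stat, p_value)
```

A negative t-statistic means the observed mean lies below the hypothesized value, which matches the sign of both statistics reported in the table.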

Generally, only a sample of the population is selected for statistical analysis, the results of which need to be generalized to a larger population. However, the entire population can be analyzed for analytical insights without the goal of generalization.

Here are two main reasons why analyzing the entire population can be useful in this case:

  • Small population size (310 observations)
  • Uncommon characteristics between observations: Each college has unique attributes

Results

The table below summarizes the findings from before, highlighting Harvard’s rank and percentile during each analysis.

Analysis                               Samples    Harvard Rank    Percentile
Highest SAT scores                     256        5               99.67
Highest Asian enrollment               256        54              79.29
Highest Asian enrollment percentage    256        35              87.62
Lowest admission rates                 256        10              87.11
Ivy league colleges                    8          3               75

As can be seen from the table, Harvard boasts one of the best combined SAT scores, coming in at an impressive rank 5 among the 1233 schools (out of 7058 in the dataset) that report combined scores.

When the Asian enrollment percentage is analyzed from the top 256 schools with the highest SAT scores, Harvard comes in at rank 35 for the enrollment percentage and at 54 for the total enrollment of Asian students, both respectable numbers. In fact, the school is in the 87th percentile when it comes to enrollment percentage, much higher than the median.

When analyzed against schools with the lowest admission rates, Harvard placed at rank 10 among the 30 most selective schools and at a percentile of 87.11 among the 256 most selective, both impressive numbers.

And when it came to Ivy league colleges, Harvard placed at third spot out of the eight schools. It ranked higher than other prestigious schools such as Cornell, Yale and Columbia.

Hypothesis testing shows that the Asian enrollment percentage population mean is likely much lower than that of Harvard since the p-value of the test is much smaller than the significance level of 5%.

These results can be interpreted as follows: if the population mean were equal to Harvard’s score (19.42%), the probability of observing a sample mean of ~15% or lower would be very small. This is essentially the definition of the p-value.

Discussion

The results from the descriptive statistics suggest that Harvard does in fact have a relatively higher Asian enrollment percentage than most other schools. It ranks above the 75th percentile among the top 256 schools by both SAT score and lowest admission rate, coming in at rank 10 for the latter.

Harvard even ranks third when it comes to the eight Ivy league schools.

Statistical inference shows that Harvard’s enrollment percentage is clearly higher than the average value for the 310 schools that have a combined SAT score of more than 700.

The results of the hypothesis testing further confirm that not only is Harvard’s Asian student enrollment acceptable, but it is also significantly higher than that of the average school with a SAT score greater than 700.

Conclusion

Given all these results, statistically, it is clear that there is no discrimination in Harvard’s Asian student enrollment policies.

If an enrollment value of 19.42% is considered a low value, other renowned schools are likely to come under scrutiny as well.

References

[1] https://www.nbcnews.com/news/asian-america/harvard-announces-high-admittance-asian-americans-judge-weighs-affirmative-action-n990051

[2] https://www.insidehighered.com/admissions/article/2019/04/01/share-asian-americans-hits-record-high-harvards-class-admitted

[3] https://collegescorecard.ed.gov/data/documentation/

Appendix