Background
Harvard came under heavy scrutiny last year when an activist group known as Students for Fair Admissions filed a lawsuit against the Ivy League institution, alleging that it discriminated against Asian students in its admissions process.
The trial, which lasted from October to November of 2018, raised several questions not only among the general public but also among other top-rated schools in the United States.
As the merits of both sides are weighed by federal judge Allison Burroughs from Massachusetts, implications of the trial are already apparent.
Likely in response to the case, the share of Asian American students in Harvard’s admitted class increased from 22.7% last year to 25.4% this year, a dramatic spike considering that the last time such a number was reported was in 1980 according to data released as part of the lawsuit.
From Fig 1, it can be seen that this share rose unusually sharply for the most recent admission year compared to the previous ones. Apart from 2016, the other years in the graph show a gradual increase until 2023, which is the most recent admitted class.
Since Harvard practices affirmative action, which takes a candidate’s race into consideration during applications, the case has garnered even more attention. Students for Fair Admissions put forth a model which they claimed proved that Harvard in fact discriminated against Asian students during its admissions process.
To find out whether there is any merit to this case, we gathered data for colleges across the country in order to compare Harvard’s admission numbers against the others.
Note that the demographic in question is Asian Americans specifically, not international students who are Asian.
Data Description
This analysis uses data from College Scorecard, which contains extensive information on universities from across the U.S. The website provides data dictionaries, changelogs and reports to help navigate the vast amount of data.
The data is grouped into useful categories such as academics, admissions, students and cost, and can be downloaded directly in CSV format or retrieved through API calls.
For this project, data for the most recent available cohort, representing the 2017-2018 academic year, was extracted. Since the datasets are available in CSV format, loading them into Python for wrangling and analysis is convenient using the Pandas library.
To get a sense of the vastness of the dataset, there are 7058 rows (universities) and 1977 columns (college variables). A snippet of the initial dataset is shown in Fig 2, displaying variables such as university name (INSTNM), location (city, state, zip) and website URL (INSTURL).
For our analysis, just a handful of variables are picked based on information that is required to answer the research question at hand.
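The loading and column-selection step can be sketched as follows. This is a minimal illustration rather than the exact code used for the analysis: apart from INSTNM and INSTURL, the column names (SATVRMID, SATMTMID, ADM_RATE, UGDS, UGDS_ASIAN) are assumptions based on the College Scorecard data dictionary and should be checked against it, and the in-memory CSV holds made-up values standing in for the downloaded file.

```python
import io

import pandas as pd

# Assumed College Scorecard variable names; only INSTNM and INSTURL appear
# in the text above, so verify the rest against the data dictionary.
COLUMNS = [
    "INSTNM", "CITY", "STABBR",
    "SATVRMID", "SATMTMID",   # median SAT reading / math (assumed names)
    "ADM_RATE",               # admission rate as a fraction (assumed name)
    "UGDS", "UGDS_ASIAN",     # enrollment and Asian share (assumed names)
]


def load_scorecard(source, columns=COLUMNS):
    """Read only the needed columns, treating suppressed values as missing."""
    return pd.read_csv(
        source,
        usecols=columns,
        na_values=["NULL", "PrivacySuppressed"],
        low_memory=False,
    )


# Tiny stand-in for the real CSV file (hypothetical values, not real schools).
toy_csv = io.StringIO(
    "INSTNM,CITY,STABBR,INSTURL,SATVRMID,SATMTMID,ADM_RATE,UGDS,UGDS_ASIAN\n"
    "School A,Town,MA,a.example.edu,730,750,0.05,7000,0.19\n"
    "School B,City,CA,b.example.edu,NULL,NULL,PrivacySuppressed,300,0.02\n"
)
df = load_scorecard(toy_csv)
print(df.shape)
print(df.dtypes)
```

For the real dataset, `source` would be the local path to the downloaded CSV; restricting `usecols` avoids holding all 1977 columns in memory.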
Methodology
Exploratory Data Analysis
In order to get a preliminary idea of Asian student enrollment for each university, the dataset is first filtered to include schools that are comparable to Harvard. Since it is known that Harvard is a top school and would likely have a high SAT score requirement, the filtering is done by determining the highest combined SAT scores, specifically taking reading and math scores into consideration.
The dataset provides separate median reading and math scores, so the combined score is calculated by simply adding the two values together for each school.
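The combined score calculation can be sketched as below. The column names SATVRMID and SATMTMID are assumed names for the median reading and math scores, SAT_COMBINED is a hypothetical helper column, and the three schools are made-up examples.

```python
import numpy as np
import pandas as pd


def add_combined_sat(df, reading_col="SATVRMID", math_col="SATMTMID"):
    """Return a copy with a combined score; NaN if either part is missing."""
    out = df.copy()
    out["SAT_COMBINED"] = out[reading_col] + out[math_col]
    return out


def top_by_combined_sat(df, n=10):
    """Top n schools by combined score, ignoring schools without a score."""
    scored = add_combined_sat(df).dropna(subset=["SAT_COMBINED"])
    return scored.nlargest(n, "SAT_COMBINED")


# Hypothetical values for illustration only.
toy = pd.DataFrame({
    "INSTNM": ["School A", "School B", "School C"],
    "SATVRMID": [760.0, 730.0, np.nan],
    "SATMTMID": [800.0, 795.0, 700.0],
})
top = top_by_combined_sat(toy)
print(top[["INSTNM", "SAT_COMBINED"]])
```

Schools missing either score are dropped rather than ranked on a partial total, which is one reasonable way to handle the many institutions that do not report SAT data.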
Fig 3 shows the top 10 schools based on combined reading and math SAT scores. It comes as no surprise that prestigious institutions such as California Institute of Technology (CalTech), Rice University, Massachusetts Institute of Technology (MIT) and University of Chicago take the first four spots on the list, all with scores greater than 1530. In fact, all the colleges on this list have a score greater than or equal to 1510, which is quite impressive.
As expected, Harvard sits high on this list as well at fifth position with a score of 1525. It’s not enough to beat CalTech with a mammoth total of 1560 but impressive nonetheless.
To get a broader picture of the rankings, a histogram of the top 30 colleges based on combined SAT score is shown in Fig 4 below. Harvard is marked in red with the fifth highest score, which places it at the 99.67th percentile among colleges that have valid combined scores. It turns out that only 1233 of the schools in the dataset report combined SAT scores.
To make this effort simpler, the dataset is filtered to just include schools that have a combined SAT score of more than 1200. The total number of schools comes out to be 256.
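The percentile and threshold filter can be sketched as follows. The text does not state how the percentile was computed, so this sketch assumes one common definition (the share of schools with a valid score at or below the given score); SAT_COMBINED is a hypothetical column name and the scores are made up.

```python
import numpy as np
import pandas as pd


def percentile_of(values, score):
    """Percentage of non-missing values that are <= score (assumed definition)."""
    valid = pd.Series(values).dropna()
    return 100.0 * (valid <= score).mean()


def filter_above(df, col="SAT_COMBINED", threshold=1200):
    """Keep rows strictly above the threshold; missing scores are dropped."""
    return df[df[col] > threshold]


# Hypothetical combined scores, including one school without a score.
toy = pd.DataFrame({
    "INSTNM": ["A", "B", "C", "D", "E", "F"],
    "SAT_COMBINED": [1560, 1525, 1400, 1200, 1000, np.nan],
})
pct = percentile_of(toy["SAT_COMBINED"], 1525)
comparable = filter_above(toy)
print(pct, len(comparable))
```

Under this definition, rank 5 out of 1233 schools gives 1229 / 1233, or roughly 99.68%, which is in line with the figure reported above.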
For a first cut, the universities are sorted by highest Asian enrollment and the top 10 schools following this can be seen in fig 5. Immediately, one notices that Harvard does not make this list. University of California Berkeley (Berkeley) sits on top with a huge 10554 Asian students followed by University of California Irvine and University of California San Diego with 10452 and 9890 Asian students respectively.
Fig 6 shows a bar plot of the top 35 schools ranked by total Asian enrollment, and Harvard is still nowhere to be seen.
So, where does Harvard lie when it comes to Asian enrollment? It is actually in 54th spot with an Asian enrollment of 1463, which corresponds to a percentile of 79.29 and falls outside the range of the plot above.
However, one thing to note is that the university from Cambridge has a relatively low admission rate and, as a result, a lower total enrollment. Therefore, total Asian enrollment is not the best metric for judging the school’s admission of Asian students.
In fig 7, the top 10 universities are shown ordered by Asian student enrollment percentage. CalTech once again tops the charts with a massive 43.29%, followed by University of the Sciences, University of California Irvine and University of California San Diego at 35.80%, 35.68% and 34.61% respectively. Interestingly, Harvard does not make this list, while half of the schools are from California.
Once again, to get a bigger picture of these rankings, the histogram in fig 8 visualizes the top 35 universities with the highest Asian enrollment percentage. Notice that Harvard (marked in red) has the lowest value of the 35 schools at 19.42%.
Compared to the entire list of 7058 colleges, rank 35 does make for good reading, but when only the top 256 schools (in terms of SAT score) are taken into consideration, Harvard sits at the 87.62nd percentile, which is a considerable drop from its SAT score percentile of 99.67.
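The difference between ranking by total Asian enrollment and by enrollment percentage can be sketched as below. It assumes that UGDS holds the total undergraduate enrollment and UGDS_ASIAN the Asian share as a fraction (both assumed College Scorecard names); ASIAN_PCT and ASIAN_COUNT are hypothetical helper columns, and the schools are made up.

```python
import pandas as pd


def add_asian_enrollment(df, total_col="UGDS", share_col="UGDS_ASIAN"):
    """Add the Asian enrollment percentage and an estimated head count."""
    out = df.copy()
    out["ASIAN_PCT"] = out[share_col] * 100
    out["ASIAN_COUNT"] = (out[total_col] * out[share_col]).round()
    return out


# Hypothetical schools: a small school with a high share, a large school
# with a medium share, and a mid-sized school with a lower share.
toy = pd.DataFrame({
    "INSTNM": ["School A", "School B", "School C"],
    "UGDS": [1000, 30000, 7000],
    "UGDS_ASIAN": [0.40, 0.30, 0.20],
})
enriched = add_asian_enrollment(toy)
by_count = enriched.sort_values("ASIAN_COUNT", ascending=False)
by_pct = enriched.sort_values("ASIAN_PCT", ascending=False)
print(by_count[["INSTNM", "ASIAN_COUNT"]])
print(by_pct[["INSTNM", "ASIAN_PCT"]])
```

The small school ranks last by head count but first by percentage, which is why the percentage is the fairer metric for a school with a low total enrollment.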
To further study these trends in Asian student enrollment, the change in rankings from SAT score to Asian enrollment percentage is analyzed. To do that, each university’s rank is stored separately for SAT score and for Asian enrollment percentage. The delta between these two ranks is then visualized to get an idea of how the rankings change for the top 30 universities with the highest SAT scores.
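A minimal sketch of the rank comparison is shown below. It assumes the delta is defined as SAT rank minus Asian enrollment rank, so that a negative value means a school ranks lower on enrollment percentage than on SAT score; the column names are hypothetical and the values are made up.

```python
import pandas as pd


def rank_delta(df, sat_col="SAT_COMBINED", pct_col="ASIAN_PCT"):
    """Rank schools on both metrics (1 = highest) and compute the difference.

    Assumes neither column contains missing values.
    """
    out = df.copy()
    out["SAT_RANK"] = out[sat_col].rank(ascending=False, method="min").astype(int)
    out["ASIAN_RANK"] = out[pct_col].rank(ascending=False, method="min").astype(int)
    # Negative delta: the school drops when moving from SAT rank to enrollment rank.
    out["RANK_DELTA"] = out["SAT_RANK"] - out["ASIAN_RANK"]
    return out.sort_values("SAT_RANK")


# Hypothetical values for illustration only.
toy = pd.DataFrame({
    "INSTNM": ["School A", "School B", "School C"],
    "SAT_COMBINED": [1560, 1525, 1500],
    "ASIAN_PCT": [43.0, 19.0, 25.0],
})
ranked = rank_delta(toy)
print(ranked[["INSTNM", "SAT_RANK", "ASIAN_RANK", "RANK_DELTA"]])
```

Under this sign convention, a school ranked 5th by SAT score and 35th by enrollment percentage has a delta of -30.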
To illustrate this technique, fig 9 shows a snippet of a table listing the top five universities (ranked by SAT score) along with their Asian enrollment rank and Asian enrollment percentage for comparison.
Fig 10 shows a horizontal bar plot showing the change in rankings when going from SAT score to Asian enrollment percentage. For example, Franklin W Olin College of Engineering and Webb Institute see the greatest change whereas Wellesley College is ranked higher for enrollment percentage than for SAT score as depicted by the positive value.
The goal of the above plot is to show whether a particular school with a high SAT score drops significantly in ranking when Asian student enrollment is taken into consideration. On closer inspection, the more renowned colleges such as CalTech (no change), Johns Hopkins, Duke University, Princeton and Stanford see very little variation. Comparatively, Harvard (shown in green) sees a slightly larger change at around -30, although it is not too dramatic.
Lowest admission rates
To truly judge Harvard’s Asian enrollment numbers, the school is further compared with other schools with low admission rates. To filter these schools correctly, both admission rate and Asian enrollment percentage had to be above 0; otherwise, the dataset would include invalid or unwanted entries.
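The validity filter described above can be sketched as follows, assuming ADM_RATE and UGDS_ASIAN are the relevant columns (assumed College Scorecard names) and using made-up rows that cover each invalid case.

```python
import numpy as np
import pandas as pd


def lowest_admission(df, n=10, rate_col="ADM_RATE", share_col="UGDS_ASIAN"):
    """Keep rows where both values are above 0, then take the n lowest rates.

    Comparisons against NaN evaluate to False, so missing values are
    removed by the same filter.
    """
    valid = df[(df[rate_col] > 0) & (df[share_col] > 0)]
    return valid.nsmallest(n, rate_col)


# Hypothetical rows: B has a zero rate, C has no Asian share, D has a zero share.
toy = pd.DataFrame({
    "INSTNM": ["School A", "School B", "School C", "School D", "School E"],
    "ADM_RATE": [0.05, 0.0, 0.04, 0.30, 0.10],
    "UGDS_ASIAN": [0.19, 0.10, np.nan, 0.0, 0.05],
})
low = lowest_admission(toy)
print(low[["INSTNM", "ADM_RATE", "UGDS_ASIAN"]])
```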
Fig 11 shows a table with the 10 schools having the lowest admission rates. It comes as no surprise that Harvard sits at third spot with an incredible 5.16%. The only schools higher ranked are Curtis Institute of Music and Stanford with 3.3% and 4.73% respectively.
Other well-known schools such as Princeton, Columbia, Yale and CalTech also make the list.
The table in fig 12 shows the schools sorted by highest Asian enrollment percentage. Harvard just about sneaks into this list sitting at number 10.
The bar plot in fig 13 shows the top 30 universities with Harvard marked in red at 10th spot. CalTech is by far the highest ranked followed by MIT and Johns Hopkins.
Finally, the most complete picture of the data is displayed in fig 14 which shows a histogram of the Asian enrollment percentage of the 256 schools with the lowest admission rates. The number 256 was picked since the analysis with the SAT scores in the previous section was also done using the top 256 schools.
As can be seen from the plot above, the distribution is highly skewed to the right, with the mean (8.77%) expectedly higher than the median (5.405%). Harvard sits well above the median at 19.42%, which puts it at a percentile of 87.11.
Ivy league
Another important way to assess Harvard’s admission practices is to compare it to its fellow Ivy league colleges.
Fig 15 shows the eight Ivy league colleges ordered by highest Asian enrollment percentage. Princeton leads the pack with 21.44% followed by University of Pennsylvania with 20.05% and Harvard with 19.42%. It is interesting to note that Harvard is in fact ranked higher here than five other Ivy league schools.
Statistical Inference
The population that was chosen for statistical analysis was universities with SAT scores above 700. There were several reasons for doing so:
- To satisfy the conditions for hypothesis testing, the sample size should ideally be less than 10% of the population.
- An SAT score threshold of 700 provides a dataset with 310 observations, out of which 30 can be randomly sampled, which in turn satisfies the criterion of having a sample size of at least 30 if the distribution is not normal.
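Fig 16 shows the Asian enrollment percentage distribution of the 310 schools with the highest SAT scores. The 95% confidence interval (CI), shown in black, is (-4.63%, 33.18%). Its width is consistent with the mean ± 1.96 standard deviations, so it describes the spread of individual schools rather than the uncertainty in the mean, and its negative lower bound is an artifact of the normal approximation since a percentage cannot fall below zero.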
Once again, Harvard’s percentage is higher than both the mean (14.28%) and median (11.32%).
A sample of 30 observations was randomly selected from the dataset, with the distribution shown in fig 17. Similar to the larger dataset, Harvard’s percentage is higher than both the mean and median, which makes sense as this is a random sample from the population.
For the 30 samples, the CI is (1.18%, 28.71%), which is also similar to that of the larger dataset. In the sample distribution, the mean and median are very close together, which is often an indicator of a normal or nearly normal distribution. The histogram in fig 17 does look much less skewed than the population distribution, but the graph is not completely normal.
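The sampling step and the interval calculation can be sketched as below. This is an assumption about how the intervals were obtained: the reported bounds are symmetric about the reported mean, which is consistent with mean ± 1.96 standard deviations, but the original code is not shown. The population here is synthetic, and ASIAN_PCT is a hypothetical column name.

```python
import numpy as np
import pandas as pd


def normal_interval(values, z=1.96):
    """Mean +/- z standard deviations (population SD, ddof=0; an assumption)."""
    arr = np.asarray(values, dtype=float)
    mean, sd = arr.mean(), arr.std()
    return mean - z * sd, mean + z * sd


def draw_sample(df, n=30, seed=0):
    """Random sample without replacement, enforcing the 10% condition."""
    if n >= 0.10 * len(df):
        raise ValueError("sample should be less than 10% of the population")
    return df.sample(n=n, random_state=seed)


# Synthetic stand-in for the 310 schools (evenly spaced made-up percentages).
population = pd.DataFrame({"ASIAN_PCT": np.linspace(1.0, 40.0, 310)})
sample = draw_sample(population)
low, high = normal_interval(sample["ASIAN_PCT"])
print(len(sample), low, high)

# Consistency check on the reported population interval: its midpoint
# should equal the reported mean of 14.28%.
reported_low, reported_high = -4.6271611337503007, 33.184257907943866
midpoint = (reported_low + reported_high) / 2
print(round(midpoint, 2))
```

A sample of 30 out of 310 is about 9.7% of the population, so it satisfies the 10% condition while still meeting the minimum sample size of 30.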
Both of the graphs above indicate that Harvard is in fact towards the higher end of the distribution when it comes to Asian enrollment percentage when compared with other schools with high SAT scores.
This theory is further explored with a hypothesis test. These are the hypotheses used:
- Null hypothesis (H0): population mean (μ) = Harvard’s score (19.42%)
- Alternative hypothesis (HA): μ < 19.42%
From this test, we are trying to determine whether Harvard’s enrollment percentage is unusually lower than the population mean or not. But since we have already seen that it is actually higher than the mean, the question is formulated in a different manner.
We are now trying to find out whether the population mean is unusually lower than Harvard’s score, which if true would suggest that Harvard’s Asian enrollment is in fact much higher than the average Asian enrollment.
Sample | t-statistic | p-value |
Entire population | -9.3695 | 7.93E-19 |
30 samples | -3.4326 | 0.000909 |
The table above shows the results of the hypothesis testing using both the entire population (310 observations) as well as the 30 samples.
When using the entire population, the p-value comes out to be a very small number, much less than any significance level that we would pick (the usual value is 5% or 0.05). This suggests that the null hypothesis can be rejected in favor of the alternative, which would mean that the population mean is significantly smaller than Harvard’s score. For the 30-sample dataset, the p-value is not as small but is small enough to safely reject the null hypothesis once again.
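The one-sided, one-sample t-test used here can be sketched as follows. The t-statistic is computed by hand and the p-value is taken from the lower tail of the t distribution, matching the alternative hypothesis μ < 19.42%; the five percentages are made up, and `one_sided_t_test` is a hypothetical helper rather than the original code.

```python
import numpy as np
from scipy import stats


def one_sided_t_test(values, mu0):
    """Test H0: mean = mu0 against HA: mean < mu0.

    Returns the t-statistic and the lower-tail p-value.
    """
    arr = np.asarray(values, dtype=float)
    n = arr.size
    standard_error = arr.std(ddof=1) / np.sqrt(n)  # sample SD, ddof=1
    t_stat = (arr.mean() - mu0) / standard_error
    p_value = stats.t.cdf(t_stat, df=n - 1)
    return t_stat, p_value


# Hypothetical Asian enrollment percentages for five schools.
toy_pcts = [10.0, 12.0, 15.0, 18.0, 20.0]
t_stat, p_value = one_sided_t_test(toy_pcts, mu0=19.42)
print(t_stat, p_value)
```

Recent SciPy versions expose the same test as `scipy.stats.ttest_1samp` with `alternative="less"`; the manual form is used here so that each step of the calculation is visible.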
Generally, only a sample of the population is selected for statistical analysis, the results of which need to be generalized to a larger population. However, the entire population can be analyzed for analytical insights without the goal of generalization.
Here are two main reasons why analyzing the entire population can be useful in this case:
- Small population size (310 observations)
- Uncommon characteristics between observations: Each college has unique attributes
Results
The table below summarizes the findings from before, highlighting Harvard’s rank and percentile during each analysis.
Analysis | Samples | Harvard Rank | Percentile |
Highest SAT scores | 1233 | 5 | 99.67 |
Highest Asian enrollment | 256 | 54 | 79.29 |
Highest Asian enrollment percentage | 256 | 35 | 87.62 |
Lowest admission rates | 256 | 10 | 87.11 |
Ivy league colleges | 8 | 3 | 75 |
As can be seen from the table, Harvard boasts one of the best combined SAT scores, coming in at an impressive rank 5 among the 1233 schools that report combined scores (out of the 7058 schools in the dataset), which is an astounding stat.
When the Asian enrollment percentage is analyzed from the top 256 schools with the highest SAT scores, Harvard comes in at rank 35 for the enrollment percentage and at 54 for the total enrollment of Asian students, both respectable numbers. In fact, the school is in the 87th percentile when it comes to enrollment percentage, much higher than the median.
When analyzed with the 256 schools with the lowest admission rates, Harvard placed at rank 10 with a percentile of 87.11, which are impressive numbers.
And when it came to Ivy league colleges, Harvard placed at third spot out of the eight schools. It ranked higher than other prestigious schools such as Cornell, Yale and Columbia.
Hypothesis testing shows that the Asian enrollment percentage population mean is likely much lower than that of Harvard since the p-value of the test is much smaller than the significance level of 5%.
These results can be interpreted as follows: if the population mean were equal to Harvard’s score (19.42%), the probability of observing a sample mean of ~15% or lower would be very small. This is essentially the definition of the p-value.
Discussion
The results from the descriptive statistics suggest that Harvard does in fact have a relatively higher Asian enrollment percentage than most other schools. It ranks above the 75th percentile among the top 256 schools based on SAT score and on lowest admission rate, coming in at rank 10 for the latter.
Harvard even ranks third when it comes to the eight Ivy league schools.
Statistical inference shows that Harvard’s enrollment percentage is clearly higher than the average value for the 310 schools that have a combined SAT score of more than 700.
The results of the hypothesis testing further confirm that not only is Harvard’s Asian student enrollment policy acceptable but it is also significantly better than that of the average school with an SAT score greater than 700.
Conclusion
Given all these results, statistically, it is clear that there is no discrimination in Harvard’s Asian student enrollment policies.
If an enrollment value of 19.42% is considered a low value, other renowned schools are likely to come under scrutiny as well.
References
[3] https://collegescorecard.ed.gov/data/documentation/