Wikipedia-Mining Algorithm Reveals World’s Most Influential Universities


It is hard for academics and governments alike to know which of the world’s universities are the most influential, yet many would benefit from that information. An objective ranking is almost impossible to create, because most people bring some bias to the task, and geography, history and culture all play a part. According to the MIT Technology Review (2015), however, an algorithm has recently been created that works out which universities are the most influential. The MIT Technology Review reports that:

“Today, we get such a ranking thanks to the work of Jose Lages at the University of Franche-Comte in France and a few pals. They’ve used the way universities are mentioned on Wikipedia to produce a world ranking.”

The ranking is interesting because it helps to show how easily bias can be introduced when ranking systems are produced. One example is the fact that English has become, for one reason or another, the main language of science.

It is very difficult to gauge how much of an advantage this gives universities in countries where English is the first language. Rankings are also difficult to compile because institutions focus on different things: some prioritise teaching, while others are much more grounded in research. This makes it hard for any ranking to decide how these factors should be weighed.

The new ranking uses an algorithm similar to the one Google uses to order websites in its search results for different keywords. Incoming links matter to Google when it decides which sites are most important, because more links are taken to indicate a more authoritative site.

Consequently, the algorithm ranks sites with many incoming links as important, and a link from an important site in turn counts for more when the other nodes are ranked. The new ranking applies the same type of methodology to Wikipedia articles.
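The link-based idea described above can be illustrated with a minimal sketch of a PageRank-style calculation. This is a simplified illustration and not the research team’s code: the page names, the `pagerank` function and the damping value of 0.85 are all assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=200, tol=1e-12):
    """Return a PageRank-style score for every node in a link graph.

    `links` maps each page to the list of pages it links to.
    """
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Score held by pages with no outgoing links is shared evenly.
        dangling = sum(rank[node] for node in nodes if not links.get(node))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}

        # Each page passes an equal share of its score along every link.
        for source, targets in links.items():
            if not targets:
                continue
            share = damping * rank[source] / len(targets)
            for target in targets:
                new_rank[target] += share

        change = sum(abs(new_rank[node] - rank[node]) for node in nodes)
        rank = new_rank
        if change < tol:
            break
    return rank


# Hypothetical toy network: article names are invented for illustration.
links = {
    "University A": ["University B"],
    "University B": ["University A"],
    "Scientist": ["University A", "University B"],
    "City": ["University A"],
}
ranks = pagerank(links)
ordered = sorted(ranks, key=ranks.get, reverse=True)
print(ordered)
```

In this toy network the page with the most incoming links ends up with the highest score, and pages that nothing links to receive only the small baseline score.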

The creators of the ranking looked at how universities are mentioned in Wikipedia articles, treating each article as a node in the network, and used the links pointing to each university to decide its ranking. This was done across 24 Wikipedia language editions, which account for 59% of the world’s population and 68% of all Wikipedia articles in all languages. As a first step, the team ranked each university within each language edition.
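This article does not say how the per-language rankings were merged into one world list. One simple possibility, assumed here purely for illustration, is a position-based score in which a university earns more points the higher it appears in each edition, and the points are summed across editions. The edition codes, university names and the `combine_rankings` function below are hypothetical.

```python
def combine_rankings(per_edition, top_n=100):
    """Merge per-edition rankings into one list using position-based points.

    `per_edition` maps an edition code to a list of universities, best first.
    A university at position p (1-based) within the top `top_n` of an edition
    earns `top_n + 1 - p` points; anything below the cut-off earns nothing.
    """
    scores = {}
    for ordered in per_edition.values():
        for position, university in enumerate(ordered[:top_n], start=1):
            scores[university] = scores.get(university, 0) + (top_n + 1 - position)
    # Highest total first; ties are broken alphabetically for stable output.
    combined = sorted(scores, key=lambda u: (-scores[u], u))
    return combined, scores


# Invented data: three editions, three made-up universities.
per_edition = {
    "en": ["Alpha", "Beta", "Gamma"],
    "fr": ["Beta", "Alpha", "Gamma"],
    "de": ["Alpha", "Gamma"],
}
combined, scores = combine_rankings(per_edition, top_n=3)
print(combined, scores)
```

A scheme like this rewards universities that rank well in many language editions rather than very highly in only one, which is one way to limit the influence of any single language.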

Wikipedia Ranking

Looking at the top 10 universities when ranked in this way, Cambridge, Oxford, Harvard, Columbia and Princeton make up the top five. The rest of the top ten consists solely of universities from the United States: in order, Massachusetts Institute of Technology, Chicago, Stanford, Yale and University of California, Berkeley. The top 100 can be seen on this list.

This is interesting when compared with a more conventional ranking, which lists the top 10 in the following order: Harvard, Stanford, University of California Berkeley, MIT, Cambridge, California Institute of Technology, Princeton, Columbia, Chicago and Oxford. As noted by the research team that created it, the Wikipedia list tends to place greater emphasis on universities that have had more of a cultural impact.
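The two top 10 lists quoted in this article can be compared directly. The short sketch below uses only the names and orders given above, with abbreviated labels such as "Berkeley" and "Caltech" chosen for brevity, and shows which universities appear in both lists and how far each one moves.

```python
# Top 10 lists as quoted in this article, best first (labels abbreviated).
wikipedia = ["Cambridge", "Oxford", "Harvard", "Columbia", "Princeton",
             "MIT", "Chicago", "Stanford", "Yale", "Berkeley"]
conventional = ["Harvard", "Stanford", "Berkeley", "MIT", "Cambridge",
                "Caltech", "Princeton", "Columbia", "Chicago", "Oxford"]

# Universities that appear in both lists, kept in Wikipedia order.
shared = [u for u in wikipedia if u in conventional]
only_wikipedia = [u for u in wikipedia if u not in conventional]
only_conventional = [u for u in conventional if u not in wikipedia]

# Positive shift: the university sits higher in the Wikipedia list.
shift = {u: conventional.index(u) - wikipedia.index(u) for u in shared}

print(len(shared), only_wikipedia, only_conventional)
for university, places in sorted(shift.items(), key=lambda item: -item[1]):
    print(f"{university}: {places:+d}")
```

The shift values make the pattern easy to see: the two British universities sit several places higher in the Wikipedia list than in the conventional one.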

Additionally, the Wikipedia list covers a greater diversity of countries, including universities in Africa, and gives higher rankings to Japanese and Indian universities. Interestingly, in the Wikipedia-based list the US has the most top-ranked universities, followed by Germany and then the UK, whereas conventional rankings tend to show the US, followed by the UK and then Australia.

The new ranking is interesting even though it does not cover every language; Ukrainian, for example, is not included. However, it is noted that bias is introduced simply because anyone can easily edit a Wikipedia article.

Nonetheless, it provides food for thought on how rankings might be carried out in the future and introduces new ideas for producing such lists.