Bibliometrics: new methods in the age of big data

Bibliometrics will turn 100 next year. It is therefore not a recent invention of well-known media companies and database producers; rather, it emerged from the idea of supporting librarians in their core tasks of literature selection and holdings management. The basic idea and approach of bibliometrics was the systematic evaluation of journal articles with regard to the literature they used and cited. Nowadays, bibliometrics is an instrument for examining objective publication data, which is often used as performance data.

Classical bibliometrics

Classical bibliometrics measures the number of papers published. Determining the quality and significance of those publications, however, is trickier. Here, bibliometrics takes a straightforward route that is applied in practice to this day: the more a publication is noticed, the more important it is. Citation was chosen as the indicator to quantify this attention: a publication that is frequently cited in other publications counts as an important paper; a publication that is rarely or never cited counts as less important.
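As a minimal sketch of this counting idea, the following Python snippet tallies how often each publication is cited in a small set of citation links. The identifiers (`paper_a` and so on), the sample data and the function name `count_citations` are assumptions invented for illustration; they do not come from any real citation database.

```python
from collections import Counter


def count_citations(citation_links):
    """Count how often each publication appears as the cited side of a link."""
    return Counter(cited for _citing, cited in citation_links)


# Hypothetical (citing, cited) pairs, invented for illustration.
links = [
    ("paper_b", "paper_a"),
    ("paper_c", "paper_a"),
    ("paper_c", "paper_b"),
]

counts = count_citations(links)

# List the publications from most cited to least cited.
for paper, n in counts.most_common():
    print(paper, n)
```

A publication that is never cited, such as `paper_c` here, simply does not appear in the ranking, which mirrors the classical view that uncited work carries less weight.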

New methodology of bibliometrics: usage metrics

Although the era of “classical” bibliometrics is not over, new methods are appearing that are bound to take over from the old approaches one day. After all, the publication of academic articles in electronic form, whether it be an article in a journal, an electronic book, a blogpost, a chat or a multimedia article in an undetermined media format, has long been an established form of academic communication and the distribution of knowledge. As a result, new bibliometric methods have developed.

“Usage metrics”, for instance, takes a fundamentally different approach from classical citation bibliometrics. For the first time, the significance of academic publications no longer has to be determined solely through indirect inference from the citation rate; instead, usage can be correlated with importance. Usage metrics makes it possible to establish whether a publication has been downloaded and to evaluate statistically how long the user spends in the document and how it is used (e.g. highlighting or copying). The forwarding of a document or its sharing with others on social media can also reveal a lot about the importance of an academic article.
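To illustrate what such usage data might look like once it is collected, here is a small Python sketch that groups usage events by document and counts each type of use. The event names (`download`, `highlight`, `share`), the sample log and the function `summarize_usage` are hypothetical; real platforms record usage in their own formats.

```python
from collections import defaultdict


def summarize_usage(events):
    """Group usage events by document and count each type of use."""
    summary = defaultdict(lambda: defaultdict(int))
    for document, kind in events:
        summary[document][kind] += 1
    # Convert the nested defaultdicts into plain dictionaries.
    return {doc: dict(kinds) for doc, kinds in summary.items()}


# Hypothetical usage log of (document, type of use), invented for illustration.
usage_log = [
    ("article_1", "download"),
    ("article_1", "highlight"),
    ("article_1", "share"),
    ("article_2", "download"),
    ("article_1", "download"),
]

usage = summarize_usage(usage_log)
print(usage)
```

Counting events per type keeps downloads, highlights and shares separate, so each kind of use can later be weighed differently when judging importance.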

Altmetrics – bibliometrics using big data technology

In the case of altmetrics¹, the use and combination of all freely available web data and the application of big data technologies to the system of publications and their measurement lead to new insights:

 “Because the internet makes such information available all over the world and the technology has become second nature to us, the digital public can report on things, people, experiences and events in real time.” ²

Half of all academic articles from the European Union (EU) are freely available on the web. As freely accessible academic web content on this side of the paywall continues to grow, new possibilities have long existed for judging the significance of academic output more directly and more accurately.

New challenges for bibliometrics

The development of a digital academic environment is much more than simply the use of a new medium for the transportation of old content. It is a revolution in the system of knowledge acquisition, its communication and the evidence it yields.

Tracking the digital footprint that all users leave on the web will enable more detailed assertions than was possible for paper-based publications. Determining the number of (online) readers, comments, tags, bookmarks or mentions in blogs and tweets suggests how much potential webometrics will have with alternative metrics in future. The more data there is on scientists and their output in a wide range of forms, the easier it is to make assertions about the significance of publications and their authors in an automated manner with the aid of algorithms.


Bibliometrics in the age of big data

In the future, algorithmic systems will conjure truthful assertions from the colossal amounts of data on the web without the aid of bibliometrics specialists. These assertions will emerge without anyone painstakingly extracting, structuring or storing data beforehand, and without relational databases being filled with that data and specific categories being created to sort it out again.

The amount of data on people and their assertions and performances on the web, such as the contents of websites, professional networks, social networks, blogs and chats, but also data from smartphones and other mobile devices, all the way to data from lifelogging systems, is ballooning. As a result, an increasingly clear profile of the individual scientist emerges, which enables inferences to be drawn about his or her academic importance.

The individual indicators, which have so far been difficult to ascertain within classical bibliometrics, therefore take a back seat, as general profiling can lead to a far more accurate and comprehensive picture of a scientist’s capability, significance and work. The classical bibliometric indicators will become obsolete and may be replaced with a digital “scientist score”: a value that automatically takes into account and combines all the available web data on a scientist.
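How such a “scientist score” would actually be calculated is left open here. Purely as an assumption, the following Python sketch combines a few signals into one number using a weighted sum. The signal names, the values, the weights and the function `scientist_score` are all invented for illustration and are not a proposed or established formula.

```python
def scientist_score(signals, weights):
    """Combine several signals into one number as a weighted sum.

    Signals that have no entry in `weights` are ignored (weight 0).
    """
    return sum(weights.get(name, 0.0) * value for name, value in signals.items())


# Hypothetical web data on one scientist, invented for illustration.
signals = {"citations": 120, "downloads": 3000, "social_shares": 45}

# Hypothetical weights: how much each signal contributes to the score.
weights = {"citations": 1.0, "downloads": 0.01, "social_shares": 0.2}

score = scientist_score(signals, weights)
print(score)
```

The choice of weights decides everything in such a scheme, so any real score of this kind would depend heavily on who sets them and on what grounds.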


The publication “Bibliometrie im Zeitalter von Open und Big Data. Das Ende des klassischen Indikatorenkanons” by Rafael Ball was released in 2015 by the publisher Dinges und Frick. It is available from bookstores and can be ordered on the Knowledge Portal.

¹ An article on altmetrics can be found on this blog here.

² Bunz, Mercedes: Die stille Revolution: wie Algorithmen Wissen, Arbeit, Öffentlichkeit und Politik verändern, ohne dabei viel Lärm zu machen. Berlin: Suhrkamp 2012, p. 145.

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International Public License.


DOI Link: 10.16911/ethz-ib-1917-en
