Many of you have likely already seen the maps of scientific fields generated based on citation information. In those visualizations, different scientific fields whose papers cite each other regularly get linked closely together on the map, and it produces a neat depiction of how different fields are related.
A recent PLoS ONE article by Johan Bollen et al. (original article, Nature News summary) generates a similar visualization using click-based data instead of citations. Each “clickstream” is an anonymized sequence of user requests for research articles, and the collected clickstreams are used to build a first-order Markov model of the clicks. For those who haven’t worked with Markov models before, a first-order model means that it calculates the probability that someone who’s clicked on an article from journal A will next click on an article from journal B, generating these probabilities for all possible journal pairs (it’s been several years for me, so my memory might be sloppy). It then applies some algorithmic foo which has the end result of arranging journals in particular fields such that those with high click-through probabilities with each other are positioned close to each other in 2D space.
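To make the first-order idea concrete, here is a minimal Python sketch of how those journal-to-journal probabilities could be estimated from clickstreams. This is my own illustration, not the paper's actual pipeline: the function name `transition_probabilities`, the journal names, and the toy clickstreams are all made up, and it only covers the counting step, not the layout step.

```python
from collections import Counter, defaultdict


def transition_probabilities(clickstreams):
    """Estimate P(next journal | current journal) from click sequences.

    Hypothetical helper: each clickstream is a list of journal names in
    the order one anonymized user requested articles from them.
    """
    counts = defaultdict(Counter)
    for stream in clickstreams:
        # First-order: only consecutive pairs (current, next) matter.
        for src, dst in zip(stream, stream[1:]):
            counts[src][dst] += 1

    probs = {}
    for src, dsts in counts.items():
        total = sum(dsts.values())
        # Normalize each row so the outgoing probabilities sum to 1.
        probs[src] = {dst: n / total for dst, n in dsts.items()}
    return probs


# Toy data, invented purely for illustration.
clickstreams = [
    ["Journal A", "Journal B", "Journal A", "Journal C"],
    ["Journal A", "Journal B", "Journal C"],
    ["Journal B", "Journal C", "Journal C"],
]

probs = transition_probabilities(clickstreams)
for src, row in sorted(probs.items()):
    print(src, row)
```

With this toy data, two of the three clicks leaving Journal A go to Journal B, so the estimated probability of A followed by B is 2/3. A real data set would have far more journals and would probably need some handling of rare transitions, which this sketch ignores.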
Using clicks instead of citations to generate these visualizations has some benefits and differences:
- Much more data
- The data is more recent, and you can easily get plenty of useful data from a specific time span
- It includes not just data from publishing researchers, but also end-users of the data, such as doctors, nurses, government officials, undergrads writing class reports, etc.
- It tends to be much more responsive to recent trends, which can be either a good or bad thing
I’m particularly interested in seeing how these sorts of maps may change over time. For example, I suspect that a few years from now you might see economics and “brain studies” more closely related to each other. I also find it kind of curious how “brain studies” and “brain research” are on totally different parts of the map — “brain studies” is close to cognitive science, language, and nursing (?), while “brain research” is over near physiology, animal behavior, and genetics. I’d like to see what actual journals are included in the two categories.
There are of course some privacy concerns, but it would also be neat to see how the maps compare between different institutions, or even different countries.
I do wish they had included computer science and engineering fields, though. I assume the omission is because of the sources they used, and I imagine one could get wider-ranging results with access to Google Scholar’s logs (::drools::). It’d be pretty cool to generate a video showing how the map evolves over the years (although where you’d get your data set is another story), with, say, computer science starting off on a branch with mathematics and electrical engineering, then moving to be more linked with things like physics, and eventually dragging fields like music, neuroscience, brain research, etc. next to it. While some changes in the map may be obvious, I imagine there would also be some surprises and sources of insight.