DREaM event 3: Introduction to webometrics
Professor Mike Thelwall from Wolverhampton University presented a workshop session introducing webometrics on 30th January 2012 at the second DREaM workshop.
Professor Thelwall provided a preview of this session in a short interview.
You can also view this presentation on Slideshare.
You can also view this video on Vimeo.
Professor Mike Thelwall provided an introduction to webometrics by outlining the types of data available from the web and the types of research question that can be answered with this data.
He observed that an increasing proportion of the population use and create content for the social web, so there is the opportunity to gather data from the social web as an alternative to more traditional survey and interview methods. It can be much quicker to get insights this way, particularly when ethical approval for a questionnaire can take ages. However, unless you are looking at a web-related question, your sample will be biased if you rely solely on web metrics.
Thelwall provided examples of webometrics projects, including a project which sought to establish the extent to which life science researchers are internationally recognised. This examined the links to the web sites of a number of life science research groups and assessed where those links came from, based on the domain name. The researchers found that the web sites of life science research groups in Hungary were mainly linked to by other Hungarian web sites, suggesting that they are not well recognised internationally, whilst countries like Germany and Britain had a wider range of international incoming links. This work was then followed up via more traditional surveys to establish whether the research groups involved are well known internationally off the web.
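The domain-based classification described above can be sketched in a few lines of Python. This is only an illustration, not the method or software used in the project: the function names and example URLs are hypothetical, and it assumes the last label of the host name (such as "hu" or "de") is an adequate proxy for where a link comes from, which breaks down for generic domains like .com or .org.

```python
from collections import Counter
from urllib.parse import urlparse


def source_tld(url):
    """Return the last label of the linking page's host name (e.g. 'hu')."""
    host = urlparse(url).hostname or ""
    return host.rsplit(".", 1)[-1].lower() if "." in host else ""


def inlink_profile(inlinks, home_tld):
    """Count incoming links by top-level domain and compute the domestic share.

    `inlinks` is a list of URLs of pages that link to a research group's site;
    `home_tld` is the group's own country-code domain (an assumption of this sketch).
    """
    counts = Counter(source_tld(url) for url in inlinks)
    total = sum(counts.values())
    domestic = counts.get(home_tld, 0)
    return {
        "by_tld": dict(counts),
        "domestic_share": domestic / total if total else 0.0,
    }


# Hypothetical incoming links to a Hungarian research group's web site.
example_inlinks = [
    "http://www.example.hu/page",
    "http://lab.example.hu/",
    "https://www.example.de/x",
    "http://example.ac.uk/y",
]
profile = inlink_profile(example_inlinks, "hu")
print(profile["by_tld"], profile["domestic_share"])
```

A high domestic share would point the same way as the Hungarian finding above, but, as the follow-up surveys in the project suggest, it is an indication to investigate rather than a conclusion.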
Similar research allowed Thelwall to use web links to show how collaboration and links between EU universities occur. He identified that language and borders have an impact on the likelihood that groups will collaborate. There is a systematic bias towards some subject areas, such as computer science, which are likely to have a greater volume of web content and links than other domains such as the humanities. Even so, this methodology gives a quick indication of patterns, which can then be investigated further by more traditional means.
Thelwall moved on to discuss the idea of altmetrics, which challenges traditional ways of evaluating the impact of research, such as citation data or the H-index. Altmetrics looks for web-based indications that your research has made an impact, particularly outside the traditional scholarly community. He described the integrated online impact indicator (IOI), which combines a range of web-based methods and online sources to provide a single indicator of online impact. He discussed how this has been used to inform decisions about the value of spending time writing certain types of report.
Next, Thelwall outlined sentiment analysis and the SentiStrength program, which analyses text on the web to determine whether it is positive or negative in sentiment. It works from a list of 2,489 sentiment-related words, each of which is given a numerical score to grade how positive or negative it is. The program looks for words from this list and records the highest positive and the highest negative score in the text. Thelwall has used this to identify sentiment about major media events using Twitter posts in English. Interestingly, he found that very negative events are typified by small increases in negativity.
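As a rough illustration of the word-list approach described above, the following Python sketch scores a text by looking up each word in a small lexicon and keeping the strongest positive and strongest negative score it finds. It is not SentiStrength itself: the six-word lexicon, its scores, the `score_text` name and the choice to return 0 when no sentiment word is present are all assumptions made for the example.

```python
import re

# Toy lexicon standing in for the real word list; words and scores are invented.
TOY_LEXICON = {
    "love": 3,
    "great": 3,
    "good": 2,
    "bad": -2,
    "hate": -4,
    "awful": -4,
}


def score_text(text, lexicon=TOY_LEXICON):
    """Return (strongest positive score, strongest negative score) for a text.

    Each is 0 when the text contains no word of that polarity
    (an assumption of this sketch, not a documented convention).
    """
    words = re.findall(r"[a-z']+", text.lower())
    scores = [lexicon[word] for word in words if word in lexicon]
    positive = max((s for s in scores if s > 0), default=0)
    negative = min((s for s in scores if s < 0), default=0)
    return positive, negative


print(score_text("I love the music but hate the awful ending"))
print(score_text("The train arrives at noon"))
```

Reporting the two scores separately, rather than adding them together, keeps a text that is both strongly positive and strongly negative distinct from one that is simply neutral.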
Finally, Thelwall discussed how to analyse the sentiment in comments on YouTube videos using the Webometric Analyst program. This makes it easy to create graphs of the comments surrounding an individual YouTube video, giving a picture of the discussions taking place, including the age and gender of commenters. Thelwall had set homework for the group prior to the session, asking them to comment on a particular YouTube video, which he then analysed using the software.
Circles represent contributors: blue for male, pink for female, and white for unknown. Circles are annotated with usernames and ages (possibly fake in some cases!). Arrows between circles reflect a reply posted by one commenter to another: red indicates a positive tone, black a negative tone, and grey a neutral tone.
Workshop participants were interested in the ethics of collecting web data in this way. Thelwall responded that, in his view, if data is publicly available on the web, you don’t have to ask permission to analyse it.
Thelwall concluded by summing up the advantages of using web metrics, emphasising the speed with which you can get data. However, he also warned that the downside is that the sample is often poor, so the value of webometrics as a methodology depends on the nature of your research question.
If you would like to comment on this presentation, please join the discussion in the DREaM online community.