PhD topics

Below is a list of PhD topics that our academic staff have suggested as study opportunities for PhD students to undertake. If you wish to develop a proposal around one of these topics, please contact the relevant member of staff to discuss this, and copy in

Note that you are not required to undertake one of these projects to study for a PhD with us; you can also formulate your own proposal by following these instructions.

Also available is the central University's Find a Supervisor tool, if you want to search across the whole University.

Academics' perceptions of literacy and being literate

Contact: Dr Peter Stordy

Digital technologies have transformed what it means to be literate and to experience literacy. Various literacies have been coined to capture this transformation, ranging from established literacies such as computer literacy, information literacy, digital literacy, media literacy and internet literacy to newer conceptions such as transliteracy, metaliteracy and multimodal literacy. With varying degrees of success, some scholars have attempted to categorise these literacies (e.g. Addison & Meyers, 2013; McClure, 1994; Spitzer et al., 1998; Bawden, 2001; Savolainen, 2002; Lonsdale and McCurry, 2004; Stordy, 2015), but how do academics themselves perceive the myriad of literacies and literacy types? What do they understand being literate in the 21st century to mean?

Addressing the needs of dementia patients’ carers: a national profiling exercise to ensure long-term wellbeing

Contact: Dr Laura Sbaffi

According to Alzheimer’s Research UK (2015) and the Alzheimer’s Society (2014), there are 850,000 people in the UK living with different stages of dementia, and 80% of them are being looked after by a friend or family member. This translates into roughly 700,000 people who have had to put their own lives on hold and who often experience severe physical, emotional, psychological and financial distress as a result of their role as carers.

The overarching idea behind this research project is to identify segments (i.e. typologies) of dementia carers based on a number of contributing factors, such as geo-demographics, behaviours, attitudes and needs, in order to establish an up-to-date national picture. This exercise will be conducted via a quantitative research tool (i.e. a survey) distributed through a number of charities and associations, complemented where needed by interviews and focus groups to shed light on the more complex aspects.

Boots et al. (2015) showed that “early therapeutic interventions could help caregivers identify their needs, and focus on enhancement of the positive, intact experiences to prevent caregiver burden”, suggesting that carers of patients with mild or recently diagnosed dementia could benefit most from this research.

I am very interested in supervising a PhD project that investigates the experiences and issues of dementia patients’ carers, especially one that explores:

  • a review of the literature on the current situation of carers around the UK;
  • how people deal (if at all) with their role as carers;
  • the key aspects that might emerge from a survey of a national sample of carers;
  • the formulation of an instrument that can help healthcare professionals to identify needs and steer carers towards personalised coping solutions.

Affordances of popular digital games in education

Contact: Dr Peter Stordy

Gamification typically describes the use of game elements, or of purpose-built games, to promote learning outcomes. However, popular digital games such as Pokémon Go and World of Warcraft arguably already do this, and this area has received less attention in the literature.

Bottom-up ICT innovation

Contact: Dr Christopher Foster

I am interested in the shifting concept of ‘innovation’. I have previously researched strategy and policy that might allow ICT innovations to be more socially inclusive.

Possible areas of focus:

  • Localised and/or informal ICT innovation amongst marginal groups, and how this might be captured and scaled.
  • How do digital firms use customers, users or intermediaries as a source of innovation?
  • What policy conditions drive (ICT) innovation to become more inclusive?

Cultures of data science practice

Contact: Dr Jo Bates

How do cultures of practice influence how data science is done? How do these cultural factors shape the outputs of data science projects? What are the actual and potential implications of these cultural dynamics? This topic could be approached from a variety of perspectives (e.g. cultural economy, feminist) and would likely use ethnographic or similar methods. I am interested in supervising projects that explore these questions with a focus on specific empirical cases. There are many possible cases; however, preference will be given to novel project ideas that offer realistic opportunities for empirical data collection. This project is suitable for students with an academic background in the social sciences or humanities (e.g. sociology, anthropology, cultural studies, politics), and applicants should have some knowledge of social/cultural theory.

Data Journeys/Data Frictions

Contact: Dr Jo Bates

How do interrelated socio-material forces shape the movement of data between different people, organisations and sectors? What socio-material forces slow down, obstruct and block data movements? How do emergent data flows bring social actors into new types of relation with one another? How ought these emergent data flows to be theorised in order to inform our understanding of the emergent dynamics of power, structure and agency in an era of datafication?

I am interested in supervising projects that explore these questions with a focus on specific empirical cases. There are many possible cases; however, preference will be given to novel project ideas that offer realistic opportunities for empirical data collection. This project is suitable for students with an academic background in the social sciences or humanities (e.g. sociology, anthropology, cultural studies, politics), and applicants should have some knowledge of social/cultural theory.

Digital ecosystem

Contact: Dr Angela Lin

The rapid development of smart, connected devices and the services built upon them is gradually changing and blurring organisational, social and temporal boundaries. An ecosystem approach to managing IT systems, business partners and strategy has been proposed to replace the traditional approach. This new approach requires different ways of thinking about challenges and planning strategy. Topics of interest in this area include, but are not limited to:

  • Consumer behaviour in digital ecosystems
  • The role of new digital ecosystems in the organisational context
  • Organisational, social and ethical issues arising with new digital ecosystems
  • Privacy and confidentiality issues of digital ecosystems (with Dr Jonathan Foster)

Digital transformations and organisations

Contact: Dr Jorge Martins

I am interested in the ubiquity of digital technology and its implications for work practices and organisational processes. Possible areas of focus include:

  • How does the socio-materiality of digital technology shape work practices and organisational arrangements?
  • What kinds of organisational and institutional implications does the making of data-based goods and services carry in terms of business models, work practices and organisational structures?

Digital transformation in the public sector

Contact: Dr Angela Lin

Governments around the world are taking advantage of digital technologies with the aim of improving internal efficiency and providing quality services to their citizens. The management of government IT systems and IT projects is not easy, and the outcomes of IT systems development and implementation can sometimes be disappointing. I am interested in any projects focusing on digital transformation initiatives in the public sector.

Food logging

Contact: Dr Andrew Cox and Pam McKinney

Internationally, governments are recognising that obesity is a major health challenge for this century, and people are becoming more aware of the influence of diet on their health. Yet in a time of economic austerity, resources to support healthcare are stretched, and it is vital that innovative methods of health information provision are investigated. The increasing availability of mobile health applications is of great interest, both for informing people about their own health and for promoting improved self-management. Diet and fitness tracking apps are increasingly popular as a form of food logging: the activity of recording food intake and monitoring weight and other health conditions that may be affected by diet, using applications (apps) accessed through mobile devices and personal computers. MyFitnessPal, for example, has amassed 75 million registered users worldwide. Tracking what one eats has long been recognised as a way to improve diet and support outcomes such as weight and symptom management, and an app is probably more effective than a paper-based diary. But we need to know much more about how people weave food logging into their daily lives. Evidence of the practical benefits of food logging has to be set in the context of controversies around the quantified self movement and, more widely, of critical debates around “big data”.

We are very interested in supervising a PhD project that investigates experiences of food logging, especially one that explores:

  • What information literacy means in the context of food logging;
  • How food logging relates to other forms of quantified self, such as activity tracking;
  • How food logging varies in specific situations, e.g. in the context of a medical condition like diabetes or a practice such as running. This could involve working with relevant third sector organisations;
  • How food logging integrates with wider information behaviour around diet, health and fitness.

Globalisation of firms, firm value chains and digital technologies

Contact: Dr Christopher Foster

Global production is increasingly fragmented and often includes small and marginal producers from developing and emerging nations. Recent literature, particularly on value chains and global production networks, has explored the potential for firms and regions that are part of such fragmented networks to improve their position and ‘upgrade’.

With the growth of digital ICTs and connectivity, we are seeing a digitisation of value chains right down to the smallest firms and producers, and a growth of relevant online services and platforms. However, we know little about the impact of digital technologies in these networks, particularly on smaller producers and firms.

Potential questions to explore include:

  • Do digital ICTs and connectivity support smaller firms upgrading (or downgrading) in value chains?
  • How can regions build policy around digital technologies to improve their position in global production networks?
  • Are there new constraints on firms in value chains as digitisation and platforms emerge?
  • Are online services and platforms enabling new types of value chain for small firms?

I would be particularly interested in studies that explore changes in more established economic sectors (e.g. apparel, agribusiness).

Globally Distributed Collaborative Work

Contact: Dr Pamela Abbott

I am interested in research about how globally distributed teams (e.g. agile software development teams) collaborate and innovate. Increasingly, global software outsourcing is being undertaken to produce innovative outcomes. Firms are outsourcing IT services not only to gain cost and scale advantages but also to increase their innovative capacity by leveraging the cutting-edge and often entrepreneurial expertise of small global firms. These practices are very often fraught with difficulties related to distance, time and geographical separation, as well as cultural and knowledge differences. I have investigated some instances of these issues in Chinese software and services outsourcing firms and found various “work-arounds” and collaborative strategies (Abbott, Zheng, Du, & Willcocks, 2013; Abbott, Zheng, & Du, 2014; Zheng & Abbott, 2013). There is also a wide range of research on this topic from well-established authors in the field (e.g. Hinds & Kiesler, 2002; Levina & Vaast, 2013; O’Hara-Devereaux & Johansen, 1994).

Some possible research questions:

  • How do collaborative work practices emerge in distributed teams?
  • How do the characteristics of distributed environments (time-space separation, knowledge, status and cultural differences) influence the efficacy of collaborative work?
  • How does the nature of work (e.g. software development) change/adapt/transform when in distributed settings and what contributes to these changes?
  • How do working relationships contribute to changes in work practices when influenced by distributed environments?

Abbott, P., Zheng, Y., Du, R., & Willcocks, L. (2013). From boundary spanning to creolization: A study of Chinese software and services outsourcing vendors. The Journal of Strategic Information Systems, 22(2), 121–136.

Abbott, P., Zheng, Y., & Du, R. (2014). Collaboration, learning and innovation across outsourced services value networks: software services outsourcing in China. Cham: Springer.

Hinds, P., & Kiesler, S. (2002). Distributed Work. MIT Press.

Levina, N., & Vaast, E. (2013). A Field-of-Practice View of Boundary Spanning in and across Organizations: Transactive and Transformative Boundary Spanning Practices. In J. Langan-Fox & C. Cooper (Eds.), Boundary-Spanning in Organizations: Network, Influence and Conflict (pp. 285–307). New York: Routledge.

O’Hara-Devereaux, M., & Johansen, R. (1994). Globalwork: Bridging Distance, Culture and Time. San Francisco, California: Jossey-Bass.

Zheng, Y., & Abbott, P. (2013). Moving Up the Value Chain or Reconfiguring The Value Network? An Organizational Learning Perspective On Born Global Outsourcing Vendors. In ECIS 2013 Completed Research (Paper 162). Utrecht, Netherlands.

How can governance contribute to the effective handling of information and data in organizational and social media contexts?

Contact: Dr Jonathan Foster

How can information governance contribute to organisations' handling of their information and data assets?

Contact: Dr Jonathan Foster

ICTs, Development and Globalisation

Contact: Dr Pamela Abbott

I am interested in studies investigating phenomena around ICTs and development, i.e. the contested relationship between the development of ICT initiatives in poor, underdeveloped communities and the resulting influence this may have on development efforts in those environments. This topic relates to ICTs and globalisation more generally, where we see the emergence of socio-technical innovations that either work well in their contexts of implementation or become caught up in complex institutional arrangements that inhibit their usefulness. Some specific topics in this area may include:

  • Social entrepreneurship projects in developing countries that are ICT-enabled or have a significant component of ICT infrastructure involved
  • Development of ICT infrastructure to support ICT-enabled Research and Education initiatives
  • Failed ICT initiatives in developing countries with analysis of causes of failure
  • Studies looking at ICTs intended to enhance healthcare provision or wellbeing in underserved communities, to determine how they are appropriated by end-users
  • Studies looking at technology innovation emerging from developing country contexts
  • Studies looking at the appropriation of technology to deal with social problems such as conflict, forced migration, social exclusion and financial exclusion

The impact of digitisation on micro, small, or medium companies

Contact: Dr Angela Lin

The democratisation of digital technologies has enabled micro businesses and SMEs to access forms of capital that were not available to them before. However, evidence shows that not all businesses can take advantage of digital technologies, and those unable to do so are lagging behind those that can. I am interested in projects that investigate the impacts of digitisation on businesses, and businesses' digital strategies for the digital economy.

Impact Sourcing

Contact: Dr Pamela Abbott

I am interested in research about models of global sourcing that aim to improve the socio-economic conditions of the local contexts in which outsourcing service providers operate. For example, if a multinational firm decides to offshore its IT service provision to India (as a case in point) and sets up a captive centre in a remote town where it hopes to make a positive impact on the economy and social life of the community, this would provide fertile ground for an impact sourcing study. I have studied such cases in the past, publishing my observations in two papers (Abbott, 2005; Abbott & Jones, 2012), and have also looked at how a lack of engagement with local contexts could negatively affect social relations in communities where global sourcing was a key source of economic development (Suri & Abbott, 2012). The other references listed below provide good resources for studies of impact sourcing (Babin & Nicholson, 2013; Carmel, Lacity, & Doty, 2014; Lacity, Rottman, & Carmel, 2012; Sandeep, 2015).

Some possible research questions:

  • How do firms that practise impact sourcing reconcile the competing ethical positions of profit motive and socio-economic improvement?
  • How do we effectively evaluate the development impact of impact sourcing ventures?
  • How do impact sourcing ventures demonstrate sensitivity to local contexts when engaging in social improvement activities?

Abbott, P. Y. (2005). Software export strategies for developing countries: A Caribbean perspective. The Electronic Journal of Information Systems in Developing Countries, 20.

Abbott, P. Y., & Jones, M. R. (2012). Everywhere and nowhere: nearshore software development in the context of globalisation. European Journal of Information Systems, 21(5), 529–551.

Babin, R., & Nicholson, B. (2013). Sustainable Global Outsourcing: Achieving Social and Environmental Responsibility in Global IT and Business Process Outsourcing. Basingstoke, Hampshire: Palgrave Macmillan.

Carmel, E., Lacity, M. C., & Doty, A. (2014). The Impact of Impact Sourcing: Framing a Research Agenda. In R. Hirschheim, A. Heinzl, & J. Dibbern (Eds.), Information Systems Outsourcing: Towards Sustainable Business Value (pp. 397–429). Berlin, Heidelberg: Springer Berlin Heidelberg.

Lacity, M. C., Rottman, J. W., & Carmel, E. (2012). Emerging ITO and BPO Markets: Rural Sourcing and Impact Sourcing. ISBN 9780769549187.

Sandeep, M. S. (2015). Innovations in outsourcing: the emergence of impact sourcing.

Suri, G. S., & Abbott, P. Y. (2012). IT cultural enclaves and social change: the interplay between Indian cultural values and Western ways of working in an Indian IT organization. Information Technology for Development, 1–22.

Information Systems, strategy practices and performativity

Contact: Dr Jorge Martins

I am interested in Information Systems strategy as a social activity, something organisational actors do in practice, and the processes employed to accomplish it. Possible areas of focus include:

  • What is the praxis of the effective Information Systems professional (e.g. Chief Information Officer, Information Systems consultant, change manager) as a strategic thinker, relationship builder and negotiator?
  • How is strategic influence won or lost through discursive practices in the context of how information technologies are adopted and adapted?

Internet infrastructure and digital inequality

Contact: Dr Christopher Foster

Underlying our connected world is an internet infrastructure marked by opaque agreements and conflicting interests. These processes are largely invisible, but they shape the ways we interact with the internet and the ways that digital information is used and consumed.

I’m interested in studies that explore some of the processes around internet infrastructure, particularly studies that look at how infrastructural decision-making affects digital divides, digital inequality or regional connectivity.

Such work can readily draw on theories of infrastructure from information systems, actor-network theory or the wider social science literature on infrastructure.

Examples of possible areas of focus:

  • Exploring public-private projects around internet fibre, and their impacts on digital information.
  • The emergence of new infrastructure components (such as content delivery networks and internet exchanges) and how decisions around these are affecting digital inequality.
  • Activist-driven and open internet infrastructure.

Legal text mining

Contact: Dr Nikolaos Aletras

In his work investigating the potential use of information technology in the legal domain, Lawlor surmised that computers would one day be able to analyse and predict the outcomes of judicial decisions [1]. He also stated that reliable prediction of the activity of judges would depend on a scientific understanding of the ways that the law and the facts affect the relevant decision-makers, i.e. the judges. Text-based predictive systems for judicial decisions can offer lawyers and judges a useful assistive tool [2]. Such systems may be used to rapidly identify cases and extract patterns that correlate with certain outcomes. They can also be used to prioritise decision-making on cases where violations of the law seem very likely. This may reduce court delays and encourage more applications from individuals who might otherwise have been discouraged by the expected waiting times.

[1] R.C. Lawlor (1963). What computers can do: analysis and prediction of judicial decisions. American Bar Association Journal.
[2] N. Aletras, D. Tsarapatsanis, D. Preoţiuc-Pietro, V. Lampos (2016). Predicting Judicial Decisions of the European Court of Human Rights: A Natural Language Processing Perspective. PeerJ Computer Science.
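To make the idea of a text-based predictive system more concrete, here is a minimal sketch that maps the words of a case description to an outcome label. It is an illustration under stated assumptions, not the method of [2]: it swaps in a multinomial Naive Bayes model over unigram and bigram counts, and the helper `ngrams`, the class `NaiveBayesOutcomePredictor` and the four toy "case" texts are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def ngrams(text, n_max=2):
    """Hypothetical helper: lower-cased word n-grams of length 1..n_max."""
    tokens = text.lower().split()
    feats = []
    for n in range(1, n_max + 1):
        feats.extend(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return feats


class NaiveBayesOutcomePredictor:
    """Hypothetical multinomial Naive Bayes over n-gram counts (add-one smoothing)."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)          # number of cases per outcome
        self.feature_counts = defaultdict(Counter)   # n-gram counts per outcome
        self.vocab = set()
        for text, label in zip(texts, labels):
            feats = ngrams(text)
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def log_score(self, text, label):
        """Log prior plus the summed log likelihoods of the text's n-grams."""
        n_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / n_docs)
        counts = self.feature_counts[label]
        denom = sum(counts.values()) + len(self.vocab)
        for feat in ngrams(text):
            score += math.log((counts[feat] + 1) / denom)  # Laplace smoothing
        return score

    def predict(self, text):
        """Return the outcome label with the highest log score."""
        return max(self.class_counts, key=lambda label: self.log_score(text, label))


# Hypothetical toy "case facts"; not drawn from any real court data.
train_texts = [
    "applicant detained without judicial review for months",
    "prolonged detention without access to a lawyer",
    "complaint manifestly ill-founded domestic remedies exhausted properly",
    "authorities conducted an effective and prompt investigation",
]
train_labels = ["violation", "violation", "no violation", "no violation"]

model = NaiveBayesOutcomePredictor().fit(train_texts, train_labels)
print(model.predict("detained for months without a lawyer"))        # violation
print(model.predict("the investigation was effective and prompt"))  # no violation
```

With four training examples the output only shows how shared n-grams push a prediction towards one label; a realistic system would train on the full text of many judgments and be evaluated on held-out cases.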

Mapping and aligning large knowledge bases

Contact: Dr Ziqi Zhang

Information Extraction (IE) is the task of automatically extracting structured information from unstructured and/or semi-structured documents. It is a crucial technology for enabling the Semantic Web vision. Recent years have seen the rise of large-scale knowledge bases, or knowledge graphs (e.g. the Google knowledge base and the Never-Ending Language Learning (NELL) knowledge base). Many of these are constructed by applying IE at Web scale to automatically extract information from webpages and link it in a structured way.

The availability of such very large knowledge bases has created new opportunities for many downstream applications such as Information Extraction, Text Mining, Natural Language Processing and Information Retrieval. However, one major challenge that remains for the use of such knowledge bases is heterogeneity: many different knowledge bases contain overlapping information that is described differently. For example, ‘Bone’ and ‘Artery’ are classified under ‘BodyPart’ in the NELL knowledge base but under ‘AnatomicalStructure’ in DBpedia, and NELL has ‘MLConference’ (machine learning), which is not present in the DBpedia ‘Conference’ class. Further, different knowledge bases often contain complementary data. For example, NELL has over 15,000 instances of ‘Disease’ while DBpedia has 5,600.

Hence a research question arises: how can large-scale knowledge bases be aligned and integrated, particularly those created by automatic text mining techniques, which often contain noisy data (e.g. inaccurate facts)? To do so, techniques such as machine learning, semantic similarity and data mining will be used. The challenges are how to create models that are 1) scalable, as knowledge bases can contain millions of instances; 2) robust to noise, given the presence of incorrect facts; and 3) able to learn from a small set of training data, as such large knowledge bases often contain statements of fact without document evidence of where they were extracted from.
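One simple family of alignment techniques compares classes by the instances they share. The sketch below makes a strong simplifying assumption: both knowledge bases already use the same instance identifiers, which in practice would itself require entity matching. The class and instance sets are hypothetical toy data loosely modelled on the ‘BodyPart’/‘AnatomicalStructure’ example, and `jaccard` and `align_classes` are illustrative helpers, not part of any NELL or DBpedia tooling.

```python
def jaccard(a, b):
    """Size of the intersection over size of the union (0.0 if both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def align_classes(kb_a, kb_b, threshold=0.3):
    """Propose class correspondences between two KBs by instance overlap.

    kb_a, kb_b: dict mapping class name -> set of instance identifiers,
    assumed (hypothetically) to be shared across both knowledge bases.
    Returns (class_a, class_b, score) triples scoring at least `threshold`,
    highest score first.
    """
    pairs = []
    for cls_a, inst_a in kb_a.items():
        for cls_b, inst_b in kb_b.items():  # all pairs: quadratic in class count
            score = jaccard(inst_a, inst_b)
            if score >= threshold:
                pairs.append((cls_a, cls_b, score))
    return sorted(pairs, key=lambda p: p[2], reverse=True)


# Hypothetical toy data loosely modelled on the NELL/DBpedia example.
nell = {
    "BodyPart": {"bone", "artery", "heart", "liver"},
    "MLConference": {"icml", "neurips"},
}
dbpedia = {
    "AnatomicalStructure": {"bone", "artery", "heart", "femur"},
    "Conference": {"icml", "acl", "sigir", "www", "kdd"},
}

print(align_classes(nell, dbpedia))  # [('BodyPart', 'AnatomicalStructure', 0.6)]
```

Two limitations map onto the challenges listed above. The all-pairs loop is quadratic in the number of classes, so web-scale use would need blocking or indexing. And a symmetric score such as Jaccard handles subsumption poorly: in the toy data ‘MLConference’ scores only about 0.17 against ‘Conference’, below the threshold, even though half of its instances appear there; an asymmetric containment score would capture that relation better.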

NoSQL database design

Contact: Dr Peter Stordy

Effective relational database design is well documented (e.g. Connolly & Begg, 2005). Despite the increasing use of NoSQL (non-relational) databases by organisations and businesses, their effective design is less well understood. To what extent can the techniques and concepts present in 'good' relational database design be transferred to NoSQL database design? What implications are there for teaching database design?
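One concept whose transfer is open to question is normalisation. Document-oriented NoSQL stores often favour the opposite move: embedding related rows inside a parent document so that data read together is stored together. The sketch below illustrates that denormalisation step on hypothetical `customers` and `orders` rows held as plain Python dicts; `embed_children` is an illustrative helper, not part of any database's API.

```python
from collections import defaultdict


def embed_children(parents, children, parent_key, foreign_key, field):
    """Denormalise a one-to-many relationship into nested documents.

    parents, children: lists of dicts standing in for relational rows.
    Each child's `foreign_key` value refers to a parent's `parent_key`.
    Returns new parent documents with their children embedded under `field`.
    """
    grouped = defaultdict(list)
    for child in children:
        # Inside the parent document the foreign key is redundant, so drop it.
        doc = {k: v for k, v in child.items() if k != foreign_key}
        grouped[child[foreign_key]].append(doc)
    # Parents with no children get an empty list rather than a missing field.
    return [{**parent, field: grouped.get(parent[parent_key], [])} for parent in parents]


# Hypothetical normalised tables: one customer has many orders.
customers = [
    {"customer_id": 1, "name": "Ada"},
    {"customer_id": 2, "name": "Grace"},
]
orders = [
    {"order_id": 10, "customer_id": 1, "total": 25.0},
    {"order_id": 11, "customer_id": 1, "total": 40.0},
]

for doc in embed_children(customers, orders, "customer_id", "customer_id", "orders"):
    print(doc)
```

Embedding suits one-to-many data that is read as a unit, but if the same child could belong to many parents it would be duplicated across documents, reintroducing the update anomalies that normalisation was designed to prevent. Trade-offs of this kind are what the question about transferring relational design techniques has to weigh.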

Open access publishing and dissemination

Contact: Prof Stephen Pinfield

Scientific and other scholarly publishing is currently being transformed, with a greater emphasis on making work available in open access form. An increasing number of governments, funding agencies and institutions now require results from the research they support to be made open access, a number of disciplines have developed a culture of sharing, and technologies and infrastructures are being developed to enable rapid and wide dissemination of outputs. I am interested in supervising students investigating various aspects of open access and open science. These might include specific studies on policy development, business models, disciplinary cultures, technology-based innovation or a range of other topics.

Personal IT use and impacts

Contact: Dr Angela Lin

Personal ICTs range from smart gadgets (e.g. smartphones, smartwatches, activity trackers, smart home devices) and services (e.g. messengers, advanced personal assistants) to complex peer-to-peer ecosystems (e.g. social networks, sharing services and collaborative systems) (Trenz, 2018). Personal ICTs are expected to affect not only individual adopters but also organisations and society. Topics relating to their use, and to the behavioural changes resulting from that use, are particularly welcome.

Social media and computational social science

Contact: Dr Nikolaos Aletras

The daily interaction of billions of users with online social platforms such as Facebook, Twitter, Reddit or Instagram has made available enormous amounts of user-generated content. The volume and diversity of this data (e.g. text, images, videos or interactions with other users such as 'retweets' or 'likes') have enabled studies in computational social science and sociolinguistics to analyse human behaviour on a large scale and automatically infer latent user attributes. In particular, user-generated content in social media can be used as a complementary source to traditional methods for extracting and studying users' socioeconomic attributes such as occupation [1], income [2] and socioeconomic class [3]. I am interested in studying language use in social media to infer user characteristics using interpretable machine learning models that also capture the complex, non-linear nature of the data. These approaches have real-world applications in targeted advertising, health intervention and recommender systems.

[1] D. Preoţiuc-Pietro, V. Lampos and N. Aletras (2015). An Analysis of the User Occupational Class through Twitter Content. In ACL.
[2] D. Preoţiuc-Pietro, S. Volkova, V. Lampos, Y. Bachrach, N. Aletras (2015). Studying User Income through Language, Behaviour and Affect in Social Media, PLOS ONE.
[3] V. Lampos, N. Aletras, J. K. Geyti, B. Zou, I. J. Cox (2016). Inferring the Socioeconomic Status of Social Media Users based on Behaviour and Language. In ECIR.

Social media, sousveillance and protests

Contact: Dr Paul Reilly

Eyewitness perspectives on protests and civil unrest can now be shared by recording footage on a mobile phone and uploading it to sites such as YouTube. This could potentially redefine journalism, allowing previously marginalised voices to be heard in a public sphere that is co-created by citizens and professional journalists. Social media is facilitating sousveillance, a form of inverse surveillance which empowers citizens, through their use of technology, to ‘access and collect data about their surveillance’. Sometimes witnesses deliberately record the actions of authority figures, such as police officers, and have a clear political agenda for sharing this material. Yet footage recorded on mobile phones by members of the public to document personal experiences may, in many cases, be transformed into a form of inverse surveillance through its dissemination on YouTube, raising questions about the actions of the police officers captured on camera.

The visibility of campaigns such as #BlackLivesMatter illustrates how the use of social media by advocacy groups can help shape public debates about policing and human rights in democratic states. Yet not all such online campaigns have the same impact. Recent research has suggested that the use of YouTube to share sousveillance footage may reinforce pre-existing attitudes towards protesters. One study of 52 videos purporting to show police brutality in Northern Ireland found that the ambiguous nature of the footage raised as many questions about the behaviour of the protesters as it did about the police (Reilly, forthcoming). However, further empirical research is needed to explore the ways in which citizens respond to this use of social media for sousveillance.

I am very interested in supervising a PhD project that examines:

  • How social media platforms (Facebook, Twitter and YouTube in particular) are used by protesters for sousveillance purposes in the United Kingdom and United States
  • How citizens engage with footage purporting to show alleged police brutality that has been shared on these sites
  • The extent to which the responses of these citizens appear congruent with the media framing of these protests

A text mining approach to countering cyber hate and/or extremism online

Contact: Dr Ziqi Zhang

Social media platforms such as Twitter are increasingly exploited to propagate hate speech and extremist content and to organise related activities. The anonymity they afford has made creating and spreading such content effortless in a virtual landscape beyond traditional law enforcement, eventually breeding crime and violence. The UK has seen a significant increase in hate speech and in the spread of extremist content on social media following events including the vote to leave the EU and the Manchester Arena attack, leading to spikes in hate crimes (e.g. the Finsbury Park attack). Implementing effective countermeasures depends on a real-time understanding of such content, i.e. automated detection of the emergence and spread of the content, and semantic content analysis.

The research will take a text mining approach focusing on either counter-hate or counter-extremism, and may follow one of two directions: 1) developing Natural Language Processing (NLP) and Information Extraction (IE) methods to detect hate speech or extremism on social media, and to extract structured data such as named entities, relations, and geospatial and temporal information from that content. The structured data provide a means for indexing, linking, clustering and summarising such content, which will ultimately support humans in tracking and interpreting hate or extremism online and so enable effective countermeasures; 2) developing data mining methods that analyse the formation of the networks and communities that spread hate and extremism, in order to understand how such content is generated, propagated and used to influence other users on social media, and ultimately how such networks and communities emerge, grow and shape people’s ideology.

Understanding large document collections

Contact: Dr Nikolaos Aletras

Much of the information in digital libraries is unstructured and is not organised by any automated system. This often overwhelms users, making it difficult to find specific information or to explore such collections. Topic models, a family of unsupervised statistical methods, have been used extensively in Natural Language Processing and Information Retrieval to analyse and organise large document collections. They have been integrated into document browsing systems that allow humans to navigate through collections and identify relevant information at scale [1]. However, the output of topic models, often represented as lists of the most probable words, needs post-processing to make it interpretable for users [2,3,4].
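To make that starting point concrete: a topic model yields, for each topic, a probability distribution over words, and the default representation shown to users is simply the top-n words. The minimal Python sketch below produces that raw word-list representation, i.e. the kind of output that the labelling and image-based approaches in [2,3,4] aim to improve on. It assumes the topic-word probabilities have already been estimated by some topic model (e.g. LDA, an assumption; no specific model is named here); the probabilities and the function names `top_words` and `describe_topics` are hypothetical.

```python
from typing import Dict, List

# Hypothetical topic-word probabilities for one topic. In practice these would
# come from a fitted topic model; the values used below are made up.
TopicWordProbs = Dict[str, float]


def top_words(topic: TopicWordProbs, n: int = 3) -> List[str]:
    """Return the n most probable words of a topic, highest probability first.

    Ties are broken alphabetically so that the output is deterministic.
    """
    ranked = sorted(topic.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:n]]


def describe_topics(topics: List[TopicWordProbs], n: int = 3) -> List[str]:
    """Render each topic as a comma-separated list of its top-n words."""
    return [", ".join(top_words(t, n)) for t in topics]


if __name__ == "__main__":
    toy_topics = [
        {"match": 0.20, "goal": 0.15, "league": 0.12, "season": 0.05},
        {"gene": 0.18, "protein": 0.14, "cell": 0.10, "dna": 0.09},
    ]
    for i, label in enumerate(describe_topics(toy_topics)):
        print(f"topic {i}: {label}")
```

Lists like "match, goal, league" are easy to compute but leave the user to infer what the topic is about, which is the interpretability gap this project addresses.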

[1] N. Aletras, T. Baldwin, J. H. Lau and M. Stevenson (2017). Evaluating Topic Representations for Exploring Document Collections. Journal of the Association for Information Science and Technology (JASIST).
[2] N. Aletras and M. Stevenson (2013). Representing Topics Using Images. In NAACL-HLT.
[3] N. Aletras and M. Stevenson (2014). Labelling Topics using Unsupervised Graph-based Methods. In ACL.
[4] N. Aletras and A. Mittal (2017). Labeling Topics with Images using Neural Networks. In ECIR.

Understanding the role of social media in public healthcare: a text mining approach

Contact: Dr Ziqi Zhang

Social Media Sites (SMS) play an important role in generating and sharing health information: studies have shown that a substantial and increasing percentage of the population seeks and follows health advice found on SMS (e.g., 90% of 18-24-year-olds said they would trust medical information shared by others on their social media networks, and 19% of smartphone owners have at least one SMS-based health app on their phone). However, the impact of using such resources on health improvement has rarely been studied. The limited research available reports positive findings but also misunderstanding and misuse of information, such as antibiotic abuse. With the increasing influence of SMS on public healthcare, it is becoming critical to investigate the content of such resources systematically and to understand the opportunities it brings to the healthcare sector.

To address this issue, I am interested in developing text mining and information extraction methods that automatically process and extract structured information from heterogeneous SMS sources at scale. The ultimate goal is to create a structured knowledge base of facts (e.g., ‘pholcodine relieves dry tickly cough’) mined from such resources, linked to each other, and quantified by the frequency of their mentions across disparate datasets. This would enable efficient and effective querying, such as ‘What do people consider remedies for a dry tickly cough? How often is each one used or recommended? Are there any contradictory statements, and hence controversial remedies?’ Such a knowledge base can be a crucial first step towards further research into the impact of SMS on health improvement and the opportunities they bring to the healthcare sector.
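One way to picture the proposed knowledge base is as a store of (subject, relation, object) triples with mention counts. The Python sketch below is illustrative only: the class `FactStore`, the relation names `relieves` and `does_not_relieve`, and the toy mentions are hypothetical (`remedy_b` is a placeholder, not a real remedy), and the extraction step that would produce triples from SMS posts is assumed rather than shown. It answers simplified versions of the three example queries: which remedies are mentioned, how often, and which are contradicted.

```python
from collections import Counter
from typing import List, Set, Tuple

# A fact is a (subject, relation, object) triple, assumed to be produced by a
# hypothetical extraction step run over social media posts.
Fact = Tuple[str, str, str]

# Hypothetical relation names; a real schema would be designed in the research.
RELIEVES = "relieves"
DOES_NOT_RELIEVE = "does_not_relieve"


class FactStore:
    """Toy knowledge base that counts how often each fact is mentioned."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def add_mentions(self, facts: List[Fact]) -> None:
        # Each extracted mention increments the count of its triple.
        self.counts.update(facts)

    def remedies_for(self, condition: str) -> List[Tuple[str, int]]:
        """Remedies said to relieve `condition`, most frequently mentioned first."""
        hits = [(s, c) for (s, r, o), c in self.counts.items()
                if r == RELIEVES and o == condition]
        return sorted(hits, key=lambda pair: (-pair[1], pair[0]))

    def controversial_remedies(self, condition: str) -> Set[str]:
        """Remedies mentioned both as relieving and as not relieving `condition`."""
        pro = {s for (s, r, o) in self.counts if r == RELIEVES and o == condition}
        con = {s for (s, r, o) in self.counts if r == DOES_NOT_RELIEVE and o == condition}
        return pro & con


if __name__ == "__main__":
    store = FactStore()
    store.add_mentions([
        ("pholcodine", RELIEVES, "dry tickly cough"),
        ("pholcodine", RELIEVES, "dry tickly cough"),
        ("remedy_b", RELIEVES, "dry tickly cough"),
        ("remedy_b", DOES_NOT_RELIEVE, "dry tickly cough"),
    ])
    print(store.remedies_for("dry tickly cough"))  # [('pholcodine', 2), ('remedy_b', 1)]
    print(store.controversial_remedies("dry tickly cough"))  # {'remedy_b'}
```

Counting exact triples is the simplest possible design; linking paraphrases (e.g. "dry cough" vs "tickly cough") to the same entry is where the real research difficulty lies.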

The particular challenges in this research will be: how to exploit the conversational nature of such data sources as useful context for text mining; how to cope with colloquial, conversational text, which is known to be much more difficult to process than text in conventional text mining settings; and how to reduce dependence on manually labelled data, which is scarce and expensive to create, by exploiting widely available unlabelled data.

University students' perceptions of assessment and feedback

Contact: Dr Peter Stordy

Various student satisfaction surveys (e.g. the National Student Survey, NSS) have arguably brought about improvements in UK Higher Education. However, universities have struggled to improve students' perceptions of the assessment and feedback they receive. Why is this area proving so intractable?

Web-scale Information Extraction: addressing the long-tail in knowledge base construction

Contact: Dr Ziqi Zhang

Information Extraction (IE) is the task of automatically extracting structured information from unstructured and/or semi-structured documents. It is a crucial technology for realising the Semantic Web vision. Recent years have seen the rise of large-scale knowledge bases, or knowledge graphs (e.g., the Google knowledge base and the Never-Ending Language Learning (NELL) knowledge base). Many of these are constructed by applying IE at Web scale to automatically extract information from webpages and link it in a structured way.

One of the remaining issues in this process (and indeed for IE in general) is extracting information in the ‘long tail’, i.e. information that is infrequently mentioned in the data sources. For example, there is plenty of information about a pop star but much less about an indie artist, yet both are equally important to their fans, and it is widely held that the combined value of the long tail can greatly outweigh that of the ‘head’. Mining the long tail will enable information service providers to reach ‘niche’ markets, which are becoming increasingly important.

Extracting information from the long tail is extremely challenging because conventional IE methods rely on ‘information redundancy’: the assumption that the information to be extracted is repeated many times in the data, which is what allows sufficient and effective extraction patterns to be identified. This assumption does not hold for information in the long tail, because of its low frequency. This has remained a major challenge for the IE community for decades, and research in this direction has at best addressed specific tasks in isolated laboratory settings.
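The effect of the ‘information redundancy’ assumption can be shown with a deliberately simplified Python sketch, which is not a description of any particular IE system: a candidate fact is kept only if it is supported by at least `min_support` distinct sentences. The function name `redundancy_filter`, the threshold, the entity names `pop_star` and `indie_artist` and the sentence ids are all hypothetical. A correct fact about the indie artist, mentioned only once, is discarded, which is exactly the long-tail loss described above.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Fact = Tuple[str, str, str]
# A candidate extraction: (fact, id of the sentence it was extracted from).
# In a real pipeline these would come from applying patterns to text.
Candidate = Tuple[Fact, str]


def redundancy_filter(candidates: List[Candidate], min_support: int = 2) -> Set[Fact]:
    """Keep only facts extracted from at least `min_support` distinct sentences.

    Repeated evidence is treated as a signal of correctness, so facts that
    are rarely mentioned (the long tail) are dropped even when correct.
    """
    support: Dict[Fact, Set[str]] = defaultdict(set)
    for fact, sentence_id in candidates:
        support[fact].add(sentence_id)  # duplicates from one sentence count once
    return {fact for fact, sentences in support.items() if len(sentences) >= min_support}


if __name__ == "__main__":
    candidates = [
        (("pop_star", "genre", "pop"), "s1"),
        (("pop_star", "genre", "pop"), "s2"),
        (("pop_star", "genre", "pop"), "s3"),
        (("indie_artist", "genre", "folk"), "s4"),  # long-tail fact, seen once
    ]
    print(redundancy_filter(candidates))  # {('pop_star', 'genre', 'pop')}
```

Lowering `min_support` to 1 would recover the indie fact but also admit every one-off extraction error, which is why the long tail cannot be handled simply by relaxing the threshold.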

I am interested in developing a systematic approach to quantifying, characterising and addressing the problem of IE from the long tail. The research will answer questions such as: What does ‘long tail’ mean for different types of IE tasks? What characteristics does information in the long tail have across domains and tasks? How can such findings be used to develop methods that extract information from the long tail more effectively? How do such methods cope with IE from the ‘head’? And how should such methods be evaluated?

What are the regulatory and social challenges raised by personally identifying information in a digital environment?

Contact: Dr Jonathan Foster

What challenges does the digital environment pose for the ethics of information and data handling?

Contact: Dr Jonathan Foster