Information research case studies

Our research spans many interdisciplinary areas and impacts a wide variety of industries and sectors. Below are just a few case studies showing the applications of our research outputs.

Enhancing the effectiveness of drug discovery programmes

Most new drugs are designed to treat diseases with enough patients to offset the massive research and development costs of drug discovery, but a rare disease may affect too few patients to make conventional drug development economically viable. In such cases, a drug can be given “orphan drug” status if it can be shown to be markedly dissimilar to existing drugs for that rare disease.

The European Medicines Agency (EMA) is the regulatory authority responsible for licensing drugs for use throughout the EU. Its Committee for Medicinal Products for Human Use (CHMP) evaluates 12 to 15 orphan drug applications each year; a successful application means that no similar compound can be registered and marketed to patients for a 10-year period.

Research in Sheffield, led by Professors Val Gillet and Peter Willett, developed software that quantifies the concept of “markedly dissimilar” through a novel application of the “fingerprint”, a simple computer representation of the atoms and bonds that make up a molecule’s structure. The software computes the dissimilarity between the fingerprint of a new drug proposed by a pharmaceutical company and the fingerprint of each existing drug for that disease. A statistical model then uses these values to decide whether the new and existing drugs are indeed markedly dissimilar, and hence whether the new drug is eligible for consideration for orphan drug status.
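
The case study does not say which similarity coefficient the software uses. As a minimal sketch, assuming each fingerprint is stored as the set of its “on” bit positions and that dissimilarity is taken as one minus the widely used Tanimoto coefficient, the per-drug comparison could look like this. The statistical model that turns these values into a decision is not described here and is not reproduced.

```python
def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto coefficient: shared 'on' bits divided by bits set in either fingerprint."""
    if not fp_a and not fp_b:
        return 1.0  # convention assumed here: two empty fingerprints count as identical
    common = len(fp_a & fp_b)
    return common / (len(fp_a) + len(fp_b) - common)


def dissimilarities(new_drug: set[int], existing: dict[str, set[int]]) -> dict[str, float]:
    """Dissimilarity (1 - Tanimoto) between a proposed drug and each existing drug."""
    return {name: 1.0 - tanimoto(new_drug, fp) for name, fp in existing.items()}


# Hypothetical toy fingerprints, not real drug data.
print(dissimilarities({1, 2, 3}, {"existing drug A": {2, 3, 4}, "existing drug B": {7, 8}}))
# {'existing drug A': 0.5, 'existing drug B': 1.0}
```

In practice, cheminformatics toolkits such as RDKit generate the fingerprints from chemical structures and provide the coefficient directly (for example `DataStructs.TanimotoSimilarity`); the set-based version above only shows the arithmetic.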

This research benefits three groups: international pharmaceutical companies developing novel drugs; the regulatory authority responsible for evaluating and supervising those drugs throughout the EU; and, most importantly, patients with rare diseases who can be treated with drugs that would not otherwise be commercially viable. The fingerprint software described above has been available to CHMP since 2014. Marked structural dissimilarity is one of several criteria CHMP uses to decide whether a new drug differs sufficiently from existing remedies, and hence whether orphan drug status is appropriate. The software gives CHMP a quantitative basis for this vital decision. Pharmaceutical companies such as Novartis (the world’s sixth largest pharmaceutical company by revenue in 2019) also use in-house implementations of the published method when submitting new drugs to the EMA.

Three submissions to CHMP illustrate the use of the fingerprint approach. Novartis Europharm Ltd. used nine different types of fingerprint to demonstrate that its potential orphan drug Rydapt was markedly dissimilar to three existing orphan drugs for the treatment of specific forms of myeloid leukaemia and mastocytosis. Tesaro UK used one type of fingerprint to demonstrate that its potential orphan drug Zejula was markedly dissimilar to two existing orphan drugs for the treatment of specific forms of ovarian, fallopian tube and peritoneal cancer, and CHMP confirmed this using five other types of fingerprint. CHMP used a range of fingerprints to show that a potential orphan drug proposed by Sigma-tau Arzneimittel GmbH, Chenodeoxycholic acid sigma-tau (later renamed Chenodeoxycholic acid Leadiant), was not markedly dissimilar to an existing orphan drug for the treatment of inborn errors of primary bile acid synthesis due to sterol 27-hydroxylase deficiency; this finding necessitated further investigation of the submission by the EMA. All three applications were subsequently awarded orphan drug status for use within the EU, and the drugs have more recently been approved for NHS use in the UK to treat acute myeloid leukaemia, recurrent ovarian cancer and inborn errors of primary bile acid synthesis, respectively.

Fingerprints are effective at identifying structurally similar molecules in searches of large databases, but less effective at finding molecules that have similar bioactivities but different structures. Such molecules are very valuable in drug discovery because they allow novel, and hence patentable, regions of chemical space to be explored. An alternative representation, the reduced graph (RG), has been studied for this purpose.

An RG represents a compound as a set of nodes, each corresponding to a group of atoms and bonds, with the nodes connected according to the topological (through-bond) distances between their atoms; a fingerprint, by contrast, describes the individual atoms and bonds. The RG was originally developed in Sheffield to describe the structures in chemical patents, but was adapted for database searching to identify compounds with the same bioactivity but different structures [R4]. A collaboration with GlaxoWellcome (now GSK) showed the RG approach to be effective at finding compounds that share the same activity but would not be found using traditional fingerprints. Further collaborations with GSK demonstrated the effectiveness of RGs for identifying structure-activity relationships with machine learning techniques, while work with Sanofi explored using RGs to suggest functional group replacements that could improve a potential drug’s properties, increasing its potency or reducing undesirable side-effects.
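
To make the reduced-graph idea concrete, here is a small sketch; it is not the Sheffield implementation. It assumes a molecule is given as an atom adjacency list, that the grouping of atoms into nodes (the hypothetical `groups` mapping) has already been decided, and that the distance between two nodes is the shortest through-bond path between any atom of one node and any atom of the other.

```python
from collections import deque
from itertools import combinations


def bfs_distances(adj: dict[int, list[int]], start: int) -> dict[int, int]:
    """Through-bond (topological) distance from `start` to every reachable atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for nbr in adj[atom]:
            if nbr not in dist:
                dist[nbr] = dist[atom] + 1
                queue.append(nbr)
    return dist


def node_distances(adj, groups):
    """Shortest topological distance between every pair of reduced-graph nodes.

    `groups` is a hypothetical mapping from node label to the set of atom
    indices collapsed into that node.
    """
    atom_dist = {atom: bfs_distances(adj, atom) for atom in adj}
    return {
        (name_a, name_b): min(
            atom_dist[a].get(b, float("inf")) for a in atoms_a for b in atoms_b
        )
        for (name_a, atoms_a), (name_b, atoms_b) in combinations(groups.items(), 2)
    }


# Toy benzyl-alcohol-like skeleton (hydrogens omitted):
# atoms 0-5 form an aromatic ring, 6 is a CH2 linker, 7 is the hydroxyl oxygen.
adj = {0: [1, 5, 6], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5],
       5: [4, 0], 6: [0, 7], 7: [6]}
groups = {"aromatic ring": {0, 1, 2, 3, 4, 5}, "linker": {6}, "H-bond donor": {7}}
print(node_distances(adj, groups))
# {('aromatic ring', 'linker'): 1, ('aromatic ring', 'H-bond donor'): 2, ('linker', 'H-bond donor'): 1}
```

Because the ring collapses into a single node, two molecules with different ring systems but the same arrangement of node types and distances would look alike in this representation, which is the property that helps find structurally different compounds with similar activity.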

The Sheffield RG research has shaped the software used to support early-stage R&D in industrial drug discovery, so its principal beneficiaries are international pharmaceutical companies and the patients who need drug treatments. This impact is delivered through corporate information systems and through products marketed by chemoinformatics software companies.

GSK now uses the software developed in the collaboration for its routine compound acquisition, which identifies compounds synthesised by external suppliers that should be purchased to increase the structural diversity of GSK’s corporate compound database. GSK also employs RGs for virtual screening (computational procedures that prioritise compounds for biological testing) and as part of an Artificial Intelligence (AI) based system for identifying potential drug molecules. RGs also formed the basis for the Extended Reduced Graph (ErG). This was originally developed at Eli Lilly, and ErGs are now available throughout industry as part of the open-source RDKit toolkit (https://www.rdkit.org/), which is hosted on GitHub, the world’s largest repository of open-source software. Commercial products that have adopted RG technology are also available from Discngine and Lhasa Limited.

Enhancing national and international open access policy and practice

The drive to make academic research outputs open access (OA) has gathered momentum over the last decade but has created major challenges for policy and practice, particularly around costs for higher education institutions. Professor Stephen Pinfield’s research has focused on providing an evidence base for real-world solutions to address these challenges.

In collaboration with Jisc, the UK’s higher education digital information service provider, Professor Pinfield led ground-breaking work examining the costs that higher education institutions face in providing OA. This included the costs of ‘hybrid’ OA, in which subscription journals make individual papers OA once an article processing charge (APC) has been paid before publication. Hybrid OA needed particular attention because of the controversial issue of ‘double-dipping’, in which publishers draw two income streams (subscriptions and APCs) from the same content.

The result was the ‘total cost of publication’ (TCP) for OA articles, which the Sheffield team set out for the first time as a single figure combining subscriptions, APCs and new administrative costs. APCs were found to make up about 10% of the TCP.
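
Expressed as a simple formula (the symbols are chosen here for illustration and do not come from the original study), the TCP is the sum of the three cost components, and the reported APC finding is a ratio of one component to that sum:

```latex
\mathrm{TCP} = C_{\mathrm{sub}} + C_{\mathrm{APC}} + C_{\mathrm{admin}},
\qquad
\frac{C_{\mathrm{APC}}}{\mathrm{TCP}} \approx 0.10
```

For a hypothetical institution with a TCP of £1 million, this share would correspond to roughly £100,000 spent on APCs.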

Institutions were found to be moving towards hybrid OA faster than previously thought. This was significant: it meant institutions could plan their budgets for more ‘gold’ OA (i.e. articles published OA in journals), but it also raised important questions about whether such an approach was sustainable at those price levels.

Later work by Pinfield and the Sheffield team clarified the details of these costs. Administrative costs were less significant than previously thought, but APC expenditure was rising rapidly alongside subscription prices, increasing the overall cost of publication. The hybrid OA model was creating new and rising costs for the sector, and these needed to be ‘offset’ against subscription income; Pinfield suggested that a more robust ‘offsetting’ model could be an important part of the transition to full OA. The research ultimately provided a clearer picture of the situation and gave the sector data it could use for financial planning and forecasting.

Policymakers, research funders, publishers and librarians in the UK and the EU have used Professor Pinfield’s research findings to improve their approaches to financial planning and implementation of open access. Pinfield drew on this research for his contribution to two reports commissioned by Universities UK and published in 2015 and 2017 (‘Monitoring Reports’). This work was demonstrably important in subsequent policy development.

Pinfield’s extensive research on OA policy led to his co-authoring a wide-ranging report for the European Commission on the European OA market and policy environment. The report surveyed the market and showed that the Commission was unlikely to meet its target of making the outputs it funds OA by 2020. This finding links to Pinfield’s earlier work setting out the financial challenges faced by higher education institutions responding to OA policy requirements. The report recommended further policy intervention to address these problems, contributing to the impetus for a new approach, Plan S, coordinated by the Commission. The National Library of Sweden has also used Pinfield’s work in developing Sweden’s national approach to OA.

Pinfield’s findings were referenced in the Burgess Review, Research Councils UK’s 2014 review of its OA policy, paving the way for subsequent policy changes driven by the Science Minister. Jisc was tasked with developing an evidence base for its negotiations with publishers on behalf of UK higher education institutions, and Lorraine Estelle, CEO of Jisc Collections, cited Pinfield’s work as helping Jisc understand the situation and its context.

Drawing directly on Pinfield’s findings, Professor Adam Tickell, Chair of the UUK Open Access Coordination Group, successfully recommended in a 2016 report to the Minister of State for Universities, Science, Research and Innovation that a series of stakeholder working groups be established to stimulate further progress on OA.

Do You See What I See? How Google results differ depending on where you are

“We rely so much on Google these days”, says Dr Frank Hopfgartner, Senior Lecturer at the Information School and Investigator on the ‘Do You See What I See’ project. “Google has a search engine market share of over 90% worldwide.”

Dr Hopfgartner undertook this project with several fellow members of the Cyprus Center for Algorithmic Transparency (CyCAT), which was profiled in the research magazine Inform II in 2019 (page 29). The project aimed to discover how the search results Google provides differ for users in different parts of the world. Google states that its mission is to “organize the world’s information and make it universally accessible and useful”. This project asked: is that true? Does everyone everywhere have equal access to the same information? And if not, what impact might that have?

“CyCAT for me was very interesting because algorithmic transparency and bias is a very timely topic and one which is receiving a lot of attention”, says Dr Hopfgartner of his own involvement with the Center. “My prior work was based on user profiling and personalisation; I was looking at the methods and now CyCAT is looking at the societal consequences and what we can do about them.”

The COVID-19 pandemic, whilst challenging for all, did create a fortuitous environment for this kind of research. “We didn’t initially think about COVID as a subject for this study”, explains Dr Hopfgartner, “but then we realised that it was the one thing that was on everyone's minds, which gave us a unique opportunity; it doesn’t happen too often that there’s a single topic that the whole world talks about, apart from maybe the World Cup or the Olympics.”

The project team wanted to compare what Google showed users who searched for COVID-19 topics from different geographical locations, as well as from the same location but in different languages.

The first obstacle to gaining any insight from this kind of research is that, without working directly with Google, it is impossible to know how its algorithms work. The search engine therefore had to be treated as a “black box”: a system whose internal workings are hidden. The team could submit search queries, analyse the output and draw conclusions from there.

The second obstacle was knowing what people actually search for without access to Google’s log files. The solution lay in crowdsourcing. The team used a crowdsourcing platform to assign two tasks to 400 crowd workers, 100 each from the UK, Italy, Spain and Germany. In the first task, workers imagined creating a photo diary of what happened during the pandemic, as a history book for future generations. In the second, they created a similar photo diary, this time of the habits people developed during the pandemic. To complete both tasks, the workers used Google’s image search, and the researchers collected the search terms they used.

The search terms were grouped into five categories, which emerged as the most common themes: “stay at home”, “personal protection”, “healthcare”, “pandemic general” and “society impact”. The researchers then entered these terms into Google image search themselves and analysed the returned images with CLARIFAI, an AI tool that creates tags based on what it ‘sees’ in an image (for example, given an image of a beach at a holiday resort, CLARIFAI would return tags such as “sand”, “water”, “people” and “outside”). Alongside these tags, the team examined the URL of each returned image and determined the country in which the server hosting the site was based, to see how many results were local to the user and how many were foreign.
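
The local-versus-foreign comparison can be sketched as below. This assumes each result’s hostname has already been mapped to a country code; the `host_country` lookup is a hypothetical stand-in for whatever server-location method the team actually used, and results whose location is unknown are simply left out of the count.

```python
from urllib.parse import urlparse


def local_share(result_urls: list[str], user_country: str, host_country: dict[str, str]) -> float:
    """Fraction of results (with a known host location) served from the user's own country."""
    countries = [host_country.get(urlparse(url).hostname) for url in result_urls]
    known = [c for c in countries if c is not None]
    if not known:
        return 0.0
    return sum(c == user_country for c in known) / len(known)


# Hypothetical lookup table and result list for illustration only.
host_country = {"news.example.co.uk": "GB", "foto.example.it": "IT", "img.example.co.uk": "GB"}
results = [
    "https://news.example.co.uk/lockdown.jpg",
    "https://foto.example.it/mascherina.png",
    "https://img.example.co.uk/nhs.jpg",
    "https://unknown.example.org/other.jpg",  # host not in the table, so ignored
]
print(local_share(results, "GB", host_country))  # 0.6666666666666666
```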

“The results from different countries varied significantly”, Dr Hopfgartner says. “For example, the results from Spain and Italy were closer than the results from Spain and the UK were.”

Searches conducted in the UK returned almost no results from Spanish, Italian or German servers, but searches conducted in each of the other three countries did return a fair number of results from UK servers.

People in different countries were clearly seeing different results when searching for the same thing, but there was some overlap. Location-specific terms like “lockdown protest London” or “thank you NHS” returned upwards of 90% of the same results wherever the search was made. However, more general terms like “COVID 2020”, “COVID social” or “COVID lockdown” showed only 2% overlap between countries.
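
The article does not say how the overlap percentages were calculated. One common choice, assumed here purely for illustration, is the Jaccard index of the sets of result URLs returned in each country for the same query: the number of shared results divided by the number of distinct results across both countries.

```python
from itertools import combinations


def jaccard(a: list[str], b: list[str]) -> float:
    """Shared results divided by distinct results across both lists."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def pairwise_overlap(results_by_country: dict[str, list[str]]) -> dict[tuple[str, str], float]:
    """Overlap for every pair of countries, using the same query in each."""
    return {
        (c1, c2): jaccard(results_by_country[c1], results_by_country[c2])
        for c1, c2 in combinations(sorted(results_by_country), 2)
    }


# Hypothetical result URLs for one query, not data from the study.
results = {
    "UK": ["u1", "u2", "u3", "u4"],
    "ES": ["u1", "u2", "u3", "u5"],
    "IT": ["u6"],
}
print(pairwise_overlap(results))
# {('ES', 'IT'): 0.0, ('ES', 'UK'): 0.6, ('IT', 'UK'): 0.0}
```

Other definitions, such as the share of one country’s top results that also appear in another’s, would give different numbers for the same data, so figures like 90% and 2% should be read in terms of whatever measure the published paper specifies.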

Even where there was overlap, there were some surprises. Searching “how to get taste back” gave an 87% overlap between locations, despite there being far from a universal answer to this question in medical science.

The team also looked at how much overlap there was in results split into the broad categories of search terms, as well as how many local vs foreign results were returned in each country. Again, there were wide variations in both cases.

“Given that there are these differences in how search results are shown in different countries, we concluded that this might actually influence how we see the world”, says Dr Hopfgartner. “For example, if a UK Google user is seeing a vast majority of results that are from the UK, they have no idea what is going on in, say, Italy.”

When we rely on one search engine, or one company, to collate our information for us, it is easy to assume that it does so neutrally and objectively, but that may not always be the case.

“If we look back on this time in the future we may realise that we had no idea what was going on across the world because we were using Google as a filtering lens that narrows our viewpoint”, says Dr Hopfgartner.

There are also concerns about the creation or exacerbation of a so-called ‘digital divide’, where people who regularly use search engines may develop a certain view of the world, whilst those who don’t (or can’t) use them develop a totally different one.

A main focus of the CyCAT project as a whole is combating some of these issues. The team has developed a system that interfaces with search engines and highlights potential biases to users. For example, the tool might tell you that most of your search results come from the UK and give you the option to filter them to show only sites hosted in Spain. This greater transparency and control over search results is what CyCAT is aiming for, and work on refining the tool is ongoing.
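
The kind of feedback described for the CyCAT tool can be illustrated with a short sketch. This is not the CyCAT system; it assumes each result is a hypothetical `(url, host_country)` pair and shows how a tool might report the dominant hosting country and then filter results to another country.

```python
from collections import Counter
from typing import Optional


def dominant_country(results: list[tuple[str, str]]) -> Optional[tuple[str, float]]:
    """Most common hosting country among the results and its share of them."""
    if not results:
        return None
    counts = Counter(country for _, country in results)
    country, n = counts.most_common(1)[0]
    return country, n / len(results)


def filter_by_country(results: list[tuple[str, str]], country: str) -> list[str]:
    """Keep only the URLs of results hosted in the chosen country."""
    return [url for url, host in results if host == country]


# Hypothetical (url, host_country) pairs.
results = [
    ("https://a.example.co.uk/1.jpg", "GB"),
    ("https://b.example.co.uk/2.jpg", "GB"),
    ("https://c.example.es/3.jpg", "ES"),
]
country, share = dominant_country(results)
print(f"{share:.0%} of your results are hosted in {country}")  # 67% of your results are hosted in GB
print(filter_by_country(results, "ES"))  # ['https://c.example.es/3.jpg']
```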

The ‘Do You See What I See’ project wrapped up in December 2021, and a research paper reporting the team’s findings has now been published. As scrutiny of fairness and transparency in technology increases, work like this is becoming ever more important, with consequences for the everyday lives of the many people who use services like Google’s search engine daily.

“It can be an eye-opener to realise that when we use Google to gather information, they are a gatekeeper”, concludes Dr Hopfgartner. “We need to remember that there may be more information out there that we don’t get if we rely on just one such service.”

Supporting Early Stage Digital Startups: How to define "growth" for young businesses

Starting a business is inherently challenging and risky. Entering a world of experienced competitors where you have no standing or reputation means that it’s always going to be an uphill battle to gain recognition and start to be financially viable. What often makes this even harder is that the systems and processes that support businesses are largely modelled around companies that have a track record, not those just starting out.

“When a startup is in its early stages - the first five years or so - it’s very difficult to have financial indicators that are above zero to show their funders that they are productive and are experiencing growth”, says Dr Efpraxia Zamani, Senior Lecturer at the Information School and one of four co-investigators engaged in research to address this problem.

With Anastasia Griva (from the National University of Ireland Galway, funded by the EU’s Horizon 2020 research and innovation programme), Dr Dimosthenis Kotsopoulos and Dr Angeliki Karagiannaki (both from ELTRUN, Athens University of Economics and Business), Dr Zamani has been working on this project with early stage digital startups in Greece over the past two years, building on relationships from earlier research.

The project’s first stage aimed to define what “growth” actually means for early stage startups; this is a gap in the literature, but also something that is much needed in practice.

“Existing definitions of growth tend to rely on indicators such as profit growth and labour growth”, says Dr Zamani. “These metrics have very little relevance for early stage startups.”

First, the research team consulted experts and practitioners in the field, including venture capital funds and incubators (bodies that support startups), asking for their views on what growth is, how (or even whether) it should be measured, and how all of this relates to the capabilities of startups. Next, the team surveyed both early stage and more mature startups about their experiences, looking specifically at how company culture was perceived to affect growth and which capabilities were needed to achieve growth early on. Having gathered all this data, the team then needed a more nuanced understanding of what these metrics really meant.

“We wanted to focus more on the contextual factors; how these things actually relate to the everyday experience of startups”, Dr Zamani explains. To do this, the team undertook a comparative case study of two Greek startups. Startup A was an early stage virtual company that had worked remotely since its inception, with no physical footprint. It focused on technological innovation, both in its products and services and in its internal business processes. Startup B was somewhat more mature: it had already grown and was now a profitable company with positive financial indicators. It did have physical premises but, like so many companies, was working remotely because of the COVID-19 pandemic. Both companies were interviewed about their company culture, their perceptions of growth and how they actually promoted and sold their goods and services to customers.

Comparing the two companies’ responses, the team found that Startup A tended to hold back their technological innovations until they were absolutely perfected, at the expense of customer experience and satisfaction. Startup B, in contrast, focused on continually improving their products and services after release, and paid attention to customer experience. They also placed a lot of emphasis on the satisfaction and happiness of their staff.

“Whilst the first case couldn't really break free from that early stage, the second case managed to very quickly gain positive financial indicators due to their focus on their staff and their clients”, says Dr Zamani. “They were more ready to adapt to the turbulent environment inflicted on them by the pandemic.”

Despite already being based solely online, Startup A were simply not able to adjust their operations and selling capabilities effectively enough to sustain growth through the challenges that faced them in 2020.

From their findings, the research team defined growth for early stage startups as “the result of the company’s selling capabilities, but also the ability to scale up using their entrepreneurial skills, adaptability skills and innovation capacity”. Achieving this kind of growth also depends on the company’s “absorbing capacity”: recognising the value of external knowledge, such as that from consultants or incubators, and bringing it into the company. This, the researchers conclude, will help startups attract additional funding.

At the end of the study, the researchers developed a set of propositions, intended both for academics to pursue in further research and for practitioners to apply in their own contexts. One proposition is that early stage digital startups which experience growth do so because they have enhanced their selling capabilities to include empathy for their customers: engaging with them and developing products and services based on their needs, rather than just pushing whatever they want to sell. Another is that successful early stage startups exhibit absorptive and adaptive capabilities, taking external information from experts and from their own analytics and applying it to their own context. This helps prepare them for major disruptions, such as the pandemic we were all still living through in 2022, and makes them more likely to survive.

“My personal motivation to be involved in this project was less to do with an interest in growth but more an interest in the digital”, says Dr Zamani. “What is its role in a startup? Does it help or inhibit operations? To what extent do advanced technologies like business analytics and the Internet of Things provide value for clients and businesses?”

“The pandemic gave us an opportunity to see how these technologies offer business continuity or how they can actually be an obstacle”, Dr Zamani continues. “These companies are not always supported by governments, so we were interested in how they themselves could leverage the digital to move into a place where they don’t need more investment.”

Startups need information to feed into the analytics machine right from their beginnings in order to identify opportunities. Though they might not have an ‘information professional’ on their team defined as such, every stage of a digital startup's early years will require the exploitation of data in order to put the company at the forefront of their area, making them attractive to funders even in the absence of traditional financial indicators.

Dr Zamani says that some of the findings about company culture were unexpected. The successful startup that was studied was consumer-focused, as expected, but the prioritising of continuous staff development was a surprise.

“They were looking for people that would fit into the organisation, irrespective of whether they had the technical skills”, Dr Zamani explains. “They were confident that the business itself could support them to develop the skills, whereas the organisational fit either works or it doesn’t.”

The other startup was hiring people that were experts in their field and had the technical skills required for their roles, but that didn’t always fit into the organisation.

“They failed to see that and it created a higher staff turnover, which impeded their growth”, concludes Dr Zamani.

A research paper on this project has been published, with another in the pipeline. Dr Zamani hopes that angel investors and other funders will take an interest in the findings. Investors decide whether to invest in a company based on whether they believe in its product or service, but they also want to see indicators that may not yet exist for early stage startups; this research aims to close that gap and stop such startups from being excluded from funding. Dr Zamani suggests that anyone interested in digital entrepreneurship will get something from this research, especially as the study ran over a long period and combined multiple approaches, giving many different angles on the issues tackled.

By helping young businesses to be able to define and show their growth to investors and customers alike, Dr Zamani and the rest of the research team behind this project are taking their expertise in the digital world out into the real world and trying to level the playing field in any number of competitive markets. Information is at the heart of everything in modern society, with business and technology being just two of many intersecting areas. There’s never been a more exciting time to be working in information science.

School of Information, Journalism and Communication