Digging into Image Data to Answer Authorship Related Questions

Executive Summary

An international, multi-institutional and multi-disciplinary team of researchers from the University of Sheffield (UoS), UK; Michigan State University (MSU), MI, USA; and the University of Illinois at Urbana-Champaign (UIUC), IL, USA, will jointly explore authorship across three distinct but in some respects complementary datasets: a collection of digitized 15th-century manuscripts, a collection of digitized 17th- and 18th-century maps, and a collection of digitized 19th- and 20th-century quilts. The datasets are all freely available to the investigators and together represent a very large and diverse collection of digitized scans or photographs in standard image file formats. The US team will consist of members from UIUC (applying to NSF) and MSU (applying to NEH). The UIUC members include Anne D. Hedeman, Karen Fresco, Robert Markley, Kevin Franklin and Peter Bajcsy (as US NSF project director). The MSU members include Dean Rehberger (as US NEH project director), Wayne Dyksen, Matt Geimer, Anil Jain, Justine Richardson, Marsha MacDowell, and Mark Kornbluh, as well as project evaluator Steve Cohen. The UK members (who will apply to JISC) include Peter Ainsworth (as UK JISC project director) and Michael Meredith from the University of Sheffield, UK.

Authorship is a research question common to multiple disciplines in the humanities, arts and social sciences, and it unites the image analyses proposed here. In the past, the problem of authorship has been explored for individual masterpieces or small collections of art from the same time period. Our research seeks to investigate the accuracy and computational scalability of adaptive image analyses when they are applied to diverse collections of image data and driven by the same overarching question of authorship. The objectives of this proposal are:

  • To design image analysis algorithms that will extract salient image features, group together images based on similarity of these features, classify groups according to a priori knowledge, and optimize algorithmic steps and parameters
  • To apply the algorithms jointly developed to all the aforementioned collections of images
  • To report accuracy and computational requirements over all of the image collections
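
The three objectives above can be pictured as one small pipeline. The following is only a minimal sketch under stated assumptions, not the project's method: the intensity-histogram feature, the plain k-means grouping, the majority-vote labeling, the synthetic images and the names author_a and author_b are all hypothetical stand-ins chosen for illustration.

```python
import time
from collections import Counter

import numpy as np


def extract_features(image, bins=8):
    """Normalized intensity histogram: a deliberately simple stand-in feature."""
    hist, _ = np.histogram(image, bins=bins, range=(0.0, 1.0))
    return hist / hist.sum()


def group_images(features, k, iterations=20):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [features[0]]
    for _ in range(1, k):
        # Pick the sample farthest from all centers chosen so far.
        dists = np.min([np.linalg.norm(features - c, axis=1) for c in centers], axis=0)
        centers.append(features[np.argmax(dists)])
    centers = np.array(centers)
    for _ in range(iterations):
        dists = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
        assignment = np.argmin(dists, axis=1)
        for j in range(k):
            members = features[assignment == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return assignment


def label_groups(assignment, known_labels):
    """Give each group the majority label among its members with a known author."""
    group_label = {}
    for g in np.unique(assignment):
        members = [int(i) for i in np.flatnonzero(assignment == g)]
        votes = [known_labels[i] for i in members if i in known_labels]
        group_label[int(g)] = Counter(votes).most_common(1)[0][0] if votes else None
    return group_label


def run_pipeline(images, known_labels, true_labels, k):
    """Run all steps and report accuracy plus wall-clock time in seconds."""
    start = time.perf_counter()
    features = np.array([extract_features(img) for img in images])
    assignment = group_images(features, k)
    group_label = label_groups(assignment, known_labels)
    predicted = [group_label[int(g)] for g in assignment]
    elapsed = time.perf_counter() - start
    accuracy = float(np.mean([p == t for p, t in zip(predicted, true_labels)]))
    return accuracy, elapsed


# Synthetic stand-in data: two hypothetical "authors" with dark vs. bright images.
rng = np.random.default_rng(0)
images = [rng.uniform(0.0, 0.4, (16, 16)) for _ in range(5)] + \
         [rng.uniform(0.6, 1.0, (16, 16)) for _ in range(5)]
true_labels = ["author_a"] * 5 + ["author_b"] * 5
known_labels = {0: "author_a", 5: "author_b"}  # a priori knowledge for two images
accuracy, elapsed = run_pipeline(images, known_labels, true_labels, k=2)
print(f"accuracy={accuracy:.2f}, seconds={elapsed:.4f}")
```

Keeping each step behind its own function means a feature extractor or grouping method can be swapped without touching the others, and the same accuracy and timing report can be produced for every collection.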

The intellectual merit of the proposed activities lies in addressing the open research problem of authorship, and in the associated image analyses that lead to computationally scalable and accurate data-driven discoveries of corresponding pairs of images and labels. The open research problems are divided into artistic, scientific and technological questions based on the datasets.

The broader impacts of these activities lie in the research results of the image analyses, the methodologies for determining authorship, the scalable algorithms, and the exploratory frameworks that will support the aforementioned scientific questions. Our study would be the first of its kind to report accuracy and computational scalability of adaptive image analyses as applied to diverse collections of image data driven by the same question of authorship. The proposed effort also addresses the basic questions posed by the Digging into Data Challenge. First, it addresses the question "what do you do with a million photographs of artwork?" by tackling machine learning research problems for image analysis that will help focus the search for relevance in large image datasets, and by selecting the common problem of authorship. Second, it addresses the question of finding the most accurate and computationally scalable algorithms for establishing authorship. Third, it addresses the question of enabling discoveries over huge repositories of diverse collections where the notion of scale becomes a major obstacle for humanities and social science research. In summary:

  • The effort will promote the development and deployment of innovative image analyses targeting the problem of authorship and applied to large-scale data analysis
  • It will foster interdisciplinary collaboration among scholars in the humanities, computer sciences, and information sciences
  • It will promote international and domestic collaborations
  • It will yield unique findings on accuracy and computational scalability across a set of large, diverse digital collections, made available over the grid to a significant body of researchers from complementary disciplines keen to learn from each other