Digital Resources: Digital Scholar Lab

Digital Scholar Lab


Digital Scholar Lab is a tool by Gale for digital humanities (DH) study and research. It is designed to facilitate learning about and researching Gale's digital resources licensed by the University of Helsinki. The Digital Scholar Lab is a cloud-based platform that enables students and researchers with UH credentials to access content and OCR data from Gale Primary Sources, as well as their own plain text files, and to analyse these with text and data mining tools. You can create custom data sets from the available archives and use digital tools to analyse them. A few sample projects are provided for teaching and learning the different stages of DH research. In addition, a learning centre within the tool offers instructional videos and texts on the various elements involved.

With a database-style interface, the Digital Scholar Lab provides a familiar navigational style for students who are new to text data mining as well as seasoned scholars. Users of the Gale Digital Scholar Lab can download individual content sets (up to 5,000 documents). Downloaded content sets are for scholarly, non-commercial use only.

What is the Digital Scholar Lab?

Three Basic Functions in Digital Scholar Lab

The Digital Scholar Lab has three basic functions, which follow the basic research methodology of digital humanities. The Lab provides helpful videos and information on each of these functions.

1. Building a custom content set (data set)

By searching for a topic in the Gale Primary Sources available to the library, you can find and collect appropriate material, which you can then analyze with the Lab. You can also upload plain text files of your own choice. You can search (basic or advanced search) using words that appear in a text or in the metadata describing it. Once you have results, you can review the information on each document to determine whether it is right for your research. You can also click into a result to see the original document scan and its text, and use filters to refine your search further. Once you are happy with the results, you can create a custom content set. Choose the results most relevant to your research question (or scanned documents) and add them to the set.

2. Cleaning the content set

Once the content set is ready, it is usually necessary to clean it (by removing unwanted words, punctuation or characters) before analyzing the data. You can create multiple cleaning configurations, so you can tailor how a content set is cleaned to the analysis you are trying to do. To test a configuration, select a content set; the first 10 documents will be cleaned with your settings. You can then download the original and cleaned versions of those documents for comparison. When you are happy with the results, you can use the configuration you chose as the basis for analysis.
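To illustrate what a cleaning step of this kind does, here is a minimal Python sketch. It is not the Lab's own implementation: in the Lab, cleaning is configured through the interface. The function name clean_text and the short stop-word list are hypothetical, chosen only for this example.

```python
import re
import string

# Hypothetical stop-word list for illustration only; a real configuration
# would typically use a much longer list.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in"}


def clean_text(text, stop_words=STOP_WORDS):
    """Lowercase the text, strip ASCII punctuation and digits, drop stop words."""
    text = text.lower()
    # Replace each ASCII punctuation character with a space.
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    text = text.translate(table)
    # Remove runs of digits.
    text = re.sub(r"\d+", " ", text)
    # Split on whitespace and filter out stop words.
    tokens = [t for t in text.split() if t not in stop_words]
    return " ".join(tokens)


original = "The Times, 1785: News of the day -- and of the world!"
print(clean_text(original))  # prints "times news day world"
```

Comparing the original string with the cleaned one mirrors the comparison you can make in the Lab between the original and cleaned versions of the first 10 documents.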

3. Analyzing the content set

Analysis allows you to take hundreds or thousands of documents and use digital tools to analyze them in ways that would be too time-consuming without the help of computational algorithms. Several analysis tools are available in the Lab, each with helpful video instructions. Each tool has settings you can use to tune the results you get. With these tools you can also create visualizations of the results.


Possible Users of the Digital Scholar Lab

  • Undergraduates / Graduates / Researchers learning the fundamentals of text mining, archival research and analysis methodologies.
  • Teachers who want to introduce elements of text mining into their teaching, but are aware that students' technical skills might be limited. Alongside digital literacy skills, research and archival skills are also being taught, so a tool that contextualizes research is ideal.
  • Post-Graduate / Postdoctoral / Traditional Humanities Faculty beginning to explore incorporating DH methodologies into their scholarship; introducing more text analysis into research; concerned with research outputs. They often teach classes or act as TAs, and may have some self-taught technical skills, or very few.
  • Established Digital Humanities (DH) Faculty and DH Librarians conducting research, often with coding skills and experience in DH, who may lecture on Digital Humanities and may also have published on the subject.

What you can do with the Digital Scholar Lab

With Digital Scholar Lab you can:

  • Access many interesting digital archives by Gale licensed by the University of Helsinki, such as The Times Digital Archive, Eighteenth Century Collections Online (ECCO I-II), Nineteenth Century Collections Online and Nineteenth Century U.S. Newspapers.
  • In addition to these archives, you can upload plain text files, manually create text documents, apply metadata, and clean, build and manage your content sets in a single environment. You can analyze and visualize plain text files you have collected alongside your Gale Primary Sources archives to enrich research outcomes and extend the content reach for text mining and analysis. You will find the upload feature in the "Build" section of the Lab.
  • View the original primary source document and OCR text side-by-side. The keywords used to perform the search are highlighted in the primary source document.
  • View the OCR confidence rating of a document and learn how the OCR text was generated for these collections
  • Construct custom data sets from the Gale Primary Sources available to University of Helsinki staff and students
  • Analyse the custom data sets with powerful text mining tools
  • Organise and manage your research
  • Export tabular data and visualisations in standard formats

Gale Resources Available in the Digital Scholar Lab

  • British Library Newspapers contains full runs of influential national and regional newspapers representing different political and cultural segments of British society.
  • Use Eighteenth Century Collections Online to access the digital images of every page of books published during the 18th Century. With full-text searching of millions of pages, the product allows researchers new methods of access to critical information in the fields of history, literature, religion, law, fine arts, science and more.
  • Indigenous Peoples of North America sources collections from Canadian and American institutions, providing insight into the cultural, political and social history of Native Peoples from the seventeenth into the twentieth century. Diverse manuscripts, book collections, newspapers from various tribes and Indian-related organizations, and materials such as Bibles, dictionaries and primers in Indigenous languages enable students to examine important primary source materials.
  • Nineteenth Century Collections Online. A ground-breaking resource for nineteenth century studies, Nineteenth Century Collections Online is a multi-year global digitization and publishing program focused on primary source collections of the "long" nineteenth century. Collections for this program are sourced through partnerships with major world libraries as well as specialist libraries, and content includes monographs, newspapers, pamphlets, manuscripts, ephemera, maps, statistics, and more.
  • Nineteenth Century U.S. Newspapers. With digital facsimile images of both full pages and clipped articles for hundreds of nineteenth-century U.S. newspapers and advanced searching capabilities, researchers will be able to research history in ways previously unavailable. For each issue, the newspaper is captured from cover to cover, providing access to every article, advertisement and illustration.
  • Seventeenth and Eighteenth Century Burney Newspapers Collection. The newspapers and news pamphlets gathered by the Reverend Charles Burney (1757‒1817) represent the largest single collection of seventeenth- and eighteenth-century English news media. The 700 or so bound volumes of newspapers and news pamphlets were published mostly in London, but there are also some English provincial, Irish and Scottish papers, and a few examples from the American colonies, Europe and India.
  • Seventeenth and Eighteenth Century Nichols Newspapers Collection. Seventeenth and Eighteenth Century Nichols Newspapers Collection features London newspapers and pamphlets gathered by antiquarian and printer John Nichols. This collection, sourced from the Bodleian Library, spans the years 1672 to 1737 and complements the titles and issues found in seventeenth and eighteenth-century Burney Collection Newspapers.
  • The Economist Historical Archive 1843-2015. The Economist Historical Archive is the fully searchable facsimile edition of The Economist, the weekly paper for anyone engaged in politics, current affairs, business and trade worldwide. Containing every issue since its launch in 1843, the archive offers full-colour images, multiple search indexes, topic and area supplements and surveys. It is an unrivalled multidisciplinary primary source for researching and teaching the nineteenth and twentieth centuries.
  • The Times Digital Archive. First published in 1785, The Times of London is widely considered to be the world's 'newspaper of record'. The Times Digital Archive allows users to search over 200 years of this invaluable historical source.
  • U.S. Declassified Documents Online offers access to over 750,000 pages of government documents. Covering major policy issues from the period before World War II into the twenty-first century, the archive serves as a convenient source for documents from government departments including Defense; State; Treasury; CIA; and the White House. U.S. Declassified Documents Online supports the study of history, politics, international relations, and journalism, among other fields.

Analysis Tools in Digital Scholar Lab

  • Document Clustering analyzes documents using statistical measures and groups them based on term frequencies and the k-means algorithm to determine similarity between each document in your content set.
  • Named Entity Recognition (NER) recognizes and extracts proper and common nouns from documents using spaCy's Parts of Speech tagging model, and outputs them as lists grouped by entity type including people, organizations, companies, locations, and more.
  • Ngram is a term, or collocation of terms, found in your content set. You set the range or number of terms ('N') you wish to consider in your analysis. Then, the frequency of those Ngrams is counted and displayed for analysis.
  • Parts of Speech uses natural language processing of syntax to recognize and tag various parts of speech. It provides users with the building blocks for looking at how phrases are constructed within each document in a content set.
  • Sentiment Analysis assigns an overall sentiment to each document by assigning positive and negative values to each term and then averaging those scores. Terms are assigned scores based on the AFINN lexicon.
  • Topic Modeling allows users to analyze a large corpus of unstructured text and groups terms that co-occur frequently. These groups of terms are "topics" that you then assign meaning to based on the terms and other measures.
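
Two of the tools above, Ngram and Sentiment Analysis, rest on simple counting and averaging, which can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Lab's implementation: the function names are hypothetical, the small lexicon is a made-up stand-in in the style of AFINN (integer scores per term), and averaging over all terms, with unlisted terms counted as 0, is one possible reading of "averaging those scores".

```python
from collections import Counter


def ngram_counts(tokens, n_min=1, n_max=2):
    """Count every run of n consecutive tokens for n in [n_min, n_max]."""
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


# Made-up lexicon for illustration; the real AFINN list is far larger.
TOY_LEXICON = {"good": 3, "great": 3, "bad": -3, "terrible": -3}


def document_sentiment(tokens, lexicon=TOY_LEXICON):
    """Average the per-term scores; terms missing from the lexicon score 0."""
    if not tokens:
        return 0.0
    return sum(lexicon.get(t, 0) for t in tokens) / len(tokens)


tokens = "good news great news bad weather".split()
counts = ngram_counts(tokens)
print(counts["news"])              # unigram frequency: 2
print(counts["great news"])        # bigram frequency: 1
print(document_sentiment(tokens))  # (3 + 0 + 3 + 0 - 3 + 0) / 6 = 0.5
```

In the Lab itself, the Ngram range and the other tool settings are chosen in each tool's configuration rather than in code.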

How is Digital Scholar Lab used in teaching and instruction?

Questions about the Digital Scholar Lab?

Questions about Digital Scholar Lab? Write to us at