
An Introduction to Distant Reading with Voyant Tools

Contents

     i. What is "Distant Reading"?

     ii. Why Distant Read?

    iii. Introduction to Voyant Tools

    iv. Tools and Methods 

    v. Drawing Conclusions

    vi. Next Steps

This tutorial introduces the idea of "distant reading" as a method for analyzing text documents and guides the reader through the main functions of Voyant Tools, a free, online text analysis platform. The lesson covers definitions of and uses for distant reading, and is aimed at those with no experience of distant reading or of applying big data methods to texts. No coding skills are necessary, though some knowledge of traditional close reading techniques may be useful.

What is Distant Reading?

Distant reading is a way of looking at the features of a text, or a collection of texts, without actually doing any reading in the conventional sense. Instead, distant reading uses technology to analyze the trends and primary features of a text by treating the text as data rather than as literature. It involves “processing” texts with regard to both content (themes, subjects, word choices, etc.) and information (publication date, location, author, etc.) in order to glean insights about cultural or literary trends as they are reflected in the text. This method of analysis can also be applied to other text-based material, such as tweets, website content, news articles, and periodicals.

The term "distant reading" was coined by Franco Moretti, co-founder of the Stanford Literary Lab, who argues that modern literary studies would greatly benefit from applying data mining strategies to collections of texts. If you are interested in learning more about Moretti’s work and the reasoning behind his advocacy for distant reading, check out his essays “Conjectures on World Literature” and “The Slaughterhouse of Literature.”

Why Do Distant Reading?

You might be asking yourself, “How can I possibly learn more about a text by NOT reading it?” That is a valid question. As an English major, I do not suggest that distant reading strategies are a replacement for traditional reading. However, as new technologies allow us to digitize and store more textual information than one person could read in a single lifetime, we need to find new ways to “read” and synthesize vast bodies of text.

Benefits

  • "Read" large volumes of text in less time than traditional reading allows.

  • Make comparisons across texts by time, location, author, etc.

  • Bring more texts into view, broadening the canon of literature that we engage with.

  • Examine cultural, social, and political trends and their evolution on a broad scale.

Concerns

  • Disengages the reader from the emotional involvement of traditional reading; the story of a text may not have the same effect.

  • "Reading" is a misnomer, since distant reading is more related to statistics and data analytics.

  • Conclusions may be difficult to reach if the reader does not know what to look for.


Getting Started with Voyant Tools

Voyant Tools is a free, web-based tool for reading and analyzing digitized texts. It is developed and updated by Stéfan Sinclair and Geoffrey Rockwell, both of whom are Canadian scholars of the digital humanities.

First, let’s talk about how to upload the texts that will make up your body of analysis, or corpus. There are three main ways to do this:

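[Figure: the Voyant Tools start page, with callouts marking the dialogue box and where to upload files, and the default view with its panels numbered 1 to 5.]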

1. Cirrus (a word cloud), a list of terms, and Links (a network visualization)

2. Text of your corpus

3. Summary of the corpus, documents in the corpus, and phrases

4. Trends (frequency of words across documents or segments of documents)

5. Contexts (each occurrence of a term, shown with the words around it), Bubblelines (visual representations of term frequencies across each document), and Correlations (words commonly used near or with each other in the texts)
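To make the idea behind the Trends panel concrete, here is a minimal Python sketch of a relative frequency calculation: how large a share of each document's words a given term accounts for. It is only an illustration under stated assumptions, not Voyant's own code; the `tokenize` and `relative_frequencies` helpers and the two-sentence corpus are invented for this example, and the tokenizer handles only unaccented lowercase letters and apostrophes.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def relative_frequencies(documents, term):
    """Return, for each document, the share of its tokens equal to `term`."""
    result = []
    for text in documents:
        tokens = tokenize(text)
        count = Counter(tokens)[term.lower()]
        # Guard against empty documents to avoid dividing by zero.
        result.append(count / len(tokens) if tokens else 0.0)
    return result


# Hypothetical two-document corpus, listed in upload order.
corpus = [
    "The cure was slow. The cure was kind.",
    "No cure was found, only treatment and more treatment.",
]

for position, share in enumerate(relative_frequencies(corpus, "cure"), start=1):
    print(f"Document {position}: {share:.3f}")
```

Plotting these values in document order gives a line much like the one a trends chart draws, which is one reason the order of the documents matters.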

Now, let us talk about modifying the corpus. To do this, locate the Documents tab in window 3 (in the lower left-hand corner). In this window, you can see which documents make up your corpus and reorder them.


* A note on order:

It is useful to know the order of your documents when interpreting some of the visual representations of the text. The first document will usually be represented on the left, followed by the second, and so on.

Most often, the best order depends on which informational aspect of the texts you would like to analyze. Chronological order, for example, makes sense when looking for historical trends.


[Figure: the Documents tab, with a callout marking where to click to add or remove documents from the corpus.]

Doing a Search

A critical aspect of navigating Voyant Tools is knowing how to effectively use the search function. All of the tools have a search box in the toolbar located at the bottom of the display. Here are a few of the most useful search syntaxes:

  • “Moral Treatment” - putting two or more words in quotation marks will search for them as an exact phrase

  • Moral* - adding an asterisk at the end of a term will search for words that begin with “moral” (treats it as a prefix)

  • *moral - adding an asterisk at the beginning of a term will search for words that end with “moral” (treats it as a suffix)

  • Treatment|cure - terms separated by a pipe are counted together as a single term, so this matches either “treatment” or “cure”
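If it helps to see what each of these four search styles matches, the Python sketch below imitates them on a short sample sentence. It is a simplified illustration, not how Voyant Tools implements search; the `count_matches` function, the tokenizer, and the sample text are all assumptions made for this example, and matching is done on lowercased words.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def count_matches(tokens, query):
    """Count matches for a simplified version of the four search styles."""
    query = query.lower()
    if query.startswith('"') and query.endswith('"'):
        # Quoted phrase: the words must appear next to each other, in order.
        phrase = query.strip('"').split()
        if not phrase:
            return 0
        n = len(phrase)
        return sum(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))
    if "|" in query:
        # Pipe: any of the listed terms counts toward one combined total.
        options = set(query.split("|"))
        return sum(token in options for token in tokens)
    if query.endswith("*"):
        # Trailing asterisk: treat the term as a prefix.
        return sum(token.startswith(query[:-1]) for token in tokens)
    if query.startswith("*"):
        # Leading asterisk: treat the term as a suffix.
        return sum(token.endswith(query[1:]) for token in tokens)
    return tokens.count(query)


sample = ("Moral treatment was praised as a cure. Critics called the "
          "treatment immoral, and moralists disagreed about morality.")
tokens = tokenize(sample)

for query in ['"moral treatment"', "moral*", "*moral", "treatment|cure"]:
    print(query, count_matches(tokens, query))
```

Note how the prefix search picks up "moralists" and "morality" but not "immoral", while the suffix search does the reverse.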


Most documents contain a number of words that, unless they are the subject of your query, are useless for analysis. These include words like “the,” “of,” and “that.” There might also be extratextual words that are not part of the original document, such as “Project Gutenberg,” which appears in every text that Project Gutenberg provides but tells you nothing useful about your documents. To remedy this, we can make use of stopwords.
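The effect of a stopword list is easy to see in a small Python sketch that counts the most frequent words in a passage with and without filtering. This is an illustration only, under stated assumptions: the tiny `STOPWORDS` and `EXTRA_STOPWORDS` sets, the `top_terms` helper, and the sample passage are invented for the example, and real stopword lists are much longer.

```python
import re
from collections import Counter

# A tiny illustrative stopword list; real lists are much longer.
STOPWORDS = {"the", "of", "that", "and", "a", "in", "to", "is", "was"}
# Corpus-specific additions, such as source boilerplate.
EXTRA_STOPWORDS = {"project", "gutenberg"}


def top_terms(text, stopwords, n=3):
    """Return the n most frequent tokens that are not in `stopwords`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    kept = [token for token in tokens if token not in stopwords]
    return Counter(kept).most_common(n)


sample = ("Project Gutenberg presents the history of the asylum. "
          "The asylum was the home of the patient, and the patient "
          "was the subject of the asylum.")

print("Without stopwords:", top_terms(sample, set()))
print("With stopwords:   ", top_terms(sample, STOPWORDS | EXTRA_STOPWORDS))
```

Without filtering, "the" dominates the counts; with the stopwords applied, the words that actually describe the passage rise to the top.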


[Figure: "Using Stopwords" screenshot, with a callout marking where to click for tool options.]

Now that you have a basic overview of Voyant Tools, I will go into a little more depth on some of its most useful features.

Getting Started
  1. Paste texts straight into the dialogue box. This option is the easiest and most intuitive, but can be impractical for large documents or corpora.

  2. Upload a file from your computer. Voyant Tools handles TXT, HTML, XML, and Microsoft Word files fairly well, and is passable with PDFs (sometimes it will not process all of the textual elements). I highly suggest converting your documents into Word or plain text files before uploading them, just to save yourself some frustration.

  3. Copy and paste URLs into the dialogue box. If your documents are available from online resources such as Project Gutenberg, Europeana, or the Internet Archive, then you can simply copy and paste the URLs directly into the dialogue box. Give each URL its own line and enter them in the order in which you’d like them to be processed.
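If you are collecting many URLs, a few lines of Python can put them in the order you want and lay them out one per line, ready to paste into the dialogue box. This is a minimal sketch under stated assumptions: the years and the `example.org` addresses are placeholders rather than real documents, and `url_block` is a helper invented for this example.

```python
# Hypothetical documents as (publication year, URL) pairs.
# The URLs are placeholders, not real addresses.
documents = [
    (1847, "https://example.org/texts/report-1847.txt"),
    (1813, "https://example.org/texts/treatise-1813.txt"),
    (1838, "https://example.org/texts/essay-1838.txt"),
]


def url_block(docs):
    """Return the URLs sorted by year, one per line, ready to paste."""
    ordered = sorted(docs, key=lambda doc: doc[0])
    return "\n".join(url for _, url in ordered)


print(url_block(documents))
```

Sorting by year gives a chronological corpus; to order by author or location instead, store that value in place of the year and sort on it.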


Some notes on this process: If you upload a file, the program will sometimes allow only one file in the initial upload. In order to add more documents to your corpus, you will have to "modify" the corpus. Instructions on this follow. You can also use a sample corpus, provided under the “open” button, to experiment with the features. Voyant Tools provides full texts of William Shakespeare’s plays as well as the works of Jane Austen.


Here is the default screen you will be shown after uploading your texts. 


Using Stopwords  

1. Select the first dropdown menu in the options panel (hint: the one titled "stopwords").

2. Choose the list appropriate for your document. This will most likely be "English" or "Auto-detect," although lists are offered for an array of other languages.

3. If you want to add more words to the list, click the button that says "edit list" and add the new words manually.


* Note: if you select the "apply globally" checkbox, the same list of stopwords will be applied to all of the tools in Voyant Tools.

Tools and Drawing Insights
