The Function Word Spectrum and the Content Word Vector

Function words (conjunctions, prepositions, pronouns, auxiliary verbs and certain kinds of adverbs) express how content words (nouns, adjectives, main verbs and adverbs) relate to each other. Content words are the semantic units that carry the meaning; function words express the relationships between them. Sometimes, function words act as content words.

Calculating word frequencies is a common practice in text analysis. The frequencies are of interest for a number of reasons. For instance, when ranked by frequency of occurrence, the content words of a text give some insight into its subject. The three most frequently occurring content words in the Bible are ‘lord’, ‘god’ and ‘said’. We will refer to the list of content words of a text, ordered by frequency of occurrence, as the content word vector.
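As a rough illustration of the idea, a content word vector can be sketched in a few lines of Python. The function word list below is a tiny hypothetical stand-in (a real list would be far longer), the tokeniser is a simple regular expression, and the name content_word_vector is ours rather than part of any tool described here.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny function word list (assumption for this sketch).
FUNCTION_WORDS = {"and", "the", "be", "there", "was"}

def content_word_vector(text, function_words=FUNCTION_WORDS):
    """Return (word, count) pairs for content words, most frequent first."""
    # Lower-case the text and keep runs of letters and apostrophes as tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    # Count every token that is not in the function word list.
    counts = Counter(t for t in tokens if t not in function_words)
    # Order by descending frequency; break ties alphabetically.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

sample = "And God said, Let there be light: and there was light. And God saw the light."
vector = content_word_vector(sample)
print(vector[:3])
```

On this short sample the most frequent content word is ‘light’, followed by ‘god’. Note that the sketch treats every occurrence of a listed word as a function word, regardless of the role it plays in the sentence.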

If all the words in a text represent the complete semantic domain of that text, then the content words and the function words could be viewed as two distinct, separate components of this domain. Even though the same word can act as a function word in one instance and a content word in another, in any specific instance any word performs a single role and can be assigned to either one group or the other. 

Examining content word vectors is relatively straightforward. It is possible to read through the ordered list of content words. We have already looked at the top three words in the Bible. Here are the top 12 words from the works of Emily Dickinson: love, good, come, lord, enter, man, go, know, say, make, see, take. The content word vector of a poet, it seems, is poetic. 

Although they are relatively few in number, function words occur in speech and writing with particularly high frequency. It is of course possible to read through the list of all the function words in a particular text, ordered by frequency of occurrence; however, we propose another type of indicator, one that gives an insight into the function words of a text as a whole.

We group all the function words in a text into categories, either by likely meaning (the meaning implied in the relationship that they express) or by the function that they perform. For instance, the auxiliary verbs and the modal verbs are in the ‘verbs’ category. They may be in the past, present or future tenses; they can be positive or negative (do, don’t; did, didn’t). Articles, affirmation/negation expressions (yes, no, not) and exclamations are four separate categories. (It is possible to fine-tune the category of the affirmation/negation by adjusting for the affirmation/negation expressed through verbs, auxiliary verbs and modals.) The category of pronouns includes personal, possessive and reflexive pronouns. We include in this category such words as ‘our’, ‘your’ or ‘my’, which are sometimes referred to as possessive determiners. The indefinite pronouns, however, are in the indefinite category (see below). The quantity category includes words related to a quantity or an amount, such as ‘more’, ‘less’, ‘many’, ‘few’. The indefinite category includes the words ‘any’, ‘every’, ‘some’, ‘whatever’, ‘whoever’ and some others. The demonstrative category includes ‘that’, ‘this’, ‘those’, ‘these’ and certain other words. ‘What’, ‘which’, ‘who’ and some other words are in the relative/interrogative category. Other categories include the groups of time, place, manner, cause and consequence.

The categories that we propose aim to identify and represent all the distinct aspects of meaning implied in the relationships observed among the content words.



We assign each function word, in a particular text, to its category and calculate the relative weight of each category. Each of the 22 categories is represented as a colour in a rainbow; the width of a specific colour in the rainbow is adjusted to reflect the weight of the category in the text.
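The weighting step can be sketched as follows. This is only a minimal illustration under stated assumptions: the word-to-category mapping covers a handful of words and five category names chosen for the example (not the full set of 22), the names function_word_spectrum and band_widths are hypothetical, and the total width of 400 is an arbitrary value.

```python
import re
from collections import Counter

# Hypothetical, partial word-to-category mapping (assumption for this sketch).
CATEGORY_OF = {
    "the": "articles", "a": "articles", "an": "articles",
    "not": "negation", "no": "negation",
    "that": "demonstrative", "this": "demonstrative",
    "more": "quantity", "few": "quantity",
    "is": "verbs", "do": "verbs",
}

def function_word_spectrum(text, category_of=CATEGORY_OF):
    """Return the relative weight of each category among the function words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    # Count occurrences per category, ignoring words that have no category.
    counts = Counter(category_of[t] for t in tokens if t in category_of)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: n / total for category, n in counts.items()}

def band_widths(spectrum, total_width=400):
    """Scale each category weight to the width of its colour band."""
    return {category: round(w * total_width) for category, w in spectrum.items()}

sample = "This is not the answer that a few more readers expect."
spectrum = function_word_spectrum(sample)
widths = band_widths(spectrum)
print(spectrum)
print(widths)
```

The weights sum to one, so each band width is simply the category's share of the total width. A fixed lookup of this kind assigns a word such as ‘that’ to a single category wherever it occurs, which is one reason the resulting picture can only be approximate.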

As discussed above, most words, including function words, may act in different capacities in different situations. For instance, the same word can be a preposition in one place and an adverb in another. Consider the sentences ‘He lives in the house across the street’ (‘across’ is a preposition: a function word expressing a relationship of relative position in physical space) and ‘Let us swim across’ (‘across’ is an adverb: a content word expressing a direction or destination of movement in physical space). Additionally, the same word may express different kinds of relationships and have multiple meanings. The word ‘thence’ may refer to a place, a time, or a source. The word ‘might’ may be a form of the verb ‘may’ or may mean ‘power’ or ‘force’. ‘May’ may refer to the verb ‘may’ or the month of May.

Consider Hamlet’s famous words from Shakespeare’s Hamlet, Act 3, Scene 1:

To be, or not to be, that is the question:

Whether ’tis nobler in the mind to suffer

The slings and arrows of outrageous fortune,

Or to take Arms against a Sea of troubles,

And by opposing end them…

In computerised text analysis, ‘to’ and ‘be’ would ordinarily be regarded as function words; here, the infinitive ‘to be’ could be viewed as performing the function of a unit of content.

For these reasons, it is essential to stress that the proposed weighted category indicator, which we will call the function word spectrum, provides only an approximate sense of the nature of the grammatical relationships represented by all the function words within a text. In the absence of manual mark-up, this sense will remain imprecise.

It is also important to note that the proposed indicator is construction- or definition-dependent. We could group the function words into categories through a different logic and obtain a different insight into the text. Additionally, as function words are relatively few in number, despite their high frequency of occurrence, it is possible to single out specific words through manual mark-up and examine their role within a text in greater detail, by looking at their frequency, distribution or other parameters.
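As one possible way of looking at the distribution of a single word, the sketch below splits a text into equal segments and counts the occurrences of the word in each. It works on surface forms only, so it does not replace the manual mark-up mentioned above; the function name word_distribution and the choice of two segments are assumptions made for the illustration.

```python
import re

def word_distribution(text, word, segments=4):
    """Count occurrences of `word` in each of `segments` equal parts of the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = [0] * segments
    if not tokens:
        return counts
    for i, token in enumerate(tokens):
        if token == word:
            # Map the token position to the index of the segment it falls in.
            counts[i * segments // len(tokens)] += 1
    return counts

sample = "To be, or not to be, that is the question"
print(word_distribution(sample, "to", segments=2))
print(word_distribution(sample, "be", segments=2))
```

In this sample both occurrences of ‘to’ fall in the first half of the line, while ‘be’ occurs once in each half.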

Some examples of function words with a content word meaning that are excluded from the content word vectors:

art (art and the old form of are)

bee (a bee and the old form of be)



myghte (an old spelling of might)


To examine the content word vector of any literary work, as discussed, select the literary work in the tab and order the word list, in the Words tab, by frequency of occurrence. (Note: function words are omitted from the list.) To examine the function word spectrum for a specific work, click on the title in the authors and works area of the Books tab. The function word spectrum is shown as a rainbow in the Source Text area on the lower right, underneath the title of the work and the author’s name. The second rainbow, when shown, represents the aggregated function word spectrum for all the literature that is currently selected. To compare the function word spectra of two works, for instance, select one and click on the title of the other: the first rainbow will show the spectrum of the latter and the second that of the former.

Interesting Resources:

A considerable amount of research work is being done in this area. 

Some examples:

  1. The narrative arc: Revealing core narrative structures through text analysis. Ryan L. Boyd, Kate G. Blackburn and James W. Pennebaker.

Science Advances, 07 Aug 2020, Vol. 6, no. 32, eaba2196.
DOI: 10.1126/sciadv.aba2196

The researchers study similarities in the core narrative structures in various texts. For this purpose, they examine the frequencies of “cognitive processing” words and function words. However, they group the function words differently:

At the beginning, then, the author must signal concrete labels, names, and other identifying clues for the characters, places, and objects in the story; importantly, the author must also connect these dots by elaborating on their interrelations. In providing the necessary background, the author must necessarily use high rates of prepositions and articles (the mansion was next to the lake, below a bluff, by the road)—words that are inherently information-structural (18). Once the reader becomes familiar with the context, the author can later refer to the mansion as “it” or “her home” or perhaps not at all. Once the plot gets moving, there should be a large increase in pronouns, auxiliary verbs, and other function words and a corresponding drop in articles and prepositions.

  2. The OED recently published the OED Text Visualizer. Explore its functionalities here.