The Function Word Spectrum and the Content Word Vector

Function words (conjunctions, prepositions, pronouns, auxiliary verbs and certain kinds of adverbs) express how content words (nouns, adjectives, main verbs and adverbs) relate to each other. Content words are the semantic units carrying the meaning and function words signify the relationships existing between them. Sometimes, function words act as content words and vice versa. 

Calculation of frequencies of words is a common practice in text analysis. The frequencies are of interest for a number of reasons. For instance, when ranked by frequency of occurrence, the content words of a text provide a certain insight into its meaning. The three most frequently occurring content words in the Bible are ‘lord’, ‘god’, ‘said’. We will refer to the list of content words of a text ordered by frequency of occurrence as the content word vector.
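As a rough illustration, a content word vector can be computed by counting the words of a text, discarding function words and sorting what remains by frequency. The sketch below is only that: the name content_word_vector, the simple tokeniser and the tiny FUNCTION_WORDS set are assumptions made for this example, and a realistic function word list would be far longer.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny function word list (an assumption for this sketch).
FUNCTION_WORDS = {
    "and", "the", "of", "to", "a", "in", "is", "was", "be", "there", "let", "that",
}


def content_word_vector(text, function_words=FUNCTION_WORDS):
    """Return (word, count) pairs for content words, most frequent first."""
    # Lower-case the text and keep runs of letters and apostrophes as tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    # Count every token that is not treated as a function word.
    counts = Counter(t for t in tokens if t not in function_words)
    # most_common() sorts by descending count; ties keep first-seen order.
    return counts.most_common()


if __name__ == "__main__":
    sample = "And God said, Let there be light: and there was light."
    for word, count in content_word_vector(sample):
        print(word, count)
```

Note that the sketch treats every occurrence of a listed word as a function word, whatever role it plays in the sentence.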

If all the words in a text represent the complete semantic domain of that text, then the content words and the function words could be viewed as two distinct, separate components of this domain. Even though the same word can act as a function word in one instance and a content word in another, in any specific instance a word performs a single role and can be assigned to either one group or the other. 

Examining content word vectors is relatively straightforward. It is possible to read through the ordered list of content words. We have already looked at the top three words in the Bible. Here are the top 12 words from the works of Emily Dickinson: love, good, come, lord, enter, man, go, know, say, make, see, take. The content word vector of a poet, it seems, is poetic. 

Function words occur in speech and writing with particularly high frequency; long lists of articles, prepositions and auxiliary verbs are not as readable as the content word vector. For this reason, we propose an indicator that summarises the function words of a text as a whole.

We group all the function words into categories either by likely meaning (the meaning implied in the relationship that they express) or by the function that they perform. For instance, the auxiliary verbs and modals are in the ‘verbs’ category. They may be in the past, present or future tenses; they can be positive or negative (do, don’t; did, didn’t). Articles, affirmation expressions (yes), negation expressions (no, not) and exclamations form four separate categories. (It is possible to fine-tune the affirmation and negation categories by adjusting for the affirmation/negation expressed through verbs and auxiliary verbs.) The category of pronouns includes personal, possessive and reflexive pronouns. We include, in this category, such words as ‘our’, ‘your’ or ‘my’, which are sometimes referred to as possessive determiners. The indefinite pronouns, however, are placed in the indefinite category (see below). The quantity category includes words related to a quantity or an amount, such as ‘more’, ‘less’, ‘many’, ‘few’. The indefinite category includes the words ‘any’, ‘every’, ‘some’, ‘whatever’, ‘whoever’ and some others. The demonstrative category includes ‘that’, ‘this’, ‘those’, ‘these’ and certain other words. ‘What’, ‘which’, ‘who’ and some other words are in the relative/interrogative category. Other categories include the groups of time, space, manner, cause and consequence.

The categories that we propose aim to identify and represent all the distinct aspects of meaning implied in the relationships observed among the content words.

Categories:

Category Name:
  1. ARTICLES
  2. TITLES_ABBREVIATIONS
  3. PRONOUNS
  4. INDEFINITE
  5. VERBS
  6. EXCLAMATIONS
  7. YES
  8. NO_NOT_NONE
  9. RELATIVE_INTERROGATIVE
  10. DEMONSTRATIVE
  11. SPACE
  12. TIME
  13. MANNER
  14. CAUSE
  15. CONSEQUENCE
  16. JUXTAPOSITION
  17. POSSIBILITY
  18. CERTAINTY
  19. QUANTITY
  20. DEGREE
  21. COORDINATING
  22. SUBORDINATING

We assign all the function words, in a particular text, to the categories and calculate the relative weight of each category. Each of the 22 categories is represented as one of the colours in a rainbow; the width of a specific colour in the rainbow is adjusted to reflect the weight of the category in the text.
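One possible way to compute such weights is sketched below. It assumes that the weight of a category is its share of all function word occurrences in the text, which is one reading of ‘relative weight’ rather than a confirmed definition. The category names come from the list above, but the word memberships in CATEGORY_WORDS are abbreviated and hypothetical, as is the function name function_word_spectrum.

```python
import re
from collections import Counter

# Hypothetical, heavily abbreviated lexicon: only six of the 22 categories,
# with illustrative memberships that are assumptions of this sketch.
CATEGORY_WORDS = {
    "ARTICLES": {"a", "an", "the"},
    "PRONOUNS": {"i", "you", "he", "she", "it", "we", "they", "my", "your", "our"},
    "VERBS": {"be", "is", "are", "was", "do", "did", "have", "has"},
    "NO_NOT_NONE": {"no", "not", "none"},
    "DEMONSTRATIVE": {"this", "that", "these", "those"},
    "COORDINATING": {"and", "or", "but"},
}

# Reverse lookup: each word maps to exactly one category, regardless of context.
WORD_TO_CATEGORY = {
    word: category
    for category, words in CATEGORY_WORDS.items()
    for word in words
}


def function_word_spectrum(text):
    """Return {category: share of function word occurrences} for a text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(
        WORD_TO_CATEGORY[t] for t in tokens if t in WORD_TO_CATEGORY
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    # Keep the category order of the lexicon and skip empty categories.
    return {c: counts[c] / total for c in CATEGORY_WORDS if counts[c]}


if __name__ == "__main__":
    line = "To be, or not to be, that is the question"
    for category, weight in function_word_spectrum(line).items():
        print(f"{category}: {weight:.2f}")
```

The resulting shares sum to one, so each could be mapped directly to the width of a colour band. Because the lookup ignores context, a word is always counted in the same category.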

As discussed above, most words, including function words, may act in different capacities in different situations. For instance, the same word can be a preposition in one place and an adverb in another. Consider the sentences ‘He lives in the house across the street’ (‘across’ is a preposition – a function word expressing a relationship of relative position in physical space) and ‘Let us swim across’ (‘across’ is an adverb – a content word expressing a direction/destination of movement in physical space). Additionally, the same word may express different kinds of relationships and have multiple meanings. The word ‘thence’ may refer to a place, a time, or a source. The word ‘might’ may be the modal form of the verb ‘may’ or may mean ‘power’ or ‘force’. ‘May’ may refer to the verb ‘may’ or the month of May.

Consider Hamlet’s famous words from Shakespeare’s Hamlet, Act 3, Scene 1:

To be, or not to be, that is the question:

Whether ’tis nobler in the mind to suffer

The slings and arrows of outrageous fortune,

Or to take Arms against a Sea of troubles,

And by opposing end them…

In computerised text analysis, ‘to’ and ‘be’ would ordinarily be regarded as function words; here, the infinitive ‘to be’ could be viewed as functioning as a unit of content.

For these reasons, it is essential to stress that the proposed weighted category indicator, which we will call the function word spectrum, provides only an approximate sense of the nature of the grammatical relationships represented by all the function words within a text; in the absence of manual text mark-up, this sense will remain imprecise.

It is important to note that the proposed indicator is construction- or definition-dependent. We could group the function words into categories through a different logic and obtain a different insight into the text. As function words are relatively few in number, despite their high frequency of occurrence, it is possible to single out certain words through manual mark-up and examine their role within a text in greater detail by looking at their frequency, distribution or other parameters.
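As a small illustration of looking at the distribution of a single word, the sketch below splits a text into equal segments and counts the occurrences of the word in each. The function name word_distribution and the segment count are assumptions made for this example. It matches surface forms only and does not replace manual mark-up, so it cannot tell, for instance, the modal ‘might’ from the noun.

```python
import re


def word_distribution(text, word, segments=5):
    """Count occurrences of `word` in each of `segments` equal slices of the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = [0] * segments
    for position, token in enumerate(tokens):
        if token == word:
            # Map the token position to a segment index in 0..segments-1.
            counts[position * segments // len(tokens)] += 1
    return counts


if __name__ == "__main__":
    sample = "the cat saw the dog and the bird near a tree"
    print(word_distribution(sample, "the", segments=2))
```

The sum of the returned counts is the overall frequency of the word, while the individual values show whether it clusters in particular parts of the text.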

Some examples of function words that also have a content word meaning and are excluded from the content word vectors:

art (art and the old form of are)

bee (a bee and the old form of be)

being

might

myghte (an old spelling of might)

still

To examine the content word vector of any literary work, as discussed, select the work in the Books tab and order the word list in the Words tab by frequency of occurrence. (Note: function words are omitted from the list.) To examine the function word spectrum for a specific work or a group of works, select the book(s) in the Books tab and click on the title (in the authors and works list section of the Books tab): the function word spectrum is shown as a rainbow in the Source text area on the lower right, underneath the title of the work and the author’s name. The second rainbow represents the aggregated function word spectrum for all the literature that is currently selected.

Interesting Resources:

A considerable amount of research work is being done in this area. 

Some examples:

  1. The narrative arc: Revealing core narrative structures through text analysis. Ryan L. Boyd, Kate G. Blackburn and James W. Pennebaker.

Science Advances, 07 Aug 2020:
Vol. 6, no. 32, eaba2196
DOI: 10.1126/sciadv.aba2196

The researchers study similarities in the core narrative structures in various texts. For this purpose, they examine the frequencies of “cognitive processing” words and function words. However, they group the function words differently:

At the beginning, then, the author must signal concrete labels, names, and other identifying clues for the characters, places, and objects in the story; importantly, the author must also connect these dots by elaborating on their interrelations. In providing the necessary background, the author must necessarily use high rates of prepositions and articles (the mansion was next to the lake, below a bluff, by the road)—words that are inherently information-structural (18). Once the reader becomes familiar with the context, the author can later refer to the mansion as “it” or “her home” or perhaps not at all. Once the plot gets moving, there should be a large increase in pronouns, auxiliary verbs, and other function words and a corresponding drop in articles and prepositions.

  2. The OED recently published the OED Text Visualizer. Explore its functionalities here.