About the General Service List

John Bauman
Enterprise Training Group
This page: http://jbauman.com/aboutgsl.html

The 1953 GSL
About this version of the GSL
Copying and using this GSL
Bibliography
John Bauman's homepage
The actual 2,284 words, with frequency numbers


The 1953 GSL


The General Service List (GSL) (West, 1953) is a set of 2,000 words selected to be of the greatest "general service" to learners of English. They are not the most common 2,000 words, though frequency was one of the factors taken into account in making the selection. Each of the 2,000 words is a headword representing a word family that is only loosely defined in West. Frequency numbers are given, derived from Thorndike and Lorge (1944). Frequency data is also given for the various meanings of words. This list has had a wide influence for many years, serving as the basis for graded readers as well as other material. Texts based on the GSL are still on sale, but the list itself is out of print. A fuller discussion of the GSL, and of word lists in general, can be found in Nation (1990, pp. 21-24) and Carter and McCarthy (1988, Ch. 1).

As published, the GSL is a medium-sized red book, organized like a dictionary. Each of the 2,000 headwords is listed alphabetically with brief definitions and example sentences. A number is given for each word, representing the number of occurrences per 5 million words. A percentage number is given for each meaning, representing the frequency of that meaning in the occurrences of the word. Headwords are listed in uppercase bold type. Derived forms are listed under the headwords in lowercase bold type and are (usually) given their own frequency numbers.
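As a rough illustration of how the two kinds of numbers fit together, here is a small Python sketch. The function names and the sample figures are my own inventions for the example, not values from West. It only assumes, as described above, that a word's number is a count per 5 million words and that each meaning's percentage is a share of that count.

```python
GSL_CORPUS_SIZE = 5_000_000  # West's counts are occurrences per 5 million words


def per_million(count, corpus_size=GSL_CORPUS_SIZE):
    """Scale a raw count to occurrences per 1 million words."""
    return count * 1_000_000 / corpus_size


def meaning_count(word_count, meaning_percent):
    """Estimate occurrences of one meaning from the word count and its percentage."""
    return word_count * meaning_percent / 100


# Invented figures: a word counted 1,500 times, with one meaning at 40%.
print(per_million(1500))        # the word's rate per million words
print(meaning_count(1500, 40))  # estimated occurrences of that one meaning
```

Scaling to a per-million figure is only a convenience for comparing these numbers with counts taken from a corpus of a different size.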

The inclusion of related forms under a headword is not consistent. If all related forms are understandable to a learner of English who knows the headword, then the GSL consists of 2,000 items. But this is clearly not the case. To take an extreme example, these derived forms are listed under EFFECT: effective, effectively, efficient, efficiency, efficiently, and (with a [?]) affect. This entry for EFFECT does not represent a single learning unit for a student of English. The question remains open: how many "words" does the GSL contain?

The frequency numbers given for the words provide a way to rank the words in importance for students of English. Again, there are problems. First, the transcription of the numbers and words is a tedious task. Second is the issue of whether the frequency numbers of related forms should always be added to the headword before the words are ranked. If not always, in which cases should the numbers be added? A third concern is the age of the written material that the frequency numbers come from. This data was originally published in 1938 and 1949 (West, pp. xi-xiii). Is this data sufficiently relevant to the current state of English?


About this version of the GSL


The list given here was created by John Bauman and Brent Culligan in early 1995. We wanted a version of the GSL ranked in frequency order. In order to address the above problems, we adopted two authorities. To determine which words to include as forms related to a headword, we used the standard set out in Bauer and Nation (1995). This article uses various criteria to group derived forms into word families. Related words are ranked into levels. Forms related at levels 1 to 4 are grouped under a headword on this list, and their frequency numbers are added together. To determine the frequency of a word, we used the frequency numbers from the Brown Corpus (Francis and Kucera, 1982). Using these criteria, the GSL ends up as 2,284 words.

What follows may be more detail than most people need. I include it because I'd like to think that what we did was rigorous enough that someone else, using the same sources and the same criteria, would come up with the same list.

This list contains all of the headwords and bold-faced derived forms listed in the GSL, excluding hyphenated and compound words, grouped into word families based on levels one through four of Bauer and Nation (1995) and ranked according to frequency numbers supplied by the Brown Corpus.
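To make the adding and ranking concrete, here is a minimal Python sketch. It is not the procedure we actually ran in 1995. The names `rank_families`, `families` and `form_freq` are hypothetical, and the words and counts are invented rather than taken from the Brown Corpus. Deciding which forms belong to a family (Bauer and Nation levels one through four) is assumed to have been done by hand already.

```python
def rank_families(families, form_freq):
    """Sum the counts of each headword and its family members, then rank.

    families:  dict mapping a headword to a list of its related forms
    form_freq: dict mapping each individual form to its corpus count
    Returns a list of (rank, total_frequency, headword) tuples.
    """
    totals = {
        head: sum(form_freq.get(form, 0) for form in {head, *members})
        for head, members in families.items()
    }
    # Highest total first; ties are broken alphabetically by headword.
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, freq, head) for rank, (head, freq) in enumerate(ranked, start=1)]


# Invented example data, for illustration only.
families = {
    "teach": ["teaches", "taught", "teacher"],
    "quick": ["quickly", "quicker"],
}
form_freq = {
    "teach": 40, "teaches": 5, "taught": 30, "teacher": 60,
    "quick": 70, "quickly": 90, "quicker": 3,
}

for rank, freq, head in rank_families(families, form_freq):
    print(rank, freq, head)
```

Forms missing from the frequency table simply count as zero, which matches the idea that a form not found in the corpus adds nothing to its family's total.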

Every capitalized headword from the GSL was included. In a few cases the GSL listed a derived form as the headword, for example COMPLICATED. In these cases, the base form, i.e., complicate, was used as the headword. All other headwords from the GSL were included as headwords. GSL bold-faced derived forms that did not qualify for levels three or four of Bauer and Nation were entered as new headwords.

The GSL was prepared for pedagogical purposes and clarity; therefore "no attempt has been made to be rigidly consistent in the method used for displaying the words" (West, 1953:viii). The GSL does not consistently include derived forms. By applying Bauer and Nation's levels one to four, some of these inconsistencies were rectified (see below). Derived forms that were not included in these levels continued to reflect the inconsistency of the GSL. The following four affixes, -ion, -ition, -or and -en, caused the creation of about 100 new headwords. To increase consistency, derived forms using these affixes were included as headwords when they appeared in the Brown Corpus, even if they did not appear in the GSL.

The word families include as derived forms all Bauer and Nation level two forms (verb inflections, plurals, comparatives, and superlatives). We also included as derived forms all qualifying Bauer and Nation levels three and four forms that occur in either the GSL or the Brown Corpus. Derived forms that do not occur in either of these sources were not included.

In the lemmatized Brown Corpus, while parts of speech are differentiated, non-semantically related homographs of the same part of speech are given a single, composite frequency number. In this list, the frequency number assigned to each headword includes those homographic forms as well as the semantically linked forms of different parts of speech, and so represents all occurrences of that graphic word in the Brown Corpus.


Copying and using this list


On the "next page" is the list of 2,284 words. It is in frequency order, with one word per line. Each line contains the following:

rank number-space-frequency number-space-word

The frequency number represents the number of occurrences of that word and its related forms in the 1,000,000 words of the Brown Corpus.

I assume that people who are reading this have some familiarity with computers. Here is what I would do with this list to make it most useful:

Copy the whole list as text onto my hard disk.
Put it into a word processor.
Replace the spaces with tabs.
Save the list and open it in a spreadsheet.

If you do this with Microsoft Word and Excel, you'll end up with a spreadsheet that has 3 columns and 2,284 rows. Now you can alphabetize the list, or manipulate it in any other way you may want.
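If you would rather skip the word processor, the same steps can be sketched in a few lines of Python. This is only one way to do it. The function names are hypothetical, and the sample lines are invented to show the "rank frequency word" format; they are not copied from the real list.

```python
import csv
import io


def parse_gsl_line(line):
    """Split one 'rank frequency word' line into (int, int, str)."""
    rank, freq, word = line.split()
    return int(rank), int(freq), word


def to_tsv(lines):
    """Convert space-separated list lines into tab-separated text."""
    rows = [parse_gsl_line(line) for line in lines if line.strip()]
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


def alphabetize(rows):
    """Return the rows sorted by word instead of by rank."""
    return sorted(rows, key=lambda row: row[2].lower())


# Invented sample lines, for illustration only.
sample = ["1 300 zebra", "2 200 apple", "3 100 mango"]

print(to_tsv(sample))
for row in alphabetize([parse_gsl_line(line) for line in sample]):
    print(row)
```

The tab-separated text can be saved to a file and opened in a spreadsheet, which gives the same three columns as the manual method.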

This list with its related forms can be made available. I also have the first 1,000 words arranged into 20 units of 50 words each, with lots of quizzes and crossword puzzles and stuff. I also have a Japanese-English glossary of these 1,000 words.

To the actual list of words

Bibliography:


Bauer and Nation, 1995, Word Families, International Journal of Lexicography

Carter and McCarthy, 1988, Vocabulary and Language Teaching, Longman, London

Francis and Kucera, 1982, Frequency Analysis of English Usage, Houghton Mifflin, Boston

Nation, I.S.P., 1990, Teaching and Learning Vocabulary, Newbury House, New York

Thorndike and Lorge, 1944, The Teacher's Word Book of 30,000 Words, Columbia University, New York

West, 1953, A General Service List of English Words, Longman, London
