About the General Service List
John Bauman
Enterprise Training Group
This page: http://jbauman.com/aboutgsl.html
The 1953 GSL
About this version of the GSL
Copying and using this list
Bibliography
John Bauman's homepage
The actual 2,284 words, with frequency numbers
The 1953 GSL
The General Service List (GSL) (West, 1953) is a set of 2,000 words selected to be
of the greatest "general service" to learners of English. They are not the most common
2,000 words, though frequency was one of the factors taken into account in making
the selection. Each of the 2,000 words is a headword representing a word family that is
only loosely defined in West. Frequency numbers are given, derived from Thorndike
and Lorge (1944). Frequency data is also given for the various meanings of words.
This list has had a wide influence for many years, serving as the basis for graded readers
as well as other material. Texts based on the GSL are still on sale, but the list
itself is out of print. A fuller discussion of the GSL and of word lists in general
can be found in Nation (1990, pp. 21-24) and Carter and McCarthy (1988, Ch. 1).
As published, the GSL is a medium-sized red book, organized like a dictionary.
Each of the 2,000 headwords is listed alphabetically with brief definitions and example
sentences. A number is given for each word, representing the number of occurrences
per 5 million words. A percentage number is given for each meaning, representing the
frequency of that meaning in the occurrences of the word. Headwords are listed in
uppercase bold type. Derived forms are listed under the headwords in lowercase bold
type and are (usually) given their own frequency numbers.
The inclusion of related forms under a headword is not consistent. If all related forms
were understandable to a learner of English who knows the headword, then the GSL would consist
of 2,000 items. But this is clearly not the case. To take an extreme example, these derived forms are listed under EFFECT: effective, effectively, efficient, efficiency,
efficiently, and (with a [?]) affect. This entry for EFFECT does not represent a
single learning unit for a student of English. The question remains open: how many
"words" does the GSL contain?
The frequency numbers given for the words provide a way to rank the words in importance
for students of English. Again, there are problems. First, the transcription of the
numbers and words is a tedious task. Second is the issue of whether the frequency
numbers of related forms should always be added to the headword before the words are
ranked. If not always, in which cases should the numbers be added? A third concern is the age of the written material that the frequency numbers come from. This data was originally published in 1938 and 1949 (West, pp. xi-xiii). Is this data sufficiently relevant to the current state of English?
About this version of the GSL
The list given here was created by John Bauman and Brent Culligan in early 1995. We
wanted a version of the GSL ranked in frequency order. To address the above
problems, we adopted two authorities. To determine which words to include as forms
related to a headword, we used the standard set out in Bauer and Nation (1995). This article uses various criteria to group derived forms into word families, ranking related forms into levels.
Forms at levels 1 to 4 are grouped under a headword on
this list, and their frequency numbers are added together. To determine the frequency of a word,
we used the frequency numbers from the Brown Corpus (Francis and Kucera, 1982). Using these criteria, the GSL
ends up as 2,284 words.
What follows may be more detail than most people need. I include it because I'd like
to think that what we did was rigorous enough that someone else, using the same sources
and the same criteria, would come up with the same list.
This list contains all of the headwords and bold-faced derived forms listed in the
GSL, excluding hyphenated and compound words, grouped into word families based on
levels one through four of Bauer and Nation (1995) and ranked according to frequency
numbers supplied by the Brown Corpus.
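The grouping and ranking described above can be sketched in a few lines of Python. This is a minimal illustration, not a reconstruction of how the list was actually built: the function name `rank_families`, the family groupings, and the counts are all hypothetical placeholders rather than Brown Corpus figures, and it assumes the decision about which forms belong in each family (the Bauer and Nation levels) has already been made.

```python
from typing import Dict, List, Tuple


def rank_families(families: Dict[str, List[str]],
                  counts: Dict[str, int]) -> List[Tuple[int, int, str]]:
    """Return (rank, family_total, headword) tuples, most frequent first.

    `families` maps each headword to every form grouped under it, including
    the headword itself. `counts` maps a form to its corpus frequency; a form
    missing from `counts` contributes 0.
    """
    # Add up the frequency of every form in each family.
    totals = {
        head: sum(counts.get(form, 0) for form in forms)
        for head, forms in families.items()
    }
    # Highest total first; ties broken alphabetically so the output is repeatable.
    ordered = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, total, head)
            for rank, (head, total) in enumerate(ordered, start=1)]


# Hypothetical families and counts, invented for illustration only
# (NOT Brown Corpus figures).
families = {
    "complicate": ["complicate", "complicates", "complicated", "complication"],
    "kind": ["kind", "kinds", "kindly", "kindness", "unkind"],
}
counts = {
    "complicate": 2, "complicates": 1, "complicated": 5, "complication": 3,
    "kind": 10, "kinds": 4, "kindly": 3, "kindness": 2, "unkind": 1,
}

for rank, total, head in rank_families(families, counts):
    print(rank, total, head)
```

Ties are broken alphabetically here only so the output is repeatable; nothing above says how tied words were ordered in the real list.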
Every capitalized headword from the GSL was included. In a few cases the GSL listed
a derived form as the headword, for example COMPLICATED. In these cases, the base
form, i.e., complicate, was used as the headword. All other headwords from the GSL
were included as headwords. GSL bold-faced derived forms that did not qualify for levels
three or four of Bauer and Nation were entered as new headwords.
The GSL was prepared for pedagogical purposes and for clarity; therefore "no attempt has
been made to be rigidly consistent in the method used for displaying the words" (West,
1953, p. viii). The GSL does not consistently include derived forms. By applying Bauer
and Nation's levels one to four, some of these inconsistencies were rectified (see below).
Derived forms that were not included in these levels continued to reflect the inconsistency
of the GSL. Four affixes, -ion, -ition, -or and -en, caused the creation of about 100 new headwords. To increase consistency, derived forms using
these affixes were included as headwords when they appeared in the Brown Corpus,
even if they did not appear in the GSL.
The word families include as derived forms all Bauer and Nation level two forms (verb
inflections, plurals, comparatives, and superlatives). We also included as derived
forms all qualifying Bauer and Nation level three and four forms that occur in either
the GSL or the Brown Corpus. Derived forms that do not occur in either of these sources
were not included.
In the lemmatized Brown Corpus, parts of speech are differentiated, but semantically
unrelated homographs of the same part of speech are given a single, composite frequency
number. In this list, the frequency number assigned to each headword includes those homographic forms as well as the semantically linked, different parts of speech,
and represents all occurrences of that graphic word in the Brown Corpus.
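To make the previous paragraph concrete, here is a small Python sketch that collapses lemmatized entries, each assumed to be a (word, part of speech, count) triple, into one total per spelling. The input format, the function name `totals_by_graphic_word`, and the counts are hypothetical; this is not the Brown Corpus's actual data layout.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def totals_by_graphic_word(entries: Iterable[Tuple[str, str, int]]) -> Dict[str, int]:
    """Sum (word, part_of_speech, count) entries into one total per spelling.

    Different parts of speech, and unrelated homographs that share a spelling,
    all end up in the same bucket, so each graphic word gets a single number.
    """
    totals: Dict[str, int] = defaultdict(int)
    for word, _pos, count in entries:  # the part of speech is ignored on purpose
        totals[word] += count
    return dict(totals)


# Hypothetical lemmatized entries; the counts are invented for illustration.
entries = [
    ("light", "noun", 5), ("light", "verb", 2), ("light", "adjective", 3),
    ("lead", "verb", 4), ("lead", "noun", 1),
]
print(totals_by_graphic_word(entries))
```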
Copying and using this list
On the "next page" is the list of 2,284 words. It is in frequency order, with one
word per line. Each line contains the following:
rank number-space-frequency number-space-word
The frequency number represents the number of occurrences of that word and its related
forms in the 1,000,000 words of the Brown Corpus.
I assume that people who are reading this have some familiarity with computers. Here
is what I would do with this list to make it most useful:
Copy the whole list as text onto my hard disk.
Put it into a word processor.
Replace the spaces with tabs.
Save the list and open it in a spreadsheet.
If you do this with Microsoft Word and Excel, you'll end up with a spreadsheet that
has 3 columns and 2,284 rows. Now you can alphabetize the list, or manipulate it
in any other way you may want.
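If you'd rather script the conversion than use a word processor, the sketch below does roughly the same job in Python: it reads lines in the rank-space-frequency-space-word format, sorts them alphabetically, and writes them out tab-separated with a header row. The sample lines and their counts are made up, and the names `parse_list`, `to_tsv`, and `Entry` are just illustrative.

```python
import csv
import io
from typing import Iterable, List, NamedTuple


class Entry(NamedTuple):
    rank: int
    frequency: int
    word: str


def parse_list(lines: Iterable[str]) -> List[Entry]:
    """Parse 'rank frequency word' lines, skipping anything that doesn't fit."""
    entries = []
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # blank line or something other than a list entry
        try:
            entries.append(Entry(int(parts[0]), int(parts[1]), parts[2]))
        except ValueError:
            continue  # e.g. a heading line with non-numeric fields
    return entries


def to_tsv(entries: Iterable[Entry]) -> str:
    """Render entries as tab-separated text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["rank", "frequency", "word"])
    writer.writerows(entries)
    return buf.getvalue()


# Placeholder lines in the list's format; the counts are invented.
sample = ["1 300 the", "2 200 be", "", "3 100 of"]
entries = parse_list(sample)
alphabetical = sorted(entries, key=lambda e: e.word)  # like sorting a spreadsheet column
print(to_tsv(alphabetical), end="")
```

To run it on the real list, you could save the page as a text file, pass the open file to `parse_list`, and write the `to_tsv` result to a file that most spreadsheets can import as tab-delimited text.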
This list with its related forms can be made available. I also have the first 1,000
words arranged into 20 units of 50 words each, with lots of quizzes and crossword
puzzles and stuff. I also have a Japanese-English glossary of these 1,000 words.
To the actual list of words
Bibliography:
Bauer and Nation, 1995, Word Families, International Journal of Lexicography
Carter and McCarthy, 1988, Vocabulary and Language Teaching, Longman, London
Francis and Kucera, 1982, Frequency Analysis of English Usage, Houghton Mifflin, Boston
Nation, I.S.P., 1990, Teaching and Learning Vocabulary, Newbury House, New York
Thorndike and Lorge, 1944, The Teacher's Word Book of 30,000 Words, Columbia University, New York
West, 1953, A General Service List of English Words, Longman, London