
Survey of Academics: Technical Note on Sampling

by Lindsay Paterson

7 January 2002


 

1. Sample Selection

The sample was to consist of academics working in higher education institutions in Scotland and England, and the target sample size was around 600-700 people in each country. Thus the sampling fraction would have to be larger in Scotland than in England, because Scotland contains about 12.5% of all academics working in Scotland or England (HESA, 2001). The original intention was to follow Halsey (1992) by selecting the sample from the Commonwealth Universities Yearbook (ACU, 2000). Since Halsey conducted his survey in 1989, this had been expanded to include the former polytechnics and colleges, and does provide a fairly thorough coverage of these higher education institutions. Nearly all the Scottish and English institutions recorded by HESA appear in the Yearbook: of staff recorded by HESA in Scotland, 98.8% are in institutions which are listed in the Yearbook; in England, the figure is 94.1%. However, the Yearbook has stopped listing staff below the level of senior lecturer (in the former UFC-funded institutions or Scottish Central Institutions) or principal lecturer (in the former polytechnics in England). So the Yearbook was supplemented by university web sites to construct a two-stage clustered sample.

The aim was to select approximately the same number of departments in Scotland and England, and to select the same number of sample members in each department. This would allow estimation of variance measures at individual, departmental and institutional level. Some experimentation with selecting the Scottish departments suggested that choosing about 100 departments, with 7 staff per department, would give reasonable coverage of the institutions, while retaining a reasonable sample size within department.

The sampling proceeded as follows:

  1. The Yearbook is laid out as a list of staff in departments or other academic administrative units within institutions (referred to as 'departments' below); there are three columns on each page. Institutions, departments within institutions, and staff within departments are listed alphabetically. Departments were chosen by treating the Yearbook as a list of individuals, and selecting those departments corresponding to individuals separated by an approximately fixed selection interval (a standard way of drawing a probability sample: see Hoinville and Jowell (1977, pp. 61-2)). For Scotland, that was all departments which occupied the top left-hand and top right-hand positions on the page; this yielded 101 departments. For England, an initial selection was made of all departments which occupied the top right-hand position, and then a random sub-sample of these was selected (using the random number generator in SPSS) to give a similar number of departments to the Scottish sample (104 in all). So the departments formed the sample clusters, selected with probability approximately proportional to the number of their staff who were listed in the Yearbook. We denote the number of listed staff in department i by m_i.

  2. If the Yearbook had listed all staff, then a self-weighting sample would have been achieved by choosing to interview the same number of individuals from each of the selected departments, but the situation was not so straightforward here because only relatively senior staff were listed. The next step was to find the web pages of the selected departments, and list all staff who appeared there. This step was carried out by Margaret Macpherson, the Leverhulme project administrator. In cases where a department did not have a web page, or where the web page did not list staff, the next department of similar size in the Yearbook was selected instead. Seven academics were selected at random (without replacement) from each of the departmental lists (using the random number generator in SPSS). For departments with fewer than 7 staff, all members were selected (this being necessary in only 6% of departments). In the lists, academics were defined to include lecturing, research, post-doctoral and honorary staff, and to exclude secretarial, administrative and library staff, technicians and PhD students. This yielded 685 staff in Scotland and 694 in England. We denote the number of staff on the web site in department i by n_i, and the number of staff selected into the sample from department i by k_i. (Thus k_i was usually 7.)

In short: the target sample consisted of 205 departments (101 in Scotland and 104 in England) and usually 7 staff per department. These came from 14 institutions in Scotland and 57 in England (which includes 9 constituent institutions of the University of London).
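The two stages described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually run in SPSS: the function names `select_departments` and `select_staff`, the `interval` parameter and the toy listing are all hypothetical, and the sketch leaves out the English sub-sampling step and the substitution of departments that had no usable web page.

```python
import random


def select_departments(staff_list, interval, start=0):
    """Stage 1 (sketch): walk a list of (department, person) entries at a
    fixed interval and keep the department of every entry that is hit.
    Departments with more listed staff are more likely to be hit."""
    selected = []
    for index in range(start, len(staff_list), interval):
        department = staff_list[index][0]
        if department not in selected:
            selected.append(department)
    return selected


def select_staff(web_list, k=7, rng=None):
    """Stage 2 (sketch): draw k people without replacement from a
    department's web list, or take everyone if k or fewer are listed."""
    rng = rng or random.Random()
    if len(web_list) <= k:
        return list(web_list)
    return rng.sample(web_list, k)


# Hypothetical listing: department A has three listed staff, B one, C two.
yearbook = [("A", "a1"), ("A", "a2"), ("A", "a3"),
            ("B", "b1"), ("C", "c1"), ("C", "c2")]
departments = select_departments(yearbook, interval=3)
```

With this toy listing and an interval of 3, entries 0 and 3 are hit, so departments A and B are selected. Passing a seeded `random.Random` instance to `select_staff` makes the second-stage draw reproducible.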

Design weights w_i were then constructed as

w_i = n_i / (m_i k_i),

and all analysis of the data must use them. These weights were derived according to standard rules, and are proportional to the inverse of the probability that any individual member of staff would be selected by the two-stage mechanism described above: see Groves (1989, p. 257). Since k_i is more or less constant, the idea is that the weight is large when n_i/m_i is large, and that happens when a relatively large proportion of the staff in the department are not in the Yearbook. That is, the weighting compensates for the fact that the Yearbook is confined to staff at senior lecturer or equivalent and above.
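As a minimal sketch of how the weight behaves, the formula can be written as a small Python function. The function name `design_weight` and the two example departments are hypothetical. The reasoning behind the formula is that a department is selected with probability roughly proportional to its number of Yearbook-listed staff, and a person within it with probability equal to the number sampled divided by the number on the web list, so the inverse of the product is proportional to the weight.

```python
def design_weight(n_i, m_i, k_i):
    """Weight for every sampled person in department i.

    n_i: staff listed on the department web site
    m_i: staff listed in the Yearbook
    k_i: staff drawn into the sample (usually 7)
    """
    if m_i <= 0 or k_i <= 0:
        raise ValueError("m_i and k_i must be positive")
    return n_i / (m_i * k_i)


# Hypothetical department where most staff appear in the Yearbook.
senior_heavy = design_weight(n_i=10, m_i=8, k_i=7)

# Hypothetical department where few staff appear in the Yearbook.
junior_heavy = design_weight(n_i=30, m_i=4, k_i=7)
```

The second department gets the larger weight because a larger share of its staff is missing from the Yearbook. The sketch does not apply any rescaling of the weights.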

Analysis of the data must also take account of the clustered sampling mechanism. The most straightforward way of doing that is to do any statistical modelling in a multi-level framework, with departments as level 2 and individuals as level 1 (Goldstein, 1995, pp. 97-112).

The extent to which this sampling mechanism worked in the sense of yielding a target sample that was representative is assessed below, in relation to broad subject area of academic discipline and to broad category of institution.

2. Questionnaires

The questionnaires are attached. The initial questionnaire was 10 pages long, in slightly different versions for Scotland and England. As explained below, in section 3, a 2-page version of these was sent to people who had not responded by about two months after the long questionnaire was sent out. The short questionnaire was supplemented with information on 2 variables from the web pages: respondents' gender and grade (e.g. lecturer or professor). For the 105 respondents to the short questionnaire, the web pages did not contain enough information to decide on the gender of 12 people, and the grade of 7 people; moreover, the grades of 16 further people could be specified only as either lecturer or senior lecturer.

3. Fieldwork

The fieldwork was carried out by the Survey Team at Edinburgh University. Questionnaires were identified by code to allow tracking of responses. The long questionnaire was sent out on 22 March 2001 to the 685 target sample members in Scotland and the 694 in England. A postcard reminder was sent on 16 April to people who had not then replied (459 in Scotland and 466 in England), and the long questionnaire was sent out again on 9 May to those who had still not responded (408 in Scotland and 430 in England). The short questionnaire was then sent on 1 June to people who had still not replied (279 in Scotland and 325 in England). The final returns were received in July, but all but 24 (9 in Scotland, 15 in England) of the final achieved sample had been received by 19 June. Only a small number of questionnaires was returned as undelivered or as having been sent to people who were not eligible (see Table 1), which tends to confirm that the web sites showed reasonably up-to-date staff lists.

Table 1 summarises this process. The eventual response rate for the long questionnaire, excluding undelivereds and not eligibles, was 55.4% (58.7% in Scotland and 52.1% in England). Adding in the short questionnaire, the rate was 63.4% (66.4% in Scotland and 60.5% in England). A response rate of this level is respectable for postal surveys, and - for the long questionnaire - is almost the same as that achieved by Halsey in 1989 (55.8%: Halsey (1992, pp. 273-4)). This yielded a total achieved sample of 830 (434 in Scotland and 396 in England). Fifteen individuals had removed the identifying code on the questionnaire, and so could not be assigned to departments. Of those which could be identified, responses came from 97 departments in Scotland and 100 in England (out of the 101 and 104 that had been selected from the Yearbook). These were from all the institutions that had been represented in the target sample: 14 in Scotland and 57 in England.
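The adjusted response rates quoted above can be reproduced from the counts in Table 1. The following sketch assumes that the rate is the number returned divided by the number despatched minus not-eligibles and undelivereds, as the table's last column heading indicates; the function name `response_rate` is hypothetical.

```python
def response_rate(returned, despatched, not_eligible=0, undelivered=0):
    """Percentage returned among questionnaires that reached eligible people."""
    eligible = despatched - not_eligible - undelivered
    if eligible <= 0:
        raise ValueError("no eligible despatches")
    return 100.0 * returned / eligible


# Counts taken from Table 1.
long_scotland = response_rate(384, 685, not_eligible=11, undelivered=20)  # 58.7
long_england = response_rate(341, 694, not_eligible=14, undelivered=25)   # 52.1
all_scotland = response_rate(434, 685, not_eligible=11, undelivered=20)   # 66.4
all_england = response_rate(396, 694, not_eligible=14, undelivered=25)    # 60.5
all_total = response_rate(830, 1379, not_eligible=25, undelivered=45)     # 63.4
```

The values in the comments are rounded to one decimal place.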

When the weights were calculated on the achieved sample of 830, 15 individuals were found to have very large weights (ranging from 4.05 to 10.73). These turned out to be from departments where only one contact person had been listed in the Yearbook. It was decided to omit these 15 cases from the data set, to avoid their being unduly influential on analysis. Unfortunately, that entailed omitting one whole institution in Scotland, where all departments had been listed in the Yearbook in this way; indeed, all but 2 of these 15 were from Scotland. So the usable sample had 421 cases from Scotland and 394 cases from England. The calculated weights ranged from 0.07 to 3.54; the three quartiles were 0.46, 0.67 and 1.08, and so the majority of respondents had weights that differed by a factor of only about 2. This compares quite well with large-scale surveys that use design weights: for example, in the 2000 British Social Attitudes Survey, the weights at the lower and upper quartiles differ by a factor of about 2, and in the first sweep of the British Household Panel Study the cross-sectional individual-level weights at the lower and upper quartiles differ by a factor of 1.2.
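A small sketch of the two checks described above: the ratio between the upper and lower quartile weights, and the removal of cases with very large weights. The note does not state a numerical cut-off, so the `cutoff` argument below is an assumption; any value between 3.54 and 4.05 would separate the retained from the omitted cases. The function names and the example cases are hypothetical.

```python
def interquartile_ratio(lower_quartile, upper_quartile):
    """Factor by which the upper-quartile weight exceeds the lower-quartile weight."""
    return upper_quartile / lower_quartile


def drop_extreme_weights(cases, cutoff):
    """Keep only cases whose weight does not exceed the (assumed) cut-off."""
    return [case for case in cases if case["weight"] <= cutoff]


# Quartiles reported in the text.
ratio = interquartile_ratio(0.46, 1.08)  # about 2.35

# Hypothetical cases using the extreme values reported in the text.
cases = [{"id": 1, "weight": 0.07}, {"id": 2, "weight": 0.67},
         {"id": 3, "weight": 3.54}, {"id": 4, "weight": 4.05},
         {"id": 5, "weight": 10.73}]
usable = drop_extreme_weights(cases, cutoff=4.0)
```

Here `usable` retains the first three cases and drops the two with weights of 4.05 and 10.73.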

4. Representativeness

The representativeness of the achieved sample can be assessed by comparing it with data supplied specially by the Higher Education Statistics Agency (HESA, 2001). The most recent data available from HESA were for session 1999-2000. Some comparisons can be made only for people who responded to the long questionnaire. Where this is the case, it is indicated below; otherwise, comparisons are for the full achieved sample.

There are two sources of unrepresentativeness. One is the selection mechanism (described above): did the two-stage combination of Yearbook and web pages yield a set of departments that was representative? The other source is non-response. For two measures (subject area and type of institution) it is possible to assess the contribution of these separately.

The sample under-represented contract research staff. In both Scotland and England, around one third of staff in the HESA data were classed as 'research only': 34.3% in Scotland, 30.2% in England. Likewise, in Scotland, 31.6% of the population were on research contracts; in England, the proportion was 28.1%. By contrast, only 13.1% of respondents in Scotland described themselves as contract researchers, and 11.7% in England. The same conclusion can be reached from another survey item (for respondents to the long questionnaire only) - a question about the proportion of time spent on research. 91.3% of respondents in Scotland spent less than 90% of their time on research, and so presumably would not fall into the HESA category of 'research only'; the proportion for England was 92.0%. There are two possible explanations of this under-representation of research staff. One is that contract researchers simply responded at a lower rate. The other is that not all contract researchers are listed on university web pages, or that some of them are classified as 'technicians' etc (who were not included in the target sample). On the whole, the latter explanation is the more plausible, because it turns out that in some respects the contract researchers who did respond (41 in Scotland, 35 in England) are reasonably representative of the 'research only' category in the HESA data, as we will see below.

However, because of this, for the remainder of the assessment of representativeness, we consider contract researchers separately. That is, we compare respondents who are not contract researchers with people classified by HESA as on 'teaching and research' or 'teaching only' contracts. We refer to these as the 'teaching and research' group, and the other group as 'researchers'.

The implication for analysis is that it will always be necessary to control for the distinction between contract researchers and others.

Age

For the teaching and research group, the age distribution of the achieved sample is fairly close to the age distribution of the population, as Table 2 shows (long questionnaire only): for example, in Scotland, 47.1% of the sample was aged 45 or younger, compared to 50.3% of the population; the corresponding figures for England were 48.7% and 50.5%. For researchers, the smallness of the sample sizes makes detailed comparison less reliable, but the concentration of this group in ages 45 or under is accurately reflected in the sample: in Scotland, 92.4% in the sample compared to 91.2% in the population; in England, 92.9% compared to 88.9%.

Length of service in present post

The teaching and research group (long questionnaire only) is somewhat biased towards people who have been in their present post for 9 years or fewer (Table 3), but the population pattern is broadly reproduced. The heavy concentration of researchers into this category is very closely reproduced in the sample.

Employment grade

The distribution of employment grades is shown in Table 4. In Scotland, the distribution of grades of teaching and research staff in the sample was close to that in the population: 18.3% of the teaching and research group in the sample were professors, very close to the 17.8% in the population; and 28.4% of the sample were senior lecturers or equivalent, close to the 26.1% in the population. In England, the situation was not so clear because of ambiguity in the interpretation of the term 'senior lecturer' in the institutions where, before 1993, grade levels were governed by the Polytechnic and Colleges Employers' Federation ('senior lecturer' there is taken by HESA to be broadly parallel to part of the lecturer scale in the former UFC-funded institutions). However, for the other institutions (sometimes referred to as the 'old' universities: see below), the grade distribution in the sample was close to that in the population: 17.3% of the sample were professors, compared to 19.2% in the population, and 30.7% of the sample were senior lecturers or equivalent, compared to 27.5% in the population.

Gender

The achieved sample was close to the population in the proportion which was female. For teaching and research staff in Scotland, 28.2% of the sample was female (sample size 372), and 30.4% of the population. In England, the corresponding proportions were 30.8% (sample size 349) and 33.7%. For research staff in Scotland, 38.9% of the sample was female (sample size 40), and 44.2% of the population. In England, these proportions were 43.2% (sample size 34) and 41.5%.

Subject area

There are two ways of assessing the representativeness with respect to subject area:

  • compare the population with the target sample in the achieved departments;
  • compare the population with the achieved sample.

The first of these gives a reasonable indication of the extent to which the sampling mechanism (of Yearbook and web pages) yielded a target sample that resembled the population. The second reflects the effect of the sampling mechanism combined with the effect of differences in response rates.

For the target sample, we categorised the 197 departments that had at least one response into the broad subject groupings shown in Table 5; the distribution there takes account of the fact that a small number of departments had fewer than 7 target sample members. As can be seen, this distribution corresponds quite closely to the population distribution, in both Scotland and England. This is a confirmation that the sampling mechanism worked in the way intended.

However, because the response rates varied by subject area - being relatively high in medicine and the humanities, and relatively low among social scientists - the distribution of the achieved sample had rather too many people in these disciplines, and rather too few in the remainder.

Type of institution

The institutions were classified into the five types shown in Table 6, corresponding broadly to the date at which they became universities or equivalent. In the HESA data, colleges which, according to the Yearbook, are organisationally part of another institution were classified with that institution, since the purpose is to use the era of achieving university status as an indicator of prestige. The four ancient Scottish universities are St Andrews, Glasgow, Aberdeen and Edinburgh. The English redbrick universities are all those founded in the nineteenth and twentieth centuries before 1963; thus the institutions founded or upgraded in the 1950s and the very early 1960s are included in this category. The Robbins universities are those founded or upgraded from 1963 onwards until the mid-1970s, and the post 1992 universities are those upgraded in the 1990s. The 1963 boundary is admittedly somewhat arbitrary, but it was applied consistently to the sample data and to the HESA data.

Again, there are two ways of assessing this, as for broad subject area. The 197 departments quite closely resemble the population distribution, but the response rates were lower in the post-1992 institutions. So the achieved sample, especially in England, under-represents these. The under-representation was even greater among people who were teachers and researchers (defined as in the other tables): among these people, the post-1992 share in Scotland ought to be 31.8%, and yet is 23.8% in the sample; in England, it ought to be 39.6%, but in the sample is 15.5%. Note that Halsey, too, achieved a lower response rate in the then polytechnics (which became the post-1992 universities) than in older universities (Halsey, 1992, p. 273).

This suggests that a further standard control in any modelling ought to be a variable recording the type of institution.

5. Conclusion: implications for statistical analysis

There are four main implications for analysing the data:

  1. All statistical analysis must use the weights, since these are design weights. These are called WTFACTOR in the SPSS data set.
  2. Except with respect to the distinction between contract researchers and others, the sampling mechanism seems to have achieved a broadly representative target sample, but the clustering requires that all analysis be done by multi-level modelling with at least two levels - individuals and departments. The departments are recorded in the variable XDEPT in the SPSS data set.
  3. As well as under-representing contract researchers, the achieved sample also under-represents staff in post-1992 institutions. So all analysis should control for these variables as a matter of routine, even when they are not of substantive interest. In the SPSS data set, the information on contract researchers is in the variable PRESPOST, and the information on post-1992 institutions is in HEITYPE.
  4. Note further that the different sampling fractions in Scotland and England imply that all analysis must also control for country, even when that is not of substantive interest. The information is in the variable COUNTRY in the SPSS data set.
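As a minimal illustration of the first point, a weighted percentage of the kind reported in Tables 2 to 6 could be computed as follows. This is a sketch only: the three records are invented, the `female` field and the function name `weighted_percentage` are hypothetical, and the sketch covers a descriptive percentage rather than the multi-level modelling recommended for inference. Only the variable name WTFACTOR comes from the data set described above.

```python
def weighted_percentage(records, weight_key, predicate):
    """Weighted share (in %) of records for which predicate(record) is true."""
    total = sum(record[weight_key] for record in records)
    if total == 0:
        raise ValueError("weights sum to zero")
    matching = sum(record[weight_key] for record in records if predicate(record))
    return 100.0 * matching / total


# Invented records; WTFACTOR holds the design weight.
records = [
    {"WTFACTOR": 0.5, "female": True},
    {"WTFACTOR": 1.5, "female": False},
    {"WTFACTOR": 1.0, "female": True},
]
share = weighted_percentage(records, "WTFACTOR", lambda record: record["female"])
```

In this toy example the weighted share is 50%, whereas the unweighted share would be two out of three, which shows how much the design weights can move a simple percentage.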

Acknowledgement

The population data were obtained from the Individualised Staff Records 1999-2000 supplied by the Higher Education Statistics Agency. HESA cannot accept responsibility for any inferences or conclusions derived from the data. Funding for the research came from the Leverhulme Foundation as part of its Nations and Regions research programme.

References

Association of Commonwealth Universities (2000), Commonwealth Universities Yearbook 2000, London: ACU.

Goldstein, H. (1995), Multilevel Statistical Models, London: Arnold.

Groves, R. M. (1989), Survey Errors and Survey Costs, New York: Wiley.

Halsey, A. H. (1992), The Decline of Donnish Dominion, Oxford: Clarendon.

Higher Education Statistics Agency (2001), HESA Individualised Staff Record, 1999-2000, Cheltenham: HESA.

Hoinville, G. and Jowell, R. (1977), Survey Research Practice, Aldershot: Gower.

Tables


Table 1: Response to questionnaire

 

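| questionnaire | country | despatched | returned | % of despatch | not eligible (NE) | undelivered (UD) | refusals | % of despatch minus NE and UD |
|---|---|---|---|---|---|---|---|---|
| long | Scotland | 685 | 384 | 56.1 | 11 | 20 | 8 | 58.7 |
| long | England | 694 | 341 | 49.1 | 14 | 25 | 7 | 52.1 |
| long | Total | 1379 | 725 | 52.6 | 25 | 45 | 15 | 55.4 |
| short | Scotland | 279 | 50 | 17.9 | | | | |
| short | England | 325 | 55 | 16.9 | | | | |
| short | Total | 604 | 105 | 17.4 | | | | |
| all | Scotland | 685 | 434 | 63.4 | | | | 66.4 |
| all | England | 694 | 396 | 57.1 | | | | 60.5 |
| all | Total | 1379 | 830 | 60.2 | | | | 63.4 |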



Table 2: Age

 

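| age | Scotland, teaching and research*: sample | Scotland, teaching and research*: pop. | Scotland, research**: sample | Scotland, research**: pop. | England, teaching and research*: sample | England, teaching and research*: pop. | England, research**: sample | England, research**: pop. |
|---|---|---|---|---|---|---|---|---|
| 25 or under | 1.5 | 1.3 | 7.4 | 10.9 | 0.6 | 0.8 | 3.7 | 10.5 |
| 26-35 | 11.1 | 18.4 | 50.0 | 60.3 | 19.6 | 18.7 | 36.1 | 57.4 |
| 36-45 | 34.5 | 30.6 | 35.0 | 20.0 | 28.5 | 31.0 | 53.1 | 21.0 |
| 46-55 | 35.0 | 33.7 | 6.0 | 7.0 | 29.4 | 34.4 | 4.3 | 8.1 |
| 56-65 | 16.2 | 15.8 | 1.6 | 1.7 | 20.7 | 14.2 | 0.0 | 2.7 |
| 66 or over | 1.9 | 0.3 | 0.0 | 0.2 | 1.1 | 0.1 | 2.8 | 0.1 |
| sample size | 331 | | 39 | | 308 | | 31 | |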

  • Long questionnaire only.
  • Sample data are weighted; sample sizes are unweighted.
  • Source of population data: HESA Individualised Staff Record, 1999-2000.

* Teaching and research is a category of 'principal employment function' in the HESA data. In the sample, it is all respondents other than contract researchers.

** Research is a category of 'principal employment function' in the HESA data. In the sample, it is contract researchers.


Table 3: Length of time in present post

 

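| number of years | Scotland, teaching and research*: sample | Scotland, teaching and research*: pop. | Scotland, research**: sample | Scotland, research**: pop. | England, teaching and research*: sample | England, teaching and research*: pop. | England, research**: sample | England, research**: pop. |
|---|---|---|---|---|---|---|---|---|
| 9 or under | 67.1 | 53.6 | 92.4 | 92.5 | 66.9 | 58.6 | 97.2 | 92.7 |
| 10-19 | 22.8 | 22.0 | 5.6 | 6.2 | 19.3 | 21.8 | 0.0 | 5.5 |
| 20-29 | 7.2 | 15.6 | 2.0 | 1.1 | 10.1 | 13.1 | 0.0 | 1.3 |
| 30-39 | 2.7 | 8.5 | 0.0 | 0.3 | 3.7 | 6.4 | 0.0 | 0.5 |
| 40 or over | 0.2 | 0.2 | 0.0 | 0.0 | 0.0 | 0.2 | 2.8 | 0.0 |
| sample size | 330 | | 39 | | 307 | | 31 | |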

  • Long questionnaire only.
  • Sample data are weighted; sample sizes are unweighted.
  • Source of population data: HESA Individualised Staff Record, 1999-2000.

* Teaching and research is a category of 'principal employment function' in the HESA data. In the sample, it is all respondents other than contract researchers.

** Research is a category of 'principal employment function' in the HESA data. In the sample, it is contract researchers.


Table 4: Grade of present post

 

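| grade | Scotland: sample | Scotland: pop. | England*: sample | England*: pop. |
|---|---|---|---|---|
| lecturer | 40.5 | 46.5 | 37.9 | 39.3 |
| senior lecturer, reader etc | 28.4 | 26.1 | 30.7 | 27.5 |
| professor | 18.3 | 17.8 | 17.3 | 19.2 |
| other | 12.8 | 9.6 | 14.2 | 14.0 |
| sample size | 375 | | 263 | |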

  • Respondents on teaching and research or teaching only contracts  (see footnote to Table 2).
  • Sample data are weighted; sample sizes are unweighted.
  • Source of population data: HESA Individualised Staff Record, 1999-2000.

* In England, excludes staff in the post-1992 institutions: see  text.


Table 5: Broad subject area of present post

 

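| broad subject area* | Scotland: achieved sample | Scotland: response rate (%) | Scotland: target sample** | Scotland: pop. | England: achieved sample | England: response rate (%) | England: target sample** | England: pop. |
|---|---|---|---|---|---|---|---|---|
| medicine | 19.8 | 66 | 19.7 | 14.6 | 23.8 | 59 | 17.5 | 11.9 |
| science | 18.3 | 58 | 25.0 | 26.7 | 23.9 | 55 | 24.6 | 23.4 |
| engineering | 8.9 | 60 | 9.5 | 10.1 | 13.5 | 64 | 11.2 | 8.7 |
| social science | 18.1 | 61 | 25.5 | 24.6 | 13.9 | 52 | 28.6 | 28.2 |
| humanities | 28.0 | 66 | 18.5 | 16.1 | 20.0 | 59 | 14.0 | 18.9 |
| other | 6.9 | 92 | 1.8 | 7.9 | 4.9 | 50 | 4.1 | 8.9 |
| sample size | 331 | | | | 308 | | | |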

  • Respondents on teaching and research or teaching only contracts  (see footnote to Table 2).
  • Sample data are weighted; sample sizes are unweighted.
  • Source of population data: HESA Individualised Staff Record, 1999-2000.

* 'medicine' is medicine and subjects allied to medicine; 'science' is science, agriculture and mathematical sciences; 'engineering' is engineering sciences and architecture; 'social science' is social sciences, business and education; 'humanities' is humanities and arts.

** The distribution of the target sample members in those departments from which at least one response was obtained (97 in Scotland and 100 in England).


Table 6: Broad type of institution

 

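| broad institution type* | Scotland: achieved sample | Scotland: response rate (%) | Scotland: target sample** | Scotland: pop. | England: achieved sample | England: response rate (%) | England: target sample** | England: pop. |
|---|---|---|---|---|---|---|---|---|
| ancient Scottish | 44.2 | 63 | 46.2 | 49.7 | - | - | - | - |
| Oxford, Cambridge | - | - | - | - | 14.3 | 56 | 13.8 | 7.7 |
| English redbrick | - | - | - | - | 49.9 | 56 | 49.6 | 42.8 |
| Robbins | 34.5 | 63 | 33.9 | 26.3 | 20.4 | 61 | 11.7 | 12.9 |
| post-1992 | 21.4 | 59 | 19.9 | 23.9 | 15.3 | 54 | 24.9 | 30.6 |
| sample size | 416 | | | | 386 | | | |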

 
  • Sample data are weighted; sample sizes are unweighted.
  • Source of population data: HESA Individualised Staff Record, 1999-2000, omitting staff who were in institutions that were  not listed in the Commonwealth Universities Yearbook: 1.2% of all HESA-listed staff in Scotland and 5.9% of all HESA-listed staff in England.

* See text for definitions.

** The distribution of the target sample members in those departments from which at least one response was obtained (97 in Scotland and 100 in England).

(Published Online: 10 September 2002)

 
