Section 7.4
7 Standard Algorithms
7.4 External Sorting
External sorts are used when the volume of data is so great that it cannot be
held in memory.
7.4.1 Merge Sort
This algorithm takes a serial file and produces a sequential file.
It has temporary files: file A, file B, file C, file D.
At any one time, two of these files are 'transmitting' records and the other
two are 'receiving' records.
Example
The serial file contains records with the key values:
Serial File
|
23
|
16
|
57
|
43
|
90
|
13
|
29
|
75
|
36
|
25
|
41
|
82
|
19
|
The records are written out alternatively:
File A
|
23
|
57
|
90
|
29
|
36
|
41
|
19
|
|
|
|
|
|
|
File B
|
16
|
43
|
13
|
75
|
25
|
82
|
|
|
|
|
|
|
|
Files A & B 'transmit' record to files C & D.
Each pair of records are merged to be in sequence.
File C
|
16
|
23
|
13
|
90
|
25
|
36
|
19
|
|
|
|
|
|
|
File D
|
43
|
57
|
29
|
75
|
41
|
82
|
|
|
|
|
|
|
|
File C and File D hold sequences of records with a minimum length of two
records.
A second pass starts and File C and File D now 'transmit' their records to
File A and File B.
File A
|
(16
|
23
|
47
|
57)
|
(25
|
41
|
82)
|
|
|
|
|
|
|
File B
|
(13
|
29
|
75
|
90)
|
(19)
|
|
|
|
|
|
|
|
|
On the third pass:
File C
|
(13
|
16
|
29
|
43
|
57
|
75
|
90)
|
|
|
|
|
|
|
File D
|
(19
|
25
|
36
|
41
|
82)
|
|
|
|
|
|
|
|
|
On the final pass:
File C
|
13
|
16
|
19
|
23
|
25
|
29
|
36
|
41
|
43
|
57
|
75
|
82
|
90
|
7.4.2 Classic Four Tape Merge Sort
This is a better version of the basic merge sort. The main difference lies in
the initialisation step.
Serial File
|
23
|
16
|
57
|
43
|
90
|
13
|
29
|
75
|
36
|
25
|
41
|
82
|
19
|
The records are moved in groups of ascending key field values to File A and File
B.
File A
|
(23)
|
(43
|
90)
|
(36)
|
(19)
|
|
|
|
|
|
|
|
|
File B
|
(16
|
57)
|
(13
|
29
|
75)
|
(25
|
41
|
82)
|
|
|
|
|
|
These values are 'transmitted' to files C and D.
File C
|
(16
|
23
|
57)
|
(25
|
36
|
41
|
82)
|
|
|
|
|
|
|
File D
|
(13
|
29
|
43
|
75
|
90)
|
(19)
|
|
|
|
|
|
|
|
On the second pass:
File A
|
(13
|
16
|
23
|
29
|
43
|
57
|
75
|
90)
|
|
|
|
|
|
File B
|
(19
|
25
|
36
|
41
|
82)
|
|
|
|
|
|
|
|
|
On the final pass:
File C
|
13
|
16
|
19
|
23
|
25
|
29
|
36
|
41
|
43
|
57
|
75
|
82
|
90
|
This algorithm took 3 passes whereas the simple merge sort took four passes.