SICP 2.4 Multiple Representations for Abstract Data
2023-09-27 Wed
Data Abstraction separates the use from the implementation, the
interface from the representation. A Scheme programmer, for
example, usually operates at a level of abstraction such that she
doesn't have to worry about how car
, cdr
, and cons
are
implemented. She just has to know how they behave. She just has to
know that cons
takes two entities and creates a further entity — a
pair — the application of car
to which returns the first entity
and the application of cdr
to which returns the second entity. When
we use pairs, their implementation remains in the shadows1. The programmer knows how to use pairs, but he doesn't
necessarily have to know how pairs are represented.
Section 2.1.1 showed an example of this kind of ``abstractions barriers''. There we saw ``how to separate the task of designing a program that uses rational numbers from the task of implementing rational numbers. The ``abstractions barriers'' we are talking about can be thought of as horizontal barriers which are present at different levels (see Figure 2.1). In that specific example, there are the following barriers:
cons
,car
,cdr
;make-rat
,numer
,denom
;add-rat
,sub-rat
, etc.;- programs that use rational numbers.
Despite the benefits that those barriers provide, those barriers are not enough. They are not enough, because there might be more than one useful representation for a certain data object and we might want to use those different representations together in our system. Complex numbers offer a more-or-less-toy example. Complex numbers can be represented in the so-called ``rectangular'' form — which is in terms of a real part and an imaginary part —, or in the so-called ``polar'' form — which is in terms of a magnitude and an angle. Each of those two representations can be, depending on the circumstances, more appropriate than the other.
We need:
- not only: ``data-abstraction barriers'' that isolate use from representation, interface from implementation,
- but also: abstraction barriers that isolate different representations for the same data object and allow those different design choices to coexist.
- moreover: we need to able to add a certain representation to a system additively, that is, without having to re-design or re-implement it.
We can think of those additional barriers as vertical barriers:
*Figure 2.19:* Data-abstraction barriers in the complex-number system. Programs that use complex numbers +-------------------------------------------------+ --| add-complex sub-complex mul-complex div-complex |-- +-------------------------------------------------+ Complex arithmetic package ---------------------------+--------------------------- Rectangular | Polar representation | representation ---------------------------+--------------------------- List structure and primitive machine arithmetic [Figure from SICP Unofficial Texinfo Format version 2.neilvandyke4 (January 10, 2007)]
Vertical barriers can be built through the usage of generic procedures, ``procedures that can operate on data that may be represented in more than one way''. We will be able to write generic procedures thanks to the usage of type tags. To achieve the ability to add representation to a certain system additively we will use the technique of data-directed programming. We will also briefly look at an alternative to data-directed programming called message passing.
2.4.1 Representations for Complex Numbers
We want a complex number arithmetic system. Basically we want four operations that work with complex numbers: addition, subtraction, multiplication, and division.
Using a data-abstraction strategy — exactly what we did with rational number) —, we can just assume that we have the following selectors and constructors:
- Selectors:
real-part
;imag-part
;magnitude
;angle
.
- Constructors:
make-from-real-imag
;make-from-mag-ang
.
The selectors and constructors above are our abstract data. We can specify operations on complex numbers in terms of that abstract data:
add-complex
:(define (add-complex z1 z2) (make-from-real-imag (+ (real-part z1) (real-part z2)) (+ (imag-part z1) (imag-part z2))))
sub-complex
:(define (sub-complex z1 z2) (make-from-real-imag (- (real-part z1) (real-part z2)) (- (imag-part z1) (imag-part z2))))
mul-complex
:(define (mul-complex z1 z2) (make-from-mag-ang (* (magnitude z1) (magnitude z2)) (+ (angle z1) (angle z2))))
div-complex
:(define (div-complex z1 z2) (make-from-mag-ang (/ (magnitude z1) (magnitude z2)) (- (angle z1) (angle z2))))
Now that we have the complex number arithmetic operations, we need somebody to implement a complex number representation. Both Ben and Alyssa want to do it and, for some reason — pick your favorite reason —, we are forced to use both representations in our system.
Here is what Ben does:
;; Ben's representation (define (real-part z) (car z)) (define (imag-part z) (cdr z)) (define (magnitude z) (sqrt (+ (square (real-part z)) (square (imag-part z))))) (define (angle z) (atan (imag-part z) (real-part z))) (define (make-from-real-imag x y) (cons x y)) (define (make-from-mag-ang r a) (cons (* r (cos a)) (* r (sin a))))
And here is what Alyssa does:
;; Alyssa's representation (define (real-part z) (* (magnitude z) (cos (angle z)))) (define (imag-part z) (* (magnitude z) (sin (angle z)))) (define (magnitude z) (car z)) (define (angle z) (cdr z)) (define (make-from-real-imag x y) (cons (sqrt (+ (square x) (square y))) (atan y x))) (define (make-from-mag-ang r a) (cons r a))
Ben has implemented what can be called a ``rectangular'' representation (in which a complex number is represented as a pair of a real-part and an imaginary part), whereas Alyssa has implemented what can be called a ``polar'' representation (in which a complex number is represented as a pair of a magnitude and an angle). The selectors and the constructors they have created have the same name, operate differently underneath the hood.
Given that the operations add-complex
, sub-complex
, mul-complex
,
and div-complex
are implemented in terms of abstract data, choosing
Ben's representation over Alyssa's, or vice versa, would make no
difference: those operations would work in both cases.
2.4.2 Tagged data
Now, what if don't want to choose one representation over the other? What if we want to keep both representations? What if we want a system that looks like that figure 2.19?
If our system has to include multiple representations for the same object type, then we need some way to distinguish objects with representation foo from objects with representation bar. A simple way to do that is tagging the objects. To tag and check the tags we can do something like this:
(define (attach-tag type-tag contents) (cons type-tag contents)) (define (type-tag datum) (if (pair? datum) (car datum) (error "Bad tagged datum -- TYPE-TAG" datum))) (define (contents datum) (if (pair? datum) (cdr datum) (error "Bad tagged datum -- CONTENTS" datum))) (define (rectangular? z) (eq? (type-tag z) 'rectangular)) (define (polar? z) (eq? (type-tag z) 'polar))
If Ben and Alyssa have a designed their representation packages separately, what would they have to do to exist compatibly in the system? Here is what they can do. Ben can write his representation in this way:
(define (real-part-rectangular z) (car z)) (define (imag-part-rectangular z) (cdr z)) (define (magnitude-rectangular z) (sqrt (+ (square (real-part-rectangular z)) (square (imag-part-rectangular z))))) (define (angle-rectangular z) (atan (imag-part-rectangular z) (real-part-rectangular z))) (define (make-from-real-imag-rectangular x y) (attach-tag 'rectangular (cons x y))) (define (make-from-mag-ang-rectangular r a) (attach-tag 'rectangular (cons (* r (cos a)) (* r (sin a)))))
And Alyssa can write her representation in this way:
(define (real-part-polar z) (* (magnitude-polar z) (cos (angle-polar z)))) (define (imag-part-polar z) (* (magnitude-polar z) (sin (angle-polar z)))) (define (magnitude-polar z) (car z)) (define (angle-polar z) (cdr z)) (define (make-from-real-imag-polar x y) (attach-tag 'polar (cons (sqrt (+ (square x) (square y))) (atan y x)))) (define (make-from-mag-ang-polar r a) (attach-tag 'polar (cons r a)))
These ``packages'' differ from the original packages in two respects:
- the constructor is now tagging the objects it creates;
- the function names have been modified in order to avoid name
conflicts (for example, in Ben's representation,
real-part
has becomereal-part-rectangular
).
Now that we have been given typed data, we need somebody, say a ``manager'' (see lecture), that looks at those types and make things work. The manager can now write generic selectors:
(define (real-part z) (cond ((rectangular? z) (real-part-rectangular (contents z))) ((polar? z) (real-part-polar (contents z))) (else (error "Unknown type -- REAL-PART" z)))) (define (imag-part z) (cond ((rectangular? z) (imag-part-rectangular (contents z))) ((polar? z) (imag-part-polar (contents z))) (else (error "Unknown type -- IMAG-PART" z)))) (define (magnitude z) (cond ((rectangular? z) (magnitude-rectangular (contents z))) ((polar? z) (magnitude-polar (contents z))) (else (error "Unknown type -- MAGNITUDE" z)))) (define (angle z) (cond ((rectangular? z) (angle-rectangular (contents z))) ((polar? z) (angle-polar (contents z))) (else (error "Unknown type -- ANGLE" z))))
This strategy is called dispatch on type. We can think of the system as having three parts: Ben, Alyssa, and the manager. The idea is that you break your system into a bunch of pieces. There is Ben and Alyssa, who are making representations, and then there is the manager, who looks at the types on the data and dispatches tasks to the right person.
2.4.3 Data-Directed Programming and Additivity
How can we criticize the system described above? First, Ben and Alyssa had to change the names of their procedures. Second, when somebody wants to add a representation into the system, even if Ben and Alyssa don't care, whoever is in charge of the generic selectors — the manager, in our narrative —, has to change all of them!
(Roughly, from the lecture:) ``The inflexibility in the system, the place where work has to be done to accommodate is in the manager. That's quite annoying. It's even more annoying when you think that the manager isn't really doing anything… the bottleneck in the system is in the bureaucracy…''
Now, abstractly, in the system, there is a table:
polar | rectangular | |
---|---|---|
real-part | real-part-polar | real-part-rectangular |
imag-part | imag-part-polar | imag-part-rectangular |
magnitude | magnitude-polar | magnitude-rectangular |
angle | angle-polar | angle-rectangular |
In the table we have the right procedures for the given types (columns) and operators (rows). ``That's really what's going on. In some sense, all the manager is doing is acting as this table''.
How do we fix our system? We get rid of the manager. We let our system use the table directly. How do we do that? Let's assume — again, data abstraction! — that we have some sort of table data structure in which we can put things:
`(put <OP> <TYPE> <ITEM>)' installs the `<ITEM>' in the table, indexed by the `<OP>' and the `<TYPE>'. `(get <OP> <TYPE>)' looks up the `<OP>', `<TYPE>' entry in the table and returns the item found there. If no item is found, `get' returns false.
Given that we have this table, now Ben and Alyssa, when they build their systems, can fill the table appropriately:
Ben:
(define (install-rectangular-package) ;; internal procedures (define (real-part z) (car z)) (define (imag-part z) (cdr z)) (define (make-from-real-imag x y) (cons x y)) (define (magnitude z) (sqrt (+ (square (real-part z)) (square (imag-part z))))) (define (angle z) (atan (imag-part z) (real-part z))) (define (make-from-mag-ang r a) (cons (* r (cos a)) (* r (sin a)))) ;; interface to the rest of the system (define (tag x) (attach-tag 'rectangular x)) (put 'real-part '(rectangular) real-part) (put 'imag-part '(rectangular) imag-part) (put 'magnitude '(rectangular) magnitude) (put 'angle '(rectangular) angle) (put 'make-from-real-imag 'rectangular (lambda (x y) (tag (make-from-real-imag x y)))) (put 'make-from-mag-ang 'rectangular (lambda (r a) (tag (make-from-mag-ang r a)))) 'done)
Alyssa:
(define (install-polar-package) ;; internal procedures (define (magnitude z) (car z)) (define (angle z) (cdr z)) (define (make-from-mag-ang r a) (cons r a)) (define (real-part z) (* (magnitude z) (cos (angle z)))) (define (imag-part z) (* (magnitude z) (sin (angle z)))) (define (make-from-real-imag x y) (cons (sqrt (+ (square x) (square y))) (atan y x))) ;; interface to the rest of the system (define (tag x) (attach-tag 'polar x)) (put 'real-part '(polar) real-part) (put 'imag-part '(polar) imag-part) (put 'magnitude '(polar) magnitude) (put 'angle '(polar) angle) (put 'make-from-real-imag 'polar (lambda (x y) (tag (make-from-real-imag x y)))) (put 'make-from-mag-ang 'polar (lambda (r a) (tag (make-from-mag-ang r a)))) 'done)
Who makes a representation has the responsibility of setting up a
column in the table. The manager has been ``automated out of
existence''. The manager is replaced by a procedure. This procedure is
called apply-generic
and is the key procedure in the whole system
(see also its simpler version operate
shown in the lecture).
(define (apply-generic op . args) (let ((type-tags (map type-tag args))) (let ((proc (get op type-tags))) (if proc (apply proc (map contents args)) (error "No method for these types -- APPLY-GENERIC" (list op type-tags))))))
We use apply-generic
to define the generic selectors:
(define (real-part z) (apply-generic 'real-part z)) (define (imag-part z) (apply-generic 'imag-part z)) (define (magnitude z) (apply-generic 'magnitude z)) (define (angle z) (apply-generic 'angle z))
This strategy is called ``data-directed programming'', because, ``in some sense the data objects themselves are carrying with them the information about how you should operate on them''.
Exercise 2.74
Exercise:
Insatiable Enterprises, Inc., is a highly decentralized conglomerate company consisting of a large number of independent divisions located all over the world. The company's computer facilities have just been interconnected by means of a clever network-interfacing scheme that makes the entire network appear to any user to be a single computer. Insatiable's president, in her first attempt to exploit the ability of the network to extract administrative information from division files, is dismayed to discover that, although all the division files have been implemented as data structures in Scheme, the particular data structure used varies from division to division. A meeting of division managers is hastily called to search for a strategy to integrate the files that will satisfy headquarters' needs while preserving the existing autonomy of the divisions.
Show how such a strategy can be implemented with data-directed programming. As an example, suppose that each division's personnel records consist of a single file, which contains a set of records keyed on employees' names. The structure of the set varies from division to division. Furthermore, each employee's record is itself a set (structured differently from division to division) that contains information keyed under identifiers such as
address
andsalary
. In particular:a. Implement for headquarters a
get-record
procedure that retrieves a specified employee's record from a specified personnel file. The procedure should be applicable to any division's file. Explain how the individual divisions' files should be structured. In particular, what type information must be supplied?b. Implement for headquarters a
get-salary
procedure that returns the salary information from a given employee's record from any division's personnel file. How should the record be structured in order to make this operation work?c. Implement for headquarters a
find-employee-record
procedure. This should search all the divisions' files for the record of a given employee and return the record. Assume that this procedure takes as arguments an employee's name and a list of all the divisions' files.d. When Insatiable takes over a new company, what changes must be made in order to incorporate the new personnel information into the central system?
Answer:
a.
A generic get-record
procedure could be implemented as follows:
(define (get-record personnel-file name) (apply-generic 'get-record personnel-file name))
Division1 (and, analogously, other divisions) could provide its package like that:
(define (install-division1-package) ;; internal procedures (define (get-record file employee-name) ;; ... ) ;; constructors ;; ... ;; interface to the rest of the system (define (tag-file x) (attach-tag 'division1 x)) (define (tag-employee-name x) (attach-tag 'employee-name-division1 x)) (define (tag-employee x) (attach-tag 'employee-division1 x)) (put 'get-record '(division1 employee-name-division1) get-record))
[Should employee-name1 be typed data? Couldn't it be just a non-typed string]
b.
Headquarters could use this procedure:
(define (get-salary employee-record) (apply-generic 'get-salary employee-record))
Division1 (and, analogously, the other divisions) could provide their
version of get-salary
procedures in the following ways:
(define (install-division1-package) ;;... (define (get-salary employee-name) ;; ) ;... ;; interface to the rest of the system ;; .. (put 'get-salary '(employee-division1) get-salary))
c
(define (find-employee-record name divisions) (if (null? divisions) nil (or (get-record name (car divisions)) (find-employee-record name (cdr divisions)))))
d
Somebody has fill the table with the relevant new methods.
get-record
, get-salary
, and find-employee-record
remain the way
they are.
Message Passing
Data-directed programming makes use of a operation-and-type table.
The first method we have seen — generic operations with explicit dispatch — can be thought of as decomposing that table ``into rows, with each generic operation procedure representing a row of the table.''
There is at least a third alternative, known as ``message passing'', which consists in decomposing the table into type columns only:
instead of using "intelligent operations" that dispatch on data types, to work with "intelligent data objects" that dispatch on operation names.
Here is how we could write make-from-real-imag
following such an
approach:
(define (make-from-real-imag x y) (define (dispatch op) (cond ((eq? op 'real-part) x) ((eq? op 'imag-part) y) ((eq? op 'magnitude) (sqrt (+ (square x) (square y)))) ((eq? op 'angle) (atan y x)) (else (error "Unknown op -- MAKE-FROM-REAL-IMAG" op)))) dispatch)
And here is how apply-generic
would look like:
(define (apply-generic op arg) (arg op))
The name comes from the image that a data object is an entity that receives the requested operation name as a "message."
Exercise 2.75
Exercise:
Implement the constructor `make-from-mag-ang' in message-passing style. This procedure should be analogous to the `make-from-real-imag' procedure given above.
Answer:
(define (make-from-mag-ang r a) (define (dispatch op) (cond ((eq? op 'real-part) (* r (cos a))) ((eq? op 'imag-part) (* r (sin a))) ((eq? op 'magnitude) r) ((eq? op 'angle) a) (else (error "Unknown op - MAKE-FROM-MAG-ANG" op)))) dispatch)
Exercise 2.76
Exercise:
As a large system with generic operations evolves, new types of data objects or new operations may be needed. For each of the three strategies — generic operations with explicit dispatch, data-directed style, and message-passing-style — describe the changes that must be made to a system in order to add new types or new operations. Which organization would be most appropriate for a system in which new types must often be added? Which would be most appropriate for a system in which new operations must often be added?
Answer:
Generic operations with explicit dispatch
- Adding a new type
Let's imagine Henry wants to add his representation to the complex number arithmetic system.
Somebody (the ``manager''…) has to change all generic operators.
real-part
,imag-part
,magnitude
,angle
, now need an additonal check for Henry's representation. Moreover, Henry should make sure that the names of his procedures don't conflict with those used by Ben and Alyssa. - Adding a new operation
Suppose we need a
get-foo
operation.All of those in charge of maintaining representation (Ben, Alyssa, and Henry) have to write a method that performs the right operation when their their representation is used. And, again, name conflict is to be avoided.
Moreover, somebody (the ``manager''…) has to write a generic
get-foo
operation. In order to do so, he has to know all the names of the procedures written by Ben, Alyssa, and Henry.
Data-directed Programming
- Adding a new type
Again, let's imagine that Henry wants to add his representation to the complex number arithmetic system we already have.
Henry, in data-directed system, has to ``install'' his packaged, that is, he has to put his procedures into the table.
By following the method shown at p. ? he doesn't have to worry about name conflict (That method exploits scope. There are other ways as well; for example, putting lambdas into the table; see the video lecture. The table stores objects, not names).
There is now no work for the ``manager''. The manager has been ``automated out of existence'' (see the lecture). The generic operations we already have will do the right thing thanks to
apply-generic
(or its simpler versionoperate
shown in the lecture). - Adding a new operation
Again, suppose we need a
get-foo
operation.Somebody has to write a generic
get-foo
.However, unlike with generic operations with explicit dispatch, now we don't need to know anything besides the name of the operation we want to add.
Each package maintainer will have the responsibility to add their version of
get-foo
to the table. However, in case one doesn't and we try toget-foo
to complex number of their type, the generic operation will show a suitable error message.
Message passing
- Adding a new type
Again, let's imagine that Henry wants to add his representation to the complex number arithmetic system we already have.
Henry just as to add a new constructor. The situation is somewhat analogous to the data-directed programming case.
- Adding a new operation
Again, suppose we need a
get-foo
operation.Somebody has to write the generic operation
get-foo
, like in data-directed programming.Moreover, each ``package maintainers'' will have to add their version of
get-foo
in thedispatch
procedure. In this case as well, if somebody forgets to add the relevant procedure, an error message is printed when we applyget-foo
on the relevant object.
Conclusions
Which organization would be most appropriate for a system in which new types must often be added?
Either data-directed programming or message passing. As far as I can now see, they are both superior, in this respect, to generic operations with explicit dispatch, but none of them is obviously better than the other.
Which would be most appropriate for a system in which new operations must often be added?
I would give the same answer: either data-directed programming or message passing. As far as I can now see, they are both superior, in this respect, to generic operations with explicit dispatch, but none of them is obviously better than the other.
Footnotes:
A cool way to represent pairs using procedures only is shown on page 91.