Umbriel - imperative programming for unsophisticated students

P.D. Terry,

Department of Computer Science,
Rhodes University,
GRAHAMSTOWN 6140, South Africa

p.terry@ru.ac.za

(This article was published in ACM SIGCSE Bulletin, 20(3),7-14 (1995))

Abstract

This article discusses an experiment in designing and using Umbriel, a minimal imperative programming language in the Pascal tradition, for teaching the rudiments of programming in situations where the overwhelming complexities of many modern language implementations have become intolerable.

Introduction

The debate about how best to teach computer programming to novices shows no signs of diminishing; rather the opposite. Some have dared to suggest that it should not be attempted at all (Soloway, 1993), others spend considerable effort in searching for the perfect language and the perfect paradigm (see Conway (1993) and Decker and Hirshfield (1993, 1994) for some typical recent surveys).

Of one thing we have become convinced: given that one feels some experience in imperative programming is still necessary, the trend for systems and programming languages to become ever more complex has had a devastating effect on the teaching and learning of algorithmic skills. Languages like Pascal are now implemented with a host of features (and relaxations of run time checking) that were never part of the original designers' plan, causing them to lose the intrinsic simplicity that made them so suited to teaching the fundamental skills of algorithm design and implementation. Worse still, current fashion dictates that only languages like C and C++ are worth learning, even thought their complexity is quite terrifying (Mody, 1991; Sakkinen, 1992). Students find an enormous, disproportionate, amount of effort goes into understanding the quirks of syntax, scope and linkage, the exceptions rather than the rules, and the foibles of the compiler; into trying to find the procedure that they need from the thousands that are listed in the run-time libraries; into fighting linkers that complain about incompatible versions of object code; into being sucked into using sophisticated debuggers that provide too much, rather than too little information, and so on. Faculty who have been using ornate systems for some time manage, with some difficulty, to keep abreast of the new additions and glitz. There is a danger that they start to believe that their students can instantly gain the same maturity. This is just not true, even though the most vocal part of any class may be the minority that have had some exposure to computing before embarking on university studies.

For second-language students, who now make up a significant proportion of our own first-year classes, and who often come from educationally deprived backgrounds, the total effect is known to be quite overwhelming; most of them in recent years have simply not survived beyond the first few weeks.

Various interesting attempts have been made to handle these problems. One such, an attempt to provide a free-format algorithmic language for students in a similar situation is described by Pyott and Sanders (1991). At Rhodes University, where we have successfully used Modula-2 as the language of choice for some years over major portions of our curriculum, 1994 saw the Umbriel project undertaken. This represents an attempt to cut right back to using a very simple system for a foundation course that would satisfy the following goals:

It might be thought that these goals could be achieved in a more traditional way - by simply teaching a subset of a full language, and only using a subset of the features available in a full implementation. However, as already intimated, the problems of successfully hiding unwanted complexity in the hope that weaker students would never discover it, fall foul of it, or be encouraged by more able colleagues to be beguiled by it, seemed to demand a more draconian approach.

The programming course

Nothing particularly original can be claimed for the content of the programming course. Past experience had suggested that there are five aspects of elementary programming in Algol-like languages that seem to cause more than the predictable difficulties to beginners, namely

It was hoped that by careful attention to language design, the first three of these could finally be tamed. Since the array concept was already familiar from exposure to spreadsheets, no particular precautions were taken in that regard; pointers were, understandably, simply left right out of the syllabus.

Several other features distinguished the course from previous attempts to teach programming to second-language speakers, few of whom had ever met with any real degree of success:

Umbriel - the programming language

A fuller description of Umbriel can be found elsewhere (Terry, 1995). It was modelled on Modula-2 (rather than Pascal) not only to allow for easy transition to the language the students would use in later courses, but also to take advantage of the improvements that came about when Modula-2 was designed - such as the elimination of the "dangling ELSE" and the possibility of a default option in CASE statements. Umbriel, like Oberon, is the name of a moon of Uranus. Miranda is the smallest of her moons, but that name is already bespoke in the programming language game. So is Ariel, the original name chosen for the new language.

Umbriel is characterized mainly by omissions. In particular, there are no WITH statements, variant records, open arrays, set types, pointer types, subranges (except in array declarations), user defined enumeration types, procedure types, forward declarations, imports, exports, or local modules.

In two important respects the language is even more restrictive than Modula-2: - Anonymous structured types are not allowed: like formal parameters, variables must be declared in terms of a previously introduced type identifier. This increases orthogonality, heightens the idea of "data abstraction" and eliminates the need to discuss concepts like name and structural equivalence. The expense of a slight clutter in the name space is unimportant for programs on this scale. - In classic Pascal the ordering of the components in a program or procedure block is very restrictive. It may be summarized in EBNF on the lines of

Header [ Constants ] [ Types ] [ Variables ] { Procedure } StatementPart

In Modula-2, however, this ordering is highly permissive:

Header { Constants | Types | Variables | Procedure } StatementPart

Oberon introduced an interesting restriction:

Header { Constants | Types | Variables } { Procedure } StatementPart

However, Umbriel imposes a different restriction:

Header { Constants | Types | Procedure } { Variables } StatementPart

that is, variables may not be declared ahead of procedures in the same scope. This has the effect of denying access to "global" variables from within procedures. It makes the treatment of issues like scope rules very much easier, and forces students to come to terms with the proper use of parameters from an early stage. The formal parameters of a procedure are, of course, "visible" from within any procedures nested within it.

Another restriction imposed by Umbriel, but not found in classic implementations of its relatives, is that a FOR loop control variable has to be strictly "local", and may not be "threatened" within the body of its loop.

On the other hand, Umbriel treats INTEGER and REAL as expression compatible in the way that Pascal and Oberon do. The incompatabilities between numeric types in Modula-2 are a constant source of irritation to beginners, who cannot be convinced of any reason for such restrictions, and who find them a very real stumbling block when trying to develop simple classical data handling algorithms. Similarly, the standard procedures INC and DEC may be applied in Umbriel to any scalar type, including REAL.

Allowing for the omitted features and the extra restrictions already mentioned, the syntax is essentially that of Modula-2, with one other important difference. I/O is similar to that found in Pascal, with the further addition that strings may be inserted into Read statements to serve as prompts to the user. Experience has shown that this rather baroque model is concise, effective, easily learned, and perfectly adequate for beginners' programs.

To the usual INTEGER, REAL, CHAR and BOOLEAN types were added STRING (ARRAY [0 .. 79] OF CHAR) and COLORS (for turtle graphics). Variables of any standard type, including STRING and COLORS, may be used as arguments in Read and Write procedure calls. However, other string support is like that of Modula-2, that is, virtually non-existent without the development of special procedures. The compiler recognizes only Length and CompareStr as pervasives. This allows very simple strings to be used from the outset, and yet allows one to point out that not all "obvious" overloadings of operators are permitted in computer languages, as a prelude to meeting this in earnest in Modula-2.

Two instances of non-orthogonality in the language design persisted in the implementations used in the pilot course:

It is interesting to note that both gave the students some discomfort - for example, after being introduced to the function concept, the best student in the class tried, as his first experiment, to declare a function that would return a value of STRING type, and was dismayed to discover that this was not possible.

Umbriel - the implementation

The first implementation of Umbriel was developed in Turbo Pascal as a heavily modified version of the Pascal-S compiler (Wirth, 1981). The current implementation is coded in Modula-2. The compiler produces p-code very similar to that used in Pascal-S, which is then interpreted. Various devices were employed to improve performance on the relatively slow machines available to the students, with the result that compilation is very fast indeed, while the inefficiencies of interpretation are simply not noticed for programs of the complexity that the programming course demands.

Various features of the implementation have proved particularly useful:

Appraisal

Much of the students' reaction and difficulty with the course was fairly predictable. To our relief, they seemed to experience no difficulty with handling the integrated environment, with the basic aspects of the edit- compile-execute cycle, or with understanding and correcting simple syntax errors, which suggests that effort put into tuning the implementation to the students' ability paid off. Similarly, far less difficulty was experienced with the concepts of parameter handling and scope rules than ever before, suggesting that the language design in these areas had been successful. Indeed, a case might be made for using Umbriel rather than Modula-2 as the first teaching language for a wider audience.

Because so few students (11) were involved, no attempt was made to run controlled experiments to measure the success of the group relative to a similar group who might have taken the normal programming course in the same year. Experience from previous years would suggest that the group would all have failed, or dropped out of the regular course, but all of them passed the final examination. Some showed a quite unbelievable maturity in their algorithm design and coding style, and all expressed the desire to study Computer Science further. Nevertheless, various shortcomings were obvious to experienced faculty members, and future courses will attempt to correct these. In particular, we fell into the common trap of placing too much emphasis on simple numerical problems. While these are understandable, they are "not the material to inspire the cellular phone generation" (Cox and Clark, 1994).

The problem areas that persisted were interesting. Students complained that turtle graphics was "difficult". This was unexpected, since previous experience had suggested that turtle graphics was a very useful tool in teaching programming. We came to the conclusion that the problem lay more with an inability to reason about trigonometry than with programming (almost unbelievably, many students could not calculate the interior angles of regular polyhedra). Nested loops and nested procedures were regarded as very difficult, and attempts to introduce all three forms of indeterminate loop just left many students confused. This served to reinforce our suspicion that faculty overestimate the ability of students to handle complexity, particularly in the form of non-orthogonal and redundant language features.

Future developments

Several authors have written enthusiastically of the merits of introducing OOP concepts from the start of the programming course (Decker and Hirshfield (1993, 1994)). We remain highly suspicious of the viability of this approach when it is based on anything as elaborate as C++, especially for unsophisticated students like ours. However, an investigation is under way to add simple classes to Umbriel, perhaps along the lines of those used in FST Modula-2, to which students in our regular courses have adapted with great ease.

In the current implementation, "declare before use" is strictly enforced, and there is no provision "forward" declaration of procedures. Since no attempt was made to cover mutually recursive procedures, this did not prove problematic at all. However, it remains a language design issue that has not yet been satisfactorily reconciled with future directions the course might take.

Availability

The current implementation runs under MS-DOS, and may be obtained by FTP from cs.ru.ac.za, directory /pub/languages, file UMBRIELn.EXE (the n stands for a release number). This is a self-extracting file, containing executables, documentation and some examples.

Notes on configuring the QEdit editor from Semware Corporation to provide a complete IDE are also supplied; a shareware version of QEdit is available from many sites, including this one. A monograph on introductory programming in Umbriel, and a large set of worked examples is also available on request to the author.

That FTP server is no longer operational. The files are available from http://www.scifac.ru.ac.za/cspt/qedit3c.zip and http://www.scifac.ru.ac.za/cspt/umbriel.zip

Acknowledgements

The author is grateful to Professor Peter Wentworth and Rory Freeman for valuable suggestions on the design of Umbriel, and to Ismail Ikram for assistance with the laboratory sessions.

References

Conway, D.: "Criteria and Consideration in the Selection of a First Programming Language". Technical Report 93/102. Department of Computer Science, Monash University, 1993.

Cox, K.R. and Clark D.: "Computing modules that empower students". Computers Educ., 23(4), 277-284, 1994.

Decker R. and Hirshfield, S.: "Top-Down Teaching: Object Oriented Programming in CS1". ACM SIGCSE Bulletin, 25(1), 270-273, 1993.

Decker R. and Hirshfield, S.: "The top 10 reasons why Object-Oriented programming can't be taught in CS1". ACM SIGCSE Bulletin, 26(1), 51-55, 1994.

Mody, R.P.: "C in education and software engineering". ACM SIGCSE Bulletin, 23(3), 45-56, 1991.

Pyott, S. and Sanders, I.: "ALEX: an aid to teaching algorithms". ACM SIGCSE Bulletin, 23(3), 36-44, 1991.

Sakkinen, M.: "The darker side of C++ revisited". Structured Programming, 13(4), 155-178, 1992.

Solway, E.: "Should we teach students to program?". Comm ACM, 36(10), 21, 1993.

Terry, P.D.: "Umbriel - a minimal programming language". ACM SIGPLAN Notices, 30(5), 11-17, 1995.

Wirth, N.: "Pascal-S: a subset and its implementation", in Pascal - The Language and its Implementation, John Wiley, Chichester, 1981.

Wirth, N,: "The programming language Oberon". Software - Practice and Experience, 18, 671-690, 1988.

Wirth, N.: Programming in Modula-2 (3rd edition). Springer-Verlag, Berlin, 1985.

MS-DOS is a trademark of MicroSoft Corporation. QEdit is a trademark of Semware Corporation. Turbo Pascal is a trademark of Borland International.

A taste of Umbriel

We conclude with an example of an Umbriel program, typical of those that might have been written by the best students at the end of our course:

MODULE Epidemic;
(* Simulate a simple epidemic where there is a 5 x 5 grid of people.  An
   instance of the disease runs to 30 days.  In the first ten days the
   sufferer can pass it on to people nearby.  For the last twenty days the
   sufferer has effectively recovered, and cannot be reinfected (ie is
   immune).  If one is at risk, one has a 10% chance of contracting the
   disease if a neighbour in front, behind, on the left, or on the right is
   currently contagious.
   P.D. Terry, Rhodes University, 1994 *)

  CONST
    HighlyContagious = 0;
    Recovered        = 9;
    Immune           = 29;
    AtRisk           = 100;
    GridSize         = 5;
  TYPE
    (* we use an extra set of cells on the border to make the logic easy *)
    GRIDS = ARRAY [0 .. GridSize + 1] , [0 .. GridSize + 1] OF INTEGER;

  PROCEDURE Display (State : GRIDS; Day : INTEGER);
  (* Describe the scene on a particular Day *)
    VAR
      I, J : INTEGER;
    BEGIN
      GoToXY(0, 0); Write("Day ", Day);
      FOR I := 1 TO GridSize DO
        FOR J := 1 TO GridSize DO
          GoToXY(12 * (J-1), 2 * I);
          CASE State[I, J] OF
              HighlyContagious .. Recovered : Write("Contagious")
            | Recovered+1 .. Immune         : Write("Immune    ")
            ELSE                              Write("At Risk   ")
          END
        END
      END
    END Display;

  PROCEDURE Initialize (VAR State : GRIDS; VAR Day, Contagious : INTEGER);
  (* Initialize the grid so that all cells save the poor ones Contagious are "at risk" *)
    VAR
      X, Y : INTEGER;
    BEGIN
      FOR X := 0 TO GridSize + 1 DO
        FOR Y := 0 TO GridSize + 1 DO
          State[Y, X] := AtRisk
        END
      END;
      Contagious := 0;
      LOOP
        Read("Supply x-y coordinates of an infected person [1..5] or 0 to stop ", X);
        IF X = 0 THEN EXIT END;
        Read(Y); State[Y, X] := HighlyContagious;
        INC(Contagious)
      END;
      Day := 1; ClearScreen; Display(State, Day)
    END Initialize;

  PROCEDURE Update (VAR State : GRIDS; VAR Date, Contagious : INTEGER);
  (* Update State matrix to reflect the next day.  Also count the number
     still Contagious (hopefully this eventually becomes zero) *)

    PROCEDURE NeighbourContagious (I, J : INTEGER) : BOOLEAN;
    (* check cells in front, behind, left, right of cell at I, J *)
      BEGIN
        RETURN (State[I-1, J] <= Recovered) OR (State[I+1, J] <= Recovered) OR
               (State[I, J-1] <= Recovered) OR (State[I, J+1] <= Recovered)
      END NeighbourContagious;

    VAR
      NewState : GRIDS;
      I, J : INTEGER;

    BEGIN
      NewState := State; Contagious := 0; INC(Date);
      FOR I := 1 TO GridSize DO
        FOR J := 1 TO GridSize DO
          INC(NewState[I, J]);
          IF (NewState[I, J] > Immune)              (* susceptible again *)
             AND NeighbourContagious(I, J)          (* and nasty neighbours *)
             AND (RANDOM(10) = 0)                   (* that 10% chance! *)
            THEN NewState[I, J] := HighlyContagious (* bad luck - you're ill *)
          END;
          IF NewState[I, J] <= Immune THEN INC(Contagious) END
        END
      END;
      State := NewState
    END Update;

  VAR
    Population : GRIDS;
    DayNow, Contagious : INTEGER;

  BEGIN
    Initialize(Population, DayNow, Contagious);
    WHILE Contagious > 0 DO
      Update(Population, DayNow, Contagious);
      Display(Population, DayNow)
    END
  END Epidemic.