3. Abstract Types

Traditionally, programming books wax mathematical when they arrive at the topic of abstract data types… Such books make it seem as if you'd never actually use an abstract data type except as a sleep aid.
- Steve McConnell

Information hiding is a technique for handling complexity. By hiding implementation details, programs can be understood and developed in distinct modules and the effects of a change can be localized. One technique for information hiding is data abstraction. An abstract type is used to represent some natural program abstraction. It provides functions for manipulating instances of the type. The module that implements these functions is called the implementation module. We call the functions that are part of the implementation of an abstract type the operations of the type. Other modules that use the abstract type are called clients.

Clients may use the type name and operations, but should not manipulate or rely on the actual representation of the type. Only the implementation module may manipulate the representation of an abstract type. This hides information, since implementers and maintainers of client modules should not need to know anything about how the abstract type is implemented. It provides modularity, since the representation of an abstract type can be changed without having to change any client code.

LCLint supports abstract types by detecting places where client code depends on the concrete representation of an abstract type.

To declare an abstract type, the abstract annotation is added to a typedef. For example (in mstring.h),

typedef /*@abstract@*/ char *mstring;

declares mstring as an abstract type. It is implemented using a char *, but clients of the type should not depend on or need to be aware of this. If it later becomes apparent that a better representation such as a string table should be used, we should be able to change the implementation of mstring without having to change or inspect any client code.

In a client module, abstract types are checked by name, not structure. LCLint reports an error if an instance of mstring is passed as a char * (for instance, as an argument to strlen), since the correctness of this call depends on the representation of the abstract type. LCLint also reports errors if any C operator except assignment (=) or sizeof is used on an abstract type. The assignment operator is allowed since its semantics do not depend on the representation of the type.[4] The use of sizeof is also permitted, since this is the only way for clients to allocate pointers to the abstract type. Type casting objects to or from abstract types in a client module is an abstraction violation and will generate a warning message.

Normally, LCLint will assume a type definition is not abstract unless the /*@abstract@*/ qualifier is used. If instead you want all user-defined types to be abstract types unless they are marked as concrete, the +impabstract flag can be used. This adds an implicit abstract annotation to any typedef that is not marked with /*@concrete@*/.

Some examples of abstraction violations detected by LCLint are shown in Figure 2.

3.1 Access

Where code may manipulate the representation of an abstract type, we say the code has access to that type. If code has access to an abstract type, the representation of the type and the abstract type are indistinguishable. Usually, an abstract type is implemented by a single program module that is the only code that has access to the type representation. Sometimes, more complicated access control is desired if the implementation of an abstract type is split across program files, or particular client code needs to access the representation.

There are a several ways of selecting what code has access the representation of an abstract type:

Modules. An abstract type defined in M.h is accessible in M.c. Controlled by the accessmodule flag. This means when accessmodule is on, as it is by default, the module access rule is in effect. If accessmodule is off (when -accessmodule is used), the module access rule is not in effect and an abstract type defined in M.h is not necessarily accessible in M.c
File names. An abstract type named type is accessible in files named type.<extenstion>. For example, the representation of mstring is be accessible in mstring.h and mstring.c. Controlled by the accessfile flag.
Function names. An abstract type named type may be accessible in a function named type_name or typeName. For example, mstring_length and mstringLength would have access to the mstring abstract type. Controlled by the naming convention (see Section 9).
Access control comments. The syntax /*@access type,+@*/[6] allows the following code to access the representation of type. Similarly, /*@noaccess type,+@*/ restricts access to the representation of type. The type in a noaccess comment must have been declared as an abstract type.

3.2 Mutability

We can view types as being mutable or immutable. A type is mutable if the value of an instance of the type can be changed by passing it as a parameter to a function call.[7] For example, the primitive type int is immutable. If i is a local variable of type int and no variables point to the location where i is stored, the value of i must be the same before and after the call f(i). Structure and union types are also immutable, since they are copied when they are passed as arguments. On the other hand, pointer types are mutable. If x is a local variable of type int *, the value of *x (and hence, the value of the object x) can be changed by the function call g(x).

The mutability of a concrete type is determined by its type definition. For abstract types, mutability does not depend on the type representation but on what operations the type provides. If an abstract type has operations that may change the value of instances of the type, the type is mutable. If not, it is immutable. The value of an instance of an immutable type never changes. Since object sharing is noticeable only for mutable types, they are checked differently from immutable types.

The /*@mutable@*/ and /*@immutable@*/ annotations are used to declare an abstract type as mutable or immutable. (If neither is used, the abstract type is assumed to be mutable.) For example,

typedef /*@abstract@*/ /*@mutable@*/ char *mstring; typedef /*@abstract@*/ /*@immutable@*/ int weekDay; declares mstring as a mutable abstract type and weekDay as an immutable abstract type.

Clients of a mutable abstract type need to know the semantics of assignment. After the assignment expression s = t, do s and t refer to the same object (that is, will changes to the value of s also change the value of t)?

LCLint prescribes that all abstract types have sharing semantics, so s and t would indeed be the same object. LCLint will report an error if a mutable type is implemented with a representation (e.g., a struct) that does not provide sharing semantics (controlled by mutrep flag).

The mutability of an abstract type is not necessarily the same as the mutability of its representation. We could use the immutable concrete type int to represent mutable strings using an index into a string table, or declare mstring as immutable as long as no operations are provided that modify the value of an mstring.

3.3 Boolean Types

Standard C has no boolean representation - the result of a comparison operator is an integer, and no type checking is done for test expressions. Many common errors can be detected by introducing a distinct boolean type and stronger type checking.

Use the -booltype name flag to select the type name used to represent boolean values[8] Relations, comparisons and certain standard library functions are declared to return bool types.

LCLint checks that the test expression in an if, while, or for statement or an operand to &&, || or ! is a boolean. If the type of a test expression is not a boolean, LCLint will report an error depending on the type of the test expression and flag settings. If the test expression has pointer type, LCLint reports an error if predboolptr is on (this can be used to prevent messages for the idiom of testing if a pointer is not null without a comparison). If it is type int, an error is reported if predboolint is on. For all other types, LCLint reports an error if predboolothers is on.

Since using = instead of == is such a common bug, reporting of test expressions that are assignments is controlled by the separate predassign flag. The message can be suppressed by adding extra parentheses around the test expression.

Apppendix C (page 50) describes other flags for controlling boolean checking.

Figure 3. Boolean Checking

3.4 Primitive C Types

Two types have compatible type if their types are the same.
- ANSI C, 3.1.2.6.

Two types need not be identical to be compatible.
- ANSI C, footnote to 3.1.2.6.

LCLint supports stricter checking of primitive C types. The char and enum types can be checked as distinct types, and the different numeric types can be type-checked strictly.

3.4.1 Characters

The primitive char type can be type-checked as a distinct type. If char is used as a distinct type, common errors involving assigning ints to chars are detected.

The +charint flag can be used for checking legacy programs where char and int are used interchangeably. If charint is on, char types indistinguishable from ints. To keep char and int as distinct types, but allow chars to be used to index arrays, use +charindex.

3.4.2 Enumerators

Standard C treats user-declared enum types just like integers. An arbitrary integral value may be assigned to an enum type, whether or not it was listed as an enumerator member. LCLint checks each user-defined enum type as distinct type. An error is reported if a value that is not an enumerator member is assigned to the enum type, or if an enum type is used as an operand to an arithmetic operator.

If the enumint flag is on, enum and int types may be used interchangeably. Like charindex, if the enumindex flag is on, enum types may be used to index arrays.

3.4.3 Numeric Types

LCLint reports where numeric types are used in dangerous or inconsistent ways. With the strictest checking, LCLint will report an error anytime numeric types do not match exactly. If the relaxquals flag is on, only those inconsistencies which may corrupt values are reported. For example, if an int is assigned to a variable of type long (or passed as a long formal parameter), LCLint will not report an error if relaxquals is on since a long must have at least enough bits to store an int without data loss. On the other hand, an error would be reported if the long were assigned to an int, since the int type may not have enough bits to store the long value.

Similarly, if a signed value is assigned to an unsigned, LCLint will report an error since an unsigned type cannot represent all signed values correctly. If the ignoresigns flag is on, checking is relaxed to ignore all sign qualifiers in type comparisons (this is not recommended, since it will suppress reporting of real bugs, but may be necessary for quickly checking certain legacy code).

3.4.4 Arbitrary Integral Types

LCLint supports three different kinds of arbitrary integral types:

/*@integraltype@*/
An arbitrary integral type. The actual type may be any one of short, int, long, unsigned short, unsigned, or unsigned long.

/*@unsignedintegraltype@*/
An arbitrary unsigned integral type. The actual type may be any one of unsigned short, unsigned, or unsigned long.

/*@signedintegraltype@*/
An arbitrary signed integral type. The actual type may be any one of short, int, or long.

LCLint reports an error if the code depends on the actual representation of a type declared as an arbitrary integral. The match-any-integral flag relaxes checking and allows an arbitrary integral type is allowed to match any integral type.

Other flags set the arbitrary integral types to a concrete type. These should only be used if portability to platforms that may use different representations is not important. The long-integral and long-unsigned-integral flags set the type corresponding to /*@integraltype@*/ to be unsigned long and long respectively. The long-unsigned-unsigned-integral flag sets the type corresponding to /*@unsignedintegraltype@*/ to be unsigned long. The long-signed-integral flag sets the type corresponding to /*@signedintegraltype@*/ to be long.

David Evans
Software Devices and Systems
evs@sds.lcs.mit.edu