9. Naming Conventions

Naming conventions tend to be a religious issue. Generally, it doesn't matter too much what naming convention is followed as long as one is chosen and followed religiously. There are two kinds of naming conventions supported by LCLint. Type-based naming conventions (Section 9.1) constrain identifier names according to the abstract types that are accessible where the identifier is defined. Prefix naming conventions (Section 9.2) constrain the initial characters of identifier names according to what is being declared and its scope. Naming conventions may be combined or different conventions may be selected for different kinds of identifiers. In addition, LCLint supports checking that names do not conflict with names reserved for the standard library or implementation (Section 9.3) and that names are sufficiently distinguishable from other names.

9.1 Type-Based Naming Conventions

Generic naming conventions constrain valid names of identifiers. By limiting valid names, namespaces may be preserved and programs may be more easily understood since the name gives clues as to how and where the name is defined and how it should be used.

Names may be constrained by the scope of the name (external, file static, internal), the file in which the identifier is defined, the type of the identifier, and global constraints.

9.1.1 Czech Names

Of course, this is a complete jumble to the uninitiated, and that's the joke.
- Charles Simonyi, on the Hungarian naming convention

Czech[23] names denote operations and variables of abstract types by preceding the names by <type>_. The remainder of the name should begin with a lowercase character, but may use any other character besides the underscore. Types may be named using any non-underscore characters.

The Czech naming convention is selected by the czech flag. If accessczech is on, a function, variable, constant or iterator named <type>_<name> has access to the abstract type <type>.

Reporting of violations of the Czech naming convention is controlled by different flags depending on what is being declared:

czechfcns

Functions and iterators. An error is reported for a function name of the form <prefix>_<name> where <prefix> is not the name of an accessible type. Note that if accessczech is on, a type named <prefix> would be accessible in a function beginning with <prefix>_. If accessczech is off, an error is reported instead. An error is reported for a function name that does not have an underscore if any abstract types are accessible where the function is defined.

czechvars
czechconstants
czechmacros

Variables, constants and expanded macros. An error is reported if the identifier name starts with <prefix>_ and prefix is not the name of an accessible abstract type, or if an abstract type is accessible and the identifier name does not begin with <type>_ where type is the name of an accessible abstract type. If accessczech is on, the representation of the type is visible in the constant or variable definition.

czechtypes

User-defined types. An error is reported if a type name includes an underscore character.

9.1.2 Slovak Names

Slovak names are similar to Czech names, except they are spelled differently. A Slovak name is of the form <type><Name>. The type prefix may not use uppercase characters. The remainder of the name starts with the first uppercase character.

The slovak flag selects the Slovak naming convention. Like Czech names, it may be used with accessslovak to control access to abstract representations. The slovakfcns, slovakvars, slovakconstants, and slovakmacros flags are analogous to the similar Czech flags. If slovaktype is on, an error is reported if a type name includes an uppercase letter.

9.1.3 Czechoslovak Names

Czechoslovak names are a combination of Czech names and Slovak names. Operations may be named either <type>_ followed by any sequence of non-underscore characters, or <type> followed by an uppercase letter and any sequence of characters. Czechoslovak names have been out of favor since 1993, but may be necessary for checking legacy code. The czechoslovakfcns, czechoslovakvars, czechoslovakmacros, and czechoslovakconstants flags are analogous to the similar Czech flags. If czechoslovaktype is on, an error is reported if a type name contains either an uppercase letter or an underscore character.

9.2 Namespace Prefixes

Another way to restrict names is to constrain the leading character sequences of various kinds of identifiers. For example, a the names of all user-defined types might begin with "T" followed by an uppercase letter and all file static names begin with an uppercase letter. This may be useful for enforcing a namespace (e.g., all names exported by the X-windows library should begin with "X") or just making programs easier to understand by establishing an enforced convention. LCLint can be used to constrain identifiers in this way to detect identifiers inconsistent with prefixes.

All namespace flags are of the form, -<context>prefix <string>. For example, the macro variable namespace restricting identifiers declared in macro bodies to be preceded by "m_" would be selected by -macrovarprefix "m_". The string may contain regular characters that may appear in a C identifier. These must match the initial characters of the identifier name. In addition, special characters (shown in Table 1) can be used to denoted a class of characters.[24] The * character may be used at the end of a prefix string to specify the rest of the identifier is zero or more characters matching the character immediately before the *. For example, the prefix string "T&*" matches "T" or "TWINDOW" but not "Twin".

^    Any uppercase letter, A-Z                                                 
&    Any lowercase letter, a-z                                                 
%    Any character that is not an uppercase letter (allows lowercase           
     letters, digits and underscore)                                           
~    Any character that is not a lowercase letter (allows uppercase letters,   
     digits and underscore)                                                    
$    Any letter (a-z, A-Z)                                                     
/    Any letter or digit (A-Z, a-z, 0-9)                                       
?    Any character valid in a C identifier                                     
#    Any digit, 0-9

Table 1. Prefix character codes.

Different prefixes can be selected for the following identifier contexts:

macrovarprefix

Any variable declared inside a macro body

uncheckedmacroprefix

Any macro that is not checked as a function or constant (see Section 8.4)

tagprefix

Tags for struct, union and enum declarations

enumprefix

Members of enum types

typeprefix

Name of a user-defined type

filestaticprefix

Any identifier with file static scope

globvarprefix

Any variable (not of function type) with global variables scope

externalprefix

Any exported identifier

If an identifier is in more than one of the namespace contexts, the most specific defined namespace prefix is used (e.g., a global variable is also an exported identifier, so if globalvarprefix is set, it is checked against the variable name; if not, the identifier is checked against the externalprefix.)

For each prefix flag, a corresponding flag named <prefixname>exclude controls whether errors are reported if identifiers in a different namespace match the namespace prefix. For example, if macrovarprefixexclude is on, LCLint checks that no identifier that is not a variable declared inside a macro body uses the macro variable prefix.

Here is a (somewhat draconian) sample naming convention:

-uncheckedmacroprefix "~*"
unchecked macros have no lowercase letters
-typeprefix "T^&*"
all type typenames begin with T followed by an uppercase letter. The rest of the name is all lowercase letters.
+typeprefixexclude
no identifier that does not name a user-defined type may begin with the type name prefix (set above)
-filestaticprefix"^&&&"
file static scope variables begin with an uppercase letter and three lowercase letters
-globvarprefix "G"
all global variables variables start with G
+globvarprefixexclude
no identifier that is not a global variable starts with G

9.3 Naming Restrictions

Additional naming restrictions can be used to check that names do no conflict with names reserved for the standard library, and that identifier are sufficiently distinct (either for the compiler and linker, or for the programmer.) Restrictions may be different for names that are needed by the linker (external names) and names that are only needed during compilations (internal names). Names of non-static functions and global variables are external; all other names are internal.

9.3.1 Reserved Names

Many names are reserved for the implementation and standard library. A complete list of reserved names can be found in [vdL, p. 126-128] or [ANSI, Section 4]. Some name prefixes such as str followed by a lowercase character are reserved for future library extensions. Most C compilers do not detect naming conflicts, and they can lead to unpredictable program behavior. If ansireserved is on, LCLint reports errors for external names that conflict with reserved names. If ansireservedinternal is on, errors are also reported for internal names.

9.3.2 Distinct Identifiers

The decision to retain the old six-character case-insensitive restriction on significance was most painful.
- ANSI C Rationale

LCLint can check that identifiers differ within a given number of characters, optionally ignoring alphabetic case and differences between characters that look similar. The number of significant characters may be different for external and internal names.

Using +distinctexternalnames sets the number of significant characters for external names to six and makes alphabetical case insignificant for external names. This is the minimum significance acceptable in an ANSI-conforming compiler. Most modern compilers exceed these minimums (which are particularly hard to follow if one uses the Czech or Slovak naming convention). The number of significant characters can be changed using the externalnamelength <number> flag. If externalnamecaseinsensitive is on, alphabetical case is ignored in comparing external names. LCLint reports identifiers that differ only in alphabetic case.

For internal identifiers, a conforming compiler must recognize at least 31 characters and treat alphabetical cases distinctly. Nevertheless, it may still be useful to check that internal names are more distinct then required by the compiler to minimize the likelihood that identifiers are confused in the program. Analogously to external names, the internalnamelength <number> flag sets the number of significant characters in an internal name and internalnamecaseinsensitive sets the case sensitivity. The internalnamelookalike flag further restricts distinctions between identifiers. When set, similar-looking characters match -- the lowercase letter "l" matches the uppercase letter "I" and the number "1"; the letter "O" or "o" matches the number "0"; "5" matches "S"; and "2" matches "Z". Identifiers that are not distinct except for look-alike characters will produce an error message. External names are also internal names, so they must satisfy both the external and internal distinct identifier checks.

Figure 18 illustrates some of the name checking done by LCLint.

David Evans
Software Devices and Systems
evs@sds.lcs.mit.edu