SE eBook

Token Count Metric

Token Count and Halstead’s Metrics

Token count measures program size and complexity by treating source code as a sequence of tokens, each classified as either an operator or an operand. This idea underlies Halstead’s software metrics, which are widely used in tools that analyse code and estimate complexity.

In this context:

  • Operators include:
    • Arithmetic symbols – + - * /
    • Keywords – while, for, if, return
    • Special symbols – { } ( ) = ; , [ ]
    • Function names used as actions – e.g. printf, eof, scanf, sort
  • Operands include variables, constants, and labels used in the program.
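
The classification above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `KEYWORDS`, `FUNCTIONS`, `TOKEN_RE`, and `classify` are hypothetical, the operator set is a small subset chosen for the example, and each bracket is counted on its own, whereas some counting conventions treat a pair such as `()` as a single operator.

```python
import re

# Assumed operator vocabulary for this sketch; real tools define their own.
KEYWORDS = {"while", "for", "if", "return", "int"}
FUNCTIONS = {"printf", "scanf", "eof", "sort"}

# Identifiers, integer constants, then symbols. Two-character symbols are
# listed before single characters so that "<=" is not split into "<" and "=".
TOKEN_RE = re.compile(
    r"[A-Za-z_]\w*|\d+|<=|>=|==|!=|\+\+|--|[-+*/=<>;,(){}\[\]]"
)

def classify(source):
    """Split source text into tokens and label each as operator or operand."""
    operators, operands = [], []
    for tok in TOKEN_RE.findall(source):
        if tok in KEYWORDS or tok in FUNCTIONS:
            operators.append(tok)   # keyword or function used as an action
        elif tok[0].isalpha() or tok[0] == "_" or tok[0].isdigit():
            operands.append(tok)    # variable name or constant
        else:
            operators.append(tok)   # arithmetic or special symbol
    return operators, operands

ops, opds = classify("if (x < n) x = x + 1;")
print(ops)   # ['if', '(', '<', ')', '=', '+', ';']
print(opds)  # ['x', 'n', 'x', 'x', '1']
```

Whichever convention is chosen, it has to be applied consistently, because every Halstead measure depends on how tokens are split and classified.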

Halstead’s central idea is that a program (an implementation of an algorithm) can be viewed as a collection of operator and operand tokens. From their counts, several base and derived measures are computed.

Base Measures

By scanning the source code and classifying tokens, we collect four base measures:

  • n1 – number of distinct operators.
  • n2 – number of distinct operands.
  • N1 – total number of operator occurrences.
  • N2 – total number of operand occurrences.
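
Once tokens have been classified, the four base measures are simple counts. The sketch below assumes the operators and operands are already available as two lists of strings; the function name `base_measures` is hypothetical.

```python
from collections import Counter

def base_measures(operators, operands):
    """Return (n1, n2, N1, N2) from lists of operator and operand tokens."""
    op_counts = Counter(operators)
    opd_counts = Counter(operands)
    n1 = len(op_counts)            # distinct operators
    n2 = len(opd_counts)           # distinct operands
    N1 = sum(op_counts.values())   # total operator occurrences
    N2 = sum(opd_counts.values())  # total operand occurrences
    return n1, n2, N1, N2

# Tokens of the single statement:  x = x + 1;
operators = ["=", "+", ";"]
operands = ["x", "x", "1"]
print(base_measures(operators, operands))  # (3, 2, 3, 3)
```

Here `x` occurs twice but counts once towards n2, which is the difference between the lowercase (distinct) and uppercase (total) measures.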

Derived Halstead Metrics

From these four base values, Halstead defined several derived metrics:

Program vocabulary – total number of distinct tokens:

$$\mathrm{n} = n_1 + n_2$$

Program length – total number of token occurrences:

$$\mathrm{N} = N_1 + N_2$$

Estimated program length – theoretical length based on the vocabulary:

$$\hat{\mathrm{N}} = n_1 \log_2 n_1 + n_2 \log_2 n_2$$

Program volume – information content of the program in bits:

$$\mathrm{V} = \mathrm{N} \log_2 \mathrm{n}$$

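V is measured in bits: it corresponds to the size of the program if each of its N tokens were encoded with a fixed-length binary code large enough to distinguish the n vocabulary entries.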

Program difficulty – how hard the program is to write or understand:

$$\mathrm{D} = \frac{n_1}{2}\cdot\frac{N_2}{n_2}$$

Program level – the inverse of difficulty; a higher level indicates a program that is easier to write and understand:

$$\mathrm{L} = \frac{1}{\mathrm{D}}$$

Programming effort – estimated mental effort to implement or understand the program:

$$\mathrm{E} = \mathrm{D} \times \mathrm{V}$$

As a rough guideline:

  • Larger vocabulary and volume → more complex program.
  • Higher difficulty and effort → more error-prone and harder to maintain.
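
The derived metrics follow mechanically from the four base measures. The function below is a minimal sketch of the formulas in this section; the name `halstead` and the dictionary keys are hypothetical, and it assumes n1 and n2 are both positive, since the logarithms and the division by n2 are undefined otherwise.

```python
import math

def halstead(n1, n2, N1, N2):
    """Compute the derived Halstead metrics from the four base measures."""
    n = n1 + n2                                        # program vocabulary
    N = N1 + N2                                        # program length
    N_hat = n1 * math.log2(n1) + n2 * math.log2(n2)    # estimated length
    V = N * math.log2(n)                               # volume in bits
    D = (n1 / 2) * (N2 / n2)                           # difficulty
    L = 1 / D                                          # level
    E = D * V                                          # effort
    return {"n": n, "N": N, "N_hat": N_hat, "V": V, "D": D, "L": L, "E": E}

# Base measures of the single statement  x = x + 1;
for name, value in halstead(3, 2, 3, 3).items():
    print(f"{name} = {value:.3f}")
```

Returning all values in one dictionary keeps the dependencies visible: V needs both n and N, and E needs both D and V.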

Example: Token Count

Table 5 shows a sample token count for a small program. Operators and operands are listed with their number of occurrences.

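Table 5: A token count example

| Operators | Occurrences | Operands | Occurrences |
|-----------|-------------|----------|-------------|
| int       | 4           | SORT     | 1           |
| ()        | 5           | x        | 7           |
| ,         | 4           | n        | 3           |
| []        | 7           | i        | 8           |
| if        | 2           | j        | 7           |
| <         | 2           | save     | 3           |
| ;         | 11          | im1      | 3           |
| for       | 2           | 2        | 2           |
| =         | 6           | 1        | 3           |
| -         | 1           | 0        | 1           |
| <=        | 2           |          |             |
| ++        | 2           |          |             |
| return    | 2           |          |             |
| { }       | 3           |          |             |
| n1 = 14   | N1 = 53     | n2 = 10  | N2 = 38     |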

Substituting these values into the formulas above gives vocabulary n = n₁ + n₂ = 24, length N = N₁ + N₂ = 91, volume V = N log₂ n ≈ 417.2 bits, difficulty D = (n₁/2) · (N₂/n₂) = 26.6, and effort E = D × V ≈ 11,098.
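
The arithmetic for Table 5 can be reproduced with a short script. This is a sketch, not part of the original example: the dictionaries simply transcribe the table, with bracket pairs stored under the keys `"()"`, `"[]"`, and `"{}"`, and the base measures are recomputed from the counts so they can be checked against the table's last row.

```python
import math

# Occurrence counts transcribed from Table 5.
operators = {"int": 4, "()": 5, ",": 4, "[]": 7, "if": 2, "<": 2, ";": 11,
             "for": 2, "=": 6, "-": 1, "<=": 2, "++": 2, "return": 2, "{}": 3}
operands = {"SORT": 1, "x": 7, "n": 3, "i": 8, "j": 7, "save": 3,
            "im1": 3, "2": 2, "1": 3, "0": 1}

# Base measures: distinct entries and total occurrences.
n1, N1 = len(operators), sum(operators.values())
n2, N2 = len(operands), sum(operands.values())
assert (n1, N1, n2, N2) == (14, 53, 10, 38)  # agrees with the table

n = n1 + n2                                       # vocabulary
N = N1 + N2                                       # length
N_hat = n1 * math.log2(n1) + n2 * math.log2(n2)   # estimated length
V = N * math.log2(n)                              # volume in bits
D = (n1 / 2) * (N2 / n2)                          # difficulty
L = 1 / D                                         # level
E = D * V                                         # effort

print(f"n = {n}, N = {N}, N_hat = {N_hat:.1f}")
print(f"V = {V:.1f} bits, D = {D:.1f}, L = {L:.4f}, E = {E:.1f}")
```

For this program the estimated length comes out at about 86.5, reasonably close to the actual length of 91.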
