Tuesday, 18 April 2017

A tutorial on signed and unsigned integers


written by Nigel Jones
One of the interesting things about writing a blog is looking at the search terms that drive traffic to your blog. In my case, after I posted these thoughts on signed versus unsigned integers, I was amazed to see how many people were ending up here looking for basic information concerning signed and unsigned integers. In an effort to make these folks visits more successful, I thought I’d put together some basic information on this topic. I’ve done it in a question and answer format.
All of these questions have been posed to a search engine which has driven traffic to this blog. For regular readers of this blog looking for something a bit more advanced, you will find the last section more satisfactory.

Are integers signed or unsigned?

A standard C integer data type (‘int’) is signed. However, I strongly recommend that you do not use the standard ‘int’ data type and instead use the C99 data types. See here for an explanation.

How do I convert a signed integer to an unsigned integer?

This is in some ways a very elementary question and in other ways a very profound question. Let’s consider the elementary issue first. To convert a signed integer to an unsigned integer, or to convert an unsigned integer to a signed integer you need only use a cast. For example:
int  a = 6;
unsigned int b;
int  c;

b = (unsigned int)a;

c = (int)b;
Actually in many cases you can dispense with the cast. However many compilers will complain, and Lint will most certainly complain. I recommend you always explicitly cast when converting between signed and unsigned types.
OK, well what about the profound part of the question? Well if you have a variable of type int, and it contains a negative value such as -9 then how do you convert this to an unsigned data type and what exactly happens if you perform a cast as shown above? Well the basic answer is – nothing. No bits are changed, the compiler just treats the bit representation as unsigned. For example, let us assume that the compiler represents signed integers using 2’s complement notation (this is the norm – but is *not* mandated by the C language). If our signed integer is a 16 bit value, and has the value -9, then its binary representation will be 1111111111110111. If you now cast this to an unsigned integer, then the unsigned integer will have the value 0xFFF7 or 6552710. Note however that you cannot rely upon the fact that casting -9 to an unsigned type will result in the value 0xFFF7. Whether it does or not depends entirely on how the compiler chooses to represent negative numbers.

What’s more efficient – a signed integer or an unsigned integer?

The short answer – unsigned integers are more efficient. See here for a more detailed explanation.

When should I use an unsigned integer?

In my opinion, you should always use unsigned integers, except in the following cases:
  • When the entity you are representing with your variable is inherently a signed value.
  • When dealing with standard C library functions that required an int to be passed to them.
  • In certain weird cases such as I documented here.
Now be advised that many people strongly disagree with me on this topic. Naturally I don’t find their arguments persuasive.

Why should I use an unsigned integer?

Here are my top reasons:
  • By using an unsigned integer, you are conveying important information to a reader of your code concerning the expected range of values that a variable may take on.
  • They are more efficient.
  • Modulus arithmetic is completely defined.
  • Overflowing an unsigned data type is defined, whereas overflowing a signed integer type could result in World War 3 starting.
  • You can safely perform shift operations.
  • You get a larger dynamic range.
  • Register values should nearly always be treated as unsigned entities – and embedded systems spend a lot of time dealing with register values.

What happens when I mix signed and unsigned integers?

This is the real crux of the problem with having signed and unsigned data types. The C standard has an entire section on this topic that only a compiler writer could love – and that the rest of us read and wince at. Having said that, it is important to know that integers that are signed get promoted to unsigned integers. If you think about it, this is the correct thing to happen. However, it can lead to some very interesting and unexpected results. A number of years ago I wrote an article “A ‘C’ Test:The 0x10 Best Questions for Would-be Embedded Programmers” that was published in Embedded Systems Programming magazine. You can get an updated and corrected copy at my web site. My favorite question from this test is question 12 which is reproduced below – together with its answer: What does the following code output and why?
void foo(void)
{
 unsigned int a = 6;
 int b = -20;
 (a+b > 6) ? puts("> 6") : puts("<= 6");
}
This question tests whether you understand the integer promotion rules in C – an area that I find is very poorly understood by many developers. Anyway, the answer is that this outputs “> 6”. The reason for this is that expressions involving signed and unsigned types have all operands promoted to unsigned types. Thus -20 becomes a very large positive integer and the expression evaluates to greater than 6. This is a very important point in embedded systems where unsigned data types should be used frequently (see reference 2). If you get this one wrong, then you are perilously close to not being hired.
This is all well and good, but what should one do about this? Well you can pore over the C standard, run tests on your compiler to make sure it really does conform to the standard, and then write conforming code, or you can do the following: Never mix signed and unsigned integers in an expression. I do this by the use of intermediate variables. To show how to do this, consider a function that takes an int ‘a’ and an unsigned int ‘b’. Its job is to return true if b > a, otherwise it returns false. As you shall see, this is a surprisingly difficult problem… To solve this problem, we need to consider the following:
  • The signed integer a can be negative.
  • The unsigned integer b can be numerically larger than the largest possible value representable by a signed integer
  • The integer promotion rules can really screw things up if you are not careful.
With these points in mind, here’s my stab at a robust solution
bool foo(int a, unsigned int b)
{
 bool res;

 if (a < 0)
 {
  res = true; /* If a is negative, it must be less than b */
 }
 else
 {
  unsigned int c;
  c = (unsigned int) a; /* Since a is positive, this cast is safe */
  if (b > c)            /* Now I'm comparing the same data types */
  {
   res = true;
  }
  else
  {
   res = false;
  }
 }
 return res;
}
Is this a lot of work – yes. Could I come up with a more compact implementation that is guaranteed to work for all possible values of a and b – probably. Would it be as clear – I doubt it. Perhaps regular readers of this blog would like to take a stab at producing a better implementation?