Warning! Some information on this page is older than 5 years now. I keep it for reference, but it probably doesn't reflect my current knowledge and beliefs.

# Introduction to Endianness

Wed, 31 Mar 2010

For those of my readers who don't know this subject yet, here is my brief introduction to what endianness is. It's simply the order in which the bytes of a number that spans many bytes are stored in computer memory, in a file or sent over a network. There are two main possibilities: Big-Endian and Little-Endian. It is (or at least should be) always clearly stated which endianness a particular hardware platform, file format or network protocol uses.

But let's start with something simple. As we all know, a byte has 8 bits and we usually imagine it as a sequence of zeros and ones ordered from the most significant bit to the least significant bit. So for example, if we have an 8-bit unsigned integer number, we could write it this way:

201 = 0b11001001

BTW, the new Java version 7 will be able to understand such binary numbers starting with 0b. I wish the creators of C++ had allowed this instead of the totally useless and misleading octal numbers starting with just 0 :(

Back to the point... As a byte is the smallest piece of data that can be addressed and processed, we don't care in what order the memory or file stores the bits inside it. But when a number spans multiple bytes, things get a little more complicated. Let's say we have a 32-bit unsigned integer number:

20100331 = 0x0132B4EB

In all the notations we as humans and programmers use to show numbers, whether the radix is decimal, hexadecimal, octal or binary, we start from the most significant digit. So it would be natural and intuitive to think that the computer stores such a 32-bit number as the following four consecutive bytes (shown as hex numbers):

01 32 B4 EB (Big-Endian)

But that's often not the case. The order in which the most significant byte of a multibyte value comes first is called Big-Endian. It's used in some file formats and network protocols, and as the native way of keeping numbers in memory on some hardware platforms (like Xbox 360). But our PCs natively use Little-Endian, which means they store the bytes of multibyte values in the opposite order, where the least significant byte comes first:

EB B4 32 01 (Little-Endian)

So if you write a piece of C++ code that assigns unsigned x = 0x0132B4EB, saves this variable to a file with fwrite(&x, sizeof(unsigned), 1, myFile); and then open this file in a hex editor, you will see the bytes stored in the order opposite to the natural one - EB B4 32 01 - because the PC is a Little-Endian platform. It looks the same in RAM when you look at the address where the variable x is kept.
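
To see this without a hex editor, here is a minimal sketch that copies the bytes of a 32-bit value in memory order and checks which byte comes first. The function names are made up for this illustration and are not part of any library.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstring>

// Copies the four bytes of value into out[0..3] exactly as they lie in memory.
inline void GetBytesInMemoryOrder(std::uint32_t value, unsigned char out[4])
{
  assert(out != nullptr);
  std::memcpy(out, &value, sizeof(value));
}

// Returns true when the least significant byte is stored first.
inline bool IsHostLittleEndian()
{
  unsigned char bytes[4];
  GetBytesInMemoryOrder(0x0132B4EBu, bytes);
  return bytes[0] == 0xEB;
}

// Prints the bytes of value in memory order as four hex numbers.
inline void PrintBytesInMemoryOrder(std::uint32_t value)
{
  unsigned char bytes[4];
  GetBytesInMemoryOrder(value, bytes);
  std::printf("%02X %02X %02X %02X\n", bytes[0], bytes[1], bytes[2], bytes[3]);
}
```

On a Little-Endian machine, PrintBytesInMemoryOrder(0x0132B4EB) should show the same EB B4 32 01 sequence as the hex editor.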

The good news is that you don't always have to remember about all this endianness stuff. When you just use your multibyte numbers (short, int, __int64, float, double) in memory, store them in files and transfer them across the network, and your application is compiled only for platforms with the same endianness (like the PC, no matter if it's Windows or Linux), you don't even have to think about the byte order of these values.

But there are some cases when you have to think about endianness, like when you access memory or a file byte by byte while trying to interpret its data. Sometimes you even have to convert the endianness. For example, you may be preparing data to be read on Xbox 360, or coding support for a file format or a network protocol that uses Big-Endian for its numbers.
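
As an illustration of interpreting data byte by byte, the sketch below assembles a 32-bit number from four bytes using shifts, so the result should be the same no matter what the host byte order is. ReadU32BigEndian, ReadU32LittleEndian and WriteU32BigEndian are hypothetical helper names chosen just for this example.

```cpp
#include <cassert>
#include <cstdint>

// Reads a 32-bit unsigned number stored as Big-Endian in p[0..3].
inline std::uint32_t ReadU32BigEndian(const unsigned char *p)
{
  assert(p != nullptr);
  return (std::uint32_t(p[0]) << 24) | (std::uint32_t(p[1]) << 16)
       | (std::uint32_t(p[2]) <<  8) |  std::uint32_t(p[3]);
}

// Reads a 32-bit unsigned number stored as Little-Endian in p[0..3].
inline std::uint32_t ReadU32LittleEndian(const unsigned char *p)
{
  assert(p != nullptr);
  return (std::uint32_t(p[3]) << 24) | (std::uint32_t(p[2]) << 16)
       | (std::uint32_t(p[1]) <<  8) |  std::uint32_t(p[0]);
}

// Writes v to p[0..3] as Big-Endian, most significant byte first.
inline void WriteU32BigEndian(unsigned char *p, std::uint32_t v)
{
  assert(p != nullptr);
  p[0] = static_cast<unsigned char>(v >> 24);
  p[1] = static_cast<unsigned char>(v >> 16);
  p[2] = static_cast<unsigned char>(v >>  8);
  p[3] = static_cast<unsigned char>(v);
}
```

Because these functions never reinterpret the buffer as an integer, they also don't care about alignment of the source data.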

How to do such a conversion? The multiplatform sockets API provides functions that convert 16-bit "short" and 32-bit "long" numbers between the so-called "host" byte order (the order used on the current platform) and "network" byte order (which is Big-Endian, used by IP, TCP, UDP etc.). They are: htons, ntohs, htonl, ntohl.
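
A small sketch of how these functions might be used to put a number into a byte buffer in network order and read it back. The wrapper names StoreU32NetworkOrder and LoadU32NetworkOrder are invented for this example. The header choice assumes either a POSIX system or Windows, and on Windows the program also has to be linked with the Winsock library (ws2_32).

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#ifdef _WIN32
#include <winsock2.h>   // htonl, ntohl, htons, ntohs on Windows
#else
#include <arpa/inet.h>  // htonl, ntohl, htons, ntohs on POSIX systems
#endif

// Stores value into out[0..3] in network (Big-Endian) byte order.
inline void StoreU32NetworkOrder(std::uint32_t value, unsigned char out[4])
{
  assert(out != nullptr);
  const std::uint32_t net = htonl(value);  // host order -> network order
  std::memcpy(out, &net, sizeof(net));
}

// Loads a value that was stored in network byte order in in[0..3].
inline std::uint32_t LoadU32NetworkOrder(const unsigned char in[4])
{
  assert(in != nullptr);
  std::uint32_t net;
  std::memcpy(&net, in, sizeof(net));
  return ntohl(net);                       // network order -> host order
}
```

On a Big-Endian host these functions change nothing, while on a Little-Endian host they swap the bytes, so the same source code works on both.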

I've also included some functions for endianness conversion in my CommonLib library. First, here is how I swap bytes in 16-bit, 32-bit and 64-bit numbers. It's quite simple and based just on bit shifts.

inline void SwapEndian16(void *p)
{
  uint2 &u = *(uint2*)p;
  u = (u << 8) | (u >> 8);
}
inline void SwapEndian32(void *p)
{
  uint4 &u = *(uint4*)p;
  u = (u << 24)
    | ((u & 0x0000ff00u) << 8)
    | ((u & 0x00ff0000u) >> 8) | (u >> 24);
}
inline void SwapEndian64(void *p)
{
  uint8 &u = *(uint8*)p;
  u = (u << 56) | (u >> 56)
    | ((u & 0x000000000000ff00ull) << 40)
    | ((u & 0x00ff000000000000ull) >> 40)
    | ((u & 0x0000000000ff0000ull) << 24)
    | ((u & 0x0000ff0000000000ull) >> 24)
    | ((u & 0x00000000ff000000ull) <<  8)
    | ((u & 0x000000ff00000000ull) >>  8);
}
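
One way to convince yourself that such shifts and masks are right is to let the compiler check them. The sketch below is a value-returning variant of the same idea, not the CommonLib code itself. It assumes a C++11 compiler, uses the fixed-width types from <cstdint> in place of uint2, uint4 and uint8, and the ByteSwap names are made up for this example.

```cpp
#include <cstdint>

constexpr std::uint16_t ByteSwap16(std::uint16_t u)
{
  // u is promoted to int before shifting, so cast the result back to 16 bits.
  return static_cast<std::uint16_t>((u << 8) | (u >> 8));
}

constexpr std::uint32_t ByteSwap32(std::uint32_t u)
{
  return (u << 24)
    | ((u & 0x0000ff00u) << 8)
    | ((u & 0x00ff0000u) >> 8) | (u >> 24);
}

constexpr std::uint64_t ByteSwap64(std::uint64_t u)
{
  return (u << 56) | (u >> 56)
    | ((u & 0x000000000000ff00ull) << 40)
    | ((u & 0x00ff000000000000ull) >> 40)
    | ((u & 0x0000000000ff0000ull) << 24)
    | ((u & 0x0000ff0000000000ull) >> 24)
    | ((u & 0x00000000ff000000ull) <<  8)
    | ((u & 0x000000ff00000000ull) >>  8);
}

// Compile-time checks: the bytes are reversed, and swapping twice is a no-op.
static_assert(ByteSwap16(0x1234u) == 0x3412u, "16-bit swap");
static_assert(ByteSwap32(0x0132B4EBu) == 0xEBB43201u, "32-bit swap");
static_assert(ByteSwap64(0x0102030405060708ull) == 0x0807060504030201ull, "64-bit swap");
static_assert(ByteSwap32(ByteSwap32(0x0132B4EBu)) == 0x0132B4EBu, "double swap");
```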

I also wanted to have functions to swap the endianness of a whole array of numbers in a single call:

void SwapEndian16_Array(void *p, uint count);
void SwapEndian32_Array(void *p, uint count);
void SwapEndian64_Array(void *p, uint count);

void SwapEndian16_Data(void *p, uint count, int stepBytes);
void SwapEndian32_Data(void *p, uint count, int stepBytes);
void SwapEndian64_Data(void *p, uint count, int stepBytes);
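
Only the declarations are shown here, so the following is just one possible way to implement the 32-bit pair. It assumes that the _Array version works on tightly packed numbers and that stepBytes in the _Data version is the distance in bytes between the starts of consecutive numbers (for example the size of a structure that contains them). The count parameter is written as plain unsigned, and SwapEndian32 is rewritten with memcpy so that the block is self-contained and doesn't depend on alignment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Swaps the 4 bytes at p in place.
inline void SwapEndian32(void *p)
{
  std::uint32_t u;
  std::memcpy(&u, p, sizeof(u));
  u = (u << 24)
    | ((u & 0x0000ff00u) << 8)
    | ((u & 0x00ff0000u) >> 8) | (u >> 24);
  std::memcpy(p, &u, sizeof(u));
}

// Assumed meaning: swaps count 32-bit numbers, each stepBytes after the previous.
inline void SwapEndian32_Data(void *p, unsigned count, int stepBytes)
{
  assert(p != nullptr || count == 0);
  unsigned char *bytes = static_cast<unsigned char*>(p);
  for (unsigned i = 0; i < count; ++i)
    SwapEndian32(bytes + static_cast<std::ptrdiff_t>(i) * stepBytes);
}

// Assumed meaning: swaps count tightly packed 32-bit numbers starting at p.
inline void SwapEndian32_Array(void *p, unsigned count)
{
  SwapEndian32_Data(p, count, static_cast<int>(sizeof(std::uint32_t)));
}
```

With this interpretation, a call like SwapEndian32_Data(&vertices[0].color, vertexCount, sizeof(Vertex)) would swap one field in every element of an array of structures, where Vertex and its color field are hypothetical.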

And finally I've defined more type-aware, overloaded functions that swap byte order regardless of the type of the parameter passed. This way I can easily define new versions of this function for some of my custom types, and the semantics stay the same, so I can use it in templates. Functions for single-byte types like bool and char are here for completeness and, as you can see, do nothing :)

inline void SwapEndian(bool             &v) { }
inline void SwapEndian(unsigned char    &v) { }
inline void SwapEndian(signed char      &v) { }
inline void SwapEndian(unsigned short   &v) { SwapEndian16(&v); }
inline void SwapEndian(short            &v) { SwapEndian16(&v); }
inline void SwapEndian(unsigned         &v) { SwapEndian32(&v); }
inline void SwapEndian(int              &v) { SwapEndian32(&v); }
inline void SwapEndian(unsigned long    &v) { SwapEndian32(&v); } // assumes 32-bit long, as on Windows
inline void SwapEndian(long             &v) { SwapEndian32(&v); } // assumes 32-bit long, as on Windows
inline void SwapEndian(unsigned __int64 &v) { SwapEndian64(&v); }
inline void SwapEndian(__int64          &v) { SwapEndian64(&v); }
inline void SwapEndian(float            &v) { SwapEndian32(&v); }
inline void SwapEndian(double           &v) { SwapEndian64(&v); }
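
To show how such overloads might be used with a custom type and inside a template, here is a self-contained sketch. It is not the CommonLib code: only a few overloads are repeated, they are implemented by reversing bytes through memcpy rather than by shifts, and Vec2, SwapEndianBytes and SwapEndianAll are hypothetical names introduced for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Reverses the bytes of v in place. Used here as a stand-in for the
// shift-based functions so that the example compiles on its own.
template <typename T>
inline void SwapEndianBytes(T &v)
{
  unsigned char bytes[sizeof(T)];
  std::memcpy(bytes, &v, sizeof(T));
  std::reverse(bytes, bytes + sizeof(T));
  std::memcpy(&v, bytes, sizeof(T));
}

inline void SwapEndian(std::uint16_t &v) { SwapEndianBytes(v); }
inline void SwapEndian(std::uint32_t &v) { SwapEndianBytes(v); }
inline void SwapEndian(float         &v) { SwapEndianBytes(v); }

// Hypothetical custom type: each member is swapped separately.
struct Vec2 { float x, y; };
inline void SwapEndian(Vec2 &v) { SwapEndian(v.x); SwapEndian(v.y); }

// Generic code that works for every type with a SwapEndian overload.
// The overloads for built-in types must be declared before this template.
template <typename T>
inline void SwapEndianAll(T *items, std::size_t count)
{
  assert(items != nullptr || count == 0);
  for (std::size_t i = 0; i < count; ++i)
    SwapEndian(items[i]);
}
```

The template doesn't need to know the size or layout of T. It only relies on the overload set, which is the point of keeping the same function name for all types.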
