Thu 13 Aug 2009
Today I want to write a bit about the part of my library that contains classes representing binary data streams. My concept of streams is based on the Delphi approach, and maybe also on Java. I have used it for many years and I find it very convenient. The general assumptions are: I use an object-oriented approach with an abstract base Stream class and virtual methods, and I report errors using C++ exceptions. Of course both cause some performance overhead, but flexibility and good error control have higher priority for me. Reading and writing disk files is orders of magnitude slower than CPU computations anyway :)
Streams in the C++ standard library have separate classes for an input stream (a stream that can be read) and an output stream (a stream that can be written). I don't like this idea. First, disk files can be opened for both reading and writing, which forces the basic_iostream class to use multiple inheritance and basic_ios to be inherited virtually. Second, I see opening a file for reading and/or writing as a mode, which is better suited to a constructor parameter than to separate classes. I prefer a different division instead. I distinguish simple streams, which can only be read and/or written, from "seekable" streams, which can also tell their data size as well as get and set the current cursor position.
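To show what I mean by "mode as a constructor parameter", here is a minimal sketch. The names FileMode, ToFopenMode and FileStream are made up for this example and are not the actual classes from my library; I also assume the class is built on top of std::fopen, which is only one of many possible choices.

```cpp
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical mode enum: one class, three ways of opening a file.
enum FileMode { FM_READ, FM_WRITE, FM_READ_WRITE };

// Maps the mode to a std::fopen mode string (binary, never text).
const char *ToFopenMode(FileMode mode)
{
    switch (mode)
    {
    case FM_READ:       return "rb";  // file must exist
    case FM_WRITE:      return "wb";  // creates or truncates the file
    case FM_READ_WRITE: return "r+b"; // file must exist, not truncated
    }
    return "rb";
}

// Hypothetical file stream: the mode is just a constructor argument.
class FileStream
{
public:
    FileStream(const std::string &path, FileMode mode)
        : m_File(std::fopen(path.c_str(), ToFopenMode(mode)))
    {
        if (m_File == nullptr)
            throw std::runtime_error("Cannot open file: " + path);
    }
    ~FileStream() { std::fclose(m_File); }

    FileStream(const FileStream &) = delete;
    FileStream &operator=(const FileStream &) = delete;

private:
    std::FILE *m_File;
};
```

With this shape there is no need for separate input and output class hierarchies; whether a particular object may be read or written is decided once, when it is constructed.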
There are still two issues related to data streams that have to be considered. The first is the end of stream concept. The standard libraries of C, Java and many other languages handle this in a way that looks totally weird to me. You learn that you have reached the end of file only after you have already tried to read past the end and that operation has failed. Another strange thing is reading a single byte (the getc function) but returning type int instead of char, where the value EOF (typically -1) means end of file. Maybe that's why some (beginner) programmers think that there is a magical "end of file" character standing at the end of every file, while the truth is that a file just has its remembered size in bytes and each byte can have any of 256 possible values.
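A small sketch of why getc returns int. The helper name ReadAllBytes is made up for this example. The point is that the result has to stay in an int until it has been compared with EOF: if it were stored in a char first, a perfectly valid byte 0xFF could compare equal to EOF and the loop would stop in the middle of the file.

```cpp
#include <cstdio>
#include <vector>

// Reads all remaining bytes of an already opened file with std::getc.
// (Hypothetical helper, only for illustration.)
std::vector<unsigned char> ReadAllBytes(std::FILE *file)
{
    std::vector<unsigned char> bytes;
    int c; // int, not char: must be able to hold 0..255 and EOF
    while ((c = std::getc(file)) != EOF)
        bytes.push_back(static_cast<unsigned char>(c));
    return bytes;
}
```

Note that the loop also shows the first complaint: the only way to find out that the file has ended is to attempt one more read and see it fail.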
My approach to handling end of file is to implement a bool End() method, which returns true if the cursor is at the end of the stream so that no more bytes can be read (cursor position == data size). Of course I also have a standard Read method, which tries to read as many bytes as possible, up to the given buffer size, and returns the number of bytes read. It's convenient when you want to process a big file without loading it all into memory, using a buffer of constant size.
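Here is a rough sketch of how End and Read work together. MemoryReader and SumBytes are hypothetical names invented for this example (an in-memory stand-in, not a class from my library), and the tiny 4-byte buffer is only there to make the chunking visible.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical in-memory stream, just enough to illustrate Read and End.
class MemoryReader
{
public:
    explicit MemoryReader(const std::vector<unsigned char> &data)
        : m_Data(data), m_Pos(0) { }

    // Reads up to MaxLength bytes, returns the number actually read.
    std::size_t Read(void *Data, std::size_t MaxLength)
    {
        const std::size_t n = std::min(MaxLength, m_Data.size() - m_Pos);
        if (n > 0)
            std::memcpy(Data, m_Data.data() + m_Pos, n);
        m_Pos += n;
        return n;
    }

    // True when cursor position == data size.
    bool End() const { return m_Pos == m_Data.size(); }

private:
    std::vector<unsigned char> m_Data;
    std::size_t m_Pos;
};

// Processes the whole stream through a buffer of constant size.
unsigned long SumBytes(MemoryReader &stream)
{
    unsigned char buf[4];
    unsigned long sum = 0;
    while (!stream.End()) // asked before reading, not after a failed read
    {
        const std::size_t n = stream.Read(buf, sizeof(buf));
        for (std::size_t i = 0; i < n; ++i)
            sum += buf[i];
    }
    return sum;
}
```

The loop condition asks the stream whether there is anything left before reading, so there is no need for a failed read to signal the end.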
But I believe that more often you want to read a single value (like a number or a header structure) and you expect this read operation to succeed. If you do I/O with standard C, C++ or WinAPI functions, admit to yourself now: do you check every read operation for success, as well as the number of bytes actually read? If you don't, then think about what happens if you are already at the end of the file and you execute the following code:
unsigned elemCount;
fread(&elemCount, sizeof(elemCount), 1, myFile);
// What's the value of elemCount now and what will happen next?!
Vector3 *vectors = new Vector3[elemCount];
fread(vectors, sizeof(Vector3), elemCount, myFile);
That's why I also implement a MustRead method, which throws an exception if the expected number of bytes couldn't be read. I also expect every write operation either to succeed fully, by writing all the bytes given, or to throw an exception if that couldn't be done. Thanks to that, I can be sure that I will always know about any failure in reading or writing any single value.
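A minimal sketch of the idea, assuming MustRead is built on top of Read. ByteSource and ReadElemCount are hypothetical names for this example, and std::runtime_error stands in for whatever exception class a real library would use.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical in-memory stream, just enough to illustrate MustRead.
class ByteSource
{
public:
    explicit ByteSource(const std::vector<unsigned char> &data)
        : m_Data(data), m_Pos(0) { }

    // Reads up to MaxLength bytes, returns the number actually read.
    std::size_t Read(void *Data, std::size_t MaxLength)
    {
        const std::size_t n = std::min(MaxLength, m_Data.size() - m_Pos);
        if (n > 0)
            std::memcpy(Data, m_Data.data() + m_Pos, n);
        m_Pos += n;
        return n;
    }

    // Reads exactly Length bytes or throws.
    void MustRead(void *Data, std::size_t Length)
    {
        if (Read(Data, Length) != Length)
            throw std::runtime_error("Unexpected end of stream");
    }

private:
    std::vector<unsigned char> m_Data;
    std::size_t m_Pos;
};

// The first step of the fread example, but a short read cannot go unnoticed.
unsigned ReadElemCount(ByteSource &stream)
{
    unsigned elemCount = 0;
    stream.MustRead(&elemCount, sizeof(elemCount));
    return elemCount;
}
```

If the stream is already at its end, ReadElemCount throws instead of returning garbage, so the array allocation that would follow is never reached with an uninitialized count.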
The second thing that has to be stated is whether a read/write operation can fail (or just not read all the expected bytes) "for now", but succeed later. That's the case, for example, for network sockets, where the write buffer can currently be full or the read buffer can be empty, so you have to wait a while to be able to write/read more bytes. I assume that my streams cannot work this way, so I can't implement a socket as a class derived from my Stream. This simplifies my streams, because now I know that if I can't read more bytes, I've simply reached the end of the data.
That's it for now. This entry is long enough :) I'll continue the topic of streams next time. Let's say the interface of my base stream classes looks like this:
class Stream
{
public:
    virtual ~Stream() { }
    virtual void Write(const void *Data, size_t Size);
    virtual void Flush() { }
    virtual size_t Read(void *Data, size_t MaxLength);
    virtual void MustRead(void *Data, size_t Length);
    virtual bool End();
    ...
};

class SeekableStream : public Stream
{
public:
    virtual size_t GetSize();
    virtual void SetSize(size_t Size);
    virtual int GetPos();
    virtual void SetPos(int pos);
    virtual void SetPosFromCurrent(int pos);
    virtual void SetPosFromEnd(int pos);
    ...
};