M station --> Erik de Castro Lopo Interview feb 03

Interview - Erik de Castro Lopo, libsndfile, libsamplerate
OO yes, C++ No .. well, Maybe

The decision of whether or not to use OO can be a vexing one. Quite often the decision is made on the grounds that C or C++ is already known. And also quite often the choice is justified after the fact by making vague statements that amount to religious belief.

If you don't have experience of both it's hard to make rational decisions. Earlier we talked to Richard Bown (Rosegarden) about some positive aspects of OO and here we talk to Erik de Castro Lopo (libsndfile, libsamplerate) about some more aspects, particularly with C++ and C, and testing.
libsndfile
Secret Rabbit - libsamplerate

Mstation: What do you think is good about OO in general and C++ in particular?

Erik: I think OO is a very important way of thinking about and tackling programming problems. OO encourages the programmer to build complex entities (objects) out of more simple entities (ints, floats, other objects etc) and then define how that object behaves.

Once the behaviour of an object has been defined, it can then be treated as a black box. The person using the object does not need to know anything about the internal workings of the object to use it. In addition, having a blackbox with defined behaviour allows you to perform validation testing of the black box. (I am a real booster of methodical and consistent software testing which is an aspect of software development which I think is very much neglected both in Free Software/Open Source and commercial development.)

When approaching a programming project, one of the first steps should be to partition the problem into sub problems. Many of these sub problems then turn out to be objects. Suddenly one big problem seems like a number of smaller, more simple problems.

OO is a very powerful concept and I really cannot imagine working without it.

As for C++, on the other hand, I don't have many good things to say about it. When I have to choose between standard C and C++ I only choose C++ when I really must have operator overloading. Operator overloading is the one thing I can't really do in C. Even templates can be done in C using GNU Autogen.

I know this is heretical, but I think standard C is a better language for doing OO programming than C++ because C doesn't railroad me into doing OO programming in a particular way.

Would I be right in saying that one of the earlier instances of this practice was the Windows OS (3.1??).

Probably not :-). I believe that the Standard C fopen/fread/fwrite/fseek/fclose family of functions, which probably predate Microsoft the company, are an OO design. The object is the FILE* pointer and the functions are the methods.
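As a minimal sketch of that reading of the standard I/O functions: the caller below never looks inside the FILE structure and only drives it through its "methods". The helper name file_round_trip is made up for this illustration; tmpfile() is used as the "constructor" so that no file name has to be assumed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes `text` to a temporary file, seeks back to the start and reads it
 * back into `out`. Returns the number of bytes read back, or 0 on failure.
 * The FILE* is the object; fwrite/fseek/fread/fclose are its methods. */
size_t file_round_trip(const char *text, char *out, size_t out_size)
{
    assert(text != NULL && out != NULL && out_size > 0);

    FILE *f = tmpfile();            /* "constructor": returns an opaque handle */
    if (f == NULL) {
        out[0] = '\0';
        return 0;
    }

    size_t len = strlen(text);
    size_t got = 0;

    /* Only proceed if the text (plus terminator) fits in the output buffer. */
    if (len < out_size &&
        fwrite(text, 1, len, f) == len &&
        fseek(f, 0L, SEEK_SET) == 0)
        got = fread(out, 1, len, f);

    out[got] = '\0';
    fclose(f);                      /* "destructor": releases the object */
    return got;
}
```

Nothing in the caller depends on how FILE is laid out, which is the black box property described earlier.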

Actually, coding in an OO style only impacts on project management and design, doesn't it? Is that right?

Yes, OO is all about design and using OO design as a tool to make the task of coding and maintaining software easier.

I wouldn't mind expanding on your early statement about design. It's noticeable that people who work with OO usually say something along the lines that you have... which means something to other people who have already converted to that method of thinking but is quite meaningless to those who haven't. Do you think you can summarise the idea while abstracting from polymorphism and inheritance?

Well, let's look at a novice programmer who is assigned the task of writing a program to perform some task. Let's also say that the task requires the use of a variable sized array, an array length and a current index.

In the traditional structured programming paradigm, the array, its length and the current index are defined/allocated and treated as separate entities and manipulated accordingly. If the array needs to be passed to a subroutine, the program will pass the array, length and current index to the subroutine.

Now what happens when changing requirements mean that this program needs to deal with two arrays with differing lengths and indices? How does our novice programmer move forward from where he is now? Well, most of them will name their two arrays ARRAY_1 and ARRAY_2, the lengths LEN_1 and LEN_2 and their indices INDEX_1 and INDEX_2. However, this suddenly creates a new potential problem. If our novice ever makes the mistake of passing ARRAY_1, LEN_1 and INDEX_2 to the function defined earlier they will have a potentially difficult to track down bug. This problem will also get worse every time another array is added to the program.

Fortunately, there is another way of dealing with the above problem: the Object Oriented (OO) way. If our novice programmer was a little more experienced and had a little exposure to the OO way, they would immediately have seen that the array, its length and the current index are so closely tied together that they should be treated as a single entity, a list, containing subcomponents of the array, its length and its current index. The way this is done is language dependent, but most languages have a way of grouping a number of variables together and then treating the group as a single variable. In C and C++ this can be done using structs, in Perl or Python you could use lists or dicts, in Pascal records etc. All of this without ever using any of the built in OO features of the language.

Now that the group of variables is an object, the programmer should start by defining a set of functions for manipulating these objects (ie object methods). For our example above, the novice programmer might decide they need a method for returning the Nth object in the array, another for returning the first array entry from the start which matches a certain criterion, another for inserting an item at index N and so on. The object methods that need to be defined usually depend on the application.

Once the object and the methods have been defined, adding an extra array to the example above is trivial as they are all self contained. If the array, its length and current index needs to be passed to a function, these items are no longer passed separately; the programmer simply passes the whole object.
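The list object described above might be sketched in standard C roughly as follows. All names here (IntList, intlist_insert and so on) are invented for the illustration, and an array of ints is assumed for simplicity.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* The array, its length and its current index, grouped into one object. */
typedef struct {
    int    *items;     /* dynamically sized storage */
    size_t  length;    /* number of items in use */
    size_t  capacity;  /* number of items allocated */
    size_t  index;     /* current index, used by intlist_next() */
} IntList;

void intlist_init(IntList *list)
{
    assert(list != NULL);
    list->items = NULL;
    list->length = list->capacity = list->index = 0;
}

void intlist_free(IntList *list)
{
    free(list->items);
    intlist_init(list);
}

/* Insert value at position n (0 <= n <= length). Returns 0, or -1 on error. */
int intlist_insert(IntList *list, size_t n, int value)
{
    if (n > list->length)
        return -1;
    if (list->length == list->capacity) {
        size_t new_capacity = list->capacity ? list->capacity * 2 : 4;
        int *grown = realloc(list->items, new_capacity * sizeof *grown);
        if (grown == NULL)
            return -1;
        list->items = grown;
        list->capacity = new_capacity;
    }
    /* Shift the tail up by one slot to make room at position n. */
    memmove(&list->items[n + 1], &list->items[n],
            (list->length - n) * sizeof list->items[0]);
    list->items[n] = value;
    list->length++;
    return 0;
}

/* Copy the Nth item into *out. Returns 0, or -1 if n is out of range. */
int intlist_get(const IntList *list, size_t n, int *out)
{
    if (n >= list->length)
        return -1;
    *out = list->items[n];
    return 0;
}

/* Copy the item at the current index into *out and advance the index. */
int intlist_next(IntList *list, int *out)
{
    if (intlist_get(list, list->index, out) != 0)
        return -1;
    list->index++;
    return 0;
}

/* Position of the first item for which matches() is non-zero, or -1. */
long intlist_find_first(const IntList *list, int (*matches)(int))
{
    for (size_t i = 0; i < list->length; i++)
        if (matches(list->items[i]))
            return (long) i;
    return -1;
}

/* An example criterion for intlist_find_first(). */
int is_even(int value) { return value % 2 == 0; }
```

A second array is then just a second IntList variable, and there is no way to pass one list's array together with another list's index by accident.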

Obviously, this was a much simplified example, but notice that the language used to describe what was happening was completely language independent. The obvious conclusion is that the most important aspects of OO design and programming can be performed in languages which do not have any in built OO support.

So moving on to polymorphism and inheritance. Polymorphism is where you have a number of different object types (ie classes) with the same set of methods. This then allows a programmer to write an algorithm using one object type and then change the object type later. If the two objects have the same methods, the algorithm should give the same results. So in our previous example, it would be possible to replace an array based storage object with a linked list based one. As interesting as this is from a CompSci point of view, I really don't think that this is an important aspect of OO design and applications programming.
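One common way to get this kind of polymorphism in plain C is a struct of function pointers shared by each storage type. The sketch below assumes that approach; Seq, ArraySeq, ListSeq and seq_sum are hypothetical names, and the storage is read-only to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

/* The shared set of methods. Any storage type that fills these in can be
 * handed to the same algorithm. */
typedef struct Seq Seq;
struct Seq {
    size_t (*count)(const Seq *self);
    int    (*get)(const Seq *self, size_t n);   /* caller ensures n < count */
};

/* Array based storage. `base` is the first member, so a pointer to it can
 * be converted back to a pointer to the whole ArraySeq. */
typedef struct {
    Seq        base;
    const int *items;
    size_t     length;
} ArraySeq;

static size_t array_count(const Seq *self)
{
    return ((const ArraySeq *) self)->length;
}

static int array_get(const Seq *self, size_t n)
{
    return ((const ArraySeq *) self)->items[n];
}

ArraySeq array_seq_make(const int *items, size_t length)
{
    ArraySeq s = { { array_count, array_get }, items, length };
    return s;
}

/* Linked list based storage with the same two methods. */
typedef struct Node {
    int                value;
    const struct Node *next;
} Node;

typedef struct {
    Seq         base;
    const Node *head;
} ListSeq;

static size_t list_count(const Seq *self)
{
    size_t n = 0;
    for (const Node *p = ((const ListSeq *) self)->head; p != NULL; p = p->next)
        n++;
    return n;
}

static int list_get(const Seq *self, size_t n)
{
    const Node *p = ((const ListSeq *) self)->head;
    while (n-- > 0)
        p = p->next;
    return p->value;
}

ListSeq list_seq_make(const Node *head)
{
    ListSeq s = { { list_count, list_get }, head };
    return s;
}

/* The algorithm is written once, against the methods only. */
long seq_sum(const Seq *seq)
{
    assert(seq != NULL);
    long total = 0;
    size_t count = seq->count(seq);
    for (size_t i = 0; i < count; i++)
        total += seq->get(seq, i);
    return total;
}
```

Calling seq_sum(&a.base) on an ArraySeq and seq_sum(&l.base) on a ListSeq holding the same values gives the same result, which is the substitution described above.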

Inheritance is where you have an object type with behaviour defined by a set of methods as a base object type and then define a new object type which "inherits" all the properties of the base object type and then has new properties and methods defined on top of these. The base methods can also be overridden to redefine how the object behaves.

One of the areas where inheritance can really help is the design of GUI toolkits. The Win32 API (coded in C at the lowest level) is a good example as are Qt (C++) and GTK+ (pure C). For each API, there is some base class called a window or widget. The base class is not really very useful of itself, but other windows/widgets inherit their default behaviour from the base class and then override or extend that behaviour to provide buttons, listboxes, menus, progress bars and whatever else.
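In C, this kind of inheritance is usually expressed by embedding the base struct as the first member of the derived struct. The toy sketch below assumes that technique; Widget and Button are made-up types for illustration and are not the API of Win32, Qt or GTK+.

```c
#include <assert.h>
#include <stddef.h>

/* Base type: data shared by all widgets plus one overridable method. */
typedef struct Widget Widget;
struct Widget {
    int width;
    int height;
    const char *(*type_name)(const Widget *self);
};

static const char *widget_type_name(const Widget *self)
{
    (void) self;
    return "widget";
}

void widget_init(Widget *w, int width, int height)
{
    assert(w != NULL);
    w->width = width;
    w->height = height;
    w->type_name = widget_type_name;    /* default behaviour */
}

/* A base method that every derived type inherits unchanged. */
int widget_area(const Widget *w)
{
    return w->width * w->height;
}

/* Derived type: the base comes first, new properties follow it. */
typedef struct {
    Widget      base;
    const char *label;
} Button;

static const char *button_type_name(const Widget *self)
{
    (void) self;
    return "button";
}

void button_init(Button *b, int width, int height, const char *label)
{
    widget_init(&b->base, width, height);   /* inherit the defaults */
    b->base.type_name = button_type_name;   /* override one method */
    b->label = label;                       /* extend with new data */
}
```

Code that only knows about Widget can be given &button.base and will still call the overridden type_name method.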

Maybe next, we can explore what you were saying on LAD about the downside of OO programming, which as far as I understood (well, "understand now" is probably more truthful), was about the defects of C++. Do these defects impact on the design process or are they merely a pain?

I don't actually believe that there is any downside to OO design and programming, at least none that I am aware of. I do however think that C++ has a number of downsides as a programming language and that these downsides are worse than any upsides it may have in comparison to C. Anyone who is reading this should note that this is my opinion. I used C++ for a number of years as my main programming language and then gradually drifted back to standard C.

OK, let's look at C++. C++ started with the C programming language as a basis and then added a bunch of features intended to make OO programming easier. These features include classes, function and operator overloading, templates, exception handlers etc.

My main beef with C++ is that the code is so damn unreadable in comparison to standard C. Code that is hard to read is more likely to have hidden bugs and is more difficult to maintain. Compare C++ code with templates and whatever everywhere with something really clean and highly readable like Python. There's simply no comparison.

Another problem with C++ is that it is so complex that few programmers use all of the features of the language. Just grab any two programmers who have never worked together before and you will find that they use two different subsets of C++. When you have multiple programmers using different subsets of the language on the same project you have trouble with maintainability and lack of code uniformity. If you compare that with C, which is a far more simple language to begin with, you will find that most experienced C programmers have used just about every feature of the language at one time or another.

I also think that there is a real problem with the way classes are defined in C++ and this is a problem which really does impact design. The problem is that you are forced to define the private data members and functions at the same time as the public ones. This is OK for the trivial C++ examples you see in textbooks but as soon as you REALLY want to hide the private information you end up defining the class to have a single private void pointer which gets something allocated to it in the implementation. This is basically the same thing you do when doing OO programming in C, so where is the benefit of C++?
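The C version of this hiding technique is the opaque (incomplete) type: the public header declares only the type name and the functions, and the struct layout lives in the implementation file. The sketch below shows both halves in one listing; Counter and its functions are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

/* ---- What would go in the public header, counter.h ---- */

typedef struct Counter Counter;     /* incomplete type: layout is hidden */

Counter *counter_new(int start);
void     counter_increment(Counter *c);
int      counter_value(const Counter *c);
void     counter_free(Counter *c);

/* ---- What would go in the private implementation, counter.c ---- */

struct Counter {
    int value;
    int increments;     /* private bookkeeping, invisible to callers */
};

Counter *counter_new(int start)
{
    Counter *c = malloc(sizeof *c);
    if (c != NULL) {
        c->value = start;
        c->increments = 0;
    }
    return c;
}

void counter_increment(Counter *c)
{
    assert(c != NULL);
    c->value++;
    c->increments++;
}

int counter_value(const Counter *c)
{
    assert(c != NULL);
    return c->value;
}

void counter_free(Counter *c)
{
    free(c);
}
```

With the two halves in separate files, code that includes only the header cannot touch the private members, and they can change without recompiling the callers.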

On top of that you have the iostream disaster. As soon as you start using C++, you get this huge, monstrous header file shoved down your throat. Yes, it does allow you to define cout and cin compatible printing methods for your object, but most objects are way too complex for this to have any benefit. I also think overloading the << and >> operators for use with iostreams was a huge mistake; those operators should have been reserved for bit shift operations. Taking all the above into consideration, I much prefer the standard C printf function, even when I am coding in C++.

When I raise issues like the above with C++ coders they usually bring up issues like "C++ has better error checking" and "C++ has better type safety". While this may have been true once it isn't any more. If you are using a good modern C compiler like GCC you can match the standard C++ level of error checking by turning on warning messages. When I compile my own C code with GCC I turn on a whole bunch of warning messages and go out of my way to fix the code so the warnings disappear.

Apart from classes, the other OO aspects of C++ are overrated. Function overloading is mildly useful but can be easily done without. Likewise for operator overloading, but operator overloading can be really badly abused to make C++ code even more unreadable. Templates are also mildly useful but the template definition code has some real readability problems.

Finally, exceptions are a nice idea but if the throw is defined too far away from the catch you have more readability problems. In C, you do the same thing by returning error codes up through the call stack, checking for error conditions and acting accordingly. The C approach may take a little more time for the programmer but it forces the programmer to think more about the consequences of errors deep inside the code. Thinking about the consequences and doing something sensible makes for better quality code.
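A small sketch of the error code style described here, with three levels of calls. Each level checks the result of the level below and either handles the error or passes it up. The function names, the error codes and the accepted range of 8000 to 192000 are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    ERR_NONE = 0,
    ERR_EMPTY_INPUT,
    ERR_BAD_DIGIT,
    ERR_OUT_OF_RANGE
} ErrorCode;

/* Lowest level: convert one character to a digit. */
static ErrorCode parse_digit(char c, long *digit)
{
    if (c < '0' || c > '9')
        return ERR_BAD_DIGIT;
    *digit = c - '0';
    return ERR_NONE;
}

/* Middle level: parse an unsigned decimal number, passing errors upwards. */
static ErrorCode parse_number(const char *text, long *value)
{
    if (text[0] == '\0')
        return ERR_EMPTY_INPUT;

    long total = 0;
    for (size_t i = 0; text[i] != '\0'; i++) {
        long digit;
        ErrorCode err = parse_digit(text[i], &digit);
        if (err != ERR_NONE)
            return err;                 /* propagate, do not hide */
        if (total > 1000000L)
            return ERR_OUT_OF_RANGE;    /* stop well before overflow */
        total = total * 10 + digit;
    }
    *value = total;
    return ERR_NONE;
}

/* Top level: parse a sample rate and check that it is plausible. */
ErrorCode parse_sample_rate(const char *text, long *rate)
{
    assert(text != NULL && rate != NULL);

    long value;
    ErrorCode err = parse_number(text, &value);
    if (err != ERR_NONE)
        return err;
    if (value < 8000 || value > 192000)
        return ERR_OUT_OF_RANGE;
    *rate = value;
    return ERR_NONE;
}
```

Every call site has to say what happens on failure, which is the extra thinking referred to above.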

Moving on to the design issue, I think that C++ lulls programmers into a false sense of security. It's really easy to write a large chunk of code with objects left, right and center and not know what is going on behind the scenes. For instance, many programmers don't consider the overhead of object creation and destruction. If the object uses dynamically allocated memory (ie new/delete) and you are creating objects on the fly while your algorithm is running, the speed impacts can be significant. This is especially important in areas like audio signal processing done in real time. With C, allocation and deallocation are much more explicit and programmers are less likely to fall into this trap.

C++ is also pushed because it supposedly assists or helps code reuse. The usual example is that a class header file and implementation file can be developed in one project and then copied to another. I agree that yes, this is possible. It's also possible with well designed C code. However, from a code and project management point of view it's a really bad idea. What happens is that you suddenly have two pieces of code which are identical to begin with but without any method of ensuring that they remain in sync. In my opinion, real code reuse means putting code in a shared library and linking both projects against the same library. The shared library should be treated as a separate project.

All in all, I just don't think C++ is worth the effort and I am much happier doing OO coding in C.

Are there any OO languages that you like to work with or like the look of but haven't worked with?

Well I do really like Python mainly because it is hands down the easiest to read computer language I have ever seen. The way classes are defined in Python is also nice even though all class members are public. Before I started with Python I used Perl for a number of years. That was a mistake which I rectified as soon as I started using Python.

As for other languages, I'm really not interested in using languages with a lower profile and user base than say Python.

You mentioned testing before. How does implementation differ in the OO context?

I believe testing is the most neglected part of programming. Even when people do testing, it is often haphazard, arbitrary and non repeatable. Testing must be made as comprehensive as possible, automatic (ie make check runs the full test suite), and must be part of the design process. People need to start thinking about testing during the design stage.

However, I will admit that testing GUI programs will obviously be far harder than testing command line apps or libraries. Since it is a long time since I last wrote anything of any size with a GUI I haven't really put my mind to the approach to testing such a beast.

I also think that testing methodology is pretty much independent of how a program might have been designed (ie OO design) and depends far more on how the application interacts with the user. For example, with libraries and command line apps, blackbox testing is probably the most sensible option; ie test that all valid inputs return a valid output and that invalid inputs return the error they should. Obviously for GUI apps, this is not really possible, so for these, subsystem tests make far more sense. Test as many subsystems and combinations of subsystems as possible. This will usually mean that the app needs to have been designed with a good separation of GUI and backend. This backs up my assertion that programmers need to think about testing at the design stage.

Testing also has benefits beyond simply ensuring that the software works correctly. With a comprehensive test suite in place, a coder can be much more adventurous when it comes to hacking the internals of an application, ie refactoring. Refactoring is the restructuring of code to make it more readable and efficient, and the merging of code with largely similar functionality so that the merged code can take the place of the original pieces.

As an example, I have completely refactored large chunks of the internals of libsndfile since its inception in 1999. These refactorings have made the code more efficient, understandable and easier to extend. Without the test suite I simply would not have been able to do this and libsndfile would have suffered as a result.

In "thinking about testing at the design stage", what should that entail and what action should be taken ... other than good things to do such as separating out GUI code etc.?

The most important thing is to decide that yes, this code is going to be tested in a comprehensive, methodical and repeatable manner and to then follow through: write a test suite, maintain it, keep it up to date to test the latest features and to run it whenever changes to the code are made.

So, comprehensive means that as much as possible of the code is tested by the test suite. Methodical means that if feature B of the library/application depends on feature A, feature A will be tested and passed before attempting to test feature B. In addition, the test suite should test the simplest features first and more complicated features later. If any test fails, the test suite should halt there and print out as much useful debugging information as it can. Finally, repeatable means that if the test suite fails because of a bug in the test target, the test fails every time the test is run. Pushing random data at the test target is a bad idea because failures may not be repeatable. If they are not repeatable, they are difficult to track down.
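One way to get the "methodical" and "halt at the first failure" behaviour is a table of tests run strictly in order. The runner below is a hypothetical sketch, not the harness used by libsndfile; the clamp() function stands in for the code under test, and test_deliberate_failure exists only to show the halt.

```c
#include <assert.h>
#include <stdio.h>

typedef int (*TestFn)(void);        /* returns 0 on pass, non-zero on fail */

typedef struct {
    const char *name;
    TestFn      fn;
} TestCase;

/* Runs the tests in table order and stops at the first failure.
 * Returns the number of tests that passed before stopping. */
size_t run_tests(const TestCase *tests, size_t count)
{
    assert(tests != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (tests[i].fn() != 0) {
            fprintf(stderr, "FAILED: %s (test %zu of %zu)\n",
                    tests[i].name, i + 1, count);
            return i;
        }
        printf("passed: %s\n", tests[i].name);
    }
    return count;
}

/* Stand-in for the code under test. */
static int clamp(int value, int low, int high)
{
    if (value < low)
        return low;
    if (value > high)
        return high;
    return value;
}

/* Fixed inputs only, so a failure repeats on every run. */
static int test_clamp_inside(void)
{
    return clamp(5, 0, 10) == 5 ? 0 : 1;
}

static int test_clamp_edges(void)
{
    return (clamp(-1, 0, 10) == 0 && clamp(11, 0, 10) == 10) ? 0 : 1;
}

static int test_deliberate_failure(void)
{
    return 1;
}

/* Simplest feature first; later entries may depend on earlier ones. */
const TestCase demo_suite[] = {
    { "clamp, value inside range", test_clamp_inside },
    { "deliberate failure",        test_deliberate_failure },
    { "clamp, values at the edges", test_clamp_edges },
};
```

Running all three entries of demo_suite stops at the second one, so the third test is never attempted on top of a known failure.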

Another important time to think about testing is when you are coding. For instance, when you come across a piece of code where there are important corner cases, you should consider the possibility of adding a test to make sure that these are handled correctly. Secret Rabbit Code (aka libsamplerate) had a really nasty set of these corner cases and I actually shipped the first public version with a bug in this area. The test suite has now been updated to always catch this possibility.

When looking at the mechanics of testing, I don't think there can be a general approach because testing is pretty much application specific. For example, in libsndfile, during debugging of parsers for new file formats, I needed to be able to figure out how far an example file had been parsed successfully and where it had gone astray. Originally I was doing this with printf() statements within the code. Of course before the new code shipped, all these printf() statements had to be removed. Then later, when I suspected there was a bug in the parser, I had to add the printf()s back in. Obviously this was a major pain in the neck. The solution was adding a logging buffer to an internal data structure and writing a log_printf() function which works like a printf() but writes its output to the internal log buffer. I also added a little extra code to allow the log buffer to be retrieved by any application. The real beauty of this solution was that the debug code is always part of the application and it is useful for other things as well. For instance, testing. It is now possible to write tests that retrieve this log buffer and ensure that it does or does not contain a particular piece of information.
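A rough sketch of the log buffer idea follows. Only the name log_printf() comes from the description above; ParserState, parser_get_log(), parse_header() and the buffer size are assumptions for illustration and are not libsndfile's actual internals.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define LOG_BUFFER_SIZE 1024    /* arbitrary size chosen for the sketch */

typedef struct {
    char   log[LOG_BUFFER_SIZE];
    size_t log_used;            /* bytes used, excluding the terminator */
} ParserState;

void parser_init(ParserState *p)
{
    assert(p != NULL);
    p->log[0] = '\0';
    p->log_used = 0;
}

/* Works like printf() but appends to the internal log buffer.
 * Output that does not fit is silently truncated. */
void log_printf(ParserState *p, const char *format, ...)
{
    assert(p != NULL && format != NULL);

    size_t room = sizeof p->log - p->log_used;  /* always at least 1 */
    va_list args;
    va_start(args, format);
    int written = vsnprintf(p->log + p->log_used, room, format, args);
    va_end(args);

    if (written < 0)
        p->log[p->log_used] = '\0';             /* drop a failed message */
    else if ((size_t) written >= room)
        p->log_used = sizeof p->log - 1;        /* buffer is now full */
    else
        p->log_used += (size_t) written;
}

/* Lets an application or a test retrieve the log. */
const char *parser_get_log(const ParserState *p)
{
    return p->log;
}

/* Toy parser: checks for a "RIFF" marker and logs how far it got. */
int parse_header(ParserState *p, const char *data, size_t length)
{
    log_printf(p, "length : %zu\n", length);
    if (length < 4 || memcmp(data, "RIFF", 4) != 0) {
        log_printf(p, "error : no RIFF marker\n");
        return -1;
    }
    log_printf(p, "marker : RIFF\n");
    return 0;
}
```

A test can then call parse_header() on a known input and use strstr() on parser_get_log() to check that a given line does or does not appear.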

A more general example would be thinking about how errors are handled and reported throughout the library/application and how those errors are reported to the user. The way I work in my two libraries is that every error condition has a unique non-zero integer associated with it. These error values are not in themselves made available to the library user. Instead, the library supplies a function which converts the integer error value to a string which the caller can either print to stdout/stderr or display in some sort of GUI component. This has the added advantage that when two applications using libsndfile get the same error back from the library, they are likely to have the same error message. In addition, in the test suite, I have a test which makes sure that there is an error string corresponding to every valid error value. It is also possible to test that when you force an error, an error does occur and that the error string is correct.
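The error number to string scheme, together with the "every error value has a string" test, could look something like the sketch below. The MYLIB_ names and the messages are invented for the example and are not the real libsndfile error table.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Internal error values. MYLIB_ERR_MAX must stay last. */
enum {
    MYLIB_ERR_NO_ERROR = 0,
    MYLIB_ERR_OPEN_FAILED,
    MYLIB_ERR_BAD_HEADER,
    MYLIB_ERR_UNSUPPORTED_FORMAT,
    MYLIB_ERR_MAX
};

static const char *const error_strings[MYLIB_ERR_MAX] = {
    [MYLIB_ERR_NO_ERROR]           = "No error.",
    [MYLIB_ERR_OPEN_FAILED]        = "Could not open file.",
    [MYLIB_ERR_BAD_HEADER]         = "Error in file header.",
    [MYLIB_ERR_UNSUPPORTED_FORMAT] = "File format not supported.",
};

/* Converts an error value to its message, or NULL for an unknown value. */
const char *mylib_error_string(int code)
{
    if (code < 0 || code >= MYLIB_ERR_MAX)
        return NULL;
    return error_strings[code];
}

/* Test: adding an enum entry without a matching string leaves a NULL slot
 * in the table, which this loop catches. */
void test_every_error_has_string(void)
{
    for (int code = 0; code < MYLIB_ERR_MAX; code++) {
        const char *message = mylib_error_string(code);
        assert(message != NULL && strlen(message) > 0);
    }
}
```

Adding a new error is then a two line change, one enum entry and one string, and the test fails if the second line is forgotten.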

Deciding on how errors are handled and reported also makes coding easier. Consider a situation where a coder is deep in some really hairy code and finds a spot where there is a possibility of an important error occurring. If there is no code base wide policy for handling this, the coder has to come up with something on the spot. However, if there is such a policy, following it (for example by adding an extra error code and error string) will take five seconds, allowing the coder to continue with the task at hand.

Testing is really not difficult, but neither is it particularly exciting or glamorous. However, if you decide at design time that the code must be testable, make small concessions for testability during coding and spend the time to write the test suite and fix any bugs, you will end up with a far more robust and maintainable piece of code. If the code is robust and maintainable it means that the coder can spend less time debugging and more time adding features and improvements or actually using the code.

Any final comments??

I'd just like to send out some thanks. There's the obvious one to RMS and Linus and all the other coders. I also want to thank people who use libsndfile and libsamplerate, especially those who have requested new features or provided bug reports. Conrad Parker (Sweep) and Dominic Mazzoni have been especially useful for kicking around ideas on new features.

There's also a special thanks for the people I worked with for 5.5 years at Fairlight ESP (makers of a very high end Digital Audio Workstation). I learnt a lot at Fairlight, especially from Chris Alfred.

Thanks a lot Erik.

After this was published there was a fair amount of discussion on the Linux Audio Dev mailing list. You can read it by visiting the archive. The subject was (OT) C++ flame war.
