
Thursday, November 28, 2019

Disassembling C++ - Part 1 (Intro, overloads)

  Disassembling C++

Introduction


In this series of articles, I will delve into the inner workings of C++ and how it accomplishes its object-oriented behavior.  It was, after all, one of the first object-oriented languages, and is one of the most evolved.  It was first released in 1985, but Bjarne Stroustrup actually began designing it in 1979 at Bell Labs as a next generation of the popular C programming language.  But enough history; you can look it up yourself on Wikipedia or go to Bjarne's home page at http://www.stroustrup.com .  The purpose of this series isn't to show you how to program in C++, but rather how it works.  It is aimed at old-timers like me who have used C for most of their careers and have gone kicking and screaming into object-oriented land, or at the just intellectually curious.  The first thing to know is that C++ is C under the hood.  It's dressed up nicely and does some fancy object-oriented tricks, but when it starts generating machine code, it's more or less C.

 

Compile time or run time behavior


One of the biggest advantages of knowing how all of this works under the hood is that you can lean more on compile-time features and less on run-time ones.  If accessing an object requires a library routine to search through a list of strings, it will be much slower than just using a pointer the compiler passed in.  It is not all compile-time magic, though; some of it is handled in the low-level libstdc++ library.  Like everyone else, I always assumed it was all just compiler tricks and never thought about it.  Then one day, as a hobbyist OS developer, I wanted to explore the ACPI tables, and C++ looked like an obvious choice.  That's where I not only learned how a lot of this works under the hood, but also developed an appreciation for the tight integration of the library and the compiler.

 

Overloads


One very useful feature is overloading.  You can use the same function name with different parameters.  For example:
int add(int n1, int n2);
double add(double n1, double n2);
In fact, you may have been bitten by it at compile time, especially with math functions.  The compiler accomplishes this through a process called name mangling: each function is renamed according to its parameter types, and the mangled name is the one actually used in the object code.  The two functions in the example above are two distinct functions.  This is a compile-time feature and has no effect on run speed, so overload away.  Let's look at an example.  Take the following simple program:
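A minimal sketch of such a program, assuming two overloads of a function named return_value.  Only the name comes from this article; the parameter types, the bodies, and the helpers use_int and use_double are my own stand-ins.  I left main out so the file can be compiled straight to an object file.

```cpp
// overloads.cpp: hypothetical stand-in for the kind of program described above.
// Only the name return_value comes from the article; the rest is assumed.

// Overload taking an int.
int return_value(int n) { return n; }

// Overload taking a double. Same source-level name, different parameter
// type, so the compiler emits it as a separate function.
double return_value(double d) { return d; }

// The overload is chosen from the argument type at compile time:
// an int literal selects the int version, a double literal the double one.
int use_int() { return return_value(1); }
double use_double() { return return_value(1.5); }

// Under the Itanium C++ ABI (g++ and clang++ on Linux) the two overloads
// should be emitted as _Z12return_valuei and _Z12return_valued.
```

Compiling with g++ -c overloads.cpp and running nm on overloads.o is one way to look at the resulting symbol names.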



To ensure that the correct function gets called, the compiler translates return_value into two different function names based on the parameters.



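Now wait a minute, why does main() keep exactly the name it was given? That is because the compiler treats main specially and does not mangle it, just as if it had been declared with "C" linkage.  To give one of your own functions C linkage, you prefix its declaration with extern "C" (or wrap several declarations in an extern "C" { } block).  For example: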
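Here is a small sketch of the idea.  The names plain_add and add are hypothetical, picked only for illustration, and the bodies are assumptions.

```cpp
// linkage.cpp: hypothetical names, chosen only for illustration.

// The declaration carries the linkage specification...
extern "C" int plain_add(int n1, int n2);

// ...so the definition does not have to repeat it. It keeps the C linkage
// given by the declaration above and is emitted under the plain, unmangled
// symbol name plain_add.
int plain_add(int n1, int n2) { return n1 + n2; }

// An ordinary C++ function for comparison. Its symbol name is mangled
// (_Z3addii under the Itanium C++ ABI used by g++ and clang++ on Linux).
int add(int n1, int n2) { return n1 + n2; }
```

Both functions are called the same way from C++ code; the only difference is the symbol name that ends up in the object file.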



Notice that I put extern "C" on the function's declaration (its prototype) and not on the code of the function itself.  That way the compiler knows this function needs to live in the regular C namespace, under its unmangled name.  And consequently:


Mixing C and C++ namespaces can cause a lot of headaches.  nm is a wonderful tool for showing you exactly what is going on in there.  To demangle the names, use the tool c++filt, which is called cxxfilt on some systems.


c++filt can take text either through standard input, as shown above, or as a command-line argument.  The man page explains it all.  Notice how the return types are not shown?  That is because you can only overload a function on its parameters and not on its return type, so under the Itanium C++ ABI that GCC uses, the mangled name of an ordinary (non-template) function does not encode the return type at all.
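The same demangling can also be done from inside a program.  The sketch below assumes a GCC or Clang toolchain, whose C++ runtime provides abi::__cxa_demangle in <cxxabi.h>; that header is not part of standard C++, and the helper name demangle is my own.

```cpp
#include <cstdlib>   // std::free
#include <cxxabi.h>  // abi::__cxa_demangle (GCC/Clang runtime, not standard C++)
#include <string>

// Demangle an Itanium-ABI symbol name.
// Returns the input unchanged if it cannot be demangled.
std::string demangle(const char* mangled) {
    int status = 0;
    // With a null output buffer, __cxa_demangle allocates the result
    // with malloc, so it has to be released with free.
    char* text = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || text == nullptr) {
        return mangled;
    }
    std::string result(text);
    std::free(text);
    return result;
}
```

For example, demangle("_Z12return_valuei") gives "return_value(int)": the demangled text lists only the parameter types.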
Remember how earlier I said that it's all C under the hood?  Consider the following two files:
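As a rough sketch of such a pair, here is the C++ half, with the C half shown in a comment because it belongs in a separate C source file.  The file names, the function add, and its body are hypothetical, and the mangled name _Z3addii assumes the Itanium C++ ABI (g++ or clang++ on Linux).

```cpp
// cpp_side.cpp: the C++ half of the pair (hypothetical name and body).
// Under the Itanium C++ ABI this function is emitted as _Z3addii.
int add(int n1, int n2) { return n1 + n2; }

/* c_side.c: the C half, shown as a comment because it is a separate C file.

   int _Z3addii(int n1, int n2);    // declare the C++ function by its mangled name

   int main(void) {
       // Exit status 0 if the C++ function returned the expected sum.
       return _Z3addii(2, 3) == 5 ? 0 : 1;
   }

   Build sketch (assumed commands):
     g++ -c cpp_side.cpp
     gcc -c c_side.c
     gcc c_side.o cpp_side.o
*/
```

The C compiler knows nothing about overloading; it just sees an external function with an odd-looking name, and the linker matches that name against the symbol the C++ compiler emitted.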



Notice that we declared the external C++ function under its mangled name so that it can be called from main in the C namespace.  And sure enough:



You can call C++ code from within a purely C context if you know the mangled function name.  Obviously there are much better ways to do this, but for our purposes it illustrates an important point.  Although you can think of an overloaded function semantically as one function that operates in two different ways, you are in fact calling two completely different and distinct functions.  This concept applies to classes too.  More to come in succeeding articles; we have barely scratched the surface.
