NOTE: This article reflects changes in Bash++ as of version 0.3.0.

One of the most important goals for Bash++ is that it should behave in the way that the programmer expects it to. In large part, this means keeping the general “feel” of Bash (a kind of intangible goal) even as we’re adding new features. Another part of this is making sure that those pieces of its syntax which are new (that is, the object-oriented pieces) behave as much like their counterparts in other languages as possible.

Sometimes it’s difficult to say whether a given behavior is a bug or a feature. For example, in Bash++, the following code is perfectly valid:

@Object* obj="Hello, world!"

Of course, trying to use @obj as an object after this will give you tons of runtime errors – so why is this allowed?

Well, Bash++ pointers are considered primitive types, and so they can be assigned any primitive value. In one sense, this should be viewed as a natural consequence of the rules of Bash++. In another sense, however, this is a bug, because other object-oriented languages would insist on a type check here, and so the programmer would probably expect one.

Then again, let’s think for a moment about our goal of preserving the “feel” of Bash. When you call a program using your shell, you pass it lots of primitives as arguments:

program-name /home/user/file arg2 arg3

The shell doesn’t care what you pass it, and it doesn’t check to see if the arguments are valid. It just passes them along. The program might expect that the first argument should be a file path, but your shell has no idea whether or not the string /home/user/file is a valid path, whether the file exists, or even whether the program expects a file path at all. Your shell just lets the program handle it.
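The same division of labor shows up in ordinary Bash scripts: the caller hands over whatever strings it was given, and the receiving code decides whether they make sense. Here is a minimal sketch of that idea, using a hypothetical function named show_first_line as a stand-in for a program:

```shell
#!/usr/bin/env bash
# Hypothetical "program": it receives a plain string and must decide for
# itself whether that string names a usable file.
show_first_line() {
	local path="$1"
	if [[ ! -f "$path" ]]; then
		echo "show_first_line: '$path' is not a regular file" >&2
		return 1
	fi
	head -n 1 "$path"
}

# The caller passes both arguments along without checking either of them.
tmpfile="$(mktemp)"
echo "first line" > "$tmpfile"
show_first_line "$tmpfile"
show_first_line "/no/such/file" || echo "rejected by the callee"
rm -f "$tmpfile"
```

Nothing at the call site knows or cares what a valid argument looks like; the check lives entirely inside the callee.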

Consider the following Bash++ code, in which a method accepts a pointer:

@class Object {
	@public member="default value"
}

@class Example {
	@public @method display @Object* obj {
		echo "Object member: @obj.member"
	}
}

@Example example
@Object object
@example.display &@object

It’s the responsibility of the method, not the caller, to ensure that the pointer @obj points to a valid Object, just as it’s the responsibility of a program, not the shell, to ensure that the arguments passed to it are valid.

Well, despite all this, Bash++ v0.2 doesn’t offer any particularly nice way for that method to check whether that pointer is valid. Surely we have to give the method the tools that it needs to do its job, right?

C++

It seems to me that the obvious way to start here is to check what everyone else is doing. Let’s take a look at C++. More importantly, let’s outline a related case which is much more obviously a bug in Bash++:

#include <iostream>

struct Base {
	virtual void A() {
		std::cout << "Base class" << std::endl;
	}

	void B() {
		this->A();
	}
};

struct Derived : public Base {
	void A() override {
		std::cout << "Derived class" << std::endl;
	}
};

Two classes: Base and Derived.

Base has a method A which prints "Base class". Base also has a method B which just calls A.

Derived inherits from Base and overrides A to print "Derived class", but does not override B.

int main() {
	Base base_object;
	base_object.B(); // "Base class"

	Derived derived_object;
	derived_object.B(); // "Derived class"

Here, we only call the B method – the one which was not overridden. When we call B on a Base object, we get "Base class", and when we call B on a Derived object, we get "Derived class".

Let’s take a look at what happens in Bash++ v0.2 when we do the same thing:

@class Base {
	@virtual @public @method A {
		echo "Base class"
	}

	@public @method B {
		@this.A
	}
}

@class Derived : Base {
	@public @method A {
		echo "Derived class"
	}
}

@Base base_object
@base_object.B # "Base class"

@Derived derived_object
@derived_object.B # "Base class" -- BUG!

The same configuration, but this time, we get "Base class" for both the Base and Derived objects. Why and how does this happen?

It happens when we generate code at compile-time. As the compiler scrolls through the class definition for Base and gets to method B, it sees the reference to @this.A and generates the code once and only once – the generated code points to the Base class’s version of A, because that’s what the compiler found when it searched for @this.A at compile-time.

In C++, the this keyword is a pointer to the object that the method is being called on. When the B method is called on a Derived object, the this pointer points to the Derived object, and so the A method that is called is the Derived class’s version of A.

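In Bash++, the @this keyword is also a pointer to the object that the method is being called on – however, we don’t store any information about the object’s type at runtime, so the call to A that was resolved at compile-time can’t be re-resolved when B runs on a Derived object.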
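To see why the missing type information matters, it helps to picture the kind of Bash that compile-time-only resolution produces. The following is a simplified illustration, not actual compiler output: the function names follow the bpp__{class}__{method} pattern used by the compiler code later in this article, and the object-address arguments are made-up placeholders.

```shell
#!/usr/bin/env bash
# Simplified illustration of statically bound calls (not real compiler output).
bpp__Base__A()    { echo "Base class"; }
bpp__Derived__A() { echo "Derived class"; }

# B was compiled once, inside Base, so @this.A was resolved to Base's A.
# The object address in $1 is passed along but never consulted for its type.
bpp__Base__B() {
	local this="$1"
	bpp__Base__A "$this"
}

# Derived does not override B, so both calls run the same generated function.
bpp__Base__B "base_object_address"      # prints "Base class"
bpp__Base__B "derived_object_address"   # prints "Base class" (the bug)
```

The call to bpp__Base__A is fixed in the function body, so no argument passed at runtime can redirect it.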

You’ll note that if we were to override the B method in the Derived class, we would get the expected behavior:

@class Base {
	@virtual @public @method A {
		echo "Base class"
	}

	@virtual @public @method B {
		@this.A
	}
}

@class Derived : Base {
	@public @method A {
		echo "Derived class"
	}

	@public @method B {
		@this.A
	}
}

We get the expected behavior even though Derived’s B shares exactly the same code as the Base class’s B method, because this time, when we again resolve the reference to @this.A at compile-time, we find the Derived class’s version of A.

Let’s go even a little further, and continue the main() function from our C++ example:

// ...main...
	Derived* from_base_to_derived = reinterpret_cast<Derived*>(&base_object);
	from_base_to_derived->A(); // "Base class"

	Base* from_derived_to_base = &derived_object;
	from_derived_to_base->A(); // "Derived class"
}

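Here, we’re declaring a couple of pointers whose declared types don’t match the objects they point to, and then calling a method on them. The second pointer is an ordinary, well-defined upcast. The first is not: calling a method through a Derived* that really points to a Base object is formally undefined behavior, although it compiles without complaint and, on typical implementations, prints what the comments show. The reinterpret_cast operator is just a way to tell the compiler “relax, I know what I’m doing.”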

Again, C++ is perfectly capable of finding the correct method to call at runtime, even when the pointer is of the wrong type. In Bash++, however, this is not the case:

@Derived* from_base_to_derived=&@base_object
@from_base_to_derived.A # "Derived class" -- BUG!

@Base* from_derived_to_base=&@derived_object
@from_derived_to_base.A # "Base class" -- BUG!

Again, we’ve tried to discover everything at compile-time, and so we’ve failed to find the correct method to call. The compiler believes you that the first pointer points to a Derived object, and the second pointer points to a Base object. It never double-checks or second-guesses, and it calls the versions of the methods that you told it to call.

This is very clearly a bug – even the Bash++ compiler itself (which is written in C++) depends very heavily on the ability of the language to infer the correct method to call at runtime. This is, I would say, a pretty fundamental feature of object-oriented programming.

Here’s an example from the Bash++ compiler itself:

void BashppListener::exitObject_address(BashppParser::Object_addressContext *ctx) {
	// ...
	std::shared_ptr<bpp::bpp_object_address> object_address_entity = std::dynamic_pointer_cast<bpp::bpp_object_address>(entity_stack.top());
	if (object_address_entity == nullptr) {
		throw internal_error("Object address context was not found in the entity stack", ctx);
	}

	entity_stack.pop();

	// Add the object address to the current code entity
	std::shared_ptr<bpp::bpp_code_entity> current_code_entity = std::dynamic_pointer_cast<bpp::bpp_code_entity>(entity_stack.top());
	if (current_code_entity == nullptr) {
		throw internal_error("Current code entity was not found in the entity stack", ctx);
	}

	current_code_entity->add_code_to_previous_line(object_address_entity->get_pre_code());
	current_code_entity->add_code_to_next_line(object_address_entity->get_post_code());
	current_code_entity->add_code(object_address_entity->get_code());
}

In the above sample, we’re handling code such as &@object, which is expected to return the address of an object. We have a large family of classes which inherit from the same base class: bpp_entity. First, we use a dynamic_pointer_cast to check that the top of the stack is a bpp_object_address, and then we use another dynamic_pointer_cast to check that the second-to-top of the stack is a bpp_code_entity. Both of those checks already require information about the object’s type to be available at runtime.

Secondly, we interact with the bpp_code_entity at the top of the stack. bpp_code_entity (an entity which can contain code – such as the program, a method, or a supershell) is itself a parent class of many derived types of code entities, which each may have different versions of the add_code methods. It’s important that we don’t have to know the exact type of the bpp_code_entity at compile-time, because we don’t know what kind of code entity we’re going to be adding code to. We can just call the add_code methods, and the correct versions will be called at runtime.

vTables

In C++, the usual way for a program to find which version of a method to call is a vTable: a table of function pointers created for each class that has virtual methods. Each object of such a class carries a pointer to its class’s vTable. When a virtual method is called, the compiled code follows the object’s pointer to the right vTable at runtime, and then uses that vTable to find the correct method to call.

Well! This is a pretty good solution, isn’t it? It’s a little bit of overhead, but it’s a very clean way to solve the problem.

We can implement it fairly easily as well. We can add a vTable to each class, and then add a pointer to that vTable to each object. When a method is called, we can look up the method in the vTable and call the appropriate function.
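In plain Bash, that plan might look roughly like the sketch below. It is a hand-written illustration under a few assumptions, not the compiler's output: the vTable and vPointer names follow the patterns that appear in the code later in this article, while the object addresses (base_obj, derived_obj) and the call_virtual helper are hypothetical.

```shell
#!/usr/bin/env bash
# One associative array per class, mapping method names to function names.
declare -A bpp__Base____vTable=( [A]="bpp__Base__A" )
declare -A bpp__Derived____vTable=( [A]="bpp__Derived__A" )

bpp__Base__A()    { echo "Base class"; }
bpp__Derived__A() { echo "Derived class"; }

# One vPointer per object, holding the name of its class's vTable.
# The object addresses here are hypothetical placeholders.
base_obj____vPointer="bpp__Base____vTable"
derived_obj____vPointer="bpp__Derived____vTable"

# Hypothetical helper: object address -> vPointer -> vTable -> function.
call_virtual() {
	local address="$1" method="$2"
	local vpointer="${address}____vPointer"
	local entry="${!vpointer}[${method}]"   # e.g. bpp__Derived____vTable[A]
	"${!entry}" "$address"
}

call_virtual base_obj A      # prints "Base class"
call_virtual derived_obj A   # prints "Derived class"
```

The function to run is chosen by what the object's vPointer says at runtime, not by what the call site assumed.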

When we’re generating code for each class:

/* src/bpp_include/bpp_program.cpp: bool add_class() */
// ...
// Declare the vTable
std::string class_vTable = "declare -A bpp__" + name + "____vTable\n";

// ...

// Add the methods
for (auto& method : class_->get_methods()) {
	// ...
	// Add the method to the vTable if it is virtual
	if (method->is_virtual()) {
		class_vTable += "bpp__" + name + "____vTable[\"" + method_name + "\"]=\"bpp__" + name + "__" + method_name + "\"\n";
	}
}

And we have to make sure to have a function we can call at runtime to look up methods in the vTable:

function bpp____vTable__lookup() {
	local __objectAddress="$1" __method="$2" __outputVar="$3"
	if [[ -z "${__objectAddress}" || -z "${__method}" || -z "${__outputVar}" ]]; then
		>&2 echo "Bash++: Error: Invalid vTable lookup"
		exit 1
	fi
	while : ; do
		if ! eval "declare -p \"${__objectAddress}\"" &>/dev/null; then
			break
		fi
		[[ -z "${!__objectAddress}" ]] && break
		__objectAddress="${!__objectAddress}"
	done
	local __vTable="${__objectAddress}____vPointer"
	if ! eval "declare -p \"${__vTable}\"" &>/dev/null; then
		>&2 echo "Bash++: Error: vTable not found for object '${__objectAddress}'"
		exit 1
	fi
	local __result="${!__vTable}[\"${__method}\"]"
	[[ -z "${!__result}" ]] && >&2 echo "Bash++: Error: Method '${__method}' not found in vTable for object '${__objectAddress}'" && exit 1
	eval "${__outputVar}=\$__result"
}

This function will be called in the form bpp____vTable__lookup {object-address} {method-name} {output-var-name}. It doesn’t store the function name itself in the output variable, but a reference to the matching vTable entry, so we call the method through indirect expansion:

${!outputVar} {object-address} {arguments}

Where Bash will resolve ${!outputVar} to the function name for us. We just have to make sure to call this properly whenever the compiler spots that we should be looking up a method in the vTable:

/* src/listener/BashppListener.cpp: code_segment generate_method_call_code() */
// Is the method virtual?
if (assumed_method->is_virtual()) {
	// Look up the method in the vTable
	result.pre_code = "bpp____vTable__lookup \"" + reference_code + "\" \"" + method_name + "\" __func" + std::to_string(program->get_function_counter()) + "\n";
	result.post_code = "unset __func" + std::to_string(program->get_function_counter()) + "\n";
	result.code = "${!__func" + std::to_string(program->get_function_counter()) + "} " + reference_code + " 1";
	program->increment_function_counter();
} else {
	// Call the method directly
	result.code = "bpp__" + assumed_class->get_name() + "__" + method_name + " " + reference_code + " 1";
}

return result;
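For a virtual call, the three strings built above amount to a lookup, an indirect call, and a cleanup. The sketch below shows roughly how that sequence plays out in Bash. It makes several assumptions: simple_vTable_lookup is a stripped-down, hypothetical stand-in for bpp____vTable__lookup (no pointer chasing, no error handling), and the class, object address, and counter value 0 are invented for the example.

```shell
#!/usr/bin/env bash
# Hypothetical class and object, named after the patterns used in this article.
declare -A bpp__Derived____vTable=( [A]="bpp__Derived__A" )
bpp__Derived__A() { echo "Derived class"; }
derived_obj____vPointer="bpp__Derived____vTable"

# Simplified stand-in for bpp____vTable__lookup: it stores a reference to the
# vTable entry (not the function name itself) in the named output variable.
simple_vTable_lookup() {
	local address="$1" method="$2" output_var="$3"
	local vpointer="${address}____vPointer"
	printf -v "$output_var" '%s[%s]' "${!vpointer}" "$method"
}

# The trailing 1 mirrors the generated call shown above; this sketch ignores it.
simple_vTable_lookup "derived_obj" "A" __func0   # pre_code: look up the entry
${!__func0} derived_obj 1                        # code: prints "Derived class"
unset __func0                                    # post_code: clean up
```

Because __func0 holds a string such as bpp__Derived____vTable[A], a plain ${__func0} would expand to that string rather than to a function name, which is why the generated call uses ${!...}.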

Note: One thought that’s occurred to me is that we could be using a runtime stack instead of these temporary variables (like ‘__func{number}’ in the above code) to be more efficient. The trouble with that is our current inability to absolutely guarantee that the post-code which unsets the temporary variable will be run – which, if we were to use a stack, could screw us over and lead to wacky and unexpected behavior. Any developers who see solutions to problems like these: contribute to Bash++!

And that’s it! We’ve implemented vTables in Bash++, and finally made our @virtual methods work as expected. We can now call the correct method at runtime, even when the pointer is of the wrong type.

But wait – in the beginning of this article, we were talking about type checking, and how we should be giving our methods the tools that they need to do their jobs. What does it take to do this?

Fortunately, once we have vTables, dynamic type checking becomes an absolute breeze. We only need to add one extra entry to each class’s vTable:

/* src/bpp_include/bpp_program.cpp: bool add_class() */
// ...
// Add the class's parent to the vTable
if (class_->get_parent() != nullptr) {
	class_vTable += "bpp__" + name + "____vTable[\"__parent__\"]=\"bpp__" + class_->get_parent()->get_name() + "____vTable\"\n";
}

That entry is a pointer to the parent class’s vTable, which serves as a link in an inheritance chain that we can follow at runtime. Now we can add a @dynamic_cast operator to Bash++. This operator will perform a runtime check, using this information, to verify that the cast is valid, and return @nullptr if it is not.
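One way such a runtime check could work is to walk the __parent__ links until the target class is found or the chain runs out. The sketch below is an assumption about the general shape of that check, not the actual @dynamic_cast implementation: is_instance_of and the Unrelated class are hypothetical, and only the vTable naming and the __parent__ key come from the code above.

```shell
#!/usr/bin/env bash
# vTables carrying the extra "__parent__" entry.
declare -A bpp__Base____vTable=( [A]="bpp__Base__A" )
declare -A bpp__Derived____vTable=(
	[A]="bpp__Derived__A"
	[__parent__]="bpp__Base____vTable"
)
declare -A bpp__Unrelated____vTable=()

# Hypothetical helper: succeeds if the class owning vTable $1 is, or
# inherits from, the class named $2.
is_instance_of() {
	local vtable="$1" target="bpp__${2}____vTable"
	local parent_entry
	while [[ -n "$vtable" ]]; do
		[[ "$vtable" == "$target" ]] && return 0
		parent_entry="${vtable}[__parent__]"
		vtable="${!parent_entry}"   # empty when there is no parent entry
	done
	return 1
}

is_instance_of bpp__Derived____vTable Base   && echo "Derived is a Base"
is_instance_of bpp__Base____vTable Derived   || echo "Base is not a Derived"
is_instance_of bpp__Unrelated____vTable Base || echo "Unrelated is not a Base"
```

A cast operator built on a check like this would hand back the original address on success and a null value on failure.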

@Derived derived_object
@Base* base_pointer=@dynamic_cast<Base> &@derived_object

if [[ @base_pointer != @nullptr ]]; then
	@base_pointer.A # "Derived class"
fi

And all is well!