Bash++ 0.3.8 Changes

The last update was published around the release of Bash++ 0.3.2. Since then there have been a large number of small changes, bug fixes, and improvements. This post summarizes the most important changes in the last few releases. Bigger or more important changes are marked with a bold title.

We now skip comments directly in the lexer. The lexer no longer returns tokens for comments, so they are never included in the AST, which improves performance and reduces memory usage.
We now allow taking the address of an object’s method with the & operator, which is useful for passing methods as arguments to other functions.
- The return value of &@object.method is a function pointer to the method, followed by the object pointer as the method’s implicit first argument.
Patched a bug that caused vTable lookups to fail in rvalue self-references.
Sped up the parser by enclosing member declarations and method definitions in a shared parent rule, which removes the left-side ambiguity between them.
Added proper support for C-style arithmetic Bash for loops.
Added proper support for Bash number ranges in the form {#..#[..#]}.
Patched a bug in non-primitive object copies of the form @obj1=@obj2, in which the compiler failed to get the correct address of the object to copy.
We now implicitly dereference pointers at runtime within methods. This change has strong potential to reduce compiler complexity without sacrificing too much performance.
Pointer declarations within methods are now declared local by default.
We now treat constructors and destructors as ordinary methods that are always virtual and public. Previously, the compiler handled them as special cases.
- This change simplifies things and fixes a few bugs. For example, before this change, instantiating an object of class A and copying its address to a pointer of class B (which is allowed in Bash++), and then calling @delete on the B pointer, would call the destructor of B instead of A. This is clearly incorrect behavior. Ensuring that the destructor is virtual means that the correct destructor will be called, regardless of the compile-time inferred type of the object.
- This also fixes a bug that prevented the programmer from overriding a constructor or destructor in a derived class. This is now possible, and the compiler will call the derived class’s constructor or destructor instead of the base class’s.
@new statements now properly call an object’s constructor before returning the pointer.
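
For reference, the two loop forms mentioned above (C-style arithmetic for loops and number ranges) are ordinary Bash constructs. The following is only a sketch of those Bash constructs themselves, not of any Bash++-specific syntax; the variable names are illustrative, and the stepped range form assumes Bash 4 or later.

```bash
#!/usr/bin/env bash

# C-style arithmetic for loop: add up the integers 0 through 4.
sum=0
for (( i = 0; i < 5; i++ )); do
    sum=$(( sum + i ))
done
echo "sum: $sum"

# Number range with an optional step, in the form {start..end[..step]}.
evens=()
for n in {0..10..2}; do
    evens+=("$n")
done
echo "evens: ${evens[*]}"
```

Run as-is, this prints `sum: 10` followed by `evens: 0 2 4 6 8 10`.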

In addition, there are several planned features and changes for future releases:

The syntax for @include and @include_once will change, but will remain backwards compatible.
- The current syntax is @include "/path/to/file" (for absolute paths and paths relative to the source program) or @include <path/to/file> (for files under the compiler’s include paths).
- The new syntax will enable you to optionally specify whether the included file should be linked statically or dynamically. At the moment, there is a single global flag in the compiler which determines whether all included files are linked statically or dynamically. This is not ideal, as it means that you cannot mix static and dynamic linking in the same program. The new syntax will allow you to specify this on a per-file basis.
- For a dynamic include, the new syntax will also let you optionally specify where the compiled version of the included file will be found at runtime. At the moment, the compiler assumes that it will be found in the same directory as the source file, with the same name but a .sh extension. That assumption will likely remain the default when you do not specify a path; if you do specify one, the compiler will use it instead.
- The new syntax will be: @include [dynamic|static] {PATH} [as "PATH"]. For example, @include dynamic "includes/foo.bpp" as "/usr/local/lib/foo.sh". It’s important to note that @include "includes/foo.bpp" will still be valid syntax.
I’m considering dropping the requirement for member declarations and method definitions to specify scope. This would mean assuming some default scope (likely @public) in the event that none is specified.
I’m also very strongly considering a full rewrite of the lexer and parser. The current implementation is a bit of a mess, and I think it would be better to start from scratch. This may be necessary in order to patch a (relatively major) bug with supershells, in which supershells are evaluated even when not needed.
- Up to now, the lexer and parser have been written with the goal of parsing as little Bash as possible, focusing on Bash++-specific syntax. A very oversimplified description of this process is: “if I don’t understand it, and I don’t see any indication that I should understand it, then I’ll just assume it’s valid Bash and pass it through unchanged.”
- One major downside of this is that the parser needs a lot of backtracking and lookahead, which slows things down.
- Another (bigger) problem is this: because supershells are so foreign to Bash, implementing them in a way that guarantees they’ll always work in exact compliance with the spec requires some logical analysis of the code that uses them. For example, how can we guarantee that the supershell in command1 || command2 "@(supershell as an argument)" will only be evaluated if command1 fails (returns a nonzero exit status), unless the parser understands the syntax of a Bash command? The answer is that we can’t. Skipping parsing Bash was a great way to get started, but we’re coming up against the limits of this approach.
- Such a rewrite would likely also result in a massive performance improvement, but it’s also guaranteed to be a ton of work, so I’ve been putting it off for a while.
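
To make the short-circuit expectation above concrete, here is a small plain-Bash sketch. It uses an ordinary command substitution as a stand-in for a supershell (an assumption made purely for illustration; it says nothing about how Bash++ compiles supershells), and the names command1, command2, and expensive are hypothetical. It shows the behavior Bash itself guarantees: the argument of command2 is never expanded when command1 succeeds.

```bash
#!/usr/bin/env bash

# Stand-in for a supershell: a command substitution that leaves a marker
# file behind, so we can tell afterwards whether it was ever evaluated.
marker="$(mktemp)"
rm -f "$marker"

expensive() {
    touch "$marker"   # side effect that is visible outside the subshell
    echo "value"
}

command1() { return 0; }   # hypothetical command that succeeds
command2() { :; }          # hypothetical command that ignores its argument

# command1 succeeds, so Bash never runs command2 and never expands
# the command substitution in its argument.
command1 || command2 "$(expensive)"

if [ -e "$marker" ]; then
    evaluated="yes"
else
    evaluated="no"
fi
echo "substitution evaluated: $evaluated"
rm -f "$marker"
```

This prints `substitution evaluated: no`. A compiler that hoists the supershell out of the command without understanding the `||` would evaluate it unconditionally, which is the bug described above.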