The Road to 1.0: the Corporate, the Bloated, and the Trendy

Bash++ is sometimes misunderstood because of its name. That’s my own fault; I don’t keep up to date with modern trends and don’t use any social media, so I can’t always anticipate how something will be received in the cultural zeitgeist.

The name seems to give people an impression of something corporate, bloated, or trendy – Bash++ aims to be anything but that (although, I’m sure even my best efforts fall flat from time to time!). The name was obviously meant to be a nod to C++ (hence the tagline “Bash with classes”), and I think a lot of people also still misjudge C++ based on 1990s-era perceptions of it. I suppose that puts me in good company.

If you’re worried that Bash++ is corporate, I can assure you that I don’t represent any company and make absolutely no money from it. It’s free software done for free by human beings in their spare time.

If you’re worried that Bash++ is bloated, I would point out that the compiler and language server together only make up about 10,000 lines of C++ source (which is something almost unheard of in the world of compilers), and commits that remove more code than they add are common.

If you’re worried that Bash++ is “trendy” (by which I mean trend-chasing) – I don’t think so. The goal is pretty straightforward: add support for object-orientation without breaking backwards compatibility. It does seem like every other month there’s a new fashion in the programming world, but for myself, sometimes even several years go by without my learning about them. I still keep a paper calendar and work primarily in the terminal; I’m fairly set in my ways. I think it’s unlikely that a project of mine would be “trendy.”

The v0.8 series has been marked by preparations for a fully stable v1.0 release. We’re not quite there yet, but development focus has been shifting to spec finalization, performance optimizations, and correctness improvements. Let’s cover the changes in the v0.8 series under three categories: the corporate, the bloated, and the trendy.

Note: please understand this was written with a smile, not malice.

The Corporate

Removed --port, --socket from the language server

Previously, the language server had these options to serve over TCP or Unix sockets rather than stdio. As far as I could see, the only purpose of this would be for “enterprise deployment” / corporate users – which, to be honest, I have zero interest in supporting. I do not believe that the complexity of serving over a network is justified by the use case, although this may be re-added in the future if there’s a real demand for it.

The Bloated

Replaced internal table copies with lazy symbol lookups

Previously, each time a new entity was created in the compiler, it would inherit its parent entity’s maps of known classes and objects by copying them. This model enforced our scoping semantics implicitly, but was also very inefficient. Now, each entity records only those classes/objects that it owns and performs lazy lookups to its parent entities when necessary. In one local benchmark, a 2.5-million-line compilation which previously took 30 minutes now completes in 22 seconds.
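
To make the idea concrete, here is a minimal sketch of the lazy-lookup model. This is not the actual Bash++ source: the names Entity, ClassInfo, declare_class, and find_class are hypothetical, and only classes are shown (objects would work the same way). Each entity stores just what it declares itself, and a lookup walks outward through parents only when the name is not found locally.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for whatever the compiler records per class.
struct ClassInfo {
    std::string name;
};

// Hypothetical scope entity: it owns only its own declarations and keeps
// a non-owning pointer to the enclosing entity instead of copying its maps.
class Entity {
public:
    explicit Entity(const Entity* parent = nullptr) : parent_(parent) {}

    void declare_class(const std::string& name) {
        classes_[name] = ClassInfo{name};
    }

    // Check this entity first, then each enclosing entity in turn.
    // Returns nullptr if no visible scope declares the name.
    const ClassInfo* find_class(const std::string& name) const {
        for (const Entity* e = this; e != nullptr; e = e->parent_) {
            auto it = e->classes_.find(name);
            if (it != e->classes_.end()) return &it->second;
        }
        return nullptr;
    }

    // Number of classes stored in this entity itself (not inherited ones).
    std::size_t owned_count() const { return classes_.size(); }

private:
    const Entity* parent_;
    std::unordered_map<std::string, ClassInfo> classes_;
};
```

In this sketch, creating a child entity costs the same no matter how many symbols its ancestors hold, whereas copying makes every new entity pay for everything already in scope. Scoping still falls out naturally, since a parent never sees what its children declare.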

Flattened our interval tree implementation for better cache locality

The compiler maintains an interval tree to track source code locations of entities. In particular, the language server likes to use this to determine context for features like “go to definition” and completions. Because of our particular invariants, we were able to flatten the tree into a single contiguous vector. This also resulted in massive performance gains.
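
For illustration only, here is one way a flattened layout could look. I’m assuming a specific invariant here, namely that any two source ranges are either disjoint or one fully contains the other; the real invariants and data layout in Bash++ may differ. Span, FlatSpanIndex, and innermost_at are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical source range of an entity; both ends are inclusive.
struct Span {
    std::size_t start;
    std::size_t end;
    int entity_id;
};

// Assumes spans are properly nested: any two are disjoint or one contains the other.
class FlatSpanIndex {
public:
    explicit FlatSpanIndex(std::vector<Span> spans) : spans_(std::move(spans)) {
        // Sort by start; on ties the longer (outer) span comes first.
        std::sort(spans_.begin(), spans_.end(), [](const Span& a, const Span& b) {
            if (a.start != b.start) return a.start < b.start;
            return a.end > b.end;
        });
        // Record each span's enclosing span using a stack of currently open spans.
        parent_.assign(spans_.size(), -1);
        std::vector<int> open;
        for (int i = 0; i < static_cast<int>(spans_.size()); ++i) {
            while (!open.empty() && spans_[open.back()].end < spans_[i].start) {
                open.pop_back();
            }
            parent_[i] = open.empty() ? -1 : open.back();
            open.push_back(i);
        }
    }

    // Returns the entity_id of the innermost span containing pos, or -1 if none.
    int innermost_at(std::size_t pos) const {
        // Binary search for the last span that starts at or before pos.
        auto it = std::upper_bound(
            spans_.begin(), spans_.end(), pos,
            [](std::size_t p, const Span& s) { return p < s.start; });
        if (it == spans_.begin()) return -1;
        int i = static_cast<int>(it - spans_.begin()) - 1;
        // Climb to enclosing spans until one actually covers pos.
        while (i != -1 && spans_[i].end < pos) i = parent_[i];
        return i == -1 ? -1 : spans_[i].entity_id;
    }

private:
    std::vector<Span> spans_;   // contiguous storage, sorted by start
    std::vector<int> parent_;   // index of the enclosing span, -1 for top level
};
```

A “what is under the cursor?” query in this sketch is a binary search over one contiguous array followed by a short climb through parent indices, which is where the cache-locality benefit would come from compared with chasing heap-allocated tree nodes.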

Made AST node type queries non-virtual

This removed a layer of indirection when querying AST node types. This is maybe a case of premature optimization; realistically it only saves a few billionths of a second per query. But it’s a simple enough change and doesn’t hurt readability.
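
A common way to do this, sketched below under my own assumptions rather than copied from the Bash++ source, is to store the node’s kind as a plain data member set once by the constructor. The query then becomes an ordinary member read instead of a call through the vtable. NodeKind, AstNode, and Identifier are hypothetical names.

```cpp
#include <string>
#include <utility>

// Hypothetical set of node kinds.
enum class NodeKind { Identifier, StringLiteral, ClassDefinition };

class AstNode {
public:
    virtual ~AstNode() = default;

    // Non-virtual: reads a stored field, with no vtable lookup involved.
    NodeKind kind() const { return kind_; }

protected:
    // Each concrete node type passes its own kind up exactly once.
    explicit AstNode(NodeKind kind) : kind_(kind) {}

private:
    const NodeKind kind_;
};

class Identifier : public AstNode {
public:
    explicit Identifier(std::string text)
        : AstNode(NodeKind::Identifier), name(std::move(text)) {}

    std::string name;
};
```

The trade-off in this sketch is one extra field per node in exchange for a query the compiler can inline.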

Developed XGetOpt for more efficient CLI option parsing

This change is not strictly about Bash++, although XGetOpt is now used by Bash++. I developed this header-only CLI option parsing library as a wrapper around getopt_long to provide a native C++ interface and avoid runtime costs for information which is knowable at compile time. Most importantly, the “help string” (the options list and descriptions printed with -h / --help) is fully generated at compile time based on the (also compile-time) option definitions. This avoids the runtime cost of building the help text while also making it impossible for the help text to drift from the code. The XGetOpt header is vendored in the Bash++ source tree and is licensed GPL-2-or-later.

The Trendy

Bug fix in supershells: treat stderr exactly the same as subshell substitutions do

It’s best, wherever possible, to match Bash semantics and prevent surprises for users. Previously, supershells suppressed stderr output, while ordinary subshell substitutions did not. This divergence was a bug, and has been fixed in the v0.8 series.

Standard library: empty queries return exit status codes instead of strings

Containers such as stacks and arrays now write nothing to any output stream when queried for emptiness, and instead return an exit status code of 0 for “true” and 1 for “false.” It really should’ve been this way all along, but better late than never. This better matches ordinary Unix conventions.

Expanded and elaborated the language spec

A key focus recently is to finalize the language spec for v1.0. Certain behaviors which were previously implicit (or undefined) have been made explicit in the spec, and some edge cases have been clarified. Most importantly, the spec now defines exactly when destructors are called and who manages object lifetimes. Having a document that says what the behavior should be is necessary to make sure the compiler is behaving correctly.

Future Changes

There is still one key language feature missing from Bash++ that must be added before the compiler can be called “feature-complete.” That is: full support for parameter expansion on non-primitive references. This will not be a very difficult change; it’s really just a question of sitting down and doing the work.

Apart from that, I have a goal to separate codegen from AST traversal and implement an optimizer pass. At the moment, code is generated at the same time as we’re building our IR. This is of course very performant, but it also prevents us from doing any optimizations on that IR. We should, ideally, change the architecture of the compiler to first traverse the AST to build the IR, then perform optimizations on that IR (dead code elimination, inlining, etc.), and finally generate code from the optimized IR. This is a non-trivial amount of work, but it would be a big deal.
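
Since none of this exists yet, the following is only a toy sketch of the three-phase shape, with a drastically simplified AST and IR that I’ve made up for the purpose (Stmt, Instr, lower, eliminate_dead_code, and emit are all hypothetical). The point is just that each phase consumes the previous phase’s finished output, so an optimization can see the whole IR before any code is written.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, drastically simplified AST: a flat list of statements.
struct Stmt {
    enum class Kind { Echo, Exit };
    Kind kind;
    std::string arg;
};

// Hypothetical IR instruction, deliberately separate from the AST.
struct Instr {
    enum class Op { Print, Halt };
    Op op;
    std::string operand;
};

using Ir = std::vector<Instr>;

// Phase 1: traverse the AST and build IR, emitting no code yet.
Ir lower(const std::vector<Stmt>& ast) {
    Ir ir;
    for (const Stmt& s : ast) {
        Instr::Op op = (s.kind == Stmt::Kind::Echo) ? Instr::Op::Print : Instr::Op::Halt;
        ir.push_back(Instr{op, s.arg});
    }
    return ir;
}

// Phase 2: a trivial dead code elimination pass.
// Nothing after an unconditional halt can ever run, so drop it.
Ir eliminate_dead_code(Ir ir) {
    for (std::size_t i = 0; i < ir.size(); ++i) {
        if (ir[i].op == Instr::Op::Halt) {
            ir.erase(ir.begin() + static_cast<std::ptrdiff_t>(i) + 1, ir.end());
            break;
        }
    }
    return ir;
}

// Phase 3: generate shell code from the (already optimized) IR.
std::string emit(const Ir& ir) {
    std::string out;
    for (const Instr& in : ir) {
        const char* prefix = (in.op == Instr::Op::Print) ? "echo " : "exit ";
        out += prefix + in.operand + "\n";
    }
    return out;
}
```

With this split, the pipeline is simply emit(eliminate_dead_code(lower(ast))), and further passes could be slotted in between lowering and emission without touching either end.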

I would also like to provide support for more editors. At the moment, we ship a VSCode extension which provides syntax highlighting and language server support, because VSCode is probably the most common editor currently. However, I would like to support as many editors as we can, and I’m not a fan of VSCode myself.