Clicking Mermaid

This is the story of implementing click functionality for Mermaid.js Entity Relationship Diagrams. What started as "just copy the flowchart code" became a deep dive into parser generators, the LALR(1) algorithm, and the long history of using language to build language that can read language.

Contents

  1. The Goal
  2. Setup the Project Locally
  3. Codebase Recon
  4. What's Jison?
  5. Interactive Parser Visualization
  6. Implementation Details
  7. Working Demo
  8. Lessons Learned
  9. What's Next?

The Goal

I wanted to contribute to an open source project that I used personally and whose users would get an immediate, genuine benefit from the work. After nosing around Postgres and finding it to be a contributor's equivalent of Everest, I settled on Mermaid.js. I considered Cytoscape and Janus Gateway as well, but GitHub Issue #2880 caught my interest. Issue 2880 (and its possible duplicate, Issue 3966) requests click functionality for Entity Relationship Diagrams in Mermaid.js. Both issues have other GitHub users agreeing this enhancement would be appreciated.

Mermaid's flowcharts already supported clicks, so naturally I thought: "How hard could it be?"

Narrator: Harder than expected.

Isn't it always?

As described in the Mermaid docs, flowcharts support various click patterns a user may add to their diagram code:

click A call aCallbackFunction
click B someCallback(arg1, arg2)
click C href "https://example.com"
click D href "https://example.com" _blank

The corresponding flowchart nodes (boxes in the diagram) will react to clicks as you'd expect: linking around, opening tabs, and calling custom JavaScript 'callback' functions with or without arguments.

Setup the project locally

The Mermaid project's CONTRIBUTING.md explains how to get rolling pretty quickly.

  1. Fork mermaid on GitHub
  2. Clone the fork
  3. Install pnpm, a package manager for Node projects
  4. Run pnpm install, then pnpm run dev

A little jiggling and I have a working development environment locally serving example diagrams at http://localhost:9000/er.html and http://localhost:9000/flowchart.html. The flowchart demo has working click examples. I tweak the HTML file, and the demo page reflects the change. We have an iteration loop!

Codebase Recon

Before attempting to implement ER clicks, I needed to understand how flowchart clicks worked. Here's the detective work:

Flowchart Files (Source)

ER Files (Target)

Great, I'm familiar with TypeScript already. Digging into those files, flowDb defines a setClickEvent(), flow.jison calls setClickEvent() a lot, and it stands to reason the renderer goes last, so it looks like Mermaid does something like:

  1. Jison: Recognizes click syntax and calls database methods
  2. Database: Stores link/callback data on node objects as intermediate state
  3. Renderer: Wraps clickable nodes in SVG anchor elements

The key insight was that flowchart's setClickEvent() and setLink() database methods were exactly what ER diagrams needed - just called from a different grammar. With a little luck the two grammars would not be too disparate from each other, and I might even get a win without understanding whatever Jison is.

One naive porting of click events, types and jison-ey looking click stuff to erDiagram.jison later and I'm thinking I'll have to more thoughtfully comprehend what I'm doing.

What's Jison?

The Jison Docs are a little help, providing a calculator example, but punt on some of the 'major concepts':
"Until the Bison guide is properly ported for Jison, you can refer to it for the major concepts..."

Jison is a JavaScript parser generator. We write grammar rules (the jison file), Jison itself generates us a custom parser from our rules capable of receiving input text adhering to those rules. But what does the parser do with input with those rules? How does it act or react to the input? Comparing the Jison calculator example to the Mermaid source code, they don't seem very similar to me. Not similar enough for me to connect the dots.

Uh, what's Bison?

That Bison manual the Jison docs were dead linking to turns out to be an older version of this PDF. 244 pages of language explaining the language used to define languages! Great.

"Bison is a general-purpose parser generator that converts an annotated context-free grammar into a deterministic LR or generalized LR (GLR) parser ... Once you are proficient with Bison, you can use it to develop a wide range of language parsers, from those used in simple desk calculators to complex programming languages."

Okay, that may seem a little daunting, but it makes sense that Mermaid.js needs something like this. The whole Mermaid paradigm is you pick a kind of diagram, input your specific diagram details in text, and that text needs to be in a syntax that supports the kind of diagram you want. Flowcharts use flowchart words, but Sequence Diagrams use a different jargon. Currently Mermaid supports over twenty kinds of diagrams, they all get their own input jargon, and jison/bison is serving as the in-between glue that converts the user's input into JavaScript method calls (supplied by those 'db' files).

Distraction: Jison isn't the only parser in Mermaid. packages/parser/ contains a Langium-based parser that some of the newer diagram types use instead of Jison. I'm on a mission, so this Langium parser should be ignored. (Lies, I wasted time on a "little" Langium side project to compare parser-feel, concluding that Langium feels newer but solves a complex problem with a complex, undocumented solution, whereas Jison feels old but ultimately simplifies the complex problem.)

But the jison files aren't doing the parsing themselves; they just define a grammar from which a parser is generated. So to avoid authoring more ambiguity errors while I modify erDiagram.jison, I can either read the Bison manual or find the generated parser in the code and infer why it is complaining. Consider that the Bison manual starts off with:

"Anyone familiar with Yacc should be able to use Bison with little trouble. You need to be fluent in C, C++, D or Java programming in order to use Bison or to understand this manual."

Let us be candid. The Bison parser is complicated. Really, Bison includes many parsing strategies and mechanisms, with names like LR, IELR, LALR, LAC, and GLR.

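Technically, Jison generates LALR(1) parsers: a variant of LR(1) that merges similar parser states to get much smaller parse tables, at the cost of rejecting a few grammars that full LR(1) would accept. For our purposes, the difference doesn't matter. When people in the Yacc and Bison world say "LR(1)" they often mean the LALR(1) implementation.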

"Bison is optimized for what are called LR(1) grammars... must be possible to tell how to parse any portion of an input string with just a single token of lookahead"

Can we fix the erDiagram jison grammar, defining everything a Mermaid ERD user might want to author, with just a single token of lookahead? What are the options here?

Parser Evolution: From BNF to Jison

Apparently parser generators evolved through decades of computer science research:

Each step built on Chomsky's grammar hierarchy:

Jison combines Type 3 grammars (for lexing) with Type 2 grammars (for parsing) to generate complete parsers from simple specifications.

graph TD
  subgraph T0["Type 0 Grammars: Unrestricted"]
    subgraph T1["Type 1: Context-Sensitive"]
      subgraph T2["Type 2: Context-Free"]
        subgraph T3["Type 3: Regular"]
          subgraph exT3["A → aB | b<br/>(right-linear only)"]
          end
        end
        subgraph exT2["A → aAb | ε<br/>(single non-terminal left)"]
        end
      end
      subgraph exT1["aAb → acdb<br/>(A replaced in context ab)"]
      end
    end
    subgraph exT0["AB → BA<br/>(anything → anything)"]
    end
  end
  BNF["1958-1963: BNF<br/>Backus-Naur Form<br/>for ALGOL"]
  LR["1965: Knuth<br/>LR Parsing Theory"]
  YACC["1975: Yacc<br/>'Yet Another<br/>Compiler Compiler'"]
  BISON["1989: Bison<br/>GNU replacement<br/>for Yacc"]
  JISON["2009: Jison<br/>'Bison in JavaScript'<br/>Used by Mermaid!"]
  T0 --> BNF
  BNF --> LR
  LR --> YACC
  T2 -.->|"for parsing"| YACC
  T3 -.->|"for lexers"| YACC
  YACC --> BISON
  BISON --> JISON

A big dividing line between these parsers is whether they're deterministic, meaning that at every step there is exactly one correct action and one interpretation of the input text. Deterministic grammars are fast to parse, but inflexible. Ambiguity throws an error.

So what if we wanted a grammar that was more ambiguous? Enter GLR.

Wife: What are you reading?
...when faced with unresolved shift/reduce and reduce/reduce conflicts, GLR parsers use the simple expedient of doing both, effectively cloning the parser to follow both possibilities. Each of the resulting parsers can again split, so that at any given time, there can be any number of possible parses being explored. The parsers proceed in lockstep; that is, all of them consume (shift) a given input symbol before any of them proceed to the next. Each of the cloned parsers eventually meets one of two possible fates: either it runs into a parsing error, in which case it simply vanishes, or it merges with another parser...
Wife: So it does a multiverse?

Which is a good callout. Do I want to have to worry about a multiverse when I'm really just trying to make some boxes clickable? Does Mermaid even GLR? Fortunately, a search of the Mermaid codebase suggests none of the jison files engage GLR, so Mermaid uses LALR(1), which is deterministic and doesn't require multiversal thinking. I can take what I've learned so far, and hold off on the rest of the Bison manual. Maybe.

LALR(1) means:

In practical terms: the parser reads the input left to right, can peek one token ahead, and needs unambiguous rules. When your grammar is ambiguous, Jison will let you know with cryptic shift/reduce conflicts or ambiguity errors.

Lexer vs Grammar Rules

Jison files are split into two distinct sections, each serving a different purpose in the parsing pipeline:

Lexer Rules: Turning Characters into Tokens

The lexer (lexical analyzer) reads raw text and produces tokens. Think of it as recognizing words in a sentence:

%lex
%%
"click"                         return 'CLICK';
"href"                          return 'HREF';
"call"                          return 'CALL';
"_blank"                        return 'LINK_TARGET';
[\s]+                           return 'SPACE';
\"[^"]*\"                       return 'WORD';
/lex

ELI5: The lexer is like someone reading aloud, recognizing that "c-l-i-c-k" spells the word "click". It groups characters into meaningful chunks.
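To make the "recognizing words" idea concrete, here is a minimal sketch of a first-match lexer in TypeScript. It is not Jison's generated code. The names LexRule, LexToken, clickLexRules and tokenize are hypothetical, and the ENTITY_NAME rule is an assumption added so the example input can be tokenized end to end.

```typescript
// Hypothetical names throughout; a sketch of the idea, not Jison's output.
type LexRule = { pattern: RegExp; token: string };
type LexToken = { type: string; text: string };

// Rules are tried top to bottom; the first one that matches wins.
const clickLexRules: LexRule[] = [
  { pattern: /^click/, token: "CLICK" },
  { pattern: /^href/, token: "HREF" },
  { pattern: /^call/, token: "CALL" },
  { pattern: /^_blank/, token: "LINK_TARGET" },
  { pattern: /^\s+/, token: "SPACE" },
  { pattern: /^"[^"]*"/, token: "WORD" },
  // Assumed catch-all for bare names such as CUSTOMER.
  { pattern: /^[A-Za-z_][A-Za-z0-9_]*/, token: "ENTITY_NAME" },
];

function tokenize(input: string, rules: LexRule[]): LexToken[] {
  const tokens: LexToken[] = [];
  let rest = input;
  while (rest.length > 0) {
    let matched = false;
    for (const rule of rules) {
      const m = rule.pattern.exec(rest);
      if (m !== null && m[0].length > 0) {
        tokens.push({ type: rule.token, text: m[0] });
        rest = rest.slice(m[0].length); // consume the matched characters
        matched = true;
        break;
      }
    }
    if (!matched) {
      throw new Error(`Unrecognized text at: ${rest.slice(0, 10)}`);
    }
  }
  return tokens;
}

// Drop whitespace tokens to see the "words" the grammar will work with.
const clickTokens = tokenize(
  'click CUSTOMER href "https://example.com" _blank',
  clickLexRules,
).filter((t) => t.type !== "SPACE");
```

The characters go in as one string and come out as five labelled chunks, which is all the grammar rules ever see.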

Grammar Rules: Understanding Token Relationships

The grammar (syntax analyzer) takes tokens and builds a structure. It understands how tokens relate to each other:

clickStatement
    : CLICK entityName HREF WORD LINK_TARGET
      { $$ = $CLICK; yy.setLink($2, $4.replace(/"/g, ''), $5); }
    | CLICK entityName CALL UNICODE_TEXT
      { $$ = $CLICK; yy.setClickEvent($2, $4); }
    ;

ELI5: The grammar is like understanding sentence structure. It knows that "click CUSTOMER href 'https://example.com' _blank" means "make CUSTOMER clickable and open example.com in a new tab".

To grasp all of these examples we also need to understand special symbols like $$ and yy.

$$ comes from Bison, and is the value a particular rule is going to return. Sometimes returning $$ is not strictly necessary, but we may do so anyway for convention, debuggability, or future-proofing. The click example is like that, $$ = $CLICK isn't actually required, the real work is in the side effects provided by yy.

Sometimes returning $$ does matter, usually when building up data structures via grammar rules. An example in Mermaid's erDiagrams is the attribute list:

  attributes
    : attribute
    { $$ = [$attribute]; }              // Create array with one attribute
    | attribute attributes
    { $$ = [$attribute, ...$attributes]; }  // Prepend to existing array
    ;

Here 'attributes' is the list of type-name pairs a user may include in their entities. In the context of a diagram of a real database these attributes would usually be the columns of the tables each entity was representing. Because the rule is right-recursive, the parser recognizes each attribute in turn, starts the list with a one-element array for the last attribute, and then prepends the earlier attributes one at a time as the rule unwinds. The $$ value carries that list until the entity block completes and a yy.addAttributes() method is called to attach the attribute list to the entity in the 'db'.
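The order in which those two actions fire can be sketched in TypeScript. This is a hand simulation, not code from Mermaid or Jison. ErAttribute, reduceSingle, reducePrepend and buildAttributes are hypothetical names, and the loop stands in for the reductions a right-recursive rule would trigger.

```typescript
// Hypothetical shape for one parsed attribute line such as "int id".
type ErAttribute = { type: string; name: string };

// attributes : attribute            { $$ = [$attribute]; }
function reduceSingle(attribute: ErAttribute): ErAttribute[] {
  return [attribute];
}

// attributes : attribute attributes { $$ = [$attribute, ...$attributes]; }
function reducePrepend(attribute: ErAttribute, attributes: ErAttribute[]): ErAttribute[] {
  return [attribute, ...attributes];
}

// With a right-recursive rule, every attribute is recognized first and the
// list is then assembled from the last attribute back to the first.
function buildAttributes(parsed: ErAttribute[]): ErAttribute[] {
  if (parsed.length === 0) {
    throw new Error("the grammar requires at least one attribute");
  }
  let result = reduceSingle(parsed[parsed.length - 1]);
  for (let i = parsed.length - 2; i >= 0; i--) {
    result = reducePrepend(parsed[i], result);
  }
  return result;
}

const customerAttributes = buildAttributes([
  { type: "int", name: "id" },
  { type: "string", name: "name" },
  { type: "string", name: "email" },
]);
```

Prepending is what keeps the final array in source order even though the list is built back to front.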

What is yy and how does it know to have an addAttributes() method? yy is an object attached to the parser purposefully, by Mermaid, to provide the custom methods that the parser may call as a result of a grammar rule firing. The attaching is done in Mermaid's generic Diagram class:

parser.yy = db

The various Jison-based Mermaid diagram types share this "yy is really the db" behavior, each implementing its own 'db' object with the methods its Jison parser expects.
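The hand-off is easier to see with a toy. The sketch below is not a Jison parser: ToyClickParser is a hypothetical stand-in that uses two regexes instead of generated tables, and ClickDb and stubDb are hypothetical too. Only the shape of the yy assignment mirrors what Mermaid does.

```typescript
// Hypothetical db interface: only the two methods the click rules call.
interface ClickDb {
  setLink(entityName: string, url: string, target?: string): void;
  setClickEvent(entityName: string, callbackText: string): void;
}

// Stand-in for a generated parser. A real Jison parser is table-driven;
// this one uses two regexes so the yy hand-off stays visible.
class ToyClickParser {
  yy: ClickDb | null = null;

  parse(statement: string): void {
    const yy = this.yy;
    if (yy === null) throw new Error("set parser.yy before calling parse()");
    const href = /^click\s+(\w+)\s+href\s+"([^"]*)"(?:\s+(_blank))?$/.exec(statement);
    if (href !== null) {
      yy.setLink(href[1], href[2], href[3]);
      return;
    }
    const call = /^click\s+(\w+)\s+call\s+(\w+)$/.exec(statement);
    if (call !== null) {
      yy.setClickEvent(call[1], call[2]);
      return;
    }
    throw new Error(`Parse error: ${statement}`);
  }
}

// The "db": plain state plus the methods the parser expects to find on yy.
const recorded: string[] = [];
const stubDb: ClickDb = {
  setLink: (entityName, url, target) => {
    recorded.push(`link ${entityName} -> ${url} (${target ?? "same tab"})`);
  },
  setClickEvent: (entityName, callbackText) => {
    recorded.push(`call ${entityName} -> ${callbackText}`);
  },
};

const toyParser = new ToyClickParser();
toyParser.yy = stubDb; // same shape as Mermaid's `parser.yy = db`
toyParser.parse('click CUSTOMER href "https://example.com" _blank');
toyParser.parse("click ORDER call showAlert");
```

The parser never stores anything itself. Swap in a different object on yy and the same grammar would fill a different database.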

Hindsight: I skipped the New Diagram (Jison) helper doc because it was about implementing a new diagram, and because it is marked as deprecated. But actually this document does a decent job explaining how some mermaid parts fit together like the yy binding, db pattern, etc. Sometimes the recon stage is hard because you don't know what you don't know!

The deprecated status is more about Mermaid now preferring Langium over Jison moving forward, but for modifying existing Jison-based diagrams this doc is still useful.

While we're calling out Jison's special symbols, the dollar-number combinations like $1, $2, etc. are positional references to the symbols on the right-hand side of a rule, in the order they are written.

Here's a concrete example from the click implementation:

clickStatement
      : CLICK entityName CALL UNICODE_TEXT { $$ = $CLICK; yy.setClickEvent($2, $4); }
     //   ↑       ↑       ↑     ↑                   ↑                      ↑   ↑
     //   $1      $2      $3    $4                  $1            entityName   UNICODE_TEXT

The numbered references and the corresponding named references are interchangeable, unless two symbols in a rule share the same name. For instance,

relationship : entityName relSpec entityName

needs to reference those entityNames with $1 and $3 respectively to unambiguously refer to those two different entityNames.

Where are the generated parsers really?

Having researched what the parsers are and how they work, I wanted to read one to verify my assumptions. So where are they? It isn't obvious at first: the parser TypeScript files look like they're treating the jison files as if they were the parsers themselves.

import flowJisonParser from './flow.jison';

That can't be right. I notice a jisonTransformer.js in Mermaid's .build directory, and a jisonPlugin.ts calling the transformer during the build process whenever a jison file gets read. The Jison parsers are generated at build time, triggered by a TypeScript import! Sneaky! From the point of view of a dev just getting on with things, this is pretty convenient.

But that convenience isn't very useful to me as a learning tool. It means there's an invisible part of the process. I could point Jison at the erDiagram.jison file to generate the parser 'manually', but without the corresponding erDB methods to hand over to the parser in the yy variable the parser can't do anything. So I stubbed out the erDb.js file and made an interactive parser to verify my suspicions and comprehension:

Interactive Parser Visualization

You may step through parsing one character at a time with the left and right arrows to see how Jison tokenizes input, applies grammar rules, and builds the ERD database:

[Interactive parser visualization with three panels: Tokenized Input, Parse Events, ER Database]

Making the interactive parser tool led to a revelation: lexing and parsing are not two distinct stages; they're interleaved. As the generated parser consumes input text, it alternates between asking the lexer for the next token and applying grammar rules to the tokens it already has. The rules lag one token behind the lexer because of the single token of lookahead in an LALR(1) parser.
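Here is a small TypeScript sketch of that interleaving. It is a hand-written predictive parser with one token of lookahead, not an LALR(1) table parser, and PullLexer, PulledToken and parseClicks are hypothetical names. The point is only the order of events in the log.

```typescript
type PulledToken = { type: string; text: string };

// Hands out one token per lex() call, the way a parser-driven lexer does.
class PullLexer {
  private pos = 0;
  private input: string;
  private log: string[];

  constructor(input: string, log: string[]) {
    this.input = input;
    this.log = log;
  }

  lex(): PulledToken {
    const rest = this.input.slice(this.pos);
    const m = /^\s*(\S+)/.exec(rest);
    if (m === null) {
      this.log.push("lex EOF");
      return { type: "EOF", text: "" };
    }
    this.pos += m[0].length;
    const text = m[1];
    const type = text === "click" ? "CLICK" : text === "call" ? "CALL" : "NAME";
    this.log.push(`lex ${type}`);
    return { type, text };
  }
}

// Parses statements of the form: click NAME call NAME
function parseClicks(input: string): string[] {
  const log: string[] = [];
  const lexer = new PullLexer(input, log);
  let lookahead = lexer.lex(); // always one token ahead of the rules

  const expect = (type: string): PulledToken => {
    if (lookahead.type !== type) {
      throw new Error(`expected ${type}, got ${lookahead.type}`);
    }
    const current = lookahead;
    lookahead = lexer.lex();
    return current;
  };

  while (lookahead.type !== "EOF") {
    expect("CLICK");
    const entity = expect("NAME");
    expect("CALL");
    const fn = expect("NAME");
    // The rule's action runs only after the next token has been lexed.
    log.push(`action setClickEvent(${entity.text}, ${fn.text})`);
  }
  return log;
}

const interleavedLog = parseClicks("click A call f\nclick B call g");
```

In the resulting log the second statement's CLICK token is lexed before the first statement's action runs, which is the "one token back" effect described above.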

After researching docs and the codebase, and building a few toys, I can expand on the Mermaid process:

  1. Grammar definition (.jison file): Defines the lexer rules that recognize tokens and the syntax rules whose actions call 'db' methods
  2. Parser generation (build-time): A Vite plugin transforms the .jison file into a JavaScript parser
  3. Parser execution (runtime): The generated parser processes the user's input text, calling those db methods
  4. Database/State (runtime): ErDB stores entities/relationships in-memory (Map/arrays)
  5. Renderer (runtime): Reads from the DB via getData() and generates SVG

Implementation Details

Armed with parser understanding, here's what actually went into the implementation:

Token Ordering is Critical

The lexer processes patterns in order. This seemingly minor detail was crucial:

([A-Za-z_][A-Za-z0-9_]*\s*\([^)]*\))  return 'FUNCTION_CALL';
([^\x00-\x7F]|\w|\-|\*)+              return 'UNICODE_TEXT';

FUNCTION_CALL must come before UNICODE_TEXT or callbacks with arguments like callback(arg1, arg2) get misidentified as plain text. The lexer reads top-to-bottom and returns on first match - order matters. Honestly this is what tripped up the initial blind attempt to emulate the flow diagram's click implementation. Had I just been 'lucky' with that blind attempt I would know much less about parsing and the topics covered in this writeup.
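The effect of rule order can be reproduced outside Jison with a few lines of TypeScript. This is a sketch under assumptions: firstMatch and OrderedRule are hypothetical names, and the UNICODE_TEXT pattern is rewritten slightly for JavaScript regex syntax while matching the same characters.

```typescript
type OrderedRule = { pattern: RegExp; token: string };

const FUNCTION_CALL: OrderedRule = {
  pattern: /^([A-Za-z_][A-Za-z0-9_]*\s*\([^)]*\))/,
  token: "FUNCTION_CALL",
};
const UNICODE_TEXT: OrderedRule = {
  pattern: /^(?:[^\x00-\x7F]|\w|-|\*)+/,
  token: "UNICODE_TEXT",
};

// Return the first rule, in list order, that matches at the start of the input.
function firstMatch(
  input: string,
  rules: OrderedRule[],
): { type: string; text: string } | null {
  for (const rule of rules) {
    const m = rule.pattern.exec(input);
    if (m !== null) return { type: rule.token, text: m[0] };
  }
  return null;
}

const callbackSource = "callback(arg1, arg2)";
// FUNCTION_CALL first: the whole call is one token.
const goodOrder = firstMatch(callbackSource, [FUNCTION_CALL, UNICODE_TEXT]);
// UNICODE_TEXT first: only "callback" is consumed and the "(" is left dangling.
const badOrder = firstMatch(callbackSource, [UNICODE_TEXT, FUNCTION_CALL]);
```

Both patterns match the start of the input, so the only thing deciding the token type is which rule is listed first.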

Ten Grammar Productions

Supporting all click syntax variations required 10 separate grammar productions in clickStatement:

This covers: click A "url", click A href "url", click A href "url" _blank, etc. The href keyword is optional for user convenience.

Parsing Callback Arguments

Splitting callback arguments on commas is tricky when arguments themselves contain commas. The solution uses a lookahead regex:

argList = functionArgs.split(/,(?=(?:(?:[^"]*"){2})*[^"]*$)/);

This splits on commas except when inside quoted strings, so callback("arg with, comma", "another") correctly parses as two arguments. Each argument is then trimmed and unquoted.

If no arguments are provided, the entity name is automatically passed: click CUSTOMER call callback becomes callback("CUSTOMER").
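A self-contained TypeScript sketch of that argument handling follows. Only the split regex comes from the snippet above. parseCallback and ParsedCallback are hypothetical names, and treating empty parentheses the same as a missing argument list is an assumption of this sketch.

```typescript
// Split on commas followed by an even number of double quotes,
// i.e. commas that are not inside a quoted string.
const ARG_SEPARATOR = /,(?=(?:(?:[^"]*"){2})*[^"]*$)/;

type ParsedCallback = { functionName: string; args: string[] };

function parseCallback(entityName: string, callText: string): ParsedCallback {
  const openParen = callText.indexOf("(");
  const closeParen = callText.lastIndexOf(")");
  // No parentheses at all: fall back to passing the entity name.
  if (openParen === -1 || closeParen < openParen) {
    return { functionName: callText.trim(), args: [entityName] };
  }
  const functionName = callText.slice(0, openParen).trim();
  const inner = callText.slice(openParen + 1, closeParen).trim();
  // Assumption of this sketch: empty parentheses also mean "no arguments".
  if (inner === "") {
    return { functionName, args: [entityName] };
  }
  const args = inner.split(ARG_SEPARATOR).map((raw) => {
    const item = raw.trim();
    const quoted = item.length >= 2 && item.startsWith('"') && item.endsWith('"');
    return quoted ? item.slice(1, -1) : item; // strip surrounding quotes
  });
  return { functionName, args };
}

const withArgs = parseCallback("ORDER", 'showAlert("arg with, comma", "another")');
// withArgs.args is ["arg with, comma", "another"]
const withoutArgs = parseCallback("CUSTOMER", "callback");
// withoutArgs.args is ["CUSTOMER"]
```

The lookahead counts the quotes remaining after each comma. An odd count means the comma sits inside a string, so it is left alone.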

Security Considerations

Security: Click functionality only works when securityLevel='loose'. In strict mode, entities get the clickable class but no actual event handlers.

Additional security measures:

SVG Anchor Wrapping

Making entities clickable requires wrapping them in SVG <a> elements. The renderer does this by manipulating the DOM:

const link = doc.createElementNS('http://www.w3.org/2000/svg', 'a');
// Set href, target, classes...
const entityEl = node.node();
const parent = entityEl?.parentNode;
if (entityEl && parent) {                // Skip if the entity is not in the DOM
  parent.insertBefore(link, entityEl);   // Insert <a> before entity
  link.appendChild(entityEl);            // Move entity inside <a>
}

This pattern inserts the anchor, then reparents the entity node inside it - standard DOM manipulation for SVG.

Deferred Event Binding

Callbacks aren't bound immediately when parsed. Instead, setClickEvent() pushes functions into a funs array:

this.funs.push(() => {
  const elem = document.querySelector(`[id="${entity.id}"]`);
  if (elem !== null) {
    elem.addEventListener('click', () => {
      runFunc(functionName, ...argList);
    }, false);
  }
});

Later, the renderer calls bindFunctions() to execute all stored functions. This deferred binding ensures the DOM elements exist before trying to attach listeners.
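The deferred pattern can be tried without a browser. In the TypeScript sketch below, FakeElement stands in for a DOM element and ClickRegistry for the db. Both are hypothetical, and passing a lookup function to bindFunctions() is an assumption made so the example runs under Node.

```typescript
type Listener = () => void;

// Minimal stand-in for a DOM element so the sketch runs outside a browser.
class FakeElement {
  private listeners: Listener[] = [];

  addEventListener(_type: string, listener: Listener): void {
    this.listeners.push(listener);
  }

  click(): void {
    for (const listener of this.listeners) listener();
  }
}

type ElementLookup = (id: string) => FakeElement | null;

class ClickRegistry {
  private funs: Array<(lookup: ElementLookup) => void> = [];

  // Called while parsing: nothing is rendered yet, so only remember what to do.
  setClickEvent(
    entityId: string,
    callback: (...args: string[]) => void,
    args: string[],
  ): void {
    this.funs.push((lookup) => {
      const elem = lookup(entityId);
      if (elem !== null) {
        elem.addEventListener("click", () => callback(...args));
      }
    });
  }

  // Called by the renderer once the elements exist.
  bindFunctions(lookup: ElementLookup): void {
    for (const fun of this.funs) fun(lookup);
  }
}

const clickCalls: string[] = [];
const registry = new ClickRegistry();
registry.setClickEvent("ORDER", (msg) => clickCalls.push(msg), ["Order clicked!"]);

// "Rendering" happens later; only now do the elements exist.
const rendered: Record<string, FakeElement> = { ORDER: new FakeElement() };
registry.bindFunctions((id) => rendered[id] ?? null);
rendered["ORDER"].click();
// clickCalls is now ["Order clicked!"]
```

Calling setClickEvent() before the element exists is harmless here because the lookup runs only inside bindFunctions().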

Documentation Updates

The PR added comprehensive documentation to both docs/syntax/entityRelationshipDiagram.md and the demos, including:

Working Demo

The following ER diagram demonstrates the implemented functionality:

erDiagram
    direction LR
    CUSTOMER {
        int id PK
        string name
        string email
    }
    ORDER {
        int id PK
        int customer_id FK
        string status
    }
    PRODUCT {
        int id PK
        string name
        decimal price
    }
    CUSTOMER ||--o{ ORDER : places
    ORDER ||--o{ PRODUCT : contains
    click CUSTOMER href "https://github.com/mermaid-js/mermaid/issues/2880" _blank
    click ORDER call showAlert("Order clicked!")
    click PRODUCT "#the-goal"

Try it:

Lessons Learned

What's Next?

The basic functionality works. Future click-related improvements could include: