A language for building DSLs that decouple text from its formatting: a complementary alternative to Markdown and HTML

Spoiler: this language may be of interest not only to frontend developers but also to backend developers. The author has also tried to make it understandable to the general public. Readers who know Markdown and HTML can skip the next paragraph without losing the thread of the article.

As a reminder, the idea of Markdown syntax is to use punctuation characters to define basic text formatting, so that even unconverted text remains human-readable and no special viewer is needed to read it. The special symbols of this language are: \, `, *, _, {, }, [, ], (, ), #, +, -, ., !. When writing an article or just a message, most of us get used to certain conventions for marking up text with headings, paragraphs, indentation, outlining or emoticons. Tools such as Markdown convert these notations into an intermediate language like HTML, and as a result we see the formatted text in a browser or another program. Unlike Markdown, HTML supports far richer text formatting, but its syntax is much more verbose, and the raw text is hard to read without a browser because it is full of embedded tags with properties, style elements, script elements and other technical details.

The format was designed to have the following four properties:

  1. human readability
  2. text formatting is defined declaratively, using user-defined notation
  3. one can easily change user-defined notation and/or its handlers to set different formatting settings
  4. online text translators don't break this format (to the maximum extent possible)

It turned out that this task has an elegant solution, and we got additional features along the way. In particular, with the developed language you can write text and define not only rich formatting settings but also much more complex logic, without losing text readability. The other key feature of this language is simplicity: it uses just three key syntax elements plus one symbol for escaping them, so you can start using it even if you are not a developer. The rules for converting encoded text into other formats such as HTML, Markdown or TeX can be defined in different programming languages. Last but not least, you can refactor the encoded text, renaming operators and variables as you would in any high-level programming language, yet you do not need an advanced IDE for that; any standard text editor will do.

For example, we can change the formatting of the following text

Asimov's most famous work is the Foundation series, the first three books of which won the one-time Hugo Award for "Best All-Time Series" in 1966. His other major series are the Galactic Empire series and the Robot series. The Galactic Empire novels are set in the much earlier history of the same fictional universe as the Foundation series. Later, with Foundation and Earth (1986), he linked this distant future to the Robot stories, creating a unified "future history" for his stories. He also wrote over 380 short stories, including the social science fiction novelette "Nightfall," which in 1964 was voted the best short science fiction story of all time by the Science Fiction Writers of America. Asimov wrote the Lucky Starr series of juvenile science-fiction novels using the pen name Paul French.
(https://en.wikipedia.org/wiki/Isaac_Asimov)
into
Asimov's most famous work is the [Foundation] series, the first three books of which won the one-time Hugo Award for "Best All-Time Series" in 1966. His other major series are the [Galactic Empire] series and the [Robot] series. The [Galactic Empire] novels are set in the much earlier history of the same fictional universe as the Foundation series. Later, with [Foundation and Earth] (1986), he linked this distant future to the [Robot] stories, creating a unified "future history" for his stories. He also wrote over 380 short stories, including the social science fiction novelette "[Nightfall]," which in 1964 was voted the best short science fiction story of all time by the Science Fiction Writers of America. Asimov wrote the [Lucky Starr] series of juvenile science-fiction novels using the pen name Paul French.
(https://en.wikipedia.org/wiki/Isaac_Asimov)
by changing just one line in the code. As you can see, this cannot be achieved with HTML or Markdown alone, since we change not only the text formatting but also the text itself (putting titles in square brackets). In effect, the writer's task is now to write the text and specify only its structure, perhaps defining some categories of elements that share the same formatting. The formatting can be chosen later, and you can even use several formatting styles depending on where the text is published (social network, website, OS, type of mobile device or just file type). The source code of this text fragment is given at the end of the article.

The developed language is called Opt2Code. A free online tool for converting this language into formatted text is available on the website opt2code.com. We are now ready to describe the basic syntax rules of Opt2Code:

1) The core syntax elements are "[[", "]]" and "][". To escape these elements in a text we use the symbol "|". The expression "||" is converted to "|", and a single "|" is converted to the empty string "", so that "|||" is converted to "|", "||||" to "||", and so forth. For instance, to escape the expression "[[" one should write "[|[".
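The escaping rule of item 1 can be sketched in a few lines of JavaScript. This is only an illustration under our own assumptions: the helper name unescapeBars is hypothetical, and we assume that bars are paired greedily from left to right, which reproduces the conversions listed above. A real tokenizer would also have to recognize the core elements before unescaping, so that "[|[" is never treated as "[[".

```javascript
// Sketch of the escape rule: "||" -> "|", a lone "|" -> "".
// `unescapeBars` is a hypothetical helper name, not part of Opt2Code.
function unescapeBars(text) {
  // The regex consumes bars left to right, pairing them greedily:
  // each pair becomes one bar, a leftover single bar disappears.
  return text.replace(/\|\|?/g, (match) => (match === "||" ? "|" : ""));
}

console.log(unescapeBars("[|["));  // [[
console.log(unescapeBars("|||"));  // |
console.log(unescapeBars("||||")); // ||
```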

2) Operators (sections) are defined by expressions of the form "[[nameOfOperator]]", where the operator name does not contain any core syntax elements. An expression of this form is called an operator (section) declaration. An operator is (system) reserved if its name is one of the following reserved words: "sConfStart", "sTagTranslation", "sPropTranslation", "sUserTagTranslation", "sUserPropTranslation", "sTagDef", "sDefDesc", "sConfEnd". All other operators are called user-defined. User-defined operators are declared in the main body of the text; system reserved operators should be placed after the main text and are used to set substitution rules, operator priorities and the definitions of the functions that correspond to user-defined operators.

3) To analyze a text, the Opt2Code tool first splits it into fragments at the system reserved declarations. The first fragment is called the user-defined part of the document; the others are called the configuration parts. Each fragment that follows a reserved declaration, up to the next one, is considered the body of that reserved operator. After analyzing the rules defined in the bodies of the reserved operators, the tool splits the user-defined part of the document at the user-defined declarations. Analogously, each fragment that follows a user-defined declaration, up to the next declaration (or the end of the text, or a reserved declaration), is considered the body of that operator.
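The two-stage splitting of item 3 might look roughly like the following JavaScript sketch. It is not the implementation of the Opt2Code tool: the function splitByDeclarations is hypothetical, and for brevity the sketch ignores escaping with "|" and declarations that carry arguments.

```javascript
// Reserved operator names as listed in item 2.
const RESERVED = new Set([
  "sConfStart", "sTagTranslation", "sPropTranslation", "sUserTagTranslation",
  "sUserPropTranslation", "sTagDef", "sDefDesc", "sConfEnd",
]);

// Hypothetical helper: cuts `source` at every "[[name]]" accepted by `accept`.
// Returns [{ name, body }]; the first entry (name === null) holds the text
// found before the first accepted declaration.
function splitByDeclarations(source, accept) {
  const parts = [{ name: null, body: "" }];
  let last = 0;
  for (const m of source.matchAll(/\[\[([^\[\]]*)\]\]/g)) {
    if (!accept(m[1])) continue; // not a cut point, stays inside the body
    parts[parts.length - 1].body = source.slice(last, m.index);
    parts.push({ name: m[1], body: "" });
    last = m.index + m[0].length;
  }
  parts[parts.length - 1].body = source.slice(last);
  return parts;
}

const doc = "[[pf]] Hello [[blue]]world[[]]!\n" +
  "[[sConfStart]]\n[[sUserTagTranslation]]\n[[pf]=paragraph\n[[sConfEnd]]";

// Stage 1: cut at reserved declarations only.
const [userPart, ...configParts] = splitByDeclarations(doc, (n) => RESERVED.has(n));
// Stage 2: cut the user-defined part at every remaining declaration.
const operators = splitByDeclarations(userPart.body, () => true).slice(1);

console.log(configParts.map((p) => p.name));
console.log(operators);
```

Note that a rule line such as "[[pf]=paragraph" is not mistaken for a declaration here, because it lacks the closing "]]".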

4) As you may guess, an operator can define formatting for its own body and for the operators that follow it, if they are considered subordinate to it. Using the operator priorities, the Opt2Code tool constructs an ordered sequence of tree structures and invokes the corresponding function for each operator, proceeding from the leaves of the trees to their roots. The output of the algorithm is the concatenation of the results of these tree transformations. It can be HTML-formatted text, Markdown-based text, or a list of in-memory objects used for displaying formatted text.
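To make item 4 more tangible, here is a small JavaScript sketch of one way such trees could be built and evaluated. The article does not spell out the exact algorithm, so everything here is an assumption: the names buildForest and render, the stack-based nesting rule, and the convention that a smaller level number means a higher position in the tree (the convention used by sPriorityLevel in the final example of this article).

```javascript
// Builds an ordered list of trees from a flat operator sequence.
// Assumption: an operator becomes a child of the nearest preceding
// operator that has a strictly smaller level number.
function buildForest(operators, levelOf) {
  const roots = [];
  const stack = []; // path from the current root down to the last node
  for (const op of operators) {
    const node = { name: op.name, text: op.body, level: levelOf(op.name), children: [] };
    while (stack.length > 0 && stack[stack.length - 1].level >= node.level) stack.pop();
    if (stack.length === 0) roots.push(node);
    else stack[stack.length - 1].children.push(node);
    stack.push(node);
  }
  return roots;
}

// Evaluates a tree from the leaves to the root. An operator without
// a handler produces the empty string.
function render(node, handlers) {
  const children = node.children.map((child) => render(child, handlers));
  const handler = handlers.get(node.name);
  return handler ? handler({ name: node.name, text: node.text, children }) : "";
}

const operators = [
  { name: "paragraph", body: "We love playing " },
  { name: "blue", body: "tennis" },
  { name: "", body: " with friends." },
];
const levels = new Map([["paragraph", 10], ["blue", 100], ["", 100]]);
const handlers = new Map([
  ["paragraph", (it) => `<p>${it.text}${it.children.join("")}</p>`],
  ["blue", (it) => `<span style="color:blue">${it.text}${it.children.join("")}</span>`],
  ["", (it) => `<span>${it.text}${it.children.join("")}</span>`],
]);

const html = buildForest(operators, (name) => levels.get(name) ?? 100)
  .map((root) => render(root, handlers))
  .join("");
console.log(html);
// <p>We love playing <span style="color:blue">tennis</span><span> with friends.</span></p>
```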

Let us start with a simple example. Suppose the user would like to produce the following text, with the word "tennis" shown in blue:

 

We love playing tennis with friends. This summer...

To do this, the user can write the code
 [[paragraph]] We love playing [[inBlue]]tennis[[text]] with friends. This summer... 
or
 [[paragraph]] We love playing [[blue]]tennis[[]] with friends. This summer...
or
 [[pf]] We love playing [[operatorNameForHobbies]]tennis[[]] with friends. This summer...
As you can see, the user can use any descriptive name for an operator in double square brackets. Note that the priority of the operator "pf" (or "paragraph") should be higher than that of the operators [[inBlue]], [[text]], [[]] and so forth, so that this operator is applied to the whole text. If an abbreviation is used, a substitution rule such as "[[pf]=paragraph" should be added to the predefined section "sUserTagTranslation", and a description such as "[[pf]- for creating new paragraph" should be added to the section "sDefDesc". See the example:

[[sConfStart]]

[[sUserTagTranslation]]

   [[h2]=h2
   [[h3]=h3
   [[pfh]=paragraphHeader
   [[pf]=paragraph

[[sUserPropTranslation]]

[[sDefDesc]]
 [[h2]- for header
 [[h3]- for smaller header
 [[pfh]- for paragraph header style
 [[pf]- for creating new paragraph with custom style

[[sConfEnd]]
          
          
This rule syntax ensures that any operator can be renamed simply by replacing the expression "[[oldName]" with "[[newName]" in any text editor.
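The same renaming can of course be scripted. The sketch below uses a hypothetical helper renameOperator and plain string replacement; it shows why the search pattern ends with a single "]": the pattern "[[pf]" matches declarations, rules and descriptions of "pf" but leaves "[[pfh]" untouched.

```javascript
// Hypothetical helper: renames an operator everywhere in the source,
// exactly as a search-and-replace in a text editor would.
function renameOperator(source, oldName, newName) {
  return source.split(`[[${oldName}]`).join(`[[${newName}]`);
}

const before = [
  "[[pfh]] Title",
  "[[pf]] Some text",
  "[[sConfStart]]",
  "[[sUserTagTranslation]]",
  "[[pfh]=paragraphHeader",
  "[[pf]=paragraph",
  "[[sDefDesc]]",
  "[[pf]- for creating new paragraph",
  "[[sConfEnd]]",
].join("\n");

console.log(renameOperator(before, "pf", "para"));
```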

To be usable, user-defined operators should be mapped in the system reserved section "sUserTagTranslation" either to predefined functions known to the system (for example, when the text is processed by the converter based on embedded rules on http://opt2code.com) or to functions defined in JavaScript in the section "sTagDef" (for the converter based on user-defined rules on http://opt2code.com). Since these rules can be defined after the entire text has been written, the user can fully concentrate on writing. The user can also later adopt rules published by other users. By default, if no function is found for an operator, the result is the empty string.
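As an illustration of this lookup, the following JavaScript sketch reads substitution rules of the form "[[pf]=paragraph" and applies the function found for an operator, falling back to the empty string. The names parseTagRules and applyOperator are hypothetical, and following a chain of several substitutions is our assumption rather than documented behavior of the converter.

```javascript
// Reads lines of the form "[[alias]=target" from a section body.
function parseTagRules(sectionBody) {
  const rules = new Map();
  for (const line of sectionBody.split("\n")) {
    const m = line.trim().match(/^\[\[([^\[\]]*)\]=(.*)$/);
    if (m) rules.set(m[1], m[2].trim());
  }
  return rules;
}

// Follows the substitution rules (guarding against cycles such as
// "[[h2]=h2") and calls the resulting function, if there is one.
function applyOperator(name, it, rules, functions) {
  const seen = new Set();
  let target = name;
  while (rules.has(target) && !seen.has(target)) {
    seen.add(target);
    target = rules.get(target);
  }
  const fn = functions.get(target);
  return fn ? fn(it) : ""; // default: empty string when nothing is found
}

const rules = parseTagRules("   [[h2]=h2\n   [[pfh]=paragraphHeader\n   [[pf]=paragraph\n");
const functions = new Map([
  ["h2", (it) => `<h2>${it.text}</h2>`],
  ["paragraph", (it) => `<p>${it.text}</p>`],
]);

console.log(applyOperator("pf", { text: "Hi" }, rules, functions));  // <p>Hi</p>
console.log(applyOperator("pfh", { text: "Hi" }, rules, functions)); // (empty string)
```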

If you are somewhat familiar with HTML, you can see a remote analogy with the language we define. Indeed, the tree of HTML tags corresponds to the tree of operators. The attentive reader may now ask how the properties of an HTML tag (<tagName propName1="value1" propName2="value2" propName3="value3" ...>) correspond to the Opt2Code syntax. In general, a user-defined operator declaration is an expression of the form

"[[nameOfOperator][arg1expr][arg2expr]...[argNexpr]]",
where an argument expression does not contain the symbol "]". Each "argXexpr" in the declaration above is called an argument expression of the operator. As with tags in HTML, we can assign a value to a named property using the expression "propertyName=propertyValue" in an argument expression, so that, say, it can influence the formatting of the text, but the idea behind argument expressions is different. As with operators, we encourage user-defined argument expressions that the user can redefine later. This can be done after the text is written, in the section "sUserPropTranslation", using an expression of the form
"][argumentExpression]=propertyName=propertyValue"
or
"][argumentExpression]=otherArgumentExpression".
As with operators, it is recommended to document the meaning of an argument expression in the section "sDefDesc", using an expression of the form "][argumentExpression]- the description text". And, as with operators, this syntax allows us to rename any argument expression by replacing "][oldArgumentExpression]" with "][newArgumentExpression]" in any text editor.
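The following JavaScript sketch shows how a declaration with argument expressions could be taken apart and how its arguments could be resolved through rules of the form "][argumentExpression]=propertyName=propertyValue". The helpers parseDeclaration and resolveArgs are hypothetical and ignore escaping; they are meant only to illustrate the syntax described above.

```javascript
// "[[var][contactEmailArg]]" -> { name: "var", args: ["contactEmailArg"] }
function parseDeclaration(decl) {
  if (!decl.startsWith("[[") || !decl.endsWith("]]")) return null;
  const [name, ...args] = decl.slice(2, -2).split("][");
  return { name, args };
}

// Rewrites each argument expression with the given rules until it has
// the form "propertyName=propertyValue", then collects the properties.
function resolveArgs(args, propRules) {
  const props = {};
  for (let arg of args) {
    const seen = new Set(); // guards against cyclic rules
    while (propRules.has(arg) && !seen.has(arg)) {
      seen.add(arg);
      arg = propRules.get(arg);
    }
    const eq = arg.indexOf("=");
    if (eq > 0) props[arg.slice(0, eq)] = arg.slice(eq + 1);
  }
  return props;
}

// Corresponds to the rule "][contactEmailArg]=sValue=test@test.com".
const propRules = new Map([["contactEmailArg", "sValue=test@test.com"]]);

const decl = parseDeclaration("[[var][contactEmailArg]]");
const props = resolveArgs(decl.args, propRules);
console.log(decl.name, props.sValue); // var test@test.com
```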

Now consider the code


[[h2]]
Terms of Use


[[pf]]...

[[pf]]
Any rights not expressly granted herein are reserved.


[[pfh]]
Contact Information
[[pf]]
We welcome your comments regarding the Terms of Use. If you believe that the Site has not adhered to this Agreement, please contact us. We will use commercially reasonable efforts to promptly determine and remedy the problem.

[[pf]] Email Contact: [[var][contactEmailArg]]

[[pfh]]
Entire User Agreement

[[pf]]
This Agreement is the complete and entire agreement between the parties and supersedes any prior agreement, whether written or oral.

[[sConfStart]]
[[sUserTagTranslation]]

[[h2]=h2
[[h3]=h3
[[pfh]=paragraphHeader
[[pf]=paragraph
[[ul]=ul
[[li]=li
[[var]=var

[[sUserPropTranslation]]

][contactEmailArg]=sValue=test@test.com


[[sConfEnd]]
        
        
Here the argument expression contactEmailArg is rewritten in the section "sUserPropTranslation" into sValue=test@test.com, and "var" is an embedded system operator defined to return the value of its property "sValue". Using the online converter Opt2Code (based on embedded rules), we get

Terms of Use

...

Any rights not expressly granted herein are reserved.

Contact Information

We welcome your comments regarding the Terms of Use. If you believe that the Site has not adhered to this Agreement, please contact us. We will use commercially reasonable efforts to promptly determine and remedy the problem.

Email Contact: test@test.com

Entire User Agreement

This Agreement is the complete and entire agreement between the parties and supersedes any prior agreement, whether written or oral.

We have not yet discussed how Opt2Code helps to avoid problems with online translators. Human-readable names of operators and argument expressions may be translated into another language, and that will inevitably break the code. The Opt2Code online tool solves this problem by transforming the initial code into a digital form: it renames user-defined operators and argument expressions to numbers and adds substitution rules to the system sections reserved for this purpose, which store the mapping between the old and the new notation and make the operation reversible. As a result, the user-defined part of the text contains only numeric operator expressions, which online translators in most cases leave unchanged, so the translated text keeps its formatting settings (provided, of course, that the order of these operators is not changed). We call the resulting code the digital form (of Opt2Code code), and the result of the backward transformation the user form (of Opt2Code code). See the chain of transformations below: initial view, digital form, digital form translated into German, and user form of the translated text.
The initial view:


[[h2]]
Terms of Use


[[pf]]...

[[pf]]
Any rights not expressly granted herein are reserved.


[[pfh]]
Contact Information
[[pf]]
We welcome your comments regarding the Terms of Use. If you believe that the Site has not adhered to this Agreement, please contact us. We will use commercially reasonable efforts to promptly determine and remedy the problem.

[[pf]] Email Contact: [[var][contactEmailArg]]

[[pfh]]
Entire User Agreement

[[pf]]
This Agreement is the complete and entire agreement between the parties and supersedes any prior agreement, whether written or oral.

[[sConfStart]]
[[sUserTagTranslation]]

[[pfh]=paragraphHeader
[[pf]=paragraph

[[sUserPropTranslation]]

][contactEmailArg]=sValue=test@test.com


[[sConfEnd]]
         
The digital form:

[[1]]
Terms of Use


[[2]]...

[[2]]
Any rights not expressly granted herein are reserved.


[[3]]
Contact Information
[[2]]
We welcome your comments regarding the Terms of Use. If you believe that the Site has not adhered to this Agreement, please contact us. We will use commercially reasonable efforts to promptly determine and remedy the problem.

[[2]] Email Contact: [[4][1]]

[[3]]
Entire User Agreement

[[2]]
This Agreement is the complete and entire agreement between the parties and supersedes any prior agreement, whether written or oral.

[[sConfStart]]
[[sTagTranslation]]
[[1]=h2
[[2]=pf
[[3]=pfh
[[4]=var
[[sPropTranslation]]
][1]=contactEmailArg
[[sUserTagTranslation]]

[[pfh]=paragraphHeader
[[pf]=paragraph

[[sUserPropTranslation]]

][contactEmailArg]=sValue=test@test.com


[[sConfEnd]] 
                  
          
The translated digital form (we translate only the user-defined part of the text):

[[1]]
Nutzungsbedingungen


[[2]]...

[[2]]
Alle hier nicht ausdrücklich gewährten Rechte sind vorbehalten.


[[3]]
Kontaktinformationen
[[2]]
Wir freuen uns über Ihre Kommentare zu den Nutzungsbedingungen. Wenn Sie der Meinung sind, dass die Website nicht mit dieser Vereinbarung übereinstimmt, kontaktieren Sie uns bitte. Wir werden alle wirtschaftlich vertretbaren Anstrengungen unternehmen, um das Problem umgehend zu ermitteln und zu beheben.

[[2]] E-Mail-Kontakt: [[4][1]]

[[3]]
Gesamte Nutzungsvereinbarung

[[2]]
Diese Vereinbarung ist die vollständige und gesamte Vereinbarung zwischen den Parteien und ersetzt alle vorherigen Vereinbarungen, ob schriftlich oder mündlich.

[[sConfStart]]
[[sTagTranslation]]
[[1]=h2
[[2]=pf
[[3]=pfh
[[4]=var
[[sPropTranslation]]
][1]=contactEmailArg
[[sUserTagTranslation]]

[[pfh]=paragraphHeader
[[pf]=paragraph

[[sUserPropTranslation]]

][contactEmailArg]=sValue=test@test.com


[[sConfEnd]] 
The translated user form:


[[h2]]
Nutzungsbedingungen


[[pf]]...

[[pf]]
Alle hier nicht ausdrücklich gewährten Rechte sind vorbehalten.


[[pfh]]
Kontaktinformationen
[[pf]]
Wir freuen uns über Ihre Kommentare zu den Nutzungsbedingungen. Wenn Sie der Meinung sind, dass die Website nicht mit dieser Vereinbarung übereinstimmt, kontaktieren Sie uns bitte. Wir werden alle wirtschaftlich vertretbaren Anstrengungen unternehmen, um das Problem umgehend zu ermitteln und zu beheben.

[[pf]] E-Mail-Kontakt: [[var][contactEmailArg]]

[[pfh]]
Gesamte Nutzungsvereinbarung

[[pf]]
Diese Vereinbarung ist die vollständige und gesamte Vereinbarung zwischen den Parteien und ersetzt alle vorherigen Vereinbarungen, ob schriftlich oder mündlich.

[[sConfStart]]
[[sUserTagTranslation]]

[[pfh]=paragraphHeader
[[pf]=paragraph

[[sUserPropTranslation]]

][contactEmailArg]=sValue=test@test.com


[[sConfEnd]]
      
After transforming the above text into the HTML view with the embedded rules (on the website opt2code.com), we get

Nutzungsbedingungen

...

Alle hier nicht ausdrücklich gewährten Rechte sind vorbehalten.

Kontaktinformationen

Wir freuen uns über Ihre Kommentare zu den Nutzungsbedingungen. Wenn Sie der Meinung sind, dass die Website nicht mit dieser Vereinbarung übereinstimmt, kontaktieren Sie uns bitte. Wir werden alle wirtschaftlich vertretbaren Anstrengungen unternehmen, um das Problem umgehend zu ermitteln und zu beheben.

E-Mail-Kontakt: test@test.com

Gesamte Nutzungsvereinbarung

Diese Vereinbarung ist die vollständige und gesamte Vereinbarung zwischen den Parteien und ersetzt alle vorherigen Vereinbarungen, ob schriftlich oder mündlich.
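The renaming to numbers and back, as used in the chain of transformations above, can be sketched in JavaScript as follows. The function names are hypothetical, numbers are assigned in order of first appearance (which reproduces the numbering of the example), and leaving the empty declaration "[[]]" untouched is our assumption. The sketch works on the user-defined part only and does not handle escaping.

```javascript
// Matches "[[name]]" and "[[name][arg1]...[argN]]".
const DECLARATION = /\[\[([^\[\]]*)((?:\]\[[^\]]*)*)\]\]/g;

// Rewrites every declaration, mapping its name and arguments.
function rewriteDeclarations(text, mapName, mapArg) {
  return text.replace(DECLARATION, (whole, name, rest) => {
    if (name === "" && rest === "") return whole; // assumption: keep "[[]]"
    const args = rest === "" ? [] : rest.slice(2).split("][");
    const parts = [mapName(name), ...args.map((arg) => mapArg(arg))];
    return "[[" + parts.join("][") + "]]";
  });
}

// User form -> digital form, plus the rules that make it reversible.
function toDigitalForm(userPart) {
  const tagIds = new Map();
  const propIds = new Map();
  const idFor = (ids) => (key) => {
    if (!ids.has(key)) ids.set(key, String(ids.size + 1));
    return ids.get(key);
  };
  const text = rewriteDeclarations(userPart, idFor(tagIds), idFor(propIds));
  const tagRules = [...tagIds].map(([name, id]) => `[[${id}]=${name}`);
  const propRules = [...propIds].map(([name, id]) => `][${id}]=${name}`);
  return { text, tagIds, propIds, tagRules, propRules };
}

// Digital form -> user form, using the stored mappings.
function toUserForm(text, tagIds, propIds) {
  const invert = (ids) => {
    const back = new Map([...ids].map(([name, id]) => [id, name]));
    return (id) => back.get(id) ?? id;
  };
  return rewriteDeclarations(text, invert(tagIds), invert(propIds));
}

const source = "[[h2]]\nTerms of Use\n[[pf]]...\n[[pfh]]\nContact Information\n" +
  "[[pf]] Email Contact: [[var][contactEmailArg]]";
const digital = toDigitalForm(source);
console.log(digital.text);
console.log(digital.tagRules.join("\n"));
console.log(digital.propRules.join("\n"));
console.log(toUserForm(digital.text, digital.tagIds, digital.propIds) === source);
```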

As promised, here is the code of the example mentioned at the beginning of the article (see comments in it):


 [[p]]... 
[[citation]]      Asimov's most famous work is the [[wn]]Foundation[[]] series, the first three books of which won the one-time Hugo Award for "Best All-Time Series" in 1966.
        His other major series are the [[wn]]Galactic Empire[[]] series and the [[wn]]Robot[[]] series. The [[wn]]Galactic Empire[[]] novels are set in the much earlier history of the same fictional universe
        as the Foundation series.
        Later, with [[wn]]Foundation and Earth[[]] (1986), he linked this distant future to the [[wn]]Robot[[]] stories, creating a unified "future history" for his stories.
        He also wrote over 380 short stories, including the social science fiction novelette "[[wn]]Nightfall[[]]," which in 1964 was voted the best short science
        fiction story of all time by the Science Fiction Writers of America. Asimov wrote the [[wn]]Lucky Starr[[]] series of juvenile science-fiction novels using the pen name Paul French.
        [[br]]([[i]]https://en.wikipedia.org/wiki/Isaac_Asimov[[]])
[[p]]just changing one line in the code (from [[wn]=artWorkNameOldStyle to [[wn]=artWorkName or changing the definition of "artWorkName" in the section sTagDef).


[[sConfStart]]

   [[sUserTagTranslation]]
   Section for establishing replacement rules for operators
      [[wn]=artWorkName

   [[sTagPropDeclaration]]
   This section sets default values for the properties of operators. Some system properties, such as sPriorityLevel, set the priority of operators,
   which is used to compute the hierarchy of operators, i.e. their parent-child relationships.
   Note that the higher the value of sPriorityLevel, the lower the position of the corresponding operator in the hierarchy (the tree of operators).

     [[][sPriorityLevel=100]]
     [[artWorkName][sPriorityLevel=100]]
     [[br][sPriorityLevel=100]]
     [[i][sPriorityLevel=100]]
     [[citation][sPriorityLevel=10]]
     [[p][sPriorityLevel=10]]

   [[sTagDef]]
     Every rule in this section is executed according to the order of the corresponding operators.
     See also the comment after these rules.
     [[]=return `<span>${it.text}${it.children.join('')}</span>`;

     [[artWorkNameOldStyle]=
     return `<i style="color:grey">${it.text}${it.children.join('')}</i>`

     [[artWorkName]=
     return `<i style="color:blue">[${it.text}${it.children.join('')}]</i>`

     [[br]= return `<${it.name}/><span>${it.text}${it.children.join('')}</span>`

     [[i]= return `<${it.name}>${it.text}${it.children.join('')}</${it.name}>`;

     [[citation]=
     return `<div style="background:#f6f8fa;margin-top:0em;margin-left: 1em;margin-right: 1em; max-height: 15em; overflow-y: auto;">${it.text}${it.children.join('')}</div>`;

     [[p]=return `<${it.name}>${it.text}${it.children.join('')}</${it.name}>`;


  /* This is JavaScript code. To compute the string (`...`) with embedded expressions we use the JavaScript template literal syntax. By the way, that is why we can use the JavaScript comment syntax here (otherwise the code could break).
     The system wraps this code in the body of an anonymous function and executes it with one argument "it", where "it" is a JavaScript object containing the properties of the corresponding operator, context properties (for example, it.name is the name of this mapping ("p") and it.text is the body of the operator) and the results of calling the functions of the child operators of the current one (it.children).

      As you can see, the result of this function is meant to be HTML code. Use the converter "HTML view by user-defined rules" on the website opt2code.com to see how this code is displayed in a browser.*/


  [[sDefDesc]]
    [[artWorkNameOldStyle]- previous name for artwork titles
    [[artWorkName]- for artwork titles
    [[wn]- alias for artwork titles
    [[]- for default formatting of the text
    [[citation]- for citing
    [[br]- for line break
    [[i]- for italic font
    [[p]- for starting a new paragraph


[[sConfEnd]]      

You can check the conversion of this code into the HTML view with the converter "HTML view by user-defined rules" on opt2code.com.
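The comment in the section sTagDef above says that each rule body is wrapped in an anonymous function with the single argument "it". One way to imitate this in plain JavaScript is the standard Function constructor, as in the sketch below. Whether the converter works exactly this way is our assumption, and the helper name compileRule is hypothetical. Since this approach executes the rule text as code, it is only suitable for rules you trust.

```javascript
// Turns the text of a rule body into a function of one argument "it".
function compileRule(body) {
  return new Function("it", body);
}

// Rule bodies copied from the example: "[[p]=..." and "[[artWorkName]=...".
const pRule = compileRule(
  "return `<${it.name}>${it.text}${it.children.join('')}</${it.name}>`;"
);
const artWorkName = compileRule(
  "return `<i style=\"color:blue\">[${it.text}${it.children.join('')}]</i>`"
);

// Children are rendered first, then their results are passed to the parent.
const title = artWorkName({ name: "artWorkName", text: "Foundation", children: [] });
const paragraph = pRule({ name: "p", text: "Asimov wrote ", children: [title] });
console.log(paragraph);
// <p>Asimov wrote <i style="color:blue">[Foundation]</i></p>
```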

In conclusion, note that the described format can be used from within any high-level language, with the functions corresponding to user-defined operators defined in that language. This is possible because user-defined operators are matched with functions by name only. For example, the embedded rules of the Opt2Code converter on opt2code.com are written in Kotlin. The use of this format is not limited to text formatting: the algorithm behind it can call any function of a high-level language, so the format in effect declaratively defines a composition of functions. For more information, see the specification of the language.

© 2022-present opt2code.com. All Rights Reserved.