JavaScript: lexical structure

The lexical structure of a programming language is the set of elementary rules that specifies how you write programs in it: what variable names look like, which characters delimit comments, which words are keywords, and so on.

Text

JavaScript is a case-sensitive language, which means that "abc" and "ABC" are treated as different names. JavaScript ignores spaces between tokens and, for the most part, also ignores line breaks, with some exceptions that are covered in the Semicolons section of this article. Because of this, you can format and indent your code neatly and consistently, which makes it easier to read.
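
As a small illustration of both points, the sketch below declares two variables whose names differ only in case and then writes the same expression with two different layouts. The variable names are made up for this example.

```javascript
// Names that differ only in case are two separate variables.
const abc = 1;
const ABC = 2;

// Spaces and line breaks between tokens do not change the meaning.
const compact = abc+ABC;
const spaced =
  abc +
  ABC;

console.log(abc === ABC);        // false: different variables, different values
console.log(compact === spaced); // true: same expression, different layout
```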

Comments

JavaScript supports two styles of comments. Anything between "//" and the end of the line is a comment and is ignored by JavaScript. Anything between "/*" and "*/" is also treated as a comment; this style can span multiple lines.
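
A short sketch of both comment styles follows. The variable name count is only an illustration.

```javascript
// A single-line comment runs to the end of the line.
let count = 0; // it can also follow code on the same line

/*
 * A block comment can span
 * several lines.
 */
count = count + /* or sit in the middle of an expression */ 1;

console.log(count); // 1
```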

Literals

A literal is a data value that appears directly in a program. The following are all literals:

12 // the number twelve
true // a Boolean value
"helloworld" // a string of text

Identifiers and reserved words

An identifier is simply a name. In JavaScript, identifiers are used to name variables, constants, functions, classes, and so on. An identifier must begin with a letter, an underscore (_), or a dollar sign ($). The subsequent characters can be letters, digits, underscores, or dollar signs. JavaScript reserves certain identifiers, such as if and while, for use by the language itself. These reserved words cannot be used as regular identifiers.
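
The rules above can be sketched with a few declarations. All of the names are hypothetical, and the invalid ones are left as comments because they would stop the whole script from parsing.

```javascript
// Valid identifiers: they start with a letter, underscore, or dollar sign.
const i = 1;
const _private = 2;
const $element = 3;
const userName2 = 4; // digits are allowed after the first character

// Invalid identifiers (uncommenting either line causes a SyntaxError):
// const 2fast = 5;  // starts with a digit
// const if = 6;     // "if" is a reserved word

console.log(i + _private + $element + userName2); // 10
```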

Unicode

JavaScript programs are written using the Unicode character set. You can use any Unicode character in strings and comments. It is common to use only ASCII letters and digits in identifiers, but JavaScript also allows Unicode letters, digits, and ideographs (but not emojis) in identifiers. This means that you can use mathematical symbols and words from other languages in identifiers.

const π = 3.14;
const sí = true;

Unicode escape sequences

Older hardware and some software cannot display, input, or correctly process the full set of Unicode characters. To support them, JavaScript defines escape sequences that let you write Unicode characters using only ASCII characters. These Unicode escapes begin with \u and are followed either by exactly four hexadecimal digits or by one to six hexadecimal digits enclosed in curly braces. Unicode escape sequences may appear in JavaScript string literals, regular expression literals, and identifiers, but not in language keywords.

let café = 1;  // define a variable
caf\u00e9;    // => 1: access variable using escape sequence
caf\u{E9};    // => 1: another form of same escape sequence
console.log("\u{1F600}"); // Prints a smiley face emoji

Early versions of JavaScript supported only the four-digit escape sequence. The version with curly braces was introduced in ES6 to support code points that require more than 16 bits, such as emoji.
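
To make the difference between the two forms concrete, here is a minimal sketch that compares them inside string literals. The variable names are illustrative. It also shows that a code point above 16 bits takes up two UTF-16 code units in a string, which is why four hexadecimal digits are not enough to write it as a single escape.

```javascript
// Both escape forms produce the same character for code points up to FFFF.
const fourDigit = "caf\u00e9";
const braces = "caf\u{E9}";
console.log(fourDigit === braces); // true

// A code point that needs more than 16 bits.
const smiley = "\u{1F600}";
console.log(smiley.codePointAt(0).toString(16)); // "1f600"
console.log(smiley.length); // 2: stored as two 16-bit code units

// With four-digit escapes only, the same character must be written
// as a surrogate pair.
const surrogatePair = "\uD83D\uDE00";
console.log(smiley === surrogatePair); // true
```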

Semicolons

Like many other programming languages, JavaScript uses the semicolon (;) to separate statements. Semicolons are often optional, however, because JavaScript can treat a line break as a semicolon. It does not do so for every line break: it usually treats a line break as a semicolon only if it cannot parse the code without adding an implicit semicolon. Consider the following example:

let a
a
=
3
console.log(a)
// JavaScript interprets this code as:
let a; a = 3; console.log(a)

JavaScript treats the first line break as a semicolon because it cannot parse the code let a a.
But it does not treat the second line break as a semicolon because it can continue parsing the code as a = 3.

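There are some exceptions to this rule. With the following constructs, a line break matters even if the code could be parsed without treating it as a semicolon:

  1. return

  2. break

  3. continue

  4. postfix operators (++ and --)

  5. arrow function syntax (the => must be on the same line as the parameter list)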

return 
true;
// JavaScript treats the line break as a semicolon even though the code looks parsable without one:
return; true;
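
The return and postfix operator cases from the list above can be checked with a short runnable sketch. The function and variable names are hypothetical and exist only for this demonstration.

```javascript
// The line break after return ends the statement, so this is
// parsed as "return; true;" and the function returns undefined.
function brokenReturn() {
  return
  true;
}

// Keeping the value on the same line as return works as expected.
function workingReturn() {
  return true;
}

console.log(brokenReturn());  // undefined
console.log(workingReturn()); // true

// A ++ on its own line is parsed as a prefix operator for the code
// that follows it, not as a postfix operator for the code before it.
let x = 1;
let y = 1;
x
++
y
console.log(x, y); // 1 2: parsed as "x; ++y;"
```

A simple way to stay clear of these cases is to keep the returned expression on the same line as return and to keep ++ and -- on the same line as the operand they apply to.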

Source:

  1. JavaScript: The Definitive Guide, 7th Edition by David Flanagan