= 5
x
def modify():
global x
= 6
x
modify()print(x)
-- 6
Ryan Heslin
January 18, 2023
Most programming languages have some notion of scope: a frame of names bound to values. A program will involve many interlocking scopes, all associated with constructs like functions and (in some languages) blocks. Each language has its own rules for managing scope. These rules determine how references to variable names are resolved at runtime. They typically organize scopes into a hierarchy and determine which scope “wins” when the same name exists in multiple scopes visible at a given time. An important part of scoping rules is what happens when a name is looked up in a scope where it is not bound to any object. As we will see, different languages have different ways of handling this case.
A common pattern is called lexical scope. This means that objects, particularly functions, consider their parent scope to be the scope where they were defined. In a lexically scoped language, if a function can’t find an object called x
in its execution environment, the interpreter or compiler would look next in whichever environment the function was defined in (often called the enclosing environment). This behavior has the advantage of consistency, since a given function’s enclosing environment is the same no matter where the function is called.
Languages with lexical scope often allow assignments in a function’s enclosing environment. If a function is defined in the global environment, this is usually bad idea, since it violates the principle that functions should avoid unnecessary side effects. But side effects are no issue if that enclosing environment is another function’s execution environment. Suppose f
is a function that returns a function g
when called. g
’s enclosing environment, the place it was defined, is f
’s execution environment. So any binding g
makes in its enclosing environment when it is called is visible only to subsequent calls to g
! This fact makes it possible for functions to read and write a private cache, enabling a host of powerful programming techniques.
This post compares the syntax several languages use for assignments in enclosing scope. We’ll implement a common pattern in each and contrast the languages’ very different means of supporting the same basic idea.
That pattern is called memoization. It can be used to write functions that “remember” their result for a given input after computing it the first time. This approach trades performance for memory; used well, it can speed up code by saving the results of expensive computations instead of repeating them.
We’ll compare how several popular languages approach lexical scope using a simple application called neighbors
. neighbors
is a function that returns a memoized function that finds the four non-diagonal neighbors of a point on a two-dimensional Cartesian grid. Each time this function is called on a new point, it computes that point’s neighbors and stores them in an associative data structure in its enclosing environment. Subsequent calls with the same argument find the cached value and return it, preventing repeated computation. While trivial, this technique might be useful on certain Advent of Code problems, where it might be necessary to check a point’s neighbors many times.
For simplicity, I make no attempt to verify that inputs as two-dimensional coordinates, which a real implementation would have to do.
We may as well start with good old Python. One of Python’s core design principles is “explicit is better than implicit.” So it is with lexical scope. If you want to modify a variable in the global environment from within a function (which you probably shouldn’t), you have to use the global
keyword:
There is a different keyword, nonlocal
, that should be used when the enclosing environemnt is a function’s evaluation environment. Here is neighbors
in Python:
def neighbors():
memo = {}
def inner(x):
nonlocal memo
if x not in memo.keys():
memo[x] = ((x[0] - 1, x[1]),
(x[0] + 1, x[1]),
(x[0], x[1] - 1),
(x[0], x[1] + 1))
return memo[x]
return inner
get_neighbors = neighbors()
get_neighbors((2, 3))
-- ((1, 3), (3, 3), (2, 2), (2, 4))
-- ((0, 4), (2, 4), (1, 3), (1, 5))
-- ((1, 3), (3, 3), (2, 2), (2, 4))
This implementation shows off one of my favorite Python features: any immutable object can serve as a dict key, not just strings. We can just use coordinates as indices. The use of keywords makes it impossible to use global data unless you really want to, but it’s hardly elegant.
I could have avoided most of the work above by just writing a function to compute neighbors and adding the cache
decorator from the functools
module. (In Python, decorators are functions that modify other functions. A special syntax exists for applying them). That would have memoized the function automatically. But I think it’s instructive to demonstrate the concept in pure Python.
Speaking of elegance, we’ll now consider a language about as elegant as an A-10: Bash. Bash is designed for interactively managing filesystems and operating systems, or writing scripts that do the same thing. Its syntax isn’t pretty, but it’s brutally effective in its role.
But this pattern can’t be implemented in Bash, so far as I know. Bash is dynamically scoped, not lexically scoped. That means a function that fails to find a variable in its local scope will next look in its caller’s scope, then the caller’s caller’s scope, and so on, instead of the scope where it was defined. If we tried to implement a Python-style neighbors
in Bash like this:
neighbors(){
declare -A memo
local code='get_neighbors(){
local x="$1"
local y="$2"
local hash = "$x,$y"
if [ ! -v memo["$hash"] ]; then
local west="$(($x-1)),$y"
local east="$(($x+1)),$y"
local south="$x,$(($y -1))"
local north="$x,$(($y +1))"
memo+=(["$hash"]="$north-$east-$south-$west")
fi
}'
exec "$code"
}
the function it returned (to be exact, the function generated by the code string it evaluated ) would not “remember” the memo
array, because bindings in the scope where the array was defined would not be preserved. Even if we ignore that fact, Bash is a bad choice for this problem because it doesn’t support multidimensional arrays, so I had to resort to some ugly code to represent the coordinates as a string. Bash is indispensable in many situations, but not this one.
Lua is a fast, lightweight scripting language. Though its design is minimalist, with simple syntax and few data structures, it does support lexical scope, enabling the usual functional programming tricks.
present_table = function(x)
result = {}
for i, tab in ipairs(x) do
result[i] = "{" .. table.concat(tab, ", ") .. "}"
end
return "{" .. table.concat(result, ", ") .. "}"
end
neighbors = function()
local memo = {}
return function(x)
local hash = table.concat(x, ",")
if memo[hash] == nil then
memo[hash] = {
{ x[1] - 1, x[2] },
{ x[1] + 1, x[2] },
{ x[1], x[2] - 1 },
{ x[1], x[2] + 1 },
}
end
return memo[hash]
end
end
get_neighbors = neighbors()
print(present_table(get_neighbors({ 2, 3 })))
print(present_table(get_neighbors({ 1, 4 })))
print(present_table(get_neighbors({ 2, 3 })))
-- {{1, 3}, {3, 3}, {2, 2}, {2, 4}}
-- {{0, 4}, {2, 4}, {1, 3}, {1, 5}}
-- {{1, 3}, {3, 3}, {2, 2}, {2, 4}}
Lua automatically finds the memo
table (what Lua calls its all-purpose record data structure) in the enclosing scope and modifies it. I had to write my own function to print tables, however, because Lua only prints a table’s memory address by default. The only other wrinkle is that Lua makes variables global by default. You have to use the local
keyword to bind variables in a function’s execution scope (or make them local to a script). Since creating global variables from functions is usually a bad idea, I think it should be the other way around, as in Python, but that’s a matter of preference.
This snippet demonstrates another neat feature of Lua: looking up a nonexistent table index or unbound variable returns nil
instead of an error, making it safe to test for index existence.
Like Bash, JavaScript is respected for its usefulness and ubiquity rather than its elegance. Say what you will about JavaScript’s loose typing, design inconsistencies, and erratic syntax, it isn’t Python that powers the modern Web.
As in Lua, there’s no need to use special syntax to write in the enclosing scope. But I do have to use JavaScript’s declaration keywords, which have characteristically intricate rules. I need to use let
inside the function instead of const
to avoid creating global variables; var
has yet another assignment behavior. Still, it’s easy enough to express the functional idiom:
function neighbors(){
let memo = {};
return function(x){
let hash = x.toString();
if(!(hash in memo)){
memo[hash] = [[x[0] -1, x[1]],
[x[0] + 1, x[1]],
[x[0], x[1] - 1],
[x[0], x[1] + 1]];
}
return memo[hash];
}
}
const get_neighbors = neighbors();
console.log(get_neighbors([ 2, 3 ]));
console.log(get_neighbors([ 1, 4 ]));
console.log(get_neighbors([ 2, 3 ]));
-- [ [ 1, 3 ], [ 3, 3 ], [ 2, 2 ], [ 2, 4 ] ]
-- [ [ 0, 4 ], [ 2, 4 ], [ 1, 3 ], [ 1, 5 ] ]
-- [ [ 1, 3 ], [ 3, 3 ], [ 2, 2 ], [ 2, 4 ] ]
Surprisingly, SQL provides the most elegant interface for functional programming. Strange as it sounds, its expressive, if finicky, declarative syntax enables powerful idioms. With creative use of GROUP BY
clauses and certain window functions, it is possible to -
Just kidding. But it probably is possible, albeit silly, to implement neighbors
in procedural SQL.
As in many things, I’m biased toward R because it was my first language. But I think R wins this comparison on the merits.
R has the most elegant solution to writing in the enclosing environment. The regular assignment operator, <-
, only ever binds in its caller environment (and yes, it technically is a function, so it’s correct to speak of its caller environment). Instead, R implements assignment in enclosing environments using a _super_assignment operator, <<-
. (In R, environments are first-class objects that record their parent environments). This operator checks the parent of the caller environment for a binding with the same name as the one being used, and overwrites that binding if it exists. If the name is not bound in the parent environment, it repeats this process for each parent of that environment, and finally binds the name in the global environment if it is not defined anywhere.
This power can be used for evil, as in this snippet that modifies a global variable:
f1 <- function() {
x <<- 5
}
f1()
x
-- [1] 5
Or for good, as in this version of neighbors
. It works because closures, the type R uses to implement most functions, record the environment where they were defined.
neighbors <- function() {
memo <- new.env(hash = TRUE)
function(x) {
hash <- paste(as.character(x), collapse = "")
result <- tryCatch(get(hash, memo), error = function(e) {
result <- list(
c(x[[1]] - 1, x[[2]]),
c(x[[1]] + 1, x[[2]]),
c(x[[1]], x[[2]] - 1),
c(x[[1]], x[[2]] + 1)
)
memo[[hash]] <<- result
result
})
result
}
}
get_neighbors <- neighbors()
get_neighbors(c(2, 3))
-- [[1]]
-- [1] 1 3
--
-- [[2]]
-- [1] 3 3
--
-- [[3]]
-- [1] 2 2
--
-- [[4]]
-- [1] 2 4
get_neighbors(c(1, 4))
-- [[1]]
-- [1] 0 4
--
-- [[2]]
-- [1] 2 4
--
-- [[3]]
-- [1] 1 3
--
-- [[4]]
-- [1] 1 5
get_neighbors(c(2, 3))
-- [[1]]
-- [1] 1 3
--
-- [[2]]
-- [1] 3 3
--
-- [[3]]
-- [1] 2 2
--
-- [[4]]
-- [1] 2 4
Notice how <<-
still works correctly, even embedded in an anonymous function passed as an error handler to tryCatch
. The one annoyance is that numeric vectors can’t be used as names.
At the end of the day, broad techniques like memoization transcend any individual language. All the languages above (except Bash) make it relatively easy to use memoization. As with most programming concepts, translating the principles into any given language is much easier than understanding them in the first place. Still, the comparison here highlighted each language’s character: Python’s explicitness, Lua’s simplicity, and R’s lovable blend of elegance and jankiness. (Also worth noting: the languages featured each call their associative data structure something different).
I like R’s approach best. A special version of the assignment operator clearly signals that something unusual is being done, and it is visually distinct yet hard to type by accident. (RStudio, the premier R IDE, has a keyboard shortcut for <-
that makes mistakenly typing <<-
very unlikely). But which is your favorite?
---
title: "Lexical Scope: Who Does It Best?"
author: "Ryan Heslin"
date: "2023-01-18"
categories:
urlcolor: "blue"
---
```{r, include = FALSE}
render_chunk <- function(options, command){
code <- paste(options$code, collapse = "\n")
temp <- tempfile()
writeLines(code, temp)
result <- system2(c(command, temp), stdout = TRUE)
print(result)
knitr::engine_output(options, code, result)
}
knitr::knit_engines$set(js = function(options) render_chunk(options, "js"))
knitr::knit_engines$set(lua = function(options) render_chunk(options, "lua"))
```
# Introduction
Most programming languages have some notion of scope: a frame of names bound to values.
A program will involve many interlocking scopes, all associated with
constructs like functions and (in some languages) blocks.
Each language has its own rules for managing
scope. These rules determine how references to variable names are resolved at runtime. They typically organize scopes into
a hierarchy and determine which scope "wins"
when the same name exists in multiple
scopes visible at a given time.
An important part of scoping rules is what happens when a name is looked up in a scope where it is not bound to any object. As we will see, different languages have different
ways of handling this case.
A common pattern is called _lexical scope_.
This means that objects, particularly functions, consider their parent scope to be
the scope where they were defined. In a lexically scoped language,
if a function can't find an object called
`x` in its execution environment,
the interpreter or compiler would look next
in whichever environment the function was
defined in (often called the enclosing environment). This behavior has the
advantage of consistency, since a given function's enclosing environment is the same
no matter where the function is called.
Languages with lexical scope often allow
assignments in a function's enclosing environment.
If a function is defined in the global environment, this is usually bad idea,
since it violates the principle that functions should avoid unnecessary side
effects. But side effects are no issue if that enclosing environment is
_another_ function's execution environment. Suppose `f` is a function that returns a function
`g` when called. `g`'s enclosing environment, the place it was defined, is `f`'s execution environment. So any binding `g`
makes in its enclosing environment when it is called is
visible _only_ to subsequent calls to `g`!
This fact makes it possible for functions
to read and write a private cache, enabling a host of powerful programming techniques.
This post compares the syntax several languages use for assignments in enclosing
scope. We'll implement a common pattern in each and contrast the
languages' very different means of supporting the same basic idea.
That pattern is called [memoization](https://en.wikipedia.org/wiki/Memoization).
It can be used to write functions that "remember" their result for a given
input after computing it the first time. This approach trades performance for
memory; used well, it can speed up code by saving the results of expensive
computations instead of repeating them.
We'll compare how several popular languages approach lexical scope using a simple application called `neighbors`.
`neighbors` is a function that returns a memoized function that finds the
four non-diagonal neighbors of a point
on a two-dimensional [Cartesian](https://en.wikipedia.org/wiki/Cartesian_coordinate_system) grid.
Each time this function is called on a new point, it computes that point's neighbors and stores them in an associative data structure in its enclosing environment. Subsequent
calls with the same argument find the cached value and return it, preventing repeated computation. While trivial, this technique
might be useful on certain [Advent of Code](https://adventofcode.com) problems, where
it might be necessary to check a point's neighbors many times.
For simplicity, I make no attempt to verify that inputs as two-dimensional coordinates,
which a real implementation would have to do.
# Python
We may as well start with good old Python.
One of Python's core design principles is
"explicit is better than implicit." So it is with lexical scope. If you want to
modify a variable in the global environment from within a function (which you probably shouldn't),
you have to use the `global` keyword:
```{python}
x = 5
def modify():
global x
x = 6
modify()
print(x)
```
There is a different keyword, `nonlocal`, that should be used when the enclosing environemnt
is a function's evaluation environment.
Here is `neighbors` in Python:
```{python}
def neighbors():
memo = {}
def inner(x):
nonlocal memo
if x not in memo.keys():
memo[x] = ((x[0] - 1, x[1]),
(x[0] + 1, x[1]),
(x[0], x[1] - 1),
(x[0], x[1] + 1))
return memo[x]
return inner
get_neighbors = neighbors()
get_neighbors((2, 3))
get_neighbors((1, 4))
get_neighbors((2, 3))
```
This implementation shows off one of my favorite Python features: any immutable
object can serve as a dict key, not just strings. We can just
use coordinates as indices.
The use of keywords makes it impossible to use global data unless you really want to, but
it's hardly elegant.
I could have avoided most of the work above by just writing a function to
compute neighbors and adding the `cache` decorator from the `functools` module.
(In Python, decorators are functions that modify other functions. A special syntax
exists for applying them). That would have memoized the function automatically.
But I think it's instructive to demonstrate the concept in pure Python.
# Bash
Speaking of elegance, we'll now consider a language about as elegant
as an [A-10](https://en.wikipedia.org/wiki/Fairchild_Republic_A-10_Thunderbolt_II): Bash.
Bash is designed for interactively managing
filesystems and operating systems, or
writing scripts that do the same thing.
Its syntax isn't pretty, but it's brutally
effective in its role.
But this pattern can't be implemented in Bash, so far as I know.
Bash is dynamically scoped, not lexically scoped. That means
a function that fails to find a variable in its local scope will next look in its
_caller_'s scope, then the caller's caller's scope, and so on, instead of the scope
where it was defined. If we tried to implement a Python-style `neighbors` in
Bash like this:
```{sh, eval = FALSE}
neighbors(){
declare -A memo
local code='get_neighbors(){
local x="$1"
local y="$2"
local hash = "$x,$y"
if [ ! -v memo["$hash"] ]; then
local west="$(($x-1)),$y"
local east="$(($x+1)),$y"
local south="$x,$(($y -1))"
local north="$x,$(($y +1))"
memo+=(["$hash"]="$north-$east-$south-$west")
fi
}'
exec "$code"
}
```
the function it returned (to be exact, the function generated by the code string it evaluated ) would not "remember" the `memo` array, because
bindings in the scope where the array was defined would not be preserved.
Even if we ignore that fact, Bash is a bad choice for this problem because it doesn't support
multidimensional arrays, so I had to resort to some ugly code to represent the coordinates
as a string. Bash is indispensable in many situations, but not this one.
# Lua
Lua is a fast, lightweight scripting language. Though its design is minimalist,
with simple syntax and few data structures,
it does support lexical scope, enabling the
usual functional programming tricks.
```{lua}
present_table = function(x)
result = {}
for i, tab in ipairs(x) do
result[i] = "{" .. table.concat(tab, ", ") .. "}"
end
return "{" .. table.concat(result, ", ") .. "}"
end
neighbors = function()
local memo = {}
return function(x)
local hash = table.concat(x, ",")
if memo[hash] == nil then
memo[hash] = {
{ x[1] - 1, x[2] },
{ x[1] + 1, x[2] },
{ x[1], x[2] - 1 },
{ x[1], x[2] + 1 },
}
end
return memo[hash]
end
end
get_neighbors = neighbors()
print(present_table(get_neighbors({ 2, 3 })))
print(present_table(get_neighbors({ 1, 4 })))
print(present_table(get_neighbors({ 2, 3 })))
```
Lua automatically finds the `memo` table (what Lua calls its all-purpose record data structure) in the
enclosing scope and modifies it. I had to write my own function to print tables, however, because Lua only prints a table's memory address by default. The only other
wrinkle is that Lua makes variables global _by default_. You have to use the `local` keyword to bind variables
in a function's execution scope (or make them local to a script). Since
creating global variables from functions
is usually a bad idea, I think it should be the other way around, as in Python, but that's a
matter of preference.
This snippet demonstrates another neat feature of Lua: looking up a nonexistent table
index or unbound variable returns `nil` instead of an error, making it safe to
test for index existence.
# JavaScript
Like Bash, JavaScript is respected
for its usefulness and ubiquity rather than its
elegance. Say what you will about JavaScript's
loose typing, design inconsistencies, and erratic syntax,
it isn't Python that powers the modern Web.
As in Lua, there's no need to use special
syntax to write in the enclosing
scope. But I do have to use JavaScript's declaration keywords, which have characteristically intricate rules. I need to use `let` inside the function instead of `const` to avoid creating global variables; `var` has yet another assignment behavior. Still, it's easy enough
to express the functional idiom:
```{js}
function neighbors(){
let memo = {};
return function(x){
let hash = x.toString();
if(!(hash in memo)){
memo[hash] = [[x[0] -1, x[1]],
[x[0] + 1, x[1]],
[x[0], x[1] - 1],
[x[0], x[1] + 1]];
}
return memo[hash];
}
}
const get_neighbors = neighbors();
console.log(get_neighbors([ 2, 3 ]));
console.log(get_neighbors([ 1, 4 ]));
console.log(get_neighbors([ 2, 3 ]));
```
# SQL
Surprisingly, SQL provides
the most elegant interface for functional
programming. Strange as it sounds, its
expressive, if finicky, declarative syntax
enables powerful idioms. With creative use of `GROUP BY`
clauses and certain window functions, it
is possible to -
Just kidding. But it probably is possible,
albeit silly,
to implement `neighbors` in procedural
SQL.
# R
As in many things, I'm biased toward R because it was my first language. But
I think R wins this comparison on the merits.
R has the most elegant solution to writing in the enclosing environment.
The regular assignment operator, `<-`, only ever binds in its caller environment
(and yes, it technically is a function, so it's correct to speak of its caller environment). Instead, R implements assignment in enclosing
environments using a _super_assignment
operator, `<<-`. (In R, environments are first-class objects that record their parent
environments). This operator checks the
parent of the caller environment for a binding with the same name as the one being used, and overwrites that binding if it exists.
If the name is not bound in the parent environment, it repeats this process for each
parent of that environment, and finally binds the name in the global environment if it is
not defined anywhere.
This power can be used for evil, as in this snippet that modifies a global variable:
```{r}
f1 <- function(){
x <<-5
}
f1()
x
```
Or for good, as in this version of `neighbors`.
It works because closures, the type R uses to implement most functions, record the environment
where they were defined.
```{r}
neighbors <- function(){
memo <- new.env(hash = TRUE)
function(x){
hash <- paste(as.character(x), collapse = "")
result <- tryCatch(get(hash, memo), error = function(e){
result <- list(c(x[[1]] -1, x[[2]]),
c(x[[1]] + 1, x[[2]]),
c(x[[1]], x[[2]] -1),
c(x[[1]], x[[2]] + 1))
memo[[hash]] <<- result
result
})
result
}
}
get_neighbors <- neighbors()
get_neighbors(c(2, 3))
get_neighbors(c(1, 4))
get_neighbors(c(2, 3))
```
Notice how `<<-` still works correctly,
even embedded in an anonymous function passed as
an error handler to `tryCatch`. The one
annoyance is that numeric vectors can't
be used as names.
# Conclusion
At the end of the day, broad techniques like memoization transcend any individual language.
All the languages above (except Bash) make it relatively easy to use memoization.
As with most programming concepts, translating the principles into any given
language is much easier than understanding them in the first place. Still,
the comparison here highlighted each language's character: Python's explicitness,
Lua's simplicity, and R's lovable blend of elegance and jankiness.
(Also worth noting: the languages featured each call their associative
data structure something different).
I like R's approach best. A special version of the assignment operator clearly
signals that something unusual is being done, and it is visually distinct yet
hard to type by accident. (RStudio, the premier R IDE, has a keyboard shortcut
for `<-` that makes mistakenly typing `<<-` very unlikely). But which is
your favorite?