nREPL with laziness in Clojure
26 October 2013I was using Clojure to process github repos for use in libsearch.nicksellen.co.uk and ran into a problem. I had a function that was going through each repository and pulling out the useful information. This is quite time consuming as I use multiple API calls per repository and so I wanted to print some output to show progress.
The problem was that I was only seeing output printed after the entire operation had finished.
I'm quite new to Clojure and there are multiple things that might have been involved:
- Clojure uses lazy sequences in a lot of places - values are only calculated when needed
- in functional languages side effects are kind of special, and writing to
stdout
is a side effect - I was running the code inside Emacs Live in a Leiningen REPL - that's a lot of layers in which to add behaviour
Show me the code!
I've removed all the specific code and reduced it to this single function which exhibits identical behaviour:
Code explanation:
- iterate over each number from
1
to5
inclusive - for each number
- sleep for 1 second
- print out
{timestamp} > {number}
- map value to
"#{number}"
- return mapped values
Executing this in the REPL gives me (I added the ; comments
in after):
Why is it not printing out as we iterate?
A couple of things to notice:
- the timestamps are as we would expect - they increment in 1 second steps
- the parenthesis contain both the output from
println
as well as a result from the map function - the
println
must be running - if thesleep
is running then surely the println before it runs right?
As I'm writing this code at Hacker Retreat there are people around to bounce thoughts off. After a chat with a couple of people I run it within the doall
function:
It works! And the opening parenthesis no longer contain the println
output.
But why?
Lets look what doall
does (as Clojure is mostly written in Clojure we can just read the source - I've been using clojuredocs.org):
this doesn't do very much, most of the work is done in dorun
:
This just repeatedly calls next
on our collection to realize the values from the lazy sequence. But why did that fix the problem?
The source doc explains this is a way to prevent execution being delayed until the seq is consumed, especially when side effects are involved. This sounds sort of related to my problem.
But hang on, my println
calls are not being delayed (the timestamp, and the sleep
call executing proved this earlier). But adding the doall
does fix the issue so it must be related somehow, I'm not comfortable just throwing a dorun
in there without really understand what is happening. We must go deeper.
Aha!
Then it came to me, I'm running in a REPL right? Read Evaluate Print Loop.
Here's a quick reminder of the code:
Lets think what happens at each stage when we run (run)
in the REPL.
Step 1: Read
The text is read and evaluated as a clojure data structure. Something like:
It's given us a list of length 1 containing an unevaluated symbol - run
.
Step 2: Evaluate
When we evaluate the code we see the output from the first call to println
:
Our returned value is a LazySeq and it has not been realized:
Step 3: Print
We have a sequence so it starts by printing (
. Our sequence is lazy though so it needs to realize some values from it. It reads up to *print-length*
(which defaults to everything). This is when Thread/sleep
and println
get called for each value.
The output from println
is intermingled with the values inside the sequence. BUT - this still hasn't recreated the problem, this printed out a line each second.
Step 4: Loop
Goes back to waiting for more input (not particularly interesting).
What have we learnt?
(added as comments in the code)
n for network
I'm using nREPL for my REPL - the n is for network. Code is typed, sent over the network to an nrepl server, evaluated, then returned. Understanding this is the key to solving the problem.
The Evaluate phase is performed in the nREPL server - and anything written to stdout during this phase is sent back over the network for display as it occurs.
The Print phase is also performed remotely - but output is collected and sent in one go.
Using doall
causes the evaluation of the lazy-seq to happen in the Evaluate phase and so send back println
output as it occurs.
Conclusion
It requires a surprisingly large amount of thought process and knowledge to investigate problems like this, and on the way I end up learning a lot → learning is you updating your mental models when they fail to predict reality.