Crosstwine Labs

From: Harvis Wang

In some code you posted earlier, it seems that the QuotedCellState function returns either QuotedCellStateSeenQuote, which is another function, or itself.

Can you confirm that that’s what’s happening? If so, can you tell me a bit more about how this works?

Thanks!
[Question rephrased -Ed]

From: Damien Diederen <dd@crosstwine.com>
Date: Wed, 16 Jan 2013 12:32:04 +0100

Hello Harvis,

I see that you are making quick progress :)

The code you are looking at is part of the Veda distribution, and implements a small state machine for parsing CSV data streams.

There are many ways to parse text, but coding a state machine has some advantages: the state can be updated on a character-by-character basis, so the parser never blocks waiting for input, and it makes it “fairly easy” to see the state graph in the code.

The current implementation in csv.ils doesn’t take advantage of the first point, because the only published function is a “pump”, which feeds characters coming from a source into the state machine.
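To make the first point concrete, here is a minimal sketch of a machine that the caller feeds one character at a time. It is written in Python rather than SKILL++ so that it can be run anywhere, and the WordCounter class and its state names are hypothetical; nothing here is taken from csv.ils.

```python
class WordCounter:
    """Toy push-style state machine: counts whitespace-separated words."""

    def __init__(self):
        self.words = 0
        self.state = self.between_words  # the current state is a bound method

    def feed(self, c):
        # One character in, one state transition out; never waits for input.
        self.state = self.state(c)

    def between_words(self, c):
        if c.isspace():
            return self.between_words
        self.words += 1  # first character of a new word
        return self.in_word

    def in_word(self, c):
        return self.between_words if c.isspace() else self.in_word


counter = WordCounter()
for chunk in ("hello wo", "rld  again"):  # input arrives in arbitrary pieces
    for c in chunk:
        counter.feed(c)
```

Because all the state lives in the object between calls to feed, the caller decides when the next character is delivered, and a word split across two chunks is still counted once.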

Let’s start from there. The relevant code is in the ParseCharStream function; here is a slightly simplified version:

(let ((state InitialState))
  (while state
    (let ((c (GetNextChar)))
      (setq state (funcall state c)))))

The state variable holds the current parser state. Its initial value is, indeed, a function.

This is one of the strong points of SKILL++: the language makes it very easy to use functions as values. And unlike legacy SKILL, these functions can be full closures (even though they are not used as such in our case), which can encapsulate additional information.
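As an illustration of what “encapsulating additional information” could look like, here is a small Python sketch of a state function that is a closure. The names make_cell_state and cell_state are hypothetical, and the Veda parser does not work this way; it passes the parser object explicitly instead.

```python
def make_cell_state(cells):
    """Return a state function that remembers `cells` through its closure."""

    def cell_state(c):
        if c is None:          # end of input: no next state
            return None
        if c == ",":
            cells.append("")   # a comma starts a new cell
        else:
            cells[-1] += c     # any other character extends the current cell
        return cell_state      # stay in the same state

    return cell_state


cells = [""]
state = make_cell_state(cells)
for c in "a,bc":
    state = state(c)
```

The state function takes only the character as an argument; the list it fills in travels with the function itself.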

That makes it very, very convenient to implement the states as a set of functions sharing a common invocation protocol (they all take the same arguments), and which, when invoked, return the next state.

As a special case, the final state is not a function, but nil, which causes the while “pump” to exit, and parsing to stop.

Note that in our CSV parser, the equivalent of GetNextChar returns either a character or nil on end of file; the state functions have to account for that, hence the null tests.

The code you are mentioning:

;; An ASCII quotation mark has been seen; the cursor is collecting
;; characters within quoted cell contents.
(defun QuotedCellState (p c)
  (cond
    ((eq c asciiQuotationMark)
     QuotedCellStateSeenQuote)
    ((null c)
     (Error p "Unexpected EOF within quoted CSV cell."))
    (t
     (PushToCell p c)
     QuotedCellState)))

corresponds to the “Quoted” state of this subset of the state machine:

[Figure: Subset of the CSV parser’s state machine]
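The “Quoted” state and the state it hands over to can be sketched in Python as follows. The first function mirrors QuotedCellState above; the second one is an assumption on my part about what a “seen quote” state would typically do in a CSV parser (a doubled quotation mark is an escaped quote, anything else ends the cell), since that function is not shown here. The Parser class is a hypothetical stand-in for the p argument.

```python
QUOTE = '"'


class Parser:
    """Hypothetical stand-in for the parser object `p`."""

    def __init__(self):
        self.cell = []


def quoted_cell_state(p, c):
    if c == QUOTE:
        return quoted_cell_state_seen_quote
    if c is None:
        raise ValueError("Unexpected EOF within quoted CSV cell.")
    p.cell.append(c)
    return quoted_cell_state  # the state returns itself to stay put


def quoted_cell_state_seen_quote(p, c):
    # Assumption: "" inside a quoted cell is an escaped quotation mark;
    # any other character (or EOF) means the quoted cell has ended.
    if c == QUOTE:
        p.cell.append(QUOTE)
        return quoted_cell_state
    return None  # this sketch simply stops here


p = Parser()
state = quoted_cell_state  # the opening quotation mark was already consumed
for c in 'say ""hi"" now",':
    state = state(p, c)
    if state is None:
        break
result = "".join(p.cell)
```

Each return value is an edge in the state graph: returning the function itself is a loop on the same node, and returning another function is a transition.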

There you are. Functions which manipulate functions are called higher-order functions, and are indeed quite useful… but I’m sure you’ve encountered them before; sort is probably the most common occurrence:

sort(
  l_data
  u_comparefn
)
=> l_result

Now you know that there’s nothing magical about them; you can write your own!
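For instance, here is a Python sketch of both sides: calling a library function that takes a comparison function, and writing a small higher-order function by hand. The names by_length_then_alpha and apply_twice are made up for the example. Note one difference from SKILL's sort: Python comparison functions return a negative, zero, or positive number rather than a boolean, and have to be wrapped with functools.cmp_to_key.

```python
from functools import cmp_to_key


def by_length_then_alpha(a, b):
    """Comparison function: negative if `a` should sort before `b`."""
    return (len(a) - len(b)) or ((a > b) - (a < b))


def apply_twice(fn, x):
    """A hand-written higher-order function: calls `fn` on its own result."""
    return fn(fn(x))


names = sorted(["veda", "csv", "skill", "ils"],
               key=cmp_to_key(by_length_then_alpha))
doubled_twice = apply_twice(lambda n: n * 2, 5)
```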

Hope this helps,
Damien