SKILL: Using Functions as Values?
From: Harvis Wang
In some code you posted earlier, it seems that the QuotedCellState function returns either QuotedCellStateSeenQuote, which is another function, or itself. Can you confirm that that’s what’s happening? If so, can you tell me a bit more about how this works?
Thanks!
[Question rephrased -Ed]
From: Damien Diederen <dd@crosstwine.com>
Date: Wed, 16 Jan 2013 12:32:04 +0100
Hello Harvis,
I see that you are making quick progress :)
The code you are looking at is part of the Veda distribution, and implements a small state machine for parsing CSV data streams.
There are many ways to parse text, but coding a state machine has some advantages: the state can be updated on a character-by-character basis, so the parser never blocks waiting for input, and the state graph is “fairly easy” to see in the code.
The current implementation in csv.ils doesn’t take advantage of the first point, because the only published function is a “pump”, which feeds characters coming from a source into the state machine.
Let’s start from there. The relevant code is in the ParseCharStream
function; here is a slightly simplified version:
(let ((state InitialState))
  (while state
    (let ((c (GetNextChar)))
      (setq state (funcall state c)))))
The state variable holds the current parser state. Its initial value is, indeed, a function.
This is one of the strong points of SKILL++: the language makes it very easy to use functions as values. And contrary to legacy SKILL, these functions can be full closures (even though they are not used as such in our case), which can encapsulate additional information.
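As a rough illustration of what "encapsulating additional information" means, here is a minimal sketch written in Python rather than SKILL++, only so that it can be run anywhere. The name make_counter is made up for the example and has nothing to do with the Veda code. The inner function captures the variable count, which lives on between calls:

```python
def make_counter():
    """Return a function that remembers how many times it has been called."""
    count = 0

    def counter():
        nonlocal count  # the variable captured from the enclosing call
        count += 1
        return count

    return counter  # the function itself is the return value


next_id = make_counter()
first, second = next_id(), next_id()  # 1, then 2
other = make_counter()                # a second closure with its own count
```

Each call to make_counter yields an independent function value; functions are ordinary values here, whether or not they capture anything.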
That makes it very, very convenient to implement the states as a set of functions sharing a common invocation protocol (they all take the same arguments), and which, when invoked, return the next state.
As a special case, the final state is not a function, but nil, which causes the while “pump” to exit, and parsing to stop.
Note that in our CSV parser, the equivalent of GetNextChar returns either a character or nil on end of file; the state functions have to account for that, hence the null tests.
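The protocol described so far (a pump, states that return the next state, a non-function value to stop, and an end-of-file marker that every state has to test for) can be sketched as follows. This is a Python analogue, not the csv.ils code: make_char_source, run, skipping and in_word are hypothetical names, and None plays the role of nil. The toy machine simply splits its input into words.

```python
import io


def make_char_source(text):
    """Return a zero-argument function giving one character per call,
    then None at end of input (a stand-in for GetNextChar)."""
    stream = io.StringIO(text)

    def get_next_char():
        c = stream.read(1)  # read(1) returns "" once the input is exhausted
        return c if c else None

    return get_next_char


def run(initial_state, get_next_char, out):
    """The "pump": feed characters to the current state until it returns None."""
    state = initial_state
    while state is not None:
        state = state(out, get_next_char())


def skipping(out, c):
    """State: between words."""
    if c is None:
        return None      # end of input: the final, non-function state
    if c == " ":
        return skipping  # stay in the same state by returning ourselves
    out.append(c)        # first character of a new word
    return in_word


def in_word(out, c):
    """State: inside a word."""
    if c is None:
        return None
    if c == " ":
        return skipping
    out[-1] += c         # extend the current word
    return in_word


words = []
run(skipping, make_char_source("  state   machines rock "), words)
# words is now ["state", "machines", "rock"]
```

Because all states share the same arguments, the pump never needs to know which state it is calling.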
The code you are mentioning:
;; An ASCII quotation mark has been seen; the cursor is collecting
;; characters within quoted cell contents.
(defun QuotedCellState (p c)
  (cond
    ((eq c asciiQuotationMark)
     QuotedCellStateSeenQuote)
    ((null c)
     (Error p "Unexpected EOF within quoted CSV cell."))
    (t
     (PushToCell p c)
     QuotedCellState)))
corresponds to the “Quoted” state of this subset of the state machine:
where:
Base → function CellState, the basic cell parsing state;
Quoted → QuotedCellState, as discussed above;
"? → QuotedCellStateSeenQuote, which should really be called QuotedCellSeenQuoteState, and has to decide what to do after a quotation mark: finish the quoted section, or insert a single " in the cell?
Error → Error state, another exception to the “rule” above: this state is not implemented as a function, but as a “throw”, as it has nowhere to go.
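To see the four states working together, here is a small Python sketch of this part of the machine. It is an approximation under stated assumptions, not a transcription of csv.ils: only QuotedCellState is shown in this message, so the bodies of the Base and "? states below are guesses based on the descriptions above and on the usual CSV convention that a doubled quotation mark stands for a single one. The comma separator, the CsvError class and parse_line are likewise assumptions, and record (newline) handling is left out.

```python
QUOTE = '"'


class CsvError(Exception):
    """Plays the role of the "throw" used for the Error state."""


def cell_state(cells, c):
    """Base: plain cell contents (assumed behaviour)."""
    if c is None:
        return None                  # end of input: stop the pump
    if c == QUOTE:
        return quoted_cell_state     # enter a quoted section
    if c == ",":
        cells.append("")             # assumed separator: start a new cell
        return cell_state
    cells[-1] += c
    return cell_state


def quoted_cell_state(cells, c):
    """Quoted: mirrors QuotedCellState from the message."""
    if c == QUOTE:
        return quoted_cell_seen_quote_state
    if c is None:
        raise CsvError("Unexpected EOF within quoted CSV cell.")
    cells[-1] += c
    return quoted_cell_state


def quoted_cell_seen_quote_state(cells, c):
    """"?: a quotation mark was seen inside a quoted section (assumed behaviour)."""
    if c == QUOTE:
        cells[-1] += QUOTE           # doubled quote: insert a single " in the cell
        return quoted_cell_state
    return cell_state(cells, c)      # quoted section finished; Base handles c


def parse_line(text):
    """Pump the characters of text, then None, through the machine."""
    cells = [""]
    chars = iter(text)
    state = cell_state
    while state is not None:
        state = state(cells, next(chars, None))
    return cells


row = parse_line('a,"b ""c"", d",e')
# row is ['a', 'b "c", d', 'e']
```

Note how the "? state delegates to cell_state instead of duplicating its logic; that is possible precisely because every state takes the same arguments and returns the next state.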
There you are. Functions which manipulate functions are called higher-order functions, and are indeed quite useful… but I’m sure you’ve encountered them before; sort is probably the most common occurrence:
sort(
    l_data
    u_comparefn
)
=> l_result
Now you know that there’s nothing magical about them; you can write your own!
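As a small illustration of both points, here is a Python sketch (again an analogue, not SKILL): a sort driven by a caller-supplied comparison function, and a home-made higher-order function. The names compare_lengths and keep_if are invented for the example. Python's sorted takes a key function rather than a comparison function, so the standard functools.cmp_to_key adapter is used to bridge the two.

```python
from functools import cmp_to_key


def compare_lengths(a, b):
    """Comparison function: negative, zero or positive, depending on order."""
    return len(a) - len(b)


def keep_if(predicate, items):
    """A home-made higher-order function: keep the items predicate accepts."""
    return [item for item in items if predicate(item)]


names = ["QuotedCellState", "CellState", "Error"]

# Passing a function as an argument, just like sort's comparison function.
by_length = sorted(names, key=cmp_to_key(compare_lengths))
# by_length is ["Error", "CellState", "QuotedCellState"]

long_names = keep_if(lambda s: len(s) > 5, names)
# long_names is ["QuotedCellState", "CellState"]
```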
Hope this helps,
Damien