Discussion about this post

The Dangerous Maybe

I’m glad to see you’re on Substack.

Thomas Murphy

This is a very strong piece that far exceeds its assignment as a review (indeed, you have sketched out the beginnings of a general theoretical foundation for LLMs). The point about the newer reasoning models in particular treads carefully between knee-jerk deflation and recognition of novelty. While their "reasoning" essentially involves passing acquired context forward or reprompting, what seems more interesting to me is the path dependency of this process of context acquisition, which, as you note, constrains the space of possible responses going forward. It also makes it possible to discover and isolate the subsequences where reasoning goes wrong. There are a few points with which I take umbrage:

- Human cognition is an intellectus ectypus in Kantian terms, meaning that it is irrevocably restricted to cognition through representation (i.e., in concepts, in the form of judgement). Empirical schemata are products of the rather fuzzily defined faculty of the imagination: temporal mediations between intuitions and empirically acquired concepts. The model here makes it sound as though you are directly equating learned representations (weights) with these rules for the dynamic application of the concept to intuition in time. But the schema is a rule that applies an already-possessed concept (whether a priori or empirically acquired). LLMs approximate the generalised abstraction of the concept by brute force: scaling up parameters and empirical instances. Surely these weights are more like the mechanisms of association by which empirical concepts are first formed from repeated intuitions (the co-occurrence sketch at the end of this comment gives a toy picture of the kind of association I mean)? Granted, Kant is less clear on this phase than on the theory of judgement, but LLMs are still operating at the level of the empirical laws of association. To suggest otherwise would be to argue that learned representations are the same thing as concepts. At least for now, they do not have that scope, that ability to generalise: the reliance on a massive scale of empirical particulars in the training set, and the vast disparity in energy expenditure for performing equivalent linguistic tasks, attest to this. This also seems to undermine your other point about their activity being pre-cognitive/sub-representational. Either way, I would suggest that because LLMs are parasitic on the substrate of human language reproduction, they are dealing with representations-of-representations (i.e., they are doubly ectypal). This seems better than attempting to map pieces of the theory of judgement onto computational processes and elements of LLM architecture; much of the philosophy attempting the latter has been radically unsuccessful. See the final point below on the confusion of the territories of the transcendental and the empirical.

- The humanism charge, which is a bit of a canard and quite hackneyed in academic circles at this point—almost an "epistemological aesthetic"—does not land. LLMs are operating on the products of human cognitive labour. They are not being misrecognised for human output; they are recombining human output. The various attempts at philosophies of the inhuman in recent years, which remain yoked to the human by defining themselves against it, are not especially helpful in this specific instance.

- To preface this, I am not quite sure from the piece whether this is your position or Weatherby's. However, the understanding of classification here seems more than a little dated, confined as it is to hand-labelled training data (supervised learning). Self-supervised learning has been the dominant mode since BERT/GPT: foundation models are trained on unlabelled corpora, where the question of labelling simply becomes "predict the next (or masked) token". There are also forms of reinforcement learning that involve preference comparisons rather than conceptual classification. The classifier as human-judgement-mediated hypothesis generator has been obviated by this variety of approaches in general and by self-supervision in particular (the training-objective sketch at the end of this comment contrasts the two setups). It still holds out somewhat where training-set curation is concerned, but barely, and it certainly does not legitimise a purely linguistic approach.

- The invocation of the Curry-Howard-Lambek correspondence seems like a bit of sleight of hand to smuggle empirical content into the a priori, which strikes me as a general problem with the neorationalist project. Computation is not substrate-independent; otherwise, why would we have to restrict the mathematics of computer science to sequential time, effective procedures, finitary methods, or discrete state spaces? Why not pure mathematics? The very reason we use intuitionistic/constructivist logic is the time-bound nature of computational processes. There is a kind of notational fetishism, or idealism regarding symbolic cognition, at work here: successful models of computation are not themselves computation. Turing machines, the lambda calculus, general recursive functions, operational semantics, process calculi: these are ways of modelling computation; they are not constitutive of computation itself. CHL connects formal systems to one another but does not make the passage to concrete computation, and it therefore inhibits analysis of the effects of computation in the real (the Lean sketch at the end of this comment shows what the correspondence actually links). The transcendental is not computational, because computation is already an empirical specification, i.e., a constraint on what physically realisable systems can do in time, not a constraint on the conditions of possibility of experience as such. The move from "mental construction requires time" (Brouwer) to "all temporal processes are computational" substitutes a class of mathematical models for the structure of synthesis itself. This seems like straightforward dogmatism: tracing the transcendental in the image of the empirical.
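
On the first point, here is the co-occurrence sketch I mentioned: a toy illustration (Python, purely my own example, not any actual model's training procedure) of representation built solely from the empirical laws of association. Nothing in it possesses or applies a concept; it only aggregates repeated instances.

```python
# Toy sketch of associative/distributional "representation" (illustrative only):
# association strength is nothing but co-occurrence frequency over instances.
from collections import Counter
from itertools import combinations

corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat chased the dog",
]

cooc = Counter()
for sentence in corpus:
    for a, b in combinations(sentence.split(), 2):
        cooc[(a, b)] += 1
        cooc[(b, a)] += 1

# Nothing here generalises beyond the particulars it has seen;
# "cat" and "dog" are related only insofar as their contexts overlap.
print(cooc[("cat", "sat")], cooc[("dog", "sat")])
```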
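
On the classification point, the training-objective sketch: a minimal contrast (assuming PyTorch; the shapes and data are made up) between supervised classification, where targets are hand-applied labels, and self-supervised next-token prediction, where the "label" is simply the next token of the raw corpus.

```python
# Minimal, illustrative contrast between supervised and self-supervised objectives.
import torch
import torch.nn.functional as F

# --- Supervised classification: targets come from human annotation ---
logits = torch.randn(8, 3)          # 8 examples, 3 human-defined classes
labels = torch.randint(0, 3, (8,))  # labels supplied by annotators
supervised_loss = F.cross_entropy(logits, labels)

# --- Self-supervised next-token prediction: the corpus labels itself ---
vocab_size = 50_000
token_ids = torch.randint(0, vocab_size, (1, 16))  # a raw, unlabelled text sequence
lm_logits = torch.randn(1, 16, vocab_size)         # one distribution per position
# Shift by one: position t predicts token t+1. No annotation step is involved.
next_token_loss = F.cross_entropy(
    lm_logits[:, :-1, :].reshape(-1, vocab_size),
    token_ids[:, 1:].reshape(-1),
)
```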
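
And on the last point, the Lean sketch: my own minimal illustration of what the proofs-as-programs side of the correspondence actually links. Propositions are types and proofs are terms (programs) of those types; nothing in it touches physical realisation, which is exactly why I think it cannot make the transcendental computational.

```lean
-- Minimal illustration of the proofs-as-programs correspondence (Lean 4).
-- Propositions are types; proofs are terms of those types.

-- Modus ponens is literally function application.
def modusPonens {P Q : Prop} (h : P → Q) (p : P) : Q := h p

-- Conjunction introduction is pairing.
def andIntro {P Q : Prop} (p : P) (q : Q) : P ∧ Q := ⟨p, q⟩
```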
