Education and Large Language Models
a thought dump
I sat on this for a year, which I spent doing more research and teaching. Re-reading what's below, I'm comfortable publishing it as-is, although I need to make a disclaimer: all of this only applies to post-secondary studies. I don't think putting AI in the hands of literal children is a good idea.
I spent quite some time this year teaching, running lab sessions, and discussing with colleagues on the job. Unsurprisingly, AI came up a lot in those discussions. It also came up in reports and assignments, more or less obviously. I thought I would dump some of what I've taken away from the experience and the discussions.
The reason there are actual thoughts to dump here is that I believe banning usage of LaLaMos is misguided. Even if you work within a specific domain where they are mostly clueless (I would know), there are still some valid usages here and there, some of which greatly accelerate the tasks that cannot be delegated. And even if banning usage were desirable, we would run into feasibility issues.
AI is improving, and it's here to stay. I don't think the "exponentially improving" trope sold by AI companies is true, though; an S-curve seems more realistic to me. If we approach the problem with Amara's law in mind, then we have to learn (and teach) about these new tools so as not to be left behind.
From my observations, usage of LaLaMos roughly follows a Gaussian distribution:
- behind-the-curve students will copy-paste questions-&-answers in-&-out,
- middle-of-the-curve students will read, understand, and reformulate questions and answers,
- ahead-of-the-curve students will leverage LaLaMos for a variety of tasks (e.g. prompting when stuck, condensing a document, …).
Notes:
- These behaviors pretty much fit the global user base.
- Everyone, including CS students, still overwhelmingly uses ChatGPT. People now know that alternatives exist (thanks, I'd say, to the people at DeepSeek), but they don't bother with them.
- Groups 1 and 2 may act this way out of laziness or lack of competence. Both happen, and they aren't mutually exclusive.
- Groups 2 and 3 may occasionally give in to laziness and act like group N-1. We'll ignore this since we're reasoning about averages.
We should also define (non-)goals:
- We should maximize the experience for students who already care.
- Making students care is still part of the job, but it will probably be harder since there is now one more tool which encourages laziness.
More notes:
- A much larger recurring discussion is "how do we make students care?", which is 50% of education IMO.
- Another stake in dealing with LaLaMos is preserving the value of education. I hate the diploma-as-value discourse and idea, but it is still important, especially since we are currently realizing most diplomas aren't needed.
Excluding teaching about LaLaMos themselves, there are a number of things we can do to ensure the quality of education doesn't go down.
Evaluation formats must change to some extent. What is realistically applicable depends on the number of students, which here in France is mostly tied to the year they're in.
Grading code that comes from a lab session or a homework assignment is pretty much a no-go, especially if the subjects are puzzle-like (think LeetCode) or just a bunch of TODOs. LaLaMos will nail these, with or without precise indications. Code assignments should be proper projects evaluated with a report and a defence.
They can be individual or group projects; it doesn't matter. Ideally you'd do both for general education, but this isn't relevant to our specific discussion.
The report is there to encourage exploration of the subject (assuming we also loosen guidelines a little), while the defence is there to assess how comfortable students are with the task they just completed. This allows teachers to (a) identify imbalances in group work, and (b) confirm dubious LaLaMo usage.
Now, some very-intuitive-but-absolutely-abstract-and-black-and-white examples:
- A student who has a good grasp of the subject and used AI to improve their efficiency, or some phrasing in the report, will receive high grades on both the report and the defence.
- A student who was a little dumber with their AI usage (e.g. large copy-pasted sections in the report), but understands the subject and can answer questions about the "suspicious" paragraphs of the report, will receive a lower grade on the report.
- A student who blatantly offloaded the project to an AI and isn't able to demonstrate their own understanding will be sent to the shadow realm (sorry, I don't make the rules).
Those are some very rough archetypes, yet I can very easily map them onto some of what I've seen this year. I therefore declare them good.
Not everything needs to be an assignment, or directly related to code. I'll come clean right away and admit it: I am a pen-and-paper advocate. I'm not a fanatic who thinks writing code on paper for an exam is perfectly fine, but I am biased.
But hey, at least you don't have to worry about digital calls to Cthulhu when the only things your students have access to are a pen and sheets of paper.
Examples:
- logic gates, ML, basically all fields that are just math
- EE (that's pushing it, but embedded systems are CS)
even hardware-related stuff:
- some optimization (here's some code, here's the data it accesses; design a layout that would improve cache usage and explain your design; see the sketch after this list)
- speculative ASM execution (w/ pipeline stages, hazard handling depending on arch components, …)
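To make the cache-layout exercise above concrete, here is a minimal sketch of the kind of before/after answer one could expect on paper. The struct, the field names, and the 64-byte cache line size are illustrative assumptions on my part, not part of any actual exam subject.

```c
#include <stddef.h>

/* Array-of-structs: summing only `x` drags the unrelated fields
 * through the cache, so most of each 64-byte line is wasted
 * (here the struct is exactly one line: 4 + 60 bytes). */
struct record_aos {
    float x;
    float other[15]; /* stand-in for unrelated fields */
};

float sum_aos(const struct record_aos *r, size_t n) {
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += r[i].x; /* roughly one cache-line fetch per element */
    return s;
}

/* Struct-of-arrays: the `x` values are contiguous, so one
 * 64-byte line now holds 16 useful floats and the hardware
 * prefetcher has an easy, sequential access pattern. */
float sum_soa(const float *x, size_t n) {
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += x[i]; /* ~16 elements per cache-line fetch */
    return s;
}
```

The "explain your design" part is then essentially the comments above in prose: count how many useful bytes each fetched cache line carries in both layouts.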