STORM at Stanford – An AI Breakthrough?

I have been finalizing the chapter drafts for the new edition of my book Learning Engineering Practice. The publishers were keen for me to address AI, artificial intelligence (or actual incompetence?). I addressed this topic in the first chapter, arguing that AI tools really just average all the hearsay and misplaced notions one can pick up on the internet. Even the best of them (Perplexity) could not find the systematic research on engineering practice showing that engineers are not doing what they're taught in school. Any engineer can tell you that, of course. But could STORM, a new AI tool configured for real researchers, really figure that out? I actually hoped it would, which would have forced me to modify my final first-chapter draft.

I heard a glowing account of STORM on YouTube from Danny Lui at Sydney University in a TEQSA video: you don't get a much more authoritative source on higher education than that in Australia. So I decided to give it a trial run. My research focus is engineering practice: systematic ethnographic studies of what engineers really do, which is not what we teach in universities.

STORM was touted as a big advance on ChatGPT because, for one thing, the references it gives you are actual sources, not invented ones.

I first asked for a review of research studies on what engineers actually do in their work. The response was a rather boring "average" of widespread popular notions of what engineers do, framed in terms of design, problem-solving and communication skills. Collaboration was acknowledged as a critical element (10/10), with the observation that engineers frequently attend meetings with colleagues, clients and project managers. Effective communication, it said, is essential for bridging the gap between technical and non-technical team members. It then gave me a reference to a site advertising a recruitment agency, not a peer-reviewed source.

Unfortunately, STORM was unable to distinguish systematically researched studies of engineers at work from hearsay and gossip sources. Its analysis of engineering work reflects widely held but woefully inaccurate ideas that circulate in engineering faculties, where the faculty teach but reluctantly admit that they know nothing about practicing as an engineer.

So, I decided to give STORM another chance. I gave it a more precise prompt: “Engineering practice, the nature of engineering work”. Unfortunately, the results were broadly similar, almost word for word in some sections.

So, as a last chance, I gave it an even more precise prompt: "ethnographic studies of engineers at work". Again the results were disappointing, and again STORM could not differentiate between systematic peer-reviewed research studies in books and journals on the one hand, and social media gossip on the other. The single study it managed to find was inaccessible because the URL was incorrect. However, I did locate the 25-year-old article and found it very superficial in its findings and coverage.

Would this be useful for an entry-level researcher? Absolutely not. The results are misleading, dated, and based on commercial hearsay and gossip. Any student using this tool will get into a lot of trouble with a research supervisor who knows the topic. This has done little to move my current assessment of AI as artificial incompetence. And I have argued that the AI business case does not stack up, given my experience in robotics and artificial intelligence since the early 1970s and my current knowledge of the digital advertising industry.

Sometime, perhaps soon but maybe still a few years away, I foresee a horrible financial crash when investors finally work out that they have been sold little more than hype. But don't bank on it.

Picture Credit: Photoshop generative image production. I acknowledge all the artists whose work was scraped off the internet, probably without their knowledge, and certainly without any financial compensation, to enable software like Photoshop to generate images like this.

Added

After I tested STORM, a friend suggested I try Claude. Claude provided similar responses to Perplexity and ChatGPT. When I directly asked Claude for a literature review of ethnographic research on engineering practice, how it reveals what engineers actually do, and what we don't know about engineering practice, the result rated about 7/10 as a student literature review. It was still easy to identify as AI-generated.

The point here is that it was only because I knew the research existed, and knew the keywords to find it, that Claude was able to generate a satisfactory result. A Google Scholar search yielded far more, and more up-to-date, references. Claude (and Perplexity) were able to identify relevant issues.

Only Claude was able to write a reasonable literature review document. So, yes, these are nice tools that would definitely help get a young researcher started, but they are no substitute for actually reading the literature and allowing time to understand what it tells us.

All of the chatbots responded to the initial question "What do engineers do?" with popular misconceptions, and none could point out that researchers have shown these to be misconceptions without being specifically directed to that research.

When I asked Claude what it takes to transform an engineer's concept for a solution into practical reality, the response again outlined the technical steps, missing much of what has to happen. Crucially, it missed the essential requirement for finance! This reflects the widespread intellectual separation between engineering writing and business.

So AI chatbots, predictably, are a good way to gauge popular misconceptions on a topic. A knowledgeable human researcher is needed to steer them closer to the best-known truth of the matter. And Google Scholar (and other specialized search tools) remain essential for reaching the contemporary literature.

