We conducted a test back in 2023 (where did the time go?) to see if ChatGPT could generate reliable code for an HTML email. The verdict then: no, it couldn’t. While some of the results were close, they weren’t nearly close enough to use in the real world.
Technology doesn’t stand still for long and things have of course moved on since then. We ran that original experiment on ChatGPT-3.5. Its successor, ChatGPT-4, has since arrived on the scene. The new version’s programming skills are better than ever, and its memory within a conversation has been considerably extended. You know what that means – it’s time to put ChatGPT’s email coding skills to the test once more.
Fair fight
We’ll follow the same steps as the original test. Scientific accuracy is important! Or at least we will start out the same way. Depending on ChatGPT’s output, the test could end up following a completely different path.
Our original test kicked off with an intentionally succinct request:
Code a responsive email template
Due to the nature of our request, the interface has switched to a development-oriented view, with a large code panel taking up most of the screen, and the chat itself shifted to a smaller area on the left. In addition, a pop-up has appeared:
Clearly, ChatGPT is taking its coding assignment seriously. This feels much more like a development session already, rather than simply a chat about development.
Anyway, let’s see what it has given us. Unlike the first stage in the original test, it has generated some table-based code rather than email-unfriendly divs. That’s a positive development. Here’s what it looks like in a browser:
But I can see that it’s not going to work properly. Most notable is the blue button that relies on CSS in the document head. Outlook won’t recognise that, so it’ll collapse. Here’s how it looks in Outlook 2021:
While it does fare well in other email applications, being broken in Outlook is a deal-breaker.
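For context, a common way to stop a button collapsing in Outlook is to move the padding and background colour off the link and onto a table cell, as inline styles. The snippet below is a minimal sketch of that idea, not the code ChatGPT produced. The colour, sizes and link text are placeholder assumptions.

```html
<!-- Hypothetical button: padding and colour sit on the table cell as
     inline styles, so nothing depends on a <style> block in the head. -->
<table role="presentation" border="0" cellpadding="0" cellspacing="0">
  <tr>
    <td bgcolor="#1a73e8" style="background-color: #1a73e8; border-radius: 4px; padding: 12px 24px;">
      <a href="#" style="color: #ffffff; display: inline-block; font-family: Arial, sans-serif; font-size: 16px; text-decoration: none;">
        Call to action
      </a>
    </td>
  </tr>
</table>
```

Desktop Outlook will still ignore the border-radius and draw square corners, and only the text is clickable rather than the whole cell. But the button keeps its shape.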
Result:
A more promising start than before, but still unusable.
We could take ChatGPT up on its refactoring offer here, but the code isn’t nearly usable enough to simply iron out a few kinks. We’ll start a fresh chat and be clearer about our requirements from the outset.
So let me tell you what I want
When working with any generative AI, it can be easy to forget that it is not magic. It can’t read our minds. The more detailed the instruction provided by the user, the better the chance of usable output.
With that in mind, we’re going to add a lot more detail to our prompt:
Code a responsive email template, using HTML tables for structure. Set the width to 600 pixels on desktop, with a fluid width on mobile. Include CSS classes to enable stacking of content on mobile devices. Include all known email client fixes that are still relevant. Set the page background to a light grey colour, and the email content area to white.
That matches word-for-word what we asked for at the second stage of the original test. The output, however, is better. There’s some stacking code for mobile, and a breakpoint of 600 pixels. Here it is in a browser:
As before, I can see that the code is imperfect. But it’s not so bad that I’d skip the testing stage, so I’ll check it out via a real email test again. It renders well pretty much everywhere. iPhone Mail and Gmail apps stack as promised:
Even Outlook looks pretty good:
…at first glance. But let’s be honest, there’s not a lot going on in this template to mess up. There’s no button this time around. Did the AI decide to skip that in order to make its life easier? That’s the most accidentally human thing I’ve seen it do. There are no images or other particularly visual elements beyond some background colours.
About that – the blue area around the heading is greatly inflated in Outlook. That’s not a big deal in this simple template, but in a real email with actual graphic design elements it would mean that things break.
Result:
It’s better than stage 2 of the original test, but still not actually usable.
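To make the request at this stage concrete, here is a minimal sketch of the kind of structure the prompt describes: a 600 pixel table that goes fluid on mobile, with a class that stacks its columns. It is not ChatGPT’s output, it has not been through an email test, and the class names, colour values and column content are my own assumptions.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <meta name="viewport" content="width=device-width, initial-scale=1">
  <title>Template sketch</title>
  <style>
    /* Hypothetical class names. At 600px and below the container goes
       fluid and each .stack cell becomes a full-width block. */
    @media only screen and (max-width: 600px) {
      .container { width: 100% !important; }
      .stack { display: block !important; width: 100% !important; }
    }
  </style>
</head>
<body style="margin: 0; padding: 0; background-color: #eeeeee;">
  <!-- Full-width wrapper carries the light grey page background -->
  <table role="presentation" width="100%" border="0" cellpadding="0" cellspacing="0" bgcolor="#eeeeee">
    <tr>
      <td align="center">
        <!-- White content area, 600px wide on desktop -->
        <table role="presentation" class="container" width="600" border="0" cellpadding="0" cellspacing="0" bgcolor="#ffffff" style="width: 600px;">
          <tr>
            <td class="stack" width="300" valign="top" style="font-family: Arial, sans-serif;">Column one</td>
            <td class="stack" width="300" valign="top" style="font-family: Arial, sans-serif;">Column two</td>
          </tr>
        </table>
      </td>
    </tr>
  </table>
</body>
</html>
```

The stacking here depends entirely on the media query, so any client that strips or ignores it will simply keep the two columns side by side.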
Working towards a goal
For the third step in the original test, we decided to approach the project from a different angle. Rather than asking for a base template, we designed a simple email for ChatGPT to recreate. Here it is:
Before we go ahead with that, it’s worth pointing out that the overall test has already failed at this point. Our initial objective of creating a base email template is not an unreasonable one. It’s a task that might actually be assigned to a human developer. That person would be qualified to know the typical content and structure of a marketing email, without the need for a mockup for guidance.
With that said, let’s continue. While it is now possible to upload images to ChatGPT, that wasn’t so during the original experiment. So we’ll stick to words. Our prompt is a lengthy one:
Code a responsive email template, with the following requirements:
• 600 pixels wide on desktop
• Fluid width on mobile
• A page background colour of #f1f1f1
• Email content area background colour #ffffff
• A hero section with an image, heading, paragraph of text, and a button
• The hero image should be 600 pixels wide, to match the email content area
• Button should be pill-shaped, with a background colour of #a56e53 and white text
• Under the hero section should be two secondary features
• Each of these must also have an image, heading, paragraph and button
• Secondary feature images will be 290px wide on desktop, to match their containing column, and expanding to full width on mobile
• Hero text and button should be a bit larger than those of the secondary features
• These secondary features should take the form of adjacent columns on desktop, each at 290 pixels wide
• Place a 20 pixel gap between them
• The secondary features must stack into a single column on mobile
• All parts of the email should have 20 pixels of padding on each side on mobile, except for the hero image which can be full width and touching the edges of the viewport
• All body text should follow this font stack: HelveticaNeue-Light, Helvetica, Arial, sans-serif
• All body text should be colour #61524b
• All heading text should be colour #a56e53
• Use lorem ipsum placeholders for text
• Enter all hrefs as # placeholders
• Apply links only to buttons. Do not apply links to images
• Include all known, currently-relevant email client fixes
• Include CSS or HTML comments around each section to explain what it is or does
• Set a mobile breakpoint based on a max width of 639 pixels
• To ensure compatibility with Outlook and other email clients, use HTML tables for structure
And once I add some image references, it’s ready for a test run. First thing I notice when checking it in a browser is the lack of padding around text and buttons:
Flaws or not, we’ll proceed to send it as an actual email test. Frankly, it’s not great. There are spacing and sizing problems even on iPhone Mail, the most robust of email clients:
Outlook is a total disaster, with the two-column section trying to act full-width, thus forcing the email to scroll horizontally.
Result:
A mess. That’s a downgrade from the equivalent step in the original test.
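For comparison, one widely used way to keep adjacent columns in line in Outlook is to wrap them in a table that only Outlook sees, using conditional comments. What follows is a rough sketch of that technique under my own assumptions, not ChatGPT’s code and not our agency template. The class name and placeholder text are hypothetical.

```html
<!-- Two 290px columns inside a 600px content area. -->
<table role="presentation" width="600" border="0" cellpadding="0" cellspacing="0" style="width: 600px; max-width: 100%;">
  <tr>
    <!-- font-size: 0 removes the whitespace gap between inline-block columns -->
    <td align="center" style="font-size: 0;">
      <!--[if mso]>
      <table role="presentation" width="600" border="0" cellpadding="0" cellspacing="0">
        <tr>
          <td width="290" valign="top">
      <![endif]-->
      <div class="column" style="display: inline-block; width: 100%; max-width: 290px; vertical-align: top; font-family: Arial, sans-serif; font-size: 16px;">
        Feature one
      </div>
      <!--[if mso]>
          </td>
          <td width="20">&nbsp;</td>
          <td width="290" valign="top">
      <![endif]-->
      <div class="column" style="display: inline-block; width: 100%; max-width: 290px; vertical-align: top; font-family: Arial, sans-serif; font-size: 16px;">
        Feature two
      </div>
      <!--[if mso]>
          </td>
        </tr>
      </table>
      <![endif]-->
    </td>
  </tr>
</table>
```

Outlook reads the fixed-width cells inside the conditional comments, spacer cell included. Other clients skip those comments and see only the inline-block columns, which wrap into a single column once the screen is too narrow to fit both. The 20 pixel gap for those other clients is left out of this sketch.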
Here, try this
We’re an email marketing agency. That means we already have an email template and coding style of our own. So what we’ll do instead is give that base template to ChatGPT and then try to get it to recreate that mock-up. Is it sophisticated enough to pick up our coding style and apply it to the task?
The local browser result looks alright-ish. That last feature is attempting to escape the email, which is worrying.
It’s upon looking at the code that the situation becomes even less promising. ChatGPT has taken our template code and effectively said nah. It has pretty much ignored our base template and used its own code. Broken code. Let’s run an email test to discover just how broken.
Here’s Gmail mobile:
It stacks correctly but the alignment and spacing need attention. Unfortunately this is one of the better results. Gmail webmail suffers from the same ‘escaping content’ problem as the local browser preview, and as for Outlook – let’s not even talk about that.
Result:
It’s broken.
Last chance saloon
Only one step remains – providing ChatGPT with some feedback. Perhaps it can indeed refactor the code, with our guidance.
The local browser result of that looks pretty good (let’s just ignore those underlined button links):
It stacks, albeit with similar alignment and spacing troubles to previous efforts. As always, we’ll run it as an actual email test.
Oh.
Result:
We give up.
Getting nowhere fast
ChatGPT can generate code for JavaScript and PHP and various other programming languages. I even use it from time to time to help with scripting languages for different email platforms. So why is an HTML/CSS email development task failing over and over?
It’s because email code is weird. Not just weird but largely undocumented. And it’s undocumented because not everyone is on the same page about what an email actually is. Is it a simple digital letter, perhaps enclosing a picture or two? Because that’s what Microsoft Outlook expects to send and receive. Or is it a rich, CSS-laden piece of visual design? Because that’s what Apple Mail is capable of rendering. There’s a vast gulf between those polar opposites.
Not only is there no rulebook to which to refer, but ChatGPT is working blind. A human developer will often send multiple tests in a row when creating a new email template, checking it in the major email clients each time. ChatGPT can’t see how its code renders in any of them.
Now what?
Now nothing. Maybe some kind of custom implementation or plugin for ChatGPT could yield better email results. Or maybe it’s worth having a bash on a code-specialised AI assistant like GitHub Copilot.
But for now I’ve seen enough. Until such time as it’s easier to ask the machine than to do it myself, I’ll do it myself.