Helping people build quality AI products

Designing a product to dramatically improve the quality of Generative AI products built using Large Language Models

If you’re like me, you’ve probably had a few magical experiences using ChatGPT or Midjourney…but over time you realise the product’s quality is too poor to use day to day. You end up rewriting your “automagical” headlines, requesting iteration after iteration of an illustration, or randomly writing prompt after prompt hoping for a better result.

I was contacted by an ex-colleague who has a hypothesis: the problem all these products have is that there is no good way to scale your LLM product while ensuring quality. For example:

  • There’s no easy way to tweak your prompts and know what is happening in your product

  • It’s hard to evaluate, measure and test the abstract concept of “prompt output quality”

  • It’s hard to know what is happening in production and when something breaks

  • It’s hard to experiment with your prompts and really understand the impact of changes over a large data set

I came onto the project as the founding designer with the goal of helping develop a vision of how we might solve these problems and get an MVP built. I ran a workshop and built a prototype to validate our idea, with the aim of raising funding to pursue the product in earnest.

At a glance

  • Founding designer

  • Pre-seed company

  • Team of three: me, an engineering founder and a product founder

  • Led user-centric workshop to identify top use cases and key features for the product

  • Developed a prototype and built an MVP with the goal of raising funding

  • Current status: in talks with several investors

When I joined my two friends, they had spent several months discussing the need for a tool to optimise building generative AI products. There had been lots of thinking and several customer discovery interviews, but not many concrete ideas for how the product might work. My mission was to accelerate the thinking the team had done so far.

Kicking off with a (mini) workshop

Despite being under time pressure (we had an investor pitch in less than two weeks), I gathered the team for a one-day workshop that included reviewing customer discovery interviews, analyzing competitors and aligning on our top customers and their problems. I put the workshop together with just 48 hours of prep. Workshops can become a time-sink, but I wanted to show that you can still get the best parts of a design thinking workshop without weeks of prep and planning. Hustle was the name of the game!

The output of the workshop was a better distillation of our most important customers and their top problems and a start on some initial solutions and sketches.

Some of the potential solutions the team sketched as part of our workshop

Identifying personas & use cases

Building on the workshop insights, I formalized two distinct user personas. The first is the less-technical prompt writer at a big company, who struggles to craft effective prompts and to understand where to integrate AI. The second is the founder/CTO at a startup, who wants a way to iterate on production prompts without disrupting their existing AI products.

Something we realized early on is that this product could get very complex. If we weren’t careful, we could spend months designing for every feature and edge case. To avoid this, I led the team to identify the top use cases. Based on the workshop and customer discovery, I identified four critical use cases that cover a variety of generative AI scenarios we could support:

Use Cases

Summarization

“I have a bunch of information about local businesses (name, reviews, photos, etc.) and I want to produce a short summary of each to enable my users to make quick decisions about whether to hire the business or not.”

Generation

“My social media team spend a lot of time putting together an email newsletter each month highlighting all the new content on our website. I want to see if this work can be automated by AI and assess the quality of the output compared to historical human editors.”

Chatbot

“I am building a support chatbot capable of responding to user queries by referencing FAQ content. I want to make sure the bot is friendly and its voice is aligned with my brand, but most important is that the bot effectively answers the wide range of questions my users might have.”

Categorization

“I have an internal system that rates each piece of UGC to determine if it is spam or valid content. I want to use AI to assess this content and be able to determine if it is better than my current system or not.”

I did get some pushback on the use cases, as I think it can be difficult for founders to “box themselves in” to a specific user journey. But pairing each use case category with a specific example let my partners feel that the scope was broad enough, while allowing me to get super user-focused and start exploring the pain points a particular user might have.

Putting myself in the user’s shoes

I immersed myself in the role of a user by generating real content for each use case. I would write the prompt and generate the content exactly like a real user would. ChatGPT helped a lot with this. This hands-on approach gave me invaluable insight into the challenges users face when crafting effective prompts, and into the types of features and UX they would need.

It may seem like a small (and time-consuming) detail, but using real content on data-heavy screens helped me strike the tricky balance between UI simplicity and giving the user the information they need to make decisions.

Getting hands-on: Creating the UX & UI

I initiated the design process with low-fidelity wireframes, gathering feedback from the team. This initial feedback unblocked our CTO to start developing our “proof of concept” MVP.

I enjoy sketching, and being able to explore UX decisions and trade-offs super quickly with the team was invaluable.

In the second pass, I transitioned to high-fidelity designs, creating a simple design system and focusing on key pages and interactions. I was definitely rusty when it came to the high-fidelity screens…for a moment I felt I had forgotten how to design, but with a few iterations I was happy with the results.

I enjoyed honing my technical skills with Figma’s newer features like variables and variants. Using these skills, I developed a functional prototype to facilitate user testing and further refine the product.

As a reminder: the goal here was not production-ready mocks, but something thought through enough for engineering to start, and a prototype we could use to show investors the key functionality. To that end, the designs were a success.

The Outcome

By taking my knowledge of the user-centric design thinking approach and shaving it down to its critical steps, I was able to achieve our goals in an incredibly short amount of time.

  • In just one week, we transformed a concept that had been brewing for six months into a viable MVP and compelling pitch deck.

  • We opened discussions with several investors, a sign of interest in funding our proposal.

  • I was reminded that I still can hustle as a founding designer - and enjoy it!

Key Learnings

  1. You have to balance execution with vision: Working with an early-stage startup necessitates a balance between maintaining a flexible vision and addressing the team's immediate needs. It requires constant vigilance to avoid excessive fine-tuning in the absence of a concrete product. You need to be constantly asking of your design work: “What is the most critical thing the business needs me to deliver?”

  2. The power of real content: Even in a highly technical product, authentic content is paramount. Using real data allowed us to grasp the nuances of user interactions and ensured the end product addressed real-world needs.

  3. AI will change how we design: This project provided a glimpse of how AI tools will revolutionize design processes. They summarised my workshop and customer discovery, helped write personas and generated illustrations for the prototype. Staying at the forefront of future tools in the space will be critical for highly effective design teams.

  4. I am (still) a hands-on builder: By the end of my six years at Yelp I was starting to feel more distant from the work. Getting back to my roots as a founding designer confirmed that not only do I still enjoy the hands-on work, but I can also produce something of reasonable quality.

Further Reading

  • It was fun innovating and getting hands-on with a founding team - I did a similar task when I helped re-invent the Services space at Yelp and launched a new product. Read the case study.

  • Learn about how I took the Yelp Consumer design team from a struggling team of underperformers to a team of leaders that transformed the product. Read the case study.