Granola
Granola is an AI-powered notepad for professionals with heavy meeting schedules. It captures microphone input and system audio to transcribe conversations without meeting bots, allowing users to take manual notes that the AI later structures into clear summaries and action items.
Automated Testing
We transformed Granola’s manual QA workflow into an efficient pipeline by building a custom automation framework from scratch. Using Playwright, Electron, and GitHub Actions, we automated 76% of their core regression suite across macOS and Windows, drastically speeding up their weekly release cycles.
AI Testing
Testing Granola’s dynamic AI meeting notes required a specialized approach beyond standard exact-match validations. We thoroughly evaluated the LLM for factual accuracy and integrated AI directly into our own testing scripts to verify that generated summaries consistently captured the right context and actionable insights.
Introduction
Granola functions as an AI notepad that works alongside the user during meetings. Instead of relying on virtual bots to join and record calls, the app captures audio directly from the computer while the user types their own notes. Once the meeting ends, Granola’s AI combines the user’s manual input with the full transcript to generate organized summaries, key takeaways, and action items. Users retain full control to edit these notes, apply custom templates for specific meeting types like 1-on-1s or discovery calls, and export the finalized information directly to platforms like Slack, CRMs, or project boards.
Challenge
As Granola’s user base and feature set expanded, the team recognized the need for a dedicated approach to quality assurance. Before partnering with QAwerk, Granola operated without an internal QA department, leaving developers and customer support teams to handle all testing alongside their primary duties. This setup, combined with a lack of diverse test devices, made it difficult to maintain stability during their rapid, weekly release schedule.
Granola turned to QAwerk to take full ownership of their software quality. To make our partnership a success and achieve tangible business outcomes, we needed to overcome several specific challenges:
- Keep Pace with Rapid Releases: With new features rolling out approximately once a week, we had to implement a reliable testing routine that fit Granola’s development cycle and ensured quality without delaying deployments.
- Expand Device and Hardware Coverage: Because Granola handles complex audio capture and runs in the background, it needs to perform flawlessly on varying setups. We had to utilize QAwerk’s extensive device infrastructure to thoroughly test the app across a wide range of hardware configurations and operating systems that Granola’s internal team didn’t have access to. We also validated its compatibility across various audio setups, including built-in audio devices, external wired speakers, wired headsets, and Bluetooth headsets.
- Relieve Developers and Support Teams: We needed to take the testing burden off Granola’s internal staff by independently investigating user-reported bugs, isolating the root causes, and creating clear, comprehensive test documentation from scratch.
- Build Test Automation from Scratch: To ensure long-term efficiency and product stability amidst frequent updates, a major goal was designing and implementing a robust test automation framework to handle regression testing and free up time for deeper manual exploration.
Solution
We took full ownership of Granola’s QA workflow to support their rapid release cycle. We built comprehensive test documentation from scratch using Qase.io (maintaining over 1,100 test cases), reported 90 bugs within the first two months, and implemented a multi-layered testing strategy tailored to an AI-driven, cross-platform application.
Here is a breakdown of the specific testing solutions we implemented:
- Functional Testing. We validated core macOS and Windows features, including transcription, AI note generation, offline modes, and audio device switching. We also supported major rollouts like Microsoft login, Recipes, MCP server, Granola Chat updates, Pre-Meeting Briefs, and the February 2026 rebranding to ensure the app performed dependably under real-world conditions.
- Regression & AI-Assisted Test Selection. To keep up with weekly updates without bottlenecking development, we ran targeted regression suites. Partnering with Granola engineers, we built a smart AI flow that analyzes pull requests and automatically selects the most relevant test cases.
- Dedicated iOS Testing. To support Granola’s mobile growth, we set up a structured testing process entirely dedicated to iOS. This included building out the necessary test documentation and running strict pre-release regression cycles. Our team rigorously tested the mobile app across different software versions (iOS 17, 18, and 26) and device form factors (regular, mini, and Max screens) to guarantee a consistent user experience on any iPhone.
- Test Automation Framework. We automated 76% of the critical regression suite and set up direct Slack reporting. To achieve this, we solved complex technical hurdles: bypassing Google SSO authorizations, injecting system audio directly into tests, and using AI within our scripts to verify Granola’s AI meeting summaries.
- Integration Testing. We tested Granola’s connections to essential external tools like Outlook, Google Calendar, Stripe, Slack, HubSpot, and Zapier, guaranteeing users could seamlessly export meeting data without synchronization errors.
- Cross-Platform & Compatibility Testing. Leveraging QAwerk’s extensive device lab, we verified app stability across various macOS and Windows hardware configurations, as well as across multiple iPhone models and iOS versions, ensuring a consistent experience regardless of the user’s setup.
- LLM Testing. We evaluated the AI’s response quality, coherence, and factual accuracy across a range of real-world meeting scenarios. This included testing the model’s robustness against edge cases such as ambiguous audio input, checking its output for hallucinations, and ensuring it maintained accurate context throughout complex, multi-turn conversations.
- Usability Testing. To ensure the app was intuitive and distraction-free, our QA engineers approached testing strictly from the end-user’s perspective. We collaborated closely with Granola’s product team to refine the user experience, sharing actionable suggestions like clarifying error messages and adding helpful UI hints.
- API Testing. We conducted thorough testing of Granola’s API endpoints to ensure reliable data exchange, stable integrations, and seamless backend communication.
- Migration Testing. To guarantee data integrity during app updates, we performed strict migration testing on every release candidate. This ensured that users’ sensitive meeting notes, templates, and workspace settings were safely preserved when transitioning to the newest version.
By actively investigating user reviews, handling ad-hoc testing requests, and covering all new functionality, we provided Granola’s team with the confidence to release updates quickly.
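To make the test-selection idea from the list above more concrete, here is a deliberately simplified stand-in. It is not the AI flow built with Granola's engineers: instead of a model, it uses a fixed mapping from changed file paths to test-case tags. Every name in it (`TestCase`, `AREA_RULES`, `selectTests`, the paths and tags) is hypothetical and chosen only for illustration.

```typescript
// Hypothetical shape of a test case exported from a test management tool.
interface TestCase {
  id: string;
  title: string;
  tags: string[];
}

// Assumed mapping from source path prefixes to feature tags (illustrative only).
const AREA_RULES: Array<{ prefix: string; tag: string }> = [
  { prefix: "src/audio/", tag: "audio" },
  { prefix: "src/auth/", tag: "login" },
  { prefix: "src/notes/", tag: "notes" },
];

// Cases carrying this tag run on every pull request, whatever the diff touches.
const SMOKE_TAG = "smoke";

// Collect the tags implied by the files changed in a pull request.
function tagsForChangedFiles(changedFiles: string[]): Set<string> {
  const tags = new Set<string>([SMOKE_TAG]);
  for (const file of changedFiles) {
    for (const rule of AREA_RULES) {
      if (file.startsWith(rule.prefix)) tags.add(rule.tag);
    }
  }
  return tags;
}

// Keep only the test cases that share at least one tag with the change set.
function selectTests(changedFiles: string[], allCases: TestCase[]): TestCase[] {
  const wanted = tagsForChangedFiles(changedFiles);
  return allCases.filter((c) => c.tags.some((t) => wanted.has(t)));
}
```

A model-based selector would replace `tagsForChangedFiles` with a step that reads the diff itself, but the surrounding contract (changed files in, a subset of test cases out) could plausibly stay the same.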
Test Automation
To accelerate Granola’s release cycle and drastically reduce manual regression time, we built a scalable test automation framework. We prioritized automating stable, core functionalities like onboarding, note management, and workspace sharing, leaving hardware-dependent and external payment scenarios to the manual QA team to keep the automated suite fast and reliable.
The framework was built using TypeScript and Playwright to automate testing for their Electron-based desktop app across both macOS and Windows. We used the Page Object Model (POM) architecture to keep the test suite easy to maintain as the app evolves, and we integrated visual regression testing to automatically catch unintended UI layout shifts.
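As a rough illustration of the Page Object Model idea (not the actual framework code), the sketch below wraps a few UI actions behind a small class. The `PageLike` interface lists only the three methods used here; Playwright's own `Page` type matches it structurally, so a window object from Playwright could be passed in unchanged. The class name, method names, and `data-testid` selectors are all assumptions made for the example.

```typescript
// Minimal subset of Playwright's Page API that this page object relies on.
interface PageLike {
  click(selector: string): Promise<void>;
  fill(selector: string, value: string): Promise<void>;
  textContent(selector: string): Promise<string | null>;
}

// Page object for a hypothetical notes screen. Selectors live in one place,
// so a UI change means updating this class rather than every test.
class NotesPage {
  // Hypothetical selectors; the real app's test ids are not known here.
  private readonly newNoteButton = '[data-testid="new-note"]';
  private readonly titleInput = '[data-testid="note-title"]';
  private readonly firstListItem = '[data-testid="note-list-item"]';
  private readonly page: PageLike;

  constructor(page: PageLike) {
    this.page = page;
  }

  // Create a note and give it a title.
  async createNote(title: string): Promise<void> {
    await this.page.click(this.newNoteButton);
    await this.page.fill(this.titleInput, title);
  }

  // Read the title of the first note in the list, trimmed of whitespace.
  async firstNoteTitle(): Promise<string> {
    const text = await this.page.textContent(this.firstListItem);
    return (text ?? "").trim();
  }
}
```

In an Electron setup, the `page` argument would typically be the window returned by `firstWindow()` after launching the app with `_electron.launch()` from the `playwright` package. Because the class depends only on `PageLike`, it can also be exercised with a plain in-memory fake.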
Automating an AI-driven app required creative problem-solving. Because AI-generated notes vary by design, we couldn’t rely on exact-match verifications. Moving beyond simple keyword validation, we implemented an AI-assisted testing approach, using LLMs to intelligently analyze the context and accuracy of the outputs. We also bypassed unstable third-party logins (like Google SSO) by implementing a secure, internal authentication helper to prevent false test failures.
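One common way to check variable AI output is to ask a second model to judge it against a list of expected facts. The sketch below shows that pattern under stated assumptions; it is not the verification code used on this project. The `Judge` type stands for any function that sends a prompt to an LLM and returns its text reply, so no specific provider or client library is assumed, and the prompt wording and the `Verdict` shape are invented for the example.

```typescript
// Verdict the judging model is asked to return as JSON.
interface Verdict {
  pass: boolean;
  missing: string[]; // expected facts the summary failed to cover
}

// Any function that sends a prompt to an LLM and returns its raw text reply.
type Judge = (prompt: string) => Promise<string>;

// Build a prompt that lists the expected facts and the summary under test.
function buildJudgePrompt(summary: string, expectedFacts: string[]): string {
  const facts = expectedFacts.map((f, i) => `${i + 1}. ${f}`).join("\n");
  return [
    "You are checking an AI-generated meeting summary.",
    "Decide whether the summary covers every expected fact below.",
    'Reply with JSON only: {"pass": boolean, "missing": string[]}.',
    "",
    "Expected facts:",
    facts,
    "",
    "Summary:",
    summary,
  ].join("\n");
}

// Extract and validate the JSON verdict from the judge's reply.
function parseVerdict(reply: string): Verdict {
  // Models sometimes wrap JSON in extra prose, so take the outermost braces.
  const start = reply.indexOf("{");
  const end = reply.lastIndexOf("}");
  if (start === -1 || end <= start) {
    throw new Error("No JSON object in judge reply");
  }
  const data = JSON.parse(reply.slice(start, end + 1));
  if (typeof data.pass !== "boolean" || !Array.isArray(data.missing)) {
    throw new Error("Judge reply has an unexpected shape");
  }
  return { pass: data.pass, missing: data.missing.map(String) };
}

// Ask the judge about one summary and return its structured verdict.
async function checkSummary(
  judge: Judge,
  summary: string,
  expectedFacts: string[],
): Promise<Verdict> {
  const reply = await judge(buildJudgePrompt(summary, expectedFacts));
  return parseVerdict(reply);
}
```

A test would then assert on `verdict.pass` and print `verdict.missing` on failure. Because the judge is itself non-deterministic, a malformed reply is treated as an error rather than silently counted as a pass.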
Finally, we integrated the entire suite into Granola’s CI/CD pipeline using GitHub Actions. Tests now run automatically in parallel on macOS and Windows environments, either nightly or before a code merge. To help developers debug quickly, we set up a Slack integration that delivers autotest reports to the team, providing actionable feedback on every build.
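The Slack reporting step can be pictured as a small function that turns test results into a message. The sketch below is an assumption about what such a step might look like, not the report format used on this project: `TestResult`, `buildSlackReport`, and the message layout are hypothetical. The only external fact it relies on is that Slack incoming webhooks accept a JSON body with a `text` field.

```typescript
type Status = "passed" | "failed" | "skipped";

// Hypothetical per-test record produced by a test run.
interface TestResult {
  title: string;
  status: Status;
  platform: "macOS" | "Windows";
}

// Body shape accepted by a Slack incoming webhook.
interface SlackMessage {
  text: string;
}

// Summarize a run: totals on the first line, then one line per failed test.
function buildSlackReport(buildId: string, results: TestResult[]): SlackMessage {
  const count = (s: Status) => results.filter((r) => r.status === s).length;
  const failed = results.filter((r) => r.status === "failed");
  const header =
    `Build ${buildId}: ${count("passed")} passed, ` +
    `${failed.length} failed, ${count("skipped")} skipped`;
  if (failed.length === 0) {
    return { text: `${header}\nAll automated checks passed.` };
  }
  const lines = failed.map((r) => `- [${r.platform}] ${r.title}`);
  return { text: [header, "Failed tests:", ...lines].join("\n") };
}
```

A CI job would serialize the returned object with `JSON.stringify` and POST it to the webhook URL stored as a repository secret. Listing failures by platform keeps the message short while still telling a developer where to look first.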
Bugs Found
The majority of the detected bugs centered on AI chat interactions, workspace synchronization, and subscription management.

Actual result: Granola fails to recognize the attached files.
Expected result: Granola successfully detects the attached files, reads their content, and provides answers based on the provided information.

Actual result: The paid seats counter does not update after removing a team member from the workspace.
Expected result: The counter decreases by one immediately after a team member is removed from the workspace.

Actual result: An error message is displayed when attempting to transfer notes to another workspace if the user has items in their “Trash” folder.
Expected result: Notes should transfer to the new workspace successfully without displaying an error, regardless of any items existing in the “Trash” folder.
Result
Our partnership transformed Granola’s quality assurance from an ad-hoc task into a structured, highly efficient machine. With technical debt and critical bugs out of the way, Granola was able to focus entirely on scaling its product. The business outcomes of this reliable foundation have been noticeable:
- Successful Pivot to Enterprise AI: With a thoroughly tested ecosystem for workspace management, permissions, and cross-account synchronization, Granola confidently expanded from a personal note-taking tool into a full-scale enterprise application. The platform’s rock-solid stability enabled successful adoption by major tech organizations, including Vanta, Gusto, Thumbtack, Asana, and Mistral AI.
- Rapid Adoption and a $1.5B Valuation: A polished, seamless user experience has driven exponential market growth. Granola recently raised $125M, reaching a massive $1.5B valuation. The app’s rapid penetration into the B2B space is further proven by financial data: it ranked among the top 25 fastest-growing tools purchased by Brex customers and appeared on Ramp’s lists for both trending and fastest-growing software vendors.
- Scalable QA Infrastructure: We replaced an undocumented testing process with a comprehensive quality architecture. By writing and maintaining 1,100+ test cases, we ensured total coverage of the app’s features. During our partnership, we uncovered and reported 200+ bugs before they could impact end-users.
- Accelerated Delivery Through Automation: By successfully automating 150+ core regression tests, we drastically reduced the time required to verify new builds. This allows Granola to maintain its aggressive schedule of weekly (and sometimes bi-weekly) releases, guaranteeing that new features roll out flawlessly without breaking existing functionality.
Tools
Qase.io
BrowserStack
Playwright
QAwerk Team Comment
Yevhen
Pentester
Automating an AI-driven app like Granola required a completely different approach since the generated meeting notes vary every single time. Instead of relying on standard exact-match checks, we actually used AI within our own automation scripts to validate the transcripts, allowing us to successfully automate 76% of their core regression suite.




