One of Amplitude’s main use cases is monitoring core metrics and understanding their trends. Today, users take several different approaches to understanding changes in those metrics within Amplitude.
The current ways our users monitor metrics and decide whether a change is meaningful are unscientific and noisy, leading to wasted resources spent investigating ordinary fluctuations.
Design Lead
April - July 2020 (4 months)
Figma, JIRA, Amplitude
So how does Amplitude's growth team fit into the rest of the product org?
Think of Growth as the team used to test and validate concepts quickly and iteratively before the company invests full-time resources in those problem areas.
Growth helped surface meaningful discoveries about the risk versus reward of various product teams' bets before those bets had to be committed to a roadmap.
Users need to see and understand the feature, apply it in their work, and interpret the results. That means:
Our solution offers value to all of Amplitude's personas while optimizing for our project's target persona. We built a tool that empowers users to uncover meaningful fluctuations quickly, without relying on others to complete their jobs to be done. The design also builds trust by making it clear how Amplitude's model applies parameters to produce accurate results.
This stage helps us orient on a path forward for the project. It solidifies confidence in the direction we're building and paints an end-to-end picture of what feature success looks like. We hypothesized outcomes that might derive directly from the features we built, to help stakeholders understand the value customers see in the tool.
Product teams should have the ability to detect anomalies in their data. This is the base-level MVP for this project.
Simultaneously monitor all parent metrics and subgroups without needing to define individual monitors for each permutation (a rough sketch of this idea follows this list).
Users should be able to quickly distinguish what caused an anomaly, whether it was an instrumentation issue or a user behavior change.
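As a rough illustration of the "every permutation" goal above, here is a minimal sketch assuming metric data is available as a long-format table with date, value, and segment columns; the column names and the simple z-score check are hypothetical stand-ins, not Amplitude's actual model.

```python
import pandas as pd

def latest_point_is_anomalous(values: pd.Series, z_threshold: float = 3.0) -> bool:
    """Stand-in check: is the most recent value far from the trailing mean?"""
    if len(values) < 15:
        return False
    history, latest = values.iloc[:-1], values.iloc[-1]
    std = history.std()
    if std == 0:
        return False
    return abs(latest - history.mean()) / std > z_threshold

def scan_all_segments(df: pd.DataFrame, group_cols: list[str]) -> list[tuple]:
    """Run one shared check over the parent metric and every subgroup,
    so nobody has to configure a monitor per permutation by hand."""
    flagged = []
    # Parent metric: all segments rolled up into a single daily series.
    parent = df.groupby("date")["value"].sum().sort_index()
    if latest_point_is_anomalous(parent):
        flagged.append(("ALL",))
    # Each subgroup permutation (e.g., platform x country) gets the same check.
    for keys, group in df.groupby(group_cols):
        series = group.groupby("date")["value"].sum().sort_index()
        if latest_point_is_anomalous(series):
            flagged.append(keys if isinstance(keys, tuple) else (keys,))
    return flagged
```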
We laid a foundation for our Analytics team to continue building on, maximizing the value customers gain from the feature.
A primary use case for sharing within Amplitude is when users find something "insightful". We hypothesized that users who identified anomalies would share them frequently.
Amplitude isn't just for our data scientists. We want to enable product teams to detect anomalies as they occur, while using fewer resources to gain those insights.
% of weekly active users that have a “chart compute” event with the anomaly feature turned on (a sketch of how this might be computed follows below).
The % of users still utilizing the feature in week 2. Our proxy was our "compare to past" feature, which has a 24% 2-week retention.
Broadcasted learnings are the number of notebooks and dashboards shared between users within an org. Our target was 100 broadcasted learnings per week.
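For concreteness, here is a hypothetical sketch of how the adoption metric above could be computed from raw event data; the table shape and column names (user_id, event_type, anomaly_enabled, event_week) are assumptions for illustration, not Amplitude's actual schema.

```python
import pandas as pd

def weekly_anomaly_adoption(events: pd.DataFrame) -> pd.Series:
    """% of weekly active users whose week includes a 'chart compute'
    event with the anomaly feature turned on (illustrative schema only)."""
    weekly_active = events.groupby("event_week")["user_id"].nunique()
    adopters = (
        events[(events["event_type"] == "chart compute") & events["anomaly_enabled"]]
        .groupby("event_week")["user_id"]
        .nunique()
    )
    return (adopters / weekly_active * 100).fillna(0).round(1)
```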
An overview of the UX process applied to this project: what we started with, who we targeted, how we approached research, what we discovered, and the outcome. There is no specialized user researcher role at Amplitude, so this process is part of the product designer's role expectations.
Initial research done prior to Growth's ownership of this project revealed that "pioneers" were our target personas.
Data scientists can typically spot outliers or anomalies at a high-level glance because of their constant involvement with the data, while targeting these types of fluctuations is out of scope for the aspiring pioneer's role.
Product managers would benefit most from this enablement, allowing them to use fewer resources and spot statistically significant changes more quickly.
Our discovery phase took 2 weeks. Once we moved into validating design iterations through usability testing, that added another 2-3 weeks of calls, iterating, and re-testing. The research was not linear and was spread out over the course of 3 months.
We held 30+ customer call sessions in total (combining discovery calls and usability testing), with 17 organizations spanning all segment verticals and sizes. These included customers such as Peloton, T-Mobile, Hubspot, and Mozilla, to name a few. All calls were conducted via Zoom.
We initially identified Amplitude users who fit our research criteria by analyzing which users had engaged with the "compare to past" feature in the last 90 days. Additionally, we received a list of customers who had previously expressed interest in a potential "anomalies" feature via the ticketing system or through CSMs.
The prototype we used in testing revealed three main feature asks that were grouped into high-level themes.
Using the feedback provided, we broke the key pieces of feedback down into themes. This created a bigger picture of our customers' asks: instead of focusing on the specific feature desires mentioned in each call, we identified the root of those asks so the underlying concepts could be applied holistically throughout the feature.
In early iterations, we weren’t clear enough in differentiating future versus past data when forecasting, or partial data versus expected values.
The nature of the tool was confusing and complex; we needed our UX writing to speak to users, not robots.
The most common inquiry was how users could trust what they were seeing and understand which parameters were best to use for their insights.
During the testing and iterating phase, we presented a few different ideas with live prototypes in our dev environment for our cohorts to engage with. This helped us narrow down design iterations by highlighting the trade-offs in each, as revealed through users' interactions.
It was important to see the "a-ha" moment happen in real time. Guided by our north star for the project, we knew we had to create a strong foundation for this tool in order to make the case for continuing to build added value on top of it. If users can't find the feature, understand what it's doing, or trust the model, it will fail to get the adoption and engagement necessary to justify dedicating engineering resources to it.
We needed to ensure we remained consistent with our design system, chart patterns, and any other tools with similar expected behavior around state changes.
The UI in Amplitude is complicated and at times cluttered. The design didn’t have existing patterns that represented state changes while simultaneously layering in more data. We needed to nail clarity around the behavior of the tool.
Users said their biggest concern was trusting the tool. To adopt it, they needed to trust that the results were accurate, as well as be able to communicate to their stakeholders how they uncovered that output. It raised the question from our customers: "How can I trust this tool more than my Data Scientist?"
So how did we create an intuitive experience for users, both in the expected behavior of the feature and in the interaction with the button itself?
We landed on a button that functions as a toggle, rather than an actual toggle. An actual toggle would have added a component that doesn't currently exist in our product, and it would have created confusion and inconsistencies with our design patterns throughout Amplitude. Color and tag affordances were used to signify the state of the chart.
Selecting the button turns the feature on and automatically applies a default mode. This displays the smart-default tags the mode is applying to the chart. Users can still edit global settings easily through the button or by selecting the tags.
Throughout the research, one thing remained consistent: users loved smart defaults.
We used smart defaults to represent the parameters being applied to the chart through the associated “modes”. This cleared up confusion around whether users could trust their selection and which settings might be perceived (or guessed) as "best".
For forecasting, the tag defaults to an empty state but provides access to layer in added complexity without having to open the settings. This was done to preserve the discoverability of the feature.
One of the biggest pain points in the tool was how users formed an understanding of what “model” we apply to the chart and how results are calculated. Each user had a different guess as to what information the model might use to produce an output.
Some users thought the standard might be 60 days of data; others thought it meant the confidence bands needed to be set to a higher percentage.
Using modes allowed us to encompass multiple personas at once: reducing confusion for the less experienced, allowing flexibility for the more experienced, and building trust in the model for everyone through transparency and best practices.
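To ground what users were trying to reason about, here is a minimal sketch assuming a simple rolling confidence band: a "mode" bundles exactly the choices users were guessing at (how much trailing data to use, how wide the band is). This is illustrative only, not Amplitude's actual model.

```python
import pandas as pd

# Hypothetical "mode" presets: each bundles the parameters a user would
# otherwise have to guess at (training window, confidence-band width).
MODES = {
    "agile":  {"window_days": 30, "band_pct": 0.95},
    "robust": {"window_days": 60, "band_pct": 0.99},
}

def flag_anomalies(series: pd.Series, mode: str = "robust") -> pd.DataFrame:
    """Flag points that fall outside a rolling confidence band.

    `series` is a daily metric indexed by date; the band logic is a
    stand-in for whatever model the product actually uses.
    """
    params = MODES[mode]
    window = params["window_days"]
    # Approximate two-sided z-scores for the chosen band width.
    z = {0.95: 1.96, 0.99: 2.58}[params["band_pct"]]

    rolling = series.rolling(window, min_periods=window // 2)
    # Trailing stats only, so a point never influences its own band.
    mean, std = rolling.mean().shift(1), rolling.std().shift(1)

    out = pd.DataFrame({
        "value": series,
        "lower": mean - z * std,
        "upper": mean + z * std,
    })
    out["is_anomaly"] = (out["value"] < out["lower"]) | (out["value"] > out["upper"])
    return out
```

Switching the mode name is all a user has to do; the statistical choices ride along as smart defaults.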
We landed on a hovering effect for each segment (line) in order to offer the ability to analyze anomalous behaviors across several related metrics. Up to 10 segments can be applied by default, which is the maximum amount of information that can be displayed before the design becomes compromised (very messy!).
Hovering solved the problem of needing to show more context on the chart all at once (the confidence bands and forecasting parameters), while also allowing users to investigate and isolate anomalies easily.
I aligned the design team on reserving one color from the design system to be used only for displaying an anomalous data point, avoiding further confusion amid all the noise and the "blue" sea of features in Amplitude.
Launch and measure! Building machine learning features takes a while, so it could be quite a bit of time before you see results for the success metrics listed here.
Alert users as anomalies happen in real time. Any anomaly from more than 1 day ago is old news, which makes this capability less useful.
Quickly answer users’ first question when they see an anomaly: is this an instrumentation issue or a user behavior change?
The high-level objective of anomalies + forecasting was to set the Analytics team up for success through a strong foundation, built early and quickly during an uncertain time (COVID, 2020).
Keeping our process focused meant many questions touched on key parts of the original customer feature requests that were not within our scope. This led to time and resources spent figuring out ways around those constraints: without guiding questions, it was difficult to pull insights from customers and for them to pinpoint the value of this feature when they couldn't tell what had caused the anomaly.
The highlight of this project was working collaboratively with my teammates on a machine learning feature, which was a first for me! I learned more about how we can use data to prevent future failures, and how past patterns can inform our decisions in a meaningful way.