One of Amplitude’s main use cases is monitoring core metrics and understanding their trends. Today, there are multiple ways users seek to understand changes in their core metrics within Amplitude.
The ways our users currently monitor metrics and decide whether a change is meaningful are unscientific and full of noise, leading to wasted resources spent investigating these fluctuations.
April - July 2020 (4 months)
Figma, JIRA, Amplitude
So how does Amplitude's growth team fit into the rest of the product org?
Imagine growth as the resource utilized to test and validate concepts quickly and iteratively before investing resources full-time in those problem areas.
Growth assisted in meaningful discoveries around risk versus reward in various product teams' bets, before having to commit those to a roadmap.
Users need to see and understand the thing, apply and do the thing, and interpret the thing. That means:
Our solution offers value to all of Amplitude's personas while focusing on the optimal solution for our project's target persona. We built a tool that empowers users to uncover meaningful fluctuations quickly without reliance on others to complete their jobs to be done. The design also creates trust between the user and how Amplitude's model applies parameters in order to produce accurate results.
This stage helps us orient on a path forward with our projects. We solidify confidence in the path we're building to paint an end-to-end picture for optimal feature success. We hypothesized outcomes that might derive directly from the impact features we built to help stakeholders understand the value customers see with the tool.
Product teams need the ability to detect anomalies in their data. This is the base-level MVP for this project that will enable Analytics to build P2+3 to reach maximum value.
Simultaneously monitor all parent metrics and subgroups without needing to define individual monitors for each permutation. This allows product resources to focus on other (more impactful) areas.
Users should be able to quickly distinguish what caused an anomaly whether it be instrumentation or user behavior change. What used to require a data scientist can now be done by anyone.
We’re creating the foundation for the Analytics team to build upon while utilizing fewer company resources. We anticipate that completing phases 1-3 will drive maximum customer value from the feature.
A primary use case for sharing within Amplitude is when users find something "insightful". We hypothesized that users who identified anomalies would share frequently with colleagues, increasing product engagement.
Amplitude isn't just for our data scientists. We want to enable all personas across product teams to detect anomalies as they occur, while utilizing fewer resources to gain those insights.
% of weekly active users that have a “chart compute” event with the anomaly feature turned on.
The % of users utilizing the feature in week 2. Our proxy was our "compare to past" feature, with a 24% 2-week retention.
Broadcasted learnings (BL) are the # of “notebooks” shared between users within an org. Our target was 100 weekly BL.
Discovery took 2 weeks. Validating design iterations through usability testing added another 2-3 weeks of calls, iterating, and re-testing. Research wasn’t linear and was spread out over 3 months.
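The three success metrics above can be sketched as simple calculations over an event log. This is a minimal illustration, not Amplitude's actual instrumentation: the `Event` record, its field names, the "notebook shared" event name, and the reading of 2-week retention as "week-1 feature users who return in week 2" are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical event record; the field names are assumptions."""
    user_id: str
    week: int                  # 1 = first week after launch
    name: str                  # e.g. "chart compute", "notebook shared"
    anomaly_on: bool = False   # was the anomaly feature turned on?


def feature_users(events, week):
    """Users with a "chart compute" event with the anomaly feature on."""
    return {
        e.user_id
        for e in events
        if e.week == week and e.name == "chart compute" and e.anomaly_on
    }


def adoption_rate(events, week):
    """Share of that week's active users who used the anomaly feature."""
    active = {e.user_id for e in events if e.week == week}
    if not active:
        return 0.0
    return len(feature_users(events, week)) / len(active)


def two_week_retention(events):
    """Share of week-1 feature users who used the feature again in week 2."""
    first = feature_users(events, 1)
    if not first:
        return 0.0
    return len(first & feature_users(events, 2)) / len(first)


def broadcasted_learnings(events, week):
    """Number of notebooks shared in the given week."""
    return sum(
        1 for e in events if e.week == week and e.name == "notebook shared"
    )
```

Under these assumptions, the 24% proxy from "compare to past" would be the value to compare against `two_week_retention`, and 100 would be the weekly target for `broadcasted_learnings`.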
We held 30+ customer call sessions total (combining discovery calls and usability testing), across 17 organizations of all segment verticals and sizes.
We targeted Amplitude users who fit research criteria by analyzing which users engaged with the "compare to past" feature in the last 90 days. Additionally, we had contacts provided by CSMs of customers who had requested this feature.
Testing with the prototype revealed three main feature asks. We broke down the feedback from each call and grouped it into high-level themes. This created a bigger picture of our customer asks: rather than focusing on the specific feature desires mentioned in each call, we looked for the root of those asks so we could apply those concepts holistically throughout the feature.
Through research we discovered our pioneers were our target personas. Data scientists can typically spot outliers or anomalies at a high-level glance due to constant involvement in the data, and the role of the aspiring pioneer is out of scope for targeting these types of fluctuations within their roles. Product managers would benefit most from this enablement, allowing them to utilize fewer resources and spot statistically significant changes more quickly.
During the testing and iterating phase, we presented a few different ideas with workable prototypes in a staging environment for research cohorts to engage with. This helped scope down design iterations by highlighting the trade-offs in each as told through the users' interaction.
It was important to see the "a-ha" moment happen in real-time. Using our north star for the project, we knew we had to create a strong foundation for this tool in order to create a case for prioritizing the continuation of feature builds for added value. If users can't find the feature, understand what it's doing, or trust the model, it will fail to get the adoption and engagement necessary to deem it worthy of dedicating engineering resources to it.
Remain consistent with our design system, chart patterns, and any other tools that have a similar behavior expectancy with state changes to maintain existing trust with users.
Enable sophistication for more experienced users to configure charts based on use case without limitations set by the target persona.
Offer transparency in how the model is configured, i.e., which set of parameters is being applied, to build trust with users by displaying that logic.
So how did we create an intuitive experience for users with the expected behavior of the feature, and the interaction with the button itself?
We landed on a button that functions as a toggle, rather than an actual toggle. An actual toggle would add a component to our product that doesn't currently exist, and it would also create confusion and inconsistencies with our design patterns throughout Amplitude. Color and tag affordances were used to signify the state of the chart.
Selecting it turns it on, automatically defining a default mode. This displays the smart default tags that the mode is applying to the chart. Users can still edit global settings easily through the button or by selecting the tags.
Throughout the research, one thing remained consistent: users loved smart defaults.
We used smart defaults to represent the parameters being applied to the chart through the associated “modes”. This cleared up confusion around whether users could trust their selection and which setting might be perceived (or guessed) as "best".
For forecasting, the tag defaults to an empty state but provides access to layer in added complexity without having to open the settings. This was done to preserve the discoverability of the feature.
One of the biggest pain points in the tool was how users formed an understanding of what “model” we apply to the chart and how results are being calculated. People already trust Amplitude to run the analysis for them. What if we chose the parameters that best fit a “mode”? This would:
1. Give personas across varying data proficiency levels equal value from the tool
2. Remove the doubt in “which parameter is right for my results” by wrapping them into industry-standard modes
3. Allow users to build upon their growth in proficiency without a steep learning curve between levels
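The idea of wrapping parameters into modes with smart defaults can be sketched as a small configuration layer. Everything named here is hypothetical: the mode names, the parameters (`confidence_level`, `training_window_days`, `forecast_days`), and their values are placeholders for illustration, not the modes or settings Amplitude shipped.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ModeSettings:
    """Hypothetical parameter set applied to a chart by a mode."""
    confidence_level: float
    training_window_days: int
    forecast_days: int = 0     # 0 = forecasting tag starts empty


# Placeholder mode names and values, for illustration only.
MODES = {
    "sensitive": ModeSettings(confidence_level=0.90, training_window_days=28),
    "balanced": ModeSettings(confidence_level=0.95, training_window_days=60),
    "strict": ModeSettings(confidence_level=0.99, training_window_days=120),
}
DEFAULT_MODE = "balanced"


def apply_mode(name=DEFAULT_MODE, **overrides):
    """Return the mode's smart defaults, with optional per-chart overrides."""
    if name not in MODES:
        raise ValueError(f"unknown mode: {name!r}")
    return replace(MODES[name], **overrides)


def tags(settings):
    """Labels shown next to the button so users can see what is applied."""
    labels = [
        f"{settings.confidence_level:.0%} confidence",
        f"{settings.training_window_days}-day training window",
    ]
    if settings.forecast_days:
        labels.append(f"{settings.forecast_days}-day forecast")
    return labels
```

In this sketch, turning the feature on corresponds to `apply_mode()` with no arguments, and a more experienced user editing a tag corresponds to passing an override such as `apply_mode("strict", forecast_days=14)`.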
We landed on a hovering effect for each segment (line) in order to offer the ability to analyze anomalous behaviors between several related metrics. Up to 10 segments can be applied by default, which is the most information we could display before the design became compromised (very messy!).
Hovering solved the problem of needing to show more context on the chart all at once with the confidence bands and forecasting parameters, while also allowing users to investigate and isolate anomalies easily.
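To make the confidence band concrete, here is a minimal sketch of flagging points that fall outside a band. The model Amplitude applies is not described in this case study, so this substitutes a simple stand-in: a rolling mean plus or minus `z` sample standard deviations over the preceding `window` points. The function names and default values are assumptions.

```python
from statistics import mean, stdev


def confidence_band(history, z=1.96):
    """Lower and upper bounds: mean of history +/- z sample std deviations."""
    center = mean(history)
    spread = stdev(history)
    return center - z * spread, center + z * spread


def find_anomalies(series, window=7, z=1.96):
    """Indexes of points that fall outside the band built from the
    `window` points immediately before them."""
    if window < 2:
        raise ValueError("window must be at least 2 to estimate spread")
    anomalies = []
    for i in range(window, len(series)):
        low, high = confidence_band(series[i - window:i], z)
        if not low <= series[i] <= high:
            anomalies.append(i)
    return anomalies


def anomalies_by_segment(segments, window=7, z=1.96):
    """Run the same check for each segment (line) on the chart."""
    return {
        name: find_anomalies(values, window, z)
        for name, values in segments.items()
    }
```

Keeping the band per segment mirrors the hover interaction: each line carries its own band and anomalies, so only the hovered segment's context needs to be drawn.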
I was able to align the design team to agree on reserving one color from the design system to be used only in the case of displaying an anomalous data point to avoid further confusion amongst all the noise and "blue" sea of features in Amplitude.
Alert users as anomalies happen in real-time. Any anomaly from more than 1 day ago is old news that makes this capability less useful.
Quickly answer users’ first question when they see an anomaly: is this an instrumentation issue or user behavior change?
The high-level objective of anomalies + forecasting was to set the Analytics team up for success through a strong foundation built early and quickly during an uncertain time (COVID-19, 2020).
Keeping our process focused meant that many questions surrounding key parts of the original customer feature requests fell outside our scope. This led to time and resources being spent on finding ways around those constraints: without guiding questions, and without being able to tell what caused an anomaly, it was difficult for customers to pinpoint the value of this feature.
The project's highlight was working collaboratively with my teammates on a machine learning project, which was a first for me! I was able to learn more about how we can utilize data to prevent future failures, and how our previous patterns can inform our decisions in a meaningful way.