Part 3 in the Zone7 series: breaking down AI metrics.

In part two, we analyzed the principles that a successful Artificial Intelligence system adheres to:

  • Results are personalized to account for how individual users (athletes, in Zone7’s case) have different needs and physiologies.
  • It continuously learns and improves from new data. Ideally, this happens automatically, through the artificial intelligence system’s ‘algorithms’, or recipes for solving problems.

Without going too deep into “Artificial Intelligence”, let’s briefly break down what it does. To provide truly valuable answers to tough questions, an algorithm aims to build a model from data. First, it analyzes the data, far faster and more comprehensively than a human brain could, looking for patterns or trends. The more data it analyzes, the better it becomes at finding the right patterns. Then, these patterns are applied to a new environment, and the results are measured for accuracy and value.

How do we know it’s accurate?

But what does “accuracy” mean? Let’s use a metaphor: a missile detection system. The system analyzes satellite data in real time and determines whether a missile has been launched at our city. Intuitively, we want this system to accomplish two things:

NEVER miss a launch. Being hit by a missile with no warning can cause devastating damage. Failing to detect a missile launch is called a False Negative.

NEVER raise a false alarm. Mistaking a bird for a missile wastes a lot of energy and resources as we needlessly evacuate our city. False alarms are also called False Positives.
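To make the two error types concrete, here is a minimal Python sketch that compares a detector's alarms against what actually happened and tallies each outcome. The `count_outcomes` function and the five-event log are hypothetical, chosen only for illustration.

```python
def count_outcomes(predicted, actual):
    """Tally the four possible outcomes of a yes/no detector.

    predicted[i] is True if the system raised an alarm for event i;
    actual[i] is True if a missile was really launched.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    counts = {"true_positive": 0, "false_positive": 0,
              "false_negative": 0, "true_negative": 0}
    for alarm, launch in zip(predicted, actual):
        if alarm and launch:
            counts["true_positive"] += 1   # real launch, correctly detected
        elif alarm:
            counts["false_positive"] += 1  # false alarm: a bird mistaken for a missile
        elif launch:
            counts["false_negative"] += 1  # missed launch
        else:
            counts["true_negative"] += 1   # nothing happened, system stayed quiet
    return counts


# Hypothetical log: five sky events, what the system said vs. what really happened.
alarms = [True, False, True, False, False]
launches = [True, True, False, False, False]
print(count_outcomes(alarms, launches))
# {'true_positive': 1, 'false_positive': 1, 'false_negative': 1, 'true_negative': 2}
```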

Realistically, no system can deliver ZERO errors, so there is ALWAYS A TRADEOFF. Squint harder to check whether a bird is perhaps a missile, and you lose peripheral vision and may miss something else in the sky.
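The tradeoff can be illustrated with a detector that raises an alarm whenever a "threat score" crosses a threshold. In this minimal sketch, the scores, labels, and thresholds are all invented for illustration. Lowering the threshold catches more real launches (fewer false negatives) at the cost of more false alarms (more false positives).

```python
def errors_at_threshold(scores, actual, threshold):
    """Return (false_negatives, false_positives) when alarming on score >= threshold."""
    false_negatives = sum(1 for s, a in zip(scores, actual) if a and s < threshold)
    false_positives = sum(1 for s, a in zip(scores, actual) if not a and s >= threshold)
    return false_negatives, false_positives


# Hypothetical threat scores (0-1) for ten sky objects, and whether each was really a missile.
scores = [0.95, 0.80, 0.65, 0.55, 0.40, 0.35, 0.30, 0.20, 0.10, 0.05]
actual = [True, True, False, True, False, False, True, False, False, False]

for threshold in (0.9, 0.6, 0.25):
    fn, fp = errors_at_threshold(scores, actual, threshold)
    print(f"threshold={threshold}: missed launches={fn}, false alarms={fp}")
# threshold=0.9: missed launches=3, false alarms=0
# threshold=0.6: missed launches=2, false alarms=1
# threshold=0.25: missed launches=0, false alarms=3
```

No threshold in this toy example drives both numbers to zero, which is the point: you choose where on the curve to sit.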

A good place to learn more about this is, of course, Wikipedia's page on sensitivity and specificity.


At Zone7, we validate our success with the same measures that scientists use to assess the validity of medical diagnostic tests.
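Sensitivity is the share of real events a test catches, TP / (TP + FN); specificity is the share of non-events it correctly leaves alone, TN / (TN + FP). A minimal Python sketch of these standard definitions, using made-up counts that are not Zone7 results:

```python
def sensitivity(tp, fn):
    """True positive rate: share of real events caught, TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True negative rate: share of non-events correctly not flagged, TN / (TN + FP)."""
    return tn / (tn + fp)


# Hypothetical counts from an evaluation run (illustration only).
tp, fn, tn, fp = 30, 10, 540, 60
print(f"sensitivity = {sensitivity(tp, fn):.2f}")  # 30 / 40 -> 0.75
print(f"specificity = {specificity(tn, fp):.2f}")  # 540 / 600 -> 0.90
```

A missed launch lowers sensitivity; a false alarm lowers specificity.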

It is important to understand that the relative importance of a false negative (oops… missed that missile launch, sorry!) and a false positive (oops, no, that actually was not a hostile missile heading toward our city) varies with context. For COVID-19 tests, for example, a false negative is disastrous.[1]
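One common way to express this context dependence is to give each error type a cost and compare the total. The costs and the two detector settings below are arbitrary illustration values, not figures from Zone7 or from any COVID-19 study.

```python
def total_cost(false_negatives, false_positives, cost_fn, cost_fp):
    """Weighted error cost: each miss costs cost_fn, each false alarm costs cost_fp."""
    return false_negatives * cost_fn + false_positives * cost_fp


# Two hypothetical detector settings, as (false negatives, false positives).
cautious = (1, 20)  # rarely misses, raises many false alarms
quiet = (5, 2)      # few false alarms, but misses more

# When a miss is far costlier than a false alarm, the cautious setting is cheaper...
print(total_cost(*cautious, cost_fn=100, cost_fp=1), total_cost(*quiet, cost_fn=100, cost_fp=1))
# 120 502
# ...but when the two errors cost about the same, the quiet setting wins.
print(total_cost(*cautious, cost_fn=2, cost_fp=1), total_cost(*quiet, cost_fn=2, cost_fp=1))
# 22 12
```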

Accuracy in the context of what Zone7 does

In sports performance, understanding the context in which AI operates is key. Zone7's algorithms analyze the data leading up to tens of thousands of injuries and identify complex patterns that precede these incidents. We then analyze each individual's data in real time and ask:

"Are the injury-related patterns present in today's data for this individual?" If they are present with a high enough degree of confidence, we then ask:

"What should tomorrow's workload/recovery/rest look like to counter these patterns?" This enables personalized training programs that reduce such injuries.

Zone7 optimizes a difficult tradeoff. If you flag too many players as at risk, you may hamper effective training. If you flag too few, you risk missing a disastrous injury that should have been detected and prevented. Building on our dataset of tens of thousands of injury incidents AND on what we have learned from collaborating with hundreds of coaches, doctors, and trainers, we can manage this tradeoff to best fit each specific environment.

Does Zone7 work? Yes. We know because we continuously test it using the same approach that scientists use to validate diagnostic tests.

Generally speaking, we identify 70%-75% of injury incidents ahead of time while keeping daily alerts to 10%-15% of the squad. In a typical soccer squad, this means 2–3 alerts per day, allowing staff to intervene before 70%-75% of injuries occur.
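A quick back-of-the-envelope check of these figures in Python. The squad size of 20 is our assumption (the post does not state it); it is the size at which 10%-15% works out to 2–3 alerts. The injuries-per-season figure is also hypothetical.

```python
SQUAD_SIZE = 20                # assumption: roughly 20 players in the squad
ALERT_SHARE = (0.10, 0.15)     # share of the squad flagged per day (from the post)
DETECTION_RATE = (0.70, 0.75)  # share of injuries flagged ahead of time (from the post)

low, high = [SQUAD_SIZE * share for share in ALERT_SHARE]
print(f"alerts per day: {low:.0f}-{high:.0f}")  # alerts per day: 2-3

injuries_per_season = 12  # hypothetical, for illustration only
flagged = [injuries_per_season * rate for rate in DETECTION_RATE]
print(f"injuries flagged in advance: {flagged[0]:.1f}-{flagged[1]:.1f} of {injuries_per_season}")
# injuries flagged in advance: 8.4-9.0 of 12
```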

Who cares about statistics? Just let me use it!

That's a good point. Most of the time, when we use Zone7 or any similar product that answers questions for us, we don't concern ourselves with false positive or false negative rates. We just use it, and over time (or sometimes quickly) we develop a personal sense of trust.

Have you ever ignored Google Maps because the route it suggested seemed like a bad choice? Sure, we all have (although my sense is that this is becoming less and less common). That's a signal that the experience that day wasn't usable for you. And this is where statistics translate into USABILITY.

In the case of Zone7, it's impossible to know exactly how many false positives there are. If we flag 4 players as at risk of injury, some of these 'alerts' will be 'false alarms', but we will never know which ones: when a flagged player's load is adjusted and they stay healthy, we can't tell whether the alert was wrong or the intervention worked. So for us, false positives show up for our users as the TOTAL NUMBER of alerts per day. We track this number across weeks and months to make sure there isn't too much noise for our users. Over the past year, we've quadrupled (4x) the volume of data we use to train our algorithms, which has brought a 70% reduction in the number of alerts per day, WITHOUT affecting our detection rate (or False Negative scores).
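Tracking alert volume over time can be as simple as averaging daily alert counts per week. This minimal Python sketch uses invented daily counts (not Zone7 data) and also shows what a 70% reduction in daily alerts means in practice. The function names are hypothetical.

```python
from statistics import mean


def weekly_alert_averages(daily_alerts):
    """Average alerts per day for each consecutive 7-day block (a trailing partial week is averaged too)."""
    return [mean(daily_alerts[i:i + 7]) for i in range(0, len(daily_alerts), 7)]


def percent_reduction(before, after):
    """Relative drop from `before` to `after`, as a percentage."""
    return (before - after) / before * 100


# Hypothetical daily alert counts for two weeks.
daily = [10, 9, 11, 10, 10, 9, 11,
         3, 3, 2, 4, 3, 3, 3]
weeks = weekly_alert_averages(daily)
print(weeks)  # [10, 3]
print(f"{percent_reduction(weeks[0], weeks[-1]):.0f}% fewer alerts per day")
# 70% fewer alerts per day
```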

Smart people hate black boxes, so we opened ours up.

AI is often treated as a black box: proponents use the term and expect people to believe that it works magic, without producing the evidence. Zone7 believes strongly that predictive/forecasting AI should be judged the same way medical technologies are validated: by evidence, by results, by specificity and sensitivity. The beauty of Zone7 is that, unlike medical diagnostic tests, which must be standardized for everyone as a condition of regulatory approval, it can be tailored to each team, each sport, each player.

And this is where the "black box" opens up, allowing us not only to peek inside but also to change how it behaves. Depending on each user's preferences, we can tune it differently:

User A's top priority is to avoid missed injuries at all costs, so we optimize for few false negatives and accept a few extra alerts per day.

User B's top priority is to keep as many players as possible on the same daily routine, so we optimize for few 'false positives' and produce even fewer alerts per day.
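One simple way to encode these two preferences is to choose the alert threshold by constraint. For User A, pick the highest threshold that still catches at least a target share of injuries. For User B, pick the lowest threshold that keeps alerts within a daily budget. This is a hypothetical sketch, not Zone7's actual tuning method: the function names, the risk scores, and the injury labels for 20 "player-days" (one player on one day) are illustrative assumptions.

```python
def detection_rate(scores, injured, threshold):
    """Share of injury cases whose risk score was at or above the threshold."""
    flagged = [s >= threshold for s, hurt in zip(scores, injured) if hurt]
    return sum(flagged) / len(flagged)


def alert_rate(scores, threshold):
    """Share of all player-days that would have raised an alert."""
    return sum(s >= threshold for s in scores) / len(scores)


def threshold_for_min_detection(scores, injured, target):
    """User A: highest threshold that still catches at least `target` of injuries."""
    for t in sorted(set(scores), reverse=True):  # detection rises as the threshold falls
        if detection_rate(scores, injured, t) >= target:
            return t
    return min(scores)  # lowest score flags everything


def threshold_for_max_alerts(scores, budget):
    """User B: lowest threshold whose alert rate stays within `budget`."""
    for t in sorted(set(scores)):  # alert rate falls as the threshold rises
        if alert_rate(scores, t) <= budget:
            return t
    return float("inf")  # no threshold fits the budget: never alert


# Hypothetical risk scores for 20 player-days, and whether an injury followed.
scores = [0.92, 0.85, 0.81, 0.77, 0.70, 0.66, 0.60, 0.55, 0.52, 0.48,
          0.45, 0.40, 0.36, 0.33, 0.30, 0.25, 0.22, 0.18, 0.12, 0.05]
injured = [True, False, True, False, False, True, False, False, False, True,
           False, False, False, False, False, False, False, False, False, False]

a = threshold_for_min_detection(scores, injured, target=0.75)
print(a, detection_rate(scores, injured, a), alert_rate(scores, a))  # 0.66 0.75 0.3
b = threshold_for_max_alerts(scores, budget=0.15)
print(b, detection_rate(scores, injured, b), alert_rate(scores, b))  # 0.81 0.5 0.15
```

In this toy data, User A's setting catches 3 of 4 injuries but alerts on 30% of player-days, while User B's setting stays at 15% alerts and catches only half.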

The Next Post

Future posts will bring lots more evidence showing how and why you can trust Zone7. In the meantime, you can look at Zone7's case studies on our website: https://zone7.ai/case-studies/.