By: Laura Schewel on September 28th, 2017


How We Doubled Our Sample Size in One Year


We just passed our one-year anniversary of using Location-Based Services (LBS) data, so we decided to update some key sample size figures. The results are exciting: Our sample size has doubled to more than 62 million devices in the US and Canada in the past year. In other words, now our analytics anonymously describe the travel behavior of 23% of the US and Canadian adult population.

There are many reasons for this increase, including our main LBS data partner, Cuebiq, doing a great job. However, the most important reason is that Location-Based Services are becoming more and more widely adopted by consumers. As a result, our clients can now analyze the aggregate travel patterns of nearly ¼ of the population in just a few mouse clicks.

That’s a large sample by any measure, but when you consider the “status quo” methods of collecting travel behavior data, it’s even more dramatic. Imagine how much it would cost – and how long it would take – to collect household travel surveys from 62 million people, or to install sensors and traffic counters on the roads they use every day. It just wouldn’t be feasible. In this blog post, I’ll explain how we calculate sample size (hint: accuracy is more important to us than flashiness) and why it’s grown so much in just one year.

Measuring Sample Size and Pokemon

The most obvious benefit of using Big Data to understand travel behavior is sample size: Bigger is better, up to a point. Big Data, which we define as the location records mobile devices create, is the only resource that lets you tap into the anonymous, aggregate travel behavior of millions of people. With more data, we can provide an even more comprehensive picture of travel behavior.

However, measuring the size of our Big Data sample is more complicated than it may seem. This is chiefly because there are many ways to express sample size, especially when it comes to the difference between total sample and “useful” sample. The size of our sample before processing is very different from the size of our sample afterwards. So, while we think bigger is better, the quality of the data is just as important to us. We care the most about quantifying the data that is actually useful for travel pattern studies.

To illustrate that point, let’s think about Pokemon Go.


As of February 2017, Pokemon Go had been downloaded 650 million times globally. So, if Pokemon Go were StreetLight’s sole supplier, would we say our sample size was 650 million? No, we would not. However, many sample size claims in the world of LBS data are based on these falsely high “downloads” values.

So, what makes that claim "falsely high"? First, we all know that even if someone has downloaded an app, it doesn’t mean that the app is still on their phone or that it was ever opened (even in the background) to collect data. A much more accurate number would be “active users,” or something similar. We know that Pokemon Go revenue (a proxy for usage) dropped by nearly 90% within six months of its peak. But the app can still claim those massive download numbers. Next, even at the peak of the game's popularity, most users of Pokemon Go did not play or even open the app every day, and thus would not regularly generate useful data for our purposes. Only about 1/5 of its active users logged in daily, which is actually quite high for this type of app.

Another important note is that when we create our Metrics, we ignore data that have worse than 25-meter spatial precision, as well as data that start “mid-trip” (for example, a record that starts on a highway going 60 mph), among other factors. See Figure 1 below for a diagram of how this works.


Figure 1: This diagram shows how we process raw location data into transportation analytics available via StreetLight InSight. StreetLight InSight is our easy-to-use online platform for measuring travel behavior.
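To make the two filters mentioned above concrete, here is a minimal sketch in Python. It is not StreetLight's actual pipeline: the `Ping` record, the `usable_pings` function, and the 20 mph "mid-trip" cutoff are all hypothetical, chosen for illustration. Only the 25-meter precision limit comes from the text.

```python
from dataclasses import dataclass

MAX_PRECISION_M = 25.0     # from the post: ignore data with worse than 25-meter precision
MID_TRIP_SPEED_MPH = 20.0  # hypothetical cutoff; the post only gives 60 mph as an example


@dataclass
class Ping:
    device_id: str
    precision_m: float  # reported accuracy radius in meters; smaller is better
    speed_mph: float    # speed at the time of the ping


def usable_pings(trace):
    """Return the usable pings from one device's time-ordered trace.

    Assumption: a trace "starts mid-trip" if its first precise ping is
    already moving at or above MID_TRIP_SPEED_MPH. In that case the whole
    trace is discarded in this simplified sketch.
    """
    precise = [p for p in trace if p.precision_m <= MAX_PRECISION_M]
    if not precise or precise[0].speed_mph >= MID_TRIP_SPEED_MPH:
        return []
    return precise
```

In this sketch, a device whose first precise record is already at highway speed contributes nothing, which mirrors the idea that not every record a supplier sends ends up in the analytics.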

Our rule of thumb is that a device must produce location data that is used for our analytics to be included in our sample. It’s not enough for a device to simply be included in the data sets our suppliers send. If it doesn’t create data that we actually use – if we can’t identify key patterns like expected home-work locations or trips – then we don’t include it in our sample size measurements.

In short, StreetLight’s sample size numbers are based on us looking at a month of data and evaluating the number of devices that actually create usable data in that time period. If we took an approach similar to counting the total number of “downloads,” our sample would double again.

For example, if we looked at the total devices that have been in our system in 2017, our sample size would be well over 130M. But that would include devices that we collected data from for a few days and that never showed up again. That’s not an accurate representation of the sample size a client will get. And, of course, we also know that most clients really want the sample size for their particular analysis. That's why all StreetLight InSight Projects come with a specific sample size estimate available on request.
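The difference between the two counting approaches can be sketched in a few lines of Python. This is an illustration under assumptions, not our production code: the record layout `(device_id, month, is_usable)` and both function names are hypothetical.

```python
from collections import defaultdict


def monthly_usable_devices(records):
    """Count distinct devices that produced usable data in each month.

    records: list of (device_id, "YYYY-MM", is_usable) tuples (hypothetical layout).
    """
    by_month = defaultdict(set)
    for device_id, month, is_usable in records:
        if is_usable:
            by_month[month].add(device_id)
    return {month: len(devices) for month, devices in by_month.items()}


def cumulative_devices(records):
    """Count every distinct device ever seen, usable or not ("downloads"-style)."""
    return len({device_id for device_id, _, _ in records})


records = [
    ("a", "2017-08", True),
    ("b", "2017-08", False),  # present in the feed, but never usable
    ("a", "2017-09", True),
    ("c", "2017-09", True),
    ("d", "2017-01", False),  # seen briefly, never showed up again
]
print(monthly_usable_devices(records))  # {'2017-08': 1, '2017-09': 2}
print(cumulative_devices(records))      # 4
```

The cumulative count is always at least as large as any single month's usable count, which is why it overstates the sample a client can actually work with.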

We aim to be upfront and transparent about the size of our Big Data sample instead of trumpeting the largest number we can. It helps our clients use our analytics more effectively when they have this information.

Why Our Sample Grew

So, why did our sample grow so dramatically – and without us even adding a new data provider? While there are several reasons, the key driver is the rise of Location-Based Services (LBS) data, which we obtain from our partner Cuebiq.

We recognized the utility of this data resource for transportation early on, and we’re excited that it has become even more valuable over time. As one of the pioneers in developing travel LBS data, we’ve been writing about its benefits in terms of spatial precision, rural coverage, and representativeness for months. So, why is this specific type of data growing?

According to Pew Research, about 77% of Americans owned a smartphone in 2016, and it’s a safe bet that the number is growing. Importantly, these smartphone adopters are using apps with LBS more and more regularly. According to a recent survey conducted by Flurry, a mobile analytics company, this is what people are using their smartphones for:

  • Navigation (94%)
  • Shopping and Retail (93%)
  • Social Media (90%)
  • Health and Fitness (60%)

It follows that these are some of the most popular types of LBS apps:

  • Social Media: Did you know that over 80% of all social media activity is done on smartphones, per ComScore?
  • Brands: These are apps for your retailers, like your pharmacy or grocery store. They help you place orders, deliver coupons, create shopping lists – and, importantly, find the closest store to you.
  • Fitness Tracking: These apps can help measure the steps you take and the calories you burn throughout the day.
  • Transit and Transportation: These apps provide turn-by-turn navigation and tell you when public transit is arriving.

Depending on the app and the device users’ preferences, these apps track locations when their users have them open and operating as well as when they’re closed, or “backgrounding.” It’s important to note that “opting in” or “enabling Location-Based Services” is a requirement for users. We think that “opt-in” step is extremely important for privacy protection.

As expected, there’s a ton of overlap between the most popular iOS and Android apps and those that provide LBS (see Figure 2 below). From Snapchat to Pandora Music to Netflix, the most-downloaded apps are accessing devices' locations to provide better experiences to their customers. At the end of the day, that means we’re getting a lot more data to improve our own customers’ experiences, too.


Figure 2: This image shows the top apps in the iOS App Store as of September 26th, 2017. All of the apps in the top 6 provide Location-Based Services to users, although not all of these apps require users to enable location sharing.

From a transportation measurement perspective, fitness tracking is one of the trends that we’re most excited about. It’s also an area we expect to grow over time: According to Flurry’s report, usage of these apps grew a whopping 330% between 2014 and 2017.

These apps are providing a valuable service to their users, who are normally interested in measuring their “steps” and other athletic activities to improve their health. The data that these apps create are ideal for understanding pedestrian behavior because they record activities with very high spatial precision “in the background” throughout the day – not only when their users are working out.

Putting It All Together

In sum, our sample size is on the rise, and without us even bringing another data supplier into our partner network. The main reason it’s growing is that consumer behavior is changing, and these new habits are creating more and more anonymous location data.

We think this means that LBS data will continue to be incredibly valuable for transportation planning purposes and more. Next month, we’ll be hosting a webinar deep-dive on LBS data, so stay tuned for your chance to register, or contact our team here to let us know you'd like an invite. 
