Let’s Get Transparent: The Size of Your Big Data Sample
Locational Big Data – the geospatial data created by mobile devices – is ubiquitous. Smartphones, connected cars, fitness trackers, and more create trillions of location records as their users go about their daily lives. This Big Data has many benefits, and one of the most obvious is its large sample size. But simply knowing that a sample is “large” is not always enough. Many of our clients want more detailed sample size information for their individual projects because it helps them understand how certain their results are.
In this blog post, I’ll explain how we’ve updated StreetLight InSight® (that's our easy-to-use online platform for transforming Big Data into transportation analytics) so that our clients always know the size of the sample they’re working with.
One of the most common questions we get from clients and prospective clients is, “How large was the sample for this analysis?” Answering this question has not been as straightforward as you might expect because:
- There are many different ways to measure the size of our sample. (Read our “Sample Size: Not A Simple Concept!” blog post for more on that).
- Our navigation-GPS and Location-Based Services (LBS) data samples must be measured differently. This is due to both their technical characteristics and our privacy protocols.
- The size of our sample varies regionally. (Check out this analysis of variation in our LBS sample in Florida to see how we account for regional differences during data processing.)
However, sample size is still critically important to the transportation modelers, planners, and engineers that we work with. That’s why we’re now providing sample size estimates in the Metrics download files of every StreetLight InSight project that uses one of our two Big Data sources. Next, I'll walk you through how it works for both navigation-GPS and LBS data. (Note: This feature is not yet available for our beta StreetLight Volume: 2016 AADT Metrics, which are derived from a blend of these two data sources.)
Calculating Sample Size for Projects that Use Navigation-GPS Data
For projects that use navigation-GPS data, we estimate the total number of trips that took place in the region analyzed. For example, if you have an O-D project with an origin Zone of On Ramp A, destination Zones of Off Ramps B and C, and a data period of June 2016, the sample size is estimated as follows:
- We look at travel activity on all the off-ramps in aggregate.
- We count all of the trips that “touch” – that is, they start in, stop in, or pass through – one or more of those Zones.
- Trips that touch multiple Zones are only counted once.
In this example, the Total Trip Sample Size for the project is 79,000 trips. If your project measured June and July, and another 81,000 trips touched those Zones in July, then the Trip Sample Size would be 160,000 trips (79,000 + 81,000). It would make sense to compare these numbers to things like loop counters, which also count trips.
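To make the counting rule concrete, here is a minimal Python sketch of the steps above. It is an illustration only, not our production code: the `(trip_id, zone)` record format, the function name, and the choice to include all three project Zones in the set are assumptions made for the example.

```python
from typing import Iterable, Set, Tuple


def trip_sample_size(touches: Iterable[Tuple[str, str]], project_zones: Set[str]) -> int:
    """Count distinct trips that touch at least one project Zone.

    Each record is a hypothetical (trip_id, zone) pair meaning the trip
    started in, stopped in, or passed through that Zone.
    """
    # A set keeps each trip_id once, so a trip touching several Zones counts once.
    trips = {trip_id for trip_id, zone in touches if zone in project_zones}
    return len(trips)


# Hypothetical records for illustration only.
touches = [
    ("t1", "On Ramp A"),
    ("t1", "Off Ramp B"),  # same trip touches two Zones: counted once
    ("t2", "Off Ramp C"),
    ("t3", "Elsewhere"),   # touches no project Zone: not counted
]
zones = {"On Ramp A", "Off Ramp B", "Off Ramp C"}
sample = trip_sample_size(touches, zones)

# Multi-month projects add the monthly totals from the example above.
june_trips = 79_000
july_trips = 81_000
two_month_total = june_trips + july_trips
```

The set comprehension is what enforces the "only counted once" rule; the two-month total is plain addition of the monthly figures quoted in the example.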
Keep in mind that the O-D matrix for this project will only reflect the trips that began at On Ramp A, our origin Zone, and ended at Off Ramps B or C, our destination Zones. For example, a trip that began at On Ramp A but ended elsewhere would not be included in the analysis.
Calculating Sample Size for Projects that Use LBS Data
For projects that use LBS data, we do things a little differently. We look at the number of distinct devices that were in the region during the data period for the analysis. For example, if you’re doing an O-D analysis between TAZ A, TAZ B, and TAZ C in the month of July, we would add all the distinct devices seen in TAZ A, TAZ B, and TAZ C during that month.
Table 2: This is an example of raw device count estimates for an O-D matrix derived from LBS Data. Note: Unlike the table below, the O-D matrices we give clients are usually expressed as StreetLight Trip Index values that are normalized and adjusted - not as raw counts.
The device counts in this matrix add up to 32,800. It would make sense to compare these numbers to the population of an area. As with the O-D matrix derived from navigation-GPS data, the Metrics in your project would only reflect the devices that started in one of your origin Zones and ended in one of your destination Zones.
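Here is a rough Python sketch of what counting distinct devices could look like. The `(device_id, zone)` record format and the function name are hypothetical, and the sketch assumes that a device seen in more than one TAZ during the data period is counted only once. That deduplication rule is an assumption made for this example, not a description of our actual processing.

```python
from typing import Iterable, Set, Tuple


def device_sample_size(sightings: Iterable[Tuple[str, str]], project_zones: Set[str]) -> int:
    """Count distinct devices seen in any project Zone during the data period.

    ASSUMPTION: a device seen in several Zones is counted once.
    """
    devices = {device_id for device_id, zone in sightings if zone in project_zones}
    return len(devices)


# Hypothetical sightings for illustration only.
sightings = [
    ("d1", "TAZ A"),
    ("d1", "TAZ B"),  # same device in two TAZs: counted once under our assumption
    ("d2", "TAZ C"),
    ("d3", "TAZ Z"),  # outside the project: not counted
]
device_count = device_sample_size(sightings, {"TAZ A", "TAZ B", "TAZ C"})
```

Because this counts devices rather than trips, the result is the kind of number you would compare to population, not to loop counter data.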
How to See the Sample Sizes for Your Projects:
Once you’ve completed a project, go to the “All Projects” tab. Click the download button on the right (I’ve highlighted it in the image below).
Then, open the folder that downloads (you may have to unzip it). Click on the Sample Size .csv file, highlighted below.
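If you would rather load the file in a script than open it in a spreadsheet, a minimal Python sketch might look like the following. The function name is hypothetical, and the sketch makes no assumption about the column names in your download: it simply returns each row keyed by whatever header the file contains.

```python
import csv
from typing import Dict, List


def read_sample_size_file(path: str) -> List[Dict[str, str]]:
    """Read a Sample Size .csv file into a list of rows keyed by column header."""
    # newline="" is the csv module's recommended way to open files for reading.
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

Call it with the path to the Sample Size file inside your unzipped download folder, then inspect the keys of the first row to see which columns your project includes.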
Why Knowing Sample Size Matters
Having sample size information can help our clients do many things. Here are just a few:
- Create certainty statistics to their own liking
- Communicate sample size to build confidence with their stakeholders
- Compare sample size for different regions, to more deeply understand results and Big Data strengths in different regions
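As one example of a certainty statistic you could build yourself, here is a short Python sketch that turns a sample size into an approximate 95% margin of error for a share (say, the fraction of trips using one off-ramp). It uses the textbook normal approximation for a proportion and assumes the sample behaves like a simple random sample, which may not hold for location data, so treat the result as a rough guide. The function name and the inputs are hypothetical.

```python
import math


def margin_of_error(share: float, sample_size: int, z: float = 1.96) -> float:
    """Approximate margin of error for an observed share.

    Uses the normal approximation z * sqrt(p * (1 - p) / n).
    ASSUMPTION: the sample behaves like a simple random sample.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    if not 0.0 <= share <= 1.0:
        raise ValueError("share must be between 0 and 1")
    return z * math.sqrt(share * (1.0 - share) / sample_size)


# Hypothetical inputs: a 25% share observed in a sample of 79,000 trips.
moe = margin_of_error(0.25, 79_000)
```

The default `z` of 1.96 corresponds to a 95% confidence level; pass a different value if your agency uses another standard.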
Do you know another way that sample size information could be useful? Let us know in the comments below!