Sample size is not a simple concept when it comes to Massive Mobile Data Analytics (see our prior blog post for some examples). In this post, we analyze the sample size of our commercial vehicle data in California using a Daily Trip Sample Ratio. The result: our archival data captures about 11.8% of the commercial vehicle trips that took place in California in 2015.
Daily Trip Sample Size
For this analysis, we chose the Daily Trip Sample Ratio as the unit of sample analysis because “trips” are the active unit for most of our clients’ projects. Therefore, we consider it a “useful” measure of sample.
First, we collected data from the California Trucking Census. We carefully checked the geocodes for 290 census count locations that spanned 55 counties (see Figure 1 below). This data set gave us the estimated number of Heavy Duty and Medium Duty trucks that pass each count location on a typical day.
Figure 1: Census Count Locations from the California Trucking Census
For each count location, we also drew a “gate” across the road in StreetLight InSight® and counted the number of Heavy Duty and Medium Duty trips that crossed that gate to determine the average per day for 2015. Our trips are derived from data provided by our partner INRIX. Next, we divided the StreetLight InSight daily counts by the Traffic Census daily counts to get a Daily Trip Sample Ratio.
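The division described above can be sketched in a few lines of Python. This is only an illustration: the location IDs and counts are made up, the function name is hypothetical, and we are assuming here that the statewide figure is a simple mean of the per-location ratios (a ratio of summed counts would be another reasonable choice).

```python
from statistics import mean


def daily_trip_sample_ratio(observed_daily_trips, census_daily_trips):
    """Share of census-counted daily trips that appear in the observed sample."""
    if census_daily_trips <= 0:
        raise ValueError("census daily count must be positive")
    return observed_daily_trips / census_daily_trips


# Made-up example values, NOT real counts from either data source.
locations = [
    {"id": "A", "observed": 100.0, "census": 1000.0},
    {"id": "B", "observed": 60.0, "census": 500.0},
    {"id": "C", "observed": 280.0, "census": 2000.0},
]

# One ratio per count location.
ratios = {
    loc["id"]: daily_trip_sample_ratio(loc["observed"], loc["census"])
    for loc in locations
}

# Assumption: the overall figure is the unweighted mean of the location ratios.
average_ratio = mean(list(ratios.values()))

for loc_id, ratio in ratios.items():
    print(f"{loc_id}: {ratio:.1%}")
print(f"average: {average_ratio:.1%}")
```

An unweighted mean treats every count location equally, so a quiet rural road counts as much as a busy freeway. Weighting by census volume would shift the result toward the high-traffic locations.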
The average ratio for California was 11.8%. This result was remarkably consistent across all the points (for our statistically inclined readers, using StreetLight InSight daily counts to predict Traffic Census daily counts yielded an R² of 0.9 across the 290 locations). Heavy Duty and Medium Duty trips had similar Daily Trip Sample Ratios.
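For readers who want to run a similar consistency check on their own counts, here is a minimal sketch of an R² calculation. It assumes an ordinary least squares fit with a single predictor and an intercept, which may not be exactly the model behind the figure quoted above, and the sample numbers are made up.

```python
def r_squared(x, y):
    """R² of an ordinary least squares line predicting y from x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")

    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Sums of squares and cross-products around the means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or ss_tot == 0:
        raise ValueError("x and y must each vary")

    # Fitted line: y_hat = intercept + slope * x
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Residual variation left after the fit.
    ss_res = sum(
        (yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y)
    )
    return 1.0 - ss_res / ss_tot


# Made-up daily counts for four hypothetical gate locations.
observed_counts = [100.0, 60.0, 280.0, 150.0]
census_counts = [1000.0, 500.0, 2000.0, 1400.0]

r2 = r_squared(observed_counts, census_counts)
print(f"R^2 = {r2:.3f}")
```

A value near 1 means the observed counts track the census counts closely from one location to the next, which is what makes a single statewide sample ratio meaningful.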
Rural vs. Urban Regions
We were particularly curious to see if the Daily Trip Sample Ratio was consistent across rural and urban regions. California boasts extremes in both directions. We classified regions as low, medium, and high density depending on the population per square kilometer. Figures 2 and 3 (below) are heat maps of stops in California that show both geographic spread and interesting differences in behavior for Medium and Heavy Duty.
The ratio also held up consistently across density classes, ranging from 10.2% (low density regions) to 12.2% (high density regions).
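The density breakdown can be sketched along the following lines. The cutoffs between low, medium, and high density are not stated in this post, so the thresholds below are placeholders, and the records are made-up examples rather than real results.

```python
from collections import defaultdict

# Hypothetical cutoffs in people per square kilometer (assumptions, not
# the thresholds used in the analysis above).
LOW_DENSITY_MAX = 50.0
MEDIUM_DENSITY_MAX = 500.0


def density_class(population_per_sq_km):
    """Bucket a region as low, medium, or high density."""
    if population_per_sq_km < LOW_DENSITY_MAX:
        return "low"
    if population_per_sq_km < MEDIUM_DENSITY_MAX:
        return "medium"
    return "high"


def mean_ratio_by_density(records):
    """Average the sample ratio within each density class.

    records: iterable of (population_per_sq_km, sample_ratio) pairs,
    one pair per count location.
    """
    grouped = defaultdict(list)
    for population_per_sq_km, sample_ratio in records:
        grouped[density_class(population_per_sq_km)].append(sample_ratio)
    return {name: sum(vals) / len(vals) for name, vals in grouped.items()}


# Made-up (density, ratio) pairs for four hypothetical count locations.
example_records = [
    (10.0, 0.10),
    (20.0, 0.11),
    (200.0, 0.12),
    (1500.0, 0.13),
]

by_density = mean_ratio_by_density(example_records)
for name in ("low", "medium", "high"):
    print(f"{name}: {by_density[name]:.1%}")
```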
Of course, this is just one approach, and there are several others that we could look at in the future. For example, we are exploring Trip Tours as an add-on to trips. We could also explore how sample size varies around key places, such as agricultural lands or intermodal transit facilities, and the benefits of having many months in the sample period versus just a few days.
Figure 2: StreetLight InSight Heat Map of all 2015 Medium Duty Commercial Vehicle Stops in CA
Figure 3: StreetLight InSight Heat Map of all 2015 Heavy Duty Commercial Vehicle Stops in CA
Check back on the blog for upcoming posts on personal vehicle sample size, other techniques to measure sample size, and much more! Let us know in the comments if there's a specific topic you're interested in.