Understanding the Domain

You have a dataset of logged events from our streaming player (e.g., “play”, “pause”, “error”, “end”). Each device has a unique ID (ESN), and each streaming session has a session ID (SessionID) that is unique to that device. Many other attributes are also provided.
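
Since SessionID is unique only within a device, two devices can reuse the same SessionID, so a session is identified by the pair (ESN, SessionID) rather than by SessionID alone. A minimal Python sketch of that grouping follows. The sample events, the `event` field name, and the `group_by_session` helper are all hypothetical and not part of the actual schema:

```python
from collections import defaultdict

# Hypothetical sample events. Only ESN and SessionID come from the text;
# the "event" field name and all values are assumptions for illustration.
events = [
    {"ESN": "device-A", "SessionID": 1, "event": "play"},
    {"ESN": "device-A", "SessionID": 1, "event": "pause"},
    {"ESN": "device-B", "SessionID": 1, "event": "play"},
    {"ESN": "device-B", "SessionID": 2, "event": "error"},
]


def group_by_session(events):
    """Group event names under the composite key (ESN, SessionID)."""
    sessions = defaultdict(list)
    for e in events:
        sessions[(e["ESN"], e["SessionID"])].append(e["event"])
    return dict(sessions)


sessions = group_by_session(events)
for key, names in sessions.items():
    print(key, names)
```

Grouping on SessionID alone would merge the first session of device-A with the first session of device-B, which is why the sketch keys on both fields.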

For a variety of reasons, the logging data can be messy: duplicate events sometimes arrive, some fields may be NULL for certain events, and services that enhance the logged events (e.g., geo-IP services) sometimes fail or fall back. In the end, this data arrives in a table with the following schema: