{"id":138,"date":"2014-08-25T16:14:00","date_gmt":"2014-08-25T16:14:00","guid":{"rendered":"https:\/\/timallanwheeler.com\/blog\/?p=138"},"modified":"2022-01-26T16:14:55","modified_gmt":"2022-01-26T16:14:55","slug":"model-inference-in-the-presence-of-truncated-censored-or-missing-data","status":"publish","type":"post","link":"https:\/\/timallanwheeler.com\/blog\/2014\/08\/25\/model-inference-in-the-presence-of-truncated-censored-or-missing-data\/","title":{"rendered":"Model Inference in the Presence of Truncated, Censored, or Missing Data"},"content":{"rendered":"\n<p>I have been working with probability and machine learning lately, particularly with fitting distributions to datasets. Fitting data is covered in the conventional pre-college curriculum, but I had only ever done it when all of my data was complete. More recently I ran into a problem.<\/p>\n\n\n\n<p>One fundamental feature in autonomous driving is the distance to the car in front of you. This tends to be a real number, hopefully a bit larger than a few meters when travelling at high speed and potentially quite large when traffic is light. A set of sensors on your car, such as radar or lidar, can only pick up cars within a certain distance of your own. What do you do when there is no car in front of you? Set the distance to infinity?<\/p>\n\n\n\n<p>This is a good example of censored data: values outside a given range are clipped to its boundary, but you know whenever that happens. The other two types are truncated and missing data. Data is truncated when you only receive readings within a certain range and never learn of the occurrences that fall outside it. Missing data occurs when a reading was missed or corrupted, or is otherwise unavailable. 
A good example is the velocity of the car in front of you.<\/p>\n\n\n\n<p>So how does one handle fitting distributions to such features?<\/p>\n\n\n\n<p>The answer is a surprisingly straightforward application of Bayes\u2019 theorem.<\/p>\n\n\n\n<p>Consider first a toy problem:<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\"><p>Unstable particles are emitted from a source of decay at a distance \\(x\\), a real number that has an exponential probability distribution with characteristic length \\(\\lambda\\). We observe \\(N\\) decays at locations \\(x_1, x_2, \\ldots, x_N\\). What is \\(\\lambda\\)?<\/p><cite>\u201cInformation Theory, Inference, and Learning Algorithms\u201d by David MacKay<\/cite><\/blockquote>\n\n\n\n<p>Solving this for the case with perfect data provides some insight.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"fully-observed-data\">Fully Observed Data<\/h3>\n\n\n\n<p>The probability distribution for a single sample point, given \\(\\lambda\\), is:<\/p>\n\n\n\n<p>$$P(x\\mid \\lambda) = \\lambda e^{-\\lambda x}$$<\/p>\n\n\n\n<p>from the definition of the exponential probability distribution.<\/p>\n\n\n\n<p>Applying Bayes\u2019 theorem and assuming independence:<\/p>\n\n\n\n<p>$$P(\\lambda \\mid x_{1:N}) = \\frac{P(x_{1:N}\\mid \\lambda)P(\\lambda)}{P(x_{1:N})} \\propto \\lambda^N \\exp \\left( -\\sum_{1}^N \\lambda x_n \\right) P(\\lambda)$$<\/p>\n\n\n\n<p>Simply by conditioning on the available data and setting a prior, we can determine the posterior distribution over \\(\\lambda\\). From here we can do what we wish, such as picking the most likely value of \\(\\lambda\\).<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"truncated-data\">Truncated Data<\/h3>\n\n\n\n<p>Suppose, however, that the data is truncated: we only get readings for particle decays between \\(x_\\min\\) and \\(x_\\max\\). 
Fitting in the same way is going to cause inconsistencies. Let us start again, following the same steps.<\/p>\n\n\n\n<p>The probability distribution for a single sample point, potentially truncated, given \\(\\lambda\\), is:<\/p>\n\n\n\n<p>$$P(x\\mid \\lambda) = \\begin{cases} \\lambda e^{- \\lambda x} \/ Z(\\lambda) &amp; x \\in [x_\\min, x_\\max] \\\\ 0 &amp; \\text{otherwise} \\end{cases}$$<\/p>\n\n\n\n<p>where \\(Z(\\lambda)\\) is a normalization factor:<\/p>\n\n\n\n<p>$$Z(\\lambda) = \\int_{x_\\min}^{x_\\max} \\lambda e^{- \\lambda x} \\> dx = e^{-\\lambda x_\\min} - e^{-\\lambda x_\\max}$$<\/p>\n\n\n\n<p>We then apply Bayes\u2019 theorem:<\/p>\n\n\n\n<p>$$P(\\lambda \\mid x_{1:N}) = \\frac{P(x_{1:N}\\mid \\lambda)P(\\lambda)}{P(x_{1:N})} \\propto \\left( \\frac{\\lambda}{Z(\\lambda)} \\right)^N \\exp\\left( -\\sum_1^N \\lambda x_n \\right) P(\\lambda)$$<\/p>\n\n\n\n<p>This is <em>very<\/em> similar, and was quite easy to determine.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"censored-data\">Censored Data<\/h2>\n\n\n\n<p>Without going into detail, we can derive the model for censored data. Suppose \\(x\\) is censored to be less than \\(x_\\max\\), which occurs if our particle detector has a finite length and particles that would have decayed after \\(x_\\max\\) instead slam into the back and report as \\(x_\\max\\):<\/p>\n\n\n\n<p>$$P(x \\mid \\lambda) = \\begin{cases} \\lambda e^{-\\lambda x} &amp; x \\in [0, x_\\max) \\\\ Z'(\\lambda) \\, \\delta(x - x_\\max) &amp; x = x_\\max \\end{cases}$$<\/p>\n\n\n\n<p>where \\(Z(\\lambda)\\) is the probability of \\(x\\) being uncensored, \\(Z'(\\lambda)\\) is the probability of \\(x\\) being censored, and \\(\\delta\\) is the Dirac distribution. 
Here, \\(Z(\\lambda)\\) is:<\/p>\n\n\n\n<p>$$Z(\\lambda) = \\int_0^{x_\\max} \\lambda e^{-\\lambda x} \\> dx = (1 - e^{-\\lambda x_\\max})$$<\/p>\n\n\n\n<p>$$Z'(\\lambda) = 1 - Z(\\lambda) = e^{-\\lambda x_\\max}$$<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"missing-data\">Missing Data<\/h2>\n\n\n\n<p>The final case is missing data. Here you know when a reading is missing, but you have no information about its value. This sort of data must be fitted using the original method, on the observed values alone.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I have been working with probability and machine learning lately, particularly with fitting distributions to datasets. Fitting data is something covered in conventional pre-college curriculum, but I have only ever done it when all of my data has been complete. More recently I ran into a problem. One fundamental feature in autonomous driving is the [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-138","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/timallanwheeler.com\/blog\/wp-json\/wp\/v2\/posts\/138","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/timallanwheeler.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/timallanwheeler.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/timallanwheeler.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/timallanwheeler.com\/blog\/wp-json\/wp\/v2\/comments?post=138"}],"version-history":[{"count":7,"href":"https:\/\/timallanwheeler.com\/blog\/wp-json\/wp\/v2\/posts\/138\/revisions"}],"predecessor-version":[{"id":148,"href":"https:\/\/timallanwheeler.com\/blog\/wp-json\/wp\/v2\/posts\/138\/revisi
ons\/148"}],"wp:attachment":[{"href":"https:\/\/timallanwheeler.com\/blog\/wp-json\/wp\/v2\/media?parent=138"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/timallanwheeler.com\/blog\/wp-json\/wp\/v2\/categories?post=138"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/timallanwheeler.com\/blog\/wp-json\/wp\/v2\/tags?post=138"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}
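The censored-data model in the post can be made concrete with a small simulation. This is a minimal sketch, not from the original article: the function name `fit_exponential_censored`, the true rate, and the detector length are my own choices. For exponential data right-censored at \(x_\max\), each observed decay contributes \(\lambda e^{-\lambda x_n}\) to the likelihood and each censored one contributes \(Z'(\lambda) = e^{-\lambda x_\max}\), which gives a closed-form maximum-likelihood estimate.

```python
import random

def fit_exponential_censored(observed, n_censored, x_max):
    """MLE of the rate for exponential data right-censored at x_max.

    Each fully observed x contributes lam * exp(-lam * x) to the likelihood;
    each censored reading contributes the survival probability exp(-lam * x_max).
    Setting the derivative of the log-likelihood to zero yields this closed form.
    (Hypothetical helper for illustration, not from the original post.)
    """
    return len(observed) / (sum(observed) + n_censored * x_max)

# Simulate decays with true rate 0.5 inside a detector of length x_max = 3.0.
random.seed(0)
lam_true, x_max = 0.5, 3.0
raw = [random.expovariate(lam_true) for _ in range(100_000)]
observed = [x for x in raw if x < x_max]
n_censored = len(raw) - len(observed)

# Naive fit that treats the clipped readings as genuine observations at x_max.
lam_naive = len(raw) / sum(min(x, x_max) for x in raw)
# Censoring-aware fit.
lam_hat = fit_exponential_censored(observed, n_censored, x_max)
print(f"naive: {lam_naive:.3f}  censoring-aware: {lam_hat:.3f}  true: {lam_true}")
```

The naive fit overestimates the rate (it mistakes the pile-up of readings at \(x_\max\) for genuinely short decays), while the censoring-aware estimate recovers the true rate to within sampling noise.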