If your input is a non-normalized histogram (raw counts), you can expand it back into data points and pass them to the built-in quantile() function, which returns the data value at a specified quantile; that is exactly what the inverse CDF does. If the histogram has been normalized by the number of data points (making it a probability vector), multiply it by the number of data points first. See the quantile() documentation for details. In effect, you treat the first argument (your data) as fixed, which turns quantile() into a function of the probability values p alone. You could easily write a wrapper function to make this more convenient. This removes the need to compute the CDF explicitly with cumsum().
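The wrapper mentioned above can be as small as an anonymous function. This is only a sketch: the sample values in data are made up for illustration, and it assumes quantile() is available in your MATLAB installation.

```matlab
% Hypothetical sample data; replace with your own data points.
data = [1 2 2 3 3 3 4 4 5];

% Fix the data argument so the result depends on p only.
inv_cdf = @(p) quantile(data, p);

median_value = inv_cdf(0.5);          % inverse CDF at p = 0.5
quartiles    = inv_cdf([0.25 0.75]);  % p may also be a vector
```

Because the handle captures data when it is created, you need to recreate it if the data changes.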
Added
If we assume the histogram, bins, and number of data points are h, b, and N, respectively, then:
h1 = round(N*h); %// Only if histogram frequencies have been normalized; otherwise use h1 = h.
data = [];
for kk = 1:length(h1)
    data = [data repmat(b(kk), 1, h1(kk))]; %// repeat each bin value by its count
end
%// Set p to the probability you want the inv-cdf for...
p = 0.5;
inv_cdf = quantile(data, p)
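If growing data inside the loop becomes slow for large histograms, the same expansion can probably be done in one call. The sketch below assumes a MATLAB release that provides repelem() and uses made-up values for b and h1.

```matlab
% Hypothetical bin values and integer counts, for illustration only.
b  = [1 2 3];
h1 = [1 2 3];

% Repeat each bin value b(k) exactly h1(k) times, without a loop.
data = repelem(b, h1);
```

The result can then be passed to quantile() exactly as before.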
For solutions that must leverage the existing PDF vector, we can do the following. Assume that x_old and pdf_old are the histogram bins and histogram frequencies, respectively.
p = 0.5; %// the inv-cdf probability that I want
num_points_i_want = 100; %// the number of points I want in my histogram vector
x_new = linspace(min(x_old), max(x_old), num_points_i_want);
pdf_new = interp1(x_old, pdf_old, x_new);
cdf_new = cumsum(pdf_new);
cdf_new = cdf_new / cdf_new(end); %// renormalize: the resampled PDF no longer sums to 1
inv_cdf = min(x_new(cdf_new >= p));
Alternatively, we could create the cumsum() CDF first and use interp1() on that if it's not desirable to interpolate first.
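A minimal sketch of that alternative, building the CDF first and interpolating it with the axes swapped. The values of x_old and pdf_old are made up for illustration, and the unique() step is an addition not in the original answer: it drops repeated CDF values (from empty bins), since interp1() requires distinct sample points.

```matlab
% Hypothetical histogram bins and frequencies, for illustration only.
x_old   = 1:5;
pdf_old = [0.1 0.2 0.4 0.2 0.1];
p       = 0.5;  % the inv-cdf probability that I want

% Build the CDF first and normalize it so it ends at 1.
cdf_old = cumsum(pdf_old) / sum(pdf_old);

% Keep only the first occurrence of each CDF value.
[cdf_u, idx] = unique(cdf_old);

% Swap the axes: interpolate x as a function of the CDF value.
inv_cdf = interp1(cdf_u, x_old(idx), p);
```

Note that interp1() returns NaN for any p below the first CDF value, so very small probabilities need separate handling.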
OK, I think I found a much shorter version, which is at least as fast and as accurate:
cdf = cumsum(pdf);
M = length(cdf);
xx = linspace(0,1,M);
invcdf = interp1(cdf,xx,xx)
[EDIT: No, this is actually still two to three times slower than the initial code... don't ask me why! It also does not handle CDFs that are not strictly monotonic: in that case interp1() fails with the error "The values of X should be distinct".]