Data Transformations, Statistics & Normalizations - Finance

Post by Admin/YBB on Mar 10, 2024 16:33:30 GMT -6

Statistical parameters and their interpretations are specific to applications. Many arguments over them arise simply because people may not be aware of the applications or related practices. Some formulas and concepts may be used even when they are not strictly applicable – that is what applications are all about. After all, for obtaining approximate and plausible conclusions, why be fussy about the precision of the formulas? The discussion below is focused primarily on BUSINESS and FINANCE.

In finance, the PRICE data may be used directly (e.g., Bollinger Bands (BB), Bollinger Band Width (BBW), etc.), or the price series may be converted to daily, weekly, or monthly CHANGES (returns-series). Often, the returns-series is converted into a logarithmic-returns-series, i* = LN (1 + 0.01*TR), where TR is the return in percent. Much of the financial data reported are for the returns-series; Morningstar (M*), Yahoo Finance, Portfolio Visualizer (PV), etc. use monthly returns, while Stock Rover (SR) uses daily returns. Interestingly, not many use weekly returns, which may offer the best compromise, in terms of computational resources, between daily and monthly data.
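As a minimal sketch of the log-return conversion above (the function names and the sample returns are hypothetical, chosen only for illustration), the following shows why log-returns are convenient: simple returns compound by multiplication, while log-returns simply add.

```python
import math

def to_log_return(tr_percent):
    """Convert a simple return in percent to a log-return: LN(1 + 0.01*TR)."""
    return math.log(1.0 + 0.01 * tr_percent)

def compound(returns_percent):
    """Total return in percent over several periods, obtained by summing log-returns."""
    total_log = sum(to_log_return(tr) for tr in returns_percent)
    return 100.0 * (math.exp(total_log) - 1.0)

monthly_tr = [2.0, -1.5, 0.8]  # hypothetical monthly returns, in percent
log_returns = [to_log_return(tr) for tr in monthly_tr]
total_tr = compound(monthly_tr)  # same as 100*((1.02 * 0.985 * 1.008) - 1)
print(log_returns, total_tr)
```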

Data TRANSLATION is when a constant C is added to the data series. The entire probability distribution shifts/translates without changing the shape. It affects only the mean (Xbar becomes Xbar + C), but not the SD. Normalization may be to a mean of 0.

Data SCALING is when the data series is multiplied by a constant K. It affects both the mean and SD (each is multiplied by K). The data may be normalized to a mean of 1, which gives the SD per unit of mean, or SD/Xbar. The formal name for that is Coefficient of Variation (CV) or Relative-SD (RSD). This use of RSD is different from YBB RSD, which is SD relative to a benchmark SD (YBB RSD = SD/SDbenchmark). Note that CV isn’t useful where the mean is around 0, or where the mean is somewhat arbitrary (e.g., temperatures in degC vs degF; but it is OK for absolute temperature in kelvin).
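A small sketch of translation and scaling, using made-up numbers and hypothetical variable names, may help: adding a constant moves the mean but leaves the SD alone, while multiplying by a constant changes both and leaves the CV unchanged.

```python
import statistics

data = [1.2, -0.8, 2.1, 0.4, -1.5, 1.9]  # hypothetical returns, in percent
C = 10.0  # translation constant (assumed value)
K = 3.0   # scaling constant (assumed value)

def cv(xs):
    """Coefficient of Variation: SD per unit of mean."""
    return statistics.stdev(xs) / statistics.mean(xs)

shifted = [x + C for x in data]                        # translation
scaled = [K * x for x in data]                         # scaling
centered = [x - statistics.mean(data) for x in data]   # normalized to a mean of 0

print(statistics.mean(shifted), statistics.stdev(shifted))  # mean moves by C, SD unchanged
print(statistics.mean(scaled), statistics.stdev(scaled))    # both multiplied by K
print(cv(data), cv(scaled))                                 # CV is unchanged by scaling
```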

Another normalization is by the SD, or mean per unit of SD, or Xbar/SD. In finance, this is a common definition of RISK-ADJUSTED RETURN, although there are other definitions that may impose some return-penalties (M*, etc). This normalization is almost the SHARPE RATIO, where the excess-mean is used ( = mean – risk-free return mean; T-Bill returns are used for risk-free returns).
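The two ratios above can be sketched as follows. This is only an illustration under stated assumptions: the return figures are invented, the function names are hypothetical, and the Sharpe version divides by the SD of the returns as described in this post (some sources divide by the SD of the excess returns instead). No annualization is applied.

```python
import statistics

def risk_adjusted_return(returns):
    """Mean per unit of SD, Xbar/SD."""
    return statistics.mean(returns) / statistics.stdev(returns)

def sharpe_ratio(returns, risk_free):
    """Excess mean (mean minus risk-free mean) per unit of SD of the returns."""
    excess_mean = statistics.mean(returns) - statistics.mean(risk_free)
    return excess_mean / statistics.stdev(returns)

fund = [1.2, -0.8, 2.1, 0.4, -1.5, 1.9]   # hypothetical monthly returns, in percent
tbill = [0.3, 0.3, 0.3, 0.3, 0.3, 0.3]    # hypothetical T-Bill monthly returns, in percent

print(risk_adjusted_return(fund), sharpe_ratio(fund, tbill))
```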

Yet another normalization is wrt SDbenchmark, i.e.,

SD/SDbenchmark = beta/r ,

where beta is the volatility relative to the benchmark (from MPT) and r is the correlation coefficient with the benchmark. The ratio may also be called beta per unit of correlation r. It has been described elsewhere as EFFECTIVE EQUITY and RELATIVE SD, or YBB RSD (to differentiate it from the alternate use of RSD described above).

This can be rearranged as,

SD = (beta/r)*SDbenchmark,

and note that all terms on the right-hand side depend on the benchmark, but the left-hand side SD is independent of the benchmark, so the benchmark dependence on the right-hand side must cancel out.
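The identity SD/SDbenchmark = beta/r follows from the textbook definitions beta = cov/SDbenchmark^2 and r = cov/(SD*SDbenchmark). A minimal numerical check, with invented return series and hypothetical variable names, might look like this:

```python
import statistics

def covariance(xs, ys):
    """Sample covariance (n - 1 in the denominator, matching statistics.stdev)."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

fund = [1.2, -0.8, 2.1, 0.4, -1.5, 1.9]    # hypothetical fund returns, in percent
bench = [1.0, -1.2, 1.8, 0.9, -2.0, 2.5]   # hypothetical benchmark returns, in percent

sd_fund = statistics.stdev(fund)
sd_bench = statistics.stdev(bench)
cov = covariance(fund, bench)

beta = cov / sd_bench ** 2        # slope of fund returns vs benchmark returns
r = cov / (sd_fund * sd_bench)    # correlation coefficient
ybb_rsd = sd_fund / sd_bench      # SD relative to the benchmark SD

print(beta / r, ybb_rsd)          # the two values agree up to rounding
```

Swapping in a different benchmark changes beta, r, and SDbenchmark individually, but (beta/r)*SDbenchmark still returns the same fund SD.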

Sampling theory and the relation between the POPULATION SD and the SD of the SAMPLED MEAN are also used to annualize the SD obtained from daily, weekly, or monthly returns. If the N periodic returns in a year are treated as independent, the SD grows as sqrt(N), so these factors are, respectively, sqrt(250) = 15.8114, sqrt(52) = 7.2111, sqrt(12) = 3.4641; for example, annualized SD = 3.4641*SD of monthly returns; obviously, annualized mean = 12*mean of monthly returns. Some rigorous statisticians may object to this because returns aren’t strictly independent, but it’s a widely used practice in finance.
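The annualization factors above can be wrapped in a small helper. This is a sketch only; the function name, the dictionary, and the sample inputs are hypothetical, and the period counts (250, 52, 12) are the ones quoted in this post.

```python
import math

PERIODS_PER_YEAR = {"daily": 250, "weekly": 52, "monthly": 12}

def annualize(mean, sd, frequency):
    """Annualize a periodic mean (times N) and SD (times sqrt(N))."""
    n = PERIODS_PER_YEAR[frequency]
    return n * mean, math.sqrt(n) * sd

# Hypothetical monthly statistics, in percent
annual_mean, annual_sd = annualize(1.0, 4.0, "monthly")
factors = {freq: round(math.sqrt(n), 4) for freq, n in PERIODS_PER_YEAR.items()}
print(annual_mean, annual_sd, factors)
```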

Normal distribution CONFIDENCE INTERVALS for +/- 1*SD, 2*SD, 3*SD (with confidence levels of 68.27%, 95.45%, 99.73%) are also used, although returns aren’t normally distributed; logarithmic-returns are closer to normal than returns. Some say that this means that SDs don’t work, but it is more correct to say only that these confidence intervals may not apply. SD is a valid statistic that has been around for well over a century (Karl Pearson introduced the term in the 1890s); while it comes out of the MPT analysis (1952), SD can also be calculated independently.
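The confidence levels quoted above come from the normal distribution, for which the probability of falling within +/- k*SD of the mean is erf(k/sqrt(2)). A short check (the function name is hypothetical):

```python
import math

def normal_coverage(k):
    """Probability that a normal variable lies within +/- k SDs of its mean."""
    return math.erf(k / math.sqrt(2.0))

levels = {k: round(100.0 * normal_coverage(k), 2) for k in (1, 2, 3)}
print(levels)
```

For actual return data, the observed fraction inside each band can be compared against these figures to see how far the data are from normal.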

Others including Nassim Taleb have suggested that MAD (mean absolute deviation) is better than SD in this computer age. The arguments against MAD are that the absolute-value function is not differentiable at zero and that formal relations are difficult to derive, but it’s more intuitive and closer to reality. For normal distributions, SD = sqrt(pi/2)*MAD ~ 1.25*MAD.
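For a normal distribution, the ratio SD/MAD works out to sqrt(pi/2), about 1.2533. A rough simulation sketch, assuming MAD is taken about the mean and using a fixed random seed and a hypothetical helper name, illustrates this:

```python
import math
import random
import statistics

def mean_abs_deviation(xs):
    """Mean absolute deviation about the mean."""
    m = statistics.fmean(xs)
    return sum(abs(x - m) for x in xs) / len(xs)

rng = random.Random(42)  # fixed seed so the run is repeatable
sample = [rng.gauss(0.0, 1.0) for _ in range(50_000)]  # simulated normal data

sd = statistics.pstdev(sample)
mad = mean_abs_deviation(sample)
ratio = sd / mad
theoretical = math.sqrt(math.pi / 2.0)
print(ratio, theoretical)  # the simulated ratio should be close to the theoretical one
```

For fat-tailed return data the ratio is typically larger than this, since squaring gives extreme observations more weight in the SD than in the MAD.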

en.wikipedia.org/wiki/Standard_deviation
en.wikipedia.org/wiki/Geometric_standard_deviation
www.statology.org/advantages-disadvantages-of-standard-deviation/
en.wikipedia.org/wiki/Standard_error#Standard_error_of_the_mean
www.encyclopedia.com/science-and-technology/mathematics/mathematics/standard-deviation
www.encyclopedia.com/people/science-and-technology/genetics-and-genetic-engineering-biographies/karl-pearson

www.edge.org/response-detail/25401
en.wikipedia.org/wiki/Confidence_interval
en.wikipedia.org/wiki/68%E2%80%9395%E2%80%9399.7_rule
mathworld.wolfram.com/StandardDeviation.html
en.wikipedia.org/wiki/Coefficient_of_variation
en.wikipedia.org/wiki/Sharpe_ratio
en.wikipedia.org/wiki/Risk-adjusted_return_on_capital
en.wikipedia.org/wiki/Risk%E2%80%93return_ratio
en.wikipedia.org/wiki/Modern_portfolio_theory