I've started a similar CodeBank thread before, but I'm now thinking I went too complex, as there was no interest. Just looking around earlier today, I saw a request under a CodeBank entry by The Trick. I didn't address all the requests in that entry, but I did address some of them. Maybe, if this has some interest, I'll develop some quartile/percentile functions as well as others.
Basically, I've just provided some one-sample statistical functions. I've also made a decision on how to handle missing values. I've struggled with this in VB6. One option is certainly the use of Variant. However, I've never been terribly happy with that option. Therefore, I've decided on sticking with Double arrays for my data, and using the IEEE Double NaN value to denote missing values. This can be seen in the code.
Now, for the uninitiated, NaN values can be a bit tricky. They're somewhat similar to the Null value, but even more restrictive. Once you get a NaN, you can continue to do math with it, but the results will be NaN (similar to Null in Variants). However, you can't do Boolean comparisons with a NaN. In other words, they'll crash if used in an If statement. Therefore, anyone using these functions, needs to develop a practice of checking return values with the IsNan() function. This will keep you out of trouble.
Now, most of what I did today is straight-forward. However, I did dip into calculating a p-value (and confidence intervals), which requires "distributions". I've leaned on the ALGLIB project to derive my PDF (probability distribution function [not portable document format]) and CDF (cumulative distrubution function) values.
The first part doesn't require this though. I've attached a complete project. All is tested, but I didn't really develop much of an interface. If you're interested, focus first on the modSimpleStats module. Here's the part of that module that doesn't use distributions. It's stand-alone:
And here's a continuation of that module, but this part does require distributions:
And, as stated, complete "run-able" project is attached.
Please feel free to make additional requests, and I'll possibly add them.
Take Care,
Elroy
Basically, I've just provided some one-sample statistical functions. I've also made a decision on how to handle missing values. I've struggled with this in VB6. One option is certainly the use of Variant. However, I've never been terribly happy with that option. Therefore, I've decided on sticking with Double arrays for my data, and using the IEEE Double NaN value to denote missing values. This can be seen in the code.
Now, for the uninitiated, NaN values can be a bit tricky. They're somewhat similar to the Null value, but even more restrictive. Once you get a NaN, you can continue to do math with it, but the results will be NaN (similar to Null in Variants). However, you can't do Boolean comparisons with a NaN. In other words, they'll crash if used in an If statement. Therefore, anyone using these functions, needs to develop a practice of checking return values with the IsNan() function. This will keep you out of trouble.
Now, most of what I did today is straight-forward. However, I did dip into calculating a p-value (and confidence intervals), which requires "distributions". I've leaned on the ALGLIB project to derive my PDF (probability distribution function [not portable document format]) and CDF (cumulative distrubution function) values.
The first part doesn't require this though. I've attached a complete project. All is tested, but I didn't really develop much of an interface. If you're interested, focus first on the modSimpleStats module. Here's the part of that module that doesn't use distributions. It's stand-alone:
Code:
Option Explicit
'
Private Declare Sub GetMem1 Lib "msvbvm60.dll" (ByRef Source As Any, ByRef Dest As Any)
Private Declare Sub GetMem2 Lib "msvbvm60.dll" (ByRef Source As Any, ByRef Dest As Any)
Private Declare Sub GetMem4 Lib "msvbvm60.dll" (ByRef Source As Any, ByRef Dest As Any)
Private Declare Sub GetMem8 Lib "msvbvm60.dll" (ByRef Source As Any, ByRef Dest As Any)
Private Declare Function ArrPtr Lib "msvbvm60.dll" Alias "VarPtr" (Ptr() As Any) As Long
'
' *******************************************
' *******************************************
'
' We start with some "helper" functions.
'
' *******************************************
' *******************************************
Public Function NaN() As Double
' Math can be done on these, but nothing changes.
' They can NOT be used in "if NaN = NaN Then", or an overflow will result. Use IsNaN().
'
Const bb7 As Byte = &HF8
Const bb8 As Byte = &HFF
'
GetMem1 bb7, ByVal VarPtr(NaN) + 6&
GetMem1 bb8, ByVal VarPtr(NaN) + 7&
End Function
Public Function IsNaN(d As Double) As Boolean
' Infinity also returns TRUE, but we shouldn't be running across infinities.
'
Static bb(1 To 8) As Byte
Const bb7 As Byte = &HF0 ' High 4 bits of byte #7. \
Const bb8 As Byte = &H7F ' Low 7 bits of byte #8. / If all on, it's NaN (or Inf if all other non-sign bits are zero).
'
GetMem8 d, bb(1)
IsNaN = ((bb(7) And bb7) = bb7) And ((bb(8) And bb8) = bb8)
End Function
Public Sub ChangeMissingToNaN(d() As Double, Optional MissingValue As Double = 0&)
' This changes the array "in place" to save memory.
' Just call: ChangeMissingToNaN YourArray
' Or: Call ChangeMissingToNaN(YourArray, MissingValue)
'
Dim i As Long
'
If DblDims(d) <> 1 Then Exit Sub
For i = LBound(d) To UBound(d)
If Not IsNaN(d(i)) Then If d(i) = MissingValue Then d(i) = NaN
Next i
End Sub
Public Function DblDims(dArray() As Double) As Integer
' Works on both Static and Dynamic arrays.
Dim pSA As Long
'
GetMem4 ByVal ArrPtr(dArray), pSA
If pSA <> 0& Then GetMem2 ByVal pSA, DblDims
End Function
' *******************************************
' *******************************************
'
' And now, just some simple statistics.
'
' *******************************************
' *******************************************
Public Function Count(d() As Double) As Long
' Returns 0 if not dimensioned.
' Skips any NaNs and INFs in the array.
'
Dim i As Long
'
If DblDims(d) <> 1 Then Exit Function
For i = LBound(d) To UBound(d)
If Not IsNaN(d(i)) Then Count = Count + 1&
Next i
End Function
Public Function Sum(d() As Double) As Double
' Returns NaN if d() isn't dimensioned.
' Skips any NaNs and INFs in the array.
' If there are no valid numbers, NaN is returned.
'
Dim i As Long
Dim cnt As Long
'
If DblDims(d) <> 1 Then Sum = NaN: Exit Function
For i = LBound(d) To UBound(d)
If Not IsNaN(d(i)) Then
cnt = cnt + 1
Sum = Sum + d(i)
End If
Next i
If cnt = 0& Then Sum = NaN
End Function
Public Function Mean(d() As Double) As Double
' Returns NaN if d() isn't dimensioned.
' Skips any NaNs and INFs in the array.
' If there are no valid numbers, NaN is returned.
'
Dim cnt As Long
'
cnt = Count(d)
If cnt = 0& Then Mean = NaN: Exit Function
Mean = Sum(d) / cnt
End Function
Public Function SumSq(d() As Double) As Double
' Returns NaN if d() isn't dimensioned.
' Skips any NaNs and INFs in the array.
' If there are no valid numbers, NaN is returned.
'
Dim i As Long
Dim cnt As Long
'
If DblDims(d) <> 1 Then SumSq = NaN: Exit Function
For i = LBound(d) To UBound(d)
If Not IsNaN(d(i)) Then
cnt = cnt + 1
SumSq = SumSq + d(i) * d(i)
End If
Next i
If cnt <> 0 Then SumSq = NaN
End Function
Public Function SumSqDiff(d() As Double) As Double
' This one is the sum-of-squared-differences-from-the-mean.
' Returns NaN if d() isn't dimensioned.
' Skips any NaNs and INFs in the array.
' If there are no valid numbers, NaN is returned.
'
Dim i As Long
Dim m As Double
'
m = Mean(d)
If IsNaN(m) Then SumSqDiff = NaN: Return
For i = LBound(d) To UBound(d)
If Not IsNaN(d(i)) Then
SumSqDiff = SumSqDiff + (d(i) - m) * (d(i) - m)
End If
Next i
End Function
Public Function VariancePop(d() As Double) As Double
VariancePop = MeanSqPop(d)
End Function
Public Function MeanSqPop(d() As Double) As Double
' Mean of squared differences based on POPULATION of numbers.
' This is also know as the VARIANCE.
' This one is for population (all items counted).
'
' Returns NaN if d() isn't dimensioned.
' Skips any NaNs and INFs in the array.
' If there are no valid numbers, NaN is returned.
'
Dim cnt As Long
'
cnt = Count(d)
If cnt = 0& Then MeanSqPop = NaN: Exit Function
MeanSqPop = SumSqDiff(d) / cnt
End Function
Public Function VarianceSamp(d() As Double) As Double
VarianceSamp = MeanSqSamp(d)
End Function
Public Function MeanSqSamp(d() As Double) As Double
' Mean of squared differences based on SAMPLE of numbers.
' This is also know as the VARIANCE.
' This one is for sample of items (sampled from some population).
'
' Returns NaN if d() isn't dimensioned.
' Skips any NaNs and INFs in the array.
' If there are less than 2 valid numbers, NaN is returned. Valid N must be > 1 to calculate for SAMPLE.
'
Dim cnt As Long
'
cnt = Count(d)
If cnt < 2& Then MeanSqSamp = NaN: Exit Function
MeanSqSamp = SumSqDiff(d) / (cnt - 1&)
End Function
Public Function StDevPop(d() As Double) As Double
' Standard deviation based on POPULATION of numbers.
'
' Returns NaN if d() isn't dimensioned.
' Skips any NaNs and INFs in the array.
' If there are no valid numbers, NaN is returned.
'
StDevPop = MeanSqPop(d)
If IsNaN(StDevPop) Then Exit Function
StDevPop = Sqr(StDevPop)
End Function
Public Function StDevSamp(d() As Double) As Double
' Standard deviation based on SAMPLE of numbers.
'
' Returns NaN if d() isn't dimensioned.
' Skips any NaNs and INFs in the array.
' If there are less than 2 valid numbers, NaN is returned. Valid N must be > 1 to calculate for SAMPLE.
'
StDevSamp = MeanSqSamp(d)
If IsNaN(StDevSamp) Then Exit Function
StDevSamp = Sqr(StDevSamp)
End Function
Public Function StErr(d() As Double) As Double
' Standard error of the mean (aka, standard error).
' This has no population equivalent.
'
' Returns NaN if d() isn't dimensioned.
' Skips any NaNs and INFs in the array.
' If there are less than 2 valid numbers, NaN is returned. Valid N must be > 1 to calculate.
'
StErr = MeanSqSamp(d)
If IsNaN(StErr) Then Exit Function
StErr = Sqr(StErr / Count(d))
End Function
Public Function OneSampleStudentT(d() As Double, Optional TestVal As Double = 0&) As Double
' Returns NaN if d() isn't dimensioned.
' Skips any NaNs and INFs in the array.
' If there are less than 2 valid numbers, NaN is returned. Valid N must be > 1 to calculate.
'
Dim se As Double
'
se = StErr(d)
If IsNaN(se) Then OneSampleStudentT = NaN: Exit Function
OneSampleStudentT = (Mean(d) - TestVal) / se
End Function
Code:
' *******************************************
' *******************************************
'
' From here down requires the distributions.
' Most of which were developed from the ALGLIB project.
'
' *******************************************
' *******************************************
Public Function OneSampleTTestPValue(d() As Double, Optional TestVal As Double = 0&, Optional Tails As Long = 2&) As Double
' A T-test can be performed either ONE-tailed or TWO-tailed.
' This returns the p value, the probability of observing these data if the null hypothesis is true.
' If you specify ONE-tailed, you should evaluate the mean, and only consider changes in ONE-DIRECTION from your TestVal as statistically significant.
'
' Returns NaN if d() isn't dimensioned.
' Skips any NaNs and INFs in the array.
' If there are less than 2 valid numbers, NaN is returned. Valid N must be > 1 to calculate.
'
Dim t As Double
'
If Tails < 1& Or Tails > 2& Then Error 6
t = OneSampleStudentT(d, TestVal)
If IsNaN(t) Then OneSampleTTestPValue = NaN: Exit Function
OneSampleTTestPValue = (1# - StudentTCdf(t, Count(d) - 1&)) * Tails
End Function
Public Sub OneSampleConfInt(d() As Double, LoValOut As Double, HiValOut As Double, Optional pCrit As Double = 0.05, Optional Tails As Long = 2&)
' As with a T-test, confidence intervals can be constructed either ONE-tailed or TWO-tailed.
' However, if you specify ONE-tailed, you should either use LoValOut or HiValOut, but not both.
' If TWO-tailed is specified, you would use both LoValOut and HiValOut to construct your confidence interval.
'
' pCrit is the equivalent p-value for your confidence intervals.
' For instance, for a 95% CI, we'd specify pCrit = .05.
' for a 90% CI, we'd specify pCrit = .10.
' pCrit must be in the range of 0 < pCrit < .5 for TWO-tailed; and 0 < pCrit < 1 for ONE-Tailed.
'
' Returns NaN if d() isn't dimensioned.
' Skips any NaNs and INFs in the array.
' If there are less than 2 valid numbers, NaN is returned. Valid N must be > 1 to calculate.
'
Dim cnt As Long
Dim tCrit As Double
Dim se As Double
Dim m As Double
'
' Validations.
If Tails < 1& Or Tails > 2& Then Error 6
If pCrit <= 0# Then Error 6
If pCrit >= 1# Then Error 6
If Tails = 2& And pCrit >= 0.5 Then Error 6
'
cnt = Count(d)
If cnt < 2& Then LoValOut = NaN: HiValOut = NaN: Exit Sub
tCrit = StudentTCdfInv(1# - (pCrit / Tails), cnt - 1&)
se = StErr(d)
m = Mean(d)
'
LoValOut = m - tCrit * se
HiValOut = m + tCrit * se
End Sub
Please feel free to make additional requests, and I'll possibly add them.
Take Care,
Elroy