grpstatsFS
grpstatsFS calls grpstats and reshapes the output in a much better way
Syntax
statTable=grpstatsFS(TBL, groupvars, whichstats)
example
statTable=grpstatsFS(TBL, groupvars, whichstats,Name,Value)
example
Description
grpstatsFS calls grpstats, but shows the output in a much better way. The
output of grpstatsFS is a table with a number of rows equal to the
number of variables for which statistics are computed. The number of
columns of this table is equal to the number of statistics which are
computed. In presence of a grouping variable the number of rows of the
- output table remains the same but the number of columns is equal to the
- number of statistics times the number of groups.
Examples
+ output table remains the same, but the number of columns is equal to the
+ number of statistics multiplied by the number of groups. The statistics
+ computed by default are the two (non robust and robust) indexes
+ of location (mean and median), the two (non robust and robust) indexes
+ of spread (standard deviation and scaled MAD) and the two (non robust
+ and robust) indexes of skewness. The robust index of skewness is the
+ medcouple. The scaled MAD is defined as 1.4826(med|x-med(x)|).
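The scaled MAD formula above can be checked with a short sketch. The constant 1.4826 is 1/norminv(0.75), which is how the `mads` handle in grpstatsFS.m computes it. The data vector below is purely illustrative, and erfinv is used here only so that the check runs without the Statistics toolbox.

```matlab
% Scaled MAD: 1.4826*med|x-med(x)|, a robust index of spread.
% Illustrative data (not from citiesItaly), with one gross outlier.
x = [2 4 4 5 7 9 100];

% norminv(0.75) equals sqrt(2)*erfinv(0.5), so the scaling constant
% can be obtained with base MATLAB only.
c = 1/(sqrt(2)*erfinv(0.5));

% Median of the absolute deviations from the median, rescaled by c
scaledMAD = c*median(abs(x-median(x)));

% The outlier inflates std much more than the scaled MAD
fprintf('constant = %.4f, scaled MAD = %.4f, std = %.4f\n', c, scaledMAD, std(x));
```

The rescaling makes the MAD comparable with the standard deviation when the data are normally distributed.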
Examples
Load a table
mean median std MAD skewness kurtosis
- ______ ______ ______ ______ ________ ________
-
- addedval 18096 18469 4941.6 6001.8 0.1079 2.2193
- depos 7769.1 7661.1 2841.4 3150.5 1.0734 6.1128
- pensions 10044 9975.2 1230.8 1170.6 0.31692 2.9485
- unemploy 10.173 6.42 7.8789 4.8778 1.0687 2.9546
- export 23.11 21.53 15.642 16.457 0.5797 2.7386
- bankrup 30.467 29.62 12.11 11.49 1.0067 4.7287
- billsoverd 44.614 40.6 22.783 19.496 0.98031 3.7585
-
-
mean median std MAD skewness medcouple
+ ______ ______ ______ ______ ________ _________
+
+ addedval 18096 18469 4941.6 6001.8 0.1079 -0.12451
+ depos 7769.1 7661.1 2841.4 3150.5 1.0734 -0.03612
+ pensions 10044 9975.2 1230.8 1170.6 0.31692 0.089163
+ unemploy 10.173 6.42 7.8789 4.8778 1.0687 0.61846
+ export 23.11 21.53 15.642 16.457 0.5797 0.066667
+ bankrup 30.467 29.62 12.11 11.49 1.0067 -0.010969
+ billsoverd 44.614 40.6 22.783 19.496 0.98031 0.1586
+
+
meanN medianN stdN MADN skewnessN kurtosisN meanCS medianCS stdCS MADCS skewnessCS kurtosisCS
- ______ _______ ______ ______ _________ _________ ______ ________ ______ ______ __________ __________
-
- addedval 22071 21936 2926.4 2136.5 0.78292 4.2884 14888 14049 3760.7 3500.3 0.83573 3.1482
- depos 9614.6 9407.5 2214.3 1052.6 2.9736 15.912 6279.7 5506.5 2389.6 1810.8 1.5434 5.6777
- pensions 10798 10658 891.71 716.05 0.8251 3.1664 9434.9 9465.1 1129.5 1103.6 0.95725 4.462
- unemploy 4.4713 4.34 1.6834 1.7643 0.81049 3.7212 14.774 15.25 7.9086 10.126 0.35304 1.9998
- export 30.366 29.025 13.042 13.84 0.48928 2.7313 17.255 12.94 15.193 14.366 1.1879 4.0334
- bankrup 27.855 27.745 10.451 10.215 0.73877 3.6387 32.575 31.64 13.009 12.691 1.0051 4.6087
- billsoverd 31.775 30.2 14.368 10.156 1.2639 5.3863 54.975 52.34 23.127 19.422 0.68542 3.2312
-
-
meanN medianN stdN MADN skewnessN medcoupleN meanCS medianCS stdCS MADCS skewnessCS medcoupleCS
+ ______ _______ ______ ______ _________ __________ ______ ________ ______ ______ __________ ___________
+
+ addedval 22071 21936 2926.4 2136.5 0.78292 -0.024878 14888 14049 3760.7 3500.3 0.83573 0.23097
+ depos 9614.6 9407.5 2214.3 1052.6 2.9736 -0.054686 6279.7 5506.5 2389.6 1810.8 1.5434 0.37286
+ pensions 10798 10658 891.71 716.05 0.8251 0.14446 9434.9 9465.1 1129.5 1103.6 0.95725 -0.15654
+ unemploy 4.4713 4.34 1.6834 1.7643 0.81049 0.0022075 14.774 15.25 7.9086 10.126 0.35304 -0.10805
+ export 30.366 29.025 13.042 13.84 0.48928 0.088586 17.255 12.94 15.193 14.366 1.1879 0.29376
+ bankrup 27.855 27.745 10.451 10.215 0.73877 -0.10754 32.575 31.64 13.009 12.691 1.0051 0.010132
+ billsoverd 31.775 30.2 14.368 10.156 1.2639 0.056604 54.975 52.34 23.127 19.422 0.68542 0.069632
+
+
The second element is empty, that is there is no grouping variable.
meanCIinf meanCIsup mean
+ _________ _________ ______
+
+ addedval 17131 19062 18096
+ depos 7213.7 8324.4 7769.1
+ pensions 9803.2 10284 10044
+ unemploy 8.6327 11.712 10.173
+ export 20.053 26.167 23.11
+ bankrup 28.101 32.834 30.467
+ billsoverd 40.161 49.066 44.614
+
+
Robust location Non robust location
+ _______________ ___________________
+
+ addedval 18469 18096
+ depos 7661.1 7769.1
+ export 21.53 23.11
+
+
Mean: lower confidence intervalN Mean: upper confidence intervalN Sample meanN Mean: lower confidence intervalCS Mean: upper confidence intervalCS Sample meanCS
+ ________________________________ ________________________________ ____________ _________________________________ _________________________________ _____________
+
+ addedval 21202 22940 22071 13891 15886 14888
+ unemploy 3.9714 4.9712 4.4713 12.675 16.872 14.774
+
+
Name-Value Pair Arguments
Specify optional comma-separated pairs of Name,Value
arguments.
Name
is the argument name and Value
is the corresponding value. Name
must appear
inside single quotes (' '
).
You can specify several name and value pair arguments in any order as
Name1,Value1,...,NameN,ValueN
.
Example:
'Alpha',0.01
, 'DataVars',[2 4]
+
, 'VarNames',["location" "robust location"];
Significance level for confidence and prediction intervals.
For additional information on Alpha see the help of
grpstats.
Example: 'Alpha',0.01
Data Types: double
Example: 'DataVars',[2 4]
-
Data Types: character vector | string array | cell array of character vectors | vector of positive integers | logical vector
Output Arguments
statTable
—table with p rows.
The rows are referred to the variables of the input table
- and the columns to the requested statistics
+
Data Types: character vector | string array | cell array of character vectors | vector of positive integers | logical vector
Note that the length of VarNames must be equal to the
+ number of statistics which are computed. Variable (column)
+ names for the output table statTable, specified as a string
+ array or a cell array of character vectors. By default,
+ grpstatsFS removes the @ if it is present in the name of
+ the statistic. In presence of a grouping variable
+ grpstatsFS appends the name corresponding to each
+ category of the groups.
+
Example: 'VarNames',["location" "robust location"];
+
Data Types: character vector | string array | cell array of character vectors | vector of positive integers | logical vector
Output Arguments
+ The rows are referred to the variables of the input table
+ and the columns to the requested statistics.
The number of columns of statTable is equal to
- the number of required statistcs times
- the number of groups.
This page has been automatically generated by our routine publishFS
\ No newline at end of file
+ the number of requested statistics multiplied by
+ the number of groups.
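The shape rule for statTable (one row per variable, one column per statistic and group) can be illustrated with a minimal sketch in base MATLAB. This is not the grpstatsFS implementation: the toy table, the variable names x1 and x2, and the loop are assumptions made for illustration, and the order of the group blocks may differ from the one grpstatsFS produces.

```matlab
% Hypothetical toy table: two data variables and one grouping variable
TBL = table([1;2;3;10;20;30],[4;5;6;40;50;60], ...
    ["N";"N";"N";"CS";"CS";"CS"],'VariableNames',{'x1','x2','zone'});

dataVars  = {'x1','x2'};          % variables to summarize (rows of the output)
statNames = ["mean" "median"];    % statistics (column blocks of the output)
statFuns  = {@mean, @median};

% findgroups returns the group index of each row and the sorted group names
[g, groupNames] = findgroups(TBL.zone);
ngroups = numel(groupNames);

% p rows, (number of statistics) x (number of groups) columns
out = zeros(numel(dataVars), numel(statNames)*ngroups);
colNames = strings(1, size(out,2));
k = 0;
for j = 1:ngroups
    for s = 1:numel(statNames)
        k = k+1;
        for i = 1:numel(dataVars)
            out(i,k) = statFuns{s}(TBL.(dataVars{i})(g==j));
        end
        % Column name = statistic name followed by the group label
        colNames(k) = statNames(s) + groupNames(j);
    end
end

statTable = array2table(out,'RowNames',dataVars, ...
    'VariableNames',cellstr(colNames));
disp(statTable)
```

With two statistics and two groups the result has 2 rows and 4 columns, named in the same style as meanN, medianN, meanCS and medianCS in the examples.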
This page has been automatically generated by our routine publishFS
\ No newline at end of file
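One change in grpstatsFS.m replaces `varargin{1:2:length(varargin)}` with `varargin(1:2:length(varargin))`. The sketch below, which uses a hypothetical argument list named args, shows why this matters: brace indexing produces a comma-separated list, so a single left-hand side receives only the first option name, while parenthesis indexing returns a cell array with all the names.

```matlab
% Hypothetical name-value list, standing in for varargin
args = {'Alpha',0.01,'DataVars',[2 4]};

% Brace indexing: comma-separated list, only the first name is assigned
first = args{1:2:length(args)};

% Parenthesis indexing: cell array containing every option name
names = args(1:2:length(args));

% strcmp against the cell array gives one logical flag per name
isDataVars = strcmp(names,'DataVars');

if any(isDataVars)
    % The value sits right after its name, at position 2*k in args
    value = args{2*find(isDataVars)};
    disp(value)
end
```

With the old brace form, 'DataVars' was detected only when it was the first option passed. This is also why the test is now wrapped in `any(...)`.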
diff --git a/toolbox/utilities_stat/grpstatsFS.m b/toolbox/utilities_stat/grpstatsFS.m
index 8a5d4b027..e2a0ad079 100644
--- a/toolbox/utilities_stat/grpstatsFS.m
+++ b/toolbox/utilities_stat/grpstatsFS.m
@@ -1,37 +1,44 @@
-function [statTable]=grpstatsFS(X, groupvars, whichstats, varargin)
+function [statTable]=grpstatsFS(TBL, groupvars, whichstats, varargin)
%grpstatsFS calls grpstats and reshapes the output in a much better way
%
%
Link to the help function
%
-% grpstatsFS calls grpstats but shows the output in much better way. The
+% grpstatsFS calls grpstats, but shows the output in a much better way. The
% output of grpstatsFS is a table with a number of rows equal to the
% number of variables for which statistics are computed. The number of
% columns of this table is equal to the number of statistics which are
% computed. In presence of a grouping variable the number of rows of the
-% output table remains the same but the number of columns is equal to the
-% number of statistics times the number of groups.
+% output table remains the same, but the number of columns is equal to the
+% number of statistics multiplied by the number of groups. The statistics
+% computed by default are the two (non robust and robust) indexes
+% of location (mean and median), the two (non robust and robust) indexes
+% of spread (standard deviation and scaled MAD) and the two (non robust
+% and robust) indexes of skewness. The robust index of skewness is the
+% medcouple. The scaled MAD is defined as 1.4826(med|x-med(x)|).
%
% Required input arguments:
%
-% X: Input data. Table. Table containing n observations on v variables.
-% Rows of Y represent observations, and columns
-% represent variables.
+% TBL: Input data. Table. Table containing n observations on p variables.
+% Rows of TBL represent observations, and columns
+% represent variables. If it is necessary to compute the
+% statistics for subgroups, TBL must include at least one
+% grouping variable, which you specify using groupvars.
%
% groupvars: grouping variable.
-% Identifiers for the grouping variables in input X.
-% If groupvars is [] than the output refers to the overall
+% Identifiers for the grouping variables in input TBL.
+% If groupvars is [] then the output refers to the overall
% sample. For additional information on groupvars see the
-% help of grpstats.
-% Example - 'groupvars',[ones(50,1); 2*ones(50,1)]
+% help of grpstats.
+% Example - 'groupvars',2
% Data Types - character vector | string array | cell array of character vectors | vector of positive integers | logical vector | []
%
% whichstats: Types of summary statistics.
-% Name of the statistics which have to be computed
+% Name of the statistics which have to be computed.
% For additional information on whichstats see the help of
-% grpstats. If whichstats is empty of is not specifies the
+% grpstats. If whichstats is empty or it is not specified, the
% summary statistics which are computed are ["mean" "median"
-% "std" "MAD" "skewness" "kurtosis"]
-% Example - 'groupvars',"group"
+% "std" "MAD" "skewness" "medcouple"].
+% Example - ["mean" "std"]
% Data Types - character vector | string array | function handle | cell array of character vectors or function handles.
%
% Optional input arguments:
@@ -49,16 +56,32 @@
% Example - 'DataVars',[2 4]
% Data Types - character vector | string array | cell array of character vectors | vector of positive integers | logical vector
%
+% VarNames : Variable names for output table. Cell array of character
+% vectors or string array.
+% Note that the length of VarNames must be equal to the
+% number of statistics which are computed. Variable (column)
+% names for the output table statTable, specified as a string
+% array or a cell array of character vectors. By default,
+% grpstatsFS removes the @ if it is present in the name of
+% the statistic. In presence of a grouping variable
+% grpstatsFS appends the name corresponding to each
+% category of the groups.
+% Example - 'VarNames',["location" "robust location"];
+% Data Types - string array | cell array of character vectors
+%
+%
% Output:
%
-% statTable : table with p rows. The rows are referred to the variables of the input table
+% statTable : table with p rows.
+% Table containing summary statistics for the table input TBL.
+% The rows are referred to the variables of the input table
% and the columns to the requested statistics.
% The number of columns of statTable is equal to
-% the number of required statistcs times
+% the number of requested statistics multiplied by
% the number of groups.
%
%
-% See also: grpstats
+% See also: grpstats, medcouple
%
% References:
%
@@ -74,17 +97,17 @@
%$LastChangedDate:: $: Date of the last commit
%{
- %% grpstats with just one input argument.
+ %% grpstatsFS with just one input argument.
% Load a table
load citiesItaly.mat
- % Compute mean, median, std, MAD, skewness, kurtosis
+ % Compute mean, median, std, MAD, skewness and medcouple
% for the 7 variables of the input table citiesItaly
TBL=grpstatsFS(citiesItaly);
disp(TBL)
%}
%{
- %% grpstats with second input the grouping variable.
+ %% grpstatsFS with second input the grouping variable.
load citiesItaly.mat
% The first 46 rows are referred to provinces located in northern Italy and
% the remaining in centre-south Italy.
@@ -98,7 +121,7 @@
%}
%{
- %% Example of call to grpstats with personalized statistics.
+ %% Example of call to grpstatsFS with personalized statistics.
% The second element is empty, that is there is no grouping variable.
load citiesItaly.mat
% Just compute mean and median
@@ -126,6 +149,16 @@
disp(TBL)
%}
+%{
+ %% Example of call to grpstatsFS to create conf int for the mean.
+ load citiesItaly.mat
+ % Note that in this case meanci returns two output columns
+ stats={"meanci" 'mean'};
+ % Confidence interval for the sample means
+ TBL=grpstatsFS(citiesItaly,[],stats);
+ disp(TBL)
+%}
+
%{
%% Example of call to grpstatsFS to create conf int for the mean with groups.
load citiesItaly.mat
@@ -157,7 +190,7 @@
%}
%{
- % Example of the use of option DataVars.
+ %% Example of the use of option DataVars.
load citiesItaly.mat
% Note that in this case meanci has in output two columns
stats={"meanci" 'mean'};
@@ -171,6 +204,34 @@
disp(TBL)
%}
+%{
+ %% Example of the use of option DataVars with VarNames.
+ load citiesItaly.mat
+ % Compute median and mean for variables 1, 2 and 5, with custom column names
+ stats={"median" 'mean'};
+ TBL=grpstatsFS(citiesItaly,[],stats, ...
+ 'DataVars',[1 2 5],'VarNames', ...
+ ["Robust location" "Non robust location"]);
+ disp(TBL)
+%}
+
+%{
+ %% Example of the use of option DataVars with VarNames and grouping variable.
+ load citiesItaly.mat
+ % Note that in this case meanci returns two output columns
+ stats={"meanci" 'mean'};
+ % The first 46 rows are referred to provinces located in northern Italy and
+ % the remaining in centre-south Italy.
+ zone=[repelem("N",46) repelem("CS",57)]';
+ % Add zone to citiesItaly
+ citiesItaly.zone=zone;
+ % Confidence interval for the sample means separated for the 2 groups
+ TBL=grpstatsFS(citiesItaly,"zone",stats, ...
+ 'DataVars',["addedval" "unemploy"],'VarNames', ...
+ ["Mean: lower confidence interval" "Mean: upper confidence interval" "Sample mean"]);
+ disp(TBL)
+%}
+
%% Beginning of code
if nargin<2
groupvars=[];
@@ -180,8 +241,8 @@
mads=@(x)median(abs(x-median(x)))/norminv(0.75);
if nargin<3 || isempty(whichstats)
- whichstats={"@mean" "@median" "@std" mads "@skewness" "@kurtosis"};
- nomiStat=["mean" "median" "std" "MAD" "skewness" "kurtosis"];
+ whichstats={"@mean" "@median" "@std" mads "@skewness" "@medcouple"};
+ nomiStat=["mean" "median" "std" "MAD" "skewness" "medcouple"];
else
if iscell(whichstats)
@@ -198,29 +259,39 @@
end
if ~isempty(varargin)
- UserOptions=varargin{1:2:length(varargin)};
+ UserOptions=varargin(1:2:length(varargin));
% Check if DataVars is present inside varargin
checkDataVars = strcmp(UserOptions,'DataVars')>0;
- if checkDataVars==true
+ if any(checkDataVars==true)
lmsval = varargin{2*find(checkDataVars)};
- vnames=X.Properties.VariableNames(lmsval);
+ vnames=TBL.Properties.VariableNames(lmsval);
else
- vnames=X.Properties.VariableNames;
+ vnames=TBL.Properties.VariableNames;
+ end
+
+ % Check if VarNames is present inside varargin
+ checkVarNames = strcmp(UserOptions,'VarNames')>0;
+ if any(checkVarNames)==true
+ fvarNames=2*find(checkVarNames);
+ lmsval = varargin{fvarNames};
+ nomiStat=lmsval;
+ varargin([fvarNames-1 fvarNames])=[];
end
+
else
- if istable(X)
- vnames=X.Properties.VariableNames;
+ if istable(TBL)
+ vnames=TBL.Properties.VariableNames;
else
error('FSDA:grpstatsFS:WrongInp','grpstatsFS just supports a table in input')
- % vnames="X"+string(1:size(X,2));
+ % vnames="TBL"+string(1:size(TBL,2));
end
end
if ~isempty(groupvars)
vnames=setdiff(vnames,groupvars,'stable');
end
p=length(vnames);
-tabTutti=grpstats(X,groupvars,whichstats,varargin{:});
+tabTutti=grpstats(TBL,groupvars,whichstats,varargin{:});
ngroups=size(tabTutti,1);
lstats=length(nomiStat);
if ngroups>1