How can I determine which distribution a set of data follows?
The data are as follows:
416,782,674,1121,5,1277,1033,1167,1136,1296,1084,191,343,141,2108,720,92,351,1371,281,1589,1190,412,779,713,1767,142,228,170,313,2272,2068,1252,547,864,917,745,1812,2237,866,739,213,1062,1534,1253,715,166,1304,637,1802,1538,414,1951,787,2216,59,1382,1503,831,501,1060,2205,1544,968,758,1126,845,745,120,1507,200,1278,1361,1619,1168,1234,298,1366,1931,252,1195,1875,793,1646,1583,1892,1477,1177,922,2098,1671,1648,2039,1431,15,1060,1774,1649,1473,1920,1002,1089,1551,713,80,760,1440,716,1301,269,359,1372,2106,1256,161,683,1546,1478,1676,1398,1126,1026,1964,383,1602,1217,960,1515,611,700,1099,900,1095,2203,1248,1133,974,794,2178,1318,443,29,825,1773,174,1120,1025,310,1370,633,2159,339,1039,940,1520,770,1765,1029,2554,541,379,740,1063,1755,276,680,1450,1475,1115,990,1091,47,420,1984,1226,1486,1421,1137,21,1488,1518,307,970,810,688,12,16,483,1246,1412,1131,477,1396,182,2018,2072,640,1027,926,451,1193,1513,201,263,1409,729,1340,205,1520,1484,837,1059,147,1663,1787,1671,437,1399,1373,876,1644,1678,558,1138,1693,1638,579,1638,2141,1464,254,502,1312,1202,643,1332,1408,1521,1299,1390,58,905,204,864,579,946,312,748,1055,844,909,167,2311,711,1238,1267,821,608,686,2174,229,1457,108,2351,947,397,812,803,62,2298,720,2401,1220,628,947,779,1195,995,2158,1477,1062,1127,757,1095,1488,2128,1162,1321,1072,1023,342,1179,1103,977,1154,457,79,828,1098,1861,451,164,679,2391,1281,429,369,1023,2310,421,106,893,666,2322,216,371,106,1110,716,608,331,191,491,1469,1040,703,695,1045,639,689,504
How can I determine which distribution these data belong to, and how can they be transformed to a normal distribution?
Solution
Answer
Normal distribution
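Mean 1035.2, confidence interval (1033.2, 1037.3)
Standard deviation 595.5501, confidence interval (594.6990, 597.6117)
Plot a histogram of the data in MATLAB; its shape suggests a normal distribution.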
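Method: let the data above be the vector X.
Choose the significance level a=0.01, that is, the probability of a Type I error (rejecting a true null hypothesis).
Use the Jarque-Bera test to check whether a normal distribution is a reasonable model for the data.
Command: jbtest(X,0.01)
The result is 0, which means normality is not rejected at the 1% level, so the data are broadly consistent with a normal distribution.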
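Use the normal-distribution fitting function to estimate the basic parameters:
[u,o,Au,Ao]=normfit(X,0.01)
This returns the mean u, the confidence interval for the mean Au, the standard deviation o (normfit returns the standard deviation, not the variance), and the confidence interval for the standard deviation Ao.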
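Analysis
Core idea: identifying the distribution of a dataset usually combines graphical inspection with statistical testing.
- Graphical inspection: plot a histogram or density plot and check whether the shape matches the features of a typical distribution (for example, the symmetric bell curve of the normal distribution).
- Statistical testing: use a hypothesis test (such as the Jarque-Bera test or the Shapiro-Wilk test) to check whether the data are consistent with a specific distribution.
- Parameter estimation: if the data are consistent with a normal distribution, estimate the population parameters from the sample mean and standard deviation, and compute confidence intervals.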
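As a complement to the testing step, here is a minimal sketch of two other normality tests, lillietest and adtest, from the Statistics and Machine Learning Toolbox. It assumes that toolbox is installed. The vector X below holds only the first ten values of the dataset as a placeholder, and the variable names are illustrative.

```matlab
% Placeholder: first ten values of the dataset; replace with the full vector.
X = [416; 782; 674; 1121; 5; 1277; 1033; 1167; 1136; 1296];

% Lilliefors test: h = 0 means normality is not rejected at the 1% level.
[hL, pL] = lillietest(X, 'Alpha', 0.01);

% Anderson-Darling test; the default null hypothesis is the normal family.
[hA, pA] = adtest(X, 'Alpha', 0.01);

fprintf('Lilliefors:       h = %d, p = %.4f\n', hL, pL);
fprintf('Anderson-Darling: h = %d, p = %.4f\n', hA, pA);
```

Agreement between several tests gives more confidence than a single result, although none of them can prove that the data are normal.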
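Key points:
- The shape of the histogram is the first, intuitive check on the distribution.
- The Jarque-Bera test assesses normality through skewness and kurtosis; if the result is not significant, the normality assumption is not rejected.
- The normal-distribution parameters are estimated directly with the normfit function, which returns the mean, the standard deviation, and their confidence intervals.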
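For reference, the Jarque-Bera statistic combines the sample skewness and kurtosis as follows. The symbols $n$ (sample size), $S$ (sample skewness), and $K$ (sample kurtosis) are introduced here for illustration and do not appear in the original answer.

$$
JB = \frac{n}{6}\left(S^2 + \frac{(K-3)^2}{4}\right)
$$

For normal data $S$ is close to 0 and $K$ is close to 3, so $JB$ is small; for large samples $JB$ is approximately chi-square distributed with 2 degrees of freedom under the null hypothesis.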
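Step 1: Plot a histogram and inspect the shape
Plot a histogram of the data in MATLAB and check whether it shows a single-peaked, symmetric bell shape. If the shape is close to normal, go on to verify it with a test.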
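The graphical check can be sketched as below, assuming the Statistics and Machine Learning Toolbox is available. X holds only the first ten values of the dataset as a placeholder, and the skewness and kurtosis summaries are an added illustration rather than part of the original answer.

```matlab
% Placeholder: first ten values of the dataset; replace with the full vector.
X = [416; 782; 674; 1121; 5; 1277; 1033; 1167; 1136; 1296];

figure;
histfit(X);                       % histogram with a fitted normal density overlaid
title('Histogram with fitted normal density');

figure;
normplot(X);                      % normal probability plot

% Numeric shape summaries: about 0 and about 3 for normal data.
s = skewness(X);
k = kurtosis(X);
fprintf('skewness = %.3f, kurtosis = %.3f\n', s, k);
```

In the probability plot, points that stay close to the reference line support normality, while systematic curvature at the ends points to skewed or heavy tails.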
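Step 2: Check normality with the Jarque-Bera test
- Hypotheses:
- Null hypothesis $H_0$: the data follow a normal distribution.
- Alternative hypothesis $H_1$: the data do not follow a normal distribution.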
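- Test command:
[h,p,jbstat,cv] = jbtest(X,0.01)
- If the result is h=0, the null hypothesis cannot be rejected at the 1% level, so the data are consistent with a normal distribution.
- If the result is h=1, the null hypothesis is rejected at the 1% level, so the data are unlikely to be normal.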
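The test step can be sketched as follows, assuming the Statistics and Machine Learning Toolbox. X is a ten-value placeholder for the full dataset, and the manual recomputation of the statistic is added here only to show what jbtest measures. With a sample this small MATLAB may warn that the p-value lies outside its tabulated range.

```matlab
% Placeholder: first ten values of the dataset; replace with the full vector.
X = [416; 782; 674; 1121; 5; 1277; 1033; 1167; 1136; 1296];
alpha = 0.01;                     % significance level (Type I error probability)

[h, p, jbstat, critval] = jbtest(X, alpha);

% Recompute the statistic from sample skewness and kurtosis.
n = numel(X);
S = skewness(X);
K = kurtosis(X);
jbManual = n / 6 * (S^2 + (K - 3)^2 / 4);

if h == 0
    fprintf('Normality not rejected at alpha = %.2f (JB = %.3f)\n', alpha, jbstat);
else
    fprintf('Normality rejected at alpha = %.2f (JB = %.3f)\n', alpha, jbstat);
end
```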
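Step 3: Estimate the normal-distribution parameters
- Command:
[u,o,Au,Ao] = normfit(X,0.01)
- u: sample mean (estimate of the population mean).
- o: sample standard deviation (estimate of the population standard deviation; normfit does not return the variance).
- Au: confidence interval for the mean (99% confidence level, because the second argument is 0.01; the default is 95%).
- Ao: confidence interval for the standard deviation.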
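A hedged sketch of the estimation step follows. It also recomputes both intervals by hand, which makes clear that the second output of normfit is a standard deviation. X is a ten-value placeholder for the full dataset, the descriptive variable names are illustrative, and the Statistics and Machine Learning Toolbox is assumed.

```matlab
% Placeholder: first ten values of the dataset; replace with the full vector.
X = [416; 782; 674; 1121; 5; 1277; 1033; 1167; 1136; 1296];
alpha = 0.01;                     % gives 99% confidence intervals

[muHat, sigmaHat, muCI, sigmaCI] = normfit(X, alpha);

% Manual check: t-based interval for the mean.
n = numel(X);
tcrit = tinv(1 - alpha / 2, n - 1);
muCImanual = mean(X) + [-1; 1] * tcrit * std(X) / sqrt(n);

% Manual check: chi-square-based interval for the standard deviation.
chi2crit = chi2inv([1 - alpha / 2; alpha / 2], n - 1);
sigmaCImanual = std(X) * sqrt((n - 1) ./ chi2crit);

fprintf('mean = %.1f, 99%% CI (%.1f, %.1f)\n', muHat, muCI(1), muCI(2));
fprintf('std  = %.1f, 99%% CI (%.1f, %.1f)\n', sigmaHat, sigmaCI(1), sigmaCI(2));
```

The width of the interval for the mean shrinks in proportion to the standard deviation divided by the square root of n, which is a quick sanity check on any reported interval.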
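Step 4: Interpret the results
- Mean: $\mu = 1035.2$, reported confidence interval $(1033.2, 1037.3)$.
- Standard deviation: $\sigma = 595.55$, reported confidence interval $(594.70, 597.61)$.
- Both reported intervals are much narrower than expected: with a few hundred observations and $\sigma \approx 595$, the standard error of the mean is on the order of 30, so the intervals should be rechecked by rerunning normfit.
- Conclusion: the test does not reject normality at the 1% level, so a normal distribution with these point estimates is a reasonable model for the data.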
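The question also asks how to transform the data to a normal distribution, which the answer above does not cover. Below is a minimal sketch of two common options, neither taken from the original answer. If the data are already approximately normal, standardizing gives an approximately standard normal variable. If they are not, a rank-based inverse normal transform is one possible choice. X is a ten-value placeholder, the variable names are illustrative, and the Statistics and Machine Learning Toolbox is assumed.

```matlab
% Placeholder: first ten values of the dataset; replace with the full vector.
X = [416; 782; 674; 1121; 5; 1277; 1033; 1167; 1136; 1296];

% Option 1: standardize (appropriate when X is already roughly normal).
zStd = (X - mean(X)) / std(X);

% Option 2: rank-based inverse normal transform.
% Ranks are mapped into (0, 1), then through the standard normal quantile function.
n = numel(X);
zRank = norminv((tiedrank(X) - 0.5) / n);

fprintf('standardized: mean = %.3f, std = %.3f\n', mean(zStd), std(zStd));
fprintf('rank-based:   mean = %.3f, std = %.3f\n', mean(zRank), std(zRank));
```

The rank-based transform forces a normal shape but keeps only the ordering of the data, so the original spacing between values is lost.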