
R programming tutoring and assignment help: M3S2 Spring - Assessed Coursework: linear model (with answers)

2022-12-14 22:26 Author: 拓端tecdat

Full text link: http://tecdat.cn/?p=30888

For this coursework you are required to download a dataset personal to you. Your dataset is available at:
http://wwwf.imperial.ac.uk/~fdl06/M3S2_cw_2015/.RData
where you must replace with your CID number. Any problems, email me. This
dataset contains a dataframe called mydat; it consists of a response y and 3 columns of
covariates x1, x2 and x3. Be aware!

Q1) (a) In R fit the normal linear model with:

Based upon the summary of the model, do you think that the model fits the data
well? Explain your reasoning using the values reported in the R summary, but
do not include the whole summary in your report.
(b) Perform a hypothesis test to ascertain whether or not to include the intercept
term; use a 5% significance level. Include your code.
(c) Conduct a hypothesis test comparing the models:
E(Y) = β1 against E(Y) = β1 + β2x2 + β3x3 + β4x4
at a 5% level. Include your code.
(d) By inspecting the leverages and residuals, identify any potential outliers. Name
these data points by their index number. Give your reasoning as to why you
believe these are potential outliers. You may present up to three plots if necessary.

mod=lm(y~x1+x2+x3,data=mydat)
summary(mod)

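Judging from the residuals, the differences between the fitted values and the observed values are small, so the model fits reasonably well.

The p-values of the intercept, x1 and x2 are all below 0.05, which indicates that these terms have a significant effect on y.

Judging from the R-squared value, the fit of the model still has room for improvement.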
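The values discussed above can be pulled out of the summary object one at a time instead of printing the whole summary. The following is a minimal sketch on simulated data; the data frame `sim` and its coefficients are assumptions standing in for `mydat`, which is not reproduced here.

```r
# Simulated stand-in for mydat (assumption: not the coursework data)
set.seed(1)
n <- 100
sim <- data.frame(x1 = runif(n, 5, 15), x2 = runif(n, 1, 10), x3 = runif(n))
sim$y <- 2 + 0.5 * sim$x1 - 0.3 * sim$x2 + rnorm(n)

fit <- lm(y ~ x1 + x2 + x3, data = sim)
s <- summary(fit)

r2     <- s$r.squared      # multiple R-squared
adj_r2 <- s$adj.r.squared  # adjusted R-squared
sigma  <- s$sigma          # residual standard error
fstat  <- s$fstatistic     # overall F statistic with its degrees of freedom

# p-value of the overall F test (null hypothesis: all slopes are zero)
f_p <- pf(fstat["value"], fstat["numdf"], fstat["dendf"], lower.tail = FALSE)

# Coefficient table: estimate, standard error, t value, p-value
coefs <- coef(s)
```

Reporting `r2`, `sigma`, `f_p` and the relevant rows of `coefs` is usually enough to justify a statement about goodness of fit.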


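B) #b.r

mod2=lm(y~x1+x2+x3-1,data=mydat) # remove the intercept
t.test(mod2$fitted.values,mod$fitted.values,conf.level=0.95)

Judging from the test result, the two models differ at the 5% significance level.

Compared with the fit of model 1, after the intercept is removed the R-squared of model 2 is larger than that of model 1, i.e. model 2 appears to fit better than model 1.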
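A more conventional way to test the intercept is the t-test on the intercept coefficient itself, which `summary()` already reports. The sketch below is not the method used above; it runs on simulated data (`sim` is an assumption standing in for `mydat`) and recomputes the statistic by hand as a check. Note also that R computes R-squared differently for models without an intercept (sums of squares are taken about zero rather than about the mean), so the R-squared values of the two models are not directly comparable.

```r
# Simulated stand-in for mydat (assumption: not the coursework data)
set.seed(2)
n <- 100
sim <- data.frame(x1 = runif(n, 5, 15), x2 = runif(n, 1, 10), x3 = runif(n))
sim$y <- 2 + 0.5 * sim$x1 - 0.3 * sim$x2 + rnorm(n)

fit <- lm(y ~ x1 + x2 + x3, data = sim)

# H0: intercept = 0 against H1: intercept != 0, read off the coefficient table
ct <- coef(summary(fit))
t_int <- ct["(Intercept)", "t value"]
p_int <- ct["(Intercept)", "Pr(>|t|)"]

# The same test by hand: estimate / standard error, compared with t on n - p df
est <- coef(fit)[["(Intercept)"]]
se  <- sqrt(vcov(fit)["(Intercept)", "(Intercept)"])
t_manual <- est / se
p_manual <- 2 * pt(abs(t_manual), df = df.residual(fit), lower.tail = FALSE)

reject_h0 <- p_int < 0.05  # TRUE means: keep the intercept at the 5% level
```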

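C) #c.r

mod3=lm(y~1,data=mydat)
summary(mod3)

The full model (with intercept and covariates) and the intercept-only model turn out to be very similar. The p-value is greater than 0.05, so the null hypothesis cannot be rejected, i.e. the two models do not differ significantly.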
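The comparison of an intercept-only model with the full model is a nested-model F-test, which `anova()` carries out directly. The sketch below uses simulated data (`sim` is an assumption standing in for `mydat`) and checks the F statistic against the formula based on residual sums of squares.

```r
# Simulated stand-in for mydat (assumption: not the coursework data)
set.seed(3)
n <- 100
sim <- data.frame(x1 = runif(n, 5, 15), x2 = runif(n, 1, 10), x3 = runif(n))
sim$y <- 2 + 0.5 * sim$x1 - 0.3 * sim$x2 + rnorm(n)

fit0 <- lm(y ~ 1, data = sim)             # E(Y) = beta1
fit1 <- lm(y ~ x1 + x2 + x3, data = sim)  # full model

tab   <- anova(fit0, fit1)  # F-test for the three extra coefficients
f_val <- tab$F[2]
p_val <- tab[["Pr(>F)"]][2]

# By hand: F = ((RSS0 - RSS1) / q) / (RSS1 / (n - p)), with q = 3 extra terms
rss0 <- deviance(fit0)
rss1 <- deviance(fit1)
f_manual <- ((rss0 - rss1) / 3) / (rss1 / df.residual(fit1))
p_manual <- pf(f_manual, 3, df.residual(fit1), lower.tail = FALSE)
```

A `p_val` below 0.05 would lead to rejecting the intercept-only model at the 5% level.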

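D) #d.r

The standardized residuals of observations 6, 57 and 38 are larger than those of the other observations, so observations 6, 57 and 38 can be regarded as potential outliers.

Observations 38 and 101 have noticeably larger Cook's distances than the other observations. They therefore have a strong influence on the fitted model and can be regarded as potential outliers.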
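The leverage, standardized residual and Cook's distance checks described above can be sketched as follows. The data are simulated with one artificially planted outlier (observation 10), and the cut-offs `2 * p / n` and `|r| > 2` are common rules of thumb chosen here as assumptions, not thresholds taken from the coursework.

```r
# Simulated stand-in for mydat (assumption: not the coursework data)
set.seed(4)
n <- 100
sim <- data.frame(x1 = runif(n, 5, 15), x2 = runif(n, 1, 10), x3 = runif(n))
sim$y <- 2 + 0.5 * sim$x1 - 0.3 * sim$x2 + rnorm(n)
sim$y[10] <- sim$y[10] + 8  # plant one artificial outlier

fit <- lm(y ~ x1 + x2 + x3, data = sim)
p <- length(coef(fit))

h  <- hatvalues(fit)        # leverages (diagonal of the hat matrix)
r  <- rstandard(fit)        # standardized residuals
cd <- cooks.distance(fit)   # Cook's distances

high_leverage  <- which(h > 2 * p / n)  # rule-of-thumb leverage cut-off
large_residual <- which(abs(r) > 2)     # rule-of-thumb residual cut-off
most_influential <- order(cd, decreasing = TRUE)[1:3]
```

Points that appear in both `large_residual` and `most_influential` are the natural candidates to report by index number.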

Q2) We shall now consider a GLM with a Gamma response distribution.
(a) Show that a random variable Y where Y follows a Gamma distribution with
probability density function:

(c) Rewrite (by "hand") the IWLS algorithm (similar to Algorithm 3.1 in notes on
page 38) specifically for the Gamma response and using the link:

This is called the inverse link function.

Continue to use the inverse link function for the remainder of the
questions.
(d) Write the components of the total score U1, ..., Up and the Fisher information
matrix for this model.
(e) Given the observations y, what is a sensible initial guess to begin the IWLS
algorithm in general?
(f) Manually write an IWLS algorithm to fit a Gamma GLM using your data, mydat,
using the inverse link and same linear predictor in Q1a). Use the deviance as the
convergence criterion and initial guess of β as (0.5, 0.5, 0.5, 0.5). Present your code
and along with your final estimate of β and final deviance.
(g) Based on your IWLS results, compute φ̂_D and φ̂_p and the estimates of var(β̂_2)

In R fit the model again with a Gamma response, i.e.

glm(y~x1+x2+x3,family=Gamma,data=mydat)
Note the capital G in Gamma. Verify the results with your IWLS results.
(h) Give a prediction for the response given by the model for x1 = 13, x2 = 5, x3 = 0.255
and give a 91% confidence interval for this prediction. Include your code.
(i) Perform a hypothesis test between this model and another model with the same
link and response distribution but with linear predictor η where
ηi = β1 + β2xi1 + β3xi2 for i = 1, ..., n.
Use a 5% significance level. You may use the deviance function here. Include
your code.
(j) Using your IWLS results, manually compute the leverages of the observations for this model; present your code (but not the values) and plot the leverages
against the observation index number.
(k) Proceed to investigate diagnostic plots for your Gamma GLM. Identify any potential outliers and give your reasoning. Remove the most suspicious data point (you must remove 1 and only 1) and refit the same model. Compare and
comment on the change of the model with and without this data point; you
may wish to refer to the relative change in the estimated coefficients. You may present up to three plots if necessary.

x3 <- mydat$x3
X=cbind(1,x1,x2,x3)
ilogit <- function(u) 1/(1+exp(-u))
D <- function(mu){ # deviance function
  a <- (y-mu)/mu
  b <- -log(y/mu)
  2*sum(a+b) # Gamma deviance: 2*sum((y-mu)/mu - log(y/mu))
}
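The full IWLS iteration for a Gamma response with the inverse link can be sketched as below. This is a minimal illustration on simulated data, not the original answer code: the data, the true coefficients and the function name `gamma_dev` are assumptions. It uses the adjusted variate z = η - (y - μ)/μ² and weights w = μ², starts from β = (0.5, 0.5, 0.5, 0.5), and stops when the deviance no longer changes.

```r
# Simulated Gamma data with inverse link (assumption: not the coursework data)
set.seed(5)
n <- 200
x1 <- runif(n, 1, 3); x2 <- runif(n, 1, 3); x3 <- runif(n)
X <- cbind(1, x1, x2, x3)
eta_true <- drop(X %*% c(0.6, 0.4, 0.5, 0.3))
y <- rgamma(n, shape = 10, rate = 10 * eta_true)  # mean is 1 / eta_true

# Gamma deviance (hypothetical helper name)
gamma_dev <- function(y, mu) 2 * sum((y - mu) / mu - log(y / mu))

beta <- rep(0.5, 4)  # initial guess
dev_old <- Inf
for (iter in 1:100) {
  eta <- drop(X %*% beta)
  if (any(eta <= 0)) stop("linear predictor must stay positive")
  mu <- 1 / eta                 # inverse link: eta = 1 / mu
  z  <- eta - (y - mu) / mu^2   # adjusted variate, since d eta / d mu = -1 / mu^2
  w  <- mu^2                    # weights 1 / (V(mu) * (d eta / d mu)^2)
  # Weighted least squares step: solve (X' W X) beta = X' W z
  beta <- drop(solve(crossprod(X, w * X), crossprod(X, w * z)))
  dev <- gamma_dev(y, 1 / drop(X %*% beta))
  if (abs(dev - dev_old) < 1e-10) break  # deviance-based convergence check
  dev_old <- dev
}

# Reference fit for comparison
ref <- glm(y ~ x1 + x2 + x3, family = Gamma)
```

The final `beta` and `dev` should agree with `coef(ref)` and `deviance(ref)` up to the convergence tolerance. Because the inverse link requires a positive linear predictor, a poor starting value can make the iteration fail, which is why the loop checks `eta` at every step.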

G) #g.r
eta = cbind(1,x1,x2,x3)%*%beta
mu=1/(eta)
z = eta+((y-mu)/(-mu^2)) #form the adjusted variate
w = mu^2 #weights
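Given converged values of μ and the weights w = μ², the two dispersion estimates and the variance of a coefficient follow directly. The sketch below is an illustration under assumptions: it uses simulated data and takes the converged fit from `glm()` as a stand-in for hand-written IWLS output. The deviance-based estimate is D/(n - p), the Pearson-based estimate is X²/(n - p), and the covariance of the estimated coefficients is the dispersion times the inverse of X'WX.

```r
# Simulated Gamma data with inverse link (assumption: not the coursework data)
set.seed(6)
n <- 200
sim <- data.frame(x1 = runif(n, 5, 20), x2 = runif(n, 1, 10), x3 = runif(n))
eta_true <- 0.1 + 0.02 * sim$x1 + 0.03 * sim$x2 + 0.2 * sim$x3
sim$y <- rgamma(n, shape = 10, rate = 10 * eta_true)

# Stand-in for the converged IWLS fit; a tight tolerance keeps the check sharp
fit <- glm(y ~ x1 + x2 + x3, family = Gamma, data = sim,
           control = glm.control(epsilon = 1e-12))
X  <- model.matrix(fit)
mu <- unname(fitted(fit))
y  <- sim$y
p  <- ncol(X)

dev     <- 2 * sum((y - mu) / mu - log(y / mu))  # Gamma deviance
pearson <- sum((y - mu)^2 / mu^2)                # Pearson statistic, V(mu) = mu^2

phi_D <- dev / (n - p)      # deviance-based dispersion estimate
phi_P <- pearson / (n - p)  # Pearson-based dispersion estimate

# Fisher information without the dispersion factor is X' W X with W = diag(mu^2)
XtWX_inv <- solve(crossprod(X, mu^2 * X))
var_beta2_D <- phi_D * XtWX_inv[2, 2]  # var of the x1 coefficient, using phi_D
var_beta2_P <- phi_P * XtWX_inv[2, 2]  # var of the x1 coefficient, using phi_P
```

`summary(fit)$dispersion` reports the Pearson-based estimate, so `var_beta2_P` is the value to compare with `vcov(fit)[2, 2]`.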

H) #h.r
mod= glm(y~x1+x2+x3,family=Gamma,data=mydat)
x1= 13
x2= 5
x3= 0.255
pp=predict(mod, newdata=data.frame(x1,x2,x3), level = 0.91, int = 'p') # predict the response at the new point using the estimated parameters
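`predict()` for a `glm` object has no `level` or `interval` arguments (extra arguments are silently ignored), so an interval has to be built by hand. One common approach, sketched here on simulated data as an assumption-laden illustration, is to form the interval on the link scale from `se.fit` and then map it back through the inverse link. The use of a t quantile (because the dispersion is estimated) is a choice made for this sketch; a normal quantile is also widely used.

```r
# Simulated Gamma data with inverse link (assumption: not the coursework data)
set.seed(7)
n <- 200
sim <- data.frame(x1 = runif(n, 5, 20), x2 = runif(n, 1, 10), x3 = runif(n))
eta_true <- 0.1 + 0.02 * sim$x1 + 0.03 * sim$x2 + 0.2 * sim$x3
sim$y <- rgamma(n, shape = 10, rate = 10 * eta_true)

fit <- glm(y ~ x1 + x2 + x3, family = Gamma, data = sim)
new <- data.frame(x1 = 13, x2 = 5, x3 = 0.255)

# Prediction and standard error on the linear predictor (link) scale
pr <- predict(fit, newdata = new, type = "link", se.fit = TRUE)

alpha <- 0.09  # 91% confidence level
q <- qt(1 - alpha / 2, df = df.residual(fit))
eta_ci <- unname(pr$fit) + c(-1, 1) * q * unname(pr$se.fit)

# Back-transform with mu = 1 / eta; the map is decreasing, so endpoints swap
mu_hat <- 1 / unname(pr$fit)
mu_ci  <- rev(1 / eta_ci)
```

The back-transformation is only valid while both ends of `eta_ci` are positive, which is worth checking before reporting `mu_ci`.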

I) #i.r
mod2=glm(y~x1+x2,family=Gamma,data=mydat) # same response distribution and link, without x3
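For two nested Gamma GLMs the comparison is based on the drop in deviance scaled by the estimated dispersion. The following is a sketch on simulated data (an assumption, unrelated to the coursework result) using `anova()` with an F-test, together with the same statistic computed by hand from `deviance()`.

```r
# Simulated Gamma data with inverse link (assumption: not the coursework data)
set.seed(8)
n <- 200
sim <- data.frame(x1 = runif(n, 5, 20), x2 = runif(n, 1, 10), x3 = runif(n))
eta_true <- 0.1 + 0.02 * sim$x1 + 0.03 * sim$x2 + 0.2 * sim$x3
sim$y <- rgamma(n, shape = 10, rate = 10 * eta_true)

fit_small <- glm(y ~ x1 + x2, family = Gamma, data = sim)
fit_full  <- glm(y ~ x1 + x2 + x3, family = Gamma, data = sim)

tab   <- anova(fit_small, fit_full, test = "F")
f_val <- tab$F[2]
p_val <- tab[["Pr(>F)"]][2]

# By hand: drop in deviance per extra parameter over the dispersion estimate
dev_drop <- deviance(fit_small) - deviance(fit_full)
phi_hat  <- summary(fit_full)$dispersion  # Pearson-based estimate, larger model
f_manual <- (dev_drop / 1) / phi_hat
p_manual <- pf(f_manual, 1, df.residual(fit_full), lower.tail = FALSE)
```

A `p_val` above 0.05 would mean that the extra term x3 is not needed at the 5% level.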

Since the p-value is greater than 0.05, the null hypothesis H0 cannot be rejected. Judging from the difference in deviance, the two models do not differ significantly.

J) #j.r
plot(mod)
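`plot(mod)` produces the standard diagnostic plots, while the manual leverages asked for in (j) come from the weighted hat matrix H = W^(1/2) X (X'WX)^(-1) X' W^(1/2), with W = diag(μ²) for the Gamma response and inverse link. Below is a minimal sketch on simulated data; the data and the use of a `glm()` fit as a stand-in for converged IWLS output are assumptions.

```r
# Simulated Gamma data with inverse link (assumption: not the coursework data)
set.seed(9)
n <- 200
sim <- data.frame(x1 = runif(n, 5, 20), x2 = runif(n, 1, 10), x3 = runif(n))
eta_true <- 0.1 + 0.02 * sim$x1 + 0.03 * sim$x2 + 0.2 * sim$x3
sim$y <- rgamma(n, shape = 10, rate = 10 * eta_true)

fit <- glm(y ~ x1 + x2 + x3, family = Gamma, data = sim,
           control = glm.control(epsilon = 1e-12))
X  <- model.matrix(fit)
mu <- unname(fitted(fit))

w  <- mu^2          # IWLS weights at convergence
WX <- sqrt(w) * X   # W^(1/2) X, scaling row i by sqrt(w[i])
H  <- WX %*% solve(crossprod(WX), t(WX))  # weighted hat matrix
lev <- unname(diag(H))                    # leverages

# Leverage against observation index
plot(lev, type = "h", xlab = "Observation index", ylab = "Leverage")
```

The leverages sum to the number of coefficients, which gives a quick sanity check, and they can be compared with `hatvalues(fit)`.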

K) #k.r
y1=1/(beta[1]+beta[2]*x1+beta[3]*x2+beta[4]*x3) # fitted mean under the inverse link

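From the residuals vs fitted plot, observations 44, 28 and 81 have large residuals and may be outliers. Observation 81 has the largest residual.

From the normal Q-Q plot, most points lie close to the reference line, so the residuals can be regarded as approximately normally distributed. Observations 44, 28 and 81 lie far from the line, so they do not conform to the normal distribution and may be outliers.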


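From the residuals vs leverage plot, observations 57, 101 and 40 have large Cook's distances, i.e. they all have a strong influence on the predictions.

Computing the leverage statistics of these 3 observations shows that observation 44 has a larger value than the other two, so observation 44 is regarded as an outlier and can be removed.

Compare the model with observation 44 removed against the original model: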

mod2=glm(y~x1+x2+x3, family = Gamma,data=mydat1)
summary(mod2)

The deviance residuals of the refitted model are smaller and the effects of the covariates on the response are more significant, so the fit of the model has improved.
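The relative change in the estimated coefficients mentioned in (k) can be computed as sketched below. The data are simulated, and the point to drop is chosen here by the largest Cook's distance; both are assumptions for illustration, not the selection made above.

```r
# Simulated Gamma data with inverse link (assumption: not the coursework data)
set.seed(10)
n <- 200
sim <- data.frame(x1 = runif(n, 5, 20), x2 = runif(n, 1, 10), x3 = runif(n))
eta_true <- 0.1 + 0.02 * sim$x1 + 0.03 * sim$x2 + 0.2 * sim$x3
sim$y <- rgamma(n, shape = 10, rate = 10 * eta_true)

fit_all <- glm(y ~ x1 + x2 + x3, family = Gamma, data = sim)

# Pick exactly one observation to remove (here: largest Cook's distance)
drop_idx <- unname(which.max(cooks.distance(fit_all)))
fit_drop <- glm(y ~ x1 + x2 + x3, family = Gamma, data = sim[-drop_idx, ])

# Relative change of each coefficient after removing the observation
rel_change <- (coef(fit_drop) - coef(fit_all)) / abs(coef(fit_all))

# Side-by-side comparison of deviance and dispersion
compare <- c(dev_all   = deviance(fit_all),
             dev_drop  = deviance(fit_drop),
             disp_all  = summary(fit_all)$dispersion,
             disp_drop = summary(fit_drop)$dispersion)
```

Large entries in `rel_change` indicate that the removed observation had a strong influence on the corresponding coefficient.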
