UPDATE: 2021-09-24 22:07:56

はじめに

ここではデータ分析のための線形代数の章末問題をRで回答していきます。

データ分析のための線形代数

表1.1が表すアンケートデータ

D <- matrix(c(1,3,1,1,4,5,26,
              1,2,1,3,1,3,22,
              2,3,1,1,1,5,27,
              1,2,2,1,1,5,34,
              1,2,2,2,3,2,28,
              3,4,1,1,3,5,20,
              4,1,2,3,4,4,18,
              4,4,1,2,4,4,30,
              2,3,2,4,2,4,23,
              1,2,2,1,4,1,32),
            nrow = 10, ncol = 7, byrow = TRUE)
colnames(D) <- paste0("Q", 1:7)
rownames(D) <- paste0("ID", 1:10)

# 1.1のアンケートデータ
D

##      Q1 Q2 Q3 Q4 Q5 Q6 Q7
## ID1   1  3  1  1  4  5 26
## ID2   1  2  1  3  1  3 22
## ID3   2  3  1  1  1  5 27
## ID4   1  2  2  1  1  5 34
## ID5   1  2  2  2  3  2 28
## ID6   3  4  1  1  3  5 20
## ID7   4  1  2  3  4  4 18
## ID8   4  4  1  2  4  4 30
## ID9   2  3  2  4  2  4 23
## ID10  1  2  2  1  4  1 32

2章の章末問題

2.1 表1.1の回答者1の設問1と2への回答、回答者2の設問1と2への回答、および回答者3の設問1と2への回答を行ベクトルで表わせ。設問1への回答者1から3の回答、および設問2への回答者1から3の回答を列ベクトルで表せ。

# 回答者1
D[1, 1:2]

## Q1 Q2 
##  1  3

# 回答者2
D[2, 1:2]

## Q1 Q2 
##  1  2

# 回答者3
D[3, 1:2]

## Q1 Q2 
##  2  3

# 設問1
D[1:3, 1]

## ID1 ID2 ID3 
##   1   1   2

# 設問2
D[1:3, 2]

## ID1 ID2 ID3 
##   3   2   3

2.2 以下のベクトルがあるとき、1から16の計算をせよ(または質問に答えよ)。演算が定義されていない場合は「定義されていない」と答えること。

\[ \begin{eqnarray} \boldsymbol{ a } =\left(\begin{array}{c} 3 \\ 4 \end{array}\right), \boldsymbol{ b } =\left(\begin{array}{c} -1 \\ -5 \end{array}\right), \boldsymbol{ c } =\left(\begin{array}{c} -3 \\ 2 \end{array}\right), \boldsymbol{ d } =\left(\begin{array}{c} 4 \\ 3 \end{array}\right), \boldsymbol{ g } =\left(\begin{array}{c} -3 \\ 2 \\ 0 \end{array}\right), \boldsymbol{ h } =\left(\begin{array}{c} 3 \\ 5 \\ -1 \end{array}\right), \boldsymbol{ p } =\left(\begin{array}{c} -3 \\ 2 \\ 0 \end{array}\right) \end{eqnarray} \]

a <- c(3,4)
b <- c(-1,-5)
c <- c(-3,2)
d <- c(4,3)
g <- c(-3,2,0)
h <- c(3,5,-1)
p <- c(-3,2,0)

2.2.1 ベクトルgとpは等しいか

# 等しい

2.2.2 ベクトルgとcは等しいか

# 等しくない

2.2.3 ||b||

norm <- function(x) sqrt(sum(x^2))
# sqrt(26)
norm(b)

## [1] 5.09902

2.2.4 ||h||

# sqrt(35)
norm(h)

## [1] 5.91608

2.2.5 -5d

-5 * d

## [1] -20 -15

library(matlib)
xlim <- c(-10,10)
ylim <- c(-20,10)
plot(xlim, ylim, type = "n", xlab = "X", ylab = "Y", asp = 1)
abline(v = 0, h = 0, col = "gray")
grid()
vectors(rbind(d, -5*d), col = c("red", "blue"))

2.2.6 a+b

a + b

## [1]  2 -1

xlim <- c(-10,10)
ylim <- c(-10,10)
plot(xlim, ylim, type = "n", xlab = "X", ylab = "Y", asp = 1)
abline(v = 0, h = 0, col = "gray")
grid()
vectors(rbind(a, b, "a+b" = a + b), col = c("red", "blue", "darkgreen"))

2.2.7 c-b

c - b

## [1] -2  7

xlim <- c(-10,10)
ylim <- c(-10,10)
plot(xlim, ylim, type = "n", xlab = "X", ylab = "Y", asp = 1)
abline(v = 0, h = 0, col = "gray")
grid()
vectors(rbind(c, b, "c-b" = c + (-1)*b), col = c("red", "blue", "darkgreen"))

2.2.8 g+p

g + p

## [1] -6  4  0

xlim <- c(-10,10)
ylim <- c(-10,10)
plot(xlim, ylim, type = "n", xlab = "X", ylab = "Y", asp = 1)
abline(v = 0, h = 0, col = "gray")
grid()
vectors(rbind(g, p, "g+p" = g + p), col = c("red", "blue", "darkgreen"))

2.2.9 h+c

# 定義されない

2.2.10 ||d-c||

# sqrt(50)
norm(d - c)

## [1] 7.071068

2.2.11 ||2d+b||

# sqrt(50)
norm(2*d + b)

## [1] 7.071068

2.2.12 ||g||-||a|

norm_g <- norm(g)
norm_a <- norm(a)
# sqrt(13) - sqrt(25)
norm_g - norm_a

## [1] -1.394449

2.2.13 a/||a||

a / norm(a)

## [1] 0.6 0.8

2.2.14 -3d-b+2a

-3*d - b + 2*a

## [1] -5  4

2.2.15 p-2g+5h

p - 2*g + 5*h

## [1] 18 23 -5

2.2.16 2p+5a-g-4b

# 定義されない

2.3 下記のベクトルhを単位ベクトルの1次結合で表わせ。

\[ \begin{eqnarray} \boldsymbol{ h } =\left(\begin{array}{c} 3 \\ 5 \\ -1 \end{array}\right) \end{eqnarray} \]

h <- c(3,5,-1)
e1 <- c(1,0,0)
e2 <- c(0,1,0)
e3 <- c(0,0,1)

# 3*e1 + 5*e2 + (-1)*e3
h[1]*e1 + h[2]*e2 + h[3]*e3

## [1]  3  5 -1

2.4 問題2.2のベクトルを用いて、1から5の計算をせよ。演算が定義されていない場合は「定義されていない」と答えること。

# c %*% b
t(matrix(c)) %*% matrix(b)

##      [,1]
## [1,]   -7

2.4.2 (g,d)

# 定義されない

2.4.3 (2p,h)

t(matrix(2*p)) %*% matrix(h)

##      [,1]
## [1,]    2

2.4.4 ベクトルcとbのなす角の余弦(cos)

# cos = (c・d) / ||c||*||b||
norm_c <- norm(c)
norm_b <- norm(b)
innerdot_cb <- as.vector(t(matrix(c)) %*% matrix(b))

# -7/(sqrt(13)*sqrt(26))
innerdot_cb / (norm_c * norm_b)

## [1] -0.3807498

xlim <- c(-5,5)
ylim <- c(-5,5)
plot(xlim, ylim, type = "n", xlab = "X", ylab = "Y", asp = 1)
abline(v = 0, h = 0, col = "gray")
grid()
vectors(rbind(c, b), col = c("red", "blue"))

2.4.5 ベクトルpとgのなす角の余弦(cos)

norm_p <- norm(p)
norm_g <- norm(g)
innerdot_pg <- as.vector(t(matrix(p)) %*% matrix(g))

# 13/(sqrt(13)*sqrt(13))
innerdot_pg / (norm_p * norm_g)

## [1] 1

xlim <- c(-5,5)
ylim <- c(-5,5)
plot(xlim, ylim, type = "n", xlab = "X", ylab = "Y", asp = 1)
abline(v = 0, h = 0, col = "gray")
grid()
vectors(rbind(p, g), col = c("red", "blue"))

2.5 表1.1の回答者6から10の設問2と6への回答について、1から5の計算をせよ。

2.5.1 設問2と6について、偏差を要素するベクトル

D2 <- D[6:10, 2]
D6 <- D[6:10, 6]

m2 <- mean(D2)
m6 <- mean(D6)

D2_hensa <- D2 - m2
D6_hensa <- D6 - m6
D2_hensa;D6_hensa

##  ID6  ID7  ID8  ID9 ID10 
##  1.2 -1.8  1.2  0.2 -0.8

##  ID6  ID7  ID8  ID9 ID10 
##  1.4  0.4  0.4  0.4 -2.6

2.5.2 2つのベクトル各々の長さ

# sqrt(6.8)
norm(D2_hensa)

## [1] 2.607681

# sqrt(9.2)
norm(D6_hensa)

## [1] 3.03315

2.5.3 2つのベクトルの内積

innerdot_hensa26 <- as.vector(t(matrix(D2 - m2)) %*% matrix(D6 - m6))
innerdot_hensa26

## [1] 3.6

2.5.4 設問2と6の相関係数

変数\(\boldsymbol{x}\)と変数\(\boldsymbol{y}\)の相関係数は下記の通り定義されます。

\[ r_{xy} = \frac{\sum(x_{i} - \bar{x})\sum(y_{i} - \bar{y})} { \sqrt{\sum(x_{i} - \bar{x})^{2} \sum(y_{i} - \bar{y})^{2}} } \]

そして、分母の標準偏差は下記のように書き換えられます。

\[ \sqrt{\sum(x_{i} - \bar{x})^{2}}=\sqrt{n}s_{x}\\ \sqrt{\sum(y_{i} - \bar{y})^{2}}=\sqrt{n}s_{y} \]

標準偏差を使って標準化を行えることを考えると、

\[ \frac{(x_{i} - \bar{x})}{s_{x}} = z_{x} \\ \frac{(y_{i} - \bar{y})}{s_{y}} = z_{y} \]

相関係数は下記のようにも書くことができます。標準化した変数の積和をベクトルの数で割ったものです。

\[ \begin{eqnarray} r_{xy} &=& \frac{\sum(x_{i} - \bar{x}) (y_{i} - \bar{y})} { \sqrt{\sum(x_{i} - \bar{x})^{2} \sum(y_{i} - \bar{y})^{2}} } \\ &=& \sum \left( \frac{(x_{i} - \bar{x})}{ \sqrt{\sum(x_{i} - \bar{x})^2} } \frac{(y_{i} - \bar{y})}{ \sqrt{\sum(y_{i} - \bar{y})^2} } \right) \\ &=& \sum \frac{(x_{i} - \bar{x})}{\sqrt{n}s_{x}} \frac{(y_{i} - \bar{y})}{\sqrt{n}s_{y}} \\ &=& \sum \frac{z_{x}}{\sqrt{n}}\frac{z_{y}}{\sqrt{n}} \\ &=& \sum \frac{z_{x} z_{y}}{n} \end{eqnarray} \]

相関係数の分子は偏差ベクトルの内積であり、偏差ベクトルを\(\boldsymbol{x}_{(m)}\)と\(\boldsymbol{y}_{(m)}\)として表すと、

\[ (\boldsymbol{x}_{(m)}, \boldsymbol{y}_{(m)}) = \sum(x_{i}-\bar{x})(y_{i}-\bar{y}) \]

相関係数の分母は各偏差ベクトルの「長さ」の積である。

\[ ||\boldsymbol{x}_{(m)}|| = \sqrt{\sum(x_{i} - \bar{x})^{2}} = \sqrt{n} s_{x} \\ ||\boldsymbol{y}_{(m)}|| = \sqrt{\sum(y_{i} - \bar{y})^{2}} = \sqrt{n} s_{y} \]

つまり、相関係数は下記のように内積とベクトルの長さを使っても表現できる。

\[ r_{xy} = \frac{(\boldsymbol{x}_{(m)}, \boldsymbol{y}_{(m)})}{||\boldsymbol{x}_{(m)}|| ||\boldsymbol{y}_{(m)}||} \]

cor(D2, D6)

## [1] 0.4551496

innerdot_hensa26 / (norm(D2_hensa) * norm(D6_hensa))

## [1] 0.4551496

2.5.5 設問2と6の共分散

共分散は下記のように定義される。

\[ cov_{xy} = \frac{1}{n} \sum (x_{i} - \bar{x})(y_{i} - \bar{y}) = \frac{(\boldsymbol{x}_{(m)}, \boldsymbol{y}_{(m)})}{n} \]

innerdot_hensa26 / length(D2)

## [1] 0.72

2.6 表1.1のデータについて、1から6の計算をせよ。

2.6.1 設問1への回答者1から3の回答を表すベクトルの長さ

D1 <- D[1:3, 1]

# sqrt(6)
norm(D1)

## [1] 2.44949

2.6.2 設問2への回答者1から3の回答を表すベクトルの長さ

D2 <- D[1:3, 2]

# sqrt(22)
norm(D2)

## [1] 4.690416

2.6.3上記1の設問1を表すベクトルから2の設問2を表すベクトルを引いたベクトル

D1 - D2

## ID1 ID2 ID3 
##  -2  -1  -1

2.6.4 上記3の設問1と2を表すベクトルの先端の間の距離

# sqrt(6)
norm(D1 - D2)

## [1] 2.44949

2.6.5 回答者2の設問1と2への回答を表すベクトルから回答者3の設問1と2への回答を表すベクトルを引いたベクトル

D[2, 1:2] - D[3, 1:2]

## Q1 Q2 
## -1 -1

2.6.6 上記5の回答者2と3を表すベクトルの先端の間の距離

# sqrt(2)
norm(D[2, 1:2] - D[3, 1:2])

## [1] 1.414214

sessionInfo()

## R version 4.0.3 (2020-10-10)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Big Sur 10.16
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] ja_JP.UTF-8/ja_JP.UTF-8/ja_JP.UTF-8/C/ja_JP.UTF-8/ja_JP.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] matlib_0.9.5
## 
## loaded via a namespace (and not attached):
##  [1] zip_2.1.1         Rcpp_1.0.6        highr_0.8         cellranger_1.1.0 
##  [5] compiler_4.0.3    pillar_1.6.2      forcats_0.5.1     tools_4.0.3      
##  [9] digest_0.6.27     jsonlite_1.7.2    evaluate_0.14     lifecycle_1.0.0  
## [13] tibble_3.1.3      pkgconfig_2.0.3   rlang_0.4.10      openxlsx_4.2.3   
## [17] crosstalk_1.1.1   curl_4.3          yaml_2.2.1        haven_2.3.1      
## [21] xfun_0.24         rio_0.5.16        stringr_1.4.0     knitr_1.33       
## [25] vctrs_0.3.8       htmlwidgets_1.5.3 systemfonts_1.0.2 hms_1.0.0        
## [29] data.table_1.13.6 R6_2.5.0          textshaping_0.3.5 fansi_0.4.2      
## [33] readxl_1.3.1      rgl_0.107.10      foreign_0.8-80    rmarkdown_2.6    
## [37] carData_3.0-4     car_3.0-10        magrittr_2.0.1    htmltools_0.5.1.1
## [41] ellipsis_0.3.2    MASS_7.3-53       abind_1.4-5       xtable_1.8-4     
## [45] ragg_1.1.3        utf8_1.1.4        stringi_1.5.3     crayon_1.4.0

データ分析と線形代数02