Black Iûnn ê 天跤下: 2018

Tuesday, December 25, 2018

Python Deep Learning Notes - Backpropagation for Basic Operations


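In the previous post we learned the idea behind the backpropagation algorithm. From the computational graph, the goal is to find the backward operation for +, dot, and sigmoid (or any other activation function). Everything has to be built up from the most basic pieces, though. For example, the dot operation is itself assembled from multiplication (*) and addition (+). This is the "divide and conquer" approach.

Backpropagation through addition

If z = x + y:

its derivatives are dz/dx = 1 and dz/dy = 1. In other words, when x changes by 1, z changes by 1 as well: however much x (or y) changes, z changes by the same amount. Read in the other direction, a change arriving at z is handed back to x (and to y) unchanged, so the backward step for addition passes the upstream value through as-is to both inputs.

Backpropagation through multiplication

Consider z = x * y.

Its partial derivatives are ∂z/∂x = y and ∂z/∂y = x, so the backward step multiplies the upstream value by the other input: the x side receives dz * y and the y side receives dz * x.

One way to read this: seen from x, a small change in x changes the output by y times as much. By the same reasoning, a small change in y changes the output by x times as much. Take a concrete example, 10 * 2 = 20.

If we feed 1 into the backward pass, our model gives 2 on the x side and 10 on the y side. This means:

If x grows by 1 to 10 + 1 = 11, z becomes 20 + 2 = 22.
If y grows by 1 to 2 + 1 = 3, z becomes 20 + 10 = 30.

Is that right? Let us check, starting from 10 * 2 = 20: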

(10 + 1) * 2 = 11 * 2 = 22 = 20 + 2
10 * (2+1) = 10 * 3 = 30 = 20 + 10

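It matches exactly.

So far we can see that the backpropagation algorithm really amounts to working out the derivative model of each basic operation first.

What it expresses is how much the output changes when the input changes by a tiny amount. That is the idea of a derivative. The only difference is that an ordinary derivative is computed from left to right, while backpropagation computes from right back to left.

Implementing it in Python

With the theory done, we turn to layer_naive.py. It uses a class to bundle forward propagation and backward propagation together into a layer: AddLayer() is the addition layer and MulLayer() is the multiplication layer. The code is straightforward and can be read on its own.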
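As a rough illustration of the two layers described above, here is a minimal sketch. The class names follow the ones mentioned in the text, but the bodies are my own reconstruction and may differ from what layer_naive.py actually contains. It replays the 10 * 2 = 20 example.

```python
class MulLayer:
    """Multiplication node: remembers its inputs for the backward pass."""

    def __init__(self):
        self.x = None
        self.y = None

    def forward(self, x, y):
        self.x = x
        self.y = y
        return x * y

    def backward(self, dout):
        # Each side receives the upstream value times the *other* input.
        dx = dout * self.y
        dy = dout * self.x
        return dx, dy


class AddLayer:
    """Addition node: the upstream value passes through unchanged."""

    def forward(self, x, y):
        return x + y

    def backward(self, dout):
        return dout, dout


mul = MulLayer()
z = mul.forward(10, 2)       # forward: 10 * 2
dx, dy = mul.backward(1)     # backward with 1 as the upstream value
print(z, dx, dy)

add = AddLayer()
add_grads = add.backward(1)
print(add_grads)
```

The multiplication layer has to store x and y during the forward pass because its backward pass needs them; the addition layer needs no state at all.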

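Because the backward passes of ReLU and Sigmoid are pure mathematics, I only write down their formulas here. For the full derivation, see the original author's book or another mathematics text.

Backpropagation through the ReLU layer

y = x (if x > 0)

y = 0 (if x <= 0)

Its backward pass is:

dy/dx = 1 (if x > 0)

dy/dx = 0 (if x <= 0)

For the Python implementation, see the Relu class in layers.py.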
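The following is a small sketch of a ReLU layer built on a boolean mask. It is written from the formulas above rather than copied from layers.py, so treat the details (the attribute name mask, treating x == 0 as inactive) as assumptions.

```python
import numpy as np


class Relu:
    def __init__(self):
        self.mask = None  # True where the input was <= 0

    def forward(self, x):
        self.mask = (x <= 0)
        out = x.copy()
        out[self.mask] = 0
        return out

    def backward(self, dout):
        # Gradient is 1 where x > 0 and 0 elsewhere.
        dx = dout.copy()
        dx[self.mask] = 0
        return dx


relu = Relu()
x = np.array([-1.0, 0.0, 3.0])
out = relu.forward(x)
dx = relu.backward(np.ones_like(x))
print(out, dx)
```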

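Backpropagation through the Sigmoid layer

y = 1/(1+exp(-x))

Its backward pass is (d stands for delta):

dx = y * ( 1 - y) * dL

Here dL is the change arriving from the y side. The formula is of course more involved and takes several transformations to reach. The point of those transformations is that the final result can be written in terms of y. Since y is the value already computed in the forward pass, the dx we push back can be computed directly from it.

For the Python implementation, see the Sigmoid class in layers.py.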
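To make the formula dx = y * (1 - y) * dL concrete, here is a minimal sketch of a Sigmoid layer, checked against a central-difference estimate. The class body is my own reconstruction from the formula and is not guaranteed to match layers.py.

```python
import numpy as np


class Sigmoid:
    def __init__(self):
        self.out = None  # y from the forward pass, reused in backward

    def forward(self, x):
        self.out = 1.0 / (1.0 + np.exp(-x))
        return self.out

    def backward(self, dout):
        return dout * self.out * (1.0 - self.out)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


layer = Sigmoid()
x = np.array([-1.0, 0.0, 2.0])
y = layer.forward(x)
dx = layer.backward(np.ones_like(x))   # dL = 1 for every element

# Independent check: numerical derivative of sigmoid at the same points.
h = 1e-4
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
print(dx)
print(numeric)
```

Storing y in the forward pass is what makes the backward pass cheap: no exponential has to be recomputed.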


Sunday, November 11, 2018

Python Deep Learning Notes - An Introduction to the Backpropagation Algorithm


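Back propagation is a technical term defined in contrast to forward propagation. It exists to solve the problem that computing gradients by numerical differentiation takes far too much time.

What is forward propagation?

Forward propagation is the neural network we have been discussing all along: from the input, through the first layer and the second layer, to the output. We have already studied it and played with it.

It is what the predict() function in the previous post does, that is, the process by which a neural network goes from the input, through layer after layer of neurons, to the output (out). Referring back to the earlier post on this:

a 3-layer neural network can be expressed with this set of functions: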

A1 = np.dot(X, W1) + B1
Z1 = sigmoid(A1)

A2 = np.dot(Z1, W2) + B2
Z2 = sigmoid(A2)

A3 = np.dot(Z2, W3) + B3
y = identity_function(A3)

Later on, of course, identity_function() is replaced by softmax(). The computation runs from the input X to the output y.
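For readers who want to run the forward pass, here is a self-contained sketch. The layer sizes (4, 5, 5, 3) and the random weights are made up purely for illustration, and softmax() is used as the output function, as the text says happens later.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(a):
    a = a - np.max(a, axis=-1, keepdims=True)  # guard against overflow
    exp_a = np.exp(a)
    return exp_a / np.sum(exp_a, axis=-1, keepdims=True)


rng = np.random.default_rng(0)

# Hypothetical sizes: 4 inputs, two hidden layers of 5, 3 outputs.
X = rng.normal(size=(1, 4))
W1, B1 = rng.normal(size=(4, 5)), np.zeros(5)
W2, B2 = rng.normal(size=(5, 5)), np.zeros(5)
W3, B3 = rng.normal(size=(5, 3)), np.zeros(3)

A1 = np.dot(X, W1) + B1
Z1 = sigmoid(A1)

A2 = np.dot(Z1, W2) + B2
Z2 = sigmoid(A2)

A3 = np.dot(Z2, W3) + B3
y = softmax(A3)

print(y)
```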

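Drawn as a picture, it looks like this:

All the arrows point from left to right. This is the computational graph of forward propagation.

We can see that every circle is a well-defined mathematical operation: dot, +, sigmoid, and so on.

What is backpropagation trying to do?

Let us step back and ask why we wanted to learn derivatives and gradients in the first place.

The goal is this: suppose one round of learning takes X0 to the result Y0. If we change Y0 by a tiny amount, written dY (delta Y), and find the corresponding dX (delta X), we obtain the slope/gradient dY/dX. With it we can decide which direction to move in so that Y converges toward the value we want.

Numerical differentiation and gradients follow the mathematical theory exactly, so once you understand the theory you understand what they are doing. In practice, though, they take too much time to be useful.

So let us think about it differently: can we compute it piece by piece, just as forward propagation does?

If we can find the backward operation for every node (circle) in the graph, do we then obtain the derivatives/gradients dY/dX, dY/dW, dY/dB1, and so on? What we care about most, of course, is dW1 and dW2, which tell us which way the weights should move.

I put a question mark "?" on each node. That is the problem to solve next: explaining backpropagation with a computational graph is not enough on its own, and we still have to work out, one by one, how dot, +, sigmoid, and the rest propagate backward.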


Sunday, October 28, 2018

Python Deep Learning Notes - The Problem with Following Differentiation Theory to the Letter


Let's run it

The 5 steps introduced earlier are each implemented in two_layer_net.py, and train_neuralnet.py ties the five steps together. Let us run it from start to finish:

$ python3 train_neuralnet.py 
train acc, test acc | 0.0903, 0.0899
train acc, test acc | 0.7732833333333333, 0.7781
train acc, test acc | 0.87545, 0.879
train acc, test acc | 0.89705, 0.8995
train acc, test acc | 0.907, 0.9103
train acc, test acc | 0.9141, 0.9184
train acc, test acc | 0.9193333333333333, 0.9221
train acc, test acc | 0.9239666666666667, 0.9249
train acc, test acc | 0.9285666666666667, 0.9291
train acc, test acc | 0.9303, 0.9318
train acc, test acc | 0.9334, 0.9352
train acc, test acc | 0.9364833333333333, 0.9377
train acc, test acc | 0.9385, 0.939
train acc, test acc | 0.9416166666666667, 0.942
train acc, test acc | 0.9425666666666667, 0.9417
train acc, test acc | 0.9447, 0.9435
train acc, test acc | 0.94585, 0.9454

Then a plot pops up. Everything looks smooth.

If you read the source code carefully

But looking closely at train_neuralnet.py, we find this line:

    grad = network.gradient(x_batch, t_batch)

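Is this the numerical_gradient() we spent so long introducing? No: it is gradient(), not numerical_gradient().

We celebrated too early!

two_layer_net.py contains both numerical_gradient() and gradient(). On closer inspection, then, train_neuralnet.py never uses the numerical_gradient() we spent so long on.

Were we being fooled?

Switching back to numerical_gradient() to try it

I changed the gradient() call in train_neuralnet.py back to numerical_gradient():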

    grad = network.numerical_gradient(x_batch, t_batch)

I ran it, waited a long time, and only one line appeared:

train acc, test acc | 0.09736666666666667, 0.0982

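Then nothing moved, as if time had frozen.

What happened? At first I suspected a bug in the program and wondered whether I needed to debug it.

Later I used the timeit module to get a rough idea of how long numerical_gradient() takes, like this: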

import timeit      
...
time_start = timeit.default_timer()
grad = network.numerical_gradient(x_batch, t_batch)
print("grad cal: {}".format(timeit.default_timer() - time_start))

The result:

iter_per_epoch=600.0
grad cal: 46.7133047870002
train acc, test acc | 0.09736666666666667, 0.0982

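The first line is iter_per_epoch, which I printed separately. What we see here:

One call to numerical_gradient() takes about 45 seconds (46.7 seconds in this run).

iter_per_epoch sets how many iterations run between prints of "train acc, test acc". The first line is printed right away; after that, 600 calls to numerical_gradient() are needed before the next one.

That is 45 * 600 = 27000 seconds, roughly 7.5 hours before the second line is printed.

In the gradient() run above, it took about 16 prints of "train acc, test acc" for the accuracy to converge to a satisfactory level. With numerical_gradient() that is 16 * 7.5 hours = 120 hours, or five days, before a first usable result.

These timings depend on the speed of your computer, so your numbers will not necessarily match mine.

Why do gradient() and numerical_gradient() differ so much in time?

gradient() and numerical_gradient() differ hugely in time

If we time the gradient() method with timeit in the same way, it takes roughly 0.01 seconds. Compared with 45 seconds, that is a difference of more than 4000 times.

Why is gradient() so fast? Because it uses backpropagation!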


Monday, August 6, 2018

Python Deep Learning Notes - How a Neural Network Learns


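The previous two posts introduced derivatives and gradients, and how gradient descent converges to a minimum. They were preparation for this post: using gradient descent to drive the loss function close to 0.

The procedure is:

1. Decide how many neurons and how many layers to use.
2. Give the weights (W) and biases (B) initial values, usually chosen at random.
3. Take a mini-batch of data and compute the value of the loss function.
4. Compute the gradient of the loss function at that point. It shows the direction in which the loss decreases fastest.
5. Update the weights (W) with the result, then repeat steps 3, 4, and 5 until the answer is satisfactory, that is, until the loss is close enough to zero to be acceptable.

Of course, the mini-batch in step 3 is different every time!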
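The steps above can be sketched as a short training loop. To keep it runnable in a few lines, this sketch swaps the two-layer network for a plain linear model with a mean-squared-error loss and an analytic gradient; the data, learning rate, and batch size are all made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: targets come from a known weight vector (illustration only).
x_train = rng.normal(size=(200, 3))
true_w = np.array([1.5, -2.0, 0.5])
t_train = x_train @ true_w

# Step 1 is trivial here: a single linear layer with 3 weights.
# Step 2: small random initial weights.
w = 0.01 * rng.normal(size=3)
lr = 0.1
batch_size = 20


def loss(w, x, t):
    return np.mean((x @ w - t) ** 2)


def gradient(w, x, t):
    # Analytic gradient of the mean squared error with respect to w.
    return 2 * x.T @ (x @ w - t) / len(t)


loss_before = loss(w, x_train, t_train)

for step in range(200):
    # Step 3: draw a fresh mini-batch each time.
    idx = rng.choice(len(x_train), batch_size)
    x_batch, t_batch = x_train[idx], t_train[idx]
    # Step 4: gradient of the loss on this mini-batch.
    grad = gradient(w, x_batch, t_batch)
    # Step 5: move the weights against the gradient.
    w -= lr * grad

loss_after = loss(w, x_train, t_train)
print(loss_before, loss_after)
```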

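Below, we follow along with two_layer_net.py.

Deciding the number of neurons, the number of layers, and the initial values

In two_layer_net.py, __init__() defines the neural network. It is a two-layer network:

a1 = W1 * x + b1
z1 = sigmoid(a1)
a2 = W2 * z1 + b2
y = softmax(a2)

Looking at __init__() in that file, it uses numpy's randn function to generate each weight array. The array sizes are supplied by the caller at call time.

The parameters are kept in a Python dictionary.

Because x is the input, __init__() only defines W1, b1, W2, and b2.
input_size is the number of elements in x. hidden_size is the size of the middle layer.
W1 is an array of shape (rows, columns) = (input_size, hidden_size).
W2 is an array of shape (rows, columns) = (hidden_size, output_size).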

    def __init__(self, input_size, hidden_size, output_size, weight_init_std=0.01):
        self.params = {}
        self.params['W1'] = weight_init_std * np.random.randn(input_size, hidden_size)
        self.params['b1'] = np.zeros(hidden_size)
        self.params['W2'] = weight_init_std * np.random.randn(hidden_size, output_size)
        self.params['b2'] = np.zeros(output_size)

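This is step 2 above.

Computing the loss function

Before computing the loss, we first compute the output from a set of inputs. This is the predict() introduced earlier, which actually carries out:

a1 = W1 * x + b1
z1 = sigmoid(a1)
a2 = W2 * z1 + b2

This is the function: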

    def predict(self, x):
        W1, W2 = self.params['W1'], self.params['W2']
        b1, b2 = self.params['b1'], self.params['b2']
    
        a1 = np.dot(x, W1) + b1
        z1 = sigmoid(a1)
        a2 = np.dot(z1, W2) + b2
        y = softmax(a2)
        
        return y

Next, compute the cross_entropy_error() loss on y, the output of softmax(a2):

    def loss(self, x, t):
        y = self.predict(x)     
        return cross_entropy_error(y, t)

This is step 3 above.
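loss() relies on two helpers, softmax() and cross_entropy_error(), whose bodies are not shown here. The following is one common way to write them, given as a sketch under my own assumptions (one-hot labels, a small constant 1e-7 to avoid log(0)); the versions in the repository may differ.

```python
import numpy as np


def softmax(a):
    a = a - np.max(a, axis=-1, keepdims=True)  # subtract the max for stability
    exp_a = np.exp(a)
    return exp_a / np.sum(exp_a, axis=-1, keepdims=True)


def cross_entropy_error(y, t):
    # y: predicted probabilities, t: one-hot labels (assumed format).
    y = np.atleast_2d(y)
    t = np.atleast_2d(t)
    batch_size = y.shape[0]
    return -np.sum(t * np.log(y + 1e-7)) / batch_size


a2 = np.array([[0.3, 2.9, 4.0]])
y = softmax(a2)

loss_correct = cross_entropy_error(y, np.array([[0, 0, 1]]))  # label = most likely class
loss_wrong = cross_entropy_error(y, np.array([[1, 0, 0]]))    # label = least likely class
print(y, loss_correct, loss_wrong)
```

The loss is small when the network puts high probability on the labelled class and large when it does not, which is what gradient descent then tries to reduce.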

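Tuesday, July 17, 2018

Profiling Python Programs 2 - The cProfile Module

The cProfile module

cProfile analyzes a whole program: how many times was each function called, and how much time did each call take? It makes it easy to see where the program's bottlenecks are. Some of the examples below are taken directly from the official documentation.

The most complete reference for cProfile is of course the official documentation. I also consulted this article: Python 用 cProfile 測量程式效能瓶頸與 gprof2dot 視覺化分析教學 (a tutorial on measuring bottlenecks with cProfile and visualizing them with gprof2dot). Have a look if you are interested.

A simple example

Like timeit, it can be run directly from the command line or imported as a module inside Python. We start with the command line: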

python -m cProfile [-o output_file] [-s sort_order] (-m module | myscript.py)

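-s gives the sort key. Commonly used values:

'calls': call count
'cumtime': cumulative time

Without -s, the output is ordered by standard name.

Let us run train_neuralnet.py as an example to make this clearer. (Note: you need to pull the whole repository from https://gitlab.com/Iunn/deep-learning-from-scratch/tree/master for this example to run.)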

$ python3 -m cProfile train_neuralnet.py
train acc, test acc | 0.09915, 0.1009
...
train acc, test acc | 0.9449333333333333, 0.9429
train acc, test acc | 0.9467, 0.9431
         887729 function calls (881058 primitive calls) in 237.808 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      571    0.001    0.000    0.001    0.000 :103(release)
      360    0.000    0.000    0.000    0.000 :143(__init__)
      360    0.000    0.000    0.004    0.000 :147(__enter__)
      360    0.000    0.000    0.001    0.000 :151(__exit__)
      571    0.002    0.000    0.003    0.000 :157(_get_module_lock)
      359    0.001    0.000    0.001    0.000 :176(cb)
      211    0.000    0.000    0.001    0.000 :194(_lock_unlock_module)
    479/7    0.000    0.000    0.403    0.058 :211(_call_with_frames_removed)
     3475    0.001    0.000    0.001    0.000 :222(_verbose_message)
        5    0.000    0.000    0.000    0.000 :232(_requires_builtin_wrapper)
....

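With no -s option, the rows are ordered by standard name. The columns mean:

ncalls: number of calls
tottime: time spent in this function itself, not counting sub-functions (subroutines)
percall: tottime/ncalls, the time per call
cumtime: cumulative time, including the time spent in sub-functions
percall: the second percall is cumtime divided by the number of primitive calls
filename:lineno(function): file name, line number, and function name

If we run it with -s cumulative, it looks like this: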

$ python3 -m cProfile -s cumulative train_neuralnet.py
train acc, test acc | 0.09751666666666667, 0.0974
...

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    318/1    0.009    0.000   36.585   36.585 {built-in method builtins.exec}
        1    2.237    2.237   36.585   36.585 train_neuralnet.py:2(<module>)
    70371   17.960    0.000   17.960    0.000 {built-in method numpy.core.multiarray.dot}
    10000    0.667    0.000   14.686    0.001 two_layer_net.py:55(gradient)
    10034    0.619    0.000   14.124    0.001 two_layer_net.py:18(predict)
    10000    0.035    0.000    7.642    0.001 two_layer_net.py:30(loss)
       34    0.043    0.001    7.193    0.212 two_layer_net.py:35(accuracy)
    40034    6.362    0.000    6.362    0.000 functions.py:13(sigmoid)

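The first two rows are essentially the whole script; the individual functions start from the third row. The full list of -s arguments is in the official documentation. You can also take the keys from the column names in the output.

Saving the output to a file

The commands above only send the results to standard output (stdout). We can also save the results first and analyze them afterward with another module. That way we can examine the results under different sort orders without running the program several times.
To save the output to a file, add -o: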

$ python3 -m cProfile -o train_neuralnet.pstats train_neuralnet.py

The train_neuralnet.pstats that follows -o is the file name.
Next, we can use -m pstats to analyze this file:

$ python3 -m pstats train_neuralnet.pstats

After you press ENTER you are in a shell. Inside it is the world of pstats, where you can issue pstats commands:

train_neuralnet.pstats % sort cumtime       # sort by cumtime
train_neuralnet.pstats % stats              # print the statistics
Mon Jul 23 20:51:48 2018    ./out.pstate

         898211 function calls (891473 primitive calls) in 32.642 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    318/1    0.010    0.000   32.642   32.642 {built-in method builtins.exec}
        1    1.784    1.784   32.642   32.642 train_neuralnet.py:2(<module>)
    70612   14.804    0.000   14.804    0.000 {built-in method numpy.core.multiarray.dot}
    10000    0.541    0.000   12.327    0.001 /home/black/data/myRoadMap/myGit/my_python/my-deep-learning/deep-learning-from-scratch/ch04/two_layer_net.py:55(gradient)
    10034    0.476    0.000   11.870    0.001 /home/black/data/myRoadMap/myGit/my_python/my-deep-learning/deep-learning-from-scratch/ch04/two_layer_net.py:18(predict)
    10000    0.027    0.000    6.370    0.001 /home/black/data/myRoadMap/myGit/my_python/my-deep-learning/deep-learning-from-scratch/ch04/two_layer_net.py:30(loss)
       34    0.019    0.001    6.070    0.179 /home/black/data/myRoadMap/myGit/my_python/my-deep-learning/deep-learning-from-scratch/ch04/two_layer_net.py:35(accuracy)
    40034    5.631    0.000    5.631    0.000 ../common/functions.py:13(sigmoid)
.....
train_neuralnet.pstats % sort ncalls          # sort by ncalls
train_neuralnet.pstats % stats 10             # print only ten rows
Mon Jul 23 20:51:48 2018    ./out.pstate

         898211 function calls (891473 primitive calls) in 32.642 seconds

   Ordered by: call count
   List reduced from 3637 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    80644    1.161    0.000    1.161    0.000 {method 'reduce' of 'numpy.ufunc' objects}
74790/74786    0.038    0.000    0.042    0.000 {built-in method builtins.isinstance}
    70612   14.804    0.000   14.804    0.000 {built-in method numpy.core.multiarray.dot}
    50072    0.022    0.000    0.542    0.000 _methods.py:31(_sum)
    50068    0.146    0.000    0.720    0.000 fromnumeric.py:1778(sum)
    40034    5.631    0.000    5.631    0.000 functions.py:13(sigmoid)
    26000    0.005    0.000    0.005    0.000 {method 'append' of 'list' objects}
    20514    0.003    0.000    0.003    0.000 {method 'startswith' of 'str' objects}
    20167    0.013    0.000    0.540    0.000 _methods.py:25(_amax)
    20109    0.079    0.000    0.619    0.000 fromnumeric.py:2222(amax)

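As the examples above show, we can switch to another sort order at any time without rerunning the whole program. We can also type help to see how the commands are used.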

train_neuralnet.pstats% help

Documented commands (type help <topic>):
========================================
EOF  add  callees  callers  help  quit  read  reverse  sort  stats  strip

train_neuralnet.pstats% help sort
Sort profile data according to specified keys.
(Typing `sort' without arguments lists valid keys.)
train_neuralnet.pstats% sort
Valid sort keys (unique prefixes are accepted):
calls -- call count
ncalls -- call count
cumtime -- cumulative time
cumulative -- cumulative time
file -- file name
filename -- file name
line -- line number
module -- file name
name -- function name
nfl -- name/file/line
pcalls -- primitive call count
stdname -- standard name
time -- internal time
tottime -- internal time
train_neuralnet.pstats%

Module usage

That is, import it and call its run() function:

import cProfile
import re
cProfile.run('re.compile("foo|bar")', 'restats')

In the example above we profile re.compile("foo|bar") (re is the regular expression module) and save the results to the file restats.

We can use the pstats module to turn restats into a human-readable form:

import pstats

p = pstats.Stats('restats')
p.strip_dirs().sort_stats(-1).print_stats()

On my computer (Ubuntu 18.04, Python 3.6.5) the output is:

Wed Jul 18 21:01:07 2018    restats

         199 function calls (194 primitive calls) in 0.001 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.001    0.001 <string>:1(<module>)
        4    0.000    0.000    0.000    0.000 enum.py:265(__call__)
        4    0.000    0.000    0.000    0.000 enum.py:515(__new__)
        2    0.000    0.000    0.000    0.000 enum.py:801(__and__)
        1    0.000    0.000    0.001    0.001 re.py:231(compile)
....

We can see that the rows are sorted by name.

Sort keys

To change the sort key, pass a different argument to sort_stats(), such as 'time' or 'cumtime'.

>>> help(p.sort_stats)

>>> p.sort_stats('cumtime')

>>> p.print_stats()
Wed Jul 18 21:12:10 2018    restats

         199 function calls (194 primitive calls) in 0.001 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.001    0.001 {built-in method builtins.exec}
        1    0.000    0.000    0.001    0.001 <string>:1(<module>)
        1    0.000    0.000    0.001    0.001 re.py:231(compile)
        1    0.000    0.000    0.001    0.001 re.py:286(_compile)

NOTE: I am on Python 3.6.x, which uses strings. From 3.7 on you can also import SortKey and use its predefined constants (strings are still accepted):

from pstats import SortKey
p.sort_stats(SortKey.NAME)

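Trimming the output

Sometimes we only want the important parts, not every piece of information dumped out until our eyes blur. We can pass a number to print_stats(); for example, 5 shows only the top five functions: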

>>> p.sort_stats('time').print_stats(5)
Wed Jul 18 21:12:10 2018    restats

         199 function calls (194 primitive calls) in 0.001 seconds

   Ordered by: internal time
   List reduced from 42 to 5 due to restriction <5>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.001    0.001 {built-in method builtins.exec}
      3/1    0.000    0.000    0.000    0.000 sre_parse.py:173(getwidth)
        1    0.000    0.000    0.001    0.001 re.py:286(_compile)
      3/1    0.000    0.000    0.000    0.000 sre_compile.py:64(_compile)
        2    0.000    0.000    0.000    0.000 sre_parse.py:470(_parse)

You can also show only the functions whose names match a given string:

>>> p.sort_stats('time').print_stats('__init__')

Or see who calls a function, or what it calls:

>>> p.print_callers('init')
...
>>> p.print_callees()

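Command-line usage versus module usage

The command-line form -m cProfile is convenient, but it profiles from outside the program, so the script itself is counted too, as in the example above: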

$ python3 -m cProfile -s cumulative train_neuralnet.py
...
...
        1    2.237    2.237   36.585   36.585 train_neuralnet.py:2(<module>)

With module usage, only what actually runs inside the script is measured. Moreover, you can restrict profiling to just the part you care about:

import cProfile

pr = cProfile.Profile()
pr.enable()
...                            # the code you want to profile
pr.disable()
pr.print_stats(sort='time')

If you are in a hurry, use the command line; for finer-grained work, use the module.
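Here is a runnable sketch of the module-style usage that profiles only one region and captures the report as a string. The function slow_sum() is a made-up stand-in for whatever you want to measure.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    # Hypothetical workload, used only to give the profiler something to see.
    return sum(i * i for i in range(n))


pr = cProfile.Profile()
pr.enable()
result = slow_sum(200_000)      # only this region is profiled
pr.disable()

# Send the report to a string buffer instead of stdout.
buf = io.StringIO()
stats = pstats.Stats(pr, stream=buf)
stats.strip_dirs().sort_stats("cumulative").print_stats(5)
report = buf.getvalue()
print(report)
```

Capturing the report in a buffer makes it easy to log it or search it from code, instead of reading it off the terminal.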

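Generating a graph with gprof2dot

The output introduced above is a table, row after row. Even when sorted, it is still tiring for a person to read. A graph would be more direct.

To do this we need two more tools: one is a Python module, the other a Linux package.

First install the gprof2dot module: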

sudo pip3 install gprof2dot

Then install graphviz.
On Ubuntu/Debian Linux:

sudo apt-get install graphviz

On Fedora Linux:

sudo yum install graphviz

If sudo does not work for you, you need to become the root user.

Using the earlier train_neuralnet.py, generate train_neuralnet.pstats:

python3 -m cProfile -o train_neuralnet.pstats train_neuralnet.py

Then use gprof2dot to turn the pstats file into a PNG image:

python3 -m gprof2dot -f pstats train_neuralnet.pstats |dot -T png -o train_neuralnet-profile.png

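Treat the command above as a formula for now: apart from the input train_neuralnet.pstats and the output train_neuralnet-profile.png, which you can choose yourself, everything else stays fixed. If you want to know more, consult the Graphviz website. This is its output:

From the graph it is easy to see that gradient takes the largest share at 39%, followed by predict at 38%. At a lower level, though, numpy's dot accounts for nearly 48%.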

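Friday, July 13, 2018

Profiling Python Programs 1 - timeit

What is profiling? In software, it is a technique for collecting and analyzing data about a running program. For example, you may want to know how much time a function you wrote takes, or how many times it is called.

You can write your own profiling code, or use ready-made tools. Python itself ships with tools you can use.

The timeit module

timeit is a module that is handy for timing a small code snippet. It can be used from the command line or from within Python. It is very simple to use:

A simple example

Here is the command-line form (copied from the examples in the official Python documentation):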

$ python3 -m timeit '"-".join(str(n) for n in range(100))'
10000 loops, best of 5: 30.2 usec per loop
$ python3 -m timeit '"-".join([str(n) for n in range(100)])'
10000 loops, best of 5: 27.5 usec per loop
$ python3 -m timeit '"-".join(map(str, range(100)))'
10000 loops, best of 5: 23.2 usec per loop

And this is the Python interface:

>>> import timeit
>>> timeit.timeit('"-".join(str(n) for n in range(100))', number=10000)
0.3018611848820001
>>> timeit.timeit('"-".join([str(n) for n in range(100)])', number=10000)
0.2727368790656328
>>> timeit.timeit('"-".join(map(str, range(100)))', number=10000)
0.23702679807320237

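number is how many times to run the statement. The command-line form chooses it automatically. Because the runtime environment of a computer keeps changing, the statement has to be run many times to pick out a representative figure.
timeit has many more interfaces; see the official documentation for a complete description. Here I only show the ones I used.

timeit.default_timer()

It is an easy way to time a stretch of code inside a script. Usage is simple; here is an example from my own program: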

import timeit

time_start = timeit.default_timer()
grad = network.numerical_gradient(x_batch, t_batch)
time_end = timeit.default_timer() - time_start

That is, bracket the code with two default_timer() calls and subtract the earlier reading from the later one to get the running time.
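If you bracket code like this often, the pattern can be wrapped in a small context manager. This is only a sketch; the helper name stopwatch and the timings dictionary are my own inventions, not part of timeit.

```python
import timeit
from contextlib import contextmanager


@contextmanager
def stopwatch(label, results):
    """Store the elapsed wall-clock time of the with-block in results[label]."""
    start = timeit.default_timer()
    try:
        yield
    finally:
        results[label] = timeit.default_timer() - start


timings = {}
with stopwatch("sum of squares", timings):
    total = sum(n * n for n in range(100_000))

print("sum of squares took {:.6f} s".format(timings["sum of squares"]))
```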

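timeit can of course do more than this. Browse its official documentation if you need to; I will add more here when I use it again.

For more detailed information, use the cProfile module.

Sunday, June 24, 2018

Python Basics -- Dictionaries

Unlike lists, a dictionary does not use numbers to store and retrieve objects. You can use any object that can serve as a key. What is the benefit? It is easier for people to understand and use: we remember meaningful names far better than bare numbers.

Let us start with an example:

Creating a dictionary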

>>> score = { 'john': 90, 'may': 60}     ### several pairs at once
>>> score[ 'jack' ] = 75                 ### one pair at a time
>>> score
{'john': 90, 'may': 60, 'jack': 75}
>>> score[ 'john' ]
90

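To create several key/value pairs at once, wrap them in curly braces { and }.
Within a pair, a colon : separates the key from the value.
Pairs are separated from each other by a comma ,.

To add a single pair, use the name followed by square brackets [] with the key inside: the expression score[ KEY ] = VALUE adds it to score.

In the last two expressions, score on its own prints every key/value pair, while score[ 'john' ] fetches the value for just that key.

This usage is very much like a list. The difference is that a list accepts only integer positions inside the square brackets.

Deleting a key/value pair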

>>> del score['may']
>>> score
{'john': 90, 'jack': 75}

Listing all the keys in a dictionary

The keys() method returns all the keys:

>>> score.keys()
dict_keys(['john', 'jack'])
>>> list(score.keys())
['john', 'jack']
>>> sorted(list(score.keys()))
['jack', 'john']

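score.keys() is a dict_keys object; list() turns it into a list. It is not in alphabetical order, though, so we use sorted() to order the keys alphabetically.

What can be used as a key

We have seen that strings can be keys. What else can? The answer: immutable (hashable) objects. Numbers and strings certainly work; lists do not. A tuple works too, as long as it contains nothing mutable.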
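A short sketch of the rule, using made-up keys: strings, numbers, and tuples of immutable items are accepted, while a list (or a tuple containing a list) raises TypeError.

```python
score = {}
score["john"] = 90              # string key
score[42] = "answer"            # integer key
score[("john", 2018)] = 85      # tuple of immutable items

try:
    score[["john", 2018]] = 85  # a list is mutable, so it cannot be a key
except TypeError as exc:
    list_error = str(exc)

try:
    score[("john", [2018])] = 85  # a tuple holding a list fails as well
except TypeError as exc:
    tuple_error = str(exc)

print(list_error)
print(tuple_error)
print(len(score))
```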

Sunday, May 27, 2018

Python Deep Learning Notes - Gradients


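The gradient method

The one-dimensional derivative, carried over to two or more dimensions, is called the gradient.

The one-dimensional derivative is easier to grasp, which is why we used it first to understand the relationship among the tangent line, the slope, and the trend of f(x). Using that trend, we can approach the highest or lowest point step by step.

What is a partial derivative?

With 2 or more dimensions we need partial derivatives, differentiating with respect to one variable at a time. For example: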

f(x0, x1) = x0**2 + x1**2

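To see its shape, run meshplot-x2y2.py. (We will not go into how it is drawn here; a separate post discusses that.)

Now f(x0, x1) has two variables, so at a given point they have to be handled separately: first hold x1 fixed and differentiate with respect to x0, then hold x0 fixed and differentiate with respect to x1. (The full theory belongs in a calculus course; here we just use the result.) Suppose we are at the point (5, 6):

For the partial derivative with respect to x0, first fix x1 at 6: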

def function_tmp1(x0):
    return x0 ** 2 + 6 ** 2.0

For the partial derivative with respect to x1, first fix x0 at 5:

def function_tmp2(x1):
    return 5 ** 2 + x1 ** 2.0

Taken together, these two make up the derivative of f at the point (5, 6), which we write as the gradient (∂f/∂x0, ∂f/∂x1).
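As a quick check, the two one-variable functions above can be fed to a central-difference derivative. The helper numerical_diff() here is a minimal version written for this sketch.

```python
def numerical_diff(f, x, h=1e-4):
    # Central difference approximation of df/dx at x.
    return (f(x + h) - f(x - h)) / (2 * h)


def function_tmp1(x0):
    return x0 ** 2 + 6 ** 2.0      # x1 held fixed at 6


def function_tmp2(x1):
    return 5 ** 2 + x1 ** 2.0      # x0 held fixed at 5


d_x0 = numerical_diff(function_tmp1, 5.0)   # analytic value: 2 * 5 = 10
d_x1 = numerical_diff(function_tmp2, 6.0)   # analytic value: 2 * 6 = 12
print(d_x0, d_x1)
```

The pair (d_x0, d_x1) is the gradient of f at (5, 6), up to floating-point error.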

Writing a two-dimensional partial derivative in Python

Now that we understand the idea, how do we write it? See numerical_gradient() in gradient_2d.py:

def _numerical_gradient_no_batch(f, x):
    h = 1e-4 # 0.0001
    grad = np.zeros_like(x)
    
    for idx in range(x.size):
        tmp_val = x[idx]
        x[idx] = float(tmp_val) + h
        fxh1 = f(x) # f(x+h)
        
        x[idx] = tmp_val - h 
        fxh2 = f(x) # f(x-h)
        grad[idx] = (fxh1 - fxh2) / (2*h)
        
        x[idx] = tmp_val # restore the original value
        
    return grad

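Counting blank lines, it is only 16 lines, and it computes a 2-dimensional numerical derivative. Neat!

It takes two arguments: f is a function and x is a numpy array. The array is (x0, x1): idx == 0 is dimension 0 and idx == 1 is dimension 1. Using the earlier (5, 6) as the example, x is [x0=5, x1=6]. You can try it yourself with gradient_2d.py by typing: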

python3 -m pdb gradient_2d.py

pdb is Python's debug module. It lets you step through a program to catch bugs; it is Python's GDB. If you know GDB, it will be trivial. Even if you do not, just type what I write below.

You will see:

-> import numpy as np
(Pdb)

This means you are inside pdb. Type "b 59", then "l 58":

(Pdb) b 59
Breakpoint 1 at /..../deep-learning-from-scratch/ch04/gradient_2d.py:59
(Pdb) l 58
 53       x1 = np.arange(-2, 2.5, 0.25)
 54       X, Y = np.meshgrid(x0, x1)
 55   
 56       X = X.flatten()
 57       Y = Y.flatten()
 58   
 59 B     grad = numerical_gradient(function_2, np.array([X, Y]) )
 60   
 61       plt.figure()
 62       plt.quiver(X, Y, -grad[0], -grad[1],  angles="xy",color="#666666")#,headwidth=10,scale=40,color="#444444")
 63       plt.xlim([-2, 2])

(Pdb)

"b 59" sets a breakpoint at line 59; "l 58" just lets you confirm that line 59 really has a breakpoint.

Next, type "c" and "interact":

(Pdb) c
> /.../deep-learning-from-scratch/ch04/gradient_2d.py(59)<module>()
-> grad = numerical_gradient(function_2, np.array([X, Y]) )
(Pdb) interact
*interactive*
>>>

"c" is continue: it runs the program, which stops at line 59. "interact" enters interactive mode, where we can use the numerical_gradient() and function_2() functions:

>>> numerical_gradient(function_2, np.array([5.0,6.0]))
array([ 10.,  12.])
>>> numerical_gradient(function_2, np.array([3.0,4.0]))
array([ 6.,  8.])
>>> numerical_gradient(function_2, np.array([0.0,2.0]))
array([ 0.,  4.])

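We see the gradient of function_2 at [5, 6], [3, 4], and [0, 2]. The results for [3, 4] and [0, 2] match the answers in the book.

I used pdb because I was too lazy to copy numerical_gradient()/function_2 into a unit test; this way I can exercise the ready-made code in gradient_2d.py directly.

A question about numerical_gradient() in gradient_2d.py

In the textbook, the gradient plot is drawn with gradient_2d.py:

The plot looks nice and makes sense: longer arrows mean larger changes, and the center is the lowest point. But I have been stuck on one spot: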

    X = X.flatten()
    Y = Y.flatten()
    
    grad = numerical_gradient(function_2, np.array([X, Y]) )

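By rights, the second parameter of _numerical_gradient_no_batch, x, is an [x, y] pair; here that would be the [X, Y] pair, holding the values along the x axis and the y axis respectively.

But when numerical_gradient() calls _numerical_gradient_no_batch():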

def numerical_gradient(f, X):
    if X.ndim == 1:
        return _numerical_gradient_no_batch(f, X)
    else:
        grad = np.zeros_like(X)
        
        for idx, x in enumerate(X):
            grad[idx] = _numerical_gradient_no_batch(f, x)
        
        return grad

In the earlier section "Writing a two-dimensional partial derivative in Python" we only used

numerical_gradient(function_2, np.array([5.0,6.0]))

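which is a one-dimensional array. Its X.ndim == 1, so it takes the if branch, and X really is (x, y), one x coordinate and one y coordinate, passed into _numerical_gradient_no_batch(). No problem there! (x0 corresponds to X, x1 to Y.)

But now X and Y are not single numbers but arrays, so X.ndim is 2 and the else branch runs.

After for idx, x in enumerate(X), the calls become:

idx=0 is the original X, passed as _numerical_gradient_no_batch(function_2, X).

idx=1 is _numerical_gradient_no_batch(function_2, Y).

The if and else branches pass arguments with different meanings into _numerical_gradient_no_batch(), and this is where I got tangled up!

I was stuck here for a long time and could not work it out. I cannot keep putting things off, though, so I am recording it here and will come back to solve it when I get the chance!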
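One possible way to look at this puzzle, offered as a sketch rather than a confirmed answer: if function_2 is a sum of squares (an assumption here, since its body is not shown above), then treating the whole row X as one long point still gives 2 * X[i] for entry i, which is the same number as ∂f/∂x0 at grid point i. The experiment below compares the row-wise call with a point-by-point call on a few made-up values.

```python
import numpy as np


def function_2(x):
    # Assumption: f is the sum of squares of all coordinates.
    return np.sum(x ** 2)


def numerical_gradient_1d(f, x, h=1e-4):
    # Same scheme as _numerical_gradient_no_batch, rewritten for this sketch.
    grad = np.zeros_like(x)
    for idx in range(x.size):
        tmp_val = x[idx]
        x[idx] = tmp_val + h
        fxh1 = f(x)
        x[idx] = tmp_val - h
        fxh2 = f(x)
        grad[idx] = (fxh1 - fxh2) / (2 * h)
        x[idx] = tmp_val
    return grad


X = np.array([-2.0, 0.0, 1.0, 3.0])   # x coordinates of four grid points
Y = np.array([5.0, -1.0, 0.5, 2.0])   # y coordinates of the same points

# What the else branch does: one call per row.
row_wise = np.array([numerical_gradient_1d(function_2, X.copy()),
                     numerical_gradient_1d(function_2, Y.copy())])

# What one might expect instead: one call per (x, y) point.
point_wise = np.array([numerical_gradient_1d(function_2, np.array([x0, x1]))
                       for x0, x1 in zip(X, Y)]).T

print(row_wise)
print(point_wise)
```

Under this assumption the two agree because each term of a sum of squares depends on only one coordinate. For a function that mixes coordinates, such as x0 * x1, the row-wise call would not give the per-point gradient.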


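Saturday, May 5, 2018

How to Draw a Mesh Plot with Python matplotlib

While writing my deep learning notes I found I could not draw f(x0, x1) = x0**2 + x1**2, so I looked things up and experimented:

See meshplot-x2y2.py:

It looks like a net hanging down. The book does not give this program; I worked it out myself: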

    x0 = np.arange(-3, 3, 0.1)
    x1 = np.arange(-3, 3, 0.1)
    X, Y = np.meshgrid(x0, x1)

    ## This is OK
    Z = X**2 +Y**2 

    ## This raises an error
    # ValueError: shape mismatch: objects cannot be broadcast to a single shape
    # Z = function_2(np.array([X, Y])) 
    
    fig = plt.figure()
    ax = fig.add_subplot(111, projection='3d')  # fig.gca(projection='3d') no longer works in recent matplotlib

    surf = ax.plot_surface(X, Y, Z)
    plt.show()

In short: use np.meshgrid() to produce X and Y as the base, then apply your z = f(x, y) to them. These are the steps:

    X, Y = np.meshgrid(x0, x1)
    ...  # compute Z with your own function
    fig = plt.figure()
    ax = fig.add_subplot(111, projection='3d')
    surf = ax.plot_surface(X, Y, Z)

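And that is all it takes.

I ran into one problem: the X, Y produced by np.meshgrid(x0, x1) cannot be used with the function_2 we already wrote, although Z = X**2 + Y**2 works. This is of course because of how function_2 is written; I will look into a better solution when I have time!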
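One way around this, sketched under the assumption that the goal is simply a Z with the same shape as X and Y: write the function so that it takes the two coordinate grids separately, or sum over the first axis of the stacked array. The name function_2_grid is made up for this example, and plotting is left out so the snippet runs without a display.

```python
import numpy as np


def function_2_grid(x0, x1):
    # Element-wise on whole grids, so the result keeps the grid's shape.
    return x0 ** 2 + x1 ** 2


x0 = np.arange(-3, 3, 0.1)
x1 = np.arange(-3, 3, 0.1)
X, Y = np.meshgrid(x0, x1)

Z = function_2_grid(X, Y)

# Alternative: stack the grids and sum the squares over axis 0 only.
Z_stacked = np.sum(np.array([X, Y]) ** 2, axis=0)

print(X.shape, Z.shape, Z_stacked.shape)
```

Either Z can then be passed to ax.plot_surface(X, Y, Z), since plot_surface expects all three arrays to share one shape.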

Thursday, May 3, 2018

Python Deep Learning Notes - Derivatives and Finding a Minimum


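The idea of a derivative

The basic idea is simple. Given a function y = f(x), if x changes by a tiny amount, written dx, then y changes by a tiny amount along with it, written dy, as in this figure:

y = f(x) is a function, and dy/dx is also a function. How are the two related? Take an example: y = x**2, a very simple piece of calculus. First we write it down: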

def func_y(x):
     return x**2.0

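Next, how do we express the derivative? In mathematical theory, differentiation has to deal with continuity and the infinitely small. Its answer is called analytic and is exactly correct. For example, d(x**2)/dx = 2x is an analytic result. At x = 0.5, dy/dx is 2*0.5 = 1; at x = 0, dy/dx is 2*0 = 0.

The idea of numerical differentiation

A computer cannot do that kind of abstract reasoning, so mathematicians and computer scientists developed another way to handle derivatives: numerical differentiation. Let us try it in Python.

In Python we approximate with a very small number, for example 1e-4, which is 10 to the power of -4: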

def numerical_diff(f, x):
      h = 1e-4
      d =  (f(x+h) - f(x-h))/(2.0*h)
      return d

Putting it together:

#!/usr/bin/python3
def func_y(x):
    return x**2.0

def numerical_diff(f, x):
    h = 1e-4
    d =  (f(x+h) - f(x-h))/(2.0*h)
    return d

print(numerical_diff(func_y, 0.5))
print(numerical_diff(func_y, 0))

The output is:

0.9999999999998899
0.0

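The differences from 1 and 0 are vanishingly small. This approach is called numerical differentiation.

Tangent line and slope

For y = f(x), the derivative at the point x = 5 is the slope at that point. We can draw a line with that slope through the point; that line is the tangent line. Running ch04/gradient_1d.py from the book gives this output:

In a short 31 lines, this program plots a function and its tangent line. Here is its second half: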

def tangent_line(f, x):
    d = numerical_diff(f, x)
    print(d)
    y = f(x) - d*x
    return lambda t: d*t + y
     
x = np.arange(0.0, 20.0, 0.1)
y = function_1(x)
plt.xlabel("x")
plt.ylabel("f(x)")

tf = tangent_line(function_1, 5)
y2 = tf(x)

plt.plot(x, y)
plt.plot(x, y2)
plt.show()

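The function tangent_line(f, x) is elegantly written: in 4 lines it produces the tangent-line function of f at the point x. It relies on a little algebra and on lambda, a way of "creating functions".

After that it is just the powerful plotting features of matplotlib.pylab.

Gradient descent

Let us tweak the function slightly, as in the program gradient_x2.py:

I had it draw three tangent lines: at x=5 the slope is positive, at x=0 the slope is exactly 0, and at x=-5 the slope is negative. If we pick an arbitrary x and want to find the minimum of f(x), which is at x = 0, how do we proceed?

We can think of it this way: where the slope is positive, move x to the left; where the slope is negative, move x to the right. If there is a minimum, the slope gets closer and closer to 0 as we approach it, and that is the lowest point. This method is called gradient descent.

As a formula:

x = x - lr * df/dx

This is the next step x takes each time. When the slope is positive, df/dx > 0 and -lr*df/dx is negative, so x moves left. When the slope is negative, df/dx < 0 and -lr*df/dx is positive, so x moves right. lr is a number we have to choose with care: too large or too small, and convergence suffers. Using the earlier y=0.01*x**2 as the example, see gradient_1_descent.py: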

def gradient_descent(f, init_x, lr=0.1, step_num=10000):
    x = init_x

    for i in range(step_num):
        grad = numerical_diff(f, x)
        x -= lr * grad
    return x, f(x)


x1, y1 = gradient_descent(function_1, 10)
print(x1, y1)
x1, y1 = gradient_descent(function_1, -10)
print(x1, y1)

The result is:

2.0202860902402727e-08 4.0815558864183276e-18
-2.0202860902402727e-08 4.0815558864183276e-18

The two differ, but both are close to 0, and the difference is so small that it can be ignored.
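To see why lr has to be chosen with care, here is a small experiment on the same y = 0.01*x**2. The three learning rates and the step count of 20 are arbitrary values picked for illustration.

```python
def numerical_diff(f, x, h=1e-4):
    return (f(x + h) - f(x - h)) / (2.0 * h)


def function_1(x):
    return 0.01 * x ** 2


def gradient_descent(f, init_x, lr, step_num):
    x = init_x
    for _ in range(step_num):
        x -= lr * numerical_diff(f, x)
    return x


# Same start, same number of steps, three different learning rates.
x_small = gradient_descent(function_1, 10.0, lr=1e-4, step_num=20)   # too small
x_good = gradient_descent(function_1, 10.0, lr=10.0, step_num=20)    # reasonable
x_large = gradient_descent(function_1, 10.0, lr=150.0, step_num=20)  # too large

print(x_small, x_good, x_large)
```

Each step multiplies x by (1 - 0.02 * lr). With lr = 1e-4 that factor is almost 1, so x barely moves; with lr = 10 it is 0.8, so x shrinks steadily; with lr = 150 it is -2, so x overshoots and doubles in size at every step.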
