Learning XOR - chainerで自然言語処理できるかマン

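I upgraded chainer to version 1.6.1, so while working through the Tutorial I wrote a multi-layer perceptron that learns XOR.
Depending on the initial values (W in L.Linear is random), training seems to fall into a local minimum easily, and the loss often does not get small enough...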

Code

#encoding: utf-8
#
# Copyright (c) 2016 chainer_nlp_man
#
# This software is released under the MIT License.
# http://opensource.org/licenses/mit-license.php
#
import numpy as np
import chainer
from chainer import cuda, Function, gradient_check, Variable, optimizers, serializers, utils
from chainer import Link, Chain, ChainList
import chainer.functions as F
import chainer.links as L

# 2 inputs, 2 outputs
# [when nobias_flag is True]
#  x h y
# -o-o-o-
#   x x
# -o-o-o-
# (when False, one bias node each is added to x and h)
class MLP(Chain):
    def __init__(self, nobias_flag):
        super(MLP, self).__init__(
            l1 = L.Linear(2,2,nobias=nobias_flag),
            l2 = L.Linear(2,2,nobias=nobias_flag),
        )
        self.nobias_flag = nobias_flag

    def __call__(self, x):
        h = F.sigmoid(self.l1(x))
        y = self.l2(h)
        return y

    def dump(self):
        print(self.l1.W.data)
        if not self.nobias_flag:
            print(self.l1.b.data)
        print(self.l2.W.data)
        if not self.nobias_flag:
            print(self.l2.b.data)

class Classifier(Chain):
    def __init__(self,predictor):
        super(Classifier, self).__init__(
            predictor = predictor
        )

    def __call__(self, x, t):
        y = self.predictor(x)
        self.loss = F.softmax_cross_entropy(y,t)
        self.accuracy = F.accuracy(y,t)
        return self.loss


# Set up the model
model = Classifier(MLP(False))
optimizer = optimizers.Adam()
optimizer.setup(model)

# Training loop
loss_value = 100000
cnt = 0
while loss_value > 1e-5:
    # Training data
    x = Variable(np.array([[0,0],[1,0],[0,1],[1,1]], dtype=np.float32))
    t = Variable(np.array([0,1,1,0], dtype=np.int32))

    # One training step
    model.zerograds()
    loss = model(x,t)
    loss_value = loss.data
    loss.backward()
    optimizer.update()

    cnt += 1
    if cnt%1000 == 0:
        # Print intermediate results
        y = F.softmax(model.predictor(x))

        print("=====iter = {0}, loss = {1}=====".format(cnt, loss_value))
        print("---output value---")
        print(y.data)
        print("---result---")
        print(y.data.argmax(1))
        print("---dump---")
        model.predictor.dump()


# Save the model file
serializers.save_npz('my_xor.model', model)
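
The `while loss_value > 1e-5` loop above has no upper bound on iterations, so a run that stalls never reaches the save step. Below is a minimal sketch of the same stopping rule with an iteration cap. `train_with_cap`, `step`, and `stalled_step` are hypothetical names for illustration and are not part of chainer; in a real script `step` would wrap the zerograds / forward / backward / update sequence and return the loss as a float.

```python
def train_with_cap(step, tol=1e-5, max_iter=100000):
    """Call step() until its loss is <= tol or max_iter calls have been made.

    step: hypothetical zero-argument callable that runs one update and
          returns the current loss.
    Returns (converged, iterations, last_loss).
    """
    loss = float("inf")
    for it in range(1, max_iter + 1):
        loss = float(step())
        if loss <= tol:
            return True, it, loss
    return False, max_iter, loss


# Stand-in for a real training step: the loss is stuck at a constant value.
def stalled_step():
    return 0.3465735912322998


converged, iterations, last_loss = train_with_cap(stalled_step, max_iter=1000)
print(converged, iterations, last_loss)
```

With a cap like this, a stalled run returns `converged == False` and the script can decide whether to retry with new initial weights or save the model as it is.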

Run

$ python xor.py

Results

Of the W and b that L.Linear holds, the initial value of W is set randomly by default, so I ran the script several times.
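
To compare runs it helps to be able to reproduce a given starting point. The following is only a NumPy sketch of seeded initialization, not chainer code: `init_linear_w` is a hypothetical helper, and the Gaussian with standard deviation `sqrt(1 / in_size)` is my assumption about what a default initializer might look like, not something taken from this post.

```python
import numpy as np


def init_linear_w(in_size, out_size, seed):
    # Hypothetical helper: draw an (out_size, in_size) weight matrix from a
    # Gaussian with std sqrt(1 / in_size), using a generator fixed by `seed`.
    rng = np.random.RandomState(seed)
    w = rng.normal(0.0, np.sqrt(1.0 / in_size), (out_size, in_size))
    return w.astype(np.float32)


w_a = init_linear_w(2, 2, seed=0)
w_b = init_linear_w(2, 2, seed=0)
w_c = init_linear_w(2, 2, seed=1)

print(np.array_equal(w_a, w_b))  # same seed: identical starting weights
print(np.array_equal(w_a, w_c))  # different seed: different starting weights
```

If L.Linear draws W from NumPy's global generator (I have not confirmed this for 1.6.1), calling `np.random.seed(...)` before building the model should have a similar effect.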

A case that works

When the initial values land in a good spot, training seems to head toward the optimum and the loss approaches 0.

=====iter = 24000, loss = 1.2278556823730469e-05=====
---output value---
[[  9.99987483e-01   1.25582164e-05]
 [  9.03956334e-06   9.99990940e-01]
 [  1.53962192e-05   9.99984622e-01]
 [  9.99987483e-01   1.25290881e-05]]
---result---
[0 1 1 0]
---dump---
[[ 9.62363434 -9.92488861]
 [-9.60831738  9.82478237]]
[-5.2305131  -5.20280075]
[[-11.67700863 -10.53997993]
 [ 11.75784016  12.29858112]]
[ 5.6855135  -5.84928703]
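
As a sanity check, the dumped parameters can be pushed through the network by hand. This is a NumPy sketch that assumes L.Linear computes `x.dot(W.T) + b` with W of shape (out, in); the `sigmoid` and `softmax` helpers are written here for illustration, and the numbers are copied from the dump above.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def softmax(z):
    # Subtract the row-wise max for numerical stability.
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)


# Parameters copied from the dump (l1.W, l1.b, l2.W, l2.b).
W1 = np.array([[9.62363434, -9.92488861], [-9.60831738, 9.82478237]])
b1 = np.array([-5.2305131, -5.20280075])
W2 = np.array([[-11.67700863, -10.53997993], [11.75784016, 12.29858112]])
b2 = np.array([5.6855135, -5.84928703])

x = np.array([[0, 0], [1, 0], [0, 1], [1, 1]], dtype=np.float64)

h = sigmoid(x.dot(W1.T) + b1)       # hidden activations
probs = softmax(h.dot(W2.T) + b2)   # class probabilities

print(h.round(3))
print(probs.argmax(axis=1))
```

The hidden layer shows how this solution works: one hidden unit is close to 1 only for input (1,0), the other only for input (0,1), and both stay near 0 for (0,0) and (1,1).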

A case that does not work

Training often gets stuck in a local minimum. (The loss never drops below the 1e-5 threshold, so the loop never finishes.)

=====iter = 84000, loss = 0.3465735912322998=====
---output value---
[[  1.00000000e+00   3.97014652e-08]
 [  4.53840165e-08   1.00000000e+00]
 [  5.00000417e-01   4.99999613e-01]
 [  5.00000417e-01   4.99999613e-01]]
---result---
[0 1 0 0]
---dump---
[[ -9.91501236  21.46763802]
 [ 10.34690571  21.0509758 ]]
[ 5.1153326  -4.40958834]
[[ 8.80068302 -9.16878891]
 [-8.29266644  8.18357563]]
[ 0.0821298  -0.17688689]
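
The stalled loss value is not arbitrary. Assuming F.softmax_cross_entropy averages the negative log-likelihood over the batch, two inputs classified with certainty and two left at 0.5/0.5 give exactly ln(2)/2. A small NumPy check, using the output values above rounded to 1, 0, and 0.5:

```python
import numpy as np

# Rounded softmax outputs of the stuck run: rows 1-2 are confident and
# correct, rows 3-4 sit at 0.5 / 0.5.
probs = np.array([[1.0, 0.0],
                  [0.0, 1.0],
                  [0.5, 0.5],
                  [0.5, 0.5]])
t = np.array([0, 1, 1, 0])

# Mean negative log-likelihood of the correct class.
loss = -np.log(probs[np.arange(len(t)), t]).mean()

print(loss)           # about 0.3466
print(np.log(2) / 2)  # same value
```

So the network has settled on an answer for (0,0) and (1,0) but cannot tell (0,1) and (1,1) apart, and the loss sits at about 0.3466 no matter how long the loop runs.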