Generating sentences with a model trained on PTB
This post generates sentences using rnnlm.model, the trained model file produced in the earlier post "example\ptbを読む - chainerで自然言語処理できるかマン" (Reading example\ptb).
Preparation
Place the following files in the same directory.
- ptb.train.txt
- ptb.test.txt
- ptb.valid.txt
- rnnlm.model
- net.py
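The three text files are not used for training here. The script needs them because it rebuilds the word-to-ID mapping by reading them in the same order as train_ptb.py, so that each ID matches the one the model was trained with. A minimal sketch of that mapping on toy strings follows; `build_vocab` and the sample texts are hypothetical names for illustration, and the texts assume PTB-style lines with spaces around each newline.

```python
def build_vocab(texts):
    """Assign IDs to words in order of first appearance across all texts."""
    vocab2id = {}
    id2vocab = {}
    for text in texts:
        # Newlines become an explicit end-of-sentence token, as in the script.
        words = text.replace('\n', '<eos>').strip().split()
        for word in words:
            if word not in vocab2id:
                vocab2id[word] = len(vocab2id)
                id2vocab[vocab2id[word]] = word
    return vocab2id, id2vocab


# Toy stand-ins for the train/valid/test files (hypothetical content).
train_text = " the cat sat \n the dog sat \n"
valid_text = " a cat \n"

vocab2id, id2vocab = build_vocab([train_text, valid_text])
print(vocab2id)
```

Because IDs depend only on the order of first appearance, reading the files in a different order, or skipping one, could shift the IDs and make the loaded model's embeddings point at the wrong words.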
Code
# encoding: utf-8
#
# Copyright (c) 2016 chainer_nlp_man
#
# This software is released under the MIT License.
# http://opensource.org/licenses/mit-license.php
#
import argparse
import bisect
import random

import numpy as np

import chainer
import chainer.links as L
import chainer.functions as F
from chainer import serializers

import net

# Take the model file as a command-line argument
parser = argparse.ArgumentParser()
parser.add_argument('--model', '-m', default='',
                    help='the model from given file')
args = parser.parse_args()

# Word <-> ID conversion tables
vocab2id = {}
id2vocab = {}


# Read the data the same way train_ptb.py does, so that the
# word/ID pairs match the ones used during training
def load_data(filename):
    with open(filename) as f:
        words = f.read().replace('\n', '<eos>').strip().split()
    for word in words:
        if word not in vocab2id:
            vocab2id[word] = len(vocab2id)
            id2vocab[vocab2id[word]] = word


load_data('ptb.train.txt')
load_data('ptb.valid.txt')
load_data('ptb.test.txt')

# Use the same settings as train_ptb.py
n_units = 650
lm = net.RNNLM(len(vocab2id), n_units, False)
model = L.Classifier(lm)

# Load the trained model data
serializers.load_hdf5(args.model, model)

# Generate some sentences by sampling
for i in range(0, 10):
    print(i + 1, end=": ")
    # Reset the model state before each sentence
    model.predictor.reset_state()
    word = "<eos>"
    while True:
        # Prepare the input to the RNNLM. int32 is given explicitly because
        # NumPy's default integer is int64 on many platforms, which older
        # Chainer versions reject for EmbedID.
        x = chainer.Variable(np.array([vocab2id[word]], dtype=np.int32))
        # Get the softmax of the RNNLM output
        y = F.softmax(model.predictor(x))
        # Treat the output as per-word probabilities and sample the next word
        y_accum = np.add.accumulate(y.data[0])
        r = random.random()
        # Clamp the index: rounding can leave the cumulative sum slightly below r
        word = id2vocab[min(bisect.bisect(y_accum, r), len(y_accum) - 1)]
        # Stop at the end of the sentence
        if word == "<eos>":
            print(".")
            break
        else:
            print(word, end=" ")
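The sampling step in the script draws the next word by accumulating the softmax probabilities and locating a uniform random number in that cumulative array. A minimal, self-contained sketch of that idea without Chainer is shown here; `sample_index` and the toy probabilities are hypothetical, and the rescaling by the final cumulative value plus the index clamp are added assumptions to guard against floating-point rounding.

```python
import bisect
import random

import numpy as np


def sample_index(probs, rng=random):
    """Draw index i with probability proportional to probs[i]."""
    cumulative = np.add.accumulate(probs)
    # Scale r by the total so a sum slightly off 1.0 is still handled.
    r = rng.random() * cumulative[-1]
    idx = bisect.bisect(cumulative, r)
    # Clamp in case rounding pushes the index past the last entry.
    return min(idx, len(probs) - 1)


random.seed(0)
probs = np.array([0.1, 0.6, 0.3])  # toy distribution over three "words"
counts = [0, 0, 0]
for _ in range(10000):
    counts[sample_index(probs)] += 1
print(counts)  # counts should be roughly proportional to probs
```

Sampling in this way, rather than always taking the most probable word, is what makes each of the ten generated sentences different.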
Run
$ python gen_sentence.py -m rnnlm.model
Results
1: mr. burton said certificates of annuity and acquisitions received by the government to be submitted on a <unk> basis by painewebber inc. will have just been involved in the forest business and private incentives but declined to disclose what full licensing only .
2: <unk> white house of duff & trecker said the new trading company 's cash flow has been reduced by $ N million for the sale of those shares under the agreement .
3: the fund called cholesterol .
4: gary l. <unk> head of the only reinsurance department 's office raised his <unk> account title to revise washington motor co. cleveland securities .
5: short interest in shares of high-yield high-risk junk bonds moved up N last week mostly as <unk> from boston 's N N N high over $ N billion .
6: the cut would focus on <unk> loans designed to distribute <unk> even to the ldp earlier this year .
7: but top trial activist david <unk> d. ore. cited the unprecedented factors of operating in damages of gifts from fashionable banks that be undervalued in <unk> division .
8: the fda 's successor will become married artistic efforts by a <unk> in the press as well as the quality of the reasons .
9: the house which will help lay <unk> out steppenwolf 's <unk> operations once became a direct <unk> to international environmental protection although it sold the $ N million procter & gamble co. buddy <unk> thompson operations and rjr nabisco inc. in a fraudulent interview .
10: a number of agencies not <unk> legal corruption and lawyers at new york have <unk> the government 's <unk> plea into l. <unk> .
The generated output looks fairly sentence-like.