Data source
This example uses the IMDB dataset from the Internet Movie Database, which contains the text of 50,000 movie reviews.
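Data processing
The dataset is split into 25,000 reviews for training and 25,000 for testing. Both sets are balanced: each contains equal numbers of positive and negative reviews.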
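Because the two classes are equally represented, a classifier that always predicts the same class scores exactly 50% accuracy, so 50% is the baseline any model has to beat. A minimal sketch of that check follows. The label list is a small made-up example rather than the real dataset, and `majority_baseline` is a hypothetical helper name.

```python
from collections import Counter


def majority_baseline(labels):
    """Return the accuracy of always predicting the most common label."""
    counts = Counter(labels)
    most_common_count = counts.most_common(1)[0][1]
    return most_common_count / len(labels)


# Hypothetical toy labels: 1 = positive review, 0 = negative review.
toy_labels = [1, 0, 1, 0, 1, 0, 1, 0]

print(Counter(toy_labels))            # per-class counts
print(majority_baseline(toy_labels))  # 0.5 for a balanced set
```

On an unbalanced set the same helper returns a higher number, which is why accuracy alone is easier to interpret when the classes are balanced.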
Downloading the data
Building the model
Compiling and training the model
Evaluating the model
Complete code
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Time: 2020/1/2 7:42
# @Author: Martin
# @File: Film_Reviews_Classification.py
# @Software:PyCharm
from __future__ import absolute_import, division, print_function, unicode_literals
import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_datasets as tfds
import os
os.environ["TF_CPP_MIN_LOG_LEVEL"] = '3'
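# Download the IMDB dataset; split the training set 60/40 into training and validation.
# Note: tfds.Split.TRAIN.subsplit([6, 4]) is not available in recent
# tensorflow_datasets releases; the slicing strings below express the same 60/40 split.
train_data, validation_data, test_data = tfds.load(
    name="imdb_reviews",
    split=('train[:60%]', 'train[60%:]', 'test'),
    as_supervised=True
)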
# Build the model
embedding = "https://hub.tensorflow.google.cn/google/tf2-preview/gnews-swivel-20dim/1"
hub_layer = hub.KerasLayer(embedding, input_shape=[], dtype=tf.string, trainable=True)
model = tf.keras.Sequential()
model.add(hub_layer)
model.add(tf.keras.layers.Dense(16, activation='relu'))
model.add(tf.keras.layers.Dense(1, activation='sigmoid'))
# Compile the model
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])
# Train the model
history = model.fit(train_data.shuffle(10000).batch(512),
                    epochs=20,
                    validation_data=validation_data.batch(512),
                    verbose=1)
# Evaluate the model
results = model.evaluate(test_data.batch(512), verbose=2)
for name, value in zip(model.metrics_names, results):
    print("%s: %.3f" % (name, value))
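The script splits the 25,000 training reviews 6:4 and feeds every dataset in batches of 512, which fixes how many steps each pass takes. The arithmetic can be sketched as below. It assumes the 6:4 weighting yields exactly 60% and 40% of the training set, and `batches_per_epoch` is a hypothetical helper name.

```python
import math


def batches_per_epoch(num_examples, batch_size):
    """Batches needed to cover num_examples once (the last batch may be smaller)."""
    return math.ceil(num_examples / batch_size)


TRAIN_TOTAL = 25_000  # size of the IMDB training split, per the text above
TEST_TOTAL = 25_000   # size of the IMDB test split
BATCH_SIZE = 512      # value passed to .batch() in the script

# Assumed exact 60/40 split of the training set.
train_examples = TRAIN_TOTAL * 6 // 10              # 15000
validation_examples = TRAIN_TOTAL - train_examples  # 10000

print(batches_per_epoch(train_examples, BATCH_SIZE))       # 30
print(batches_per_epoch(validation_examples, BATCH_SIZE))  # 20
print(batches_per_epoch(TEST_TOTAL, BATCH_SIZE))           # 49
```

The 30 training steps agree with the `30/30` shown on the end-of-epoch lines of the training log.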
Final results
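To read the loss column in the log, it helps to recall what `binary_crossentropy` measures. The block below is a plain-Python sketch of the standard formula, not the Keras implementation. The `eps` clipping value and the toy labels and predictions are assumptions made for illustration.

```python
import math


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy over paired labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


labels = [1, 0, 1, 0]

# An uninformative model that always outputs 0.5 scores ln(2), about 0.693.
print(round(binary_crossentropy(labels, [0.5, 0.5, 0.5, 0.5]), 3))  # 0.693

# Confident, correct predictions drive the loss toward 0.
print(round(binary_crossentropy(labels, [0.9, 0.1, 0.9, 0.1]), 3))  # 0.105
```

Seen this way, a loss above roughly 0.69 together with accuracy near 0.5, as in the first epoch, means the model has not yet improved on guessing.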
Epoch 1/20
30/30 [==============================] - 6s 207ms/step - loss: 0.7231 - accuracy: 0.5374 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Epoch 2/20
30/30 [==============================] - 5s 173ms/step - loss: 0.6466 - accuracy: 0.6398 - val_loss: 0.6144 - val_accuracy: 0.6696
Epoch 3/20
30/30 [==============================] - 5s 171ms/step - loss: 0.5923 - accuracy: 0.6983 - val_loss: 0.5705 - val_accuracy: 0.7160
Epoch 4/20
30/30 [==============================] - 6s 200ms/step - loss: 0.5450 - accuracy: 0.7421 - val_loss: 0.5294 - val_accuracy: 0.7485
Epoch 5/20
1/20 [>.............................] - ETA: 15s - loss: 0.5316 - accuracy: 0.7480
2/20 [==>...........................] - ETA: 8s - loss: 0.5228 - accuracy: 0.7588
3/20 [===>..........................] - ETA: 6s - loss: 0.5195 - accuracy: 0.7572
4/20 [=====>........................] - ETA: 4s - loss: 0.5217 - accuracy: 0.7515
5/20 [======>