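Sentiment analysis, also known as orientation analysis, covers a range of tasks such as opinion extraction, opinion mining, sentiment mining, and subjectivity analysis. It is the process of analyzing, processing, summarizing, and reasoning about subjective text that carries emotional coloring, for example working out from review text how users feel about attributes of a digital camera such as zoom, price, size, weight, flash, and ease of use. It is an important topic in NLP.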
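What is sentiment classification used for?
Item quality analysis: judging from reviews whether an item is good or bad, for example whether a movie is worth watching.
Item attribute analysis: for example the comfort, fuel consumption, and handling of several cars in a given price range.
Product feedback analysis: which features users like most and which they complain about most.
Public opinion analysis: for example analyzing online reaction to the Meituan food delivery halal controversy.
Financial trend analysis: for example, in May 2012 Derwent Capital Markets, the world's first hedge fund based on social media, went live. It tracks public sentiment on Twitter in real time to guide investment.
In short: sentiment classification is useful at every scale, from individual items and products on a platform up to financial events. With the current wave of data and artificial intelligence, this field will play an increasingly important role.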
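#!/usr/bin/env python3
# -*- coding:utf-8 -*-
# Author: Hou-Hou
# File: createdata.py (imported by the training script below)
import numpy as np


class DataIterator:
    """Yield shuffled mini-batches from two parallel datasets."""

    def __init__(self, data1, data2, batch_size):
        self.data1 = data1
        self.data2 = data2
        self.batch_size = batch_size
        self.iter = self.make_random_iter()  # iterator over batches of dataset indices

    def next_batch(self):
        try:
            idxs = next(self.iter)  # call next() on the iterator to get the next batch of indices
        except StopIteration:
            # Epoch exhausted: reshuffle and start again
            self.iter = self.make_random_iter()
            idxs = next(self.iter)
        X = np.array([self.data1[i] for i in idxs])
        Y = np.array([self.data2[i] for i in idxs])
        return X, Y

    def make_random_iter(self):
        """Split a random permutation of the indices into chunks of batch_size."""
        # Cut points: batch_size, 2*batch_size, ... (start, stop, step)
        splits = np.arange(self.batch_size, len(self.data1), self.batch_size)
        # np.random.permutation gives a shuffled index sequence; np.split cuts it
        # at the positions in `splits`. [:-1] drops the final chunk, which may be
        # shorter than batch_size.
        it = np.split(np.random.permutation(len(self.data1)), splits)[:-1]
        return iter(it)  # iter() turns the list of index arrays into an iterator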
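The shuffle-and-split step in make_random_iter can be tried in isolation. The following is a minimal sketch, not part of the original script: the helper name random_batches and the toy sizes are assumptions chosen for illustration, and only NumPy calls that the class itself relies on (permutation, arange, split) are used.

```python
import numpy as np


def random_batches(n_items, batch_size, rng):
    """Return a list of index arrays, each of length batch_size.

    Hypothetical helper mirroring DataIterator.make_random_iter.
    """
    order = rng.permutation(n_items)                      # shuffled indices 0..n_items-1
    cut_points = np.arange(batch_size, n_items, batch_size)
    # np.split cuts `order` at every cut point; the final chunk is discarded
    return np.split(order, cut_points)[:-1]


rng = np.random.default_rng(0)
batches = random_batches(n_items=10, batch_size=4, rng=rng)
print([b.tolist() for b in batches])
```

With 10 items and a batch size of 4 this yields two batches of four indices, and the two leftover indices are skipped for that epoch. One consequence of the `[:-1]` slice worth knowing: when the dataset size is an exact multiple of the batch size, the discarded final chunk is itself a full batch.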
#!/usr/bin/env python3
# -*- coding:utf-8 -*-
# Author: Hou-Hou
# Written against the TensorFlow 1.x graph API (tf.placeholder, tf.Session).
from keras.datasets import imdb
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import tensorflow as tf

import createdata

(X_train, y_train), (X_test, y_test) = imdb.load_data(
    num_words=None, skip_top=0, maxlen=None, seed=113,
    start_char=1, oov_char=2, index_from=3)

# Plot the review lengths to help choose a sequence length
a = [len(x) for x in X_train]
plt.plot(a)
plt.show()

# Vocabulary size: largest word index + 1, so every index is a valid embedding row
vocabulary = max(max(seq) for seq in X_train) + 1

# Bring every sequence to a fixed length
max_length = 200
x_filter = []
y_filter = []
for i in range(len(X_train)):
    seq = list(X_train[i])[:max_length]          # truncate long reviews
    seq = seq + [0] * (max_length - len(seq))    # right-pad short reviews with 0
    x_filter.append(seq)
    y_filter.append(y_train[i])

# Hyperparameters
embedding_size = 100
n_hidden = 200          # number of LSTM hidden units
learning_rate = 0.06
training_iters = 100000
batch_size = 32
beta = 0.0001           # weight of the L2 penalty on Ws2

# Other parameters
n_steps = max_length    # time steps (words per review)
n_classes = 2           # 0/1: binary classification for negative and positive reviews
da = 350                # hyperparameter: the self-attention MLP has a hidden layer with da units
r = 30                  # number of different parts extracted from a sentence (= rows of the matrix embedding)
display_step = 10
hidden_units = 3000

y_train = np.asarray(pd.get_dummies(y_filter), dtype=np.int32)  # one-hot labels
X_train = np.asarray(x_filter, dtype=np.int32)

# Directory for the TensorBoard logs
logs_path = './recent_logs/'

# Initialise weights and biases
with tf.name_scope("weights"):
    # Uniform initialisation
    Win = tf.Variable(tf.random_uniform([n_hidden * r, hidden_units], -1 / np.sqrt(n_hidden), 1 / np.sqrt(n_hidden)), name='W_input')
    Wout = tf.Variable(tf.random_uniform([hidden_units, n_classes], -1 / np.sqrt(n_hidden), 1 / np.sqrt(n_hidden)), name='W-out')
    Ws1 = tf.Variable(tf.random_uniform([da, n_hidden], -1 / np.sqrt(da), 1 / np.sqrt(da)), name='Ws1')
    Ws2 = tf.Variable(tf.random_uniform([r, da], -1 / np.sqrt(r), 1 / np.sqrt(r)), name='Ws2')

with tf.name_scope("biases"):
    biasesin = tf.Variable(tf.random_normal([hidden_units]), name='biases-in')
    biasesout = tf.Variable(tf.random_normal([n_classes]), name='biases-out')

# Input and output placeholders
with tf.name_scope('input'):
    x = tf.placeholder(tf.int32, [batch_size, max_length], name='x-input')
    y = tf.placeholder(tf.int32, [batch_size, n_classes], name='y-input')
    # Dropout keep probability; defaults to 1.0 (no dropout) unless fed
    keep_prob = tf.placeholder_with_default(1.0, shape=[], name='keep_prob')

# Embedding vectors
with tf.name_scope('embedding'):
    embeddings = tf.Variable(tf.random_uniform([vocabulary, embedding_size], -1, 1), name='embeddings')
    embed = tf.nn.embedding_lookup(embeddings, x)  # map word ids to their vector representation


def length(token_ids):
    """Number of non-padding tokens (id != 0) in each sequence."""
    used = tf.sign(token_ids)  # 1 for real tokens, 0 for padding
    return tf.cast(tf.reduce_sum(used, axis=1), tf.int32)


with tf.variable_scope('forward'):
    lstm_fw_cell = tf.nn.rnn_cell.BasicLSTMCell(n_hidden)

with tf.name_scope('model'):
    # outputs: [batch_size, n_steps, n_hidden]
    outputs, states = tf.nn.dynamic_rnn(lstm_fw_cell, embed, sequence_length=length(x), dtype=tf.float32, time_major=False)
    # Multiply every hidden state by Ws1 after flattening batch and time
    flat = tf.reshape(outputs, [batch_size * n_steps, n_hidden])
    h = tf.nn.tanh(tf.matmul(flat, Ws1, transpose_b=True))                # [batch_size*n_steps, da]
    # Multiply by Ws2 to get r attention scores per time step
    a = tf.matmul(h, Ws2, transpose_b=True)                               # [batch_size*n_steps, r]
    a = tf.transpose(tf.reshape(a, [batch_size, n_steps, r]), [0, 2, 1])  # [batch_size, r, n_steps]
    # Softmax over the time axis so that each attention row sums to 1
    h3 = tf.nn.softmax(a)

with tf.name_scope('flattening'):
    # Batched product of the attention weights with the hidden states: [batch_size, r, n_hidden]
    h4 = tf.matmul(h3, outputs)
    # Flatten the sentence embedding matrix
    last = tf.reshape(h4, [-1, r * n_hidden])

with tf.name_scope('MLP'):
    last = tf.nn.dropout(last, keep_prob)
    pred1 = tf.nn.sigmoid(tf.matmul(last, Win) + biasesin)
    pred = tf.matmul(pred1, Wout) + biasesout

# Define loss and optimizer
with tf.name_scope('cross'):
    cost = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=tf.cast(y, tf.float32))
        + beta * tf.nn.l2_loss(Ws2))

with tf.name_scope('train'):
    optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
    gvs = optimizer.compute_gradients(cost)
    # tf.clip_by_norm caps the norm of each gradient to guard against exploding gradients
    capped_gvs = [(tf.clip_by_norm(grad, 0.5), var) for grad, var in gvs if grad is not None]
    optimized = optimizer.apply_gradients(capped_gvs)

# Evaluate model
with tf.name_scope('Accuracy'):
    correct_pred = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))  # cast booleans to float before averaging

# Visualisation in TensorFlow relies on TensorBoard, tf.summary, and tf.summary.FileWriter working together
tf.summary.scalar("cost", cost)
tf.summary.scalar("accuracy", accuracy)
# Merge all summaries into a single "summary operation" which we can execute in a session
summary_op = tf.summary.merge_all()

# Batch iterator and variable initialiser
train_iter = createdata.DataIterator(X_train, y_train, batch_size)
init = tf.global_variables_initializer()

# TensorBoard may warn if the port it needs is already in use;
# rerunning the command or releasing the port first should resolve it.

# Train the model
with tf.Session() as sess:
    sess.run(init)
    # Create the log file writer
    writer = tf.summary.FileWriter(logs_path, graph=tf.get_default_graph())
    step = 1
    # Keep training until the maximum number of iterations is reached
    while step * batch_size < training_iters:
        batch_x, batch_y = train_iter.next_batch()
        sess.run(optimized, feed_dict={x: batch_x, y: batch_y, keep_prob: 0.5})
        # Execute the summary operation in the session
        summary = sess.run(summary_op, feed_dict={x: batch_x, y: batch_y})
        # Write the values to the log file with the FileWriter created above
        writer.add_summary(summary, step * batch_size)
        if step % display_step == 2:
            # Batch accuracy and batch loss (dropout disabled)
            acc, loss = sess.run([accuracy, cost], feed_dict={x: batch_x, y: batch_y})
            print("Iter " + str(step * batch_size) +
                  ", Minibatch Loss= " + "{:.6f}".format(loss) +
                  ", Training Accuracy= " + "{:.2f}".format(acc * 100) + "%")
        step += 1
    print("Optimization Finished!")

# tensorboard --logdir=./
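The attention step in the model scope is easier to follow with the shapes written out. The sketch below restates it in plain NumPy as A = softmax(Ws2 · tanh(Ws1 · Hᵀ)) followed by M = A · H, where H holds the LSTM outputs. The names Ws1, Ws2, da, and r match the script; the function name structured_self_attention, the softmax helper, and the tiny dimensions are assumptions made for illustration, and this is not a drop-in replacement for the TensorFlow graph.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax along one axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def structured_self_attention(H, Ws1, Ws2):
    """H: [batch, steps, hidden], Ws1: [da, hidden], Ws2: [r, da].

    Returns A with shape [batch, r, steps] and M with shape [batch, r, hidden].
    """
    scores = np.tanh(H @ Ws1.T) @ Ws2.T       # [batch, steps, r]
    A = softmax(scores.transpose(0, 2, 1))    # [batch, r, steps], normalised over steps
    M = A @ H                                 # [batch, r, hidden]
    return A, M


rng = np.random.default_rng(0)
batch, steps, hidden, da, r = 2, 5, 4, 3, 2   # toy sizes, far smaller than in the script
H = rng.normal(size=(batch, steps, hidden))
Ws1 = rng.normal(size=(da, hidden))
Ws2 = rng.normal(size=(r, da))
A, M = structured_self_attention(H, Ws1, Ws2)
print(A.shape, M.shape)   # (2, 2, 5) (2, 2, 4)
```

Each of the r rows of A is a separate distribution over time steps, so M stacks r differently weighted averages of the hidden states. Flattening M to length r * hidden gives the vector that the MLP in the script consumes.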
Reference:
https://www.jianshu.com/p/158c3f02a15b