Precisión de Tensorflow / Recall / puntuación F1 y matriz de confusión

Me gustaría saber si hay una manera de implementar la función de puntuación diferente del paquete de aprendizaje scikit como este:

from sklearn.metrics import confusion_matrix confusion_matrix(y_true, y_pred) 

en un modelo de tensorflow para obtener la puntuación diferente.

 with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess: init = tf.initialize_all_variables() sess.run(init) for epoch in xrange(1): avg_cost = 0. total_batch = len(train_arrays) / batch_size for batch in range(total_batch): train_step.run(feed_dict = {x: train_arrays, y: train_labels}) avg_cost += sess.run(cost, feed_dict={x: train_arrays, y: train_labels})/total_batch if epoch % display_step == 0: print "Epoch:", '%04d' % (epoch+1), "cost=", "{:.9f}".format(avg_cost) print "Optimization Finished!" correct_prediction = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1)) # Calculate accuracy accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float")) print "Accuracy:", batch, accuracy.eval({x: test_arrays, y: test_labels}) 

¿Tendré que volver a ejecutar la sesión para obtener la predicción?

Realmente no necesitas sklearn para calcular la puntuación de precisión / recuperación / f1. Puedes expresslos fácilmente en forma TF-ish observando las fórmulas:

introduzca la descripción de la imagen aquí

Ahora, si tiene sus valores actual y predicted como vectores de 0/1, puede calcular TP, TN, FP, FN utilizando tf.count_nonzero :

 TP = tf.count_nonzero(predicted * actual) TN = tf.count_nonzero((predicted - 1) * (actual - 1)) FP = tf.count_nonzero(predicted * (actual - 1)) FN = tf.count_nonzero((predicted - 1) * actual) 

Ahora tus métricas son fáciles de calcular:

 precision = TP / (TP + FP) recall = TP / (TP + FN) f1 = 2 * precision * recall / (precision + recall) 

Tal vez este ejemplo te hable:

  pred = multilayer_perceptron(x, weights, biases) correct_prediction = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1)) accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float")) with tf.Session() as sess: init = tf.initialize_all_variables() sess.run(init) for epoch in xrange(150): for i in xrange(total_batch): train_step.run(feed_dict = {x: train_arrays, y: train_labels}) avg_cost += sess.run(cost, feed_dict={x: train_arrays, y: train_labels})/total_batch if epoch % display_step == 0: print "Epoch:", '%04d' % (epoch+1), "cost=", "{:.9f}".format(avg_cost) #metrics y_p = tf.argmax(pred, 1) val_accuracy, y_pred = sess.run([accuracy, y_p], feed_dict={x:test_arrays, y:test_label}) print "validation accuracy:", val_accuracy y_true = np.argmax(test_label,1) print "Precision", sk.metrics.precision_score(y_true, y_pred) print "Recall", sk.metrics.recall_score(y_true, y_pred) print "f1_score", sk.metrics.f1_score(y_true, y_pred) print "confusion_matrix" print sk.metrics.confusion_matrix(y_true, y_pred) fpr, tpr, tresholds = sk.metrics.roc_curve(y_true, y_pred) 

Estuche multi-etiqueta

Las respuestas anteriores no especifican cómo manejar el caso de tags múltiples, de modo que aquí hay una versión que implementa tres tipos de puntuación de etiqueta múltiple f1 en tensorflow : micro, macro y ponderada (según scikit-learn)

Actualización (06/06/18): escribí una publicación en el blog sobre cómo calcular la puntuación de multilabel f1 de transmisión en caso de que ayude a alguien (es un proceso más largo, no quiero sobrecargar esta respuesta)

 f1s = [0, 0, 0] y_true = tf.cast(y_true, tf.float64) y_pred = tf.cast(y_pred, tf.float64) for i, axis in enumerate([None, 0]): TP = tf.count_nonzero(y_pred * y_true, axis=axis) FP = tf.count_nonzero(y_pred * (y_true - 1), axis=axis) FN = tf.count_nonzero((y_pred - 1) * y_true, axis=axis) precision = TP / (TP + FP) recall = TP / (TP + FN) f1 = 2 * precision * recall / (precision + recall) f1s[i] = tf.reduce_mean(f1) weights = tf.reduce_sum(y_true, axis=0) weights /= tf.reduce_sum(weights) f1s[2] = tf.reduce_sum(f1 * weights) micro, macro, weighted = f1s 

Exactitud

 def tf_f1_score(y_true, y_pred): """Computes 3 different f1 scores, micro macro weighted. micro: f1 score accross the classes, as 1 macro: mean of f1 scores per class weighted: weighted average of f1 scores per class, weighted from the support of each class Args: y_true (Tensor): labels, with shape (batch, num_classes) y_pred (Tensor): model's predictions, same shape as y_true Returns: tuple(Tensor): (micro, macro, weighted) tuple of the computed f1 scores """ f1s = [0, 0, 0] y_true = tf.cast(y_true, tf.float64) y_pred = tf.cast(y_pred, tf.float64) for i, axis in enumerate([None, 0]): TP = tf.count_nonzero(y_pred * y_true, axis=axis) FP = tf.count_nonzero(y_pred * (y_true - 1), axis=axis) FN = tf.count_nonzero((y_pred - 1) * y_true, axis=axis) precision = TP / (TP + FP) recall = TP / (TP + FN) f1 = 2 * precision * recall / (precision + recall) f1s[i] = tf.reduce_mean(f1) weights = tf.reduce_sum(y_true, axis=0) weights /= tf.reduce_sum(weights) f1s[2] = tf.reduce_sum(f1 * weights) micro, macro, weighted = f1s return micro, macro, weighted def compare(nb, dims): labels = (np.random.randn(nb, dims) > 0.5).astype(int) predictions = (np.random.randn(nb, dims) > 0.5).astype(int) stime = time() mic = f1_score(labels, predictions, average='micro') mac = f1_score(labels, predictions, average='macro') wei = f1_score(labels, predictions, average='weighted') print('sklearn in {:.4f}:\n micro: {:.8f}\n macro: {:.8f}\n weighted: {:.8f}'.format( time() - stime, mic, mac, wei )) gtime = time() tf.reset_default_graph() y_true = tf.Variable(labels) y_pred = tf.Variable(predictions) micro, macro, weighted = tf_f1_score(y_true, y_pred) with tf.Session() as sess: tf.global_variables_initializer().run(session=sess) stime = time() mic, mac, wei = sess.run([micro, macro, weighted]) print('tensorflow in {:.4f} ({:.4f} with graph time):\n micro: {:.8f}\n macro: {:.8f}\n weighted: {:.8f}'.format( time() - stime, time()-gtime, mic, mac, wei )) compare(10 ** 6, 10) 

salidas :

 >> rows: 10^6 dimensions: 10 sklearn in 2.3939: micro: 0.30890287 macro: 0.30890275 weighted: 0.30890279 tensorflow in 0.2465 (3.3246 with graph time): micro: 0.30890287 macro: 0.30890275 weighted: 0.30890279 

Como no tengo suficiente reputación para agregar un comentario a Salvador Dalis, esta es la forma de proceder:

tf.count_nonzero sus valores en un tf.int64 menos que se especifique lo contrario. Utilizando:

 argmax_prediction = tf.argmax(prediction, 1) argmax_y = tf.argmax(y, 1) TP = tf.count_nonzero(argmax_prediction * argmax_y, dtype=tf.float32) TN = tf.count_nonzero((argmax_prediction - 1) * (argmax_y - 1), dtype=tf.float32) FP = tf.count_nonzero(argmax_prediction * (argmax_y - 1), dtype=tf.float32) FN = tf.count_nonzero((argmax_prediction - 1) * argmax_y, dtype=tf.float32) 

Es realmente una buena idea.

Utilice las API de métricas proporcionadas en tf.contrib.metrics, por ejemplo:

 labels = ... predictions = ... accuracy, update_op_acc = tf.contrib.metrics.streaming_accuracy(labels, predictions) error, update_op_error = tf.contrib.metrics.streaming_mean_absolute_error(labels, predictions) sess.run(tf.local_variables_initializer()) for batch in range(num_batches): sess.run([update_op_acc, update_op_error]) accuracy, mean_absolute_error = sess.run([accuracy, mean_absolute_error])