[Python] [Data Processing] Plotting the Distribution of Multi-Dimensional Data
Quick tips:
- What %matplotlib inline is in Matplotlib and how to use it: https://blog..net/liangzuojiayi/article/details/78183783
- load_iris loads the iris dataset bundled with sklearn (each flower is assigned to a class based on the length and width of its sepals and petals). Data format:
  data.feature_names (data['feature_names']): ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
  data.target_names: array(['setosa', 'versicolor', 'virginica'], dtype='<U10')
  data['data']: array([[5.1, 3.5, 1.4, 0.2], [4.9, 3. , 1.4, 0.2], ...... [6.5, 3. , 5.2, 2. ], [6.2, 3.4, 5.4, 2.3], [5.9, 3. , 5.1, 1.8]])
  data['target']: array([0, 0, 0, 0,.......2, 2, 2])
- sklearn.datasets can load many other kinds of data.
- t-SNE: https://blog..net/hustqb/article/details/78144384 explains in detail how t-SNE works, its strengths and weaknesses, and how to use it.
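As a quick check of the load_iris data format described above, here is a minimal sketch that loads the dataset and prints each field. The variable name first_species is only illustrative and not part of the original notes.

```python
from sklearn.datasets import load_iris

# load_iris returns a Bunch: fields are reachable as data.key or data['key']
data = load_iris()

print(data.feature_names)   # names of the four measurement columns
print(data.target_names)    # names of the three classes
print(data.data.shape)      # (number of samples, number of features)
print(data.target.shape)    # one integer class label per sample

# Map the first sample's integer label back to its class name
# (first_species is an illustrative name, not from the original post)
first_species = data.target_names[data.target[0]]
print(data.data[0], first_species)
```

Attribute access and dictionary-style access return the same arrays, so either spelling from the list above can be used.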
Code:
- Plot the data distribution of the handwritten-digit images
from time import time
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.manifold import TSNE
import pandas as pd
digits = datasets.load_digits(n_class=10)
df = pd.DataFrame(digits.data)
label = digits.target
df['label'] = label  # 64 pixel columns plus one label column
print(type(digits.data))
df  # original data
<class 'numpy.ndarray'>
1797 rows × 65 columns
tsne = TSNE(n_components=2, init='pca', random_state=1)
result = tsne.fit_transform(digits.data)
result
array([[ -4.2510934,  57.605927 ],
       [ 27.768238 , -18.912882 ],
       [ 19.440983 ,  -7.737709 ],
       ...,
       [ 10.630893 , -12.436025 ],
       [-18.820362 ,  28.899649 ],
       [  6.5873857,  -8.608063 ]], dtype=float32)
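# draw the 2-D picture
x_min, x_max = np.min(result), np.max(result)
# Min-max scale the embedding into [0, 1], using one global min/max for both axes.
# plt.text does not autoscale the axes, so the labels must fall inside the default 0-1 limits.
result = (result - x_min) / (x_max - x_min)
fig = plt.figure()
# subplot(111) selects the first cell of a 1x1 grid: the digits are rows, columns, index.
# A larger row or column count makes each cell smaller.
ax = plt.subplot(111)
for i in range(result.shape[0]):
    # Draw each sample as its digit label, colored by class
    plt.text(result[i, 0], result[i, 1], str(label[i]),
             color=plt.cm.Set1(label[i] / 10.),
             fontdict={'weight': 'bold', 'size': 9})
plt.xticks([])
plt.yticks([])
plt.title('hello')
plt.show()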
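To see what the scaling step in the plotting code does, here is a small sketch on a hand-made array. The points array is a hypothetical stand-in for a t-SNE result, and the per-axis variant is an alternative added for comparison, not something the original code uses.

```python
import numpy as np

# Hypothetical stand-in for a t-SNE embedding: 5 points in 2-D
points = np.array([[-4.0, 57.0],
                   [27.0, -18.0],
                   [19.0, -7.0],
                   [10.0, -12.0],
                   [-18.0, 28.0]])

# Global scaling, as in the plotting code: one min/max for the whole array,
# so both axes are shrunk by the same factor and relative distances are kept
g_min, g_max = np.min(points), np.max(points)
scaled_global = (points - g_min) / (g_max - g_min)

# Per-axis alternative (an assumption for comparison): each column is
# stretched to fill [0, 1] on its own, which changes the aspect ratio
col_min, col_max = points.min(axis=0), points.max(axis=0)
scaled_per_axis = (points - col_min) / (col_max - col_min)

print(scaled_global.min(), scaled_global.max())
print(scaled_per_axis.min(axis=0), scaled_per_axis.max(axis=0))
```

Both variants keep every value inside [0, 1]. With global scaling one axis may not span the full range, which is the price of preserving the shape of the embedding.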
Result: