【Python】【Data Processing】Plotting the Distribution of Multidimensional Data

Tips:

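    What %matplotlib inline is in Matplotlib and how to use it: https://blog..net/liangzuojiayi/article/details/78183783
    load_iris loads the iris dataset bundled with sklearn (the task is to tell which class a flower belongs to from the length and width of its sepals and petals). Data format: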
data.feature_names (data['feature_names']):
	['sepal length (cm)',
	 'sepal width (cm)',
	 'petal length (cm)',
	 'petal width (cm)']
data.target_names:
	array(['setosa', 'versicolor', 'virginica'], dtype='<U10')
data['data']:
	array([[5.1, 3.5, 1.4, 0.2],
	       [4.9, 3. , 1.4, 0.2],
	       ......
	       [6.5, 3. , 5.2, 2. ],
	       [6.2, 3.4, 5.4, 2.3],
	       [5.9, 3. , 5.1, 1.8]])
data['target']:
	array([0, 0, 0, 0,.......2, 2, 2])
    sklearn.datasets can load many kinds of datasets.
    t-SNE: https://blog..net/hustqb/article/details/78144384 explains in detail how t-SNE works, its strengths and weaknesses, and how to use it.
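
The iris fields listed above can be inspected directly. The following is a minimal sketch, assuming scikit-learn is installed; the variable name data is only illustrative. The object returned by load_iris allows both attribute access (data.target) and key access (data['target']).

```python
from sklearn.datasets import load_iris

# Load the bundled iris dataset (no download needed)
data = load_iris()

# Names of the four measured features and of the three classes
print(data.feature_names)
print(data.target_names)

# One row per flower, one column per feature
print(data.data.shape)
# One integer class index per flower
print(data.target.shape)

# Attribute access and key access return the same arrays
print(data['data'].shape == data.data.shape)
```
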

Code:

- Plot the data distribution of the handwritten digit images

from time import time
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.manifold import TSNE
import pandas as pd

# Load the handwritten digits; each row of digits.data is one flattened 8x8 image (64 pixels)
digits = datasets.load_digits(n_class=10)
df = pd.DataFrame(digits.data)
label = digits.target
# Append the class label (0-9) as an extra column named 'label'
df['label'] = label
print(type(digits.data))
df
# original data
<class 'numpy.ndarray'>
         0     1     2     3     4     5     6     7     8     9   ...    55    56    57    58    59    60    61    62    63  label
0      0.0   0.0   5.0  13.0   9.0   1.0   0.0   0.0   0.0   0.0   ...   0.0   0.0   0.0   6.0  13.0  10.0   0.0   0.0   0.0      0
1      0.0   0.0   0.0  12.0  13.0   5.0   0.0   0.0   0.0   0.0   ...   0.0   0.0   0.0   0.0  11.0  16.0  10.0   0.0   0.0      1
2      0.0   0.0   0.0   4.0  15.0  12.0   0.0   0.0   0.0   0.0   ...   0.0   0.0   0.0   0.0   3.0  11.0  16.0   9.0   0.0      2
3      0.0   0.0   7.0  15.0  13.0   1.0   0.0   0.0   0.0   8.0   ...   0.0   0.0   0.0   7.0  13.0  13.0   9.0   0.0   0.0      3
4      0.0   0.0   0.0   1.0  11.0   0.0   0.0   0.0   0.0   0.0   ...   0.0   0.0   0.0   0.0   2.0  16.0   4.0   0.0   0.0      4
...    ...   ...   ...   ...   ...   ...   ...   ...   ...   ...   ...   ...   ...   ...   ...   ...   ...   ...   ...   ...    ...
1792   0.0   0.0   4.0  10.0  13.0   6.0   0.0   0.0   0.0   1.0   ...   0.0   0.0   0.0   2.0  14.0  15.0   9.0   0.0   0.0      9
1793   0.0   0.0   6.0  16.0  13.0  11.0   1.0   0.0   0.0   0.0   ...   0.0   0.0   0.0   6.0  16.0  14.0   6.0   0.0   0.0      0
1794   0.0   0.0   1.0  11.0  15.0   1.0   0.0   0.0   0.0   0.0   ...   0.0   0.0   0.0   2.0   9.0  13.0   6.0   0.0   0.0      8
1795   0.0   0.0   2.0  10.0   7.0   0.0   0.0   0.0   0.0   0.0   ...   0.0   0.0   0.0   5.0  12.0  16.0  12.0   0.0   0.0      9
1796   0.0   0.0  10.0  14.0   8.0   1.0   0.0   0.0   0.0   2.0   ...   0.0   0.0   1.0   8.0  12.0  14.0  12.0   1.0   0.0      8

1797 rows × 65 columns

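# Reduce the 64-dimensional pixel vectors to 2 dimensions; init='pca' starts from a PCA projection
tsne = TSNE(n_components=2, init='pca', random_state=1)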
result = tsne.fit_transform(digits.data)
result
array([[ -4.2510934,  57.605927 ],
       [ 27.768238 , -18.912882 ],
       [ 19.440983 ,  -7.737709 ],
       ...,
       [ 10.630893 , -12.436025 ],
       [-18.820362 ,  28.899649 ],
       [  6.5873857,  -8.608063 ]], dtype=float32)
# draw 2-dimension pic

x_min, x_max = np.min(result), np.max(result)

# This step rescales all values into [0, 1] (min-max normalization using the global min and max)
result = (result - x_min)/(x_max-x_min)
fig = plt.figure()
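# subplot(111) creates a 1x1 grid and selects its first cell; the first two digits are the
# numbers of rows and columns of the grid, so larger values give smaller subplots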
ax = plt.subplot(111)
for i in range(result.shape[0]):
    # Draw each sample as its digit label, colored by class
    plt.text(result[i, 0], result[i, 1], str(label[i]),
             color=plt.cm.Set1(label[i] / 10.),
             fontdict={'weight': 'bold', 'size': 9})
plt.xticks([])
plt.yticks([])
plt.title('hello')
plt.show()
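
The normalization step in the code above uses one minimum and one maximum taken over the whole result array, so both axes share the same scale. A possible alternative is to rescale each axis separately so that both span the full [0, 1] range. The following is a minimal sketch comparing the two; the function names minmax_global and minmax_per_axis and the sample points are made up for illustration and are not part of the original code.

```python
import numpy as np

def minmax_global(points):
    """Rescale with a single min/max taken over the whole array."""
    lo, hi = points.min(), points.max()
    return (points - lo) / (hi - lo)

def minmax_per_axis(points):
    """Rescale each column on its own so every axis spans [0, 1]."""
    lo, hi = points.min(axis=0), points.max(axis=0)
    return (points - lo) / (hi - lo)

# Made-up 2-D points standing in for a t-SNE embedding
points = np.array([[-4.0, 50.0],
                   [26.0, -10.0],
                   [6.0, 20.0]])

g = minmax_global(points)
p = minmax_per_axis(points)

# Global scaling: overall range is [0, 1], but a single axis may not fill it
print(g.min(), g.max(), g[:, 0].max())
# Per-axis scaling: each column has min 0 and max 1
print(p.min(axis=0), p.max(axis=0))
```

Global scaling keeps the relative distances along the two axes comparable, while per-axis scaling fills the plot area more evenly at the cost of stretching one axis. Both versions assume the data is not constant, since a zero range would cause a division by zero.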

Result:
