【python】【Data processing】Plotting the distribution of high-dimensional data
Tips:
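- What %matplotlib inline is in Matplotlib and how to use it: https://blog..net/liangzuojiayi/article/details/78183783
- load_iris loads the Iris dataset bundled with sklearn. The task is to decide which class a flower belongs to from the length and width of its sepals and petals. Data format: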
data.feature_names (equivalently data['feature_names']):
['sepal length (cm)',
 'sepal width (cm)',
 'petal length (cm)',
 'petal width (cm)']
data.target_names:
array(['setosa', 'versicolor', 'virginica'], dtype='<U10')
data['data']:
array([[5.1, 3.5, 1.4, 0.2],
       [4.9, 3. , 1.4, 0.2],
       ......
       [6.5, 3. , 5.2, 2. ],
       [6.2, 3.4, 5.4, 2.3],
       [5.9, 3. , 5.1, 1.8]])
data['target']:
array([0, 0, 0, 0, ......, 2, 2, 2])
- sklearn.datasets can load many different datasets.
- t-SNE: https://blog..net/hustqb/article/details/78144384 explains in detail how t-SNE works, its strengths and weaknesses, and how to use it.
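The sketch below assumes a recent scikit-learn. It shows how the Iris fields listed above are read from the object returned by load_iris. That object is a Bunch, so attribute access and key access are interchangeable. The sketch then loads a few other toy datasets from sklearn.datasets with return_X_y=True. The choice of load_wine and load_breast_cancer is only an illustration.

```python
from sklearn import datasets

# load_iris returns a Bunch: a dict subclass that also allows attribute access.
iris = datasets.load_iris()
assert iris.feature_names is iris['feature_names']
print(iris.target_names)   # the three class names
print(iris.data.shape)     # (150, 4): 150 flowers x 4 measurements
print(iris.target.shape)   # (150,): one class index 0/1/2 per flower

# A few other small toy datasets bundled with scikit-learn (no download needed).
loaders = {
    'iris': datasets.load_iris,
    'digits': datasets.load_digits,
    'wine': datasets.load_wine,
    'breast_cancer': datasets.load_breast_cancer,
}
shapes = {}
for name, load in loaders.items():
    # return_X_y=True gives a (features, labels) tuple instead of a Bunch
    X, y = load(return_X_y=True)
    shapes[name] = X.shape
    print(f'{name}: X shape {X.shape}, {len(set(y))} classes')
```

The t-SNE code below uses load_digits the same way. Its data field is a 2-D array with one row per sample.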
Code:
– Plot the data distribution of the handwritten digit images
from time import time
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.manifold import TSNE
import pandas as pd
digits = datasets.load_digits(n_class=10)
df = pd.DataFrame(digits.data)
label = digits.target
df['label'] = label  # add the class labels as an extra column
print(type(digits.data))
df  # original data
<class 'numpy.ndarray'>
1797 rows × 65 columns
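The 65 columns are the 64 pixel features plus the label column added above. The following check assumes a recent scikit-learn. It confirms that each row of digits.data is simply an 8×8 image from digits.images, flattened into one row.

```python
import numpy as np
from sklearn import datasets

digits = datasets.load_digits(n_class=10)

# Each sample is an 8x8 grayscale image; digits.data stores the same pixels
# flattened to 64 values per row.
print(digits.images.shape)  # (1797, 8, 8)
print(digits.data.shape)    # (1797, 64)

# Flattening the images reproduces the feature matrix exactly.
flat = digits.images.reshape(len(digits.images), -1)
assert np.array_equal(flat, digits.data)
```

This is why t-SNE receives a 64-dimensional input for each digit.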
tsne = TSNE(n_components=2, init='pca', random_state=1)
result = tsne.fit_transform(digits.data)
result
array([[ -4.2510934, 57.605927 ],
[ 27.768238 , -18.912882 ],
[ 19.440983 , -7.737709 ],
...,
[ 10.630893 , -12.436025 ],
[-18.820362 , 28.899649 ],
[ 6.5873857, -8.608063 ]], dtype=float32)
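fit_transform returns one 2-D point per input sample. The sketch below runs t-SNE on a subset of the digits to keep it fast; the 500-sample cut is an arbitrary assumption, not part of the original code. It checks the output shape and reads the final KL divergence through the kl_divergence_ attribute. The coordinates have no units and change with random_state, so only the relative positions of the points carry meaning.

```python
from sklearn import datasets
from sklearn.manifold import TSNE

digits = datasets.load_digits(n_class=10)
# Subset only to make the example run quickly (500 is an arbitrary choice).
X, y = digits.data[:500], digits.target[:500]

tsne = TSNE(n_components=2, init='pca', random_state=1)
embedding = tsne.fit_transform(X)

# One 2-D point per input sample: 64 pixel features reduced to 2 coordinates.
print(embedding.shape)  # (500, 2)
# KL divergence between the high- and low-dimensional affinities after optimisation.
print(tsne.kl_divergence_)
```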
# draw the 2-D plot
x_min, x_max = np.min(result), np.max(result)
# min-max scaling: one global min and max map every coordinate into [0, 1]
result = (result - x_min)/(x_max-x_min)
fig = plt.figure()
# subplot(nrows, ncols, index): 111 means a 1x1 grid and its first (only) axes
ax = plt.subplot(111)
for i in range(result.shape[0]):
    # draw each sample's digit label at its 2-D position, colored by class
    plt.text(result[i, 0], result[i, 1], str(label[i]),
             color=plt.cm.Set1(label[i] / 10.),
             fontdict={'weight': 'bold', 'size': 9})
plt.xticks([])
plt.yticks([])
plt.title('hello')
plt.show()
Result:

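The min-max step in the plotting code subtracts one global minimum and divides by one global range. Every coordinate therefore lands in [0, 1], and both axes are shrunk by the same factor. One likely reason the step is needed: as far as I know, plt.text does not trigger axis autoscaling. Without the scaling, the labels would fall outside the default 0-1 view. The toy array below is made up for illustration. The sketch compares that global scaling with a per-axis alternative.

```python
import numpy as np

# Hypothetical 2-D points standing in for the t-SNE output.
result = np.array([[-4.0, 50.0],
                   [20.0, -20.0],
                   [10.0, 0.0]])

# Global min-max scaling, as in the plotting code: a single min and max over
# all coordinates, so distances keep the same proportions on both axes.
x_min, x_max = np.min(result), np.max(result)
global_scaled = (result - x_min) / (x_max - x_min)

# Per-axis alternative: each column is stretched to [0, 1] independently,
# which fills the plot but distorts distances between the two axes.
col_min, col_max = result.min(axis=0), result.max(axis=0)
per_axis_scaled = (result - col_min) / (col_max - col_min)

print(global_scaled)
print(per_axis_scaled)
```

With global scaling, only the overall extremes reach exactly 0 and 1. In this toy data, the x column ends up in roughly [0.23, 0.57]. Per-axis scaling fills both axes completely but changes the shape of the point cloud.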