The MNIST data set is used for K-means clustering in an AWS SageMaker notebook.
This is an example procedure for K-means clustering using a Jupyter notebook on AWS SageMaker. I followed a tutorial shown in a GitHub repository [1].
Summary of the procedure
- Download the MNIST input data set and load it into S3.
- Load the input data and build the K-means model in a SageMaker Jupyter notebook.
- Get the resulting cluster classes and verify the model.
MNIST Input Data Loading
- The mnist.pkl.gz data file is found on the internet, downloaded, and uploaded directly to the notebook instance.
- The train and test data sets are created from it using the pickle module.
- The input data is then sent to the created S3 bucket object, as in the sketch below.
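A minimal sketch of this loading step, assuming the classic mnist.pkl.gz layout of (train, validation, test) tuples and a placeholder bucket name (my-mnist-kmeans-bucket) rather than the one used in [4]:

```python
import gzip
import pickle

import boto3

# Hypothetical bucket/prefix for illustration; replace with your own S3 bucket.
bucket = "my-mnist-kmeans-bucket"
prefix = "mnist-kmeans"

# mnist.pkl.gz holds (train, validation, test) tuples, each a pair of
# (images, labels) where images are 784-dimensional float vectors.
with gzip.open("mnist.pkl.gz", "rb") as f:
    train_set, valid_set, test_set = pickle.load(f, encoding="latin1")

train_images = train_set[0].astype("float32")  # shape (50000, 784)

# Send the data file to the created S3 bucket object so it is also
# available outside the notebook instance.
boto3.Session().resource("s3").Bucket(bucket).Object(
    f"{prefix}/mnist.pkl.gz"
).upload_file("mnist.pkl.gz")
```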
K-Means model build
The K-means model attributes are set and the model is trained on the input data. The number of clusters is set to 10, one for each digit from 0 to 9. The trained model is then deployed to an endpoint; a sketch follows below.
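A sketch of the model build using the SageMaker Python SDK's built-in KMeans estimator, assuming the train_images array from the loading step above; the instance types and output path are placeholders, not necessarily those used in [1] or [4]:

```python
import sagemaker
from sagemaker import KMeans

# Assumed role and output location for illustration.
role = sagemaker.get_execution_role()
output_path = "s3://my-mnist-kmeans-bucket/mnist-kmeans/output"

# k=10 clusters, one per digit 0-9, as described above.
kmeans = KMeans(
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path=output_path,
    k=10,
)

# record_set converts the (N, 784) float32 array into the protobuf
# recordIO format the built-in algorithm expects and stages it in S3.
kmeans.fit(kmeans.record_set(train_images))

# Deploy the trained model to a real-time inference endpoint.
kmeans_predictor = kmeans.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",
)
```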
Showing the output clusters
The top cluster classes are found, and the validation data is shown to indeed belong to its assigned cluster; see the sketch below.
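A sketch of this verification step, assuming the kmeans_predictor endpoint and the valid_set tuple from the sketches above:

```python
import numpy as np

# valid_set from the pickle file: (images, labels).
valid_images, valid_labels = valid_set

# The deployed endpoint returns a "closest_cluster" assignment per record.
result = kmeans_predictor.predict(valid_images[:100].astype("float32"))
clusters = [
    int(r.label["closest_cluster"].float32_tensor.values[0]) for r in result
]

# Compare cluster assignments with the true digit labels to see which
# digit dominates each cluster.
for c in range(10):
    members = [valid_labels[i] for i, k in enumerate(clusters) if k == c]
    if members:
        digits, counts = np.unique(members, return_counts=True)
        print(f"cluster {c}: top digit {digits[np.argmax(counts)]}, size {len(members)}")
```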
The full code used in this example is shown in [4].
References
[1] GitHub - mtm12/SageMakerDemo
[2] K-Means Algorithm - Amazon SageMaker
[3] https://sagemakerexamples.readthedocs.io/en/latest/introduction_to_applying_machine_learning/US-census_population_segmentation_PCA_Kmeans/sagemaker-countycensusclustering.html
[4] GitHub - dchoi/mnistKmean: mnistKmean