MNIST database

Sample images from the MNIST test dataset

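The MNIST database (Modified National Institute of Standards and Technology database[1]) is a large database of handwritten digits that is commonly used for training various image processing systems[2][3]. It was created by modifying samples from NIST's original databases and contains 60,000 training images and 10,000 test images, each 28×28 pixels in size. The database is widely used for training and testing in the field of machine learning[4][5]. When MNIST was released, a support vector machine achieved an error rate of 0.8% on it; later studies have improved substantially on this result using deep learning techniques.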
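
Because the test set contains exactly 10,000 images, every MNIST error rate corresponds to a whole number of misclassified test images, and one image is worth 0.01 percentage points. The small Python sketch below converts between the two; the function names are illustrative and not part of any MNIST tooling.

```python
# Size of the MNIST test set, as stated above.
MNIST_TEST_SIZE = 10_000


def misclassified_count(error_percent: float, n_test: int = MNIST_TEST_SIZE) -> int:
    """Number of misclassified test images implied by an error rate given in percent."""
    return round(error_percent / 100 * n_test)


def error_percent(n_wrong: int, n_test: int = MNIST_TEST_SIZE) -> float:
    """Error rate in percent for n_wrong mistakes out of n_test test images."""
    return 100 * n_wrong / n_test


if __name__ == "__main__":
    # Error rates quoted in this article.
    for rate in (12, 0.8, 0.42, 0.39, 0.27, 0.25, 0.21):
        print(f"{rate}% -> {misclassified_count(rate)} of {MNIST_TEST_SIZE} test images")
```

For example, the 0.8% support vector machine result corresponds to 80 misclassified digits, and the difference between a 0.27% and a 0.21% error rate is six test images.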

History

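The MNIST database was created by "re-mixing" samples from NIST's original datasets[6]. The creators felt that NIST's original split was not well suited to machine learning experiments, because its training data were collected from American Census Bureau employees while its test data were collected from American high school students[7]. In addition, the black-and-white images from NIST were size-normalized to fit into a 20×20 pixel box while preserving their aspect ratio and were anti-aliased, which introduced grayscale levels; the results were then centered in a 28×28 pixel image[7].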
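
The grayscale levels come from anti-aliasing during size normalization: when a binary stroke is shrunk, an output pixel that is only partly covered by ink receives an intermediate intensity. The Python sketch below illustrates that effect with simple area averaging over integer-sized blocks. It is a simplified assumption for illustration only, not the normalization algorithm actually used to build MNIST, and it omits bounding-box fitting and aspect-ratio handling.

```python
from typing import List

BinaryImage = List[List[int]]  # 1 = ink, 0 = background
GrayImage = List[List[float]]  # intensities in [0.0, 1.0]


def box_downsample(binary: BinaryImage, factor: int) -> GrayImage:
    """Shrink a binary image by an integer factor using area averaging.

    Each output pixel is the mean of a factor x factor block of input pixels,
    so a block only partly covered by ink yields a grey value between 0 and 1.
    """
    rows, cols = len(binary), len(binary[0])
    if rows % factor or cols % factor:
        raise ValueError("image dimensions must be divisible by factor")
    area = factor * factor
    out: GrayImage = []
    for r in range(0, rows, factor):
        out_row = []
        for c in range(0, cols, factor):
            # Count inked input pixels inside this block.
            ink = sum(binary[r + i][c + j] for i in range(factor) for j in range(factor))
            out_row.append(ink / area)
        out.append(out_row)
    return out


if __name__ == "__main__":
    stroke = [
        [1, 1, 0, 0],
        [1, 1, 0, 1],
        [0, 0, 0, 0],
        [1, 0, 0, 0],
    ]
    print(box_downsample(stroke, 2))  # [[1.0, 0.25], [0.25, 0.0]]
```

The fully inked block stays black (1.0), while the partly inked blocks become grey (0.25): a binary input has produced intermediate intensity levels.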

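The MNIST database contains 60,000 training images and 10,000 test images[8]. Half of the training set and half of the test set were taken from NIST's training dataset, while the other half of each was taken from NIST's test dataset[9]. The original creators of the database maintain a list of some of the methods tested on it[7]. In their original paper, they used a support vector machine to obtain an error rate of 0.8%[10]. However, the original MNIST database contains at least 4 mislabeled images[11].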
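
The half-and-half composition described above can be sketched as follows. This is a hypothetical illustration: `nist_train` and `nist_test` stand in for the two original NIST sources, the random sampling is an assumption, and the actual MNIST construction procedure is not reproduced. Training and test examples are drawn from disjoint samples of each pool, so the two resulting sets never share an example.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def remix(
    nist_train: Sequence[T],
    nist_test: Sequence[T],
    n_train: int,
    n_test: int,
    seed: int = 0,
) -> Tuple[List[T], List[T]]:
    """Build train/test sets with exactly half of each drawn from each source pool."""
    if n_train % 2 or n_test % 2:
        raise ValueError("set sizes must be even so each pool supplies exactly half")
    rng = random.Random(seed)
    half_train, half_test = n_train // 2, n_test // 2
    # Sample without replacement so no example ends up in both train and test.
    from_a = rng.sample(list(nist_train), half_train + half_test)
    from_b = rng.sample(list(nist_test), half_train + half_test)
    train = from_a[:half_train] + from_b[:half_train]
    test = from_a[half_train:] + from_b[half_train:]
    # Interleave the two sources so neither set is ordered by origin.
    rng.shuffle(train)
    rng.shuffle(test)
    return train, test
```

With MNIST's sizes this would be called as `remix(pool_a, pool_b, 60_000, 10_000)` (both pool names hypothetical), giving 30,000 + 30,000 training examples and 5,000 + 5,000 test examples.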

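Extended MNIST (EMNIST) is a newer dataset developed and released by NIST as the (final) successor to MNIST[12][13]. Whereas MNIST contains only images of handwritten digits, EMNIST includes all the images from NIST Special Database 19, a large database of handwritten uppercase and lowercase letters as well as digits[14][15].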

Performance

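Some studies have achieved "near-human performance" on the MNIST database using artificial neural networks[16]. The highest error rate listed on the database's original website is 12%, achieved by a simple linear classifier with no preprocessing[10][7].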
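
A linear classifier of the kind mentioned above scores each digit class with a weighted sum of the raw pixel values and predicts the class with the highest score; its error rate is the fraction of test images it gets wrong. The Python sketch below shows only the prediction and evaluation steps with hypothetical weights; how the baseline's weights were trained is not described in this article and is not assumed here.

```python
from typing import Sequence

Vector = Sequence[float]


def predict(weights: Sequence[Vector], biases: Vector, pixels: Vector) -> int:
    """Return the class k that maximises the linear score w_k . x + b_k."""
    scores = [
        sum(w * x for w, x in zip(w_k, pixels)) + b_k
        for w_k, b_k in zip(weights, biases)
    ]
    return max(range(len(scores)), key=scores.__getitem__)


def error_rate(
    weights: Sequence[Vector],
    biases: Vector,
    images: Sequence[Vector],
    labels: Sequence[int],
) -> float:
    """Fraction of images whose predicted class differs from the true label."""
    wrong = sum(predict(weights, biases, x) != y for x, y in zip(images, labels))
    return wrong / len(labels)
```

For MNIST, `pixels` would be a raw 28×28 image flattened to 784 values and `weights` would hold one 784-value row per digit class (10 rows); "no preprocessing" means the pixels are fed in unchanged.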

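In 2004, researchers achieved a best-case error rate of 0.42% on the database using LIRA, a three-layer neural classifier based on Rosenblatt's perceptron principles[17].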

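Some researchers have tested artificial intelligence systems on randomly distorted versions of the MNIST database. These systems are usually artificial neural networks, and the distortions used are typically either affine distortions or elastic distortions[7]. In some cases these systems are very successful; one such system achieved an error rate of 0.39% on the database[18].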
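
A random affine distortion can be sketched as follows: each output pixel is mapped back through the inverse of a rotation, scaling, and translation about the image centre, and the nearest source pixel is copied. The parameter ranges in `random_affine` are illustrative assumptions, not values from the cited work, and elastic distortion (which additionally requires a smoothed random displacement field) is not shown.

```python
import math
import random
from typing import List, Sequence, Tuple

Image = List[List[float]]


def affine_distort(
    image: Sequence[Sequence[float]],
    angle_deg: float,
    scale: float,
    dx: float,
    dy: float,
) -> Image:
    """Rotate, scale and shift an image about its centre (nearest-neighbour sampling)."""
    rows, cols = len(image), len(image[0])
    cy, cx = (rows - 1) / 2, (cols - 1) / 2
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    out: Image = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Inverse mapping: undo the shift, then the rotation and scaling.
            x, y = c - cx - dx, r - cy - dy
            src_x = (cos_t * x + sin_t * y) / scale + cx
            src_y = (-sin_t * x + cos_t * y) / scale + cy
            sr, sc = round(src_y), round(src_x)
            if 0 <= sr < rows and 0 <= sc < cols:  # pixels mapped from outside stay blank
                out[r][c] = image[sr][sc]
    return out


def random_affine(
    image: Sequence[Sequence[float]],
    rng: random.Random,
    max_angle: float = 15.0,
    max_shift: float = 2.0,
    scale_range: Tuple[float, float] = (0.9, 1.1),
) -> Image:
    """Apply an affine distortion with randomly drawn (assumed) parameters."""
    return affine_distort(
        image,
        rng.uniform(-max_angle, max_angle),
        rng.uniform(*scale_range),
        rng.uniform(-max_shift, max_shift),
        rng.uniform(-max_shift, max_shift),
    )
```

Calling `random_affine(img, random.Random(seed))` once per epoch yields a fresh distorted copy of each training digit, which is the usual way such augmentation is applied.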

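In 2011, researchers reported an error rate of 0.27% using a similar neural network system, improving on the previous best result[19]. In 2013, an approach based on regularizing neural networks with DropConnect was claimed to achieve an error rate of 0.21%[20]. In 2016, the best performance of a single convolutional neural network on MNIST was an error rate of 0.25%[21]. As of August 2018, the best performance of a single convolutional neural network trained on the MNIST training data without data augmentation was also a 0.25% error rate[21][22]. In addition, the Parallel Computing Center in Khmelnytskyi, Ukraine, obtained a 0.21% error rate on MNIST with an ensemble of only 5 convolutional neural networks[23][24].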
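
This article does not say how the five networks' outputs were combined. One common approach, assumed here rather than taken from the cited work, is to average each network's class-probability vector and pick the most probable class; majority voting over each network's top class is a simpler alternative. The Python sketch below implements both so the difference can be seen.

```python
from collections import Counter
from typing import Sequence

Probs = Sequence[float]


def _argmax(values: Sequence[float]) -> int:
    return max(range(len(values)), key=values.__getitem__)


def average_then_argmax(member_probs: Sequence[Probs]) -> int:
    """Average the class-probability vectors of all ensemble members, then pick the top class."""
    n_classes = len(member_probs[0])
    mean = [sum(p[k] for p in member_probs) / len(member_probs) for k in range(n_classes)]
    return _argmax(mean)


def majority_vote(member_probs: Sequence[Probs]) -> int:
    """Each member votes for its own top class; ties go to the class voted for first."""
    votes = Counter(_argmax(p) for p in member_probs)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Five hypothetical members, two classes for brevity (MNIST would have ten).
    members = [[0.95, 0.05], [0.9, 0.1], [0.45, 0.55], [0.45, 0.55], [0.45, 0.55]]
    print(average_then_argmax(members), majority_vote(members))
```

Here averaging picks class 0 because two members are very confident, while voting picks class 1 because three members lean weakly towards it; averaging lets confident members outweigh uncertain ones.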

See also

References

  1. LeCun, Yann (Courant Institute, NYU); Cortes, Corinna (Google Labs, New York); Burges, Christopher J.C. (Microsoft Research, Redmond). THE MNIST DATABASE of handwritten digits.
  2. Support vector machines speed pattern recognition. Vision Systems Design. [2013-08-17].
  3. Gangaputra, Sachin. Handwritten digit database. [2013-08-17].
  4. Qiao, Yu. THE MNIST DATABASE of handwritten digits. 2007 [2013-08-18]. (Archived from the original on 2018-02-11).
  5. Platt, John C. Using analytic QP and sparseness to speed training of support vector machines (PDF). Advances in Neural Information Processing Systems. 1999: 557–563 [2013-08-18]. (Archived from the original (PDF) on 2016-03-04).
  6. Grother, Patrick J. NIST Special Database 19 - Handprinted Forms and Characters Database (PDF). National Institute of Standards and Technology.
  7. LeCun, Yann; Cortes, Corinna; Burges, Christopher J.C. The MNIST Handwritten Digit Database. Yann LeCun's Website yann.lecun.com. [2020-04-30].
  8. Kussul, Ernst; Baidyk, Tatiana. Improved method of handwritten digit recognition tested on MNIST database. Image and Vision Computing. 2004, 22 (12): 971–981. doi:10.1016/j.imavis.2004.03.008.
  9. Zhang, Bin; Srihari, Sargur N. Fast k-Nearest Neighbor Classification Using Cluster-Based Trees (PDF). IEEE Transactions on Pattern Analysis and Machine Intelligence. 2004, 26 (4): 525–528 [2020-04-20]. PMID 15382657. doi:10.1109/TPAMI.2004.1265868. (Archived from the original (PDF) on 2021-07-25).
  10. LeCun, Yann; Bottou, Léon; Bengio, Yoshua; Haffner, Patrick. Gradient-Based Learning Applied to Document Recognition (PDF). Proceedings of the IEEE. 1998, 86 (11): 2278–2324 [2013-08-18]. doi:10.1109/5.726791.
  11. Muller, Nicolas M.; Markert, Karla. Identifying Mislabeled Instances in Classification Datasets. 2019 International Joint Conference on Neural Networks (IJCNN). IEEE: 1–8. July 2019. ISBN 978-1-7281-1985-4. arXiv:1912.05283. doi:10.1109/IJCNN.2019.8851920.
  12. NIST. The EMNIST Dataset. NIST. 2017-04-04 [2022-04-11].
  13. NIST. NIST Special Database 19. NIST. 2010-08-27 [2022-04-11].
  14. Cohen, G.; Afshar, S.; Tapson, J.; van Schaik, A. EMNIST: an extension of MNIST to handwritten letters. 2017. arXiv:1702.05373 [cs.CV].
  15. Cohen, G.; Afshar, S.; Tapson, J.; van Schaik, A. EMNIST: an extension of MNIST to handwritten letters. 2017. arXiv:1702.05373v1 [cs.CV].
  16. Cireșan, Dan; Meier, Ueli; Schmidhuber, Jürgen. Multi-column deep neural networks for image classification (PDF). 2012 IEEE Conference on Computer Vision and Pattern Recognition. 2012: 3642–3649. CiteSeerX 10.1.1.300.3283. ISBN 978-1-4673-1228-8. S2CID 2161592. arXiv:1202.2745. doi:10.1109/CVPR.2012.6248110.
  17. Kussul, Ernst; Baidyk, Tatiana. Improved method of handwritten digit recognition tested on MNIST database (PDF). Image and Vision Computing. 2004, 22 (12): 971–981 [2013-09-20]. doi:10.1016/j.imavis.2004.03.008. (Archived from the original (PDF) on 2013-09-21).
  18. Ranzato, Marc'Aurelio; Poultney, Christopher; Chopra, Sumit; LeCun, Yann. Efficient Learning of Sparse Representations with an Energy-Based Model (PDF). Advances in Neural Information Processing Systems. 2006, 19: 1137–1144 [2013-09-20].
  19. Ciresan, Dan Claudiu; Meier, Ueli; Gambardella, Luca Maria; Schmidhuber, Jürgen. Convolutional neural network committees for handwritten character classification (PDF). 2011 International Conference on Document Analysis and Recognition (ICDAR). 2011: 1135–1139 [2013-09-20]. CiteSeerX 10.1.1.465.2138. ISBN 978-1-4577-1350-7. S2CID 10122297. doi:10.1109/ICDAR.2011.229. (Archived from the original (PDF) on 2016-02-22).
  20. Wan, Li; Zeiler, Matthew; Zhang, Sixin; LeCun, Yann; Fergus, Rob. Regularization of Neural Networks using DropConnect. International Conference on Machine Learning (ICML). 2013.
  21. SimpleNet. Lets Keep it simple, Using simple architectures to outperform deeper and more complex architectures. 2016 [2020-12-03]. arXiv:1608.06037.
  22. SimpNet. Towards Principled Design of Deep Convolutional Networks: Introducing SimpNet. GitHub. 2018 [2020-12-03]. arXiv:1802.06205.
  23. Romanuke, Vadim. Parallel Computing Center (Khmelnytskyi, Ukraine) represents an ensemble of 5 convolutional neural networks which performs on MNIST at 0.21 percent error rate. [2016-11-24].
  24. Romanuke, Vadim. Training data expansion and boosting of convolutional neural networks for reducing the MNIST dataset error rate. Research Bulletin of NTUU "Kyiv Polytechnic Institute". 2016, 6 (6): 29–34. doi:10.20535/1810-0546.2016.6.84115.

Further reading

External links
