In today’s society, a wide variety of data, including data on urban infrastructure and the global environment, medical and health data, marketing data, and data on materials and their physical properties, is generated every day. The analysis and utilization of such large-scale data are indispensable for developing the fundamental information technologies behind new social systems and services. In our laboratory, we conduct research on “statistical learning theory”, the fundamental theory of machine learning; “data mining”, its practical application; and the development of algorithms for large-scale data that support both aspects of machine learning.

Research on Learning Schemes


Dual Cached Loops

Conceptual diagram of the asynchronous multi-process scheme Cached Loops

Dual Cached Loops (DCL) is an optimization scheme for machine learning methods that operates two threads asynchronously. One type of thread, called the Writer thread, sequentially accesses the hard disk and repeatedly writes data into RAM. The other type, called the Training thread, continuously updates parameters without yielding to other operations. We showed that this scheme enables efficient processing of data that exceeds memory capacity, so that terabyte-scale data can be handled on a single machine.
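As a rough illustration of the idea (not the authors' implementation), the two asynchronous loops can be sketched in Python with a bounded queue standing in for the RAM cache: a writer thread streams batches "from disk" into the cache while a training thread consumes them and updates parameters. The batch data, cache size, and update rule below are all invented for the example.

```python
import queue
import threading

BATCHES = [[float(i + j) for j in range(4)] for i in range(8)]  # stand-in for data on disk
cache = queue.Queue(maxsize=2)  # bounded in-RAM cache shared by the two loops


def writer():
    # Writer thread: sequentially "reads from disk" and fills the cache.
    for batch in BATCHES:
        cache.put(batch)  # blocks only when the cache is full
    cache.put(None)       # sentinel: no more data


def trainer(state):
    # Training thread: consumes cached batches and updates a toy parameter,
    # never waiting on disk I/O directly.
    while True:
        batch = cache.get()  # blocks only when the cache is empty
        if batch is None:
            break
        state["w"] -= 0.01 * sum(batch) / len(batch)  # dummy gradient step
        state["steps"] += 1


state = {"w": 0.0, "steps": 0}
t_w = threading.Thread(target=writer)
t_t = threading.Thread(target=trainer, args=(state,))
t_w.start(); t_t.start()
t_w.join(); t_t.join()
print(state["steps"])  # one parameter update per cached batch
```

Because the queue is bounded, disk reads and parameter updates overlap in time while RAM usage stays fixed, which is the point of the scheme when the dataset does not fit in memory.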

Distributed Stochastic Optimization based on double separability

Distributed Stochastic Optimization

Stochastic optimization is often used for regularized risk minimization (RERM) problems with data so large that a single machine cannot handle them. When stochastic optimization is run in a distributed environment, however, parameters must be synchronized frequently, and this synchronization becomes the bottleneck in computation time.
We developed a Distributed Stochastic Optimization (DSO) method that solves a saddle-point problem equivalent to RERM and reduces the number of synchronizations required.
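As a toy illustration of the saddle-point idea (not the DSO algorithm itself), ridge regression can be rewritten via convex conjugacy as min_w max_a (1/n) Σ_i [a_i (x_i·w) − a_i y_i − a_i²/2] + (λ/2)‖w‖², and a stochastic primal-dual method then samples one example per step and touches only that example's dual coordinate. The data, step size, and iteration count below are invented for the example.

```python
import random

random.seed(0)
# Toy ridge regression in saddle-point form:
#   min_w max_a (1/n) sum_i [a_i (x_i . w) - a_i y_i - a_i^2 / 2] + (lam/2) ||w||^2
X = [[1.0, 0.0], [0.0, 1.0]]  # two invented examples
y = [1.0, -1.0]
n, d, lam, eta = 2, 2, 0.1, 0.05
w = [0.0] * d        # primal parameters
alpha = [0.0] * n    # one dual variable per example

for _ in range(20000):
    i = random.randrange(n)                   # sample one example
    xw = sum(X[i][j] * w[j] for j in range(d))
    alpha[i] += eta * (xw - y[i] - alpha[i])  # ascent on dual block i only
    for j in range(d):
        # alpha_i * x_i is an unbiased estimate of (1/n) sum_i alpha_i x_i
        w[j] -= eta * (alpha[i] * X[i][j] + lam * w[j])  # descent on primal

# w approaches the ridge solution (5/6, -5/6) up to stochastic noise
```

Each step reads and writes only one dual block plus the primal vector; in a distributed setting this kind of block-separable structure is what lets workers update disjoint pieces of the variables with far less synchronization.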

Data-driven Machine Learning