The Art of Statistical Learning from Data

Our research focuses on computational aspects of, and techniques for, statistical learning from data. Advances in science and technology now depend heavily on data and computing. Rapidly advancing experimental hardware is producing a wide variety of big data, and scientific breakthroughs are increasingly powered by advanced computing capabilities. Business organizations of all sizes are struggling to keep up with the pace of big data and to use it to improve products, services, or the customer experience.

As the explosive proliferation of diverse and complicated data becomes apparent all around us, we are being pushed to rethink and redefine computing technology so that we can make sense of the data and make full use of it. Given the ever more overwhelming pace of this growth, we can no longer get by with makeshift solutions to individual problems. To leverage the intrinsic value of this richness of data, we need to understand precisely what is and is not possible with the data, through theoretical consideration of the common structure of real-world problems.

We develop theoretical foundations that can clear the way for many individual cases of "learning from data" and solve their common difficulties in one go. Without theory, we tend to be biased and see only what we want to see. What theory do we need to identify regularity in data in practice, and to guarantee convincingly that it actually exists? Working through many real problems and data in the life sciences, we also seek a good general theory that can answer these fundamental questions.

Research Areas

  • Machine learning and its applications
  • Data mining algorithms for knowledge discovery
  • Computational biology and bioinformatics

Some Current Research Topics

  • Statistical theories for learning representations by sparsity-inducing principles
  • Statistical theories for graph-valued random variables
  • Statistical problems with network-structured variables
  • Constrained enumeration of significant substructural patterns
  • Statistical reverse engineering of multi-layer hierarchical systems with cross-layer coordinations
  • Non-linear genetic interactions over partly observed networks
  • Computational analysis of promiscuous molecular interactions

Publications

[Book chapters]

  • A Bioinformatics Approach for Understanding Genotype–Phenotype Correlation in Breast Cancer. [doi]
    Yotsukura S, Karasuyama M, Takigawa I, Mamitsuka H
    Big Data Analytics in Genomics 2016;397-428
  • An in silico model for interpreting polypharmacology in drug–target networks. [doi]
    Takigawa I, Tsuda K, Mamitsuka H
    In Silico Models for Drug Discovery (Methods in Molecular Biology) 2013;993:67-80
  • Identifying pathways of coordinated gene expression. [doi]
    Hancock T, Takigawa I, Mamitsuka H
    Data Mining for Systems Biology (Methods in Molecular Biology) 2013;939:69-85

[Refereed journal papers]

  • Genomic copy number variation analysis in multiple system atrophy. [doi]
    Hama Y, Katsu M, Takigawa I, Yabe I, Matsushima M, Takahashi I, Katayama T, Utsumi J, Sasaki H.
    Molecular Brain. 2017; Accepted.
  • Machine learning reveals orbital interaction in materials. [doi]
    Pham T L, Kino H, Terakura K, Miyake T, Tsuda K, Takigawa I, Dam H C
    Science and Technology of Advanced Materials. 2017; 18(1): 756-765.
  • Generalized sparse learning of linear models over the complete subgraph feature set. [doi]
    Takigawa I, Mamitsuka H
    IEEE Transactions on Pattern Analysis and Machine Intelligence. 2017; 39(3): 617-624. (supplementary file)
  • An online self-constructive normalized Gaussian network with localized forgetting. [doi]
    Backhus J, Takigawa I, Imai H, Kudo M, Sugimoto M
    IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences. 2017; E100.A (3): 865-876.
  • Machine-learning prediction of d-band center for metals and bimetals. [doi]
    Takigawa I, Shimizu K, Tsuda K, Takakusagi S
    RSC Advances. 2016; 6: 52587-52595.
    highlighted in the article Machine-learning accelerates catalytic trend spotting (Chemistry World)
  • Exploring phenotype patterns of breast cancer within somatic mutations: a modicum in the intrinsic code. [doi]
    Yotsukura S, Karasuyama M, Takigawa I, Mamitsuka H
    Briefings in Bioinformatics. 2016.
  • Dense core model for cohesive subgraph discovery. [doi]
    Kojaku S, Takigawa I, Kudo M, Imai H
    Social Networks. 2016; 44: 143–152.
  • Mining approximate patterns with frequent locally optimal occurrences. [doi]
    Nakamura A, Takigawa I, Tosaka H, Kudo M, Mamitsuka H
    Discrete Applied Mathematics. 2016; 200:123–152
  • Predictions of Cleavability of Calpain Proteolysis by Quantitative Structure-Activity Relationship Analysis Using Newly Determined Cleavage Sites and Catalytic Efficiencies of an Oligopeptide Array. [doi]
    Shinkai-Ouchi F, Koyama S, Ono Y, Hata S, Ojima K, Shindo M, duVerle D, Ueno M, Kitamura F, Doi N, Takigawa I, Mamitsuka H, Sorimachi H.
    Molecular & Cellular Proteomics. 2016; 15(4): 1262-80.
  • Ensemble and multiple kernel regressors: which is better? [doi]
    Tanaka A, Takebayashi H, Takigawa I, Imai H, Kudo M
    IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences. 2015; E98-A(11): 2315-2324.
  • The cell competition-based high-throughput screening identifies small compounds that promote the elimination of RasV12-transformed cells from epithelia. [doi]
    Yamaguchi H, Matsumaru T, Morita T, Ishikawa S, Maenaka K, Takigawa I, Semba K, Kon S, Fujita Y
    Scientific Reports. 2015;15336.
  • MED26 regulates the transcription of snRNA genes through the recruitment of little elongation complex. [doi]
    Takahashi H, Takigawa I, Watanabe M, Anwar D, Shibata M, Tomomori-Sato C, Sato S, Ranjan A, Seidel C W, Tsukiyama T, Mizushima W, Hayashi M, Ohkawa Y, Conaway J W, Conaway R C, Hatakeyama S
    Nature Communications. 2015;6(5941).
  • The impact of income disparity on vulnerability and information collection: an analysis of the 2011 Thai flood. [doi]
    Henry M, Kawasaki A, Takigawa I, Meguro K
    Journal of Flood Risk Management. (In Press).
  • Ribosomes in a stacked array: Elucidation of the step in translation elongation at which they are stalled during S-adenosyl-L-methionine-induced translation arrest of CGS1 mRNA. [doi]
    Yamashita Y, Kadokura Y, Sotta N, Fujiwara T, Takigawa I, Satake A, Onouchi H, Naito S
    Journal of Biological Chemistry. 2014;289(18):12693-704.
  • Similarity-based machine learning methods for predicting drug–target interactions: a brief review. [doi]
    H Ding, I Takigawa, H Mamitsuka, S Zhu
    Briefings in Bioinformatics. 2014;15(5):734-747 (Review Paper)
  • SiBIC: A web server for generating gene set networks based on biclusters obtained by maximal frequent itemset mining. [doi]
    Takahashi K, Takigawa I, Mamitsuka H
    PLoS One. 2013;8(12) e82890.
  • Fast algorithms for finding a minimum repetition representation of strings and trees. [doi]
    Nakamura A, Saito T, Takigawa I, Kudo M, Mamitsuka H
    Discrete Applied Mathematics. 2013;161(10-11):1556–1575
  • Graph mining: procedure, application to drug discovery and recent advances. [doi]
    Takigawa I, Mamitsuka H
    Drug Discovery Today. 2013;18(1-2):50-57 (Review Paper)
  • Identifying neighborhoods of coordinated gene expression and metabolite profiles. [doi]
    Hancock T, Wicker N, Takigawa I, Mamitsuka H
    PLoS One. 2012;7(2) e31345.
  • ROS-DET: robust detector of switching mechanisms in gene expression. [doi]
    Kayano M, Takigawa I, Shiga M, Tsuda K, Mamitsuka H
    Nucleic Acids Research. 2011;39(11): e74.
  • Mining significant substructure pairs for interpreting polypharmacology in drug-target network. [doi]
    Takigawa I, Tsuda K, Mamitsuka H
    PLoS One. 2011;6(2): e16999.
  • Efficiently mining delta-tolerance closed frequent subgraphs. [doi]
    Takigawa I, Mamitsuka H
    Machine Learning. 2011;82(2): 95-121.
  • A spectral approach to clustering numerical vectors as nodes in network. [doi]
    Shiga M, Takigawa I, Mamitsuka H
    Pattern Recognition. 2011;44(2): 236-251.
  • Mining metabolic pathways through gene expression. [doi]
    Hancock T, Takigawa I, Mamitsuka H
    Bioinformatics. 2010;26(17): 2128-2135.
  • On the performance of methods for finding a switching mechanism in gene expression. [doi]
    Kayano M, Takigawa I, Shiga M, Tsuda K, Mamitsuka H
    Genome Informatics. 2010;24: 69-83.
    (from the 10th Annual International Workshop on Bioinformatics and Systems Biology (IBSB2010), Kyoto, Japan, July 26-28, 2010)
  • Convex sets as prototypes for classifying patterns. [doi]
    Takigawa I, Kudo M, Nakamura A
    Engineering Applications of Artificial Intelligence. 2009;22(1): 101-108.
  • CaMPDB: a resource for calpain and modulatory proteolysis. [doi]
    duVerle D, Takigawa I, Ono Y, Sorimachi H, Mamitsuka H
    Genome Informatics. 2009;22: 202-214.
    (from the 9th Annual International Workshop on Bioinformatics and Systems Biology (IBSB2009), Boston, USA, July 27-29, 2009)
  • Efficiently finding genome-wide three-way gene interactions from transcript- and genotype-data. [doi]
    Kayano M, Takigawa I, Shiga M, Tsuda K, Mamitsuka H
    Bioinformatics. 2009;25(21): 2735-2743.
  • Field independent probabilistic model for clustering multi-field documents. [doi]
    Zhu S, Takigawa I, Zeng J, Mamitsuka H
    Information Processing & Management. 2009;45(5): 555-570.
  • Mining significant tree patterns in carbohydrate sugar chains. [doi]
    Hashimoto K*, Takigawa I*, Shiga M, Kanehisa M, Mamitsuka H (* equally contributed)
    Bioinformatics. 2008;24(16): i167-i173.
    (from ECCB'08 European Conference on Computational Biology, Cagliari, Italy, Sep 22-26, 2008)
  • Probabilistic path ranking based on adjacent pairwise coexpression for metabolic transcripts analysis. [doi]
    Takigawa I, Mamitsuka H
    Bioinformatics. 2008;24(2): 250-257.
  • Annotating gene function by combining expression data with a modular gene network. [doi]
    Shiga M, Takigawa I, Mamitsuka H
    Bioinformatics. 2007;23(13): i468-i478.
    (from the 15th Annual International Conference on Intelligent Systems for Molecular Biology (ISMB/ECCB 2007), Vienna, Austria, Jul 21-25, 2007)
  • Performance analysis of minimum L1-norm solutions for underdetermined source separation. [doi]
    Takigawa I, Kudo M, Toyama J
    IEEE Transactions on Signal Processing. 2004;52(3): 582-591.
  • The boosted/bagged subclass method.
    Takigawa I, Abe N, Shidara Y, Kudo M
    International Journal of Computing Anticipatory Systems. 2004;14: 311-320.
    (from the 6th International Conference on Computing Anticipatory Systems (CASYS'03), Liege, Belgium, Aug 11-16, 2003)

[Refereed conference papers]

  • Online EM for the Normalized Gaussian Network with Weight-Time-Dependent Updates. [doi]
    Backhus J, Takigawa I, Imai H, Kudo M, Sugimoto M.
    The 23rd International Conference on Neural Information Processing (ICONIP 2016) Kyoto, Japan, October 16–21, 2016
  • Reducing Redundancy with Unit Merging for Self-constructive Normalized Gaussian Networks. [doi]
    Backhus J, Takigawa I, Imai H, Kudo M, Sugimoto M.
    The 25th International Conference on Artificial Neural Networks (ICANN 2016), Barcelona, Spain, September 6-9, 2016.
  • Community change detection in dynamic networks in noisy environment. [WWW15]
    Koujaku S, Kudo M, Takigawa I, Imai H
    The 6th International Workshop on Modeling Social Media - Behavioral Analytics in Social Media, Big Data and the Web (MSM 2015), Florence, Italy, May 19, 2015
  • Theoretical analyses on ensemble and multiple kernel regressors. [JMLR proc]
    Tanaka A, Takigawa I, Imai H, Kudo M
    The 6th Asian Conference on Machine Learning (ACML2014), Nha Trang, Vietnam, November 26-28, 2014
  • Analyses on generalization error of ensemble kernel regressors. [doi]
    Tanaka A, Takigawa I, Imai H, Kudo M
    Proceedings of the Joint IAPR International Workshop on Statistical, Structural, and Syntactic Pattern Recognition (S+SSPR 2014), Joensuu, Finland, August 20-22, 2014.
    Lecture Notes in Computer Science, 2014;8621: 273-281.
  • Structural change point detection for evolutional networks. [link]
    Koujaku S, Kudo M, Takigawa I, Imai H
    Proceedings of the 2013 International Conference of Computational Statistics and Data Engineering, London, UK, July 3-5, 2013.
  • Extended analyses for an optimal kernel in a class of kernels with an invariant metric. [doi]
    Tanaka A, Takigawa I, Imai H, Kudo M
    Proceedings of the Joint IAPR International Workshop on Structural, Syntactic, and Statistical Pattern Recognition (SSPR&SPR 2012), Hiroshima, Japan, November 7-9, 2012.
    Lecture Notes in Computer Science, 2012;7627: 345-353.
  • Algorithms for finding a minimum repetition representation of a string. [doi]
    Nakamura A, Saito T, Takigawa I, Mamitsuka H, Kudo M
    Proceedings of the 17th Symposium on String Processing and Information Retrieval (SPIRE2010), 185-190, Los Cabos, Mexico, Oct 11-13, 2010.
  • Classification by reflective convex hulls. [doi]
    Kudo M, Nakamura A, Takigawa I
    Proceedings of the 19th International Conference on Pattern Recognition (ICPR2008), WeAT9.3, Tampa, Florida, USA, Dec 8-11, 2008.
  • A spectral clustering approach to optimally combining numerical vectors with a modular network. [doi]
    Shiga M, Takigawa I, Mamitsuka H
    Proceedings of the Thirteenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2007), 647-656, San Jose, CA, USA, Aug 12-15, 2007.
  • A probabilistic model for clustering text documents with multiple fields. [doi]
    Zhu S, Takigawa I, Zhang S, Mamitsuka H
    the 29th European Conference on Information Retrieval (ECIR 2007), Roma, Italy, Apr 2-5, 2007.
    Lecture Notes in Computer Science, 2007;4425: 331-342.
  • Applying Gaussian distribution-dependent criteria to decision trees for high-dimensional microarray data. [doi]
    Wan R, Takigawa I, Mamitsuka H
    VLDB Workshop on Data Mining in Bioinformatics, Seoul, Korea, Sep 11, 2006.
    Lecture Notes in Computer Science, 2006;4316: 40-49.
  • The convex subclass method: combinatorial classifier based on a family of convex sets. [doi]
    Takigawa I, Kudo M, Nakamura A
    the IAPR International Conference on Machine Learning and Data Mining in Pattern Recognition (MLDM 2005), Leipzig, Germany, Jul 9-11, 2005.
    Lecture Notes in Computer Science, 2005;3587: 90-99.
  • Projection learning based kernel machine design using series of monotone increasing reproducing kernel Hilbert spaces. [doi]
    Tanaka A, Takigawa I, Imai H, Kudo M, Miyakoshi M
    the 8th International Conference on Knowledge-Based Intelligent Information & Engineering Systems (KES2004), Wellington, New Zealand, Sep 20-24, 2004.
    Lecture Notes in Computer Science, 2004;3213: 1058-1064.
  • On the minimum L1-norm signal recovery in underdetermined source separation. [doi]
    Takigawa I, Kudo M, Nakamura A, Toyama J
    the 5th International Conference on Independent Component Analysis and Blind Signal Separation (ICA2004), Granada, Spain, Sep 22-24, 2004.
    Lecture Notes in Computer Science, 2004;3195: 193-200.
  • Error analysis of MAP solutions under Laplace prior in underdetermined blind source separation.
    Takigawa I, Kudo M, Toyama J, Shimbo M
    Proceedings of the Second International ICSC Symposium on Advances in Intelligent Data Analysis (AIDA'01), paper 1724-169, Bangor, U.K., June 19-22, 2001.
    (Proceedings CIMA'2001, ISBN 3-906454-26-6)
  • A modified LEGION using a spectrogram for speech segregation. [doi]
    Takigawa I, Kudo M, Toyama J, Shimbo M
    Proceedings of IEEE International Conference on Systems, Man, and Cybernetics (SMC'99), paper I 526-531, Tokyo, Japan, Oct 12-15, 1999.
    (ISBN 0-7803-5734-5, IEEE Catalog Number 99CH37028C)

[Unrefereed publications]

  • Mining patterns from glycan structures.
    Takigawa I, Hashimoto K, Shiga M, Kanehisa M, Mamitsuka H
    Proceedings of the International Beilstein Symposium on Glyco-Bioinformatics, 13-14, 2010. (Invited Talk, the International Beilstein Symposium on Glyco-Bioinformatics (Glyco-Bioinformatics2009), Potsdam, Germany, 4-8 October, 2009)
  • Combining vector-space and word-based aspect models for passage retrieval.
    Wan R, Ngoc Anh V, Takigawa I, Mamitsuka H
    Proceedings of 15th Text Retrieval Conference (TREC 2006), Gaithersburg, Maryland, Nov 14-17, 2006.
  • Subclass covering by balls for pattern classification.
    Takigawa I, Kudo M, Nakamura A
    Proceedings of The 2nd International Workshop on Ubiquitous Knowledge Network Environment, Sapporo, Japan, Mar 16-18, 2005.
  • The subclass method using adaptive sampling.
    Takigawa I, Abe N, Shidara M, Kudo M
    Proceedings of The 1st International Workshop on Ubiquitous Knowledge Network Environment, Sapporo, Japan, Nov 25-27, 2003.