Works
Foundations of Natural Language Processing
This is a Japanese textbook on the foundations of Natural Language Processing based on Deep Learning. It is intended primarily for beginners starting NLP research and for software developers who want to strengthen their theoretical understanding.
NLP 100 Exercise
NLP 100 Exercise is a boot camp for building skills in programming, data analysis, and research through practical and exciting assignments. It covers UNIX commands, regular expressions, part-of-speech tagging, dependency parsing, word embeddings, and deep neural networks for the research and development of Natural Language Processing.
Machine Learning Notebook
Machine Learning Notebook aims to realize a 'note' for learning machine learning as a new form of a notebook enhanced by computers. It covers the theories and implementations of machine learning methods such as regression, classification, clustering, and principal component analysis. It is used in the lecture on "Machine Learning" (CSC.T254), Tokyo Institute of Technology.
Python Quick Reference
Python Quick Reference provides Jupyter notebooks for taking a quick tour of Python programs and their executions. It covers the basics of Python as well as useful libraries such as NumPy and Matplotlib. One can start learning Python on Google Colaboratory and Amazon SageMaker Studio Lab.
Introduction to Deep Learning
This site provides slides and Jupyter Notebooks about Natural Language Processing based on Deep Learning, including word embeddings, RNN, LSTM, CNN, sequence-to-sequence models, attention mechanism, Transformer, GPT, and BERT. It is used in the latter half of the lecture, "Advanced Machine Learning" (ART.T458), Tokyo Institute of Technology.
libLBFGS
libLBFGS is a C port of the implementation of the Limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) method written by Jorge Nocedal in FORTRAN. Unlike C code generated automatically by f2c (a Fortran 77 to C converter), this port includes changes based on my interpretations, improvements, optimizations, and enhancements.
SimString
SimString is an implementation of a simple and efficient algorithm for approximate string matching, which retrieves strings in a database whose similarity with a query string is no smaller than a threshold. SimString facilitates various applications including spelling correction, fuzzy search, and approximate dictionary matching.
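The retrieval task described above can be illustrated with a brute-force sketch. This is not SimString's algorithm or API: it scans every string instead of using an index, and it assumes character trigram features with cosine similarity. The names `char_ngrams`, `cosine`, and `retrieve` are hypothetical and chosen only for this example.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return a multiset of character n-grams, padding both ends with spaces."""
    padded = " " * (n - 1) + text + " " * (n - 1)
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram multisets (0.0 if either is empty)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    sq_a = sum(c * c for c in a.values())
    sq_b = sum(c * c for c in b.values())
    if sq_a == 0 or sq_b == 0:
        return 0.0
    return dot / math.sqrt(sq_a * sq_b)


def retrieve(database, query, threshold, n=3):
    """Return every string whose similarity to the query is >= threshold.

    Assumption: a linear scan over the database, for illustration only.
    """
    q = char_ngrams(query, n)
    return [s for s in database if cosine(q, char_ngrams(s, n)) >= threshold]
```

For instance, `retrieve(["apple", "apply", "banana"], "apple", 0.5)` keeps "apple" and "apply", which share four of their seven padded trigrams, and drops "banana". A linear scan like this becomes slow on large databases, which is the cost an efficient algorithm is meant to avoid.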
Biography
- 2017-: Professor, Okazaki Lab, Artificial Intelligence Course, Department of Computer Science, School of Computing, Tokyo Institute of Technology
- 2011-2017: Associate Professor, Inui-Okazaki Lab, Graduate School of Information Sciences, Tohoku University
- 2007-2011: Researcher, Tsujii Lab, Graduate School of Information Science and Technology, University of Tokyo
- 2003-2007: PhD Course, Department of Information and Communication Engineering, Graduate School of Information Science and Technology, University of Tokyo
- 2001-2003: Master Course, Department of Information and Communication Engineering, Graduate School of Information Science and Technology, University of Tokyo
- 1997-2001: Bachelor Course, Department of Information and Communication Engineering, School of Engineering, University of Tokyo
Awards
- Best Paper Award, the 30th Annual Meeting of The Association for Natural Language Processing, Swallow Corpus: Japanese Large-Scale Web Corpus (2024)
- Best Paper Award, the 30th Annual Meeting of The Association for Natural Language Processing, Constructing Large Language Models with Strong Japanese Capability Through Continual Pre-training (2024)
- Encouragement Award, the 18th Symposium of Young Researcher Association for NLP Studies, Constructing Document-Level Relation Extraction Corpora in Japanese (2023)
- Sponsorship Award (PKSHA Technology), the 18th Symposium of Young Researcher Association for NLP Studies, OUTFOX: LLM-generated Essay Detection through In-context Learning with Adversarially Generated Examples (2023)
- Sponsorship Award (HAKUHODO Technologies), the 18th Symposium of Young Researcher Association for NLP Studies, OUTFOX: LLM-generated Essay Detection through In-context Learning with Adversarially Generated Examples (2023)
- Best Paper Award (first place), the 29th Annual Meeting of The Association for Natural Language Processing, DREEAM: Guiding Attention with Evidence for Improving Document-Level Relation Extraction (2023)
- Best Paper Award, the 29th Annual Meeting of The Association for Natural Language Processing, Semantic Specialization for Knowledge-based Word Sense Disambiguation (2023)
- Sponsorship Award (Hitachi), the 29th Annual Meeting of The Association for Natural Language Processing, Query Suggestion and Summarization: Generating Query-Summary Pairs for Query-Focused Summarization (2023)
- Special Committee Award, the 29th Annual Meeting of The Association for Natural Language Processing, Solving NLP Problems through Human-System Collaboration: A Discussion-based Approach (2023)
- Best Paper Award (first place), the Association for Natural Language Processing, Optimizing Word Segmentation for Downstream Tasks by Weighting Text Vector (2022)
- Best Paper Award, the 28th Annual Meeting of The Association for Natural Language Processing, IMPARA: Impact-based Metrics for GEC using PARAllel Data (2022)
- Special Committee Award, the 28th Annual Meeting of The Association for Natural Language Processing, Non-autoregressive Generation using the Nearest Neighbor (2022)
- Special Committee Award, the 28th Annual Meeting of The Association for Natural Language Processing, Selective Prediction for Evaluating Confidence of Knowledge in Language Models (2022)
- Special Committee Award, the 28th Annual Meeting of The Association for Natural Language Processing, An Automatic Selection Method for Thumbnail Image using Movie Title (2022)
- AKBC2021 Outstanding Paper Award, Behavioral Testing of Knowledge Graph Embedding Models for Link Prediction (2021)
- Best Paper Award, the 27th Annual Meeting of The Association for Natural Language Processing, Hyponymy Detection using Hierarchical Code Learning (2021)
- Committee Special Award, the 27th Annual Meeting of The Association for Natural Language Processing, Headline Generation that Reliably Contains the Specified Words (2021)
- Sponsor Award, the 27th Annual Meeting of The Association for Natural Language Processing, Headline Generation that Reliably Contains the Specified Words (2021)
- TokyoTech Education Award 2020, University-wide Education Program of Data Science and Artificial Intelligence for Graduate Students (2021)
- Presentation award, the 15th NTCIR, WER99 at the NTCIR-15 QA Lab-PoliInfo-2 Classification Task (2020)
- Winner, Video-guided Machine Translation (VMT) Challenge 2020, Keyframe Segmentation and Positional Encoding for Video-guided Machine Translation Challenge 2020 (2020)
- Language resource award, the 26th Annual Meeting of The Association for Natural Language Processing, Style Transfer for Abstractive Summarization in a Small-scale Resource (2020)
- Best Paper Award, the 240th Meeting of Special Interest Group of Natural Language Processing (SIGNL), Information Processing Society of Japan (IPSJ), Re-examining the Task of Headline Generation based on Textual Entailment (2019)
- JSAI 2018 Best Paper Award, Learning to Compose Distributed Representations of Relational Patterns (2018)
- Best Paper Award, the 24th Annual Meeting of The Association for Natural Language Processing, Reducing odd generation in neural headline generation (2018)
- 2016 Microsoft Research Award on Information Processing (2017)
- PACLIC-30 Best Paper Honorable Mentions, Recognizing Open-Vocabulary Relations between Objects in Images (2016)
- The 15th Funai Scholarly Award (2016)
- The Young Scientists' Prize, the Commendation for Science and Technology by the Minister of Education, Culture, Sports, Science and Technology (2016)
- Best Paper Award (supreme), the 22nd Annual Meeting of The Association for Natural Language Processing, Model for Selectional Preference using Context Information based on Distributed Representation (2016)
- Best Paper Award, the 22nd Annual Meeting of The Association for Natural Language Processing, Dynamic Distributed Representation of Local Context within Discourse (2016)
- Research award, IPSJ SIG of Natural Language Processing, Contextualizing Selectional Preference Model based on Distributed Representation (2016)
- PACLIC-29 Best Paper Award (Computation), Reducing Lexical Features in Parsing by Word Embeddings (2015)
- Docomo Mobile Science Award on Cutting-Edge Technology (2015)
- Research award, IPSJ SIG of Natural Language Processing, Additive Composition of Logarithm of Co-occurrence Vector (2015)
- Best Paper Award (supreme), the 21st Annual Meeting of The Association for Natural Language Processing, Semantic Computation of Relation Patterns based on Compositionality (2015)
- Research Award for Young Scientists, Minoru Ishida Foundation (2014)
- Research award, ARG SIG Web Intelligence and Interaction (WI2), Follow-up Analysis of Tweets regarding the Cover of Japanese "Artificial Intelligence" Journal (2014)
- AMT2014 Best Paper Award, Mining False Information on Twitter for a Major Disaster Situation (2014)
- Best Paper Award (supreme), the 20th Annual Meeting of The Association for Natural Language Processing, Distributional Semantic Representation of Words and Phrases based on Gauss Distribution (2014)
- Best Paper Award, Journal of Natural Language Processing, Extracting False Information on Twitter and Analyzing its Diffusion Processes by using Linguistic Patterns for Correction (2014)
- Best Paper Award, the 26th Annual Meeting of the Society for Risk Analysis Japan, Analysis and Countermeasure of Rumors about Peaches Grown in Fukushima Prefecture based on Twitter (2013)
- Best Paper Award, Journal of Natural Language Processing, Generalization of Semantic Roles in Automatic Semantic Role Labeling (2011)
- Best Paper Award for Young Researcher of IPSJ National Convention, Fast Algorithm for Approximate Dictionary Matching (2011)
Services (international)
Executives
- Members-at-Large (MAL), Asian Federation of Natural Language Processing (AFNLP) (2017-2018)
International journals
- Editorial board, Computational Intelligence (2015-)
- Action Editor, Transactions of the Association for Computational Linguistics (2023-2025)
- Standing reviewer team, Transactions of the Association for Computational Linguistics (2014-2023)
- Reviewer, AI Communications (2014)
- Reviewer, IEEE/ACM Transactions on Audio, Speech, and Language Processing (2019)
- Reviewer, American Society for Information Science and Technology (2009)
- Reviewer, Applied Clinical Informatics (2014)
- Reviewer, Bioinformatics (2016, 2017)
- Reviewer, BMC Bioinformatics (2010)
- Reviewer, Cheminformatics (2014)
- Reviewer, Computer Speech and Language (2022)
- Reviewer, Computational Intelligence (2011, 2012, 2013)
- Reviewer, Computers in Industry (2015)
- Reviewer, Data and Knowledge Engineering (2016)
- Reviewer, IEICE Transactions on Information and Systems (2010, 2012, 2016)
- Reviewer, IEEE Transactions on Neural Networks and Learning Systems (2016)
- Reviewer, Information Processing (2015)
- Reviewer, Journal of Information Processing (2017)
- Reviewer, Information Processing and Management (2011)
- Reviewer, Information Sciences (2011)
- Reviewer, Knowledge-Based Systems
- Reviewer, Journal of Cheminformatics (2014)
- Reviewer, Language Resources and Evaluation (2012, 2019)
- Reviewer, Machine Learning Research (2009, 2012, 2015, 2016)
- Reviewer, New Generation Computing (2019)
- Reviewer, Transactions of the Association for Computational Linguistics (2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021)
- Reviewer, Transactions on Knowledge and Data Engineering (2012)
- Reviewer, Transactions on Management Information Systems (2013)
- Reviewer, Journal of Medical Internet Research (2017, 2019)
- Reviewer, IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) (2021, 2022)
International conferences
- Program chair, ACL 2023
- Tutorial chair, ACL 2022, LREC-COLING 2024
- Workshop chair, IJCNLP 2013, EMNLP-IJCNLP 2019
- Publication chair, EMNLP-CoNLL 2012
- Senior area chair, EMNLP 2023, EMNLP 2024, COLING 2025
- Area chair, ACL 2012, 2016, 2021, EMNLP 2022, LREC 2022
- Action editor, ACL Rolling Review (ARR) 2021, 2022, 2023, 2024
- Senior Program Committee (SPC), AAAI 2021, 2022, 2023
- Senior Program Committee (SPC), IJCAI 2020, 2021, 2022
- Program Committee, AAAI 2011, 2014, 2015, 2017, 2018, 2019, 2020
- Program Committee, AACL 2020
- Program Committee, ACL 2009, 2010, 2013, 2015, 2016, 2017, 2018, 2019, 2020
- Program Committee, ACL Rolling Review (ARR) 2021
- Program Committee, BigComp 2015, 2016
- Program Committee, BioNLP 2011, 2013, 2015, 2016, 2017, 2018, 2020
- Program Committee, BioTxtM 2012, 2014, 2016
- Program Committee, Coling 2008, 2010, 2012, 2014, 2016, 2018, 2020, 2022
- Program Committee, CoNLL 2014, 2015
- Program Committee, Conference on Language Modeling (COLM) 2024
- Program Committee, DTMBIO 2012
- Program Committee, EACL 2012, 2014, 2017
- Program Committee, EDB 2016
- Program Committee, EMNLP 2010, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021
- Program Committee, GeCKo 2020
- Program Committee, ICLR 2020, 2021, 2022
- Program Committee, IJCAI 2011, 2016, 2018, 2019
- Program Committee, IJCNLP 2011, 2017
- Program Committee, KIKE 2016
- Program Committee, LOUHI 2018
- Program Committee, LREC 2018, 2020
- Program Committee, NAACL 2016, 2018, 2021
- Program Committee, NeurIPS 2021, 2022
- Program Committee, NewSum 2021, 2023
- Program Committee, SemEval 2020
- Program Committee, SMBM 2010, 2012
- Program Committee, SPNLP 2022
- Program Committee, W-NUT 2016, 2017, 2018, 2020, 2021
- General chair, Young Researchers Symposium on Natural Language Processing 2016 (YRSNLP 2016)