a picture of me

Kui Tang

Machine Teacher

When you are perfectly free to feel stuck or not stuck, then you're unstuck.
Alan Watts

I have worked on machine learning research for nearly five years, working with Prof. Tony Jebara while I was an undergrad at Columbia University and then enrolling in the PhD program there to work with Prof. David Blei as well. I've worked on approximate inference in graphical models, computation for Bayesian machine learning, convex optimization, and vector models for words, retail products, and other domains. After receiving my M.S., I left in order to focus on real-world applications of machine learning.

I have also completed graduate internships at Palantir and Microsoft Research, where in addition to applied modeling and communicating methods and results to stakeholders, I also helped build production data pipelines and worked with Azure, Spark, and other big-data platforms.

Below is some of the academic work I did.


  1. K. Tang, N. Ruozzi, D. Belanger, T. Jebara. "Bethe Learning of Conditional Random Fields via MAP Decoding". Artificial Intelligence and Statistics (AISTATS) 2016. [pdf] [bib] [code]
  2. A. Weller, K. Tang, D. Sontag, T. Jebara. "Approximating the Bethe Partition Functions". Uncertainty in Artificial Intelligence (UAI) 2014. [pdf] [code] [bib]
  3. M. Kambadur, K. Tang, M. Kim. "ParaShares: Finding the Important Basic Blocks in Multithreaded Programs." Euro-Par 2014. [pdf] [bib]
  4. K. Tang, A. Weller and T. Jebara. "Network Ranking with Bethe Pseudomarginals". NIPS Workshop on Discrete Optimization in Machine Learning 2013. [pdf] [bib]
  5. K. Choromanski, T. Jebara and K. Tang. "Adaptive Anonymity via b-Matching". Neural Information Processing Systems (NIPS) 2013. [pdf] [supp] [code] [bib]
  6. M. Kambadur, K. Tang, J. Lopez, and M. Kim. "Parallel scaling properties from a basic block view". International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS) 2013. [pdf] [bib]
  7. M. Kambadur, K. Tang, and M. Kim. "Parallel Block Vectors: Collection, Analysis, and Uses". IEEE Micro, 33(3):86-94, 2013. [pdf] [code] [bib]
  8. M. Kambadur, K. Tang, and M. Kim. "Harmony: Collection and Analysis of Parallel Block Vectors". International Symposium for Computer Architecture (ISCA), Jun. 2012. [pdf] [code] [bib]

Invited Talks

Bethe Learning of Conditional Random Fields via MAP Decoding.
(Cambridge University Engineering Department, May 2016) [slides]
Machine Learning: Lesson for Emerging Scholars Program
(Columbia University, November 2015) An introductory lesson to machine learning, covering supervised and unsupervised learning, as well as distributed computation. Interactive discussion and problem-solving prompts included. Designed for first- and second-year undergraduates with no prior background. [slides]
Adaptive Anonymity via b-Matching
(UMass Amherst Machine Learning and Friends Lunch, February 2014) [slides]
Statistical Machine Learning with Bayesian Networks
(Columbia Data Science Society, November 2013; hackNY Masters, New York University, September 2013.) I teach the fundamentals of Bayesian networks, MCMC inference, and probabilistic programming to undergraduates and masters students with little previous exposure to statistics or machine learning. [IPython notebook] [github]
The Computational Complexity of Probabilistic Inference Using Bayesian Belief Networks
(Machine Learning Reading Group, Columbia University, June 2012.) I walk through the proof by G.F. Cooper that posterior inference in Bayesian networks is NP-hard. The proof reduces from 3SAT using an elegant gadget graph. [slides]


See also my GitHub page for projects not listed here.
mexcpp: Easier MATLAB and C++ Integration
This library makes it easier to write MEX extensions by providing a natural C++ syntax to read and write MATLAB structures with no runtime penalty thanks to template metaprogramming. Arays, cells, and structures are all supported. [github]
Polynomial Time Inference of Bethe Partition Function and Marginals
Code to minimize the Bethe free energy up to additive error via discrete optimization, as well as continuous methods to minimize the Bethe free energy over various polytopes (but with no guarantees). See our UAI 2014 paper. [tarball]
Adaptive Anonymity via b-Matching
Safely release a sensitive database for analysis by censoring selected entries. We compute the censored entries by b-matching, which results in more released data compared to the state of the art, while maintaining the same privacy guarantee. See our NIPS 2013 paper. [tarball] [instructions] [b-match solver documentation]
Profile multithreaded programs: measure the number of times each line of code was run and how many threads ran it at the same time. This can be used to detect concurrency bottlenecks and analyze global program efficiency. See our ISCA and Micro papers. [homepage]


Below are notes on various mathematical topics, course projects, and unpublished manuscripts.

Review Notes
To review for finals, I like to condense the content of a course into 10-20 pages of notes. The following may be useful references:
Quasi Monte Carlo Inference for Log Gaussian Cox Processes
(With Jessica Forde and Liam Paninski, 2013) We give a self-contained introduction to quasi Monte Carlo, a deterministic substitute to Monte Carlo, and then use it for a Bayesian inference for a computationally difficult problem, including embedding quasi Monte Carlo samples into a Metropolis-Adjusted Langevin algorithm. Encouraging convergence rate speedup results are presented. [paper] [slides]
Inferring Direct and Indirect Functional Connectivity Between Neurons From Multiple Neural Spike Train Data
(With Ben Shababo and Frank Wood, 2012) We aim to model the functional connectivity of neural microcircuits by accounting for (1) unobserved neurons and (2) long-run connections, two problems not yet adequately addressed in the literature. Encouraging simulation and lab-data results are analyzed. [paper]
Personalized Emotion Classification with Latent Dirichlet Allocation
(2012) Work on sentiment analysis of text has focused on scalar values of sentiment. We instead represent emotions multidimensionally as topic vectors and learn a personalized emotion lexicon. We take a semi-supervised approach with latent Dirichlet allocation (LDA) and a derivative, SubjLDA, initializing our model with a psychologically validated seed corpus. We demonstrate significant improvements to the baseline. [paper]