BlackHC's Adventures

BlackHC's Adventures in the Dev World
http://blog.blackhc.net

A Riff on "The Slow Death of Scaling"
Sara Hooker’s essay “On the Slow Death of Scaling” (2026) is a thought-provoking piece that deserves a careful read. What follows is my interpretation of (or perhaps more accurately, a riff on) her arguments, where I’ll both steelman her position and push back where I think the evidence points elsewhere. Scaling...
Mon, 12 Jan 2026 00:00:00 +0000
http://blog.blackhc.net/2026/01/riff-on-death-of-scaling/

Active Learning vs. Data Filtering: Selection vs. Rejection
What is the difference between active learning (and active sampling) and data filtering? And why do we treat data selection differently during training versus before training? This post explores the fundamental distinction between active dataset selection and data filtering, which we will phrase as “selection vs. rejection” using an appeal to...
Sat, 17 May 2025 00:00:00 +0100
http://blog.blackhc.net/2025/05/active-learning-vs-filtering/

The Paradox of Polarization: When More Facts Backfire
This note explores a seemingly simple yet surprisingly profound example of how rational agents can diverge in their beliefs even when exposed to identical evidence. While the mathematical model we’ll examine is highly simplified, its core mechanism offers a potential lens through which to understand the complex dynamics of real-world...
Tue, 07 Jan 2025 00:00:00 +0000
http://blog.blackhc.net/2025/01/belief-polarization/

Why is the Bayesian Model Average the best choice?
Why is the Bayesian model average (BMA) often hailed as the optimal choice for rational actors making predictions under uncertainty? Is this claim justified, and if so, what’s the underlying logic? No seriously, why? 😂 Please point me to the right references on this.
Below, I enumerate some naive thoughts...
Sat, 10 Aug 2024 00:00:00 +0100
http://blog.blackhc.net/2024/08/BMA-optimality/

Function-Space Variational Inference and Label Entropy Regularization (#2)
In the first part of this two-part series on Function-Space Variational Inference (FSVI), we looked at the Data Processing Inequality (DPI). In this second part, we finally look at the relationship between FSVI, a method focusing on the Bayesian predictive posterior rather than the parameter space, and the DPI. We...
Sat, 11 Nov 2023 00:00:00 +0000
http://blog.blackhc.net/2023/11/sdpi_fsvi_2/

Data Processing Inequalities and Function-Space Variational Inference (#1)
In information theory, the data processing inequality (DPI) is a powerful concept. Informally, it tells us that processing data cannot increase the amount of contained information. In this two-part blog post, we will explore the DPI and its applications to function-space variational inference (FSVI). The data processing inequality examines how...
Mon, 14 Aug 2023 00:00:00 +0100
http://blog.blackhc.net/2023/08/sdpi_fsvi/

Bayesian Appropriation: Variational Inference = PAC-Bayes Optimization?
In this blog post, following the previous blog post on “Bayesian Appropriation: General Likelihood for Loss Functions”, we will examine and better understand parts of the paper “PACTran: PAC-Bayesian Metrics for Estimating the Transferability of Pretrained Models to Classification Tasks” (“PACTran”), which was presented as an oral at the ECCV...
Thu, 29 Jun 2023 00:00:00 +0100
http://blog.blackhc.net/2023/06/variational-inference-and-pac-bayes/

Bayesian Appropriation: General Likelihood for Loss Functions
In this blog post, we explore how some losses could be rewritten as a Bayesian objective using ideas from variational inference (hence the tongue-in-cheek “Bayesian Appropriation”). This can make it easier to see connections between loss functions and Bayesian methods (e.g. by spotting similar patterns in the wild). We will first provide...
Mon, 19 Jun 2023 00:00:00 +0100
http://blog.blackhc.net/2023/06/general_likelihood/

Understanding the Rao-Blackwell Theorem
The Rao-Blackwell theorem is a fundamental theorem in statistics that offers a powerful method for improving estimators by conditioning on sufficient statistics. It is named after two statisticians, C.R. Rao and David Blackwell, who independently discovered it. The theorem is relevant in many areas of statistics, including machine learning algorithms...
Sat, 03 Jun 2023 00:00:00 +0100
http://blog.blackhc.net/2023/06/rao-blackwell/

Simplicity Wins: How Large Language Models Will Revolutionize Software Engineering
Software engineering is on the brink of a revolution with the emergence of large language models (LLMs). LLMs are AI systems that have been trained on large amounts of data, allowing them to generate natural language text and source code. LLMs allow developers to specify intent using prompts, rather than...
Fri, 23 Dec 2022 00:00:00 +0000
http://blog.blackhc.net/2022/12/llm_software_engineering/

Research Idea: Encouraging Ensemble Diversity and Model Disagreement in Active Learning and Beyond
While training deep ensembles or BNNs, we should be able to maximize the BALD score (a model-disagreement metric) on unlabeled data as a regularizer to improve model diversity and active learning efficiency (or OOD detection) where it matters: on pool or evaluation-set data. Given the limited capacity of...
Fri, 28 Oct 2022 00:00:00 +0100
http://blog.blackhc.net/2022/10/diversify_active_learning_and_ensembles_via_BALD/

Research Idea: Approximating BatchBALD via "k-BALD"
This post introduces a family of much less expensive approximations for BatchBALD that might work well where BatchBALD works. You might have noticed that BatchBALD can be very, very slow. We can approximate BatchBALD using pairwise mutual information terms, leading to a new approximation, which we call 2-BALD, or generally, following...
Sat, 02 Jul 2022 00:00:00 +0100
http://blog.blackhc.net/2022/07/kbald/

Paper Review: Bayesian Model Selection, the Marginal Likelihood, and Generalization
The paper, accepted as a Long Oral at ICML 2022, discusses the (log) marginal likelihood (LML) in detail: its advantages, use-cases, and potential pitfalls, with an extensive review of related work. It further suggests using the “conditional (log) marginal likelihood (CLML)” instead of the LML and shows that it captures the...
Sat, 04 Jun 2022 00:00:00 +0100
http://blog.blackhc.net/2022/06/bayesian-model-selection-marginal-likehood-generalization/

On the Total Variation Distance
The definition of the total variation distance can be confusing (at least to me) as it is formulated as a supremum. There is a simpler formulation. We connect the two here and provide some intuitions. The reason for this post is that recently, I was looking at some more theoretical...
Sat, 26 Feb 2022 00:00:00 +0000
http://blog.blackhc.net/2022/02/total-variation-distance/

On Classification Metrics and an Alternative to the F1 Score
We express common performance metrics for classification tasks, such as recall and precision, using probabilities, then examine the F1 score and simplify it to a ratio that is easier to understand. The F1 score \(F\) is usually defined as the harmonic mean of precision and recall: \[ F_1 =...
Mon, 21 Feb 2022 00:00:00 +0000
http://blog.blackhc.net/2022/02/f1-score-linearization/

Research Idea: Intellectually Pleasing Outlier Exposure (with Applications in Active Learning)
This post discusses potential failure cases of outlier exposure (when using “fake” label distributions for outliers) and presents an intellectually pleasing version of outlier exposure in latent space, treating outliers as purely negative samples from a contrastive point of view. About this post: I repeat my motivation from the last post: during my day-to-day,...
Mon, 07 Feb 2022 00:00:00 +0000
http://blog.blackhc.net/2022/02/intellectually-pleasing-outlier-exposure/

Research Idea: Active Learning for NLP Models via Question Asking
During my day-to-day, I read papers and procrastinate from writing my thesis, so I often come up with high-level questions that I cannot research because I don’t have the experience, time, and computing resources. The following is such a research question which, if it has not been answered by someone else...
Mon, 10 Jan 2022 00:00:00 +0000
http://blog.blackhc.net/2022/01/al-for-large-nlp-models/

Reading the Deep Learning Book - Chapter 2
These are my notes and observations from reading the Linear Algebra chapter of the Deep Learning book. The following notes are presented in order of value to the reader. I start with a discussion of the Moore-Penrose pseudoinverse, followed by a short reflection on “broadcasting”. The Moore-Penrose pseudoinverse...
Tue, 28 Mar 2017 00:48:34 +0100
http://blog.blackhc.net/2017/03/dlb-chapter2/

Recent Medium Posts
I have published a couple of posts on Medium to see how it works:
A Dart REPL PoC - Hacking with Dart: a well-received post about https://github.com/BlackHC/dart_repl, a PoC interactive shell for Dart. I hope I’ve been able to restore some Karma points with the Dart team over...
Mon, 13 Mar 2017 21:48:34 +0000
http://blog.blackhc.net/2017/03/recent-medium/

Imitating PBRT-style literate programming in LaTeX
Today, I want to release another bit of code from my master’s thesis. This time it won’t be C++ code; instead, I’m going to release some LaTeX code that I used to display source code fragments. Specifically, the results look like this: You can download the accompanying example...
Sun, 24 Mar 2013 23:19:44 +0000
http://blog.blackhc.net/2013/03/imitating-pbrt-style-literate-programming-in-latex/