RESEARCH

PULSE

研究心拍

Discipline today.
Breakthrough tomorrow.

Activity archive

Daily logs and weekly reflections, newest first.

Weekly reflection

Jun 8, 2026 – Jun 14, 2026

2026-24

No reflection text.

Final weekly XP (after rules): 0 XP

Fri, June 12, 2026

0 XP

PAPERS (SKIM)

[AgentVista: Evaluating Multimodal Agents in Ultra-Challenging Realistic Visual Scenarios] - as the title suggests, the focus is on realism and task diversity.

Weekly reflection

Jun 1, 2026 – Jun 7, 2026

2026-23

No reflection text.

Final weekly XP (after rules): 34 XP

Sun, June 7, 2026

61 XP

EXPERIMENTS

Tested Qwen3-ASR with different sets of hyperparameters. Lessons: maybe I was wrong to try solving an already-solved problem; maybe I identified the problem incorrectly; I was wrong to force a solution onto a problem. This is not how to solve something.

WORDS WRITTEN: 400

Thu, June 4, 2026

0 XP

PAPERS (SKIM)

(SURF: Separation via Unsupervised Remixing Flow; ICML 2026, Google DeepMind) - a combination of supervised flow matching and regression-based self-supervised techniques; does not require clean source data; connects to the Wake-Sleep algorithm. This paper's idea is very creative.

EXPERIMENTS

Try constrained decoding; build a better mental model before fine-tuning.

Weekly reflection

May 25, 2026 – May 31, 2026

2026-22

No reflection text.

Final weekly XP (after rules): 0 XP

Sat, May 30, 2026

0 XP

PAPERS (SKIM)

Adaptive Contrastive Search: Uncertainty-Guided Decoding for Open-Ended Text Generation

EXPERIMENTS

Normalize Chinese text and fine-tune again; add a small trick for tokens predicted at chunk boundaries. Try to get more diverse hypotheses.

Thu, May 28, 2026

0 XP

EXPERIMENTS

Test TRELLIS
Integrate the Qwen3-ASR-0.6B error-correction (EC) checkpoint into the current streaming system

Tue, May 26, 2026

0 XP

EXPERIMENTS

Ultravox-v0.5-Llama3.2-1B is very bad at following instructions.

Damn, making a comeback today (18 days left)

Weekly reflection

May 18, 2026 – May 24, 2026

2026-21

No reflection text.

Final weekly XP (after rules): 0 XP

Mon, May 18, 2026

0 XP

EXPERIMENTS

Pruned 30% of the parameters in the Llama layers of Ultravox (Llama backbone). Overall performance is worse, but better than I expected for a training-free method.

Weekly reflection

May 11, 2026 – May 17, 2026

2026-20

Harder please.

Final weekly XP (after rules): 196 XP

Sun, May 17, 2026

0 XP

PAPERS (SKIM)

"A SIMPLE AND EFFECTIVE PRUNING APPROACH FOR LARGE LANGUAGE MODELS" - a training-free pruning technique leveraging weight magnitude and corresponding input activations, estimated using a small set of calibration data.

EXPERIMENTS

Further fine-tuning on waihu (both normalization + punctuation restoration), tested on waihu, but performance is worse than the original recognizer fine-tuned on waihu.

Let's try adding some novelty.

Fri, May 15, 2026

122 XP

PAPERS (DEEP)

"https://www.youtube.com/watch?v=ptFiH_bHnJw" - LLM Architectures, Hyperparameters. This talks about Rotary Position Embeddings (RoPE), pre norm and RMSNorm, SwigGLU/GeGLU, dropping bias terms. Hyperparamters: d_ff~=4xd_model, head_dim*n_heads~=d_model, model should be deep or wide -> d_model/n_layers ~ 10 to 100, vocab size: 30-50k for monoling and 100-250k for multiling, weight decay for better training losses, z-loss & Q, K matrix layer norm for training stability, KV cache, attention head.

EXPERIMENTS

Fine-tuned on punctuation-restoration data vs. a mix of punctuated and unpunctuated data. Fine-tuned with a learning rate of 1e-5 vs. 1e-4.

WORDS WRITTEN: 300

Countdown: 7 days. Also, just started thinking about my MPhil Thesis

Wed, May 13, 2026

0 XP

PAPERS (DEEP)

I read "Why vulnerability discovery is mathematically difficult" at "https://github.com/yo-yo-yo-jbo/vr_difficulty". Why did I read this? First, the topic belongs to CS. Second, it discusses the challenges AI faces in a specific domain, written by a domain expert. Random thought while reading: it would be cool if I could use a reduction approach in a future paper. Note: AI can't solve everything, because the Halting Problem is undecidable.

I should practice both thinking and speaking in both English and Vietnamese, since relying on one language for one skill will limit and confuse me when I use that skill in the other language.

Tue, May 12, 2026

62 XP

PAPERS (SKIM)

"When AVSR Meets Video Conferencing: Dataset, Degradation, and the Hidden Mechanism Behind Performance Collapse" - report performance degration of AVSR on video conferencing, construct a VC dataset,

EXPERIMENTS

Fine-tuning on mixed offline + online data. Then, fine-tuning on online data. The performance improves.

WORDS WRITTEN: 150

To-do for today: DynaCocktail, improve StreamCorrect's results on WeBank's data, sleep early

Mon, May 11, 2026

61 XP

PAPERS (SKIM)

EXPERIMENTS

Fine-tuned on offline error correction data before transferring to streaming error correction data. I also tried continuing fine-tuning from the previous checkpoint's optimizer state, and with this the loss diverged. I am not sure whether this is good or bad; it could be good if the significantly low losses meant excessive memorization.

WORDS WRITTEN: 650

- Realized I can only work alone or play a support role; I'm not good at leading.
- I read SOTA papers, but my experiments are still stuck on training data quality. Damn, how far behind the frontier am I :((?

Weekly reflection

May 4, 2026 – May 10, 2026

2026-19

No reflection text.

Final weekly XP (after rules): 34 XP

Sat, May 9, 2026

0 XP

PAPERS (SKIM)

"Pinyin Regularization in Error Correction for Chinese Speech Recognition with LLMs" - only use the BELLE model to collect the hypotheses, only infer at beam size of 10, one of the core technique is leveraging the Pinyin-regularized text. Gonna use the dataset in this paper to pretrain my error corrector.

Fri, May 8, 2026

Grace day

61 XP

Thu, May 7, 2026

0 XP

PAPERS (SKIM)

"Whispering LLaMA: A Cross-Modal Generative Error Correction Framework for Speech Recognition" - an old paper on error correction, I am thinking if there are any insights from this paper that could be applied to my work: weight initilization for unfreeze layers to better align weights of distinct models? Learnable adapter (but I already finetuned a speechlm)? Their prompt template?
"Chain of Correction for Full-Text Speech Recognition with LLMs" - correct documents, articles, news report,... segment by segment. The paper also includes a correction threshold to balance between under-correction and over-rephrasing. Interestingly, I realize that this full-text correction is quite similar to my current streaming correction, instead of segment by segment, streaming needs to handle chunk by chunk, a smaller unit of speech.

Learnt that during vibe coding, if I give the coding agent the bugs and it tries to patch them case by case, it's highly likely that there still exists other bugs/cases aren't fixed. Simply asking the model for a more generalizable algorithm could result in better fix.

Wed, May 6, 2026

0 XP

PAPERS (DEEP)

"Better & Faster Large Language Models via Multi-token Prediction" MTP wasn't scalable, the paper proposes scalable paradigm with no train time and memory overhead. Works naturally with self-speculative decoding -> 3x faster. 4-token decoding is optimal based on experiments.

Tue, May 5, 2026

0 XP

EXPERIMENTS

Qwen3-ASR-1.7B full-parameter fine-tuning - damn, even a little fine-tuning on the recognition task improves performance faster than plugging in an error corrector
Played around with the Claude API, thanks to the capstone

Chopin - Nocturne No. 21 in C Minor

Mon, May 4, 2026

0 XP

PAPERS (SKIM)

"Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding": using Bayesian Optimization to selectively skipping some layers of the original LLM, hence generating drafting tokens with faster inference speed.

Đang đọc "Better & Faster Large Language Models via Multi-token Prediction" mà đọc chậm vc

Weekly reflection

Apr 27, 2026 – May 3, 2026

2026-18

Spent nearly the whole week figuring out how to synthesize data with as little noise as possible, then fine-tuned Ultravox-v0.5-Llama3.2-1B and Qwen3-ASR-0.6B. Qwen3-ASR-0.6B was pretrained on the speech recognition task, so when it is fine-tuned on the error correction task its loss fluctuates terribly. Ultravox-v0.5-Llama3.2-1B does better; not sure whether that's because Ultravox is a general SpeechLM...

Final weekly XP (after rules): 49 XP

Sun, May 3, 2026

61 XP

PAPERS (SKIM)

SepPrune proposes a differentiable pruning technique for speech separation models. After masking the layers with high computational cost, the model is fine-tuned on the remaining parameters for 1 epoch, which recovers 86%+ of the performance of the original model fine-tuned for 493 epochs. The codebase of this paper isn't well documented. Pruning a model for 36x faster training is cool, but I am not sure whether I need this technique now. I just wonder why people who train foundation models don't use it.

EXPERIMENTS

16-bit LoRA fine-tuning on 2x data size.
16-bit LoRA fine-tuning on 2x data size, but removing samples where the top-1 hypothesis = ground truth to reduce bias towards top-1.
Full-parameter fine-tuning of Qwen3-ASR-0.6B. Losses are crazy.
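The top-1 filtering step could look roughly like this. It assumes each sample is a dict with an n-best list under `hypotheses` (best first) and the ground truth under `reference`; both field names and the example strings are made up for illustration.

```python
def drop_top1_correct(samples):
    """Keep only samples whose top-1 hypothesis differs from the ground truth."""
    kept = []
    for s in samples:
        hyps = s["hypotheses"]
        # Keep samples with no hypotheses too; there is no top-1 to be biased towards.
        if not hyps or hyps[0].strip() != s["reference"].strip():
            kept.append(s)
    return kept

# Hypothetical examples: the first is already correct at top-1 and gets dropped.
examples = [
    {"hypotheses": ["turn on the light", "turn on the lights"],
     "reference": "turn on the light"},
    {"hypotheses": ["recognize speech", "wreck a nice beach"],
     "reference": "wreck a nice beach"},
]
filtered = drop_top1_correct(examples)
```

An exact string match is the simplest criterion; comparing after text normalization would be stricter about what counts as "already correct".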

How much noise is acceptable in training data? I know this varies across models and tasks. There are obviously papers on this; at this stage, I only need a feel for how noisy my data will be while planning the model architecture and data preparation pipeline.

Sat, May 2, 2026

0 XP

Building this website took the whole damn afternoon.