CS224N-Lecture01-Word Vectors

2025-10-23

NLP

1.8k words

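WordNet. Comparison (ordinary dictionary / WordNet):
How it is built: written by human authors for readers / written by linguists for computers
Content structure: each word defined independently, with example sentences / word senses (synsets) connected to one another in a network
Main use: looking up meanings and spellings / semantic reasoning and similarity computation by computers
Data form: text / graph (sense nodes + semantic edges)
Problems with WordNet. One-hot. Problem of similarity. Because the WordNet approach fails, we want to “learn vectors” instead of listing them by hand. Distributional semantics: word vector. Word vector = word embedding = word representation. “Distributed” means that a word’s meaning is spread across many dimensions rather than concentrated in a single one. Each dimension captures part of the meaning (possibly related to syntax, context, or topic). For example, dimension 7 of “banking” might reflect “financial-ness” and dimension 19 “institution-ness”, and so on. Semantic information is therefore stored as a distributed encoding. Word2vec. Core concepts: Word2...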
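
The “problem of similarity” with one-hot vectors can be seen in a few lines. A minimal sketch, assuming a tiny hypothetical vocabulary (`VOCAB`, `one_hot`, and `cosine` are illustrative names, not from the lecture): distinct one-hot vectors are always orthogonal, so even near-synonyms such as “hotel” and “motel” get cosine similarity 0.

```python
import math

# Hypothetical toy vocabulary; a real vocabulary has tens of thousands of words or more.
VOCAB = ["hotel", "motel", "banking", "river"]

def one_hot(word, vocab=VOCAB):
    """Return the one-hot vector of `word`: 1 at its index, 0 everywhere else."""
    vec = [0.0] * len(vocab)
    vec[vocab.index(word)] = 1.0
    return vec

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

sim_same = cosine(one_hot("hotel"), one_hot("hotel"))
sim_synonyms = cosine(one_hot("hotel"), one_hot("motel"))
print(sim_same, sim_synonyms)  # 1.0 0.0
```

Every pair of different words scores 0, so one-hot vectors carry no notion of similarity at all; this is the motivation for learning dense vectors instead.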

Classic Types of Integer Programming Problems

2025-10-11

Uncategorized

1.2k words

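BIP with additional constraints. Constraints: “If investment 2 is made, then investment 4 must also be made” means: if $x_2 = 1$ then $x_4 = 1$, i.e. $x_2 \le x_4$. “If investment 1 is made, then investment 3 cannot be made” means: if $x_1 = 1$ then $x_3 = 0$, i.e. $x_1 + x_3 \le 1$. 0-1 Knapsack Problem. Integer Knapsack Problem. Assignment problem. Set covering problem. Note that the condition here must hold for every i and every j. Traveling salesman problem. Note: when solving, we do not know in advance which subset S will form a separate small cycle (subtour). To rule out every such case, we must impose the constraint for all possible S. Brute Force. The Knapsack and Covering Problems. Let () be an artificially constructed, hypothetical scenario; this is a special example, not...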
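
A small sketch that enumerates all 0/1 assignments to check that the linear constraints $x_2 \le x_4$ and $x_1 + x_3 \le 1$ express exactly the two investment conditions in the excerpt above (2 implies 4; not both 1 and 3), followed by a brute-force solver for a 0-1 knapsack instance. The function names and the item values, weights, and capacity are hypothetical; brute force tries all $2^n$ choices, so it is only practical for very small n.

```python
from itertools import product

def linear_ok(x):
    """Linear BIP constraints: x2 <= x4 and x1 + x3 <= 1."""
    x1, x2, x3, x4 = x
    return x2 <= x4 and x1 + x3 <= 1

def logical_ok(x):
    """The same two conditions stated as logic."""
    x1, x2, x3, x4 = x
    two_implies_four = (not x2) or bool(x4)
    not_one_and_three = not (x1 and x3)
    return two_implies_four and not_one_and_three

assignments = list(product((0, 1), repeat=4))
encodings_agree = all(linear_ok(x) == logical_ok(x) for x in assignments)
feasible_count = sum(linear_ok(x) for x in assignments)  # 9 of the 16 assignments

def knapsack_brute_force(values, weights, capacity):
    """Try every 0/1 choice vector and keep the best one that fits."""
    best_value, best_choice = 0, (0,) * len(values)
    for choice in product((0, 1), repeat=len(values)):
        weight = sum(w * c for w, c in zip(weights, choice))
        value = sum(v * c for v, c in zip(values, choice))
        if weight <= capacity and value > best_value:
            best_value, best_choice = value, choice
    return best_value, best_choice

# Hypothetical instance: 3 items, capacity 9.
print(encodings_agree, feasible_count)                  # True 9
print(knapsack_brute_force([10, 13, 7], [5, 6, 3], 9))  # (20, (0, 1, 1))
```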

Information from Convexity and Concavity

2025-10-11

Mathematics

2.8k words

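Jensen’s inequality. Note: for a convex function, the average of the function values is at least the function value at the average point, $E[f(X)] \ge f(E[X])$. Non-negativity of KL-Distance. Maximum Entropy Theorem. The maximum entropy theorem follows from the non-negativity of the KL distance. The KL divergence is always non-negative: $D(P \| U) \ge 0$, where $U$ is the uniform distribution on the $|X|$ possible values. Expanding: $D(P \| U) = \sum_x p(x) \log \frac{p(x)}{u(x)}$. Note that $u(x) = 1/|X|$, so $D(P \| U) = \log|X| - H(X)$. Therefore $H(X) \le \log|X|$. Equality condition: when $P = U$ (i.e., X is uniformly distributed), the KL divergence is zero and the upper bound is attained. Strictly concave function ⇒ unique extremum. We already know that $H$ is strictly concave (with natural logarithms, the second derivative of each term $-p_i \log p_i$ is $-1/p_i < 0$). For a strictly concave function f on a convex constraint set (such as the probability simplex), a stationary point, if one exists, is the unique global maximum. In other words: strict concavity + linear constraints ⟹ a unique maximizer. What does |X| mean here? Here |X| is the number of distinct values the random variable X can take, that is, the cardinality of its sample space.
Difference between KL divergence and mutual information (aspect: KL divergence / mutual information):
Sample space: one-dimensional / two-dimensional
What is compared: true distribution vs. model distribution / true joint distribution vs. the distribution that assumes independence
Geometric meaning: measures “the distance between two distributions” / measures “how far the true dependence deviates from independence”
Information meaning: model error / between variables...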
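
A quick numeric check of the maximum entropy argument, assuming natural logarithms (nats) and a hypothetical distribution on four outcomes: $D(P \| U)$ computed directly equals $\log|X| - H(P)$, is non-negative, and is zero when $P = U$, where $H(U) = \log|X|$. The helper names are illustrative.

```python
import math

def entropy(p):
    """Shannon entropy H(P) in nats; outcomes with p(x) = 0 contribute 0."""
    return -sum(px * math.log(px) for px in p if px > 0)

def kl_divergence(p, q):
    """D(P || Q) in nats; assumes q(x) > 0 wherever p(x) > 0."""
    return sum(px * math.log(px / qx) for px, qx in zip(p, q) if px > 0)

p = [0.5, 0.25, 0.125, 0.125]  # hypothetical distribution, |X| = 4
n = len(p)
u = [1.0 / n] * n              # uniform distribution U on the same 4 values

# D(P || U) = log|X| - H(P) >= 0, hence H(P) <= log|X|.
gap = kl_divergence(p, u)
print(round(gap, 6), round(math.log(n) - entropy(p), 6))  # 0.173287 0.173287
# Equality case P = U: the divergence vanishes and H(U) reaches log|X|.
print(kl_divergence(u, u), entropy(u), math.log(n))
```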

Generalized least squares estimators

2025-10-01

Mathematics

66 words

Grouped data. Generalized least squares estimators: unknown form of variance.

The Goldfeld–Quandt test

2025-09-29

Mathematics

105 words

The Goldfeld–Quandt test. How to correct heteroskedasticity. Generalized least squares estimators: known form of variance.

Heteroskedasticity

2025-09-28

Mathematics

1.1k words

Where heteroskedasticity comes from. The passage is describing heteroskedasticity, a situation where the variance of the error term $\varepsilon$ is not constant but depends on the explanatory variable $x$. In ordinary regression we usually assume homoskedasticity: $\mathrm{Var}(\varepsilon \mid x) = \sigma^2$, the same for all values of $x$. Here, by contrast, when $x$ is large in magnitude, the spread (variance) of the errors is also larger. In probabilistic terms, if $\mathrm{Var}(\varepsilon \mid x)$ grows with $x$, then the probability that $\varepsilon$ takes on large positive or ne...
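
A small simulation of this idea, assuming a hypothetical variance function in which the error standard deviation is proportional to |x| (sd = 0.5·|x|); the function names and sample sizes are illustrative. Errors drawn at x = 10 have a much larger variance than errors drawn at x = 1, and land beyond ±2 far more often.

```python
import random
import statistics

def simulate_errors(x_values, scale=0.5, seed=0):
    """Draw one error per x from N(0, (scale * x)^2): the spread grows with |x|.

    The variance function (scale * x)^2 is a hypothetical choice for illustration.
    """
    rng = random.Random(seed)
    return [rng.gauss(0.0, scale * abs(x)) for x in x_values]

def share_beyond(errors, threshold):
    """Fraction of errors whose absolute value exceeds `threshold`."""
    return sum(abs(e) > threshold for e in errors) / len(errors)

errors_small_x = simulate_errors([1.0] * 2000, seed=1)   # s.d. 0.5
errors_large_x = simulate_errors([10.0] * 2000, seed=2)  # s.d. 5.0

# Sample variances come out near 0.25 and 25 respectively.
print(statistics.pvariance(errors_small_x), statistics.pvariance(errors_large_x))
# Large |error| values are common only where x is large.
print(share_beyond(errors_small_x, 2.0), share_beyond(errors_large_x, 2.0))
```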

Chow-test

2025-09-28

Mathematics

1.4k words

Main idea of “restriction”. A restriction is always about which coefficients we force to equal zero under the null hypothesis. It depends on the question we want to test. If the question is “Does SOUTH matter?”, then the restrictions only apply to the θ’s (the coefficients that multiply SOUTH and its interactions). If the question is “Does race (BLACK), gender (FEMALE), or their interaction matter?”, then the restrictions would apply to the δ’s and γ as well. So it’s about the hypothesis, ...

Least squares prediction and Indicator Variables

2025-09-23

Mathematics

2.9k words

Least squares prediction. CI and PI. CI: “Where is the average wage for people with 16 years of education likely to be?” PI: “If I pick one random person with 16 years of education, what wage will they likely earn?” Confidence Interval (CI). Target: the conditional mean. This is a fixed but unknown number (not random once the data and are fixed). The randomness comes from the estimation process (sampling variation in ). So the CI is telling you: “With 95% confidence, the true mean l...
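
A sketch of the CI/PI difference for a simple regression $y = b_1 + b_2 x$, assuming invented education/wage data and the standard textbook variance formulas: the CI for the mean uses $\hat\sigma^2\,[1/N + (x_0-\bar x)^2/\sum(x_i-\bar x)^2]$, while the PI adds the new observation’s own error variance, $\hat\sigma^2$, so it is always wider. The function name is hypothetical, and the t critical value is passed in by hand because Python’s standard library has no t quantile function.

```python
import math

def ci_and_pi(x, y, x0, t_crit):
    """Point prediction, CI for E(y | x0), and PI for one new y at x0."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b1 = y_bar - b2 * x_bar
    sigma2_hat = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)

    y0_hat = b1 + b2 * x0
    h0 = 1.0 / n + (x0 - x_bar) ** 2 / sxx
    se_mean = math.sqrt(sigma2_hat * h0)              # sampling variation of b1, b2 only
    se_forecast = math.sqrt(sigma2_hat * (1.0 + h0))  # plus the new person's own error
    ci = (y0_hat - t_crit * se_mean, y0_hat + t_crit * se_mean)
    pi = (y0_hat - t_crit * se_forecast, y0_hat + t_crit * se_forecast)
    return y0_hat, ci, pi

# Hypothetical data: years of education and hourly wage for 8 people.
educ = [10, 12, 12, 14, 16, 16, 18, 20]
wage = [8.5, 11.0, 10.2, 13.1, 15.4, 17.0, 19.8, 22.5]
# 2.447 is the two-sided 95% t critical value for n - 2 = 6 degrees of freedom.
y0_hat, ci, pi = ci_and_pi(educ, wage, x0=16, t_crit=2.447)
print(y0_hat, ci, pi)
```

Both intervals are centered at the same point prediction; only their widths differ.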

What is information theory and coding

2025-09-22

Mathematics

2.3k words

Goal. What we should learn. Communication system:
Source encoder: converts the raw source (text, audio, image) into a sequence of bits.
Channel encoder: adds redundancy to protect against noise.
Modulator: converts the encoded bits into waveforms/signals that can physically travel through the channel.
Channel + noise: the medium distorts the signal (attenuation, interference, random noise).
Demodulator: recovers the bit sequence from the noisy received signal.
Channel decoder: uses r...
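
To make “adds redundancy to protect against noise” concrete, here is a deliberately simple channel code: a 3-fold repetition code with majority-vote decoding, sent over a binary symmetric channel. This particular code and all function names are illustrative choices, not necessarily the scheme the post covers; with one flipped bit per 3-bit block, the decoder always recovers the message.

```python
import random

def channel_encode(bits, r=3):
    """Repetition code: send every bit r times (r odd) to add redundancy."""
    return [b for b in bits for _ in range(r)]

def binary_symmetric_channel(bits, flip_prob, seed=0):
    """Noise model: flip each bit independently with probability flip_prob."""
    rng = random.Random(seed)
    return [b ^ 1 if rng.random() < flip_prob else b for b in bits]

def channel_decode(received, r=3):
    """Majority vote over each block of r received bits."""
    return [int(sum(received[i:i + r]) > r // 2) for i in range(0, len(received), r)]

message = [1, 0, 1, 1, 0, 0, 1, 0]
codeword = channel_encode(message)

# Worst case the code can handle: exactly one flipped bit in every block.
corrupted = list(codeword)
for block_start in range(0, len(corrupted), 3):
    corrupted[block_start] ^= 1
print(channel_decode(corrupted) == message)  # True

# Random noise: errors occur only when 2 or more bits of a block flip.
decoded = channel_decode(binary_symmetric_channel(codeword, flip_prob=0.05, seed=42))
print(decoded)
```

The price of this protection is rate: three channel bits are sent for every message bit.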

Further inference in the multiple linear regression model

2025-09-22

Mathematics

2.2k words

Joint hypothesis testing. Simple null hypothesis → involves a single restriction (one relation: <, =, or >). Joint null hypothesis → involves two or more restrictions at the same time. Testing the effect of advertising: the F-test. Restricted model: An unrestricted model is the “full” regression specification, where you estimate all parameters freely without imposing any restrictions. For example, if your regression is then the unrestricted model estimates all at once. A restricte...
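
The F-test compares the fit of the restricted and unrestricted models through their sums of squared errors, $F = \frac{(SSE_R - SSE_U)/J}{SSE_U/(N-K)}$, where J is the number of restrictions in the joint null, N the number of observations, and K the number of parameters in the unrestricted model. A minimal sketch with hypothetical SSE values (the function name is illustrative); the result is then compared with a critical value from the $F(J, N-K)$ distribution.

```python
def f_statistic(sse_r, sse_u, j, n, k):
    """F statistic for a joint null hypothesis with J restrictions.

    sse_r: SSE of the restricted model (null hypothesis imposed)
    sse_u: SSE of the unrestricted ("full") model
    j: number of restrictions; n: observations; k: parameters in the unrestricted model
    """
    return ((sse_r - sse_u) / j) / (sse_u / (n - k))

# Hypothetical values: imposing J = 2 restrictions raises SSE from 80 to 100.
print(f_statistic(100.0, 80.0, j=2, n=44, k=4))  # 5.0
```

A large F means the restrictions worsen the fit by more than chance alone would suggest, which is evidence against the joint null.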
