Activity-based Twitter sampling for content-based and user-centric prediction models

Table 1 Notation for variables and their descriptions

Notation	Description
N	Total number of documents
M	Total number of users, \(1<m<M\)
V	Global vocabulary
w	Word in vocabulary
\({u}_i\)	The \(i\)th user out of \(M\)
\({s}_u\)	Sentiment score of user \(u\)
q	Aggregation window
\({y}_i\)	Crime rate at time t(i)
\({\Delta }r\)	Lag between a document and a target trend
\({p}_i\)	A post tweeted at time t(i)
\({d}_i\)	A document sampled at time t(i)
\({X}^{\langle c \rangle }\)	Sparse document-term matrix of size \(N \times |V|\)
\({X}^{\langle u \rangle }\)	Sparse document-sentiment matrix of size \(N \times M\)
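
To make the shapes of the two matrices in Table 1 concrete, the following is a minimal sketch of how they might be assembled. It is not the authors' implementation. It assumes that each document is the list of posts falling in one aggregation window, that each post carries a user name and a precomputed sentiment score, and that sentiment scores of the same user within a document are summed. The function name build_matrices, the dictionary-of-keys sparse layout, and the toy data are all hypothetical.

```python
from collections import Counter


def build_matrices(docs, vocab, users):
    """Build sparse X<c> (N x |V|) and X<u> (N x M) as {(row, col): value} dicts.

    docs  : list of N documents; each document is a list of
            (user, text, sentiment) posts (assumed layout).
    vocab : list of words forming the global vocabulary V.
    users : list of the M users.
    """
    word_index = {w: j for j, w in enumerate(vocab)}
    user_index = {u: j for j, u in enumerate(users)}
    X_c = {}  # (document row, word column) -> term count
    X_u = {}  # (document row, user column) -> summed sentiment (assumption)
    for i, posts in enumerate(docs):
        for user, text, sentiment in posts:
            # Count vocabulary words of this post into row i of X<c>.
            for w, c in Counter(text.lower().split()).items():
                if w in word_index:
                    key = (i, word_index[w])
                    X_c[key] = X_c.get(key, 0) + c
            # Accumulate the user's sentiment into row i of X<u>.
            if user in user_index:
                key = (i, user_index[user])
                X_u[key] = X_u.get(key, 0.0) + sentiment
    return X_c, X_u


# Toy data: N = 2 documents, |V| = 3 words, M = 2 users.
vocab = ["crime", "police", "park"]
users = ["alice", "bob"]
docs = [
    [("alice", "crime near the park", -0.5), ("bob", "police police", 0.25)],
    [("alice", "park park", 0.5)],
]
X_c, X_u = build_matrices(docs, vocab, users)
```

Only non-zero entries are stored, which mirrors the sparsity noted in the table: most users do not post in a given window, and most vocabulary words do not occur in a given document.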