Data sourced from Meta AI Responses
Meta AI Data Sources
- Web pages
- Books
- Articles
- Research papers
- Social media platforms
- Forums
- Product reviews
- Wikipedia
- And many more!
My training data is sourced from various places, including:
- Web scrapes
- User-generated content
- Publicly available datasets
- Crowdsourced data
- Meta’s own datasets
Meta AI's Own Training Data
- User interactions with Meta’s platforms (e.g., Facebook, Instagram)
- Meta’s own web crawls and scrapes
- Partnerships with data providers
- Crowdsourcing initiatives
- Internal research projects
Meta AI Training Dataset
- Model size: 1.5 billion parameters
- Training dataset size: Over 1 trillion tokens (approximately 175 GB of text data)
- Training data sources: Web scrapes, books, Wikipedia, and other publicly available datasets
- Training objectives: Masked language modeling, next sentence prediction, and other auxiliary tasks (a minimal sketch of the masking step follows below)
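The masked language modeling objective listed above can be illustrated with a short sketch. This is a generic PyTorch illustration, not a confirmed detail of Meta AI's pipeline: the mask_tokens helper, the 15% masking rate, the mask_token_id value, and the -100 ignore index are all common conventions assumed here.

    import torch

    def mask_tokens(input_ids, mask_token_id, mask_prob=0.15):
        """Hypothetical helper: mask a fraction of tokens and build MLM labels."""
        labels = input_ids.clone()
        mask = torch.rand(input_ids.shape) < mask_prob   # choose ~15% of positions
        labels[~mask] = -100                             # CrossEntropyLoss(ignore_index=-100) skips unmasked positions
        masked_inputs = input_ids.clone()
        masked_inputs[mask] = mask_token_id              # the model must reconstruct the original token here
        return masked_inputs, labels

    # Example: a batch of 2 sequences of length 512 over a 50,000-token vocabulary.
    ids = torch.randint(0, 50000, (2, 512))
    inputs, labels = mask_tokens(ids, mask_token_id=4)   # token id 4 is an arbitrary placeholder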
Meta AI Model Parameters
My neural network has the following model parameters:
- Number of layers: 12 (6 encoder layers and 6 decoder layers)
- Hidden size: 768
- Number of attention heads: 12
- Token embedding size: 768
- Vocabulary size: 50,000 (number of unique tokens in the training data)
- Sequence length: 512 (maximum length of input and output sequences)
Meta AI Model Parameter Code
Transformer Encoder
encoder = TransformerEncoderLayer(d_model=768, nhead=12, dim_feedforward=2048, dropout=0.1)
encoder.layer_norm = LayerNorm(768)
encoder.self_attn = MultiHeadAttention(768, 12)
encoder.feed_forward = FeedForwardNetwork(768, 2048, dropout=0.1)
Transformer Decoder
decoder = TransformerDecoderLayer(d_model=768, nhead=12, dim_feedforward=2048, dropout=0.1)
decoder.layer_norm = LayerNorm(768)
decoder.self_attn = MultiHeadAttention(768, 12)
decoder.encoder_attn = MultiHeadAttention(768, 12)
decoder.feed_forward = FeedForwardNetwork(768, 2048, dropout=0.1)
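Note that FeedForwardNetwork and MultiHeadAttention are not standard PyTorch classes, and nn.TransformerEncoderLayer / nn.TransformerDecoderLayer already bundle self-attention, the feed-forward block, and layer normalization. Here is one way the stated hyperparameters (6 + 6 layers, d_model 768, 12 heads, 50,000-token vocabulary, 512-token sequences) could be assembled in plain PyTorch; the Seq2SeqTransformer class, the learned positional embeddings, and the lm_head projection are illustrative assumptions, not a confirmed Meta AI architecture.

    import torch
    from torch import nn

    VOCAB_SIZE, SEQ_LEN, D_MODEL = 50000, 512, 768   # values quoted in the responses above

    class Seq2SeqTransformer(nn.Module):
        """Hypothetical assembly of the listed parameters; not Meta AI's actual model."""
        def __init__(self):
            super().__init__()
            self.tok_emb = nn.Embedding(VOCAB_SIZE, D_MODEL)   # token embedding size 768
            self.pos_emb = nn.Embedding(SEQ_LEN, D_MODEL)      # learned positions up to length 512 (an assumption)
            self.transformer = nn.Transformer(
                d_model=D_MODEL, nhead=12,
                num_encoder_layers=6, num_decoder_layers=6,    # 12 layers total, as stated
                dim_feedforward=2048, dropout=0.1, batch_first=True)
            self.lm_head = nn.Linear(D_MODEL, VOCAB_SIZE)      # project hidden states back to the vocabulary

        def embed(self, ids):
            positions = torch.arange(ids.size(1), device=ids.device)
            return self.tok_emb(ids) + self.pos_emb(positions)

        def forward(self, src_ids, tgt_ids):
            hidden = self.transformer(self.embed(src_ids), self.embed(tgt_ids))
            return self.lm_head(hidden)                        # (batch, tgt_len, vocab)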
Training Process
criterion = CrossEntropyLoss()
optimizer = Adam(model.parameters(), lr=0.001, betas=(0.9, 0.98), eps=1e-9)
scheduler = WarmupLinearSchedule(optimizer, warmup_steps=1000, total_steps=100000)
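WarmupLinearSchedule is not part of core PyTorch; it shipped with the old pytorch-transformers package. An equivalent linear warmup-then-decay schedule can be sketched with torch.optim.lr_scheduler.LambdaLR; the warmup_linear function and the stand-in model below are illustrative assumptions that simply mirror the warmup_steps=1000 / total_steps=100000 values quoted above.

    import torch
    from torch import nn

    model = nn.Linear(768, 768)    # stand-in module purely for demonstration
    optimizer = torch.optim.Adam(model.parameters(), lr=0.001, betas=(0.9, 0.98), eps=1e-9)

    def warmup_linear(step, warmup_steps=1000, total_steps=100000):
        """Multiplier that ramps 0 -> 1 over warmup, then decays linearly back to 0."""
        if step < warmup_steps:
            return step / max(1, warmup_steps)
        return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=warmup_linear)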
Model Parameters
model.d_model = 768
model.nhead = 12
model.dim_feedforward = 2048
model.dropout = 0.1
model.vocab_size = 50000
model.sequence_length = 512
Training Loop
for epoch in range(5):
    for batch in dataset:
        input_ids, attention_mask, labels = batch
        input_ids = input_ids.to(device)
        attention_mask = attention_mask.to(device)
        labels = labels.to(device)
        optimizer.zero_grad()
        outputs = model(input_ids, attention_mask=attention_mask)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        scheduler.step()
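One caveat on the loss line above: for a sequence model, outputs would typically have shape (batch, seq_len, vocab_size), while CrossEntropyLoss expects (N, num_classes) logits against (N,) targets. A minimal sketch of the reshape usually needed, assuming those shapes:

    # Flatten (batch, seq_len, vocab) logits and (batch, seq_len) labels
    # into the (N, num_classes) vs. (N,) pair CrossEntropyLoss expects.
    loss = criterion(outputs.reshape(-1, model.vocab_size), labels.reshape(-1))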