Quickstart

Follow these 3 simple steps to train your multi-task model!

Step 1 - Define your task file

The task file is a YAML file that lists every task you want the multi-task model to train on, together with the settings for each task.

TaskA:
  model_type: BERT
  config_name: bert-base-uncased
  dropout_prob: 0.05
  label_map_or_file:
  - label1
  - label2
  - label3
  metrics:
  - accuracy
  loss_type: CrossEntropyLoss
  task_type: SingleSenClassification
  file_names:
  - taskA_train.tsv
  - taskA_dev.tsv
  - taskA_test.tsv

TaskB:
  model_type: BERT
  config_name: bert-base-uncased
  dropout_prob: 0.3
  label_map_or_file: data/taskB_train_label_map.joblib
  metrics:
  - seq_f1
  - seq_precision
  - seq_recall
  loss_type: NERLoss
  task_type: NER
  file_names:
  - taskB_train.tsv
  - taskB_dev.tsv
  - taskB_test.tsv

For details on the parameters available in a task file, see the task file parameters documentation.
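
Before moving on, it can help to sanity-check the task file. The sketch below is not part of the library: `check_tasks` and `load_task_file` are hypothetical helper names, and treating every key from the example above as required is an assumption made here for illustration. Loading the file relies on the third-party PyYAML package.

```python
from typing import Any, Dict, List

# Keys copied from the example task file above. Treating all of them as
# required is an assumption of this sketch, not a rule stated by the library.
EXPECTED_KEYS = (
    "model_type", "config_name", "dropout_prob", "label_map_or_file",
    "metrics", "loss_type", "task_type", "file_names",
)
# Keys that the example writes as YAML lists.
LIST_KEYS = ("metrics", "file_names")


def check_tasks(tasks: Dict[str, Dict[str, Any]]) -> List[str]:
    """Return a list of human-readable problems found in a parsed task file."""
    problems = []
    for name, params in tasks.items():
        if not isinstance(params, dict):
            problems.append(f"{name}: expected a mapping of parameters")
            continue
        for key in EXPECTED_KEYS:
            if key not in params:
                problems.append(f"{name}: missing '{key}'")
        for key in LIST_KEYS:
            if key in params and not isinstance(params[key], list):
                problems.append(f"{name}: '{key}' should be a list")
    return problems


def load_task_file(path: str) -> Dict[str, Dict[str, Any]]:
    """Parse a task file from disk (requires PyYAML: pip install pyyaml)."""
    import yaml  # imported lazily so check_tasks works without PyYAML

    with open(path, encoding="utf-8") as handle:
        return yaml.safe_load(handle)
```

A typical use would be `print(check_tasks(load_task_file('sample_task_file.yml')))`, which prints an empty list when no problems are found.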

Step 2 - Run data preparation

After defining the task file in Step 1, run the following command to prepare the data.

$ python data_preparation.py \
    --task_file 'sample_task_file.yml' \
    --data_dir 'data' \
    --max_seq_len 50

For details on the data_preparation.py script and its arguments, see the data preparation documentation.
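
A quick pre-flight check can catch missing input files before the preparation script runs. The sketch below assumes that the entries under `file_names` are looked up relative to `--data_dir`; that reading is inferred from the commands on this page and is not confirmed here. `missing_data_files` is a hypothetical helper, not a function provided by the library.

```python
from pathlib import Path
from typing import Any, Dict, List


def missing_data_files(tasks: Dict[str, Dict[str, Any]], data_dir: str) -> List[str]:
    """List files named under 'file_names' that are not present in data_dir.

    Assumption: file names in the task file are relative to data_dir.
    """
    root = Path(data_dir)
    missing = []
    for task_name, params in tasks.items():
        for file_name in params.get("file_names", []):
            if not (root / file_name).is_file():
                missing.append(f"{task_name}: {root / file_name}")
    return missing


# Example with the TaskA entries from Step 1, written inline as a dict.
example_tasks = {
    "TaskA": {
        "file_names": ["taskA_train.tsv", "taskA_dev.tsv", "taskA_test.tsv"],
    },
}
for line in missing_data_files(example_tasks, "data"):
    print("missing", line)
```

If nothing is printed, every listed file was found under `data`.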

Step 3 - Run train

Finally, start training with the following command.

$ python train.py \
    --data_dir 'data/bert-base-uncased_prepared_data' \
    --task_file 'sample_task_file.yml' \
    --out_dir 'sample_out' \
    --epochs 5 \
    --train_batch_size 4 \
    --eval_batch_size 8 \
    --grad_accumulation_steps 2 \
    --log_per_updates 25 \
    --save_per_updates 1000 \
    --eval_while_train True \
    --test_while_train True \
    --max_seq_len 50 \
    --silent True

For details on the train.py script and its arguments, see the training documentation.
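
The batch size and gradient accumulation settings interact, so a small calculation can make the command easier to reason about. This sketch follows the usual meaning of gradient accumulation (one optimizer update after several batches); whether train.py counts updates in exactly this way, including for a final partial group of batches, is an assumption. Both function names are hypothetical.

```python
import math


def effective_batch_size(train_batch_size: int, grad_accumulation_steps: int) -> int:
    """Number of examples that contribute to one optimizer update."""
    return train_batch_size * grad_accumulation_steps


def updates_per_epoch(num_examples: int, train_batch_size: int,
                      grad_accumulation_steps: int) -> int:
    """Optimizer updates in one epoch, counting a final partial group as one.

    Assumption: a leftover group of batches still triggers an update.
    """
    batches = math.ceil(num_examples / train_batch_size)
    return math.ceil(batches / grad_accumulation_steps)


# Values from the command above: --train_batch_size 4, --grad_accumulation_steps 2
print(effective_batch_size(4, 2))
```

With the values from the command above, each update is based on 4 × 2 = 8 examples. The dataset size passed to `updates_per_epoch` has to come from your own data; none is given on this page.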