2025-10-09 19:34:26 - experiment_save_merged_model - INFO - Starting merged model save process
2025-10-09 19:34:26 - experiment_save_merged_model - INFO - Arguments: {'lambdas_path': '/work/gj26/b20042/LLM-AdaMerge/outputs/deepseek-7b/task-wise/cross_entropy-ep3-lr0001-10%dataset-lambda01_wo_scheduler/llm_adamerge_lambdas.json', 'model_config': '/work/gj26/b20042/LLM-AdaMerge/outputs/deepseek-7b/task-wise/cross_entropy-ep3-lr0001-10%dataset-lambda01_wo_scheduler/model_config.yaml', 'output_dir': '/work/gj26/b20042/LLM-AdaMerge/mergekit/outputs/deepseek-7b/llmadamerge/task-wise-cross_entropy-lr0001-ep3-10%dataset/lambda01_wo_scheduler', 'model_name': 'merged-model', 'push_to_hub': False, 'hub_repo_id': 'lejelly/deepseek-ep3-data10-taskwise-lambda01', 'private': False, 'device': 'cuda', 'debug': False}
2025-10-09 19:34:26 - experiment_save_merged_model - INFO - Loading lambdas from /work/gj26/b20042/LLM-AdaMerge/outputs/deepseek-7b/task-wise/cross_entropy-ep3-lr0001-10%dataset-lambda01_wo_scheduler/llm_adamerge_lambdas.json
2025-10-09 19:34:26 - experiment_save_merged_model - INFO - Auto-detected parameter-wise merge from JSON structure
2025-10-09 19:34:26 - experiment_save_merged_model - INFO - Merge type: parameter_wise
2025-10-09 19:34:26 - experiment_save_merged_model - INFO - [Initial] Memory Usage:
2025-10-09 19:34:26 - experiment_save_merged_model - INFO - Process: 0.43 GB (0.2%)
2025-10-09 19:34:26 - experiment_save_merged_model - INFO - System: 31.30 GB / 212.49 GB (19.3%)
2025-10-09 19:34:26 - experiment_save_merged_model - INFO - Available: 171.40 GB
2025-10-09 19:34:26 - experiment_save_merged_model - INFO - GPU 0: Allocated: 0.00 GB, Reserved: 0.00 GB, Total: 94.50 GB
2025-10-09 19:34:26 - experiment_save_merged_model - INFO - Loading models
2025-10-09 19:34:48 - experiment_save_merged_model - INFO - [After loading models] Memory Usage:
2025-10-09 19:34:48 - experiment_save_merged_model - INFO - Process: 38.80 GB (18.3%)
2025-10-09 19:34:48 - experiment_save_merged_model - INFO - System: 70.27 GB / 212.49 GB (41.0%)
2025-10-09 19:34:48 - experiment_save_merged_model - INFO - Available: 125.43 GB
2025-10-09 19:34:48 - experiment_save_merged_model - INFO - GPU 0: Allocated: 0.00 GB, Reserved: 0.00 GB, Total: 94.50 GB
2025-10-09 19:34:48 - experiment_save_merged_model - INFO - Initializing parameter_wise AdaMerge
2025-10-09 19:35:56 - experiment_save_merged_model - INFO - Loading learned lambdas
2025-10-09 19:35:56 - experiment_save_merged_model - INFO - Deleting original models to free memory (task vectors already computed)
2025-10-09 19:35:56 - experiment_save_merged_model - INFO - [Before deleting models] Memory Usage:
2025-10-09 19:35:56 - experiment_save_merged_model - INFO - Process: 64.56 GB (30.4%)
2025-10-09 19:35:56 - experiment_save_merged_model - INFO - System: 96.66 GB / 212.49 GB (52.3%)
2025-10-09 19:35:56 - experiment_save_merged_model - INFO - Available: 101.38 GB
2025-10-09 19:35:56 - experiment_save_merged_model - INFO - GPU 0: Allocated: 0.00 GB, Reserved: 0.00 GB, Total: 94.50 GB
2025-10-09 19:35:56 - experiment_save_merged_model - INFO - Clearing model_loader references
2025-10-09 19:35:56 - experiment_save_merged_model - INFO - Deleting model variables
2025-10-09 19:35:56 - experiment_save_merged_model - INFO - Running garbage collection
2025-10-09 19:35:56 - experiment_save_merged_model - INFO - [After deleting models and GC] Memory Usage:
2025-10-09 19:35:56 - experiment_save_merged_model - INFO - Process: 38.80 GB (18.3%)
2025-10-09 19:35:56 - experiment_save_merged_model - INFO - System: 70.45 GB / 212.49 GB (40.0%)
2025-10-09 19:35:56 - experiment_save_merged_model - INFO - Available: 127.60 GB
2025-10-09 19:35:56 - experiment_save_merged_model - INFO - GPU 0: Allocated: 0.00 GB, Reserved: 0.00 GB, Total: 94.50 GB
2025-10-09 19:35:56 - experiment_save_merged_model - INFO - [After loading lambdas] Memory Usage:
2025-10-09 19:35:56 - experiment_save_merged_model - INFO - Process: 38.80 GB (18.3%)
2025-10-09 19:35:56 - experiment_save_merged_model - INFO - System: 70.45 GB / 212.49 GB (40.0%)
2025-10-09 19:35:56 - experiment_save_merged_model - INFO - Available: 127.60 GB
2025-10-09 19:35:56 - experiment_save_merged_model - INFO - GPU 0: Allocated: 0.00 GB, Reserved: 0.00 GB, Total: 94.50 GB
2025-10-09 19:35:56 - experiment_save_merged_model - INFO - Creating merged model with learned lambdas
2025-10-09 19:35:56 - experiment_save_merged_model - INFO - Using merge_models_for_save()
2025-10-09 19:37:52 - experiment_save_merged_model - INFO - [After merging models] Memory Usage:
2025-10-09 19:37:52 - experiment_save_merged_model - INFO - Process: 38.07 GB (17.9%)
2025-10-09 19:37:52 - experiment_save_merged_model - INFO - System: 96.30 GB / 212.49 GB (49.8%)
2025-10-09 19:37:52 - experiment_save_merged_model - INFO - Available: 106.67 GB
2025-10-09 19:37:52 - experiment_save_merged_model - INFO - GPU 0: Allocated: 12.87 GB, Reserved: 25.74 GB, Total: 94.50 GB
2025-10-09 19:37:52 - experiment_save_merged_model - INFO - Freeing memory from AdaMerge object (task vectors and base params no longer needed)
2025-10-09 19:37:52 - experiment_save_merged_model - INFO - Deleting task vectors
2025-10-09 19:37:52 - experiment_save_merged_model - INFO - Deleting base params
2025-10-09 19:37:52 - experiment_save_merged_model - INFO - Deleting functional model
2025-10-09 19:37:53 - experiment_save_merged_model - INFO - [After freeing AdaMerge memory] Memory Usage:
2025-10-09 19:37:53 - experiment_save_merged_model - INFO - Process: 0.04 GB (0.0%)
2025-10-09 19:37:53 - experiment_save_merged_model - INFO - System: 45.21 GB / 212.49 GB (25.8%)
2025-10-09 19:37:53 - experiment_save_merged_model - INFO - Available: 157.75 GB
2025-10-09 19:37:53 - experiment_save_merged_model - INFO - GPU 0: Allocated: 12.87 GB, Reserved: 12.87 GB, Total: 94.50 GB
2025-10-09 19:37:53 - experiment_save_merged_model - INFO - Saving merged model to /work/gj26/b20042/LLM-AdaMerge/mergekit/outputs/deepseek-7b/llmadamerge/task-wise-cross_entropy-lr0001-ep3-10%dataset/lambda01_wo_scheduler
2025-10-09 19:37:53 - experiment_save_merged_model - INFO - Moving merged model to CPU for saving
2025-10-09 19:38:47 - experiment_save_merged_model - INFO - Successfully saved 3 safetensors files:
2025-10-09 19:38:47 - experiment_save_merged_model - INFO - - model-00003-of-00003.safetensors (3674.14 MB)
2025-10-09 19:38:47 - experiment_save_merged_model - INFO - - model-00002-of-00003.safetensors (4750.20 MB)
2025-10-09 19:38:47 - experiment_save_merged_model - INFO - - model-00001-of-00003.safetensors (4756.17 MB)
2025-10-09 19:38:47 - experiment_save_merged_model - INFO - [After saving model] Memory Usage:
2025-10-09 19:38:47 - experiment_save_merged_model - INFO - Process: 12.85 GB (6.0%)
2025-10-09 19:38:47 - experiment_save_merged_model - INFO - System: 45.24 GB / 212.49 GB (28.8%)
2025-10-09 19:38:47 - experiment_save_merged_model - INFO - Available: 151.28 GB
2025-10-09 19:38:47 - experiment_save_merged_model - INFO - GPU 0: Allocated: 0.00 GB, Reserved: 0.00 GB, Total: 94.50 GB
2025-10-09 19:38:47 - experiment_save_merged_model - INFO - Saving tokenizer
2025-10-09 19:38:47 - experiment_save_merged_model - INFO - Copied lambdas file to /work/gj26/b20042/LLM-AdaMerge/mergekit/outputs/deepseek-7b/llmadamerge/task-wise-cross_entropy-lr0001-ep3-10%dataset/lambda01_wo_scheduler/learned_lambdas.json
2025-10-09 19:38:47 - experiment_save_merged_model - INFO - Creating model card
2025-10-09 19:38:47 - experiment_save_merged_model - INFO - Cleaning up models
2025-10-09 19:38:47 - experiment_save_merged_model - INFO - [After cleanup] Memory Usage:
2025-10-09 19:38:47 - experiment_save_merged_model - INFO - Process: 12.86 GB (6.1%)
2025-10-09 19:38:47 - experiment_save_merged_model - INFO - System: 45.23 GB / 212.49 GB (28.8%)
2025-10-09 19:38:47 - experiment_save_merged_model - INFO - Available: 151.28 GB
2025-10-09 19:38:47 - experiment_save_merged_model - INFO - GPU 0: Allocated: 0.00 GB, Reserved: 0.00 GB, Total: 94.50 GB
2025-10-09 19:38:47 - experiment_save_merged_model - INFO - Model saved successfully to /work/gj26/b20042/LLM-AdaMerge/mergekit/outputs/deepseek-7b/llmadamerge/task-wise-cross_entropy-lr0001-ep3-10%dataset/lambda01_wo_scheduler