---
license: mit
tags:
- generated_from_trainer
- code
language:
- en
- es
base_model:
- deepseek-ai/deepseek-llm-7b-base
pipeline_tag: text-generation
library_name: transformers
datasets:
- miguelmejias0512/solidity_personal_dataset
---

#### Text Completion

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig

# Load the fine-tuned model and its tokenizer
model_name = "miguelmejias0512/deepseek-solidity-coder-llm-7b-finetuned"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
)

# Reuse the model's generation config, setting a pad token for open-ended generation
model.generation_config = GenerationConfig.from_pretrained(model_name)
model.generation_config.pad_token_id = model.generation_config.eos_token_id

# Example prompt for plain text completion
text = "An attention function can be described as mapping a query and a set of key-value pairs to an output, where the query, keys, values, and output are all vectors. The output is"
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs.to(model.device), max_new_tokens=100)

result = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(result)
```

### Training hyperparameters

The following hyperparameters were used during training (see the configuration sketch after the framework versions below):
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- num_epochs: 1

### Framework versions

- Transformers 4.51.3
- PyTorch 2.6.0+cu124
- Datasets 2.18.0
- Tokenizers 0.21.1
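For reference, the hyperparameters listed above map onto a `transformers.TrainingArguments` object roughly as follows. This is a minimal sketch, not the author's actual training script: the `output_dir` value is a placeholder, and the data/model wiring is omitted.

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the training setup from the
# hyperparameters listed in this card; output_dir is a placeholder.
training_args = TrainingArguments(
    output_dir="deepseek-solidity-coder-llm-7b-finetuned",  # placeholder
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch",      # OptimizerNames.ADAMW_TORCH
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=1,
)
```

Passing such an object to a `transformers.Trainer` along with the base model and dataset would reproduce the configuration described, assuming the card's hyperparameter list is complete.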