Hands-on: Deploying a Large Language Model On-Device with TensorFlow Lite

Install KerasNLP from GitHub (google-io-2023 branch) together with TensorFlow Text

$ pip install -q git+https://github.com/keras-team/keras-nlp.git@google-io-2023 tensorflow-text==2.12

Generating text with a pretrained GPT-2 model

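import time

import numpy as np
import tensorflow as tf
import tensorflow_text as tf_text  # TF Text ops used by the GPT-2 tokenizer
from tensorflow.lite.python import interpreter  # provides InterpreterWithCustomOps

import keras_nlp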

gpt2_tokenizer = keras_nlp.models.GPT2Tokenizer.from_preset("gpt2_base_en")
gpt2_preprocessor = keras_nlp.models.GPT2CausalLMPreprocessor.from_preset(
    "gpt2_base_en",
    sequence_length=256,
    add_end_token=True,
)
gpt2_lm = keras_nlp.models.GPT2CausalLM.from_preset("gpt2_base_en", preprocessor=gpt2_preprocessor)

Text generation

start = time.time()

output = gpt2_lm.generate("My trip to Yosemite was", max_length=200)
print("\nGPT-2 output:")
print(output.numpy().decode("utf-8"))  # generate() returns a tf.string tensor in this branch, hence .numpy()

end = time.time()
print("TOTAL TIME ELAPSED: ", end - start)

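[Note] TensorFlow Lite (a library for deploying models on mobile devices, microcontrollers, and other edge devices)

- First, use the TensorFlow Lite converter to convert a TensorFlow model into the more compact TensorFlow Lite format.
- Then run the converted model with the TensorFlow Lite interpreter, which is highly optimized for mobile devices.
- During conversion, techniques such as quantization can be applied to further optimize the model and speed up inference.

Developer workflow for using TensorFlow Lite (figure)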
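A minimal sketch of that convert-then-interpret workflow, assuming TensorFlow 2.x is installed. The `TinyModel` module below is hypothetical and only stands in for the GPT-2 function: it is converted with `tf.lite.TFLiteConverter.from_concrete_functions` (the same entry point used later in this post) and then run with the plain `tf.lite.Interpreter`, and the result is compared with the original TensorFlow output.

```python
import numpy as np
import tensorflow as tf


class TinyModel(tf.Module):
    """Hypothetical toy model: y = x @ W + b."""

    def __init__(self):
        super().__init__()
        self.w = tf.Variable(tf.random.normal([4, 2], seed=0))
        self.b = tf.Variable(tf.zeros([2]))

    @tf.function(input_signature=[tf.TensorSpec([1, 4], tf.float32)])
    def __call__(self, x):
        return tf.matmul(x, self.w) + self.b


model = TinyModel()

# 1) Convert: trace a concrete function and serialize it as a TFLite flatbuffer (bytes)
converter = tf.lite.TFLiteConverter.from_concrete_functions(
    [model.__call__.get_concrete_function()], model)
tflite_bytes = converter.convert()

# 2) Interpret: load the flatbuffer and run one inference
interp = tf.lite.Interpreter(model_content=tflite_bytes)
interp.allocate_tensors()
input_index = interp.get_input_details()[0]["index"]
output_index = interp.get_output_details()[0]["index"]

x = np.ones((1, 4), dtype=np.float32)
interp.set_tensor(input_index, x)
interp.invoke()
y_tflite = interp.get_tensor(output_index)

# The TFLite result should match the original TensorFlow computation
y_tf = model(tf.constant(x)).numpy()
print(y_tflite, y_tf)
```

The optional third step, quantization, is a converter setting applied before `convert()`; the quantization section below uses it on the GPT-2 function.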

3. Convert the generate() function from GPT2CausalLM

Wrap the generate() function in a TensorFlow concrete function

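@tf.function
def generate(prompt, max_length):
    return gpt2_lm.generate(prompt, max_length)

# Trace with a scalar string prompt; max_length=100 is a Python int, so it is fixed in the traced graph
concrete_func = generate.get_concrete_function(tf.TensorSpec([], tf.string), 100)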
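To see why `max_length` does not become an input of the converted model, here is a small sketch with a hypothetical `repeat_prompt` function (it does not use KerasNLP): only the argument described by a `tf.TensorSpec` becomes a graph input, while a plain Python value such as `3` is frozen into the trace.

```python
import tensorflow as tf


@tf.function
def repeat_prompt(prompt, times):
    # `times` is a Python int, so this list is built at trace time and
    # the repetition count is frozen into the graph.
    return tf.strings.join([prompt] * times, separator=" ")


# Only `prompt` becomes a graph input; `times=3` is baked in
concrete = repeat_prompt.get_concrete_function(tf.TensorSpec([], tf.string), 3)

out = concrete(tf.constant("hi"))
print(out.numpy())  # b'hi hi hi'
```

Changing `times` would require tracing a new concrete function; in the same way, the `concrete_func` traced above always generates with `max_length=100`.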

Define a helper function that runs inference on a given input with the TFLite model

def run_inference(prompt_text, generate_tflite):
  # Load the TFLite model and register the TF Text custom ops it depends on
  interp = interpreter.InterpreterWithCustomOps(
      model_content=generate_tflite,
      custom_op_registerers=tf_text.tflite_registrar.SELECT_TFTEXT_OPS)
  print(interp.get_signature_list())  # available signatures with their input/output names

  generator = interp.get_signature_runner('serving_default')
  output = generator(prompt=np.array([prompt_text]))
  print("\nGenerated with TFLite:\n", output["output_0"])
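The helper above relies on the signature runner API. Below is a self-contained sketch of the same calling convention on a hypothetical toy module: with a single concrete function, the converted model exposes a `serving_default` signature whose input is named after the Python argument (`x` here, `prompt` in `generate`) and whose unnamed output becomes `output_0`. The GPT-2 model additionally needs the TF Text custom ops, which is why the helper uses `InterpreterWithCustomOps`; the toy model does not.

```python
import numpy as np
import tensorflow as tf


class Doubler(tf.Module):
    """Hypothetical toy model that doubles its input."""

    @tf.function(input_signature=[tf.TensorSpec([3], tf.float32)])
    def __call__(self, x):
        return x * 2.0


model = Doubler()
converter = tf.lite.TFLiteConverter.from_concrete_functions(
    [model.__call__.get_concrete_function()], model)
tflite_bytes = converter.convert()

interp = tf.lite.Interpreter(model_content=tflite_bytes)
print(interp.get_signature_list())  # signature keys with their input/output names

# The runner allocates tensors itself and takes inputs as keyword arguments
runner = interp.get_signature_runner("serving_default")
result = runner(x=np.array([1.0, 2.0, 3.0], dtype=np.float32))
print(result["output_0"])
```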

Convert the model (function → TFLite)

gpt2_lm.jit_compile = False  # turn off XLA JIT compilation before converting
converter = tf.lite.TFLiteConverter.from_concrete_functions([concrete_func],
                                                            gpt2_lm)
converter.target_spec.supported_ops = [
  tf.lite.OpsSet.TFLITE_BUILTINS, # enable TensorFlow Lite ops.
  tf.lite.OpsSet.SELECT_TF_OPS # enable TensorFlow ops.
]
converter.allow_custom_ops = True  # emit the TF Text ops as custom ops, resolved at runtime by the registrar
# Extra TensorFlow ops to export as Select TF ops
converter.target_spec.experimental_select_user_tf_ops = ["UnsortedSegmentJoin", "UpperBound"]
converter._experimental_guarantee_all_funcs_one_use = True
generate_tflite = converter.convert()
run_inference("I'm enjoying a", generate_tflite)

4. Optimize the model (quantization)

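TensorFlow Lite implements an optimization technique called quantization to reduce model size and speed up inference.
Quantization is the process of mapping continuous input values onto a discrete set.
In a machine learning context, it means converting the model's 32-bit floating-point weights into more efficient 8-bit integers, which shrinks the weights about 4x (32 bits down to 8) and lets the model run more efficiently on modern hardware.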
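The 4x figure follows from storing each weight in 8 bits instead of 32. Below is a simplified NumPy sketch of symmetric, per-tensor int8 quantization, assuming a single scale factor for the whole tensor. TensorFlow Lite's actual schemes (for example, per-channel scales for some ops) are more involved, so this only illustrates the arithmetic.

```python
import numpy as np


def quantize_int8(w):
    """Map float weights to int8 codes with a single scale (symmetric, per-tensor)."""
    scale = float(np.max(np.abs(w))) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize(q, scale):
    """Recover approximate float values from the int8 codes."""
    return q.astype(np.float32) * scale


rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256)).astype(np.float32)  # stand-in weight matrix

q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

print(w.nbytes, q.nbytes, w.nbytes / q.nbytes)  # 262144 65536 4.0
# Rounding error per weight is at most half a quantization step
print(np.max(np.abs(w - w_hat)), scale / 2)
```

The trade-off is visible in the last line: the size drops by exactly 4x, at the cost of a per-weight error bounded by half of `scale`.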

gpt2_lm.jit_compile = False
converter = tf.lite.TFLiteConverter.from_concrete_functions([concrete_func],
                                                            gpt2_lm)
converter.target_spec.supported_ops = [
  tf.lite.OpsSet.TFLITE_BUILTINS, # enable TensorFlow Lite ops.
  tf.lite.OpsSet.SELECT_TF_OPS # enable TensorFlow ops.
]
converter.allow_custom_ops = True
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # enable default optimizations (post-training quantization)
converter.target_spec.experimental_select_user_tf_ops = ["UnsortedSegmentJoin", "UpperBound"]
converter._experimental_guarantee_all_funcs_one_use = True
quant_generate_tflite = converter.convert()
run_inference("I'm enjoying a", quant_generate_tflite)
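To check the effect of `tf.lite.Optimize.DEFAULT` on a smaller scale, the sketch below converts a hypothetical single-layer module with and without it and compares the flatbuffer sizes. Without a representative dataset, this setting applies post-training dynamic-range quantization, which stores large weight tensors as int8, so the quantized file should come out at roughly a quarter of the float size (exact byte counts depend on the TensorFlow version).

```python
import tensorflow as tf


class TinyDense(tf.Module):
    """Hypothetical single layer with a 512x512 float32 weight matrix (1 MiB)."""

    def __init__(self):
        super().__init__()
        self.w = tf.Variable(tf.random.normal([512, 512], seed=1))

    @tf.function(input_signature=[tf.TensorSpec([1, 512], tf.float32)])
    def __call__(self, x):
        return tf.matmul(x, self.w)


def convert(module, quantize):
    """Convert the module to TFLite bytes, optionally with default optimizations."""
    converter = tf.lite.TFLiteConverter.from_concrete_functions(
        [module.__call__.get_concrete_function()], module)
    if quantize:
        converter.optimizations = [tf.lite.Optimize.DEFAULT]
    return converter.convert()


model = TinyDense()
float_model = convert(model, quantize=False)
quant_model = convert(model, quantize=True)
print(f"float: {len(float_model)} bytes, quantized: {len(quant_model)} bytes")
```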

Save

with open('autocomplete.tflite', 'wb') as f:
  f.write(quant_generate_tflite)
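The model sizes noted at the end of this post can be measured directly from the saved file. A small helper, hypothetical and not part of the original codelab, that writes the serialized model and returns its size in MB (here 1 MB = 1024 * 1024 bytes):

```python
import os


def save_tflite(model_bytes: bytes, path: str) -> float:
    """Write a serialized TFLite model to `path` and return its size in MB."""
    with open(path, "wb") as f:
        f.write(model_bytes)
    return os.path.getsize(path) / (1024 * 1024)
```

For example, `print(f"{save_tflite(quant_generate_tflite, 'autocomplete.tflite'):.1f} MB")`; calling it on `generate_tflite` as well gives the size of the non-quantized conversion for comparison.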

Deploying in the finished app

The app was built with Android Studio:
- https://drive.google.com/file/d/1lRNhjOfG7zE6PsdGsa1eTxpQocAFaQRu/view?usp=sharing

Text generator: takes about 10 seconds to produce output
Model size before conversion:
Model size after conversion: 500 MB


Attribution-NonCommercial-NoDerivatives (CC BY-NC-ND)


보리보리쌀 ;p
