VectorDB : Milvus

Open-source vector DBs

Milvus
Chroma
Weaviate

Trying it out

Milvus https://milvus.io/docs/example_code.md

Installation

wget https://github.com/milvus-io/milvus/releases/download/v2.3.4/milvus-standalone-docker-compose-gpu.yml -O docker-compose.yml

Running (docker-compose)

docker compose up -d

docker compose ps

Error…

Error response from daemon: driver failed programming external connectivity on endpoint milvus-minio (5ee5bd584bae5eaa1d26cbca38cc5799786490e0807ce42529f7c2360e23ec9c): Error starting userland proxy: listen tcp4 0.0.0.0:9001: bind: address already in use

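240223 retry

Reference

Quick Start for evaluating the adoption of a Vector Database (Milvus)

The earlier error was an "address already in use" error, so I changed the port mappings: host ports 9001 and 9000, which were already occupied locally, were changed to 9991 and 9990 respectively, and I ran it again.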
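Before editing docker-compose.yml, it can help to confirm which host ports are actually taken. The sketch below is not part of the Milvus tooling: `port_in_use` is a hypothetical helper that simply tries to bind each port on the local machine, and the port list is taken from the error above plus the Milvus service port.

```python
import socket


def port_in_use(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if binding host:port fails, i.e. something already holds it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return True
    return False


if __name__ == "__main__":
    # 9000/9001 are the MinIO ports from the error above; 19530 is the Milvus port.
    for port in (9000, 9001, 19530):
        state = "in use" if port_in_use(port) else "free"
        print(f"{port}: {state}")
```

Any port reported as "in use" is a candidate for remapping on the host side, as was done here with 9001 and 9000.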

Connecting with PyMilvus and designing the schema

Connect to port 19530 of the milvus-standalone service using pymilvus.connections.connect().

Creating a collection (similar to a table in an RDB)

Designing the schema used to create it

# Example
# We're going to create a collection with 3 fields.
# +-+------------+------------+------------------+------------------------------+
# | | field name | field type | other attributes |       field description      |
# +-+------------+------------+------------------+------------------------------+
# |1|    "pk"    |    Int64   |  is_primary=True |      "primary field"         |
# | |            |            |   auto_id=False  |                              |
# +-+------------+------------+------------------+------------------------------+
# |2|  "random"  |    Double  |                  |      "a double field"        |
# +-+------------+------------+------------------+------------------------------+
# |3|"embeddings"| FloatVector|     dim=8        |  "float vector with dim 8"   |
# +-+------------+------------+------------------+------------------------------+

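Running the code

from pymilvus import connections, Collection, CollectionSchema, DataType, FieldSchema

# Connect to milvus-standalone on port 19530 (host="localhost" is an assumption for a local docker setup).
connections.connect("default", host="localhost", port="19530")

# pk is the primary key; embeddings holds 8-dimensional float vectors.
fields = [
  FieldSchema(name="pk", dtype=DataType.INT64, is_primary=True, auto_id=False),
  FieldSchema(name="random", dtype=DataType.DOUBLE),
  FieldSchema(name="embeddings", dtype=DataType.FLOAT_VECTOR, dim=8)
]
schema = CollectionSchema(fields, description="hello_milvus is the simplest demo to introduce the APIs")
hello_milvus = Collection("hello_milvus", schema)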

Inserting data into the collection

Insert randomly generated sample data

import random

# Column-ordered data: one list per field, in schema order.
entities = [
  [i for i in range(3000)],  # field pk
  [float(random.randrange(-20, -10)) for _ in range(3000)],  # field random
  [[random.random() for _ in range(8)] for _ in range(3000)],  # field embeddings
]
insert_result = hello_milvus.insert(entities)
hello_milvus.flush()
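Note that `insert` here takes column-ordered data: one list per field, in schema order, rather than one record per entity. If the data starts out as rows, a small transpose does the conversion. This is a minimal sketch in plain Python; `rows_to_columns` is a hypothetical helper, not a PyMilvus function.

```python
import random


def rows_to_columns(rows):
    """Transpose [(pk, random, embedding), ...] into one list per field."""
    return [list(column) for column in zip(*rows)]


# Three sample rows shaped like the hello_milvus schema: pk, random, embeddings.
rows = [
    (i, float(random.randrange(-20, -10)), [random.random() for _ in range(8)])
    for i in range(3)
]
entities = rows_to_columns(rows)
print(len(entities), [len(column) for column in entities])
```

The result has the same shape as the `entities` list above, only with 3 rows instead of 3000.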

Preparing for search

Creating an index
- An index has to be built on the vector field of the collection created above.
```
index = {
  "index_type": "IVF_FLAT",
  "metric_type": "L2",
  "params": {"nlist": 128},
}

hello_milvus.create_index("embeddings", index)
```
- index_type: IVF_FLAT (IVF_FLAT divides vector data into nlist cluster units)
- metric_type: L2 (Euclidean distance)
- params
  - nlist: number of cluster units (1~65536, default=128)
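To make `nlist` and the L2 metric concrete, here is a toy IVF-style search in plain Python. It is only a sketch of the idea, not how Milvus implements IVF_FLAT: the centroids are sampled from the data instead of being trained with k-means, and `l2`, `build_ivf` and `ivf_search` are hypothetical names. `nprobe` is the search-time counterpart of `nlist`: how many clusters are actually scanned per query.

```python
import math
import random


def l2(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_ivf(vectors, nlist, seed=0):
    """Partition vector ids into nlist buckets by nearest centroid.

    Simplification: centroids are sampled from the data rather than trained.
    """
    rng = random.Random(seed)
    centroids = rng.sample(vectors, nlist)
    buckets = [[] for _ in range(nlist)]
    for idx, vec in enumerate(vectors):
        nearest = min(range(nlist), key=lambda c: l2(vec, centroids[c]))
        buckets[nearest].append(idx)
    return centroids, buckets


def ivf_search(query, vectors, centroids, buckets, nprobe, limit):
    """Scan only the nprobe buckets whose centroids are closest to the query."""
    order = sorted(range(len(centroids)), key=lambda c: l2(query, centroids[c]))
    candidates = [idx for c in order[:nprobe] for idx in buckets[c]]
    ranked = sorted(candidates, key=lambda idx: l2(query, vectors[idx]))
    return ranked[:limit]


rng = random.Random(42)
vectors = [[rng.random() for _ in range(8)] for _ in range(3000)]
centroids, buckets = build_ivf(vectors, nlist=128)

query = vectors[-1]
approx = ivf_search(query, vectors, centroids, buckets, nprobe=10, limit=3)
exact = ivf_search(query, vectors, centroids, buckets, nprobe=128, limit=3)
print("nprobe=10 :", approx)
print("nprobe=128:", exact)
```

With `nprobe` equal to `nlist` every bucket is scanned, so the result matches a brute-force search; smaller `nprobe` values trade some recall for less work.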

Loading the index & searching for similar results

# Load the collection into memory; search fails on a collection that is not loaded.
hello_milvus.load()

# Use the last two inserted embeddings as the query vectors.
vectors_to_search = entities[-1][-2:]
# e.g. [[0.64756871, 0.01197686, 0.5924749, 0.80828321, 0.70348865,
#        0.27177397, 0.90567045, 0.67848907],
#       [0.31415601, 0.15629844, 0.85363175, 0.77644426, 0.0821534,
#        0.07103991, 0.25484217, 0.46181565]]
search_params = {
    "metric_type": "L2",
    "params": {"nprobe": 10},
}
# A filter can optionally be added, e.g. expr="random > 0.5"
result = hello_milvus.search(vectors_to_search, anns_field="embeddings", param=search_params, limit=3, output_fields=["random"])

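Searching for the 3 most similar results among 3,000 entities took 0.3460s.
Adding a filter such as expr="random > 0.5" made it much faster. Note that the random values inserted above all lie between -20 and -11, so this particular filter matches no entities.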
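A timing like the one above can be taken with a small wrapper around `time.perf_counter`. This is a sketch: `timed` is a hypothetical helper, and the stand-in workload below only exists so the snippet runs without a Milvus server. In practice the wrapped call would be `hello_milvus.search(...)`.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


# Stand-in workload so the example is self-contained.
result, elapsed = timed(sorted, range(3000), reverse=True)
print(f"search latency = {elapsed:.4f}s")
```

Running the same call with and without `expr` through `timed` gives a like-for-like comparison of the two latencies.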

Deletion


from pymilvus import utility

# fmt is not defined in this excerpt; a simple banner format is assumed here.
fmt = "\n=== {:30} ===\n"

###############################################################################
# 6. delete entities by PK
# You can delete entities by their PK values using boolean expressions.
ids = insert_result.primary_keys

# pk is INT64 in the schema above, so the values are written without quotes.
expr = f"pk in [{ids[0]}, {ids[1]}]"
print(fmt.format(f"Start deleting with expr `{expr}`"))

result = hello_milvus.query(expr=expr, output_fields=["random", "embeddings"])
print(f"query before delete by expr=`{expr}` -> result: \n-{result[0]}\n-{result[1]}\n")

hello_milvus.delete(expr)

result = hello_milvus.query(expr=expr, output_fields=["random", "embeddings"])
print(f"query after delete by expr=`{expr}` -> result: {result}\n")

###############################################################################
# 7. drop collection
# Finally, drop the hello_milvus collection
print(fmt.format("Drop collection `hello_milvus`"))
utility.drop_collection("hello_milvus")
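How the `pk in [...]` expression is written depends on the primary key type: integer keys are listed bare, while VarChar keys need quotes. A minimal sketch of building it either way follows; `pk_in_expr` is a hypothetical helper, not part of PyMilvus, and it does not escape quotes inside string ids.

```python
def pk_in_expr(ids, field="pk"):
    """Build a boolean expression of the form `field in [...]`.

    String ids are wrapped in double quotes; integer ids are written bare.
    """
    parts = [f'"{value}"' if isinstance(value, str) else str(value) for value in ids]
    return f"{field} in [{', '.join(parts)}]"


print(pk_in_expr([0, 1]))        # pk in [0, 1]
print(pk_in_expr(["0", "1"]))    # pk in ["0", "1"]
```

The resulting string can be passed as `expr` to both `query` and `delete`.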

(rest omitted)


Attribution, NonCommercial, NoDerivatives


보리보리쌀 ;p
