
Section 5

Building a RAG-based conversational chatbot.

These are the required libraries:

!pip install streamlit-chat
!pip install streamlit
!pip install langchain
!pip install faiss-cpu

 

์„ค์น˜ ์ดํ›„ ์ฃผ์„์ฒ˜๋ฆฌ ํ•ด์ค๋‹ˆ๋‹ค

 

 

 

The data used for RAG is a PDF.

We load the PDF, embed the extracted text, and store it in FAISS.

ํ•„์š”ํ•œ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ ๋ถˆ๋Ÿฌ์˜ค๊ณ , open ai key๋ฅผ ์ž…๋ ฅํ•ฉ๋‹ˆ๋‹ค sk-

import streamlit as st
from streamlit_chat import message
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.chat_models import ChatOpenAI
from langchain.chains import ConversationalRetrievalChain
from langchain.vectorstores import FAISS
import tempfile
from langchain.document_loaders import PyPDFLoader

import os
os.environ["OPENAI_API_KEY"] = "sk"  # enter your OpenAI API key

uploaded_file = st.sidebar.file_uploader("upload", type="pdf")

 

The sidebar file uploader lets the user upload the PDF.

Next, the loaded data (the PDF) is embedded and stored in a FAISS vector DB:

if uploaded_file:
    # Save the uploaded PDF to a temporary file so PyPDFLoader can read it
    with tempfile.NamedTemporaryFile(delete=False) as tmp_file:
        tmp_file.write(uploaded_file.getvalue())
        tmp_file_path = tmp_file.name

    loader = PyPDFLoader(tmp_file_path)
    data = loader.load()

    # Embed the PDF text and store the vectors in FAISS
    embeddings = OpenAIEmbeddings()
    vectors = FAISS.from_documents(data, embeddings)

    chain = ConversationalRetrievalChain.from_llm(
        llm=ChatOpenAI(temperature=0.0, model_name='gpt-4'),
        retriever=vectors.as_retriever())

    def conversational_chat(query):  # keep past turns so the chain answers in context
        result = chain({"question": query, "chat_history": st.session_state['history']})
        st.session_state['history'].append((query, result["answer"]))
        return result["answer"]

    if 'history' not in st.session_state:
        st.session_state['history'] = []

    if 'generated' not in st.session_state:
        st.session_state['generated'] = ["Hello! Ask me anything about " + uploaded_file.name + "."]

    if 'past' not in st.session_state:
        st.session_state['past'] = ["Hello!"]

    # Container for the chat history
    response_container = st.container()
    # Container for the user's input
    container = st.container()

    with container:  # collect the user's question
        with st.form(key='Conv_Question', clear_on_submit=True):
            user_input = st.text_input("Query:", placeholder="Shall we talk about the PDF file? (:", key='input')
            submit_button = st.form_submit_button(label='Send')

        if submit_button and user_input:
            output = conversational_chat(user_input)

            st.session_state['past'].append(user_input)
            st.session_state['generated'].append(output)

    if st.session_state['generated']:
        with response_container:
            for i in range(len(st.session_state['generated'])):
                message(st.session_state["past"][i], is_user=True, key=str(i) + '_user',
                        avatar_style="fun-emoji", seed="Nala")
                message(st.session_state["generated"][i], key=str(i),
                        avatar_style="bottts", seed="Fluffy")

 

์‚ฌ์šฉ์ž๊ฐ€ query๋ฅผ ์ œ์ถœํ•˜๋ฉด conversational_chat() ๋ฉ”์†Œ๋“œ๊ฐ€ ํ˜ธ์ถœ๋ฉ๋‹ˆ๋‹ค.

๊ทธ๋ฆฌ๊ณ  history ๋ฆฌ์ŠคํŠธ์— (query, answer) ํ˜•ํƒœ๋กœ ๊ฐ’์ด ์ €์žฅ๋˜๊ณ ,

history๋ฅผ ํ†ตํ•ด ๋Œ€ํ™”๊ฐ€ ๊ธฐ๋ก๋˜๊ณ  ๊ธฐ๋ก๋œ ๋Œ€ํ™”๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ ๋ฌธ๋‹ต์ด ์ด์–ด์ง‘๋‹ˆ๋‹ค.

 

- past, generated๋Š” streamlit ui์—์„œ ์ฒ˜๋ฆฌ๋˜๋Š” ๋ฌธ๋‹ต ๋ฐฐ์—ด
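The history mechanism above can be sketched without Streamlit or LangChain. This is a purely illustrative stand-in: fake_chain replaces the real ConversationalRetrievalChain, and a plain list plays the role of st.session_state['history'].

```python
# Stdlib-only sketch of the (query, answer) history mechanism.
# fake_chain is a stand-in for the real ConversationalRetrievalChain.
def fake_chain(inputs):
    n = len(inputs["chat_history"])  # how many past turns the chain can see
    return {"answer": f"answer to {inputs['question']!r} (saw {n} past turns)"}

history = []  # plays the role of st.session_state['history']

def conversational_chat(query):
    result = fake_chain({"question": query, "chat_history": history})
    history.append((query, result["answer"]))
    return result["answer"]

print(conversational_chat("What is this PDF about?"))
print(conversational_chat("Summarize page 2"))
```

Every call sees one more recorded turn than the last, which is exactly how the real chain gains conversational context.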

 

 

Then run it with Streamlit, and it works nicely.
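For reference, the app is launched from a terminal with Streamlit (assuming the code above was saved as chat_pdf.py, a hypothetical filename):

```shell
streamlit run chat_pdf.py
```

Streamlit then serves the app at a local URL (by default http://localhost:8501).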

 

 

 


 

 

 

 

Section 6

Building a translation service.

Install the libraries:

!pip install langchain
!pip install streamlit
!pip install openai

 

 

Once the installs finish, comment those lines out.

Then import the libraries and set the API key:

import streamlit as st
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain.memory import ConversationBufferMemory
import os
os.environ["OPENAI_API_KEY"] = "sk"  # enter your OpenAI API key

 

 

 

Next, we build a template that instructs the model to translate the prompt text into the selected language:

# ์›นํŽ˜์ด์ง€์— ๋ณด์—ฌ์งˆ ๋‚ด์šฉ
langs = ["Korean", "Japanese", "chinese", "English"]  #๋ฒˆ์—ญ์„ ํ•  ์–ธ์–ด๋ฅผ ๋‚˜์—ด
left_co, cent_co,last_co = st.columns(3)

#์›นํŽ˜์ด์ง€ ์™ผ์ชฝ์— ์–ธ์–ด๋ฅผ ์„ ํƒํ•  ์ˆ˜ ์žˆ๋Š” ๋ผ๋””์˜ค ๋ฒ„ํŠผ 
with st.sidebar:
     language = st.radio('๋ฒˆ์—ญ์„ ์›ํ•˜๋Š” ์–ธ์–ด๋ฅผ ์„ ํƒํ•ด์ฃผ์„ธ์š”.:', langs)

st.markdown('### ์–ธ์–ด ๋ฒˆ์—ญ ์„œ๋น„์Šค์˜ˆ์š”~')
prompt = st.text_input('๋ฒˆ์—ญ์„ ์›ํ•˜๋Š” ํ…์ŠคํŠธ๋ฅผ ์ž…๋ ฅํ•˜์„ธ์š”')  #์‚ฌ์šฉ์ž์˜ ํ…์ŠคํŠธ ์ž…๋ ฅ

trans_template = PromptTemplate(
    input_variables=['trans'],
    template='Your task is to translate this text to ' + language + 'TEXT: {trans}'
)  #ํ•ด๋‹น ์„œ๋น„์Šค๊ฐ€ ๋ฒˆ์—ญ์— ๋Œ€ํ•œ ๊ฒƒ์ž„์„ ์ง€์‹œ

#momory๋Š” ํ…์ŠคํŠธ ์ €์žฅ ์šฉ๋„
memory = ConversationBufferMemory(input_key='trans', memory_key='chat_history')

llm = ChatOpenAI(temperature=0.0,model_name='gpt-4')
trans_chain = LLMChain(llm=llm, prompt=trans_template, verbose=True, output_key='translate', memory=memory)

# ํ”„๋กฌํ”„ํŠธ(trans_template)๊ฐ€ ์žˆ์œผ๋ฉด ์ด๋ฅผ ์ฒ˜๋ฆฌํ•˜๊ณ  ํ™”๋ฉด์— ์‘๋‹ต์„ ์ž‘์„ฑ
if st.button("๋ฒˆ์—ญ"):
    if prompt:
        response = trans_chain({'trans': prompt})
        st.info(response['translate'])

 

 

ConversationBufferMemory stores the conversation context and is passed to LLMChain through its memory parameter.

It appears to be there so the model can take the conversation history into account when producing a translation.
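Conceptually, a buffer memory just accumulates the raw turns and replays them as one string. Here is a minimal stdlib-only stand-in (not the real LangChain class, just an illustration of the idea behind input_key and memory_key):

```python
# Stand-in showing what ConversationBufferMemory does conceptually:
# it appends each (input, output) pair to a buffer and replays the
# whole buffer as a single string under its memory_key.
class BufferMemorySketch:
    def __init__(self, input_key, memory_key):
        self.input_key = input_key
        self.memory_key = memory_key
        self.turns = []

    def save_context(self, inputs, outputs):
        # store this turn's human input and AI output
        self.turns.append((inputs[self.input_key], outputs["translate"]))

    def load_memory_variables(self):
        buffer = "\n".join(f"Human: {h}\nAI: {a}" for h, a in self.turns)
        return {self.memory_key: buffer}

memory = BufferMemorySketch(input_key='trans', memory_key='chat_history')
memory.save_context({'trans': 'hello'}, {'translate': 'bonjour'})
print(memory.load_memory_variables()['chat_history'])
```

The real class works the same way at a high level: input_key picks which chain input to record, and memory_key is the variable name under which the replayed history is exposed.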

 

 

 

 

์ž˜ ๋™์ž‘ํ•˜๋Š” ๋ชจ์Šต

 

 


 

 

Section 7

A program that writes emails for you.

First install the libraries:

!pip install streamlit
!pip install langchain
!pip install openai

 

์ฃผ์„์ฒ˜๋ฆฌํ•ด์ค๋‹ˆ๋‹ค (์ŠคํŠธ๋ฆผ๋ฆฟ์œผ๋กœ ์‹คํ–‰ํ•ด์•ผํ•ด์„œ)

 

 

 

 

์ดํ›„ open ai key๋ฅผ ๋“ฑ๋กํ•ด์ฃผ๊ณ 

import streamlit as st
import os
os.environ["OPENAI_API_KEY"] = "sk"  # enter your OpenAI API key

st.set_page_config(page_title="Email Writing Service", page_icon=":robot:")
st.header("Email Writer")

 

streamlit์˜ header์™€ page title์„ ์ž‘์„ฑํ•ด์ค๋‹ˆ๋‹ค.

 

 

 

 

์–ด๋–ค ๋‚ด์šฉ์œผ๋กœ ์ด๋ฉ”์ผ์„ ์ž‘์„ฑํ• ์ง€ ์ž…๋ ฅ๋ฐ›๋Š” ์ฝ”๋“œ๋ฅผ ์ž‘์„ฑํ•ฉ๋‹ˆ๋‹ค

def getEmail():
    input_text = st.text_area(label="Email input", label_visibility='collapsed',
                              placeholder="What kind of email would you like to write?", key="input_text")
    return input_text

input_text = getEmail()

 

 

 

Write the template to be used with PromptTemplate (a template asking the model to write the email):

# ์ด๋ฉ”์ผ ๋ณ€ํ™˜ ์ž‘์—…์„ ์œ„ํ•œ ํ…œํ”Œ๋ฆฟ ์ •์˜
query_template = """
    ๋ฉ”์ผ์„ ์ž‘์„ฑํ•ด์ฃผ์„ธ์š”.    
    ์•„๋ž˜๋Š” ์ด๋ฉ”์ผ์ž…๋‹ˆ๋‹ค:
    ์ด๋ฉ”์ผ: {email}
"""

 

 

 

 

PromptTemplate ์ธ์Šคํ„ด์Šค๋ฅผ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค

from langchain import PromptTemplate
# Create the PromptTemplate instance
prompt = PromptTemplate(
    input_variables=["email"],
    template=query_template,
)
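What prompt.format(email=...) produces is just the template with the {email} slot filled in. A plain-Python illustration of that substitution (template_text and the sample email are made up for this sketch):

```python
# Plain-Python illustration of what PromptTemplate.format does here:
# substitute the input variable into the template string.
template_text = """
    Please write an email.
    Below is the email content:
    Email: {email}
"""

filled = template_text.format(email="Asking for a deadline extension")
print(filled)
```

The filled string is what later gets sent to the language model.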

 

 

 

Then write a function that returns the GPT LLM:

from langchain.chat_models import ChatOpenAI
# Load the language model
def loadLanguageModel():
    llm = ChatOpenAI(temperature=0.0, model_name='gpt-4')
    return llm

 

 

 

 

 

์ดํ›„ ์ž…๋ ฅ๋ฐ›์€ ๋‚ด์šฉ์„ template ํ˜•ํƒœ๋กœ ํ•˜์—ฌ llm prompt๋กœ ํ˜ธ์ถœํ•ฉ๋‹ˆ๋‹ค

# Show an example email
st.button("*Show me an example*", type='secondary', help="See an email written by the bot.")
st.markdown("### The email the bot wrote:")

if input_text:
    llm = loadLanguageModel()
    # Fill the PromptTemplate with the user's input and send it to the language model
    prompt_with_email = prompt.format(email=input_text)
    formatted_email = llm.predict(prompt_with_email)
    # Display the formatted email
    st.write(formatted_email)

 

 

 

Then run it with Streamlit, and it works nicely.

 

 

 


 

 

์—ฌ๋Ÿ๋ฒˆ์งธ ์„น์…˜

CSVํŒŒ์ผ ๋ถ„์„ ์„œ๋น„์Šค ๋งŒ๋“ค๊ธฐ ์ž…๋‹ˆ๋‹ค.

 

 

๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ ์„ค์น˜ํ•ฉ๋‹ˆ๋‹ค

!pip install langchain-experimental
!pip install tabulate
!pip install pandas
!pip install openai

 

pandas is used for data manipulation and analysis; tabulate is a library for printing tabular data in a readable form.

 

 

csv ํŒŒ์ผ์„ ๊ฒฝ๋กœ์— ๋งž๊ฒŒ ์„ค์ •ํ•ด์ค๋‹ˆ๋‹ค.

import pandas as pd  # library for analyzing and manipulating data in Python

# Load the CSV file into a DataFrame
df = pd.read_csv('Order_v3.csv', encoding='ISO-8859-1')  # path to the file
df.head()

 

 

 

Enter the API key and write the code that creates a pandas LangChain agent:

from langchain_experimental.agents.agent_toolkits import create_pandas_dataframe_agent
from langchain.chat_models import ChatOpenAI
from langchain.agents.agent_types import AgentType
import os
os.environ["OPENAI_API_KEY"] = "sk"  # enter your OpenAI API key

# Create the agent
agent = create_pandas_dataframe_agent(
    ChatOpenAI(temperature=0.7, model='gpt-4o'),  # uses gpt-4o
    df,              # the DataFrame holding the data
    verbose=False,   # don't print the reasoning steps
    agent_type=AgentType.OPENAI_FUNCTIONS,
    allow_dangerous_code=True,
)

 

 

 

Then, in a Jupyter notebook:

agent.run('Summarize what kind of CSV file this is')

 

์งˆ๋ฌธ์„ ๋‹ด์•„ agent ๋ฅผ ์‹คํ–‰ํ•ด์ค๋‹ˆ๋‹ค

 

 

 

 

์ž˜ ๋‚˜์˜ค๋Š” ๋ชจ์Šต
