
Different prompt token counts between the OpenAI Tokenizer / Azure OpenAI and the OpenAI API via the Python library – API



Hi!

I’m testing the “bring your own data” option for ChatGPT, and I noticed that the number of prompt tokens differs between the OpenAI Tokenizer (or Azure OpenAI) and the OpenAI Python library (openai==1.7.0 or openai==0.27.7): via the API, the usage reports 4–5x more prompt tokens.

I defined the AzureCognitiveSearch connector to search my documents. Here is my chat completion example:

message_text = [
    {"role": "system", "content": "You are an AI assistant that helps people find information."},
    {"role": "user", "content": myQuestion}
]

completion = openai.ChatCompletion.create(
    engine="gpt-35-turbo",
    messages=message_text,
    temperature=0.7,
    max_tokens=800,
    top_p=0.95,
    frequency_penalty=0,
    presence_penalty=0,
    stop=None,
    extra_body={
        "dataSources": [
            {
                "type": "AzureCognitiveSearch",
                "parameters": {
                    "endpoint": ###My Azure AI Search Endpoint,
                    "key": "XXXXXXXXXXX",
                    "indexName": "XXXXXXXX-vector-index",
                    "embeddingDeploymentName": "text-embedding-ada-002",
                    "embeddingEndpoint": ###My Azure OpenAI Endpoint,
                    "embeddingKey": "XXXXXXXXXX",
                    "fieldsMapping": {
                        "contentFields": ["Content"],
                        "titleField": "title",
                        "urlField": "HTML_URL",
                        "vectorFields": ["ContentVector"]
                    },
                    "queryType": "vectorSemanticHybrid",
                    "semanticConfiguration": "XXXXXXXX-semantic-config",
                    "inScope": False,
                    "topNDocuments": 5,
                    "strictness": 3
                }
            }
        ]
    }
)

Does anyone have the same problem, or any clues as to why this happens?
I tried to check what is passed by the “AzureCognitiveSearch” connector, but I couldn’t find any relevant information.
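One possible explanation (an assumption on my part, not confirmed from your logs): with “on your data”, the service retrieves topNDocuments chunks from the index and injects them into the prompt server-side, so usage.prompt_tokens counts those chunks (plus the connector’s own instructions) on top of your visible messages. A back-of-the-envelope sketch with hypothetical chunk sizes:

```python
def rough_token_estimate(text: str) -> int:
    """Very rough heuristic: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

# The only text visible in message_text above.
system_prompt = "You are an AI assistant that helps people find information."
my_question = "What is in my documents?"  # hypothetical stand-in for myQuestion

visible_tokens = rough_token_estimate(system_prompt) + rough_token_estimate(my_question)

# Suppose each retrieved chunk is ~1500 characters (a hypothetical size) and
# topNDocuments=5, as in the request above: those chunks are appended to the
# prompt before the model sees it.
chunk_chars = 1500
injected_tokens = 5 * rough_token_estimate("x" * chunk_chars)

print(f"tokens you can count yourself: ~{visible_tokens}")
print(f"tokens after chunk injection:  ~{visible_tokens + injected_tokens}")
```

With numbers in that range, the injected context alone would account for a 4–5x (or larger) gap between what the OpenAI Tokenizer shows for your messages and what the API bills.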

Thanks!



