All I’m trying to do is use an AI language model to summarize text. The text comes from an HTML form, specifically a textarea element; typically it will be an article or a paragraph the user found online. I’m using the facebook/bart-large-cnn model. I tried other models, but they don’t summarize very well; if there is a better model for summarization, let me know and I’ll try it. The code works, but each request can take 15 to 20 seconds. I’m new to learning AI, so sorry if my code isn’t the greatest looking; still learning. How can I make this faster?
The app uses Flask (for the downvoter). This runs on my local machine, so there is no network latency.
# Imports assumed from context; SummarizeForm is the WTForms class defined elsewhere.
from flask import request, render_template
from markupsafe import escape
from transformers import BartForConditionalGeneration, BartTokenizer

def summarizer():
    form = SummarizeForm(request.form)
    if request.method == 'POST' and form.validate():
        text = escape(form.text.data)
        summary_length = form.summary_length.data.lower()  # normalize to lowercase

        # Cap the input size (adjust as needed).
        max_word_count = 3000

        # NOTE: the model and tokenizer are loaded from disk on every request.
        model_name = "facebook/bart-large-cnn"
        model = BartForConditionalGeneration.from_pretrained(model_name)
        tokenizer = BartTokenizer.from_pretrained(model_name)

        # model_max_length is a token limit, so this word-count cap is a rough proxy.
        if tokenizer.model_max_length < max_word_count:
            max_word_count = tokenizer.model_max_length

        # Reject text that exceeds the cap.
        word_count = len(text.split())
        if word_count > max_word_count:
            error = f'The text exceeds the maximum word count of {max_word_count} words.'
            return render_template('result.html', text=text, summary=error)

        # Scale min_length with the input length.
        min_length_factor = 0.1  # adjust this factor based on your preferences
        min_length = int(word_count * min_length_factor)

        # Add a fixed offset per requested summary length.
        if summary_length == 'short':
            min_length += 50
        elif summary_length == 'medium':
            min_length += 150
        elif summary_length == 'long':
            min_length += 300
        else:
            return render_template('result.html', error="Invalid summary length selected.")
        max_length = min_length + 300

        # BART does not need a "summarize: " prefix (that is a T5 convention).
        inputs = tokenizer.encode(text, return_tensors="pt", max_length=1024, truncation=True)
        summary_ids = model.generate(
            inputs,
            max_length=max_length,
            min_length=min_length,
            length_penalty=2.0,
            num_beams=4,
            early_stopping=True,
            no_repeat_ngram_size=2,
        )
        summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
        return render_template('result.html', text=text, summary=summary)

    return render_template('index.html', error="Invalid form submission.")
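For reference, here is the length budget the view computes, pulled into a standalone helper so the arithmetic is easy to check (the function name and tuple return are my own, not part of the app):

```python
def summary_length_budget(word_count: int, summary_length: str,
                          min_length_factor: float = 0.1) -> tuple[int, int]:
    """Mirror the min_length/max_length generation limits from the view."""
    offsets = {'short': 50, 'medium': 150, 'long': 300}
    if summary_length not in offsets:
        raise ValueError("Invalid summary length selected.")
    min_length = int(word_count * min_length_factor) + offsets[summary_length]
    max_length = min_length + 300
    return min_length, max_length

# e.g. a 1000-word article with a 'short' summary:
# min_length = 100 + 50 = 150, max_length = 150 + 300 = 450
```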
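On the speed question: since the model and tokenizer are re-loaded from disk on every request, a large share of the 15 to 20 seconds is likely spent before generation even starts. One common pattern is to load them once per process and reuse them across requests; a minimal stdlib sketch of that caching pattern (a string placeholder stands in for the real from_pretrained calls so the sketch stays self-contained):

```python
from functools import lru_cache

LOAD_CALLS = 0  # instrumentation to show the expensive load happens only once

@lru_cache(maxsize=None)
def load_summarizer(model_name: str):
    """First call does the expensive load; later calls return the cached result.

    In the real app this would call
    BartForConditionalGeneration.from_pretrained(model_name) and
    BartTokenizer.from_pretrained(model_name) instead of building a string.
    """
    global LOAD_CALLS
    LOAD_CALLS += 1
    return f"loaded:{model_name}"

# Two requests, one load:
first = load_summarizer("facebook/bart-large-cnn")
second = load_summarizer("facebook/bart-large-cnn")
```

With the real objects, the same effect comes from loading the model and tokenizer at module import time rather than inside the view; either way, the per-request cost drops to just tokenization and generation.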