artificial intelligence – My general text summarizer written in Python is too slow, how to improve speed?


I'm trying to use an AI language model to summarize text. The text comes from an HTML form, specifically a textarea element, and will most likely be an article or paragraph the user found online. I'm using the facebook/bart-large-cnn model; I tried other models, but they don't summarize very well. If there is a better model for summarizing text, let me know and I'll try it. The code works, but a single summary can take 15 to 20 seconds. I'm new to AI, so sorry if my code isn't the greatest looking. How can I make this faster?

The app uses Flask, for the downvoter. Everything runs on my local machine, so there is no network latency.

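from flask import render_template, request
from markupsafe import escape
from transformers import BartForConditionalGeneration, BartTokenizer

# SummarizeForm is the app's own form class, defined elsewhere.
def summarizer():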
    form = SummarizeForm(request.form)
    if request.method == 'POST' and form.validate():
        text = escape(form.text.data)
        summary_length = form.summary_length.data.lower()  # Convert to lowercase

        # Define a maximum word count (adjust as needed)
        max_word_count = 3000

        # Load the model and tokenizer (note: this happens on every request)
        model_name = "facebook/bart-large-cnn"
        model = BartForConditionalGeneration.from_pretrained(model_name)
        tokenizer = BartTokenizer.from_pretrained(model_name)

        # Cap the word limit at the tokenizer's maximum input length
        max_word_count = min(max_word_count, tokenizer.model_max_length)

        # Check if the word count exceeds the maximum
        word_count = len(text.split())
        if word_count > max_word_count:
            error = f'The text exceeds the maximum word count of {max_word_count} words.'
            return render_template('result.html', text="error", summary=error)

        # Calculate min_length based on input text length (reuses word_count from above)
        min_length_factor = 0.1  # Adjust this factor based on your preferences
        min_length = int(word_count * min_length_factor)

        # Raise min_length according to the requested summary length
        if summary_length == 'short':
            min_length = min_length + 50
        elif summary_length == 'medium':
            min_length = min_length + 150
        elif summary_length == 'long':
            min_length = min_length + 300
        else:
            return render_template('result.html', error="Invalid summary length selected.")

        # Allow the summary to run up to 300 tokens past the minimum
        max_length = min_length + 300

        # Tokenize the input, truncating it to 1024 tokens
        inputs = tokenizer.encode(
            "summarize: " + text, return_tensors="pt", max_length=1024, truncation=True
        )
        # Generate the summary with beam search
        summary_ids = model.generate(
            inputs,
            max_length=max_length,
            min_length=min_length,
            length_penalty=2.0,
            num_beams=4,
            early_stopping=True,
            no_repeat_ngram_size=2,
        )
        # Decode the generated token IDs back into text
        summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)

        return render_template('result.html', text=text, summary=summary)

    return render_template('index.html', error="Invalid form submission.")
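
A likely contributor to the delay is that the view calls from_pretrained for both the model and the tokenizer inside the request handler, so the weights are reloaded on every POST. The following is a minimal sketch of loading them once per process and reusing them, not a definitive fix. The helper names get_model_and_tokenizer and summarize are hypothetical, and the sketch assumes the transformers package with a PyTorch backend is installed. It also passes the text without the "summarize: " prefix, on the assumption that this prefix is a convention from T5-style models and is not needed by BART.

```python
from functools import lru_cache

MODEL_NAME = "facebook/bart-large-cnn"


@lru_cache(maxsize=1)
def get_model_and_tokenizer(model_name=MODEL_NAME):
    """Load the model and tokenizer once; later calls return the cached pair."""
    # Imported lazily so that importing this module stays cheap.
    from transformers import BartForConditionalGeneration, BartTokenizer

    tokenizer = BartTokenizer.from_pretrained(model_name)
    model = BartForConditionalGeneration.from_pretrained(model_name)
    model.eval()  # evaluation mode: disables dropout
    return model, tokenizer


def summarize(text, min_length, max_length, num_beams=4):
    """Summarize text with the cached model (hypothetical helper)."""
    model, tokenizer = get_model_and_tokenizer()
    # Tokenize and truncate to 1024 tokens, as in the original view.
    inputs = tokenizer(text, return_tensors="pt", max_length=1024, truncation=True)
    summary_ids = model.generate(
        **inputs,
        min_length=min_length,
        max_length=max_length,
        num_beams=num_beams,
        length_penalty=2.0,
        early_stopping=True,
        no_repeat_ngram_size=2,
    )
    return tokenizer.decode(summary_ids[0], skip_special_tokens=True)
```

Inside the view, the two from_pretrained lines would then become a single call to get_model_and_tokenizer(), so only the first request pays the loading cost. If generation itself is still slow, lowering num_beams or running on a GPU may help further, though fewer beams can reduce summary quality.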


