Bibliometrics is a technique frequently employed to assess the performance of research undertaken by institutions and organizations (Kumar et al. 2021). The term is derived from “biblio,” which refers to books or literature, and “metrics,” which pertains to measurement. By examining variables such as article counts, citations, and reviews, bibliometrics can provide a quantitative assessment of research output (xxx 2021). This type of analysis is useful for pinpointing areas for future research and for showing authors where they can contribute to the field. Performance analysis and science mapping (Noyons et al. 1999) are the two basic methods used in bibliometric analysis. Performance analysis concentrates on the contributions of individual research components, such as authors and publications, whereas science mapping emphasises the links and interconnections between research components, such as themes and areas.
4.1 Overview
4.1.1 Basic information of dataset
Fig. 2 illustrates the basic description of the 109 studies extracted and filtered from the Scopus database and used in this manuscript. The retrieved information is categorized into primary data, document contents, authors, author collaborations, and document types. The selected studies include a total of 1345 references and come from 65 journal and book sources. On average, each paper cites 13.52 articles, and the dataset shows an annual growth rate of 42.18%. The papers feature 574 keywords, including 241 unique author keywords, and were written by 337 authors. In terms of collaboration, the dataset shows an average of 4.33 co-authors per document and 11.93% international co-authorship, with three authors of single-authored documents. The publications are classified into document types such as book chapters, conference papers, and reviews.
4.1.2 Scientific publications per year
To assess changes in apple disease research, the study collected publication data on apple leaf disease covering the period from 2011 to 2022. The graph in Fig. 3 shows that the area is continuously evolving, with a noticeable surge in publications beginning in 2018. Although the exact number of publications for 2022 has not yet been finalised (because the search is ongoing), the topic is clearly recent and has gained greater interest in the last few years.
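The annual growth rate quoted in the dataset summary can be computed as a compound rate between the first and last years of the yearly publication counts. The following is a minimal sketch under that assumption; the function name and the example counts are hypothetical and are not the study’s data.

```python
def annual_growth_rate(counts_by_year):
    """Compound annual growth rate (%) between the first and last year.

    counts_by_year maps a year to the number of publications in that year.
    Assumption: growth is measured only from the first and last years.
    """
    years = sorted(counts_by_year)
    first, last = years[0], years[-1]
    if last == first or counts_by_year[first] == 0:
        raise ValueError("need two distinct years and a non-zero first count")
    ratio = counts_by_year[last] / counts_by_year[first]
    return (ratio ** (1 / (last - first)) - 1) * 100


# Hypothetical counts for illustration only (not the study's data).
example_counts = {2018: 2, 2019: 4, 2020: 8}
print(f"{annual_growth_rate(example_counts):.2f}%")
```

Because only the endpoints enter the formula, a partially indexed final year (such as 2022 here) would lower the computed rate.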
4.1.3 Three field plot
A Sankey diagram (Riehmann et al. 2005; Koo 2021) was created to show graphically the distribution of research subjects among countries and the recency of the cited references. The plot uses title terms, author countries, and author keywords (the “TI_TM”, “AU_CO”, and “DE” fields, respectively) and gives a clear picture of the proportion of study subjects for each country. Figure 4 represents the interconnections among these three components of machine-learning-based apple leaf disease detection. The left side of the graph displays the 10 title terms most often used by researchers; terms such as “leaf,” “apple,” “disease,” “neural,” “deep,” and “learning” dominate the literature on this topic. The middle column lists the 10 countries that contribute the most, including China, India, and Saudi Arabia, which highlights the worldwide scope of this area of study. The right side lists the 10 most frequent keywords, which correspond to disease identification methods, including “Deep Learning,” “Transfer Learning,” “CNN,” “Machine Learning,” “Image Processing,” and “Feature Extraction.” These are the tools that researchers use to detect apple leaf diseases.
The interplay between these three fields is apparent: title terms are closely associated with the methodologies used, which are in turn shaped by the research agendas of different countries and regions. Understanding these relationships helps to identify the direction of future research and to devise efficient decision support systems for disease control in apple orchards.
4.2 Sources
4.2.1 Most relevant sources
Sources that are frequently cited and have a high level of impact are considered the most significant in a particular research field. Knowing these sources helps researchers stay informed about advancements in their field and decide which sources to prioritize in their own work. The findings identify the most relevant publication sources for this research area. With nine publications included in this study, Fig. 5 shows that “Lecture Notes in Networks and Systems” is the most significant source. “Frontiers in Plant Science,” “Computers and Electronics in Agriculture,” “IEEE Access,” “Sensors,” and “Symmetry” are also significant sources for this research field. These findings underscore the importance of drawing on a diverse set of sources to gain a comprehensive understanding of a research topic.
4.2.2 Law of Bradford
Bradford’s law (Thompson and Walker 2015) is a bibliometric principle that describes the distribution of scientific literature among journals. The law posits that, when journals in a particular field are ranked by article productivity, they can be divided into a small core of highly productive journals and successive zones that each contain roughly the same number of articles but progressively more journals. This means that a few top-ranked journals account for a disproportionate share of the published articles. The law is named after Samuel C. Bradford, who observed this pattern while examining the distribution of papers in applied geophysics and lubrication journals (Ikpaahindi 1985). Fig. 6 shows that “Lecture Notes in Networks and Systems” is the most productive source (rank 1, frequency 9, cumulative frequency 9, Zone 1), followed by “Frontiers in Plant Science”, “Computers and Electronics in Agriculture”, “IEEE Access”, “Computers, Materials and Continua”, “Sensors”, “Symmetry” and “2022 IEEE Delhi Section Conference, Delcon 2022”.
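The rank, frequency, cumulative frequency, and zone columns reported for Bradford’s law can be derived from per-source article counts. The sketch below is one simple way to do so, assuming three zones that each cover about one third of the articles; the function name, the zone rule, and the source counts are illustrative assumptions rather than the procedure used to produce Fig. 6.

```python
def bradford_zones(source_counts, n_zones=3):
    """Rank sources by article count and assign each to a Bradford zone.

    Returns a list of (rank, source, frequency, cumulative_frequency, zone).
    """
    # Most productive sources first; ties are broken alphabetically.
    ranked = sorted(source_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    total = sum(source_counts.values())
    rows, cumulative = [], 0
    for rank, (source, freq) in enumerate(ranked, start=1):
        cumulative += freq
        # Zone k holds the sources whose cumulative count falls within
        # the k-th equal slice of the total article count.
        zone = min(n_zones, (cumulative - 1) * n_zones // total + 1)
        rows.append((rank, source, freq, cumulative, zone))
    return rows


# Hypothetical counts for illustration only (not the study's data).
counts = {"A": 6, "B": 3, "C": 3, "D": 2, "E": 2, "F": 1, "G": 1}
for row in bradford_zones(counts):
    print(row)
```

With these hypothetical counts, each zone holds six articles while the zones contain one, two, and four sources, which is the widening pattern the law describes.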
4.2.3 Annual source production
In bibliometric analysis, “source production over time” refers to the number of publications produced within a specific field, by an individual author or group of authors, or by a particular institution during a designated time frame. This information can be obtained by analyzing bibliographic data from databases such as Scopus or Web of Science, which provide the publication date, author, journal, and citation counts of each publication. Fig. 7 displays the pattern of source production over time, with distinct colors indicating different time frames. It shows a consistent upward trend in the number of articles produced between 2018 and 2022, reaching its maximum value during this period. This indicates a noteworthy level of research activity during these years, possibly influenced by factors such as technological advancements, increased funding, or emerging research areas.
4.3 Authors
4.3.1 Most significant authors
The term “most significant author” in bibliometrics refers to an individual who has made a notable and impactful contribution to a specific research field. This determination often relies on bibliographic data and citation analysis, which can reveal an author’s body of work and the extent of their influence. To identify the most significant authors in a field, bibliometric analysis may consider metrics such as the total number of publications, citation counts, the h-index, and other indicators of scholarly impact. Fig. 8 shows that “LIU B” has the highest number of publications among the authors featured, indicating a noteworthy contribution to the field of apple leaf disease detection. The remaining authors are listed in decreasing order of publication count.
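Of the metrics mentioned above, the h-index is the least obvious to compute: an author has index h if h of their papers have at least h citations each. A minimal sketch follows; the function name and the citation counts are hypothetical.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for position, cites in enumerate(ranked, start=1):
        if cites >= position:
            h = position
        else:
            break
    return h


# Hypothetical citation counts for one author's papers.
print(h_index([10, 8, 5, 4, 3]))
```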
4.3.2 Most significant affiliation
To determine the Most Significant Affiliation (MSA) of a research output, bibliometric tools such as the Affiliation Index and the Fractional Count approach are used to analyze the affiliations of the authors listed in each publication and to estimate the relative contribution of each institution. Fig. 9 shows that the institution with the most affiliations in this research area is the “Northwestern University of Agricultural and Forestry Science and Technology,” with 59 articles. In contrast, “JILIN AGRICULTURAL UNIVERSITY” and “THIAGARAJAR COLLEGE OF ENGINEERING” had the fewest among the institutions shown, with only 8 articles each.
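Fractional counting can be illustrated as follows: each paper carries one unit of credit, which is split equally among its authors and then summed per institution. This is a minimal sketch of that common scheme, not necessarily the exact variant behind Fig. 9; the function name and the institution names are hypothetical.

```python
from collections import defaultdict


def fractional_counts(papers):
    """Credit institutions by fractional counting.

    papers is a list of papers; each paper is a list with one affiliation
    per author. Every paper contributes a total credit of 1.
    """
    credit = defaultdict(float)
    for affiliations in papers:
        if not affiliations:
            continue  # skip papers with no affiliation data
        share = 1 / len(affiliations)
        for institution in affiliations:
            credit[institution] += share
    return dict(credit)


# Hypothetical affiliation data for three papers.
papers = [
    ["Univ A", "Univ A", "Univ B"],
    ["Univ B"],
    ["Univ A", "Univ C"],
]
print(fractional_counts(papers))
```

Unlike full counting, the credits sum to the number of papers, so heavily co-authored papers do not inflate institutional totals.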
4.3.3 Most significant cited countries
The term “Most Significant Cited Countries” (MSCC) pertains to countries that have made the most substantial contributions to the scholarly literature in a particular field. The identification of MSCC is based on the frequency with which scholars from that country are cited in academic works within that area of study. To determine the MSCC, citation patterns within a specific study area, such as a discipline or subfield, are analyzed using bibliometric data sources such as Scopus and the Essential Science Indicators (ESI) database.
Based on the citation breakdown by country in Fig. 10, “China” emerged as the leading country, with 323 citations. “India” came second with 248 citations, followed by “Pakistan”, “Turkey”, “Spain”, “Indonesia”, “Norway”, “Saudi Arabia” and “Mexico”.
4.4 Document
4.4.1 Most frequent words
In bibliometric analysis, as in other text analysis methods, stopwords and punctuation are usually removed from the text before the most frequently occurring words are determined; the remaining words are then counted and ranked. The text analysed is typically drawn from publication titles, abstracts, keywords, or full-text articles. The most frequent terms reveal the most popular themes, keywords, or study areas within the data. Deep learning, apple leaf disease detection, malus, precision agriculture, and feature extraction were among the terms found in this study. According to the results shown in Fig. 11, the phrases “Deep Learning” and “Apple Leaf Disease Detection” have the highest frequencies, with 64 and 41 occurrences, respectively.
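The counting procedure described above can be sketched in a few lines. The stopword list, function name, and example titles are illustrative assumptions; a real analysis would use a fuller stopword list and the study’s own records.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to",
             "using", "with"}


def most_frequent_words(texts, top_n=10):
    """Lowercase, strip punctuation, drop stopwords, and count words."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(top_n)


# Hypothetical titles for illustration only.
titles = [
    "Apple leaf disease detection using deep learning",
    "Deep learning for apple leaf disease",
    "Feature extraction of apple leaves",
]
print(most_frequent_words(titles, top_n=5))
```

Note that this sketch counts single words; multi-word phrases such as “Deep Learning” would need to be counted as keyword strings or n-grams instead.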
4.4.2 Word cloud
A word cloud (Heimerl et al. 2014) is a graphic depiction of the words that appear most frequently in a corpus, dataset, or collection of texts. In bibliometrics, word clouds can be used to examine the main themes, subjects, and areas of study within a field or discipline. The size of each word in the cloud corresponds to its frequency in the text corpus. Software tools such as Wordle, TagCrowd, or Voyant Tools can be used to generate word clouds.
Figure 12 shows two word clouds, each created from 20 keywords: one based on title terms and one based on the authors’ keywords.
4.4.3 Word tree map
A treemap (Scheibel et al. 2020) is a visualisation tool that displays hierarchical data as nested rectangles. The size of each rectangle indicates a particular value or amount. The rectangles are arranged in a hierarchy, with each parent rectangle divided into smaller rectangles that represent its subcategories. Fig. 13 shows a treemap of the keywords, each represented by a rectangle in a distinct colour. It includes terms such as leaf (84 occurrences), apple (78 occurrences), and so on.
4.5 Conceptual structure
4.5.1 Co-occurrence all keywords analysis
The underlying themes and subjects within a text corpus or dataset can be discovered by analysing the co-occurrence (Hosseini et al. 2021) of keywords or phrases. Co-occurrence analysis can also reveal connections between phrases or keywords, such as synonyms or associated ideas, which helps to refine search terms and increase the precision of keyword-based searches (Fig. 14, Table 2).
Cluster analysis is an effective method of grouping related objects or entities based on shared traits or qualities. In network visualisation maps, it is used to locate nodes that are tightly related to one another and to display them as separate clusters. In this study, only 43 of the total 1466 keywords met the cutoff of at least 10 occurrences. These 43 keywords were further grouped into 3 clusters, which highlights the links among them. The three primary clusters are depicted in Fig. 14 as “Red,” “Green,” and “Blue” bubbles. The red cluster contains ten items, including “Convolutional Neural Network,” “Deep Neural Network,” “Leaf Disease,” and “Convolution.” The green cluster contains twelve items, including “Segmentation,” “Extraction,” “Feature Extraction,” “CNN,” “Textures,” and “Recognition Accuracy.” The blue cluster contains nine items, including “Crops,” “Machine Learning,” etc.
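The thresholding and link-counting step that precedes clustering can be sketched as follows. This covers only the construction of the co-occurrence network, not the clustering itself; the function name, the lowered threshold, and the keyword lists are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(keyword_lists, min_occurrences=10):
    """Build nodes and weighted links from per-document keyword lists.

    Returns (nodes, links): nodes maps each kept keyword to the number of
    documents containing it; links maps keyword pairs to the number of
    documents in which both appear.
    """
    occurrences = Counter()
    for keywords in keyword_lists:
        occurrences.update(set(keywords))  # count each keyword once per doc
    kept = {k for k, n in occurrences.items() if n >= min_occurrences}

    links = Counter()
    for keywords in keyword_lists:
        present = sorted(set(keywords) & kept)
        links.update(combinations(present, 2))
    nodes = {k: occurrences[k] for k in kept}
    return nodes, links


# Hypothetical keyword lists; threshold lowered to 2 for the tiny example.
docs = [
    ["cnn", "deep learning", "apple"],
    ["cnn", "deep learning"],
    ["deep learning", "svm"],
]
nodes, links = cooccurrence_network(docs, min_occurrences=2)
print(nodes, dict(links))
```

The resulting weighted links are the input that a clustering algorithm would then partition into groups such as the three clusters reported here.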
4.5.2 Co-occurrence all index keywords analysis
Co-occurrence index keywords analysis is a technique used in computational linguistics to find and quantify how often words (Pourhatami et al. 2021) or phrases co-occur within a particular corpus of text. The co-occurrence index is derived by dividing the total number of co-occurrences of two words within a given word frame by the product of the individual frequencies of the two words. Fig. 15 depicts the most frequent index keywords, including deep learning, feature extraction, plant disease, and many more.
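The index defined above is a simple ratio, shown here as a minimal sketch. The function and parameter names and the example numbers are hypothetical.

```python
def cooccurrence_index(pair_count, freq_a, freq_b):
    """Co-occurrences of two words divided by the product of their frequencies."""
    if freq_a <= 0 or freq_b <= 0:
        raise ValueError("individual frequencies must be positive")
    return pair_count / (freq_a * freq_b)


# Hypothetical example: two keywords that occur 10 and 12 times
# and appear together 6 times.
print(cooccurrence_index(6, 10, 12))
```

Dividing by the product of the frequencies keeps very common keywords from dominating simply because they appear in many documents.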
4.5.3 Thematic map
In cartography, the term “thematic map” (Yu and Muñoz 2020) refers to a map that presents geographic data about a particular theme or topic rather than merely physical or political features. Such maps are useful for displaying information such as population density, economic activity, or climatic trends, and they typically rely on visual components such as colours, symbols, and shading (Forrest 2015). A choropleth map, for instance, uses different colour tones to show varying levels of a variable, such as income or temperature, across geographic locations. In bibliometric analysis, the term is used for a related idea: a diagram that positions keyword themes in a two-dimensional plane, commonly by their centrality and density, so that the themes can be grouped into quadrants.
Fig. 16 presents four types of theme: “Basic”, “Motor”, “Niche” and “Emerging or Declining”. Each theme is shown as a round bubble in a different colour. The Basic themes contain “Leaf Apple Disease” and “Detection Learning classification”. The Motor themes contain “Approach”, “Agriculture” and “Artificial”. The Niche themes cover “Rust”, “Detecting”, “Images”, “Category”, “Multi” and “Rapid”. The Emerging themes cover “Leaves”, “Data” and “Learning Based”.
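The four theme types can be read as quadrants of a two-dimensional plot. The sketch below assumes the common convention in which themes are placed by centrality (relevance) on the horizontal axis and density (development) on the vertical axis; these axes, the cut-off values, and the function name are assumptions and are not stated in the text above.

```python
def classify_theme(centrality, density, centrality_cut, density_cut):
    """Assign a theme to a quadrant from its centrality and density.

    Assumed convention: high/high -> Motor, low centrality/high density
    -> Niche, high centrality/low density -> Basic, low/low -> Emerging
    or Declining. The cut-offs (for example, medians) are hypothetical.
    """
    central = centrality >= centrality_cut
    dense = density >= density_cut
    if central and dense:
        return "Motor"
    if dense:
        return "Niche"
    if central:
        return "Basic"
    return "Emerging or Declining"


# Hypothetical scores for illustration only.
print(classify_theme(centrality=0.8, density=0.2,
                     centrality_cut=0.5, density_cut=0.5))
```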