
How can I cancel/pause/resume a BackgroundWorker and/or a recursive loop?


I'm not sure whether I need to cancel/pause/resume the BackgroundWorkers as well, or only the recursive loop.

First, this is the class method that contains the recursive loop:

public List<string> webCrawler(string mainUrl, int levels)
{
    // TODO: make all of these variables configurable, here and in the offline function.
    // TODO: CancelAsync should abort the process and return without doing the remaining work.
    // TODO: check for timeouts when loading URLs.
    // TODO: save/restore all settings (URL, levels to crawl, checkbox options) while the program is running.
    List<string> csFiles = new List<string>();
    wc = new System.Net.WebClient();
    List<string> webSites;
    csFiles.Add("temp string to know that something is happening in level = " + levels.ToString());
    csFiles.Add("current site name in this level is : " + mainUrl);
    // Later these placeholders should be replaced with real .cs file links.
    try
    {
        HtmlAgilityPack.HtmlDocument doc = TimeOut.getHtmlDocumentWebClient(mainUrl, false, "", 0, "", "");
        if (doc == null)
        {
            failed = true;
            wccfg.failedUrls++;
            failed = false;
        }
        else
        {
            done = true;
            // Report progress for this level.
            Object[] temp_arr = new Object[8];
            temp_arr[0] = csFiles;
            temp_arr[1] = mainUrl;
            temp_arr[2] = levels;
            temp_arr[3] = currentCrawlingSite;
            temp_arr[4] = sitesToCrawl;
            temp_arr[5] = done;
            temp_arr[6] = wccfg.failedUrls;
            temp_arr[7] = failed;

            OnProgressEvent(temp_arr);

            currentCrawlingSite.Add(mainUrl);
            webSites = getLinks(doc);

            removeDupes(webSites);
            removeDuplicates(webSites, currentCrawlingSite);
            removeDuplicates(webSites, sitesToCrawl);
            // TODO: filter webSites for links to files (jpg, bmp, gif, ...); download those as files
            // and remove them from the list instead of crawling them.
            if (wccfg.removeext == true)
            {
                for (int i = 0; i < webSites.Count; i++)
                {
                    webSites.Remove(removeExternals(webSites, mainUrl, wccfg.localy));
                }
            }
            if (wccfg.downloadcontent == true)
            {
                retwebcontent.retrieveImages(mainUrl); // TODO: crawling misbehaves when this is enabled; find out why.
            }
            if (levels > 0)
                sitesToCrawl.AddRange(webSites); // we want this to grow, but not at the deepest level, since we won't dive there anyway

            // TODO: webSites = FilterJunkLinks(webSites); keep only links that start with http/https,
            // and maybe remove self-links and other junk.

            if (levels == 0)
            {
                return csFiles;
            }
            else
            {
                for (int i = 0; i < webSites.Count; i++) // maybe limit to ~20 sites per level, or it will take forever
                {
                    if (wccfg.toCancel == true)
                    {
                        return new List<string>();
                    }
                    string t = webSites[i];
                    if (t.StartsWith("http://") || t.StartsWith("https://")) // replace this with the future FilterJunkLinks function
                    {
                        csFiles.AddRange(webCrawler(t, levels - 1));
                    }
                }
                return csFiles;
            }
        }
        return csFiles;
    }
    catch (WebException)
    {
        failed = true;
        wccfg.failedUrls++;
        return csFiles;
    }
    catch (Exception)
    {
        failed = true;
        wccfg.failedUrls++;
        throw;
    }
}


This function is recursive: it keeps calling itself, once per link, for every level of the crawl.
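What I think the inner loop would need is a gate it can wait on. A sketch of what I mean (pauseGate here is a hypothetical ManualResetEvent shared with the BackgroundWebCrawling class below; it is not in my code yet):

    for (int i = 0; i < webSites.Count; i++)
    {
        pauseGate.WaitOne();           // blocks while the gate is Reset (paused), continues after Set (resumed)
        if (wccfg.toCancel == true)
        {
            return new List<string>(); // cooperative cancel, as I already do
        }
        string t = webSites[i];
        if (t.StartsWith("http://") || t.StartsWith("https://"))
        {
            csFiles.AddRange(webCrawler(t, levels - 1)); // the recursive call checks the same gate at every level
        }
    }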

In Form1 I have a cancel button click event:

private void button3_Click(object sender, EventArgs e)
        {
            bgwc.CancelWorker();
            cancel = true;
            wcfg.toCancel = cancel;
        }


bgwc is an instance of the class where I hold and start the two BackgroundWorkers:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using HtmlAgilityPack;
using System.Net;
using System.Windows.Forms;
using System.ComponentModel;
using System.Threading;

namespace GatherLinks
{
    class BackgroundWebCrawling
    {
        public string f;
        int counter = 0;
        List<string> WebSitesToCrawl;
        int MaxSimultaneousThreads;
        public BackgroundWorker mainBackGroundWorker;
        BackgroundWorker secondryBackGroundWorker;
        WebcrawlerConfiguration webcrawlerCFG;
        List<WebCrawler> webcrawlers;
        int maxlevels;
        public event EventHandler<BackgroundWebCrawlingProgressEventHandler> ProgressEvent;
        ManualResetEvent _busy = new ManualResetEvent(true);
        // TODO (plan for this class):
        // - mainBackGroundWorker gets the DoWork event below.
        // - Start() is public and receives the List<string> of sites to crawl and how many threads may run at once,
        //   plus everything the crawlers need from the WebcrawlerConfiguration class.
        // - For each site, the main DoWork creates a new WebCrawler instance plus a secondary BackgroundWorker
        //   and passes both through e.Argument.
        // - The secondary worker's DoWork calls the webCrawler function of the WebCrawler class.
        // - A progress event reports to the labels and rich text boxes in Form1 (Form1 may need Invoke for this).
        // - GetLists() should collect the result lists from all the WebCrawler instances.

        public BackgroundWebCrawling()
        {
            webcrawlers = new List<WebCrawler>();
            mainBackGroundWorker = new BackgroundWorker();
            mainBackGroundWorker.WorkerSupportsCancellation = true;
            mainBackGroundWorker.DoWork += mainBackGroundWorker_DoWork;
        }

        private void mainBackGroundWorker_DoWork(object sender, DoWorkEventArgs e)
        {
            try
            {
                BackgroundWorker worker = sender as BackgroundWorker;
                for (int i = 0; i < WebSitesToCrawl.Count; i++)
                {
                    _busy.WaitOne();
                    if ((worker.CancellationPending == true))
                    {
                        e.Cancel = true;
                        break;
                    }
                    // crude throttle: wait until one of the secondary workers finishes and frees a slot
                    while (counter >= MaxSimultaneousThreads)
                    {
                        Thread.Sleep(10);
                    }

                    WebCrawler wc = new WebCrawler(webcrawlerCFG);
                    webcrawlers.Add(wc);
                    Interlocked.Increment(ref counter); // counter is shared with the secondary workers, so update it atomically
                    secondryBackGroundWorker = new BackgroundWorker();
                    secondryBackGroundWorker.DoWork += secondryBackGroundWorker_DoWork;
                    object[] args = new object[] { wc, WebSitesToCrawl[i] };
                    secondryBackGroundWorker.RunWorkerAsync(args);
                }
                while (counter > 0)
                {
                    Thread.Sleep(10);
                }
            }
            catch
            {
                MessageBox.Show("err");
            }
        }

        private void secondryBackGroundWorker_DoWork(object sender, DoWorkEventArgs e)
        {
            try
            {
                object[] args = (object[])e.Argument;
                WebCrawler wc = (WebCrawler)args[0];
                string mainUrl = (string)args[1];
                wc.ProgressEvent += new EventHandler<WebCrawler.WebCrawlerProgressEventHandler>(x_ProgressEvent);
                wc.webCrawler(mainUrl, maxlevels);
            }
            catch
            {
                MessageBox.Show("err");
            }
            finally
            {
                Interlocked.Decrement(ref counter); // release the slot even if the crawl failed,
                                                    // so the main worker's wait loop can finish
            }
        }

        public void Start(List<string> sitestocrawl, int threadsNumber, int maxlevels, WebcrawlerConfiguration wccfg)
        {
            this.maxlevels = maxlevels;
            webcrawlerCFG = wccfg;
            WebSitesToCrawl = sitestocrawl;
            MaxSimultaneousThreads = threadsNumber;
            mainBackGroundWorker.RunWorkerAsync();
        }

        private void x_ProgressEvent(object sender, WebCrawler.WebCrawlerProgressEventHandler e)
        {
            // The crawler's progress data arrives here in e; repack it and raise this
            // class's own event so Form1 can consume it (plus whatever this class adds).
            Object[] temp_arr = new Object[8];
            temp_arr[0] = e.csFiles;
            temp_arr[1] = e.mainUrl;
            temp_arr[2] = e.levels;
            temp_arr[3] = e.currentCrawlingSite;
            temp_arr[4] = e.sitesToCrawl;
            temp_arr[5] = e.done;
            temp_arr[6] = e.failedUrls;
            temp_arr[7] = e.failed;
            OnProgressEvent(temp_arr);
        }

        private void GetLists(List<string> allWebSites)
        {
            // TODO: collect the result lists from all the WebCrawler instances.
        }

        public class BackgroundWebCrawlingProgressEventHandler : EventArgs
        {
            public List<string> csFiles { get; set; }
            public string mainUrl { get; set; }
            public int levels { get; set; }
            public List<string> currentCrawlingSite { get; set; }
            public List<string> sitesToCrawl { get; set; }
            public bool done { get; set; }
            public int failedUrls { get; set; }
            public bool failed { get; set; }
        }

        protected void OnProgressEvent(Object[] some_params)
        {
            // Pack the raw parameter array into the EventArgs type defined above.
            if (ProgressEvent != null)
                ProgressEvent(this,
                    new BackgroundWebCrawlingProgressEventHandler()
                    {
                        csFiles = (List<string>)some_params[0],
                        mainUrl = (string)some_params[1],
                        levels = (int)some_params[2],
                        currentCrawlingSite = (List<string>)some_params[3],
                        sitesToCrawl = (List<string>)some_params[4],
                        done = (bool)some_params[5],
                        failedUrls = (int)some_params[6],
                        failed = (bool)some_params[7]
                    });
        }

        public void PauseWorker()
        {
            if (mainBackGroundWorker.IsBusy)
            {
                _busy.Reset();
            }
        }

        public void ContinueWorker()
        {
            _busy.Set();
        }

        public void CancelWorker()
        {
            ContinueWorker();
            mainBackGroundWorker.CancelAsync();
        }

    }
}
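From what I can tell, _busy.WaitOne() is only called inside mainBackGroundWorker_DoWork, so PauseWorker()/ContinueWorker() only freeze the loop that launches new crawlers; crawls already running are unaffected. I suppose the same gate would have to be visible to each WebCrawler. A sketch of what I mean (this property is hypothetical, it is not in my class):

        public ManualResetEvent PauseGate
        {
            get { return _busy; } // hand the same gate to each WebCrawler, or put it into the configuration object
        }

One thing that does look right to me: CancelWorker() calls ContinueWorker() first, so a paused worker wakes up and can see CancellationPending instead of sleeping forever.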


In Form1, a button starts the first BackgroundWorker, which starts the second BackgroundWorker, which calls the recursive loop:

private void button1_Click(object sender, EventArgs e)
        {
            List<string> sites = new List<string>();
            List<string> a = (List<string>) listBox1.Tag;
            foreach (var x in listBox1.SelectedIndices)
            {
                sites.Add(a[(int)x]);
            }
            wcfg = new WebcrawlerConfiguration();
            bgwc = new BackgroundWebCrawling();
            wcfg.downloadcontent = downLoadImages;
            wcfg.failedUrls = failedUrls;
            wcfg.localy = LocalyKeyWords;
            wcfg.removeext = removeExt;
            bgwc.Start(sites, 3, (int)numericUpDown1.Value, wcfg);
        }


So bgwc's main worker spawns the secondary workers, and each secondary worker runs the recursive crawl.

wcfg in Form1 is an instance of another class, used for configuration:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace GatherLinks
{
    class WebcrawlerConfiguration
    {

        public string url;
        public Dictionary<string,List<string>> localy;
        public bool removeext;
        public bool downloadcontent;
        public int failedUrls;
        public bool toCancel;
        public bool offlineonline;

        public WebcrawlerConfiguration()
        {

        }
    }
}
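Since every WebCrawler already receives this configuration object, maybe the pause gate and the cancel flag belong here. A sketch of the change I have in mind (the pauseGate field is hypothetical, and this would also need using System.Threading; volatile is there because toCancel is written by the UI thread and read by the worker threads):

    class WebcrawlerConfiguration
    {
        // ... the existing fields stay as they are ...
        public volatile bool toCancel;                                  // written by Form1, read inside the recursion
        public ManualResetEvent pauseGate = new ManualResetEvent(true); // starts open, i.e. not paused
    }

Then PauseWorker()/ContinueWorker() would Reset()/Set() this gate, and the recursive loop would call pauseGate.WaitOne() at the top of each iteration, as in the sketch above.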


So when I click the cancel button, the recursive loop sees that the toCancel variable in the configuration class is true and returns an empty list. In Form1 I also have a completed event:

private void backgroundWorker1_RunWorkerCompleted(object sender, RunWorkerCompletedEventArgs e)
        {
            button3.Enabled = false;
            checkBox1.Enabled = true;
            checkBox2.Enabled = true;
            numericUpDown1.Enabled = true;
            button1.Enabled = true;
            button2.Enabled = true;
            this.Text = "Web Crawling";
            if (cancel == true)
            {
                label6.Text = "Process Cancelled";
            }
            else
            {
                label6.Text = "Process Completed";
            }
            button6.Enabled = true;
            button4.Enabled = false;
            button5.Enabled = false;
            listBox1.Enabled = true;

        }
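A side note: here I rely on my own form-level cancel flag. If this handler were attached to bgwc.mainBackGroundWorker (which supports cancellation and sets e.Cancel in its DoWork), I believe e.Cancelled would report the same thing:

    if (e.Cancelled) // assumes the handler is wired to bgwc.mainBackGroundWorker.RunWorkerCompleted
    {
        label6.Text = "Process Cancelled";
    }
    else
    {
        label6.Text = "Process Completed";
    }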


So when I click the cancel button, it does cancel/stop the operation.

But the pause and resume button clicks don't have any effect at all:

private void button4_Click(object sender, EventArgs e)
        {
            bgwc.PauseWorker();
            label6.Text = "Process Paused";
            button5.Enabled = true;
            button4.Enabled = false;
        }

        private void button5_Click(object sender, EventArgs e)
        {
            bgwc.ContinueWorker();
            label6.Text = "Process Resumed";
            button4.Enabled = true;
            button5.Enabled = false;
        }

Both button4 and button5 may pause/resume the main BackgroundWorker, but they never pause/resume the recursive loop.

1. How do I pause/resume the recursive loop? I know how to cancel/stop the loop, but how do I pause/resume it?

2. Do I need to cancel/pause/resume the main BackgroundWorker and the secondary BackgroundWorkers too, or is cancelling/pausing/resuming the recursive loop enough? Can the BackgroundWorkers keep working, or does it matter? Maybe I need to cancel/pause/resume all of them together, the BackgroundWorkers and the recursive loop?

3. How do I do all of this? The code is a little bit of a mess. I can cancel the loop and it works, but I'm not sure whether leaving the BackgroundWorkers running is good, and if not, how do I cancel/pause/resume them all together?

What should the logic around this recursive loop be?

What I need, in general, is to be able to cancel/pause/resume the operation of the recursive loop.

