Channel: Visual C# forum

Extracting text from an HTML file using IndexOf and Substring is not working perfectly. How can I fix it?


At the top of Form1 I have:

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Windows.Forms;
using System.Net;
using System.Text.RegularExpressions;
using System.IO;

namespace DownloadImages
{
    public partial class Form1 : Form
    {
        string f;

        public Form1()
        {
            InitializeComponent();

            string localFilename = @"d:\localpath\";
            using (WebClient client = new WebClient())
            {
                client.DownloadFile("http://www.sat24.com/foreloop.aspx?type=1&continent=europa#", localFilename + "test.html");
            }

            f = File.ReadAllText(localFilename + "test.html");
            test();
        }

        private void test()
        {
            List<string> imagesUrls = new List<string>();
            int startIndex = 0;
            int endIndex = 0;
            int position = 0;

            string startTag = "http://www.niederschlagsradar.de/images.aspx";
            string endTag = "cultuur=en-GB&continent=europa";

            startIndex = f.IndexOf(startTag);

            while (startIndex >= 0) // IndexOf returns -1 when not found; 0 is a valid match position
            {
                endIndex = f.IndexOf(endTag,startIndex);
                if (endIndex == -1)
                {
                    break;
                }
                string t = f.Substring(startIndex, endIndex - startIndex + endTag.Length);
                imagesUrls.Add(t);

                position = endIndex + endTag.Length;

                startIndex = f.IndexOf(startTag,position);
            }
        }

        private void Form1_Load(object sender, EventArgs e)
        {

        }
    }
}


The problem is that on the last iteration of the loop it extracts the string I want plus all the content that follows it, up to the end of the file.

For example, the List<string> contains at index 0:

http://www.niederschlagsradar.de/images.aspx?jaar=-6&type=europa.precip&datum=201309151500&cultuur=en-GB&continent=europa

Then at index 62:

http://www.niederschlagsradar.de/images.aspx?jaar=-6&type=europa.precip&datum=201309230600&cultuur=en-GB&continent=europa


But the last index, 63, contains:

http://www.niederschlagsradar.de/images.aspx?jaar=-6&type=europa.precip&cultuur=thumbnail&continent=europa" border="0"/></a></li><li style="margin-top: -12px;text-align: center;"><a href="/?ir=true&co=true&li=false" target="_top" class="white"><div style=";top:14px;font-size:14px;">KM</div><img 

The last index in the List contains the string I need: http://www.niederschlagsradar.de/images.aspx?jaar=-6&type=europa.precip&cultuur=thumbnail&continent=europa

But it also contains all the HTML content that follows it.

How can I make sure in my code that the last string stops where the URL ends and does not include the rest of the file content?

So at the last index I would see only: http://www.niederschlagsradar.de/images.aspx?jaar=-6&type=europa.precip&cultuur=thumbnail&continent=europa
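
One possible direction, offered only as a sketch: the last URL ends in cultuur=thumbnail&continent=europa, so it never matches the endTag "cultuur=en-GB&continent=europa", and the search runs past it. Judging by the output at index 63, the URL is followed by a closing double quote, so the quote could serve as the end delimiter instead. The class name UrlExtractor and the method ExtractUrls are hypothetical names for illustration, and the sketch assumes every wanted URL in the file is enclosed in double quotes, which the excerpt above does not confirm for all 64 entries.

```csharp
using System;
using System.Collections.Generic;

public static class UrlExtractor
{
    // Hypothetical helper: collects every substring that begins with startTag
    // and runs up to (but not including) the next double quote.
    // Assumption: each URL sits inside a double-quoted attribute value.
    public static List<string> ExtractUrls(string html, string startTag)
    {
        var urls = new List<string>();
        int startIndex = html.IndexOf(startTag, StringComparison.Ordinal);

        while (startIndex >= 0)
        {
            // Stop at the closing quote rather than at a fixed query-string suffix.
            int endIndex = html.IndexOf('"', startIndex);
            if (endIndex == -1)
            {
                break; // no closing quote: nothing reliable left to extract
            }

            urls.Add(html.Substring(startIndex, endIndex - startIndex));

            // Continue searching just past the closing quote.
            startIndex = html.IndexOf(startTag, endIndex + 1, StringComparison.Ordinal);
        }

        return urls;
    }
}
```

In the form above, this would be called as UrlExtractor.ExtractUrls(f, "http://www.niederschlagsradar.de/images.aspx") in place of the loop inside test(). If the page also contains URLs wrapped in single quotes, the delimiter check would need to be extended accordingly.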

