Channel: Visual C# forum

How do I filter and remove lines that don't contain the specific word(s)?


newText is a List<string> and WordsList.words is a string[].

I loop over the lines in newText and over the words, and I want to check it this way:

For the first line in newText, loop over all the words. If none of the words exists in this line, remove the current line and the line after it. For example, if the line at index 0 of newText is "Hello everyone" and the line at index 1 is "created at 12/3/2002", then remove index 0 and index 1.

Index 2 is an empty line (blank or just a space), so do not remove it.

Then for index 3, loop over all the words; if none of the words exists in the line at index 3, remove index 3 and index 4.

And so on...

How can I do it?

This is the method I'm trying to use now:

private void WordsFilter(List<string> newText)
{
    for (int i = 0; i < newText.Count; i++)
    {
        for (int x = 0; x < WordsList.words.Length; x++)
        {
            if (!newText[i].Contains(WordsList.words[x]))
            {
                newText.RemoveAt(i);
                if (i + 1 < newText.Count)
                    newText.RemoveAt(i);
            }
        }
    }
}

The problem is that newText contains 180 lines, but after the filtering only 3 lines are left, and then it throws an index-out-of-range exception on the if line.

Second, I don't want it to remove the lines as soon as a single word is missing. I want it to check the current line against all of the words each time, and remove the two lines only if none of the words exists in that line.

There are 23 words now. The inner loop should go over every word and check whether it exists in the current line. Only after the current line has been checked against all 23 words, and none of them exists in it, should the current line and the line under it be removed.
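The "check all the words first, then decide" rule described above can be isolated in one small helper. This is only a sketch: the class name WordCheck and the method name ContainsAnyWord are hypothetical, and it assumes a plain case-sensitive substring match, which is what string.Contains does.

```csharp
using System;
using System.Linq;

static class WordCheck
{
    // Hypothetical helper: returns true when the line contains at least
    // one of the words. A line is a removal candidate only when this
    // returns false, i.e. after every word has been tried and none matched.
    // Note: string.Contains is an ordinal, case-sensitive substring test.
    public static bool ContainsAnyWord(string line, string[] words)
    {
        return words.Any(w => line.Contains(w));
    }
}
```

With a helper like this, the decision to remove is made once per line instead of once per word.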

If at index 0 I have the text: hello all

And at index 1 I have: created on 4/6/2014

The inner loop should go over all 23 words and check whether any of them exists in the text at index 0; if none does, delete index 0 and 1.

Index 2 is empty, like an empty line or a space. So if index 0 and 1 are removed, index 2 should be removed as well, so that no stray empty line is left at the beginning.

Then index 3 is also text: hello world

Index 4: created on 4/6/2014

Loop over all the words and compare them with index 3 (line 3); if none of the words exists in the text, remove index 3 and 4.

And so on for all the lines in newText.
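The walk-through above could be sketched as an in-place filter. In the posted method the removal sits inside the inner loop, so the first missing word already removes lines, the remaining words then remove more, and newText[i] is eventually read after the list has shrunk below i. The sketch below makes the decision once per text line and does not advance the index after a removal. It is a sketch under assumptions: the words are passed in as a parameter instead of being read from WordsList.words, the class name InPlaceFilter is hypothetical, and trailing blank lines are trimmed at the end.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class InPlaceFilter
{
    public static void WordsFilter(List<string> newText, string[] words)
    {
        int i = 0;
        while (i < newText.Count)
        {
            // Blank separator lines are never tested against the words.
            if (string.IsNullOrWhiteSpace(newText[i]))
            {
                i++;
                continue;
            }

            string textLine = newText[i];
            if (words.Any(w => textLine.Contains(w)))
            {
                // At least one word matched: keep the text line and its
                // date line, and continue with whatever follows them.
                i += 2;
                continue;
            }

            // No word matched: remove the text line, its date line (if any),
            // and the blank separator after them (if any).
            int count = Math.Min(2, newText.Count - i);
            if (i + count < newText.Count && string.IsNullOrWhiteSpace(newText[i + count]))
                count++;
            newText.RemoveRange(i, count);
            // i is not advanced: the next block has shifted down to index i.
        }

        // Trim blank lines left at the end after the last block was removed.
        while (newText.Count > 0 && string.IsNullOrWhiteSpace(newText[newText.Count - 1]))
            newText.RemoveAt(newText.Count - 1);
    }
}
```

Inside the original form class it could be called as InPlaceFilter.WordsFilter(newText, WordsList.words).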

The format of newText is like this:

Lieberman: We would rather go to elections than release terrorists
Originally reported on: 06.04.14  Time: 17:01

Smartphones are using up our energy. Forget about cars and planes
Originally reported on: 06.04.14  Time: 10:47

Route 60: Military vehicle attacked with firebombs - no casualties
Originally reported on: 06.04.14  Time: 19:15

''Walla'': ''Here we are smoking a joint in the studio, come and arrest us''
Originally reported on: 06.04.14  Time: 16:02

Mars is approaching its closest point to Earth: the complete guide to watching the event
Originally reported on: 06.04.14  Time: 18:12

The original text is in Hebrew (translated here), but that doesn't matter.

The format is: index 0 is a line with text, index 1 is a line with the date and time, and index 2 is an empty line or a space.

Index 3 is again a line with text, and index 4 is its date and time.

What I want to do is keep this format but remove all the lines in which none of the words exists!
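Since the goal is to keep the text / date / blank-line layout, another option is to build a new list block by block rather than remove items from the existing one. The sketch below is one possible way to do that, not a definitive answer. It assumes every text line is followed by exactly one date line (an unpaired last line is dropped), and the names BlockFilter and KeepMatchingBlocks are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class BlockFilter
{
    public static List<string> KeepMatchingBlocks(IEnumerable<string> lines, string[] words)
    {
        // Drop the blank separators so the rest is strict text/date pairs.
        List<string> content = lines.Where(l => !string.IsNullOrWhiteSpace(l)).ToList();

        var result = new List<string>();
        for (int i = 0; i + 1 < content.Count; i += 2)
        {
            string textLine = content[i];

            // Skip the whole block when none of the words is in the text line.
            if (!words.Any(w => textLine.Contains(w)))
                continue;

            // Re-insert one blank separator between kept blocks only,
            // so the result never starts or ends with an empty line.
            if (result.Count > 0)
                result.Add(string.Empty);

            result.Add(textLine);
            result.Add(content[i + 1]);
        }
        return result;
    }
}
```

The original list is left untouched, so the filtered result can be compared against it or assigned back, for example newText = BlockFilter.KeepMatchingBlocks(newText, WordsList.words).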

