0 votes · 2 answers · 2911 views

Get text from HTML

I need a way to get all text from my aspx files. They may also contain JavaScript, but I only need this for the HTML code. Basically I need to extract everything in Text or Value attributes, text within code, whatever... Is there any parser API available? Cheers! Alex

3 votes · 1 answer · 1012 views

DataBinding with a DataRow - problems

I am currently coding a project in C# and am experiencing problems. I'll give a brief description of my form: it has a DataGridView on the bottom half of the form with single-row selection, and it is read-only. On the top half of the form I have various components which are databound to the sele...

0 votes · 2 answers · 344 views

Getting DocumentStream for HTML websites without using a WinForms WebBrowser control for parsing?

Getting a DocumentStream for HTML websites without using a WinForms WebBrowser control for parsing: is this possible? I would like to create some types like: HtmlDocument doc = new HtmlDocument ("http://www.ms.com"); DocumentStream ds = doc.GetFullStream(); ... Also, if possible, please post code.

8 votes · 2 answers · 29046 views

PHP DOMDocument getting Attribute of Tag

Hello, I have an API response in XML format with a series of items such as this: <item> <title>blah balh</title> <pubDate>Tue, 20 Oct 2009 </pubDate> <media:file date="today" data="example text string"/> </item> I want to use DOMDocument to get the att...

6 votes · 6 answers · 15146 views

Parse integer from string containing letters and spaces - C#

What is the most efficient way to parse an integer out of a string that contains letters and spaces? Example: I am passed the following string: "RC 272". I want to retrieve 272 from the string. I am using C# and .NET 2.0 framework.
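A minimal sketch of the usual regex approach, written in Python rather than C# (the same idea works with .NET's `Regex`): find the first run of digits and convert it. It assumes the number you want is the first digit run in the string and carries no sign or thousands separators.

```python
import re
from typing import Optional


def first_int(text: str) -> Optional[int]:
    """Return the first run of ASCII digits in `text` as an int, or None."""
    # [0-9] rather than \d, because \d also matches non-ASCII digits in Python 3.
    match = re.search(r"[0-9]+", text)
    return int(match.group()) if match else None


print(first_int("RC 272"))  # 272
```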

1 vote · 3 answers · 105 views

Get a look at the temporary files a process creates

I'm trying to reverse-engineer a program that does some basic parsing: text in, text out. I've got an executable "reference implementation" and the source code to what must be a different version, since the compiled source output != executable output. The process creates and deletes temporary f...

3 votes · 5 answers · 7627 views

Create array from the contents of <div> tags in PHP

I have the contents of a web page assigned to a variable $html. Here's an example of the contents of $html: <div class="content">something here</div> <span>something random thrown in <strong>here</strong></span> <div class="content">more stuff</div>...
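Rather than a regex, an HTML parser can collect the text of each `<div class="content">`. The question is about PHP; this is a sketch of the same idea with Python's standard `html.parser`, assuming you want every div whose class list contains `content`, including text from tags nested inside it. The class and function names are illustrative.

```python
from html.parser import HTMLParser


class ContentDivCollector(HTMLParser):
    """Collects the text of every <div class="content"> element."""

    def __init__(self):
        super().__init__()
        self.results = []
        self._depth = 0  # >0 while inside a matching div (counts nested divs)

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self._depth:
            self._depth += 1  # a nested div inside a matching div
        elif "content" in (dict(attrs).get("class") or "").split():
            self._depth = 1
            self.results.append("")

    def handle_endtag(self, tag):
        if tag == "div" and self._depth:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth:
            self.results[-1] += data


def content_divs(html: str) -> list:
    parser = ContentDivCollector()
    parser.feed(html)
    parser.close()
    return parser.results


sample = ('<div class="content">something here</div> '
          '<span>something random thrown in <strong>here</strong></span> '
          '<div class="content">more stuff</div>')
print(content_divs(sample))  # ['something here', 'more stuff']
```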

0 votes · 3 answers · 276 views

How to match these strings with Regex?

<div> <a href="http://website/forum/f80/ThreadLink-new/" id="thread_gotonew_565407"><img class="inlineimg" src="http://website/forum/images/buttons/firstnew.gif" alt="Go to first new post" border="0" /></a> [MULTI] <a href="http...

4 votes · 7 answers · 894 views

What would be your choice of Perl XML Parsers for files greater than 15 GB?

I know there are some very good Perl XML parsers like XML::Xerces, XML::Parser::Expat, XML::Simple, XML::RapidXML, XML::LibXML, XML::Liberal, etc. Which XML parser would you select for parsing large files and on what parameter would you decide one over another? If the one you would like to selec...

4 votes · 2 answers · 3650 views

HTML page to XHTML with TagSoup

Sorry if this is too simple, but I simply couldn't find a tutorial nor the documentation of the Java version of TagSoup. Basically I want to download an HTML webpage from the internet and turn it into XHTML, contained in a string. How can I do this with TagSoup? Thanks!

1 vote · 2 answers · 904 views

Regex - extract info from a String in Java

I have a problem in applying a regex in my Java code. My text string is like this (String myString) name: Abc Def; blah: 1 2 3; second name: Ghi; I need to extract the name information (Abc Def). Name can contain a string of 1+ words. All the properties (name, blah, sec...
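A sketch of one pattern that would work, shown in Python for brevity (the non-capturing group and `\s` are also valid in `java.util.regex`, with backslashes doubled in a Java string literal). It assumes properties are separated by `;` and anchors `name:` to the start of the string or to a preceding `;`, so that `second name:` is not picked up.

```python
import re
from typing import Optional

# "name:" must begin the record or follow a ';' (plus optional spaces).
NAME_RE = re.compile(r"(?:^|;)\s*name:\s*([^;]*?)\s*;")


def extract_name(record: str) -> Optional[str]:
    m = NAME_RE.search(record)
    return m.group(1) if m else None


print(extract_name("name: Abc Def; blah: 1 2 3; second name: Ghi;"))  # Abc Def
```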

0 votes · 1 answer · 1355 views

Help optimize my RPN evaluation function

My parser evaluates PEMDAS expressions by first converting from infix to postfix then uses the standard postfix evaluation rules. I parse the expression and store the tokens in a list. This precompilation is ok for me since I plan on caching the precompiled functions. I am trying to optimize th...
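One common speed-up, offered as a sketch rather than the asker's code: dispatch operators through a lookup table instead of a chain of comparisons, and keep the postfix loop minimal. Python is used for illustration; the token format (numbers, or numeric strings, and one-character operator strings, already in postfix order) is an assumption.

```python
import operator
from typing import Callable, Dict, List, Union

Token = Union[int, float, str]

# Binary operators looked up in a dict instead of an if/elif chain.
OPS: Dict[str, Callable[[float, float], float]] = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
    "^": operator.pow,
}


def eval_rpn(tokens: List[Token]) -> float:
    stack: List[float] = []
    push, pop = stack.append, stack.pop  # bind methods once, outside the loop
    for tok in tokens:
        op = OPS.get(tok) if isinstance(tok, str) else None
        if op is None:
            push(float(tok))  # operand
        else:
            right = pop()  # right operand is on top of the stack
            left = pop()
            push(op(left, right))
    if len(stack) != 1:
        raise ValueError("malformed postfix expression")
    return stack[0]


print(eval_rpn([3, 4, 2, "*", "+"]))  # 11.0, i.e. 3 + 4 * 2
```

Since the asker plans to cache precompiled expressions, a further step in the same spirit would be to convert operand tokens to numbers once at compile time, so the hot loop does no parsing.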

4 votes · 5 answers · 6096 views

Tell SAX Parser to ignore invalid characters?

SAX keeps on dying on the following exception: Invalid byte 2 of 3-byte UTF-8 sequence. The problem is that it's mostly correctly UTF-8 encoded, but there are a few errors in it. We cannot get a new version of the file; we have to use this file. So how do we tell SAX to ignore invalid character seque...
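A common workaround, independent of the SAX implementation, is to decode the bytes leniently first, replacing each invalid sequence, and hand the parser text that is guaranteed to be well-formed UTF-8. In Java that decoding step would be a `CharsetDecoder` configured with `CodingErrorAction.REPLACE`. The sketch below shows the idea with Python's standard `xml.sax`; it is lossy (bad bytes become U+FFFD), and `TextCollector` is only an illustrative handler.

```python
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Illustrative handler that just gathers all character data."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def parse_lenient(raw: bytes) -> str:
    # Replace invalid UTF-8 sequences with U+FFFD, then re-encode so the
    # parser only ever sees valid UTF-8.
    cleaned = raw.decode("utf-8", errors="replace").encode("utf-8")
    handler = TextCollector()
    xml.sax.parseString(cleaned, handler)
    return "".join(handler.chunks)


# b"\xe2\x82" starts a 3-byte sequence but is cut off after byte 2.
broken = b"<doc>caf\xc3\xa9 \xe2\x82 ok</doc>"
print(parse_lenient(broken))
```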

21 votes · 3 answers · 94879 views

Preg match text in php between html tags

Hello, I would like to use preg_match in PHP to parse the "Desired text" out of the following from an HTML document: <p class="review"> Desired text </p> Ordinarily I would use simple_html_dom for such things but on this occasion it cannot be used (the above element doesn't appear in ...
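If a regex really is the only option, a non-greedy capture with dot-matches-newline is the usual shape; in PHP that would be `preg_match('/<p class="review">(.*?)<\/p>/s', $html, $m)`. Here is the equivalent sketch in Python, which assumes the opening tag is written exactly as `<p class="review">` and that the paragraph contains no nested `</p>`; any change in the markup will break it.

```python
import re
from typing import Optional

REVIEW_RE = re.compile(r'<p class="review">(.*?)</p>', re.DOTALL)


def review_text(html: str) -> Optional[str]:
    m = REVIEW_RE.search(html)
    return m.group(1).strip() if m else None


print(review_text('<p class="review"> Desired text </p>'))  # Desired text
```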

1 vote · 1 answer · 160 views

Remove large block of blank text in PHP

Hello I have a string in PHP $string = "...................blah blah blah.................." where the ......... are blank spaces (stackoverflow doesn't let me enter many blank spaces). How do I remove this block of blank spaces before and after the "blah blah blah" text? "blah blah blah" is ...
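PHP's built-in `trim()` removes leading and trailing whitespace, which covers this case. The same operation in Python, as a small sketch:

```python
padded = "          blah blah blah          "
cleaned = padded.strip()  # drops leading/trailing whitespace, keeps inner spaces
print(repr(cleaned))  # 'blah blah blah'
```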

2 votes · 4 answers · 4762 views

parsing string to a dict

I have a string output which is in the form of a dict, e.g. {'key1':'value1','key2':'value2'}. How can I easily save it as a dict and not as a string?
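Assuming this is Python (the word "dict" suggests it), `ast.literal_eval` turns the text into a real dict while only accepting literals, so it will not execute arbitrary code the way `eval` would. `json.loads` would reject this particular string because JSON requires double-quoted strings.

```python
import ast

raw = "{'key1':'value1','key2':'value2'}"
parsed = ast.literal_eval(raw)  # evaluates literals only; no function calls
print(type(parsed).__name__, parsed["key2"])  # dict value2
```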

11 votes · 3 answers · 9209 views

Parsing grammars using OCaml

I have a task to write a (toy) parser for a (toy) grammar using OCaml and not sure how to start (and proceed with) this problem. Here's a sample Awk grammar: type ('nonterm, 'term) symbol = N of 'nonterm | T of 'term;; type awksub_nonterminals = Expr | Term | Lvalue | Incrop | Binop | Num;; l...

0 votes · 1 answer · 415 views

DOMDocument parsing (php)

I am parsing an XML file from an API which I have converted into a DOMDocument in PHP. This is mostly fine, but one problem I have is when I do this: $feeditem->getElementsByTagName('extra'); as part of a foreach statement and the element extra doesn't exist in one of the feeditems I am iterat...

0 votes · 4 answers · 481 views

PHP - How to parse this xml?

I'm trying to parse the XML below so that I wind up with an array that looks like the sample included... I'm having a hard time figuring out how to get the attributes inside of the tags to output the way I want it to... The XML <cust rid="999999" memberid="12345" lname="Doe" fname="John">...
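The target array in the question is cut off, so this is only a sketch of the core step: reading an element's attributes into a key/value map. It uses Python's `xml.etree.ElementTree`, and the sample element is simplified from the excerpt and self-closed so it parses on its own.

```python
import xml.etree.ElementTree as ET

xml_text = '<cust rid="999999" memberid="12345" lname="Doe" fname="John"/>'
cust = ET.fromstring(xml_text)
record = dict(cust.attrib)  # attribute name -> value, all as strings
print(record["lname"], record["fname"])  # Doe John
```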

2 votes · 4 answers · 3151 views

Trouble Parsing Atom feed with jQuery

I have an Atom feed like this... <?xml version="1.0"?> <feed xml:base="http://earthquake.usgs.gov/" xmlns="http://www.w3.org/2005/Atom" xmlns:georss="http://www.georss.org/georss"> <updated>2009-10-12T14:47:25Z</updated> <title>USGS M2.5+ Earthqua...

1 vote · 2 answers · 245 views

Format C# Source Code with Hyperlinks to Reference Library Documentation

I'm wondering if anyone has done this already. I want to format C# source code in HTML. But with a twist! I want to turn the names of all types and methods that appear in the code into hyperlinks to the MSDN Library documentation of the types and methods. To do a good job, the data types of var...

1 vote · 2 answers · 1781 views

How do I get subset of a Java XML org.w3c.dom.Document?

I have an XML org.w3c.dom.Document object. It looks sorta like this: <A> <B> <C/> <D/> <E/> <F/> </B> <G> <H/> <H/> <J/> </G> </A> How can I convert the Document obje...

8 votes · 2 answers · 7974 views

Replace SRC of all IMG elements using Parser

I am looking for a way to replace the SRC attribute in all IMG tags without using regular expressions. (I would like to use any out-of-the-box HTML parser included with the default Python install.) I need to reduce the source from whatever it may be to: <img src="cid:imagename"> I am trying to rep...
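A sketch using only the standard library's `html.parser`: re-emit the markup as it is parsed and rebuild just the `<img>` tags. The naming rule for `imagename` is not given in the question, so this assumes (hypothetically) that it is the last path segment of the original `src`. Character references are kept verbatim by turning off `convert_charrefs`; end tags come out lowercased because the parser normalises tag names.

```python
from html import escape
from html.parser import HTMLParser


class ImgSrcRewriter(HTMLParser):
    """Re-emits HTML, rewriting every <img src> to cid:<last path segment>."""

    def __init__(self):
        super().__init__(convert_charrefs=False)  # keep &amp; etc. as written
        self.out = []

    def _img(self, attrs, self_closing):
        parts = ["<img"]
        for name, value in attrs:
            if name == "src" and value:
                # Hypothetical naming rule: keep only the last path segment.
                value = "cid:" + value.rsplit("/", 1)[-1]
            parts.append(f" {name}" if value is None else f' {name}="{escape(value)}"')
        parts.append(" />" if self_closing else ">")
        return "".join(parts)

    def handle_starttag(self, tag, attrs):
        self.out.append(self._img(attrs, False) if tag == "img" else self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.out.append(self._img(attrs, True) if tag == "img" else self.get_starttag_text())

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def rewrite_img_src(html: str) -> str:
    parser = ImgSrcRewriter()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


print(rewrite_img_src('<p>Hi<img src="http://example.com/a/logo.png" alt="x"></p>'))
```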

1 vote · 2 answers · 306 views

Parse from a list with a quote in text

I want to put these strings in a list, how do I do that? The hold up is the double "", found in the first two lines. How can I work around this? scriptTxt = new string[] { "#$language = "VBScript"", "#$interface = "1.0"", "crt.Screen.Synchronous = True", "Sub Main" };

3 votes · 4 answers · 1430 views

Will this regex be enough to remove C++ multiline comments?

I need to parse some C++ files, and to make things easier for me, I thought about removing multiline comments. I tried the following regex: /(\/\*.*?\*\/)/, using the multiline modifier, and it seems to work. Do you think there will be any case where it will fail?
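A quick way to probe the regex is to run it against a case it cannot know about: `/*` inside a string literal. The sketch below is a Python translation of the pattern. Note that in PCRE, Perl, and Python it is the `s`/DOTALL flag, not `m`, that lets `.` match newlines. The failure shows that a regex alone does not track whether it is inside a string or a `//` comment, so a small tokenizer is the safer route.

```python
import re

# Python translation of /(\/\*.*?\*\/)/ with dot matching newlines.
BLOCK_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)

ok = "int a; /* one\n two */ int b;"
print(BLOCK_COMMENT.sub("", ok))  # comment removed as intended

# Failure case: the "/*" inside the string literal is not a comment, but the
# regex starts a match there and deletes real code up to the next "*/".
tricky = 'const char* s = "/*"; int x; /* real */ int y;'
print(BLOCK_COMMENT.sub("", tricky))  # const char* s = " int y;
```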

2 votes · 7 answers · 835 views

Instrumenting JavaScript

I would like to instrument JavaScript code in order to "log" values of global variables. For example, I would like to know all the values a particular variable foo had during execution. The logging is not the problem. What would be the easiest way to implement this? I was thinking of using Rhino...

0 votes · 2 answers · 1033 views

Windows batch parse template and increment values

I'm trying to write a Windows batch file that would read in a text file with certain text in it and increment some values in that file. The text file would contain text like: public static const COUNTER:int = 0 The batch file would then search for "COUNTER:int = 0" and increment the 0 value....
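Batch can do this, but text parsing in `cmd` is awkward. As an alternative sketch (Python, not batch), here is the same search-and-increment using a substitution function. The pattern assumes the exact `COUNTER:int = <digits>` shape from the question, and `bump_counter` is a hypothetical helper name.

```python
import re

COUNTER_RE = re.compile(r"(COUNTER:int\s*=\s*)(\d+)")


def bump_counter(source: str) -> str:
    # Replace the number after "COUNTER:int =" with that number plus one.
    return COUNTER_RE.sub(lambda m: m.group(1) + str(int(m.group(2)) + 1), source)


line = "public static const COUNTER:int = 0"
print(bump_counter(line))  # public static const COUNTER:int = 1
```

To apply it to a file, read the text, pass it through `bump_counter`, and write the result back.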

3 votes · 5 answers · 1077 views

Process 40M of documents (and index) as fast as possible

Have a good day. So my problem is basically this: I need to process 37.800.000 files. Each "file" is really more than that, what I have is: 37.800.000 XML documents. More than 120.000.000 of Tiff images. Each of the XML documents reference one or more Tiff images and provides a set of commo...

3 votes · 7 answers · 4554 views

Javascript parsing an integer from a string

What I want to do is determine if a string is numeric. I would like to know what people think about the two solutions I am trying to decide between (OR if there is a better solution that I have not found yet). The parseInt function is not suitable because it will return an integer value for a par...
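One alternative to `parseInt` (which happily returns `12` for `"12px"`) is to require the whole string to match a numeric pattern. The sketch is in Python for illustration; the same pattern can be anchored with `^...$` and used with a JavaScript `RegExp`. The definition of "numeric" here (optional sign, digits, optional decimal part; no exponent, hex, whitespace, or separators) is an assumption to adjust as needed.

```python
import re

# Assumed definition of "numeric": optional sign, digits, optional fraction.
NUMERIC_RE = re.compile(r"[+-]?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)")


def is_numeric(text: str) -> bool:
    # fullmatch: the entire string must match, unlike a prefix parse.
    return NUMERIC_RE.fullmatch(text) is not None


print(is_numeric("42"), is_numeric("12px"))  # True False
```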

0 votes · 1 answer · 463 views

Facebook FBJS UTC

I am pulling in data from a JSON file using FBJS AJAX. One of the values in the JSON file is a date. The date has a UTC format, Date(1255535021000-0600). However, I am getting an "invalid date" or "NaN" error whatever I do. I have tried the following: new Date(1255535021000-0600), new Date(12555...
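Written unquoted, `1255535021000-0600` is arithmetic (a subtraction), not a timestamp plus offset, so the usual approach is to treat the value as text and split it into its parts first. The sketch below is in Python for illustration and assumes the ASP.NET-style shape `Date(<milliseconds since the Unix epoch><±hhmm>)`, where the milliseconds are UTC and the suffix is the source time-zone offset.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumed shape: Date(<ms since epoch, UTC><+/-hhmm offset>)
DATE_RE = re.compile(r"Date\((-?\d+)([+-])(\d{2})(\d{2})\)")


def parse_dotnet_date(text: str) -> datetime:
    m = DATE_RE.search(text)
    if m is None:
        raise ValueError(f"unrecognised date: {text!r}")
    millis, sign, hh, mm = m.groups()
    offset = timedelta(hours=int(hh), minutes=int(mm))
    tz = timezone(-offset if sign == "-" else offset)
    # The milliseconds are an absolute instant; the offset only sets display time.
    return datetime.fromtimestamp(int(millis) / 1000, tz=tz)


d = parse_dotnet_date("Date(1255535021000-0600)")
print(d.isoformat())  # 2009-10-14T09:43:41-06:00
```

In JavaScript the equivalent idea is to extract the millisecond part from the string and pass that number, not the whole expression, to `new Date(...)`.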