OpenCores
newb suffering from information overload
by btah on Oct 18, 2017
btah
Posts: 10
Joined: Oct 17, 2017
Last seen: Oct 24, 2017
Hello,

Newb here... Although I am not sure OpenCores is the place to start, I will give it a shot. I have information overload from researching how to get started, and I'm not sure how much of what my Google searches turn up is relevant or up to date, so I will simply ask the following:

I am exploring the possibility of using something like FPGAs to process data arriving over a network connection (TCP and multicast) and parse it into a common format. More specifically, processing financial feed data such as stock and options pricing data (each feed has its own proprietary format) and parsing it into a common format for an application I am responsible for.
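To make the idea concrete, here is roughly the kind of transformation I mean, in plain software terms (a simplified sketch only; the message layout, field names, port, and multicast address below are made-up placeholders, not any real feed format):

// Simplified sketch: receive one multicast feed and normalize each message
// into a common internal format. All layouts and addresses are placeholders.
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <cstdint>
#include <cstdio>
#include <cstring>

// Hypothetical common internal format shared by all feed handlers.
struct CommonQuote {
    char     symbol[12];
    uint64_t exchange_ts_ns;
    int64_t  price_micros;   // price scaled to micro-units
    uint32_t size;
};

// Hypothetical proprietary wire format for one particular feed.
#pragma pack(push, 1)
struct FeedXQuote {
    char     symbol[12];
    uint64_t ts_ns;
    int32_t  price_x10000;   // price scaled by 1e4 on the wire
    uint32_t size;
};
#pragma pack(pop)

static bool parse_feed_x(const uint8_t* buf, size_t len, CommonQuote& out) {
    if (len < sizeof(FeedXQuote)) return false;
    FeedXQuote raw;
    std::memcpy(&raw, buf, sizeof(raw));          // avoid unaligned access
    std::memcpy(out.symbol, raw.symbol, sizeof(out.symbol));
    out.exchange_ts_ns = raw.ts_ns;
    out.price_micros   = static_cast<int64_t>(raw.price_x10000) * 100;
    out.size           = raw.size;
    return true;
}

int main() {
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    sockaddr_in addr{};
    addr.sin_family      = AF_INET;
    addr.sin_port        = htons(14310);                  // placeholder port
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr));

    ip_mreq mreq{};
    mreq.imr_multiaddr.s_addr = inet_addr("239.1.2.3");   // placeholder group
    mreq.imr_interface.s_addr = htonl(INADDR_ANY);
    setsockopt(fd, IPPROTO_IP, IP_ADD_MEMBERSHIP, &mreq, sizeof(mreq));

    uint8_t buf[2048];
    for (;;) {
        ssize_t n = recv(fd, buf, sizeof(buf), 0);
        CommonQuote q;
        if (n > 0 && parse_feed_x(buf, static_cast<size_t>(n), q))
            std::printf("%.12s %u @ %lld\n", q.symbol, q.size,
                        static_cast<long long>(q.price_micros));
    }
}

The FPGA question is essentially whether that parse step, repeated millions of times per second across a dozen proprietary formats, is better done in hardware.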

Where is a good place with up-to-date information that I can use to start my research?

Am I correct in thinking this seems like a good fit for FPGA processing, since it involves very high data rates and simple parsing logic that transforms the data into a common format which is then delivered to an application for further processing?

Thanks in advance.
RE: newb suffering from information overload
by metaforest on Oct 19, 2017
metaforest
Posts: 10
Joined: Jul 21, 2017
Last seen: Jan 25, 2023
It seems unlikely that FPGAs are going to help in the context you describe.
Any modern PC, with only modest hardware upgrades, is capable of processing TCP data at multiple gigabits per second.

My suggestion would be to develop the application processing and see where your process gets bottlenecked. Work from there to characterize the bottlenecks and develop solutions to those issues.

Any bandwidth-related issues are more likely to be solved with GPU approaches.

While it is possible to develop processing chains using FPGAs to do this, the engineering effort is usually not going to survive a CBA.
RE: newb suffering from information overload
by btah on Oct 19, 2017
btah
Posts: 10
Joined: Oct 17, 2017
Last seen: Oct 24, 2017
Thanks for your response. I haven't been able to find a definition of "CBA" so I don't know what you mean.
RE: newb suffering from information overload
by btah on Oct 19, 2017
btah
Posts: 10
Joined: Oct 17, 2017
Last seen: Oct 24, 2017
Also, the application is already a production application; I'm just looking at this technology to reduce latency. There is a limit to how fast you can process/parse 5 to 10 million distinct multicast messages per second in software. At some point you want to see whether there is a hardware solution that can help shave off some microseconds.

I am running on high-end server hardware with 32 cores (56 cores on some machines). The app has been tuned over several years and is extremely fast; I'm just exploring ways to make it faster.
RE: newb suffering from information overload
by wingedpower on Oct 19, 2017
wingedpower
Posts: 5
Joined: Jan 5, 2017
Last seen: Mar 9, 2024
I think "CBA" means "cost benefit analysis".
RE: newb suffering from information overload
by wingedpower on Oct 19, 2017
wingedpower
Posts: 5
Joined: Jan 5, 2017
Last seen: Mar 9, 2024
"I am running on high end server hardware with 32 cores and on some machines 56 cores."
"There is only so fast you can process/parse 5 to 10 million distinct multicast messages per second via a software solution."

Not really knowing about the architecture of your system, I'm guessing it's implementing something like: https://www.cisco.com/c/dam/en_us/solutions/industries/docs/finance/md-arch-ext.pdf

In that case, what exactly do you mean when you say you want to shave off 5-10 ms? Assuming you have both sufficient memory to buffer the incoming streams and sufficient capacity to process them in real time... to reduce the "processing lag" by 5-10 ms, the "easy" fixes would be:

* Faster processors, or processors with more cache (provided your application code can take advantage of it)
* Employ GPU(s) for acceleration (OpenCL/CUDA), as metaforest suggests.
* Get a server with a faster memory bus

If the streams are SSL-encrypted or otherwise employ crypto, you can potentially improve throughput by using appropriate crypto acceleration cards so that the CPU(s) are freed from the SSL/crypto processing.

All of these are ways to reduce the lag from received to sent out, without employing FPGA(s).

With FPGA(s), you would literally be rebuilding part of your stack, most likely on custom boards, so that you can leverage enough FPGA modules to make it worth your while. The cases where financial institutions have made massive workloads faster employ large numbers of FPGAs on custom boards.

But not knowing your infrastructure, it's hard to say where you can shave off 5-10ms. You noted that the stack has been tuned over the years. I'm guessing all the relevant network infrastructure and processing nodes have been upgraded.

But those topics are not FPGA related per se.
RE: newb suffering from information overload
by inflector on Oct 19, 2017
inflector
Posts: 6
Joined: Aug 28, 2017
Last seen: May 31, 2018
If you are trying to reduce latency, then you need to identify your latency bottlenecks. What are they?

I've got quite a bit of experience with financial data, but it's been a few years. FPGAs can be much lower latency but you need to specify what specific processing is currently slow if you want more specific advice. What specific code needs to be faster?

What is the current latency from data input at the network card to output? What kinds of mathematical or format transformation are involved in your current pipeline?
RE: newb suffering from information overload
by btah on Oct 19, 2017
btah
Posts: 10
Joined: Oct 17, 2017
Last seen: Oct 24, 2017
We are starting to head in a direction that I wasn't really planning on going. The system is too big to start describing on a forum in the detail that would be needed for that. Plus, I don't know all the nitty-gritty details that are needed to discuss performance. One of the folks that replied mentioned shaving off 5 to 10 ms; I think that person misread one of my posts. The system is processing 5 to 10 million multicast messages per second. My system currently processes data, end to end, in about 50 microseconds. That time includes getting the message from the network stack, parsing, memory updates, determining where the data needs to be directed, formatting the message(s) for redirection, and putting them on the wire to the end client. It is purely a server system, with no user interface other than the command line.
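To give a sense of how that 50 microseconds is accounted for: the breakdown comes from timestamping each stage boundary, something like the sketch below (illustrative only; the stage names mirror the steps above, but this is not our production instrumentation):

// Illustrative only: timestamp each pipeline stage to see where the
// end-to-end budget goes. Stage names mirror the steps described above.
#include <chrono>
#include <cstdio>

using Clock = std::chrono::steady_clock;

struct StageTimes {
    Clock::time_point rx, parsed, updated, routed, formatted, sent;
};

static void report(const StageTimes& t) {
    auto us = [](Clock::time_point a, Clock::time_point b) {
        return std::chrono::duration_cast<std::chrono::nanoseconds>(b - a).count() / 1000.0;
    };
    std::printf("parse=%.2fus tables=%.2fus route=%.2fus fmt=%.2fus tx=%.2fus total=%.2fus\n",
                us(t.rx, t.parsed), us(t.parsed, t.updated), us(t.updated, t.routed),
                us(t.routed, t.formatted), us(t.formatted, t.sent), us(t.rx, t.sent));
}

int main() {
    StageTimes t;
    t.rx = Clock::now();         // right after the message comes off the network stack
    // ... parse the proprietary message into the common format ...
    t.parsed = Clock::now();
    // ... update in-memory tables ...
    t.updated = Clock::now();
    // ... determine which clients subscribed to this data ...
    t.routed = Clock::now();
    // ... format the outbound message(s) ...
    t.formatted = Clock::now();
    // ... put it on the wire ...
    t.sent = Clock::now();
    report(t);
}

In practice you would only sample a fraction of the messages so the measurement overhead stays negligible.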

As far as whether there is a problem, I would say there currently isn't one. This isn't a reactionary topic, more of an exploratory one for me. My system runs quietly and I have a lot of room for expansion; this technology just caught my interest.
RE: newb suffering from information overload
by btah on Oct 19, 2017
btah
Posts: 10
Joined: Oct 17, 2017
Last seen: Oct 24, 2017
However, I can say I partially got an answer here. I don't quite understand the whole hardware setup for FPGAs. All the stuff I keep running across is little experimental boards used to turn LEDs on and off (I am sure they have other purposes, but I keep running into tutorials on how to light up an LED...).

But for an enterprise-class piece of hardware capable of doing some serious work, I get the impression we are talking about having someone (Xilinx?) build custom-made machines? Or at least custom boards for the machines they support? Is that how it works? You are not going out and buying real machines with 56 processors, off the shelf, with FPGAs in them, right?
RE: newb suffering from information overload
by btah on Oct 19, 2017
btah
Posts: 10
Joined: Oct 17, 2017
Last seen: Oct 24, 2017
What would be cool from my standpoint is to have a NIC that can be programmed to do the initial parsing on a received message and pass it to the existing software already parsed.
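In other words, instead of handing the software raw packets, the card would hand it records already in our common format, for example via a shared ring. Everything below is hypothetical: the record layout and the ring interface are just how I picture it, not any vendor's API.

// Hypothetical handoff from a parsing NIC to software: the card writes
// already-normalized records into a shared ring; the host just consumes them.
#include <atomic>
#include <cstddef>
#include <cstdint>

struct ParsedRecord {                 // what the NIC would deliver
    char     symbol[12];
    uint64_t exchange_ts_ns;
    int64_t  price_micros;
    uint32_t size;
    uint16_t feed_id;                 // which upstream feed it came from
};

struct ParsedRing {                   // single producer (NIC) / single consumer (host)
    static constexpr std::size_t N = 1 << 16;
    ParsedRecord          slots[N];
    std::atomic<uint64_t> head{0};    // advanced by the NIC/driver
    std::atomic<uint64_t> tail{0};    // advanced by the host
};

// Host-side consumption loop: no parsing left to do, just routing/fan-out.
void consume(ParsedRing& ring) {
    uint64_t tail = ring.tail.load(std::memory_order_relaxed);
    for (;;) {
        uint64_t head = ring.head.load(std::memory_order_acquire);
        while (tail != head) {
            const ParsedRecord& rec = ring.slots[tail % ParsedRing::N];
            // route(rec): look up subscribers for rec.symbol and forward it
            (void)rec;
            ++tail;
        }
        ring.tail.store(tail, std::memory_order_release);
    }
}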
RE: newb suffering from information overload
by inflector on Oct 19, 2017
inflector
Posts: 6
Joined: Aug 28, 2017
Last seen: May 31, 2018
Can you describe the parsing process? And what else (besides parsing) is going on during that 50 microseconds?

Can you give us an example of one set of inputs and the resulting outputs?

FPGAs are very good at doing work which can be split into parallel logic flows. It's hard to know what might be possible without knowing the form of the data to see if any actual latency reduction is possible.

For example, are your inputs fed to different output streams? What sort of multicast does your NIC receive? Are you processing ticker data, options pricing? What are you doing to the resulting parsed data? I.e. is it being sent to other systems for analysis? Forwarded to customers over the internet or dedicated networks?

Does the parsing require lookups of large datasets? How much memory do these datasets require? Can the work be sharded or split according to some simple algorithm with an easy mathematical transformation?



RE: newb suffering from information overload
by aikijw on Oct 19, 2017
aikijw
Posts: 76
Joined: Oct 21, 2011
Last seen: Jul 8, 2023
I keep seeing folks suggest a GPU as a solution, which in my experience (which covers both FPGA- and GPU-based solutions) is a much less flexible option than an FPGA. I suspect that the OP is currently passing packets over a PCIe bus and that the process uses a standard approach to handling network traffic (DMA triggered by an interrupt). If that is the case, then the majority of his remaining latency is probably in bus transfers... Dismissing an FPGA parked on the “net” side of the bus seems premature... I’d bet that there’s quite a bit of processing that can be done there, especially if you drop an UltraScale+ Zynq with a GigE+ interface plumbed directly into the Zynq... My 0.02USD...

RE: newb suffering from information overload
by metaforest on Oct 20, 2017
metaforest
Posts: 10
Joined: Jul 21, 2017
Last seen: Jan 25, 2023
Developing custom hardware to solve an internal application issue is a huge commitment. This is what I was getting at when I was talking about CBA (cost-benefit analysis).

Hardware is a messy approach to solving problems. The reason we don't see a lot of custom hardware is that, even with FPGAs, the time and engineering effort to make it happen only makes sense if you can sustain the engineering tasks for a year or two and there is a retail outlet at the end of it. From what you have said so far, this is an internal application that won't have any retail outlet. There would have to be a huge internal benefit to justify the engineering effort.

I see others suggesting FPGA solutions as if they are a gingersnap from realization... That is not how it works; it takes a huge commitment to bring a solution like this online.

It is not impossible, but you should have exhausted all other avenues before taking such an effort seriously.

~m
RE: newb suffering from information overload
by btah on Oct 20, 2017
btah
Posts: 10
Joined: Oct 17, 2017
Last seen: Oct 24, 2017
Yes, this is purely an internal application. Nobody here knows much about FPGAs other than at a one-million-foot view, and on a good day most people can spell FPGA... I was just exploring to see if I could pitch an idea. But, based on what I have learned over the last few days, it is not feasible. It sounds like this is more of an engineering project as far as the hardware is concerned. I originally thought I could buy machines already built that we could work with. That doesn't seem to be the case. The industry is not there yet.

To the person who asked what type of data the system is processing: it is all the major financial feeds, OPRA, SIAC, NASDAQ, S&P, CME, Reuters, and about a dozen others. Read the data (the huge majority is multicast), parse it into a common internal format, update some memory tables, determine who has subscribed to the data, build a message for that client, and deliver it. Essentially, a big router.
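To give a flavor of the fan-out step in software terms (heavily simplified; the table, keying, and types here are placeholders, and the real system is far more involved):

// Simplified fan-out sketch: map each symbol to its subscribers and forward
// the normalized record to each of them.
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Normalized record produced by the parse step (same idea as the earlier sketch).
struct CommonQuote {
    char     symbol[12];
    uint64_t exchange_ts_ns;
    int64_t  price_micros;
    uint32_t size;
};

struct Client {
    int fd;   // connection to the downstream subscriber
};

// symbol -> clients who subscribed to it (placeholder keying)
using SubscriptionTable = std::unordered_map<std::string, std::vector<Client*>>;

void route(const CommonQuote& q, const SubscriptionTable& subs,
           void (*send_to)(Client&, const CommonQuote&)) {
    std::size_t len = 0;                   // symbol may not be null-terminated
    while (len < sizeof(q.symbol) && q.symbol[len] != '\0') ++len;
    std::string key(q.symbol, len);

    auto it = subs.find(key);
    if (it == subs.end()) return;          // nobody subscribed to this symbol
    for (Client* c : it->second)
        send_to(*c, q);                    // format per-client and put it on the wire
}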
RE: newb suffering from information overload
by aikijw on Oct 20, 2017
aikijw
Posts: 76
Joined: Oct 21, 2011
Last seen: Jul 8, 2023
I disagree with the opinion that this is a major undertaking... I also disagree with the opinion that “custom hardware” should be a last resort (including the ill-informed implication that custom hardware is somehow an improper solution). That opinion is at odds with the rather robust FPGA market that exists today (including extensive investment in the development of vendor tool chains to support rapid prototyping)... So... While it’s convenient to use a CBA as a justification for not doing something, it’s equally important to actually have real-world experience with both the target problem and the solution space. When software is your only hammer, it’s easy to develop a bias like this...

It would take about six months to put a viable prototype together... Including Gingersnaps...

I’d start with a PCIe-based FMC carrier... Zynq... drop your choice of high-speed baseband interface on the carrier (QSFP?) and maybe a few gigs of HMC memory to support very fast pattern matching... That’s the hardware investment, plus six months of very lazy development...

Not trying to be disrespectful, but out-of-hand dismissal of solutions with knee-jerk faux business justifications like a “CBA”, or a fallacious “retail” argument, tends to indicate a lack of real, honest experience with the problem space...

Best!

/jw


