OpenCores
RE: newb suffering from information overload
by aikijw on Oct 20, 2017
aikijw
Posts: 76
Joined: Oct 21, 2011
Last seen: Jul 8, 2023
Apologies... Phone cut off part of my post...

Hardware Investment...

1) Zynq Ultrascale+ based PCIe FMC carrier: 4-6k
2) FMC Baseband Interface (QSFP-ish): 2-3k (might want more than one)
3) HMC memory... 2-3k
4) Host computer... 6-7k (a bit application dependent, but probably hefty enough to prototype)
5) Vivado license (2.5-3k... Might be low, as I don't pay for them these days...)
6) Misc IP... 10-12k

Drop all of the interface hardware on the Zynq PL... Use the Zynq ARM cores to take care of management... Most of the protocol sorting (and probably any pattern matching/sorting algorithms) can be expressed in C and largely dumped, via Vivado HLS, right into the PL... If not, you've got 4x ARM A53s running at 1.5 GHz to help out...
Zero bus latency... The only time you'd have to jump the PCIe bus is when an exception needed to be sorted out or some kind of configuration change needed to occur...

This sounds like nearly all UDP, so you're just picking packets... peeking at them... making some decisions... plonking them in a queue? Rewriting headers...

Sounds like a perfect job for an FPGA, honestly...
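
To put a finer point on it... the peek/decide/plonk step really is just a few lines of C... A rough sketch (the header struct is standard UDP, but the port range, rewrite rule, and queue numbering are invented for illustration):

/* Sketch of the per-packet peek/decide/plonk step in plain C.
 * The port range and queue ids below are made up; a real feed
 * handler would match the exchange's actual wire format. */
#include <stdint.h>
#include <arpa/inet.h> /* ntohs/htons */

struct udp_hdr {
    uint16_t src_port;
    uint16_t dst_port;
    uint16_t length;
    uint16_t checksum;
};

/* Peek at the UDP header, pick an outbound queue, and
 * optionally rewrite the destination port. */
static int peek_and_plonk(uint8_t *pkt)
{
    struct udp_hdr *udp = (struct udp_hdr *)pkt;
    uint16_t dport = ntohs(udp->dst_port);

    if (dport >= 16000 && dport < 16100) {   /* hypothetical feed range */
        udp->dst_port = htons(dport + 1000); /* header rewrite */
        return 0;                            /* fast-path queue */
    }
    return 1;                                /* everything else */
}

Logic like that maps straight onto FPGA fabric...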

If you're currently using a GP processor stack (assuming Linux... Please! :-)) to do this job, the bulk of your latency is going to be (1) Bus latency between your network adaptor and the GP (2) Kernel latency associated with processing high speed packet data (depends a bit on whether or not you're grabbing those packets in user space or you've done some significant kernel module development)... If you're using Windows... May Gawd Have Mercy on your Immortal Soul... ;-)
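
If you're stuck on the GP stack in the meantime, the usual first aid for (2) is batching receives, so you pay for one kernel crossing per pile of datagrams instead of per packet... Linux's recvmmsg(2) does exactly this... A minimal sketch, with the port number invented and error handling omitted:

/* Batch up to 32 datagrams per syscall with recvmmsg(2) to
 * amortize the kernel crossing. Error handling omitted. */
#define _GNU_SOURCE
#include <sys/socket.h>
#include <sys/uio.h>
#include <netinet/in.h>
#include <string.h>

#define BATCH 32
#define MTU   2048

int main(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in addr = {
        .sin_family = AF_INET,
        .sin_port = htons(16000),            /* hypothetical feed port */
        .sin_addr.s_addr = htonl(INADDR_ANY)
    };
    bind(fd, (struct sockaddr *)&addr, sizeof addr);

    static char bufs[BATCH][MTU];
    struct iovec iov[BATCH];
    struct mmsghdr msgs[BATCH];
    memset(msgs, 0, sizeof msgs);
    for (int i = 0; i < BATCH; i++) {
        iov[i].iov_base = bufs[i];
        iov[i].iov_len  = MTU;
        msgs[i].msg_hdr.msg_iov    = &iov[i];
        msgs[i].msg_hdr.msg_iovlen = 1;
    }

    for (;;) {
        int n = recvmmsg(fd, msgs, BATCH, 0, NULL); /* one syscall, n packets */
        for (int i = 0; i < n; i++) {
            /* msgs[i].msg_len bytes of packet i sit in bufs[i] */
        }
    }
}

It helps, but you're still eating the bus latency and a kernel crossing either way...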

Again... An opinion that's probably worth about what you've paid for it... ;-)

/jw




I disagree with the opinion that this is a major undertaking... I also disagree with the opinion that “custom hardware” should be a last resort (including the misinformed implication that custom hardware is somehow an improper solution). That opinion is at odds with the rather robust FPGA market that exists today (including extensive investment in the development of vendor tool chains to support rapid prototyping)... So... While it’s convenient to use a CBA as a justification for not doing something, it’s equally important to actually have real-world experience with both the problem and solution space. When software is your only hammer, it’s easy to develop a bias like this...

It would take about six months to put a viable prototype together... Including Gingersnaps...

I’d start with a PCIe based FMC carrier... Zynq... drop your choice of high speed baseband interface on the carrier (QSFP?) and maybe a few Gig of HMC memory to support very fast pattern matching... Hardware investment
Six months of very lazy development...

Not trying to be disrespectful, but out-of-hand dismissal of solutions with knee-jerk faux business justifications like a “CBA”, or fallacious “retail” arguments, tends to indicate a lack of real, honest experience with the problem space...

Best!

/jw



Yes, this is purely an internal application. Nobody here knows much about FPGAs other than at a million-foot view, and on a good day most people can spell FPGA... I was just exploring to see if I could pitch an idea. But, based on what I have learned over the last few days, it is not feasible. Sounds like this is more of an engineering project as far as the hardware is concerned. I originally thought I could buy machines already built that we could work with. That doesn't seem to be the case. The industry is not there yet.

To the person who asked about what type of data the system is processing: it is all the major financial feeds, OPRA, SIAC, NASDAQ, S&P, CME, Reuters, and about a dozen others. Read the data (the huge majority is multicast), parse it to a common internal format, update some memory tables, determine who has subscribed for the data, build a message for that client, and deliver it. Essentially, a big router.
RE: newb suffering from information overload
by btah on Oct 23, 2017
btah
Posts: 10
Joined: Oct 17, 2017
Last seen: Oct 24, 2017
So here is a very basic (well, one of many basic) thing I am missing. You get a board, drop in a PCIe card, and then what? I am assuming the whole mess doesn't just drop into a 56 CPU HP server... That is where all the hardware/electrical engineering comes in to play, right?
RE: newb suffering from information overload
by btah on Oct 23, 2017
btah
Posts: 10
Joined: Oct 17, 2017
Last seen: Oct 24, 2017
I should also say this application runs on 100+ servers company-wide. Lots of redundancy, lots of locations. HP Ivy Bridge servers. Replacing all the hardware is not feasible. Adding components to each machine is feasible, depending on the total cost; adding to a set of key machines is also feasible.

RE: newb suffering from information overload
by aikijw on Oct 23, 2017
aikijw
Posts: 76
Joined: Oct 21, 2011
Last seen: Jul 8, 2023
The short answer is "yes"... :-)

The longer answer...

The BOM I described earlier...

The FMC Carrier has several FMC interfaces (kind of a 'socket' with a metric ton of pins)... The other interface edge is a PCIe interface... In your case, I think the PCIe interface isn't going to end up very heavily used... Depends on how much of the work you end up moving to the interface side of the PCIe bus... I don't know what your network interface looks like physically, but if it's 10G/40G/100G Ethernet, most of the interface development is pretty straightforward...

Unless there's a strongly supported reason not to, I always tend to hit prototypes with a sledgehammer (big FPGA... big processor(s)...) and then scale to cost later... I'd recommend a large Ultrascale+ Zynq... The 2-4 ARM cores will run embedded linux... You'd need to craft IP that does the following:

1) Manages the PCIe bus...
2) Manages your baseband net interface...
3) Supports your basic packet sniffing, sorting, and plonking rule set...

#1 and #2 are mostly off the shelf IP... usually available from the board vendor as part of a "framework"... #3... You'd have to develop... However, I suspect that you could probably implement #3 using Xilinx Vivado HLS. At the risk of oversimplifying, HLS compiles C into hardware... IMHO, it's a little twitchy at times, but it's probably improved a good bit since the last time I used it for anything significant...
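
To give a flavor of what #3 looks like going into HLS... plain C with pragmas... This is just a sketch, not production IP: the 64-bit word width, the header offset, and the port test are all assumptions...

/* HLS-flavored sketch: C that Vivado HLS can compile to logic.
 * PIPELINE II=1 asks for one new word accepted per clock.
 * Word width, header offset, and port test are illustrative. */
#include <stdint.h>

#define PKT_WORDS 256

void sniff_sort_plonk(const uint64_t in_pkt[PKT_WORDS],
                      uint64_t out_pkt[PKT_WORDS],
                      uint8_t *queue_id)
{
#pragma HLS INTERFACE ap_memory port=in_pkt
#pragma HLS INTERFACE ap_memory port=out_pkt

    uint16_t dport = (uint16_t)(in_pkt[4] >> 16); /* assumed header offset */
    *queue_id = (dport >= 16000 && dport < 16100) ? 0 : 1;

    for (int i = 0; i < PKT_WORDS; i++) {
#pragma HLS PIPELINE II=1
        out_pkt[i] = in_pkt[i]; /* copy through; rewrite fields here */
    }
}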

There'd need to be a bit of work done to partition your routing/mangling process:

1) Routing that never has to leave the Zynq PL (enters via one baseband interface and is sniffed, rewritten/scribbled on, plonked into an outbound queue)

2) Routing that leaves the PL and has to be managed by the local ARM cores...

3) Routing that generates some kind of exception requiring handling by your "56 processor" box...

Most of the engineering will be with regard to partitioning your processes across these three domains... If your application allows moving a big chunk of what the general purpose cores are doing, to the "interface" side of the PCIe bus, I suspect you can pick some low hanging fruit pretty easily... The more you can keep in domain #1, the more you'll benefit, I suspect.
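
In code terms, the partitioning decision is just a classifier... Something like this, where the two predicates are placeholders (the real criteria fall out of your rule set):

/* Sketch: tag each packet with the domain that must handle it.
 * The two predicates stand in for the real routing rule set. */
enum domain { PL_ONLY, ARM_CORES, HOST_EXCEPTION };

enum domain classify(int simple_rewrite, int needs_arm_state)
{
    if (simple_rewrite)  return PL_ONLY;        /* domain 1: stays in fabric */
    if (needs_arm_state) return ARM_CORES;      /* domain 2: local A53s */
    return HOST_EXCEPTION;                      /* domain 3: over PCIe */
}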

I'll caveat this by saying that much of what I'm speculating on here is, well, speculation... Your mileage may vary... :-)

You'd need to hire someone (consultants are available) to build up firmware for you, but with a combination of performance analysis, some upfront system engineering (before buying anything), and six-ish months of prototyping, you should be able to determine whether the problem is worth solving on a larger scale...

My preferred vendor for FMC carriers, for applications like this, is HiTechGlobal... Their hardware tends to be on the shelf... Their prices and support costs aren't ridiculous, and their designs are pretty conservative... (Just looked, and it's not too surprising that they sell the exact board I'm describing... :-) http://www.hitechglobal.com/Boards/MPSOC_UltraScale+.htm)

Yep... You'd need some engineering support... This isn't a novice project... But... This is do-able, and I suspect, if you're buying machines with a crap-ton of general purpose cores (and you're keeping them busy), you could very probably shave some latency by doing more work on the interface side of the PCIe bus... :-)

Best!






RE: newb suffering from information overload
by metaforest on Oct 24, 2017
metaforest
Posts: 10
Joined: Jul 21, 2017
Last seen: Sep 22, 2024
The path forward here would be to have a competent computer hardware engineering firm, with FPGA specialization, analyze your existing application for ways it could be accelerated. Based on the hardware and redundancy you describe, there would be some kind of determination as to which custom-hardware/FPGA solutions would give a meaningful speedup, what form a solution might take, and the cost/complexity of developing it. My guess would be most likely some kind of custom/semi-custom solution replacing the NICs.
It might be that such a partner could help develop a long term plan for a fully custom solution that could be migrated to...

I've worked on small-scale custom embedded systems, and I worked on late-'80s personal computer and game platform hardware and firmware projects. Developing custom hardware, even with FPGAs to handle the large blocks of functionality, is much less costly than it used to be. However, it is still a large undertaking for an enterprise that doesn't have experience managing custom hardware projects.

FPGA tech is very helpful in reducing cost and risk in custom hardware solutions, but it is not a silver bullet. There is still a lot of money and risk in the approach. Random engineers and enthusiasts here are not being realistic if they tell you it is easy.

~m
RE: newb suffering from information overload
by aikijw on Oct 24, 2017
aikijw
Posts: 76
Joined: Oct 21, 2011
Last seen: Jul 8, 2023
LOL... This "random enthusiast" disagrees with your assessment of the difficulty involved in prototyping a solution to this problem... This is about six months of moderate effort... (My BoE... I've been designing and producing complex signal processing solutions for about 20 years... This problem is actually not that complex...)

EVERYTHING involved here is fairly low risk... Off the shelf hardware and mostly off the shelf IP... This is actually a fairly common thing to do, and has been for probably 10 years...

I do agree that the OP needs a competent partner... I'm on holiday right now... The financial industry isn't mainstream for my firm, but I'd be happy to discuss with the OP, if he wants to contact me offline... I can probably either help find a consultant or determine whether my firm is able to assist (we are almost always pretty heavily booked, unfortunately...) I'm kind of done debating the difficulty of doing things that I do, more or less, on a daily basis...

/jw


RE: newb suffering from information overload
by btah on Oct 24, 2017
btah
Posts: 10
Joined: Oct 17, 2017
Last seen: Oct 24, 2017
Thank you for all the responses. I think at this point, based on what I have found in my research, I have to put this aside for now. I had a different impression as to where the industry was at with this technology.

Thanks again.
RE: newb suffering from information overload
by inflector on Nov 6, 2017
inflector
Posts: 6
Joined: Aug 28, 2017
Last seen: May 31, 2018
This new product may change this equation entirely and be the sort of off-the-shelf product you were looking for:

http://www.mellanox.com/related-docs/prod_adapter_cards/PB_Innova-2_Flex.pdf

You get a fast 25Gb/s NIC with an FPGA built in.

You could use these on the input side to pre-parse packets, and on the output side to handle distribution of data to many clients with packets that need to be distributed to many different users for the same data updates.
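
The output-side fan-out is conceptually simple. A rough sketch in C, where the subscriber bitmap and the send_to callback are hypothetical stand-ins for the real subscription tables and transmit path:

/* Sketch: replicate one market-data update to every subscribed
 * client. sub_mask bit i set means client i wants this update. */
#include <stdint.h>

void fan_out(const uint8_t *update, int len,
             uint64_t sub_mask,
             void (*send_to)(int client, const uint8_t *buf, int len))
{
    while (sub_mask) {
        int client = __builtin_ctzll(sub_mask); /* lowest set bit */
        send_to(client, update, len);
        sub_mask &= sub_mask - 1;               /* clear that bit */
    }
}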
RE: newb suffering from information overload
by aikijw on Nov 7, 2017
aikijw
Posts: 76
Joined: Oct 21, 2011
Last seen: Jul 8, 2023
Yep... You could accomplish the same thing with a PCIe based FMC carrier and an FMC based interface of choice (10 GigE/25 GigE/40 GigE/100 GigE). See the BOM I posted earlier... This would not be a difficult thing to accomplish. There is little "custom" hardware to build... I'll disagree with Metaforest again, in that this task isn't one I'd put in the "hard" or "risky" column... I think our respective concepts of what is/is not "hard" are calibrated somewhat differently... :-) NetFPGA cards have been around for a REALLY long time...

Here's another revelation... The original poster isn't the first person to consider this... It's been done... It's being used... It is, of course, proprietary to the company that built it... There are companies that produce turn-key solutions for this kind of thing... http://algo-logic.com/sites/default/files/Algo_Logic_How_To_Build_An_Exchange.pdf
RE: newb suffering from information overload
by metaforest on Nov 7, 2017
metaforest
Posts: 10
Joined: Jul 21, 2017
Last seen: Sep 22, 2024


The Innova-2 NIC looks like a good base for a solution to reduce risk and dev effort. I'd seen other boards like this that just weren't up to the bandwidth capacities, and/or the FPGA was too small to do anything of any serious complexity. With the card described in the Mellanox paper, I think at least doing an initial assessment becomes more reasonable. There are also a lot fewer vendors involved than with FMC-based solutions.

~m
RE: newb suffering from information overload
by aikijw on Nov 11, 2017
aikijw
Posts: 76
Joined: Oct 21, 2011
Last seen: Jul 8, 2023
[SMH] Wow... So... I'm posting this to benefit other folks who may be considering FPGA based solutions and have run into excuses/reasoning from people who lean on fabricated "risk" arguments or the perception that "custom hardware" isn't a good solution... These arguments are often manufactured out of things like moist tissue paper and pre-chewed bubblegum, and it's important that young engineers learn to recognize this kind of thing...

A single vendor can supply the FMC based solution I suggested earlier. That single vendor can also supply a framework that will make this about a six month development effort. Your views about multiple vendors being either necessary or representative of "risk" are not supported by fact.

I don't understand exactly what you're talking about when you use the phrase "bandwidth capacities", but are you suggesting that a Zynq Ultrascale+ ZU19EG sitting on a board with an 8-lane PCIe v4 interface to the host isn't adequate to support two 25 GigE interfaces? If you are suggesting this, you are, in fact, incorrect. I don't recall the OP actually mentioning what his "bandwidth capacities" needed to be, in any case...
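
Back of the envelope: two 25 GigE ports running flat out is 50 Gb/s per direction. An 8-lane PCIe Gen4 link runs 16 GT/s per lane with 128b/130b encoding, which is roughly 126 Gb/s per direction before protocol overhead... Better than 2x headroom even if every single packet crossed the bus, and in this design most never would...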

I suggested the Zynq Ultrascale+, over a Kintex Ultrascale, because it appeared the OP probably had more experience with "software" than "hardware", and providing him/her with four high performance ARM cores effectively reduces the development risk when building a prototype. Standing the board up with embedded Linux makes the prototype a pretty straightforward thing to do. Having a BigAss(tm) PL segment on the Zynq allows the OP to potentially move the more appropriate parts of his solution to dedicated logic. Vivado tends to support this really well. My view was that this would provide the OP a way to garner early acceptance/support via a rapid prototype, while providing room to evolve to an accelerated architecture.

You seem to be suggesting that just a Kintex Ultrascale is adequate. What's your thinking on this? Are you now saying that the "hardware" is where the risk is, vs. firmware development? I'd suggest that you appear to be contradicting your earlier opinions about the risks associated with an effort like this. In the opinion of this "enthusiast" you seem to be red-shifting the risk register a bit here... Seems a little "throw the guy in the deep end" from where I sit... Maybe I should do a "CBA", or something?

I also suggested my earlier solution because, unlike the Mellanox board, the FMC carrier I suggested supports an HMC... I think an HMC has potential to significantly contribute to improved pattern matching performance. Do you disagree?

BTW... Functionality very close to the Mellanox board is also available from the SAME SINGLE FMC solution vendor that I mentioned earlier (the HTG-830). Kintex Ultrascale XCKU115... *DDR4* memory... 2 GB HMC... Two-port QSFP module (2x 100G)... 8-lane PCIe Gen3 interface... I *know* that this solution works, and I know that the OP's application would take about six months, because I had two of this exact board on my bench six months ago. Two Zynq Ultrascale+ dev platforms are arriving on my bench within the next two weeks... I'm currently designing with the Zynq Ultrascale+ and will be turning out...
Honestly... I dunno... After a little knee-slappin', I see the Mellanox board as a MUCH higher risk solution. It's a custom card, bred for a single purpose, selling into a niche market. Personally, I'd choose to approach a prototyping effort with a less focused piece of hardware that allows me to, say, swap my 25G interfaces for 40G (or maybe 100G)... If I need a larger FPGA... I can upscale my FMC carrier... If I've overestimated, I can scale back and address cost... I'd also rather buy my prototyping fixtures from a company that builds about a bazillion boards a month, and that has the support infrastructure to lend a hand if I get sideways. (You might want to stop using the word "risk" so often... I'm not sure it means what you think it does... ;-))

Please stop calling people "random engineer" and/or "enthusiast"... I'm pretty sure you were talking about me, and while I'm certainly "enthusiastic", I'm far from "random"...

Respectfully,

/jw





RE: newb suffering from information overload
by Trackside on Jan 11, 2018
Trackside
Posts: 1
Joined: Jan 11, 2018
Last seen: Jan 11, 2018
One thing you could do that would speed things up a lot is, instead of using FPGAs to implement custom hardware, compiling your current software into logic and running it directly in hardware. It should be possible to run an entire main loop in one clock cycle, something you haven't got a hope of doing with conventional computing. "C to gates" appears to be the best Google search string. Wikipedia likes to call this a systolic array. It'd probably be a good idea to check and optimise the result manually: partly because the compiler has no concept of the flowchart your program expresses when it maps it to a circuit, and partly to check the compiler hasn't gone down the cheap (and slow) route of implementing a small core with a virtual ROM of your program.
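
To illustrate the idea (the field packing here is invented): a main-loop body with no memory accesses and no loop-carried dependency can flatten into pure combinational logic, evaluated once per clock, rather than being emulated on a small soft core:

/* Sketch: a branch-free classify step. A C-to-gates flow can
 * turn this into combinational logic: the compare becomes a
 * LUT tree, the shifts become wiring, and the whole body
 * settles within one clock cycle. Field packing is invented. */
#include <stdint.h>

uint8_t classify(uint32_t hdr)
{
    uint16_t dport = (uint16_t)(hdr & 0xFFFF);   /* assumed low 16 bits */
    uint8_t  fast  = (dport >= 16000);           /* compare -> LUT tree */
    uint8_t  flag  = (uint8_t)((hdr >> 31) & 1); /* top bit -> a wire */
    return (uint8_t)((fast << 1) | flag);        /* 2-bit queue select */
}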