Default protocol ports are great, but ones that will work in the real world are better.
If you want something done properly, you should probably ignore the specification of the protocols you use every once in a while. Years ago, when I worked on implementing protocols directly, there was this notion: be as strict as possible in the messages you send, and very lenient in the messages you accept. The reason is that by being strict on the sender side you achieve higher interoperability (more devices will be able to “decipher” what you sent), and by being lenient on the receiving side you achieve the same (you can understand messages from more devices). Somehow, it isn’t worth being right here – it just makes more sense to be smart.
The same applies to default protocol ports.
Assume for the sake of argument that we have a theoretical protocol that requires the use of port number 5349. You set up the server, configure it to listen on that port (after all, we want to be standards compliant), and you run your service.
Will that work well for you?
For the most part, as the illustration above shows, yes it will.
The protocol is probably client-server based. A client somewhere from inside his private network is accessing the Internet, going to the public IP of your server to that specific port and connects. Life is good.
Only sometimes it isn’t.
Hmm… what’s going on here now? Someone in the IT department decided to block outgoing traffic to port 5349. Or maybe, just maybe, he decided to open outgoing traffic solely for ports 80 and 443. And why would he do that? Because that’s where HTTP and HTTPS traffic goes – to the web servers our browsers connect to. And I don’t know any white collar employee today who would be able to do his job without connecting to the Internet with his browser. Writing this draft of an article requires such a connection (I do it on Google Docs and then copy it to WordPress once done).
So the same scenario, with the same requirements won’t work if our server decides to use the default port 5349.
What if we decide to pass it through port 443?
Now it has a better chance of working. Why? Because port 443 is reserved for TLS traffic, which is encrypted. This means that beyond the destination of the data, the firewall we’re dealing with can’t know a thing about what’s being sent, so it will usually treat it as “HTTPS” traffic and just pass it along.
There are caveats here. If the enterprise is enforcing a local trusted web proxy, that proxy actually acts as a man in the middle and opens all packets, which means it now sees the traffic and might decide not to pass it since it can’t understand it.
What we’re aiming for is best coverage. And port 443 will give us that. It might get blocked, but there’s less of a chance for that to happen.
Here are a few examples where ignoring your protocol default ports is suggested:
TURN
The reason for this article is TURN. TURN is used by WebRTC (and other protocols) to get your media session connected in case you can’t send it directly peer-to-peer. It acts as a relay to the media that sits in the public internet with the sole purpose of punching holes in NATs and traversing firewalls.
TURN runs over UDP, TCP and TLS. And yes. You WANT to configure and run it on UDP, TCP and TLS (don’t be lazy – configure them all – it won’t cost you more).
Want to learn more about WebRTC in general and NAT traversal specifically? Enroll to my WebRTC training today to become a pro WebRTC developer.
The default ports for your STUN and TURN servers (you’re most probably going to deploy them in the same process) are:
- STUN: 3478
- TURN/UDP: 3478
- TURN/TCP: 3478
- TURN/TLS: 5349
A few things that come to mind from this list above:
Here’s the thing. If you deploy only STUN, then many WebRTC sessions won’t connect. If you also deploy TURN/UDP, some sessions still won’t connect (mainly because of IT admins blocking UDP altogether). TURN/TCP might not connect either. And guess what – TURN/TLS on 5349 can still be blocked.
What’s a developer to do in such a case?
Just point your WebRTC devices towards port 443 for ALL of your STUN/TURN traffic and be done with it. This approach has no real downsides versus deploying with the default ports and all the potential upsides.
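To make that concrete, here’s a minimal sketch of the client-side configuration – the hostname and credentials are placeholders for whatever TURN deployment you run:

```typescript
// Hypothetical TURN deployment - replace host and credentials with your own.
const config: RTCConfiguration = {
  iceServers: [
    // STUN reusing the same host and port
    { urls: 'stun:turn.example.com:443' },
    // TURN over UDP, TCP and TLS - all pointed at 443
    {
      urls: [
        'turn:turn.example.com:443?transport=udp',
        'turn:turn.example.com:443?transport=tcp',
        'turns:turn.example.com:443?transport=tcp',
      ],
      username: 'user',
      credential: 'secret',
    },
  ],
};

const pc = new RTCPeerConnection(config);
```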
Here’s how a couple of services I checked almost at random do this properly (I’ve used chrome://webrtc-internals to get them):
Hangouts Meet
Or Google Hangouts. Or Google Meet. Or whatever name it now has. I did use the Meet one:
https://meet.google.com/goe-nxxv-ryp?authuser=1, { iceServers: [stun:stun.l.google.com:19302, stun:stun1.l.google.com:19302, stun:stun2.l.google.com:19302, stun:stun3.l.google.com:19302, stun:stun4.l.google.com:19302], iceTransportPolicy: all, bundlePolicy: max-bundle, rtcpMuxPolicy: require, iceCandidatePoolSize: 0 }, {enableDtlsSrtp: {exact: false}, enableRtpDataChannels: {exact: true}, advanced: [{googHighStartBitrate: {exact: 0}}, {googPayloadPadding: {exact: true}}, {googScreencastMinBitrate: {exact: 400}}, {googCpuOveruseDetection: {exact: true}}, {googCpuOveruseEncodeUsage: {exact: true}}, {googCpuUnderuseThreshold: {exact: 55}}, {googCpuOveruseThreshold: {exact: 85}}]}
Google Meet comes with STUN:19302 with 5 different subdomain names for the server. There’s no TURN here because the service uses ICE-TCP directly from their media servers.
The selection of port 19302 is a curious one. I couldn’t find any reference to that number or why it was chosen (not even a mathematical one).
Google AppRTC
You’d think Google’s showcase of WebRTC would be an exemplary citizen of a solid STUN/TURN configuration. Well… here’s what it got me:
https://appr.tc/r/986533821, { iceServers: [turn:74.125.140.127:19305?transport=udp, turn:[2a00:1450:400c:c08::7f]:19305?transport=udp, turn:74.125.140.127:443?transport=tcp, turn:[2a00:1450:400c:c08::7f]:443?transport=tcp, stun:stun.l.google.com:19302], iceTransportPolicy: all, bundlePolicy: max-bundle, rtcpMuxPolicy: require, iceCandidatePoolSize: 0 },
It had TURN/UDP at 19305, TURN/TCP at 443 and STUN at 19302. Unlike others, it had explicit IPv6 addresses. It had no TURN/TLS.
Jitsi Meet
https://meet.jit.si/RandomWerewolvesPierceAlone, { iceServers: [stun:all-eu-central-1-turn.jitsi.net:443, turn:all-eu-central-1-turn.jitsi.net:443, turn:all-eu-central-1-turn.jitsi.net:443?transport=tcp, stun:all-eu-west-1-turn.jitsi.net:443, turn:all-eu-west-1-turn.jitsi.net:443, turn:all-eu-west-1-turn.jitsi.net:443?transport=tcp, stun:all-eu-west-2-turn.jitsi.net:443, turn:all-eu-west-2-turn.jitsi.net:443, turn:all-eu-west-2-turn.jitsi.net:443?transport=tcp], iceTransportPolicy: all, bundlePolicy: balanced, rtcpMuxPolicy: require, iceCandidatePoolSize: 0 }, {advanced: [{googHighStartBitrate: {exact: 0}}, {googPayloadPadding: {exact: true}}, {googScreencastMinBitrate: {exact: 400}}, {googCpuOveruseDetection: {exact: true}}, {googCpuOveruseEncodeUsage: {exact: true}}, {googCpuUnderuseThreshold: {exact: 55}}, {googCpuOveruseThreshold: {exact: 85}}, {googEnableVideoSuspendBelowMinBitrate: {exact: true}}]}
Jitsi shows multiple locations for STUN and TURN – eu-central, eu-west with STUN:443, TURN/UDP:443 and TURN/TCP:443. No TURN/TLS.
appear.in
https://appear.in/bloggeek, { iceServers: [turn:turn.appear.in:443?transport=udp, turn:turn.appear.in:443?transport=tcp, turns:turn.appear.in:443?transport=tcp], iceTransportPolicy: all, bundlePolicy: balanced, rtcpMuxPolicy: require, iceCandidatePoolSize: 0 }, {advanced: [{googCpuOveruseDetection: {exact: true}}]}
appear.in went for TURN/UDP:443, TURN/TCP:443 and TURN/TLS:443. STUN is implicit here via the use of TURN.
Facebook Messenger
https://www.messenger.com/videocall/incall/?peer_id=100000919010117, { iceServers: [stun:stun.fbsbx.com:3478, turn:157.240.1.48:40002?transport=udp, turn:157.240.1.48:3478?transport=tcp, turn:157.240.1.48:443?transport=tcp], iceTransportPolicy: all, bundlePolicy: balanced, rtcpMuxPolicy: require, iceCandidatePoolSize: 0 }, {advanced: [{enableDtlsSrtp: {exact: true}}]}
Messenger uses port 3478 for STUN, TURN over UDP on port 40002, and TURN over TCP on both ports 3478 and 443. No TURN/TLS for Messenger.
Here’s what I’ve learned here: each of these services tweaks the default ports in its own way, and port 443 shows up in almost all of them.
MQTT
We’ve looked at NAT traversal and its STUN and TURN servers. But what about some signaling protocols? The first one that came to mind when I thought about other examples was MQTT.
MQTT is a messaging protocol that is used in the IOT and M2M space. Others use it as well – Facebook for example:
They explained how MQTT is used as part of their Messenger backend for the WebRTC signaling (and I guess all other messages they send over Messenger).
MQTT can run over TCP listening on port 1883 and over TLS on port 8883. But then when you look at the AWS documentation for AWS IOT, you find this:
There’s no port 1883 at all, and now port 443 can be used directly if needed.
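For illustration, here’s roughly what an MQTT-over-TLS connection looks like using the mqtt.js package – the broker hostname is made up, and the ALPN detail in the comment is specific to AWS IoT:

```typescript
import * as mqtt from 'mqtt';

// 'mqtts://' means MQTT over TLS. 8883 is the registered MQTT/TLS port;
// brokers that also listen on 443 (AWS IoT, for example) typically tell
// MQTT apart from HTTPS via ALPN - for AWS IoT that protocol name is
// 'x-amzn-mqtt-ca', passed as ALPNProtocols in the TLS options.
const client = mqtt.connect('mqtts://broker.example.com:8883', {
  rejectUnauthorized: true, // verify the broker's certificate
});

client.on('connect', () => {
  client.subscribe('devices/+/telemetry');
  client.publish('devices/door-1/telemetry', JSON.stringify({ open: false }));
});
```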
It would be interesting to know if the Facebook Messenger mobile app uses MQTT over port 443 or 8883 – and if it is port 443, whether that’s MQTT over TLS or MQTT over WebSocket. If what they do with their STUN and TURN servers is any indication, any port number here is a good guess.
SIP
SIP is the most common VoIP signaling protocol out there. I didn’t remember the details, so I checked Wikipedia:
SIP clients typically use TCP or UDP on port numbers 5060 or 5061 for SIP traffic to servers and other endpoints. Port 5060 is commonly used for non-encrypted signaling traffic whereas port 5061 is typically used for traffic encrypted with Transport Layer Security (TLS).
Port 5060 for UDP and TCP traffic. And port 5061 for TLS traffic.
Then I asked a friend who knows a thing or two about SIP (he’s built more than his share of production SIP networks). His immediate answer?
443.
He remembered 5060 was UDP, 5061 was TCP and 443 was for TLS.
When you want to deploy a production SIP network, you configure your servers to do SIP over TLS on port 443.
Next Steps
If you are looking at protocol implementations and you happen to see some default ports that are required, ask yourself if using them is in your best interest. To get past firewalls and other nasty devices along the route, you might want to consider using other ports.
While you’re at it, I’d avoid sending stuff in the clear if possible and opt for TLS on the connection, which brings us back to 443. Possibly the most important port on the Internet.
If you are serious about learning WebRTC, then check out my online WebRTC training:
The post You Better Ignore the Default Protocol Ports You Implement appeared first on BlogGeek.me.
Open Source SDKs from SaaS vendors aren’t interesting.
Every once in a while, I see a SaaS vendor boasting about having open source SDKs. The assumption is that slapping “open source” on something immediately makes that thing free and open. The truth is far from it.
Planning on selecting a CPaaS vendor? Check out this shortlist of CPaaS vendor selection metrics:
Get the shortlist
Open Source Today
I want to start with an explanation of open source today.
Open source is a way for a vendor or a single developer to share his code with the “community” at large. There are many reasons why a vendor would do such a thing:
The above reasons are related to companies with proprietary software that they want protected. What they end up doing, is share modules or parts of their codebase as open source. Usually ones they assume won’t help a competitor copy and compete with them directly.
The other approach is to use open source as a full-fledged business model:
A good example here is FreeSWITCH. They are offering support and customization work around this popular open source project. And now, there’s SignalWire, an upcoming hosted version of FreeSWITCH.
You see, for a company to employ open source, there needs to be an upside. Philanthropy isn’t a business model for most.
Cloud versus On-premise when Consuming Open Source
SaaS changes the equation a bit.
I tried placing different open source licenses on a kind of a graph, alongside different deployment models. Here’s what I got:
(if you’re interested here’s where to learn more about open source licenses)
CPaaS and SaaS in general are cloud deployments. They give the company more leeway in the types of open source licenses it can consume. An on-premise type of business had better beware of using GPL, whereas a cloud deployment is just fine using GPL.
This isn’t to say that GPL can’t be used by on-premise deployments – just that it complicates things to a point where oftentimes the risks of doing so outweigh the potential reward.
CPaaS / SaaS vendors and Interfaces
On the other end of the equation you’ll find how customers interact with CPaaS vendors.
Towards that goal, the main approach today is by way of an API. And APIs today are almost always defined using REST.
In the illustration above, we have a SaaS or CPaaS vendor exposing a REST API. On top of that API, customers can build their own applications. The vendor wants to make life easier for them, to increase adoption, so he ends up implementing helper libraries. The helper libraries can be official ones or unofficial ones, either created by third parties or the vendor himself. They can just be reference implementations on top of the API, offered as starting points to customers with no real documentation or interface of their own.
For the most part, helper libraries are something I’d expect customers to deploy and run on their servers, to make it easier for them to connect from whatever language and framework they want to use to the vendor’s service.
On a client device, we have SDKs. In some ways, SDKs are just like helper libraries. They connect to the backend REST API, though sometimes they may have a more direct/optimized connection to the platform (proprietary, undocumented WebSocket connection for example).
SDKs are something you’ll find with most of the services where a state machine needs to be maintained on the client side. In the context of most of the things I write here, this includes CPaaS platforms deciding to offer VoIP calling (voice or video) by way of WebRTC in the browser or by other means in non-browser implementations. In many of these cases, the developers never actually implement REST calls – they just use the SDK’s interface to get things done.
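To make the distinction concrete, here’s a sketch with made-up names: the kind of raw REST call a helper library wraps on your server, versus the stateful SDK object a client developer ends up using:

```typescript
// Server side: a helper library is mostly a thin wrapper over REST calls
// like this one. The endpoint and response shape are hypothetical.
async function createRoom(apiKey: string): Promise<string> {
  const res = await fetch('https://api.example-cpaas.com/v1/rooms', {
    method: 'POST',
    headers: { Authorization: `Bearer ${apiKey}` },
  });
  const { roomId } = await res.json();
  return roomId;
}

// Client side: a hypothetical SDK hides the REST calls (and often a
// proprietary WebSocket) behind a stateful object instead:
//
//   const client = new ExampleClient({ token });   // made-up SDK
//   const room = await client.joinRoom(roomId);
//   room.on('participantJoined', (p) => console.log(p.name));
```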
Which is where the notion of open source SDKs sometimes comes up.
The Open Source SDK
If we’re talking about a SaaS platform, then having the source code of the SDK has its benefits, but none of them relate to “open source”. There’s no ecosystem or adoption at play for the open source code.
The reasons why we’d like to have the source code of an SDK are varied:
Here’s the thing though –
Trying to market the SDK as open source is kinda misleading as to what you’re getting out of your end of the deal.
When it comes to CPaaS and WebRTC, there’s this added complexity: vendors will “open source” or give the source code of their JS SDK (because there’s no real alternative today, at least not until WebAssembly becomes commonplace). As for the Android and iOS SDKs, I don’t remember seeing one that is offered in source code form – probably because all vendors are tweaking and modifying the baseline WebRTC code.
SaaS and Open Source
In a way, SaaS has changed the models and uses of open source. When open source was first introduced to the world, software was executed on premise only. There was no cloud, and SDKs and frameworks were commercially licensed. If you wanted something done, you either had to license it or build it yourself.
Open source came and changed all that by enabling vendors to build on top of open source code. Vendors came out with business models around dual licensing of code as well as support and customization models.
SaaS vendors today use open source in three different ways:
Planning on selecting a CPaaS vendor? Check out this shortlist of CPaaS vendor selection metrics:
Get the shortlist
The post “Open Source” SDK for SaaS and CPaaS are… Meh appeared first on BlogGeek.me.
TL;DR – YES.
Do I need a media server for a one-to-many WebRTC broadcast?
That’s the question I was asked on my chat widget this week. The answer was simple enough – yes.
Decided you need a media server? Here are a few questions to ask yourself when selecting an open source media server alternative.
Get the Selection Sheet
Then I received a follow up question that I didn’t expect:
Why?
That caught me off-guard. Not because I don’t know the answer. Because I didn’t know how to explain it in a single sentence that fits nicely in the chat widget. I guess it isn’t such a simple question either.
The simple answer is a limit in resources, along with the fact that we don’t control most of these resources.
The Hard Upper Limit
Whenever we want to connect one browser to another with a direct stream, we need to create and use a peer connection.
Chrome 65 includes an upper limit on this, used for garbage collection purposes. Chrome is not going to allow more than 500 concurrent peer connections to exist.
500 is a really large number. If you plan on more than 10 concurrent peer connections, you should be one of those who know what they are doing (and don’t need this blog). Going above 50 seems like a bad idea for all use cases that I can remember taking part in.
Understand that resources are limited. Free and implemented in the browser doesn’t mean that there aren’t any costs associated with it or a need for you to implement stuff and sweat while doing so.
Bitrates, Speeds and Feeds
This is probably the main reason why you can’t broadcast directly from the broadcaster’s device with WebRTC – or with any other technology, for that matter.
We are looking at a challenging domain with WebRTC. Media processing is hard. Real time media processing is harder.
Assume we want to broadcast a video at a low VGA resolution. We checked and decided that 500kbps of bitrate offers good results for our needs.
What happens if we want to broadcast our stream to 10 people?
Broadcasting our stream to 10 people requires an uplink bitrate of 5Mbps.
If we’re on an ADSL connection, we can find ourselves with only 1-3Mbps of uplink, so we won’t be able to broadcast the stream to our 10 viewers.
For the most part, we don’t control where our broadcasters are going to be. Over ADSL? WiFi? 3G network with poor connectivity? The moment we start dealing with broadcast we will need to make such assumptions.
That’s for 10 viewers. What if we’re looking for 100 viewers? A 1,000? A million?
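The back-of-the-envelope math, assuming that same 500kbps stream:

```typescript
// Every viewer gets their own copy of the stream from the broadcaster.
const bitratePerViewerKbps = 500; // the VGA-ish stream from the example above

for (const viewers of [10, 100, 1_000, 1_000_000]) {
  const uplinkMbps = (bitratePerViewerKbps * viewers) / 1_000;
  console.log(`${viewers} viewers -> ${uplinkMbps} Mbps of uplink needed`);
}
// 10 -> 5 Mbps, 100 -> 50 Mbps, 1,000 -> 500 Mbps, 1,000,000 -> 500,000 Mbps
```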
With a media server, we decide the network connectivity, the machine type of the server, etc. We can decide to cascade media servers to grow our scale of the broadcast. We have more control over the situation.
Broadcasting a WebRTC stream requires a media server.
Sender Uniformity
I see this one a lot in the context of a mesh group call, but it is just as relevant to broadcast.
When we use WebRTC for a broadcast type of service, a lot of decisions end up taking place in the media server. If a viewer has a bad network, this will result in packet loss being reported to the media server. What should the media server do in such a case?
While there’s no simple answer to this question, the alternatives here include:
You can’t do most of these in a browser. The browser will tend to send the same single encoded stream as is to all others, and it won’t do a good job of estimating bandwidth properly in front of multiple users. It is just not designed or implemented to do that.
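The one browser-side mechanism that does help here – once a media server sits in the path – is simulcast, where the sender offers a few quality layers and the server picks per viewer. A minimal sketch (the rid labels and bitrates are arbitrary choices):

```typescript
async function startSimulcastSender(): Promise<RTCPeerConnection> {
  const stream = await navigator.mediaDevices.getUserMedia({ video: true });
  const pc = new RTCPeerConnection();
  pc.addTransceiver(stream.getVideoTracks()[0], {
    direction: 'sendonly',
    sendEncodings: [
      { rid: 'low', scaleResolutionDownBy: 4, maxBitrate: 150_000 },
      { rid: 'mid', scaleResolutionDownBy: 2, maxBitrate: 500_000 },
      { rid: 'high', maxBitrate: 1_500_000 },
    ],
  });
  // ...offer/answer exchange with the media server goes here
  return pc;
}
```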
You Need a Media Server
In most scenarios, you will need a media server in your implementation at some point.
If you are broadcasting, then a media server is mandatory. And no. Google doesn’t offer such a free service or even open source code that is geared towards that use case.
It doesn’t mean it is impossible – just that you’ll need to work harder to get there.
Looking to learn more about WebRTC? In the coming weeks, I’ll be refreshing my online WebRTC training. Join now so you don’t miss out.
The post Do I Need a Media Server for a One-to-Many WebRTC Broadcast? appeared first on BlogGeek.me.
Time to stop playing things on the internet and start building the internet of things.
We’ve been using that stupid IOT acronym for quite some time. Probably a decade. The idea and notion that every object can be network enabled, share its collected data and receive its commands remotely is quite exciting. I think we’re far from that vision.
It isn’t that we’re not making progress. We are. The apartment building I now live in is 3 years old. It is more automated than the previous apartment building I lived in, which was 15 years old. I wouldn’t call it IOT or a smart building quite yet. And I don’t think there’s a simple way to turn a dumb building into a smart one either.
When we moved to our new apartment we renovated a bit. There was this opportunity to add smart-home capabilities to the apartment. There were just a few teeny problems here:
And to top it all, it felt like a one time undertaking that will be hard/impossible to upgrade or modify later on without a complete overhaul. That wasn’t what I was aiming for.
Mozilla just announced their Things Gateway that can be installed on a Raspberry Pi 3. It is a rather interesting project, especially since its learnings are then applied to the W3C Web of Things Interest Group with the intent of reducing the fragmentation of IOT. They’ve got their hands full.
IOT today is a patchwork of devices and companies, each trying to become a dominant player. The end result is that we’re living in a world where things can be placed on the internet, but they don’t amount to an internet of things.
Here are a few questions/hurdles that I think we’ll need to answer as an industry before we can reach that vision of IOT.
Security
I am putting security here first. Here’s why:
I’ve seen it happen with VoIP and it is definitely happening today with IOT.
Until this becomes a priority, IOT will not really happen.
Security has many different aspects to it:
Most vendors won’t be able to get these done properly to begin with. And they don’t have any real incentive to do that either.
Standardization
There’s a need for standardization in this space. One that tackles all levels of the IOT food-chain.
Off the top of my head, here are a few areas:
I don’t believe we’ll get this thing standardized properly in our industry for quite some time.
Automation
I’ve seen a lot of rules engines when it comes to IOT. You can program them to create sequences of events – if the density sensor indicates someone is at home, turn on the lights.
The problem is that you need to program them. This can’t scale.
The other problem is the issue of what to do with all that sensor data? Someone needs to collect it, aggregate it, process it, analyze it and make decisions out of it.
Simple rule engines are nice, but they won’t get us far down the IOT path.
We also need to add machine learning and AI into the mix.
The end result? Probably similar in nature to AWS DeepLens. The only problem is that it needs to be really generic and flexible.
Different Industries, Different Requirements and Ecosystems
There are different markets in IOT. They have different needs and different customers. They will have different ecosystems around them.
In broad strokes, we can split into consumer and enterprise. Enterprise here includes industrial, smart cities, etc. The consumer side is all about the home, the car and the self.
Who will be the players here?
From Smartphones to Smart Speakers
This is where I think we made the most progress.
Up until a year ago, IOT was something you ended up delivering to customers via apps on a smartphone. You purchase a lightbulb, you get an app. You get a new TV, there’s an app. Refrigerator? App.
Amazon Alexa did something miraculous. It moved the discussion over the home from an app towards a stationary home device with voice activation and control. No screen or touch screen needed.
Since then, Google and Apple have joined and voice assistants in the home are all the rage now.
In some ways, I expect this to find its way into the enterprise as well. First via conference rooms and later – who knows?
This is one more piece in the IOT puzzle.
Where do we go from here?
I have no clue.
To me, it seems that we’re still in the era of things on the internet, and we will be there for a lot longer.
The post The Internet of Things or Things on the Internet? appeared first on BlogGeek.me.
There are things you don’t want to do when you are NIH’ing your way to a stellar WebRTC application.
Here’s a true, sad story. This month, the unimaginable happened. Rain (!) dropped from the sky here in Israel. The end of it was that 6 apartments in my building are suffering from moisture due to a leak from the penthouse balcony. Being a new building, we’re at the mercy of the contractor to fix it.
Nothing in the construction market moves fast in Israel – or without threats – so we had to start sending official-sounding letters to the contractor about the leak. I took charge, and immediately said we need to lawyer up and have a professional assist us in writing the letter to the contractor. Others were of the opinion we could do it on our own, as we’d need a lawyer only if he were signed directly on the document.
And then it hit me. The reason I wanted to lawyer up is that I see many smart people failing with WebRTC. They are making rookie mistakes, and I didn’t want to make rookie mistakes when it comes to the moisture problems in my apartment.
Why are we Failing with WebRTC?
I am not sure that smart people fail a lot more with WebRTC than they do with other technologies, but it certainly feels that way.
A famous Mark Twain quote goes like this:
“There is no such thing as a new idea. It is impossible. We simply take a lot of old ideas and put them into a sort of mental kaleidoscope. We give them a turn and they make new and curious combinations. We keep on turning and making new combinations indefinitely; but they are the same old pieces of colored glass that have been in use through all the ages.”
Many of the rookie mistakes people make with WebRTC stem from this. WebRTC is exactly this kind of new. It is simply a lot of old ideas meshed into a new and curious combination. So we know it. And we assume we know how to handle ourselves around it.
Entrepreneurs? Skype is 14 years old. It shouldn’t be that hard to build something like Skype today.
VoIP developers? SIP we know. WebRTC is just SIP without the signaling. So we force SIP onto it and we’re done.
Web developers? WebRTC is part of HTML5. A few lines of JS code and we’re practically ready to go live.
Video developers? We can just take the WebRTC video feeds and put them on a CDN. Can’t we?
The result?
My biggest gripe recently is people who decide in 2018 that peerJS is what they need for their WebRTC application. A project with 402 lines of code, last updated in 2015 (!). You can’t use such code with WebRTC. Code older than a year is stale or dead already. WebRTC is still too new and too dynamic.
That said, it isn’t as if you have a choice anymore. Flash is dying, and there’s no other serious alternative to WebRTC. If you’re thinking of adopting WebRTC, then here are five mistakes to avoid.
Mistake #1: Failing to Configure STUN/TURN
You wouldn’t believe how often developers fail to configure NAT traversal servers. Just yesterday I had someone ask me over the chat widget of my website how he can run his application by hosting his signaling and web servers on HostGator without any STUN/TURN servers. It just doesn’t work.
The simple answer is that you can’t – barring some esoteric use cases, you will definitely need STUN servers. And for most use cases, TURN servers will also be mandatory if you want sessions to connect.
In the past month, I found myself explaining quite a lot about NAT traversal:
There’s more, but this should get you started.
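If all you want is a quick sanity check that your TURN server actually returns relay candidates (the same trick the Trickle ICE test page uses), something along these lines will do – the URL and credentials are placeholders:

```typescript
// Restrict ICE to relay candidates and see whether any show up.
async function checkTurn(url: string, username: string, credential: string): Promise<boolean> {
  const pc = new RTCPeerConnection({
    iceServers: [{ urls: url, username, credential }],
    iceTransportPolicy: 'relay', // only TURN (relay) candidates allowed
  });
  pc.createDataChannel('probe'); // gives ICE something to negotiate
  await pc.setLocalDescription(await pc.createOffer());

  return new Promise((resolve) => {
    let gotRelay = false;
    pc.onicecandidate = (e) => {
      if (e.candidate?.candidate.includes(' typ relay ')) gotRelay = true;
      if (!e.candidate) { // null candidate: gathering is done
        pc.close();
        resolve(gotRelay);
      }
    };
  });
}

// usage: checkTurn('turn:turn.example.com:443?transport=tcp', 'user', 'secret')
```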
Mistake #2: Selecting the WRONG Signaling Framework
PeerJS anyone? PeerJS feels like a tourist trap:
With 1,693 stars and 499 forks, PeerJS is one of the most popular WebRTC projects on GitHub. What can go wrong?
Maybe the fact that it is older than the internet?
A WebRTC project that had its last commit 3 years ago can’t be used today.
Same goes for using Muaz Khan’s code snippets and expecting them to be commercial grade, stable, highly scalable products. They’re not. They’re just very useful code snippets.
Planning to use some open source project? Make sure that:
Don’t take the selection process here lightly. Not when it comes to a signaling server and not when it comes to a media server.
Mistake #3: Not Using Media Servers When You Should
I know what you’re thinking. WebRTC is peer to peer, so there’s no need for servers. Some think that even signaling and web servers aren’t needed – I hope they can explain how participants are going to find each other.
To some, this peer to peer concept also means that you can run ridiculously large scale sessions with no servers carrying the media.
Here are two such “architectures” I’ve come across:
Mesh. It’s great. Don’t assume you can get it to run properly this year or the next. Move on.
Live broadcasting by forwarding content. It can be done, but most probably not the way you expect it to grow to a million users with no infrastructure and zero latency.
For many of the use cases out there, you will need a media server to process and route the media for you. Now that you are aware of it, go search for an open source media server. Or a commercial one.
Mistake #4: Thinking Short-Term
You get an outsourcing vendor. Write him a nice requirements doc. Pay him. Get something implemented. And you’re done.
Not really.
WebRTC is still at its infancy. The spec is changing. Browser implementations are changing. It is all in flux all the time. If you’re going to use WebRTC, either:
WebRTC code rots faster than most other HTML5 code. It will eventually change, but we’re not there yet.
It is also the reason I started testRTC with a few colleagues a few years ago. To help with the lifecycle of WebRTC applications, especially in the area of testing and monitoring.
Mistake #5: Failing to Understand WebRTC
They say assumption is the mother of all mistakes. Google seems to agree with it. Almost.
WebRTC isn’t trivial. It sits somewhere between VoIP and the web. It is new, and the information out there on the Internet about it is scattered and somewhat dynamic (which means lots of it isn’t accurate).
If you plan on using WebRTC, make sure you first understand it and its intricacies. Understand the servers that are needed to deploy a WebRTC application. Understand the signaling mechanisms that are built into WebRTC. Understand how media is processed and sent over the network. Understand the rich ecosystem of solutions that can be used with WebRTC to build a production ready system.
Lots of things to learn here. Don’t assume you know WebRTC just because you know web development or because you know VoIP or video processing.
If you are looking to seriously learn WebRTC, why not enroll to my Advanced WebRTC Architecture course?
–
What about my apartment? We’ve lawyered up, and now I have someone review and fix all the official sounding letters we’re sending out. Hopefully, it will get us faster to a resolution.
The post 5 Mistakes to Avoid When Developing WebRTC Applications appeared first on BlogGeek.me.
For WebRTC, mobile and PC are moving in different directions. On the desktop, WebRTC Electron apps are gaining momentum.
In the good old days, people used to complain that WebRTC isn’t available on all browsers. Mobile was less of an issue for most as mobile application developers port WebRTC and use it natively on both iOS and Android.
How times change.
Need to know where WebRTC is available? Download this free WebRTC Device Cheat Sheet.
Today? All modern browsers support WebRTC. We’ve got Chrome, Firefox, Edge and Safari with official WebRTC implementations.
The challenge? None of the browsers are ready:
What’s a developer to do?
Use adapter.js. Or go for a plugin. Or just ignore a few browsers.
Or maybe. Just maybe you should treat PCs and laptops the same way you do mobile? And build an app.
If that’s what you plan on doing then you’re not alone.
The most popular way to build an app for the desktop is by using Electron. There are other ways, like CEF and actual native development, but Electron is by far the most common approach.
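To give a rough idea of how low the barrier is, this is more or less all the main-process code needed to wrap an existing WebRTC web app in Electron (the URL is a placeholder) – Chromium and its WebRTC stack ship inside Electron, so the web code runs unchanged:

```typescript
import { app, BrowserWindow } from 'electron';

app.whenReady().then(() => {
  const win = new BrowserWindow({
    width: 1280,
    height: 800,
    webPreferences: { contextIsolation: true },
  });
  win.loadURL('https://app.example.com'); // your existing WebRTC web client
});

app.on('window-all-closed', () => app.quit());
```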
Here are 3 vendors making use of Electron (and WebRTC) for their desktop application:
#1 – Slack
Slack is a popular team collaboration application. I’ve been using it in the browser for the last 3 years, but switched to their desktop Electron app on both my Ubuntu desktop and my Windows 10 laptop.
Why didn’t I use the app for so long? Because I don’t like installing things.
Why have I installed it now? Because I need to track 3+ slack accounts in parallel at all times now. This means a tab per slack account in my browser. On the desktop app, they don’t “eat up” multiple tabs. It isn’t a matter of memory or performance for me. Just one of “esthetics” – trying to preserve a tabs diet on my Chrome.
And that’s how Slack likes it. During the last Kranky Geek, the Slack team gave an interesting presentation about their current plans. It had about a minute dedicated to Electron, around 2:30 into the session:
This recording lacks the Q&A part of the session. In an answer to a question regarding browser support, Andrew MacDonald of Slack said their focus is on their desktop app – not the browser. They make sure everything works on Chrome, invest less time and effort on the other browsers, and focus a lot on their Slack desktop application.
It was telling.
If you are looking for desktop-application-only-features in Slack, then besides having a single window for all projects, there’s the collaboration they offer during screen sharing that isn’t available in the browser (yet another reason for me to switch – to check it out).
During that session, at the 2:30 mark, Andrew explains why Electron is so useful to Slack, and it comes down to cross-platform development and time to market – with their team size, they can’t update as fast as Electron does, so they took its built-in WebRTC implementation “as is”.
#2 – Discord
Discord is a kind of Slack, but different. A social network targeting gamers. You can also find non-gaming groups there. Discord is doing all it can to get you from the comfort of your browser right into their native application.
Here’s what the homepage looks like:
From the get-go, their call to action is to either Open Discord (in the browser) or Download for your operating system. On mobile, if you’re curious, the only alternative is to download the app.
Here’s the interesting part, though.
Discord’s green call-to-action buttons suggest you open Discord in the browser. That’s the lower friction action. You select a user name. Then pick an email and password (or use an unclaimed channel until you add your username and password). And now that you’re signed up for the service, it is time to suggest again that you use their app:
And… if you skip this one, you’ll get a top bar reminder as well (that orange strip at the top):
You can do with Discord almost anything inside the browser, but they really really really want to get you off that damn internet and into their desktop app.
And it is working for them!
#3 – TalkDesk
TalkDesk has its own reason for adopting Electron.
TalkDesk is a contact center solution that integrates with CRMs and third party systems. Towards that goal, you can:
That third option is going the way of the dodo, along with Chrome apps. TalkDesk solved that by introducing Callbar Electron.
What we see here differs slightly from the previous two examples.
Where Slack and Discord try getting people off the web and into their desktop application, TalkDesk is just trying to be everywhere for them. Using HTML5 and Electron means they need not write yet-another-application for the desktop – they can reuse parts of their web app.
They are NOT Alone
There are other vendors I know of that are using Electron for their WebRTC applications. They do it for one of the following reasons:
Add to that CPaaS vendors officially supporting Electron. Vidyo.io and TokBox are such examples. They do it not because they think it is nice, but because there’s customer demand for it.
This shift towards Electron apps makes it harder to estimate the real usage base of WebRTC. If most communications is shifting from the Chrome browser (let’s face it, most WebRTC comms happens in Chrome today if you only care about browsers) towards applications, then the statistics and trends collected by Google about WebRTC use are skewed. That said, it makes Chrome all the more dominant, as Electron use can be attributed back to Chromium.
Expect vendors to continue adopting Electron for their WebRTC applications. This trend is on 🔥.
Need to know where WebRTC is available? Download this free WebRTC Device Cheat Sheet.
The post WebRTC Electron Implementations are on 🔥 appeared first on BlogGeek.me.
Are AI cameras in our future?
At last year’s AWS re:Invent event, which took place at the end of November, Amazon unveiled an interesting product: AWS DeepLens.
There’s decent information about this new device on Amazon’s own website but very little of anything else out there. I decided to put my own thoughts on “paper” here as well.
Interested in AI, vision and where it meets communications? I am going to cover this topic in future articles, so you might want to sign-up for my newsletter
Get my free content
What is AWS DeepLens?
AWS DeepLens is the combination of 3 components: hardware (camera + machine), software and cloud. These 3 come in a tight integration that I haven’t seen before in a device that is first and foremost targeting developers.
With DeepLens, you can handle inference of video (and probably audio) inputs in the camera itself, without shipping the captured media towards the cloud.
The hype words that go along with this device? Machine Vision (or Computer Vision), Deep Learning (or Machine Learning), Serverless, IoT, Edge Computing.
It is all these words and probably more, but it is also somewhat less. It is a first tentative step of what a camera module will look like 5 years from today.
I’d like to go over the hardware and software and see how they combine into a solution.
AWS DeepLens Hardware
AWS DeepLens hardware is essentially a camera that has been glued to an Intel NUC device:
Neither the camera nor the compute are on the higher end of the scale, which is just fine considering where we’re headed here – a gazillion low-cost devices that can see.
The device itself was built in collaboration with Intel. Like all chipset vendors, Intel is plunging into AI and deep learning as well. More on AWS+Intel vs Google later.
Here’s what’s in this package, based on the AWS blog post on DeepLens:
The hardware tries to look somewhat polished, but it isn’t. Although this isn’t written anywhere, this is:
In a way, this is just a more polished hardware version of Google’s computer vision kit. The real difference comes with the available tooling and workflow that Amazon baked into AWS DeepLens.
AWS DeepLens Software
The AWS DeepLens software is where things get really interesting.
Before we get there, we need to understand a bit how machine learning works. At its most basic, machine learning is about giving a “machine” a large dataset, letting it learn the data in one way or another, and then, when you introduce similar new data, having it classify that data.
Dumbing down the whole process and theory, at the end of the day, machine learning is built out of two main steps:
1. Training – feeding the machine a large dataset from which it builds a model
2. Deployment – using that model to classify new data (inference)
With AWS DeepLens, the intent is to run the training in the AWS cloud (obviously), and then run the deployment step for real time classification directly on the AWS DeepLens device. This also means that we can run this while being disconnected from the cloud and from any other network.
How does all this come to play in AWS DeepLens software stack?
On device
On the device, AWS DeepLens runs two main packages: AWS Greengrass Core (the same edge runtime Amazon offers for other IoT devices) and a device-optimized version of MXNet for running the actual inference.
Why MXNet and not TensorFlow?
The main component here is the new Amazon SageMaker:
SageMaker takes the effort out of managing the training of machine learning models, streamlining the whole process. That last Deploy step in the process takes place, in this case, directly on AWS DeepLens.
Besides SageMaker, when using DeepLens you will probably make use of Amazon S3 for storage, AWS Lambda when running serverless in the cloud, as well as other AWS services. Amazon even suggests using AWS DeepLens along with the newly announced Amazon Rekognition Video service.
To top it all, Amazon has a few pre-trained models and sample projects, shortening the path from getting a hold of an AWS DeepLens device to seeing it in action.
AWS+Intel vs Google
So we’ve got AWS DeepLens. With its set of on-device and cloud software tools. Time to see what that means in the bigger picture.
I’d like to start with the main players in this story. Amazon, Intel and Google. Obviously, Google wasn’t part of the announcement. Its TensorFlow project was mentioned in various places and can be made to work with AWS DeepLens. But that’s about it.
Google is interesting here because it is THE company today that is synonymous to AI. And there’s the increasing rivalry between Amazon and Google that seems to be going on multiple fronts.
When Google came out with TensorFlow, it was with the intent of creating a baseline for artificial intelligence modeling that everyone will be using. It open sourced the code and let people play with it. That part succeeded nicely. TensorFlow is definitely one of the first projects developers would try to dabble with when it comes to machine learning. The problem with TensorFlow seems to be the amount of memory and CPU it requires for its computations compared to other frameworks. That is probably one of the main reasons why Amazon decided to place its own managed AI services on a different framework, ending up with MXNet which is said to be leaner with good scaling capabilities.
Google did one more thing though. It created its own special Tensor processing unit, calling it TPU. This is an ASIC type of chip, designed specifically for high performance machine learning calculations. In a research paper released by Google earlier last year, they show how their TPUs perform better than GPUs when it comes to TensorFlow machine learning workloads:
And if you’re wondering – you can get Cloud TPU on the Google Cloud Platform, albeit still in alpha stage.
This gives Google an advantage in hosting managed TensorFlow jobs, posing a threat to AWS when it comes to AI heavy applications (which is where we’re all headed anyway). So Amazon couldn’t really pick TensorFlow as its winning horse here.
Intel? They don’t sell TPUs at the moment. And like any other chip vendor, they are banking and investing heavily in AI. Which made working with AWS here on optimizing and working on end-to-end machine learning solutions for the internet of things in the form of AWS DeepLens an obvious choice.
Artificial Intelligence and Vision
These days, it seems that every possible action or task is being scrutinized to see if artificial intelligence can be used to improve it. Vision is no different. You’ll find it called computer vision or machine vision, and it covers a broad set of capabilities and algorithms.
Roughly speaking, there are two types of use cases here:
As with anything else in artificial intelligence and analytics, none of this is workable at the moment for a broad spectrum of classifications. You need to be very specific in what you are searching and aiming for, and this isn’t going to change in the near future.
On the other hand, there are many many cases where what you need is a camera to classify a very specific and narrow vision problem. The usual things include person detection for security cameras, counting people at an entrance to a store, etc. There are other areas you hear about today such as using drones for visual inspection of facilities and robots being more flexible in assembly lines.
We’re at a point where we already have billions of cameras out there. They are in our smartphones and are considered a commodity. These cameras and sensors are now headed into a lot of devices to power the IOT world and allow it to “see”. The AWS DeepLens is one such tool that just happened to package and streamline the whole process of machine vision.
Pricing
On the price side, the AWS DeepLens is far from a cheap product.
The baseline cost of an AWS DeepLens camera? $249
But as with other connected devices, that’s only a small part of the story. The device is intended to be connected to the AWS cloud and there the real story (and costs) takes place.
The two leading cost centers after the device itself are going to be AWS Greengrass and Amazon SageMaker.
AWS Greengrass starts at $1.49 per year per device. Amazon SageMaker costs 20-25% on top of the usual AWS EC2 machine prices. To that, add the usual bandwidth and storage pricing of AWS, and higher prices for certain regions and discounts on large quantities.
It isn’t cheap.
This is a new service that is quite generic and is aimed at tinkerers. Startups looking to try out and experiment with new ideas. It is also the first iteration of Amazon with such an intriguing device.
I, for one, can’t wait to see where this is leading us.
3 Different Compute Models for Machine Vision
AWS DeepLens is one of 3 different compute models that I see in this space of machine vision.
Here are all 3 of them:
#1 – Cloud
In a cloud based model, the expectation is that the actual media is streamed towards the cloud:
The data can be a video stream, or more often than not, it is just a set of captured images.
And that data gets classified in the cloud.
Here are two recent examples from a domain close to my heart – WebRTC.
At the last Kranky Geek event, Philipp Hancke shared how appear.in is trying to determine NSFW (Not Safe For Work):
The way this is done is by using Yahoo’s Open NSFW open source package. They had to resize images, send them to a server, and there, using Python, classify each image, determining whether it is safe for work or not. Watch the video – it really is insightful as to how to tackle such a project in the real world.
The other one comes from Chad Hart, who wrote a lengthy post about connecting WebRTC to TensorFlow for machine vision. The same technique was used – one of capturing still images from the stream and sending them towards a server for classification.
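Both examples boil down to the same capture-a-still-and-ship-it pattern. Here’s a browser-side sketch of it, with a placeholder endpoint:

```typescript
async function sendFrameForClassification(video: HTMLVideoElement): Promise<void> {
  const canvas = document.createElement('canvas');
  canvas.width = 224;   // small input size, typical for image classifiers
  canvas.height = 224;
  canvas.getContext('2d')!.drawImage(video, 0, 0, canvas.width, canvas.height);

  const blob = await new Promise<Blob>((resolve) =>
    canvas.toBlob((b) => resolve(b!), 'image/jpeg', 0.7)
  );
  await fetch('https://classify.example.com/frames', { method: 'POST', body: blob });
}

// e.g. sample one frame every few seconds from the remote video element:
// setInterval(() => sendFrameForClassification(remoteVideo), 5000);
```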
These approaches are nice, but they have their challenges:
#2 – In the Device
This alternative is what we have today in smartphones and probably in modern room based video conferencing devices.
The camera is just the optics, but the heavy lifting takes place in the main processor, which is doing other things as well. Most modern CPUs today already have GPUs embedded as part of the SoC, and chip vendors are actively working on AI-specific additions to their chips (think Apple’s AI chip in the iPhone X or Google’s computational photography packed into the Pixel phones).
The underlying concept here is that the camera is always tethered or embedded in a device that is powerful enough to handle the machine learning algorithms necessary.
They aren’t part of the camera but rather the camera is part of the device.
This works rather well, but you end up with a pricy device which doesn’t always make sense. Remember that our purpose here is to have a large number of camera sensors deployed, and attaching an expensive computing device to each of them won’t make sense for many of the use cases.
#3 – In the Camera
This is the AWS DeepLens model.
The computing power needed to run the classification algorithms is made part of the camera instead of taking place on another CPU.
We’re talking about $249 right now, but assuming this approach becomes popular, prices should go down. I can easily see such devices retailing at $49 on the low end in 2-3 technology cycles (5 years or so). And when that happens, the power developers will have over what use cases can be created are endless.
Think about a home surveillance system that costs below $1,000 to purchase and install. It is smart enough to have a lot less false positives in alerting its users. AND can be upgraded in its classification as time goes by. There can be a service put in place behind it with a monthly fee that includes such things. You can add face detection and classification of certain people – alerting you when the kids come home or leave for example. Ignoring a stray cat that came into view of the camera. And this system is independent of an external network to run on a regular basis. You can update it when an external network is connected, but other than that, it can live “offline” quite nicely.
No Winning Model
Yet.
All of the 3 models have their place in the world today. Amazon just made it a lot easier to get us to that third alternative of “in the camera”.
IoT and the Cloud
Edge computing. Fog computing. Cloud computing. You hear these words thrown in the air when talking about the billions of devices that will comprise the internet of things.
For IoT to scale, there are a few main computing concepts that will need to be decided sooner rather than later:
I was reading The Meridian Ascent recently. A science fiction book in a long series. There’s a large AI machine there called Big John which sifts through the world’s digital data:
“The most impressive thing about Big John was that nobody comprehended exactly how it worked. The scientists who had designed the core network of processors understood the fundamentals: feed sufficient information to uniquely identify a target, and then allow Big John to scan all known information – financial transactions, medical records, jobs, photographs, DNA, fingerprints, known associates, acquaintances, and so on.
But that’s where things shifted into another realm. Using the vast network of processors at its disposal, Big John began sifting external information through its nodes, allowing individual neurons to apply weight to data that had no apparent relation to the target, each node making its own relevance and correlation calculations.”
I’ve emphasized that sentence. To me, this shows the view of the same IoT network looking at it from a cloud perspective. There, the individual sensors and nodes need to be smart enough to make their own decisions and take their own actions.
–
All these words for a device that will only be launched in April 2018…
We’re not there yet when it comes to IoT and the cloud, but developers are working on getting the pieces of the puzzle in place.
Interested in AI, vision and where it meets communications? I am going to cover this topic in future articles, so you might want to sign-up for my newsletter
Get my free content
The post AWS DeepLens and the Future of AI Cameras and Vision appeared first on BlogGeek.me.